
#1201 assigned Feature Requests

Regexify the syntax highlighter

Reported by:	Joel de Guzman	Owned by:	Andreas Pokorny
Milestone:	To Be Determined	Component:	quickbook
Version:	Boost 1.34.1	Severity:	Problem
Keywords:	documentat ibd boost-doc regular expression lexer regex xpressive quickbook lexer syntax highlighting	Cc:

Description

What I'd really want is to regex-ify the syntax highlighter and make it reconfigurable through user-supplied regex strings read from configuration files. This would simplify our lives a lot. There would be only one syntax highlighter grammar and one body of code that accepts various lexer files. Our job would then be just to churn out lexer files for different languages.

Change History (2)

comment:1 by Andreas Pokorny, 15 years ago

Keywords:	documentat ibd boost-doc regular expression lexer regex xpressive quickbook lexer syntax highlighting added
Owner:	changed from Joel de Guzman to Andreas Pokorny
Status:	new → assigned

Syntax idea so far: [sourcemode <NAME_OF_MODE> <' '-separated_LIST_OF_MODES> <rulename> [REGEX] <OPTIONAL_TEMPLATE_TO_INVOKE_WITH_MATCH> ]

The first line defines the start rule. Every occurrence of a rulename inside the regular expressions will be treated as a reference to the regex attached to that rulename. Hence recursion is possible.

Because [] are very common regex characters, we might switch to: <rulename> "REGEX" <TEMPLATE_TO_INVOKE_WITH_MATCH>

We will probably use xpressive, because it already allows recursion and has a parser for strings. We would prefer spirit, if there was a "dynamic" spirit, since ebnfs with operator- and eps_p are easier to use than lookahead or lookbehind assertions.

The ' ' separated list of modes should allow reusing existing source mode definitions. We might prefix rules of imported regex...

Stuff to decide: 1) What if the regex defines marks and groups submatches and so on: should every submatch become a parameter to the template? Shall we then omit the complete match from the parameter list? Or shall we always submit the complete match first, then the first to nth submatch, as parameters? 2) Should we implement a kind of binder syntax like in boost.bind for the various matches? That way we would add a kind of substitution-like functionality. rule "\(#\sdefine \)[^\n]*" [extendedn_preproc_markup _1.. macro contents are secrets]

So "#define PI 3.14126.." would turn into a highlighted: "#define macro contents are secrets"

Development currently takes place at: http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/

comment:2 by Andreas Pokorny, 15 years ago

Because the above is so terribly formatted:

Syntax idea so far:
[sourcemode <NAME_OF_MODE> <' '-separated_LIST_OF_MODES>
<rulename> [REGEX] <OPTIONAL_TEMPLATE_TO_INVOKE_WITH_MATCH>
]

The first line defines the start rule. Every occurrence of a
rulename inside the regular expressions will be treated as a
reference to the regex attached to that rulename. Hence
recursion is possible.
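
The reference mechanism described above can be sketched as plain textual expansion. This is only an illustration in Python, not the xpressive-based implementation: the `expand` helper and the three sample rules are hypothetical, rule names are assumed to be whole words, and because Python's `re` module has no nested regexes the sketch rejects recursive rules instead of supporting them.

```python
import re

def expand(rules, name, _active=()):
    """Return the regex for `name` with every rule reference inlined."""
    if name in _active:
        # Python's re cannot express recursion; a nested-regex engine could.
        raise ValueError("recursive rule not supported here: " + name)
    # Try longer names first so a short rule name never shadows a longer one.
    names = sorted(rules, key=len, reverse=True)
    ref = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")

    def inline(m):
        # Wrap the referenced regex in a non-capturing group.
        return "(?:" + expand(rules, m.group(1), _active + (name,)) + ")"

    return ref.sub(inline, rules[name])

# Hypothetical rule table; the first entry plays the role of the start rule.
rules = {
    "program": r"(comment|number)*",
    "comment": r"//[^\n]*",
    "number": r"[0-9]+",
}
pattern = expand(rules, "program")
```

Treating any whole word that equals a rule name as a reference is the simplest reading of the proposal; it also means a rule name can never appear as literal text in a regex, which a real implementation would have to address.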

Because [] are very common regex characters, we might switch to:
<rulename> "REGEX" <TEMPLATE_TO_INVOKE_WITH_MATCH>
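
A minimal sketch of reading one line in the quoted form, written in Python for illustration. The `parse_rule` helper and the `RULE_LINE` pattern are hypothetical names, and the handling of `\"` inside the quoted regex (kept verbatim, not unescaped) is an assumption rather than anything decided in this ticket.

```python
import re

# rulename, a double-quoted regex that may contain \" escapes,
# then an optional template name.
RULE_LINE = re.compile(r'''\s*(\w+)\s+"((?:[^"\\]|\\.)*)"(?:\s+(\S.*?))?\s*$''')

def parse_rule(line):
    """Split a rule line into (rulename, regex, template or None)."""
    m = RULE_LINE.match(line)
    if m is None:
        raise ValueError("not a rule line: " + line)
    return m.groups()

name, regex, template = parse_rule(r'keyword "(auto|bool)(?!\w)" add_keyword_markup')
```

Quoting the regex removes the clash with `[]`, at the cost of requiring `\"` for literal double quotes, as the string rule in the C++ grammar draft already does.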

We will probably use xpressive, because it already allows recursion
and has a parser for strings. We would prefer spirit, if there was a
"dynamic" spirit, since ebnfs with operator- and eps_p are easier to use
than lookahead or lookbehind assertions.

The ' ' separated list of modes should allow reusing existing source
mode definitions. We might prefix rules of imported regex...

An untested and incomplete C++ grammar could look like this:
[sourcemode cpp
program "(comment|preprocessor|keyword|identifier|special|string|char|number|.)*"
comment "(//[NOT\n]*|/\*.*?\*/)" add_comment_markup
preprocessor "#\s[NOT\n]*" add_preproc_markup
keyword "(auto|and|and_eq|bool|char|...)(?!\w)" add_keyword_markup
special "[\~!%&\*()+={\[}\]:;,<\.>?/\|\-]+" add_special_markup
string "[lL]?\"([NOT\"]|\")*?\"" add_string_markup
char "[lL]?'([NOT']?)'" add_char_markup
number .....
]
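
To show how a rule table like the one above could drive a single generic highlighter, here is a small Python sketch. It is not the quickbook code: the rule set is cut down to a few entries, the keyword list is just the three names visible in the draft, the angle-bracket markup stands in for the `add_*_markup` templates, and rule order is assumed to decide which rule wins.

```python
import re

# (rule name, regex, markup class or None); earlier rules take precedence.
RULES = [
    ("comment", r"//[^\n]*|/\*.*?\*/", "comment"),
    ("preprocessor", r"#[^\n]*", "preprocessor"),
    ("keyword", r"(?:auto|bool|char)(?!\w)", "keyword"),
    ("identifier", r"[A-Za-z_]\w*", None),
    ("string", r'"(?:[^"\\\n]|\\.)*"', "string"),
    ("number", r"[0-9]+", "number"),
    ("other", r".", None),  # fallback, like the trailing '.' in the program rule
]

# One alternation with a named group per rule; DOTALL lets block comments span lines.
LEXER = re.compile("|".join("(?P<%s>%s)" % (n, r) for n, r, _ in RULES), re.DOTALL)
MARKUP = {n: cls for n, _, cls in RULES}

def highlight(source):
    """Wrap each token in the markup class of the rule that matched it."""
    out = []
    for m in LEXER.finditer(source):
        cls = MARKUP[m.lastgroup]
        text = m.group()
        out.append("<%s>%s</%s>" % (cls, text, cls) if cls else text)
    return "".join(out)

marked = highlight("bool x = 1; // hi")
```

Supporting another language would then mean swapping the `RULES` table while `highlight` stays unchanged, which is the point of the feature request.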

Stuff to decide:
1) What if the regex defines marks and groups submatches and so on: should
every submatch become a parameter to the template? Shall we then omit the
complete match from the parameter list? Or shall we always submit the
complete match first, then the first to nth submatch, as parameters?
2) Should we implement a kind of binder syntax like in boost.bind for the
various matches? That way we would add a kind of substitution-like functionality.
rule "\(#\sdefine \)[NOT\n]*" [extendedn_preproc_markup _1.. macro contents are secrets]

So "#define PI 3.14126.." would turn into a highlighted:
"#define macro contents are secrets"
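
One possible answer to both questions, sketched in Python for illustration only: the complete match is always parameter `_0` and the submatches follow as `_1` to `_n`. The `apply_template` helper, the `_0` convention, and the template string are assumptions, and the regex uses `#\s*define` so that it matches the sample input.

```python
import re

def apply_template(match, template):
    """Substitute _0 (complete match) and _1.._n (submatches) into template."""
    def arg(m):
        index = int(m.group(1))
        if index > match.re.groups:
            raise IndexError("template refers to missing submatch _%d" % index)
        # A group that did not participate in the match contributes nothing.
        return match.group(index) or ""
    return re.sub(r"_(\d+)", arg, template)

rule = re.compile(r"(#\s*define )[^\n]*")
m = rule.match("#define PI 3.14126..")
result = apply_template(m, "_1macro contents are secrets")
```

Passing the complete match as `_0` keeps `_1` aligned with the first submatch, so templates written for a rule without groups and for a rule with groups use the same numbering.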

Development currently takes place at:
http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/
