Opened 15 years ago

Last modified 15 years ago

#1201 assigned Feature Requests

Regexify the syntax highlighter

Reported by: Joel de Guzman Owned by: Andreas Pokorny
Milestone: To Be Determined Component: quickbook
Version: Boost 1.34.1 Severity: Problem
Keywords: documentat ibd boost-doc regular expression lexer regex xpressive quickbook lexer syntax highlighting Cc:

Description

What I'd really like is to regex-ify the syntax highlighter and make it reconfigurable through user-supplied regex strings in configuration files. This would simplify our lives a lot: there would be only one syntax highlighter grammar, plus code that accepts various lexer files. Our job would then be just to churn out lexer files for different languages.

Change History (2)

comment:1 by Andreas Pokorny, 15 years ago

Keywords: documentat ibd boost-doc regular expression lexer regex xpressive quickbook lexer syntax highlighting added
Owner: changed from Joel de Guzman to Andreas Pokorny
Status: new → assigned

Syntax idea so far: [sourcemode <NAME_OF_MODE> <' '-separated_LIST_OF_MODES> <rulename> [REGEX] <OPTIONAL_TEMPLATE_TO_INVOKE_WITH_MATCH> ]

The first line defines the start rule. Every occurrence of a rulename inside the regular expressions will be treated as a reference to the regex attached to that rulename. Hence recursion is possible.

Because [] are very common regex characters, we might switch to: <rulename> "REGEX" <TEMPLATE_TO_INVOKE_WITH_MATCH>

We will probably use xpressive, because it already allows recursion and has a parser for strings. We would prefer spirit, if there was a "dynamic" spirit, since ebnfs with operator- and eps_p are easier to use than lookahead or lookbehind assertions.

The ' ' separated list of modes should allow reusing existing source mode definitions. We might prefix rules of imported regex...

An untested and incomplete C++ grammar could look like that: [sourcemode cpp program "(comment|preprocessor|keyword|identifier|special|string|char|number|.)*" comment "([\n]*|/\*.*?\*/)" add_comment_markup preprocessor "#\s[\n]*" add_preproc_markup keyword "(auto|bool|char|...)(?!\w)" add_keyword_markup keyword "(auto|and|and_eq|bool|char|...)(?!\w)" add_keyword_markup special "[\~!%&\*()+={\[}\]:;,<\.>?/\|\-]+" add_special_markup string "[lL]?\"([\"]|\")*?\"" add_string_markup char "[lL]?'([']?)'" add_char_markup number ..... ]

Stuff to decide: 1) What if the regex defines marks, groups submatches and so on: should every submatch become a parameter to the template? Shall we then omit the complete match from the parameter list? Or shall we always submit the complete match first, then the first to nth submatch, as parameters? 2) Should we implement a kind of binder syntax, like in boost.bind, for the various matches? That way we would add a kind of substitution-like functionality. rule "\(#\sdefine \)[\n]*" [extendedn_preproc_markup _1.. macro contents are secrets]

So "#define PI 3.14159.." would turn into a highlighted: "#define macro contents are secrets"

Development currently takes place at: http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/

comment:2 by Andreas Pokorny, 15 years ago

Because the above is so terribly formatted:

Syntax idea so far:
[sourcemode <NAME_OF_MODE> <' '-separated_LIST_OF_MODES>
<rulename> [REGEX] <OPTIONAL_TEMPLATE_TO_INVOKE_WITH_MATCH>
]

The first line defines the start rule. Every occurrence of a
rulename inside the regular expressions will be treated as a
reference to the regex attached to that rulename. Hence
recursion is possible.
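
To illustrate the rulename references, here is a minimal sketch that resolves them by plain textual substitution with std::regex. This is not the nested dynamic regex support planned for xpressive: `RuleTable` and `expand_rules` are hypothetical names, and because substitution cannot express real recursion, nesting is simply bounded by a depth limit.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical rule table: rule name -> regex source text.
using RuleTable = std::map<std::string, std::string>;

// Replace every identifier in `pattern` that names a known rule with a
// non-capturing group holding that rule's expanded regex.
// Assumption: rule names never collide with escape letters such as the
// "w" in "\w"; a real implementation would have to parse the regex.
std::string expand_rules(const RuleTable& rules, const std::string& pattern,
                         int depth = 8)
{
    assert(depth >= 0);
    if (depth == 0)
        throw std::runtime_error("rule nesting too deep (recursive rule?)");

    static const std::regex word(R"([A-Za-z_]\w*)");
    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(pattern.begin(), pattern.end(), word), end;
         it != end; ++it) {
        const std::size_t pos = static_cast<std::size_t>(it->position());
        out.append(pattern, last, pos - last);   // text before the identifier
        const auto rule = rules.find(it->str());
        if (rule != rules.end())
            out += "(?:" + expand_rules(rules, rule->second, depth - 1) + ")";
        else
            out += it->str();                    // not a rule name: keep as-is
        last = pos + static_cast<std::size_t>(it->length());
    }
    out.append(pattern, last, std::string::npos); // trailing text
    return out;
}
```

With a `program` rule of `(keyword|.)*` and a `keyword` rule of `(auto|bool)(?!\w)`, expanding `program` yields `((?:(auto|bool)(?!\w))|.)*`. A rule that refers to itself runs into the depth limit and throws, which is exactly the case that needs xpressive's recursion support instead.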

Because [] are very common regex characters, we might switch to:
<rulename> "REGEX" <TEMPLATE_TO_INVOKE_WITH_MATCH>

We will probably use xpressive, because it already allows recursion
and has a parser for strings. We would prefer spirit, if there was a
"dynamic" spirit, since ebnfs with operator- and eps_p are easier to use
than lookahead or lookbehind assertions.

The ' ' separated list of modes should allow reusing existing source
mode definitions. We might prefix rules of imported regex...

An untested and incomplete C++ grammar could look like that:
[sourcemode cpp
program "(comment|preprocessor|keyword|identifier|special|string|char|number|.)*"
comment "([NOT\n]*|/\*.*?\*/)" add_comment_markup
preprocessor "#\s[NOT\n]*" add_preproc_markup
keyword "(auto|bool|char|...)(?!\w)" add_keyword_markup
keyword "(auto|and|and_eq|bool|char|...)(?!\w)" add_keyword_markup
special "[\~!%&\*()+={\[}\]:;,<\.>?/\|\-]+" add_special_markup
string "[lL]?\"([NOT\"]|\")*?\"" add_string_markup
char "[lL]?'([NOT']?)'" add_char_markup
number .....
]
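
A rough sketch of how such a rule list could drive the highlighter, using std::regex as a stand-in for xpressive. Everything here is an assumption made for illustration: the `Rule` struct, `make_cpp_rules` and `highlight` are hypothetical, the patterns are one guess at what the grammar above intends (not a transcription of it), and the angle-bracket tags are placeholders for the real markup templates.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One highlighting rule; in this sketch the name doubles as the markup tag.
struct Rule {
    std::string name;
    std::regex pattern;
};

// Assumed patterns, loosely modelled on the grammar above; rule order matters.
std::vector<Rule> make_cpp_rules()
{
    return {
        {"comment",      std::regex(R"(//[^\n]*|/\*[\s\S]*?\*/)")},
        {"preprocessor", std::regex(R"(#\s*[^\n]*)")},
        {"keyword",      std::regex(R"((auto|bool|char|int|return)(?!\w))")},
        {"string",       std::regex(R"re([lL]?"([^"\\]|\\.)*")re")},
        {"identifier",   std::regex(R"([A-Za-z_]\w*)")},
        {"number",       std::regex(R"(\d+(\.\d+)?)")},
    };
}

// At each position, the first rule that matches right there wins and its
// text is wrapped in <name>...</name>; unmatched characters are copied as-is.
std::string highlight(const std::string& source, const std::vector<Rule>& rules)
{
    std::string out;
    auto pos = source.cbegin();
    while (pos != source.cend()) {
        bool matched = false;
        for (const Rule& rule : rules) {
            std::smatch m;
            if (std::regex_search(pos, source.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous)
                && m.length() > 0) {
                assert(m.position() == 0); // match_continuous anchors at pos
                out += "<" + rule.name + ">" + m.str() + "</" + rule.name + ">";
                pos += m.length();
                matched = true;
                break;
            }
        }
        if (!matched)
            out += *pos++;
    }
    return out;
}
```

The `(?!\w)` lookahead is what keeps `auto_ptr` from being split into a keyword plus a tail: the keyword rule fails there and the identifier rule takes the whole word. Trying rules in order at each position plays the role of the `program` start rule's alternation.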

Stuff to decide:
1) What if the regex defines marks, groups submatches and so on: should
every submatch become a parameter to the template? Shall we then omit the
complete match from the parameter list? Or shall we always submit the
complete match first, then the first to nth submatch, as parameters?
2) Should we implement a kind of binder syntax, like in boost.bind, for the
various matches? That way we would add a kind of substitution-like functionality.
rule "\(#\sdefine \)[NOT\n]*" [extendedn_preproc_markup _1.. macro contents are secrets]

So "#define PI 3.14159.." would turn into a highlighted:
"#define macro contents are secrets"
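
Both open questions can be sketched together, again with std::regex standing in for xpressive. The names `template_arguments`, `bind_placeholders` and `hide_macro_body` are hypothetical, and the numbering is an assumption: the complete match is passed first and addressed as `_0`, so `_1` is the first submatch, as in the example above. The rule's regex is likewise a guess at the intended pattern.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Question 1, one possible answer: complete match first, then submatches 1..n.
std::vector<std::string> template_arguments(const std::smatch& m)
{
    assert(!m.empty()); // m must hold a successful match
    std::vector<std::string> args;
    for (std::size_t i = 0; i < m.size(); ++i)
        args.push_back(m[i].str());
    return args;
}

// Question 2: replace the placeholders _0 .. _9 in `tmpl` with the matching
// argument; placeholders without an argument expand to nothing.
std::string bind_placeholders(const std::string& tmpl,
                              const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] == '_' && i + 1 < tmpl.size()
            && std::isdigit(static_cast<unsigned char>(tmpl[i + 1]))) {
            const std::size_t n = static_cast<std::size_t>(tmpl[i + 1] - '0');
            if (n < args.size())
                out += args[n];
            ++i; // skip the digit
        } else {
            out += tmpl[i];
        }
    }
    return out;
}

// The "#define" example: keep the directive, hide the macro body.
std::string hide_macro_body(const std::string& line)
{
    static const std::regex rule(R"((#\s*define )[^\n]*)");
    std::smatch m;
    if (!std::regex_match(line, m, rule))
        return line;
    return bind_placeholders("_1macro contents are secrets",
                             template_arguments(m));
}
```

Passing the complete match as argument 0 keeps both options available to a template: one that wants the whole text uses `_0`, and one that wants only parts uses `_1` onwards.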

Development currently takes place at:
http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/
