Boost C++ Libraries: Ticket #1201: Regexify the syntax highlighter https://svn.boost.org/trac10/ticket/1201 <p> What I'd really want is to regex-ify the syntax highlighter and make it reconfigurable through user-supplied regex strings read from configuration files. This would simplify our life a lot. There would be only one syntax highlighter grammar, plus code that can accept various lexer files. Our job then is just to churn out lexer files for different languages. </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/1201 Trac 1.4.3 Andreas Pokorny Sun, 26 Aug 2007 09:31:36 GMT owner, status changed; keywords set https://svn.boost.org/trac10/ticket/1201#comment:1 https://svn.boost.org/trac10/ticket/1201#comment:1 <ul> <li><strong>keywords</strong> documentat ibd boost-doc regular expression lexer regex xpressive quickbook lexer syntax highlighting added </li> <li><strong>owner</strong> changed from <span class="trac-author">Joel de Guzman</span> to <span class="trac-author">Andreas Pokorny</span> </li> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> <p> Syntax idea so far: [sourcemode &lt;NAME_OF_MODE&gt; &lt;' '-separated_LIST_OF_MODES&gt; &lt;rulename&gt; [REGEX] &lt;OPTIONAL_TEMPLATE_TO_INVOKE_WITH_MATCH&gt; ] </p> <p> The first line defines the start rule. Every occurrence of a rulename inside the regular expressions will be treated as a reference to the regex attached to that rulename. Hence recursion is possible. </p> <p> Because [] are very common regex characters, we might switch to: &lt;rulename&gt; "REGEX" &lt;TEMPLATE_TO_INVOKE_WITH_MATCH&gt; </p> <p> We will probably use xpressive, because it already allows recursion and has a parser for strings. We would prefer Spirit if there were a "dynamic" Spirit, since EBNF grammars with operator- and eps_p are easier to use than lookahead or lookbehind assertions. 
</p> <p> The ' '-separated list of modes should allow reusing existing source mode definitions. We might prefix rules of imported regex... </p> <p> An untested and incomplete C++ grammar could look like this: [sourcemode cpp program "(comment|preprocessor|keyword|identifier|special|string|char|number|.)*" comment "(//[^\n]*|/\*.*?\*/)" add_comment_markup preprocessor "#\s[^\n]*" add_preproc_markup keyword "(auto|bool|char|...)(?!\w)" add_keyword_markup keyword "(auto|and|and_eq|bool|char|...)(?!\w)" add_keyword_markup special "[\~!%^&amp;\*()+={\[}\]:;,&lt;\.&gt;?/\|\-]+" add_special_markup string "[lL]?\"([^\"]|\")*?\"" add_string_markup char "[lL]?'([^']?)'" add_char_markup number ..... ] </p> <p> Stuff to decide: 1) If the regex defines marks, groups submatches, and so on, should every submatch become a parameter to the template? Should we then omit the complete match from the parameter list? Or should we always submit the complete match first, followed by the first to nth submatch, as parameters? 2) Should we implement a kind of binder syntax like in Boost.Bind for the various matches? That way we would add a kind of substitution-like functionality. rule "\(#\sdefine \)[^\n]*" [extendedn_preproc_markup _1.. macro contents are secrets] </p> <p> So "#define PI 3.14126.." 
would turn into a highlighted: "#define macro contents are secrets" </p> <p> Development currently takes place at: <a class="ext-link" href="http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/"><span class="icon">​</span>http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/</a> </p> Ticket Andreas Pokorny Sun, 26 Aug 2007 09:37:26 GMT <link>https://svn.boost.org/trac10/ticket/1201#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1201#comment:2</guid> <description> <p> Because the above is so terribly formatted:<br /> </p> <p> Syntax idea so far:<br /> [sourcemode &lt;NAME_OF_MODE&gt; &lt;' '-separated_LIST_OF_MODES&gt;<br /> &lt;rulename&gt; [REGEX] &lt;OPTIONAL_TEMPLATE_TO_INVOKE_WITH_MATCH&gt;<br /> ]<br /> <br /> The first line defines the start rule. Every occurrence of a <br /> rulename inside the regular expressions will be treated as a <br /> reference to the regex attached to that rulename. Hence<br /> recursion is possible. <br /> <br /> Because [] are very common regex characters, we might switch to:<br /> &lt;rulename&gt; "REGEX" &lt;TEMPLATE_TO_INVOKE_WITH_MATCH&gt;<br /> <br /> We will probably use xpressive, because it already allows recursion<br /> and has a parser for strings. We would prefer Spirit if there were a<br /> "dynamic" Spirit, since EBNF grammars with operator- and eps_p are easier to use <br /> than lookahead or lookbehind assertions.<br /> <br /> The ' '-separated list of modes should allow reusing existing source <br /> mode definitions. We might prefix rules of imported regex... 
<br /> </p> <p> An untested and incomplete C++ grammar could look like this:<br /> [sourcemode cpp <br /> program "(comment|preprocessor|keyword|identifier|special|string|char|number|.)*"<br /> comment "(//[^\n]*|/\*.*?\*/)" add_comment_markup<br /> preprocessor "#\s[^\n]*" add_preproc_markup<br /> keyword "(auto|bool|char|...)(?!\w)" add_keyword_markup<br /> keyword "(auto|and|and_eq|bool|char|...)(?!\w)" add_keyword_markup<br /> special "[\~!%^&amp;\*()+={\[}\]:;,&lt;\.&gt;?/\|\-]+" add_special_markup<br /> string "[lL]?\"([^\"]|\")*?\"" add_string_markup<br /> char "[lL]?'([^']?)'" add_char_markup<br /> number .....<br /> ]<br /> <br /> Stuff to decide:<br /> 1) If the regex defines marks, groups submatches, and so on, should <br /> every submatch become a parameter to the template? Should we then omit the <br /> complete match from the parameter list? Or should we always submit the<br /> complete match first, followed by the first to nth submatch, as parameters?<br /> 2) Should we implement a kind of binder syntax like in Boost.Bind for the <br /> various matches? That way we would add a kind of substitution-like functionality.<br /> rule "\(#\sdefine \)[^\n]*" [extendedn_preproc_markup _1.. macro contents are secrets]<br /><br /> </p> <p> So "#define PI 3.14126.." would turn into a highlighted:<br /> "#define macro contents are secrets"<br /><br /> </p> <p> Development currently takes place at:<br /> <a class="ext-link" href="http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/"><span class="icon">​</span>http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/</a> </p> </description> <category>Ticket</category> </item> </channel> </rss>