Boost C++ Libraries: Ticket #12076: A couple issues matching with unicode regular expressions (word delimiters, brackets) https://svn.boost.org/trac10/ticket/12076 <p> Hi, </p> <p> The <a class="ext-link" href="https://github.com/mawww/kakoune/"><span class="icon">​</span>kakoune</a> code editor uses Boost.Regex to search through a file using a regular expression, and I've stumbled upon some issues that I think are related to how Boost handles Unicode code points. </p> <p> The syntax used is the Perl one. </p> <p> First, the <code>\b</code> word delimiter doesn't seem to work when Unicode characters are involved: some strings that should be matched are not, e.g. "abc” 123" with the pattern "”\b". </p> <p> Secondly, using the "." pattern on strings that contain Unicode characters seems to select bytes rather than entire code points, e.g. "”" with the pattern "." will select two bytes. </p> <p> Finally, using brackets around Unicode characters does not work, for example "[”“]". This issue is probably related to the one above. </p> <p> I have had a look at the documentation, namely the <a href="http://www.boost.org/doc/libs/1_60_0/libs/regex/doc/html/boost_regex/unicode.html">Unicode &amp; boost.regex</a> / <a href="http://www.boost.org/doc/libs/1_60_0/libs/regex/doc/html/boost_regex/syntax/character_classes/optional_char_class_names.html">Characters classes supported by Unicode regular expressions</a> pages, but I'm not sure whether they are related to the issues above (please let me know if I missed something). </p> <p> Thanks. </p> <item> <dc:creator>anonymous</dc:creator> <pubDate>Sat, 19 Mar 2016 18:14:34 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12076#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12076#comment:1</guid> <description> <p> Can you please post a self-contained test case so I can see exactly which code you're using? 
</p> <p> Also "”\b" against "abc” 123" should not match since there is no word boundary *after* the ” character. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Wed, 23 Mar 2016 10:49:16 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12076#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12076#comment:2</guid> <description> <p> I think this issue is no longer relevant: Boost's behavior is to be expected in the examples I gave, and I just need to use ICU to get what I want. </p> <p> Thanks, closing this now. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Wed, 23 Mar 2016 10:50:47 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12076#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12076#comment:3</guid> <description> <p> Actually, I can't close this issue, probably because I'm not logged in. Please close it. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>John Maddock</dc:creator> <pubDate>Wed, 23 Mar 2016 19:32:45 GMT</pubDate> <title>status changed; resolution set</title> <link>https://svn.boost.org/trac10/ticket/12076#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12076#comment:4</guid> <description> <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">invalid</span> </li> </ul> </description> <category>Ticket</category> </item>