id summary reporter owner description type status milestone component version severity resolution keywords cc
12076 A couple of issues matching with Unicode regular expressions (word delimiters, brackets) anonymous John Maddock "Hi,

The [https://github.com/mawww/kakoune/ kakoune] code editor uses Boost.Regex to search through a file with a regular expression, and I've stumbled upon some issues which I think are related to how Boost handles Unicode codepoints. The syntax used is the Perl one.

First, the `\b` word delimiter doesn't seem to work when Unicode characters are involved: some strings that should match do not, e.g. ""abc” 123"" with the pattern ""”\b"".

Secondly, using the ""."" pattern on strings that contain Unicode characters seems to select bytes rather than entire codepoints, e.g. ""”"" with the pattern ""."" will select two bytes.

Finally, using brackets around Unicode characters does not work, for example ""[”“]"". This issue is probably related to the one above.

I have had a look at the documentation, namely the [http://www.boost.org/doc/libs/1_60_0/libs/regex/doc/html/boost_regex/unicode.html Unicode & boost.regex] / [http://www.boost.org/doc/libs/1_60_0/libs/regex/doc/html/boost_regex/syntax/character_classes/optional_char_class_names.html Character classes supported by Unicode regular expressions] pages, but I'm not sure whether they are related to the issues above (please let me know if I missed something).

Thanks." Bugs closed To Be Determined regex Boost 1.61.0 Problem invalid
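
The second and third symptoms in the description (""."" selecting bytes, and a bracket set failing on non-ASCII characters) are what one would expect from any regex engine that is handed UTF-8 text as a sequence of bytes. The sketch below illustrates that distinction with Python's standard `re` module rather than Boost.Regex, so it is only an analogy: it does not reproduce kakoune's code path, and the variable names are made up for the illustration.

```python
import re

text = "\u201d"                # the right double quotation mark from the report
utf8 = text.encode("utf-8")    # the same character as raw UTF-8 bytes

# Byte-oriented matching: "." consumes only the first byte of the character.
byte_match = re.match(rb".", utf8)

# Code-point-oriented matching: "." consumes the whole character.
char_match = re.match(r".", text)

# A bracket set spelled in UTF-8 bytes is a set of individual bytes,
# so it also matches one byte at a time instead of one quotation mark.
byte_set = re.compile(b"[" + "\u201d\u201c".encode("utf-8") + b"]")
char_set = re.compile("[\u201d\u201c]")

print(len(byte_match.group()), "byte vs", len(utf8), "bytes in the character")
print(byte_set.match(utf8).group(), char_set.match(text).group())
```

If the same distinction applies to the Boost.Regex usage in question, matching on code points rather than on `char` bytes would be the direction to look at; whether that is the cause here is an assumption, not something the ticket states.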