Boost C++ Libraries: Ticket #7744: make_u32regex() performs insufficient UTF-8 validation
https://svn.boost.org/trac10/ticket/7744

<p>
The program below segfaults for the regular expression ".*\xf6.*". As far as I know, the maximum value allowed as the leading byte of a 4-byte sequence is 0xF4. I would expect an exception.
</p>
<p>
The regular expression ".*\xe4.*" is created without an exception. However, 0xE4 starts a 3-byte character and no trailing bytes are present. I would expect an exception here too.
</p>
<p>
We use Boost 1.52.0 together with ICU 50.1. The behavior is the same on Linux and Windows.
</p>
<pre class="wiki">#include &lt;boost/regex/icu.hpp&gt;

int main(void)
{
    // this line does not throw an exception although this is not valid UTF-8
    boost::u32regex(boost::make_u32regex(".*\xe4.*"));

    // this line segfaults
    boost::u32regex(boost::make_u32regex(".*\xf6.*"));

    return 0;
}
</pre>

John Maddock, Wed, 28 Nov 2012 18:28:40 GMT: status changed; resolution set
https://svn.boost.org/trac10/ticket/7744#comment:1

<ul>
<li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span></li>
<li><strong>resolution</strong> → <span class="trac-field-new">fixed</span></li>
</ul>
<p>
Fixed in Trunk rev <a class="missing ticket">#81614</a>
</p>
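<p>
For readers on a release that predates the fix, the checks the report asks for can be approximated on the caller's side. The following is only a minimal sketch, assuming the well-formedness rules of RFC 3629; <code>is_valid_utf8</code> is a hypothetical helper, not part of Boost.Regex, and it is not the change made in the trunk revision above. It rejects leading bytes above 0xF4, truncated sequences, overlong encodings, and surrogates.
</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (NOT part of Boost.Regex): returns true if `s` is
// well-formed UTF-8 according to RFC 3629.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead <= 0x7F) {            // single-byte (ASCII) character
            ++i;
            continue;
        }

        std::size_t trailing = 0;      // number of continuation bytes expected
        unsigned char lo = 0x80;       // allowed range of the FIRST continuation byte
        unsigned char hi = 0xBF;

        if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2;
            if (lead == 0xE0) lo = 0xA0;        // rejects overlong 3-byte forms
            else if (lead == 0xED) hi = 0x9F;   // rejects UTF-16 surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3;
            if (lead == 0xF0) lo = 0x90;        // rejects overlong 4-byte forms
            else if (lead == 0xF4) hi = 0x8F;   // rejects values above U+10FFFF
        } else {
            return false;              // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }

        if (n - i <= trailing) {
            return false;              // input ends before the sequence is complete
        }
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < lo || c > hi) {
                return false;          // missing or out-of-range continuation byte
            }
            lo = 0x80;                 // later continuation bytes use the full range
            hi = 0xBF;
        }
        i += trailing + 1;
    }
    return true;
}
```

<p>
A caller could run this check on the pattern string before passing it to <code>boost::make_u32regex</code> and throw, for example, <code>std::invalid_argument</code> when it returns false. Both patterns from the report fail the check: 0xE4 is followed by '.' rather than a continuation byte, and 0xF6 is not a valid leading byte.
</p>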