Boost C++ Libraries: Ticket #7606: u32regex causes bus error https://svn.boost.org/trac10/ticket/7606 <p> The Unicode regular expression </p> <blockquote> <p> boost::make_u32regex ("[pq]\\.\\.[xy]"); </p> </blockquote> <p> causes a Bus Error. The same r.e. as a boost::regex is fine. </p> <p> The r.e. "[pq][.][.][xy]" seems to be okay, so it looks like the repeated "\\." is at least part of the problem. </p> <p> The following program produces a Bus Error on Solaris 10 with gcc 4.6.1 and Boost 1.51 (and with all previous versions of Boost, as far as I can tell). </p> <hr /> <pre class="wiki">#include &lt;boost/regex/icu.hpp&gt;

int main (int, char**) {
    const boost::u32regex re = boost::make_u32regex ("[pq]\\.\\.[xy]");
    return 0;
}
</pre> <hr /> a.sanders@… Tue, 30 Oct 2012 11:46:24 GMT attachment set https://svn.boost.org/trac10/ticket/7606 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">regex.cpp</span> </li> </ul> anonymous Tue, 30 Oct 2012 11:47:08 GMT <link>https://svn.boost.org/trac10/ticket/7606#comment:1 </link> <description> <p> Wiki formatting made a mess of the quoted program, so I have attached it.
</p> </description> </item> <item> <author>a.sanders@…</author> <pubDate>Thu, 08 Nov 2012 10:04:05 GMT</pubDate> <title>attachment set</title> https://svn.boost.org/trac10/ticket/7606 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">regex-crash.cpp</span> </li> </ul> <p> Causes Bus Error when run </p> a.sanders@… Thu, 08 Nov 2012 10:06:26 GMT <link>https://svn.boost.org/trac10/ticket/7606#comment:2 </link> <description> <p> I just noticed that I attached the wrong version of the file before (a version that didn't cause a Bus Error), so I have now attached the version that does. At least you now have both versions. </p> <p> Apologies for any confusion. </p> </description> </item> <item> <dc:creator>John Maddock</dc:creator> <pubDate>Wed, 28 Nov 2012 13:32:51 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:3 </link> <description> <p> I'm unable to reproduce this on either Win32 or Ubuntu Linux with current SVN trunk and ICU 49. Are you able to debug locally? </p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Wed, 28 Nov 2012 13:58:54 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:4 </link> <description> <p> Probably. What are you after? A stack trace? Anything else?
</p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Wed, 28 Nov 2012 14:20:52 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:5 </link> <description> <p> I built a debug version of libboost_regex and will attach a stack trace from gdb. If you'd like me to investigate further, please give me some pointers as to what to look for! </p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Wed, 28 Nov 2012 14:24:15 GMT</pubDate> <title>attachment set</title> https://svn.boost.org/trac10/ticket/7606 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">regex gdb trace.txt</span> </li> </ul> <p> gdb stack trace </p> John Maddock Wed, 28 Nov 2012 18:22:41 GMT <link>https://svn.boost.org/trac10/ticket/7606#comment:6 </link> <description> <p> Thanks for the stack trace; unfortunately, it makes even less sense now :-( </p> <p> Can you set a breakpoint (basic_regex.hpp:399) inside: </p> <pre class="wiki">template &lt;class InputIterator&gt;
basic_regex(InputIterator arg_first, InputIterator arg_last, flag_type f = regex_constants::normal)
{
   typedef typename traits::string_type seq_type;
   seq_type a(arg_first, arg_last);
   if(a.size())
      assign(static_cast&lt;const charT*&gt;(&amp;*a.begin()), static_cast&lt;const charT*&gt;(&amp;*a.begin() + a.size()), f);
   else
      assign(static_cast&lt;const charT*&gt;(0), static_cast&lt;const charT*&gt;(0), f);
}
</pre><p> What are the contents of "a" after construction? </p> <p> Is there any chance that your code is compiled using a compiler code page that results in the input string not being valid ASCII/UTF-8? </p> <p> Thanks, John.
</p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Fri, 30 Nov 2012 16:14:34 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:7 </link> <description> <p> Here you are. I'm not sure this is helpful. The "if" statement following the construction of "a" took the true branch. </p> <p> I'm not setting any compiler code page anywhere. </p> <pre class="wiki">Breakpoint 2 at 0x15738: file /export/home/ashley/src/boost_1_51_0/boost/regex/v4/basic_regex.hpp, line 399.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /export/home/ashley/tmp/regex
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Breakpoint 2, boost::basic_regex&lt;int, boost::icu_regex_traits&gt;::basic_regex&lt;boost::u8_to_u32_iterator&lt;char const*, int&gt; &gt; (
    this=0xffbffab0, arg_first=..., arg_last=..., f=0)
    at /export/home/ashley/src/boost_1_51_0/boost/regex/v4/basic_regex.hpp:400
400        {
(gdb) list
395           assign(p, f);
396        }
397
398        template &lt;class InputIterator&gt;
399        basic_regex(InputIterator arg_first, InputIterator arg_last, flag_type f = regex_constants::normal)
400        {
401           typedef typename traits::string_type seq_type;
402           seq_type a(arg_first, arg_last);
403           if(a.size())
404              assign(static_cast&lt;const charT*&gt;(&amp;*a.begin()), static_cast&lt;const charT*&gt;(&amp;*a.begin() + a.size()), f);
(gdb) n
402           seq_type a(arg_first, arg_last);
(gdb) n
403           if(a.size())
(gdb) print a
$1 = {&lt;std::_Vector_base&lt;int, std::allocator&lt;int&gt; &gt;&gt; = {
    _M_impl = {&lt;std::allocator&lt;int&gt;&gt; = {&lt;__gnu_cxx::new_allocator&lt;int&gt;&gt; = {&lt;No data fields&gt;}, &lt;No data fields&gt;}, _M_start = 0x27cd0, _M_finish = 0x27d00, _M_end_of_storage = 0x27d00}}, &lt;No data fields&gt;}
</pre> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Fri, 30 Nov 2012 16:19:08 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:8 </link> <description> <p> What I meant to say was that "if(a.size())" evaluated to true. </p> </description> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Fri, 30 Nov 2012 16:39:29 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:9 </link> <description> <p> You're right, it doesn't really help :-( </p> <p> Try: </p> <pre class="wiki">#include &lt;boost/regex/icu.hpp&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

// Print each element of a container as a hex value.
template &lt;class C&gt;
void printout(const C&amp; c)
{
   for(unsigned i = 0; i &lt; c.size(); ++i)
      std::cout &lt;&lt; std::hex &lt;&lt; (int)c[i] &lt;&lt; " ";
   std::cout &lt;&lt; std::endl;
}

int main()
{
   using namespace boost;
   typedef u32regex::traits_type::string_type st;
   typedef boost::u8_to_u32_iterator&lt;std::string::const_iterator, UChar32&gt; conv_type;
   const std::string p = "[pq]\\.\\.[xy]";
   // Decode the UTF-8 pattern into UTF-32 code points.
   st t(conv_type(p.begin(), p.begin(), p.end()), conv_type(p.end(), p.begin(), p.end()));
   printout(p);  // the original bytes
   printout(t);  // the converted code points
   return 0;
}
</pre><p> Which should output: </p> <pre class="wiki">5b 70 71 5d 5c 2e 5c 2e 5b 78 79 5d
5b 70 71 5d 5c 2e 5c 2e 5b 78 79 5d
</pre><p> Thanks! John.
</p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Fri, 30 Nov 2012 16:48:40 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:10 </link> <description> <p> It does indeed output </p> <pre class="wiki">5b 70 71 5d 5c 2e 5c 2e 5b 78 79 5d
5b 70 71 5d 5c 2e 5c 2e 5b 78 79 5d
</pre> </description> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Fri, 30 Nov 2012 17:30:52 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:11 </link> <description> <p> Which leaves me stumped again... What does Valgrind say? </p> </description> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Fri, 30 Nov 2012 17:56:29 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:12 </link> <description> <p> Some supplementary questions before we get too involved: </p> <p> 1) Is </p> <pre class="wiki">regex e("[pq]\\.\\.[xy]");
</pre><p> OK? </p> <p> 2) Is: </p> <pre class="wiki">wregex we(L"[pq]\\.\\.[xy]");
</pre><p> OK? </p> <p> 3) Are there multiple versions of ICU installed on this system? Any chance there's a mismatch between the headers included and the libraries loaded, or between the version used when Boost was built and the one used by the test program? </p> <p> 4) Do the regex tests run OK? To run them, cd into libs/regex/test and run "bjam toolset=sun". Assuming ICU is installed in the usual location, you should see a message at the start saying that it's being used/tested. </p> <p> Thanks again! John.
</p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Wed, 12 Dec 2012 11:32:47 GMT</pubDate> <link>https://svn.boost.org/trac10/ticket/7606#comment:13 </link> <description> <p> Apologies for the delay in doing the stuff you asked for. Life and work get in the way... </p> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/7606#comment:12" title="Comment 12">anonymous</a>: </p> <blockquote class="citation"> <p> Some supplementary questions before we get too involved: </p> <p> 1) Is </p> <pre class="wiki">regex e("[pq]\\.\\.[xy]");
</pre></blockquote> <p> Compiles and runs okay. </p> <blockquote class="citation"> <p> 2) Is: </p> <pre class="wiki">wregex we(L"[pq]\\.\\.[xy]");
</pre></blockquote> <p> Also compiles and runs okay. </p> <blockquote class="citation"> <p> 3) Are there multiple versions of ICU installed on this system? Any chance there's a mismatch between the headers included and the libraries loaded, or between the version used when Boost was built and the one used by the test program? </p> </blockquote> <p> There are multiple versions of the .so files, but as far as I can tell there is only one set of header files. I don't think this should be a problem. </p> <blockquote class="citation"> <p> 4) Do the regex tests run OK? To run them, cd into libs/regex/test and run "bjam toolset=sun". Assuming ICU is installed in the usual location, you should see a message at the start saying that it's being used/tested. </p> </blockquote> <p> I'm compiling with gcc, so I ran "bjam toolset=gcc". There were errors, but they are a bit hard to spot among the warnings and errors the compiler spat out, so I'll attach the output separately as two files.
The first is the output from the first bjam run (rather a large file); the second is from running bjam again, which produces less output and hopefully makes it easier to spot the bus error in one of the tests. </p> <p> Ashley. </p> </description> </item> <item> <author>Ashley Sanders &lt;a.sanders@…&gt;</author> <pubDate>Wed, 12 Dec 2012 11:36:24 GMT</pubDate> <title>attachment set</title> https://svn.boost.org/trac10/ticket/7606 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">regex-gcc-test.txt.bz2</span> </li> </ul> <p> Output from first run of bjam toolset=gcc </p> Ashley Sanders &lt;a.sanders@…&gt; Wed, 12 Dec 2012 11:37:06 GMT attachment set https://svn.boost.org/trac10/ticket/7606 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">regex-gcc-test-2.txt</span> </li> </ul> <p> Output from second run of bjam toolset=gcc </p> John Maddock Thu, 20 Dec 2012 16:50:53 GMT <link>https://svn.boost.org/trac10/ticket/7606#comment:14 </link> <description> <p> My turn to apologize for the delay; I blame Christmas! :) </p> <p> Thanks for running the tests. They show the same issue you reported in your test case, though I don't understand why it would work for wregex but not for u32regex :( </p> <p> There must be some memory corruption or overrun going on, but it's going to be hard to diagnose by email! Is Valgrind available for that platform? If so, its output might help a lot; otherwise, I guess I'll have to write a special instrumented version for you to test with. </p> <p> Thanks, John. </p> </description> </item> </channel> </rss>