Boost C++ Libraries: Ticket #12818: regex: badly needs fuzzing https://svn.boost.org/trac10/ticket/12818 <p> Hello, </p>
<p> I've applied libFuzzer (<a class="ext-link" href="http://tutorial.libfuzzer.info"><span class="icon">​</span>http://tutorial.libfuzzer.info</a>) to the regex library and found 5 heap-buffer-overflows, a stack overflow, an assertion failure, a use of uninitialized data, a SIGSEGV, an infinite loop, an undefined shift, an invalid enum value, and a bunch of memory leaks in just half an hour: </p>
<p> SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:132:10 in char const* boost::re_detail_106300::re_skip_past_null&lt;char&gt;(char const*) </p>
<p> SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:221:29 in __gnu_cxx::__normal_iterator&lt;char const*, std::string&gt; boost::re_detail_106300::re_is_set_member&lt;__gnu_cxx::__normal_iterator&lt;char const*, std::string&gt;, char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt;, unsigned int&gt;(__gnu_cxx::__normal_iterator&lt;char const*, std::string&gt;, __gnu_cxx::__normal_iterator&lt;char const*, std::string&gt;, boost::re_detail_106300::re_set_long&lt;unsigned int&gt; const*, boost::re_detail_106300::regex_data&lt;char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt; &gt; const&amp;, bool) </p>
<p> SUMMARY: AddressSanitizer: heap-buffer-overflow /sanitizer_common_interceptors.inc:278 in __interceptor_strlen </p>
<p> SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:166:19 in __gnu_cxx::__normal_iterator&lt;char const*, std::string&gt; boost::re_detail_106300::re_is_set_member&lt;__gnu_cxx::__normal_iterator&lt;char const*, std::string&gt;, char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt;, unsigned int&gt;(__gnu_cxx::__normal_iterator&lt;char const*, std::string&gt;, __gnu_cxx::__normal_iterator&lt;char const*, std::string&gt;, boost::re_detail_106300::re_set_long&lt;unsigned int&gt; const*, boost::re_detail_106300::regex_data&lt;char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt; &gt; const&amp;, bool) </p>
<p> a.out: boost/regex/v4/perl_matcher_common.hpp:606: bool boost::re_detail_106300::perl_matcher&lt;__gnu_cxx::__normal_iterator&lt;const char *, std::basic_string&lt;char&gt; &gt;, std::allocator&lt;boost::sub_match&lt;__gnu_cxx::__normal_iterator&lt;const char *, std::basic_string&lt;char&gt; &gt; &gt; &gt;, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt; &gt;::match_backref() [BidiIterator = __gnu_cxx::__normal_iterator&lt;const char *, std::basic_string&lt;char&gt; &gt;, Allocator = std::allocator&lt;boost::sub_match&lt;__gnu_cxx::__normal_iterator&lt;const char *, std::basic_string&lt;char&gt; &gt; &gt; &gt;, traits = boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt;]: Assertion `r.first != r.second' failed. 
</p>
<p> SUMMARY: MemorySanitizer: use-of-uninitialized-value boost/regex/v4/perl_matcher.hpp:166:13 in std::__1::__wrap_iter&lt;char const*&gt; boost::re_detail_106300::re_is_set_member&lt;std::__1::__wrap_iter&lt;char const*&gt;, char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt;, unsigned int&gt;(std::__1::__wrap_iter&lt;char const*&gt;, std::__1::__wrap_iter&lt;char const*&gt;, boost::re_detail_106300::re_set_long&lt;unsigned int&gt; const*, boost::re_detail_106300::regex_data&lt;char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt; &gt; const&amp;, bool) </p>
<p> SUMMARY: AddressSanitizer: heap-buffer-overflow ./boost/regex/v4/basic_regex_parser.hpp:2599:68 in boost::re_detail_106300::basic_regex_parser&lt;char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt; &gt;::parse_perl_extension() </p>
<p> boost/regex/v4/basic_regex_parser.hpp:2599:68: runtime error: load of value 56794092, which is not a valid value for type 'boost::re_detail_106300::syntax_element_type' </p>
<p> Direct leak of 4096 byte(s) in 1 object(s) allocated from: </p>
<p> SUMMARY: AddressSanitizer: stack-overflow ./boost/regex/v4/basic_regex_creator.hpp:1054 in boost::re_detail_106300::basic_regex_creator&lt;char, boost::regex_traits&lt;char, boost::cpp_regex_traits&lt;char&gt; &gt; &gt;::create_startmap(boost::re_detail_106300::re_syntax_base*, unsigned char*, unsigned int*, unsigned char) </p>
<p> SUMMARY: AddressSanitizer: SEGV </p>
<p> ALARM: working on the last Unit for 17 seconds </p>
<p> boost/regex/v4/basic_regex_parser.hpp:904:49: runtime error: shift exponent 325804978 is too large for 32-bit type 'unsigned int' </p>
<p> Full reports and triggering inputs for each bug are attached. 
</p>
<p> The test that I used is simply: </p>
<pre class="wiki">#include &lt;boost/regex.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;exception&gt;
#include &lt;string&gt;

// libFuzzer looks up this entry point by its C name, hence extern "C".
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  try {
    // The same fuzzer input serves as both the pattern and the subject string.
    std::string str(reinterpret_cast&lt;const char *&gt;(Data), Size);
    boost::regex e(str);
    boost::match_results&lt;std::string::const_iterator&gt; what;
    boost::regex_match(str, what, e, boost::match_default | boost::match_partial);
  } catch (const std::exception&amp;) {}
  return 0;
}
</pre>
<p> I would suggest rerunning the fuzzer after fixing these bugs, as the fuzzer was mostly choking on the existing bugs, which are easy to trigger. </p>
<p> It may also make sense to set up continuous fuzzing using <a class="ext-link" href="https://github.com/google/oss-fuzz"><span class="icon">​</span>https://github.com/google/oss-fuzz</a>, which will automatically test the latest code. </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/12818 Trac 1.4.3 anonymous Tue, 07 Feb 2017 18:59:57 GMT attachment set https://svn.boost.org/trac10/ticket/12818 https://svn.boost.org/trac10/ticket/12818 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">regexp.crashes.zip</span> </li> </ul> Ticket anonymous Tue, 07 Feb 2017 19:39:51 GMT summary changed https://svn.boost.org/trac10/ticket/12818#comment:1 https://svn.boost.org/trac10/ticket/12818#comment:1 <ul> <li><strong>summary</strong> <span class="trac-field-old">regexp: badly needs fuzzing</span> → <span class="trac-field-new">regex: badly needs fuzzing</span> </li> </ul> Ticket anonymous Thu, 09 Feb 2017 01:46:45 GMT <link>https://svn.boost.org/trac10/ticket/12818#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12818#comment:2</guid> <description> <p> I strongly suggest running static analysis over regex before trying to find bugs by fuzzing. 
Coverity is free for open source and IME works well to identify the source of vulnerabilities, rather than providing N crashers that exercise some M&lt;N bugs which need to be reverse-engineered with gdb. </p> </description> <category>Ticket</category> </item> <item> <author>choller@…</author> <pubDate>Thu, 09 Feb 2017 12:14:26 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12818#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12818#comment:3</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/12818#comment:2" title="Comment 2">anonymous</a>: </p> <blockquote class="citation"> <p> I strongly suggest running static analysis over regex before trying to find bugs by fuzzing. Coverity is free for open source and IME works well to identify the source of vulnerabilities, rather than providing N crashers that exercise some M&lt;N bugs which need to be reverse-engineered with gdb. </p> </blockquote> <p> As a security engineer, I strongly disagree. Fuzzing is far more valuable than static analysis (provided the fuzzer supplies testcases), because static analysis often produces lots of false positives and its results are often hardly actionable for developers (we have tried both with Firefox). The OP already attached a set of testcases; the only thing better than that would be patches with fixes. You should set up some fuzzing CI, as was already suggested; oss-fuzz even offers the resources for free. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>John Maddock</dc:creator> <pubDate>Fri, 24 Feb 2017 13:16:43 GMT</pubDate> <title>status, milestone changed; resolution set https://svn.boost.org/trac10/ticket/12818#comment:4 https://svn.boost.org/trac10/ticket/12818#comment:4 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> <li><strong>milestone</strong> <span class="trac-field-old">To Be Determined</span> → <span class="trac-field-new">Boost 1.64.0</span> </li> </ul> <p> Confirmed and fixed in develop: most of the issues you identified were duplicates of a couple of core issues, but further fuzzing with a dictionary revealed many more. Thanks for the heads up on this! </p> Ticket John Maddock Sat, 13 May 2017 15:56:07 GMT <link>https://svn.boost.org/trac10/ticket/12818#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12818#comment:5</guid> <description> <p> Since someone asked, and for future reference, most of the issues fixed here relate to parsing invalid regular expressions. The only runtime issues relate to recursive regular expressions: one infinite loop for some inputs, one memory leak, and one compiler/platform portability issue. The actual fixes are all labelled "de-fuzz" here: <a class="ext-link" href="https://github.com/boostorg/regex/commits/develop"><span class="icon">​</span>https://github.com/boostorg/regex/commits/develop</a> </p> </description> <category>Ticket</category> </item> <item> <dc:creator>kcc</dc:creator> <pubDate>Fri, 22 Sep 2017 04:06:29 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12818#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12818#comment:6</guid> <description> <p> I've just added boost regex to oss-fuzz (using the fuzz target from this report). 
<a class="ext-link" href="https://github.com/google/oss-fuzz/pull/851/files"><span class="icon">​</span>https://github.com/google/oss-fuzz/pull/851/files</a> </p> <p> If someone from boost wants to extend fuzzing to more parts of boost (or improve how regex is fuzzed) -- you are welcome! </p> </description> <category>Ticket</category> </item> <item> <dc:creator>kcc</dc:creator> <pubDate>Sun, 24 Sep 2017 02:07:49 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12818#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12818#comment:7</guid> <description> <p> There are at least 5 more bugs/crashes in the current trunk. Is someone interested in being automatically CC-ed to these and future bugs? </p> <pre class="wiki">☆ 3460 boost: Integer-overflow in boost::re_detail_NUMBER::basic_regex_parser&lt;char, boost::regex_traits&lt;char, boos
☆ 3464 boost: Integer-overflow in boost::re_detail_NUMBER::perl_matcher&lt;std::__1::__wrap_iter&lt;char const*&gt;, std::_
☆ 3469 boost: ASSERT: jmp-&gt;type == syntax_element_jump
☆ 3471 boost: Stack-overflow in boost::re_detail_NUMBER::basic_regex_parser&lt;char, boost::regex_traits&lt;char, boos
☆ 3472 boost: Stack-overflow in boost::re_detail_NUMBER::perl_matcher&lt;std::__1::__wrap_iter&lt;char const*&gt;, std::_
</pre> </description> <category>Ticket</category> </item> <item> <dc:creator>kcc</dc:creator> <pubDate>Sun, 24 Sep 2017 23:30:12 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12818#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12818#comment:8</guid> <description> <p> Two more: </p> <pre class="wiki">3478 boost: Stack-buffer-overflow in boost::re_detail_NUMBER::perl_matcher...
3479 boost: Null-dereference READ in boost::re_detail_NUMBER::basic_regex...
</pre> </description> <category>Ticket</category> </item> </channel> </rss>