Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#12818 closed Bugs (fixed)

regex: badly needs fuzzing

Reported by: dvyukov@… Owned by: John Maddock
Milestone: Boost 1.64.0 Component: regex
Version: Boost 1.63.0 Severity: Problem
Keywords: Cc: kcc@…, aizatsky@…, mmoroz@…

Description

Hello,

I've applied libFuzzer (http://tutorial.libfuzzer.info) to regexp library and found 5 heap-buffer-overflows, stack overflow, assert failure, use of uninitialized data, SIGSEGV, infinite loop, undefined shift, invalid enum value and a bunch of memory leaks in just half an hour:

SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:132:10 in char const* boost::re_detail_106300::re_skip_past_null<char>(char const*)

SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:221:29 in gnu_cxx::normal_iterator<char const*, std::string> boost::re_detail_106300::re_is_set_member<gnu_cxx::normal_iterator<char const*, std::string>, char, boost::regex_traits<char, boost::cpp_regex_traits<char> >, unsigned int>(gnu_cxx::normal_iterator<char const*, std::string>, gnu_cxx::normal_iterator<char const*, std::string>, boost::re_detail_106300::re_set_long<unsigned int> const*, boost::re_detail_106300::regex_data<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, bool)

SUMMARY: AddressSanitizer: heap-buffer-overflow /sanitizer_common_interceptors.inc:278 in interceptor_strlen

SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:166:19 in gnu_cxx::normal_iterator<char const*, std::string> boost::re_detail_106300::re_is_set_member<gnu_cxx::normal_iterator<char const*, std::string>, char, boost::regex_traits<char, boost::cpp_regex_traits<char> >, unsigned int>(gnu_cxx::normal_iterator<char const*, std::string>, gnu_cxx::normal_iterator<char const*, std::string>, boost::re_detail_106300::re_set_long<unsigned int> const*, boost::re_detail_106300::regex_data<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, bool)

a.out: boost/regex/v4/perl_matcher_common.hpp:606: bool boost::re_detail_106300::perl_matcher<gnu_cxx::normal_iterator<const char *, std::basic_string<char> >, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> > > >, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::match_backref() = __gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, Allocator = std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> > > >, traits = boost::regex_traits<char, boost::cpp_regex_traits<char> >: Assertion `r.first != r.second' failed.

SUMMARY: MemorySanitizer: use-of-uninitialized-value boost/regex/v4/perl_matcher.hpp:166:13 in std::1::wrap_iter<char const*> boost::re_detail_106300::re_is_set_member<std::__1::__wrap_iter<char const*>, char, boost::regex_traits<char, boost::cpp_regex_traits<char> >, unsigned int>(std::1::wrap_iter<char const*>, std::1::wrap_iter<char const*>, boost::re_detail_106300::re_set_long<unsigned int> const*, boost::re_detail_106300::regex_data<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, bool)

SUMMARY: AddressSanitizer: heap-buffer-overflow ./boost/regex/v4/basic_regex_parser.hpp:2599:68 in boost::re_detail_106300::basic_regex_parser<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::parse_perl_extension()

boost/regex/v4/basic_regex_parser.hpp:2599:68: runtime error: load of value 56794092, which is not a valid value for type 'boost::re_detail_106300::syntax_element_type'

Direct leak of 4096 byte(s) in 1 object(s) allocated from:

SUMMARY: AddressSanitizer: stack-overflow ./boost/regex/v4/basic_regex_creator.hpp:1054 in boost::re_detail_106300::basic_regex_creator<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::create_startmap(boost::re_detail_106300::re_syntax_base*, unsigned char*, unsigned int*, unsigned char)

SUMMARY: AddressSanitizer: SEGV

ALARM: working on the last Unit for 17 seconds

boost/regex/v4/basic_regex_parser.hpp:904:49: runtime error: shift exponent 325804978 is too large for 32-bit type 'unsigned int'

Full reports and triggering inputs for each bug are attached.

Test that I used is simply:

int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {

try {

std::string str((char*)Data, Size); boost::regex e(str); boost::match_results<std::string::const_iterator> what; boost::regex_match(str, what, e, boost::match_default | boost::match_partial);

} catch (const std::exception&) {} return 0;

}

I would suggest to rerun the fuzzer after fixing these bugs as fuzzer was mostly choking on the existing bugs as they are easy to trigger.

Also it can make sense to set up continuous fuzzing using https://github.com/google/oss-fuzz which will automatically test latest code.

Attachments (1)

regexp.crashes.zip (17.9 KB ) - added by anonymous 6 years ago.

Download all attachments as: .zip

Change History (9)

by anonymous, 6 years ago

Attachment: regexp.crashes.zip added

comment:1 by anonymous, 6 years ago

Summary: regexp: badly needs fuzzingregex: badly needs fuzzing

comment:2 by anonymous, 6 years ago

I strongly suggest running static analysis over regex before trying to find bugs by fuzzing. Coverity is free for open source and IME works well to identify the source of vulnerabilities, rather than providing N crashers that exercise some M<N bugs which need to be reverse-engineered with gdb.

in reply to:  2 comment:3 by choller@…, 6 years ago

Replying to anonymous:

I strongly suggest running static analysis over regex before trying to find bugs by fuzzing. Coverity is free for open source and IME works well to identify the source of vulnerabilities, rather than providing N crashers that exercise some M<N bugs which need to be reverse-engineered with gdb.

Working as a Security Engineer I strongly disagree. Fuzzing is far more valuable than static analysis (if testcases are provided by the fuzzer) because static analysis often causes lots of false positives and its results are often hardly actionable for developers (we have tried both with Firefox). The OP already attached a set of testcases, the only thing that is better than that is patches for fixes. You should setup some fuzzing CI as it was suggested already and oss-fuzz even offers the resources for free.

comment:4 by John Maddock, 6 years ago

Milestone: To Be DeterminedBoost 1.64.0
Resolution: fixed
Status: newclosed

Confirmed and fixed in develop: most of the issues you identified were duplicates of a couple of core issues, but further fuzzing with a dictionary revealed many more. Thanks for the heads up on this!

comment:5 by John Maddock, 5 years ago

Since someone asked, and for future reference, most of the issues fixed here relate to parsing invalid regular expressions. The only runtime issues relate to recursive regular expressions: one infinite loop for some inputs, one memory leak, and one compiler/platform portability issue. The actual fixes are all labelled "de-fuzz" here: https://github.com/boostorg/regex/commits/develop

comment:6 by kcc, 5 years ago

I've just added boost regex to oss-fuzz (using the fuzz target from this report). https://github.com/google/oss-fuzz/pull/851/files

If someone from boost wants to extend fuzzing to more parts of boost (or improve how regex is fuzzed) -- you are welcome!

comment:7 by kcc, 5 years ago

There are at least 5 more bugs/crashes in the current trunk. Is someone interested in being automatically CC-ed to these and future bugs?

 ☆	3460	boost: Integer-overflow in boost::re_detail_NUMBER::basic_regex_parser<char, boost::regex_traits<char, boos 
 ☆	3464	boost: Integer-overflow in boost::re_detail_NUMBER::perl_matcher<std::__1::__wrap_iter<char const*>, std::_ 
 ☆	3469	boost: ASSERT: jmp->type == syntax_element_jump 
 ☆	3471	boost: Stack-overflow in boost::re_detail_NUMBER::basic_regex_parser<char, boost::regex_traits<char, boos 
 ☆	3472	boost: Stack-overflow in boost::re_detail_NUMBER::perl_matcher<std::__1::__wrap_iter<char const*>, std::_ 

comment:8 by kcc, 5 years ago

two more:

3478      boost: Stack-buffer-overflow in boost::re_detail_NUMBER::perl_matcher...
3479      boost: Null-dereference READ in boost::re_detail_NUMBER::basic_regex...
Note: See TracTickets for help on using tickets.