Opened 13 years ago

Closed 13 years ago

#3299 closed Bugs (wontfix)

boost regex regex_search crash

Reported by: ufwt@… Owned by: John Maddock
Milestone: Boost 1.40.0 Component: regex
Version: Boost 1.37.0 Severity: Problem
Keywords: Cc:

Description

The input file tmp.txt is larger than 5 MB and contains only one line, made up of bbbbbbbb.

I use the regex "[a-z].*(xxxxx)" and call regex_search.

Boost then throws an exception.

Change History (10)

comment:1 by Steven Watanabe, 13 years ago

Component: None → regex
Owner: set to John Maddock

Can you provide minimal code and input that reproduces the problem? What exception is thrown exactly?

comment:2 by ufwt@…, 13 years ago

Crash example:

#include <boost/regex.hpp>
#include <string>

boost::regex re("[a-z].*(d_notexiststring)", boost::regex::perl | boost::regex::no_except);

// Build a long input: 'f', then 100 KB of 'a', then "dd".
std::string s = "f";
s.append(1024 * 100, 'a');
s.append("dd");
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;

std::string::const_iterator start, end;
start = s.begin();
end = s.end();
boost::regex_search(start, end, what, re, flags);

Exception:

uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >

  • Memory exhausted

It is raised at the check state_count > max_state_count.

comment:3 by John Maddock, 13 years ago

Status: new → assigned

The code doesn't crash: it throws an exception that it's documented to throw.

In this case it's because the number of states visited in the FSM has grown too large and it bails out to prevent an "eternal" match attempt.

Perl manages to optimise this particular expression, but if you make it trivially more complex, say '[a-z].*(xxxxx)|x', then it takes a very long time indeed to return from a match attempt on your string.

I'll leave this open for now though, as it looks like this special case can be optimised a little more.

John.
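To illustrate why the number of visited states grows so quickly, here is a minimal sketch of a naive backtracking search for the fixed pattern [a-z].*(literal). The names toy_search, SearchResult and budget are hypothetical and invented for this illustration. This is not how Boost.Regex is implemented; it only models the behaviour where every start position lets .* swallow the rest of the input and then gives characters back one at a time.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: whether the literal was found and how many
// positions the toy matcher visited.
struct SearchResult {
    bool found;
    std::size_t steps;
};

// Toy backtracking search for [a-z].*(literal); throws once more than
// `budget` positions have been visited (an assumed stand-in for a state limit).
SearchResult toy_search(const std::string& text, const std::string& literal,
                        std::size_t budget) {
    std::size_t steps = 0;
    auto tick = [&steps, budget]() {
        if (++steps > budget) {
            throw std::runtime_error("toy matcher: state budget exhausted");
        }
    };
    for (std::size_t start = 0; start < text.size(); ++start) {
        tick();
        if (text[start] < 'a' || text[start] > 'z') {
            continue;  // [a-z] does not match here, try the next start
        }
        // ".*" is greedy: take everything, then back off one character at a time.
        for (std::size_t pos = text.size(); pos > start; --pos) {
            tick();
            if (text.compare(pos, literal.size(), literal) == 0) {
                return SearchResult{true, steps};
            }
        }
    }
    return SearchResult{false, steps};
}
```

In this toy model, an n-character input of lowercase letters that does not contain the literal costs n + n(n+1)/2 visits, so a 100 KB string like the one in comment 2 would need roughly five billion steps unless a budget stops it first.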

comment:4 by ufwt@…, 13 years ago

So I use try/catch to keep the program from crashing!

comment:5 by ufwt@…, 13 years ago

I also changed re_repeat's max so that max - min < 1024. Is it OK to change this?

comment:6 by anonymous, 13 years ago

"i also change re_repeat's max let re_pepeat max - min < 1024 i can change this ? "

Sorry I don't understand what you are asking. Change what code precisely?

comment:7 by ufwt@…, 13 years ago

I changed re_repeat's max. Code:

void changeRegexRepeatNum(boost::regex& regex, size_t max) {
    // Walk the compiled state machine (re_detail is an internal namespace, not public API).
    boost::re_detail::re_syntax_base* state = regex.get_data().m_first_state;
    while (state) {
        switch (state->type) {
        case boost::re_detail::syntax_element_rep:
        case boost::re_detail::syntax_element_dot_rep:
        case boost::re_detail::syntax_element_char_rep:
        case boost::re_detail::syntax_element_short_set_rep:
        case boost::re_detail::syntax_element_long_set_rep: {
            boost::re_detail::re_repeat* repeat = static_cast<boost::re_detail::re_repeat*>(state);
            // Cap the repeat so it allows at most `max` repetitions beyond its minimum.
            if (repeat->max - repeat->min > max) {
                repeat->max = repeat->min + max;
            }
        } break;
        default:
            break;
        }
        state = state->next.p;
    }
}
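A similar cap can be written into the pattern itself with a bounded repeat, which avoids touching the re_detail internals. The sketch below only builds the pattern string; the helper name bounded_gap_pattern is hypothetical, and it assumes the literal contains no regex metacharacters, since no escaping is done.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds "[a-z].{0,max_gap}(literal)" so that the
// wildcard may span at most max_gap characters.
// Assumption: `literal` has no regex metacharacters (it is not escaped).
std::string bounded_gap_pattern(const std::string& literal, std::size_t max_gap) {
    return "[a-z].{0," + std::to_string(max_gap) + "}(" + literal + ")";
}
```

The resulting string, for example "[a-z].{0,1024}(xxxxx)", can be passed to the boost::regex constructor. This changes what the expression matches: a literal that starts more than max_gap characters after the leading letter is no longer found, and whether it actually speeds up a given search has not been measured here.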

comment:8 by anonymous, 13 years ago

Sure you can do that, but I'm not sure why you would want to?

John.

comment:9 by ufwt@…, 13 years ago

Because my text is very long, and doing this makes regex_search faster!

comment:10 by John Maddock, 13 years ago

Resolution: wontfix
Status: assigned → closed