Opened 10 years ago

Closed 10 years ago

#7959 closed Bugs (wontfix)

regex lookahead fails at about 95,000 chars

Reported by: michael@…
Owned by: John Maddock
Milestone: To Be Determined
Component: regex
Version: Boost 1.52.0
Severity: Problem
Keywords:
Cc:

Description

A search using negative lookahead fails if the matcher must traverse about 95,000 characters without finding a match.

Test pattern: start((?!end).)*stuff

Input:

start nothing end start some stuff end start nothing end start more stuff end

This will match the two start-end regions containing "stuff". However, insert about 95000 random characters into the first start-end region, and the search fails.

Debugging on VC2008 shows a Microsoft C++ exception: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >

I traced as far as finding that the error occurs on the first call to match_all_states().

Change History (1)

comment:1 by John Maddock, 10 years ago

Resolution: wontfix
Status: new → closed

If you catch the exception you will see that it says:

"Ran out of stack space trying to match the regular expression."

Basically Regex sets an upper limit on how much memory it will grab when trying to find a match. If you up the values of BOOST_REGEX_MAX_BLOCKS and/or BOOST_REGEX_BLOCKSIZE in boost/regex/user.hpp and then rebuild everything (including the library), then the issue will disappear - or at least get shifted to larger texts before you hit the limit. Obviously if you set BOOST_REGEX_MAX_BLOCKS to something like INT_MAX then there is no limit - whether you think that's a good idea I'll leave up to you!

But basically this is by design.
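For reference, raising the limits described above might look like the fragment below. The two macro names and the file come from the comment; the numeric values are illustrative assumptions only, not recommendations, and the library and everything that uses it would need rebuilding afterwards.

```cpp
// boost/regex/user.hpp (excerpt)
// Illustrative values only; choose them according to how much memory
// a single match attempt should be allowed to consume.

// Size in bytes of each memory block the matcher allocates for saved state.
#define BOOST_REGEX_BLOCKSIZE 4096

// Maximum number of such blocks the matcher may allocate before it gives up
// with the "Ran out of stack space" error.
#define BOOST_REGEX_MAX_BLOCKS 8192
```

The product of the two values bounds the memory the matcher will take for this purpose, so raising either one moves the failure point to longer inputs rather than removing it.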
