Opened 16 years ago

Closed 16 years ago

#620 closed Bugs (Wont Fix)

"memory exhausted" in regex matching

Reported by: nobody Owned by: John Maddock
Milestone: Component: regex
Version: None Severity:
Keywords: Cc:

Description

Using the following regex:
(a*ba*)*c
to match the following text:
abababababababababababab
will throw an exception saying "Memory exhausted"
(tried with g++ 4.0.4 on linux with boost 1.32 and
1.33. Similar problem on Windows/mingw with 1.32)

The following, more useful expression has the same problem:
(a*ba*d?)*c

Charles-Henri Gros <chgros@coverity.com>

Change History (1)

comment:1 by John Maddock, 16 years ago

Status: assignedclosed
Logged In: YES 
user_id=14804

It's a deliberate feature: it's possible to write
expressions that take *forever* to find a match.  Most
perl-regex engines don't protect you against this,
Boost.Regex deliberately keeps track of how many states the
finite state machine has visited, and if it looks like the
machine is "thrashing" as a result of a badly written
expression will throw an expression.  That's what you're
seeing here, and yes the error message is misleading :-(

It's possible to hack the regex source to up the limits used
(see estimate_max_state_count in perl_matcher_common.hpp),
but it's better to fix the regular expression used: problems
occur when you have something that looks like (a*)*. The
expression you posted could be improved by using something
like (a*b(?:a*d)?)*c which I *think* cuts out the recursive
paths through the state machine.  This is also the sort of
problem for which (?>...) independent sub-expressions were
invented.

HTH, John.
Note: See TracTickets for help on using tickets.