Opened 13 years ago

Closed 13 years ago

#3513 closed Support Requests (invalid)

regex_match very slow example

Reported by: Fernando Pelliccioni <fpelliccioni@…> Owned by: Eric Niebler
Milestone: Boost 1.41.0 Component: xpressive
Version: Boost 1.40.0 Severity: Problem
Keywords: Cc:

Description

Hi,

I am having an issue with a dynamic regular expression.

This is the code:

sregex tempRE = sregex::compile(
    "(?:.*\\r?\\n)*"
    "var wlanPara = new Array\\(\\r?\\n"
    "\\d{0,4},\\r?\\n"
    "\"(?P<ssid>(?:\\w+))\",\\r?\\n"
    "(?P<channel>(?:\\d{0,4})),\\r?\\n"
    "\\d{0,4},\\r?\\n"
    "\"[\\w-]+\",\\r?\\n"
    "\"[\\w\\.]+\",\\r?\\n"
    "\\d{0,4},\\r?\\n"
    "\\d{0,4},\\r?\\n"
    "\"(?P<signal>(?:\\d{0,4})) dB\",\\r?\\n"
    "\\d{0,4},\\d{0,4}\\);"
    "(?:.*\\r?\\n)*$");

std::string htmlText; // filled with the text from the attached HTML file
smatch what;

if (regex_match(htmlText, what, tempRE)) {
    ...
}

When the program enters regex_match, the process uses 50% of the CPU and the function never returns.

I have attached the HTML file that contains the text. The code was tested with Visual Studio 2008.

If I use the static variant of the regex, it works perfectly:

sregex tempRE = bos >> *_ >> "var wlanPara = new Array(" >> _ln
    >> _d >> commonDigit >> ',' >> _ln
    >> '"' >> (s1= +_w) >> "\"," >> _ln
    >> (s2= commonDigit) >> ',' >> _ln
    >> commonDigit >> ',' >> _ln
    >> '"' >> +(_w | '-') >> "\"," >> _ln
    >> '"' >> +(_w | '.') >> "\"," >> _ln
    >> commonDigit >> ',' >> _ln
    >> commonDigit >> ',' >> _ln
    >> '"' >> (s3= commonDigit) >> " dB\"," >> _ln
    >> commonDigit >> ',' >> commonDigit >> " );" >> *_ >> eos;

Thanks,

Fernando Pelliccioni

Attachments (1)

text.html (13.6 KB) - added by Fernando Pelliccioni <fpelliccioni@…> 13 years ago.
HTML file that contains the text to parse


Change History (2)

by Fernando Pelliccioni <fpelliccioni@…>, 13 years ago

Attachment: text.html added

HTML file that contains the text to parse

comment:1 by Eric Niebler, 13 years ago

Resolution: invalid
Status: new → closed

Your dynamic regex is slow because you are needlessly using nested quantifiers in two places: the regex begins and ends with "(?:.*\r?\n)*", which applies "*" to a group that itself contains ".*". This case is explicitly called out in xpressive's docs as a common pitfall. Please read this:

http://www.boost.org/doc/libs/1_40_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.beware_nested_quantifiers

Your static regex does not use nested quantifiers. That explains the difference.
