Boost C++ Libraries: Ticket #101: regex performance issue
https://svn.boost.org/trac10/ticket/101
<pre class="wiki">Is boost::regex_merge able to process more than 20 KB of data per second? I'm using a function like the following, and can't get it above that rate:

std::string RegExpBinReplace(LPCSTR szWhat, std::string szWhere, DWORD len, LPCSTR szReplacement)
{
    const boost::regex e(szWhat, boost::regbase::normal);
    std::ostringstream t(std::ios::out | std::ios::binary);
    std::ostream_iterator&lt;char, char&gt; oi(t);
    boost::regex_merge(oi, szWhere.begin(), (szWhere.begin() + len), e, szReplacement);
    return t.str();
}

Compiled with Visual C++ 6, running on an AMD XP2000+...
</pre>
nobody Sat, 17 Aug 2002 17:54:51 GMT
<link>https://svn.boost.org/trac10/ticket/101#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/101#comment:1</guid> <description> <pre class="wiki">Logged In: NO

Oops... I asked Dr. John Maddock the same question directly. Here is his answer:

&gt;&gt; Second, I found a "leak" in your documentation: you often use
&gt;&gt; ...
&gt;&gt; std::ostringstream t;
&gt;&gt; std::ostream_iterator&lt;char&gt; oi(t);
&gt;&gt; boost::regex_merge(oi, in.begin(), in.end(), expr, format);
&gt;&gt; ...
&gt;&gt; I implemented a tiny sed(1) facility, but using ostream_iterator slows my
&gt;&gt; program down terribly while executing the sed substitution command
&gt;&gt; (s/regex/replacement/) many times on a 14 KB HTML file (it takes some
&gt;&gt; seconds to accomplish the tasks, while sed(1) takes less than one second).
&gt;
&gt;&gt; I found the problem is in the re_details::re_copy_out function, which seems fast
&gt;&gt; but is not fast at all with ostream_iterators.
&gt;
&gt; Try using an ostreambuf iterator instead (my docs should be changed to do the
&gt; same): it's a lot quicker.
&gt; Make sure you turn on all optimisations before making comparisons: the
&gt; stream iterators are excruciatingly slow until optimisations are turned on
&gt; (at which point they can actually be pretty fast).
&gt;&gt; So why not make re_details::string_out_iterator public (in the documentation)?
&gt; There shouldn't be any need for that: it's a workaround for broken std
&gt; libraries; you should really be able to use std::back_inserter with strings.
&gt; regards,
&gt; John Maddock

Since I am a beginner in C++, I am still using re_details::string_out_iterator, which is faster than ostream_iterator (but not fast enough): my own sed now searches and replaces a &gt;30 KB HTML file more than 25 times in 0.63 secs on a PIII/500 under Linux 2.4.18 + gcc 2.95.3.

bye
Claudio
</pre> </description> <category>Ticket</category> </item> <item> <dc:creator>John Maddock</dc:creator> <pubDate>Fri, 30 May 2003 11:21:04 GMT</pubDate> <title>status changed https://svn.boost.org/trac10/ticket/101#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> </ul> <pre class="wiki">Logged In: YES
user_id=14804

A couple of points:

* The current CVS code has just been updated with a much faster version; however, the new version still won't improve performance for long searches, which was already substantially faster than the GNU regex library, for example.
* Using C++ iostreams can be rather slow unless you are very careful: check that sync_with_stdio is set to false, for example (otherwise the performance hit can be *huge*).
* Outputting the result to a string and then copying it to stdout is going to be a heck of a lot slower than just copying to output (due to memory allocation requirements).

John Maddock.
</pre>
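The two suggestions in the thread (append to a string through std::back_inserter, or write through an ostreambuf iterator rather than an ostream_iterator) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses std::regex_replace from the C++11 standard library as a stand-in for boost::regex_merge, which is assumed to accept an output iterator in the same way, and the function names replace_to_string and replace_to_stream are hypothetical, not part of Boost or of the code posted above.

```cpp
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: replace every match of `pattern` in `input`,
// appending the result straight into a std::string via std::back_inserter,
// so no stream object sits between the regex engine and the result.
std::string replace_to_string(const std::string& input,
                              const std::string& pattern,
                              const std::string& replacement)
{
    const std::regex re(pattern);
    std::string out;
    out.reserve(input.size());  // rough size guess to limit reallocations
    std::regex_replace(std::back_inserter(out),
                       input.begin(), input.end(), re, replacement);
    return out;
}

// Hypothetical helper: write the result directly to a stream through
// std::ostreambuf_iterator, which pushes characters into the stream buffer
// and skips the formatted-output path that std::ostream_iterator goes through.
void replace_to_stream(std::ostream& os,
                       const std::string& input,
                       const std::string& pattern,
                       const std::string& replacement)
{
    const std::regex re(pattern);
    std::regex_replace(std::ostreambuf_iterator<char>(os),
                       input.begin(), input.end(), re, replacement);
}
```

A caller that only needs the text on stdout could call replace_to_stream(std::cout, text, "[0-9]+", "#") after std::ios_base::sync_with_stdio(false), which avoids both the intermediate string and the stdio synchronisation cost mentioned above. Whether this is actually faster depends on the compiler, the optimisation level, and the standard library, so it should be measured rather than assumed.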