Opened 16 years ago

Closed 16 years ago

#712 closed Support Requests (None)

Regex: multi-processor scalability, user-defined allocators

Reported by: r_emek Owned by: John Maddock
Milestone: Component: regex
Version: None Severity:
Keywords: Cc:

Description

I'm using boost-regex library, and I'm running into
multi-threading scalability issues. When tested on a
machine with several CPUs, my performance tests show
improvement of about 60-70% when moving from a single
thread to 2 threads. 

Question 1: Is anyone aware of this problem? Is there a
known solution?

From some testing we've done, it seems that the cause
of the scalability issue is problematic implementation
of std:allocator. This is a known issue on some
operating systems (e.g., Solaris 8). Some of the
boost::regex classes (e.g., match_results) accept a
user-defined allocator. However, as far as I can
understand there's no way to completely override the
use of std::allocator. 

Question 2: is there a way to completely prevent boost
/ regex from using std::allocator? If no, can such a
capability be added?

(One solution to this problem would be global
overriding the 'new' and 'delete' operators - but for
various reasons this cannot be done in the application
I'm developing). 

Change History (4)

comment:1 by John Maddock, 16 years ago

Logged In: YES 
user_id=14804

Actually it used to be there but I was asked to remove it :-(

The question you need to ask is which part of the regex lib
is causing problems, there are three main areas that use
memory allocation:

1) Regex construction: uses std::basic_string and other STL
classes along with shared_ptr etc etc, there's no way to
change the allocator here.  However, the question you need
to ask is "do my threads need to construct regexes at all?"
 Boost.Regex is designed so that multiple threads can share
the same immutable regex object.

2) Regex matching: the library needs a stack for the FSM to
work on.  On Unix like systems this memory is cached and
obtained from the routines near the end of regex.cpp, you
could replace these with thread specific allocators if this
is the overhead.  Or.... you could define
BOOST_REGEX_RECURSIVE and rebuild everything: regex will
then use a program-stack-recursive algorithm that saves on
memory allocations, but runs the risk of stack overflow.  If
you are on a platform that can protect you from stack
overruns then this can speed things up a little for single
threaded apps, and maybe rather more for multithreaded ones.

3) The final match_results allocation: only once per
match/search operation does match_results actually allocate
any memory - right at the end when the object is written to
with the results.  You can avoid even that, if you reuse
match_results objects so that they already contain a large
enough buffer when they're actually used.

HTH, John.

comment:2 by r_emek, 16 years ago

Logged In: YES 
user_id=1581119

A few comments: 
(1) The scalability issue I'm referring to is run-time only
- we compile the regular expressions during system
initialization, and only then create a few threads and match
/ search for regular expressions. 

(2) We tried to replace the memory caching mechanism, but it
didn't help much. 

(3) I don't know if that's what you were referring to in
item #3, but it seems that boost does perform a few memory
allocations during run-time. For example: 
 
  lines 71-74 of perl_matcher_common.hpp: 
   if(m_match_flags & match_posix)
   {
      m_temp_match.reset(new match_results<BidiIterator,
Allocator>());
      m_presult = m_temp_match.get();
   } 

  lines 97 - 100 of perl_matcher_non_recursive.hpp: 

      *end =
reinterpret_cast<saved_state*>(reinterpret_cast<char*>(*base)+BOOST_REGEX_BLOCKSIZE);
      --(*end);
      (void) new (*end)saved_state(0);
      BOOST_ASSERT(*end > *base)

Finally, we will really appreciate any additional help /
ideas of how we can further improve scalability. 

Thanks in advance, 
    Roy 

comment:3 by John Maddock, 16 years ago

Logged In: YES 
user_id=14804

Apologies for the delay in getting back to you: for some
reason sourceforge didn't send me the usual email when the
tracker-item was updated.  Feel free to email me directly if
that would be easier in future.

Regarding the points you raise, I think we need some
specific information on which allocations are causing your
slowdown (if that is indeed the cause), without that we're
just shooting in the dark.  Are you able to profile the code
at all?

Of the allocations you point out: the first occurs only for
POSIX style regexes and not for Perl style ones.

The second occurs only when regex's internal cache of memory
runs out.  You can adjust some #defines to increase the
amount cached, but if replacing the memory caching mechanism
didn't help, I doubt this will either.

You could try defining BOOST_REGEX_RECURSIVE and see what
effect that has on your times: that would cut out all memory
allocations except:

* One call to new if the expression is a POSIX rather than a
Perl one.  There is no way round this I'm afraid: except
compile your expressions as Perl ones - they'll generally
match faster anyway.
* One call to std::vector<>::allocator_type::allocate inside
match_results.  You can avoid that if you use the same
match_results instance for multiple calls to the regex
algorithms.

HTH, John.

comment:4 by sf-robot, 16 years ago

Status: assignedclosed
Logged In: YES 
user_id=1312539

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
Note: See TracTickets for help on using tickets.