Opened 11 years ago

Closed 9 years ago

#6324 closed Bugs (fixed)

Flyweight performance improvement

Reported by: Koh Ohnishi <k_onishi@…> Owned by: Joaquín M López Muñoz
Milestone: To Be Determined Component: multi_index
Version: Boost 1.48.0 Severity: Problem
Keywords: flyweight, multi_index, hashed_unique, performance Cc:

Description

Hi, I am using flyweight for storing char_string, and when I do an erase one of the flyweight entry when there are a lot of entries, I am facing the performance issue, like ticket #4264 mentioned, in multi_index hashed_unique erase in boost/flyweight/hashed_factory.hpp hashed_factory_class::erase(handle_type) .

There are two reasons.

  1. hashed_factory_class::erase should call erase with key, like

cont.erase(cont.key_extractor()(*(cont.iterator_to(*h)))); instead of calling cont.erase(cont.iterator_to(*h)); as Change History of #4264 mentions.

  1. boost::multi_index hashed_unique erase takes linear time against the bucketsize, even if you use erase method with key, as ticket #4264's Change History mentioned. It calculates begin() and do linear search the top of the node when erasing some node.

These two behavior cause significant performance penalty when a lot of entries are stored in flyweight.

I think most of users of multi_index hashed_unique are expecting not O(1) begin(), but O(1) erase(key). I would like the erase method of hashed_unique should perform O(1).

Thanks,

Change History (4)

comment:1 by Joaquín M López Muñoz, 11 years ago

The problem with erasing by key rather than by iterator is that the former can throw (when calculating the hash value of the key, for instance), which Boost.Flyweight can't afford.

Have you tried using set_factory? Insertion is probably slower but deletion does not have the problems you describe.

comment:2 by k_onishi@…, 11 years ago

Thanks for your comment.

I understood the design policy of Boost.Flyweight. But I think you can avoid exception by catch clause in erase method of the factory. Once you catch the exception, then erasing by iterator.

I had adapted set_factory and improve performance by using erase(key) in the erase of the factory before posting this ticket. To solve whole problem completely, I modified multi_index/hashed_index.hpp as well as using set_factory of flyweight.

I think these problems always stem from iterator manipulation of multi_index (and unordered). Does anyone have any idea of work around in multi_index.hashed_index performance issues?

My problem is, or I am worried about, the bad reputation of boost library. Some commercial projects are using boost::flyweight and boost::multi_index::hashed_unique expecting O(1) inserting, erasing, and finding entry, but we had faced on the performance problem of erasing both flyweight and multi_index at late test phase.

Users cannot imagine erasing of hashed container is time-consuming. Documentation (tell us which method has performance issue) or work around or customization (like set_factory) may help my problems.

Thanks,

comment:3 by Joaquín M López Muñoz, 9 years ago

Starting with Boost.1.56, hashed indices will use a new internal data structure allowing for O(1) erase(iterator).

https://svn.boost.org/trac/boost/changeset/86264

A preview can be obtained by just downloading from the current Boost trunk branch. Any feedback on the performance within real-life scenarios is greatly appreciated. Waiting to such feedback before closing the bug.

comment:4 by Joaquín M López Muñoz, 9 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.