Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#4220 closed Support Requests (invalid)

Performance of erase in multi-index-container

Reported by: Rohit Joshi
Owned by: Joaquín M López Muñoz
Milestone: Boost 1.43.0
Component: multi_index
Version: Boost 1.42.0
Severity: Optimization
Keywords:
Cc:

Description

Hello,

I have a multi_index_container with a hashed_unique key that stores session objects. Insert and retrieval are amazingly fast, but erasing elements is really slow, and it degrades with the number of elements in the container. I expected erasure to be constant time, since I erase through a hashed_unique index. Is this expected behavior, or am I missing something?

for 10,000 elements, it takes around 2.53927016 seconds

for 15,000 elements, it takes around 7.137100068 seconds

for 20,000 elements, it takes around 21.391720757 seconds

I have posted a question with source code at http://stackoverflow.com/questions/2831138/boost-multi-index-container-and-erase-performance
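For reference, a minimal sketch of the container described, assuming the field names m_nId and m_nTransactionHandle that appear in the comments below (everything else is illustrative):

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>

// Illustrative session type; the two field names come from the comments
// below, the rest is an assumption.
struct Session
{
    unsigned long m_nId;                 // unique primary key
    unsigned long m_nTransactionHandle;  // may repeat across sessions
};

namespace mi = boost::multi_index;

typedef mi::multi_index_container<
    Session,
    mi::indexed_by<
        mi::hashed_unique<
            mi::member<Session, unsigned long, &Session::m_nId> >,
        mi::hashed_non_unique<
            mi::member<Session, unsigned long, &Session::m_nTransactionHandle> >
    >
> SessionContainer;
```

Erasure goes through the primary hashed_unique index, e.g. container.erase(id), which one would expect to take (amortized) constant time per element.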

Change History (8)

comment:1 by Steven Watanabe, 12 years ago

I believe this is because computing the iterator to return can be expensive.

comment:2 by Joaquín M López Muñoz, 12 years ago

That was my hunch at first too, but the test shown on stackoverflow uses the size_type erase(const key_type& x) overload of erase, not the problematic one. I've just posted there a guess along different lines.
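The two overloads in question, shown on a throwaway hashed container (the IntSet name and the int key are purely illustrative):

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>

namespace mi = boost::multi_index;

typedef mi::multi_index_container<
    int,
    mi::indexed_by<mi::hashed_unique<mi::identity<int> > >
> IntSet;

int main()
{
    IntSet s;
    s.insert(1);

    // iterator erase(iterator): must compute and return the iterator
    // following the erased element, the potentially expensive part.
    IntSet::iterator it = s.erase(s.find(1));

    s.insert(2);

    // size_type erase(const key_type&): the overload used in the
    // stackoverflow test; it only returns the number of erased elements.
    IntSet::size_type n = s.erase(2);

    (void)it; (void)n;
    return 0;
}
```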

in reply to: 2; comment:3 by anonymous, 12 years ago

Replying to joaquin:

That was my hunch at first too, but the test shown on stackoverflow uses the size_type erase(const key_type& x) overload of erase, not the problematic one. I've just posted there a guess along different lines.

Thanks for your reply. Yes, m_nTransactionHandle can hold the same value for different elements, which is why I made that index hashed_non_unique. But the primary index on m_nId is hashed_unique (no duplicates), and that is the index I use to erase from the container. I wouldn't expect a non-unique secondary index to affect performance when erasing through the primary hashed index. Anyway, I will try that out and let you know.

in reply to: 3; comment:4 by Rohit Joshi, 12 years ago

Replying to anonymous:

Replying to joaquin:

That was my hunch at first too, but the test shown on stackoverflow uses the size_type erase(const key_type& x) overload of erase, not the problematic one. I've just posted there a guess along different lines.

Thanks for your reply. Yes, m_nTransactionHandle can hold the same value for different elements, which is why I made that index hashed_non_unique. But the primary index on m_nId is hashed_unique (no duplicates), and that is the index I use to erase from the container. I wouldn't expect a non-unique secondary index to affect performance when erasing through the primary hashed index. Anyway, I will try that out and let you know.

Yes, when I make the 2nd index hashed_unique, performance increases drastically. Then I tried the 2nd index as ordered_unique and as ordered_non_unique, and performance was similar to hashed_unique. So I don't understand why hashed_non_unique loses performance even though I use the hashed_unique primary key to erase the objects. I have posted a link on stackoverflow (it seems I can't post a link here) with the performance test results. Thanks for your help.
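A sketch of the variants described, reusing the illustrative Session type from the description; only the second index changes:

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

namespace mi = boost::multi_index;

struct Session  // same illustrative type as in the description
{
    unsigned long m_nId;
    unsigned long m_nTransactionHandle;
};

// Second index switched to ordered_non_unique; the hashed_unique and
// ordered_unique variants mentioned above differ only in that one name.
typedef mi::multi_index_container<
    Session,
    mi::indexed_by<
        mi::hashed_unique<
            mi::member<Session, unsigned long, &Session::m_nId> >,
        mi::ordered_non_unique<
            mi::member<Session, unsigned long, &Session::m_nTransactionHandle> >
    >
> SessionContainerOrdered;
```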

comment:5 by anonymous, 12 years ago

Resolution: invalid
Status: new → closed

Closing this report as a non-bug. Hashed indices aren't designed to work efficiently when there are *many* equivalent elements (large groups sharing the same key).

comment:6 by anonymous, 12 years ago

Resolution: invalid
Status: closed → reopened

comment:7 by Joaquín M López Muñoz, 12 years ago

Resolution: invalid
Status: reopened → closed

Closing this report as a non-bug. Hashed indices aren't designed to work efficiently when there are *many* equivalent elements (large groups sharing the same key).
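To illustrate the closing remark, a hedged sketch of the degenerate case. It assumes the illustrative Session container from the description, and the hashed-index implementation of that era, in which unlinking a node from a bucket means walking its group of equivalent elements:

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>

namespace mi = boost::multi_index;

struct Session
{
    unsigned long m_nId;
    unsigned long m_nTransactionHandle;
};

typedef mi::multi_index_container<
    Session,
    mi::indexed_by<
        mi::hashed_unique<
            mi::member<Session, unsigned long, &Session::m_nId> >,
        mi::hashed_non_unique<
            mi::member<Session, unsigned long, &Session::m_nTransactionHandle> >
    >
> SessionContainer;

int main()
{
    SessionContainer c;

    // Worst case for the non-unique index: every element shares the same
    // secondary key, forming one giant group of equivalent elements.
    for (unsigned long i = 0; i < 20000; ++i)
    {
        Session s = { i, 42 };
        c.insert(s);
    }

    // Erasing by the unique primary key still has to unlink each node from
    // the secondary index's group, so the loop degrades toward quadratic
    // time when that group is large.
    for (unsigned long i = 0; i < 20000; ++i)
        c.erase(i);

    return 0;
}
```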

in reply to: 7; comment:8 by anonymous, 12 years ago

Replying to joaquin:

Closing this report as a non-bug. Hashed indices aren't designed to work efficiently when there are *many* equivalent elements (large groups sharing the same key).

Thanks for your explanation. I just found that the performance of hashed_non_unique versus hashed_unique for the 2nd index is almost the same, apart from the slight overhead of duplicate checking. The real bottleneck was boost::object_pool. I don't know its internal implementation, but it appears to keep a list that it traverses to find the object being freed.

To delete 10,000 objects from object_pool: 0.480829439 seconds

To delete 20,000 objects from object_pool: 5.37241036 seconds

To delete 30,000 objects from object_pool: 21.4259488218 seconds

I think we need to file a separate bug against boost::object_pool.
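For context, a sketch of the boost::object_pool usage pattern being described (the Session type is illustrative). By the Boost.Pool design, destroy() performs an ordered free, which searches the pool's internal free list for the right position, so releasing n objects one at a time is quadratic overall, consistent with the super-linear timings above:

```cpp
#include <boost/pool/object_pool.hpp>
#include <vector>

// Illustrative stand-in for the session objects discussed above.
struct Session
{
    int id;
    explicit Session(int i) : id(i) {}
};

int main()
{
    boost::object_pool<Session> pool;

    std::vector<Session*> sessions;
    for (int i = 0; i < 10000; ++i)
        sessions.push_back(pool.construct(i));

    // destroy() does an ordered free: each call walks the free list to find
    // the insertion point, so this loop as a whole is O(n^2).
    for (std::size_t i = 0; i < sessions.size(); ++i)
        pool.destroy(sessions[i]);

    return 0;
}
```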
