Boost C++ Libraries: Ticket #4220: Performance of erase in multi-index-container https://svn.boost.org/trac10/ticket/4220 <p> Hello, </p> <blockquote> <p> I have a multi-index-container with hash_unique key to store the session objects. The insert and retrieval is amazingly fast but when I erase the elements, it is really slow. It degrades with number of elements in the container. I thought it would be constant because I use index_by hash_unique key. Is it expected behavior or am I missing something? </p> </blockquote> <p> for 10,000 elements, it takes around 2.53927016 seconds </p> <p> for 15,000 elements, it takes around 7.137100068 seconds </p> <p> for 20,000 elements, it takes around 21.391720757 seconds </p> <p> I have posted a question with source code at <a class="ext-link" href="http://stackoverflow.com/questions/2831138/boost-multi-index-container-and-erase-performance"><span class="icon">​</span>http://stackoverflow.com/questions/2831138/boost-multi-index-container-and-erase-performance</a> </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/4220 Trac 1.4.3 Steven Watanabe Sat, 15 May 2010 02:00:18 GMT <link>https://svn.boost.org/trac10/ticket/4220#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/4220#comment:1</guid> <description> <p> I believe this is because computing the iterator to return can be expensive. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Joaquín M López Muñoz</dc:creator> <pubDate>Sat, 15 May 2010 08:19:37 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/4220#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/4220#comment:2</guid> <description> <p> That was my hunch at first also, but the test shown as stackoverflow uses the size_type erase(const key_type&amp; x) version of erase, not the problematic one. I've just posted there a guess along a different line. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Sat, 15 May 2010 12:59:45 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/4220#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/4220#comment:3</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/4220#comment:2" title="Comment 2">joaquin</a>: </p> <blockquote class="citation"> <p> That was my hunch at first also, but the test shown as stackoverflow uses the size_type erase(const key_type&amp; x) version of erase, not the problematic one. I've just posted there a guess along a different line. </p> </blockquote> <p> Thx for your reply. yes, possibility of m_nTransactionHandle could be same value and that's why I have used it as hash_non_unique. But primary index m_nId is hashed_unique (no duplicate) and I am using that to erase from container. I think non-unique/secondary index values shouldn't impact performance while erasing entry via a primary hashed index. Anyway, I will try that out and let you know. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Rohit Joshi</dc:creator> <pubDate>Sun, 16 May 2010 19:00:25 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/4220#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/4220#comment:4</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/4220#comment:3" title="Comment 3">anonymous</a>: </p> <blockquote class="citation"> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/4220#comment:2" title="Comment 2">joaquin</a>: </p> <blockquote class="citation"> <p> That was my hunch at first also, but the test shown as stackoverflow uses the size_type erase(const key_type&amp; x) version of erase, not the problematic one. I've just posted there a guess along a different line. </p> </blockquote> <p> Thx for your reply. yes, possibility of m_nTransactionHandle could be same value and that's why I have used it as hash_non_unique. But primary index m_nId is hashed_unique (no duplicate) and I am using that to erase from container. I think non-unique/secondary index values shouldn't impact performance while erasing entry via a primary hashed index. Anyway, I will try that out and let you know. </p> </blockquote> <p> Yes, when I make 2nd index hashed_unique, performance increased drastically. Than I tried 2ns index as ordered_unique and ordered_non_unique and performance was similar to hashed_unique. So I don't understand why hash_non_unique looses performance even though I use primary key hashed_unique to erase the objects. I have posted a link on stackoverflow (It seems I can't post a link here) for performance test results. Thx for your help. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Mon, 17 May 2010 06:19:34 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/4220#comment:5 https://svn.boost.org/trac10/ticket/4220#comment:5 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">invalid</span> </li> </ul> <p> Closing this report as a non-bug. Hashed indices aren't designed to work efficiently when there are *many* different equal elements. </p> Ticket anonymous Mon, 17 May 2010 06:21:03 GMT status changed; resolution deleted https://svn.boost.org/trac10/ticket/4220#comment:6 https://svn.boost.org/trac10/ticket/4220#comment:6 <ul> <li><strong>status</strong> <span class="trac-field-old">closed</span> → <span class="trac-field-new">reopened</span> </li> <li><strong>resolution</strong> <span class="trac-field-deleted">invalid</span> </li> </ul> Ticket Joaquín M López Muñoz Mon, 17 May 2010 10:58:20 GMT status changed; resolution set https://svn.boost.org/trac10/ticket/4220#comment:7 https://svn.boost.org/trac10/ticket/4220#comment:7 <ul> <li><strong>status</strong> <span class="trac-field-old">reopened</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">invalid</span> </li> </ul> <p> Closing this report as a non-bug. Hashed indices aren't designed to work efficiently when there are *many* different equal elements. </p> Ticket anonymous Mon, 17 May 2010 17:56:06 GMT <link>https://svn.boost.org/trac10/ticket/4220#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/4220#comment:8</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/4220#comment:7" title="Comment 7">joaquin</a>: </p> <blockquote class="citation"> <p> Closing this report as a non-bug. Hashed indices aren't designed to work efficiently when there are *many* different equal elements. </p> </blockquote> <p> Thanks for your explanation. I just found that the performance for hashed_non_unique versus hashed_unique for 2nd index is the almost same except slight overhead of checking duplicate. The bottleneck was with boost::object_pool. I don't know internal implementation but it seem it is a list where it iterate through the list to find objects. </p> <p> To delete 10,000 objects from object_pool:0.480829439 </p> <p> To delete 20,000 objects from object_pool:5.37241036 </p> <p> To delete 30,000 objects from object_pool:21.4259488218 </p> <p> I think we need to create a bug for boost::object_pool. </p> </description> <category>Ticket</category> </item> </channel> </rss>