Boost C++ Libraries: Ticket #9839: Improve flat_containers performance during insertion phases https://svn.boost.org/trac10/ticket/9839 <p> The problem: </p> <p> A lot of applications have "look-up" phases and "insertion" phases. In each phase, lots of lookups/insertions take place. However, inserting elements into a flat_ container is slow. </p> <p> The current way to do this is: copy data out, insert, sort, unique, copy data back in (using the constructor that takes a sorted_uniqued_range). This requires two times the memory necessary, and performs two unnecessary O(N) traversals (the two copies). </p> <p> In the mailing list this issue was discussed <a class="missing changeset" title="No changeset 0 in the repository">[0]</a>, but no action was taken. </p> <p> The solutions: </p> <p> Proposal 1) Allow push_back_unsorted + sort Proposal 2) Allow underlying vector to be moved out/push_back/sort/unique/move back in/assert its good in debug mode </p> <p> I prefer Proposal 2) since I think is safer, so I'll elaborate on it. </p> <p> I propose two expand the interface of flat_containers with the following two methods: </p> <ul><li>Data&amp;&amp; retrieve_data(); </li><li>void adquire_data(Data&amp;&amp; d); </li></ul><p> retrieve_data moves the underlying vector container out of the flat_container, makes the flat_container an empty container (s.t. it remains valid, are invariants are fine). </p> <p> adquire_data copies/moves a vector container of the same type as the one used by flat_containers, it assert in debug mode only that the container is sorted (and uniqued if necessary), and sets all container invariants to match the new data. </p> <p> IMO the underlying vector type should be configurable, but that is a different topic. </p> <p> <a class="missing changeset" title="No changeset 0 in the repository">[0]</a> google: Containers-Should-flat-expose-implementation-vector-td3722533, trac doesn't like links :( </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/9839 Trac 1.4.3 Piotr Podusowski <piotr.podusowski@…> Thu, 24 May 2018 13:08:36 GMT <link>https://svn.boost.org/trac10/ticket/9839#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9839#comment:1</guid> <description> <p> Why can't flat_* sort it by itself when pushing the data back? </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Thu, 24 May 2018 13:56:40 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9839#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9839#comment:2</guid> <description> <p> That would mean that each push back has a complexity of <code>O(N)</code>, which means that push_back in a loop has a complexity of O(N<sup>2). The approach proposed in this issue makes the complexity of bulk insertion O(NlogN) which is much better than O(N</sup>2). </p> </description> <category>Ticket</category> </item> <item> <author>Piotr Podusowski <piotr.podusowski@…></author> <pubDate>Thu, 24 May 2018 14:03:56 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9839#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9839#comment:3</guid> <description> <p> No no, I meant sort it inside void adquire_data(Data&amp;&amp; d) itself. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Thu, 24 May 2018 14:27:07 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9839#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9839#comment:4</guid> <description> <p> Ah, I see. Well, it can. I wrote that it should just assert in debug mode that the data is sorted, which is O(N). This allows the user to choose the sorting algorithm that suits best whatever it's doing. </p> <p> An alternative is to always sort the data, or to provide two methods: </p> <ul><li>adquire_data which always sorts the data, </li><li>adquire_data_sorted which assumes the data is sorted (and checks it in debug mode) </li></ul> </description> <category>Ticket</category> </item> <item> <dc:creator>Ion Gaztañaga</dc:creator> <pubDate>Thu, 24 May 2018 20:36:03 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9839#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9839#comment:5</guid> <description> <p> This was already implemented: </p> <p> <a class="ext-link" href="https://www.boost.org/doc/libs/1_67_0/doc/html/boost/container/flat_map.html"><span class="icon">​</span>https://www.boost.org/doc/libs/1_67_0/doc/html/boost/container/flat_map.html</a> </p> <p> See members: </p> <blockquote> <p> sequence_type extract_sequence(); void adopt_sequence(sequence_type &amp;&amp;); void adopt_sequence(ordered_unique_range_t, sequence_type &amp;&amp;); </p> </blockquote> <p> using these functions a user can use a sequence_type (vector, by default), fill it, and do a single "adopt_sequence" call to sort data. </p> </description> <category>Ticket</category> </item> <item> <author>Piotr Podusowski <piotr.podusowski@…></author> <pubDate>Fri, 25 May 2018 06:41:16 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9839#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9839#comment:6</guid> <description> <p> OMG, thanks, this is the second time I missed something. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Ion Gaztañaga</dc:creator> <pubDate>Sun, 17 Jun 2018 10:50:30 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/9839#comment:7 https://svn.boost.org/trac10/ticket/9839#comment:7 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> Resolving it as it was implemented in Boost 1.67. Thanks for the report. </p> Ticket