Opened 9 years ago
Closed 4 years ago
#9839 closed Feature Requests (fixed)
Improve flat_containers performance during insertion phases
Reported by: | Owned by: | Ion Gaztañaga | |
---|---|---|---|
Milestone: | To Be Determined | Component: | container |
Version: | Boost Development Trunk | Severity: | Optimization |
Keywords: | Cc: |
Description
The problem:
A lot of applications have "look-up" phases and "insertion" phases. In each phase, lots of lookups/insertions take place. However, inserting elements into a flat_ container is slow.
The current way to do this is: copy data out, insert, sort, unique, copy data back in (using the constructor that takes a sorted_uniqued_range). This requires two times the memory necessary, and performs two unnecessary O(N) traversals (the two copies).
In the mailing list this issue was discussed [0], but no action was taken.
The solutions:
Proposal 1) Allow push_back_unsorted + sort Proposal 2) Allow underlying vector to be moved out/push_back/sort/unique/move back in/assert its good in debug mode
I prefer Proposal 2) since I think is safer, so I'll elaborate on it.
I propose two expand the interface of flat_containers with the following two methods:
- Data&& retrieve_data();
- void adquire_data(Data&& d);
retrieve_data moves the underlying vector container out of the flat_container, makes the flat_container an empty container (s.t. it remains valid, are invariants are fine).
adquire_data copies/moves a vector container of the same type as the one used by flat_containers, it assert in debug mode only that the container is sorted (and uniqued if necessary), and sets all container invariants to match the new data.
IMO the underlying vector type should be configurable, but that is a different topic.
[0] google: Containers-Should-flat-expose-implementation-vector-td3722533, trac doesn't like links :(
Change History (7)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
That would mean that each push back has a complexity of O(N)
, which means that push_back in a loop has a complexity of O(N2). The approach proposed in this issue makes the complexity of bulk insertion O(NlogN) which is much better than O(N2).
comment:4 by , 4 years ago
Ah, I see. Well, it can. I wrote that it should just assert in debug mode that the data is sorted, which is O(N). This allows the user to choose the sorting algorithm that suits best whatever it's doing.
An alternative is to always sort the data, or to provide two methods:
- adquire_data which always sorts the data,
- adquire_data_sorted which assumes the data is sorted (and checks it in debug mode)
comment:5 by , 4 years ago
This was already implemented:
https://www.boost.org/doc/libs/1_67_0/doc/html/boost/container/flat_map.html
See members:
sequence_type extract_sequence(); void adopt_sequence(sequence_type &&); void adopt_sequence(ordered_unique_range_t, sequence_type &&);
using these functions a user can use a sequence_type (vector, by default), fill it, and do a single "adopt_sequence" call to sort data.
comment:7 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Resolving it as it was implemented in Boost 1.67. Thanks for the report.
Why can't flat_* sort it by itself when pushing the data back?