Opened 9 years ago

Closed 4 years ago

#9839 closed Feature Requests (fixed)

Improve flat_containers performance during insertion phases

Reported by: gonzalobg88@… Owned by: Ion Gaztañaga
Milestone: To Be Determined Component: container
Version: Boost Development Trunk Severity: Optimization
Keywords: Cc:

Description

The problem:

A lot of applications have "look-up" phases and "insertion" phases. In each phase, lots of lookups/insertions take place. However, inserting elements into a flat_ container is slow.

The current way to do this is: copy data out, insert, sort, unique, copy data back in (using the constructor that takes a sorted_uniqued_range). This requires two times the memory necessary, and performs two unnecessary O(N) traversals (the two copies).

In the mailing list this issue was discussed [0], but no action was taken.

The solutions:

Proposal 1) Allow push_back_unsorted + sort Proposal 2) Allow underlying vector to be moved out/push_back/sort/unique/move back in/assert its good in debug mode

I prefer Proposal 2) since I think is safer, so I'll elaborate on it.

I propose two expand the interface of flat_containers with the following two methods:

  • Data&& retrieve_data();
  • void adquire_data(Data&& d);

retrieve_data moves the underlying vector container out of the flat_container, makes the flat_container an empty container (s.t. it remains valid, are invariants are fine).

adquire_data copies/moves a vector container of the same type as the one used by flat_containers, it assert in debug mode only that the container is sorted (and uniqued if necessary), and sets all container invariants to match the new data.

IMO the underlying vector type should be configurable, but that is a different topic.

[0] google: Containers-Should-flat-expose-implementation-vector-td3722533, trac doesn't like links :(

Change History (7)

comment:1 by Piotr Podusowski <piotr.podusowski@…>, 4 years ago

Why can't flat_* sort it by itself when pushing the data back?

comment:2 by anonymous, 4 years ago

That would mean that each push back has a complexity of O(N), which means that push_back in a loop has a complexity of O(N2). The approach proposed in this issue makes the complexity of bulk insertion O(NlogN) which is much better than O(N2).

comment:3 by Piotr Podusowski <piotr.podusowski@…>, 4 years ago

No no, I meant sort it inside void adquire_data(Data&& d) itself.

comment:4 by anonymous, 4 years ago

Ah, I see. Well, it can. I wrote that it should just assert in debug mode that the data is sorted, which is O(N). This allows the user to choose the sorting algorithm that suits best whatever it's doing.

An alternative is to always sort the data, or to provide two methods:

  • adquire_data which always sorts the data,
  • adquire_data_sorted which assumes the data is sorted (and checks it in debug mode)

comment:5 by Ion Gaztañaga, 4 years ago

This was already implemented:

https://www.boost.org/doc/libs/1_67_0/doc/html/boost/container/flat_map.html

See members:

sequence_type extract_sequence(); void adopt_sequence(sequence_type &&); void adopt_sequence(ordered_unique_range_t, sequence_type &&);

using these functions a user can use a sequence_type (vector, by default), fill it, and do a single "adopt_sequence" call to sort data.

comment:6 by Piotr Podusowski <piotr.podusowski@…>, 4 years ago

OMG, thanks, this is the second time I missed something.

comment:7 by Ion Gaztañaga, 4 years ago

Resolution: fixed
Status: newclosed

Resolving it as it was implemented in Boost 1.67. Thanks for the report.

Note: See TracTickets for help on using tickets.