Opened 7 years ago

Closed 7 years ago

#11701 closed Bugs (fixed)

Regression in boost::intrusive::set::equal_range

Reported by: nyh@… Owned by: Ion Gaztañaga
Milestone: To Be Determined Component: intrusive
Version: Boost 1.57.0 Severity: Regression
Keywords: Cc: pumpkingod@…

Description

This bug was discovered by Daniel Peebles in https://github.com/cloudius-systems/osv/issues/635

equal_range() is supposed to be a slightly more efficient way to obtain both lower_bound() and upper_bound(). But as this issue demonstrates, boost::intrusive::set :: equal_range() was correct in Boost 1.55 (tested on a Fedora 21 installation) but is broken in Boost 1.57 (tested on a Fedora 22 installation).

The attached simplified reproducer program builds a boost::intrusive::set of a "myrange" type - simple disjoint integer ranges, and then uses a separate user-defined comparator "comp" in finding the lower and upper bound, as well as equal_range.

The output in Boost 1.55 is the expected output:

equal_range: (5, 8) - (10, 13)
lower_bound: (5, 8)
upper_bound: (10, 13)

Indeed, "(5,8)" is the first item to match, and "(10,13)" is the one past the last.

But, the output in Boost 1.57 is different:

equal_range: (5, 8) - (8, 10)
lower_bound: (5, 8)
upper_bound: (10, 13)

Note how equal_range() is now incorrect ("(8,10)" still does not compare less according to "comp"), and moreover, different than what upper_bound() returns (which is the correct response).

Attachments (1)

q1.c (2.1 KB ) - added by nyh@… 7 years ago.
Simple reproducer for the equal_range bug

Download all attachments as: .zip

Change History (8)

by nyh@…, 7 years ago

Attachment: q1.c added

Simple reproducer for the equal_range bug

comment:1 by pumpkingod@…, 7 years ago

Cc: pumpkingod@… added

comment:2 by Ion Gaztañaga, 7 years ago

In your example, "comp" does not induce a strict weak order because both x < y and y < x are true for two ranges x = (2, 2) and y = (2, 2).

And the comparison passed as argument shall induce the same strict weak ordering as key_compare (this precondition is missing in equal_range but it's the same as required by "erase", "insert_", etc...

In a set, two elements can't be equivalent according to the predicate, so with any comparison function that leads to the same strict ordering, equal_range must return the only element equivalent to "key" as pair.first and an iterator to the next element as pair.second. The tree is ordered using the predicate so using any ordering function that does not lead to the same order will be broken by definition, even using lower-bound() and upper_bound might return false replies.

I guess the problem appeared when set::equal_range was optimized assuming those preconditions. I think your use case is invalid, and missing "comp" preconditions should be added to all functions that take it as argument.

comment:3 by pumpkingod@…, 7 years ago

My understanding is that indeed it is not a valid strict weak order if you allow zero-width ranges, but that is disallowed in other ways, and is not the case in the sample data in the reproduction.

Re: the compatibility of the two orderings, that seems like the more likely culprit here.

comment:4 by Ion Gaztañaga, 7 years ago

Typically, interval trees can be implemented as ranges ordered by the low endpoint, but each node is augmented to store an extra annotation holding the maximum upper endpoint among all child nodes. An special search function is used to find the overlap.

Without augmentation, I suspect that the equal range search would not detect any overlapping range. Is algorithmically guaranteed that with a tree ordered by the low endpoint, a lower_bound search using (x._end <= y._start) will find the start of the overlapping range?

comment:5 by anonymous, 7 years ago

Hi igaztanaga, thanks for the detailed response, although I have to admit I'm a little confused about all the "preconditions" on the given comparator which you mention. Can you please point me to some documentation which lays them out explicitly?

But I have to admit I'm surprised by your statements about the various "preconditions" and "compatibility" of the two comparators and the fact that the second one also needs to define a set. To explain why I'm surprised, let's step back for a moment and ask ourselves - why am I using lower_bound (or upper_bound or equal_range) at all? Why am I not just doing a simple iteration on the set to compare the items in the set with the given item, using the given comparator?

The answer is complexity: Simple iteration has linear complexity (O(N)), while upper_bound and friends are supposed to have logarithmic complexity O(ln N). BUT, there is one snag: while std::upper_bound (http://en.cppreference.com/w/cpp/algorithm/upper_bound) indeed does a logarithmic number of *comparisons*, it still needs to do a linear number of increments of the iterator when the iterator is not random-access, because it cannot skip half the list at once. Other than this snag (linear number of increments), std::upper_bound does exactly what I want, and does not have any of the "preconditions" you mention on the comparator - all you need a guarantee that the comparator in fact divides (partitions) the given iteratable list into "smaller than the given item" (according to the comparactor in the parameter) and "higher than the given item". That's it. And in my example, this condition definitely holds.

Now, what I expected in boost::intrusive::set::upper_bound and friends, is to implement std::upper_bound with one added benefit: Because the set is not just a sorted list, but actually a tree, we can skip big parts of the list and get ourselves a logarithmic number of iterator updates instead of linear, as we hoped. In other words, I expected boost::intrusive::set::upper_bound to not be more limited - just better performing - than std::upper_bound.

What most surprised me in your answer is that you seem to say that because the given list is known to be a set (according to its intrinsic comparator), with non-equal values, then the result of equal_range with a different comparator should always be a single value - no matter what the second comparator is (and in particular we're not allowed to choose a comparator which declares several items in the set to be equal under this comparator). This is really surprising, because std::equal_range was designed exactly for the purpose of choosing a comparator which does declare multiple items to be equal, and the idea is to find all the equal items.

By the way, interestingly, std::set's upper_bound and friends do not let you specify the comparator at all. They always use the set's intrinsic comparator. I guess that while boost::intrusive::set's implementation does let you specify the comparator, it's really unsafe to do so unless you really know what you're doing. Perhaps http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3465.pdf is relevant here. But it is apparently not useful to invent your own comparator and belive it would work - even if it would have worked just fine in std::upper_bound. The rules for a comparator for std::upper_bound and for boost::intrusive::set::upper_bound are apparently very different in surprising ways.

comment:6 by Ion Gaztañaga, 7 years ago

Boost.Intrusive's custom comparators were designed before heterogeneous comparisons were proposed. Intrusive's "advanced lookup" functions (http://www.boost.org/doc/libs/1_59_0/doc/html/intrusive/advanced_lookups_insertions.html) were designed to use a search function without the need of constructing a whole "value_type". That is, if your value_type is ordered by a std::string, these functions offer a way to search the set with a std::string or a "const char *" key. But the idea was that the comparison object had to be strictly compatible (same strict weak order) with the predicate used to define the container. With that precondition, a lower bound (in a set, not a multiset) can only return a single value (because there are no two "equivalent" objects in the set).

This precondition was only documented in some functions, like "insert_check" (http://www.boost.org/doc/libs/1_59_0/doc/html/boost/intrusive/set.html#idp50496544-bb) and was missing in lower/upper/equal_bound family.

Heterogeneous lookups are more flexible. The requirement for the supplied comparison object is that it must be "compatible" with the set's predicate only in the sense that the container must be also "partitioned" with respect to the supplied comparison object. E.g.: This allows sorting a container by surname first and by name if the surname was equivalent, and then do a search to find a range of surnames (as the container is ordered with a predicate that also fulfills the requirements of a container sorted by surname).

In your case, I don't see how a set ordered by x._start is also partitioned with respect to the custom comparison "(x._end <= y._start)". E.g: [0, 1), [2, 6), [3, 5), [6, 7) is ordered by the "_start" member but I fail to see how is also partitioned with respecto to "(x._end <= y._start)", so I can do a binary search.

In any case, it might be interesting to update Boost.Intrusive "advanced lookup" functions to support heterogeneous lookups. This requires changing the precondition

"Requires: comp must be a comparison function that induces the same strict weak ordering as key_compare. The difference is that comp compares an arbitrary key with the contained values."

to something like:

"Requires: key is a value such that this container is partitioned with respect to comp(node, key)"

and change the implementation assuming the new requirement.

comment:7 by Ion Gaztañaga, 7 years ago

Resolution: fixed
Status: newclosed

New requirements for comparison objects in searches added. The same strict weak ordering is no longer required, new requirements added to all search functions. This should restore the old behaviour, in equal_range. Commited in develop:

https://github.com/boostorg/intrusive/commit/b11720f52714f340ab1869c158ffba9fb8e2751b

Many thanks for the report and the detailed discussion.

Note: See TracTickets for help on using tickets.