Opened 9 years ago

Closed 9 years ago

#8698 closed Bugs (fixed)

Boost.Intrusive unordered_set should use different value for end()

Reported by: jody_boost@… Owned by: Ion Gaztañaga
Milestone: To Be Determined Component: intrusive
Version: Boost 1.53.0 Severity: Problem
Keywords: Cc:

Description

The current implementation uses one-past the end of the provided buckets array as the special value for end().

While this is generally OK, there are cases where it is problematic (i.e., the collection is now unusable). Specifically, consider where the buckets array is created on the stack, and the objects are also on the stack. The compiler can (and gcc does) reorder stack variables. If this happens in just the right way, you can end up with an object that you add to the collection living at the address of the special end marker.

If this happens, then the insert is successful (the cached size even increments), but since the pointer to the inserted value has the same value as end(), then finding and iterating is now broken.

Furthermore, if both the bucket array and objectes inserted are allocated from the heap, then the possibility of the above is still present (though unlikely).

Unfortunately, this has actually happened to me so that I now allocate an extra bucket in the bucket array and pass size as N-1. This wastes a bucket that is never used, but prevents the horrible bug of inserting an object, but not finding it...

Change History (6)

comment:1 by viboes, 9 years ago

Component: Noneintrusive
Owner: set to Ion Gaztañaga

comment:2 by Ion Gaztañaga, 9 years ago

Thanks for the report. I don't fully understand your explanation, though. Could you provide an small compilable example of your scenario so that I can reproduce the issue and solve it quickly? Thanks.

comment:3 by jody_boost@…, 9 years ago

Here is a contrived example to illustrate the problem. Note that while it is contrived, the actual problem can occur if the memory allocator allocates a node in the first memory location after the buckets array... or both the node and the buckets array are on the stack... the compiler has freedom to reorder stack objects and if it reorders the objects in this special arrangement, then the collection breaks.

#include <boost/intrusive/unordered_set.hpp>

struct Foo
{
  using By_Id_Hook = boost::intrusive::unordered_set_member_hook<
      boost::intrusive::link_mode<boost::intrusive::safe_link>>;
  struct Id_Equal {
    bool operator()(Foo const &lhs, Foo const &rhs) const {
      return lhs.id_ == rhs.id_;
    }
  };
  struct Id_Hash {
    std::size_t operator()(Foo const &x) const {
      return x.id_;
    }
  };
  By_Id_Hook by_id_hook_;
  std::size_t id_;

  using By_Id = boost::intrusive::unordered_set<
      Foo,
      boost::intrusive::member_hook<
          Foo,
          By_Id_Hook,
          &Foo::by_id_hook_>,
      boost::intrusive::equal<Id_Equal>,
      boost::intrusive::hash<Id_Hash>,
      boost::intrusive::power_2_buckets<true>,
      boost::intrusive::constant_time_size<true>,
      boost::intrusive::cache_begin<true>
      >;

  Foo() {
    static std::size_t nextid;
    id_ = nextid++;
  }
};


int main()
{
  // This test demonstrates the problem exactly, though it is a
  // contrived use case.  The problem is that the unordered_set
  // collection uses one-past the end of the buckets array to
  // denote the end of the collection.  So, this example reproduces
  // the problem...
  constexpr size_t num_buckets = 1024;
  struct b_t {
    Foo::By_Id::bucket_type buckets[num_buckets];
    Foo f; // To force node memory location
  } b;
  Foo::By_Id by_id(Foo::By_Id::bucket_traits(b.buckets, num_buckets));

  assert(by_id.size() == 0u);
  assert(by_id.begin() == by_id.end());

  // b.f resides in memory at the same location as one-past the
  // end of the buckets array -- which is also the same value
  // used to denote the "end" of the unordered_set.  That means
  // once we insert b.f into the collection, the collection is
  // broken because the iterator to the object is the same value
  // as "end"
  assert(!b.f.by_id_hook_.is_linked());
  by_id.insert(b.f);
  assert(by_id.size() == 1u); // Collection says it has the value
  assert(b.f.by_id_hook_.is_linked()); // Node says it is linked
  // This assertion will fail...
  assert(by_id.begin() != by_id.end()); // Iterators are broken
}

comment:4 by jody_boost@…, 9 years ago

FWIW, I contrived the example to consistently demonstrate the problem. I actually encountered it in real code.

Consider the case where the compiler reorders stack variables.

Also, consider the case where you want to create the buckets and objects in sequential memory, to keep the buckets and memory objects close together.

As a workaround, I currently construct the unordered set with BucketSize-1, so the one-past the end of the buffer address is never used for anything else.

comment:5 by Ion Gaztañaga, 9 years ago

Thanks for the test, it's definitely a bug. An alternative would be to a pointer to the first bucket as the end iterator as this element will never be an inserted type. I'll try this approach for the next release as it might require several changes in the class.

comment:6 by Ion Gaztañaga, 9 years ago

Resolution: fixed
Status: newclosed

(In [85165]) * Big refactoring in order to reduce template and debug symbol bloat.

  • Fixes #8698
  • Implemented SCARY iterators
Note: See TracTickets for help on using tickets.