Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#12861 closed Bugs (fixed)

Segmentation fault when creating R-tree with packing algorithm with gcc4.8.2

Reported by: michael.moessner@… Owned by: awulkiew
Milestone: Boost 1.64.0 Component: geometry
Version: Boost 1.63.0 Severity: Problem
Keywords: rtree Cc:

Description

When creating an rtree with packing algorithm I get a segmentation fault. This error is extremly rare and only appears with 4.8.2 (no older compiler tested) and not on all computers.

Boost versions tested: 1.61.0, 1.63.0 Tested compiler where error occurs: gcc-4.8.2

Tested compiler where error doesn't occur: gcc-4.9.3

Compiler options tested (always fails for gcc-4.8.2): "-std=c++11 -O0 -ggdb" "-std=c++11 -O3"

Rtree-configurations tested:

  • Do not use packing, but insert elements (works)
  • Use quadratic/linear instead of rstar (fails always)
  • Use higher numbers for MaxElements (works for the given example but can also fail if the rtree elements are changed, especially if the element number increases)
  • Instead of using std::pair as rtree-elements I tried std::tuple and boost::tuple. The error occurs for all elements.

Additional notes: This error occured for me first when I used rstar with MaxElements = 16 where the element number was about 2 Mio. In this case the error is extremly sensible: e.g. Using one element more or less makes the error disappear. Also changing an element can make the error disappear. By changing the parameters I was able to reduce the problem size. For the given case the minimum number of elements I can use without error is 30684. For higher number the error occurs. However, using a complete other set of elements the error can appear at completely different element numbers.

Attachments (8)

CMakeLists.txt (301 bytes ) - added by michael.moessner@… 6 years ago.
CMake file to compile the code
test_tree.cpp (2.0 KB ) - added by michael.moessner@… 6 years ago.
Example code where error occurs
valgrindout.txt (131.5 KB ) - added by michael.moessner@… 6 years ago.
Valgrind output
testfile.dat.tar.gz.part1 (195.3 KB ) - added by michael.moessner@… 6 years ago.
First part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)
testfile.dat.tar.gz.part2 (157.0 KB ) - added by michael.moessner@… 6 years ago.
Second part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)
valgrind-error.txt (153.2 KB ) - added by michael.moessner@… 6 years ago.
valgrind output on other system and with track-origins active
gdb-backtrace (9.7 KB ) - added by michael.moessner@… 6 years ago.
gdb backtrace on new system
gdb-walkthrough (50.3 KB ) - added by michael.moessner@… 6 years ago.
Insights into some variable given by gdb on new system

Download all attachments as: .zip

Change History (16)

by michael.moessner@…, 6 years ago

Attachment: CMakeLists.txt added

CMake file to compile the code

by michael.moessner@…, 6 years ago

Attachment: test_tree.cpp added

Example code where error occurs

by michael.moessner@…, 6 years ago

Attachment: valgrindout.txt added

Valgrind output

by michael.moessner@…, 6 years ago

Attachment: testfile.dat.tar.gz.part1 added

First part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)

by michael.moessner@…, 6 years ago

Attachment: testfile.dat.tar.gz.part2 added

Second part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)

comment:1 by Barend Gehrels, 6 years ago

Owner: changed from Barend Gehrels to awulkiew

comment:2 by awulkiew, 6 years ago

Thanks for the extensive report!

I tested code/data with with gcc-4.8.2-19ubuntu1 (Ubuntu 14.04) and don't get the same result as you've got. I get an assertion failure regarding the validity of the boxes (when min > max) passed into the rtree however after compiling the code with -DNDEBUG (disabling assertions) I don't get any segfault (though the rtree probably contains garbage), valgrind (--tool=memcheck) also doesn't complain. Both with -O0 and -O3. I also tried all possible numbers of boxes between 30685 and 40000 (max) in a loop with -O3. And the same with valgrind but this time with a step 100 between the numbers of boxes.

The funny thing is that I don't get the assertion failure with gcc-4.8.5-2ubuntu1 (Linux Mint). It seems like the data is being serialized differently somehow. To be sure I put a check into the loop filling bounding_boxes vector:

    if(! bgi::detail::is_valid(box))
    {
        std::cout << bg::dsv(box) << std::endl;
        invalid_count++;
    }

and detected a lot of invalid boxes.

Could you check if that's happening on your machine as well?

Have you tried to run the program on a different machine?

EDIT: I managed to test the data with gcc-4.8.2 by getting rid of Serialization by generating textfile containing WKTs of boxes on a different machine and then loading them from textfile instead of binaryfile. The program works without segaults.

Last edited 6 years ago by awulkiew (previous) (diff)

comment:3 by michael.moessner@…, 6 years ago

Thank you for the detailed and quick response.

First of all, sorry for the trouble regarding the serialization. Due to the file size limitation this appeared to be the simplest solution and I hoped it would be portable enough.

I compiled gcc4.8.2 on another computer and was able to reproduce the error. Unfortunately, the underlying operating systems are similar (SUSE Linux Enterprise Desktop 11). The system setups basically differ in hardware and that on one system I used the preconfigured gcc4.8.2 and on the other system I compiled it myself.

I now also tried gcc4.8.1 and 5.2.0 on the new system which worked fine both.

Adding the valid-box check always passes without any complain.

I wonder if I should just switch my default compiler and assume that the error is due to a erroneous system setting?

For completeness I will attach the gdb backtrace, valgrind output with track-origins active, and a gdb walkthrough showing some variable values. Apparently one can see some "nan"s here (the input should be ok nonetheless since using "insert" instead of packing works fine). Note, that the outputs are different than in the files before since I created the on the other system.

by michael.moessner@…, 6 years ago

Attachment: valgrind-error.txt added

valgrind output on other system and with track-origins active

by michael.moessner@…, 6 years ago

Attachment: gdb-backtrace added

gdb backtrace on new system

by michael.moessner@…, 6 years ago

Attachment: gdb-walkthrough added

Insights into some variable given by gdb on new system

comment:4 by awulkiew, 6 years ago

AFAIU these NaNs are inside not yet initialized m_box inside expandable_box:

https://svn.boost.org/trac/boost/attachment/ticket/12861/gdb-walkthrough#L106

created on stack:

https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp#L273

per_level function is called recursively. Is it possible that the stack size is too small? I'm also concerned about the <error reading variable>:

https://svn.boost.org/trac/boost/attachment/ticket/12861/gdb-walkthrough#L105

Is it possible that this is caused by optimization enabled? I can see that there are some <optimized out> in the GDB output. If that's the case could you generate similar file with optimization disabled?

Or try to debug it in file pack_create.hpp around line 273, e.g. see if it always fails for the same data and then try to catch it. Maybe this compiler has something against this expression:

translator(*(first->second))

where first->second should be an iterator of Range you passed into the rtree ctor (it takes Range const&, so the iterator type should be std::vector<rtree_elem>::const_iterator), and translator's operator() should take const reference to Value type (std::pair<boost_box, int> const&) and return const reference to boost_box const&.

If you don't want to play with it, then I'll find some time in the near future to install SUSE ED 11 SP4 and gcc-4.8.2, and maybe I'll be able to reproduce it.

comment:5 by anonymous, 6 years ago

I tried setting the stack size to unlimited but the behaviour stays the same. I guess this would also be compiler independent?

It is not clear to me why there are <optimized out> variables since I compiled with -O0 and -ggdb. And I was not able to figure out yet where this comes from.

I tried investigating the error further and was able to track it down to the code section

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
			  _RandomAccessIterator __last,
			  const _Tp& __pivot, _Compare __comp)
    {
      while (true)
	{
	  while (__comp(*__first, __pivot))
	    ++__first;
	  --__last;
	  while (__comp(__pivot, *__last))
	    --__last;
	  if (!(__first < __last))
	    return __first;
	  std::iter_swap(__first, __last);
	  ++__first;
	}
    }

in std_algo.h. The Loop

while (__comp(*__first, __pivot))
	    ++__first;

goes on and on until __first becomes invalid. (It seems to go on for an actually large number over 6000 elements).

If there are further investigations I can do, please let me know.

comment:6 by awulkiew, 6 years ago

I was able to reproduce it. It is a bug in libstdc++ in std::nth_element() function, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58800

For your data __unguarded_partition goes out of bounds for a case similar to this (values below are z coordinates of centroids of some boxes):

std::vector<float> vec = {3.94167948, 2.87522459, 4.17142391, 3.00844622};
std::nth_element(vec.begin(), vec.begin() + 2, vec.end());

It seems that std::sort doesn't have this problem so it could be called instead for specific version of gcc/libstdc++. So instead of a SEGFAULT the rtree would be slower. If I figured out how reliable is the timestamp defined by __GLIBCXX__ in this case I could conditionally call std::sort. At the GCC bug page they wrote that several GCC versions were affected (Fixed for 4.7.4/4.8.3/4.9.0.) and the timestamp may be set to different value for different compilers, so IDK. At the moment I could enable it for this specific version I've used, __GLIBCXX__ is 20131016. Is it the same for you? Do you have any suggestions?

As a workaround you can either update gcc/libstdc++ or alter your local copy of Boost by replacing std::nth_element calls with std::sort calls. Just use grep to find them in boost/geometry. There are some calls in the rstar balancing algorithm and one call in packing algorithm, here: https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp#L70 which could be replaced with:

std::sort(first, last, point_entries_comparer<I>());

EDIT: You mentioned earlier that it works fine with gcc 4.8.1. Are you sure about it? According to this bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59116 this version is also affected. And the timestamp for 4.8.1 is later than for 4.7.3 (https://gcc.gnu.org/develop.html#timeline) which AFAIU is affected as well.

EDIT2: I checked the code of several versions of libstdc++ (shipped with gcc 4.7.2-4, gcc 4.8.0-3 and gcc 4.9.0-1) and it seems that the only version containing the bug is gcc 4.8.2.

Last edited 6 years ago by awulkiew (previous) (diff)

comment:7 by awulkiew, 6 years ago

Milestone: To Be DeterminedBoost 1.64.0
Resolution: fixed
Status: newclosed

comment:8 by anonymous, 6 years ago

Thank you very much for investigating the error so quickly and tracking down the actual cause. I'm really glad that I can use the rtree now without any concerns of failing because of this unpredicted behaviour.

You guessed right about my specific GCC version to be 20131016.

For my part I will just switch to another gcc to solve this problem.

Note: See TracTickets for help on using tickets.