Opened 14 years ago

Closed 14 years ago

Last modified 13 years ago

#2217 closed Bugs (fixed)

serialization 1.36.0 extended_type_info exit issue(s)

Reported by: timothysc@… Owned by: Robert Ramey
Milestone: Boost 1.37.0 Component: serialization
Version: Boost 1.36.0 Severity: Problem
Keywords: extended_type_info Cc:

Description

I'm running some serialization unit-tests which pass with flying colors under 1.35.0 and earlier, however they seg-fault on exit when run through 1.36.0.

The error appears to deal with destruction of extended_type_info_typeid objects, and *only* manifests itself when I load an archive through a shared_ptr<T>. The load will appear to function properly, then on exit of the application, <<Death>>. This occurs even if I remove BOOST_CLASS_EXPORT && BOOST_CLASS_TYPE_INFO as the shared_ptr is to the derived element, and I'm not testing through a base.

snippet: BOOST_AUTO_TEST_CASE( MyTestClass_Serialization ) {

MyTestClass is an uses intrusive serialization (You could use any object)

boost::shared_ptr<MyTestClass> outtie ( new MyTestClass() ); boost::shared_ptr<MyTestClass> innie;

std::ofstream os( "MyTestClass_Serialization.xml", std::ios_base::out ); boost::archive::xml_oarchive xmlao(os); xmlao << boost::serialization::make_nvp( "testKey", outtie ); os.close();

std::ifstream is( "MyTestClass_Serialization.xml", std::ios_base::in ); boost::archive::xml_iarchive xmlai( is ); xmlai >>boost::serialization::make_nvp( "testKey", innie ); is.close();

BOOST_CHECK_NO_THROW();

}

Ran on Ubuntu(Linux) 8.04 using gcc 4.2.3 gdb output: * No errors detected [New Thread 0xb73e26c0 (LWP 4715)]

Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xb73e26c0 (LWP 4715)] 0xb76bb56b in std::_Rb_tree_rebalance_for_erase () from /usr/lib/libstdc++.so.6 (gdb) bt #0 0xb76bb56b in std::_Rb_tree_rebalance_for_erase () from /usr/lib/libstdc++.so.6 #1 0xb7a38221 in std::_Rb_tree<boost::serialization::detail::extended_type_info_typeid_0 const*, boost::serialization::detail::extended_type_info_typeid_0 const*, std::_Identity<boost::serialization::detail::extended_type_info_typeid_0 const*>, boost::serialization::detail::type_compare, std::allocator<boost::serialization::detail::extended_type_info_typeid_0 const*> >::erase (this=0xb7a93c94, position={_M_node = 0x810a3f8}) at /usr/include/c++/4.2/bits/stl_tree.h:1261 #2 0xb7a38270 in std::multiset<boost::serialization::detail::extended_type_info_typeid_0 const*, boost::serialization::detail::type_compare, std::allocator<boost::serialization::detail::extended_type_info_typeid_0 const*> >::erase (this=0xb7a93c94, position={_M_node = 0x810a3f8}) at /usr/include/c++/4.2/bits/stl_multiset.h:346 #3 0xb7a37482 in boost::serialization::detail::extended_type_info_typeid_0::type_unregister (this=0x810749c) at /opt/tomodev/env/boost/libs/serialization/src/extended_type_info_typeid.cpp:93 #4 0x080b001a in ~extended_type_info_typeid (this=0x810749c) at /opt/tomodev/env/boost/boost/serialization/extended_type_info_typeid.hpp:80 #5 0x0809ebe2 in tcf_49 () at /opt/tomodev/env/boost/boost/serialization/singleton.hpp:104 #6 0xb74f7084 in exit () from /lib/tls/i686/cmov/libc.so.6 #7 0xb74df458 in libc_start_main () from /lib/tls/i686/cmov/libc.so.6 #8 0x0809e691 in _start ()

Attachments (2)

extended_type_info_files.patch (1.0 KB ) - added by Brandon Kohn 14 years ago.
Patches for extended_type_info_typeid.cpp and extended_type_info.cpp
extended_type_info_files2.patch (1.0 KB ) - added by Ryan Mulder <rjmyst3@…> 14 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 by benjamin.sternlieb@…, 14 years ago

We also have the same issue. SEGV associated with

#0 std::_Rb_tree_rebalance_for_erase (z=0x93834c8, header=@0x71fa08) at ../../.././libstdc++-v3/src/tree.cc:337 #1 0x006ec048 in boost::serialization::detail::extended_type_info_typeid_0::type_unregister ()

from /sbcimp/dyn/data/superqa/ultrabond/third_party/4.1.2/boost_1_36/lib/libboost_serialization-gcc41-mt-1_36.so.1.36.0

#2 0x0828d66e in ~extended_type_info_typeid (this=0x84490c0)

at /sbcimp/dyn/data/superqa/ultrabond/third_party/4.1.2/boost_1_36/include/boost-1_36/boost/serialization/extended_type_info_typeid.hpp:80

We're rolling back to 1.34.1

comment:2 by Ryan Mulder <rjmyst3@…>, 14 years ago

With MinGW 4.2.1-dw2, in a debug build, there is an assertion failure at close:

File: libs\serialization\src\extended_type_info.cpp Line: 47

Expression: NULL != l

in reply to:  2 comment:3 by Brandon Kohn, 14 years ago

Replying to Ryan Mulder <rjmyst3@gmail.com>:

With MinGW 4.2.1-dw2, in a debug build, there is an assertion failure at close:

File: libs\serialization\src\extended_type_info.cpp Line: 47

Expression: NULL != l

I think I have found and fixed an issue with the type_unregister and key_unregister functions. The problem seems to be with the removal of items from the multiset containing them. The items are inserted into the multiset via ptr and sorted on fields m_key or m_ti respectively. When the items are deleted from the multiset, a lower/upper bound search is performed to find the range of keys which contain the specified value (m_key, m_ti) and then subsequently the iterators on that range are subject to:

detail::ktmap::iterator start = x.lower_bound(this); detail::ktmap::iterator end = x.upper_bound(this); assert(start != end);

remove entry in map which corresponds to this type do{

if(this == *start){

x.erase(start); break;

}

}while(++start != end);

m_key = NULL;

Note that it breaks after removing the first item. It then sets the key to NULL. This means that on subsequent compares inside the set, if there are other instances of this key (which are held via ptr), the value of m_key is NULL. Inside the key_compare there are assert statements to check that m_key isn't NULL... after that all hell breaks lose :D.

The same pattern is in extended_type_info_typeid.cpp. I fixed my local copy by changing the loop to use:

do{

if(this == *start){

start = x.erase(start);

} else {

++start;

}

}while(start != end);

So all copies of an item with the same m_key or m_ti value are removed. After that everything seemed to run fine.

by Brandon Kohn, 14 years ago

Patches for extended_type_info_typeid.cpp and extended_type_info.cpp

comment:4 by Ryan Mulder <rjmyst3@…>, 14 years ago

Slight bug in your patch which prevents it from compiling with MinGW 4.2.1: multiset::erase returns void.

changing

start = x.erase(start); 

to

x.erase(start++); 

fixes it.

by Ryan Mulder <rjmyst3@…>, 14 years ago

comment:5 by matthew.downey@…, 14 years ago

I am getting a similar problem: a SIGSEGV is thrown on exit. The supplied patch did not fix the problem.

MSVC8 call stack:

msvcr80d.dll!_free_base(void * pBlock=0x007938e0)  Line 109 + 0x13 bytes	C
commonTest.exe!boost::detail::allocator_impl<20,4>::dealloc(void * pv=0x002c8870)  Line 149	C++
commonTest.exe!boost::detail::sp_counted_impl_pd<boost::thread_specific_ptr<Common::ThreadSpecificIoServicePosts>::run_custom_cleanup_function *,boost::detail::do_heap_delete<boost::thread_specific_ptr<Common::ThreadSpecificIoServicePosts>::run_custom_cleanup_function> >::operator delete(void * p=0x002c8870)  Line 175 + 0x9 bytes	C++
commonTest.exe!boost::detail::sp_counted_impl_pd<boost::thread_specific_ptr<Common::ThreadSpecificIoServicePosts>::run_custom_cleanup_function *,boost::detail::do_heap_delete<boost::thread_specific_ptr<Common::ThreadSpecificIoServicePosts>::run_custom_cleanup_function> >::`scalar deleting destructor'()  + 0x27 bytes	C++
commonTest.exe!boost::detail::sp_counted_base::destroy()  Line 66 + 0x22 bytes	C++
commonTest.exe!boost::detail::sp_counted_base::weak_release()  Line 116 + 0xf bytes	C++
commonTest.exe!boost::detail::sp_counted_base::release()  Line 105	C++
commonTest.exe!boost::detail::shared_count::~shared_count()  Line 220	C++
commonTest.exe!boost::shared_ptr<boost::detail::tss_cleanup_function>::~shared_ptr<boost::detail::tss_cleanup_function>()  + 0x19 bytes	C++
commonTest.exe!boost::detail::tss_data_node::~tss_data_node()  + 0x12 bytes	C++
commonTest.exe!boost::detail::tss_data_node::`scalar deleting destructor'()  + 0xf bytes	C++
commonTest.exe!boost::detail::heap_delete<boost::detail::tss_data_node>(boost::detail::tss_data_node * data=0x00054ad8)  Line 380	C++
commonTest.exe!boost::`anonymous namespace'::run_thread_exit_callbacks()  Line 140 + 0x9 bytes	C++
commonTest.exe!on_thread_exit()  Line 580	C++
msvcr80d.dll!doexit(int code=201, int quick=0, int retcaller=0)  Line 553	C
msvcr80d.dll!exit(int code=201)  Line 398 + 0xd bytes	C
commonTest.exe!__tmainCRTStartup()  Line 610	C
commonTest.exe!mainCRTStartup()  Line 414	C

Is there any more information I can supply to help debugging this problem?

comment:6 by Robert Ramey, 14 years ago

Resolution: fixed
Status: newclosed

I believe I have addressed a coding error in the unregistering of types as DLLs are or main programs are unloaded. I've checked the fix in the trunk. However, none of the test platforms have manifested this error, so it will by up to you test this when it migrates to the release branch.

Robert Ramey

comment:7 by Ross Lippert <ross.lippert@…>, 14 years ago

While I believe that the iterator invalidation issues were important for this problem, this issue should be reopened, because the story does not end there.

I believe that the 1.36 implementation of extended_type_info_typeid suffers from a static destructor fiasco. In particular, the tkmap singleton which is local to the extended_type_info_typeid.cpp file, functioning as a registry of type info, can be accessed by typeid destructors after the tkmap has been destructed.

I have not been able to produce a small example, and I doubt that any example I could produce would behave likewise across gcc versions. However, I did instrument my code well enough to make a solid case.

It began with

*** glibc detected *** free(): invalid pointer: 0x00002b3004540dc0 ***

at exit.

Running the program in valgrind produced, as the first error,

==20744== Invalid read of size 8
==20744==    at 0x5DC69BC: std::_Rb_tree<boost::serialization::detail::extended_
type_info_typeid_0 const*, boost::serialization::detail::extended_type_info_type
id_0 const*, std::_Identity<boost::serialization::detail::extended_type_info_typ
eid_0 const*>, boost::serialization::detail::type_compare, std::allocator<boost:
:serialization::detail::extended_type_info_typeid_0 const*> >::lower_bound(boost
::serialization::detail::extended_type_info_typeid_0 const* const&) (stl_tree.h:
1371)
==20744==    by 0x5DC673A: boost::serialization::detail::extended_type_info_typeid_0::type_unregister() (stl_multiset.h:451)
==20744==    by 0x40DA49: __tcf_12 (extended_type_info_typeid.hpp:85)
==20744==    by 0x3759830CD4: exit (in /lib64/tls/libc-2.3.4.so)
==20744==    by 0x375981C4C1: (below main) (in /lib64/tls/libc-2.3.4.so)
==20744==  Address 0x6EE09B0 is 32 bytes inside a block of size 40 free'd
==20744==    at 0x4905D7A: operator delete(void*) (vg_replace_malloc.c:244)
==20744==    by 0x5DC6D37: std::_Rb_tree<boost::serialization::detail::extended_type_info_typeid_0 const*, boost::serialization::detail::extended_type_info_typeid_0 const*,std::_Identity<boost::serialization::detail::extended_type_info_typeid_0const*>, oost::serialization::detail::type_compare, std::allocator<boost::serialization::detail::extended_type_info_typeid_0 const*>>::_M_erase(std::_Rb_tree_node<boost::serialization::detail::extended_type_info_typeid_0  const*>*) (new_allocator.h:94)
==20744==    by 0x5DC63E8: __tcf_0 (stl_tree.h:578)
==20744==    by 0x3759830FBA: __cxa_finalize (in /lib64/tls/libc-2.3.4.so)
==20744==    by 0x5DC6242: (within /d/en/lippertr-0/p4/desmond/desmond2_lippertr/base/objs/Linux/x86_64/Release/lib/libdesmond_util.so)
==20744==    by 0x5E46AE0: (within /d/en/lippertr-0/p4/desmond/desmond2_lippertr/base/objs/Linux/x86_64/Release/lib/libdesmond_util.so)
==20744==    by 0x3759830CD4: exit (in /lib64/tls/libc-2.3.4.so)
==20744==    by 0x375981C4C1: (below main) (in /lib64/tls/libc-2.3.4.so)

As we see, the underlying rep of the tkmap is corrupt or invalid because lower_bound is failing (this version of 1.36 includes the patches already mentioned).

With a little bit of effort, I recompiled boost serialization to have a destructor for the tkmap which prints to the screen and put print messages in the extended_type_info_typeids.

in typeid destructor
unregistering m_key= 0x5f6ef20 N71_GLOBAL__N_desmond_src_util_timekeeper_timekeeper.cxx_00000000_435F3CA51_E
done unregistering.
done destructor.
tkmap is destroyed!
in typeid destructor
unregistering m_key= 0x517220 N3Ark3arkE
==20744== Invalid read of size 8
. . .

My subsequent prints show that the tkmap is being destructed in the middle of a series of extended_type_info_typeid destructions. This is the static destructor fiasco. The initializer fiasco is overcome by the usual function-scoped static trick, but destructor calls are not so easily serialized. When a destructor has to talk to an instance of another object, you have to be sure that object hasn't destructed yet.

To further test, I altered the tkmap implementation to set a live bit.

(in extended_type_info_typeid.cpp

struct tkmap : std::multiset<
    const extended_type_info_typeid_0 *,
    type_compare
> {
  static int live;
  tkmap() { live=1; }
  ~tkmap() { live=0; std::cerr << "tkmap is destroyed!" << std::endl; }
};
int tkmap::live=0;

and added

    if (!tkmap::live) return;

to the type_unregister method.

in typeid destructor
unregistering m_key= 0x5f6ef40 N71_GLOBAL__N_desmond_src_util_timekeeper_timekeeper.cxx_00000000_435F3CA51_E
done unregistering.
done destructor.
tkmap is destroyed!
in typeid destructor
unregistering m_key= 0x517220 N3Ark3arkE
done destructor.
in typeid destructor
unregistering m_key= 0x517ec0 N7Desmond6EngineE
done destructor.
==21051== 
==21051== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 9 from 4)

I believe that this makes for a substantial case that the addition of tkmap to boost 1.36 (not in 1.35) needs to be rethought.

comment:8 by Robert Ramey, 14 years ago

Thanks for looking into this.

This is a fix I orginally made - but I took it when the iterator mistake was pointed out to me. I had hoped that the order of distructor invocation would be the inverse of constructor invocation - but I guess that is not the case in all compilers. I'm surprised that this is so hard to reproduce. In any case, we'll add in this fix which should fix this case and does no harm anywhere else.

Thanks again.

Robert Ramey

comment:9 by eric.woodruff@…, 13 years ago

FYI - I needed to alter the tkmap implementation as suggested by Ross Lippert in 1.39, but it appears in 1.40 I'm not reproducing the problem.

Note: See TracTickets for help on using tickets.