Opened 13 years ago

Last modified 7 years ago

#3926 reopened Bugs

thread_specific_ptr + dlopen library causes a SIGSEGV.

Reported by: pluto@… Owned by: viboes
Milestone: Component: thread
Version: Boost 1.42.0 Severity: Problem
Keywords: Cc:

Description

hi,

i've discovered that using thread_specific_ptr in shared library dlopened from thread causes a gpf.
please consider following scenario:

spawn thread
dlopen a shared library with thread_specific_ptr
dlclose the library
terminate the thread

observe the gpf

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff62b7400 in ?? ()
(gdb) bt
#0  0x00007ffff62b7400 in ?? ()
#1  0x00007ffff7221f79 in __nptl_deallocate_tsd () at pthread_create.c:154
#2  0x00007ffff722291b in start_thread (arg=<value optimized out>) at pthread_create.c:304
#3  0x00007ffff6f9293d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

afaics the pthread (user provided) destructor keyed to pthread specific data is called on dlunloaded code.

BR,
Pawel.

Attachments (3)

tls-test.boost.tgz (1.7 KB ) - added by anonymous 13 years ago.
boost-tls.patch (1.3 KB ) - added by pluto@… 12 years ago.
proposed patch for proper cleanup of the static libboost_thread linked into shared objects.
boost_thread.patch (1.7 KB ) - added by dadrus 11 years ago.
patch for linux and mac

Download all attachments as: .zip

Change History (29)

by anonymous, 13 years ago

Attachment: tls-test.boost.tgz added

comment:1 by Anthony Williams, 13 years ago

Resolution: wontfix
Status: newclosed

When you dlclose a library, you must ensure that no code references that library afterwards. If you have a live thread_specific_ptr with a destructor that belongs to that library then the library must be loaded when the destructor is called. You must therefore ensure that the thread_specific_ptr does not have a value on any thread when the library is unloaded.

in reply to:  1 comment:2 by pluto@…, 12 years ago

Resolution: wontfix
Status: closedreopened

Replying to anthonyw:

When you dlclose a library, you must ensure that no code references that library afterwards. If you have a live thread_specific_ptr with a destructor that belongs to that library then the library must be loaded when the destructor is called. You must therefore ensure that the thread_specific_ptr does not have a value on any thread when the library is unloaded.

the thread_specific_ptr dtor called from dlclose() calls set_tss_data() with boost default cleanup function and *NULL* new_value. what's the point of deleting NULL pointer in cleanup function? this accidentally works in shared libs enviroment beacause the cleanup function exists in memory and 'delete 0' is valid in c++. with dynamically loaded libs glibc tries to call useless cleanup callback from dl-unloaded code and ends with gpf. solution: the thread_specific_ptr dtor should delete current_value and call set_tss_data() in the same way as the release() does.

comment:3 by pluto@…, 12 years ago

currently there's a crash in boost.thread core. please follow this scenario:

start thread.
  dlopen lib.
    run function from lib which uses thread_specific_ptr.
      boost::detail::create_epoch_tss_key registers delete_epoch_tss_data destructor.
    exit from lib function.
  dlclose lib (unload delete_epoch_tss_data code).
exit thread
  __nptl_deallocate_tsd tries to call unloaded key destructor and crashes.

i think that libs/thread/src/pthread/once.cpp should contain alternative cleanup function with attributte(( destructor )):

if ( pthread_getspecific( epoch_tss_key ) )
  pthread_key_delete( epoch_tss_key )

by pluto@…, 12 years ago

Attachment: boost-tls.patch added

proposed patch for proper cleanup of the static libboost_thread linked into shared objects.

comment:4 by dadrus, 11 years ago

The patch provided above is not applicable on apple. At least on Mac OSX 10.5 PTHREAD_ONCE_INIT is defined as

#define PTHREAD_ONCE_INIT {_PTHREAD_COND_SIG_init, {0}}

and not as

#define PTHREAD_ONCE_INIT 0

like on Linux, so one can not use simple compare operator. Here comes the patch, which works on both, linux and mac. I'm still using 1.38 version of boost. So one has to adjust it for newer versions.

by dadrus, 11 years ago

Attachment: boost_thread.patch added

patch for linux and mac

comment:5 by dadrus, 11 years ago

There is a small typo in my last comment. On Mac PTHREAD_ONCE_INIT is defined as follows:

#define PTHREAD_ONCE_INIT {_PTHREAD_ONCE_SIG_init, {0}}

I would like also to ask you Anthony to tackle this ticket soon. Hopefully for the next boost release.

Regards

Dimitrij

comment:6 by tkramer@…, 11 years ago

I ran into this recently as well. Any updates on getting this patch committed?

comment:7 by viboes, 11 years ago

Milestone: Boost 1.43.0To Be Determined
Owner: changed from Anthony Williams to viboes
Status: reopenednew
Type: BugsPatches

comment:8 by viboes, 11 years ago

I don't know nothing about dl_open/dl_close. Please, could you explain me when and why delete_epoch_tss_key_on_dlclose will be called?

comment:9 by viboes, 11 years ago

Oh, I see now it is the gcc attributte(( destructor )) that make it be called after main().

Shouldn't this function be added conditionally (when this attribute is available)?

What about compilers that don't support this attribute? How the the key will be deleted?

comment:10 by viboes, 11 years ago

Status: newassigned

comment:11 by viboes, 11 years ago

See also #4639 boost thread library leaks pthread_key

Last edited 9 years ago by viboes (previous) (diff)

in reply to:  11 ; comment:12 by pluto@…, 11 years ago

Replying to viboes:

See also #4639 boost thread library leaks pthread_key

yes, the #4639 describes the same problem but there's one major issue with both patches. the desctruction order of global objects is not specified in general, especially in these days when recent gnu toolchain can sort .init/.fini-array sections.

e.g. the ~delete_epoch_tss_key_on_dlclose_t() from #4639 may release pthread_key and after this the ~thread_specific_ptr() may accuire key again, set null handler and finally leak/gpf as usual.

Last edited 9 years ago by viboes (previous) (diff)

comment:13 by viboes, 11 years ago

See also #2470 Problem with threads started in a dynamically loaded bundle under Mac OS X

in reply to:  12 comment:14 by viboes, 9 years ago

Replying to pluto@…:

Replying to viboes:

See also #4639 boost thread library leaks pthread_key

yes, the #4639 describes the same problem but there's one major issue with both patches. the desctruction order of global objects is not specified in general, especially in these days when recent gnu toolchain can sort .init/.fini-array sections.

e.g. the ~delete_epoch_tss_key_on_dlclose_t() from #4639 may release pthread_key and after this the ~thread_specific_ptr() may accuire key again, set null handler and finally leak/gpf as usual.

comment:15 by kai.dietrich@…, 9 years ago

Hi all,

I also got bitten by that one. Our application compiles boost statically into a shared library (.so/linux). Users are supposed to dl_open/dl_close (potentially often). Each open/close cycle leads to a key/handle leak due to boost not freeing the handle and not giving any control over the tls handles to the user.

At the moment the tls keys in thread.cpp and once.cpp are internal and not accessible from the outside. Would it be possible to give out a way to the user. The windows implementation also has some pretty verbose tls cleanup code.

This bug can be a real production showstopper. I'd give it a way higher priority.

in reply to:  15 comment:16 by viboes, 9 years ago

Replying to kai.dietrich@…:

Hi all,

I also got bitten by that one. Our application compiles boost statically into a shared library (.so/linux). Users are supposed to dl_open/dl_close (potentially often). Each open/close cycle leads to a key/handle leak due to boost not freeing the handle and not giving any control over the tls handles to the user.

At the moment the tls keys in thread.cpp and once.cpp are internal and not accessible from the outside. Would it be possible to give out a way to the user. The windows implementation also has some pretty verbose tls cleanup code.

This bug can be a real production showstopper. I'd give it a way higher priority.

I know and any help in this direction is welcome.

comment:17 by anonymous, 9 years ago

As said, the win32 implementation has this:

  namespace boost {
    namespace detail {

      BOOST_THREAD_DECL void __cdecl on_process_enter()
      {}

      BOOST_THREAD_DECL void __cdecl on_thread_enter()
      {}

      BOOST_THREAD_DECL void __cdecl on_process_exit()
      {
        boost::cleanup_tls_key();
      }

      BOOST_THREAD_DECL void __cdecl on_thread_exit()
      {
        ...
      }

    }
  }

These functions are then somehow added into the PE image destructor sections (tss_dll.cpp and tss_pe.cpp). This was also buggy in some boost versions and messed up MFC cleanup (which also a tls handle leak) so we just disabled that and call the on_process_exit() function manually.

comment:18 by viboes, 7 years ago

It seems that #11302 contains a patch that solves the issue.

Please, could you check it?

in reply to:  18 comment:20 by pawels@…, 7 years ago

Replying to viboes:

It seems that #11302 contains a patch that solves the issue.

Please, could you check it?

i've reported this testcase many years ago :-) it works with recent patch. please commit and close both tickets.

comment:22 by viboes, 7 years ago

So, I can define by default BOOST_THREAD_PATCH.

comment:23 by viboes, 7 years ago

Milestone: To Be DeterminedBoost 1.60.0

comment:24 by viboes, 7 years ago

Type: PatchesBugs

comment:26 by viboes, 7 years ago

Milestone: Boost 1.60.0
Resolution: fixed
Status: closedreopened

This last change has been rolled back as it is introduce a regression in #12049

https://github.com/boostorg/thread/commit/47357de276fe4fc01469f34c1dbf8b26fdbc1c4b

Please, add BOOST_THREAD_PATCH if you need it absolutely.

Note: See TracTickets for help on using tickets.