Opened 12 years ago

Closed 12 years ago

Last modified 9 years ago

#4978 closed Bugs (fixed)

Deadlock on interrupt() all threads if they are in wait()

Reported by: d.schneider@… Owned by: Anthony Williams
Milestone: To Be Determined Component: thread
Version: Boost 1.46.0 Severity: Showstopper
Keywords: Cc:

Description

After the change in changeset 66228, a deadlock can happen if many threads are in the waiting state and all of them are then interrupted. I have only seen the problem on a 4-core 64-bit machine (gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3). On an older 2-core machine (gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-11)) it never occurred. It only happened with more than 5 threads, on average in every 2nd or 3rd run.

In condition_variable::wait() the following two mutexes are locked:

  1. internal_mutex, in the interruption_checker constructor, which also sets thread_info::cond_mutex = internal_mutex
  2. thread_info->data_mutex, in this_thread::interruption_point(), to protect thread_info::interrupt_requested

In thread::interrupt() the same two mutexes are locked in the reverse order:

  1. local_thread_info->data_mutex
  2. boost::pthread::pthread_mutex_scoped_lock internal_lock(local_thread_info->cond_mutex);
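The two lists above describe a classic lock-order inversion. The following is a minimal sketch of that hazard and of one generic remedy, written with standard C++11 types rather than Boost internals. The names cond_mutex_standin, data_mutex_standin, waiter_side, interrupter_side and run_both_sides are hypothetical and exist only for illustration. Note that this is not the fix that was applied to Boost.Thread; it only shows that two code paths naming the same mutexes in opposite orders are safe when both are acquired through std::lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two mutexes named in the ticket.
std::mutex cond_mutex_standin;  // plays the role of internal_mutex / cond_mutex
std::mutex data_mutex_standin;  // plays the role of thread_info::data_mutex
int shared_counter = 0;         // protected by holding both mutexes

// Mirrors the wait() path: names the condition mutex first.
void waiter_side() {
    std::unique_lock<std::mutex> cond(cond_mutex_standin, std::defer_lock);
    std::unique_lock<std::mutex> data(data_mutex_standin, std::defer_lock);
    std::lock(cond, data);  // deadlock-avoiding acquisition of both locks
    ++shared_counter;
}

// Mirrors the interrupt() path: names the data mutex first.
void interrupter_side() {
    std::unique_lock<std::mutex> data(data_mutex_standin, std::defer_lock);
    std::unique_lock<std::mutex> cond(cond_mutex_standin, std::defer_lock);
    std::lock(data, cond);  // opposite naming order, still safe
    ++shared_counter;
}

// Runs both paths concurrently and returns the final counter value.
int run_both_sides(int iterations) {
    assert(iterations >= 0);
    shared_counter = 0;
    std::thread t1([=] { for (int i = 0; i < iterations; ++i) waiter_side(); });
    std::thread t2([=] { for (int i = 0; i < iterations; ++i) interrupter_side(); });
    t1.join();
    t2.join();
    return shared_counter;
}
```

If the two functions instead took plain nested locks in opposite orders, each thread could end up holding one mutex while waiting for the other, which is the situation reported here.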

I attached my test code.

Attachments (2)

thread.cpp (1.8 KB ) - added by d.schneider@… 12 years ago.
Example code
deadlock.patch (3.0 KB ) - added by kosse@… 12 years ago.
Experimental fix


Change History (12)

by d.schneider@…, 12 years ago

Attachment: thread.cpp added

Example code

comment:1 by moraleda@…, 12 years ago

Version: Boost 1.45.0 → Boost 1.46.0

I am seeing this bug as well. The bug is still present in 1.46. To work around it, I have reordered the locks inside thread::interrupt() so that they are acquired in the same order as in condition_variable::wait(). I am not submitting a patch because this does not seem like a clean solution, since it involves acquiring the cond_mutex even if local_thread_info->current_cond is false.

comment:2 by himmes@…, 12 years ago

We see this problem on Mac OS X with 1.45.0 as well.

in reply to:  1 comment:3 by moraleda@…, 12 years ago

Replying to moraleda@…:

To work around it, I have reordered the locks inside thread::interrupt(), so they are acquired in the same order as in condition_variable::wait()

My proposed workaround does not actually work. The reason is that cond_mutex is mutable and could be changed by another thread if data_mutex is not locked. Thus changing the order of the locks could (and does) result in a segmentation fault. I don't see a trivial fix, so I am reverting to an earlier version of Boost.Thread until this problem is fixed.
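To illustrate why the order cannot simply be swapped, here is a minimal sketch under the assumption that cond_mutex behaves like a pointer that is only stable while data_mutex is held. The type thread_info_sketch and the functions wake_if_waiting and demo_publish_and_wake are hypothetical names, not Boost code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified model of the per-thread bookkeeping.
struct thread_info_sketch {
    std::mutex data_mutex;
    std::mutex* cond_mutex = nullptr;  // assumed to be guarded by data_mutex
};

// Reads the pointer only while data_mutex is held, then locks the pointee.
// Dereferencing cond_mutex without data_mutex could race with a waiter
// that is setting or clearing it.
bool wake_if_waiting(thread_info_sketch& info) {
    std::lock_guard<std::mutex> data_guard(info.data_mutex);
    if (info.cond_mutex == nullptr) {
        return false;  // no thread is currently waiting
    }
    std::lock_guard<std::mutex> cond_guard(*info.cond_mutex);
    return true;       // a real implementation would notify here
}

// Small driver: publishes a mutex under data_mutex and checks both cases.
bool demo_publish_and_wake() {
    thread_info_sketch info;
    std::mutex waiter_mutex;
    assert(info.cond_mutex == nullptr);
    bool before = wake_if_waiting(info);
    {
        std::lock_guard<std::mutex> guard(info.data_mutex);
        info.cond_mutex = &waiter_mutex;
    }
    bool after = wake_if_waiting(info);
    return !before && after;
}
```

In this model the pointer must be read under data_mutex, so data_mutex necessarily comes first on the interrupting side. That is why the inversion has to be removed on the waiting side instead.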

by kosse@…, 12 years ago

Attachment: deadlock.patch added

Experimental fix

comment:4 by kosse@…, 12 years ago

I have attached an experimental patch. It solves the problem by reducing the scope of the interruption_checker in the wait and timed_wait functions of the condition_variable(_any).

in reply to:  4 comment:5 by jochen.seidel@…, 12 years ago

Replying to kosse@…:

I have attached an experimental patch. It solves the problem by reducing the scope of the interruption_checker in the wait and timed_wait functions of the condition_variable(_any).

I ran into this issue as well. Your supplied patch fixed the problem - thanks!

comment:6 by Anthony Williams, 12 years ago

Resolution: fixed
Status: new → closed

Thank you for the patch. This should indeed fix the problem: the interruption_point() call needs to happen after the interruption_checker destructor.

Patch committed to trunk, revision 69547
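The idea behind the fix, as described in this comment, can be sketched as follows. This is not the committed patch; it is a simplified model in standard C++ in which checker_sketch, thread_state_sketch, interrupted_sketch, interruption_point_sketch, wait_sketch and wait_reports_interrupt are all hypothetical names. The point it shows is that the guard holding the internal mutex is destroyed before the interruption check takes data_mutex, so the waiting path never holds both at once.

```cpp
#include <cassert>
#include <mutex>

struct interrupted_sketch {};  // stand-in for an interruption exception

struct thread_state_sketch {
    std::mutex data_mutex;
    bool interrupt_requested = false;  // guarded by data_mutex
};

// RAII guard that holds the internal mutex for the duration of the wait.
class checker_sketch {
    std::mutex& m_;
public:
    explicit checker_sketch(std::mutex& m) : m_(m) { m_.lock(); }
    ~checker_sketch() { m_.unlock(); }
    checker_sketch(const checker_sketch&) = delete;
    checker_sketch& operator=(const checker_sketch&) = delete;
};

// Takes only data_mutex; throws if an interruption was requested.
void interruption_point_sketch(thread_state_sketch& s) {
    std::lock_guard<std::mutex> guard(s.data_mutex);
    if (s.interrupt_requested) {
        s.interrupt_requested = false;
        throw interrupted_sketch();
    }
}

void wait_sketch(thread_state_sketch& s, std::mutex& internal_mutex) {
    {
        checker_sketch guard(internal_mutex);
        // The actual wait on the condition would happen here.
    }  // guard destroyed: internal_mutex is released before the check
    interruption_point_sketch(s);
}

// Driver: the interruption is reported and the internal mutex is free.
bool wait_reports_interrupt() {
    thread_state_sketch s;
    std::mutex internal_mutex;
    s.interrupt_requested = true;
    bool caught = false;
    try {
        wait_sketch(s, internal_mutex);
    } catch (const interrupted_sketch&) {
        caught = true;
    }
    assert(!s.interrupt_requested);  // the flag was consumed by the check
    bool released = internal_mutex.try_lock();
    if (released) internal_mutex.unlock();
    return caught && released;
}
```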

comment:7 by anonymous, 12 years ago

Will this bug be fixed in 1.46.1 or do we have to wait until 1.47?

comment:8 by adam, 11 years ago

We just encountered this issue. For anyone else coming across this ticket: the deadlock is still present in 1.46.1 but fixed in 1.47.

comment:9 by anonymous, 11 years ago

Found this in 1.43 too. Sadly, the patch is against a version that is very different from 1.43, so we are going to have to figure out when we can upgrade to 1.47.

comment:10 by anonymous, 9 years ago

The same bug is still present (I assume that the GCC headers come from Boost, so this one is related to "#include <condition_variable>" when compiling with the -std=c++11 flag). It is causing a deadlock for me in a thread pool.

If anyone still has the same problem, a temporary fix is to use "condition_variable_any" instead. The "any" version seems to work (I still see no reason to maintain a specialized version just for unique_lock... over-engineering?).
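For readers who want to try the substitution suggested in this comment, the following is a minimal sketch using std::condition_variable_any from the standard library. The function name wait_for_value_any and the producer lambda are hypothetical. The sketch only shows the usage pattern and does not confirm or refute the commenter's diagnosis.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Waits for a value published by another thread, using
// std::condition_variable_any instead of std::condition_variable.
int wait_for_value_any() {
    std::mutex m;
    std::condition_variable_any cv;  // accepts any lockable type
    bool ready = false;
    int value = 0;

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> guard(m);
            value = 42;
            ready = true;
        }
        cv.notify_one();
    });

    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return ready; });  // predicate guards against spurious wakeups
    assert(ready);
    int result = value;
    lock.unlock();

    producer.join();
    return result;
}
```

The call site is the same as with std::condition_variable; only the type of the condition variable changes.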

I'd have liked to post the link to the discussion on GitHub, but the spam filters blocked it u.u
