#4978 closed Bugs (fixed)
Deadlock on interrupt() all threads if they are in wait()
Reported by: |  | Owned by: | Anthony Williams
---|---|---|---
Milestone: | To Be Determined | Component: | thread
Version: | Boost 1.46.0 | Severity: | Showstopper
Keywords: |  | Cc: |
Description
After the change in changeset 66228, a deadlock can happen if many threads are in the waiting state and then all of them are interrupted. I just saw the problem on a 4-core 64-bit machine (gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3). On an older 2-core machine (gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-11)) it never occurred. It only happened with more than 5 threads, on average in every 2nd or 3rd run.
In condition_variable::wait() the following two mutexes get locked:
- in the interruption_checker constructor, internal_mutex (there, thread_info::cond_mutex = internal_mutex is also set)
- in this_thread::interruption_point(), thread_info->data_mutex, to protect thread_info::interrupt_requested
In thread::interrupt() the same two mutexes get locked in the reverse order:
- local_thread_info->data_mutex
- boost::pthread::pthread_mutex_scoped_lock internal_lock(local_thread_info->cond_mutex);
I attached my test code.
Attachments (2)
Change History (12)
by , 12 years ago
Attachment: | thread.cpp added
---|---
follow-up: 3 comment:1 by , 12 years ago
Version: | Boost 1.45.0 → Boost 1.46.0
---|---
I am seeing this bug as well. The bug is still present in 1.46. To work around it, I have reordered the locks inside thread::interrupt() so they are acquired in the same order as in condition_variable::wait(). I am not submitting a patch because this does not seem like a clean solution, since it involves acquiring the cond_mutex even if local_thread_info->current_cond is false.
comment:3 by , 12 years ago
Replying to moraleda@…:
To work around it, I have reordered the locks inside thread::interrupt() so they are acquired in the same order as in condition_variable::wait().
My proposed workaround does not actually work. The reason is that cond_mutex is mutable and could be changed by another thread if data_mutex is not locked. Thus changing the order of the locks could (and does) result in a segmentation fault. I don't see a trivial fix, so I am reverting to an earlier version of Boost.Thread until this problem is fixed.
follow-up: 5 comment:4 by , 12 years ago
I have attached an experimental patch. It solves the problem by reducing the scope of the interruption_checker in the wait and timed_wait functions of the condition_variable(_any).
comment:5 by , 12 years ago
Replying to kosse@…:
I have attached an experimental patch. It solves the problem by reducing the scope of the interruption_checker in the wait and timed_wait functions of the condition_variable(_any).
I ran into this issue as well. Your supplied patch fixed the problem - thanks!
comment:6 by , 12 years ago
Resolution: | → fixed
---|---
Status: | new → closed
Thank you for the patch. This should indeed fix the problem --- the interruption_point() call needs to happen after the interruption_checker destructor.
Patch committed to trunk, revision 69547
comment:8 by , 11 years ago
We just encountered this issue. For anyone else coming across this ticket: the deadlock is still present in 1.46.1 but fixed in 1.47.
comment:9 by , 11 years ago
Found this in 1.43 too. Sadly the patch is against a version that is very different from 1.43, so we are going to have to figure out when we can upgrade to 1.47.
comment:10 by , 9 years ago
The same bug is still present (I assume the GCC headers come from Boost, so this one is related to "#include <condition_variable>" when compiling with the -std=c++11 flag). It is causing a deadlock in my thread pool.
If anyone still has the same problem, a temporary fix is to use "condition_variable_any" instead; the "any" version seems to work. (I still see no reason to maintain a specialized version just for unique_lock... over-engineering?)
I'd like to post the link to the discussion on GitHub, but spam filters blocked it.
Example code