Opened 14 years ago

Closed 14 years ago

#1951 closed Bugs (fixed)

boost::interprocess::named_condition::do_wait() releases mutex prematurely, may miss notification

Reported by: Stas Maximov <smaximov@…>
Owned by: Ion Gaztañaga
Milestone: Boost 1.36.0
Component: interprocess
Version: Boost 1.35.0
Severity: Showstopper
Keywords: named_condition_variable interprocess sync mutex condition_variable
Cc:

Description

Implementation of boost::interprocess::named_condition::do_wait() is shown below:

    template <class Lock>
    void do_wait(Lock& lock)
    {
       lock_inverter<Lock> inverted_lock(lock);
       // unlock internal first to avoid deadlock with near simultaneous waits
       scoped_lock<lock_inverter<Lock> > external_unlock(inverted_lock);
       scoped_lock<interprocess_mutex> internal_lock(*this->mutex());
       this->condition()->wait(internal_lock);
    }

The lock argument is the mutex associated with the condition variable and must be released only inside wait(). This implementation releases the lock before it enters wait(). Because the lock is released before wait() is entered, the waiting thread may be pre-empted by a notifier thread just after the lock has been released but before wait() has been entered. In that case the waiter misses the notification.

Having a mutex associated with the condition variable is a fundamental requirement. Using another mutex (internal_lock in this case) that is not known to the notifier is not a solution. The receiver and the notifier must both use the same CV *and* MX.
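The requirement above can be illustrated with the standard library rather than Boost.Interprocess. The following is a minimal sketch, not the library's code: the `channel` struct and the `receive`, `notify` and `run_once` functions are hypothetical names introduced only for this illustration. The point it tries to show is that the receiver tests its predicate and blocks under the same mutex that the notifier holds while changing the predicate.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state: 'ready' is the predicate guarded by 'mx'.
struct channel {
    std::mutex mx;
    std::condition_variable cv;
    bool ready = false;
};

// Receiver: checks the predicate and blocks under the same mutex.
// cv.wait() releases mx atomically with blocking, so a notification
// cannot slip in between the predicate check and the wait.
void receive(channel& ch) {
    std::unique_lock<std::mutex> lock(ch.mx);
    while (!ch.ready)          // loop also absorbs spurious wakeups
        ch.cv.wait(lock);
    // wait() returns with mx re-acquired and the predicate true.
    assert(lock.owns_lock() && ch.ready);
}

// Notifier: changes the predicate under the same mutex, then notifies.
void notify(channel& ch) {
    {
        std::lock_guard<std::mutex> lock(ch.mx);
        ch.ready = true;
    }
    ch.cv.notify_one();
}

// Runs one receiver/notifier pair and reports whether the receiver woke up.
bool run_once() {
    channel ch;
    std::thread notifier([&ch] { notify(ch); });
    receive(ch);
    notifier.join();
    return ch.ready;
}
```

If the receiver instead dropped `mx` before calling `wait()`, the notifier could set `ready` and call `notify_one()` inside that gap, which is the race this ticket describes.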

This is likely a root cause of the hangs in named_condition_test on Fedora 9 running on a single-CPU system under vmware-server-1.0.5.

Attachments (1)

named_condition.hpp.patch (1.9 KB ) - added by Stas Maximov <smaximov@…> 14 years ago.

Change History (3)

by Stas Maximov <smaximov@…>, 14 years ago

Attachment: named_condition.hpp.patch added

comment:1 by Stas Maximov <smaximov@…>, 14 years ago

Update on the bug report.

The hang in the regression test named_condition_test has been tracked down to a missed notification caused by releasing the external mutex before the internal mutex has been acquired.

The attached patch fixes the problem by acquiring the internal mutex before releasing the external one. This eliminates the potential race with a notifier and thus avoids missed notifications.

As before, the internal mutex is released before the external mutex is re-acquired. This avoids a deadlock with another waiter on the same condition variable. Exception safety has been preserved.
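The ordering described in this comment can be sketched with standard-library types. This is not the attached patch and not Boost.Interprocess code: `two_mutex_condition`, `relocker` and `run_handshake` are hypothetical names, and `std::mutex` and `std::condition_variable` stand in for the interprocess primitives. The sketch assumes a condition built on an internal mutex/condition pair and an external mutex supplied by the caller.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a condition built on an internal mutex/condition pair.
class two_mutex_condition {
public:
    // Precondition: 'external' is locked by the calling thread.
    void wait(std::unique_lock<std::mutex>& external) {
        assert(external.owns_lock());
        // 1. Acquire the internal mutex while still holding the external one,
        //    so a notifier cannot get through notify_one() before we block.
        std::unique_lock<std::mutex> internal(internal_mx_);
        // 2. Release the external mutex. On scope exit (normal or exceptional)
        //    the guard releases the internal mutex first and only then
        //    re-acquires the external one, so two waiters cannot deadlock.
        relocker relock(external, internal);
        // 3. Block; wait() releases the internal mutex atomically with sleeping.
        cv_.wait(internal);
    }

    void notify_one() {
        std::lock_guard<std::mutex> guard(internal_mx_);
        cv_.notify_one();
    }

private:
    struct relocker {
        std::unique_lock<std::mutex>& external;
        std::unique_lock<std::mutex>& internal;
        relocker(std::unique_lock<std::mutex>& e, std::unique_lock<std::mutex>& i)
            : external(e), internal(i) { external.unlock(); }
        ~relocker() {
            if (internal.owns_lock()) internal.unlock();
            external.lock();
        }
    };

    std::mutex internal_mx_;
    std::condition_variable cv_;
};

// One waiter and one notifier sharing an external mutex and a predicate.
bool run_handshake() {
    std::mutex external_mx;
    two_mutex_condition cond;
    bool ready = false;

    std::thread notifier([&] {
        {
            std::lock_guard<std::mutex> guard(external_mx);
            ready = true;
        }
        cond.notify_one();
    });

    std::unique_lock<std::mutex> lock(external_mx);
    while (!ready)             // the caller still loops on its predicate
        cond.wait(lock);
    lock.unlock();
    notifier.join();
    return ready;
}
```

With this ordering, a notifier that sets the predicate after the waiter has released the external mutex still has to take the internal mutex inside `notify_one()`, and it cannot do so until the waiter is blocked in `cv_.wait()`.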

comment:2 by Ion Gaztañaga, 14 years ago

Resolution: fixed
Status: new → closed

Applied patch in revision 45814. Thanks for the report.
