Opened 11 years ago

Closed 11 years ago

#5727 closed Bugs (invalid)

race condition between ~basic_condition_variable() and notify_all()

Reported by: sbear.powell@…
Owned by: viboes
Milestone: To Be Determined
Component: thread
Version: Boost 1.46.1
Severity: Problem
Keywords:
Cc: viboes

Description

I was writing a quick parfor function when I found this. I think there's a race between ~basic_condition_variable() and notify_all() (notify_one() hasn't failed for me). The following code fails an assertion on my platform (32-bit MSVC 2010):

#include <boost/thread.hpp>
using namespace boost;

void foo() {
    condition_variable notifier;
    mutex m, m2;
    mutex::scoped_lock lock(m);
    bool done = false;
    thread t([&]{
        mutex::scoped_lock lock(m);
        done = true;
        lock.unlock();
        notifier.notify_all();      // may still be running when foo() returns
    });
    while(!done) notifier.wait(lock);
}                                   // notifier is destroyed here, racing with notify_all()
"Assertion failed: CloseHandle(handle_to_manage), file c:\boost_1_46_1\boost\thread\win32\thread_primitives.hpp, line 237"

A more detailed stack trace shows the failed assertion comes from the worker thread at:

boost::detail::win32::handle_manager::cleanup() boost\thread\win32\thread_primitives.hpp 237: BOOST_VERIFY(CloseHandle(handle_to_manage));
boost::detail::win32::handle_manager::operator=(void * new_handle=0x00000000)  boost\thread\win32\thread_primitives.hpp 251: cleanup();
boost::detail::basic_condition_variable::notify_all()  boost\thread\win32\condition_variable.hpp 288: wake_sem=detail::win32::handle(0);
notifier.notify_all();
...

My workaround is to add a second mutex to keep foo() from returning before notify_all() has returned:

void foo() {
    condition_variable notifier;
    mutex m, m2;
    mutex::scoped_lock lock(m);
    bool done = false;
    thread t([&]{
        mutex::scoped_lock lock(m);
        mutex::scoped_lock lock2(m2);   // held until notify_all() has returned
        done = true;
        lock.unlock();
        notifier.notify_all();
    });
    while(!done) notifier.wait(lock);
    mutex::scoped_lock lock2(m2);       // blocks until the worker releases m2,
                                        // so foo() can't return mid-notify_all()
}

I think (I haven't tested this) that the real fix is to lock basic_condition_variable's internal_mutex in ~basic_condition_variable() so that it blocks until any concurrently running member functions have finished. (An educated guess... this is the first time I've looked at the guts of Boost.Thread.)
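
A minimal sketch of that idea (untested; notifier_like and its internals are hypothetical stand-ins, not Boost's actual basic_condition_variable):

#include <boost/thread/mutex.hpp>
#include <boost/thread/locks.hpp>

// Illustration only: a condition-variable-like class whose destructor takes the
// same internal mutex that notify_all() holds, so destruction waits for any
// in-flight notification to finish.
class notifier_like {
    boost::mutex internal_mutex;
public:
    void notify_all() {
        boost::lock_guard<boost::mutex> guard(internal_mutex);
        // ... wake waiters, reset the wake semaphore, etc. ...
    }
    ~notifier_like() {
        // Blocks until a concurrent notify_all() has released internal_mutex,
        // so its critical section never touches destroyed state.
        boost::lock_guard<boost::mutex> guard(internal_mutex);
    }
};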

Change History (8)

comment:1 by viboes, 11 years ago

I'm surely misunderstanding your code

void foo() {
    condition_variable notifier;
    mutex m, m2;
    mutex::scoped_lock lock(m); //1
    bool done=false;
    thread t([&]{
        mutex::scoped_lock lock(m); //2
        done = true;
        lock.unlock();
        notifier.notify_all();
    });
    while(!done) notifier.wait(lock); //3
}

Doesn't this code deadlock? At //1 foo owns the lock m. At //2 the lambda thread is blocked until the lock variable exits its scope (at the end of foo). At //3, since done is false, the foo thread waits on notifier, which cannot be notified by the lambda because it is blocked.

What am I missing?

comment:2 by viboes, 11 years ago

Cc: viboes added
Owner: changed from Anthony Williams to viboes
Status: new → assigned

in reply to: 1; comment:3 by sbear.powell@…, 11 years ago

That's mostly correct, but it won't deadlock because notifier.wait(lock) will unlock foo's lock while it's waiting.

So: In 1, foo owns m. In 2, the lambda thread will block until m is available. In 3, done is false, so notifier.wait(lock) is called--releasing ownership of m. The lambda proceeds, setting done to true, unlocking m, and notifying any waiting threads (i.e. foo). foo returns from its call to notifier.wait(), checks done (which is now true) and can return.

(If you want experimental proof that it doesn't deadlock, try running my workaround code... the logic is pretty much the same)

in reply to: 3; comment:4 by viboes, 11 years ago

Replying to sbear.powell@…:

That's mostly correct, but it won't deadlock because notifier.wait(lock) will unlock foo's lock while it's waiting.

So: In 1, foo owns m. In 2, the lambda thread will block until m is available. In 3, done is false, so notifier.wait(lock) is called--releasing ownership of m. The lambda proceeds, setting done to true, unlocking m, and notifying any waiting threads (i.e. foo). foo returns from its call to notifier.wait(), checks done (which is now true) and can return.

You are right. I always forget that wait unlocks the lock.

comment:5 by viboes, 11 years ago

I think that Boost.Thread cannot do anything here. You are sharing the variable notifier, allocated on the stack, between two threads. When the function foo returns, the lambda thread will continue using a destroyed instance.

Note that the same issue will occur with any shared variable. IMO, it is up to the application to manage this kind of error.

Let me know if you share my point of view.

in reply to: 5; comment:6 by sbear.powell@…, 11 years ago

Replying to viboes:

I think that Boost.Thread cannot do anything here. You are sharing the variable notifier, allocated on the stack, between two threads. When the function foo returns, the lambda thread will continue using a destroyed instance.

The only potential fix I can think of on Boost.Thread's side would be to lock the condition_variable's internal mutex in its destructor--that way nothing will be destroyed until after notify_all() is out of its critical section.

Note that the same issue will occur with any shared variable. IMO, it is up to the application to manage this kind of error.

Right. Is there any general rule of thumb on thread-safety of destructors? What do the other Boost.Thread classes do in regards to that?

Let me know if you share my point of view.

Yeah, after some thought I agree with you--it's not really a bug in Boost.Thread, unless the rest of the Boost.Thread library keeps destructors thread-safe. In the end, it's not so hard to add the second mutex, or to just stick t.join() before foo() returns.
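
For reference, a minimal sketch of the join-based fix (untested; assumes the same setup as the original snippet, i.e. Boost.Thread and C++11 lambdas):

#include <boost/thread.hpp>
using namespace boost;

void foo() {
    condition_variable notifier;
    mutex m;
    mutex::scoped_lock lock(m);
    bool done = false;
    thread t([&]{
        mutex::scoped_lock lock(m);
        done = true;
        lock.unlock();
        notifier.notify_all();
    });
    while (!done) notifier.wait(lock);
    t.join();   // foo() cannot return until notify_all() has completed,
                // so notifier stays alive for the whole call
}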

in reply to: 6; comment:7 by viboes, 11 years ago

Replying to sbear.powell@…:

Replying to viboes:

I think that Boost.Thread cannot do anything here. You are sharing the variable notifier, allocated on the stack, between two threads. When the function foo returns, the lambda thread will continue using a destroyed instance.

The only potential fix I can think of on Boost.Thread's side would be to lock the condition_variable's internal mutex in its destructor--that way nothing will be destroyed until after notify_all() is out of its critical section.

Maybe I'm wrong, but I think this goes against the philosophy of "don't pay for what you don't use".

Note that the same issue will occur with any shared variable. IMO, it is up to the application to manage this kind of error.

Right. Is there any general rule of thumb on thread-safety of destructors? What do the other Boost.Thread classes do in regards to that?

AFAIK no destructor takes care of this kind of thread-safety. If you want, you can start a thread on the ML so others can give their advice.

Let me know if you share my point of view.

Yeah, after some thought I agree with you--it's not really a bug in Boost.Thread, unless the rest of the Boost.Thread library keeps destructors thread-safe. In the end, it's not so hard to add the second mutex, or to just stick t.join() before foo() returns.

Yes, the user has everything needed to prevent the problem.

comment:8 by viboes, 11 years ago

Resolution: invalid
Status: assigned → closed