Opened 12 years ago
Closed 10 years ago
#4882 closed Bugs (fixed)
Win32 shared_mutex does not handle timeouts correctly
Reported by: | Owned by: | viboes | |
---|---|---|---|
Milestone: | Boost 1.54.0 | Component: | thread |
Version: | Boost 1.45.0 | Severity: | Problem |
Keywords: | Cc: |
Description
I have identified two problems with the Win32 implementation of boost::shared_mutex in case of timeouts:
Imagine the following scenario:
- Three threads A, B, C compete for a shared_mutex in exclusive mode with timeouts (timed_lock).
- Thread A acquires the mutex. Thread B and C wait for it in WaitForMultipleObjects, so exclusive_waiting is 2.
- Thread A decides to release the mutex at the same time that thread B returns from WaitForMultipleObjects with a timeout.
- Thread A still sees exclusive_waiting == 2 because thread B has not yet adjusted the mutex state. It releases the mutex (exclusive = false), decrements exclusive_waiting to 1 and releases the unlock_sem and exclusive_sem.
- Thread B sees that no threads own the mutex for shared access (shared_count == 0) and exclusive is false. So it sets exclusive to true and now owns the mutex. Notice that it does not do anything with exclusive_waiting, so it is still 1.
- Thread C is woken up due to thread A releasing the semaphores. It loops back to retry to get the mutex but this fails because the mutex is already owned by thread B. However, it still increments exclusive_waiting to 2.
After this process, exclusive_waiting is 2 although only one thread (C) is waiting for the mutex. If this is repeated often enough, timed_lock() will throw a lock_error because exclusive_waiting reached 127 and wrapped to 0. The attached program triggers this problem by simply starting 20 threads which compete for the shared_mutex with a timeout in a loop. On my dual-core 3GHz PC, the exception is thrown in less than a minute.
There is another similar failure mode:
- Two threads A and B compete for a shared_mutex in exclusive mode with timeouts (timed_lock).
- Thread A owns the mutex and thread B waits for it in WaitForMultipleObjects (exclusive_waiting == 1).
- Again thread A releases the mutex at the same time that thread B returns from WaitForMultipleObjects with a timeout.
- Thread A again still sees exclusive_waiting == 1, so it releases the mutex, decrements exclusive_waiting to zero, and releases the semaphores.
- Thread B again sees that the mutex is available and acquires it without waiting for the semaphores because it already returned from WaitForMultipleObjects with a timeout.
- Thread A tries to acquire the mutex again in exclusive mode. It sees that the mutex is locked, increments exclusive_waiting to 1 and calls WaitForMultipleObjects to wait for thread B to release the mutex.
- However, the semaphores are still released from the time thread A released the mutex before but thread B never decremented them because it has already returned from WaitForMultipleObjects with a timeout.
- This means that thread A returns immediately from WaitForMultipleObjects without a timeout. It tries again to acquire the mutex which is still locked by thread B, and it increments again exclusive_waiting which is now 2.
This is again a bug because exclusive_waiting is 2 although only thread A waits for the mutex.
To be honest, I do not know how to fix this bug. I think the basic problem is that changing the mutex state and releasing the semaphores is not a single atomic operation. A thread which returned from WaitForMultipleObjects with a timeout cannot know whether another thread decremented exclusive_waiting for it, so it does not know whether or not it has to decrement exclusive_waiting itself to restore the mutex state.
Attachments (1)
Change History (15)
by , 12 years ago
Attachment: | shrmut.cpp added |
---|
comment:2 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 11 years ago
Milestone: | To Be Determined → Boost 1.49.0 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Closed as the test is passing now on trunk regression.
comment:4 by , 11 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Let close it when merged to release
comment:5 by , 10 years ago
Milestone: | Boost 1.49.0 → To Be Determined |
---|
comment:6 by , 10 years ago
Hi, since 1.50 there is the possibility to use a common implementation for shared_mutex defining BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN or BOOST_THREAD_VERSION=3.
Please, could you tell me if this solves the issue?
comment:7 by , 10 years ago
Milestone: | To Be Determined |
---|---|
Resolution: | → wontfix |
Status: | reopened → closed |
Please reopen if it is present while defining BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN or BOOST_THREAD_VERSION=3.
comment:8 by , 10 years ago
Milestone: | → To Be Determined |
---|---|
Resolution: | wontfix |
Status: | closed → reopened |
The problem is still there on MacOs/clang/trunk as show the regression test
Test output: Sandia-darwin-clang-trunk-c++11 - thread - test_4882_lib / clang-darwin-trunk Rev 81145 / Fri, 2 Nov 2012 10:21:52 +0000 Report Time: Fri, 2 Nov 2012 17:47:25 +0000 Compile [2012-11-02 10:57:51 UTC]: succeed "/scratch/kbelco/llvm_darwin/build/Release+Asserts/bin/clang++" -x c++ -isystem /scratch/kbelco/llvm_darwin/libcxx/include -U__STRICT_ANSI__ -O0 -g -Wextra -Wno-long-long -pedantic -std=c++11 -stdlib=libc++ -O0 -fno-inline -Wall -pedantic -g -DBOOST_ALL_NO_LIB=1 -DBOOST_CHRONO_STATIC_LINK=1 -DBOOST_SYSTEM_NO_DEPRECATED -DBOOST_SYSTEM_STATIC_LINK=1 -DBOOST_THREAD_BUILD_LIB=1 -DBOOST_THREAD_POSIX -DBOOST_THREAD_THROW_IF_PRECONDITION_NOT_SATISFIED -DBOOST_THREAD_USE_LIB=1 -I".." -c -o "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/test_4882.o" "../libs/thread/test/test_4882.cpp" Link [2012-11-02 10:57:52 UTC]: succeed "/scratch/kbelco/llvm_darwin/build/Release+Asserts/bin/clang++" -stdlib=libc++ -stdlib=libc++ -o "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/test_4882_lib" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/test_4882.o" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/tss_null.o" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/chrono/build/clang-darwin-trunk/debug/link-static/threading-multi/libboost_chrono.a" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/build/clang-darwin-trunk/debug/link-static/threading-multi/libboost_thread.a" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/system/build/clang-darwin-trunk/debug/link-static/threading-multi/libboost_system.a" -g -L/scratch/kbelco/llvm_darwin/libcxx/lib Run [2012-11-02 10:57:52 UTC]: fail ../libs/thread/test/test_4882.cpp:48 ... ../libs/thread/test/test_4882.cpp:30../libs/thread/test/test_4882.cpp:27 EXIT STATUS: 139
comment:9 by , 10 years ago
Cc: | added |
---|
comment:10 by , 10 years ago
Cc: | removed |
---|
comment:11 by , 10 years ago
Milestone: | To Be Determined |
---|---|
Resolution: | → wontfix |
Status: | reopened → closed |
comment:12 by , 10 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Reopened as BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN will not be defined by default and as #7906 states there is a lost in performances when this define is defined.
comment:13 by , 10 years ago
Milestone: | → Boost 1.54.0 |
---|
comment:14 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Committed revision [83525].
Sample program to reproduce the problem of increasing exclusive_waiting