Opened 12 years ago

Closed 10 years ago

#4882 closed Bugs (fixed)

Win32 shared_mutex does not handle timeouts correctly

Reported by: martin.jerabek@… Owned by: viboes
Milestone: Boost 1.54.0 Component: thread
Version: Boost 1.45.0 Severity: Problem
Keywords: Cc:

Description

I have identified two problems with the Win32 implementation of boost::shared_mutex in case of timeouts:

Imagine the following scenario:

  1. Three threads A, B, C compete for a shared_mutex in exclusive mode with timeouts (timed_lock).
  2. Thread A acquires the mutex. Thread B and C wait for it in WaitForMultipleObjects, so exclusive_waiting is 2.
  3. Thread A decides to release the mutex at the same time that thread B returns from WaitForMultipleObjects with a timeout.
    1. Thread A still sees exclusive_waiting == 2 because thread B has not yet adjusted the mutex state. It releases the mutex (exclusive = false), decrements exclusive_waiting to 1 and releases the unlock_sem and exclusive_sem.
    2. Thread B sees that no threads own the mutex for shared access (shared_count == 0) and exclusive is false. So it sets exclusive to true and now owns the mutex. Notice that it does not do anything with exclusive_waiting, so it is still 1.
    3. Thread C is woken up due to thread A releasing the semaphores. It loops back to retry to get the mutex but this fails because the mutex is already owned by thread B. However, it still increments exclusive_waiting to 2.

After this process, exclusive_waiting is 2 although only one thread (C) is waiting for the mutex. If this is repeated often enough, timed_lock() will throw a lock_error because exclusive_waiting reached 127 and wrapped to 0. The attached program triggers this problem by simply starting 20 threads which compete for the shared_mutex with a timeout in a loop. On my dual-core 3GHz PC, the exception is thrown in less than a minute.

There is another similar failure mode:

  1. Two threads A and B compete for a shared_mutex in exclusive mode with timeouts (timed_lock).
  2. Thread A owns the mutex and thread B waits for it in WaitForMultipleObjects (exclusive_waiting == 1).
  3. Again thread A releases the mutex at the same time that thread B returns from WaitForMultipleObjects with a timeout.
    1. Thread A again still sees exclusive_waiting == 1, so it releases the mutex, decrements exclusive_waiting to zero, and releases the semaphores.
    2. Thread B again sees that the mutex is available and acquires it without waiting for the semaphores because it already returned from WaitForMultipleObjects with a timeout.
  4. Thread A tries to acquire the mutex again in exclusive mode. It sees that the mutex is locked, increments exclusive_waiting to 1 and calls WaitForMultipleObjects to wait for thread B to release the mutex.
  5. However, the semaphores are still released from the time thread A released the mutex before but thread B never decremented them because it has already returned from WaitForMultipleObjects with a timeout.
  6. This means that thread A returns immediately from WaitForMultipleObjects without a timeout. It tries again to acquire the mutex which is still locked by thread B, and it increments again exclusive_waiting which is now 2.

This is again a bug because exclusive_waiting is 2 although only thread A waits for the mutex.

To be honest, I do not know how to fix this bug. I think the basic problem is that changing the mutex state and releasing the semaphores is not a single atomic operation. A thread which returned from WaitForMultipleObjects with a timeout cannot know whether another thread decremented exclusive_waiting for it, so it does not know whether or not it has to decrement exclusive_waiting itself to restore the mutex state.

Attachments (1)

shrmut.cpp (923 bytes ) - added by martin.jerabek@… 12 years ago.
Sample program to reproduce the problem of increasing exclusive_waiting

Download all attachments as: .zip

Change History (15)

by martin.jerabek@…, 12 years ago

Attachment: shrmut.cpp added

Sample program to reproduce the problem of increasing exclusive_waiting

comment:1 by anonymous, 12 years ago

Any update on this issue? This is an extremely annoying bug.

comment:2 by viboes, 11 years ago

Owner: changed from Anthony Williams to viboes
Status: newassigned

comment:3 by viboes, 11 years ago

Milestone: To Be DeterminedBoost 1.49.0
Resolution: fixed
Status: assignedclosed

Closed as the test is passing now on trunk regression.

comment:4 by viboes, 11 years ago

Resolution: fixed
Status: closedreopened

Let close it when merged to release

comment:5 by viboes, 10 years ago

Milestone: Boost 1.49.0To Be Determined

comment:6 by viboes, 10 years ago

Hi, since 1.50 there is the possibility to use a common implementation for shared_mutex defining BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN or BOOST_THREAD_VERSION=3.

Please, could you tell me if this solves the issue?

comment:7 by viboes, 10 years ago

Milestone: To Be Determined
Resolution: wontfix
Status: reopenedclosed

Please reopen if it is present while defining BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN or BOOST_THREAD_VERSION=3.

comment:8 by viboes, 10 years ago

Milestone: To Be Determined
Resolution: wontfix
Status: closedreopened

The problem is still there on MacOs/clang/trunk as show the regression test

Test output: Sandia-darwin-clang-trunk-c++11 - thread - test_4882_lib / clang-darwin-trunk
Rev 81145 / Fri, 2 Nov 2012 10:21:52 +0000
Report Time: Fri, 2 Nov 2012 17:47:25 +0000

Compile [2012-11-02 10:57:51 UTC]: succeed

"/scratch/kbelco/llvm_darwin/build/Release+Asserts/bin/clang++" -x c++ -isystem /scratch/kbelco/llvm_darwin/libcxx/include -U__STRICT_ANSI__ -O0 -g -Wextra -Wno-long-long -pedantic -std=c++11 -stdlib=libc++ -O0 -fno-inline -Wall -pedantic -g -DBOOST_ALL_NO_LIB=1 -DBOOST_CHRONO_STATIC_LINK=1 -DBOOST_SYSTEM_NO_DEPRECATED -DBOOST_SYSTEM_STATIC_LINK=1 -DBOOST_THREAD_BUILD_LIB=1 -DBOOST_THREAD_POSIX -DBOOST_THREAD_THROW_IF_PRECONDITION_NOT_SATISFIED -DBOOST_THREAD_USE_LIB=1 -I".." -c -o "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/test_4882.o" "../libs/thread/test/test_4882.cpp"

Link [2012-11-02 10:57:52 UTC]: succeed

"/scratch/kbelco/llvm_darwin/build/Release+Asserts/bin/clang++" -stdlib=libc++ -stdlib=libc++  -o "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/test_4882_lib" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/test_4882.o" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/test/test_4882_lib.test/clang-darwin-trunk/debug/threading-multi/tss_null.o" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/chrono/build/clang-darwin-trunk/debug/link-static/threading-multi/libboost_chrono.a" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/thread/build/clang-darwin-trunk/debug/link-static/threading-multi/libboost_thread.a" "/Volumes/scratch/kbelco/boost/results/boost/bin.v2/libs/system/build/clang-darwin-trunk/debug/link-static/threading-multi/libboost_system.a"    -g -L/scratch/kbelco/llvm_darwin/libcxx/lib

Run [2012-11-02 10:57:52 UTC]: fail

../libs/thread/test/test_4882.cpp:48
...
../libs/thread/test/test_4882.cpp:30../libs/thread/test/test_4882.cpp:27

EXIT STATUS: 139

comment:9 by viboes, 10 years ago

Cc: viboes added

comment:10 by viboes, 10 years ago

Cc: viboes removed

comment:11 by viboes, 10 years ago

Milestone: To Be Determined
Resolution: wontfix
Status: reopenedclosed

comment:12 by viboes, 10 years ago

Resolution: wontfix
Status: closedreopened

Reopened as BOOST_THREAD_PROVIDES_GENERIC_SHARED_MUTEX_ON_WIN will not be defined by default and as #7906 states there is a lost in performances when this define is defined.

comment:13 by viboes, 10 years ago

Milestone: Boost 1.54.0

comment:14 by viboes, 10 years ago

Resolution: fixed
Status: reopenedclosed

Committed revision [83525].

Note: See TracTickets for help on using tickets.