Opened 10 years ago

Closed 10 years ago

#7571 closed Bugs (wontfix)

Mutex fails to unlock and causes deadlock

Reported by: yalon-l@…
Owned by: Anthony Williams
Milestone: To Be Determined
Component: thread
Version: Boost 1.51.0
Severity: Problem
Keywords: mutex deadlock VC7.1
Cc:

Description

I'm using boost::mutex with a scoped_lock to lock and unlock it. Once a thread locks the mutex, it is never unlocked, even after the lock goes out of scope and even after the thread terminates. The result is a deadlock.

The problem appears only when the operation the thread performs is long enough to allow a context switch between the threads.

The problem appears on Visual Studio 2003 (VC7.1).
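
For readers without the attachment, the pattern described above looks roughly like the following. This is only a minimal sketch, not the attached Main.cpp: it uses std::mutex and std::lock_guard as stand-ins for boost::mutex and boost::mutex::scoped_lock, and the names g_mutex, g_completed, worker and run_workers are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for boost::mutex (assumption: the report uses the Boost type).
std::mutex g_mutex;
// Number of finished iterations across all threads, protected by g_mutex.
long g_completed = 0;

// Each iteration takes the lock, spins for max_counts steps while holding it,
// and relies on the guard's destructor to release the lock at the end of scope.
void worker(int iterations, long max_counts) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(g_mutex);  // plays the role of scoped_lock
        volatile long sink = 0;                     // keeps the loop from being optimized away
        for (long n = 0; n < max_counts; ++n) {
            sink = sink + 1;
        }
        ++g_completed;
    }  // lock is released here on every iteration
}

// Starts the workers, waits for all of them, and returns the iteration count.
long run_workers(int threads, int iterations, long max_counts) {
    g_completed = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(worker, iterations, max_counts);
    }
    for (auto& th : pool) {
        th.join();
    }
    return g_completed;
}
```

With a correctly working mutex, run_workers(2, 5, 5000000) is expected to return once both threads have completed their five iterations, and the mutex can be locked again afterwards.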

Attachments (2)

Main.cpp (1.9 KB) - added by yalon-l@… 10 years ago.
Bug demonstrator
Tests.exe (204.0 KB) - added by yalon-l@… 10 years ago.
Executable of the bug demo

Change History (9)

by yalon-l@…, 10 years ago

Attachment: Main.cpp added

Bug demonstrator

comment:1 by yalon-l@…, 10 years ago

Results when running the attached demo file:

D:\Projects\BoostMutexBug\BoostMutexBug\Tests\Debug>Tests 5000000
Thread 1, iteration 0
Thread 2, iteration 0
Thread 1, iteration 1
Thread 1, iteration 2
Thread 1, iteration 3
Thread 1, iteration 4
Deadlock occurred

D:\Projects\BoostMutexBug\BoostMutexBug\Tests\Debug>Tests 100
Thread 1, iteration 0
Thread 1, iteration 1
Thread 2, iteration 0
Thread 1, iteration 2
Thread 2, iteration 1
Thread 2, iteration 2
Thread 2, iteration 3
Thread 1, iteration 3
Thread 1, iteration 4
Thread 2, iteration 4

comment:2 by viboes, 10 years ago

Resolution: invalid
Status: new → closed

The problem is in your example. The timeout should depend on MAX_COUNTS. When MAX_COUNTS is 5000000, it seems that 10000 milliseconds is not enough to complete the loop :(
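
One way to make the timeout depend on the amount of work is sketched below. It is an illustration under stated assumptions, not code from the attachment: std::async and std::future::wait_for stand in for a Boost thread and its timed join, the per-step cost of 10 ns and the fixed 1000 ms margin are assumed values, and timeout_for and finishes_in_time are hypothetical names.

```cpp
#include <chrono>
#include <future>

// Budget that grows linearly with the work: a fixed margin plus an assumed
// cost per loop step. Both constants are illustrative, not measured.
std::chrono::milliseconds timeout_for(long max_counts, int iterations) {
    const long long ns_per_count = 10;   // assumed generous per-step cost
    const long long margin_ms = 1000;    // assumed fixed safety margin
    const long long total_ns = ns_per_count * max_counts * iterations;
    return std::chrono::milliseconds(margin_ms + total_ns / 1000000);
}

// Runs the busy loop on another thread and reports whether it finished
// within the scaled budget (the analogue of a timed join succeeding).
bool finishes_in_time(long max_counts, int iterations) {
    std::future<void> done = std::async(std::launch::async, [=] {
        volatile long sink = 0;          // keeps the loop from being optimized away
        for (int i = 0; i < iterations; ++i) {
            for (long n = 0; n < max_counts; ++n) {
                sink = sink + 1;
            }
        }
    });
    return done.wait_for(timeout_for(max_counts, iterations)) ==
           std::future_status::ready;
}
```

If the wait still times out with a budget scaled this way, the fixed timeout is unlikely to be the cause.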

by yalon-l@…, 10 years ago

Attachment: Tests.exe added

Executable of the bug demo

comment:3 by anonymous, 10 years ago

I don't think that the problem is in the example. The 10000 milliseconds is about 10 times the needed duration. I'm attaching the executable so one can see the messages written and then a long pause (of about 9 sec), after which the deadlock is detected. I can of course increase the 10000, but that does nothing except lengthen the wait until the program detects the deadlock.

I forgot to mention that I'm running on Windows 7.

In my experiments, I replaced boost::mutex with a mutex of my own, based on the native Windows CreateMutex. With it the problem doesn't show and the deadlock never occurs.

comment:4 by yalon-l@…, 10 years ago

Resolution: invalid
Status: closed → reopened

comment:5 by viboes, 10 years ago

I can't reproduce the deadlock on Mac OS, Ubuntu, or Windows XP.

Could you replace the timed join with join and report what happens? Also, could you please try to debug this issue on your side?

comment:6 by Yalon Lotan <yalon-l@…>, 10 years ago

Replacing the timed join by join didn't change a thing, except that the program now hangs forever instead of announcing that it had detected a deadlock... I'm surprised that you didn't manage to reproduce the bug on Windows XP. Did you try building it or using the attached exe? Running the exe on XP reproduced the bug for me. The same applies to building and running it on XP.

If you build it, make sure to use Visual Studio 2003. I gave the code to a colleague with Visual Studio 2005, and with that compiler the problem didn't show.

I can debug the problem on my side. What information would be useful for you?

comment:7 by viboes, 10 years ago

Resolution: wontfix
Severity: Showstopper → Problem
Status: reopened → closed

I have added the test to the regression tests. For the time being, no compiler is signaling a failure.

VS 2003 is too old to maintain now, in 2012.

As for debugging, I'm not waiting for any specific information other than "Hey, I have found what is happening" or, better yet, "and I've found a workaround" ;-)

No, seriously, I don't have the possibility or the time to debug this issue, but any patch is welcome.
