Opened 13 years ago
Closed 12 years ago
#3448 closed Bugs (fixed)
interprocess_condition (emulated) can exit with inconsistent m_num_waiters value
Reported by: | | Owned by: | Ion Gaztañaga |
---|---|---|---|
Milestone: | Boost 1.45.0 | Component: | interprocess |
Version: | Boost 1.40.0 | Severity: | Problem |
Keywords: | interprocess_condition | Cc: |
Description
I describe this from the point of view of the 1.39.0 source code, but the problem still exists in the boost development trunk as of today.
Bug:
There is a set of conditions where a process can manage to enter do_timed_wait, increment m_num_waiters, and exit without decrementing it.
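The shape of the leak can be sketched in a few lines. This is a simplified model, not the Boost source: `ConditionModel`, `timed_lock_model`, `wait_buggy`, `wait_fixed` and `waiters_after_expired_wait` are hypothetical names, and the model assumes only that the timed lock gives up immediately once the deadline has passed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the emulated condition's shared waiter count.
struct ConditionModel {
    std::atomic<unsigned> num_waiters{0};
};

// Assumption: models only a deadline check. The timed lock refuses
// immediately when the deadline has already passed.
bool timed_lock_model(Clock::time_point abs_time) {
    return Clock::now() < abs_time;
}

// Buggy shape: the timeout path returns without undoing the increment.
bool wait_buggy(ConditionModel& c, Clock::time_point abs_time) {
    c.num_waiters.fetch_add(1);
    // ... the waiter is notified and leaves its wait loop here ...
    if (!timed_lock_model(abs_time)) {
        return false;  // count is left one too high
    }
    c.num_waiters.fetch_sub(1);
    return true;
}

// Corrected shape: every exit path decrements the count exactly once.
bool wait_fixed(ConditionModel& c, Clock::time_point abs_time) {
    c.num_waiters.fetch_add(1);
    // ... the waiter is notified and leaves its wait loop here ...
    if (!timed_lock_model(abs_time)) {
        c.num_waiters.fetch_sub(1);  // undo the increment before bailing out
        return false;
    }
    c.num_waiters.fetch_sub(1);
    return true;
}

// Runs one wait whose deadline has already expired and reports how many
// waiters the model still believes are present afterwards.
unsigned waiters_after_expired_wait(bool fixed) {
    ConditionModel c;
    const auto expired = Clock::now() - std::chrono::milliseconds(1);
    const bool ok = fixed ? wait_fixed(c, expired) : wait_buggy(c, expired);
    assert(!ok);  // both variants report a timeout
    return c.num_waiters.load();
}
```

In this model, `waiters_after_expired_wait(false)` leaves one phantom waiter behind, while `waiters_after_expired_wait(true)` leaves none. A notifier that waits for every counted waiter to check in would wait forever on the phantom one.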
Boost 1.39.0
Sequence of events:
We join our hero, Process A (P_A), in boost/interprocess/sync/emulation/interprocess_condition.hpp.
P_A is executing a do_timed_wait(true, lock, abs_time) call, and is spinning at the while loop at line 124.
tout_enabled == true, and abs_time is a microsecond in the future (about to expire but hasn't yet).
Process B, P_A's trusty sidekick, sends a notify_all on the condition, breaking P_A out of the while loop at line 124.
abs_time arrives (i.e., P_A got to line 149 with microsec_clock::universal_time() >= abs_time and timed_out = false).
With these conditions, P_A gets to line 163 and calls the constructor for scoped_lock.
P_A jumps to boost/interprocess/sync/scoped_lock.hpp line 114.
P_A executes mp_mutex->timed_lock(abs_time) at line 115.
P_A jumps to boost/interprocess/sync/emulation/interprocess_condition.hpp line 49.
P_A takes a reading of now at line 56.
P_A finds that (now >= abs_time) at line 58 and is sent packing with a return value of false.
P_A arrives back in boost/interprocess/sync/emulation/interprocess_condition.hpp on line 163.
P_A gets to line 171 and finds lock is false. He panics! He sets timed_out to true and unlock_enter_mut to true, but in his haste to break out of evil Dr. while(1)'s clutches, he forgot to atomically decrement m_num_waiters!
Maniacal laughter can be heard behind him as he tries in vain to acquire the lock on line 214.
"You fool! You fell into my trap!", shouts Dr. while(1). "Process B grabbed that very lock and attempted to free you again! He is at line 56 of this very header file, waiting for a call from you that will never come, and he's holding your precious lock! Your deadlock is complete! HAHAHAHAHAHAH!!"
Change History (2)
comment:1 by , 13 years ago
Version: | Boost 1.39.0 → Boost 1.40.0 |
---|
Confirmed to still exist in Boost 1.40. My test case: 5 threads timed_send'ing ~1024-byte messages into a message_queue with que_size == 1 and msg_size == 1024. Each thread is set up to send every 100 ms, with a 100 ms timeout.
Windows XP, 4-core processor, ~4 GB RAM
The threads all lock up within 60 seconds of startup.
Based on the description above, I added this at line 174 of boost/interprocess/sync/emulation/interprocess_condition.hpp:
It solved the problem.
comment:2 by , 12 years ago
Milestone: | Boost 1.41.0 → Boost-1.45.0 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed for Boost 1.45 in release branch