Opened 13 years ago

Closed 12 years ago

#3448 closed Bugs (fixed)

interprocess_condition (emulated) can exit with inconsistent m_num_waiters value

Reported by: Zachariah L Young <zachariah.l.young@…>
Owned by: Ion Gaztañaga
Milestone: Boost 1.45.0
Component: interprocess
Version: Boost 1.40.0
Severity: Problem
Keywords: interprocess_condition
Cc:

Description

I describe this from the point of view of the 1.39.0 source code, but the problem still exists in the boost development trunk as of today.

Bug:

There is a set of conditions where a process can manage to enter do_timed_wait, increment m_num_waiters, and exit without decrementing it.

Boost 1.39.0

Sequence of events:

We join our hero, Process A (P_A), in boost/interprocess/sync/emulation/interprocess_condition.hpp.

P_A is executing a do_timed_wait(true, lock, abs_time) call, and is spinning at the while loop at line 124.

tout_enabled == true, and abs_time is a microsecond in the future (about to expire but hasn't yet).

Process B, P_A's trusty sidekick, calls notify_all on the condition, breaking P_A out of the while loop at line 124.

abs_time arrives (i.e., P_A got to line 149 with microsec_clock::universal_time() >= abs_time and timed_out == false).

With these conditions, P_A gets to line 163 and calls the constructor for scoped_lock.

P_A jumps to boost/interprocess/sync/scoped_lock.hpp line 114.

P_A executes mp_mutex->timed_lock(abs_time) at line 115.

P_A jumps to boost/interprocess/sync/emulation/interprocess_mutex.hpp line 49.

P_A takes a reading of now at line 56.

P_A finds that (now >= abs_time) at line 58 and is sent packing with a return value of false.

P_A arrives back in boost/interprocess/sync/emulation/interprocess_condition.hpp on line 163.

P_A gets to line 171 and finds lock is false. He panics! He sets timed_out to true and unlock_enter_mut to true, but in his haste to break out of evil Dr. while(1)'s clutches, he forgot to atomically decrement m_num_waiters!

Maniacal laughter can be heard behind him as he tries in vain to acquire the lock on line 214.

"You fool! You fell into my trap!", shouts Dr. while(1). "Process B grabbed that very lock and attempted to free you again! He is at line 56 of this very header file, waiting for a call from you that will never come, and he's holding your precious lock! Your deadlock is complete! HAHAHAHAHAHAH!!"
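Stripped of the drama, the sequence above can be sketched as a small model. This is not the Boost source: SpinMutexSketch, WaitTailModel, check_mut and finish_wait are hypothetical names, std::chrono::steady_clock stands in for microsec_clock::universal_time(), and std::atomic stands in for the detail::atomic_* helpers. It only illustrates two points: a timed_lock whose deadline has already passed fails even on a free mutex, and the failure path then returns without undoing the increment.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the emulated mutex: 0 = free, 1 = held.
struct SpinMutexSketch {
    std::atomic<std::uint32_t> state{0};

    // Returns true if the lock was acquired before abs_time.
    bool timed_lock(Clock::time_point abs_time) {
        // Early exit: an expired deadline fails even when the mutex is free.
        if (Clock::now() >= abs_time) return false;
        for (;;) {
            std::uint32_t expected = 0;
            if (state.compare_exchange_strong(expected, 1)) return true;
            if (Clock::now() >= abs_time) return false;
            std::this_thread::yield();  // give up the time slice, then retry
        }
    }

    void unlock() { state.store(0); }
};

// Hypothetical model of the tail of do_timed_wait, after the notification
// has released the waiter from its spin loop.
struct WaitTailModel {
    std::atomic<std::uint32_t> num_waiters{0};
    SpinMutexSketch check_mut;  // assumed name for the internal mutex

    // Returns true if the wait is reported as timed out.
    bool finish_wait(Clock::time_point abs_time, bool apply_fix) {
        num_waiters.fetch_add(1);  // models the increment on entering the wait
        if (!check_mut.timed_lock(abs_time)) {
            // Failure path from the ticket: without the fix, the counter
            // is left one too high when the function returns.
            if (apply_fix) num_waiters.fetch_sub(1);
            return true;
        }
        num_waiters.fetch_sub(1);  // normal, notified exit
        check_mut.unlock();
        return false;
    }
};
```

Calling finish_wait with a deadline in the past and apply_fix == false leaves num_waiters at 1 even though nobody holds check_mut. In the real code that stale count is what keeps the notifying process waiting.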

Change History (2)

comment:1 by anonymous, 13 years ago

Version: Boost 1.39.0 → Boost 1.40.0

Confirmed to still exist in Boost 1.40. My test case: 5 threads timed_send'ing ~1024 byte messages into a message_queue with que_size == 1 and msg_size == 1024. Each thread is set up to send every 100 ms, with a 100 ms timeout.

Windows XP, 4 core processor, ~4 GB RAM

The threads all lock up within 60 seconds of startup.

Based on the description above, I added this at line 174 of boost/interprocess/sync/emulation/interprocess_condition.hpp:

detail::atomic_dec32(const_cast<boost::uint32_t*>(&m_num_waiters));

It solved the problem.
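A one-line decrement covers this particular exit. As a general hedge against the same class of bug, the increment and decrement could be tied together with a scope guard, so that every exit path restores the count. The sketch below only illustrates that idea under assumptions: WaiterCountGuard and guarded_wait are hypothetical names, std::atomic replaces detail::atomic_inc32/atomic_dec32, and the real do_timed_wait may need the decrement at a specific point relative to its other locks, which a destructor at scope exit may not match.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical guard: increments on construction, decrements on destruction.
class WaiterCountGuard {
public:
    explicit WaiterCountGuard(std::atomic<std::uint32_t>& count) : count_(count) {
        count_.fetch_add(1);
    }
    ~WaiterCountGuard() { count_.fetch_sub(1); }

    // Non-copyable so the count is decremented exactly once per guard.
    WaiterCountGuard(const WaiterCountGuard&) = delete;
    WaiterCountGuard& operator=(const WaiterCountGuard&) = delete;

private:
    std::atomic<std::uint32_t>& count_;
};

// Returns true when the wait is reported as timed out.
// relock_ok models whether the timed re-lock succeeded.
bool guarded_wait(std::atomic<std::uint32_t>& waiters, bool relock_ok) {
    WaiterCountGuard guard(waiters);
    if (!relock_ok) {
        return true;  // early exit: the destructor still decrements
    }
    return false;     // normal exit: same decrement, same place
}
```

With this shape the counter returns to its starting value whether guarded_wait leaves through the failure branch or the normal one.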

comment:2 by Ion Gaztañaga, 12 years ago

Milestone: Boost 1.41.0 → Boost-1.45.0
Resolution: fixed
Status: new → closed

Fixed for Boost 1.45 in release branch
