Opened 9 years ago
Closed 8 years ago
#9856 closed Bugs (wontfix)
[windows] condition_variable::wait_for returns wrong cv_status on timeout.
Reported by: | Owned by: | Niall Douglas | |
---|---|---|---|
Milestone: | Component: | thread | |
Version: | Boost 1.57.0 | Severity: | Problem |
Keywords: | condition_variable wait_for cv_status | Cc: |
Description
The following bug seems to work on *nix systems. But not on Windows 7 with Visual Studio 2013 Express.
If a condition variable is waiting for a defined amount of time, sometimes (nearly always) the returned cv_status is wrong.
int main() { mutex mtx; condition_variable cond; unique_lock<mutex> lk(mtx); std::cout << BOOST_VERSION << "\n"; for(int i=0; i<10; i++) { std::cout << i << " : " << chrono::steady_clock::now() << "\n"; if (cv_status::no_timeout == cond.wait_for(lk, chrono::seconds(1))) std::cout << "no_timeout\n"; } }
While the condition is never notified, i get no_timeout.
debuging the code:
steady_clock::time_point c_now = steady_clock::now(); do_wait(lock, ceil<milliseconds>(d).count()); return steady_clock::now() - c_now < d ? cv_status::no_timeout : cv_status::timeout;
steady_clock::now() - c_now < d is true. It seems do_wait is sleeping to short.
Change History (25)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 9 years ago
Summary: | condition_variable::wait_for returns wrong cv_status on timeout. → [windows] condition_variable::wait_for returns wrong cv_status on timeout. |
---|
comment:4 by , 8 years ago
This bug seems closely related to, or the same as, something I found on Linux in Boost 1.54, 1.48, and 1.46. When passed a deadline represented as a system_time, condition_variable::timed_wait() returns false (i.e. timed out) after a little under a millisecond. For more info and a test program, see the boost-users mailing list archive, thread subject "[Thread] thousands of spurious wakeups from timed_wait() per second" around Sept. 11, 2014.
(Sorry, I can't get a link to it past trac's spam filter. The spam filter says "Please respond to the following" and nothing follows.)
comment:5 by , 8 years ago
Milestone: | To Be Determined → Boost 1.57.0 |
---|
comment:6 by , 8 years ago
I am unable to replicate this issue with trunk 1.57 on VS2013 x86 on Win8.1.
comment:8 by , 8 years ago
Milestone: | Boost 1.57.0 |
---|---|
Resolution: | → worksforme |
Status: | assigned → closed |
comment:9 by , 8 years ago
It reproduces on 1.57. Windows 8.1 Visual Studio 2013 Ultimate
Using the function
boost::condition_variable condition; condition.wait_until(lock, _waitUntil)
comment:10 by , 8 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Version: | Boost 1.55.0 → Boost 1.57.0 |
comment:11 by , 8 years ago
I had a look for the testcase I wrote for #9856 but it looks like I deleted it last time I reset my Boost installation.
However, if I am to be blunt here, this must be a behaviour which differs between people's machines. Perhaps an AMD vs Intel difference, or maybe a HPET difference? On Windows it also matters hugely if any multimedia program is running because then the Windows context switch granularity goes from ~43ms to ~1ms. A multimedia program famous for this is the Chrome web browser.
Obviously whether a multimedia program is running should make no difference to Boost code. But I am trying to think of a repeatable demonstration, as otherwise finding and fixing any problem is very hard.
comment:12 by , 8 years ago
I have reproduced the problem on a Parallels Desktop 10 Virtual Machine, running on a MacBook Pro, Processor: Intel Core i5.
comment:13 by , 8 years ago
I've put together the following test case test_9856.cpp:
#include "boost/thread.hpp" #include <iostream> using namespace boost; int main() { mutex mtx; condition_variable cond; unique_lock<mutex> lk(mtx); std::cout << BOOST_VERSION << "\n"; for(int i=0; i<10; i++) { //std::cout << i << " : " << chrono::steady_clock::now().count() << "\n"; if (cv_status::timeout != cond.wait_for(lk, chrono::seconds(1))) { std::cout << "ERROR: Wrong return code!\n"; return 1; } } return 0; }
I'll test it on Windows when I reboot out of Linux later today. I've also lined up a KVM virtualised Windows to test upon.
Niall
comment:14 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
comment:15 by , 8 years ago
I've successfully replicated this bug! You have to loop running the test case a few times, but it'll eventually appear. This was on Windows 8.1 x64.
I'll see if I can fix it next Saturday.
Niall
comment:16 by , 8 years ago
I have an update on this issue. Firstly, I have committed a testcase for this to Boost.Thread trunk, so it is now being tested as part of regular unit testing.
I have some quite scary news though. Boost.Thread decides whether the wait has timed out by comparing steady_clock (really QueryPerformanceCounter) from before and after, and if the wait took the timeout or longer it assumes that timeout occurred. Unfortunately, WaitForMultipleObjectsEx() provides no guarantees that it actually waits for the timeout requested - you might request 50ms, but might actually get 10ms. Here is an example program:
#include <windows.h> #include <stdio.h> #include <chrono> int main(void) { ULONG ulDelay_ms = 20; HANDLE hSemaphoreDelay = CreateSemaphore(NULL, 0, 1, NULL); for (size_t n = 0; n < 50; n++) { while (1) { auto begin = std::chrono::high_resolution_clock::now(); ULONG hr = WaitForSingleObject(hSemaphoreDelay, ulDelay_ms); auto end = std::chrono::high_resolution_clock::now(); auto diff = end - begin; if (hr == WAIT_ABANDONED) printf("Wait Abandoned "); else if (hr == WAIT_TIMEOUT) printf("Timed out "); else printf("Signaled "); DWORD lTDelta = std::chrono::duration_cast<std::chrono::milliseconds>(diff).count(); printf("Target Wait Interval: %u Real Wait Interval: %u (%u)\n", ulDelay_ms, lTDelta, diff.count()); if (lTDelta >= ulDelay_ms) break; } printf("\n"); } CloseHandle(hSemaphoreDelay); return 0; }
... and indeed very occasionally it is returning after 16-18ms instead of the requested 20ms at the same frequency that Boost.Thread is not returning timeout.
This is scary because I believe every single wait routine in Windows in Boost.Thread, and possibly other Boost libraries, will need to be audited and potentially refactored. I'll raise it on boost-dev tomorrow morning. I believe this new early timeout feature was probably introduced when Microsoft added timer coalescing to Windows to save battery on mobile devices, and so only appeared in Windows 7 and later. The fact it hasn't presented sooner is worrying.
Niall
comment:17 by , 8 years ago
FYI the current consensus in the discussion on boost-dev is that this should be a wontfix. See http://comments.gmane.org/gmane.comp.lib.boost.devel/257246.
Personally speaking I'd prefer to adjust the code to produce fewer spurious wakeups, but not completely prevent them. We'll see if Microsoft come back to us to say what their STL does.
comment:18 by , 8 years ago
Did some testing of the performance of passing a timeout to WaitForMultipleObjects versus using SetWaitableTimerEx:
Where underlying object is not signalled, 1ms timeout:
WaitForMultipleObjects alone: 1293.457580 microsecs
CreateWaitableTimer + SetWaitableTimer + WaitForMultipleObjects + CloseHandle: 1000.870380 microsecs
Where underlying object is signalled:
WaitForMultipleObjects alone: 0.246493 microsecs
SetWaitableTimer + WaitForMultipleObjects: 1.062579 microsecs
CreateWaitableTimer + SetWaitableTimer + WaitForMultipleObjects + CloseHandle: 2.055087 microsecs
Moving the creation of the waitable timer outside of interruptible_wait() will be a pain - every interruptible_wait() user will need to cache a copy, or else we keep a copy in thread local storage. Given that the number of occasions where the underlying semaphore goes signalled in between the check before starting the wait and the wait beginning will be few, I reckon the 8x slowdown for timeouts < 20ms will be worth the ease of implementation. So this is what I will do.
Niall
comment:20 by , 8 years ago
Replying to viboes:
It does appear that Windows respects the timeout you pass to a waitable timer, whilst it fuzzes any timeout given to WaitForMultipleObjects to be either too short or too long (on average 30% too long).
BTW I have more empirical findings. It turns out Windows ignores all TolerableDelay values <= 26 ms. I have no idea why. But 27 ms is definitely the minimum for it to have any effect at all.
Windows usually ticks at 15.6 ms, so a tolerable delay of 27 ms is about 1.75 context switches. Do I have permission to set that minimum for all timers i.e.
27 ms <= tolerable delay <= 5% of duration between now and timeout.
Do we need an upper bound on tolerable delay? I can't see Windows running for ten seconds without at least one interrupt firing though ...
Niall
comment:22 by , 8 years ago
@viboes I only have access to Windows 8.1 nowadays. They added coalesced timer support in Vista and a tickless kernel in Windows 8, so I would assume each Windows will be a little bit different.
The code I have written should compile and work on Win XP however. As with GetTickCount64(), it emulates the SetWaitableTimerEx() function on XP and earlier.
Niall
comment:24 by , 8 years ago
Addition of coalesced timers is in https://github.com/boostorg/thread/pull/44.
The fact that Thread doesn't return timeout up to 10% of the time on Windows is wontfix.
comment:25 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
This is also happening with wait_until.