Opened 9 years ago
Closed 9 years ago
#9355 closed Bugs (fixed)
boost::coroutine crash in base<void>::pull_coroutine_base<void> with multiple threads
Reported by: | Owned by: | olli | |
---|---|---|---|
Milestone: | To Be Determined | Component: | coroutine |
Version: | Boost 1.55.0 | Severity: | Problem |
Keywords: | pull_coroutine_base; crash | Cc: |
Description
Using 1.55b1 coroutine a sporadic crash occurs when running outside of the debugger. Appears related to creating the coroutine context. The attached test application creates N threads which execute coroutines.
Platform: Windows 7 x64
Compiler: vc2012
Build: x64
Note: Occurs outside of debugger ~ 1 in 10 executions of application on an Intel i7
Exception:
Unhandled exception at 0x000000013FD1222B (UnitTest_Concurrency_Test.exe) in WER1FB.tmp.mdmp: 0x80000001: Not implemented (parameters: 0x0000000000000001, 0x0000000000080F08).
Stack location:
UnitTest_Concurrency_Test.exe!boost::coroutines::detail::pull_coroutine_base<void>::pull_coroutine_base<void>(void (__int64) * fn, boost::coroutines::stack_context * stack_ctx, bool unwind, bool preserve_fpu) Line 276 C++
Educated guess:
It may be some sort of race condition where a coroutine context is being created at the same time in two threads that are resident on the same processing core.
Attachments (4)
Change History (21)
by , 9 years ago
Attachment: | UnitTest_Concurrency_Test.cpp added |
---|
comment:1 by , 9 years ago
please reduce test code - it contains too much of your application (Jobs etc.)
by , 9 years ago
Attachment: | 2013-11-07 Boost Coroutine Concurrency Bug-Test.7z added |
---|
Reduced test example with vc2012 solution
comment:2 by , 9 years ago
Thanks for the quick reply. I am very sorry for not cleaning up the original code example. I have now significantly reduced the code in a new attachment including a vc2012 project file. Please let me know if you would like me to reduce code further.
comment:3 by , 9 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I've only msvc-10.0 available on my computer - your code does not compile with msvc-10.0.
On Linux + g++-4.7.3 you app asserts: ConcurrencyTest.cpp:163: void test(): Assertion `iterCounter == iterationCountTotal' failed. After commenting out the assert-statement the output is: Iters = 1024 of 1516918 [FALSE] Time: 107.453ms (iteration=70ns) Done it seams that you application logic is wrong
Your test app is still too complex - I need a small test app which triggers the error.
comment:4 by , 9 years ago
Very sorry about that, the asserts are left over from the original full code and apologize and have nothing to do with the test.
Your output sounds right. The coroutine is only constructed therefore performing just the first iteration in each thread as this is the part that causes the issue. I assume you have tested running an optimized GCC build without the debugger with no issue?
comment:5 by , 9 years ago
I have simplified down the test further removing all the logic and the use of vectors etc. You are right in that most of it is unrelated to the crashing and the issue still occurs.
Afterward is a new batch script too I have seen runs of ~100 without fail but have also seen the lucky crash inside the debugger but its the same as attaching after failure.
Test Code:
#include <boost/coroutine/all.hpp> #include <boost/lockfree/queue.hpp> #include <boost/bind.hpp> #include <thread> #include <queue> typedef boost::coroutines::coroutine< void > coro_t; static void foo( coro_t::push_type& yield, int i ) {} struct Worker { Worker() : done(false) {} volatile bool done; void operator()() { while ( !done ) { pending.consume_all( [&]( int i ) { coro_t::pull_type( boost::bind( foo, _1, i ) ); } ); } } boost::lockfree::queue<int, boost::lockfree::capacity<1024> > pending; }; int main( int argc, char * argv[]) { Worker workers[2]; std::thread threads[2] = { std::thread(std::ref(workers[0])) , std::thread(std::ref(workers[1])) }; workers[0].pending.push(0); workers[1].pending.push(1); for ( auto& worker: workers ) worker.done = true; for ( auto& thread: threads ) thread.join(); return EXIT_SUCCESS; }
Batch script:
echo off cls set counter=0 :loop set /a counter=counter+1 echo run %counter% x64\Release\ConcurrencyTest.exe goto loop
comment:6 by , 9 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
comment:7 by , 9 years ago
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
I can't reproduce your problem - I've tested the code (a little bit modified) on LINUX 32/64bit gcc-4.7.3 debug/release as well as on Windows7 with MSVC-10.0 32/64bit debug/release.
by , 9 years ago
comment:8 by , 9 years ago
Thank you for looking into the issue and know your time is probably precious. Your modified code does not produce the problem as you have identified. The batch script was used for a reason that I should have made clear. The issue occurs every N runs of the application. Be it some optimization or the likes but performing a loop like you have added does not cause the crash. Its a first time sort of issue annoyingly so tricky to track.
comment:9 by , 9 years ago
I've also used the batch script - after the 555 invocation of the test-app (which itself runs your test code in a loop 1000x) I did not get a crash. Please not that I've only MSVC-10.0 - maybe yo should verify the code with MSVC-10.0 on your system.
comment:10 by , 9 years ago
I have tested the issue under vc2010 on a different system and can also reproduce the issue. This system has a Intel Centrino 2 core and does not experience the crash when only 2 threads are utilized. With 4 threads the issue however occurs but due to whatever the cause may be this processor will crash between 580-1120 executions of the program so is not a good platform to find or test on. I will try and test an i5 4670 at the weekend and see if it similar to the i7-3960 of the original system.
New code:
#include <queue> #include <boost/atomic.hpp> #include <boost/bind.hpp> #include <boost/coroutine/all.hpp> #include <boost/lockfree/queue.hpp> #include <boost/thread.hpp> typedef boost::coroutines::coroutine< void > coro_t; static void foo( coro_t::push_type& yield, int i ) {} struct Worker { boost::atomic< bool > done; Worker() : done( false) {} void operator()() { while ( ! done) { pending.consume_all( [&]( int i) { coro_t::pull_type( boost::bind( foo, _1, i) ); } ); } } boost::lockfree::queue< int, boost::lockfree::capacity< 1024 > > pending; }; int main( int argc, char * argv[]) { const uint32_t kCount = 4; Worker workers[kCount]; boost::thread threads[kCount]; for ( uint32_t i = 0; i < kCount; ++i ) threads[i] = boost::thread( boost::ref( workers[i]) ); for ( uint32_t i = 0; i < kCount; ++i ) workers[i].pending.push(i); for ( uint32_t i = 0; i < kCount; ++i ) workers[i].done = true; for ( uint32_t i = 0; i < kCount; ++i ) threads[i].join(); return EXIT_SUCCESS; }
comment:11 by , 9 years ago
Okay so I have tested a Haswell i5-4670 and with 2 thread the issue too 53,161 executions to occur, however up the test to 4 threads and the crash occurs every 2-4 executions! I don't know that that really indicates as on the i7 two threads was causing the issue so could be amplified by hyper-threading somehow.
I don't like the 'fix' but putting a mutex lock around the constructor for the coroutine is the only work around I have currently found
{ boost::lock_guard<boost::recursive_mutex> lock(m_guard); coro_t::pull_type( boost::bind( foo, _1, i) ); }
comment:12 by , 9 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
I have been drilling down into the code and have located the offending problem in standard_stack_allocator_windows.cpp @ line 63:
SYSTEM_INFO system_info() { static SYSTEM_INFO si = system_info_(); return si; }
This function is called in standard_stack_allocator::allocate(...) via detail::pagesize(). The issue here is the use of a static variable in an unsafe manner between threads which explains why it might only fall over the first time and then run without fail.
The first thread is initializing the static value and writing to the structure. A second thread is then already reading/read data from it. This means pagesize() is returning an uninitialized variable.
I don't know the performance characteristics of GetSystemInfo() but a very simple work-around for the crashes is to remove the 'static' keyword:
/*static*/ SYSTEM_INFO si = system_info_();
A better solution is a critical section though as GetSystemInfo() may take an unkown time which I think is the root to the issue to begin with.
comment:13 by , 9 years ago
I think this answers the issue and its a compiler problem of sorts and fact this person reports most compilers still don't implement static variable concurrency safely: http://stackoverflow.com/a/4590634
comment:14 by , 9 years ago
A mutex based solution would be:
SYSTEM_INFO system_info() { boost::lock_guard<boost::recursive_mutex> lock(m_guard); static SYSTEM_INFO si = system_info_(); return si; }
This isn't optimal though as there should be a way to only block on first entry to the function somehow.
comment:15 by , 9 years ago
A simpler solution is to put the static into global scope it would appear so that the data is obtained prior to co-routine constructors. It works, and I think to get a race condition on a global static isn't really ever going to happen unless somebody created lots of global static threads too which would be sort of silly.
comment:16 by , 9 years ago
Okay so I have learnt lots of new things today. I have a pretty good solution wihtout the performance issue of a mutex. I think boost probably has a call_once too but used std in this test:
#include <mutex> ... static std::once_flag l_onceFlag; SYSTEM_INFO system_info() { static SYSTEM_INFO si; std::call_once(l_onceFlag, [&] { si = system_info_(); }); return si; }
comment:17 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
thx, the bug is fixed in boost-trunk, please verify
coroutine multi-thread crash test code