Boost C++ Libraries: Ticket #9355: boost::coroutine crash in base<void>::pull_coroutine_base<void> with multiple threads https://svn.boost.org/trac10/ticket/9355 <p> Using 1.55b1 coroutine a sporadic crash occurs when running outside of the debugger. Appears related to creating the coroutine context. The attached test application creates N threads which execute coroutines. </p> <p> Platform: Windows 7 x64<br /> Compiler: vc2012 <br /> Build: x64 <br /> Note: Occurs outside of debugger ~ 1 in 10 executions of application on an Intel i7<br /> </p> <p> Exception:<br /> </p> <pre class="wiki">Unhandled exception at 0x000000013FD1222B (UnitTest_Concurrency_Test.exe) in WER1FB.tmp.mdmp: 0x80000001: Not implemented (parameters: 0x0000000000000001, 0x0000000000080F08). </pre><p> Stack location:<br /> </p> <pre class="wiki">UnitTest_Concurrency_Test.exe!boost::coroutines::detail::pull_coroutine_base&lt;void&gt;::pull_coroutine_base&lt;void&gt;(void (__int64) * fn, boost::coroutines::stack_context * stack_ctx, bool unwind, bool preserve_fpu) Line 276 C++ </pre><p> <br /> Educated guess: <br /> It may be some sort of race condition where a coroutine context is being created at the same time in two threads that are resident on the same processing core. 
</p> craig@… Thu, 07 Nov 2013 13:31:31 GMT attachment set https://svn.boost.org/trac10/ticket/9355 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">UnitTest_Concurrency_Test.cpp</span> </li> </ul> <p> coroutine multi-thread crash test code </p> Ticket craig@… Thu, 07 Nov 2013 13:32:26 GMT attachment set https://svn.boost.org/trac10/ticket/9355 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">run_concurrency.bat</span> </li> </ul> <p> Batch file to execute test continuously </p> Ticket olli Thu, 07 Nov 2013 14:30:56 GMT <link>https://svn.boost.org/trac10/ticket/9355#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:1</guid> <description> <p> please reduce test code - it contains too much of your application (Jobs etc.) </p> </description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Thu, 07 Nov 2013 16:01:57 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/9355 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">2013-11-07 Boost Coroutine Concurrency Bug-Test.7z</span> </li> </ul> <p> Reduced test example with vc2012 solution </p> Ticket Craig Hutchinson <craig@…> Thu, 07 Nov 2013 16:05:45 GMT <link>https://svn.boost.org/trac10/ticket/9355#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:2</guid> <description> <p> Thanks for the quick reply. I am very sorry for not cleaning up the original code example. I have now significantly reduced the code in a new attachment including a vc2012 project file. Please let me know if you would like me to reduce code further. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>olli</dc:creator> <pubDate>Thu, 07 Nov 2013 19:15:09 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/9355#comment:3 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">invalid</span> </li> </ul> <p> I only have msvc-10.0 available on my computer - your code does not compile with msvc-10.0. </p> <p> On Linux + g++-4.7.3 your app asserts: ConcurrencyTest.cpp:163: void test(): Assertion `iterCounter == iterationCountTotal' failed. After commenting out the assert statement the output is: Iters = 1024 of 1516918 [FALSE] Time: 107.453ms (iteration=70ns) Done. It seems that your application logic is wrong. </p> <p> Your test app is still too complex - I need a small test app which triggers the error. </p> Ticket Craig Hutchinson <craig@…> Fri, 08 Nov 2013 07:24:54 GMT <link>https://svn.boost.org/trac10/ticket/9355#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:4</guid> <description> <p> Very sorry about that. The asserts are left over from the original full code and have nothing to do with the test; I apologize. </p> <p> Your output sounds right. The coroutine is only constructed, so just the first iteration runs in each thread, as this is the part that causes the issue. I assume you have tested running an optimized GCC build without the debugger with no issue? 
</p> </description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Fri, 08 Nov 2013 08:20:19 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:5</guid> <description> <p> I have simplified the test further, removing all the logic and the use of vectors etc. You are right that most of it is unrelated to the crash, and the issue still occurs. </p> <p> A new batch script follows as well. I have seen runs of ~100 without failure, and I have also seen the occasional crash inside the debugger, but it is the same as attaching after failure. </p> <p> Test Code: </p>
<pre class="wiki">#include &lt;boost/coroutine/all.hpp&gt;
#include &lt;boost/lockfree/queue.hpp&gt;
#include &lt;boost/bind.hpp&gt;
#include &lt;cstdlib&gt;
#include &lt;functional&gt;
#include &lt;thread&gt;
#include &lt;queue&gt;

typedef boost::coroutines::coroutine&lt; void &gt; coro_t;

static void foo( coro_t::push_type&amp; yield, int i ) {}

struct Worker
{
    Worker() : done(false) {}

    volatile bool done;

    void operator()()
    {
        while ( !done )
        {
            pending.consume_all( [&amp;]( int i )
            {
                // Constructing the coroutine is the part that triggers the crash.
                coro_t::pull_type( boost::bind( foo, _1, i ) );
            } );
        }
    }

    boost::lockfree::queue&lt;int, boost::lockfree::capacity&lt;1024&gt; &gt; pending;
};

int main( int argc, char * argv[])
{
    Worker workers[2];
    std::thread threads[2] = { std::thread(std::ref(workers[0])) , std::thread(std::ref(workers[1])) };

    workers[0].pending.push(0);
    workers[1].pending.push(1);

    for ( auto&amp; worker: workers ) worker.done = true;
    for ( auto&amp; thread: threads ) thread.join();

    return EXIT_SUCCESS;
}
</pre>
<p> Batch script: </p>
<pre class="wiki">echo off
cls
set counter=0
:loop
set /a counter=counter+1
echo run %counter%
x64\Release\ConcurrencyTest.exe
goto loop
</pre>
</description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Fri, 08 Nov 2013 08:36:44 GMT</pubDate> <title>status changed; resolution 
deleted https://svn.boost.org/trac10/ticket/9355#comment:6 <ul> <li><strong>status</strong> <span class="trac-field-old">closed</span> → <span class="trac-field-new">reopened</span> </li> <li><strong>resolution</strong> <span class="trac-field-deleted">invalid</span> </li> </ul> Ticket olli Fri, 08 Nov 2013 18:36:11 GMT status changed; resolution set https://svn.boost.org/trac10/ticket/9355#comment:7 <ul> <li><strong>status</strong> <span class="trac-field-old">reopened</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">worksforme</span> </li> </ul> <p> I can't reproduce your problem - I've tested the code (a little bit modified) on Linux 32/64bit gcc-4.7.3 debug/release as well as on Windows 7 with MSVC-10.0 32/64bit debug/release. </p> Ticket olli Fri, 08 Nov 2013 18:37:17 GMT attachment set https://svn.boost.org/trac10/ticket/9355 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">test.cpp</span> </li> </ul> Ticket craig@… Fri, 08 Nov 2013 20:42:40 GMT <link>https://svn.boost.org/trac10/ticket/9355#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:8</guid> <description> <p> Thank you for looking into the issue; I know your time is precious. Your modified code does not reproduce the problem, as you have identified. The batch script was used for a reason that I should have made clear: the issue occurs once every N runs of the application. Whether it is down to some optimization or the like, performing a loop inside the application, as you have added, does not cause the crash. Annoyingly, it is a first-time sort of issue, so it is tricky to track. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>olli</dc:creator> <pubDate>Fri, 08 Nov 2013 21:31:10 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:9</guid> <description> <p> I've also used the batch script - after the 555th invocation of the test app (which itself runs your test code in a loop 1000x) I did not get a crash. Please note that I only have MSVC-10.0 - maybe you should verify the code with MSVC-10.0 on your system. </p> </description> <category>Ticket</category> </item> <item> <author>craig@…</author> <pubDate>Fri, 08 Nov 2013 21:49:36 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:10 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:10</guid> <description> <p> I have tested under vc2010 on a different system and can reproduce the issue there as well. This system has an Intel Centrino 2 core and does not experience the crash when only 2 threads are used. With 4 threads, however, the issue does occur, but for whatever reason this processor only crashes after between 580 and 1120 executions of the program, so it is not a good platform to test on. I will try to test an i5 4670 at the weekend and see if it is similar to the i7-3960 of the original system. </p> <p> New code: </p>
<pre class="wiki">#include &lt;cstdint&gt;
#include &lt;cstdlib&gt;
#include &lt;queue&gt;

#include &lt;boost/atomic.hpp&gt;
#include &lt;boost/bind.hpp&gt;
#include &lt;boost/coroutine/all.hpp&gt;
#include &lt;boost/lockfree/queue.hpp&gt;
#include &lt;boost/thread.hpp&gt;

typedef boost::coroutines::coroutine&lt; void &gt; coro_t;

static void foo( coro_t::push_type&amp; yield, int i ) {}

struct Worker
{
    boost::atomic&lt; bool &gt; done;

    Worker() : done( false) {}

    void operator()()
    {
        while ( ! done)
        {
            pending.consume_all( [&amp;]( int i)
            {
                // Constructing the coroutine is the part that triggers the crash.
                coro_t::pull_type( boost::bind( foo, _1, i) );
            } );
        }
    }

    boost::lockfree::queue&lt; int, boost::lockfree::capacity&lt; 1024 &gt; &gt; pending;
};

int main( int argc, char * argv[])
{
    const uint32_t kCount = 4;
    Worker workers[kCount];
    boost::thread threads[kCount];

    for ( uint32_t i = 0; i &lt; kCount; ++i ) threads[i] = boost::thread( boost::ref( workers[i]) );
    for ( uint32_t i = 0; i &lt; kCount; ++i ) workers[i].pending.push(i);
    for ( uint32_t i = 0; i &lt; kCount; ++i ) workers[i].done = true;
    for ( uint32_t i = 0; i &lt; kCount; ++i ) threads[i].join();

    return EXIT_SUCCESS;
}
</pre>
</description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Sat, 09 Nov 2013 18:07:46 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:11 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:11</guid> <description> <p> Okay, so I have tested a Haswell i5-4670: with 2 threads the issue took 53,161 executions to occur; however, up the test to 4 threads and the crash occurs every 2-4 executions! I don't know what that really indicates, as on the i7 two threads were enough to cause the issue, so it could be amplified by hyper-threading somehow. 
</p> <p> I don't like the 'fix', but putting a mutex lock around the constructor for the coroutine is the only workaround I have found so far: </p>
<pre class="wiki">{
    boost::lock_guard&lt;boost::recursive_mutex&gt; lock(m_guard);
    coro_t::pull_type( boost::bind( foo, _1, i) );
}
</pre>
</description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Wed, 13 Nov 2013 08:17:04 GMT</pubDate> <title>status changed; resolution deleted https://svn.boost.org/trac10/ticket/9355#comment:12 <ul> <li><strong>status</strong> <span class="trac-field-old">closed</span> → <span class="trac-field-new">reopened</span> </li> <li><strong>resolution</strong> <span class="trac-field-deleted">worksforme</span> </li> </ul> <p> I have been drilling down into the code and have located the problem in standard_stack_allocator_windows.cpp @ line 63: </p>
<pre class="wiki">SYSTEM_INFO system_info()
{
    static SYSTEM_INFO si = system_info_();
    return si;
}
</pre>
<p> This function is called in standard_stack_allocator::allocate(...) via detail::pagesize(). The issue here is that the static variable is initialized in a way that is not safe between threads, which explains why it might only fall over the first time and then run without fail. </p> <p> The first thread is initializing the static value and writing to the structure. A second thread is meanwhile already reading (or has read) data from it. This means pagesize() can return an uninitialized value. </p> <p> I don't know the performance characteristics of GetSystemInfo(), but a very simple workaround for the crashes is to remove the 'static' keyword: </p>
<pre class="wiki">/*static*/ SYSTEM_INFO si = system_info_();
</pre>
<p> A better solution is a critical section, though, as GetSystemInfo() may take an unknown amount of time, which I think is the root of the issue to begin with. 
</p> Ticket Craig Hutchinson <craig@…> Wed, 13 Nov 2013 08:21:34 GMT <link>https://svn.boost.org/trac10/ticket/9355#comment:13 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:13</guid> <description> <p> I think this answers the issue: it is a compiler problem of sorts. In fact, this person reports that most compilers still don't initialize static variables in a thread-safe way: <a class="ext-link" href="http://stackoverflow.com/a/4590634"><span class="icon"></span>http://stackoverflow.com/a/4590634</a> </p> </description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Wed, 13 Nov 2013 08:48:05 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:14 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:14</guid> <description> <p> A mutex-based solution would be: </p>
<pre class="wiki">SYSTEM_INFO system_info()
{
    // m_guard is assumed to be a boost::recursive_mutex declared elsewhere
    boost::lock_guard&lt;boost::recursive_mutex&gt; lock(m_guard);
    static SYSTEM_INFO si = system_info_();
    return si;
}
</pre>
<p> This isn't optimal, though, as there should be a way to block only on first entry to the function. </p> </description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Wed, 13 Nov 2013 09:11:38 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:15 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:15</guid> <description> <p> A simpler solution, it would appear, is to move the static to global scope so that the data is obtained before any coroutine constructor runs. It works, and I think a race condition on a global static is never really going to happen unless somebody also created lots of global static threads, which would be sort of silly. 
</p> </description> <category>Ticket</category> </item> <item> <author>Craig Hutchinson <craig@…></author> <pubDate>Wed, 13 Nov 2013 09:33:38 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9355#comment:16 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9355#comment:16</guid> <description> <p> Okay, so I have learnt lots of new things today. I have a pretty good solution without the performance cost of a mutex. I think Boost probably has a call_once too, but I used std in this test: </p>
<pre class="wiki">#include &lt;mutex&gt;
...
static std::once_flag l_onceFlag;

SYSTEM_INFO system_info()
{
    static SYSTEM_INFO si;
    std::call_once(l_onceFlag, [&amp;] { si = system_info_(); });
    return si;
}
</pre>
</description> <category>Ticket</category> </item> <item> <dc:creator>olli</dc:creator> <pubDate>Wed, 13 Nov 2013 16:33:50 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/9355#comment:17 <ul> <li><strong>status</strong> <span class="trac-field-old">reopened</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> thx, the bug is fixed in boost-trunk, please verify </p> Ticket
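<p> The once-only initialization discussed in the comments above can be sketched in portable C++11 as follows. This is a minimal illustration under stated assumptions, not the change that went into boost-trunk: <code>SysInfo</code>, <code>query_system_info()</code>, the 4096 page size, <code>g_query_calls</code> and <code>count_consistent_reads()</code> are hypothetical stand-ins introduced here so the example builds without the Windows headers. </p>

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for SYSTEM_INFO; only the one field used here.
struct SysInfo { unsigned page_size; };

// Hypothetical stand-in for system_info_() / GetSystemInfo().
static SysInfo query_system_info()
{
    SysInfo si;
    si.page_size = 4096; // assumed value, for illustration only
    return si;
}

namespace {
std::once_flag g_once;     // guards the one-time initialization
SysInfo g_info;            // zero-initialized before any thread starts
int g_query_calls = 0;     // counts how often the query really ran
}

// Every caller either performs the initialization or waits until it is done.
const SysInfo& system_info()
{
    std::call_once(g_once, [] {
        g_info = query_system_info();
        ++g_query_calls;   // only ever executed by the single winning thread
    });
    return g_info;
}

// Calls system_info() from n threads at once and returns how many of them
// observed the fully initialized page size.
std::size_t count_consistent_reads(std::size_t n)
{
    assert(n > 0);
    std::vector<unsigned> seen(n, 0u);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = system_info().page_size; });
    for (auto& t : threads) t.join();

    std::size_t ok = 0;
    for (unsigned v : seen)
        if (v == 4096u) ++ok;
    return ok;
}
```

<p> Calling <code>count_consistent_reads(8)</code> starts eight threads that all race into <code>system_info()</code>; with <code>std::call_once</code> the query should run exactly once and every thread should observe the initialized value. C++11 also requires function-local static initialization itself to be thread-safe, but the behaviour reported in this ticket suggests the vc2010/vc2012 compilers involved did not provide that guarantee, which is why an explicit once-flag is used in this sketch. </p>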