Boost C++ Libraries: Ticket #8323: boost::thread::try_join_for/try_join_until may block indefinitely due to a combination of problems in Boost.Thread and Boost.Chrono https://svn.boost.org/trac10/ticket/8323 <p> In the following, try_join_test may hang indefinitely (using Boost 1.50): </p> <div class="wiki-code"><div class="code"><pre><span class="kt">void</span> <span class="nf">very_busy_thread</span><span class="p">()</span> <span class="p">{</span> <span class="k">while</span> <span class="p">(</span><span class="nb">true</span><span class="p">)</span> <span class="n">boost</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">interruption_point</span><span class="p">();</span> <span class="p">}</span> <span class="kt">void</span> <span class="nf">try_join_test</span><span class="p">()</span> <span class="p">{</span> <span class="n">boost</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t</span><span class="p">(</span><span class="n">very_busy_thread</span><span class="p">);</span> <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="n">i</span><span class="o">&lt;</span><span class="mi">1000000</span><span class="p">;</span><span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> <span class="n">t</span><span class="p">.</span><span class="n">try_join_for</span><span class="p">(</span><span class="n">boost</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">0</span><span class="p">));</span> <span class="c1">// This call may hang indefinitely</span> <span class="p">}</span> <span class="n">t</span><span class="p">.</span><span class="n">interrupt</span><span class="p">();</span> <span class="n">t</span><span class="p">.</span><span class="n">join</span><span class="p">();</span> <span class="p">}</span> </pre></div></div><p> This has been reproduced on a real dual-core Intel Xeon system and also on 2 multicore virtual machines (one using <a class="missing wiki">VirtualBox</a>, another with KVM). </p> <p> Here's my (incomplete, possibly incorrect) analysis of the problem: </p> <p> thread.try_join_for(boost::chrono::milliseconds(0)) called on thread which is still running </p> <div class="wiki-code"><div class="code"><pre><span class="kt">bool</span> <span class="nf">try_join_for</span><span class="p">(</span><span class="k">const</span> <span class="n">chrono</span><span class="o">::</span><span class="n">duration</span><span class="o">&lt;</span><span class="n">Rep</span><span class="p">,</span> <span class="n">Period</span><span class="o">&gt;&amp;</span> <span class="n">rel_time</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">try_join_until</span><span class="p">(</span><span class="n">chrono</span><span class="o">::</span><span class="n">steady_clock</span><span class="o">::</span><span class="n">now</span><span class="p">()</span> <span class="o">+</span> <span class="n">rel_time</span><span class="p">);</span> <span class="p">}</span> </pre></div></div><p> steady_clock comes into play, which calls <a class="missing wiki">QueryPerformanceCounter</a> the "steadiness" of which is disputable </p> <p> here Clock is steady_clock </p> <blockquote> <p> t = rep_ = 5591678518200 </p> </blockquote> <blockquote> <p> system_clock::time_point s_now = system_clock::now() = rep_=130084071987844000; </p> </blockquote> <blockquote> <p> typename Clock::time_point c_now = Clock::now() = rep_ = 5591679763370 which is LESS THAN t = rep_ = 5591678518200 (<a class="missing wiki">QueryPerformanceCounter</a> says hello) </p> </blockquote> <blockquote> <p> try_join_until(s_now + ceil&lt;nanoseconds&gt;(t - c_now)) is called with a system time in the PAST AND in a past ~1msec ago (possibly important!) </p> </blockquote> <p> thread.cpp:342: </p> <div class="wiki-code"><div class="code"><pre> <span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span> <span class="n">rel_time</span><span class="o">=</span> <span class="n">chrono</span><span class="o">::</span><span class="n">ceil</span><span class="o">&lt;</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="o">&gt;</span><span class="p">(</span><span class="n">tp</span><span class="o">-</span><span class="n">chrono</span><span class="o">::</span><span class="n">system_clock</span><span class="o">::</span><span class="n">now</span><span class="p">());</span> </pre></div></div><p> rel_time becomes EXACTLY -1; </p> <p> detail::timeout(uintmax_t milliseconds_) accepts a _signed_ integer, converting it to a huge number; </p> <p> target_time.is_sentinel() == false </p> <p> target_time.remaining_milliseconds(), time_left.milliseconds &gt; min_timer_wait_period is false (thread.cpp:482) </p> <p> using_timer == false (not using <a class="missing wiki">SetWaitableTimer</a>) </p> <p> target_time becomes detail::timeout(time_left.milliseconds) (a huge number) (thread.cpp:501) </p> <p> is_sentinel() returns true -&gt; remaining_milliseconds returns remaining_time(win32::infinite) (thread_data.hpp:151) </p> <p> time_left.milliseconds becomes a large number </p> <p> using_timer is false, time_left.milliseconds is used as timeout which is huge </p> <p> <a class="missing wiki">WaitForMultipleObjects</a> waits for a nearly-infinite amount of time </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/8323 Trac 1.4.3 viboes Sat, 23 Mar 2013 00:44:59 GMT <link>https://svn.boost.org/trac10/ticket/8323#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:1</guid> <description> <p> Hi, </p> <p> It is strange to me that successive calls to steady_clock::now() are not monotonic. The trace you gave got the second call to steady_clock::now() 5591679763370 which is GT the fist one 5591678518200 </p> <pre class="wiki"> t = rep_ = 5591678518200 typename Clock::time_point c_now = Clock::now() = rep_ = 5591679763370 ^ </pre><p> What am I missing? </p> </description> <category>Ticket</category> </item> <item> <dc:creator>viboes</dc:creator> <pubDate>Sat, 23 Mar 2013 00:49:46 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8323#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:2</guid> <description> <p> Maybe the problem is that the system_clock is taken too early </p> <p> Coulod you try </p> <pre class="wiki"> typename Clock::time_point c_now = Clock::now(); return try_join_until(system_clock::now() + ceil&lt;nanoseconds&gt;(t - c_now)); </pre><p> instead of </p> <pre class="wiki"> system_clock::time_point s_now = system_clock::now(); typename Clock::time_point c_now = Clock::now(); return try_join_until(s_now + ceil&lt;nanoseconds&gt;(t - c_now)); </pre> </description> <category>Ticket</category> </item> <item> <dc:creator>viboes</dc:creator> <pubDate>Thu, 28 Mar 2013 23:54:59 GMT</pubDate> <title>owner, status changed https://svn.boost.org/trac10/ticket/8323#comment:3 https://svn.boost.org/trac10/ticket/8323#comment:3 <ul> <li><strong>owner</strong> changed from <span class="trac-author">Anthony Williams</span> to <span class="trac-author">viboes</span> </li> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> Ticket viboes Sat, 30 Mar 2013 09:02:36 GMT <link>https://svn.boost.org/trac10/ticket/8323#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:4</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/8323#comment:1" title="Comment 1">viboes</a>: </p> <blockquote class="citation"> <p> Hi, </p> <p> It is strange to me that successive calls to steady_clock::now() are not monotonic. The trace you gave got the second call to steady_clock::now() 5591679763370 which is GT the fist one 5591678518200 </p> <pre class="wiki"> t = rep_ = 5591678518200 typename Clock::time_point c_now = Clock::now() = rep_ = 5591679763370 ^ </pre><p> What am I missing? </p> </blockquote> <p> Any news on this? </p> </description> <category>Ticket</category> </item> <item> <author>szakharchenko@…</author> <pubDate>Sat, 30 Mar 2013 11:40:12 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/8323 https://svn.boost.org/trac10/ticket/8323 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">test.cpp</span> </li> </ul> <p> Simple test case (should hang with 99% certainty) </p> Ticket szakharchenko@… Sat, 30 Mar 2013 11:47:44 GMT <link>https://svn.boost.org/trac10/ticket/8323#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:5</guid> <description> <p> I've worked around my original problem, and can't offer more explanations or testing. I've attached a more deterministic testcase. In short, <a class="missing wiki">QueryPerformanceCounter</a> isn't steady at least on some multicore processors, nor it should be, steady_clock isn't steady, but even when all of this will be fixed some moment in the future in Boost.Chrono, Boost.Thread will hang when told to try_join_until(now-exactly 1 msec) while any other moment in the past seems fine. So this is actually a combination of two problems, you may want to clone this ticket or something. I can reproduce it any day (you should be able too, with the attached testcase). Following is another stack trace. </p> <pre class="wiki"> Frame 0: boost::detail::timeout::remaining_milliseconds() thread_data.hpp Line 151 is_sentinel must have returned true, going to return infinite wait time. this: {start=1714216 milliseconds=18446744073709551615 relative=true} Frame 1: boost::this_thread::interruptible_wait() thread.cpp Line 513 using_timer: false handle_to_wait_for: 0x38 (OK) handle_count: 1 wait_handle_index: 0 interruption_index: ~0u handles: [0x38,0,0] timeout_index: ~0u Frame 2: boost::thread::try_join_until() thread.cpp Line 343 local_thread_info: not null,thread_handle={handle_to_manage=0x00000038 } rel_time: {rep_=-1 } /* !!! */ rel_time is calculated with a difference of the system clock measurements on line 342, so it could have run backwards there as well. Frame 3: boost::thread::try_join_until&lt;Clock=steady&gt;() thread.hpp Line 355 t {d_={rep_=3089317882890 } } s_now {d_={rep_=130091140137592000 } c_now {d_={rep_=3089318990860 } t-c_now is 3089317882890-3089318990860 = -1107970, which gives a ceil&lt;milliseconds&gt; of -1. </pre> </description> <category>Ticket</category> </item> <item> <dc:creator>viboes</dc:creator> <pubDate>Sat, 30 Mar 2013 14:05:10 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8323#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:6</guid> <description> <p> I see the problem now. </p> <p> Could you try to replace </p> <pre class="wiki"> template &lt;class Clock, class Duration&gt; bool try_join_until(const chrono::time_point&lt;Clock, Duration&gt;&amp; t) { using namespace chrono; system_clock::time_point s_now = system_clock::now(); typename Clock::time_point c_now = Clock::now(); return try_join_until(s_now + ceil&lt;nanoseconds&gt;(t - c_now)); } </pre><p> by </p> <pre class="wiki"> template &lt;class Clock, class Duration&gt; bool try_join_until(const chrono::time_point&lt;Clock, Duration&gt;&amp; t) { using namespace chrono; system_clock::time_point s_now = system_clock::now(); bool joined= false; do { typename Clock::duration d = ceil&lt;nanoseconds&gt;(t-Clock::now()); if (d &lt;= Clock::duration::zero()) return false; // in case the Clock::time_point t is already reached joined = try_join_until(s_now + d); } while (! joined); return true; } </pre> </description> <category>Ticket</category> </item> <item> <author>szakharchenko@…</author> <pubDate>Sat, 30 Mar 2013 15:37:16 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8323#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:7</guid> <description> <p> This change doesn't affect the problem which is presented in the attached test case, because the try_join_until overload that you patched isn't selected if the time point is based on the system_clock (the next one, which mentions the system clock, is selected). It does work with the steady clock, though. </p> </description> <category>Ticket</category> </item> <item> <author>szakharchenko@…</author> <pubDate>Sat, 30 Mar 2013 15:37:16 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8323#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:7</guid> <description> <p> This change doesn't affect the problem which is presented in the attached test case, because the try_join_until overload that you patched isn't selected if the time point is based on the system_clock (the next one, which mentions the system clock, is selected). It does work with the steady clock, though. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>viboes</dc:creator> <pubDate>Sat, 30 Mar 2013 18:30:55 GMT</pubDate> <title>milestone changed https://svn.boost.org/trac10/ticket/8323#comment:8 https://svn.boost.org/trac10/ticket/8323#comment:8 <ul> <li><strong>milestone</strong> <span class="trac-field-old">To Be Determined</span> → <span class="trac-field-new">Boost 1.54.0</span> </li> </ul> <p> Committed revision <a class="changeset" href="https://svn.boost.org/trac10/changeset/83652" title="Thread: check for time_points in the past on try_join_until to take ...">[83652]</a>. </p> Ticket viboes Sat, 30 Mar 2013 18:31:53 GMT <link>https://svn.boost.org/trac10/ticket/8323#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8323#comment:9</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/8323#comment:7" title="Comment 7">szakharchenko@…</a>: </p> <blockquote class="citation"> <p> This change doesn't affect the problem which is presented in the attached test case, because the try_join_until overload that you patched isn't selected if the time point is based on the system_clock (the next one, which mentions the system clock, is selected). It does work with the steady clock, though. </p> </blockquote> <p> This is another issue that could be discussed. Be free to create a new ticket. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>viboes</dc:creator> <pubDate>Sun, 31 Mar 2013 10:56:54 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/8323#comment:11 https://svn.boost.org/trac10/ticket/8323#comment:11 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> (In <a class="changeset" href="https://svn.boost.org/trac10/changeset/83660" title="Thread: merge from trunk 1.54. Fix #8027,#8323,#8337.">[83660]</a>) Thread: merge from trunk 1.54. Fix <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8027" title="#8027: Bugs: thread library fails to compile with Visual Studio 2003 (closed: fixed)">#8027</a>,<a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8323" title="#8323: Bugs: boost::thread::try_join_for/try_join_until may block indefinitely due ... (closed: fixed)">#8323</a>,<a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8337" title="#8337: Bugs: The internal representation of &#34;std::string(this-&gt;code()-&gt;message())&#34; ... (closed: fixed)">#8337</a>. </p> Ticket