Boost C++ Libraries: Ticket #8538: asio: race condition for epoll & kpoll https://svn.boost.org/trac10/ticket/8538 <p> Please consider following code in boost\asio\detail\impl\epoll_reactor.ipp: </p> <pre class="wiki">345: void epoll_reactor::deregister_internal_descriptor(socket_type descriptor, 346: epoll_reactor::per_descriptor_data&amp; descriptor_data) 347: { ... 351: mutex::scoped_lock descriptor_lock(descriptor_data-&gt;mutex_); ... 365: descriptor_lock.unlock(); 366: free_descriptor_state(descriptor_data); 367: descriptor_data = 0; 368: } </pre><p> Lets assume, that <em>thread 1</em> executes epoll_reactor::deregister_internal_descriptor and successfully acquired lock (descriptor_data-&gt;mutex_), while <em>thread 2</em> is waiting to acquire same lock in this or different function (epoll_reactor::start_op, epoll_reactor::cancel_ops). </p> <p> As soon as <em>thread 1</em> executes <strong>descriptor_lock.unlock</strong>(); on line 365, <em>thread 2</em> will acquire descriptor_data-&gt;mutex_ it was waiting for. </p> <p> At this point <em>thread 1</em> executes <strong>free_descriptor_state</strong> (line 366) effectively destroying mutex, which means that when <em>thread 2</em> will attempt to release mutex, as part of mutex::scoped_lock dtor, it might seg fault, because memory used for mutex already might have been allocated for something else by <em>thread 1</em>. </p> <p> Above I quoted code fragment from <em>epoll_reactor::deregister_internal_descriptor</em>, but exactly same issue is present in void <em>epoll_reactor::deregister_descriptor</em>. </p> <p> I am attaching complete patch to this ticket. </p> <p> Furthermore, looks like same problem present in boost\asio\detail\impl\kqueue_reactor.ipp. </p> <p> Since I am building on Linux and not building kqueue facility I cannot provided validate patch for that, but I suspect it will be very similar to epoll's one. </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/8538 Trac 1.4.3 Leonid Gershanovich <gleonid@…> Wed, 01 May 2013 20:37:16 GMT attachment set https://svn.boost.org/trac10/ticket/8538 https://svn.boost.org/trac10/ticket/8538 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">epoll_reactor.ipp.patch</span> </li> </ul> <p> patch for boost/asio/detail/impl/epoll_reactor.ipp </p> Ticket Leonid Gershanovich <gleonid@…> Wed, 01 May 2013 20:49:02 GMT <link>https://svn.boost.org/trac10/ticket/8538#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8538#comment:1</guid> <description> <p> I ran into above problem when one thread was calling read operations on socket, while another thread, on certain conditions, was calling <em>close</em> on the same socket object. </p> <p> I know that on one hand asio's documentation states that asio's <em>basic_datagram_socket</em>, <em>basic_stream_socket</em> are not thread safe for shared objects, but on the other hand - asio's code already has mutexes and locks. </p> <p> In my mind it would be logical to either completely remove locking, since documentation does not make any guarantees about thread safety, or make sure that existing locks perform protection correctly. </p> <p> Thanks for taking time to consider my thoughts. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>viboes</dc:creator> <pubDate>Mon, 13 May 2013 21:31:39 GMT</pubDate> <title>component changed; owner set https://svn.boost.org/trac10/ticket/8538#comment:2 https://svn.boost.org/trac10/ticket/8538#comment:2 <ul> <li><strong>owner</strong> set to <span class="trac-author">chris_kohlhoff</span> </li> <li><strong>component</strong> <span class="trac-field-old">None</span> → <span class="trac-field-new">asio</span> </li> </ul> Ticket anonymous Tue, 14 May 2013 19:54:08 GMT <link>https://svn.boost.org/trac10/ticket/8538#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8538#comment:3</guid> <description> <p> <code>free_descriptor_state</code> does not destroy the <code>descriptor_state</code>. Instead, if marks it as being reusable within the <code>object_pool</code>. Thus, the <code>mutex::scoped_lock</code> will safely unlock the safe mutex it had originally locked. The mutex is only used to protect internal state information. It would be unfortunate if applications had to pay for additional locking overhead when using a single thread or if the underlying sockets already provide thread-safety. </p> </description> <category>Ticket</category> </item> <item> <author>Leonid Gershanovich <gleonid@…></author> <pubDate>Tue, 14 May 2013 21:40:02 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8538#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8538#comment:4</guid> <description> <p> You are right, free_descriptor_state does not free memory, thus my initial statement that segfault can occur in mutex::scoped_lock dtor is incorrect. However, segfault does happen, although in slightly different place. </p> <p> in epoll_reactor::start_op you can see following code: </p> <pre class="wiki">217: mutex::scoped_lock descriptor_lock(descriptor_data-&gt;mutex_); 218: 219: if (descriptor_data-&gt;shutdown_) 220: { 221: post_immediate_completion(op); 222: return; 223: } </pre><p> If thread 1 executes epoll_reactor::deregister_internal_descriptor quoted in the begining, while thread 2 executes epoll_reactor::start_op above, thread 2 will segfaults on line 219 if thread 1 completes line 367 (descriptor_data = 0) before thread 2 starts executing line 219. That was actually the problem I have initially ran into. Sorry for confusion, I should have initially described the exact problem I had. </p> <p> Patch that I have attached on May 1st covers that. </p> <p> While I completely agree with "It would be unfortunate if applications had to pay for additional locking overhead when using a single thread or if the underlying sockets already provide thread-safety". I just like to point out that mutexes are already in asio code, my patch neither introduce any new locks nor makes locked section longer. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>chris_kohlhoff</dc:creator> <pubDate>Fri, 24 May 2013 02:46:04 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/8538#comment:5 https://svn.boost.org/trac10/ticket/8538#comment:5 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">invalid</span> </li> </ul> <p> As you noted, your program violates the documented thread-safety requirements. The current implementation is not really relevant, and it could well change in the future. </p> Ticket