Opened 9 years ago

Closed 9 years ago

#8538 closed Bugs (invalid)

asio: race condition for epoll & kpoll

Reported by: Leonid Gershanovich <gleonid@…> Owned by: chris_kohlhoff
Milestone: To Be Determined Component: asio
Version: Boost 1.53.0 Severity: Problem
Keywords: Cc:

Description

Please consider following code in boost\asio\detail\impl\epoll_reactor.ipp:

345: void epoll_reactor::deregister_internal_descriptor(socket_type descriptor,
346:     epoll_reactor::per_descriptor_data& descriptor_data)
347: {
...
351:     mutex::scoped_lock descriptor_lock(descriptor_data->mutex_);
...
365:     descriptor_lock.unlock();
366:     free_descriptor_state(descriptor_data);
367:     descriptor_data = 0;
368: }

Lets assume, that thread 1 executes epoll_reactor::deregister_internal_descriptor and successfully acquired lock (descriptor_data->mutex_), while thread 2 is waiting to acquire same lock in this or different function (epoll_reactor::start_op, epoll_reactor::cancel_ops).

As soon as thread 1 executes descriptor_lock.unlock(); on line 365, thread 2 will acquire descriptor_data->mutex_ it was waiting for.

At this point thread 1 executes free_descriptor_state (line 366) effectively destroying mutex, which means that when thread 2 will attempt to release mutex, as part of mutex::scoped_lock dtor, it might seg fault, because memory used for mutex already might have been allocated for something else by thread 1.

Above I quoted code fragment from epoll_reactor::deregister_internal_descriptor, but exactly same issue is present in void epoll_reactor::deregister_descriptor.

I am attaching complete patch to this ticket.

Furthermore, looks like same problem present in boost\asio\detail\impl\kqueue_reactor.ipp.

Since I am building on Linux and not building kqueue facility I cannot provided validate patch for that, but I suspect it will be very similar to epoll's one.

Attachments (1)

epoll_reactor.ipp.patch (4.7 KB ) - added by Leonid Gershanovich <gleonid@…> 9 years ago.
patch for boost/asio/detail/impl/epoll_reactor.ipp

Download all attachments as: .zip

Change History (6)

by Leonid Gershanovich <gleonid@…>, 9 years ago

Attachment: epoll_reactor.ipp.patch added

patch for boost/asio/detail/impl/epoll_reactor.ipp

comment:1 by Leonid Gershanovich <gleonid@…>, 9 years ago

I ran into above problem when one thread was calling read operations on socket, while another thread, on certain conditions, was calling close on the same socket object.

I know that on one hand asio's documentation states that asio's basic_datagram_socket, basic_stream_socket are not thread safe for shared objects, but on the other hand - asio's code already has mutexes and locks.

In my mind it would be logical to either completely remove locking, since documentation does not make any guarantees about thread safety, or make sure that existing locks perform protection correctly.

Thanks for taking time to consider my thoughts.

comment:2 by viboes, 9 years ago

Component: Noneasio
Owner: set to chris_kohlhoff

comment:3 by anonymous, 9 years ago

free_descriptor_state does not destroy the descriptor_state. Instead, if marks it as being reusable within the object_pool. Thus, the mutex::scoped_lock will safely unlock the safe mutex it had originally locked. The mutex is only used to protect internal state information. It would be unfortunate if applications had to pay for additional locking overhead when using a single thread or if the underlying sockets already provide thread-safety.

comment:4 by Leonid Gershanovich <gleonid@…>, 9 years ago

You are right, free_descriptor_state does not free memory, thus my initial statement that segfault can occur in mutex::scoped_lock dtor is incorrect. However, segfault does happen, although in slightly different place.

in epoll_reactor::start_op you can see following code:

217:  mutex::scoped_lock descriptor_lock(descriptor_data->mutex_);
218:
219:  if (descriptor_data->shutdown_)
220:  {
221:    post_immediate_completion(op);
222:    return;
223:  }

If thread 1 executes epoll_reactor::deregister_internal_descriptor quoted in the begining, while thread 2 executes epoll_reactor::start_op above, thread 2 will segfaults on line 219 if thread 1 completes line 367 (descriptor_data = 0) before thread 2 starts executing line 219. That was actually the problem I have initially ran into. Sorry for confusion, I should have initially described the exact problem I had.

Patch that I have attached on May 1st covers that.

While I completely agree with "It would be unfortunate if applications had to pay for additional locking overhead when using a single thread or if the underlying sockets already provide thread-safety". I just like to point out that mutexes are already in asio code, my patch neither introduce any new locks nor makes locked section longer.

comment:5 by chris_kohlhoff, 9 years ago

Resolution: invalid
Status: newclosed

As you noted, your program violates the documented thread-safety requirements. The current implementation is not really relevant, and it could well change in the future.

Note: See TracTickets for help on using tickets.