Opened 8 years ago

Last modified 8 years ago

#11069 new Bugs

io_service hangs for 5 minutes

Reported by: dmitrmax@… Owned by: chris_kohlhoff
Milestone: To Be Determined Component: asio
Version: Boost 1.55.0 Severity: Problem
Keywords: Cc:

Description

Everything below applies to 64-bit Linux (CentOS 6.6), compiled with g++ 4.4.

I have an io_service object that is run by several threads. When the program finishes, all sockets belonging to this io_service are closed by design, and these threads begin to exit one by one.

Just before each thread exits, it posts a callback to the io_service. That callback joins the exiting thread from another thread and cleans up some associated data.

Everything runs perfectly until only one thread is left. Occasionally this thread does not run any posted handler for exactly 5 minutes; it just waits for something. After 5 minutes it wakes up, joins all the threads whose callbacks were posted, and exits successfully. It does not happen every time: I hit this situation roughly once every 50 program runs.

According to the asio sources, the epoll reactor uses a timeout of exactly 5 minutes, so there seems to be a bug somewhere.

P.S.: Meanwhile, the main program thread waits on a condition_variable that is signalled by the last thread running the io_service, to indicate that all the threads have exited.
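The 5-minute figure matches how a bounded reactor wait behaves. A reactor built on epoll blocks in epoll_wait(), which takes a timeout in milliseconds and returns 0 when nothing becomes ready in time. The sketch below is Linux-only and uses plain epoll, not asio; wait_with_nothing_ready and WaitResult are hypothetical names, and the demo timeout is an assumption, shortened from the 5-minute cap the reporter found in the asio sources so it finishes quickly. It shows that a thread blocked with nothing ready sleeps for the whole timeout, no matter what other threads do to queues the reactor is not watching.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

#include <cassert>
#include <chrono>

// Outcome of one bounded wait (hypothetical helper type).
struct WaitResult {
  int ready;                          // epoll_wait() return value; 0 means it timed out
  std::chrono::milliseconds elapsed;  // how long the call actually blocked
};

// Sketch only: plain Linux epoll, not asio's epoll_reactor.
// The epoll instance has no registered descriptors, so nothing can become
// ready and epoll_wait() sleeps for the whole timeout before returning 0.
WaitResult wait_with_nothing_ready(int max_wait_ms) {
  int epfd = epoll_create1(0);
  assert(epfd != -1 && "epoll_create1 failed");

  epoll_event events[8];
  auto start = std::chrono::steady_clock::now();
  int n = epoll_wait(epfd, events, 8, max_wait_ms);
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);

  close(epfd);
  return {n, elapsed};
}
```

For example, wait_with_nothing_ready(200) returns a WaitResult with ready == 0 after roughly 200 ms; passing 5 * 60 * 1000 would block for the full 5 minutes.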

Attachments (2)

boost.patch (2.0 KB ) - added by dmitrmax@… 8 years ago.
boost.2.patch (2.0 KB ) - added by dmitrmax@… 8 years ago.
Version 2 of the patch; the first one is not correct.


Change History (3)

by dmitrmax@…, 8 years ago

Attachment: boost.patch added

comment:1 by dmitrmax@…, 8 years ago

The bug has been fixed. It is a race condition that occurs under the following conditions:
1) io_service::run_one() is used to execute work;
2) only two threads calling run_one() are left;
3) one of these threads is executing a handler completion;
4) the handler posts another operation into the op_queue, and after that the first thread does not call run_one() again;
5) meanwhile, after the first thread has started the handler completion but before the handler posts the new op, the second thread is running the reactor (epoll_reactor in my case).
As a result, the new op sits in the queue while the second thread stays blocked in the reactor until the reactor's 5-minute timeout expires.

After the first thread has executed the handler completion, it should check for new work and, if there is any, wake up an idle thread or interrupt the running reactor. I put this logic into the work_cleanup object.
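The race and the fix described in this comment can be reproduced in a toy model outside asio. The sketch below is Linux-only and is not the attached patch: ToyScheduler, WorkCleanup and delay_of_posted_op are hypothetical names, and modelling the reactor as a single epoll_wait() on an eventfd interrupter is an assumption (asio's epoll reactor likewise wakes epoll_wait() through an interrupter descriptor, but the details here are simplified). Without the check, an op posted by a completing handler sits in the queue until the reactor's wait times out; with a work_cleanup-style check after the handler, the reactor is interrupted and runs the op almost immediately.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

using Clock = std::chrono::steady_clock;

// Toy model, not asio: a queue of ops plus a "reactor" that blocks in
// epoll_wait() with an eventfd registered as its interrupter.
class ToyScheduler {
 public:
  ToyScheduler() : epfd_(epoll_create1(0)), wakeup_fd_(eventfd(0, EFD_NONBLOCK)) {
    assert(epfd_ != -1 && wakeup_fd_ != -1);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = wakeup_fd_;
    int rc = epoll_ctl(epfd_, EPOLL_CTL_ADD, wakeup_fd_, &ev);
    assert(rc == 0);
    (void)rc;
  }
  ~ToyScheduler() { close(wakeup_fd_); close(epfd_); }

  // Queue an op without waking anyone (a handler's post in this model).
  void push(std::function<void()> op) {
    std::lock_guard<std::mutex> lock(mutex_);
    ops_.push_back(std::move(op));
  }

  // If work is queued, make the eventfd readable so a blocked
  // epoll_wait() returns at once.
  void interrupt_if_pending() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (ops_.empty()) return;
    std::uint64_t one = 1;
    ssize_t written = write(wakeup_fd_, &one, sizeof one);
    (void)written;
  }

  // One reactor pass: block for at most max_wait_ms, reset the interrupter
  // if it fired, then run every queued op.
  void run_reactor_once(int max_wait_ms) {
    epoll_event events[8];
    if (epoll_wait(epfd_, events, 8, max_wait_ms) > 0) {
      std::uint64_t count;
      ssize_t got = read(wakeup_fd_, &count, sizeof count);
      (void)got;
    }
    std::deque<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ready.swap(ops_);
    }
    for (auto& op : ready) op();
  }

 private:
  int epfd_;
  int wakeup_fd_;
  std::mutex mutex_;
  std::deque<std::function<void()>> ops_;
};

// Hypothetical analogue of the fix: when a handler has completed, check for
// new work and, if there is any, interrupt the reactor.
struct WorkCleanup {
  ToyScheduler& sched;
  ~WorkCleanup() { sched.interrupt_if_pending(); }
};

// Thread B blocks in the reactor; this thread (A) completes a handler that
// posts one op and then stops running the scheduler. Returns how long the
// posted op waited before B ran it.
std::chrono::milliseconds delay_of_posted_op(bool with_fix, int max_wait_ms) {
  ToyScheduler sched;
  Clock::time_point posted, ran;
  std::thread reactor([&] { sched.run_reactor_once(max_wait_ms); });
  std::this_thread::sleep_for(std::chrono::milliseconds(50));  // let B block first
  auto handler = [&] {
    posted = Clock::now();
    sched.push([&] { ran = Clock::now(); });
  };
  if (with_fix) {
    WorkCleanup cleanup{sched};  // its destructor runs right after handler()
    handler();
  } else {
    handler();
  }
  reactor.join();
  return std::chrono::duration_cast<std::chrono::milliseconds>(ran - posted);
}
```

For example, delay_of_posted_op(false, 600) reports a delay of roughly half a second (the rest of the reactor's wait), while delay_of_posted_op(true, 600) reports only a few milliseconds. Scaling the 600 ms cap up to 5 minutes gives the hang described in this ticket. Putting the check in a destructor means it runs however the handler scope is left.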

by dmitrmax@…, 8 years ago

Attachment: boost.2.patch added

Version 2 of the patch; the first one is not correct.
