Opened 8 years ago
Last modified 8 years ago
#11069 new Bugs
io_service hangs for 5 minutes
Reported by: | | Owned by: | chris_kohlhoff |
---|---|---|---|
Milestone: | To Be Determined | Component: | asio |
Version: | Boost 1.55.0 | Severity: | Problem |
Keywords: | | Cc: | |
Description
Everything said below is applicable to Linux, 64-bit, CentOS 6.6, compiled by g++ 4.4.
I have an io_service object which is run by several threads. By design, when the program finishes, all sockets belonging to this io_service are closed and the threads begin to exit one by one.
Just before a thread exits, it posts a callback on the io_service; the callback joins that thread from another thread and cleans up some associated data.
Everything runs perfectly until only one thread is left. Occasionally this thread does not run any posted handler for exactly 5 minutes. It is just waiting for something. After 5 minutes it wakes up, joins all the threads whose callbacks were posted, and exits successfully. It doesn't happen every time; I hit this situation roughly once in every 50 program executions.
According to the asio sources, the epoll reactor has a timeout of exactly 5 minutes, which matches the length of the hang. It seems that there is a bug somewhere.
P.S.: Meanwhile the main program thread waits on a condition_variable which is signaled by the last thread running the io_service, to indicate that all the threads have exited.
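The "last thread signals the main thread" arrangement described above can be sketched with the standard library alone. This is only an illustration under assumptions: `exit_signal`, `worker_exited`, `wait_for_all` and `run_workers` are hypothetical names, not part of the reporter's program or of asio, and the workers here do nothing except report that they are exiting.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: lets the main thread block until every worker has reported in.
class exit_signal {
public:
    explicit exit_signal(int workers) : remaining_(workers) {}

    // Called by each worker as its last action; the final caller wakes the main thread.
    void worker_exited() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--remaining_ == 0)
            all_exited_.notify_all();
    }

    // Called by the main thread; the predicate covers spurious wake-ups and the
    // case where all workers finished before the wait even started.
    void wait_for_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_exited_.wait(lock, [this] { return remaining_ == 0; });
    }

    int remaining() {
        std::lock_guard<std::mutex> lock(mutex_);
        return remaining_;
    }

private:
    std::mutex mutex_;
    std::condition_variable all_exited_;
    int remaining_;
};

// Starts `count` workers, waits for the signal, and returns how many are still outstanding.
inline int run_workers(int count) {
    exit_signal done(count);
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i)
        workers.emplace_back([&done] { done.worker_exited(); });
    done.wait_for_all();
    for (auto& t : workers)
        t.join();
    return done.remaining();
}
```

With this arrangement the main thread depends entirely on the last worker making progress, which is why a stalled last thread stalls the whole shutdown.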
The bug was fixed. It is a race condition. It happens under the following conditions:
1) io_service::run_one() is used to run handlers;
2) only two threads calling run_one() are left;
3) one of these threads is executing a handler;
4) the handler posts another operation into the op_queue, and after that the first thread doesn't call run_one() anymore;
5) meanwhile, after the first thread has started executing the handler but before the handler posts the new operation, the second thread starts running the reactor (epoll_reactor in my case).
After the first thread has finished executing the handler, it should check for new work and, if there is any, wake up another thread or interrupt the running reactor. I put this logic into the work_cleanup object.
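The idea behind the fix can be illustrated with a small standard-library model. This is a sketch under assumptions, not the actual patch and not asio's code: `mini_queue`, `post_deferred`, `cleanup_guard` and `run_demo` are hypothetical names, a condition variable with a timeout stands in for the reactor's long wait, and 5 seconds stands in for the 5 minutes. The guard's destructor runs after the handler returns and wakes a waiting thread if the handler left new work in the queue.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical miniature of a handler queue; none of these names come from asio.
class mini_queue {
public:
    using handler = std::function<void()>;

    // Enqueue without waking anyone: models a handler posting new work
    // from a thread that is itself running handlers.
    void post_deferred(handler h) {
        std::lock_guard<std::mutex> lock(mutex_);
        ops_.push_back(std::move(h));
    }

    // Runs at most one handler and returns how many were run (0 or 1).
    // The timed wait stands in for the reactor's long timeout.
    std::size_t run_one(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!wakeup_.wait_for(lock, timeout, [this] { return !ops_.empty(); }))
            return 0;
        handler h = std::move(ops_.front());
        ops_.pop_front();
        lock.unlock();
        cleanup_guard guard(*this);  // destructor runs after h() returns or throws
        h();
        return 1;
    }

private:
    // After a handler finishes, wake another thread if new work is pending.
    struct cleanup_guard {
        explicit cleanup_guard(mini_queue& q) : queue(q) {}
        ~cleanup_guard() {
            std::lock_guard<std::mutex> lock(queue.mutex_);
            if (!queue.ops_.empty())
                queue.wakeup_.notify_one();
        }
        mini_queue& queue;
    };

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::deque<handler> ops_;
};

struct demo_result {
    bool follow_up_ran;
    std::chrono::milliseconds waited;
};

// Replays the scenario: the first thread runs one handler that posts a
// follow-up and then stops; the second thread is already in its long wait.
inline demo_result run_demo() {
    using namespace std::chrono;
    mini_queue q;
    demo_result result{false, milliseconds(0)};
    std::thread second;

    q.post_deferred([&] {
        second = std::thread([&] {
            auto start = steady_clock::now();
            q.run_one(seconds(5));
            result.waited = duration_cast<milliseconds>(steady_clock::now() - start);
        });
        std::this_thread::sleep_for(milliseconds(100));  // let the second thread start waiting
        q.post_deferred([&] { result.follow_up_ran = true; });
    });

    q.run_one(milliseconds(0));  // first thread: one handler, then no more run_one() calls
    assert(second.joinable());
    second.join();
    return result;
}
```

In this model, removing the `notify_one()` call from the guard leaves the second thread asleep until its timeout expires, after which it finds and runs the follow-up handler. That mirrors the delayed but eventually successful exit described in the report.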