Opened 6 years ago

Last modified 6 years ago

#12298 new Bugs

epoll_wait hang

Reported by: baibin <406455861@…>
Owned by: chris_kohlhoff
Milestone: To Be Determined
Component: asio
Version: Boost 1.53.0
Severity: Problem
Keywords: epool_wait
Cc:

Description

We use sofa-pbrpc (https://github.com/baidu/sofa-pbrpc), which uses Boost.Asio as its network library.

The code looks like this:

boost::asio::io_service io_ser;
auto work = new boost::asio::io_service::work(io_ser);

After I delete work, io_ser stays in its run() function and never stops, so I printed the io_ser class members, as below:

(gdb) p *(boost::asio::detail::task_io_service * const) 0x22738e0
$26 = {<boost::asio::detail::service_base<boost::asio::detail::task_io_service>> = {<boost::asio::io_service::service> = {<boost::noncopyable_::noncopyable> = {<No data fields>},

_vptr.service = 0xa24a90 <vtable for boost::asio::detail::task_io_service+16>, key_ = {type_info_ = 0xa245a0 <typeinfo for boost::asio::detail::typeid_wrapper<boost::asio::detail::task_io_service>>,

id_ = 0x0}, owner_ = @0x228f6d0, next_ = 0x0}, static id = {<boost::asio::io_service::id> = {<boost::noncopyable_::noncopyable> = {<No data fields>}, <No data fields>}, <No data fields>}},

one_thread_ = false, mutex_ = {<boost::noncopyable_::noncopyable> = {<No data fields>}, mutex_ = {data = {lock = 0, count = 0, owner = 0, nusers = 7, kind = 0, spins = 0, list = {

prev = 0x0, next = 0x0}}, size = '\000' <repeats 12 times>, "\a", '\000' <repeats 26 times>, align = 0}}, task_ = 0x2266e10,

task_operation_ = {<boost::asio::detail::task_io_service_operation> = {next_ = 0x0, func_ = 0x0, task_result_ = 0}, <No data fields>}, task_interrupted_ = false, outstanding_work_ = {value_ = 3}, op_queue_ = {<boost::noncopyable_::noncopyable> = {<No data fields>}, front_ = 0x0, back_ = 0x0}, stopped_ = false, shutdown_ = false, first_idle_thread_ = 0x7f182991fce0}
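One possible reading of this dump, not confirmed anywhere in this ticket: outstanding_work_ is still 3 and stopped_ is false, so deleting the work object alone may not have brought the count to zero. Below is a toy model of that counting idea. It is not asio's implementation; ToyLoop, ToyWork and remaining_after_work_deleted are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Toy stand-in for an event loop's bookkeeping; NOT asio's code.
class ToyLoop {
public:
    void work_started() { ++outstanding_work_; }
    void work_finished() { --outstanding_work_; }
    // A run loop built on this counter could return only when it is zero.
    bool can_exit() const { return outstanding_work_ == 0; }
    int outstanding() const { return outstanding_work_; }

private:
    std::atomic<int> outstanding_work_{0};
};

// Toy stand-in for a work guard: counts itself in on construction
// and out on destruction.
class ToyWork {
public:
    explicit ToyWork(ToyLoop& loop) : loop_(loop) { loop_.work_started(); }
    ~ToyWork() { loop_.work_finished(); }
    ToyWork(const ToyWork&) = delete;
    ToyWork& operator=(const ToyWork&) = delete;

private:
    ToyLoop& loop_;
};

// Returns the count left after the guard is destroyed while
// `pending_ops` hypothetical asynchronous operations are still in flight.
int remaining_after_work_deleted(int pending_ops) {
    ToyLoop loop;
    std::unique_ptr<ToyWork> work(new ToyWork(loop));
    for (int i = 0; i < pending_ops; ++i) {
        loop.work_started();  // each pending operation counts as work
    }
    work.reset();  // same effect as `delete work` in the ticket
    return loop.outstanding();
}
```

In this model, remaining_after_work_deleted(0) leaves nothing outstanding, while remaining_after_work_deleted(3) leaves a count of 3, which would keep a loop of this kind waiting.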

I also found the stack of one of the threads, shown below:

Thread 6 (Thread 0x7f182a321700 (LWP 30142)):
#0 0x00007f1833fb2163 in epoll_wait () from /lib64/libc.so.6
#1 0x000000000061f888 in boost::asio::detail::epoll_reactor::run (this=0x2266e10, block=<optimized out>, ops=...) at /usr/local/include/boost/asio/detail/impl/epoll_reactor.ipp:392
#2 0x0000000000624671 in boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., this=0x22738e0) at /usr/local/include/boost/asio/detail/impl/task_io_service.ipp:396
#3 boost::asio::detail::task_io_service::run (this=0x22738e0, ec=...) at /usr/local/include/boost/asio/detail/impl/task_io_service.ipp:153
#4 0x000000000062521e in boost::asio::io_service::run (this=0x228f6d0) at /usr/local/include/boost/asio/impl/io_service.ipp:59
#5 sofa::pbrpc::ThreadGroupImpl::thread_run (param=0x22bf100) at src/sofa/pbrpc/thread_group_impl.h:263
#6 0x00007f1834e7a9d1 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f1833fb1b6d in clone () from /lib64/libc.so.6

Because it hangs, I wanted to check whether the time_out parameter of the epoll_wait function is -1, so I printed epoll_reactor::timer_fd_:

(gdb) p timer_fd_
$6 = 515

So we are not using a normal value for time_out. So why does epoll_wait never return?
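For reference on the timeout question: epoll_wait called with a timeout of -1 blocks until a registered descriptor becomes ready or a signal interrupts the call. The sketch below is Linux-only and is not asio code. It shows such a call returning once another thread writes to an eventfd registered with the epoll instance. The names wake_fd and wait_forever_until_woken are hypothetical.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdint>
#include <thread>

// Blocks in epoll_wait with timeout -1 and returns the number of ready
// events once a helper thread makes wake_fd readable.
// Returns -1 if setting up the descriptors fails.
int wait_forever_until_woken() {
    int ep = epoll_create1(0);
    if (ep == -1) return -1;

    int wake_fd = eventfd(0, 0);
    if (wake_fd == -1) {
        close(ep);
        return -1;
    }

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = wake_fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, wake_fd, &ev) == -1) {
        close(wake_fd);
        close(ep);
        return -1;
    }

    // Helper thread: after a short delay, make wake_fd readable.
    std::thread waker([wake_fd] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        std::uint64_t one = 1;
        ssize_t written = write(wake_fd, &one, sizeof one);
        (void)written;
    });

    // Timeout -1: no time limit. Retry if a signal interrupts the call.
    epoll_event out{};
    int n;
    do {
        n = epoll_wait(ep, &out, 1, -1);
    } while (n == -1 && errno == EINTR);

    waker.join();
    close(wake_fd);
    close(ep);
    return n;
}
```

Without the helper thread's write, the same call would have nothing to wake it and would block indefinitely, which is the behaviour a timeout of -1 asks for.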

Change History (3)

comment:1 by baibin <406455861@…>, 6 years ago

Sorry, I submitted this twice. This is the same as #12297.

comment:2 by mail2tao@…, 6 years ago

I hit a similar issue: all the data had been sent to the remote endpoint (which received all of it successfully), but the send callback was never called because epoll_wait hung.

BTW, it seems this issue has been fixed on the master branch.
