Opened 13 years ago

Closed 13 years ago

#3670 closed Bugs (fixed)

endless loop in dev_poll_reactor on Solaris

Reported by: dirkmoermans@… Owned by: chris_kohlhoff
Milestone: Boost 1.42.0 Component: asio
Version: Boost 1.41.0 Severity: Regression
Keywords: Cc:

Description

The following example-program exposes high CPU-usage on Solaris 10, despite not doing anything useful while sleeping. Top reports a CPU-usage between 10 and 25% on a 4 CPU-box.

In order to reproduce, a TCP-server has to be listening on the configured host-port.

#include <boost/asio.hpp>
#include <boost/thread/thread.hpp>
#include <time.h>
#include <vector>
#include <iostream>

// g++ -pthread -g2 -I/opt/include/boost-1_41 -L/opt/lib -lboost_system-gcc43-mt-1_41 -lboost_thread-gcc43-mt-1_41 -lsocket -lnsl f.cc

// this program exposes a high CPU usage when run on solaris 10 using the dev_poll_reactor (the default).

const char HOSTNAME[]="localhost";
const char PORT[]="46000";
const int THREADS=3;
const int SOCKETS=10;
const int LOOPS=10;
const int SLEEPTIME=10; // seconds

void handler(const boost::system::error_code& error)
{
   std::cout << "socket connected '" << error << "'" << std::endl;
} 

int main()
{
   using namespace boost::asio::ip;

   boost::asio::io_service service;
   boost::asio::io_service::work work(service);
   boost::thread_group pool;
   for(int i=0;i<THREADS;++i)
      pool.create_thread(boost::bind(&boost::asio::io_service::run, boost::ref(service)));

   tcp::resolver resolver(service);
   tcp::resolver::query query(HOSTNAME,PORT);
   tcp::resolver::iterator result=resolver.resolve(query);
   const tcp::resolver::iterator end;
   if(result==end)
   {
      return -1;
   }

   for(int i=0;0<LOOPS;++i)
   {
      typedef std::vector<boost::shared_ptr<boost::asio::ip::tcp::socket> > Sockets;
      Sockets sockets;
      for(int j=0;j<SOCKETS;++j)
      {
         sockets.push_back(
            boost::shared_ptr<boost::asio::ip::tcp::socket>
            (new boost::asio::ip::tcp::socket(service,boost::asio::ip::tcp::v4())));
      }
      
      for(Sockets::iterator si=sockets.begin();
          si!=sockets.end();
          ++si)
      {
         (*si)->async_connect(*result,handler);
      }
      sleep(SLEEPTIME);
   }
}

The cause seems to be an endless loop in dev_poll_reactor::run

The problematic stacktrace:

#0  0xffffffff7e0d3a50 in _write () from /lib/64/libc.so.1
#1  0xffffffff7e0c4394 in write () from /lib/64/libc.so.1
#2  0x0000000100024258 in boost::asio::detail::dev_poll_reactor<false>::run (this=0x100131e80, block=true) at /opt/gnu4u_prince_ext/include/boost-1_41/boost/asio/detail/dev_poll_reactor.hpp:467
#3  0x0000000100024850 in boost::asio::detail::task_io_service<boost::asio::detail::dev_poll_reactor<false> >::do_one (this=0x10012f090, lock=@0xffffffff7d0fba28, this_idle_thread=0xffffffff7d0fba08, ec=@0xffffffff7d0fbb18)
    at /opt/gnu4u_prince_ext/include/boost-1_41/boost/asio/detail/task_io_service.hpp:260
#4  0x0000000100024adc in boost::asio::detail::task_io_service<boost::asio::detail::dev_poll_reactor<false> >::run (this=0x10012f090, ec=@0xffffffff7d0fbb18)
    at /opt/gnu4u_prince_ext/include/boost-1_41/boost/asio/detail/task_io_service.hpp:103
#5  0x0000000100024c34 in boost::asio::io_service::run (this=0xffffffff7ffff028) at /opt/gnu4u_prince_ext/include/boost-1_41/boost/asio/impl/io_service.ipp:58
#6  0x000000010000d33c in boost::_mfi::mf0<unsigned long, boost::asio::io_service>::operator() (this=0x100131b20, t=@0xffffffff7ffff028) at /opt/gnu4u_prince_ext/include/boost-1_41/boost/bind/mem_fn_template.hpp:68
#7  0x000000010000d400 in boost::_bi::list1<boost::reference_wrapper<boost::asio::io_service> >::operator()<unsigned long, boost::_mfi::mf0<unsigned long, boost::asio::io_service>, boost::_bi::list0> (this=0x100131b30, 
    f=@0x100131b20, a=@0xffffffff7d0fbd9f) at /opt/gnu4u_prince_ext/include/boost-1_41/boost/bind/bind.hpp:236
#8  0x000000010000d460 in boost::_bi::bind_t<unsigned long, boost::_mfi::mf0<unsigned long, boost::asio::io_service>, boost::_bi::list1<boost::reference_wrapper<boost::asio::io_service> > >::operator() (this=0x100131b20)
    at /opt/gnu4u_prince_ext/include/boost-1_41/boost/bind/bind_template.hpp:20
#9  0x000000010000d48c in boost::detail::thread_data<boost::_bi::bind_t<unsigned long, boost::_mfi::mf0<unsigned long, boost::asio::io_service>, boost::_bi::list1<boost::reference_wrapper<boost::asio::io_service> > > >::run (
    this=0x100131a50) at /opt/gnu4u_prince_ext/include/boost-1_41/boost/thread/detail/thread.hpp:56
#10 0xffffffff7f00ba30 in thread_proxy () from /opt/gnu4u_prince_ext/lib/libboost_thread-gcc43-mt-1_41.so.1.41.0
#11 0xffffffff7e0d24e0 in _lwp_start () from /lib/64/libc.so.1
#12 0xffffffff7e0d24e0 in _lwp_start () from /lib/64/libc.so.1

the local variables just after the write:

(gdb) info locals
ev = {fd = 13, events = 24, revents = 0}
result = 8
more_reads = false
more_writes = false
more_except = false
ec = {m_val = 0, m_cat = 0xffffffff7f4033a8}
descriptor = 13
i = 0
lock = {<boost::noncopyable_::noncopyable> = {<No data fields>}, mutex_ = @0x100131ea8, locked_ = true}
events_size = 0
timeout = -1
events = {{fd = 13, events = 28, revents = 4}, {fd = 14, events = 28, revents = 4}, {fd = 16, events = 28, revents = 4}, {fd = 0, events = 0, revents = 0} <repeats 125 times>}
dp = {dp_fds = 0xffffffff7d0fb2d0, dp_nfds = 128, dp_timeout = -1}
num_events = 3
sb = {<boost::noncopyable_::noncopyable> = {<No data fields>}, blocked_ = true, old_mask_ = {__sigbits = {0, 0, 0, 0}}}

The program works fine on Linux and works fine on Solaris when compiled with the flag -DBOOST_ASIO_DISABLE_DEV_POLL.

I am pretty sure that this behaviour has been present since at least boost 1.37.

Change History (5)

comment:1 by chris_kohlhoff, 13 years ago

I cannot reproduce the problem. However, please try the following fix and report the results:

Index: dev_poll_reactor.hpp =================================================================== --- dev_poll_reactor.hpp (revision 58628) +++ dev_poll_reactor.hpp (working copy) @@ -435,7 +435,6 @@

more_writes = write_op_queue_.has_operation(descriptor);

if ((events[i].events & (POLLERR | POLLHUP)) != 0

  • && (events[i].events & ~(POLLERR | POLLHUP)) == 0

&& !more_except && !more_reads && !more_writes)

{

If we have an event and no operations associated with the

comment:2 by dirkmoermans@…, 13 years ago

Thanks for the patch. I will try it out as soon as I get access to a solaris box (somewhere next week).

comment:3 by dirkmoermans@…, 13 years ago

The patch solves the problem for me.

I dumped some traces with the contents of events[i].events and got as output without the patch in an endless loop:

events[i].events=1c
events[i].events & ~(POLLERR | POLLHUP) = 4

4 corresponds to POLLOUT in sys/poll.h:

#define POLLOUT         0x0004          /* fd is writeable (won't block) */

I don't know why it is not reproducable, but it would be great if you would include your patch in the next boost-release.

comment:4 by chris_kohlhoff, 13 years ago

(In [58762]) Apply fix for reported excessive CPU usage under Solaris. Refs #3670.

comment:5 by chris_kohlhoff, 13 years ago

Resolution: fixed
Status: newclosed

(In [58883]) Merge from trunk. Fixes #3743, #3670, #3822.

........

r58670 | chris_kohlhoff | 2010-01-04 23:33:04 +1100 (Mon, 04 Jan 2010) | 2 lines

Fix example to compile with MSVC 10 beta 2.

........

r58671 | chris_kohlhoff | 2010-01-04 23:33:42 +1100 (Mon, 04 Jan 2010) | 2 lines

Fix Win64 warnings.

........

r58703 | chris_kohlhoff | 2010-01-05 22:51:41 +1100 (Tue, 05 Jan 2010) | 3 lines

Include boost/limits.hpp rather than <limits>, to support older compilers. Refs #3743.

........

r58704 | chris_kohlhoff | 2010-01-05 23:20:10 +1100 (Tue, 05 Jan 2010) | 2 lines

Drop back to second_clock if microsec_clock is unavailable. Refs #3743.

........

r58705 | chris_kohlhoff | 2010-01-05 23:50:39 +1100 (Tue, 05 Jan 2010) | 2 lines

Use sockatmark if SIOCATMARK is not defined. Refs #3743.

........

r58740 | chris_kohlhoff | 2010-01-06 13:38:39 +1100 (Wed, 06 Jan 2010) | 2 lines

Use buffer debugging workaround with MSVC 8 only.

........

r58761 | chris_kohlhoff | 2010-01-06 23:27:05 +1100 (Wed, 06 Jan 2010) | 2 lines

Disable iostreams-related functionality if BOOST_NO_IOSTREAMS is defined. Refs #3743.

........

r58762 | chris_kohlhoff | 2010-01-06 23:36:51 +1100 (Wed, 06 Jan 2010) | 2 lines

Apply fix for reported excessive CPU usage under Solaris. Refs #3670.

........

r58782 | chris_kohlhoff | 2010-01-07 09:44:48 +1100 (Thu, 07 Jan 2010) | 2 lines

Disable handler allocation and invocation hooks on g++ 2.x. Refs #3743.

........

r58793 | chris_kohlhoff | 2010-01-08 09:18:16 +1100 (Fri, 08 Jan 2010) | 2 lines

Support platforms that don't define INET6_ADDRSTRLEN. Refs #3743.

........

Note: See TracTickets for help on using tickets.