Opened 7 years ago
#11777 new Bugs
Async reads appear to make spurious calls to recvmsg that return EAGAIN
Reported by: | | Owned by: | chris_kohlhoff
---|---|---|---
Milestone: | To Be Determined | Component: | asio
Version: | Boost 1.54.0 | Severity: | Showstopper
Keywords: | | Cc: |
Description
Running strace on a program that runs many threads, each controlling many streams, shows the following pattern on one of the threads:
    epoll_wait(31, ..., 128, -1) = 46
    recvmsg(4114, ..., 0) = 2892
    recvmsg(4114, ..., 0) = -1 EAGAIN
    recvmsg(3700, ..., 0) = 16768
    recvmsg(3700, ..., 0) = -1 EAGAIN

and so on for all 46 sockets.
Running strace -c on the process, in which the bulk (90+%) of the threads service these sockets, shows 60% of the time in 27624 calls to write (we read data, process it, and write it to another socket), 34% in 52153 calls to recvmsg (24526 of which returned errors), and 6% in 4668 calls to epoll_wait.
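The strace -c totals can be cross-checked with simple arithmetic on the reported counts (no new measurement is involved). Assuming each successful read is followed by one write, the number of successful recvmsg calls should roughly match the number of writes, and it does; the failures come to about 0.89 per successful read, which fits the read-then-EAGAIN pattern in the trace:

```latex
\begin{aligned}
\text{failed share of recvmsg} &= \frac{24526}{52153} \approx 0.470 \\
\text{successful recvmsg calls} &= 52153 - 24526 = 27627 \approx 27624 \ \text{write calls} \\
\text{failures per successful read} &= \frac{24526}{27627} \approx 0.89
\end{aligned}
```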
Monitoring with the times() system call shows that most of the application's time is system time. Making system calls that return immediately with an error is an easy way to produce this, especially if the kernel maps or validates the buffers before it checks for EAGAIN.
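The times() measurement can be approximated with a small sketch. This is not the reporter's tooling: it uses Python's os.times(), which wraps times(2) on Unix, to split CPU time into user and system time around a loop of non-blocking recv() calls on an empty socketpair, each of which fails with EAGAIN. The helper names system_time_fraction and spin_on_eagain are hypothetical. Python issues recv(2) rather than the recvmsg(2) seen in the trace, and interpreter overhead adds user time, so the system share it reports will understate what a C++ program making the same calls would show.

```python
import os
import socket


def system_time_fraction(work) -> float:
    """Run work() and return the share of its CPU time spent in the kernel."""
    before = os.times()
    work()
    after = os.times()
    user = after.user - before.user
    system = after.system - before.system
    total = user + system
    # times() has clock-tick granularity, so very short runs may measure 0.
    return system / total if total > 0 else 0.0


def spin_on_eagain(iterations: int = 200_000) -> int:
    """Call recv() on an empty non-blocking socket; every call fails with EAGAIN."""
    a, b = socket.socketpair()  # b stays open but never sends anything
    a.setblocking(False)
    failures = 0
    try:
        for _ in range(iterations):
            try:
                a.recv(4096)
            except BlockingIOError:  # raised for EAGAIN / EWOULDBLOCK
                failures += 1
    finally:
        a.close()
        b.close()
    return failures
```

For example, `print(f"{system_time_fraction(spin_on_eagain):.0%} system time")` prints the kernel's share of the loop's CPU time; the exact figure depends on the machine and kernel.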
The release notes for subsequent versions do not mention a fix for this issue.
Unfortunately, the programs are proprietary, as are the data streams, which run on our customers' networks, so we cannot share them.
If epoll is used with edge-triggered events (EPOLLET), a new notification arrives when more data comes in, so if the first read is a partial one (it returns fewer bytes than requested), the extra read that returns EAGAIN is unnecessary.
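The difference can be illustrated with a small Linux-only sketch. It is written in Python for brevity, since the select.epoll and socket modules map closely onto the underlying system calls. It is not asio's implementation, and read_ready_socket and demo are hypothetical names. It registers one end of a socketpair with EPOLLIN | EPOLLET, writes 2892 bytes (the size of the first read in the trace), and then drains the socket in one of two ways: looping until recv() fails with EAGAIN, as the trace shows, or stopping after a short read, which the epoll(7) man page describes as sufficient for stream-oriented sockets.

```python
import select
import socket


def read_ready_socket(sock, bufsize: int = 4096, stop_on_short_read: bool = True):
    """Read what is currently available on a non-blocking stream socket.

    Returns (data, recv_calls). With stop_on_short_read=False the loop keeps
    calling recv() until it fails with EAGAIN (the pattern in the strace);
    with True it stops as soon as a read returns fewer than bufsize bytes.
    """
    chunks = []
    calls = 0
    while True:
        calls += 1
        try:
            chunk = sock.recv(bufsize)
        except BlockingIOError:  # EAGAIN: the receive buffer is drained
            break
        if not chunk:  # peer closed the connection
            break
        chunks.append(chunk)
        if stop_on_short_read and len(chunk) < bufsize:
            break  # short read on a stream socket: nothing more is queued
    return b"".join(chunks), calls


def demo(stop_on_short_read: bool, payload_size: int = 2892):
    """Write payload_size bytes, wait for an edge-triggered event, then drain."""
    reader, writer = socket.socketpair()  # AF_UNIX, SOCK_STREAM on Linux
    reader.setblocking(False)
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), select.EPOLLIN | select.EPOLLET)
        writer.sendall(b"x" * payload_size)
        events = ep.poll(1.0)  # timeout in seconds
        if not events or events[0][0] != reader.fileno():
            raise RuntimeError("expected a readiness event for the reader")
        return read_ready_socket(reader, stop_on_short_read=stop_on_short_read)
    finally:
        ep.close()
        reader.close()
        writer.close()
```

With these settings, demo(False) returns the 2892 bytes after two recv() calls, the second of which fails with EAGAIN, while demo(True) returns the same data after one call. The short-read shortcut only holds for stream sockets; on a datagram socket a short read says nothing about whether more datagrams are queued.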
Although this is an optimization, the extra calls are causing us a major performance problem on a large number of machines, hence the Showstopper severity.