Opened 6 years ago
Last modified 6 years ago
#12474 new Bugs
Using two different resolver instances on the same io_service causes a race condition
Reported by: |  | Owned by: | chris_kohlhoff
---|---|---|---
Milestone: | To Be Determined | Component: | asio |
Version: | Boost 1.61.0 | Severity: | Problem |
Keywords: | race condition, data race | Cc: |
Description
Dear developer (or developers) of Boost Asio,
I think I've found a bug in Asio. If I instantiate two different TCP resolvers on the same io_service and use these two distinct resolvers to resolve the same endpoint, a race condition occurs.
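To make the scenario concrete, the pattern I mean is roughly the following (a simplified sketch, not one of the attached test programs; the host, port, handler and thread count are placeholders):

#include <boost/asio.hpp>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    using boost::asio::ip::tcp;

    boost::asio::io_service io;

    // Two distinct resolvers on the same io_service.
    tcp::resolver resolver1(io);
    tcp::resolver resolver2(io);

    // Both resolve the same endpoint (placeholder host/port).
    tcp::resolver::query query("127.0.0.1", "5555");

    auto handler = [](const boost::system::error_code& ec,
                      tcp::resolver::iterator)
    {
        std::cout << (ec ? "resolve failed" : "resolved") << '\n';
    };

    resolver1.async_resolve(query, handler);
    resolver2.async_resolve(query, handler);

    // The problem is only observed when run() is called from multiple
    // threads; with a single-threaded run() everything is fine.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&io] { io.run(); });
    for (auto& t : pool)
        t.join();
}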
I have attached two sources (C++11 is required) that reproduce the issue. Unfortunately, you need to run the programs many times to hit the failure; I have bash scripts that allow me to run these programs thousands of times, each time using a different, random, free port on the local host.
Quick instructions for running the tests:
test_engine_client_dbg_x -s -p <port> runs the server.
test_engine_client_dbg_x -c -p <port> runs the client.
test_engine_client_dbg_2.cpp is the source that fails (at line 364) because, for no apparent reason, the pointer held by mEngineClient is sometimes found to be NULL. I've also verified that if I insert a while loop at that point that waits until mEngineClient.get() != nullptr, the code proceeds with no error (meaning that, at some point, the pointer is set back to the correct value). But, I repeat, there is no reason why mEngineClient.get() should be NULL at that point.
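For clarity, that workaround is nothing more than the following (a sketch only; mEngineClient stands for the smart-pointer member of the attached source, and EngineClient/doWork are placeholders):

#include <memory>
#include <thread>

struct EngineClient { void doWork() {} };       // placeholder type

std::shared_ptr<EngineClient> mEngineClient;    // a class member in the real code

void useEngineClient()
{
    // Busy-wait until the pointer becomes non-null again. With this loop in
    // place the failure at line 364 never shows up, i.e. the pointer does
    // become valid at some point; without it, mEngineClient.get() is
    // sometimes still NULL here, which should never happen.
    while (mEngineClient.get() == nullptr)
        std::this_thread::yield();

    mEngineClient->doWork();
}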
test_engine_client_dbg_4.cpp is the source that works. Notice that this time I'm not instantiating a second resolver within the EngineClient class, but simply passing the already-resolved endpoint iterator to the EngineClient's constructor. With this change, we never lose the mEngineClient pointer.
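In other words, the endpoint is resolved once, up front, and the iterator is handed to the class, roughly like this (again a sketch only; this EngineClient is a placeholder, not the class from the attached source):

#include <boost/asio.hpp>

using boost::asio::ip::tcp;

class EngineClient
{
public:
    EngineClient(boost::asio::io_service& io, tcp::resolver::iterator endpoints)
        : mSocket(io), mEndpoints(endpoints)
    {
        // No second resolver here: we connect directly to the endpoints
        // that were already resolved by the caller.
        boost::asio::async_connect(mSocket, mEndpoints,
            [](const boost::system::error_code&, tcp::resolver::iterator) {});
    }

private:
    tcp::socket mSocket;
    tcp::resolver::iterator mEndpoints;
};

int main()
{
    boost::asio::io_service io;

    tcp::resolver resolver(io);                       // the only resolver
    tcp::resolver::query query("127.0.0.1", "5555");  // placeholder endpoint
    tcp::resolver::iterator it = resolver.resolve(query);

    EngineClient client(io, it);                      // pass the resolved iterator
    io.run();
}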
This race condition shows up only when io_service.run() is called from multiple threads. I've verified that it does not happen when a single thread calls io_service.run().
Also, I think the issue is difficult to reproduce: I've experienced it only on one specific machine, which perhaps has different timing from the others I have (because of its hardware). On this machine (Fedora 23, kernel 4.7.3, gcc 5.3.1) I also tried rebuilding with gcc 4.8.5, but the issue remains (so this is not a compiler bug). On the same machine I then installed Fedora 24 (yes, I re-installed the OS), which also let me test with gcc 6.1, and the issue shows up again. On other machines (I also tried a CentOS 6 box with 8 cores and gcc 4.8.5) I was not able to reproduce the issue even after running the tests thousands of times, while on the Fedora machine where I'm experiencing it, it happens essentially for sure within 1000 runs.
In summary, stress testing is the only way to reliably hit this bug (although sometimes it also shows up on the very first run).
Attachments (2)
Change History (3)
by , 6 years ago
Attachment: test_engine_client_dbg_2.cpp added
This is the code that has the race condition
by , 6 years ago
Attachment: test_engine_client_dbg_4.cpp added
This is the code that does NOT have the race condition
comment:1 by , 6 years ago
Simply wanted to add that I've also tried with Boost 1.59.0 and the issue was there already.