id summary reporter owner description type status milestone component version severity resolution keywords cc 12474 Using two different resolver instances on the same io_service causes a race condition michele.de.stefano@… chris_kohlhoff "Dear developer (or developers) of Boost Asio,[[BR]] [[BR]] I think I've found a bug in Asio. If I instantiate two different TCP resolvers on the same io_service and use these two distinct resolvers to resolve the same endpoint, a race condition occurs.[[BR]] [[BR]] I have attached two sources (C++11 is required) that reproduce the issue. Unfortunately, you need to run the programs many times to see a failure ... I have bash scripts that allow me to run these programs thousands of times, each time using a different, random, free port on the local host.[[BR]] [[BR]] Quick instructions for running the tests:[[BR]] [[BR]] `test_engine_client_dbg_x -s -p `[[BR]] [[BR]] runs the server.[[BR]] [[BR]] `test_engine_client_dbg_x -c -p `[[BR]] [[BR]] runs the client.[[BR]] [[BR]] `test_engine_client_dbg_2.cpp` is the source that fails (at line 364) because, for no apparent reason, the `mEngineClient` pointer is sometimes `NULL`. I've also verified that if I insert a `while` loop in this code that waits until `mEngineClient.get() != nullptr`, the code proceeds with no error (meaning that, at some point, the pointer is reset to the correct value). But, I repeat, there is no reason why `mEngineClient.get()` should be `NULL` at this point.[[BR]] [[BR]] `test_engine_client_dbg_4.cpp` is the source that works. Notice that this time I'm not instantiating a second resolver within the `EngineClient` class; I'm simply passing the already-resolved endpoint iterator to the `EngineClient` constructor. With this change, we never lose the `mEngineClient` pointer.[[BR]] [[BR]] This race condition occurs only if we call `io_service.run()` from multiple threads. 
I've verified that this does not happen if `io_service.run()` is called from a single thread.[[BR]] [[BR]] Also, I think it is difficult to reproduce, because I've seen it on only one specific machine, which may have different timing from my other machines (because of its hardware). On this machine (which runs Fedora 23 with kernel 4.7.3 and gcc 5.3.1) I've also tried rebuilding with gcc 4.8.5, but the issue is the same (so this is not a compiler bug). I've also tried Fedora 24 on the same machine (yes ... I re-installed the OS), which let me test with gcc 6.1 as well, and the issue appears again. On other machines (I've also tried a CentOS 6 machine with 8 cores and gcc 4.8.5) I was not able to reproduce this issue even when running the tests thousands of times, while on the Fedora machine where I'm seeing this issue, it almost always happens within 1000 runs.[[BR]] [[BR]] In summary, stress testing is the only reliable way to trigger this bug (although it sometimes appears on the first run)." Bugs new To Be Determined asio Boost 1.61.0 Problem race condition, data race