Opened 6 years ago

Closed 5 years ago

#12519 closed Patches (fixed)

boost::thread::try_join_for does not return after timeout

Reported by: mweb@… Owned by: viboes
Milestone: Boost 1.65.0 Component: thread
Version: Boost 1.61.0 Severity: Problem
Keywords: Cc:

Description

If we use BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC try_join_for does not abort after the given timeout anymore. If I remove the define it works again.

We use the define to be able to use sleep_for while changing the system time. (See #6787)

To Reproduce:

#define BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC

#include <iostream>
#include <boost/thread.hpp>

void test_func()
{
    boost::this_thread::sleep_for(boost::chrono::milliseconds(1500));
}

int main()
{
    boost::thread t(test_func);

    if (!t.try_join_for(boost::chrono::milliseconds(50))) {
        std::cout << "OK" << std::endl;
    } else {
        std::cout << "FAILED" << std::endl;
    }

    if (t.try_join_for(boost::chrono::milliseconds(2000))) {
        std::cout << "OK" << std::endl;
    } else {
        std::cout << "FAILED" << std::endl;
    }
}

If the define is available the first try_join_for waits till the thread is finished, the expected behavior would be that the first try_join_for would return before the thread is finished.

Change History (27)

comment:1 by viboes, 6 years ago

It seems that the reference ticket doesn't works yet.

comment:2 by viboes, 6 years ago

Owner: changed from Anthony Williams to viboes

comment:3 by viboes, 6 years ago

Just to be fixed, have you tried defining BOOST_THREAD_VERSION 4?

in reply to:  3 comment:4 by viboes, 6 years ago

Replying to viboes:

Just to be fixed, have you tried defining BOOST_THREAD_VERSION 4?

Forget this comment, please.

comment:7 by viboes, 6 years ago

Milestone: To Be DeterminedBoost 1.64.0

comment:8 by viboes, 6 years ago

Status: newassigned

comment:9 by lumosimann@…, 6 years ago

We now found a temporary fix for this problem. Please check

https://gist.github.com/lukasm91/9732228f33e92963bb8762dac519e79f

Note that there are two bugs involved, one in sleep_for, and one in try_join_for. We found two functions where a differentiation between monotonic and non-monotonic time was missing, such that some functions always used monotonic, and others always used non-monotonic time.

There is also an updated test file attached. Consider that when you undefine BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, this currently needs to be done in libs/thread/src/pthread/thread.cpp too, otherwise the compiler won't find the correct template specialization. The old behavior can be reactivated by undefining USEFIXES.

@viboes Could you please check this fix and incorperate it into boost? We know that most probably a few #ifdefs are missing, but this fix currently solves our problems.

comment:10 by lumosimann@…, 6 years ago

I have just added a set of google tests to the gist mentioned above (googletests.cpp).

Note that I am not sure whether this is all the expected behavior. Especially, I am unsure about what the expected behavior is when I set the time backwards while a sleep_for is running and BOOST_THREAD_HAS... is undefined.

comment:11 by viboes, 6 years ago

I've rolled back the last change as it seems it introduce a regression.

https://github.com/boostorg/thread/commit/9bbf9bed80836282263ec0bdea0adf5c1fa3621b

comment:12 by viboes, 6 years ago

Type: BugsPatches

Thanks a lot for the analysis. Yes, I missed this, and mixed monotonic and non-monotonic clocks :(

I'll try to merge your changes as soon as possible.

comment:13 by viboes, 6 years ago

I have some trouble with https://gist.github.com/lukasm91/9732228f33e92963bb8762dac519e79f#file-boost_12510-patch-L53

-#ifdef BOOST_THREAD_USES_CHRONO
 #include <boost/chrono/system_clocks.hpp>
 #include <boost/chrono/ceil.hpp>
-#endif

Why do you need to remove these lines?

in reply to:  13 comment:14 by viboes, 6 years ago

Replying to viboes:

I have some trouble with https://gist.github.com/lukasm91/9732228f33e92963bb8762dac519e79f#file-boost_12510-patch-L53

-#ifdef BOOST_THREAD_USES_CHRONO
 #include <boost/chrono/system_clocks.hpp>
 #include <boost/chrono/ceil.hpp>
-#endif

Oh, I see that it wouldn't compile if not Chrono is not there :(

I would need to change this because unfortunately Boost.Thread can be used without Boost.Chrono.

Why do you need to remove these lines?

comment:15 by viboes, 6 years ago

I've committed

https://github.com/boostorg/thread/commit/c7348b29cf8bfa1272645d04784419d37e1e7db5

Please, could you tell me if fixes the issue.

You will need to compile Boost.Thread after uncommenting in the config.hpp file or compiling everything with -D

//#define BOOST_THREAD_USEFIXES_TIMESPEC
//#define BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC

As the commit says, this is a partial solution as it needs to define some compiler defines, but at least it could be used to check if the solution works.

I order to don't need the flags I would need to be able to detect BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC and move inline, everything that could depend on it.

I believe that I can remove BOOST_THREAD_USEFIXES_TIMESPEC, but is a little bit too late for boost 1.64.

comment:16 by lumosimann@…, 6 years ago

I now get a segfault due to infinite recursion.

This is consequence of https://github.com/boostorg/thread/commit/c7348b29cf8bfa1272645d04784419d37e1e7db5#diff-8a73888d7fa5a2b2b38f85db61376ba4L517

This line is for WIN32, I don't know whether this makes sense there, but for sure you need to add the fix in the linux section too, i.e.

diff --git a/include/boost/thread/detail/thread.hpp b/include/boost/thread/detail/thread.hpp
index 8dc6a9c..a67e8ea 100644
--- a/include/boost/thread/detail/thread.hpp
+++ b/include/boost/thread/detail/thread.hpp
@@ -542,7 +542,7 @@ namespace boost
         }
 #endif
 #ifdef BOOST_THREAD_USES_CHRONO
-        bool try_join_until(const chrono::time_point<chrono::system_clock, chrono::nanoseconds>& tp)
+        bool try_join_until(const chrono::time_point<my_clock_t, chrono::nanoseconds>& tp)
         {
           using namespace chrono;
           nanoseconds d = tp.time_since_epoch();

Otherwise we will infinitely recurse into try_join_until.

Although this fix is only temporary, I would recommend renaming my_clock_t and move it to the detail-namespace.

Could this also be a fix for https://svn.boost.org/trac/boost/ticket/6787?

comment:17 by lumosimann@…, 6 years ago

Will this problem be fixed until Boost 1.65?

We would still get a segfault when adding BOOST_THREAD_USEFIXED_TIMESPEC because the patch has not been applied correctly.

I think the effort is very little because the solution is almost finished except that it needs to be rewritten in "boost-style" (e.g., probably we should not use a typedef my_clock_t).

comment:18 by anton.indrawan@…, 5 years ago

I did get the recursive call as well and the patch by lumosimann solved the problem. I also vote it to be integrated into the next release (Boost 1.65).

comment:19 by anton.indrawan@…, 5 years ago

@viboes: I see the following commit:

https://github.com/boostorg/thread/commit/30dff7f84ac5731fabeb6727f3b947ac12c3508e

I am wondering why you replace timespec_now by timespec_now_realtime. My application hangs and it blocks indefinitely in boost::condition_variable::do_wait_for.

comment:20 by viboes, 5 years ago

Please, could you tell me if the problem is there yet?

There were a lot of commits since then, in particular

https://github.com/boostorg/thread/commit/bf4b38b0af892ac6e5e1f2daaf8f21b80da4d311

comment:21 by lumosimann@…, 5 years ago

Yes, the problem is still around, because the patch has not been applied correctly. Your commits are unrelated to this.

The patch I submitted in post #comment:9 still works for me. You have applied this patch, but as I have already mentioned in comment #comment:16, you applied the patch at the wrong place. I have provided another patch in that comment which works at least for me.

I would recommend to

  1. Fix what I posted in comment #16
  2. Remove BOOST_THREAD_USEFIXES_TIMESPEC and make it default
  3. Rename my_clock_t into something more descriptive and move into detail namespace
  4. Test whether and how this fix is also needed for windows (this I really don't know)

As I mentionned in #comment:9, there are also unittests in the github-gist given in that comment (I would post it again, but the bug tracking system does not allow me to post that link to github).

When you run those tests, you will see that BOOST_THREAD_USEFIXES_TIMESPEC + BOOST_THREAD_CONDATTR_SET_CLOCK_MONOTONIC currently results in a segfault, and all other combinations result in at least one error.

The fix in comment #comment:16 plus defining BOOST_THREAD_USEFIXES_TIMESPEC resolves all unittests with and without BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC set accordingly.

Doing this would solve both issues #12519 and #6787. Therefore I would strongly vote to integrate this into the upcoming boost version.

comment:22 by viboes, 5 years ago

I will take a look again to the patches. Please, could you provide a github PR?

comment:23 by viboes, 5 years ago

I would

  1. take in account what you posted in comment #16 (I forget it surely)
  2. Define BOOST_THREAD_USEFIXES_TIMESPEC for the time being
  3. Rename my_clock_t into detail::internal_clock_t

I have no time now to check for windows :(

comment:25 by lumosimann@…, 5 years ago

I had a look at your commit. Seems fine to me and works with my tests.

You could post this to #6787 as well, which is likely to be fixed by now.

comment:26 by viboes, 5 years ago

Milestone: Boost 1.64.0Boost 1.65.0

comment:27 by viboes, 5 years ago

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.