Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#12215 closed Bugs (fixed)

Boost.context: call stack corrupted on Windows using default fixedsize_stack

Reported by: runningwithscythes@… Owned by: olli
Milestone: To Be Determined Component: context
Version: Boost 1.61.0 Severity: Problem
Keywords: Cc:

Description

There is an issue in basic_fixedsize_stack, present since at least Boost 1.59, that affects debug builds only on Windows with MSVC2013 or MSVC2015. It causes weird crashes in seemingly unrelated Windows API calls and the like. The following simple unit test fails on every Windows machine I have tested so far:

#define BOOST_COROUTINES_UNIDIRECT
#define BOOST_COROUTINES_V2
#include <boost/coroutine2/coroutine.hpp>
// ...

using coro_t = boost::coroutines2::coroutine<int>;

BOOST_AUTO_TEST_CASE(test_windows_boost_bug)
{
  bool result = false;

  auto coro_function = [&](coro_t::push_type& sink) {
#if defined(PLATFORM_WINDOWS)
    char buffer[MAX_PATH];
    // The following simple Windows API call crashes when using MSVC
    // on Windows in debug build only.
    GetModuleFileName(nullptr, buffer, MAX_PATH);
    // Exception thrown at 0x00007FF939A21D58 (ntdll.dll) in
    // test.shift.task.x86_64.vc140.exe: 0xC0000005:
    // Access violation reading location 0xFFFFFFFFFFFFFFFF.

    result = true; // code not reached.
#endif
  };

  coro_t::pull_type{coro_function};
  BOOST_CHECK(result);
}

I stumbled across this bug several times but didn't try to fix it until I realized that it is still present in the recently released Boost 1.61.

Once the code crashes the full stack trace looks like this:

ntdll.dll!LdrGetDllFullName	Unknown
KernelBase.dll!GetModuleFileNameW	Unknown
KernelBase.dll!GetModuleFileNameA	Unknown
>	test.shift.task.x86_64.vc140.exe!test_windows_boost_bug::test_method::__l2::<lambda>	C++
test.shift.task.x86_64.vc140.exe!boost::coroutines2::detail::pull_coroutine<int>::control_block::<lambda>	C++
test.shift.task.x86_64.vc140.exe!std::_Invoker_functor::_Call<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),boost::context::execution_context<int * __ptr64>,int * __ptr64>	C++
test.shift.task.x86_64.vc140.exe!std::invoke<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),boost::context::execution_context<int * __ptr64>,int * __ptr64>	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::apply_impl<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),std::tuple<boost::context::execution_context<int * __ptr64> && __ptr64,int * __ptr64>,0,1>	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::apply<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),std::tuple<boost::context::execution_context<int * __ptr64> && __ptr64,int * __ptr64> >	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::record<boost::context::execution_context<int * __ptr64>,boost::context::basic_fixedsize_stack<boost::context::stack_traits>,boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *) >::run	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::context_entry<boost::context::detail::record<boost::context::execution_context<int * __ptr64>,boost::context::basic_fixedsize_stack<boost::context::stack_traits>,boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *) > >	C++
test.shift.task.x86_64.vc140.exe!make_fcontext	Unknown
0000015ef8773e60	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
00000018dad1d500	Unknown
0000015ef8773e80	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
0000000000010000	Unknown
0000000000010000	Unknown
0000015ef8773f20	Unknown
0000015ef8773ec0	Unknown
00000018dad1d984	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown

It took me a while to figure out what went wrong with the call stack as I initially thought about a bug in the context switching code. However, the solution turned out to be rather simple: The stack memory allocated using the basic_fixedsize_stack class simply isn't initialized. A simple call to memset fully resolves the issue for me.
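The workaround described above can be sketched as follows. This is a minimal, standalone illustration and not the attached patch: `stack_context_sketch` and `zeroing_fixedsize_stack` are hypothetical names that only loosely mirror boost::context::stack_context and basic_fixedsize_stack, and the 64 KiB default size is an assumption.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for boost::context::stack_context.
struct stack_context_sketch {
    std::size_t size = 0;
    void* sp = nullptr;  // top of the stack; stacks grow downward
};

// Hypothetical allocator, loosely modelled on basic_fixedsize_stack.
class zeroing_fixedsize_stack {
public:
    explicit zeroing_fixedsize_stack(std::size_t size = 64 * 1024)
        : size_(size) {}

    stack_context_sketch allocate() {
        void* base = std::malloc(size_);
        if (!base) throw std::bad_alloc();
        // Overwrite whatever the heap left behind. In MSVC debug builds
        // freshly allocated memory is filled with 0xCD.
        std::memset(base, 0, size_);
        stack_context_sketch sctx;
        sctx.size = size_;
        sctx.sp = static_cast<char*>(base) + size_;
        return sctx;
    }

    void deallocate(stack_context_sketch& sctx) {
        // Recover the pointer returned by malloc from the stack top.
        void* base = static_cast<char*>(sctx.sp) - sctx.size;
        std::free(base);
        sctx = stack_context_sketch{};
    }

private:
    std::size_t size_;
};
```

Zeroing the whole block costs one pass over the stack memory per allocation, which may matter if many short-lived coroutines are created.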

Attachments (1)

boost_1_61_0-context-init-stack.patch (586 bytes ) - added by runningwithscythes@… 6 years ago.
patch to initialize stack memory

Change History (7)

by runningwithscythes@…, 6 years ago

patch to initialize stack memory

comment:1 by olli, 6 years ago

Resolution: fixed
Status: new → closed

thx, fixed

in reply to:  1 ; comment:2 by Alan Wilkie <alan@…>, 6 years ago

Replying to olli:

thx, fixed

Just to round this out, I have been chasing the same (or a very similar) issue, and I think the root cause is that the "fbr_strg" entry in the context is not explicitly initialised. When the initial context switch occurs, it picks up the uninitialised value and writes it to the TIB (especially in debug builds, where new memory is initialised to 0xCD). Some Windows functions consult this value and use it if it is not zero.

Initialising the allocated stack space also zeroes the context and fixes the problem. I think it should also be possible to fix by setting fbr_strg to zero in make_x86_64_ms_pe_masm.asm and make_i386_ms_pe_masm.asm.
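The mechanism can be modelled in plain C++ as a rough sketch. Everything here is hypothetical and simplified: the real context layout and the TIB writes live in the assembly files, and `context_record_sketch`, `tib_sketch`, `make_record` and `restore_tib` are illustrative names only. Only the field name fbr_strg and the 0xCD debug fill are taken from the discussion above.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, simplified model of the saved context record.
struct context_record_sketch {
    std::uint64_t fbr_strg;     // fiber storage value restored into the TIB
    std::uint64_t stack_base;
    std::uint64_t stack_limit;
};

// Hypothetical model of the one TIB slot that matters here.
struct tib_sketch {
    std::uint64_t fiber_storage = 0;
};

// Models the part of the context switch that copies the saved
// value from the record into the TIB.
inline void restore_tib(tib_sketch& tib, const context_record_sketch& rec) {
    tib.fiber_storage = rec.fbr_strg;
}

// Models creating a context on freshly allocated debug-heap memory.
// zero_fbr_strg selects the proposed fix.
inline context_record_sketch make_record(bool zero_fbr_strg) {
    context_record_sketch rec;
    std::memset(&rec, 0xCD, sizeof rec);  // mimic the MSVC debug heap fill
    rec.stack_base = 0x20000;             // arbitrary illustrative values
    rec.stack_limit = 0x10000;
    if (zero_fbr_strg) {
        rec.fbr_strg = 0;                 // the fix: initialise the field
    }
    return rec;
}
```

Without the fix, the first switch publishes 0xCDCDCDCDCDCDCDCD as the fiber storage value, which is non-zero and therefore looks valid to any code that checks it against zero.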

in reply to:  2 ; comment:3 by olli, 6 years ago

makes sense - I've changed the code in branch develop. could you verify the fix, please

in reply to:  3 comment:4 by Alan Wilkie <alan@…>, 6 years ago

Replying to olli:

makes sense - I've changed the code in branch develop. could you verify the fix, please

I haven't verified the actual code of the develop branch, but I've made the same change to the 1.60 code and it does fix the crash. Looking at the commit, I assume that corresponding changes would need to be made in make_x86_64_ms_pe_gas.asm and make_i386_ms_pe_gas.asm?

comment:5 by baldzar@…, 6 years ago

I am experiencing the same issue using coroutine/context via asio. The default stack allocator used there is actually basic_standard_stack_allocator (boost/coroutine/standard_stack_allocator.hpp).

The fix is the same, zeroing the stack.

in reply to:  5 comment:6 by olli, 6 years ago

Replying to baldzar@…:

I am experiencing the same issue using coroutine/context via asio. The default stack allocator used there is actually basic_standard_stack_allocator (boost/coroutine/standard_stack_allocator.hpp).

The fix is the same, zeroing the stack.

But the problem seems to be related to the fiber-storage field in the TIB. The fix in 1.62 initializes this field with zeros. Could you verify that this fixes the problem, please?