#12215 closed Bugs (fixed)
Boost.context: call stack corrupted on Windows using default fixedsize_stack
Reported by: | Owned by: olli
Milestone: To Be Determined | Component: context
Version: Boost 1.61.0 | Severity: Problem
Keywords: | Cc:
Description
There is an issue in basic_fixedsize_stack, present since at least Boost 1.59, on Windows with MSVC 2013 or MSVC 2015 in debug builds only. It causes weird crashes in seemingly unrelated Windows API calls and the like. The following simple unit test fails on every Windows machine I have tested so far:
#define BOOST_COROUTINES_UNIDIRECT
#define BOOST_COROUTINES_V2
#include <boost/coroutine2/coroutine.hpp>
#include <boost/test/unit_test.hpp>
#if defined(PLATFORM_WINDOWS)
#include <windows.h>
#endif
// ...

using coro_t = boost::coroutines2::coroutine<int>;

BOOST_AUTO_TEST_CASE(test_windows_boost_bug)
{
    bool result = false;
    auto coro_function = [&](coro_t::push_type& sink) {
#if defined(PLATFORM_WINDOWS)
        char buffer[MAX_PATH];
        // The following simple Windows API call crashes when using MSVC
        // on Windows in debug build only.
        // (The ANSI variant is called explicitly to match the char buffer.)
        GetModuleFileNameA(nullptr, buffer, MAX_PATH);
        // Exception thrown at 0x00007FF939A21D58 (ntdll.dll) in
        // test.shift.task.x86_64.vc140.exe: 0xC0000005:
        // Access violation reading location 0xFFFFFFFFFFFFFFFF.
        result = true; // code not reached.
#endif
    };
    // Constructing the pull_type runs coro_function immediately.
    coro_t::pull_type{coro_function};
    BOOST_CHECK(result);
}
I stumbled across this bug several times but didn't try to fix it until I realized that it is still present in the recently released Boost 1.61.
When the code crashes, the full call stack looks like this:
ntdll.dll!LdrGetDllFullName Unknown
KernelBase.dll!GetModuleFileNameW Unknown
KernelBase.dll!GetModuleFileNameA Unknown
> test.shift.task.x86_64.vc140.exe!test_windows_boost_bug::test_method::__l2::<lambda> C++
test.shift.task.x86_64.vc140.exe!boost::coroutines2::detail::pull_coroutine<int>::control_block::<lambda> C++
test.shift.task.x86_64.vc140.exe!std::_Invoker_functor::_Call<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),boost::context::execution_context<int * __ptr64>,int * __ptr64> C++
test.shift.task.x86_64.vc140.exe!std::invoke<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),boost::context::execution_context<int * __ptr64>,int * __ptr64> C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::apply_impl<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),std::tuple<boost::context::execution_context<int * __ptr64> && __ptr64,int * __ptr64>,0,1> C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::apply<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),std::tuple<boost::context::execution_context<int * __ptr64> && __ptr64,int * __ptr64> > C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::record<boost::context::execution_context<int * __ptr64>,boost::context::basic_fixedsize_stack<boost::context::stack_traits>,boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *) >::run C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::context_entry<boost::context::detail::record<boost::context::execution_context<int * __ptr64>,boost::context::basic_fixedsize_stack<boost::context::stack_traits>,boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *) > > C++
test.shift.task.x86_64.vc140.exe!make_fcontext Unknown
0000015ef8773e60 Unknown
cdcdcdcdcdcdcdcd Unknown
cdcdcdcdcdcdcdcd Unknown
cdcdcdcdcdcdcdcd Unknown
00000018dad1d500 Unknown
0000015ef8773e80 Unknown
cdcdcdcdcdcdcdcd Unknown
cdcdcdcdcdcdcdcd Unknown
0000000000010000 Unknown
0000000000010000 Unknown
0000015ef8773f20 Unknown
0000015ef8773ec0 Unknown
00000018dad1d984 Unknown
cdcdcdcdcdcdcdcd Unknown
cdcdcdcdcdcdcdcd Unknown
cdcdcdcdcdcdcdcd Unknown
It took me a while to figure out what went wrong with the call stack, because I initially suspected a bug in the context-switching code. However, the solution turned out to be rather simple: the stack memory allocated by the basic_fixedsize_stack class simply isn't initialized. A single call to memset on the allocated stack memory fully resolves the issue for me.
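A minimal sketch of the memset workaround, assuming a custom allocator shaped like Boost.Context's fixed-size stack: allocate() returns a record holding the stack size and a pointer to the top of the region (stacks grow downward), and deallocate() recovers the base address from that record. The names `zeroed_fixedsize_stack` and `stack_record` are hypothetical stand-ins, not Boost types.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for boost::context::stack_context:
// 'sp' points one past the highest address of the region (the stack top).
struct stack_record {
    std::size_t size = 0;
    void* sp = nullptr;
};

// Hypothetical fixed-size allocator that zero-fills every stack it hands out,
// so nothing the context code reads before writing can contain heap garbage.
class zeroed_fixedsize_stack {
public:
    explicit zeroed_fixedsize_stack(std::size_t size = 64 * 1024) : size_(size) {}

    stack_record allocate() {
        void* base = std::malloc(size_);
        if (base == nullptr) {
            throw std::bad_alloc();
        }
        // The workaround from the report: clear the whole region.
        std::memset(base, 0, size_);
        stack_record rec;
        rec.size = size_;
        rec.sp = static_cast<char*>(base) + size_;  // stack grows downward
        return rec;
    }

    void deallocate(stack_record& rec) noexcept {
        void* base = static_cast<char*>(rec.sp) - rec.size;
        std::free(base);
        rec.sp = nullptr;
        rec.size = 0;
    }

private:
    std::size_t size_;
};
```

In this sketch `std::calloc(1, size_)` would achieve the same in one call; memset is used to mirror the workaround described in the report. A real version would return `boost::context::stack_context` (which also has `size` and `sp` members) and could then be passed as the first constructor argument of `coro_t::pull_type`.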
Attachments (1)
Change History (7)
by , 6 years ago
Attachment: boost_1_61_0-context-init-stack.patch added
follow-up: 3 comment:2 by , 6 years ago
Replying to olli:
> thx, fixed
Just to round this out: I have been chasing the same (or a very similar) issue, and I think the root cause is that the "fbr_strg" (fiber storage) entry in the context is not explicitly initialised. When the initial context switch occurs, it picks up the uninitialised value and writes it to the TIB (Thread Information Block). This is especially visible in debug builds, where new memory is initialised to 0xCD. Some Windows functions consult this value and use it if it's not zero.
Initialising the allocated stack space also zeroes the context and fixes the problem. I think it should also be possible to fix it by setting fbr_strg to zero in make_x86_64_ms_pe_masm.asm and make_i386_ms_pe_masm.asm.
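The mechanism can be modelled in portable C++. This is a simulation only, under stated assumptions: `context_record_model` is a hypothetical struct whose single `fiber_storage` field stands in for the fbr_strg entry that make_fcontext reserves, `debug_heap_alloc` imitates the MSVC debug heap's 0xCD fill, and `tib_fiber_data` stands in for the TIB field that the first context switch overwrites. None of these are Boost or Windows APIs, and the real context layout holds many more fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical model of the data make_fcontext places in a new stack.
struct context_record_model {
    std::uintptr_t fiber_storage;  // stands in for the fbr_strg slot
    // (other saved registers omitted in this model)
};

// Imitates the MSVC debug heap, which fills fresh allocations with 0xCD.
void* debug_heap_alloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p == nullptr) throw std::bad_alloc();
    std::memset(p, 0xCD, n);
    return p;
}

// Stand-in for the TIB field that the first switch into the context loads.
std::uintptr_t tib_fiber_data = 0;

// Models building a fresh context just below 'stack_top'. With
// init_fiber_storage == false this reproduces the bug: the slot keeps
// whatever bytes the allocator left there.
context_record_model* make_context_model(void* stack_top, bool init_fiber_storage) {
    auto* rec = reinterpret_cast<context_record_model*>(
        static_cast<char*>(stack_top) - sizeof(context_record_model));
    if (init_fiber_storage) {
        rec->fiber_storage = 0;  // the proposed fix: explicitly zero the slot
    }
    return rec;
}

// Models the first context switch copying the slot into the TIB.
void first_switch_model(const context_record_model* rec) {
    tib_fiber_data = rec->fiber_storage;
}
```

Without the explicit zeroing, the modelled TIB field ends up holding the 0xCD fill pattern, the same cdcdcdcdcdcdcdcd value that appears in the call stack in the report. Zeroing the whole stack (the memset workaround) and zeroing only this slot (the make_fcontext change) both leave it at 0 in this model; the latter touches a single word instead of the entire stack.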
follow-up: 4 comment:3 by , 6 years ago
makes sense - I've changed the code in the develop branch. Could you verify the fix, please?
comment:4 by , 6 years ago
Replying to olli:
> makes sense - I've changed the code in the develop branch. Could you verify the fix, please?
I haven't verified the actual code of the develop branch, but I've made the same change to the 1.60 code and it does fix the crash. Looking at the commit, I assume that corresponding changes would need to be made in make_x86_64_ms_pe_gas.asm and make_i386_ms_pe_masm.asm?
follow-up: 6 comment:5 by , 6 years ago
I am experiencing the same issue using coroutine/context via asio. Actually, the default stack allocator used there is basic_standard_stack_allocator (boost/coroutine/standard_stack_allocator.hpp).
The fix is the same: zeroing the stack.
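Because the fix is the same for both allocators, one option is to wrap an existing allocator instead of reimplementing it. The sketch below is a hypothetical decorator, `zeroing_stack_adapter`, that forwards to any inner allocator exposing `allocate()`/`deallocate()` over a `{size, sp}` record and zero-fills the region before handing it out. `stack_record` and `garbage_filled_stack` are illustrative stand-ins (the latter imitates the debug heap's 0xCD fill); the actual Boost.Coroutine and Boost.Context allocators may use different signatures, so a real adapter would have to match whichever interface is being wrapped.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <utility>

// Hypothetical stack description: 'sp' is the top of the region.
struct stack_record {
    std::size_t size = 0;
    void* sp = nullptr;
};

// Hypothetical inner allocator that, like the MSVC debug heap,
// leaves 0xCD garbage in the memory it returns.
class garbage_filled_stack {
public:
    explicit garbage_filled_stack(std::size_t size) : size_(size) {}

    stack_record allocate() {
        void* base = std::malloc(size_);
        if (base == nullptr) throw std::bad_alloc();
        std::memset(base, 0xCD, size_);
        stack_record rec;
        rec.size = size_;
        rec.sp = static_cast<char*>(base) + size_;
        return rec;
    }

    void deallocate(stack_record& rec) noexcept {
        std::free(static_cast<char*>(rec.sp) - rec.size);
        rec = stack_record{};
    }

private:
    std::size_t size_;
};

// Decorator: forwards to any allocator with the same allocate()/deallocate()
// shape and zero-fills the region [sp - size, sp) before returning it.
template <typename Inner>
class zeroing_stack_adapter {
public:
    explicit zeroing_stack_adapter(Inner inner) : inner_(std::move(inner)) {}

    stack_record allocate() {
        stack_record rec = inner_.allocate();
        std::memset(static_cast<char*>(rec.sp) - rec.size, 0, rec.size);
        return rec;
    }

    void deallocate(stack_record& rec) noexcept { inner_.deallocate(rec); }

private:
    Inner inner_;
};
```

The decorator keeps the inner allocator's sizing and release logic untouched, so the zeroing can be added or removed in one place.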
comment:6 by , 6 years ago
Replying to baldzar@…:
> I am experiencing the same issue using coroutine/context via asio. Actually, the default stack allocator used there is basic_standard_stack_allocator (boost/coroutine/standard_stack_allocator.hpp).
> The fix is the same: zeroing the stack.
But the problem seems to be related to the fiber-storage field in the TIB. The fix in 1.62 initializes this field with zeros. Could you verify that this fixes the problem, please?
patch to initialize stack memory