Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#8544 closed Bugs (fixed)

Calling managed DLL from within boost::context may cause a crash

Reported by: vitaly.blinov@… Owned by: olli
Milestone: To Be Determined Component: context
Version: Boost 1.53.0 Severity: Problem
Keywords: context, coroutine Cc:

Description

Only the Windows platform is affected.

If code running in the context (coroutine) invokes anything that crosses the clr.dll (mscorwks.dll) boundary, a crash occurs with roughly 50% probability.

My investigation showed that the call stack of the crash is consistent with clr.dll!Thread::InitThread throwing an OutOfMemory exception. With some deep debugging I narrowed the problem down to the CommitThreadStack function inside clr.dll. That function reads a DWORD located at FS:[0xE0C] (called "deallocation stack" on the TIB wiki page) and compares it with the current top of the stack (FS:[0x4]). It appears that the exception is thrown if the value at FS:[0xE0C] is greater than FS:[0x4] (or, perhaps, FS:[0x8]). That field is not well documented, but I believe the span from FS:[0xE0C] to FS:[0x4] defines the maximum stack size. On Windows 7, the difference between the two is always 0x100000, which gives a stack size of 1 MB. Interestingly, that is always the case, even if the fiber or thread was created with a smaller stack size.

jump_context never touches that field. As a result, the value at FS:[0xE0C] is inherited from the calling thread and is effectively arbitrary with respect to the coroutine's stack. If it is greater than the current top of the stack, the problem occurs.
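The check described above can be modelled in a few lines of portable C++. This is a sketch under stated assumptions, not the CLR's code: StackFields, stack_fields_consistent and reserved_stack_size are hypothetical names, and the rule "deallocation stack <= stack limit < stack base" is an assumed reading of the observed behaviour. Under this model, whether the check fails depends only on where the coroutine stack happens to land relative to the value left behind by the thread, which would be consistent with the intermittent failures reported above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the three 32-bit fields discussed in this ticket.
// NOT the real NT_TIB/TEB definition and NOT the CLR's actual check.
struct StackFields {
    std::uint32_t stack_base;          // FS:[0x4]   highest address of the stack
    std::uint32_t stack_limit;         // FS:[0x8]   lowest committed address
    std::uint32_t deallocation_stack;  // FS:[0xE0C] lowest reserved address
};

// Assumed consistency rule: the reserved region starts at or below the
// committed limit, which in turn lies below the stack base.
bool stack_fields_consistent(const StackFields& f) {
    return f.deallocation_stack <= f.stack_limit && f.stack_limit < f.stack_base;
}

// Reserved stack size implied by the span FS:[0xE0C] .. FS:[0x4].
std::uint32_t reserved_stack_size(const StackFields& f) {
    assert(stack_fields_consistent(f));  // only meaningful for a valid layout
    return f.stack_base - f.deallocation_stack;
}
```

For example, with a thread stack reserved from 0x00200000 to 0x00300000, reserved_stack_size returns 0x100000 (1 MB). A coroutine stack at 0x00100000..0x00110000 that inherits deallocation_stack = 0x00200000 fails the check, while one at 0x01000000..0x01010000 passes it.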

clr.dll!CommitThreadStack also appears to access FS:[0xF78], but its purpose, and whether the value stored there affects the behavior, is unknown.

My current workaround, writing the current bottom of the stack to FS:[0xE0C] before calling the managed DLL, appears to work:

(MSVC-specific, 32-bit x86 only)

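#include <windows.h>  // DWORD
#include <intrin.h>   // __readfsdword, __writefsdword (x86 intrinsics)

DWORD store = __readfsdword(0xE0C);         // save the current "deallocation stack"
__writefsdword(0xE0C, __readfsdword(0x8));  // set it to the current stack limit (bottom of stack)
call_managed_dll();                         // placeholder for the call into the managed DLL
__writefsdword(0xE0C, store);               // restore the original value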
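The save/override/restore sequence above can also be wrapped in an RAII guard, so that FS:[0xE0C] is restored even if the managed call throws. This is a hypothetical helper (ScopedFieldOverride is not part of Boost or the Windows SDK); the read/write callbacks are kept generic so the sketch compiles on any platform. On 32-bit MSVC they could wrap __readfsdword(0xE0C) and __writefsdword(0xE0C, v), as in the workaround above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical RAII helper: saves a field on construction, overrides it, and
// restores the saved value when the scope ends (including via an exception).
class ScopedFieldOverride {
public:
    using Reader = std::function<std::uint32_t()>;
    using Writer = std::function<void(std::uint32_t)>;

    ScopedFieldOverride(const Reader& read, Writer write, std::uint32_t new_value)
        : write_(std::move(write)), saved_(read()) {
        assert(write_ && "writer callback must be set");
        write_(new_value);  // e.g. store the stack limit into FS:[0xE0C]
    }

    ~ScopedFieldOverride() { write_(saved_); }  // put the original value back

    ScopedFieldOverride(const ScopedFieldOverride&) = delete;
    ScopedFieldOverride& operator=(const ScopedFieldOverride&) = delete;

private:
    Writer write_;
    std::uint32_t saved_;
};
```

On 32-bit MSVC, usage might look like `ScopedFieldOverride guard([] { return __readfsdword(0xE0C); }, [](std::uint32_t v) { __writefsdword(0xE0C, v); }, __readfsdword(0x8));` followed by `call_managed_dll();` in the same scope. std::function keeps the sketch short; template parameters would avoid the type-erasure overhead.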

This bug is very obscure, and so far I have only managed to observe it on Windows 7 and Windows Server 2008.

Suggested fix: Store and restore FS:[0xE0C] in jump_context.
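The suggested fix can be illustrated with a simulated context switch. In this sketch all names are hypothetical, and it does not reflect Boost.Context's actual fcontext_t layout or assembly: the per-thread fields are a plain struct, and switch_context optionally swaps the deallocation stack along with the stack base and limit. Without the swap, the coroutine runs with the thread's stale deallocation stack; with it, each context sees its own value.

```cpp
#include <cassert>
#include <cstdint>

// Simulated stand-in for the per-thread fields discussed in this ticket.
// Hypothetical: not the real NT_TIB/TEB and not Boost.Context's fcontext_t.
struct ThreadStackFields {
    std::uint32_t stack_base;          // FS:[0x4]
    std::uint32_t stack_limit;         // FS:[0x8]
    std::uint32_t deallocation_stack;  // FS:[0xE0C]
};

// Per-context record: what a context would need to remember for the fix.
struct SavedContext {
    ThreadStackFields fields;
};

// Model of a context switch: save the live fields into `from`, then install
// the fields of `to`. With swap_deallocation_stack == false, FS:[0xE0C]
// keeps whatever the previous context left there (the reported behaviour).
void switch_context(ThreadStackFields& tib, SavedContext& from,
                    const SavedContext& to, bool swap_deallocation_stack) {
    assert(to.fields.stack_limit < to.fields.stack_base);  // target stack must be valid
    from.fields = tib;
    tib.stack_base = to.fields.stack_base;
    tib.stack_limit = to.fields.stack_limit;
    if (swap_deallocation_stack) {
        tib.deallocation_stack = to.fields.deallocation_stack;  // the suggested fix
    }
}
```

Switching from a thread whose deallocation stack is 0x00200000 to a coroutine stack at 0x00100000..0x00110000 leaves 0x00200000 in place (above the coroutine's stack base) when the swap is disabled, and installs the coroutine's own 0x00100000 when it is enabled.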

Attachments (2)

CoroCLRTest.zip (16.3 KB) - added by vitaly.blinov@… 9 years ago.
VS 2012 solution demonstrating the problem
CoroCLRTest.2.zip (20.5 KB) - added by vitaly.blinov@… 9 years ago.
Updated solution, includes 64 bit libraries and workaround


Change History (9)

in reply to: description comment:1 by olli, 9 years ago

I've committed a fix for Win32 to boost-trunk. Could you verify the fix, please? What about 64-bit Windows? Does it check the 'deallocation stack' TIB member too? (If yes, I assume it is located at a different offset.)

comment:2 by vitaly.blinov@…, 9 years ago

Thanks Oliver,

I'll test it early next week. I haven't had a chance to try it with the 64-bit libs, but I suspect the behaviour might be similar. I'll do my best to put together a small test project so we'll know for sure.

by vitaly.blinov@…, 9 years ago

Attachment: CoroCLRTest.zip added

VS 2012 solution demonstrating the problem

comment:3 by anonymous, 9 years ago

Okay, I managed to put together a minimal solution that reproduces the problem. The attachment contains a VS 2012 solution with a CLR DLL that exports a native method, and a native EXE that calls the exported method. I believe such calls go through what Microsoft calls "double thunking". This is a rather special use case, but not an unusual one.

I will try to reproduce it on 64-bit Windows first.

comment:4 by vitaly.blinov@…, 9 years ago

The fix from boost-trunk appears to be working; I verified it with the attached test project (the project itself has a bug: with the fix applied it falls into an infinite loop :) ). I will now try to adapt the tests to 64-bit Windows. I could not find any information on the internet about the NT_TIB structure on 64-bit Windows, though.

by vitaly.blinov@…, 9 years ago

Attachment: CoroCLRTest.2.zip added

Updated solution, includes 64 bit libraries and workaround

comment:5 by vitaly.blinov@…, 9 years ago

The 64-bit libraries appear to have the same vulnerability. I attached an updated solution with a 64-bit configuration. In the depths of the Internet I found that on 64-bit Windows the "deallocation stack" should be located at GS:[0x1478]. This is unconfirmed, but the workaround based on this assumption works.
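For reference, the offsets discussed in this ticket can be collected per architecture. The deallocation-stack offsets are the ones quoted here (FS:[0xE0C] on 32-bit, and GS:[0x1478] on 64-bit, which is described above as unconfirmed); the stack base and limit follow the NT_TIB layout, whose pointer-sized members double the offsets on x64. Arch, TebStackOffsets and teb_stack_offsets are hypothetical names used only for illustration.

```cpp
#include <cstdint>

enum class Arch { X86, X64 };

// Offsets of the stack-related fields, relative to FS (x86) or GS (x64).
struct TebStackOffsets {
    std::uint32_t stack_base;          // NT_TIB::StackBase
    std::uint32_t stack_limit;         // NT_TIB::StackLimit
    std::uint32_t deallocation_stack;  // "deallocation stack" (TEB)
};

constexpr TebStackOffsets teb_stack_offsets(Arch arch) {
    // NT_TIB starts with three pointer-sized members (ExceptionList, StackBase,
    // StackLimit), so the x64 offsets of the two stack fields are doubled.
    return arch == Arch::X86 ? TebStackOffsets{0x4, 0x8, 0xE0C}
                             : TebStackOffsets{0x8, 0x10, 0x1478};  // 0x1478: unconfirmed per comment 5
}

static_assert(teb_stack_offsets(Arch::X86).deallocation_stack == 0xE0C, "x86 offset from the ticket");
static_assert(teb_stack_offsets(Arch::X64).deallocation_stack == 0x1478, "x64 offset from comment 5");
```

On 64-bit MSVC, the workaround would presumably read GS:[0x10] with __readgsqword and write GS:[0x1478] with __writegsqword, since __readfsdword/__writefsdword are x86-only.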

comment:6 by olli, 9 years ago

Resolution: fixed
Status: new → closed

The deallocation stack for 64-bit Windows will be stored/restored too. Please verify. Thank you!

in reply to: 6 comment:7 by vitaly.blinov@…, 9 years ago

Replying to olli:

please verify.

I compiled the context library from trunk and ran the tests on both 32-bit and 64-bit platforms; CLR DLL calls were successful in all configurations. Verified.

thank you!

No problem. Thank you for this library, it really rocks!
