#12592 closed Bugs (worksforme)
Wrongly aligned bottom stack address
Reported by: | Owned by: | olli | |
---|---|---|---|
Milestone: | To Be Determined | Component: | context |
Version: | Boost 1.62.0 | Severity: | Problem |
Keywords: | Cc: |
Description
I'm facing an incompatibility of Boost context based coroutines with the compiler generated function __chkstk()
(on Windows using MSVC2015), where apparently the bottom stack address is not aligned correctly to page boundaries.
My application uses Qt within a Boost coroutine (v2) and unfortunately runs out of stack space, which is a different issue and out of scope of this problem here. However, symptoms are that deep within the call stack some QT function implicitly calls the compiler generated function __chkstk()
and crashes with a memory access violation. The function's code looks like this:
void QQmlNotifier::emitNotify(QQmlNotifierEndpoint *endpoint, void **a) { ; auto generated code at the beginning of the function 00000000730D9C20 48 89 54 24 10 mov qword ptr [rsp+10h],rdx 00000000730D9C25 48 89 4C 24 08 mov qword ptr [rsp+8],rcx ; This QT function needs 0x1878 bytes on the stack, which is more than one page of memory, ; thus call __chkstk to make sure that all pages are mapped to actual memory 00000000730D9C2A B8 78 18 00 00 mov eax,1878h 00000000730D9C2F E8 39 CE B6 FF call __chkstk (072C46A6Dh) Qt5Qmld.dll!__chkstk(): __chkstk: ; simply jump to actual function below 0000000072C46A6D E9 0E 23 58 00 jmp __chkstk (0731C8D80h) ; do some stuff... 00000000731C8D80 48 83 EC 10 sub rsp,10h 00000000731C8D84 4C 89 14 24 mov qword ptr [rsp],r10 00000000731C8D88 4C 89 5C 24 08 mov qword ptr [rsp+8],r11 00000000731C8D8D 4D 33 DB xor r11,r11 00000000731C8D90 4C 8D 54 24 18 lea r10,[rsp+18h] 00000000731C8D95 4C 2B D0 sub r10,rax 00000000731C8D98 4D 0F 42 D3 cmovb r10,r11 ; At this point r11 gets the bottom address of context stack assigned 00000000731C8D9C 65 4C 8B 1C 25 10 00 00 00 mov r11,qword ptr gs:[10h] ; check if the actually allocated size of stack is big enough 00000000731C8DA5 4D 3B D3 cmp r10,r11 ; if this is false and if we don't jump to cs20, trouble begins 00000000731C8DA8 F2 73 17 bnd jae cs20 (0731C8DC2h) 00000000731C8DAB 66 41 81 E2 00 F0 and r10w,0F000h cs10: ; walk down the stack in 4096 byte steps 00000000731C8DB1 4D 8D 9B 00 F0 FF FF lea r11,[r11-1000h] ; trigger guard page (by writing a value) to allow dynamic resize of stack or mapping of additional pages 00000000731C8DB8 41 C6 03 00 mov byte ptr [r11],0 ; r10 and r11 will never be equal because the value of r11 is not aligned to 0x1000. 00000000731C8DBC 4D 3B D3 cmp r10,r11 00000000731C8DBF F2 75 EF bnd jne cs10 (0731C8DB1h) cs20: ; irrelevant end of function 00000000731C8DC2 4C 8B 14 24 mov r10,qword ptr [rsp] 00000000731C8DC6 4C 8B 5C 24 08 mov r11,qword ptr [rsp+8] 00000000731C8DCB 48 83 C4 10 add rsp,10h 00000000731C8DCF F2 C3 bnd ret
The value of r11 in above code is the same as is assigned in boost_1_62_0\libs\context\src\asm\make_x86_64_ms_pe_masm.asm during context creation:
; ... ; save bottom address of context stack as 'limit' 00007FF7C0A05A8C 48 89 48 10 mov qword ptr [rax+10h],rcx ; ... (see source file for full code)
When using QT without the surrounding coroutine/context and walking into __chkstk
, the bottom stack limit (r11) is always a value aligned to 0x1000. Due to the fact that context does not align this value to 0x1000 the loop above writes random zeros into the applications stack address space until an access violation/seg fault occurs, destroying vital data on its way down and possibly interfering with other threads before crashing.
I don't know yet if this only affects MSVC and the default basic_fixedsize_stack used on Windows, or if other stack allocators are broken as well.
Attachments (1)
Change History (10)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
I don't think that this function is related to the Basic Runtime Checks setting, but is generally used in functions that use more than one page of stack memory (using large local variables). See http://stackoverflow.com/questions/8400118/what-is-the-purpose-of-the-chkstk-function.
I'm currently using the prebuilt version of QT, so I can't simply test whether this setting actually removes the call to _chkstk
.
However, I do have a different workaround for this issue which works fine for me, so it isn't blocking my work.
comment:3 by , 6 years ago
Regarding _chkstk
see this (stupid) simple code:
#include <array> #include <cstring> int main(int argc, char* argv[]) { std::array<char, 8192> bar; strcpy(bar.data(), argv[0]); return 0; }
results in
int main(int argc, char* argv[]) { 00007FF6323C2C10 B8 18 20 00 00 mov eax,2018h 00007FF6323C2C15 E8 76 EE 01 00 call __chkstk (07FF6323E1A90h) ...
The compiler does not generate this call when using a smaller array (like 1024 bytes). Thus it should be easy to reproduce the issue using an array large enough. I can provide you a minimal test case by tomorrow if you like.
comment:4 by , 6 years ago
I can confirm that MSVC adds __chkstk
to any function requesting more than one page of memory regardless of "Basic Runtime Checks", "Disable Security Checks", or other compiler flags, even in fully optimized release builds. It is a Windows NT platform requirement to do these checks. See https://support.microsoft.com/en-us/kb/100775
comment:5 by , 6 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
I've had time to look at this bug in more detail:
The bottom stack is correctly aligned (16-byte) - the problem is that
execution_context<int> source([](execution_context<int> sink, int)
uses the 'fixedsize_stack' with a default size of 64kB. That means that in best case you get an segmentation fault if you excceed the stack size (allocating 7 arrays on the stack).
Windows does not provide segemented (on demand growing) stacks.
If you use the 'protected_fixedsize_stack' you get an guard page appended at the stack. If you access memory from that page an 'stack overflow' exception is raised:
execution_context<int> source(
std::allocator_arg, protected_fixedsize_stack( 64*1024),
[](execution_context<int> sink, int)
In your case you used the fixedsize_stack with a stack size of 64kb, which seams too small for your use case. You should try a larger stack size.
So, I close this bug as 'works for me'.
BTW, to disable optimization for a function you could use '#pragma optimize("",on|off)'
follow-up: 7 comment:6 by , 6 years ago
The bottom stack address is required to be aligned at a page boundary (4KB) on 64bit Windows. protected_fixedsize_stack uses
void * vp = ::VirtualAlloc( 0, size__, MEM_COMMIT, PAGE_READWRITE);
to allocate its memory, which does have this guarantee (the function allocates on a memory page level). fixedsize_stack however simply uses
void * vp = std::malloc( size_);
which has no such guarantee.
Of course I agree with you that when running out of stack there is no way (other than dynamically resizing the stack) to recover and without a guard page the application will crash at some point due to accessing an unmapped page. However, MSVC's generated _chkstk assumes that
a) there is a guard page (as setup by protected_fixedsize_stack), and
b) the bottom stack address is properly aligned.
In any case _chkstk will attempt to overwrite several bytes in attempt to trigger the dynamic stack resizing mechanism (assumed to be present), but only when the stack is page aligned _chkstk will be limited to the number of additional pages required by the user's function. When the stack is not properly aligned _chkstk will potentially overwrite many more bytes until it crashes because due to assumptions above it fails to find the end of stack and happily loops through process memory (e.g. other fixedsize_stacks from other contexts allocated right next to each other) overwriting several bytes until it reaches unmapped addresses and only then generating an access violation.
Actually, due to the assumptions of _chkstk/Windows use of fixedsize_stack should be discouraged and shouldn't be default on Windows, but that is a different issue.
Thanks for showing how to change the stack allocator. I missed that part and manually patched my copy of Boost to use protected_fixedsize_stack by default.
comment:7 by , 6 years ago
Replying to Christian Maaser:
Actually, due to the assumptions of _chkstk/Windows use of fixedsize_stack should be discouraged and shouldn't be default on Windows, but that is a different issue.
A solution for fixedsize_stack could be to use
VirtualAlloc()
as in protected_fixedsize_stack but without adding a guard page at the end. On Windows the minimal stack size must be of page size. So fixedsize_stack would allocate a stack of mulitple pages size.
comment:8 by , 6 years ago
C++17 will probably introduce http://en.cppreference.com/w/cpp/memory/c/aligned_alloc, but until then this sounds like a good solution to me. Generally I think the benefits of fixedsize_stack over protected_fixedsize_stack are virtually non-existent, because the additional guard page does not require any actual memory allocation but only an address space allocation and reliably protects against stack overflows (utilizing _chkstk).
comment:9 by , 5 years ago
The next release of boost.context supports Windows-Fibers - of course Windows-Fibers are slower than fcontext.
As a quick fix (before I get time to fix this issue) you could disable basic runtime checks:
#pragma check_stack(off) or set "Basic Runtime Checks" to default under "C/C++"/"Code Generation"