Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#12592 closed Bugs (worksforme)

Wrongly aligned bottom stack address

Reported by: Christian Maaser <runningwithscythes@…> Owned by: olli
Milestone: To Be Determined Component: context
Version: Boost 1.62.0 Severity: Problem
Keywords: Cc:

Description

I'm facing an incompatibility of Boost context based coroutines with the compiler generated function __chkstk() (on Windows using MSVC2015), where apparently the bottom stack address is not aligned correctly to page boundaries.

My application uses Qt within a Boost coroutine (v2) and unfortunately runs out of stack space, which is a different issue and out of scope of this problem here. However, symptoms are that deep within the call stack some QT function implicitly calls the compiler generated function __chkstk() and crashes with a memory access violation. The function's code looks like this:

void QQmlNotifier::emitNotify(QQmlNotifierEndpoint *endpoint, void **a)
{
; auto generated code at the beginning of the function
00000000730D9C20 48 89 54 24 10       mov         qword ptr [rsp+10h],rdx
00000000730D9C25 48 89 4C 24 08       mov         qword ptr [rsp+8],rcx
; This QT function needs 0x1878 bytes on the stack, which is more than one page of memory,
; thus call __chkstk to make sure that all pages are mapped to actual memory
00000000730D9C2A B8 78 18 00 00       mov         eax,1878h
00000000730D9C2F E8 39 CE B6 FF       call        __chkstk (072C46A6Dh)

Qt5Qmld.dll!__chkstk():
__chkstk:
; simply jump to actual function below
0000000072C46A6D E9 0E 23 58 00       jmp         __chkstk (0731C8D80h)

; do some stuff...
00000000731C8D80 48 83 EC 10          sub         rsp,10h
00000000731C8D84 4C 89 14 24          mov         qword ptr [rsp],r10
00000000731C8D88 4C 89 5C 24 08       mov         qword ptr [rsp+8],r11
00000000731C8D8D 4D 33 DB             xor         r11,r11
00000000731C8D90 4C 8D 54 24 18       lea         r10,[rsp+18h]
00000000731C8D95 4C 2B D0             sub         r10,rax
00000000731C8D98 4D 0F 42 D3          cmovb       r10,r11
; At this point r11 gets the bottom address of context stack assigned
00000000731C8D9C 65 4C 8B 1C 25 10 00 00 00 mov         r11,qword ptr gs:[10h]

; check if the actually allocated size of stack is big enough
00000000731C8DA5 4D 3B D3             cmp         r10,r11
; if this is false and if we don't jump to cs20, trouble begins
00000000731C8DA8 F2 73 17             bnd jae     cs20 (0731C8DC2h)
00000000731C8DAB 66 41 81 E2 00 F0    and         r10w,0F000h
cs10:
; walk down the stack in 4096 byte steps
00000000731C8DB1 4D 8D 9B 00 F0 FF FF lea         r11,[r11-1000h]

; trigger guard page (by writing a value) to allow dynamic resize of stack or mapping of additional pages
00000000731C8DB8 41 C6 03 00          mov         byte ptr [r11],0

; r10 and r11 will never be equal because the value of r11 is not aligned to 0x1000.
00000000731C8DBC 4D 3B D3             cmp         r10,r11
00000000731C8DBF F2 75 EF             bnd jne     cs10 (0731C8DB1h)

cs20:
; irrelevant end of function
00000000731C8DC2 4C 8B 14 24          mov         r10,qword ptr [rsp]
00000000731C8DC6 4C 8B 5C 24 08       mov         r11,qword ptr [rsp+8]
00000000731C8DCB 48 83 C4 10          add         rsp,10h
00000000731C8DCF F2 C3                bnd ret

The value of r11 in above code is the same as is assigned in boost_1_62_0\libs\context\src\asm\make_x86_64_ms_pe_masm.asm during context creation:

; ...
; save bottom address of context stack as 'limit'
00007FF7C0A05A8C 48 89 48 10          mov         qword ptr [rax+10h],rcx
; ... (see source file for full code)

When using QT without the surrounding coroutine/context and walking into __chkstk, the bottom stack limit (r11) is always a value aligned to 0x1000. Due to the fact that context does not align this value to 0x1000 the loop above writes random zeros into the applications stack address space until an access violation/seg fault occurs, destroying vital data on its way down and possibly interfering with other threads before crashing.

I don't know yet if this only affects MSVC and the default basic_fixedsize_stack used on Windows, or if other stack allocators are broken as well.

Attachments (1)

wrong_stack_alignment.cpp (958 bytes ) - added by Christian Maaser <runningwithscythes@…> 6 years ago.
minimal test case

Download all attachments as: .zip

Change History (10)

comment:1 by olli, 6 years ago

As a quick fix (before I get time to fix this issue) you could disable basic runtime checks:

#pragma check_stack(off) or set "Basic Runtime Checks" to default under "C/C++"/"Code Generation"

comment:2 by Christian Maaser <runningwithscythes@…>, 6 years ago

I don't think that this function is related to the Basic Runtime Checks setting, but is generally used in functions that use more than one page of stack memory (using large local variables). See http://stackoverflow.com/questions/8400118/what-is-the-purpose-of-the-chkstk-function.

I'm currently using the prebuilt version of QT, so I can't simply test whether this setting actually removes the call to _chkstk.

However, I do have a different workaround for this issue which works fine for me, so it isn't blocking my work.

comment:3 by Christian Maaser <runningwithscythes@…>, 6 years ago

Regarding _chkstk see this (stupid) simple code:

#include <array>
#include <cstring>

int main(int argc, char* argv[])
{
  std::array<char, 8192> bar;
  strcpy(bar.data(), argv[0]);
  return 0;
}

results in

int main(int argc, char* argv[])
{
00007FF6323C2C10 B8 18 20 00 00       mov         eax,2018h  
00007FF6323C2C15 E8 76 EE 01 00       call        __chkstk (07FF6323E1A90h)  
...

The compiler does not generate this call when using a smaller array (like 1024 bytes). Thus it should be easy to reproduce the issue using an array large enough. I can provide you a minimal test case by tomorrow if you like.

by Christian Maaser <runningwithscythes@…>, 6 years ago

Attachment: wrong_stack_alignment.cpp added

minimal test case

comment:4 by Christian Maaser <runningwithscythes@…>, 6 years ago

I can confirm that MSVC adds __chkstk to any function requesting more than one page of memory regardless of "Basic Runtime Checks", "Disable Security Checks", or other compiler flags, even in fully optimized release builds. It is a Windows NT platform requirement to do these checks. See https://support.microsoft.com/en-us/kb/100775

comment:5 by olli, 6 years ago

Resolution: worksforme
Status: newclosed

I've had time to look at this bug in more detail:

The bottom stack is correctly aligned (16-byte) - the problem is that

execution_context<int> source([](execution_context<int> sink, int)

uses the 'fixedsize_stack' with a default size of 64kB. That means that in best case you get an segmentation fault if you excceed the stack size (allocating 7 arrays on the stack).

Windows does not provide segemented (on demand growing) stacks.

If you use the 'protected_fixedsize_stack' you get an guard page appended at the stack. If you access memory from that page an 'stack overflow' exception is raised:

execution_context<int> source(

std::allocator_arg, protected_fixedsize_stack( 64*1024),

[](execution_context<int> sink, int)

In your case you used the fixedsize_stack with a stack size of 64kb, which seams too small for your use case. You should try a larger stack size.

So, I close this bug as 'works for me'.

BTW, to disable optimization for a function you could use '#pragma optimize("",on|off)'

Last edited 6 years ago by olli (previous) (diff)

comment:6 by Christian Maaser, 6 years ago

The bottom stack address is required to be aligned at a page boundary (4KB) on 64bit Windows. protected_fixedsize_stack uses

void * vp = ::VirtualAlloc( 0, size__, MEM_COMMIT, PAGE_READWRITE);

to allocate its memory, which does have this guarantee (the function allocates on a memory page level). fixedsize_stack however simply uses

void * vp = std::malloc( size_);

which has no such guarantee.

Of course I agree with you that when running out of stack there is no way (other than dynamically resizing the stack) to recover and without a guard page the application will crash at some point due to accessing an unmapped page. However, MSVC's generated _chkstk assumes that

a) there is a guard page (as setup by protected_fixedsize_stack), and

b) the bottom stack address is properly aligned.

In any case _chkstk will attempt to overwrite several bytes in attempt to trigger the dynamic stack resizing mechanism (assumed to be present), but only when the stack is page aligned _chkstk will be limited to the number of additional pages required by the user's function. When the stack is not properly aligned _chkstk will potentially overwrite many more bytes until it crashes because due to assumptions above it fails to find the end of stack and happily loops through process memory (e.g. other fixedsize_stacks from other contexts allocated right next to each other) overwriting several bytes until it reaches unmapped addresses and only then generating an access violation.

Actually, due to the assumptions of _chkstk/Windows use of fixedsize_stack should be discouraged and shouldn't be default on Windows, but that is a different issue.

Thanks for showing how to change the stack allocator. I missed that part and manually patched my copy of Boost to use protected_fixedsize_stack by default.

in reply to:  6 comment:7 by olli, 6 years ago

Replying to Christian Maaser:

Actually, due to the assumptions of _chkstk/Windows use of fixedsize_stack should be discouraged and shouldn't be default on Windows, but that is a different issue.

A solution for fixedsize_stack could be to use

VirtualAlloc()

as in protected_fixedsize_stack but without adding a guard page at the end. On Windows the minimal stack size must be of page size. So fixedsize_stack would allocate a stack of mulitple pages size.

comment:8 by Christian Maaser <runningwithscythes@…>, 6 years ago

C++17 will probably introduce http://en.cppreference.com/w/cpp/memory/c/aligned_alloc, but until then this sounds like a good solution to me. Generally I think the benefits of fixedsize_stack over protected_fixedsize_stack are virtually non-existent, because the additional guard page does not require any actual memory allocation but only an address space allocation and reliably protects against stack overflows (utilizing _chkstk).

comment:9 by olli, 5 years ago

The next release of boost.context supports Windows-Fibers - of course Windows-Fibers are slower than fcontext.

Note: See TracTickets for help on using tickets.