Opened 9 years ago

Closed 9 years ago

Last modified 8 years ago

#8509 closed Patches (fixed)

SSE/AVX optimization and C++11 support

Reported by: Andrey Semashev Owned by: Andy Tompkins
Milestone: To Be Determined Component: uuid
Version: Boost 1.53.0 Severity: Optimization
Keywords: Cc:

Description

The suboptimal performance of boost::uuids::uuid operators has been brought up on the developers mailing list before, and I have tested various compilers to confirm it. I also have applications that depend on the performance of uuid operations, so I'm interested in optimizing it.

I've attached the test I used for benchmarking, along with the results I obtained on an Intel Core i7 2600K (I also tried an older Core 2 Duo machine, with similar results). The benchmarking code includes the "stock" functions, which correspond to the current implementations of the equality and ordering operators, the "mem" functions based on memcmp, and the "simd" functions implemented with SSE intrinsics. The tests measure the time needed to perform a certain number of operations in a loop. The arguments to the operations are placed either on the stack or on the heap (to emulate distinct objects in an application). To summarize the results:

  1. The simd_equal version is the fastest across almost all configurations. The gain varies, from 3.5x to 8x faster than the stock version. On the MSVC x64 target, though, all variants perform similarly (mem and simd are slightly faster) if the compared values are placed adjacently on the stack. The simd version is still the fastest one if the operands are allocated on the heap.
  2. On the MSVC x86 target mem_less turned out to be the fastest, with simd_less coming second. On other platforms, including MSVC x64, simd_less performed best (with a more moderate gain, though: 1.6x to 2.3x faster than the stock version).

Based on these results I've prepared a patch for uuid that makes use of SSE/AVX operations when possible (basically, it uses the "simd" versions when SSE/AVX is enabled at compile time). The patch also changes the generic implementations of the operators to use memcmp, since compilers generally optimize it better than std::equal and std::lexicographical_compare (to be fair, GCC and Clang generated the same code for the "stock" and "mem" versions). For MSVC x86, the generic (now memcmp-based) operator< is used, since it was faster in the tests. Lastly, the patch adds constexpr and noexcept where appropriate to improve compatibility with C++11 and allow further optimizations by supporting compilers.
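For readers who want to see the shape of the memcmp-based generic operators described above, here is a minimal sketch. It is not the code from the attached patch: uuid_like, mem_equal, mem_less and check_mem_operators are hypothetical names used only for illustration, and the struct is assumed to hold 16 raw bytes like boost::uuids::uuid.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid_like {
    std::uint8_t data[16];
};

// Equality: true when all 16 bytes match.
inline bool mem_equal(const uuid_like& lhs, const uuid_like& rhs) noexcept {
    return std::memcmp(lhs.data, rhs.data, sizeof(lhs.data)) == 0;
}

// Ordering: memcmp compares the bytes as unsigned char values, which
// gives the same answer as std::lexicographical_compare on the arrays.
inline bool mem_less(const uuid_like& lhs, const uuid_like& rhs) noexcept {
    return std::memcmp(lhs.data, rhs.data, sizeof(lhs.data)) < 0;
}

// Small usage example.
inline void check_mem_operators() {
    uuid_like a{};            // all bytes zero
    uuid_like b{};
    b.data[15] = 1;           // differs only in the last byte
    assert(!mem_equal(a, b));
    assert(mem_less(a, b));
}
```

Because memcmp treats each byte as unsigned, a byte of 0x80 orders after 0x01, matching what a byte-wise lexicographical comparison of unsigned values would produce.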

I would be glad to see this patch applied. If you have any questions or comments, I'll be glad to answer here or on the mailing list.

Attachments (6)

uuid_operators.cpp (5.8 KB ) - added by Andrey Semashev 9 years ago.
Benchmarking code
uuid_operators.txt (15.4 KB ) - added by Andrey Semashev 9 years ago.
Benchmarking results
uuid_simd.patch (15.4 KB ) - added by Andrey Semashev 9 years ago.
Updated patch that adds optimizations for SSE and C++11 to boost::uuid.
testcase-1-981648-vsoptimizer-bug.cpp (3.8 KB ) - added by hajokirchhoff <mailinglists@…> 8 years ago.
testcase 1 showing part of the VS2013 optimizer problem
testcase-981648-vsoptimizer-bug.zip (9.8 KB ) - added by hajokirchhoff <mailinglists@…> 8 years ago.
Testcase showing all optimizer problems including the access violation
testcase-981648-vs2008-bug.zip (10.7 KB ) - added by hajokirchhoff <mailinglists@…> 8 years ago.
Testcase for VS2008


Change History (29)

by Andrey Semashev, 9 years ago

Attachment: uuid_operators.cpp added

Benchmarking code

by Andrey Semashev, 9 years ago

Attachment: uuid_operators.txt added

Benchmarking results

by Andrey Semashev, 9 years ago

Attachment: uuid_simd.patch added

Updated patch that adds optimizations for SSE and C++11 to boost::uuid.

comment:1 by Andrey Semashev, 9 years ago

(In [86385]) Added optimizations for C++11 and SSE. Refs #8509.

comment:2 by Andrey Semashev, 9 years ago

Resolution: fixed
Status: new → closed

(In [86660]) Merged changes from trunk: warning silencing and support for SSE and C++11. Fixes #8495, #8697, #8509.

comment:3 by hajokirchhoff <mailinglists@…>, 8 years ago

There seems to be a bug in the Visual Studio 2013 optimizer, which causes the != operator to crash. Please have a look at https://connect.microsoft.com/VisualStudio/feedbackdetail/view/981648#tabs. Putting a #pragma optimize("", off/on) around the operators works around this bug. I'll follow up with more details.
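As an illustration of the pragma workaround mentioned above, the sketch below wraps a single function in #pragma optimize("", off) and #pragma optimize("", on). This is only an assumed usage pattern, not the change that was applied to Boost: uuid_like and uuid_not_equal are hypothetical names, and the pragma is guarded so that it is seen only by MSVC.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid_like {
    unsigned char data[16];
};

#if defined(_MSC_VER)
// MSVC only: disable optimizations for the functions defined below.
#pragma optimize("", off)
#endif

// Function that would otherwise be hit by the optimizer problem.
bool uuid_not_equal(const uuid_like& a, const uuid_like& b) {
    return std::memcmp(a.data, b.data, sizeof(a.data)) != 0;
}

#if defined(_MSC_VER)
// MSVC only: restore the optimization settings from the command line.
#pragma optimize("", on)
#endif
```

The pragma has to appear outside of any function body and applies to functions defined after it, so only the code between the two pragmas loses optimization.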

comment:4 by Andrey Semashev, 8 years ago

Could you provide a small code snippet so I can reproduce the problem? I'll need it to implement a test.

Also, what compiler switches do you use when you have the problem?

comment:5 by hajokirchhoff <mailinglists@…>, 8 years ago

I'll try to reproduce it. I am using /O2 and "no whole program optimization". The problem is that the optimizer will usually produce code like this:

000000013F2110DA  movdqu      xmm0,xmmword ptr [rsp+48h]  
000000013F2110E0  movdqu      xmm1,xmmword ptr [rax+10h]  
000000013F2110E5  pcmpeqd     xmm1,xmm0  
000000013F2110E9  pmovmskb    eax,xmm1  
000000013F2110ED  cmp         eax,0FFFFh  

It's only when the register xmm0 is already assigned a different variable that the optimizer will then produce the problematic opcode. I need to find a way to reproduce that specific scenario. The bug manifests itself in a 29MB binary of an application I am currently porting from VS2008 to VS2013 (and boost 1.55 to boost 1.56).
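For reference, the instruction sequence above corresponds roughly to the following SSE2 intrinsics. This is a sketch under assumptions, not the exact Boost source: uuid_like, simd_equal and UUID_SKETCH_USE_SSE2 are hypothetical names, and the code falls back to memcmp when SSE2 is not enabled at compile time.

```cpp
#include <cassert>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define UUID_SKETCH_USE_SSE2 1   // hypothetical macro, local to this sketch
#endif

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid_like {
    unsigned char data[16];
};

inline bool simd_equal(const uuid_like& lhs, const uuid_like& rhs) {
#if defined(UUID_SKETCH_USE_SSE2)
    // movdqu: unaligned 128-bit loads of both operands.
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lhs.data));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rhs.data));
    // pcmpeqd: every 32-bit lane that matches becomes all ones.
    __m128i eq = _mm_cmpeq_epi32(a, b);
    // pmovmskb + cmp 0FFFFh: one mask bit per byte, all 16 must be set.
    return _mm_movemask_epi8(eq) == 0xFFFF;
#else
    // Portable fallback when SSE2 is not available.
    return std::memcmp(lhs.data, rhs.data, sizeof(lhs.data)) == 0;
#endif
}
```

The unaligned load matters here: a uuid member is not guaranteed to be 16-byte aligned, so an instruction that takes an aligned memory operand would fault on such an object.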

comment:6 by hajokirchhoff <mailinglists@…>, 8 years ago

My guess is that this is further complicated by the fact that boost/uuid/detail/config.hpp defines BOOST_UUID_USE_SSE2 for Visual Studio and emits SSE2 opcodes, which the compiler then optimizes. At least, if I compile the debug version I see entirely different output for _mm_loadu_si128 than with release.

comment:7 by Andrey Semashev, 8 years ago

So this is case specific, I guess. I'm not sure it's a good idea to disable optimization for everyone. Does it happen with other operators or functions?

If patching Boost is difficult for you, you can define BOOST_UUID_NO_SIMD to disable the optimized routines.

by hajokirchhoff <mailinglists@…>, 8 years ago

testcase 1 showing part of the VS2013 optimizer problem

comment:8 by hajokirchhoff <mailinglists@…>, 8 years ago

Yes, it's very specific. At least the other problem (movdqu) you mentioned over at connect.microsoft seems more common. I uploaded a testcase for you just now.

Since the problematic functions are inline I'll try to disable optimization in my calling functions. That should leave optimization for all other instances intact. BTW, I noticed that config.hpp does not define SSE41, even though that is available for VS2013. I'll try to enable that. Could be that my problem then disappears (for now).

comment:9 by hajokirchhoff <mailinglists@…>, 8 years ago

Finally! Please see attached testcase that shows both problems. The problem is not that specific. I imagine it might affect others as well. All I had to do was move the function with the != operator to a different .cpp file than the calling function. Then I was able to reproduce the crash.

Note that this is Visual Studio 2013 Professional Update 3, Version 12.0.30723.0

by hajokirchhoff <mailinglists@…>, 8 years ago

Testcase showing all optimizer problems including the access violation

comment:10 by hajokirchhoff <mailinglists@…>, 8 years ago

The good news is that code generated with #define BOOST_UUID_USE_SSE3 seems to be working okay. So the easiest fix for this problem would be to add

#if _MSC_VER>=1800 && defined(_M_X64)
#define BOOST_UUID_USE_SSE3
#endif

to boost/uuid/detail/config.hpp, line 44. BOOST_UUID_USE_SSE41 would also work, but the code for operator== gets bigger. There is an added comparison which looks weird to me.

After that I don't have any more problems (yet).

comment:11 by Andrey Semashev, 8 years ago

Defining these macros makes the resulting binaries only runnable on CPUs with the respective instructions supported. This is ok if you target your application to these more recent CPUs, but in general SSE3 and later are not required to be present in Intel64/AMD64 CPUs.

comment:12 by hajokirchhoff <mailinglists@…>, 8 years ago

Yes, of course. Then the only solution I see is disabling the uuid_x86 for VS2013 for the present. I am curious what you can find out about the problem.

comment:13 by Andrey Semashev, 8 years ago

Please, have a look at this pull request:

https://github.com/boostorg/uuid/pull/4

It works around the optimizer bug for me.

comment:14 by mailinglists@…, 8 years ago

Thanks, this seems to work. It still generates a movups though, but it does not crash anymore.

BTW, perhaps you could click on "Users can reproduce this bug" over at connect.microsoft, so we can get their attention and possibly get this fixed (see connect.microsoft link above).

comment:15 by hajokirchhoff <mailinglists@…>, 8 years ago

This bug seems to affect Visual Studio 2008 as well. When was this patch released? I just ported from boost 1.55 to 1.56 and never saw this problem before, but now I have the same problem with the pcmpeqd opcode in the VS 2008 compiled object files. OTOH it could be that our class layout changed and the UUIDs were 16-byte aligned until now.

Anyway, I see this crash with Visual Studio 2008 as well, so your fix should probably be enabled for _MSC_VER VS2008 also.

comment:16 by Andrey Semashev, 8 years ago

The optimized routines were first released in 1.56.

Did you test the change in the pull request (modified to also apply to VS2008)? Does it fix the problem for you?

comment:17 by Andrey Semashev, 8 years ago

BTW, the test case doesn't reproduce the crash with VS2008 SP1. Can you provide a test that breaks with that compiler?

comment:18 by hajokirchhoff <mailinglists@…>, 8 years ago

The pull request works for both VS2013 and VS2008. I just verified again that VS2008 indeed has a similar problem, although the circumstances in which this bug manifests itself are different. I'll try to come up with another test project for VS2008.

comment:19 by hajokirchhoff <mailinglists@…>, 8 years ago

I correct myself: The pull request works for VS2013 but not for VS2008. The testcase project crashes with 2008 as well, could you please try it again? Perhaps you forgot to switch to the x64 configuration? I'll upload another testcase with a VS2008 solution in a short while.

comment:20 by hajokirchhoff <mailinglists@…>, 8 years ago

I've added a new testcase for VS2008. _ReadWriteBarrier() does not help with Visual Studio 2008. It still crashes.

by hajokirchhoff <mailinglists@…>, 8 years ago

Testcase for VS2008

comment:21 by Andrey Semashev, 8 years ago

Thanks for the test case. It crashes in a different place, that's why I didn't see the crash in my test.

I've updated the pull request with a new fix for VS2008.

comment:22 by hajokirchhoff <mailinglists@…>, 8 years ago

The new fix doesn't work. Our application does not crash at the operator== location anymore, but it doesn't work either. Instead it crashes in entirely different places and/or behaves strangely. My guess is that some of the other operators no longer work correctly with the optimizations and VS2008. Unfortunately I don't have any more time to investigate this problem. I've deactivated all optimizations in detail/uuid_config.hpp for VS2008 and our application is working again as it should. Since VS 2008 is already 6 years old I'd probably just disable the optimizations in the official boost lib as well and move on.

comment:23 by Andrey Semashev, 8 years ago

Sorry, there was a bug in the workaround. Fixed now.
