Boost C++ Libraries: Ticket #8509: SSE/AVX optimization and C++11 support https://svn.boost.org/trac10/ticket/8509 <p> The suboptimal performance of boost::uuids::uuid operators has been brought up on the developers mailing list before, and I have performed some testing on various compilers to confirm that. I also have applications that depend on the performance of uuid operations, so I'm interested in optimizing it. </p> <p> I've attached the test I used for benchmarking, and also my test results, obtained on an Intel Core i7 2600K (also tried on an older Core 2 Duo machine with similar results). The benchmarking code includes the "stock" functions, which correspond to the current implementations of the equality and ordering operators, the "mem" functions based on memcmp, and the "simd" functions, which are implemented with SSE intrinsics. The tests measure the time needed to perform a certain number of operations in a loop. The arguments to the operations are placed either on the stack or on the heap (to emulate distinct objects in an application). To summarize the results: </p> <ol><li>The simd_equal version is the fastest across almost all configurations. The performance gain varies from 3.5x to 8x over the stock version. On the MSVC x64 target, though, all variants perform similarly (mem and simd are slightly faster) if the compared values are placed adjacently on the stack. The simd version is still the fastest one if the operands are allocated on the heap. </li></ol><ol start="2"><li>On the MSVC x86 target, mem_less turned out to be the fastest, with simd_less coming second. On other platforms, including MSVC x64, simd_less performed best (with a more moderate gain, though: 1.6x to 2.3x faster than the stock version). </li></ol><p> Based on these results I've prepared a patch for uuid that makes use of SSE/AVX operations when possible (basically, it uses the "simd" versions when SSE/AVX is enabled at compile time). 
Also, the patch changes the generic implementations of the operators to use memcmp, since compilers generally optimize it better than std::equal and std::lexicographical_compare (to be fair, GCC and Clang generated the same code for the "stock" and "mem" versions). For MSVC x86, the generic (now memcmp-based) operator&lt; is used, since it proved faster in the tests. Lastly, the patch adds constexpr and noexcept where appropriate to improve compatibility with C++11 and allow for further optimizations by the supporting compilers. </p> <p> I would be glad to see this patch applied. If you have any questions or comments, I'll be glad to answer here or on the mailing list. </p> Andrey Semashev Sun, 28 Apr 2013 00:33:04 GMT attachment set https://svn.boost.org/trac10/ticket/8509 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">uuid_operators.cpp</span> </li> </ul> <p> Benchmarking code </p> Ticket Andrey Semashev Sun, 28 Apr 2013 00:33:26 GMT attachment set https://svn.boost.org/trac10/ticket/8509 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">uuid_operators.txt</span> </li> </ul> <p> Benchmarking results </p> Ticket Andrey Semashev Wed, 16 Oct 2013 14:30:00 GMT attachment set https://svn.boost.org/trac10/ticket/8509 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">uuid_simd.patch</span> </li> </ul> <p> Updated patch that adds optimizations for SSE and C++11 to boost::uuid. 
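</p> <p> The two kinds of comparison discussed in this ticket can be sketched roughly as follows. This is a minimal illustration, not the contents of the attached patch: the type uuid16 and the function names equal_memcmp, less_memcmp and equal_sse2 are hypothetical, and the SSE2 path simply mirrors the load / compare / movemask sequence that a 16-byte equality test typically compiles to. </p>

```cpp
#include <cstdint>
#include <cstring>

// SSE2 is assumed available when the compiler advertises it (GCC/Clang define
// __SSE2__; MSVC targets x64 with SSE2 as baseline or sets _M_IX86_FP >= 2).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical stand-in for a 16-byte UUID value; this is NOT the Boost type.
struct uuid16 {
    std::uint8_t data[16];
};

// Generic equality: a single memcmp over the 16 raw bytes.
inline bool equal_memcmp(const uuid16& a, const uuid16& b) noexcept {
    return std::memcmp(a.data, b.data, sizeof(a.data)) == 0;
}

// Generic ordering: memcmp compares bytes as unsigned char, which gives the
// same result as a lexicographical comparison of the byte arrays.
inline bool less_memcmp(const uuid16& a, const uuid16& b) noexcept {
    return std::memcmp(a.data, b.data, sizeof(a.data)) < 0;
}

// SIMD equality when SSE2 is enabled at compile time, memcmp otherwise.
inline bool equal_sse2(const uuid16& a, const uuid16& b) noexcept {
#if defined(SKETCH_HAS_SSE2)
    // Unaligned loads: the operands carry no 16-byte alignment guarantee.
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data));
    // Each matching 32-bit lane becomes all ones; the 16-bit byte mask equals
    // 0xFFFF only when all 16 bytes match.
    return _mm_movemask_epi8(_mm_cmpeq_epi32(va, vb)) == 0xFFFF;
#else
    return equal_memcmp(a, b);
#endif
}
```

<p> The load in the sketch is deliberately the unaligned variant, since a 16-byte value embedded in a larger object is not guaranteed to sit on a 16-byte boundary. 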
</p> Ticket Andrey Semashev Mon, 21 Oct 2013 23:01:26 GMT <link>https://svn.boost.org/trac10/ticket/8509#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:1</guid> <description> <p> (In <a class="changeset" href="https://svn.boost.org/trac10/changeset/86385" title="Added optimizations for C++11 and SSE. Refs #8509.">[86385]</a>) Added optimizations for C++11 and SSE. Refs <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8509" title="#8509: Patches: SSE/AVX optimization and C++11 support (closed: fixed)">#8509</a>. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Tue, 12 Nov 2013 20:19:08 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/8509#comment:2 https://svn.boost.org/trac10/ticket/8509#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> (In <a class="changeset" href="https://svn.boost.org/trac10/changeset/86660" title="Merged changes from trunk: warning silencing and support for SSE and ...">[86660]</a>) Merged changes from trunk: warning silencing and support for SSE and C++11. Fixes <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8495" title="#8495: Patches: UUID-Library emits Warning C4244 from uuid_generators.hpp (closed: fixed)">#8495</a>, <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8697" title="#8697: Patches: gcc -Wshadow throws warnings (closed: fixed)">#8697</a>, <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/8509" title="#8509: Patches: SSE/AVX optimization and C++11 support (closed: fixed)">#8509</a>. 
</p> Ticket hajokirchhoff <mailinglists@…> Thu, 25 Sep 2014 17:26:38 GMT <link>https://svn.boost.org/trac10/ticket/8509#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:3</guid> <description> <p> There seems to be a bug in the Visual Studio 2013 optimizer, which causes the != operator to crash. Please have a look at <a class="ext-link" href="https://connect.microsoft.com/VisualStudio/feedbackdetail/view/981648#tabs"><span class="icon">​</span>https://connect.microsoft.com/VisualStudio/feedbackdetail/view/981648#tabs</a>. Putting a #pragma optimize("", off/on) around the operators works around this bug. I'll follow up with more details. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Thu, 25 Sep 2014 18:25:58 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:4</guid> <description> <p> Could you provide a small code snippet so I can reproduce the problem? I'll need it to implement a test. </p> <p> Also, what compiler switches do you use when you have the problem? </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Fri, 26 Sep 2014 07:12:35 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:5</guid> <description> <p> I'll try to reproduce it. I am using /O2 and "no whole program optimization". 
The problem is that the optimizer will usually produce code like this: </p> <pre class="wiki">000000013F2110DA  movdqu      xmm0,xmmword ptr [rsp+48h]
000000013F2110E0  movdqu      xmm1,xmmword ptr [rax+10h]
000000013F2110E5  pcmpeqd     xmm1,xmm0
000000013F2110E9  pmovmskb    eax,xmm1
000000013F2110ED  cmp         eax,0FFFFh
</pre><p> It's only when the register xmm0 is already assigned to a different variable that the optimizer will then produce the problematic opcode. I need to find a way to reproduce that specific scenario. The bug manifests itself in a 29MB binary of an application I am currently porting from VS2008 to VS2013 (and boost 1.55 to boost 1.56). </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Fri, 26 Sep 2014 07:27:20 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:6</guid> <description> <p> My guess is that this is further complicated by the fact that boost/uuid/detail/config.hpp defines BOOST_UUID_USE_SSE2 for Visual Studio and emits SSE2 opcodes which the compiler then optimizes. At least if I compile the debug version I see an entirely different output for _mm_loadu_si128 than with release. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Fri, 26 Sep 2014 07:48:56 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:7</guid> <description> <p> So this is case specific, I guess. I'm not sure it's a good idea to disable optimization for everyone. Does it happen with other operators or functions? </p> <p> If patching Boost is difficult for you, you can define BOOST_UUID_NO_SIMD to disable the optimized routines. 
</p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Fri, 26 Sep 2014 08:02:30 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/8509 https://svn.boost.org/trac10/ticket/8509 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">testcase-1-981648-vsoptimizer-bug.cpp</span> </li> </ul> <p> testcase 1 showing part of the VS2013 optimizer problem </p> Ticket hajokirchhoff <mailinglists@…> Fri, 26 Sep 2014 08:05:48 GMT <link>https://svn.boost.org/trac10/ticket/8509#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:8</guid> <description> <p> Yes, it's very specific. At least the other problem (movdqu) you mentioned over at connect.microsoft seems more common. I uploaded a testcase for you just now. </p> <p> Since the problematic functions are inline I'll try to disable optimization in my calling functions. That should leave optimization for all other instances intact. BTW, I noticed that config.hpp does not define SSE41, even though that is available for VS2013. I'll try to enable that. Could be that my problem then disappears (for now). </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Fri, 26 Sep 2014 08:22:33 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:9</guid> <description> <p> Finally! Please see attached testcase that shows both problems. The problem is not that specific. I imagine it might affect others as well. All I had to do was move the function with the != operator to a different .cpp file than the calling function. Then I was able to reproduce the crash. 
</p> <p> Note that this is Visual Studio 2013 Professional Update 3, Version 12.0.30723.0 </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Fri, 26 Sep 2014 08:23:12 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/8509 https://svn.boost.org/trac10/ticket/8509 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">testcase-981648-vsoptimizer-bug.zip</span> </li> </ul> <p> Testcase showing all optimizer problems including the access violation </p> Ticket hajokirchhoff <mailinglists@…> Fri, 26 Sep 2014 15:26:01 GMT <link>https://svn.boost.org/trac10/ticket/8509#comment:10 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:10</guid> <description> <p> The good news is that code generated with #define BOOST_UUID_USE_SSE3 seems to be working okay. So the easiest fix for this problem would be to add </p> <pre class="wiki">#if _MSC_VER&gt;=1800 &amp;&amp; defined(_M_X64)
#define BOOST_UUID_USE_SSE3
#endif
</pre><p> to boost/uuid/detail/config.hpp, line 44. BOOST_UUID_USE_SSE41 would also work, but the code for operator== gets bigger. There is an added comparison which looks weird to me. </p> <p> After that I don't have any more problems (yet). </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Fri, 26 Sep 2014 15:35:25 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:11 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:11</guid> <description> <p> Defining these macros makes the resulting binaries runnable only on CPUs that support the respective instructions. This is ok if you target your application to these more recent CPUs, but in general SSE3 and later are not required to be present in Intel64/AMD64 CPUs. 
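</p> <p> To illustrate the point, here is a minimal sketch of compile-time selection that picks an instruction set only when the compiler has been told the target supports it. It is an assumption-laden illustration, not the actual logic of boost/uuid/detail/config.hpp: the function name compiled_simd_level is hypothetical, and the sketch relies only on the predefined macros __AVX__, __SSE4_1__, __SSE3__ and __SSE2__ (GCC and Clang) and _M_X64 and _M_IX86_FP (MSVC). </p>

```cpp
#include <string>

// Reports the highest x86 SIMD level enabled for this translation unit.
// Nothing beyond SSE2 is assumed for 64-bit targets: MSVC does not predefine
// a macro for SSE3 or SSE4.1, so those levels are reported only when the
// compiler itself advertises them (e.g. GCC/Clang with -msse3 or -msse4.1).
inline std::string compiled_simd_level() {
#if defined(__AVX__)
    return "avx";
#elif defined(__SSE4_1__)
    return "sse4.1";
#elif defined(__SSE3__)
    return "sse3";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";
#else
    return "generic";  // no x86 SIMD enabled; a memcmp-based path would apply
#endif
}
```

<p> A build with default x86-64 options reports "sse2" under this scheme; higher levels appear only when they are requested explicitly, which is the trade-off described above. 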
</p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Fri, 26 Sep 2014 15:38:44 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:12 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:12</guid> <description> <p> Yes, of course. Then the only solution I see is disabling the uuid_x86 for VS2013 for the present. I am curious what you can find out about the problem. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Sat, 27 Sep 2014 19:09:30 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:13 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:13</guid> <description> <p> Please, have a look at this pull request: </p> <p> <a class="ext-link" href="https://github.com/boostorg/uuid/pull/4"><span class="icon">​</span>https://github.com/boostorg/uuid/pull/4</a> </p> <p> It works around the optimizer bug for me. </p> </description> <category>Ticket</category> </item> <item> <author>mailinglists@…</author> <pubDate>Tue, 30 Sep 2014 06:20:31 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:14 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:14</guid> <description> <p> Thanks, this seems to work. It still generates a movups though, but it does not crash anymore. </p> <p> BTW, perhaps you could click on "Users can reproduce this bug" over at connect.microsoft, so we can get their attention and possibly get this fixed (see connect.microsoft link above). 
</p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Wed, 01 Oct 2014 14:46:14 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:15 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:15</guid> <description> <p> This bug seems to affect Visual Studio 2008 as well. When was this patch released? I just ported from boost 1.55 to 1.56 and never saw this problem before, but now I have the same problem with the pcmpeqd opcode in the VS 2008 compiled object files. OTOH it could be that our class layout changed and the UUIDs were 16-byte aligned until now. </p> <p> Anyway, I see this crash with Visual Studio 2008 as well, so your fix should probably be enabled for _MSC_VER VS2008 also. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Wed, 01 Oct 2014 18:03:36 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:16 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:16</guid> <description> <p> The optimized routines were first released in 1.56. </p> <p> Did you test the change in the pull request (modifying it to also apply to VS2008), does it fix the problem for you? </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Wed, 01 Oct 2014 20:24:24 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:17 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:17</guid> <description> <p> BTW, the test case doesn't reproduce the crash with VS2008 SP1. Can you provide a test that breaks with that compiler? 
</p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Thu, 02 Oct 2014 07:46:37 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:18 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:18</guid> <description> <p> The pull request works for both, VS2013 and VS2008. I just verified again that VS2008 indeed has a similar problem, although the circumstances when this bug manifests itself are different. I'll try to come up with another test project for VS2008. </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Thu, 02 Oct 2014 15:05:11 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:19 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:19</guid> <description> <p> I correct myself: The pull request works for VS2013 but not for VS2008. The testcase project crashes with 2008 as well, could you please try it again? Perhaps you forgot to switch to the x64 configuration? I'll upload another testcase with a VS2008 solution in a short while. </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Thu, 02 Oct 2014 15:13:45 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:20 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:20</guid> <description> <p> I've added a new testcase for VS2008. _ReadWriteBarrier() does not help with Visual Studio 2008. It still crashes. 
</p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Thu, 02 Oct 2014 15:14:11 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/8509 https://svn.boost.org/trac10/ticket/8509 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">testcase-981648-vs2008-bug.zip</span> </li> </ul> <p> Testcase for VS2008 </p> Ticket Andrey Semashev Sat, 04 Oct 2014 13:02:32 GMT <link>https://svn.boost.org/trac10/ticket/8509#comment:21 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:21</guid> <description> <p> Thanks for the test case. It crashes in a different place, that's why I didn't see the crash in my test. </p> <p> I've updated the pull request with a new fix for VS2008. </p> </description> <category>Ticket</category> </item> <item> <author>hajokirchhoff <mailinglists@…></author> <pubDate>Mon, 06 Oct 2014 14:49:45 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:22 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:22</guid> <description> <p> The new fix doesn't work. Our application does not crash at the operator== location anymore, but it doesn't work either. Instead it crashes in entirely different places and/or behaves strangely. My guess is that some of the other operators no longer work correctly with the optimizations and VS2008. Unfortunately I don't have any more time to investigate this problem. I've deactivated all optimizations in detail/uuid_config.hpp for VS2008 and our application is working again as it should. Since VS 2008 is already 6 years old I'd probably just disable the optimizations in the official boost lib as well and move on. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>Andrey Semashev</dc:creator> <pubDate>Sun, 12 Oct 2014 10:30:31 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/8509#comment:23 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/8509#comment:23</guid> <description> <p> Sorry, there was a bug in the workaround. Fixed now. </p> </description> <category>Ticket</category> </item> </channel> </rss>