id,summary,reporter,owner,description,type,status,milestone,component,version,severity,resolution,keywords,cc
8509,SSE/AVX optimization and C++11 support,Andrey Semashev,Andy Tompkins,"The suboptimal performance of the boost::uuids::uuid operators has been brought up on the developers mailing list before, and I have run tests on various compilers to confirm it. I also have applications that depend on the performance of uuid operations, so I'm interested in optimizing it.

I've attached the test I used for benchmarking, along with the results I obtained on an Intel Core i7 2600K (I also tried an older Core 2 Duo machine, with similar results). The benchmarking code includes the ""stock"" functions, which correspond to the current implementations of the equality and ordering operators, the ""mem"" functions based on memcmp, and the ""simd"" functions implemented with SSE intrinsics. The tests measure the time needed to perform a certain number of operations in a loop. The arguments to the operations are placed either on the stack or on the heap (to emulate distinct objects in an application).

To summarize the results:

1. The simd_equal version is the fastest across almost all configurations. The gain varies from 3.5x to 8x over the stock version. On the MSVC x64 target, though, all variants perform similarly (mem and simd are slightly faster) if the compared values are placed adjacently on the stack. The simd version is still the fastest if the operands are allocated on the heap.
2. On the MSVC x86 target mem_less turned out to be the fastest, with simd_less coming second. On other platforms, including MSVC x64, simd_less performed best (with a more moderate gain, though: 1.6x to 2.3x faster than the stock version).

Based on these results I've prepared a patch for uuid that makes use of SSE/AVX operations when possible (basically, it uses the ""simd"" versions when SSE/AVX is enabled at compile time). 
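
For illustration, here is a minimal sketch of what the mem and simd comparison variants described above might look like. The uuid16 struct is a hypothetical stand-in for boost::uuids::uuid, and the function bodies are assumptions based on the description, not the code from the attached benchmark or patch. The simd variant is only compiled when the compiler defines __SSE2__ (as GCC and Clang do); other compilers would need a different check.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid16 {
    std::uint8_t data[16];
};

// memcmp-based equality: true when all 16 bytes match.
inline bool mem_equal(const uuid16& lhs, const uuid16& rhs) noexcept {
    return std::memcmp(lhs.data, rhs.data, sizeof(lhs.data)) == 0;
}

// memcmp-based ordering: memcmp compares bytes as unsigned char, which
// gives the same lexicographical order as a byte-by-byte comparison.
inline bool mem_less(const uuid16& lhs, const uuid16& rhs) noexcept {
    return std::memcmp(lhs.data, rhs.data, sizeof(lhs.data)) < 0;
}

#if defined(__SSE2__)
// SSE2-based equality: load both values with unaligned loads, compare
// all 16 bytes at once, and check that every byte lane compared equal.
inline bool simd_equal(const uuid16& lhs, const uuid16& rhs) noexcept {
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lhs.data));
    const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rhs.data));
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
#endif
```

An ordering operator built on SSE needs extra work to locate the first differing byte and compare it as unsigned, so it is left out of this sketch.
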
Also, the patch changes the generic implementations of the operators to use memcmp, since compilers generally optimize it better than std::equal and std::lexicographical_compare (to be fair, GCC and Clang generated the same code for the ""stock"" and ""mem"" versions). For MSVC x86, the generic (now memcmp-based) operator< is used, since it proved faster in the tests. Lastly, the patch adds constexpr and noexcept where appropriate to improve compatibility with C++11 and to allow further optimizations by supporting compilers. I would be glad to see this patch applied. If you have any questions or comments, I'll be glad to answer here or on the mailing list. ",Patches,closed,To Be Determined,uuid,Boost 1.53.0,Optimization,fixed,,