Boost C++ Libraries: Ticket #9282: performance regression in boost::unordered on 64-bit platforms https://svn.boost.org/trac10/ticket/9282 <p> On 64-bit platforms boost::unordered containers use 'mix64_policy' which doesn't grow the bucket count as fast as the 'prime' version. That makes adding items to an empty/'non-reserved' container extremely slow. Changing new_bucket_count/prev_bucket_count should fix the problem. Versions affected: 1.50 and above. </p> <p> Regards, </p> <ul><li>Ildar. </li></ul> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/9282 Trac 1.4.3 aiapub-cpp@… Tue, 22 Oct 2013 03:43:33 GMT <link>https://svn.boost.org/trac10/ticket/9282#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:1</guid> <description> <p> I was wrong - it's the mix64_policy::apply_hash() function - too many collisions while inserting uint64_t values. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Daniel James</dc:creator> <pubDate>Wed, 23 Oct 2013 12:11:26 GMT</pubDate> <title>status changed https://svn.boost.org/trac10/ticket/9282#comment:2 https://svn.boost.org/trac10/ticket/9282#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> <p> Could you tell me what values you're inserting? I want to work out if this is a bug in the implementation, or if it's a weakness in the mix function. If you can't give the actual values, is there a pattern they follow? </p> Ticket aiapub-cpp@… Wed, 23 Oct 2013 16:28:29 GMT <link>https://svn.boost.org/trac10/ticket/9282#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:3</guid> <description> <p> I believe it's the weakness of the mix function - taking it out (just using hf(x)) actually improves performance versus the prime_policy. I understand that you use the mix function to avoid collisions with simple hash-functions and I was also looking into a way to improve it. The test I use is simple - add 100M of 64-bit index combined with 0x9876543200000000 (any number above 32 bits) in reverse order, then make sure they all exist, then fetch for non-existing records. The code snippet: </p> <p> inline uint64_t NewID(size_t i) { </p> <blockquote> <p> return (0x9876543200000000ULL | i); </p> </blockquote> <p> } </p> <p> void test() { </p> <blockquote> <p> printf("Fill: boost::unordered_set (64K)\t"); boost::unordered_set&lt;uint64_t&gt; ids(65536); </p> </blockquote> <blockquote> <p> size_t N=100000000; <em> 100M records size_t i=0; for(; i!=N; ++i) </em></p> <blockquote> <p> ids.insert(NewID(N-i)); </p> </blockquote> </blockquote> <blockquote> <p> printf("Fetch: boost::unordered_set (64K)\n"); for(i=0; i!=N; ++i) { </p> <blockquote> <p> if (ids.find(NewID(i+1))==ids.end()) { </p> <blockquote> <p> printf("\nFailed!\n"); break; </p> </blockquote> <p> } </p> </blockquote> <p> } </p> </blockquote> <blockquote> <p> printf("Fetch: boost::unordered_set invalid records\n"); for (i=0; i!=N; ++i) { </p> <blockquote> <p> if (ids.find((uint64_t)(i | 0xF0000000UL))!=ids.end()) { </p> <blockquote> <p> printf("\nFailed!\n"); break; </p> </blockquote> <p> } </p> </blockquote> <p> } </p> </blockquote> <p> } </p> <p> Thanks for looking into it. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Daniel James</dc:creator> <pubDate>Wed, 23 Oct 2013 21:20:17 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9282#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:4</guid> <description> <p> This particular benchmark hits the best case behaviour for the prime policy, so it should be a bit faster than the mix policy. You're inserting consecutive values which will result in no collisions when using the prime policy, but for the mix policy the bucket should be chosen pseudo-randomly which means there will be some collisions. But in theory that result shouldn't happen in general. </p> <p> I wouldn't deny that some may consider that best case desirable, so it might be worth using the prime policy for integer types. But I first want to check that is what's happening here, and not something worse. </p> <p> I'm currently using a 32 bit operating system, which makes running your test a bit tricky, so can you try running this with the resulting container to get an idea of how many collisions there are: </p> <pre class="wiki">#include &lt;map&gt; #include &lt;iostream&gt; #include &lt;boost/unordered_set.hpp&gt; #include &lt;stdint.h&gt; void bucket_histogram(boost::unordered_set&lt;uint64_t&gt; const &amp;x) { typedef std::map&lt;std::size_t, std::size_t&gt; map; map frequencies; std::cout &lt;&lt; "Bucket count: " &lt;&lt; x.bucket_count() &lt;&lt; std::endl; for (std::size_t i = 0; i &lt; x.bucket_count(); ++i) { ++frequencies[x.bucket_size(i)]; } for (map::const_iterator i = frequencies.begin(); i != frequencies.end(); ++i) { std::cout &lt;&lt; i-&gt;first &lt;&lt; " " &lt;&lt; i-&gt;second &lt;&lt; std::endl; } std::cout &lt;&lt; std::endl; } </pre><p> The output for the prime policy should be something like: </p> <pre class="wiki">Bucket count: 100663319 0 663319 1 100000000 </pre><p> i.e. no collisions. All buckets either have 1 element, or no elements. </p> <p> The output for the mix policy should be something like: </p> <pre class="wiki">Bucket count: 134217728 0 63716044 1 47468076 2 17685158 3 4390091 4 819351 5 122072 6 15183 7 1564 8 176 9 13 </pre><p> i.e. up to 9 collisions. </p> <p> I got those values by running a simulation of the container. If it's significantly worse than that, then there's definitely a bug. </p> <p> In theory, the average frequencies should follow the poisson distribution with lambda = number of elements/number of buckets: </p> <pre class="wiki"> 0 63714059.8155024 1 47470673.7812626 2 17684204.0498788 3 4391919.58553563 4 818058.771181037 5 121900.256154096 6 15137.1280543625 7 1611.14846535692 8 150.049893684399 9 12.4217647383941 10 0.925493593394311 </pre><p> Which is pretty similar to the results my simulation got. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Daniel James</dc:creator> <pubDate>Wed, 23 Oct 2013 22:26:44 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9282#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:5</guid> <description> <p> It just occurred to me that my old laptop is running a 64 bit operating system. It only has 2 GB memory, so I couldn't run it for as many records. I ran it for 30M records and I got the kind of distribution I'd expect. </p> <p> But the performance was worse than I expected (about twice as long for the mix policy). I'm not sure, but I think it might be because of bad cache coherency. The mix policy is jumping around the buckets array, while the prime policy is moving through it linearly. So even if there's an empty bucket the mix policy is much more likely to have a cache miss when checking the bucket. I'll think about this. </p> </description> <category>Ticket</category> </item> <item> <author>aiapub-cpp@…</author> <pubDate>Thu, 24 Oct 2013 01:44:24 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/9282 https://svn.boost.org/trac10/ticket/9282 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">boost_unordered_mix.png</span> </li> </ul> <p> memory usage - boost::unordered_set using mix64_policy </p> Ticket aiapub-cpp@… Thu, 24 Oct 2013 01:45:35 GMT attachment set https://svn.boost.org/trac10/ticket/9282 https://svn.boost.org/trac10/ticket/9282 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">boost_unordered_nomix.png</span> </li> </ul> <p> memory usage - boost::unordered_set using mix64_policy WITHOUT the mix code </p> Ticket aiapub-cpp@… Thu, 24 Oct 2013 01:59:03 GMT <link>https://svn.boost.org/trac10/ticket/9282#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:6</guid> <description> <p> I attached diagrams showing memory usage with and without the mix function. You can see that memory is used differently - the first one has longer bucket-chains. Peak memory usage is the same for both - 4GB. </p> <p> And you were right about the histograms (the time is in seconds): </p> <p> With mix code: </p> <pre class="wiki">Fill: boost::unordered_set (64K) - 00:52 Bucket count: 134217728 0 63716044 1 47468076 2 17685158 3 4390091 4 819351 5 122072 6 15183 7 1564 8 176 9 13 Fetch: boost::unordered_set (64K) - 00:15 Fetch: boost::unordered_set non-existing records(64K) - 00:18 Total time: 01:41 </pre><p> Without the mix code: </p> <pre class="wiki">Fill: boost::unordered_set (64K) - 00:07 Bucket count: 134217728 0 34217728 1 100000000 Fetch: boost::unordered_set (64K) - 00:01 Fetch: boost::unordered_set non-existing records (64K) - 00:00 Total time: 00:09 </pre> </description> <category>Ticket</category> </item> <item> <dc:creator>Daniel James</dc:creator> <pubDate>Thu, 07 Nov 2013 15:50:32 GMT</pubDate> <title>milestone changed https://svn.boost.org/trac10/ticket/9282#comment:7 https://svn.boost.org/trac10/ticket/9282#comment:7 <ul> <li><strong>milestone</strong> <span class="trac-field-old">To Be Determined</span> → <span class="trac-field-new">Boost 1.56.0</span> </li> </ul> <p> Just to let you know that I haven't forgotten about this. I'm not sure how I'm what action I'll take (if any) it, but I'll try to resolve this for 1.56.0. </p> Ticket Nate Rosenblum <flander@…> Sun, 23 Feb 2014 18:42:23 GMT <link>https://svn.boost.org/trac10/ticket/9282#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:8</guid> <description> <p> Came across this yesterday. Pardon the unsolicited feedback, but it's hard to imagine what one could do to resolve this in the general case; as noted above, the combination of identity function hash_value&lt;int&gt; + the insertion order strongly favors the implementation w/o Wang's mixing function (best case) or the prime policy, both for collision avoidance (low bits of all keys are unique) and for more efficient use of the cache write buffer. </p> <p> One possibility would be to expose the bucket policy as a top-level template parameter, so that users can pick an appropriate strategy in the rare occasions that the default is inappropriate(*). As a small side note, a third policy w/ power-of-two table size that doesn't perturb the hash value would be slightly more efficient when we know that the hash function distributes well over the lower bits (e.g. hash_value&lt;std::string&gt;). </p> <p> (*) It's hard to imagine when this would be; if you know that you're filling the set/map with entries that will hash to unique consecutive buckets during insert, a different data structure might be worthwhile. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Daniel James</dc:creator> <pubDate>Sun, 23 Feb 2014 20:39:02 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9282#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:9</guid> <description> <p> Thanks for the feedback. Exposing the policy is a good idea, but I'm not keen on adding an extra template parameter, so I'm thinking about adding a type trait for the hash function, which would let someone using a custom hash function indicate which strategy it is suited for. Actually, I'll bring up that up on the development list now. </p> <p> I decided in the end to use a prime policy for integers in the next release. This is slower in general, but this case is pretty important. The change is at: </p> <p> <a class="ext-link" href="https://github.com/boostorg/unordered/commit/57819d1dd98d23f2a59f59bdb752420b5ae11d52"><span class="icon">​</span>https://github.com/boostorg/unordered/commit/57819d1dd98d23f2a59f59bdb752420b5ae11d52</a> </p> <p> I'll close this ticket when it's merged to master. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Mon, 24 Feb 2014 18:23:10 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9282#comment:10 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:10</guid> <description> <blockquote class="citation"> <p> This is slower in general, but this case is pretty important </p> </blockquote> <p> Agreed; prime tables (with MLF 1.0) will always outperform power-two tables with integral keys in the general case, since they produce no collisions. The only exception is if the hash function is guaranteed to be unique in the lower log_2(table_size) bits; in that case unmixed power-two hashing will win. Not sure where I would look to find such use cases (other than the toy insert-consecutive benchmark above), so I am enthusiastic for the above-referenced patch to go in. </p> </description> <category>Ticket</category> </item> <item> <author>Nate Rosenblum <flander@…></author> <pubDate>Mon, 24 Feb 2014 18:24:06 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9282#comment:11 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9282#comment:11</guid> <description> <p> Sorry, failed to sign in above <sup></sup><sup></sup> </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Daniel James</dc:creator> <pubDate>Fri, 02 May 2014 23:00:44 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/9282#comment:12 https://svn.boost.org/trac10/ticket/9282#comment:12 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> Ticket