Boost C++ Libraries: Ticket #11801: Recursive modify in multi_index sometimes does not update index correctly https://svn.boost.org/trac10/ticket/11801 <p> Please use attached simplified file to reproduce the problem. I used boost 1.59 with VS2015. Reproduced in all 4 configurations (debug/release, x86/x64). I'm not sure if it's a bug or limitation of multi_index, but commenting any line in fill_test_date() fixes the problem, so seems like a bug in some specific conditions. If it's multi_index limitation, please mention this in description of modify/modify_key. </p> <p> The problem: multi_index container has two keys ordered_unique and ordered_non_unique. In calc_layer() I call modify() on first index and change the key of second index. During calculation of new key for second index, other recursive calls to modify() may be performed. All calls to modify() return true but in some cases second index is not correctly ordered in we see message "WRONG SORT ORDER BY LAYER". </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/11801 Trac 1.4.3 Valentyn Shtronda <valiko.ua@…> Wed, 18 Nov 2015 10:04:32 GMT attachment set https://svn.boost.org/trac10/ticket/11801 https://svn.boost.org/trac10/ticket/11801 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">DirectedGraph.cpp</span> </li> </ul> Ticket Joaquín M López Muñoz Thu, 19 Nov 2015 08:08:07 GMT status changed; resolution set https://svn.boost.org/trac10/ticket/11801#comment:1 https://svn.boost.org/trac10/ticket/11801#comment:1 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">invalid</span> </li> </ul> <p> <code>modify</code> can be used recursively if you're careful. Consider the following pseudocode: </p> <pre class="wiki">c.modify(it,[&amp;](element&amp; x){ x.data=...; // c invariant broken c.find(...); // undefined behavior }); </pre><p> When <code>x.data</code> is modified, the invariants of <code>c</code> (i.e. the sortedness of their indices) are potentially broken (<code>modify</code> precisely restores them, i.e. reindexes, after the user-provided lambda returns), so it is not legal to call <code>f.find</code> in the following line. In your example, you break the invariant of <code>deps_map</code> at lines </p> <pre class="wiki">83: node_info.layer = 0; 88: node_info.layer = CALCULATION_IN_PROGRESS; 94: node_info.layer = UNINITIALIZED; 100: node_info.layer = UNINITIALIZED; 103: node_info.layer = std::max(node_info.layer, dep_it-&gt;layer + 1); </pre><p> 83, 94 and 100 aren't problematic as you return immediately after modification; 88 can be seemingly dispensed with since you don't really have circular references; actually, it is 103 that poses the problem, because after changing <code>node_info.layer</code> you continue using the container in line 91. </p> <p> The solution is to defer element modification to the very end of the user-provided lambda: </p> <pre class="wiki">bool succeeded = deps_map.modify(it, [=](node_info_t&amp; node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t&amp; index_by_name = deps_map.get&lt;by_name&gt;(); // following statement not needed, in the asumption there are no circular refs // node_info.layer = CALCULATION_IN_PROGRESS; int layer=0; for (const std::string&amp; dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } layer = std::max(layer, dep_it-&gt;layer + 1); } node_info.layer=layer; } }); </pre><p> I've tried this and it works. Closing the ticket as a non-bug, please re-open if you think otherwise. </p> Ticket Valentyn Shtronda <valiko.ua@…> Thu, 19 Nov 2015 14:25:39 GMT <link>https://svn.boost.org/trac10/ticket/11801#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:2</guid> <description> <p> Thank you for quick reply. </p> <p> In my lambda I modify layer member and do not change name member. Doesn't this mean that first index remains sorted, is not broken, and I can use find on it even inside lambda recursively? </p> </description> <category>Ticket</category> </item> <item> <author>Valentyn Shtronda <valiko.ua@…></author> <pubDate>Thu, 19 Nov 2015 14:30:43 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:3</guid> <description> <p> I need to keep circular dependency check because user-supplied data can contain them. I can implement this check other way but current way is so simple and elegant... </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Joaquín M López Muñoz</dc:creator> <pubDate>Thu, 19 Nov 2015 16:00:03 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:4</guid> <description> <p> <em>In my lambda I modify layer member and do not change name member. Doesn't this mean that first index remains sorted, is not broken, and I can use find on it even inside lambda recursively?</em> </p> <p> Unfortunately not: when <code>modify</code> reindexes, it does so on <em>every</em> index; so, if you broke the element pointed to by <code>it</code> and then, further down the code, you do a <code>modify(dep_it,...)</code> (within the recursive call to <code>calc_layer</code>), Boost.MultiIndex does not know that <code>*it</code> is misplaced and havoc ensues. </p> <p> <em>I need to keep circular dependency check because user-supplied data can contain them...</em> </p> <p> OK, there's another way to do this that lets you keep your <code>CALCULATION_IN_PROGRESS</code> feature: </p> <pre class="wiki">bool succeeded = deps_map.modify(it, [=](node_info_t&amp; node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t&amp; index_by_name = deps_map.get&lt;by_name&gt;(); node_info.layer = CALCULATION_IN_PROGRESS; index_by_name.modify(it,[](node_info_t&amp;){}); for (const std::string&amp; dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } node_info.layer = std::max(node_info.layer, dep_it-&gt;layer + 1); index_by_name.modify(it,[](node_info_t&amp;){}); } } }); </pre><p> What we're doing here is to reindex immediately after any element modification, without waiting for the lib to do it on function exit. </p> <pre class="wiki">node_info.layer = CALCULATION_IN_PROGRESS; index_by_name.modify(it,[](node_info_t&amp;){}); ... node_info.layer = std::max(node_info.layer, dep_it-&gt;layer + 1); index_by_name.modify(it,[](node_info_t&amp;){}); </pre><p> This is done by calling <code>modify</code> with a do-nothing function after we change the value pointed to by <code>it</code> (i.e. <code>node_info</code>). I've checked and it works, please confirm back. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Joaquín M López Muñoz</dc:creator> <pubDate>Thu, 19 Nov 2015 16:55:15 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:5</guid> <description> <p> This is a combination of the previous solutions, probably somewhat faster as it incurs in fewer reindexings: </p> <pre class="wiki">bool succeeded = deps_map.modify(it, [=](node_info_t&amp; node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t&amp; index_by_name = deps_map.get&lt;by_name&gt;(); node_info.layer = CALCULATION_IN_PROGRESS; index_by_name.modify(it,[](node_info_t&amp;){}); int layer = 0; for (const std::string&amp; dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } layer = std::max(layer, dep_it-&gt;layer + 1); } node_info.layer = layer; } }); </pre> </description> <category>Ticket</category> </item> <item> <author>Valentyn Shtronda <valiko.ua@…></author> <pubDate>Thu, 19 Nov 2015 18:33:11 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:6</guid> <description> <p> I confirm, your suggestion works. Thanks! </p> <p> Please confirm that my understanding is correct: the problem occurs because during reindexing modify() assumes that only *it element is changed, so it does not perform full sort (for performance reasons). If I'm correct, I would mention this in the documentation of modify(). </p> <p> Consider adding reindex() method so that the code was straightforward: </p> <pre class="wiki">deps_map.reindex(); </pre><p> instead of </p> <pre class="wiki">index_by_name.modify(it,[](node_info_t&amp;){}); </pre><p> In documentation for this method you could describe in what case it must be called. </p> </description> <category>Ticket</category> </item> <item> <author>Valentyn Shtronda <valiko.ua@…></author> <pubDate>Thu, 19 Nov 2015 18:34:55 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:7</guid> <description> <p> Update: <code>it</code> must be passed to reindex(). Otherwise it would be full sort. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Joaquín M López Muñoz</dc:creator> <pubDate>Fri, 20 Nov 2015 07:21:12 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:8</guid> <description> <p> <em>Please confirm that my understanding is correct: the problem occurs because during reindexing modify() assumes that only *it element is changed, so it does not perform full sort (for performance reasons). If I'm correct, I would mention this in the documentation of modify().</em> </p> <p> Exactly, <code>modify(it,mod)</code> does only (possibly) relocate the element pointed to by <code>it</code>. In fact this is stated <a href="http://www.boost.org/libs/multi_index/doc/reference/ord_indices.html#modify">in the docs</a>: </p> <p> <strong><em>Effects:</em></strong><em> Calls <code>mod(e)</code> where <code>e</code> is the element pointed to by <code>position</code> and rearranges <code>*position</code> into all the indices of the <code>multi_index_container</code> [...]</em> </p> <p> As for how to document permissibly operations within <code>mod(e)</code>, the (unstated) assumption is that the container is not messed with in any manner, and I'm very reluctant to provide extra guarantees beyond this. A safe compromise would be to add something like this in the <strong>Requires</strong> clause: </p> <p> <em><code>mod(e)</code> does not invoke any operation of the <code>multi_index_container</code> except possibly only before <code>e</code> is directly modified, and only as long as <code>position</code> is not rendered invalid.</em> </p> <p> Does this sound sufficiently clear and unscary? By the way this makes the <code>reindex</code> trick legally invalid, but your code could be very easily rewritten to conform: </p> <pre class="wiki">bool succeeded = deps_map.modify(it, [=](node_info_t&amp; node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t&amp; index_by_name = deps_map.get&lt;by_name&gt;(); index_by_name.modify(it,[](node_info_t&amp; node_info_alias){ node_info_alias.layer = CALCULATION_IN_PROGRESS; }); int layer = 0; for (const std::string&amp; dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } layer = std::max(layer, dep_it-&gt;layer + 1); } node_info.layer = layer; } }); </pre><p> The key here is the qualifier <em>directly</em> in "before <code>e</code> is directly modified": you can change <code>e</code> indirectly through a recursive call to <code>modify</code>, as done in </p> <pre class="wiki">index_by_name.modify(it,[](node_info_t&amp; node_info_alias){ node_info_alias.layer = CALCULATION_IN_PROGRESS; }); </pre> </description> <category>Ticket</category> </item> <item> <author>Valentyn Shtronda <valiko.ua@…></author> <pubDate>Fri, 20 Nov 2015 12:29:16 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:9</guid> <description> <blockquote class="citation"> <p> <em><code>mod(e)</code> does not invoke any operation of the <code>multi_index_container</code> except possibly only before <code>e</code> is directly modified, and only as long as <code>position</code> is not rendered invalid.</em> </p> </blockquote> <p> Not very clear. I had to re-read it several times and think for some time to get it. It's very conservative and I understand you. But multi_index is more powerful than allowed by this statement (it's ok to call find on unaffected indices, it's ok to use trick with re-indexing), at least at the moment :) </p> <p> The problem was caused by modification of <strong>key fields</strong> of <strong>more than one element</strong> (and thus made their position invalid) before container got chance to re-position the latest one, and this may be done incorrectly. </p> <p> So I would re-state it somehow more straightforward. How about this draft: </p> <p> <em>Modification of key fields in <code>e</code> makes corresponding indices temporarily broken, until <code>modify()</code> returns. If you need to restore them sooner than <code>modify()</code> returns (for example, for recursive calls to <code>modify()</code>) wrap modification of key fields in its own call to <code>modify()</code>. </em></p> <p> And/or give a link to some example where you could provide more verbose explanation of the problem and solution. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Joaquín M López Muñoz</dc:creator> <pubDate>Mon, 23 Nov 2015 06:51:26 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:10 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:10</guid> <description> <p> <em>Modification of key fields in <code>e</code> makes corresponding indices temporarily broken, until <code>modify()</code> returns. If you need to restore them sooner than <code>modify()</code> returns (for example, for recursive calls to <code>modify()</code>) wrap modification of key fields in its own call to <code>modify()</code>.</em> </p> <p> Not good :-) for a number of reasons (legalese follows): </p> <ul><li>There is no notion of "key field" in the library. The only concept defined is <code>KeyExtractor</code> which is more general as it doesn't relate to any specific data member (it can be, say, calculated, or made dependent on info indirectly associated to the element). Of course there is an intuitive understanding of what data members contribute to index invariants, but this is nowhere to be found in the reference docs. </li><li>"If you need to restore [the indices] sooner[...]": why should I need to restore them? It is not up to the user to figure it out, but to the reference to state the conditions in which the need arises. </li></ul><p> Moreover, you can't really assume that <em>any</em> part of the container interface is usable once you modify the element. For instance, in <a href="http://www.boost.org/libs/multi_index/doc/tutorial/debug.html#invariant_check">invariant-checking mode</a> any non-const member function will assert if there's some broken index, no matter which index you're actually using. Even allowing <code>mod(e)</code> to use only const member functions after modification limits my implementation leeway in the future. </p> </description> <category>Ticket</category> </item> <item> <author>Valentyn Shtronda <valiko.ua@…></author> <pubDate>Wed, 25 Nov 2015 16:11:02 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:11 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:11</guid> <description> <p> <em>There is no notion of "key field" in the library. The only concept defined is <a class="missing wiki">KeyExtractor</a></em> </p> <p> Yes, you can use "<a class="missing wiki">KeyExtractor</a>" term instead of "key field", and re-formulate accordingly. It was only a draft of my vision about how to make the doc more obvious (and thus more helpful). </p> <p> <em>why should I need to restore them? Moreover, you can't really assume that any part of the container interface is usable once you modify the element.</em> </p> <p> Then instead of saying "for example", you can say "any part of the container interface". So, new draft can sound like this: </p> <p> <em>Modification of any field used by <code>KeyExtractor</code> temporarily breaks container integrity, until <code>modify()</code> returns. If you need to use any interface method before <code>modify()</code> returns, wrap modification of such fields in its own call to <code>modify()</code>.</em> </p> <p> Please note that I don't insist on modification the documentation in any way. You already helped me a lot and for me personally it's all clear now. Thanks again! </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Joaquín M López Muñoz</dc:creator> <pubDate>Thu, 26 Nov 2015 19:35:04 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11801#comment:12 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11801#comment:12</guid> <description> <p> <em>Please note that I don't insist on modification the documentation in any way. You already helped me a lot and for me personally it's all clear now.</em> </p> <p> Well, finally I decided to go with my stricter wording: </p> <p> <a class="ext-link" href="https://github.com/boostorg/multi_index/commit/c05551632089a917c79f97383d69e2fd8171df27"><span class="icon">​</span>https://github.com/boostorg/multi_index/commit/c05551632089a917c79f97383d69e2fd8171df27</a> </p> <p> Thank you for spotting this problem in the documentation, and good luck using Boost.MultiIndex. </p> </description> <category>Ticket</category> </item> </channel> </rss>