#11801 closed Bugs (invalid)
Recursive modify in multi_index sometimes does not update index correctly
Reported by: | Owned by: | Joaquín M López Muñoz | |
---|---|---|---|
Milestone: | To Be Determined | Component: | multi_index |
Version: | Boost 1.59.0 | Severity: | Problem |
Keywords: | multi_index multi_index_container recursive modify | Cc: |
Description
Please use attached simplified file to reproduce the problem. I used boost 1.59 with VS2015. Reproduced in all 4 configurations (debug/release, x86/x64). I'm not sure if it's a bug or limitation of multi_index, but commenting any line in fill_test_date() fixes the problem, so seems like a bug in some specific conditions. If it's multi_index limitation, please mention this in description of modify/modify_key.
The problem: multi_index container has two keys ordered_unique and ordered_non_unique. In calc_layer() I call modify() on first index and change the key of second index. During calculation of new key for second index, other recursive calls to modify() may be performed. All calls to modify() return true but in some cases second index is not correctly ordered in we see message "WRONG SORT ORDER BY LAYER".
Attachments (1)
Change History (13)
by , 7 years ago
Attachment: | DirectedGraph.cpp added |
---|
comment:1 by , 7 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 7 years ago
Thank you for quick reply.
In my lambda I modify layer member and do not change name member. Doesn't this mean that first index remains sorted, is not broken, and I can use find on it even inside lambda recursively?
comment:3 by , 7 years ago
I need to keep circular dependency check because user-supplied data can contain them. I can implement this check other way but current way is so simple and elegant...
comment:4 by , 7 years ago
In my lambda I modify layer member and do not change name member. Doesn't this mean that first index remains sorted, is not broken, and I can use find on it even inside lambda recursively?
Unfortunately not: when modify
reindexes, it does so on every index; so, if you broke the element pointed to by it
and then, further down the code, you do a modify(dep_it,...)
(within the recursive call to calc_layer
), Boost.MultiIndex does not know that *it
is misplaced and havoc ensues.
I need to keep circular dependency check because user-supplied data can contain them...
OK, there's another way to do this that lets you keep your CALCULATION_IN_PROGRESS
feature:
bool succeeded = deps_map.modify(it, [=](node_info_t& node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t& index_by_name = deps_map.get<by_name>(); node_info.layer = CALCULATION_IN_PROGRESS; index_by_name.modify(it,[](node_info_t&){}); for (const std::string& dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } node_info.layer = std::max(node_info.layer, dep_it->layer + 1); index_by_name.modify(it,[](node_info_t&){}); } } });
What we're doing here is to reindex immediately after any element modification, without waiting for the lib to do it on function exit.
node_info.layer = CALCULATION_IN_PROGRESS; index_by_name.modify(it,[](node_info_t&){}); ... node_info.layer = std::max(node_info.layer, dep_it->layer + 1); index_by_name.modify(it,[](node_info_t&){});
This is done by calling modify
with a do-nothing function after we change the value pointed to by it
(i.e. node_info
). I've checked and it works, please confirm back.
comment:5 by , 7 years ago
This is a combination of the previous solutions, probably somewhat faster as it incurs in fewer reindexings:
bool succeeded = deps_map.modify(it, [=](node_info_t& node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t& index_by_name = deps_map.get<by_name>(); node_info.layer = CALCULATION_IN_PROGRESS; index_by_name.modify(it,[](node_info_t&){}); int layer = 0; for (const std::string& dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } layer = std::max(layer, dep_it->layer + 1); } node_info.layer = layer; } });
comment:6 by , 7 years ago
I confirm, your suggestion works. Thanks!
Please confirm that my understanding is correct: the problem occurs because during reindexing modify() assumes that only *it element is changed, so it does not perform full sort (for performance reasons). If I'm correct, I would mention this in the documentation of modify().
Consider adding reindex() method so that the code was straightforward:
deps_map.reindex();
instead of
index_by_name.modify(it,[](node_info_t&){});
In documentation for this method you could describe in what case it must be called.
follow-up: 9 comment:8 by , 7 years ago
Please confirm that my understanding is correct: the problem occurs because during reindexing modify() assumes that only *it element is changed, so it does not perform full sort (for performance reasons). If I'm correct, I would mention this in the documentation of modify().
Exactly, modify(it,mod)
does only (possibly) relocate the element pointed to by it
. In fact this is stated in the docs:
Effects: Calls mod(e)
where e
is the element pointed to by position
and rearranges *position
into all the indices of the multi_index_container
[...]
As for how to document permissibly operations within mod(e)
, the (unstated) assumption is that the container is not messed with in any manner, and I'm very reluctant to provide extra guarantees beyond this. A safe compromise would be to add something like this in the Requires clause:
mod(e)
does not invoke any operation of the multi_index_container
except possibly only before e
is directly modified, and only as long as position
is not rendered invalid.
Does this sound sufficiently clear and unscary? By the way this makes the reindex
trick legally invalid, but your code could be very easily rewritten to conform:
bool succeeded = deps_map.modify(it, [=](node_info_t& node_info) { if (node_info.deps.size() == 0) { node_info.layer = 0; } else { index_by_name_t& index_by_name = deps_map.get<by_name>(); index_by_name.modify(it,[](node_info_t& node_info_alias){ node_info_alias.layer = CALCULATION_IN_PROGRESS; }); int layer = 0; for (const std::string& dep_name : node_info.deps) { index_by_name_t::iterator dep_it = index_by_name.find(dep_name); if (dep_it == deps_map.end()) { node_info.layer = UNINITIALIZED; printf("Unknown dependency.\n"); return; } if (!calc_layer(dep_it)) { node_info.layer = UNINITIALIZED; return; } layer = std::max(layer, dep_it->layer + 1); } node_info.layer = layer; } });
The key here is the qualifier directly in "before e
is directly modified": you can change e
indirectly through a recursive call to modify
, as done in
index_by_name.modify(it,[](node_info_t& node_info_alias){ node_info_alias.layer = CALCULATION_IN_PROGRESS; });
comment:9 by , 7 years ago
mod(e)
does not invoke any operation of themulti_index_container
except possibly only beforee
is directly modified, and only as long asposition
is not rendered invalid.
Not very clear. I had to re-read it several times and think for some time to get it. It's very conservative and I understand you. But multi_index is more powerful than allowed by this statement (it's ok to call find on unaffected indices, it's ok to use trick with re-indexing), at least at the moment :)
The problem was caused by modification of key fields of more than one element (and thus made their position invalid) before container got chance to re-position the latest one, and this may be done incorrectly.
So I would re-state it somehow more straightforward. How about this draft:
Modification of key fields in e
makes corresponding indices temporarily broken, until modify()
returns. If you need to restore them sooner than modify()
returns (for example, for recursive calls to modify()
) wrap modification of key fields in its own call to modify()
.
And/or give a link to some example where you could provide more verbose explanation of the problem and solution.
comment:10 by , 7 years ago
Modification of key fields in e
makes corresponding indices temporarily broken, until modify()
returns. If you need to restore them sooner than modify()
returns (for example, for recursive calls to modify()
) wrap modification of key fields in its own call to modify()
.
Not good :-) for a number of reasons (legalese follows):
- There is no notion of "key field" in the library. The only concept defined is
KeyExtractor
which is more general as it doesn't relate to any specific data member (it can be, say, calculated, or made dependent on info indirectly associated to the element). Of course there is an intuitive understanding of what data members contribute to index invariants, but this is nowhere to be found in the reference docs. - "If you need to restore [the indices] sooner[...]": why should I need to restore them? It is not up to the user to figure it out, but to the reference to state the conditions in which the need arises.
Moreover, you can't really assume that any part of the container interface is usable once you modify the element. For instance, in invariant-checking mode any non-const member function will assert if there's some broken index, no matter which index you're actually using. Even allowing mod(e)
to use only const member functions after modification limits my implementation leeway in the future.
comment:11 by , 7 years ago
There is no notion of "key field" in the library. The only concept defined is KeyExtractor
Yes, you can use "KeyExtractor" term instead of "key field", and re-formulate accordingly. It was only a draft of my vision about how to make the doc more obvious (and thus more helpful).
why should I need to restore them? Moreover, you can't really assume that any part of the container interface is usable once you modify the element.
Then instead of saying "for example", you can say "any part of the container interface". So, new draft can sound like this:
Modification of any field used by KeyExtractor
temporarily breaks container integrity, until modify()
returns. If you need to use any interface method before modify()
returns, wrap modification of such fields in its own call to modify()
.
Please note that I don't insist on modification the documentation in any way. You already helped me a lot and for me personally it's all clear now. Thanks again!
comment:12 by , 7 years ago
Please note that I don't insist on modification the documentation in any way. You already helped me a lot and for me personally it's all clear now.
Well, finally I decided to go with my stricter wording:
https://github.com/boostorg/multi_index/commit/c05551632089a917c79f97383d69e2fd8171df27
Thank you for spotting this problem in the documentation, and good luck using Boost.MultiIndex.
modify
can be used recursively if you're careful. Consider the following pseudocode:When
x.data
is modified, the invariants ofc
(i.e. the sortedness of their indices) are potentially broken (modify
precisely restores them, i.e. reindexes, after the user-provided lambda returns), so it is not legal to callf.find
in the following line. In your example, you break the invariant ofdeps_map
at lines83, 94 and 100 aren't problematic as you return immediately after modification; 88 can be seemingly dispensed with since you don't really have circular references; actually, it is 103 that poses the problem, because after changing
node_info.layer
you continue using the container in line 91.The solution is to defer element modification to the very end of the user-provided lambda:
I've tried this and it works. Closing the ticket as a non-bug, please re-open if you think otherwise.