Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#11801 closed Bugs (invalid)

Recursive modify in multi_index sometimes does not update index correctly

Reported by: Valentyn Shtronda <valiko.ua@…> Owned by: Joaquín M López Muñoz
Milestone: To Be Determined Component: multi_index
Version: Boost 1.59.0 Severity: Problem
Keywords: multi_index multi_index_container recursive modify Cc:

Description

Please use attached simplified file to reproduce the problem. I used boost 1.59 with VS2015. Reproduced in all 4 configurations (debug/release, x86/x64). I'm not sure if it's a bug or limitation of multi_index, but commenting any line in fill_test_date() fixes the problem, so seems like a bug in some specific conditions. If it's multi_index limitation, please mention this in description of modify/modify_key.

The problem: multi_index container has two keys ordered_unique and ordered_non_unique. In calc_layer() I call modify() on first index and change the key of second index. During calculation of new key for second index, other recursive calls to modify() may be performed. All calls to modify() return true but in some cases second index is not correctly ordered in we see message "WRONG SORT ORDER BY LAYER".

Attachments (1)

DirectedGraph.cpp (5.7 KB ) - added by Valentyn Shtronda <valiko.ua@…> 7 years ago.

Download all attachments as: .zip

Change History (13)

by Valentyn Shtronda <valiko.ua@…>, 7 years ago

Attachment: DirectedGraph.cpp added

comment:1 by Joaquín M López Muñoz, 7 years ago

Resolution: invalid
Status: newclosed

modify can be used recursively if you're careful. Consider the following pseudocode:

c.modify(it,[&](element& x){
  x.data=...; // c invariant broken
  c.find(...); // undefined behavior
});

When x.data is modified, the invariants of c (i.e. the sortedness of their indices) are potentially broken (modify precisely restores them, i.e. reindexes, after the user-provided lambda returns), so it is not legal to call f.find in the following line. In your example, you break the invariant of deps_map at lines

83:    node_info.layer = 0;
88:    node_info.layer = CALCULATION_IN_PROGRESS;
94:    node_info.layer = UNINITIALIZED;
100:   node_info.layer = UNINITIALIZED;
103:   node_info.layer = std::max(node_info.layer, dep_it->layer + 1);

83, 94 and 100 aren't problematic as you return immediately after modification; 88 can be seemingly dispensed with since you don't really have circular references; actually, it is 103 that poses the problem, because after changing node_info.layer you continue using the container in line 91.

The solution is to defer element modification to the very end of the user-provided lambda:

bool succeeded = deps_map.modify(it,
    [=](node_info_t& node_info)
    {
        if (node_info.deps.size() == 0)
        {
            node_info.layer = 0;
        }
        else
        {
            index_by_name_t& index_by_name = deps_map.get<by_name>();
            // following statement not needed, in the asumption there are no circular refs
            // node_info.layer = CALCULATION_IN_PROGRESS;
            int layer=0;
            for (const std::string& dep_name : node_info.deps)
            {
                index_by_name_t::iterator dep_it = index_by_name.find(dep_name);
                if (dep_it == deps_map.end())
                {
                    node_info.layer = UNINITIALIZED;
                    printf("Unknown dependency.\n");
                    return;
                }
                if (!calc_layer(dep_it))
                {
                    node_info.layer = UNINITIALIZED;
                    return;
                }
                layer = std::max(layer, dep_it->layer + 1);
            }
            node_info.layer=layer;
        }
    });

I've tried this and it works. Closing the ticket as a non-bug, please re-open if you think otherwise.

Last edited 7 years ago by Joaquín M López Muñoz (previous) (diff)

comment:2 by Valentyn Shtronda <valiko.ua@…>, 7 years ago

Thank you for quick reply.

In my lambda I modify layer member and do not change name member. Doesn't this mean that first index remains sorted, is not broken, and I can use find on it even inside lambda recursively?

comment:3 by Valentyn Shtronda <valiko.ua@…>, 7 years ago

I need to keep circular dependency check because user-supplied data can contain them. I can implement this check other way but current way is so simple and elegant...

comment:4 by Joaquín M López Muñoz, 7 years ago

In my lambda I modify layer member and do not change name member. Doesn't this mean that first index remains sorted, is not broken, and I can use find on it even inside lambda recursively?

Unfortunately not: when modify reindexes, it does so on every index; so, if you broke the element pointed to by it and then, further down the code, you do a modify(dep_it,...) (within the recursive call to calc_layer), Boost.MultiIndex does not know that *it is misplaced and havoc ensues.

I need to keep circular dependency check because user-supplied data can contain them...

OK, there's another way to do this that lets you keep your CALCULATION_IN_PROGRESS feature:

bool succeeded = deps_map.modify(it,
    [=](node_info_t& node_info)
    {
        if (node_info.deps.size() == 0)
        {
            node_info.layer = 0;
        }
        else
        {
            index_by_name_t& index_by_name = deps_map.get<by_name>();
            node_info.layer = CALCULATION_IN_PROGRESS;
            index_by_name.modify(it,[](node_info_t&){});
            for (const std::string& dep_name : node_info.deps)
            {
                index_by_name_t::iterator dep_it = index_by_name.find(dep_name);
                if (dep_it == deps_map.end())
                {
                    node_info.layer = UNINITIALIZED;
                    printf("Unknown dependency.\n");
                    return;
                }
                if (!calc_layer(dep_it))
                {
                    node_info.layer = UNINITIALIZED;
                    return;
                }
                node_info.layer = std::max(node_info.layer, dep_it->layer + 1);
                index_by_name.modify(it,[](node_info_t&){});
            }
        }
    });

What we're doing here is to reindex immediately after any element modification, without waiting for the lib to do it on function exit.

node_info.layer = CALCULATION_IN_PROGRESS;
index_by_name.modify(it,[](node_info_t&){});
...
node_info.layer = std::max(node_info.layer, dep_it->layer + 1);
index_by_name.modify(it,[](node_info_t&){});

This is done by calling modify with a do-nothing function after we change the value pointed to by it (i.e. node_info). I've checked and it works, please confirm back.

comment:5 by Joaquín M López Muñoz, 7 years ago

This is a combination of the previous solutions, probably somewhat faster as it incurs in fewer reindexings:

bool succeeded = deps_map.modify(it,
    [=](node_info_t& node_info)
    {
        if (node_info.deps.size() == 0)
        {
            node_info.layer = 0;
        }
        else
        {
            index_by_name_t& index_by_name = deps_map.get<by_name>();
            node_info.layer = CALCULATION_IN_PROGRESS;
            index_by_name.modify(it,[](node_info_t&){});
            int layer = 0;
            for (const std::string& dep_name : node_info.deps)
            {
                index_by_name_t::iterator dep_it = index_by_name.find(dep_name);
                if (dep_it == deps_map.end())
                {
                    node_info.layer = UNINITIALIZED;
                    printf("Unknown dependency.\n");
                    return;
                }
                if (!calc_layer(dep_it))
                {
                    node_info.layer = UNINITIALIZED;
                    return;
                }
                layer = std::max(layer, dep_it->layer + 1);
            }
            node_info.layer = layer;
        }
    });

comment:6 by Valentyn Shtronda <valiko.ua@…>, 7 years ago

I confirm, your suggestion works. Thanks!

Please confirm that my understanding is correct: the problem occurs because during reindexing modify() assumes that only *it element is changed, so it does not perform full sort (for performance reasons). If I'm correct, I would mention this in the documentation of modify().

Consider adding reindex() method so that the code was straightforward:

deps_map.reindex();

instead of

index_by_name.modify(it,[](node_info_t&){});

In documentation for this method you could describe in what case it must be called.

comment:7 by Valentyn Shtronda <valiko.ua@…>, 7 years ago

Update: it must be passed to reindex(). Otherwise it would be full sort.

comment:8 by Joaquín M López Muñoz, 7 years ago

Please confirm that my understanding is correct: the problem occurs because during reindexing modify() assumes that only *it element is changed, so it does not perform full sort (for performance reasons). If I'm correct, I would mention this in the documentation of modify().

Exactly, modify(it,mod) does only (possibly) relocate the element pointed to by it. In fact this is stated in the docs:

Effects: Calls mod(e) where e is the element pointed to by position and rearranges *position into all the indices of the multi_index_container [...]

As for how to document permissibly operations within mod(e), the (unstated) assumption is that the container is not messed with in any manner, and I'm very reluctant to provide extra guarantees beyond this. A safe compromise would be to add something like this in the Requires clause:

mod(e) does not invoke any operation of the multi_index_container except possibly only before e is directly modified, and only as long as position is not rendered invalid.

Does this sound sufficiently clear and unscary? By the way this makes the reindex trick legally invalid, but your code could be very easily rewritten to conform:

bool succeeded = deps_map.modify(it,
    [=](node_info_t& node_info)
    {
        if (node_info.deps.size() == 0)
        {
            node_info.layer = 0;
        }
        else
        {
            index_by_name_t& index_by_name = deps_map.get<by_name>();
            index_by_name.modify(it,[](node_info_t& node_info_alias){
                node_info_alias.layer = CALCULATION_IN_PROGRESS;
            });
            int layer = 0;
            for (const std::string& dep_name : node_info.deps)
            {
                index_by_name_t::iterator dep_it = index_by_name.find(dep_name);
                if (dep_it == deps_map.end())
                {
                    node_info.layer = UNINITIALIZED;
                    printf("Unknown dependency.\n");
                    return;
                }
                if (!calc_layer(dep_it))
                {
                    node_info.layer = UNINITIALIZED;
                    return;
                }
                layer = std::max(layer, dep_it->layer + 1);
            }
            node_info.layer = layer;
        }
    });

The key here is the qualifier directly in "before e is directly modified": you can change e indirectly through a recursive call to modify, as done in

index_by_name.modify(it,[](node_info_t& node_info_alias){
    node_info_alias.layer = CALCULATION_IN_PROGRESS;
});

in reply to:  8 comment:9 by Valentyn Shtronda <valiko.ua@…>, 7 years ago

mod(e) does not invoke any operation of the multi_index_container except possibly only before e is directly modified, and only as long as position is not rendered invalid.

Not very clear. I had to re-read it several times and think for some time to get it. It's very conservative and I understand you. But multi_index is more powerful than allowed by this statement (it's ok to call find on unaffected indices, it's ok to use trick with re-indexing), at least at the moment :)

The problem was caused by modification of key fields of more than one element (and thus made their position invalid) before container got chance to re-position the latest one, and this may be done incorrectly.

So I would re-state it somehow more straightforward. How about this draft:

Modification of key fields in e makes corresponding indices temporarily broken, until modify() returns. If you need to restore them sooner than modify() returns (for example, for recursive calls to modify()) wrap modification of key fields in its own call to modify().

And/or give a link to some example where you could provide more verbose explanation of the problem and solution.

comment:10 by Joaquín M López Muñoz, 7 years ago

Modification of key fields in e makes corresponding indices temporarily broken, until modify() returns. If you need to restore them sooner than modify() returns (for example, for recursive calls to modify()) wrap modification of key fields in its own call to modify().

Not good :-) for a number of reasons (legalese follows):

  • There is no notion of "key field" in the library. The only concept defined is KeyExtractor which is more general as it doesn't relate to any specific data member (it can be, say, calculated, or made dependent on info indirectly associated to the element). Of course there is an intuitive understanding of what data members contribute to index invariants, but this is nowhere to be found in the reference docs.
  • "If you need to restore [the indices] sooner[...]": why should I need to restore them? It is not up to the user to figure it out, but to the reference to state the conditions in which the need arises.

Moreover, you can't really assume that any part of the container interface is usable once you modify the element. For instance, in invariant-checking mode any non-const member function will assert if there's some broken index, no matter which index you're actually using. Even allowing mod(e) to use only const member functions after modification limits my implementation leeway in the future.

comment:11 by Valentyn Shtronda <valiko.ua@…>, 7 years ago

There is no notion of "key field" in the library. The only concept defined is KeyExtractor

Yes, you can use "KeyExtractor" term instead of "key field", and re-formulate accordingly. It was only a draft of my vision about how to make the doc more obvious (and thus more helpful).

why should I need to restore them? Moreover, you can't really assume that any part of the container interface is usable once you modify the element.

Then instead of saying "for example", you can say "any part of the container interface". So, new draft can sound like this:

Modification of any field used by KeyExtractor temporarily breaks container integrity, until modify() returns. If you need to use any interface method before modify() returns, wrap modification of such fields in its own call to modify().

Please note that I don't insist on modification the documentation in any way. You already helped me a lot and for me personally it's all clear now. Thanks again!

comment:12 by Joaquín M López Muñoz, 7 years ago

Please note that I don't insist on modification the documentation in any way. You already helped me a lot and for me personally it's all clear now.

Well, finally I decided to go with my stricter wording:

https://github.com/boostorg/multi_index/commit/c05551632089a917c79f97383d69e2fd8171df27

Thank you for spotting this problem in the documentation, and good luck using Boost.MultiIndex.

Note: See TracTickets for help on using tickets.