Opened 10 years ago

Last modified 6 years ago

#7630 assigned Bugs

Range adaptors do not play nicely with range-based for loops

Reported by: gromer@… Owned by: Neil Groves
Milestone: To Be Determined Component: range
Version: Boost 1.51.0 Severity: Problem
Keywords: range Cc: jyasskin@…, roman.perepelitsa@…, mikhailberis@…

Description

Consider the following C++11 code, adapted from the Range documentation:

std::vector<int> vec;
for (int val : vec | boost::adaptors::reversed
                   | boost::adaptors::uniqued) {
  // Do stuff with val
}

The behavior of this natural-seeming code is actually undefined, due to a dangling reference: per the C++ standard, it is equivalent to

{
  auto && __range = (vec | boost::adaptors::reversed
                         | boost::adaptors::uniqued);
  for ( auto __begin = __range.begin(),
             __end = __range.end();
        __begin != __end;
        ++__begin ) {
    int val = *__begin;
    // Do stuff with val
  }
}

The problem is that the value returned by the subexpression vec | boost::adaptors::reversed is a temporary, so its lifetime ends at the end of the statement containing it, namely the declaration of __range. Thus, __range is left holding a dangling reference to the range it's adapting.

The fix is for each range adaptor to use an rvalue-reference overload to detect whether the input range is a temporary, and if so, move it into the adaptor (i.e. with std::move) in order to extend its lifetime.

Attachments (1)

boost-range.patch (24.8 KB ) - added by Jeffrey Yasskin <jyasskin@…> 9 years ago.
Patch demonstrating the fix on boost::adaptors::filtered

Download all attachments as: .zip

Change History (20)

comment:1 by Jeffrey Yasskin <jyasskin@…>, 10 years ago

Cc: jyasskin@… added

comment:2 by roman.perepelitsa@…, 10 years ago

There is one problem with the proposed solution: return types for all range adaptors are documented. For example, both vec | reversed and and vector<int>() | reversed are specified to have type reversed_range<vector<int>>, although in the first case the adaptor should store just a reference to the vector and in the second case it should have the vector stored by value. This can be done by storing variant<vector<int>*, vector<int>> in reversed_range<vector<int>>, but it has space and performance overhead.

If return types of range adaptors weren't specified, the problem would not exist because vec | reversed and and vector<int>() | reversed could have different types, each implemented with zero overhead. This is a cleaner design, because specific types of the adaptors aren't of much use and can always be deduced with decltype, auto or even erased. Unfortunately, this would be a breaking change.

comment:3 by roman.perepelitsa@…, 10 years ago

Cc: roman.perepelitsa@… added
Keywords: range added

comment:4 by Dean Michael Berris, 9 years ago

Cc: mikhailberis@… added

comment:5 by gromer@…, 9 years ago

One point of clarification, since it confused even me when I reread this: operator | is left-associative, so the adaptor expression is equivalent to ((vec | boost::adaptors::reversed) | boost::adaptors::uniqued). The temporary returned by the whole expression gets bound to __range, and therefore has its lifetime extended to match __range, but the temporary returned by the subexpression vec | boost::adaptors::reversed is not bound to a reference in the scope of the loop, so its lifetime is not extended.

in reply to:  2 comment:6 by Jeffrey Yasskin <jyasskin@…>, 9 years ago

Replying to roman.perepelitsa@…:

There is one problem with the proposed solution: return types for all range adaptors are documented.

Even though the adaptor return types are documented, they're not accurate. For example, http://www.boost.org/doc/libs/1_53_0/libs/range/doc/html/range/reference/adaptors/reference/filtered.html says it returns a "boost::filtered_range<typeof(rng)>" (what is "typeof"? :-P), but it actually returns filtered_range<Pred, Rng>. adjacent_filtered, filtered, copied, replaced_if, and transformed list the wrong return types. indexed, indirected, map_keys, map_values, replaced, reversed, sliced, type_erased, and uniqued list the right ones. strided and tokenized don't list a return type.

The natural return type for the fixed adapters is to have, say, lvalue|reversed return a reversed_range<Rng&> and rvalue|reversed return a reversed_range<Rng>, since that's what the natural function template deduction will produce. I could instead have lvalue|reversed return reversed_range<Rng> and rvalue|reversed return reversed_range<Rng&&> in order to maintain compatibility for the safe existing uses of these type names.

I would prefer to use the natural types because then any utilities will be useful for non-Boost consumers.

Thoughts?

by Jeffrey Yasskin <jyasskin@…>, 9 years ago

Attachment: boost-range.patch added

Patch demonstrating the fix on boost::adaptors::filtered

comment:7 by Jeffrey Yasskin <jyasskin@…>, 9 years ago

Note that the attached patch is only the fix for the filtered adaptor. I expect to make any changes you want there, and then extend it to the rest of the adaptors after the design is right.

comment:8 by pbazant@…, 9 years ago

Is the original problem really a problem? The following links suggests otherwise:

http://stackoverflow.com/questions/9657708/c11-the-range-based-for-statement-range-init-lifetime

comment:9 by anonymous, 9 years ago

Yes, this really is a problem. See comment 5 above: the issue isn't the lifetime of the value of ((vec | boost::adaptors::reversed) | boost::adaptors::uniqued), it's the lifetime of the subexpression (vec | boost::adaptors::reversed), which is not bound to a local reference, and so does not have its lifetime extended.

comment:10 by Igor Lubashev <ilubashe@…>, 9 years ago

See #9578 for a related issue, related to BOOST_FOREACH and adaptors.

I view comment:2 regarding the documented types as a very minor issue. Since the application of adaptors to r-values has previously resulted in undefined behavior, changing the data type returned for the application of adaptors to r-values should not break almost any properly working code that is relying on this documentation.

The only kind of working code that could rely on this would be something like foo(map<int,int>() | map_keys). In such cases, I would expect that foo is actually declared as a template that automatically captures the range type, so no harm done.

comment:11 by Nathan Ridge, 9 years ago

I reviewed this patch over email (shortly after it was posted) and made some (minor) suggestions. I'm posting them here now so that if someone would like to pick up the work on this bug, they are able to:

  • In the constructor of the primary template of extend_temporary, we know that the constructor argument will be an rvalue, right? So, would it make sense to use std::move instead of std::forward?

  • Given that the primary template of extend_temporary will never be used in C++98 mode (i.e., with no rvalue references), why even define it?

  • Regarding the change to range_iterator to have it accept a reference to a range as an input type:

I think has_range_iterator should be updated correspondingly (notice how it uses range_mutable_iterator and range_const_iterator directly right now).

While most range metafunctions go through range_iterator, and will therefore also start working on references to ranges, there is a partial specialization of range_size in boost/range/size_type.hpp that will break if its argument is a reference to a range. I think that should be updated too for consistency.

Since these changes can stand on their own, I think they make sense as an independent patch. In fact, such a patch would resolve https://svn.boost.org/trac/boost/ticket/7885 .

  • Regarding the choice of range adaptor return type: I understand that

adaptor<Range&> for lvalue ranges and

adaptor<Range> for rvalue ranges

is what falls naturally out of template argument deduction. However, would something that makes the lvalue/rvalueness of the wrapped ranges more explicit be more readable? For example,

adaptor<Range&> for lvalue ranges and

adaptor<Range&&> for rvalue ranges

or

adaptor<Range, lvalue_tag> for lvalue ranges and

adaptor<Range, rvalue_tag> for rvalue ranges?

comment:12 by Neil Groves, 9 years ago

Status: newassigned

comment:13 by Neil Groves, 9 years ago

I don't think the provided example does suffer undefined behaviour. It doesn't matter that the subexpression lifetime is short since the subexpression was used as input to the outer expression. The outer expression holds iterators by value by copying them. It doesn't hold onto the source range as stated in the report.

If one were to have a slightly different example with, for example, a vector returned by value from a function and then directly passed into the adaptors this would be undefined. I'm not convinced that this use case is worth the general penalty.

in reply to:  13 comment:14 by Nathan Ridge, 9 years ago

Replying to neilgroves:

I don't think the provided example does suffer undefined behaviour. It doesn't matter that the subexpression lifetime is short since the subexpression was used as input to the outer expression. The outer expression holds iterators by value by copying them. It doesn't hold onto the source range as stated in the report.

If one were to have a slightly different example with, for example, a vector returned by value from a function and then directly passed into the adaptors this would be undefined.

Correct.

I'm not convinced that this use case is worth the general penalty.

I have run into this situation, and been bitten by the undefined behaviour, on a number of occasions. Move constructors are generally cheap, so I believe the general penalty for fixing this would be low.

comment:15 by Neil Groves, 9 years ago

Okay let me think about this some more. It is an interesting and challenging problem. For many use cases I have the move penalty is completely unacceptable due to cache effects. If I can be sure I can create a zero overhead solution I'll do it. Obviously if anyone else beats me to an answer I'll happily accept it. My interpretation of the current attached example is that it incurs a space and speed overhead even when not using rvalue references.

comment:16 by Neil Groves, 9 years ago

I think I have a solution. I'm starting prototypes and will attempt to apply this to some large codebases to ensure that backward compatibility is maximised.

I believe we can alter the return types to have an additional template paramter or N template parameters (to be decided) that describe some additional range category decoration. These will be used to determine if the range should be held / moved. By using a default template paramter(s) for these new parameters we can maintain old behaviour for old code while allowing new code to work in the manner requested. I believe that this fits nicely with other new features that I intend to release.

comment:17 by Dean Michael Berris, 8 years ago

Is the fix ready? Can we test it somewhere? Or at least see what the test is so we can confirm whether it's actually fixed?

comment:18 by rob.desbois@…, 6 years ago

What's the current status of this? I have scoured Boost Trac and mailing lists recently and still feel uncertain whether the described pattern is generally safe. My typical use case is slightly different, though suffers from the same potentially-undefined behaviour: functions which return a range produced by piping a container through some adaptor chain.

Looking through the code for some of these adaptors my best understanding is that it is safe based on Neil's comment:13 regarding copying of underlying iterators. I don't feel comfortable with that though: what if the implementation changes?.

Perhaps while there is no fix (if there still isn't) could a summary of the issue and some advice be given in the Boost.Range docs? Even if it simply says "Here's the possible issue, but the implementation all of Boost's range adaptors means this is never UB" that would suffice: this is too concerning an issue regarding a common usage pattern to have no official recommendation for IMHO.

in reply to:  18 comment:19 by Nathan Ridge, 6 years ago

Replying to rob.desbois@…:

What's the current status of this?

My impression is that the C++ community's thinking about Ranges in C++ has evolved a lot in the past couple of years. I think the state-of-the-art Ranges library is no longer Boost.Range, but Eric Niebler's Range-v3, which is now standards-track (forming the basis of the C++ Ranges TS).

Ranges-v3 does things a bit differently from Boost.Range, and solves this issue (among others).

I'm not sure what the plan is for Boost.Range going forward. We could possibly borrow some ideas from Range-v3, such as its approach for solving this problem, and put them into Boost.Range. Or Range-v3 itself could be brought into Boost (not sure what Eric's plans are in that regard).

Note: See TracTickets for help on using tickets.