Opened 6 years ago

Last modified 5 years ago

#12859 new Feature Requests

std::string_view as token

Reported by: Johel Ernesto Guerrero Peña <johelegp@…>
Owned by: jsiek
Milestone: To Be Determined
Component: tokenizer
Version: Boost 1.63.0
Severity: Optimization
Keywords: string_view
Cc:

Description

When I use std::string_view as a token for the tokenizer, I get an error because it attempts to construct a std::string_view with two iterators, which it doesn't support.

It'd be very nice to support this efficiency-enhancing type by, perhaps, constructing the token from the underlying character array and a size when construction from two iterators would be ill-formed.

Change History (5)

comment:1 by damian.meden@…, 6 years ago

Hi, could you provide more information on what you are trying to achieve? A code sample would be great (TokenizerFunc, etc.). There are a few things you can do with a string_view, but I do not want to give an answer before seeing what you are trying to do.

Thanks, Dam.

comment:2 by Johel Ernesto Guerrero Peña <johelegp@…>, 6 years ago

This is my particular use case:

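#include <boost/tokenizer.hpp>
#include <string_view>

auto tokenize(std::string_view to_tokenize, const char* separators)
{
    // Separator-based tokenizer function over char.
    using TokenizerFunction = boost::char_separator<char>;
    // The token type is std::string_view; this is the combination that fails,
    // because the token gets constructed from two iterators.
    using Tokenizer         = boost::tokenizer<
        TokenizerFunction, std::string_view::iterator, std::string_view>;

    return Tokenizer{to_tokenize, TokenizerFunction{separators}};
}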

I want the Tokenizer to recognize that the token type is not constructible from two iterators, and try to construct it from a {pointer,size} pair instead.
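To make the {pointer,size} idea concrete, here is a minimal standard-library-only sketch. It is not Boost code and not a proposed patch: view_between and split_views are hypothetical names introduced for illustration, and the splitting loop only assumes behavior similar to a separator-based tokenizer that drops empty tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Boost): builds a view over [first, last),
// where both iterators point into `src`, from a {pointer, size} pair.
// It never dereferences `last`, so an end iterator is safe to pass.
inline std::string_view view_between(std::string_view src,
                                     std::string_view::const_iterator first,
                                     std::string_view::const_iterator last)
{
    assert(first <= last);
    const auto offset = static_cast<std::size_t>(first - src.begin());
    const auto count  = static_cast<std::size_t>(last - first);
    return std::string_view{src.data() + offset, count};
}

// Hypothetical stand-in for the requested behavior: splits `text` on any
// character in `separators`, drops empty tokens, and returns non-owning views.
inline std::vector<std::string_view> split_views(std::string_view text,
                                                 std::string_view separators)
{
    const auto is_separator = [separators](char c) {
        return separators.find(c) != std::string_view::npos;
    };

    std::vector<std::string_view> tokens;
    auto first = text.begin();
    while (first != text.end()) {
        // Skip any run of separators before the next token.
        while (first != text.end() && is_separator(*first)) ++first;
        // Advance to the end of the token.
        auto last = first;
        while (last != text.end() && !is_separator(*last)) ++last;
        if (first != last) tokens.push_back(view_between(text, first, last));
        first = last;
    }
    return tokens;
}
```

The returned views point into the buffer behind `text`, so that buffer has to outlive the tokens.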

comment:3 by Dam <damian.meden@…>, 5 years ago

A few things here.

Based on what the tokenizer is, basically a non-owning helper that relies on someone else owning the std::string, with a pair of iterators as the only connection between the owner and the tokenizer class, this seems a little ambiguous to me: it adds one layer between the std::string and the tokenizer. Instead of std::string -> boost::tokenizer, it becomes std::string -> std::string_view -> boost::tokenizer. See my point? This could be a good discussion, and I would like to know your thoughts about it. Also, off the top of my head, std::string_view is still an experimental feature in most compilers, so that may be an issue to deal with at the Boost level. I will check that in depth.

Anyway, I played with this a little and came up with a proof of concept (POC). It can’t be added to Boost as is because it hasn’t been properly tested, but it helped me see what this might look like. You can check the diff here: https://github.com/boostorg/tokenizer/compare/develop...dmeden:not_safe_string_view_use_v1

Tested with GCC 6.3.

Again, this code is not Boost-ready yet; there are many things (iterators, etc.) that need to be tested and validated and that I haven’t covered in the code yet.

Thanks, Dam

comment:4 by Johel Ernesto Guerrero Peña <johelegp@…>, 5 years ago

I'm starting to think that it's better to leave the code as-is. This problem might be better solved elsewhere, for example with a std::string_view wrapper constructible from a pair of iterators. Does this answer your RFC?
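A wrapper along those lines might look like the sketch below. The name iter_string_view is hypothetical, and whether this interface is sufficient for boost::tokenizer to use it as a token type has not been verified here; it only shows the iterator-pair constructor forwarding to the {pointer,size} constructor of std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical wrapper (not part of Boost or the standard library): a
// std::string_view that can also be constructed from a pair of iterators.
class iter_string_view : public std::string_view {
public:
    iter_string_view() = default;

    // Allow implicit conversion from a plain std::string_view.
    iter_string_view(std::string_view sv) : std::string_view(sv) {}

    // The constructor std::string_view lacks in C++17: [first, last) must be
    // a valid range into one contiguous character sequence.
    iter_string_view(const_iterator first, const_iterator last)
        : std::string_view(to_view(first, last)) {}

private:
    static std::string_view to_view(const_iterator first, const_iterator last)
    {
        assert(first <= last);
        // Avoid dereferencing `first` for an empty range, where it may be
        // an end iterator; an empty range yields a default (empty) view.
        if (first == last) return {};
        return {&*first, static_cast<std::size_t>(last - first)};
    }
};
```

One consequence of this design is that an empty range produces a view whose data() is null, so the position of an empty token inside the source is lost.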

comment:5 by Dam <damian.meden@…>, 5 years ago

There was a discussion about that (having a constructor that takes a pair of iterators) in the standard:

Sure, you can wrap it to cover your needs. You can have a look at that thread and follow some of the notes they've raised.

Thanks,

Damian.
