Opened 12 years ago

Last modified 5 years ago

#4704 assigned Feature Requests

Support for multicapture and balancing groups

Reported by: erik@… Owned by: Eric Niebler
Milestone: To Be Determined Component: xpressive
Version: Boost 1.44.0 Severity: Not Applicable
Keywords: xpressive Cc:

Description

Feature request ticked to jog Eric Nieblers memory to take a look at some code.

I've added support for multicapture and balancing groups to Boost::Xpressive.

Syntax for pop capture:

dynamic: (?P<-name>stuff)
static: (name -= stuff)

Syntax for capture conditional:

dynamic: (?P(name)stuff)
static: (name &= stuff)

The changes are in the vault and can be found here: http://tinyurl.com/3aak7mp

It can be unpacked against trunk from 2010-10-02 or the 1.44.0 release. I've run the dynamic regression tests without errors and I have added some tests for the new functionality. The code it only tested on Visual Studio 2010 since I don't have access to any other compiler.

Erik Rydgren

Change History (8)

comment:1 by Eric Niebler, 12 years ago

Status: newassigned

comment:2 by Eric Niebler, 12 years ago

The capture conditional:

dynamic: (?P(name)stuff)

Looking at the implementation of this in parse_group, "stuff" is parsed by calling parse_sequence. Is it really the case that stuff can not have alternates? Can I not say "(?P(name)this|that)"? If so, this would be the only grouping construct that does not permit alternates. I have a bad feeling about that. Note that your implementation of that feature ends up returning early---the only one to do so---and doesn't reset the traits flags. In short, this feels like a great big ugly hack to me. Can you clean this up?

Also, would be very nice if you followed my coding convention. Eg:

   if(this)
   {
       // 4 space indent
   }

not this:

   if (this) {
     // 2 space indent
   }

Finally, all these new features need documentation.

Still reviewing your code, more comments to come I'm sure. Don't be discouraged, I still am very happy to have your contribution and have every intention of accepting it, once the kinks are worked out.

comment:3 by Eric Niebler, 12 years ago

In parser_traits.hpp:

            case BOOST_XPR_CHAR_(char_type, 'P'):
                this->eat_ws_(++begin, end);
                BOOST_XPR_ENSURE_(begin != end, error_paren, "incomplete extension");
                switch(*begin)
                {
                case BOOST_XPR_CHAR_(char_type, '<'):
                    this->eat_ws_(++begin, end);
                    switch (*begin)
                    ^^^^^^^^^^^^^^^ 
Whoops! Doesn't check that begin != end before dereferencing begin!

comment:4 by Eric Niebler, 12 years ago

There's probably a simple answer, but why did you change the handling of token_named_mark to allow the same capture name to be reused? Previously it was an error.

comment:5 by Eric Niebler, 12 years ago

Oh wait! I just realized from looking at your change to mark_end_matcher that you are always populating the captures member of the sub_match struct (for non-hidden captures). This will have disastrous performance implications. There cannot potentially be an allocation every time you match a marked sub-expression. I can't accept this change.

I suggest you find a way to make this behavior opt-in.

in reply to:  4 comment:6 by Erik Rydgren <erik@…>, 12 years ago

Replying to eric_niebler:

There's probably a simple answer, but why did you change the handling of token_named_mark to allow the same capture name to be reused? Previously it was an error.

I thought it to be nice to have the same capabilities in a dynamic expression as in static one. A static expression (s1 = whatever) << (s1 = whatever) is perfectly legal but the dynamic compiler frowned upon (?P<name> whatever) (?P<name> whatever). Now they behave the same. It makes it hard to figure out the resulting group numbers though, so these constructs are really only useful when using named multi captures. You can capture on several places in the expression to the same named stack of captures. Can be useful at times.

in reply to:  5 comment:7 by Erik Rydgren <erik@…>, 12 years ago

Replying to eric_niebler:

Oh wait! I just realized from looking at your change to mark_end_matcher that you are always populating the captures member of the sub_match struct (for non-hidden captures). This will have disastrous performance implications. There cannot potentially be an allocation every time you match a marked sub-expression. I can't accept this change.

I suggest you find a way to make this behavior opt-in.

I have now uploaded an updated version of the multicapture functionality. Multicapture is now opt-in with very little impact on performance if not used.

Some documentation about multicapture and the new syntax constructs is added and I've also gone over the changes to make sure it follows the same coding standard as the rest of the code.

I also changed the mark conditional construct to accept alternates, (?P(name)this|that|andwhatnot).

The new syntax for static pop capture required some extra code in the grammar, is_pure and width_of. I would be grateful for some extra reviewing in those areas since I am uncertain that I fully grasp all the details about what's going on there.

The new code can be reached through the same URL as the old version.

comment:8 by matt.c, 5 years ago

Did this ever get integrated?

Note: See TracTickets for help on using tickets.