Opened 12 years ago
Last modified 5 years ago
#4704 assigned Feature Requests
Support for multicapture and balancing groups
Reported by: | Owned by: | Eric Niebler | |
---|---|---|---|
Milestone: | To Be Determined | Component: | xpressive |
Version: | Boost 1.44.0 | Severity: | Not Applicable |
Keywords: | xpressive | Cc: |
Description
Feature request ticked to jog Eric Nieblers memory to take a look at some code.
I've added support for multicapture and balancing groups to Boost::Xpressive.
Syntax for pop capture:
dynamic: (?P<-name>stuff)
static: (name -= stuff)
Syntax for capture conditional:
dynamic: (?P(name)stuff)
static: (name &= stuff)
The changes are in the vault and can be found here: http://tinyurl.com/3aak7mp
It can be unpacked against trunk from 2010-10-02 or the 1.44.0 release. I've run the dynamic regression tests without errors and I have added some tests for the new functionality. The code it only tested on Visual Studio 2010 since I don't have access to any other compiler.
Erik Rydgren
Change History (8)
comment:1 by , 12 years ago
Status: | new → assigned |
---|
comment:2 by , 12 years ago
comment:3 by , 12 years ago
In parser_traits.hpp:
case BOOST_XPR_CHAR_(char_type, 'P'): this->eat_ws_(++begin, end); BOOST_XPR_ENSURE_(begin != end, error_paren, "incomplete extension"); switch(*begin) { case BOOST_XPR_CHAR_(char_type, '<'): this->eat_ws_(++begin, end); switch (*begin) ^^^^^^^^^^^^^^^ Whoops! Doesn't check that begin != end before dereferencing begin!
follow-up: 6 comment:4 by , 12 years ago
There's probably a simple answer, but why did you change the handling of token_named_mark
to allow the same capture name to be reused? Previously it was an error.
follow-up: 7 comment:5 by , 12 years ago
Oh wait! I just realized from looking at your change to mark_end_matcher
that you are always populating the captures
member of the sub_match
struct (for non-hidden captures). This will have disastrous performance implications. There cannot potentially be an allocation every time you match a marked sub-expression. I can't accept this change.
I suggest you find a way to make this behavior opt-in.
comment:6 by , 12 years ago
Replying to eric_niebler:
There's probably a simple answer, but why did you change the handling of
token_named_mark
to allow the same capture name to be reused? Previously it was an error.
I thought it to be nice to have the same capabilities in a dynamic expression as in static one. A static expression (s1 = whatever) << (s1 = whatever) is perfectly legal but the dynamic compiler frowned upon (?P<name> whatever) (?P<name> whatever). Now they behave the same. It makes it hard to figure out the resulting group numbers though, so these constructs are really only useful when using named multi captures. You can capture on several places in the expression to the same named stack of captures. Can be useful at times.
comment:7 by , 12 years ago
Replying to eric_niebler:
Oh wait! I just realized from looking at your change to
mark_end_matcher
that you are always populating thecaptures
member of thesub_match
struct (for non-hidden captures). This will have disastrous performance implications. There cannot potentially be an allocation every time you match a marked sub-expression. I can't accept this change.I suggest you find a way to make this behavior opt-in.
I have now uploaded an updated version of the multicapture functionality. Multicapture is now opt-in with very little impact on performance if not used.
Some documentation about multicapture and the new syntax constructs is added and I've also gone over the changes to make sure it follows the same coding standard as the rest of the code.
I also changed the mark conditional construct to accept alternates, (?P(name)this|that|andwhatnot).
The new syntax for static pop capture required some extra code in the grammar, is_pure and width_of. I would be grateful for some extra reviewing in those areas since I am uncertain that I fully grasp all the details about what's going on there.
The new code can be reached through the same URL as the old version.
The capture conditional:
Looking at the implementation of this in parse_group, "stuff" is parsed by calling parse_sequence. Is it really the case that stuff can not have alternates? Can I not say "(?P(name)this|that)"? If so, this would be the only grouping construct that does not permit alternates. I have a bad feeling about that. Note that your implementation of that feature ends up returning early---the only one to do so---and doesn't reset the traits flags. In short, this feels like a great big ugly hack to me. Can you clean this up?
Also, would be very nice if you followed my coding convention. Eg:
not this:
Finally, all these new features need documentation.
Still reviewing your code, more comments to come I'm sure. Don't be discouraged, I still am very happy to have your contribution and have every intention of accepting it, once the kinks are worked out.