Opened 6 years ago
#12619 new Bugs
Boost.Regex partial_match fails (see also Ticket #11776 feature request)
Reported by: | Owned by: | John Maddock | |
---|---|---|---|
Milestone: | To Be Determined | Component: | regex |
Version: | Boost 1.61.0 | Severity: | Problem |
Keywords: | partial_match | Cc: |
Description
Boost.Regex is a great library that we use extensively. I am re-raising Ticket 11776 as a bug. The partial_match implementation is broken because regex repetitions (*, +) may behave lazily or greedily depending on the size of the input text buffer. This is very unfortunate, because partial_match provides the only possible mechanism to search streaming input text without buffering the entire text. Restricting the regex to simple forms that do not include repetitions (*, +) is not a viable workaround. There are use cases in which we must take interactive input (i.e. buffering one char at a time) or process large files in which the text matching the pattern may not fit in the currently allocated buffer. In those cases the search does not produce the longest match and, worse, we cannot tell whether the buffer must be enlarged to continue iterating to find the longest match.
The correct partial_match algorithm should work as follows: as long as backtracking on a repetition in the regex is still possible given the partial input text seen so far, Boost.Regex should flag the result as a partial match instead of a full match. With this change, matching "abc.*123" may require the whole input, but in this case that is OK! We need this flexibility of the matcher with a buffering approach.
Unfortunately, the workaround suggested by the Boost.Regex documentation, which is to check whether the pattern matched the input up to the buffer end (indicating a partial match), does not always work.
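To make the buffer-size dependence concrete, here is a minimal sketch that uses std::regex as a stand-in. It does not use Boost.Regex and does not reproduce its partial-match flag; it only shows that a greedy pattern can report a complete match that stops short of the buffer end even though more input would extend it. The names BufferMatch and search_prefix, and the sample input used in the note below, are made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Outcome of searching one buffer: whether anything matched, the matched
// text, and whether the match ran all the way to the end of the buffer.
struct BufferMatch {
    bool found;
    std::string text;
    bool reaches_buffer_end;
};

// Hypothetical helper (not part of Boost.Regex or the standard library):
// search only the first `size` characters of `input`, as if that were all
// the text read into the buffer so far.
BufferMatch search_prefix(const std::string& input, std::size_t size,
                          const std::regex& re) {
    const std::string buffer = input.substr(0, size);
    std::smatch m;
    if (!std::regex_search(buffer, m, re)) {
        return {false, "", false};
    }
    // Copy the matched text out before `buffer` goes out of scope.
    return {true, m.str(0), m[0].second == buffer.end()};
}
```

With the pattern "abc.*123" and the input "abc 123 and 123", searching only the first 10 characters yields "abc 123" with reaches_buffer_end set to false, so an end-of-buffer check would accept it as final. Searching all 15 characters yields "abc 123 and 123", which is the longest match.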
Small example to demonstrate the issue