id,summary,reporter,owner,description,type,status,milestone,component,version,severity,resolution,keywords,cc
11776,Effective way to find all regex matches in large file,der-storch-85@…,John Maddock,"Finding all regex matches in a file via boost::regex_iterator is complicated, because you normally cannot load the whole file into a buffer (it could be too large).

A possible solution is presented in the Boost.Regex documentation, in the section [http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/boost_regex/partial_matches.html Partial Matches] (see the second example).

Unfortunately, that example is not correct. Consider a file with content ""12abc"", a regex ""[a-z]+"", and a buffer size of 4. This yields the matches ab and c, but the correct result is the single match abc. The first match, ab, is reported as a full match rather than a partial one, even though it touches the end of the buffer. Increasing the buffer size does not solve the problem in general, and with more complex regexes it gets even worse. Another example: the same input and buffer size with the regex ""[a-z]{2,}"" (i.e. words with at least two letters) results in the single match ab, where it should be abc.
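
The effect can be reproduced outside Boost. The following is a minimal sketch in Python's standard re module, not the documentation's code: it scans fixed-size buffers independently and does not emulate match_partial. For the two inputs above that makes no difference, because the match that touches the buffer end is already a full match, so nothing would be carried over. The function names chunked_matches and whole_matches are made up for this illustration.

```python
import re

def chunked_matches(text, pattern, buf_size):
    # Hypothetical helper: search each fixed-size buffer on its own,
    # the way a naive buffered scan would.
    found = []
    for start in range(0, len(text), buf_size):
        chunk = text[start:start + buf_size]
        found.extend(m.group(0) for m in re.finditer(pattern, chunk))
    return found

def whole_matches(text, pattern):
    # Reference result: search the complete input at once.
    return [m.group(0) for m in re.finditer(pattern, text)]

print(chunked_matches('12abc', r'[a-z]+', 4))     # ['ab', 'c']
print(whole_matches('12abc', r'[a-z]+'))          # ['abc']
print(chunked_matches('12abc', r'[a-z]{2,}', 4))  # ['ab']
print(whole_matches('12abc', r'[a-z]{2,}'))       # ['abc']
```

In both cases the buffered scan silently reports a truncated match, and nothing tells the caller that ab might continue in the next buffer.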

The easiest solution seems to be a new match flag (“range_incomplete” or “input_incomplete” (?)) that checks whether the text from the beginning of the current match to the end of the buffer forms a partial or full match. In that case the “match” should be marked to the user as possibly incomplete (e.g. via the already existing member sub_match::matched). Better solutions probably exist.
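
As a rough illustration of that idea, again in Python's re module rather than Boost.Regex, a buffered scan could flag every match that ends exactly at the buffer end while more input remains. This is only a sketch of the full-match half of the proposal: Python's re has no partial-match mode, so matches that are merely partial at the buffer end are not detected here. The name flag_incomplete and the boolean it returns are assumptions made for this example, not an existing or proposed Boost interface.

```python
import re

def flag_incomplete(text, pattern, buf_size):
    # Hypothetical helper: scan fixed-size buffers and flag every match
    # that ends exactly at the buffer end while more input remains.
    results = []
    for start in range(0, len(text), buf_size):
        chunk = text[start:start + buf_size]
        more_input = start + buf_size < len(text)
        for m in re.finditer(pattern, chunk):
            possibly_incomplete = more_input and m.end() == len(chunk)
            results.append((m.group(0), possibly_incomplete))
    return results

print(flag_incomplete('12abc', r'[a-z]+', 4))  # [('ab', True), ('c', False)]
```

A caller that sees the flag set could keep the text from the start of that match, refill the buffer behind it, and search again instead of accepting ab as final.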

If you do not want to, or cannot, implement this feature request, please at least update the second example in the Partial Matches section. Thanks!",Feature Requests,reopened,To Be Determined,regex,Boost 1.59.0,Optimization,,,