Opened 17 years ago

Closed 17 years ago

#521 closed Bugs (Invalid)

[Regex] Splitting string: last empty token is "compressed"

Reported by: nobody
Owned by: John Maddock
Milestone:
Component: regex
Version: None
Severity:
Keywords:
Cc:

Description

toto_lists@hotbox.ru
When regex_token_iterator is used to split a string, the last empty token is "compressed" (dropped).
Environment: BCB 5.64, Boost 1.33.0.
See the example below, which compares it with string_algo splitting:

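#include <boost/regex.hpp>
#include <algorithm>   // std::copy
#include <iterator>    // std::back_inserter, std::ostream_iterator
#include <string>
#include <vector>
#include <iostream>
#include <boost/range/iterator_range.hpp>  // boost::copy_range
#include <boost/algorithm/string/finder.hpp>
#include <boost/algorithm/string/find_iterator.hpp>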

void with_stringAlgo() {
  using namespace std;

  vector<string> result;
  string to_split("&|&_field2_&|&_field3_&|&");

  boost::split_iterator<string::iterator>
    i(to_split.begin(),to_split.end(),boost::first_finder("&|&")),j;
  for(;i!=j;++i) {
    result.push_back(boost::copy_range<std::string>(*i));
  }

  cout<<"size is "<<result.size()<<endl;
  for(vector<string>::size_type i=0;i<result.size();++i)
    cout<<result[i]<<endl;
}

void with_regex() {
  using namespace std;
  using namespace boost;
  using namespace boost::regex_constants;
  string s("&|&_field2_&|&_field3_&|&");

  boost::regex r("&\\|&");//use &|& as delimiter
  boost::sregex_token_iterator i(s.begin(),s.end(),r,-1,
    match_default),j;
  //
  vector<string> v;
  copy(i,j,back_inserter(v));
  //
  cout<<"size is "<<v.end()-v.begin()<<endl;
   copy(v.begin(),v.end(),ostream_iterator<string>(cout,"\n"));
}

int main() {
  with_stringAlgo();
  with_regex();
}

Change History (1)

comment:1 by John Maddock, 17 years ago

Status: assigned → closed

It's by design, and that's the way we codified things in the
C++ Standard Technical Report 1 (TR1) so it's not going to
change now, unless the TR1 does of course.

The rationale is that you often have a series of fields each
of which is terminated by a specific string.  In this case
you want an empty last field to be suppressed.
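
For readers who do want the trailing empty field, here is a minimal sketch of one possible workaround. It uses std::regex from C++11, which adopted the TR1 token iterator semantics, rather than Boost.Regex. The helper name split_fields and its keep_trailing_empty flag are hypothetical and not part of any library, and the sketch assumes the delimiter never matches an empty string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost or the standard library.
// Assumes `delimiter` never produces a zero-length match.
std::vector<std::string> split_fields(const std::string& text,
                                      const std::regex& delimiter,
                                      bool keep_trailing_empty = false) {
    // Submatch index -1 selects the text between delimiter matches.
    std::sregex_token_iterator first(text.begin(), text.end(), delimiter, -1);
    std::sregex_token_iterator last;
    std::vector<std::string> fields(first, last);

    if (keep_trailing_empty) {
        // The token iterator suppresses an empty final token, so check
        // whether the last delimiter match reaches the end of the input
        // and, if it does, restore the empty field by hand.
        bool ends_with_delimiter = false;
        std::sregex_iterator match(text.begin(), text.end(), delimiter);
        std::sregex_iterator no_more_matches;
        for (; match != no_more_matches; ++match) {
            ends_with_delimiter = (match->suffix().length() == 0);
        }
        if (ends_with_delimiter) {
            fields.push_back(std::string());
        }
    }
    return fields;
}
```

With the string from the report, split_fields(s, std::regex("&\\|&")) yields three tokens (an empty leading token, "_field2_", and "_field3_"), while passing true as the third argument yields four, matching what the split_iterator version prints.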

John.