Index: libs/tokenizer/introduc.htm
===================================================================
--- libs/tokenizer/introduc.htm (revision 50620)
+++ libs/tokenizer/introduc.htm (working copy)
@@ -16,9 +16,9 @@
- The boost Tokenizer package provides a flexible and easy to
- use way to break of a string or other character sequence into a series of
- tokens. Below is a simple example that will break up a phrase into
+ The Boost Tokenizer package provides a flexible and
+ easy-to-use way to break a string or other character sequence into a series
+ of tokens. Below is a simple example that will break up a phrase into
  words.
- You can choose how the string gets broken up. You do this
- by specifying the TokenizerFunction. If you do not specify anything, the
- default TokenizerFunction is char_delimiters_separator<char> which
+ You can choose how the string gets parsed by using the
+ TokenizerFunction. If you do not specify anything, the default
+ TokenizerFunction is char_delimiters_separator<char> which
  defaults to breaking up a string based on space and punctuation. Here is an
- example of using another TokenizerFunction called escaped_list_separator.
- This TokenizerFunction parses a superset of comma separated value (csv)
- lines. The format looks like this
+ example using another TokenizerFunction called
+ escaped_list_separator. This TokenizerFunction parses a superset
+ of comma-separated value (CSV) lines. The format looks like this:
  Field 1,"putting quotes around fields, allows commas",Field 3
  Below is an example that will break the previous line into
- its 3 fields
+ its three fields.
@@ -73,14 +73,12 @@
- Finally, for some TokenizerFunctions you have to pass in
+ Finally, for some TokenizerFunctions you have to pass something
  into the constructor in order to do anything interesting. An
- example is offset_separator. This class breaks a string into tokens based
- on offsets for example
+ example is the offset_separator. This class breaks a string into tokens based
+ on offsets. For example, when 12252001 is parsed using offsets of
+ 2,2,4 it becomes 12 25 2001. Below is the code used.
-12252001 when parsed using offsets of 2,2,4 becomes 12 25
- 2001. Below is an example to parse this.
-
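Reviewer note: the offset-based splitting described in the last hunk (12252001 with offsets 2,2,4 becoming 12 25 2001) can be illustrated with a small standard-library sketch. This is not Boost's offset_separator and is not part of the patch. The helper name split_by_offsets is hypothetical, and the sketch assumes each offset is used once (no wrapping around the offset list) and that a short final piece is kept as-is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): cut `input` into consecutive
// pieces whose lengths are given by `offsets`, using each offset once.
std::vector<std::string> split_by_offsets(const std::string& input,
                                          const std::vector<std::size_t>& offsets) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;  // start of the next piece within `input`
    for (std::size_t len : offsets) {
        if (pos >= input.size()) {
            break;  // nothing left to cut
        }
        // substr clamps the length, so a short final piece is kept as-is.
        tokens.push_back(input.substr(pos, len));
        pos += tokens.back().size();
        assert(pos <= input.size());  // we never advance past the end
    }
    return tokens;
}
```

Calling split_by_offsets("12252001", {2, 2, 4}) yields the three pieces 12, 25 and 2001, matching the example in the revised paragraph.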