[/==============================================================================
    Copyright (C) 2001-2008 Joel de Guzman
    Copyright (C) 2001-2008 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Introduction]

Boost Spirit is an object-oriented, recursive-descent parser and output generation
library for C++. It allows you to write grammars and format descriptions using a
format similar to EBNF (Extended Backus-Naur Form, see [4]) directly in
C++. These inline grammar specifications can mix freely with other C++ code and,
thanks to the generative power of C++ templates, are immediately executable.
By contrast, conventional compiler-compilers or parser-generators have to
perform an additional translation step from the source EBNF code to C or C++
code.

The syntax and semantics of the library's API directly form domain-specific
embedded languages (DSELs). In fact, Spirit exposes three different DSELs to the
user:

* one for creating parser grammars,
* one for specifying the tokens to be used for parsing,
* and one for describing the required output formats.

Since the target input grammars and output formats are written entirely in C++,
we do not need any separate tools to compile, preprocess or integrate them
into the build process. __spirit__ allows seamless integration of the parsing
and output generation process with other C++ code. Often this allows for
simpler and more efficient code.

Both the parsers and the generators created with Spirit are fully attributed,
which makes it easy to build and handle hierarchical data structures in memory.
These data structures resemble the structure of the input data and can be used
directly to generate arbitrarily formatted output.

The [link spirit.spiritstructure figure] below depicts the overall structure
of the Boost Spirit library. The library consists of four major parts:

* __classic__: This is the almost-unchanged code base taken from the
former Boost Spirit V1.8 distribution. It has been moved into the namespace
`boost::spirit::classic`. A special compatibility layer has been added to
ensure complete compatibility with existing code using Spirit V1.8.
* __qi__: This is the parser library, which allows you to build recursive
descent parsers. The exposed domain-specific language can be used to describe
the grammars to implement and the rules for storing the parsed information.
* __lex__: This is the library used to create tokenizers (lexers). The
domain-specific language exposed by __lex__ is used to specify the tokens
to be recognized in the input.
* __karma__: This is the generator library, which allows you to create code for
recursive-descent, data-type-driven output formatting. The exposed
domain-specific language is almost equivalent to the parser description language
used in __qi__, except that it is used to describe the required output
format to generate from a given data structure.

[fig ./images/spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]

The separate sublibraries __qi__, __karma__ and __lex__ are well integrated
with the other parts. Because of their similar structure and identical
underlying technology, they can be used either separately or together at the
same time. For instance, it is possible to directly feed the hierarchical data
structures generated by __qi__ into output generators created using __karma__,
or to use the token sequence generated by __lex__ as the input for a parser
generated by __qi__.

The [link spirit.spiritkarmaflow figure] below shows the typical data flow in
an application: some input is converted to an internal representation, this
representation is (optionally) transformed, and the result is converted back
into a different, external representation. The picture highlights Spirit's
place in this data transformation flow.

[fig ./images/spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]

[heading A Quick Overview of Parsing with __qi__]

__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
target grammar (essentially a format description of the input data to read).

A simple EBNF grammar snippet:

    group       ::= '(' expression ')'
    factor      ::= integer | group
    term        ::= factor (('*' factor) | ('/' factor))*
    expression  ::= term (('+' term) | ('-' term))*

is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
code snippet:

    group       = '(' >> expression >> ')';
    factor      = integer | group;
    term        = factor >> *(('*' >> factor) | ('/' >> factor));
    expression  = term >> *(('+' >> term) | ('-' >> term));

Through the magic of expression templates, this is perfectly valid and
executable C++ code. The production rule `expression` is, in fact, an object that
has a member function `parse` that does the work given source code written in
the grammar that we have just declared. Yes, it's a calculator. We shall
simplify for now by skipping the type declarations and the definition of the
rule `integer` invoked by `factor`. Now, the production rule `expression` in our
grammar specification, traditionally called the `start` symbol, can recognize
inputs such as:

    12345
    -12345
    +12345
    1 + 2
    1 * 2
    1/2 + 3/4
    1 + 2 + 3 + 4
    1 * 2 * 3 * 4
    (1 + 2) * (3 + 4)
    (-1 + 2) * (3 + -4)
    1 + ((6 * 200) - 20) / 6
    (1 + (2 + (3 + (4 + 5))))

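To make the recursive structure of these rules visible without any Spirit machinery, here is a hand-written recursive-descent evaluator in plain C++ in which each EBNF rule becomes one member function. This is a sketch under stated assumptions, not how Spirit is implemented: `integer` is assumed to be an optional sign followed by decimal digits, whitespace between tokens is skipped, and, unlike the grammar above, which only recognizes its input, the sketch also computes a value using integer division. The class name `Calculator` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written recursive-descent evaluator for the calculator grammar above.
// Each EBNF rule maps to one member function. Plain C++ only, not Spirit.
class Calculator {
public:
    // Evaluates `input`; throws std::runtime_error on a syntax error,
    // on division by zero, or if input is left over after `expression`.
    static long evaluate(const std::string& input) {
        Calculator c(input);
        long value = c.expression();
        c.skip_spaces();
        if (c.pos_ != c.in_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    explicit Calculator(const std::string& input) : in_(input), pos_(0) {}

    void skip_spaces() {
        while (pos_ < in_.size() &&
               std::isspace(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
    }

    // Skips whitespace, then consumes `ch` if it is the next character.
    bool match(char ch) {
        skip_spaces();
        if (pos_ < in_.size() && in_[pos_] == ch) {
            ++pos_;
            return true;
        }
        return false;
    }

    // expression ::= term (('+' term) | ('-' term))*
    long expression() {
        long value = term();
        for (;;) {
            if (match('+'))      value += term();
            else if (match('-')) value -= term();
            else                 return value;
        }
    }

    // term ::= factor (('*' factor) | ('/' factor))*
    long term() {
        long value = factor();
        for (;;) {
            if (match('*')) {
                value *= factor();
            } else if (match('/')) {
                long divisor = factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;  // integer division truncates toward zero
            } else {
                return value;
            }
        }
    }

    // factor ::= integer | group,  with  group ::= '(' expression ')'
    long factor() {
        if (match('(')) {
            long value = expression();
            if (!match(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        return integer();
    }

    // Assumed definition of `integer`: optional '+' or '-', then decimal digits.
    long integer() {
        skip_spaces();
        bool negative = false;
        if (pos_ < in_.size() && (in_[pos_] == '+' || in_[pos_] == '-')) {
            negative = (in_[pos_] == '-');
            ++pos_;
        }
        const std::size_t first_digit = pos_;
        long value = 0;
        while (pos_ < in_.size() &&
               std::isdigit(static_cast<unsigned char>(in_[pos_])))
            value = value * 10 + (in_[pos_++] - '0');
        if (pos_ == first_digit) throw std::runtime_error("expected an integer");
        return negative ? -value : value;
    }

    std::string in_;
    std::size_t pos_;
};
```

For example, `Calculator::evaluate("(1 + 2) * (3 + 4)")` returns 21, and `Calculator::evaluate("1 + ((6 * 200) - 20) / 6")` returns 197 because `/` truncates. Spirit lets you state the same structure declaratively instead of writing these functions by hand.
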
Certainly, we have made some modifications to the original EBNF syntax to
conform to C++ syntax rules. Most notably, we see the abundance of
shift `>>` operators. Since there are no 'empty' operators in C++, it is simply
not possible to write something like:

    a b

as seen in math syntax, for example, to mean multiplication or, in our case,
as seen in EBNF syntax to mean sequencing (b should follow a). Spirit
uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
with arrows pointing to the right, to mean "is followed by". Thus we write:

    a >> b

The alternative operator `|` and the parentheses `()` remain as is. The
assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
the Kleene star `*`, which is a postfix operator in EBNF, becomes a
prefix operator. Instead of:

    a* //... in EBNF syntax,

we write:

    *a //... in Spirit.

since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
rule with the ubiquitous semicolon, `;`.

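Plain C++ can accept `a >> b`, `a | b` and `*a` because these operators can be overloaded for parser objects. The toy below sketches that idea with a hypothetical `Parser` type wrapping a `std::function`, plus hypothetical helpers `lit` and `full_match`. It is only an illustration: Spirit builds statically typed expression templates instead, and it also lets you write character literals such as `'('` directly, whereas this toy needs `lit('(')`. Recursive rules such as `expression` would also need an extra level of indirection, which the toy does not provide.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// A toy recognizer: on success it returns true and advances `pos`;
// on failure it returns false and leaves `pos` unchanged.
// Illustration of operator overloading only, not Spirit's implementation.
struct Parser {
    std::function<bool(const std::string&, std::size_t&)> run;
};

// Matches one literal character.
Parser lit(char ch) {
    return Parser{[ch](const std::string& in, std::size_t& pos) {
        if (pos < in.size() && in[pos] == ch) { ++pos; return true; }
        return false;
    }};
}

// a >> b : a is followed by b (sequence).
Parser operator>>(Parser a, Parser b) {
    return Parser{[a, b](const std::string& in, std::size_t& pos) {
        std::size_t saved = pos;
        if (a.run(in, pos) && b.run(in, pos)) return true;
        pos = saved;  // undo partial progress so the contract above holds
        return false;
    }};
}

// a | b : try a; if it fails, try b (ordered alternative).
Parser operator|(Parser a, Parser b) {
    return Parser{[a, b](const std::string& in, std::size_t& pos) {
        return a.run(in, pos) || b.run(in, pos);
    }};
}

// *a : zero or more repetitions of a (the Kleene star, written as a prefix).
Parser operator*(Parser a) {
    return Parser{[a](const std::string& in, std::size_t& pos) {
        for (;;) {
            std::size_t before = pos;
            // Stop on failure, or when `a` succeeds without consuming input
            // (otherwise a nested star such as *(*x) would loop forever).
            if (!a.run(in, pos) || pos == before) break;
        }
        return true;  // zero repetitions still count as a match
    }};
}

// True if `p` matches and consumes the whole input.
bool full_match(const Parser& p, const std::string& in) {
    std::size_t pos = 0;
    return p.run(in, pos) && pos == in.size();
}
```

For instance, `full_match(*(lit('a') >> lit('b')), "abab")` is true, while `full_match(lit('a') >> lit('b'), "ba")` is false. Because C++ gives `>>` higher precedence than `|`, `a >> b | c` groups as `(a >> b) | c`, matching the usual reading of EBNF.
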
[heading A Quick Overview of Output Generation with __karma__]

Spirit not only allows you to describe the structure of the input. Starting with
version 2.0, it also enables the specification of the output format for your data
in a similar way, based on a single syntax and compatible semantics.

Let's assume we need to generate a textual representation from a simple data
structure such as a `std::vector<int>`. Conventional code would probably look like:

    std::vector<int> v (initialize_and_fill());
    std::vector<int>::iterator end = v.end();
    for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
        std::cout << *it << std::endl;

which is not very flexible and quite difficult to maintain when it comes to
changing the required output format. Spirit's sublibrary /Karma/ allows you to
specify output formats for arbitrary data structures in a very flexible way.
The following snippet is the /Karma/ format description used to create the
same output as the traditional code above:

    *(int_ << eol)

Here are some more examples of format descriptions for different output
representations of the same `std::vector<int>` (here holding the values 1, 8
and 10):

[table Different output formats for `std::vector<int>`
[ [Format] [Example] [Description] ]
[ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma-separated list of integers] ]
[ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`] [Comma-separated list of integers in parentheses] ]
[ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
[ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
]

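For comparison, here is how the first three formats in the table might be written by hand with standard C++ streams. The function names are hypothetical, and the sketch only aims to reproduce the Example column for the values 1, 8 and 10. Each variation needs its own loop, whereas the /Karma/ descriptions in the table differ only in a single expression.

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hand-written equivalent of '[' << *(int_ << ',') << ']'
std::string bracketed_list(const std::vector<int>& v) {
    std::ostringstream out;
    out << '[';
    for (int i : v) out << i << ',';
    out << ']';
    return out.str();
}

// Hand-written equivalent of *('(' << int_ << ')' << ',')
std::string parenthesized_list(const std::vector<int>& v) {
    std::ostringstream out;
    for (int i : v) out << '(' << i << ')' << ',';
    return out.str();
}

// Hand-written equivalent of *hex: lowercase digits, no separator, no "0x"
std::string hex_list(const std::vector<int>& v) {
    std::ostringstream out;
    out << std::hex;
    for (int i : v) out << i;
    return out.str();
}
```

Called with a vector holding 1, 8 and 10, these functions return `[1,8,10,]`, `(1),(8),(10),` and `18a` respectively.
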
The syntax is similar to /Qi/, with the exception that we use the `<<`
operator for output concatenation. This should be easy to understand, as it
follows the conventions used in the Standard's I/O streams.

Another important feature of /Karma/ is that it allows you to fully decouple the
data type from the output format. You can use the same output format with
different data types as long as these conform conceptually. The next table gives
some related examples.

[table Different data types usable with the output format `*(int_ << eol)`
[ [Data type] [Description] ]
[ [`int i[4]`] [C-style array] ]
[ [`std::vector<int>`] [Standard vector] ]
[ [`std::list<int>`] [Standard list] ]
[ [`boost::array<long, 20>`] [Boost array] ]
]

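The same decoupling can be imitated in plain C++ with a function template: the format (each value followed by a newline, as `*(int_ << eol)` produces) is written once and reused for any iterable sequence. This is an analogy only, using a hypothetical function `one_per_line`; /Karma/ obtains this flexibility through its own attribute handling rather than through such a template. `std::array` stands in here for `boost::array`.

```cpp
#include <array>    // containers used in the usage example below
#include <list>
#include <sstream>
#include <string>
#include <vector>

// One output format, "each value followed by a newline", written once and
// reused for C arrays, std::vector, std::list, std::array, ...
template <typename Container>
std::string one_per_line(const Container& c) {
    std::ostringstream out;
    for (const auto& value : c)
        out << value << '\n';
    return out.str();
}
```

With `int arr[3] = {1, 8, 10}`, a `std::vector<int>`, a `std::list<int>` or a `std::array<long, 3>` holding the same values, every call returns the same string, `"1\n8\n10\n"`.
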
[endsect]