[/==============================================================================
    Copyright (C) 2001-2008 Joel de Guzman
    Copyright (C) 2001-2008 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Introduction]

Boost Spirit is an object-oriented, recursive-descent parser and output generation
library for C++. It allows you to write grammars and format descriptions using a
format similar to EBNF (Extended Backus-Naur Form, see [4]) directly in
C++. These inline grammar specifications can mix freely with other C++ code and,
thanks to the generative power of C++ templates, are immediately executable.
In contrast, conventional compiler-compilers or parser-generators have to
perform an additional translation step from the source EBNF code to C or C++
code.

The syntax and semantics of the library's API directly form domain-specific
embedded languages (DSELs). In fact, Spirit exposes three different DSELs to the
user:

* one for creating parser grammars,
* one for specifying the tokens required for parsing,
* and one for describing the required output formats.

Since the target input grammars and output formats are written entirely in C++,
we do not need any separate tools to compile, preprocess or integrate them
into the build process. __spirit__ allows seamless integration of the parsing
and output generation process with other C++ code. Often this allows for
simpler and more efficient code.

Both the created parsers and generators are fully attributed, which allows you to
easily build and handle hierarchical data structures in memory. These data
structures resemble the structure of the input data and can directly be used to
generate arbitrarily-formatted output.

The [link spirit.spiritstructure figure] below depicts the overall structure
of the Boost Spirit library. The library consists of four major parts:

* __classic__: This is the almost-unchanged code base taken from the
  former Boost Spirit V1.8 distribution. It has been moved into the namespace
  `boost::spirit::classic`. A special compatibility layer has been added to
  ensure complete compatibility with existing code using Spirit V1.8.
* __qi__: This is the parser library allowing you to build recursive
  descent parsers. The exposed domain-specific language can be used to describe
  the grammars to implement, and the rules for storing the parsed information.
* __lex__: This is the library usable to create tokenizers (lexers). The
  domain-specific language exposed by __lex__ is used to specify the tokens
  required for parsing.
* __karma__: This is the generator library allowing you to create code for
  recursive descent, data type-driven output formatting. The exposed
  domain-specific language is almost equivalent to the parser description language
  used in __qi__, except that it is used to describe the required output
  format to generate from a given data structure.

[fig ./images/spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]

The separate sublibraries __qi__, __karma__ and __lex__ are well integrated
with each other. Because of their similar structure and identical
underlying technology, they can be used either separately or together.
For instance, it is possible to directly feed the hierarchical data
structures generated by __qi__ into output generators created using __karma__,
or to use the token sequence generated by __lex__ as the input for a parser
generated by __qi__.

The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
some input being converted to some internal representation. After some
(optional) transformation these data are converted back into some different,
external representation. The picture highlights Spirit's place in this data
transformation flow.

[fig ./images/spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]

[heading A Quick Overview of Parsing with __qi__]

__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
target grammar (essentially a format description of the input data to read).

A simple EBNF grammar snippet:

    group       ::= '(' expression ')'
    factor      ::= integer | group
    term        ::= factor (('*' factor) | ('/' factor))*
    expression  ::= term (('+' term) | ('-' term))*

is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
code snippet:

    group       = '(' >> expression >> ')';
    factor      = integer | group;
    term        = factor >> *(('*' >> factor) | ('/' >> factor));
    expression  = term >> *(('+' >> term) | ('-' >> term));

Through the magic of expression templates, this is perfectly valid and
executable C++ code. The production rule `expression` is, in fact, an object that
has a member function `parse` that does the work given source text written in
the grammar that we have just declared. Yes, it's a calculator. We shall
simplify for now by skipping the type declarations and the definition of the
rule `integer` invoked by `factor`. Now, the production rule `expression` in our
grammar specification, traditionally called the `start` symbol, can recognize
inputs such as:

    12345
    -12345
    +12345
    1 + 2
    1 * 2
    1/2 + 3/4
    1 + 2 + 3 + 4
    1 * 2 * 3 * 4
    (1 + 2) * (3 + 4)
    (-1 + 2) * (3 + -4)
    1 + ((6 * 200) - 20) / 6
    (1 + (2 + (3 + (4 + 5))))

Certainly we have made some modifications to the original EBNF syntax. This is
done to conform to C++ syntax rules. Most notably we see the abundance of
shift (`>>`) operators. Since there are no 'empty' operators in C++, it is simply
not possible to write something like:

    a b

as seen in math syntax, for example, to mean multiplication or, in our case,
as seen in EBNF syntax to mean sequencing (`b` should follow `a`). Spirit
uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
with arrows pointing to the right, to mean "is followed by". Thus we write:

    a >> b

The alternative operator `|` and the parentheses `()` remain as is. The
assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
the Kleene star `*`, which is a postfix operator in EBNF, becomes a
prefix operator. Instead of:

    a* //... in EBNF syntax,

we write:

    *a //... in Spirit.

since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
rule with the ubiquitous semi-colon, `;`.

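To see why these particular operators can be pressed into service, consider the
following toy sketch. It is emphatically not how Spirit is implemented (Spirit
uses compile-time expression templates, while this sketch uses run-time
`std::function` objects), and every name in it (`toy_parser`, `ch`, `matches`)
is hypothetical. It only shows that C++ lets us overload binary `>>`, binary
`|` and prefix `*` so that they mean sequence, alternative and Kleene star.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Toy parser: tries to match at `pos` and advances `pos` on success.
struct toy_parser
{
    std::function<bool(std::string const&, std::size_t&)> parse;
};

// Matches a single literal character.
toy_parser ch(char c)
{
    return toy_parser{[c](std::string const& s, std::size_t& pos) {
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }};
}

// a >> b : "a is followed by b" (sequence).
toy_parser operator>>(toy_parser a, toy_parser b)
{
    return toy_parser{[a, b](std::string const& s, std::size_t& pos) {
        std::size_t save = pos;
        if (a.parse(s, pos) && b.parse(s, pos)) return true;
        pos = save;   // backtrack if the sequence fails part-way
        return false;
    }};
}

// a | b : alternative, trying a first.
toy_parser operator|(toy_parser a, toy_parser b)
{
    return toy_parser{[a, b](std::string const& s, std::size_t& pos) {
        return a.parse(s, pos) || b.parse(s, pos);
    }};
}

// *a : Kleene star, necessarily written as a prefix operator in C++.
toy_parser operator*(toy_parser a)
{
    return toy_parser{[a](std::string const& s, std::size_t& pos) {
        std::size_t before;
        do { before = pos; } while (a.parse(s, pos) && pos != before);
        return true;  // zero repetitions is still a match
    }};
}

// True if `p` consumes the whole input.
bool matches(toy_parser const& p, std::string const& input)
{
    std::size_t pos = 0;
    return p.parse(input, pos) && pos == input.size();
}
```

With this in place, `matches(*ch('a') >> ch('b'), "aaab")` holds. Because
unary `*` binds more tightly than `>>`, the expression reads the same way as
the corresponding Spirit notation.
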
[heading A Quick Overview of Output Generation with __karma__]

Spirit does not only allow you to describe the structure of the input. Starting with
version 2.0, it also enables the specification of the output format for your data
in a similar way, based on a single syntax and compatible semantics.

Let's assume we need to generate a textual representation from a simple data
structure such as a `std::vector<int>`. Conventional code would probably look like:

    std::vector<int> v (initialize_and_fill());
    std::vector<int>::iterator end = v.end();
    for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
        std::cout << *it << std::endl;

which is not very flexible and quite difficult to maintain when it comes to
changing the required output format. Spirit's sublibrary /Karma/ allows you to
specify output formats for arbitrary data structures in a very flexible way.
The following snippet is the /Karma/ format description used to create the
same output as the traditional code above:

    *(int_ << eol)

Here are some more examples of format descriptions for different output
representations of the same `std::vector<int>`:

[table Different output formats for `std::vector<int>`
    [ [Format]                          [Example]           [Description] ]
    [ [`'[' << *(int_ << ',') << ']'`]  [`[1,8,10,]`]       [Comma separated list of integers] ]
    [ [`*('(' << int_ << ')' << ',')`]  [`(1),(8),(10),`]   [Comma separated list of integers in parentheses] ]
    [ [`*hex`]                          [`18a`]             [A list of hexadecimal numbers] ]
    [ [`*(double_ << ',')`]             [`1.0,8.0,10.0,`]   [A list of floating point numbers] ]
]

The syntax is similar to /Qi/ with the exception that we use the `<<`
operator for output concatenation. This should be easy to understand as it
follows the conventions used in the C++ Standard Library's I/O streams.

Another important feature of /Karma/ is that it allows you to fully decouple the data
type from the output format. You can use the same output format with different
data types as long as these conform conceptually. The next table gives some
related examples.

[table Different data types usable with the output format `*(int_ << eol)`
    [ [Data type]                   [Description] ]
    [ [`int i[4]`]                  [C style arrays] ]
    [ [`std::vector<int>`]          [Standard vector] ]
    [ [`std::list<int>`]            [Standard list] ]
    [ [`boost::array<long, 20>`]    [Boost array] ]
]

[endsect]