Ticket #2610: introduction.qbk

File introduction.qbk, 9.0 KB (added by anonymous, 14 years ago)
Line 
1[/==============================================================================
2 Copyright (C) 2001-2008 Joel de Guzman
3 Copyright (C) 2001-2008 Hartmut Kaiser
4
5 Distributed under the Boost Software License, Version 1.0. (See accompanying
6 file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
7===============================================================================/]
8
9[section Introduction]
10
11Boost Spirit is an object-oriented, recursive-descent parser and output generation
12library for C++. It allows you to write grammars and format descriptions using a
13format similar to EBNF (Extended Backus Naur Form, see [4]) directly in
14C++. These inline grammar specifications can mix freely with other C++ code and,
15thanks to the generative power of C++ templates, are immediately executable.
16In retrospect, conventional compiler-compilers or parser-generators have to
17perform an additional translation step from the source EBNF code to C or C++
18code.
19
20The syntax and semantics of the libraries' API directly form domain-specific
21embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
22user:
23
24* one for creating parser grammars,
25* one for the specification of the required tokens to be used for parsing,
26* and one for the description of the required output formats.
27
28Since the target input grammars and output formats are written entirely in C++
29we do not need any separate tools to compile, preprocess or integrate those
30into the build process. __spirit__ allows seamless integration of the parsing
31and output generation process with other C++ code. Often this allows for
32simpler and more efficient code.
33
34Both the created parsers and generators are fully attributed which allows you to
35easily build and handle hierarchical data structures in memory. These data
36structures resemble the structure of the input data and can directly be used to
37generate arbitrarily-formatted output.
38
39The [link spirit.spiritstructure figure] below depicts the overall structure
40of the Boost Spirit library. The library consists of 4 major parts:
41
42* __classic__: This is the almost-unchanged code base taken from the
43 former Boost Spirit V1.8 distribution. It has been moved into the namespace
44 boost::spirit::classic. A special compatibility layer has been added to
45 ensure complete compatibility with existing code using Spirit V1.8.
46* __qi__: This is the parser library allowing you to build recursive
47 descent parsers. The exposed domain-specific language can be used to describe
48 the grammars to implement, and the rules for storing the parsed information.
49* __lex__: This is the library usable to create tokenizers (lexers). The
50 domain-specific language exposed by __lex__
51* __karma__: This is the generator library allowing you to create code for
52 recursive descent, data type-driven output formatting. The exposed
53 domain-specific language is almost equivalent to the parser description language
54 used in __qi__, except that it is used to describe the required output
55 format to generate from a given data structure.
56
57[fig ./images/spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]
58
59The separate sublibraries __qi__, __karma__ and __lex__ are well integrated
60with any of the other parts. Because of their similar structure and identical
61underlying technology these are usable either separately or together at the
62same time. For instance is it possible to directly feed the hierarchical data
63structures generated by __qi__ into output generators created using __karma__;
64or to use the token sequence generated by __lex__ as the input for a parser
65generated by __qi__.
66
67
68The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
69some input being converted to some internal representation. After some
70(optional) transformation these data are converted back into some different,
71external representation. The picture highlights Spirit's place in this data
72transformation flow.
73
74[fig ./images/spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
75
76[heading A Quick Overview of Parsing with __qi__]
77
78__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
79target grammar (essentially a format description of the input data to read).
80
81A simple EBNF grammar snippet:
82
83 group ::= '(' expression ')'
84 factor ::= integer | group
85 term ::= factor (('*' factor) | ('/' factor))*
86 expression ::= term (('+' term) | ('-' term))*
87
88is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
89code snippet:
90
91 group = '(' >> expression >> ')';
92 factor = integer | group;
93 term = factor >> *(('*' >> factor) | ('/' >> factor));
94 expression = term >> *(('+' >> term) | ('-' >> term));
95
96Through the magic of expression templates, this is perfectly valid and
97executable C++ code. The production rule `expression` is, in fact, an object that
98has a member function `parse` that does the work given a source code written in
99the grammar that we have just declared. Yes, it's a calculator. We shall
100simplify for now by skipping the type declarations and the definition of the
101rule `integer` invoked by `factor`. Now, the production rule `expression` in our
102grammar specification, traditionally called the `start` symbol, can recognize
103inputs such as:
104
105 12345
106 -12345
107 +12345
108 1 + 2
109 1 * 2
110 1/2 + 3/4
111 1 + 2 + 3 + 4
112 1 * 2 * 3 * 4
113 (1 + 2) * (3 + 4)
114 (-1 + 2) * (3 + -4)
115 1 + ((6 * 200) - 20) / 6
116 (1 + (2 + (3 + (4 + 5))))
117
118Certainly we have done some modifications to the original EBNF syntax. This is
119done to conform to C++ syntax rules. Most notably we see the abundance of
120shift >> operators. Since there are no 'empty' operators in C++, it is simply
121not possible to write something like:
122
123 a b
124
125as seen in math syntax, for example, to mean multiplication or, in our case,
126as seen in EBNF syntax to mean sequencing (b should follow a). Spirit
127uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
128with arrows pointing to the right, to mean "is followed by". Thus we write:
129
130 a >> b
131
132The alternative operator `|` and the parentheses `()` remain as is. The
133assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
134the Kleene star `*` which used to be a postfix operator in EBNF becomes a
135prefix. Instead of:
136
137 a* //... in EBNF syntax,
138
139we write:
140
141 *a //... in Spirit.
142
143since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
144rule with the ubiquitous semi-colon, `;`.
145
146
147[heading A Quick Overview of Output Generation with __karma__]
148
149Spirit not only allows you to describe the structure of the input. Starting with
150Version 2.0 it enables the specification of the output format for your data
151in a similar way, and based on a single syntax and compatible semantics.
152
153Let's assume we need to generate a textual representation from a simple data
154structure such as a `std::vector<int>`. Conventional code probably would look like:
155
156 std::vector<int> v (initialize_and_fill());
157 std::vector<int>::iterator end = v.end();
158 for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
159 std::cout << *it << std::endl;
160
161which is not very flexible and quite difficult to maintain when it comes to
162changing the required output format. Spirit's sublibrary /Karma/ allows you to
163specify output formats for arbitrary data structures in a very flexible way.
164The following snippet is the /Karma/ format description used to create the
165same output as the traditional code above:
166
167 *(int_ << eol)
168
169Here are some more examples of format descriptions for different output
170representations of the same `std::vector<int>`:
171
172[table Different output formats for `std::vector<int>`
173 [ [Format] [Example] [Description] ]
174 [ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
175 [ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),]`] [Comma separated list of integers in parenthesis] ]
176 [ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
177 [ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
178]
179
180The syntax is similar to /Qi/ with the exception that we use the `<<`
181operator for output concatenation. This should be easy to understand as it
182follows the conventions used in the Standard's I/O streams.
183
184Another important feature of /karma/ allows you to fully decouple the data
185type from the output format. You can use the same output format with different
186data types as long as these conform conceptually. The next table gives some
187related examples.
188
189[table Different data types usable with the output format `(*int_ << eol)`
190 [ [Data type] ]
191 [ [`int i[4]`] [C style arrays] ]
192 [ [`std::vector<int>`] [Standard vector] ]
193 [ [`std::list<int>`] [Standard list] ]
194 [ [`boost::array<long, 20>`] [Boost array] ]
195]
196
197[endsect]