Context Navigation

Back to Ticket #2610

Ticket #2610: introduction.qbk

File introduction.qbk, 9.0 KB (added by anonymous, 14 years ago)

Line
1	[/==============================================================================
2	Copyright (C) 2001-2008 Joel de Guzman
3	Copyright (C) 2001-2008 Hartmut Kaiser
4
5	Distributed under the Boost Software License, Version 1.0. (See accompanying
6	file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
7	===============================================================================/]
8
9	[section Introduction]
10
11	Boost Spirit is an object-oriented, recursive-descent parser and output generation
12	library for C++. It allows you to write grammars and format descriptions using a
13	format similar to EBNF (Extended Backus Naur Form, see [4]) directly in
14	C++. These inline grammar specifications can mix freely with other C++ code and,
15	thanks to the generative power of C++ templates, are immediately executable.
16	In retrospect, conventional compiler-compilers or parser-generators have to
17	perform an additional translation step from the source EBNF code to C or C++
18	code.
19
20	The syntax and semantics of the libraries' API directly form domain-specific
21	embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
22	user:
23
24	* one for creating parser grammars,
25	* one for the specification of the required tokens to be used for parsing,
26	* and one for the description of the required output formats.
27
28	Since the target input grammars and output formats are written entirely in C++
29	we do not need any separate tools to compile, preprocess or integrate those
30	into the build process. __spirit__ allows seamless integration of the parsing
31	and output generation process with other C++ code. Often this allows for
32	simpler and more efficient code.
33
34	Both the created parsers and generators are fully attributed which allows you to
35	easily build and handle hierarchical data structures in memory. These data
36	structures resemble the structure of the input data and can directly be used to
37	generate arbitrarily-formatted output.
38
39	The [link spirit.spiritstructure figure] below depicts the overall structure
40	of the Boost Spirit library. The library consists of 4 major parts:
41
42	* __classic__: This is the almost-unchanged code base taken from the
43	former Boost Spirit V1.8 distribution. It has been moved into the namespace
44	boost::spirit::classic. A special compatibility layer has been added to
45	ensure complete compatibility with existing code using Spirit V1.8.
46	* __qi__: This is the parser library allowing you to build recursive
47	descent parsers. The exposed domain-specific language can be used to describe
48	the grammars to implement, and the rules for storing the parsed information.
49	* __lex__: This is the library usable to create tokenizers (lexers). The
50	domain-specific language exposed by __lex__
51	* __karma__: This is the generator library allowing you to create code for
52	recursive descent, data type-driven output formatting. The exposed
53	domain-specific language is almost equivalent to the parser description language
54	used in __qi__, except that it is used to describe the required output
55	format to generate from a given data structure.
56
57	[fig ./images/spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]
58
59	The separate sublibraries __qi__, __karma__ and __lex__ are well integrated
60	with any of the other parts. Because of their similar structure and identical
61	underlying technology these are usable either separately or together at the
62	same time. For instance is it possible to directly feed the hierarchical data
63	structures generated by __qi__ into output generators created using __karma__;
64	or to use the token sequence generated by __lex__ as the input for a parser
65	generated by __qi__.
66
67
68	The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
69	some input being converted to some internal representation. After some
70	(optional) transformation these data are converted back into some different,
71	external representation. The picture highlights Spirit's place in this data
72	transformation flow.
73
74	[fig ./images/spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
75
76	[heading A Quick Overview of Parsing with __qi__]
77
78	__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
79	target grammar (essentially a format description of the input data to read).
80
81	A simple EBNF grammar snippet:
82
83	group ::= '(' expression ')'
84	factor ::= integer \| group
85	term ::= factor (('' factor) \| ('/' factor))
86	expression ::= term (('+' term) \| ('-' term))*
87
88	is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
89	code snippet:
90
91	group = '(' >> expression >> ')';
92	factor = integer \| group;
93	term = factor >> (('' >> factor) \| ('/' >> factor));
94	expression = term >> *(('+' >> term) \| ('-' >> term));
95
96	Through the magic of expression templates, this is perfectly valid and
97	executable C++ code. The production rule `expression` is, in fact, an object that
98	has a member function `parse` that does the work given a source code written in
99	the grammar that we have just declared. Yes, it's a calculator. We shall
100	simplify for now by skipping the type declarations and the definition of the
101	rule `integer` invoked by `factor`. Now, the production rule `expression` in our
102	grammar specification, traditionally called the `start` symbol, can recognize
103	inputs such as:
104
105	12345
106	-12345
107	+12345
108	1 + 2
109	1 * 2
110	1/2 + 3/4
111	1 + 2 + 3 + 4
112	1 * 2 * 3 * 4
113	(1 + 2) * (3 + 4)
114	(-1 + 2) * (3 + -4)
115	1 + ((6 * 200) - 20) / 6
116	(1 + (2 + (3 + (4 + 5))))
117
118	Certainly we have done some modifications to the original EBNF syntax. This is
119	done to conform to C++ syntax rules. Most notably we see the abundance of
120	shift >> operators. Since there are no 'empty' operators in C++, it is simply
121	not possible to write something like:
122
123	a b
124
125	as seen in math syntax, for example, to mean multiplication or, in our case,
126	as seen in EBNF syntax to mean sequencing (b should follow a). Spirit
127	uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
128	with arrows pointing to the right, to mean "is followed by". Thus we write:
129
130	a >> b
131
132	The alternative operator `\|` and the parentheses `()` remain as is. The
133	assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
134	the Kleene star `*` which used to be a postfix operator in EBNF becomes a
135	prefix. Instead of:
136
137	a* //... in EBNF syntax,
138
139	we write:
140
141	*a //... in Spirit.
142
143	since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
144	rule with the ubiquitous semi-colon, `;`.
145
146
147	[heading A Quick Overview of Output Generation with __karma__]
148
149	Spirit not only allows you to describe the structure of the input. Starting with
150	Version 2.0 it enables the specification of the output format for your data
151	in a similar way, and based on a single syntax and compatible semantics.
152
153	Let's assume we need to generate a textual representation from a simple data
154	structure such as a `std::vector<int>`. Conventional code probably would look like:
155
156	std::vector<int> v (initialize_and_fill());
157	std::vector<int>::iterator end = v.end();
158	for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
159	std::cout << *it << std::endl;
160
161	which is not very flexible and quite difficult to maintain when it comes to
162	changing the required output format. Spirit's sublibrary /Karma/ allows you to
163	specify output formats for arbitrary data structures in a very flexible way.
164	The following snippet is the /Karma/ format description used to create the
165	same output as the traditional code above:
166
167	*(int_ << eol)
168
169	Here are some more examples of format descriptions for different output
170	representations of the same `std::vector<int>`:
171
172	[table Different output formats for `std::vector<int>`
173	[ [Format] [Example] [Description] ]
174	[ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
175	[ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),]`] [Comma separated list of integers in parenthesis] ]
176	[ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
177	[ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
178	]
179
180	The syntax is similar to /Qi/ with the exception that we use the `<<`
181	operator for output concatenation. This should be easy to understand as it
182	follows the conventions used in the Standard's I/O streams.
183
184	Another important feature of /karma/ allows you to fully decouple the data
185	type from the output format. You can use the same output format with different
186	data types as long as these conform conceptually. The next table gives some
187	related examples.
188
189	[table Different data types usable with the output format `(*int_ << eol)`
190	[ [Data type] ]
191	[ [`int i[4]`] [C style arrays] ]
192	[ [`std::vector<int>`] [Standard vector] ]
193	[ [`std::list<int>`] [Standard list] ]
194	[ [`boost::array<long, 20>`] [Boost array] ]
195	]
196
197	[endsect]

Download in other formats:

Original Format