Google Summer of Code 2009
Boost.Serialization
Here are a few small projects which would be really helpful.
Performance Testing and Profiling
I've managed to set up performance profiling using the following:
- current (as I write this) Boost.Build tools.
- the gcc compiler.
- and a shell script - profile.sh
- library_status program from the tools/regression/src directory
Invoking the profile script produces a table which shows the results of each test and links to the actual profile.
The first thing I did was include some of the serialization library tests. It became immediately apparent that these tests were totally unsuitable for performance testing and that new tests needed to be written for this purpose. These tests would highlight the location of any performance bottlenecks in the serialization library. Whenever I've subjected my code to this type of analysis in the past, I've always been surprised to find bottlenecks in totally unanticipated places, and fixing those has always led to large improvements in performance. I expect that this project would have a huge impact on the utility of the serialization library.
Back Versioning
It has been suggested that a useful feature of the library would be the ability to create "older versions" of archives. Currently, the library lets one write programs that are guaranteed to be able to load archives containing classes of a previous version. But there is no way to save classes in accordance with a previous version. At first I dismissed this as a huge project with small demand. However, a cursory examination of the code revealed that this would not be very difficult. It would require some small changes in code and some additional tests. It would also require special treatment in the documentation, perhaps a case study.
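To make the idea concrete, here is a minimal sketch that does not use the library itself. It assumes a toy text format and a hypothetical class `point` that gained a member `z` in version 1; the `target_version` parameter of `save` is also an assumption, standing in for whatever mechanism the library would eventually offer.

```cpp
#include <sstream>
#include <string>

// Hypothetical class whose layout changed between versions:
// version 0 stored x and y; version 1 added z.
struct point {
    int x = 0;
    int y = 0;
    int z = 0;
};

// Save in the layout of `target_version`. This parameter is the assumed
// "back versioning" hook; the class version is written first so that a
// loader knows which members follow.
inline void save(std::ostream& os, const point& p, unsigned target_version) {
    os << target_version << ' ' << p.x << ' ' << p.y;
    if (target_version >= 1) os << ' ' << p.z;
}

// Load any version up to the current one (1). Members that are missing
// from an older archive keep their default values.
inline bool load(std::istream& is, point& p) {
    unsigned version = 0;
    if (!(is >> version >> p.x >> p.y)) return false;
    if (version >= 1 && !(is >> p.z)) return false;
    return version <= 1;
}
```

Saving with `target_version` 0 simply omits `z`, so a program built against the older class definition could still read the result.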
Environments without RTTI
I note that some have commented that this library requires RTTI. This is not strictly true. The examples and almost all the tests presume the existence of RTTI. But it should be possible to use the library without it. The example used for testing is an extended_typeinfo implementation which presumes that all class names have been exported. So, to make this library compatible with platforms without RTTI, a set of tests, examples and a new manual section would have to be created.
Portable Archives (straszheim)
A method for testing portability was suggested: save and load to a portable binary archive, and verify that the checksum of the binary archive matches an expected checksum. This would involve development of the archive and tests.
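The suggested test could look roughly like the sketch below. It is not based on any existing archive class: `save_u32`, `load_u32` and `checksum` are hypothetical helpers, the byte order (little-endian) is an arbitrary choice, and the checksum is a toy polynomial hash where a real test would more likely use something like CRC-32.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian byte order, whatever the host
// byte order is. Shifting and masking never depends on host endianness.
inline void save_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Read a 32-bit little-endian value starting at byte offset `pos`.
inline std::uint32_t load_u32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return v;
}

// Toy order-sensitive checksum (polynomial hash, base 31, modulo 2^32).
// It changes if the bytes are reordered, which is the failure a
// portability test needs to detect.
inline std::uint32_t checksum(const std::vector<std::uint8_t>& bytes) {
    std::uint32_t h = 0;
    for (std::uint8_t b : bytes) h = h * 31u + b;
    return h;
}
```

A portability test would save a fixed set of values on every platform and compare `checksum` of the resulting bytes against a single recorded value.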
Maximum version check (straszheim)
See https://svn.boost.org/trac/boost/ticket/2830
Export name aliasing (straszheim)
If you change the name under which a class is BOOST_CLASS_EXPORT'ed you break backwards compatibility... you can't read older archives. In addition there is no way to have more than one class exported under the same name.
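One possible shape for aliasing is a name-to-factory table in which several names may resolve to the same class. The sketch below is purely illustrative: `export_registry`, `add` and `add_alias` are hypothetical names and do not reflect how BOOST_CLASS_EXPORT is implemented.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct shape {
    virtual ~shape() = default;
    virtual std::string kind() const = 0;
};
struct circle : shape {
    std::string kind() const override { return "circle"; }
};

// Hypothetical registry mapping export names to factories.
class export_registry {
public:
    using factory = std::function<std::unique_ptr<shape>()>;

    void add(const std::string& name, factory f) {
        factories_[name] = std::move(f);
    }

    // Make `alias` resolve to the same factory as `name`.
    // Returns false if `name` has not been registered.
    bool add_alias(const std::string& alias, const std::string& name) {
        auto it = factories_.find(name);
        if (it == factories_.end()) return false;
        factory copy = it->second;
        factories_[alias] = std::move(copy);
        return true;
    }

    // Create an object by export name; returns null for unknown names.
    std::unique_ptr<shape> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, factory> factories_;
};
```

With such a table, a class re-exported under a new name could keep its old name as an alias, so that archives written under the old name remain readable.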
Boost.Python
- Python 3.0 support
- Ability to extend the fundamental PyTypeObject used by boost.python
- Thread safety
- Py_Finalize support
- Easier methods to write to_python/from_python converters
Boost.Proto
Support for two-level (van Wijngaarden) grammars (niebler)
Proto is essentially a compiler construction toolkit for DSELs. It allows you to define the grammar for the DSEL, but currently has no native support for DSEL type systems. Support for two-level grammars would fill that hole.
The job would need a student who has experience with type theory and a solid grasp of C++ template metaprogramming. They can read more about the problem in this thread:
http://groups.google.com/group/boost-list/browse_frm/thread/df6ecfb0089b28fd
Serialization/Frames
A library based on an extension of the Archive concept, making it bidirectional. From Wikipedia: "A frame is a data packet of fixed or variable length which has been encoded by a data link layer communications protocol for digital transmission over a node-to-node link. Each frame consists of a header frame synchronization and perhaps bit synchronization, payload (useful information, or a packet at higher protocol layer) and trailer. Examples are Ethernet frames and Point-to-point protocol (PPP) frames."
The Boost.Serialization saving archive concept allows serializable data to be saved at the end of the archive, and the loading archive concept allows serializable data to be read from the beginning of the archive. The saving frame concept will allow serializable data to be saved at either the end or the beginning of the frame, and the loading frame concept will allow serializable data to be read from the beginning or the end of the frame. I'm not sure which syntax would be the most appropriate. The serialization library uses the <<, >>, and & operators, which associate from left to right. We need equivalent operators that associate from right to left. <<=, >>=, and &= seem to be the most natural candidates, but I don't know whether it is correct in C++ to define these operators with these prototypes:
template <typename T> frame& operator<<=(const T&, frame&);
template <typename T> frame& operator>>=(const T&, frame&);
template <typename T> frame& operator&=(const T&, frame&);
h1 >>= h2 >>= sf << t2 << t1
is equivalent to
(h1 >>= (h2 >>= ((sf << t2) << t1)))
and should be equivalent to
sa & h1 & h2 & t2 & t1
if sf and sa were empty.
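On the question of legality: C++ does allow compound assignment operators to be overloaded as non-member functions (only =, (), [] and -> must be members). They associate from right to left and bind less tightly than <<, which gives exactly the grouping shown above. The sketch below checks this with a toy `frame` that stores text tokens in a `std::deque`; the class and the helper `to_token` are assumptions made for illustration, not a proposed design.

```cpp
#include <deque>
#include <sstream>
#include <string>

// Toy frame (hypothetical): a sequence of text tokens that can grow
// at both ends.
class frame {
public:
    void push_front(const std::string& s) { items_.push_front(s); }
    void push_back(const std::string& s) { items_.push_back(s); }

    // Tokens joined by single spaces, front to back.
    std::string str() const {
        std::string out;
        for (const std::string& s : items_) {
            if (!out.empty()) out += ' ';
            out += s;
        }
        return out;
    }

private:
    std::deque<std::string> items_;
};

// Stand-in for real serialization: format any streamable value as text.
template <typename T>
std::string to_token(const T& v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Save at the end of the frame (associates left to right).
template <typename T>
frame& operator<<(frame& f, const T& v) {
    f.push_back(to_token(v));
    return f;
}

// Save at the beginning of the frame (associates right to left).
template <typename T>
frame& operator>>=(const T& v, frame& f) {
    f.push_front(to_token(v));
    return f;
}
```

Starting from an empty frame, `h1 >>= h2 >>= sf << t2 << t1` leaves the tokens in the order h1 h2 t2 t1.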
The main difference is that we can do it hierarchically. The top layer will create a frame, and serialize its own information elements.
frame sf;
h_n >>= sf << p_n << t_n;
Then this top layer will call a primitive of the lower level, passing the frame as the payload.
primitive_n-1(sf);
A primitive at level k will add its own header and trailer information elements to the payload of the upper level:
void primitive_k(frame& sf)
{
    // ...
    h_k >>= sf << t_k;
    // ...
    another_primitive_k_1(sf);
    // ...
}
So the frame allows serialization to proceed top-down. To get the same result with the archive concept, the serialization must be done bottom-up, which requires chaining each of the information elements in a list; only when we have all of them can we start the serialization. I think that the frame approach should be more efficient because it avoids storing the information elements to be serialized in dynamic memory; instead they are serialized directly.
Loading a frame works like loading an archive, except that we can also load from the end of the frame. This avoids having to load the complete archive in order to load the trailers of the lower levels.
lf >>= h_1;
// ...
t_1 << lf;
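As an illustration of loading from both ends, here is a small sketch using named member functions rather than operators, since the operator syntax is still open. `loading_frame`, `load_front` and `load_back` are hypothetical names, and the frame holds text tokens only to keep the example short.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Toy loading frame (hypothetical): elements can be consumed from
// either end without touching what lies in between.
class loading_frame {
public:
    explicit loading_frame(std::deque<std::string> items)
        : items_(std::move(items)) {}

    // Consume the first element (a header); false if the frame is empty.
    bool load_front(std::string& out) {
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Consume the last element (a trailer); false if the frame is empty.
    bool load_back(std::string& out) {
        if (items_.empty()) return false;
        out = std::move(items_.back());
        items_.pop_back();
        return true;
    }

    // Number of elements not yet consumed.
    std::size_t remaining() const { return items_.size(); }

private:
    std::deque<std::string> items_;
};
```

A lower layer can read its header and trailer this way and hand the frame, with the payload still unread, to the layer above.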
In addition, it would be great to have frames that are both saving and loading (I'm not sure, but I think that an archive cannot be saving and loading at the same time). The same frame can be used for loading, analyzing only some lower levels, and for saving, in order to construct other lower levels. This will be very useful for gateways and routers.
| L4 |<------------------------------>| L4 |
| L3 |    |     ADAPTATION     |      | L3 |
| L2 |<-->| L2 |          | X2 | <--> | X2 |
| L1 |<-->| L1 |          | X1 | <--> | X1 |
It would also be great if the same data that can be serialized to an archive could be serialized to a frame using the same save function. But this works only when we save at the end of the frame. Let me show this with a small example: class C has 3 information elements to serialize (a, b and c). So the save function could be something like
template <typename ARCHIVE>
void C::save(ARCHIVE& sa)
{
    sa & a & b & c;
}
This save function works well from left to right, but cannot be used when saving at the beginning of a frame, because the expected result when saving at the beginning is
a b c sa
but the result will be
c b a sa
So unfortunately we need a different save function
template <typename FRAME>
void C::save(FRAME& sa, begin_tag&)
{
    a >>= b >>= c >>= sa; // a >>= (b >>= (c >>= sa));
}
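The ordering problem can be reproduced with a very small sketch. It assumes a toy `frame` holding a string and single-character information elements; `save_naive` and `save_at_begin` are hypothetical names used only to contrast the two orders.

```cpp
#include <string>

// Toy frame (hypothetical): the serialized bytes as a string.
struct frame {
    std::string bytes;
};

// Save one character at the beginning of the frame.
inline frame& operator>>=(char v, frame& f) {
    f.bytes.insert(f.bytes.begin(), v);
    return f;
}

struct C {
    char a = 'a';
    char b = 'b';
    char c = 'c';

    // Same element order as `sa & a & b & c`, but each element is
    // prepended in turn, so the elements end up reversed.
    void save_naive(frame& sa) const {
        a >>= sa;
        b >>= sa;
        c >>= sa;
    }

    // Right-to-left chain: c is prepended first, then b, then a,
    // which leaves a b c in front of the existing content.
    void save_at_begin(frame& sa) const {
        a >>= b >>= c >>= sa;
    }
};
```

With a frame that already contains "S", `save_naive` yields "cbaS" while `save_at_begin` yields "abcS".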