wiki:SoC2010

Version 23 (modified by Andrew Sutton, 13 years ago) ( diff )

--

For information about Google Summer of Code see the program home page. This site contains information about registration, submitting proposals, and mentoring information. A calendar of dates and deadlines can be found here.

This page will be used to document any Boost-specific policies regarding student acceptance and project management and also serves as a place to list project ideas.

Previous Summer of Code projects can be found here.

Please see the submission template for information on submitting a student proposal. Don't forget to read the hints before proposing.

Project Ideas

Please list ideas for possible student projects.

Boost.Build

There are a few interesting improvements that can be done withing GSoC timeframe:

  • Signature-based update engine. This means that a MD5 checksum of a command used to update each target is recorded, and if that changes on the next run, the target is rebuilt. This will considerably improve reliability.
  • IDE integration. There's already initial plugin for Eclipse CDT plugin, which can be improved. Plugins for other IDEs, such as KDevelop or Visual Studio are also possible.
  • Packaging support. It would be nice to automatically create packages in popular distribution formats.
  • Python Port can offer many tasks, though it's relatively high challenge.

Boost.Process

A first version of Boost.Process was created in the SoC 2006 program (see http://code.google.com/soc/2006/boost/appinfo.html?csaid=7ADC016A60772A9C). Since then there have been several attempts to finish this library (one by myself in 2009). While there is considerable interest in this library noone had the time yet to do the job.

The status quo of the library (which is available at http://svn.boost.org/svn/boost/sandbox/process/) is documented at http://www.highscore.de/cpp/process/. There have been various discussions in the Boost mailing lists last year, too.

As far as I remember some problems are:

  • As platforms are very different when it comes to supporting higher level features it's not entirely clear how the architecture should look like (there have been different implementations in the past).
  • Some parts of the library could become libraries of their own or even be moved to other libraries like Boost.Interprocess (eg. pipes).
  • Supporting asynchronous I/O works already but doesn't really follow Boost.Asio patterns.
  • Unit tests are different than in other libraries (as processes have to be created) and have to be improved.

The goal would be to finish the library after 4 years that it can finally be reviewed. ;)

Boris, boris@…

Boost.XML

I started a 'boost.xml' library (right now in https://svn.boost.org/svn/boost/sandbox/xml/) a long time ago, which I have never found enough time and energy to bring to a formal boost submission. Having a proper XML library as part of boost would be very useful, in particular if "proper XML" goes beyond the basics such as parsing. (There already are lots of half-baked XML parsers, but few fully conform to the spec, or yield well performing in-memory representations that can be traversed via XML-specific tools such as XPath.)

The student needs to have a basic understanding of the XML and related specifications, as well as some exposure to existing XML APIs (such as DOM, SAX, XMLReader) to understand the problem domain.

Boost.Python

There are a number of ways in which the Boost.Python library may be improved.

  • Add (better) support for NumPy bindings.
  • Improve the converter registry to avoid conflicts when multiple extension modules try to expose the same C++ types.
  • Add bindings for other Python C API functions, such as exception handling.

Students working on this project should have some basic Python knowledge (as well as NumPy in the first case above) and past experience with Boost.Python.

Boost.Graph

For projects on the Boost.Graph library, students should have a working knowledge of graph theory and graph data structures. An sound understanding of techniques for generic programming are also required.

In addition to the graph library that is currently in Boost, Indiana University has also created parallel extensions to BGL (the Parallel Boost Graph Library) for distributed-memory computers. Several of the projects below would also be worthwhile to integrate into the parallel framework.

  • Topology Generators - One useful set of algorithms not included in the BGL is the ability to easily generate graphs of specific topologies (e.g., path, cycle, star, etc.). In addition to creating graphs of these topologies, it would be useful to induce these topologies on an existing set of vertices, and query a set of vertices to determine if the topology is admitted by a set of vertices.

  • Hierarchical Layout Algorithms - The Boost Graph Library (BGL) currently only implements a small number of graph layout algorithms, primarily spring-based layout algorithms. This project would involve implementing other layout algorithms, especially hierarchical algorithms such as that used in the dot program of Graphviz (http://www.graphviz.org/).
  • Semantic Graphs - A semantic graph is a graph that can have multiple distinct types of vertices and edges, with restrictions on connections between vertices and edges based on their types. There may also be more efficient representations of the graph based on storing different vertex and edge types separately. This project would be to define appropriate concepts, algorithms, and data structures for semantic graphs in BGL.
  • Computer Vision Algorithms - This project would be to implement computer vision algorithms based on max-flow problems on graphs using BGL. Implementing a specialized graph type for grids would also be needed in order to avoid storing the graph topology explicitly.
  • Closures and Reductions - Implement a suite of generic graph algorithms that computer closures and reductions on graphs. The BGL already contains a transitive_closure and a transitive_reduction. Other kinds of computable closures and reductions include reflexive, symmetric, and congruence. This project can also include predicates to test whether or not a graph exhibits these properties (e.g., is_symmetric).
  • Bindings to Python (or Other Languages) - BGL has some bindings to Python and R, but they do not exist for all algorithms. This project would be to either extend the Python or R bindings, or write a set of BGL bindings to another language.
  • Network Workbench - The Network Workbench (http://nwb.slis.indiana.edu/) is a GUI-based framework for analysis of large-scale networks. This project would involve implementing components for NWB based on BGL and/or Parallel BGL.

Heaps & Queues

There are a number of data structures in boost/pending that implement different kinds of heaps and queues that are used in a number of different Boost.Graph (BGL) algorithms. These can be cleaned up, decoupled from the BGL and redeveloped into a useful library for advanced data structures.

The library could include a binary (i.e. the std heap), fibonacci, binomial, and relaxed heaps. The binomial heap would be a new data structure. A second requirement is that heaps may be "mutable", meaning that a value already stored in the heap can be modified and the heap efficiently updated to accommodate the new change.

Priority queues are an adaptations on heap data structures.

This project requires a working knowledge of C++, templates and data structures.

Bits & Ints

Boost could use a utility library that brings together and extends a number of existing data structrues and utilities for working with binary data. This library might include:

  • Multidimensional bitfields or dynamic_bitfields.
  • Compressed value array and vector (like vector<bool> but also for multi-bit values)
  • Programming utilities for bit and bitmask manipulation

This may also cover a number of integral special functions, including:

  • Functions for getting carry bits from addition and the high-half of an integer multiply
  • Sign extension from a length of m bits to a length of n
  • Rounding right-shifts (correctly rounded integer divide by powers of 2)
  • Saturating arithmetic operations
  • Multi-word shifts (used for things like inserting in the middle of a vector<bool>)
  • Bit reversal
  • Two-word divide by one-word value (there is often a special instruction for that)
  • Bit-interleaved arithmetic operations (used for Morton order/cache-oblivious traversals of matrices and similar things)
  • Everything from chapter 5 of https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF7785256996007558C6

Students should be familiar with C++ templates, STL-style data structures, and computer arithmetic.

BigInt

Implement a "big integer" data type which supports all C++ arithmetic and binary operators to do arbitrary precision calculations. This data type might also be specialized for fixed size big integers once the algorithms are in place which gives the compiler a better chance to optimize and which gets rid of the allocator parameter. Basic operations are relatively easy to implement.

Simply wrapping libraries like GMP will be insufficient for solving this problem due to licensing issues.

Students will require knowledge of C++ and a have a good math background. Familiarity with literature (Knuth's Seminumerical Algorithms) or other code on the subject is a plus. A successful proposal will include a discussion of licensing concerns.

Sweepline Algorithm

2D medial axis, Veronoi diagrams and Delaunay triangulation are three classical problems in computational geometry. Veronoi diagrams are well known to be the dual graph of delaunay triangulation, so if you solve one you have solved the other. Medial axis is also related to the other two because it can also be solved with sweepline. All three could be solved by a single generic-parameterized implementation of a sweepline algorithm.

Student should be familiar with C++, generic programming techniques, and have some knowledge of computational geometry.

Porting to C++0x

Port your favorite Boost C++ Library to C++0x! Many of the mainstream (but experimental) compilers support a number of C++0x features including rvalue references and move semantics, lambda functions, variadic templates, auto and decltype type deduction, etc. Rewriting your favorite library will not only give Boost a head start in the migration process, it will help identify cases where the new language features might be used to improve the design of a library or data structure.

Students should have a strong background in C++, some knowledge of the C++0x language features, and experience using their suggested library. Proposals should include information about the kinds of C++0x features that could be used to improve the library when porting.

Boost.Phoenix

Boost.Phoenix is currently living inside the internals of Boost.Spirit. Meanwhile it has been accepted as a main citizen of Boost under the condition it gets rewritten in term of Boost.Proto. Currently, I developed a few prototypes of such a version but they still expose a lot of Proto internals themselves to the user, thus making the extension of Phoenix cumbersome for advanced users. This project aims at exploring the various existing prototype, propose a proper interface and defines an easy-to-use extension mechanism on top of these so Phoenix v3.0 can starts its own life.

Student should have a strong background in C++ and experience with Expression Templates with or without Boost.Proto. Base knowledge in functional programming could be an advantage.

Summer of Code Policies

Forthcoming...

Note: See TracWiki for help on using the wiki.