
Version 15 (modified by David Bellot, 10 years ago)

--

Google Summer of Code 2011

Welcome to the Boost C++ Libraries' home page for Google Summer of Code (GSoC). This page provides information about student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started with Boost.

This year Boost is looking to fund work on a number of different kinds of proposals:

  • toolkit-like extensions to existing libraries,
  • finishing or extending sandbox libraries,
  • new data structures and algorithms, and
  • multiple competing proposals for the same project.

For projects involving new or experimental libraries, the process of getting source code "Boost-branded" can take much longer than a single summer. In many cases, it can take much longer than a single year. Even if a library is accepted, there is an expectation that the original author will continue to maintain it. Building a library as part of Boost can easily entail a multi-year commitment. For this reason, we are willing to consider multi-year GSoC projects. However, prospective students must limit the scope of their work to a single summer. We may invite the most successful students to re-apply in 2012.

Requirements

Students must submit a proposal. A template for the proposal can be found here. Hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for Boost discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked ahead of those who do not.

Projects

The following projects have been suggested by potential mentors. If the descriptions of these projects seem a little vague... well, that's intentional. We are looking for students to develop requirements for their proposals by doing initial background research on the topic, and by interacting with the community on the mailing list to help identify expectations.

Projects from previous years can be found here. There are still a number of interesting projects found in these pages.

Boost.Polygon edge, polyline and edge set concepts

The polygon library is missing an edge concept. There is a lot of code in the library that operates on edges, but no interface that exposes those operations to the user. There are also a number of "edge set" algorithms of interest, so in addition to an edge concept the library needs an "edge set" concept. Operations on edge sets include booleans (intersection, union) and connectivity extraction. Generalizing these to map overlay of edge sets would allow a single generic algorithm to implement all of these operations, similar to how polygon set operations are already implemented. The addition of a polyline concept, of which polygon would be a refinement restricted to closed-cycle polylines, would allow polygons to interoperate with edge sets in a very natural and productive syntax. This project would be a great opportunity to learn about concept-based generic programming and type systems, how to design and implement generic algorithms, and computational geometry in general.
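As a rough illustration of what a traits-based edge concept could look like, here is a minimal sketch. All names in it (edge_traits, my_point, my_segment, orientation) are hypothetical and chosen for this example only; they are not taken from Boost.Polygon, and the real design is for the student to work out.

```cpp
#include <cassert>

// Hypothetical concept-mapping point: specialize this for any user type
// that should model "edge". It is left undefined for all other types.
template <class Edge>
struct edge_traits;

// A user-defined segment type that knows nothing about the library.
struct my_point   { int x, y; };
struct my_segment { my_point a, b; };

// Mapping my_segment onto the (hypothetical) edge concept.
template <>
struct edge_traits<my_segment> {
    typedef int coordinate_type;
    // end == 0 selects the first endpoint, any other value the second.
    static coordinate_type x(const my_segment& s, int end) { return end == 0 ? s.a.x : s.b.x; }
    static coordinate_type y(const my_segment& s, int end) { return end == 0 ? s.a.y : s.b.y; }
};

// A generic algorithm written only against edge_traits.
// Returns > 0 if (px, py) lies to the left of the directed edge,
// < 0 if it lies to the right, and 0 if it is collinear with it.
template <class Edge>
long long orientation(const Edge& e, long long px, long long py)
{
    typedef edge_traits<Edge> tr;
    const long long ax = tr::x(e, 0), ay = tr::y(e, 0);
    const long long bx = tr::x(e, 1), by = tr::y(e, 1);
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}
```

The algorithm never names my_segment, so any type with an edge_traits specialization can be passed to it. An "edge set" concept could be layered on top in the same way.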

Mentor: Lucanus J. Simonson

Boost.Python and NumPy

Boost.Python currently has limited support for NumPy arrays. The library can be extended to support and interoperate with these and other Python data structures.

Mentor: Stefan Seefeld

Checks & Hashes

Check strings and digits are an invaluable tool for avoiding mistakes in data entry, storage and transmission.

There are many public algorithms available, but no coherent collection of C++ functions.

The suggested project is to provide such a collection in a coherent format, fully tested using Boost.Test (including tests with various kinds of faulty input) and fully documented to Boost quality, using Quickbook, Doxygen, and AutoIndex, in both HTML and PDF.

A key target is to get it to a finished state, rather than to deal with all possible check types.

Much code is already available from Boost and elsewhere (and I can contribute some to get off to a quicker start), so the project involves gathering, testing and documenting it rather than much complex coding.

A key design decision the student must make is what format (and names) the functions should take.

Any platform is OK, but the build process must be driven by bjam. A good demonstration would be to 'package up' something trivially simple like ISBN, or something from Boost's cyclic redundancy checks, preparing a jamfile, some Boost-style tests, and some skeleton documentation in Quickbook.
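To give an idea of the scale of such a demonstration, here is a minimal sketch of an ISBN-10 validity check. The function name and signature are assumptions made for this example (choosing them is exactly the design decision mentioned above), and the choice to skip hyphens and spaces is also an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of any existing Boost library.
// Returns true if `s` is a valid ISBN-10. Hyphens and spaces are ignored.
bool is_valid_isbn10(const std::string& s)
{
    int sum = 0;
    int pos = 0; // number of significant characters consumed so far
    for (char c : s) {
        if (c == '-' || c == ' ') continue;
        if (pos >= 10) return false; // more than ten significant characters
        int value;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            value = c - '0';
        } else if ((c == 'X' || c == 'x') && pos == 9) {
            value = 10; // 'X' means 10 and is only legal as the check character
        } else {
            return false; // any other character is faulty input
        }
        sum += (10 - pos) * value; // weights run 10, 9, ..., 1
        ++pos;
    }
    // Valid when exactly ten characters were seen and the weighted sum
    // is divisible by 11.
    return pos == 10 && sum % 11 == 0;
}
```

A Boost.Test suite for this would cover valid numbers, a wrong check digit, wrong length, and illegal characters.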

A few sample possible checks:

Simple modulo 256 etc check values and digits.

Boost's cyclic redundancy check codes http://www.boost.org/doc/libs/1_45_0/libs/crc/index.html

http://www.netrino.com/Embedded-Systems/How-To/CRC-Calculation-C-Code

  • crc_16_type: BISYNCH, ARC
  • crc_ccitt_type: designated by CCITT (Comité Consultatif International Télégraphique et Téléphonique)
  • crc_xmodem_type: XMODEM
  • crc_32_type: PKZip, AUTODIN II, Ethernet, FDDI

MD5 hash http://www.md5.net/

SHA hashes http://en.wikipedia.org/wiki/SHA-1 ...

Luhn algorithm http://en.wikipedia.org/wiki/Luhn_algorithm

Verhoeff algorithm http://en.wikipedia.org/wiki/Verhoeff_algorithm

(These two are used by many of the others below).

European Article numbering EAN Symbol Specification Manual,

Universal Product Code, Uniform Code Council, Dayton, Ohio, USA.

Version of check used by Mastercard, VISA, and most other credit card companies. http://www.beachnet.com/~hstiles/cardtype.html

A version generalised to an arbitrary radix, allowing any characters (not just digits). Gene Callahan, Dr Dobb's Journal, Dec 1995, 131, 132 & 149. Generating Sequential keys in an Arbitrary Radix.

IBAN International Banking format http://en.wikipedia.org/wiki/International_Bank_Account_Number

ISBN http://en.wikipedia.org/wiki/International_Standard_Book_Number

ISSN http://en.wikipedia.org/wiki/International_Standard_Serial_Number

And there are many, many more potential candidates.
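As an example of how small most of these checks are, here is a minimal sketch of Luhn validation. The function name is hypothetical, and ignoring spaces and hyphens is an assumption of this sketch rather than part of the algorithm.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns true if the digits in `s` pass the Luhn check.
// Spaces and hyphens are skipped. Any other non-digit makes the input invalid.
bool luhn_is_valid(const std::string& s)
{
    int sum = 0;
    int count = 0;          // number of digits seen
    bool double_it = false; // the rightmost digit is not doubled
    for (std::string::const_reverse_iterator it = s.rbegin(); it != s.rend(); ++it) {
        const char c = *it;
        if (c == ' ' || c == '-') continue;
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        int d = c - '0';
        if (double_it) {
            d *= 2;
            if (d > 9) d -= 9; // same as adding the two digits of the product
        }
        sum += d;
        double_it = !double_it;
        ++count;
    }
    // Require at least two digits: a payload digit and a check digit.
    return count > 1 && sum % 10 == 0;
}
```

For instance, "79927398713" passes, while changing its last digit makes it fail.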

Mentor(s): Paul A. Bristow and others?

SIMD library

SIMD is a class of processor instruction sets that allow an operation to be executed on multiple data elements simultaneously; these instructions are also referred to as vector instructions.
Popular examples of SIMD instruction sets include MMX, SSE, and AltiVec.

A SIMD abstraction component has been in development for several years as part of the NT2 project, and work is under way to turn it into a Boost library.

The project involves consolidating support for non-Intel processors, in particular the AltiVec instruction set, and polishing the library in all respects (better docs, examples, tests, benchmarks and general boostification improvements).
Benchmarks are especially important given the nature of the library, and are needed to validate the work that has been done.

Implementing saturated arithmetic (i.e. values stay at the minimum or maximum instead of overflowing) could also be part of the project. These operations are provided directly by AltiVec, but a software fallback should be implemented as well.
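A scalar software fallback for saturated addition could look roughly like the sketch below. The function names are hypothetical and not taken from NT2. The sketch assumes integer element types narrower than 64 bits, and the element-wise loop merely stands in for a real vector operation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical fallback: add two integers, clamping to the range of T
// instead of wrapping around.
template <class T>
T saturated_add(T a, T b)
{
    static_assert(std::is_integral<T>::value && sizeof(T) < sizeof(std::int64_t),
                  "sketch only handles integer types narrower than 64 bits");
    // Compute in a wider type so the intermediate sum cannot overflow.
    const std::int64_t sum = static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
    const std::int64_t lo = std::numeric_limits<T>::min();
    const std::int64_t hi = std::numeric_limits<T>::max();
    return static_cast<T>(sum < lo ? lo : (sum > hi ? hi : sum));
}

// Element-wise version over a fixed-size pack, standing in for a SIMD register.
template <class T, std::size_t N>
std::array<T, N> saturated_add_each(const std::array<T, N>& a, const std::array<T, N>& b)
{
    std::array<T, N> out;
    for (std::size_t i = 0; i < N; ++i) out[i] = saturated_add(a[i], b[i]);
    return out;
}
```

With 8-bit unsigned elements, 200 + 100 yields 255 rather than wrapping to 44. Benchmarks would compare a fallback like this with the native instructions where they exist.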

A talk is planned for BoostCon 2011 (May 15-20) to demonstrate the library, which by that point will already be somewhat boostified by NT2 developers.

SSH access to PowerPC G5 and Cell computers will be provided for carrying out the work.

Requirements for the students are as follows:

  • Solid knowledge of modern C++, and comfort with both low-level code that deals with memory and template code.
    Experience with Boost.Proto is a plus, but not strictly required.
  • Basic understanding of SIMD. AltiVec is very orthogonal, and therefore very simple, in comparison to SSE, so it is quick to pick up.

Mentors: Joel Falcou and Mathias Gaunard

ConceptTraits library

The ConceptTraits library was abandoned when Concepts were due to become part of the C++ standard. Unfortunately, the concepts feature will be missing from the next standard.

This library was composed mainly of 3 parts:

  • operators traits
  • macros to generate member traits
  • concept traits

The first two parts are now covered by the Boost.TypeTraits operator extension (reviewed in March) and Boost.TTI, respectively.

It would be great to finish the third part and adapt it to the new libraries.
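To make "concept traits" concrete, here is a minimal sketch of a trait that reports at compile time whether a type meets a simple requirement (being comparable with <). The trait name and the example type are hypothetical, and the sketch relies on C++11 expression SFINAE, which is an assumption about the available compilers rather than how ConceptTraits itself is written.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical concept trait: true when `a < b` is a valid expression for
// two const T objects and its result is convertible to bool.
template <class T, class Enable = void>
struct is_less_than_comparable : std::false_type {};

template <class T>
struct is_less_than_comparable<
    T,
    typename std::enable_if<
        std::is_convertible<
            decltype(std::declval<const T&>() < std::declval<const T&>()),
            bool>::value>::type>
    : std::true_type {};

// Example type with no operator<, used only to show the negative case.
struct no_order {};

static_assert(is_less_than_comparable<int>::value, "int models the requirement");
static_assert(is_less_than_comparable<std::string>::value, "so does std::string");
static_assert(!is_less_than_comparable<no_order>::value, "no_order does not");
```

A full concept trait would combine several such checks (operators, member types, member functions) into one query, which is where the operator traits and Boost.TTI would come in.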

Boost.ConceptTraits

Mentor: Vicente J. Botet Escriba

Boost.Process

Boost.Process should become the library to manage system processes. While a first version was created in 2006 (in a GSoC project), we never managed to finish the library. Not that there were no attempts in the past years: we even had a review in February 2011. However, developers in the Boost community were not happy even with the latest version (known as 0.4).

A lot of work was done in the GSoC 2010 program, when we created the current version 0.4. You should make yourself familiar with that version: please find the documentation at http://www.highscore.de/boost/gsoc2010/ and the library at http://www.highscore.de/boost/gsoc2010/process.zip. Please also read the conclusions of the review at http://article.gmane.org/gmane.comp.lib.boost.user/66363. They give you an idea of what the next steps will be and what you could work on.

Mentor: Boris Schaeling (boris[at]highscore.de)

Boost.uBLAS

First of all, we have a page listing future and desired new features here: http://sourceforge.net/apps/mediawiki/ublas/index.php?title=Main_Page. Students are encouraged to consult this page.

Boost.uBLAS is a fast implementation of linear algebra procedures or, for short, a vector and matrix library. In fact it is a vector AND matrix library, and this is one of the main problems. Vectors are matrices, at least in standard math textbooks, but not in Boost.uBLAS: they are represented as two separate classes and share very little code, not enough to be efficient. Moreover, because vectors are treated as fixed-size vectors, they are not as versatile as STL vectors (though that is a different concept) and not as accurate as true linear algebra vectors, in that they do not implement the notion of being a row vector or a column vector.

Put like that, it may not seem very important, but by merging the vector and matrix classes we could share a lot of code and optimize even further. Moreover, with a unified architecture we could start implementing modern acceleration techniques that Boost.uBLAS lacks, such as SIMD computations, multi-core, GPU, etc.

Our ideas for a GSoC project are the following:

  • unify the representation of vectors and matrices into a single matrix class. Keep compatibility with previous code by providing a default vector<> representation that would inherit from matrix<>. Improve it too by adding more template parameters like size and orientation.
  • use this new architecture to propose implementations for the following:
    • fixed-sized vectors and matrices with optimization
    • a true * operator for vector/vector, vector/matrix and matrix/matrix multiplication
    • an architecture to choose at compile time the best algorithm to apply if several are available (very relevant to multiplication for example),
    • ideas and examples on how to implement SIMD and multicore operations.
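The sketch below shows one possible shape for these ideas: a single fixed-size matrix class, vectors expressed as one-row or one-column matrices, a true * operator, and a compile-time choice of multiplication algorithm by tag dispatch. Every name in it is hypothetical and none of it is current uBLAS API. For brevity it uses template aliases instead of the inheritance mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical unified fixed-size matrix, stored row-major.
template <class T, std::size_t Rows, std::size_t Cols>
struct matrix {
    std::array<T, Rows * Cols> data{}; // value-initialized, so zero for arithmetic T
    T& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// Vectors are matrices with one column or one row, so orientation is in the type.
template <class T, std::size_t N> using column_vector = matrix<T, N, 1>;
template <class T, std::size_t N> using row_vector    = matrix<T, 1, N>;

// Tags naming the available multiplication algorithms.
struct naive_tag {};
struct unrolled_2x2_tag {};

// Compile-time selector: the naive loop by default, a special case for 2x2.
template <std::size_t R, std::size_t K, std::size_t C>
struct product_algorithm { typedef naive_tag type; };
template <>
struct product_algorithm<2, 2, 2> { typedef unrolled_2x2_tag type; };

// General triple loop.
template <class T, std::size_t R, std::size_t K, std::size_t C>
matrix<T, R, C> multiply(const matrix<T, R, K>& a, const matrix<T, K, C>& b, naive_tag)
{
    matrix<T, R, C> out;
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j)
            for (std::size_t k = 0; k < K; ++k)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}

// Hand-unrolled 2x2 product, chosen automatically for that size.
template <class T>
matrix<T, 2, 2> multiply(const matrix<T, 2, 2>& a, const matrix<T, 2, 2>& b, unrolled_2x2_tag)
{
    matrix<T, 2, 2> out;
    out(0, 0) = a(0, 0) * b(0, 0) + a(0, 1) * b(1, 0);
    out(0, 1) = a(0, 0) * b(0, 1) + a(0, 1) * b(1, 1);
    out(1, 0) = a(1, 0) * b(0, 0) + a(1, 1) * b(1, 0);
    out(1, 1) = a(1, 0) * b(0, 1) + a(1, 1) * b(1, 1);
    return out;
}

// One operator* for vector/vector, vector/matrix and matrix/matrix products.
// Mismatched inner dimensions simply fail to compile.
template <class T, std::size_t R, std::size_t K, std::size_t C>
matrix<T, R, C> operator*(const matrix<T, R, K>& a, const matrix<T, K, C>& b)
{
    return multiply(a, b, typename product_algorithm<R, K, C>::type());
}
```

With this layout, a row vector times a column vector gives a 1x1 matrix (the dot product) through the same operator. A SIMD or multicore kernel would be one more tag plus a selector specialization.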

Inspiration from other libraries like Eigen, Armadillo, GotoBLAS, etc. is highly recommended (after all, that's one of the raisons d'être of Free Software).

Mentor: David Bellot (david.bellot[at]gmail.com)
