wiki:SoC2015

Version 8 (modified by Niall Douglas, 8 years ago) ( diff )

Added AFIO project ideas.

Boost Google Summer of Code 2015

Welcome to the Boost C++ Libraries' home page for Google Summer of Code (GSoC) 2015. This page provides information about suggested student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started writing with Boost.

Changes in process for both mentors and students over preceding years (PLEASE READ)

This year, 2015, is going to be slightly different from preceding years of how Boost has operated GSoC. The first new change was that last year we trialled extending the really superb GSoC 2014 projects by an additional three month period funded by Boost, and in 2014 the project we selected for extension was proposed Boost.Hana by student Louis Dionne mentored by Joel Falcou. If these extensions prove successful, we may continue to extend the really outstanding GSoC projects in 2015. Note that the Boost Steering Committee can only consider extensions where the mentor is more than happy to continue mentoring the project in an extension, and where the GSoC administrators and the other GSoC mentors and Boost Steering Committee members all unanimously agree on the merit of the extension. To reach such a standard is exceptionally hard, and students interested may wish to review Hana's source code to see what is demanded for a Boost funded GSoC extension.

How we are implementing student programming aptitude testing in 2015

The second new change is in how projects will be ranked by the community. Up until now, the community ranked student proposals by awarding a score to their written proposal, and we recommended to Google to select all those proposals which matched a single mentor to a single student based on the top of that score ranking. Unfortunately, this process has led to some disappointments for both student and mentor especially where the student was far better at writing proposal texts than writing code, and we also worry that the old system discriminates against non-native English speakers. As of this year, we therefore request a new part to student proposal ideas whether proposed by student or mentor, that being of an exam or test the students interested in that proposal can take to demonstrate their programming capability. In 2015, student proposals with accompanying ability test scores for that candidate will be preferentially ranked above proposals without test scores. If the suggested proposal from this ideas page or from preceding years written by a mentor lacks a programming test, or the student fails to supply answers to a test, their proposal will not be preferentially ranked no matter its otherwise quality or merit.

This runs the risk of excluding student proposals not suggested by this ideas page - in fact, Louis Dionne's Hana was not suggested here, it came entirely from Louis. We therefore add a second way of getting preferentially ranked: candidates who have written existing C++ libraries of at least 1,000 lines of code (excluding comments and whitespace) and who propose their own project, AND have prearranged a mentor through the Boost developer's mailing list, will also be preferentially ranked.

Let us be very clear here: if you propose your own project idea and haven't prearranged a mentor on the developer's mailing list, your proposal will NOT be preferentially ranked. To prearrange a mentor you almost certainly need to supply lengthy and substantial (> 1,000 lines) examples of C++ libraries you have already written.

The scoring by the community will work as before based on the community voting on proposals, but preferentially ranked candidates and proposals will be entirely consumed before unpreferentially ranked candidates and proposals, even if the candidate's exam results were poor (though note that in this case no mentor may wish to mentor such a student, and we shall have to exclude such a candidate as a result).

To help mentors write a student proposal idea matching the newly requested format, please see the "Concurrent hash tables" proposal as a template.

Requirements

Students must submit a proposal. A template for the proposal can be found here. Hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for Boost discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked before those that do not.

Projects from previous years can be found here. There are still a number of interesting projects found in these pages not listed here.

Github's for standalone GSoCs past and present

Since 2013 with Boost's transition to git we have kept a single umbrella org on github for those GSoCs which are fairly self standing. Incremental extensions to existing libraries usually enter that library's main git repo as an experimental branch. Here are those orgs:

Students may find examining past GSoC source code and commit histories of use.

Suggested GSoC project proposals

INSTRUCTIONS FOR ADDING SUGGESTED PROPOSALS

Please follow the following proposal structure. See the concurrent hash tables proposal below for an example.

No. Title

Potential mentors: names

Background

Explanation of proposal background, history etc with hyperlinks.

GSoC project proposal

Bullet points of items to complete for the GSoC.

Potential project extension funded by Boost

Not as much detail as the GSoC proposal itself, but enough detail that such an extension could be specced out.

If there is no good extension, simply say N/A here.

Programming competency test

Some code for them to write to prove they can program sufficiently. Ideally it should be a small part of the proposal itself modifying real Boost library code, but there is nothing wrong with an actual exam paper, except that maybe they might cheat if you give them too much time and flexibility in when to do it, so if setting an exam question make sure you say here it will be written and set to interested students on some date after GSoC applications close and they'll get X days to finish it.

How they submit their results is up to you. They can append them to their project proposal, or provide a gist, or email them to you for marking, or whatever.

If there is no programming competency test, please add your proposal to the second section below the special section for proposals without competency testing.

Self-contained standalone GSoC Projects with programming competency test

The following projects and programming competency test have been suggested by potential mentors. Selecting one of these, in consultation with the Boost developers mailing list, provides the highest chance that a mentor can be found for your GSoC project proposal and that your proposal will be preferentially ranked (see above). A template for the proposal can be found here.

1. Concurrent Hash Tables (boost::concurrent_unordered_[multi]set|[multi]map)

Potential mentors: Niall Douglas

Background

Strangely enough, Boost does not provide an implementation of thread safe concurrent associative sets and maps despite that the unordered implementation from Intel's Thread Building Blocks and Microsoft's Parallel Patterns Library was proposed some years ago for standardisation in N3425 Concurrent Unordered Associative Containers for C++. This is probably because that implementation (the split ordered list algorithm by Shalev and Shavit, 2006) had the serious shortcoming of not being thread safe for some operations, most importantly item deletion, and the workarounds proposed by N3425 did not appeal. Moreover, there are other concurrent hash table algorithms which do not have such shortcomings.

It is long overdue for Boost to gain a thread safe concurrent associative containers implementation, and to that end you will find a concurrent_unordered_map prototype here with its doxygen documentation here. This particular design provides a complete STL compliant interface with every operation being thread safe and most operations permitting concurrency, but sacrifices the constant time performance of empty() and size() to achieve this, plus rehash() is extremely slow. Moreover, its find() is only completely wait free if compiled with a transactional memory C++ compiler.

GSoC project proposal

  1. To refactor the implementation of the concurrent_unordered_map prototype into a generic, reusable layer from which the following containers can be implemented:
    1. boost::concurrent_unordered_set
    2. boost::concurrent_unordered_multiset
    3. boost::concurrent_unordered_map
    4. boost::concurrent_unordered_multimap
  1. To complete the following missing functionality:
    1. Const iterators
    2. Const member functions
    3. Local iterators
    4. Copy and move construction plus copy and move assignment
    5. Better transactional memory C++ compiler support.
  1. To write unit, functional and performance regression testing for all of the above, including exception safety testing and multi threaded testing, to a 100% code coverage level.
  1. To write documentation to Boost quality levels for the new container classes, including time and space complexity guarantees and exception guarantees for each API and each use of each API.

Potential project extension funded by Boost

To implement ordered and/or nearly-ordered concurrent associative containers in addition to unordered ones.

Programming competency test

Implement item 2d (copy and move construction and copy and move assignment) for the concurrent_unordered_map prototype, and more importantly complete and thorough unit and functional and performance testing for the new functionality (add the tests to this file). You will find studying the rehash() implementation ought to make the implementation of this easy, but the true test of your competency will be in how well you design and write the testing for the new functionality.

Submission of the programming test should be via copying and pasting the parts you wrote into the end of the proposal you submit to Google Melange. Do NOT copy and paste the entire class or whole files!

2. Asynchronous file i/o backend for Linux, FreeBSD, Mac OS X or WinRT

Potential mentors: Niall Douglas

Background

A library called Boost.AFIO implementing portable asynchronous file i/o was ported to Boost during GSoC 2013 (documentation, github). It provides a fully asynchronous backend for the NT kernel directly using the NT kernel API, plus a generic POSIX backend which generates concurrency in the filing system using a thread pool of workers. It does not yet provide native asynchronous backends for Linux (KAIO), FreeBSD (POSIX AIO + kqueues), Mac OS X (unsure if this is technically feasible) nor WinRT.

GSoC project proposal

  1. For a Linux backend: To draw up a design document describing how the afio::async_file_io_dispatcher_base interface should be implemented using a Linux KAIO API. You can find more information about the Linux KAIO API here. Note that the Linux KAIO API is only partially asynchronous, and therefore will need to still be combined with a thread pool - therefore some elements of afio::async_file_io_dispatcher_compat ought to be reused, perhaps even subclassed.

For a FreeBSD backend: To draw up a design document describing how the afio::async_file_io_dispatcher_base interface should be implemented using a combination of the POSIX AIO API and the kqueue EVFILT_AIO event watcher via which completions of asynchronous file operations are notified back to AFIO. As the POSIX AIO API only covers reading and writing, it will almost certainly subclass afio::async_file_io_dispatcher_compat.

For a Mac OS X backend: OS X lacks the kqueue EVFILT_AIO interface, and therefore requires POSIX AIO clients to be notified of completions via signals. This probably makes a native backend for OS X technically infeasible, but feel free to submit a proposal proving me wrong!

For a WinRT backend: To draw up a design document describing how the afio::async_file_io_dispatcher_base interface should be implemented using the WinRT API. WinRT provides a rich suite of fully asynchronous file i/o APIs (see Windows.Storage), and when combined with the experimental C++ 17 resumable functions support in Visual Studio 2015 it should make the implementation of this backend quite straightforward.

  1. To implement one of the above designs.
  1. To write unit, functional and performance regression testing for all of the above, including exception safety testing and multi threaded testing, to a 100% code coverage level (excluding fatal abort handling code).
  1. To write documentation to Boost quality levels for the new backend, including benchmarking the new backend against the compatibility backend under cold, warm and semi-warm file cache conditions on magnetic and solid state storage devices.

Potential project extension funded by Boost

  1. Fast, scalable portable directory contents change monitoring.
  1. Extended attributes support.
  1. ACL support.
  1. A Filesystem TS and iostreams wrapper for AFIO.

Programming competency test

Using Boost.ASIO (not AFIO!), write a program which registers a Linux KAIO, FreeBSD POSIX AIO + kqueues or WinRT asynchronous file i/o operation with ASIO (either a file read or a file write). ASIO should invoke a handler which prints to stdout "Hello file i/o completion" when the asynchronous file i/o operation completes.

Self-contained standalone GSoC Projects (no competency test)

The following projects have been suggested by potential mentors. Selecting one of these, in consultation with the Boost developers mailing list, provides the highest chance that a mentor can be found for your GSoC project proposal. You should only choose one of these proposals if you already have at least 1,000 lines of C++ library code out there we can examine for your programming competency, otherwise your proposal cannot be preferentially ranked (see above). Make SURE you provide a link to your existing C++ code base in the proposal you submit to Melange. A template for the proposal can be found here.

1. Boost.odeint Implicit Routines

Potential mentors: Karsten Ahnert and Mario Mulansky

Background

Boost.odeint http://www.odeint.com is a library devoted to finding numerical solutions of ordinary differential equations (ODEs). It is developed in a generic way using Template Metaprogramming which leads to extraordinary high flexibility at top performance. It is widely used in academic and industrial projects. Boost.odeint has been a Google Summer of Code project in 2011 http://google-melange.appspot.com/gsoc/project/google/gsoc2011/mulansky/14001 and we would like again to see students involved in the development of this numerical library.

The current focus of odeint is on explicit routines, namely the Runge-Kutta schemes or the multi-step methods. However, we would like to expand odeint by adding implicit routines in the same flexible manner. Implicit routines are important when dealing with stiff systems, but also ensure stability in some cases of discretized partial differential equations (PDEs). At the moment, odeint provides an implicit Euler and Rosenbrock algorithm, but the implementation is restricted to Boost.uBlas. As one of the main objectives of odeint is to provide highly flexible algorithms, we want to change this implementation to have a greater flexibility and interchangeability similar to that of the explicit routines. This project would not only require pure coding, but first also a considerable amount of design work.

GSoC project proposal

This project does not only require profound knowledge on C++ and generic programming, but we would also like to see some experience on numerical algorithms and solving ODEs, if possible.

  • Develop a design on how the requirements on implicit routines can be modularized
  • Implement a prototype showing the usability of the design
  • Change the existing routines to the new design
  • Implement other backends (MTL, eigen,...) to demonstrate the flexibility
  • Provide examples and documentation
  • (Add more implicit routines, if time permits)

Potential project extension funded by Boost

N/A

Ideas

If after reading all of the standalone project ideas above you are finding that none excites you, we have collected below a series of smaller work items none of which alone would be big enough to make up a full GSoC, but which could be combined to form a very valuable GSoC indeed. The following would be excellent start points for students to propose projects themselves (see below on how to do this). While these ideas are by far not as concrete as projects in the list above, they give students an idea what else they can work on within Boost. They also give students a chance to propose exactly the kind of project they'd like to work on instead of picking one from the predefined projects from above.

Students interested in proposing a mix of the below work items should propose their mix on the Boost mailing list using the instructions below. We will then figure out a suitable mentor for you.

How to propose a mix of the below work items on the Boost mailing list

Please do not simply arrive on the list and ask "I want to do (insert some random project idea here) because I don't want to do one of the predesigned GSoC projects on the list. Give me a mentor." as you will likely be ignored as a probable time waster. If you are to propose your own GSoC project, it must be of equivalent quality to the suggested GSoC project proposals written by mentors already on the ideas page, and by "equivalent", we really do mean equivalent i.e. very high, and probably including a programming competency test of your own design. This is why mentors take the time to write project proposals for you, because for mentors who are domain experts in their field it is easier to write a high quality GSoC proposal likely to pass peer review and Google's own proposal review than for students who are not domain experts. If you are still really sure you want to invest the very considerable work to propose your own project, do the following before writing to the Boost mailing list with your own proposal:

  1. Research prior art i.e. provide proof in your approach email that you have researched alternative implementations in C++ and other languages of your project proposal. A list of pros and cons of those alternative implementations would be useful and show you really did study the problem.
  1. Have a well designed, concrete, realistic as a summer project design based on synthesis of the prior art research. It's great to list known unknowns as far as possible. When designing, don't forget well over half the effort in writing a new Boost library goes exclusively into writing documentation and writing unit and functional testing. This is a big reason we need students to have good English, because a lot of writing Boost code is actually writing its documentation.
  1. Ideally students who provide proof they have successfully written a high quality STL algorithm implementation before i.e. can supply a link to an open source copy of their work are far more likely to be well received than those students with no such proof of capability and depth of understanding just how hard writing high quality C++ code is. Be aware we don't really care what grades you received in some CS class - we want to see code you've written.
  1. Failing item three, any evidence or proof demonstrating a good work ethic, and are humble and willing to learn, is always helpful. Good examples of such evidence are code commits fixing bugs in open source projects, or a history of improving documentation for open source projects, or a history of porting code to new platforms or fixing platform specific build problems - all these are excellent pieces of proof because we can verify their accuracy and look at the quality of your work.

If you can provide at least two of the above four points, your custom GSoC proposal is highly likely to be taken seriously, and you'll get plenty of feedback on what needs fixing, plus mentors will volunteer themselves. Miss all of the above factors and no one - student or otherwise - will be taken seriously, and any emails by you to the Boost developers list will probably be ignored.

With all that said, here are the smaller work items which could be combined to form a very valuable GSoC project:

Boost libraries (extending/overhauling/finishing)

Algorithms

  • Radix sort
  • Approximate string matching
  • Full text search
  • Near Duplicate Detection (shingling)
  • Parallel algorithms (sort, for_each)
  • Algorithms for gpgpu
  • Kinetic scrolling

Data strutures

  • Trie data structure, extending the work done during GSoC 2013.
  • B-tree data structure, extending the work previously done by Beman Dawes.
  • Concurrent containers (unordered_map, unordered_set, vector, forward_list)
  • Slim string (as opposed to the fat std::string interface, maybe immutable, maybe policy based)

Media

  • Hardware graphics (OpenGL/OpenGL ES/DirectX abstraction)
  • Audio library (OpenAL/FMOD/etc)
  • Video processing library (gil for video)

Games

  • Physics library (ODE/Havok/PhysX/etc abstraction)
  • Input library (like DirectInput/XInput)
  • Ranking algorithms (elo/TrueSkill)

Databases

Math

  • Geometry library (convince the developers to submit Eigen 3 to Boost)
  • Investigate exporting multiple-precision types to Python via Boost.Python and interoperability with "symbolic python". See: http://sympy.org/en/index.html

Memory

  • Memcache library
  • Memswap algorithm

File formats

  • JSON parsing libary
  • XML library

Communication

GUI

  • GUI library

C++ "extensions"

Hardware

  • CPUID
Note: See TracWiki for help on using the wiki.