wiki:SoC2015

Version 16 (modified by Niall Douglas, 8 years ago) ( diff )

Added the prearranged projects

  1. Boost Google Summer of Code 2015
      1. Changes in process for both mentors and students over preceding years …
        1. How we are implementing student programming aptitude testing in 2015
    1. Requirements
      1. Github's for standalone GSoCs past and present
  2. Suggested GSoC project proposals
    1. Self-contained standalone GSoC Projects with programming competency test
      1. 1. Concurrent Hash Tables …
        1. Background
        2. GSoC project proposal
        3. Potential project extension funded by Boost
        4. Programming competency test
      2. 2. Asynchronous file i/o backend for Linux, FreeBSD, Mac OS X or WinRT
        1. Background
        2. GSoC project proposal
        3. Potential project extension funded by Boost
        4. Programming competency test
      3. 3. Boost Documant Library development
        1. Background
        2. GSoC project proposal
        3. Links
      4. 4. Boost.uBLAS a library for matrix computations
        1. Background
        2. GSoC project proposal
        3. Programming competency test
    2. Self-contained standalone GSoC Projects (no competency test)
      1. 1. Boost.odeint Implicit Routines
        1. Background
        2. GSoC project proposal
        3. Potential project extension funded by Boost
  3. GSoC Projects which have been prearranged by students with mentors for 2015
    1. 1. Safe Float (boost::safe_numerics<float>) (Damian Vicino mentored by …
      1. Background
      2. GSoC project proposal
    2. 2. Improve associative data structures in Hana (Louis Dionne mentored …
      1. Background
      2. GSoC project proposal
  4. Mentors who may be available for GSoC in 2015
    1. 1. Kyle Lutz, Boost.Compute
  5. Ideas
      1. How to propose a mix of the below work items on the Boost mailing list
    1. Boost libraries (extending/overhauling/finishing)
    2. Algorithms
    3. Data strutures
    4. Media
    5. Games
    6. Databases
    7. Math
    8. Memory
    9. File formats
    10. Communication
    11. GUI
    12. C++ "extensions"
    13. Hardware

Boost Google Summer of Code 2015

Welcome to the Boost C++ Libraries' home page for Google Summer of Code (GSoC) 2015. This page provides information about suggested student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started writing with Boost.

Changes in process for both mentors and students over preceding years (PLEASE READ)

This year, 2015, is going to be slightly different from preceding years of how Boost has operated GSoC. The first new change was that last year we trialled extending the really superb GSoC 2014 projects by an additional three month period funded by Boost, and in 2014 the project we selected for extension was proposed Boost.Hana by student Louis Dionne mentored by Joel Falcou. If these extensions prove successful, we may continue to extend the really outstanding GSoC projects in 2015. Note that the Boost Steering Committee can only consider extensions where the mentor is more than happy to continue mentoring the project in an extension, and where the GSoC administrators and the other GSoC mentors and Boost Steering Committee members all unanimously agree on the merit of the extension. To reach such a standard is exceptionally hard, and students interested may wish to review Hana's source code to see what is demanded for a Boost funded GSoC extension.

How we are implementing student programming aptitude testing in 2015

The second new change is in how projects will be ranked by the community. Up until now, the community ranked student proposals by awarding a score to their written proposal, and we recommended to Google to select all those proposals which matched a single mentor to a single student based on the top of that score ranking. Unfortunately, this process has led to some disappointments for both student and mentor especially where the student was far better at writing proposal texts than writing code, and we also worry that the old system discriminates against non-native English speakers. As of this year, we therefore request a new part to student proposal ideas whether proposed by student or mentor, that being of an exam or test the students interested in that proposal can take to demonstrate their programming capability. In 2015, student proposals with accompanying ability test scores for that candidate will be preferentially ranked above proposals without test scores. If the suggested proposal from this ideas page or from preceding years written by a mentor lacks a programming test, or the student fails to supply answers to a test, their proposal will not be preferentially ranked no matter its otherwise quality or merit.

This runs the risk of excluding student proposals not suggested by this ideas page - in fact, Louis Dionne's Hana was not suggested here, it came entirely from Louis. We therefore add a second way of getting preferentially ranked: candidates who have written existing C++ libraries of at least 1,000 lines of code (excluding comments and whitespace) and who propose their own project, AND have prearranged a mentor through the Boost developer's mailing list, will also be preferentially ranked.

Let us be very clear here: if you propose your own project idea and haven't prearranged a mentor on the developer's mailing list, your proposal will NOT be preferentially ranked. To prearrange a mentor you almost certainly need to supply lengthy and substantial (> 1,000 lines) examples of C++ libraries you have already written.

The scoring by the community will work as before based on the community voting on proposals, but preferentially ranked candidates and proposals will be entirely consumed before unpreferentially ranked candidates and proposals, even if the candidate's exam results were poor (though note that in this case no mentor may wish to mentor such a student, and we shall have to exclude such a candidate as a result).

To help mentors write a student proposal idea matching the newly requested format, please see the "Concurrent hash tables" proposal as a template.

Requirements

Students must submit a proposal. A template for the proposal can be found here. Hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for Boost discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked before those that do not.

Projects from previous years can be found here. There are still a number of interesting projects found in these pages not listed here.

Github's for standalone GSoCs past and present

Since 2013 with Boost's transition to git we have kept a single umbrella org on github for those GSoCs which are fairly self standing. Incremental extensions to existing libraries usually enter that library's main git repo as an experimental branch. Here are those orgs:

Students may find examining past GSoC source code and commit histories of use.

Suggested GSoC project proposals

Self-contained standalone GSoC Projects with programming competency test

The following projects and programming competency test have been suggested by potential mentors. Selecting one of these, in consultation with the Boost developers mailing list, provides the highest chance that a mentor can be found for your GSoC project proposal and that your proposal will be preferentially ranked (see above). A template for the proposal can be found here.

MENTORS PLEASE FOLLOW THE GSOC IDEA TEMPLATE HERE WHEN ADDING IDEAS.

1. Concurrent Hash Tables (boost::concurrent_unordered_[multi]set|[multi]map)

Potential mentors: Niall Douglas

Background

Strangely enough, Boost does not provide an implementation of thread safe concurrent associative sets and maps despite that the unordered implementation from Intel's Thread Building Blocks and Microsoft's Parallel Patterns Library was proposed some years ago for standardisation in N3425 Concurrent Unordered Associative Containers for C++. This is probably because that implementation (the split ordered list algorithm by Shalev and Shavit, 2006) had the serious shortcoming of not being thread safe for some operations, most importantly item deletion, and the workarounds proposed by N3425 did not appeal. Moreover, there are other concurrent hash table algorithms which do not have such shortcomings.

It is long overdue for Boost to gain a thread safe concurrent associative containers implementation, and to that end you will find a concurrent_unordered_map prototype here with its doxygen documentation here. This particular design provides a complete STL compliant interface with every operation being thread safe and most operations permitting concurrency, but sacrifices the constant time performance of empty() and size() to achieve this, plus rehash() is extremely slow. Moreover, its find() is only completely wait free if compiled with a transactional memory C++ compiler.

GSoC project proposal

  1. To refactor the implementation of the concurrent_unordered_map prototype into a generic, reusable layer from which the following containers can be implemented:
    1. boost::concurrent_unordered_set
    2. boost::concurrent_unordered_multiset
    3. boost::concurrent_unordered_map
    4. boost::concurrent_unordered_multimap
  1. To complete the following missing functionality:
    1. Const iterators
    2. Const member functions
    3. Local iterators
    4. Copy and move construction plus copy and move assignment
    5. Better transactional memory C++ compiler support.
  1. To write unit, functional and performance regression testing for all of the above, including exception safety testing and multi threaded testing, to a 100% code coverage level.
  1. To write documentation to Boost quality levels for the new container classes, including time and space complexity guarantees and exception guarantees for each API and each use of each API.

Potential project extension funded by Boost

To implement ordered and/or nearly-ordered concurrent associative containers in addition to unordered ones.

Programming competency test

Implement item 2d (copy and move construction and copy and move assignment) for the concurrent_unordered_map prototype, and more importantly complete and thorough unit and functional and performance testing for the new functionality (add the tests to this file). You will find studying the rehash() implementation ought to make the implementation of this easy, but the true test of your competency will be in how well you design and write the testing for the new functionality.

Submission of the programming test should be via copying and pasting the parts you wrote into the end of the proposal you submit to Google Melange. Do NOT copy and paste the entire class or whole files!

2. Asynchronous file i/o backend for Linux, FreeBSD, Mac OS X or WinRT

Potential mentors: Niall Douglas

Background

A library called Boost.AFIO implementing portable asynchronous file i/o was ported to Boost during GSoC 2013 (documentation, github). It provides a fully asynchronous backend for the NT kernel directly using the NT kernel API, plus a generic POSIX backend which generates concurrency in the filing system using a thread pool of workers. It does not yet provide native asynchronous backends for Linux (KAIO), FreeBSD (POSIX AIO + kqueues), Mac OS X (unsure if this is technically feasible) nor WinRT.

GSoC project proposal

  1. For a Linux backend: To draw up a design document describing how the afio::async_file_io_dispatcher_base interface should be implemented using a Linux KAIO API. You can find more information about the Linux KAIO API here. Note that the Linux KAIO API is only partially asynchronous, and therefore will need to still be combined with a thread pool - therefore some elements of afio::async_file_io_dispatcher_compat ought to be reused, perhaps even subclassed.

For a FreeBSD backend: To draw up a design document describing how the afio::async_file_io_dispatcher_base interface should be implemented using a combination of the POSIX AIO API and the kqueue EVFILT_AIO event watcher via which completions of asynchronous file operations are notified back to AFIO. As the POSIX AIO API only covers reading and writing, it will almost certainly subclass afio::async_file_io_dispatcher_compat.

For a Mac OS X backend: OS X lacks the kqueue EVFILT_AIO interface, and therefore requires POSIX AIO clients to be notified of completions via signals. This probably makes a native backend for OS X technically infeasible, but feel free to submit a proposal proving me wrong!

For a WinRT backend: To draw up a design document describing how the afio::async_file_io_dispatcher_base interface should be implemented using the WinRT API. WinRT provides a rich suite of fully asynchronous file i/o APIs (see Windows.Storage), and when combined with the experimental C++ 17 resumable functions support in Visual Studio 2015 it should make the implementation of this backend quite straightforward.

  1. To implement one of the above designs.
  1. To write unit, functional and performance regression testing for all of the above, including exception safety testing and multi threaded testing, to a 100% code coverage level (excluding fatal abort handling code).
  1. To write documentation to Boost quality levels for the new backend, including benchmarking the new backend against the compatibility backend under cold, warm and semi-warm file cache conditions on magnetic and solid state storage devices.

Potential project extension funded by Boost

  1. Fast, scalable portable directory contents change monitoring.
  1. Extended attributes support.
  1. ACL support.
  1. A Filesystem TS and iostreams wrapper for AFIO.

Programming competency test

Using Boost.ASIO (not AFIO!), write a program which registers a Linux KAIO, FreeBSD POSIX AIO + kqueues or WinRT asynchronous file i/o operation with ASIO (either a file read or a file write). ASIO should invoke a handler which prints to stdout "Hello file i/o completion" when the asynchronous file i/o operation completes.

Submission of the programming test should be via copying and pasting what you wrote into the end of the proposal you submit to Google Melange.

3. Boost Documant Library development

Potential mentors: Antony Polukhin

Background

Document Library is meant for integration with spreadsheet applications Excel/LibreOfficeCalc/OpenOfficeCalc and other office programs. Main idea is to unify APIs of different Office suits and provide a clear header only library that is capable of doing simple tasks with office documents (creation, pdf exporting, file format changes, data extraction and cells manipulations). Such library is essential for banking software, CRMs and many other programs.

GSoC project proposal

Start development of a new Boost library called Boost.Document Library.

This project requires a good C++ knowledge, generic programming. Experience with Office's API would be an advantage.

Implement a generic wrapper around one of the existing Office suits. Wrapper must have the following methods:

struct format {
    enum type {
        PDF,
    };
};

void open(const filesystem::path& path); // opens a document
void export(const filesystem::path& filename, format::type format = format::PDF); // exports a document in specified format

export function must be able to export in PDF format at least. Other formats support is welcomed. Prototypes that are able to work with different Office suits will be highly appreciated.

Prototype must be submitted with unit tests and build notes.

Submission of the programming test should be via copying and pasting the parts you wrote into the end of the proposal you submit to Google Melange.

4. Boost.uBLAS a library for matrix computations

Potential mentors: David Bellot

Background

uBLAS is a library for linear algebra and matrix computations. Using recursive templates, it allows the compiler to optimize any complex linear algebra expressions as if it were written by hand by the programmer. Basic classes are matrix and vector. The library has all the basic functionalities and a few standard algorithms. However, it lacks more advanced algoritms like solvers and algoritms for decomposition. We have an old LU decomposition and that's it. We would like to improve the functionality of this library by adding more numerical solvers, especially for big problems in Big Data and algorithms for matrix decomposition like QR, Cholesky and also eigensolvers to do Schur or Jordan decomposition. All those algorithms are very important in Machine Learning nowadays and the base of many learning algorithms. Another topic, that is of interest for Big Data and Machine Learning is doing statistics on matrices. Languages like R or Python (with Pandas) uses the notion of Data Frame and have many aggregation or grouping algorithm to generate all sorts of statistics on huge matrices. As it became a very important topic we would like to have similar functions in uBLAS. For example you can see libraries like Pandas (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html) or a very powerful R package name data.table (http://cran.r-project.org/web/packages/data.table/index.html). Having similar functionalities in ublas would be a must !

GSoC project proposal

  1. For the solvers project:
  • study the main solving and decomposition algorithms,
  • study how to integrate an algorithm in uBLAS
  • implement the algorithm
  • write extensive test for not only testing the correctness of the algorithms but also the speed and the numerical stability. The last point is presumably one of the most important
  1. For the data frame project:
  • well... simply replication R data.frame and that's great !
  • ideally, implement the aggregate and summary function for data.frame and matrix like it is in R
  • add rows and columns statistics like it is in Matlab

Programming competency test

Implement a Gaussian elimination as described in http://en.wikipedia.org/wiki/Gaussian_elimination in uBLAS. Write a code to test the numerical stability of the algorithm. The competency test will be valid if 1- the algorithm is correctly implemented (of course) 2- integrated into uBLAS 3- has a testing code

Submission of the programming test should be via copying and pasting what you wrote into the end of the proposal you submit to Google Melange.

Self-contained standalone GSoC Projects (no competency test)

The following projects have been suggested by potential mentors. Selecting one of these, in consultation with the Boost developers mailing list, provides the highest chance that a mentor can be found for your GSoC project proposal. You should only choose one of these proposals if you already have at least 1,000 lines of C++ library code out there we can examine for your programming competency, otherwise your proposal cannot be preferentially ranked (see above). Make SURE you provide a link to your existing C++ code base in the proposal you submit to Melange. A template for the proposal can be found here.

MENTORS PLEASE FOLLOW THE GSOC IDEA TEMPLATE HERE WHEN ADDING IDEAS.

1. Boost.odeint Implicit Routines

Potential mentors: Karsten Ahnert and Mario Mulansky

Background

Boost.odeint http://www.odeint.com is a library devoted to finding numerical solutions of ordinary differential equations (ODEs). It is developed in a generic way using Template Metaprogramming which leads to extraordinary high flexibility at top performance. It is widely used in academic and industrial projects. Boost.odeint has been a Google Summer of Code project in 2011 http://google-melange.appspot.com/gsoc/project/google/gsoc2011/mulansky/14001 and we would like again to see students involved in the development of this numerical library.

The current focus of odeint is on explicit routines, namely the Runge-Kutta schemes or the multi-step methods. However, we would like to expand odeint by adding implicit routines in the same flexible manner. Implicit routines are important when dealing with stiff systems, but also ensure stability in some cases of discretized partial differential equations (PDEs). At the moment, odeint provides an implicit Euler and Rosenbrock algorithm, but the implementation is restricted to Boost.uBlas. As one of the main objectives of odeint is to provide highly flexible algorithms, we want to change this implementation to have a greater flexibility and interchangeability similar to that of the explicit routines. This project would not only require pure coding, but first also a considerable amount of design work.

GSoC project proposal

This project does not only require profound knowledge on C++ and generic programming, but we would also like to see some experience on numerical algorithms and solving ODEs, if possible.

  • Develop a design on how the requirements on implicit routines can be modularized
  • Implement a prototype showing the usability of the design
  • Change the existing routines to the new design
  • Implement other backends (MTL, eigen,...) to demonstrate the flexibility
  • Provide examples and documentation
  • (Add more implicit routines, if time permits)

Potential project extension funded by Boost

N/A

GSoC Projects which have been prearranged by students with mentors for 2015

The following students successfully sought out mentors and prearranged GSoC project proposals. They have also proven their programming competency by supplying C++ library code they have written for inspection.

1. Safe Float (boost::safe_numerics<float>) (Damian Vicino mentored by Robert Ramey)

Background

Arithmetic operations in C++ are NOT guaranteed to yield a correct mathematical result. This feature is inherited from the early days of C. The behavior of int, unsigned int and others were designed to map closely to the underlying hardware. Computer hardware implements these types as a fixed number of bits. When the result of arithmetic operations exceeds this number of bits, the result will not be arithmetically correct. In the past 2 years, the library safe_numerics was developed in Boost Incubator with a focus in safe_integer operations. Now, we are interesting in pushing the project forward to include the floating point arithmetics, an important milestone for the project.

GSoC project proposal

To design a drop-in replacements for float and double (safe<float> and safe<double> which guarantee that, when used in expressions in a C++ program, no incorrect arithmetic results will produced. To implement the safe_numerics<float> and safe_numerics<double>. To write unit and functional regression testing for all of the above to a 100% code coverage level. To write documentation to Boost quality levels for the new container classes, including time and space complexity guarantees and exception guarantees for each API and each use of each API.

2. Improve associative data structures in Hana (Louis Dionne mentored by Joel Falcou)

Background

Hana[1] is a library providing data structures and algorithms working on heterogeneous objects. It is basically a standard library where you would replace std::vector by std::tuple (which makes it quite different to the standard library). The project was started by Louis Dionne as part of GSoC 2014.

There is still room for improvement in many parts of the library. Some of the most lacking parts are the associative data structures, Set and Map, which provide a limited interface and are implemented using tuples and linear searching internally, which is inefficient. It would be great to have a new implementation for those data structures, and also to explore how the interface of these structures could be improved.

GSoC project proposal

This project requires good knowledge of:

  • Prior art in C++ template metaprogramming (MPL, Fusion, MPL11 and many other smaller libraries) - C++11/14 template metaprogramming implementation techniques - Functional programming; Hana is functional in nature and most concepts used in the library come from there. To design a new interface for the associative data structures, you must be able to navigate through this.

The project is easy to specify:

  • Find what functionality is missing in the associative data structures - Find a way to incorporate the new functionality in the current concepts, or modify those concepts to improve their generality and consistency. - Explore possible implementation techniques for Maps and Sets. Benchmark the compile-time (and runtime) performance.
  • Re-implement those data structures, with unit tests, examples and documentation.

Mentors who may be available for GSoC in 2015

1. Kyle Lutz, Boost.Compute

Ideas

If after reading all of the standalone project ideas above you are finding that none excites you, we have collected below a series of smaller work items none of which alone would be big enough to make up a full GSoC, but which could be combined to form a very valuable GSoC indeed. The following would be excellent start points for students to propose projects themselves (see below on how to do this). While these ideas are by far not as concrete as projects in the list above, they give students an idea what else they can work on within Boost. They also give students a chance to propose exactly the kind of project they'd like to work on instead of picking one from the predefined projects from above.

Students interested in proposing a mix of the below work items should propose their mix on the Boost mailing list using the instructions below. We will then figure out a suitable mentor for you.

How to propose a mix of the below work items on the Boost mailing list

Please do not simply arrive on the list and ask "I want to do (insert some random project idea here) because I don't want to do one of the predesigned GSoC projects on the list. Give me a mentor." as you will likely be ignored as a probable time waster. If you are to propose your own GSoC project, it must be of equivalent quality to the suggested GSoC project proposals written by mentors already on the ideas page, and by "equivalent", we really do mean equivalent i.e. very high, and probably including a programming competency test of your own design. This is why mentors take the time to write project proposals for you, because for mentors who are domain experts in their field it is easier to write a high quality GSoC proposal likely to pass peer review and Google's own proposal review than for students who are not domain experts. If you are still really sure you want to invest the very considerable work to propose your own project, do the following before writing to the Boost mailing list with your own proposal:

  1. Research prior art i.e. provide proof in your approach email that you have researched alternative implementations in C++ and other languages of your project proposal. A list of pros and cons of those alternative implementations would be useful and show you really did study the problem.
  1. Have a well designed, concrete, realistic as a summer project design based on synthesis of the prior art research. It's great to list known unknowns as far as possible. When designing, don't forget well over half the effort in writing a new Boost library goes exclusively into writing documentation and writing unit and functional testing. This is a big reason we need students to have good English, because a lot of writing Boost code is actually writing its documentation.
  1. Ideally students who provide proof they have successfully written a high quality STL algorithm implementation before i.e. can supply a link to an open source copy of their work are far more likely to be well received than those students with no such proof of capability and depth of understanding just how hard writing high quality C++ code is. Be aware we don't really care what grades you received in some CS class - we want to see code you've written.
  1. Failing item three, any evidence or proof demonstrating a good work ethic, and are humble and willing to learn, is always helpful. Good examples of such evidence are code commits fixing bugs in open source projects, or a history of improving documentation for open source projects, or a history of porting code to new platforms or fixing platform specific build problems - all these are excellent pieces of proof because we can verify their accuracy and look at the quality of your work.

If you can provide at least two of the above four points, your custom GSoC proposal is highly likely to be taken seriously, and you'll get plenty of feedback on what needs fixing, plus mentors will volunteer themselves. Miss all of the above factors and no one - student or otherwise - will be taken seriously, and any emails by you to the Boost developers list will probably be ignored.

With all that said, here are the smaller work items which could be combined to form a very valuable GSoC project:

Boost libraries (extending/overhauling/finishing)

Algorithms

  • Radix sort
  • Approximate string matching
  • Full text search
  • Near Duplicate Detection (shingling)
  • Parallel algorithms (sort, for_each)
  • Algorithms for gpgpu
  • Kinetic scrolling

Data strutures

  • Trie data structure, extending the work done during GSoC 2013.
  • B-tree data structure, extending the work previously done by Beman Dawes.
  • Concurrent containers (unordered_map, unordered_set, vector, forward_list)
  • Slim string (as opposed to the fat std::string interface, maybe immutable, maybe policy based)

Media

  • Hardware graphics (OpenGL/OpenGL ES/DirectX abstraction)
  • Audio library (OpenAL/FMOD/etc)
  • Video processing library (gil for video)

Games

  • Physics library (ODE/Havok/PhysX/etc abstraction)
  • Input library (like DirectInput/XInput)
  • Ranking algorithms (elo/TrueSkill)

Databases

Math

  • Geometry library (convince the developers to submit Eigen 3 to Boost)
  • Investigate exporting multiple-precision types to Python via Boost.Python and interoperability with "symbolic python". See: http://sympy.org/en/index.html

Memory

  • Memcache library
  • Memswap algorithm

File formats

  • JSON parsing libary
  • XML library

Communication

GUI

  • GUI library

C++ "extensions"

Hardware

  • CPUID
Note: See TracWiki for help on using the wiki.