wiki:SoC2015

Version 3 (modified by Niall Douglas, 8 years ago) ( diff )

Added concurrent unordered map detail

Google Summer of Code 2015 (CURRENTLY BEING WRITTEN NOTHING HERE IS FINAL!)

Welcome to the Boost C++ Libraries' home page for Google Summer of Code (GSoC) 2015. This page provides information about suggested student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started writing with Boost.

Changes in process for both mentors and students over preceding years

This year, 2015, is going to be slightly different from preceding years of how Boost has operated GSoC. The first new change was that last year we trialled extending the really superb GSoC 2014 projects by an additional three month period funded by Boost, and in 2014 the project we selected for extension was proposed Boost.Hana by student Louis Dionne mentored by Joel Falcou. If these extensions prove successful, we may continue to extend the really outstanding GSoC projects in 2015. Note that the Boost Steering Committee can only consider extensions where the mentor is more than happy to continue mentoring the project in an extension, and where the GSoC administrators and the other GSoC mentors and Boost Steering Committee members all unanimously agree on the merit of the extension. To reach such a standard is exceptionally hard, and students interested may wish to review Hana's source code to see what is demanded for a Boost funded GSoC extension.

The second new change is in how projects will be ranked by the community. Up until now, the community ranked student proposals by awarding a score, and we recommended to Google to select all those proposals which matched a single mentor to a single student based on the top of the score ranking. Unfortunately, this process has led to some disappointments for both student and mentor especially where the student was far better at writing proposals than writing code, and we also worry that the old system discriminates against non-native English speakers. As of this year, we therefore request a new part to student proposal ideas, that being of an exam or test the students interested in that proposal can take to demonstrate their programming capability. In 2015, student proposals with accompanying ability test scores for that candidate will be preferentially ranked above proposals without test scores.

This runs the risk of excluding student proposals not suggested by this ideas page - in fact, Louis Dionne's Hana was not suggested here, it came entirely from Louis. We therefore add a second way of getting preferentially ranked: candidates who have written existing C++ libraries of at least 1,000 lines of code (excluding comments and whitespace) and who propose their own project, AND have prearranged a mentor through the Boost developer's mailing list, will also be preferentially ranked.

The scoring by the community will work as before, but preferentially ranked candidates and proposals will be entirely consumed before unpreferentially ranked candidates and proposals, even if the candidate's exam results were poor (though note that in this case no mentor may wish to mentor such a student, and we shall have to exclude such a candidate as a result).

To help mentors write a student proposal idea matching the newly requested format, please see the "Concurrent hash tables" proposal as a template.

Requirements

Students must submit a proposal. A template for the proposal can be found here here. Hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for Boost discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked before those that do not.

Projects from previous years can be found here. There are still a number of interesting projects found in these pages not listed here.

Github's for standalone GSoCs past and present

Since 2013 with Boost's transition to git we have kept a single umbrella org on github for those GSoCs which are fairly self standing. Incremental extensions to existing libraries usually enter that library's main git repo as an experimental branch. Here are those orgs:

Students may find examining past GSoC source code and commit histories of use.

Suggested precanned GSoC project proposals

Self-contained standalone GSoC Projects with programming competency test

The following projects and programming competency test have been suggested by potential mentors. Selecting one of these, in consultation with the Boost developers mailing list, provides the highest chance that a mentor can be found for your GSoC project proposal and that your proposal will be preferentially ranked (see above).

1. Concurrent hash tables (boost::concurrent_unordered_[multi]set|[multi]map)

Potential mentors: Niall Douglas

Background

Strangely enough, Boost does not provide an implementation of thread safe concurrent associative sets and maps despite that the unordered implementation from Intel's Thread Building Blocks and Microsoft's Parallel Patterns Library was proposed for standardisation in N3425 Concurrent Unordered Associative Containers for C++. This is probably because that implementation (the split ordered list algorithm by Shalev and Shavit, 2006) had the serious shortcoming of not being thread safe for some operations, most importantly item deletion, and the workarounds proposed by N3425 did not appeal. Moreover, there are other concurrent hash table algorithms which do not have such shortcomings.

It is long overdue for Boost to gain a thread safe concurrent associative containers implementation, and to that end you will find a concurrent_unordered_map prototype here with its doxygen documentation here. This particular design provides a complete STL compliant interface with every operation being thread safe and most operations permitting concurrency, but sacrifices the constant time performance of empty() and size() to achieve this. Moreover, its find() is only completely wait free if compiled with a transactional memory compiler.

GSoC project proposal

  1. To refactor the implementation of the concurrent_unordered_map prototype into a generic, reusable layer from which the following containers can be implemented:
    1. boost::concurrent_unordered_set
    2. boost::concurrent_unordered_multiset
    3. boost::concurrent_unordered_map
    4. boost::concurrent_unordered_multimap
  1. To complete the following missing functionality:
    1. Const iterators
    2. Const member functions
    3. Local iterators
    4. Copy and move construction plus copy and move assignment
    5. Better transactional memory C++ compiler support.
  1. To write unit, functional and performance regression testing for all of the above, including exception safety testing and multi threaded testing, to a 100% code coverage level.
  1. To write documentation to Boost quality levels for the new container classes, including time and space complexity guarantees and exception guarantees for each API and each use of each API.

Potential project extension funded by Boost

To implement ordered and/or nearly-ordered concurrent associative containers in addition to unordered ones.

Programming competency test

Implement item 2d (copy and move construction and copy and move assignment) for the concurrent_unordered_map prototype, and more importantly complete and thorough unit and functional and performance testing for the new functionality (add the tests to this file). You will find studying the rehash() implementation ought to make the implementation of this easy, but the true test of your competency will be in how well you design and write the testing for the new functionality.

Submission of the programming test should be via copying and pasting the parts you wrote into the end of the proposal you submit to Google Melange. Do NOT copy and paste the entire class or whole files!

Self-contained standalone GSoC Projects (no competency test)

The following projects have been suggested by potential mentors. Selecting one of these, in consultation with the Boost developers mailing list, provides the highest chance that a mentor can be found for your GSoC project proposal. You should only choose one of these proposals if you already have at least 1,000 lines of C++ library code out there we can examine for your programming competency, otherwise your proposal cannot be preferentially ranked (see above). Make SURE you provide a link to your existing C++ code base in the proposal you submit to Melange.

(Insert proposals here)

Ideas

If after reading all of the standalone project ideas above you are finding that none excites you, we have collected below a series of smaller work items none of which alone would be big enough to make up a full GSoC, but which could be combined to form a very valuable GSoC indeed. The following would be excellent start points for students to propose projects themselves (see below on how to do this). While these ideas are by far not as concrete as projects in the list above, they give students an idea what else they can work on within Boost. They also give students a chance to propose exactly the kind of project they'd like to work on instead of picking one from the predefined projects from above.

Students interested in proposing a mix of the below work items should propose their mix on the Boost mailing list using the instructions below. We will then figure out a suitable mentor for you.

How to propose a mix of the below work items on the Boost mailing list

Please do not simply arrive on the list and ask "I want to do (insert some random project idea here) because I don't want to do one of the predesigned GSoC projects on the list. Give me a mentor." as you will likely be ignored as a probable time waster. If you are to propose your own GSoC project, it must be of equivalent quality to the precanned GSoC project proposals written by mentors already on the ideas page, and by "equivalent", we really do mean equivalent i.e. very high, and probably including a programming competency test of your own design. This is why mentors take the time to write project proposals for you, because for mentors who are domain experts in their field it is easier to write a high quality GSoC proposal likely to pass peer review and Google's own proposal review than for students who are not domain experts. If you are still really sure you want to invest the very considerable work to propose your own project, do the following before writing to the Boost mailing list with your own proposal:

  1. Research prior art i.e. provide proof in your approach email that you have researched alternative implementations in C++ and other languages of your project proposal. A list of pros and cons of those alternative implementations would be useful and show you really did study the problem.
  1. Have a well designed, concrete, realistic as a summer project design based on synthesis of the prior art research. It's great to list known unknowns as far as possible. When designing, don't forget well over half the effort in writing a new Boost library goes exclusively into writing documentation and writing unit and functional testing. This is a big reason we need students to have good English, because a lot of writing Boost code is actually writing its documentation.
  1. Ideally students who provide proof they have successfully written a high quality STL algorithm implementation before i.e. can supply a link to an open source copy of their work are far more likely to be well received than those students with no such proof of capability and depth of understanding just how hard writing high quality C++ code is. Be aware we don't really care what grades you received in some CS class - we want to see code you've written.
  1. Failing item three, any evidence or proof demonstrating a good work ethic, and are humble and willing to learn, is always helpful. Good examples of such evidence are code commits fixing bugs in open source projects, or a history of improving documentation for open source projects, or a history of porting code to new platforms or fixing platform specific build problems - all these are excellent pieces of proof because we can verify their accuracy and look at the quality of your work.

If you can provide at least two of the above four points, your custom GSoC proposal is highly likely to be taken seriously, and you'll get plenty of feedback on what needs fixing, plus mentors will volunteer themselves. Miss all of the above factors and no one - student or otherwise - will be taken seriously, and any emails by you to the Boost developers list will probably be ignored.

With all that said, here are the smaller work items which could be combined to form a very valuable GSoC project:

Boost libraries (extending/overhauling/finishing)

Algorithms

  • Radix sort
  • Approximate string matching
  • Full text search
  • Near Duplicate Detection (shingling)
  • Parallel algorithms (sort, for_each)
  • Algorithms for gpgpu
  • Kinetic scrolling

Data strutures

  • Trie data structure, extending the work done during GSoC 2013.
  • B-tree data structure, extending the work previously done by Beman Dawes.
  • Concurrent containers (unordered_map, unordered_set, vector, forward_list)
  • Slim string (as opposed to the fat std::string interface, maybe immutable, maybe policy based)

Media

  • Hardware graphics (OpenGL/OpenGL ES/DirectX abstraction)
  • Audio library (OpenAL/FMOD/etc)
  • Video processing library (gil for video)

Games

  • Physics library (ODE/Havok/PhysX/etc abstraction)
  • Input library (like DirectInput/XInput)
  • Ranking algorithms (elo/TrueSkill)

Databases

Math

  • Geometry library (convince the developers to submit Eigen 3 to Boost)
  • Investigate exporting multiple-precision types to Python via Boost.Python and interoperability with "symbolic python". See: http://sympy.org/en/index.html

Memory

  • Memcache library
  • Memswap algorithm

File formats

  • JSON parsing libary
  • XML library

Communication

GUI

  • GUI library

C++ "extensions"

Hardware

  • CPUID
Note: See TracWiki for help on using the wiki.