[[PageOutline]] = Boost Google Summer of Code 2016 - NOT RUNNING IN 2016! = ---- > **Boost was NOT approved by Google as a Summer of Code organisation this year, so this page is kept for subsequent years only. Please see http://boost.2283326.n4.nabble.com/gsoc16-Boost-rejected-as-GSoC-2016-org-td4683986.html for the announcement.** ---- Welcome to the Boost C++ Libraries' home page for Google Summer of Code (GSoC) 2016. This page provides information about suggested student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started writing with Boost. === Quick summary of policies and processes for this year === After two extremely successful years of GSoC at Boost during which even some of the old timers have been wowed by the C++ code some of the students have brought us, we are minded to keep in place the more rigorous candidate selection process which involves preferentially selecting over all others every GSoC Boost candidate who takes a '''C++ programming aptitude test''' or provides links to at least 1,000 lines (excluding comments and whitespace) non-coursework C++ ''library'' (not application nor solution) open source code. Note if following the second route, code should have been open sourced at least three months ago, and show a log of commits improving the library over time. ==== What students should do now ==== 1. Students should review the [wiki:SoCPrevious list of ideas from previous GSoCs] and the archives of [http://www.boost.org/community/groups.html#main the Boost developer's mailing list] relating to GSoC ('''tip:''' try searching boost-dev for subjects tagged ![gsoc16] or ![gsoc15] etc). [http://boost.2283326.n4.nabble.com/Boost-Dev-f2600599.html You may find this searchable archive of boost-dev useful]. 2. If you wish to proceed, you need to join the Boost Developer's mailing list and find a mentor who will be an experienced Boost developer in one of the Boost libraries listed at http://www.boost.org/doc/libs/1_60_0/. Read [http://www.boost.org/community/policy.html the Boost Discussion Policy] in full, and once read in full go to http://lists.boost.org/mailman/listinfo.cgi/boost and subscribe. 3. After as an absolute minimum reading all posts tagged '''![gsoc16]''', students should write a well researched and intelligent message with '''![gsoc16]''' at the front of the subject line to that developer's mailing list seeking a mentor, and be as flexible as possible in finding a topic that both they and the mentor is interested in upon which to base a GSoC project proposal text to be submitted to Google. As a general rule, a well written and researched proposal to extend or improve an existing mature Boost library is likely to be much better received that student originated ideas for new libraries or facilities. 4. Once a potential mentor and project idea is found, the student must write a project proposal which should follow [wiki:SoCSubmissionTemplate this submission template]. Potential mentors may add precanned project ideas with programming competency tests to this page below as GSoC approaches. These may prove useful in starting a discussion with potential mentor(s) '''whom the student should approach directly'''. === Github's for standalone GSoCs past and present === Since 2013 with Boost's transition to git we have kept a single umbrella org on github for those GSoCs which are fairly self standing. Incremental extensions to existing libraries usually enter that library's main git repo as an experimental branch. Here are those orgs: * GSoC 2013: https://github.com/BoostGSoC13 * GSoC 2014: https://github.com/BoostGSoC14 * GSoC 2015: https://github.com/BoostGSoC15 * This year's GSoC will be housed at https://github.com/BoostGSoC16 Students may find examining past GSoC source code and commit histories of use. === Historical GSoC Ideas pages for years 2006 to now === * 2015 [wiki:SoC2015 Project Ideas] * 2014 [wiki:SoC2014 Project Ideas] * 2013 [wiki:SoC2013 Project Ideas] * 2012 [wiki:SoC2012 Project Ideas] * 2011 [wiki:SoC2011 Project Ideas] * 2010 [wiki:SoC2010 Project Ideas] * 2009 [wiki:soc2009 Project Ideas] * 2008 [http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Google_Summer_Of_Code_2008 Project Ideas] * 2007 [http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Google_Summer_Of_Code_2007 Project Ideas] * 2006 [http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Google_Summer_Of_Code_2006 Project Ideas] [http://www.boost.org/community/gsoc_2006_boost_overview.html An overview of Boost participation in Google Summer of Code™ 2006]. = Suggested GSoC project proposals = '''[wiki:GSoCIdeaTemplate To any potential mentor adding a proposal HERE please use this template]''' === 1. Static map (`boost::static_map, Pred = boost::constexpr_equal_to>`) === Potential mentors: Niall Douglas ==== Background ==== Even with all of Boost's facilities and using C++ 14, this program represents the currently best available method of implementing a static constant associative map of keys to values ([http://melpon.org/wandbox/permlink/SEt61EAxVG4J5BnF play with it in an online compiler here] and [https://goo.gl/gTZOEp see the assembler generated here]): {{{ #!c++ #include #include #include #include #include using std::experimental::string_view; enum class weekday { sunday, monday, tuesday, wednesday, thursday, friday, saturday }; // initializer_list, pair and string_view are constexpr, so this list can be constexpr // (i.e. lives in the mind of the compiler only and has zero representation in generated code) #define STRING_VIEW(str) { str, sizeof(str)-1 } constexpr std::initializer_list> string_to_weekday { { STRING_VIEW("sunday"), weekday::sunday }, { STRING_VIEW("monday"), weekday::monday }, { STRING_VIEW("tuesday"), weekday::tuesday }, { STRING_VIEW("wednesday"), weekday::wednesday }, { STRING_VIEW("thursday"), weekday::thursday }, { STRING_VIEW("friday"), weekday::friday }, { STRING_VIEW("saturday"), weekday::saturday } }; int main(void) { { // Calls malloc() at least 7 times static const std::map to_weekday1 = string_to_weekday; std::cout << "'monday' maps to " << static_cast(to_weekday1.at("monday")) << std::endl; std::cout << "'friday' maps to " << static_cast(to_weekday1.at("friday")) << std::endl; // Calls free() at least 7 times } { // Calls malloc() at least 8 times static const std::unordered_map to_weekday2 = string_to_weekday; std::cout << "'monday' maps to " << static_cast(to_weekday2.at("monday")) << std::endl; std::cout << "'friday' maps to " << static_cast(to_weekday2.at("friday")) << std::endl; // Calls free() at least 8 times } return 0; } }}} As you can see, we have either the choice of linearly scanning the associative array `string_to_weekday` for keys (which has linear complexity to number of items per lookup), or we can ''copy'' at runtime the associative array into an associative map which can thereafter be queried at runtime in either `O(log N)` or `O(1)` complexities respectively. Both these options generate very significant amounts of runtime code. Almost every seasoned C++ programmer has at least once (and often many times throughout a career) written a static constant associative map which is initialised once from static constant data usually at process start, and is thereafter never modified. Given that C++ 14 can implement an associative map where key lookups are constexpr when possible, it would be a very worthwhile addition to Boost and C++ in general if we gained a high quality implementation of a static map class. ==== GSoC project proposal ==== 1. To seek consensus from the Boost Developer's mailing list on a suitable design for a Boost `static_map` class with the following design features: 1. Whose number of items and keys are completely fixed from construction onwards. 2. All features of which can be used in constant expressions (i.e. all member functions are marked constexpr). 3. Which can be completely statically initialised in the mind of the compiler, or in static global storage. 4. Values, though not keys nor number of items, are modifiable. 5. Which performs constexpr key to value lookup ''to the maximum possible extent''. It turns out this list of requirements is in fact quite challenging to implement, and even seasoned Boost programmers have to stop and ponder this one. To demonstrate the difficulty in code: {{{ #!c++ constexpr std::pair map_data[] = { { 5, "apple" }, { 8, "pear" }, { 0, "banana" } }; // Easy: generates no runtime code constexpr auto cmap = make_static_map(map_data); // Easy: generates no runtime code, this is as if literal "apple" constexpr const char *what_is_5 = cmap[5]; // Challenging: needs to only generate code loading immediately from a memory location. // It must NOT generate any additional runtime overhead like hashing nor searching. const char *what_is_8 = cmap[8]; }}} The reason why the last statement is challenging is because of the rules of when a constexpr-capable code path is executed by the compiler: the compiler only ''has'' to constexpr execute when all the inputs are constant expressions '''and''' all the outputs will be used in constant expressions, and it only ''may'' constexpr execute when all the inputs are constant expressions. Indeed empirical testing of clang 3.7 and VS2015 found both currently do '''not''' constexpr execute when the result is not stored constexpr, so in a naive implementation the last statement '''does''' generate a runtime lookup on current compiler technology despite the constant expression inputs. If you'd like to see more detail about this specific problem, have a look at the assembler generated for a toy static_map implementation at https://goo.gl/eO7ooa. 2. To implement a `static_map` class which runs on at least two major C++ compilers. - Hopefully in C++ 14, however upcoming features in C++ 1z do make the problem considerably easier to fix. 3. To implement a comprehensive unit test suite for the `static_map` class, including tests ensuring no runtime overhead is generated for the challenging use case exampled above. 4. To configure per-commit continuous integration for the unit test suite on at least one of the major public open source CI services (e.g. Travis, Appveyor). 5. To write documentation to Boost quality levels for the new container class, including time and space complexity guarantees and benchmarks, and exception guarantees for each API and each use of each API. ==== Potential project extension funded by Boost ==== - `static_multimap` (keys are not unique) - `static_set` and `static_multiset` (keys only) - `static_ordered_map` and `static_ordered_multimap` (iteration is in lexical order of keys) - `static_bimap` (values are also keys) - `static_index` (multi-column table with some columns being hash indexed) ==== Programming competency test ==== Write a function which uses one of the compile time string hashing techniques at https://stackoverflow.com/questions/2111667/compile-time-string-hashing (or a constexpr hash function of your preference) to generate a constexpr `std::array` of its input strings. The function ought to have the following prototype: {{{ #!c++ template constexpr std::array hash_strings(Strings&&... strings); // Examples of usage constexpr std::array string_hashes0 = hash_strings(); constexpr std::array string_hashes2 = hash_strings("niall", "douglas"); constexpr std::array string_hashes4 = hash_strings("google", "summer", "of", "code"); // This should fail elegantly and usefully ... //constexpr std::array string_hashes_fail = hash_strings(5); }}} In your submission you should: 1. Explain why you chose the constexpr string hash function you did and the strengths of your choice over other choices. 2. Test that the output is identical whether executed by the compiler at compile time, or at runtime. 3. '''Prove''' that the compile time constexpr implementation generates no runtime code whatsoever. Submission of the programming test should be via copying and pasting the code you wrote into the end of the proposal you submit to Google Melange. === 2. Boost.odeint === Potential mentors: Karsten Ahnert and Mario Mulansky ==== Background ==== Boost.odeint http://www.odeint.com is a library devoted to finding numerical solutions of ordinary differential equations (ODEs). It is developed in a generic way using Template Metaprogramming which leads to extraordinary high flexibility at top performance. It is widely used in academic and industrial projects. Boost.odeint has been a Google Summer of Code project in [https://www.google-melange.com/gsoc/project/details/google/gsoc2011/mulansky/5724160613416960 2011] and [https://www.google-melange.com/gsoc/project/details/google/gsoc2013/earnocks/5872285445521408 2013] and we would like again to see students involved in the development of this numerical library. ==== Project Idea 1: Implicit Routines ==== The current focus of odeint is on '''explicit''' routines, namely the Runge-Kutta schemes or the multi-step methods. However, we would like to expand odeint by adding implicit routines in the same flexible manner. Implicit routines are important when dealing with stiff systems, but also ensure stability in some cases of discretized partial differential equations (PDEs). At the moment, odeint provides an implicit Euler and Rosenbrock algorithm, but the implementation is restricted to Boost.uBlas. As one of the main objectives of odeint is to provide highly flexible algorithms, we want to change this implementation to have a greater flexibility and interchangeability similar to that of the explicit routines. This project would not only require pure coding, but first also a considerable amount of design work. * [http://en.wikipedia.org/wiki/Numerical_methods_for_ordinary_differential_equations] * [http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods] * [http://en.wikipedia.org/wiki/Explicit_and_implicit_methods] This project does not only require profound knowledge on C++ and generic programming, but we would also like to see some experience on numerical algorithms and solving ODEs, if possible. * Develop a design on how the requirements on implicit routines can be modularized * Implement a prototype showing the usability of the design * Change the existing routines to the new design * Implement other backends (MTL, eigen, ...) to demonstrate the flexibility * Provide examples and documentation * (Add more implicit routines, if time permits) ==== Project Idea 2: Event Detection ==== Currently, Boost.odeint does not offer specific functionality for event detection. However, this is a common use-case and has been discussed on the [https://github.com/headmyshoulder/odeint-v2/issues/9 odeint Github page] in the past. An investigatory implementation of an event-detection algorithm was added recently as [https://github.com/headmyshoulder/odeint-v2/commit/766d2cb0631b89f8f7f5c8187b792a216e691bd9#diff-8cd5c8387c683493559d40540e0a4f0c an example]. On the other hand, sophisticated event detection algorithms have been explored in the context of [http://www.sciencedirect.com/science/article/pii/0167278982900343 computing Poincare sections]. The first task of this project, however, is to develop a clean, generic interface to incorporate event detection algorithms into the existing structure of Boost.odeint. The project requires profound knowledge of C++ and generic programming, as well as a good understanding of Boost.odeint and ODE simulations and numerical algorithms. * Design an interface to incorporate event detection into Boost.odeint * Prototypical implementation and testing the usability of the design in several use cases * Implementation of standard algorithm for event detection ([http://www.boost.org/doc/libs/1_50_0/libs/math/doc/sf_and_dist/html/math_toolkit/toolkit/internals1/roots2.html bisection]) * Implementation of sophisticated algorithms for event detection [http://www.sciencedirect.com/science/article/pii/0167278982900343 Poincare sections] * Documentation and examples === 3. Boost.Python improvements === Potential mentor: Stefan Seefeld This project requires knowledge of C++11 and Python ==== Potential projects: ==== * Modernize code base by using standard C++11 features rather than their equivalent Boost APIs (type traits, notably). * Integrate NumPy support into Boost.Python (https://github.com/ndarray/Boost.NumPy), or prepare for stand-alone review. === 4. Boost.uBLAS === Potential mentors: David Bellot All projects with Boost.uBLAS requires knowledge of C++11. ==== Background ==== uBLAS is a library for linear algebra and matrix computations. Using recursive templates, it allows the compiler to optimize any complex linear algebra expressions as if it were written by hand by the programmer. Basic classes are matrix and vector. The library has all the basic functionalities and a few standard algorithms. We would like to improve the functionality of this library by adding new algorithms and functionality especially in the field of data analysis and machine learning. ==== Project Idea 1: Data Frame and Statistics ==== Languages like R or Python (with Pandas) uses the notion of Data Frame and have many aggregation or grouping algorithm to generate all sorts of statistics on huge matrices. As it became a very important topic we would like to have similar functions in uBLAS. For example you can see libraries like Pandas ( http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html) or a very powerful R package name data.table ( http://cran.r-project.org/web/packages/data.table/index.html). Having similar functionalities in ublas would be a must! The project will require the student to understand the basics of R data.frame and see what kind of limitations arise when it has to be implemented with a template meta-program in C++. However, the project will require the student to also identify all the possible optimizations that can't be done with generic purpose data.frame in R and Python because of missing information (like column types), etc... Finally, the student is expected to implement algorithms on the data.frame that can potentially be re-used on matrices too like subset selection with generic operators, statistics and summaries. Understanding memory management, alignment, optimizations, vector processing is not mandatory but most welcome. Understanding C++ expression template and C++ meta-programming is required. The student will start by studying existing implementations and propose a design. Then he or she will implement a prototype with tests and benchmarks. The final stage will be a thorough integration into ublas, and especially writing examples and documentation. ==== Project Idea 2: Statistics and data analysis ==== This project is about adding statistical capabilities to Boost.uBLAS. It requires a deep understanding of C++ and of basics and if possible advanced statistics. In term of work, it will require to add many functions to compute mean, variance, covariance, histogram, several types of running statistics on long vector or matrices. This project requires a lot of attention to the detail as all the functions must be thoroughly tested for all types of data. They have to be as generic as possible and work on most of the types. If time permits, but it is almost a second requirement, we would like to see implementation of simple machine learning algorithm like k-means clustering, Gaussian mixtures, PCA, ICA and possible other types of simple mixtures. The student will need to understand those techniques beforehand. If time permits, the student will be allowed to work on more advanced machine learning algorithms. The other requirements are the same as project 1. ==== Programming competency test ==== A programming competency test is required. It is asked of the candidates to implement a Toeplitz matrix in uBLAS. You can get your inspiration from how the banded matrices are implemented, like here: https://github.com/uBLAS/ublas/blob/master/include/boost/numeric/ublas/banded.hpp If you are selected as a student and if your implementation is good enough, as a bonus we will integrate your programming competency test into the code of uBLAS. === 5. Boost.Flyweight: multi-key flyweights === Potential mentors: Joaquín M López Muñoz, Vicente J. Botet Escriba ==== Background ==== [http://www.boost.org/libs/flyweight/doc/index.html Boost.Flyweight] allows for memory-efficient handling of large quantities of immutable objects when many of them have equivalent states, i.e. when their representation can be reduced to a smaller number of surrogate instances. These common representatives are stored and looked up in so-called ''[http://www.boost.org/libs/flyweight/doc/reference/factories.html#factory factories]''. As it happens, flyweight factories are one-keyed, i.e. they can only be looked up by values of one type (either that of the elements stored themselves, in the default case, or a user-specified type for ''[http://www.boost.org/libs/flyweight/doc/tutorial/key_value.html key-value flyweights]''). Sometimes the need arises of specifying more than one key type by which the same underyling objects can be accessed, naturally leading to an extended concept of ''multi-key factories'', which as of today are not part of Boost.Flyweight. More context is given in a recent [http://thread.gmane.org/gmane.comp.lib.boost.devel/265379 thread] on the Boost mailing list. ==== Project Idea: multi-key flyweights ==== Let `Key1`, ... , `KeyN` be types with some notion of identity ~ under which they are isomorphic, i.e. there are mappings `fij(Keyi)->Keyj` that preserve intra-type identity, and let be `Value` be some other, different type (the flyweight type). A multi-key flyweight on (`Key1`, ... , `KeyN`, `Value`) is a type `M` that behaves as a key-value flyweight for each (`Keyi`, `Value`) with the additional property that: `ki` ~ `kj` <=> `M(ki)==M(kj)` for each 1 <= `i`, `j` <= `N`, `ki` of type `Keyi`, `ki` of type `Keyj`, ... that is, equivalent keys (even if of different types) refer to the same object, and conversely. The goal of the project is to extend Boost.Flyweight to support these multi-key flyweights. Some of the items to be completed will be: * Define the concepts required for multi-key factories and their specifiers, in a manner consistent and interoperable with the existing concepts for [http://www.boost.org/libs/flyweight/doc/reference/holders.html#holder holders], [http://www.boost.org/libs/flyweight/doc/reference/locking.html#locking locking policies] and [http://www.boost.org/libs/flyweight/doc/reference/tracking.html#tracking tracking policies]. As a kind of sanity check, the concept of multi-key factory when the number of keys is 1 should coincide with the already existing concept of factory. * Write at least one implementation of a multi-key factory (and associated specifier) to be provided off-the-shelf by Boost.Flyweight. This is very likely to be based on [http://www.boost.org/libs/multi_index/doc/index.html Boost.MultiIndex]. This component has to be flexible enough to support different lookup strategies, including at least hash-based and less-based. * Define and implement a `multikey_flyweight` class template that uses and weaves together multi-key factories, holders, locking and tracking. Make this class template taggable in the same way as `boost::flyweight`. * ''' Extra:''' allow for tagged key types that extend the applicability of the library to the case where some `Keyi` is the same type as some other `Keyj`. * ''' Extra:''' implement a lazy version of a multi-key factory, i.e. one in which keys of type `Keyi` are not created unless explicitly used. Laziness is expected to impose some space penalty on the associated data structure, this has to be understood and documented. (In fact, Boost.!MultiIndex is probably not up to this task.) * Provide full docs (tutorial and reference), examples and full test coverage to be merged with existing Boost.Flyweight material. ==== Skills and competency test ==== The student will need * A mathematically-inclined mind to correctly capture and define the (non-trivial) concepts involved. * Good command of C++11. Code should ideally be C++03-compatible, with C++11 features (such as move semantics) leveraged where available and emulated where possible. * Fluency with template metaprogramming in general and [http://www.boost.org/libs/mpl/doc/index.html Boost.MPL] in particular. The metaprogramming and generic programming involved is quite hard: please take a look at Boost.Flyweight source code to get an idea of the level of complexity this calls for. * Good command of the English language for documentation purposes. No specific competency test is proposed, but candidate students are required to show some previous work where all or most of these skills are exhibited. === 6. Safe-Float === Potential mentors: Damian Vicino ==== Background ==== Arithmetic operations in C++ are NOT guaranteed to yield a correct mathematical result. For instance, the overflow on addition operation may produce an infinite value when using float. Safe Float proposes implementing a drop-in replacement for floating point numeric data types guaranteeing that, when used in expressions in a C++ program, no incorrect arithmetic results will be produced without developer awareness. In addition to the arithmetic problems, some others can result in security concerns related to usage of floating point. In these regards, the CERT Division of Software Engineering Institute at Carnegie Mellon University publishes guidelines for Secure Programming when using FP datatypes ![1]. A survey of arithmetic problems was written by David Goldberg in 1991 ![2]. And the limitations of Intel architecture for implementing floating point are described in the IA-32 ![3] and x87 ![4] manuals. Problems with floating-point can be categorised in: - Domain/Pole/Range errors - Approximation errors and narrowing on casting - Unexpected input (e.g. reading a NaN from a stream when expecting a number) - Ill defined (e.g. initialise or read as input “0.1”) - Unexpected operation results (e.g. undetected denormalisation, rounding, underflow, overflow to infinity or NaN operands) Since C++11 and C++14 standards ![5] provided standardised floating point environment access giving the possibility of tackle the problem efficiently in a platform independent way. Using the floating-point environment, most problems can be detected by introducing multiple lines of code after each operation for checking flags. Requiring the programmer to write down all these checks in every place a floating point operation is executed is error-prone. In general, the problems mentioned lead to obscure and hard to detect bugs for the non-expert in numerical methods. However, some algorithms are designed taking in account approximations and errors to happen and could be silently discarded. Thus, our approach in safe-float is allowing the declaration of concerns to be considered for each declared variable (e.g. we can define safe_float which only checks for division by zero problems). ==== GSoC project proposal ==== During GSoC 2015, safe-float was partially implemented exploring different designs. Now, with a stable design, we want to focus in finishing up the details to start the review process. 1. Implement new policies for cast and error handling of the data type. 2. Implement Safe-Float literals for IEEE754 standard compliance. 3. Extend the test suite. 4. Define Concepts were required. 5. Check correct exception guarantees are implemented. 6. Extend current documentation ![6]. ==== Programming competency test ==== - Define a user defined literal assignable to float that fails compilation when the value provided cannot be expressed as an positive integer power of 0.5 (e.g. 0.5, 0.25, 0.125). - Provide a function receiving an integer (X) and a tuple (std::tuple) into a tuple of vectors (std::tuple,std::vector, std::vector, …> where each vector has X elements of each originally received in each tuple_element. E.g. for X=2 and the tuple {1, 1.0, ‘a’} , the result type is std::tuple, std::vector, std::vector> and the values are: {{1, 1},{1.0, 1.0},{‘a’, ‘a’}} - Provide a template function “get_vector”, similar to std::get, that given the type X as parameter, returns editable access to the vector of type X in a tuple of vectors. ==== References ==== - ![1] SEI CERT. The CERT C++ secure coding standard. Pearson Education, 2008. - ![2] Goldberg, David. "What every computer scientist should know about floating-point arithmetic." ACM Computing Surveys (CSUR) 23.1 (1991): 5-48. - ![3] Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 (specially sections about FP flags in FADD, FSUB, FMUL and FDIV) [4] x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) - ![5] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf - ![6] http://sdavtaker.github.io/safefloat/doc/html/index.html '''[wiki:GSoCIdeaTemplate To any potential mentor adding a proposal HERE please use this template]'''