Changes between Version 13 and Version 14 of SoC2016


Ignore:
Timestamp:
Feb 15, 2016, 7:41:59 PM
Author:
David Bellot
Comment:

--

  • SoC2016

    v13 v14  
    249249* Integrate NumPy support into Boost.Python (https://github.com/ndarray/Boost.NumPy), or prepare for stand-alone review.
    250250
     251=== 4. Boost.uBLAS ===
     252Potential mentors: David Bellot
     253
     254All Boost.uBLAS projects require knowledge of C++11.
     255
     256==== Background ====
     257uBLAS is a library for linear algebra and matrix computations. Using recursive templates, it lets the compiler optimize complex linear algebra expressions as if they had been written by hand. The basic classes are matrix and vector. The library provides the basic functionality and a few standard algorithms. We would like to extend it with new algorithms and features, especially in the fields of data analysis and machine learning.
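
To illustrate the idea behind the background paragraph, here is a minimal sketch of the expression-template technique, assuming that "recursive templates" refers to the expression templates listed in the project requirements. It uses only the standard library. All names (`Expr`, `Sum`, `Vec`) are hypothetical and are not the uBLAS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is an illustration, not uBLAS code.

// CRTP base: lets operator+ accept any expression type without virtual calls.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing "lhs + rhs". It stores references only and computes
// each element on demand, so no temporary vector is allocated.
template <class L, class R>
class Sum : public Expr<Sum<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    Sum(const L& l, const R& r) : lhs_(l), rhs_(r) {
        assert(l.size() == r.size());
    }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

class Vec : public Expr<Vec> {
    std::vector<double> data_;
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}

    // Evaluating an expression: a single loop over the whole expression tree.
    template <class E>
    Vec(const Expr<E>& e) : data_(e.self().size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

// Builds the expression tree recursively; nothing is computed here.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With these definitions, `Vec r = a + b + c;` has the static type `Sum<Sum<Vec, Vec>, Vec>` on the right-hand side and is evaluated in one pass when `r` is constructed. The expression nodes hold references, so they should be consumed within the same full expression, as they are here.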
     258
     259==== Project 1: Data Frame and Statistics ====
     260
     261Languages like R or Python (with Pandas) use the notion of a data frame and offer many aggregation and grouping algorithms for generating all sorts of statistics on huge matrices. As this has become a very important topic, we would like to have similar functions in uBLAS. For examples, see libraries like Pandas ( http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html) or the very powerful R package named data.table ( http://cran.r-project.org/web/packages/data.table/index.html). Having similar functionality in uBLAS would be a must!
     262
     263The project requires understanding the basics of the R data.frame and the limitations that arise when it has to be implemented as a template meta-program in C++. The student will also need to identify the optimizations that cannot be done with a general-purpose data frame in R or Python because of missing information (such as column types).
     264Finally, the student is expected to implement algorithms on the data frame that can potentially be reused on matrices too, such as subset selection with generic operators, statistics, and summaries. Understanding memory management, alignment, optimizations, and vector processing is not mandatory but most welcome.
     265Understanding expression templates and meta-programming in C++ is required.
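
As one possible reading of the requirements above, the following sketch shows a data frame whose column types are known at compile time, with a generic subset selection and a summary statistic. The names `Frame`, `where`, and `mean_of` are hypothetical; nothing here is an existing uBLAS interface or a proposed design.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical sketch: each column is a std::vector of a statically known
// type, so the compiler sees the element type of every column.
template <class... Ts>
struct Frame {
    std::tuple<std::vector<Ts>...> columns;

    // Number of rows, taken from the first column.
    std::size_t rows() const { return std::get<0>(columns).size(); }
};

// Subset selection: indices of the rows whose value in column I satisfies
// the predicate. The predicate receives the column's real element type.
template <std::size_t I, class Pred, class... Ts>
std::vector<std::size_t> where(const Frame<Ts...>& f, Pred pred) {
    std::vector<std::size_t> idx;
    const auto& col = std::get<I>(f.columns);
    for (std::size_t r = 0; r < col.size(); ++r)
        if (pred(col[r])) idx.push_back(r);
    return idx;
}

// Summary statistic: mean of column I restricted to the selected rows.
// Returns 0 for an empty selection (an arbitrary choice for this sketch).
template <std::size_t I, class... Ts>
double mean_of(const Frame<Ts...>& f, const std::vector<std::size_t>& rows) {
    const auto& col = std::get<I>(f.columns);
    double sum = 0.0;
    for (std::size_t r : rows) {
        assert(r < col.size());
        sum += static_cast<double>(col[r]);
    }
    return rows.empty() ? 0.0 : sum / static_cast<double>(rows.size());
}
```

For a `Frame<int, double>` holding ages and scores, `where<0>(f, [](int age) { return age >= 30; })` selects rows by age, and `mean_of<1>(f, rows)` then averages the scores of those rows. Because the column index is a template argument, selecting a column that does not exist is a compile-time error.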
     266
     267The student will start by studying existing implementations and proposing a design. Then he or she will implement a prototype with tests and benchmarks. The final stage will be a thorough integration into uBLAS, with particular attention to writing examples and documentation.
     268
     269==== Project 2:  Statistics and data analysis ====
     270
     271This project is about adding statistical capabilities to Boost.uBLAS. It requires a deep understanding of C++ and of basic and, if possible, advanced statistics. In terms of work, it involves adding many functions to compute the mean, variance, covariance, histograms, and several types of running statistics on long vectors or matrices.
     272This project requires a lot of attention to detail, as all the functions must be thoroughly tested for all types of data. They have to be as generic as possible and work on most types.
     273If time permits (though it is almost a second requirement), we would like to see implementations of simple machine learning algorithms like k-means clustering, Gaussian mixtures, PCA, ICA, and possibly other types of simple mixtures. The student will need to understand those techniques beforehand.
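
The running statistics mentioned above can be sketched as follows. This is a minimal example that uses Welford's online algorithm, chosen here as an assumption since the project text does not prescribe a method. The class name `RunningStats` is hypothetical and independent of uBLAS.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a running mean and variance over a stream of values.
// T is assumed to be a floating-point-like type.
template <class T>
class RunningStats {
    std::size_t n_ = 0;
    T mean_ = T();
    T m2_ = T();   // running sum of squared deviations from the current mean
public:
    // Welford's update: one pass, no need to store the samples.
    void push(T x) {
        ++n_;
        const T delta = x - mean_;
        mean_ += delta / static_cast<T>(n_);
        m2_ += delta * (x - mean_);
    }

    std::size_t count() const { return n_; }

    T mean() const {
        assert(n_ > 0);
        return mean_;
    }

    // Unbiased sample variance (divides by n - 1).
    T variance() const {
        assert(n_ > 1);
        return m2_ / static_cast<T>(n_ - 1);
    }
};
```

Pushing the values 1, 2, 3, 4 into a `RunningStats<double>` gives a mean of 2.5 and a sample variance of 5/3. Making the class a template over the value type reflects the genericity requirement, although a real implementation would also have to decide how integer and complex types are handled.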
     274
     275If time permits, the student will be allowed to work on more advanced machine learning algorithms.
     276
     277The other requirements are the same as for Project 1.
     278
     279==== Programming competency test ====
     280
     281A programming competency test is required.
     282Candidates are asked to implement a Toeplitz matrix in uBLAS. You can take inspiration from how banded matrices are implemented, for example here: https://github.com/uBLAS/ublas/blob/master/include/boost/numeric/ublas/banded.hpp
     283If you are selected as a student and your implementation is good enough, we will, as a bonus, integrate your programming competency test into the uBLAS code.
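
For orientation only, the sketch below shows the storage idea behind a square Toeplitz matrix: every entry depends only on `i - j`, so `2n - 1` values are enough. It is a standalone class with the hypothetical name `Toeplitz`. It does not use or imitate the uBLAS container interfaces that an actual submission would have to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone sketch of an n x n Toeplitz matrix.
// Entry (i, j) lives on the diagonal i - j, which is stored once.
template <class T>
class Toeplitz {
    std::size_t n_;
    std::vector<T> diag_;   // 2n - 1 diagonals, value-initialized

    // Maps (i, j) to the slot of its diagonal; i - j ranges over
    // -(n - 1) .. (n - 1), shifted here to 0 .. 2n - 2.
    std::size_t slot(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        return i + (n_ - 1) - j;
    }
public:
    explicit Toeplitz(std::size_t n) : n_(n), diag_(n ? 2 * n - 1 : 0) {}

    std::size_t size() const { return n_; }
    std::size_t storage_size() const { return diag_.size(); }

    const T& operator()(std::size_t i, std::size_t j) const {
        return diag_[slot(i, j)];
    }

    // Writing one entry changes the whole diagonal it belongs to.
    T& operator()(std::size_t i, std::size_t j) {
        return diag_[slot(i, j)];
    }
};
```

After `t(0, 0) = 1` on a 3 x 3 instance, `t(1, 1)` and `t(2, 2)` also read 1, because all three entries share the main diagonal. A uBLAS version would additionally need iterators, storage orientation handling, and interaction with matrix expressions, as the banded matrix header shows.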
     284
     285
     286
     287
    251288
    252289'''[wiki:GSoCIdeaTemplate To any potential mentor adding a proposal HERE please use this template]'''