Open Source Software

The CDS staff actively contributes to open-source data science software. In addition, the CDS funded no fewer than 13 “doctoral missions” to develop specific features that benefit the community around each of those projects.


Several features have been implemented in scikit-learn and scikit-learn-contrib:

dask-ml is a library for distributed and parallel machine learning using dask. The following algorithms have been implemented:

Joblib is a set of tools to provide lightweight pipelining in Python. The following contributions have been made:

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. @jorisvandenbossche is actively maintaining and improving this project.
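As a rough illustration of the kind of data structure and analysis tooling pandas provides, the sketch below builds a small `DataFrame` and aggregates it by group. The column names and values are made up for this example and do not come from any CDS project.

```python
import pandas as pd

# Made-up toy data, for illustration only.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "b", "b"],
        "value": [1.0, 3.0, 2.0, 6.0],
    }
)

# Group rows by sensor and compute the mean value per group.
mean_by_sensor = df.groupby("sensor")["value"].mean()
print(mean_by_sensor)
```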

Operalib, a new Python library implementing various machine learning algorithms for operator-valued kernel regression, was developed.

MNE-Python is the Python open source toolbox for processing and visualizing MEG and EEG data. The following contributions have been made:

scikit-image is a collection of algorithms for image processing. The following contributions have been made:


Sphinx-gallery is a Sphinx extension that builds an HTML gallery of examples from any set of Python scripts.
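A minimal sketch of how the extension is typically enabled in a project's Sphinx `conf.py` follows. The two directory paths are hypothetical placeholders; adapt them to the layout of your own documentation.

```python
# conf.py (Sphinx configuration file)

# Enable the Sphinx-gallery extension alongside any others you use.
extensions = [
    "sphinx_gallery.gen_gallery",
]

# Hypothetical paths, relative to this conf.py.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # directory holding the example scripts
    "gallery_dirs": "auto_examples",  # directory where the gallery is generated
}
```

With this in place, building the documentation should generate gallery pages from the scripts found in the examples directory.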


Specio is a Python library that provides an easy interface to read hyperspectral data. It is cross-platform, runs on Python 2.x and 3.x, and is easy to install.
