The staff of the CDS actively contributes to open-source data science software. In addition, the CDS has funded no fewer than 13 “doctoral missions” to develop specific features that benefit the communities of these projects. Contributions to scikit-learn and related projects include:
- Isolation forest for anomaly detection
- Multivariate adaptive regression splines used in regression tasks
- Categorical encoder used in pre-processing
- Memory caching in scikit-learn pipelines
- Quantile transformer used in pre-processing
- Transformed target regressor to ease target manipulation in regression tasks
- Imbalanced-learn, to handle classification on imbalanced data sets
- Column transformer to combine heterogeneous pre-processing steps
- (In progress) Optimization of the tree architecture
- (In progress) Maintenance and improvements: @TomDLT, @glemaitre, @jorisvandenbossche
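To show how several of the scikit-learn features listed above fit together, here is a minimal sketch using the public scikit-learn API: a `ColumnTransformer` that applies a `QuantileTransformer` to numeric columns and a categorical encoder (`OneHotEncoder`) to a categorical column, a `TransformedTargetRegressor` that fits on a log-transformed target, a `Pipeline` with `memory=` caching, and an `IsolationForest` for anomaly detection. The synthetic data, the column layout, and the choice of `Ridge` as the final regressor are assumptions made for illustration only.

```python
import shutil
import tempfile

import numpy as np
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer

rng = np.random.RandomState(0)
n_samples = 200

# Hypothetical synthetic data: two skewed numeric columns plus one categorical
# column, stored in an object array so both kinds of values can coexist.
X_num = rng.lognormal(size=(n_samples, 2))
X = np.empty((n_samples, 3), dtype=object)
X[:, :2] = X_num
X[:, 2] = rng.choice(["a", "b", "c"], size=n_samples)
# Positive, right-skewed target (assumed for the example).
y = 3.0 * X_num[:, 0] + rng.lognormal(size=n_samples)

# ColumnTransformer: a different pre-processing step for each group of columns.
preprocess = ColumnTransformer([
    ("num", QuantileTransformer(n_quantiles=100, output_distribution="normal"), [0, 1]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [2]),
])

# TransformedTargetRegressor: fit Ridge on log1p(y), map predictions back with expm1.
regressor = TransformedTargetRegressor(
    regressor=Ridge(alpha=1.0), func=np.log1p, inverse_func=np.expm1
)

# memory=...: fitted transformers are cached on disk, so refitting with
# unchanged pre-processing parameters can reuse them instead of recomputing.
cachedir = tempfile.mkdtemp()
model = Pipeline([("preprocess", preprocess), ("regressor", regressor)], memory=cachedir)
model.fit(X, y)
preds = model.predict(X)
shutil.rmtree(cachedir)  # remove the temporary cache directory

# IsolationForest: predict() returns +1 for inliers and -1 for outliers.
iso = IsolationForest(random_state=0).fit(X_num)
labels = iso.predict(X_num)
print(preds[:3], np.unique(labels))
```

Caching mainly pays off when the same pipeline is refitted many times, for example during a grid search over the final estimator's parameters.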
Joblib is a set of tools to provide lightweight pipelining in Python. The following contributions have been made:
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. @jorisvandenbossche is actively maintaining and improving this project.
MNE-Python is an open-source Python toolbox for processing and visualizing MEG and EEG data. The following contributions have been made:
scikit-image is a collection of algorithms for image processing. The following contributions have been made:
- Implementation of Haar-like features
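The scikit-image implementation itself is not reproduced here. As a sketch of the underlying technique: a Haar-like feature is a difference of sums over adjacent rectangles, and an integral image (summed-area table) makes each rectangle sum cost four lookups. The helper names below are hypothetical, and the "left minus right" sign convention is an assumption that may differ from scikit-image's; the library's public entry points are `skimage.feature.haar_like_feature` and `skimage.transform.integral_image`.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a leading row and column of zeros.

    ii[i, j] equals img[:i, :j].sum(), which removes edge cases in rect_sum.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] using four lookups in the integral image."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]


def haar_two_rect_horizontal(ii, r, c, h, w):
    """Hypothetical two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    left = rect_sum(ii, r, c, h, half)
    right = rect_sum(ii, r, c + half, h, half)
    return left - right


img = np.zeros((4, 4))
img[:, :2] = 1.0  # bright left half, dark right half
ii = integral_image(img)
print(haar_two_rect_horizontal(ii, 0, 0, 4, 4))  # 8.0
```

Because every rectangle sum is constant-time, a detector can evaluate many such features at many positions and scales cheaply, which is what makes Haar-like features practical for tasks such as face detection.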
Sphinx-Gallery is a Sphinx extension that builds an HTML gallery of examples from any set of Python scripts.
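A minimal sketch of how a project typically enables Sphinx-Gallery in its Sphinx `conf.py`. The example-script directory `../examples` and the output directory `auto_examples` are assumed paths for a hypothetical project layout, not values required by the extension.

```python
# conf.py (excerpt) for a hypothetical project layout.
extensions = [
    "sphinx_gallery.gen_gallery",  # register the Sphinx-Gallery extension
]

sphinx_gallery_conf = {
    # Directory containing the example *.py scripts (assumed path).
    "examples_dirs": "../examples",
    # Directory where the generated gallery pages are written (assumed path).
    "gallery_dirs": "auto_examples",
}
```

Each example script is expected to begin with a docstring whose reStructuredText title becomes the title of the corresponding gallery page.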
Specio is a Python library that provides an easy interface to read hyperspectral data. It is cross-platform, runs on Python 2.x and 3.x, and is easy to install.