The Paris-Saclay CDS has opened data science positions to reinforce the data science ecosystem and to build and support a data science platform. The data scientist will work on the following (in order of priority):
- Data-science support: The CDS has many collaborations between scientists and engineers across scientific disciplines. The candidate will work on a small number of data-science projects with different scientific partners. The candidate should be interested in a variety of scientific applications of data processing and be able to understand an application, communicate with scientists from a different culture, and deploy data-processing solutions for their needs. Current projects include:
  - Galaxy/star classification for preparing the LSST data processing pipeline.
  - Particle tracking for the ATLAS/LHC upgrade.
  - Segmenting and classifying solar wind time series data.
  - Classifying hospital stays (PMSI coding) with APHP.
  - Laser spectrometry to improve the safety of intravenous drug administration.
  - Improving search in large biological and biomedical data sets.
  - Analyzing sensor data for tracking pollution and its health effects (Polluscope).
  - Classifying crowdsourced ecology data (Spipoll).
- Software engineering: The objective is to enable researchers to do better science thanks to better software tools. This means bringing state-of-the-art data science research software into high-quality toolboxes (such as scikit-learn), getting involved with open source development, and assisting data scientists (students, postdoctoral fellows, permanent researchers) in developing their software engineering skills and getting involved with open source development.
- Training: Accompanying domain scientists in their data analysis efforts and data scientists in their methodological research. Designing training sprints and practical material for the courses (e.g. based on Software Carpentry).
Team
- The data scientist will work with scikit-learn core developers such as Gaël Varoquaux, Olivier Grisel, Loïc Estève, Alexandre Gramfort, and others. Collaboration is also possible with data and domain science researchers such as Balázs Kégl, Isabelle Guyon, Sarah Cohen-Boulakia, Karine Zeitouni, David Rousseau, and others.
Qualifications
- M.S. or Ph.D. in Computer Science or Statistical Machine Learning.
- Good understanding of the data science workflow. Experience with data challenges is a plus.
- Strong programming experience with one or more data science languages (Python, R, Matlab).
- Experience with open source development (desired but not required).
Applicants should send a CV, a statement of purpose, and up to three letters of recommendation to cdsupsay@gmail.com.
The position is open now and will remain open until it is filled.
Description:
Duration: 18 months
Gross salary per month in €: 2500-2817