Data challenges are data science competitions, running over several weeks or months, in which participants solve problems using provided datasets. They can be thought of as crowdsourcing, benchmarking, and communication tools. They have been used for decades to test and compare competing solutions to data science problems in a fair and controlled way, to eliminate “inventor-evaluator” bias, and to stimulate the scientific community while promoting reproducible science.
In our experience, challenges have turned out to be one of the most efficient tools for connecting data science to domain sciences, which is the main mission of the CDS. They are also great communication tools for building the PS-CDS brand and for reaching an audience beyond classical research. Finally, organizing and running challenges, including preparing and dimensioning the data and formalizing the problem, is a great training experience for novice data scientists.
The CDS 2 will continue funding and organizing high-profile data challenges.