The selected candidate will be recruited as an employee of the National Museum of Natural History (MNHN-Paris), ideally starting in September 2017.
PROJECT: Human-machine interaction for identifying insects on photos
Since 2010, participants in our citizen science program Spipoll (spipoll.fr) have collected 280,000 pictures of flower-dwelling insects and spiders. This rate of roughly 50k pictures per year should increase steeply in 2018 with the launch of a modernized website for participation. All of these photos have been identified by participants, most of whom are beginners in entomology, with the help of an online key that helps them find the most likely name among more than 600 possibilities. These names partition the whole fauna of flower-dwelling insects into non-overlapping groups, called terminal taxa, which are defined according to an ontology and correspond to various taxonomic levels, from species to broad species groups. Participants are encouraged to check and comment on each other’s identifications. Most photos are eventually “validated” by nominated experts.
The database is held at the Museum and is used for research on the macro-ecology of pollinators and pollination (cesco.mnhn.fr).
Four information and data science groups are currently interacting with us, developing research that may eventually improve participant experience:
- Regine Vignes’s group (MNHN-Paris, xper3.fr/) is working on developing the identification key (spipoll.snv.jussieu.fr/mkey/mkey-spipoll.html), in particular by allowing the storage of all steps followed within the key during photo identification. A major issue is improving the accuracy of the key. It is likely that many more insect species or species groups could be identified from photos than the current 600 (e.g., some groups may be split into sub-groups if new criteria are found). However, it is currently very difficult to simultaneously capitalize on users’ experience to improve the key and modify the ontology of the key (learning process), as any addition of new criteria causes instability due to the already large number of terminal taxa.
- David Gross-Amblard’s group (Inria-Rennes, www-druid.irisa.fr) is working on developing algorithms that optimally allocate simple tasks to many people in order to solve complex problems. With respect to Spipoll, the complex problem is, in particular, validating insect identifications. Each participant has specific skills, acquires new ones during participation, and has some free time to devote to identification. Validation may be more or less urgent depending on ongoing projects. Insects themselves are more or less difficult to identify and, furthermore, participants have different experience with different insects. Validating photos will also affect a participant’s score (positively or negatively). On the one hand, increasing one’s score is a strong motivation for participation; on the other hand, it is more effective to avoid relying on participants with poor scores. Beyond this, other challenges related to validation may be addressed, such as collectively and robustly tagging a large number of photos on various criteria (e.g., coloration of honey bees, abrasion of insects’ wings, amount of pollen carried by insects, sex, etc.).
- The MMOS team (Switzerland, mmos.ch) is working on incorporating citizen science projects into the gaming universe, especially massively multiplayer online games. Beyond giving access to a wide community of possible participants, one aim of this collaboration is to explore how engagement in games and in citizen science may rely partly on the same motivations. MMOS made its name recently with the success of Project Discovery, the first implementation of their new approach, a collaboration with the Icelandic game developer CCP and the Swedish research project, the Human Protein Atlas. With Project Discovery, around 150,000 players of the massively multiplayer online game EVE Online have classified around 30 million microscopy images while immersed in their favorite video game, making this project a major achievement in citizen science. MMOS and the Spipoll team of MNHN Paris work closely together in the framework of the H2020 GAPARS consortium.
- Balazs Kegl’s group (Paris-Saclay, CNRS) is developing a crowdsourcing tool for prototyping machine learning solutions for automatic identification. Motivated by the shortcomings of traditional data challenges, they have developed a unique concept and platform, called Rapid Analytics and Model Prototyping (RAMP; ramp.studio), based on modularization and code submission. Open code submission allows participants to build on each other’s ideas, provides the organizers with a fully functioning prototype, and makes it possible to build complex machine learning workflows while keeping the contributions simple. First trials with a closed set of 18 species (including pairs of more or less similar species) and 60,000 pictures from the Spipoll database led to an impressive 95% rate of correct identifications.
We aim to integrate these different approaches over the next 3 years and wish first to allow an innovative research project to emerge at the interface of these different research groups.
We are seeking a candidate able to develop such a research project based on his/her own skills and on interactions with at least two of the above-mentioned groups. The selected project should be innovative, provide added value with respect to what the different groups may do based on their own skills, be designed to succeed within one year (the project may be extended by up to another year if necessary), and, above all, improve participant experience.
The selected candidate will be recruited as an employee of the Museum (Paris) (salary 1900-2200 € per month net of charges, depending on experience), ideally starting in September 2017. Depending on the project and skills, s/he may spend most of his/her time with one of our above-listed partners.
Any further questions can be addressed to Romain Julliard.
Project and CV should be sent by June 25th at the latest, in a single file, to firstname.lastname@example.org