HiggsML

The HiggsML challenge was a machine learning (ML) challenge to optimize the discovery potential for the Higgs boson. In high-energy physics (HEP), especially at the Large Hadron Collider (LHC), a complex software pipeline reduces petabytes of data to final measurements. Machine learning (neural nets and boosting, known within HEP as multivariate analysis) has been used in this pipeline since the nineties. However, it became clear that the tools used in the HEP community were obsolete and that machine learning was not being exploited to its full potential.

The HiggsML challenge was organized by a collaboration of three physicists from the ATLAS experiment at the LHC at CERN and three machine learning specialists (five of the six were from Saclay). The challenge was funded largely by CDS, and to a lesser extent by Google and INRIA. It was run on Kaggle, the best-known data challenge platform. For the first time, the ATLAS collaboration released simulated Higgs events (both signal and background). Participants were asked to submit a classifier that maximized the significance of the Higgs boson search in the difficult tau-tau channel. The challenge ran from May to September 2014. It was a remarkable success, with more than 2000 participants in 1785 teams, making it the largest challenge on Kaggle at that time. The winner beat the significance of the HEP in-house tool (called TMVA) by 20%. We also awarded a special “HEP meets ML” prize to the author of the XGBoost library, which has since become a de facto industry standard in ML.

The most important outcome of the challenge was the momentum it generated in both the ML and HEP communities, locally and internationally.
