Kaggle hosts data science competitions where participants use machine learning techniques to solve problems by analyzing datasets. Competitions have objective criteria for judging entries and have uncovered insights, like a student outperforming algorithms in predicting dark matter. Kaggle has over 60,000 data scientists as users who apply different machine learning methods to the hosted competitions.
8. Kaggle’s Dark Matter Competition
on the White House blog
“The world’s brightest physicists
have been working for decades on
solving one of the great unifying
problems of our universe”
“In less than a week, Martin
O’Leary, a PhD student in
glaciology, outperformed
the state-of-the-art algorithms”
12. Users apply different techniques
• neural networks • genetic algorithms
• logistic regression • random forest
• support vector machine • Monte Carlo methods
• decision trees • principal component analysis
• ensemble methods • Kalman filter
• AdaBoost • evolutionary fuzzy modeling
• Bayesian networks
13. EXAMPLE ESSAY QUESTION
We all understand the benefits of laughter. For
example, someone once said, “Laughter is the
shortest distance between two people.”
Many other people believe that laughter is an
important part of any relationship. Tell a true story in
which laughter was one element or part.
14. “Have you ever experienced a time
with your friends or family where you
laughed so hard your stomach hurt,
and your eyes were filled with tears?
Laughing is something every person
needs. A great laugh can make a persons day
and put a smile on their face. If no one
laughed the world would be a terribly
sad place. My friends and I are always
laughing, to the point where were
rolling on the ground, clutching our
stomachs laughing.”
Automated results by the winning algorithm are
as reliable as manual assessment by teachers.
17. Diabetes & Obesity & Hypertension & High Cholesterol
Probability of going to hospital
in the next six months
21. What could the world’s best
analysts find in your data?
e-mail a@kaggle.com
phone +1 650 283 9781
Editor's notes
How does data science differ from econometrics?
Companies post their problem, their data and a prize, and our 38,000 data scientists compete to produce the best solution.
Players’ algorithms are back-tested in real time, so we can show how people are performing on a live leaderboard. The live leaderboard accounts for a large part of Kaggle’s success, as people are motivated to outperform each other (which catalyzes better performance than individuals developing a model in isolation).
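The back-testing and leaderboard idea in the note above can be illustrated with a minimal sketch. It assumes each submission is a list of predicted labels scored against a held-out solution by simple accuracy, and that each participant keeps their best score. The function names, the metric, and the sample data are all hypothetical and are not taken from Kaggle's actual system.

```python
def score(submission, solution):
    """Fraction of held-out labels predicted correctly (assumed metric)."""
    if len(submission) != len(solution):
        raise ValueError("submission and solution must have the same length")
    correct = sum(1 for s, t in zip(submission, solution) if s == t)
    return correct / len(solution)


def leaderboard(submissions, solution):
    """Return (name, best_score) pairs, best score first.

    Each participant is ranked by their best submission so far;
    ties are broken alphabetically by name.
    """
    best = {}
    for name, predictions in submissions:
        s = score(predictions, solution)
        if name not in best or s > best[name]:
            best[name] = s
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical held-out answers and submissions, for illustration only.
solution = [1, 0, 1, 1, 0]
submissions = [
    ("alice", [1, 0, 0, 1, 0]),  # 4 of 5 correct
    ("bob", [1, 1, 0, 0, 0]),    # 2 of 5 correct
    ("alice", [1, 0, 1, 1, 0]),  # 5 of 5 correct, replaces her earlier score
]
board = leaderboard(submissions, solution)
for rank, (name, best_score) in enumerate(board, start=1):
    print(rank, name, best_score)
```

Re-running `leaderboard` after every new submission is what would make such a board "live": a participant sees their rank change as soon as someone else improves.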
From many different (maths-related) disciplines
Users have the option to tell us their favourite techniques
Outbound        Within 1 min   Within 5 min
Overall             62.1           87.9
Peak hour           12.8           57.4
PH next hour        14.8           63.0

Inbound predictions tend to be far more accurate. 34.5 per cent of inbound peak-hour predictions made one hour ahead are correct within one minute.

Inbound         Within 1 min   Within 5 min
Overall             69.3           92.1
Peak hour           37.8           77.8
PH next hour        34.5           79.3
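One plausible reading of the "within 1 min" and "within 5 min" figures above is the percentage of predictions whose absolute error is no larger than the stated tolerance. The sketch below computes that share under this assumption. The function name and the sample values are hypothetical and are not the competition's data.

```python
def share_within(predicted, actual, tolerance_min):
    """Percentage of predictions whose absolute error is <= tolerance_min."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    hits = sum(
        1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance_min
    )
    return 100.0 * hits / len(predicted)


# Hypothetical predicted and observed values in minutes, for illustration only.
predicted = [10.0, 12.5, 7.0, 20.0]
actual = [10.5, 15.0, 7.0, 27.0]

within_1 = share_within(predicted, actual, 1.0)  # absolute errors: 0.5, 2.5, 0.0, 7.0
within_5 = share_within(predicted, actual, 5.0)
print(within_1, within_5)
```

Because a prediction that is within one minute is also within five, the five-minute share can never be lower than the one-minute share, which matches the pattern in every row of the figures above.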
• 6000 molecules (anonymized)
• 1700 structural descriptors
• Objective of prediction: Biological Response (mutagenicity), indicated as 1/0
• Many other biological responses can be modeled using the same approach
• Exceeded expectations within 2 weeks
• Extensible to other compound properties: mutagenicity, hepatotoxicity, solubility, PK/PD etc.
Could predict whether a used car would be a lemon with approximately 47% accuracy.