1. scikit-learn
Machine Learning in Python
Data Tuesday - Feb. 26 2013 - Paris
Sunday, 24 February 2013
2. • Library of Machine Learning models
• Simple fit / predict / transform API
• Python / NumPy / SciPy / Cython
& wrappers for libsvm / liblinear
• Model Assessment, Selection & Ensembles
• Some support for multi-core
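A minimal sketch of the fit / predict / transform convention mentioned above, written against a recent scikit-learn release. The toy data, the choice of `StandardScaler` and `LogisticRegression`, and all variable names are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data (an assumption for illustration): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Transformers learn their parameters in fit() and apply them in transform().
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Predictors learn in fit() and produce outputs in predict().
clf = LogisticRegression()
clf.fit(X_scaled, y)
y_pred = clf.predict(X_scaled)

train_accuracy = float(np.mean(y_pred == y))
print("training accuracy: %.2f" % train_accuracy)
```

Because every estimator exposes the same small set of methods, swapping one model for another usually only changes the line that constructs it.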
3. Possible Applications
• Text Classification / Sequence Tagging (NLP)
• Computer Vision / Robotics
• Learning To Rank - IR and advertising
• Statistical Analysis of the Brain: fMRI / MEG
• Astronomy, Biology, Social Sciences...
7. Example:
Training a Model for
Face Recognition
8. Total dataset size:
n_samples: 1288, n_features: 1850, n_classes: 7
Extracting the top 150 eigenfaces from 966 faces
done in 0.466s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.056s
Fitting the SVM classifier to the training set
done in 18.549s
Predicting people's names on the test set
done in 0.062s
                    precision  recall  f1-score  support
Ariel Sharon             0.90    0.75      0.82       12
Colin Powell             0.78    0.94      0.85       62
Donald Rumsfeld          0.86    0.72      0.78       25
George W Bush            0.89    0.96      0.92      141
Gerhard Schroeder        0.92    0.74      0.82       31
Hugo Chavez              0.90    0.53      0.67       17
Tony Blair               0.81    0.74      0.77       34
avg / total              0.86    0.86      0.86      322
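The log above follows a project / fit / predict / report sequence, which could be sketched roughly as below. This is not the script that produced the log. As an assumption, it substitutes the small digits dataset bundled with scikit-learn for the face images so that it runs without a download. The number of components (30) and the SVM settings are illustrative, and the imports target a recent scikit-learn release.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images instead of face images.
digits = load_digits()
X, y = digits.data, digits.target

# Hold out 25% of the samples, matching the 966 / 322 proportion in the log.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Learn an orthonormal basis of principal components on the training set only.
n_components = 30  # illustrative value; the log above uses 150
pca = PCA(n_components=n_components, whiten=True).fit(X_train)

# Project both splits onto that basis.
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Fit an SVM classifier on the projected training data.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train_pca, y_train)

# Predict on the held-out set and print per-class precision / recall / f1.
y_pred = clf.predict(X_test_pca)
report = classification_report(y_test, y_pred)
print(report)
```

Fitting the PCA on the training split only keeps information from the test images out of the learned basis, so the reported scores stay an honest estimate.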
11. Contributors
• GitHub-centric contribution workflow
• each pull request needs 2 x [+1] reviews
• code + tests + doc + example
• 92% test coverage / Continuous Integration
• 4 major releases per year + 4 bugfix releases
• 66 contributors for release 0.13
12. Users
• We support users on Stack Overflow & the mailing list
• 200+ questions tagged with [scikit-learn]
• Many competitors + benchmarks
• 500+ answers on ongoing user survey
• 60% academics / 40% from industry
• Some data-driven startups use sklearn
13. Thank you!
• http://scikit-learn.org - Main Project + doc
• @ogrisel on twitter
• http://ogrisel.com - ML Consultancy (soon)
15. Caveat Emptor
• Domain specific tooling kept to a minimum
• Some feature extraction for Bag of Words Text Analysis
• Some functions for extracting image patches
• Domain integration is the responsibility of the user or 3rd party libraries
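As a rough illustration of the two domain helpers named above, the sketch below builds a bag-of-words matrix with `CountVectorizer` and cuts a small array into patches with `extract_patches_2d`. The two example sentences and the 4x4 "image" are made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.feature_extraction.text import CountVectorizer

# Bag of words: each document becomes a row of token counts.
docs = ["the cat sat on the mat", "the dog sat on the log"]  # made-up corpus
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(docs)  # sparse matrix, one column per token
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(X_text.toarray())

# Image patches: every 2x2 window of a made-up 4x4 grayscale "image".
image = np.arange(16, dtype=float).reshape(4, 4)
patches = extract_patches_2d(image, (2, 2))
print(patches.shape)
```

Anything beyond this, such as reading image files or tokenizing a specific language, is left to the user or to other libraries, in line with the caveat above.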