Catalit LLC
SCIKIT-LEARN TUTORIAL
Francesco Mosconi
SF Data Mining Meetup @ Google Launchpad
May 2017
Data Weekends
BEFORE WE START
Download and install:
MINICONDA PYTHON 2.7
from here:
https://conda.io/miniconda.html
IN THIS WORKSHOP
• Recognize problems & choose right ML technique
• Load and manipulate data with Pandas
• Build classification model with Scikit-Learn
• Evaluate model performance with Scikit-Learn
ML TECHNIQUES
                CONTINUOUS    CATEGORICAL
SUPERVISED      REGRESSION    CLASSIFICATION
UNSUPERVISED                  CLUSTERING
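
The three techniques named in the table can be contrasted in a few lines. This is only a minimal sketch: the toy data and the particular estimators (LinearRegression, LogisticRegression, KMeans) are assumptions chosen for illustration, not taken from the workshop material.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data invented for this sketch: one feature, six samples.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Supervised + continuous target -> regression.
y_continuous = np.array([2.0, 4.0, 6.0, 20.0, 22.0, 24.0])  # y = 2 * x
regressor = LinearRegression().fit(X, y_continuous)
predicted_value = regressor.predict(np.array([[5.0]]))[0]

# Supervised + categorical target -> classification.
y_categorical = np.array([0, 0, 0, 1, 1, 1])
classifier = LogisticRegression().fit(X, y_categorical)
predicted_class = classifier.predict(np.array([[11.5]]))[0]

# Unsupervised (no target at all) -> clustering.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = clusterer.labels_

print(predicted_value, predicted_class, cluster_ids)
```

Note that the clustering step never sees a target: it only groups samples that lie close together, so the cluster numbers themselves carry no meaning.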
TYPES OF PROBLEMS
Sentiment Analysis
Heart Monitoring
Book recommendation
Caption generation
Human recognition
House price prediction
Document classification
Social Network Analysis
SCIKIT-LEARN
MODEL BUILDING
1. Collection
2. Processing
3. Model Building
4. Evaluation
5. Deployment
BENCHMARK
CLASSIFIERS
http://www.aboutdm.com/2013/04/history-of-machine-learning.html
PROCESSING
1. Collection
2. Processing
3. Model Building
4. Evaluation
5. Deployment
Transformer (input: X) -> Transformer (input: X') -> Estimator (input: X'', y)
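
The chain above maps naturally onto scikit-learn's Pipeline class. The following is a minimal sketch: the specific steps (StandardScaler, PCA, LogisticRegression) and the step names are assumptions picked for illustration, since the slide does not say which transformers or estimator were used.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundled copy of the Iris data, used here only as convenient sample input.
X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # Transformer: X  -> X'
    ("reduce", PCA(n_components=2)),  # Transformer: X' -> X''
    ("classify", LogisticRegression(max_iter=1000)),  # Estimator: fits on (X'', y)
])

# fit() runs fit_transform on each transformer in order, then fits the estimator.
pipeline.fit(X, y)

# predict() applies the already-fitted transformers, then the estimator.
predictions = pipeline.predict(X[:5])
print(predictions)
```

Wrapping the steps in one object means the same transformations are applied at prediction time as at training time, with parameters learned only from the data passed to fit().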
EVALUATION
1. Collection
2. Processing
3. Model Building
4. Evaluation
5. Deployment
CONFUSION MATRIX
• Accuracy: Overall, how often is it correct?
• (TP + TN) / total
                      Test Negative                     Test Positive
Condition Negative    TRUE NEGATIVE                     FALSE POSITIVE (Type I error)
Condition Positive    FALSE NEGATIVE (Type II error)    TRUE POSITIVE
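
As a worked example of the accuracy formula, the sketch below builds a confusion matrix with scikit-learn and recomputes (TP + TN) / total by hand. The label vectors are hypothetical, invented only for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels invented for illustration: 1 = positive, 0 = negative.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # actual condition
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]  # test result

# For binary 0/1 labels, rows are the actual condition and columns the
# prediction, so the layout is [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = matrix.ravel()

# Accuracy = (TP + TN) / total
accuracy = (tp + tn) / float(len(y_true))

print(matrix)
print(accuracy, accuracy_score(y_true, y_pred))
```

Here TN = 3, FP = 1, FN = 2 and TP = 4, so the accuracy is (4 + 3) / 10 = 0.7, which matches what accuracy_score returns.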
TRAIN-TEST SPLIT
All data available -> Training data + Testing data
Training data -> Train Model
Testing data -> Model -> Measure performance
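
The split above can be sketched with scikit-learn's train_test_split. The 80/20 ratio, the random seed, the stratified split and the decision tree classifier are all assumptions made for this example, not details from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# All data available: the bundled Iris data, 150 samples.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Train the model on the training data only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Measure performance on data the model has never seen.
test_accuracy = model.score(X_test, y_test)
print(len(X_train), len(X_test), test_accuracy)
```

Scoring on the held-out portion gives an estimate of how the model behaves on new data, which a score on the training data cannot provide.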
A TALE OF FLOWERS
https://en.wikipedia.org/wiki/Iris_flower_data_set
Iris Versicolor
Iris Virginica
BINARY CLASSIFICATION
            Sepal Length    Sepal Width    Petal Length    Petal Width    Type
Flower 1    6.2             3.4            5.4             2.3            Virginica
Flower 2    5.9             3.0            5.1             1.8            Virginica
Flower 3    7.0             3.2            4.7             1.4            Versicolor
Features: Sepal Length, Sepal Width, Petal Length, Petal Width
Labels: Type
Data Point: one row (one flower)
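
A table like this can be built with Pandas and reduced to the two species shown. This sketch assumes the copy of the Iris data bundled with scikit-learn rather than the workshop's own files, and the column names are made up for readability.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# One row per flower (data point), one column per feature.
# Column names are assumptions chosen for this sketch.
df = pd.DataFrame(
    iris.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
df["type"] = [iris.target_names[i] for i in iris.target]

# Keep only the two species from the slide to get a binary problem.
binary_df = df[df["type"].isin(["versicolor", "virginica"])]

# Features: the four measurements. Labels: 1 for virginica, 0 for versicolor.
X = binary_df.drop("type", axis=1).values
y = (binary_df["type"] == "virginica").astype(int).values

print(binary_df.head())
print(X.shape, y.sum())
```

The resulting X and y are plain NumPy arrays, which is the form scikit-learn estimators expect in fit().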
SUPERVISED LEARNING
http://www.realsafety.org/wp-content/uploads/2014/11/safety-supervisors-interaction.png
TUTORIAL
Code:
dataweekends.com/ml
THANK YOU
Data Weekends
Next Data Weekends Dates:
2-day Machine Learning: May 6-7
2-day Intro Deep Learning: May 20 - 21
2-day Advanced Deep Learning: Jun 3 - 4
2-day Intro Deep Learning: Jun 17 - 18
Intro to scikit learn may 2017