Learning from the Past:
     with Scikit-Learn
         ANOOP THOMAS MATHEW
               Profoundis Labs Pvt. Ltd.
Agenda
●   Basics of Machine Learning

●   Introduction to some common techniques

●   Let you know scikit-learn exists

●   Some inspiration on using machine
    learning in daily life scenarios and live
    projects.
How to draw a snake?
This is WEIRD!
Introduction


A lot of Data!

What to do???
Introduction

What is

             Machine Learning
                    (Data Mining)?
                         (in plain English)
Machine Learning

 "A computer program is said to learn from
experience E with respect to some class of tasks
T and performance measure P, if its performance
at tasks in T, as measured by P, improves with
experience E"
                                  Tom M. Mitchell
Machine Learning



● Supervised Learning - model.fit(X, y)
● Unsupervised Learning - model.fit(X)
Supervised Learning
For example ...
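from sklearn.linear_model import Ridge as RidgeRegression
from sklearn import datasets

# load_boston() was removed in scikit-learn 1.2; the bundled diabetes
# regression dataset is used here as a stand-in.
diabetes = datasets.load_diabetes()
X = diabetes.data      # feature matrix, shape (n_samples, n_features)
y = diabetes.target    # value to predict for each sample
clf = RidgeRegression()
clf.fit(X, y)          # supervised: learn from features and targets
clf.predict(X)         # predictions for the training samples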
Unsupervised Learning
For example ...


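from sklearn.cluster import KMeans
from numpy.random import RandomState

rng = RandomState(42)                             # fixed seed for reproducible clusters
k_means = KMeans(n_clusters=3, random_state=rng)
k_means.fit(X)                                    # unsupervised: X from the previous example, no y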
What can Scikit-learn do?



Clustering
Classification
Regression
Terminology

•   Model: the collection of parameters you are trying to fit
•   Data: what you are using to fit the model
•   Target: the value you are trying to predict with your model
•   Features: attributes of your data that will be used in prediction
•   Methods: algorithms that will use your data to fit a model
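The terms above map onto scikit-learn objects roughly as follows. This is a minimal sketch: the diabetes dataset and LinearRegression are assumptions chosen for illustration, not something the slides prescribe.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Assumption: the bundled diabetes dataset stands in for "your data".
dataset = load_diabetes()

X = dataset.data                        # data: one row per sample, one column per feature
y = dataset.target                      # target: the value we want to predict
feature_names = dataset.feature_names   # features: the attributes used for prediction

model = LinearRegression()              # method: the algorithm that fits the model
model.fit(X, y)

# model: after fitting, the learned parameters live on the estimator
print(len(feature_names), model.coef_.shape, model.intercept_)
```

There is one fitted coefficient per feature, which is what "the collection of parameters" means for a linear model.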
Steps for Analysis
●   Understand the task. See how to measure the
    performance.

●   Choose the source of training experience.

●   Decide what will be input and output.

●   Choose a set of models to approximate the output function.

●   Choose a learning algorithm.
Steps for Analysis
●   Understand the task. See how to measure the
    performance. Find the right question to ask.

●   Choose the source of training experience.
      ●   Keep training and testing datasets separate. Beware of overfitting!

●   Decide what will be the input and the expected output.

●   Choose a set of models to approximate the output
    function. (Use dimensionality reduction.)

●   Choose a learning algorithm. Try different ones ;)
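The steps above can be sketched in a few lines. This is only an illustration under assumptions: the iris dataset, the 70/30 split, PCA with 2 components, and the three candidate classifiers are all choices made here, not taken from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Input: flower measurements. Expected output: species label.
X, y = load_iris(return_X_y=True)

# Keep training and testing data separate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Try different learning algorithms.
candidates = {
    "svm": SVC(),
    "nearest neighbours": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # Dimensionality reduction: 4 features down to 2 components.
    pipeline = make_pipeline(PCA(n_components=2), estimator)
    pipeline.fit(X_train, y_train)
    # Performance measure: accuracy on training data vs held-out data.
    scores[name] = (
        pipeline.score(X_train, y_train),
        pipeline.score(X_test, y_test),
    )

for name, (train_acc, test_acc) in scores.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```

A training score much higher than the test score is a hint that the model is overfitting.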
Some Common Algorithms

Principal Component Analysis
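A minimal PCA sketch, assuming the iris dataset and 2 components (both picked here for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)     # 150 samples, 4 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # project onto the 2 directions of highest variance

print(X_2d.shape)
print(pca.explained_variance_ratio_)  # share of the variance kept by each component
```

The 2D projection is handy for plotting, or as a cheaper input for another model.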
Some Common Algorithms
●   Support Vector Machine
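A support vector classifier could be tried like this. It is a sketch under assumptions: the iris dataset, the RBF kernel and C=1.0 are illustrative defaults, not recommendations from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)        # C controls how strongly errors are penalised
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out samples
print(accuracy)
```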
Some Common Algorithms
●   Nearest Neighbour Classifier
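A nearest neighbour classifier predicts the majority label among the closest training samples. A small sketch, assuming the iris dataset and 5 neighbours (illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)             # "fitting" mostly stores the training samples

# Look at the 5 training samples closest to the first test sample.
distances, indices = knn.kneighbors(X_test[:1])
print(y_train[indices[0]])            # labels of those neighbours
print(knn.predict(X_test[:1]))        # majority vote among them

accuracy = knn.score(X_test, y_test)
print(accuracy)
```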
Some Common Algorithms
●   Decision Tree Learning
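A decision tree learns a set of if/else rules on the features, and the rules can be printed. A sketch assuming the iris dataset and a depth limit of 2 (chosen here to keep the output short):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Render the learned rules as plain text.
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
```

Limiting the depth is one simple way to keep a tree from memorising the training data.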
Some Common Algorithms

k-means clustering
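After fitting k-means, the cluster assignments and centres can be read from the estimator. A sketch on synthetic data; make_blobs, 3 clusters and n_init=10 are assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 150 points scattered around 3 centres.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X)

labels = k_means.labels_              # cluster index assigned to each sample
centers = k_means.cluster_centers_    # one centre per cluster

print(labels[:10])
print(centers)
```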
Some Common Algorithms

DBSCAN Clustering
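Unlike k-means, DBSCAN does not need the number of clusters up front and can follow non-round shapes. A sketch on the two-moons toy data; the eps and min_samples values are illustrative guesses for this dataset, not general defaults.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-circles, a shape k-means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5)   # eps: neighbourhood radius
labels = db.fit_predict(X)            # label -1 marks points treated as noise

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```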
Some Example Use Cases

● Log file analysis
● Outlier detection
● Fraud detection
● Forecasting
● User patterns
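As one example from the list above, outlier detection could be sketched with an isolation forest. Everything here is an assumption for illustration: the synthetic data, the three planted outliers, and the contamination value.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# 200 "normal" points around the origin plus 3 planted extreme points.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

# contamination: the expected share of outliers in the data (a guess here).
detector = IsolationForest(contamination=0.02, random_state=42)
flags = detector.fit_predict(X)       # +1 = inlier, -1 = outlier

print(X[flags == -1])
```

The same idea carries over to fraud detection or log analysis once the events are turned into numeric features.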
A few comments

●   nltk is a good (better) choice for text processing
●   scikit-learn is for medium-sized problems
●   for humongous projects, think of Mahout
●   matplotlib can be used for visualization
●   visualize it in the browser using d3.js
●   have a look at pandas for numerical analysis
Conclusion

● This is just the tip of the iceberg.
● Scikit-learn is really cool to hack with.
● A lot of examples:
  http://scikit-learn.org/stable/auto_examples/index.html
Final words

pip install scikit-learn

It's all on the internet.

Happy Hacking!

Pycon 2012 Scikit-Learn
