Basic machine learning background
with Python scikit-learn
Usman Roshan
Python
• An object-oriented, interpreted
programming language
• Basic essentials
– Data types are numbers, strings, lists, and dictionaries
(hash-tables)
– For loops and conditionals
– Functions
– Lists and hash-tables are references (like pointers in C)
– All arguments are passed by value; for lists and hash-tables that value is a reference, so a function can modify the caller's object
Python
• Simple Python programs
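A minimal sketch of the essentials listed on the Python slide (lists, dictionaries, loops, conditionals, functions, and reference behaviour). The function and variable names are illustrative and do not come from the original slides.

```python
def append_item(items, value):
    # `items` refers to the same list object as the caller's variable,
    # so an in-place change is visible outside the function.
    items.append(value)


def rebind(items):
    # Rebinding the local name does not change the caller's variable.
    items = [0]
    return items


numbers = [1, 2, 3]
alias = numbers            # both names refer to one list object
append_item(numbers, 4)    # mutates the shared list
rebind(numbers)            # leaves `numbers` untouched

counts = {}                # dictionary (hash table)
for word in ["svm", "pca", "svm"]:
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

print(numbers, alias is numbers, counts)
```

The append is visible through both names because they share one object, while the rebinding inside `rebind` stays local.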
Data science
• Data science
– Simple definition: reasoning and making decisions from
data
– Contains machine learning, statistics, algorithms,
programming, big data
• Basic machine learning problems
– Classification
• Linear methods
• Non-linear
– Feature selection
– Clustering (unsupervised learning)
– Visualization with PCA
Python scikit-learn
• Popular machine learning toolkit in Python
http://scikit-learn.org/stable/
• Requirements
– Anaconda
– Available from
https://www.continuum.io/downloads
– Includes numpy, scipy, and scikit-learn (former
two are necessary for scikit-learn)
Data
• We think of data as vectors in a fixed
dimensional space
• For example
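As one illustration of data as vectors in a fixed-dimensional space, the sketch below stores each data point as a row of a numpy array. The numbers and labels are made up for the example and are not taken from the slides.

```python
import numpy as np

# Hypothetical data: 4 points, each a vector in a 3-dimensional space.
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])
y = np.array([0, 0, 1, 1])     # hypothetical class label for each row

n_points, n_dims = X.shape     # number of vectors and their dimension
mean_vector = X.mean(axis=0)   # per-dimension mean of the data
print(n_points, n_dims, mean_vector)
```

This row-per-point layout (shape `(n_points, n_dims)`) is the one scikit-learn estimators expect.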
Classification
• Widely used task: given data determine the
class it belongs to. The class will lead to a
decision or outcome.
• Used in many different places:
– DNA and protein sequence classification
– Insurance
– Weather
– Experimental physics: Higgs Boson determination
Linear models
• We think of a linear model
as a hyperplane in space
• The margin is the
minimum distance from the
hyperplane to any data point
(misclassified points have negative distance)
• The support vector
machine is the hyperplane
with largest margin
Support vector machine:
optimally separating hyperplane
In practice we allow for error terms in case there is no separating hyperplane.
$$\min_{w,\,w_0,\,\xi} \left( \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \right) \quad \text{s.t.} \quad y_i (w^T x_i + w_0) \ge 1 - \xi_i, \quad \xi_i \ge 0$$
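To make the soft-margin objective concrete, here is a small sketch that evaluates it for a fixed hyperplane. It assumes labels in {-1, +1} and uses the smallest slack that satisfies both constraints, which is max(0, 1 - y_i (w^T x_i + w0)). The data and the function names are hypothetical.

```python
import numpy as np


def soft_margin_objective(w, w0, X, y, C):
    """Return (objective, slacks) for a fixed hyperplane; y must be in {-1, +1}."""
    scores = X @ w + w0
    # Smallest slack with y_i (w^T x_i + w0) >= 1 - slack_i and slack_i >= 0.
    slacks = np.maximum(0.0, 1.0 - y * scores)
    objective = 0.5 * np.dot(w, w) + C * slacks.sum()
    return objective, slacks


def margin(w, w0, X, y):
    """Minimum signed distance of any point to the hyperplane."""
    return np.min(y * (X @ w + w0) / np.linalg.norm(w))


# Hypothetical 2-D data; the last point lies on the wrong side.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 0.0])
w0 = 0.0

obj, slacks = soft_margin_objective(w, w0, X, y, C=1.0)
print(obj, slacks, margin(w, w0, X, y))
```

Only the misclassified point has a nonzero slack, and the margin is negative because that point sits on the wrong side of the hyperplane.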
Optimally separating hyperplane
with errors
SVM on simple data
• Run SVM on example data shown earlier
• Solid line is SVM and dashed indicates margin
SVM in scikit-learn
• Datasets are taken from the UCI machine
learning repository
• Learn an SVM model on training data
• Which parameter settings?
– C: tradeoff between error and model complexity
(margin)
– max_iter: maximum number of iterations of the optimization algorithm
• Predict on test data
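The workflow above (learn on training data, choose C and max_iter, predict on test data) might look like the sketch below. It is not the program from the slides: it assumes the copy of the UCI breast cancer (Wisconsin diagnostic) data bundled with scikit-learn, and the 70/30 split, feature scaling, and parameter values are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: scikit-learn's bundled breast cancer dataset stands in for the UCI file.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# C trades off training error against margin; max_iter caps optimizer iterations.
model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

Scaling the features first is an addition here; it tends to help the linear SVM solver converge within max_iter.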
SVM in scikit-learn
Analysis of SVM program on
breast cancer data
Non-linear classification
• In practice some datasets may not be linearly separable.
• Remember this may not be a big deal because the
test error is more important than the training error
Non-linear classification
• Neural networks
– Create a new representation of the data where it
is linearly separable
– Large networks lead to deep learning
• Decision trees
– Use several linear hyperplanes arranged in a tree
– Ensembles of decision trees are state of the art
such as random forest and boosting
Decision tree
From Alpaydin, 2010
Combining classifiers by bagging
• A single decision tree can overfit the data and
generalize poorly (high error on test data).
Bagging mitigates this
• Bagging
– Randomly sample the training data by bootstrapping
– Train classifier Ci on the sampled data
– Go to step 1 and repeat m times
– For the final classifier, output the majority vote
• Tree bagging works the same way
– Compute decision trees on bootstrapped datasets
– Return majority vote
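The bagging steps above can be sketched by hand with decision trees as the base classifier. The dataset (scikit-learn's bundled breast cancer data), the value of m, and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

rng = np.random.default_rng(0)
m = 25  # number of bootstrap rounds (odd, so a binary vote cannot tie)
votes = np.zeros((m, len(X_test)), dtype=int)

for i in range(m):
    # Bootstrap: draw n training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    votes[i] = tree.predict(X_test)

# Majority vote over the m classifiers (labels are 0 or 1).
bagged_pred = (votes.sum(axis=0) > m / 2).astype(int)
bagged_accuracy = (bagged_pred == y_test).mean()
print(bagged_accuracy)
```

scikit-learn also provides `BaggingClassifier`, which wraps the same idea; the explicit loop is used here only to mirror the steps.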
Variance reduction by voting
• What is the variance of the output of k
classifiers?
• Thus we want classifiers to be independent to
minimize variance
• Given independent binary classifiers each with
accuracy > ½, the majority vote accuracy increases
as we increase the number of classifiers (Hansen
and Salamon, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 1990)
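The majority-vote claim can be checked with a short calculation. Under the stated independence assumption, the number of correct classifiers follows a binomial distribution, so the vote is correct whenever more than half of them are. The function name and the accuracy 0.6 are illustrative.

```python
from math import comb


def majority_vote_accuracy(p, k):
    """Probability that more than half of k independent classifiers,
    each correct with probability p, are correct (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("use an odd k so the vote cannot tie")
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j)
        for j in range(k // 2 + 1, k + 1)
    )


accuracies = {k: majority_vote_accuracy(0.6, k) for k in (1, 3, 11, 51)}
for k, acc in accuracies.items():
    print(k, round(acc, 4))
```

With a single classifier the accuracy is 0.6, with three it is 0.648, and it keeps rising as k grows. The gain depends on independence, which real classifiers trained on the same data only approximate.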
Random forest
• In addition to sampling datapoints (feature
vectors) we also sample features (to increase
independence among classifiers)
• Compute many decision trees and output
majority vote
• Can also rank features
• An alternative to bagging is to select datapoints
with probabilities that change as the
algorithm runs (called boosting)
Decision tree and random forest in
scikit-learn
• Learn a decision tree and random forest on
training data
• Which parameter settings?
– Decision tree:
• Depth of tree
– Random forest:
• Number of trees
• Percentage of columns
• Predict on test data
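A minimal sketch of these steps, assuming scikit-learn's bundled breast cancer data and illustrative parameter values. Here `max_depth` is the depth of the tree, `n_estimators` the number of trees, and a float `max_features` the fraction of columns considered at each split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Decision tree: limit the depth to control overfitting.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Random forest: 100 trees, each split considers 30% of the columns.
forest = RandomForestClassifier(n_estimators=100, max_features=0.3, random_state=0)
forest.fit(X_train, y_train)

tree_accuracy = tree.score(X_test, y_test)
forest_accuracy = forest.score(X_test, y_test)

# Feature ranking from the forest, most important column first.
ranking = forest.feature_importances_.argsort()[::-1]
print(tree_accuracy, forest_accuracy, ranking[:5])
```

The `feature_importances_` attribute is one way to obtain the feature ranking that random forests offer.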
Decision tree and random forest in
scikit-learn
Data projection
• What is the mean and variance here?
Data projection
• Which line maximizes variance?
Principal component analysis
• Find vector w of length 1 that maximizes
variance of projected data
PCA optimization problem
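$$\arg\max_{w} \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - w^T m)^2 \quad \text{subject to } w^T w = 1$$
The optimization criterion can be rewritten as
$$
\begin{aligned}
&\arg\max_{w} \frac{1}{n} \sum_{i=1}^{n} \left(w^T (x_i - m)\right)^2 \\
=\; &\arg\max_{w} \frac{1}{n} \sum_{i=1}^{n} \left(w^T (x_i - m)\right)^T \left(w^T (x_i - m)\right) \\
=\; &\arg\max_{w} \frac{1}{n} \sum_{i=1}^{n} \left((x_i - m)^T w\right)\left(w^T (x_i - m)\right) \\
=\; &\arg\max_{w} \frac{1}{n} \sum_{i=1}^{n} w^T (x_i - m)(x_i - m)^T w \\
=\; &\arg\max_{w} \; w^T \left( \frac{1}{n} \sum_{i=1}^{n} (x_i - m)(x_i - m)^T \right) w \\
=\; &\arg\max_{w} \; w^T \Sigma w \quad \text{subject to } w^T w = 1
\end{aligned}
$$
where $\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)(x_i - m)^T$.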
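The final form of the criterion can be checked numerically. A standard result, not stated on the slide, is that the unit vector maximizing w^T Σ w is the eigenvector of Σ with the largest eigenvalue. The sketch below uses made-up 2-D data and confirms that the variance of the projected data equals that eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data, stretched along the first axis.
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

m = X.mean(axis=0)
centered = X - m
Sigma = centered.T @ centered / len(X)   # (1/n) sum (x_i - m)(x_i - m)^T

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
w = eigenvectors[:, -1]                  # unit eigenvector, largest eigenvalue

projected_variance = np.var(centered @ w)   # (1/n) sum (w^T (x_i - m))^2

# Compare with an arbitrary other unit vector.
other = np.array([1.0, 1.0]) / np.sqrt(2.0)
other_variance = np.var(centered @ other)
print(projected_variance, eigenvalues[-1], other_variance)
```

Any other unit direction, such as `other`, gives a projected variance no larger than the one obtained with `w`.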
Dimensionality reduction and visualization with PCA
Dimensionality reduction and visualization with PCA
PCA plot of breast cancer data (output of program in previous slide)
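A sketch of how such a two-dimensional PCA projection might be computed with scikit-learn. It is not the program from the slides; it assumes scikit-learn's bundled breast cancer dataset and leaves the plotting itself out.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)

# Project the 30-dimensional feature vectors onto the top 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance captured by each component.
explained = pca.explained_variance_ratio_
print(X_2d.shape, explained)
```

Plotting the two columns of `X_2d` against each other, colored by `y`, would give a picture of the kind described on the slide.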
Unsupervised learning - clustering
• K-means: popular, fast algorithm for clustering
data
• Objective: find k clusters that minimize the sum of
squared Euclidean distances of points in each cluster to
their centers (means), shown here for k = 2
$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2$$
K-means algorithm for two clusters
Input: $x_i \in \mathbb{R}^d,\ i = 1 \dots n$
Algorithm:
1. Initialize: assign xi to C1 or C2 with equal probability and compute
means:
$$m_1 = \frac{1}{|C_1|} \sum_{x_i \in C_1} x_i \qquad m_2 = \frac{1}{|C_2|} \sum_{x_i \in C_2} x_i$$
2. Recompute clusters: assign xi to C1 if ||xi-m1||<||xi-m2||, otherwise
assign to C2
3. Recompute means m1 and m2
4. Compute objective
$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2$$
5. Compute objective of new clustering. If difference is smaller than
$\delta$ then stop, otherwise go to step 2.
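The two-cluster algorithm above can be sketched in numpy as follows. The function names, the default threshold, the round limit, the guards against an empty cluster, and the test data are all assumptions added for the example.

```python
import numpy as np


def objective(X, labels, means):
    """Sum of squared distances of each point to its cluster mean."""
    return float(sum(((X[labels == i] - means[i]) ** 2).sum() for i in range(2)))


def compute_means(X, labels, fallback):
    """Mean of each cluster; an empty cluster keeps its fallback mean."""
    return np.array([
        X[labels == i].mean(axis=0) if np.any(labels == i) else fallback[i]
        for i in range(2)
    ])


def two_means(X, delta=1e-6, seed=0, max_rounds=100):
    rng = np.random.default_rng(seed)
    # Step 1: random assignment with equal probability, then means.
    labels = rng.integers(0, 2, size=len(X))
    if labels.min() == labels.max():      # guard: keep both clusters non-empty
        labels[0] = 1 - labels[0]
    means = compute_means(X, labels, fallback=X[:2])
    previous = objective(X, labels, means)
    for _ in range(max_rounds):
        # Step 2: assign to C1 (label 0) only if strictly closer to m1.
        d1 = np.linalg.norm(X - means[0], axis=1)
        d2 = np.linalg.norm(X - means[1], axis=1)
        labels = np.where(d1 < d2, 0, 1)
        # Step 3: recompute the means.
        means = compute_means(X, labels, fallback=means)
        # Steps 4 and 5: stop when the objective improves by less than delta.
        current = objective(X, labels, means)
        if previous - current < delta:
            break
        previous = current
    return labels, means, current


# Hypothetical data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, means, final_objective = two_means(X)
print(means, final_objective)
```

The objective never increases from one round to the next, so the loop stops either when the improvement falls below `delta` or after `max_rounds` rounds.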
K-means in scikit-learn
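A sketch of what a k-means run in scikit-learn might look like. It is not the program from the slides; it assumes the bundled breast cancer dataset, two clusters, and uses the adjusted Rand index as one possible way to compare cluster labels with the true labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import adjusted_rand_score

X, y = load_breast_cancer(return_X_y=True)

# Two clusters; n_init restarts guard against a poor random initialization.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Agreement with the true labels (1.0 is a perfect match, about 0 is chance).
agreement = adjusted_rand_score(y, cluster_labels)
print(kmeans.cluster_centers_.shape, agreement)
```

The true labels `y` are used only for the comparison; the clustering itself never sees them.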
K-means PCA plot in scikit-learn
K-means PCA plot in scikit-learn
PCA plot of breast cancer data
colored by true labels
PCA plot of breast cancer data
colored by k-means labels
Conclusion
• We saw basic data science and machine learning
tasks in Python scikit-learn
• Can we handle very large datasets in Python
scikit-learn? Yes
– For space, use numpy arrays, which store a char in 1 byte and a
32-bit float or int in 4 bytes. Plain Python lists use more space
because each element is a separate Python object
– For speed, use stochastic gradient descent in scikit-learn
(it has no built-in mini-batching, though partial_fit accepts
batches you supply) and mini-batch k-means
• For deep learning: Keras
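The space and speed points can be illustrated with a short sketch. The data are synthetic and the batch size is arbitrary. The first part compares the memory of a float32 numpy array with a list of Python floats; the second feeds batches by hand to `SGDClassifier.partial_fit`, which is one way to train on data too large to process at once.

```python
import sys

import numpy as np
from sklearn.linear_model import SGDClassifier

# Space: 4 bytes per float32 element versus one Python object per list element.
values = [float(i) for i in range(1000)]
arr32 = np.array(values, dtype=np.float32)
array_bytes = arr32.nbytes
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(array_bytes, list_bytes)

# Speed: stochastic gradient descent, with batches supplied manually.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic, linearly separable labels

clf = SGDClassifier(random_state=0)
for start in range(0, len(X), 200):
    batch = slice(start, start + 200)
    # `classes` must be given so every batch knows the full label set.
    clf.partial_fit(X[batch], y[batch], classes=np.array([0, 1]))

train_accuracy = clf.score(X, y)
print(train_accuracy)
```

`MiniBatchKMeans` in `sklearn.cluster` plays the corresponding role for clustering.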
