Machine Learning with R
Barbara Fusinska
@BasiaFusinska
About me
Programmer
Machine Learning
Data Solutions Architect
@BasiaFusinska
https://github.com/BasiaFusinska/MachineLearningWithR
Agenda
• What’s Machine Learning?
• Exploratory Data Analysis
• Classification
• Clustering
• Regression
Setup
• Install R: https://www.r-project.org/
• Install RStudio: https://www.rstudio.com/
• GitHub repository: https://github.com/BasiaFusinska/MachineLearningWithR
• Packages
Machine Learning?
Movies Genres
Title             # Kisses   # Kicks   Genre
Taken             3          47        Action
Love story        24         2         Romance
P.S. I love you   17         3         Romance
Rush hours        5          51        Action
Bad boys          7          42        Action
Question: What is the genre of Gone with the wind?
Data-based classification
Id   Feature 1   Feature 2   Class
1.   3           47          A
2.   24          2           B
3.   17          3           B
4.   5           51          A
5.   7           42          A
Question: What is the class of the entry with the following features: F1: 31, F2: 4?
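Data Visualization
[Figure: scatter plot of the entries with a separating line; axis ticks 0 to 50 (horizontal) and 0 to 60 (vertical); regions labelled A and B]
Rule 1: If on the left side of the line then Class = A
Rule 2: If on the right side of the line then Class = B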
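The two rules amount to a one-line test. The sketch below is in plain Python rather than the R used in the talk. It assumes a hypothetical separating line, Feature 2 = Feature 1, because the slide draws the line without giving its equation. Under that assumption, points above the diagonal count as being on the left side.

```python
# Entries from the "Data-based classification" table: (feature1, feature2, class).
ENTRIES = [(3, 47, "A"), (24, 2, "B"), (17, 3, "B"), (5, 51, "A"), (7, 42, "A")]


def classify(f1, f2):
    """Apply Rule 1 / Rule 2 with the ASSUMED line feature2 = feature1.

    The line is a hypothetical choice for illustration; the slide only draws it.
    """
    # A point with feature2 > feature1 lies above (to the left of) the diagonal.
    return "A" if f2 > f1 else "B"


# Check the rule against the labelled entries, then answer the slide's question.
training_hits = sum(classify(f1, f2) == label for f1, f2, label in ENTRIES)
query_class = classify(31, 4)
print(training_hits, "of", len(ENTRIES), "entries matched; F1: 31, F2: 4 ->", query_class)
```

Any line that separates the two groups would give the same answer for this query; the diagonal is just the simplest one to write down.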
Chick sexing
Supervised learning
• Classification, regression
• Label, target value
• Training & Validation phases
Unsupervised learning
• Clustering, feature selection
• Finding structure of data
• Statistical values describing the data
Supervised Machine Learning workflow
[Workflow diagram; boxes: Clean data, Preprocess data, Data split, Training data, Test data, Machine Learning algorithm, Trained model, Score]
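The data split step of the workflow could look roughly like the sketch below, written in plain Python rather than R. The function name `train_test_split`, the 30% test fraction and the fixed seed are illustrative assumptions, not values from the talk.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten cleaned, preprocessed records
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The model is fitted on `train` only; `test` is held back so the score reflects data the model has not seen.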
Publishing the model
[Diagram; Model Training: Training data, Machine Learning Model, Publish model. Prediction: Test stream, Published Machine Learning Model, Scores]
Exploratory Data Analysis
Demo
Classification problem
Model training
Data & Labels
Classification data
Source       #Links   #Characters   …   Fake
TopNews      10       2750          …   T
Twitter      2        120           …   F
TopNews      235      502           …   F
Channel X    1530     3024          …   T
Twitter      24       70            …   F
StoryLeaks   722      1408          …   T
Facebook     98       230           …   T
…            …        …             …   …
Features: Source, #Links, #Characters, …
Labels: Fake
Task: Iris EDA
• Descriptive statistics (dimensions, rows, columns, data types, correlation)
• Data visualization (distributions, outliers)
• Feature distributions & class separation
• 2D visualisation
http://archive.ics.uci.edu/ml/datasets/Iris
K-Nearest Neighbours Algorithm
• An object is classified by a majority vote of its k nearest neighbours
• k: algorithm parameter (the number of neighbours that vote)
• Distance metrics: Euclidean (continuous variables), Hamming (text)
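The majority vote can be sketched in a few lines of plain Python (not the R used in the talk). The data are the five entries from the "Data-based classification" slide, the query is the F1: 31, F2: 4 entry from the same slide, and k = 3 is an arbitrary choice for illustration.

```python
import math
from collections import Counter

# Labelled entries: ((feature1, feature2), class).
ENTRIES = [((3, 47), "A"), ((24, 2), "B"), ((17, 3), "B"), ((5, 51), "A"), ((7, 42), "A")]


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, point, k=3):
    """Return the most common class among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prediction = knn_predict(ENTRIES, (31, 4), k=3)
print("Predicted class:", prediction)
```

With k = 3 the two class B entries are the closest neighbours, so they outvote the single class A entry that completes the neighbourhood.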
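Naïve Bayes classifier
$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}, \qquad \mathbf{x} = (x_1, \dots, x_n)$$
• posterior: $p(C_k \mid x_1, \dots, x_n)$
• prior: $p(C_k)$
• likelihood: $p(\mathbf{x} \mid C_k)$
• evidence: $p(\mathbf{x})$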
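Naïve Bayes example
Sex      Height   Weight   Foot size
Male     6        190      11
Male     6.2      170      10
Female   5        130      6
…        …        …        …

Sample to classify:
Sex   Height   Weight   Foot size
?     5.9      140      8

$$p(\text{male} \mid \mathbf{x}) = \frac{p(\text{male})\, p(5.9 \mid \text{male})\, p(140 \mid \text{male})\, p(8 \mid \text{male})}{\text{evidence}}$$
$$\text{evidence} = p(\text{male})\, p(5.9 \mid \text{male})\, p(140 \mid \text{male})\, p(8 \mid \text{male}) + p(\text{female})\, p(5.9 \mid \text{female})\, p(140 \mid \text{female})\, p(8 \mid \text{female})$$
$$p(\text{female} \mid \mathbf{x}) = \frac{p(\text{female})\, p(5.9 \mid \text{female})\, p(140 \mid \text{female})\, p(8 \mid \text{female})}{\text{evidence}}$$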
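The arithmetic of the example can be sketched as below, in plain Python rather than R. The slide's table is truncated, so the per-feature likelihoods cannot be estimated from it. The numbers used here are made up purely to show how the prior, the likelihoods and the evidence combine.

```python
def posteriors(priors, likelihoods):
    """Return p(class | x) for each class under the naive independence assumption.

    priors: {class: p(class)}
    likelihoods: {class: [p(x_1 | class), p(x_2 | class), ...]}
    """
    numerators = {}
    for cls, prior in priors.items():
        numerator = prior
        for value in likelihoods[cls]:
            numerator *= value  # the features are treated as independent given the class
        numerators[cls] = numerator
    # The evidence is the sum of the numerators, so the posteriors add up to 1.
    evidence = sum(numerators.values())
    return {cls: numerator / evidence for cls, numerator in numerators.items()}


# Illustrative, made-up values (NOT computed from the slide's table).
priors = {"male": 0.5, "female": 0.5}
likelihoods = {"male": [0.2, 0.1, 0.1], "female": [0.4, 0.5, 0.5]}
result = posteriors(priors, likelihoods)
print(result)
```

Because the evidence is the same for every class, the predicted class can be read from the numerators alone; dividing by it only rescales them into probabilities.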
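Logistic regression
$$z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
$$y = \begin{cases} 1 & \text{for } z > 0 \\ 0 & \text{for } z < 0 \end{cases}$$
$$y = \begin{cases} 1 & \text{for } \phi(z) > 0.5 \\ 0 & \text{for } \phi(z) < 0.5 \end{cases}$$
• Logistic function: $\phi(z) = \frac{1}{1 + e^{-z}}$
• Coefficients: $\beta_0, \dots, \beta_k$
• Best fit of β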
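The two decision rules are equivalent because the logistic function equals 0.5 exactly at z = 0. A minimal sketch in plain Python (not the talk's R code) follows. The coefficient values are hypothetical and are not fitted to any data.

```python
import math


def logistic(z):
    """Logistic function phi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, betas):
    """Classify x; betas[0] is the intercept and betas[1:] pair with the features."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1 if logistic(z) > 0.5 else 0


betas = [-1.0, 0.5, 0.25]  # hypothetical coefficients, for illustration only
label_a = predict([4.0, 2.0], betas)  # z = -1 + 2 + 0.5 = 1.5
label_b = predict([1.0, 0.0], betas)  # z = -1 + 0.5 = -0.5
print(label_a, label_b)
```

Finding the best fit of β (for example by maximum likelihood) is a separate step that this sketch does not cover.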
Data processing
Demo
Data classification
Demo
Evaluation methods for classification
Confusion Matrix
                        Reference
                        Positive   Negative
Prediction   Positive   TP         FP
Prediction   Negative   FN         TN
Receiver Operating Characteristic (ROC) curve
Area under the curve (AUC)
$$\text{Accuracy} = \frac{\#\text{correct}}{\#\text{predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$$
How good it is at detecting positives
$$\text{Specificity} = \frac{TN}{TN + FP}$$
How good it is at avoiding false alarms
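The four measures follow directly from the confusion-matrix counts. The sketch below is plain Python (the talk itself uses R) and the counts are hypothetical. Specificity is computed as TN / (TN + FP), the true-negative rate.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and specificity from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),   # true-negative rate
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The function assumes every denominator is non-zero; a classifier that never predicts the positive class, for example, would need a guard on precision.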
Task: Iris Classification
• Data preprocessing
• Split data into training and test sets
• Classification using: kNN and Naïve Bayes
• Performance evaluation
• Results Visualisation
Task: Binary Classification
• Only two classes in the dataset (versicolor & virginica)
• Classification using logistic regression
• Performance evaluation
• Results Visualisation
Resampling: Bootstrapping
k-fold cross validation
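For the k-fold part, the index bookkeeping can be sketched as below in plain Python (not the R used in the demos). The helper name `k_fold_indices` is hypothetical. Rows are not shuffled here; a real run would usually shuffle them first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each row is used for validation exactly once and for training in the other k - 1 folds; the k validation scores are then averaged.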
Data resampling
Demo
Data tuning
Demo
Task: Resampling & Tuning
• Repeated k-fold cross validation
• Use Naïve Bayes as classification algorithm
• Tune the parameters using specific values
• Performance evaluation
Clustering problem
K-means Algorithm
Hierarchical clustering
• Decision of where the cluster should be split
• Metric: distance between pairs of observations
• Linkage criterion: dissimilarity of sets
Clustering
Demo
Evaluation methods for clustering
• Sum of squares
• Class based measures
• Underlying true
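The sum-of-squares measure can be sketched as the total squared distance of each point to its cluster mean. The code is plain Python rather than R, and the four 2-D points are hypothetical.

```python
def within_cluster_ss(points, labels):
    """Total squared distance from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # Coordinate-wise mean of the cluster's points.
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return total


points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]  # hypothetical data
good = within_cluster_ss(points, [0, 0, 1, 1])  # nearby points grouped together
bad = within_cluster_ss(points, [0, 1, 0, 1])   # distant points grouped together
print(good, bad)
```

A lower value means tighter clusters, but the measure always decreases as the number of clusters grows, so it is only comparable between solutions with the same number of clusters.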
Task: Iris Clustering
• Clustering using k-means and hierarchical clustering
• Compare clusters with the original class assignments
• Visualise the findings
Regression problem
• Dependent value
• Predicting the real value
• Fitting the coefficients
• Analytical solutions
• Gradient descent
$$f(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
Ordinary linear regression
Residual sum of squares (RSS)
$$S(w) = \sum_{i=1}^{n} (y_i - x_i^{T} w)^2 = (y - Xw)^{T} (y - Xw)$$
$$\hat{w} = \arg\min_{w} S(w)$$
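For a single feature plus an intercept, the minimiser of S(w) has a well-known closed form: the slope is the covariance of x and y divided by the variance of x. The sketch below is plain Python (not the talk's R code), and the four data points are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimising the residual sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rss(xs, ys, intercept, slope):
    """Residual sum of squares S(w) for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
ys = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_simple_ols(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, RSS={rss(xs, ys, intercept, slope):.3f}")
```

Nudging either coefficient away from the fitted value increases the RSS, which is a quick way to sanity-check the fit.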
Task: Prestige EDA
• Descriptive statistics (dimensions, rows, columns, data types, correlation)
• Data visualization (distributions, outliers)
• Handle missing data
• Features significance
Evaluation methods for regression
• Errors
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (f_i - y_i)^2}{n}}$$
$$R^2 = 1 - \frac{\sum_{i} (f_i - y_i)^2}{\sum_{i} (\bar{y} - y_i)^2}$$
• Statistics (t, ANOVA)
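Both error measures are short to compute. The sketch below is plain Python rather than R; it takes f_i as the model's prediction and y_i as the observed value, and both lists are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predictions f_i and observations y_i."""
    n = len(actual)
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(predicted, actual)) / n)


def r_squared(predicted, actual):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((f - y) ** 2 for f, y in zip(predicted, actual))
    ss_tot = sum((mean_y - y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]     # hypothetical observed values y_i
predicted = [2.5, 5.5, 6.5, 9.5]  # hypothetical model outputs f_i
print(rmse(predicted, actual), r_squared(predicted, actual))
```

RMSE is in the units of the target variable, while R² is unitless and compares the model against always predicting the mean.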
Residuals vs Fitted
• Check if residuals have non-linear patterns
• Check if the model captures the non-linear relationship
• Should show equally spread residuals around the horizontal line
Normal Q-Q
• Shows if the residuals are normally distributed
• Points should lie along the straight dashed line
• Check that residuals do not deviate severely from it
Scale-Location
• Shows if residuals are spread equally along the range of predictors
• Tests the assumption of equal variance (homoscedasticity)
• Should show a horizontal line with equally (randomly) spread points
Residuals vs Leverage
• Helps to find influential cases
• Cases outside the Cook’s distance lines are influential
• With no influential cases, the Cook’s distance lines should be barely visible
Regression problem
Demo
Task: Prestige Regression
• Numeric and categorical features
• Other than linear relations
• Combining the features
Categorical data for regression
• Categories: A, B, C are coded as dummy variables
• In general, if the variable has k categories it will be encoded into k-1 dummy variables
Category   V1   V2
A          0    0
B          1    0
C          0    1
$$f(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_j x_j + \beta_{j+1} v_1 + \dots + \beta_{j+k-1} v_{k-1}$$
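Dummy coding with a baseline category can be sketched as follows, in plain Python rather than R. The helper name `dummy_encode` is hypothetical. The first category is treated as the baseline, so A maps to (0, 0), B to (1, 0) and C to (0, 1).

```python
def dummy_encode(values, categories):
    """Encode each value as k-1 indicator variables; categories[0] is the baseline."""
    non_baseline = categories[1:]
    return [[1 if value == cat else 0 for cat in non_baseline] for value in values]


encoded = dummy_encode(["A", "B", "C"], categories=["A", "B", "C"])
for category, row in zip(["A", "B", "C"], encoded):
    print(category, row)
```

Dropping one indicator avoids perfect collinearity with the intercept: the baseline's effect is absorbed into β0 and every other coefficient is read relative to it.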
Categorical data for regression
$$f(x) = \beta_0 + \beta_1 x + \beta_2 v_1 + \dots + \beta_k v_{k-1} + \beta_{k+1} v_1 x + \dots + \beta_{2k-1} v_{k-1} x$$
R formula: y ~ x + cat + x:cat
Keep in touch
BarbaraFusinska.com
@BasiaFusinska
https://github.com/BasiaFusinska/MachineLearningWithR

Más contenido relacionado

La actualidad más candente

Exploratory Data Analysis using Python
Exploratory Data Analysis using PythonExploratory Data Analysis using Python
Exploratory Data Analysis using PythonShirin Mojarad, Ph.D.
 
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Edureka!
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationSangwoo Mo
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with RShareThis
 
Feature Engineering & Selection
Feature Engineering & SelectionFeature Engineering & Selection
Feature Engineering & SelectionEng Teong Cheah
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classificationKrish_ver2
 
R programming presentation
R programming presentationR programming presentation
R programming presentationAkshat Sharma
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning AlgorithmsWalaa Hamdy Assy
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data sciencebhavesh lande
 
R programming slides
R  programming slidesR  programming slides
R programming slidesPankaj Saini
 
Pagerank Algorithm Explained
Pagerank Algorithm ExplainedPagerank Algorithm Explained
Pagerank Algorithm Explainedjdhaar
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine LearningKuppusamy P
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 

La actualidad más candente (20)

Exploratory Data Analysis using Python
Exploratory Data Analysis using PythonExploratory Data Analysis using Python
Exploratory Data Analysis using Python
 
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with R
 
Feature Engineering & Selection
Feature Engineering & SelectionFeature Engineering & Selection
Feature Engineering & Selection
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
R programming
R programmingR programming
R programming
 
R programming presentation
R programming presentationR programming presentation
R programming presentation
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning Algorithms
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
K Nearest Neighbor Algorithm
K Nearest Neighbor AlgorithmK Nearest Neighbor Algorithm
K Nearest Neighbor Algorithm
 
Pagerank Algorithm Explained
Pagerank Algorithm ExplainedPagerank Algorithm Explained
Pagerank Algorithm Explained
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 

Similar a Machine Learning with R

Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISCOGS Presentations
 
Machine Learning with Azure
Machine Learning with AzureMachine Learning with Azure
Machine Learning with AzureBarbara Fusinska
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data ScienceMutia Ulfi
 
Modeling and Aggregation of Complex Annotations
Modeling and Aggregation of Complex AnnotationsModeling and Aggregation of Complex Annotations
Modeling and Aggregation of Complex AnnotationsAlexander Braylan
 
Handling Missing Attributes using Matrix Factorization 
Handling Missing Attributes using Matrix Factorization Handling Missing Attributes using Matrix Factorization 
Handling Missing Attributes using Matrix Factorization CS, NcState
 
Creativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceCreativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceDamianMingle
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Data Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxData Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxSubrata Kumer Paul
 
background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
introduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptxintroduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptxPratik Gohel
 
Computer Vision image classification
Computer Vision image classificationComputer Vision image classification
Computer Vision image classificationWael Badawy
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysisBangalore
 

Similar a Machine Learning with R (20)

Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGIS
 
Machine Learning with Azure
Machine Learning with AzureMachine Learning with Azure
Machine Learning with Azure
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data Science
 
KNN
KNN KNN
KNN
 
Modeling and Aggregation of Complex Annotations
Modeling and Aggregation of Complex AnnotationsModeling and Aggregation of Complex Annotations
Modeling and Aggregation of Complex Annotations
 
Handling Missing Attributes using Matrix Factorization 
Handling Missing Attributes using Matrix Factorization Handling Missing Attributes using Matrix Factorization 
Handling Missing Attributes using Matrix Factorization 
 
Creativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceCreativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data Science
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Mini_Project
Mini_ProjectMini_Project
Mini_Project
 
Data Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxData Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptx
 
background.pptx
background.pptxbackground.pptx
background.pptx
 
07 learning
07 learning07 learning
07 learning
 
introduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptxintroduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptx
 
Computer Vision image classification
Computer Vision image classificationComputer Vision image classification
Computer Vision image classification
 
Learning from data
Learning from dataLearning from data
Learning from data
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Mattar_PhD_Thesis
Mattar_PhD_ThesisMattar_PhD_Thesis
Mattar_PhD_Thesis
 

Más de Barbara Fusinska

Hassle free, scalable, machine learning learning with Kubeflow
Hassle free, scalable, machine learning learning with KubeflowHassle free, scalable, machine learning learning with Kubeflow
Hassle free, scalable, machine learning learning with KubeflowBarbara Fusinska
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlowBarbara Fusinska
 
Clean, Learn and Visualise data with R
Clean, Learn and Visualise data with RClean, Learn and Visualise data with R
Clean, Learn and Visualise data with RBarbara Fusinska
 
Using Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical SupportUsing Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical SupportBarbara Fusinska
 
Networks are like onions: Practical Deep Learning with TensorFlow
Networks are like onions: Practical Deep Learning with TensorFlowNetworks are like onions: Practical Deep Learning with TensorFlow
Networks are like onions: Practical Deep Learning with TensorFlowBarbara Fusinska
 
Using Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical SupportUsing Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical SupportBarbara Fusinska
 
Deep Learning with Microsoft Cognitive Toolkit
Deep Learning with Microsoft Cognitive ToolkitDeep Learning with Microsoft Cognitive Toolkit
Deep Learning with Microsoft Cognitive ToolkitBarbara Fusinska
 
Clean, Learn and Visualise data with R
Clean, Learn and Visualise data with RClean, Learn and Visualise data with R
Clean, Learn and Visualise data with RBarbara Fusinska
 
Using Machine Learning and Chatbots to handle 1st line technical support
Using Machine Learning and Chatbots to handle 1st line technical supportUsing Machine Learning and Chatbots to handle 1st line technical support
Using Machine Learning and Chatbots to handle 1st line technical supportBarbara Fusinska
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLBarbara Fusinska
 
A picture speaks a thousand words - Data Visualisation with R
A picture speaks a thousand words - Data Visualisation with RA picture speaks a thousand words - Data Visualisation with R
A picture speaks a thousand words - Data Visualisation with RBarbara Fusinska
 
Predicting the Future as a Service with Azure ML and R
Predicting the Future as a Service with Azure ML and R Predicting the Future as a Service with Azure ML and R
Predicting the Future as a Service with Azure ML and R Barbara Fusinska
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsGetting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsBarbara Fusinska
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 
Breaking the eggshell: From .NET to Node.js
Breaking the eggshell: From .NET to Node.jsBreaking the eggshell: From .NET to Node.js
Breaking the eggshell: From .NET to Node.jsBarbara Fusinska
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 

Más de Barbara Fusinska (20)

Hassle free, scalable, machine learning learning with Kubeflow
Hassle free, scalable, machine learning learning with KubeflowHassle free, scalable, machine learning learning with Kubeflow
Hassle free, scalable, machine learning learning with Kubeflow
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlow
 
Clean, Learn and Visualise data with R
Clean, Learn and Visualise data with RClean, Learn and Visualise data with R
Clean, Learn and Visualise data with R
 
TensorFlow in 3 sentences
TensorFlow in 3 sentencesTensorFlow in 3 sentences
TensorFlow in 3 sentences
 
Using Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical SupportUsing Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical Support
 
Networks are like onions: Practical Deep Learning with TensorFlow
Networks are like onions: Practical Deep Learning with TensorFlowNetworks are like onions: Practical Deep Learning with TensorFlow
Networks are like onions: Practical Deep Learning with TensorFlow
 
Using Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical SupportUsing Machine Learning and Chatbots to handle 1st line Technical Support
Using Machine Learning and Chatbots to handle 1st line Technical Support
 
Deep Learning with Microsoft Cognitive Toolkit
Deep Learning with Microsoft Cognitive ToolkitDeep Learning with Microsoft Cognitive Toolkit
Deep Learning with Microsoft Cognitive Toolkit
 
Clean, Learn and Visualise data with R
Clean, Learn and Visualise data with RClean, Learn and Visualise data with R
Clean, Learn and Visualise data with R
 
Using Machine Learning and Chatbots to handle 1st line technical support
Using Machine Learning and Chatbots to handle 1st line technical supportUsing Machine Learning and Chatbots to handle 1st line technical support
Using Machine Learning and Chatbots to handle 1st line technical support
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure ML
 
A picture speaks a thousand words - Data Visualisation with R
A picture speaks a thousand words - Data Visualisation with RA picture speaks a thousand words - Data Visualisation with R
A picture speaks a thousand words - Data Visualisation with R
 
Predicting the Future as a Service with Azure ML and R
Predicting the Future as a Service with Azure ML and R Predicting the Future as a Service with Azure ML and R
Predicting the Future as a Service with Azure ML and R
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsGetting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Breaking the eggshell: From .NET to Node.js
Breaking the eggshell: From .NET to Node.jsBreaking the eggshell: From .NET to Node.js
Breaking the eggshell: From .NET to Node.js
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
When the connection fails
When the connection failsWhen the connection fails
When the connection fails
 

Último

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 

Último (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx

Machine Learning with R

  • 1. Machine Learning with R Barbara Fusinska @BasiaFusinska
  • 2. About me Programmer Machine Learning Data Solutions Architect @BasiaFusinska https://github.com/BasiaFusinska/MachineLearningWithR
  • 3. Agenda • What’s Machine Learning? • Exploratory Data Analysis • Classification • Clustering • Regression
  • 4. Setup • Install R: https://www.r-project.org/ • Install RStudio: https://www.rstudio.com/ • GitHub repository: https://github.com/BasiaFusinska/MachineLearningWithR • Packages
  • 7. Movies Genres
        Title             # Kisses   # Kicks   Genre
        Taken             3          47        Action
        Love story        24         2         Romance
        P.S. I love you   17         3         Romance
        Rush hours        5          51        Action
        Bad boys          7          42        Action
      Question: What is the genre of "Gone with the wind"?
  • 8. Data-based classification
        Id   Feature 1   Feature 2   Class
        1.   3           47          A
        2.   24          2           B
        3.   17          3           B
        4.   5           51          A
        5.   7           42          A
      Question: What is the class of the entry with the following features: F1: 31, F2: 4?
  • 9. Data Visualization [Figure: plot of the data points with a separating line; regions labelled A and B] • Rule 1: If on the left side of the line then Class = A • Rule 2: If on the right side of the line then Class = B
  • 11. Supervised learning • Classification, regression • Label, target value • Training & Validation phases
  • 12. Unsupervised learning • Clustering, feature selection • Finding structure of data • Statistical values describing the data
  • 13. Supervised Machine Learning workflow [Diagram labels: Clean data, Data split, Machine Learning algorithm, Trained model, Score, Preprocess data, Training data, Test data]
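The "Data split" step of the workflow can be sketched as follows. This is a minimal illustration in plain Python, not code from the talk's R repository; the helper name `split_rows`, the 30% test fraction, and the seed are assumptions made for the example.

```python
import random

def split_rows(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (training_rows, test_rows).

    split_rows is a hypothetical helper; the test fraction is an assumption.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_rows(range(10), test_fraction=0.3, seed=1)
print(len(train), len(test))
```

The model is then trained only on the training rows and scored on the held-out test rows, so the score reflects data the algorithm has not seen.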
  • 14. Publishing the model [Diagram labels: Machine Learning Model, Model Training, Published Machine Learning Model, Prediction, Training data, Publish model, Test stream, Scores]
  • 17. Classification data
        Source       #Links   #Characters   ...   Fake
        TopNews      10       2750          …     T
        Twitter      2        120           …     F
        TopNews      235      502           …     F
        Channel X    1530     3024          …     T
        Twitter      24       70            …     F
        StoryLeaks   722      1408          …     T
        Facebook     98       230           …     T
        …            …        …             …     ...
      (Features: Source, #Links, #Characters, ...; Labels: Fake)
  • 18. Task: Iris EDA • Descriptive statistics (dimensions, rows, columns, data types, correlation) • Data visualization (distributions, outliers) • Features distributions & classes separation • 2D visualisation http://archive.ics.uci.edu/ml/datasets/Iris
  • 19. K-Nearest Neighbours Algorithm • Object is classified by a majority vote of its k nearest neighbours • k – algorithm parameter • Distance metrics: Euclidean (continuous variables), Hamming (text)
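As a rough sketch of the majority vote, the five rows from the "Data-based classification" slide can be used to classify the query F1: 31, F2: 4 with Euclidean distance. The code is plain Python rather than the talk's R, and the function name `knn_predict` is an assumption for illustration.

```python
from collections import Counter
from math import dist

# Rows from the "Data-based classification" slide: (feature 1, feature 2) -> class
TRAINING = [
    ((3, 47), "A"),
    ((24, 2), "B"),
    ((17, 3), "B"),
    ((5, 51), "A"),
    ((7, 42), "A"),
]

def knn_predict(training, query, k=3):
    """Return the majority class among the k training points nearest to query."""
    nearest = sorted(training, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(TRAINING, (31, 4), k=3))  # prints B
```

With k = 3 the two B rows and one A row vote, so the answer is B. With k = 5 every row votes and the three A rows win, which shows why k has to be chosen with care.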
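  • 20. Naïve Bayes classifier
      $p(C_k \mid \mathbf{x}) = \dfrac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}$, where $\mathbf{x} = (x_1, \dots, x_k)$ and $p(C_k \mid \mathbf{x}) = p(C_k \mid x_1, \dots, x_k)$
      Posterior: $p(C_k \mid \mathbf{x})$; prior: $p(C_k)$; likelihood: $p(\mathbf{x} \mid C_k)$; evidence: $p(\mathbf{x})$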
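  • 21. Naïve Bayes example
      Training data:
        Sex      Height   Weight   Foot size
        Male     6        190      11
        Male     6.2      170      10
        Female   5        130      6
        …        …        …        …
      Sample to classify:
        Sex      Height   Weight   Foot size
        ?        5.9      140      8
      $p(\text{male} \mid \mathbf{x}) = \dfrac{p(\text{male})\, p(5.9 \mid \text{male})\, p(140 \mid \text{male})\, p(8 \mid \text{male})}{\text{evidence}}$
      $p(\text{female} \mid \mathbf{x}) = \dfrac{p(\text{female})\, p(5.9 \mid \text{female})\, p(140 \mid \text{female})\, p(8 \mid \text{female})}{\text{evidence}}$
      $\text{evidence} = p(\text{male})\, p(5.9 \mid \text{male})\, p(140 \mid \text{male})\, p(8 \mid \text{male}) + p(\text{female})\, p(5.9 \mid \text{female})\, p(140 \mid \text{female})\, p(8 \mid \text{female})$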
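The posterior calculation above can be sketched in plain Python. Two things here are assumptions and not taken from the slide: each per-feature likelihood is modelled as a Gaussian, and the priors, means, and variances below are hypothetical placeholder values, because the slide elides most of the training table.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution (assumed likelihood model)."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def posteriors(priors, params, x):
    """priors: {class: p(class)}; params: {class: [(mean, var), ...]}, one pair per feature."""
    joint = {}
    for cls, prior in priors.items():
        p = prior
        # "Naive" step: multiply the per-feature likelihoods as if independent
        for value, (mean, var) in zip(x, params[cls]):
            p *= gaussian_pdf(value, mean, var)
        joint[cls] = p
    evidence = sum(joint.values())  # the denominator shared by all classes
    return {cls: p / evidence for cls, p in joint.items()}

# HYPOTHETICAL parameters for (height, weight, foot size); not from the slides
priors = {"male": 0.5, "female": 0.5}
params = {
    "male": [(5.9, 0.04), (176.0, 120.0), (11.0, 1.0)],
    "female": [(5.4, 0.10), (132.0, 500.0), (7.5, 1.5)],
}
print(posteriors(priors, params, [5.9, 140, 8]))
```

Because both posteriors share the same evidence term, the predicted class is simply the one with the larger numerator.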
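  • 22. Logistic regression
      $z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$ (coefficients $\beta$; training finds the best fit of $\beta$)
      $y = \begin{cases} 1 & \text{for } z > 0 \\ 0 & \text{for } z < 0 \end{cases}$
      With the logistic function $\phi(z)$: $y = \begin{cases} 1 & \text{for } \phi(z) > 0.5 \\ 0 & \text{for } \phi(z) < 0.5 \end{cases}$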
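A small sketch of the decision rule, assuming the standard logistic function $\phi(z) = 1 / (1 + e^{-z})$ (the transcript names the function but does not spell it out). The coefficients in the usage lines are hypothetical, and the code is plain Python rather than the talk's R.

```python
from math import exp

def logistic(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def predict(beta, x):
    """beta = [b0, b1, ..., bk]; x = [x1, ..., xk]. Returns class 1 or 0."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1 if logistic(z) > 0.5 else 0

# Hypothetical coefficients b0 = -1, b1 = 2
print(predict([-1.0, 2.0], [1.0]))  # z = 1, so class 1
print(predict([-1.0, 2.0], [0.0]))  # z = -1, so class 0
```

Since $\phi(0) = 0.5$ and $\phi$ is increasing, thresholding $\phi(z)$ at 0.5 gives the same classes as thresholding $z$ at 0.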
  • 25. Evaluation methods for classification
      Confusion matrix:
                                Reference
                                Positive   Negative
        Prediction   Positive   TP         FP
                     Negative   FN         TN
      $\text{Accuracy} = \dfrac{\#\text{correct}}{\#\text{predictions}} = \dfrac{TP + TN}{TP + TN + FP + FN}$
      $\text{Precision} = \dfrac{TP}{TP + FP}$
      $\text{Recall} = \text{Sensitivity} = \dfrac{TP}{TP + FN}$ (how good it is at detecting positives)
      $\text{Specificity} = \dfrac{TN}{TN + FP}$ (how good it is at avoiding false alarms)
      Receiver Operating Characteristic (ROC) curve; area under the curve (AUC)
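The four metrics can be computed directly from the confusion-matrix counts. This is a plain Python sketch with hypothetical counts; specificity uses the standard definition TN / (TN + FP).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # sensitivity
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts, for illustration only
m = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(m["accuracy"], m["precision"])  # prints 0.85 0.8
```

Precision and recall look at the positive class from two sides: precision asks how many predicted positives were right, recall asks how many actual positives were found.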
  • 26. Task: Iris Classification • Data preprocessing • Split data into training and test sets • Classification using: kNN and Naïve Bayes • Performance evaluation • Results Visualisation
  • 27. Task: Binary Classification • Only two classes in the dataset (versicolor & virginica) • Classification using logistic regression • Performance evaluation • Results Visualisation
  • 32. Task: Resampling & Tuning • Repeated k-fold cross validation • Use Naïve Bayes as classification algorithm • Tune the parameters using specific values • Performance evaluation
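Repeated k-fold cross validation can be sketched as an index generator: shuffle the row indices, cut them into k folds, use each fold once as the test set, then repeat with a fresh shuffle. The code is plain Python, not the talk's R, and the function name and the choice of 10 folds with 3 repeats are assumptions for illustration.

```python
import random

def repeated_kfold(n_samples, k, repeats, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                        # new random partition each repeat
        folds = [indices[i::k] for i in range(k)]   # k near-equal folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test

# 150 rows (the size of the Iris dataset), 10 folds, 3 repeats
splits = list(repeated_kfold(150, k=10, repeats=3))
print(len(splits))  # prints 30
```

Averaging a model's score over all splits gives a steadier performance estimate than a single train/test split, which is what makes it suitable for comparing parameter values.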
  • 35. Hierarchical clustering • Decision of where the cluster should be split • Metric: distance between pairs of observations • Linkage criterion: dissimilarity of sets
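To make the metric/linkage distinction concrete, here is a sketch of three common linkage criteria built on the Euclidean metric. The slide does not say which linkages the talk uses, so single, complete, and average linkage are shown as assumed examples, in plain Python.

```python
from itertools import product
from math import dist

def single_linkage(a, b):
    """Dissimilarity of two sets = distance between their closest pair."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    """Dissimilarity of two sets = distance between their farthest pair."""
    return max(dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    """Dissimilarity of two sets = mean distance over all cross pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

# Two small hypothetical clusters
left = [(0, 0), (0, 1)]
right = [(3, 0), (4, 0)]
print(single_linkage(left, right))  # prints 3.0
```

The metric compares two observations; the linkage criterion turns those pairwise distances into one dissimilarity between two clusters, which decides what gets merged or split next.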
  • 37. Evaluating methods for clustering • Sum of squares • Class based measures • Underlying true
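The "sum of squares" measure can be sketched as the within-cluster sum of squared distances to each cluster's centroid. Reading the bullet as the within-cluster variant is an assumption; the points and labels below are hypothetical, and the code is plain Python.

```python
def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(within_cluster_ss(points, [0, 0, 1, 1]))  # prints 4.0
print(within_cluster_ss(points, [0, 1, 0, 1]))  # prints 100.0
```

The first labelling groups nearby points and scores far lower than the second, which pairs points from opposite sides; lower values mean tighter clusters.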
  • 38. Task: Iris Clustering • Clustering using k-means and hierarchical clustering • Compare clusters with the original class assignments • Visualise the findings
  • 39. Regression problem • Dependent value • Predicting the real value • Fitting the coefficients • Analytical solutions • Gradient descent
      $f(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
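Fitting the coefficients by gradient descent can be sketched for the one-feature case $f(x) = \beta_0 + \beta_1 x$, minimising the mean squared error. The learning rate, step count, and toy data are assumptions chosen so the example converges; the code is plain Python rather than the talk's R.

```python
def fit_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit f(x) = b0 + b1 * x by gradient descent on the mean squared error."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad0 = 2 * sum(residuals) / n                              # d(MSE)/d(b0)
        grad1 = 2 * sum(r * x for r, x in zip(residuals, xs)) / n   # d(MSE)/d(b1)
        b0 -= lr * grad0
        b1 -= lr * grad1
    return b0, b1

# Toy data generated from y = 1 + 2x, so the fit should approach b0 = 1, b1 = 2
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_gradient_descent(xs, ys)
print(round(b0, 4), round(b1, 4))  # prints 1.0 2.0
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more steps; an analytical solution avoids that tuning when it is available.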
  • 40. Ordinary linear regression
      Residual sum of squares (RSS): $S(w) = \sum_{i=1}^{n} (y_i - x_i^T w)^2 = (y - Xw)^T (y - Xw)$
      $\hat{w} = \arg\min_w S(w)$
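For reference, the analytical minimiser of the residual sum of squares follows from setting its gradient to zero. The derivation below is the standard one and assumes $X^T X$ is invertible; $\hat{w}$ denotes the minimising coefficient vector.

```latex
\begin{aligned}
S(w) &= (y - Xw)^T (y - Xw) \\
\nabla_w S(w) &= -2\, X^T (y - Xw) = 0 \\
X^T X\, \hat{w} &= X^T y \\
\hat{w} &= (X^T X)^{-1} X^T y
\end{aligned}
```

When $X^T X$ is singular or very large, an iterative method such as gradient descent is the usual alternative.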
  • 41. Task: Prestige EDA • Descriptive statistics (dimensions, rows, columns, data types, correlation) • Data visualization (distributions, outliers) • Handle missing data • Features significance
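  • 42. Evaluation methods for regression
      • Errors: $RMSE = \sqrt{\dfrac{\sum_{i=1}^{n} (f_i - y_i)^2}{n}}$, $R^2 = 1 - \dfrac{\sum_{i=1}^{n} (f_i - y_i)^2}{\sum_{i=1}^{n} (\bar{y} - y_i)^2}$ ($f_i$: predicted value, $y_i$: observed value, $\bar{y}$: mean of the observed values)
      • Statistics (t, ANOVA)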
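Both error measures are short to compute by hand. A plain Python sketch with hypothetical predictions, where `pred` holds the $f_i$ and `actual` holds the $y_i$:

```python
from math import sqrt

def rmse(pred, actual):
    """Root mean squared error between predictions and observed values."""
    return sqrt(sum((f - y) ** 2 for f, y in zip(pred, actual)) / len(actual))

def r_squared(pred, actual):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((f - y) ** 2 for f, y in zip(pred, actual))
    ss_tot = sum((mean_y - y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical values: three exact predictions and one that is off by 2
actual = [1, 2, 3, 4]
pred = [1, 2, 3, 6]
print(rmse(pred, actual))  # prints 1.0
```

RMSE is in the units of the target, while $R^2$ is unitless: here the residual sum of squares is 4 against a total of 5, so $R^2$ is about 0.2.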
  • 43. Residuals vs Fitted • Check if residuals have non-linear patterns • Check if the model captures the non-linear relationship • Should show equally spread residuals around the horizontal line
  • 44. Normal Q-Q • Shows if the residuals are normally distributed • Points should lie along the straight dashed line • Check that residuals do not deviate severely from it
  • 45. Scale-Location • Shows if residuals are spread equally along the ranges of predictors • Tests the assumption of equal variance (homoscedasticity) • Should show a horizontal line with equally (randomly) spread points
  • 46. Residuals vs Leverage • Helps to find influential cases • Cases outside the Cook’s distance lines are influential • With no influential cases, the Cook’s distance lines should be barely visible
  • 48. Task: Prestige Regression • Numeric and categorical features • Other than linear relations • Combining the features
  • 49. Categorical data for regression • Categories: A, B, C are coded as dummy variables • In general, if the variable has k categories it will be encoded into k-1 dummy variables
        Category   V1   V2
        A          0    0
        B          1    0
        C          0    1
      $f(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_j x_j + \beta_{j+1} v_1 + \dots + \beta_{j+k-1} v_{k-1}$
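Dummy coding with k-1 indicator variables can be sketched as follows, treating the first category as the baseline that maps to all zeros. The helper name `dummy_encode` is hypothetical and the code is plain Python rather than the talk's R.

```python
def dummy_encode(value, categories):
    """Encode value as len(categories) - 1 indicators; categories[0] is the baseline."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[1:]]

categories = ["A", "B", "C"]
for cat in categories:
    print(cat, dummy_encode(cat, categories))
# prints:
# A [0, 0]
# B [1, 0]
# C [0, 1]
```

Only k-1 indicators are needed because the baseline is already represented by the intercept; a k-th indicator would be perfectly collinear with it.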
  • 50. Categorical data for regression
      $f(x) = \beta_0 + \beta_1 x + \beta_2 v_1 + \dots + \beta_k v_{k-1} + \beta_{k+1} v_1 x + \dots + \beta_{2k-1} v_{k-1} x$
      R model formula: y ~ x + cat + x:cat
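The model with interaction terms corresponds to a design row holding the intercept, the numeric feature, the k-1 dummy variables, and each dummy multiplied by the feature. A plain Python sketch, with the hypothetical helper `design_row` and the first category assumed to be the baseline:

```python
def design_row(x, category, categories):
    """Build [1, x, v_1..v_{k-1}, v_1*x..v_{k-1}*x] for one observation."""
    dummies = [1 if category == c else 0 for c in categories[1:]]
    interactions = [d * x for d in dummies]   # the x:cat terms
    return [1, x] + dummies + interactions

row = design_row(2.5, "B", ["A", "B", "C"])
print(row)       # prints [1, 2.5, 1, 0, 2.5, 0.0]
print(len(row))  # prints 6
```

With k = 3 categories the row has 2k = 6 entries, one per coefficient from β0 to β5. The dummy terms shift the intercept for each category and the interaction terms let each category have its own slope.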