by Ilya Kuzovkin
ilya.kuzovkin@gmail.com
Mooncascade ML Camp
2016
Machine Learning
ESSENTIAL CONCEPTS
ONE MACHINE LEARNING USE CASE
Can we ask a computer to
create those patterns
automatically?
Yes
How?
A data sample (instance): raw data plus a class (label), here “7”
How to represent it in a machine-readable form?
Feature extraction: the image is 28 px × 28 px, 784 pixels in total
Feature vector
(0, 0, 0, …, 28, 65, 128, 255, 101, 38, …, 0, 0, 0)
Dataset
(0, 0, 0, …, 28, 65, 128, 255, 101, 38, …, 0, 0, 0)    “7”
(0, 0, 0, …, 13, 48, 102, 0, 46, 255, …, 0, 0, 0)    “2”
(0, 0, 0, …, 17, 34, 12, 43, 122, 70, …, 0, 7, 0)    “8”
(0, 0, 0, …, 98, 21, 255, 255, 231, 140, …, 0, 0, 0)    “2”
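The feature extraction step above can be sketched in a few lines of Python. This is only an illustration: the helper name `image_to_feature_vector` and the toy pixel values are assumptions, not part of the original slides.

```python
# Hypothetical helper: flatten a 28x28 grayscale image (rows of 0..255 ints)
# into one feature vector of 784 values, row by row.
def image_to_feature_vector(image):
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same width")
    return [pixel for row in image for pixel in row]

# Toy image: all black except a short bright stroke (made-up values).
image = [[0] * 28 for _ in range(28)]
image[10][5:9] = [28, 65, 128, 255]

features = image_to_feature_vector(image)
label = "7"                      # the class (label) attached to this instance
dataset = [(features, label)]    # a dataset is a list of such pairs

print(len(features))             # 784
```

Each further labelled image would add one more (feature vector, label) pair to `dataset`.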
The data is in the right format — what’s next?
• C4.5
• Random forests
• Bayesian networks
• Hidden Markov models
• Artificial neural network
• Data clustering
• Expectation-maximization algorithm
• Self-organizing map
• Radial basis function network
• Vector Quantization
• Generative topographic map
• Information bottleneck method
• IBSEAD
• Apriori algorithm
• Eclat algorithm
• FP-growth algorithm
• Single-linkage clustering
• Conceptual clustering
• K-means algorithm
• Fuzzy clustering
• Temporal difference learning
• Q-learning
• Learning Automata
• AODE
• Artificial neural network
• Backpropagation
• Naive Bayes classifier
• Bayesian network
• Bayesian knowledge base
• Case-based reasoning
• Decision trees
• Inductive logic programming
• Gaussian process regression
• Gene expression programming
• Group method of data handling (GMDH)
• Learning Automata
• Learning Vector Quantization
• Logistic Model Tree
• Decision tree
• Decision graphs
• Lazy learning
• Monte Carlo Method
• SARSA
• Instance-based learning
• Nearest Neighbor Algorithm
• Analogical modeling
• Probably approximately correct learning (PACL)
• Symbolic machine learning algorithms
• Subsymbolic machine learning algorithms
• Support vector machines
• Random Forest
• Ensembles of classifiers
• Bootstrap aggregating (bagging)
• Boosting (meta-algorithm)
• Ordinal classification
• Regression analysis
• Information fuzzy networks (IFN)
• Linear classifiers
• Fisher's linear discriminant
• Logistic regression
• Naive Bayes classifier
• Perceptron
• Support vector machines
• Quadratic classifiers
• k-nearest neighbor
• Boosting
Pick an algorithm
DECISION TREE
vs.
(0, …, 28, 65, …, 207, 101, 0, 0)
(0, …, 19, 34, …, 254, 54, 0, 0)
(0, …, 87, 59, …, 240, 52, 4, 0)
(0, …, 87, 52, …, 240, 19, 3, 0)
(0, …, 28, 64, …, 102, 101, 0, 0)
(0, …, 19, 23, …, 105, 54, 0, 0)
(0, …, 87, 74, …, 121, 51, 7, 0)
(0, …, 87, 112, …, 239, 52, 4, 0)
Split on PIXEL #417: >200 or <200
Split on PIXEL #123: <100 or >100
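One way to see how a tree arrives at a test such as "pixel > threshold" is to score every candidate split and keep the purest one. The sketch below uses Gini impurity, a common choice and not necessarily the criterion behind these slides. The 3-pixel data and the class names "2" and "7" are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return (feature_index, threshold, weighted_impurity) of the best split."""
    best = None
    n = len(y)
    for feature in range(len(X[0])):
        values = sorted(set(row[feature] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [label for row, label in zip(X, y) if row[feature] <= threshold]
            right = [label for row, label in zip(X, y) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

# Made-up 3-pixel "images": pixel 1 separates the two classes, the others do not.
X = [[0, 207, 10], [5, 254, 90], [3, 240, 40], [1, 102, 80], [4, 105, 20], [2, 121, 60]]
y = ["2", "2", "2", "7", "7", "7"]

feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)   # 1 164.0 0.0
```

A full tree would repeat this search inside each resulting group until the groups are pure or a stopping rule applies.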
ACCURACY
Confusion matrix (axes: true class, predicted class)
acc = (correctly classified) / (total number of samples)
Beware of an imbalanced dataset!
Consider the following model: “Always predict 2”
Accuracy 0.9
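The imbalance warning can be checked with a short sketch. The 90/10 class split below is an assumed example chosen to reproduce the 0.9 figure. The row/column layout of the matrix is likewise a convention picked for this illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Made-up imbalanced dataset: 90 samples of class "2", 10 of class "7".
true_labels = ["2"] * 90 + ["7"] * 10
always_two = ["2"] * len(true_labels)     # the "Always predict 2" model

print(accuracy(true_labels, always_two))                       # 0.9
print(confusion_matrix(true_labels, always_two, ["2", "7"]))   # [[90, 0], [10, 0]]
```

The matrix makes the problem visible where the single accuracy number hides it: every "7" lands in the wrong column.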
DECISION TREE
“You said 100% accurate?! Every 10th digit your system detects is wrong!”
Angry client
We trained our system on the data the client gave us, but it had never seen the new data the client applied it to.
And in real life it never will…
OVERFITTING
Simulate the real-life situation — split the dataset
Underfitting!
“Too stupid”
OK
Overfitting!
“Too smart”
OVERFITTING
Our current decision tree has too much capacity: it has simply memorized all of the data.
Let’s make it less complex.
You probably did not notice, but we are overfitting again :(
THE WHOLE DATASET
TRAINING SET 60%: fit various models and parameter combinations on this subset
VALIDATION SET 20%:
• Evaluate the models created with different parameters
• Estimate overfitting
TEST SET 20%: use only once to get the final performance estimate
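A minimal sketch of the 60/20/20 split, using only the standard library. The function name, the fixed seed and the integer stand-ins for samples are assumptions made for the example.

```python
import random

def split_dataset(samples, train=0.6, validation=0.2, seed=0):
    """Shuffle, then cut into training, validation and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains (about 20%)
    return training, validation_set, test

samples = list(range(1000))                # stand-ins for (features, label) pairs
training, validation_set, test = split_dataset(samples)
print(len(training), len(validation_set), len(test))   # 600 200 200
```

Shuffling before cutting matters when the raw data is ordered, for example sorted by class.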
CROSS-VALIDATION
THE WHOLE DATASET: TRAINING SET 60%, VALIDATION SET 20%
What if we got an overly optimistic validation set?
Use a single TRAINING SET 80% instead
Fix the parameter value you need to evaluate, say msl=15
Split it into TRAINING and VAL, with the VAL part in a different position each time
Repeat 10 times
Take the average validation score over the 10 runs; it is a more stable estimate.
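The repeat-and-average idea can be sketched as k-fold cross-validation in plain Python. The helper names are hypothetical. `dummy_score` is only a placeholder for "train with the fixed parameter on the training part, return the score on the validation part".

```python
def k_fold_splits(n_samples, k=10):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

def cross_validate(score_fold, n_samples, k=10):
    """Average the validation score returned by score_fold over the k folds."""
    scores = [score_fold(tr, va) for tr, va in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

# Placeholder scorer: a real one would fit a model and measure its accuracy.
def dummy_score(training, validation):
    return len(validation) / (len(training) + len(validation))

folds = list(k_fold_splits(100, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))   # 10 90 10
print(cross_validate(dummy_score, 100, k=10))
```

Every sample ends up in the validation part exactly once, so no single lucky split decides the estimate.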
MACHINE LEARNING PIPELINE
Take raw data
Extract features
Split into TRAINING and TEST
Pick an algorithm and parameters
Train on the TRAINING data
Evaluate on the TRAINING data with CV
Try out different algorithms and parameters
Fix the best parameters
Train on the whole TRAINING
Evaluate on TEST
Report final performance to the client
“So it is ~87%…erm… Could you do better?”
Yes
Pick another algorithm
RANDOM FOREST
Decision tree: pick best out of all features
Random forest: pick best out of random subset of features,
pick best out of another random subset of features,
pick best out of yet another random subset of features
instance → class
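Two ingredients of a random forest can be illustrated without any library: drawing a random subset of features, and letting the trees vote on the class of an instance. The lambda "trees" below are hypothetical stand-ins for trained trees. The subset size of 28 (the square root of 784) is a common default assumed here, not a value taken from the slides.

```python
import random
from collections import Counter

def random_feature_subset(n_features, subset_size, rng):
    """Indices of the features one split is allowed to consider."""
    return sorted(rng.sample(range(n_features), subset_size))

def forest_predict(trees, instance):
    """Ask every tree for a class and return the most common answer."""
    votes = Counter(tree(instance) for tree in trees)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
subset = random_feature_subset(n_features=784, subset_size=28, rng=rng)

# Stand-ins for trained trees: each one looks at a single pixel only.
trees = [
    lambda x: "7" if x[0] > 200 else "2",
    lambda x: "7" if x[1] < 100 else "2",
    lambda x: "7" if x[2] > 50 else "2",
]
instance = [255, 120, 90]                 # made-up 3-pixel instance
print(forest_predict(trees, instance))    # 7
```

Two of the three stand-in trees vote "7", so the forest answers "7" even though one tree disagrees.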
Happy client
ALL OTHER USE CASES
Sound → Frequency components → Genre
Text → Bag of words → Topic
Image → Pixel values → Cat or dog
Video → Frame pixels → Walking or running
Database records → Census data → Average salary
Biometric data → … → Dead or alive
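As one example of feature extraction for a different kind of raw data, here is a minimal bag-of-words sketch for text. The vocabulary and the sentence are invented for illustration. A real system would build the vocabulary from its training texts.

```python
from collections import Counter
import re

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]

vocabulary = ["goal", "match", "election", "vote"]    # hypothetical vocabulary
text = "The match ended after a late goal. What a goal!"
print(bag_of_words(text, vocabulary))                  # [2, 1, 0, 0]
```

The resulting list plays the same role as the 784 pixel values: a fixed-length feature vector that any of the algorithms can consume.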
HANDS-ON SESSION
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
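A possible starting point for the session, sketched with scikit-learn. Several things here are assumptions: the bundled 8x8 `load_digits` images stand in for the 28x28 digits, `msl` is taken to mean the `min_samples_leaf` parameter, and the candidate values and `n_estimators=50` are arbitrary choices for the example.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: scikit-learn's bundled 8x8 digits stand in for the 28x28 images.
X, y = load_digits(return_X_y=True)

# Split into TRAINING and TEST; the test part is touched only once, at the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Try different parameter values, scoring each with 10-fold CV on TRAINING.
best_score, best_msl = -1.0, None
for msl in (1, 5, 15):   # assumed to correspond to min_samples_leaf
    model = RandomForestClassifier(
        n_estimators=50, min_samples_leaf=msl, random_state=0)
    score = cross_val_score(model, X_train, y_train, cv=10).mean()
    if score > best_score:
        best_score, best_msl = score, msl

# Fix the best parameters, train on the whole TRAINING set, evaluate on TEST.
final_model = RandomForestClassifier(
    n_estimators=50, min_samples_leaf=best_msl, random_state=0)
final_model.fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_msl, round(best_score, 3), round(test_accuracy, 3))
```

The test accuracy printed at the end is the number to report; the cross-validation scores are used only to choose the parameter.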
Introduction to Machine Learning @ Mooncascade ML Camp

What Are The Drone Anti-jamming Systems Technology?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Introduction to Machine Learning @ Mooncascade ML Camp

  • 1. by Ilya Kuzovkin ilya.kuzovkin@gmail.com Mooncascade ML Camp 2016 Machine Learning ESSENTIAL CONCEPTS
  • 12. Can we ask a computer to create those patterns automatically? Yes How?
  • 20. A data sample: an instance (raw data) plus its class (label), e.g. “7”. How to represent it in a machine-readable form? Feature extraction: the image is 28 px × 28 px, 784 pixels in total, which gives the feature vector (0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0). Dataset:
      (0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0) → “7”
      (0, 0, 0, …, 13, 48, 102, 0, 46, 255, … 0, 0, 0) → “2”
      (0, 0, 0, …, 17, 34, 12, 43, 122, 70, … 0, 7, 0) → “8”
      (0, 0, 0, …, 98, 21, 255, 255, 231, 140, … 0, 0, 0) → “2”
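As a minimal sketch of the feature-extraction step, the snippet below flattens a 28 × 28 grid of pixel intensities into a 784-element feature vector and pairs it with a label. The helper name `flatten_image` and the toy image are illustrative assumptions, not taken from the slides.

```python
# Hypothetical helper: turn a 2-D grayscale image into a flat feature vector.
def flatten_image(image):
    """Concatenate the rows of a 2-D list of pixel intensities (0..255)."""
    return [pixel for row in image for pixel in row]


# Toy 28x28 image (assumed): all black except a short bright stroke in row 10.
image = [[0] * 28 for _ in range(28)]
image[10][5:9] = [28, 65, 128, 255]

features = flatten_image(image)
label = "7"                   # class (label) assigned by a human
sample = (features, label)    # one data sample of the dataset

print(len(features))          # 784
```

Row-by-row flattening is only one possible choice; any fixed ordering works as long as every image in the dataset uses the same one.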
  • 22. The data is in the right format — what’s next? • C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector Quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning Automata • AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning Automata • Learning Vector Quantization • Logistic Model Tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo Method • SARSA • Instance-based learning • Nearest Neighbor Algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random Forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher's linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting Pick an algorithm
  • 32. DECISION TREE Two groups of samples:
      (0, …, 28, 65, …, 207, 101, 0, 0)
      (0, …, 19, 34, …, 254, 54, 0, 0)
      (0, …, 87, 59, …, 240, 52, 4, 0)
      (0, …, 87, 52, …, 240, 19, 3, 0)
    vs.
      (0, …, 28, 64, …, 102, 101, 0, 0)
      (0, …, 19, 23, …, 105, 54, 0, 0)
      (0, …, 87, 74, …, 121, 51, 7, 0)
      (0, …, 87, 112, …, 239, 52, 4, 0)
    The tree first splits on PIXEL #417 (>200 or <200), then on PIXEL #123 (<100 or >100).
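One way a single tree node could be chosen is sketched below: try every feature and every threshold, and keep the split that leaves the purest groups. The Gini impurity criterion, the function names, and the three-feature toy rows are assumptions made for illustration; the slides do not say which criterion was used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(X, y):
    """Return the (feature_index, threshold) with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


# Toy data (assumed values): each row holds three "pixel" intensities.
X = [[28, 65, 207], [19, 34, 254], [87, 59, 240], [87, 52, 240],
     [28, 64, 102], [19, 23, 105], [87, 74, 121], [87, 60, 130]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(best_split(X, y))   # (2, 130): feature 2 separates the classes perfectly
```

A full tree would apply `best_split` recursively to the left and right groups until some stopping rule is met.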
  • 40. ACCURACY acc = (correctly classified) / (total number of samples). Confusion matrix: true class vs. predicted class. Beware of an imbalanced dataset! Consider the following model: “Always predict 2”. If 90% of the samples are 2s, its accuracy is 0.9.
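The imbalance warning can be checked with a few lines of code. This is a sketch on an assumed toy dataset (90 samples of class "2", 10 of class "7"); the function names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Assumed imbalanced toy dataset: 90 samples of class "2", 10 of class "7".
y_true = ["2"] * 90 + ["7"] * 10
y_pred = ["2"] * len(y_true)      # the "Always predict 2" model

matrix = confusion_matrix(y_true, y_pred)
print(accuracy(y_true, y_pred))   # 0.9
print(matrix[("7", "2")])         # 10: every "7" was predicted as "2"
```

The accuracy looks good, yet the confusion matrix shows that the model never recognizes a single "7".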
  • 43. DECISION TREE Angry client: “You said 100% accurate?! Every 10th digit your system detects is wrong!” We trained our system on the data the client gave us, but it had never seen the new data the client applied it to. And in real life, it never will…
  • 44. OVERFITTING Simulate the real-life situation — split the dataset
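The effect of holding data out can be sketched with a deliberately overfitting model: a lookup table that memorizes its training samples. The `Memorizer` class, the 80/20 ratio, and the toy dataset are assumptions for illustration only.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut off a held-out test part."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class Memorizer:
    """Deliberately overfitting model: a lookup table of the training samples."""

    def fit(self, samples):
        self.table = {features: label for features, label in samples}
        return self

    def predict(self, features):
        # Inputs that were never seen get no answer at all (None).
        return self.table.get(features)


def accuracy(model, samples):
    hits = sum(model.predict(f) == label for f, label in samples)
    return hits / len(samples)


# Toy dataset (assumed): 100 distinct feature tuples with alternating labels.
dataset = [((i, i * 2), "7" if i % 2 else "2") for i in range(100)]
train, test = split_dataset(dataset)

model = Memorizer().fit(train)
print(accuracy(model, train))   # 1.0: every training sample was memorized
print(accuracy(model, test))    # 0.0: no held-out sample was ever seen
```

Measured on the training data alone, this model looks perfect; only the held-out part reveals that it has learned nothing that generalizes.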
  • 49. OVERFITTING Underfitting: “too stupid”. OK. Overfitting: “too smart”. Our current decision tree has too much capacity; it has simply memorized all of the data. Let’s make it less complex.
  • 53. You probably did not notice, but we are overfitting again :(
  • 62. THE WHOLE DATASET is split into three parts:
      TRAINING SET 60%: fit various models and parameter combinations on this subset.
      VALIDATION SET 20%: evaluate the models created with different parameters; estimate overfitting.
      TEST SET 20%: use only once to get the final performance estimate.
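The three roles can be sketched as follows. A one-dimensional k-nearest-neighbor classifier stands in for the model here (an assumption; the slides tune a decision tree), with `k` playing the part of the parameter being selected. The data is synthetic.

```python
import random
from collections import Counter


def three_way_split(samples, seed=0):
    """Shuffle, then cut into 60% training, 20% validation and 20% test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def knn_predict(training, x, k):
    """Majority label among the k training samples closest to x."""
    nearest = sorted(training, key=lambda s: abs(s[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(training, samples, k):
    hits = sum(knn_predict(training, x, k) == label for x, label in samples)
    return hits / len(samples)


# Assumed toy data: one feature in [0, 1); class 1 if it exceeds 0.5,
# with roughly 10% of the labels flipped to imitate noise.
rng = random.Random(42)
dataset = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    dataset.append((x, label))

training, validation, test = three_way_split(dataset)

# Model selection looks at the validation set only.
candidates = [1, 5, 15]
best_k = max(candidates, key=lambda k: accuracy(training, validation, k))

# The test set is touched exactly once, for the final estimate.
final_score = accuracy(training, test, best_k)
print(best_k, final_score)
```

Choosing `best_k` by looking at `test` instead would quietly turn the test set into a second validation set and make the final estimate too optimistic.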
  • 70. CROSS-VALIDATION What if we got an overly optimistic validation set? Merge the TRAINING SET (60%) and the VALIDATION SET (20%) of THE WHOLE DATASET into one TRAINING SET (80%). Fix the parameter value you need to evaluate, say msl=15. Split this training set into TRAINING and VAL parts, holding out a different part as VAL each time, and repeat 10 times. Take the average validation score over the 10 runs: it is a more stable estimate.
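A minimal sketch of this procedure in its k-fold form is shown below (whether the slides use k-fold or repeated random splits is not stated, so this is an assumption). The majority-class model is a stand-in; in practice `fit` would train the model under evaluation with the fixed parameter value, e.g. a decision tree with msl=15.

```python
from collections import Counter


def k_fold_splits(samples, n_folds=10):
    """Yield (training, validation) pairs; each fold is held out exactly once."""
    folds = [samples[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, folds[i]


def fit_majority(training):
    """Stand-in model (an assumption): always predict the most common label."""
    majority = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda features: majority


def cross_val_score(samples, fit, n_folds=10):
    """Average validation accuracy over all folds."""
    scores = []
    for training, validation in k_fold_splits(samples, n_folds):
        model = fit(training)
        hits = sum(model(x) == label for x, label in validation)
        scores.append(hits / len(validation))
    return sum(scores) / len(scores)


# Assumed toy data: 70 samples labelled "2" followed by 30 labelled "7".
samples = ([((i,), "2") for i in range(70)]
           + [((i,), "7") for i in range(70, 100)])

score = cross_val_score(samples, fit_majority)
print(score)
```

The folds are built by taking every tenth sample, which keeps the example deterministic; with real data, shuffle the samples before splitting.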
  • 76. MACHINE LEARNING PIPELINE Take raw data → Extract features → Split into TRAINING and TEST → Pick an algorithm and parameters → Train on the TRAINING data → Evaluate on the TRAINING data with CV (try out different algorithms and parameters) → Fix the best parameters → Train on the whole TRAINING → Evaluate on TEST → Report final performance to the client. Client: “So it is ~87%…erm… Could you do better?” Yes.
  • 77. • C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector Quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning Automata • AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning Automata • Learning Vector Quantization • Logistic Model Tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo Method • SARSA • Instance-based learning • Nearest Neighbor Algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random Forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher's linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting Pick another algorithm
  • 81. RANDOM FOREST Decision tree: pick the best out of all features. Random forest: pick the best out of a random subset of features.
  • 84. RANDOM FOREST Pick the best out of another random subset of features, then pick the best out of yet another random subset of features.
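The random-subset idea can be sketched with a tiny forest of one-split trees (stumps). Each stump sees a bootstrap sample of the rows and may only choose among a fresh random subset of the features; the forest predicts by majority vote. All names, the bootstrap step, and the toy data are assumptions for illustration, and real random forests grow deeper trees.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """One-split tree: best (feature, threshold) among the given features only."""
    best, best_errors = None, len(y) + 1
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            l_label, r_label = majority(left), majority(right)
            errors = (sum(lab != l_label for lab in left)
                      + sum(lab != r_label for lab in right))
            if errors < best_errors:
                best, best_errors = (f, threshold, l_label, r_label), errors
    if best is None:  # no valid split: fall back to a constant prediction
        label = majority(y)
        best = (features[0], float("inf"), label, label)
    return best


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(X, y, n_trees=25, n_sub_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the rows (sampling with replacement) ...
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # ... and a fresh random subset of features for this tree.
        features = rng.sample(range(len(X[0])), n_sub_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return majority([predict_stump(stump, row) for stump in forest])


# Assumed toy data: four features, low values for class "A", high for "B".
X = ([[v, v + 1, v + 2, v + 3] for v in range(0, 40, 4)]
     + [[v, v + 1, v + 2, v + 3] for v in range(60, 100, 4)])
y = ["A"] * 10 + ["B"] * 10

forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0, 0, 0]), predict_forest(forest, [100, 100, 100, 100]))
```

Restricting each tree to a random feature subset makes the trees disagree in different ways, which is what lets the vote cancel out individual mistakes.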
  • 94. ALL OTHER USE CASES
  • 95. Sound → Frequency components → Genre
        Text → Bag of words → Topic
        Image → Pixel values → Cat or dog
        Video → Frame pixels → Walking or running
        Database records Biometric data Census data Average salary … Dead or alive
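For the text use case, a bag-of-words feature vector can be sketched in a few lines. The vocabulary and the example sentence are hypothetical, and real systems usually add tokenization and normalization steps that are omitted here.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and document, chosen only for illustration.
vocabulary = ["goal", "match", "election", "vote"]
features = bag_of_words("The match ended with a late goal and what a goal", vocabulary)

print(features)   # [2, 1, 0, 0]
```

The resulting vector plays the same role as the 784 pixel values in the digit example: a fixed-length list of numbers that any of the algorithms above can consume.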