Limits of Machine Learning
Picking problems to solve with ML
Meir Maor
Chief Architect @ SparkBeyond
About Me
Meir Maor
Chief Architect @ SparkBeyond
At SparkBeyond we leverage the collective human knowledge to solve the world's
toughest problems
AI the current frontier
What can and cannot be done with Machine Learning
Practical advice for setting up Machine Learning problems
Feature Engineering
Hyperparameter tuning
Model Selection
Training Huge Neural Networks
AI is taking over the world
Well, not quite, but ...
With no knowledge but the rules and typical game length, AlphaZero learned to play
both Go and Chess at a superhuman level
And more.
Single-sentence translation Chinese<->English at human level
ImageNet image classification at superhuman level
Finding problems in Non-Disclosure Agreements at human level
Cancer early warning, churn prediction, ad optimization, predictive maintenance,
and many, many more.
Predicting is Hard, Especially the Future - N.Bohr
We use machine learning to:
Predict outcomes, unseen behavior, future events
Automate human tasks: classify, label, prioritize
To Assume makes an Ass out of U and Me
Common assumptions in Machine Learning:
* Data comes from a stationary distribution (train and test have the same distribution)
* Training samples are independent
* Past actions were random, no hidden confounders
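A rough way to probe the first assumption for a single numeric feature is to compare its empirical distribution in the training data with what arrives at prediction time. The sketch below computes a two-sample Kolmogorov-Smirnov statistic by hand. The function name `ks_statistic`, the synthetic samples and the size of the simulated shift are illustrative assumptions, not something taken from the talk.

```python
import random
from bisect import bisect_right

def ks_statistic(train, test):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    train, test = sorted(train), sorted(test)
    gap = 0.0
    for x in train + test:
        cdf_train = bisect_right(train, x) / len(train)
        cdf_test = bisect_right(test, x) / len(test)
        gap = max(gap, abs(cdf_train - cdf_test))
    return gap

rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same_world = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution
drifted_world = [rng.gauss(1.5, 1.0) for _ in range(2000)]  # assumed shift in the mean

stat_same = ks_statistic(train_feature, same_world)
stat_drift = ks_statistic(train_feature, drifted_world)
print(f"no drift: {stat_same:.3f}   drift: {stat_drift:.3f}")
```

A large statistic on an important feature is a hint, not proof, that the model is being asked to predict on samples unlike those it was trained on.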
The unreasonable effectiveness of Data
With plenty of data, machine learning becomes easy
The recent rise of Deep Learning comes not from algorithmic advances but rather
from the existence of large data sets and computers which can process them
With enough data you are essentially learning from examples almost identical to
those you later need to predict/classify
The perfect fit
I have tremendous computing power, let’s find a function which perfectly describes
my data
PAC model (Leslie Valiant, 1984)
No Free Lunch
Better generalization
Compromise
Bias / Variance tradeoff → We must limit our search space
Shrink the hypothesis space:
limit boosting iterations
limit tree size
require a minimum number of samples per leaf
limit the number of hidden nodes
impose a sparsity constraint
...
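To make the effect of shrinking the hypothesis space concrete, here is a minimal sketch using a one-dimensional stand-in for a regression tree: the training points are sorted and cut into consecutive "leaves", and each leaf predicts its mean. The helper names (`fit_leaves`, `predict`, `mse`), the sine-plus-noise data and the leaf sizes are all assumptions chosen for illustration.

```python
import math
import random
from bisect import bisect_right

def fit_leaves(xs, ys, min_leaf):
    """Cut the sorted points into leaves of at least `min_leaf` samples; a leaf predicts its mean."""
    pairs = sorted(zip(xs, ys))
    n_leaves = max(1, len(pairs) // min_leaf)
    bounds, means = [], []
    for i in range(n_leaves):
        chunk = pairs[i * len(pairs) // n_leaves:(i + 1) * len(pairs) // n_leaves]
        bounds.append(chunk[0][0])  # left edge of the leaf
        means.append(sum(y for _, y in chunk) / len(chunk))
    return bounds, means

def predict(model, x):
    bounds, means = model
    return means[max(0, bisect_right(bounds, x) - 1)]

def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def sample(n):
    """Noisy sine wave: the 'true' signal plus Gaussian noise (synthetic data)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

train_x, train_y = sample(400)
test_x, test_y = sample(400)

results = {}
for min_leaf in (1, 20):
    model = fit_leaves(train_x, train_y, min_leaf)
    results[min_leaf] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"min_leaf={min_leaf:2d}  train MSE={results[min_leaf][0]:.3f}  test MSE={results[min_leaf][1]:.3f}")
```

With one sample per leaf the model memorizes the training set, noise included; requiring more samples per leaf gives up the perfect fit in exchange for a lower error on unseen points.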
Compromise cont.
Penalize “less favourable” models:
Lasso / ridge regularization
Bagging / bootstrap sampling
Dropout
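As a small worked example of the first item, the sketch below fits a single weight `w` in `y ≈ w * x` with a ridge penalty and with a lasso penalty. The closed forms follow from the objectives stated in the docstrings; the function names and the toy data are assumptions made for illustration.

```python
import math

def ridge_slope(xs, ys, lam):
    """Minimiser of 0.5 * sum((y - w*x)**2) + 0.5 * lam * w**2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    """Minimiser of 0.5 * sum((y - w*x)**2) + lam * abs(w): soft-threshold, then rescale."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data with true slope 2
for lam in (0.0, 7.0, 28.0):
    print(lam, ridge_slope(xs, ys, lam), lasso_slope(xs, ys, lam))
```

Ridge shrinks the weight smoothly toward zero as the penalty grows, while lasso drives it to exactly zero once the penalty is large enough, which is why lasso is often used to obtain sparse models.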
Needle in a Haystack
All the Hay in the world won’t teach you what a needle looks like
It’s important to have enough samples of the rare class
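One practical consequence is that plain accuracy says very little when the interesting class is rare. The sketch below uses made-up counts (10 needles among 1,000 samples) and a "model" that always answers hay.

```python
labels = [1] * 10 + [0] * 990    # 1 = needle, 0 = hay (made-up counts)
predictions = [0] * len(labels)  # a "model" that always says hay

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)  # share of needles actually found

print(f"accuracy={accuracy:.2%}  recall={recall:.2%}")
```

The always-hay model scores 99% accuracy while finding no needles at all, so metrics on the rare class (recall, precision, AUC) and enough rare-class samples to estimate them matter more than the headline number.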
What is normal?
Unsupervised learning tries to find a needle using only hay.
Abnormal Hay?
- Extra long?
- a piece of grain?
- Too long in the sun?
- Still green?
- A Needle!
Transfer Learning
Learning to solve one problem and applying the result to a similar but different problem
Any time samples don’t come from exactly the same distribution:
Learning from one area and applying it to the next
Learning from the past and applying it to the future
Learning from a non-random sample
Transfer Learning to the rescue
We can and must learn from previous problems
How can a child learn to identify a ring-tailed lemur from a single photo?
The computer isn’t there yet
We can use pre-trained embeddings and pre-trained networks, but only for similar
problems. Good results for text, some for images (similar domain), not so much
beyond.
Even with excellent help, a computer can’t learn a new animal from a single photo
What If?
When we are wondering about future actions, and past actions were not random, we
are doing transfer learning.
Randomized Study
It is always preferable to train machine learning on data where actions were random
Explore vs Exploit
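One common way to keep some randomness in past actions while still acting sensibly is an epsilon-greedy policy. The talk does not say which scheme was used, so this is only an illustrative sketch; `choose_action`, the offer names and their estimated values are hypothetical.

```python
import random

def choose_action(estimated_value, epsilon, rng):
    """Return (action, was_random). With probability epsilon, pick uniformly at random (explore)."""
    actions = sorted(estimated_value)
    if rng.random() < epsilon:
        return rng.choice(actions), True
    return max(actions, key=estimated_value.get), False  # exploit the current best guess

rng = random.Random(0)
estimated_value = {"offer_a": 0.12, "offer_b": 0.30, "offer_c": 0.05}  # hypothetical estimates

log = [choose_action(estimated_value, epsilon=0.1, rng=rng) for _ in range(1000)]
randomized_rows = [action for action, was_random in log if was_random]
print(len(randomized_rows), "of", len(log), "actions were chosen at random")
```

Recording the `was_random` flag alongside each action is the design choice that matters here: the randomized rows form a small randomized study inside production traffic, which can later be used to train or evaluate models about the effect of actions.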
When was that?
We want to learn only from what we will have at prediction time.
We must ask: what did we know back then?
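In code, "what did we know back then?" usually becomes a point-in-time filter applied before any feature is built. The sketch below is a minimal illustration; the event records, field names and dates are hypothetical.

```python
from datetime import date

# Hypothetical customer history: (date the fact became known, feature name, value)
events = [
    (date(2023, 1, 10), "support_tickets", 2),
    (date(2023, 3, 5), "plan_downgrade", 1),
    (date(2023, 6, 20), "account_closed", 1),  # this is the outcome itself
]

def features_as_of(events, prediction_date):
    """Keep only facts that were already known strictly before the prediction date."""
    return {name: value for known_on, name, value in events if known_on < prediction_date}

snapshot = features_as_of(events, date(2023, 4, 1))
print(snapshot)
```

Filtering on the date a fact became known, rather than the date it refers to, keeps information from the future (including the outcome) out of the training features.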
Mining for Unobtainium*
A client in the never-never land wants to find new Unobtainium deposits there.
A large part of the land has been explored, and we have a map of the mines
Many areas were not explored; for those we have no map
* Identifying client details have been changed
Modelling Take 1
Place a grid on the never-never land map
All grid squares with a known deposit are positive
Since Unobtainium is rare, all others can be assumed to be negative
Use advanced imaging, radiometric, magnetic, topographic maps, geological
maps, and more as explanatory variables.
99% AUC!! We are going to be rich!
Using topographic data, a big hole in the ground predicts a large deposit perfectly.
We are detecting existing active mines.
Back to the archives to find 50-year-old maps from before most mines were opened.
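A cheap sanity check for this kind of leakage is to score every feature on its own: a single feature that separates the classes almost perfectly deserves suspicion before celebration. The sketch below computes AUC directly from its pairwise definition; the feature names, values and the 0.95 threshold are made up for illustration.

```python
def auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = grid square with a known deposit (toy data)
features = {                       # hypothetical per-square features
    "hole_depth_m": [120, 90, 200, 0, 1, 0, 2, 0],
    "magnetic_anomaly": [0.7, 0.4, 0.9, 0.5, 0.3, 0.6, 0.2, 0.8],
}

flagged = [name for name, values in features.items() if auc(labels, values) > 0.95]
print("suspiciously good features:", flagged)
```

A flagged feature is not necessarily wrong, but it should be explained: here the depth of a hole in the ground describes the mine that already exists, not the deposit we hope to find.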
96% AUC! We are going to be rich!
Distance from roads is an excellent predictor.
Not only do all existing mines have roads to them,
past exploration was also primarily in accessible areas
Removing roads is not enough: they are hidden in all the data.
Observational study, Simpson’s Paradox
Obviously we should look at the breakdown
Low-birth-weight babies born to smoking mothers have lower mortality rates
compared to similar-weight babies born to non-smokers.
Normal-birth-weight babies born to smoking mothers have lower mortality rates
compared to similar-weight babies born to non-smokers.
Ergo. Smoking is good for your baby!
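The arithmetic behind this kind of paradox is easy to reproduce. The counts below are invented purely for illustration (they are not real birth statistics): within each birth-weight group smokers show lower mortality, yet overall mortality is higher for smokers, because smoking moves many more babies into the high-risk low-weight group.

```python
# (births, deaths) per group: made-up numbers for illustration only
counts = {
    ("smoker", "low"): (400, 40),
    ("smoker", "normal"): (600, 3),
    ("non-smoker", "low"): (100, 15),
    ("non-smoker", "normal"): (900, 9),
}

def mortality(group, weight=None):
    """Death rate for a group, optionally restricted to one birth-weight stratum."""
    cells = [v for (g, w), v in counts.items() if g == group and weight in (None, w)]
    births = sum(b for b, _ in cells)
    deaths = sum(d for _, d in cells)
    return deaths / births

for weight in ("low", "normal", None):
    print(weight or "overall", mortality("smoker", weight), mortality("non-smoker", weight))
```

The breakdown looks favourable to smoking only because birth weight is itself affected by smoking; conditioning on it in an observational study hides the harm instead of revealing it.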
Summary
Need a stationary distribution: predict on samples like those trained on
Need data, especially of rare events
Need to be aware of a changing world
Measure & evaluate everything!


Editor's notes

  1. Simple models which rely on a few well-established concepts, and yet don’t rely too heavily on any one source, are more likely to represent the underlying distribution of the data. More than that, they seem to hold up better to some changes in that distribution. Models which are a priori more likely are also more likely under a new distribution.
  2. background/foreground, 3D object rotations, identifying limbs and their motion, hidden parts, ...