Introduction To
Machine Learning
Chun Ming Chin
Microsoft Ventures - @MSFTVentures
Chun Ming Chin - @chinchunming
(Machine Learning Workshop by Microsoft Ventures)
Objectives
• Understand why machine learning is important
• Learn how to apply machine learning in your use case
• Adopt best practices in machine learning
Overview
• Why Machine Learning
• What is Machine Learning
• Frame common tasks as machine learning problems
• Example 1: Mobile Optical Character Recognition on Asian text
• Example 2: Predict Housing Rental Prices
• Accuracy Issues (i.e. Generalization)
• Solutions to Generalization
• Putting it all together: Machine learning in stock trading
• Machine Learning Best Practices
Why Machine Learning?
1. Increase barrier to entry when product/service quality is dependent
on data
[Figure: cycle linking Product/Service, Users and Data; label: Increase quality/quantity]
Why Machine Learning?
2. Automate human operations to increase productivity and lower cost
• Example: Auto identify and ban bots that sign up on your website
• Use Case: Consider a rules-based approach first. Use ML when the task cannot be completed with specific rules.
[Figure: sign-up form with an email field, _______@______.com]
Why Machine Learning?
3. Customize product/service to increase engagement and profits
• Example: Customize the sales page to increase conversion rates for an online information product
[Figure: two panels, each showing page elements A to F]
Machine Learning Example
Goal: Get the computer to classify an input image as Chinese or Japanese.
• Chinese Traditional (sophisticated and big): 婆 魔 佛 特 級 氣 喜 歡
• Japanese (squiggly and cute): か き け こ さ し す せ
• Features: Characteristics of the image/ measurements from the data, e.g. no. of black pixels/ orientation of strokes in the image
[Figure: Features in 2D space. Axes: No. of black pixels in image, No. of straight lines in image. Classes: Chinese, Japanese (Label 1, Label 2)]
ML Terminology
• Data point = Sample = Example
• Labels/ Classes/ Categories:
• Discrete (e.g. Optical Character Recognition)
• Continuous (e.g. Housing prices)
• Classification/decision boundary:
• Separates regions of feature space
• Hopefully helps separate different classes
[Figure: Features in 2D space, with a matrix V indexed by data point i (rows) and feature dimension j (columns); annotations: Label, Data Point, Decision boundary]
What is Machine Learning?
Unsupervised learning
• Algos that operate on unlabelled examples
• Discover structure/ patterns in the data.
Supervised Learning
• Algos trained on labelled examples
• Predict an output for previously unseen
inputs.
Supervised Machine Learning (Classification)
Training stage (usually offline):
• Training data set: Measurements (features) & associated class labels
• Training algorithm produces the learned model f(x) (structure + parameters)
Testing stage (run time, online):
• Input test data point: Measurements (features) only
• Learned model f(x) outputs the predicted class label
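A minimal sketch of the two stages above, assuming a nearest-centroid classifier as the training algorithm (the slides do not name one). The feature values (no. of black pixels, no. of straight lines) and the function names `train` and `predict` are made up for illustration.

```python
from math import dist

def train(features, labels):
    """Training stage: learn one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(model, x):
    """Testing stage: return the label of the closest centroid."""
    return min(model, key=lambda y: dist(model[y], x))

# Hypothetical training set: (no. of black pixels, no. of straight lines).
train_x = [(900, 14), (850, 12), (950, 15), (300, 3), (350, 4), (280, 2)]
train_y = ["Chinese", "Chinese", "Chinese", "Japanese", "Japanese", "Japanese"]

model = train(train_x, train_y)       # offline
prediction = predict(model, (320, 5)) # run time, features only
print(prediction)
```

The learned model here is just the dictionary of centroids: its structure is "one centroid per class" and its parameters are the centroid coordinates.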
Mobile Optical Character Recognition of Asian Text
[Figure: input test images pass through the classifier f(x) for image classification; example outputs: Me, Competition, Expense, Middle]
• Image measurements: Orientation of strokes in image/ spatial position of pixels
• Use Case: Scale up a product concept with a trade-off on accuracy.
Supervised Machine Learning (Regression)
Training stage:
• Training data set: Measurements (features) & associated continuous labels
• Training algorithm produces the learned model f(x) (structure + parameters)
Testing stage:
• Input test data point: Measurements (features) only
• Learned model f(x) outputs a continuous value
Example: Predict rental prices based on house area (Sq ft)
Training stage
1. Raw Input Data:
2. Use training algorithm from Python’s ML library.
3. Get resulting ML model f(x)
Use Case: Make predictions based on historical data
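As a rough illustration of the training and testing stages for this example, the sketch below fits a straight line by ordinary least squares in plain Python rather than calling an ML library. The areas and rents are invented numbers, and `fit_line` and `f` are hypothetical names.

```python
def fit_line(areas, prices):
    """Ordinary least squares for price = slope * area + intercept."""
    n = len(areas)
    mean_x = sum(areas) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices))
    sxx = sum((x - mean_x) ** 2 for x in areas)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: area in sq ft, monthly rent in $.
areas = [500, 1000, 1500, 2000]
prices = [1000, 1500, 2000, 2500]

# Training stage: learn the parameters of the model.
slope, intercept = fit_line(areas, prices)

def f(area):
    """Learned model: predicted rent for a given area."""
    return slope * area + intercept

# Testing stage: predict the rent of an unseen 1800 sq ft house.
predicted = f(1800)
print(predicted)
```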
Case Study: Predict rental prices based on house area (Sq ft)
Testing stage: Feature (area, e.g. 1800 Sqft) → Regressor f(x) → Rental Price ($), from cheap to expensive
[Figure: Rental Price ($) vs. Area (Square Feet)]
Optimization with Objective Function
[Figure: fitted line at Iterations 1 to 4; axes: Rental Price ($) vs. Area (Square Feet)]
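One way to picture the iterations is gradient descent on a mean squared error objective. This is an assumption: the slides do not say which objective function or optimizer is used. The data, learning rate and units (thousands of sq ft and thousands of $) are illustrative.

```python
def mse(w, b, xs, ys):
    """Objective function: mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, xs, ys, lr):
    """Move w and b one step against the gradient of the objective."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

xs = [0.5, 1.0, 1.5, 2.0]  # area in thousands of sq ft (hypothetical)
ys = [1.0, 1.5, 2.0, 2.5]  # rent in thousands of $ (hypothetical)

w, b = 0.0, 0.0
losses = [mse(w, b, xs, ys)]
for iteration in range(1, 5):  # Iterations 1 to 4
    w, b = gradient_step(w, b, xs, ys, lr=0.1)
    losses.append(mse(w, b, xs, ys))
    print(iteration, round(losses[-1], 4))
```

Each iteration lowers the objective, so the fitted line moves closer to the data points.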
Generalization Issues
[Figure: legend: test data point, train data point]
Generalization Issues
Under fitting:
• Number of features used is too small
• There are patterns in the data that the algorithm is unable to fit
Over fitting:
• Number of features used is too large
• Fitting spurious patterns in the training data set rather than capturing true underlying trends
Under/Over Fitting in House Rental Prices Prediction
[Figure: two panels: Under fitting in Regression, Over fitting in Regression]
Under/Over Fitting in Optical Character Recognition
[Figure: two panels: Under fitting in Classification, Over fitting in Classification; an outlier is marked]
Generalization
[Figure: Error vs. No. of iterations; curves: Training Error, Test Error; regions: Under fit, Over fit; marked point: Best generalization]
Fixes for ML algorithms
Solutions to accuracy issues, prioritized in descending order of sensitivity to classification error:
1. Training data improvement
• Get more training examples (Fixes over fitting)
• Ensure training data is high quality (De-noise training data)
2. Modify objective function
3. Feature engineering
• Increase/reduce number of features (Fixes under/over fitting)
• Change features used (Fixes under fitting)
4. Optimization algorithm
• Change the ML model used (SVMs, decision trees, neural networks, etc.)
• Run optimization algo for more iterations to ensure it converges
Solution: Increase amount of training data
[Figure: two panels, Before and After; each plots Error vs. No. of iterations with Training Error and Test Error curves and marks the best generalization]
Feature Engineering: Increase number of features
• Combine features: x3 = x1 × x2
• Convert continuous features into categorical features (i.e. bucketize feature values)
• Create a new feature as an indicator for missing values in another feature and supply a default value for the missing feature value.
• To address non-linearly separable data, use non-linear features (e.g. for original feature x, add derived feature x^2; going beyond degree 2 is not advisable).
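The four techniques above can be sketched on a single data point as follows. The bucket edges, the default value and the feature names are assumptions chosen for illustration.

```python
def engineer(x1, x2, bucket_edges=(500, 1000, 1500), default=0.0):
    """Derive extra features from two raw features; x2 may be None (missing)."""
    x2_missing = 1 if x2 is None else 0        # indicator for the missing value
    x2_filled = default if x2 is None else x2  # default replaces the missing value
    bucket = sum(x1 >= edge for edge in bucket_edges)  # bucket index 0..3
    return {
        "x1": x1,
        "x2": x2_filled,
        "x2_missing": x2_missing,
        "x3": x1 * x2_filled,   # combined feature x3 = x1 * x2
        "x1_squared": x1 ** 2,  # degree-2 non-linear feature
        "x1_bucket": bucket,    # continuous value turned into a category
    }

row = engineer(1200, None)
print(row)
```

Two raw features become six, which is the sense in which these steps increase the number of features.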
Feature Engineering: Reduce number of features
Reduce no. of features and identify most important features
• Data is more compact and dense
• Can train and classify faster
• Can improve accuracy
Feature Engineering Example: Asian Optical Character Recognition
• Use domain knowledge to choose features that distinguish different classes well
• Read academic papers to understand prior work in the field
• Filter out stroke orientation information along 0, 45, 90 and 135 degrees.
• Extract feature. Feature vector dimension:
5 (no. of sub blocks per row) × 5 (no. of sub blocks per column) × 2 (avg & var per sub block) × 4 (no. of orientations) = 200
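To make the 200-dimensional count concrete, here is a small sketch that computes the average and variance of every sub block of four orientation maps. The orientation filtering itself is not shown; the input maps are placeholder numbers and `block_features` is a hypothetical helper.

```python
from statistics import fmean, pvariance

def block_features(orientation_maps, blocks=5):
    """Average and variance of each sub block of each orientation map."""
    features = []
    for response in orientation_maps:  # one 2D map per orientation
        height, width = len(response), len(response[0])
        bh, bw = height // blocks, width // blocks
        for r in range(blocks):
            for c in range(blocks):
                cell = [response[i][j]
                        for i in range(r * bh, (r + 1) * bh)
                        for j in range(c * bw, (c + 1) * bw)]
                features.append(fmean(cell))
                features.append(pvariance(cell))
    return features

# Placeholder input: 4 orientation maps (0, 45, 90, 135 degrees), 10 x 10 each.
maps = [[[float((i + j + k) % 3) for j in range(10)] for i in range(10)]
        for k in range(4)]
vector = block_features(maps)
print(len(vector))  # 5 * 5 * 2 * 4 = 200
```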
Solution: Try different ML models
(i.e. Optimization algorithms)
Decision Factors
• Ease of training/ testing
• Ease of debugging
• Model size (Memory constraints)
• Accuracy/ generalization potential
• Data characteristics (e.g. For non-linearly separable data, use non-linear
models)
Support Vector Machines (SVM)
Why:
• Linear and non-linear classification
• Best for binary (i.e. 2 class) classification
though can be used for multi-class scenario
• Easy to train
• Guaranteed global optimum
• Scales well to high dimensional data
What:
• Find decision boundary that maximizes
margin between classes. Boundary only
determined by nearby data points
Support Vectors
Support Vector Machines (SVM)
Choosing correct kernel function for non linear SVM (Original video from Udi Aharoni here:
https://www.youtube.com/watch?v=3liCbRZPrZA)
1. Find non linear boundary that separates blue from red data points in 2D space
2. Map data points into 3D space using polynomial kernel.
3. Linearly separate data points in 3D space using a plane.
4. Map back to 2D space to determine non linear boundary.
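Steps 2 and 3 can be illustrated with the explicit feature map of a homogeneous degree-2 polynomial kernel. This is a sketch under assumptions: the points are invented, and the separating plane is chosen by hand rather than learned by an SVM.

```python
from math import sqrt

def phi(x1, x2):
    """Degree-2 polynomial feature map from 2D to 3D."""
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

def kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b) ** 2."""
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def above_plane(z, radius=1.0):
    """Linear rule in 3D: is the point above the plane z1 + z3 = radius ** 2?"""
    return z[0] + z[2] > radius ** 2

inner = [(0.1, 0.2), (-0.5, 0.3), (0.0, -0.7)]  # hypothetical "blue" points
outer = [(1.5, 0.0), (-1.2, 1.1), (0.9, -1.4)]  # hypothetical "red" points

for point in inner + outer:
    print(point, above_plane(phi(*point)))
```

In 2D the two groups are separated by a circle, which is not a straight line. After the mapping, the same circle becomes the plane z1 + z3 = 1, so a linear rule suffices. The kernel returns the same value as the dot product of the mapped points, which is why an SVM never needs to build the 3D points explicitly.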
Decision Trees
Why:
• Non linear classification & regression
• Pros:
• Easy to understand and debug
• Finds most important features in data
• Requires little data preparation
• Cons:
• Memory concerns limit the accuracy of decision trees. Deeper trees tend to give higher test accuracy, but tree size grows exponentially with depth
What:
• Partition feature space into smaller pieces
• Learn tree structure and split functions
[Figure: decision tree with a root node and split functions x1 > 140 vs. x1 < 140, then x2 > 140 vs. x2 < 140. Node counts shown: C = 11, J = 1; C = 3, J = 13; C = 2, J = 1; C = 1, J = 12]
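Learning a split function can be sketched as a search over thresholds on one feature. The sketch assumes Gini impurity as the split criterion, which the slides do not specify, and uses made-up feature values with the class labels "C" and "J" from the figure.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, impurity) of the best split 'value < threshold'."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v < threshold]
        right = [y for v, y in zip(values, labels) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

x1 = [100, 120, 130, 150, 160, 170]       # hypothetical feature values
labels = ["J", "J", "J", "C", "C", "C"]
threshold, impurity = best_split(x1, labels)
print(threshold, impurity)
```

A full tree repeats this search inside each resulting partition, over every feature, until a depth or purity limit is reached.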
Decision Forests
Why:
• Solves instability in decision trees - Small variations in data can generate a different tree
• Improves memory-accuracy tradeoff as trees can be parallelized.
What:
• A collection (i.e. Ensemble) of trees
• Aggregate predictions across all trees.
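A minimal sketch of the aggregation step, assuming majority voting for classification. The three "trees" are hand-written threshold rules standing in for trained trees, and their thresholds are invented.

```python
from collections import Counter

def forest_predict(trees, x):
    """Aggregate the predictions of all trees by majority vote."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Stand-ins for trained trees; each maps a feature vector to a class label.
trees = [
    lambda x: "C" if x[0] > 140 else "J",
    lambda x: "C" if x[1] > 140 else "J",
    lambda x: "C" if x[0] > 150 else "J",
]

print(forest_predict(trees, (145, 120)))
print(forest_predict(trees, (160, 100)))
```

Because each tree is evaluated independently, the calls inside `forest_predict` could run in parallel.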
(Deep) Neural Networks
Why:
• For non linear classification & regression
• Pros:
• Fast in testing stage
• Robust to noise
• Cons:
• Slow in training stage
• Only guarantees local optima
What:
• Sequence of non-linear combinations of extremely simple computations with high connectivity
[Figure: network with an input layer, hidden (deep) layers and an output layer]
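The "sequence of non-linear combinations" can be sketched as a forward pass through fully connected layers. The weights are arbitrary placeholders and the tanh non-linearity is an assumption; training is not shown.

```python
from math import tanh

def layer(inputs, weights, biases):
    """One fully connected layer followed by a tanh non-linearity."""
    return [tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, network):
    """Testing stage: pass the input through every layer in turn."""
    for weights, biases in network:
        x = layer(x, weights, biases)
    return x

# Placeholder weights: 2 inputs -> 3 hidden units -> 1 output.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = forward([1.0, 2.0], network)
print(output)
```

The forward pass is only a handful of multiplications and additions per layer, which is why the testing stage is fast even when training is slow.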
Model selection: Use Cross Validation
[Figure: data matrix (index i of data point × feature dimension j, plus label) split into a training set and a validation set]
Held Out Cross Validation
1. Randomly split all data into 2 subsets:
• Training set (70%)
• Validation set (30%)
2. Train machine learning model on training set.
3. Pick model with lowest error on validation set.
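The three steps above can be sketched as follows. The two candidate models, the toy data set and all function names are hypothetical and only serve to show the mechanics of the split and the selection.

```python
import random

def holdout_split(data, train_fraction=0.7, seed=0):
    """Step 1: random split into a training set and a validation set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train):
    """Candidate model: threshold halfway between the two class means."""
    c = [x for x, y in train if y == "C"]
    j = [x for x, y in train if y == "J"]
    threshold = (sum(c) / len(c) + sum(j) / len(j)) / 2
    return lambda x: "C" if x > threshold else "J"

def fit_majority(train):
    """Candidate model: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def validation_error(predict, validation):
    return sum(predict(x) != y for x, y in validation) / len(validation)

# Toy data: one feature, label "C" above 140 and "J" otherwise.
data = [(x, "C" if x > 140 else "J") for x in range(100, 200, 5)]
train_set, validation_set = holdout_split(data)

candidates = {"threshold": fit_threshold, "majority": fit_majority}
# Step 2: train every candidate on the training set only.
trained = {name: fit(train_set) for name, fit in candidates.items()}
# Step 3: pick the model with the lowest error on the validation set.
best = min(trained, key=lambda name: validation_error(trained[name], validation_set))
print(best)
```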
K Fold Cross Validation
1. Divide data into K pieces
2. Train on K – 1 pieces
3. Validate on remaining piece
4. Average over K results to get generalization error of
model
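A sketch of the four steps, assuming the K pieces are formed by taking every K-th data point. The model (predict the training mean) and the error measure (mean squared error) are placeholders chosen to keep the example short.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for each of the K pieces."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

def cross_validate(data, k, fit, error):
    """Train on K - 1 pieces, validate on the rest, average the K results."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(error(model, [data[i] for i in val_idx]))
    return sum(scores) / k

def fit_mean(train):
    """Placeholder model: predict the mean of the training values."""
    return sum(train) / len(train)

def squared_error(model, validation):
    return sum((y - model) ** 2 for y in validation) / len(validation)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_error = cross_validate(data, 3, fit_mean, squared_error)
print(cv_error)
```

Every data point is used for validation exactly once, so the estimate depends less on one lucky or unlucky split than held out validation does.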
Asian Optical Character Recognition Validation Results
[Figure: bar chart with axis from 0% to 100%. Models: Nearest Neighbor, Linear SVM, Sigmoid SVM, Polynomial SVM, RBF SVM, Intersection SVM. Series: With reduced feature dimension, Without reduced feature dimension. Bar values as extracted: 91.66%, 93.10%, 21.40%, 89.00%, 89.00%, 92.70%, 75.00%, 87.04%, 0.02%, 35.96%, 35.96%, 85.92%]
Putting it all together:
Machine Learning in Stock Trading
Use case: Machines see patterns in big data faster and perform tasks faster than
humans.
Case Study: Machine Learning in Stock Trading
• Goal: Create profitable trading strategy
• Data: Use company (e.g. United Airlines) stock data from Wharton
Research Data Services (WRDS)
https://wrdsweb.wharton.upenn.edu/wrds/
• Implementation: Predict closing price on next time step based on
information from current time step
*Disclaimer: This may not make money
Case Study: Machine Learning in Stock Trading
[Figure: training stage. The training algorithm produces the learned model f(x) (structure + parameters)]
Case Study: Machine Learning in Stock Trading
• Buy: Trade price at next time step > trade price of the current time step (i.e. positive returns). Label = +1.
• Hold: Trade price at next time step = trade price of the current time step. Label = 0.
• Sell: Trade price at next time step < trade price of the current time step (i.e. negative returns). Label = -1.
• Measurements: Average trade price per sec, standard deviation of trade price per sec
[Figure: testing stage. Trade price measurements → Classifier f(x) → label]
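The labelling rule and the two measurements can be sketched as below. The prices are invented and expressed in cents, and the sketch assumes that the trade price of a time step is the average trade price within that second.

```python
from statistics import fmean, pstdev

def label(current_price, next_price):
    """+1 = buy, 0 = hold, -1 = sell."""
    if next_price > current_price:
        return 1
    if next_price < current_price:
        return -1
    return 0

def features(trades_in_second):
    """Average and standard deviation of the trade price within one second."""
    return fmean(trades_in_second), pstdev(trades_in_second)

# Hypothetical trade prices in cents, grouped per second.
per_second = [[2000, 2002], [2003, 2005], [2004, 2004], [2001]]

examples = []
for now, nxt in zip(per_second, per_second[1:]):
    x = features(now)                    # measurements at the current step
    y = label(fmean(now), fmean(nxt))    # what the price does next
    examples.append((x, y))

labels = [y for _, y in examples]
print(labels)
```

Each (measurements, label) pair is one training example for the classifier.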
Training Data Improvements
o Terminology
• Trade price: Price at which shares last changed hands
• Bid price: Price a buyer is willing to pay
• Bid size: No. of shares available at bid price
• Offer price: Price a seller is looking to get
• Offer size: No. of shares available at offer price
o Intuition… trade price next second should
depend on the bid-offer information now.
o Add bid and offer price data (That is within 5%
of trade price) to training data
Feature Engineering
United Airlines Inc. (Ticker Symbol UAL), Date: 2011 Dec 01
• Extract measurements from the distribution of the bid-offer curve at each one-second window.
[Figure: bid and offer curves at Time = 9:30:03 am, 9:30:04 am and 9:30:05 am]
Machine Learning Best Practices
Combine human intuition/ wisdom with machine speed/ pattern recognition.
• Use domain knowledge to choose features that distinguish different classes well
• Add specific rules to process input data before the training stage/ output data after the test stage (in contrast with the generalized rules learned by machine learning). Pre-processing the data reduces training time compared with letting the ML model learn the patterns eventually.
Useful Machine Learning Tools
• Python installer for Windows with all necessary ML libraries (e.g.,
SciPy, NumPy, etc.) http://winpython.sourceforge.net/
• http://www.r-project.org/ R project for statistical computing
• http://prediction.io/ Create predictive features, such as
personalization, recommendation and content discovery
• http://www.tableausoftware.com/ Enables you to visually analyze
your data
Unsupervised Learning Application Scenarios
(e.g. MinHash Clustering, Matrix Factorization, Dimensionality Reduction)
1. Simplify your data so as to provide insights for 3rd party businesses.
2. Interpret data to test assumptions about your user’s behavior/ market
3. Cluster your customers into different groups with different needs so as to
increase monetization.
4. Make more informed business decisions for your startup/ customers.
More Best Machine Learning Practices
• Simple models with clear explanations are better than complicated models with unclear explanations, because they are easier to debug.
• To address non-linearly separable data: Use non-linear features/ classifiers.
• Don't confuse cause and effect with noise in your data. Consider using a statistical test (e.g. McNemar's test) to check whether your result is statistically significant.
• If your training data set is too small, it can cause over fitting.
• Modularize the machine learning model so that it is easy for new people to understand and experiment with.
• Have a common baseline when comparing improvements to your machine learned model. Common baselines enable you to share resources when comparing different techniques.
• Modularize your code so that it is easy for new people to experiment with different techniques (parameter sweeping etc.) quickly. Define inputs/ outputs/ training parameters clearly.
• Compare to natural baselines, e.g. guess the global average for item ratings, or suggest globally popular items.
• You can use UI/ UX to your advantage to find hacks around the trade-off between computational speed (latency) and memory capacity.
• Incrementally update your ratings model using stochastic gradient descent (i.e. as new observations arrive, update for that user and item only). An alternative is weekly batch retraining.
• The more expressive your model, the less expressive your features need to be, and vice versa.
• Think about scaling early:
1. Sample a subset of ratings for each user so that you can handle the matrix
in memory.
2. Use MinHash to cluster users [DDGR07]
3. Distribute calculations with Map Reduce
4. Distribute matrix operations with Map Reduce [GHNS11]
5. Parallelize stochastic gradient descent [ZWSL10]
6. Expectation-maximization for pLSI with MapReduce [DDGR07]
Note: Niche vs. general (tf-idf): both of us watching a niche movie should mean more than both of us watching a popular movie. Because of practical constraints, people very often use item-based similarity.
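The incremental update mentioned in the practices above ("update for that user and item only") can be sketched as one stochastic gradient step on a matrix factorization model. The factor vectors, learning rate, regularization strength and identifiers are all made up for illustration.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def sgd_update(user_vec, item_vec, rating, lr=0.05, reg=0.02):
    """One stochastic gradient step on the squared error of a single rating."""
    error = rating - dot(user_vec, item_vec)
    new_user = [u + lr * (error * v - reg * u) for u, v in zip(user_vec, item_vec)]
    new_item = [v + lr * (error * u - reg * v) for u, v in zip(user_vec, item_vec)]
    return new_user, new_item

# Hypothetical latent factors for one user and one item.
users = {"alice": [0.1, 0.1]}
items = {"movie_42": [0.1, 0.1]}

# A new observation arrives: alice rates movie_42 with 4.0.
before = abs(4.0 - dot(users["alice"], items["movie_42"]))
users["alice"], items["movie_42"] = sgd_update(users["alice"], items["movie_42"], 4.0)
after = abs(4.0 - dot(users["alice"], items["movie_42"]))
print(before, after)
```

Only the two vectors involved in the new rating change, so the cost of an update does not depend on the total number of users or items.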
More Best Machine Learning Practices

Más contenido relacionado

La actualidad más candente

Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedJonathan Mugan
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in RBabu Priyavrat
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Adversarial Learning_Rupam Bhattacharya
Adversarial Learning_Rupam BhattacharyaAdversarial Learning_Rupam Bhattacharya
Adversarial Learning_Rupam BhattacharyaRupam Bhattacharya
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersFunctional Imperative
 
Demystifying Machine and Deep Learning for Developers
Demystifying Machine and Deep Learning for DevelopersDemystifying Machine and Deep Learning for Developers
Demystifying Machine and Deep Learning for DevelopersMicrosoft Tech Community
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning OverviewMykhailo Koval
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Machine Learning under Attack: Vulnerability Exploitation and Security Measures
Machine Learning under Attack: Vulnerability Exploitation and Security MeasuresMachine Learning under Attack: Vulnerability Exploitation and Security Measures
Machine Learning under Attack: Vulnerability Exploitation and Security MeasuresPluribus One
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavAgile Testing Alliance
 
25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centersAndres Mendez-Vazquez
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
 

La actualidad más candente (20)

Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in R
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Adversarial Learning_Rupam Bhattacharya
Adversarial Learning_Rupam BhattacharyaAdversarial Learning_Rupam Bhattacharya
Adversarial Learning_Rupam Bhattacharya
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning Classifiers
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Demystifying Machine and Deep Learning for Developers
Demystifying Machine and Deep Learning for DevelopersDemystifying Machine and Deep Learning for Developers
Demystifying Machine and Deep Learning for Developers
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning under Attack: Vulnerability Exploitation and Security Measures
Machine Learning under Attack: Vulnerability Exploitation and Security MeasuresMachine Learning under Attack: Vulnerability Exploitation and Security Measures
Machine Learning under Attack: Vulnerability Exploitation and Security Measures
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
LR1. Summary Day 1
LR1. Summary Day 1LR1. Summary Day 1
LR1. Summary Day 1
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers
 
Data Mining
Data MiningData Mining
Data Mining
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 

Destacado

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningShao-Chuan Wang
 
Application of machine learning in industrial applications
Application of machine learning in industrial applicationsApplication of machine learning in industrial applications
Application of machine learning in industrial applicationsAnish Das
 
Introduction to Machine Learning: An Application to Disaster Response
Introduction to Machine Learning: An Application to Disaster ResponseIntroduction to Machine Learning: An Application to Disaster Response
Introduction to Machine Learning: An Application to Disaster ResponseMuhammad Imran
 
Ultrasound nerve segmentation, kaggle review
Ultrasound nerve segmentation, kaggle reviewUltrasound nerve segmentation, kaggle review
Ultrasound nerve segmentation, kaggle reviewEduard Tyantov
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
CS364 Artificial Intelligence Machine Learning
CS364 Artificial Intelligence Machine LearningCS364 Artificial Intelligence Machine Learning
CS364 Artificial Intelligence Machine Learningbutest
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3Srinivasan R
 
Sensitive skin
Sensitive skinSensitive skin
Sensitive skinVivek Jha
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQpivotalny
 

Destacado (9)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Application of machine learning in industrial applications
Application of machine learning in industrial applicationsApplication of machine learning in industrial applications
Application of machine learning in industrial applications
 
Introduction to Machine Learning: An Application to Disaster Response
Introduction to Machine Learning: An Application to Disaster ResponseIntroduction to Machine Learning: An Application to Disaster Response
Introduction to Machine Learning: An Application to Disaster Response
 
Ultrasound nerve segmentation, kaggle review
Ultrasound nerve segmentation, kaggle reviewUltrasound nerve segmentation, kaggle review
Ultrasound nerve segmentation, kaggle review
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
CS364 Artificial Intelligence Machine Learning
CS364 Artificial Intelligence Machine LearningCS364 Artificial Intelligence Machine Learning
CS364 Artificial Intelligence Machine Learning
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3
 
Sensitive skin
Sensitive skinSensitive skin
Sensitive skin
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
 

Similar a Intro to Machine Learning by Microsoft Ventures

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroSi Krishan
 
Building largescalepredictionsystemv1
Building largescalepredictionsystemv1Building largescalepredictionsystemv1
Building largescalepredictionsystemv1arthi v
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusoneDotNetCampus
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATADotNetCampus
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selectionDavis David
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NETDev Raj Gautam
 
NEURAL Network Design Training
NEURAL Network Design  TrainingNEURAL Network Design  Training
NEURAL Network Design TrainingESCOM
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneIvo Andreev
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureIvo Andreev
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorizationAndreas Loupasakis
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization Warply
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningHoa Le
 

Similar a Intro to Machine Learning by Microsoft Ventures (20)

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
Building largescalepredictionsystemv1
Building largescalepredictionsystemv1Building largescalepredictionsystemv1
Building largescalepredictionsystemv1
 
PPT s09-machine vision-s2
PPT s09-machine vision-s2PPT s09-machine vision-s2
PPT s09-machine vision-s2
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
NEURAL Network Design Training
NEURAL Network Design  TrainingNEURAL Network Design  Training
NEURAL Network Design Training
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorization
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 

Más de microsoftventures

Más de microsoftventures (12)

Spain Startup Ecosystem
Spain Startup EcosystemSpain Startup Ecosystem
Spain Startup Ecosystem
 
Navigating the Legal World
Navigating the Legal WorldNavigating the Legal World
Navigating the Legal World
 
Setting Target Markets
Setting Target MarketsSetting Target Markets
Setting Target Markets
 
Networking the Force Multiplier
Networking the Force MultiplierNetworking the Force Multiplier
Networking the Force Multiplier
 
Focusing on Growth Through Business Development
Focusing on Growth Through Business DevelopmentFocusing on Growth Through Business Development
Focusing on Growth Through Business Development
 
Focusing on Growth
Focusing on GrowthFocusing on Growth
Focusing on Growth
 
Harnessing the Power of Social Media
Harnessing the Power of Social MediaHarnessing the Power of Social Media
Harnessing the Power of Social Media
 
Investments 101
Investments 101Investments 101
Investments 101
 
Positioning for Success + Hing a Team
Positioning for Success + Hing a TeamPositioning for Success + Hing a Team
Positioning for Success + Hing a Team
 
Finding Potential Customers
Finding Potential CustomersFinding Potential Customers
Finding Potential Customers
 
Your Big Idea
Your Big IdeaYour Big Idea
Your Big Idea
 
Technology M&A Monitor - India 2014
Technology M&A Monitor - India 2014Technology M&A Monitor - India 2014
Technology M&A Monitor - India 2014
 

Intro to Machine Learning by Microsoft Ventures

  • 1. Introduction To Machine Learning Chun Ming Chin Microsoft Ventures - @MSFTVentures Chun Ming Chin - @chinchunming (Machine Learning Workshop by Microsoft Ventures)
  • 2. Objectives • Understand why machine learning is important • Learn how to apply machine learning in your use case • Adopt best practices in machine learning
  • 3. Overview • Why Machine Learning • What is Machine Learning • Frame common tasks as machine learning problems • Example 1: Mobile Optical Character Recognition on Asian text • Example 2: Predict Housing Rental Prices • Accuracy Issues (i.e. Generalization) • Solutions to Generalization • Putting it all together: Machine learning in stock trading • Machine Learning Best Practices
  • 4. Why Machine Learning? 1. Increase barrier to entry when product/service quality is dependent on data Product/ Service Users Data Increase quality/quantity
  • 5. Why Machine Learning? 2. Automate human operations to increase productivity and lower cost • Example: Auto identify and ban bots that sign up on your website • Use Case: Consider rules based approach first. When tasks cannot be completed with specific rules, then use ML. _______@______.com
  • 6. Why Machine Learning? 3. Customize product/service to increase engagement and profits • Examples: Customize sales page to increase conversion rates for online information product A B C D E F A B C D E F
  • 7. Machine Learning Example Chinese Traditional (Sophisticated and big) Japanese (Squiggly and cute) か き け こ さ し す せ 婆 魔 佛 特 級 氣 喜 歡 Features in 2D space Chinese Japanese No. of black pixels in image No. of straight lines in image Goal: Get computer to classify input image as Chinese or Japanese. • Features: Characteristics of the image/ measurements from data e.g. No. of black pixels/ orientation of strokes on images
  • 8. Label 2 Label 1 ML Terminology • Data point = Sample = Example • Labels/ Classes/ Categories: • Discrete (e.g. Optical Character Recognition) • Continuous (e.g. Housing prices) • Classification/decision boundary: • Separates regions of feature space • Hopefully helps separate different classes Features in 2D space Index i of data point Feature dimension j 𝑉𝑉 = Label Data Point Decision boundary
  • 9. What is Machine Learning? Unsupervised learning • Algos that operate on unlabelled examples • Discover structure/ patterns in the data. Supervised Learning • Algos trained on labelled examples • Predict an output for previously unseen inputs.
  • 10. Supervised Machine Learning (Classification) Measurements (features) & associated class labels Training Data Set Training stage (Usually offline) Training algorithm f(x) Structure + Parameters Learned Model Input Test Data Point Measurements (features) only f(x) Predicted Class Label Testing stage (Run time, online)
  • 11. Mobile Optical Character Recognition of Asian Text Input test images Image classification Me Competition Expense Middle Classifier f(x) Image Measurements: Orientation of strokes in image/ spatial position of pixels Use Case: Scale up a product concept with trade off on accuracy.
  • 12. Supervised Machine Learning (Regression) Measurements (features) & associated continuous labels Training Data Set Training stage Training algorithm f(x) Structure + Parameters Learned Model Input Test Data Point Measurements (features) only f(x) Testing stage Continuous value Output
  • 13. Example: Predict rental prices based on house area (Sq ft) Training stage 1. Raw Input Data: 2. Use training algorithm from Python’s ML library. 3. Get resulting ML model f(x) Use Case: Make predictions based on historical data
  • 14. Case Study: Predict rental prices based on house area (Sq ft) Rental Price ($) Feature: Area Regressor f(x) Cheap Expensive Testing stage 1800 Sqft
  • 15. Optimization with Objective Function (Plot: Rental Price ($) against Area (Square Feet), with the fitted model shown at Iteration 1, Iteration 2, Iteration 3 and Iteration 4)
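The iterations on the slide above can be sketched as gradient descent on a mean-squared-error objective. This is a minimal illustration only: the data points, the learning rate, the iteration count and the names `fit_line` and `mean_squared_error` are all assumptions, and the deck does not say which training algorithm or objective the workshop actually used.

```python
# Hypothetical training data: (area in sq ft, monthly rent in $).
AREAS = [600.0, 850.0, 1100.0, 1400.0, 1800.0, 2200.0]
RENTS = [1150.0, 1500.0, 1900.0, 2300.0, 2900.0, 3500.0]


def mean_squared_error(w, b, xs, ys):
    """Objective function: average squared gap between prediction and label."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.1, iterations=2000):
    """Fit y ~ slope * x + intercept by batch gradient descent."""
    mean_x = sum(xs) / len(xs)
    std_x = (sum((x - mean_x) ** 2 for x in xs) / len(xs)) ** 0.5
    # Standardize the inputs so a single learning rate works for w and b.
    zs = [(x - mean_x) / std_x for x in xs]

    w, b = 0.0, 0.0
    history = []  # objective value before each update
    n = len(zs)
    for _ in range(iterations):
        history.append(mean_squared_error(w, b, zs, ys))
        grad_w = sum(2 * (w * z + b - y) * z for z, y in zip(zs, ys)) / n
        grad_b = sum(2 * (w * z + b - y) for z, y in zip(zs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    # Convert back to the original units: rent = slope * area + intercept.
    slope = w / std_x
    intercept = b - slope * mean_x
    return slope, intercept, history


slope, intercept, history = fit_line(AREAS, RENTS)
# Prediction for the 1800 sq ft test point used in the case study slide.
predicted_rent = slope * 1800 + intercept
```

Each entry in `history` corresponds to one iteration of the plot: the line starts far from the data and the objective shrinks as the parameters are updated.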
  • 16. Generalization Issues (Plot: Rental Price ($) against Area (Square Feet)) Legend Test data point Train data point
  • 17. Generalization Issues Under fitting: • Number of features used is too small • There are patterns in the data that the algorithm is unable to fit Over fitting: • Number of features used is too large • Fitting spurious patterns in the training data set rather than capturing true underlying trends
  • 18. Under fitting in Regression Over fitting in Regression Under/Over Fitting in House Rental Prices Prediction
  • 19. Under fitting in Classification Over fitting in Classification Under/Over Fitting in Optical Character Recognition Outlier
  • 20. Generalization Over fit Under fit Test Error Best generalization No. of iterations Error Training Error
  • 21. Fixes for ML algorithms Solutions to accuracy issues, prioritized in descending order of sensitivity to classification error: 1. Training data improvement • Get more training examples (Fixes over fitting) • Ensure training data is high quality (De-noise training data) 2. Modify objective function 3. Feature engineering • Increase/reduce number of features (Fixes under/over fitting) • Change features used (Fixes under fitting) 4. Optimization algorithm • Change the ML model used (SVMs, Decision trees, neural network, etc.) • Run optimization algo for more iterations to ensure it converges
  • 22. Solution: Increase amount of training data Test Error Best generalization No. of iterations Error Training Error Test Error Best generalization No. of iterations Error Training Error Before After
  • 23. Feature Engineering: Increase number of features • Combine features: x₃ = x₁ × x₂ • Convert continuous features into categorical features (i.e. Bucketize feature values) • Create a new feature as an indicator for missing values in another feature and supply a default value to the missing feature value. • To address non-linearly separable data, use non-linear features (e.g. For original feature x, add derived feature x². But not advisable beyond degree 2).
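The four techniques on the slide above can be sketched in a few lines. The function name `engineer_features`, the bucket edges and the default value are hypothetical choices made for illustration; they are not taken from the deck.

```python
def engineer_features(x1, x2, bucket_edges=(500.0, 1000.0, 2000.0), default=0.0):
    """Return derived features for one example; x2 may be None (missing)."""
    # Indicator feature for a missing value, plus a default for the gap.
    x2_missing = 1.0 if x2 is None else 0.0
    x2_filled = default if x2 is None else x2

    # Bucketize x1: the bucket index is the number of edges at or below x1.
    bucket = sum(1 for edge in bucket_edges if x1 >= edge)
    one_hot = [1.0 if i == bucket else 0.0 for i in range(len(bucket_edges) + 1)]

    return {
        "x1": x1,
        "x2": x2_filled,
        "x2_missing": x2_missing,
        "x1_times_x2": x1 * x2_filled,  # combined feature x3 = x1 * x2
        "x1_squared": x1 ** 2,          # non-linear feature, degree 2 only
        "x1_bucket": one_hot,           # continuous value as a category
    }


example = engineer_features(1200.0, None)
```

Here `example["x2_missing"]` flags the absent value, so a model can learn a different behavior for examples where the default was substituted.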
  • 24. Feature Engineering: Reduce number of features Reduce no. of features and identify most important features • Data is more compact and dense • Can train and classify faster • Can improve accuracy
  • 25. Filter out stroke orientation information along 0, 45, 90 and 135 degrees. Extract Feature Feature vector dimension: 5 (No. of sub blocks per row) x 5 (No. of sub blocks per column) x 2 (Calculate avg & var per sub block) x 4 (No. of orientations) = 200 Feature Engineering Example: Asian Optical Character Recognition • Use domain knowledge to choose features that distinguish different classes well • Read academic papers to understand prior work in the field
  • 26. Solution: Try different ML models (i.e. Optimization algorithms) Decision Factors • Ease of training/ testing • Ease of debugging • Model size (Memory constraints) • Accuracy/ generalization potential • Data characteristics (e.g. For non-linearly separable data, use non-linear models)
  • 27. Support Vector Machines (SVM) Why: • Linear and non-linear classification • Best for binary (i.e. 2 class) classification though can be used for multi-class scenario • Easy to train • Guaranteed global optimum • Scales well to high dimensional data What: • Find decision boundary that maximizes margin between classes. Boundary only determined by nearby data points Support Vectors
  • 28. Support Vector Machines (SVM) Choosing correct kernel function for non linear SVM (Original video from Udi Aharoni here: https://www.youtube.com/watch?v=3liCbRZPrZA) 1. Find non linear boundary that separates blue from red data points in 2D space 2. Map data points into 3D space using polynomial kernel. 3. Linearly separate data points in 3D space using a plane. 4. Map back to 2D space to determine non linear boundary.
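The four steps on the slide above can be illustrated with an explicit degree-2 polynomial feature map. This is a sketch under assumptions: the ring-shaped data, the threshold of 2.5 and the homogeneous kernel (a·b)² are chosen for illustration and may differ from the kernel shown in the referenced video.

```python
import math


def feature_map(p):
    """Map a 2D point into 3D: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel, computed in the 2D space."""
    return (a[0] * b[0] + a[1] * b[1]) ** 2


def ring(radius, n=8):
    """Hypothetical data: n points evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


blue = ring(1.0)  # inner ring, not linearly separable from the outer ring in 2D
red = ring(2.0)   # outer ring


def classify(p, threshold=2.5):
    """Separate with the plane z1 + z3 = threshold in the 3D space."""
    z1, _, z3 = feature_map(p)
    return "red" if z1 + z3 > threshold else "blue"
```

In 3D the plane z1 + z3 = 2.5 separates the rings; mapped back to 2D it is the circle x1² + x2² = 2.5, a non-linear boundary. The dot product of two mapped points equals `poly_kernel` of the original points, which is why an SVM can work with the kernel alone and never build the 3D vectors.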
  • 29. Decision Trees Why: • Non linear classification & regression • Pros: • Easy to understand and debug • Finds most important features in data • Requires little data preparation • Cons: • Memory concerns limit accuracy of decision trees. Deeper trees, higher test accuracy. But trees grow exponentially with depth What: • Partition feature space into smaller pieces • Learn tree structure and split functions Root Node C = 11 J = 1 x₁ > 140 x₁ < 140 x₂ > 140 x₂ < 140 C = 3 J = 13 C = 2 J = 1 C = 1 J = 12 Split function
  • 30. Decision Forests Why: • Solves instability in decision trees - Small variations in data can generate a different tree • Improves memory-accuracy tradeoff as trees can be parallelized. What: • A collection (i.e. Ensemble) of trees • Aggregate predictions across all trees.
  • 31. (Deep) Neural Networks Why: • For non linear classification & regression • Pros: • Fast in testing stage • Robust to noise • Cons: • Slow in training stage • Only guarantees local optima What • Sequence of non-linear combinations of extremely simple computations with high connectivity Input layer Output layer Hidden (Deep) Layers
  • 32. Model selection: Use Cross Validation Index i of data point Feature Dimension j Label Validation Set Training Set Held Out Cross Validation 1. Randomly split all data into 2 subsets: • Training set (70%) • Validation set (30%) 2. Train machine learning model on training set. 3. Pick model with lowest error on validation set. K Fold Cross Validation 1. Divide data into K pieces 2. Train on K – 1 pieces 3. Validate on remaining piece 4. Average over K results to get generalization error of model
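The K-fold procedure on the slide above can be sketched as follows. The data set, the two candidate models and every function name here are hypothetical; the sketch only illustrates the four listed steps, using mean squared error as the validation metric.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Step 1: shuffle the indices 0..n-1 and divide them into k pieces."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean training label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_least_squares(xs, ys):
    """Closed-form least-squares line through the training points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Steps 2-4: train on k-1 pieces, validate on the rest, average."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        mse = sum((model(xs[i]) - ys[i]) ** 2 for i in held_out) / len(held_out)
        errors.append(mse)
    return sum(errors) / len(errors)


# Hypothetical data: rent grows roughly linearly with area, plus small noise.
xs = [500.0 + 150.0 * i for i in range(10)]
ys = [300.0 + 1.5 * x + (40.0 if i % 2 else -40.0) for i, x in enumerate(xs)]

cv_mean = cross_validate(fit_mean, xs, ys)
cv_line = cross_validate(fit_least_squares, xs, ys)
```

Comparing `cv_line` with `cv_mean` is the model-selection step: the model with the lower averaged validation error is the one expected to generalize better on this data.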
  • 33. Asian Optical Character Recognition Validation Results (Bar chart; classifiers on the horizontal axis: Nearest Neighbor, Linear SVM, Sigmoid SVM, Polynomial SVM, RBF SVM, Intersection SVM; two series: With reduced feature dimension, Without reduced feature dimension; accuracy axis from 0% to 100%. Bar values as extracted: 91.66%, 93.10%, 21.40%, 89.00%, 89.00%, 92.70%; 75.00%, 87.04%, 0.02%, 35.96%, 35.96%, 85.92%)
  • 34. Putting it all together: Machine Learning in Stock Trading Use case: Machines see patterns in big data faster and perform tasks faster than humans.
  • 35. Case Study: Machine Learning in Stock Trading • Goal: Create profitable trading strategy • Data: Use company (e.g. United Airlines) stock data from Wharton Research Data Services (WRDS) https://wrdsweb.wharton.upenn.edu/wrds/ • Implementation: Predict closing price on next time step based on information from current time step *Disclaimer: This may not make money
  • 36. Case Study: Machine Learning in Stock Trading Training stage Training algorithm f(x) Structure + Parameters Learned Model
  • 37. Case Study: Machine Learning in Stock Trading Buy Trade price at next time step > Trade price of the current time step (i.e. positive returns). Label = +1. Hold Trade price at next time step = Trade price of current time step. Label = 0. Sell Trade price at next time step < Trade price of the current time step (i.e. negative returns). Label = -1. Measurements: Average trade price per sec, standard deviation of trade price per sec Trade Price Testing stage Classifier f(x)
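The labeling rule and the two measurements on the slide above can be sketched as follows. The function names and the sample trades are hypothetical, and the population standard deviation is an assumption, since the deck does not say which estimator was used.

```python
from statistics import mean, pstdev


def label_next_step(prices):
    """Label each time step by comparing it with the next one:
    +1 (buy) if the price rises, -1 (sell) if it falls, 0 (hold) if unchanged."""
    labels = []
    for now, nxt in zip(prices, prices[1:]):
        if nxt > now:
            labels.append(1)
        elif nxt < now:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def per_second_features(trades):
    """Group (second, price) trades by second and return
    {second: (average price, standard deviation of price)}."""
    by_second = {}
    for second, price in trades:
        by_second.setdefault(second, []).append(price)
    return {s: (mean(p), pstdev(p)) for s, p in sorted(by_second.items())}


# Hypothetical trades: (seconds since market open, trade price in $).
trades = [(0, 10.0), (0, 12.0), (1, 11.0), (2, 11.5), (2, 11.5)]
features = per_second_features(trades)
labels = label_next_step([avg for avg, _ in features.values()])
```

The last time step has no label because there is no next price to compare with, so `labels` is one element shorter than `features`.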
  • 38. Training Data Improvements o Terminology • Trade price: Price at which shares last traded hands • Bid price: Price a buyer is willing to pay • Bid size: No. of shares available at bid price • Offer price: Price a seller is looking to get • Offer size: No. of shares available at offer price o Intuition… trade price next second should depend on the bid-offer information now. o Add bid and offer price data (That is within 5% of trade price) to training data
  • 39. Feature Engineering Time = 9:30:03 am Time = 9:30:04 am Time = 9:30:05 am United Airlines Inc. (Ticker Symbol UAL) Date: 2011 Dec 01 • Extract measurements from the distribution of the bid-offer curve at each second window. Bid Offer Bid Offer Bid Offer
  • 40. Machine Learning Best Practices Combine human intuition/ wisdom with machine speed/ pattern recognition. • Use domain knowledge to choose features that distinguish different classes well • Add specific rules to process input data before training stage/ output data after test stage. (In contrast with generalized rules from machine learning) Reduces training time when data is pre processed instead of letting ML model learn the patterns eventually.
  • 41. Useful Machine Learning Tools • Python installer for Windows with all necessary ML libraries (e.g., SciPy, NumPy, etc.) http://winpython.sourceforge.net/ • http://www.r-project.org/ R project for statistical computing • http://prediction.io/ Create predictive features, such as personalization, recommendation and content discovery • http://www.tableausoftware.com/ Enables you to visually analyze your data
  • 42. Unsupervised Learning Application Scenarios (e.g. MinHash Clustering, Matrix Factorization, Dimensionality Reduction) 1. Simplify your data so as to provide insights for 3rd party businesses. 2. Interpret data to test assumptions about your user’s behavior/ market 3. Cluster your customers into different groups with different needs so as to increase monetization. 4. Make more informed business decisions for your startup/ customers.
  • 43. More Best Machine Learning Practices • For debugging purposes, simple models with clear explanations are better than complicated models with unclear explanations. • To address non-linearly separable data: Use non linear features/ classifiers. • Don’t confuse cause and effect with noise from your data. Consider using certain statistical tests (e.g. McNemar’s test) to check whether your result is statistically significant. • If your training data is too small, it can cause an over fitting problem. • Modularize the machine learning model so that new people can understand it and experiment with it easily. • Have a common baseline when comparing improvements to your machine learned model. Common baselines enable you to share resources when comparing between different techniques. • Modularize your code such that it is easy for new people to experiment with different techniques (Parameter sweeping etc.) quickly. Define inputs/ outputs/ training parameters clearly. • Compare to natural baselines. Guess global average for items ratings. Suggest globally popular items. • You can use UI/ UX to your advantage to find hacks around the balance issue of computational speed (Latency) and memory capacity. • Incrementally update your ratings using Stochastic Gradient descent. (i.e. As I get new observations, I’ll update for that user and item only). An alternative is weekly batch retraining. • The more expressive your model, the less expressive your features need to be. The less expressive your model, the more expressive your features need to be.
  • 44. More Best Machine Learning Practices • Think about scaling early: 1. Sample a subset of ratings for each user so that you can handle the matrix in memory. 2. Use MinHash to cluster users [DDGR07] 3. Distribute calculations with Map Reduce 4. Distribute matrix operations with Map Reduce [GHNS11] 5. Parallelize stochastic gradient descent [ZWSL10] 6. Expectation-maximization for pLSI with MapReduce [DDGR07] Note: Niche vs general – tf-idf. Both of us watching a niche movie should mean more than if I watch a popular movie. For practical constraints, people use item based similarity very often.
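The advice in the last two slides to compare against a global-average baseline and to update incrementally with stochastic gradient descent can be sketched with a small matrix-factorization model. The deck does not specify the rating model, so the factorization itself, the hyperparameters, the user and item names and the function names are all assumptions made for illustration.

```python
import random


def global_average(ratings):
    """Natural baseline: predict the same average rating for everything."""
    return sum(r for _, _, r in ratings) / len(ratings)


def init_factors(ids, rank, rng):
    """Small random latent vectors, one per user or item id."""
    return {i: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for i in ids}


def predict(user_vec, item_vec):
    """Predicted rating: dot product of the two latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def sgd_update(users, items, user_id, item_id, rating,
               learning_rate=0.05, reg=0.02):
    """One incremental step for a new observation. Only the vectors of this
    user and this item change; every other user and item is left untouched."""
    u, v = users[user_id], items[item_id]
    error = rating - predict(u, v)
    for f in range(len(u)):
        u_f, v_f = u[f], v[f]
        u[f] += learning_rate * (error * v_f - reg * u_f)
        v[f] += learning_rate * (error * u_f - reg * v_f)
    return error


rng = random.Random(0)
users = init_factors(["ann", "bob"], 2, rng)
items = init_factors(["movie_1", "movie_2"], 2, rng)

# Hypothetical stream of (user, item, rating) observations.
ratings = [("ann", "movie_1", 5.0), ("bob", "movie_2", 2.0)]
baseline = global_average(ratings)

for _ in range(500):
    for user_id, item_id, rating in ratings:
        sgd_update(users, items, user_id, item_id, rating)
```

Because each update touches one user vector and one item vector, a new observation can be folded in as it arrives; the alternative mentioned in the slides, weekly batch retraining, would instead refit all vectors from the full rating history.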