In this talk we will cover the basic algorithms and application areas of Machine Learning (ML), then walk through a practical example of building a system that classifies performance measurement results produced in Unity by the internal Performance Test Framework, in order to find performance regressions and unstable tests. We will also try to work out the criteria by which ML algorithms can be evaluated, and ways to debug them.
2. What dog are you?
.NET developer since 2007
Python developer since 2015
Toolsmith for Unity Technologies
Religious about good code,
software design, TDD, SOLID
Love to learn new stuff
Fun Microsoft booth at NDC Oslo 2016
3. In this talk
❏ Applications of machine learning and most common algorithms
❏ Using machine learning to classify performance test results in Unity, implemented in .NET
❏ How to debug machine learning algorithms
4. The definition of Machine Learning (ML)
Field of study that gives computers
the ability to learn without being
explicitly programmed - Arthur Samuel (1959)
A computer program is said to learn
from experience E with respect to some
class of tasks T and performance
measure P, if its performance at tasks in
T, as measured by P, improves with
experience E. - Tom Mitchell (1997)
9. Performance Tests - The problem we are solving
In Performance Tests we have:
● Around 120 runtime tests
● Around 500 native tests
● Which run nightly on 8 platforms:
iOS, Android, mac/win
editor/standalone, ps4, xbox
● Also about 25 editor tests for 2 platforms
In total, about 5000 tests producing historical data points (performance of the measured component in ms) nightly across a few major branches
10. Performance Tests - Classify into 1 of 4 categories
❏ Stable
❏ Unstable
❏ Progression
❏ Regression
200 inputs - Chronologically ordered set of samples from performance tests
4 outputs - Regression, progression, unstable, stable
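A sketch of how one training example could be shaped from this description (the helper name is hypothetical; class order follows the slide above):

```python
CLASSES = ["regression", "progression", "unstable", "stable"]

def make_example(timings, label):
    """200 chronologically ordered samples -> input vector; class name -> one-hot output."""
    assert len(timings) == 200
    x = [float(t) for t in timings]       # 200 inputs
    y = [0.0] * len(CLASSES)              # 4 outputs
    y[CLASSES.index(label)] = 1.0         # one-hot encoding of the category
    return x, y
```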
15. Classification problem and Decision boundary
Classify input data into one of two discrete classes (yes/no, 1/0, etc.)
Find the best “line” separating negative and positive examples (y = 1, y = 0)
20. How do we build and train a NN?
Structure:
● Define input layer (number of input nodes)
● Define output layer (number of output nodes)
● Define hidden layer (number of nodes and layers)
Training:
● Randomize the weights and apply them to the inputs (forward propagation)
● Adjust the weights guided by output error (back propagation)
Objective: minimize the error (cost) between the network’s outputs and the labeled training examples
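The structure and training steps above can be sketched as a minimal NumPy network (illustrative only; the talk's actual implementation uses C# with AForge.NET, and the sizes here are toy, not the real 200-input/4-output network):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy structure: 2 input nodes -> 3 hidden nodes -> 1 output node
W1 = rng.normal(0.0, 1.0, (3, 2)); b1 = np.zeros(3)
W2 = rng.normal(0.0, 1.0, (1, 3)); b2 = np.zeros(1)

# Logical OR as a tiny stand-in training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)

lr = 1.0
losses = []
for _ in range(5000):
    # Forward propagation: apply the weights layer by layer
    a1 = sigmoid(X @ W1.T + b1)
    a2 = sigmoid(a1 @ W2.T + b2)
    losses.append(float(((a2 - Y) ** 2).mean()))
    # Back propagation: adjust the weights guided by the output error
    d2 = (a2 - Y) * a2 * (1 - a2)
    d1 = (d2 @ W2) * a1 * (1 - a1)
    W2 -= lr * (d2.T @ a1); b2 -= lr * d2.sum(axis=0)
    W1 -= lr * (d1.T @ X);  b1 -= lr * d1.sum(axis=0)
```

The loop keeps repeating forward and back propagation until the recorded error is small enough, which is exactly the training procedure the slide describes.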
24. To assess the performance of the algorithm, split your data into 3 subsets
● Training set (about 60% of your data)
● Cross validation set (20%)
● Test set (20%)
Use test set to validate % of correct answers on unseen data
Use the cross validation (CV) set to fine-tune your algorithm, plotting errors as a function of your parameters for both the Training and CV sets
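The 60/20/20 split above is straightforward to sketch in code (function name is ours, not from the talk):

```python
import random

def split_dataset(examples, seed=42):
    """Shuffle, then split 60/20/20 into train / cross-validation / test sets."""
    data = list(examples)
    random.Random(seed).shuffle(data)     # shuffle so each subset is representative
    n = len(data)
    n_train = int(n * 0.6)
    n_cv = int(n * 0.2)
    train = data[:n_train]
    cv = data[n_train:n_train + n_cv]
    test = data[n_train + n_cv:]
    return train, cv, test
```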
25. Learning curves or ‘do we need more data?’
Smaller sample size usually means less error on the training data but more error on ‘unseen’ data
With more training data CV error should go down, but watch the gap between Jcv and Jtrain (less is better)
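A learning curve like this can be computed with a small helper (a sketch: the "model" here trivially predicts the training mean, just to show the shape of the computation; plug in any real fit/error pair):

```python
def learning_curve(train_xy, cv_xy, sizes):
    """For each training-set size m, compute (m, Jtrain, Jcv) using mean squared error."""
    points = []
    for m in sizes:
        subset = train_xy[:m]
        mean = sum(y for _, y in subset) / m                    # "fit" on first m examples
        j_train = sum((y - mean) ** 2 for _, y in subset) / m   # error on seen data
        j_cv = sum((y - mean) ** 2 for _, y in cv_xy) / len(cv_xy)  # error on unseen data
        points.append((m, j_train, j_cv))
    return points
```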
26. More complex models try to fit all training data but
tend to perform worse on ‘real’ data
27. Plot errors as you tweak parameters
As you increase d, both training error and cross validation error go down as we better fit our data. But at some point CV error starts to go up again, since we are overfitting our training data and failing to generalize to new unseen data
32. In order to successfully solve a machine learning problem
● Identify the task at hand and figure out a suitable algorithm
● Carefully select your training (and validation and testing) data
● Normalize your data
● Validate results
● Debug your model and diagnose problem instead of randomly tweaking
parameters
33. References
C# version developed based on AForge.NET
https://github.com/IgorKochetov/Machine-Learning-PerfTests-Classifying
http://www.aforgenet.com/framework/docs/
http://accord-framework.net/
Stanford University course on Machine Learning by prof. Andrew Ng
https://www.coursera.org/learn/machine-learning
Book by Tariq Rashid “Make Your Own Neural Network”
https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork
34. How to reach me
Twitter: @k04a
Linkedin: Igor Kochetov
Instead of programming some rules, we feed training data (learning examples) into an algorithm and assess the results
Web data (click-stream or click through data)
Mine to understand users better
Huge segment of Silicon Valley
Self customizing programs
Netflix
Amazon
iTunes genius
Take users info
Learn based on your behavior
Next - types of learning tasks
Unsupervised - unlabeled data. Given the data find patterns and structure in the data
Anomaly Detection (Fraud detection, Manufacturing, DataCenter monitoring)
Anomaly detection vs. supervised learning: very small number of positive examples
Content-based recommendation and collaborative filtering (if we have a set of features for a movie, from ratings you can learn a user's preferences, and vice versa: if you have your users' preferences you can determine a film's features)
More examples: cocktail party algorithm
More details on Recommender Systems: recommender systems typically produce a list of recommendations in one of two ways, through collaborative or content-based filtering, or the personality-based approach. Collaborative filtering approaches build a model from a user's past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. These approaches are often combined (see Hybrid Recommender Systems).
Each test run provides us with a decimal value as a result: milliseconds needed to complete. So we have historical data for every measured feature and want to know if it increases, decreases, stays the same or jumps all around.
Our problem could be modeled as a handwriting recognition one:
Every image is just an array of numbers, which we feed into an algorithm (i.e. input)
And the output is one of 10 digits
Which brings us back to our problem:
Brain
Does loads of crazy things
Hypothesis is that the brain has a single learning algorithm
Neuron:
Three things to notice
Cell body
Number of input wires (dendrites)
Output wire (axon)
Simple level
Neuron gets one or more inputs through dendrites
Does processing
Sends output down axon
A neuron is a logistic unit
That logistic computation is just like the logistic regression hypothesis calculation
X vector is our input (X0 is a constant, known as bias)
Ɵ vector is our parameters which may also be called the weights of a model (that’s what we want to learn)
This is the sigmoid function, or the logistic function
Crosses 0.5 at the origin, then flattens out; asymptotes at 0 and 1
Which gives us DECISION BOUNDARY
When using linear regression we did hθ(x) = θᵀx
For classification hypothesis representation we do hθ(x) = g(θᵀx)
Where we define g(z), z being a real number:
g(z) = 1 / (1 + e^(-z))
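These two definitions translate directly into code (a minimal sketch; function names are ours):

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); crosses 0.5 at the origin, asymptotes at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); by convention x[0] = 1, so theta[0] acts as the bias."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))
```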
It could be more than a line, actually
In order to achieve that we can apply a higher-order polynomial or use a NN
First layer is the input layer
Final layer is the output layer - produces value computed by a hypothesis
Middle layer(s) are called the hidden layers
ai(j) - activation of unit i in layer j
Ɵ(j) - matrix of parameters controlling the function mapping from layer j to layer j + 1
Every input/activation goes to every node in following layer
NN is a logistic regression at scale
Neural networks learn their own features!
Next - multiclass
Recognizing stable, unstable, regression or progression
Build a neural network with four output units
Output a vector of four numbers:
unit 1 is 0/1 for stable
unit 2 is 0/1 for unstable
unit 3 is 0/1 for regression
unit 4 is 0/1 for progression
Inputs = features
Outputs = number of classification categories
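Decoding the four-unit output vector back into a class label is then a matter of picking the unit that fired strongest (a sketch; class order follows the notes above):

```python
CLASSES = ["stable", "unstable", "regression", "progression"]

def decode(output_vector):
    """Map the network's four output activations to a class label (argmax)."""
    best = max(range(len(output_vector)), key=lambda i: output_vector[i])
    return CLASSES[best]
```

For example, an output of [0.1, 0.05, 0.9, 0.2] decodes to "regression".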
Flip back to explain forward and back propagation
We will use the AForge.NET library.
We have to prepare Inputs and Outputs, choose Activation function and Network Structure (number of nodes, layers)
And train the network until error is small enough
Having a single value to measure performance of the algorithm is really important
So the first step is to compare labeled inputs with algorithm outputs and calculate the percentage of correct results
Jtrain
Error on smaller sample sizes is smaller (as there is less variance to accommodate)
So as m grows, error grows
Jcv
Error on cross validation set
When you have a tiny training set you generalize badly
But as the training set grows your hypothesis generalizes better
So CV error will decrease as m increases
High bias
e.g. fitting a straight line to data
Jtrain
Training error is small at first and grows
Training error becomes close to cross validation
So the performance of the cross validation and training set end up being similar (but very poor)
Jcv
Straight line fit is similar for a few vs. a lot of data
So it doesn't generalize any better with lots of data because the function just doesn't fit the data
The problem with high bias is because cross validation and training error are both high
Also implies that if a learning algorithm has high bias, the cross validation error doesn't decrease as we get more examples
So if an algorithm is already suffering from high bias, more data does not help
High variance
e.g. high order polynomial
Jtrain
When set is small, training error is small too
As training set size increases, the value is still small
But slowly increases (in a near linear fashion)
Error is still low
Jcv
Error remains high, even when you have a moderate number of examples
Because the problem with high variance (overfitting) is your model doesn't generalize
An indicative diagnostic that you have high variance is that there's a big gap between training error and cross validation error
If a learning algorithm is suffering from high variance, more data is probably going to help
Applying higher order polynomial (or complex NN)
Precision
How often does our algorithm cause a false alarm?
Of all patients we predicted have cancer, what fraction of them actually have cancer
= true positives / # predicted positive
= true positives / (true positive + false positive)
High precision is good (i.e. closer to 1)
You want a big number, because you want false positives to be as close to 0 as possible
Recall
How sensitive is our algorithm?
Of all patients in set that actually have cancer, what fraction did we correctly detect
= true positives / # actual positives
= true positive / (true positive + false negative)
High recall is good (i.e. closer to 1)
You want a big number, because you want false negatives to be as close to 0 as possible
F1Score (fscore)
= 2 * (P * R) / (P + R)
Fscore is like taking the average of precision and recall giving a higher weight to the lower value
Many formulas for computing comparable precision/accuracy values
If P = 0 or R = 0 then Fscore = 0
If P = 1 and R = 1 then Fscore = 1
The remaining values lie between 0 and 1
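The three metrics above can be computed from confusion-matrix counts in a few lines (a sketch; function name is ours):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of real positives, how many we caught
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```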
Find the average value (mean) and subtract it, then divide by the range (or standard deviation)
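This mean normalization step looks like the following (a minimal sketch, using the standard deviation as the scale):

```python
def normalize(values):
    """Feature scaling: subtract the mean, then divide by the standard deviation."""
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    sd = sd or 1.0                       # guard against constant features (sd = 0)
    return [(v - m) / sd for v in values]
```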
Don’t be afraid to try, even small projects could be fun and useful