Machine Learning in Science and Industry
Day 4, lectures 7 & 8
Alex Rogozhnikov, Tatiana Likhomanenko
Heidelberg, GradDays 2017
Recapitulation: linear models
work with many features
bag-of-words for texts / images
SVM + kernel trick or add new features to introduce interaction between features
regularizations to make it stable
Recapitulation: factorization models
recommender systems
R̂_{u,i} = ⟨U_{u,·}, V_{·,i}⟩
∑_{known R_{u,i}} [R̂_{u,i} − R_{u,i}]² → min
learning representations
learning through the prism of interaction between entities
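As an illustration (ours, not the slides'), plain SGD on this objective in numpy, with toy dimensions and made-up known ratings:

import numpy as np

# toy sizes and data, purely illustrative
n_users, n_items, n_factors = 100, 50, 8
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, n_factors))
V = 0.1 * rng.standard_normal((n_factors, n_items))
known = [(0, 3, 5.0), (2, 7, 3.0), (1, 3, 4.0)]   # (user, item, rating)

lr = 0.01
for epoch in range(200):
    for u, i, r in known:
        err = U[u] @ V[:, i] - r                  # R̂_{u,i} - R_{u,i}
        # simultaneous gradient step on both factors
        U[u], V[:, i] = U[u] - lr * err * V[:, i], V[:, i] - lr * err * U[u]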
Recapitulation: dimensionality reduction
compressing data of large dimensionality
output is of small dimensionality and can be used efficiently by ML techniques
expected to keep most of the information in an accessible form
Multilayer Perceptron
feed-forward
function which consists of sequentially applying
linear operations
nonlinear operations (activations)
appropriate loss function + optimization
in applications — one or two hidden layers
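A minimal sketch of a one-hidden-layer perceptron with scikit-learn, on a synthetic dataset (sizes are placeholders):

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,),   # one hidden layer
                    activation='relu',          # nonlinear activation
                    solver='adam',              # adaptive SGD variant
                    max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))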
Tabular data approach
In rare decay analyses:
momenta, energies, IPs, particle identification, isolation variables, flight distances, angles ...
→ machine learning → signal or background
In a search engine:
how many words from the query appear in the document (url, title, body, links...), TF-IDFs, BM25, number of links to / from a page, geodistance, how frequently a link was clicked before, estimates of fraud / search spam, ...
→ machine learning → real-valued estimation of relevance
This approach works well enough in many areas.
Features can be computed manually or themselves be estimated by ML (stacking).
When didn't it work (well enough)?
feature engineering → ML is a widespread traditional pipeline
which features help to identify a drum on the picture? a car? an animal?
in sound processing: which features are good to detect a particular phoneme?
Image recognition today
can classify objects on images with very
high accuracy
even if those are images of 500 x 500 and
we have only 400 output neurons, that's
500 ∗ 500 ∗ 400 = 100 000 000
parameters
and that's only a linear model!
1. can we reduce the number of parameters?
2. is there something we're missing about
image data when treating picture as a
vector?
Convolutions
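As a minimal numpy sketch (ours, not the slides'), a single-channel 'valid' convolution slides one small kernel over the image, so the same few parameters are reused at every position:

import numpy as np

def conv2d(image, kernel):
    # 'valid' convolution: output shrinks by kernel size minus one
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

edge_kernel = np.array([[1., -1.]])     # responds to horizontal contrast
print(conv2d(np.eye(4), edge_kernel))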
Pooling operation
Another important element for image processing is pooling.
Pooling reduces the size of the image representation while keeping the things that are important for further analysis.
In particular, max-pooling is very popular.
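A minimal numpy sketch of 2×2 max-pooling with stride 2 (assuming even height and width):

import numpy as np

def max_pool_2x2(image):
    # keep only the maximum of every non-overlapping 2x2 block
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

print(max_pool_2x2(np.arange(16.).reshape(4, 4)))   # 4x4 -> 2x2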
Convolutional neural network
CNN architecture components:
convolutions
pooling (reducing the size)
flattening
simple MLP
LeNet-1, based on CNNs, was developed in 1993.
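A sketch of such a pipeline in Keras; the layer sizes are illustrative, not LeNet's exact configuration:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation='relu'),  # convolutions
    layers.MaxPooling2D(pool_size=2),                     # pooling
    layers.Conv2D(32, kernel_size=3, activation='relu'),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # flattening
    layers.Dense(64, activation='relu'),                  # simple MLP
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')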
Some idea behind
Detecting particular parts of a face is much easier.
Problems with deep architectures
In deep architectures, the gradient varies between parameters drastically (a 10³–10⁶ times difference, and this is not the limit).
This problem is usually referred to as 'gradient diminishing' (vanishing gradients).
Adaptive versions of SGD are used today.
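Adam is one such adaptive method; a numpy sketch of a single Adam update (standard formulation with its default constants):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # per-parameter step sizes compensate for gradients that differ
    # by orders of magnitude between parameters
    m = b1 * m + (1 - b1) * grad          # moving average of the gradient
    v = b2 * v + (1 - b2) * grad ** 2     # moving average of its square
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v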
Deep learning (as it was initially thought of)
Deep Belief Network (Hinton et al, 2006) was one of the first ways to
train deep structures.
trained in an unsupervised manner layer-by-layer
next layer should "find useful dependencies" in the previous layer's output
finally, classification is done using the output of the last layer
(optionally, minor fine-tuning of the network is also done)
Why is this important?
we can recognize objects and learn patterns without supervision
gradient backpropagation in the brain??
Practical? No, today deep learning relies solely on stochastic gradient optimization
Deep learning today
automatic inference of helpful intermediate representations ...
with deep architecture
mostly, trained jointly with gradient-based methods
Inference of high-level information
Ability of neural networks to extract
high-level features can be demonstrated
with artistic style transfer.
Image is smoothly modified to preserve
the activations on some convolutional
layer.
Sketch to picture (texturing)
AlphaGo
too many combinations to check
no reliable way to assess the quality of position
AlphaGo has 2 CNNs:
value network predicts probability to win
policy network predicts the next step of a player
both are given the board (with additional features
for each position on the board)
Applications of CNNs
Convolutional neural networks are top-performing on image and image-like data
image recognition and annotation
object detection
image segmentation
segmentation of neuronal membranes
Pitfalls of applying machine
learning
CNNs were applied to jet substructure analyses and showed promising results on simulated data, but a simple study shows that a model applied to the output of a different simulation generator has up to twice the error (FPR)
thus, learnt dependencies are simulation-specific
and may turn out to be useless
machine learning uses all dependencies it finds
useful, those may have no relation to real physics
there are approaches to domain adaptation,
but their applicability is disputable
Autoencoding
compressed data is treated as a result of an algorithm
trained representations can be made stable against different noise
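A minimal dense autoencoder sketch in Keras (layer sizes are placeholders); the bottleneck layer is the compressed representation:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
code = layers.Dense(32, activation='relu')(inputs)       # encoder -> bottleneck
outputs = layers.Dense(784, activation='sigmoid')(code)  # decoder
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# denoising variant: fitting on (noisy input, clean target) pairs
# makes the learned representation stable against that noise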
Autoencoding
Comparison of PCA (left) and autoencoder with 2 neurons in the middle layer (right)
You can also play with it
Variational autoencoder
(an example to play with the variables of the latent space)
An example with fonts
an autoencoder trained to reconstruct symbols from different fonts can generate new fonts when walking along the latent space
Gaze correction (demo)
shift correction + light correction applied
training data was collected specially for this task
n-minute break
Recapitulation
analysis of tabular data
convolutional neural networks
and their applications to images and image-like data
many parameters
specific optimization methods
... the amount of data played a crucial role
autoencoding
Sequences and Recurrent NN
sound, text
any information that is time dependent (daily revenues, stock prices...)
Inspiration: the brain keeps information about the past and is able to process information that arrives as a sequence.
LSTM: a popular recurrent unit
Those are all tensor operations, see post for details
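The standard LSTM cell equations behind those operations (σ is the logistic sigmoid, ⊙ is elementwise multiplication):

f_t = σ(W_f [h_{t−1}, x_t] + b_f)          (forget gate)
i_t = σ(W_i [h_{t−1}, x_t] + b_i)          (input gate)
C̃_t = tanh(W_C [h_{t−1}, x_t] + b_C)       (candidate cell state)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t            (cell state update)
o_t = σ(W_o [h_{t−1}, x_t] + b_o)          (output gate)
h_t = o_t ⊙ tanh(C_t)                      (hidden state / output)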
Character-level text generation
Given a text, the network predicts the next character. This network is then used to generate text character-by-character. More examples
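A sketch of the generation loop; predict_proba and ALPHABET are hypothetical stand-ins for the trained network and its character set:

import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz .,\n"     # placeholder character set

def generate(predict_proba, seed, length=200, temperature=1.0):
    rng = np.random.default_rng(0)
    text = seed
    for _ in range(length):
        p = predict_proba(text)                  # distribution over ALPHABET
        logits = np.log(p + 1e-12) / temperature # temperature reshapes it
        p = np.exp(logits) / np.exp(logits).sum()
        text += ALPHABET[rng.choice(len(p), p=p)]  # sample the next character
    return text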
Code generation
Network trained on the Linux source:
/*
* Increment the size file of the new incorrect UI_FILTER group information
* of the size generatively.
*/
static int indicate_policy(void)
{
int error;
if (fd == MARN_EPT) {
/*
* The kernel blank will coeld it to userspace.
*/
if (ss->segment < mem_total)
unblock_graph_and_set_blocked();
else
ret = 1;
goto bail;
}
segaddr = in_SB(in.addr);
selector = seg / 16;
setup_works = true;
for (i = 0; i < blocks; i++) {
seq = buf[i++];
bpf = bd->bd.next + i * search;
if (fd) {
current = blocked;
}
}
rw->name = "Getjbbregs";
bprm_self_clearl(&iv->version);
regs->new = blocks[(BPF_STATS << info->historidac)] | PFMR_CLOBATHINC_SECONDS << 12;
return segtable;
}
Style in RNNs (demo)
The position of the pen at the next moment is predicted.
Different strategies for prediction
one can predict a single observation from a sequence
or map a sequence to a sequence
Applications of Recurrent NNs
natural language processing
translation
speech recognition
sound record is a sequence!
and speech synthesis
demand forecasting in retail systems / stock prices forecasting
Mixing RNN and CNN
Sometimes you need both, to analyze dependencies in space and in time. For example, in weather forecasting a sequence of satellite photos can be used to predict snow/rain.
Why look at a sequence of images?
Animation from CIMSS Satellite Blog
(Word) Embeddings
I would like a  ?  of coffee.
Skip-gram model takes a context around the word:
1 word to the left and one to the right:
a  ?  of
2 words to the left and right
like a  ?  of coffee
When a neural network is trained to predict the missing word, at the first stage each word is associated with a (trainable) vector — its representation.
If the model can predict the word well, it has already learnt something very useful about the language.
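A sketch of training such embeddings with gensim on a toy corpus (sg=1 selects the skip-gram model; window=2 matches the '2 words to the left and right' example above):

from gensim.models import Word2Vec

sentences = [["i", "would", "like", "a", "cup", "of", "coffee"],
             ["a", "cup", "of", "tea", "please"]]
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)
vector = model.wv["cup"]                # the trained representation of a word
print(model.wv.most_similar("cup"))     # neighbours in embedding space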
Embeddings
Capitals–countries example from Word2Vec; comparative–superlative example from GloVe
Similar (synonymic) words have very close representations.
Embeddings are frequently used to build representations, e.g. of a text, to extract information that can be used by various ML techniques.
RNN + Embeddings
A problem of pure factorization is the cold start:
until there is some feedback about a song, you can't infer its representation, and thus can't recommend it.
To find similar tracks using the track itself:
RNN is trained to convert sound to embedding
optimization goal: embeddings of known similar
tracks should be closer
This makes it possible to recommend new tracks
that have no feedback.
Joint Embeddings
Embeddings can be built for images,
sounds, words, phrases...
We can use one space to fit them!
In this example, embeddings of images and words are shown.
'cat' and 'truck' categories were not
provided in the training.
How can we generate new images? (image source)
Generative adversarial networks
Discriminator and Generator are working together
Generator converts noise to image
Discriminator distinguishes real images from
generated
Log-loss is used; D(x) is the probability that an image is fake:
L = E_{x∼real} log(1 − D(x)) + E_{noise} log D(G(noise))
But the optimization is minimax:
L → min_G max_D
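A schematic training step in PyTorch, keeping the slide's convention that D(x) is the probability of being fake; the networks and sizes are placeholders:

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
eps = 1e-8                                       # numerical safety in log

def gan_step(real):                              # real: batch of flat images
    noise = torch.randn(real.size(0), 16)
    fake = G(noise)
    # discriminator ascends L (here: descends -L)
    loss_d = -(torch.log(1 - D(real) + eps)
               + torch.log(D(fake.detach()) + eps)).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator descends L: wants D to call its samples real
    loss_g = torch.log(D(G(noise)) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()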
Detecting cosmic rays
cosmic rays coming from space have huge
energies
we don't know their source
those can be detected
need to cover large areas with detectors
CRAYFIS idea
smartphone camera is a detector
one problem is to simulate the camera response to the particles well
Materials were taken from slides (in Russian) by Maxim Borisyak
Transforming adversarial networks
We can 'calibrate' the simulation with adversarial networks.
Input is not noise, but the output of a simulator.
(figure: real data images compared with simulator output after 1, 2 and 3 epochs of training)
Neural networks summary
NNs are very popular today and are a subject of intensive research
differentiable operations with tensors provide a very good basis for defining useful predictive models
the structure of a problem can be effectively used in the structure of the model
DL requires a lot of data and resources, which became available only recently
and also numerous quite simple tricks to train it better
NNs are awesome, but not guaranteed to work well on problems without specific structure
Finding optimal hyperparameters
Motivation
some algorithms have many hyperparameters
(regularizations, depth, learning rate, min_samples_leaf, ...)
not all the parameters can simply be guessed
checking all combinations takes too much time
How to optimize this process? Maybe automated tuning?
Finding optimal hyperparameters
randomly picking parameters is a partial solution
definitely much better than checking all combinations
given a target quality metric, we can optimize it
Problems
no gradient with respect to parameters
noisy results
function reconstruction is a problem
Practical note: before running grid optimization, make sure your metric is stable (e.g. by training/testing on different subsets).
Overfitting (= getting a too optimistic estimate of quality on a holdout) by using many attempts is a real issue; it is resolved by one more holdout.
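A sketch of random parameter picking with scikit-learn's RandomizedSearchCV; the estimator and parameter ranges are illustrative:

from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=2000, random_state=0)
search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "max_depth": randint(2, 8),
        "learning_rate": uniform(0.01, 0.3),
        "min_samples_leaf": randint(1, 50),
    },
    n_iter=30, cv=3, random_state=0)   # 30 random combinations, not a full grid
search.fit(X, y)
print(search.best_params_, search.best_score_)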
Optimal grid search
stochastic optimization (Metropolis–Hastings, annealing)
requires too many evaluations and uses only the last checked combination
regression techniques, reusing all known information
(ML to optimize ML!)
Optimal grid search using regression
General algorithm (point of grid = set of parameters):
1. evaluations at random points
2. build regression model based on known results
3. select the point with best expected quality according to trained model
4. evaluate quality at this point
5. Go to 2 (until complete happiness)
Question: can we use linear regression in this case?
Exploration vs. exploitation trade-off: should we explore poorly-covered regions, or try to refine what currently seems optimal?
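A sketch of the loop above, with a random forest as the surrogate regressor (our choice, not the slides') and a toy stand-in for the expensive quality function; note it is purely greedy, i.e. all exploitation and no exploration:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def evaluate(point):
    # toy stand-in for the expensive step: train a model, measure quality
    return -((point - 0.3) ** 2).sum()

rng = np.random.default_rng(0)
grid = rng.uniform(0, 1, size=(1000, 3))                    # candidate points
tried = list(rng.choice(len(grid), size=5, replace=False))  # 1. random points
scores = [evaluate(grid[i]) for i in tried]

for _ in range(20):
    surrogate = RandomForestRegressor(random_state=0)
    surrogate.fit(grid[tried], scores)                      # 2. regression model
    expected = surrogate.predict(grid)                      # 3. expected quality
    expected[tried] = -np.inf                               # skip known points
    best = int(np.argmax(expected))
    tried.append(best)
    scores.append(evaluate(grid[best]))                     # 4. evaluate, repeat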
Gaussian processes for regression
Some definitions: Y ∼ GP(m, K), where m and K are the mean and covariance functions m(x), K(x, x̃):
m(x) = E Y(x) represents our prior expectation of quality (may be taken to be zero)
K(x, x̃) = E Y(x) Y(x̃) represents the influence of known results on the expectation of values at new points
RBF kernel is used here too: K(x, x̃) = exp(−‖x − x̃‖² / h²)
Another popular choice: K(x, x̃) = exp(−‖x − x̃‖ / h)
We can model the posterior distribution of results at each point.
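A numpy sketch of the posterior with the RBF kernel above and zero prior mean; the small noise term regularizes the matrix inversion:

import numpy as np

def rbf(a, b, h=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # squared distances
    return np.exp(-d2 / h ** 2)

def gp_posterior(X, y, X_new, h=1.0, noise=1e-6):
    K = rbf(X, X, h) + noise * np.eye(len(X))
    K_s = rbf(X_new, X, h)
    mean = K_s @ np.linalg.solve(K, y)                     # posterior mean
    cov = rbf(X_new, X_new, h) - K_s @ np.linalg.solve(K, K_s.T)
    std = np.sqrt(np.clip(np.diag(cov), 0, None))          # posterior std
    return mean, std
# points with large posterior std are the poorly-explored ones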
Gaussian processes and active learning demo
Active learning with gaussian processes
Gaussian processes model posterior distribution at each point of the grid
thus, regions with well-estimated quality are known
and we are able to find regions which need exploration (those have large variance)
See also this demo.
Other applications of gaussian processes
Gaussian processes are good at black-box optimization when computations are costly:
Optimizing parameters of Monte-Carlo simulations in HEP
the goal is to decrease mismatch with real data
Optimizing the airfoil of an airplane wing
to avoid testing all configurations, gaussian processes are used
gaussian processes provide a surrogate model which combines two sources:
high-fidelity: wind tunnel tests
low-fidelity: simulation
low-fidelity, simulation
Active learning
optimization of hyperparameters is a major use case
can deal with optimization of noisy, non-continuous functions
in a broader sense, active learning deals with situations when the system can make requests for new information
production systems should adapt dynamically to new situations
new trends / new tracks / new pages
...this user didn't like classical music a year ago, but what if...
active learning is a way to include automated experimenting in machine learning
Instead of conclusion
Summary of the lectures
data arrives today in huge amounts
there is much interest in obtaining useful information from it
both in science and in industry
problems of interest become more and more complex
machine learning techniques become more sophisticated and more powerful
and influence our world