STYLE TRANSFER
Lars Lowe Sjösund
AI Research Engineer at Peltarion
OVERVIEW
1. Intro style transfer
2. Convolutional Neural Networks
3. Gatys - A Neural Algorithm of Artistic Style
4. Improvements
STYLE TRANSFER
[Figure: content image + style image = desired output]
Image courtesy: https://github.com/jcjohnson/neural-style
HOW DOES A CNN WORK?
Image courtesy: Matthieu Cord: Deep CNN and Weak Supervision Learning for visual recognition, https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
Convolution Layer
➤ Input: a 32x32x3 image (width 32, height 32, depth 3)
➤ Filter: 5x5x3. Filters always extend the full depth of the input volume
➤ Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”
➤ Each position gives 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
➤ Convolving (sliding) over all spatial locations gives a 28x28x1 activation map
➤ Consider a second (green) filter: it gives a second 28x28x1 activation map
➤ For example, if we had six 5x5 filters, we’ll get 6 separate activation maps. We stack these up to get a “new image” of size 28x28x6!
Slide credit: CS231n Lecture 7
Slide courtesy: Johnson, cs231n lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
Image courtesy: http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
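The convolution-layer arithmetic from the slides can be sketched in a few lines of NumPy. This is a minimal illustration rather than an efficient implementation: the function name `conv_layer`, the random inputs, and the zero biases are assumptions made for the example. It uses stride 1 and no padding, which is what turns a 32x32 input into a 28x28 map with a 5x5 filter.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """Stride-1, no-padding convolution of an H x W x C image with K filters."""
    H, W, C = image.shape
    K, fh, fw, fc = filters.shape
    # Filters always extend the full depth of the input volume.
    assert fc == C
    out_h, out_w = H - fh + 1, W - fw + 1
    out = np.zeros((out_h, out_w, K))
    for k in range(K):
        for i in range(out_h):
            for j in range(out_w):
                chunk = image[i:i + fh, j:j + fw, :]
                # One number per position: dot product of filter and chunk, plus bias.
                out[i, j, k] = np.sum(chunk * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))     # stand-in for a 32x32x3 image
filters = rng.standard_normal((6, 5, 5, 3))  # six 5x5x3 filters
biases = np.zeros(6)

activation_maps = conv_layer(image, filters, biases)
print(activation_maps.shape)  # (28, 28, 6)
```

Each of the 6 output channels is one activation map; stacking them gives the 28x28x6 "new image".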
RECONSTRUCTING CONTENT
➤ Given an image, how can we find a new one with the same content?
➤ Find a content distance measure between images
➤ Start from a random noise image
➤ Minimize the distance through iteration
Image courtesy: D. Ulyanov https://bayesgroup.github.io/bmml_sem/2016/style.pdf
CONTENT DISTANCE MEASURE
1. Load a pre-trained CNN (e.g. VGG19)
2. Pass image #1 (x) through the net
3. Save activation maps from conv-layers
4. Pass image #2 (x̂) through the net
5. Save activation maps from conv-layers
6. Calculate the squared Euclidean distance between the activation maps from image #1 and #2, and sum over all layers
$$L_{content}(x, \hat{x}) = \frac{1}{2} \sum_l w_l \left(A_l(x) - A_l(\hat{x})\right)^2$$
Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
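Step 6 of the content distance measure can be sketched as follows, assuming the activation maps have already been saved as NumPy arrays (one per layer). The function name `content_loss`, the layer shapes, and the weights are hypothetical; the random arrays are stand-ins for real VGG19 activations, which are not loaded here.

```python
import numpy as np

def content_loss(acts_x, acts_xhat, weights):
    """0.5 * sum over layers of w_l * squared distance between activation maps."""
    return 0.5 * sum(
        w * np.sum((a - a_hat) ** 2)
        for w, a, a_hat in zip(weights, acts_x, acts_xhat)
    )

rng = np.random.default_rng(0)
# Stand-ins for activation maps of image #1 at two layers (shapes are made up).
acts_1 = [rng.standard_normal((28, 28, 6)), rng.standard_normal((14, 14, 12))]
# Stand-ins for image #2: every activation shifted by 0.1.
acts_2 = [a + 0.1 for a in acts_1]
weights = [1.0, 1.0]

loss_same = content_loss(acts_1, acts_1, weights)
loss_diff = content_loss(acts_1, acts_2, weights)
print(loss_same, loss_diff)
```

Identical activations give a loss of zero, and the loss grows with the squared difference at every layer.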
RECONSTRUCTING CONTENT
➤ Start from random image
➤ Update it using gradient descent
$$L_{content}(x, \hat{x}) = \frac{1}{2} \sum_l w_l \left(A_l(x) - A_l(\hat{x})\right)^2$$
$$\hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{content}}{\partial \hat{x}}$$
Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
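The update rule can be tried on a toy problem. The sketch below replaces the CNN with a single hypothetical linear map A(x) = W x, so that the gradient has a closed form; this is an assumption made purely to keep the example self-contained, since a real implementation would backpropagate through the pre-trained network. The step size is also an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))  # toy linear "network": A(x) = W @ x (assumption)
x = rng.standard_normal(4)       # the image whose content we want to match
x_hat = rng.standard_normal(4)   # start from a random image

def loss(x, x_hat):
    # L_content for a single layer with weight w_l = 1
    return 0.5 * np.sum((W @ x - W @ x_hat) ** 2)

def grad(x, x_hat):
    # Closed-form dL/dx_hat for the linear toy model
    return W.T @ (W @ x_hat - W @ x)

# Step size 1 / (largest singular value of W)^2 keeps plain gradient descent stable.
eps = 1.0 / np.linalg.norm(W, 2) ** 2

initial_loss = loss(x, x_hat)
for t in range(200):
    x_hat = x_hat - eps * grad(x, x_hat)  # x_hat_{t+1} = x_hat_t - eps * dL/dx_hat
final_loss = loss(x, x_hat)
print(initial_loss, final_loss)
```

Only the image `x_hat` is updated; the "network" weights `W` stay fixed throughout, which mirrors how the pre-trained CNN is used in the slides.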
Feature Inversion
➤ Reconstructions from intermediate layers: higher layers are less sensitive to changes in color, texture, and shape
➤ Reconstructions from the representation after the last pooling layer (immediately before the first fully connected layer)
Mahendran and Vedaldi, “Understanding Deep Image Representations by Inverting Them”, CVPR 2015
Slide courtesy: Johnson, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
Style = Texture / Local structure

Ignores global semantic content
STYLE DISTANCE MEASURE
➤ Represent style by Gram matrix - pairwise covariance of activation maps
➤ Just the uncentered covariance matrix between vectorized activation maps
$$G^l_{ij}(x) = \vec{A}^l_i(x) \cdot \vec{A}^l_j(x)$$
$$\begin{pmatrix} G(A_1, A_1) & \cdots & G(A_1, A_n) \\ \vdots & \ddots & \vdots \\ G(A_n, A_1) & \cdots & G(A_n, A_n) \end{pmatrix}$$
Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
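As a minimal sketch of the Gram matrix, assume one layer's activations are stored as an H x W x C array, with C activation maps. The function name `gram_matrix` and the random input are illustrative assumptions.

```python
import numpy as np

def gram_matrix(activations):
    """Gram matrix (C x C) of an H x W x C stack of activation maps."""
    H, W, C = activations.shape
    # Vectorize each activation map: column i holds map i as a vector of length H*W.
    A = activations.reshape(H * W, C)
    # Entry (i, j) is the dot product between vectorized maps i and j.
    return A.T @ A

rng = np.random.default_rng(0)
acts = rng.standard_normal((28, 28, 6))  # stand-in for six 28x28 activation maps
G = gram_matrix(acts)
print(G.shape)  # (6, 6)
```

The result is symmetric, and its size depends only on the number of activation maps, not on the spatial size of the image.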
STYLE DISTANCE MEASURE
➤ Style loss - Euclidean distance between Gram matrices from two images
$$L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l \left(G^l(x) - G^l(\hat{x})\right)^2$$
Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
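The style loss can be sketched the same way as the content loss, with Gram matrices in place of raw activations. The names `gram_matrix` and `style_loss`, the H x W x C layout, and the random stand-in activations are assumptions for illustration. The example also shuffles the spatial positions of the activations to show that the Gram matrix, and hence the loss, does not depend on where in the image a feature occurs.

```python
import numpy as np

def gram_matrix(activations):
    """Gram matrix (C x C) of an H x W x C stack of activation maps."""
    H, W, C = activations.shape
    A = activations.reshape(H * W, C)
    return A.T @ A

def style_loss(acts_x, acts_xhat, weights):
    """0.5 * sum over layers of w_l * squared distance between Gram matrices."""
    return 0.5 * sum(
        w * np.sum((gram_matrix(a) - gram_matrix(a_hat)) ** 2)
        for w, a, a_hat in zip(weights, acts_x, acts_xhat)
    )

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 8, 4))  # stand-in activations for one layer
# Same activation vectors, but placed at shuffled spatial positions.
shuffled = rng.permutation(acts.reshape(-1, 4)).reshape(8, 8, 4)

loss_shuffled = style_loss([acts], [shuffled], [1.0])
loss_scaled = style_loss([acts], [2.0 * acts], [1.0])
print(loss_shuffled, loss_scaled)
```

The shuffled activations give a loss of essentially zero (up to floating-point rounding), while rescaling the activations changes the Gram matrix and gives a positive loss.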
RECONSTRUCTING STYLE
➤ Start from random image
➤ Update it using gradient descent
$$L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l \left(G^l(x) - G^l(\hat{x})\right)^2$$
$$\hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{style}}{\partial \hat{x}}$$
Image courtesy: D. Ulyanov https://bayesgroup.github.io/bmml_sem/2016/style.pdf
RECONSTRUCTING STYLE
Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
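MATHEMATICAL SIDE NOTE
The style loss is a special case of the squared Maximum Mean Discrepancy (MMD):
$$L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l \left(G^l(x) - G^l(\hat{x})\right)^2$$
$$L^l_{style} = \frac{1}{Z^l_k} \mathrm{MMD}^2\left(A^l(x), A^l(\hat{x})\right)$$
$$\mathrm{MMD}^2\left(A^l(x), A^l(\hat{x})\right) = \left\| E[\phi(A^l(x))] - E[\phi(A^l(\hat{x}))] \right\|^2$$
$$L^l_{style} = \frac{1}{Z^l_k} \sum_{i=1}^{M_l} \sum_{j=1}^{M_l} \left( k(A^l_{:,i}, A^l_{:,j}) + k(\hat{A}^l_{:,i}, \hat{A}^l_{:,j}) - 2k(A^l_{:,i}, \hat{A}^l_{:,j}) \right)$$
with
$$k(x, \hat{x}) = (x^T \hat{x})^2$$
Further reading: Demystifying Neural Style Transfer, Li et al.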
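The equivalence can be checked numerically. In the sketch below, each column of `A` is the activation vector at one spatial position (C maps by M positions), and the cross term enters with a minus sign, as in the usual expansion of a squared MMD. The array sizes and the helper name `poly_kernel_sum` are assumptions for the example, and the normalizing constant Z is left out on both sides.

```python
import numpy as np

def poly_kernel_sum(P, Q):
    """Sum over all column pairs (i, j) of k(p_i, q_j), with k(x, y) = (x^T y)^2."""
    return np.sum((P.T @ Q) ** 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))      # C = 3 activation maps, M = 5 positions
A_hat = rng.standard_normal((3, 5))  # activations of the second image

# Left-hand side: squared distance between the two Gram matrices.
gram_form = np.sum((A @ A.T - A_hat @ A_hat.T) ** 2)

# Right-hand side: kernel expansion with the second-order polynomial kernel.
kernel_form = (
    poly_kernel_sum(A, A)
    + poly_kernel_sum(A_hat, A_hat)
    - 2.0 * poly_kernel_sum(A, A_hat)
)
print(gram_form, kernel_form)
```

Both numbers agree up to floating-point rounding, so matching Gram matrices amounts to matching feature distributions under this kernel.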
STYLE TRANSFER
[Figure: content image + style image = desired output]
$$L_{total}(x, \hat{x}) = \alpha L_{content}(x, \hat{x}) + \beta L_{style}(x, \hat{x})$$
Image courtesy: https://github.com/jcjohnson/neural-style
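A tiny worked example of the combined objective, assuming a single layer whose activations are stored as a C x M array (C maps, M positions). The activation values and the weights `alpha` and `beta` are made up for illustration; in practice they are tuned to trade content fidelity against style strength.

```python
import numpy as np

def gram(A):
    """Gram matrix of a C x M activation array (rows are vectorized maps)."""
    return A @ A.T

def total_loss(A_content, A_style, A_hat, alpha, beta):
    """alpha * content loss + beta * style loss for a single layer."""
    l_content = 0.5 * np.sum((A_content - A_hat) ** 2)
    l_style = 0.5 * np.sum((gram(A_style) - gram(A_hat)) ** 2)
    return alpha * l_content + beta * l_style

# Hypothetical activations for the content, style, and generated images.
A_content = np.array([[1.0, 1.0], [0.0, 1.0]])
A_style = np.array([[2.0, 0.0], [0.0, 2.0]])
A_hat = np.array([[1.0, 2.0], [0.0, 1.0]])

# Here the content loss is 0.5 and the style loss is 9.0.
loss = total_loss(A_content, A_style, A_hat, alpha=1.0, beta=0.1)
print(loss)
```

Note that the content term compares the generated image with the content image, while the style term compares it with the style image.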
TOTAL VARIATION LOSS
$$L_{TV} = \sum_{i,j} \left( (v_{i+1,j} - v_{i,j})^2 + (v_{i,j+1} - v_{i,j})^2 \right)$$
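A minimal sketch of the total variation loss for a single-channel image `v`. The boundary handling is an assumption: each difference is summed only where the neighboring pixel exists. The function name is illustrative.

```python
import numpy as np

def total_variation_loss(v):
    """Sum of squared differences between vertically and horizontally adjacent pixels."""
    dv = v[1:, :] - v[:-1, :]  # v[i+1, j] - v[i, j]
    dh = v[:, 1:] - v[:, :-1]  # v[i, j+1] - v[i, j]
    return np.sum(dv ** 2) + np.sum(dh ** 2)

flat = np.ones((4, 4))                     # constant image: no variation
ramp = np.array([[0.0, 1.0], [2.0, 3.0]])  # small image with smooth gradients

tv_flat = total_variation_loss(flat)
tv_ramp = total_variation_loss(ramp)
print(tv_flat, tv_ramp)  # 0.0 10.0
```

Adding this term to the objective penalizes pixel-level noise, since neighboring pixels that differ sharply increase the loss.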
PERCEPTUAL LOSSES FOR REAL-TIME STYLE TRANSFER AND SUPER-RESOLUTION
➤ Train a network to do the optimization
➤ + Fast
➤ - One network per style
➤ - Quantitatively slightly worse
Image courtesy: Johnson et al., Perceptual Losses for Real-Time Style Transfer and Super-Resolution, https://arxiv.org/abs/1603.08155
ARBITRARY STYLE TRANSFER IN REAL-TIME WITH ADAPTIVE INSTANCE NORMALIZATION
Image courtesy: Huang et al., Arbitrary style transfer in real-time with adaptive instance normalization
$$\mathrm{AdaIN}(x_c, x_s) = \sigma(x_s) \left( \frac{x_c - \mu(x_c)}{\sigma(x_c)} \right) + \mu(x_s)$$
➤ Align mean and variance for activation maps
➤ + Fast (15 fps, 512x512px)
➤ + One net, arbitrary style
➤ - Quantitatively slightly worse
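The AdaIN formula can be sketched in NumPy, assuming activations stored as C x H x W arrays with the mean and standard deviation taken per channel over the spatial dimensions. The small `eps` that guards against division by zero is an assumption that does not appear in the formula, and the random inputs are stand-ins for real encoder activations.

```python
import numpy as np

def adain(x_c, x_s, eps=1e-5):
    """Align per-channel mean and std of content activations to those of the style."""
    mu_c = x_c.mean(axis=(1, 2), keepdims=True)
    sigma_c = x_c.std(axis=(1, 2), keepdims=True)
    mu_s = x_s.mean(axis=(1, 2), keepdims=True)
    sigma_s = x_s.std(axis=(1, 2), keepdims=True)
    # Normalize the content activations, then rescale and shift with the style statistics.
    return sigma_s * (x_c - mu_c) / (sigma_c + eps) + mu_s

rng = np.random.default_rng(0)
x_c = rng.standard_normal((4, 8, 8))                # content activations
x_s = 3.0 * rng.standard_normal((4, 10, 12)) + 2.0  # style activations, other spatial size

out = adain(x_c, x_s)
print(out.shape)  # (4, 8, 8)
```

The output keeps the spatial layout of the content activations but carries the per-channel statistics of the style activations, and no per-style training is involved in this step.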
QUESTIONS & DISCUSSION
THANK YOU!
Email: lars@peltarion.com
Twitter: sjosund

Stockholm AI study group #1 - Style Transfer
