Deep Style: Using Variational Auto-encoders for Image Generation
Nov 11, 2015
This talk covers work done at Stitch Fix on using Variational Auto-encoders to efficiently create distributed representation spaces of style and generative image models for new clothing.
COLD START PROBLEM
New Clients, New Clothing
1. Get new clothing.
2. Get new clients.
3. ????????
4. PROFIT!!!
Preemptive Modeling
TURN TO IMAGES
• Style/fashion is primarily visual.
• We wish to use images for modeling purposes.
• Heuristics for processing image data are unknown or quite complex.
• We don’t want to have to develop image features by hand.
• Turn to deep learning to learn the feature extraction.
OUTLINE
1. Introduction to NNs
2. Unsupervised Deep Learning
3. Getting started with Chainer
4. Training a simple model
5. Open source package!
6. Conclusions/Future (current) Directions
INTRO TO NEURAL NETS
[Diagram: a small feed-forward network with layer 1 (input, nodes 1-4), layer 2 (nodes 5-6), and layer 3 (output).]
Begin with input, then transform the data repeatedly with a non-linear function:
f_i^{(l)}(x) = \tanh\Big( \sum_j W_{ij}^{(l)} x_j^{(l-1)} + b^{(l)} \Big)
The whole network is the composition f^{(1)}\big(\cdots f^{(n)}(x)\big).
Calculate the loss function and update the weights:
L(x_{\text{out}}, y) = \overbrace{\frac{1}{m} \sum_{k=1}^{m} (x_k - y_k)^2}^{\text{MSE}}
W_{ij}^{(l)*} = W_{ij}^{(l)} \left( 1 - \alpha \frac{\partial L}{\partial W_{ij}} \right)
where the gradients come from the chain rule (backpropagation):
\frac{\partial L}{\partial W_{ij}^{(l)}} = \left( \frac{\partial L}{\partial x_{\text{out}}} \right) \left( \frac{\partial x_{\text{out}}}{\partial f^{(n-1)}} \right) \cdots \left( \frac{\partial f^{(l)}}{\partial W_{ij}^{(l)}} \right)
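To make these equations concrete, here is a minimal NumPy sketch (not from the talk; the layer sizes, data, and learning rate are made up) of a forward pass through two tanh layers, the MSE loss, and a chain-rule gradient step for the output-layer weights using the standard update W ← W − α ∂L/∂W:

import numpy as np

rng = np.random.RandomState(0)

def layer(x, W, b):
    # f_i(x) = tanh( sum_j W_ij x_j + b_i )
    return np.tanh(W.dot(x) + b)

# Sizes mirror the diagram: 4 inputs -> 2 hidden units -> 1 output.
W1, b1 = rng.randn(2, 4), np.zeros(2)
W2, b2 = rng.randn(1, 2), np.zeros(1)

x = rng.randn(4)                   # input
y = np.array([0.5])                # target

h = layer(x, W1, b1)               # layer 2
x_out = layer(h, W2, b2)           # layer 3 (output)
loss = np.mean((x_out - y) ** 2)   # MSE

# Chain rule for the output-layer weights: dL/dW2 = dL/dx_out * dx_out/dpre * dpre/dW2
g_out = 2 * (x_out - y) / y.size   # dL/dx_out
g_pre = g_out * (1 - x_out ** 2)   # back through tanh
g_W2 = np.outer(g_pre, h)          # dpre/dW2 = h
alpha = 0.1
W2 = W2 - alpha * g_W2             # gradient-descent update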
WHY DEEP LEARNING?
1) With no hidden layers, a NN resembles just a linear transformation.
2) Shallow networks approximate PCA.
3) Composing non-linear activation functions, f^{(1)}\big(\cdots f^{(n)}(x)\big), adds increasing nonlinearity.
4) Learn more complex/nonlinear models with deep architectures.
DL WITH SUPERVISION
Most deep learning methods rely on supervised training data.
MO: feature extraction with deep learning, followed by final classification layer(s).
[Image: http://parse.ele.tue.nl/education/cluster2]
Thankfully, we can learn feature representations from unlabeled data.
The key is to compress the data with a nonlinear encoding process.
ISSUES FOR STYLE
Problem: no reliable system of style labels for image data.
AUTO-ENCODERS
[Diagram: Original Image -> Encode -> Compressed Data -> Decode -> Reconstructed Image]
Training (sketched in the NumPy example below):
1) Initialize the layers to random weights.
2) Do a full forward pass of a batch through the encoder and then decode the encoded representation.
3) Construct the loss via the MSE of the original data against the reconstructed data.
4) Calculate gradients and backprop through to train new weights.
5) Iterate.
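A minimal NumPy sketch (not the talk's production code; the 8-pixel "images", layer sizes, and learning rate are made up) of the training loop above, with a tanh encoder, a linear decoder, and hand-written backpropagation:

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(32, 8)                     # a batch of 32 tiny fake "images" (8 pixels each)

# 1) Initialize random weights for the encoder and decoder.
W_enc, b_enc = 0.1 * rng.randn(8, 3), np.zeros(3)
W_dec, b_dec = 0.1 * rng.randn(3, 8), np.zeros(8)
alpha = 0.1

for step in range(100):                 # 5) iterate
    # 2) Forward pass: encode to 3 dimensions, then decode back to 8.
    h = np.tanh(X.dot(W_enc) + b_enc)
    X_rec = h.dot(W_dec) + b_dec
    # 3) Loss: MSE of the original data against the reconstruction.
    loss = np.mean((X_rec - X) ** 2)
    # 4) Backprop the gradients and update the weights.
    g_rec = 2 * (X_rec - X) / X.size
    g_W_dec, g_b_dec = h.T.dot(g_rec), g_rec.sum(axis=0)
    g_h = g_rec.dot(W_dec.T) * (1 - h ** 2)
    g_W_enc, g_b_enc = X.T.dot(g_h), g_h.sum(axis=0)
    W_dec -= alpha * g_W_dec
    b_dec -= alpha * g_b_dec
    W_enc -= alpha * g_W_enc
    b_enc -= alpha * g_b_enc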
AUTO-ENCODER ISSUES
1) AEs will often overfit unless the amount of training data is large.
2) Gradients diminish quickly, so weight corrections are small “far away” from the output.
SOLUTION
1) Use a variational component to “regularize” training.
2) *Not Covered* Stack auto-encoders and train greedily (DBN).
INTRO TO CHAINER
Easy-to-use framework for training neural networks.
BASIC OBJECTS
• Variables: wrappers on ndarrays.
• Functions: operate on Variable objects.
Operations of functions on variables are memorized in sequence.
Backpropagation is done simply by automatic differentiation, moving backwards through the sequence of operations.
import numpy as np
import chainer

x = np.ones(1)*5
y = np.ones(1)*3
x = chainer.Variable(x)
y = chainer.Variable(y)
z = x**2 + y**2 + 2*y

In [3]: z.data
Out[3]: array([ 40.])

# calculate gradients
z.backward()
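After z.backward(), Chainer fills in the .grad arrays by automatic differentiation. For this example the expected values follow directly from the derivatives ∂z/∂x = 2x and ∂z/∂y = 2y + 2 (a quick check, not output captured from the talk):

x.grad   # expected: array([ 10.])  since dz/dx = 2x = 10
y.grad   # expected: array([ 8.])   since dz/dy = 2y + 2 = 8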
Steps to a NN
1. Define a model using chainer.FunctionSet (see the sketch below).
   • Contains all parametric functions.
   • A simple way to wrap computational elements into one object.
2. Design and code the forward network pass.
3. Set an optimizer: chainer.optimizers
4. Write a training script that iteratively passes batches forward through the network and updates the weights:
   loss.backward()
   optimizer.update()
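As a concrete illustration of these steps, here is a minimal sketch of a small auto-encoder written against the Chainer 1.x-era API used in the talk (FunctionSet, F.Linear, zero_grads; later Chainer versions replaced FunctionSet with Link/Chain). The layer sizes, optimizer choice, and fake batch are made up, and the exact optimizer.setup call varies slightly between early Chainer releases:

import numpy as np
import chainer
import chainer.functions as F
from chainer import FunctionSet, optimizers

# 1. A model wrapping all parametric functions (sizes are illustrative).
model = FunctionSet(
    enc=F.Linear(784, 64),    # encoder: image -> latent code
    dec=F.Linear(64, 784),    # decoder: latent code -> image
)

# 2. The forward pass is ordinary Python code.
def forward(x_data):
    x = chainer.Variable(x_data)
    h = F.tanh(model.enc(x))
    reconstruction = model.dec(h)
    return F.mean_squared_error(reconstruction, x)

# 3. Set the optimizer.
optimizer = optimizers.Adam()
optimizer.setup(model)

# 4. One training update on a fake batch.
batch = np.random.rand(16, 784).astype(np.float32)
loss = forward(batch)
optimizer.zero_grads()
loss.backward()
optimizer.update()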
ADVANTAGES
1. The forward pass through the network is intuitive and easily debugged.
2. Can use arbitrary control flow statements.
3. Backpropagation is easily implemented through backwards traversal of the computational graph.
4. High level of readability.
UPDATE
L(x) = D_{KL}\big(q_\phi(z)\,\|\,\mathcal{N}(0, I)\big) + \mathrm{MSE}(x, y_{\text{out}})

# Loss is just the MSE of the reconstruction
loss = F.mean_squared_error(reconstruction, input)
# “Regularize” the latent vector
# (note: Chainer’s gaussian_kl_divergence expects the log variance as its second argument)
loss += F.gaussian_kl_divergence(mean, std)
# backprop
optimizer.zero_grads()
loss.backward()
optimizer.update()
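Putting the pieces together, a hedged sketch of what a VAE forward pass with this loss might look like in the same Chainer 1.x style. The model layout and names (enc, mu, ln_var, dec) are illustrative rather than the talk's actual architecture; F.gaussian applies the reparameterization trick, sampling z = mean + exp(ln_var/2)·ε:

import numpy as np
import chainer
import chainer.functions as F
from chainer import FunctionSet

# Illustrative model: encoder -> (mean, log-variance) of q(z|x), decoder back to pixels.
model = FunctionSet(
    enc=F.Linear(784, 128),
    mu=F.Linear(128, 32),        # mean of the latent Gaussian
    ln_var=F.Linear(128, 32),    # log-variance of the latent Gaussian
    dec=F.Linear(32, 784),
)

def vae_loss(x_data):
    x = chainer.Variable(x_data)
    h = F.tanh(model.enc(x))
    mean = model.mu(h)
    ln_var = model.ln_var(h)
    z = F.gaussian(mean, ln_var)             # reparameterized sample from q(z|x)
    reconstruction = model.dec(z)
    # Loss is just MSE ...
    loss = F.mean_squared_error(reconstruction, x)
    # ... "regularized" by the KL divergence of q(z|x) from N(0, I).
    # (How the two terms are weighted/normalized is a modeling choice not shown on the slide.)
    loss += F.gaussian_kl_divergence(mean, ln_var)
    return loss

batch = np.random.rand(16, 784).astype(np.float32)
loss = vae_loss(batch)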
RESULTS
Still testing the efficacy of modeling style with the encoded space.
Normally, the generative portion would be thrown out after training,
but here we can use it to look at our style space.
FUTURE DIRECTIONS
Issues with scaling to high resolution. For a 100x200 RGB image:
100 x 200 x 3 = 60,000-node input layer
60,000 x (step-down layer of 4,000) = 240M weights
240M weights x 32 bits ≈ 960 MB
Add convolution layers:
1) Reduce the number of parameters.
2) Add translation robustness.
3) Hierarchical feature structure.
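A quick back-of-the-envelope check of the memory estimate above (assuming a single dense layer from the 60,000-dimensional input to a 4,000-unit first hidden layer, 32-bit floats, and ignoring biases):

pixels = 100 * 200 * 3             # 60,000 input nodes
weights = pixels * 4000            # 240,000,000 weights in the first dense layer
megabytes = weights * 4 / 1e6      # 4 bytes per 32-bit float -> ~960 MB
print(pixels, weights, megabytes)  # 60000 240000000 960.0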
COMING SOON
CONCLUSIONS
1) A style feature space would help resolve the cold-start problem for both clients and items.
2) Auto-encoders are useful for deducing a feature space in an unsupervised way.
3) Turn to the VAE as a drop-in way to prevent overfitting.
4) Convolution is on its way.
You can check out the branch: convolutional-vae
QUESTIONS?
Original VAE Paper: http://arxiv.org/abs/1312.6114
Blog Post: http://multithreaded.stitchfix.com/blog/2015/09/17/deep-style/
APPENDIX: VARIATIONAL INFERENCE
Want to solve for the posterior:
p_\theta(z|x) = \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}
But the posterior can be intractable to calculate efficiently.
Approximate:
p_\theta(z|x) \approx q_\phi(z)
Minimize the KL divergence:
D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \int dz\; q_\phi(z) \ln\!\left(\frac{q_\phi(z)}{p_\theta(z|x)}\right)
APPENDIX: VARIATIONAL AUTO-ENCODER
Auto-encoder learns/infers in the Bayesian sense too.
Learning the encoding is equivalent to maximizing the likelihood:
\operatorname*{argmax}_{z}\; p_\theta(x|z)
And generating the decoding by maximizing the posterior:
\operatorname*{argmax}_{x}\; p_\theta(z|x)
Apply variational inference at the decoding step to calculate the posterior.
The auto-encoder now models distributions for the latent space.
If we guess a normal form for our “variational distribution”:
D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 - \sigma_2^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
The (\mu_1 - \mu_2)^2 term acts as an L2 loss.
Substituting the standard-normal prior (\mu_2 = 0, \sigma_2 = 1) and summing over latent dimensions:
= \sum_i \left( \frac{1}{2}\left[\sigma_i^2 + \mu_i^2 - 1\right] - \log \sigma_i \right)
Drop in this loss term to regularize the latent space!
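As a sanity check on the two forms above, a small NumPy sketch (not from the talk) that evaluates the general two-Gaussian KL with µ2 = 0, σ2 = 1 and compares it to the simplified per-dimension sum used as the VAE regularizer:

import numpy as np

rng = np.random.RandomState(0)
mu = rng.randn(5)                   # per-dimension means of q
sigma = np.exp(0.1 * rng.randn(5))  # per-dimension standard deviations of q (positive)

# General formula with mu_2 = 0, sigma_2 = 1, summed over dimensions:
kl_general = np.sum(np.log(1.0 / sigma) + (sigma**2 - 1.0 + mu**2) / 2.0)

# Simplified form from the slide: sum_i ( (1/2)[sigma_i^2 + mu_i^2 - 1] - log sigma_i )
kl_simplified = np.sum(0.5 * (sigma**2 + mu**2 - 1.0) - np.log(sigma))

assert np.allclose(kl_general, kl_simplified)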