KaoNet: Face Recognition and Generation App using Deep Learning
1. KaoNet: Face Recognition and Generation App using Deep Learning
Van Phu Quang Huy
Pham Quang Khang
2. About Us
Van Phu Quang Huy
● AI Lead Engineer at Galapagos Inc
Pham Quang Khang
● Software Engineer at Works Applications
3. Objectives
● What do we want to do?
To introduce the whole process of creating an application
based on Deep Learning
● What will be included:
○ Convolutional Neural Networks (CNNs)
○ Generative Adversarial Networks (GANs)
○ TensorFlow
5. First things first: the idea
● Facial recognition is a promising yet challenging research field with an
enormous range of applications:
○ Biometric security systems
○ Monitoring and people search
○ Everyday applications
● All the tools needed to develop a facial recognition app are already
provided by many companies
=> Why not build a face recognition app?
7. What can the app do?
● Classify the input photos into groups of faces belonging to the same
person
● Generate faces from the input data such that the generated faces look as
human-like as possible
8. Whose faces?
● In order to train a neural network, a very large amount of sample data is
needed. Who would have that many photos to share? => famous people
● Who would attract the most attention? => singers, models, actresses
9. Where to find those photos?
● Online: the internet is a near-infinite source of all kinds of information, so
the more famous a person is, the higher the probability that his/her photos
can be found with simple keywords
● The search engine we chose: Bing, because its API for crawling photos
from search results is still free.
10. How many photos?
● At first, a list of more than 50 popular people was chosen as the target of
the app; we expected at least 1K samples for each
● Crawling: data was collected with a few simple fixed keywords searched
on Bing, and the results were saved to a local server
● Result: around 1K photos per person were collected, but after removing
wrong results, only around 200 correct samples per person were kept for
KaoNet
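The crawling step above can be sketched roughly as follows. This is our illustration, not KaoNet's actual crawler (which is not public): the endpoint is the published Bing Image Search v7 REST API, while the query string, the API-key placeholder, and the function name are our own assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Published endpoint of the Bing Image Search REST API (v7).
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"

def build_search_request(keyword: str, api_key: str,
                         count: int = 150, offset: int = 0) -> Request:
    """Build one paged image-search request for a fixed keyword query."""
    params = urlencode({"q": keyword, "count": count, "offset": offset})
    return Request(
        f"{BING_ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )

# Bing caps the number of results per call, so a crawler loops over
# offsets and saves each photo URL from the JSON response locally.
req = build_search_request("some singer", api_key="YOUR_KEY", offset=300)
print(req.full_url)
```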
11. But we only care about people's faces
● The whole photo is not a good sample, since there is too much noise in
the background
● Solution: cut the face out with OpenCV
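The cropping step can be sketched in a few lines, assuming OpenCV's face detector (e.g. `CascadeClassifier.detectMultiScale`, not shown here) has already returned (x, y, w, h) bounding boxes; images are numpy arrays, as OpenCV returns them.

```python
import numpy as np

def crop_face(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop one face region from an image array.

    `box` is an (x, y, w, h) rectangle in the format returned by
    OpenCV's detectMultiScale; the detector itself is omitted.
    """
    x, y, w, h = box
    return image[y:y + h, x:x + w]

# Toy example: a fake 100x100 grayscale "photo" with a detected 32x32 face.
photo = np.zeros((100, 100), dtype=np.uint8)
face = crop_face(photo, (10, 20, 32, 32))
print(face.shape)  # (32, 32)
```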
12. Finally
A Net of Kao (faces)
Result: we only had enough time to
filter 26 targets
13. Finish?
● That is only the beginning. Now the hard part: training
● Model:
○ CNN: trained to classify the samples
○ GAN: trained to generate a face from the samples
● Framework: TensorFlow, because it has strong support for CNNs, lets us
observe the training process in real time, and the same code runs on both
CPU and GPU
14. Training progress in real time
[Plots of the training curves vs. steps]
15. Convolutional Neural Network: convolution layer
Idea: extract the elementary features of the image using local receptive
fields instead of training on every point of the original image (Yann LeCun,
1998)
Fei-Fei Li, Stanford 2016
16. Pooling layer (sub-sampling)
Local averaging and sub-sampling, reducing the resolution of the feature
map (Yann LeCun, 1998)
Fei-Fei Li, Stanford 2016
19. Hyper-parameters in KaoNet
● Number of layers: 4 convolutional, 2 fully connected
● Size and number of filters in each convolutional layer (previous slide)
● Size of the fully connected layers (previous slide)
● Weight decay (applied to the fc layers only, weight decay = 0.004)
● Optimization algorithm (AdamOptimizer)
● Initial learning rate (0.004)
● Initial weights (normal distribution with mean = 0, stddev = 5e-4)
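To make these choices concrete, here is a single Adam update step written out in numpy with the values from the list above (learning rate 0.004, initial weights from N(0, 5e-4), weight decay 0.004 treated as an L2 term). This is our illustrative sketch, not KaoNet's TensorFlow code; the beta and epsilon values are TensorFlow's AdamOptimizer defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Initial weights: normal distribution, mean 0, stddev 5e-4 (as on the slide).
w = rng.normal(0.0, 5e-4, size=(10,))
m = np.zeros_like(w)   # first-moment estimate
v = np.zeros_like(w)   # second-moment estimate
lr, beta1, beta2, eps = 0.004, 0.9, 0.999, 1e-8
weight_decay = 0.004   # L2 penalty on the fc weights

def adam_step(w, g, m, v, t):
    """One Adam update, with weight decay folded into the gradient."""
    g = g + weight_decay * w               # grad of loss + (wd/2)*||w||^2
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)             # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

grad = np.ones_like(w)                     # dummy gradient
w2, m, v = adam_step(w, grad, m, v, t=1)
print(float(np.mean(w2 - w)))  # each weight moves by about -lr on step 1
```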
20. Data partition
● Data is split into two parts, training data and validation data, with a ratio
of 80:20
● After each epoch, the current model is applied to the validation data to
evaluate the loss and the prediction accuracy
● At each training step, a batch of batch_size (KaoNet: 64) samples, drawn
randomly from the training set, is loaded for training
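A minimal sketch of this partitioning and batch loading, with our own function and variable names standing in for the real pipeline:

```python
import random

def split_and_batch(samples, train_ratio=0.8, batch_size=64, seed=0):
    """80:20 train/validation split, then one randomly drawn training batch."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)                      # shuffle before splitting
    cut = int(len(shuffled) * train_ratio)
    train, val = shuffled[:cut], shuffled[cut:]
    # Each training step draws batch_size samples at random from the train set.
    batch = [rng.choice(train) for _ in range(batch_size)]
    return train, val, batch

train, val, batch = split_and_batch(list(range(1000)))
print(len(train), len(val), len(batch))  # 800 200 64
```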
21. Source code
● The TensorFlow tutorials for CIFAR-10 and MNIST are good samples
https://www.tensorflow.org/tutorials/deep_cnn
https://www.tensorflow.org/tutorials/layers
● Our source code (not public yet)
https://github.com/vanhuyz/KaoNet
22. Let’s run
● Training with 26 targets resulted in fair accuracy on the training set but
extremely poor accuracy on the validation set => overfitting
[Loss and accuracy vs. steps, plotted for the train set and the validation set]
23. Why did it fail?
Causes:
○ The model is too complex compared to the number of samples in the training set
○ The number of samples per target varied too much; some targets had several times
more samples than others
Solutions:
● Simplify the model => not a good choice if the app is to be extended
● Increase the number of samples => not enough time
● Train only on targets that have a sufficient number of samples => worth trying
24. The Ultimate 2
● One way to fix the problem is to use a sample set that is fairly balanced
and contains more data
● The Ultimate 2: 10K photos for each target
25. Accuracy on the validation set is highly improved
● The loss drops close to zero after 10K training steps
● Training accuracy reached 100% before 5K steps
● Validation accuracy improved greatly compared to the previous data set
26. Training Environment
● We used all the resources we could get:
○ MacBook Pro (CPU)
○ Dell Vostro desktop (CPU)
○ AWS GPU instance g2.8xlarge ($2.7/h) → cost us about $100 in total
○ GeForce GTX 1080 (GPU) → thanks to Galapagos Inc for the support!
28. Embedding Visualization
● Present the vector of the last fully connected layer for each input sample
● Each image is represented by a 512-dimensional vector
● The high-dimensional vectors are compressed into 3-dimensional vectors
using PCA for visualization
→ Let's check it on TensorBoard
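The PCA compression can be sketched in a few lines of numpy, mirroring what TensorBoard's projector does; random vectors stand in here for the real 512-dimensional embeddings:

```python
import numpy as np

def pca_project(vectors: np.ndarray, dim: int = 3) -> np.ndarray:
    """Project embedding vectors onto their top `dim` principal components."""
    centered = vectors - vectors.mean(axis=0)          # PCA requires centering
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T                       # coords in top-dim PCs

# 100 fake "embeddings" standing in for the 512-d fc-layer outputs.
emb = np.random.default_rng(0).normal(size=(100, 512))
pts = pca_project(emb)
print(pts.shape)  # (100, 3)
```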
29. Future of KaoNet
● Biometric security: using face recognition to replace physical locks
● Face search
● Criminal hunting using CCTV
31. Generative Model [1]
● Explicitly or implicitly model the distribution of data
● By sampling from that model, it is possible to generate synthetic data
points in the data space
[1] C.Bishop, 2006. Pattern Recognition and Machine Learning, p43
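As a toy example of this idea, we can fit an explicit 1-D Gaussian model to some data and then sample synthetic points from it; this is our illustration only, and is vastly simpler than the generative models used for images.

```python
import random
import statistics

# Toy "explicit model": estimate the parameters of a 1-D Gaussian.
data = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0]
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# Sampling from the fitted model yields new, synthetic data points.
rng = random.Random(0)
synthetic = [rng.gauss(mu, sigma) for _ in range(5)]
print(len(synthetic), round(mu, 2))  # 5 2.0
```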
32. Generative Adversarial Networks (GAN)
What are some recent and potentially upcoming breakthroughs in deep
learning? (from Quora, 2016)
The most important one, in my opinion, is adversarial training (also called
GAN for Generative Adversarial Networks)...
This, and the variations that are now being proposed is the most
interesting idea in the last 10 years in ML, in my opinion.
- Yann LeCun, Director of AI Research at Facebook
(https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning)
33. GAN [1]
● Based on a game theoretic scenario in which the generator network must
compete against an adversary [2]
○ The generator network directly produces “fake” samples
○ The discriminator network attempts to distinguish between samples drawn from the
training data and samples drawn from the generator
● Train the 2 networks simultaneously
○ The discriminator learns to correctly classify samples as real or fake
○ The generator learns to fool the discriminator into believing its samples are real
● At convergence, the generator's samples are indistinguishable from real
data, and the discriminator outputs ½ everywhere
[1] Goodfellow, 2014
[2] Goodfellow et al, 2016. Deep Learning, p702
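The two objectives can be written out in a short numpy sketch; the networks themselves and the alternating training loop are omitted, and the function name is ours. It also checks the equilibrium mentioned above: when the discriminator outputs ½ everywhere, its loss equals log 4.

```python
import numpy as np

def gan_losses(d_real: np.ndarray, d_fake: np.ndarray):
    """Original GAN losses (Goodfellow, 2014), given the discriminator's
    outputs on real samples (d_real) and on generator samples (d_fake)."""
    eps = 1e-12  # numerical safety for log(0)
    # Discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))].
    loss_d = -(np.mean(np.log(d_real + eps)) + np.mean(np.log(1 - d_fake + eps)))
    # Generator minimizes E[log(1 - D(G(z)))], pushing D(G(z)) toward 1.
    loss_g = np.mean(np.log(1 - d_fake + eps))
    return loss_d, loss_g

# At equilibrium D outputs 1/2 everywhere and the discriminator loss is log 4.
half = np.full(64, 0.5)
loss_d, loss_g = gan_losses(half, half)
print(round(float(loss_d), 4))  # ~1.3863 = log 4
```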
34. GAN in easy words...
● A criminal tries to print fake money
● A police officer attempts to distinguish the fake money from real money
● At first, with outdated technology, the criminal just prints some "random
papers", so the police officer can easily detect the fake money
● The criminal learns from that, then improves his technique
35. GAN in easy words...
● As the fake money becomes more and more realistic, the police officer
also has to improve his detection skills
● As a result, the criminal and the police officer learn from each other and
continuously improve themselves
● Finally, when the fake money looks so realistic that the police officer
cannot distinguish it, the world is over!
36. In the GAN world
● The criminal is called the Generator
● The police officer is called the Discriminator
● The Generator and Discriminator are usually neural networks (but this is
not required)
● GAN's problems:
○ Unstable training
○ Non-convergence
38. Deep Convolutional Generative Adversarial Networks (DCGAN) [1]
● Both the Generator and the Discriminator are deep convolutional neural
networks
● Apply some techniques for stable training:
○ Replace pooling layers with strided convolutions (discriminator) and fractional-strided
convolutions (generator)
○ Use batch normalization
○ Remove fully connected hidden layers
○ Use LeakyReLU activation in the discriminator for all layers
○ ...
[1] Radford et al, 2015
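Two of these techniques can be illustrated numerically. The kernel/stride/padding values below are a common DCGAN-style setting (k = 4, s = 2, p = 1) chosen by us for illustration, and alpha = 0.2 is the leak slope reported in the paper.

```python
import numpy as np

def leaky_relu(x: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """LeakyReLU as used in the DCGAN discriminator."""
    return np.where(x > 0, x, alpha * x)

def conv_out(size: int, kernel: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Output size of a strided convolution (DCGAN's pooling replacement)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size: int, kernel: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Output size of a fractional-strided (transposed) convolution."""
    return (size - 1) * stride - 2 * pad + kernel

# With k=4, s=2, p=1, a strided conv halves the feature map and a
# fractional-strided conv doubles it: 8 -> 4 -> 8.
print(conv_out(8), deconv_out(4))           # 4 8
print(leaky_relu(np.array([-1.0, 2.0])))    # negative side scaled by 0.2
```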
42. Conclusion
● We have introduced the step-by-step process of developing an
application based on Deep Learning
● Succeeded in creating a face classification app based on a CNN
● Achieved 98% accuracy on the validation set and good results on test
data
● Successfully generated face images using DCGAN