13. Viola-Jones algorithm: inference
[Diagram: each patch passes through a cascade of stages (Stage 1 → Stage 2 → … → Stage N); only patches accepted by every stage are labeled «Face»]
Optimization
– Features are grouped into stages
– If a patch fails any stage => discard
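The cascade idea above can be sketched in a few lines of Python. The stage functions and thresholds here are toy stand-ins (simple brightness checks), not actual Haar features:

```python
# Minimal sketch of cascade inference; stages and thresholds are hypothetical.
def cascade_detect(patch, stages):
    """Each stage is a (score_fn, threshold) pair; reject the patch early on failure."""
    for score_fn, threshold in stages:
        if score_fn(patch) < threshold:
            return False  # patch fails this stage => discard immediately
    return True  # survived all stages => report as a face

# Toy stages: brightness statistics standing in for Haar-feature sums.
stages = [
    (lambda p: sum(p) / len(p), 0.2),   # stage 1: mean brightness
    (lambda p: max(p) - min(p), 0.3),   # stage 2: contrast
]

print(cascade_detect([0.5, 0.9, 0.1, 0.7], stages))  # True
print(cascade_detect([0.0, 0.1, 0.0, 0.1], stages))  # False (rejected at stage 1)
```

Early rejection is the whole point: most patches die in the first, cheapest stages, so the expensive later stages run on very few candidates.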
19. Comparison: MTCNN vs R-FCN
MTCNN
+ Faster
+ Landmarks
- Less accurate
- No batch processing
Model | GPU Inference | FDDB Precision (100 errors)
R-FCN | 40 ms | 92%
MTCNN | 17 ms | 90%
21. What is TensorRT
NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime
Features
– Improves performance for complex networks
– FP16 & INT8 support
– Effective at small batch-sizes
25. Batch processing
Results
– Single run
– Enables batch processing
Model | Inference, ms
MTCNN (Caffe, Python) | 17
MTCNN (Caffe, C++) | 12.7
+ batch | 10.7
26. TensorRT: layers
Problem
TensorRT has no PReLU layer => the default pre-trained model can't be used
Solution
Retrained with ReLU from scratch
Model | GPU Inference, ms | FDDB Precision (100 errors)
MTCNN, batch | 10.7 | 90%
+ TensorRT | 8.8 (-20%) | 91.2%
36. Softmax
– Learned features are only separable, not discriminative
– The resulting features are not sufficiently effective
37. We need metric learning
– Tightness of the cluster
– Discriminative features
38. Triplet loss
Features
– Identity -> single point
– Enforces a margin between persons
[Diagram: anchor, positive, and negative points with the margin constraint]
d(anchor, positive) + α < d(anchor, negative)
– minimize d(anchor, positive), maximize d(anchor, negative)
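The hinge form of this constraint is easy to write down. A minimal Python sketch with squared Euclidean distances (the embeddings and margin value are illustrative):

```python
def dist2(a, b):
    # Squared Euclidean distance between two embeddings.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge loss: zero once d(a,p) + alpha < d(a,n), positive otherwise."""
    return max(0.0, dist2(anchor, positive) + alpha - dist2(anchor, negative))

a = [0.0, 0.0]
p = [0.1, 0.0]   # same identity, close to the anchor
n = [1.0, 0.0]   # different identity, far away
print(triplet_loss(a, p, n))  # 0.0 (the margin is already satisfied)
```

When the negative drifts inside the margin, the loss becomes positive and the gradient pushes the positive closer and the negative farther away.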
39. Choosing triplets
Crucial problem
How to choose triplets? Useful triplets = the hardest errors
– Picking all positives gives mostly too-easy triplets; they must be hard enough
Solution
Hard-mining within a large mini-batch (>1000)
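Batch-hard mining can be sketched as follows: for a given anchor, take the farthest same-identity embedding and the closest other-identity embedding within the mini-batch (the tiny 1-D embeddings here are illustrative):

```python
def hardest_triplet(embeddings, labels, anchor_idx):
    """Within a mini-batch, pick the hardest positive/negative for one anchor."""
    a = embeddings[anchor_idx]
    # Squared distances from the anchor to every batch element.
    d = [sum((x - y) ** 2 for x, y in zip(a, e)) for e in embeddings]
    pos = [i for i, l in enumerate(labels) if l == labels[anchor_idx] and i != anchor_idx]
    neg = [i for i, l in enumerate(labels) if l != labels[anchor_idx]]
    hardest_pos = max(pos, key=lambda i: d[i])  # farthest same identity
    hardest_neg = min(neg, key=lambda i: d[i])  # closest other identity
    return hardest_pos, hardest_neg

embs = [[0.0, 0.0], [0.2, 0.0], [0.9, 0.0], [1.0, 0.0]]
labels = [0, 0, 0, 1]
print(hardest_triplet(embs, labels, 0))  # (2, 3)
```

With batches of >1000 this search is done over the full pairwise distance matrix, which is why a large mini-batch matters: it raises the chance of containing genuinely hard errors.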
47. Center loss: structure
– Without the classification loss, training collapses
[Diagram: CNN → Embedding → Classify → Softmax Loss; a Center Loss term, weighted by λ, pulls embeddings toward their class centers]
– Final loss = Softmax loss + λ · Center loss
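The center-loss term itself is just a penalty on the distance between each embedding and its class center. A minimal sketch (the λ value and toy embeddings are assumptions; in training, centers are updated as a running mean of each class's embeddings):

```python
def center_loss(embeddings, labels, centers, lam=0.003):
    """lam/2 * mean squared distance of each embedding to its class center."""
    total = 0.0
    for e, l in zip(embeddings, labels):
        total += sum((x - c) ** 2 for x, c in zip(e, centers[l]))
    return lam * 0.5 * total / len(embeddings)

embs = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centers = {0: [1.0, 0.0], 1: [0.0, 0.0]}  # hypothetical running class centers
loss = center_loss(embs, labels, centers)
```

The softmax head keeps classes separable while this term makes each class tight; λ trades off the two.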
51. Center loss: summary
Overview
– Intra-class compactness and inter-class separability
– Good performance on several other tasks
Opensource Code
– Caffe (original, Megaface - 65%)
Model | LFW, % | Megaface, %
Triplet Loss | 99.35 | 65
Center Loss (Torch, ours) | 99.60 | 71.7
52. Tricks: augmentation
Test-time augmentation
– Flip the image horizontally
– Compute 2 embeddings (original and flipped)
– Average them to get the final embedding
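The flip-and-average trick in code. The `embed` function here is a toy stand-in for the CNN, chosen so that it is deliberately not flip-invariant (like a real network):

```python
def embed(image):
    # Toy "CNN": weights each pixel by its column index,
    # so flipping the image changes the embedding.
    return [sum(j * v for j, v in enumerate(row)) for row in image]

def flip_horizontal(image):
    return [row[::-1] for row in image]

def tta_embedding(image):
    # Compute 2 embeddings (original + horizontally flipped) and average them.
    e1, e2 = embed(image), embed(flip_horizontal(image))
    return [(a + b) / 2 for a, b in zip(e1, e2)]

print(tta_embedding([[1, 2], [3, 4]]))  # [1.5, 3.5]
```

The cost is one extra forward pass per image; at batch inference that is usually acceptable for the small accuracy gain.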
54. Shades on
At one point we used shades augmentation
How to
– Construct several sunglass textures
– Place them using landmarks
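The placement step can be sketched on a raw pixel grid. Painting a dark band across the eye line stands in for pasting a real sunglass texture; the band shape and landmark format are assumptions:

```python
def add_shades(image, left_eye, right_eye, half_h=1):
    """Toy sunglasses augmentation: darken a band over the eye line.
    left_eye/right_eye are (x, y) landmark coordinates."""
    img = [row[:] for row in image]  # don't mutate the input
    (x1, y1), (x2, y2) = left_eye, right_eye
    cy = (y1 + y2) // 2  # vertical center of the band
    for y in range(max(0, cy - half_h), min(len(img), cy + half_h + 1)):
        for x in range(min(x1, x2), max(x1, x2) + 1):
            img[y][x] = 0  # paint the "lens" pixels black
    return img

face = [[255] * 5 for _ in range(5)]  # blank 5x5 grayscale "face"
shaded = add_shades(face, left_eye=(1, 2), right_eye=(3, 2))
```

A production version would alpha-blend rotated textures aligned to the eye landmarks, but the principle (landmarks drive the placement) is the same.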
55. Tricks: adding Eye loss
[Diagram: CNN → Embedding feeding three heads: Softmax Loss (person), Center Loss, and Eye Loss (eye color)]
– We can force the CNN to learn specific discriminative features
– For celebrities, eye colors are available on the Internet
56. Eye loss: summary
*Adding simple features (e.g. gender) doesn't help
Model | LFW, % | Megaface, %
Center Loss + Tricks | 99.68 | 73
Center Loss + Eye | 99.68 | 73.5
60. Angular softmax: summary
Overview
– As described in the paper: doesn't work at all
– Works using sum of losses (m=1,N) over training
• only on small datasets!
Model | LFW, % | Megaface, %
Center Loss | 99.6 | 73
Center Loss + Eye | 99.68 | 73.5
A-Softmax (Torch) | 99.68 | 74.2
Opensource Code
– Caffe (original)
– Slight modification of the loss yields 74.2%
61. Metric learning: summary
Softmax < Triplet < Center < A-Softmax
A-Softmax
– With bells and whistles, better than Center loss
Overall
– Rule of thumb: use Center loss
– Metric learning may improve classification performance
63. Errors after MSCeleb: children
Problem
Children all look alike
Result
Embeddings collapse to almost a single point in the space
64. Errors after MSCeleb: Asians
Problem
Face recognition performs poorly on Asian faces
Reason
The dataset doesn't contain enough photos of these categories
65. How to fix these errors ?
It's all about data – we need a diverse dataset!
Natural choice: avatars from social networks
66. A way to construct dataset
[Pipeline: Face Detection → pick the largest face → Face Recognition + Clustering]
Cleaning algorithm
1. Face detection
2. Face recognition -> embeddings
3. Hierarchical clustering (with a softened distance threshold)
4. Pick the largest cluster as the person
Iterate after each model improvement
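Steps 3–4 can be sketched with a simple single-link threshold clustering (a stand-in for a full hierarchical clustering library; the 1-D embeddings and threshold are illustrative):

```python
def largest_cluster(embeddings, threshold=0.5):
    """Greedy single-link clustering via union-find; return the biggest cluster."""
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Union every pair of embeddings closer than the distance threshold.
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j])) ** 0.5
            if d < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return max(groups.values(), key=len)  # the largest cluster = the person

embs = [[0.0], [0.1], [0.2], [5.0], [9.0]]
print(largest_cluster(embs))  # [0, 1, 2]
```

The two outliers (indices 3 and 4) stay in their own clusters, so they are dropped from that person's photos; rerunning after each model improvement sharpens the clusters further.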
67. MSCeleb dataset’s errors
MSCeleb is constructed by leveraging search engines
Joe Eszterhas and Mel Gibson's public confrontation leads to the error: search results mix photos of the two men, so Mel Gibson's photos end up labeled as Joe Eszterhas.
74. Workaround
Algorithm
1. Construct dataset with children
2. Compute average embedding
3. Every point inside the sphere – a child
4. Tighten distance threshold there
Results
This allows softening the overall threshold
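A minimal sketch of the workaround (the center, radius, and threshold values are illustrative assumptions, not the production numbers):

```python
def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_threshold(embedding, child_center, radius=0.8, base=1.0, tightened=0.6):
    """Points inside the 'children' sphere get a stricter matching threshold;
    everyone else keeps the (softer) overall threshold."""
    inside = distance(embedding, child_center) < radius
    return tightened if inside else base

child_center = [0.0, 0.0]  # hypothetical average embedding of the children dataset
print(match_threshold([0.1, 0.0], child_center))  # 0.6 (inside the sphere)
print(match_threshold([2.0, 0.0], child_center))  # 1.0 (outside)
```

Because children's embeddings are compact, the sphere test is a cheap proxy for "is this a child", and the rest of the embedding space can then use a softer threshold.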
75. How to handle big dataset
It may seem we can keep adding data indefinitely, but we can't.
Problems
– Memory consumption (Softmax)
– Computational costs
– A lot of noise in gradients
77. Softmax Approximation
Algorithm
1. Perform K-Means clustering using current FR model
[Diagram: CNN → Embedding → two heads: a "Predict cluster" Softmax and a "Predict person within cluster" Softmax]
2. Two Softmax heads:
   1. Predicts the cluster label
   2. Predicts the class within the true cluster
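The two-head idea replaces one softmax over all N identities with a softmax over K clusters plus one over roughly N/K people inside the true cluster. A sketch of the combined loss (the logits here are toy values):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def two_head_loss(cluster_logits, within_logits, true_cluster, true_within):
    """Cross-entropy of the cluster head plus cross-entropy of the
    within-cluster head (computed only over the true cluster's people)."""
    p_cluster = softmax(cluster_logits)
    p_within = softmax(within_logits)
    return -math.log(p_cluster[true_cluster]) - math.log(p_within[true_within])

loss = two_head_loss([5.0, 0.0], [3.0, 0.0, 0.0], true_cluster=0, true_within=0)
```

Memory and compute drop from O(N) output weights touched per sample to roughly O(K + N/K), which is what makes very large identity counts tractable.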
85. Fixing trash clusters
Motivation
Softmax encourages large embedding norms, so «trash» detections end up with small norms while real faces get large ones
Results
– ROC AUC 97%
– Better than the Laplacian measure for blurry images
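Turning that observation into a filter is a one-liner: flag embeddings whose norm falls below a cutoff (the threshold value here is an illustrative assumption, tuned on validation data in practice):

```python
def is_trash(embedding, norm_threshold=0.5):
    """Softmax training pushes real faces toward large-norm embeddings,
    so a small norm flags a 'trash' (non-face) detection."""
    norm = sum(x * x for x in embedding) ** 0.5
    return norm < norm_threshold

print(is_trash([0.1, 0.1]))  # True  (norm ~0.14: likely trash)
print(is_trash([1.0, 1.0]))  # False (norm ~1.41: likely a real face)
```

Note this uses the raw, un-normalized embedding: the norm is discarded when computing cosine/Euclidean identity distances, which is exactly why it is free to carry this "confidence" signal.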
90. Summary
1. Use TensorRT to speed up inference
2. Metric learning: use Center loss by default
3. Clean your data thoroughly
4. Understanding CNN helps to fight errors
99. Histogram loss
Idea
– Compute similarities of positive and negative pairs
– Minimize the probability that a randomly sampled positive pair has a smaller similarity than a negative one
Loss = the integral of the product between the negative distribution and the cumulative density function for the positive distribution (shown with a dashed line)
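That integral can be estimated directly from similarity histograms. A sketch with hard bin assignment (the published loss uses soft, linearly-interpolated bins to stay differentiable; bin count and the [-1, 1] similarity range are assumptions):

```python
def histogram_loss(pos_sims, neg_sims, bins=10):
    """Estimate P(positive similarity < negative similarity) via histograms."""
    def hist(vals):
        h = [0.0] * bins
        for v in vals:
            i = min(int((v + 1) / 2 * bins), bins - 1)  # map [-1, 1] to a bin
            h[i] += 1.0 / len(vals)
        return h

    hp, hn = hist(pos_sims), hist(neg_sims)
    loss, cdf_pos = 0.0, 0.0
    for p, n in zip(hp, hn):
        cdf_pos += p          # running CDF of the positive distribution
        loss += n * cdf_pos   # negative density x positive CDF, summed over bins
    return loss

print(histogram_loss([0.9, 0.8], [-0.9, -0.8]))  # 0.0 (perfectly separated)
print(histogram_loss([-0.5], [0.5]))             # 1.0 (fully reversed)
```

Well-separated distributions give zero loss; the more the positive and negative similarity distributions overlap, the larger the estimated probability of a reversed pair.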
«Another approach – HOG features (dlib)» «not far from state-of-the-art (Google reports 94%)»
Off-topic speaker notes:
The first network is trained on 12×12 crops, the second on 24×24, the third on 36×36; hard-negative mining is used for nets 2 and 3 to fix the earlier nets' errors.
The first net outputs 5×1×1; on the image pyramid it outputs a full tensor, then a batch is assembled for nets 2 and 3 (we didn't do batch-over-batches).
https://kpzhang93.github.io/MTCNN_face_detection_alignment/paper/spl.pdf
«Open source: pre-trained model (Caffe), no original training code»
«…but not in Torch/PyTorch; we retrained the CNN because of the PReLU absence»
actively developing
«All faces of a person are projected to a single point»
https://ydwen.github.io/papers/WenECCV16.pdf – the 65% figure is from the paper
(Better than dlib)
«Gender, for example, is not discriminative – we tried it»
71.5% -> 75.5%, i.e. +2 points
71.5 vs 73; 75.5 is due to the dataset
«Mention that the norm is a division by the sum»
To discriminate better -> tighten clusters
http://proceedings.mlr.press/v48/liud16.pdf
«Talk about the idea inspired by a corporate party»
VK – easy API
OK – hard-to-access API, rate limits
Facebook – closed to mining
This leads to improved performance on Megaface (= 75%?)
https://arxiv.org/pdf/1609.04309.pdf - paper: “Efficient softmax approximation for GPUs”
«What is the benefit besides solving the problem?»
https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_gradients/py_gradients.html
It’s a face, but without enough facial information
“If the detector makes a mistake, the recognizer goes crazy. Detection errors often form a cluster.”
«Softmax tends to increase the scalar product»: maximizing ⟨w_i, x⟩ for class i
We normalize embeddings only for distance
“Social networks contain avatars spanning users' lives”