
PyConZA 2019 Keynote - Deep Neural Networks for Video Applications

Slides from my PyConZA 2019 Keynote on "Deep Neural Networks for Video Applications"

Don't be afraid of A.I. ... git clone a relevant function (deep learning model), fine-tune it for your use case if required and use it to build cool things! I also do consulting if you get stuck or need help @@@ numberboost.com :P

"Most CCTV video cameras exist as a sort of time machine for insurance purposes. Deep neural networks make it easy to convert video into actionable data which can be used to trigger real-time anomaly alerts and optimize complex business processes. In addition to commercial applications, deep learning can be used to analyze large amounts of video recorded from the point of view of animals to study complex behavior patterns impossible to otherwise analyze. This talk will present some theory of deep neural networks for video applications as well as academic research and several applied real-world industrial examples, with code examples in python."

Note: links are hard to click in SlideShare but are clickable if you download PDF :)

#deeplearning #machinelearning #deeplearningforvideo #convolutionalneuralnetworks #recurrentneuralnetworks #centroidtracking #objectdetection #deepfakes #poseestimation #videomachinelearning #numberboost

PyConZA 2019 Keynote - Deep Neural Networks for Video Applications

  1. 1. DEEP NEURAL NETWORKS for Video Applications. Alex Conway, alex @ numberboost.com. PyConZA Keynote 2019. Neither confidential nor proprietary - please distribute ;)
  2. 2. 2016 MultiChoice Innovation Competition 1st Prize Winners. 2017 Mercedes-Benz Innovation Competition 1st Prize Winners. 2018 Lloyd’s Register Innovation Competition 1st Prize Winners. 2019 NTT & Dimension Data Innovation Competition 1st Prize Winners.
  3. 3. HANDS UP! 🙌
  4. 4. https://www.youtube.com/watch?v=Gz0QZP2RKWA
  5. 5. https://twitter.com/goodfellow_ian/status/1084973596236144640
  6. 6. https://twitter.com/quasimondo/status/1100016467213516801
  7. 7. https://www.youtube.com/watch?feature=youtu.be&v=r6zZPn-6dPY&app=desktop
  8. 8. ORIGINAL FILM: Rear Window (1954). PIX2PIX MODEL OUTPUT: Fully Automated. RE-MASTERED BY HAND: Painstakingly. https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
  9. 9. INPUT OUTPUT ORIGINAL https://arstechnica.com/information-technology/2017/02/google-brain-super-resolution-zoom-enhance/
  10. 10. https://techcrunch.com/2016/06/20/twitter-is-buying-magic-pony-technology-which-uses-neural-networks- to-improve-images/
  11. 11. https://arxiv.org/abs/1508.06576 CONTENT IMAGE STYLE IMAGE STYLE TRANSFER OUTPUT + =
  12. 12. https://github.com/junyanz/CycleGAN
  13. 13. https://news.developer.nvidia.com/ai-can-transform-anyone-into-a-professional-dancer/
  14. 14. https://github.com/JoYoungjoo/SC-FEGAN
  15. 15. https://www.linkedin.com/feed/update/urn:li:activity:6498172448196820993
  16. 16. https://motherboard.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn
  17. 17. https://www.youtube.com/watch?v=MVBe6_o4cMI
  18. 18. https://twitter.com/XHNews/status/1098173090448629760
  19. 19. https://www.youtube.com/watch?v=aE1kA0Jy0Xg
  20. 20. https://www.youtube.com/watch?v=xhp47v5OBXQ
  21. 21. https://www.reddit.com/r/Cyberpunk/comments/ddplms/hk_wearable_face_projector_to_avoid_face/
  22. 22. https://twitter.com/x0rz/status/1104744170529439744
  23. 23. f (video) = useful data
  24. 24. f (video) = useful data
  25. 25. f (video) = clip label
  26. 26. f (video) = frame label
  27. 27. f (video) = object count
  28. 28. f (video) = object activity
  29. 29. f (video) = object poses
  30. 30. f (video) = facial expressions
  31. 31. f (video) = higher res video
  32. 32. f (video) = video with new faces
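
All of the above are just functions applied to a stack of frames. A rough Python sketch of what f(video) looks like in practice, assuming OpenCV; "cctv_clip.mp4" and my_model are placeholders, not files or models from the talk:

import cv2  # OpenCV

def f(video_path, model):
    """Apply a per-frame model to a video and collect the results."""
    cap = cv2.VideoCapture(video_path)
    results = []
    while True:
        ok, frame = cap.read()          # frame is an HxWx3 numpy array (BGR)
        if not ok:
            break
        results.append(model(frame))    # e.g. a classifier, detector or pose estimator
    cap.release()
    return results

# useful_data = f("cctv_clip.mp4", my_model)
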
  33. 33. Neural Networks Crash Course
  34. 34. NEURAL NETWORKS A set of neurons with randomly initialized weights and non-linear activation functions, connected in a network, whose weights are optimized (learned) using training data to minimize prediction error
  35. 35. http://playground.tensorflow.org
  36. 36. WHAT IS A NEURON?
  37. 37. LINEAR
  38. 38. NON-LINEAR
  39. 39. NON-LINEAR ACTIVATION FUNCTIONS TanhSigmoid ReLU
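
For reference, the three activation functions on the slide in NumPy (a quick sketch, not code from the deck):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes values to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes values to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for negative inputs, identity otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))
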
  40. 40. (DEEP) NEURAL NETWORKS Inputs, hidden layer 1, hidden layer 2, hidden layer 3, outputs. Note: outputs of one layer are inputs into the next layer. This (non-convolutional) architecture is called a “multi-layered perceptron”
  41. 41. HOW DOES A NEURAL NETWORK LEARN? New weight = Old weight - (Learning rate × “How much error increases when we increase this weight”)
  42. 42. GRADIENT DESCENT http://scs.ryerson.ca/~aharley/neural-networks/
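
A tiny self-contained sketch of that update rule (a made-up 1D example, not from the talk): learn the weight w in y = w * x by repeatedly stepping against the gradient of the error.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # the true weight is 2.0
w = np.random.randn()            # randomly initialized weight
learning_rate = 0.01

for step in range(200):
    y_pred = w * x
    error = np.mean((y_pred - y) ** 2)          # prediction error (MSE)
    gradient = np.mean(2 * (y_pred - y) * x)    # "how much error increases when we increase w"
    w = w - learning_rate * gradient            # the update rule from the slide

print(w)  # converges to roughly 2.0
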
  43. 43. Scalar: 1. Vector: [1, 3, 3, 7, …]. Matrix: [[1, 2, 3], [3, 2, 1], [3, 4, 5], [7, 8, 9], …]. Tensor: a stack of such matrices.
  44. 44. image tensor 500 x 500 x 3 = 750,000 numbers. 60 second video at 10 FPS tensor 500 x 500 x 3 x 10 x 60 = 450,000,000 numbers
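
A quick sanity check of those sizes (illustrative only):

import numpy as np

frame_shape = (500, 500, 3)               # one RGB frame
video_shape = (500, 500, 3, 10 * 60)      # 60 seconds at 10 FPS
print(np.prod(frame_shape))               # 750000
print(np.prod(video_shape))               # 450000000
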
  45. 45. Convolutional Neural Networks (CNNs)
  46. 46. INPUT 28 x 28 pixel grayscale images = 784 numbers
  47. 47. 2 LAYER NEURAL NETWORK 0 1 2 3 4 5 6 7 8 9
  48. 48. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK
  49. 49. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK
  50. 50. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK
  51. 51. https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py (99.25% test accuracy in 192 seconds and 46 lines of code)
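
A condensed sketch in the same spirit as the linked example, using tf.keras; it is shortened here, so it will not exactly reproduce the repo's 99.25% / 46-line version:

import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0    # (60000, 28, 28, 1), scaled to [0, 1]
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),      # one output per digit 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))
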
  52. 52. 3 KEY CONVOLUTIONAL NETWORK ARCHITECTURE IDEAS: 1. Local receptive fields 2. Shared weights 3. Subsampling
  53. 53. VGGNet
  54. 54. http://setosa.io/ev/image-kernels
  55. 55. http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
  56. 56. 79
  57. 57. Zeiler, M.D. and Fergus, R., 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833).
  58. 58. Zeiler, M.D. and Fergus, R., 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833).
  59. 59. Convolutional Nets Learn Hierarchical Features
  60. 60. SUBSAMPLING aka “POOLING”
  61. 61. VGGNet
  62. 62. we need labelled training data
  63. 63. ImageNet: 14,197,122 images, 21,841 synsets indexed. ILSVRC: 1,200,000 images, 1,000 categories.
  64. 64. ImageNet
  65. 65. ImageNet
  66. 66. IMAGENET TOP-5 ERROR RATE Traditional Image Processing Methods, AlexNet 8 Layers, ZFNet 8 Layers, GoogLeNet 22 Layers, ResNet 152 Layers, SENet Ensemble, TSNet Ensemble
  67. 67. https://arxiv.org/abs/1611.01578
  68. 68. https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
  69. 69. Example: Use CNN to Classify Product Images https://github.com/alexcnwy/DeepLearning4ComputerVision
  70. 70. 97
  71. 71. TRANSFER LEARNING 🎉
  72. 72. USING A CNN AS A FEATURE EXTRACTOR Feature Extractor (“ENCODER”) Classifier
  73. 73. Extracting Features from an Image
  74. 74. feature vector =f ( )
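
Concretely, f here can be a pretrained CNN with its classifier chopped off. A rough sketch with VGG16 in tf.keras ("product.jpg" is a placeholder image path, not a file from the talk):

import tensorflow as tf

encoder = tf.keras.applications.VGG16(weights="imagenet", include_top=False, pooling="avg")

image = tf.keras.preprocessing.image.load_img("product.jpg", target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(image)[None, ...]   # add batch dimension
x = tf.keras.applications.vgg16.preprocess_input(x)

feature_vector = encoder.predict(x)   # shape (1, 512)
print(feature_vector.shape)
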
  75. 75. Adding a New Classifier
  76. 76. Fine-tuning A CNN To Solve A New Problem 96.3% accuracy in under 2 minutes for classifying products into categories (WITH ONLY 3467 TRAINING IMAGES!!1!)
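
A minimal sketch of that setup (assumed layer sizes, not the exact notebook from the linked repo): freeze the pretrained encoder and train a small new classifier head on your own labelled images.

import tensorflow as tf

num_classes = 10   # placeholder: however many product categories you have

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False   # train only the new head first; optionally unfreeze later to fine-tune

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)   # a few thousand labelled images go a long way
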
  77. 77. https://www.youtube.com/watch?v=X4Q6C915sUY
  78. 78. https://www.pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/
  79. 79. IMAGE & VIDEO MODERATION TODO
  80. 80. Object Detection
  81. 81. https://www.youtube.com/watch?v=VOC3huqHrss
  82. 82. 1.5 million object instances 80 object categories http://cocodataset.org
  83. 83. https://github.com/tensorflow/models/blob/master/research /object_detection/g3doc/detection_model_zoo.md
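
The slides use the TensorFlow Object Detection API model zoo linked above; as a different-library illustration of the same idea, here is a rough sketch with torchvision's COCO-pretrained Faster R-CNN ("frame.jpg" is a placeholder):

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
img = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    out = model([img])[0]                 # dict with "boxes", "labels", "scores"
for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
    if score > 0.5:                       # keep confident detections only
        print(label.item(), score.item(), box.tolist())
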
  84. 84. DEMO (HOLD THUMBS) 😅
  85. 85. https://github.com/tzutalin/labelImg CUSTOM OBJECT DETECTION
  86. 86. https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
  87. 87. CNN … P(A) = 0.005 P(B) = 0.002 P(C) = 0.98 P(9) = 0.001 P(0) = 0.03
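
Those per-class probabilities come from a softmax over the network's final outputs; a quick sketch with made-up logits (the class names and values are placeholders):

import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.0, 0.1, 5.0, -1.0, 2.0])   # placeholder scores for classes A, B, C, 9, 0
print(softmax(logits))                           # the largest logit dominates, here P(C) is about 0.93
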
  88. 88. https://www.reddit.com/r/southafrica/comments/asl4n5/when_a_little_is_just_not_enough/
  89. 89. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ ID #2 ID #1 “CENTROID TRACKING”
  90. 90. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”
  91. 91. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING” For each object with an ID in frame t, compute the distance to the centroid of every object in frame t + 1 and assign the same ID provided the distance is less than a threshold, else assign a new ID (sketched in code below)
  92. 92. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”
  93. 93. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ ID #1 ID #2 “CENTROID TRACKING”
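
A stripped-down sketch of that rule (the linked pyimagesearch post has the full version, including handling for objects that disappear for a few frames):

import numpy as np

next_id = 0
tracks = {}   # object ID -> centroid (x, y)

def update(tracks, detections, max_distance=50.0):
    """Assign detected centroids in frame t+1 to existing IDs from frame t."""
    global next_id
    new_tracks = {}
    unmatched = list(detections)
    for obj_id, centroid in tracks.items():
        if not unmatched:
            break
        dists = [np.linalg.norm(np.array(centroid) - np.array(d)) for d in unmatched]
        j = int(np.argmin(dists))
        if dists[j] < max_distance:          # same object if it moved less than the threshold
            new_tracks[obj_id] = unmatched.pop(j)
    for d in unmatched:                      # anything left over is a new object
        new_tracks[next_id] = d
        next_id += 1
    return new_tracks

tracks = update(tracks, [(10, 10), (200, 50)])                 # frame t
tracks = update(tracks, [(14, 12), (205, 55), (400, 300)])     # frame t + 1
print(tracks)   # IDs 0 and 1 carried over, ID 2 is new
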
  94. 94. https://www.youtube.com/watch?v=FfU22I-_dI4
  95. 95. https://www.youtube.com/watch?v=NW-rXqCl7us
  96. 96. Recurrent Neural Networks (RNNs)
  97. 97. 144
  98. 98. SPATIO-TEMPORAL
  99. 99. SPORTS-1M
  100. 100. SPATIAL … THEN TEMPORAL
  101. 101. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  102. 102. feature vector =f ( )
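
So the "spatial then temporal" recipe is: run a CNN encoder over each frame to get a sequence of feature vectors, then feed that sequence to an LSTM. A sketch with assumed shapes (e.g. 7 action classes, as in the frame-level example a few slides on):

import tensorflow as tf

num_frames, feat_dim, num_classes = 60, 512, 7   # assumed sizes

# per-frame feature vectors extracted with a pretrained CNN encoder (as above)
frame_features = tf.keras.Input(shape=(num_frames, feat_dim))
x = tf.keras.layers.LSTM(128)(frame_features)              # temporal model over the sequence
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

video_model = tf.keras.Model(frame_features, outputs)
video_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
video_model.summary()
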
  103. 103. Frame model accuracy <<< Video model accuracy
  104. 104. https://i.imgur.com/mGXdpdp.gifv
  105. 105. Frame-level Action Recognition (7 classes)
  106. 106. Frame model accuracy <<< Video model accuracy
  107. 107. https://github.com/alexcnwy/Deep-Neural-Networks-for-Video-Classification
  108. 108. MORE (CRAZY) APPLICATIONS
  109. 109. VIDEO Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io
  110. 110. VIDEO Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io
  111. 111. VIDEO Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io
  112. 112. FACE SWAP https://www.youtube.com/watch?v=7XchCsYtYMQ
  113. 113. FACE SWAP https://www.youtube.com/watch?v=7XchCsYtYMQ Detect the face & crop it, then run the “face swap” model… 2 networks: same CNN encoder, different decoders. Feed the image to the encoder to create a vector of the input face, then feed that vector to decoder B to produce the output face (code sketch below)
  114. 114. https://github.com/wuhuikai/FaceSwap FACE SWAP
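
In code, the two-network idea above looks roughly like this; layer sizes are illustrative guesses, not the architecture from the linked repos:

import tensorflow as tf
from tensorflow.keras import layers

def make_encoder():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 3)),
        layers.Conv2D(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 5, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(256),                                   # the shared "face vector"
    ])

def make_decoder():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(256,)),
        layers.Dense(16 * 16 * 128, activation="relu"),
        layers.Reshape((16, 16, 128)),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid"),
    ])

encoder = make_encoder()                                     # shared between both identities
decoder_a, decoder_b = make_decoder(), make_decoder()
autoencoder_a = tf.keras.Sequential([encoder, decoder_a])    # trained to reconstruct face A
autoencoder_b = tf.keras.Sequential([encoder, decoder_b])    # trained to reconstruct face B
# At inference time: encode a frame of face A, decode with decoder_b to get the swapped face.
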
  115. 115. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models https://www.youtube.com/watch?v=p1b5aiTrGzY
  116. 116. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models https://www.youtube.com/watch?v=p1b5aiTrGzY
  117. 117. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Network 1: CNN embedder compresses faces & landmarks to vector
  118. 118. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Network 2: Generator takes landmarks and synthesizes photo
  119. 119. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Network 3: Discriminator learns to tell apart real and synthesized photos
  120. 120. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  121. 121. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  122. 122. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  123. 123. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  124. 124. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  125. 125. https://github.com/CMU-Perceptual-Computing-Lab/openpose
  126. 126. https://www.affectiva.com/product/affectiva- automotive-ai-for-driver-monitoring-solutions/ DISTRACTED DRIVING DETECTION
  127. 127. SELF-DRIVING CARS https://www.youtube.com/watch?v=nuMQ4LNMWu8
  128. 128. https://arstechnica.com/cars/2019/08/elon-musk-says- driverless-cars-dont-need-lidar-experts-arent-so-sure/
  129. 129. REMEMBER 💡
  130. 130. f (video) = useful data
  131. 131. Don’t be scared to git clone functions and use deep learning!
  132. 132. Deep Learning Indaba http://www.deeplearningindaba.com Jeremy Howard & Rachel Thomas http://course.fast.ai Andrej Karpathy’s Class on Computer Vision http://cs231n.github.io Richard Socher’s Class on NLP (great RNN resource) http://web.stanford.edu/class/cs224n/ Keras docs https://keras.io/ GREAT FREE RESOURCES
  133. 133. THANK YOU! @alxcnwy alex @ numberboost.com
