
PyConZA 2019 Keynote - Deep Neural Networks for Video Applications

Slides from my PyConZA 2019 Keynote on "Deep Neural Networks for Video Applications"

Don't be afraid of A.I. ... git clone a relevant function (deep learning model), fine-tune it for your use case if required and use it to build cool things! I also do consulting if you get stuck or need help @@@ numberboost.com :P

"Most CCTV video cameras exist as a sort of time machine for insurance purposes. Deep neural networks make it easy to convert video into actionable data which can be used to trigger real-time anomaly alerts and optimize complex business processes. In addition to commercial applications, deep learning can be used to analyze large amounts of video recorded from the point of view of animals to study complex behavior patterns impossible to otherwise analyze. This talk will present some theory of deep neural networks for video applications as well as academic research and several applied real-world industrial examples, with code examples in python."

Note: links are hard to click in SlideShare but are clickable if you download PDF :)

#deeplearning #machinelearning #deeplearningforvideo #convolutionalneuralnetworks #recurrentneuralnetworks #centroidtracking #objectdetection #deepfakes #poseestimation #videomachinelearning #numberboost

PyConZA 2019 Keynote - Deep Neural Networks for Video Applications

  1. 1. DEEP NEURAL NETWORKS for Video Applications. Alex Conway, alex @ numberboost.com. PyConZA Keynote 2019. Neither confidential nor proprietary - please distribute ;)
  2. 2. 2016 MultiChoice Innovation Competition 1st Prize Winners. 2017 Mercedes-Benz Innovation Competition 1st Prize Winners. 2018 Lloyd’s Register Innovation Competition 1st Prize Winners. 2019 NTT & Dimension Data Innovation Competition 1st Prize Winners.
  3. 3. HANDS UP! 🙌
  4. 4. https://www.youtube.com/watch?v=Gz0QZP2RKWA
  5. 5. https://twitter.com/goodfellow_ian/status/1084973596236144640
  6. 6. https://twitter.com/quasimondo/status/1100016467213516801
  7. 7. https://www.youtube.com/watch?feature=youtu.be&v=r6zZPn-6dPY&app=desktop
  8. 8. ORIGINAL FILM: Rear Window (1954). PIX2PIX MODEL OUTPUT: Fully Automated. RE-MASTERED BY HAND: Painstakingly. https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
  9. 9. INPUT OUTPUT ORIGINAL https://arstechnica.com/information-technology/2017/02/google-brain-super-resolution-zoom-enhance/
  10. 10. https://techcrunch.com/2016/06/20/twitter-is-buying-magic-pony-technology-which-uses-neural-networks- to-improve-images/
  11. 11. https://arxiv.org/abs/1508.06576 CONTENT IMAGE STYLE IMAGE STYLE TRANSFER OUTPUT + =
  12. 12. https://github.com/junyanz/CycleGAN
  13. 13. https://news.developer.nvidia.com/ai-can-transform-anyone-into-a-professional-dancer/
  14. 14. https://github.com/JoYoungjoo/SC-FEGAN
  15. 15. https://www.linkedin.com/feed/update/urn:li:activity:6498172448196820993
  16. 16. https://motherboard.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn
  17. 17. https://www.youtube.com/watch?v=MVBe6_o4cMI
  18. 18. https://twitter.com/XHNews/status/1098173090448629760
  19. 19. https://www.youtube.com/watch?v=aE1kA0Jy0Xg
  20. 20. https://www.youtube.com/watch?v=xhp47v5OBXQ
  21. 21. https://www.reddit.com/r/Cyberpunk/comments/ddplms/hk_wearable_face_projector_to_avoid_face/
  22. 22. https://twitter.com/x0rz/status/1104744170529439744
  23. 23. f (video) = useful data
  24. 24. f (video) = useful data
  25. 25. f (video) = clip label
  26. 26. f (video) = frame label
  27. 27. f (video) = object count
  28. 28. f (video) = object activity
  29. 29. f (video) = object poses
  30. 30. f (video) = facial expressions
  31. 31. f (video) = higher res video
  32. 32. f (video) = video with new faces
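
All of the above are just functions applied to a stack of frames. A rough Python sketch of what f(video) looks like in practice, assuming OpenCV; "cctv_clip.mp4" and my_model are placeholders, not files or models from the talk:

import cv2  # OpenCV

def f(video_path, model):
    """Apply a per-frame model to a video and collect the results."""
    cap = cv2.VideoCapture(video_path)
    results = []
    while True:
        ok, frame = cap.read()          # frame is an HxWx3 numpy array (BGR)
        if not ok:
            break
        results.append(model(frame))    # e.g. a classifier, detector or pose estimator
    cap.release()
    return results

# useful_data = f("cctv_clip.mp4", my_model)
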
  33. 33. Neural Networks Crash Course
  34. 34. NEURAL NETWORKS A set of neurons with randomly initialized weights and non-linear activation functions, connected in a network, whose weights are optimized (learned) using training data to minimize prediction error
  35. 35. http://playground.tensorflow.org
  36. 36. WHAT IS A NEURON?
  37. 37. LINEAR
  38. 38. NON-LINEAR
  39. 39. NON-LINEAR ACTIVATION FUNCTIONS TanhSigmoid ReLU
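
For reference, the three activation functions on the slide in NumPy (a quick sketch, not code from the deck):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes values to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes values to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for negative inputs, identity otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))
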
  40. 40. (DEEP) NEURAL NETWORKS Inputs, hidden layer 1, hidden layer 2, hidden layer 3, outputs. Note: outputs of one layer are inputs into the next layer. This (non-convolutional) architecture is called a “multi-layered perceptron”
  41. 41. HOW DOES A NEURAL NETWORK LEARN? New weight = Old weight - (Learning rate × “How much error increases when we increase this weight”)
  42. 42. GRADIENT DESCENT http://scs.ryerson.ca/~aharley/neural-networks/
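
A tiny self-contained sketch of that update rule (a made-up 1D example, not from the talk): learn the weight w in y = w * x by repeatedly stepping against the gradient of the error.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # the true weight is 2.0
w = np.random.randn()            # randomly initialized weight
learning_rate = 0.01

for step in range(200):
    y_pred = w * x
    error = np.mean((y_pred - y) ** 2)          # prediction error (MSE)
    gradient = np.mean(2 * (y_pred - y) * x)    # "how much error increases when we increase w"
    w = w - learning_rate * gradient            # the update rule from the slide

print(w)  # converges to roughly 2.0
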
  43. 43. Scalar: 1. Vector: [1, 3, 3, 7, …]. Matrix: [[1, 2, 3], [3, 2, 1], [3, 4, 5], [7, 8, 9], …]. Tensor: a stack of such matrices.
  44. 44. image tensor 500 x 500 x 3 = 750,000 numbers. 60 second video at 10 FPS tensor 500 x 500 x 3 x 10 x 60 = 450,000,000 numbers
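
A quick sanity check of those sizes (illustrative only):

import numpy as np

frame_shape = (500, 500, 3)               # one RGB frame
video_shape = (500, 500, 3, 10 * 60)      # 60 seconds at 10 FPS
print(np.prod(frame_shape))               # 750000
print(np.prod(video_shape))               # 450000000
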
  45. 45. Convolutional Neural Networks (CNNs)
  46. 46. INPUT 28 x 28 pixel grayscale images = 784 numbers
  47. 47. 2 LAYER NEURAL NETWORK 0 1 2 3 4 5 6 7 8 9
  48. 48. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK
  49. 49. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK
  50. 50. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK
  51. 51. https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py (99.25% test accuracy in 192 seconds and 46 lines of code)
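
A condensed sketch in the same spirit as the linked example, using tf.keras; it is shortened here, so it will not exactly reproduce the repo's 99.25% / 46-line version:

import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0    # (60000, 28, 28, 1), scaled to [0, 1]
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),      # one output per digit 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))
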
  52. 52. 3 KEY CONVOLUTIONAL NETWORK ARCHITECTURE IDEAS: 1. Local receptive fields 2. Shared weights 3. Subsampling
  53. 53. VGGNet
  54. 54. http://setosa.io/ev/image-kernels
  55. 55. http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
  56. 56. 79
  57. 57. Zeiler, M.D. and Fergus, R., 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833).
  58. 58. Zeiler, M.D. and Fergus, R., 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833).
  59. 59. Convolutional Nets Learn Hierarchical Features
  60. 60. SUBSAMPLING aka “POOLING”
  61. 61. VGGNet
  62. 62. we need labelled training data
  63. 63. ImageNet: 14,197,122 images, 21,841 synsets indexed. ILSVRC: 1,200,000 images, 1,000 categories.
  64. 64. ImageNet
  65. 65. ImageNet
  66. 66. IMAGENET TOP-5 ERROR RATE Traditional Image Processing Methods, AlexNet 8 Layers, ZFNet 8 Layers, GoogLeNet 22 Layers, ResNet 152 Layers, SENet Ensemble, TSNet Ensemble
  67. 67. https://arxiv.org/abs/1611.01578
  68. 68. https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
  69. 69. Example: Use CNN to Classify Product Images https://github.com/alexcnwy/DeepLearning4ComputerVision
  70. 70. 97
  71. 71. TRANSFER LEARNING 🎉
  72. 72. USING A CNN AS A FEATURE EXTRACTOR Feature Extractor (“ENCODER”) Classifier
  73. 73. Extracting Features from an Image
  74. 74. feature vector =f ( )
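
Concretely, f here can be a pretrained CNN with its classifier chopped off. A rough sketch with VGG16 in tf.keras ("product.jpg" is a placeholder image path, not a file from the talk):

import tensorflow as tf

encoder = tf.keras.applications.VGG16(weights="imagenet", include_top=False, pooling="avg")

image = tf.keras.preprocessing.image.load_img("product.jpg", target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(image)[None, ...]   # add batch dimension
x = tf.keras.applications.vgg16.preprocess_input(x)

feature_vector = encoder.predict(x)   # shape (1, 512)
print(feature_vector.shape)
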
  75. 75. Adding a New Classifier
  76. 76. Fine-tuning A CNN To Solve A New Problem 96.3% accuracy in under 2 minutes for classifying products into categories (WITH ONLY 3467 TRAINING IMAGES!!1!)
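
A minimal sketch of that setup (assumed layer sizes, not the exact notebook from the linked repo): freeze the pretrained encoder and train a small new classifier head on your own labelled images.

import tensorflow as tf

num_classes = 10   # placeholder: however many product categories you have

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False   # train only the new head first; optionally unfreeze later to fine-tune

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)   # a few thousand labelled images go a long way
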
  77. 77. https://www.youtube.com/watch?v=X4Q6C915sUY
  78. 78. https://www.pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/
  79. 79. IMAGE & VIDEO MODERATION TODO
  80. 80. Object Detection
  81. 81. https://www.youtube.com/watch?v=VOC3huqHrss
  82. 82. 1.5 million object instances 80 object categories http://cocodataset.org
  83. 83. https://github.com/tensorflow/models/blob/master/research /object_detection/g3doc/detection_model_zoo.md
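
The slides use the TensorFlow Object Detection API model zoo linked above; as a different-library illustration of the same idea, here is a rough sketch with torchvision's COCO-pretrained Faster R-CNN ("frame.jpg" is a placeholder):

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
img = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    out = model([img])[0]                 # dict with "boxes", "labels", "scores"
for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
    if score > 0.5:                       # keep confident detections only
        print(label.item(), score.item(), box.tolist())
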
  84. 84. DEMO (HOLD THUMBS) 😅
  85. 85. https://github.com/tzutalin/labelImg CUSTOM OBJECT DETECTION
  86. 86. https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
  87. 87. CNN … P(A) = 0.005 P(B) = 0.002 P(C) = 0.98 P(9) = 0.001 P(0) = 0.03
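
Those per-class probabilities come from a softmax over the network's final outputs; a quick sketch with made-up logits (the class names and values are placeholders):

import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.0, 0.1, 5.0, -1.0, 2.0])   # placeholder scores for classes A, B, C, 9, 0
print(softmax(logits))                           # the largest logit dominates, here P(C) is about 0.93
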
  88. 88. https://www.reddit.com/r/southafrica/comments/asl4n5/when_a_little_is_just_not_enough/
  89. 89. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ ID #2 ID #1 “CENTROID TRACKING”
  90. 90. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”
  91. 91. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING” For each object with an ID in frame t, compute the distance to the centroid of every object in frame t + 1 and assign the same ID provided the distance is less than a threshold, else assign a new ID (sketched in code below)
  92. 92. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”
  93. 93. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ ID #1 ID #2 “CENTROID TRACKING”
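
A stripped-down sketch of that rule (the linked pyimagesearch post has the full version, including handling for objects that disappear for a few frames):

import numpy as np

next_id = 0
tracks = {}   # object ID -> centroid (x, y)

def update(tracks, detections, max_distance=50.0):
    """Assign detected centroids in frame t+1 to existing IDs from frame t."""
    global next_id
    new_tracks = {}
    unmatched = list(detections)
    for obj_id, centroid in tracks.items():
        if not unmatched:
            break
        dists = [np.linalg.norm(np.array(centroid) - np.array(d)) for d in unmatched]
        j = int(np.argmin(dists))
        if dists[j] < max_distance:          # same object if it moved less than the threshold
            new_tracks[obj_id] = unmatched.pop(j)
    for d in unmatched:                      # anything left over is a new object
        new_tracks[next_id] = d
        next_id += 1
    return new_tracks

tracks = update(tracks, [(10, 10), (200, 50)])                 # frame t
tracks = update(tracks, [(14, 12), (205, 55), (400, 300)])     # frame t + 1
print(tracks)   # IDs 0 and 1 carried over, ID 2 is new
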
  94. 94. https://www.youtube.com/watch?v=FfU22I-_dI4
  95. 95. https://www.youtube.com/watch?v=NW-rXqCl7us
  96. 96. Recurrent Neural Networks (RNNs)
  97. 97. 144
  98. 98. SPATIO-TEMPORAL
  99. 99. SPORTS-1M
  100. 100. SPATIAL … THEN TEMPORAL
  101. 101. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  102. 102. feature vector =f ( )
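
So the "spatial then temporal" recipe is: run a CNN encoder over each frame to get a sequence of feature vectors, then feed that sequence to an LSTM. A sketch with assumed shapes (e.g. 7 action classes, as in the frame-level example a few slides on):

import tensorflow as tf

num_frames, feat_dim, num_classes = 60, 512, 7   # assumed sizes

# per-frame feature vectors extracted with a pretrained CNN encoder (as above)
frame_features = tf.keras.Input(shape=(num_frames, feat_dim))
x = tf.keras.layers.LSTM(128)(frame_features)              # temporal model over the sequence
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

video_model = tf.keras.Model(frame_features, outputs)
video_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
video_model.summary()
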
  103. 103. Frame model accuracy <<< Video model accuracy
  104. 104. https://i.imgur.com/mGXdpdp.gifv
  105. 105. Frame-level Action Recognition (7 classes)
  106. 106. Frame model accuracy <<< Video model accuracy
  107. 107. https://github.com/alexcnwy/Deep-Neural-Networks-for-Video-Classification
  108. 108. MORE (CRAZY) APPLICATIONS
  109. 109. VIDEO Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io
  110. 110. VIDEO Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io
  111. 111. VIDEO Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io
  112. 112. FACE SWAP https://www.youtube.com/watch?v=7XchCsYtYMQ
  113. 113. FACE SWAP https://www.youtube.com/watch?v=7XchCsYtYMQ Detect the face & crop it, then run the “face swap” model… 2 networks: same CNN encoder, different decoders. Feed the image to the encoder to create a vector of the input face, then feed that vector to decoder B to produce the output face (code sketch below)
  114. 114. https://github.com/wuhuikai/FaceSwap FACE SWAP
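
In code, the two-network idea above looks roughly like this; layer sizes are illustrative guesses, not the architecture from the linked repos:

import tensorflow as tf
from tensorflow.keras import layers

def make_encoder():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 3)),
        layers.Conv2D(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 5, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(256),                                   # the shared "face vector"
    ])

def make_decoder():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(256,)),
        layers.Dense(16 * 16 * 128, activation="relu"),
        layers.Reshape((16, 16, 128)),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid"),
    ])

encoder = make_encoder()                                     # shared between both identities
decoder_a, decoder_b = make_decoder(), make_decoder()
autoencoder_a = tf.keras.Sequential([encoder, decoder_a])    # trained to reconstruct face A
autoencoder_b = tf.keras.Sequential([encoder, decoder_b])    # trained to reconstruct face B
# At inference time: encode a frame of face A, decode with decoder_b to get the swapped face.
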
  115. 115. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models https://www.youtube.com/watch?v=p1b5aiTrGzY
  116. 116. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models https://www.youtube.com/watch?v=p1b5aiTrGzY
  117. 117. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Network 1: CNN embedder compresses faces & landmarks to vector
  118. 118. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Network 2: Generator takes landmarks and synthesizes photo
  119. 119. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Network 3: Discriminator learns to tell apart real and synthesized photos
  120. 120. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  121. 121. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  122. 122. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  123. 123. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  124. 124. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM
  125. 125. https://github.com/CMU-Perceptual-Computing-Lab/openpose
  126. 126. https://www.affectiva.com/product/affectiva- automotive-ai-for-driver-monitoring-solutions/ DISTRACTED DRIVING DETECTION
  127. 127. SELF-DRIVING CARS https://www.youtube.com/watch?v=nuMQ4LNMWu8
  128. 128. https://arstechnica.com/cars/2019/08/elon-musk-says- driverless-cars-dont-need-lidar-experts-arent-so-sure/
  129. 129. REMEMBER 💡
  130. 130. f (video) = useful data
  131. 131. Don’t be scared to git clone functions and use deep learning!
  132. 132. Deep Learning Indaba http://www.deeplearningindaba.com Jeremy Howard & Rachel Thomas http://course.fast.ai Andrej Karpathy’s Class on Computer Vision http://cs231n.github.io Richard Socher’s Class on NLP (great RNN resource) http://web.stanford.edu/class/cs224n/ Keras docs https://keras.io/ GREAT FREE RESOURCES
  133. 133. THANK YOU! @alxcnwy alex @ numberboost.com
