
Grokking TechTalk #21: Deep Learning in Computer Vision


Deep Learning in Computer Vision

The field of deep learning is growing rapidly and surpassing traditional approaches to machine learning and pattern recognition. Do you know how Machine Learning/Deep Learning is being applied in Computer Vision?

In this talk, you will learn about:
- How ML/DL are being applied in Computer Vision (a very old field!)
- A comparison between the older methods/algorithms of Computer Vision and the new approaches using Deep Learning
- How DL in CV is done in practice (vs. theory)

Speaker: Dang Huynh, Researcher @ Axon Research

About speaker:
Dang Huynh is a researcher in Machine Vision at Axon Research. Prior to joining Axon in 2017, Dang was a researcher in Machine Vision at Misfit where he focused on video-based biometric technology. Before Misfit, he was a research engineer at Nokia Bell Labs, in the Mathematics of Dynamic Systems department from 2011 to 2015. His research interests include machine vision, data science and telecommunication systems. Dang received his Ph.D. degree in Computer Science from University Pierre and Marie Curie (UPMC), France.


Grokking TechTalk #21: Deep Learning in Computer Vision

  1. Deep Learning in Computer Vision. Axon@Grokking, Oct. 28, 2017
  2. Dang Huynh. Education: Ph.D. in Computer Science (France). Work: Jan 2017 – now: Axon Enterprise; 2015 – 2016: Misfit; 2011 – 2015: Nokia Bell Labs. Research domains: machine vision, data science, telecommunication systems.
  3. != We are AXON!
  4. Outline: Refresh; Computer vision; Deep learning in computer vision; Theory vs. reality; Demo
  5. Refresh: Machine learning and deep learning
  6. Machine learning: input data → prediction model → output label. y = F(x): given a new input x0, what is y0? [Plot: y against x]
  7. Machine Learning: $y = 4x_1^3 - 2x_2^2 + 8$. [Network diagram: inputs $x_1$, $x_2$ and a constant +1; units $f(x) = x^3$ and $f(x) = x^2$; output y with weights 4, -2 and 8; other edge labels: weight=1, 0, 0, 1]
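
The model on slide 7 can be read as a small fixed-weight network: one unit cubes x1, another squares x2, and the output combines them with weights 4 and -2 plus a bias weight of 8. A minimal sketch, assuming that reading of the diagram (the name `predict` is illustrative and does not come from the talk):

```python
def predict(x1: float, x2: float) -> float:
    """Evaluate y = 4*x1^3 - 2*x2^2 + 8 as a tiny fixed-weight network."""
    h1 = x1 ** 3   # unit with f(x) = x^3, fed by x1
    h2 = x2 ** 2   # unit with f(x) = x^2, fed by x2
    bias = 1       # the constant +1 input
    return 4 * h1 - 2 * h2 + 8 * bias


print(predict(1, 2))  # 4*1 - 2*4 + 8 = 4
```

In a learned model the weights 4, -2 and 8 would be fitted from data rather than written by hand.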
  8. Machine Learning challenges: relevant data acquisition; data preprocessing; feature selection; model selection (simplicity versus complexity); result interpretation.
  9. Deep Learning: machine learning with many (deep) hidden layers. [Network diagram: input layer (x1, x2, +1), hidden layers (each with a +1 input), output layer (y1, y2)]
  10. Why deep learning? [Plot: performance against amount of data, with one curve for deep learning and one for machine learning]
  11. Computer Vision: intro
  12. Computer Vision: make computers understand images and video: detection, recognition, tracking, extraction. [Image: object detection]
  13. Computer Vision: there are still challenges. An object can be partly occluded, or even fully occluded.
  14. Challenge: we were building a human detector, and we accidentally got a future human detector!
  15. Traditional approach (feature engineering): has two eyes? has a nose below the eyes? ….. OK, it's a face! Deep learning approach: NO feature engineering.
  16. Traditional approach vs. deep learning. ImageNet: 1.2 million images with 1000 object categories. [Chart labels: Traditional, Deep learning] Source: http://pattern-recognition.weebly.com/
  17. Deep Learning in Computer Vision
  18. Computer Vision: what the computer sees (facial detection). Red = [43 45 21; 13 34 12; 23 88 55], Green = [19 89 27; 17 57 29; 75 56 94], Blue = [19 89 27; 17 57 29; 75 56 94]. y = F(Red, Green, Blue), where the input is a 3-D array.
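
To make the "3-D input array" on slide 18 concrete, the sketch below stacks the three channel matrices from the slide into one height x width x channels array. It uses plain Python lists; the helper name `stack_channels` is an assumption for illustration only.

```python
# Channel values copied from slide 18.
red = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
green = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]
blue = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]


def stack_channels(r, g, b):
    """Combine three H x W channel matrices into one H x W x 3 array."""
    return [
        [[r[i][j], g[i][j], b[i][j]] for j in range(len(r[0]))]
        for i in range(len(r))
    ]


image = stack_channels(red, green, blue)
shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)        # (3, 3, 3)
print(image[0][0])  # [43, 19, 19], the (R, G, B) values of the top-left pixel
```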
  19. Intuition (facial detection): [Diagram: the Red, Green and Blue channels beside a network with input (x1, x2, +1), hidden layers and output (y1, y2)]
  20. Convolutional Neural Network (CNN). Idea: a filter scans over the image. [Animation of the convolutional process: input matrix (e.g., image), filter (grey), output matrix] Source: https://github.com/vdumoulin/conv_arithmetic
  21. CNN, striding and padding: these control how the filter convolves around the input matrix. [Animation: input matrix (e.g., image), filter (grey), output matrix; stride = 2, zero-padding = 1] Source: https://github.com/vdumoulin/conv_arithmetic
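
The effect of stride and padding on the output size follows standard convolution arithmetic. As a sketch, for an input of width n, a filter of width k, zero-padding p on each side and stride s (these symbols are chosen here and do not appear on the slide):

```latex
o = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
```

With the slide's stride s = 2 and zero-padding p = 1, and assuming a 5 x 5 input with a 3 x 3 filter (sizes not stated on the slide), this gives o = ⌊(5 + 2 - 3) / 2⌋ + 1 = 3.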
  22. Convolutional operation: 7 x 7 input * 3 x 3 filter = 5 x 5 output.
      Input:
      0 1 1 1 0 0 0
      0 0 1 1 1 0 0
      0 0 0 1 1 1 0
      0 0 0 1 1 0 0
      0 0 1 1 0 0 0
      0 1 1 0 0 0 0
      1 1 0 0 0 0 0
      Filter:
      1 0 1
      0 1 0
      1 0 1
      Output:
      1 4 3 4 1
      1 2 4 3 3
      1 2 3 4 1
      1 3 3 1 1
      3 3 1 1 0
      Dimensions: input [height1, width1, # of channels]; filter [height2, width2, # of channels]; output [height3, width3, # of filters].
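
The numbers on slide 22 can be checked with a few lines of code. The sketch below slides the filter over the input with stride 1 and no padding (an assumption consistent with a 7 x 7 input giving a 5 x 5 output) and sums the element-wise products at each position. As is usual in CNNs, the filter is not flipped; this filter is symmetric, so the result is the same either way. The function name `conv2d` is illustrative.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding sliding-window product-sum of kernel over image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Input and filter copied from slide 22.
image = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

output = conv2d(image, kernel)
for row in output:
    print(row)
```

The printed rows match the 5 x 5 output matrix on the slide.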
  23. Rectified Linear Unit (ReLU): F(y) = max(0, y), a non-linear activation function. Example: [-3 2 0; 1 -1 0; -5 2 4] → ReLU → [0 2 0; 1 0 0; 0 2 4].
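
The ReLU example on slide 23 is an element-wise operation, which a short sketch makes explicit (the function name `relu` is chosen here for illustration):

```python
def relu(matrix):
    """Apply F(y) = max(0, y) to every element of a 2-D list."""
    return [[max(0, value) for value in row] for row in matrix]


before = [[-3, 2, 0], [1, -1, 0], [-5, 2, 4]]  # values from slide 23
after = relu(before)
print(after)  # [[0, 2, 0], [1, 0, 0], [0, 2, 4]]
```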
  24. Max Pooling: reduces dimension and helps avoid overfitting. Max pool with a 2 x 2 filter and stride 2: [1 0 2 3; 4 6 6 8; 3 1 1 0; 1 2 2 4] → [6 8; 3 4].
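
The max-pooling example on slide 24 can be reproduced as follows. This is a minimal sketch with no padding; the function name `max_pool` and its default arguments are assumptions chosen to match the slide's 2 x 2 filter and stride 2.

```python
def max_pool(matrix, size=2, stride=2):
    """Take the maximum of each size x size window, moving by stride."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [
        [
            max(matrix[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


grid = [[1, 0, 2, 3], [4, 6, 6, 8], [3, 1, 1, 0], [1, 2, 2, 4]]  # slide 24
pooled = max_pool(grid)
print(pooled)  # [[6, 8], [3, 4]]
```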
  25. Example: Input 24 x 24 x 3 → (Conv: 3 x 3, MP: 2 x 2) → 11 x 11 x 28 → (Conv: 3 x 3, MP: 3 x 3) → 4 x 4 x 48 → (Conv: 2 x 2) → 3 x 3 x 64 → fully connected (128) → face/non-face (2 outputs) and bounding box regression (4 outputs). Suppose that all max pooling (MP) layers have stride 2. Input: 24 x 24 x 3; Conv: 3 x 3 x 3; MP: 2 x 2 (stride 2) → output dimension (24 - 3 + 1) / 2 = 11.
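
The spatial sizes on slide 25 can be traced layer by layer. The sketch below assumes stride-1 convolutions without padding and max pooling with stride 2 and the usual floor rounding; the slide states only the pooling stride, so the other conventions are assumptions, and the helper names are illustrative.

```python
def conv_size(n, k):
    """Output width of a stride-1, no-padding convolution with a k x k filter."""
    return n - k + 1


def pool_size(n, k, stride=2):
    """Output width of k x k max pooling with the given stride (floor rounding)."""
    return (n - k) // stride + 1


sizes = [24]                                        # input: 24 x 24 x 3
sizes.append(pool_size(conv_size(sizes[-1], 3), 2))  # Conv 3x3, MP 2x2
sizes.append(pool_size(conv_size(sizes[-1], 3), 3))  # Conv 3x3, MP 3x3
sizes.append(conv_size(sizes[-1], 2))                # Conv 2x2
print(sizes)  # [24, 11, 4, 3]
```

The channel counts (28, 48, 64) are set by the number of filters in each convolution and do not follow from this arithmetic.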
  26. Object scales: detect objects of various sizes. [Image: a detector scans over the input at several scales] Tradeoffs? Source: https://www.pyimagesearch.com
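
One common way to detect objects of various sizes is to run a fixed-size sliding window over an image pyramid, and counting the window positions gives a feel for the tradeoff the slide asks about. The sketch below assumes that technique; the 96 x 96 image, 24 x 24 window, step of 8 pixels and scale factor of 1.5 are all hypothetical values, not taken from the talk.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=24):
    """Yield the (width, height) of each pyramid level down to min_size."""
    while width >= min_size and height >= min_size:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def count_windows(width, height, window=24, step=8):
    """Number of positions a sliding window visits on one pyramid level."""
    across = (width - window) // step + 1
    down = (height - window) // step + 1
    return across * down


levels = list(pyramid_sizes(96, 96))
total = sum(count_windows(w, h) for w, h in levels)
print(levels)  # [(96, 96), (64, 64), (42, 42), (28, 28)]
print(total)   # 100 + 36 + 9 + 1 = 146 detector evaluations
```

A smaller scale factor or step covers more object sizes and positions, but the number of detector evaluations grows accordingly.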
  27. Data augmentation: generate more artificial data points from base data. Apply with care to other data types! [Images: original, little noise, moderate noise, heavy noise]
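
As a sketch of the noise-based augmentation on slide 27, the code below adds Gaussian noise of increasing strength to a small grid of pixel values. The choice of Gaussian noise and the sigma values 5, 20 and 50 are assumptions for illustration; the slide does not say how its noisy images were produced.

```python
import random


def add_noise(pixels, sigma, rng):
    """Return a copy of pixels with Gaussian noise added, clipped to 0..255."""
    return [
        [min(255, max(0, round(value + rng.gauss(0, sigma)))) for value in row]
        for row in pixels
    ]


rng = random.Random(0)  # fixed seed so runs are repeatable
original = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
levels = [("little", 5), ("moderate", 20), ("heavy", 50)]
augmented = {name: add_noise(original, sigma, rng) for name, sigma in levels}

for name, pixels in augmented.items():
    print(name, pixels)
```

Each noisy copy keeps the label of the original image, which is how one base sample becomes several training samples.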
  28. Complex data augmentation: face rotation
  29. Why data augmentation? [Images: AXON detection WITHOUT augmentation and WITH augmentation]
  30. How to benchmark? [Image: Facebook detection]
  31. Theory vs. Reality
  32. Deep learning in Computer Vision. Pros: DL reduces the need for feature engineering; DL outperforms classical Computer Vision approaches. Cons: DL requires a huge amount of data (> 100K samples); DL is extremely computationally expensive to train (weeks on GPUs); DL model structure is a black box.
  33. Performance vs. Portability [Images: theory, reality]
  34. Performance vs. Power consumption [Images: theory, reality (portable battery)]
  35. Special hardware for Deep Learning: Jetson TX2 (NVIDIA), Google TPU, Movidius Myriad. Optimized for specific use cases. Not plug-and-play: good engineers are needed to make it work. Still far from consumer…
  36. Privacy: the police are our customers, so data privacy is important. Can we “extract features” from the private data?
  37. Demo
  38. Workflow and tool set
  39. Skin blurring
  40. Facial detection with tracking
  41. License plate detection
  42. Take Home message
  43. Industry perspective. Always consider the following 4Ps: Performance, Power consumption, Portability, Price. Deep learning is not magic: a tradeoff always exists!
  44. Thank you
  45. We are Hiring: Full Stack, Research Engineers, Security. https://jobs.lever.co/axon
