Introduction to the Artificial Intelligence and Computer Vision revolution
1. Introduction to the Artificial Intelligence
and Computer Vision revolution
Darian Frajberg
darian.frajberg@polimi.it
October 30, 2017
2. 2
Introduction
§ What is Artificial Intelligence?
Computers with the ability to
reason as humans
§ What is Machine Learning?
Computers with the ability to learn
without being explicitly
programmed
§ What is Deep Learning?
Computers with the ability to learn
by using artificial neural networks,
which were inspired by the
structure and function of the brain
3. 3
§ What is Computer Vision?
The ability of computers to acquire, analyze and understand digital
images/videos
"If We Want Machines to Think, We Need to Teach Them to See."
-Fei-Fei Li, Director of Stanford AI Lab and Stanford Vision Lab
Introduction
4. 4
From Hand-crafted Features to Learned Features
§ Traditional Computer Vision
§ Deep Learning
Sven Behnke: Visual Perception using Deep Convolutional Neural Networks, Bilbao DeepLearn Summer School (2017)
5. 5
Deep Learning breakthrough
§ Data set with over 15M labeled images
§ Approximately 22k categories
§ Collected from the web and labeled via Amazon Mechanical Turk
(crowdsourcing tool)
http://www.image-net.org
6. 6
Deep Learning breakthrough
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
§ Annual competition of image classification at large scale since 2010
§ Classification: make 5 guesses about the image label (a prediction is correct if the true label is among them)
§ 1K categories
§ 1.2M training images
§ 100k test images
Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision 115.3 (2015): 211-252.
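The top-5 criterion above can be made concrete with a short sketch (pure NumPy; the score matrix and labels are toy values, not real ILSVRC data):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of samples whose true label is NOT among the 5 highest-scoring classes."""
    top5 = np.argsort(scores, axis=1)[:, -5:]        # indices of the 5 best guesses per sample
    hits = (top5 == labels[:, None]).any(axis=1)     # is the true label among the guesses?
    return 1.0 - hits.mean()

# Two samples over 7 classes: the first is a top-5 hit, the second a miss
scores = np.array([[0.1, 0.5, 0.2, 0.05, 0.05, 0.1, 0.0],
                   [0.6, 0.1, 0.1, 0.1, 0.05, 0.05, 0.0]])
labels = np.array([1, 6])
print(top5_error(scores, labels))   # → 0.5
```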
7. 7
Deep Learning breakthrough
ILSVRC-2012 Results
AlexNet
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
Place  Model          Team         Top-5 error (test)
1st    AlexNet (CNN)  SuperVision  15.3%
2nd    SIFT + FVs     ISI          26.2%
3rd    SVM            OXFORD_VGG   26.97%
The winning CNN improved on the runner-up by 10.9 percentage points.
8. 8
Deep Learning breakthrough
ILSVRC top-5 error on ImageNet (%)
2010 (NEC)        28.19  shallow
2011 (Xerox)      25.77  shallow
2012 (AlexNet)    15.31  deep (first deep architecture)
2013 (ZF)         11.2   deep
2014 (VGG)         7.32  deep
2014 (GoogleNet)   6.66  deep
Human              5.1
2015 (ResNet)      3.57  deep (first to beat human)
2016 (Ensemble)    2.99  deep
2017 (Ensemble)    2.25  deep
10. 10
Data Models
§ Deep Artificial Neural Networks have accomplished outstanding results
§ The term “deep” refers to the number of hidden layers in the neural
network between the input and the output
§ In Computer Vision, in particular, Convolutional Neural Networks (CNNs)
are the most successful architecture
§ This architecture leads to better models, able to learn more complex
non-linear features
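The convolution operation at the heart of a CNN layer can be sketched in a few lines (a naive NumPy version; real frameworks implement the same sliding-window product far more efficiently, and what they call "convolution" is technically cross-correlation, as here):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation): the basic building block of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+kh, j:j+kw] * kernel).sum()
    return out

edge = np.array([[1.0, -1.0]])           # tiny horizontal edge-detector kernel
img = np.zeros((4, 4)); img[:, 2:] = 1.0 # image with a vertical edge in the middle
print(conv2d(img, edge))                 # responds only at the edge location
```

In a trained CNN, kernels like `edge` are not hand-designed but learned from data, which is exactly the shift from hand-crafted to learned features described earlier.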
13. 13
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
Deep Learning software
Caffe
(UC Berkeley)
Caffe2
(Facebook)
Torch
(NYU/Facebook)
PyTorch
(Facebook)
Theano
(U Montreal)
TensorFlow
(Google)
Paddle
(Baidu)
CNTK
(Microsoft)
MXNet
(Amazon)
Developed by U Washington, CMU,
MIT, Hong Kong U, etc., but the main
framework of choice at AWS
MatConvNet
(University of
Oxford)
Keras
(François Chollet)
Deeplearning4j
(Skymind)
And more…
14. 14
Some Computer Vision tasks
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
15. 15
Some Computer Vision applications
§ Autonomous vehicles
§ Face recognition
§ Gesture recognition
§ Augmented reality
§ Industrial automation and inspection
§ Medical and biomedical
§ Monitoring and surveillance
§ Image retrieval
§ Photography and video enhancement
16. 16
In 2012 Facebook announced the acquisition of
Face.com, a facial recognition technology company
that was based in Israel (US$60 million)
In 2014 they published a paper on “DeepFace”. Facebook used a small part
of its database of users and outperformed all face recognition benchmarks
§ Data set: 4,000 people × 1,100 images each = 4.4M images
Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Face recognition
17. 17
Face recognition
Evaluation by using public data set benchmarks
Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
18. 18
Classification of skin cancer
§ CNN trained on 129,450 images
§ Comparison with dermatologists
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639
(2017): 115-118.
20. 20
Small labeled data sets
Problems
§ Overfitting becomes much harder to avoid
§ Outliers become more dangerous
Solutions
§ Clean noisy data
§ Get more annotated data
§ Reduce model complexity, taking care not to underfit
§ Apply regularizations
§ Semi-supervised learning
§ Apply transfer learning
§ Apply data augmentation
§ Generate synthetic data
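Of the remedies above, regularization is the easiest to illustrate: penalizing large weights shrinks the model's effective capacity. A minimal sketch of L2 weight decay on a toy linear model (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))              # tiny labeled set: high overfitting risk
y = X[:, 0] + 0.1 * rng.normal(size=20)    # noisy target

def fit(lam, steps=200, lr=0.05):
    """Gradient descent on squared error plus an L2 penalty of strength lam."""
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + lam * w   # data gradient + weight decay
        w -= lr * grad
    return w

# Stronger regularization yields smaller weights, i.e. a simpler model
print(np.linalg.norm(fit(lam=1.0)) < np.linalg.norm(fit(lam=0.0)))   # → True
```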
21. 21
Transfer learning
It is the ability of an AI to learn from a certain task/domain and apply its
pre-learned knowledge to a new task/domain
http://ruder.io/transfer-learning/
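The idea can be sketched end to end with a stand-in feature extractor: a frozen random projection plays the role of pretrained CNN layers, and only a small classification head is trained on the new task (everything here is illustrative, not a real pretrained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained layers: these weights are FROZEN and reused on the new task
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)            # frozen projection + ReLU

# New task with few labels: train only a logistic-regression head on the features
X = rng.normal(size=(200, 64))
w_true = rng.normal(size=16)
y = (extract_features(X) @ w_true > 0).astype(float)  # toy labels for the new task

feats = extract_features(X)
w_head = np.zeros(16)                                # the only trainable parameters
for _ in range(500):
    z = np.clip(feats @ w_head, -30, 30)             # clip for numerical stability
    p = 1.0 / (1.0 + np.exp(-z))
    w_head -= 0.1 * feats.T @ (p - y) / len(y)       # logistic-regression gradient step

acc = ((feats @ w_head > 0) == (y == 1)).mean()      # training accuracy of the new head
```

Freezing the extractor is the cheapest form of transfer; with more data one would also fine-tune the pretrained layers at a small learning rate.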
22. 22
Data augmentation
For many problems, we can use known invariances to transform existing
training samples into new ones (applied to the training set only, never to validation and test sets!)
For example, for image classification and object recognition, we have:
§ Translation invariance
§ Limited scale invariance
§ Limited rotation invariance
§ Limited photometric and color invariance
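These invariances translate directly into random transforms applied on the fly; a minimal NumPy sketch (the transform set and parameter ranges are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Return a randomly transformed copy of an HxWxC image with values in [0, 1]."""
    out = img
    if rng.random() < 0.5:                                 # horizontal flip
        out = out[:, ::-1]
    out = np.roll(out, rng.integers(-2, 3), axis=1)        # small translation (wrap-around stands in for pad/crop)
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)   # photometric jitter (brightness)
    return out

img = rng.random((32, 32, 3))
aug = augment(img)   # same label as img, but a new training sample
```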
23. 23
Synthetic data
“Any production data applicable to a given situation that are not
obtained by direct measurement”
-McGraw-Hill Dictionary of Scientific & Technical Terms
Problems
§ Gap between synthetic and real data
§ The final model has to generalize and
work well on real data
Solutions
§ Transfer Learning
§ Generative Adversarial Networks
24. 24
MPIIGaze data set
Data set for appearance-based gaze estimation
Zhang, Xucong, et al. "Appearance-based gaze estimation in the wild." Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. 2015.
25. 25
Generative Adversarial Network
Simulated+Unsupervised learning to add realism to the simulator while
preserving the annotations of the synthetic images
Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv
preprint arXiv:1612.07828 (2016).
27. 27
Challenges in Deep Learning
§ Society adaptation
§ Data Bias
§ Responsibility regulations and implications
§ Black box understanding
§ Multitask learning
§ Continuous learning
§ Self learning
28. 28
Society adaptation
“Robots will be able to do everything better than us”
-Elon Musk, co-founder of PayPal, Tesla Motors, SpaceX, OpenAI, Neuralink, etc.
30. 30
Data Bias
Microsoft launched the Tay AI chat bot in March 2016
It was designed to mimic the behavior of an American teenage girl and to learn from interacting with Twitter users
After a few hours it became offensive and racist, and it had to be shut down
31. 31
Responsibility regulations and implications
Autonomous vehicles
§ What happens if there is an
accident?
§ Who is responsible for it?
§ How can it be analyzed?
32. 32
Black box understanding
Deep neural networks learn features from the data they are provided, and
these features are usually not interpretable by humans
Problems
§ We must trust predictions without fully understanding the reasoning behind them
§ It is hard to understand and fix the causes of erroneous predictions
“Whether it’s an investment decision, a medical decision, or maybe a
military decision, you don’t want to just rely on a black box method”
-Tommi Jaakkola, Professor of Computer Science and AI at MIT
INPUT → Deep Model → OUTPUT
33. 33
Black box understanding
Explaining an image classification prediction made by Google’s Inception
neural network
Top-3 classes:
§ Electric guitar
(p = 0.32)
§ Acoustic guitar
(p = 0.24)
§ Labrador
(p = 0.21)
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
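The region-attribution idea behind such explanations can be approximated with a much simpler occlusion test: blank out each image patch and measure how much the prediction drops. A toy sketch (the "classifier" here is a stand-in function, not a real network, and this is a simplification of the LIME method cited above):

```python
import numpy as np

def occlusion_map(img, predict, patch=8):
    """Score drop when each patch is blanked: a high drop marks a region the prediction depends on."""
    base = predict(img)
    h, w = img.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = img.copy()
            occluded[i:i+patch, j:j+patch] = 0.0          # blank one patch
            heat[i // patch, j // patch] = base - predict(occluded)
    return heat

# Toy "classifier" that only looks at the top-left corner of the image
predict = lambda im: im[:8, :8].mean()
img = np.ones((32, 32))
heat = occlusion_map(img, predict)
# heat[0, 0] carries all the importance; every other cell is zero
```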
34. 34
Black box understanding
Explaining an erroneous image prediction made by a Wolf/Husky classifier
Solution
1. Verify whether the training data set contains mostly wolves with snow in
the background
2. Augment the data set with images of Huskies in the snow and of
Wolves in other environments
3. Retrain
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
35. 35
Multitask and continuous learning
Multitask learning
§ Flexible and general-purpose AI,
able to solve different problems,
instead of being built for just a
specific one
Continuous learning
§ Adaptable AI, able to evolve and
incorporate new knowledge
without the need to re-train every
time
36. 36
Self learning
It is the ability of an AI to learn features by itself, using algorithms
that can learn from unlabeled data
Unlabeled data is less informative, but it can be massive and
inexpensive/free, which can lead to better performance
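A classic example of learning structure from unlabeled data is k-means clustering; a minimal sketch on a toy two-blob data set (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain k-means: discovers cluster structure without any labels."""
    centers = X[rng.choice(len(X), k, replace=False)]     # init from random data points
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)                      # nearest center per point
        centers = np.array([X[assign == c].mean(axis=0) if (assign == c).any()
                            else centers[c] for c in range(k)])  # keep center if cluster empties
    return centers, assign

# Two well-separated blobs, no labels provided
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, assign = kmeans(X, 2)   # the learned centers land on the two blobs
```

In deep learning the same principle appears as self-supervised pre-training: representations learned from unlabeled data are later reused for supervised tasks.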
37. 37
Artificial Intelligence milestones
Deep Blue
Chess-playing program (IBM)
It defeated the world champion in 1997
It used brute force computation and
clever ad-hoc algorithms
In 2014 Google acquired DeepMind,
a British company specialized in AI
(US$500 million)
AlphaGo Lee
Go-playing program (Google/DeepMind)
It defeated the world champion in 2016
It used Deep Reinforcement Learning based
on human professional games and later on
games against instances of itself
38. 38
Artificial Intelligence milestones
AlphaGo Zero
Go-playing program (Google/DeepMind)
It defeated previous versions of AlphaGo in 2017
It used Deep Reinforcement Learning entirely
based on self-play, without any human data
and using less processing power
Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
Version         Hardware   Elo rating  Matches
AlphaGo Fan     176 GPUs   3,144       5:0 against Fan Hui
AlphaGo Lee     48 TPUs    3,739       4:1 against Lee Sedol
AlphaGo Master  4 TPUs v2  4,858       60:0 against professional players
AlphaGo Zero    4 TPUs v2  5,185       100:0 against AlphaGo Lee; 89:11 against AlphaGo Master
39. 39
Conclusions
§ AI is not just the future, but it is
already the present
§ It is currently applied in a wide range
of fields with outstanding results
§ It has already equaled and even
outperformed humans in certain
applications
§ It is of great interest to both industry
and academia
§ It has a huge potential and there are
still many open challenges and
possible applications
40. 40
Thank you
Introduction to the Artificial Intelligence
and Computer Vision revolution
Darian Frajberg
darian.frajberg@polimi.it