Overview of artificial intelligence and digital image analysis

30/01/20 1
Kim Steenstrup Pedersen
Associate Professor, Ph.D.
Head of IMAGE Section
Dept. of Computer Science
The pros and cons of modern AI & computer vision

What is modern Artificial Intelligence (AI)?
[Diagram of fields within and around AI: Machine Learning, Bioinformatics, Information Retrieval, Medical Image Analysis, Audio / Speech Analysis, Computer Vision, Natural Language Processing, Robotics, and more …; a marker reads “I am here”.]
Oxford Dict. Def.: The theory and development of computer systems able to
perform tasks normally requiring human intelligence, such as visual perception,
speech recognition, decision-making, and translation between languages.
There is no clear definition of AI that everyone agrees upon!

What is machine learning?
Data is represented as points x in some high-dimensional space.
Learn = Estimate a function (model) that groups / separates items in this space.
y(x; w) ∈ {Apple, Pear}
A classification problem with two classes
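
As a rough illustration of what a function y(x; w) can look like, here is a minimal sketch of a linear two-class model. The two features (redness, elongation) and the parameter values in w are hypothetical and chosen by hand; nothing is learned here.

```python
from typing import Sequence

CLASSES = ("Apple", "Pear")

def y(x: Sequence[float], w: Sequence[float]) -> str:
    """Linear two-class model: w holds one weight per feature plus a final bias."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return CLASSES[0] if score >= 0 else CLASSES[1]

# Hypothetical 2-D feature vectors: (redness, elongation), both in [0, 1].
w = [1.0, -1.0, 0.0]           # illustrative parameters, set by hand

print(y([0.9, 0.2], w))        # red and round
print(y([0.3, 0.8], w))        # green and elongated
```

Learning then amounts to choosing w from data instead of setting it by hand.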

•  We need data in order to learn the function /
parameters.
•  We call this training data and we also talk about
validation and test data.
•  Approaches to machine learning:
    •  Supervised learning: We have training data examples of both the input and the wanted output
    •  Unsupervised learning: We only have training data examples of the input
    •  And combinations thereof: Semi-supervised, active learning, reinforcement learning, …
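
The split into training, validation and test data can be sketched as follows. This is one simple way to do it, assuming a random shuffle is appropriate; the function name `split_data`, the 60/20/20 proportions and the toy data are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_data(items: Sequence[T], val_frac: float = 0.2, test_frac: float = 0.2,
               seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the items and split them into training, validation and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # fixed seed for repeatability
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Supervised learning: each example pairs an input x with the wanted output y.
labelled = [((i / 10, 1 - i / 10), "Apple" if i < 5 else "Pear") for i in range(10)]
train, val, test = split_data(labelled)
print(len(train), len(val), len(test))         # 6 2 2

# Unsupervised learning: only the inputs are available.
unlabelled = [x for x, _ in labelled]
```

The training set is used to fit the parameters, the validation set to choose between models, and the test set only for the final estimate of how well the model predicts.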
x1 = [image], y1 = Apple    x2 = [image], y2 = Pear    x3 = [image], y3 = Pear
x1 = [image], y1 = ?    x2 = [image], y2 = ?    x3 = [image], y3 = ?

The goal is to make as few mistakes on the training set as possible
and to make good predictions on new unseen data (test data)
y(x; w) ∈ {Apple, Pear}
?
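
The difference between mistakes on the training set and mistakes on unseen data can be made concrete with a small sketch. The nearest-centroid classifier used here is a stand-in chosen for brevity, not a method named in the slides, and all data points are made up.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, str]

def fit_centroids(data: Sequence[Example]) -> Dict[str, Point]:
    """Learn the parameters w: one mean point (centroid) per class."""
    sums: Dict[str, List[float]] = {}
    for (x1, x2), label in data:
        s = sums.setdefault(label, [0.0, 0.0, 0.0])
        s[0] += x1
        s[1] += x2
        s[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}

def predict(x: Point, w: Dict[str, Point]) -> str:
    """Return the class whose centroid is closest to x."""
    return min(w, key=lambda c: (x[0] - w[c][0]) ** 2 + (x[1] - w[c][1]) ** 2)

def error_rate(data: Sequence[Example], w: Dict[str, Point]) -> float:
    """Fraction of examples that the model classifies incorrectly."""
    return sum(predict(x, label_w) != label for (x, label), label_w
               in ((d, w) for d in data)) / len(data)

train = [((0.9, 0.2), "Apple"), ((0.8, 0.3), "Apple"), ((0.7, 0.1), "Apple"),
         ((0.3, 0.8), "Pear"), ((0.2, 0.7), "Pear"), ((0.4, 0.9), "Pear")]
test = [((0.85, 0.25), "Apple"), ((0.25, 0.85), "Pear"), ((0.6, 0.5), "Pear")]

w = fit_centroids(train)
print("training error:", error_rate(train, w))   # 0.0 on this toy data
print("test error:", error_rate(test, w))        # one of three test points is misclassified
```

A model can be perfect on the training set and still make mistakes on new data, which is why the test error is the number that matters.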

What is Computer Vision?
My definition: The design of algorithms for interpreting visual data by
mimicking (and surpassing) the human visual perceptual system.

Computer Vision
Some research highlights:
•  Railroad asset mapping: Detection of signals, signs, cabinets etc.
•  Articulated tracking of humans
•  Digital natural history: Specimen detection, species recognition, automatic label reading

Object recognition and detection
Recognition: What is in this image? “a car” or “street scene”
Detection: Where? “Car”, “Window”, “Lamp post”

How do we represent an image in the computer?
[Figure: a grid of pixels, each labelled with its R, G and B values, for example R:101, G:24, B:18.]
•  Each pixel represents a
measured quantity such
as light
•  A pixel may contain a
scalar number (e.g. for
gray scale images),
triplets of color values
(e.g. RGB), or in general
vectors, matrices and
tensors.
•  Just a bunch of numbers!
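
A minimal sketch of the "bunch of numbers" view: a tiny RGB image stored as rows of (R, G, B) triplets and converted to one scalar per pixel. The 2×2 arrangement is hypothetical; the pixel values are of the kind shown in the figure. The grayscale weights are the ITU-R BT.601 luma coefficients, one common choice among several.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]

# A hypothetical 2x2 RGB image: rows of (R, G, B) triplets with values 0..255.
image: List[List[Pixel]] = [
    [(101, 24, 18), (100, 23, 17)],
    [(103, 24, 17), (102, 23, 16)],
]

def to_gray(img: List[List[Pixel]]) -> List[List[int]]:
    """Reduce each RGB triplet to one scalar using the BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in img]

height, width, channels = len(image), len(image[0]), len(image[0][0])
print(height, width, channels)   # 2 2 3
print(to_gray(image))            # [[46, 45], [47, 46]]
```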

Marr’s layers of abstraction (1982)
Layers of abstraction:
•  Primal sketch: Features, Segments
•  2 1/2D sketch: Depth, Orientation, Occlusions
•  3D Model
Information reduction by abstraction is
important in computer vision
But what abstractions should we use?
Construct by design or learn from data?
In CV we don’t use the pixels directly;
we use local descriptors, often called
features
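
As an example of an abstraction constructed by design, here is a sketch of a simple hand-made local feature: a histogram of gradient orientations in a small grayscale patch. It is a simplified illustration in the spirit of orientation-histogram descriptors, not a specific published descriptor; the function name and bin count are assumptions.

```python
import math
from typing import List

def orientation_histogram(patch: List[List[float]], bins: int = 4) -> List[float]:
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    hist = [0.0] * bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]    # horizontal central difference
            gy = patch[r + 1][c] - patch[r - 1][c]    # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue                              # flat region, no orientation
            angle = math.atan2(gy, gx) % math.pi      # unsigned orientation in [0, pi)
            hist[int(angle / math.pi * bins) % bins] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# A 4x4 patch with a vertical edge: intensity changes only from left to right.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(orientation_histogram(vertical_edge))   # [1.0, 0.0, 0.0, 0.0]
```

The 16 pixel values are reduced to 4 numbers that describe the local edge structure, which is the kind of information reduction the slide refers to.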

Slide from Anders L. Larsen

Use artificial neural networks to model y(x; w)
The output is class probabilities
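
A minimal sketch of how a neural network can produce class probabilities: one hidden layer with ReLU activations followed by a softmax output. The layer sizes and the weight values in w are hypothetical and set by hand; in practice they are learned from training data and the networks are far larger.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]

def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """One fully connected layer: weights times x, plus bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(z: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)                                   # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def y(x: List[float], w: Dict[str, list]) -> List[float]:
    """Forward pass: hidden layer with ReLU, then softmax over the classes."""
    hidden = [max(0.0, h) for h in dense(x, w["W1"], w["b1"])]
    return softmax(dense(hidden, w["W2"], w["b2"]))

# Hand-set illustrative parameters for two input features and two classes.
w = {"W1": [[1.0, -1.0], [-1.0, 1.0]], "b1": [0.0, 0.0],
     "W2": [[2.0, 0.0], [0.0, 2.0]], "b2": [0.0, 0.0]}

probs = y([0.9, 0.2], w)
print(dict(zip(("Apple", "Pear"), probs)))
```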

Commercial solutions can do a lot, but …
(e.g. Google Vision API, Amazon Rekognition)
28/01/20 13
Lerbæk manor house
Danish farm
Output from Google Vision API

Bias in public training data: Mainly data from
North America and limited object categories
You need huge training datasets
in order to train deep learning
models.

Vision is full of hard
problems!
The human visual system is
also challenged

Visual ambiguities:
Multiple correct hypotheses about the scene

Humans have a bias when interpreting
scenes – here we have orientation bias
We prefer interpretations we have used before!
And the same holds for a computer vision system!
Choice of training data is important!

Vondrick et al., 2013
Do you see the same objects as the computer?
What’s the problem? The representation is not strong enough!

Fine-grained classification:
Examples of inter-class proximity
Flowers dataset, Oxford Visual Geometry Group

Fine-grained classification:
Examples of large intra-class variation
Caltech-UCSD Birds 200 dataset – North American birds

Photos & film archives: Potential problems
•  Automated meta-data generation:
    •  How do we ensure meaningful labeling / meta-data generation?
    •  Potential problems with bias – both geographically (cultural bias) and temporally (objects change appearance over time).
    •  Train on your own data: Where do we get training data labels from? Manual labor? Crowdsourcing? Other ideas?
    •  Reading and interpreting text in photos or film might help. This is a natural language processing problem.
•  Geo-location:
    •  Where was this photo taken? Some solutions exist, but do they work on your data?
•  Search for similar images in archive:
    •  There are commercial solutions for this.
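
Searching an archive for similar images is often done by comparing feature vectors rather than raw pixels. The sketch below ranks archive entries by cosine similarity to a query vector. The file names and the three-dimensional feature vectors are made up for illustration; real systems use much longer vectors, typically produced by a trained model, and this is not a description of how any commercial product works.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query: Sequence[float], archive: Dict[str, Sequence[float]],
                 k: int = 2) -> List[Tuple[str, float]]:
    """Return the k archive entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in archive.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical archive: file name -> feature vector.
archive = {
    "farm_1950.jpg": [0.9, 0.1, 0.0],
    "manor_1920.jpg": [0.7, 0.3, 0.1],
    "harbour_1975.jpg": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]

for name, score in most_similar(query, archive):
    print(f"{name}: {score:.3f}")
```

The quality of such a search depends on how well the feature vectors capture what matters in your own archive.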

Summary
•  AI & Computer vision algorithms are getting more and
more advanced and we can solve many hard
problems.
•  However, just as with humans, computer vision
systems can be fooled and do fail.
•  A computer vision system is based on a set of
assumptions – they may be wrong or broken!
•  Learning-based computer vision systems can only be
as good as the training data we use when building the
system. Garbage in = garbage out.
•  Maybe you should join forces and use your combined
archives for improved training and quality of outcome.

Literature
•  LeCun, Bengio, Hinton: Deep learning. Nature review,
Vol. 521: 436 – 444, 2015.
•  Ponti et al.: Everything you wanted to know about
Deep Learning for Computer Vision but were afraid to
ask. Online, 2017.
•  Goodfellow et al.: Deep Learning. MIT Press, 2016.
http://www.deeplearningbook.org