The document discusses automatic geotagging of videos. It describes challenges in estimating the geographic location of videos using textual and visual information. Methods discussed include using textual tags, visual features, and gazetteers to determine location. The author also describes fusing multiple approaches and using spatial segmentation to improve accuracy while reducing computational costs.
1. Pascal Kelm
Kelm@nue.tu-berlin.de
Communication Systems Group
Technische Universität Berlin
www.nue.tu-berlin.de
Thursday, 24 January 2013
2. Overview
Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
3. Motivation – Where in the world is it?
4. Example
http://www.flickr.com/photos/zebandrews/7414117752/in/pool-18038320@N00/
Fact: only 3% of the content on online sharing platforms is available with geographic coordinates (latitude, longitude)
5. State of the Art
How would you estimate the location of unknown content?

Textual information:
Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, …
Gazetteers (like geonames.org): match the tags against a group of toponyms
Textual similarity: propagate the location of textually similar content

Visual information:
Low-level features: texture, color, shape, …
Local features: interest points on the object are extracted to provide a “feature description” of the object (features: SIFT, SURF)
Propagate the location by finding a visually similar image
• [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA Workshop 2012]
• [Pascal Kelm et al.: “Georeferencing in Social Networks”, in Social Media Retrieval, Springer, 2012]
6. Relevant Research 1
2008: James Hays, Alexei A. Efros: “IM2GPS: Estimating Geographic Information from a Single Image”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (“Where am I?”)
Purely data-driven scene matching approach (over 6 million GPS-tagged images, 5 low-level descriptors)
Visual ambiguity
Low precision, high computational cost (cluster of 400 processors for 3 days)
7. Relevant Research 2
2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: “Placing Flickr Photos on a Map”, 32nd International ACM SIGIR
Images with “palma” tag falsely mapped near Palma de Mallorca, Spain
Textually annotated language model (ranking)
Geographical / textual ambiguity
High precision
High computational cost
8. Research Question
What are the limitations of an automatic algorithm?
Which feature (text, video) performs best?
Is a fusion possible that eliminates geographical ambiguity?
Do I need a CPU cluster to estimate the location?
Does low performance imply low precision?
Is it possible for a human to estimate the location of a video using textual, visual and audio information?
9. Placing Task
The task requires participants to assign geographical coordinates to each provided test video. Participants can make use of metadata, audio and visual features, as well as external resources.
Organizers:
Pascal Kelm, TU Berlin
Adam Rae, Yahoo! Research
• [Adam Rae, Pascal Kelm: “Working Notes for the Placing Task at MediaEval 2012”, Working Notes Proceedings (ISSN 1613-0073) of MediaEval 2012]
10. Image Distribution
Flickr Database:
3.6 million training images
10,000 training videos
5,091 test videos
Descriptors:
1. Color and Edge Directivity Descriptor
2. Gabor
3. Fuzzy Color and Texture Histogram
4. Color Histogram
5. Scalable Color
6. Auto Color Correlogram
7. Tamura
8. Edge Histogram
9. Color Layout
Metadata:
All information about uploader and video
11. Overview Framework
National borders extracted from the metadata
Textual and visual features are used in a hierarchical
framework to predict the most likely location
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation”, Proceedings of the 20th ACM SIGSPATIAL 2012]
12. Collaborative Systems: Example
這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…
13. Geographical Ambiguity
這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…
Which language is it?
Chinese
This was my last trip to Paris. I visited the castle in Disneyland…
Which words give us information? Tags?
Trip, Paris, Castle, Disneyland
Which of these nouns carry geographical information?
Paris, Disneyland
14. Geographical Ambiguity
Candidate countries for the toponyms “Paris” and “Disneyland” include France, China, Canada, USA, Puerto Rico, …

c_{\text{detected}} = \arg\max_{c_m} \sum_{j=0}^{N-1} R_j(c_m)

R_j(c_i) = rank sum contribution of toponym j for country c_i
c_i = candidate countries
N = number of toponyms
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs”, ACM Multimedia 2011]
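The rank-sum country detection above can be sketched in a few lines. The candidate countries, toponyms and rank values below are invented for illustration; they are not the actual gazetteer output used in this work.

```python
def detect_country(rank_lists):
    """rank_lists maps each candidate country c_i to its ranks R_j(c_i),
    one per toponym j; the country with the highest rank sum wins."""
    return max(rank_lists, key=lambda c: sum(rank_lists[c]))

# Invented ranks: "Paris" and "Disneyland" each rank the candidates.
ranks = {
    "France": [10, 6],  # Paris (France), Disneyland Paris
    "USA":    [4, 9],   # Paris (Texas), Disneyland (California)
    "China":  [1, 7],   # Hong Kong Disneyland
}
print(detect_country(ranks))  # France wins with rank sum 16
```

Summing ranks over all toponyms is what resolves the ambiguity: no single tag decides, but the country consistent with most toponyms accumulates the highest total.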
15. Overview Framework
National borders extracted from the metadata
Textual and visual features are used in a hierarchical
framework to predict the most likely location
16. Example
http://www.flickr.com/photos/62285085@N00/3484324495
17. Textual Region Model
Segmenting the world map into regions according to the
meridians and parallels
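A minimal sketch of this meridian/parallel segmentation maps each coordinate to a fixed-size grid cell; the 1° cell size is an assumption for illustration, not necessarily the granularity used here.

```python
import math

def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to the (row, col) of its fixed-size grid cell."""
    row = int(math.floor((lat + 90.0) / cell_deg))
    col = int(math.floor((lon + 180.0) / cell_deg))
    return row, col

print(grid_cell(52.5125, 13.3269))  # TU Berlin falls into cell (142, 193)
```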
Stemming: reducing inflected words to their root form
Bounds Crossing, Florida, USA

Text → Porter Stemmer
Bream Vortex → Bream Vortex
Swimming → Swim
Ocean → Ocean
Beach → Beach
Springs Vortex → Springs Vortex
Scuba Diving → Scuba Dive
Scuba Underwater → Scuba Underwat
… → …
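The Porter stemmer itself is a multi-phase rule algorithm; the sketch below only strips a few common suffixes to mimic the table above (so it yields "Div" where the real stemmer would restore "Dive").

```python
# Highly simplified suffix stripping in the spirit of the Porter stemmer;
# the guard `len(word) > len(suffix) + 2` keeps short words intact.

def stem(word):
    for suffix in ("ming", "ing", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["Swimming", "Ocean", "Springs", "Diving"]])
# ['Swim', 'Ocean', 'Spring', 'Div']
```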
18. Textual Region Model
Term–location distribution:
P(t \mid l) = \frac{N_{t,l} + 1}{\sum_{t' \in V} (N_{t',l} + 1)}

Term frequency–inverse document frequency:
\mathrm{tfidf}_t = N_{t,l} \cdot \log \frac{N}{n_t}

Location estimate:
P(l \mid d) \;\Rightarrow\; \hat{l} = \arg\max_l \sum_{i=0}^{N} \log P(t_i \mid l)
19. Textual Region Model
Bernoulli model:
P(t \mid c) = \frac{N_{t,c} + 1}{\sum_{t' \in V} (N_{t',c} + 1)}

t = tag
c = class / region

Example tags: Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat, …
20. Visual Region Model
Returns the visually most similar areas, which are represented by the mean feature vector of all training images and videos of the respective area.
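A minimal sketch of this nearest-mean matching, assuming toy two-dimensional descriptors and invented region names; real descriptors (color histograms, Tamura, etc.) are much higher-dimensional.

```python
# Each region is represented by the mean feature vector of its training
# images; a query descriptor is assigned to the region with the nearest mean.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

region_features = {
    "coast":  [[0.9, 0.1], [0.8, 0.2]],
    "forest": [[0.1, 0.9], [0.2, 0.8]],
}
means = {r: mean(vs) for r, vs in region_features.items()}

def nearest_region(query):
    return min(means, key=lambda r: dist2(means[r], query))

print(nearest_region([0.85, 0.15]))  # coast
```

Storing one mean per region instead of every training image is what keeps the visual lookup cheap.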
21. What is meant by Spatial Segmentation?
The world map is iteratively divided into segments of different sizes
Each segment is treated as a class in our probabilistic model
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “How Spatial Segmentation Improves the Multimodal Geo-Tagging”, Working Notes Proceedings of MediaEval 2012]
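The iterative subdivision can be sketched as a quadtree-style split: a segment is divided into four quadrants while it holds more than a capacity of training items. The capacity and sample points are assumptions for illustration.

```python
def segment(bounds, points, cap=2):
    """bounds = (lat_min, lat_max, lon_min, lon_max); returns leaf cells."""
    if len(points) <= cap:
        return [bounds]
    la0, la1, lo0, lo1 = bounds
    lam, lom = (la0 + la1) / 2, (lo0 + lo1) / 2
    leaves = []
    for qla in ((la0, lam), (lam, la1)):      # split along a parallel
        for qlo in ((lo0, lom), (lom, lo1)):  # split along a meridian
            sub = [p for p in points
                   if qla[0] <= p[0] < qla[1] and qlo[0] <= p[1] < qlo[1]]
            leaves += segment((qla[0], qla[1], qlo[0], qlo[1]), sub, cap)
    return leaves

# Invented sample points: two near Berlin, one near Paris, two near New York.
pts = [(52.5, 13.3), (52.4, 13.4), (48.8, 2.3), (40.7, -74.0), (40.8, -73.9)]
cells = segment((-90.0, 90.0, -180.0, 180.0), pts)
print(len(cells))  # 16 leaf segments of varying size
```

Dense areas end up with many small segments and sparse areas with few large ones, which is exactly what lets the classifier use fine granularity only where training data supports it.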
22. Fusion: Example
Confidence scores of the visual approach (right) are restricted to the most likely spatial segment determined by the textual approach (left).
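This restriction step can be sketched as a filter over cell scores; the cell identifiers and score values below are invented for illustration.

```python
# Fusion sketch: visual confidence scores are kept only for cells inside
# the most likely segment chosen by the textual model.

visual_scores = {  # cell id -> visual similarity score (invented)
    "cell_paris": 0.62,
    "cell_texas": 0.71,
    "cell_shanghai": 0.40,
}
textual_segment = {"cell_paris"}  # winner of the textual region model

fused = {c: s for c, s in visual_scores.items() if c in textual_segment}
best = max(fused, key=fused.get)
print(best)  # cell_paris: the visually stronger cell_texas is ruled out
```

This is also where the computational saving comes from: the visual model only needs to score the cells inside the textual segment, not the whole world.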
23. Results
[UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres: “A visual approach for video geocoding using bag-of-scenes”, ICMR '12
[QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo: “Geo-tagging online videos using semantic expansion and visual analysis”
24. Conclusion
Hierarchical approach for automatic estimation of geo-tags in social media websites
Detailed analysis of textual and visual features using different spatial granularities (national border detection)
Fusion of textual and visual methods is important to eliminate geographical ambiguities
Reduced computing time in the subsequent classification step
Half of the test set is correctly located within a radius of 10 km
25. Web demonstrator
http://geotagging.de.im
26. Geo-Location Human Baseline Project
27. Geo-Location Human Baseline Project
http://geotagging.de.im/game.php
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location”, in ACM Workshop on Crowdsourcing for Multimedia, held in conjunction with ACM Multimedia 2012]
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation”, in Multimedia Communications Technical Committee, IEEE Communications Society, Vol. 8, No. 1, January 2013]
28. Object Detection
Frame 370
Frame 35
29. Augmented Object Detection
OpenCV for Android
Detectors/descriptors: FAST, ORB, BRISK, SURF
Geo-referenced database
Matching times (business card example):
CPU: 192 ms
GPU: 87 ms
Android: 9990 ms
30. Object Detection
Depth Map Matching Map
31. Graph-based Object Detection
Matching
32. DFG Proposal
Housebreaking, Cyber-Stealing, Cyber-Mobbing, Cyber-Stalking
33. DFG Proposal: Geo-Privacy
34. Question
Thanks for your attention.
Dipl.-Ing. Pascal Kelm
Communication Systems Group
Technische Universität Berlin
Sekr. EN1, Einsteinufer 17
10587 Berlin, Germany
E-mail: Kelm@nue.tu-berlin.de
Phone: (+49) 30 / 314 28504
35. DFG: Geo-Tagging
36. Spatial Segmentation
38. Spatial Segmentation
39. Extracted geo. items
kauii
hawaii
usa
00001: hawaii, kauai, usa
40. Textual Features + Naive Bayes
41. Visual Features
What will you do if you do not have any textual information?
42. Fusion
Textual Region Model: Region 1, Region 2, Region 3, …, Region N
Visual Region Model: Region 1, Region 2, Region 3, …, Region N
Geographical Boundaries Extraction
Ranking: Pic1, Pic2, Pic3 ranked over the candidate regions (Region 2 … Region 6)