The document discusses automatic geotagging of videos. It describes challenges in estimating the geographic location of videos using textual and visual information. Methods discussed include using textual tags, visual features, and gazetteers to determine location. The author also describes fusing multiple approaches and using spatial segmentation to improve accuracy while reducing computational costs.
1. Pascal Kelm
Kelm@nue.tu-berlin.de
Communication Systems Group
Technische Universität Berlin
www.nue.tu-berlin.de
Thursday, 24 January 2013
2. Overview
Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
3. Motivation – Where in the world is it?
4. Example
http://www.flickr.com/photos/zebandrews/7414117752/in/pool-18038320@N00/
Fact: only 3% of the content on online sharing platforms is available with geographic coordinates (latitude, longitude)
5. State of the Art
How would you estimate the location of unknown content?

Textual information:
Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, …
Gazetteers (like geonames.org): match the tags against a group of toponyms
Textual similarity: propagate the location of textually similar content

Visual information:
Low-level features: texture, color, shape, …
Local features: interest points on the object are extracted to provide a “feature description” of the object (features: SIFT, SURF)
Propagate the location by finding a visually similar image
• [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA Workshop 2012]
• [Pascal Kelm et al.: “Georeferencing in Social Networks”, in Social Media Retrieval, Springer, 2012]
6. Relevant Research 1
2008: James Hays, Alexei A. Efros: “IM2GPS: Estimating Geographic Information from a Single Image”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (“Where am I?”)
Purely data-driven scene matching approach (over 6 million GPS-tagged images, 5 low-level descriptors)
Visual ambiguity
Low precision, high computational cost (cluster of 400 processors for 3 days)
7. Relevant Research 2
2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: “Placing Flickr Photos on a Map”, 32nd International ACM SIGIR
Images with “palma” tag falsely mapped near Palma de Mallorca, Spain
Textually annotated language model (ranking)
Geographical / textual ambiguity
High precision
High computational cost
8. Research Question
What are the limitations of an automatic algorithm?
Which feature (text, video) performs best?
Is a fusion possible that eliminates geographical ambiguity?
Do I need a CPU cluster to estimate the location?
Does low performance imply low precision?
Is it possible for a human to estimate the location of a video using textual, visual and audio information?
9. Placing Task
The task requires participants to assign geographical coordinates to each provided test video. Participants can make use of metadata, audio and visual features, as well as external resources.
Organizers:
Pascal Kelm, TU Berlin
Adam Rae, Yahoo! Research
• [Adam Rae, Pascal Kelm: “Working Notes for the Placing Task at MediaEval 2012”, Working Notes Proceedings (ISSN 1613-0073) of MediaEval 2012]
10. Image Distribution
Flickr Database:
3.6 million training images
10,000 training videos
5,091 test videos
Descriptors:
1. Color and Edge Directivity Descriptor
2. Gabor
3. Fuzzy Color and Texture Histogram
4. Color Histogram
5. Scalable Color
6. Auto Color Correlogram
7. Tamura
8. Edge Histogram
9. Color Layout
Metadata:
All information about uploader and video
11. Overview Framework
National borders extracted from the metadata
Textual and visual features are used in a hierarchical
framework to predict the most likely location
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation”, Proceedings of the 20th ACM SIGSPATIAL 2012]
12. Collaborative Systems: Example
這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…
13. Geographical Ambiguity
這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…
Which language is it?
Chinese
This was my last trip to Paris. I visited the castle in Disneyland…
Which words give us information? Tags?
Trip, Paris, Castle, Disneyland
Which of these nouns carry geographical information?
Paris, Disneyland
14. Geographical Ambiguity
Candidate countries for the toponyms “Paris” and “Disneyland” include France, China, Canada, USA, Puerto Rico, …

c_{\text{detected}} = \arg\max_{c_m} \sum_{j=0}^{N-1} R_j(c_m)

R_j(c_i) = rank sum contribution of toponym j for country c_i
c_i = candidate countries
N = number of toponyms
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs”, ACM Multimedia 2011]
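The rank-sum country detection above can be sketched in a few lines. The candidate countries, toponyms and rank values below are invented for illustration; they are not the actual gazetteer output used in this work.

```python
def detect_country(rank_lists):
    """rank_lists maps each candidate country c_i to its ranks R_j(c_i),
    one per toponym j; the country with the highest rank sum wins."""
    return max(rank_lists, key=lambda c: sum(rank_lists[c]))

# Invented ranks: "Paris" and "Disneyland" each rank the candidates.
ranks = {
    "France": [10, 6],  # Paris (France), Disneyland Paris
    "USA":    [4, 9],   # Paris (Texas), Disneyland (California)
    "China":  [1, 7],   # Hong Kong Disneyland
}
print(detect_country(ranks))  # France wins with rank sum 16
```

Summing ranks over all toponyms is what resolves the ambiguity: no single tag decides, but the country consistent with most toponyms accumulates the highest total.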
15. Overview Framework
National borders extracted from the metadata
Textual and visual features are used in a hierarchical
framework to predict the most likely location
16. Example
http://www.flickr.com/photos/62285085@N00/3484324495
17. Textual Region Model
Segmenting the world map into regions according to the
meridians and parallels
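A minimal sketch of this meridian/parallel segmentation maps each coordinate to a fixed-size grid cell; the 1° cell size is an assumption for illustration, not necessarily the granularity used here.

```python
import math

def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to the (row, col) of its fixed-size grid cell."""
    row = int(math.floor((lat + 90.0) / cell_deg))
    col = int(math.floor((lon + 180.0) / cell_deg))
    return row, col

print(grid_cell(52.5125, 13.3269))  # TU Berlin falls into cell (142, 193)
```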
Stemming: reducing inflected words to their root form
Bounds Crossing, Florida, USA

Text → Porter Stemmer
Bream Vortex → Bream Vortex
Swimming → Swim
Ocean → Ocean
Beach → Beach
Springs Vortex → Springs Vortex
Scuba Diving → Scuba Dive
Scuba Underwater → Scuba Underwat
… → …
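The Porter stemmer itself is a multi-phase rule algorithm; the sketch below only strips a few common suffixes to mimic the table above (so it yields "Div" where the real stemmer would restore "Dive").

```python
# Highly simplified suffix stripping in the spirit of the Porter stemmer;
# the guard `len(word) > len(suffix) + 2` keeps short words intact.

def stem(word):
    for suffix in ("ming", "ing", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["Swimming", "Ocean", "Springs", "Diving"]])
# ['Swim', 'Ocean', 'Spring', 'Div']
```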
18. Textual Region Model
Term–location distribution:
P(t \mid l) = \frac{N_{t,l} + 1}{\sum_{t' \in V} (N_{t',l} + 1)}

Term frequency–inverse document frequency:
\mathrm{tfidf}_t = N_{t,l} \cdot \log \frac{N}{n_t}

Location estimate:
P(l \mid d) \;\Rightarrow\; \hat{l} = \arg\max_l \sum_{i=0}^{N} \log P(t_i \mid l)
19. Textual Region Model
Bernoulli model:
P(t \mid c) = \frac{N_{t,c} + 1}{\sum_{t' \in V} (N_{t',c} + 1)}

t = tag
c = class / region

Example tags: Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat, …
20. Visual Region Model
Returns the visually most similar areas, which are represented by the mean feature vector of all training images and videos of the respective area.
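A minimal sketch of this nearest-mean matching, assuming toy two-dimensional descriptors and invented region names; real descriptors (color histograms, Tamura, etc.) are much higher-dimensional.

```python
# Each region is represented by the mean feature vector of its training
# images; a query descriptor is assigned to the region with the nearest mean.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

region_features = {
    "coast":  [[0.9, 0.1], [0.8, 0.2]],
    "forest": [[0.1, 0.9], [0.2, 0.8]],
}
means = {r: mean(vs) for r, vs in region_features.items()}

def nearest_region(query):
    return min(means, key=lambda r: dist2(means[r], query))

print(nearest_region([0.85, 0.15]))  # coast
```

Storing one mean per region instead of every training image is what keeps the visual lookup cheap.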
21. What is meant by Spatial Segmentation?
The world map is iteratively divided into segments of different sizes
Each segment is treated as a class in our probabilistic model
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “How Spatial Segmentation Improves the Multimodal Geo-Tagging”, Working Notes Proceedings of MediaEval 2012]
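The iterative subdivision can be sketched as a quadtree-style split: a segment is divided into four quadrants while it holds more than a capacity of training items. The capacity and sample points are assumptions for illustration.

```python
def segment(bounds, points, cap=2):
    """bounds = (lat_min, lat_max, lon_min, lon_max); returns leaf cells."""
    if len(points) <= cap:
        return [bounds]
    la0, la1, lo0, lo1 = bounds
    lam, lom = (la0 + la1) / 2, (lo0 + lo1) / 2
    leaves = []
    for qla in ((la0, lam), (lam, la1)):      # split along a parallel
        for qlo in ((lo0, lom), (lom, lo1)):  # split along a meridian
            sub = [p for p in points
                   if qla[0] <= p[0] < qla[1] and qlo[0] <= p[1] < qlo[1]]
            leaves += segment((qla[0], qla[1], qlo[0], qlo[1]), sub, cap)
    return leaves

# Invented sample points: two near Berlin, one near Paris, two near New York.
pts = [(52.5, 13.3), (52.4, 13.4), (48.8, 2.3), (40.7, -74.0), (40.8, -73.9)]
cells = segment((-90.0, 90.0, -180.0, 180.0), pts)
print(len(cells))  # 16 leaf segments of varying size
```

Dense areas end up with many small segments and sparse areas with few large ones, which is exactly what lets the classifier use fine granularity only where training data supports it.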
22. Fusion: Example
Confidence scores of the visual approach (right) are restricted to the most likely spatial segment determined by the textual approach (left).
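This restriction step can be sketched as a filter over cell scores; the cell identifiers and score values below are invented for illustration.

```python
# Fusion sketch: visual confidence scores are kept only for cells inside
# the most likely segment chosen by the textual model.

visual_scores = {  # cell id -> visual similarity score (invented)
    "cell_paris": 0.62,
    "cell_texas": 0.71,
    "cell_shanghai": 0.40,
}
textual_segment = {"cell_paris"}  # winner of the textual region model

fused = {c: s for c, s in visual_scores.items() if c in textual_segment}
best = max(fused, key=fused.get)
print(best)  # cell_paris: the visually stronger cell_texas is ruled out
```

This is also where the computational saving comes from: the visual model only needs to score the cells inside the textual segment, not the whole world.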
23. Results
[UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres: “A visual approach for video geocoding using bag-of-scenes”, ICMR '12
[QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo: “Geo-tagging online videos using semantic expansion and visual analysis”
24. Conclusion
Hierarchical approach for automatic estimation of geo-tags in social media websites
Detailed analysis of textual and visual features using different spatial granularities (national border detection)
Fusion of textual and visual methods is important to eliminate geographical ambiguities
Reduced computing time in the subsequent classification step
Half of the test set is correctly located within a radius of 10 km
25. Web demonstrator
http://geotagging.de.im
26. Geo-Location Human Baseline Project
27. Geo-Location Human Baseline Project
http://geotagging.de.im/game.php
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location”, in ACM Workshop on Crowdsourcing for Multimedia, held in conjunction with ACM Multimedia 2012]
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation”, in Multimedia Communications Technical Committee, IEEE Communications Society, Vol. 8, No. 1, January 2013]
28. Object Detection
Frame 370
Frame 35
29. Augmented Object Detection
OpenCV for Android
Detectors/descriptors: FAST, ORB, BRISK, SURF
Geo-referenced database
Matching times (business card example):
CPU: 192 ms
GPU: 87 ms
Android: 9990 ms
30. Object Detection
Depth Map Matching Map
31. Graph-based Object Detection
Matching
32. DFG Proposal
Housebreaking, Cyber-Stealing, Cyber-Mobbing, Cyber-Stalking
33. DFG Proposal: Geo-Privacy
34. Question
Thanks for your attention.
Dipl.-Ing. Pascal Kelm
Communication Systems Group
Technische Universität Berlin
Sekr. EN1, Einsteinufer 17
10587 Berlin, Germany
E-mail: Kelm@nue.tu-berlin.de
Phone: (+49) 30 / 314 28504
35. DFG: Geo-Tagging
36. Spatial Segmentation
38. Spatial Segmentation
39. Extracted geo. items
kauii
hawaii
usa
00001: hawaii, kauai, usa
40. Textual Features + Naive Bayes
41. Visual Features
What will you do if you do not have any textual information?
42. Fusion
Textual Region Model: Region 1, Region 2, Region 3, …, Region N
Visual Region Model: Region 1, Region 2, Region 3, …, Region N
Geographical Boundaries Extraction
Ranking: Pic1, Pic2, Pic3 ranked over the candidate regions (Region 2 … Region 6)