Pascal Kelm
                       Kelm@nue.tu-berlin.de
                       Communication Systems Group
www.nue.tu-berlin.de   Technische Universität Berlin

                                                       Thursday, 24 January 2013
Overview                                                                       2




     Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
Motivation – Where in the world is it?                                           3




Example                                                                              4


http://www.flickr.com/photos/zebandrews/7414117752/in/pool-18038320@N00/




             Fact: only 3% of the content on online sharing
             platforms is available with geographic
             coordinates (latitude, longitude)




State of the Art                                                                                                  5


                   How would you estimate the location of unknown content?




   Textual information
     Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, …
     - Gazetteers (e.g. geonames.org)
     - Textual similarity: finding the similarity to a group of toponyms

   Visual information
     - Low-level features: propagate the location by finding a visually
       similar image (features: texture, color, shape, …)
     - Local features: interest points on the object can be extracted to
       provide a "feature description" of the object (features: SIFT, SURF, etc.)

• [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA workshop 2012]
• [Pascal Kelm et al.: “Georeferencing in Social Networks“ in Social Media Retrieval, Springer, 2012]

Relevant Research 1                                                                 6



2008: James Hays, Alexei A. Efros. IM2GPS: estimating geographic
information from a single image. Proceedings of the IEEE Conf. on
Computer Vision and Pattern Recognition (CVPR; „Where am I?“)




  - Purely data-driven scene-matching approach (over 6 million GPS-tagged
    images, 5 low-level descriptors) → visual ambiguity
  - Low precision, high computational cost
    (cluster of 400 processors → 3 days)

Relevant Research 2                                                                 7



2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: Placing
Flickr Photos on a Map. In: 32nd International ACM SIGIR




                                 Images with the “palma” tag falsely mapped near
                                 Palma de Mallorca, Spain


  Textually annotated language model (ranking)
    - Geographical / textual ambiguity
    - High precision
    - High computational cost



Research Question                                                                8




 - What are the limitations of an automatic algorithm?
 - Which feature (text, video) performs best?
 - Can fusion eliminate geographical ambiguity?
 - Do we need a CPU cluster to estimate the location?
 - Does low computational cost imply low precision?
 - Can a human estimate the location of a video using textual,
   visual and audio information?




Placing Task                                                                                               9




    The task requires participants to assign
    geographical coordinates to each provided
    test video. Participants can make use of
    metadata and audio and visual features as
    well as external resources.




      Organizers:
        Pascal Kelm, TU Berlin
        Adam Rae, Yahoo! Research

[Adam Rae, Pascal Kelm: “Working Notes for the Placing Task at MediaEval 2012”, Working Notes Proceedings (ISSN 1613-0073) of MediaEval 2012]


Image Distribution
Flickr Database:
    3.6 million training images
    10,000 training videos
    5,091 test videos
Descriptors:
    1.   Color and Edge Directivity Descriptor
    2.   Gabor
    3.   Fuzzy Color and Texture Histogram
    4.   Color Histogram
    5.   Scalable Color
    6.   Auto Color Correlogram
    7.   Tamura
    8.   Edge Histogram
    9.   Color Layout

Metadata:
All information about the uploader and the video

Overview Framework                                                                                         11




          National borders extracted from the metadata
          Textual and visual features are used in a hierarchical
          framework to predict the most likely location

[Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “Multimodal Geo-tagging in Social Media Websites using Hierarchical
Spatial Segmentation” Proceedings of the 20th ACM SIGSPATIAL 2012]


Collaborative Systems: Example                                                     12




這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…




Geographical Ambiguity                                                              13



         這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…


Which language is it?
         Chinese: “This was my last trip to Paris. I visited the castle in Disneyland…”

Which words give us information? Tags?
         Trip, Paris, Castle, Disneyland

Which of these nouns carry geographical information?
         Paris, Disneyland




Geographical Ambiguity                                                                                       14




Candidate countries per toponym (gazetteer lookup):

    Paris      → France, Canada, USA, Puerto Rico, …
    Disneyland → China, USA, France, …

The detected country is the one with the highest rank sum over all toponyms:

    c_detected = arg max_{c ∈ {c_0, …, c_m}} Σ_{j=0..N−1} R_j(c)

    R(c_i) = rank sum
    c_i    = countries
    N      = number of toponyms


• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “A Hierarchical, Multi-modal Approach for Placing Videos on the Map
using Millions of Flickr Photographs” ACM Multimedia 2011]
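The rank-sum decision above can be sketched in a few lines of Python. The candidate country lists and the scoring function are illustrative assumptions, not real gazetteer output:

```python
# Hypothetical sketch of rank-sum country detection: every toponym
# contributes a rank score R_j(c) for each candidate country c, and the
# detected country maximises the sum over all N toponyms.
def detect_country(toponym_candidates):
    """toponym_candidates: one ranked candidate-country list per toponym,
    best match first."""
    scores = {}
    for candidates in toponym_candidates:
        n = len(candidates)
        for rank, country in enumerate(candidates):
            # Better (lower) rank gives a higher score: R_j(c) = n - rank.
            scores[country] = scores.get(country, 0) + (n - rank)
    return max(scores, key=scores.get)

# "Paris" and "Disneyland" each yield several candidate countries
# (invented lists); France ranks high in both and wins the rank sum.
print(detect_country([
    ["France", "USA", "Canada"],           # Paris
    ["France", "USA", "China", "Japan"],   # Disneyland
]))  # → France
```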


Overview Framework                                                               15




 National borders extracted from the metadata
 Textual and visual features are used in a hierarchical
 framework to predict the most likely location



Example                                                                             16




http://www.flickr.com/photos/62285085@N00/3484324495
Textual Region Model                                                                17




      Segmenting the world map into regions according to the
      meridians and parallels
      Stemming: reducing inflected words to their root form

      Example (Bounds Crossing, Florida, USA):

      Text               Porter stemmer
      Bream Vortex       Bream Vortex
      Swimming           Swim
      Ocean              Ocean
      Beach              Beach
      Springs Vortex     Springs Vortex
      Scuba Diving       Scuba Dive
      Scuba Underwater   Scuba Underwat
      …                  …
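A toy illustration of what stemming does, handling only the suffixes from the examples above. This is NOT the full Porter algorithm; a real system would use an off-the-shelf Porter stemmer implementation:

```python
# Crude suffix-stripping sketch in the spirit of the Porter stemmer.
# Only the suffix rules needed for the slide's examples are included.
def crude_stem(word):
    for suffix, replacement in (("ming", ""), ("ving", "ve"), ("er", "")):
        if word.lower().endswith(suffix) and len(word) >= len(suffix) + 2:
            return word[:-len(suffix)] + replacement
    return word

for tag in ["Swimming", "Ocean", "Beach", "Diving", "Underwater"]:
    print(tag, "->", crude_stem(tag))
# Swimming -> Swim, Ocean -> Ocean, Beach -> Beach,
# Diving -> Dive, Underwater -> Underwat
```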




Textual Region Model                                                                              18


Term-location distribution (Laplace-smoothed):

    P(t | l) = (N_t,l + 1) / Σ_{t' ∈ V} (N_t',l + 1)

Term frequency-inverse document frequency:

    tfidf_t = N_t,l · log(N / n_t)

Most likely location for a tagged document d:

    P(l | d) = max_l Σ_{i=0..N} log P(t_i | l)
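A minimal sketch of this region model, with toy tag counts standing in for the real training statistics (all names and counts are invented for illustration):

```python
import math

# Laplace-smoothed term-region probability, matching
# P(t|l) = (N_{t,l} + 1) / sum_{t' in V} (N_{t',l} + 1).
def p_tag_given_region(tag, region, counts, vocabulary):
    num = counts.get((tag, region), 0) + 1
    den = sum(counts.get((t, region), 0) + 1 for t in vocabulary)
    return num / den

# The region for a tagged video is the argmax of the summed
# log-probabilities of its tags.
def most_likely_region(tags, regions, counts, vocabulary):
    return max(regions, key=lambda l: sum(
        math.log(p_tag_given_region(t, l, counts, vocabulary))
        for t in tags))

# Toy counts: "beach" and "scuba" dominate region A, "castle" region B.
counts = {("beach", "A"): 8, ("scuba", "A"): 5, ("castle", "B"): 9}
vocab = {"beach", "scuba", "castle"}
print(most_likely_region(["beach", "scuba"], ["A", "B"], counts, vocab))  # → A
```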




Textual Region Model                                                                  19



Bernoulli model:

    P(t | c) = (N_t,c + 1) / Σ_{t' ∈ V} (N_t',c + 1)

    t = tag
    c = class / region

    Bream Vortex
    Swim
    Ocean
    Beach
    Springs Vortex
    Scuba Dive
    Scuba Underwat
    …




Visual Region Model                                                               20




  Returns the visually most similar areas, each represented by the mean
  feature vector of all training images and videos of the respective area




What is meant by Spatial Segmentation?                                                                   21




          The world map is iteratively divided into segments of
          different sizes
          Each segment is considered as a class for our probabilistic
          model




• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “How Spatial Segmentation improves the Multimodal Geo-Tagging”
Working Notes Proceedings of the MediaEval 2012]
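One way to picture the iterative subdivision is a quadtree-style split of the lat/lon plane. The split criterion here (a fixed per-cell point limit) is an illustrative assumption; the actual system's criterion may differ:

```python
# Quadtree-style subdivision: cells holding more than `limit` training
# points split into four sub-cells, yielding segments of different sizes.
def subdivide(points, south, west, north, east, limit=2):
    inside = [(la, lo) for la, lo in points
              if south <= la < north and west <= lo < east]
    if len(inside) <= limit:
        return [(south, west, north, east)]  # leaf segment
    mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
    cells = []
    for s, n in ((south, mid_lat), (mid_lat, north)):
        for w, e in ((west, mid_lon), (mid_lon, east)):
            cells += subdivide(inside, s, w, n, e, limit)
    return cells

# Five points clustered in the north-east quadrant force extra splits there.
pts = [(40, 10), (41, 11), (42, 12), (41, 12), (-30, -60)]
cells = subdivide(pts, -90, -180, 90, 180)
print(len(cells))
```

Densely covered areas end up with many small segments, sparse areas with a few large ones.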

Fusion: Example                                                                  22




 Confidence scores of the visual approach (right) are
 restricted to the most likely spatial segment
 determined by the textual approach (left)
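The fusion rule can be sketched as follows; segment and location names, and all scores, are invented for illustration:

```python
# Fusion sketch: the textual model picks the most likely segment, and the
# visual ranking is only evaluated for locations inside that segment.
def fuse(textual_scores, visual_scores):
    best_segment = max(textual_scores, key=textual_scores.get)
    # Restrict the visual confidence scores to the chosen segment.
    in_segment = {loc: s for loc, s in visual_scores.items()
                  if loc[0] == best_segment}
    return max(in_segment, key=in_segment.get)

textual = {"Europe-cell-17": 0.8, "Asia-cell-03": 0.2}
visual = {("Europe-cell-17", "Paris"): 0.6,
          ("Europe-cell-17", "Lyon"): 0.3,
          ("Asia-cell-03", "Shanghai"): 0.9}
# Shanghai has the highest visual score overall, but lies outside the
# textually most likely segment, so Paris wins.
print(fuse(textual, visual))
```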


Results                                                                                                              23




[UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres: A visual approach for video geocoding using bag-of-scenes. ICMR '12
[QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo: Geo-tagging online videos using semantic expansion and visual analysis.

Conclusion                                                                      24




 - hierarchical approach for the automatic estimation of
   geo-tags in social media websites
 - detailed analysis of textual and visual features using
   different spatial granularities (national border detection)
 - fusion of textual and visual methods is important: it
   eliminates geographical ambiguities and reduces the
   computing time in the subsequent classification step
 - half of the test set is correctly located within a
   radius of 10 km
Web demonstrator                                                                25




                       http://geotagging.de.im


Geo-Location Human Baseline Project                                             26




Geo-Location Human Baseline Project                                                                             27


                                    http://geotagging.de.im/game.php




• [Gottlieb, Choi, Kelm, Friedland, Sikora: “Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location”, in ACM Workshop on Crowdsourcing for Multimedia, held in conjunction with ACM Multimedia 2012]
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation”, in Multimedia Communications Technical Committee, IEEE Communications Society, Vol. 8, No. 1, January 2013]


Object Detection                                                                28




                 Frame 370




                                                           Frame 35




Augmented Object Detection                                                       29



OpenCV for Android:
  - FAST
  - ORB
  - BRISK

Geo-referenced database

SURF timings (business-card example):
  - CPU: 192 ms
  - GPU: 87 ms
  - Android: 9990 ms
Object Detection                                                                 30




    Depth Map                                Matching Map

Graph-based Object Detection                                                     31




Matching




DFG Proposal                                                                     32




   Housebreaking                                  Cyber-Stealing




   Cyber-Mobbing                                   Cyber-Stalking

DFG Proposal: Geo-Privacy                                                       33




Question                                                                           34




Thanks for your attention.

Dipl.- Ing. Pascal Kelm
   Communication Systems Group
   Technische Universität Berlin
   Sekr. EN1, Einsteinufer 17
   10587 Berlin, Germany

  E-mail: Kelm@nue.tu-berlin.de
  Phone: (+49) 30 / 314 28504




DFG: Geo-Tagging                                                               35




Spatial Segmentation                                                            36




Twitter-based Placing Sub-Task (New York)                                       37




Spatial Segmentation                                                            38




Extracted geo. items                                                              39




(Map with extracted geographical items: kauai, hawaii, usa)

00001: hawaii, kauai, usa


Textual Features + Naive Bayes                                                  40




Visual Features                                                                   41




What will you do if you do not have any textual information?




Fusion                                                                                   42




Textual Region Model:
    Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7 | Region 8 | … | Region N

Visual Region Model:
    Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7 | Region 8 | … | Region N

Geographical Boundaries Extraction

Ranking (Pic1, Pic2, Pic3 ranked within the remaining regions):
    Region 2 | Region 3 | Region 4 | Region 5 | Region 6





More Related Content

Similar to Kelm überblick 2013

GeoAlberta keynote
GeoAlberta keynoteGeoAlberta keynote
GeoAlberta keynotePeter Batty
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthManuel Nieves Sáez
 
DRCOG: The Geospatial Revolution Peter Batty
DRCOG: The Geospatial Revolution Peter BattyDRCOG: The Geospatial Revolution Peter Batty
DRCOG: The Geospatial Revolution Peter BattyPeter Batty
 
NCGIC The Geospatial Revolution
NCGIC The Geospatial RevolutionNCGIC The Geospatial Revolution
NCGIC The Geospatial RevolutionPeter Batty
 
What multimodal foundation models cannot perceive
What multimodal foundation models cannot perceiveWhat multimodal foundation models cannot perceive
What multimodal foundation models cannot perceiveUniversity of Amsterdam
 
Deep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDeep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDong Heon Cho
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureRouyun Pan
 
Dario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringDario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringAdvanced-Concepts-Team
 
Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...
Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...
Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...Juxi Leitner
 
Minnesota GIS/LIS The Geospatial Revolution Peter Batty
Minnesota GIS/LIS The Geospatial Revolution Peter BattyMinnesota GIS/LIS The Geospatial Revolution Peter Batty
Minnesota GIS/LIS The Geospatial Revolution Peter BattyPeter Batty
 
Presentation notes: Gartner; Towards Ubiquitous Cartography
Presentation notes: Gartner; Towards Ubiquitous CartographyPresentation notes: Gartner; Towards Ubiquitous Cartography
Presentation notes: Gartner; Towards Ubiquitous Cartographyalexanno
 
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...Paris Open Source Summit
 
Openstreetmap Opendata
Openstreetmap OpendataOpenstreetmap Opendata
Openstreetmap Opendatadirkmunson
 
Introduction talk to Computer Vision
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision Chen Sagiv
 
AI Tech. session, Synaplexus Presentation
AI Tech. session, Synaplexus Presentation AI Tech. session, Synaplexus Presentation
AI Tech. session, Synaplexus Presentation EITESAL NGO
 

Similar to Kelm überblick 2013 (20)

GeoAlberta keynote
GeoAlberta keynoteGeoAlberta keynote
GeoAlberta keynote
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in Depth
 
Sccg Many Projects Layout03
Sccg Many Projects Layout03Sccg Many Projects Layout03
Sccg Many Projects Layout03
 
DRCOG: The Geospatial Revolution Peter Batty
DRCOG: The Geospatial Revolution Peter BattyDRCOG: The Geospatial Revolution Peter Batty
DRCOG: The Geospatial Revolution Peter Batty
 
NCGIC The Geospatial Revolution
NCGIC The Geospatial RevolutionNCGIC The Geospatial Revolution
NCGIC The Geospatial Revolution
 
What multimodal foundation models cannot perceive
What multimodal foundation models cannot perceiveWhat multimodal foundation models cannot perceive
What multimodal foundation models cannot perceive
 
Deep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDeep Learning AtoC with Image Perspective
Deep Learning AtoC with Image Perspective
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 
Gps Technology
Gps TechnologyGps Technology
Gps Technology
 
Dario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringDario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineering
 
AR
ARAR
AR
 
Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...
Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...
Mars Terrain Image Classification Using Cartesian Genetic Programming #isaira...
 
Minnesota GIS/LIS The Geospatial Revolution Peter Batty
Minnesota GIS/LIS The Geospatial Revolution Peter BattyMinnesota GIS/LIS The Geospatial Revolution Peter Batty
Minnesota GIS/LIS The Geospatial Revolution Peter Batty
 
Presentation notes: Gartner; Towards Ubiquitous Cartography
Presentation notes: Gartner; Towards Ubiquitous CartographyPresentation notes: Gartner; Towards Ubiquitous Cartography
Presentation notes: Gartner; Towards Ubiquitous Cartography
 
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
 
Openstreetmap Opendata
Openstreetmap OpendataOpenstreetmap Opendata
Openstreetmap Opendata
 
AR/SLAM and IoT
AR/SLAM and IoTAR/SLAM and IoT
AR/SLAM and IoT
 
Introduction talk to Computer Vision
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision
 
Raskar Computational Camera Fall 2009 Lecture 01
Raskar Computational Camera Fall 2009 Lecture 01Raskar Computational Camera Fall 2009 Lecture 01
Raskar Computational Camera Fall 2009 Lecture 01
 
AI Tech. session, Synaplexus Presentation
AI Tech. session, Synaplexus Presentation AI Tech. session, Synaplexus Presentation
AI Tech. session, Synaplexus Presentation
 

Kelm überblick 2013

  • 1. Pascal Kelm Kelm@nue.tu-berlin.de Communication Systems Group www.nue.tu-berlin.de Technische Universität Berlin Thursday, 24 January 2013
  • 2. Overview 2 Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 3. Motivation – Where in the world is it? 3 Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 4. Example 4 http://www.flickr.com/photos/zebandrews/7414117752/in/pool-18038320@N00/ Fact: only 3% of the content in online sharing plattforms is available with geographic coordinates (latitude, longitude) Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 5. State of the Art 5 How would you estimate the location of an unknown content? Textual information Visual information Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, … Local features Low-level features - interesting points on the Gazetteers Textual similarity - Propagate the location object can be extracted to - like geonames.org - Finding the similarity by finding a visual similar provide a "feature to a group of typonyms Image description“ of the object -Features: texture, color, - Features: SIFT, SURF shape… etc. • [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA workshop 2012] • [Pascal Kelm et al.: “Georeferencing in Social Networks“ in Social Media Retrieval, Springer, 2012] Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 6. Relevant Research 1 6 2008: James Hays, Alexei A. Efros. IM2GPS: estimating geographic information from a single image. Proceedings of the IEEE Conf. On Computer Vision and Pattern Recognition (CVPR, „Where am I ?“) Purely data-driven scene matching approach (over 6 million GPS- tagged images, 5 low-level descriptors)  Visual ambiguity Low precision, high computational cost  (cluster of 400 processors  3 days) Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 7. Relevant Research 2 7 2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: Placing Flickr Photos on a Map. In: 32nd International ACM SIGIR Images with “palma" tag falsely mapped near Palma de Mallorca, Spain Textual annotated language model (ranking)  Geographical / textual ambiguity  High precision  High computational cost Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 8. Research Question 8 What is the limitation of an automatic algorithm? Which feature (text, video) performs best? Is a fusion possible to eliminate geographical ambiguity? Do I need a CPU-cluster to estimate the location? Low performance  low precision? Is it possible for a human to estimate the location of a video using textual, visual and audio information? Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 9. Placing Task 9 The task requires participants to assign geographical coordinates to each provided test video. Participants can make use of metadata and audio and visual features as well as external resources. Organizers: Pascal Kelm, TU Berlin Adam Rae, Yahoo! Research [Adam Rae, Pascal Kelm “Working Notes for the Placing Task at MediaEval 2012” Working Notes Proceedings (ISSN 1613- 0073) of the MediaEval 2012] Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 10. Image Distribution Flickr Database: 3,6 million training images 10.000 trainings videos 5091 test videos Descriptors: 1. Color and Edge Directivity Descriptor 2. Gabor 3. Fuzzy Color and Texture Histogram 4. Color Histogram 5. Scalable Color 6. Auto Color Correlogram 7. Tamura 8. Edge Histogram 9. Color Layout Metadata: All Inforamtion about uploader + video Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 11. Overview Framework 11 National borders extracted from the metadata Textual and visual features are used in a hierarchical framework to predict the most likely location [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation” Proceedings of the 20th ACM SIGSPATIAL 2012] Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 12. Collaborative Systems: Example 12 這是我上次去巴黎。在那裡,我得 到了我的城堡在迪斯尼樂園看。… 這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。 Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 13. Geographical Ambiguity 13 這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。… Which language is it? Chinese This was my last trip to Paris. I visited the castle in Disneyland… Which words gives us information? Tags? Trip, Paris, Castle, Disneyland Which of these nouns have got geographical information? Paris, Disneyland Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 14. Geographical Ambiguity 14 Paris Disneyland N 1 R j (c0 ) j 0 France China c det ected arg max ... N 1 R j (cm ) Canada USA j 0 Puerto R(ci) = Rank sum France Rico ci = Countries N = Number of toponym … … • [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs” ACM Multimedia 2011] Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 15. Overview Framework 15 National borders extracted from the metadata Textual and visual features are used in a hierarchical framework to predict the most likely location Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 16. Example 16 http://www.flickr.com/photos/62285085@N00/3484324495 Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 17. Textual Region Model 17 Segmenting the world map into regions according to the meridians and parallels Stemming: reducing inflected words to their root form Bounds Crossing, Florida, USA Text Porter Stemmer Bream Vortex Bream Vortex Swimming Swim Ocean Ocean Beach Beach Springs Vortex Springs Vortex Scuba Diving Scuba Dive Scuba Underwater Scuba Underwat … … Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
  • 18. Textual Region Model 18 N t ,l 1 Term-location-distribution: P (t | l ) N t ', l 1 t' V Term frequency-inverse document frequency: N tfidf t N t , l log nt N P (l | d ) max log Pi ( t | l ) i 0 Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
• 19. Textual Region Model Bernoulli model: P(t | c) = (N_{t,c} + 1) / Σ_{t'∈V} (N_{t',c} + 1), where t is a tag and c is a class (region). Stemmed example tags: Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat, …
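The smoothed region model on this and the previous slide can be sketched as a small naive Bayes classifier. This is an illustrative toy (function names and the training-data layout are assumptions), using the same Laplace-smoothed P(t | c) and the log-likelihood maximisation from slide 18:

```python
import math
from collections import defaultdict

def train(region_tags):
    """region_tags: {region: list of (stemmed) tags of its training items}.
    Returns per-region tag counts N_{t,c} and the vocabulary V."""
    counts = {r: defaultdict(int) for r in region_tags}
    vocab = set()
    for r, tags in region_tags.items():
        for t in tags:
            counts[r][t] += 1
            vocab.add(t)
    return counts, vocab

def log_p(tag, region, counts, vocab):
    # Laplace-smoothed P(t | c) = (N_{t,c} + 1) / sum_{t' in V} (N_{t',c} + 1)
    num = counts[region][tag] + 1
    den = sum(counts[region][t] + 1 for t in vocab)
    return math.log(num / den)

def predict(tags, counts, vocab):
    # argmax over regions of the summed log-likelihood of the tags
    return max(counts, key=lambda r: sum(log_p(t, r, counts, vocab) for t in tags))
```

The add-one smoothing keeps P(t | c) non-zero for tags never observed in a region, so a single unseen tag cannot veto an otherwise well-matching region.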
• 20. Visual Region Model Returns the visually most similar areas, each represented by the mean feature vector of all training images and videos of that area.
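Once each area is summarised by a mean feature vector, ranking areas by visual similarity reduces to a nearest-vector search. A minimal sketch with Euclidean distance (the distance choice and the helper name are assumptions; the slide does not specify the metric):

```python
import math

def nearest_region(query, region_means):
    """query: feature vector of the test item.
    region_means: {region: mean feature vector of its training items}.
    Returns the region whose mean vector is closest to the query."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(region_means, key=lambda r: dist(query, region_means[r]))
```

Comparing against one mean vector per area, rather than against every training image, is what keeps this step tractable at the scale of millions of Flickr photographs.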
• 21. What is meant by Spatial Segmentation? The world map is iteratively divided into segments of different sizes; each segment is treated as a class in our probabilistic model. [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: "How Spatial Segmentation improves the Multimodal Geo-Tagging", Working Notes Proceedings of MediaEval 2012]
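One common way to obtain segments of different sizes is to split a cell recursively while it still contains many training items, so dense areas (cities) get small cells and sparse areas (oceans) stay coarse. A sketch under that assumption (the quadtree-style `split` function is illustrative, not necessarily the authors' exact scheme):

```python
def split(cell, items, max_items=2):
    """cell: (lat0, lat1, lon0, lon1) bounding box.
    items: list of (lat, lon) training coordinates.
    Recursively quarter the cell while it holds more than max_items
    training items; returns the list of leaf cells."""
    lat0, lat1, lon0, lon1 = cell
    inside = [(la, lo) for la, lo in items
              if lat0 <= la < lat1 and lon0 <= lo < lon1]
    if len(inside) <= max_items:
        return [cell]
    midla, midlo = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    leaves = []
    for sub in [(lat0, midla, lon0, midlo), (lat0, midla, midlo, lon1),
                (midla, lat1, lon0, midlo), (midla, lat1, midlo, lon1)]:
        leaves += split(sub, inside, max_items)
    return leaves
```

Each resulting leaf cell then serves as one class for the probabilistic region models.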
• 22. Fusion: Example The confidence scores of the visual approach (right) are restricted to the most likely spatial segment determined by the textual approach (left).
• 23. Results [UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres: "A visual approach for video geocoding using bag-of-scenes", ICMR '12. [QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo: "Geo-tagging online videos using semantic expansion and visual analysis".
• 24. Conclusion A hierarchical approach for the automatic estimation of geo-tags in social media websites; a detailed analysis of textual and visual features using different spatial granularities (national border detection); fusion of the textual and visual methods is important to eliminate geographical ambiguities and reduces the computing time in the subsequent classification step; half of the test set is correctly located within a radius of 10 km.
• 25. Web demonstrator http://geotagging.de.im
• 26. Geo-Location Human Baseline Project
• 27. Geo-Location Human Baseline Project http://geotagging.de.im/game.php [Gottlieb, Choi, Kelm, Friedland, Sikora: "Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location", ACM Workshop on Crowdsourcing for Multimedia, held in conjunction with ACM Multimedia 2012] [Gottlieb, Choi, Kelm, Friedland, Sikora: "On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation", Multimedia Communications Technical Committee, IEEE Communications Society, Vol. 8, No. 1, January 2013]
• 28. Object Detection Frame 370, Frame 35
• 29. Augmented Object Detection OpenCV for Android with the FAST, ORB, BRISK and SURF detectors/descriptors, matched against a geo-referenced database. Matching times for the business card example: CPU 192 ms, GPU 87 ms, Android 9990 ms.
• 30. Object Detection Depth Map, Matching Map
• 31. Graph-based Object Detection Matching
• 32. DFG Proposal Housebreaking, Cyber-Stealing, Cyber-Mobbing, Cyber-Stalking
• 33. DFG Proposal: Geo-Privacy
• 34. Questions Thanks for your attention. Dipl.-Ing. Pascal Kelm, Communication Systems Group, Technische Universität Berlin, Sekr. EN1, Einsteinufer 17, 10587 Berlin, Germany. E-mail: Kelm@nue.tu-berlin.de, Phone: (+49) 30 / 314 28504
• 35. DFG: Geo-Tagging
• 36. Spatial Segmentation
• 37. Twitter-based Placing Sub-Task (New York)
• 38. Spatial Segmentation
• 39. Extracted geographical items Raw tags "kauii hawaii usa" → extracted items 00001: hawaii, kauai, usa
• 40. Textual Features + Naive Bayes
• 41. Visual Features What can you do if there is no textual information?
• 42. Fusion The textual region model and the visual region model each rank Regions 1…N. After geographical boundary extraction, the final ranking keeps only the regions consistent with both models (e.g. Regions 2–6), together with the most similar training pictures (Pic1, Pic2, Pic3).
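The fusion step described above can be sketched as restricting the visual ranking to the regions the textual model rates highest. This is a simplified illustration (the helper name, the dict-based score layout, and the cutoff `k` are assumptions), consistent with slide 22's description of the visual scores being restricted to the textually most likely segment:

```python
def fuse(text_scores, visual_scores, k=5):
    """text_scores / visual_scores: {region: confidence}.
    Keep only the k regions ranked highest by the textual model,
    then order those by visual confidence (highest first)."""
    top = sorted(text_scores, key=text_scores.get, reverse=True)[:k]
    return sorted(top, key=lambda r: visual_scores.get(r, 0.0), reverse=True)
```

Restricting the visual search space this way both removes geographically ambiguous candidates and cuts the computing time of the visual classification step.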