Geographic Information Retrieval
An Overview
Problem Statement:
Introduction 

● Geographic Information Retrieval can be seen as a specialized branch of traditional
  Information Retrieval.

● Information that relates to geographic space is called georeferenced information, a
  term used frequently in Geographic Information Retrieval.

● Georeferenced information appears in all kinds of media, e.g. structured data such as
  maps, land surveys, airborne and satellite images, and tabulated observations.

● It can also be used by researchers looking for a certain area, for example an area
  inhabited by certain animals or affected by an epidemic.
Properties of Georeferenced Information:

● Much of the information available in digital libraries and on the Internet is
  georeferenced, although most of it is not expressed in geographic coordinates.

● The geographical location and extent of a place name is often called its geographic
  footprint, and it is given by coordinates (longitude, latitude).

● Geographic Information Retrieval requires that place names and phrases that include
  direct or indirect references to place names be resolved and translated into footprints
  that can be indexed.
General Problems in GIR:
Ambiguity/Lack of precision in Place Names:

  ● Firstly, several places can share the same name, making the place names unique
    only within a limited geographic area.

  ● Secondly, some place names occurring in texts are temporal or cultural conventions
    rather than official names, so the user must understand the time, context or
    cultural environment in which a name is used to link it to a geographic location.

  ● Thirdly, some place names change over time, e.g. Bangalore to Bengaluru, Calcutta to
    Kolkata.

  ● Fourthly, the geographic extent that a place name denotes can be extended,
    reduced or changed over time.
General Problems in GIR: (contd.)
    ○ Fifthly, the borders of a location can be fuzzy. (Kashmir?)

    ○ The same place name can be written differently in different texts, either because the
      author has misspelled the name or because there are several accepted spellings of the
      same place name.


Fuzzy information:

    ○ “About 200 kilometers south of the capital of Russia”: both the direction and the
      distance may vary. For South Africa, which has three capitals, such a phrase is
      also ambiguous.

    ○ Often, people are imprecise in giving geographic direction, using one of the four
      general directions north, south, east or west, when the actual direction might be
      somewhere in between.
Impact of cognitive model on Geographic Information Retrieval
● Human understanding of a geographic location is either procedural or survey based.

● Survey: Involves looking at maps to find geographic locations.

● Procedural: Involves exploring and navigating through the place so as to get the 'feel'
  of it.

● Using procedural knowledge to locate or gain information is particularly difficult, as
  its descriptions contain many phrases that carry human ambiguity.
Cognitive model (continued)

● 'People link geographic distance with time.': When talking about going from, say,
  'A' to 'B', people tend to use time as a measure of distance, e.g. "It takes
  two hours to reach 'B' from 'A' by car."


● 'Topology and metric distances': People are very good at mentioning topological
  aspects of a place, such as inclusion (e.g. naming the places that lie within an area)
  or coincidence (e.g. "this place is at the same place as ...").
● 'People have biases towards east-west or north-south direction': People have a
  biased view of geographical areas, and when giving a specific direction they often
  have only a vague sense of it. E.g. when asked where South America is with respect to
  North America, the answer is generally "south", whereas it actually lies to the south-east.
Geo-referencing using Gazetteers
Gazetteer: A form of index that relates place names to the coordinates of locations and
their extents.

Here we focus on automatic geo-referencing based on the contents of the
document's text alone.

Most automated approaches to georeferencing combine place name identification with
natural language processing. The latter identifies phrases that modify the location
pointed to by an occurrence of a place name (“200 km south of Moscow”) or that
indicate a geo-reference without actually mentioning a specific place name
(“Rosenborg's home field”).
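
As a rough illustration of how a modifier phrase could shift a gazetteer footprint, the sketch below parses a phrase of the form "200 km south of Moscow" and moves the anchor point along a compass bearing on a spherical Earth. The one-entry gazetteer, the regular expression and the function names are assumptions made for this example, not part of any project described here.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}

# Hypothetical one-entry gazetteer: name -> (latitude, longitude), approximate.
TOY_GAZETTEER = {"moscow": (55.7558, 37.6173)}

# Matches phrases such as "200 km south of Moscow" (assumed phrase shape).
OFFSET_PHRASE = re.compile(
    r"(\d+(?:\.\d+)?)\s*km\s+(north|south|east|west)\s+of\s+(?:the\s+)?(\w+)",
    re.IGNORECASE,
)


def offset_point(lat, lon, distance_km, direction):
    """Move a point a given distance along a compass bearing on a sphere."""
    bearing = math.radians(BEARINGS[direction])
    lat1, lon1 = math.radians(lat), math.radians(lon)
    d = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(d)
        + math.cos(lat1) * math.sin(d) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(d) * math.cos(lat1),
        math.cos(d) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


def resolve_offset_phrase(phrase, gazetteer=TOY_GAZETTEER):
    """Return the shifted (lat, lon), or None if the phrase cannot be resolved."""
    match = OFFSET_PHRASE.search(phrase)
    if match is None:
        return None
    distance, direction, name = match.groups()
    anchor = gazetteer.get(name.lower())
    if anchor is None:
        return None
    return offset_point(anchor[0], anchor[1], float(distance), direction.lower())


print(resolve_offset_phrase("200 km south of Moscow"))
```

The result is only a nominal point. Since both the distance and the direction in such phrases are imprecise, a real system would more likely attach a region of uncertainty around it.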
Geo-referencing (continued)

Gazetteers have three basic components: a name, a location and a feature type.

The name is the textual designator of a geographic location, the location is the coordinates
of a point, line or area on the earth’s surface pointed to by a name, and the feature type is
the type of location that a name points to (forest, agricultural area, river, inhabited
location, etc.). The location that a place name refers to (the place name's footprint) can be
given as a point, a bounding box or a polygon, all represented by coordinates.
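
A minimal sketch of such a gazetteer is given below, assuming an in-memory index keyed by lower-cased name. The class names and the two sample entries (with approximate coordinates) are illustrative assumptions. Returning a list from the lookup makes the ambiguity of place names visible.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GazetteerEntry:
    name: str                                   # textual designator
    feature_type: str                           # e.g. "inhabited location", "river"
    footprint: Tuple[Tuple[float, float], ...]  # one (lat, lon) pair or more


class Gazetteer:
    """Toy in-memory gazetteer; one name may map to several entries."""

    def __init__(self, entries):
        self._by_name = defaultdict(list)
        for entry in entries:
            self._by_name[entry.name.lower()].append(entry)

    def lookup(self, name, feature_type=None):
        hits = self._by_name.get(name.lower(), [])
        if feature_type is not None:
            hits = [e for e in hits if e.feature_type == feature_type]
        return list(hits)


# Illustrative entries with approximate centroid coordinates.
gazetteer = Gazetteer([
    GazetteerEntry("Hyderabad", "inhabited location", ((17.385, 78.4867),)),  # India
    GazetteerEntry("Hyderabad", "inhabited location", ((25.396, 68.3578),)),  # Pakistan
])

for entry in gazetteer.lookup("hyderabad"):
    print(entry.name, entry.feature_type, entry.footprint)
```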
Geo-referencing (continued)
Centroid point:




Vague in terms of geometry and size of the area.
Little data storage.
Geo-referencing (continued)
Bounding Box:




Gives a better idea of the entire referenced area.
Does not require a lot of data storage.
However it overlaps other areas around it and is inaccurate.
Geo-referencing (continued)
Approximated Polygon approach:




Most accurate in terms of referencing.
However takes a lot of data storage space.

A good compromise would lie between the polygon and bounding box approaches, such as a
polygon with a fixed number of points.
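
The trade-off between the three footprint representations can be sketched on a small L-shaped polygon. The helper names are hypothetical, the centroid is approximated by the mean of the vertices, and `fixed_point_polygon` is only one crude reading of the "fixed number of points" idea (keeping evenly spaced vertices).

```python
def centroid(polygon):
    """Approximate centroid: the mean of the vertices (cheap but coarse)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def bounding_box(polygon):
    """Axis-aligned box as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_area(polygon):
    """Shoelace formula for a simple (non self-intersecting) polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def fixed_point_polygon(polygon, k):
    """Keep at most k evenly spaced vertices (an assumed simplification rule)."""
    if k >= len(polygon):
        return list(polygon)
    step = len(polygon) / k
    return [polygon[int(i * step)] for i in range(k)]


l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
box = bounding_box(l_shape)
box_area = (box[2] - box[0]) * (box[3] - box[1])

print("centroid:", centroid(l_shape))            # a single coordinate pair
print("bounding box:", box)                      # two coordinate pairs
print("box area vs polygon area:", box_area, polygon_area(l_shape))
print("3-point polygon:", fixed_point_polygon(l_shape, 3))
```

On this shape the bounding box covers more area than the polygon itself, which is the overlap problem mentioned for the bounding box approach.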
Searching for Georeferenced Information

Letting the user specify one or more place names as keywords in a traditional keyword
based query. When parsing the query, the GIR/IR system treats the place names it finds as
special keywords that indicate the geographical scope of the user's information need.
e.g. Googling for restaurants around you




Letting users specify the geographic constraint to a query by drawing on one or more
maps.
e.g: Google Maps




and what about GPS Apps like "Here and Now", "Google Latitude"?
Searching for Georeferenced Information
Typical Queries:

     ○ Point in Polygon - asking for georeferenced information that contains,
       surrounds or refers to a particular geographic point location

     ○ Region Queries - asking for anything contained in, adjacent to, or overlapping
       the region.

     ○ Distance and Buffer Zone Queries - asking for information within some fixed
       distance of a geographic object (point, line, polygon)

     ○ Path Queries - asking for the presence of a network structure that can be
       queried for network traversal information

     ○ Multimedia Queries - combining multiple geo-referenced information sources
       in resolving a query.
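
Two of these query types can be sketched in a few lines: a point-in-polygon test using ray casting, and a distance query using the haversine great-circle distance. The function names and the sample data are assumptions for illustration; a production system would normally rely on a spatial index or a GIS library instead.

```python
import math


def point_in_polygon(point, polygon):
    """Ray casting: count how many edges a ray going in +x direction crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def within_distance(center, items, radius_km):
    """Distance query: names of items no farther than radius_km from center."""
    return [name for name, pos in items.items()
            if haversine_km(center, pos) <= radius_km]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square), point_in_polygon((5, 2), square))

# Hypothetical documents with point footprints given as (lat, lon).
docs = {"doc_near": (0.0, 0.5), "doc_far": (0.0, 3.0)}
print(within_distance((0.0, 0.0), docs, radius_km=100))
```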
Related Projects:
SPIRIT (Spatially-aware information retrieval on the internet) - funded by the EC
Fifth Framework Programme. It aims to improve search capabilities on the internet by using
geographical and conceptual ontologies to model both the vocabulary and the spatial structure
of places for purposes of IR. The ontology is envisioned as an extension of traditional
gazetteers that also covers related locations and helps rank hits based on geographic
properties.



∙ ontologies that model geographical terminology;
∙ query expansion and relevance ranking procedures based on the geographical ontologies;
∙ machine learning techniques for the extraction of geographical context from web
  documents and for generating metadata providing spatial context;
∙ a multi-modal user interface providing textual input and interactive map feedback of
  the context of retrieved documents;
∙ spatial indices for web collections
Geo-Ontologies

Ontologies relating Geographical Terminology and Spatial Relationships

  ● Reference to a geographic place: <PL-Name,PL-Type,{(x,y)}>
     ○ eg: <Charminar, Monument,{(x,y)}>
  ● Relative Place Reference : <Spatial Relationship,PL-Name, Type,PL-FP>
     ○ eg: <In, Hyderabad, City, {(x,y)}>
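
The two tuple forms above could be represented as simple record types, as in the sketch below. The class and field names are assumptions chosen to mirror the tuple slots, and the coordinates are approximate values filled in for the (x,y) placeholders.

```python
from typing import NamedTuple, Tuple

# A footprint is modelled here as a tuple of (lat, lon) pairs.
Footprint = Tuple[Tuple[float, float], ...]


class PlaceReference(NamedTuple):
    name: str            # PL-Name
    place_type: str      # PL-Type
    footprint: Footprint


class RelativePlaceReference(NamedTuple):
    relation: str        # spatial relationship, e.g. "In", "Near"
    name: str            # PL-Name
    place_type: str      # Type
    footprint: Footprint # PL-FP


charminar = PlaceReference("Charminar", "Monument", ((17.3616, 78.4747),))
in_hyderabad = RelativePlaceReference("In", "Hyderabad", "City", ((17.3850, 78.4867),))


def geographic_content(references):
    """Collect the footprint points of all place references in one list."""
    return [point for ref in references for point in ref.footprint]


print(geographic_content([charminar, in_hyderabad]))
```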

A query to SPIRIT contains one or more place references (PL-REF).

Geographic content is a set of <Place reference> expressions, and the geometric footprint
is a function of this set.

Geo-ontologies can be applied in:
1) User's query interpretation: (+ domain specific ontologies) for disambiguation of place
names
2) System query formulation: to generate alternate names and spatially associated names
3) Metadata extraction: to extract information from free-text documents to generate footprint(s)
4) Relevance ranking: potential for geographical relevance ranking (Dominos Pizza? :) )
Geo-Ontologies
         Ontology: "formal, explicit specification of a shared conceptualisation"
Geo-Ontologies

 ● Types of Atomic Queries:
     ○ A place name
     ○ An aspatial entity with relation to a place name
     ○ An aspatial entity with a spatial relation to a place name
     ○ A place name with spatial relation to a place name
     ○ A place type with spatial relation to a place name
     ○ A place type with spatial relation to a place type

 ● Geo Ontology = Geographic Feature Ontology + Geographic Type Ontology + Spatial
   Relation Ontology
User evaluation of the SPIRIT prototype gave results consistent with SPIRIT's priorities on
innovative features. Yet users reported a feeling of frustration, which highlights that their
requirements go beyond what SPIRIT achieved and that there is still more work to be
done in this area.




The last publication on the website dates back to 2005.
                        Relevance 

In Information Retrieval, relevance denotes how well a retrieved document or set of
documents meets the information need of the user.

Geographic Information Retrieval is concerned with retrieving documents in response to a
spatially related query. Thus, the ranking of documents must consider both textual and
spatial relevance.

The most common way to return a set of documents obtained from a Web query is as
a ranked list. The search engine attempts to determine which document seems to be the
most relevant to the user and puts it first in the list. In short, every document receives
a score, or distance to the query, and the returned documents are sorted by this score or
distance.

There are situations where sorting by a single score may not be the most useful approach.
When a more complex query is made, composed of more than one query term or aspect,
documents can also be returned with two or more scores instead of one.
For example, the Web search could be for campsites in the neighborhood of
Neuschwanstein, and the documents returned ideally have a score for the query
term “camping” and a score for the proximity to Neuschwanstein. This implies that a Web
document resulting from this query can be mapped to a point in the 2-dimensional plane,
where both axes represent a score. The map indicates campsites near the castle
Neuschwanstein, which is situated close to Schwangau, with the distance to the castle
on the x-axis and the rating on the y-axis.
             

Another weakness of our methods lies in the way we treat multiple-footprint documents.
While we assume that a query can have only one footprint (a user is interested in only one
location), documents may have multiple footprints (refer to more than one location).

The method we followed so far in order to calculate the spatial score considers only the
best-matching document footprint. For example, if a user is looking for “airports near
London”, a document that refers to both “Gatwick” and “Stansted” is scored as referring
only to “Gatwick” since it’s the nearer airport of the two. Such a document, however,
should be scored higher than another that refers only to “Gatwick” since it provides more
relevant information. The number of footprints occurring also matters: Gatwick’s
official web pages should be more important than a web list of all airports in the UK.
              

For high-quality ranking two things are required. Firstly, we need a good spatial score
between query and document footprints. Secondly, we need a good combination of the
spatial and textual (BM25) scores.

For finding spatial scores, the spatial relationships (distance, containment, and direction)
were converted into numeric values that indicate how close, how much inside, or how
much North-of the relationship between two objects is. Those numeric values were first
attempts at obtaining a score to quantify spatial relationships.

However, certain issues do come up in this method. For example, let us assume three
cities, A, B, and C, where A lies at equal distance (in a Euclidean sense) from B and C. If
C is bigger than B, then the score of B being close to A should be lower than that of C
being close to A. In other words, the distance scores of cities around A may depend on the
context, i.e. which other cities are around A. Also, natural barriers can influence the
concept of proximity. It matters a lot whether a distance of 10 km (as the crow flies) can be
covered by a direct road, or requires a large detour around a mountain range (or a small
road over a mountain pass).
             

In traditional information retrieval, the separate scores of each document would be
combined into a single score (e.g., by a weighted sum or product) which produces the
ranked list by sorting.

Now, we are going to incorporate two pieces of information into the way that a spatial
document score is calculated:
• The number n of unique footprints in a document.
• The frequencies f_1,…, f_n, of occurrence of the footprints in the document.
Moreover, the total spatial score of a document will be derived from fractional score
contributions of all occurring document footprints.
              

 A simple way of taking into account all document footprints is to define the total spatial
score as a linear combination (e.g. the simple average) of the individual scores of the
footprints:

S = 1/n * (s_1+…+s_n)

where s_i is the score of the ith document footprint with respect to the query
footprint. To also incorporate the frequencies of occurrence f_i, let us define the weight of
a footprint:

tf_i = 1 + log (f_i).
A footprint that occurs in the document only once will get a weight of one, whereas any extra
occurrences will increase the weight logarithmically. The total score may be calculated as

S = 1/(tf_1+…+tf_n) * (tf_1*s_1+…+tf_n*s_n),

that is the weighted average of the individual scores.
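
The two scoring formulas can be written out directly. In the sketch below the per-footprint scores and frequencies are made-up numbers for the "airports near London" example, and the logarithm is assumed to be the natural log, since the base is not stated.

```python
import math


def simple_average(scores):
    """S = 1/n * (s_1 + ... + s_n)."""
    return sum(scores) / len(scores)


def footprint_weight(frequency):
    """tf_i = 1 + log(f_i); natural log is an assumption here."""
    return 1.0 + math.log(frequency)


def weighted_spatial_score(footprints):
    """Weighted average of scores; footprints is a list of (s_i, f_i) pairs."""
    weights = [footprint_weight(f) for _, f in footprints]
    weighted_sum = sum(w * s for w, (s, _) in zip(weights, footprints))
    return weighted_sum / sum(weights)


# Hypothetical documents: (spatial score, frequency) per footprint.
gatwick_page = [(0.9, 10)]                                # one footprint, mentioned often
uk_airport_list = [(0.9, 1), (0.8, 1), (0.1, 1), (0.1, 1)]  # many footprints, once each

print(weighted_spatial_score(gatwick_page))
print(weighted_spatial_score(uk_airport_list))
```

With these illustrative numbers the page about a single nearby airport scores higher than the list, because the list's distant airports pull its average down.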
              

Considering again the example about “airports near London”, a scoring function like
the last one would score Gatwick’s official web page higher than a web list of all UK
airports. Moreover, it takes into account more than the best-matched document footprint.
The last formula may serve as a starting point for improving the spatial scoring function.
Evaluation:

2 Indicators:
1) Recall = No. of relevant docs returned / Total no. of relevant docs
2) Precision = No. of relevant docs returned / Total no. of docs returned

Trec has been evaluated using the ISO 9241 standard: based on Effectiveness (can users
find relevant docs?), Efficiency (resources consumed per result) and Satisfaction (user
feedback)
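
A small sketch of the two indicators, using the standard definitions: recall is the share of all relevant documents that were returned, and precision is the share of returned documents that are relevant. The document identifiers are made up for the example.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned and relevant doc ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(
    returned=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(precision, recall)
```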
Gazetteer Server and Service for UK Academia - James Reid
Gazetteer: Geographical dictionary or directory. Serves as a reference for information about
places.

  ● Geographic searching is a powerful information retrieval tool, because the results
    obtained are more specific.
  ● Geographic searching is restricted because geographic metadata creation is very
    resource intensive, and the geographic metadata of many resources extends only to
    names.
  ● Often the geographic footprint is not mentioned directly; there may only be a
    direct or indirect reference to the place.

    Constant change in geographic metadata:
  ● Names of places may vary.
  ● Names may have changed over time.
  ● Boundaries can be fuzzy.
  ● Names may be used only in a particular context.
GeoXwalk is a comprehensive gazetteer linking a vocabulary of
current and historical geographical names to a standard spatial
coding scheme (longitude, latitude).

Technically, GeoXwalk has three components:
 ● Gazetteer database to support spatial searches.
 ● Middleware components to issue spatial/aspatial queries.
 ● Geo-parser to parse documents that are not geographically indexed
   and find the place names they refer to.
Gazetteer database
Each geographical feature must include:
  ● Feature name.
  ● Feature type.
  ● Geometry (spatial footprint).

Places can be marked out better by using polygons as opposed to points.


Explicit relationships can be defined, which is particularly useful when the gazetteer holds
a significant amount of historical data for which geometries do not exist.

Middleware components:

Protocols supported by geoXwalk are:
   ● ADL Gazetteer protocol
   ● OGC filter encoding implementation.

The middleware translates XML queries into database-specific SQL queries.
GeoParser

Most existing data and metadata have some sort of geo-reference, but not in a format
that allows them to be easily searched spatially.

One associated task is how documents without spatial references can be spatially indexed.
This could be done using a gazetteer as reference.
A prototype geo-parser has been implemented that semi-automatically identifies place
names in a document and extracts a suitable spatial footprint.
The rule-based approach takes into account the structure and context in which words occur.

One issue faced by GeoXwalk is map conflation, i.e. detecting duplicate entries,
such as a place that is named differently in different languages but has the same geographic footprint.
Related Projects: GeoVSM
Geographic Vector Space Model: The project integrates coordinate-based
geographic indexing with the keyword-based vector space model in representing
information space. Relevance measures are based on both geographic measures and
thematic measures, which can be combined into one single measure.

Vector Space Model: One of the most popular models of document space developed
in textual-based information retrieval research. It is an algebraic model for representing
text or graphical documents (and any objects, in general) as vectors of identifiers.
Using a vector space model, the content of each geographic document can be
approximately described by a vector of (content-bearing) terms, which are a combination of
thematic subjects and place names.
    ● Documents and queries are represented as vectors. Each dimension corresponds to a
      separate term
 An information retrieval system stores a representation of a
document collection using a document-by-term matrix, where the element at position (i, j)
corresponds to the frequency of occurrence of term j in the ith document. In the vector
space model, all the objects (terms, documents, queries, concepts, etc) can be similarly
represented as vectors.
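
The document-by-term matrix and a vector comparison can be sketched as follows. The three sample documents, whitespace tokenization and the use of cosine similarity are assumptions for illustration.

```python
import math
from collections import Counter


def build_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    vocabulary = sorted({term for doc in documents for term in doc.lower().split()})
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


documents = [
    "camping near pittsburgh",
    "camping near harrisburg",
    "hotels in philadelphia",
]
vocabulary, matrix = build_matrix(documents)

print(vocabulary)
print(matrix[0])
print(cosine(matrix[0], matrix[1]), cosine(matrix[0], matrix[2]))
```

Note that "pittsburgh" and "harrisburg" end up as unrelated dimensions: the similarity between the first two documents comes only from the shared thematic terms, not from any geographic closeness of the two cities.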

  ● The vector space model is well accepted as an effective approach in modelling the
    thematic subspace. However, it has some serious problems when used for
    modelling the geographic subspace.




Geographic space is inherently continuous and cannot be
adequately approximated using a set of place names (which are discrete in nature). If a
document mentions four place names (Pittsburgh, Philadelphia, Harrisburg, and
Hagerstown), the four place names will be treated as four independent dimensions in a
vector space model, whereas in fact they are points (or regions) in a two-dimensional
geographic space.


Additional concerns about using locational terms as geographic indexes include ambiguity in
meaning, non-unique place names, place names that change over time, and spelling
variations.
Geographical Model

   ● The geographical model of document space is capable of processing arbitrarily complex
     spatial queries.

   ● The most common spatial queries are believed to be of three types:
1. Point query: Return the geometric object that contains a given query point.
2. Region query: Given a region R, find all objects in the collection that intersect R.
3. Buffer zone query: A buffer query involves two spatial data sets and a distance d. The
answer to this query is the set of pairs of objects, one from each input set, that are
within distance d of each other, e.g. “find house-power line pairs that are within 50
meters of each other.”
    ● Spatial indexing based on coordinates generates persistent indexes for documents,
      since it is well defined and is immune to changes in place names, political
      boundaries, and linguistic variations.
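
The region and buffer zone queries can be sketched on planar coordinates measured in meters. The sketch assumes that regions are axis-aligned bounding boxes, houses are points and power lines are straight segments; all names and sample data are hypothetical, and the brute-force loops stand in for a proper spatial index.

```python
import math


def boxes_intersect(a, b):
    """Boxes are (min_x, min_y, max_x, max_y); touching edges count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def region_query(region, footprints):
    """All document ids whose bounding-box footprint intersects the region."""
    return [doc for doc, box in footprints.items() if boxes_intersect(region, box)]


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffer_query(houses, lines, d):
    """All (house, line) pairs that lie within distance d of each other."""
    return [
        (house, line)
        for house, p in houses.items()
        for line, (a, b) in lines.items()
        if point_segment_distance(p, a, b) <= d
    ]


footprints = {"doc_a": (5, 5, 15, 15), "doc_b": (20, 20, 30, 30)}
print(region_query((0, 0, 10, 10), footprints))

houses = {"house_1": (10, 30), "house_2": (10, 80)}
lines = {"line_1": ((0, 0), (100, 0))}
print(buffer_query(houses, lines, d=50))
```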
VSM / Geographical model (contd..)

   ● Disadvantages of using the geographical model in retrieving geographical
      information:
- A considerable amount of geographical information exists in textual form
that is not easily integrated into the geographical model for mapping and spatial
analysis, due to the difficulties of natural language understanding for geo-referencing
text.
GeoVSM

● Model obtained by combining the advantages of both the geographical model and
  vector space model.
● Each document will be indexed both by footprint (in geographical coordinate space)
  and by a term vector (in vector space).
● Geographical indexes will only represent the geographical scope of the document,
  and term vectors will only represent thematic scope of documents
            

Assume that any document has a limited geographic scope, GSd, and
a thematic scope, TSd. Similarly, a query on a document collection also has a geographic
scope, GSq, and a thematic scope, TSq. The degree of relevance of a document
to a query can be determined by the following measure:

    Rel(d, q) = ƒ(SimG(GSd, GSq), SimT(TSd, TSq))    (1)

where SimG(•) measures the similarity (i.e., the degree of overlap) between the
geographic scopes of the document and the query; SimT(•) measures the degree of
overlap between the thematic scopes of the document and the query; and ƒ(•) is a
function for combining the relevance measures of the geographic and thematic
dimensions.
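
One possible instantiation of this measure is sketched below. The concrete choices are assumptions made for illustration only: geographic scopes are bounding boxes, SimG is the ratio of overlap area to union area, SimT is the cosine similarity of term-count vectors, and ƒ is a weighted sum with a hypothetical weight `alpha`.

```python
import math


def box_area(box):
    """Area of (min_x, min_y, max_x, max_y); zero for empty or inverted boxes."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def sim_g(gs_d, gs_q):
    """Geographic similarity: overlap area divided by union area of two boxes."""
    intersection = (
        max(gs_d[0], gs_q[0]), max(gs_d[1], gs_q[1]),
        min(gs_d[2], gs_q[2]), min(gs_d[3], gs_q[3]),
    )
    overlap = box_area(intersection)
    union = box_area(gs_d) + box_area(gs_q) - overlap
    return overlap / union if union else 0.0


def sim_t(ts_d, ts_q):
    """Thematic similarity: cosine between two term -> count dictionaries."""
    dot = sum(count * ts_q.get(term, 0) for term, count in ts_d.items())
    norm_d = math.sqrt(sum(c * c for c in ts_d.values()))
    norm_q = math.sqrt(sum(c * c for c in ts_q.values()))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0


def rel(doc, query, alpha=0.5):
    """Rel(d, q) as a weighted sum; alpha balances geography against theme."""
    return alpha * sim_g(doc["gs"], query["gs"]) + (1 - alpha) * sim_t(doc["ts"], query["ts"])


document = {"gs": (0, 0, 2, 2), "ts": {"camping": 1, "lake": 1}}
query = {"gs": (1, 1, 3, 3), "ts": {"camping": 1}}
print(sim_g(document["gs"], query["gs"]), sim_t(document["ts"], query["ts"]), rel(document, query))
```

A weighted sum is only one option for ƒ; a product, for instance, would force a document to match on both dimensions before it receives any score.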
References

* GeoVSM: An Integrated Retrieval Model for Geographic Information
Guoray Cai
School of Information Sciences and Technology
The Pennsylvania State University
002K Thomas Building, University Park, PA 16802

* http://www.geo-spirit.org/public_deliverables.html

* http://www.geo-spirit.org/publications/SPIRIT_WP5_D17_5201_final.pdf

* http://www.geo-spirit.org/publications/SPIRIT_DeliverableD18_5302_final.pdf

* http://www.geo-spirit.org/publications/GIR_distrib_ranking.pdf

* Distributed Ranking Methods for Geographic Information Retrieval by
Marc van Kreveld Iris Reinbacher Avi Arampatzis Roelof van Zwol

Geographical information system by zewde alemayehu tilahunGeographical information system by zewde alemayehu tilahun
Geographical information system by zewde alemayehu tilahun
 
TYBSC IT PGIS Unit I Chapter II Geographic Information and Spacial Database
TYBSC IT PGIS Unit I Chapter II Geographic Information and Spacial DatabaseTYBSC IT PGIS Unit I Chapter II Geographic Information and Spacial Database
TYBSC IT PGIS Unit I Chapter II Geographic Information and Spacial Database
 
Sampling and Probability in Geography
Sampling and Probability in Geography Sampling and Probability in Geography
Sampling and Probability in Geography
 
Introduction to Geographic Information system and Remote Sensing (RS)
Introduction to Geographic Information system  and Remote Sensing (RS)Introduction to Geographic Information system  and Remote Sensing (RS)
Introduction to Geographic Information system and Remote Sensing (RS)
 
INTRODUCTION_TO_GIS.ppt
INTRODUCTION_TO_GIS.pptINTRODUCTION_TO_GIS.ppt
INTRODUCTION_TO_GIS.ppt
 
7 srtm paleochannels_aeromagnetics data
7 srtm paleochannels_aeromagnetics data7 srtm paleochannels_aeromagnetics data
7 srtm paleochannels_aeromagnetics data
 
Performing Fast Spatial Query Search by Using Ultimate Code Words
Performing Fast Spatial Query Search by Using Ultimate Code WordsPerforming Fast Spatial Query Search by Using Ultimate Code Words
Performing Fast Spatial Query Search by Using Ultimate Code Words
 
Geography and Cartography
Geography and CartographyGeography and Cartography
Geography and Cartography
 
ch1
ch1ch1
ch1
 
Global Observation Data Integration with Lexicographic and Geospatial Ontology
Global Observation Data Integration with Lexicographic and Geospatial OntologyGlobal Observation Data Integration with Lexicographic and Geospatial Ontology
Global Observation Data Integration with Lexicographic and Geospatial Ontology
 
Differentiation between Global and Local Datum from Different aspect
Differentiation between Global and Local Datum from Different aspect Differentiation between Global and Local Datum from Different aspect
Differentiation between Global and Local Datum from Different aspect
 
1505382049E-TextConceptsofGIS(includeerrorsinGIS.pdf
1505382049E-TextConceptsofGIS(includeerrorsinGIS.pdf1505382049E-TextConceptsofGIS(includeerrorsinGIS.pdf
1505382049E-TextConceptsofGIS(includeerrorsinGIS.pdf
 
Ai for cultural history
Ai for cultural historyAi for cultural history
Ai for cultural history
 

Último

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Último (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Geographic Information Retrieval: An Overview

  • 4. Introduction ● Geographic Information Retrieval (GIR) can be seen as a specialized branch of traditional Information Retrieval. ● Information that has a relationship to geographic space is called georeferenced information, a term used frequently in Georeferenced Information Retrieval. ● Georeferenced information appears in all kinds of media, e.g. structured data such as maps, land surveys, airborne and satellite images, and tabulated observations. ● It can also be used by researchers looking for a certain area, for example an area inhabited by certain animals or affected by an epidemic.
  • 5. Properties of Georeferenced Information: ● Much of the information available in digital libraries and on the Internet is georeferenced, although it is mostly not expressed in terms of geographic coordinates. ● The geographical location and extent of a place name is often called its geographic footprint, and it is given by coordinates (longitude, latitude). ● Geographic Information Retrieval requires that place names, and phrases that include direct or indirect references to place names, be resolved and translated into footprints that can be indexed.
  • 6. General Problems in GIR: Ambiguity/Lack of precision in Place Names: ● Firstly, several places can share the same name, making place names unique only within a limited geographic area. ● Secondly, some place names occurring in texts are temporal or cultural conventions rather than official names, so the user must understand the time, context or cultural environment in which a place name is used in order to link it to a geographic location. ● Thirdly, some place names change over time, e.g. Bangalore to Bengaluru, Calcutta to Kolkata. ● Fourthly, the geographic extent that a place name denotes can be extended, reduced or changed over time.
  • 7. General Problems in GIR: (contd.) ○ Fifthly, the borders of a location can be fuzzy. (Kashmir?) ○ The same place name can be written differently in different texts, either because the author has misspelled the name or because there are different legal spellings of the same place name. Information being fuzzy: ○ “About 200 kilometers south of the capital of Russia”. Direction may vary, distance may vary. In the case of South Africa there are 3 capitals, which may lead to ambiguity. ○ Often, people are imprecise in giving geographic direction, using one of the four general directions north, south, east or west, when the actual direction might be somewhere in between.
  • 8. Impact of cognitive model on Geographic Information Retrieval ● Human understanding of geographic location is either procedural or survey based. ● Survey: Involves looking at maps and finding geographic locations on them. ● Procedural: Involves exploring and navigating through the place so as to get the 'feel' of it. ● Using procedural knowledge to locate or gain information is particularly difficult, as it contains many phrases involving human ambiguity.
  • 9. Cognitive model (continued) ● 'People link geographic distance with time.': When talking about going from, say, 'A' to 'B', people tend to use time as a way of expressing distance, e.g. "It takes two hours to get from 'A' to 'B' by car." ● 'Topology and metric distances': People are very good at mentioning topological aspects pertaining to a place, like inclusion (e.g. names of the topologies in an area) or coincidence (e.g. this place is at the same place as ...). ● 'People have biases towards east-west or north-south direction': People have a very biased view of geographical areas, and when giving specifics of direction they seem to have only a vague sense of it. E.g. when asked where South America is with respect to North America, the answer is generally "south", while in reality it lies to the south-east.
  • 10. Geo-referencing using Gazetteers Gazetteers: A form of index that relates place names to coordinates of locations and extents. Here we focus on automatic geo-referencing based on the content of the document text alone. In an automated approach, most projects have based their georeferencing on a combination of place name identification and natural language processing. The latter identifies phrases that modify the location pointed to by an occurrence of a place name (“200 km south of Moscow”) or that indicate a geo-reference without actually mentioning a specific place name (“Rosenborg's home field”).
  • 11. Geo-referencing (continued) Gazetteers have three basic components: the name is the textual designator of a geographic location, the location is the coordinates of a point, line or area on the earth’s surface pointed to by a name, and the feature type is the type of location that a name points to (forest, agricultural area, river, inhabited location etc.). The location that a place name refers to (the place name's footprint) can be given as a point, a bounding box or a polygon, all represented by coordinates.
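A phrase such as “200 km south of Moscow” can be turned into a rough point footprint by offsetting the coordinates of the named place. The sketch below is only an illustration: the function name `offset_point`, the spherical-Earth radius and the approximate Moscow coordinates are assumptions, not part of the slides.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def offset_point(lon: float, lat: float, bearing_deg: float, distance_km: float) -> Tuple[float, float]:
    """Return the (lon, lat) reached by travelling distance_km along bearing_deg."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    theta = math.radians(bearing_deg)      # 0 = north, 90 = east, 180 = south
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lon2), math.degrees(lat2)


# Approximate (longitude, latitude) of Moscow, for illustration only.
MOSCOW = (37.62, 55.75)
south_200 = offset_point(MOSCOW[0], MOSCOW[1], bearing_deg=180.0, distance_km=200.0)
print(south_200)
```

Because such phrases are imprecise, a real system would more likely evaluate a range of bearings and distances and treat the result as an area rather than a single point.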
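The three gazetteer components described above (name, location, feature type) could be modelled roughly as follows. This is a minimal sketch: the class names, the in-memory dictionary and the sample coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude) in decimal degrees


@dataclass(frozen=True)
class GazetteerEntry:
    name: str                          # textual designator of the place
    feature_type: str                  # e.g. "inhabited location", "river"
    footprint: Tuple[Coordinate, ...]  # one point, or the vertices of a line/area


class Gazetteer:
    """Toy in-memory gazetteer keyed by lower-cased place name."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[GazetteerEntry]] = {}

    def add(self, entry: GazetteerEntry) -> None:
        self._entries.setdefault(entry.name.lower(), []).append(entry)

    def lookup(self, name: str) -> List[GazetteerEntry]:
        # Several places can share one name, so a list is returned.
        return list(self._entries.get(name.lower(), []))


gazetteer = Gazetteer()
# Rough illustrative coordinates, not authoritative data.
gazetteer.add(GazetteerEntry("Hyderabad", "inhabited location", ((78.5, 17.4),)))
gazetteer.add(GazetteerEntry("Hyderabad", "inhabited location", ((68.4, 25.4),)))

matches = gazetteer.lookup("hyderabad")
print(len(matches))
```

Returning every entry for a name, instead of a single one, leaves the disambiguation step to the caller, which matches the ambiguity problem raised earlier.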
  • 12. Geo-referencing (continued) Centroid point: Says nothing about the geometry or size of the area, but needs very little data storage.
  • 13. Geo-referencing (continued) Bounding box: Gives a better idea of the entire referenced area and does not require a lot of data storage. However, it overlaps neighbouring areas and is therefore inaccurate.
  • 14. Geo-referencing (continued) Approximated polygon: The most accurate way of referencing an area, but it takes a lot of data storage space. The best approach would be something between the polygon and the bounding box, such as a polygon with a fixed number of points.
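The trade-off between the three footprint representations can be made concrete with a small sketch. The L-shaped polygon and the helper names below are made up for illustration, and the "centre" is a plain average of the vertices rather than a true area centroid.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def bounding_box(polygon: List[Point]) -> Tuple[Point, Point]:
    """Smallest axis-aligned box containing the polygon: (min corner, max corner)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys)), (max(xs), max(ys))


def vertex_mean(polygon: List[Point]) -> Point:
    """Average of the vertices; a cheap stand-in for a centroid point."""
    n = len(polygon)
    return (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical L-shaped region in arbitrary planar units.
region = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]

box = bounding_box(region)
box_area = (box[1][0] - box[0][0]) * (box[1][1] - box[0][1])
print("point:", vertex_mean(region))           # 2 numbers stored
print("box:", box, "area", box_area)           # 4 numbers stored
print("polygon area:", polygon_area(region))   # 2 numbers per vertex stored
```

In this made-up case the bounding box covers twice the area of the polygon, which is the kind of overlap with neighbouring areas the slide refers to.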
  • 15. Searching for Georeferenced Information Letting the user specify one or more place names as keywords in a traditional keyword-based query. When parsing the query, the GIR/IR system treats the place names it finds as special keywords that indicate the geographical scope of the user's information need. E.g. Googling for restaurants around you. Letting users specify the geographic constraint of a query by drawing on one or more maps. E.g. Google Maps; and what about GPS apps like "Here and Now" or "Google Latitude"?
  • 16. Searching for Georeferenced Information Typical Queries: ○ Point in Polygon - asking for georeferenced information that contains, surrounds or refers to a particular geographic point location ○ Region Queries - asking for anything that is contained in, is adjacent to, or overlaps the region ○ Distance and Buffer Zone Queries - asking for information within some fixed distance of a geographic object (point, line, polygon) ○ Path Queries - relying on the presence of a network structure that can be queried for network traversal information ○ Multimedia Queries - combining multiple geo-referenced information sources in resolving a query
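Two of the query types above, point in polygon and distance (buffer zone) queries, can be sketched in a few lines. The ray-casting test and the haversine great-circle distance used here are common techniques chosen for illustration; the slides do not say which algorithms a GIR system should use, and the function names and sample data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0     # spherical approximation (assumption)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lon, lat) points in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_distance(center: Point, places: Dict[str, Point], radius_km: float) -> List[str]:
    """Buffer-zone query: names of all places no further than radius_km from center."""
    return [name for name, loc in places.items() if haversine_km(center, loc) <= radius_km]


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
places = {"near": (0.0, 0.5), "far": (0.0, 5.0)}

print(point_in_polygon((1.0, 1.0), square))
print(within_distance((0.0, 0.0), places, radius_km=100.0))
```

A production system would normally delegate both tests to a spatial index or spatial database rather than scanning every footprint.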
  • 17. Related Projects: SPIRIT (Spatially-Aware Information Retrieval on the Internet), funded by the EC Fifth Framework Programme. It aims to improve search capabilities on the internet by using geographical and conceptual ontologies to model both the vocabulary and the spatial structure of places for the purposes of IR. This ontology is envisioned as an extension to traditional gazetteers that relates locations to one another and helps rank hits based on geographic properties. ∙ ontologies that model geographical terminology; ∙ query expansion and relevance ranking procedures based on the geographical ontologies; ∙ machine learning techniques for the extraction of geographical context from web documents and for generating metadata providing spatial context; ∙ a multi-modal user interface providing textual input and interactive map feedback of the context of retrieved documents; ∙ spatial indices for web collections
  • 18. Geo-Ontologies Ontologies relating Geographical Terminology and Spatial Relationships ● Reference to a geographic place: <PL-Name, PL-Type, {(x,y)}> ○ e.g. <Charminar, Monument, {(x,y)}> ● Relative place reference: <Spatial Relationship, PL-Name, Type, PL-FP> ○ e.g. <In, Hyderabad, City, {(x,y)}> A query to SPIRIT will contain one or more references to a PL-REF. Geographic content is a set of <Place reference> expressions, and the geometric footprint is a function of this set. Basically, geo-ontologies can be applied in: 1) User's query interpretation: (+ domain specific ontologies) for disambiguation of place names 2) System query formulation: to generate alternate names and spatially associated names 3) Metadata extraction: to extract info from free text documents to generate footprint(s) 4) Relevance ranking: potential for geographical relevance ranking (Dominos Pizza? :) )
  • 19. Geo-Ontologies Ontology: "formal, explicit specification of a shared conceptualisation"
  • 20. Geo-Ontologies ● Types of Atomic Queries: ○ A place name ○ An aspatial entity with relation to a place name ○ An aspatial entity with a spatial relation to a place name ○ An aspatial entity with a spatial relation to a place name ○ A place name with spatial relation to a place name ○ A place type with spatial relation to a place name ○ A place type with spatial relation to a place type ● Geo Ontology = Geographic Feature Ontology + Geographic Type Ontology + Spatial Relation Ontology
  • 21. User evaluation of the SPIRIT prototype gave results consistent with SPIRIT's priorities on innovative features. Yet users reported a feeling of frustration, which shows that their requirements go beyond what SPIRIT achieved and that there is still more work to be done in this area. The last publication on the website dates back to 2005.
  • 22. Relevance In Information Retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. Thus, the ranking of documents by both textual and spatial relevance has to be considered. The most common way to return a set of documents obtained from a Web query is as a ranked list. The search engine attempts to determine which document seems to be the most relevant to the user and puts it first in the list. In short, every document receives a score, or distance to the query, and the returned documents are sorted by this score or distance. There are situations where sorting by a single score may not be the most useful approach. When a more complex query is issued, composed of more than one query term or aspect, documents can also be returned with two or more scores instead of one.
  • 23. For example, the Web search could be for campsites in the neighborhood of Neuschwanstein, and the documents returned ideally have a score for the query term “camping” and a score for the proximity to Neuschwanstein. This implies that a Web document resulting from this query can be mapped to a point in the 2-dimensional plane, where both axes represent a score. The map indicates campsites near the castle Neuschwanstein, which is situated close to Schwangau, with the distance to the castle on the x-axis and the rating on the y-axis.
  • 24. Another weakness of our methods lies in the way we treat multiple-footprint documents. While we assume that a query can have only one footprint (a user is interested in only one location), documents may have multiple footprints (refer to more than one location). The method we have followed so far to calculate the spatial score considers only the best-matching document footprint. For example, if a user is looking for “airports near London”, a document that refers to both “Gatwick” and “Stansted” is scored as referring only to “Gatwick”, since it is the nearer airport of the two. Such a document, however, should be scored higher than another that refers only to “Gatwick”, since it provides more relevant information. Another factor is the number of footprints occurring: Gatwick’s official web pages should be more important than a web list of all airports in the UK.
  • 25. For high-quality ranking two things are required. Firstly, we need a good spatial score between query and document footprints. Secondly, we need a good combination of the spatial and textual (BM25) scores. For finding spatial scores, the spatial relationships (distance, containment, and direction) were converted into numeric values that indicate how close, how much inside, or how much north-of the relationship between two objects is. Those numeric values were first attempts at obtaining a score to quantify spatial relationships. However, certain issues come up with this method. For example, let us assume three cities, A, B, and C, where A lies at equal distance (in a Euclidean sense) from B and C. If C is bigger than B, then the score of B being close to A should be lower than that of C being close to A. In other words, the distance scores of cities around A may depend on the context, i.e. which other cities are around A. Also, natural barriers can influence the concept of proximity. It matters a lot whether a distance of 10 km (as the crow flies) can be covered by a direct road, or requires a large detour around a mountain range (or a small road over a mountain pass).
  • 26. In traditional information retrieval, the separate scores of each document would be combined into a single score (e.g., by a weighted sum or product), which produces the ranked list by sorting. Now, we are going to incorporate two pieces of information into the way that a spatial document score is calculated: • The number n of unique footprints in a document. • The frequencies f_1, …, f_n of occurrence of the footprints in the document. Moreover, the total spatial score of a document will be derived from fractional score contributions of all occurring document footprints.
  • 27. A simple way of taking into account all document footprints is to define the total spatial score as a linear combination (e.g. the simple average) of the individual scores of the footprints: S = (1/n) * (s_1 + … + s_n), where s_i is the score of the i-th document footprint with respect to the query footprint. To also incorporate the frequencies of occurrence f_i, let us define the weight of a footprint: tf_i = 1 + log(f_i). A footprint that occurs in the document only once will get a weight of one, and any extra occurrences will increase the weight in a logarithmic fashion. The total score may then be calculated as S = (tf_1*s_1 + … + tf_n*s_n) / (tf_1 + … + tf_n), that is, the weighted average of the individual scores.
  • 28. Considering again the example about “airports near London”, a scoring function like the last one would score Gatwick’s official web page higher than a web list of all UK airports. Moreover, it takes into account more than the best-matched document footprint. The last formula may serve as a starting point for improving the spatial scoring function.
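The two scoring formulas above translate directly into code. In this sketch the function names are hypothetical, the logarithm is assumed to be the natural log (the slides do not state the base), and the sample scores are invented.

```python
import math
from typing import Sequence


def simple_average_score(scores: Sequence[float]) -> float:
    """S = (1/n) * (s_1 + ... + s_n)."""
    return sum(scores) / len(scores)


def weighted_spatial_score(scores: Sequence[float], frequencies: Sequence[int]) -> float:
    """S = sum(tf_i * s_i) / sum(tf_i), with tf_i = 1 + log(f_i)."""
    if len(scores) != len(frequencies):
        raise ValueError("one frequency is needed per footprint score")
    if any(f < 1 for f in frequencies):
        raise ValueError("a footprint that occurs in the document has frequency >= 1")
    # Natural log is an assumption; the base is not given in the slides.
    weights = [1.0 + math.log(f) for f in frequencies]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Invented example: two footprints, one a close match (0.9), one a poor match (0.3).
scores = [0.9, 0.3]
once_each = weighted_spatial_score(scores, [1, 1])         # equals the simple average
close_match_often = weighted_spatial_score(scores, [10, 1])

print(simple_average_score(scores), once_each, close_match_often)
```

When every footprint occurs once, all weights are 1 and the weighted form reduces to the simple average; repeating the well-matching footprint pulls the document score towards that footprint's score.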
  • 29. Evaluation: 2 Indicators: 1) Recall = No. of relevant docs returned / Total no. of relevant docs 2) Precision = No. of relevant docs returned / Total no. of docs returned. Trec has been evaluated using the ISO 9241 standard, based on Effectiveness (can users find relevant docs?), Efficiency (resources consumed per result) and Satisfaction (user feedback).
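The two indicators can be computed from the set of returned documents and the set of relevant documents. The sketch follows the standard IR definitions, in which precision is divided by the number of documents returned; the function name and document identifiers are made up.

```python
from typing import Set, Tuple


def precision_recall(returned: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = len(returned & relevant)  # relevant documents that were returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgements.
returned_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d3", "d7"}

precision, recall = precision_recall(returned_docs, relevant_docs)
print(precision, recall)
```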
  • 30. Gazetteer Server and Service for UK Academia - James Reid Gazetteer: Geographical dictionary or directory. Serves as a reference for information about places. ● Geographic searching is a powerful information retrieval tool because the results it returns are more specific. ● Geographic searching is restricted because creating geographic metadata is very resource intensive, and resources that do have geographic metadata often carry only place names. ● The geographic footprint is usually not stated directly; there may only be a direct or indirect reference to the place. Constant change in geographic metadata: ● Names of places may vary. ● Names may have changed from time to time. ● Boundaries can be fuzzy. ● Names may be used in a particular context.
  • 31. GeoXwalk is a comprehensive Gazetteer linking vocabulary of current and historical geographical names to a standard spatial coding scheme ( longitude, latitude ). Technically GeoXwalk has basically three components :- ● Gazetteer database to support spatial searches. ● Middleware components to issue spatial/aspatial queries. ● Geo parser to parse non geographically indexed documents for some place name as reference to it.
  • 32. Gazetteer database Each geographical feature must include :- ● Feature name. ● Feature type. ● Geometry ( spatial footprints ). Marking out the places can be done better by using Polygons as opposed to Points. Explicit relationships can be defined which is of particular use when Gazetteer hold significant amount of historical data for which geometries doesn't exist. Middleware components: Protocols supported by geoXwalk are:- ● ADL Gazetteer protocol ● OGC filter encoding implementation. This is to translate XML queries to database specific SQL queries.
  • 33. GeoParser
    Most existing data and metadata carry some kind of geo-reference, but not in a format that allows them to be easily searched spatially. One associated task is how documents that are not spatially referenced can be spatially indexed; this can be done using a gazetteer as a reference. A prototype geoparser has been implemented that semi-automatically identifies place names in a document and extracts a suitable spatial footprint. Its rule-based approach takes into account the structure and context in which words occur. One issue faced by GeoXwalk is map conflation, i.e. detecting duplicate entries, such as a place that is named differently in different languages but has the same geographic footprint.
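The idea of rule-based geoparsing against a gazetteer can be illustrated with a toy example. This is not the GeoXwalk geoparser: the single contextual rule (a capitalised word after a locative preposition), the tiny gazetteer and its approximate coordinates are all assumptions made for illustration. The candidates it returns would still need human review, in line with the semi-automatic approach.

```python
import re

# Hypothetical toy gazetteer: lower-cased name -> (feature type, (lon, lat)).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "edinburgh": ("city", (-3.19, 55.95)),
    "london": ("city", (-0.13, 51.51)),
    "bath": ("city", (-2.36, 51.38)),
}

# One simple contextual rule: a capitalised word that follows a locative
# preposition is treated as a candidate place name.
PLACE_PATTERN = re.compile(r"\b(?:in|at|near|from|to)\s+([A-Z][a-z]+)")


def geoparse(text):
    """Return (name, footprint) pairs for candidate place names in text."""
    matches = []
    for m in PLACE_PATTERN.finditer(text):
        name = m.group(1)
        entry = GAZETTEER.get(name.lower())
        if entry is not None:
            matches.append((name, entry[1]))
    return matches


text = "The survey started in Edinburgh and ended near London. Bath time was at noon."
candidates = geoparse(text)
# "Bath" is skipped here because the context rule does not fire for it.
```

Using context rather than a bare dictionary lookup is what keeps "Bath time" from being indexed as the city, at the cost of missing place names that appear without a preposition.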
  • 34. Related Projects: GeoVSM
    Geographic Vector Space Model: the project integrates coordinate-based geographic indexing with the keyword-based vector space model in representing information space. Relevance measures are based on both geographic and thematic measures, which can be combined into a single measure.
    Vector Space Model: one of the most popular models of document space developed in text-based information retrieval research. It is an algebraic model for representing text or graphical documents (and objects in general) as vectors of identifiers. Using a vector space model, the content of each geographic document can be approximately described by a vector of (content-bearing) terms, which are a combination of thematic subjects and place names.
    ● Documents and queries are represented as vectors. Each dimension corresponds to a separate term. An information retrieval system stores a representation of a document collection using a document-by-term matrix, where the element at position (i, j) corresponds to the frequency of occurrence of term j in the ith document. In the vector space model, all objects (terms, documents, queries, concepts, etc.) can be similarly represented as vectors.
    ● The vector space model is well accepted as an effective approach in modelling the thematic subspace.
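The document-by-term matrix described above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenisation, raw term frequencies and cosine similarity for ranking (cosine is a common choice for the vector space model, but the slides do not name a specific measure); the sample documents are hypothetical.

```python
import math
from collections import Counter


def build_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the frequency of term j in doc i."""
    vocab = sorted({term for doc in docs for term in doc.lower().split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def query_vector(query, vocab):
    """Represent a query in the same term space as the documents."""
    counts = Counter(query.lower().split())
    return [counts[term] for term in vocab]


def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["flood risk pittsburgh", "flood maps harrisburg", "soil survey pittsburgh"]
vocab, rows = build_matrix(docs)
q = query_vector("flood pittsburgh", vocab)
scores = [cosine(row, q) for row in rows]
best = max(range(len(docs)), key=lambda i: scores[i])  # index of top-ranked document
```

Note that "pittsburgh" and "harrisburg" are just two unrelated columns in this matrix: nothing in the representation records how near the two places are to each other.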
  • 35. However, the vector space model has some serious problems when used for modeling the geographic subspace. Geographic space is inherently continuous and cannot be adequately approximated using a set of place names (which are discrete in nature). For example, if a document mentions four place names (Pittsburgh, Philadelphia, Harrisburg, and Hagerstown), they will be treated as four independent dimensions in a vector space model, whereas in fact they are points (or regions) in a two-dimensional geographic space. Additional concerns with using locational terms as geographic indexes include ambiguity in meaning, non-unique place names, place names that change over time, and spelling variations.
  • 36. Geographical Model
    ● The geographical model of document space is capable of processing arbitrarily complex spatial queries.
    ● The most common spatial queries are believed to be of three types:
      1. Point query: return the geometric objects that contain a given query point.
      2. Region query: given a region R, find all objects in the collection that intersect R.
      3. Buffer zone query: involves two spatial data sets and a distance d. The answer is the set of pairs of objects, one from each input set, that are within distance d of each other, e.g. "find house-power line pairs that are within 50 meters of each other."
    ● Spatial indexing based on coordinates generates persistent indexes for documents, since it is well defined and immune to changes in place names, political boundaries, and linguistic variations.
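The three query types can be sketched with simple geometry. The example below makes several simplifying assumptions that are not in the slides: footprints are axis-aligned bounding boxes, buffer-query objects are points, coordinates are planar (e.g. metres in a projected system), and every query is a linear scan rather than a lookup in a real spatial index such as an R-tree. All names and data are hypothetical.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def contains(box, x, y):
    """True if the point (x, y) lies inside or on the edge of box."""
    return box.min_x <= x <= box.max_x and box.min_y <= y <= box.max_y


def intersects(a, b):
    """True if boxes a and b share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def point_query(objects, x, y):
    """Names of all objects whose footprint contains the query point."""
    return [name for name, box in objects.items() if contains(box, x, y)]


def region_query(objects, region):
    """Names of all objects whose footprint intersects region R."""
    return [name for name, box in objects.items() if intersects(box, region)]


def buffer_query(set_a, set_b, d):
    """Pairs (a, b), one from each point set, within distance d of each other."""
    return [(name_a, name_b)
            for name_a, pa in set_a.items()
            for name_b, pb in set_b.items()
            if math.dist(pa, pb) <= d]


objects = {"park": Box(0, 0, 10, 10), "lake": Box(20, 20, 30, 30)}
houses = {"house1": (0, 0), "house2": (200, 0)}
power_lines = {"line1": (30, 40)}  # a line reduced to one point for simplicity

in_point = point_query(objects, 5, 5)
in_region = region_query(objects, Box(8, 8, 22, 22))
near_pairs = buffer_query(houses, power_lines, 50)
```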
  • 37. VSM / Geographical model (contd.)
    ● Disadvantage of using the geographical model in retrieving geographical information: a considerable amount of geographical information exists in textual form that is not easily integrated into the geographical model for mapping and spatial analysis, because of the difficulty of natural language understanding when geo-referencing text.
  • 39. GeoVSM
    ● A model obtained by combining the advantages of both the geographical model and the vector space model.
    ● Each document is indexed both by a footprint (in geographical coordinate space) and by a term vector (in vector space).
    ● Geographical indexes represent only the geographical scope of the document, and term vectors represent only its thematic scope.
  • 41. Assume that any document has a limited geographic scope, GS_d, and a thematic scope, TS_d. Similarly, a query on a document collection has a geographic scope, GS_q, and a thematic scope, TS_q. The degree of relevance of a document to a query can be determined by the following measure:
    Rel(d, q) = f(Sim_G(GS_d, GS_q), Sim_T(TS_d, TS_q))    (1)
    where Sim_G(·) measures the similarity (i.e., the degree of overlap) between the geographic scopes of the document and the query; Sim_T(·) measures the degree of overlap between the thematic scopes of the document and the query; and f(·) is a function for combining the relevance measures of the geographic and thematic dimensions.
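Equation (1) leaves the geographic similarity, the thematic similarity and the combining function open. One possible instantiation is sketched below; every concrete choice in it is an assumption rather than the GeoVSM definition: geographic scopes are bounding boxes compared by area overlap (intersection over union), thematic scopes are term-frequency dictionaries compared by cosine similarity, and the combining function is a weighted sum with a hypothetical weight `w_geo`.

```python
import math


def sim_g(a, b):
    """Geographic similarity: overlap of two boxes (min_x, min_y, max_x, max_y),
    measured as intersection area divided by union area (assumed measure)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # the footprints do not overlap
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def sim_t(u, v):
    """Thematic similarity: cosine between two sparse term-frequency dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rel(doc, query, w_geo=0.5):
    """Combine both similarities with a weighted sum (one possible choice of f)."""
    return (w_geo * sim_g(doc["gs"], query["gs"]) +
            (1 - w_geo) * sim_t(doc["ts"], query["ts"]))


# Hypothetical document and query with partly overlapping footprints.
doc = {"gs": (0, 0, 2, 2), "ts": {"flood": 1}}
query = {"gs": (1, 1, 3, 3), "ts": {"flood": 1}}
score = rel(doc, query)  # 0.5 * (1/7) + 0.5 * 1.0
```

A weighted sum lets a document rank reasonably on thematic match alone; a product would instead require both scopes to match, which may suit queries where the location is a hard constraint.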
  • 42. References
    * Guoray Cai. GeoVSM: An Integrated Retrieval Model for Geographic Information. School of Information Sciences and Technology, The Pennsylvania State University, 002K Thomas Building, University Park, PA 16802.
    * http://www.geo-spirit.org/public_deliverables.html
    * http://www.geo-spirit.org/publications/SPIRIT_WP5_D17_5201_final.pdf
    * http://www.geo-spirit.org/publications/SPIRIT_DeliverableD18_5302_final.pdf
    * http://www.geo-spirit.org/publications/GIR_distrib_ranking.pdf
    * Marc van Kreveld, Iris Reinbacher, Avi Arampatzis, Roelof van Zwol. Distributed Ranking Methods for Geographic Information Retrieval.