Under the Hood: How GeoNames Aggregates Over 35 Sources into One Data Set
1. GeoNames
"Under the Hood: How GeoNames Aggregates Many Sources into One Data Set"
GeoNames is ...
aggregator of free geo data
I am ...
Marc Wick
self-employed software engineer, Switzerland
3. GeoNames - Gazetteer
Pragmatic, useful, easy to use
Over 6.5 million features
CC BY licence
9 feature classes
GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 3
5. Origins and Goal
Proprietary application
Team up
Contribute modifications to a central database
Applications switch from proprietary aggregation to GeoNames
6. Challenge
A lot of data is available
Many providers
Languages
Scripts
7. GeoNames Ambassadors
GeoNames contact
Speak local language
Know local situation
8. Data Sources
National Mapping Agencies
Statistical Offices
Postal codes
National Geospatial-Intelligence Agency (NGA)
Applications using GeoNames
− Data files
− Manual modifications
9. US vs Europe
US data is freely available
European data is not available
Rest of the World?
Consequences
11. Future of geodata availability
We believe basic geodata will be free in most countries
Why:
− Economy
− Traffic Policy and Road Safety (road signs)
13. Free Availability is only a First Step
14. Who aggregates data
GeoNames
Supranational mapping agencies
Supranational organisations
INSPIRE
15. Problems and Solutions I
Problem : Solution
Shape / GML : FWTools / GDAL/OGR
Datum reprojection : PostGIS / EPSG / native tools / custom implementation
16. Problems and Solutions II
Feature codes not 1:1 : pattern matching
Non-ASCII : transliteration
Country codes
Admin1 codes
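The non-ASCII row above can be illustrated with a minimal sketch. It assumes the simplest possible approach, dropping diacritics with the Python standard library, and the function name `to_ascii` is hypothetical. This is not a full transliteration and not necessarily what GeoNames does: names in non-Latin scripts such as Cyrillic come out empty and would need a real transliteration table.

```python
import unicodedata


def to_ascii(name):
    """Approximate an ASCII form of a place name by dropping diacritics.

    Assumption: only Latin-script names are handled; characters with no
    ASCII base letter are silently discarded.
    """
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Remove the combining marks, keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything that is still outside ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")


zurich = to_ascii("Zürich")
geneve = to_ascii("Genève")
```

The ASCII form is useful as a search key next to the original name, not as a replacement for it.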
17. Place name matching
Geocoding
Distance
Feature type and feature code
Reverse geocoding, compare name similarity
− Levenshtein distance
− Letter pair similarity
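The two name-similarity measures listed above can be sketched as follows. This is a textbook version of each, assuming plain Python with no external libraries; the function names are hypothetical and the slides do not say how GeoNames implements or combines them.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def letter_pairs(text):
    """Adjacent letter pairs of each whitespace-separated word, lower-cased."""
    return [word[i:i + 2]
            for word in text.lower().split()
            for i in range(len(word) - 1)]


def letter_pair_similarity(a, b):
    """2 * shared pairs / total pairs, in the range 0.0 to 1.0."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    remaining = list(pairs_b)
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            shared += 1
            remaining.remove(pair)  # each pair in b can match only once
    return 2.0 * shared / total


edit_distance = levenshtein("Berlin", "Berlino")
similarity = letter_pair_similarity("Berlin", "Berlino")
```

Levenshtein distance works well for small spelling differences; letter pair similarity is less sensitive to word order, which helps with names such as "Frankfurt am Main" versus "Main Frankfurt".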
19. Wikipedia GeoTemplates
Proliferation of GeoFormats
No consensus, Anarchy
Examples
− <geo>48 46 36 N 121 48 51 W</geo>
− {{coor d|48.7767|N|121.8142|W|}}
− Berlin : |lat_deg = 52|lat_min = 31
− ... (Any template you could possibly think of is used somewhere)
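To show what handling these formats involves, here is a minimal sketch that parses only the first two examples above into decimal degrees. The regular expressions and the function name `parse_coordinates` are assumptions made for illustration; a real extractor would need one pattern per template variant.

```python
import re

# <geo>48 46 36 N 121 48 51 W</geo> : degrees, minutes, seconds, hemisphere
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+)\s+([NS])\s+"
    r"(\d+)\s+(\d+)\s+(\d+)\s+([EW])\s*</geo>")

# {{coor d|48.7767|N|121.8142|W|}} : decimal degrees plus hemisphere
COOR_D = re.compile(
    r"\{\{coor d\|([\d.]+)\|([NS])\|([\d.]+)\|([EW])(?:\|[^}]*)?\}\}")


def signed(value, hemisphere):
    """South and west become negative."""
    return -value if hemisphere in ("S", "W") else value


def parse_coordinates(wikitext):
    """Return (lat, lon) in decimal degrees, or None if no pattern matches."""
    match = GEO_TAG.search(wikitext)
    if match:
        d1, m1, s1, h1, d2, m2, s2, h2 = match.groups()
        lat = int(d1) + int(m1) / 60 + int(s1) / 3600
        lon = int(d2) + int(m2) / 60 + int(s2) / 3600
        return signed(lat, h1), signed(lon, h2)
    match = COOR_D.search(wikitext)
    if match:
        lat, h1, lon, h2 = match.groups()
        return signed(float(lat), h1), signed(float(lon), h2)
    return None


from_geo_tag = parse_coordinates("<geo>48 46 36 N 121 48 51 W</geo>")
from_template = parse_coordinates("{{coor d|48.7767|N|121.8142|W|}}")
```

Both examples describe the same place, so the two results should agree to within rounding.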
20. Alternate Names
...
Italian : Berlino
English : Berlin
Arabic : برلين
Korean : 베를린
Thai : เบอร์ลิน
Russian : Берлин
Chinese : 柏林
Marathi : बर्लिन
... (ca 100 names)
21. Postal codes
Geocode – postal code numeric distance
Accuracy, completeness
ScribbleMaps by Robert Kosara
24. Data Dump
Flat CSV files
Simple format
Ease of use
Full daily dump
Daily modifications
RDF
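A flat dump of this kind can be read with a few lines of standard-library code. The sketch below assumes a tab-delimited file whose first nine columns are id, name, ASCII name, comma-separated alternate names, latitude, longitude, feature class, feature code and country code; check this layout against the readme shipped with the dump before relying on it. The sample row is illustrative and not copied from the real data.

```python
import csv
import io

# Assumed layout of the first nine columns; later columns are ignored.
COLUMNS = ["geonameid", "name", "asciiname", "alternatenames",
           "latitude", "longitude", "feature_class", "feature_code",
           "country_code"]


def read_features(lines):
    """Yield one dict per row of a tab-delimited dump."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < len(COLUMNS):
            continue  # skip short or empty lines
        record = dict(zip(COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        record["alternatenames"] = [
            name for name in record["alternatenames"].split(",") if name]
        yield record


# Illustrative row (hypothetical id, rounded coordinates).
SAMPLE = "1\tBerlin\tBerlin\tBerlino,Berlijn\t52.52\t13.41\tP\tPPLC\tDE\n"
features = list(read_features(io.StringIO(SAMPLE)))
```

With a real file, pass an open file object instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8", newline="")`.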
25. Web Services
Search
− Ranking
TF-IDF
Relevancy
− I18n
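For readers unfamiliar with TF-IDF ranking, here is a minimal sketch of the idea. It uses the plain textbook formula (term frequency times the log of inverse document frequency), which is an assumption for illustration; it is not the scoring formula of any particular search engine, and the slides do not give the ranking actually used by the search service.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document against the query with textbook TF-IDF."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    return scores


documents = ["Berlin Germany", "Berlin New Hampshire", "Paris France"]
scores = tf_idf_scores("berlin germany", documents)
```

A rare term such as "Germany" contributes more than a common one such as "Berlin", so the first document ranks highest.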
27. Hierarchy Web Services
Hierarchy
Child
Neighbour
Sibling
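The hierarchy, child and sibling lookups can be pictured as queries on a parent-pointer tree. The sketch below uses a toy in-memory table with illustrative entries; the table and the function names are hypothetical and say nothing about how the web services are implemented. Neighbour lookups are left out because they need border or adjacency data rather than a tree.

```python
# Toy parent table: each feature points to its parent (None for the root).
PARENT = {
    "Earth": None,
    "Europe": "Earth",
    "Germany": "Europe",
    "Switzerland": "Europe",
    "Berlin": "Germany",
}


def hierarchy(name):
    """Path from the root down to the given feature."""
    path = []
    while name is not None:
        path.append(name)
        name = PARENT[name]
    return list(reversed(path))


def children(name):
    """Features whose direct parent is the given feature."""
    return [child for child, parent in PARENT.items() if parent == name]


def siblings(name):
    """Other features that share the same parent."""
    parent = PARENT[name]
    if parent is None:
        return []
    return [other for other in children(parent) if other != name]


berlin_path = hierarchy("Berlin")
```

For the toy table, `hierarchy("Berlin")` walks up from Berlin to Earth and returns the path in top-down order.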
28. Apache
mod_rewrite
ROME (RSS), jdom.org (XML), JSON
Tomcat (Java), JMS
ActiveMQ
Lucene
SRTM3
GTOPO30
JDBC
Full text index
TF-IDF
Database : Postgres (PostGIS)
29. Libraries
Java
Drupal
Ruby
PHP
Perl
Python
Lisp
30. Synchronization
Daily dump
Daily modifications
JMS
RDF dump, periodically