2. About - EDINA National Data Centre
• A designated National Data Centre for Tertiary
Education since 1995
• Based at The University of Edinburgh
• Our mission...
to enhance the productivity of research, learning and
teaching in UK higher and further education
BY
delivering access to a range of online data services through a
UK academic infrastructure, as well as supporting knowledge
exchange and ICT capacity building, nationally and
internationally.
• Focus is on service, but we also undertake R&D
• History
– first online GI service, UKBORDERS, launched in 1994
– flagship Digimap service now a teenager!
– substantial experience in handling geospatial data on a large
scale (large db; large user base)
3. The Geoservices Team
• Largest team within EDINA
• Highly experienced and skilled team
– provides advice nationally and internationally
– active in standards development and policy
– active in GI community nationally and internationally
• Demands of the services offered mean the team has been at the
leading edge of GI service development in the UK
[Figure: Projects and Services, 1999 and Today]
4. Our Service requirements
• Fast servicing of requests
• Scalable and extensible
– accommodates steady or increasing demand
• Robust (our SLA aspires to 99% uptime!)
• Maintainable
• Standardised
– can easily substitute components for repair, upgrade,
etc.
• Rapid prototyping and rollout
• All of above on tight budget!
5. What do we use Postgres/PostGIS for?
• Service operation and management
• Map creation
– Data store for vector based maps
– Indexing service for raster based maps
– Source for WMS ‘GetFeatureInfo’ queries
• Data Delivery
– Data store for vector products
• Searching/Querying
– Advanced place name searching
6. … for service operation and management
• Store service critical metadata
• User data
• Control user access
• Log activity
7. Case Study: Digimap
• Approx 50,000 active users at any point in time
• Academic Year 2010/11 stats
– c. 400,000 logins
– Over 10 million maps created
– 240,000 high quality print maps generated
– 100,000 data download requests
– Over 1 million data files downloaded
8. … as a ‘Data Store’ for mapping
• From the (very) large
• Ordnance Survey’s MasterMap (in EDINA’s map schema)
Table       Rows           Data size   Index size
Area        107,293,931    49 GB       13 GB
Lines       278,110,576    73 GB       24 GB
Boundary    535,039        321 MB      46 MB
Points      3,984,140      668 MB      399 MB
Symbols     2,793,680      522 MB      236 MB
Text        21,004,729     4 GB        1.7 GB
9. … as a ‘Data Store’ for mapping
• … via the small but cartographically complex
• Ordnance Survey’s Strategi
Only 778,000 rows
Range of geometries
Strict layer draw order
Over 50 layers
Many drawn multiple times
10. … as a ‘Data Store’ for mapping
• … to the complex data schema
• SeaZone’s Hydrospatial
Large range of features
Complex feature relationships
Scale control for individual layers
11. … as a ‘Spatial Indexing’ system
• Spatial index for 1.4 million historical maps of Great Britain
• Covers the late 1840s to early 1990s
• Complex file structure
– Reflects original capture: counties, towns, editions, scale
– And the digitisation process
– … but, critically, not TIME
12. • However, for historical data the temporal availability was
critical.
• Use of date information in addition to spatial index allows
maps to be placed in correct time slot
– Used publication date, as survey date metadata was missing
– An example of a MapServer layer definition for 1890s maps:
area from (
  select * from historic.ancient_roam_tiles b,
    (select county, max(edition) as edition2, a.sheet_no
     from historic.ancient_roam_tiles a,
       (select max(version) as max_version, sheet_no
        from historic.ancient_roam_tiles
        where (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10) AND (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
          and (scale=10000 or scale=10560) and (version = 'ng' or version = 'cs_ng')
          and st_setsrid(!BOX!,27700) && area
        group by sheet_no) as selection
     where a.version = selection.max_version and a.sheet_no=selection.sheet_no
       and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10) AND (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
       and (scale=10000 or scale=10560)
     group by a.sheet_no, county) as sheet_group
  where b.sheet_no=sheet_group.sheet_no and b.county = sheet_group.county
    and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10) AND (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
    and (scale=10000 or scale=10560) and b.edition = sheet_group.edition2
) as subq using unique id using SRID=27700
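The decade test that appears three times in the layer definition truncates each publication year to its decade by slicing the first three characters. As a minimal sketch, assuming the year columns hold four-digit integers (the slides do not show the column types), integer division gives the same result and may be easier to read. The sample rows are hypothetical and stand in for historic.ancient_roam_tiles:

```sql
-- Hypothetical sample rows; in the real service these would come from
-- historic.ancient_roam_tiles.
SELECT sheet_no, publish_year_start, publish_year_end
FROM (VALUES
        ('A1', 1888, 1893),
        ('B2', 1894, 1899),
        ('C3', 1901, 1905)
     ) AS t(sheet_no, publish_year_start, publish_year_end)
-- Integer division drops the final digit, so 1894 / 10 * 10 = 1890.
WHERE 1890 BETWEEN (publish_year_start / 10) * 10
               AND (publish_year_end / 10) * 10
ORDER BY sheet_no;
```

Sheets A1 and B2 are returned; C3 is not, because both of its years truncate to 1900.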
13. • Ease of use with a range of map rendering software
– OS Strategi (Cadcorp GeognoSIS)
– OS Open Data: Panorama and Vector Map District products plus grid
lines and labels (MapServer)
14. … for WMS GetFeatureInfo
• Easy to provide information about selected feature.
• Allow use of additional search parameters, for example proximity
to point clicked.
• Access additional metadata tables for information.
[Figure: Example of proximity search (especially useful for point data)]
[Figure: Map sheet information stored in metadata tables.]
[Figure: Bedrock information and selected area highlighted.]
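A proximity search of the kind described above could be sketched as follows. This is an illustration only: the table, its columns, the clicked coordinate and the 50 m tolerance are all hypothetical, and the PostGIS extension is assumed to be installed.

```sql
-- Hypothetical point layer in British National Grid (EPSG:27700).
CREATE TEMP TABLE demo_points (id integer PRIMARY KEY, label text, geom geometry);
INSERT INTO demo_points VALUES
    (1, 'near', ST_SetSRID(ST_MakePoint(325010, 673000), 27700)),
    (2, 'far',  ST_SetSRID(ST_MakePoint(325200, 673000), 27700));
CREATE INDEX demo_points_geom_idx ON demo_points USING gist (geom);

-- Nearest feature within 50 m of the clicked map coordinate (325000, 673000).
-- ST_DWithin can use the spatial index; ST_Distance only ranks the survivors.
SELECT id, label,
       ST_Distance(geom, ST_SetSRID(ST_MakePoint(325000, 673000), 27700)) AS dist_m
FROM   demo_points
WHERE  ST_DWithin(geom, ST_SetSRID(ST_MakePoint(325000, 673000), 27700), 50)
ORDER  BY dist_m
LIMIT  1;
```

Only the point labelled 'near' (10 m away) is returned; the point 200 m away falls outside the tolerance.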
15. … update interfaces to reflect current map
• Legend shows only rock types in area (over 1000 in full legend)
• Timeline highlights selected as well as other available decades
16. … as a ‘Data Store’ for download
• UKBORDERS provides bespoke data extraction of vector boundary data
in custom formats (Shape, MIF, KML, DXF)
• Real-time extraction - uses GeoServer over PostGIS as WFS, piped
through FME
• Metamodel built around PostGIS (formerly Oracle). Migration
resulted in a more scalable system (multiple dev/live/failover
instances) with easier desktop prototyping
• OpenBoundaries – same engine, different data (all based around
derived OS Open Data) and skin
17. … for querying
• Unlock provides an Application Programming Interface (API)
for querying over 11 million geographic names across a variety
of gazetteers:
• GeoNames (world coverage)
• Pleiades ancient place names (world coverage)
• Natural Earth (world coverage)
• OS products (UK coverage): 1:50,000 Placename Gazetteer, Meridian 2, Boundary-
Line, BN Grid references
• Placename outlines and attribution extracted from mapping
data or published gazetteers
• Outlines are unique service feature enabling further spatial
data extraction and analysis
• Unlock Places makes extensive use of stored database procedures:
• For writing dynamic queries.
• For complex data filtering and parsing.
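As an illustration of a stored procedure that writes a dynamic query, here is a minimal sketch. It assumes PostgreSQL 8.4 or later with PL/pgSQL installed; the table names, column name and function name are hypothetical, since the slides do not show the real Unlock schema.

```sql
-- Hypothetical gazetteer tables.
CREATE TEMP TABLE gaz_geonames (placename text);
CREATE TEMP TABLE gaz_pleiades (placename text);
INSERT INTO gaz_geonames VALUES ('Edinburgh'), ('Glasgow');
INSERT INTO gaz_pleiades VALUES ('Londinium');

-- The query text is built at run time: the caller chooses the gazetteer,
-- quote_ident() protects the table identifier, and USING binds the
-- search pattern as a parameter rather than splicing it into the SQL.
CREATE OR REPLACE FUNCTION demo_search_names(gazetteer text, pattern text)
RETURNS SETOF text AS $$
BEGIN
    RETURN QUERY EXECUTE
        'SELECT placename FROM ' || quote_ident('gaz_' || gazetteer)
        || ' WHERE lower(placename) LIKE lower($1) ORDER BY placename'
    USING pattern;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM demo_search_names('geonames', 'edin%');
```

The call returns the single row 'Edinburgh'. Changing the filtering logic later only needs a new CREATE OR REPLACE FUNCTION; callers keep using the same function name.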
19. How do we use Postgres/PostGIS to best effect?
• Ensure data schemas are determined by functionality
– Do NOT accept defaults from loaders
– Use INTs for primary selection attributes
• Tailor data processing to task
– For mapping do NOT include non-mapped features or attributes
• Indexes are your friend
– Ensure all search attributes are indexed
• Clustered indexes are your best pal
– Critical for our mapping schemas
• Bad or unnecessary indexes are your worst enemy
– Can cause severe slowdown resulting in a bad user experience
– Make use of EXPLAIN
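The indexing advice above could look like this in practice. It is a sketch under stated assumptions: the table, its columns, the feature code and the bounding box are hypothetical, and PostGIS 1.5 or later is assumed for ST_MakeEnvelope.

```sql
-- Hypothetical mapping table.
CREATE TABLE demo_map_lines (
    id           serial PRIMARY KEY,
    feature_code integer NOT NULL,   -- INT as the primary selection attribute
    geom         geometry
);

-- Index every attribute that is searched on.
CREATE INDEX demo_map_lines_geom_idx ON demo_map_lines USING gist (geom);
CREATE INDEX demo_map_lines_code_idx ON demo_map_lines (feature_code);

-- Physically reorder the table along the spatial index so that features
-- drawn on the same map sit close together on disk, then refresh statistics.
CLUSTER demo_map_lines USING demo_map_lines_geom_idx;
ANALYZE demo_map_lines;

-- Ask the planner how it would run a typical map request.
EXPLAIN
SELECT id
FROM   demo_map_lines
WHERE  geom && ST_MakeEnvelope(325000, 673000, 326000, 674000, 27700)
  AND  feature_code = 10172;
```

In PostgreSQL, CLUSTER is a one-off reordering and is not maintained as rows change, so it would need to be re-run after each bulk data load. The plan printed by EXPLAIN depends on table size and statistics, so it is worth checking against production-sized data.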
20. • Hide internal complexity behind database views – makes
applications more portable.
• Use schemas to roll out data updates (just set the search path to
look in the new default schema); this also makes rolling back to the
previous data version easy.
• Take advantage of stored procedures. If SQL is hidden in
application code, it may be impossible to roll out changes
instantly: the application must be re-compiled and re-deployed, and
downtime might be required. By storing SQL within procedures, any
changes become immediate and more seamless.
• Use built-in data replication per instance – feel more protected
from bad luck!
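The schema-based rollout described above can be sketched as follows. The schema and table names are hypothetical; the point is only that application SQL uses unqualified table names, so the search path decides which data release is read.

```sql
-- Hypothetical schemas, one per data release.
CREATE SCHEMA data_v1;
CREATE SCHEMA data_v2;
CREATE TABLE data_v1.places (id integer, placename text);
CREATE TABLE data_v2.places (id integer, placename text);
INSERT INTO data_v1.places VALUES (1, 'old release');
INSERT INTO data_v2.places VALUES (1, 'new release');

-- Roll out: unqualified names now resolve to the new release.
SET search_path TO data_v2, public;
SELECT placename FROM places;   -- reads data_v2.places

-- Roll back: point the search path at the previous release again.
SET search_path TO data_v1, public;
SELECT placename FROM places;   -- reads data_v1.places
```

SET only affects the current session. To change the default for new connections, ALTER ROLE ... SET search_path or ALTER DATABASE ... SET search_path could be used instead.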
21. What we like about Postgres/PostGIS
• Reliable
• Performant
• Scalable
• Easier replication
• Standards compliant
• Comes with good tools
• Superb 3rd party support
… and the elephants …
22. The future: What are we planning?
• Migrating to Postgres 9.1
– Currently we have a mix of 8.3 and 8.4 installs
– Take advantage of new functionality and bug fixes
• Exploring the new functionality in PostGIS 2.0 to enhance
existing services and possible new ones
– Raster capabilities
– Topology
– Generalisation with topological consistency constraints
[Figure: Highly generalised Census 2001 OAs in Nottingham. All input
features are present post generalisation, with no overlaps or new
slivers introduced.]
23. Conclusion
• Postgres and PostGIS have been used to power EDINA geo-services
for over 8 years
• During late 2011 the last major service was migrated.
• All geo-services (and some non-geo ones!) at EDINA rely on
Postgres/PostGIS as either the sole or principal database
• It will continue to form the core of our services for the
foreseeable future.
• The elephant is our friend; it could certainly be yours!