Some initial considerations and discussion points around geospatial big data. Location adds context and relevance. We need to consider a number of 'V' factors, including Value.
2. Geospatial Intelligence Middle East 2013
Recently the military GIS and intelligence communities have gained a better understanding of the rapid growth of “Cloud”-empowered applications, the challenges and opportunities of Big Data, the importance of social media, the availability of improved applications, and the dramatic improvement in the quality and availability of remote sensing data. These developments, together with the increased speed of GIS applications and the integration of a full-motion video analysis product, empower military forces and national security agencies to exploit and analyze full-motion video from UAVs and other airborne vehicles.
http://tinyurl.com/cd8z6y5
3. http://www.computerweekly.com/feature/OrdnanceSurvey-gets-to-grips-with-geospatial-big-data
“Ordnance Survey has all but completed a five-year IT improvement programme to enhance its operations. That programme – with Oracle as the main IT partner – has already transformed those operations into an enterprise grid computing system that pulls 17 databases into one Oracle spatial database management platform. The platform supports all geospatial data types and models. The system combines open source Linux with Oracle’s grid computing architecture, which makes it possible to coordinate large numbers of low-cost servers and corresponding storage so they operate like one large computer.”
4. Big Data challenges
• Big thinking (value)
• Big strategy (necessary)
• Big governance (stewardship)
• Big access (sharing)
• Big cooperation (supply chain)
• Big privacy (security)
• Big quality (QA/QC)
• Big people (skills training)
6. http://hortonworks.com/blog/big-data-defined/
April 4th, 2013 Russell Jurney
• Wikipedia defines Big Data in terms of the problems posed by the awkwardness of legacy tools in supporting massive datasets. What counts as a massive dataset? Anything from megabytes to yottabytes.
• A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• There is a ‘Big Data’ opportunity: transformative economics.
Big Data is the opportunity space created by new open source,
distributed systems from the consumer internet space.
7. The big data environment
• Volume
• Data at rest; levels increasing
• Velocity
• Data in motion; speed at which it
transits enterprises and entire
industries is faster than ever
• Variety
• Data in many forms: hundreds of millions of web pages, emails and unstructured data, such as Word documents and PDFs, as well as a nearly infinite number of events and information from enterprise data centres
• Value
• Do you need it?
9. The big data environment
• It uses local storage to be fast but inexpensive
• It uses clusters of commodity hardware to be inexpensive
• It uses free software to be inexpensive
• It is open source to build from community learning
• Cheap storage means logging enormous volumes of data to
many disks is easy. Processing this data is less so. Distributed
systems which have the above four properties are disruptive
because they are approximately 100 times cheaper than
other systems for processing large volumes of data, and
because they deliver high I/O performance.
10. The big data environment
• Apache Hadoop is one such system. Hadoop ties together a
cluster of commodity machines with local storage using free and
open source software to store and process vast amounts of data
at a fraction of the cost of other systems [Example: Esri/spatial-framework-for-hadoop on GitHub, the social network for programmers]
• Storage cost: SAN storage $2-10/GB vs. local storage $0.05/GB
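The per-GB prices above can be turned into a quick cost comparison. This is a minimal sketch that uses only the slide's figures; the 1 PB example size is an assumption chosen for illustration, and the calculation ignores replication, power, networking and administration costs.

```python
# Raw storage cost comparison using the per-GB prices quoted on the slide.
SAN_COST_PER_GB_LOW = 2.00    # SAN storage, low end ($/GB)
SAN_COST_PER_GB_HIGH = 10.00  # SAN storage, high end ($/GB)
LOCAL_COST_PER_GB = 0.05      # local commodity disk ($/GB)


def storage_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Raw storage cost in dollars (no replication, power or admin included)."""
    return gigabytes * cost_per_gb


def compare(gigabytes: float) -> dict:
    """Cost of storing `gigabytes` on SAN vs. local disk, plus the price ratios."""
    return {
        "san_low": storage_cost(gigabytes, SAN_COST_PER_GB_LOW),
        "san_high": storage_cost(gigabytes, SAN_COST_PER_GB_HIGH),
        "local": storage_cost(gigabytes, LOCAL_COST_PER_GB),
        # How many times more expensive SAN is than local disk, per GB.
        "ratio_low": SAN_COST_PER_GB_LOW / LOCAL_COST_PER_GB,
        "ratio_high": SAN_COST_PER_GB_HIGH / LOCAL_COST_PER_GB,
    }


if __name__ == "__main__":
    one_petabyte_gb = 1_000_000  # decimal units: 1 PB = 10^6 GB (assumed example size)
    for key, value in compare(one_petabyte_gb).items():
        print(f"{key}: {value:,.2f}")
```

On these prices SAN is 40 to 200 times more expensive per GB, so 1 PB costs roughly $2-10 million on SAN against about $50,000 on local disk. Even if every block is stored three times (the default HDFS replication factor), local disk remains roughly 13 to 67 times cheaper.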
11. The big data environment
• Capture every shred of data in the cheapest place possible
• Provide access to this data across the organization
• Mine the data for value
• “To undergo the transformative processes that unabridged
access to data provides, enabling bigger, better, faster more
profound insight than ever before”. Blogger
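Mining captured data on a Hadoop-style cluster usually means expressing the job in the MapReduce model: mappers emit key/value pairs, the framework groups them by key, and reducers aggregate each group. The sketch below imitates those three phases in plain Python on one machine to count point records per grid cell. It does not use Hadoop's API, and the projected-metre coordinates and 1 km cell size are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # (x, y) in projected metres (assumed)
Cell = Tuple[int, int]       # grid cell index


def map_phase(points: Iterable[Point], cell_size: float) -> Iterator[Tuple[Cell, int]]:
    """Mapper: emit (grid cell, 1) for every point."""
    for x, y in points:
        # Floor division keeps negative coordinates in the correct cell.
        yield (int(x // cell_size), int(y // cell_size)), 1


def shuffle_phase(pairs: Iterable[Tuple[Cell, int]]) -> Dict[Cell, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[Cell, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Cell, List[int]]) -> Dict[Cell, int]:
    """Reducer: sum the counts emitted for each cell."""
    return {cell: sum(values) for cell, values in groups.items()}


def count_points_per_cell(points: Iterable[Point], cell_size: float) -> Dict[Cell, int]:
    return reduce_phase(shuffle_phase(map_phase(points, cell_size)))


if __name__ == "__main__":
    sample = [(120.0, 80.0), (150.0, 95.0), (1020.0, 40.0), (2500.0, 1500.0)]
    print(count_points_per_cell(sample, cell_size=1000.0))
```

On a real cluster the mapper and reducer would run as distributed tasks (for example through Hadoop Streaming), with the framework performing the shuffle across machines.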
12. Most data isn’t big and businesses are wasting
money pretending it is:
www.qz.com/81661/most-data
• How many of us need to undertake operations that rank every
web page that exists?
• What processing tasks cannot be handled on a single computer
or even a laptop? [Megabyte to Gigabyte range]
• Weren’t you doing data analysis before data became big?
• Do you have the requirement or capability to check
correlations or patterns that you can act on if you have
even more data?
• False positives. In ‘The curse of big data’, Vincent Granville points out that even if a dataset includes 1000 items there are many millions of correlations, and a few will be extremely high just by chance.
• Getting more into the field of data science (stats, quality, etc.)
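The false-positive problem can be demonstrated directly: generate variables that are pure random noise and look for the strongest pairwise correlation. This is a minimal sketch; the 200 variables, 20 observations and fixed seed are arbitrary assumptions. Because the number of pairs grows as n(n-1)/2, 200 variables already give 19,900 correlations, and some of them come out large even though no real relationship exists.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def strongest_spurious_correlation(
    n_variables: int, n_observations: int, seed: int = 42
) -> Tuple[int, float]:
    """Return (number of variable pairs, largest |r|) for independent noise variables.

    Every variable is random noise, so any large correlation is a false positive.
    """
    rng = random.Random(seed)
    data: List[List[float]] = [
        [rng.gauss(0.0, 1.0) for _ in range(n_observations)] for _ in range(n_variables)
    ]
    pairs = 0
    max_abs_r = 0.0
    for a, b in combinations(data, 2):
        pairs += 1
        max_abs_r = max(max_abs_r, abs(pearson(a, b)))
    return pairs, max_abs_r


if __name__ == "__main__":
    pairs, max_r = strongest_spurious_correlation(n_variables=200, n_observations=20)
    print(f"{pairs} pairwise correlations; strongest |r| = {max_r:.2f}")
```

Any pattern found by scanning this many combinations needs validation on fresh data, or a multiple-testing correction, before it is acted on.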
13. Mapping the global Twitter heartbeat:
the geography of Twitter
http://firstmonday.org/ojs/index.php/fm/article/view/4366/3654
• In 2012, supercomputing manufacturer Silicon Graphics
International (SGI), the University of Illinois and social media
data vendor GNIP collaborated to create the “Global Twitter
Heartbeat” project (http://www.sgi.com/go/twitter) in order to
map global emotion expressed on Twitter in real-time.
• GNIP provided access to the Twitter Decahose, which consists
of 10 percent of all tweets sent globally each day.
• SGI provided access to one of its new UV2000 supercomputers
with 256 processors and 4TB of RAM running the Linux
operating system.
14. Twitter
From 12:01AM on 23 October 2012 through to 11:59PM on 30 November 2012, the Twitter Decahose from GNIP streamed 1,535,929,521 tweets from 71,273,997 unique users, averaging 38 million tweets from 13.7 million users each day.
Use the location of social media posts for emergency warning, real-time local situation reporting, etc.
15. Big data perspective on mapping the
geography of Twitter
• Tweets from iPhones and BlackBerrys yield an additional 1% of all tweets that can be georeferenced
• However, previous studies have missed them because
• they store their geographic information in the textual Location field rather than the machine-readable Geo metadata field
• In the big data era we need to look at the data itself, not just
assume it follows the manual.
Kalev Leetaru, University of Illinois on CrisisMappers
http://www.CrisisMappers.net
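Recovering those extra georeferenced tweets means parsing coordinates out of free text. The sketch below is a hypothetical example, not the method used in the study: it assumes the Location field contains a decimal "lat,lon" pair somewhere in the string (for example "iPhone: 51.507351,-0.127758"), and it uses a simplified tweet dictionary with `geo` and `location` keys that does not mirror any real Twitter API schema.

```python
import re
from typing import Optional, Tuple

# Assumed format: a decimal "lat,lon" pair anywhere in the Location text.
# Requiring decimal points avoids matching ordinary text such as "Flat 3, 12 High Street";
# the look-around assertions stop a match starting or ending inside a longer number.
LAT_LON = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)(?![\d.])")


def coords_from_location_text(location: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) parsed from a free-text Location field, or None."""
    if not location:
        return None
    match = LAT_LON.search(location)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject pairs outside the valid latitude/longitude range.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


def georeference(tweet: dict) -> Optional[Tuple[float, float]]:
    """Prefer the machine-readable geo field; fall back to the Location text.

    `tweet` is a hypothetical, simplified record: {"geo": (lat, lon) or None, "location": str}.
    """
    geo = tweet.get("geo")
    if geo:
        return geo
    return coords_from_location_text(tweet.get("location"))


if __name__ == "__main__":
    print(georeference({"geo": None, "location": "iPhone: 51.507351,-0.127758"}))
```

Checking what the data actually contains (here, coordinates hiding in a text field) before writing the parser is exactly the point the slide makes.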
17. Analytics plus geospatial data is changing the
way we get insights (hidden patterns)
• Geospatial analytics gives you the ability to ask “where” questions of business data:
• Where did it happen?
• Where will it happen?
• Where is it happening?
Source: Teradata
18. Analytics plus geospatial data is changing the
way we get insights
• Where are my customers?
• Where are my competitors?
• How far will customers travel to a branch or store?
• Which of my competitor’s customers can I draw to a branch or store?
• Which customers live close to a branch or store?
• Where can I increase profitability?
• How can I mitigate financial risk from flooding?
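Questions such as "How far will customers travel?" and "Which customers live close to a branch?" reduce to distance calculations between point locations. A minimal sketch, assuming customers and branches are stored as WGS84 latitude/longitude pairs and that great-circle distance on a spherical Earth (radius 6371 km) is accurate enough; travel time along the road network would be a better measure in practice. The branch and customer records are made up.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius; spherical approximation


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_branch(customer: LatLon, branches: Dict[str, LatLon]) -> Tuple[str, float]:
    """Return (branch name, distance in km) of the branch closest to a customer."""
    return min(
        ((name, haversine_km(customer, location)) for name, location in branches.items()),
        key=lambda pair: pair[1],
    )


def customers_within(
    customers: Dict[str, LatLon], branches: Dict[str, LatLon], max_km: float
) -> List[str]:
    """IDs of customers whose nearest branch is at most `max_km` away."""
    return [
        cid for cid, location in customers.items()
        if nearest_branch(location, branches)[1] <= max_km
    ]


if __name__ == "__main__":
    # Hypothetical branches and customers, for illustration only.
    branches = {"Southampton": (50.9097, -1.4044), "London": (51.5074, -0.1278)}
    customers = {"c1": (50.92, -1.40), "c2": (52.2, 0.12)}
    print(customers_within(customers, branches, max_km=10.0))
```

Comparing each customer's nearest-branch distance with a competitor's nearest site gives a first, crude answer to the "which competitor customers can I draw?" question.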
19. Is there a ‘problem with crowdsourcing
intelligence’?
DefenceIQ, May 2013 Thomas Chappelow
http://www.defenceiq.com/defence-technology/articles/the-problemwith-crowdsourcing-intelligence-in-syr/
• blogging, tweeting, mapping and photographing every single
detail…creating an unprecedented mountain of information that
can be farmed for actionable intelligence
• lack of traditional sources to rely on, the global intelligence
community has to look elsewhere for information…
crowdsourcing appears a juicy prospect – until it goes wrong
• Provenance, verification and trust
• Just as important for HUMINT as GEOINT
22. Ordnance Survey today
• Ordnance Survey is 222 years old
• Civilian organisation since 1983; 1100 staff
• Independent Government Department and Executive Agency
reporting directly to a Government Minister
• Trading Fund since April 1999
• Annual Report for 2011/12: Revenue of £141.8m, profit before
exceptional items of £31.9m, dividend £17.2m
• Southampton headquarters with 26 field offices in Great Britain
23. The size of the task
Topographic Layer
(approximate volumes)
1:1250 scale = 17 000 km²
1:2500 scale = 158 000 km²
1:10 000 scale = 66 000 km²
Over one million units of change per year.
Address Layer
27.5 million geocoded postal addresses,
with 500 000 changes per year.
Transport Network Layer
5.37 million km of roads, 3.97 million links,
885 881 route instructions;
over 20 000 changes per month.
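The approximate volumes above translate into daily maintenance rates. A short worked sketch using only the slide's figures; it assumes a 365-day year and treats the "over" figures as lower bounds.

```python
# Approximate volumes from the slide; "over" figures are treated as lower bounds.
ADDRESSES = 27_500_000
ADDRESS_CHANGES_PER_YEAR = 500_000
TOPO_CHANGES_PER_YEAR_MIN = 1_000_000     # "over one million units of change per year"
TRANSPORT_CHANGES_PER_MONTH_MIN = 20_000  # "over 20 000 changes per month"
DAYS_PER_YEAR = 365                       # calendar days (assumption)


def change_rates() -> dict:
    """Derived change rates for the address, topographic and transport layers."""
    return {
        "address_churn_pct_per_year": 100 * ADDRESS_CHANGES_PER_YEAR / ADDRESSES,
        "address_changes_per_day": ADDRESS_CHANGES_PER_YEAR / DAYS_PER_YEAR,
        "topo_changes_per_day_min": TOPO_CHANGES_PER_YEAR_MIN / DAYS_PER_YEAR,
        "transport_changes_per_year_min": 12 * TRANSPORT_CHANGES_PER_MONTH_MIN,
    }


if __name__ == "__main__":
    for name, value in change_rates().items():
        print(f"{name}: {value:,.1f}")
```

Roughly 1.8% of addresses change each year (about 1,370 per calendar day), the topographic layer absorbs at least about 2,740 changes a day, and the transport network at least 240,000 changes a year.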
27. A database to connect via real world information
• Every object represented in OS MasterMap has a unique reference identifier called a TOID (TOpographic IDentifier). These TOIDs can be used to connect other information to the object and are linked to other core references
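Because every feature carries a stable TOID, third-party data keyed on the same identifier can be attached to it with a simple join. A minimal sketch: the TOID values, building attributes and the "energy_rating" dataset are all made up for illustration, and a production system would do this join inside a spatial database rather than in Python dictionaries.

```python
from typing import Dict, List, Tuple

Feature = Dict[str, object]


def join_on_toid(
    features: Dict[str, Feature],
    attributes: List[Tuple[str, object]],
    field_name: str,
) -> Tuple[Dict[str, Feature], List[str]]:
    """Attach third-party attribute values to features by TOID.

    Returns the joined features and the TOIDs that had no matching feature,
    which is useful for quality checks on the incoming data.
    """
    joined: Dict[str, Feature] = {}
    unmatched: List[str] = []
    for toid, value in attributes:
        if toid in features:
            # Copy the feature so the source records are left unchanged.
            joined[toid] = {**features[toid], field_name: value}
        else:
            unmatched.append(toid)
    return joined, unmatched


if __name__ == "__main__":
    # Hypothetical identifiers and values, for illustration only.
    buildings = {
        "osgb1000000000001": {"theme": "Buildings", "area_m2": 142.0},
        "osgb1000000000002": {"theme": "Buildings", "area_m2": 98.5},
    }
    ratings = [("osgb1000000000001", "C"), ("osgb1000000000003", "B")]
    joined, unmatched = join_on_toid(buildings, ratings, "energy_rating")
    print(joined)
    print("No matching feature for:", unmatched)
```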
30. Using IBM Netezza for high performance
geospatial analytics
[Figure: Stress testing our data; Data queries; New insights; Storytelling with location data]
31. Netezza and geospatial analytics
• In-database geospatial analytic functions
• Native understanding of geospatial data
• High performance out of the box
• Scales to terabytes of data
• No indexes or aggregates to manage
• Open, standards-based interface and data model
• Analyse all data in a single appliance
41. Big Data – Linked Data
• As Ordnance Survey approaches the end of the transformation of its operations, it is preparing its data to exploit the myriad interconnections that can exist between physical entities in what has been described as the “Internet of Things”. This web of interconnections between disparate objects and ideas is made possible through linked data technology.
• Linked data gives each thing of interest a unique uniform resource identifier (URI) and records facts about it as triples of the form (subject, predicate, object). For example, population data can be linked to socio-economic statistics for a given town.
• The Linked Data Web is currently estimated to include more than 30 billion triples, with some 20% of those having geographic content.
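The triple model can be illustrated without any RDF tooling: store (subject, predicate, object) tuples and follow links between them. This is a minimal sketch with a hypothetical `example.org` namespace and made-up values; a real deployment would use published vocabularies, an RDF store and SPARQL queries. It shows how a town's population and the socio-economic statistics of the district it lies in can be reached through a shared identifier.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Made-up facts: a town, the district it lies in, and a statistic about that district.
STORE: Set[Triple] = {
    (EX + "town/town-a", EX + "prop/population", "12345"),
    (EX + "town/town-a", EX + "prop/inDistrict", EX + "district/d1"),
    (EX + "district/d1", EX + "prop/unemploymentRate", "4.2"),
}


def match(
    store: Set[Triple], s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None
) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def value(store: Set[Triple], subject: str, predicate: str) -> Optional[str]:
    """Object of the first triple with this subject and predicate, if any."""
    hits = match(store, s=subject, p=predicate)
    return hits[0][2] if hits else None


def town_profile(store: Set[Triple], town: str) -> dict:
    """Follow the town -> district link to combine population with district statistics."""
    district = value(store, town, EX + "prop/inDistrict")
    return {
        "population": value(store, town, EX + "prop/population"),
        "district": district,
        "unemployment_rate": (
            value(store, district, EX + "prop/unemploymentRate") if district else None
        ),
    }


if __name__ == "__main__":
    print(town_profile(STORE, EX + "town/town-a"))
```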
43. Hyperlocal example
‘Find me all GPs in my ward, bus stops within a 500 metre
radius of those GPs, but exclude bus stops in areas of high
crime’.
[Figure: Environment, Transport, Health, Council, Crime, Education, Weather, Business]
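The hyperlocal query combines three steps: select GP surgeries in a ward, find bus stops within 500 metres of each, and drop stops in high-crime areas. A minimal brute-force sketch, assuming all locations are already in a projected coordinate system measured in metres (such as British National Grid, EPSG:27700) so that straight-line planar distance is appropriate, and assuming each record has already been tagged with its ward or crime-area identifier. All names and coordinates are hypothetical; at national scale a spatial index or spatial database would replace the nested loop.

```python
from math import hypot
from typing import Dict, List, Set, Tuple

XY = Tuple[float, float]  # (easting, northing) in metres; projected CRS assumed


def hyperlocal_query(
    surgeries: Dict[str, Tuple[XY, str]],  # GP surgery name -> (location, ward)
    bus_stops: Dict[str, Tuple[XY, str]],  # stop id -> (location, crime area id)
    ward: str,
    high_crime_areas: Set[str],
    radius_m: float = 500.0,
) -> Dict[str, List[str]]:
    """For each GP surgery in `ward`, list bus stops within `radius_m` metres,
    excluding stops that fall in a high-crime area."""
    result: Dict[str, List[str]] = {}
    for gp, ((gx, gy), gp_ward) in sorted(surgeries.items()):
        if gp_ward != ward:
            continue
        result[gp] = sorted(
            stop
            for stop, ((sx, sy), area) in bus_stops.items()
            if area not in high_crime_areas and hypot(sx - gx, sy - gy) <= radius_m
        )
    return result


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    surgeries = {
        "GP1": ((440000.0, 110000.0), "Ward A"),
        "GP2": ((445000.0, 112000.0), "Ward B"),
    }
    bus_stops = {
        "S1": ((440200.0, 110200.0), "area-1"),  # about 283 m from GP1
        "S2": ((440100.0, 110000.0), "area-2"),  # 100 m from GP1, high-crime area
        "S3": ((441000.0, 110000.0), "area-1"),  # 1 km from GP1
    }
    print(hyperlocal_query(surgeries, bus_stops, "Ward A", {"area-2"}))
```

Each filter draws on a different theme (health, transport, crime), which is why linking datasets through shared identifiers and locations matters for this kind of question.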
44. Big Data challenges
• Big thinking (value)
• Big strategy (necessary)
• Big governance (stewardship)
• Big access (sharing)
• Big cooperation (supply chain)
• Big privacy (security)
• Big quality (QA/QC)
• Big people (skills training)
45. Ordnance Survey International: advisory services
• Strategic review and assessment
• Capacity and capability building
• Knowledge transfer and training
• Value of geographic information
• Technology direction: 3D, quality, open standards and much more
• National authoritative mapping
• National address infrastructure
• National geodetic infrastructure
• National spatial data infrastructure
46. Ordnance Survey International
Thank you for your attention. For further information contact:
Steven Ramage, Head of Ordnance Survey International
steven.ramage@ordnancesurvey.co.uk