New analytical methods for geocomputation - Guy Lansley, UCL
1. Guy Lansley
Department of Geography, UCL
g.lansley@ucl.ac.uk
@GuyLansley
The Demographics User Group
Annual Away Day
1st December 2016
New Analytical Methods for Geocomputation
3. Big Data and Software
New software has had to adapt to the growing size and complexity of data
Kira Kowalska
4. Big Data shifts
• The concept of Big Data is changing and becoming more challenging
• Emphasis on place rather than space
• Challenges to representing the dynamics of real world phenomena
7. R and RStudio
• Command line interface.
• Object oriented.
– You create things with names using the “<-” symbol.
• Ten <- 5*2
• Two <- Ten/5
• Write a script of functions.
• The standard installation has relatively few functions, but more have been made available via open source downloadable packages.
• Can also be run through RStudio, which provides a more user-friendly GUI.
[Screenshot: RStudio panes: R Scripts, Workspace, Console, Multi-tab (includes plots)]
8. Why should we conduct analysis in code?
• Accessibility
• Unrestrictive
• Automation and Consistency
• Skills development
10. Using R as a GIS
• Free online training resources coming soon to the
CDRC website
• www.cdrc.ac.uk/training-capacity-building/online-courses
Slides on slideshare
11. Fundamentals
• Data scientists still need to understand the fundamentals
• e.g. circular statistics
– Commonly overlooked
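Circular statistics matter for measurements that wrap around, such as hour of day or compass bearing, where an ordinary mean can point in the wrong direction. A minimal Python sketch, assuming events recorded as hours on a 24-hour clock (the function name and the sample hours are illustrative, not from the talk):

```python
import math

def circular_mean_hour(hours):
    """Mean hour of day on a 24-hour circle, from the mean direction of unit vectors."""
    angles = [2 * math.pi * h / 24 for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)  # direction of the average vector
    return (mean_angle * 24 / (2 * math.pi)) % 24

# Illustrative events clustered around midnight
late_night = [23, 0, 1, 2]
arithmetic_mean = sum(late_night) / len(late_night)  # 6.5, nowhere near the cluster
circular_mean = circular_mean_hour(late_night)       # about 0.5, i.e. just after midnight
```

The arithmetic mean treats 23 and 0 as far apart; the circular mean treats them as neighbours.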
13. 2011 Open Atlas Project
• A manual map might typically take 5 minutes to create, thus:
– 5 minutes × 134,567 maps = 672,835 minutes
– Or 467.2 days (no breaks!)
www.alex-singleton.com
• Produced by Prof. Alex Singleton (CDRC, University of Liverpool)
• R was used to automate the production of 134,567 maps into a collection of PDF atlases
• This included downloading and formatting the data from the ONS websites
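The manual-effort estimate on this slide can be checked in a few lines of Python. The figures are the slide's own; the variable names are illustrative:

```python
MINUTES_PER_MAP = 5   # slide's estimate for one manually produced map
N_MAPS = 134_567      # number of maps in the Open Atlas

total_minutes = MINUTES_PER_MAP * N_MAPS   # 672835
total_days = total_minutes / (60 * 24)     # about 467.2 days of non-stop work
```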
14. 2011 Open Atlas Project
• Code available here:
rpubs.com/alexsingleton/openatlas
• E.g. Step 1: Download the data
– archive = http://www.nomisweb.co.uk/output/census/2011/ks101ew_2011_oa.zip
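The project's own code is in R and is published at the RPubs link above. As a rough Python analogue of the download step only, a sketch might look like this. The archive URL is the one on the slide; the function names and the target directory are hypothetical, and the URL may no longer resolve:

```python
import urllib.request
import zipfile
from pathlib import Path

ARCHIVE_URL = "http://www.nomisweb.co.uk/output/census/2011/ks101ew_2011_oa.zip"

def archive_filename(url):
    """Return the file name at the end of a download URL."""
    return url.rsplit("/", 1)[-1]

def download_and_extract(url, target_dir="census_data"):
    """Download a zip archive, unpack it, and return the extracted file names."""
    target = Path(target_dir)  # hypothetical output folder
    target.mkdir(parents=True, exist_ok=True)
    zip_path = target / archive_filename(url)
    urllib.request.urlretrieve(url, zip_path)  # requires network access
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
        return zf.namelist()

# Usage (not run here because it needs network access):
# files = download_and_extract(ARCHIVE_URL)
```

Wrapping the step in a function is what makes it repeatable across many tables, which is the point of automating the atlas.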
15. Algorithms
Alyson Lloyd
• Use a pipeline of methods and decisions to analyse data
• e.g. data cleaning
– Cleaning the registered locations of customers based on their store visits
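A pipeline of this kind might be sketched in Python as below. This is not the method used in the research: the data, the notion of "usual" stores per small area, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical inputs: one (customer, store) pair per visit, each customer's
# registered small area, and the stores usually visited from each area.
visits = [("c1", "A"), ("c1", "A"), ("c1", "B"), ("c1", "A"), ("c1", "C"),
          ("c2", "Z"), ("c2", "Z"), ("c2", "Y"), ("c2", "Z")]
registered_area = {"c1": "area1", "c2": "area1"}
usual_stores = {"area1": {"A", "B", "C"}}

LABELS = ["primary", "secondary", "tertiary"]

def rank_destinations(customer):
    """Step 1: label a customer's three most visited stores by frequency."""
    counts = Counter(store for cust, store in visits if cust == customer)
    return {store: LABELS[i] for i, (store, _) in enumerate(counts.most_common(3))}

def irregular_share(customer):
    """Step 2: share of visits to stores that are unusual for the registered area."""
    stores = [store for cust, store in visits if cust == customer]
    usual = usual_stores[registered_area[customer]]
    return sum(store not in usual for store in stores) / len(stores)

# Step 3: flag registrations where most visits are irregular (assumed threshold).
suspect = [c for c in registered_area if irregular_share(c) > 0.5]
```

Here customer c2 is registered in area1 but only visits stores that are unusual for that area, so the registered location is flagged for review.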
23. Topic Modelling
Blei et al. (2003) Latent Dirichlet Allocation (LDA):
In this example, I have applied LDA to 1.3 million geotagged Tweets from Inner London transmitted in 2013
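To give a feel for what fitting LDA involves, here is a toy Python sketch. It is not the implementation used for the Tweets: Blei et al. (2003) fit the model with variational inference, whereas this sketch swaps in collapsed Gibbs sampling because it is short to write. The documents, hyperparameters and function name are illustrative assumptions.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): word counts per topic and topic counts per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Start from a random topic for every token
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic

# Hypothetical tokenised messages
docs = [
    "train delayed at station".split(),
    "bus and train to station".split(),
    "great coffee and cake".split(),
    "coffee and lunch cake".split(),
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
top_words = [sorted(c, key=c.get, reverse=True)[:3] for c in topic_word]
```

Each topic is then labelled by inspecting its most frequent words, which is how group names such as those on the next slide are typically arrived at.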
24. 20 Twitter Groups
1 Photography and Sights
2 Optimism, Kindness and Positivity
3 Leisure and Attractions
4 TV and Film
5 Humour and Informal Conversations
6 Transport and Travel
7 Politics, Beliefs and Current Affairs
8 Sport and Games
9 Anticipation and Socialising
10 Business, Information and Networking
11 Pessimism and Negativity
12 Music and Musicians
13 Routine Activities
14 Food and Drink
15 Body, Appearances and Clothes
16 Social Media and Apps
17 Slang and Profanities
18 Place and Check-Ins
19 Wishes and Gratitude
20 Foreign and Other
Identifying Patterns
25. Time Distribution
[Figure: distribution of Tweets by hour of day (0 to 23), one row per Twitter group plus a row for all Tweets; colour scale from -1.5 to 1.5]
26. Identifying patterns
This map shows the density of Tweets from the Education subtopic, relative to the density of all Tweets in London.
[Map labels: UCL, University of Westminster, Imperial College London, London South Bank, Kings College, Queen Mary, London Metropolitan University, University of Greenwich, City, Goldsmiths, SOAS, Birkbeck, LSE, UAL, University of Roehampton, University of East London]
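One simple way to express the density of one topic relative to all Tweets is a ratio of shares per grid cell. The Python sketch below assumes that approach; the map itself may have been produced differently, and the cell names and counts are hypothetical.

```python
def relative_density(topic_counts, all_counts):
    """Each cell's share of topic Tweets divided by its share of all Tweets.

    A value of 1.0 means the topic is as common in the cell as expected;
    above 1.0 means it is over-represented there.
    """
    topic_total = sum(topic_counts.values())
    all_total = sum(all_counts.values())
    ratios = {}
    for cell, n_all in all_counts.items():
        if n_all == 0:
            continue  # no Tweets in this cell, so the ratio is undefined
        topic_share = topic_counts.get(cell, 0) / topic_total
        all_share = n_all / all_total
        ratios[cell] = topic_share / all_share
    return ratios

# Hypothetical counts per grid cell
all_tweets = {"cell_a": 1000, "cell_b": 1000, "cell_c": 2000}
education = {"cell_a": 30, "cell_b": 10, "cell_c": 40}
ratios = relative_density(education, all_tweets)
```

Dividing by the overall Tweet density removes the effect of busy areas simply producing more Tweets of every kind.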
36. A Basic Shiny Map
ui.R server.R
Population density (2011 Census)
37. On CDRC Maps
• Geodemographics
– OAC, COWZ, IUC
• Retail
– Value, Sector, Change
• Metrics
– IMD, IMD Components
– Population Density
– Population Change
• Top Metric Maps
– Dwelling Ages
– Country of Birth
– Occupation
– Mode of Commute
CDRC Maps
Oliver O’Brien
38. The Demographic Toolkit
• Analytical web mapping system (Web GIS)
– Self-hosted raster and vector map tiles
– Open source packages (OpenStreetMap, Mapnik & Leaflet)
• Create and analyse spatial and temporal profiles
– Standard and bespoke functional regions
– MAUP (Modifiable areal unit problem) in public policy
• Aims to be available in mid-2017
Tian Lan
40. Summary
• We have to become more comfortable with coding in order to unlock the full potential of machines
• We are exploring new techniques to unlock new insights from Big Data
• We are also harnessing data in novel ways to gain insights about the population and their dynamics
• However, converting big data into wisdom is still challenging and new techniques still need to be made more accessible
41. Guy Lansley
Department of Geography, UCL
g.lansley@ucl.ac.uk
@GuyLansley
Acknowledgements
Tian Lan
Wen Li
Alyson Lloyd
Oliver O’Brien
Seth Spielman
www.cdrc.ac.uk
Editor's notes
Old years = IBM. We are now data rich
Place = social media, networks
Time = interactivity
Issues of there being no insurance
embedded
Python and R
Open source,
New methods new understanding
Need for training
Look at trip distributions per small area to store locations
Categorise into primary, secondary, tertiary destinations
Look at frequency customers perform irregular journeys
FOCUS ON UNSUPERVISED - SCIENCE
Artificial neural networks: SOM (self-organising map) creates a 2D representation of the input space (competitive learning)
sentiment
Count words
Data matching
Leisure
TV & Film
Transport
Leisure
Food and drink
Transactions
Registrations
Social media posts
Registers, 55m people, no link
matching
Create file of movers
File of leavers
Look for identical combinations of names
3m move, 800,000 singular combinations
What to do about duplicates? Distance? OAC?
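The matching idea in these notes can be sketched in Python as follows. The record layout, the file contents and the rule for what counts as a match are assumptions for illustration, not the procedure actually used:

```python
from collections import defaultdict

# Hypothetical records: (address, names of the people registered there)
leavers = [("1 High St", ("Ann Lee", "Bo Lee")),
           ("2 Low Rd", ("John Smith",)),
           ("9 Elm Ave", ("John Smith",))]
arrivals = [("7 Oak Rd", ("Bo Lee", "Ann Lee")),
            ("3 Mill Ln", ("John Smith",))]

def index_by_names(records):
    """Group addresses by the order-independent combination of names."""
    index = defaultdict(list)
    for address, names in records:
        index[frozenset(names)].append(address)
    return index

def match_unique(leavers, arrivals):
    """Link old and new addresses where a name combination is unique in both files."""
    old, new = index_by_names(leavers), index_by_names(arrivals)
    matched, ambiguous = [], []
    for names in old.keys() & new.keys():
        if len(old[names]) == 1 and len(new[names]) == 1:
            matched.append((sorted(names), old[names][0], new[names][0]))
        else:
            # Duplicates need a further rule, e.g. distance between addresses
            ambiguous.append(sorted(names))
    return matched, ambiguous

matched, ambiguous = match_unique(leavers, arrivals)
```

The Lee household is linked because its combination of names occurs once in each file; "John Smith" occurs twice among the leavers, so it is set aside for a tie-breaking rule.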
World is dynamic, so outputs should be interactive