Existe una multitud de sectores donde es necesario disponer de datos que permitan entender los patrones de comportamiento de la población: la planificación y la operación de los sistemas de transporte requiere información precisa, fiable y actualizada sobre la demanda de viajes; los patrones de actividad y movilidad de los turistas tienen profundas implicaciones para la planificación de infraestructuras, el desarrollo de la oferta turística y las estrategias de marketing turístico; entender el comportamiento espacial de los clientes es clave para optimizar las estrategias de distribución, comercialización y publicidad, determinar la localización de un nuevo comercio o punto de venta, o maximizar el retorno de la inversión en acciones de marketing. Las fuentes de datos tradicionales, basadas fundamentalmente en encuestas y registros administrativos, proporcionan información muy valiosa, pero no están exentas de inconvenientes. En general, las encuestas resultan caras y lentas de realizar, lo que limita el tamaño de la muestra y la frecuencia de actualización de la información, a lo que hay que añadir otras limitaciones intrínsecas, como las respuestas incorrectas e imprecisas, o la dependencia de la disposición a responder de los entrevistados. En los últimos años, la generalización del uso de dispositivos móviles ha abierto nuevas oportunidades para superar muchas de estas limitaciones. La posibilidad de recoger datos geolocalizados sobre la actividad de las personas, de manera dinámica y a un coste sensiblemente inferior al de los métodos tradicionales, abre la puerta a infinidad de aplicaciones. Las más evidentes son quizá las relacionadas con el transporte y la movilidad, pero el abanico es mucho más amplio, abarcando casi cualquier área que requiera información sobre los patrones de actividad y movilidad de la población. Las nuevas fuentes de datos plantean asimismo importantes retos, desde la necesidad de desarrollar nuevas metodologías de análisis, hasta la protección de la privacidad.
Vídeo de la ponencia: https://youtu.be/5PKC5Qm0eHM
CNIC Information System with Pakdata Cf In Pakistan
La telefonía móvil como fuente de información para el estudio de la movilidad humana
1. Big Data for the Study of Human Mobility
The Example of Mobile Phone Records
Ricardo Herranz | Director
Kineo Mobility Analytics
2. • Technology company based in Madrid and London
• Expertise: Big Data Analytics, Modelling and Simulation
• Provide: population activity and mobility information based on geolocated data
• Sectors:
– Transportation
– Tourism
– Geomarketing
– Environment
What we do
3. • Senior transport consultant
• 35 years of experience as a consultant, transport planner and researcher
with a distinguished academic career
• Internationally recognised authority in transport and traffic modelling
• Delivered projects in some 30 countries, including several projects in India
• Technology company based in Madrid
• Data management, analysis and visualisation
• Systems modelling and simulation
• Decision support tools: smart cities, transport and logistics, energy and
environment…
A joint company by Nommon and LW
4. Activity-mobility statistics
Traditional data collection methods (e.g., household travel surveys) are
expensive and time-consuming, so many institutions cannot access reliable
and frequently updated activity-mobility information
Different types of institutions collect and analyse activity-mobility data in
order to produce studies that are then made available to third parties or
publicly available: national/regional/local statistical offices, research
services, academic institutions, etc.
Modern ICTs allow the automatic collection of vast amounts of spatio-
temporal data and open new opportunities for complementing and/or
replacing traditional statistics and data collection methods
5. Information needs
Distribution of population Distribution of workplaces
Tourism flows
Urban mobility
Impact of eventsInterurban mobility
6. Data collection: traditional approach
• Surveys:
– Official statistics
– Ad-hoc surveys (e.g., mobility surveys)
• Administrative registers
– Trend towards replacing surveys by administrative registers, where possible
7. Problems
• Information not fully adapted to the problem
• Outdated information
• High cost
• Time consuming
• Small samples
• Quality of information
9. We mine and analyse anonymised geolocation data from mobile devices and
blend it with other data provided by our customers or available from public
databases to provide continuously updated and rich activity-travel
information in a fraction of the time and cost required by traditional
methods
What we do
10. What we do
Kineo Analytics Solution
Anonymised
mobile phone
records
Sociodemographic
statistics
Land use data Other demand data
Pre-processing and
cleansing
Sample
construction
Sample expansion
Input data
Generation of
activity-travel
diaries
Generation of
output indicators
Origin-Destination Matrices
Outputs
Dynamic Population Distributions
Transport supply
11. Dynamic Population Mapping and Activity Indicators Origin-Destination Matrices and
Mobility Statistics
Where are the people?
What are they doing?
How do they move?
Why?
12. Origin-Destination Matrices and Mobility
Statistics
Dynamic Population Mapping and Activity
Statistics
Applications
Transportation
Tourism
Urban planning
Events
Emergency & disaster management
13. Mobile phone data: what do we mean exactly?
• Typically anonymised call detail records (CDRs) or data from network probes
– Spatio-temporal data: time and user position every time an event occurs
– Position typically at cell level (but some MNOs are beginning to store triangulated positions)
• MNO’s clients + roamers that connect to its network
• Sociodemographic data for clients (age and gender)
9:12 a.m. 6:01 p.m. 9:38 p.m.
t
14. So how does it work?
What we do (more or less):
– Data cleansing
– Sample construction
– Identification of users’ home
– Identification of other activities (work, study…)
– Generation of activity-travel diaries
– Expansion to the total population
– Calculation of indicators
15. Why mobile phone records?
Great potential for the analysis of human activity and mobility:
– Passive data collection eliminates many of the intrinsic limitations of surveys
(e.g., incorrect/imprecise answers)
– Opportunistic data collection – no need for specific infrastructure
– Reasonably good temporal and spatial granularity
– Large samples
– Longitudinal data
– Opportunities for “natural experiments”
– Drastic cost and time reduction with respect to traditional methods
16. But…
These data also come with a number of challenges:
– Noise and errors
– Biases
– Less explanatory power
– Computational challenges
Mobile phone data do not provide the full picture: need for data fusion
Algorithms matter: devil’s in the details
Rigorous validation is essential
18. OD Matrices for Barcelona Metropolitan Area
Hourly trip distribution
• Calculation of origin-destination matrices in the Barcelona Metropolitan Area and
comparison with the Mobility Survey on Working Days (EMEF)
19. Trip Attraction Areas in Madrid
• Trip matrices for several attraction areas in the Region of Madrid
20. Analysis of Commuting Patterns in Spain
• Commuting Patterns in Spain: pilot study for next Spanish Census
21. Analysis of Workplaces Distribution in Barcelona
• Home-Work matrices for the Metropolitan Area of Barcelona
22. OD Matrices for Santiago de Chile
• Comparison Mobile Phone Records vs Household Travel Survey (EOD)
• Sample size: x20
• No of trips per person
– Average: 2,8 trips/person vs 2,1 trips/person in EOD
– % of people that do not travel: 17-18% in both cases
28. OD matrices for Málaga
• OD matrices by mode (private vehicles vs public transport) obtained through the fusion of
mobile phone records with data from Málaga public transport smart card
• Data used for the calibration of the Málaga transport model
29. Characterisation of the Visitors to a National Park
• Analysis and profiling of the visitors of a National Park, with gender, age, recurrence,
length of the visit, origin, etc.
0
1000
2000
3000
4000
20-feb 05-mar 08-mar 26-mar 27-mar 29-mar 29-abr 28-may 17-jun 18-jun
La Pedriza
Est. Baja
Est. Alta
Telefonía
0
500
1000
1500
2000
20-feb 05-mar 08-mar 26-mar 27-mar 29-mar 29-abr 28-may 17-jun 18-jun
Peñalara Est. Baja
Est. Alta
Telefonía
30. Population Exposure to Air Pollution
• Exposure of population to NOx emissions in Madrid based on
dynamic population mapping
32. Big data streams generated by smart personal devices offer a great
potential for analysing human activity-mobility patterns and
producing sociodemographic statistics
The promise of Big Data
33. Yet, some precautions…
Data is not (always) a substitute for theory
“Models without data are just stories - data without models are just
numbers”
34. Yet, some precautions…
Neither data nor algorithms are “neutral”: lack of understanding of
underlying principles and limitations may lead to a false sense of
objectivity and democratisation of information
Critical interpretation of the results in terms of the initial problem is
essential, and we (researchers, consultants…) must do so responsibly
35. Big Data is usually noisy, unstructured and biased (“all data are biased, but
some are useful”), so producing actionable information requires a lot of
effort for data cleaning, data fusion, and algorithm validation…
…but this is the hard part, not really
cool, so many people prefer to focus
on fancy (and often useless)
visualisation
Yet, some precautions…
36. Yet, some precautions…
Big Data also comes with a number of
regulatory, organisational and business challenges
(We won’t discuss this today, but it should not be forgotten)
37. We have to live with
• Those overselling the benefits of Big Data as the cure for all diseases: more common
in industry, but also happens in public administration and academia
• But we can and must be critical and claims to be rigorously supported
• Those unable to see the added value: equally common in industry, in public
administration and in academia, but probably more reproachable in the latter case
• Nothing much to do here - probably the same people who 20 years ago didn’t see the added value
of Internet over teletext
38. “A new scientific truth does not triumph by convincing its
opponents and making them see the light, but rather
because its opponents eventually die, and a new
generation grows up that is familiar with it”
Max Planck
40. Big Data has already yielded important practical advances in fields
like transportation planning, but so far this is mostly about
better/cheaper ways of measuring the same indicators and use them
to feed the same models
At present
41. For sure, Big Data will produce new indicators that
couldn’t be measured before, and will help us make the
most of state-of-the-art theoretical frameworks, but…
In the future
.. just as the telescope revolutionised astronomy or the
microscope revolutionised biology,
can Big Data inspire the development of new theories and
revolutionise the social sciences?