Towards the Use of Big Data
for European Statistics
Peter Struijs
Statistics Netherlands
Scheveningen Memorandum on Big Data
• Examine the potential of Big Data sources for official statistics
• Official Statistics Big Data strategy as part of wider government strategy
• Address privacy and data protection
• Collaboration at European and global level
• Address need for skills
• Partnerships between different stakeholders (government, academics, private sector)
• Developments in methodology, quality assessment and IT
• Adopt action plan and roadmap for the European Statistical System
Envisaged Benefits of Using Big Data for Official Statistics
Faster data production
More detail, e.g. geographic breakdown and frequency
More data
More flexible response to user needs
Increased efficiency
Stay relevant
The ESSnet Big Data
Framework Partnership Agreement: 22 partners
Two Specific Grant Agreements:
SGA-1: February 2016 – July 2017, 1.0 M€
SGA-2: January 2017 – May 2018, 1.0 M€
Partners of the ESSnet
ESSnet Big Data: Pilots
List of pilot projects
• Web scraping (2 work packages)
  - job vacancies; enterprise characteristics
• Smart meters
  - electricity consumption; temporary vacant dwellings
• Automatic Identification System (AIS)
  - vessel identification data
• Mobile phone data
  - preparing for access to data
• Early estimates
  - various domains
• Multiple domains
  - population, tourism / border crossing, agriculture
Subdivision of Pilots into Phases
1. Data access
Conditions; partnerships
2. Data handling
Production criteria; micro versus aggregated data; visualisation
3. Methodology and technology
Methodology for long-lasting statistics; process design
4. Statistical output
Examples of existing and new outputs; potential users; comparison with current estimates (quality, timeliness, level of detail)
5. Future perspectives
Applicability in ESS; future production process; exploration of further possibilities of using and combining (big) data sources
WP 1: Webscraping / Job Vacancies
WP leader: UK
Partners: Belgium, Denmark, France, Germany, Greece, Italy, Portugal, Sweden, Slovenia
• Data access: job portals
• Data handling: legal and technical aspects, test webscraping
• Methodology for output production: from semi-structured to structured data
• Future perspectives: webscraping enterprise websites, methodology for future production, explore new products
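The step "from semi-structured to structured data" can be illustrated with a minimal sketch. It is not the ESSnet implementation: the advert markup, the CSS class names and the field names below are all hypothetical, and real job portals differ. The sketch uses only Python's standard `html.parser` to turn one scraped advert into a flat record.

```python
from html.parser import HTMLParser

# Hypothetical markup of a scraped job advert; real portals differ.
SAMPLE_AD = """
<div class="vacancy">
  <h2 class="job-title">Data Analyst</h2>
  <span class="employer">Enterprise A</span>
  <span class="location">The Hague</span>
</div>
"""

# Assumed mapping from CSS class (semi-structured) to field name (structured).
FIELDS = {"job-title": "title", "employer": "employer", "location": "location"}


class VacancyParser(HTMLParser):
    """Collects the text of elements whose class is listed in FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self._current = FIELDS.get(css_class)

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.record[self._current] = text

    def handle_endtag(self, tag):
        self._current = None


def parse_vacancy(html):
    """Return one structured record (dict) for one advert."""
    parser = VacancyParser()
    parser.feed(html)
    return parser.record


record = parse_vacancy(SAMPLE_AD)
print(record)
```

Records of this kind could then be deduplicated and matched to enterprises, which is where the legal and technical aspects listed above come in.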
Online Job Vacancy Data Landscape
Model for Measuring Job Vacancies
Target Population: All job vacancies
Advertised on enterprise website
Advertised on a job portal
‘Ghost’ vacancies
Employing business is identifiable
Advertised through an agency
Approach to Data Integration
Inputs:
• Counts from online sources: Enterprises A, B, C, D, E
• Survey estimates: Enterprises A, B, C, F, G
• Business Register: Enterprises A to J
Matching: the inputs are matched into an integrated data set (Enterprises A to J)
Scaling factors (by NACE?)
Treatment by coverage of the enterprise:
1. Survey and online: scale online data to survey estimates
2. Online only: apply scaling factors to online data
3. Survey only: use survey estimates
4. Neither survey nor online: modelled estimates
Total = Survey Estimate
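The four cases of the integration approach can be sketched as follows. This is an illustration under stated assumptions, not the ESSnet method: all numbers and NACE codes are invented, the scaling factor is assumed to be the ratio of survey to online totals among matched enterprises in the same NACE group, matched enterprises are assumed to keep their survey value, and case 4 is left as a placeholder because the slide does not specify the model.

```python
from collections import defaultdict

# Illustrative inputs (all values invented for this sketch).
register = {  # enterprise -> NACE section (hypothetical codes)
    "A": "C", "B": "C", "C": "G", "D": "C", "E": "G",
    "F": "G", "G": "C", "H": "C", "I": "G", "J": "G",
}
online = {"A": 4, "B": 6, "C": 5, "D": 3, "E": 2}    # scraped vacancy counts
survey = {"A": 6, "B": 9, "C": 10, "F": 7, "G": 1}   # survey estimates


def scaling_factors(register, online, survey):
    """Survey/online ratio per NACE group, computed from matched enterprises."""
    sums = defaultdict(lambda: [0.0, 0.0])  # nace -> [survey sum, online sum]
    for ent in online.keys() & survey.keys():
        sums[register[ent]][0] += survey[ent]
        sums[register[ent]][1] += online[ent]
    return {nace: s / o for nace, (s, o) in sums.items() if o > 0}


def integrate(register, online, survey):
    """Return enterprise -> (case number, value) for every register unit."""
    factors = scaling_factors(register, online, survey)
    result = {}
    for ent, nace in register.items():
        if ent in survey:                 # cases 1 and 3: survey value is kept
            case = 1 if ent in online else 3
            value = float(survey[ent])
        elif ent in online:               # case 2: scale the online count
            # Assumes the NACE group has at least one matched enterprise.
            case, value = 2, online[ent] * factors[nace]
        else:                             # case 4: placeholder for a model
            case, value = 4, None
        result[ent] = (case, value)
    return result


integrated = integrate(register, online, survey)
for ent, (case, value) in sorted(integrated.items()):
    print(ent, case, value)
```

A production version would also need a benchmarking step so that the total agrees with the survey estimate, as the slide requires; that step is omitted here.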
WP 2: Webscraping / Enterprise Characteristics
WP leader: Italy
Partners: Bulgaria, Netherlands, Poland, Sweden, UK
• Data access: inventory of target enterprises, URLs; legal and privacy aspects
• Data handling: use cases; actual webscraping
• Testing of methods and techniques: proof of concept for selected use cases; build and apply predictor for estimates of enterprise characteristics
Logical Reference Architecture
Towards a List of URLs
WP 3: Smart Meters
WP leader: Estonia
Partners: Austria, Denmark, Italy, Portugal, Sweden
• Data access: availability of smart meters, legal aspects
• Data handling: coverage assessment, production of cleaned datasets
• Methodology and techniques: linkage with administrative data; methodology for electricity consumption of businesses and households; also seasonally vacant living spaces
• Future perspectives: potential new products, feasibility of using aggregated data
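One way to think about identifying seasonally vacant living spaces from meter readings is sketched below. The rule is an assumption made for illustration only: a dwelling is flagged when its monthly consumption stays below a fixed threshold for several consecutive months. The threshold, the spell length and the sample readings are invented, and spells that wrap around the turn of the year are not joined.

```python
VACANCY_THRESHOLD_KWH = 30.0   # assumed monthly cut-off for "practically unused"
MIN_CONSECUTIVE_MONTHS = 3     # assumed minimum length of a vacant spell


def longest_low_spell(monthly_kwh, threshold=VACANCY_THRESHOLD_KWH):
    """Length of the longest run of consecutive months below the threshold."""
    longest = current = 0
    for kwh in monthly_kwh:
        current = current + 1 if kwh < threshold else 0
        longest = max(longest, current)
    return longest


def classify_dwelling(monthly_kwh):
    """Classify one dwelling from twelve monthly consumption totals (kWh)."""
    spell = longest_low_spell(monthly_kwh)
    if spell == len(monthly_kwh):
        return "vacant all year"
    if spell >= MIN_CONSECUTIVE_MONTHS:
        return "seasonally vacant"
    return "occupied"


# Invented examples: a summer house, a permanently occupied home, an empty flat.
summer_house = [5, 5, 4, 6, 40, 120, 150, 140, 60, 8, 5, 5]
family_home = [300] * 12
empty_flat = [2] * 12

for name, readings in [("summer house", summer_house),
                       ("family home", family_home),
                       ("empty flat", empty_flat)]:
    print(name, "->", classify_dwelling(readings))
```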
Grid Structure in Sweden
Estonian Data Structure
Estonian data structure: 4 main tables
Metering data – main table with hourly consumption
Metering points – location
Agreements – contract info
Customers – contract holder information
Linking Tables to Establish Aggregates
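Linking the four tables could look roughly like the sketch below, which uses an in-memory SQLite database from Python's standard library. The table layout follows the four tables named on the previous slide, but every column name, key and value is an assumption made for illustration; the actual Estonian schema is not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns; only the four table names come from the slide.
CREATE TABLE customers       (customer_id INTEGER PRIMARY KEY, customer_type TEXT);
CREATE TABLE metering_points (point_id INTEGER PRIMARY KEY, county TEXT);
CREATE TABLE agreements      (agreement_id INTEGER PRIMARY KEY,
                              point_id INTEGER, customer_id INTEGER);
CREATE TABLE metering_data   (point_id INTEGER, hour TEXT, kwh REAL);

INSERT INTO customers VALUES (1, 'household'), (2, 'business');
INSERT INTO metering_points VALUES (10, 'Harju'), (11, 'Tartu');
INSERT INTO agreements VALUES (100, 10, 1), (101, 11, 2);
INSERT INTO metering_data VALUES
  (10, '2016-01-01T00', 0.5), (10, '2016-01-01T01', 0.25),
  (11, '2016-01-01T00', 4.0), (11, '2016-01-01T01', 3.5);
""")

# Hourly readings -> metering point (location) -> agreement -> customer,
# then aggregate consumption by county and customer type.
rows = conn.execute("""
SELECT p.county, c.customer_type, SUM(d.kwh) AS total_kwh
FROM metering_data d
JOIN metering_points p ON p.point_id = d.point_id
JOIN agreements a      ON a.point_id = d.point_id
JOIN customers c       ON c.customer_id = a.customer_id
GROUP BY p.county, c.customer_type
ORDER BY p.county
""").fetchall()

for county, customer_type, total_kwh in rows:
    print(county, customer_type, total_kwh)
```

In practice a metering point may have several agreements over time, so the join would also need validity dates; the sketch assumes exactly one agreement per point.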
WP 4: AIS Data
WP leader: Netherlands
Partners: Denmark, Greece, Norway, Poland
• Data access: data availability (in particular EMSA)
• Data handling: processing and storage, aimed at linking with data from port authorities, traffic analyses, journeys
• Methodology and techniques: for linking with data from port authorities and traffic analyses; estimate emissions
• Future perspectives: qualitative cost-benefit analysis
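As an illustration of how AIS position messages might be reduced to port visits and journeys before linking with port authority data, consider the sketch below. Everything in it is an assumption: the message layout, the vessel identifier, the approximate port coordinates, and the rule that a vessel is "in port" when it is within a fixed radius of a reference point. It is not the method used in the work package.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical port reference points (approximate coordinates) and radius.
PORTS = {"Rotterdam": (51.95, 4.14), "Gdansk": (54.40, 18.67)}
PORT_RADIUS_KM = 10.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def port_of(lat, lon):
    """Name of the port whose radius contains the position, else None."""
    for name, (plat, plon) in PORTS.items():
        if haversine_km(lat, lon, plat, plon) <= PORT_RADIUS_KM:
            return name
    return None


def port_visits(messages):
    """Collapse (mmsi, timestamp, lat, lon) messages into (mmsi, port, arrival)."""
    visits, last_port = [], {}
    for mmsi, ts, lat, lon in sorted(messages, key=lambda m: (m[0], m[1])):
        port = port_of(lat, lon)
        if port and last_port.get(mmsi) != port:
            visits.append((mmsi, port, ts))
        last_port[mmsi] = port
    return visits


def journeys(visits):
    """Pair consecutive visits of a vessel into (mmsi, origin, destination)."""
    result, previous = [], {}
    for mmsi, port, _ in visits:
        if mmsi in previous and previous[mmsi] != port:
            result.append((mmsi, previous[mmsi], port))
        previous[mmsi] = port
    return result


# Invented messages for one vessel (ISO timestamps sort chronologically).
messages = [
    (244000001, "2016-03-01T08:00", 51.95, 4.15),
    (244000001, "2016-03-01T09:00", 51.96, 4.13),
    (244000001, "2016-03-02T12:00", 55.00, 10.00),
    (244000001, "2016-03-03T18:00", 54.41, 18.66),
]
visits = port_visits(messages)
trips = journeys(visits)
print(visits)
print(trips)
```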
Data Received
https://maartenpouwels.carto.com/viz/8d319f16-8195-11e6-af04-0ecd1babdde5/public_map
Error Types and Causes
Possible Tools
WP 5: Mobile Phone Data
WP leader: Spain
Partners: Belgium, Finland, France, Germany, Italy, Netherlands, Romania, UK
• Data access: data availability (workshop with mobile network operators, MNOs)
• Data handling: investigation of IT tools and aggregation level needed
• Statistical outputs: describe a statistical output to present to an MNO as the basis for a pilot
Processing Steps of Mobile Phone Data
Daytime Population Based on Mobile Phone Data
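A daytime population count of this kind could, in its simplest form, be derived as in the sketch below. The event layout, the definition of daytime (09:00 to 16:59) and the rule of assigning each device to its most frequently observed daytime area are assumptions made for illustration. The result is a count of devices, not of persons; turning it into a population estimate would require further weighting that is not shown.

```python
from collections import Counter, defaultdict

DAYTIME_HOURS = range(9, 17)  # assumed definition of "daytime": 09:00-16:59


def daytime_population(events):
    """Count devices per area; events are (device_id, hour_of_day, area_code)."""
    seen = defaultdict(Counter)  # device -> areas observed during daytime
    for device, hour, area in events:
        if hour in DAYTIME_HOURS:
            seen[device][area] += 1
    # Assign each device to its most frequently observed daytime area.
    population = Counter()
    for device, areas in seen.items():
        population[areas.most_common(1)[0][0]] += 1
    return dict(population)


# Invented, already pseudonymised events.
events = [
    ("d1", 10, "A"), ("d1", 11, "A"), ("d1", 22, "B"),
    ("d2", 9, "B"), ("d2", 14, "B"),
    ("d3", 3, "A"),
    ("d4", 12, "A"), ("d4", 13, "B"), ("d4", 15, "A"),
]
counts = daytime_population(events)
print(counts)
```

Device d3 is only seen at night and therefore does not contribute to any area.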
WP 6: Early Estimates
WP leader: Slovenia
Partners: Finland, Italy, Netherlands, Poland, Portugal
• Data access: sources for consumer confidence index, nowcasts of turnover and early estimates
• Data handling: technical requirements; deployment of collection system
• Methodology and techniques: includes feasibility of linking administrative and other existing sources
• Future perspectives: calculation of the consumer confidence index and nowcasts of turnover; pilots for combining sources for early estimates
Domains of First Interest
• Tourism
• Population mobility
• Health statistics
• Agriculture
• Quick and dirty statistics (all domains)
• Economic indicators:
  • GDP
  • Consumer Price Index (CPI)
  • Retail sales
  • Balance of Payments (BoP)
  • Economic sentiment indicators
Early Estimate of GDP vs Official Release
WP 7: Multi Domains
WP leader: Poland
Partners: Netherlands, Portugal, UK
• Data access: data availability (inventory, based on questionnaire), aimed at three domains (populations, tourism / border crossings, agriculture)
• Data feasibility: exploration of combining sources for these domains
• Data combination: experiments
• Future perspectives: suggest pilots for 2018
Combining Sources
may enrich statistical output in domains:
Big Data sources
Administrative data
Statistical data
Model for Daily Life Satisfaction
Twitter data
Tweepy
Sklearn
Training Dataset
Machine Learning algorithm
Data extracting
Predictive model
Labels
Feature vectors
Result set
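The pipeline named on this slide (training dataset, feature vectors, labels, machine learning algorithm, predictive model, result set) can be sketched end to end. The slide names Tweepy and Sklearn; to stay dependency-free, the sketch below swaps in a hard-coded list of invented texts instead of tweets collected with Tweepy, and a small hand-written multinomial naive Bayes classifier instead of a scikit-learn estimator. The labels and texts are assumptions, and the classifier actually used in the pilot is not stated in the slides.

```python
import math
from collections import Counter, defaultdict

# Invented training texts and labels (stand-in for labelled tweets).
TRAINING = [
    ("what a lovely sunny day with friends", "satisfied"),
    ("so happy with my new job", "satisfied"),
    ("great dinner great company", "satisfied"),
    ("stuck in traffic again awful day", "dissatisfied"),
    ("so tired and sad today", "dissatisfied"),
    ("terrible service never again", "dissatisfied"),
]


def features(text):
    """Feature vector: bag-of-words counts."""
    return Counter(text.lower().split())


def train(examples):
    """Fit word frequencies per label and return the model components."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    for text, label in examples:
        word_counts[label].update(features(text))
        label_counts[label] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word, count in features(text).items():
            if word in vocabulary:  # ignore words never seen in training
                # Laplace (add-one) smoothing
                p = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
                score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
new_texts = ["happy day with friends", "awful traffic"]
result_set = [predict(model, t) for t in new_texts]
print(list(zip(new_texts, result_set)))
```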
WP 8: Methodology, Quality and IT
WP leader: Netherlands
Partners: Austria, Bulgaria, Italy, Poland, Portugal, Slovenia
• Literature overview
• Quality of Big Data
• Big Data and IT
• Big Data methodology
Main Aspects Identified
Quality                    IT                                Methodology
coverage                   metadata management               assessing accuracy
comparability              processing life cycle             final product definition
processing errors          format of processing              spatial dimension
chain control              datahub                           changes in data sources
linkability                data source integration           machine learning
measurement errors         infrastructure                    data linkage
model errors; precision    secure and tested APIs            multi-party computation
                           shared libraries; standards       inference
                           data lakes                        sampling
                           training, skills and knowledge    data process architecture
                           speed of algorithms               unit identification
Acknowledgements
Anke Consten
Piet Daas
Marc Debusschere
Maiki Ilves
Boro Nikic
Anna Nowicka
David Salgado
Monica Scannapieco
Nigel Swier
Where are we going from here?
Plans (1)
Implementation
• Online job vacancies
• Enterprise characteristics
• Electricity and energy consumption
• Waterways and environmental statistics
Plans (2)
New pilot projects
• Financial transactions data
• Remote sensing
• Mobile network operator data
• Innovative sources and methods for tourism statistics
Plans (3)
Trusted Smart Statistics
• Citizen science data, smart cities, connected vehicles, etc.
Interest in the New ESSnet
Conclusions
• Approach very successful
• Increased ambitions for coming years
  - Implement results obtained so far
  - Start with trusted smart statistics
• Challenges
  - Data access, privacy, methods, implementation, etc.
  - The ESS dimension
• Support and commitment
  - High interest in participation
  - Commitment at all levels
  - Recognition of relevance
Questions?
https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata
Thank you for your attention!
p.struijs@cbs.nl
