SlideShare una empresa de Scribd logo
1 de 58
Visualising the Australian open data and research data landscape
LAND AND WATER
Jonathan Yu, Simon Cox, Benjamin Leighton, Hendra Wijaya, Qifeng Bai
C3DIS Melbourne, May 2018
Context
“Big data”
Jonathan Yu2 |
Focus: transparency Focus: reproducibility
and access
Research questions
Understand the current state of the data landscape in
Australia and measure the complexity.
Survey Questions:
• How much data is available?
• How varied?
• Are they interoperable? Metadata and keywords used
Implications
Jonathan Yu3 |
May 2017
http://global.survey.okfn.org/https://www.cio.com.au/article/618858/
australia-tops-global-open-data-index/
OKFN Global Open Data Index survey 2016
Jonathan Yu4 |
What about quality of
life indicators…
What about
environmental
impacts?
What about research
data?
Jonathan Yu5 |
Jonathan Yu6 |
http://www.opendata500.com/au/
Qualitative approach
Doesn’t report on
volumes, formats,
trends
Usage side
7 |
v
Citizens Research Industry Government
Information ‘prosumer’ community
v
Existing climate (and other) prediction
information infrastructure e.g. ACCESS, CCIA
Data ecosystem in Australia
Academies and research
centres (CRCs, NESP hubs,
CoEs)
ECOcloud
Slide courtesy of Paul Box (CSIRO)
Research
Infrastructures
Government
Data Infrastructures
… Need a more holistic
picture than what
is promoted in the
mainstream
Jonathan Yu
Jonathan Yu 8 |
http://kn.csiro.au
URIs/HTTP Identifiers for everything…
Open Data Search across Gov + Research
Indexing:
137,441 datasets
174,809 geo-features
In-development:
Python and R libraries for in-situ analysis
CSIRO Knowledge Network
Web portal
Dataset sources list
Search example
Open Data Survey Methodology
1. Query – based on Knowledge Network indexes for format types,
and keywords/tags/subjects against known metadata formats –
CKAN, GeoNetwork, Socrata, CSIRO DAP
2. Fetch file sizes - via web crawler over every resource URL
3. Query special cases - size and file resources – CSIRO DAP, known
THREDDS servers (CSIRO, NCI, AODN/IMOS, TPAC, TERN OzFlux)
4. Aggregate and visualise – source, formats, format types*,
keywords
Jonathan Yu9 |
Repositories considered
Open Gov Open Research CKAN THREDDS Socrata WS / Other File size info
data-gov-au
data-nsw-gov-au
data-qld-gov-au
data-sa-gov-au
data-wa-gov-au
nsw-oeh
nsw-seed
www-data-vic-gov-au
Yes Some Yes Contained some Yes
AODN
CSIRO THREDDS
NCI
TERN ozflux
TPAC
Yes Yes Contained some Yes
CSIRO DAP Yes Contained some Yes
ALA, AURIN, TERN Yes Yes
GA Yes Yes
data.act, data.melbourne Yes Yes
Jonathan Yu10 |
Results
(October 2017)
Datasets/collections: 125,819
Files crawled: 10,083,968 (~946 TB)
File Formats: 1,548 unique formats
Keywords found: 667,965 (25,000 unique)
Jonathan Yu11 |
Counts - #of datasets per source
Frequency - # datasets per source published over time
Variation by format and volume (counts of datasets)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Variation by format and volume (counts of datasets)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Quite a bit of CSVs
Variation by format and volume (counts of datasets)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Look here for GIS data
Variation by format and volume (counts of datasets)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Structured data prolific
Variation by format and volume (counts of datasets)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Lots of APIs!
Variation by format and volume (counts of datasets)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Compressed files
popular too
Volume - How much data?
Volume - How much data? Open Gov Data
https://bl.ocks.org/jyucsiro/227d
cf04d0b30af3f09435f4d9ef9739
Aggregated size by
Repository source
Gigabytes
Open Gov Data – 1.7 TB
CSIRO DAP – 672.3 tb
Total THREDDS - 272.9 tb
Volume - How much data? Open Research data + Open Gov Data
Terabytes
Open Gov Data – 1.7 TB
Volume - How much data? Open Research data + Open Gov Data
Terabytes Astronomy
data
Volume - How much data? Open Research data + Open Gov Data
Open Gov Data – 1.7 tb
Total THREDDS - 272.9 tb
CSIRO DAP* – 74.1 tb
* Removed
astronomy
data
Terabytes
Open research data vs Open Gov Data
(Factor of 700)
Terabytes
Variation by format and volume (bytes)
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
0 1mb 100mb 1gb 50gb 100gb 500gb 1tb >100tb
Variation – Keywords…
Jonathan Yu27 |
0 5000 10000 15000 20000
Earth Sciences
Oceans
GA Publication
Ocean Temperature
Water Temperature
Top 5 CKAN tags
0 50 100 150 200 250 300 350 400
environment
EARTH SCIENCES
National Computational Infrastructure (NCI)
climatologyMeteorologyAtmosphere
ATMOSPHERIC SCIENCES
Top 5 GeoNetwork subjects
Keywords found: 667,965 - 25,000 unique
Peeking into the content
Variation - Keyword co-occurrences
Variation - Keyword co-occurrences
Theme = ‘environment’, n>10
Missing link due to ‘weak semantics’
Variation - Keyword co-occurrences
Theme = ‘environment’, n>50
Variation - Keyword correlations (co-efficient >= .7)
Keyword correlations – Open Gov (co-efficient >= .9)
Keyword correlations – Research data (co-efficient >= .9)
Open Gov
Open Research
Open Data Ecosystem
data.vic.gov.au
NSW OEH
…
ALA
AURIN
…
Open Gov
Open Research
Open Data Ecosystem
data.vic.gov.au
NSW OEH
…
ALA
AURIN
…
Open Data
content signatures?
Keyword correlations – data.vic.gov.au (p >= .6)
Keyword correlations – AURIN (p >= .6)
Keyword correlations – ALA (p >= .6)
Keyword correlations – NSW OEH (p >= .6)
Keyword correlations – CSIRO DAP (p >= .6)
Variation - Keyword coccurences and correlations
Entirely automated from metadata
catalogues.
Intent was to survey but this correlation data
could have interesting applications…
• “Content signatures” – summary, compare
• Keyword suggestions
• Trends over time
• (Semi-)automated metadata generation
Better if keywords were stronger semantic
tags (e.g. controlled vocabularies)…
Reflections on visualisations
Some great tooling in data analytics space - particularly R,
to produce visualisations and exploring data
Opportunities
• Regular updates
• UI elements (like sliders) to filter and select ranges
e.g. Keyword analysis, Subset Organisations, Date
ranges
Jonathan Yu44 |
Reflections on survey and future work
First cut quantitative survey across open data ecosystem –
government and research.
Methodology could be equally applied to internal data repositories.
Provider-centric perspective – we need usage/reuse perspectives!
Future work
• Allow better filtering (e.g. on source, clusters) and address gaps
• Unify qualitative and quantitative – provider and user perspectives
• Annual national “State-of-Open-Data” analysis valuable?
Jonathan Yu45 |
Summary & Take Home Messages
Australia is ranked high on the open data scale but we lack
consistent approach to quantify the data landscape
Presented a proof-of-concept, 1st-cut quantitative survey of
open data in Australia
“Open Data” needs to include Open Research Data
• Shown volume and velocity beyond just the Aus Gov Data
Future work is needed to gain better insight into the variety
of data being published and tools for understanding how the
data is actually used as that is not well understood.
Jonathan Yu46 |
Jonathan Yu
Research Data Scientist
Jonathan.Yu@csiro.au
@jonyu6
@jyucsiro
Thank you
Link to data visualisations:
https://gist.github.com/jyucsiro/3bea05a8977
0b9651e97f3f57d7eb332
# datasets per source timeseries
# datasets per source timeseries 2015 - current
# datasets per source timeseries 2015 - current
# datasets per source timeseries 2015 – current middle tier
How varied? (Formats) – Open Gov Data
Counts of resources aggregated
by format type
Variation by format and volume (bytes) – exclude CSIRO DAP
0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Variation by format and volume (bytes) – exclude CSIRO DAP
0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Large volumes of structured data
in THREDDS
Variation by format and volume (bytes) – exclude CSIRO DAP
0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Large volumes of compressed files
in these Open Gov Data
Variation by format and volume (bytes) – exclude CSIRO DAP
0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb
API – Web services and APIs
code – Software code
z – Zipped or compressed files
db – Database dumps
doc – documents
gis – Geospatial files
media – Images, videos and other media
otr – Other
str – Structured data (XML, JSON, netCDF)
csv – Tabular e.g. CSV and Excel files
web – Web documents e.g HTML
Documents can amount to
large volumes too!
Variation - Format co-occurrences
Jonathan Yu57 |
Variation - Formats correlation
Jonathan Yu58 |

Más contenido relacionado

La actualidad más candente

Web 3 Mark Greaves
Web 3 Mark GreavesWeb 3 Mark Greaves
Web 3 Mark GreavesMediabistro
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemNIT Durgapur
 
Question answering in linked data
Question answering in linked dataQuestion answering in linked data
Question answering in linked dataReza Ramezani
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Andre Freitas
 
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...andimou
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Qualityandimou
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"Fabien Gandon
 
Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesXiaogang (Marshall) Ma
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogsandrea huang
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
Wimmics Overview 2021
Wimmics Overview 2021Wimmics Overview 2021
Wimmics Overview 2021Fabien Gandon
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1manujam
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesDr.-Ing. Thomas Hartmann
 
DBpedia Mappings Quality Assessment
DBpedia Mappings Quality AssessmentDBpedia Mappings Quality Assessment
DBpedia Mappings Quality Assessmentandimou
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 

La actualidad más candente (20)

Web 3 Mark Greaves
Web 3 Mark GreavesWeb 3 Mark Greaves
Web 3 Mark Greaves
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management System
 
Question answering in linked data
Question answering in linked dataQuestion answering in linked data
Question answering in linked data
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Quality
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 
Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental Sciences
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
Wimmics Overview 2021
Wimmics Overview 2021Wimmics Overview 2021
Wimmics Overview 2021
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
 
semantic web & natural language
semantic web & natural languagesemantic web & natural language
semantic web & natural language
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
DBpedia Mappings Quality Assessment
DBpedia Mappings Quality AssessmentDBpedia Mappings Quality Assessment
DBpedia Mappings Quality Assessment
 
General Introduction for Semantic Web and Linked Open Data
General Introduction for Semantic Web and Linked Open DataGeneral Introduction for Semantic Web and Linked Open Data
General Introduction for Semantic Web and Linked Open Data
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 

Similar a Visualising the Australian open data and research data landscape

TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'TERN Australia
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessdatacite
 
How to expose research data in EOSC
How to expose research data in EOSCHow to expose research data in EOSC
How to expose research data in EOSCEUDAT
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVEUDAT
 
Alive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values StudyAlive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values StudyCESSDA Training
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...National Institute of Informatics (NII)
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of InformationAdrian Paschke
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research dataHerbert Gruttemeier
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactElena Simperl
 
Presentation of LUCERO at EURECOM
Presentation of LUCERO at EURECOMPresentation of LUCERO at EURECOM
Presentation of LUCERO at EURECOMMathieu d'Aquin
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsAaron Collie
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017Jongwook Woo
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupalemmanuel_jamin
 

Similar a Visualising the Australian open data and research data landscape (20)

TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information access
 
How to expose research data in EOSC
How to expose research data in EOSCHow to expose research data in EOSC
How to expose research data in EOSC
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Sailing on the ocean of 1s and 0s
Sailing on the ocean of 1s and 0sSailing on the ocean of 1s and 0s
Sailing on the ocean of 1s and 0s
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROV
 
Brislinger, Recker: Keeping data re-usable in the evs
Brislinger, Recker: Keeping data re-usable in the evsBrislinger, Recker: Keeping data re-usable in the evs
Brislinger, Recker: Keeping data re-usable in the evs
 
Alive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values StudyAlive and kicking! Keeping data re-usable in the European Values Study
Alive and kicking! Keeping data re-usable in the European Values Study
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Sharing data
Sharing dataSharing data
Sharing data
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of Information
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research data
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
 
Presentation of LUCERO at EURECOM
Presentation of LUCERO at EURECOMPresentation of LUCERO at EURECOM
Presentation of LUCERO at EURECOM
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 

Último

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 

Último (20)

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

Visualising the Australian open data and research data landscape

  • 1. Visualising the Australian open data and research data landscape LAND AND WATER Jonathan Yu, Simon Cox, Benjamin Leighton, Hendra Wijaya, Qifeng Bai C3DIS Melbourne, May 2018
  • 2. Context “Big data” Jonathan Yu2 | Focus: transparency Focus: reproducibility and access
  • 3. Research questions Understand the current state of the data landscape in Australia and measure the complexity. Survey Questions: • How much data is available? • How varied? • Are they interoperable? Metadata and keywords used Implications Jonathan Yu3 |
  • 5. What about quality of life indicators… What about environmental impacts? What about research data? Jonathan Yu5 |
  • 6. Jonathan Yu6 | http://www.opendata500.com/au/ Qualitative approach Doesn’t report on volumes, formats, trends Usage side
  • 7. 7 | v Citizens Research Industry Government Information ‘prosumer’ community v Existing climate (and other) prediction information infrastructure e.g. ACCESS, CCIA Data ecosystem in Australia Academies and research centres (CRCs, NESP hubs, CoEs) ECOcloud Slide courtesy of Paul Box (CSIRO) Research Infrastructures Government Data Infrastructures … Need a more holistic picture than what is promoted in the mainstream Jonathan Yu
  • 8. Jonathan Yu 8 | http://kn.csiro.au URIs/HTTP Identifiers for everything… Open Data Search across Gov + Research Indexing: 137,441 datasets 174,809 geo-features In-development: Python and R libraries for in-situ analysis CSIRO Knowledge Network Web portal Dataset sources list Search example
  • 9. Open Data Survey Methodology 1. Query – based on Knowledge Network indexes for format types, and keywords/tags/subjects against known metadata formats – CKAN, GeoNetwork, Socrata, CSIRO DAP 2. Fetch file sizes - via web crawler over every resource URL 3. Query special cases - size and file resources – CSIRO DAP, known THREDDS servers (CSIRO, NCI, AODN/IMOS, TPAC, TERN OzFlux) 4. Aggregate and visualise – source, formats, format types*, keywords Jonathan Yu9 |
  • 10. Repositories considered Open Gov Open Research CKAN THREDDS Socrata WS / Other File size info data-gov-au data-nsw-gov-au data-qld-gov-au data-sa-gov-au data-wa-gov-au nsw-oeh nsw-seed www-data-vic-gov-au Yes Some Yes Contained some Yes AODN CSIRO THREDDS NCI TERN ozflux TPAC Yes Yes Contained some Yes CSIRO DAP Yes Contained some Yes ALA, AURIN, TERN Yes Yes GA Yes Yes data.act, data.melbourne Yes Yes Jonathan Yu10 |
  • 11. Results (October 2017) Datasets/collections: 125,819 Files crawled: 10,083,968 (~946 TB) File Formats: 1,548 unique formats Keywords found: 667,965 (25,000 unique) Jonathan Yu11 |
  • 12. Counts - #of datasets per source
  • 13. Frequency - # datasets per source published over time
  • 14. Variation by format and volume (counts of datasets) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML
  • 15. Variation by format and volume (counts of datasets) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Quite a bit of CSVs
  • 16. Variation by format and volume (counts of datasets) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Look here for GIS data
  • 17. Variation by format and volume (counts of datasets) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Structured data prolific
  • 18. Variation by format and volume (counts of datasets) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Lots of APIs!
  • 19. Variation by format and volume (counts of datasets) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Compressed files popular too
  • 20. Volume - How much data?
  • 21. Volume - How much data? Open Gov Data https://bl.ocks.org/jyucsiro/227d cf04d0b30af3f09435f4d9ef9739 Aggregated size by Repository source Gigabytes
  • 22. Open Gov Data – 1.7 TB CSIRO DAP – 672.3 tb Total THREDDS - 272.9 tb Volume - How much data? Open Research data + Open Gov Data Terabytes
  • 23. Open Gov Data – 1.7 TB Volume - How much data? Open Research data + Open Gov Data Terabytes Astronomy data
  • 24. Volume - How much data? Open Research data + Open Gov Data Open Gov Data – 1.7 tb Total THREDDS - 272.9 tb CSIRO DAP* – 74.1 tb * Removed astronomy data Terabytes
  • 25. Open research data vs Open Gov Data (Factor of 700) Terabytes
  • 26. Variation by format and volume (bytes) API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML 0 1mb 100mb 1gb 50gb 100gb 500gb 1tb >100tb
  • 27. Variation – Keywords… Jonathan Yu27 | 0 5000 10000 15000 20000 Earth Sciences Oceans GA Publication Ocean Temperature Water Temperature Top 5 CKAN tags 0 50 100 150 200 250 300 350 400 environment EARTH SCIENCES National Computational Infrastructure (NCI) climatologyMeteorologyAtmosphere ATMOSPHERIC SCIENCES Top 5 GeoNetwork subjects Keywords found: 667,965 - 25,000 unique Peeking into the content
  • 28. Variation - Keyword co-occurrences
  • 29. Variation - Keyword co-occurrences Theme = ‘environment’, n>10
  • 30.
  • 31. Missing link due to ‘weak semantics’
  • 32. Variation - Keyword co-occurrences Theme = ‘environment’, n>50
  • 33. Variation - Keyword correlations (co-efficient >= .7)
  • 34. Keyword correlations – Open Gov (co-efficient >= .9)
  • 35. Keyword correlations – Research data (co-efficient >= .9)
  • 36. Open Gov Open Research Open Data Ecosystem data.vic.gov.au NSW OEH … ALA AURIN …
  • 37. Open Gov Open Research Open Data Ecosystem data.vic.gov.au NSW OEH … ALA AURIN … Open Data content signatures?
  • 38. Keyword correlations – data.vic.gov.au (p >= .6)
  • 39. Keyword correlations – AURIN (p >= .6)
  • 40. Keyword correlations – ALA (p >= .6)
  • 41. Keyword correlations – NSW OEH (p >= .6)
  • 42. Keyword correlations – CSIRO DAP (p >= .6)
  • 43. Variation - Keyword coccurences and correlations Entirely automated from metadata catalogues. Intent was to survey but this correlation data could have interesting applications… • “Content signatures” – summary, compare • Keyword suggestions • Trends over time • (Semi-)automated metadata generation Better if keywords were stronger semantic tags (e.g. controlled vocabularies)…
  • 44. Reflections on visualisations Some great tooling in data analytics space - particularly R, to produce visualisations and exploring data Opportunities • Regular updates • UI elements (like sliders) to filter and select ranges e.g. Keyword analysis, Subset Organisations, Date ranges Jonathan Yu44 |
  • 45. Reflections on survey and future work First cut quantitative survey across open data ecosystem – government and research. Methodology could be equally applied to internal data repositories. Provider-centric perspective – we need usage/reuse perspectives! Future work • Allow better filtering (e.g. on source, clusters) and address gaps • Unify qualitative and quantitative – provider and user perspectives • Annual national “State-of-Open-Data” analysis valuable? Jonathan Yu45 |
  • 46. Summary & Take Home Messages Australia is ranked high on the open data scale but we lack consistent approach to quantify the data landscape Presented a proof-of-concept, 1st-cut quantitative survey of open data in Australia “Open Data” needs to include Open Research Data • Shown volume and velocity beyond just the Aus Gov Data Future work is needed to gain better insight into the variety of data being published and tools for understanding how the data is actually used as that is not well understood. Jonathan Yu46 |
  • 47. Jonathan Yu Research Data Scientist Jonathan.Yu@csiro.au @jonyu6 @jyucsiro Thank you Link to data visualisations: https://gist.github.com/jyucsiro/3bea05a8977 0b9651e97f3f57d7eb332
  • 48. # datasets per source timeseries
  • 49. # datasets per source timeseries 2015 - current
  • 50. # datasets per source timeseries 2015 - current
  • 51. # datasets per source timeseries 2015 – current middle tier
  • 52. How varied? (Formats) – Open Gov Data Counts of resources aggregated by format type
  • 53. Variation by format and volume (bytes) – exclude CSIRO DAP 0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML
  • 54. Variation by format and volume (bytes) – exclude CSIRO DAP 0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Large volumes of structured data in THREDDS
  • 55. Variation by format and volume (bytes) – exclude CSIRO DAP 0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Large volumes of compressed files in these Open Gov Data
  • 56. Variation by format and volume (bytes) – exclude CSIRO DAP 0 1mb 100mb 1gb 50gb 100gb 250gb 500gb 1tb API – Web services and APIs code – Software code z – Zipped or compressed files db – Database dumps doc – documents gis – Geospatial files media – Images, videos and other media otr – Other str – Structured data (XML, JSON, netCDF) csv – Tabular e.g. CSV and Excel files web – Web documents e.g HTML Documents can amount to large volumes too!
  • 57. Variation - Format co-occurrences Jonathan Yu57 |
  • 58. Variation - Formats correlation Jonathan Yu58 |

Notas del editor

  1. “Open Data” seems to be emphasised as Open Gov Data, whereas it ignores the research infrastructures in Australia and data collected by NCRIS, Universities, Research institutions (e.g. CSIRO), etc. which are often open research data. Data ecosystems are complex and analogous to natural ecosystems Individual > populations > communities within ecosystems - diverse, dynamic and adaptive - Each ecosystem has its own distinct ‘context set of conditions – combination of constituent parts – culture, behaviours interactions system installed base 2. Data flows through this ecosystem like food webs e.g. biodiversity data - data collected by many agencies - consumed aggregated and processed to support others in the ecosystem Consumers bear high cost - aggregation and integration as data is diverse Fragile supply chains Need data governance – PC NDC Agreements including standards around data are necessary to enable data to flow Rnet – academic and research internet provider (telecoms networks layers) – role in connecting research sensors in Research IoT