Finding Data Sets
1. Finding Data Sets
Anja Jentzsch, Freie Universität Berlin
17 April 2012
Tutorial: Practical Cross-Dataset Queries on the Web of Data
WWW2012, Lyon, France
2. Different motivations
• Finding data sets
• Look for resources to link a data set to
• Find a data set with relevant data to consume / integrate
• Finding vocabularies
• Find vocabularies to use to model data sets
• Find vocabularies to map your existing schema to
3. Different tool types
• Search engines
• find data sets based on keywords
• Data catalogs / directories
• explore data sets and faceted search
• Data Marketplaces
• explore and consume data sets
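Faceted search, as offered by data catalogs, can be sketched in a few lines. This is a minimal illustration only: the records in `DATASETS`, the field names, and the helper functions `facet_counts` and `filter_by` are hypothetical and do not come from any particular catalog.

```python
from collections import Counter

# Hypothetical catalog records; real catalogs carry far richer metadata.
DATASETS = [
    {"name": "dataset-a", "format": "RDF", "topic": "geography"},
    {"name": "dataset-b", "format": "CSV", "topic": "government"},
    {"name": "dataset-c", "format": "RDF", "topic": "government"},
]


def facet_counts(datasets, field):
    """Count how many data sets fall under each value of one facet."""
    return Counter(d[field] for d in datasets if field in d)


def filter_by(datasets, **facets):
    """Keep only the data sets that match every selected facet value."""
    return [
        d for d in datasets
        if all(d.get(key) == value for key, value in facets.items())
    ]


print(facet_counts(DATASETS, "format"))
print([d["name"] for d in filter_by(DATASETS, format="RDF", topic="government")])
```

A catalog UI typically shows the counts next to each facet value and narrows the result list each time the user selects one.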
4. Linked Data Search Engines
• The description of each resource is published as a document in RDF
• RDF search engines index the RDF documents
• Process similar to that of search engines for HTML documents
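The indexing step can be sketched as a small inverted index, much like a keyword index over HTML pages. This is a simplified illustration, not how any specific Linked Data search engine works: the triples in `TRIPLES` are hypothetical, anything that is not an HTTP URI is treated as a literal, and the helper names are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) standing in for crawled RDF documents.
TRIPLES = [
    ("http://example.org/resource/Berlin",
     "http://www.w3.org/2000/01/rdf-schema#label", "Berlin"),
    ("http://example.org/resource/Berlin",
     "http://example.org/ontology/country", "http://example.org/resource/Germany"),
    ("http://example.org/resource/Lyon",
     "http://www.w3.org/2000/01/rdf-schema#label", "Lyon"),
    ("http://example.org/resource/Lyon",
     "http://www.w3.org/2000/01/rdf-schema#comment", "City in France"),
]


def is_literal(value):
    """Assumption for this sketch: anything that is not an HTTP(S) URI is a literal."""
    return not value.startswith(("http://", "https://"))


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(triples):
    """Map each keyword found in a literal to the subjects it describes."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        if is_literal(obj):
            for token in tokenize(obj):
                index[token].add(subject)
    return index


def search(index, query):
    """Return the resources whose literals contain every keyword of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(TRIPLES)
print(search(index, "city france"))
```

Because every literal is indexed regardless of its predicate, a keyword search returns any resource that mentions the term, which is useful for finding link targets but noisy for other tasks.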
13. Suitability
• Look for resources to link a data set to
• Good
• Find a data set with relevant data to consume
• Maybe good: depends on how the query is expressed
• Find vocabularies to use to model data sets
• Not good: everything is indexed, too much noise
14. Data catalogs
• Several governments and institutions are opening their catalogs
• http://datacatalogs.org provides a manually curated index of 226 data catalogs
17. The Data Hub
• Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets
• Various metadata for each data set
• Other views over (part of) its content
• Semantic CKAN (http://semantic.ckan.net)
• LATC Data Source Inventory
• LOD Cloud
• State of the LOD Cloud
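A CKAN-based catalog can usually also be queried programmatically. The sketch below assumes the catalog exposes CKAN's action API with its `package_search` action; the base URL `https://catalog.example.org` is a placeholder, and the helper names are hypothetical. It only builds the request URL and parses a response body, so it makes no network call itself.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a package_search request URL (assumes a CKAN action API is available)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract data set names from a package_search JSON response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [package["name"] for package in payload["result"]["results"]]


# Placeholder base URL; substitute the catalog you actually want to query.
print(package_search_url("https://catalog.example.org", "linked data", rows=5))
```

The returned names can then be used to fetch the full metadata record of each data set from the same catalog.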
24. Data Marketplaces
• “Services that make it easy to find data from a range of secondary data sources,
then consume or acquire the data in a usable and unified format. Several of these
services are trying to create marketplaces for data, envisioning that data providers
can offer their data sets for sale to data seekers.” (http://datamarket.com)
25. Kasabi
• Data domain
• All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
• Data population
• Public datasets
• User submitted datasets
• Data size
• 186 data sets
• Data model
• RDF
27. Freebase
• Metaweb (USA), now Google
• Free for 100K read API calls per day (10K write), paid for higher volumes
• Data access
• REST API
• Linked Data endpoint (http://rdf.freebase.com)
• Triple uploader / RDF dumps
• Data tools
• Web based – schema editor, review queue, viewers, …
• GridWorks (Google Refine)
• Exploring, data cleaning, transformation of tabular data
• Map data to Freebase schema & RDF export (3rd party extension)