SlideShare una empresa de Scribd logo
1 de 25
“Open Data Web” –
A Linked Open Data
Repository Built with CKAN
Cheng-Jen Lee
Andrea Wei-Ching Huang
Tyng-Ruey Chuang
Institute of Information Science, Academia Sinica, Taiwan
CKANCon 2016@Madrid
2016/10/04
Outline
• Data Source
• Linked Data
• From Archive Catalog to Linked Data
• Linked Open Data Repository: Open Data Web
• System Architecture
• Implementation
• Limitations
• Future Work
2
Data Source
• Union Catalog of Digital Archives Taiwan
• http://catalog.digitalarchives.tw
• Web catalog for digitized archives in 14 domains
from many institutions.
• Part of the catalog is released under CC licenses
• About 840,000 catalog records.
• Free to copy and redistribute.
• Represent resources in a linked data format
• Provide semantic query for time, place, object, etc.
• Enrich resources by linking them to third-party datasets.
3
Linked Data
• Linked Data (from Wikipedia)
• A method of publishing structured data.
• It can be interlinked and become more useful
through semantic queries.
• Linked Open Data is linked data that is open content.
• Mostly in the form of RDF.
• RDF (from W3C RDF 1.1 Primer)
• Resource Description Framework
• A framework for expressing information about resources.
• RDF can enrich a dataset by linking it to third-party datasets.
• Ex. Enrich a dataset about paintings by linking them to the
corresponding artists in Wikidata.
4
RDF Data Model
• A Triple: <subject> <predicate> <object>
• <Bob> <is a> <person>.
• <Bob> <is interested in> <the Mona Lisa>.
• <the Mona Lisa> <was created by> <Leonardo da Vinci>.
Source: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/#section-triple
5
From Archive Catalog to Linked
Data
• We converted archive catalog to two versions of linked data.
• Version D: triples with just Dublin Core descriptions from
the catalog
• D means Dublin Core
• Version R: mapping column values in the catalog to external
datasets (with domain vocabularies) to give enriched
semantics
• R means Refined
• Extract place names from "Coverage" column (dc:coverage) in the
catalog and map them to place IDs on geonames.org.
• Normalize values in "Date" column (dc:date) to ISO8601 format, or
map them to Wikidata IDs.
• Map titles of biology archives to entries on Encyclopedia of Life.
6
Archive
Catalog
XML&CSV
txn:hasEOLPage
<http://eol.org/pages/1134120> ;
--------------------------------------------
skos:editorialNote "採集日期" ;
dwc:eventDate "1993-04-25" ;
RDF-like
CSV
Step 1:
Mapping
column
values to
vocabularies
• "採集日期” means date collected in English.
Step 2:
Converting
CSV data to
linked data
Original Data
Results
After Vocabulary Mapping
Linked Data (RDF)
Title 台灣一葉蘭
Date::field 採集日期
Date 1993-04-25
txn:hasEOLPage eol:1134120
rdf:type schema:CreateAction
skos:editorialNote 採集日期
dwc:eventDate 1993-04-25
Vocabulary Mapping and Data
Conversion Python Scripts: https://gitlab.com/iislod/dat2ld
7
Linked Open Data Repository:
Open Data Web (ODW)
http://data.odw.tw
Ontology* for Open Data Web (Draft)
http://voc.odw.tw
* Definitions of the vocabularies used to describe objects in RDF.
8
Feature (1): Linked Data Browsing
Main Menu
Records: D version
Refined: R version (still uploading)
http://data.odw.tw/record/
9
Feature (1): Linked Data Browsing
http://data.odw.tw/record/
List of Resources
Filters
10
Feature (1): Linked Data Browsing
http://data.odw.tw/record/
Get D or R version of
the same resource
11
Example: “Girl Lost in Thought”
linked data
(triples)
http://data.odw.tw/record/d4502674
12
Example: “Girl Lost in Thought”
Export single
resource in linked
data format
http://data.odw.tw/record/d4502674
13
• Spatial indexing based on geo:lat and geo:long values.
Resources about
Tainan City
Feature (2): Spatial Query
14
• Temporal indexing based on dct:W3CDTF, xsd:date, and xsd:gYear values.
Resources in 19th century
Feature (3): Temporal Query
15
Feature (4): SPARQL Endpoint
http://data.odw.tw/sparql/ (For testing)
http://sparql.odw.tw/ (For machine access)
16
Feature (5):
Spatial
Representation
• Only for R version (still uploading).
• Only shows geonames
information in the gn:locatedIn
property.
http://data.odw.tw/r1/r1-r4502674
17
System
Architecture
SPARQL
Query Page
HTML
for individual
record
RDF
for individual
record
ckanext-scheming&
ckanext-repeating
template
ckanext-dcat
output profile
User
Access
individual
resource
SPARQL
(testing)
Computer
SPARQL
Linked Data
(Turtle format)
ImportHarvest
Icon made by SimpleIcon
(http://www.flaticon.com/aut
hors/simpleicon) and Freepik
(http://www.flaticon.com/aut
hors/freepik)
18
Implementation (1/3)
• Custom fields
• ckanext-scheming and ckanext-repeating extension
• Define CKAN custom fields for a data type in a JSON file
• Each data type has its own directory.
• Ex. record.json is for D ver. (http://data.odw.tw/record/)
• A field is defined by a JSON object, for example:
{
"field_name": "dc:format",
"label": "dc:format",
"display_property": "dc:format",
"preset": "repeating_text_modified”
},
19
Implementation (2/3)
• Import linked data
• ckanext-dcat extension for linked data import/export
• CKAN harvesting mechanism by ckanext-harvest extension
• Extend DCATRDFHarvester in ckanext.dcat.harvesters.rdf
• Extend RDFProfile in ckanext.dcat.profiles
• def parse_dataset(self, dataset_dict, dataset_ref):
• (Import) Parse dataset_ref from loaded linked data to CKAN’s
dataset_dict
• def graph_from_dataset(self, dataset_dict, dataset_ref):
• (Export) Generate a linked data graph dataset_ref from CKAN’s
dataset_dict
• Modify ckanext-dcat itself
• To support more namespace (ckanext-dcat is originally designed
for DCAT vocabularies.)
20
21
Implementation (3/3)
• Virtuoso SPARQL endpoint integration
• ckanext-sparql extension
• Spatial indexing and searching
• ckanext-spatial extension
• Time indexing and searching
• We developed the ckanext-tempsearch extension.
• Source code available on GitLab.
• https://gitlab.com/iislod/
22
Limitations
• Maintaining two triple stores (CKAN & Virtuoso).
• They may be inconsistent since we do not sync them for
now.
• Slow harvesting speed on CKAN.
• 4 hrs+ for harvesting 20,000 records on a Core i7-2600
3.4 GHz machine (still uploading now).
23
Future Work
• Provide native SPARQL queries in CKAN.
• Then we do not need Virtuoso anymore.
• Harvest multiple resources as a CKAN dataset
• To improve import speed.
• Time and place names mappings to third-party
datasets
• Still need further verifications.
24
Open Data Web (http://data.odw.tw)
E-mail: ask AT odw.tw
We welcome your valuable
comments & suggestions!
25
Acknowledgement: Hsin-Ping Chen (k26021409 AT gmail.com)
for processing geonames data.
Find me at @u10313335, http://about.me/SolLee, cjlee AT iis.sinica.edu.tw

Más contenido relacionado

La actualidad más candente

Building a Knowledge Graph using NLP and Ontologies
Building a Knowledge Graph using NLP and OntologiesBuilding a Knowledge Graph using NLP and Ontologies
Building a Knowledge Graph using NLP and OntologiesNeo4j
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sqlaftab alam
 
CIDOC CRM Tutorial
CIDOC CRM TutorialCIDOC CRM Tutorial
CIDOC CRM TutorialISLCCIFORTH
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Willy Lulciuc
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
Hackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data modelHackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data modelPascalDesmarets1
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenJeffrey T. Pollock
 
UKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tipsUKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tipsConnor McDonald
 
GPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphGPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphNeo4j
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportWes McKinney
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Elastic Search
Elastic SearchElastic Search
Elastic SearchNavule Rao
 
Achieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAAchieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAMarkus Michalewicz
 
SPARQL queries on CIDOC-CRM data of BritishMuseum
SPARQL queries on CIDOC-CRM data of BritishMuseumSPARQL queries on CIDOC-CRM data of BritishMuseum
SPARQL queries on CIDOC-CRM data of BritishMuseumThomas Francart
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframeJaemun Jung
 
Heat Map and Automatic Data Optimization with Oracle Database 12c
Heat Map and Automatic Data Optimization with Oracle Database 12cHeat Map and Automatic Data Optimization with Oracle Database 12c
Heat Map and Automatic Data Optimization with Oracle Database 12cDigicomp Academy Suisse Romande SA
 
Monitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with ZabbixMonitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with ZabbixGerger
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 

La actualidad más candente (20)

Building a Knowledge Graph using NLP and Ontologies
Building a Knowledge Graph using NLP and OntologiesBuilding a Knowledge Graph using NLP and Ontologies
Building a Knowledge Graph using NLP and Ontologies
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
CIDOC CRM Tutorial
CIDOC CRM TutorialCIDOC CRM Tutorial
CIDOC CRM Tutorial
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Hackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data modelHackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data model
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest Rakuten
 
UKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tipsUKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tips
 
GPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphGPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge Graph
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Achieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAAchieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAA
 
SPARQL queries on CIDOC-CRM data of BritishMuseum
SPARQL queries on CIDOC-CRM data of BritishMuseumSPARQL queries on CIDOC-CRM data of BritishMuseum
SPARQL queries on CIDOC-CRM data of BritishMuseum
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Heat Map and Automatic Data Optimization with Oracle Database 12c
Heat Map and Automatic Data Optimization with Oracle Database 12cHeat Map and Automatic Data Optimization with Oracle Database 12c
Heat Map and Automatic Data Optimization with Oracle Database 12c
 
Monitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with ZabbixMonitoring Oracle Database Instances with Zabbix
Monitoring Oracle Database Instances with Zabbix
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementation
 

Destacado

Linked Data at the German National Library
Linked Data at the German National LibraryLinked Data at the German National Library
Linked Data at the German National LibraryReinhold Heuvelmann
 
Ag Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and dataAg Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and dataCyndy Parr
 
Flagis linked open_data_stijn_goedertier
Flagis linked open_data_stijn_goedertierFlagis linked open_data_stijn_goedertier
Flagis linked open_data_stijn_goedertierFlagis VZW
 
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...Nikolaos Konstantinou
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Alasdair Gray
 

Destacado (6)

Linked Data at the German National Library
Linked Data at the German National LibraryLinked Data at the German National Library
Linked Data at the German National Library
 
Ag Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and dataAg Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and data
 
Ocd arc energy_20160427
Ocd arc energy_20160427Ocd arc energy_20160427
Ocd arc energy_20160427
 
Flagis linked open_data_stijn_goedertier
Flagis linked open_data_stijn_goedertierFlagis linked open_data_stijn_goedertier
Flagis linked open_data_stijn_goedertier
 
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
 

Similar a “Open Data Web” – A Linked Open Data Repository Built with CKAN

20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKANandrea huang
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogsandrea huang
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM4Science
 
20160922 Materials Data Facility TMS Webinar
20160922 Materials Data Facility TMS Webinar20160922 Materials Data Facility TMS Webinar
20160922 Materials Data Facility TMS WebinarBen Blaiszik
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And VisualizationIvan Ermilov
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersNew York University
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...Jennifer Bowen
 
Ontology-based multi-domain metadata for research data management using tripl...
Ontology-based multi-domain metadata for research data management using tripl...Ontology-based multi-domain metadata for research data management using tripl...
Ontology-based multi-domain metadata for research data management using tripl...João Rocha da Silva
 
Bingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationBingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationWARCnet
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Cataloguesdavid-read
 
Azure DocumentDB for Healthcare Integration
Azure DocumentDB for Healthcare IntegrationAzure DocumentDB for Healthcare Integration
Azure DocumentDB for Healthcare IntegrationBizTalk360
 
Research data catalogues and data interoperability in life sciences
Research data catalogues and data interoperability in life sciencesResearch data catalogues and data interoperability in life sciences
Research data catalogues and data interoperability in life sciencesBlue BRIDGE
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Cory Lampert
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive MetadataOCLC
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebPascal-Nicolas Becker
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Ricard de la Vega
 

Similar a “Open Data Web” – A Linked Open Data Repository Built with CKAN (20)

20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM
 
20160922 Materials Data Facility TMS Webinar
20160922 Materials Data Facility TMS Webinar20160922 Materials Data Facility TMS Webinar
20160922 Materials Data Facility TMS Webinar
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
 
Ontology-based multi-domain metadata for research data management using tripl...
Ontology-based multi-domain metadata for research data management using tripl...Ontology-based multi-domain metadata for research data management using tripl...
Ontology-based multi-domain metadata for research data management using tripl...
 
Bingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationBingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman Presentation
 
Azure DocumentDB
Azure DocumentDBAzure DocumentDB
Azure DocumentDB
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
 
Azure DocumentDB for Healthcare Integration
Azure DocumentDB for Healthcare IntegrationAzure DocumentDB for Healthcare Integration
Azure DocumentDB for Healthcare Integration
 
Research data catalogues and data interoperability in life sciences
Research data catalogues and data interoperability in life sciencesResearch data catalogues and data interoperability in life sciences
Research data catalogues and data interoperability in life sciences
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive Metadata
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic Web
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
 

Más de Chengjen Lee

Preserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsPreserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsChengjen Lee
 
Retooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioRetooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioChengjen Lee
 
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹Chengjen Lee
 
CKANCon 2016 & IODC16
CKANCon 2016 & IODC16CKANCon 2016 & IODC16
CKANCon 2016 & IODC16Chengjen Lee
 
CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)Chengjen Lee
 
CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)Chengjen Lee
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享Chengjen Lee
 
CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例Chengjen Lee
 
ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)Chengjen Lee
 
ckan 2.0 Introduction (20140522 updated)
ckan 2.0 Introduction  (20140522 updated)ckan 2.0 Introduction  (20140522 updated)
ckan 2.0 Introduction (20140522 updated)Chengjen Lee
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Chengjen Lee
 
Introduction to Pelican
Introduction to PelicanIntroduction to Pelican
Introduction to PelicanChengjen Lee
 
ckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sourcesckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sourcesChengjen Lee
 
ckan 2.0: a deeper look
ckan 2.0: a deeper lookckan 2.0: a deeper look
ckan 2.0: a deeper lookChengjen Lee
 
ckan 2.0 Introduction
ckan 2.0 Introductionckan 2.0 Introduction
ckan 2.0 IntroductionChengjen Lee
 

Más de Chengjen Lee (17)

Preserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsPreserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary Events
 
Retooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioRetooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.io
 
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
 
CKANCon 2016 & IODC16
CKANCon 2016 & IODC16CKANCon 2016 & IODC16
CKANCon 2016 & IODC16
 
CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)
 
CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
 
CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例
 
ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)
 
ckan 2.0 Introduction (20140522 updated)
ckan 2.0 Introduction  (20140522 updated)ckan 2.0 Introduction  (20140522 updated)
ckan 2.0 Introduction (20140522 updated)
 
Report 140227
Report 140227Report 140227
Report 140227
 
Report 140213
Report 140213Report 140213
Report 140213
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109
 
Introduction to Pelican
Introduction to PelicanIntroduction to Pelican
Introduction to Pelican
 
ckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sourcesckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sources
 
ckan 2.0: a deeper look
ckan 2.0: a deeper lookckan 2.0: a deeper look
ckan 2.0: a deeper look
 
ckan 2.0 Introduction
ckan 2.0 Introductionckan 2.0 Introduction
ckan 2.0 Introduction
 

Último

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

“Open Data Web” – A Linked Open Data Repository Built with CKAN

  • 1. “Open Data Web” – A Linked Open Data Repository Built with CKAN Cheng-Jen Lee Andrea Wei-Ching Huang Tyng-Ruey Chuang Institute of Information Science, Academia Sinica, Taiwan CKANCon 2016@Madrid 2016/10/04
  • 2. Outline • Data Source • Linked Data • From Archive Catalog to Linked Data • Linked Open Data Repository: Open Data Web • System Architecture • Implementation • Limitations • Future Work 2
  • 3. Data Source • Union Catalog of Digital Archives Taiwan • http://catalog.digitalarchives.tw • Web catalog for digitized archives in 14 domains from many institutions. • Part of the catalog is released under CC licenses • About 840,000 catalog records. • Free to copy and redistribute. • Represent resources in a linked data format • Provide semantic query for time, place, object, etc. • Enrich resources by linking them to third-party datasets. 3
  • 4. Linked Data • Linked Data (from Wikipedia) • A method of publishing structured data. • It can be interlinked and become more useful through semantic queries. • Linked Open Data is linked data that is open content. • Mostly in the form of RDF. • RDF (from W3C RDF 1.1 Primer) • Resource Description Framework • A framework for expressing information about resources. • RDF can enrich a dataset by linking it to third-party datasets. • Ex. Enrich a dataset about paintings by linking them to the corresponding artists in Wikidata. 4
  • 5. RDF Data Model • A Triple: <subject> <predicate> <object> • <Bob> <is a> <person>. • <Bob> <is interested in> <the Mona Lisa>. • <the Mona Lisa> <was created by> <Leonardo da Vinci>. Source: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/#section-triple 5
  • 6. From Archive Catalog to Linked Data • We converted archive catalog to two versions of linked data. • Version D: triples with just Dublin Core descriptions from the catalog • D means Dublin Core • Version R: mapping column values in the catalog to external datasets (with domain vocabularies) to give enriched semantics • R means Refined • Extract place names from "Coverage" column (dc:coverage) in the catalog and map them to place IDs on geonames.org. • Normalize values in "Date" column (dc:date) to ISO8601 format, or map them to Wikidata IDs. • Map titles of biology archives to entries on Encyclopedia of Life. 6
  • 7. Archive Catalog XML&CSV txn:hasEOLPage <http://eol.org/pages/1134120> ; -------------------------------------------- skos:editorialNote "採集日期" ; dwc:eventDate "1993-04-25" ; RDF-like CSV Step 1: Mapping column values to vocabularies • "採集日期” means date collected in English. Step 2: Converting CSV data to linked data Original Data Results After Vocabulary Mapping Linked Data (RDF) Title 台灣一葉蘭 Date::field 採集日期 Date 1993-04-25 txn:hasEOLPage eol:1134120 rdf:type schema:CreateAction skos:editorialNote 採集日期 dwc:eventDate 1993-04-25 Vocabulary Mapping and Data Conversion Python Scripts: https://gitlab.com/iislod/dat2ld 7
  • 8. Linked Open Data Repository: Open Data Web (ODW) http://data.odw.tw Ontology* for Open Data Web (Draft) http://voc.odw.tw * Definitions of the vocabularies used to describe objects in RDF. 8
  • 9. Feature (1): Linked Data Browsing Main Menu Records: D version Refined: R version (still uploading) http://data.odw.tw/record/ 9
  • 10. Feature (1): Linked Data Browsing http://data.odw.tw/record/ List of Resources Filters 10
  • 11. Feature (1): Linked Data Browsing http://data.odw.tw/record/ Get D or R version of the same resource 11
  • 12. Example: “Girl Lost in Thought” linked data (triples) http://data.odw.tw/record/d4502674 12
  • 13. Example: “Girl Lost in Thought” Export single resource in linked data format http://data.odw.tw/record/d4502674 13
  • 14. • Spatial indexing based on geo:lat and geo:long values. Resources about Tainan City Feature (2): Spatial Query 14
  • 15. • Temporal indexing based on dct:W3CDTF, xsd:date, and xsd:gYear values. Resources in 19th century Feature (3): Temporal Query 15
  • 16. Feature (4): SPARQL Endpoint http://data.odw.tw/sparql/ (For testing) http://sparql.odw.tw/ (For machine access) 16
  • 17. Feature (5): Spatial Representation • Only for R version (still uploading). • Only shows geonames information in the gn:locatedIn property. http://data.odw.tw/r1/r1-r4502674 17
  • 18. System Architecture SPARQL Query Page HTML for individual record RDF for individual record ckanext-scheming& ckanext-repeating template ckanext-dcat output profile User Access individual resource SPARQL (testing) Computer SPARQL Linked Data (Turtle format) ImportHarvest Icon made by SimpleIcon (http://www.flaticon.com/aut hors/simpleicon) and Freepik (http://www.flaticon.com/aut hors/freepik) 18
  • 19. Implementation (1/3) • Custom fields • ckanext-scheming and ckanext-repeating extension • Define CKAN custom fields for a data type in a JSON file • Each data type has its own directory. • Ex. record.json is for D ver. (http://data.odw.tw/record/) • A field is defined by a JSON object, for example: { "field_name": "dc:format", "label": "dc:format", "display_property": "dc:format", "preset": "repeating_text_modified” }, 19
  • 20. Implementation (2/3) • Import linked data • ckanext-dcat extension for linked data import/export • CKAN harvesting mechanism by ckanext-harvest extension • Extend DCATRDFHarvester in ckanext.dcat.harvesters.rdf • Extend RDFProfile in ckanext.dcat.profiles • def parse_dataset(self, dataset_dict, dataset_ref): • (Import) Parse dataset_ref from loaded linked data to CKAN’s dataset_dict • def graph_from_dataset(self, dataset_dict, dataset_ref): • (Export) Generate a linked data graph dataset_ref from CKAN’s dataset_dict • Modify ckanext-dcat itself • To support more namespace (ckanext-dcat is originally designed for DCAT vocabularies.) 20
  • 21. 21
  • 22. Implementation (3/3) • Virtuoso SPARQL endpoint integration • ckanext-sparql extension • Spatial indexing and searching • ckanext-spatial extension • Time indexing and searching • We developed the ckanext-tempsearch extension. • Source code available on GitLab. • https://gitlab.com/iislod/ 22
  • 23. Limitations • Maintaining two triple stores (CKAN & Virtuoso). • They may be inconsistent since we do not sync them for now. • Slow harvesting speed on CKAN. • 4 hrs+ for harvesting 20,000 records on a Core i7-2600 3.4 GHz machine (still uploading now). 23
  • 24. Future Work • Provide native SPARQL queries in CKAN. • Then we do not need Virtuoso anymore. • Harvest multiple resources as a CKAN dataset • To improve import speed. • Time and place names mappings to third-party datasets • Still need further verifications. 24
  • 25. Open Data Web (http://data.odw.tw) E-mail: ask AT odw.tw We welcome your valuable comments & suggestions! 25 Acknowledgement: Hsin-Ping Chen (k26021409 AT gmail.com) for processing geonames data. Find me at @u10313335, http://about.me/SolLee, cjlee AT iis.sinica.edu.tw