SlideShare una empresa de Scribd logo
1 de 18
CKAN
an open-source data management solution
for open data
Ivan Ermilov
AKSW Research Group
http://aksw.org
My experience with CKAN
● PublicData.eu portal
o Crowd-sourcing CSV2RDF mappings
● LODStats
o Version 1: crawling datahub.io (CKAN)
o Version 2: CKAN aggregator for data.gov,
publicdata.eu and datahub.io
o Version 2: Crawled all three portals and published
the data on datahub.io
CKAN IS NOT
a file storage!
Why CKAN?
● An open source platform
o Relatively easy to deploy
o Provides a rich set of features for free
● Data management
● Community involvement
Who use CKAN?
● All major open governments
o Canada (open.canada.ca): 244,238 datasets
o The U.S. (data.gov): 131,348 datasets
o Europe (publicdata.eu): 47,863 datasets
● And some other communities:
o Semantic Web community (datahub.io): 9,509
datasets
CKAN architecture
CKAN Pros/Cons
● Pros
o Organizes your data in structured way
o Have an extension to support DCAT (only for
datasets)
o Provides API to digest your data
● Cons
o The data model does not work for all use cases
(DBpedia)
o No strict guidelines for dataset publishing
CKAN functionality
● Publishing metadata
● Exposing metadata (API/front-end)
● Access control for users/organizations
● Additional functionality via plugins
CKAN extensions/plugins
● Data preview and visualization
● CKAN + DCAT
● Extension that adds the Disqus commenting
system to CKAN
● Simple API dataset hits counter
Full list is available at: http://extensions.ckan.org/
CKAN deployment
● From source
● OS package (e.g. as debian package)
● Docker image
Official guide: http://docs.ckan.org/en/latest/maintaining/installing/index.html
CKAN Multi-Tier Deployment
CKAN API
● Well documented
● Covers everything you can do with the web
interface
o You can write your own web interface
● Various API clients
o ckanclient (python) - official
o Ruby, PHP, Java, Nodejs, Perl, R
https://github.com/ckan/ckan/wiki/CKAN-API-Clients
CKAN API methods
● Retrieving data
● Creating new data
● Update existing data
● Delete existing data
● Data is: packages, resources, groups, tags,
users etc.
http://docs.ckan.org/en/latest/api/index.html
CKAN API: Examples
● Get package list
o http://demo.ckan.org/api/3/action/package_list
o Disabled for data.gov
● Get one package
o http://demo.ckan.org/api/3/action/package_show?id=
adur_district_spending
● ckan.logic.action.get.organization_show
o api/3/action/organization_show?id=...
Use Case: LODStats
● Aggregate CKAN
instances via API
● Filter out only related
datasets
● Build an application on top
of it
Use Case: CSV2RDF
● Integrated with a particular CKAN instance
● Aggregates all CSV files from the instance
● Provides an interface for CSV2RDF conversion
Thank you for your attention!
Presented by Ivan Ermilov.
LinkedIn: https://www.linkedin.com/in/iermilov
Email: iermilov@informatik.uni-leipzig.de
Skype: earthquakesan

Más contenido relacionado

La actualidad más candente

Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQueryPradeep Bhadani
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentationTao Feng
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI OverviewJames Serra
 
“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKANChengjen Lee
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearchJoey Wen
 
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...NETWAYS
 
[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진
[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진
[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진PgDay.Seoul
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
 
Oracle GoldenGate 18c - REST API Examples
Oracle GoldenGate 18c - REST API ExamplesOracle GoldenGate 18c - REST API Examples
Oracle GoldenGate 18c - REST API ExamplesBobby Curtis
 
Journey for a data driven organization
Journey for a data driven organizationJourney for a data driven organization
Journey for a data driven organizationDr. Jimmy Schwarzkopf
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j
 
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...Jouko Nyholm
 
Fraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityFraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityNeo4j
 
Meetup OpenTelemetry Intro
Meetup OpenTelemetry IntroMeetup OpenTelemetry Intro
Meetup OpenTelemetry IntroDimitrisFinas1
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data FabricAlan McSweeney
 

La actualidad más candente (20)

Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQuery
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
 
[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진
[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진
[Pgday.Seoul 2017] 1. PostGIS의 사례로 본 PostgreSQL 확장 - 장병진
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
Oracle GoldenGate 18c - REST API Examples
Oracle GoldenGate 18c - REST API ExamplesOracle GoldenGate 18c - REST API Examples
Oracle GoldenGate 18c - REST API Examples
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Journey for a data driven organization
Journey for a data driven organizationJourney for a data driven organization
Journey for a data driven organization
 
Data lineage
Data lineageData lineage
Data lineage
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time Analytics
 
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...
Power BI Governance and Development Best Practices - Presentation at #MSBIFI ...
 
Fraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business AuthorityFraud Detection with Graphs at the Danish Business Authority
Fraud Detection with Graphs at the Danish Business Authority
 
Meetup OpenTelemetry Intro
Meetup OpenTelemetry IntroMeetup OpenTelemetry Intro
Meetup OpenTelemetry Intro
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 

Similar a CKAN as an open-source data management solution for open data

BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigData_Europe
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
ODN - Technical introduction of the platform
ODN - Technical introduction of the platformODN - Technical introduction of the platform
ODN - Technical introduction of the platformComsode - FP7 project
 
Cloud Native Landscape (CNCF and OCI)
Cloud Native Landscape (CNCF and OCI)Cloud Native Landscape (CNCF and OCI)
Cloud Native Landscape (CNCF and OCI)Chris Aniszczyk
 
Architecting the Future: Abstractions and Metadata - KCDC
Architecting the Future: Abstractions and Metadata - KCDCArchitecting the Future: Abstractions and Metadata - KCDC
Architecting the Future: Abstractions and Metadata - KCDCDaniel Barker
 
Architecting the Future: Abstractions and Metadata - STL SilverLinings
Architecting the Future: Abstractions and Metadata - STL SilverLiningsArchitecting the Future: Abstractions and Metadata - STL SilverLinings
Architecting the Future: Abstractions and Metadata - STL SilverLiningsDaniel Barker
 
Diving Through The Layers: Investigating runc, containerd, and the Docker eng...
Diving Through The Layers: Investigating runc, containerd, and the Docker eng...Diving Through The Layers: Investigating runc, containerd, and the Docker eng...
Diving Through The Layers: Investigating runc, containerd, and the Docker eng...Phil Estes
 
ckan 2.0 Introduction (20140522 updated)
ckan 2.0 Introduction  (20140522 updated)ckan 2.0 Introduction  (20140522 updated)
ckan 2.0 Introduction (20140522 updated)Chengjen Lee
 
Architecting the Future: Abstractions and Metadata - CodeStock
Architecting the Future: Abstractions and Metadata - CodeStockArchitecting the Future: Abstractions and Metadata - CodeStock
Architecting the Future: Abstractions and Metadata - CodeStockDaniel Barker
 
Suche mit Apache Lucene & Co.
Suche mit Apache Lucene & Co.Suche mit Apache Lucene & Co.
Suche mit Apache Lucene & Co.inovex GmbH
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionJohn Garbutt
 
OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...
OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...
OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...Cisco DevNet
 
An API-first approach. Integrating ckan with RDM services
An API-first approach. Integrating ckan with RDM servicesAn API-first approach. Integrating ckan with RDM services
An API-first approach. Integrating ckan with RDM servicesJoss Winn
 
Open Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-MayOpen Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-MayComsode - FP7 project
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific ComputingPeter Bryzgalov
 

Similar a CKAN as an open-source data management solution for open data (20)

LOD2 Webinar Series: publicdata.eu and CKAN
LOD2 Webinar Series: publicdata.eu and CKANLOD2 Webinar Series: publicdata.eu and CKAN
LOD2 Webinar Series: publicdata.eu and CKAN
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal Pilots
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
ODN - Technical introduction of the platform
ODN - Technical introduction of the platformODN - Technical introduction of the platform
ODN - Technical introduction of the platform
 
Cloud Native Landscape (CNCF and OCI)
Cloud Native Landscape (CNCF and OCI)Cloud Native Landscape (CNCF and OCI)
Cloud Native Landscape (CNCF and OCI)
 
Architecting the Future: Abstractions and Metadata - KCDC
Architecting the Future: Abstractions and Metadata - KCDCArchitecting the Future: Abstractions and Metadata - KCDC
Architecting the Future: Abstractions and Metadata - KCDC
 
Architecting the Future: Abstractions and Metadata - STL SilverLinings
Architecting the Future: Abstractions and Metadata - STL SilverLiningsArchitecting the Future: Abstractions and Metadata - STL SilverLinings
Architecting the Future: Abstractions and Metadata - STL SilverLinings
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
EVA_Navigator_Presentation.ppt
EVA_Navigator_Presentation.pptEVA_Navigator_Presentation.ppt
EVA_Navigator_Presentation.ppt
 
Diving Through The Layers: Investigating runc, containerd, and the Docker eng...
Diving Through The Layers: Investigating runc, containerd, and the Docker eng...Diving Through The Layers: Investigating runc, containerd, and the Docker eng...
Diving Through The Layers: Investigating runc, containerd, and the Docker eng...
 
ckan 2.0 Introduction (20140522 updated)
ckan 2.0 Introduction  (20140522 updated)ckan 2.0 Introduction  (20140522 updated)
ckan 2.0 Introduction (20140522 updated)
 
Architecting the Future: Abstractions and Metadata - CodeStock
Architecting the Future: Abstractions and Metadata - CodeStockArchitecting the Future: Abstractions and Metadata - CodeStock
Architecting the Future: Abstractions and Metadata - CodeStock
 
Suche mit Apache Lucene & Co.
Suche mit Apache Lucene & Co.Suche mit Apache Lucene & Co.
Suche mit Apache Lucene & Co.
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
 
OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...
OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...
OpenStack and OpenDaylight, The Evolving Relationship in Cloud Networking: a ...
 
An API-first approach. Integrating ckan with RDM services
An API-first approach. Integrating ckan with RDM servicesAn API-first approach. Integrating ckan with RDM services
An API-first approach. Integrating ckan with RDM services
 
Open Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-MayOpen Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-May
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific Computing
 

Más de AIMS (Agricultural Information Management Standards)

Más de AIMS (Agricultural Information Management Standards) (20)

Linked Data Competency Index : Mapping the field for teachers and learners
 Linked Data Competency Index : Mapping the field for teachers and learners Linked Data Competency Index : Mapping the field for teachers and learners
Linked Data Competency Index : Mapping the field for teachers and learners
 
Metadata as Standard: improving Interoperability through the Research Data Al...
Metadata as Standard: improving Interoperability through the Research Data Al...Metadata as Standard: improving Interoperability through the Research Data Al...
Metadata as Standard: improving Interoperability through the Research Data Al...
 
Assigning Digital Object Identifiers (DOIs) to Plant Genetic Resources
Assigning Digital Object Identifiers (DOIs) to Plant Genetic ResourcesAssigning Digital Object Identifiers (DOIs) to Plant Genetic Resources
Assigning Digital Object Identifiers (DOIs) to Plant Genetic Resources
 
VocBench 3: some insights on the forthcoming release
VocBench 3: some insights on the forthcoming release VocBench 3: some insights on the forthcoming release
VocBench 3: some insights on the forthcoming release
 
The case for Digital Objects Identifiers (DOIs) in support of research activi...
The case for Digital Objects Identifiers (DOIs) in support of research activi...The case for Digital Objects Identifiers (DOIs) in support of research activi...
The case for Digital Objects Identifiers (DOIs) in support of research activi...
 
Webinar@AIMS_FAIR Principles and Data Management Planning
Webinar@AIMS_FAIR Principles and Data Management PlanningWebinar@AIMS_FAIR Principles and Data Management Planning
Webinar@AIMS_FAIR Principles and Data Management Planning
 
Webinar@ASIRA: How to foster openness from an academic library
Webinar@ASIRA: How to foster openness from an academic library Webinar@ASIRA: How to foster openness from an academic library
Webinar@ASIRA: How to foster openness from an academic library
 
Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research
Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research
Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research
 
Webinar@ASIRA: AuthorAID: Supporting Developing Country Researchers in Publis...
Webinar@ASIRA: AuthorAID: Supporting Developing Country Researchers in Publis...Webinar@ASIRA: AuthorAID: Supporting Developing Country Researchers in Publis...
Webinar@ASIRA: AuthorAID: Supporting Developing Country Researchers in Publis...
 
Webinar@ASIRA: Introduction to Using TEEAL to Access Agricultural Journals
Webinar@ASIRA: Introduction to Using TEEAL to Access Agricultural Journals Webinar@ASIRA: Introduction to Using TEEAL to Access Agricultural Journals
Webinar@ASIRA: Introduction to Using TEEAL to Access Agricultural Journals
 
Webinar@ASIRA: Access to Global Online Research in Agriculture (AGORA)
Webinar@ASIRA: Access to Global Online Research in Agriculture (AGORA) Webinar@ASIRA: Access to Global Online Research in Agriculture (AGORA)
Webinar@ASIRA: Access to Global Online Research in Agriculture (AGORA)
 
Webinar@ASIRA: AGRIS: Providing Access to Agricultural Research and Technolog...
Webinar@ASIRA: AGRIS: Providing Access to Agricultural Research and Technolog...Webinar@ASIRA: AGRIS: Providing Access to Agricultural Research and Technolog...
Webinar@ASIRA: AGRIS: Providing Access to Agricultural Research and Technolog...
 
Webinar@ASIRA: New Roles for Changing Times UNAM Subject Librarians in Context
Webinar@ASIRA: New Roles for Changing Times UNAM Subject Librarians in Context Webinar@ASIRA: New Roles for Changing Times UNAM Subject Librarians in Context
Webinar@ASIRA: New Roles for Changing Times UNAM Subject Librarians in Context
 
Webinar@ASIRA: Emerging Themes in Agricultural Research Publishing
Webinar@ASIRA: Emerging Themes in Agricultural Research PublishingWebinar@ASIRA: Emerging Themes in Agricultural Research Publishing
Webinar@ASIRA: Emerging Themes in Agricultural Research Publishing
 
Webinar@AIMS: OKAD & F1000Research: a very different approach to publishing a...
Webinar@AIMS: OKAD & F1000Research: a very different approach to publishing a...Webinar@AIMS: OKAD & F1000Research: a very different approach to publishing a...
Webinar@AIMS: OKAD & F1000Research: a very different approach to publishing a...
 
Using AGRIS as a portal of choice to access agricultural research and technol...
Using AGRIS as a portal of choice to access agricultural research and technol...Using AGRIS as a portal of choice to access agricultural research and technol...
Using AGRIS as a portal of choice to access agricultural research and technol...
 
Research4Life: La bibliothèque qui ouvre ses portes
Research4Life: La bibliothèque qui ouvre ses portesResearch4Life: La bibliothèque qui ouvre ses portes
Research4Life: La bibliothèque qui ouvre ses portes
 
Publishing skos concept schemes with skosmos
Publishing skos concept schemes with skosmosPublishing skos concept schemes with skosmos
Publishing skos concept schemes with skosmos
 
Research4Life: La biblioteca que abre puertas
Research4Life: La biblioteca que abre puertasResearch4Life: La biblioteca que abre puertas
Research4Life: La biblioteca que abre puertas
 
Research4Life: The library that opens doors
Research4Life: The library that opens doorsResearch4Life: The library that opens doors
Research4Life: The library that opens doors
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

CKAN as an open-source data management solution for open data

  • 1. CKAN an open-source data management solution for open data Ivan Ermilov
  • 3. My experience with CKAN ● PublicData.eu portal o Crowd-sourcing CSV2RDF mappings ● LODStats o Version 1: crawling datahub.io (CKAN) o Version 2: CKAN aggregator for data.gov, publicdata.eu and datahub.io o Version 2: Crawled all three portals and published the data on datahub.io
  • 4. CKAN IS NOT a file storage!
  • 5. Why CKAN? ● An open source platform o Relatively easy to deploy o Provides a rich set of features for free ● Data management ● Community involvement
  • 6. Who use CKAN? ● All major open governments o Canada (open.canada.ca): 244,238 datasets o The U.S. (data.gov): 131,348 datasets o Europe (publicdata.eu): 47,863 datasets ● And some other communities: o Semantic Web community (datahub.io): 9,509 datasets
  • 8. CKAN Pros/Cons ● Pros o Organizes your data in structured way o Have an extension to support DCAT (only for datasets) o Provides API to digest your data ● Cons o The data model does not work for all use cases (DBpedia) o No strict guidelines for dataset publishing
  • 9. CKAN functionality ● Publishing metadata ● Exposing metadata (API/front-end) ● Access control for users/organizations ● Additional functionality via plugins
  • 10. CKAN extensions/plugins ● Data preview and visualization ● CKAN + DCAT ● Extension that adds the Disqus commenting system to CKAN ● Simple API dataset hits counter Full list is available at: http://extensions.ckan.org/
  • 11. CKAN deployment ● From source ● OS package (e.g. as debian package) ● Docker image Official guide: http://docs.ckan.org/en/latest/maintaining/installing/index.html
  • 13. CKAN API ● Well documented ● Covers everything you can do with the web interface o You can write your own web interface ● Various API clients o ckanclient (python) - official o Ruby, PHP, Java, Nodejs, Perl, R https://github.com/ckan/ckan/wiki/CKAN-API-Clients
  • 14. CKAN API methods ● Retrieving data ● Creating new data ● Update existing data ● Delete existing data ● Data is: packages, resources, groups, tags, users etc. http://docs.ckan.org/en/latest/api/index.html
  • 15. CKAN API: Examples ● Get package list o http://demo.ckan.org/api/3/action/package_list o Disabled for data.gov ● Get one package o http://demo.ckan.org/api/3/action/package_show?id= adur_district_spending ● ckan.logic.action.get.organization_show o api/3/action/organization_show?id=...
  • 16. Use Case: LODStats ● Aggregate CKAN instances via API ● Filter out only related datasets ● Build an application on top of it
  • 17. Use Case: CSV2RDF ● Integrated with a particular CKAN instance ● Aggregates all CSV files from the instance ● Provides an interface for CSV2RDF conversion
  • 18. Thank you for your attention! Presented by Ivan Ermilov. LinkedIn: https://www.linkedin.com/in/iermilov Email: iermilov@informatik.uni-leipzig.de Skype: earthquakesan

Notas del editor

  1. What is CKAN? In two words. Who am I? -) PhD student @AKSW, University of Leipzig URZ (university data center) I hope, the presentation will be interesting for all of you and I’m looking forward to discussion.
  2. I want to briefly introduce our research group. We are relatively big, having 40+ PhD students and research assistants. Our group is divided in subgroups working on different topics, as you can see from the group roster, such as “Semantic Abstraction”, “Emergent Semantics”, “Machine Learning” etc.
  3. Projects started in LOD2 project
  4. The common misconception about CKAN is that it can store files for you. It can be extended to store files, indeed. But initially it dedicated to store METADATA, not data itself.
  5. Open source Open source solutions offer quite scarce documentation in general and even a small deviation from a typical scenario requires a specialist to be involved. In most of the cases it can be resolved through the mailing lists of a project. The customization of CKAN instance (if plugins are not available) requires a programmer to be involved. Data management CKAN enables organization and individuals to publish metadata about their datasets through an interface on a web front-end. This is an easy task, which does not require much effort. Community involvement CKAN has two main subdivisions for users: individual users and organizations. For govermental portals registration process is closed usually, because only governmental offices should be able to publish the data. For registered users it is possible to comment on the datasets as well as receive updates via various interfaces (more about it later).
  6. CKAN is adopted by all the major open governmental portals (for instance, data.gov was previously running on Socrata data platform). Why? Because of the reasons I mentioned before. What is also important, that CKAN supports multi-tier architecture, where local CKAN instances (for instance, for cities) can be aggregated on the regional CKAN instance. I will have an example of publicdata.eu portal to show how it can be achieved.
  7. On this slide I depicted a general overview of the CKAN architecture. As any web application, it consists of a back-end and a front-end. Organization example: We @AKSW group have an organization created at datahub.io portal, where we publish our datasets (to support dissemination).
  8. CKAN has a flexible architecture, where new functionality can be added via extensions. We’ve already seen what CKAN provides out-of-the-box Simple API dataset hits counter: Store a counter for calls to the “show” API command for a given dataset. CKAN + DCAT exposes dataset information as RDF. All the packages/resources fields are mapped to the dcat RDF vocabulary, which has a status of W3C recommendation.
  9. CKAN is relatively easy to deploy. The most complex installation is one from the source, it requires manual installation of ckan itself into the virual python environment and setup of apache solr (full text index, uses lucene). It is totally necessary to install CKAN from source only in case, if you want to write your own extension or modify the source code for some reason. The second option, that is installation from the operation system package, can be a good option if you wish to run only one CKAN per server (or virtual machine). The drawbacks for packages is that they are not very well maintained or in other words you will have to wait for a long time for it to be updated. The third option is relatively new and by far is the most suitable for large scale deployment. Or if you need several CKAN instances per server/VM. Docker image is assembled from the source code and the last image is available on docker hub. If it’s not available, you can compile it yourself. The overhead here is a person, who can work with docker. -) The environment we prefer at AKSW is Ubuntu Server last long-term support version.
  10. PublicData.eu is an initiative to make a one-stop portal for data in Europe. Aggregation was not a part of initial CKAN functionality. The special harvest extension was developed for this purpose. Therefore local governments can deploy their own CKAN instance and then they can be aggregated.
  11. You need a good API for your metadata to support the creation of cool applications on top of the data.