CKAN as an open-source data management solution for open data

AIMS (Agricultural Information Management Standards)

CKAN is an open-source data management solution for open data. It provides a platform for publishing and exposing metadata through an API and front-end interface. Major governments and communities use CKAN to organize large numbers of datasets. While it has advantages like organizing data in a structured way and providing APIs, its data model does not work for all use cases and there are no strict guidelines for dataset publishing. Extensions allow additional functionality and it can be deployed in various ways.

CKAN
an open-source data management solution
for open data
Ivan Ermilov

My experience with CKAN
● PublicData.eu portal
o Crowd-sourcing CSV2RDF mappings
● LODStats
o Version 1: crawling datahub.io (CKAN)
o Version 2: CKAN aggregator for data.gov,
publicdata.eu and datahub.io
o Version 2: Crawled all three portals and published
the data on datahub.io

Why CKAN?
● An open source platform
o Relatively easy to deploy
o Provides a rich set of features for free
● Data management
● Community involvement

Who uses CKAN?
● All major open governments
o Canada (open.canada.ca): 244,238 datasets
o The U.S. (data.gov): 131,348 datasets
o Europe (publicdata.eu): 47,863 datasets
● And some other communities:
o Semantic Web community (datahub.io): 9,509
datasets

CKAN Pros/Cons
● Pros
o Organizes your data in a structured way
o Has an extension to support DCAT (only for datasets)
o Provides an API to digest your data
● Cons
o The data model does not work for all use cases
(DBpedia)
o No strict guidelines for dataset publishing

CKAN functionality
● Publishing metadata
● Exposing metadata (API/front-end)
● Access control for users/organizations
● Additional functionality via plugins

CKAN extensions/plugins
● Data preview and visualization
● CKAN + DCAT
● Extension that adds the Disqus commenting
system to CKAN
● Simple API dataset hits counter
Full list is available at: http://extensions.ckan.org/

CKAN deployment
● From source
● OS package (e.g. as a Debian package)
● Docker image
Official guide: http://docs.ckan.org/en/latest/maintaining/installing/index.html

CKAN API
● Well documented
● Covers everything you can do with the web
interface
o You can write your own web interface
● Various API clients
o ckanclient (python) - official
o Ruby, PHP, Java, Nodejs, Perl, R
https://github.com/ckan/ckan/wiki/CKAN-API-Clients

CKAN API methods
● Retrieving data
● Creating new data
● Updating existing data
● Deleting existing data
● Data means: packages, resources, groups, tags, users, etc.
http://docs.ckan.org/en/latest/api/index.html
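
The write operations listed above could be sketched roughly as follows. This is a minimal illustration, not an official client: it assumes the Action API v3 conventions (a JSON body sent by POST, the API key passed in the `Authorization` header, and a response object carrying `success` and `result`). The helper names `build_action_request` and `send`, and the placeholder key and dataset name in the usage note, are hypothetical.

```python
import json
import urllib.request


def build_action_request(base_url, action, data, api_key):
    """Prepare a POST request for a CKAN action such as package_create.

    Assumption: the instance accepts a JSON body and reads the API key
    from the Authorization header.
    """
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/3/action/{action}",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the 'result' part of the response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]
```

With these helpers, creating, updating, and deleting a dataset would differ only in the action name and payload, for example `send(build_action_request(base, "package_create", {"name": "my-dataset"}, key))`, then `package_update` with the changed fields, and `package_delete` with `{"id": "my-dataset"}`. Instances may require further fields, such as an owning organization, before they accept a new dataset.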

CKAN API: Examples
● Get package list
o http://demo.ckan.org/api/3/action/package_list
o Disabled for data.gov
● Get one package
o http://demo.ckan.org/api/3/action/package_show?id=adur_district_spending
● ckan.logic.action.get.organization_show
o api/3/action/organization_show?id=...
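
The example URLs on this slide all follow the same pattern, which a small helper can capture. The sketch below uses only the Python standard library; the function names `action_url` and `call_action` are hypothetical, and it assumes the usual Action API response shape with `success` and `result` keys.

```python
import json
import urllib.parse
import urllib.request


def action_url(base_url, action, **params):
    """Build an Action API (v3) URL, e.g. .../api/3/action/package_show?id=..."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def call_action(base_url, action, **params):
    """GET an action and return its 'result'; raise if CKAN reports a failure."""
    with urllib.request.urlopen(action_url(base_url, action, **params), timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]
```

For instance, `call_action("http://demo.ckan.org", "package_list")` would correspond to the first bullet and `call_action("http://demo.ckan.org", "package_show", id="adur_district_spending")` to the second, provided the demo instance still hosts that dataset.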

Use Case: LODStats
● Aggregate CKAN instances via the API
● Keep only the relevant datasets
● Build an application on top of them
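
The aggregate-and-filter steps might look roughly like the sketch below. It is not the LODStats code. It assumes each portal is read by paging through `package_search` (whose result carries `count` and `results`), and that a dataset is "relevant" when at least one resource declares an RDF-like format. The `RDF_FORMATS` set and every function name are illustrative assumptions. The `search` callables are supplied by the caller, so the logic can be tried without network access.

```python
# Assumption: format labels that mark a resource as RDF; real portals vary widely.
RDF_FORMATS = {"rdf", "rdf+xml", "turtle", "ttl", "n3", "nt", "n-triples"}


def iter_packages(search, page_size=100):
    """Yield every package dict of one portal by paging through package_search.

    `search(start, rows)` must return the package_search result object:
    {"count": <int>, "results": [<package dict>, ...]}.
    """
    start = 0
    while True:
        page = search(start, page_size)
        results = page["results"]
        if not results:
            break
        yield from results
        start += len(results)
        if start >= page["count"]:
            break


def has_rdf_resource(package):
    """True if any resource of the package declares an RDF-like format."""
    return any(
        (resource.get("format") or "").strip().lower() in RDF_FORMATS
        for resource in package.get("resources", [])
    )


def aggregate(portals):
    """Collect (portal name, package) pairs for all relevant datasets.

    `portals` maps a portal name to its `search` callable.
    """
    return [
        (name, package)
        for name, search in portals.items()
        for package in iter_packages(search)
        if has_rdf_resource(package)
    ]
```

Paging through `package_search` rather than calling `package_list` is a deliberate choice here, since some portals disable the full package listing.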

Use Case: CSV2RDF
● Integrated with a particular CKAN instance
● Aggregates all CSV files from the instance
● Provides an interface for CSV2RDF conversion
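
To give a feel for what a CSV-to-RDF conversion produces, here is a deliberately naive sketch that turns each CSV cell into one N-Triples statement. It is not the CSV2RDF tool's mapping approach: the IRI scheme (`row/<n>` for subjects, `column/<header>` for predicates), the `base_iri` parameter, and the function names are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value):
    """Escape the characters that N-Triples forbids inside a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, base_iri):
    """Convert CSV text to N-Triples: one subject per row, one predicate per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for index, row in enumerate(reader, start=1):
        subject = f"<{base_iri}row/{index}>"
        for column, value in row.items():
            # Skip surplus fields (column is None) and empty or missing cells.
            if column is None or not value:
                continue
            predicate = f"<{base_iri}column/{quote(column.strip(), safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)
```

A real mapping would replace the generated predicates with terms from existing vocabularies and give values proper datatypes, which is the kind of decision a crowd-sourced mapping interface lets users make.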

Thank you for your attention!
Presented by Ivan Ermilov.
LinkedIn: https://www.linkedin.com/in/iermilov
Email: iermilov@informatik.uni-leipzig.de
Skype: earthquakesan

2. AKSW Research Group http://aksw.org

4. CKAN IS NOT a file storage!

7. CKAN architecture

12. CKAN Multi-Tier Deployment

Editor's notes

What is CKAN, in two words? Who am I? A PhD student at AKSW, University of Leipzig, URZ (the university data center). I hope the presentation will be interesting for all of you, and I'm looking forward to the discussion.
I want to briefly introduce our research group. We are relatively large, with 40+ PhD students and research assistants. The group is divided into subgroups working on different topics, as you can see from the group roster, such as “Semantic Abstraction”, “Emergent Semantics”, “Machine Learning”, etc.
These projects started within the LOD2 project.
A common misconception about CKAN is that it can store files for you. It can indeed be extended to store files, but it was originally designed to store METADATA, not the data itself.
Open source: Open source solutions in general offer rather scarce documentation, and even a small deviation from a typical scenario requires a specialist. In most cases this can be resolved through the project's mailing lists. Customizing a CKAN instance (if no suitable plugins are available) requires a programmer.
Data management: CKAN enables organizations and individuals to publish metadata about their datasets through a web front-end. This is an easy task that does not require much effort.
Community involvement: CKAN has two main kinds of accounts: individual users and organizations. For governmental portals, registration is usually closed, because only government offices should be able to publish data. Registered users can comment on datasets and receive updates via various interfaces (more about that later).
CKAN has been adopted by all the major open government portals (data.gov, for instance, previously ran on the Socrata data platform). Why? For the reasons I mentioned before. It is also important that CKAN supports a multi-tier architecture, in which local CKAN instances (for cities, for instance) can be aggregated by a regional CKAN instance. I will use the publicdata.eu portal as an example to show how this can be achieved.
This slide gives a general overview of the CKAN architecture. Like any web application, it consists of a back end and a front end. Organization example: the AKSW group has an organization on the datahub.io portal, where we publish our datasets (to support dissemination).
CKAN has a flexible architecture in which new functionality can be added via extensions. We have already seen what CKAN provides out of the box. Simple API dataset hits counter: stores a counter of calls to the “show” API command for a given dataset. CKAN + DCAT: exposes dataset information as RDF. All the package/resource fields are mapped to the DCAT RDF vocabulary, which has the status of a W3C Recommendation.
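
The idea of mapping package and resource fields to DCAT can be illustrated with a hand-written sketch. This is not the extension's code and covers only a small subset of fields: it assumes the common correspondences title to `dct:title`, notes to `dct:description`, tags to `dcat:keyword`, and resource URL to `dcat:accessURL`. The function name, the tuple representation, and the distribution IRI scheme are hypothetical.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def package_to_triples(package, dataset_iri):
    """Map a CKAN package dict to (subject, predicate, object) tuples.

    Objects are plain strings; a real serializer would distinguish IRIs
    from literals.
    """
    triples = [(dataset_iri, RDF_TYPE, DCAT + "Dataset")]
    if package.get("title"):
        triples.append((dataset_iri, DCT + "title", package["title"]))
    if package.get("notes"):
        triples.append((dataset_iri, DCT + "description", package["notes"]))
    for tag in package.get("tags", []):
        triples.append((dataset_iri, DCAT + "keyword", tag["name"]))
    for resource in package.get("resources", []):
        # Assumption: distribution IRIs are derived from the resource id.
        distribution_iri = f"{dataset_iri}/resource/{resource['id']}"
        triples.append((dataset_iri, DCAT + "distribution", distribution_iri))
        triples.append((distribution_iri, RDF_TYPE, DCAT + "Distribution"))
        if resource.get("url"):
            triples.append((distribution_iri, DCAT + "accessURL", resource["url"]))
    return triples
```

Each CKAN package becomes a `dcat:Dataset` and each resource a `dcat:Distribution`, which is the general shape of the mapping described in the note above.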
CKAN is relatively easy to deploy. The most complex option is installation from source: it requires manually installing CKAN itself into a Python virtual environment and setting up Apache Solr (a full-text index built on Lucene). Installing from source is strictly necessary only if you want to write your own extension or modify the source code for some reason. The second option, installation from an operating system package, can be a good choice if you wish to run only one CKAN instance per server (or virtual machine). The drawback of the packages is that they are not very well maintained; in other words, you may have to wait a long time for an update. The third option is relatively new and by far the most suitable for large-scale deployment, or if you need several CKAN instances per server/VM. The Docker image is assembled from the source code, and the latest image is available on Docker Hub. If it is not available, you can build it yourself. The overhead here is that you need a person who can work with Docker. The environment we prefer at AKSW is the latest long-term support version of Ubuntu Server.
PublicData.eu is an initiative to create a one-stop portal for data in Europe. Aggregation was not part of the initial CKAN functionality; a special harvest extension was developed for this purpose. Local governments can therefore deploy their own CKAN instances, which can then be aggregated.
You need a good API for your metadata to support the creation of cool applications on top of the data.
