5 years of Dataverse evolution
Slava Tykhonov
Senior Information Scientist,
Research & Innovation meeting (DANS-KNAW)
26.01.2021
Dataverse based Clio Infra collaboration platform (2015)
Clio Infra functionality based on the Dataverse solution:
- teams curate, share and analyze research datasets collaboratively
- team members can share the responsibility for collecting data on specific variables
(for example, countries) and inform each other about changes and additions
- a dataset version control system tracks changes in datasets
- other researchers can download their own copy of the data if the dataset is
published as Open Data
Dataverse serves as a flexible metadata store that is connected to the research
dataset storage by a data processing engine
Interactive Clio Infra Dashboard with data in Dataverse (2015)
DANS Dataverse 3.x migration (2016)
Basic DataverseNL services:
• Federated login for Netherlands institutions
• Persistent Identifier Services (DOI and Handle)
• Integration with archival systems
Applications:
• Modern and historical world map visualisations
• Data API and Geo API services for projects with data
• Panel datasets constructor
• Time series plot
• Treemaps
• Pie and chart visualizations
• Descriptive statistics tools
Major challenges to provide services for researchers
● Maintenance concerns - who will be in charge after the project is finished?
● Infrastructure problems - how to install and run tools for researchers?
● Various interoperability issues - how to enable data exchange between
different systems and services?
Also: software updates and bug fixing, licences, technical staff training, legal aspects
and so on...
The influence of APIs standards on innovation
Source: V. Tykhonov “API Economy”
Interoperability in EOSC
● Technical interoperability is defined as the “ability of different information technology systems
and software applications to communicate and exchange data”. It should allow “to accept
data from each other and perform a given task in an appropriate and satisfactory manner
without the need for extra operator intervention”.
● Semantic interoperability is “the ability of computer systems to transmit data with
unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine
computable logic, inferencing, knowledge discovery, and data”.
● Organisational interoperability refers to the “way in which organisations align their
business processes, responsibilities and expectations to achieve commonly agreed and
mutually beneficial goals. Focus on the requirements of the user community by making
services available, easily identifiable, accessible and user-focused”.
● Legal interoperability covers “the broader environment of laws, policies, procedures and
cooperation agreements”
Source: EOSC Interoperability Framework v1.0
Open vs Closed Innovation
DANS Data Stations - Future Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
Dataverse architecture in a nutshell
Basic components: database (PostgreSQL), search index (Solr) and web application (Glassfish/Payara)
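Because the platform is API based, the search index can be queried over HTTP without touching the web interface. The sketch below is a minimal illustration, assuming the Dataverse Search API path `/api/search`; the host name and the sample response are hypothetical placeholders, and only the fields read here (`data.items`, `name`, `global_id`) are relied on.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Build a Search API URL for a Dataverse installation."""
    params = urlencode({"q": query, "type": item_type, "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def extract_hits(response_text):
    """Return (name, global_id) pairs from a Search API JSON response."""
    payload = json.loads(response_text)
    items = payload.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


# Hypothetical, trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = json.dumps({
    "status": "OK",
    "data": {
        "total_count": 1,
        "items": [
            {"name": "Example dataset",
             "type": "dataset",
             "global_id": "doi:10.5072/FK2/EXAMPLE"}
        ],
    },
})

if __name__ == "__main__":
    # dataverse.example.org is a placeholder host, not a real installation.
    print(build_search_url("https://dataverse.example.org", "covid"))
    print(extract_hits(SAMPLE_RESPONSE))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose so the sketch runs offline.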
Simple but powerful!
How about maintenance?
Dataverse Docker module (CESSDA Dataverse, 2018)
Source: https://github.com/IQSS/dataverse-docker
The Cathedral and the Bazaar
“The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary
(abbreviated CatB) is an essay, and later a book, by Eric S. Raymond on software engineering methods,
based on his observations of the Linux kernel development process and his experiences managing an
open source project, fetchmail. It examines the struggle between top-down and bottom-up design.”
Wikipedia
Some important points:
● Smart data structures and dumb code works a lot better than the other way
around
● When writing gateway software of any kind, take pains to disturb the data
stream as little as possible—and never throw away information unless the
recipient forces you to!
● Any tool should be useful in the expected way, but a truly great tool lends itself
to uses you never expected
Principle of good enough
The principle of good enough or "good enough" principle is a rule in software and systems design. It
indicates that consumers will use products that are good enough for their requirements, despite the
availability of more advanced technology.
Wikipedia
The KISS principle (“Keep it simple, stupid”) provides a series of design rules, among them:
● Separate mechanisms from policy
● Write simple programs
● Write transparent programs
● Value developer time over machine time
● Make data complicated when required, not the program
● Build on potential users' expected knowledge
● Write programs which fail in a way that is easy to diagnose
● Prototype software before polishing it
● Make the program and protocols extensible
What should be simplified to make Dataverse “good enough”?
“One-liner” installation requirements include:
● even users without any technical knowledge should be able to install it
● simple, clear and transparent infrastructure ready for integration (Docker based)
● reverse proxy and load balancer should be set up both locally and on a remote host to
run Dataverse website (Nginx/Traefik)
Q: How do we cross the chasm?
A: Let’s try to capture the mainstream!
Using Dataverse to fight against COVID-19
1300+ people registered in the organization
Jupyter integration: dataset conversion to pandas DataFrame
Can AI researchers read and reuse data directly from Dataverse in a collaborative way?
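One way such a conversion could look is sketched below. It assumes the Dataverse Data Access API path `/api/access/datafile/{id}` and that an ingested tabular file is served as tab-delimited text; the host, the file id and the sample rows are hypothetical. The pandas step is imported lazily so the parsing part runs with the standard library alone.

```python
import csv
import io


def datafile_url(base_url, file_id):
    """URL of the Data Access API endpoint for a single file."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"


def parse_tab_delimited(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def to_dataframe(text):
    """Convert the same text into a pandas DataFrame (requires pandas)."""
    import pandas as pd  # lazy import: only needed inside a notebook
    return pd.read_csv(io.StringIO(text), sep="\t")


# Hypothetical file content standing in for a downloaded tabular file.
SAMPLE_TAB = "country\tyear\tcases\nNL\t2020\t100\nDE\t2020\t250\n"

if __name__ == "__main__":
    print(datafile_url("https://dataverse.example.org", 42))
    for row in parse_tab_delimited(SAMPLE_TAB):
        print(row)
```

In a notebook, the text returned by the download request would be passed to `to_dataframe` instead of the sample string.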
Crossing the chasm...
Technology adoption requires further automation of all processes.
Our goal is to deliver production ready Dataverse for the European Open Science
Cloud (EOSC):
● SSHOC project: Docker/Kubernetes, common CI/CD pipeline, integration
tests, previewers, language localization, external tools
● EOSC Synergy Software Quality Assurance as a Service (SQAaaS) pipeline integration
● CLARIAH - leveraging metadata schema with CLARIN community, CLARIN
tools integration, development of common pipelines
● FAIRsFAIR - enabling FAIR Data Points (FDP) in Dataverse
● ODISSEI - using Dataverse as a data registry
Services in European Open Science Cloud (EOSC)
● EOSC requires at least level 8 of maturity
● we need the highest quality of software to be accepted as a service
● clear and transparent evaluation of services is essential
● evidence of technical maturity is the key to success
● a limited warranty makes it possible to stop out-of-warranty services
Running Dataverse in production on Cloud
[Diagram: Users, HTTP(S) Load Balancer, Kubernetes Engine and Container Registry; a Kubernetes cluster node running the Dataverse, Solr, PostgreSQL and Email Relay deployments with their services, plus a Certbot cronjob and service]
Dataverse Kubernetes
Project maintained by Oliver Bertuch (FZ Jülich) and available on the GitHub of the
Global Dataverse Community Consortium (GDCC)
Google Cloud, Amazon AWS and Microsoft Azure platforms are supported
Open Source, community pull requests are welcome
http://github.com/IQSS/dataverse-kubernetes
SQA process with Selenium tests for Dataverse
Selenium IDE lets you create and replay all UI tests in your browser
Shared tests can be reused by the community to increase reproducibility
SQA for service maturity = unit tests + integration tests
Source: SSHOC project, data repositories task WP5.2
CI/CD pipeline with SQAaaS (S)
[Diagram: git push, webhook, git clone, Jenkins pipeline (Jenkinsfile), Run SQA (S), create Docker image, push to GCP container registry, Kubernetes deployment]
1. Developer pushes code to GitHub
2. Jenkins receives notification - build trigger
3. Jenkins clones the workspace
4. (S) Runs SQA tests and does FAIRness check
5. (S) Issues a digital badge according to the results
6. (S) SQAaaS API triggers the appropriate workflow
7. Creates a Docker image on success
8. Pushes the new Docker image to the container registry
9. Updates the Kubernetes deployment
Source: EOSC Synergy project
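The gating logic of the nine steps above can be sketched as a small simulation. This is an illustration only and not the project's Jenkinsfile: the class and function names are hypothetical, and the one rule it encodes is that image creation, the registry push and the deployment update happen only when the SQA stage succeeds.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRun:
    """Record of one simulated pipeline run."""
    log: list = field(default_factory=list)
    badge: str = ""
    deployed: bool = False


def run_pipeline(sqa_passed: bool) -> PipelineRun:
    """Walk through the stages; stop before the build if SQA fails."""
    run = PipelineRun()
    run.log.append("1. code pushed to GitHub")
    run.log.append("2. Jenkins build triggered by notification")
    run.log.append("3. workspace cloned")
    run.log.append("4. SQA tests and FAIRness check executed")
    run.badge = "passed" if sqa_passed else "failed"
    run.log.append(f"5. digital badge issued: {run.badge}")
    run.log.append("6. SQAaaS API triggers the appropriate workflow")
    if not sqa_passed:
        # Quality gate: nothing is built or deployed on failure.
        return run
    run.log.append("7. Docker image created")
    run.log.append("8. image pushed to the container registry")
    run.log.append("9. Kubernetes deployment updated")
    run.deployed = True
    return run


if __name__ == "__main__":
    for step in run_pipeline(sqa_passed=True).log:
        print(step)
```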
Data Commons is essential for integrations
Mercè Crosas, “Harvard Data Commons”
FAIR Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Our goals to increase Dataverse interoperability
Provide a custom FAIR metadata schema for European research communities:
● CESSDA metadata (Consortium of European Social Science Data Archives)
● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community
Connect metadata to ontologies and CVs:
● link metadata fields to common ontologies (Dublin Core, DCAT)
● define semantic relationships between (new) metadata fields (SKOS)
● select available external controlled vocabularies for the specific fields
● provide multilingual access to controlled vocabularies
One metadata field can be linked to many ontologies
Language switch in Dataverse will change the language of suggested terms!
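The idea of one field linked to several ontologies, with suggestions that follow the interface language, can be sketched with plain dictionaries. Everything named here is an assumption made for illustration: the vocabulary entries, the `vocab.example.org` URIs and the helper `suggest_terms` are hypothetical, while the two property URIs are the Dublin Core `subject` and DCAT `keyword` terms.

```python
# One metadata field mapped to properties from two ontologies.
FIELD_ONTOLOGIES = {
    "keyword": [
        "http://purl.org/dc/terms/subject",
        "http://www.w3.org/ns/dcat#keyword",
    ],
}

# Hypothetical SKOS-like concepts with preferred labels per language.
VOCABULARY = [
    {"uri": "https://vocab.example.org/concept/1",
     "prefLabel": {"en": "Migration", "nl": "Migratie", "de": "Migration"}},
    {"uri": "https://vocab.example.org/concept/2",
     "prefLabel": {"en": "Labour market", "nl": "Arbeidsmarkt"}},
]


def suggest_terms(prefix, lang, fallback="en"):
    """Suggest (label, uri) pairs whose label starts with the prefix.

    The label is taken in the requested language and falls back to
    the fallback language when no translation exists.
    """
    suggestions = []
    for concept in VOCABULARY:
        labels = concept["prefLabel"]
        label = labels.get(lang) or labels.get(fallback)
        if label and label.lower().startswith(prefix.lower()):
            suggestions.append((label, concept["uri"]))
    return suggestions


if __name__ == "__main__":
    print(FIELD_ONTOLOGIES["keyword"])
    print(suggest_terms("mig", "nl"))
    print(suggest_terms("lab", "de"))
```

Storing the concept URI next to the label keeps the stored value stable when the language changes.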
The FAIR Signposting Profile
Herbert Van de Sompel
https://hvdsomp.info
Two levels of access to Web resources:
● level one provides a concise, minimal set of links by value in the HTTP header
● level two delivers a complete, comprehensive set of links by reference, in a
standalone document (link set)
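A level-one client only needs to read the HTTP `Link` header. The sketch below is a simplified parser, assuming every parameter is written as `name="value"` and that values contain no quotes; the header content and the `dataverse.example.org` URLs are hypothetical, while `cite-as`, `describedby` and `item` are link relation types used by Signposting.

```python
import re

# One link: <target> followed by zero or more ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Group link targets by relation type."""
    links = {}
    for target, params in LINK_RE.findall(value):
        attrs = dict(PARAM_RE.findall(params))
        # A single link may carry several space-separated relations.
        for rel in attrs.get("rel", "").split():
            links.setdefault(rel, []).append(target)
    return links


# Hypothetical header for a dataset landing page.
SAMPLE_HEADER = (
    '<https://doi.org/10.5072/FK2/EXAMPLE>; rel="cite-as", '
    '<https://dataverse.example.org/meta.jsonld>; rel="describedby"; '
    'type="application/ld+json", '
    '<https://dataverse.example.org/file/1>; rel="item"; type="text/csv", '
    '<https://dataverse.example.org/file/2>; rel="item"; type="text/csv"'
)

if __name__ == "__main__":
    for rel, targets in parse_link_header(SAMPLE_HEADER).items():
        print(rel, targets)
```

For level two, the same relations would be read from the standalone link set document instead of the header.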
Dataverse meta(data) in FAIR Data Point (FDP)
● RESTful web service that enables data owners to expose their data sets using
rich machine-readable metadata
● Provides standardized descriptions (RDF-based metadata) using
controlled vocabularies and ontologies
● FDP spec is public
Source: FDP
The goal is to run FDP on the Dataverse side (DCAT, CVs) and
provide metadata export in RDF!
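As a rough idea of what an RDF export could contain, the sketch below builds a minimal DCAT description serialized as JSON-LD, which is one RDF syntax. It is not the FDP or Dataverse export format: the function name and the example values are hypothetical, and only a handful of DCAT and Dublin Core terms are used.

```python
import json


def dataset_to_dcat(title, identifier, keywords, landing_page):
    """Describe one dataset as a dcat:Dataset in JSON-LD."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": landing_page,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:identifier": identifier,
        "dcat:keyword": list(keywords),
        "dcat:landingPage": {"@id": landing_page},
    }


if __name__ == "__main__":
    # All values below are placeholders.
    doc = dataset_to_dcat(
        title="Example dataset",
        identifier="doi:10.5072/FK2/EXAMPLE",
        keywords=["history", "economy"],
        landing_page="https://dataverse.example.org/dataset/1",
    )
    print(json.dumps(doc, indent=2))
```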
F-UJI Automated FAIR Data Assessment Tool
Dataverse localization with Weblate
● a service that connects files to Weblate so they can be translated in a structured way
● several options for project visibility: accept translations from the crowd, or only
give access to a select group of translators
● Weblate indicates untranslated strings, strings with failing checks, and strings
that need approval
● when new strings are added with an upgrade of Dataverse, Weblate can indicate
which strings are new and untranslated
GUI translation with Weblate as a service
Source: SSHOC Weblate
Dataverse App Store
Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render,
audio, JSON, GeoJSON/Shapefiles/Map, XML
Interoperability: external controlled vocabularies (CESSDA CV Manager)
Data processing: NESSTAR DDI migration tool
Linked Data: RDF compliance including SPARQL endpoint (FDP)
Federated login: eduGAIN, PIONIER ID
CLARIN Switchboard integration: Natural Language Processing tools
Visualization tools (maps, charts, timelines)
Dataverse and CLARIN tools integration
Make Data Count
Make Data Count is part of a broader Research Data Alliance (RDA) Data Usage Metrics Working Group
which helped to produce a specification called the COUNTER Code of Practice for Research Data.
The following metrics can be downloaded directly from the DataCite hub for datasets hosted by Dataverse
installations:
● Total Views for a Dataset
● Unique Views for a Dataset
● Total Downloads for a Dataset
● Unique Downloads for a Dataset
● Citations for a Dataset (via Crossref)
The Dataverse Metrics API is a powerful source for BI tools used for data landscape monitoring.
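A BI tool would typically poll the Metrics API and store the returned counts. The sketch below assumes the endpoint pattern `/api/info/metrics/{metric}` with an optional `/toMonth/{yyyy-MM}` suffix, and it assumes the count is returned under `data.count`; the host and the sample response are hypothetical.

```python
import json


def metrics_url(base_url, metric, to_month=None):
    """Build a Metrics API URL, optionally cumulative up to a month."""
    url = f"{base_url.rstrip('/')}/api/info/metrics/{metric}"
    if to_month:
        url += f"/toMonth/{to_month}"
    return url


def read_count(response_text):
    """Extract the count from a Metrics API JSON response."""
    payload = json.loads(response_text)
    return payload["data"]["count"]


# Hypothetical response body used to exercise the parser offline.
SAMPLE_METRICS = '{"status": "OK", "data": {"count": 1234}}'

if __name__ == "__main__":
    print(metrics_url("https://dataverse.example.org", "downloads", "2021-01"))
    print(read_count(SAMPLE_METRICS))
```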
Metrics for BI and integration with Apache Superset
Source: Apache Superset (Open Source)
Apache Superset visualizations
Apache Airflow for Dataverse pipelines
● Intended for acyclic processes, in particular those that process data and have a
point of "completion"
● A DAG (Directed Acyclic Graph) is a collection of all the tasks, organized in
a way that reflects their relationships and dependencies
● An absolutely essential component for harvesting and depositing data
● The Airflow dashboard gives a clear overview and status of all
running processes
On the roadmap of ODISSEI project!
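The DAG idea itself can be shown without Airflow. The sketch below uses the Python standard library `graphlib` (Python 3.9+) rather than the Airflow API; the task names are hypothetical stand-ins for a harvesting and depositing pipeline. Each task maps to the set of tasks that must finish before it starts.

```python
from graphlib import CycleError, TopologicalSorter

# task -> tasks it depends on (hypothetical pipeline)
PIPELINE = {
    "harvest": set(),
    "validate": {"harvest"},
    "convert_metadata": {"validate"},
    "deposit": {"convert_metadata"},
    "notify": {"deposit"},
}


def execution_order(dag):
    """Return the tasks in an order that respects all dependencies."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """True when the graph has no cycle and can therefore complete."""
    try:
        execution_order(dag)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    # A cycle means there is no point of completion.
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

A scheduler such as Airflow adds retries, scheduling and the dashboard on top of this ordering.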
Conclusion
Due to its open architecture and the use of open standards, the Dataverse team has
managed to attract the best people, create a strong community, and finally
build a product completely aligned with the principles of Open Innovation.
Future-proof and community-driven, it has every chance to “cross the chasm”
and become a prominent FAIR data repository on all continents.
Dataverse already has a very rich ecosystem for technological innovation that will
make it possible to integrate tools which don't exist yet.
“Any tool should be useful in the expected way, but a truly great tool
lends itself to uses you never expected”...
Questions?
Slava Tykhonov,
Senior Information Scientist
vyacheslav.tykhonov@dans.knaw.nl

Más contenido relacionado

La actualidad más candente

External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryvty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataversevty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataversevty
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)vty
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2vty
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesVyacheslav Tykhonov
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Andrea Scharnhorst
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloudvty
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloudvty
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repositoryvty
 

La actualidad más candente (20)

External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repository
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research data
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemes
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloud
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and SparqlifyLOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and Sparqlify
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repository
 
LOD2 webinar series: Virtuoso by OpenLink Software
LOD2 webinar series: Virtuoso by OpenLink SoftwareLOD2 webinar series: Virtuoso by OpenLink Software
LOD2 webinar series: Virtuoso by OpenLink Software
 

Similar a 5 years of Dataverse evolution

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Persistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU projectPersistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU projectvty
 
{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell Technologies{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell TechnologiesThe {code} Team
 
Analytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsAnalytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsRay Février
 
CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018Krishna-Kumar
 
The New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack
 
Cloudera federal summit
Cloudera federal summitCloudera federal summit
Cloudera federal summitMatt Carroll
 
Frequently Used Terms in Data Centers
Frequently Used Terms in Data CentersFrequently Used Terms in Data Centers
Frequently Used Terms in Data CentersHTS Hosting
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Denodo
 
ACdP Fiware.pdf
ACdP Fiware.pdfACdP Fiware.pdf
ACdP Fiware.pdfMASSAL3
 
Lviv Data Science Club (Sergiy Lunyakin)
Lviv Data Science Club (Sergiy Lunyakin)Lviv Data Science Club (Sergiy Lunyakin)
Lviv Data Science Club (Sergiy Lunyakin)Lviv Startup Club
 
Scientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & FutureScientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & Futurestratuslab
 
Powering Microservices with Docker
Powering Microservices with DockerPowering Microservices with Docker
Powering Microservices with DockerCognizant
 
Introducing the Open Container Project
Introducing the Open Container ProjectIntroducing the Open Container Project
Introducing the Open Container ProjectAndrew Kennedy
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Data Services and the Modern Data Ecosystem (ASEAN)
Data Services and the Modern Data Ecosystem (ASEAN)Data Services and the Modern Data Ecosystem (ASEAN)
Data Services and the Modern Data Ecosystem (ASEAN)Denodo
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 

Similar a 5 years of Dataverse evolution (20)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Persistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU projectPersistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU project
 
{code} and containers
{code} and containers{code} and containers
{code} and containers
 
{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell Technologies{code} and Containers - Open Source Infrastructure within Dell Technologies
{code} and Containers - Open Source Infrastructure within Dell Technologies
 
Analytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsAnalytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle Applications
 
CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018
 
The New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack Container Summit Talk
The New Stack Container Summit Talk
 
Cloudera federal summit
Cloudera federal summitCloudera federal summit
Cloudera federal summit
 
Frequently Used Terms in Data Centers
Frequently Used Terms in Data CentersFrequently Used Terms in Data Centers
Frequently Used Terms in Data Centers
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
ACdP Fiware.pdf
ACdP Fiware.pdfACdP Fiware.pdf
ACdP Fiware.pdf
 
Lviv Data Science Club (Sergiy Lunyakin)
Lviv Data Science Club (Sergiy Lunyakin)Lviv Data Science Club (Sergiy Lunyakin)
Lviv Data Science Club (Sergiy Lunyakin)
 
Scientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & FutureScientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & Future
 
Powering Microservices with Docker
Powering Microservices with DockerPowering Microservices with Docker
Powering Microservices with Docker
 
Introducing the Open Container Project
Introducing the Open Container ProjectIntroducing the Open Container Project
Introducing the Open Container Project
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Data Services and the Modern Data Ecosystem (ASEAN)
Data Services and the Modern Data Ecosystem (ASEAN)Data Services and the Modern Data Ecosystem (ASEAN)
Data Services and the Modern Data Ecosystem (ASEAN)
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 

Más de vty

Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC projectvty
 

Más de vty (6)

Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC project
 

5 years of Dataverse evolution

  • 1. 5 years of Dataverse evolution Slava Tykhonov Senior Information Scientist, Research & Innovation meeting (DANS-KNAW) 26.01.2021
  • 2. Dataverse based Clio Infra collaboration platform (2015) Clio Infra functionality based on the Dataverse solution: - teams curate, share and analyze research datasets collaboratively - team members can share the responsibility to collect data on specific variables (for example, countries) and inform each other about changes and additions - the dataset version control system tracks changes in datasets - other researchers can download their own copy of the data if the dataset is published as Open Data. Diagram: Dataverse as a flexible metadata store connected with the research datasets storage by a data processing engine
  • 3. Interactive Clio Infra Dashboard with data in Dataverse (2015)
  • 4. DANS Dataverse 3.x migration (2016) Basic DataverseNL services: • Federated login for Netherlands institutions • Persistent Identifier Services (DOI and handle) • Integration with archival systems Applications: • Modern and historical world maps visualisations • Data API and Geo API services for projects with data • Panel datasets constructor • Time series plot • Treemaps • Pie and chart visualizations • Descriptive statistics tools
  • 5. Major challenges in providing services for researchers ● Maintenance concerns - who will be in charge after the project is finished? ● Infrastructure problems - how to install and run tools for researchers? ● Various interoperability issues - how to leverage data exchange between different systems and services? Software updates and bug fixing, licences, technical staff training, legal aspects and so on...
  • 6. The influence of API standards on innovation Source: V. Tykhonov “API Economy”
  • 7. Interoperability in EOSC ● Technical interoperability defined as the “ability of different information technology systems and software applications to communicate and exchange data”. It should allow “to accept data from each other and perform a given task in an appropriate and satisfactory manner without the need for extra operator intervention”. ● Semantic interoperability is “the ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data”. ● Organisational interoperability refers to the “way in which organisations align their business processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial goals. Focus on the requirements of the user community by making services available, easily identifiable, accessible and user-focused”. ● Legal interoperability covers “the broader environment of laws, policies, procedures and cooperation agreements” Source: EOSC Interoperability Framework v1.0
  • 8. Open vs Closed Innovation
  • 9. DANS Data Stations - Future Data Services Dataverse is an API-based data platform and a key framework for Open Innovation!
  • 10. Dataverse architecture in a nutshell Basic components: database (PostgreSQL), search index (Solr) and web application (Glassfish/Payara) Simple but powerful! How about maintenance?
  • 11. Dataverse Docker module (CESSDA Dataverse, 2018) Source: https://github.com/IQSS/dataverse-docker
  • 12. The Cathedral and the Bazaar “The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary (abbreviated CatB) is an essay, and later a book, by Eric S. Raymond on software engineering methods, based on his observations of the Linux kernel development process and his experiences managing an open source project, fetchmail. It examines the struggle between top-down and bottom-up design.” Wikipedia Some important points: ● Smart data structures and dumb code works a lot better than the other way around ● When writing gateway software of any kind, take pains to disturb the data stream as little as possible—and never throw away information unless the recipient forces you to! ● Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected
  • 13. Principle of good enough The principle of good enough or "good enough" principle is a rule in software and systems design. It indicates that consumers will use products that are good enough for their requirements, despite the availability of more advanced technology. Wikipedia The KISS Principle of "Keep it Simple, Stupid” provides a series of design rules, some of them: ● Separate mechanisms from policy ● Write simple programs ● Write transparent programs ● Value developer time over machine time ● Make data complicated when required, not the program ● Build on potential users' expected knowledge ● Write programs which fail in a way that is easy to diagnose ● Prototype software before polishing it ● Make the program and protocols extensible
  • 14. What should be simplified to make Dataverse “good enough”? “One-liner” installation requirements include: ● even users without any technical knowledge should be able to install it ● simple, clear and transparent infrastructure ready for integration (Docker based) ● reverse proxy and load balancer should be set up both locally and on a remote host to run Dataverse website (Nginx/Traefik) Q: How do we cross the chasm? A: Let’s try to capture the mainstream!
  • 15. Using Dataverse to fight against COVID-19 1300+ people registered in the organization
  • 16. Jupyter integration: datasets conversion to pandas dataframe Can AI researchers read and reuse data directly from Dataverse in a collaborative way?
  • 17. Crossing the chasm... Technology adoption requires further automation of all processes. Our goal is to deliver production ready Dataverse for the European Open Science Cloud (EOSC): ● SSHOC project: Docker/Kubernetes, common CI/CD pipeline, integration tests, previewers, language localization, external tools ● EOSC Synergy Software Quality Assurance as a Service (SQAaaS) pipeline integration ● CLARIAH - leveraging metadata schema with the CLARIN community, CLARIN tools integration, development of common pipelines ● FAIRsFAIR - enabling FAIR Data Points (FDP) in Dataverse ● ODISSEI - using Dataverse as a data registry
  • 18. Services in European Open Science Cloud (EOSC) ● EOSC requires at least level 8 of maturity ● we need the highest quality of software to be accepted as a service ● clear and transparent evaluation of services is essential ● the evidence of technical maturity is the key to success ● a limited warranty will make it possible to stop out-of-warranty services
  • 19. Running Dataverse in production on Cloud. Diagram components: Users, HTTP(S) Load Balancer, Kubernetes Engine, Container Registry, Kubernetes Cluster (K8S Cluster Node), Dataverse Deployment and Service, Solr Deployment and Service, PostgreSQL Deployment and Service, Email Relay Deployment and Service, Certbot Cronjob and Service
  • 20. Dataverse Kubernetes Project maintained by Oliver Bertuch (FZ Jülich) and available in the Global Dataverse Community Consortium (GDCC) GitHub. Google Cloud, Amazon AWS and Microsoft Azure platforms are supported. Open Source, community pull requests are welcome. http://github.com/IQSS/dataverse-kubernetes
  • 21. SQA process with Selenium tests for Dataverse. Selenium IDE allows you to create and replay all UI tests in your browser. Shared tests can be reused by the community to increase reproducibility. SQA for the service maturity = unit tests + integration tests. Source: SSHOC project, data repositories task WP5.2
  • 22. CI/CD pipeline with SQAaaS (S). Diagram: git push, webhook, git clone, Jenkins pipeline (Jenkinsfile), run SQA, create Docker image, push to GCP container registry, Kubernetes deployment. Steps: 1. Developer pushes code to GitHub 2. Jenkins receives notification - build trigger 3. Jenkins clones the workspace 4. (S) Runs SQA tests and does FAIRness check 5. (S) Issues a digital badge according to the results 6. (S) SQAaaS API triggers the appropriate workflow 7. Creates a Docker image on success 8. Pushes the new Docker image to the container registry 9. Updates the Kubernetes deployment. Source: EOSC Synergy project
  • 23. Data Commons is essential for integrations Source: Mercè Crosas, “Harvard Data Commons”
  • 24. FAIR Dataverse Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  • 25. Our goals to increase Dataverse interoperability Provide a custom FAIR metadata schema for European research communities: ● CESSDA metadata (Consortium of European Social Science Data Archives) ● Component MetaData Infrastructure (CMDI) metadata from CLARIN linguistics community Connect metadata to ontologies and CVs: ● link metadata fields to common ontologies (Dublin Core, DCAT) ● define semantic relationships between (new) metadata fields (SKOS) ● select available external controlled vocabularies for the specific fields ● provide multilingual access to controlled vocabularies
  • 26. One metadata field can be linked to many ontologies Language switch in Dataverse will change the language of suggested terms!
  • 27. The FAIR Signposting Profile Herbert Van de Sompel https://hvdsomp.info Two levels of access to Web resources: ● level one provides a concise, minimal set of links by value in the HTTP header ● level two delivers a complete, comprehensive set of links by reference, meaning in a standalone document (link set)
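The level-one case described on this slide, links delivered by value in the HTTP `Link` header, can be illustrated with a small parser. This is only a sketch using the Python standard library: the header value, the `example.org` URLs and the test DOI are made up for illustration, the function names are hypothetical, and the parser assumes that quoted parameter values contain no `;` or `,<` sequences.

```python
import re
from collections import defaultdict

# Hypothetical Link header of a dataset landing page (all URLs are placeholders).
EXAMPLE_LINK_HEADER = (
    '<https://doi.org/10.5072/FK2/EXAMPLE>; rel="cite-as", '
    '<https://example.org/file/1>; rel="item"; type="text/csv", '
    '<https://example.org/file/2>; rel="item"; type="application/pdf", '
    '<https://example.org/linkset/1>; rel="linkset"; type="application/linkset+json"'
)


def parse_link_header(value):
    """Split a Link header value into dicts with 'target' plus its parameters."""
    links = []
    # A new link starts at every comma that is followed by '<'.
    for part in re.split(r",\s*(?=<)", value.strip()):
        target, _, raw_params = part.partition(">")
        link = {"target": target.lstrip().lstrip("<")}
        for param in raw_params.split(";"):
            name, sep, val = param.partition("=")
            if sep:  # skip empty fragments that carry no name=value pair
                link[name.strip().lower()] = val.strip().strip('"')
        links.append(link)
    return links


def links_by_rel(value):
    """Group link targets by relation type (a rel may hold several values)."""
    grouped = defaultdict(list)
    for link in parse_link_header(value):
        for rel in link.get("rel", "").split():
            grouped[rel].append(link["target"])
    return dict(grouped)
```

A client would read the `cite-as` and `item` targets directly from the header (level one) and follow the `linkset` target only when it needs the complete set of links (level two).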
  • 28. Dataverse meta(data) in FAIR Data Point (FDP) ● RESTful web service that enables data owners to expose their data sets using rich machine-readable metadata ● Provides standardized descriptions (RDF-based metadata) using controlled vocabularies and ontologies ● FDP spec is public Source: FDP The goal is to run FDP on Dataverse side (DCAT, CVs) and provide metadata export in RDF!
  • 29. F-UJI Automated FAIR Data Assessment Tool
  • 30. Dataverse localization with Weblate ● service to connect files to Weblate in order to translate them in a structured way ● several options for project visibility: accept translations by the crowd, or only give access to a select group of translators. ● Weblate indicates untranslated strings, strings with failing checks, and strings that need approval. ● when new strings are added with an upgrade of Dataverse, Weblate can indicate which strings are new and untranslated.
  • 31. GUI translation with Weblate as a service Source: SSHOC Weblate
  • 32. Dataverse App Store ● Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render, audio, JSON, GeoJSON/Shapefiles/Map, XML ● Interoperability: external controlled vocabularies (CESSDA CV Manager) ● Data processing: NESSTAR DDI migration tool ● Linked Data: RDF compliance including SPARQL endpoint (FDP) ● Federated login: eduGAIN, PIONIER ID ● CLARIN Switchboard integration: Natural Language Processing tools ● Visualization tools (maps, charts, timelines)
  • 33. Dataverse and CLARIN tools integration
  • 34. Make Data Count Make Data Count is part of a broader Research Data Alliance (RDA) Data Usage Metrics Working Group which helped to produce a specification called the COUNTER Code of Practice for Research Data. The following metrics can be downloaded directly from the DataCite hub for datasets hosted by Dataverse installations: ● Total Views for a Dataset ● Unique Views for a Dataset ● Total Downloads for a Dataset ● Downloads for a Dataset ● Citations for a Dataset (via Crossref) Dataverse Metrics API is a powerful source for BI tools used for the Data Landscape monitoring.
  • 35. Metrics for BI and integration with Apache Superset Source: Apache Superset (Open Source)
  • 37. Apache Airflow for Dataverse pipelines ● Intended for acyclic processes, in particular those that process data up to a point of "completion" ● A DAG (Directed Acyclic Graph) is a collection of all the tasks, organized in a way that reflects their relationships and dependencies ● An absolutely essential component for harvesting and depositing data ● The Airflow dashboard gives a clear overview and status of all running processes On the roadmap of the ODISSEI project!
  • 38. Conclusion Due to the open architecture and the use of open standards, the Dataverse team has managed to attract the best people and create a strong community, and finally build a product completely aligned with the principles of Open Innovation. Suitable for the future and community-driven, it has every chance to “cross the chasm” and become a prominent FAIR data repository on all continents. Dataverse already has a very rich ecosystem for technological innovation that will make it possible to integrate tools which don't exist yet. “Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected”...
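The DAG idea on this slide (tasks ordered by their dependencies, with no cycles) can be sketched without Airflow itself. The example below is not Airflow code: it uses `graphlib` from the Python standard library (3.9+), and the task names of the harvesting and depositing pipeline are invented for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "harvest": set(),
    "validate": {"harvest"},
    "map_metadata": {"validate"},
    "deposit": {"map_metadata"},
    "reindex": {"deposit"},
    "notify": {"deposit"},
}


def execution_order(dag):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the dependency graph has no cycles, i.e. it is a DAG."""
    try:
        TopologicalSorter(dag).prepare()
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(execution_order(PIPELINE))
```

Tasks without a dependency between them, such as `reindex` and `notify` here, may appear in either order. A scheduler like Airflow uses that freedom to run independent tasks in parallel.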
  • 39. Questions? Slava Tykhonov, Senior Information Scientist vyacheslav.tykhonov@dans.knaw.nl