dans.knaw.nl
DANS is an institute of KNAW and NWO
The world of Docker and Kubernetes
How to create, set up and manage
a Kubernetes cluster at DANS: Dataverse pilot
Slava Tykhonov, Senior Information Scientist
Wilko Steinhoff, Senior Software Developer
(DANS-KNAW, The Hague, Netherlands)
11.02.2020
Why do we need Cloud Computing?
“Cloud computing is a style of computing in which scalable and
elastic IT is delivered as a service using Internet technologies.”
“Cloud Computing is transforming the way organisations
consume computer services.”
“We can run all our workload data of applications and
processes online over the internet remotely instead of using
physical hardware and software.”
“It’s less expensive and more secure.”
Dataverse is our Pilot Cloud Service
Dataverse as a FOSS product: good news
• Dataverse is Open Source software
• Great community with more than 100 contributors
• Contributions are coming from all continents
• Maintenance costs fall because all community members use
the same software and help each other
• Governance models can be reused by different countries
• Innovation in the Dataverse community moves very fast
Dataverse as a FOSS product: bad news
• Open Source doesn’t mean Free!
• Consider all required resources: both hardware and human
• Building a service is difficult, maintenance is expensive
• Integration with other services requires change management
and is sometimes not possible at all
• Technical development is fast, so expertise quickly falls out of date
• Requires continuous training and very good communication
between all partners
Dataverse Installation Guide
Instructions
http://guides.dataverse.org/en/latest/installation/
Before you start: installation
requires preparation!
Installation problems
On paper, the basic Dataverse infrastructure is very simple:
- application (Java, deployed on the Glassfish application server)
- database (PostgreSQL)
- search engine (Solr)
If you follow the guide and install everything manually…
there is a good chance that it will not work.
Why?!
You never know where the problem lies...
● OS-specific issues
● application-specific bugs
● differences between database versions
● search engine updates
● security patches
● hardware issues
● open/closed ports on your server
It’s even more complicated if you have to
patch the software and update a
working infrastructure every time…
locally, then on test, acceptance and production.
Typical infrastructure issues
And after it finally works, the security
officer tells you that all microservice
ports on all servers must be closed…
or an update to one of the software
components breaks the service
or a brand-new bot brings
your service down
or something else happens...
Remember: you have to reproduce and fix it
locally, then on test, acceptance and production.
Software Testing Process
Maintenance vs development
Typical outcome: hundreds or thousands of hours are lost, money is wasted,
and maintenance effort dominates development.
Quiet software development
That’s how unmaintainable projects typically die… R.I.P.
FAIRness of Software
Open Source vs Closed Source
Dark side of the Moon
Source: V. Tykhonov, API economy: transformation from closed to open innovation
Open Source paradigm for Sharing economy
Dataverse Unleashed
Dataverse isn’t competing against Figshare, Zenodo,
DSpace, CKAN, EASY or others…
Dataverse is a platform to build new innovative things
together, and to integrate all the other services.
Using Dataverse means you can join the Sharing
Economy in data and speed up your own innovation by building
on community developments.
Shared economy in the data landscape
● all partners are running the same basic data infrastructure
● source code is Open Source and shared
● community is making decisions about priorities
● new custom requirements can be implemented
independently by anyone and merged into master
(upstream)
● sustainability of software: unmaintained components
are usually replaced with well-maintained ones as the
product evolves
● two or more technical solutions to the same problem are
more than welcome
● the maturity of the community means the maturity of the software
Do you want to join? Use Docker for your software!
Sometimes innovation means less communication
“Docker offered a way to create independence between the
application and the infrastructure through a standardized
container format that could be created with easy-to-use
tooling.”
David Messina, CMO at Docker
Now ask yourself honestly: how much time do you spend talking to sysadmins
and convincing them to enable or install the tools you need?
Talking to another developer who works on the same code?
Reproducing the same bug on test, acceptance and production?
Docker features
• Extremely powerful configuration tool
• Lets you install software on any platform (Linux, Mac,
Windows)
• Any software can be installed from Docker as a standalone
container or as a container delivering microservices (database,
search engine, core service)
• Docker lets you run many instances of the same
software tool side by side on different ports
• Docker can be used to organise multilingual interfaces, for
example
Docker advantages
• Faster development and deployments
• Isolation of running containers makes it possible to scale apps up
• Portability saves time: the same image runs on the local
computer or in the cloud
• Snapshotting makes it possible to archive the state of Docker images
• Resource limits can be adjusted
Dataverse Docker module
This module was developed in the one-year CESSDA DataverseEU
project and is aimed at CESSDA Service Providers that have
limited technical resources. DANS led this project.
The goal was to deploy the Dataverse software on the CESSDA
Technical Infrastructure (Google Cloud). The project was funded
by the CESSDA 2018 workplan.
DataverseEU partners: ADP (Slovenia), AUSSDA (Austria),
GESIS (Germany), SND (Sweden), TARKI (Hungary),
Sciences Po (France), UKDA (UK), UniData (Italy), SODA
(Belgium), LSZDA (Latvia), DANS (Netherlands)
Docker deployment with k8s in Clouds
• Google Cloud (policy for CESSDA SaW)
• Microsoft Azure
• Amazon Cloud
• OpenShift Cloud
• local Docker installation (minikube)
Example: Dataverse as set of Docker microservices
Docker Desktop (Community Edition)
Ideal for developers and small teams looking to get started
with Docker https://www.docker.com/community-edition
Features:
- docker-for-desktop
- docker-compose support
- integrated Kubernetes (single-node cluster)
- Kitematic: visual Docker container management
Docker Hub
Docker Hub is a registry that hosts images
Example: https://hub.docker.com/_/httpd/
$ docker pull httpd
Push images to Docker Hub: https://docs.docker.com/docker-cloud/builds/push-images/
$ docker login
$ docker tag my_image $DOCKER_ID_USER/my_image
$ docker push $DOCKER_ID_USER/my_image
Docker concepts
• Containers are runnable instances of images
• Images are read-only filesystem snapshots from which containers are started
• Containers can be committed to images and run in
different clouds
• Images can be preserved in repositories
https://act.dataverse.nl/dataset.xhtml?persistentId=hdl:10695/9VCRBR
• Data folders can be hosted outside containers on
persistent volumes.
Hello world app (Flask application)
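Dockerfile: https://github.com/DANS-KNAW/parthenos-widget/blob/master/Dockerfile
# Base image with Python 2.7 preinstalled
FROM python:2.7
# LABEL replaces the deprecated MAINTAINER instruction
LABEL maintainer="Vyacheslav Tykhonov"
# Copy the application source into the image and work from there
COPY . /widget
WORKDIR /widget
# Install the Python dependencies
RUN pip install -r requirements.txt
# Start the Flask application with: python app.py
ENTRYPOINT ["python"]
CMD ["app.py"]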
Docker command line usage
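The command line lets you manage containers and images and
execute Docker commands
$ docker help run   # show help for a subcommand
$ docker ps         # list running containers
$ docker login      # authenticate against a registry
$ docker pull       # download an image from a registry
$ docker push       # upload an image to a registry
$ docker commit     # create an image from a container
$ docker build      # build an image from a Dockerfile
$ docker run        # start a new container from an image
$ docker exec       # run a command inside a running container
$ docker stop       # stop a running container
$ docker rm         # remove a container
$ docker rmi        # remove an image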
Typical Docker pipeline
Build the image, installing all dependencies from scratch:
$ docker build -t parthenos:latest .
Run the image from the command line:
$ docker run -p 8081:8081 --name parthenos parthenos:latest
Check that the container is running:
$ docker ps | grep parthenos
Open a shell inside the container:
$ docker exec -it [CONTAINER_ID] /bin/bash
Copy configuration into the container:
$ docker cp ./parthenos.config [CONTAINER_ID]:/widget
Copy from the container to a local folder:
$ docker cp [CONTAINER_ID]:/widget/. ./
Tag the “dockerized” app and ship it to the world (Docker Hub or another registry):
$ docker tag parthenos:latest [REPOSITORY]/parthenos:latest
$ docker push [REPOSITORY]/parthenos:latest
Pipeline explanation
Credits: Arun Gupta, Package your Java EE Application using Docker and Kubernetes
Docker archiving process
A simple process for archiving running software, metadata and data
separately
https://docs.docker.com/engine/reference/commandline/save/
• PostgreSQL database with metadata and user information
• dataset files in a separate folder
• software image with some individual settings
# docker save works on images, so commit the container to an image first
$ docker save -o archive.tar [IMAGE]
The complete system, with data and metadata, is easy to restore with
Docker Compose.
$ docker load -i archive.tar
Docker Compose
Management tool for configuring multi-container Docker solutions
All connections, networks, containers and port specifications are stored in one file
(YAML specification)
Example (DataverseEU):
http://github.com/IQSS/dataverse-docker
Kompose is a tool that converts Docker Compose files to Kubernetes configuration:
https://github.com/kubernetes/kompose
Usage:
$ docker-compose [command]
Docker Compose is a perfect tool for keeping the PROVenance of software
(version control, etc.)
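To make the "everything in one file" idea concrete, here is a minimal Python sketch that models a Compose file as a dictionary and derives a start order from its depends_on entries. It is only an illustration, not the DataverseEU configuration: the image name example/dataverse:4.18.1, the container port 8080 and the exact service layout are assumptions, and graphlib requires Python 3.9 or later. JSON is valid YAML, so the printed structure could in principle be saved and used as a Compose file.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-service stack mirroring the application, database and
# search engine named in this deck. Image names and ports are assumptions.
compose = {
    "version": "3",
    "services": {
        "postgres": {"image": "postgres"},
        "solr": {"image": "solr"},
        "dataverse": {
            "image": "example/dataverse:4.18.1",
            "ports": ["8085:8080"],  # host port 8085 as used in this deck
            "depends_on": ["postgres", "solr"],
        },
    },
}


def start_order(compose_file):
    """Return service names so that dependencies come before dependants."""
    graph = {
        name: set(service.get("depends_on", []))
        for name, service in compose_file["services"].items()
    }
    return list(TopologicalSorter(graph).static_order())


order = start_order(compose)
print(json.dumps(compose, indent=2))
print("start order:", order)
```

Because the whole stack is described in one versioned file, a change to a port or an image tag shows up as an ordinary diff in version control.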
Dataverse Docker containers exploration
# Show Docker images
docker images
# Show all running containers
docker ps
# Remove a Docker image by image_id (don’t execute)
docker rmi image_id
# Delete all images (don’t execute)
docker rmi $(docker images -aq)
# To access Dataverse container, type exit to quit
docker exec -it dataverse /bin/bash
# PostgreSQL container, exit to quit
docker exec -it postgres /bin/bash
# Solr container, exit to quit
docker exec -it solr /bin/bash
# Copy files and folders to the running container
docker cp ./testfile dataverse:/tmp/
# Copy files and folders from the running container to your disk space
docker cp dataverse:/opt/dv/dvinstall.zip /tmp/
# Stop Dataverse container
docker stop dataverse
# Start the stopped Dataverse container again
docker start dataverse
Dataverse maintenance with Docker
# Open the page with the latest Dataverse release: https://github.com/IQSS/dataverse/releases
# Follow the upgrade instructions, which cover the war and zip files and optionally a .tsv or .xml schema
docker exec -it dataverse /bin/bash
wget https://github.com/IQSS/dataverse/releases/download/v4.18.1/dataverse-4.18.1.war -O dataverse.war
asadmin undeploy dataverse
rm -rf glassfish4/glassfish/domains/domain1/generated
asadmin deploy ./dataverse.war
asadmin restart-domain
# After Glassfish has restarted, go to 0.0.0.0:8085 and check the Dataverse version
# Remember: changes made inside the container are lost when the container is removed and recreated!
Maintenance of Docker infrastructure
# Go to hub.docker.com and create an account.
# Log in with your credentials; remember your_docker_name
docker login
# Create an image from the running Dataverse container
docker commit dataverse
# The new image appears at the top of the list
docker images
# Tag the image for your Docker Hub repository; replace your_docker_name
docker tag new_dataverse_image_id [your_docker_name]/dataverse:4.18.1
# Push the new image to Docker Hub
docker push [your_docker_name]/dataverse:4.18.1
# Go to Docker Hub to check that the repo was updated:
https://hub.docker.com/r/[your_docker_name]/dataverse
# Visit https://docs.docker.com/docker-hub/repos/#pushing-a-docker-container-image-to-docker-hub
# if you need more information about updating Docker images
How to set up, configure and manage Kubernetes clusters at
DANS, with emphasis on architecture, ICT support and DevOps
POC Azure
management
Azure
Best practices for using and managing the DANS Azure
subscription.
Azure: cloud computing platform by Microsoft.
Azure@DANS is provided by SURFcumulus.
Cloud resources, such as:
- Virtual Machine (VM)
- Storage (disk)
- SQL database
- Kubernetes (AKS)
Kubernetes
Open-source container-orchestration system for
automating application deployment, scaling, and
management.
- Docker container orchestration
- Infrastructure as Code
- Health checks and automatic restarting of applications
- (Auto)scaling of the cluster (horizontally and vertically)
- Controlled use of resources (CPU, memory)
- Setting up an application stack for local development
Best K8S practices
In this project we’ll look into some K8S best
practices for DANS,
based on issues raised in earlier POCs:
- Docker@DANS (2018)
- HUC2 POC (2019)
- Cluster architecture
  Application-wide or organisation-wide?
  DTAP: Development, Testing, Acceptance and Production.
- How to separate different applications on a cluster.
- Can we separate responsibilities between ICT support and
  developers?
  ICT support supplies Persistent Storage classes that developers can
  claim.
  Use of Role Based Access Control (RBAC).
- Which tooling to use to develop and deploy to a cluster?
  Skaffold (build automation/deployment) and Helm (package manager)
- Use Infrastructure as Code (IaC) to provision and manage
  "Azure" cloud infrastructure.
  Bash scripts or Terraform.
- How to use "external" resources in a cluster.
  SURF object storage (SWIFT), VANCIS
- Cluster cost management.
  Downscaling a (development) cluster. Resource caps.
- Provide cluster-wide services.
  Sending email, automatic SSL certificates, monitoring (Prometheus),
  pipelining, etc.
Dataverse Cloud architecture
[Architecture diagram. Components shown: Users, Ingress, HTTP(S) Load Balancer, Kubernetes Engine, Kubernetes Cluster, K8S Cluster Node, Dataverse Deployment and Service, Solr Deployment and Service, PostgreSQL Deployment and Service]
[Cluster diagram. Components shown: Users, Kubernetes Engine, Compute Engine, Kubernetes Cluster, K8S Cluster Node1, K8S Cluster Node2, Dataverse Service, Docker Hub, Container Registry]
How to scale up Kubernetes horizontally
[Scaling diagram. Components shown: Users, Kubernetes Engine, Compute Engine, Kubernetes Cluster, K8S Cluster Node1, K8S Cluster Node2, Dataverse Service, Docker Hub, Container Registry]
The importance of Persistent Storage
Docker containers write files to disk (I/O) for state or storage,
in both the /data and /docroot folders. If a Docker container is
restarted for some reason, all of that data is lost.
Solution: mount persistent storage, on an external disk hosted in the
cloud, into the container.
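As a rough illustration of that solution, the Python sketch below builds a Kubernetes PersistentVolumeClaim and the pod-spec fragment that mounts it at /data. The manifest fields (accessModes, resources.requests.storage, volumeMounts, persistentVolumeClaim.claimName) are standard Kubernetes API fields; the claim name dataverse-data, the size 10Gi and the image name are assumptions made for the example, not values from the DANS cluster. kubectl accepts JSON manifests, so the output could be piped to kubectl apply -f -.

```python
import json


def persistent_volume_claim(name, size):
    """Build a PersistentVolumeClaim manifest for a disk of the given size."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def mount_claim(container, claim_name, mount_path):
    """Return a pod-spec fragment that mounts the claim into the container."""
    volume_name = claim_name + "-volume"
    mounted = dict(container)  # leave the caller's dictionary untouched
    mounted["volumeMounts"] = [{"name": volume_name, "mountPath": mount_path}]
    return {
        "containers": [mounted],
        "volumes": [
            {
                "name": volume_name,
                "persistentVolumeClaim": {"claimName": claim_name},
            }
        ],
    }


# Hypothetical names and sizes, chosen only for this example.
claim = persistent_volume_claim("dataverse-data", "10Gi")
pod_spec = mount_claim(
    {"name": "dataverse", "image": "example/dataverse:4.18.1"},
    "dataverse-data",
    "/data",
)
print(json.dumps(claim, indent=2))
print(json.dumps(pod_spec, indent=2))
```

A second claim mounted at /docroot would follow the same pattern.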
Running Dataverse in production
[Production diagram. Components shown: Users, HTTP(S) Load Balancer, Kubernetes Engine, Container Registry, Kubernetes Cluster, K8S Cluster Node, Dataverse Deployment and Service, Solr Deployment and Service, PostgreSQL Deployment and Service, Email Relay Deployment and Service, Certbot Cronjob and Service]
Continuous deployment pipeline
[Pipeline diagram: git push, webhook, git clone, Jenkins pipeline (Jenkinsfile), run tests, create Docker image, push to GCP container registry, Kubernetes deployment]
1. Developer pushes code to Bitbucket
2. Jenkins receives notification - build trigger
3. Jenkins clones the workspace
4. Runs tests
5. Creates the Docker image
6. Pushes the Docker image to the GCP container registry
7. Updates the Kubernetes deployment
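The key property of such a pipeline is that each stage runs only if the previous one succeeded, so a failing test run never reaches the registry or the cluster. The Python sketch below shows that control flow only. It is not the Jenkinsfile used at DANS; the stage functions are hypothetical stand-ins for the real git, test, docker and kubectl calls.

```python
def run_pipeline(stages):
    """Run (name, action) stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            print("stage failed:", name)
            break
        completed.append(name)
    return completed


# Hypothetical stand-ins for the real work done in each stage.
def succeed():
    return True


def fail():
    return False


STAGE_NAMES = [
    "clone workspace",
    "run tests",
    "create docker image",
    "push image to registry",
    "update kubernetes deployment",
]

# A healthy run completes every stage.
healthy = run_pipeline([(name, succeed) for name in STAGE_NAMES])

# A run with failing tests stops before an image is built or deployed.
broken = run_pipeline(
    [(name, fail if name == "run tests" else succeed) for name in STAGE_NAMES]
)

print(healthy)
print(broken)
```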
Distributed Dataverse infra on Kubernetes
● Network of Dataverses with a central portal hosting metadata and
multiple Dataverse nodes
● Testing strategies with Selenium and Cypress
● Unit tests, integration tests and a Jenkins CI/CD pipeline
● Running external applications on the Kubernetes infrastructure,
such as the OpenAIRE Amnesia tool
● Multi-language support and maintenance, with Weblate as a
service
● Using iRODS to support multiple storage backends for different datasets
Maintenance of distributed networks
● Maintaining distributed applications is very
difficult and expensive
● It requires the highest level of service maturity
● Increasing code coverage does not necessarily lead to
more functionality coverage
● Writing integration tests is even more important than adding
more unit tests
● It’s almost impossible to run distributed services without
help from the community
Quality Assurance (QA) as a community service
Selenium IDE lets you create and replay all UI tests in your browser
Shared tests can be reused by the Dataverse CI/CD pipeline
Let’s work together on it!
Example of Selenium .side file
● .side is the file extension for tests created with the new Selenium IDE
● JSON format; every section describes some action
● template rules can be used by Selenium WebDriver
● can be easily integrated into a continuous deployment pipeline with Jenkins jobs
● running the SIDE Runner with the given parameters can even test the different components!
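Since a .side file is plain JSON, it can be inspected with a few lines of Python, for example to review which actions a shared test performs before adding it to a pipeline. The sketch below assumes the usual layout of a top-level "tests" list whose entries carry a "commands" list with "command", "target" and "value" fields. The sample project is made up for illustration and omits the ids and version fields a real export contains.

```python
import json

# Simplified, hypothetical .side content; a real export has more fields.
SAMPLE_SIDE = """
{
  "name": "dataverse-smoke",
  "url": "http://0.0.0.0:8085",
  "tests": [
    {
      "name": "homepage loads",
      "commands": [
        {"command": "open", "target": "/", "value": ""},
        {"command": "assertElementPresent", "target": "id=content", "value": ""}
      ]
    }
  ]
}
"""


def list_steps(side_text):
    """Return (test name, command, target) for every action in the project."""
    project = json.loads(side_text)
    steps = []
    for test in project.get("tests", []):
        for command in test.get("commands", []):
            steps.append((test["name"], command["command"], command["target"]))
    return steps


steps = list_steps(SAMPLE_SIDE)
for test_name, command, target in steps:
    print(test_name, "|", command, "|", target)
```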
Questions?

Más contenido relacionado

La actualidad más candente

Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Andrea Scharnhorst
 

La actualidad más candente (20)

External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repository
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
CLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemesCLARIAH CMDI use case and flexible metadata schemes
CLARIAH CMDI use case and flexible metadata schemes
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloud
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and SparqlifyLOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and Sparqlify
 
LOD2 Webinar: SIREn
LOD2 Webinar: SIREnLOD2 Webinar: SIREn
LOD2 Webinar: SIREn
 

Similar a The world of Docker and Kubernetes

Docker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - OverviewDocker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - Overview
Chris Ciborowski
 

Similar a The world of Docker and Kubernetes (20)

Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data Scientists
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetes
 
Docker In Brief
Docker In BriefDocker In Brief
Docker In Brief
 
VS Code tools for docker
VS Code tools for dockerVS Code tools for docker
VS Code tools for docker
 
Docker Bday #5, SF Edition: Introduction to Docker
Docker Bday #5, SF Edition: Introduction to DockerDocker Bday #5, SF Edition: Introduction to Docker
Docker Bday #5, SF Edition: Introduction to Docker
 
Containers: DevOp Enablers of Technical Solutions
Containers: DevOp Enablers of Technical SolutionsContainers: DevOp Enablers of Technical Solutions
Containers: DevOp Enablers of Technical Solutions
 
Week 8 lecture material
Week 8 lecture materialWeek 8 lecture material
Week 8 lecture material
 
DockerCon EU 2015: Day 1 General Session
DockerCon EU 2015: Day 1 General SessionDockerCon EU 2015: Day 1 General Session
DockerCon EU 2015: Day 1 General Session
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific Computing
 
Tampere Docker meetup - Happy 5th Birthday Docker
Tampere Docker meetup - Happy 5th Birthday DockerTampere Docker meetup - Happy 5th Birthday Docker
Tampere Docker meetup - Happy 5th Birthday Docker
 
Docker for dev
Docker for devDocker for dev
Docker for dev
 
What is Docker?
What is Docker?What is Docker?
What is Docker?
 
Docker Birthday #5 Meetup Cluj - Presentation
Docker Birthday #5 Meetup Cluj - PresentationDocker Birthday #5 Meetup Cluj - Presentation
Docker Birthday #5 Meetup Cluj - Presentation
 
Docker Birthday #3 - Intro to Docker Slides
Docker Birthday #3 - Intro to Docker SlidesDocker Birthday #3 - Intro to Docker Slides
Docker Birthday #3 - Intro to Docker Slides
 
Docker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - OverviewDocker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - Overview
 
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Docker Enterprise Edition Overview by Steven Thwaites, Technical Solutions En...
Docker Enterprise Edition Overview by Steven Thwaites, Technical Solutions En...Docker Enterprise Edition Overview by Steven Thwaites, Technical Solutions En...
Docker Enterprise Edition Overview by Steven Thwaites, Technical Solutions En...
 
Docker dev ops for cd meetup 12-14
Docker dev ops for cd meetup 12-14Docker dev ops for cd meetup 12-14
Docker dev ops for cd meetup 12-14
 
A curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & KubernetesA curtain-raiser to the container world Docker & Kubernetes
A curtain-raiser to the container world Docker & Kubernetes
 

Más de vty

Más de vty (8)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC project
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repository
 

Último

Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 

Último (20)

Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdf
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 

The world of Docker and Kubernetes

  • 1. dans.knaw.nl DANS is een instituut van KNAW en NWO The world of Docker and Kubernetes How to create, set up and manage Kubernetes cluster at DANS: Dataverse pilot Slava Tykhonov, Senior Information Scientist Wilko Steinhoff, Senior Software Developer (DANS-KNAW, The Hague, Netherlands) 11.02.2020
  • 2. Why do we need Cloud Computing?
      “Cloud computing is a style of computing in which scalable and elastic IT is delivered as a service using Internet technologies.”
      “Cloud Computing is transforming the way organisations consume computer services.”
      “We can run all our workload data of applications and processes online over the internet remotely instead of using physical hardware and software.”
      “It’s less expensive and more secure.”
      Dataverse is our Pilot Cloud Service
  • 3. Dataverse as a FOSS product: good news
      • Dataverse is Open Source software
      • Great community with more than 100 contributors
      • Contributions come from all continents
      • Maintenance costs fall because all community members use the same software and help each other
      • Governance models can be reused by different countries
      • Innovation in the Dataverse community moves very fast
  • 4. Dataverse as a FOSS product: bad news
      • Open Source doesn’t mean Free!
      • Consider all required resources: both hardware and human
      • Building a service is difficult, maintenance is expensive
      • Integration with other services requires change management and is sometimes not even possible
      • Technical development is fast, so expertise quickly falls out of date
      • Requires continuous training and very good communication between all partners
  • 6. Installation problems
      The basic Dataverse infrastructure seems to be very simple:
      - application (Java, deployed on the Glassfish application server)
      - database (PostgreSQL)
      - search engine (Solr)
      If you follow the guide and do the installation manually… there is a good chance that it will not work. Why?!
  • 7. You never know where the problem lies...
      ● OS-specific issues
      ● application-specific bugs
      ● differences between database versions
      ● search engine updates
      ● security patches
      ● hardware issues
      ● open/closed ports on your server
      It’s even more complicated if you need to patch the software and update a working infrastructure every time… locally and on test/acceptance/production.
  • 8. Typical infrastructure issues
      After it finally works, the security guy tells you that all microservice ports on all servers should be closed… or a software update breaks the service, or a brand new Chinese bot takes your service down, or something else happens...
      Do you remember? You have to reproduce and fix it locally and on test/acceptance/production.
  • 10. Maintenance vs development. Typical outcome: hundreds or thousands of hours are lost, money ($$$) is spent, and maintenance effort dominates development.
  • 11. Quiet software development. That’s how unmaintainable projects typically die… R.I.P.
  • 12. FAIRness of Software: Open Source vs Closed Source
  • 13. Dark side of the Moon. Source: V. Tykhonov, “API economy: transformation from closed to open innovation”
  • 14. Open Source paradigm for Sharing economy
  • 15. Dataverse Unleashed. Dataverse isn’t competing against Figshare, Zenodo, DSpace, CKAN, EASY or others… Dataverse is a platform for building new, innovative things together and for integrating all the other services. Using Dataverse means you can join the Sharing Economy in data and speed up your own innovation by building on community developments.
  • 16. Shared economy in the data landscape
      ● all partners run the same basic data infrastructure
      ● the source code is Open Source and shared
      ● the community decides on priorities
      ● new custom requirements can be implemented independently by anyone and merged into master (upstream)
      ● sustainability of software: unmaintained components are usually replaced with well-maintained ones as the product evolves
      ● two or more technical solutions to the same problem are more than welcome
      ● the maturity of the community means the maturity of the software
      Do you want to join? Use Docker for your software!
  • 17. Sometimes innovation means less communication
      “Docker offered a way to create independence between the application and the infrastructure through a standardized container format that could be created with easy-to-use tooling.” David Messina, CMO at Docker
      And now honestly ask yourself: how much time do you spend talking to sysadmins to convince them to enable or install the tools you need? Talking to another developer working on the same code? Reproducing the same bug on test/acceptance/production?
  • 18. Docker features
      • Extremely powerful configuration tool
      • Lets you install software on any platform (Linux, Mac, Windows)
      • Software can be installed from Docker as a standalone container or as a container delivering microservices (database, search engine, core service)
      • Docker can host many instances of the same software tool on different ports, limited only by the host’s resources
      • Docker can be used to organise multilingual interfaces, for example
  • 19. Docker advantages
      • Faster development and deployments
      • Isolation of running containers makes it easier to scale apps up
      • Portability saves time: the same image runs on a local computer or in the cloud
      • Snapshotting makes it possible to archive the state of Docker images
      • Resource limits can be adjusted
  • 20. Dataverse Docker module
      This module was developed in the one-year CESSDA DataverseEU project and is aimed at CESSDA Service Providers with limited technical resources. DANS led the project. The goal was to deploy the Dataverse software on the CESSDA Technical Infrastructure (Google Cloud). The project was funded by the CESSDA 2018 workplan.
      DataverseEU partners: ADP (Slovenia), AUSSDA (Austria), GESIS (Germany), SND (Sweden), TARKI (Hungary), SiencePro (France), UKDA (UK), UniData (Italy), SODA (Belgium), LSZDA (Latvia), DANS (Netherlands)
  • 21. Docker deployment with k8s in Clouds
      • Google Cloud (policy for CESSDA SaW)
      • Microsoft Azure
      • Amazon Cloud
      • OpenShift Cloud
      • local Docker installation (minikube)
  • 23. Example: Dataverse as a set of Docker microservices
  • 24. Docker Desktop (Community Edition)
      Ideal for developers and small teams looking to get started with Docker: https://www.docker.com/community-edition
      Features:
      - docker-for-desktop
      - docker-compose support
      - integrated kubernetes (minikube)
      - kitematic: Visual Docker Container Management
  • 25. Docker Hub
      Docker Hub is a registry that hosts images. Example: https://hub.docker.com/_/httpd/
      $ docker pull httpd
      Push images to Docker Hub: https://docs.docker.com/docker-cloud/builds/push-images/
      $ docker login
      $ docker tag my_image $DOCKER_ID_USER/my_image
      $ docker push $DOCKER_ID_USER/my_image
  • 26. Docker concepts
      • Containers are runnable instances of images
      • Images are read-only templates that package an application together with its filesystem
      • The state of a container can be committed to a new image and executed in different clouds
      • Images can be preserved in repositories: https://act.dataverse.nl/dataset.xhtml?persistentId=hdl:10695/9VCRBR
      • Data folders can be hosted outside of containers on persistent volumes
  • 27. Hello world app (Flask application)
      Dockerfile: https://github.com/DANS-KNAW/parthenos-widget/blob/master/Dockerfile
      FROM python:2.7
      MAINTAINER Vyacheslav Tykhonov
      COPY . /widget
      WORKDIR /widget
      RUN pip install -r requirements.txt
      ENTRYPOINT ["python"]
      CMD ["app.py"]
  • 28. Docker command line usage
      The command line lets you manage containers and images and execute Docker commands:
      $ docker help run
      $ docker ps
      $ docker login
      $ docker pull, push, commit
      $ docker build, run
      $ docker exec
      $ docker stop, rm, rmi
  • 29. Typical Docker pipeline
      Install all dependencies and build the tool from scratch:
      $ docker build -t parthenos:latest .
      Run the image from the command line:
      $ docker run -p 8081:8081 --name parthenos parthenos
      Check whether the container is running:
      $ docker ps | grep parthenos
      Open a shell inside the container:
      $ docker exec -it [CONTAINER_ID] /bin/bash
      Copy configuration into the container:
      $ docker cp ./parthenos.config [CONTAINER_ID]:/widget
      Copy the contents of a container folder to the local folder:
      $ docker cp [CONTAINER_ID]:/widget/. ./
      Ship the “dockerized” app to the world (Docker Hub or another registry); push takes an image name, optionally with a tag:
      $ docker push [IMAGE_NAME]
  • 30. Pipeline explanation. Credits: Arun Gupta, “Package your Java EE Application using Docker and Kubernetes”
  • 31. Docker archiving process
      An easy process for archiving running software, metadata and data separately: https://docs.docker.com/engine/reference/commandline/save/
      • PostgreSQL database with metadata and user information
      • dataset files in a separate folder
      • software image with some individual settings
      docker save archives an image, so commit a running container to an image first:
      $ docker save -o archive.tar [IMAGE_NAME]
      It is easy to restore the complete system, with data and metadata, using Docker Compose.
      $ docker load -i archive.tar
  • 32. Docker Compose
      A management tool for the Docker configuration of multi-container solutions. All connections, networks, containers and port specifications are stored in one file (YAML format).
      Example (DataverseEU): http://github.com/IQSS/dataverse-docker
      Kompose is a tool that converts Docker Compose files to Kubernetes configuration: https://github.com/kubernetes/kompose
      Usage: $ docker-compose [command]
      Docker Compose is a good tool for keeping the PROVenance of software (version control, etc.)
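To make the "everything in one file" idea concrete, here is a minimal sketch of a Compose file for a Dataverse-like stack. It is not taken from the dataverse-docker repository: the image name `example/dataverse:4.18.1`, the image tags, the password and the port mapping are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: image names, tags, ports and the password are hypothetical placeholders.
set -eu
mkdir -p compose-sketch

# One YAML file describes all three services, their ports and a named volume.
cat > compose-sketch/docker-compose.yml <<'EOF'
version: "3"
services:
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_PASSWORD: change-me
    volumes:
      - pgdata:/var/lib/postgresql/data
  solr:
    image: solr:7.3
  dataverse:
    image: example/dataverse:4.18.1   # hypothetical image name
    ports:
      - "8085:8080"
    depends_on:
      - postgres
      - solr
volumes:
  pgdata:
EOF

echo "wrote compose-sketch/docker-compose.yml"
# To start the stack (requires Docker and docker-compose):
#   docker-compose -f compose-sketch/docker-compose.yml up -d
```

Because the file is plain text, it can be kept under version control next to the application code, which is what makes it useful for tracking which versions were deployed together.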
  • 33. Dataverse Docker containers exploration
      # Show Docker images
      docker images
      # Show all running containers
      docker ps
      # Remove a Docker image by image_id (don’t execute)
      docker rmi image_id
      # Delete all images (don’t execute)
      docker rmi `docker images -aq`
      # Access the Dataverse container; type exit to quit
      docker exec -it dataverse /bin/bash
      # PostgreSQL container; type exit to quit
      docker exec -it postgres /bin/bash
      # Solr container; type exit to quit
      docker exec -it solr /bin/bash
      # Copy files and folders to the running container
      docker cp ./testfile dataverse:/tmp/
      # Copy files and folders from the running container to your disk
      docker cp dataverse:/opt/dv/dvinstall.zip /tmp/
      # Stop the Dataverse container
      docker stop dataverse
      # Start the Dataverse container
      docker start dataverse
  • 34. Dataverse maintenance with Docker
      # Open the page with the latest Dataverse release
      https://github.com/IQSS/dataverse/releases
      # Follow the upgrade instructions, which involve a war and a zip file and optionally a .tsv or .xml schema
      docker exec -it dataverse /bin/bash
      wget https://github.com/IQSS/dataverse/releases/download/v4.18.1/dataverse-4.18.1.war -O dataverse.war
      asadmin undeploy dataverse
      rm -rf glassfish4/glassfish/domains/domain1/generated
      asadmin deploy ./dataverse.war
      asadmin restart-domain
      # After Glassfish restarts, go to 0.0.0.0:8085 and check the Dataverse version
      # Remember: changes made inside the container are lost when the container is recreated from its image!
  • 35. Maintenance of Docker infrastructure
      # Go to hub.docker.com and create an account.
      # Log in with your credentials; remember your_docker_name
      docker login
      # Create an image from the running Dataverse container
      docker commit dataverse
      # The new image appears at the top of the list
      docker images
      # Tag the new image locally; replace your_docker_name
      docker tag new_dataverse_image_id [your_docker_name]/dataverse:4.18.1
      # Push the new image to Docker Hub
      docker push [your_docker_name]/dataverse:4.18.1
      # Go to Docker Hub to check that the repo was updated: https://hub.docker.com/r/[your_docker_name]/dataverse
      # Visit https://docs.docker.com/docker-hub/repos/#pushing-a-docker-container-image-to-docker-hub if you need more information about updating Docker images
  • 36. How to set up, configure and manage the Kubernetes clusters run by DANS, with emphasis on architecture, ICT support and DevOps. POC Azure management
  • 37. Azure
      Best practices for using and managing the DANS Azure subscription.
      Azure: cloud computing platform by Microsoft. Azure@DANS is provided by SURFcumulus.
      Cloud resources, such as:
      - Virtual Machine (VM)
      - Storage (disk)
      - SQL database
      - Kubernetes (AKS)
  • 38. Kubernetes
      Open-source container-orchestration system for automating application deployment, scaling, and management.
      - Docker container orchestration
      - Infrastructure as Code
      - Health checks and automatic restarting of applications
      - (Auto)scaling of the cluster (horizontally and vertically)
      - Controlled use of resources (CPU, memory)
      - Setting up an application stack for local development
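Health checks and resource control from the list above are declared in the manifest itself. The sketch below writes a minimal Deployment with a liveness probe and CPU/memory requests and limits; the image name, probe path, timings and resource figures are assumptions chosen for illustration, not values from the DANS cluster.

```shell
#!/bin/sh
# Sketch only: image name, probe settings and resource figures are hypothetical.
set -eu
mkdir -p k8s-sketch

cat > k8s-sketch/dataverse-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dataverse
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dataverse
  template:
    metadata:
      labels:
        app: dataverse
    spec:
      containers:
        - name: dataverse
          image: example/dataverse:4.18.1   # hypothetical image name
          ports:
            - containerPort: 8080
          # Health check: the kubelet restarts the container if this probe keeps failing.
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 30
          # Controlled use of resources: scheduling requests and hard limits.
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
EOF

echo "wrote k8s-sketch/dataverse-deployment.yaml"
# To apply it to a cluster (requires kubectl and cluster access):
#   kubectl apply -f k8s-sketch/dataverse-deployment.yaml
```

Keeping the manifest in a repository is also what the "Infrastructure as Code" item refers to: the cluster state is described in files and not built by hand.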
  • 39. Best K8S practices
      In this project we’ll look into some best K8S practices for DANS, based on issues raised in earlier POCs:
      - Docker@DANS (2018)
      - HUC2 POC (2019)
  • 40.
      - Cluster architecture: application-wide or organisation-wide? DTAP: Development, Testing, Acceptance and Production.
      - How do we separate different applications on a cluster?
      - Can we separate responsibilities between ICT support and developers? ICT support supplies Persistent Storage classes that developers can claim. Use of Role Based Access Control (RBAC).
      - Which tooling is used to develop and deploy to a cluster? Skaffold (build automation/deployment) and Helm (package manager).
  • 41.
      - Use Infrastructure as Code (IaC) to provision and manage the Azure cloud infrastructure: Bash scripts or Terraform.
      - How to use “external” resources in a cluster: SURF object storage (SWIFT), VANCIS.
      - Cluster cost management: downscaling a (development) cluster, resource caps.
      - Provide cluster-wide services: sending email, automatic SSL certificates, monitoring (Prometheus), pipelining, etc.
  • 42. Dataverse Cloud architecture [diagram: Users; Ingress HTTP(S) Load Balancer; Kubernetes Engine; Kubernetes cluster node with Dataverse, Solr and PostgreSQL Deployments and their Services]
  • 43. [diagram: Users; Kubernetes Engine; Compute Engine; Dataverse Service; Kubernetes cluster with Node1 and Node2; Docker Hub; Container Registry]
  • 44. How to scale up Kubernetes horizontally [diagram: Users; Kubernetes Engine; Compute Engine; Dataverse Service; Kubernetes cluster with Node1 and Node2; Docker Hub; Container Registry]
  • 45. The importance of Persistent Storage
      Docker containers write files to disk (I/O) for state or storage, here in the /data and /docroot folders. If a container is restarted and recreated from its image, all of that data is lost.
      Solution: mount persistent storage, on an external disk hosted in the Cloud, into the container.
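In Kubernetes terms, the solution above is usually a PersistentVolumeClaim mounted into the container. The following sketch writes a claim and a Pod that mounts it at /data; the claim name, size and image name are assumptions, and a real Dataverse setup would more likely mount the volume from a Deployment and add a second mount for /docroot.

```shell
#!/bin/sh
# Sketch only: claim name, storage size and image name are hypothetical.
set -eu
mkdir -p k8s-sketch

cat > k8s-sketch/dataverse-storage.yaml <<'EOF'
# The claim asks the cluster for a disk that outlives any single container.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataverse-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
# The Pod mounts the claimed disk at /data, so files written there survive restarts.
apiVersion: v1
kind: Pod
metadata:
  name: dataverse
spec:
  containers:
    - name: dataverse
      image: example/dataverse:4.18.1   # hypothetical image name
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: dataverse-data
EOF

echo "wrote k8s-sketch/dataverse-storage.yaml"
# To apply it to a cluster (requires kubectl and cluster access):
#   kubectl apply -f k8s-sketch/dataverse-storage.yaml
```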
  • 46. Running Dataverse in production [diagram: Users; HTTP(S) Load Balancer; Kubernetes Engine; Container Registry; Kubernetes cluster node with Dataverse, Solr, PostgreSQL and Email Relay Deployments and their Services; Certbot Cronjob and Certbot Service]
  • 47. Continuous deployment pipeline (Jenkins pipeline, Jenkinsfile)
      1. Developer pushes code to Bitbucket (git push)
      2. Jenkins receives a notification via webhook (build trigger)
      3. Jenkins clones the workspace (git clone)
      4. Runs tests
      5. Creates the Docker image
      6. Pushes the Docker image to the GCP container registry
      7. Updates the Kubernetes deployment
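Steps 3 to 7 of this pipeline can be sketched as a plain shell script of the kind a Jenkins stage might call. This is not the project's Jenkinsfile: the repository URL, the `run-tests.sh` script, the registry path and the deployment and container names are hypothetical. By default the script only prints the commands (dry run) and records them in `pipeline-plan.txt`.

```shell
#!/bin/sh
# Sketch only: repository URL, test script, registry path and deployment name are hypothetical.
set -eu

IMAGE="gcr.io/example-project/dataverse"
TAG="${BUILD_NUMBER:-dev}"      # Jenkins sets BUILD_NUMBER; fall back to "dev" locally
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands only, 0 = really execute them

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

pipeline() {
  run git clone https://bitbucket.example.org/dans/dataverse.git workspace   # step 3
  run ./workspace/run-tests.sh                                               # step 4
  run docker build -t "$IMAGE:$TAG" workspace                                # step 5
  run docker push "$IMAGE:$TAG"                                              # step 6
  run kubectl set image deployment/dataverse dataverse="$IMAGE:$TAG"         # step 7
}

pipeline | tee pipeline-plan.txt
```

Running it with `DRY_RUN=0` would execute the commands for real, which requires git, Docker, kubectl and access to the registry and cluster.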
  • 48. Distributed Dataverse infra on Kubernetes
      ● Network of Dataverses with a central portal that hosts metadata and multiple Dataverse nodes
      ● Testing strategies with Selenium and Cypress
      ● Unit tests, integration tests and a Jenkins CI/CD pipeline
      ● Running external applications on the Kubernetes infrastructure, e.g. the OpenAIRE Amnesia tool
      ● Multi-language support and maintenance, with Weblate as a service
      ● Using iRODS to support multiple storage backends for different datasets
  • 49. Maintenance of distributed networks
      ● Maintaining distributed applications is very difficult and expensive
      ● It requires the highest level of service maturity
      ● Increasing code coverage does not necessarily lead to more functionality coverage
      ● Writing integration tests is even more important than adding more unit tests
      ● It is almost impossible to run distributed services without help from the community
  • 50. Quality Assurance (QA) as a community service
      Selenium IDE lets you create and replay UI tests in your browser.
      Shared tests can be reused by the Dataverse CI/CD pipeline.
      Let’s work together on it!
  • 51. Example of Selenium .side file
      ● .side is the file extension for tests created with the new Selenium IDE
      ● JSON format; every section describes an action
      ● template rules can be used by Selenium WebDriver
      ● can easily be integrated into a continuous deployment pipeline with Jenkins jobs
      ● running SIDE Runner with the given parameters can even test the different components!
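The slide's actual .side file is not reproduced in the transcript. As a rough illustration of the JSON layout, the sketch below writes a small, abridged project file with one test and one suite. The field set follows what Selenium IDE exports as far as I know it and should be treated as an assumption; the ids, names and the base URL are made up.

```shell
#!/bin/sh
# Sketch only: an abridged .side project with hypothetical ids, names and base URL.
set -eu
mkdir -p side-sketch

cat > side-sketch/homepage.side <<'EOF'
{
  "id": "00000000-0000-0000-0000-000000000001",
  "version": "2.0",
  "name": "dataverse-smoke-test",
  "url": "http://localhost:8085",
  "tests": [
    {
      "id": "00000000-0000-0000-0000-000000000002",
      "name": "homepage loads",
      "commands": [
        { "id": "00000000-0000-0000-0000-000000000003", "comment": "", "command": "open", "target": "/", "targets": [], "value": "" },
        { "id": "00000000-0000-0000-0000-000000000004", "comment": "", "command": "assertElementPresent", "target": "css=body", "targets": [], "value": "" }
      ]
    }
  ],
  "suites": [
    {
      "id": "00000000-0000-0000-0000-000000000005",
      "name": "smoke",
      "persistSession": false,
      "parallel": false,
      "timeout": 300,
      "tests": ["00000000-0000-0000-0000-000000000002"]
    }
  ],
  "urls": ["http://localhost:8085/"],
  "plugins": []
}
EOF

echo "wrote side-sketch/homepage.side"
# To replay it from a Jenkins job or a terminal (requires Node.js, the
# selenium-side-runner package and a browser driver):
#   selenium-side-runner side-sketch/homepage.side
```

Each entry in `commands` is one recorded action, which is what "every section describes an action" refers to.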