2. Visualise Covid-19 data
Using the Elastic Stack
Elastic Slovak user group meetup – Virtual edition
Rado Ondáš
23. 09. 2020
3. Before we start
o First virtual meetup, please be patient :)
o Zoom application environment – questions?
o Please mute when not talking
o We are recording this meetup
o Use chat or raise a question
o Feel free to ask at any time
o Ed will assist while I speak
Maintenance and notes about the virtual environment
4. Agenda
1) Welcome, Introduction and Community contributor program
2) What is new in Elastic Stack
3) Visualise Covid-19 data using the Elastic Stack - Slovakia
4) AMA - Ask me anything
5) Wrap up/Survey/What is next
5. Introduction
o Remote consultancy and support for Customers
o Community engagement
o Training delivery
o Community beats development
o Interested in data sets of any kind – do you have some?
Rado Ondáš - Sr. Support eng. @Elastic - 4 years
6. Elastic Slovak User Group
… the plan from about a year ago …
o ~5+ meetups
o More speakers & more topics
... ffw to 2020 – reality check ...
o 1st meetup
o Virtual only
o Can we hit one more with your help?
7. Elastic Contributor Program
elastic.co/community/contributor
We're excited to announce the global launch of the Elastic
Contributor Program, which recognizes the hard work of our
awesome contributors!
Start contributing code, presentations, tutorials, and more today to
earn yourself a spot on the leaderboard and the chance to win free
training, Elastic swag, bragging rights, and more.
8. Elastic Contributor Program
Prizes and Awards
*Flights and travel expenses are not included, and exchanges for prizes and benefits will not be allowed.
9. Elastic Contributor Program
Contribution Types
● Event Organization
● Presentations
● Written Content
● Video Tutorials
● Translations
● Code
● Contribution Validation
Contributions made between February 1, 2020 and January 31, 2021 are eligible for submission.
12. What is new??
It depends
A lot is new everywhere
How to cover the topic from now on?
Where to start?
13. What now and how to handle changes
• In the past we covered many features of the Elastic Stack
• We even did a demo for some of them
• This is no longer possible
Strategy
• Focus on your needs or interest in the specific group of features
• Do not try to cover all products :)
• Join/participate in meetups
• Share your knowledge and
• Ask us/me
14. What now and how to handle changes
https://www.elastic.co/
RSS: https://www.elastic.co/blog/feed
Resources:
• Documentation
• Blog posts
• Release blog posts
• Release notes for each product e.g.: Elasticsearch release notes
• News
• Discuss forum
• Webinars
• Community
• Training
– Free training courses: https://www.elastic.co/training/free
19. Covid-19 – scope for today
• What is NOT in the scope
– Discussion about Covid-19 itself
– Any conspiracy theories and speculation
– Politics, or decisions made
• What IS in the scope
– Data set discussion – yes, not a happy dataset but an actual one
– Architecture design of the pipeline
– How to access the data and ingest
– Visualizations and navigation in the Dashboard
– Any technical discussion
21. How to start – base for the meetup
• covid-19.radoondas.io
• Github repository with quick how-to
– github.com/radoondas/covid-19-Slovakia
• Blog post with more detailed instructions
– radoondas.io/posts/2020/visualise-covid-19-using-elastic-stack/
– Preferred option
Where to start?
22. The architecture
How it works
+----------+     +----------+     +---------------+     +--------+     +------+
|          |     |          |     |               |     |        |     |      |
| CSV file +-----> Logstash +-----> Elasticsearch <-----> Kibana <-----+ User |
|          |     |          |     |               |     |        |     |      |
+----------+     +----------+     +---------------+     +--------+     +------+
• CSV file as data input
• Logstash to process the data
• Elasticsearch as data store and the search engine
• Kibana as visualization tool
Ingest pipeline definition
23. The Data
• Slovakia has no official, freely accessible, machine-readable data source
– correct me if I am wrong
• Data presented today
– my own collection located in the GitHub repository
– Updated daily (if possible)
• Data scraping history
– Early data was very hard to get or scrape - chaos
– Often incomplete or incorrect
– The only sources were news and some official web pages
– Improved over time as korona.gov.sk developed
– Official visualisations with data ‘export’ – nczisk.sk
• Getting the data is still a manual process
Where is the data for Slovakia?
24. Data description
• Columns
• date;city;infected;gender;note_1;note_2;healthy;died;region;age;district
• Description (important highlighted as bold+italic)
date      Date - the date of the record
city      City - the location of the person infected by Covid-19
infected  Infected - number of infected people
gender    Gender: M - male, Ž - female, D - child, X - unknown
note_1    Note 1
note_2    Note 2
healthy   Healthy - number of people who recovered from the virus
died      Dead - number of people who died
region    Region
age       Age
district  District
CSV formatted input
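To make the column description concrete, here is a minimal Python sketch that turns one semicolon-separated line into a typed record. It is an illustration only, not the Logstash configuration used in the tutorial. The `CovidRecord` class and `parse_row` function are hypothetical names, and the sketch assumes the date is written as day.month.year, that empty `age` values mean "unknown", and that fields never contain a quoted semicolon.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Column order as listed on the slide.
COLUMNS = ["date", "city", "infected", "gender", "note_1", "note_2",
           "healthy", "died", "region", "age", "district"]


@dataclass
class CovidRecord:
    """One row of the CSV file (hypothetical helper type)."""
    day: date
    city: str
    infected: int
    gender: str
    note_1: str
    note_2: str
    healthy: int
    died: int
    region: str
    age: Optional[int]
    district: str


def parse_row(line: str) -> CovidRecord:
    """Split one line on ';' and convert the numeric and date fields."""
    parts = line.rstrip("\r\n").split(";")
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}")
    row = dict(zip(COLUMNS, parts))
    return CovidRecord(
        # Assumption: dates are formatted as DD.MM.YYYY.
        day=datetime.strptime(row["date"], "%d.%m.%Y").date(),
        city=row["city"],
        infected=int(row["infected"]),
        gender=row["gender"],
        note_1=row["note_1"],
        note_2=row["note_2"],
        healthy=int(row["healthy"]),
        died=int(row["died"]),
        region=row["region"],
        # Assumption: an empty age means the age is not known.
        age=int(row["age"]) if row["age"] else None,
        district=row["district"],
    )
```

Calling `parse_row` on a line such as `22.09.2020;Neuvedené;0;M;;;0;1;Trenčiansky;78;Neuvedené` yields a record with `died` set to 1 and `age` set to 78.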
25. The data sample
22.09.2020;Neuvedené;0;X;;;220;0;Neuvedené;;Neuvedené
22.09.2020;Neuvedené;0;M;;;0;1;Trenčiansky;78;Neuvedené
22.09.2020;Neuvedené;61;X;;;0;0;Bratislavský;;Bratislava I
22.09.2020;Neuvedené;26;X;;;0;0;Žilinský;;Tvrdošín
22.09.2020;Neuvedené;20;X;;;0;0;Trnavský;;Skalica
22.09.2020;Neuvedené;16;X;;;0;0;Bratislavský;;Senec
22.09.2020;Neuvedené;16;X;;;0;0;Trnavský;;Trnava
22.09.2020;Neuvedené;15;X;;;0;0;Žilinský;;Námestovo
22.09.2020;Neuvedené;14;X;;;0;0;Banskobystrický;;Banská Bystrica
22.09.2020;Neuvedené;13;X;;;0;0;Košický;;Košice I
22.09.2020;Neuvedené;11;X;;;0;0;Trenčiansky;;Trenčín
22.09.2020;Neuvedené;9;X;;;0;0;Prešovský;;Prešov
22.09.2020;Neuvedené;9;X;;;0;0;Košický;;Trebišov
22.09.2020;Neuvedené;8;X;;;0;0;Trnavský;;Senica
CSV formatted
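As a quick sanity check before ingesting, rows like the ones above can be summed per region with Python's standard `csv` module. This is only a sketch: `infected_by_region` is a hypothetical helper, the embedded `SAMPLE` is a subset of the rows shown on the slide, and it assumes the file has no header line.

```python
import csv
import io
from collections import Counter

FIELDS = ["date", "city", "infected", "gender", "note_1", "note_2",
          "healthy", "died", "region", "age", "district"]

# A few of the rows from the slide, used as test input.
SAMPLE = """\
22.09.2020;Neuvedené;61;X;;;0;0;Bratislavský;;Bratislava I
22.09.2020;Neuvedené;26;X;;;0;0;Žilinský;;Tvrdošín
22.09.2020;Neuvedené;16;X;;;0;0;Bratislavský;;Senec
22.09.2020;Neuvedené;15;X;;;0;0;Žilinský;;Námestovo
"""


def infected_by_region(text: str) -> Counter:
    """Sum the 'infected' column per region."""
    # Assumption: no header row, so the field names are supplied explicitly.
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter=";")
    totals: Counter = Counter()
    for row in reader:
        totals[row["region"]] += int(row["infected"])
    return totals


if __name__ == "__main__":
    for region, total in infected_by_region(SAMPLE).most_common():
        print(region, total)
```

The same numbers should later be visible in the Kibana dashboard, which makes this a cheap cross-check of the ingest.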
31. Geospatial setup
• Optional, but without it the map in the visualisations will show no data
• What is the GDAL library?
– Import and manipulation of geospatial data
• The tutorial requires setting up a GDAL environment and importing data
– Will be shown in the hands-on
• What GDAL provides in this use case
– Import of regions and districts from the ‘shape’ GIS files
– Ability to connect to Elasticsearch directly
– Proper setup of the mapping for the geo indices
GDAL and import of Regions and Districts
32. High level overview
• Clone the GitHub repository
• Spin up an Elasticsearch cluster with Kibana
• Set up the GDAL library (optional but recommended)
– Import Geospatial data – regions and districts
• Import all data using Logstash
• Import Milestones (will show in the hands-on)
• Import all the visualisations
• Look at the Dashboard
Howto – best to show in the following part
36. Setup – part 1
GH repository, Elasticsearch cluster, Kibana
$ git clone https://github.com/radoondas/covid-19-slovakia.git
$ cd covid-19-slovakia
# You need a cluster up and running, or set one up using Docker
$ docker-compose up -d
# Check that the cluster is up and running:
# navigate to http://127.0.0.1:5601/
37. Setup – part 2
GDAL for geospatial data
# For the GDAL setup, see the blog post.
$ ogr2ogr -lco INDEX_NAME=kraje "ES:http://localhost:9200" \
    -lco NOT_ANALYZED_FIELDS={ALL} "$(pwd)/data/kraje.json"
$ ogr2ogr -lco INDEX_NAME=obce "ES:http://localhost:9200" \
    -lco NOT_ANALYZED_FIELDS={ALL} "$(pwd)/data/obce.json"
$ ogr2ogr -lco INDEX_NAME=okresy "ES:http://localhost:9200" \
    -lco NOT_ANALYZED_FIELDS={ALL} "$(pwd)/data/okresy.json"
# I will be using a wrapper, e.g. ~/opt/bin/dogr2ogr
$ ~/opt/bin/dogr2ogr -lco INDEX_NAME=kraje "ES:http://localhost:9200" \
    -lco NOT_ANALYZED_FIELDS={ALL} "$(pwd)/data/kraje.json"
38. Setup – part 2 – check imported data
GDAL for geospatial data
# Imported indices
GET _cat/indices/obce,kraje,okresy?v
health status index  uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open   okresy uidA   1   1         79            0      5.1mb          5.1mb
yellow open   obce   uidB   1   1       2927            0     26.5mb         26.5mb
yellow open   kraje  uidC   1   1          8            0      1.8mb          1.8mb
# Create index patterns for okresy, obce and kraje?
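If you want to verify the import from a script rather than by eye, the `_cat/indices?v` text output can be parsed into dictionaries. The sketch below is a hypothetical helper (`parse_cat_table`, `doc_counts`) that works on pasted output. It assumes that every cell is non-empty, which holds for `_cat/indices` but not for every `_cat` endpoint.

```python
from typing import Dict, List

# Output copied from the slide (uuid values are placeholders there as well).
CAT_OUTPUT = """\
health status index  uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open   okresy uidA   1   1         79            0      5.1mb          5.1mb
yellow open   obce   uidB   1   1       2927            0     26.5mb         26.5mb
yellow open   kraje  uidC   1   1          8            0      1.8mb          1.8mb
"""


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Turn whitespace-separated _cat output with a header row into dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    # Assumption: no empty cells, so a plain whitespace split lines up
    # with the header columns.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def doc_counts(text: str) -> Dict[str, int]:
    """Map each index name to its document count."""
    return {row["index"]: int(row["docs.count"]) for row in parse_cat_table(text)}


if __name__ == "__main__":
    print(doc_counts(CAT_OUTPUT))
```

For scripted use against a live cluster, requesting `format=json` from the cat API avoids text parsing altogether.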
39. Setup – part 3
Annotations data
$ cd data
# Import the index template
$ curl -s -H "Content-Type: application/json" -XPUT \
    "localhost:9200/_template/milestones" --data-binary "@template_milestones.json"; echo
# You should see the message: {"acknowledged":true}
# Index the actual data
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/_bulk" \
    --data-binary "@milestones.bulk"; echo
# List the template
GET _cat/templates/milestones?v
name       index_patterns order version composed_of
milestones [milestones]   0
# List the index
GET _cat/indices/milestones?v
health status index      uuid   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   milestones uidddd   1   1         17            0      7.6kb          7.6kb
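The `_bulk` endpoint expects newline-delimited JSON: an action line followed by the document itself, with a final newline at the end of the body. The sketch below shows how a file like `milestones.bulk` could be generated. The `to_bulk` helper and the `date` and `title` fields are hypothetical, since the real structure of the milestones documents is not shown on the slides.

```python
import json
from typing import Dict, Iterable


def to_bulk(index: str, docs: Iterable[Dict]) -> str:
    """Build an Elasticsearch bulk request body (NDJSON) that indexes docs."""
    lines = []
    for doc in docs:
        # Action line: index the following document into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example documents; the field names are assumptions.
    milestones = [
        {"date": "2020-03-06", "title": "Example milestone A"},
        {"date": "2020-03-16", "title": "Example milestone B"},
    ]
    print(to_bulk("milestones", milestones), end="")
```

The resulting text can be saved to a file and posted with the same `curl ... --data-binary` call shown above.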
40. Ingest the data
Logstash with the configuration and the template
# Execute, and stop when finished
# One-time job, repeated every day :)
$ docker run --rm -it --network=host \
    -v "$(pwd)/template.json:/tmp/template.json" \
    -v "$(pwd)/data/covid-19-slovensko.csv:/tmp/covid-19-slovensko.csv" \
    -v "$(pwd)/ls.conf:/usr/share/logstash/pipeline/logstash.conf" \
    docker.elastic.co/logstash/logstash:7.8.1
# Explanation of the volume mounts
# template for the correct mapping in Elasticsearch:
#   -v $(pwd)/template.json:/tmp/template.json
# local source file mounted as the input file:
#   -v $(pwd)/data/covid-19-slovensko.csv:/tmp/covid-19-slovensko.csv
# custom Logstash configuration:
#   -v $(pwd)/ls.conf:/usr/share/logstash/pipeline/logstash.conf
41. Check the data
Kibana and the Developer tools
# Check the index
GET _cat/indices/covid-19-sk?v
health status index       uuid   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   covid-19-sk uidddd   1   1       1751            0    557.5kb        557.5kb
GET covid-19-sk/_search
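Beyond a plain `_search`, a small aggregation is a useful check that the mapping behaves as expected. The sketch below builds a request body that sums `infected` per `region`. It is only an illustration: `infected_per_region_query` is a hypothetical helper, and it assumes that `region` is mapped as a keyword field and `infected` as a numeric field, which depends on the `template.json` in the repository.

```python
import json
from typing import Dict


def infected_per_region_query(top_n: int = 10) -> Dict:
    """Request body: total infected per region, no individual hits returned."""
    return {
        "size": 0,  # only the aggregation result is needed
        "aggs": {
            "per_region": {
                # Assumption: 'region' is a keyword field.
                "terms": {"field": "region", "size": top_n},
                "aggs": {
                    # Assumption: 'infected' is mapped as a number.
                    "infected_total": {"sum": {"field": "infected"}}
                },
            }
        },
    }


if __name__ == "__main__":
    # Paste the printed JSON under "GET covid-19-sk/_search" in Dev Tools.
    print(json.dumps(infected_per_region_query(), indent=2))
```

If the aggregation fails with a fielddata error, the `region` field was most likely indexed as text, and the template should be checked.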
42. Almost there
• Go to Kibana -> Stack management -> Saved objects
• Import file data/visualisations.ndjson
• Check Saved objects
• Navigate to Kibana -> Dashboard
• Live basic navigation
Import all the visualisations and dashboard