1. A Gentle Intro to ElasticSearch
Taswar Bhatti
System/Solutions Architect (Ottawa)
GEMALTO
2. Who am I?
System/Solution Architect at Gemalto Ottawa (Microsoft MVP)
I am somewhat of a language geek; I speak a few languages
Kind of like Neo (I KNOW KUNG FU) for languages
- Merhaba
- नमस्ते
- 你好
- ہیلو
- Comment ça va?
- ਸਤ ਸਰੀ ਅਕਾਲ
3. 9/14/2018 3
Reuters Top 100: Gemalto rated top Global Tech Leaders
https://www.thomsonreuters.com/en/products-services/technology/top-100.html
4. Agenda
Problem we had and wanted to solve with Elastic Stack
Intro to Elastic Stack (Ecosystem)
Logstash
Kibana
Beats
ElasticSearch flow designs that we considered
Future plans for using ElasticSearch
5. How do you Troubleshoot or find your bugs?
Typically in a distributed environment one has to go through the logs to find out where the issue is
There could be multiple systems, so you have to work out which machine/server generated the log, or monitor multiple logs at once
You may even have to monitor firewall logs to find which data center the traffic is routed through
Chuck Norris never troubleshoots; the troubles kill themselves when they see him coming
7. Our Problem
We had distributed systems (microservices) that would generate many different types of
logs, in different data centers
We also had authentication audit logs that had to be secure and stored for 1 year
We generate around 2 million audit log records a day, 4TB with replication
We need to generate reports out of our data for customers
We were still using a monolithic solution in some core parts of the application
Growing pains of a successful application
We want to use a centralized scalable logging system for all our logs
9. A little history of ElasticSearch
Shay Banon created Compass in 2004
Released the first version of ElasticSearch in 2010
ElasticSearch the company was formed in 2012
Shay's wife is still waiting for her recipe app
14. ElasticSearch indices
Elastic organizes documents in indices
Lucene writes and maintains the index files
ElasticSearch writes and maintains metadata on top of Lucene
Example: field mappings, index settings and other cluster metadata
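The settings and mappings mentioned above can be sketched as the JSON body of an index-creation request. This is only an illustration: the index name and field names are hypothetical, and the layout shown is the typeless mapping format of ElasticSearch 7 and later (6.x nests `properties` under a type name).

```python
import json

# Hypothetical index name and fields, chosen only for illustration.
index_name = "audit-logs"

create_index_body = {
    "settings": {
        "number_of_shards": 3,    # primary shards the index is split into
        "number_of_replicas": 1,  # copies kept of each primary shard
    },
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "user": {"type": "keyword"},  # exact-match field
            "message": {"type": "text"},  # analyzed full-text field
        }
    },
}

# This JSON would be sent as the body of a PUT request to /<index_name>.
print(f"PUT /{index_name}")
print(json.dumps(create_index_body, indent=2))
```

No request is sent here; the sketch only builds and prints the metadata that ElasticSearch would store on top of the Lucene index files.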
16. Elastic Concepts
Cluster : A cluster is a collection of one or more nodes (servers)
Node : A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities
Index : An index is a collection of documents that have somewhat similar characteristics (e.g. Product, Customer, etc.)
Type : Within an index, you can define one or more types. A type is a logical category/partition of your index.
Document : A document is a basic unit of information that can be indexed
Shard/Replica : An index is divided into multiple pieces called shards; replicas are copies of your shards
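To make the shard and replica idea concrete, here is a minimal sketch of how a document could be assigned to a primary shard and how many shard copies a cluster ends up holding. ElasticSearch routes with its own hash of the routing value (by default the document id) modulo the number of primary shards; the CRC32 hash below is only a stand-in, and both function names are hypothetical.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a primary shard.

    zlib.crc32 is a stand-in for illustration; it is not the hash
    ElasticSearch uses internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard exists once plus once per configured replica."""
    return primaries * (1 + replicas)


for doc_id in ["customer-1", "customer-2", "product-42"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))

print("shard copies in cluster:", total_shard_copies(primaries=5, replicas=1))
```

Because the shard is derived from the number of primary shards, that number cannot simply be changed on an existing index, while the replica count can.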
17. Elastic nodes
Master Node : The master node controls the cluster
Data Node : Data nodes hold data and perform data related operations such as CRUD, search, and aggregations
Ingest Node : Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing
Coordinating Node : Coordinating nodes only route requests, handle the search reduce phase, and distribute bulk indexing
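The "search reduce phase" handled by a coordinating node can be pictured as merging the best hits returned by each shard into one global result. The sketch below is a simplified illustration with made-up scores and document ids, not ElasticSearch's actual implementation.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id)


def reduce_top_hits(per_shard_hits: Dict[str, List[Hit]], size: int) -> List[Hit]:
    """Merge the hits from every shard and keep the `size` best by score."""
    all_hits = (hit for hits in per_shard_hits.values() for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


# Hypothetical per-shard results, as if each data node had answered the query.
shard_results = {
    "shard-0": [(2.1, "doc-7"), (0.9, "doc-3")],
    "shard-1": [(3.4, "doc-12"), (1.5, "doc-8")],
    "shard-2": [(1.8, "doc-21")],
}

print(reduce_top_hits(shard_results, size=3))
```

The data nodes do the per-shard search work; the coordinating node only scatters the request and gathers and reduces the answers.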
23. LOGSTASH
Ruby application that runs under JRuby on the JVM
Collects, parses, and enriches data
Horizontally scalable
Apache 2.0 License
Large number of public plugins written by the community
https://github.com/logstash-plugins
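The collect, parse, enrich flow can be illustrated with a few lines of Python. This is not Logstash configuration; it is a sketch of the same idea, and the log line format, the regular expression, and the host-to-data-center lookup table are all assumptions made for the example.

```python
import re
from typing import Optional

# Assumed line format: "<timestamp> <LEVEL> <free text message>".
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

# Hypothetical lookup table used for enrichment.
HOST_TO_DATACENTER = {"web-01": "ottawa", "web-02": "azure-east"}


def parse(line: str) -> Optional[dict]:
    """Turn a raw log line into structured fields, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


def enrich(event: dict, host: str) -> dict:
    """Add context that was not present in the raw line."""
    enriched = dict(event)
    enriched["host"] = host
    enriched["datacenter"] = HOST_TO_DATACENTER.get(host, "unknown")
    return enriched


raw = "2018-09-14T10:15:00Z ERROR Login failed for user alice"
event = enrich(parse(raw), host="web-01")
print(event)
```

In Logstash the parsing step would typically be a filter plugin and the enrichment another filter, with the result handed to an output such as ElasticSearch.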
31. Beats
Lightweight shippers written in Golang (non-JVM shops can use them)
They follow the Unix philosophy: do one specific thing, and do it well
Filebeat : Log files (think of it as tail -f on steroids)
Metricbeat : CPU, memory (like top), Redis, MongoDB usage
Packetbeat : Network packets (like Wireshark, it uses libpcap), monitoring protocols such as HTTP
Winlogbeat : Windows event logs to Elastic
Dockbeat : Monitoring Docker
Large community; lots of other beats are offered as open source
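The "tail -f" comparison for Filebeat can be sketched as a reader that remembers a byte offset per file and ships only complete lines appended since the last read. This is a simplified illustration of the idea, not Filebeat's implementation; the function name is hypothetical.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended after `offset` and the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Stop at the last newline so a half-written line is picked up next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as folder:
    log_path = os.path.join(folder, "app.log")

    with open(log_path, "wb") as log:
        log.write(b"first\n")
    lines_1, offset = read_new_lines(log_path, 0)

    # The application appends one full line and one line still being written.
    with open(log_path, "ab") as log:
        log.write(b"second\npartial")
    lines_2, offset = read_new_lines(log_path, offset)

    print(lines_1, lines_2, offset)
```

Persisting the offset is what lets a shipper resume after a restart without resending or losing lines.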
34. X-Pack
Elastic's commercial offering (this is one of the ways they make money)
X-Pack is an Elastic Stack extension that bundles:
Security (HTTPS to Elastic, password to access Kibana)
Alerting
Monitoring
Reporting
Graph capabilities
Machine Learning
36. Kibana
Visual application for ElasticSearch (JS, Angular, D3)
Powerful dashboard frontend for visualizing index information from ElasticSearch
Turns historical data into charts, graphs, etc.
Realtime search of index information
39. Designs we went through
We started with a simple design to measure throughput
One instance of Logstash and one instance of ElasticSearch with Filebeat
40. Dotnet Core app
We used a .NET Core application to generate logs
Serilog wrote the logs in JSON format to a file
Filebeat was installed on the Linux machine to ship the logs to Logstash
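The key idea in this setup is that the application writes one JSON object per log line, so the shipper and Logstash do not need to parse free text. The talk used Serilog in .NET Core; the sketch below shows the same idea with Python's standard `logging` module as a stand-in, and the field names are assumptions rather than Serilog's output format.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for the log file that Filebeat would watch
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonLineFormatter())

logger = logging.getLogger("demo.audit")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("User %s logged in", "alice")

first_event = json.loads(stream.getvalue().splitlines()[0])
print(first_event["level"], first_event["message"])
```

In a real deployment the handler would write to a file on disk, and Filebeat would ship each line to Logstash.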
53. Considerations of data
Indexing by day makes sense in some cases
In others you may want to roll indices by size instead (Black Friday brings more traffic than other days); ElasticSearch doesn't like it when shards are unbalanced
Don't index everything: if you are not going to run full-text searches on a field, map it as keyword rather than text, or disable indexing for it
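These considerations can be sketched in a few lines: a daily index name, a size-based rollover check, and a mapping that avoids unnecessary indexing work. The index prefix, size threshold, and field names are hypothetical; the `text` and `keyword` types and the `index` mapping parameter are standard ElasticSearch mapping options.

```python
from datetime import date


def daily_index_name(prefix: str, day: date) -> str:
    """Index-per-day naming, in the style of logstash-YYYY.MM.dd."""
    return f"{prefix}-{day:%Y.%m.%d}"


def should_roll_over(current_size_gb: float, max_size_gb: float) -> bool:
    """Size-based alternative: start a new index once the current one is big enough."""
    return current_size_gb >= max_size_gb


# Hypothetical fields: analyze only what needs full-text search.
field_mappings = {
    "message": {"type": "text"},                         # analyzed for full-text search
    "user": {"type": "keyword"},                         # exact matches and aggregations
    "raw_payload": {"type": "keyword", "index": False},  # stored but not searchable
}

print(daily_index_name("audit", date(2018, 9, 14)))
print(should_roll_over(current_size_gb=52.0, max_size_gb=50.0))
```

Rolling by size keeps shards at a similar size on busy and quiet days alike, at the cost of index names that no longer map one-to-one to calendar days.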
54. Future Considerations
Investigate ElasticSearch machine learning
ElasticSearch with Kafka for cross data center replication
Logstash centralized pipelines for SIEM integrations
Pros
Classic, proven to work
Redis in the middle provides better reliability:
Offloads Logstash shipper's queue
Protects against DC-Azure network outages
Protects ES cluster from high activity peaks
Cons
Logstash on app servers can be heavy (Java required)
Need to scale Redis if traffic overgrows its capacity
Pros
Filebeat is lightweight, no Java
Filebeat has a retry mechanism
Redis adds additional reliability
Cons
Need to scale Redis if traffic overgrows its capacity (RAM)
Filebeat is new, might have glitches
Filebeat is currently not able to handle multi-line log entries
This feature is expected to be released in v1.1
Pros
Less to set up and maintain
Easier to update processing rules in one place (central Logstash)
Easier to evolve
Could evolve into approach #2 (with Redis)
Compatible with PCI-DSS
Future versions of Logstash will have internal buffer queue (alt. to Redis)
Cons
The central Logstash instance needs to be scaled up/out at some point
Pros
Reuse of existing log pipeline
Cons
Not 100% reliable since UDP is used for transport
Pros
Easier to update processing rules in one place (central Logstash)
More reliable (Logstash protocol)
Cons
Might need to scale central Logstash at some point