SlideShare una empresa de Scribd logo
1 de 23
Descargar para leer sin conexión
ELK
Log processing at Scale
#DevOpsDays 2015, Singapore
@DevOpsDaysSG
Angad Singh
About me
DevOps at Viki, Inc - A global
video streaming site with
subtitles.
Previously a Twitter SRE,
National University of Singapore
Twitter @angadsg,
Github @angad
Elasticsearch - Log Indexing and Searching
Logstash - Log Ingestion plumbing
Kibana - Frontend
{
Metrics vs Logging
Metrics
● Numeric timeseries data
● Actionable
● Counts, Statistical (p90, p99 etc.)
● Scalable cost-effective solutions
already available
Logging
● Useful for debugging
● Catch-all
● Full text searching
● Computationally intensive, harder
to scale
Metrics vs Logging
Metrics
● Numeric timeseries data
● Actionable
● Counts, Statistical (p90, p99 etc.)
● Scalable cost-effective solutions
already available
Alerting and Monitoring at Viki
Deeper level
debugging with
application logs
Success Rate
Alert for
service X
Logs
● Application logs - Stack Traces, Handled Exceptions
● Access Logs - Status codes, URI, HTTP Method at all levels of the stack
● Client Logs - Direct HTTP requests containing log events from client-side
Javascript or Mobile application (android/ios)
● Standardized log format to JSON - easy to add / remove fields.
● Request tracing through various services using Unique-ID at Load Balancer
● Log aggregator
● Log preprocessing
(Filtering etc.)
● 3 stage pipeline
● Input > Filter > Output
Logstash
● Log aggregator
● Log preprocessing
(Filtering etc.)
● 3 stage pipeline
● Input > Filter > Output
Logstash Elasticsearch
● Full text searching and
indexing
● on top of Apache
Lucene
● RESTful web interface
● Horizontally scalable
● Log aggregator
● Log preprocessing
(Filtering etc.)
● 3 stage pipeline
● Input > Filter > Output
Logstash Elasticsearch
● Full text searching and
indexing
● on top of Apache
Lucene
● RESTful web interface
● Horizontally scalable
Kibana
● Frontend
● Visualizations,
Dashboards
● Supports Geo
visualizations
● Uses ES REST API
Input
Any Stream
● local file
● queue
● tcp, udp
● twitter
● etc..
Logstash
Filter
Mutation
● add/remove field
● parse as json
● ruby code
● parse geoip
● etc..
Output
● elasticsearch
● redis
● queue
● file
● pagerduty
● etc..
● Golang program that sits next to log files, lumberjack protocol
● Forwards logs from a file to a logstash server
● Removes the need for a buffer (such as redis, or a queue) for
logs pending ingestion to logstash.
● Docker container with volume mounted /var/log.
Configuration stored in Consul.
● Application containers with volume mounted /var/log to
/var/log/docker/<container>/application.log
Logstash Forwarder
Logstash pool with HAProxy
4 x logstash machines, 8 cores, 16 GB
RAM
7 x logstash processes per machine, 5 for
application logs, 2 for HTTP client logs.
Fronted by HAProxy for both lumberjack
protocol as well as HTTP protocol.
Easily scalable by adding more machines
and spinning up more logstash processes.
Application
Service
Container 1
Application
Service
Container 2
Logstash-Forwarder
Container
Mounted /var/log
to
/var/log/docker/
on host
Elasticsearch Hardware
12 core, 64GB RAM with RAID 0 - 2 x 3TB 7200rpm disks.
20 nodes, 20 shards, 3 replicas (with 1 primary).
Each day ~300GB x 4 copies (3 + 1) ~ 3 months of data on 120TB.
Average 6k-8k logs per second, peak 25k logs per second.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
Elasticsearch Hardware
● < 30.5 GB Heap - JAVA compressed pointers below 30.5GB heap
● Sweet spot - 64GB of RAM with half available for Lucene file buffers.
● SSD or RAID 0 (or multiple path directories similar to RAID 0).
● If SSD then set I/O scheduler to deadline instead of cfq.
● RAID0 - no need to worry about disks failing as machines can easily be
replaced due to multiple copies of data.
● Disable swap.
Hardware Tuning
● 20 days of indexes open based on available memory, rest closed - open on
demand
● Field data - cache used while sorting and aggregating data.
● Circuit breaker - cancels requests which require large memory, prevent OOM,
http://elasticsearch:9200/_cache/clear if field data is very close to memory
limit.
● Shards >= Number of nodes
● Lucene forceMerge - minor performance improvements for older indexes
(https://www.elastic.co/guide/en/elasticsearch/client/curator/current/optimize.
html)
Elasticsearch Configuration
Prevent split brain situation to avoid losing data - set minimum number of master
eligible nodes to (n/2 + 1)
Set higher ulimit for elasticsearch process
Daily cronjob which deletes data older than 90 days, closes indices older than 20
days, optimizes (forceMerge) indices older than 2 days
And also...
Marvel - Official plugin from Elasticsearch
KOPF - Index management plugin
CAT APIs - REST APIs to view cluster information
Curator - Data management
Monitoring
Thanks
email: angad@viki.com
twitter: @angadsg

Más contenido relacionado

La actualidad más candente

Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Matt Fuller
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge DatabasesLynn Langit
 
Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Piotr Findeisen
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stackSimon Hanmer
 
An Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs AnalysisAn Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs AnalysisJosé Manuel Ciges Regueiro
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Prestokbajda
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms_mdev_
 
ELK in Security Analytics
ELK in Security Analytics ELK in Security Analytics
ELK in Security Analytics nullowaspmumbai
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...HostedbyConfluent
 
Rolling With Riak
Rolling With RiakRolling With Riak
Rolling With RiakJohn Lynch
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMinsk MongoDB User Group
 
The Elastic Stack as a SIEM
The Elastic Stack as a SIEMThe Elastic Stack as a SIEM
The Elastic Stack as a SIEMJohn Hubbard
 
Lightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at CogentaLightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at CogentaYann Cluchey
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at TwitterBill Graham
 
ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較Yoshiyasu SAEKI
 
Scaling with Riak at Showyou
Scaling with Riak at ShowyouScaling with Riak at Showyou
Scaling with Riak at ShowyouJohn Muellerleile
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingJen Aman
 

La actualidad más candente (20)

Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stack
 
An Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs AnalysisAn Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs Analysis
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Presto
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
 
ELK in Security Analytics
ELK in Security Analytics ELK in Security Analytics
ELK in Security Analytics
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Rolling With Riak
Rolling With RiakRolling With Riak
Rolling With Riak
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
The Elastic Stack as a SIEM
The Elastic Stack as a SIEMThe Elastic Stack as a SIEM
The Elastic Stack as a SIEM
 
Lightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at CogentaLightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at Cogenta
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較
 
Scaling with Riak at Showyou
Scaling with Riak at ShowyouScaling with Riak at Showyou
Scaling with Riak at Showyou
 
Security Analytics using ELK stack
Security Analytics using ELK stack	Security Analytics using ELK stack
Security Analytics using ELK stack
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 

Similar a Scaling ELK Stack - DevOpsDays Singapore

Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Silverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applicationsSilverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applicationsBrettTasker
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysisDivante
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®confluent
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Ceph Community
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance ComputingLuciano Mammino
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013Syed Shaaf
 

Similar a Scaling ELK Stack - DevOpsDays Singapore (20)

Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Silverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applicationsSilverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applications
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013
 

Último

APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasDigicorns Technologies
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样ayvbos
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxgalaxypingy
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...kajalverma014
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdfMatthew Sinclair
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制pxcywzqs
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsMonica Sydney
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsMonica Sydney
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.krishnachandrapal52
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptxAsmae Rabhi
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"growthgrids
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoilmeghakumariji156
 

Último (20)

APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptx
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 

Scaling ELK Stack - DevOpsDays Singapore

  • 1. ELK Log processing at Scale #DevOpsDays 2015, Singapore @DevOpsDaysSG Angad Singh
  • 2. About me DevOps at Viki, Inc - A global video streaming site with subtitles. Previously a Twitter SRE, National University of Singapore Twitter @angadsg, Github @angad
  • 3. Elasticsearch - Log Indexing and Searching Logstash - Log Ingestion plumbing Kibana - Frontend {
  • 4. Metrics vs Logging Metrics ● Numeric timeseries data ● Actionable ● Counts, Statistical (p90, p99 etc.) ● Scalable cost-effective solutions already available
  • 5. Logging ● Useful for debugging ● Catch-all ● Full text searching ● Computationally intensive, harder to scale Metrics vs Logging Metrics ● Numeric timeseries data ● Actionable ● Counts, Statistical (p90, p99 etc.) ● Scalable cost-effective solutions already available
  • 6. Alerting and Monitoring at Viki Deeper level debugging with application logs Success Rate Alert for service X
  • 7. Logs ● Application logs - Stack Traces, Handled Exceptions ● Access Logs - Status codes, URI, HTTP Method at all levels of the stack ● Client Logs - Direct HTTP requests containing log events from client-side Javascript or Mobile application (android/ios) ● Standardized log format to JSON - easy to add / remove fields. ● Request tracing through various services using Unique-ID at Load Balancer
  • 8. ● Log aggregator ● Log preprocessing (Filtering etc.) ● 3 stage pipeline ● Input > Filter > Output Logstash
  • 9. ● Log aggregator ● Log preprocessing (Filtering etc.) ● 3 stage pipeline ● Input > Filter > Output Logstash Elasticsearch ● Full text searching and indexing ● on top of Apache Lucene ● RESTful web interface ● Horizontally scalable
  • 10. ● Log aggregator ● Log preprocessing (Filtering etc.) ● 3 stage pipeline ● Input > Filter > Output Logstash Elasticsearch ● Full text searching and indexing ● on top of Apache Lucene ● RESTful web interface ● Horizontally scalable Kibana ● Frontend ● Visualizations, Dashboards ● Supports Geo visualizations ● Uses ES REST API
  • 11.
  • 12. Input Any Stream ● local file ● queue ● tcp, udp ● twitter ● etc.. Logstash Filter Mutation ● add/remove field ● parse as json ● ruby code ● parse geoip ● etc.. Output ● elasticsearch ● redis ● queue ● file ● pagerduty ● etc..
  • 13. ● Golang program that sits next to log files, lumberjack protocol ● Forwards logs from a file to a logstash server ● Removes the need for a buffer (such as redis, or a queue) for logs pending ingestion to logstash. ● Docker container with volume mounted /var/log. Configuration stored in Consul. ● Application containers with volume mounted /var/log to /var/log/docker/<container>/application.log Logstash Forwarder
  • 14. Logstash pool with HAProxy 4 x logstash machines, 8 cores, 16 GB RAM 7 x logstash processes per machine, 5 for application logs, 2 for HTTP client logs. Fronted by HAProxy for both lumberjack protocol as well as HTTP protocol. Easily scalable by adding more machines and spinning up more logstash processes.
  • 16. Elasticsearch Hardware 12 core, 64GB RAM with RAID 0 - 2 x 3TB 7200rpm disks. 20 nodes, 20 shards, 3 replicas (with 1 primary). Each day ~300GB x 4 copies (3 + 1) ~ 3 months of data on 120TB. Average 6k-8k logs per second, peak 25k logs per second. https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
  • 18. ● < 30.5 GB Heap - JAVA compressed pointers below 30.5GB heap ● Sweet spot - 64GB of RAM with half available for Lucene file buffers. ● SSD or RAID 0 (or multiple path directories similar to RAID 0). ● If SSD then set I/O scheduler to deadline instead of cfq. ● RAID0 - no need to worry about disks failing as machines can easily be replaced due to multiple copies of data. ● Disable swap. Hardware Tuning
  • 19. ● 20 days of indexes open based on available memory, rest closed - open on demand ● Field data - cache used while sorting and aggregating data. ● Circuit breaker - cancels requests which require large memory, prevent OOM, http://elasticsearch:9200/_cache/clear if field data is very close to memory limit. ● Shards >= Number of nodes ● Lucene forceMerge - minor performance improvements for older indexes (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/optimize. html) Elasticsearch Configuration
  • 20. Prevent split brain situation to avoid losing data - set minimum number of master eligible nodes to (n/2 + 1) Set higher ulimit for elasticsearch process Daily cronjob which deletes data older than 90 days, closes indices older than 20 days, optimizes (forceMerge) indices older than 2 days And also...
  • 21.
  • 22. Marvel - Official plugin from Elasticsearch KOPF - Index management plugin CAT APIs - REST APIs to view cluster information Curator - Data management Monitoring