Renzo Tomà, bol.com
How bol.com makes sense of its logs,
using the Elastic technology stack.
Renzo Tomà, Oct. 29 2015
• Renzo Tomà
• IT Operations engineer at bol.com, large webshop in the Netherlands and Belgium
• Product owner & tech lead for 2 platforms: metrics & logsearch
• Open-source user + contributor
• Husband and dad of 2 cool kids!
Pleased to meet you
ELK powers a Logsearch platform (“grep on steroids”).
Log events from many layers of our infrastructure.
Central user interface for querying: Kibana.
For software developers, system engineers & our security team (~300 potential users).
Supports development & operations co-op (sharing Kibana dashboards = 1 truth).
Bottom line: faster incident resolution = less revenue loss.
bol.com & ELK
ELK has been a first-class citizen since our datacenter rebuild went live in 2014.
Getting feeds from:
• 3 datacenters
• 5 frontend apps, 80+ services
• lots of databases
Log types: Apache and Tomcat access logging, Log4j, PostgreSQL, Oracle, syslog, …
Numbers:
• 1600+ servers emitting log events
• 500-600 million events per day, indexing peaks at 25k/sec
• 23 billion events stored, 14TB * 2 on disk
• We keep 90 days available for search.
ELK as a first-class citizen
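A quick back-of-the-envelope check puts the daily volume and the indexing peak side by side. The sketch below uses only the figures quoted above; the helper name `average_rate` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


low = average_rate(500_000_000)   # lower bound of the quoted daily volume
high = average_rate(600_000_000)  # upper bound of the quoted daily volume
peak = 25_000                     # quoted indexing peak, events per second

print(f"average: {low:,.0f} to {high:,.0f} events/sec")
print(f"peak is {peak / high:.1f}x to {peak / low:.1f}x the average")
```

In other words, the platform averages roughly 5.8k to 6.9k events per second, and the quoted peak is about four times that.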
Our high-level design
Great, but how do those events get into Redis?
In 2013: tail files & ship lines to Logstash over UDP. Lots of grokking.
Logstash (1 instance) unable to process feed in real time => data loss, incomplete events.
Need for speed & simplicity!
• Scale Logstash instances. Use Redis as message bus, to feed multiple Logstash instances.
• Reduce need for complex grok. Format events in a structured format.
In 2015: events get converted into JSON docs at the source. Our shippers run inside JVMs and DBs.
Logstash reads from Redis and decodes events. No more grokking.
Logstash out of work? No. Cleanup, enrichment (IP geolocation) and metrics generation (lag, throughput).
Struggles in log shipping
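The "JSON at the source, Redis as message bus" idea can be sketched in a few lines of Python. This is an illustration only, not the code of the actual shippers: the event shape follows the Tomcat example later in this deck, while the list key `logstash`, the use of a Redis list, and the `InMemoryBus` stand-in are assumptions. A real Redis client such as redis-py exposes an `rpush` method of the same shape.

```python
import json
from typing import Any, Dict, List


def format_event(message: str, source_host: str, fields: Dict[str, Any]) -> str:
    """Serialize one log event as a single JSON document (no grok needed later)."""
    event = {
        "@message": message,
        "@source_host": source_host,
        "@fields": fields,
    }
    return json.dumps(event)


class InMemoryBus:
    """Hypothetical stand-in for a Redis client; only mimics RPUSH on a list key."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}

    def rpush(self, key: str, *values: str) -> int:
        self.lists.setdefault(key, []).extend(values)
        return len(self.lists[key])


def ship(bus: Any, key: str, doc: str) -> None:
    """Append the JSON document to the list that the consumers read from."""
    bus.rpush(key, doc)


bus = InMemoryBus()
doc = format_event(
    "/v1/get-product/987654321",
    "pro-catalog-001",
    {"verb": "GET", "response": 200},
)
ship(bus, "logstash", doc)  # "logstash" is an assumed key name

# The consumer side only has to decode JSON; no pattern matching required.
decoded = json.loads(bus.lists["logstash"][0])
print(decoded["@fields"]["response"])
```

Because every consumer pops from the same list, adding more Logstash instances spreads the load without changing the shippers.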
Application server access logging (Tomcat):
Inside Tomcat: convert ‘hits’ into JSON doc and send to Redis: https://github.com/bolcom/redis-log-valve
Java application logging (Log4j):
Inside JVM: convert events into JSON doc and send to Redis:
https://github.com/bolcom/log4j-jsonevent-layout + https://github.com/bolcom/log4j-redis-appender
Webserver access logging (Apache):
• Custom LogFormat to output ‘hit’ as JSON: http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
• Apache sends JSON docs to external process, which sends to Redis.
Docker logging:
Shipper container: subscribes to logs for all running containers, converts events into JSON docs and sends them to Redis: https://github.com/bolcom/logspout-redis-logstash
Oracle logging:
Inside database: custom PL/SQL package with API, creates JSON docs and sends them to Redis.
PostgreSQL logging:
Inside database: hooks into logging, convert events into JSON doc and send to Redis: https://github.com/2ndquadrant-it/redislog
The logshippers we use
Each Webshop request gets tagged with Request ID.
Webshop is connected to 25 services. Request ID gets attached to all service calls.
It gets logged in many places.
Correlation time!
Search for a Request ID and see:
• initial Webshop request
• all service calls made
Including: order, parameters, status codes and response times.
Special sauce 1/2: the call stack
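The correlation step amounts to "filter on one Request ID, then order by time". In practice this is a Kibana search; the Python sketch below only illustrates the idea. The field name `request_id` and the sample events are assumptions, since the slides do not show how the Request ID is stored.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def call_stack(events: List[Event], request_id: str) -> List[Event]:
    """Return all events that carry the given request ID, oldest first."""
    matching = [e for e in events if e.get("request_id") == request_id]
    return sorted(matching, key=lambda e: e["timestamp"])


# Hypothetical sample events; "request_id" is an assumed field name.
events: List[Event] = [
    {"request_id": "r-1", "role": "catalog", "timestamp": 1443101965498,
     "response": 200, "time_in_msec": 2},
    {"request_id": "r-2", "role": "webshop", "timestamp": 1443101965495,
     "response": 200, "time_in_msec": 40},
    {"request_id": "r-1", "role": "webshop", "timestamp": 1443101965490,
     "response": 200, "time_in_msec": 15},
]

for event in call_stack(events, "r-1"):
    print(event["timestamp"], event["role"], event["response"], event["time_in_msec"])
```

The initial Webshop request comes first, followed by the service calls it triggered, each with its status code and response time.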
We have 5 frontend applications and 80+ services. Services calling services.
New services get introduced. New connections are made. Canary releases. A/B testing…
It's a living distributed architecture.
We need a map we can trust!
Let’s build a directed graph.
• Use the Tomcat access logging
• Add “A called B” information
• Elasticsearch aggregation query
• Transform the result and draw graph
Special sauce 2/2: the service map
Event emitted for every request a Tomcat Java application processes:
Tomcat access log events
{
  "@message": "/v1/get-product/987654321",
  "@source_host": "pro-catalog-001",
  "@fields": {
    "agent": "curl/7.43.0",
    "role": "catalog",
    "verb": "GET",
    "time_in_msec": 2,
    "response": 200,
    "bytes": 75,
    "client": "10.0.0.1",
    "httpversion": "HTTP/1.1",
    "time_in_sec": 0,
    "timestamp": 1443101965498
  }
}
We create a lookup table for our whole datacenter IP space:
"10.0.0.1": "webshop"
"10.0.0.2": "catalog"
…
Add a new field, using the Logstash 'translate' filter:
translate {
  dictionary_path => 'ip-to-role-mapping.yaml'
  field => 'client'
  destination => 'client_role'
}
That’s all we need.
Enrich events with external data
{
  "@message": "/v1/get-product/987654321",
  "@source_host": "pro-catalog-001",
  "@fields": {
    "agent": "curl/7.43.0",
    "role": "catalog",
    "verb": "GET",
    "time_in_msec": 2,
    "response": 200,
    "bytes": 75,
    "client": "10.0.0.1",
    "client_role": "webshop",
    "httpversion": "HTTP/1.1",
    "time_in_sec": 0,
    "timestamp": 1443101965498
  }
}
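To make the enrichment step concrete, here is a minimal Python sketch of what the lookup does to an event. It is not how Logstash implements the translate filter; the function name `enrich` is made up, and leaving unknown IPs without a `client_role` is an assumption about the desired behaviour.

```python
from typing import Any, Dict

# Lookup table in the same spirit as ip-to-role-mapping.yaml.
IP_TO_ROLE: Dict[str, str] = {
    "10.0.0.1": "webshop",
    "10.0.0.2": "catalog",
}


def enrich(event: Dict[str, Any], lookup: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the event with client_role added when the client IP is known."""
    fields = dict(event["@fields"])
    role = lookup.get(fields.get("client", ""))
    if role is not None:
        fields["client_role"] = role  # unknown IPs are left untouched (assumption)
    return {**event, "@fields": fields}


event = {
    "@message": "/v1/get-product/987654321",
    "@source_host": "pro-catalog-001",
    "@fields": {"role": "catalog", "client": "10.0.0.1"},
}
enriched = enrich(event, IP_TO_ROLE)
print(enriched["@fields"]["client_role"])
```

With both `role` (who handled the request) and `client_role` (who made it) on every event, each log line describes one "A called B" edge.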
Searching & transforming
# search query
{
  "size": 0,
  "query": { … },
  "aggs": {
    "_apps_": {
      "terms": {"field": "role"},
      "aggs": {
        "_clients_": {
          "terms": {"field": "client_role"}
        }
      }
    }
  }
}
# search result
{
  "hits": { … },
  "aggregations": {
    "_apps_": {
      "buckets": [
        {
          "key": "catalog",
          …
          "_clients_": {
            "buckets": [
              {
                "key": "webshop",
                "doc_count": 1234
              },
              …
            ]
          }
        }
      ]
    }
  }
}
# dot file
digraph {
  node [shape=box];
  "webshop" -> "catalog" [label=1234];
  "abc" -> "foo" [label=42];
  "foo" -> "bar" [label=13];
  …
}
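The transformation from aggregation result to dot file can be sketched as follows. This is an illustration, not the script used at bol.com. It assumes the bucket layout produced by the query above: the outer `_apps_` buckets are keyed by `role` (the application that handled the request) and the inner `_clients_` buckets by `client_role` (the caller), so each edge runs from caller to callee.

```python
from typing import Any, Dict, List


def to_dot(result: Dict[str, Any]) -> str:
    """Turn a nested terms-aggregation result into a Graphviz digraph."""
    lines: List[str] = ["digraph {", "  node [shape=box];"]
    for app in result["aggregations"]["_apps_"]["buckets"]:
        callee = app["key"]
        for client in app["_clients_"]["buckets"]:
            caller = client["key"]
            calls = client["doc_count"]
            # One edge per (caller, callee) pair, labelled with the call count.
            lines.append(f'  "{caller}" -> "{callee}" [label={calls}];')
    lines.append("}")
    return "\n".join(lines)


# Hand-written sample in the shape of an Elasticsearch response.
result = {
    "aggregations": {
        "_apps_": {
            "buckets": [
                {
                    "key": "catalog",
                    "doc_count": 1234,
                    "_clients_": {
                        "buckets": [{"key": "webshop", "doc_count": 1234}]
                    },
                }
            ]
        }
    }
}
print(to_dot(result))
```

The resulting text can be rendered with Graphviz, for example `dot -Tpng map.dot -o map.png`, where the file names are placeholders.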
That makes sense! (Sort of …)
Names have been obfuscated. Sorry.
That makes sense!
Renzo Tomà
rtoma@bol.com
Thanks!
