SlideShare a Scribd company logo
1 of 11
Download to read offline
ELASTICSEARCH @OPEN19
14 May 2019
§ Why Elastic?
§ Use cases:
– Elasticsearch for troubleshooting
– Elasticsearch for trending (metrics)
– Monitoring ELK stack
§ Setup
– Implementation diagram
– Details
– HW migration strategy
– Numbers
– Alerting
– Intelligent alerts
OUTLINE
2
– Central location of logs
– To allow easier troubleshooting of infrastructure/apps
– No need to login to different systems to check the logs
> keep logs longer then allowed by local diskspace on app servers
– Implementation
> Partnered with Kangaroot for design/implementation
– Using Ansible for deployment/upgrades
> Entreprise support via Elastic
WHY ELASTIC
3
§ Log analysis of F5 access logs
– Graphs/Alerts on average response times for web
apps
– Heavily used by Operations
§ VMware logs
– vCenter logs for auditing reasons (Oracle
licensing)
– when ESXi crashes you might lose your logs
§ Network & storage device logs
§ Kafka broker monitoring
– {metric,file}beat
§ Monitoring Elastic itself
– Logstash filebeat, Elastic nodes, Kibana nodes,
Elastic cluster health
§ Application logs for developers to allow easier
troubleshooting
– Weblogic, Tomcat, JBoss/WildFly, AEM, …
§ Generate alerts towards entreprise monitoring
solution using watches
§ Replacement of GSA with a custom API with
Elastic backend
USE CASES
4
5
§ ELK implementation diagram
Shipper
Shipper
Indexer
Indexer
Indexer
§ Logstash
– Shipper layer uses 1 pipeline
– Index layer uses multiple pipelines
> Grok filters for parsing logfiles, need some logging standards
Alternative
> Use native json logging format
– Monitoring via x-pack
> destination: Elastic monitoring cluster
§ Kafka
– Monitoring using filebeat/metricbeat
> destination: Elastic cluster, bypassing Logstash/Kafka
§ Kibana
– Using coordination-only node
– Loadbalance queries across Elastic nodes
DETAILS
6
§ Setup new independent cluster on new HW (master nodes, data nodes, kibana)
§ Setup new logstash indexer layer using a unique group_id (different kafka consumer_id)
§ Migrate index patterns, existing roles, index templates, visualizations & dashboards, watches
§ data sources need no modification
§ Data is ingested to both clusters
– Allows for testing new Hardware without impact on current cluster
– data migration of older data if needed using snapshot/restore
– Minimal to no data migration by running in parallel for time of data retention
– Once done => switch Kibana VIP from old Kibana to new Kibana instance
HW MIGRATION STRATEGY
7
§ PRD cluster
– 7 physical warm datanodes
– 3 physical hot datanodes
– 3 dedicated virtual master nodes
§ Currently running version 6.5
§ Retention:
– 30-days of data for infrastructure related logs
– 3 weeks of data for application logs
– Few months for metrics
§ Current replicated datavolume: 32TB
§ Roughly 850 GB/day incoming logs
§ 7000 events/s for F5 access logs => daily replicated volume: 500 to 600 Gb/day
§ 3200 events/s for VMware logs => daily replicated volume: 350 Gb/day
§ 500 events/s for Metricbeat => monthly replicated volume: 400 Gb
NUMBERS
8
§ WATCHER
– Input
> Search (Elastic query)
> Http request
– Trigger
> Time based: when to execute watcher (e.g. every 5min)
– Condition
> When to execute action against
– Action to take if condition is met
> log message to file
> send e-mail
> notification to Chat tool (e.g. Slack)
> Call to Webhook
ALERTING
9
§ Alerts are typically static
– E.g. cpu usage should be below 90%, response times should be below 0.5s
– Not aware of periodicity, e.g. billing cycle, weekends, …
§ Enter machine learning (ML)
– Creates a ML model that recognizes periodicity, can do forecasting
– Anomaly detection, visually identify anomalies using heatmap
– Simple ML jobs
> based on 1 metric
– Multi metric ML jobs:
> split a single time series into multiple time series based on a categorical field.
INTELLIGENT ALERTS
10
THANK YOU

More Related Content

What's hot

Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
HostedbyConfluent
 

What's hot (20)

[WSO2Con USA 2018] Deploying Applications in K8S and Docker
[WSO2Con USA 2018] Deploying Applications in K8S and Docker[WSO2Con USA 2018] Deploying Applications in K8S and Docker
[WSO2Con USA 2018] Deploying Applications in K8S and Docker
 
Enhancing Kubernetes with Autoscaling & Hybrid Cloud IaaS
Enhancing Kubernetes with Autoscaling & Hybrid Cloud IaaSEnhancing Kubernetes with Autoscaling & Hybrid Cloud IaaS
Enhancing Kubernetes with Autoscaling & Hybrid Cloud IaaS
 
CDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCCDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaC
 
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWSAWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
AWS re:Invent re:Cap 2019: My ElasticSearch Journey on AWS
 
Implementing an Automated Staging Environment
Implementing an Automated Staging EnvironmentImplementing an Automated Staging Environment
Implementing an Automated Staging Environment
 
Ceilometer Updates - Kilo Edition
Ceilometer Updates - Kilo EditionCeilometer Updates - Kilo Edition
Ceilometer Updates - Kilo Edition
 
Web後端技術的演變
Web後端技術的演變Web後端技術的演變
Web後端技術的演變
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
 
OSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
OSDC 2018 | Monitoring Kubernetes at Scale by Monica SarbuOSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
OSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
 
Persist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemPersist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystem
 
Heat Updates - Liberty Edition
Heat Updates - Liberty EditionHeat Updates - Liberty Edition
Heat Updates - Liberty Edition
 
Nova Updates - Kilo Edition
Nova Updates - Kilo EditionNova Updates - Kilo Edition
Nova Updates - Kilo Edition
 
Glance Updates - Liberty Edition
Glance Updates - Liberty EditionGlance Updates - Liberty Edition
Glance Updates - Liberty Edition
 
Container Management - Federico Simoncelli - ManageIQ Design Summit 2016
Container Management - Federico Simoncelli - ManageIQ Design Summit 2016Container Management - Federico Simoncelli - ManageIQ Design Summit 2016
Container Management - Federico Simoncelli - ManageIQ Design Summit 2016
 
Kubernetes User Group: 維運 Kubernetes 的兩三事
Kubernetes User Group: 維運 Kubernetes 的兩三事Kubernetes User Group: 維運 Kubernetes 的兩三事
Kubernetes User Group: 維運 Kubernetes 的兩三事
 
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkA Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
 
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
 
Serverless stream processing of Debezium data change events with Knative | De...
Serverless stream processing of Debezium data change events with Knative | De...Serverless stream processing of Debezium data change events with Knative | De...
Serverless stream processing of Debezium data change events with Knative | De...
 
The evolving container landscape
The evolving container landscapeThe evolving container landscape
The evolving container landscape
 

Similar to 4 - Customer story: Telenet

NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
Paresh Patel
 

Similar to 4 - Customer story: Telenet (20)

OVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud DatabasesOVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud Databases
 
Best Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and DeltaBest Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and Delta
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0
 
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
 
Galera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replicationGalera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replication
 
Securing Hadoop @eBay
Securing Hadoop @eBaySecuring Hadoop @eBay
Securing Hadoop @eBay
 
Getting Started with Splunk
Getting Started with SplunkGetting Started with Splunk
Getting Started with Splunk
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
 
Présentation ELK/SIEM et démo Wazuh
Présentation ELK/SIEM et démo WazuhPrésentation ELK/SIEM et démo Wazuh
Présentation ELK/SIEM et démo Wazuh
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Building Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EU
Building Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EUBuilding Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EU
Building Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EU
 
10 Tips for Your Journey to the Public Cloud
10 Tips for Your Journey to the Public Cloud10 Tips for Your Journey to the Public Cloud
10 Tips for Your Journey to the Public Cloud
 
Aerospike AdTech Gets Hacked in Lower Manhattan
Aerospike AdTech Gets Hacked in Lower ManhattanAerospike AdTech Gets Hacked in Lower Manhattan
Aerospike AdTech Gets Hacked in Lower Manhattan
 
You Snooze You Lose or How to Win in Ad Tech?
You Snooze You Lose or How to Win in Ad Tech?You Snooze You Lose or How to Win in Ad Tech?
You Snooze You Lose or How to Win in Ad Tech?
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
 
Real-Time Health Score Application using Apache Spark on Kubernetes
Real-Time Health Score Application using Apache Spark on KubernetesReal-Time Health Score Application using Apache Spark on Kubernetes
Real-Time Health Score Application using Apache Spark on Kubernetes
 

More from Kangaroot

More from Kangaroot (20)

So you think you know SUSE?
So you think you know SUSE?So you think you know SUSE?
So you think you know SUSE?
 
Live demo: Protect your Data
Live demo: Protect your DataLive demo: Protect your Data
Live demo: Protect your Data
 
RootStack - Devfactory
RootStack - DevfactoryRootStack - Devfactory
RootStack - Devfactory
 
Welcome at OPEN'22
Welcome at OPEN'22Welcome at OPEN'22
Welcome at OPEN'22
 
EDB Postgres in Public Sector
EDB Postgres in Public SectorEDB Postgres in Public Sector
EDB Postgres in Public Sector
 
Deploying NGINX in Cloud Native Kubernetes
Deploying NGINX in Cloud Native KubernetesDeploying NGINX in Cloud Native Kubernetes
Deploying NGINX in Cloud Native Kubernetes
 
Cloud demystified, what remains after the fog has lifted.
Cloud demystified, what remains after the fog has lifted.  Cloud demystified, what remains after the fog has lifted.
Cloud demystified, what remains after the fog has lifted.
 
Zimbra at Kangaroot / OPEN{virtual}
Zimbra at Kangaroot / OPEN{virtual}Zimbra at Kangaroot / OPEN{virtual}
Zimbra at Kangaroot / OPEN{virtual}
 
NGINX Controller: faster deployments, fewer headaches
NGINX Controller: faster deployments, fewer headachesNGINX Controller: faster deployments, fewer headaches
NGINX Controller: faster deployments, fewer headaches
 
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot EDB Webinar Best Practices in Security with PostgreSQLKangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
 
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
 
Red Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShiftRed Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShift
 
There is no such thing as “Vanilla Kubernetes”
There is no such thing as “Vanilla Kubernetes”There is no such thing as “Vanilla Kubernetes”
There is no such thing as “Vanilla Kubernetes”
 
Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)
 
Hashicorp Vault - OPEN Public Sector
Hashicorp Vault - OPEN Public SectorHashicorp Vault - OPEN Public Sector
Hashicorp Vault - OPEN Public Sector
 
Kangaroot - Bechtle kadercontracten
Kangaroot - Bechtle kadercontractenKangaroot - Bechtle kadercontracten
Kangaroot - Bechtle kadercontracten
 
Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 8Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 8
 
Kangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefieldKangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefield
 
Kubecontrol - managed Kubernetes by Kangaroot
Kubecontrol - managed Kubernetes by KangarootKubecontrol - managed Kubernetes by Kangaroot
Kubecontrol - managed Kubernetes by Kangaroot
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platform
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

4 - Customer story: Telenet

  • 2. § Why Elastic? § Use cases: – Elasticsearch for troubleshooting – Elasticsearch for trending (metrics) – Monitoring ELK stack § Setup – Implementation diagram – Details – HW migration strategy – Numbers – Alerting – Intelligent alerts OUTLINE 2
  • 3. – Central location of logs – To allow easier troubleshooting of infrastructure/apps – No need to login to different systems to check the logs > keep logs longer then allowed by local diskspace on app servers – Implementation > Partnered with Kangaroot for design/implementation – Using Ansible for deployment/upgrades > Entreprise support via Elastic WHY ELASTIC 3
  • 4. § Log analysis of F5 access logs – Graphs/Alerts on average response times for web apps – Heavily used by Operations § VMware logs – vCenter logs for auditing reasons (Oracle licensing) – when ESXi crashes you might lose your logs § Network & storage device logs § Kafka broker monitoring – {metric,file}beat § Monitoring Elastic itself – Logstash filebeat, Elastic nodes, Kibana nodes, Elastic cluster health § Application logs for developers to allow easier troubleshooting – Weblogic, Tomcat, JBoss/WildFly, AEM, … § Generate alerts towards entreprise monitoring solution using watches § Replacement of GSA with a custom API with Elastic backend USE CASES 4
  • 5. 5 § ELK implementation diagram Shipper Shipper Indexer Indexer Indexer
  • 6. § Logstash – Shipper layer uses 1 pipeline – Index layer uses multiple pipelines > Grok filters for parsing logfiles, need some logging standards Alternative > Use native json logging format – Monitoring via x-pack > destination: Elastic monitoring cluster § Kafka – Monitoring using filebeat/metricbeat > destination: Elastic cluster, bypassing Logstash/Kafka § Kibana – Using coordination-only node – Loadbalance queries across Elastic nodes DETAILS 6
  • 7. § Setup new independent cluster on new HW (master nodes, data nodes, kibana) § Setup new logstash indexer layer using a unique group_id (different kafka consumer_id) § Migrate index patterns, existing roles, index templates, visualizations & dashboards, watches § data sources need no modification § Data is ingested to both clusters – Allows for testing new Hardware without impact on current cluster – data migration of older data if needed using snapshot/restore – Minimal to no data migration by running in parallel for time of data retention – Once done => switch Kibana VIP from old Kibana to new Kibana instance HW MIGRATION STRATEGY 7
  • 8. § PRD cluster – 7 physical warm datanodes – 3 physical hot datanodes – 3 dedicated virtual master nodes § Currently running version 6.5 § Retention: – 30-days of data for infrastructure related logs – 3 weeks of data for application logs – Few months for metrics § Current replicated datavolume: 32TB § Roughly 850 GB/day incoming logs § 7000 events/s for F5 access logs => daily replicated volume: 500 to 600 Gb/day § 3200 events/s for VMware logs => daily replicated volume: 350 Gb/day § 500 events/s for Metricbeat => monthly replicated volume: 400 Gb NUMBERS 8
  • 9. § WATCHER – Input > Search (Elastic query) > Http request – Trigger > Time based: when to execute watcher (e.g. every 5min) – Condition > When to execute action against – Action to take if condition is met > log message to file > send e-mail > notification to Chat tool (e.g. Slack) > Call to Webhook ALERTING 9
  • 10. § Alerts are typically static – E.g. cpu usage should be below 90%, response times should be below 0.5s – Not aware of periodicity, e.g. billing cycle, weekends, … § Enter machine learning (ML) – Creates a ML model that recognizes periodicity, can do forecasting – Anomaly detection, visually identify anomalies using heatmap – Simple ML jobs > based on 1 metric – Multi metric ML jobs: > split a single time series into multiple time series based on a categorical field. INTELLIGENT ALERTS 10