Volta: Logging, Metrics and
Monitoring as a Service
LN Renganarayana
Technical Director / Architect
Cloud Platform Engineering
ln_renganarayana@symantec.com
twitter: @lrengan
Jan 7, 2015, Volta / Cloud Platform Engineering, Symantec
Outline
• Motivation: data and events are the foundation of business
• Why build a (new) Service?
• What we have built: a (near) real-time data analytics pipeline
• The journey and lessons learned
• Looking ahead: Volta next gen
Data and events : the foundation
Picture: “Devops with S for sharing”, Patrick Debois
[Figure: questions that data and events help answer: Which features to build? What is a good pricing model? How fast can I build? What is the performance of my code? How is the service doing? What is my capacity? What is my current usage?]
Why build a (new) service?
Why build a service?
Picture: Jim Nisbet & Philip O’Toole
AWS re:Invent 2013 Loggly presentation
Single place for events across the stack
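[Figure: Volta's place in the stack. Layers from bottom to top: Bare Metal; IaaS (OpenStack); Platform Services (BP, SP, KV, OBS); Symantec Services & Apps. Volta, Identity Manager and CI / CD are shown as Common Services spanning the stack.]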
Volta : Design Goals
• Design for both Developers and Ops
– Make it extremely simple to capture events
– Provide powerful search and visualization tools
• Secure, multi-tenant: we are Symantec, so security comes first
• Scalable: elastically scale with load
• Highly available: Volta is the eyes & ears of Operations
• One system for logs, metrics, monitoring & other events
• Build using open source tools, and build for open sourcing
What we have built ...
A (near) real-time data analytics pipeline
Volta Client View
[Figure: how clients send events to Volta. Apps and platform services push metrics directly; metrics exposed by infrastructure (SNMP vars, JMX) are pulled; VM logs are sent by the Volta Shipper. Volta receives metric and log events and provides an Alerts & Config UI.]
Push: StatsD, metrics extension for OpenStack
Pull: collectd. Shipper: Logstash, moving to Heka
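The push path can be illustrated with StatsD's plain-text line protocol, `<name>:<value>|<type>` (`c` for counters, `g` for gauges, `ms` for timers), sent as UDP datagrams. This is a minimal sketch, assuming a StatsD daemon on `localhost:8125` (StatsD's conventional default port); the helper names are hypothetical and not part of Volta.

```python
import socket

# Metric types in the StatsD line protocol.
COUNTER, GAUGE, TIMER = "c", "g", "ms"


def statsd_line(name: str, value: float, metric_type: str) -> str:
    """Format one metric as <name>:<value>|<type>."""
    if metric_type not in (COUNTER, GAUGE, TIMER):
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def push_metric(name: str, value: float, metric_type: str,
                host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget one metric to a StatsD daemon over UDP."""
    payload = statsd_line(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, `push_metric("myapp.requests", 1, COUNTER)` sends the datagram `myapp.requests:1|c`. UDP keeps the application's hot path non-blocking, at the cost of silently dropping metrics if the daemon is down.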
Volta Under the Hood
[Figure: client apps / services with log & metrics shippers send log, metric & alert events to a Kafka cluster (knode1, knode2, knode3 … knodeN). A Storm cluster performs authentication, validation and alerts processing (with Redis and a quota & policy component) and writes to the LogStore (Elasticsearch cluster) and the MetricsStore (InfluxDB cluster); alerts go out as email & callbacks. A front-end cluster (s1 … sn) behind a load balancer, using Keystone, provides multi-tenancy and Kibana and Grafana proxies.]
The Ingest Pipeline
[Figure: client apps / services with log & metrics shippers send log, metric & alert events through a VIP to the Kafka cluster (knode1, knode2, knode3 … knodeN).]
• Kafka: a replicated, fault-tolerant, persistent message queue
• Topics: LogTopic, MetricTopic, AlertTopic
• Each topic is split into partitions
• Per-topic retention policy
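To make the topic layout concrete, here is a sketch of how a shipper might route an event to LogTopic, MetricTopic or AlertTopic and choose a partition. It assumes a hypothetical `type` field on each event and keys partitions by `tenant_id`, so one tenant's events stay ordered within a single partition. `zlib.crc32` is used purely for illustration: the default partitioner in Kafka's Java producer hashes keys with murmur2, so real partition numbers would differ.

```python
import zlib

# Topic names from the slide; the "type" field on events is an assumption.
TOPIC_BY_TYPE = {"log": "LogTopic", "metric": "MetricTopic", "alert": "AlertTopic"}


def route(event: dict, num_partitions: int) -> tuple[str, int]:
    """Return (topic, partition) for an event.

    Keying by tenant_id keeps each tenant's events in one partition, which
    preserves per-tenant ordering (Kafka only orders within a partition).
    """
    topic = TOPIC_BY_TYPE[event["type"]]
    key = event["tenant_id"].encode("utf-8")
    partition = zlib.crc32(key) % num_partitions
    return topic, partition
```

The trade-off of keying by tenant is skew: one very busy tenant lands on one partition, and therefore on one consumer.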
Event processing and storage
[Figure: a Storm cluster consumes log, metric & alert events and handles authentication, validation and alerts processing, with Redis and a quota & policy component alongside; it sends alerts by email & callbacks and writes to the LogStore (Elasticsearch cluster) and the MetricsStore (InfluxDB cluster).]
• Alert rules and [tenantid, apikey] pairs
• LogStore (Elasticsearch)
– per tenant, per day index
– typed index fields
– quota and retention policy
• MetricsStore (InfluxDB)
– time series names prefixed with the tenant id
– continuous queries do rollups
– retention policy through rollups
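The storage conventions above can be sketched as two naming helpers: a per-tenant, per-day Elasticsearch index and a tenant-prefixed InfluxDB series name. The exact formats (`logs-<tenant>-YYYY.MM.DD` and `<tenant>.<series>`) are hypothetical; the slides only say that indices are per tenant per day and that series names carry a tenant id prefix.

```python
from datetime import date, timedelta


def log_index_name(tenant_id: str, day: date) -> str:
    """Hypothetical per-tenant, per-day Elasticsearch index name.

    Elasticsearch index names must be lowercase, hence .lower().
    """
    return f"logs-{tenant_id.lower()}-{day:%Y.%m.%d}"


def tenant_series_name(tenant_id: str, series: str) -> str:
    """Hypothetical tenant-prefixed InfluxDB series name."""
    return f"{tenant_id}.{series}"


def indices_for_range(tenant_id: str, start: date, end: date) -> list[str]:
    """All daily indices a query over [start, end] must touch for one tenant."""
    days = (end - start).days
    return [log_index_name(tenant_id, start + timedelta(days=d)) for d in range(days + 1)]
```

One benefit of daily indices is that retention becomes a matter of deleting whole indices rather than individual documents.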
Multi-tenancy Proxy & UI
[Figure: a front-end cluster (s1 … sn) behind a load balancer, using Keystone and Redis, sits in front of the LogStore (Elasticsearch) and the MetricsStore (InfluxDB).]
• Intercepts and rewrites queries to ES and InfluxDB
• Enforces multi-tenancy (visibility of events to users)
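One way to picture "intercepts and rewrites queries" for Elasticsearch: the proxy wraps whatever query the user (or Kibana) sent inside a `bool` query whose `filter` clause restricts `tenant_id` to the tenants that user may see. This sketch uses the `bool`/`filter` form of the query DSL from Elasticsearch 2.0 onward (the 1.x releases current in 2015 used the `filtered` query instead). It assumes `tenant_id` is indexed as an exact-match (not analyzed) field, and it is not taken from Volta's actual proxy.

```python
import copy


def rewrite_es_query(body: dict, visible_tenants: list[str]) -> dict:
    """Restrict a search request body to the tenants a user may see.

    The user's original query (match_all if none) becomes a `must` clause,
    and a `terms` filter on tenant_id is added. Other top-level keys
    (size, sort, aggs, ...) are passed through unchanged.
    """
    if not visible_tenants:
        raise PermissionError("user has no visible tenants")
    rewritten = copy.deepcopy(body)  # never mutate the caller's request
    user_query = rewritten.pop("query", {"match_all": {}})
    rewritten["query"] = {
        "bool": {
            "must": [user_query],
            "filter": [{"terms": {"tenant_id": list(visible_tenants)}}],
        }
    }
    return rewritten
```

Enforcing the filter in the proxy rather than in Kibana means a user cannot bypass it by hand-crafting a request.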
Security and Multi-tenancy model
• Authentication with Keystone, backed by LDAP
– user authentication for the Query API and UI
• Multi-tenancy with users and groups
– events carry a tenant id and an apikey
• Cross-tenant correlation
– group membership is used for cross-tenant event visibility / correlation
• Dashboard sharing
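The model above can be sketched with two small checks: an incoming event is accepted only if its `[tenant_id, apikey]` pair is registered, and the tenants whose events a user can see are their own tenant plus every tenant shared with a group they belong to. The `Directory` class is a hypothetical in-memory stand-in for what Keystone/LDAP and the stored key pairs would provide.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical in-memory stand-in for Keystone/LDAP and stored key pairs."""
    api_keys: set[tuple[str, str]] = field(default_factory=set)       # (tenant_id, apikey)
    user_tenant: dict[str, str] = field(default_factory=dict)         # user -> own tenant
    group_members: dict[str, set[str]] = field(default_factory=dict)  # group -> users
    group_tenants: dict[str, set[str]] = field(default_factory=dict)  # group -> tenants

    def authenticate_event(self, event: dict) -> bool:
        """Accept an event only if it carries a registered [tenant_id, apikey] pair."""
        pair = (event.get("tenant_id"), event.get("apikey"))
        return pair in self.api_keys

    def visible_tenants(self, user: str) -> set[str]:
        """The user's own tenant plus every tenant shared with one of their groups."""
        tenants = {self.user_tenant[user]} if user in self.user_tenant else set()
        for group, members in self.group_members.items():
            if user in members:
                tenants |= self.group_tenants.get(group, set())
        return tenants
```

The result of `visible_tenants` is exactly the kind of list a query-rewriting proxy needs to build its tenant filter.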
Retention Policy : Log Events
• Elasticsearch allows powerful querying, but it comes at a cost
– Store only logs that help you operate and troubleshoot
– Use appropriate log levels (not INFO)
• Fixed quota: 350 GB or 500 GB
• When a tenant reaches its quota limit, Volta deletes the oldest 20% of its logs to free up space
• With wise use of the quota, you can retain logs for many days
• Volta can retain logs for longer for special tenants who need them for compliance / audit
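The quota rule can be sketched as follows, assuming logs live in daily indices (per the storage slide), so that "deleting old logs" means dropping whole indices, oldest first, until at least 20% of the quota has been freed. The function and its inputs are hypothetical; the slides do not say exactly how Volta selects what to delete.

```python
def indices_to_delete(index_sizes_gb: dict[str, float], quota_gb: float,
                      free_fraction: float = 0.20) -> list[str]:
    """Pick the oldest daily indices to drop once a tenant hits its quota.

    index_sizes_gb maps index name -> size in GB; names are assumed to sort
    chronologically (for example, because they end in YYYY.MM.DD).
    """
    used = sum(index_sizes_gb.values())
    if used < quota_gb:
        return []  # under quota: nothing to delete
    target = free_fraction * quota_gb
    freed, victims = 0.0, []
    for name in sorted(index_sizes_gb):  # oldest first
        if freed >= target:
            break
        victims.append(name)
        freed += index_sizes_gb[name]
    return victims
```

Because whole days are dropped, slightly more than 20% may be freed; in exchange, deletion is a cheap index drop instead of a delete-by-query.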
Metric Events: Retention Policy and Rollups
Naming scheme:
host + "." + name + "." + type_if_avail + "_" + retention_period
Retention periods: 1 day (default), 1 week, 1 month and 3 months
Names for the example:
● default 1 day: lmm-dev-bastion.memory.used_
● 1 week: lmm-dev-bastion.memory.used_1w
● 1 month: lmm-dev-bastion.memory.used_1m
● 3 months: lmm-dev-bastion.memory.used_3m
Rollup precision:
● default 1 day: user defined (highest)
● 1 week: metrics aggregated to 1 minute
● 1 month: metrics aggregated to 5 minutes
● 3 months: metrics aggregated to 1 hour
Naming scheme & retention policies
{
"@version": "1",
"@timestamp": "2014-08-06T19:17:43.000Z",
"host": "lmm-dev-bastion",
"name": "memory",
"collectd_type": "memory",
"type_instance": "used",
"value": 341884928,
"tenant_id": "db5ca8e4c8514fad9f98dbc4d648ee87",
"apikey": "26d85ae3-1e10-4ce4-837a-7a1c8dfc67fb"
}
Sample metric from collectd
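Following the example names above, a series name can be derived from a collectd event like the sample: `host`, `name`, then `type_instance` if present, with the retention suffix appended after an underscore. The `rollup` helper sketches what the continuous queries do conceptually, averaging raw points into fixed buckets (1 minute for the 1-week series, 5 minutes for 1 month, 1 hour for 3 months). Both functions are illustrative; using the mean as the aggregate is an assumption, since the slides do not name the aggregation function.

```python
from collections import defaultdict
from statistics import mean

# Retention suffix -> rollup bucket in seconds (None = keep raw precision).
ROLLUPS = {"": None, "1w": 60, "1m": 5 * 60, "3m": 60 * 60}


def series_name(event: dict, retention: str = "") -> str:
    """host.name[.type_instance]_<retention>, following the example names."""
    parts = [event["host"], event["name"]]
    if event.get("type_instance"):
        parts.append(event["type_instance"])
    return ".".join(parts) + "_" + retention


def rollup(points: list[tuple[int, float]], retention: str) -> list[tuple[int, float]]:
    """Average (unix_seconds, value) points into the bucket size for a retention period."""
    bucket = ROLLUPS[retention]
    if bucket is None:
        return sorted(points)
    groups: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        groups[ts - ts % bucket].append(value)  # align to bucket start
    return [(start, mean(vals)) for start, vals in sorted(groups.items())]
```

This is also why users need to know about rollups: a 3-month graph can never show a spike shorter than the 1-hour average it was folded into.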
Alerts : Email and Callbacks
• Alerts can be set using the Alert UI or the REST API
• Alerts can be sent by email or POSTed to a webhook (REST endpoint)
• Webhooks provide a good mechanism for integration with external automation and UIs
• Alerts on Log events
– User specifies an alert template using regular expressions to match
– Can match one or more fields of a Log event
– Simple and complex expressions
• Alerts on Metric events
– User specifies an alert template using comparison operators
– Can match one or more fields of a Metric event
– Simple and complex expressions
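A sketch of the two kinds of alert template: a log template maps field names to regular expressions that must all match, and a metric template is a list of `(field, operator, threshold)` comparisons that must all hold. The template shapes are hypothetical (the slides do not show Volta's template format), and "complex expressions" are reduced here to a plain AND of conditions.

```python
import operator
import re

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def compile_log_template(template: dict[str, str]) -> dict[str, re.Pattern]:
    """Precompile the field -> regex map once, not once per event."""
    return {f: re.compile(p) for f, p in template.items()}


def log_alert_matches(compiled: dict[str, re.Pattern], event: dict) -> bool:
    """True if every templated field is present and matches its regex."""
    return all(f in event and rx.search(str(event[f])) for f, rx in compiled.items())


def metric_alert_matches(conditions: list[tuple[str, str, float]], event: dict) -> bool:
    """True if every (field, op, threshold) comparison holds for the event."""
    return all(f in event and OPS[op](event[f], threshold)
               for f, op, threshold in conditions)
```

Note the asymmetry the lessons slide points at: each metric condition is a single numeric comparison, while each regex is evaluated against every log event and its cost depends on the pattern.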
Current deployment
• Multiple deployments: on bare KVM nodes and on OpenStack VMs
– On KVM nodes: 40+ VMs, 80+ TB storage, many large-memory nodes
– Components are deployed in clustered mode for HA
– Some with active/active replication, some with active/passive
• Used by Platform and Infrastructure Services
– Tens of thousands of events per second (up to around 160 K events/sec seen)
– Hundreds of GBs of data collected and indexed per day
– Queries currently come from Kibana and Grafana; in future, from APIs
The Journey and Lessons ...
Logs, metrics and alerts
• log events
– insist on good severity levels
– enforce quota → induce behavior change
– watch out for large messages (zip lines from stdout/stderr)
• metric events
– keep users aware of rollups (granularity)
• alerts
– watch out for too-simple ones → alert floods
– watch out for complex regexes → performance / memory hogs
– encourage metrics-based alerts → this is what scales
Kafka, ES and Storm
• Kafka
– retention policy vs storage space: do the math with ingest & processing rates
– if you are not using automatic leader rebalancing, keep an eye on partition leaders
• Storm
– smaller topologies: easier to update and optimize
– match Kafka spout parallelism to consumer parallelism (number of partitions)
– tune the number of executor threads for optimal performance
• Elasticsearch
– aggregate your writes
– heap size below 32 GB; turn off swap
– benefits hugely from high IOPS → use SSDs if you can
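"Do the math" might look like this. The disk a topic needs is roughly ingest rate × average event size × retention time × replication factor (ignoring compression and log overhead). If processing is slower than ingest, a consumer that starts caught up falls behind by (1 − processing/ingest) seconds every second, and it starts missing events once that lag exceeds the retention window. All inputs below are hypothetical example values, not Volta's figures.

```python
def topic_disk_bytes(events_per_sec: float, avg_event_bytes: float,
                     retention_hours: float, replication_factor: int) -> float:
    """Approximate steady-state disk used by one topic across the cluster."""
    return events_per_sec * avg_event_bytes * retention_hours * 3600 * replication_factor


def hours_until_data_loss(ingest_per_sec: float, process_per_sec: float,
                          retention_hours: float) -> float:
    """Hours until a lagging consumer reaches events older than retention.

    Starting caught up, the consumer's lag (measured in time) grows by
    (1 - process/ingest) seconds every second; events are lost once the lag
    exceeds the retention window. Returns inf if processing keeps up.
    """
    if process_per_sec >= ingest_per_sec:
        return float("inf")
    return retention_hours / (1 - process_per_sec / ingest_per_sec)
```

For example, 10,000 events/s of 500 bytes kept for 24 hours with 2 replicas needs about 864 GB, and a consumer running at 80% of the ingest rate against a 24-hour retention loses data after about 120 hours.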
Using Open Source Software : Joy and Frustrations
• Be ready for constant upgrades
– for bug fixes
– to get cool new features: Grafana, Kibana
– for stability, cool stats and visualization: Storm
• InfluxDB clustering is still maturing
– temporary HA solution (write to 2+ InfluxDB instances)
– waiting for the 0.9 release, with better clustering
Eat your own Dog Food
• Volta was a cobbler’s child for a while …
– it did not use any system to aggregate its own logs and metrics!
• Now we use Volta to collect its own logs and metrics
– send logs and metrics from one Volta instance to another
– sending to the same instance is an interesting one!
• Important metrics:
– ingest rate, Storm processing rate, ES / InfluxDB write latency
– end-to-end latency of events
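End-to-end latency can be computed by comparing each event's `@timestamp` (set at the source, in the format of the sample events) with the time it lands in the store. This sketch assumes the arrival time is available as a UTC `datetime`; nearest-rank is one simple percentile definition among several, and the helper names are hypothetical.

```python
import math
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2014-08-06T19:17:43.000Z


def latency_seconds(event: dict, stored_at: datetime) -> float:
    """Seconds between the event's source timestamp and its arrival in the store."""
    created = datetime.strptime(event["@timestamp"], TS_FORMAT).replace(tzinfo=timezone.utc)
    return (stored_at - created).total_seconds()


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Tracking a high percentile rather than the mean catches the slow tail (for example, a backed-up Storm bolt) that an average would hide. Clock skew between source hosts and the store shows up directly in this number, so hosts need synchronized clocks.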
Synthetic Transactions and Tracking SLAs
• Goal: track Service level metrics
– availability to users / business
– latency of operations as seen by users
• Use synthetic transactions that exercise a sequence of APIs
– measure success / failure rates
– measure end-to-end latency
– collect, trend and alert on these
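A synthetic transaction runner can be sketched as a function that executes a sequence of API calls in order, stops at the first failure, and reports success and end-to-end latency. The result can then be shipped to Volta as a metric event and trended or alerted on. The steps here are hypothetical callables standing in for real API requests.

```python
import time
from typing import Callable

Step = Callable[[], object]  # a step signals failure by raising


def run_synthetic(steps: list[tuple[str, Step]]) -> dict:
    """Run the named steps in order; record success, failing step and latency."""
    start = time.monotonic()
    failed_step = None
    for name, step in steps:
        try:
            step()
        except Exception:
            failed_step = name
            break  # later steps usually depend on earlier ones
    return {
        "success": failed_step is None,
        "failed_step": failed_step,
        "latency_ms": (time.monotonic() - start) * 1000,
    }


def success_rate(results: list[dict]) -> float:
    """Fraction of runs that succeeded (0.0 if there were no runs)."""
    return sum(r["success"] for r in results) / len(results) if results else 0.0
```

`time.monotonic()` is used instead of wall-clock time so that clock adjustments cannot produce negative or inflated latencies.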
Deployment & Ops : automate, automate, automate …
• Volta is a collection of services
– use separate repos, deploy small changes
• Lots of configuration parameters: keep them consistent
– performance is very sensitive to their values
– e.g., heap size, number of workers
• Performance benchmarking
– needs to be done for each environment
• CI and Deployment pipeline
Volta next gen
Volta Next Gen
• Open source Volta
• Refactor Storm
– Split into separate metric and log topologies, and batch writes
• Move ES and InfluxDB to higher-IOPS storage (SSDs?)
• Multi-DC support via stream duplication
• Archival into Swift / HDFS
• Anomaly detection using CEP / Storm
• HTTP REST API in front of Kafka
• Deployment automation using OpenStack Murano
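Batching writes to Elasticsearch maps onto its `_bulk` API, whose body is newline-delimited JSON: an action line such as `{"index": {"_index": ...}}` followed by the document, with a newline after the last line. This sketch builds such bodies in fixed-size batches, using the typeless action line of recent Elasticsearch versions (the 1.x releases of that era also expected a `_type`). The batch size and index name are hypothetical, and sending each body (an HTTP POST to `/_bulk`) is left out.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(docs: Iterable[dict], index: str, batch_size: int = 500) -> Iterator[str]:
    """Yield NDJSON bodies for the Elasticsearch _bulk API, batch_size docs each."""
    lines: list[str] = []
    count = 0
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # bulk bodies must end with a newline
            lines, count = [], 0
    if lines:  # flush the final partial batch
        yield "\n".join(lines) + "\n"
```

Fewer, larger requests amortize per-request overhead on both the Storm side and the Elasticsearch side; the batch size is the knob to tune during per-environment benchmarking.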
Thank you!
Questions, Comments, Suggestions?
We are interested in Open Sourcing & Collaborating on Volta.
Interested?
And we are hiring … interested?
ln_renganarayana@symantec.com
twitter: @lrengan
Backup Slides
LMM Metrics Data Model
Mandatory fields:
● name: name of the metric. LMM uses it to store the metrics, and you use it in queries: select "value" from "load"
● value: value of the metric at a given time
● @timestamp: time stamp
● host: host name or any other id
● tenant_id: tenant id (Keystone)
● apikey: LMM apikey
Sample metric from collectd:
{
"@version": "1",
"@timestamp": "2014-07-30T00:16:59.000Z",
"name": "cpu",
"host": "demo.symcpe.net",
"plugin_instance": "0",
"collectd_type": "cpu",
"type_instance": "interrupt",
"value": 0,
"tenant_id": "db5ca8e4c8514fad9f98dbc4d648ee87",
"apikey": "26d85ae3-1e10-4ce4-837a-7a1c8dfc67fb"
}
collectd: the name of the plugin becomes the name of the metric, e.g. cpu or memory
StatsD: the user's metric name concatenated with the metric type by a dot, e.g. myapp.counter or myapp.gauge
Reserved fields: time, sequence_number
Special field: type_instance
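The data-model rules above can be checked before an event is shipped. This sketch verifies the mandatory fields, rejects the reserved field names, and builds a StatsD-style metric name. The rules come from the slide; the function names are hypothetical.

```python
MANDATORY = ("name", "value", "@timestamp", "host", "tenant_id", "apikey")
RESERVED = ("time", "sequence_number")


def validate_metric_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event is acceptable."""
    problems = [f"missing mandatory field: {f}" for f in MANDATORY if f not in event]
    problems += [f"reserved field not allowed: {f}" for f in RESERVED if f in event]
    return problems


def statsd_metric_name(user_name: str, metric_type: str) -> str:
    """StatsD convention: the user's metric name and the metric type joined by a dot."""
    return f"{user_name}.{metric_type}"
```

Returning every problem at once, rather than failing on the first, makes shipper misconfigurations quicker to diagnose.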
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...Datacratic
 
IBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Danmark
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...In-Memory Computing Summit
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...Lightbend
 
Combinación de logs, métricas y rastreos para observabilidad unificada
Combinación de logs, métricas y rastreos para observabilidad unificadaCombinación de logs, métricas y rastreos para observabilidad unificada
Combinación de logs, métricas y rastreos para observabilidad unificadaElasticsearch
 

Similar a Logging, Metrics and Monitoring as a Service Architecture (20)

Combinación de logs, métricas y seguimiento para una visibilidad centralizada
Combinación de logs, métricas y seguimiento para una visibilidad centralizadaCombinación de logs, métricas y seguimiento para una visibilidad centralizada
Combinación de logs, métricas y seguimiento para una visibilidad centralizada
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Combinação de logs, métricas e rastreamentos para observabilidade unificada
Combinação de logs, métricas e rastreamentos para observabilidade unificadaCombinação de logs, métricas e rastreamentos para observabilidade unificada
Combinação de logs, métricas e rastreamentos para observabilidade unificada
 
Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observability
 
Combinación de logs, métricas y seguimiento para una visibilidad centralizada
Combinación de logs, métricas y seguimiento para una visibilidad centralizadaCombinación de logs, métricas y seguimiento para una visibilidad centralizada
Combinación de logs, métricas y seguimiento para una visibilidad centralizada
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
 
Spring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour DallasSpring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour Dallas
 
Tracing-for-fun-and-profit.pptx
Tracing-for-fun-and-profit.pptxTracing-for-fun-and-profit.pptx
Tracing-for-fun-and-profit.pptx
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - Boston
 
DevOps Powered by Splunk
DevOps Powered by SplunkDevOps Powered by Splunk
DevOps Powered by Splunk
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
 
IBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management Solutions
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
 
Combinación de logs, métricas y rastreos para observabilidad unificada
Combinación de logs, métricas y rastreos para observabilidad unificadaCombinación de logs, métricas y rastreos para observabilidad unificada
Combinación de logs, métricas y rastreos para observabilidad unificada
 

Último

Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 

Último (20)

Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 

Logging, Metrics and Monitoring as a Service Architecture

  • 1. Volta: Logging, Metrics and Monitoring as a Service
    LN Renganarayana, Technical Director / Architect, Cloud Platform Engineering
    ln_renganarayana@symantec.com, twitter: @lrengan
    Jan 7, 2015, Volta / Cloud Platform Engineering, Symantec
  • 2. Outline
    • Motivation: data and events are the foundation of business
    • Why build a (new) service?
    • What we have built: a (near) real-time data analytics pipeline
    • The journey and lessons learned
    • Looking ahead: Volta next gen
  • 3. Data and events: the foundation
    (Picture: “Devops with S for sharing”, Patrick Debois)
    Questions the data should answer: Which features should we build? What is a good pricing model? How fast can I build? What is the performance of my code? How is the service running? What is my capacity? What is my current usage?
  • 4. Why build a (new) service?
  • 5. Why build a service?
    (Picture: Jim Nisbet & Philip O’Toole, AWS re:Invent 2013 Loggly presentation)
  • 6. Single place for events across the stack
    (Diagram: stack layers Bare Metal, IaaS (OpenStack), Platform Services (BP, SP, KV, OBS) and Symantec Services & Apps, alongside the Common Services Volta, Identity Manager and CI / CD)
  • 7. Volta: Design Goals
    • Design for both developers and Ops
      – Make it extremely simple to capture events
      – Provide powerful search and visualization tools
    • Secure, multi-tenant: well, we are Symantec, so security comes first
    • Scalable: scales elastically with load
    • Highly available: Volta is the eyes & ears of Operations
    • One system for logs, metrics, monitoring & other events
    • Built using open source tools, and built for open sourcing
  • 8. What we have built … a (near) real-time data analytics pipeline
  • 9. Volta Client View
    (Diagram: apps and platform services write app metrics directly; infrastructure exposes metrics via SNMP vars and JMX; metrics reach Volta by push or pull, VM logs go through the Volta shipper, and Volta provides an Alerts & Config UI)
    • Push: StatsD, metrics extension for OpenStack
    • Pull: collectd
    • Shipper: Logstash, moving to Heka
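Slide 9 lists StatsD as one way for clients to push metrics. The sketch below shows what such a push might look like. It assumes the standard StatsD plain-text line format (`name:value|type`) sent over UDP to StatsD's customary port 8125. The slide does not name Volta's StatsD endpoint, so the host passed to `send_statsd` is a placeholder you would supply.

```python
import socket

# StatsD line-protocol type codes for the metric kinds used here.
STATSD_TYPES = {"counter": "c", "gauge": "g", "timer": "ms"}


def format_statsd(name: str, value: float, kind: str) -> str:
    """Render one metric as a StatsD line: name:value|type."""
    if kind not in STATSD_TYPES:
        raise ValueError(f"unsupported metric kind: {kind}")
    return f"{name}:{value}|{STATSD_TYPES[kind]}"


def send_statsd(line: str, host: str, port: int = 8125) -> None:
    """Fire-and-forget UDP send; StatsD never acknowledges a packet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    print(format_statsd("myapp.requests", 1, "counter"))  # myapp.requests:1|c
```

Using UDP means the application never blocks on the metrics path. The trade-off is that lost packets are dropped silently.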
  • 10. Volta Under the Hood
    (Architecture diagram)
    • Client apps / services → log & metrics shipper → Kafka cluster (knode1, knode2, knode3 … knodeN): log, metric & alert events
    • Storm cluster: authentication, validation, alerts processing, quota & policy; uses Redis; sends alerts via email & callbacks
    • Storage: LogStore (Elasticsearch cluster) and MetricsStore (InfluxDB cluster)
    • Front End Cluster (s1 … sn) behind a load balancer, authenticating via Keystone: multi-tenancy plus Kibana and Grafana proxies
  • 11. The Ingest Pipeline
    (Diagram: client app / service → log & metrics shipper → VIP → Kafka cluster (knode1, knode2, knode3 … knodeN) carrying log, metric & alert events)
    • Kafka: replicated, fault-tolerant, persistent message queue
    • Topics: LogTopic, MetricTopic, AlertTopic
    • Each topic is split into partitions
    • Per-topic retention policy
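Slide 11 routes events into LogTopic, MetricTopic and AlertTopic, and each topic is split into partitions. The sketch below shows how a shipper might choose a topic and a partition for each event. It rests on several assumptions:

- Each event carries a hypothetical `event_type` field.
- Events are keyed by `tenant_id`.
- CRC32 stands in for a real partitioner. Kafka's default Java partitioner hashes keys with murmur2, so the partition numbers here will not match a real Kafka client.

```python
import zlib

# Topic names from the slide; the "event_type" routing field is hypothetical.
TOPIC_BY_TYPE = {"log": "LogTopic", "metric": "MetricTopic", "alert": "AlertTopic"}


def choose_topic(event: dict) -> str:
    """Map an event to LogTopic, MetricTopic or AlertTopic."""
    try:
        return TOPIC_BY_TYPE[event["event_type"]]
    except KeyError as exc:
        raise ValueError(f"cannot route event: {event!r}") from exc


def choose_partition(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping (CRC32 stand-in for Kafka's murmur2)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(event: dict, num_partitions: int) -> tuple[str, int]:
    """Keying on tenant_id keeps each tenant's events ordered within one partition."""
    return choose_topic(event), choose_partition(event["tenant_id"], num_partitions)
```

The partition count also caps useful consumer parallelism on the Storm side. Each partition is read by at most one Kafka spout task, so extra spout tasks sit idle.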
  • 12. Event processing and storage
    (Diagram: the Storm cluster performs authentication, validation, alerts processing and quota & policy checks on log, metric & alert events; it uses Redis, sends alerts via email & callbacks, and writes to the LogStore (Elasticsearch cluster) and the MetricsStore (InfluxDB cluster))
    • Redis: alert rules; [tenantid, apikey] pairs
    • LogStore (Elasticsearch): per-tenant, per-day index; typed index fields; quota and retention policy
    • MetricsStore (InfluxDB): tenant-id-prefixed time series names; continuous queries do rollups; retention policy through rollups
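Slide 12 says log events go into a per-tenant, per-day Elasticsearch index, and metric series names are prefixed with the tenant id. The slides do not give the exact formats. This sketch assumes two of them:

- A hypothetical `volta-<tenant>-YYYY.MM.DD` index pattern, modelled on Logstash's daily `logstash-YYYY.MM.DD` convention.
- A `<tenant>.` prefix on series names.

```python
from datetime import datetime

# Event timestamps look like 2014-08-06T19:17:43.000Z (see the collectd sample).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def log_index_name(tenant_id: str, timestamp: str) -> str:
    """Per-tenant, per-day index name (hypothetical pattern volta-<tenant>-YYYY.MM.DD)."""
    day = datetime.strptime(timestamp, TS_FORMAT)
    # Elasticsearch index names must be lowercase.
    return f"volta-{tenant_id.lower()}-{day:%Y.%m.%d}"


def series_name(tenant_id: str, metric_name: str) -> str:
    """Tenant-id-prefixed time series name (the '.' separator is an assumption)."""
    return f"{tenant_id}.{metric_name}"
```

Daily indices make retention cheap: dropping a whole index costs far less than deleting individual documents from a shared one.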
  • 13. Multi-tenancy Proxy & UI
    (Diagram: load balancer → Front End Cluster (s1 … sn) running the multi-tenancy, Kibana and Grafana proxies, authenticating against Keystone and querying Redis, the LogStore (Elasticsearch) and the MetricsStore (InfluxDB))
    • Intercepts and rewrites queries to ES and InfluxDB
    • Enforces multi-tenancy (which events each user can see)
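Slide 13 says the proxy intercepts and rewrites queries to ES to enforce multi-tenancy, but it does not show how. One plausible rewrite for Elasticsearch is sketched below: wrap the user's query in a `bool` query whose `filter` clause is a `terms` filter on `tenant_id`, limited to the tenants the user may see. The sketch assumes:

- Log events carry a `tenant_id` field indexed as an exact keyword.
- The current query DSL. The bool `filter` clause arrived in Elasticsearch 2.0; the 1.x releases of that era used the `filtered` query instead.

The function name is hypothetical.

```python
import copy


def scope_es_query(body: dict, visible_tenants: set[str]) -> dict:
    """Return a copy of an Elasticsearch search body restricted to the given tenants."""
    if not visible_tenants:
        raise PermissionError("user may not see any tenant")
    scoped = copy.deepcopy(body)  # never mutate the caller's request
    user_query = scoped.pop("query", {"match_all": {}})
    # Filter context restricts hits without changing relevance scores.
    scoped["query"] = {
        "bool": {
            "must": [user_query],
            "filter": [{"terms": {"tenant_id": sorted(visible_tenants)}}],
        }
    }
    return scoped
```

Because the tenant restriction sits outside the user's query, no query text the user supplies can widen the result set.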
  • 14. Security and Multi-tenancy model
    • Authentication with Keystone, backed by LDAP
      – User authentication for the Query API and UI
    • Multi-tenancy with users and groups
      – Events carry a tenant id and an apikey
    • Cross-tenant correlation
      – Group membership is used for cross-tenant event visibility / correlation
    • Dashboard sharing
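The model on slide 14 involves two checks, sketched below:

- At ingest, validate an event's `[tenant_id, apikey]` pair (slide 12 keeps these pairs in Redis; a plain set stands in for Redis here).
- At query time, let a user see their own tenants plus every tenant that shares a Keystone group with them.

Both data structures and both function names are hypothetical.

```python
def visible_tenants(user_tenants: set[str], user_groups: set[str],
                    group_members: dict[str, set[str]]) -> set[str]:
    """Own tenants plus every tenant that belongs to one of the user's groups."""
    visible = set(user_tenants)
    for group in user_groups:
        visible |= group_members.get(group, set())
    return visible


def is_valid_event(event: dict, valid_pairs: set[tuple[str, str]]) -> bool:
    """Accept an event only if its (tenant_id, apikey) pair is registered."""
    return (event.get("tenant_id"), event.get("apikey")) in valid_pairs
```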
  • 15. Retention Policy: Log Events
    • Elasticsearch allows powerful querying, but it comes at a cost
      – Store only logs that help you operate and troubleshoot the service
      – Use appropriate log levels (not INFO)
    • Fixed quota: 350 GB or 500 GB
    • When a tenant reaches its quota limit, Volta deletes the oldest 20% of its logs to free up space
    • With careful use of the quota, you can retain logs for many days
    • For special tenants who need to keep logs for compliance / audit, Volta can retain them for longer
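The quota rule on slide 15 can be sketched as an eviction function. The sketch assumes:

- Logs live in the per-tenant, per-day indices from slide 12.
- "20% of old logs" means 20% of the tenant's current usage.
- Eviction drops whole indices, oldest first, so a single step can free somewhat more than 20%.

The function name is hypothetical.

```python
def indices_to_delete(index_sizes: dict[str, int], quota_bytes: int,
                      fraction: float = 0.20) -> list[str]:
    """Oldest per-day indices of one tenant to drop once it reaches its quota.

    index_sizes maps index name -> size in bytes. All names are assumed to share
    one prefix and end in YYYY.MM.DD, so lexical order is chronological order.
    """
    used = sum(index_sizes.values())
    if used < quota_bytes:
        return []
    target = used * fraction
    freed, victims = 0, []
    for name in sorted(index_sizes):  # oldest first
        if freed >= target:
            break
        victims.append(name)
        freed += index_sizes[name]
    return victims
```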
  • 16. Metric Events: Retention Policy and Rollups
    Naming scheme & retention policies
    Naming scheme: host + "." + name + "." + type_if_avail + "." + retention_period
    Retention periods: 1 day, 1 week, 1 month and 3 months. Names for the sample event below:
      ● default, 1 day: lmm-dev-bastion.memory.used_
      ● 1 week: lmm-dev-bastion.memory.used_1w
      ● 1 month: lmm-dev-bastion.memory.used_1m
      ● 3 months: lmm-dev-bastion.memory.used_3m
    Rollup precision:
      ● default, 1 day: user defined (highest)
      ● 1 week: metrics aggregated to 1 minute
      ● 1 month: metrics aggregated to 5 minutes
      ● 3 months: metrics aggregated to 1 hour
    Sample metric from collectd:
      {
        "@version": "1",
        "@timestamp": "2014-08-06T19:17:43.000Z",
        "host": "lmm-dev-bastion",
        "name": "memory",
        "collectd_type": "memory",
        "type_instance": "used",
        "value": 341884928,
        "tenant_id": "db5ca8e4c8514fad9f98dbc4d648ee87",
        "apikey": "26d85ae3-1e10-4ce4-837a-7a1c8dfc67fb"
      }
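The sketch below derives the four per-retention series names for an event and rolls raw points up to a slide-16 precision. In Volta the rollups run as InfluxDB continuous queries (slide 12); this is only a plain-Python illustration of the same bucketing. It rests on three assumptions:

- The names follow the slide's sample names (an underscore and retention suffix after `type_instance`, with an empty suffix for the 1-day default) rather than the "." separator in the stated scheme.
- "Aggregated" means averaged.
- The naming when `type_instance` is absent is a guess.

```python
from collections import defaultdict
from statistics import mean

# Retention suffix -> rollup bucket in seconds (None = raw, user-defined precision).
RETENTIONS = {"": None, "1w": 60, "1m": 300, "3m": 3600}


def series_names(event: dict) -> list[str]:
    """host.name.type_instance_<suffix>, mirroring the sample names on the slide."""
    base = f"{event['host']}.{event['name']}"
    if event.get("type_instance"):
        base += f".{event['type_instance']}"
    # Without a type_instance this yields host.name_<suffix> (an assumption).
    return [f"{base}_{suffix}" for suffix in RETENTIONS]


def rollup(points: list[tuple[int, float]], bucket_s: int) -> list[tuple[int, float]]:
    """Average (epoch_seconds, value) points into buckets aligned to bucket_s."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

For the 1-month series, you would call `rollup(points, RETENTIONS["1m"])`, which averages into 5-minute buckets.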
  • 17. Alerts: Email and Callbacks
    • Alerts can be set using the Alert UI or the REST API
    • Alerts can be sent by email or POSTed to a webhook (REST endpoint)
    • Webhooks provide a good mechanism for integrating with external automation and UIs
    • Alerts on log events
      – The user specifies an alert template using a regular expression to match
      – Can match one or more fields of a log event
      – Simple and complex expressions
    • Alerts on metric events
      – The user specifies an alert template using comparison operators
      – Can match one or more fields of a metric event
      – Simple and complex expressions
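Slide 17 describes two kinds of alert templates: regular expressions over log-event fields, and comparison operators over metric-event fields. The evaluator below is a minimal sketch. The rule format is hypothetical, since Volta's template syntax is not shown, and conditions are simply ANDed. Each regex is compiled once per rule rather than once per event; slide 20 warns that complex regexes are performance and memory hogs. Email and webhook delivery on a match are not shown.

```python
import operator
import re
from dataclasses import dataclass, field

# Comparison operators a metric alert template may use (hypothetical syntax).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


@dataclass
class AlertRule:
    regex: dict[str, str] = field(default_factory=dict)                  # field -> pattern
    compare: dict[str, tuple[str, float]] = field(default_factory=dict)  # field -> (op, threshold)

    def __post_init__(self) -> None:
        # Compile once per rule, not once per event.
        self._compiled = {f: re.compile(p) for f, p in self.regex.items()}

    def matches(self, event: dict) -> bool:
        """True only if every regex and every comparison condition holds."""
        for f, pattern in self._compiled.items():
            if f not in event or not pattern.search(str(event[f])):
                return False
        for f, (op, threshold) in self.compare.items():
            if f not in event or not OPS[op](event[f], threshold):
                return False
        return True
```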
  • 18. Current deployment
    • Multiple deployments: on bare KVM nodes and on OpenStack VMs
      – On KVM nodes: 40+ VMs, 80+ TB storage, many large-memory nodes
      – Components are deployed in clustered mode for HA
      – Some use active/active replication, some active/passive
    • Used by Platform and Infrastructure Services
      – Tens of thousands of events per second (up to around 160K events/sec seen)
      – Hundreds of GBs of data collected and indexed per day
      – Queries currently come from Kibana and Grafana; in future, also from APIs
  • 19. The Journey and Lessons …
  • 20. Logs, metrics and alerts
    • Log events
      – Insist on good severity levels
      – Enforce quota → induces behavior change
      – Watch out for large messages (zip lines from stdout/stderr)
    • Metric events
      – Keep users aware of rollups (granularity)
    • Alerts
      – Watch out for overly simple ones → alert floods
      – Watch out for complex regexes → performance / memory hogs
      – Encourage metric-based alerts → this is what scales
  • 21. Kafka, ES and Storm
    • Kafka
      – Retention policy vs. storage space: do the math with ingest & processing rates
      – If you are not using automatic leader rebalancing, keep an eye on the partition leaders
    • Storm
      – Smaller topologies: easier to update and optimize
      – Match Kafka spout parallelism to the number of partitions
      – Tune the number of executor threads for optimal performance
    • Elasticsearch
      – Aggregate (batch) your writes
      – Heap size <= 32 GB; turn off swap
      – Benefits hugely from high IOPS → use SSDs if you can
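"Do the math" for Kafka retention means, roughly: disk = ingest rate × average event size × retention time × replication factor. The sketch below takes the 160K events/sec peak from slide 18 and assumes 500-byte events and a replication factor of 3; neither assumption comes from the slides. Compression lowers the real figure, while index files and segment slack raise it. The retention window must also exceed the worst consumer lag you expect, or Kafka will delete events that Storm has not processed yet.

```python
def kafka_storage_bytes(events_per_sec: float, avg_event_bytes: float,
                        retention_hours: float, replication_factor: int) -> float:
    """Cluster-wide disk needed to hold retention_hours of events (rough estimate)."""
    return events_per_sec * avg_event_bytes * retention_hours * 3600 * replication_factor


def max_retention_hours(disk_bytes: float, events_per_sec: float,
                        avg_event_bytes: float, replication_factor: int) -> float:
    """Inverse: how many hours of events fit into the given cluster-wide disk."""
    return disk_bytes / (events_per_sec * avg_event_bytes * 3600 * replication_factor)


# Peak rate from the deployment slide; 500-byte events and RF=3 are assumptions.
needed = kafka_storage_bytes(160_000, 500, retention_hours=24, replication_factor=3)
print(f"{needed / 1e12:.1f} TB")  # 20.7 TB
```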
  • 22. Using Open Source Software: Joy and Frustrations
    • Be ready for constant upgrades
      – for bug fixes
      – to get cool new features: Grafana, Kibana
      – for stability, cool stats and visualization: Storm
    • InfluxDB clustering is still maturing
      – Temporary HA solution: write to 2+ InfluxDB instances
      – Waiting for the 0.9 release with better clustering
  • 23. Eat your own Dog Food
    • Volta was a cobbler’s child for a while …
      – it did not use any system to aggregate its own logs and metrics!
    • Now we use Volta to collect its own logs and metrics
      – Send logs and metrics from one Volta instance to another
      – Sending to the same instance is an interesting one!
    • Important metrics:
      – Ingest rate, Storm processing rate, ES / InfluxDB write latency
      – End-to-end latency of events
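Slide 23 lists end-to-end event latency as one of the metrics Volta tracks about itself. The sketch below measures it by comparing an event's `@timestamp` (in the format of the collectd samples) with the moment the event was stored, then summarizes the results as nearest-rank percentiles. The `indexed_at` field is hypothetical; the slides do not say how Volta records storage time.

```python
import math
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2014-08-06T19:17:43.000Z


def parse_ts(ts: str) -> datetime:
    return datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)


def e2e_latency_seconds(event: dict) -> float:
    """Seconds from creation (@timestamp) to storage (hypothetical indexed_at)."""
    return (parse_ts(event["indexed_at"]) - parse_ts(event["@timestamp"])).total_seconds()


def percentile(sorted_vals: list[float], pct: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def latency_summary(events: list[dict]) -> dict[str, float]:
    lat = sorted(e2e_latency_seconds(e) for e in events)
    return {"p50": percentile(lat, 50), "p95": percentile(lat, 95),
            "p99": percentile(lat, 99), "max": lat[-1]}
```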
  • 24. Synthetic Transactions and Tracking SLAs
    • Goal: track service-level metrics
      – Availability to users / business
      – Latency of operations as seen by users
    • Use synthetic transactions that exercise a sequence of APIs
      – Measure success / failure rates
      – Measure end-to-end latency
      – Collect, trend and alert on these
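A synthetic transaction, as described on slide 24, runs a fixed sequence of API calls and records whether the sequence succeeded and how long it took. The sketch below runs the steps in order, stops at the first step that raises, and aggregates success rates across runs. The steps are plain callables standing in for real API calls, and every name here is hypothetical. In practice each result would be sent to Volta as a metric event, so it can be trended and alerted on.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TxResult:
    ok: bool
    latency_s: float
    failed_step: Optional[str] = None


def run_transaction(steps: list[tuple[str, Callable[[], None]]]) -> TxResult:
    """Run API steps in order; a step signals failure by raising."""
    start = time.monotonic()  # monotonic clock: immune to wall-clock jumps
    for name, step in steps:
        try:
            step()
        except Exception:
            return TxResult(False, time.monotonic() - start, name)
    return TxResult(True, time.monotonic() - start)


def success_rate(results: list[TxResult]) -> float:
    """Fraction of successful runs (0.0 when there are no runs)."""
    return sum(r.ok for r in results) / len(results) if results else 0.0
```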
  • 25. Deployment & Ops: automate, automate, automate …
    • Volta is a collection of services
      – Use separate repos; deploy small changes
    • Lots of configuration parameters: keep them consistent
      – Performance is very sensitive to their values, e.g. heap size, number of workers
    • Performance benchmarking
      – Needs to be done for each environment
    • CI and deployment pipeline
  • 26. Volta next gen
  • 27. Volta Next Gen
    • Open-source Volta
    • Refactor Storm
      – Split into separate metric and log topologies, and batch writes
    • Move ES and InfluxDB to higher-IOPS storage (SSDs?)
    • Multi-DC support via stream duplication
    • Archival into Swift / HDFS
    • Anomaly detection using CEP / Storm
    • HTTP REST API in front of Kafka
    • Deployment automation using OpenStack Murano
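Batching writes (slide 27, and "aggregate your writes" on slide 21) usually means buffering documents and sending them to Elasticsearch's `_bulk` endpoint. The sketch below builds bulk request bodies in the documented newline-delimited format: an action line followed by the document line, with a mandatory trailing newline. Some limits apply:

- Only the current format is produced. Elasticsearch 1.x/2.x also required a `_type` in the action line.
- The batch size of 500 is an arbitrary assumption.
- The HTTP request itself is omitted.
- A production writer would also flush on a timer.

```python
import json
from typing import Optional


class BulkBuffer:
    """Accumulates documents and emits Elasticsearch _bulk NDJSON bodies."""

    def __init__(self, max_docs: int = 500) -> None:
        self.max_docs = max_docs
        self._lines: list[str] = []
        self._count = 0

    def add(self, index: str, doc: dict) -> Optional[str]:
        """Buffer one document; return a full bulk body when the batch is full."""
        self._lines.append(json.dumps({"index": {"_index": index}}))
        self._lines.append(json.dumps(doc))
        self._count += 1
        return self.flush() if self._count >= self.max_docs else None

    def flush(self) -> Optional[str]:
        """Return the pending bulk body (ending in a newline) and reset the buffer."""
        if not self._lines:
            return None
        body = "\n".join(self._lines) + "\n"
        self._lines, self._count = [], 0
        return body
```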
  • 28. Thank you!
    Questions, comments, suggestions?
    We are interested in open-sourcing and collaborating on Volta. Interested?
    And we are hiring … interested?
    ln_renganarayana@symantec.com, twitter: @lrengan
  • 29. Backup Slides
  • 30. LMM Metrics Data Model
    Mandatory fields:
      ● name: name of the metric. LMM uses it to store the metric, and you use it in queries: select "value" from "load"
      ● value: value of the metric at a given time
      ● @timestamp: timestamp
      ● host: host name or any other id
      ● tenant_id: tenant id (Keystone)
      ● apikey: LMM apikey
    Sample metric from collectd:
      {
        "@version": "1",
        "@timestamp": "2014-07-30T00:16:59.000Z",
        "name": "cpu",
        "host": "demo.symcpe.net",
        "plugin_instance": "0",
        "collectd_type": "cpu",
        "type_instance": "interrupt",
        "value": 0,
        "tenant_id": "db5ca8e4c8514fad9f98dbc4d648ee87",
        "apikey": "26d85ae3-1e10-4ce4-837a-7a1c8dfc67fb"
      }
    Metric names:
      ● collectd: the plugin name becomes the metric name, e.g. cpu or memory
      ● StatsD: the user's metric name joined to the metric type by a dot, e.g. myapp.counter or myapp.gauge
    Reserved fields: time, sequence_number
    Special field: type_instance
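The data model on slide 30 can be checked mechanically. The sketch below:

- flags missing mandatory fields;
- flags use of the reserved fields;
- requires a numeric `value`;
- derives the StatsD metric name by the dot concatenation the slide describes.

The function names are hypothetical, and the slides do not say whether Volta rejects or merely drops invalid events.

```python
MANDATORY = ("name", "value", "@timestamp", "host", "tenant_id", "apikey")
RESERVED = ("time", "sequence_number")


def validation_errors(event: dict) -> list[str]:
    """Problems with a metric event per the LMM data model; empty list means valid."""
    errors = [f"missing field: {f}" for f in MANDATORY if f not in event]
    errors += [f"reserved field used: {f}" for f in RESERVED if f in event]
    value = event.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if "value" in event and (isinstance(value, bool) or not isinstance(value, (int, float))):
        errors.append("value must be numeric")
    return errors


def statsd_metric_name(user_name: str, metric_type: str) -> str:
    """StatsD events: the user's metric name and the metric type joined by a dot."""
    return f"{user_name}.{metric_type}"
```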

Speaker notes

  1. Data-driven / data-informed development, ops and choices: the OODA loop. What do users like? Which features should we build? How fast can I build it? How is my service running?
  2. What are the use cases? Why build a new service?
     – As a service: how can I make it someone else's problem?
     – Consumed by services across the stack
     – Scalable and elastic
     – Secure, multi-tenant
     – Splunk was too expensive, and competing open source tech was emerging
  3. • Everyone starts with…
       – a bunch of log files (syslog, application-specific)
       – on a bunch of machines
     • Management consists of doing the simple stuff
       – rotate, compress and delete files
       – the information is there, but specific events are awkward to find
       – weird log retention policies evolve over time
  4. • User authentication with Keystone for the Query API & UI
     • Tenant id and API key are used for events sent to LMM
       – tenant ids come from Keystone; API keys are generated by LMM
     • Every event is tagged with a tenant id
       – log events: tenant id as a field
       – metric events: tenant id prefixed to the metric name
     • Keystone group membership is used for sophisticated cross-tenant event visibility / correlation
  5. Show a demo.