Monitoring Akka with Kamon 1.0
Dr. Steffen Gebert
Insights into the inner workings of an application
become crucial at the latest when performance and
scalability issues are encountered. Gaining them is
especially challenging in distributed systems, such as
those built on Akka Cluster.
A popular open-source solution for monitoring on the
JVM in general, and Akka in particular, is Kamon. Having
recently reached its 1.0 milestone, it supports both
metrics collection and tracing of Akka applications,
whether they run standalone or distributed.
This talk gives an introduction to Kamon 1.0 with a
focus on its metrics features. It describes the basic setup
with Prometheus and Grafana and gives an overview of
the different modules and their APIs for implementing
custom metrics. The resulting setup records both the
automatically exposed metrics about Akka’s actor systems
and metrics tailored to the monitored application’s domain
and service level indicators.
Finally, lessons learned from a first-time user’s experience
of getting started with Kamon will be reported. The
example of adding instrumentation to EMnify’s
mobile core application illustrates how easy it is to
get started and how to kill Prometheus on a daily
basis.
Abstract
• Steffen
• has a heart beating for infrastructure
• writes code at EMnify
• PhD in computer science, topic: software-based networks
• EMnify
• MVNO focussed on IoT
• runs virtualized mobile core network
• Würzburg/Berlin, Germany
About Me & Us
@StGebert
Slides available at st-g.de/speaking
• Kamon Overview
• Metrics Instrumentation
• Setup: Kamon with Prometheus & Grafana
• Experience at EMnify
• Summary
Agenda
• Our application is slow
• Nagios did not tell us
• APM did
Application Performance Monitoring
Kamon
Kamon
• Open Source
• Monitoring for the JVM
• Integrations for Akka
• Release 1.0 in January 2018
kamon.io / github.com/kamon-io
Exemplary Trace
• Tracing
• Per-request call graph
• Context propagation across nodes
• Exemplary objectives:
• Request profiling
• Understanding call graph
• Metrics
• Time series data
• Counters / gauges / distributions
• Exemplary objectives:
• Function call counts and latency
• Open DB connections
• User logins
• Generated revenue
Kamon: Feature Set
• Custom Metrics
• added to your code where it
makes sense
• Automatic Instrumentation
• integrations into Akka,
Akka HTTP, Play, JDBC, Servlet
• system and JVM metrics
Metrics
• Counter
• function calls
• customer buying our product
• Gauge
• number of open DB connections
• mailbox size
Custom Metric Types
• Histogram
• latencies
• shopping cart total prices
• Timer
• latencies
• RangeSampler
• number of open DB connections
• mailbox size
Custom Metric Types (2)
[Figure: histogram of a single sample; x-axis: value (10 to 50), y-axis: observations]
• Kamon.counter("hello.krakow").increment();
• Histogram hist = Kamon.histogram("age");
hist.record(33);
hist.record(21);
• CounterMetric c = Kamon.counter("participants");
Counter cReact = c.refine("conference", "react");
Counter cScala = c.refine("conference", "scala");
cReact.increment(42);
Custom Metrics: Implementation
• Actor system metrics
• processed messages
• active actors
• unhandled messages
• dead letters
• Per actor performance metrics
• processing time (per message)
• time in mailbox
• mailbox sizes
• errors
Kamon Akka
[Diagram labels: Message; Actor A, Actor B, and Actor C, each with a Mailbox]
• Metrics related to
• routers
• dispatchers
• executors
• actor groups
• remoting (with kamon-akka-remote)
• Requirement (AOP)
• AspectJ Weaver or
• Kanela (Kamon Agent)
Kamon Akka (2)
Kamon + Prometheus + Grafana
Setup
Related Projects
[Diagram columns: Targets, Time Series DB, Dashboard]
simple_client
DropWizard Metrics
Micrometer
Commercial Tools
Datadog, Dynatrace, Instana, NewRelic, etc.
• Time Series Database
• collection, storage & query of metrics data
• based on Google's Borgmon, CNCF project
• Pull-based model
• scrapes configured targets
• HTTP endpoints on monitored targets
• Easy deployment
• statically linked Golang binaries
• single YAML config file
• Alertmanager.. for alerting ;-)
Prometheus
• Integrated time series database
• on disk, no external dependency
• fixed retention period, no long-term storage / downsampling
• very efficient storage [1]
• query language PromQL
Prometheus TSDB
[1] Storing 16 bytes at scale, Fabian Reinartz @ PromCon 2017
Setup
[Diagram: targets (Application, Node Exporter, cAdvisor) and service discovery (AWS EC2, Kubernetes, etc.), feeding the time series DB and the dashboard]
• Exporter output (scraped by Prom via HTTP):
myapp_checkouts{product="sim_4ff"} 42.0
myapp_checkouts{product="sim_embedded"} 5412.0
akka_system_dead_letters_total{system="test"} 224.0
…
• Querying with PromQL
rate(akka_system_dead_letters_total[5m])
// handles counter resets / overflows
Ingesting & Querying
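To illustrate what "handles counter resets" means for rate(), here is a minimal Java sketch. It is a simplification and not Prometheus's actual implementation (which, among other things, extrapolates to the window boundaries). The class name, method name, and sample values are made up for illustration. The idea: whenever a counter value drops, the process is assumed to have restarted, so the new value counts as the full increase since the reset.

```java
/** Simplified, reset-aware counter rate. Illustrative only, not Prometheus's exact algorithm. */
final class CounterRate {

    /**
     * Average per-second increase over the given samples.
     * A drop in the counter value is treated as a reset to zero.
     */
    static double ratePerSecond(double[] timestampsSeconds, double[] values) {
        if (timestampsSeconds.length != values.length || values.length < 2) {
            return Double.NaN; // at least two samples are needed
        }
        double increase = 0.0;
        for (int i = 1; i < values.length; i++) {
            double previous = values[i - 1];
            double current = values[i];
            // Normal case: add the delta. After a reset: the counter restarted at 0,
            // so the whole current value is new increase.
            increase += (current >= previous) ? (current - previous) : current;
        }
        double elapsed = timestampsSeconds[values.length - 1] - timestampsSeconds[0];
        return elapsed > 0 ? increase / elapsed : Double.NaN;
    }

    public static void main(String[] args) {
        // Hypothetical scrape results: the application restarts between the 2nd and 3rd scrape.
        double[] t = {0, 60, 120, 180};
        double[] deadLetters = {200, 224, 6, 30};
        System.out.println("dead letters per second: " + ratePerSecond(t, deadLetters));
    }
}
```

Without the reset handling, the drop from 224 to 6 would show up as a large negative delta and distort the rate.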
• Just a frontend to supply PromQL queries and build dashboards
• Kamon Akka dashboard available at grafana.com/dashboards/4469
Grafana
with Kamon
EMnify's Experience
• Tick interval (Kamon) and scrape frequency (Prometheus)
• both should match!
• usually (?) 30s or 60s
• for load tests, we went for 5s
• hope to go for 15s in production
• Deployment [for development / load tests]
• EC2 instances tagged in CloudFormation plus EC2 service discovery
• started simple (stupid): Prometheus in container on AWS ECS with EFS
Our Experiences with Kamon+Prometheus
Docker automated build config github.com/EMnify/prometheus-docker
• Little CPU resources + NFS storage + high cardinality =
• High cardinality?
• akka_actor_processing_time_seconds_bucket{
class="com.example.SomethingFrequentlyUsed",
le="0.33", …
path="mysystem/some-supervisor/$aX"}
How to Kill Prometheus (Regularly)
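A rough back-of-the-envelope sketch of why per-actor histograms explode in cardinality. In the Prometheus exposition format, a histogram produces one series per bucket (including +Inf) plus a _sum and a _count series for every label combination. The actor and bucket counts below are hypothetical numbers chosen for illustration, not figures from our setup.

```java
/** Estimates the number of time series produced by one Prometheus histogram metric. */
final class SeriesCardinality {

    /**
     * @param labelCombinations distinct label sets, e.g. one per actor path
     * @param buckets           number of "le" buckets, including the +Inf bucket
     */
    static long histogramSeries(long labelCombinations, int buckets) {
        // Each label combination yields: buckets x _bucket, 1 x _sum, 1 x _count
        return labelCombinations * (buckets + 2L);
    }

    public static void main(String[] args) {
        int buckets = 11; // assumption: 10 finite buckets plus +Inf

        // Assumption: 10,000 short-lived, unnamed actors ($a, $b, ...) under one supervisor
        System.out.println("tracked individually: " + histogramSeries(10_000, buckets));

        // The same actors reported as a single actor group
        System.out.println("tracked as one group: " + histogramSeries(1, buckets));
    }
}
```

Every new unnamed actor adds another full set of series, and that is only one of several per-actor metrics.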
• Define actor groups
kamon.akka.actor-groups += "mygroup"
kamon.util.filters {
  "akka.tracked-actor" {
    excludes = ["mysystem/some-supervisor/*"]
  }
  mygroup {
    includes = ["mysystem/some-supervisor/*"]
  }
}
• Delete Prometheus data to recover
• Continue to watch out for metrics with unnamed actors
How to Fix Kamon to Not Kill Prometheus
• Limit the number of samples per scrape:
<scrape_config>
# Per-scrape limit on number of scraped samples that will be accepted.
[ sample_limit: <int> | default = 0 ]
• Watch for limit kicking in:
prometheus_target_scrapes_exceeded_sample_limit_total
How to Fix Prometheus to Not Kill Itself
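A small sketch of how sample_limit behaves, since it is easy to mistake it for a truncation: when a scrape returns more samples than the limit, Prometheus treats the entire scrape as failed and ingests nothing from it. The Java class below only models that rule; its names are made up, and the counter loosely mirrors prometheus_target_scrapes_exceeded_sample_limit_total.

```java
/** Models the all-or-nothing behaviour of a per-scrape sample limit. Illustrative only. */
final class SampleLimitSketch {
    private final int sampleLimit; // 0 means "no limit", matching the Prometheus default
    private long exceededTotal = 0; // how often a scrape was rejected for exceeding the limit

    SampleLimitSketch(int sampleLimit) {
        this.sampleLimit = sampleLimit;
    }

    /** Returns the number of samples ingested from one scrape: all of them, or none. */
    int ingest(int scrapedSamples) {
        if (sampleLimit > 0 && scrapedSamples > sampleLimit) {
            exceededTotal++;
            return 0; // the whole scrape is dropped, not just the excess samples
        }
        return scrapedSamples;
    }

    long exceededTotal() {
        return exceededTotal;
    }

    public static void main(String[] args) {
        SampleLimitSketch target = new SampleLimitSketch(5000);
        System.out.println("normal scrape ingested: " + target.ingest(4200));
        System.out.println("oversized scrape ingested: " + target.ingest(130_000));
        System.out.println("scrapes over the limit: " + target.exceededTotal());
    }
}
```

The consequence: once the limit kicks in, the target delivers no data at all, so the exceeded counter deserves an alert.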
Bonus: Kamino
• Hosted service
• by Kamon developers
• currently in private beta
• no price tags, yet
• Great user experience for us
• tailored to Akka monitoring
• distributions over time
• still, few rough edges
Kamino Hosted Service
[Diagram columns: Targets, Time Series DB, Dashboard]
Per-Actor Metrics
Example: Fixing a Bottleneck
[Figure annotations: restart, deployment]
• Kamon offers a wide range of APM features
• customized and automated metric collection
• works with both on-prem/OSS and SaaS "backends"
• super friendly community, thanks Ivan!
• distributed tracing
• Monitor your application (from the inside!)
• now!
• better start small
Summary & Conclusion
Find me at the Speaker's Roundtable
Questions, please!
Backup
• Data Collection
• Core
• Akka
• Akka Remote
• Akka HTTP
• Play
• JDBC
• Executors
• System Metrics
• Reporting
• Metrics: Prometheus, Kamino
(WIP: Datadog, InfluxDB, statsd)
• Tracing: Zipkin, Jaeger, Kamino
• Logs: Logback
• Context Propagation
• Akka Remote, Akka HTTP, Play
• http4s
Kamon: Modules
Setup with Kamon
[Diagram: the JVM runs your application (port 80) together with Kamon and kamon-prometheus (port 9095); Prometheus (port 9090; retrieval, storage, PromQL) scrapes kamon-prometheus and the Node Exporter (port 9100); Grafana uses Prometheus as its data source]
Kamon.histogram(
"datavolume",
MeasurementUnit.information().gigabytes(),
DynamicRange.apply(
0, // lowestDiscernibleValue
10000, // highestTrackableValue
2 // significantValueDigits
)
);
Measurement Units / Dynamic Ranges
Prometheus Architecture
• Kamon core trackable values
• highest trackable values for range sampler / histogram
• can be adjusted per metric
• Default Prometheus histogram buckets might not fit
• global default can be adjusted
• PR pending for overriding per metric [1]
Adjusting Value Ranges / Aggregation
[1] kamon-io/kamon-prometheus#12
Histograms
[Figure: histogram over time (axes: t, value 10 to 50, observations 0 to max) next to the histogram of a single sample (x-axis: value 10 to 50, y-axis: observations)]
• Describe values better than avg/min/max do
• Can be aggregated across nodes
• Usually, percentiles/quantiles are computed
• Xth percentile: X% of the values are lower than <n>
• Median (= 50th percentile)
• SLO/SLA candidates: 90th/95th/99th percentile of
response times
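The two points about aggregation and percentiles can be sketched in a few lines of Java. This is a simplified illustration with made-up bucket bounds and counts, not how Kamon or Prometheus compute quantiles internally: the buckets here hold plain (non-cumulative) counts, nodes are merged by adding counts bucket by bucket, and the percentile is reported as the upper bound of the bucket in which the requested rank falls.

```java
/** Percentile estimate from bucketed histogram counts. Illustrative sketch only. */
final class BucketPercentile {

    /** Merges two histograms with identical bucket layouts by adding their counts. */
    static long[] merge(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bucket layouts differ");
        }
        long[] merged = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            merged[i] = a[i] + b[i];
        }
        return merged;
    }

    /** Upper bound of the first bucket whose cumulative count reaches the requested percentile. */
    static double percentileUpperBound(double[] upperBounds, long[] counts, double percentile) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            return Double.NaN; // no observations recorded
        }
        long rank = Math.max(1L, (long) Math.ceil(percentile * total / 100.0));
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            cumulative += counts[i];
            if (cumulative >= rank) {
                return upperBounds[i];
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        double[] upperBounds = {10, 20, 30, 40, 50}; // hypothetical bucket bounds
        long[] nodeA = {5, 20, 10, 4, 1};            // hypothetical observations on node A
        long[] nodeB = {0, 10, 30, 15, 5};           // hypothetical observations on node B

        long[] cluster = merge(nodeA, nodeB);
        System.out.println("median <= " + percentileUpperBound(upperBounds, cluster, 50));
        System.out.println("p95    <= " + percentileUpperBound(upperBounds, cluster, 95));
    }
}
```

Merging works because bucket counts are additive. Averages of per-node percentiles, in contrast, do not yield a valid cluster-wide percentile. The precision of the estimate is limited by the bucket width.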
https://github.com/improbable-eng/thanos
https://www.slideshare.net/BartomiejPotka/thanos-global-durable-prometheus-monitoring
Thanos: Prometheus Long-Term Storage
Thanos: Global Scale
global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 1m
Our Prometheus Config
scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    scrape_timeout: 5s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:9090
  - job_name: kamon
    scrape_interval: 5s
    scrape_timeout: 5s
    metrics_path: /metrics
    scheme: http
    sample_limit: 5000
    ec2_sd_configs:
      - region: eu-west-1
        refresh_interval: 1m
        port: 9095
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Environment]
        separator: ;
        regex: (.*)
        target_label: environment
        replacement: $1
        action: replace
      - source_labels: [__meta_ec2_private_ip]
        separator: ;
        regex: (.*)
        target_label: __address__
        replacement: ${1}:9095
        action: replace
      - source_labels: [__meta_ec2_tag_Name]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: ${1}:9095
        action: replace
      - source_labels: [__meta_ec2_instance_id]
        separator: ;
        regex: (.*)
        target_label: instance_id
        replacement: $1
        action: replace
      - source_labels: [__meta_ec2_tag_Platform]
        separator: ;
        regex: akka
        target_label: platform
        replacement: $1
        action: keep
      - source_labels: [__meta_ec2_tag_AkkaApplication]
        separator: ;
        regex: (.*)
        target_label: akka_application
        replacement: $1
        action: replace
      - source_labels: [__meta_ec2_tag_AkkaRole]
        separator: ;
        regex: (.*)
        target_label: akka_role
        replacement: $1
        action: replace

Monitoring Akka with Kamon 1.0

  • 1. Monitoring Akka with Kamon 1.0 Dr. Steffen Gebert
  • 2. Insights into the inner workings of an application become crucial latest when performance and scalability issues are encountered. This becomes especially challenging in distributed systems, like when using Akka cluster. A popular open-source solution for monitoring on the JVM in general, and Akka in particular, is Kamon. With its recently reached 1.0 milestone, it features means for both metrics collection and tracing of Akka applications, running both standalone or distributed. This talk gives an introduction to Kamon 1.0 with a focus on its metrics features. The basic setup using Prometheus and Grafana will be described, as well as an overview over the different modules and its APIs for implementing custom metrics. The resulting setup allows to record both, automatically exposed metrics about Akka’s actor systems, as well as metrics tailored to the monitored application’s domain and service level indicators. Finally, learnings from a first-time user experience of getting started with Kamon will be reported. The example of adding instrumentation to EMnify’s mobile core application will illustrate, how easy it is to get started and how to kill the Prometheus on a daily basis. Abstract
  • 3. • Steffen • has a heart beating for infrastructure • writes code at EMnify • PhD in computer science, topic: software-based networks • EMnify • MVNO focussed on IoT • runs virtualized mobile core network • Würzburg/Berlin, Germany About Me & Us @StGebert Slides available at st-g.de/speaking
  • 4. • Kamon Overview • Metrics Instrumentation • Setup: Kamon with Prometheus & Grafana • Experience at EMnify • Summary Agenda
  • 5. • Our application is slow • Nagios did not tell us • APM did Application Performance Monitoring
  • 7. Kamon • Open Source • Monitoring for the JVM • Integrations for Akka • Release 1.0 in January 2018 kamon.io / github.com/kamon-io
  • 8. • Tracing • Per-request call graph • Context propagation across nodes • Exemplary objectives: • Request profiling • Understanding call graph • Metrics Kamon: Feature Set
  • 10. • Tracing • Per-request call graph • Context propagation across nodes • Exemplary objectives: • Request profiling • Understanding call graph • Metrics • Time series data • Counters / gauges / distributions • Exemplary objectives: • Function call counts and latency • Open DB connections • User logins • Generated revenue Kamon: Feature Set
  • 11. • Custom Metrics • added to your code where it makes sense • Automatic Instrumentation • integrations into Akka, Akka HTTP, Play, JDBC, Servlet • system and JVM metrics Metrics
  • 12. • Counter • function calls • customer buying our product • Gauge • number of open DB connections • mailbox size Custom Metric Types t t
  • 13. • Histogram • latencies • shopping cart total prices • Timer • latencies • RangeSampler • number of open DB connections • mailbox size Custom Metric Types (2) histogram (single sample) observations value10 20 30 40 50
  • 14. • Kamon.counter("hello.krakow").increment(); • Histogram hist = Kamon.histogram("age"); hist.record(33); hist.record(21); • CounterMetric c = Kamon.counter("participants"); Counter cReact = c.refine("conference", "react"); Counter cScala = c.refine("conference", "scala"); cReact.increment(42); Custom Metrics: Implementation
  • 15. • Actor system metrics • processed messages • active actors • unhandled messages • dead letters • Per actor performance metrics • processing time (per message) • time in mailbox • mailbox sizes • errors Kamon Akka [Diagram: a message and the mailboxes of Actor A, Actor B, and Actor C]
  • 16. • Metrics related to • routers • dispatchers • executors • actor groups • remoting (with kamon-akka-remote) • Requirement (AOP) • AspectJ Weaver or • Kanela (Kamon Agent) Kamon Akka (2)
  • 17. Kamon + Prometheus + Grafana Setup
  • 18. Related Projects [Diagram columns: Targets, Time Series DB, Dashboard] • simple_client • DropWizard Metrics • Micrometer • Commercial Tools: Datadog, Dynatrace, Instana, NewRelic, etc.
  • 19. • Time Series Database • collection, storage & query of metrics data • based on Google's Borgmon, CNCF project • Pull-based model • scrapes configured targets • HTTP endpoints on monitored targets • Easy deployment • statically linked Golang binaries • single YAML config file • Alertmanager.. for alerting ;-) Prometheus
  • 20. • Integrated time series database • on disk, no external dependency • fixed retention period, no long-term storage / downsampling • very efficient storage [1] • query language PromQL Prometheus TSDB [1] Storing 16 bytes at scale, Fabian Reinartz @ PromCon 2017
  • 21. Setup [Diagram columns: Targets, Time Series DB, Dashboard] • Targets: Application, Node Exporter, cAdvisor • Service Discovery (AWS EC2, Kubernetes, etc.)
  • 22. • Exporter output (scraped by Prom via HTTP):
        myapp_checkouts{product="sim_4ff"} 42.0
        myapp_checkouts{product="sim_embedded"} 5412.0
        akka_system_dead_letters_total{system="test"} 224.0
        …
      • Querying with PromQL:
        rate(akka_system_dead_letters_total[5m])
        // handles counter resets / overflows
      Ingesting & Querying
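The note that rate() handles counter resets can be illustrated with a small plain-Java sketch. It is not Prometheus code and it is deliberately simplified: the real rate() also extrapolates to the edges of the range window, which is left out here. The class name `CounterRate` and the sample values are made up for illustration. The idea shown is only that a drop in a counter's value is read as a restart from zero instead of a negative increase.

```java
/** Simplified per-second rate over counter samples, tolerating counter resets. */
final class CounterRate {

    /**
     * Sums the increases between consecutive samples and divides by the
     * covered time span. When a value drops, the counter is assumed to have
     * restarted from zero, so the new value itself counts as the increase.
     */
    static double ratePerSecond(double[] timestampsSeconds, double[] values) {
        if (timestampsSeconds.length != values.length || values.length < 2) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        double increase = 0.0;
        for (int i = 1; i < values.length; i++) {
            double previous = values[i - 1];
            double current = values[i];
            increase += current >= previous ? current - previous : current;
        }
        double span = timestampsSeconds[values.length - 1] - timestampsSeconds[0];
        if (span <= 0) {
            throw new IllegalArgumentException("timestamps must be increasing");
        }
        return increase / span;
    }

    public static void main(String[] args) {
        // Hypothetical scrapes every 30s; the application restarts before the third one.
        double[] timestamps = {0, 30, 60, 90};
        double[] deadLetters = {200, 224, 6, 30};
        System.out.println(ratePerSecond(timestamps, deadLetters));
    }
}
```

In the made-up series the increases are 24, 6 (after the restart) and 24, so 54 dead letters over 90 seconds. A naive "last minus first" calculation would have produced a negative number.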
  • 23. • Just a frontend to supply PromQL queries and build dashboards • Kamon Akka dashboard available at grafana.com/dashboards/4469 Grafana
  • 25. • Tick interval (Kamon) and scrape frequency (Prometheus) • both should match! • usually (?) 30s or 60s • for load tests, we went for 5s • hope to go for 15s in production • Deployment [for development / load tests] • EC2 instances tagged in CloudFormation plus EC2 service discovery • started simple (stupid): Prometheus in container on AWS ECS with EFS Our Experiences with Kamon+Prometheus Docker automated build config github.com/EMnify/prometheus-docker
  • 26. • Little CPU resources + NFS storage + high cardinality =
      • High cardinality?
        akka_actor_processing_time_seconds_bucket{
          class="com.example.SomethingFrequentlyUsed",
          le="0.33", …
          path="mysystem/some-supervisor/$aX"}
      How to Kill Prometheus (Regularly)
  • 27. • Define actor groups
        kamon.akka.actor-groups += "mygroup"
        kamon.util.filters {
          "akka.tracked-actor" {
            excludes = ["mysystem/some-supervisor/*"]
          }
          mygroup {
            includes = ["mysystem/some-supervisor/*"]
          }
        }
      • Delete Prometheus data to recover
      • Continue to watch out for metrics with unnamed actors
      How to Fix Kamon to Not Kill Prometheus
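Why grouping helps can be shown with a rough plain-Java sketch that counts distinct label values before and after collapsing the children of one supervisor into a single group. This is not Kamon code: the class `ActorGroupCardinality`, the simplified "prefix/*" matcher, and the generated actor paths are all assumptions made for illustration, and Kamon's own filter matching may behave differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Counts distinct label values with and without collapsing actors into a group. */
final class ActorGroupCardinality {

    /** Hypothetical matcher: "prefix/*" matches every path below that prefix. */
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 1); // keeps the trailing "/"
            return path.length() > prefix.length() && path.startsWith(prefix);
        }
        return pattern.equals(path);
    }

    /** Label an actor would be reported under: the group name if matched, else its own path. */
    static String labelFor(String path, String groupName, String groupPattern) {
        return matches(groupPattern, path) ? groupName : path;
    }

    /** Number of distinct labels, which bounds the number of time series per metric. */
    static int distinctLabels(List<String> paths, String groupName, String groupPattern) {
        Set<String> labels = new HashSet<>();
        for (String path : paths) {
            labels.add(labelFor(path, groupName, groupPattern));
        }
        return labels.size();
    }

    /** Made-up actor paths: one named actor plus many unnamed children. */
    static List<String> samplePaths(int children) {
        List<String> paths = new ArrayList<>();
        paths.add("mysystem/user-manager");
        for (int i = 0; i < children; i++) {
            paths.add("mysystem/some-supervisor/$" + i);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = samplePaths(1000);
        System.out.println("ungrouped: " + new HashSet<>(paths).size());
        System.out.println("grouped:   "
                + distinctLabels(paths, "mygroup", "mysystem/some-supervisor/*"));
    }
}
```

Each distinct label value is multiplied again by the number of histogram buckets (the `le` label), which is why a supervisor spawning many unnamed children inflates the series count so quickly.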
  • 28. • Limit the number of samples per scrape:
        <scrape_config>
        # Per-scrape limit on number of scraped samples that will be accepted.
        [ sample_limit: <int> | default = 0 ]
      • Watch for limit kicking in:
        prometheus_target_scrapes_exceeded_sample_limit_total
      How to Fix Prometheus to Not Kill Itself
  • 30. • Hosted service • by Kamon developers • currently in private beta • no price tags, yet • Great user experience for us • tailored to Akka monitoring • distributions over time • still, few rough edges Kamino Hosted Service Targets Time Series DB Dashboard
  • 32. Example: Fixing Bottleneck [Figure annotations: restart, deployment]
  • 33. • Kamon offers a wide range of APM features • customized and automated metric collection • works with both on-prem/OSS and SaaS "backends" • super friendly community, thanks Ivan! • distributed tracing • Monitor your application (from the inside!) • now! • better start small Summary & Conclusion
  • 34. Find me at the Speaker's Roundtable Questions, please!
  • 37. • Data Collection • Core • Akka • Akka Remote • Akka HTTP • Play • JDBC • Executors • System Metrics • Reporting • Metrics: Prometheus, Kamino (WIP: Datadog, InfluxDB, statsd) • Tracing: Zipkin, Jaeger, Kamino • Logs: Logback • Context Propagation • Akka Remote, Akka HTTP, Play • http4s Kamon: Modules
  • 38. Setup with Kamon [Diagram: JVM with Your Application (port 80), Kamon and kamon-prometheus (port 9095); Node Exporter (port 9100); Prometheus (Retrieval, Storage, PromQL; port 9090) scrapes; Grafana with Prometheus Data Source; label: *magic*]
  • 39. Kamon.histogram(
          "datavolume",
          MeasurementUnit.information().gigabytes(),
          DynamicRange.apply(
            0,     // lowestDiscernibleValue
            10000, // highestTrackableValue
            2      // significantValueDigits
          )
        );
      Measurement Units / Dynamic Ranges
  • 41. • Kamon core trackable values • highest trackable values for range sampler / histogram • can be adjusted per metric • Default Prometheus histogram buckets might not fit • global default can be adjusted • PR pending for overriding per metric [1] Adjusting Value Ranges / Aggregation [1] kamon-io/kamon-prometheus#12
  • 42. Histograms • Better describe values than avg/min/max does • Can be aggregated across nodes • Usually percentiles/quantiles computed • Xth percentile: X% of the values lower than <n> • Median (=50th percentile) • SLO/SLA candidates 90/95/99th percentile of response times [Figures: histogram over time (value vs. t, shaded by observations from 0 to max); histogram (single sample) (observations vs. value, 10 to 50)]
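The two points "can be aggregated across nodes" and "percentiles computed" can be sketched in plain Java. Nothing here is Kamon or Prometheus API: `BucketPercentile`, the bucket bounds, and the per-node counts are invented for illustration. The sketch merges two nodes' bucket counts by adding them, then reports the upper bound of the bucket that contains the requested percentile. PromQL's histogram_quantile additionally interpolates within the bucket, which is omitted here.

```java
/** Percentile estimate from bucketed histograms merged across nodes. */
final class BucketPercentile {

    /** Adds per-bucket counts of two nodes that use identical bucket bounds. */
    static long[] merge(long[] nodeA, long[] nodeB) {
        if (nodeA.length != nodeB.length) {
            throw new IllegalArgumentException("bucket layouts differ");
        }
        long[] merged = new long[nodeA.length];
        for (int i = 0; i < merged.length; i++) {
            merged[i] = nodeA[i] + nodeB[i];
        }
        return merged;
    }

    /**
     * counts[i] holds the observations that fell into the bucket ending at
     * upperBounds[i] (not cumulative). Returns the upper bound of the first
     * bucket at which the running total reaches the requested percentile.
     */
    static double percentileUpperBound(double[] upperBounds, long[] counts, double percentile) {
        long total = 0;
        for (long count : counts) {
            total += count;
        }
        if (total == 0 || upperBounds.length != counts.length) {
            throw new IllegalArgumentException("empty or inconsistent histogram");
        }
        double rank = percentile * total / 100.0;
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            if (running >= rank) {
                return upperBounds[i];
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        double[] bounds = {10, 20, 30, 40, 50};   // hypothetical latency buckets
        long[] nodeA = {5, 20, 10, 4, 1};         // made-up counts, node A
        long[] nodeB = {10, 30, 15, 4, 1};        // made-up counts, node B
        long[] all = merge(nodeA, nodeB);
        System.out.println("p50 <= " + percentileUpperBound(bounds, all, 50));
        System.out.println("p95 <= " + percentileUpperBound(bounds, all, 95));
        System.out.println("p99 <= " + percentileUpperBound(bounds, all, 99));
    }
}
```

Merging works because bucket counts are plain sums. Averages of per-node percentiles would not give the same answer, which is one reason histograms are preferred over precomputed avg/min/max values.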
  • 45. Our Prometheus Config
      global:
        scrape_interval: 5s
        scrape_timeout: 5s
        evaluation_interval: 1m
      scrape_configs:
      - job_name: prometheus
        scrape_interval: 5s
        scrape_timeout: 5s
        metrics_path: /metrics
        scheme: http
        static_configs:
        - targets:
          - localhost:9090
      - job_name: kamon
        scrape_interval: 5s
        scrape_timeout: 5s
        metrics_path: /metrics
        scheme: http
        sample_limit: 5000
        ec2_sd_configs:
        - region: eu-west-1
          refresh_interval: 1m
          port: 9095
        relabel_configs:
        - source_labels: [__meta_ec2_tag_Environment]
          separator: ;
          regex: (.*)
          target_label: environment
          replacement: $1
          action: replace
        - source_labels: [__meta_ec2_private_ip]
          separator: ;
          regex: (.*)
          target_label: __address__
          replacement: ${1}:9095
          action: replace
        - source_labels: [__meta_ec2_tag_Name]
          separator: ;
          regex: (.*)
          target_label: instance
          replacement: ${1}:9095
          action: replace
        - source_labels: [__meta_ec2_instance_id]
          separator: ;
          regex: (.*)
          target_label: instance_id
          replacement: $1
          action: replace
        - source_labels: [__meta_ec2_tag_Platform]
          separator: ;
          regex: akka
          target_label: platform
          replacement: $1
          action: keep
        - source_labels: [__meta_ec2_tag_AkkaApplication]
          separator: ;
          regex: (.*)
          target_label: akka_application
          replacement: $1
          action: replace
        - source_labels: [__meta_ec2_tag_AkkaRole]
          separator: ;
          regex: (.*)
          target_label: akka_role
          replacement: $1
          action: replace