1
2
The IBM dashboard for operational metrics
Daniel Krook
Senior Certified IT Specialist, IBM
3
We run Cloud Foundry on dozens of OpenStack VMs
Two intranet clusters
Classic: 38 huge VMs deployed with Chef: 1,302 users, 1,710 apps
NG: 41 medium VMs deployed with BOSH: 123 users, 247 apps
Not counting Dev deployments
All on 50+ Nova Compute nodes
In the past year, we’ve learned how to
• Keep Cloud Foundry running smoothly
• Discover and prevent impending problems
• Resolve unexpected issues quickly
4
Goals of this lightning talk
1. Show the key data points we track
2. Show how our metrics dashboard helps us monitor that data
3. Share ideas on how to find better data in NG and beyond
4. Spark discussion on improved visibility for CF admins and customers
We are looking to get better at this, and help the community get better as well.
5
1. The key data
6
What are the important metrics?
Data that can be tracked over time to see trends and behaviors
Data that can help us predict problems before they happen
At the PaaS layer, that means:
DEAs and apps health
▪ Memory reserved as a proportion of the memory available
General health of all components
▪ Health of the virtual machines
▪ Status of the processes running on them
Database nodes and services
▪ Number of provisioned services against capacity available
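As a rough sketch of the capacity ratios named on this slide (memory reserved against memory available on a DEA, provisioned services against capacity on a service node), the Python below computes a utilization fraction and flags values at or above a warning level. The function names, the 0.8 warning level, and the sample numbers are illustrative assumptions, not values taken from the dashboard.

```python
def utilization(used: float, capacity: float) -> float:
    """Return used/capacity as a fraction; a capacity of 0 or less counts as full."""
    if capacity <= 0:
        return 1.0
    return used / capacity


def nearing_capacity(used: float, capacity: float, warn_at: float = 0.8) -> bool:
    """True when utilization has reached the (assumed) warning level."""
    return utilization(used, capacity) >= warn_at


# Hypothetical DEA: 12 GB reserved by apps out of 16 GB available (values in MB).
dea_ratio = utilization(12_288, 16_384)
# Hypothetical service node: 45 provisioned instances out of a capacity of 200.
node_ratio = utilization(45, 200)
```

Tracking these fractions over time, rather than the raw counts, is what lets one chart compare nodes of different sizes.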
7
Why do we need metrics?
▪ Deliver continuous availability in the cloud
▪ Proactively solve problems rather than react to them
▪ Understand the behavior of the system to automate it
8
Where can we find them?
▪ NATS message bus
  • Discover the components to interrogate
  • Best for dynamically changing data
▪ Cloud Controller database (CCDB)
  • Longer lived data that isn’t in the varz endpoints
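One way to picture the discovery step: a component answers a discovery request on the message bus with a small JSON message that says where its monitoring endpoint lives. The sketch below only parses such a reply and builds the varz URL and a Basic auth header; it does not talk to NATS or HTTP. The payload field names (`type`, `host`, `credentials`) and every sample value are assumptions for illustration and may not match a given Cloud Foundry release.

```python
import base64
import json


def varz_request(discover_reply: str):
    """Build (url, headers) for a varz call from one discovery reply (assumed shape)."""
    info = json.loads(discover_reply)
    user, password = info["credentials"]
    # HTTP Basic auth: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    url = f"http://{info['host']}/varz"
    return url, {"Authorization": f"Basic {token}"}


# Hypothetical reply from a DEA; every value here is made up.
sample = json.dumps({
    "type": "DEA",
    "host": "10.0.0.12:34567",
    "credentials": ["monitor", "s3cret"],
})
url, headers = varz_request(sample)
```

Keeping the parsing separate from the network calls makes it easy to test against captured replies before pointing it at a live cluster.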
9
2. Monitoring that data
10
Our metrics dashboard provides…
1. Views of component health
2. Resource usage details
3. Ongoing growth trends
4. Access to logs and raw varz
5. Email notifications
11
It helps us find (and fix) problems
▪ Components nearing capacity or failure
▪ Already failed components
▪ Out of control apps and noisy users
It helps us see patterns
▪ Active/inactive users and apps
▪ Growth trends and runtime/service adoption
12
User and app trends
There is also one unauthenticated page for high-level stats
13
DEA list
14
DEA details
15
Service node list
16
Service node details
17
User list
18
User details
19
App list
20
App details
21
Log list
22
Log details
23
Email notifications
24
3. Finding and acting on better data
25
Let’s resolve gaps in data captured from NG
▪ NG provides granular user/org/space views…
  • This enables better BSS potential in terms of QoS and departmental billing
▪ …But we lost user and app data linkages from the health manager
  • Can’t see what DEA my app resides on (not currently enabled in our NG version)
  • Can’t see how many apps a user has (replaced by orgs and spaces, but still valuable to trace)
  • See https://github.com/cloudfoundry/cloud_controller_ng/issues/81
▪ We’d like to restore that data by surfacing it either
  • in varz endpoints (dynamic data, preferred) or
  • in the CCDB (static data, could be a security concern)
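To make the missing linkage concrete, here is a minimal sketch of tracing apps back to a user through spaces, since in NG apps belong to spaces rather than directly to users. The record shapes (`space_developers`, `space_apps`) and all names are hypothetical; they do not reflect the CCDB schema or any Cloud Controller API.

```python
from collections import defaultdict

# Hypothetical data: which users hold a role in each space, and which apps live there.
space_developers = {
    "org-a/dev": {"alice", "bob"},
    "org-a/prod": {"alice"},
    "org-b/dev": {"carol"},
}
space_apps = {
    "org-a/dev": ["blog", "wiki"],
    "org-a/prod": ["blog"],
    "org-b/dev": ["api"],
}


def apps_per_user(developers, apps):
    """Map each user to the sorted list of (space, app) pairs reachable via their spaces."""
    result = defaultdict(list)
    for space, users in developers.items():
        for user in users:
            for app in apps.get(space, []):
                result[user].append((space, app))
    return {user: sorted(pairs) for user, pairs in result.items()}


linked = apps_per_user(space_developers, space_apps)
```

The result answers "how many apps can this user touch", which is weaker than the per-user ownership that the classic version exposed but is still enough to trace an app back to people.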
26
Let’s begin to link metrics to automation
▪ Detect errors in applications that are traceable to users/orgs
  • Preemptively reach out to them to see if they need help
  • Think customer service and proactive support!
  • Can we hook into BOSH or Jenkins for automation?
▪ Automate (and expand links to the IaaS and SaaS stacks)
  • Self-healing systems (out of disk, move apps)
  • Self-scaling systems (detect when nearing thresholds)
  • Evolving topologies (replace unused service nodes with popular ones)
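A minimal sketch of the "detect when nearing thresholds" idea: map a utilization reading to an action that an automation hook could carry out. The thresholds, action names, pool names, and readings are all assumptions for illustration; the talk does not specify any of them.

```python
def scaling_action(utilization: float, scale_up_at: float = 0.85,
                   scale_down_at: float = 0.30) -> str:
    """Pick an action for one pool of nodes from its utilization (0.0 to 1.0)."""
    if utilization >= scale_up_at:
        return "add_node"
    if utilization <= scale_down_at:
        return "remove_node"
    return "no_change"


# Hypothetical utilization readings per pool.
readings = {"dea": 0.91, "mysql_node": 0.12, "router": 0.55}
plan = {pool: scaling_action(value) for pool, value in readings.items()}
```

Leaving a wide gap between the two thresholds is a deliberate choice in this sketch, so that a pool hovering near one boundary does not flip between adding and removing nodes.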
27
Let’s expand the broadcast of metrics to more users
▪ Admins are the primary beneficiaries right now
  • But data is almost completely read-only
  • Should we provide UAA-based tiers of access to admins?
▪ Others can and should benefit
  • Customers
  • End users
  • Developers
  • Management
  • Executives, line of business owners
  • Finance
28
Thanks!
29
The metrics dashboard innovators
Chris Peters
Russell Boykin
Doug Davis
Wei Feng
30
We’re hiring!
Search Jobs at IBM by:
SmartCloud Application Services
31
