© 2013 Cloud Technology Partners, Inc. / Confidential
Cloud Technology Partners / April 2014 / www.cloudtp.com
Monitoring in the DevOps Era
About the Presenter
@madgreek65
mikekavis
madgreek65
VP/Principal Architect @ Cloud Technology Partners
Mike Kavis
The Virtualization Practice
madgreek65
DevOps.com
Topics of Discussion
1. Service Centric Ops
2. Logging Strategies
3. Monitoring Strategies
Service Centric Ops
What needs to Change?
Shift thinking away from product-centric to service-centric
Shipping Product vs. Operating a Service 24x7x365
What needs to Change?
Traditional Challenge – Dev needs speed, Ops needs control
Speed
APIs
Security
Compliance
Availability
Auditing
The Great Balancing Act
What needs to Change?
Shift thinking away from product-centric to service-centric
Old Way | New Way
Software is built and shipped | Services are running and managed
Development of features is done | Services are never done until they are turned off
Product owner focuses only on features | Product owner owns operational results along with product feature set
Each silo owns its own area | All groups focus on end user satisfaction
Dev must go through Ops to get work done | Ops enables Dev to get work done
Ops monitors Apps | Ops provides Dev with tools to operate Apps
Reactive monitoring/Ops | Proactive monitoring/Ops
Dev, Ops, Security and Product owners must work together throughout the SDLC and have a shared
responsibility for the overall quality and reliability of the services
What needs to Change?
Whoever prioritizes the backlog must
be accountable for reliability and
quality, not just speed to market
Don’t be a crash test
dummy
Speed to market should
not negatively impact
customer satisfaction!
Logging Strategies
Top Log Use Cases
– Troubleshooting – debugging information and error messages are collected to analyze what is occurring in the production environment
– Security – tracking all user access attempts, both successful and unsuccessful. Intrusion detection leverages this information
– Auditing – providing a trail of data is extremely important for audits. It is one thing to have a process flow on paper; it is another to show real data in the logs
– Monitoring – identifying trends, anomalies, thresholds and other variables proactively allows companies to resolve issues before they become noticeable and/or critical to the end users
Centralized Logs
– Pipe logs to standard output (Sysout) and direct them to a log service
– Consider SaaS solutions (e.g. Splunk) so the logging service does not go down with the apps
Best Practices
– Block all developer access to servers
– Direct developers to the logging app instead
– Standardize log message codes and severity codes
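As a rough illustration of writing logs to standard output so that a separate log service can collect them, here is a minimal Python sketch. The helper name `build_stdout_logger`, the line format, and the message code `ORD-1001` are assumptions made for this example and are not taken from the slides.

```python
import logging
import sys


def build_stdout_logger(name, stream=None):
    """Return a logger that writes one line per event to stdout (or to `stream`)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # do not also emit through the root logger
    logger.handlers.clear()    # keep repeated calls from stacking handlers
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    # Hypothetical standard layout: timestamp, severity, component, message.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_stdout_logger("orders")
    # The application only writes to stdout; shipping the lines to a central
    # log service is left to the platform or a log forwarder.
    log.info("order accepted code=ORD-1001")
```

Keeping the application unaware of where logs end up is one way to let developers read them in the logging app instead of on the servers.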
Without standards, Logs are “Garbage in, Garbage out”
– Things to consider
• Logs need to be easy to search
• Logs must be easy to use or people won’t use them
• External consumers of APIs expect standards
– Standard codes
• HTTP Status codes (200, 404, 503, etc.)
• RFC 5424 Severity Levels
– Standard Message Formats
• Settle on a standard format
• Build an API
Source Wikipedia http://en.wikipedia.org/wiki/Syslog#Severity_levels
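One possible shape for a standard message format is sketched below in Python. The numeric severity values are the ones defined by RFC 5424; the function name `format_event`, the JSON field names, and the message code `PAY-503` are hypothetical choices for illustration only.

```python
import json
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity levels as numbered in RFC 5424."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def format_event(severity, code, message, **context):
    """Render one log event as a single JSON line (assumed field names)."""
    record = {
        "severity": int(severity),
        "severity_name": severity.name,
        "code": code,        # team-defined standard message code
        "message": message,
        "context": context,  # free-form details, e.g. an HTTP status
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(format_event(Severity.ERROR, "PAY-503", "upstream unavailable", http_status=503))
```

A fixed set of fields like this is what makes logs easy to search: every consumer can filter on `severity` or `code` without parsing free text.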
What to collect
Best Practices
– Log Everything, Monitor Everything
• Infrastructure logs
• App Stack Logs (OS, app server, database, programming language)
• API logs
• Application logs
• Security logs
• Events, notifications, alerts
• Changes, config mgmt., deployment
• Access
• Patching history, machine images
Common Logging Solutions
Open Source Commercial
Monitoring Strategies
Nagios is not a Monitoring Strategy
Blind spots can kill you
What needs to be Monitored?
Data Category | Description
Performance | Page loads, query times, response times, upload/download speeds, etc.
Capacity | Disk space, memory, CPU, bandwidth, etc.
Uptime | Availability (e.g. four 9’s)
Throughput | Every layer (web, cache, database, network, app stack, etc.)
SLAs | Availability, reliability, security, etc.
KPIs | Examples: revenue per minute, avg concurrent users, etc.
User Metrics | Registrations, page views, bounce rates, click rates, etc.
Governance/Compliance | Access, permissions, intrusion detection, intrusion prevention, cost containment, etc.
Log file analysis | Predictive analytics, pattern recognition, etc.
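To make an uptime target such as "four 9’s" concrete, the arithmetic can be sketched as below. This is only an illustration; the function name `allowed_downtime_minutes` and the 365-day default period are assumptions, not something stated in the slides.

```python
def allowed_downtime_minutes(availability_pct, period_days=365.0):
    """Minutes of downtime permitted by an availability target over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per year")
```

At 99.99% over a 365-day year the budget works out to roughly 52.56 minutes, which is why an availability SLA needs monitoring that detects outages within minutes rather than hours.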
End to end Monitoring is Required
There is no ONE tool that does it all
Infrastructure Monitoring
Application | User Metrics, KPIs
Presentation | Web, Browser Metrics
Session | Sessions, Transactions
Transport | App Svr, Database, Cache
Network | Packets, Access, Data Transfer
Data Link | Bandwidth, Trace routes, Requests
Physical | CPU, Memory, Disk
Who needs Monitoring/Logging Data?
Actor | Purpose
Product Manager | Owns features, reliability, and quality of product
Developers | Trace transactions, understand performance/bottlenecks, troubleshoot issues
Testers | Performance and regression testing, requirements traceability for the “ilities”
Operations | Support infrastructure
NOC and Help Desk | First level support and customer support
Business Stakeholders | Manage key business metrics, understand user behavior, forecasting, profitability
Deployment team | Validate deployment, ensure no negative impact of deployments
Security team | Enforcement of policies, intrusion detection & prevention
Compliance team | SLA Management, auditing, customer requests for information
Customers/Users | Account information, real time billing, application specific metrics
Synthesized Production Data and Monitoring
What is Synthetic Data?
Production data that is artificially created to simulate real users within a system in order to test and monitor system features, performance, reliability, and/or scalability
Example Use Cases:
1. Test customer in a live production environment
2. Test user ID in a live production account
3. Netflix’s Simian Army (Purposely creating failures to
test resiliency)
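A synthetic check of the kind described in the first two use cases could look roughly like the Python sketch below. It is an assumption-laden illustration: `run_synthetic_check`, `ProbeResult`, and the idea of passing the test-customer action in as a callable are hypothetical, and a real check would call the live service with a dedicated test account.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    name: str
    ok: bool
    latency_ms: float
    error: str = ""


def run_synthetic_check(name: str, action: Callable[[], None], max_latency_ms: float) -> ProbeResult:
    """Run one scripted user action and report success and latency."""
    start = time.perf_counter()
    try:
        action()  # e.g. log in as the test customer and place a test order
    except Exception as exc:
        latency_ms = (time.perf_counter() - start) * 1000.0
        return ProbeResult(name, False, latency_ms, repr(exc))
    latency_ms = (time.perf_counter() - start) * 1000.0
    if latency_ms > max_latency_ms:
        return ProbeResult(name, False, latency_ms, "latency above threshold")
    return ProbeResult(name, True, latency_ms)


if __name__ == "__main__":
    # Stand-in action; a real probe would exercise the production system.
    print(run_synthetic_check("test-customer-login", lambda: time.sleep(0.01), 500.0))
```

Run on a schedule, a check like this reports a failure even when no real user has hit the problem yet, which is the proactive side of monitoring.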
Summary
Think ahead: Create strategies for logging & monitoring
– Log and monitor everything
– Create standards to prevent “Garbage in, Garbage out” in your logs
– Put both reactive and proactive monitors in place
– Know what your baseline metrics are and raise alerts when they change
– Be prepared before auditors walk in the door
– Make sure everyone is accountable for reliability and quality
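The point about knowing baseline metrics and alerting when they change can be sketched as a simple deviation test. This is one possible approach under stated assumptions, not a method given in the slides: the function name `deviates_from_baseline` and the three-standard-deviation threshold are illustrative choices.

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_baseline(baseline: Sequence[float], current: float, max_sigma: float = 3.0) -> bool:
    """Return True when `current` is more than `max_sigma` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > max_sigma


if __name__ == "__main__":
    response_times_ms = [100, 102, 98, 101, 99]  # made-up baseline samples
    for sample in (101, 130):
        print(sample, deviates_from_baseline(response_times_ms, sample))
```

A fixed threshold would miss slow drifts on a quiet service and fire constantly on a busy one; comparing against the measured baseline avoids hard-coding a number that fits neither.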
Thank you for your time and interest.

More Related Content

What's hot

Learn how Intuit created an application-aware network performance platform
Learn how Intuit created an application-aware network performance platformLearn how Intuit created an application-aware network performance platform
Learn how Intuit created an application-aware network performance platformRiverbed Technology
 
Suffering from “Franken” Monitoring?
Suffering from “Franken” Monitoring?Suffering from “Franken” Monitoring?
Suffering from “Franken” Monitoring?Riverbed Technology
 
How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...
How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...
How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...AppDynamics
 
Metrics That Matter: How to Measure Digital Transformation Success
Metrics That Matter: How to Measure Digital Transformation SuccessMetrics That Matter: How to Measure Digital Transformation Success
Metrics That Matter: How to Measure Digital Transformation SuccessXebiaLabs
 
Why and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in AzureWhy and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in AzureIan Downard
 
AppSphere 15 - Deep Dive into AppDynamics Application Analytics
AppSphere 15 - Deep Dive into AppDynamics Application AnalyticsAppSphere 15 - Deep Dive into AppDynamics Application Analytics
AppSphere 15 - Deep Dive into AppDynamics Application AnalyticsAppDynamics
 
Database Visibility and Troubleshooting Hands-on Lab - AppSphere16
Database Visibility and Troubleshooting Hands-on Lab - AppSphere16Database Visibility and Troubleshooting Hands-on Lab - AppSphere16
Database Visibility and Troubleshooting Hands-on Lab - AppSphere16AppDynamics
 
Exposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance ProblemsExposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance ProblemsRiverbed Technology
 
Building & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscapeBuilding & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscapeMeryemElMorabit
 
D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)
D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)
D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)Predix
 
Business Transactions with AppDynamics
Business Transactions with AppDynamicsBusiness Transactions with AppDynamics
Business Transactions with AppDynamicsAppDynamics
 
Integrating SAP into DevOps Pipelines: Why and How
Integrating SAP into DevOps Pipelines: Why and HowIntegrating SAP into DevOps Pipelines: Why and How
Integrating SAP into DevOps Pipelines: Why and HowDevOps.com
 
Use AppDynamics SDK to Integrate with your Applications - AppSphere16
Use AppDynamics SDK to Integrate with your Applications - AppSphere16Use AppDynamics SDK to Integrate with your Applications - AppSphere16
Use AppDynamics SDK to Integrate with your Applications - AppSphere16AppDynamics
 
PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...
PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...
PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...AppDynamics
 
Infrastructure as Code in Large Scale Organizations
Infrastructure as Code in Large Scale OrganizationsInfrastructure as Code in Large Scale Organizations
Infrastructure as Code in Large Scale OrganizationsXebiaLabs
 
Take Control of Application Performance
Take Control of Application PerformanceTake Control of Application Performance
Take Control of Application PerformanceRiverbed Technology
 
Ensuring the compliance, resiliency, and availability of business-critical ne...
Ensuring the compliance, resiliency, and availability of business-critical ne...Ensuring the compliance, resiliency, and availability of business-critical ne...
Ensuring the compliance, resiliency, and availability of business-critical ne...Riverbed Technology
 
How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...
How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...
How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...AppDynamics
 
E5: Predix Security with ACS & UAA (Predix Transform 2016)
E5: Predix Security with ACS & UAA (Predix Transform 2016)E5: Predix Security with ACS & UAA (Predix Transform 2016)
E5: Predix Security with ACS & UAA (Predix Transform 2016)Predix
 
Hewlett Packard Enterprise (HPE) Service Virtualization (SV)
Hewlett Packard Enterprise (HPE) Service Virtualization (SV)Hewlett Packard Enterprise (HPE) Service Virtualization (SV)
Hewlett Packard Enterprise (HPE) Service Virtualization (SV)Jeffrey Nunn
 

What's hot (20)

Learn how Intuit created an application-aware network performance platform
Learn how Intuit created an application-aware network performance platformLearn how Intuit created an application-aware network performance platform
Learn how Intuit created an application-aware network performance platform
 
Suffering from “Franken” Monitoring?
Suffering from “Franken” Monitoring?Suffering from “Franken” Monitoring?
Suffering from “Franken” Monitoring?
 
How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...
How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...
How the World Bank Standardized on AppDynamics as its Enterprise-Wide APM Sol...
 
Metrics That Matter: How to Measure Digital Transformation Success
Metrics That Matter: How to Measure Digital Transformation SuccessMetrics That Matter: How to Measure Digital Transformation Success
Metrics That Matter: How to Measure Digital Transformation Success
 
Why and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in AzureWhy and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in Azure
 
AppSphere 15 - Deep Dive into AppDynamics Application Analytics
AppSphere 15 - Deep Dive into AppDynamics Application AnalyticsAppSphere 15 - Deep Dive into AppDynamics Application Analytics
AppSphere 15 - Deep Dive into AppDynamics Application Analytics
 
Database Visibility and Troubleshooting Hands-on Lab - AppSphere16
Database Visibility and Troubleshooting Hands-on Lab - AppSphere16Database Visibility and Troubleshooting Hands-on Lab - AppSphere16
Database Visibility and Troubleshooting Hands-on Lab - AppSphere16
 
Exposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance ProblemsExposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance Problems
 
Building & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscapeBuilding & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscape
 
D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)
D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)
D02: Performance Engineering and Testing of Predix Apps (Predix Transform 2016)
 
Business Transactions with AppDynamics
Business Transactions with AppDynamicsBusiness Transactions with AppDynamics
Business Transactions with AppDynamics
 
Integrating SAP into DevOps Pipelines: Why and How
Integrating SAP into DevOps Pipelines: Why and HowIntegrating SAP into DevOps Pipelines: Why and How
Integrating SAP into DevOps Pipelines: Why and How
 
Use AppDynamics SDK to Integrate with your Applications - AppSphere16
Use AppDynamics SDK to Integrate with your Applications - AppSphere16Use AppDynamics SDK to Integrate with your Applications - AppSphere16
Use AppDynamics SDK to Integrate with your Applications - AppSphere16
 
PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...
PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...
PayU's Digital Transformation: Transparency from Dev to Prod, Monitoring Micr...
 
Infrastructure as Code in Large Scale Organizations
Infrastructure as Code in Large Scale OrganizationsInfrastructure as Code in Large Scale Organizations
Infrastructure as Code in Large Scale Organizations
 
Take Control of Application Performance
Take Control of Application PerformanceTake Control of Application Performance
Take Control of Application Performance
 
Ensuring the compliance, resiliency, and availability of business-critical ne...
Ensuring the compliance, resiliency, and availability of business-critical ne...Ensuring the compliance, resiliency, and availability of business-critical ne...
Ensuring the compliance, resiliency, and availability of business-critical ne...
 
How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...
How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...
How Q2 eBanking Maximizes Customer Experience for a Hyper-Growth SaaS Platfor...
 
E5: Predix Security with ACS & UAA (Predix Transform 2016)
E5: Predix Security with ACS & UAA (Predix Transform 2016)E5: Predix Security with ACS & UAA (Predix Transform 2016)
E5: Predix Security with ACS & UAA (Predix Transform 2016)
 
Hewlett Packard Enterprise (HPE) Service Virtualization (SV)
Hewlett Packard Enterprise (HPE) Service Virtualization (SV)Hewlett Packard Enterprise (HPE) Service Virtualization (SV)
Hewlett Packard Enterprise (HPE) Service Virtualization (SV)
 

Viewers also liked

The devops approach to monitoring, Open Source and Infrastructure as Code Style
The devops approach to monitoring, Open Source and Infrastructure as Code StyleThe devops approach to monitoring, Open Source and Infrastructure as Code Style
The devops approach to monitoring, Open Source and Infrastructure as Code StyleJulien Pivotto
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesShiva Narayanaswamy
 
Proactive monitoring tools or services - Open Source
Proactive monitoring tools or services - Open Source Proactive monitoring tools or services - Open Source
Proactive monitoring tools or services - Open Source B.A.
 
Stop using Nagios (so it can die peacefully)
Stop using Nagios (so it can die peacefully)Stop using Nagios (so it can die peacefully)
Stop using Nagios (so it can die peacefully)Andy Sykes
 
No you are not a DevOps engineer
No you are not a DevOps engineerNo you are not a DevOps engineer
No you are not a DevOps engineerMike Kavis
 
IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...
IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...
IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...AppDynamics
 
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT  Infrastructure & ServicesIT Executive Survey: Strategies for Monitoring IT  Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT Infrastructure & ServicesCA Technologies
 
NagiosXI - Astiostech NagiosXI Event with NTT MSC Cyberjaya
NagiosXI - Astiostech NagiosXI Event with NTT MSC CyberjayaNagiosXI - Astiostech NagiosXI Event with NTT MSC Cyberjaya
NagiosXI - Astiostech NagiosXI Event with NTT MSC CyberjayaSanjay Willie
 
Event Management and Monitoring Strategy
Event Management and Monitoring StrategyEvent Management and Monitoring Strategy
Event Management and Monitoring StrategyJames Gingras
 
Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016
Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016
Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016TeamQuest Corporation
 
AWS Connectivity, VPC Design and Security Pro Tips
AWS Connectivity, VPC Design and Security Pro TipsAWS Connectivity, VPC Design and Security Pro Tips
AWS Connectivity, VPC Design and Security Pro TipsShiva Narayanaswamy
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Kris Buytaert
 
IT Infrastructure Monitoring Strategies in Healthcare
IT Infrastructure Monitoring Strategies in HealthcareIT Infrastructure Monitoring Strategies in Healthcare
IT Infrastructure Monitoring Strategies in HealthcareCA Technologies
 
Managing Databases In A DevOps Environment
Managing Databases In A DevOps EnvironmentManaging Databases In A DevOps Environment
Managing Databases In A DevOps EnvironmentRobert Treat
 
Strategic monitoring-system
Strategic monitoring-systemStrategic monitoring-system
Strategic monitoring-systemAnita Sharma
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Brian Brazil
 
What is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios CoreWhat is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios CoreSanjay Willie
 

Viewers also liked (20)

The devops approach to monitoring, Open Source and Infrastructure as Code Style
The devops approach to monitoring, Open Source and Infrastructure as Code StyleThe devops approach to monitoring, Open Source and Infrastructure as Code Style
The devops approach to monitoring, Open Source and Infrastructure as Code Style
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
 
Proactive monitoring tools or services - Open Source
Proactive monitoring tools or services - Open Source Proactive monitoring tools or services - Open Source
Proactive monitoring tools or services - Open Source
 
Stop using Nagios (so it can die peacefully)
Stop using Nagios (so it can die peacefully)Stop using Nagios (so it can die peacefully)
Stop using Nagios (so it can die peacefully)
 
Dev ops Monitoring
Dev ops   MonitoringDev ops   Monitoring
Dev ops Monitoring
 
No you are not a DevOps engineer
No you are not a DevOps engineerNo you are not a DevOps engineer
No you are not a DevOps engineer
 
IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...
IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...
IoT in the Enterprise: Why Your Monitoring Strategy Should Include Connected ...
 
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT  Infrastructure & ServicesIT Executive Survey: Strategies for Monitoring IT  Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
 
Monitoring as a Service
Monitoring as a ServiceMonitoring as a Service
Monitoring as a Service
 
NagiosXI - Astiostech NagiosXI Event with NTT MSC Cyberjaya
NagiosXI - Astiostech NagiosXI Event with NTT MSC CyberjayaNagiosXI - Astiostech NagiosXI Event with NTT MSC Cyberjaya
NagiosXI - Astiostech NagiosXI Event with NTT MSC Cyberjaya
 
Event Management and Monitoring Strategy
Event Management and Monitoring StrategyEvent Management and Monitoring Strategy
Event Management and Monitoring Strategy
 
Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016
Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016
Vendor Selection Matrix - Capacity Management - Top 15 Vendors in 2016
 
AWS Connectivity, VPC Design and Security Pro Tips
AWS Connectivity, VPC Design and Security Pro TipsAWS Connectivity, VPC Design and Security Pro Tips
AWS Connectivity, VPC Design and Security Pro Tips
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.
 
IT Infrastructure Monitoring Strategies in Healthcare
IT Infrastructure Monitoring Strategies in HealthcareIT Infrastructure Monitoring Strategies in Healthcare
IT Infrastructure Monitoring Strategies in Healthcare
 
Managing Databases In A DevOps Environment
Managing Databases In A DevOps EnvironmentManaging Databases In A DevOps Environment
Managing Databases In A DevOps Environment
 
Strategic monitoring-system
Strategic monitoring-systemStrategic monitoring-system
Strategic monitoring-system
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)
 
What is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios CoreWhat is Nagios XI and how is it different from Nagios Core
What is Nagios XI and how is it different from Nagios Core
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 

Similar to Monitoring in the DevOps Era

Mike Siegler at INCOSE Minneapolis, 2014
Mike Siegler at INCOSE Minneapolis, 2014Mike Siegler at INCOSE Minneapolis, 2014
Mike Siegler at INCOSE Minneapolis, 2014Etherios
 
Geting cloud architecture right the first time linthicum interop fall 2013
Geting cloud architecture right the first time linthicum interop fall 2013Geting cloud architecture right the first time linthicum interop fall 2013
Geting cloud architecture right the first time linthicum interop fall 2013David Linthicum
 
VMworld Europe 2014: Preview the Latest Release from AirWatch
VMworld Europe 2014: Preview the Latest Release from AirWatchVMworld Europe 2014: Preview the Latest Release from AirWatch
VMworld Europe 2014: Preview the Latest Release from AirWatchVMworld
 
SoftWatch Overview_short (1)
SoftWatch Overview_short (1)SoftWatch Overview_short (1)
SoftWatch Overview_short (1)Moshe Kozlovski
 
SoftWatch Overview_short (1)
SoftWatch Overview_short (1)SoftWatch Overview_short (1)
SoftWatch Overview_short (1)Dror Leshem
 
API Roles In Cloud and Mobile Security - Greg Olsen, IT Manager, Integration ...
API Roles In Cloud and Mobile Security - Greg Olsen, IT Manager, Integration ...API Roles In Cloud and Mobile Security - Greg Olsen, IT Manager, Integration ...
API Roles In Cloud and Mobile Security - Greg Olsen, IT Manager, Integration ...CA API Management
 
Becomming a cloud governance ninja linthicum interop fall 2013
Becomming a cloud governance ninja linthicum interop fall 2013Becomming a cloud governance ninja linthicum interop fall 2013
Becomming a cloud governance ninja linthicum interop fall 2013David Linthicum
 
MuleSoft Singapore Meetup - Number 6 - September 24, 2020
MuleSoft Singapore Meetup - Number 6 - September 24, 2020MuleSoft Singapore Meetup - Number 6 - September 24, 2020
MuleSoft Singapore Meetup - Number 6 - September 24, 2020Julian Douch
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedCloudera, Inc.
 
Value of Enterprise DevOps
Value of Enterprise DevOpsValue of Enterprise DevOps
Value of Enterprise DevOpsMike Kavis
 
4 Security Guidelines for SharePoint Governance
4 Security Guidelines for SharePoint Governance4 Security Guidelines for SharePoint Governance
4 Security Guidelines for SharePoint GovernanceImperva
 
Oracle OpenWorld | CON9707 Enterprise Mobile Security Architecture beyond the...
Oracle OpenWorld | CON9707 Enterprise Mobile Security Architecture beyond the...Oracle OpenWorld | CON9707 Enterprise Mobile Security Architecture beyond the...
Oracle OpenWorld | CON9707 Enterprise Mobile Security Architecture beyond the...Indus Khaitan
 
Performance Testing
Performance TestingPerformance Testing
Performance TestingvodQA
 
PureApp Presentation
PureApp PresentationPureApp Presentation
PureApp PresentationProlifics
 
IoT Cloud Service & Partner IoT Solution
IoT Cloud Service & Partner IoT Solution IoT Cloud Service & Partner IoT Solution
IoT Cloud Service & Partner IoT Solution harishgaur
 
Advanced Controls access and user security for superusers con8824
Advanced Controls access and user security for superusers con8824Advanced Controls access and user security for superusers con8824
Advanced Controls access and user security for superusers con8824Oracle
 
Privileged Access Management (PAM)
Privileged Access Management (PAM)Privileged Access Management (PAM)
Privileged Access Management (PAM)danb02
 
IGI - Solution presentation-DP
IGI - Solution presentation-DPIGI - Solution presentation-DP
IGI - Solution presentation-DPNeetu Gupta
 
Cloud service api design rules presentation
Cloud service api design rules presentationCloud service api design rules presentation
Cloud service api design rules presentationesebeus
 
Monitoring in the DevOps Era

  • 1. Monitoring in the DevOps Era. Cloud Technology Partners / April 2014 / www.cloudtp.com. © 2013 Cloud Technology Partners, Inc. / Confidential
  • 2. About the Presenter: Mike Kavis, VP/Principal Architect @ Cloud Technology Partners. Handles: @madgreek65, mikekavis. Also listed: The Virtualization Practice, DevOps.com
  • 3. Topics of Discussion
      1. Service Centric Ops
      2. Logging Strategies
      3. Monitoring Strategies
  • 4. Service Centric Ops
  • 5. What needs to Change? Shift thinking away from product-centric to service-centric: from Shipping Product to Operating a Service 24x7x365
  • 6. What needs to Change? Traditional Challenge – Dev needs speed, Ops needs control
      The Great Balancing Act: Speed, APIs, Security, Compliance, Availability, Auditing
  • 7. What needs to Change? Shift thinking away from product-centric to service-centric
      – Old Way: Software is built and shipped. New Way: Services are running and managed.
      – Old Way: Development of features is done. New Way: Services are never done until they are turned off.
      – Old Way: Product owner focuses only on features. New Way: Product owner owns operational results along with the product feature set.
      – Old Way: Each silo owns its own area. New Way: All groups focus on end user satisfaction.
      – Old Way: Dev must go through Ops to get work done. New Way: Ops enables Dev to get work done.
      – Old Way: Ops monitors Apps. New Way: Ops provides Dev with tools to operate Apps.
      – Old Way: Reactive monitoring/Ops. New Way: Proactive monitoring/Ops.
      Dev, Ops, Security and Product owners must work together throughout the SDLC and have a shared responsibility for the overall quality and reliability of the services
  • 8. What needs to Change? Don’t be a crash test dummy
      – Whoever prioritizes the backlog must be accountable for reliability and quality, not just speed to market
      – Speed to market should not negatively impact customer satisfaction!
  • 9. Logging Strategies
  • 10. Logging Strategies: Top Log Use Cases
      – Troubleshooting – debugging information and error messages are collected for analyzing what is occurring in the production environment
      – Security – tracking all user access attempts, both successful and unsuccessful. Intrusion detection leverages this information
      – Auditing – providing a trail of data is extremely important for audits. It is one thing to have a process flow on paper; it is another to show real data in the logs
      – Monitoring – proactively identifying trends, anomalies, thresholds and other variables allows companies to resolve issues before they become noticeable and/or critical to the end users
  • 11. Logging Strategies
      Centralized Logs
      – Pipe logs to standard output (sysout) and direct them to log services
      – Consider SaaS solutions so the logging service does not go down with the apps (e.g. Splunk)
      Best Practices
      – Block all developer access to servers
      – Direct developers to the logging app instead
      – Standardize log message codes and severity codes
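A minimal sketch of the "pipe logs to standard output" practice from slide 11, using Python's standard `logging` module. The function name `build_stdout_logger` and the format string are illustrative assumptions, not something the deck prescribes; the idea is only that the app writes to stdout and leaves shipping to a separate log service.

```python
import logging
import sys


def build_stdout_logger(name):
    """Return a logger that writes one line per event to standard output.

    Hypothetical helper: the app only writes to stdout, and an external
    collector (not shown) forwards the stream to a central log service.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the handler only once so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    # Avoid double output through the root logger.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_stdout_logger("orders")
    log.info("order accepted")
    log.error("payment gateway timeout")
```

Because nothing is written to local files, developers have no reason to log in to a server to read logs; they query the central logging app instead.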
  • 12. Logging Strategies: Without standards, Logs are “Garbage in, Garbage out”
      – Things to consider
          • Logs need to be easy to search
          • Logs must be easy to use or people won’t use them
          • External consumers of APIs expect standards
      – Standard codes
          • HTTP Status codes (200, 404, 503, etc.)
          • RFC 5424 Severity Levels
      – Standard Message Formats
          • Settle on a standard format
          • Build an API
      Source: Wikipedia, http://en.wikipedia.org/wiki/Syslog#Severity_levels
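One way slide 12's "standard codes" and "standard message format" could look in practice is sketched below. The severity numbering follows RFC 5424 (0 is most severe, 7 is debug). The JSON field names, the `format_log` helper, and the `ORD-1001` message code are assumptions made for illustration only.

```python
import json
from datetime import datetime, timezone
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as numbered in RFC 5424 (0 is the most severe)."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def format_log(severity, code, message, service, timestamp=None):
    """Return one log record as a single JSON line with fixed field names.

    The field set is a hypothetical house standard, not part of RFC 5424.
    """
    ts = timestamp or datetime.now(timezone.utc)
    record = {
        "timestamp": ts.isoformat(),
        "severity": int(severity),
        "severity_name": severity.name,
        "code": code,  # hypothetical application message code
        "service": service,
        "message": message,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(format_log(Severity.ERROR, "ORD-1001", "payment failed", "orders"))
```

Keeping every record on one line with the same keys is what makes the logs easy to search, which is the slide's first consideration.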
  • 13. What to collect: Best Practices – Log Everything, Monitor Everything
          • Infrastructure logs
          • App Stack Logs (OS, app server, database, programming language)
          • API logs
          • Application logs
          • Security logs
          • Events, notifications, alerts
          • Changes, config mgmt., deployment
          • Access
          • Patching history, machine images
  • 14. Common Logging Solutions: Open Source, Commercial
  • 15. Monitoring Strategies
  • 16. Nagios is not a Monitoring Strategy. Blind spots can kill you
  • 17. What needs to be Monitored? (Data Category: Description)
      – Performance: Page loads, query times, response times, upload/download speeds, etc.
      – Capacity: Disk space, memory, CPU, bandwidth, etc.
      – Uptime: Availability (e.g. four 9’s)
      – Throughput: Every layer (web, cache, database, network, app stack, etc.)
      – SLAs: Availability, reliability, security, etc.
      – KPIs: Examples: Revenue per minute, Avg concurrent users, etc.
      – User Metrics: Registrations, page views, bounce rates, click rates, etc.
      – Governance/Compliance: Access, permissions, intrusion detection, intrusion prevention, cost containment, etc.
      – Log file analysis: Predictive analytics, pattern recognition, etc.
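The "four 9’s" uptime target on slide 17 translates into a concrete downtime budget. The sketch below shows the arithmetic; the function name and the 365-day default are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_pct, period_days=365):
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Four 9's (99.99%) over a 365-day year is roughly 52.6 minutes.
    print(round(allowed_downtime_minutes(99.99), 2))
    # The same target over a 30-day month is roughly 4.3 minutes.
    print(round(allowed_downtime_minutes(99.99, period_days=30), 2))
```

A budget this small is one reason uptime has to be measured continuously rather than inferred from incident reports.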
  • 18. End to end Monitoring is Required. There is no ONE tool that does it all
      – Application: User Metrics, KPIs
      – Presentation: Web, Browser Metrics
      – Session: Sessions, Transactions
      – Transport: App Svr, Database, Cache
      – Network: Packets, Access, Data Transfer
      – Data Link: Bandwidth, Trace routes, Requests
      – Physical: CPU, Memory, Disk
      Diagram label: Infrastructure Monitoring
  • 19. Who needs Monitoring/Logging Data? (Actor: Purpose)
      – Product Manager: Owns features, reliability, and quality of product
      – Developers: Trace transactions, understand performance/bottlenecks, troubleshoot issues
      – Testers: Performance and regression testing, requirements traceability for the “ilities”
      – Operations: Support infrastructure
      – NOC and Help Desk: First level support and customer support
      – Business Stakeholders: Manage key business metrics, understand user behavior, forecasting, profitability
      – Deployment team: Validate deployment, ensure no negative impact of deployments
      – Security team: Enforcement of policies, intrusion detection & prevention
      – Compliance team: SLA Management, auditing, customer requests for information
      – Customers/Users: Account information, real time billing, application specific metrics
  • 20. Synthesized Production Data and Monitoring
      What is Synthetic Data? Production data that is artificially created to simulate real users within a system in order to test and monitor system features, performance, reliability, and/or scalability
      Example Use Cases:
      1. Test customer in a live production environment
      2. Test user ID in a live production account
      3. Netflix’s Simian Army (Purposely creating failures to test resiliency)
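A rough sketch of how a synthetic check like slide 20's "test customer in a live production environment" might be wrapped. The `run_synthetic_check` helper, its result fields, and the stand-in `fake_login` action are all hypothetical; a real check would call the production service with a dedicated test account.

```python
import time


def run_synthetic_check(name, action, max_seconds):
    """Run one scripted user action and report success and latency.

    `action` is any zero-argument callable that performs the scripted
    step (for example, logging in as a dedicated test customer) and
    raises an exception on failure.
    """
    start = time.monotonic()
    error = None
    try:
        action()
        succeeded = True
    except Exception as exc:  # a probe should report failures, not crash
        succeeded = False
        error = str(exc)
    elapsed = time.monotonic() - start
    return {
        "check": name,
        # The check passes only if it succeeded within the time limit.
        "ok": succeeded and elapsed <= max_seconds,
        "seconds": elapsed,
        "error": error,
    }


if __name__ == "__main__":
    def fake_login():
        # Stand-in for a real call made with a synthetic test user.
        time.sleep(0.01)

    print(run_synthetic_check("test-customer-login", fake_login, max_seconds=2.0))
```

Scheduling such checks at a fixed interval gives a steady signal even when no real users are active.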
  • 21. Summary. Think ahead: Create strategies for logging & monitoring
      – Log and monitor everything
      – Create standards to prevent “Garbage in, Garbage out” in your logs
      – Put both reactive and proactive monitors in place
      – Know what your baseline metrics are and raise alerts when they change
      – Be prepared before auditors walk in the door
      – Make sure everyone is accountable for reliability and quality
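The summary's advice to "know what your baseline metrics are and raise alerts when they change" can be sketched with a simple deviation test. The three-standard-deviation threshold and the function name are assumptions; the deck does not specify how a change should be detected.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, current, max_sigmas=3.0):
    """Return True when `current` is far from the historical baseline.

    `history` needs at least two samples. The threshold of three
    standard deviations is an illustrative default, not a rule.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) > max_sigmas * spread


if __name__ == "__main__":
    response_times_ms = [100, 102, 98, 101, 99]
    print(deviates_from_baseline(response_times_ms, 101))  # within the normal range
    print(deviates_from_baseline(response_times_ms, 150))  # far outside it
```

A check like this is proactive in the slide's sense: it fires on a shift away from normal behavior rather than waiting for a hard failure.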
  • 22. Thank you for your time and interest.