Delivering Transformation. Together.
SITE RELIABILITY ENGINEERING & AIOPS
Murugan Muthayan
Agility Day – Noida – Aug 2019
AGILE, DEVOPS AND SRE..
2
Agile Development
• Transformed the way software is built
• Collaboration & quicker feedback loop
• Better control, early value
DevOps
• Cultural transformation focused on delivery speed
• Enable automation wherever possible
• Make development and operations processes frictionless
Site Reliability Engineering
Focuses on improving the reliability of software in production by implementing best practices in engineering and operations
3
SITE RELIABILITY ENGINEERING
SRE incorporates engineering, infrastructure and
operations aspects to create scalable and reliable
software systems that are highly automated and
self-healing.
SRE aims to take DevOps toward NoOps - “what happens
when a software engineer is tasked with what
used to be called operations.” - Ben Treynor,
Founder of Google SRE
The purpose of SRE is to achieve reliability by
implementing the best practices in engineering
and operations.
SRE can be thought of as an extreme
implementation of DevOps.
FOCUS AREA FOR SRE
4
MONITORING
(Performance
Metrics)
ALERTING
(Immediate
notifications)
INFRASTRUCTURE
(Scalability / Limitations)
ENGINEERING
(Application Design)
DEBUGGING
(Log files, code
analysis)
SECURITY
(Vulnerabilities)
BEST PRACTICES
(Documentation &
Training)
SITE RELIABILITY ENGINEER
5
The ideal site reliability engineer is either a software engineer with a good administration
background or a highly skilled system administrator with knowledge of coding and automation –
“Part systems administrator, part second tier support and part developer”
There is a 50% cap on the aggregate "ops" work for all SREs. The SRE team must spend the remaining 50% of its
time on actual development work.
An SRE team is responsible for:
• availability,
• latency,
• performance,
• efficiency,
• change management,
• monitoring,
• emergency response,
• capacity planning
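The 50% cap is a simple ratio, and a team can check it against its own time tracking. The sketch below assumes a hypothetical log of `(category, hours)` entries using the made-up categories `"ops"` and `"dev"`. It computes the ops share and flags whether it exceeds the cap. It is only an illustration and not a prescribed tool.

```python
# Hypothetical sketch: check an SRE team's time split against the 50% ops cap.
# The categories ("ops", "dev") and the hours below are illustrative assumptions.

OPS_CAP = 0.50  # at most half of aggregate SRE time may go to operational work


def ops_share(entries: list[tuple[str, float]]) -> float:
    """Return the fraction of logged hours spent on 'ops' work."""
    total = sum(hours for _, hours in entries)
    if total == 0:
        return 0.0
    ops = sum(hours for kind, hours in entries if kind == "ops")
    return ops / total


def within_cap(entries: list[tuple[str, float]], cap: float = OPS_CAP) -> bool:
    """True if the ops share stays at or below the cap."""
    return ops_share(entries) <= cap


week = [("ops", 22.0), ("dev", 18.0), ("ops", 6.0), ("dev", 14.0)]
print(f"ops share: {ops_share(week):.0%}, within cap: {within_cap(week)}")
```

A team that stays over the cap is usually a signal to automate the recurring ops work rather than add more people.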
6
SRE – SKILLS CHECKLIST
SRE - METRICS & MEASUREMENTS
7
SLI (Service Level Indicator): a quantitative measure of service behaviour, such as request latency or the fraction of requests that fail
SLO (Service Level Objective): a target value for an SLI, setting goals for system availability, performance and success rates
SLA (Service Level Agreement): a commitment derived from the SLOs that dictates commercial penalties if it is missed
Error Budget: the amount of unreliability allowed before the SLO is breached (1 − SLO); a measure of risk and of the headroom you have
MTTR (Mean Time To Repair): the average time required to repair a failure
MTTF (Mean Time To Failure): the average operating time before a system fails
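The following is a minimal sketch of how these metrics relate numerically. It follows the common convention (as in Google's SRE book) that the error budget is 1 − SLO. The request counts, the 99.9% SLO and the incident timestamps are made-up values. The `Incident` record and all function names are hypothetical.

```python
# Minimal sketch: availability SLI, error budget, MTTR and MTTF.
# Request counts, the SLO and incident timestamps are made-up illustrative values.
from dataclasses import dataclass


@dataclass
class Incident:
    failed_at: float    # hours since the start of the observation window
    restored_at: float  # hours since the start of the observation window


def availability_sli(good: int, total: int) -> float:
    """SLI: fraction of requests served successfully."""
    return good / total if total else 1.0


def error_budget(slo: float, total: int) -> float:
    """Failed requests the SLO tolerates over the window: (1 - SLO) * total."""
    if not 0.0 <= slo < 1.0:
        raise ValueError("SLO must be in [0, 1)")
    return (1.0 - slo) * total


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left (assumes total > 0).
    A negative value means the SLO has been breached."""
    return 1.0 - (total - good) / error_budget(slo, total)


def mttr(incidents: list[Incident]) -> float:
    """Mean Time To Repair: average downtime per incident (hours)."""
    return sum(i.restored_at - i.failed_at for i in incidents) / len(incidents)


def mttf(incidents: list[Incident], window_start: float = 0.0) -> float:
    """Mean Time To Failure: average uptime before each failure (hours),
    measured from the window start or from the previous restoration."""
    uptimes, up_since = [], window_start
    for inc in sorted(incidents, key=lambda i: i.failed_at):
        uptimes.append(inc.failed_at - up_since)
        up_since = inc.restored_at
    return sum(uptimes) / len(uptimes)


good, total, slo = 999_500, 1_000_000, 0.999
incidents = [Incident(100, 102), Incident(300, 301), Incident(650, 653)]

print(f"SLI = {availability_sli(good, total):.4%} (SLO {slo:.1%})")
print(f"error budget: {error_budget(slo, total):.0f} failed requests, "
      f"{budget_remaining(slo, good, total):.0%} remaining")
print(f"MTTR = {mttr(incidents):.1f} h, MTTF = {mttf(incidents):.1f} h")
```

With these sample numbers, half of the error budget has been spent. Under this convention, the team could still ship risky changes until the remaining budget reaches zero.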
TAKEAWAY..
8
..and AIOps takes SRE a step further, automating IT operations using
advanced analytics!
COGNITIVE LEARNING – INTELLIGENT OPERATIONS (AIOps)
9
Insight · Predict · Big Data · Machine Learning
Definition - What Does AIOps Mean?
10
AIOps is a methodology at the frontier of enterprise IT
operations. It automates various aspects of IT and uses
artificial intelligence to create self-learning
programs that help transform IT services.
It is the application of advanced analytics, in the form of
machine learning (ML) and artificial intelligence (AI), to
automating operations so that your IT Ops team can move at
the speed that your business expects today.
AIOps refers to multi-layered technology platforms that automate
and enhance IT operations by 1) using analytics and machine
learning to analyze big data collected from various IT operations
tools and devices, in order to 2) automatically spot and react to
issues in real time.
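The idea of "automatically spot issues" can be shown with a deliberately simple stand-in for the ML models described above. The sketch learns a rolling baseline for a metric and flags samples that deviate strongly from it. The rolling z-score technique, the window of 10 samples, the threshold of 3 and the latency values are all assumptions for illustration. Real AIOps platforms use richer models.

```python
# Hypothetical sketch: flag anomalous metric samples with a rolling z-score.
# This is a simple statistical stand-in for ML-based detection, not a real AIOps API.
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=10, threshold=3.0):
    """Yield (index, value) for samples more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    history = deque(maxlen=window)
    for idx, value in enumerate(samples):
        if len(history) == window:
            recent = list(history)
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield idx, value
        history.append(value)


# Made-up request latencies in milliseconds, with one spike.
latencies = [120, 118, 125, 122, 119, 121, 123, 117, 120, 122, 480, 121, 119]
for idx, value in detect_anomalies(latencies):
    print(f"anomaly at sample {idx}: {value} ms")
```

The spike is added to the history after it is flagged, which widens the baseline for the next few samples. A production detector would typically exclude flagged points or use a more robust estimator.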
What Will Tomorrow Look Like?
11
...Function Follows Need
2010: Distributed Computing → 2020: Software Defined Everything
• Monitoring platforms: ISV platforms → patchwork, open source, departmental
• Source events: custom/standard/fixed, ~100-1,000 eps → chaotic, unstructured, ~1,000-100,000 eps (eps = events per second)
• Configuration: flexible, TBC ~hours → chaotic, TBC < 1 second
• Infrastructure: multi-vendor UNIX/IP/Windows client-server → virtualised/containers, fluid/UNIX/mobile/micro
Digital transformation demands DevOps & elasticity
Current and Future Demands
12
Scale
• 10⁵+ moving parts
• 10⁶+ notifications
• 10⁹+ data points
• 10¹² → 10¹²⁰+ possible failure modes (+ bounded by the estimated information content of the universe!)
Complexity
• Reduction in the unit of compute: Mainframe → Server → VM → Container (multiple orders of magnitude)
Compulsion of Change
• Increase in change cycle: a fully fluid CI/CD cycle
Traditional IT Ops Caught Flat-Footed
13
Overwhelmed by DATA and a lack of INFORMATION
• Siloed teams and tools
• Too many alerts
• No context when an incident occurs
• No early warning
• DevOps lacks proactive assurance
75-80%
~ 90%
> 45%
> 73%
Many Siloed
War room
IT Ops Priorities Driven by Digital Transformation
14
1. INCREASE frequency of change, stability and availability of IT services
2. REDUCE resource operations workload and INCREASE productivity
3. CONSOLIDATE tools
4. MIGRATE to the cloud
5. SUPPORT software-defined services
6. SUPPORT microservices-based software architecture
AIOps Agile and Proactive Event-to-Resolution Workflow
15
Early Detection, fewer tickets, reduced MTTR
• Industrialised data ingestion from multiple sources
• Automatically separates signals from alert noise
• Proactively and automatically detects incidents and probable root causes (reduced MTTD)
• Enables collaborative workflows (reduced adverse business impact)
• Triggers automation to restore services
• Predictive insights (reduced support escalations and MTTR)
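Separating signal from alert noise often starts with event correlation, where many related alerts are collapsed into one candidate incident. The following sketch groups alerts by service when they arrive within a time window of each other. The `Alert` fields, the check names and the 5-minute window are hypothetical. Commercial AIOps tools correlate on far richer context, such as topology, text similarity and learned patterns.

```python
# Hypothetical sketch: reduce alert noise by grouping related alerts
# (same service, close in time) into a single incident.
# The Alert fields and the 300-second window are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alert:
    ts: float      # seconds since some reference time
    service: str
    check: str     # e.g. "cpu_high", "http_5xx"


@dataclass
class Incident:
    service: str
    start: float
    end: float
    checks: set[str] = field(default_factory=set)
    alert_count: int = 0


def correlate(alerts, window=300.0):
    """An alert joins its service's current incident if it arrives within
    `window` seconds of that incident's last alert; otherwise it opens a new one."""
    open_incidents: dict[str, Incident] = {}
    incidents: list[Incident] = []
    for a in sorted(alerts, key=lambda x: x.ts):
        inc = open_incidents.get(a.service)
        if inc is None or a.ts - inc.end > window:
            inc = Incident(a.service, a.ts, a.ts)
            open_incidents[a.service] = inc
            incidents.append(inc)
        inc.end = a.ts
        inc.checks.add(a.check)
        inc.alert_count += 1
    return incidents


alerts = [
    Alert(0, "checkout", "http_5xx"),
    Alert(30, "checkout", "http_5xx"),
    Alert(45, "checkout", "latency_p99"),
    Alert(60, "search", "cpu_high"),
    Alert(200, "checkout", "http_5xx"),
    Alert(4000, "checkout", "http_5xx"),
]
for inc in correlate(alerts):
    print(inc.service, inc.alert_count, sorted(inc.checks))
```

Here six alerts become three incidents. An operator sees one "checkout" incident with both symptoms instead of four separate pages.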
How AIOps Makes IT Ops Robust
16
• Determine the service health of mission-critical services or applications.
• Gain control over, and visibility into, the spiraling consumption of cloud resources.
• Reduce MTTR with automated incident management and real-time configuration management database (CMDB) updates.
• Build context-rich data lakes integrating disparate, third-party data sources.
AIOps Makes Teams Faster, Smarter, and More Productive
17
Level 0/NOC Operators
• Improve efficiency by consolidating related alerts together
• Reduce catch-and-dispatch activities
Support SMEs & Developers
• Pass incident resolution knowledge to lower support tiers
• Collaborate across complex multi-disciplinary incidents
IT Operations Managers
• Deliver service-level state monitoring
• Improve efficiency and job satisfaction
• Identify and address repeating mundane work with run book automation
• Investigate and problem-solve for frequently repeating P3-P5 incidents
IT Senior Management
• Reduce the overall effort spent per alert
• Repurpose the savings toward the business’s bottom line
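Run book automation for repeating, low-priority incidents can be pictured as a lookup from a known alert type to a scripted remediation, with escalation to a human as the fallback. In this sketch, the check names, the remediation functions and their string results are placeholders. They do not represent any real orchestration tool's API.

```python
# Hypothetical sketch of run book automation: known, repetitive alert types map
# to scripted remediation steps; anything without a run book is escalated.
# Check names and actions are placeholders, not a real tool's API.
from typing import Callable


def restart_service(service: str) -> str:
    # Placeholder: a real run book would call an orchestrator or config tool here.
    return f"restarted {service}"


def clear_temp_files(service: str) -> str:
    # Placeholder for a disk clean-up step.
    return f"cleared temp files on {service}"


RUNBOOKS: dict[str, Callable[[str], str]] = {
    "process_down": restart_service,
    "disk_full": clear_temp_files,
}


def handle(check: str, service: str) -> str:
    """Run the matching run book, or escalate when no automation exists."""
    action = RUNBOOKS.get(check)
    if action is None:
        return f"escalated {check} on {service} to on-call"
    return action(service)


for check, svc in [("disk_full", "billing-db"), ("process_down", "api"), ("cert_expiry", "web")]:
    print(handle(check, svc))
```

The design choice is to keep humans in the loop for anything unknown. Teams typically add a run book entry only after the same P3-P5 incident has been resolved manually several times.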
THANK YOU
18
Any Questions?

Agile Network India | Agility Day @Noida | SRE & AIOps | Murugan Muthayan
