Krishnan Subramanian, Chief Research Analyst, Rishidot Research
StackSense Research Brief
Observability and Modern Enterprise
The role of machine learning and artificial intelligence in
Observability
Research Brief: Observability and Modern Enterprise
StackSense.io © Rishidot Research
Summary
The term Observability is fast approaching the peak of the hype cycle, but the practice is critical
to managing cloud native architectures in any modern enterprise. In this research brief, we
stake out our position on the evolution of Observability in IT operations and highlight the
potential of using machine learning and artificial intelligence to make Observability more useful
in the context of distributed environments. We will also highlight some considerations for
organizations exploring the use of Machine Learning (ML) or Artificial Intelligence (AI) to manage
Observability data.
Introduction
The term Observability is all the rage today among IT operations, SRE and DevOps teams, but there
is also quite a bit of confusion in the minds of modern IT decision makers about how it fits into
their strategy. Is it just “Monitoring++”? Does it help with security? This research brief is
meant to address the basic questions modern enterprise decision makers have about Observability
and to lay out the nascent but evolving landscape.
Let us start with the Wikipedia definition of the term:
In control theory, observability is a measure of how well internal states of a system can
be inferred from knowledge of its external outputs. The observability and controllability
of a system are mathematical duals. The concept of observability was introduced by
Hungarian-American engineer Rudolf E. Kálmán for linear dynamic systems.
Even though it doesn’t directly translate to how the term is used in the context of modern
enterprise IT, it does offer a partial definition of Observability. Observability is about knowing
the internal states of the (distributed) system through knowledge of its external outputs.
Knowledge of the internal states of a highly distributed system is critical, and the only
way IT operations can infer it is through externally available signals such as monitoring
data, log data and tracing data.
Our take on what constitutes Observability aligns with Twitter’s original blog post on the topic:
● Monitoring
● Alerting/visualization
● Distributed systems tracing infrastructure
● Log aggregation/analytics
Clearly, it goes well beyond monitoring and transforms the traditional approach of IT Operations.
Observability: Transforming what to why
In the traditional world of IT operations, monitoring has centered on what is
happening in the system rather than on why it is happening. When you were dealing with
monolithic apps on servers hosted in your data center, this traditional approach to monitoring
solved most of the problems IT operations faced in their daily tasks. With the cloud native
approach, powered by containers at the infrastructure layer and microservices at the application
layer, modern IT faces increasingly distributed environments where traditional
ideas about reliability and monitoring break down. IT Operations must manage a more
distributed infrastructure underneath and more distributed applications on top. With the focus on
resiliency in modern distributed computing, it is critical to go beyond what is happening and
figure out why systems are behaving in a certain way.
Observability brings together monitoring data, log data and tracing data to add correlation and
context so that IT Operations, SRE and DevOps teams can better understand the system
dynamics and act proactively instead of taking the traditional reactive approach. In
other words, Observability helps people go beyond what is happening in their system to why
it is happening. This knowledge is key to managing cloud native infrastructure and
applications. Observability helps teams anticipate failures, including grey failures (subtle,
partial degradations), which are difficult to anticipate with traditional tools and operations.
The wealth of knowledge in Observability data helps SRE and DevOps teams manage both known
failures and grey failures more gracefully, without degrading the end-user experience.
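As a rough illustration of what adding correlation and context can mean in practice, the sketch below joins the three signal types on a shared identifier. It assumes every span, log line and metric sample carries a `trace_id`; the field names, the 500 ms cut-off and the sample data are all hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class RequestContext:
    """Everything known about one request, gathered from three signal types."""
    trace_id: str
    spans: list = field(default_factory=list)    # tracing data
    logs: list = field(default_factory=list)     # log data
    metrics: dict = field(default_factory=dict)  # monitoring data


def correlate(spans, logs, metrics):
    """Group spans, log lines and metric samples by their shared trace_id."""
    contexts = {}

    def ctx(trace_id):
        return contexts.setdefault(trace_id, RequestContext(trace_id))

    for span in spans:
        ctx(span["trace_id"]).spans.append(span)
    for entry in logs:
        ctx(entry["trace_id"]).logs.append(entry)
    for sample in metrics:
        ctx(sample["trace_id"]).metrics[sample["name"]] = sample["value"]
    return contexts


def explain_slow_requests(contexts, latency_ms_threshold=500):
    """For each slow request (the "what"), report the slowest span and any
    error logs as a candidate "why"."""
    findings = []
    for c in contexts.values():
        if c.metrics.get("latency_ms", 0) <= latency_ms_threshold:
            continue
        slowest = max(c.spans, key=lambda s: s["duration_ms"], default=None)
        errors = [e["message"] for e in c.logs if e["level"] == "ERROR"]
        findings.append({
            "trace_id": c.trace_id,
            "latency_ms": c.metrics["latency_ms"],
            "slowest_service": slowest["service"] if slowest else None,
            "errors": errors,
        })
    return findings


# Hypothetical sample data for two requests.
SPANS = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 40},
    {"trace_id": "t1", "service": "payments", "duration_ms": 900},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 35},
]
LOGS = [
    {"trace_id": "t1", "level": "ERROR", "message": "payments: upstream timeout"},
    {"trace_id": "t2", "level": "INFO", "message": "checkout: ok"},
]
METRICS = [
    {"trace_id": "t1", "name": "latency_ms", "value": 950},
    {"trace_id": "t2", "name": "latency_ms", "value": 40},
]

FINDINGS = explain_slow_requests(correlate(SPANS, LOGS, METRICS))
```

Here the metric alone says that request `t1` was slow; joined with its spans and logs, it also points at the `payments` service and an upstream timeout as the likely reason.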
The very shift from what to why requires a mindset change among the folks responsible for
modern IT operations. They shouldn’t consider Observability as:
● a new set of tools to use for cloud native environments
● a new way of doing the old things
● a more sophisticated monitoring tool that has better debuggability
● a magic pill that makes failures go away
Instead, they should treat Observability as a paradigm that helps them understand the variations
in the underlying dynamics of their systems, helping them “smell” potential failures well before
they happen and take remediation measures. Observability helps IT
Operations/SRE/DevOps teams do their jobs as the systems they manage become more
distributed and complex.
Observability: Going beyond rules-based approach
While the idea of Observability is gaining steam, the way the data is used relies mostly on a more
traditional approach of using existing knowledge of systems and failure domains to set up rules
that help Operations proactively tackle these failures before they happen. Such an approach is
effective, but it does not scale, especially as the underlying infrastructure becomes more
distributed and fluid through the use of easily portable containers, edge computing and
IoT devices. The complexity added by these increasingly distributed systems cannot be handled
with existing knowledge about failure domains alone. When humans decide which failures
matter, data that looks irrelevant to those failures gets thrown away. This severely limits
the operations team’s ability to find grey failures with potentially catastrophic impacts.
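To make the limitation concrete, here is a minimal sketch of a rules-based check. The two rules, their thresholds and the metric names are hypothetical. The rules catch a hard failure, but a slow degradation that stays under every threshold, and involves a signal nobody wrote a rule for, passes silently.

```python
# Hypothetical rules written from existing knowledge of failure modes.
RULES = [
    ("error_rate", lambda v: v > 0.05, "error rate above 5%"),
    ("latency_ms", lambda v: v > 1000, "latency above 1 s"),
]


def evaluate_rules(sample):
    """Return the description of every rule the sample violates."""
    return [desc for name, check, desc in RULES
            if name in sample and check(sample[name])]


# A hard failure: both rules fire.
hard_failure = {"error_rate": 0.20, "latency_ms": 1500}

# A grey failure: latency creeps from 200 ms to 740 ms and retries climb,
# but every value stays under its threshold and "retries" has no rule at all.
grey_failure = [
    {"error_rate": 0.01, "latency_ms": 200 + 60 * i, "retries": i}
    for i in range(10)
]

hard_alerts = evaluate_rules(hard_failure)
grey_alerts = [evaluate_rules(s) for s in grey_failure]
```

In this sketch `hard_alerts` lists both violations, while every entry of `grey_alerts` is empty even though latency has more than tripled.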
To avoid grey failures, it is important to collect data from many more sources and to correlate
these data to reveal patterns that lie beyond existing human knowledge. To
do that, one shouldn’t get rid of “unwanted data” but should collect more data from
more sources to add better context and provide better correlation. Humans cannot process
such large volumes of data efficiently. This is where machine learning and artificial
intelligence become important.
Observability: Machine Learning to the rescue
Machine learning (and eventually deep learning) can help organizations take advantage of vast
amounts of Observability data to identify grey failures that are otherwise invisible to human
analysis. With edge computing and IoT becoming the norm, modern IT’s scope expands
beyond the traditional systems in the data center and the perimeter becomes more fluid. In today’s
world, user experience is king, and it is critical for organizations to collect data across all the
devices that play a role in modern applications. Not only are the infrastructure and the
applications on top distributed, but users’ consumption devices are also distributed and more
global. In such an environment, relying on human-centric rules as the driving force for a seamless
user experience will be of limited help. Such distributed environments not only bring new types of
challenges but also magnify human blind spots in troubleshooting.
Machine learning, where computers dig through vast volumes of data to find patterns that
can be correlated to failure domains and grey failures, plays a significant role in Observability.
Without machine intelligence drawn from vast amounts of data, IT operations/SRE/DevOps teams
are looking at only a subset of problem domains and are not in a position to avert
catastrophic failures. One good, but very unfortunate, example is the recent engine failure on a
Southwest Airlines aircraft, where a human-centric approach to aircraft maintenance failed to
notice the grey failure developing from a subsurface flaw. We are not arguing that the use of
machine learning would have prevented this accident. We are just making the case that using a
large number of Observability data sources, coupled with machine learning or deep learning models,
has a higher chance of detecting such grey failures than traditional human-centric approaches with
a limited set of Observability data.
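The sketch below shows the general idea in its simplest form. It is a plain statistical baseline (per-metric mean and standard deviation with a z-score), used here as a stand-in for a real machine learning or deep learning model, and the metric names, sample data and cut-off of 3 standard deviations are assumptions for illustration. The point is that a baseline learned from data covers every collected signal, including ones no human wrote a rule for.

```python
import statistics


def fit_baseline(healthy_samples):
    """Learn a per-metric mean and standard deviation from healthy data."""
    baseline = {}
    for name in healthy_samples[0]:
        values = [s[name] for s in healthy_samples]
        baseline[name] = (statistics.mean(values), statistics.pstdev(values))
    return baseline


def anomaly_score(sample, baseline):
    """Largest absolute z-score across all metrics present in the sample."""
    scores = []
    for name, (mean, stdev) in baseline.items():
        if name in sample and stdev > 0:
            scores.append(abs(sample[name] - mean) / stdev)
    return max(scores, default=0.0)


# Hypothetical healthy history: latency near 200 ms, retries at 0 or 1.
HEALTHY = [
    {"latency_ms": 190 + (i % 5) * 5, "retries": i % 2}
    for i in range(50)
]
BASELINE = fit_baseline(HEALTHY)

normal = {"latency_ms": 205, "retries": 1}
# Well below a typical 1000 ms static threshold, yet far from learned behavior.
grey = {"latency_ms": 620, "retries": 6}

THRESHOLD = 3.0  # assumed cut-off, in standard deviations
normal_flagged = anomaly_score(normal, BASELINE) > THRESHOLD
grey_flagged = anomaly_score(grey, BASELINE) > THRESHOLD
```

With this data the normal sample is not flagged and the degraded one is, although neither would trip a static "latency above 1 s" rule.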
At Rishidot Research, we feel that the use of Machine Learning and Artificial Intelligence in
Observability is only at the beginning stages, with organizations using them to tackle
problems considered low-hanging fruit. We expect this trend to accelerate in
the next few years, with ML/AI engines becoming mainstream in a 2-3 year timeframe. The biggest
obstacle to the use of machine learning on Observability data is the lack of training data that
fits the needs of a given organization. We expect this to change in the future, with web scale
cloud providers either offering data from their operations as the seed for training models (think
of an open source approach to sharing data) or offering it as a service that different products
can use to train their models. The other option is to start with a human-centric approach using
data from the end user organization and slowly let the learning engine learn from production data.
Since it is early days for machine learning and artificial intelligence on Observability data,
there is no clear-cut prescription for this problem, but we expect this to change in the coming
months.
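One way to picture the second option is a detector that starts from a human-written threshold and hands over to a threshold learned from the organization's own data once enough of it has been seen. The sketch below is only an illustration under assumed values: the class name, the warm-up size of 30 samples and the mean plus 3 standard deviations formula are hypothetical choices, not a recommendation.

```python
import statistics


class BootstrappedDetector:
    """Starts with a human-written threshold, then switches to a threshold
    learned from the organization's own production data."""

    def __init__(self, rule_threshold, min_samples=30, k=3.0):
        self.rule_threshold = rule_threshold  # human-centric starting point
        self.min_samples = min_samples        # assumed warm-up size
        self.k = k                            # assumed width, in std deviations
        self.history = []

    def learn(self, value):
        """Record one production observation considered normal."""
        self.history.append(value)

    def threshold(self):
        """Use the rule during warm-up, the learned baseline afterwards."""
        if len(self.history) < self.min_samples:
            return self.rule_threshold
        mean = statistics.mean(self.history)
        stdev = statistics.pstdev(self.history)
        return mean + self.k * stdev

    def is_anomalous(self, value):
        return value > self.threshold()


detector = BootstrappedDetector(rule_threshold=1000)
early = detector.is_anomalous(600)   # judged against the 1000 ms rule
for i in range(40):
    detector.learn(200 + (i % 5))    # steady production latency, 200-204 ms
late = detector.is_anomalous(600)    # judged against the learned baseline
```

The same 600 ms reading passes while only the rule is available and is flagged once the detector has learned what normal looks like for this particular system.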
ML/AI in Observability: Some considerations
Whether you are assembling your Observability data platform yourself from open source
software or building it with packaged vendor tools, there are some considerations we
want to highlight that can help you maximize the benefits.
● Build the right mindset and culture. Get IT operations to start thinking about resiliency
over reliability and to take advantage of Observability data in rolling out resilient services.
This cultural change is critical not just in managing distributed systems but also in using
Observability efficiently.
● Stop discarding data and bring together various data sources by breaking down the silos.
Whether it is data from your DevOps pipeline, your production environment including edge
locations, or end user performance data, it is important to feed in all the sources before
applying machine learning or deep learning models.
● Focus on training data. It is difficult to train the models efficiently because the needs of
each organization are different. Using generic data may end up creating more problems.
Understand the training data to see if it can help your organization, or start with a
rules-based system and slowly use the organization’s data to train the learning models.
● Focus on instrumentation. Instrumentation in the context of Observability is about
increasing Observability. So, focus on building instrumentation into everything from
infrastructure to the code you run. Instrumentation cannot be an afterthought.
● Focus on automation and how it can be seamlessly hooked into Observability data to
ensure a more autonomous, self-healing and self-evolving system.
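The last two considerations can be sketched together. Below, a decorator builds instrumentation into a function so that every call emits a structured event, and a small hook triggers an automated action when the events show too many errors. The in-memory `EVENTS` list stands in for a real Observability pipeline, and the service name, error limit and restart action are hypothetical.

```python
import functools
import time

EVENTS = []  # stand-in for an Observability data pipeline (hypothetical)


def instrumented(service):
    """Decorator that records one structured event per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "error"
                raise
            finally:
                # Runs on both success and failure, so no call goes unrecorded.
                EVENTS.append({
                    "service": service,
                    "operation": func.__name__,
                    "outcome": outcome,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorator


def auto_remediate(events, service, max_errors, action):
    """Automation hook: run `action` when a service exceeds its error limit."""
    errors = sum(1 for e in events
                 if e["service"] == service and e["outcome"] == "error")
    if errors > max_errors:
        action(service)
        return True
    return False


@instrumented("payments")
def charge(amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    return f"charged {amount}"


RESTARTED = []  # records which services the hypothetical action restarted
charge(10)
for bad in (0, -1, -5):
    try:
        charge(bad)
    except ValueError:
        pass
triggered = auto_remediate(EVENTS, "payments", max_errors=2,
                           action=RESTARTED.append)
```

Because the instrumentation lives in the code path itself rather than being bolted on later, the automation hook has the data it needs the moment the service starts misbehaving.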
Conclusion
As you modernize your IT and start embracing cloud native architectures, Observability is key to
running resilient systems. Machine Learning and Artificial Intelligence have a critical role to play
in enhancing Observability, especially as the perimeter moves towards the edge and IoT devices.
The use of ML and AI in Observability is still in very early stages, but we expect it to become
mainstream in 2-3 years. As a modern enterprise stakeholder, it is important you understand the
role of Observability and how machine learning and AI can shape its future.
StackSense.io Sponsors
SWIM is real-time edge intelligence software. SWIM enables
intelligent data transformation at the edge, by reducing,
analyzing and learning from fast data locally on edge devices.
Learn more about SWIM at swim.ai
CoreStack uses a Cloud-as-Code™ approach to empower
Enterprises to accelerate innovation through Frictionless
consumption of cloud services and tools, delivering Multi-
Cloud Governance. Learn more at corestack.io
CloudFabrix simplifies and unifies IT operations and
governance across multi-cloud environments using AIOps.
Learn more at cloudfabrix.com

Más contenido relacionado

La actualidad más candente

Should we fear the cloud?
Should we fear the cloud?Should we fear the cloud?
Should we fear the cloud?Gabe Akisanmi
 
Pdf wp-emc-mozyenterprise-hybrid-cloud-backup
Pdf wp-emc-mozyenterprise-hybrid-cloud-backupPdf wp-emc-mozyenterprise-hybrid-cloud-backup
Pdf wp-emc-mozyenterprise-hybrid-cloud-backuplverb
 
Lessons Learned from ELN & LIMS Implementations
Lessons Learned from ELN & LIMS ImplementationsLessons Learned from ELN & LIMS Implementations
Lessons Learned from ELN & LIMS ImplementationsMark Fortner
 
ThinkDox implementation whitepaper for ECM
ThinkDox implementation whitepaper for ECMThinkDox implementation whitepaper for ECM
ThinkDox implementation whitepaper for ECMChristopher Wynder
 
Ecm implementation planning_workshop_hospital_sample
Ecm implementation planning_workshop_hospital_sampleEcm implementation planning_workshop_hospital_sample
Ecm implementation planning_workshop_hospital_sampleChristopher Wynder
 
Is your infrastructure holding you back?
Is your infrastructure holding you back?Is your infrastructure holding you back?
Is your infrastructure holding you back?Gabe Akisanmi
 
rpaper
rpaperrpaper
rpaperimu409
 
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCEANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCEIAEME Publication
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEijesajournal
 
Workflow enhances ECM adoption_LaserFicheEpower14
Workflow enhances ECM adoption_LaserFicheEpower14Workflow enhances ECM adoption_LaserFicheEpower14
Workflow enhances ECM adoption_LaserFicheEpower14Christopher Wynder
 
Compliance With Data Security Policies
Compliance With Data Security PoliciesCompliance With Data Security Policies
Compliance With Data Security PoliciesHongyang Wang
 
To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...
To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...
To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...Cognizant
 
Securing a Collaborative Environment
Securing a Collaborative EnvironmentSecuring a Collaborative Environment
Securing a Collaborative EnvironmentJoseph Pidala
 
Bring IT together_2015_ECOOandOASBO
Bring IT together_2015_ECOOandOASBOBring IT together_2015_ECOOandOASBO
Bring IT together_2015_ECOOandOASBOChristopher Wynder
 
Are Your PCs and Laptops Recovery and Discovery Ready?
Are Your PCs and Laptops Recovery and Discovery Ready?Are Your PCs and Laptops Recovery and Discovery Ready?
Are Your PCs and Laptops Recovery and Discovery Ready?Iron Mountain
 
AMCTO presentation on moving from records managment to information management
AMCTO presentation on moving from records managment to information managementAMCTO presentation on moving from records managment to information management
AMCTO presentation on moving from records managment to information managementChristopher Wynder
 
Neural networks in accounting and auditing slidecast
Neural networks in accounting and auditing slidecastNeural networks in accounting and auditing slidecast
Neural networks in accounting and auditing slidecastm13chan
 

La actualidad más candente (20)

Should we fear the cloud?
Should we fear the cloud?Should we fear the cloud?
Should we fear the cloud?
 
Pdf wp-emc-mozyenterprise-hybrid-cloud-backup
Pdf wp-emc-mozyenterprise-hybrid-cloud-backupPdf wp-emc-mozyenterprise-hybrid-cloud-backup
Pdf wp-emc-mozyenterprise-hybrid-cloud-backup
 
Lessons Learned from ELN & LIMS Implementations
Lessons Learned from ELN & LIMS ImplementationsLessons Learned from ELN & LIMS Implementations
Lessons Learned from ELN & LIMS Implementations
 
Evanta 2018 msp big 3 tech
Evanta 2018 msp big 3 techEvanta 2018 msp big 3 tech
Evanta 2018 msp big 3 tech
 
ThinkDox implementation whitepaper for ECM
ThinkDox implementation whitepaper for ECMThinkDox implementation whitepaper for ECM
ThinkDox implementation whitepaper for ECM
 
Ecm implementation planning_workshop_hospital_sample
Ecm implementation planning_workshop_hospital_sampleEcm implementation planning_workshop_hospital_sample
Ecm implementation planning_workshop_hospital_sample
 
Is your infrastructure holding you back?
Is your infrastructure holding you back?Is your infrastructure holding you back?
Is your infrastructure holding you back?
 
rpaper
rpaperrpaper
rpaper
 
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCEANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
 
Laserfiche empowercon302 2016
Laserfiche empowercon302 2016Laserfiche empowercon302 2016
Laserfiche empowercon302 2016
 
Workflow enhances ECM adoption_LaserFicheEpower14
Workflow enhances ECM adoption_LaserFicheEpower14Workflow enhances ECM adoption_LaserFicheEpower14
Workflow enhances ECM adoption_LaserFicheEpower14
 
Compliance With Data Security Policies
Compliance With Data Security PoliciesCompliance With Data Security Policies
Compliance With Data Security Policies
 
To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...
To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...
To Be Digital, Pharma Labs Must Bridge the Gap Between Legacy Systems & Conne...
 
Securing a Collaborative Environment
Securing a Collaborative EnvironmentSecuring a Collaborative Environment
Securing a Collaborative Environment
 
Bring IT together_2015_ECOOandOASBO
Bring IT together_2015_ECOOandOASBOBring IT together_2015_ECOOandOASBO
Bring IT together_2015_ECOOandOASBO
 
Safeguarding the Enterprise
Safeguarding the EnterpriseSafeguarding the Enterprise
Safeguarding the Enterprise
 
Are Your PCs and Laptops Recovery and Discovery Ready?
Are Your PCs and Laptops Recovery and Discovery Ready?Are Your PCs and Laptops Recovery and Discovery Ready?
Are Your PCs and Laptops Recovery and Discovery Ready?
 
AMCTO presentation on moving from records managment to information management
AMCTO presentation on moving from records managment to information managementAMCTO presentation on moving from records managment to information management
AMCTO presentation on moving from records managment to information management
 
Neural networks in accounting and auditing slidecast
Neural networks in accounting and auditing slidecastNeural networks in accounting and auditing slidecast
Neural networks in accounting and auditing slidecast
 

Similar a Research brief observability and modern enterprise

Global Data Management: Governance, Security and Usefulness in a Hybrid World
Global Data Management: Governance, Security and Usefulness in a Hybrid WorldGlobal Data Management: Governance, Security and Usefulness in a Hybrid World
Global Data Management: Governance, Security and Usefulness in a Hybrid WorldNeil Raden
 
Real callenges in big data security
Real callenges in big data securityReal callenges in big data security
Real callenges in big data securitybalasahebcomp
 
Notes on Current trends in IT (1) (1).pdf
Notes on Current trends in IT (1) (1).pdfNotes on Current trends in IT (1) (1).pdf
Notes on Current trends in IT (1) (1).pdfKarishma Chaudhary
 
IT Analytics delivers answers that current IT management tools can't provide
IT Analytics delivers answers that current IT management tools can't provideIT Analytics delivers answers that current IT management tools can't provide
IT Analytics delivers answers that current IT management tools can't provideEvolven Software
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEijesajournal
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEijesajournal
 
The it department pain
The it department painThe it department pain
The it department painjohn coaxum
 
The it department pain
The it department painThe it department pain
The it department painjohn coaxum
 
Observability A Critical Practice to Enable Digital Transformation
Observability A Critical Practice to Enable Digital TransformationObservability A Critical Practice to Enable Digital Transformation
Observability A Critical Practice to Enable Digital TransformationCloudZenix LLC
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Science Council of America
 
NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docxNFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docxcurwenmichaela
 
As Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation Tools
As Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation ToolsAs Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation Tools
As Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation ToolsDana Gardner
 
A Comprehensive Guide to AIOps Integration in Organizations
A Comprehensive Guide to AIOps Integration in OrganizationsA Comprehensive Guide to AIOps Integration in Organizations
A Comprehensive Guide to AIOps Integration in OrganizationsCloudZenix LLC
 
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?Data IQ Argentina
 
Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...
Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...
Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...Dana Gardner
 
15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf
15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf
15 DATA SCIENCE TRENDS TO RULE IN 2023.pdfUSDSI
 
Accenture Tech Vision2011 Report V6 1901
Accenture Tech Vision2011 Report V6 1901Accenture Tech Vision2011 Report V6 1901
Accenture Tech Vision2011 Report V6 1901Ann Honomichl
 
Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Angie Jorgensen
 

Similar a Research brief observability and modern enterprise (20)

Global Data Management: Governance, Security and Usefulness in a Hybrid World
Global Data Management: Governance, Security and Usefulness in a Hybrid WorldGlobal Data Management: Governance, Security and Usefulness in a Hybrid World
Global Data Management: Governance, Security and Usefulness in a Hybrid World
 
Real callenges in big data security
Real callenges in big data securityReal callenges in big data security
Real callenges in big data security
 
Notes on Current trends in IT (1) (1).pdf
Notes on Current trends in IT (1) (1).pdfNotes on Current trends in IT (1) (1).pdf
Notes on Current trends in IT (1) (1).pdf
 
IT Analytics delivers answers that current IT management tools can't provide
IT Analytics delivers answers that current IT management tools can't provideIT Analytics delivers answers that current IT management tools can't provide
IT Analytics delivers answers that current IT management tools can't provide
 
Data Analytics - The Insight
Data Analytics - The InsightData Analytics - The Insight
Data Analytics - The Insight
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
 
The it department pain
The it department painThe it department pain
The it department pain
 
The it department pain
The it department painThe it department pain
The it department pain
 
Observability A Critical Practice to Enable Digital Transformation
Observability A Critical Practice to Enable Digital TransformationObservability A Critical Practice to Enable Digital Transformation
Observability A Critical Practice to Enable Digital Transformation
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
 
NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docxNFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
 
As Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation Tools
As Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation ToolsAs Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation Tools
As Hybrid IT Complexity Ramps Up, Operators Look To Data-Driven Automation Tools
 
A Comprehensive Guide to AIOps Integration in Organizations
A Comprehensive Guide to AIOps Integration in OrganizationsA Comprehensive Guide to AIOps Integration in Organizations
A Comprehensive Guide to AIOps Integration in Organizations
 
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
 
Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...
Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...
Mission Critical Use Cases Show How Analytics Architectures Usher in an Artif...
 
15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf
15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf
15 DATA SCIENCE TRENDS TO RULE IN 2023.pdf
 
Accenture Tech Vision2011 Report V6 1901
Accenture Tech Vision2011 Report V6 1901Accenture Tech Vision2011 Report V6 1901
Accenture Tech Vision2011 Report V6 1901
 
unit-4-notes.pdf
unit-4-notes.pdfunit-4-notes.pdf
unit-4-notes.pdf
 
Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...
 

Más de Rishidot Research

Decision Makers Guide: Nomad vs Kubernetes
Decision Makers Guide: Nomad vs KubernetesDecision Makers Guide: Nomad vs Kubernetes
Decision Makers Guide: Nomad vs KubernetesRishidot Research
 
VMs containers and serverless
VMs containers and serverlessVMs containers and serverless
VMs containers and serverlessRishidot Research
 
Serverless Architecture - Beginning of a Trend?
Serverless Architecture - Beginning of a Trend?Serverless Architecture - Beginning of a Trend?
Serverless Architecture - Beginning of a Trend?Rishidot Research
 
Briefing notes: CloudVelocity
Briefing notes:   CloudVelocityBriefing notes:   CloudVelocity
Briefing notes: CloudVelocityRishidot Research
 
Dissecting The PaaS Landscape
Dissecting The PaaS LandscapeDissecting The PaaS Landscape
Dissecting The PaaS LandscapeRishidot Research
 
Rishidot Research Briefing Notes - Ravello Systems
Rishidot Research Briefing Notes - Ravello SystemsRishidot Research Briefing Notes - Ravello Systems
Rishidot Research Briefing Notes - Ravello SystemsRishidot Research
 
Rishidot research briefing notes Cloudscaling
Rishidot research briefing notes   CloudscalingRishidot research briefing notes   Cloudscaling
Rishidot research briefing notes CloudscalingRishidot Research
 
Open source and cloud computing
Open source and cloud computingOpen source and cloud computing
Open source and cloud computingRishidot Research
 
Intelligent Platforms: Iterating Beyond Today's PaaS
Intelligent Platforms: Iterating Beyond Today's PaaSIntelligent Platforms: Iterating Beyond Today's PaaS
Intelligent Platforms: Iterating Beyond Today's PaaSRishidot Research
 
Big data and intelligent platforms
Big data and intelligent platformsBig data and intelligent platforms
Big data and intelligent platformsRishidot Research
 
Startups And The Cloud Chain Reaction
Startups And The Cloud Chain ReactionStartups And The Cloud Chain Reaction
Startups And The Cloud Chain ReactionRishidot Research
 
Rishidot Research Briefing Note - AppZero
Rishidot Research Briefing Note - AppZeroRishidot Research Briefing Note - AppZero
Rishidot Research Briefing Note - AppZeroRishidot Research
 
CloudOpen 2012 Slides: Open Source and Federation
CloudOpen 2012 Slides: Open Source and FederationCloudOpen 2012 Slides: Open Source and Federation
CloudOpen 2012 Slides: Open Source and FederationRishidot Research
 



Research brief observability and modern enterprise

  • 1. Krishnan Subramanian, Chief Research Analyst, Rishidot Research
StackSense Research Brief
Observability and Modern Enterprise
The role of machine learning and artificial intelligence in Observability
  • 2. Research Brief: Observability and Modern Enterprise
StackSense.io © Rishidot Research

Summary

The term Observability is fast moving towards the peak of the hype cycle, but it is critical to managing cloud native architectures in any modern enterprise. In this research brief, we stake out our position on the evolution of Observability in IT operations and highlight the potential of using machine learning and artificial intelligence to make Observability more useful in distributed environments. We also highlight some considerations for organizations exploring the use of Machine Learning (ML) or Artificial Intelligence (AI) to manage Observability data.

Introduction

The term Observability is all the rage today among IT operations, SRE and DevOps teams, but there is also quite a bit of confusion in the minds of modern IT decision makers about how it fits into their strategy. They ask questions such as: is it “Monitoring++”? Does it help with security? This research brief is meant to address the basic questions modern enterprise decision makers have about Observability and to lay out the nascent but evolving landscape.

Let us start with the Wikipedia definition of the term:

“In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals. The concept of observability was introduced by Hungarian-American engineer Rudolf E. Kálmán for linear dynamic systems.”

Even though it doesn’t directly translate to how the term is used in the context of modern enterprise IT, it does offer a partial definition of Observability. Observability is about knowing the internal states of the (distributed) system through the knowledge of the external outputs.
Knowledge of the internal states of a highly distributed system is critical, and the only way IT operations can infer it is through externally available knowledge, including monitoring data, log data and tracing data. Our take on what constitutes Observability aligns with Twitter’s original blog post on the topic:
● Monitoring
● Alerting/visualization
● Distributed systems tracing infrastructure
● Log aggregation/analytics
Clearly, it goes well beyond monitoring and transforms the traditional approach of IT Operations.
  • 3. Observability: Transforming what to why

In the traditional world of IT operations, monitoring has focused on what is happening in the system rather than on why it is happening. When you were dealing with monolithic apps on servers hosted in your own data center, this traditional approach to monitoring helped solve most of the problems IT operations faced in their daily tasks. With the cloud native approach, powered by containers at the infrastructure layer and microservices at the application layer, modern IT is faced with increasingly distributed environments where traditional ideas about reliability and monitoring break down. IT Operations teams face a more distributed infrastructure underneath and more distributed applications on top. With the focus on resiliency in modern distributed computing, it is critical to go beyond what is happening and figure out why systems are behaving in a certain way.

Observability brings together monitoring data, log data and tracing data to add correlation and context, so that IT Operations, SRE and DevOps teams can better understand the system dynamics and act proactively rather than reactively. In other words, Observability helps people go beyond what is happening in their system to why it is happening. This knowledge is key to managing cloud native infrastructure and applications. Observability helps teams anticipate failures, including grey failures, which are difficult to anticipate with traditional tools and operations. The wealth of knowledge in Observability data helps SRE and DevOps teams manage both known failures and grey failures more gracefully, without impacting the experience of end users.

The shift from what to why requires a mindset change among the people responsible for modern IT operations.
They shouldn’t consider Observability as
● a new set of tools to use for cloud native environments
● a new way of doing the old things
● a more sophisticated monitoring tool that has better debuggability
● a magic pill that makes failures go away
Instead, they should treat Observability as a paradigm that helps them understand the variations in the underlying dynamics of their systems, helping them “smell” potential failures well before they happen and take remediation measures. Observability helps IT Operations/SRE/DevOps teams do their jobs as the systems they manage become more distributed and complex.

Observability: Going beyond rules-based approach

While the idea of Observability is gaining steam, the way the data is used relies mostly on a more traditional approach of using existing knowledge of systems and failure domains to set up rules
  • 4. that help Operations proactively tackle these failures before they happen. Such an approach is effective, but it is not scalable, especially as the underlying infrastructure becomes more distributed and fluid due to easily portable containers, edge computing and IoT devices. The complexity added by these increasingly distributed systems cannot be handled with just the existing knowledge about failure domains. Relying on humans to decide which failures matter means throwing away data that appears useless from the perspective of those failures. This severely limits the operations team’s ability to find grey failures with potentially catastrophic impacts.

In order to avoid grey failures, it is important to collect data from many more sources and to correlate these data to see patterns that lie beyond existing human knowledge. To do that, one shouldn’t get rid of “unwanted data” but should collect more data from more sources to add better context and provide better correlation. Humans cannot process such large volumes of data efficiently. This is where machine learning and artificial intelligence become important.

Observability: Machine Learning to the rescue

Machine learning (and eventually deep learning) can help organizations take advantage of vast amounts of Observability data to identify grey failures that are otherwise not visible to human processing. With edge computing and IoT becoming the norm, modern IT’s scope expands beyond the traditional systems in the data center and the perimeter becomes more fluid. In today’s world, user experience is king, and it is critical for organizations to collect data across all the devices that play a role in modern applications.
Not only are the infrastructure and the applications on top distributed, but the consumption devices of users are also distributed and more global. In such an environment, using human-centric rules as the driving force for a seamless user experience will be of limited help. Such distributed environments not only bring in new types of challenges but also amplify the human blind spots in troubleshooting. Machine learning, where computers dig through vast volumes of data to find patterns that can be correlated with failure domains and grey failures, plays a significant role in Observability. Without machine intelligence drawn from vast amounts of data, IT operations/SRE/DevOps teams are looking at only a subset of problem domains and are not in a position to avert catastrophic failures.

One good, but very unfortunate, example is the recent engine failure at Southwest Airlines, where a human-centric approach to aircraft maintenance failed to notice a grey failure caused by a subsurface flaw. We are not arguing that the use of machine learning would have prevented this accident. We are only making the case that a large number of Observability data sources, coupled with machine learning or deep learning models, has a higher chance of detecting such grey failures than traditional human-centric approaches with a limited set of Observability data.
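The contrast between a human-written rule and a data-driven baseline can be sketched in a few lines. This is only an illustration under stated assumptions: the latency series, the 500 ms threshold, the window size and the z-score limit are all hypothetical, and a rolling z-score stands in here for a learned model (it is a simple statistical baseline, not machine learning proper).

```python
from statistics import mean, stdev

def static_rule(latencies_ms, threshold_ms=500.0):
    """Human-written rule: flag any sample above a fixed threshold."""
    return [i for i, v in enumerate(latencies_ms) if v > threshold_ms]

def rolling_zscore(latencies_ms, window=10, z_limit=3.0):
    """Flag samples that deviate strongly from the recent baseline."""
    flagged = []
    for i in range(window, len(latencies_ms)):
        baseline = latencies_ms[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines to avoid dividing by zero.
        if sigma > 0 and abs(latencies_ms[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged

# Hypothetical series: steady near 100 ms, then one jump to 180 ms that
# stays far below the fixed 500 ms rule.
series = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 180, 100]

rule_hits = static_rule(series)         # the fixed rule sees nothing
baseline_hits = rolling_zscore(series)  # the baseline flags the jump
```

The fixed rule encodes only what its author already expected to go wrong, while the baseline is derived from the data itself, which is the property the brief argues for.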
  • 5. At Rishidot Research, we feel that the use of Machine Learning and Artificial Intelligence in Observability is only at the beginning stages, with organizations using them to tackle problems that are considered low-hanging fruit. We expect this trend to accelerate in the next few years, with ML/AI engines becoming mainstream within 2-3 years. The biggest obstacle to the use of machine learning on Observability data is the lack of training data that fits the needs of any given organization. We expect this to change in the future, with web-scale cloud providers either offering data from their operations as the seed for training the models (think of an open source approach to sharing data) or offering it as a service that different products can use to train their models. The other option is to start with a human-centric approach using data from the end user organization and slowly let the learning engine learn from the production data. Since it is early days in the use of machine learning and artificial intelligence on Observability data, there is no clear-cut prescription for this problem, but we expect this to change in the coming months.

ML/AI in Observability: Some considerations

Whether you are instrumenting your Observability data platform using a DIY approach and open source software or building it using packaged vendor tools, there are some considerations we want to highlight that can help you maximize the benefits.
● Build the right mindset and culture. Get IT operations to start thinking about resiliency over reliability and to take advantage of Observability data in rolling out resilient services. This cultural change is critical not just in managing distributed systems but also in using Observability efficiently
● Stop discarding data and bring together various data sources by breaking down the silos.
Whether it is data from your DevOps pipeline, your production environment (including edge locations) or end user performance data, it is important to feed in data from all the sources so that machine learning or deep learning models can be applied effectively
● Focus on training data. It is difficult to train the models efficiently because the needs of each organization are different. Using generic data may end up creating more problems. Understand the training data to see if it can help your organization, or start with a rules-based system and slowly use the organization’s data to train the learning models
● Focus on instrumentation. Instrumentation in the context of Observability is about increasing Observability. So, focus on building instrumentation into everything from infrastructure to the code you run. Instrumentation cannot be an afterthought
● Focus on automation and how it can be seamlessly hooked into Observability data to ensure a more autonomous, self-healing and self-evolving system
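As a rough illustration of bringing data sources together, the sketch below groups records from three silos by a shared identifier so they can be examined, or fed to a model, as one context. It assumes every record carries a common `trace_id` field, which real metric pipelines often do not; the field names and sample records are hypothetical.

```python
from collections import defaultdict

def correlate(metrics, logs, spans):
    """Group records from three sources by their shared trace_id."""
    by_trace = defaultdict(lambda: {"metrics": [], "logs": [], "spans": []})
    sources = (("metrics", metrics), ("logs", logs), ("spans", spans))
    for source, records in sources:
        for record in records:
            by_trace[record["trace_id"]][source].append(record)
    return dict(by_trace)

# Hypothetical records; real pipelines would read these from separate stores.
metrics = [
    {"trace_id": "t1", "latency_ms": 840},
    {"trace_id": "t2", "latency_ms": 95},
]
logs = [
    {"trace_id": "t1", "level": "WARN", "msg": "retrying upstream call"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 790},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 60},
]

joined = correlate(metrics, logs, spans)
# joined["t1"] now holds the slow request's metric, log line and span together.
```

Keeping every record rather than filtering up front is the point: the warning log only becomes meaningful once it sits next to the slow span and the latency sample from the same request.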
  • 6. Conclusion

As you modernize your IT and start embracing cloud native architectures, Observability is key to running resilient systems. Machine Learning and Artificial Intelligence have a critical role to play in enhancing Observability, especially as the perimeter moves towards the edge and IoT devices. The use of ML and AI in Observability is still in very early stages, but we expect it to become mainstream in 2-3 years. As a modern enterprise stakeholder, it is important that you understand the role of Observability and how machine learning and AI can shape its future.

StackSense.io Sponsors

SWIM is real-time edge intelligence software. SWIM enables intelligent data transformation at the edge by reducing, analyzing and learning from fast data locally on edge devices. Learn more about SWIM at swim.ai

CoreStack uses a Cloud-as-Code™ approach to empower enterprises to accelerate innovation through frictionless consumption of cloud services and tools, delivering Multi-Cloud Governance. Learn more at corestack.io

CloudFabrix simplifies and unifies IT operations and governance across multi-cloud environments using AIOps. Learn more at cloudfabrix.com