Using Machine Learning to Optimize
DevOps Practices
Building Learning into Monitoring and Feedback
Peter Varhol
About me
• International speaker and writer
• Degrees in Math, CS, Psychology
• Technology communicator
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
Agenda
• What is machine learning?
• How is machine learning applied to DevOps?
• Challenges in training these systems
• What constitutes an issue?
• Summary and conclusions
What is Machine Learning?
• Layered algorithms that change parameters based on feedback
from known data
• Can be linear or nonlinear
• Algorithms can be fixed in production or adaptive
• Fixed – algorithms do not adjust once deployed
• Adaptive – algorithms continually adjust to new data
• Usually part of a larger system
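The fixed/adaptive distinction can be sketched with a deliberately tiny "model" that learns only a typical response time. This is an illustration under stated assumptions, not something from the talk: the class name `BaselineModel`, the `adaptive` flag, and the smoothing factor `alpha` are all hypothetical.

```python
class BaselineModel:
    """Toy model that learns a typical value from training data.

    Hypothetical illustration: the names and the smoothing scheme
    are assumptions, not part of the original slides.
    """

    def __init__(self, training_data, adaptive=False, alpha=0.5):
        # "Training": the learned parameter is the mean of the known data.
        self.baseline = sum(training_data) / len(training_data)
        self.adaptive = adaptive
        self.alpha = alpha  # weight given to each new observation

    def observe(self, value):
        """Feed one production observation to the model."""
        if self.adaptive:
            # Adaptive: keep adjusting the parameter to new data.
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * value
        # Fixed: the parameter stays as it was at deployment.
        return self.baseline


fixed = BaselineModel([100, 100, 100], adaptive=False)
adaptive = BaselineModel([100, 100, 100], adaptive=True)

for response_ms in [200, 200]:
    fixed.observe(response_ms)
    adaptive.observe(response_ms)

print(fixed.baseline)     # 100.0
print(adaptive.baseline)  # 175.0
```

After two slower observations the fixed model still reports its trained baseline, while the adaptive one has drifted toward the new data.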
Adaptive Systems
• Airline pricing
• Ticket prices change three times a day based on demand
• It can cost less to go farther
• It can cost less later
• Ecommerce systems
• Recommendations try to discern what else you might want
• Can I incentivize you to fill up the plane?
Why Use Adaptive?
• The “right” result will vary over time
• Trying to optimize a particular result
• Revenue
• The problem domain is not static
Confidential, Dynatrace LLC
How Are Fixed Systems Used?
• Transportation
• Self-driving cars
• Aircraft/Drones
• Ecommerce
• Recommendation engines
• Medical
• Diagnosis systems
Why Use Fixed Machine Learning Systems?
• The problem domain is static
• The expectations remain constant
• The right answer is known under most conditions
• The original algorithms remain valid over a long period of time
DevOps Practices Generate Data
• During development
• Agile metrics, JIRA issues, test case metrics
• During continuous integration
• System test metrics
• During continuous deployment
• Quality metrics for deployments
• After deployment and into production
• Application availability and performance
• Usage log files
Focus on Monitoring
• Ongoing data on availability and performance
• RUM
• Synthetic tests
• Application monitoring
• Monitoring tackles the back end of DevOps
• Identifying unhealthy trends
• Diagnosing failures and poor performance
• Recommending action
• Fixed or adaptive depends on your goals
Where Do Predictive Analytics Come In?
• Big data makes it possible to predict future events
• Are we going to fail?
• How will we perform with traffic surges?
• As well as past events
• What went wrong and how do we fix it?
• We can rely on past data
• Adaptive systems may not perform as well
• Clear goals needed
What Technologies Are Involved?
• Neural networks
• Genetic algorithms
• Rules engines
Neural Networks
• Set of layered algorithms whose variables can be
adjusted via a learning process
• The learning process involves training with
known inputs and outputs
• The algorithms adjust coefficients to converge on
the correct answer (or not)
• You freeze the algorithms and coefficients, and
deploy
• Or you optimize on a particular set of characteristics
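The train-then-freeze cycle described above can be sketched with a single sigmoid neuron, the simplest building block of a network rather than a full layered one. This is a minimal sketch under assumptions: the OR training set, the function names, and the learning rate are all chosen for illustration and do not come from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=5000, learning_rate=0.5):
    """Train one sigmoid neuron by gradient descent on known inputs/outputs.

    Returns the frozen coefficients as an immutable tuple (w1, w2, bias).
    """
    w1 = w2 = bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), target in samples:
            error = sigmoid(w1 * x1 + w2 * x2 + bias) - target
            g1 += error * x1
            g2 += error * x2
            gb += error
        # Adjust coefficients to move the outputs toward the known answers.
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        bias -= learning_rate * gb / n
    return (w1, w2, bias)


def predict(frozen, x1, x2):
    """Use frozen coefficients; nothing is adjusted at prediction time."""
    w1, w2, bias = frozen
    return 1 if sigmoid(w1 * x1 + w2 * x2 + bias) >= 0.5 else 0


# Known inputs and outputs: a logical OR, chosen because it is easy to verify.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

frozen = train_neuron(OR_DATA)
predictions = [predict(frozen, x1, x2) for (x1, x2), _ in OR_DATA]
print(predictions)
```

Returning a tuple makes the "freeze" explicit: the deployed `predict` function can read the coefficients but has no way to change them.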
A Sample Neural Network
Genetic Algorithms
• Use the principle of natural selection
• Create a range of possible solutions
• Try out each of them
• Choose and combine two of the better
alternatives
• Rinse and repeat as necessary
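The steps above can be sketched in a few lines. This is a toy under stated assumptions: the objective (count the 1 bits in a list), the population size, and the occasional mutation step are illustrative choices. Mutation is a common addition to genetic algorithms but is not mentioned on the slide.

```python
import random


def fitness(candidate):
    # Toy objective (an assumption for illustration): count of 1 bits.
    return sum(candidate)


def evolve(length=12, population_size=8, generations=30, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    # Create a range of possible solutions.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Try out each of them and rank by fitness.
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        # Choose two of the better alternatives...
        parent_a, parent_b = population[0], population[1]
        next_population = [parent_a, parent_b]  # parents survive unchanged
        while len(next_population) < population_size:
            # ...and combine them with single-point crossover.
            cut = rng.randint(1, length - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Occasionally flip one bit (mutation) to keep some variety.
            if rng.random() < 0.3:
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            next_population.append(child)
        # Rinse and repeat.
        population = next_population
    population.sort(key=fitness, reverse=True)
    best_per_generation.append(fitness(population[0]))
    return population[0], best_per_generation


best, history = evolve()
print(best, history[0], history[-1])
```

Because the two parents are carried into the next generation unchanged, the best score recorded in `history` can never go down from one generation to the next.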
Bringing in DevOps
• DevOps has data that can be used to train neural networks
• Health of the application
• Trends in application traffic and responsiveness
• Application failure
Machine Learning Helps DevOps
• Decisions are complex
• Why is the CPU maxed?
• What is causing disk thrashing?
• Why did the network slow?
• Why did the application fail?
• Data is massive
• Potentially thousands of data points a day
How Good Are Decisions?
• Expert versus machine
• Given the same data
• In many domains they tie
• With additional data, the human can be better
• But machine learning will get better
• But only as good as the data
We Want to Do Two Things
• Identify trends that may indicate future problems
• Increasing response times
• More page errors
• Diagnose faults once they have happened
• Why did the application fail?
• How can we fix it as quickly as possible?
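As one way to picture the first goal, a rising trend in response times can be estimated with an ordinary least-squares slope. This is a hedged sketch: the sample values, the function names, and the 5 ms-per-sample threshold are made up for illustration, and a real monitoring tool would likely use something more robust.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance_x = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance_x


def is_degrading(values, threshold_ms_per_sample=5.0):
    # Hypothetical rule: flag when response time climbs faster than threshold.
    return trend_slope(values) > threshold_ms_per_sample


response_times_ms = [100, 110, 120, 130]  # made-up samples
print(trend_slope(response_times_ms))     # 10.0
print(is_degrading(response_times_ms))    # True
```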
Fixed Algorithms Work for Some Problems
• Immediate performance and failure identification
• Diagnosis of failures and performance issues
• These are readily identifiable from known data
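A fixed detector of this kind can be sketched as a threshold learned once from known healthy data and then left alone. The mean-plus-three-standard-deviations rule, the function names, and the sample values are assumptions for illustration only.

```python
import statistics


def fit_threshold(known_good_samples, k=3.0):
    """Learn a fixed alert threshold: mean plus k standard deviations."""
    mean = statistics.mean(known_good_samples)
    spread = statistics.stdev(known_good_samples)
    return mean + k * spread


def check(sample, threshold):
    return "ALERT" if sample > threshold else "ok"


baseline_ms = [98, 100, 102, 100, 100]  # made-up healthy response times
threshold = fit_threshold(baseline_ms)   # computed once, then left unchanged

print(check(101, threshold))  # ok
print(check(150, threshold))  # ALERT
```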
Adaptive Systems Supplement These Tools
• Predictions of future events
• Performance
• Availability
• The target is moving
• So we need current data to adjust the algorithms
The Machine Helps the DevOps Expert
• The machine learning app provides:
• Early warning on possible performance issues and failures
• Immediate notification of failure or impending failure
• Trend analysis of data to predict unhealthy outcomes
• The machine learning app is an assistant
• It can’t fix anything
• It can’t necessarily identify the root cause
What is the Goal?
• We have many ways of monitoring
• Many of them are represented at this conference
• Each measures something a little different
• Latency, response time, availability, network, DNS . . .
• Too much data can be no better than no data at all
• Machine learning can correlate across
measurements
• Focus to eliminate false positives
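Correlating across measurements can be pictured with a plain Pearson coefficient: two series that rise together probably point to one underlying issue rather than two separate alerts. This is a sketch under assumptions; the metric names and values are invented, and a real system would correlate far more signals over longer windows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


# Made-up samples taken at the same five points in time.
latency_ms = [100, 120, 140, 160, 180]
dns_ms = [20, 24, 28, 32, 36]        # rises with latency
cpu_percent = [50, 50, 50, 50, 50]   # flat

print(round(pearson(latency_ms, dns_ms), 3))       # 1.0
print(round(pearson(latency_ms, cpu_percent), 3))  # 0.0
```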
Intelligent Systems Are Sometimes Wrong
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” is good
• We don’t know quite why the software
responds as it does
• We can’t easily trace code paths
Testing Machine Learning Systems
• Have objective acceptance criteria
• Test with new data
• Don’t count on all results being accurate
• Understand the architecture of the network as a part of
the testing process
• Communicate the level of confidence you have in the
results to management and users
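The testing advice above can be sketched as an acceptance check on held-out data that also reports a confidence range. The 85% acceptance bar, the function name, and the sample results are assumptions for illustration. The interval uses the simple normal approximation, which is only a rough guide for small samples.

```python
import math


def evaluate(predictions, actuals, required_accuracy, z=1.96):
    """Score a frozen model on data it has not seen during training."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty result lists")
    n = len(actuals)
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    accuracy = correct / n
    # Normal-approximation interval; z=1.96 corresponds to roughly 95%.
    margin = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return {
        "accuracy": accuracy,
        "low": max(0.0, accuracy - margin),
        "high": min(1.0, accuracy + margin),
        "accepted": accuracy >= required_accuracy,
    }


# Made-up held-out results: 90 of 100 predictions match the known outcome.
actuals = [1] * 100
predictions = [1] * 90 + [0] * 10

report = evaluate(predictions, actuals, required_accuracy=0.85)
print(report["accuracy"], report["accepted"])
```

Reporting `low` and `high` alongside the accuracy gives management and users a statistical statement of confidence instead of a single number.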
A Cautionary Tale
• Not all events are created equal
• AI systems treat events equally
• A failure of a system during busy season is the same as any other
• DevOps pros know otherwise
• And can exert additional effort in response
• And actually fix the problem
• We can’t automate what we don’t understand
• You need the human in the loop
Conclusions
• DevOps is a natural environment for machine learning
systems
• Any activity that generates data and requires a decision is fair game
• Monitoring is low-hanging fruit
• Fixed systems for failure and diagnosis, adaptive for trend
analysis
References
• https://qz.com/989137/when-a-robot-ai-doctor-misdiagnoses-you-whos-to-blame/
• https://pvarhol.wordpress.com/2017/07/22/what-brought-about-our-ai-revolution/
• https://pvarhol.wordpress.com/2017/06/21/analytics-dont-apply-in-the-clutch/
Thank You
Peter Varhol
peter@petervarhol.com

Editor's Notes

  1. These types of software are becoming increasingly common, in areas such as ecommerce, public transportation, automotive, finance, and computer networks. They have the potential to make decisions given sufficiently well-defined inputs and goals. In some instances, they are characterized as artificial intelligence, in that they seemingly make decisions that were once the purview of a human user or operator.
  2. Many machine learning systems are based on neural networks. A neural network is a set of layered algorithms whose variables can be adjusted via a learning process. The learning process involves using known data inputs to create outputs that are then compared with known results. When the algorithms reflect the known results with the desired degree of accuracy, the algebraic coefficients are frozen and production code is generated. Today, this comprises much of what we understand as artificial intelligence.
  3. But there is a type of software where having a defined output is no longer the case. Actually, two types. One is machine learning systems. The second is predictive analytics, or adaptive systems.
  4. Have objective acceptance criteria. Know the amount of error you and your users are willing to accept. Test with new data. Once you’ve trained the network and frozen the architecture and coefficients, use fresh inputs and outputs to verify its accuracy. Don’t count on all results being accurate. That’s just the nature of the beast. And you may have to recommend throwing out the entire network architecture and starting over. Understand the architecture of the network as a part of the testing process. Few if any will be able to actually follow a set of inputs through the network of algorithms, but understanding how the network is constructed will help testers determine if another architecture might produce better results. Communicate the level of confidence you have in the results to management and users. Machine learning systems offer you the unique opportunity to describe confidence in statistical terms, so use them. One important thing to note is that the training data itself could well contain inaccuracies. In this case, because of measurement error, the recorded wind speed and direction could be off or ambiguous. In other cases, the cooling of the filament likely has some error in its measurement.