Treating operational aspects of software as 'non-functional requirements' and 'an Ops problem' rather than a core part of the software product leads to poor live service and unexplained errors in Production.
Traceability, deployability, recoverability, diagnosability, monitorability, and high-quality logging are key features of a software system, just as much as user-visible features surfaced via the UI or capabilities exposed by an API endpoint.
However, many Product Owners understandably feel uneasy about taking on the (necessary) responsibility for prioritising operational features alongside user-visible and API features.
This session brings Scrum Masters and Product Owners up to speed on operational features and covers proven practices for improving operability in an Agile context, empowering Product Owners to make effective prioritisation choices about all kinds of product features, whether user-visible or operational.
How to address operational aspects effectively with Agile practices - Matthew Skelton - Agile In The City 2015
1. How to address operational aspects effectively with Agile practices
Agile in the City – 20th November 2015
#agileinthecity
Matthew Skelton
Skelton Thatcher Consulting
@matthewpskelton
132. Run Book / Ops Manual
• 1 Table of Contents
• 2 System Overview
• 2.1 Service Overview
• 2.2 Contributing Applications, Daemons, and Windows Services
• 2.3 Hours of Operation
• 2.4 Execution Design
• 2.5 Infrastructure and Network Design
• 2.6 Resilience, Fault Tolerance and High-Availability
• 2.7 Throttling and Partial Shutdown
• 2.8 Required Resources
• 2.9 Expected Traffic and Load
• 2.9.1 Hot or Peak Periods
• 2.9.2 Warm Periods
• 2.9.3 Cool or Quiet Periods
• 2.10 Environmental Differences
• 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
• 4.1 Configuration Management
• 5 System Backup and Restore
• 5.1 Backup Requirements
• 5.1.1 Special Files
• 5.2 Backup Procedures
• 5.3 Restore Procedures
• 6 Monitoring and Alerting
• 6.1 Error Messages
• 6.2 Events
• 6.3 Health Checks
• 6.4 Other Messages
• 7 Operational Tasks
• 7.1 Deployment
• 7.2 Batch Processing
• 7.3 Power Procedures
• 7.4 Routine Checks
• 7.4.1 System Rebuilds
• 7.5 Troubleshooting
• 8 Maintenance Tasks
• 8.1 Maintenance Procedures
• 8.1.1 Patching
• 8.1.1.1 Normal Cycle
• 8.1.1.2 Zero-Day Vulnerabilities
• 8.1.2 GMT/BST time changes
• 8.1.3 Cleardown Activities
• 8.1.3.1 Log Rotation
• 8.2 Testing
• 8.2.1 Technical Testing
• 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
• 9.1 Failover
• 9.2 Recovery
• 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details
133. Run Book / Ops Manual
2.1 Service Overview
2.2 Contributing Applications, Daemons, and Windows Services
2.3 Hours of Operation
2.4 Execution Design
2.5 Infrastructure and Network Design
2.6 Resilience, Fault Tolerance and High-Availability
2.7 Throttling and Partial Shutdown
2.8 Required Resources
2.9 Expected Traffic and Load
134. Run Book collaboration
Dev team is responsible for the first draft
“But I know nothing about Production!”
Encourages collaboration with Ops team
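A first draft is easier to hand to the Ops team if it is clear which template sections are still empty. The following is a minimal sketch, not part of the talk: it assumes the draft is plain text with one heading per line, and it takes the top-level headings from the Run Book template above. The names `REQUIRED_SECTIONS` and `missing_sections` are hypothetical.

```python
import re

# Top-level headings copied from the Run Book / Ops Manual template.
REQUIRED_SECTIONS = [
    "System Overview",
    "Security and Access Control",
    "System Configuration",
    "System Backup and Restore",
    "Monitoring and Alerting",
    "Operational Tasks",
    "Maintenance Tasks",
    "Failure and Recovery Procedures",
    "Contact Details",
]


def missing_sections(draft_text):
    """Return the required headings that do not appear as a line in the draft."""
    found = set()
    for line in draft_text.splitlines():
        # Drop bullets and leading section numbers such as "2 " or "2.1 ".
        heading = re.sub(r"^[\s•\-#]*[\d.]*\s*", "", line).strip().lower()
        found.add(heading)
    return [s for s in REQUIRED_SECTIONS if s.lower() not in found]
```

Run against a draft in a build step or before a review meeting, the returned list gives Dev and Ops a concrete starting point for the conversation.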
146. Small set of rapid ‘weathervane’ tests for early warning
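One possible shape for such a test set, sketched under assumptions: each check is a plain Python callable that raises on failure, and the whole run has a fixed time budget so that it stays rapid. The function name `run_weathervane` and the budget parameter are illustrative, not from the talk.

```python
import time


def run_weathervane(checks, budget_seconds=5.0):
    """Run named checks in order, skipping the rest once the budget is spent.

    `checks` is a list of (name, callable) pairs. A callable passes by
    returning normally and fails by raising any exception.
    Returns a dict mapping each name to "pass", "fail", or "skipped".
    """
    results = {}
    started = time.monotonic()
    for name, check in checks:
        if time.monotonic() - started > budget_seconds:
            # Out of time: report the check as skipped rather than run long.
            results[name] = "skipped"
            continue
        try:
            check()
            results[name] = "pass"
        except Exception:
            results[name] = "fail"
    return results
```

Keeping the checks few and cheap is the point: a result of "fail" or "skipped" is an early signal to look closer, not a full diagnosis.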
147. Network testing
iTrinegy network emulators
• Scripted setup and automated test runs
• http://www.itrinegy.com/
Saboteur:
• Network fault injection tool
• https://github.com/tomakehurst/saboteur
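The kind of behaviour these tools help to test can be illustrated without either of them. The sketch below does not use Saboteur or iTrinegy. It is a stand-in built only on the Python standard library, with hypothetical names: a local "black hole" server accepts connections and never replies, and the client is expected to give up after a short timeout rather than hang.

```python
import socket
import threading


def start_black_hole_server():
    """Listen on a free local port, accept connections, and never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(5)
    held = []  # keep accepted connections open so the client sees silence

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # server socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, server.getsockname()[1]


def fetch_with_timeout(host, port, timeout_seconds):
    """Return "ok" on a reply, "closed" on EOF, "timeout" if nothing arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
            conn.sendall(b"PING\n")
            data = conn.recv(1024)
            return "ok" if data else "closed"
    except socket.timeout:
        return "timeout"
```

A test would start the server, call `fetch_with_timeout("127.0.0.1", port, 0.2)`, and assert that the result is `"timeout"` within a second or two. A dedicated fault injection tool covers far more cases (packet loss, latency, firewall rules) than this single scenario.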
148. Security testing
Gauntlt: http://gauntlt.org/
SSL certs
HTTP
SQL injection
…
# nmap-simple.attack
Feature: simple nmap attack to check for open ports

  Background:
    Given "nmap" is installed
    And the following profile:
      | name     | value       |
      | hostname | example.com |

  Scenario: Check standard web ports
    When I launch an "nmap" attack with:
      """
      nmap -F <hostname>
      """
    Then the output should match /80.tcp\s+open/
    Then the output should not match:
      """
      25\/tcp\s+open
      """
149. When I launch an "nmap" attack with:
  """
  nmap -F <hostname>
  """
Then the output should match /80.tcp\s+open/
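To see what the assertion in the scenario checks, the same kind of regular expression can be tried against sample text. This is a sketch for illustration only: `SAMPLE_OUTPUT` is hand-written text in the general shape of an nmap port table, not real scan output, and `port_is_open` is a hypothetical helper rather than part of Gauntlt.

```python
import re

# Hand-written sample in the general shape of an nmap port table.
SAMPLE_OUTPUT = """\
PORT    STATE  SERVICE
22/tcp  closed ssh
80/tcp  open   http
443/tcp open   https
"""


def port_is_open(nmap_output, port):
    """True if some line starts with '<port>/tcp' followed by 'open'."""
    pattern = rf"^{port}/tcp\s+open\b"
    return re.search(pattern, nmap_output, flags=re.MULTILINE) is not None


print(port_is_open(SAMPLE_OUTPUT, 80))  # port 80 is listed as open
print(port_is_open(SAMPLE_OUTPUT, 25))  # port 25 does not appear at all
```

The helper anchors the pattern at the start of a line. An unanchored pattern such as `80.tcp\s+open` would also match a line for port 8080, which may or may not be what a given check intends.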