Eliciting Operations Requirements for Applications
1. NICTA Copyright 2012 From imagination to impact
Eliciting Operations Requirements for Applications
L. Bass, R. Jeffrey, I. Weber, H. Wada and L. Zhu
2.
Operations Requirements
● "Through 2015, 80% of outages will be caused
by people and process issues. 50% are caused
by change, config and release" - Gartner
● Devs and Ops are (still) isolated, but Ops are an
important source of product requirements
○ Before unit testing, little attention was paid to "testability"
○ In the DevOps era, we should build "operatability"
into products
● Make applications operations-process aware!
○ But where do the requirements come from?
3.
Overview of Our Study
● Studied sources of operations requirements and
discuss them in the context of our spin-out
○ Operations personnel
○ Internal development efforts
○ Operations standards
○ Organizational process descriptions
○ Academic studies
● Model processes and the product
○ Verify if a product satisfies operations
requirements
4.
Standards and Organizational Process
● Process standards such as ISO 15504 or ITIL are a good
source but not specific enough to turn into
product requirements
● Organizational process descriptions tend to
provide more detail
○ e.g., resource migration in Amazon Web Services [1]
● We found standards useful for (1) choosing what to implement
(automate) in a product, and (2) defining a method
to validate the process by operators
[1] media.amazonwebservices.com/AWS_Migrate_Resources_To_New_Region.pdf
5.
Example Operational Requirement
● CP-6 Alternate Storage Site, NIST 800-53
○ "The organization establishes an alternate storage
site including necessary agreements to permit the
storage and recovery of information system backup
information"
● Derived product requirement
○ "The product shall maintain a backup in an alternate
storage site. The product shall provide a method to
assess the recoverability of the system"
● Actual implementation in our product
○ Set up a backup site and a scheduled job as part of
product initialization. Otherwise, launch fails
○ Provide a report to assess the quality of the backup (e.g.,
timestamp, execution time, capacity of disk, ...)
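The derived requirement above can be sketched in code. This is a minimal illustration, not the product's actual implementation: `BackupConfig`, `BackupRun`, `LaunchError`, `initialize` and `backup_report` are hypothetical names, and the report fields are only one guess at how timestamp, execution time and disk capacity might be exposed to an operator.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


class LaunchError(RuntimeError):
    """Raised when the product cannot start with a usable backup setup."""


@dataclass
class BackupConfig:
    alternate_site: Optional[str]  # hypothetical identifier of the alternate storage site
    schedule: Optional[str]        # hypothetical schedule label, e.g. "daily"


@dataclass
class BackupRun:
    finished_at: datetime
    duration_seconds: float
    bytes_written: int
    disk_capacity_bytes: int


def initialize(config: BackupConfig) -> None:
    """Refuse to launch unless a backup site and a schedule are configured."""
    if not config.alternate_site:
        raise LaunchError("no alternate storage site configured")
    if not config.schedule:
        raise LaunchError("no backup schedule configured")


def backup_report(run: BackupRun, now: datetime) -> dict:
    """Summarize one backup run so an operator can judge recoverability."""
    age = now - run.finished_at
    return {
        "timestamp": run.finished_at.isoformat(),
        "age_hours": age.total_seconds() / 3600,
        "execution_time_seconds": run.duration_seconds,
        "disk_used_fraction": run.bytes_written / run.disk_capacity_bytes,
    }
```

Failing in `initialize` rather than at the first backup attempt mirrors the "Otherwise, launch fails" rule: a missing backup site surfaces at launch, not during recovery.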
6.
Academic Studies
● Differences between environments are the most
common source of upgrade problems [2]
○ Called "hidden dependencies" - incorrect file path,
incorrect network address, library conflict, ...
● The list of hidden dependencies is a useful source of product
requirements
● Actual implementation in our product
○ e.g., run a dependency check at boot. Terminate the
app immediately to prevent fatal issues from occurring
later (e.g., data corruption)
○ A boot failure is easy to detect, which makes Ops happy
[2] T. Dumitras, "Why do upgrades fail and what can we do about it?: towards dependable, online upgrades in enterprise system", Middleware 2009
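A boot-time dependency check along these lines might look like the sketch below. It is an assumption-laden illustration, not the product's code: `check_dependencies` and `boot` are hypothetical names, and the check only tests that file paths exist, top-level libraries can be found, and host names resolve. It does not detect library version conflicts.

```python
import importlib.util
import os
import socket
import sys


def check_dependencies(required_paths, required_modules, required_hosts):
    """Return a list of human-readable problems; an empty list means all passed."""
    problems = []
    for path in required_paths:
        if not os.path.exists(path):
            problems.append(f"missing file path: {path}")
    for module in required_modules:
        # Top-level module names only; find_spec returns None when not installed.
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            problems.append(f"missing library: {module}")
    for host in required_hosts:
        try:
            socket.getaddrinfo(host, None)
        except socket.gaierror:
            problems.append(f"unresolvable network address: {host}")
    return problems


def boot(required_paths=(), required_modules=(), required_hosts=()):
    """Terminate immediately if any dependency is missing, before doing real work."""
    problems = check_dependencies(required_paths, required_modules, required_hosts)
    if problems:
        for problem in problems:
            print(problem, file=sys.stderr)
        sys.exit(1)
```

Exiting with a non-zero status at boot turns a hidden dependency into an obvious start-up failure instead of a later runtime fault.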
7.
Internal DevOps Experience
● Context: Our spin-out provides a SaaS solution
for replicating resources in AWS
● Issue: Cleaning up resources is expensive
○ After running tests
○ When handling unexpected failures
● "Undo" functionality reverts resource state
to a chosen earlier point [3]
○ Easy to run tests
○ Easy to clean up the mess
[3] I. Weber et al., "Automatic undo for cloud management via AI planning," HotDep'12
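The idea of reverting to an earlier point can be illustrated with a simple undo log. Note that this is not the AI-planning approach of [3]: it is a much simpler stand-in that records an inverse action per change, and `UndoLog` and `FakeCloud` are hypothetical names with an in-memory set standing in for real cloud resources.

```python
class UndoLog:
    """Records an inverse action for every change so changes can be rolled back."""

    def __init__(self):
        self._inverse_actions = []

    def checkpoint(self):
        """Return a marker for the current state."""
        return len(self._inverse_actions)

    def record(self, inverse_action):
        self._inverse_actions.append(inverse_action)

    def rollback(self, checkpoint):
        """Run inverse actions in reverse order until the checkpoint is reached."""
        while len(self._inverse_actions) > checkpoint:
            self._inverse_actions.pop()()


class FakeCloud:
    """In-memory stand-in for a cloud account (illustration only)."""

    def __init__(self, log):
        self.resources = set()
        self._log = log

    def create(self, name):
        self.resources.add(name)
        self._log.record(lambda: self.resources.discard(name))

    def delete(self, name):
        self.resources.remove(name)
        self._log.record(lambda: self.resources.add(name))


log = UndoLog()
cloud = FakeCloud(log)
cloud.create("vm-1")
before_test = log.checkpoint()
cloud.create("vm-2")
cloud.delete("vm-1")
log.rollback(before_test)  # resources are back to {"vm-1"}
```

A test can take a checkpoint before it starts and roll back afterwards, which is the "easy to clean up the mess" property the slide describes.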
8.
Towards Formal Validation
● Incorporating Ops requirements into
development/product is useful, but how do we
verify that the implementation is correct?
● Our ongoing work: modeling process and
product together
○ Does the product satisfy Ops requirements?
○ Does the process operate the product as required?
9.
Example
● Model the mixed-version upgrading process
● Version conflicts between clients and servers during a long-running process
● We're evaluating this method in a real system
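As a rough illustration of what modeling a mixed-version upgrade can reveal, the sketch below enumerates the states of a rolling upgrade and lists client/server pairings that violate a compatibility rule. Everything here is assumed for illustration: the two versions "v1" and "v2", the one-server-at-a-time upgrade, and the rule in `new_servers_accept_old_clients` are not taken from the study, and this brute-force enumeration is not the authors' modeling method.

```python
def new_servers_accept_old_clients(client_version, server_version):
    """Hypothetical rule: v2 servers accept v1 clients, but not the reverse."""
    return client_version == server_version or (
        client_version == "v1" and server_version == "v2"
    )


def conflicting_states(num_servers, compatible):
    """List (servers_upgraded, client_version, server_version) that conflict."""
    conflicts = []
    for upgraded in range(num_servers + 1):
        # Servers are upgraded one at a time from v1 to v2.
        server_versions = ["v2"] * upgraded + ["v1"] * (num_servers - upgraded)
        for client_version in ("v1", "v2"):
            for server_version in set(server_versions):
                if not compatible(client_version, server_version):
                    conflicts.append((upgraded, client_version, server_version))
    return sorted(conflicts)


conflicts = conflicting_states(2, new_servers_accept_old_clients)
# Under the assumed rule, a v2 client conflicts with any remaining v1 server.
```

Under these assumptions the conflicts disappear only once every server is upgraded, which suggests clients should be upgraded last.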
10.
Conclusion
● Operations, including release, are a major source
of outages
● To improve the "operatability" of products, we
studied operations requirements
● Future work: validate whether implementations
satisfy "operatability"