The Quality Attribute of Upgradability
Len Bass with Hiroshi Wada, Ingo Weber, Liming Zhu, Ross Jeffery




NICTA Copyright 2012        From imagination to impact
About NICTA

National ICT Australia

    • Federal and state funded research
      company established in 2002
    • Largest ICT research resource in
      Australia
    • National impact is an important
      success metric
    • ~700 staff/students working in 5 labs
      across major capital cities
    • 7 university partners
    • Providing R&D services, knowledge
      transfer to Australian (and global) ICT
      industry

NICTA technology is in over 1 billion mobile phones


Consider the following sequence.
• You have prepared an upgrade to an existing large
  enterprise system
   – You have coded it
   – You have tested it
   – It is ready!!
• Alternatively, the IT department (or you) get a package
  from a third party – a vendor or open source – that has
  been coded and tested.
• What happens then?
   – ~10% of the time the upgrade will fail.




This is the upgradability problem
• How do we make upgrading a system less
  problematic?
• Talk outline
        – Characteristics of the upgrade problem
        – FMEA-style analysis
               • Possible causes of failure
               • Failure prevention, detection, and recovery
        – Relation to existing product and process quality work




Upgrades to enterprise systems are a very
common occurrence
Upgrade frequency of some common systems


              Application                     Average release interval
              Facebook (platform)             < 7 days
              Google Docs                     < 50 days
              MediaWiki                       21 days
              Joomla                          30 days



This frequency would suggest it is important to get
the upgrades correct

Unfortunately, Upgrades Fail Often
• 4.6 to 10 component failures occur each month in
  three large-scale Internet services, mostly during
  regular maintenance.
• A survey of system administrators reports average
  and maximum upgrade failure rates of 8.6% and
  50%.
• Some claim that user-visible failures from
  upgrades outweigh user-visible failures from
  software errors.



Why is this?
• Installation is complicated.
        – Installation guides for SAS 9.3 Intelligence, IBM i, Oracle 11g for
          Linux are ~250 pages each
        – The Apache description of addresses and ports (one out of 16
          descriptions) has the following elements:
            • Choosing and specifying ports for the server to listen to
            • IPv4 and IPv6
            • Protocols
            • Virtual Hosts
        – The number of configuration options that must be set can be
          large
               • Hadoop has 206 options
               • HBase has 64
        – Many dependencies are not visible until execution


Provides Research Agenda
• Indeed, the surprise is not that upgrades fail
  8.6% of the time but that they are successful
  91.4% of the time.

• Rich area for research.




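To put the 8.6% figure in perspective, the sketch below computes the chance of seeing at least one failed upgrade over a series of upgrades. It assumes, purely for illustration, that upgrades fail independently of one another at a fixed rate; the survey does not claim this. The function name is hypothetical.

```python
def prob_at_least_one_failure(failure_rate: float, upgrades: int) -> float:
    """Chance that at least one of `upgrades` upgrades fails.

    Assumption (not from the survey): each upgrade fails independently
    with the same probability `failure_rate`.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if upgrades < 0:
        raise ValueError("upgrades must not be negative")
    # P(at least one failure) = 1 - P(every upgrade succeeds)
    return 1.0 - (1.0 - failure_rate) ** upgrades


if __name__ == "__main__":
    # 8.6% is the average failure rate quoted on the slides.
    for count in (1, 12, 52):
        print(count, round(prob_at_least_one_failure(0.086, count), 3))
```

Under these assumptions the chance of a failure grows quickly with release frequency, which is why short release intervals make upgradability matter.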
What kind of problem is this - product?
• ISO 25010 provides
        – A quality in use model composed of five
          characteristics (some of which are further subdivided
          into subcharacteristics) that relate to the outcome of
          interaction when a product is used in a particular
          context of use.
        – I.e. is upgradability a quality of the system being
          upgraded?
• The answer is yes.




What kind of problem is this – process?
• ITIL (Information Technology Infrastructure Library)
   – Change Management aims to ensure that
     standardised methods and procedures are used for
     efficient handling of all changes.
• SPICE – ISO 15504
   – process assessment provides the means of
     characterizing the current practice within an
     organizational unit in terms of the capability of the
     selected processes.
• Is upgradability a quality of the process used to manage
  information technology?
• The answer is yes.

Upgradability is a hybrid quality problem
• A hybrid quality problem is one in which
  improvement involves both product and process
  and in which the product has process
  awareness.
• Many product centered conferences
     – Dependability
     – Security
     –…
• Some process centered conferences
       – Software Process Improvement
       – SPICE
       – SPEG
       –…

Hybrid quality improvement is not well
served by the academic community
• Hybrid quality improvement – as we shall see – involves
  close interaction between product, process and tools to
  support the process.
• Venues that should emphasize this interaction include
   – Profes (Product focused Software Development and
     Process Improvement)
   – ASQ (Conference on Quality and Improvement)
• Yet an examination of the CFPs and proceedings for
  these conferences shows a distinction between process
  activities and product characteristics
• We will present the results of an FMEA (Failure Mode and
  Effects Analysis) style analysis for upgradability and then
  return to the hybrid quality issue

FMEA
• Failure Mode and Effects Analysis (FMEA) is an
  inductive technique for analyzing failure
  modes.
• FMEA involves describing
        – Potential failure modes
        – The severity and likelihood of these failures.
• We will focus on the first portion and generate
  the potential failure modes as well as potential
  prevention, detection, and recovery from these
  failures.
• I.e. we are performing an FMEA style
  analysis, not an FMEA, per se.

Scenario for Upgradability
• We are concerned with the following scenario
        – Version N+1 of an enterprise system is available for
          deployment.
               • Version N+1 can be deployed by developers
               • Version N+1 can be deployed by the Information Technology
                 Department (The Release Manager if there is one).
        – Version N+1 is completely coded and tested by its
          developers.
• Measures can include
        – Downtime
        – Resources (hardware or personnel) required to
          perform the upgrade
        – Number of failed attempts to install upgrade

Fundamental goals during upgrade
• The literature identifies four fundamental goals
  while an upgrade is occurring.
        – Efficiently manage resources
        – Completely and correctly specify configurations
        – Manage multiple versions to avoid problems with
          version mismatch.
        – Maintain consistency of persistent data.
• Failures are caused by the violation of one of
  these fundamental goals.
        – Our FMEA-style analysis looks at potential causes of
          violations of each of these goals.


Activities during an upgrade of a system
• Make the upgrade available.
• Prepare the environment. Ensure that there are
  sufficient resources available for installation and
  that assumed software is available.
• Configuration
• Deployment
• Activation




Organization of next portion of the
presentation
• For each activity
       – Potential fault (a fault is a failure in waiting)
       – Prevention of the fault
       – Detection of the fault
       – Correction of the fault


• Research opportunities are indicated by
       • A blank cell
       • A cell with only partial coverage


Make Upgrade available
   Fault possibility              Prevention   Detection        Recovery
   Element omitted/included       (blank)      Manifest         Recreate
   incorrectly in installing                   Bill of lading   distribution
   software

   System corrupted during        (blank)      Hash code,       Retransmit
   movement                                    checksum

   Source of distribution from    (blank)      Digital          (blank)
   an untrusted site                           signature

   Forgotten/misplaced            (blank)      (blank)          Separate secret
   credentials                                                  Independent
                                                                channel for new
                                                                credentials

   Credential verifier            (blank)      (blank)          Codify
   unavailable                                                  acceptable
                                                                credentials in
                                                                distribution
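The manifest and checksum detections for a distribution can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes the manifest is a mapping from relative file path to an expected SHA-256 digest, and the names `sha256_of` and `verify_distribution` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_distribution(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under `root` with the manifest.

    `manifest` maps relative POSIX paths to expected SHA-256 digests
    (an assumed format chosen for this sketch).
    """
    present = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    expected = set(manifest)
    return {
        # Element omitted from the distribution
        "missing": sorted(expected - present),
        # Element included incorrectly
        "unexpected": sorted(present - expected),
        # Element corrupted during movement
        "corrupted": sorted(
            name for name in expected & present
            if sha256_of(root / name) != manifest[name]
        ),
    }
```

An empty report (all three lists empty) means the distribution matches its manifest. A plain hash only detects accidental corruption; guarding against an untrusted source needs a digital signature over the manifest, which this sketch does not cover.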
Prepare environment
   Fault possibility                 Prevention            Detection        Recovery
   Incorrect versions of support     Include version       Encode hash of   (blank)
   libraries                         number in             APIs
                                     specification
                                     Utilize services
                                     to announce
                                     incompatibilities

   Multiple versions of support      Include version       (blank)          (blank)
   libraries simultaneously          number in name
   required                          Libraries expose
                                     version numbers
                                     Linkers version
                                     aware

   Insufficient resources            Rolling upgrade       (blank)          (blank)

   Schema modification on            Convert data to       (blank)          (blank)
   database                          new schema
                                     prior to upgrade




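As an illustration of "include version number in specification", the sketch below checks installed support libraries against a version specification before the upgrade starts. The `Requirement` type, the dotted-integer version format, and the library names in the usage note are assumptions made for this example.

```python
from typing import NamedTuple


class Requirement(NamedTuple):
    name: str
    minimum: tuple[int, ...]  # lowest acceptable version (inclusive)
    below: tuple[int, ...]    # first unacceptable version (exclusive)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.4.1' into (1, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def format_version(version: tuple[int, ...]) -> str:
    return ".".join(str(part) for part in version)


def check_libraries(requirements: list[Requirement],
                    installed: dict[str, str]) -> list[str]:
    """Return one message per library that is absent or out of range."""
    problems = []
    for req in requirements:
        if req.name not in installed:
            problems.append(f"{req.name}: not installed")
            continue
        version = parse_version(installed[req.name])
        if not (req.minimum <= version < req.below):
            problems.append(
                f"{req.name}: found {installed[req.name]}, need >= "
                f"{format_version(req.minimum)} and < {format_version(req.below)}"
            )
    return problems
```

For example, `check_libraries([Requirement("libfoo", (1, 2), (2, 0))], {"libfoo": "1.4.1"})` returns an empty list, so preparation can proceed. The check only works if libraries expose their version numbers, which is one of the prevention entries in the table.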
Configuration
   Fault possibility       Prevention         Detection       Recovery
   Missing parameter       Parameter          (blank)         (blank)
                           database
                           Parameter built
                           into tool
                           Static analysis
                           of code

   Incorrectly specified   Abstract           Check           (blank)
   parameter               specification      syntax
                                              Validate
                                              against a
                                              specification

   Inconsistent            (blank)            Constraint      (blank)
   parameters                                 checker
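The three configuration faults (missing, incorrectly specified, and inconsistent parameters) can be illustrated with one small validator. This is a sketch under stated assumptions: the parameter names, the `SPEC` format, and the constraints are invented for the example and do not come from any real product.

```python
import re

# Hypothetical specification: which parameters exist and what syntax they take.
SPEC = {
    "listen_port": {"required": True, "pattern": r"\d{1,5}"},
    "heap_mb": {"required": True, "pattern": r"\d+"},
    "container_mb": {"required": True, "pattern": r"\d+"},
    "log_level": {"required": False, "pattern": r"DEBUG|INFO|WARN|ERROR"},
}

# Hypothetical cross-parameter constraints: (description, predicate).
CONSTRAINTS = [
    ("heap_mb must not exceed container_mb",
     lambda c: int(c["heap_mb"]) <= int(c["container_mb"])),
    ("listen_port must be in 1..65535",
     lambda c: 1 <= int(c["listen_port"]) <= 65535),
]


def validate(config: dict[str, str]) -> list[str]:
    """Return a list of configuration errors; an empty list means valid."""
    errors = []
    for name, rule in SPEC.items():
        if name not in config:
            if rule["required"]:
                errors.append(f"missing parameter: {name}")
        elif not re.fullmatch(rule["pattern"], config[name]):
            errors.append(f"incorrectly specified parameter: {name}={config[name]!r}")
    # Constraints are only checked once every individual value is well formed.
    if not errors:
        for description, holds in CONSTRAINTS:
            if not holds(config):
                errors.append(f"inconsistent parameters: {description}")
    return errors
```

Running the validator before deployment moves detection of all three fault types ahead of activation. Parameters that are not in `SPEC` are ignored here; a stricter checker might flag them as well.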
Deployment
   Fault possibility        Prevention       Detection   Recovery
   Insufficient resources   Pre-allocate     (blank)     (blank)
                            during
                            preparation
                            Rolling
                            upgrade

   Inconsistent hardware    Verify during    (blank)     (blank)
                            preparation

   Operator error           (blank)          (blank)     Undo
                                                         mechanism



Activation
   Fault possibility    Prevention         Detection    Recovery
   Discovered hidden    (blank)            Monitoring   Recovery
   dependency                                           block

   Multiple             Separation         (blank)      (blank)
   simultaneous         Dynamic
   versions             Software
                        Update
                        Automatic
                        translation of
                        data when
                        old schema is
                        used
                        Version
                        aware load
                        balancer
                        Version
                        aware code
                        and data



Our activities in this space so far (green
cells)
• Mixed version race condition solution
• Operator undo




What is the “mixed version race condition”?
• Common practice when pushing an upgrade to a
  large number of servers is to upgrade one (or
  several) servers at a time
• This means that version N+1 (the new version)
  will be available on some servers and version N
  (the old version) will be available on other
  servers.
• Suppose version N+1 has functionality not
  available in version N



Now consider the following sequence
1. A client (browser) issues a request that is
   routed by the load balancer to an instance of
   version N+1
2. Version N+1 sends JavaScript assuming new
   functionality back to the client.
3. Client sends an AJAX request that utilizes new
   functionality and the load balancer routes it to
   an instance of version N.
4. Error because version N does not have the new
   functionality.


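The four-step sequence above can be reproduced with a toy simulation. Everything here is illustrative: the `Server` class, the endpoint names, and the round-robin balancer are assumptions, chosen only to show how a version-unaware balancer sends the follow-up request to the old version.

```python
from itertools import cycle


class Server:
    """A toy server that only answers the endpoints its version provides."""

    def __init__(self, name: str, version: str, endpoints: set[str]) -> None:
        self.name = name
        self.version = version
        self.endpoints = endpoints

    def handle(self, endpoint: str) -> str:
        if endpoint not in self.endpoints:
            raise LookupError(
                f"{self.name} (v{self.version}) has no endpoint {endpoint}"
            )
        return f"{self.name} served {endpoint}"


def simulate() -> "str | None":
    """Replay the sequence; return the error message, or None if no error."""
    upgraded = Server("server-1", "N+1", {"/page", "/new-feature"})
    old = Server("server-2", "N", {"/page"})
    balancer = cycle([upgraded, old])  # version-unaware round robin

    # Steps 1-2: the initial request reaches version N+1, whose reply
    # carries JavaScript that assumes the new functionality.
    next(balancer).handle("/page")

    # Step 3: the AJAX request is routed to the next server, version N.
    target = next(balancer)
    try:
        target.handle("/new-feature")
    except LookupError as error:
        # Step 4: version N does not have the new functionality.
        return str(error)
    return None


if __name__ == "__main__":
    print(simulate())
```

The failure depends only on routing order, not on any defect in version N or version N+1, which is what makes it a race condition.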
Mixed Version Race Condition
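[Figure: message sequence between the client (browser) and the server.
 1. The server side starts a rolling upgrade.
 2. The client sends its initial request.
 3. The new version answers with an HTTP reply containing embedded JavaScript.
 4. The client sends an AJAX callback.
 5. The callback reaches the old version, which fails with an ERROR.]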




What does the solution involve?
1. Label communication between instances and
   the client with version information
2. Modify load balancer so that messages are
   routed to an appropriate version
3. Modify load balancer so that messages are
   balanced across all child instances.




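The three steps can be sketched for a single load balancer as follows. This is a simplified illustration rather than the authors' implementation: the class and method names are hypothetical, version labels are plain strings, and the harder case of several distributed load balancers is not addressed.

```python
from typing import Optional


class VersionAwareBalancer:
    """Round-robin balancer that honours a version label on requests."""

    def __init__(self) -> None:
        self._servers: list[tuple[str, str]] = []  # (server name, version)
        self._cursor: dict[Optional[str], int] = {}

    def register(self, name: str, version: str) -> None:
        self._servers.append((name, version))

    def route(self, version: Optional[str] = None) -> tuple[str, str]:
        """Pick a server for a request.

        Unlabelled requests (version=None) rotate over all servers.
        Labelled requests rotate only over servers of that version.
        The returned version is the label the client should attach to
        its follow-up requests (step 1 of the solution).
        """
        candidates = [s for s in self._servers
                      if version is None or s[1] == version]
        if not candidates:
            raise LookupError(f"no server available for version {version}")
        index = self._cursor.get(version, 0) % len(candidates)
        self._cursor[version] = index + 1
        return candidates[index]
```

A client whose first, unlabelled request lands on a version N+1 server sends `route("N+1")` for its AJAX callbacks, so those callbacks never reach version N. Rotating within each version keeps the load spread across the instances that can serve the request.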
Why is this a hard problem?
• Large installations have multiple distributed load balancers that must
  be kept in sync, i.e. some load balancers may know about the new
  version and some may not
• Not enough to put version number in message
        – Suppose second request goes to a load balancer that does not yet know about
          version N+1.
• Must keep messages balanced so that all servers handle roughly
  the same number of requests.

[Figure: two top-level load balancers, each serving /service, sit above
 second-level load balancers for /service/vN, /service/vN+1, and /service/vN,
 which in turn sit above six servers.]

Operator undo
• After performing an operation in AWS, an operator may
  want to go back to the original state, i.e. undo the
  operation
• This is not always straightforward:
        – Attaching volume is no problem while the instance is
          running, detaching might be problematic
        – Creating / changing auto-scaling rules has effect on
          number of running instances
               • Cannot terminate additional instances, as the rule would
                 create new ones!
        – Deleted / terminated / released resources are gone!


Undo for System Operators
[Figure: the administrator issues begin-transaction, a series of do
 operations, and then rollback. The figure also lists two further
 operations: commit and pseudo-delete.]




Approach
[Figure: the administrator issues begin-transaction, a series of do
 operations, and then rollback. The Undo System senses the states of the
 cloud resources at begin-transaction and again at rollback. A goal state
 and an initial state feed a Plan step, which produces a set of actions;
 these pass through Generate code and then Execute.]

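The plan-then-execute idea can be sketched with a naive state comparison. This stands in for the Plan step and is not the planner used by the undo system; it also assumes that the goal state is the snapshot sensed at begin-transaction, that the initial state is the one sensed at rollback, and that deleted resources can be restored because they were only pseudo-deleted. The state format and function names are hypothetical.

```python
def plan_undo(goal: dict[str, dict], initial: dict[str, dict]) -> list[tuple]:
    """Derive a set of actions that turns `initial` back into `goal`.

    Both states map a resource id to its attributes.
    """
    actions: list[tuple] = []
    # Resources created during the transaction must be removed.
    for rid in sorted(initial.keys() - goal.keys()):
        actions.append(("remove", rid))
    # Resources deleted during the transaction must be brought back
    # (assumed possible only because they were pseudo-deleted).
    for rid in sorted(goal.keys() - initial.keys()):
        actions.append(("restore", rid, goal[rid]))
    # Resources that changed must get their old attributes again.
    for rid in sorted(goal.keys() & initial.keys()):
        if goal[rid] != initial[rid]:
            actions.append(("update", rid, goal[rid]))
    return actions


def execute(actions: list[tuple], state: dict[str, dict]) -> dict[str, dict]:
    """Apply the actions to a copy of `state` and return the result."""
    result = {rid: dict(attrs) for rid, attrs in state.items()}
    for action in actions:
        kind, rid = action[0], action[1]
        if kind == "remove":
            del result[rid]
        else:  # "restore" and "update" both install the goal attributes
            result[rid] = dict(action[2])
    return result
```

A plain comparison like this ignores ordering constraints between actions, such as an auto-scaling rule that would recreate instances terminated by the undo. Handling those dependencies is the reason a planning step is needed in practice.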
Upgradability as a process and product quality
• Architecture of the system being upgraded can
  affect the process of installation
        – Suppose the system checks for version information
          from dependent libraries. Then the process must
          encompass descriptions of what to do if an error
          condition occurs.
• Process of upgrade can affect the architecture of
  the product.
        – Suppose the process is supported by a tool that
          checks the health of the installation of version N+1.
          Then the system must make visible the information
          used by this tool.

Summary
• Upgrade is an important problem
        – Upgrade failures affect user satisfaction
        – Upgrade failures happen frequently
• Upgrade involves the interaction of product and
  process quality issues.
        – Communities focus on improving the quality of
          either the process or the product, not the joint
          process/product quality.
• Multiple opportunities for research exist.



Q&A


                       Thank You!


Research study opportunities in dependable cloud computing:
• Software Architecture
• Data Management
• Performance Engineering
• Autonomic Computing

 To find out more, send your CV and undergraduate details to
                    students@nicta.com.au
Il paradigma DevOps e Continuous Delivery Automation
 
devops in iot solution development final
devops in iot solution development finaldevops in iot solution development final
devops in iot solution development final
 
Software Lifecycle
Software LifecycleSoftware Lifecycle
Software Lifecycle
 
Intoduction to software engineering part 1
Intoduction to software engineering part 1Intoduction to software engineering part 1
Intoduction to software engineering part 1
 
Enterprise Dev Ops At Scale
Enterprise Dev Ops At ScaleEnterprise Dev Ops At Scale
Enterprise Dev Ops At Scale
 
VMWare Winnipeg Forum - 2011
VMWare Winnipeg Forum - 2011VMWare Winnipeg Forum - 2011
VMWare Winnipeg Forum - 2011
 
Cyber security - It starts with the embedded system
Cyber security - It starts with the embedded systemCyber security - It starts with the embedded system
Cyber security - It starts with the embedded system
 
Il paradigma DevOps e Continuous Delivery Automation
Il paradigma DevOps e Continuous Delivery Automation Il paradigma DevOps e Continuous Delivery Automation
Il paradigma DevOps e Continuous Delivery Automation
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
 
Challenges and best practices of database continuous delivery
Challenges and best practices of database continuous deliveryChallenges and best practices of database continuous delivery
Challenges and best practices of database continuous delivery
 
Error in hadoop
Error in hadoopError in hadoop
Error in hadoop
 
Pressman ch-1-software
Pressman ch-1-softwarePressman ch-1-software
Pressman ch-1-software
 
Tell me how you provision and I'll tell you how you are
Tell me how you provision and I'll tell you how you areTell me how you provision and I'll tell you how you are
Tell me how you provision and I'll tell you how you are
 
Unit_1(Software and Software Engineering).pptx
Unit_1(Software and Software Engineering).pptxUnit_1(Software and Software Engineering).pptx
Unit_1(Software and Software Engineering).pptx
 

More from Len Bass

Devops syllabus
Devops syllabusDevops syllabus
Devops syllabusLen Bass
 
DevOps Syllabus summer 2020
DevOps Syllabus summer 2020DevOps Syllabus summer 2020
DevOps Syllabus summer 2020Len Bass
 
11 secure development
11  secure development 11  secure development
11 secure development Len Bass
 
10 disaster recovery
10 disaster recovery  10 disaster recovery
10 disaster recovery Len Bass
 
9 postproduction
9 postproduction 9 postproduction
9 postproduction Len Bass
 
8 pipeline
8 pipeline 8 pipeline
8 pipeline Len Bass
 
7 configuration management
7 configuration management 7 configuration management
7 configuration management Len Bass
 
6 microservice architecture
6 microservice architecture6 microservice architecture
6 microservice architectureLen Bass
 
5 infrastructure security
5 infrastructure security5 infrastructure security
5 infrastructure securityLen Bass
 
4 container management
4  container management4  container management
4 container managementLen Bass
 
3 the cloud
3 the cloud 3 the cloud
3 the cloud Len Bass
 
1 virtual machines
1 virtual machines1 virtual machines
1 virtual machinesLen Bass
 
2 networking
2 networking2 networking
2 networkingLen Bass
 
Quantum talk
Quantum talkQuantum talk
Quantum talkLen Bass
 
Icsa2018 blockchain tutorial
Icsa2018 blockchain tutorialIcsa2018 blockchain tutorial
Icsa2018 blockchain tutorialLen Bass
 
Experience in teaching devops
Experience in teaching devopsExperience in teaching devops
Experience in teaching devopsLen Bass
 
Understanding blockchains
Understanding blockchainsUnderstanding blockchains
Understanding blockchainsLen Bass
 
What is a blockchain
What is a blockchainWhat is a blockchain
What is a blockchainLen Bass
 
Dev ops and safety critical systems
Dev ops and safety critical systemsDev ops and safety critical systems
Dev ops and safety critical systemsLen Bass
 
My first deployment pipeline
My first deployment pipelineMy first deployment pipeline
My first deployment pipelineLen Bass
 

More from Len Bass (20)

Devops syllabus
Devops syllabusDevops syllabus
Devops syllabus
 
DevOps Syllabus summer 2020
DevOps Syllabus summer 2020DevOps Syllabus summer 2020
DevOps Syllabus summer 2020
 
11 secure development
11  secure development 11  secure development
11 secure development
 
10 disaster recovery
10 disaster recovery  10 disaster recovery
10 disaster recovery
 
9 postproduction
9 postproduction 9 postproduction
9 postproduction
 
8 pipeline
8 pipeline 8 pipeline
8 pipeline
 
7 configuration management
7 configuration management 7 configuration management
7 configuration management
 
6 microservice architecture
6 microservice architecture6 microservice architecture
6 microservice architecture
 
5 infrastructure security
5 infrastructure security5 infrastructure security
5 infrastructure security
 
4 container management
4  container management4  container management
4 container management
 
3 the cloud
3 the cloud 3 the cloud
3 the cloud
 
1 virtual machines
1 virtual machines1 virtual machines
1 virtual machines
 
2 networking
2 networking2 networking
2 networking
 
Quantum talk
Quantum talkQuantum talk
Quantum talk
 
Icsa2018 blockchain tutorial
Icsa2018 blockchain tutorialIcsa2018 blockchain tutorial
Icsa2018 blockchain tutorial
 
Experience in teaching devops
Experience in teaching devopsExperience in teaching devops
Experience in teaching devops
 
Understanding blockchains
Understanding blockchainsUnderstanding blockchains
Understanding blockchains
 
What is a blockchain
What is a blockchainWhat is a blockchain
What is a blockchain
 
Dev ops and safety critical systems
Dev ops and safety critical systemsDev ops and safety critical systems
Dev ops and safety critical systems
 
My first deployment pipeline
My first deployment pipelineMy first deployment pipeline
My first deployment pipeline
 

Recently uploaded

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Upgradability: A Hybrid Quality Problem

  • 1. The Quality Attribute of Upgradability
      Len Bass with Hiroshi Wada, Ingo Weber, Liming Zhu, Ross Jeffery
      NICTA Copyright 2012. From imagination to impact
  • 2. About NICTA
      National ICT Australia
      • Federal and state funded research company established in 2002
      • Largest ICT research resource in Australia
      • National impact is an important success metric
      • ~700 staff/students working in 5 labs across major capital cities
      • 7 university partners
      • Providing R&D services, knowledge transfer to Australian (and global) ICT industry
      NICTA technology is in over 1 billion mobile phones
  • 3. Consider the following sequence.
      • You have prepared an upgrade to an existing large enterprise system
          – You have coded it
          – You have tested it
          – It is ready!!
      • Alternatively, the IT department (or you) gets a package from a third party – a vendor or open source – that has been coded and tested.
      • What happens then?
  • 4. Consider the following sequence.
      • You have prepared an upgrade to an existing large enterprise system
          – You have coded it
          – You have tested it
          – It is ready!!
      • Alternatively, the IT department (or you) gets a package from a third party – a vendor or open source – that has been coded and tested.
      • What happens then?
          – ~10% of the time the upgrade will fail.
  • 5. This is the upgradability problem
      • How do we make upgrading a system less problematic?
      • Talk outline
          – Characteristics of the upgrade problem
          – FMEA analysis
              • Possible causes of failure
              • Failure prevention, detection, and recovery
          – Relation to existing product and process quality work
  • 6. Upgrades to enterprise systems are a very common occurrence
      Upgrade frequency of some common systems:
          Application            Average release interval
          Facebook (platform)    < 7 days
          Google Docs            < 50 days
          MediaWiki              21 days
          Joomla                 30 days
      This frequency suggests that it is important to get upgrades right.
  • 7. Unfortunately, Upgrades Fail Often
      • 4.6-10 component failures occur each month in three large-scale Internet services, mostly during regular maintenance.
      • Average and maximum upgrade failure rates reported in a survey of systems administrators are 8.6% and 50%.
      • Some claim that user-visible failures from upgrades outweigh user-visible failures from software errors.
  • 8. Why is this?
      • Installation is complicated.
          – Installation guides for SAS 9.3 Intelligence, IBM i, and Oracle 11g for Linux are ~250 pages each
          – The Apache description of addresses and ports (one out of 16 descriptions) has the following elements:
              • Choosing and specifying ports for the server to listen to
              • IPv4 and IPv6
              • Protocols
              • Virtual Hosts
          – The number of configuration options that must be set can be large
              • Hadoop has 206 options
              • HBase has 64
          – Many dependencies are not visible until execution
  • 9. Provides Research Agenda
      • Indeed, the surprise is not that upgrades fail 8.6% of the time but that they are successful 91.4% of the time.
      • Rich area for research.
  • 10. What kind of problem is this – product?
      • ISO 25010 provides
          – A quality in use model composed of five characteristics (some of which are further subdivided into subcharacteristics) that relate to the outcome of interaction when a product is used in a particular context of use.
          – I.e., is upgradability a quality of the system being upgraded?
      • The answer is yes.
  • 11. What kind of problem is this – process?
      • ITIL (Information Technology Infrastructure Library)
          – Change Management aims to ensure that standardised methods and procedures are used for efficient handling of all changes.
      • SPICE – ISO 15504
          – Process assessment provides the means of characterizing the current practice within an organizational unit in terms of the capability of the selected processes.
      • Is upgradability a quality of the process used to manage information technology?
      • The answer is yes.
  • 12. Upgradability is a hybrid quality problem
      • A hybrid quality problem is one in which improvement involves both product and process and in which the product has process awareness.
      • Many product centered conferences
          – Dependability
          – Security
          – …
      • Some process centered conferences
          – Software Process Improvement
          – SPICE
          – SPEG
          – …
  • 13. Hybrid quality improvement is not well served by the academic community
      • Hybrid quality improvement, as we shall see, involves close interaction between product, process, and tools to support the process.
      • Venues that should emphasize this interaction include
          – Profes (Product focused Software Development and Process Improvement)
          – ASQ (Conference on Quality and Improvement)
      • Yet an examination of the CFPs and proceedings for these conferences shows a distinction between process activities and product characteristics
      • We will present the results of an FMEA (Failure Mode and Effects Analysis) style analysis for upgradability and then return to the hybrid quality issue
  • 14. FMEA
      • Failure Mode and Effects Analysis is an inductive technique for analyzing failure modes.
      • FMEA involves describing
          – Potential failure modes
          – The severity and likelihood of these failures.
      • We will focus on the first portion and generate the potential failure modes as well as potential prevention, detection, and recovery from these failures.
      • I.e., we are performing an FMEA-style analysis, not an FMEA per se.
  • 15. Scenario for Upgradability
      • We are concerned with the following scenario
          – Version N+1 of an enterprise system is available for deployment.
              • Version N+1 can be deployed by developers
              • Version N+1 can be deployed by the Information Technology Department (the Release Manager, if there is one).
          – Version N+1 is completely coded and tested by its developers.
      • Measures can include
          – Downtime
          – Resources (hardware or personnel) required to perform the upgrade
          – Number of failed attempts to install the upgrade
  • 16. Fundamental goals during upgrade
      • The literature identifies four fundamental goals while an upgrade is occurring.
          – Efficiently manage resources
          – Completely and correctly specify configurations
          – Manage multiple versions to avoid problems with version mismatch.
          – Maintain consistency of persistent data.
      • Failures are caused by the violation of one of these fundamental goals.
          – Our FMEA-style analysis will look at potential causes for violations of these goals.
  • 17. Activities during an upgrade of a system
      • Make the upgrade available.
      • Prepare the environment. Ensure that there are sufficient resources available for installation and that assumed software is available.
      • Configuration
      • Deployment
      • Activation
  • 18. Organization of next portion of the presentation
      • For each activity
          – Potential fault (a fault is a failure in waiting)
          – Prevention of the fault
          – Detection of the fault
          – Correction of the fault
      • Research opportunity
          – Blank cell
          – Cell with only partial coverage
  • 19. Make Upgrade available
      Fault possibilities, with the prevention, detection, and recovery measures listed for each:
      • Element omitted/included incorrectly in installing software
          – Manifest
          – Bill of lading
          – Recreate distribution
      • System corrupted during movement
          – Hash code, checksum
          – Retransmit
      • Source of distribution from an untrusted site
          – Digital signature
      • Forgotten/misplaced credentials
          – Separate secret
          – Independent channel for new credentials
      • Credential verifier unavailable
          – Codify acceptable credentials in distribution
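The manifest and hash-code/checksum entries in this table could be combined roughly as follows. This is only a sketch under stated assumptions, not something shown in the talk: the function names `sha256_of` and `verify_distribution` are hypothetical, and the manifest is assumed to be a mapping from relative file name to the expected SHA-256 digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_distribution(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under `root` with a manifest of expected digests.

    Hypothetical helper: reports files that the manifest names but that are
    missing, files whose digest differs (corrupted in movement), and files
    that are present but not listed in the manifest.
    """
    present = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    listed = set(manifest)
    return {
        "missing": sorted(listed - present),
        "corrupted": sorted(
            name for name in listed & present
            if sha256_of(root / name) != manifest[name]
        ),
        "unexpected": sorted(present - listed),
    }
```

A non-empty `missing` or `unexpected` list corresponds to the "element omitted/included incorrectly" fault, and a non-empty `corrupted` list to "system corrupted during movement", for which the table lists retransmission as a measure.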
  • 20. Prepare environment
      Fault possibilities, with the prevention, detection, and recovery measures listed for each:
      • Incorrect versions of support libraries
          – Include version number in APIs
          – Encode hash of specification
          – Utilize services to announce incompatibilities
      • Multiple versions of support libraries simultaneously required
          – Include version number in name
          – Libraries expose version numbers
          – Linkers version aware
      • Insufficient resources
          – Rolling upgrade
      • Schema modification on database
          – Convert data to new schema prior to upgrade
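If libraries expose their version numbers, a preparation step can check them before deployment starts. The sketch below is illustrative only: `parse_version`, `check_support_libraries`, and the representation of a requirement as a (minimum inclusive, maximum exclusive) pair of dotted version strings are all assumptions, not part of the talk.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '2.10.1' into (2, 10, 1)."""
    return tuple(int(part) for part in text.split("."))


def check_support_libraries(
    required: dict[str, tuple[str, str]],
    installed: dict[str, str],
) -> list[str]:
    """Return a list of problems; an empty list means the environment fits.

    required:  library name -> (minimum version, inclusive; maximum, exclusive)
    installed: library name -> version string exposed by the library
    """
    problems = []
    for name, (low, high) in sorted(required.items()):
        if name not in installed:
            problems.append(f"{name}: not installed")
            continue
        version = parse_version(installed[name])
        if not (parse_version(low) <= version < parse_version(high)):
            problems.append(
                f"{name}: found {installed[name]}, need >={low},<{high}"
            )
    return problems
```

Comparing versions as integer tuples rather than strings avoids treating 2.10 as older than 2.9.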
  • 21. Configuration
      Fault possibilities, with the prevention, detection, and recovery measures listed for each:
      • Missing parameter
          – Parameter database
          – Parameter built into tool
          – Static analysis of code
      • Incorrectly specified parameter
          – Abstract specification
          – Check syntax
          – Validate against a specification
      • Inconsistent parameters
          – Constraint checker
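The three configuration faults above (missing, incorrectly specified, inconsistent) can each be caught by a small validator. The following is a minimal sketch, assuming a hypothetical specification format; `Param`, `validate_config`, and the example parameter names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """Hypothetical specification entry for one configuration parameter."""
    kind: type
    required: bool = True


def validate_config(config, spec, constraints=()):
    """Return a list of error strings; an empty list means the config passes.

    spec:        parameter name -> Param
    constraints: iterable of (message, predicate over the whole config)
    """
    errors = []
    for name, param in sorted(spec.items()):
        if name not in config:
            if param.required:
                errors.append(f"missing parameter: {name}")
        elif not isinstance(config[name], param.kind):
            errors.append(
                f"incorrectly specified: {name} should be {param.kind.__name__}"
            )
    # Cross-parameter constraints are only meaningful once every
    # individual parameter is present and well-typed.
    if not errors:
        for message, holds in constraints:
            if not holds(config):
                errors.append(f"inconsistent parameters: {message}")
    return errors


# Example specification and constraint (names are illustrative only).
EXAMPLE_SPEC = {
    "port": Param(int),
    "min_heap_mb": Param(int),
    "max_heap_mb": Param(int),
    "log_dir": Param(str, required=False),
}
EXAMPLE_CONSTRAINTS = [
    ("min_heap_mb must not exceed max_heap_mb",
     lambda c: c["min_heap_mb"] <= c["max_heap_mb"]),
]
```

Running such a check before deployment turns a fault that would otherwise surface at execution time into one detected during configuration.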
  • 22. Deployment
      Fault possibilities, with the prevention, detection, and recovery measures listed for each:
      • Insufficient resources
          – Pre-allocate during preparation
          – Rolling upgrade
      • Inconsistent hardware
          – Verify during preparation
      • Operator error
          – Undo mechanism
  • 23. Activation
      Fault possibilities, with the prevention, detection, and recovery measures listed for each:
      • Discovered hidden dependency
          – Monitoring
          – Recovery block
      • Multiple simultaneous versions
          – Separation
          – Dynamic Software Update
          – Version aware code and data
          – Automatic translation of data when old schema is used
          – Version aware load balancer
  • 24. Our activities in this space so far (green cells)
      • Mixed version race condition solution
      • Operator undo
  • 25. What is the “mixed version race condition”?
      • Common practice when pushing an upgrade to a large number of servers is to upgrade one (or several) servers at a time
      • This means that version N+1 (the new version) will be available on some servers and version N (the old version) will be available on other servers.
      • Suppose version N+1 has functionality not available in version N
  • 26. Now consider the following sequence
      1. A client (browser) issues a request that is routed by the load balancer to an instance of version N+1
      2. Version N+1 sends JavaScript assuming new functionality back to the client.
      3. Client sends an AJAX request that utilizes new functionality and the load balancer routes it to an instance of version N.
      4. Error because version N does not have the new functionality.
  • 27. Mixed Version Race Condition
      [Sequence diagram between client (browser) and server: 1. start rolling upgrade; 2. initial request; 3. HTTP reply with embedded JavaScript from the new version; 4. AJAX callback; 5. callback reaches the old version: ERROR]
  • 28. What does the solution involve?
      1. Label communication between instances and the client with version information
      2. Modify load balancer so that messages are routed to an appropriate version
      3. Modify load balancer so that messages are balanced across all child instances.
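The three steps above can be sketched for a single load balancer as follows. This is an illustration under simplifying assumptions, not the solution presented in the talk: the class `VersionAwareBalancer` and its methods are hypothetical, versions are plain integers, and the harder case of several distributed load balancers that must be kept in sync is not modelled.

```python
class VersionAwareBalancer:
    """Hypothetical single load balancer that routes by version label."""

    def __init__(self):
        self._version = {}  # instance name -> version it currently runs
        self._load = {}     # instance name -> number of requests handled

    def register(self, instance, version):
        """Add an instance, or record its new version after it is upgraded."""
        self._version[instance] = version
        self._load.setdefault(instance, 0)

    def route(self, session_version=None):
        """Pick an instance for one request and return (instance, version).

        A first request carries no version label and may go to any instance.
        The reply is labelled with the returned version, and follow-up
        requests pass that label back so they only reach matching instances.
        Among the eligible instances the least-loaded one is chosen.
        """
        candidates = [
            name for name, version in self._version.items()
            if session_version is None or version == session_version
        ]
        if not candidates:
            raise LookupError(f"no instance runs version {session_version}")
        chosen = min(candidates, key=lambda name: (self._load[name], name))
        self._load[chosen] += 1
        return chosen, self._version[chosen]
```

With one instance on version N+1 and two on version N, a client whose first reply came from version N+1 has its AJAX callback routed back to version N+1 even when the version N instances are less loaded, which avoids the error in the sequence described earlier.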
  • 29. Why is this a hard problem?
      • Large installations have multiple distributed load balancers that must be kept in sync, i.e. some load balancers may know about the new version and some may not
      • It is not enough to put the version number in the message
          – Suppose the second request goes to a load balancer that does not yet know about version N+1.
      • Messages must be kept balanced so that all servers handle roughly the same number of requests.
      [Diagram: load balancer endpoints /service, /service/vN and /service/vN+1 in front of six servers]
  • 30. Operator undo
      • After performing an operation in AWS, an operator may want to go back to the original state, i.e. undo the operation
      • This is not always straightforward:
          – Attaching a volume is no problem while the instance is running; detaching might be problematic
          – Creating / changing auto-scaling rules has an effect on the number of running instances
              • Cannot terminate additional instances, as the rule would create new ones!
          – Deleted / terminated / released resources are gone!
  • 31. Undo for System Operators
      [Diagram. Administrator: begin-transaction, do, do, do, rollback. Additional operations: commit, pseudo-delete.]
  • 32. Approach
      [Diagram. Administrator: begin-transaction, do, do, do, rollback. Undo System: sense cloud resource states (shown twice).]
  • 33. Approach
      [Diagram. Administrator: begin-transaction, do, do, do, rollback. Undo System: sense cloud resource states (shown twice); goal state; initial state.]
  • 34. Approach
      [Diagram. Administrator: begin-transaction, do, do, do, rollback. Undo System: sense cloud resource states (shown twice); goal state; initial state; plan; set of actions; generate code; execute.]
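The flow in this diagram (sense states, derive a plan, execute a set of actions) can be illustrated with a much simpler stand-in for the planner: a plain difference between two sensed states. Everything in the sketch is an assumption made for illustration. The state sensed at begin-transaction is taken to be the goal of the undo and the state sensed at rollback its starting point; resources are modelled as a mapping from identifier to attributes; `plan_undo` and `apply_steps` are hypothetical names; and the "restore" step presumes the resource was only pseudo-deleted, since truly deleted resources are gone.

```python
def plan_undo(checkpoint, current):
    """Return (action, resource_id, attributes) steps leading back to checkpoint.

    checkpoint: resource states sensed at begin-transaction (goal of the undo)
    current:    resource states sensed at rollback (where the undo starts)
    """
    steps = []
    # Resources created since the checkpoint are removed.
    for rid in sorted(current.keys() - checkpoint.keys()):
        steps.append(("delete", rid, None))
    # Resources removed since the checkpoint are brought back; this assumes
    # they were pseudo-deleted and can still be restored.
    for rid in sorted(checkpoint.keys() - current.keys()):
        steps.append(("restore", rid, checkpoint[rid]))
    # Resources whose attributes changed are reset to their old values.
    for rid in sorted(checkpoint.keys() & current.keys()):
        if checkpoint[rid] != current[rid]:
            steps.append(("update", rid, checkpoint[rid]))
    return steps


def apply_steps(state, steps):
    """Apply planned steps to a copy of `state` and return the result."""
    result = {rid: dict(attrs) for rid, attrs in state.items()}
    for action, rid, attrs in steps:
        if action == "delete":
            del result[rid]
        else:  # "restore" and "update" both set the checkpoint attributes
            result[rid] = dict(attrs)
    return result
```

A plain difference ignores ordering constraints between actions, such as an auto-scaling rule that would recreate terminated instances, which is one reason a planner is a better fit for the real problem than this sketch.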
  • 35. Upgradability as a process and product quality
      • The architecture of the system being upgraded can affect the process of installation
          – Suppose the system checks for version information from dependent libraries. Then the process must encompass descriptions of what to do if an error condition occurs.
      • The process of upgrade can affect the architecture of the product.
          – Suppose the process is supported by a tool that checks the health of the installation of version N+1. Then the system must make visible the information used by this tool.
  • 36. Summary
      • Upgrade is an important problem
          – Upgrade failures affect user satisfaction
          – Upgrade failures happen frequently
      • Upgrade involves the interaction of product and process quality issues.
          – Communities are focussed on improving the quality of the process or the product, not the joint process/product quality.
      • Multiple opportunities for research exist.
  • 37. Q&A
      Thank You!
      Research study opportunities in dependable cloud computing:
      • Software Architecture
      • Data Management
      • Performance Engineering
      • Autonomic Computing
      To find out more, send your CV and undergraduate details to students@nicta.com.au