Using the SLALOM model to improve Cloud SLAs
Efstathios Karanastasis
ICCS/NTUA
SLALOM Technical Track
2
Problem snapshot
SLA Technological Landscape
• A lot of ambiguities exist in SLAs of Cloud providers
• The measurement/auditing of an SLA cannot currently be done in a
non-repudiable way
– i.e., the involved parties may be able to challenge the auditing of the SLOs
• Standard models are rare and are not widely used
• Differences between Cloud providers cannot be easily assessed
– Absolute percentages cannot be compared among providers
3
Problem snapshot
Ambiguities in SLAs
• Availability, as defined by each provider, may encapsulate a different
formula for its calculation
• The definition and calculation of availability may include different ways of
identifying a failure, e.g.:
– Response time less than a limit
– Returned response within a string enumeration
(i.e. a predefined range of string values)
• Preconditions apply
4
Problem snapshot
Real world example of Ambiguity
• Ambiguity in the measurement process of AWS EC2 SLA
• “Unavailable” and “Unavailability” mean:
– When all of your running instances have no external connectivity
• Determination of external connectivity. How?
– Internet Layer: Pinging (ICMP)?
• Security threat
– Application layer: Endpoint checking?
• Includes application downtime
• Not exclusively the responsibility of AWS EC2
5
Problem snapshot
Examples of preconditions
• For any SLA to apply, a number of preconditions typically exist per
provider
• Examples:
– Deployment: A specified number of Availability Zones must be used
– Deployment: Replication options must be used
– Usage/Measurement: Unavailable resources must first be restarted
– Usage/Measurement: The number of requests must be throttled
6
Problem snapshot
SLALOM Technical objectives
• To have a standard model for defining SLAs that eliminates ambiguities
• To facilitate the measurement, monitoring and enforcement of SLAs to
achieve non-repudiability
• To abstract the SLA definition process (SLA → SLO → metric → sub-metric)
so as to enable the application of metrics that allow for
direct comparability
7
Interaction with Standards
8
SLALOM@ISO
Interaction with ISO
• Mapped SLALOM 3-layer initial approach to ISO baseline model
– ISO approach powerful at describing more complex metrics (e.g. MS Azure SLA)
• Demonstrated and suggested the extensibility of the ISO model for fully defining the
way an SLO can be audited – ACCEPTED
– Suggested the inclusion of an Extension class in the ISO model
– Instantiate the ISO Extension class as the base Sample class of SLALOM
– Introduce the SLALOM Sample layer for concretely defining the sampling process
– In the latest revision of the draft ISO model all classes are extendable
• Applied on different types of Objectives of Commercial SLAs
– GAE Datastore (PaaS)
– AWS EC2 (IaaS)
– Microsoft Azure (Storage)
• Showed applicability of the proposed approach for directly creating machine
understandable descriptions of the SLOs
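One way to picture a machine-understandable SLO description is as a small typed record. The sketch below is an illustration only: the class and field names are hypothetical and are not taken from ISO 19086-2 or from the SLALOM specification. The values are the ones quoted for AWS EC2 later in this deck.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SampleDefinition:
    operation: str           # type of operation being sampled
    failure_condition: str   # what makes a single sample count as failed


@dataclass(frozen=True)
class BoundaryDefinition:
    boundary_period_s: int   # bp, in seconds
    error_threshold: float   # ec, fraction of failed samples within one bp


@dataclass(frozen=True)
class SloDescription:
    name: str
    sample: SampleDefinition
    boundary: BoundaryDefinition
    availability_target_pct: float


# Hypothetical encoding of the AWS EC2 availability objective.
ec2 = SloDescription(
    name="AWS EC2 availability",
    sample=SampleDefinition(operation="ping", failure_condition="UNDEFINED"),
    boundary=BoundaryDefinition(boundary_period_s=60, error_threshold=1.0),
    availability_target_pct=99.95,
)

# Serialise to JSON so that a tool, not only a human, can read the SLO.
ec2_json = json.dumps(asdict(ec2), indent=2)
print(ec2_json)
```

Keeping the sample, boundary and metric layers as separate records mirrors the layered approach described above and makes an undefined field (here the failure condition) explicit instead of implicit.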
9
SLALOM@ISO
ISO 19086-2 Metric model
• SLALOM two-fold contribution:
– ISO model classes parameters: machine understandable
– ISO model extension: definition of sampling process
10
[Figure: ISO 19086-2 metric model with the proposed SLALOM extension. The model is from the latest revision of the 19086-2 draft standard, to be made available in the forthcoming weeks. All classes are extendable.]
SLALOM@ISO
SLALOM vs. ISO compliance
ISO-compliant SLA
• Usage of the ISO fields (classes, parameters)
• SLA not necessarily fully defined
SLALOM-compliant SLA
• ISO compliant
• Clear and well-defined
• Non-repudiable
• SLAs still not comparable among providers
11
Mapping of Commercial SLAs
12
Commercial SLAs @SLALOM
Amazon WS EC2
Amazon EC2
Level / definition | Expression | Notes
Sample definition | sc: UNDEFINED (assumed ‘ping’ -> ICMP) | The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
Sample definition | Type of operation: ping | Not defined how the condition of connectivity can be actually measured (e.g. the ping operation mentioned previously).
Boundary period and error definitions | bp > 60 sec | The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
Boundary period and error definitions | ec = 100% | Error condition reflecting that the error ratio is that for the entire bp the resource must be continuously “unavailable”.
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
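The EC2 mapping can be turned into a small calculation. The following is a minimal sketch, not the provider's method: it assumes samples come from some probe (the SLA does not say which), that a minute without any sample counts as available, and that the function name and sample layout are hypothetical.

```python
from collections import defaultdict

BOUNDARY_PERIOD_S = 60           # bp: one minute
AVAILABILITY_TARGET_PCT = 99.95  # abstract metric threshold


def ec2_style_availability(samples, total_minutes):
    """Return availability in percent.

    samples: iterable of (timestamp_s, reachable) pairs from some probe.
    """
    per_minute = defaultdict(list)
    for timestamp_s, reachable in samples:
        per_minute[int(timestamp_s // BOUNDARY_PERIOD_S)].append(reachable)
    # ec = 100%: a minute is unavailable only if every sample in it failed.
    # Assumption: minutes without any sample count as available.
    unavailable = sum(1 for results in per_minute.values() if not any(results))
    return 100.0 * (total_minutes - unavailable) / total_minutes


# Ten minutes probed every 20 s: minute 3 fails completely, minute 5 only once.
samples = [(t, not (180 <= t < 240) and t != 300) for t in range(0, 600, 20)]
availability = ec2_style_availability(samples, total_minutes=10)
violated = availability < AVAILABILITY_TARGET_PCT
print(availability, violated)
```

Only the fully failed minute counts against availability. The single failed probe in minute 5 does not, which is exactly the effect of ec = 100%.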
13
Commercial SLAs @SLALOM
Google AE Datastore
Google AppEngine Datastore
Level / definition | Expression | Notes
Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
Boundary period and error definitions | bp > 300 sec | The exact wording is “five consecutive minutes”.
Boundary period and error definitions | ec > 10% | Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
14
Commercial SLAs @SLALOM
Microsoft Azure
15
Microsoft Azure Storage
Level / definition | Expression | Notes
Sample definition | sc = 60 sec | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “Sixty (60) seconds” for PutBlockList and GetBlockList.
Sample definition | Type of operation: PutBlockList and GetBlockList | Several types of operations are defined. An example is provided here.
Boundary period and error definitions | bp > 3600 sec | The exact wording is “given one-hour interval”.
Boundary period and error definitions | ec > 0% | Error condition reflecting that all periods should be taken into account for the availability metric evaluation (exact wording) “is the sum of Error Rates for each hour”.
Abstract metric definition | availability < 99.9 % | Availability metric definition given the boundary period and error condition.
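The Azure Storage mapping differs from the previous two in that every hour contributes to the metric. A minimal sketch under stated assumptions: an operation counts as failed when it takes longer than the 60-second limit, the hourly error rates are averaged over the number of hours (the slide only quotes “the sum of Error Rates for each hour”), and all function names are hypothetical.

```python
MAX_DURATION_S = 60.0            # sc: limit quoted for PutBlockList / GetBlockList
AVAILABILITY_TARGET_PCT = 99.9   # abstract metric threshold


def hourly_error_rate(durations_s):
    """Fraction of operations in one hour that exceeded the time limit."""
    if not durations_s:
        return 0.0  # assumption: an hour without requests has no errors
    failed = sum(1 for d in durations_s if d > MAX_DURATION_S)
    return failed / len(durations_s)


def azure_style_availability(hours):
    """hours: list of per-hour lists of operation durations in seconds."""
    rates = [hourly_error_rate(h) for h in hours]
    # Assumption: the sum of hourly error rates is divided by the hour count.
    average_error_rate = sum(rates) / len(rates)
    return 100.0 * (1.0 - average_error_rate)


# Four hours of traffic; one of four operations in the first hour is too slow.
hours = [[1.0, 2.0, 70.0, 3.0], [1.5, 2.5], [0.8], [1.1, 1.2, 1.3]]
availability = azure_style_availability(hours)
violated = availability < AVAILABILITY_TARGET_PCT
print(availability, violated)
```

Because ec > 0%, a single slow operation already lowers the metric, whereas the EC2 and Datastore definitions ignore errors below their thresholds.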
SLA Comparability
16
SLA comparability
Overview
• Despite the fact that through the SLALOM / ISO model SLA descriptions
may be aligned, this does not mean that SLAs (or their parameters) will be
directly comparable
• Need for more abstract metrics, that result in direct comparisons
– SLA success ratio (Published* by Cloud WG of SPEC**)
– SLA strictness (Published* by Cloud WG of SPEC**)
– Standardised datasets
• SLALOM model enables the application of comparable metrics
– All SLA parameters are clearly and well defined
– The SLAs are machine readable
– Greatly simplifies the process and its automation
* Ready for Rain? A View from SPEC Research on the Future of Cloud Metrics
** SPEC: Standard Performance Evaluation Corporation
17
SLA comparability
Comparative metrics
• SLA success ratio
– Based on experience of usage of a service or provider
– Over time, keep track of successful, violated and total SLAs
– Calculate the ratio: (Successful SLAs / Total SLAs)
• SLA strictness
– Extract static SLA parameters of importance for a given domain or application
– Assign weights to parameters and normalise
– Map these parameters to an arbitrary function
– Results in a comparative ranking of different SLAs
• Standardised datasets
– Define a set of failure scenarios
– Benchmark each provider SLA definition against the predefined scenario
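The SLA success ratio is the simplest of these metrics to automate. The sketch below assumes a hypothetical record type that stores, per concluded SLA, the provider and whether the SLA was violated. It is an illustration, not the published SPEC definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaRecord:
    provider: str
    violated: bool


def success_ratio(records, provider):
    """Successful SLAs / total SLAs for one provider, or None if no history."""
    own = [r for r in records if r.provider == provider]
    if not own:
        return None
    successful = sum(1 for r in own if not r.violated)
    return successful / len(own)


# Hypothetical usage history for two providers.
history = [
    SlaRecord("provider-a", False),
    SlaRecord("provider-a", False),
    SlaRecord("provider-a", True),
    SlaRecord("provider-a", False),
    SlaRecord("provider-b", False),
    SlaRecord("provider-b", False),
]
ratio_a = success_ratio(history, "provider-a")
ratio_b = success_ratio(history, "provider-b")
print(ratio_a, ratio_b)
```

The ratio is only meaningful if "violated" is decided the same way for every provider, which is the point of having clearly defined SLA parameters.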
18
SLA-related Lessons Learnt for Cloud Uptake
19
Lessons Learnt
Do
1) Target metrics that are directly comparable among providers
2) Consider directly machine understandable descriptions via standardised
templates
3) Look into the ISO 19086 series of standards and adopt if applicable
4) Think outside the narrow Cloud box. With the advent of *aaS and the
emergence of IoT, SLAs may refer to services external to the data center or to
specific metrics needed by Cloud Services based on the individual Use Case
5) Consider composite services that may create chains of SLAs and their
interdependencies. For guaranteeing response time to service-support services
consider downstream (reseller) and upstream (e.g. provider’s subcontractors)
actors’ requirements and the need to ‘float’ SLA clauses down the chain
6) Consider resource management as a key part of SLA upkeep and analysis process
7) Consider mechanisms that would allow providers, resellers and users to easily
monitor the SLA in a common and understandable way, even if they are not experts.
20
Lessons Learnt
Don’t
1) Consider that offered terms are equivalent, even if they originally seem to refer
to the same SLO. Always check the fine print for differences in how metrics are
actually calculated
2) Consider that SLAs are monitored by providers.
3) Leave end users out of the loop. Comprehensibility and clarity of an SLA (or its
relevant metric) for non-experts should be a key target. Translate your metrics
into plain English if necessary.
4) Limit yourself to popular metrics (e.g. availability) in SLAs. Users are also
interested in more generic Quality of Experience (QoE) indexes such as stability
5) Expect the market to bend for you: fit in with current practice as far as
possible and, where that is not possible, hone your value proposition
21
SLALOM Contribution and Expected Impact
22
SLALOM contribution
Tender Evaluation
• Usable by various actors
– Adopters to specify their needs
– Providers to describe their value proposition
– Third parties (resellers/brokers) to combine and offer services and
suggest options
• Added value
– Application of comparative metrics
– Automation of the process
• Benefits
– Improve transparency
– Enhance efficiency
– Establish fairness
23
SLALOM contribution
Contract monitoring
• Benefits
– Achieve SLA non-repudiation
– Establish trust and transparency for service execution compliant to
the terms and proper violation management
– Enable automation of contract and performance management and
monitoring
– Aid the involvement of actors like trusted third parties offering
relevant services
24
• SLALOM proposed specification / reference model already
takes into account:
– Standardisation approaches and working groups outcomes
– Current SLAs and metrics offered by commercial Cloud providers
– Views expressed by Cloud providers and adopters
– Research outcomes
• Further feedback regarding applicability and practical usage
of our model is more than welcome
• Please take the survey on IoT/Cloud metrics here:
https://docs.google.com/forms/d/1JmwDXyO_1hT9iR-lm1c3LCQu_zF64nf-uFnxBeGMv3g/viewform
25
SLALOM contribution
Your feedback needed
Contact us
26
• SLALOM Technical WP Leader
ekaranas@mail.ntua.gr
vandro@mail.ntua.gr
gkousiou@mail.ntua.gr
• SLALOM Project Coordinator
daniel.field@atos.net
?
SLALOM Project 27
SLALOM is a CSA financed by the European
Commission under Grant agreement 644270
For more information on the initiative contact us:
@CloudSLAlom
www.SLALOM-Project.eu
SLALOM Project Coordinator
(daniel.field@atos.net)
Backup slides
SLA Strictness example
28
SLA strictness example
29
Provider/Service | t | q (s1 * q) | q’ (s2 * q) | p (s3 * p) | x | S | S’
Google Compute | 0 | 5 (1.00) | 5 (0.10) | 99.95 (0.50) | 0 | 0.50 | 1.60
Amazon EC2 | 0 | 1 (0.20) | 1 (0.02) | 99.95 (0.50) | 0 | 1.30 | 1.48
MS Azure Compute | 1 | 1 (0.20) | 1 (0.02) | 99.95 (0.50) | 0 | 2.30 | 2.48
• Extract static SLA parameters of importance for a given domain/application
– All these parameters (e.g. boundary period, error rates) are described in the SLALOM model
• Map these parameters to an arbitrary function, e.g.:
$S = t + (1 - s_{1/2}\,q) + s_3\,p + x$, where:
– q: size of the boundary period
– p: percentage of availability
– t: running time vs. overall monthly time (boolean), t ϵ {0,1}
– x: existence of performance metrics (boolean), x ϵ {0,1}
– $s_i$: normalisation factor for the continuous variables so that:
(s1*q) ϵ [0,1], (s2*q) ϵ [0,0.1] and (s3*p) ϵ [0,0.5]
• Resulting value may be compared between providers
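The strictness values in the table can be recomputed with a few lines. This sketch reads the formula as S = t + (1 - s1·q) + s3·p + x, which is an assumption about the flattened slide notation. The normalisation factors are back-derived from the bracketed table entries (5 · 0.2 = 1.00, 5 · 0.02 = 0.10, 99.95 · 0.005 ≈ 0.50) and are not stated in the source.

```python
# Normalisation factors back-derived from the table (assumption, see above).
S1, S2, S3 = 0.2, 0.02, 0.005


def strictness(t, q, p, x, s_q):
    """S = t + (1 - s_q * q) + S3 * p + x, for a chosen factor s_q."""
    return t + (1 - s_q * q) + S3 * p + x


# (t, q, p, x) per provider, copied from the table.
providers = {
    "Google Compute": (0, 5, 99.95, 0),
    "Amazon EC2": (0, 1, 99.95, 0),
    "MS Azure Compute": (1, 1, 99.95, 0),
}

s_scores = {
    name: round(strictness(*params, s_q=S1), 2)
    for name, params in providers.items()
}
s_prime_scores = {
    name: round(strictness(*params, s_q=S2), 2)
    for name, params in providers.items()
}
print(s_scores)
print(s_prime_scores)
```

With s1 this reproduces the S column (0.50, 1.30, 2.30). With s2 it reproduces the S’ values for Amazon EC2 and MS Azure Compute (1.48, 2.48) but yields 1.40 for Google Compute where the table shows 1.60, so the reading of S’ remains an assumption.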
Backup slides
Mapping of AWS EC2 SLA
30
AWS EC2 SLA @SLALOM (1/9)
Amazon EC2
Level / definition | Expression | Notes
Sample definition | sc: UNDEFINED (assumed ‘ping’ -> ICMP) | The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
Sample definition | Type of operation: ping | Not defined how the condition of connectivity can be actually measured (e.g. the ping operation mentioned previously).
Boundary period and error definitions | bp > 60 sec | The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
Boundary period and error definitions | ec = 100% | Error condition reflecting that the error ratio is that for the entire bp the resource must be continuously “unavailable”.
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
31
AWS EC2 SLA @SLALOM (2/9)
32
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
[Diagram: model blocks for the AWS EC2 SLA]
• Condition of SLA violation specification
• Availability threshold specification
• Availability definition and calculation
• Billing period specification
• Unavailability definition and calculation
• Unavailability interval definition and calculation
• Boundary period specification
• Unreachable sample specification
• Sample definition and retrieval
[Block identifiers shown: PARAM_001, PARAM_002, SAMPLE_001, QDT_001, UAP_001, BP_001, CFA_002, PARAM_003, CONDITION]
AWS EC2 SLA @SLALOM (3/9)
33
• Examples of preconditions:
– Deployment: Number of Availability Zones used
– Deployment: Replication options used
– Usage/Measurement: Restarting of resources when unavailable
– Usage/Measurement: Applied Throttling of requests
• Practical suggestions:
– Define the Rules class strictly in terms of the preconditions
that must hold for the SLA to apply
– Use the Note field as a placeholder for the actual SLA text that refers to a
given block
AWS EC2 SLA @SLALOM (4/9)
34
Sample definition | sc: UNDEFINED (assumed ‘ping’ -> ICMP) | The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
Sample definition | Type of operation: ping | Not defined how the condition of connectivity can be actually measured (e.g. the ping operation mentioned previously).
[Diagram block shown: SAMPLE_001]
AWS EC2 SLA @SLALOM (5/9)
35
Boundary period and error definitions | bp > 60 sec | The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
Boundary period and error definitions | ec = 100% | Error condition reflecting that the error ratio is that for the entire bp the resource must be continuously “unavailable”.
[Diagram blocks shown: PARAM_001, PARAM_002, SAMPLE_001]
AWS EC2 SLA @SLALOM (6/9)
36
• Calculation of Cloud Service Unavailability Interval
• Based on:
- The current sample
- The defined boundary period
- The definition of unreachable sample
[Diagram blocks shown: PARAM_001, PARAM_002, SAMPLE_001, QDT_001]
AWS EC2 SLA @SLALOM (7/9)
37
• Calculation of Cloud Service Unavailability
• Based on:
- The Cloud Service Unavailability Interval
[Diagram blocks shown: PARAM_001, PARAM_002, SAMPLE_001, QDT_001, UAP_001]
AWS EC2 SLA @SLALOM (8/9)
38
• Calculation of Cloud Service Availability
• Based on:
- Billing period
- The Cloud Service Unavailability
[Diagram blocks shown: PARAM_001, PARAM_002, SAMPLE_001, QDT_001, UAP_001, BP_001, CFA_002]
AWS EC2 SLA @SLALOM (9/9)
39
• SLA Violation Condition
- i.e.: Availability < 99.95%
[Diagram blocks shown: PARAM_001, PARAM_002, SAMPLE_001, QDT_001, UAP_001, BP_001, CFA_002, PARAM_003, ASV_001]
Backup slides
Mapping of GAE Datastore SLA
40
GAE Datastore SLA @SLALOM(1/11)
Google AppEngine Datastore
Level / definition | Expression | Notes
Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
Boundary period and error definitions | bp > 300 sec | The exact wording is “five consecutive minutes”.
Boundary period and error definitions | ec > 10% | Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
41
GAE Datastore SLA @SLALOM(2/11)
42
[Diagram: model blocks for the GAE Datastore SLA, listed top to bottom]
• ASV_001: Condition of SLA Violation specification
• PARAM_004: Availability threshold specification
• CFA_002: Availability definition and calculation
• BP_001: Billing Period specification
• UAP_001: Unavailability definition and calculation
• QDT_001: Unavailability Interval definition and calculation
• DUR_001: Sampling Period duration definition and calculation
• ER_001: Error Rate definition and calculation
• PARAM_001: Boundary Period specification
• PARAM_002: Error Rate threshold specification
• PARAM_003: Unreachable sample values specification
• SAMPLE_001: Sample definition and retrieval
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
GAE Datastore SLA @SLALOM(3/11)
43
• Examples of preconditions:
– Deployment: Number of Availability Zones used
– Deployment: Replication options used
– Usage/Measurement: Restarting of resources when unavailable
– Usage/Measurement: Applied Throttling of requests
• Practical suggestions:
– Define the Rules class strictly in terms of the preconditions
that must hold for the SLA to apply
– Use the Note field as a placeholder for the actual SLA text that refers to a given
block
GAE Datastore SLA @SLALOM(4/11)
44
Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
[Diagram block shown: SAMPLE_001]
GAE Datastore SLA @SLALOM(5/11)
45
Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
[Diagram blocks shown: SAMPLE_001, PARAM_003]
GAE Datastore SLA @SLALOM(6/11)
46
Boundary period and error definitions | bp > 300 sec | The exact wording is “five consecutive minutes”.
Boundary period and error definitions | ec > 10% | Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
[Diagram blocks shown: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001]
GAE Datastore SLA @SLALOM(7/11)
47
• Calculation of duration of sampling period:
- The period during which a number of samples was received
- Period duration calculation based on samples timestamp
• Calculation of actual Error Rate for sampling period:
- Number of violation samples / number of total samples
- Violation samples: samples containing values from a specific values pool
[Diagram blocks shown: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001, DUR_001, ER_001]
GAE Datastore SLA @SLALOM(8/11)
48
• Calculation of Unavailability Interval
- IF [Sampling Period duration > Boundary Period]
- AND IF [Error Rate > Threshold (10%)]
- THEN [Unavailability Interval = Sampling Period duration]
[Diagram blocks shown: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001, ER_001, DUR_001, QDT_001]
GAE Datastore SLA @SLALOM(9/11)
49
• Calculation of Unavailability period
- It equals the SUM of Unavailability Intervals
[Diagram blocks shown: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001, ER_001, DUR_001, QDT_001, UAP_001]
GAE Datastore SLA @SLALOM(10/11)
50
• Calculation of Cloud Service Availability
• Based on:
- Billing period
- The Cloud Service Unavailability
[Diagram blocks shown: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001, ER_001, DUR_001, QDT_001, UAP_001, BP_001, CFA_002]
GAE Datastore SLA @SLALOM(11/11)
51
• SLA Violation Condition
- i.e.: Availability < 99.95%
[Diagram blocks shown: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001, ER_001, DUR_001, QDT_001, UAP_001, BP_001, CFA_002, PARAM_004, ASV_001]
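The GAE Datastore walkthrough above can be followed end to end in code. This is a minimal sketch under stated assumptions: sampling periods are given as pre-grouped lists of (timestamp, result) pairs, availability is computed as the share of the billing period not covered by unavailability intervals, and all function names are hypothetical.

```python
VIOLATION_VALUES = {"INTERNAL_ERROR", "TIMEOUT"}  # unreachable sample values
BOUNDARY_PERIOD_S = 300                           # five consecutive minutes
ERROR_RATE_THRESHOLD = 0.10                       # ten percent Error Rate
AVAILABILITY_THRESHOLD_PCT = 99.95


def period_duration(samples):
    """Duration of a sampling period, from the sample timestamps."""
    timestamps = [t for t, _ in samples]
    return max(timestamps) - min(timestamps)


def error_rate(samples):
    """Number of violation samples / number of total samples."""
    bad = sum(1 for _, value in samples if value in VIOLATION_VALUES)
    return bad / len(samples)


def unavailability_interval(samples):
    """Period duration if both slide conditions hold, otherwise 0."""
    if not samples:
        return 0
    duration = period_duration(samples)
    if duration > BOUNDARY_PERIOD_S and error_rate(samples) > ERROR_RATE_THRESHOLD:
        return duration
    return 0


def availability_pct(periods, billing_period_s):
    """Assumed formula: (billing period - unavailability) / billing period."""
    unavailable = sum(unavailability_interval(p) for p in periods)
    return 100.0 * (billing_period_s - unavailable) / billing_period_s


# Three sampling periods, sampled every 60 s.
bad = [(t, "TIMEOUT" if t in (60, 120) else "OK") for t in range(0, 361, 60)]
good = [(t, "OK") for t in range(1000, 1361, 60)]
short = [(t, "INTERNAL_ERROR") for t in range(2000, 2121, 60)]

billing_period_s = 30 * 24 * 3600  # a 30-day billing month
availability = availability_pct([bad, good, short], billing_period_s)
violated = availability < AVAILABILITY_THRESHOLD_PCT
print(round(availability, 4), violated)
```

The short period fails every call but is ignored because it does not exceed the boundary period, which illustrates how much the outcome depends on the exact bp and ec definitions.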

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

SLALOM Webinar Final Technical Outcomes Explained "Using the SLALOM Technical Model to Improve #Cloud #SLA" v1

  • 1. Using the SLALOM model to improve Cloud SLAs Efstathios Karanastasis ICCS/NTUA
  • 3. Problem snapshot SLA Technological Landscape • A lot of ambiguities exist in SLAs of Cloud providers • The measurement/auditing process of an SLA cannot be done non-repudiably – i.e., the involved parties may be able to challenge the auditing of the SLOs • Standard models are rare and are not widely used • Differences between Cloud providers cannot be easily assessed – Absolute percentages cannot be compared among providers 3
  • 4. Problem snapshot Ambiguities in SLAs • Availability (as defined by providers) definition may encapsulate different formulas for its calculation • The definition and calculation of availability may include different ways of identifying a failure, e.g.: – Response time less than a limit – Returned response within a string enumeration (i.e. a predefined range of string values) • Preconditions apply 4
  • 5. Problem snapshot Real world example of Ambiguity • Ambiguity in the measurement process of AWS EC2 SLA • “Unavailable” and “Unavailability” mean: – When all of your running instances have no external connectivity • Determination of external connectivity. How? – Internet Layer: Pinging (ICMP)? • Security threat – Application layer: Endpoint checking? • Includes application downtime • Not exclusively the responsibility of AWS EC2 5
  • 6. Problem snapshot Examples of preconditions • For any SLA to apply, a number of preconditions typically exist per provider • Examples: – Deployment: A specified number of Availability Zones must be used – Deployment: Replication options must be used – Usage/Measurement: Unavailable resources must first be restarted – Usage/Measurement: The number of requests must be throttled
  • 7. Problem snapshot SLALOM Technical objectives • To have a standard model for defining SLAs that eliminates ambiguities • To facilitate the measurement, monitoring and enforcement of SLAs to achieve non-repudiability • To abstract the SLA definition process (SLA → SLO → metric → sub-metric) so as to enable the application of metrics that allow for direct comparability
  • 9. SLALOM@ISO Interaction with ISO • Mapped SLALOM 3-layer initial approach to ISO baseline model – ISO approach powerful at describing more complex metrics (e.g. MS Azure SLA) • Demonstrated and suggested the ISO model Extendibility for fully defining the way an SLO can be audited – ACCEPTED – Suggested the inclusion of an Extension class in the ISO model – Instantiate the ISO Extension class as the base Sample class of SLALOM – Introduce the SLALOM Sample layer for concretely defining the sampling process – In the latest revision of the draft ISO model all classes are extendable • Applied on different types of Objectives of Commercial SLAs – GAE Datastore (PaaS) – AWS EC2 (IaaS) – Microsoft Azure (Storage) • Showed applicability of the proposed approach for directly creating machine understandable descriptions of the SLOs 9
  • 10. SLALOM@ISO ISO 19086-2 Metric model • SLALOM two-fold contribution: – ISO model classes parameters: machine understandable – ISO model extension: definition of sampling process [Figure: Model from the latest revision of the 19086-2 draft standard, to be made available in the forthcoming weeks. Annotations: “SLALOM - proposed extension”; “All classes extendible”.]
  • 11. SLALOM@ISO SLALOM vs. ISO compliance ISO-compliant SLA • Usage of the ISO fields (classes, parameters) • SLA not necessarily fully defined 11 SLALOM-compliant SLA • ISO compliant • Clear and Well-defined • Non-repudiable • SLAs still not comparable among providers
  • 13. Commercial SLAs @SLALOM: Amazon WS EC2
      Amazon EC2
      Level / definition | Expression | Notes
      Sample definition | sc: UNDEFINED (assumed ‘ping’ -> ICMP) | The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
      Sample definition | Type of operation: ping | Not defined how the condition of connectivity can be actually measured (e.g. the ping operation mentioned previously).
      Boundary period and error definitions | bp > 60 sec | The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
      Boundary period and error definitions | ec = 100% | Error condition reflecting that the error ratio is that for the entire bp the resource must be continuously “unavailable”.
      Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
  • 14. Commercial SLAs @SLALOM: Google AE Datastore
      Google AppEngine Datastore
      Level / definition | Expression | Notes
      Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
      Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
      Boundary period and error definitions | bp > 300 sec | The exact wording is “five consecutive minutes”.
      Boundary period and error definitions | ec > 10% | Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
      Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
  • 15. Commercial SLAs @SLALOM: Microsoft Azure
      Microsoft Azure Storage
      Level / definition | Expression | Notes
      Sample definition | sc = 60 sec | Several sampling conditions are defined per type of operation. For example it is specified (exact wording) “Sixty (60) seconds” for PutBlockList and GetBlockList.
      Sample definition | Type of operation: PutBlockList and GetBlockList | Several types of operations are defined. An example is provided here.
      Boundary period and error definitions | bp > 3600 sec | The exact wording is “given one-hour interval”.
      Boundary period and error definitions | ec > 0% | Error condition reflecting that all periods should be taken into account for the availability metric evaluation (exact wording) “is the sum of Error Rates for each hour”.
      Abstract metric definition | availability < 99.9 % | Availability metric definition given the boundary period and error condition.
  • 17. SLA comparability Overview • Although SLA descriptions may be aligned through the SLALOM / ISO model, this does not mean that SLAs (or their parameters) will be directly comparable • Need for more abstract metrics that result in direct comparisons – SLA success ratio (Published* by Cloud WG of SPEC**) – SLA strictness (Published* by Cloud WG of SPEC**) – Standardised datasets • SLALOM model enables the application of comparable metrics – All SLA parameters are clearly and well defined – The SLAs are machine readable – Greatly simplifies the process and its automation * Ready for Rain? A View from SPEC Research on the Future of Cloud Metrics ** SPEC: Standard Performance Evaluation Corporation
  • 18. SLA comparability Comparative metrics • SLA success ratio – Based on experience of usage of a service or provider – In the course of time keep track of successful or violated SLAs and total SLAs – Calculate the ratio: (Successful SLAs / Total SLAs) • SLA strictness – Extract static SLA parameters of importance for a given domain or application – Assign weights to parameters and normalise – Map these parameters to an arbitrary function – Results in a comparative ranking of different SLAs • Standardised datasets – Define a set of failure scenarios – Benchmark each provider SLA definition against the predefined scenario 18
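The SLA success ratio described on this slide, (Successful SLAs / Total SLAs), can be illustrated with a minimal Python sketch. The function name `success_ratio`, the provider names and the recorded outcomes are hypothetical and only serve the illustration; the slides do not prescribe how the history of SLA outcomes is stored.

```python
def success_ratio(outcomes):
    """Return Successful SLAs / Total SLAs for one service or provider.

    `outcomes` is a list of booleans collected over time:
    True  = the SLA was honoured in that period,
    False = the SLA was violated in that period.
    """
    if not outcomes:
        raise ValueError("at least one recorded SLA outcome is needed")
    return sum(1 for met in outcomes if met) / len(outcomes)


# Hypothetical usage history for two made-up providers.
history = {
    "provider-a": [True, True, False, True],
    "provider-b": [True, True, True, True],
}

# Rank providers from highest to lowest success ratio.
ranking = sorted(history, key=lambda name: success_ratio(history[name]), reverse=True)
```

Because the ratio is built from observed outcomes rather than from advertised percentages, it can be compared across providers even when their SLA definitions differ.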
  • 19. SLA-related Lessons Learnt for Cloud Uptake 19
  • 20. Lessons Learnt Do 1) Target metrics that are directly comparable among providers 2) Consider directly machine understandable descriptions via standardised templates 3) Look into the ISO 19086 series of standards and adopt if applicable 4) Think outside the narrow Cloud box. With the advent of *aaS and the emergence of IoT, SLAs may refer to services external to the data center or to specific metrics needed by Cloud Services based on the individual Use Case 5) Consider composite services that may create chains of SLAs and their interdependencies. For guaranteeing response time to service-support services consider downstream (reseller) and upstream (e.g. provider’s subcontractors) actors’ requirements and the need to ‘float’ SLA clauses down the chain 6) Consider resource management as a key part of SLA upkeep and analysis process 7) Consider mechanisms that would allow providers, resellers and users to easily monitor the SLA in a common and understandable way, even if not experts. 20
  • 21. Lessons Learnt Don’t 1) Consider that offered terms are equivalent, even if they originally seem to refer to the same SLO. Always check the fine print for differences in how metrics are actually calculated 2) Consider that SLAs are monitored by providers. 3) Leave end users out of the loop. Comprehensiveness and clarity of an SLA (or its relevant metric) for non-experts should be a key target. Translate your metrics into plain English if necessary. 4) Limit yourself to popular metrics (e.g. availability) in SLAs. Users are also interested in more generic Quality of Experience (QoE) indexes such as stability 5) Expect the market to bend for you: fit in to current practice to the maximum extent and if not possible, hone your value proposition 21
  • 22. SLALOM Contribution and Expected Impact 22
  • 23. SLALOM contribution Tender Evaluation • Usable by various actors – Adopters to specify their needs – Providers to describe their value proposition – Third parties (resellers/brokers) to combine and offer services and suggest options • Added value – Application of comparative metrics – Automation of the process • Benefits – Improve transparency – Enhance efficiency – Establish fairness 23
  • 24. SLALOM contribution Contract monitoring • Benefits – Achieve SLA non-repudiation – Establish trust and transparency for service execution compliant to the terms and proper violation management – Enable automation of contract and performance management and monitoring – Aid the involvement of actors like trusted third parties offering relevant services 24
  • 25. SLALOM contribution: Your feedback needed • SLALOM proposed specification / reference model already takes into account: – Standardisation approaches and working groups outcomes – Current SLAs and metrics offered by commercial Cloud providers – Views expressed by Cloud providers and adopters – Research outcomes • Further feedback regarding applicability and practical usage of our model is more than welcome • Please take the survey on IoT/Cloud metrics here: https://docs.google.com/forms/d/1JmwDXyO_1hT9iR-lm1c3LCQu_zF64nf-uFnxBeGMv3g/viewform
  • 26. Contact us • SLALOM Technical WP Leader: ekaranas@mail.ntua.gr, vandro@mail.ntua.gr, gkousiou@mail.ntua.gr • SLALOM Project Coordinator: daniel.field@atos.net
  • 27. SLALOM Project 27 SLALOM is a CSA financed by European Commission under Grant agreement 644270 For more information on the initiative contact us: @CloudSLAlom www.SLALOM-Project.eu SLALOM Project Coordinator (daniel.field@atos.net)
  • 29. Backup slides: SLA strictness example
      • Extract static SLA parameters of importance for a given domain/application
        – All these parameters (e.g. boundary period, error rates) are described in the SLALOM model
      • Map these parameters to an arbitrary function, e.g.: $S = t + (1 - s_{1/2}\,q) + s_3\,p + x$ (with $s_1$ used for S and $s_2$ for S’), where:
        – q: size of the boundary period
        – p: percentage of availability
        – t: running time vs. overall monthly time (boolean), t ∈ {0,1}
        – x: existence of performance metrics (boolean), x ∈ {0,1}
        – s_i: normalisation factor for the continuous variables so that: (s1*q) ∈ [0,1], (s2*q) ∈ [0,0.1] and (s3*p) ∈ [0,0.5]
      • Resulting value may be compared between providers
      Provider/Service | t | q (s1 * q) | q’ (s2 * q) | p (s3 * p) | x | S | S’
      Google Compute | 0 | 5 (1.00) | 5 (0.10) | 99.95 (0.50) | 0 | 0.50 | 1.60
      Amazon EC2 | 0 | 1 (0.20) | 1 (0.02) | 99.95 (0.50) | 0 | 1.30 | 1.48
      MS Azure Compute | 1 | 1 (0.20) | 1 (0.02) | 99.95 (0.50) | 0 | 2.30 | 2.48
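As a rough check of the strictness function on this slide, the sketch below evaluates S = t + (1 - s*q) + s3*p + x in Python. The normalisation factors are assumptions inferred from the table: s1 = 0.2 and s2 = 0.02 reproduce the bracketed (s1 * q) and (s2 * q) values, and s3 = 0.005 maps an availability of 100 % to 0.5. The function name `strictness` is hypothetical. Under this reading the S column is reproduced for all three rows and the S’ column for Amazon EC2 and MS Azure Compute; the S’ cell for Google Compute (1.60 in the transcript) is not reproduced, so either the transcribed cell or this reading of the formula may be off.

```python
S1 = 0.2    # assumed: gives s1*q = 1.00 for q = 5 and 0.20 for q = 1
S2 = 0.02   # assumed: gives s2*q = 0.10 for q = 5 and 0.02 for q = 1
S3 = 0.005  # assumed: gives s3*p = 0.5 for p = 100 (about 0.50 for 99.95)


def strictness(t, q, p, x, s_q, s_p=S3):
    """Evaluate S = t + (1 - s_q*q) + s_p*p + x.

    t: 1 if availability is measured against running time, else 0
    q: size of the boundary period (units as in the table)
    p: availability percentage
    x: 1 if performance metrics exist in the SLA, else 0
    s_q, s_p: normalisation factors for q and p
    """
    return t + (1 - s_q * q) + s_p * p + x


# Rows of the table: (t, q, p, x)
rows = {
    "Google Compute": (0, 5, 99.95, 0),
    "Amazon EC2": (0, 1, 99.95, 0),
    "MS Azure Compute": (1, 1, 99.95, 0),
}

# S column, computed with s1 and rounded to two decimals as on the slide.
s_values = {name: round(strictness(t, q, p, x, S1), 2)
            for name, (t, q, p, x) in rows.items()}
```

A higher value indicates a stricter SLA under the chosen weights; changing the weights changes the ranking, which is why the slide calls the function arbitrary.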
  • 30. Backup slides Mapping of AWS EC2 SLA 30
  • 31. AWS EC2 SLA @SLALOM (1/9)
      Amazon EC2
      Level / definition | Expression | Notes
      Sample definition | sc: UNDEFINED (assumed ‘ping’ -> ICMP) | The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
      Sample definition | Type of operation: ping | Not defined how the condition of connectivity can be actually measured (e.g. the ping operation mentioned previously).
      Boundary period and error definitions | bp > 60 sec | The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
      Boundary period and error definitions | ec = 100% | Error condition reflecting that the error ratio is that for the entire bp the resource must be continuously “unavailable”.
      Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
  • 32. AWS EC2 SLA @SLALOM (2/9) 32 Abstract metric definition availability < 99.95 % Availability metric definition given the boundary period and error condition. Condition of SLA violation specification Availability threshold specification Availability definition and calculation Billing period specification Unavailability definition and calculation Unavailability interval definition and calculation Boundary period specification Unreachable sample specification Sample definition and retrieval PARAM_001 PARAM_002 SAMPLE_001 QDT_001 UAP_001 BP_001 CFA_002 PARAM_003 CONDITION
  • 33. AWS EC2 SLA @SLALOM (3/9) 33 • Examples of preconditions: – Deployment: Number of Availability Zones used – Deployment: Replication options used – Usage/Measurement: Restarting of resources when unavailable – Usage/Measurement: Applied Throttling of requests • Practical suggestions: – The strict definition of the Rules class to be concerning the necessary preconditions to apply – Note field as placeholder for the actual SLA text that refers to a given block
  • 34. AWS EC2 SLA @SLALOM (4/9) 34 SAMPLE_001 Sample definition sc: UNDEFINED (assumed ‘ping’- > ICMP) The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism. Type of operation: ping Not defined how the condition of connectivity can be actually measured (e.g. the ping operation mentioned previously). SAMPLE_001
  • 35. AWS EC2 SLA @SLALOM (5/9) 35 Boundary period and error definitions bp > 60 sec The exact wording is “the percentage of minutes”, thus the period is 60 seconds. ec = 100% Error condition reflecting that the error ratio is that for the entire bp the resource must be continuously “unavailable”. PARAM_001 PARAM_002 SAMPLE_001 PARAM_001 PARAM_002
  • 36. AWS EC2 SLA @SLALOM (6/9) 36 PARAM_001 PARAM_002 SAMPLE_001 QDT_001 PARAM_001 PARAM_002SAMPLE_001 QDT_001 • Calculation of Cloud Service Unavailability Interval • Based on: - The current sample - The defined boundary period - The definition of unreachable sample QDT_001 SAMPLE_001 PARAM_001 PARAM_002
  • 37. AWS EC2 SLA @SLALOM (7/9) 37 PARAM_001 PARAM_002 SAMPLE_001 QDT_001 • Calculation of Cloud Service Unavailability • Based on: - The Cloud Service Unavailability Interval QDT_001 QDT_001 UAP_001 UAP_001 UAP_001
  • 38. AWS EC2 SLA @SLALOM (8/9) 38 PARAM_001 PARAM_002 SAMPLE_001 QDT_001 • Calculation of Cloud Service Availability • Based on: - Billing period - The Cloud Service Unavailability UAP_001 UAP_001 UAP_001 UAP_001 BP_001 BP_001 BP_001 BP_001 BP_001 CFA_002 CFA_002 CFA_002
  • 39. AWS EC2 SLA @SLALOM (9/9) 39 PARAM_001 PARAM_002 SAMPLE_001 QDT_001 • SLA Violation Condition - i.e.: Availability < 99.95% UAP_001 BP_001 CFA_002 CFA_002 CFA_002 PARAM_003 PARAM_003 PARAM_003 PARAM_003 ASV_001 ASV_001 ASV_001
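The chain of calculations in slides 36 to 39 (unavailability interval, unavailability, availability, violation condition) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SLALOM reference implementation: samples are assumed to be `(timestamp_in_seconds, reachable)` pairs from a ping-like probe, minutes without any sample are treated as available, and availability is assumed to be the percentage of the billing period that was not unavailable. All function names are hypothetical; the block IDs in the comments are those shown on the slides.

```python
from collections import defaultdict

BOUNDARY_PERIOD_S = 60          # bp: "the percentage of minutes"
AVAILABILITY_THRESHOLD = 99.95  # availability < 99.95 % means violation


def unavailable_seconds(samples):
    """Sum of unavailability intervals (QDT_001 summed into UAP_001).

    A boundary period counts as unavailable only if every sample
    in it is unreachable (error condition ec = 100%).
    """
    by_period = defaultdict(list)
    for timestamp, reachable in samples:
        by_period[timestamp // BOUNDARY_PERIOD_S].append(reachable)
    return sum(BOUNDARY_PERIOD_S
               for flags in by_period.values() if not any(flags))


def availability_pct(samples, billing_period_s):
    """Availability over the billing period (CFA_002 from BP_001 and UAP_001)."""
    return 100.0 * (1 - unavailable_seconds(samples) / billing_period_s)


def sla_violated(samples, billing_period_s):
    """SLA violation condition (ASV_001): availability below the threshold."""
    return availability_pct(samples, billing_period_s) < AVAILABILITY_THRESHOLD


# Example: 30 fully unreachable minutes in a 30-day billing period.
month_s = 30 * 24 * 3600
outage = [(t, False) for t in range(0, 1800, 20)]
violated = sla_violated(outage, month_s)
```

Since the Amazon EC2 SLA leaves the sampling condition undefined, the choice of probe behind the `reachable` flag is exactly the point where the two parties could disagree.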
  • 40. Backup slides Mapping of GAE Datastore SLA 40
  • 41. GAE Datastore SLA @SLALOM (1/11)
      Google AppEngine Datastore
      Level / definition | Expression | Notes
      Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
      Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
      Boundary period and error definitions | bp > 300 sec | The exact wording is “five consecutive minutes”.
      Boundary period and error definitions | ec > 10% | Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
      Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
  • 42. GAE Datastore SLA @SLALOM(2/11) 42 SAMPLE_001SAMPLE_001 PARAM_003 PARAM_002 PARAM_001 ER_001 DUR_001 QDT_001 UAP_001 BP_001 CFA_002 PARAM_004 ASV_001 Condition of SLA Violation specification Availability threshold specification Availability definition and calculation Billing Period specification Unavailability definition and calculation Unavailability Interval definition and calculation Sampling Period duration definition and calculation Error Rate definition and calculation Boundary Period specification Error Rate threshold specification Unreachable sample values specification Sample definition and retrieval Abstract metric definition availability < 99.95 % Availability metric definition given the boundary period and error condition.
  • 43. GAE Datastore SLA @SLALOM(3/11) 43 • Examples of preconditions: – Deployment: Number of Availability Zones used – Deployment: Replication options used – Usage/Measurement: Restarting of resources when unavailable – Usage/Measurement: Applied Throttling of requests • Practical suggestions: – The strict definition of the Rules class to be concerning the necessary preconditions to apply – Note field as placeholder for the actual SLA text that refers to a given block
  • 44. GAE Datastore SLA @SLALOM(4/11) 44 Sample definition sc: INTERNAL_ERROR Several sampling conditions are defined per type of operation. For example it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls. Type of operation: API calls Several type of operations are defined. An example is provided here. SAMPLE_001SAMPLE_001 SAMPLE_001
  • 45. GAE Datastore SLA @SLALOM(5/11) 45 Sample definition sc: INTERNAL_ERROR Several sampling conditions are defined per type of operation. For example it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls. Type of operation: API calls Several type of operations are defined. An example is provided here. SAMPLE_001SAMPLE_001PARAM_003 PARAM_003
  • 46. GAE Datastore SLA @SLALOM(6/11) 46 SAMPLE_001SAMPLE_001PARAM_003 PARAM_003 Boundary period and error definitions bp > 300 sec The exact wording is “five consecutive minutes”. ec > 10% Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”. PARAM_002 PARAM_002 PARAM_001 PARAM_001
  • 47. GAE Datastore SLA @SLALOM(7/11) 47 SAMPLE_001SAMPLE_001 PARAM_003 PARAM_002 PARAM_001 • Calculation of duration of sampling period: - The period during which a number of samples was received - Period duration calculation based on samples timestamp • Calculation of actual Error Rate for sampling period: - Number of violation samples / number of total samples - Violation samples: samples containing values from a specific values pool ER_001 ER_001 SAMPLE_001 SAMPLE_001 PARAM_003 SAMPLE_001DUR_001 DUR_001 SAMPLE_001 DUR_001 ER_001 PARAM_003
  • 48. GAE Datastore SLA @SLALOM (8/11) • Calculation of Unavailability Interval - IF [Sampling Period duration > Boundary Period] - AND IF [Error Rate > Threshold (10%)] - THEN [Unavailability Interval = Sampling Period duration] [Diagram blocks: SAMPLE_001, PARAM_003, PARAM_002, PARAM_001, ER_001, DUR_001, QDT_001]
  • 49. GAE Datastore SLA @SLALOM(9/11) 49 SAMPLE_001SAMPLE_001 PARAM_003 PARAM_002 PARAM_001 • Calculation of Unavailability period - It equals the SUM of Unavailability Intervals ER_001 DUR_001 QDT_001 QDT_001 UAP_001 UAP_001 UAP_001 QDT_001 UAP_001
  • 50. GAE Datastore SLA @SLALOM(10/11) 50 SAMPLE_001SAMPLE_001 PARAM_003 PARAM_002 PARAM_001 ER_001 DUR_001 QDT_001 UAP_001 UAP_001 BP_001 BP_001BP_001 BP_001 CFA_002 CFA_002 CFA_002 • Calculation of Cloud Service Availability • Based on: - Billing period - The Cloud Service Unavailability CFA_002 BP_001 UAP_001
  • 51. GAE Datastore SLA @SLALOM(11/11) 51 SAMPLE_001SAMPLE_001 PARAM_003 PARAM_002 PARAM_001 ER_001 DUR_001 QDT_001 UAP_001 BP_001 CFA_002 • SLA Violation Condition - i.e.: Availability < 99.95% PARAM_004CFA_002 CFA_002 PARAM_004 PARAM_004 PARAM_004 ASV_001 ASV_001 ASV_001
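The GAE Datastore mapping in slides 47 to 51 can be sketched in Python along the same lines. The sketch makes several assumptions that the slides leave open: each sampling period is given as a list of `(timestamp_in_seconds, status)` pairs, the pool of violation values contains only the two statuses quoted on the slides, and availability is the percentage of the billing period not covered by unavailability intervals. Function names are hypothetical; the block IDs in the comments are those shown on the slides.

```python
ERROR_VALUES = {"INTERNAL_ERROR", "TIMEOUT"}  # values pool (PARAM_003), partial
BOUNDARY_PERIOD_S = 300                       # bp: "five consecutive minutes"
ERROR_RATE_THRESHOLD = 0.10                   # ec: "ten percent Error Rate"
AVAILABILITY_THRESHOLD = 99.95


def period_duration(samples):
    """Sampling period duration from the sample timestamps (DUR_001)."""
    timestamps = [ts for ts, _ in samples]
    return max(timestamps) - min(timestamps)


def error_rate(samples):
    """Violation samples / total samples (ER_001)."""
    errors = sum(1 for _, status in samples if status in ERROR_VALUES)
    return errors / len(samples)


def unavailability_interval(samples):
    """IF duration > bp AND error rate > threshold THEN duration (QDT_001)."""
    if not samples:
        return 0
    duration = period_duration(samples)
    if duration > BOUNDARY_PERIOD_S and error_rate(samples) > ERROR_RATE_THRESHOLD:
        return duration
    return 0


def availability_pct(periods, billing_period_s):
    """Unavailability is the sum of intervals (UAP_001); availability is CFA_002."""
    unavailable = sum(unavailability_interval(p) for p in periods)
    return 100.0 * (1 - unavailable / billing_period_s)


def sla_violated(periods, billing_period_s):
    """SLA violation condition (ASV_001): availability < 99.95 %."""
    return availability_pct(periods, billing_period_s) < AVAILABILITY_THRESHOLD
```

For example, a 600-second sampling period in which 3 of 11 API calls return `INTERNAL_ERROR` counts as a 600-second unavailability interval, whereas a 240-second period with only errors does not, because it is shorter than the boundary period.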