When we get water, electricity, or gas delivered to our home or place of work, we expect it to have predictable quality. Why isn't this also true of broadband? The answer is that we don't (yet) have the "glue" to integrate performance in digital supply chains.
2. Overview of the problem: digital supply chains exist to support distributed applications

[Figure: the digital supply chain runs from application software on a customer premises device, through telco & cloud access, to the cloud service (SaaS/PaaS).]
This increasing variability is making it harder to reason about performance and to engineer reliability to meet demand. We have to decide where to put the compute, and when to communicate. The number of design and configuration degrees of freedom is rising: location, capacity, scheduling, loss vs delay.
Questions about too much demand or not enough supply have become hard to answer. Safety margins have become opaque. The trade-offs between cost and performance lack predictable outcomes. The resulting unreliability is driving user frustration – the “motherbuffer” – which in turn creates costly workarounds. The main one has been “spend money on bandwidth”, but this does not solve growing variability.
The answer is the software industry equivalent of “containerisation”: what
the “rack” does for cloud hardware, we need for cloud services. But how?
Users have portfolios of cloud applications that they wish to use, with a mixture of availability demands. The telecoms-cloud industry is building supply using a variety of technologies, such as 5G slicing, SDN/NFV, SD-WAN, serverless computing, and WiFi. That supply is becoming ever more dynamic (higher speeds, more statistical resource sharing, more wireless) and more distributed (e.g. network function virtualisation, edge apps). Cloud technologies also add start-up latency (container versus “bare metal” computing), as well as waiting for shared resources (as with packet data).
3. Digital supply chains are systems of supply and demand… just like any other industry

[Figure: the user device experience (presentation layer runtime) connects via LAN, access network, SD-WAN/VPN and long-haul network to a cloud application container in the data centre. Demand consists of copying data upstream and downstream; supply consists of information delivered upstream and downstream.]
Industries typically have a standard unit of supply and demand that meaningfully “adds up” and interoperates:

Natural gas: BTU
Electricity: MW
Water: litre
Corn: bushel
Sugar: pound
Oil: barrel
Ethanol: gallon
Copper: ton
Wool: kilogram
Gold: ounce
Shipping: 40ft container
Cloud application access: ?
4. Digital supply chains involve complex interactions between multiple technology stacks

[Figure: between the customer premises and the application provider, each stage of the stack offers choices: access technology (Wi-Fi, Ethernet, xDSL, cable, FTTx, 2G/3G/4G/5G), transport (public Internet, private cloud global access, MPLS/Carrier Ethernet, etc.), hosting (hosted app in a public or private cloud, serverless functions), overlays (SD-WAN or none, VPN or none), and client runtimes (VDI, web browser, UC/VoIP, AppTV) for the distributed application software.]
ICT suppliers want to manage “vertical” interoperability: How do I deliver enough reliability end-to-end to meet the customer’s need? How can I optimise cost for my sub-path without sacrificing that reliability?

End users demand “horizontal” interoperability: How do I know what on-premises capabilities I need? How do I select a service provider and network access technology and know my applications will work?
5. “Vertical” interoperability needs standardised and (de)composable “availability SLAs”

[Figure: the same multi-stack supply chain diagram, annotated with the end-to-end quality requirement.]
The end-to-end requirement is for a bounded probability of packet latency and loss. The concept of “quality attenuation” unifies latency and loss into a single mathematical object, analogous to how complex numbers bring together real and imaginary numbers. “Quality attenuation” can be expressed as composable “availability SLAs” (when using ∆Q metrics) that “add up” along the end-to-end path.
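To make the idea of quality attenuation “adding up” along the path concrete, here is a minimal sketch. It assumes each segment is modelled as a discrete one-way delay PMF over 1 ms bins whose total mass is (1 − loss probability); the names `delta_q` and `compose` are illustrative, not a standard ∆Q API.

```python
import numpy as np

def delta_q(delay_pmf, loss):
    """Scale a proper delay PMF (index i = delay of i ms) by the
    survival probability, giving an 'improper' PMF summing to 1 - loss."""
    pmf = np.asarray(delay_pmf, dtype=float)
    return pmf / pmf.sum() * (1.0 - loss)

def compose(dq_a, dq_b):
    """∆Q of two segments in series: convolve the improper delay PMFs.
    Losses multiply automatically, since each PMF sums to 1 - loss."""
    return np.convolve(dq_a, dq_b)

# Segment A: 2-4 ms delay, 1% loss; segment B: 1-2 ms delay, 0.5% loss.
a = delta_q([0.0, 0.0, 0.5, 0.3, 0.2], loss=0.01)
b = delta_q([0.0, 0.7, 0.3], loss=0.005)
end_to_end = compose(a, b)

total_loss = 1.0 - end_to_end.sum()   # = 1 - 0.99 * 0.995
print(round(total_loss, 5))           # 0.01495
```

Because each improper PMF sums to 1 − loss, a single convolution composes the delay distributions and multiplies the survival probabilities, which is the sense in which loss and delay become one mathematical object.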
6. “Horizontal” interoperability decouples operational implementation from the availability SLA

[Figure: the same multi-stack supply chain diagram, annotated with the horizontal interoperability axis.]
Availability is something you can only lose: the baseline is “always available”, and every element can only detract from that standard of perfection!

We can use the “availability SLAs” to create a “budget” for quality attenuation for each sub-system and element of the system. “Horizontal” interoperability means we can safely make independent operational choices over how to meet that “budget”. As long as we are “under budget” (to an agreed probability), we know the end-to-end availability requirement will still be met.
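As an illustration of being “under budget to an agreed probability”, here is a minimal sketch assuming the SLA is expressed as points on the delay CDF; `under_budget` and the sample data are hypothetical, not a standard interface.

```python
import numpy as np

def under_budget(delay_samples_ms, sla_points):
    """True if, for every (bound_ms, min_fraction) point, at least that
    fraction of packets arrived within the bound. Lost packets are
    recorded as np.inf, so they count against every bound."""
    d = np.asarray(delay_samples_ms, dtype=float)
    return all((d <= bound).mean() >= frac for bound, frac in sla_points)

# Hypothetical SLA: 95% of packets within 40 ms, 99.9% within 100 ms.
sla = [(40.0, 0.95), (100.0, 0.999)]

# Synthetic measurements: ~5-40 ms delays plus a handful of lost packets.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=5.0, size=10_000) + 5.0
samples[:5] = np.inf   # five lost packets

print(under_budget(samples, sla))
```

Representing loss as infinite delay keeps the check a single operation on one distribution, mirroring how ∆Q folds loss into the delay CDF.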
7. Our industry challenge: to develop a supply chain quality management system

[Figure: the same multi-stack supply chain diagram, annotated with both the horizontal and vertical interoperability axes.]
Relating “vertical” to “horizontal” needs a common language of metrics (for SLAs) and measures (for operations):

Standardised metrics
Standardised SLAs
Standardised operational measurement methods
Standardised service lifecycle management processes
Standardised network quality assurance mechanisms
9. Solving these reliability engineering problems needs interoperable metrics and measures

Requirements:
Define availability SLA …in the user/customer’s own terms
Decompose SLA …‘vertically’ in supply chains to specify the SLA requirement on each subsystem and sub-subsystem
Market differentiated SLA …to win the customer’s business in the ‘horizontal’ competitive market
Operationally deliver SLA …to meet our service availability promises (with optimised cost and risk trade-offs for profitability)

Challenges:
Availability SLA metrics …don’t sufficiently reflect QoE and aren’t properly composable/causal
SLA operational measures …aren’t calibrated against testing and inspection reference standards
SLA management models …don’t fully incorporate established and proven management theory (e.g. 6-sigma, TOC, lean, Vanguard)
SLA assurance methods …require improved or new business processes across the service lifecycle

Opportunities:
Capture the basic science …as ∆Q calculus is the only known scientifically sound approach
Create standard metrics …that accurately reflect availability in the customer’s own terms
Calibrate measures …to manage error bounds, based on shared cost and IPR
Construct assurance SLAs …that have the essential “horizontal” and “vertical” interoperability properties
10. We have a strong baseline of ∆Q research and technology that now requires industrialisation

[Sidebar repeats the four opportunities from slide 9.]
Fully developed ∆Q theoretical framework.
Quality attenuation theory training materials available.
“Wind tunnel” for cloud apps to establish ∆Q-based SLA.
Demonstrated ability to “budget” performance using ∆Q.
Mature first generation ∆Q measurement system.
Industrialisation and scaling (e.g. TWAMP) in progress.
Contention Management technology developed and
demonstrated to assure ∆Q SLA even in overload.
Operational trial platform available for shared use
(collaboration with Just Right Networks Ltd & SureTec Ltd).
15+ years of development by a leading team of reliability engineering and distributed systems experts
Proven in application at tier 1 network operators as well as in many boutique and exotic applications
11. The next steps are clear… and interoperability by its nature is a collaborative activity

[Sidebar repeats the four opportunities from slide 9.]
Develop ex ante reliability engineering curriculum from existing
materials — ‘train the trainer’ and disseminate to industry
Create reference ∆Q-based SLAs for common application types
Construct software libraries to persist and manipulate the SLAs
Frictionless deployment of packet observation end points
Use high-fidelity measures to audit lower-fidelity measures
Establish the ‘state of the possible’ (i.e. highest yield of QoE at lowest cost and risk) as a technical reference study that feeds into the business case for industry investment in the initiative.
The time has come to ‘up our game’ and collectively engage with the problem of interoperable metrics
This requires an industry effort that no single player can deliver as every process and system is affected
12. We help you to apply these reliability engineering breakthroughs
to solve the cloud access interoperability and integration problem
for distributed applications in complex digital supply chains
We feed specialist skills into the development of existing industry initiatives and projects. Examples: 5G, SDN/NFV, Zero Touch Automation, UCaaS, distributed apps (inc. blockchain), machine learning.

We adapt established quality management methods from other industries, and drive the adoption of new scientific metrics and measures that are suitable for our own.

We develop methods, standards and tools by a process of action learning. We target the full service lifecycle (product development, marketing and sales, in-life service and support).
Purpose of the Cloud Access Reliability Engineering Initiative
Cloud access reliability engineering…
…performance integration technology
…management methods and processes
…skills, people and relationships
14. We know the ‘interoperable unit’ answer: the ∆Q calculus (see http://qualityattenuation.science/)

[Figure: high-fidelity network measures relate network demand and network supply to the QoE ‘SLAZARD’.]
15. [Figure: for suppliers A, B and C in series, each supplier’s quality attenuation decomposes into V, S and G components that sum along the path: VA + VB + VC = ∑V, SA + SB + SC = ∑S, GA + GB + GC = ∑G, so that ∆Q(A) + ∆Q(B) + ∆Q(C) = ∆Q(∑ A+B+C).]
[Plot: one-way delay versus packet size, decomposed into ∆Q|G (geographic delay), ∆Q|S (delay due to the size of the packet), and ∆Q|V (variable delay due to load).]

G/S/V are independent probability functions using improper random variables or improper cumulative distributions. These can be (de)convolved and “budgeted” along the supply chain using (de)composable “quality SLAs”.
∆Q metrics have an algebra for engineering predictable performance (& nothing else does!)
16. There are other metrics and measures, some of which meet some needs, but none meet all

Requirement | Other metrics and measures | ∆Q-based metrics and measures
Be a strong proxy for QoE | Yes: e.g. effective bandwidth, Actual Experience | Yes
Isolate problems in supply chains | Partial: correlation, but not strong causation | Yes
Offer an auditable evidence chain | No: would not stand up in court as a standard of proof | Yes
Be non-intrusive | Some: only for passively observed single-point average metrics; others are more like DoS attacks! | Yes
Work for all types of bearer | Some: separate worlds of cable, 5G, SDN/NFV all doing their own thing; no user-centric end-to-end view | Yes
Be cheap to gather and operate | Some: but high fidelity remains expensive | Yes
Be non-proprietary | Some: cheap-to-gather data is of low fidelity | Yes
Have a scientific basis | Partial: a scientific basis, but limited generality | Yes
Able to define ‘safety margin’ | Partial: only weak proxies for safety margin | Yes
Can engineer spatial (location of compute, routing of data) as well as temporal (scheduling of resources) | Partial: some ability to separate static from dynamic, but no formal algebra | Yes
SLAs are (de)composable | No: composition is not a meaningful operation | Yes
This only works if you have the ∆Q calculus to unify loss and delay into a single statistical resource, and high-fidelity metrics to create the CDF. The special ingredient is breaking latency into three basis functions: geography (G), packet size (S), and variable load delay (V).
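A minimal sketch of the improper CDF idea mentioned above: lost packets are recorded as infinite delay, so the empirical CDF plateaus below 1, and the shortfall at the plateau is the loss probability. The function name is illustrative.

```python
import numpy as np

def improper_cdf(delay_samples_ms):
    """Empirical CDF of one-way delay where losses are np.inf.
    Returns (delays, cumulative fractions); the final fraction is
    1 - loss, so the CDF never reaches 1 when packets were lost."""
    d = np.sort(np.asarray(delay_samples_ms, dtype=float))
    finite = d[np.isfinite(d)]
    ys = np.arange(1, len(finite) + 1) / len(d)
    return finite, ys

delays = [5.0, 7.0, 9.0, np.inf]   # one lost packet in four
xs, ys = improper_cdf(delays)
print(ys[-1])                       # 0.75: the CDF tops out below 1
```

Here the 0.25 gap between the plateau and 1 is exactly the loss rate, which is how a single high-fidelity distribution captures both delay and loss.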