HIGH SCALABILITY AND
           RELIABILITY IN THE
           CLOUD
           GREG THOMPSON
           HEAD OF ARCHITECTURE, APPS ENABLEMENT
           ALCATEL-LUCENT

@gmthomp   greg.thompson@alcatel-lucent.com
About This Session
   Target audience is backend application
    developers deploying infrastructure into a
    cloud environment
   Will cover concepts for scalability and
    reliability with the goal of helping application
    developers understand some key
    considerations when designing and building
    the backend.
Design Time Decisions
   When first building your application backend,
    consider a few important questions
     How fast should the application be recovered if a
      failure occurs?
     What kind of downtime is acceptable?
     Is the application maintaining stateful data?
     What kind of information needs to be shared across
      multiple instances?
Scalability
What is Scalability?
   Scalability describes how well the application
    handles increased traffic volume
Scalability – Factors to Consider
   Horizontal vs. Vertical
   Stateless vs. Stateful
   Understanding Limitations
   Connection Management
   Segmentation of traffic
   Segmentation of responsibility (distributed arch)
   Clustering
   Messaging
What Type of Scalability?
Vertical vs. Horizontal
Vertical
   Scaling up a single node
     Physical limitations: instances are very powerful but still have finite limits
     Resources such as number of sockets can only go so high
Horizontal
   Scaling out across multiple nodes
     Ability to distribute traffic over a number of nodes
     Allows for more flexibility over time
Will the App Maintain State?
Stateless Applications
   Application does not persist information about transactions
   Each transaction is independent and atomic
   [Diagram: Request, Application, Response]
Will the App Maintain State?
Stateful Applications
   Application needs to maintain data about transactions in progress
   Requires storage
   Persistence may also be required depending on the ...
   [Diagram: First Request and Subsequent Request, Application, DB]
Understanding Limitations
   Thorough testing is
    key to understanding
    bottlenecks
   Test real-world
    scenarios, including
    latency
   Push the system to
    the max to
    understand how it
Connection Management
Mobile Device Connections
   Mobile devices don’t always behave like you expect
       Connectivity is often very dynamic
       Devices move between 4G/3G/2G/no G/Wi-Fi
       Not all TCP events get reported, so sockets can remain open
   If not handled correctly, these factors can be a time bomb no
    matter how far you scale a component vertically
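One common way to defuse this is to stop trusting TCP close events and expire connections that have been silent for too long. The sketch below is a minimal illustration, not something from the slides: the `ConnectionTracker` name, the 60-second default, and the idea that clients send periodic heartbeats are all assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConnectionTracker:
    """Tracks last-activity times and reaps connections that went silent."""

    idle_timeout: float = 60.0  # seconds; assumed value, tune per application
    last_seen: dict = field(default_factory=dict)

    def touch(self, conn_id, now=None):
        """Record activity (data or heartbeat) on a connection."""
        self.last_seen[conn_id] = time.monotonic() if now is None else now

    def reap(self, now=None):
        """Remove and return the IDs of connections idle longer than idle_timeout."""
        now = time.monotonic() if now is None else now
        stale = [c for c, t in self.last_seen.items() if now - t > self.idle_timeout]
        for conn_id in stale:
            # A real server would also close the underlying socket here.
            del self.last_seen[conn_id]
        return stale
```

Calling `reap()` on a timer keeps half-open sockets from accumulating on a node, whatever its size.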
Segmenting Traffic
   Once the application is
    able to be scaled out,
    traffic can be
    segmented in different
    ways
       Location (e.g. east coast
        vs. west coast)
       Pre-assigned criteria -
        User ID, IP, or other
        dynamic criteria
       Load Balanced
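Segmenting on a pre-assigned criterion such as user ID can be sketched as a stable hash over the ID. The function name `pick_segment` and the segment labels are hypothetical; the slides do not prescribe a hashing scheme.

```python
import hashlib


def pick_segment(user_id: str, segments: list) -> str:
    """Map a user ID to one segment, always the same one for the same ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it to the list size.
    index = int.from_bytes(digest[:8], "big") % len(segments)
    return segments[index]


if __name__ == "__main__":
    print(pick_segment("user-42", ["east", "west"]))
```

Simple modulo hashing reassigns many users when the number of segments changes; consistent hashing is a frequent alternative when segments are added or removed often.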
Segmenting Responsibility
   Segmenting
    responsibility allows for
    a distributed
    architecture
       Each component can be
        scaled independently
       Allows for more flexibility
        in scaling
       Adds more complexity
        and potential messaging
        overhead
Clustering
   Clustering is the concept of having a group of nodes working
    together to provide the same capability
       Nodes typically co-located
       Common data shared as needed across the cluster
       Communication may be needed between nodes
   [Diagram: four App Nodes and Shared Data]
Messaging
   Once a clustered and/or distributed architecture is used,
    messaging will be needed between various components
    and/or nodes
   Types of Messaging
      JMS
      Open Source MQ packages
      Custom Designed
      Use of APIs
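The producer/consumer pattern behind all of these options can be shown with Python's standard `queue` module. This is only an in-process stand-in for a real broker (an assumption made to keep the example runnable); `run_pipeline` and the uppercase "processing" step are hypothetical.

```python
import queue
import threading


def run_pipeline(items):
    """Send items from a producer to a consumer component through a queue."""
    q = queue.Queue()
    results = []

    def consumer():
        while True:
            msg = q.get()
            if msg is None:  # sentinel: the producer has finished
                break
            results.append(msg.upper())  # stand-in for real processing

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        q.put(item)
    q.put(None)
    worker.join()
    return results
```

The producer never calls the consumer directly, which is what lets each side be scaled or replaced independently.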
Example of Scaled Architecture
   [Diagram: Site 1 and Site 2, each containing a Load Balancer, a Web Server, Component 1, Component 2, and a Database]
Reliability/Availability
What is Reliability/Availability?
   Availability is typically
    measured by the amount of
    downtime your application
    has in a given year
       Unplanned downtime and
        planned downtime are both
        considered
   Reliability is described by the
    likelihood of failure based on
    actual measurements
   We’ll focus more on
    Availability
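Availability targets are often quoted as a percentage, and converting that percentage into minutes of downtime per year makes the target concrete. A small sketch (the function name is an assumption; leap years are ignored):

```python
def downtime_per_year_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by an availability percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return minutes_per_year * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_per_year_minutes(target):.1f} min/year")
```

Both planned and unplanned downtime count against this budget.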
Reliability/Availability
Factors to Consider
   Cost vs. Need
   Problem detection
   Automation for recovery
   Active/standby, active/active, hot standby vs. cold
    standby
   Local and Geo-redundancy
   Multi-zone, multi-cloud
   Test Until You Break the System
Reliability Requirements
Cost Considerations
   Number of instances
   Bandwidth requirements between sites
   Complexity of software
   Monitoring
Need
   User Experience
   Customer requirements
   Negative Publicity
Problem Detection
   Effective monitoring of
    the application is key to
    minimizing downtime
       Event reporting in the
        software
       External monitoring –
        test for successful
        behavior
       Auto detection and
        alerting to minimize cost
        of operations personnel
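External monitoring usually means probing the application from outside and alerting only after several consecutive failures, so that one dropped probe does not page anyone. A minimal sketch, assuming a caller-supplied `probe` function and a threshold of three (both hypothetical):

```python
from typing import Callable


class HealthMonitor:
    """Declares a target down after N consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], failure_threshold: int = 3):
        self.probe = probe
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def check(self) -> bool:
        """Run one probe; return True while the target is considered down."""
        try:
            ok = self.probe()
        except Exception:
            ok = False  # a probe that raises counts as a failure
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures >= self.failure_threshold
```

In practice `probe` would exercise real behavior, for example a test request that must return the expected result, rather than just checking that a port is open.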
Automation for Recovery
   Faster recovery of a failed
    component means less downtime
    and higher availability
     Automatic detection
      and automatic
      recovery
     Automated installation
      key for minimizing
      setup time during
      recovery
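The effect of recovery time can be quantified with the standard steady-state model, availability = MTBF / (MTBF + MTTR). The slides do not use these terms; mean time between failures (MTBF) and mean time to repair (MTTR) are introduced here as assumptions for illustration.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up, given its failure and repair times."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Same failure rate, manual recovery (4 h) vs. automated recovery (5 min).
    print(steady_state_availability(1000, 4))
    print(steady_state_availability(1000, 5 / 60))
```

With the failure rate held constant, cutting recovery time is the only lever left, which is why automated detection and installation matter.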
Availability Models
   N = number of nodes required for normal processing
   N+1 = one additional node to provide redundancy in case of failure
   N+K = K nodes provide additional redundancy
   [Diagram: node groups N N; N N +1; N N K K]
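Sizing for N+K can be sketched as follows. The helper names, and the simplification that every node has the same capacity, are assumptions.

```python
import math


def nodes_required(peak_load: float, capacity_per_node: float, spare_nodes: int = 1) -> int:
    """Total nodes for N+K: N to carry peak load, plus K spares."""
    n = math.ceil(peak_load / capacity_per_node)
    return n + spare_nodes


def survives(total_nodes: int, failed_nodes: int, peak_load: float, capacity_per_node: float) -> bool:
    """True if the remaining nodes can still carry peak load."""
    return (total_nodes - failed_nodes) * capacity_per_node >= peak_load
```

For a peak of 1000 requests/s on nodes that each handle 300, N is 4, so N+1 is 5 nodes: one failure is absorbed, a second is not.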
Redundancy Models
   Active/Cold Standby
       Backup site is booted up when needed
   Active/Hot Standby
       Backup site is running and ready to take over
   Active/Active
       Both sites active and processing traffic
Local and Geo-Redundancy
   Local
     Backup instances are available within the same location
     Use of availability zones within a region is very similar
   Geographic
     Backup instances are available in another geographic location
     Typically in a separate region to account for events such as natural disasters
Availability to the Max
   Multi-Zone/Multi-Region
     Multi-zone typically provides instances running in different physical locations, but in the same region
     Multi-region provides different geographic regions of availability
   Multi-Cloud
     If your application requires the maximum possible availability
     Run in different cloud providers in different regions
Test Until You Break the System
   Push the system to
    the max and observe
    the breaking points
   Fix the problem,
    repeat
   The best way to find
    problems and prevent
    unplanned downtime
    is to test thoroughly
    with a mindset to
    break the system
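The push-observe-fix loop can be driven by a simple ramp that raises load until the system stops coping. This is a sketch only: `handles_load` is a hypothetical callback that runs one test at the given load and reports whether the system stayed within acceptable limits, and the start, step, and ceiling values are arbitrary.

```python
from typing import Callable, Optional


def find_breaking_point(
    handles_load: Callable[[int], bool],
    start: int = 100,
    step: int = 100,
    max_load: int = 10_000,
) -> Optional[int]:
    """Increase load step by step; return the first load level that fails."""
    load = start
    while load <= max_load:
        if not handles_load(load):
            return load
        load += step
    return None  # no breaking point found up to max_load
```

After each fix, rerunning the same ramp shows whether the breaking point moved.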
Q&A
THANK YOU!
Greg Thompson
@gmthomps
greg.thompson@alcatel-lucent.com

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Scalability and Reliability in the Cloud

  • 1. High Scalability and Reliability in the Cloud
      Greg Thompson, Head of Architecture, Apps Enablement, Alcatel-Lucent
      @gmthomp, greg.thompson@alcatel-lucent.com
  • 2. About This Session
      - Target audience is backend application developers deploying infrastructure into a cloud environment
      - Will cover concepts for scalability and reliability, with the goal of helping application developers understand some key considerations when designing and building the backend
  • 3. Design Time Decisions
      - When first building your application backend, consider a few important questions:
          - How fast should the application recover if a failure occurs?
          - How much downtime is acceptable?
          - Is the application maintaining stateful data?
          - What kind of information needs to be shared across multiple instances?
  • 5. What is Scalability?
      - Scalability describes how the application will handle increased traffic volume
  • 6. Scalability – Factors to Consider
      - Horizontal vs. vertical
      - Stateless vs. stateful
      - Understanding limitations
      - Connection management
      - Segmentation of traffic
      - Segmentation of responsibility (distributed architecture)
      - Clustering
      - Messaging
  • 7. What Type of Scalability? Vertical vs. Horizontal
      Vertical
      - Scaling up a single node
      - Physical limitations: instances are very powerful but still have finite limits
      - Resources such as number of sockets can only go so high
      Horizontal
      - Scaling out across multiple nodes
      - Ability to distribute traffic over a number of nodes
      - Allows for more flexibility over time
  • 8. Will the App Maintain State? Stateless Applications
      - Application does not persist information about transactions
      - Each transaction is independent and atomic
      (Diagram: Request → Application → Response)
  • 9. Will the App Maintain State? Stateful Applications
      - Application needs to maintain data about transactions in progress
      - Requires storage
      - Persistence may also be required depending on the …
      (Diagram: First Request and Subsequent Request → Application → DB)
  • 10. Understanding Limitations
      - Thorough testing is key to understanding bottlenecks
      - Test real-world scenarios, including latency
      - Push the system to the max to understand how it …
  • 11. Connection Management: Mobile Device Connections
      - Mobile devices don't always behave as you expect
      - Connectivity is often very dynamic
      - Devices move between 4G, 3G, 2G, no signal, and Wi-Fi
      - Not all TCP events get reported, so sockets can remain open
      - If not handled correctly, these factors can be a time bomb no matter how far you scale a component vertically
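One common way to defuse the "sockets can remain open" problem is an application-level idle timeout, so a connection is dropped even when no TCP close event ever arrives. The slides do not prescribe a mechanism; the sketch below is an assumption, and `ConnectionTable`, `touch`, and `reap` are hypothetical names used only for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Tracks when each connection was last heard from (hypothetical helper)."""
    idle_timeout: float  # seconds of silence before a connection counts as dead
    last_seen: dict = field(default_factory=dict)

    def touch(self, conn_id, now=None):
        # Call whenever data or a heartbeat arrives on the connection.
        self.last_seen[conn_id] = time.monotonic() if now is None else now

    def reap(self, now=None):
        # Return (and forget) connections that have been silent for too long.
        now = time.monotonic() if now is None else now
        stale = [c for c, t in self.last_seen.items() if now - t > self.idle_timeout]
        for conn_id in stale:
            del self.last_seen[conn_id]
        return stale


table = ConnectionTable(idle_timeout=30)
table.touch("device-a", now=0)
table.touch("device-b", now=20)
stale = table.reap(now=40)  # device-a has been silent for 40 s, device-b for 20 s
```

In a real server the caller would also close the underlying socket for every ID returned by `reap`, and run the sweep on a timer.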
  • 12. Segmenting Traffic
      - Once the application can be scaled out, traffic can be segmented in different ways:
          - Location (e.g. east coast vs. west coast)
          - Pre-assigned criteria: user ID, IP, or other dynamic criteria
          - Load balanced
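Segmenting by a pre-assigned criterion such as user ID can be sketched with a stable hash, so that the same user always lands on the same segment. This is a minimal illustration under assumed names (`segment_for` and the segment labels are hypothetical), not something taken from the slides.

```python
import hashlib


def segment_for(user_id: str, segments: list[str]) -> str:
    """Map a user ID to one segment, deterministically across processes."""
    # hashlib is used instead of the built-in hash(), which is salted per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(segments)
    return segments[index]


segments = ["site-east", "site-west"]  # hypothetical segment names
chosen = segment_for("user-1234", segments)
```

Plain modulo hashing remaps most users when the number of segments changes; consistent hashing is a frequent alternative when segments are added or removed often.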
  • 13. Segmenting Responsibility
      - Segmenting responsibility allows for a distributed architecture
      - Each component can be scaled independently
      - Allows for more flexibility in scaling
      - Adds more complexity and potential messaging overhead
  • 14. Clustering
      - Clustering is the concept of having a group of nodes working together to provide the same capability
      - Nodes are typically co-located
      - Common data is shared as needed across the cluster
      - Communication may be needed between nodes
      (Diagram: four App Nodes on top of Shared Data)
  • 15. Messaging
      - Once a clustered and/or distributed architecture is used, messaging will be needed between various components and/or nodes
      - Types of messaging:
          - JMS
          - Open source MQ packages
          - Custom designed
          - Use of APIs
  • 16. Example of Scaled Architecture
      (Diagram: Site 1 and Site 2, each with load balancers in front of web servers, Component 1 and Component 2 instances, and a database)
  • 18. What is Reliability/Availability?
      - Availability is typically measured by the amount of downtime your application has in a given year
      - Unplanned downtime and planned downtime are both considered
      - Reliability is described by the likelihood of failure based on actual measurements
      - We'll focus more on availability
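Measuring availability as downtime per year is simple arithmetic: the unavailable fraction multiplied by the minutes in a year. The sketch below assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime (planned plus unplanned) implied by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

Each extra "nine" cuts the downtime budget by a factor of ten: 99.9% allows roughly 8.8 hours a year, 99.99% roughly 53 minutes.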
  • 19. Reliability/Availability Factors to Consider
      - Cost vs. need
      - Problem detection
      - Automation for recovery
      - Active/standby, active/active, hot standby vs. cold standby
      - Local and geo-redundancy
      - Multi-zone, multi-cloud
      - Test until you break the system
  • 20. Reliability Requirements
      Cost Considerations
      - Number of instances
      - Bandwidth requirements between sites
      - Complexity of software
      - Monitoring
      Need
      - User experience
      - Customer requirements
      - Negative publicity
  • 21. Problem Detection
      - Effective monitoring of the application is key to minimizing downtime
      - Event reporting in the software
      - External monitoring: test for successful behavior
      - Auto detection and alerting to minimize cost of operations personnel
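External monitoring with automatic alerting can be sketched as a probe that is run repeatedly, with an alert raised only after several consecutive failures so that a single blip does not page anyone. The class name, the threshold of three, and the probe callable are all assumptions made for illustration.

```python
from typing import Callable


class HealthMonitor:
    """Alerts once when a probe has failed `failure_threshold` times in a row."""

    def __init__(self, probe: Callable[[], bool], failure_threshold: int = 3):
        self.probe = probe
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def check(self) -> bool:
        """Run the probe once; return True if an alert should fire now."""
        try:
            healthy = self.probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failed check
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire exactly once, at the moment the threshold is reached.
        return self.consecutive_failures == self.failure_threshold
```

The probe would typically exercise real behavior, such as issuing a request and validating the response, rather than only checking that a process is running.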
  • 22. Automation for Recovery
      - The faster a failed component recovers, the higher the reliability
      - Automatic detection and automatic recovery
      - Automated installation is key for minimizing setup time during recovery
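The effect of recovery time can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. The figures below are made-up inputs for illustration, not measurements from the talk.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up, given MTBF and MTTR in hours."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Same failure rate, different recovery speed (hypothetical numbers).
manual_recovery = steady_state_availability(mtbf_hours=1000, mttr_hours=1.0)
automated_recovery = steady_state_availability(mtbf_hours=1000, mttr_hours=0.1)
print(f"manual: {manual_recovery:.5f}, automated: {automated_recovery:.5f}")
```

With the failure rate held constant, cutting recovery time by a factor of ten cuts expected downtime by roughly the same factor.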
  • 23. Availability Models
      - N = number of nodes required for normal processing
      - N+1 = one additional node to provide redundancy in case of failure
      - N+K = K nodes provide additional redundancy
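The benefit of spare nodes in an N+K model can be estimated with a binomial sum: the service is up as long as at least N of the N+K nodes are up. The sketch assumes nodes fail independently and share the same per-node availability, which real deployments only approximate; the function name is illustrative.

```python
from math import comb


def service_availability(n: int, k: int, node_availability: float) -> float:
    """Probability that at least n of n + k independent nodes are up."""
    total = n + k
    p = node_availability
    return sum(
        comb(total, up) * p**up * (1 - p) ** (total - up)
        for up in range(n, total + 1)
    )


no_spare = service_availability(n=2, k=0, node_availability=0.99)   # N
one_spare = service_availability(n=2, k=1, node_availability=0.99)  # N+1
```

Correlated failures, such as a shared power feed or a bad deployment hitting every node at once, break the independence assumption, which is one reason to spread nodes across zones or sites.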
  • 24. Redundancy Models
      - Active/Cold Standby: backup site is booted up when needed
      - Active/Hot Standby: backup site is running and ready to take over
      - Active/Active: both sites are active and processing traffic
  • 25. Local and Geo-Redundancy
      Local
      - Backup instances are available within the same location
      - Using availability zones within a region is very similar
      Geographic
      - Backup instances are available in another geographic location
      - Typically in a separate region to account for events such as natural disasters
  • 26. Availability to the Max
      Multi-Zone/Multi-Region
      - Multi-zone typically provides instances running in different physical locations, but in the same region
      - Multi-region provides different geographic regions of availability
      Multi-Cloud
      - If your application requires the maximum possible availability
      - Run in different cloud providers in different regions
  • 27. Test Until You Break the System
      - Push the system to the max and observe the breaking points
      - Fix the problem, repeat
      - The best way to find problems and prevent unplanned downtime is to test thoroughly with a mindset to break the system
  • 28. Q&A