Building a cloud service on a cloud infrastructure at



                     Also, cloud.
              Mikhail Panchenko, Surge 2011
Who Am I?
Pancakes
Infrastructure Engineer at SimpleGeo
Backend Engineer at Flickr before that
Backend and Frontend Engineer at Yahoo!
Ops/Tools before that
Philosophy, Economics, and French major before that



@mihasya
pancakes@simplegeo.com
Tools for mobile/geo developers
Primarily focused on services, some data-oriented APIs
PaaS, I guess? I've lost track a bit
Availability, redundancy part of brand
   Our outage = your outage
No pressure
Agenda
Goals
A little bit of theory

Challenges in The Cloud

General Architecture
Implementation Details
Architectural Goals
High availability

Linear scalability

Elasticity/Flexibility

Redundancy/Fault Tolerance
Read: don't wake me up, please
Sound Familiar?
Some Theory, Food for Thought
The Internets as Complex Systems
http://www.amazon.com/Normal-Accidents-Living-High-Risk-Technologies/dp/0691004129
"Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible."

Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 78). Kindle Edition.
"The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents."

Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.
Fortunately,
This Is Only The Internet
"The beauty of this is its simplicity. Once a plan gets too complex, everything can go wrong."

Walter Sobchak, The Big Lebowski
Interactions
Linear vs Complex
Coupling
Tight vs Loose
Three Mile Island
"... they found that radioactive water was not traveling to the tank they intended, but because of complex flow and pressure interactions, was going to a different, wrong tank, which also overflowed, this time in the auxiliary building."

Charles Perrow. Normal Accidents: Living with High-Risk Technologies (pp. 22-23). Kindle Edition.
Amazon Web Services
"The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network."

"Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region"
http://aws.amazon.com/message/65648/
Common Theme
Previously independent systems become coupled as a result of unanticipated interactions, leading to fundamentally surprising results
When pumping radioactive water into the wrong
tank, the behavior of the program is undefined
But where does The Cloud come in??
The Trifle Analogy




  Photo by mathematically_impossible
A complex system consisting of complex subsystems
Photo by wwarby
The Trifle Analogy




Original photos by mathematically_impossible and miheco
Tightly coupled to a complex system over which you have no control and into which you have no insight
Photo by 20after4
Recall
"Baffling Interactions"
"The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents."

Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.
DECOUPLE DECOUPLE DECOUPLE
     ( also, simplify )
Photo by erikcharlton
Decouple Your Subsystems
Shared resources are the most common
source of unexpected interaction

Resist temptation to double up on roles

Use queues, caches as buffers
  NOTE: those are complex
  subsystems of their own
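
A minimal sketch of a queue used as a buffer between two subsystems, assuming a single in-process producer and consumer. The function name `run_buffered` and its parameters are hypothetical and do not come from the talk. The bounded queue lets the consumer drain work at its own pace, and a full buffer blocks the producer instead of letting a burst pile up without limit.

```python
import queue
import threading

def run_buffered(items, handle, maxsize=100):
    """Feed items through a bounded queue to a single worker thread."""
    buf = queue.Queue(maxsize=maxsize)  # bounded: a full buffer blocks the producer
    done = object()                     # sentinel that tells the worker to stop
    results = []

    def worker():
        while True:
            item = buf.get()
            if item is done:
                break
            results.append(handle(item))

    t = threading.Thread(target=worker)
    t.start()
    for item in items:
        buf.put(item)  # blocks while the buffer is full (backpressure)
    buf.put(done)
    t.join()
    return results

print(run_buffered(range(5), lambda x: x * 2, maxsize=2))  # [0, 2, 4, 6, 8]
```

With one worker and a FIFO queue the results come out in input order. As the slide notes, the buffer is itself a subsystem with its own failure modes, so its size and blocking behavior need to be chosen deliberately.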
Decouple Your Subsystems
    Explicit Decoupling
CPU Affinity
  Webserver on 1-7; SSH etc on 8
  Crude, but gets the job done

More robust solutions - containers
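
The CPU split above can be sketched as follows, assuming a Linux host. The helper names `split_cpus` and `pin_current_process` are hypothetical. The slide counts cores from 1, while Linux numbers them from 0, so "1-7 and 8" becomes CPUs 0-6 and CPU 7 here. `os.sched_setaffinity` is only available on some platforms, so the sketch checks for it before calling.

```python
import os

def split_cpus(total, reserved=1):
    """Return (service_cpus, admin_cpus) as sets of 0-based CPU ids."""
    if not 0 < reserved < total:
        raise ValueError("reserved must be between 1 and total - 1")
    service = set(range(total - reserved))
    admin = set(range(total - reserved, total))
    return service, admin

def pin_current_process(cpus):
    """Pin the calling process to the given CPUs; return False if unsupported."""
    if not hasattr(os, "sched_setaffinity"):  # not available on every platform
        return False
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return True

service_cpus, admin_cpus = split_cpus(8)
print(sorted(service_cpus), sorted(admin_cpus))  # [0, 1, 2, 3, 4, 5, 6] [7]
```

A webserver would call `pin_current_process(service_cpus)` at startup, on a machine that actually has that many cores. The same effect is available from the shell with `taskset`.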
Decouple Your Functionality
Service architecture

Each service does one thing well

Easier to measure, understand, and
accommodate resource demands

Reduce potential for interactions,
cross-functional failure
Decouple from Your Environment with Configuration Management
Decouple from your platform (OS/kernel)
  Easy to test/bench potential candidates
  Easy to migrate if you find a winner
  This is especially important when dealing with cloud
Automate as much of deploy/bootstrap process as possible
  Probably won't help much during a provider outage due to stampede
  BUT: DirectConnect
  You might not always be in the cloud...
Decouple Your Datacenters
Most robust redundancy mechanism
Hot-hot keeps you on your toes

Simplifies, not just for the cloud
  Yahoo! now forgoing datacenter
  features like HVAC
  "If it gets too hot in Washington,
  turn that DC off for a while"
  I'm sure they're not the only ones
Decouple Your Datacenters
"AZ" - Basic building block for EC2

This is the level they (theoretically)
decouple at

They are probably thinking along the
same lines we are - must be able to turn
off one AZ without impact in the other
( there's a hidden interaction there )
Every datacenter as an independent microcosm of your overall architecture
The Birds 'n' the Bees
Bird's Eye View
Photo by reschroederimages
Bird's Eye View
( note the absence of specifics )
Bird's Eye View
Maintenance - Divide & Conquer
Local Degradation - Divide & Conquer
Incompatible Upgrade - Guess!
Incompatible Upgrade - Yay!
Baffling Single Node Failure
202 Accepted
Spike in Write Traffic
Really simple operational steps for stressful tasks & situations
Temporally decouple the problem from the resolution
Go back to sleep




    Photo by joshme17
Now, how about those specifics?
Write Path
ELB
Dynamic Load Balancing

Flexible virtual IP
Easy to add/remove AZs

Uses healthchecks to automatically
evict nodes
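
A rough sketch of health-check-driven eviction, to illustrate the idea rather than ELB's actual behavior. The `Pool` class, its `unhealthy_threshold` parameter, and the node names are hypothetical. A node leaves service after a number of consecutive failed checks and returns as soon as a check passes. Real load balancers usually also require several passing checks before re-admitting a node.

```python
class Pool:
    """Tracks backend nodes and evicts those failing consecutive health checks."""

    def __init__(self, nodes, check, unhealthy_threshold=2):
        self.check = check                    # callable: node -> bool
        self.threshold = unhealthy_threshold
        self.failures = {node: 0 for node in nodes}

    def run_checks(self):
        """Run one round of health checks against every known node."""
        for node in self.failures:
            if self.check(node):
                self.failures[node] = 0       # any success resets the count
            else:
                self.failures[node] += 1

    def in_service(self):
        """Nodes that have not yet hit the failure threshold."""
        return [n for n, f in self.failures.items() if f < self.threshold]

down = {"b"}
pool = Pool(["a", "b", "c"], check=lambda n: n not in down)
pool.run_checks()
pool.run_checks()
print(pool.in_service())  # ['a', 'c']
down.clear()
pool.run_checks()
print(pool.in_service())  # ['a', 'b', 'c']
```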
Gate - "Layer 8 Proxy"
Lightweight Node.js daemon
OAuth

Rate Limiting

Basic routing to actual services
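
The talk does not say how Gate limits request rates. One common technique is a token bucket, sketched here in Python for illustration even though Gate itself is a Node.js daemon. The `TokenBucket` class and its `rate`, `capacity`, and `clock` parameters are assumptions of this sketch. The clock is injectable so the example runs deterministically.

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True and spend a token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

fake_now = [0.0]  # fake clock so the demo does not depend on wall time
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0                          # one second later, one token is back
print(bucket.allow())                      # True
```

A proxy in this position would typically keep one bucket per API key and reject requests when `allow()` returns False.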
Recall
"Decouple Your Functionality"
Services - Pick Your Own Adventure
Node.js and Python
  Some people just hate Node.js

Can be anything, as long as Gate can
talk to it
   ( another reason to decouple )

Highly specialized
RabbitMQ
A grenade for our knife-fight

Very flexible - more than we need
  Simplification candidate

New persistor in >= 1.3 - degradation
over failure

See talk at 1:30PM
Cassandra
A mostly-textbook DHT

Homogeneous distributed model

Random load distribution
Partition tolerance
  A perfect foundation for our
  architecture
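
The "textbook DHT" and "random load distribution" points can be illustrated with a simplified hash ring. This is a sketch of the general technique, not Cassandra's implementation. The `Ring` class, the node names, and the choice to derive node positions by hashing node names are all assumptions of the sketch. Keys are hashed with MD5 to a position on the ring, and each key is stored on the next `n` nodes clockwise from that position.

```python
import bisect
import hashlib

def token(key):
    """Map a string to an integer position on the ring using MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    """A hash ring where every node plays the same role."""

    def __init__(self, nodes):
        # Node positions are derived by hashing names here (an assumption).
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, n=1):
        """Return the n distinct nodes that follow the key's token clockwise."""
        n = min(n, len(self.entries))
        start = bisect.bisect_left(self.tokens, token(key))
        size = len(self.entries)
        return [self.entries[(start + i) % size][1] for i in range(n)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42", n=3))  # three distinct nodes, stable across runs
```

Because the hash scatters keys across the ring, load spreads roughly evenly without a coordinator, and losing a node only affects the keys whose replica set included it.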
Partition Tolerance
It's not just for outages
Recall
"Divide & Conquer"
This too is a partition
Thank You!
@mihasya

pancakes@simplegeo.com
