Carbon Fiber Tank, SpaceX
How to lower the
costs of your Drupal
Site's resources and
plan Capacity in
advance
ricardoamaro sre@acquia
About me
@ricardoamaro
● Principal SRE @Acquia (Cloud Data Team)
● Joined in December 2011
● Location: Lisbon, Portugal
● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly)
● Founder and Lead of the Portuguese Drupal Association
● Fun Facts:
○ Presented at DevOps events, including DrupalCons.
○ Dedicated father of 2 kids and still manages to study and write.
○ First Linux installation: Slackware in 1994.
○ Former theatre actor.
Agenda
What we will be talking about
The problem
What is Capacity
Why do Capacity Planning
Relation to Site Reliability Engineering
Budget & Capacity Planning
Load Testing
Performance Tuning vs. Capacity Planning
What to measure
How to measure
How to track capacity
Forecasting
First Easy Steps
Conclusions
The Problem
Site Launch & User Expectations
Falcon Heavy launch, SpaceX
Typical Drupal Site Launch
What about
Capacity Planning??
- Disable devel
- Configure cron
- Check The Upload Sizes & Execution Time
- Check Recipient Email Addresses
- Set The File Permissions
- Protect Your Root Account
- Check Permissions
- Turn Off Error Reporting
- Handle 404 Errors Gracefully
- Check Robots.txt
- Combine Pathauto With Global Redirect
- Create A Maintenance Page
- Configure Caching
- CSS And JavaScript Optimisation
- Check Unpublished Content Is Not Visible
- Configure Statistics
- Monitor the Site
- ** Plan for Failure **
User Expectations
Drupal click screenshot
● The end goal of capacity
planning is a smooth and
speedy experience for the users
● Expectations vary depending on
the type of application and the
portion of it that users
interact with
No silver bullet
● Plenty of capacity can still mean
a slow or unavailable website
● Capacity is only one part of
making the end-user experience
fast
● We want to measure and track
to make forecasts
● An intolerable amount of latency
should raise a flag
What is
Capacity
resources required to run your services
in the context you have chosen to run them
Carbon Fiber Tank, SpaceX
Capacity in Site Reliability Engineering (SRE)
● Capacity: The maximum amount of output a product deployment is
capable of completing in a given period of time
● Capacity planning: Process that determines the resources needed,
like people, instances, CPU, memory, time and more, for the company
to meet changing demands for its services
● In the Drupal World we focus mostly on serving WEB capacity
Resource management
The Art of Capacity Planning
Arun Kejariwal, John Allspaw
"O'Reilly Media, Inc."
● Ensure proper resources are
available to handle load
● Define procurement and an
approval process
● Justify capital needs
● Manage resources after
deployment
Why do
Capacity Planning
Kroger grocery store, Lexington Kentucky,
1947, by Brett Streutket
Quick and Dirty Math
● Only spend as much as you
actually need
● Be ahead of sharp growth
● Avoid emergencies
Stay Fast and Reliable
Site Reliability
Engineering
Rocket Laboratory, 1952
NASA/William A. Bowles
Ben Treynor - Google
“...an SRE team is responsible for
the availability, latency,
performance, efficiency, change
management, monitoring,
emergency response, and capacity
planning of their service(s)...”
Demand Forecasting and Capacity Planning
● Ensuring that there is sufficient
capacity and redundancy
● Serve projected future demand
with the required availability
● Ensure the required capacity is
in place by the time it is needed
● Take both organic and inorganic
growth into account
https://unsplash.com/photos/mexeVPlTB6k
How SRE advocates for Capacity Planning
● Perform regular load testing
● Incorporate SLOs on Capacity
● Capacity is critical to
availability, therefore the SRE
team leads capacity planning
initiatives and provisioning
https://unsplash.com/photos/DX9X0g0Cg88
Budget & Capacity Planning
Vintage Grow Your Money
by Chris Potter, ccPixs.com
Keeping the costs low
● Meet with Finance, Engineering
and Product
● Gather Systems and Application
metrics
● Use that data to justify the
investment
Three forces that impact Capacity Planning
Product
Finance
Engineering
Plan
Load Testing
“Hope is not a strategy”
St. Margrethen - Load Test by Kecko
Load testing a Drupal stack
● How to load test?
“Hit it until it breaks”
● Include the points of failure in
the calculations
● Determining backend limits can
be tricky
● Use those resource ceilings as a
basis while predicting future
growth
https://docs.acquia.com/acquia-cloud/arch/
Database Backend Load Test
➔ How many queries/second (QPS)
can the DB server manage?
➔ How many QPS can it serve
before performance
degradation affects end-user
experience?
● What load will cause the
database to become unresponsive
or fail over? Knowing this lets us
set alert thresholds accordingly.
● What to expect from adding (or
removing) nodes to the
backend?
● When to begin sizing for a new
database capacity?
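The questions above can be answered from the results of a stepped load test. The following is a minimal sketch, assuming each step records the offered QPS, the 95th percentile query latency and the error rate; the `Step` type, the sample numbers and the 200 ms / 1% limits are all hypothetical and should be replaced by whatever your own test and SLOs provide.

```python
from dataclasses import dataclass


@dataclass
class Step:
    qps: float         # offered load during this step of the test
    p95_ms: float      # observed 95th percentile query latency
    error_rate: float  # fraction of failed queries (0.0 to 1.0)


def usable_ceiling(steps, max_p95_ms=200.0, max_error_rate=0.01):
    """Highest tested QPS reached before latency or errors cross the limits."""
    ceiling = 0.0
    for step in sorted(steps, key=lambda s: s.qps):
        if step.p95_ms > max_p95_ms or step.error_rate > max_error_rate:
            break  # degradation starts here; stop at the previous step
        ceiling = step.qps
    return ceiling


def alert_threshold(ceiling, headroom=0.2):
    """Alert before the ceiling is reached, keeping some headroom."""
    return ceiling * (1 - headroom)


# Hypothetical results of a "hit it until it breaks" run against the database
steps = [
    Step(500, 12, 0.0),
    Step(1000, 18, 0.0),
    Step(1500, 45, 0.0),
    Step(2000, 240, 0.002),  # latency limit exceeded
    Step(2500, 900, 0.08),   # effectively unresponsive
]

ceiling = usable_ceiling(steps)
print("usable ceiling (QPS):", ceiling)
print("alert threshold (QPS):", alert_threshold(ceiling))
```

Repeating the same run after adding or removing a backend node gives a second ceiling to compare against, which is one way to approach the "what to expect from adding nodes" question.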
A Few Load testing Tools
simulate
● Loadrunner
○ http://bit.ly/microfocus-loadrunner
● Iago
○ https://github.com/twitter/iago
● JMeter
○ http://jmeter.apache.org/
collect
● Prometheus
○ http://www.prometheus.io/
● Signalfx
○ http://www.signalfx.com/
● Cacti
○ http://cacti.net
● Ganglia
○ http://ganglia.info
● Nagios
○ http://nagios.org/
https://www.gocomics.com/calvinandhobbes/1986/11/26
Performance Tuning
vs. Capacity planning
(different goals)
Top Speed
by Alexander Nie
What to measure
defining the metrics
End-of-life
by Dennis van Zuijlekom
Divide & Conquer
● Splitting nodes
● Understand capacity demands
of each node
● Measure more distinctly
● How requests or queries per
second affect resources
Identifying the key resources to measure
● Disk space (MB)
● Disk throughput (IOPS)
● CPU performance (FLOPS)
● RAM memory (MB)
● Network bandwidth (Mbps)
● Network IP pool (Netmask)
● Others
How to measure
Living Computer Museum, Seattle
http://www.brendangregg.com/Perf/linux_perf_tools_full.png
| Tools to measure on Linux servers |
Collecting resources on web servers
TODO: CODE
● Example script that
sends metrics to statsd
● Low footprint using
/proc, df and ps
● For a constant reliable
monitoring service use
collectd: https://collectd.org
or Telegraf:
https://www.influxdata.com/time-series-platform/telegraf/
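The script shown on this slide is not part of the transcript, so the following is only a sketch of the same idea, not the author's code. It reads `/proc/loadavg` and `/proc/meminfo`, uses `shutil.disk_usage` as a stand-in for `df`, and sends statsd gauges over UDP. The statsd address, the `web01` prefix and the metric names are assumptions; `ps`-based process metrics are left out.

```python
import shutil
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # assumption: statsd listening on its usual UDP port
PREFIX = "web01"                   # hypothetical metric prefix


def parse_loadavg(text):
    """Return the 1-minute load average from the content of /proc/loadavg."""
    return float(text.split()[0])


def parse_meminfo(text):
    """Return {field name: integer value} from the content of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def gauge(name, value):
    """Format one statsd gauge datagram: <prefix>.<name>:<value>|g"""
    return f"{PREFIX}.{name}:{value}|g".encode("ascii")


def collect():
    """Read a few cheap metrics; Linux only because it relies on /proc."""
    with open("/proc/loadavg") as f:
        load1 = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    disk = shutil.disk_usage("/")
    return {
        "load1": load1,
        "mem_available_kb": mem.get("MemAvailable", 0),
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


def send(metrics, addr=STATSD_ADDR):
    """Send each metric as its own UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for name, value in metrics.items():
            sock.sendto(gauge(name, value), addr)
    finally:
        sock.close()


if __name__ == "__main__":
    send(collect())
```

Run from cron every minute, this gives a rough time series at very little cost; for continuous collection the slide's advice to use collectd or Telegraf still applies.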
How to track Capacity
Store and display time-series
● Signalfx
● Cacti
● Ganglia
● Graphite
● Datadog
● Ruxit
● LogicMonitor
● Sematext
● CoScale
● Riemann
● Prometheus
● Sensu
● Idera
● Bijk
● X-Pack
● vRealize Hyperic HQ
A couple of load testing tips
load testing Tutorials:
https://www.tutorialspoint.com/jmeter
https://www.blazemeter.com/load-testing
docker app for grafana:
https://github.com/kamon-io/docker-grafana-graphite
Forecasting
(predicting trends)
Numbers And Finance by SeniorLiving.org
Predict the future?
● Use Context & Math
● Make educated guesses
● Long-term view is generally
steady
● Generate estimates to sustain
growth
● Use an adjustable process
● Forecast guides autoscaling
policies
Ceilings and Historical data
● Daily storage consumption
example
● Metric: total available disk space
● Cumulative total provides a
historical perspective
● We can predict future needs
● Storage will probably be
exhausted where the trend line
meets the ceiling
Curve fitting
● Curve fitting
● Creative & Scientific
● Stay ahead of growth
● Use time-series data
● Forecast by constructing new
data points beyond the known
● Reconciliation of what we know
and the best fit equation
● Consider context before math
y = mx+b
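For the linear case y = mx+b, the best-fit line can be computed with ordinary least squares and then extended to the resource ceiling. A minimal sketch, assuming x is the day number and y is the disk usage on that day; the sample data and the ceiling of 30 are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def day_ceiling_reached(m, b, ceiling):
    """Solve m*x + b = ceiling for x; None if usage is not growing."""
    if m <= 0:
        return None
    return (ceiling - b) / m


# Hypothetical daily disk usage (same unit as the ceiling)
days = [0, 1, 2, 3]
used = [10, 12, 14, 16]

m, b = fit_line(days, used)
print("slope per day:", m, "intercept:", b)
print("ceiling reached on day:", day_ceiling_reached(m, b, ceiling=30))
```

A straight line is only a starting point: as the slide says, consider context before math, and switch to another curve when the data clearly bends.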
Forecasting Peak-Driven Resource Usage
● Track how the peaks change over time
● Extrapolate from that data to predict
future needs
● Identify the server resource ceilings
● Find a relation between resources and
application-level work
● Decide if we should scale vertically or
horizontally
● and perform proactive autoscaling
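The steps above can be sketched in a few lines: reduce raw samples to one peak per day, extrapolate the peaks, and compare the forecast with a per-node ceiling. The timestamps, values and the one-week horizon are hypothetical, the days are assumed to be consecutive, and the growth estimate is deliberately naive (first peak to last peak).

```python
import math
from datetime import datetime


def daily_peaks(samples):
    """samples: iterable of (ISO timestamp, value). Returns peaks ordered by day."""
    peaks = {}
    for timestamp, value in samples:
        day = datetime.fromisoformat(timestamp).date()
        if day not in peaks or value > peaks[day]:
            peaks[day] = value
    return [peaks[day] for day in sorted(peaks)]


def nodes_needed(forecast_peak, per_node_ceiling):
    """Horizontal scaling: how many nodes cover the forecast peak."""
    return math.ceil(forecast_peak / per_node_ceiling)


# Hypothetical "busy calls" samples from the web tier
samples = [
    ("2019-10-01T09:00:00", 96), ("2019-10-01T20:00:00", 131),
    ("2019-10-02T09:00:00", 101), ("2019-10-02T20:00:00", 138),
    ("2019-10-03T20:00:00", 145),
]

peaks = daily_peaks(samples)
growth_per_day = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
forecast = peaks[-1] + growth_per_day * 7  # expected peak one week out

print("daily peaks:", peaks)
print("forecast peak in 7 days:", forecast)
print("nodes needed at 80 busy calls per node:", nodes_needed(forecast, 80))
```

If the forecast exceeds what the current node count can serve, that is the signal to scale before the peak arrives rather than after it.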
● Fityk is open source
software for nonlinear fitting
of analytical functions to data.
● Incorporate cfityk scripts into
automated curve fitting, like:
cfityk ricardo-disk.fit
@0 < ricardo-disk.csv
guess Quadratic
fit
info formula
quit
Returns the formula:
4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2
Homepage: https://fityk.nieto.pl/
Automating Forecasts with fityk & cfityk
Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
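Once cfityk has returned a formula, it can be evaluated to find where the fitted curve meets a ceiling. The sketch below copies the coefficients from the formula on the slide and solves the resulting quadratic; the ceiling of 20000 is hypothetical, and the units of x and y are whatever the original CSV used, which the slide does not state.

```python
import math


def forecast(x):
    """The fitted formula as reported by cfityk on the slide."""
    return 4888.18 + 363.063 * x + 8.91132 + -1.55119 * x + 0.0660771 * x ** 2


def first_x_reaching(ceiling):
    """Smallest non-negative x where the fitted curve equals the ceiling."""
    # Collect the formula's terms into a + b*x + c*x^2
    a = 4888.18 + 8.91132
    b = 363.063 - 1.55119
    c = 0.0660771
    disc = b * b - 4 * c * (a - ceiling)
    if disc < 0:
        return None  # the curve never reaches this value
    root = math.sqrt(disc)
    candidates = [(-b - root) / (2 * c), (-b + root) / (2 * c)]
    future = [x for x in candidates if x >= 0]
    return min(future) if future else None


x = first_x_reaching(20000)  # hypothetical ceiling
print("ceiling reached at x =", x)
```

Wiring this to the output of a scheduled cfityk run would turn the fit into a date that can be alerted on.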
Forecasting with Machine Learning
Seeking SRE
Conversations About
Running Production Systems
at Scale
Publisher: O'Reilly Media
● Most popular method for
curve-fitting in fityk is
Levenberg-Marquardt
● ML is also an option for
forecasting (book I co-authored)
● Code examples and guides
https://github.com/ricardoamaro/MachineLearning4SRE
Start with Easy Steps
Get Started
1. Select a process owner.
2. Identify the resources to be measured.
3. Measure these resources.
4. Compare to maximum capacity.
5. Collect workload forecasts.
6. Use forecasts for IT resource requirements.
7. Map requirements onto existing utilizations.
8. Predict when the system will be out of capacity.
9. Update forecasts and utilizations.
Set a Goal!
● Two Classes:
○ Load: usually expressed as
arrival rate or peak rate of
requests hitting the service
e.g. a target of 10,000 concurrent authenticated
Drupal users
○ Performance: usually expressed
in the form of Service Level
Objectives
e.g. the 99th percentile of all requests should return
in less than 500 ms
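A performance goal of this kind can be checked mechanically against measured request latencies. A minimal sketch using the nearest-rank percentile definition (other definitions interpolate and can give slightly different values); the latency samples are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms, pct=99, limit_ms=500):
    """True if the pct-th percentile latency is below the limit."""
    return percentile(latencies_ms, pct) < limit_ms


# Hypothetical latencies (ms) for 100 requests
good = [100] * 98 + [450, 900]
bad = [100] * 97 + [450, 600, 900]

print("p99 good:", percentile(good, 99), "meets SLO:", meets_slo(good))
print("p99 bad:", percentile(bad, 99), "meets SLO:", meets_slo(bad))
```

Note that the single 900 ms outlier does not break the objective in the first sample set; a 99th percentile target tolerates the slowest 1% of requests by design.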
Be proactive
( plan & document ahead)
Picasso drawing with Paloma and Claude at Villa la Galloise, 1953.
By Edward Quinn, EdwardQuinn.com.
Capacity Planning Dashboard
● Support your conclusions with
metrics in a dashboard
● Both manual scaling and auto
scaling decisions should be based
on real data
● When to scale?
○ date and time (be alerted if needed)
● How to scale?
○ vertical, horizontal or diagonal scaling
(Example) Drupal Cluster Dashboard
type          | value | limit/node | ceiling units | limit (total) | current (peak) | peak % | Estimated days left
Varnish cache | 28    | 1024       | req/sec       | 2048          | 600            | 29%    | 830
Web           | 31    | 80         | busy calls    | 160           | 145            | 90%    | 12
Database      | 15    | 60         | connections   | 120           | 96             | 80%    | 36
Storage       | 14    | 30         | TB            | 30            | 14             | 46%    | 21
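The derived columns of such a dashboard are simple arithmetic. The sketch below recomputes "peak %" from the total limit and current peak figures in the example, rounding down as the example appears to do. The slide does not give the growth rates behind "Estimated days left", so `days_left` takes a growth rate as an input, and the 1.25 per day used below is purely hypothetical.

```python
# (type, limit total, current peak) taken from the example dashboard
rows = [
    ("Varnish cache", 2048, 600),
    ("Web", 160, 145),
    ("Database", 120, 96),
    ("Storage", 30, 14),
]


def peak_pct(limit_total, current_peak):
    """Peak utilisation as a whole percentage, rounded down."""
    return 100 * current_peak // limit_total


def days_left(limit_total, current_peak, growth_per_day):
    """Days until the peak reaches the limit at a constant daily growth."""
    if growth_per_day <= 0:
        return None  # not growing, so no exhaustion date
    return (limit_total - current_peak) / growth_per_day


for name, limit_total, current_peak in rows:
    print(f"{name}: {peak_pct(limit_total, current_peak)}% of ceiling")

# Hypothetical growth of 1.25 busy calls per day on the web tier
print("web days left:", days_left(160, 145, 1.25))
```

In practice the growth rate would come from the forecasting step rather than a constant, and the dashboard would be refreshed as new peaks arrive.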
Conclusions
Drive the system to the appropriate level of risk for the lowest cost.
Join us for
contribution opportunities
Thursday, October 31, 2019
Mentored Contribution
9:00-18:00
Room: Europe Foyer 2
First Time Contributor Workshop
9:00-14:00
Room: Diamond Lounge
General Contribution
9:00-18:00
Room: Europe Foyer 2
#DrupalContributions

Capacity Planning Infrastructure for Web Applications (Drupal)

  • 1. Carbon Fiber Tank, SpaceX How to lower the costs of your Drupal Site's resources and plan Capacity in advance ricardoamaro sre@acquia
  • 2. About me @ricardoamaro ● Principal SRE @Acquia (Cloud Data Team) ● Joined in December 2011 ● Location: Lisbon, Portugal ● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly) ● Founder and Lead of the Portuguese Drupal Association ● Fun Facts: ○ Presented in DevOps events including DrupalCons. ○ Dedicated father of 2 kids and still manages to study and write. ○ First Linux installation: Slackware in 1994. ○ Former theatre actor.
  • 3. Agenda What we will be talking about The problem What is Capacity Why do Capacity Planning Relation to Site Reliability Engineering Budget & Capacity Planning Load Testing Performance Tuning vs. Capacity Planning What to measure How to measure How to track capacity Forecasting First Easy Steps Conclusions
  • 4. The Problem Site Launch & User Expectations Falcon Heavy launch, Spacex
  • 5. Typical Drupal Site Launch What about Capacity Planning?? - Disable devel - Configure cron - Check The Upload Sizes & Execution Time - Check Recipient Email Addresses - Set The File Permissions - Protect Your Root Account - Check Permissions - Turn Off Error Reporting - Handle 404 Errors Gracefully - Check Robots.txt - Combine Pathauto With Global Redirect - Create A Maintenance Page - Configure Caching - Css And Javascript Optimisation - Check Unpublished Content Is Not Visible - Configure Statistics - Monitor the Site - ** Plan for Failure **
  • 6. User Expectations Drupal click screenshot ● The end goal of capacity planning is a smooth and speedy experience for the users ● Varies depending on what type of application is and what portion of the application they interact with
  • 7. No silver bullet ● Plenty of capacity but a slow website or unavailable ● Capacity is only one part of making the end-user experience fast ● We want to measure and track to make forecasts ● Intolerable amount of latency should raise a flag
  • 8. What is Capacity resources required to run your services in the context you have chosen to run them Carbon Fiber Tank, SpaceX
  • 9. Capacity in Site Reliability Engineering (SRE) ● Capacity: The maximum amount of output a product deployment is capable of completing in a given period of time ● Capacity planning: Process that determines the resources needed, like people, instances, CPU, memory, time and more, for the company to meet changing demands for its services ● In the Drupal World we focus mostly on serving WEB capacity
  • 10. Resource management The Art of Capacity Planning Arun Kejariwal, John Allspaw "O'Reilly Media, Inc." ● Ensure proper resources are available to handle load ● Define procurement and an approval process ● Justify capital needs ● Manage resources after deployment
  • 11. Why do Capacity Planning Kroger grocery store, Lexington Kentucky, 1947, by Brett Streutket
  • 12. Quick and Dirty Math ● Only spend as much as you actually need ● Be ahead of sharp growth ● Avoid emergencies Stay Fast and Reliable
  • 13. Site Reliability Engineering Rocket Laboratory, 1952 NASA/William A. Bowles
  • 14. Ben Treynor - Google ...an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)... “ “
  • 15. Demand Forecasting and Capacity Planning ● Ensuring that there is sufficient capacity and redundancy ● Serve projected future demand with the required availability ● Ensure the required capacity is in place by the time it is needed ● Take both organic and inorganic growth into account https://unsplash.com/photos/mexeVPlTB6k
  • 16. How SRE advocates for Capacity Planning ● Perform regular load testing ● Incorporate SLOs on Capacity ● Capacity is critical to availability, therefore the SRE team leads capacity planning initiatives and provisioning https://unsplash.com/photos/DX9X0g0Cg88
  • 17. Budget & Capacity Planning Vintage Grow Your Money by Chris Potter, ccPixs.com
  • 18. Keeping the costs low ● Meet with Finance, Engineering and Product ● Gather Systems and Application metrics ● Use that data to justify the investment Three forces that impact Capacity Planning Product FinanceEngineering Plan
  • 19. Load Testing “Hope is not a strategy” St. Margrethen - Load Test by Kecko
  • 20. Load testing a Drupal stack ● How to load test? “Hit it until it breaks” ● Include the points of failure in the calculations ● Determining backend limits can be tricky ● Use those resource ceilings as a basis while predicting future growth https://docs.acquia.com/acquia-cloud/arch/
  • 21. Database Backend Load Test ➔ How many queries/second (QPS) can the DB server manage? ➔ How many QPS can it serve before performance degradation affects end-user experience? ● What load will cause the database to be unresponsive or fail-over? Allowing to set alert thresholds accordingly. ● What to expect from adding (or removing) nodes to the backend? ● When to begin sizing for a new database capacity?
  • 22. A Few Load Testing Tools ● Simulate: ○ Loadrunner http://bit.ly/microfocus-loadrunner ○ Iago https://github.com/twitter/iago ○ JMeter http://jmeter.apache.org/ ● Collect: ○ Prometheus http://www.prometheus.io/ ○ Signalfx http://www.signalfx.com/ ○ Cacti http://cacti.net ○ Ganglia http://ganglia.info ○ Nagios http://nagios.org/ https://www.gocomics.com/calvinandhobbes/1986/11/26
  • 23. Performance Tuning vs. Capacity planning (different goals) Top Speed by Alexander Nie
  • 24. What to measure defining the metrics End-of-life by Dennis van Zuijlekom
  • 25. Divide & Conquer ● Splitting nodes ● Understand capacity demands of each node ● Measure more distinctly ● How requests or queries per second affect resources
  • 26. Identifying the key resources to measure ● Disk space (MB) ● Disk throughput (IOPS) ● CPU performance (FLOPS) ● RAM memory (MB) ● Network bandwidth (Mbps) ● Network IP pool (Netmask) ● Others
  • 27. How to measure Living Computer Museum, Seattle
  • 29. Collecting resources on web servers ● Example script that sends metrics to statsd ● Low footprint using /proc, df and ps ● For a continuous, reliable monitoring service use collectd: https://collectd.org or Telegraf: https://www.influxdata.com/time-series-platform/telegraf/
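The slide only names the approach, so what follows is a minimal sketch of a low-footprint collector, not the script from the talk. It assumes a Linux host, a statsd daemon listening on UDP port 8125 (its conventional default), and metric names such as `web.mem.available_kb`, which are hypothetical. It uses `shutil.disk_usage` in place of shelling out to `df`.

```python
import shutil
import socket


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value in kB} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def gauge(name, value):
    """Format one metric in the plain-text statsd gauge format."""
    return f"{name}:{value}|g"


def collect(meminfo_text, loadavg_text, disk_path="/"):
    """Build gauge lines for memory, load and disk (metric names are hypothetical)."""
    mem = parse_meminfo(meminfo_text)
    disk = shutil.disk_usage(disk_path)
    return [
        gauge("web.mem.available_kb", mem.get("MemAvailable", 0)),
        gauge("web.load.1min", float(loadavg_text.split()[0])),
        gauge("web.disk.used_mb", disk.used // (1024 * 1024)),
    ]


def send(lines, host="127.0.0.1", port=8125):
    """Send each gauge line to statsd as its own UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))


# Typical use on a Linux web node, e.g. from cron once a minute:
#   meminfo = open("/proc/meminfo").read()
#   loadavg = open("/proc/loadavg").read()
#   send(collect(meminfo, loadavg))
```

Keeping parsing separate from sending makes the collector easy to test without a running statsd. UDP is fire-and-forget, so a missing daemon does not block the web node.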
  • 30. How to track Capacity
  • 31. Store and display time-series ● Signalfx ● Cacti ● Ganglia ● Graphite ● Datadog ● Ruxit ● LogicMonitor ● Sematext ● CoScale ● Riemann ● Prometheus ● Sensu ● Idera ● Bijk ● X-Pack ● vRealize Hyperic HQ
  • 32. A couple of load testing tips ● Load testing tutorials: https://www.tutorialspoint.com/jmeter https://www.blazemeter.com/load-testing ● Docker app for Grafana: https://github.com/kamon-io/docker-grafana-graphite
  • 33. Forecasting (predicting trends) Numbers And Finance by SeniorLiving.org
  • 34. Predict the future? ● Use Context & Math ● Make educated guesses ● Long-term view is generally steady ● Generate estimates to sustain growth ● Use an adjustable process ● Forecast guides autoscaling policies
  • 35. Ceilings and Historical data ● Daily storage consumption example ● Metric: total available disk space ● Cumulative total provides a historical perspective ● We can predict future needs ● Storage will probably be exhausted at the point where the trend line meets the ceiling
  • 36. Curve fitting ● Creative & Scientific ● Stay ahead of growth ● Use time-series data ● Forecast by constructing new data points beyond the known ● Reconciliation of what we know and the best fit equation ● Consider context before math ● y = mx + b
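The straight-line case, y = mx + b, can be sketched in a few lines: fit the line to historical usage by ordinary least squares, then solve for the day the line reaches the ceiling. The sample data and the 200 GB ceiling below are made up for illustration, and a real series may need a non-linear fit.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def day_ceiling_reached(m, b, ceiling):
    """Solve m*x + b = ceiling for x; returns None if usage is not growing."""
    if m <= 0:
        return None
    return (ceiling - b) / m


# Made-up sample: disk used (GB) on days 0..4, growing 10 GB per day.
days = [0, 1, 2, 3, 4]
used_gb = [100, 110, 120, 130, 140]
m, b = linear_fit(days, used_gb)
print(day_ceiling_reached(m, b, ceiling=200))  # day on which 200 GB is reached
```

With these numbers the fitted slope is 10 GB/day and the intercept 100 GB, so the line crosses 200 GB on day 10.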
  • 37. Forecasting Peak-Driven Resource Usage ● Track how the peaks change over time ● Extrapolate from that data to predict future needs ● Identify the server resource ceilings ● Find a relation between resources and application-level work ● Decide if we should scale vertically or horizontally ● Perform proactive autoscaling
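A rough sketch of the first and last steps above: reduce raw samples to one peak per day, then translate a forecast peak into a node count. The 20% headroom and the sample values are assumptions for illustration. The forecast peak itself would come from whatever extrapolation method is in use.

```python
import math
from collections import defaultdict


def daily_peaks(samples):
    """samples: iterable of (day, value) with non-negative values.

    Returns [(day, peak)] sorted by day.
    """
    peaks = defaultdict(float)
    for day, value in samples:
        peaks[day] = max(peaks[day], value)
    return sorted(peaks.items())


def nodes_needed(forecast_peak, per_node_ceiling, headroom=0.2):
    """Nodes required to serve forecast_peak while keeping spare headroom."""
    usable = per_node_ceiling * (1 - headroom)
    return math.ceil(forecast_peak / usable)


# Made-up samples: busy web workers observed twice a day.
samples = [(1, 90), (1, 120), (2, 100), (2, 135)]
print(daily_peaks(samples))
# Forecast peak of 200 busy workers, ceiling of 80 per node (illustrative).
print(nodes_needed(200, 80))
```

Tracking the daily peaks rather than averages matters because capacity is exhausted at the peak, not at the mean.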
  • 38. Automating Forecasts with fityk & cfityk ● Fityk is open source software for nonlinear fitting of analytical functions to data. ● Incorporate cfityk scripts into automated curve fitting, like: cfityk ricardo-disk.fit @0 < ricardo-disk.csv guess Quadratic fit info formula quit ● Returns the formula: 4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2 ● Homepage: https://fityk.nieto.pl/ ● Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
  • 39. Forecasting with Machine Learning ● The most popular method for curve fitting in fityk is Levenberg-Marquardt ● ML is also an option for forecasting (book I co-authored: Seeking SRE: Conversations About Running Production Systems at Scale, O'Reilly Media) ● Code examples and guides: https://github.com/ricardoamaro/MachineLearning4SRE
  • 41. Get Started 1. Select a process owner. 2. Identify the resources to be measured. 3. Measure these resources. 4. Compare to maximum capacity. 5. Collect workload forecasts. 6. Use forecasts for IT resource requirements. 7. Map requirements onto existing utilizations. 8. Predict when the system will be out of capacity. 9. Update forecasts and utilizations.
  • 42. Set a Goal! ● Two Classes: ○ Load: usually expressed as the arrival rate or peak rate of requests hitting the service, e.g. a target of 10,000 concurrent authenticated Drupal users ○ Performance: usually expressed in the form of Service Level Objectives, e.g. the 99th percentile of all requests should return in less than 500 ms
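A performance goal like the one above can be checked mechanically against measured latencies. This is a small sketch using the nearest-rank percentile definition, which is one of several in common use. The function names and sample latencies are illustrative.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms, pct=99, threshold_ms=500):
    """True if the pct-th percentile latency is below threshold_ms."""
    return percentile(latencies_ms, pct) < threshold_ms


# Made-up latencies: 98 fast requests and 2 slow ones out of 100.
latencies = [100] * 98 + [600, 700]
print(percentile(latencies, 99))
print(meets_slo(latencies))
```

With two slow requests in a hundred, the 99th percentile lands on a slow request, so this sample misses the 500 ms objective even though the median is fast.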
  • 43. Be proactive ( plan & document ahead) Picasso drawing with Paloma and Claude at Villa la Galloise, 1953. By Edward Quinn, EdwardQuinn.com.
  • 44. Capacity Planning Dashboard ● Support your conclusions with metrics in a dashboard ● Both manual scaling and autoscaling decisions should be based on real data ● When to scale? ○ date and time (be alerted if needed) ● How to scale? ○ vertical, horizontal or diagonal scaling
    (Example) Drupal Cluster Dashboard
    type          | value | limit/node | ceiling units | limit (total) | current (peak) | peak % | estimated days left
    Varnish cache | 28    | 1024       | req/sec       | 2048          | 600            | 29%    | 830
    Web           | 31    | 80         | busy calls    | 160           | 145            | 90%    | 12
    Database      | 15    | 60         | connections   | 120           | 96             | 80%    | 36
    Storage       | 14    | 30         | TB            | 30            | 14             | 46%    | 21
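The derived columns of such a dashboard are simple arithmetic, sketched below. The peak percentages are recomputed from the example's own totals and peaks, truncated to whole percent as the slide appears to do. The growth rate passed to `days_left` is a hypothetical input, since the slide does not say how its estimates were produced.

```python
def peak_percent(current_peak, limit_total):
    """Peak utilisation as a whole percentage, truncated."""
    return current_peak * 100 // limit_total


def days_left(current_peak, limit_total, growth_per_day):
    """Days until the ceiling at a constant daily growth; None if not growing."""
    if growth_per_day <= 0:
        return None
    return (limit_total - current_peak) / growth_per_day


# (type, limit total, current peak) taken from the example dashboard.
rows = [
    ("Varnish cache", 2048, 600),
    ("Web", 160, 145),
    ("Database", 120, 96),
    ("Storage", 30, 14),
]
for name, limit, peak in rows:
    print(name, f"{peak_percent(peak, limit)}%")

# Hypothetical growth of 1.25 busy calls per day on the web tier.
print(days_left(145, 160, 1.25))
```

Sorting the rows by days left, rather than by peak percentage, is one way to surface the tier that needs attention first.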
  • 45. Conclusions Drive the system to the appropriate level of risk for the lowest cost.
  • 46. Join us for contribution opportunities Thursday, October 31, 2019 ● Mentored Contribution 9:00-18:00 Room: Europe Foyer 2 ● First Time Contributor Workshop 9:00-14:00 Room: Diamond Lounge ● General Contribution 9:00-18:00 Room: Europe Foyer 2 #DrupalContributions