Capacity Planning Infrastructure for Web Applications (Drupal)

Carbon Fiber Tank, SpaceX
How to lower the
costs of your Drupal
Site's resources and
plan Capacity in
advance
ricardoamaro sre@acquia

About me
@ricardoamaro
● Principal SRE @Acquia (Cloud Data Team)
● Joined in December 2011
● Location: Lisbon, Portugal
● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly)
● Founder and Lead of the Portuguese Drupal Association
● Fun Facts:
○ Presented at DevOps events, including DrupalCons.
○ Dedicated father of 2 kids and still manages to study and write.
○ First Linux installation: Slackware in 1994.
○ Former theatre actor.

Agenda
What we will be talking about
The problem
What is Capacity
Why do Capacity Planning
Relation to Site Reliability Engineering
Budget & Capacity Planning
Load Testing
Performance Tuning vs. Capacity Planning
What to measure
How to measure
How to track capacity
Forecasting
First Easy Steps
Conclusions

The Problem
Site Launch & User Expectations
Falcon Heavy launch, SpaceX

Typical Drupal Site Launch
What about
Capacity Planning??
- Disable devel
- Configure cron
- Check The Upload Sizes & Execution Time
- Check Recipient Email Addresses
- Set The File Permissions
- Protect Your Root Account
- Check Permissions
- Turn Off Error Reporting
- Handle 404 Errors Gracefully
- Check Robots.txt
- Combine Pathauto With Global Redirect
- Create A Maintenance Page
- Configure Caching
- Css And Javascript Optimisation
- Check Unpublished Content Is Not Visible
- Configure Statistics
- Monitor the Site
** Plan for Failure **

User Expectations
Drupal click screenshot
● The end goal of capacity
planning is a smooth and
speedy experience for the users
● Expectations vary depending on the
type of application and the
portion of the application users
interact with

No silver bullet
● Plenty of capacity can still mean
a slow or unavailable website
● Capacity is only one part of
making the end-user experience
fast
● We want to measure and track
to make forecasts
● Intolerable amount of latency
should raise a flag

What is
Capacity
resources required to run your services
in the context you have chosen to run them
Carbon Fiber Tank, SpaceX

Capacity in Site Reliability Engineering (SRE)
● Capacity: The maximum amount of output a product deployment is
capable of completing in a given period of time
● Capacity planning: Process that determines the resources needed,
like people, instances, CPU, memory, time and more, for the company
to meet changing demands for its services
● In the Drupal World we focus mostly on serving WEB capacity

Resource management
The Art of Capacity Planning
Arun Kejariwal, John Allspaw
"O'Reilly Media, Inc."
● Ensure proper resources are
available to handle load
● Define procurement and an
approval process
● Justify capital needs
● Manage resources after
deployment

Why do
Capacity Planning
Kroger grocery store, Lexington Kentucky,
1947, by Brett Streutket

Quick and Dirty Math
● Only spend as much as you
actually need
● Be ahead of sharp growth
● Avoid emergencies
Stay Fast and Reliable

Site Reliability
Engineering
Rocket Laboratory, 1952
NASA/William A. Bowles

Ben Treynor - Google
“...an SRE team is responsible for
the availability, latency,
performance, efficiency, change
management, monitoring,
emergency response, and capacity
planning of their service(s)...”

Demand Forecasting and Capacity Planning
● Ensuring that there is sufficient
capacity and redundancy
● Serve projected future demand
with the required availability
● Ensure the required capacity is
in place by the time it is needed
● Take both organic and inorganic
growth into account
https://unsplash.com/photos/mexeVPlTB6k

How SRE advocates for Capacity Planning
● Perform regular load testing
● Incorporate SLOs on Capacity
● Capacity is critical to
availability, therefore the SRE
team leads capacity planning
initiatives and provisioning
https://unsplash.com/photos/DX9X0g0Cg88

Budget & Capacity Planning
Vintage Grow Your Money
by Chris Potter, ccPixs.com

Keeping the costs low
● Meet with Finance, Engineering
and Product
● Gather Systems and Application
metrics
● Use that data to justify the
investment
Three forces that impact Capacity Planning
Product
Finance
Engineering
Plan

Load Testing
“Hope is not a strategy”
St. Margrethen - Load Test by Kecko

Load testing a Drupal stack
● How to load test?
“Hit it until it breaks”
● Include the points of failure in
the calculations
● Determining backend limits can
be tricky
● Use those resource ceilings as a
basis while predicting future
growth
https://docs.acquia.com/acquia-cloud/arch/

Database Backend Load Test
➔ How many queries/second (QPS)
can the DB server manage?
➔ How many QPS can it serve
before performance
degradation affects end-user
experience?
● What load will cause the
database to become unresponsive
or fail over? Knowing this lets us
set alert thresholds accordingly.
● What to expect from adding (or
removing) nodes to the
backend?
● When to begin sizing for a new
database capacity?
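
A minimal sketch of how the first two questions could be answered from load-test results. The step data, the 50 ms latency limit and the 80% alert factor are hypothetical values chosen for illustration, not measurements from the talk.

```python
def max_safe_qps(steps, latency_limit_ms):
    """Highest tested QPS whose latency stayed within the limit.

    Stops at the first step that breaches the limit, because results
    past the knee of the curve are not a safe basis for planning.
    """
    safe = None
    for qps, latency_ms in sorted(steps):
        if latency_ms > latency_limit_ms:
            break
        safe = qps
    return safe


# Hypothetical load-test steps: (queries per second, observed latency in ms).
steps = [(500, 12.0), (1000, 14.0), (1500, 19.0), (2000, 45.0), (2500, 180.0)]

ceiling = max_safe_qps(steps, latency_limit_ms=50.0)
alert_at = 0.8 * ceiling  # assumed policy: alert at 80% of the safe ceiling
print(ceiling, alert_at)
```

The same ceiling can then serve as the reference point when sizing for added or removed nodes.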

A Few Load Testing Tools
simulate
● LoadRunner
○ http://bit.ly/microfocus-loadrunner
● Iago
○ https://github.com/twitter/iago
● JMeter
○ http://jmeter.apache.org/
collect
● Prometheus
○ http://www.prometheus.io/
● SignalFx
○ http://www.signalfx.com/
● Cacti
○ http://cacti.net
● Ganglia
○ http://ganglia.info
● Nagios
○ http://nagios.org/
https://www.gocomics.com/calvinandhobbes/1986/11/26

Performance Tuning
vs. Capacity planning
(different goals)
Top Speed
by Alexander Nie

What to measure
defining the metrics
End-of-life
by Dennis van Zuijlekom

Divide & Conquer
● Splitting nodes
● Understand capacity demands
of each node
● Measure more distinctly
● How requests or queries per
second affect resources
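
One way to quantify how requests per second affect a resource on a single node is to fit a straight line through paired samples. This is a sketch only: the sample values and the 80% CPU ceiling are assumptions, and a linear relation is itself an assumption that should be checked against real data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def rps_at_ceiling(rps, cpu_pct, cpu_ceiling=80.0):
    """Requests per second at which the fitted line reaches the CPU ceiling."""
    slope, intercept = fit_line(rps, cpu_pct)
    return (cpu_ceiling - intercept) / slope


# Hypothetical samples from one web node: requests/sec and CPU utilization (%).
rps = [50, 100, 150, 200]
cpu = [15.0, 25.0, 35.0, 45.0]
print(rps_at_ceiling(rps, cpu))
```

Measuring each node type separately keeps the fit meaningful, since a web node and a database node respond to load very differently.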

Identifying the key resources to measure
● Disk space (MB)
● Disk throughput (IOPS)
● CPU performance (FLOPS)
● RAM memory (MB)
● Network bandwidth (Mbps)
● Network IP pool (Netmask)
● Others

How to measure
Living Computer Museum, Seattle

http://www.brendangregg.com/Perf/linux_perf_tools_full.png
| Tools to measure on Linux servers |

Collecting resources on web servers
TODO: CODE
● Example script that
sends metrics to statsd
● Low footprint using
/proc, df and ps
● For a constantly running, reliable
monitoring service, use
collectd: https://collectd.org
or Telegraf:
https://www.influxdata.com/time-series-platform/telegraf/
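
The original script is not included in this text, so the following is only a sketch of what such a low-footprint collector might look like. It assumes a Linux host, a statsd daemon listening on UDP port 8125 on localhost, and a hypothetical metric prefix `web01`. It reads /proc directly and uses Python's `shutil.disk_usage` in place of `df`.

```python
import shutil
import socket


def read_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg content."""
    return float(text.split()[0])


def read_meminfo_kb(text, key):
    """Return one field (in kB) from /proc/meminfo content."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0])
    raise KeyError(key)


def gauge(name, value):
    """Format one statsd gauge datagram: '<name>:<value>|g'."""
    return f"{name}:{value}|g".encode("ascii")


def collect(prefix="web01"):
    """Sample load, available memory and root-disk usage on a Linux host."""
    with open("/proc/loadavg") as f:
        load1 = read_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem_available = read_meminfo_kb(f.read(), "MemAvailable")
    disk = shutil.disk_usage("/")
    return [
        gauge(f"{prefix}.load1", load1),
        gauge(f"{prefix}.mem_available_kb", mem_available),
        gauge(f"{prefix}.disk_used_bytes", disk.used),
    ]


def send(packets, host="127.0.0.1", port=8125):
    """Send each gauge to statsd over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in packets:
            sock.sendto(packet, (host, port))


# Typical use, run periodically (for example from cron):
# send(collect())
```

Keeping the parsing functions separate from the file reads makes them easy to check against sample text without touching a live system.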

Store and display time-series
● Signalfx
● Cacti
● Ganglia
● Graphite
● Datadog
● Ruxit
● LogicMonitor
● Sematext
● CoScale
● Riemann
● Prometheus
● Sensu
● Idera
● Bijk
● X-Pack
● vRealize Hyperic HQ

A couple of load testing tips
Load testing tutorials:
https://www.tutorialspoint.com/jmeter
https://www.blazemeter.com/load-testing
Docker app for Grafana:
https://github.com/kamon-io/docker-grafana-graphite

Forecasting
(predicting trends)
Numbers And Finance by SeniorLiving.org

Predict the future?
● Use Context & Math
● Make educated guesses
● Long-term view is generally
steady
● Generate estimates to sustain
growth
● Use an adjustable process
● Forecast guides autoscaling
policies

Ceilings and Historical data
● Daily storage consumption
example
● Metric: total available disk space
● Cumulative total provides a
historical perspective
● We can predict future needs
● Storage will probably be
exhausted at the point where the
trend line meets the ceiling
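
A quick sketch of the idea above, using the average daily growth between the first and last sample. The daily figures and the 500 GB ceiling are hypothetical, and this tracks used space rather than available space for simplicity.

```python
def days_until_full(used_per_day, ceiling):
    """Estimate days left from the average daily growth of the samples.

    Returns None when usage is flat or shrinking, since the trend then
    never reaches the ceiling.
    """
    growth = (used_per_day[-1] - used_per_day[0]) / (len(used_per_day) - 1)
    if growth <= 0:
        return None
    return (ceiling - used_per_day[-1]) / growth


# Hypothetical samples: GB used at the end of each of seven days.
used = [100, 112, 119, 131, 140, 149, 160]
print(days_until_full(used, ceiling=500))
```

This endpoint-based estimate is deliberately crude. A fitted curve handles noisy data better.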

Curve fitting
● Curve fitting
● Creative & Scientific
● Stay ahead of growth
● Use time-series data
● Forecast by constructing new
data points beyond the known
● Reconciliation of what we know
and the best fit equation
● Consider context before math
y = mx+b
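
For reference, when the best-fit equation is the straight line y = mx+b, ordinary least squares gives closed-form values for m and b from n samples. The ceiling symbol C and the crossing point x_C are notation introduced here for illustration, not taken from the slides.

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x},
\qquad
x_C = \frac{C - b}{m} \quad (m > 0)
```

Here x_C is the point in time at which the fitted line reaches the ceiling C, which is the forecast the curve is built to provide.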

Forecasting Peak-Driven Resource Usage
● Track how the peaks change over time
● Extrapolate from that data to predict
future needs
● Identify the server resource ceilings
● Find a relation between resources and
application-level work
● Decide if we should scale vertically or
horizontally
● and perform proactive autoscaling
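
The steps above can be sketched as follows. The sample values, the 30-day horizon and the 20% headroom are hypothetical. The per-node ceiling of 80 busy calls matches the Web row of the example dashboard later in the deck, and the extrapolation is a simple linear one.

```python
import math
from collections import defaultdict


def daily_peaks(samples):
    """Reduce (day, value) samples to the peak value seen on each day."""
    peaks = defaultdict(float)
    for day, value in samples:
        peaks[day] = max(peaks[day], value)
    return [peaks[day] for day in sorted(peaks)]


def forecast_peak(peaks, days_ahead):
    """Linear extrapolation from the average growth between daily peaks."""
    growth = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return peaks[-1] + growth * days_ahead


def nodes_needed(peak, per_node_ceiling, headroom=0.2):
    """Nodes required to serve a peak while keeping spare headroom per node."""
    usable = per_node_ceiling * (1.0 - headroom)
    return math.ceil(peak / usable)


# Hypothetical samples: (day number, busy web calls at sample time).
samples = [(1, 40), (1, 55), (2, 61), (2, 48), (3, 66)]
peaks = daily_peaks(samples)
future = forecast_peak(peaks, days_ahead=30)
print(peaks, future, nodes_needed(future, per_node_ceiling=80))
```

Whether the extra capacity comes from more nodes (horizontal) or larger nodes (vertical) is a separate decision; this sketch only covers the horizontal case.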

Automating Forecasts with fityk & cfityk
● Fityk is an Open Source
Software for nonlinear fitting
of analytical functions to data.
● Incorporate cfityk scripts into
automated curve fitting, like:
cfityk ricardo-disk.fit
@0 < ricardo-disk.csv
guess Quadratic
fit
info formula
quit
Returns the formula:
4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2
Homepage: https://fityk.nieto.pl/
Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
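
Once a formula is available, it can be solved for the point where the curve reaches a resource ceiling. The sketch below copies the formula from the slide term for term. The units of x and y are not stated in the slides, and the ceiling of 10000 is a hypothetical value.

```python
import math


def disk_usage(x):
    """The formula reported by cfityk on the slide, copied term for term."""
    return 4888.18 + 363.063 * x + 8.91132 + -1.55119 * x + 0.0660771 * x**2


def x_at_ceiling(ceiling):
    """Smallest non-negative x where the fitted curve reaches the ceiling."""
    # Collect like terms into a*x^2 + b*x + c = 0.
    a = 0.0660771
    b = 363.063 - 1.55119
    c = 4888.18 + 8.91132 - ceiling
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = (-b + math.sqrt(disc)) / (2 * a)
    return root if root >= 0 else None


x = x_at_ceiling(10000)  # hypothetical ceiling, same units as the fitted data
print(x, disk_usage(x))
```

A ceiling below the curve's starting value returns None, since the curve has already passed it.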

Forecasting with Machine Learning
Seeking SRE
Conversations About
Running Production Systems
at Scale
Publisher: O'Reilly Media
● Most popular method for
curve-fitting in fityk is
Levenberg-Marquardt
● ML is also an option for
forecasting (book I co-authored)
● Code examples and guides
https://github.com/ricardoamaro/MachineLearning4SRE

Get Started
1. Select a process owner.
2. Identify the resources to be measured.
3. Measure these resources.
4. Compare to maximum capacity.
5. Collect workload forecasts.
6. Use forecasts for IT resource requirements.
7. Map requirements onto existing utilizations.
8. Predict when the system will be out of capacity.
9. Update forecasts and utilizations.

Set a Goal!
● Two Classes:
○ Load: usually expressed in
arrival rate or peak rate of
requests hitting the service
e.g. a target of 10,000 concurrent authenticated
Drupal users
○ Performance: usually expressed
in the form of Service Level
Objectives
e.g. the 99th percentile of all requests should return
in less than 500 ms
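
A performance goal like the one above can be checked mechanically against measured latencies. This is a minimal sketch using the nearest-rank percentile definition, one of several in common use. The latency samples are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(latencies_ms, pct=99, limit_ms=500):
    """True when the given percentile of latencies is under the limit."""
    return percentile(latencies_ms, pct) < limit_ms


# Hypothetical request latencies in milliseconds (100 requests each).
good = [120] * 98 + [480, 900]
bad = [120] * 97 + [480, 900, 950]
print(meets_slo(good), meets_slo(bad))
```

A single slow outlier does not break a 99th-percentile objective over 100 requests, but two do, which is why the percentile and the sample window both belong in the goal.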

Be proactive
(plan & document ahead)
Picasso drawing with Paloma and Claude at Villa la Galloise, 1953.
By Edward Quinn, EdwardQuinn.com.

Capacity Planning Dashboard
● Support your conclusions with
metrics in a dashboard
● Both manual scaling and auto
scaling decisions should be based
on real data
● When to scale?
○ date and time (be alerted if needed)
● How to scale?
○ vertical, horizontal or diagonal scaling
(Example) Drupal Cluster Dashboard
type           value  limit/node  ceiling units  limit (total)  current (peak)  peak %  Estimated days left
Varnish cache  28     1024        req/sec        2048           600             29%     830
Web            31     80          busy calls     160            145             90%     12
Database       15     60          connections    120            96              80%     36
Storage        14     30          TB             30             14              46%     21
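
The peak % column follows directly from the total limit and the current peak. A small sketch using the rows of the example dashboard follows. The 80% review threshold is an assumption, and the truncation to whole percentages is inferred from the table's values.

```python
# (type, limit_total, current_peak, ceiling_units), from the example dashboard.
ROWS = [
    ("Varnish cache", 2048, 600, "req/sec"),
    ("Web", 160, 145, "busy calls"),
    ("Database", 120, 96, "connections"),
    ("Storage", 30, 14, "TB"),
]


def peak_percent(limit_total, current_peak):
    """Peak utilization as a whole percentage, truncated as in the table."""
    return 100 * current_peak // limit_total


def over_threshold(rows, threshold_pct=80):
    """Names of the tiers whose peak utilization is at or above the threshold."""
    return [
        name
        for name, limit, peak, _units in rows
        if peak_percent(limit, peak) >= threshold_pct
    ]


for name, limit, peak, units in ROWS:
    print(f"{name}: {peak}/{limit} {units} = {peak_percent(limit, peak)}%")
print(over_threshold(ROWS))
```

Tiers flagged this way are the first candidates for a scaling decision, to be weighed against their estimated days left.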

Conclusions
Drive the system to the appropriate level of risk for the lowest cost.

Join us for
contribution opportunities
Thursday, October 31, 2019
9:00-18:00
Room: Europe Foyer 2
Mentored
Contribution
First Time
Contributor Workshop
General
Contribution
#DrupalContributions
9:00-14:00
Room: Diamond Lounge
9:00-18:00
Room: Europe Foyer 2
