How Salesforce Scaled Their Performance Engineering Team Through Automation, Tools and Environments
1. How Salesforce built a Scalable, World-Class Performance Engineering Team
September 18th, 2012
Kasey Lee, Salesforce, VP Performance Engineering
in/leekasey
2. Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if
any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-
looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of
product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments
and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated
with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling
non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the
financial results of salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These
documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may
not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently
available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
4. A. I’m curious how PerfEng can excel in an Agile Environment
B. I’m curious how to utilize a Performance Engineer's time
C. I’d like to understand how to better articulate the value of
Performance Engineering
D. I thought this was a great place to take a break and check my
social feeds before dinner
E. A, B, or C
F. All of the above
5. What do typical Performance Teams start as?
“Performance Engineering is run as a Shared Services
model so your charter is the entire organization with
maximum visibility. Everything flows through PerfEng
because it’s so critical. Dev, QE, Technical
Operations, Level II and III Support, and Professional
Services want the most out of your engineers by
leveraging your talent across projects to scale mission
critical applications”
8. Top Ten Signs Your Team Needs Help
1. You laugh when asked to signoff at Feature Freeze (and Release Freeze)
2. Your engineers work on 6-12 parallel projects (others work on 1-2 projects serially)
3. If you attended each of your assigned scrum teams’ daily 15 min standup you’d
never sit down (the entire week)
4. When you can’t signoff on a feature, everyone wants to raise the goals instead of
fixing the performance problem
5. Every day you answer “How did you decide to prioritize my feature? How can I
escalate this?” (even after you had agreement)
6. You’re told to commit to a plan for the next release while your team is busiest in the
current release and has no time to plan
7. Your team wants to influence the product or hardware architecture but can’t find the
time to even write up their analysis
8. Developers discount poor results due to variance without looking at the data (even
though the results of the latest release are always worse)
9. IT always asks “Why do you need isolated labs? Dev and QA don’t need them”
10. Devs ask your engineers to do manual tasks at all hours
10. What’s in store?
Introduction
The Unique Challenge at Salesforce
How the Team Scales
Workloads
Automation, Tools, Environments
Closing Thoughts and Tips
12. Brief Background
VP @ Salesforce
Performance Engineering
Sr. Director / Tech Lead @ Wily Technology
Performance Engineering, Software Tools, QA,
R&D Lab
Architect @ Event Zero
Developer, consultant for startups
Developer @ Ziff-Davis Benchmark Operation
Industry Standard Software Benchmarks
iBench, WebBench, ServerBench, NetBench
13. What drew me to Salesforce?
• Performance and Scalability is one of the
top three core values of the company
• One of the most complex Enterprise
scalability challenges anywhere
• As of today one of the best funded teams in
the industry and growing as quickly as we
can find the best people
14. What are some key challenges at Salesforce?
1. Mission Critical Enterprise Apps Customers pay for
No perf testing in production on unwary customers
No tolerance for downtime or slow response times which
immediately impact customers’ bottom line
2. Security is Paramount
Extremely difficult to access production systems / data
Can’t easily examine load and data shapes in detail
3. True Multi-Tenant Architecture
Every customer can create completely different load / data
characteristics at a moment’s notice
15. Noteworthy Milestones
Mid 2006 – “System Test” Team created from HA crisis
April 2008 – Kasey Lee joins a struggling team of 7
Sept 2008 – Automation & Tools Team Created
Sept 2009 – Team averts 162 R1 Load Balancer Disaster
Jan 2010 – Leads solution to Capacity Planning crisis
Sept 2010 – Team predicts 168 R1 GC Heap Regression
Sept 2010 – Team leads solutions to NA6 Perf
Nov 2011 – Team helps reduce production CPU >60%
May 2012 – Team Triages 178 R1 Bytecode Regression
June 2012 – Team size rises to 60
Jan 2013 – Target size: 80+
Traffic & complexity continues to increase to ~60B / Quarter, but response times have decreased!
16. Major Accomplishments ex. – “CPU 15”
• SWAT team optimization /
tuning efforts saved the
company ~$150 million
• Optimizations include potential
to change the JVM spec
directly to benefit everyone
• Great example of ROI
Not only in dollars, but helps
build the credibility that you can
leverage to do even more
19. How do we accomplish this?
• Baseline Functionality & Benchmarking
• New Feature Benchmarking
• Patches / Production Support
• Hardware / Infrastructure Analysis
• Special Studies / Research / POC
• Production Visualizations
• Capacity / Sizing Guides
• Architecture Expertise
• Profiling Concepts and Training
• Automation Frameworks
• Self Service Frameworks
• Data Analysis, Creation, Visualization Tools
• Load Generation Tools
• Environment Design
• Optimization
20. What We Continually Focus On
Blazing fast performance delivered by Cloud teams and PerfEng through
collaboration, innovation and transparency
Empowered and engaged PerfEng inspired by the real world impact of
their work and widely recognized as industry thought leaders
Quick and accurate test results, effective testing, seamless scheduling
and flexible environments
Frequent assessment, optimizations, and deep visibility into feature
performance during development and in production
Fully integrating PerfEng into product development as beneficial and
essential members of Cloud teams
Performance built in by Cloud teams and able to catch obvious
performance issues themselves
21. What really makes us so effective?
1. Our Perf/Dev ratios have been adopted (after numerous “discussions”)
2. We have a Software Development Team
3. We have a Product Owner (Prod. Mgr) for our Labs
4. We have a dedicated TechOps team “PerfInfra” for Labs
5. We have a substantial lab for testing
6. We have a Program Manager focused on cross functional project strategy,
visibility, and communications
22. Performance Engineering Team Structure
Performance
• Sales/Service/Data – Features, Workloads
• Chatter – Features, Workloads
• Platform/Mobile/UI – Features, Workloads
• Core/Search/Analytics – Features, Workloads
Automation & Tools & Env
• Software Tools Developers
• Environments
• Product Owner
• Special Projects Lead
• Architect
• Program Manager
24. PerfEng Historical Lag per Release
Release R1 timeline (Jan–Jun): Planning → Final Plans Due → Feature Freeze → Sandbox Freeze, spanning Product Development Sprints, the Release Sprint, and Release
Chart: PerfEng begins testing late in the cycle; accumulated performance bug debt drives up the cost of finding & fixing bugs
26. • Late starts with minimal workloads
• Increased workloads and decreased time to bring online
• No longer need to track
27. PerfEng Starts vs. Release Timeline
Release R1 timeline (Jan–Jun): Planning → Final Plans Due → Feature Freeze → Sandbox Freeze, spanning Product Development Sprints, the Release Sprint, and Release
28. Q: How do we scale PerfEng to meet the demands of a larger organization?
29. Ratios are Key to Establish and Socialize
PerfEng established a 1:8 ratio of Perf/Dev IC
No more than two scrum teams or three projects / release
Does not include workload engineers (min of two per cloud)
Does not include Managers or Software Tools Engineers
Perf Managers/IC ratio may need to be higher than 1:8
Managers may require 1:3 or 1:5 due to the additional teams
managers interact with cross functionally
Find a ratio that enables PerfEng
Factor early participation, deep dives, optimization work to provide
meaningful contributions
Support discussion with velocity points, automation and efficiency
examples, ROI Examples
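The ratio guidance above can be sketched as a quick sizing calculation. Everything beyond the stated ratios, including the function name, the 1:5 manager default, and the example of 400 developers across 4 clouds, is illustrative rather than from the deck:

```python
import math

def perfeng_headcount(devs, clouds, perf_dev_ratio=8, mgr_ratio=5,
                      workload_engs_per_cloud=2):
    """Estimate PerfEng staffing from the slide's ratios.

    ICs follow the 1:8 Perf/Dev ratio; workload engineers (minimum
    two per cloud) and managers (1:5 here, within the 1:3-1:5
    guidance) are counted on top of that.
    """
    ics = math.ceil(devs / perf_dev_ratio)
    workload = clouds * workload_engs_per_cloud
    managers = math.ceil((ics + workload) / mgr_ratio)
    return {"ics": ics, "workload": workload,
            "managers": managers, "total": ics + workload + managers}

# Hypothetical org: 400 developers across 4 clouds
print(perfeng_headcount(400, 4))
```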
30. Chart annotations: >1.2x, >2x, >2x
The gap is closing today, but we still haven’t reached the target
32. Embed Performance Mindset into Every Team
“Closely partner with scrum teams to
provide early, fast, continuous
architecture engagement / results /
analysis for complex scenarios and
enable scrum teams to catch obvious
performance issues with self service
tools, automation, and processes before
they reach Performance Engineering”
33. How We Interact with Scrum Teams
• Each scrum team appoints one Dev and one QE engineer
who are mapped to a single PerfEng engineer
• Teams must co-develop their release plans and sign off
criteria up front
• Teams are accountable for their features (complete
ownership coming back to PerfEng as team scales up)
• Teams must characterize obvious performance criteria
themselves every sprint (Cadence, PTest)
• Teams must deliver their features on time or accept
testing into the release sprint or beyond
35. Embedding Performance – A Tiered Approach
Increasing Test Complexity and Feature Risk →
Tier 1 (80% Scrum Team + 0% PerfEng): Single user transactions on Desktops/Local Builds; Single user transactions in PTests
Tier 2 (15% Scrum Team + 40% PerfEng): Single user transactions on Corsa; High Load, High Concurrency on Corsa
Tier 3 (5% Scrum Team + 60% PerfEng): Single user transactions on IST; High Load, High Concurrency on IST
Scrum Teams focus on catching obvious low-hanging
fruit; PerfEng focuses on difficult to construct, high
load/concurrency scenarios requiring highly specialized
knowledge to detect and analyze
36. 86 GB of meta data primarily from PerfEng workload tests!
37. 1.4 TB of meta data from tests created by Devs and outside teams!
38. Q: What are the key Agile Release
Milestones and activities for PerfEng?
39. Release Timeline and PerfEng Activities
Release R1 timeline (Jan–Jun): Planning → Final Plans Due → Feature Freeze → Sandbox Freeze, spanning Product Development Sprints, the Release Sprint, and Release
• Planning: Appoint Liaisons
• Final Plans Due: Complete Release Plans
• Development Sprints: Double Check Exit Criteria; Initial visibility into all features; Signoff on ¾ of features; Get workloads green
• Release Sprint: Signoff on all Features and Workloads; Continue Workload Optimizations
• Sandbox Freeze: Monitor Sandbox Milestones; Final Optimizations
40. How Do We Allocate Engineers’ Time?
70% Velocity Points Open
Feature or Workloads work for a specific cloud
30% Velocity Points Reserved
PTOn (9 days/year to work on whatever they want)
External Training Classes (e.g. SQL Tuning)
Other Clouds’ projects they are interested in
Conferences (e.g. HBase, Hadoop)
Foundation events (1:1:1)
We leverage Agile and ADM to enable people’s changing interests
41. Templates Cover Most Important Phases of a Project
Requirements/Arch Strategy/Test Plan Analysis/Results
42. Release Signoff Criteria and Team Dynamics
• PerfEng will only sign off on features we
worked on directly (or have thoroughly
reviewed the plans and results)
• Scrum Teams may sign off on features by
themselves at their own risk for any feature
with Medium or Less Risk (if PerfEng is
short of resources)
43. Quick Tip – Negotiating Release Criteria
Bring in teams from operations and support
Quote examples of consequences of releasing
without adequate throttles and caps in place
Cite examples from your company or other
leading companies of the cost of reduced
customer credibility
45. What is a “Workload”?
• A repeatable test simulation or benchmark that provides a
meaningful result by utilizing specific inputs into the system under
test while recording numerical metric data, which is subsequently
analyzed and weighted to perform a qualitative assessment
• Changing a variable in the workload and re-running provides a
meaningful comparison
• Baseline Workloads are automated and enhanced release over
release wherever possible
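The "change a variable and re-run" comparison can be sketched as a small harness. The median-based comparison and the 5% threshold are illustrative choices, not Salesforce's actual signoff math:

```python
import statistics

def compare_runs(baseline_ms, candidate_ms, threshold_pct=5.0):
    """Compare two sets of response-time samples from the same workload.

    Returns the relative change in median response time and whether it
    exceeds a (hypothetical) regression threshold.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    delta_pct = (cand - base) / base * 100
    return {"baseline_ms": base, "candidate_ms": cand,
            "delta_pct": round(delta_pct, 1),
            "regression": delta_pct > threshold_pct}

# Same workload, one variable changed (say, a new app-server build)
result = compare_runs([100, 102, 98, 101, 99], [108, 111, 109, 112, 110])
print(result)  # median moves 100 -> 110 ms, a 10% regression
```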
47. “Shape” Terminology
Load Shape – The distribution, rate, and
type of requests injected into the system
under test (SUT)
Data Shape - The size, skew, and type of
data, files, etc. accessed during the test
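As a rough sketch, the two shape definitions map naturally onto small record types; the field names below are hypothetical, chosen only to mirror the wording of the slide:

```python
from dataclasses import dataclass

@dataclass
class LoadShape:
    """Distribution, rate, and type of requests injected into the SUT."""
    requests_per_sec: float
    distribution: str        # e.g. "steady" or "peak-hour replay"
    request_types: tuple

@dataclass
class DataShape:
    """Size, skew, and type of data accessed during the test."""
    total_size_gb: float
    skew: str                # e.g. "uniform" or "heavy-tailed"
    data_types: tuple

# A Grinder-style load shape: 400 RPS replaying peak-hour traffic
grinder_load = LoadShape(400, "peak-hour replay", ("api", "ui", "report"))
print(grinder_load)
```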
48. Categories
Playback tests take production traffic logs and replay traffic against the cut of data from
that time period
• This enables Salesforce.com to properly capture data skews, volumes, and transactions that customers have
run at a particular time and cover features that are heavily customizable
Synthetic tests use custom tools to profile production load and data shapes, and then to create workloads that mimic the desired characteristics
• Synthetic tests enable the team to create data and load shapes that may be far greater or more accentuated
than in production, in a deterministic and precise fashion that enables granular studies of linearity,
bottlenecks, and resource utilization
• In most situations different versions of Salesforce are compared against one another, although absolute
performance metrics are used for new features or situations where it is too difficult to make meaningful
comparisons
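A minimal sketch of the playback idea: replay logged requests while preserving their inter-arrival times. The tab-separated log format and the `speedup` knob are assumptions; real production logs and replay tooling carry far more detail:

```python
import time

def replay(log_lines, send, speedup=1.0):
    """Replay logged requests, preserving inter-arrival times.

    log_lines: iterable of "epoch_seconds<TAB>path" entries (a
    hypothetical log format). send: callable that issues each
    request against the system under test.
    """
    prev_ts = None
    for line in log_lines:
        ts_str, path = line.rstrip("\n").split("\t")
        ts = float(ts_str)
        if prev_ts is not None:
            # Sleep out the gap between consecutive logged requests
            time.sleep(max(0.0, (ts - prev_ts) / speedup))
        prev_ts = ts
        send(path)

# Dry run: collect the replayed paths instead of issuing real requests
sent = []
replay(["0.0\t/home", "0.01\t/report/123", "0.02\t/search?q=acme"],
       send=sent.append, speedup=100)
print(sent)
```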
49. Workload Highlights
DB Workloads
• Summary: A workload that replays real production requests against complex customer data in a precise fashion to meticulously identify proper DB stats and tuning for reports
• Load Shape: 100,000 complex reports and filters
• Data Shape: Sanitized copy of real world production data with emphasis on massive data sets
Grinder
• Summary: A large scale, high load, high concurrency test that simulates an hour of peak production traffic by replaying transactions
• Load Shape: 400 RPS; target production steady state utilization of 35%, peaks of 80%
• Data Shape: Sanitized copy of real world production data
Force.com
• Summary: Simulates traffic against a standard Ideas site / base force.com application
• Load Shape: Requests are generated across 40 different URLs / operations
• Data Shape: Synthetic data based on real world “Ideas”
Visual Force
• Summary: A read-only targeted test isolating specific components of VF at high request rates. Apex components are designed to be constant across all requests so regressions are due to pure VF
• Load Shape: 32 concurrent requests across 10 orgs
• Data Shape: Small synthetic VF classes; Viewstates, Wrapperless / Wrapped nested data presentation and Namespaces
Apex
• Summary: A targeted test that exercises the components of Apex Cache, CPU consumption, and Memory
• Load Shape: 64 threads across 16 organizations
• Data Shape: Synthetic set of classes that exercise Apex Cache and CPU; use of Apex L1, maximal number of lines of Apex, creation of temporary objects
Sharing
• Summary: A workload that performs DML Operations on Sharing Enabled Orgs, Sharing Rule Maintenance Operations on Various Entities, Territory Management Operations, and Accounts/Opportunity
• Load Shape: 2 app servers, 10 concurrent users, 7 thread groups
• Data Shape: Synthetic orgs (one Territory Managed, one Regular)
50. Workload Highlights (continued)
Search
• Summary: High load, high concurrency test that simulates peak production traffic by replaying searches and concurrently simulating incremental indexing. Monitors and reports metrics on the entire stack [Indexers, DB, Query Servers, App Servers, Memcached]
• Load Shape: Replays production searches and performs incremental indexing at peak load; issues searches at 55 RPS
• Data Shape: Sanitized copy of real world production data
MQ Workload: QPID (transport in isolation)
• Summary: A workload which enqueues messages into QPID on an IST using multiple IST app servers. Tests QPID (the MQ transport service) in isolation. Suitable for acceptance testing an upgrade
• Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min–6hr
• Data Shape: Synthetic; configurable message size
MQ Workload: Hydra (integrated)
• Summary: A workload which creates load on the integrated SFDC MQ framework using the SFDC MQ API library. Uses synthetic asynchronous handlers running on the app servers to simulate message and resource consumption. Suitable for running with every release, and for simulating the impact of a new asynchronous handler
• Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min–6hr
• Data Shape: Synthetic; configurable message size
Mobile
• Summary: Workloads simulate user actions over a real 3G network. Captures metrics to measure end-user perceived response times on slow networks and real devices
• Load Shape: Real device & emulator on real 3G networks
• Data Shape: Sanitized copy of real world production data
UI
• Summary: Workloads simulate user actions in a real browser. Captures metrics to measure end-user perceived response times. Org with Chatter data is very large
• Load Shape: 6 browsers – nightly tests
• Data Shape: Synthetic user data across all standard pages; 3 different orgs to test across different skins/Chatter
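The MQ workloads' "N app servers x N threads" pattern can be sketched with worker threads enqueueing synthetic messages of varying, configurable sizes. A local in-process queue stands in for QPID here, and all counts and sizes are illustrative:

```python
import queue
import random
import threading

def enqueue_worker(q, n_msgs, min_kb=1, max_kb=64, rng=None):
    """Enqueue synthetic messages of varying sizes, mimicking the
    configurable-message-size idea from the MQ workloads."""
    rng = rng or random.Random()
    for _ in range(n_msgs):
        size = rng.randint(min_kb, max_kb) * 1024
        q.put(b"x" * size)

def run_load(threads=20, msgs_per_thread=50):
    """Fan out enqueue workers (one per simulated thread) and report
    how many messages landed on the queue."""
    q = queue.Queue()
    workers = [threading.Thread(target=enqueue_worker,
                                args=(q, msgs_per_thread))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return q.qsize()

print(run_load(threads=4, msgs_per_thread=10))
```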
51. Workload End to End Coverage* (At a Glance)
UI Network App Search Indexer FFX Batch DB SAN
DB Wkld 8 1
Grinder 1 8 3 3 3 3 7 3
Force.com 1 8 1
VF 7
Apex 7
Sharing 7 2
Search 6 6 6 4 3
MQ 6 2
Mobile 5 4 1
UI 8 5 4 1
Batch 6
*Higher numbers indicate better coverage in a given tier
52. Daily DB, Appserver, and UI Performance Tests!
Database Workloads
Appserver Workloads
UI - End User Response
Time Workloads
53. 168 – Performance Bugs ROI
Note that >50% of P0 bugs were
found by baseline workloads!
290 Total = 78 Workloads (27%), 211 Feature Testing (73%)
56. Michelangelo – Results Viewer
• Provides single point of
entry into all automated
tests
• Dynamic Test vs. Test
views
• Automatic Averaging of
test runs and filtering of
outliers
• Compare baseline to
results trends
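The "automatic averaging of test runs and filtering of outliers" could work along these lines; the median-plus-k-standard-deviations rule is one simple scheme, since the deck does not describe Michelangelo's actual filter:

```python
import statistics

def robust_average(runs, k=2.0):
    """Average repeated test runs after dropping outliers more than
    k standard deviations from the median (a hypothetical rule)."""
    if len(runs) < 3:
        return statistics.fmean(runs)
    med = statistics.median(runs)
    sd = statistics.stdev(runs)
    kept = [r for r in runs if abs(r - med) <= k * sd] or runs
    return statistics.fmean(kept)

# One noisy 250 ms spike among ~100 ms runs is filtered out
print(robust_average([101, 99, 100, 102, 250]))  # -> 100.5
```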
57. Michelangelo Changelist Trend Example
• Dramatically shows changes in performance to the changelist
• Row and Column highlighting
• Color Coding
• Annotations
• Compare baseline to results trends
• Absolute and Relative difference comparisons
Callout: a specific changelist fix results in 33% more GC activity
59. StatsForce – High Resolution Time Correlated Visualizations
• OS Statistics
• Application Statistics
• JVM Statistics
• Errors
• Mix and match chart representations and chart types (e.g. scatter) on demand
Callouts: notice the benefits of time correlation, and how Full GCs affect Response Times on the same timeline!
60. Statsforce Example – Force.com Workload Load Balancer Regression
Comparing releases 164 and 166: “This looks odd!”
64. Environment Types
IST (Integration System Testing)
• Large scale pod. Closest to production in both software and hardware configuration (load balancers, 8 node RAC database, etc.)
• Primarily uses production data
CST (Comparison System Testing)
• Small environments focused on Database workloads
• Primarily uses production data
DB Load (Prod, Synthetic)
• Small environment with large sized DBs (4TB – 20TB)
• “Prod” uses production data, “Synthetic” uses synthetic data
Corsa (“Race”)
• Small environments with hardware vertically identical to production
• Fewer horizontal nodes, focused on a particular SUT (Search, DB)
• Does not utilize production data
VMs / Autobuilds
• Dedicated environment for each engineer for development purposes
Desktops / Adhocs
• Dev local machines or Adhocs for PerfEng – dedicated for each engineer for local tests or development
65. Continuous Data Refresh System
• Enables teams to access the latest production / synthetic data with minimal downtime
• Performance tests can modify / delete TBs of data and roll back in minutes
Details
• Production snapshots and Corsa images are taken periodically and stored on SAN
• A “jukebox” server prepares snaps into “green” database images
• The jukebox applies schema updates and keeps them “green”
• The “green” images are always ready to use and rsynced directly to the environments
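The jukebox-to-environment flow might be scripted roughly like this. The paths, image naming scheme, and rsync flags are all hypothetical, since the deck only describes the pipeline at a high level:

```python
def pick_latest_green(images):
    """Choose the newest prepared 'green' snapshot, assuming
    date-stamped names sort chronologically."""
    return max(images)

def rsync_command(image, host, src_root="/san/green", dest="/data/db"):
    """Build the rsync invocation that pushes an always-ready 'green'
    image from the SAN to a test environment (illustrative only)."""
    return ["rsync", "-a", "--delete",
            f"{src_root}/{image}/", f"{host}:{dest}/"]

imgs = ["prod-20120901", "prod-20120908", "prod-20120915"]
print(rsync_command(pick_latest_green(imgs), "corsa-db1"))
```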
67. Where is Salesforce.com PerfEng Today? 30,000 ft. view
• Team has evolved from seven “Systest” engineers who
struggled to produce meaningful analysis, to a world class
Performance Engineering organization of >60 engineers with
no significant production issues the day after release for
almost three years
• Active participation in features, provides visibility and risk
assessment at critical milestones and averts major
degradations, helps triage and mitigate production issues,
delivers optimizations across the stack, and whose skills and
headcount are now lobbied for by Development teams
• Automation has increased from two workloads which ran a
handful of times late in the release, to over 15 sophisticated
workload suites that run every day and are critical to signoff
68. Top Ten Tips for Scaling Your Team
1. Socialize your ratios for PerfEng to Developers to eventually embed into teams
2. Propose a dedicated model over a shared service model
3. In a pinch, provide teams the velocity points they have funded, and ask them to prioritize
4. Build out your management team at every opportunity
5. Develop meaningful automated workloads with low variance and show the ROI regularly
6. Create a tools team that spends >=75% of their time developing automation and tools
7. Make your Labs and Test Frameworks self service
8. Develop production monitoring tools to collect relevant data for workloads and exit criteria
9. Create frameworks to enable staged work from Dev desktop to large scale Perf environments
10. Develop training classes for PerfEng, new hires, Dev/QE liaisons – smaller population first
69. What else could be
responsible for this
dramatic optimization?
71. Bonus Tips for a Happy Team
• Contribute to a positive atmosphere that
promotes Autonomy, Mastery, and Purpose
with interesting projects to tackle in depth
• Focus on your strengths and strive to
improve at every opportunity
• Set a bold vision with achievable
milestones, and celebrate progress
72. What will you take from today?
What will you change starting next week?
“Is anything truly impossible? Perhaps it is
temporarily impractical or unlikely” – Kasey Lee
Ex. Human Exoskeletons (2:05)
75. Turn your PerfEng team from this… into this…
From: Manual Black Box Testers
Into: Architecture / Analysis / Simulation / Optimization / Visualization / Automation / Monitoring Experts
76. Lines of Defense
1. Single user requests in PTest on VMs
2. Single user requests / high load on Corsa
3. Concurrent / high load on Corsa
4. Single user requests on DB Load
5. Concurrent / high load on DB Load
6. Single user requests on IST
7. Concurrent / high load on IST
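The seven lines of defense read naturally as an ordered pipeline, where a failure at a cheap tier stops escalation to expensive IST time. This staged runner is a sketch of that idea, not Salesforce's actual tooling:

```python
# Ordered stages mirroring the slide; each maps to an environment tier.
LINES_OF_DEFENSE = [
    "single-user PTest on VMs",
    "single-user / high load on Corsa",
    "concurrent / high load on Corsa",
    "single-user on DB Load",
    "concurrent / high load on DB Load",
    "single-user on IST",
    "concurrent / high load on IST",
]

def run_defenses(run_stage):
    """Run each tier in order and stop at the first failure, so cheap
    environments catch problems before IST time is spent.

    run_stage: callable(stage_name) -> bool (pass/fail).
    """
    for i, stage in enumerate(LINES_OF_DEFENSE, start=1):
        if not run_stage(stage):
            return f"failed at line {i}: {stage}"
    return "all lines of defense passed"

# Fake runner where the concurrent Corsa stage regresses
print(run_defenses(lambda s: s != "concurrent / high load on Corsa"))
```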