Keeping a pulse on performance is important but proves difficult in a rapidly changing environment. One release to the next can have a significant impact on performance. Introducing performance budgets provides a relatively simple safety net that catches unintended swings in resource usage, or changes in timing during page load, that might otherwise go unnoticed.
I’ll discuss how to create good performance budgets and why it’s important to have them in place, and tell a true story of how Zillow was able to respond quickly when budgets were exceeded.
In telling this story, I will provide supporting evidence to debunk “onload” as a measure of perceived performance, show real business intelligence data demonstrating that it’s possible to deliver content users want without impacting perceived performance, and tie these events together with RUM data that enabled the engineering team to make the best decision for the business and, ultimately, the customer.
Slide 6
Measure and Trend What’s Important!
Bigger : Size, # of Requests
• All Resource Types by Weight and # of Requests
– Images, Fonts, JavaScript, CSS, HTML, total size, total requests, etc.
Faster : Time
• User Perceived Performance
– Ex. “AFT Hero Image”, “AFT Search Box”
• SpeedIndex
• TTFB
• PageSpeed
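The “all resource types by weight and number of requests” idea above can be sketched as a small summarizer. This is a minimal sketch, not a real tool: the input is an array of entries such as you might derive from a HAR file, and the field names `type` and `bytes` are assumptions, not an actual schema.

```javascript
// Sketch: summarize page resources by type for budget trending.
// Entry fields `type` and `bytes` are illustrative assumptions.
function summarizeResources(entries) {
  const byType = {};
  for (const { type, bytes } of entries) {
    if (!byType[type]) byType[type] = { requests: 0, bytes: 0 };
    byType[type].requests += 1;
    byType[type].bytes += bytes;
  }
  // Roll up grand totals across all resource types.
  const totals = Object.values(byType).reduce(
    (acc, s) => ({ requests: acc.requests + s.requests, bytes: acc.bytes + s.bytes }),
    { requests: 0, bytes: 0 }
  );
  return { byType, totals };
}

// Example: two images and a script.
const report = summarizeResources([
  { type: 'image', bytes: 120000 },
  { type: 'image', bytes: 80000 },
  { type: 'script', bytes: 50000 },
]);
console.log(report.byType.image.requests); // 2
console.log(report.totals.bytes);          // 250000
```

Trending these per-type totals across builds is what makes a sudden jump (say, in image weight) stand out immediately.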
Slide 7
Add W3C User Timing Marks
• Measure Performance of User Experience
– http://blog.patrickmeenan.com/2013/07/measuring-performance-of-user-experience.html
• “AFT” Notation (Above-the-Fold Time)
• Hero Image Custom Metrics
– http://www.stevesouders.com/blog/2015/05/12/hero-image-custom-metrics/
<img src="hero.jpg" onload="performance.mark('AFT Hero1')">
<script>performance.mark('AFT Hero2')</script>
http://www.w3.org/TR/user-timing/
Slide 8
Tools for Measuring and Trending
Synthetic Measurements
• SpeedCurve.com
• Independent WPT Instance → Graphite
• PhantomJS w/HAR → HARStorage
• WPT or PhantomJS w/HAR → Splunk
Real User Monitoring (RUM)
• Boomerang → Splunk
• Numerous companies offering RUM
Slide 9
Synthetic vs. RUM
Synthetic Measurements
• Consistency in device/network
• Detailed Resource Analysis
Drawbacks
• Limited Sample and Frequency
Real User Monitoring (RUM)
• Large Sample
• Real Conditions
• Not Geographically Bound
Drawbacks
• Limited Resource Detail
– No Size from W3C Resource Timing
Slide 14
Evaluate Performance Trade-offs
More Engaging : Conversion Rates
Jakob Nielsen: “Increased conversion is one of the strongest ROI arguments for better user experience and more user research. Track over time, because it’s a relative metric.”
• Trend and Correlate Traffic Patterns to Performance
– PV/Session, Session Duration, Bounce Rate
• Observe Variations by Build Release
Heavier and Slower Modifications May Improve Engagement!
• Prove changes through AB Testing
Slide 15
What else should we budget?
• 3rd Party Resources
• Service Response Time
• Database Query Time
• DNS Resolution
• SSL Negotiation
• TCP Connect
• HTTP Status Codes
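Any item in the list above can be expressed as a numeric budget and checked mechanically, for example in a build pipeline. A minimal sketch follows; the metric names and thresholds are illustrative, not Zillow’s actual budgets.

```javascript
// Sketch: compare measured metrics against budgets and report overages.
// Metric names and values are hypothetical examples.
function checkBudgets(budgets, measured) {
  const overages = [];
  for (const [metric, budget] of Object.entries(budgets)) {
    const value = measured[metric];
    if (value !== undefined && value > budget) {
      overages.push({
        metric,
        budget,
        value,
        overBy: Math.round(((value - budget) / budget) * 100), // percent over
      });
    }
  }
  return overages;
}

// Example: image weight blows its budget; the other metrics are fine.
const overages = checkBudgets(
  { imageBytes: 1500000, requests: 100, serviceResponseMs: 300 },
  { imageBytes: 3000000, requests: 80, serviceResponseMs: 250 }
);
console.log(overages);
// [ { metric: 'imageBytes', budget: 1500000, value: 3000000, overBy: 100 } ]
```

A non-empty result is the “call to action”: fail the build, page someone, or annotate the trend.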
Slide 27
Business Metrics Remained the Same!
No Impacts to User Perceived Performance!
Performance Changed in a Big Way!
Slide 28
Lessons Learned
• Increase Frequency of Measurements, “Quick Response”!
• Budgets enable a “Call to Action”!
• Prioritize User Timing marks for user perceived performance.
• Excess resources on page load may not affect user perception of performance.
C: In most cases we want our products to exhibit these first three behaviors.
Budgets are NOT taboo or a bad thing.
Financial Budgets
How we eat
Exercise, Fitbit alarm!
T: First we must identify how we measure things.
C: Most of us have seen this. It will continue to grow.
Dynamic and engaging JS
Big, beautiful, luscious images
T: How do our developers commonly measure the product?
C: This is TOTO, a device NOAA developed to measure tornadoes.
It intercepted only one tornado, in 1984.
TOO LATE. The damage is done. Maybe not the full picture.
Do developers attempt to measure issues as they arise?
T: No chasing tornadoes! Can we measure more frequently to predict issues?
C: There are over 22,000 registered seismograph stations according to the ISC, and they can detect seismic activity in near real time.
In 1960, Chile experienced the most powerful earthquake in recorded history: magnitude 9.5, roughly 6,000 fatalities.
In 2010, over 500 fatalities.
Last month, another 8.4 earthquake.
Advancements in early warning systems notified smartphones within minutes; only 13 known fatalities and 5 persons missing.
T: In fast-paced Agile development, this gives us an opportunity to identify influencers on performance and be notified as soon as possible.
C: You might be saying, We already have tons of Measures and Metrics!
Are they Visible?
People don’t always look at their fuel gauge.
Newer cars added a low fuel Alarm.
T: We want to see measurements accessible all the time! The Empty alarm should be your indicator to stop and investigate.
C: What should we measure?
All resources! It’s easy and might surprise you with what you find.
User perceived Performance w/ User Timing Marks
T: How do I measure User Perceived Performance?
C: What should we measure?
AFT notation
Measuring Hero Image or possibly when a focus control appears.
T: How do we measure these?
C: Two common ways to measure.
SpeedCurve
Ilya Grigorik and Peter Lubbers on the GDL HAR Show
T: Trade-offs between these tools.
C: Use Both Because
Consistency
Sample Size
No Size Detail in RUM
T: Let me show you briefly how simple it is to create a Budget!
C: This is SpeedCurve’s form to add a budget.
- Easy to implement resource and timing budgets.
T: Here is what this looks like.
C: This is Start Render for Zillow Home over the last 30 days.
Red Line
Green Line
T: Here is what a Custom User Timing Mark looks like.
C: This is Zillow’s AFT Home Custom User Timing Mark.
Many Custom User Timing Marks
3 spikes
Access to each measurement in more detail or record a note!
T: How often should I measure?
C: It depends on how often you release and how dynamic your content is.
More frequent measurements lend added insight into dynamic content.
Once per build. Use the API
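For the “once per build, use the API” approach, a build step can kick off a synthetic test automatically. The sketch below only constructs a run URL for WebPageTest’s HTTP API (`runtest.php`, with an API key and JSON output); the host, key, and build label are hypothetical placeholders, and actually sending the request is omitted.

```javascript
// Sketch: build a WebPageTest API run URL once per build.
// 'wpt.example.com', 'API_KEY', and 'build-1234' are hypothetical.
function buildWptRunUrl(host, pageUrl, apiKey, buildLabel) {
  const params = new URLSearchParams({
    url: pageUrl,     // page to test
    k: apiKey,        // WPT API key
    f: 'json',        // ask for a JSON response
    label: buildLabel // tag the run with the build id for trend annotations
  });
  return `https://${host}/runtest.php?${params}`;
}

const runUrl = buildWptRunUrl(
  'wpt.example.com', 'https://www.zillow.com/', 'API_KEY', 'build-1234'
);
console.log(runUrl);
// https://wpt.example.com/runtest.php?url=https%3A%2F%2Fwww.zillow.com%2F&k=API_KEY&f=json&label=build-1234
```

Labeling each run with the build id is what lets you later correlate a budget overage with the release that caused it.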
T: What do we do when a budget is exceeded?
C: Identify what Changed, Evaluate Engagement!
Increased Conversion is one of strongest ROI arguments.
Add build labels to trends.
AB Test
T: How else can budgets help?
C: Let’s think out of the box a bit.
3rd party is dynamic
Service response times
T: Budgets can be applied in many valuable ways.
C: This is a screen shot from one of our perf dashboards.
Zillow is the largest real estate network on the Web.
Two-thirds of our traffic is mobile, and on weekends it exceeds 70%.
T: How do we manage all these budgets?
C: Good Performance Budget Practices
Again Prioritize User Experience
How has this helped? Let me share a true story.
T: During my commute one morning I saw this.
C: Onload Time was over budget by 61%, a 7-second increase.
T: Scrolled further…
C: Image requests were over budget by 373%, 112 new images
T: Scrolled Further...
C: Total size was over budget by 417%, increasing to almost 13.5 MB!
T: This can’t be, it’s our mobile search page!! Scroll again…
C: Images increased 708%; 11 MB of images seemed to be the culprit.
What’s going on?
Upset users!
T: Before I was in the office, I had already engaged engineers at Zillow.
C: This is the feature Zillow Shipped the day before.
Improved Engagement in AB Test Results
The data didn’t match up with my perception.
Did change warrant the performance trade-offs?
T: I reached out to our internal experts in Business Intelligence and they came back with this.
C: Much to our surprise, the impact to engagement was insignificant.
Fluctuations with traffic patterns and other factors during this time sample represent potential influencers.
T: What had changed to cause this effect?
C: The engineers found a bug in a future AB test bucket.
More images were downloaded, but the functionality was still disabled.
No PERCEIVED CHANGE in PERFORMANCE.
A win for the vNext feature.
T: Supporting the theory that perceived performance was not impacted.
C: The SpeedIndex rating did not falter despite the 4X weight increase.
– SpeedIndex is an interesting indicator of user perception.
T: As an aside from this investigation…
C: RUM data showed a 10 sec swing in “Time to Onload”!
Another case where we have debunked “Time to Onload”
Just another Indicator!
T: To summarize this.
C: No impacts to user perceived Performance.
T: We’ve covered some really interesting observations, so what have we learned?
C: Increase Frequency of Measurements! Enable quick response!
Budgets are an indicator and a CALL TO ACTION!
User Timing marks would have enabled us to know more accurately what the user is experiencing.
Give Users Extra content when It makes sense.
Be mindful of data plan usage and battery power, but Delight Your Users!
T:
This is Mount Rainier, a dormant volcano in Washington State. A sleeping giant.
Atop the pole on the right is the early warning system millions depend upon.
These are lessons I’ve learned for you. I encourage all of you to implement budgets and enable your teams with their own early warning systems.
Engineering Blog
Tweet Slides
We are Hiring!
I’ll be available afterward for questions.
Measure Early, Measure Often, and Measure while on a BUDGET.
Thank you!