Benchmarking allows you to track UX progress over time, indicating how successful digital platform changes have been. It provides a holistic product assessment and demands more attention to methodology, stakeholders, tasks, protocol, and analysis than a typical feature study does. Our method captures quantitative measures along with qualitative feedback that product stakeholders can use to justify and inform their business decisions.
In this session, you’ll get tips for developing a benchmark strategy. You’ll also hear stories about how benchmarks have impacted our organization’s digital strategy.
You will learn:
The business impact of benchmark studies
Designing, running, and analyzing such studies
How to avoid issues with recruiting, study design, execution, and reporting
A variety of UX and product professionals, including seasoned researchers, novice designers, and digital product owners, can learn from and act on this session.
Measuring UX Over Time
1. Building Your Benchmark: How to Measure UX for Product Impact Over Time
Jennifer Otto | Paul Sisler | Yina Li Turchetti
UXPA 2019
2. UX Benchmarking ■ #UXPA19 ■ #UXBenchmark
Welcome
Yina Li Turchetti
Senior UX Researcher
Autodesk
Jennifer Otto
Director, UX Research Ops
Fidelity Investments
Paul Sisler
Senior UX Researcher
Pearson Education
3. Contents
1. Benchmark Basics
2. Method
3. Cadence
4. Recruiting
5. Study Design
6. Analysis
7. Reporting
Photo by Kimberly Farmer on Unsplash
4. What Is a Benchmark?
Tests an end-to-end system, site, or app
Gives an overall product pulse (not meant for a deep understanding of one area)
Repeatable and run on a regular cadence
Measures how one change affects overall performance
Compares to previous performance and to competitors'
[Chart: NPS for iOS Mobile App by year, 2012-2019. Sample data.]
5. A History of Our Benchmarks
2012: Mobile app baseline
2014: Standardized study design across platforms
2016: Refined metrics and investigated automation opportunities
2018+: Expanded scope to include desktop
6. Our Approach
100-200 participants per platform
10 most common tasks per platform, including some pre-login and post-login
Participant demographics line up with site and app visitors
Run at year-end
Quantitative and qualitative task-based measures
7. Benchmark Measurements
Metrics are compared year over year and platform to platform:
Task success
Ease of use
Time on task
SUS (System Usability Scale)
NPS (Net Promoter Score)
Meets Needs
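The two questionnaire metrics in this list have standard formulas. A minimal Python sketch, assuming the usual ten-item 1-5 SUS scale and the 0-10 NPS likelihood-to-recommend question (the function names are ours, not the presenters'):

```python
def sus_score(responses):
    """System Usability Scale: ten items rated 1-5; odd items score
    (value - 1), even items score (5 - value); the sum is scaled by
    2.5 to give a 0-100 score."""
    assert len(responses) == 10
    total = sum((v - 1) if i % 2 == 1 else (5 - v)
                for i, v in enumerate(responses, start=1))
    return total * 2.5

def nps(ratings):
    """Net Promoter Score from 0-10 ratings: percent promoters (9-10)
    minus percent detractors (0-6), giving a score from -100 to 100."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)
```

Task success, ease of use, and time on task come straight from the task data, so only these two composite scores need a formula.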
8. Why We Perform Benchmarks
[Diagram: product strategy draws on competition, market research, customer feedback (attitudinal), analytics (behavioral), and benchmarking (attitudinal and behavioral).]
Example: after global navigation changes, feature testing went well (full-website task success = 92.4%), but the full-website benchmark measured task success at 75.9%.
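A gap like the 92.4% vs. 75.9% above can be checked for statistical significance before acting on it. A sketch using a pooled two-proportion z-test; the success counts and sample sizes below are illustrative assumptions, not the presenters' data:

```python
from math import sqrt, erf

def two_prop_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test: is the difference between two
    task-success rates larger than sampling noise would explain?
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF, expressed via math.erf.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p
```

With roughly the slide's percentages at an assumed 150-person sample per study (139/150 vs. 114/150), the difference comes out highly significant.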
10. Method
Repeatable: a future team member can execute the same protocol on a new version
Measurable: produces quantifiable observations
Comparable: results in findings suitable for comparison across rounds
Plays to team forte: relies on a familiar approach
Photo by Hans Reniers on Unsplash
11. Method
1. Pilot (qualitative): unmoderated online video; checks study design, tool issues, and wording
2. Core (quantitative): task-based online study; measures success, time, task ease, SUS/NPS
3. Insights (qualitative): unmoderated online video; insights and stories to explain the core results
12. Cadence
Follow a recurring cycle that respects your release schedule and the rhythm of your industry, with enough time to observe change.
Photo by Wayne Bishop on Unsplash
13. Cadence
Considerations | Investing | Education
Time to observe changes | Once per year | Academic year
Unpredictable times of year | Tax season | Finals and breaks
Team bandwidth | November to December | Summer breaks
Release schedule | Skipped a year for redesign | Back-to-school
Unexpected events | Market changes | Educational policy
14. Time in Field
Stay in field for as little time as possible to limit the effects of unplanned updates, current events, and other factors outside your control.
Photo by Bruno Martins on Unsplash
15. Time in Field
Impacts on time in field:
Scope and number of tasks
Number of platforms
Changes from previous rounds of the study
Size and experience of the team
Sample size, complexity, and incidence rate
Recruiting process (direct, panel)
Study requirements (e.g., logging in to an account)
Technical issues (product, tools, vendors)
16. Recruiting
Match your sample to users by platform using information from analytics and other sources. Aim for consistency, and plan for subtle demographic changes over time.
Photo by Jacek Dylag on Unsplash
17. Recruiting
Consistency
Guideline: same screener round to round; sample changes as user demographics change; actual users (not prospects, co-workers, your mom)
Experience: customer panel and a third-party recruiter
Significant results
Guideline: about 150 for each task per platform and per segment
Experience: about 300 mobile app users on two platforms and 300 web (on computer)
Clean data
Guideline: prepare for cheaters and speeders
Experience: removed 30% of the sample after closing
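Preparing for cheaters and speeders usually means codifying flagging rules before fielding. A sketch of two common rules; the one-third-of-median cutoff and the junk-answer list are illustrative assumptions, not the rules behind the presenters' 30% removal:

```python
# Answers treated as junk when checking for click-through behavior
# (an illustrative list).
JUNK_ANSWERS = frozenset({"", ".", "n/a", "asdf", "idk"})

def flag_speeder(duration_s, median_duration_s, cutoff=1 / 3):
    """Flag a respondent who finished in under a third of the median
    completion time (an illustrative cutoff)."""
    return duration_s < cutoff * median_duration_s

def flag_cheater(open_ended_answers):
    """Flag a respondent whose open-ended answers are all junk strings,
    a sign of clicking through without attempting the tasks."""
    cleaned = [a.strip().lower() for a in open_ended_answers]
    return bool(cleaned) and all(a in JUNK_ANSWERS for a in cleaned)
```

Flagging rather than silently dropping lets the team review borderline cases before removing them from the sample.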
18. What to Test
Test a core of common and critical tasks likely to remain important over time. Cover the breadth of your information architecture.
Photo by Sherman Yang on Unsplash
19. What to Test
Guideline | Experience
Same tasks every time | Tasks evolve with the team and the site/app
Same tasks across platforms | Core tasks plus platform-specific tasks
Same tasks as far as possible with competitors | 2-3 competitors, focused on our main services
Shortest reasonable test time | Limit to 10 minutes; adjust tasks per participant and increase sample size
Randomly assign tasks if you have too many | Login and logout are first and last
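The last two guidelines, random task assignment with login first and logout last, can be sketched as follows (the task wording and counts are hypothetical):

```python
import random

def assign_tasks(task_pool, per_participant, seed=None):
    """Pick a random subset of the task pool in random order for one
    participant, always bookended by the login and logout tasks."""
    rng = random.Random(seed)
    middle = rng.sample(task_pool, k=per_participant)
    return ["Log in to your account"] + middle + ["Log out"]
```

Because each participant sees a different random subset, the per-task sample shrinks, which is why the slide pairs this with increasing the overall sample size.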
21. Define Task Goals
Identify your goals before creating your tasks.
Photo by Christian Widell on Unsplash
22. Task goal example
How well does our target audience find and understand general market performance?
Task examples:
Option 1: According to [Platform], is the NASDAQ up or down today?
Option 2: According to [Platform], what is the current value of the Dow Jones?
23. Be Sensitive to Data Privacy
Give participants a way to opt out of the study if they are not comfortable continuing, given the potential sensitivity of their data.
Photo by Markus Spiske on Unsplash
24. Situation: your study requires people to log in to their own accounts to complete some tasks, which may make some uncomfortable.
Opt-out examples:
Study level: Are you willing to log in to your account in your web browser on a computer and go through a few tasks for this study?
Task level: Find your account with the smallest balance. What is its current value? Provide ranged answers and a "Prefer not to say" option.
25. Use Verifiable Tasks
Use verifiable answers if you can. Self-report scores tend to introduce noise and produce inflated results.
Photo by Fleur on Unsplash
26. Validation Questions
Task: According to [Platform], is the NASDAQ up or down today?
Type | Validation question | Answers
Self-report (Likert) | How confident are you that you have completed the task successfully? | 1 (Not confident at all) to 7 (Extremely confident)
Self-report (binary) | Did you complete this task successfully? | Yes / No / I don't know
Semi-verifiable | Is the NASDAQ up or down? | Up / Down / I don't know
Verifiable | What is the NASDAQ at today? | [Text entry]
27. Consider Measurable vs. Realistic Tasks
Balance realistic task design against verifiable answers. Sometimes you may have to choose between a more realistic task that allows only self-reported answers and a less realistic task that yields measurable results.
Photo by Ali Abdul Rahman on Unsplash
28. Task goal: can participants locate news relevant to a stock?
Task examples:
More realistic: Find a news article about [company X]. Were you able to complete this task? (Self-report: Y/N)
More measurable: Find the most recent article on [platform] about [company X]. What is the first word of the article? (Verifiable: [write-in])
29. Increase Data Integrity
Use an open-ended answer format rather than multiple-choice options to curb cheating behavior.
Photo by Jan Kahánek on Unsplash
30. Task example: find the most recent price for Apple stock (AAPL).
Easy to grade (multiple choice): Below $150 | Between $151 and $190 | Between $191 and $230 | Between $231 and $270 | Above $271 | I don't know
Discourages cheating: [Open-ended text answer] | I don't know
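Open-ended answers like these can still be graded automatically against the live value at test time. A sketch, where the parsing rule and the 2% tolerance for price movement during the session are our assumptions:

```python
import re

def grade_price_answer(answer, reference_price, tolerance=0.02):
    """Extract the first dollar amount from a free-text answer and
    accept it if it is within a small tolerance of the reference price
    recorded at test time (to absorb market movement)."""
    match = re.search(r"\$?\s*(\d+(?:\.\d+)?)", answer.replace(",", ""))
    if not match:
        return False  # no number found, e.g. "I don't know"
    value = float(match.group(1))
    return abs(value - reference_price) <= tolerance * reference_price
```

Automatic grading keeps the open-ended format (which discourages cheating) without turning analysis into a fully manual pass.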
39. Where Are We Now
Creating a measurement framework similar to Google's HEART
Monitoring on an ongoing basis
Planning a baseline of existing products to precede the launch of a new learning platform
Evaluating scope: academic subjects, tasks, number of participants
Identifying personnel with the skills and bandwidth
Continuing to run the benchmark annually
Exploring ways to automate benchmarks of experiences
Building a data storage component in order to create dashboards
40. In Summary
Consider your research goals when developing tasks.
Leverage data from both qualitative and quantitative task-based methods.
Take time when developing your study design.
41. Questions?
Keep in touch!
Jennifer Otto: jennifer.otto@fmr.com
Paul Sisler: paul.sisler@pearson.com
Yina Li Turchetti: yina.turchetti@autodesk.com
UXPA 2019
Editor's notes
Ask the audience who has experience running task-based quantitative studies.
Speak to data storage in the last point.
Changes in business approach, environment, users, released features.
It is good to run the benchmark even if nothing has changed: the data may show your customers have become more familiar with a year-old release.
Clearly define success.
Don't change scales (e.g., 5-point to 7-point).
Consistent tasks and answers make comparison easy.
Binary vs. Likert success; 5-point vs. 7-point scale.
Setup may differ even within the same tool, depending on how you run the study.
Version-control your analysis files and save them in more than one place!
Determine what counts as a correct answer. E.g., for a position task, is entering a percentage a correct answer?
There was some manual labor (most rounds gained a day).
Potentially incorporate into the Method slide or leave off.