Benchmarking allows you to track UX progress over time, indicating how successful digital platform changes have been. It provides a holistic product assessment and demands more attention to methodology, stakeholders, tasks, protocol, and analysis than a typical feature study does. Our method captures quantitative measures along with qualitative feedback that product stakeholders can use to justify and inform their business decisions.
In this session, you’ll get tips for developing a benchmark strategy. You’ll also hear stories about how benchmarks have impacted our organization’s digital strategy.
You will learn:
The business impact of benchmark studies
Designing, running, and analyzing such studies
How to avoid issues with recruiting, study design, execution, and reporting
A variety of UX and product professionals, including seasoned researchers, novice designers, and digital product owners, can learn from and act on this session.
Measuring UX Over Time
1. Building Your Benchmark: How to Measure UX for Product Impact Over Time
Jennifer Otto | Paul Sisler | Yina Li Turchetti
UXPA 2019
2. UX Benchmarking ■ #UXPA19 ■ #UXBenchmark
Welcome
Yina Li Turchetti
Senior UX Researcher
Autodesk
Jennifer Otto
Director, UX Research Ops
Fidelity Investments
Paul Sisler
Senior UX Researcher
Pearson Education
3. Contents
1. Benchmark Basics
2. Method
3. Cadence
4. Recruiting
5. Study Design
6. Analysis
7. Reporting
Photo by Kimberly Farmer on Unsplash
4. What Is a Benchmark?
Tests an end-to-end system, site, or app
Gives an overall product pulse (not meant for a deep understanding of one area)
Repeatable and run on a regular cadence
Measures how one change affects overall performance
Compares to previous performance and to competitors'
[Chart: NPS for iOS Mobile App by year, 2012-2019. Sample data.]
5. A History of Our Benchmarks
2012: Mobile app baseline
2014: Standardized study design across platforms
2016: Refined metrics and investigated automation opportunities
2018+: Expanded scope to include desktop
6. Our Approach
100-200 participants per platform
10 most common tasks per platform, including some pre-login and post-login
Participant demographics line up with site and app visitors
Run at year-end
Quantitative and qualitative task-based measures
7. Benchmark Measurements
Metrics are compared year over year and platform to platform:
Task success
Ease of use
Time on task
SUS (System Usability Scale)
NPS (Net Promoter Score)
Meets Needs
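The two questionnaire metrics in this list have standard formulas. A minimal Python sketch, assuming the usual ten-item 1-5 SUS scale and the 0-10 NPS likelihood-to-recommend question (the function names are ours, not the presenters'):

```python
def sus_score(responses):
    """System Usability Scale: ten items rated 1-5; odd items score
    (value - 1), even items score (5 - value); the sum is scaled by
    2.5 to give a 0-100 score."""
    assert len(responses) == 10
    total = sum((v - 1) if i % 2 == 1 else (5 - v)
                for i, v in enumerate(responses, start=1))
    return total * 2.5

def nps(ratings):
    """Net Promoter Score from 0-10 ratings: percent promoters (9-10)
    minus percent detractors (0-6), giving a score from -100 to 100."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)
```

Task success, ease of use, and time on task come straight from the task data, so only these two composite scores need a formula.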
8. Why We Perform Benchmarks
[Diagram: product strategy draws on competition, market research, customer feedback (attitudinal), analytics (behavioral), and benchmarking (attitudinal and behavioral).]
Example: after global navigation changes, feature testing went well (full-website task success = 92.4%), but the full-website benchmark measured task success at 75.9%.
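A gap like the 92.4% vs. 75.9% above can be checked for statistical significance before acting on it. A sketch using a pooled two-proportion z-test; the success counts and sample sizes below are illustrative assumptions, not the presenters' data:

```python
from math import sqrt, erf

def two_prop_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test: is the difference between two
    task-success rates larger than sampling noise would explain?
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF, expressed via math.erf.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p
```

With roughly the slide's percentages at an assumed 150-person sample per study (139/150 vs. 114/150), the difference comes out highly significant.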
10. Method
Repeatable: a future team member can execute the same protocol on a new version
Measurable: produces quantifiable observations
Comparable: results in findings suitable for comparison across rounds
Plays to team forte: relies on a familiar approach
Photo by Hans Reniers on Unsplash
11. Method
1. Pilot (qualitative): unmoderated online video; checks study design, tool issues, and wording
2. Core (quantitative): task-based online study; measures success, time, task ease, SUS/NPS
3. Insights (qualitative): unmoderated online video; insights and stories to explain the core results
12. Cadence
Follow a recurring cycle that respects your release schedule and the rhythm of your industry, with enough time to observe change.
Photo by Wayne Bishop on Unsplash
13. Cadence
Considerations | Investing | Education
Time to observe changes | Once per year | Academic year
Unpredictable times of year | Tax season | Finals and breaks
Team bandwidth | November to December | Summer breaks
Release schedule | Skipped a year for redesign | Back-to-school
Unexpected events | Market changes | Educational policy
14. Time in Field
Stay in field for as little time as possible to limit the effects of unplanned updates, current events, and other factors outside your control.
Photo by Bruno Martins on Unsplash
15. Time in Field
Impacts on time in field:
Scope and number of tasks
Number of platforms
Changes from previous rounds of the study
Size and experience of the team
Sample size, complexity, and incidence rate
Recruiting process (direct, panel)
Study requirements (e.g., logging in to an account)
Technical issues (product, tools, vendors)
16. Recruiting
Match your sample to users by platform using information from analytics and other sources. Aim for consistency, and plan for subtle demographic changes over time.
Photo by Jacek Dylag on Unsplash
17. Recruiting
Consistency
Guideline: same screener round to round; sample changes as user demographics change; actual users (not prospects, co-workers, your mom)
Experience: customer panel and a third-party recruiter
Significant results
Guideline: about 150 for each task per platform and per segment
Experience: about 300 mobile app users on two platforms and 300 web (on computer)
Clean data
Guideline: prepare for cheaters and speeders
Experience: removed 30% of the sample after closing
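Preparing for cheaters and speeders usually means codifying flagging rules before fielding. A sketch of two common rules; the one-third-of-median cutoff and the junk-answer list are illustrative assumptions, not the rules behind the presenters' 30% removal:

```python
# Answers treated as junk when checking for click-through behavior
# (an illustrative list).
JUNK_ANSWERS = frozenset({"", ".", "n/a", "asdf", "idk"})

def flag_speeder(duration_s, median_duration_s, cutoff=1 / 3):
    """Flag a respondent who finished in under a third of the median
    completion time (an illustrative cutoff)."""
    return duration_s < cutoff * median_duration_s

def flag_cheater(open_ended_answers):
    """Flag a respondent whose open-ended answers are all junk strings,
    a sign of clicking through without attempting the tasks."""
    cleaned = [a.strip().lower() for a in open_ended_answers]
    return bool(cleaned) and all(a in JUNK_ANSWERS for a in cleaned)
```

Flagging rather than silently dropping lets the team review borderline cases before removing them from the sample.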
18. What to Test
Test a core of common and critical tasks likely to remain important over time. Cover the breadth of your information architecture.
Photo by Sherman Yang on Unsplash
19. What to Test
Guideline | Experience
Same tasks every time | Tasks evolve with the team and the site/app
Same tasks across platforms | Core tasks plus platform-specific tasks
Same tasks as far as possible with competitors | 2-3 competitors, focused on our main services
Shortest reasonable test time | Limit to 10 minutes; adjust tasks per participant and increase sample size
Randomly assign tasks if you have too many | Login and logout are first and last
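The last two guidelines, random task assignment with login first and logout last, can be sketched as follows (the task wording and counts are hypothetical):

```python
import random

def assign_tasks(task_pool, per_participant, seed=None):
    """Pick a random subset of the task pool in random order for one
    participant, always bookended by the login and logout tasks."""
    rng = random.Random(seed)
    middle = rng.sample(task_pool, k=per_participant)
    return ["Log in to your account"] + middle + ["Log out"]
```

Because each participant sees a different random subset, the per-task sample shrinks, which is why the slide pairs this with increasing the overall sample size.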
21. Define Task Goals
Identify your goals before creating your tasks.
Photo by Christian Widell on Unsplash
22. Task goal example
How well does our target audience find and understand general market performance?
Task examples:
Option 1: According to [Platform], is the NASDAQ up or down today?
Option 2: According to [Platform], what is the current value of the Dow Jones?
23. Be Sensitive to Data Privacy
Give participants a way to opt out of the study if they are not comfortable continuing, given the potential sensitivity of their data.
Photo by Markus Spiske on Unsplash
24. Situation: your study requires people to log in to their own accounts to complete some tasks, which may make some uncomfortable.
Opt-out examples:
Study level: Are you willing to log in to your account in your web browser on a computer and go through a few tasks for this study?
Task level: Find your account with the smallest balance. What is its current value? Provide ranged answers and a "Prefer not to say" option.
25. Use Verifiable Tasks
Use verifiable answers if you can. Self-report scores tend to introduce noise and produce inflated results.
Photo by Fleur on Unsplash
26. Validation Questions
Task: According to [Platform], is the NASDAQ up or down today?
Type | Validation question | Answers
Self-report (Likert) | How confident are you that you have completed the task successfully? | 1 (Not confident at all) to 7 (Extremely confident)
Self-report (binary) | Did you complete this task successfully? | Yes / No / I don't know
Semi-verifiable | Is the NASDAQ up or down? | Up / Down / I don't know
Verifiable | What is the NASDAQ at today? | [Text entry]
27. Consider Measurable vs. Realistic Tasks
Balance realistic task design against verifiable answers. Sometimes you may have to choose between a more realistic task that allows only self-reported answers and a less realistic task that yields measurable results.
Photo by Ali Abdul Rahman on Unsplash
28. Task goal: can participants locate news relevant to a stock?
Task examples:
More realistic: Find a news article about [company X]. Were you able to complete this task? (Self-report: Y/N)
More measurable: Find the most recent article on [platform] about [company X]. What is the first word of the article? (Verifiable: [write-in])
29. Increase Data Integrity
Use an open-ended answer format rather than multiple-choice options to curb cheating behavior.
Photo by Jan Kahánek on Unsplash
30. Task example: find the most recent price for Apple stock (AAPL).
Easy to grade (multiple choice): Below $150 | Between $151 and $190 | Between $191 and $230 | Between $231 and $270 | Above $271 | I don't know
Discourages cheating: [Open-ended text answer] | I don't know
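Open-ended answers like these can still be graded automatically against the live value at test time. A sketch, where the parsing rule and the 2% tolerance for price movement during the session are our assumptions:

```python
import re

def grade_price_answer(answer, reference_price, tolerance=0.02):
    """Extract the first dollar amount from a free-text answer and
    accept it if it is within a small tolerance of the reference price
    recorded at test time (to absorb market movement)."""
    match = re.search(r"\$?\s*(\d+(?:\.\d+)?)", answer.replace(",", ""))
    if not match:
        return False  # no number found, e.g. "I don't know"
    value = float(match.group(1))
    return abs(value - reference_price) <= tolerance * reference_price
```

Automatic grading keeps the open-ended format (which discourages cheating) without turning analysis into a fully manual pass.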
39. Where Are We Now
Creating a measurement framework similar to Google's HEART
Monitoring on an ongoing basis
Planning a baseline of existing products to precede the launch of a new learning platform
Evaluating scope: academic subjects, tasks, number of participants
Identifying personnel with the skills and bandwidth
Continuing to run the benchmark annually
Exploring ways to automate benchmarks of experiences
Building a data storage component in order to create dashboards
40. In Summary
Consider your research goals when developing tasks.
Leverage data from both qualitative and quantitative task-based methods.
Take time when developing your study design.
41. Questions?
Keep in touch!
Jennifer Otto: jennifer.otto@fmr.com
Paul Sisler: paul.sisler@pearson.com
Yina Li Turchetti: yina.turchetti@autodesk.com
UXPA 2019
Editor's notes
Ask the audience who has experience running task-based quantitative studies.
Speak to data storage in the last point.
Changes in business approach, environment, users, released features.
It is good to run the benchmark even if nothing has changed: the data may show your customers have become more familiar with a year-old release.
Clearly define success.
Don't change scales (e.g., 5-point to 7-point).
Consistent tasks and answers make comparison easy.
Binary vs. Likert success; 5-point vs. 7-point scale.
Setup may differ even within the same tool, depending on how you run the study.
Version-control your analysis files and save them in more than one place!
Determine what counts as a correct answer. E.g., for a position task, is entering a percentage a correct answer?
There was some manual labor (most rounds gained a day).
Potentially incorporate into the Method slide or leave off.