This presentation describes how usability testing of surveys can be used to improve data quality and reduce respondent burden. We describe what kinds of surveys can be tested and when. We also provide practical advice for planning, conducting, and analyzing usability tests of surveys.
Usability Testing for Survey Research: How To and Best Practices
1. Nov 9, 2016
QDET2 short course | Miami, FL
Jen Romano-Bergstrom
Sr. UX Researcher
Facebook
jenrb@fb.com
@romanocog
Emily Geisen
Usability/Cog Lab Manager
Survey Methodologist
RTI
egeisen@rti.org
#QDET2
2. Agenda
Usability Testing and Survey Research (2:00 – 3:45)
• What is usability and usability testing?
• Why do we need it in survey research?
• What to test and when
How to Conduct Usability Testing (4:00 – 5:30)
• Planning
• Conducting sessions
• Analyzing results
3. Activity
• How long did it take you to get here?
• What is today’s date?
4. Why is Design Important?
Image source: Geisen & Romano Bergstrom, 2017
5. Is this position 0 or missing?
Why is Design Important?
Image source: Geisen & Romano Bergstrom, 2017
7. Usability defined
“The extent to which a product can
be used by specified users to achieve
specified goals with effectiveness,
efficiency, and satisfaction in a
specified context of use.”
– ISO 9241-11
9. What does usability mean for your survey?
• Product
• Users
• Goals
• Context
• Effectiveness, Efficiency, Satisfaction
10. What does usability mean for surveys?
• Product: Web sites, web surveys, paper surveys, apps
• Users: Our respondents (mostly), interviewers
• Goals: Respondents must be able to provide their correct and accurate opinions, stories, facts, predictions
• Context of use: In their homes, offices, out and about
• Effectiveness: Completing the questions and survey with accurate answers
• Efficiency: Completing the questions and survey quickly, with as few steps/clicks as possible
• Satisfaction: Having a pleasant experience
12. Satisfaction is more complex.
• Did it allow them to provide their accurate answers?
• Did they enjoy the experience?
• Did it require too much time to complete?
• Did they find the instrument easy to use?
• Did they find it easy to learn how to use?
13. Other factors may be important.
• Easy to remember how to use (memorability)
• Error frequency and severity
• Accessibility
• And, most crucial for surveys:
• Data quality
• Respondent burden
14. Usability testing is watching a user try to achieve the goal
• Participants represent real users
• Participants do real tasks
• You observe and record what participants do
• You think about what you saw:
• Analyze data,
• Diagnose problems,
• Recommend changes.
• Make changes and test again
Image source: Geisen & Romano Bergstrom, 2017
16. Surveys are not web sites.
…so why is usability testing needed for survey research?
17. Usability Testing Goals
Surveys
• Improve data quality
• Reduce respondent burden
Websites
• Increase traffic/revenue (e.g., sell more product)
• Disseminate information
18. Users are not trained interviewers
• Web surveys go to the general (or specific) public
• Varying levels of computer expertise
• Varying levels of literacy
• Likely to be in a hurry, interrupted, distracted
• There is no interviewer
• No one to interpret the questions
• No one to navigate around the instrument
19. Presser et al. (2004): pretesting focuses on a “broader concern for improving data quality so that measurements meet a survey’s objective”
Image source: Geisen & Romano Bergstrom, 2017
20. Cognitive Testing vs Usability
• Cognitive Testing: Do people understand it?
• What are your feelings towards Obamacare? vs.
• What are your feelings towards the Affordable Care Act?
• Usability Testing: Can people use it?
Image source: Geisen & Romano Bergstrom, 2017
21. Usability Model for Surveys
1. Interpreting the design:
a. What meaning do respondents assign to visual design and layout?
b. How do respondents believe the survey works?
2. Completing actions and navigating:
a. How well does the survey support respondents’ ability to
complete tasks and goals?
b. How well do respondents follow navigational cues and
instructions?
3. Processing feedback:
a. How do respondents interpret and react to the survey feedback in
response to their actions?
b. How well does the survey help respondents identify, interpret, and
resolve errors?
Source: Geisen & Romano Bergstrom, 2017
23. Navigation Usability Study
Method
• Lab-based usability study
• Test administrator (TA) read the introduction and left a letter on the desk
• Separate rooms
• Respondent (R) read the letter and logged in to the survey
• Think Aloud
• Eye Tracking
• Satisfaction Questionnaire
• Debriefing
* p < 0.0001
Romano & Chen, 2011
24. Navigation Usability Study
Eye Tracking
Romano & Chen, 2011
• Participants looked at Previous and Next in PN conditions
• Many participants looked at Previous in the N_P conditions
• Couper et al. (2011): Previous gets used more when it is on the right.
25. Navigation Usability Study
Debriefing Interview
• N_P version
• Counterintuitive
• Don’t like the “buttons being flipped.”
• Next on the left is “really irritating.”
• Order is “opposite of what most people would design.”
• PN version
• “Pretty standard, like what you typically see.”
• The location is “logical.”
Romano & Chen, 2011
26. Don’t forget about mobile
• “If you’re doing a web survey, you’re doing a mobile survey.” (Michael Link, 2013 AAPOR)
• The share of respondents on mobile devices is as high as 30% or more for some surveys (Lugtig, Toepoel & Amin, 2016; Saunders, 2015).
27. Mobile Usability Study
V1: long list of items: grid on desktop; drop-down to select response on mobile
Romano Bergstrom, QDET2, 2016
28. Mobile Usability Study
V1: long list of items; drop-down to select response
V2: each question on a separate screen
Romano Bergstrom, QDET2, 2016
31. What can be tested?
Image source: Geisen & Romano Bergstrom, 2017
32. Exploratory Testing Example
• Background
• GSS collects data about graduate students and postdocs in
different fields of study
• NSF wanted to modify GSS to capture data at a more
consistent and detailed level (e.g., programs instead of
departments)
• The design approach
• Redesigned parts of survey using this model
• Conducted usability testing
33. Created a Hierarchy to Collect Data
• School/College: The Graduate school
• Department: Biological Sciences
• Program: Cell Biology
• Provide counts of graduate students in cell biology by race/ethnicity, sex, etc.
• Program: Botany
• Provide counts of graduate students in botany by race/ethnicity, sex, etc.
• Department: Physics
• Program: Atmospheric physics
• Provide counts of graduate students in atmospheric physics by race/ethnicity, sex, etc.
34. Survey Framework Did Not Fit Users
• No common terminology
• What is a department vs program?
• Used other terminology altogether:
division, concentration, track, field, subject
• No common hierarchy or structure
• Departments within programs and vice versa
• No departments, just programs
• Some departments had programs, some didn’t
• Information not available at level desired
36. Low-Fidelity Testing
• Methods: simple drawings or illustrations, Word/Excel/Visio, simple screenshots, website shell
• Uses: new questions or surveys, redesigns, evaluating information architecture, visual aspects of the survey, web-centric features
• Benefits: allows for quick feedback without spending too much time or money on programming; results can be applied to other aspects of the survey
• Don’t forget mobile testing (if the survey will be offered on mobile devices) at this stage!
37. What can be tested: Paper Mock-Ups
Image source: Geisen & Romano Bergstrom, 2017
43. When to test
• Start as early as possible
• Testing should be integrated into the programming
schedule, not conducted after
• Test in stages as web survey is being developed
• Test until all serious problems are resolved or you stop learning anything new (ideally)
• Iterative testing benefits from more rounds, fewer
people
44. Reasons for more rounds, fewer people
• Identify more issues: 2 rounds of 5 users will likely
identify more issues than 1 round of 10 users
• Diminishing returns from more users in each round
• Can be hard for users to see past the big glaring problems
to other more subtle problems
• Allows you to test solution
• Good balance between testing resources and revision
resources
• Quicker to summarize results and revise testing
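The diminishing-returns point can be illustrated with the cumulative problem-discovery model associated with Nielsen and Landauer, in which the expected share of problems found by n users is 1 − (1 − p)^n. This is a minimal sketch that assumes, hypothetically, that every problem is found by any single user with the same probability p = 0.3; real problems differ widely in how easy they are to detect, so treat the numbers as illustrative only.

```python
def proportion_found(n_users: int, p: float) -> float:
    """Expected share of problems found by n_users under the simple
    discovery model 1 - (1 - p) ** n, where p is an assumed per-user
    probability of encountering any given problem."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1.0 - (1.0 - p) ** n_users


def marginal_gains(max_users: int, p: float) -> list[float]:
    """Additional share of problems contributed by each extra user."""
    return [
        proportion_found(n, p) - proportion_found(n - 1, p)
        for n in range(1, max_users + 1)
    ]


if __name__ == "__main__":
    P = 0.3  # hypothetical detection probability, not a measured value
    for n in (1, 3, 5, 10):
        print(f"{n:>2} participant(s): {proportion_found(n, P):.0%} of problems found")
    print("gain from the 5th participant:", round(marginal_gains(5, P)[-1], 3))
```

With p = 0.3 the model predicts about 83% of problems after 5 users and about 97% after 10, so users 6 to 10 add only about 14 percentage points. The model assumes the set of problems stays fixed; iterative rounds help because fixing the glaring problems after round 1 changes the survey, so round 2 can surface subtler problems and test the fixes.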
45. Smaller rounds support collaboration
• Include stakeholders in testing
• Have programmers, clients, decision-makers observe
testing live or remotely
• Direct observation is more exciting than reading a report
• Collaborative process
• Conduct tests in the morning, meet to discuss over a long
lunch, recommendations for changes ready in the
afternoon
• Report can then summarize findings
and changes instead of findings and
recommendations
46. Iterative Testing: Example
One box, prompt inside box
One box, prompt below box: resulted in
more complete names
Separate boxes, prompt below: even
more complete names
Geisen, Olmsted, Goerman & Lakhe, 2014
48. Start with web & survey best practices
• User-centered evaluation includes best
practices/findings from literature
• Abundance of literature
• Designing Effective Web Surveys (Couper)
• Internet, Mail, and Mixed-Mode Surveys (Dillman et al.)
• Jakob Nielsen
49. Build off the literature before doing usability testing
• Usability testing will show you how well or how easily
people can do method A
• Will not necessarily show you that method A is
definitively better than method B
• Not a replacement for large, probability-based
methodological experiments
• Don’t reinvent the wheel
50. Building off the literature: Example
• Concern: Want to know best method for providing
definitions in web surveys
• Ask: Has this been done before?
• Start with the literature:
• Conrad, Couper, Tourangeau, Peytchev (2006)
• Peytchev, Conrad, Couper, Tourangeau (2007)
• Peytchev, Conrad, Couper, Tourangeau (2010)
51. Building off the literature: Example (cont)
• Methods
• Experiment 1: one-click, two-clicks, click and scroll
• Experiment 2: roll-over, one-click, two-clicks
• Experiment 3: roll-over vs. always included
• Conclusions:
• Reading definitions probably improves accuracy
• Less effort required, more likely to read definitions
53. Start with the literature, but decide what’s relevant for your study
• The literature may not focus on the study population
needed for your survey
• May not be any literature on the particular topic or
issue your survey has
• And sometimes the experts just don’t agree,
then what?
55. Obstacles to Testing
• “There is no time.”
• Start early in development process.
• One morning a month with 3 users (Krug)
• 12 people in 3 days (Anderson Riemer)
• 12 people in 2 days (Lebson & Romano Bergstrom)
• “I can’t find representative users.”
• Everyone is important, and something is better than
nothing.
• Remote testing or travel
• “We don’t have a lab.”
• You can test anywhere.
56. Planning
• Participant Selection and Recruitment
• Testing Location and Equipment
• Identifying Testing Focus/Concerns
• Identifying Measures to Collect
• Deciding on Testing Roles
• Preparing Test Materials
57. Participants: determining target audience
• Recruit people who are like your target users
• Who is the survey for?
• Consider participants’ jobs and other roles
• Recruit diverse participants
• Age
• Income
• Education
• Is location important?
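When recruiting for diversity, it can help to track screener results against minimum counts per category as sessions are scheduled. A minimal sketch; the field names (`age`, `education`), category labels, and minimums are hypothetical examples, not recommendations from this course.

```python
from collections import Counter


def quota_gaps(participants, minimums):
    """Return how many more participants are needed for each
    (field, category) whose minimum count is not yet met.

    participants: list of dicts from a screener, e.g. {"age": "18-34"}
    minimums: {field: {category: minimum_count}}
    """
    gaps = {}
    for field, targets in minimums.items():
        counts = Counter(p.get(field) for p in participants)
        for category, minimum in targets.items():
            shortfall = minimum - counts[category]  # Counter returns 0 for unseen keys
            if shortfall > 0:
                gaps[(field, category)] = shortfall
    return gaps


# Hypothetical screener data and targets, for illustration only.
recruited = [
    {"age": "18-34", "education": "High school or less"},
    {"age": "35-54", "education": "College degree"},
    {"age": "18-34", "education": "College degree"},
]
targets = {
    "age": {"18-34": 1, "35-54": 1, "55+": 2},
    "education": {"High school or less": 2, "College degree": 1},
}
print(quota_gaps(recruited, targets))
```

Here the recruiter would still need two participants aged 55+ and one more with a high school education or less.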
59. Participant recruitment
• Existing Participant Lists (Existing Frame)
• Target respondents; small percentage of successful recruits
• No Participant Lists (Constructed Frame)
• Research firm with database
• Reliable; may be professional participants
• Hang fliers nearby
• Target locals; bit of work walking around
• Online social media ads
• Target specific criteria; social media users
• Classifieds
• Lots of responses quickly, non-Internet users; may be professional participants
• Snowball (word-of-mouth)
• Good for specific populations; participants may know each other
60. Participant recruiting tips
• Recruit “floaters” (for no-shows and cancellations)
• Talk to your participants early
• Ask about specific behaviors relevant to your study (e.g.
mobile usage, time spent online)
• Talk about what they’ll do and build rapport
• Get an email address and a contact number
• Schedule sessions ASAP (e.g., 3 weeks ahead)
• Remind them the day before
61. Location: Lab, Remote, In the Field
Laboratory
• Controlled environment
• All participants have the same experience
• Record and communicate from control room
• Observers watch from control room and provide additional probes (via moderator) in real time
• Incorporate physiological measures (e.g., eye tracking, EDA)
Remote
• No travel costs
• Participants in their natural environments (e.g., home, work)
• Use video chat (moderated sessions) or online programs (unmoderated)
• Conduct many sessions quickly
• Recruit participants in many locations (e.g., states, countries)
In the Field
• Participants tend to be more comfortable in their natural environments
• Recruit hard-to-reach populations (e.g., children, doctors)
• Moderator travels to various locations
• Bring equipment (e.g., eye tracker)
• Natural observations
65. Remote unmoderated testing example
Great for quickly assessing different designs: gather large amounts of data quickly
[Figure: First-click heat maps. Percentage of participants who made their first click to each area of interest (e.g., Only Me, Friends, Confirm, progress bar, outside area), for V1 (N=2000), V2 (N=1898), and Pre-UX (N=864).]
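First-click results like these reduce to counting which area of interest each participant clicked first. A minimal sketch, assuming each area of interest is a hypothetical screen rectangle in pixel coordinates and that the testing tool exports one (x, y) first click per participant; the area names and coordinates are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaOfInterest:
    """A hypothetical rectangular area of interest, in screen pixels."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def first_click_shares(clicks, areas):
    """Share of participants whose first click landed in each area.

    clicks: one (x, y) first click per participant
    areas: list of AreaOfInterest; if areas overlap, the first match wins
    Clicks that hit no area are counted under "outside area".
    """
    counts = {area.name: 0 for area in areas}
    counts["outside area"] = 0
    for x, y in clicks:
        for area in areas:
            if area.contains(x, y):
                counts[area.name] += 1
                break
        else:  # no area matched this click
            counts["outside area"] += 1
    total = len(clicks)
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


# Illustrative layout and clicks (hypothetical coordinates).
areas = [
    AreaOfInterest("Only Me", 0, 0, 100, 40),
    AreaOfInterest("Friends", 0, 50, 100, 90),
]
print(first_click_shares([(10, 10), (20, 60), (50, 70), (300, 300)], areas))
```

Running the same function on each design version's clicks gives one row per version, which is the shape of the comparison in the figure.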
67. Testing location: summary
Your location / Facility
Pros
• Use your equipment (e.g., one-way mirror, recording software)
• Controlled setting with no/few interruptions
Cons
• Not true to real life
• More burden to user: more no-shows and cancellations
• Using your computer/device, not the user’s computer
User’s location
Pros
• Simulates real user experience
• Users have access to info
• Easier to schedule/accommodate users
Cons
• Need portable equipment or do without
• Interviewer travel: increased cost to researcher
• Safety matters
• Harder to schedule observers
68. Equipment: video/audio recording
• Helps with note-taking
• Reduces need to take notes or have a note-taker during
interview
• Can take more nuanced notes afterwards if you can start
and stop the video
• Helps after the interview
• Useful during debriefing to replay parts of video
• Accommodates observers who could not make it to the
actual session
69. Equipment: Screen-sharing for observers
• Fosters collaboration
• Can accommodate observers from any location
• Facilitate discussions in conference setting
• Improved schedule
• Stakeholders get information immediately
• No waiting for recorded videos or report
• Cheaper
• Inexpensive compared to travel costs
• However, watching from their desks leads to less
engagement than in-person – the best is always
having observers in-person, in the observation room
70. Identify Testing Focus and Concerns
• General focus: Can they complete the survey?
• More specific concern:
We are worried about the definitions.
• More specific:
Are hover-overs an effective way of providing definitions?
• More specific: Do participants know definitions are available?
• More specific: Do participants understand what’s hover-overable?
• More specific: How helpful/unhelpful are the definitions?
71. More examples of specific concerns
• How well do people understand the instructions?
• Do people read the entire question and response options before responding? If not,
what do they read?
• Can people use the Next and Previous navigation buttons correctly?
• Do people know what to do on each screen?
• How easily do people find the information they need to answer the questions?
• When people do not understand something or have a question, do they use the
FAQs?
• Are the FAQs helpful/sufficient? What is missing?
• Are people able to correctly select their job from a long list of potential jobs?
• When do people use the left navigation, if at all?
• Can people use sliders correctly to select the desired response?
72. Identifying measures to collect
• Observational metrics tell us how participants navigate and interact.
• Self-report metrics tell us why participants focus on certain site aspects.
• Eye tracking tells us what, how long, and how often participants focus on design elements.
• The combination of observational, self-report, and implicit data allows us to measure the user experience more accurately. We do not use eye tracking in isolation.
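The "what, how long, and how often" measures can be sketched as per-area summaries of fixation data. This assumes a hypothetical export of fixations as (x, y, duration in milliseconds) in viewing order and hypothetical rectangular areas of interest (AOIs); real eye-tracking software defines fixations and AOIs in its own formats.

```python
def aoi_summary(fixations, aois):
    """Summarize fixations per area of interest (AOI).

    fixations: list of (x, y, duration_ms) tuples in viewing order
    aois: dict of name -> (left, top, right, bottom) rectangle
    Returns name -> {"fixations": count, "dwell_ms": total duration,
    "first_index": position of the first fixation in that AOI}.
    Fixations outside every AOI are ignored; overlapping AOIs are
    resolved by dict order (first match wins).
    """
    summary = {}
    for index, (x, y, duration_ms) in enumerate(fixations):
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                # first_index is only set when the AOI is first seen
                entry = summary.setdefault(
                    name, {"fixations": 0, "dwell_ms": 0, "first_index": index}
                )
                entry["fixations"] += 1
                entry["dwell_ms"] += duration_ms
                break
    return summary


# Hypothetical AOIs and fixations for one survey screen.
aois = {"question": (0, 0, 800, 100), "next button": (700, 500, 800, 550)}
fixations = [(100, 50, 200), (300, 60, 250), (750, 520, 180), (400, 300, 90)]
print(aoi_summary(fixations, aois))
```

Fixation count answers "how often," total dwell time answers "how long," and `first_index` gives a rough view of the look path; pairing these with what participants said and did is what keeps eye tracking from being used in isolation.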
73. Include eye tracking?
Consider using eye tracking if you want to:
• Observe what attracts attention
• Discover potential areas of confusion/interest
• Watch as users learn to interact with an interface over time
• Validate/invalidate design changes
Usability Testing
• Can users complete the survey and individual items?
• Direct observation of users’ behaviors
• Analysis: users’ conceptual model vs. survey model
• Evaluates usability
• Supports improved ease of use
Usability Testing with Eye Tracking
• Do users see things that aid/hurt completion?
• Look patterns: locations, duration, path
• Analysis: intended visual hierarchy vs. actual look pattern
• Evaluates user experience
• Supports improved ease of use and increased engagement
75. Eye tracking enables researchers to assess attention to motivational language and brand, which may affect response rates.
Walton, Romano Bergstrom, Hawkins & Pierce, 2014
76. Eye tracking example: People read pages with questions on them differently than other pages.
Jarrett & Romano Bergstrom, 2014
The F-shaped eye-tracking pattern on the block of text at the top of the page is completely different from the eye-tracking pattern on the question and answer spaces at the bottom of the page.
77. Eye tracking example: People don’t read important parts of survey invitation letters.
Olmsted-Hawala, Wang, Willimack, Burke & Lakhe, 2016
84. Plan your measurements
• Examples of performance measures
• Success rate and/or speed for tasks
• Requests for help/assistance
• Number and types of errors that occurred (e.g., incorrect
selections, menu choices)
• Count of features used (e.g., help menu, hover-over
definitions, calculate button)
• Examples of preference measures
• Do you prefer A or B? Why?
• How easy or difficult was it to do …? Very easy, easy, …
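Performance measures like these are straightforward to tabulate once each task attempt is logged. A minimal sketch, assuming a hypothetical `TaskAttempt` record filled in by the note taker; reporting the median time for successful attempts is one common choice for small, skewed samples, not a requirement.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    """Hypothetical log entry for one participant's attempt at one task."""
    participant: str
    task: str
    success: bool
    seconds: float
    errors: int = 0
    help_requests: int = 0


def summarize_task(attempts, task):
    """Success count, median time on task (successful attempts only),
    total errors, and help requests for a single task."""
    rows = [a for a in attempts if a.task == task]
    if not rows:
        raise ValueError(f"no attempts recorded for task {task!r}")
    successes = [a for a in rows if a.success]
    return {
        "attempts": len(rows),
        "successes": len(successes),
        "median_seconds_success": (
            median(a.seconds for a in successes) if successes else None
        ),
        "total_errors": sum(a.errors for a in rows),
        "help_requests": sum(a.help_requests for a in rows),
    }


log = [
    TaskAttempt("P1", "log in", True, 40),
    TaskAttempt("P2", "log in", True, 60, errors=1),
    TaskAttempt("P3", "log in", False, 120, errors=2, help_requests=1),
]
print(summarize_task(log, "log in"))
```

Keeping raw counts (2 successes out of 3 attempts) rather than a success percentage fits the small-sample reporting advice later in the deck.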
85. Organize roles
• Meet and greet
• Observers
• Test facilitator
• Note taker
• Videographer
86. Develop your test materials
• Develop consent forms, screeners
• Instructions/directions for participants
• Prepare written tasks/scenarios (on index cards)
• Pretest/posttest questionnaires
• Observer note sheets
87. Tasks, Scenarios, Probes
• Scenario – a real-life situation that you ask
participants to put themselves in to test the
instrument
• Task – something you want the participant to
accomplish
• Probe – questions asked of the user to elicit additional
information and feedback
88. A scenario brings the data together into a coherent story
• Keep scenarios short and simple
• Scenarios should reflect things participants might
actually do
• Use vignettes to test rare/unusual situations
• Use the participant’s words, not the researchers’
• May need to prepare fake data to answer questions
(e.g., SSN, phone number)
89. For some products, you need a task
• These are things that you want the user to do
• Often as simple as: “Please fill out this survey as you
would at home”
• You may need specific tasks to match your test focus
and concerns
90. Example 1 – Scenario
Romano Bergstrom, Childs, Olmsted-Hawala & Jurgenson, 2013
• Participants imagined they were at the respondent’s door
91. Example 2 – Scenario
• Participants imagined they were at the respondent’s door
Romano Bergstrom, Childs, Olmsted-Hawala & Jurgenson, 2013
92. Example 2 – Scenario
• To assess if the Information Sheet worked well, scripts
were used to ensure interviewers could record
difficult-to-record households.
Romano Bergstrom, Childs, Olmsted-Hawala & Jurgenson, 2013
93. Example 2 - Scenario and Task
Scenario: Your graduate school will include the following PhD
programs this year:
• Biology
• Chemistry
• Marine, Earth, and Atmospheric Sciences
• Physics
• Spanish
Task: Please update the list of departments, programs, and
research units that should be included in the survey for this
year.
95. Don’t confuse participants with too many scenarios
• Whittle lists of tasks/scenarios to manageable number
• Prepare tasks to give to participants (index cards are
useful)
• Tasks should flow in order of the survey
• Okay to change tasks between rounds
98. The day before the test
• Send out reminders
• Phone or email to respondents
• Email to stakeholders
• Equipment/Facility
• Check the computers and software (remote sharing, video
recording), keyboard, mouse
• Make sure the room you’ll use is tidy
• Make sure your meet/greet person
has the final list of participants’ names
• Incentives are available
101. Set-Up for Mobile with Eye Tracking
Fors Marsh Group UX Lab; Facebook UX Lab
102. Moderating Technique: Think Aloud
• Getting respondents to verbalize their thoughts
• Can be concurrent or retrospective
• Implementing
• Explain “thinking aloud” at the start
• Get the participant to try an example
• Remind them periodically (What are you thinking?)
• Snags:
• Thinking aloud is not natural for some people
• Others will start well, then forget
103. Think Aloud: concurrent vs retrospective
Concurrent
• Immediate thoughts
(good recall)
• Procedural comments
• May affect task
performance and
usability metrics.
• Can interfere with eye-
tracking data
• Shorter session length
• Less natural
Retrospective
• Relies on memory (recall
failure)
• Explanatory comments
• No effect on task
performance or usability
metrics
• Accurate eye-tracking data
• Session length increases
• More natural
104. Users may need help with thinking aloud
• Prompt as needed
• “What are you thinking?”
• “Tell me what you’re doing.”
• “Tell me what you’re looking at.”
• “Keep talking.”
• “Tell me more about that.”
• Show you’re listening
• Be Patient
• Give reminders
105. Moderating Technique: Verbal Probing
• Ask targeted questions (probes) about content or
functionality
• Explore content in more depth
• Concurrent vs retrospective
• Scripted vs spontaneous
106. Verbal Probing: Concurrent vs Retrospective
Concurrent
• Immediate thoughts (good
recall) and more detail
• May be biased
• Affects task performance and
usability metrics
• Ideal for exploratory tests and
cognitive/usability combined
tests
• Better for participants with
low cognitive ability
Retrospective
• Relies on memory (recall
failure), less detail
• Less biased
• No effect on task
performance or usability
metrics
• Can be used in any stage
of testing
107. Verbal Probe Examples 1
Immediate thoughts or
reactions
• What are your thoughts
on this [screen]?
• What are you thinking?
• What are you doing?
• What are you looking at?
• What are you trying to
do?
Does functionality match
expectations?
• What do you expect to
happen when you [click
that link/button]?
• How did you expect that
to work?
108. Verbal Probe Examples 2
Understand user
• What do you want to
accomplish?
• Can you describe the steps
you are taking now?
• How did you feel about
that process to [complete
task]?
• What’s going through your
mind right now?
Probing further
• Echoing
• Can you tell me more
about that?
• Can you provide an
example of [X]?
109. Probing Tips
• Avoid yes/no questions; people tend to be acquiescent
• Bad: “Was this task difficult to complete?”
• Good: “How easy or difficult was that task to complete?”
• Ask unbiased questions
• Bad: “Are you looking at the X link?”
• Good: “Can you tell me what you are looking at?”
• Be quiet and wait
• Bad: Impatiently asking “What’s happening?”
• Good: Count to 20 before jumping in. Or to 30.
110. When you hear yourself asking a leading question, balance it
Leading question: “So you think that’s difficult then?”
Balanced question: “...or was it easy?”
111. Choosing a Moderating Technique
• Can the participant work completely alone?
• Will you need time on task and accuracy data?
• Are the tasks multi-layered, and/or do they require concentration?
• Will you be conducting eye tracking?
112. Moderating Techniques: Summary
Concurrent Think Aloud
Advantages:
• Feedback in real time
• Good recall
• Procedural comments
• Shorter session length
• Unbiased feedback
• Easy for moderators to learn
Disadvantages:
• Slight effect on task performance (vs. retrospective think aloud)
• May affect usability metrics
• Some interference with eye-tracking data
• Less natural
• Hard for some participants
Retrospective Think Aloud
Advantages:
• Explanatory comments
• No effect on task performance or usability metrics
• Accurate eye-tracking data
• More natural
• Unbiased feedback
• Easy for moderators to learn
Disadvantages:
• Recall failure
• Longer session length
• Hard for some participants
• Requires heavy cueing
Geisen & Romano Bergstrom, 2017
113. Moderating Techniques: Summary 2
Concurrent Verbal Probing
Advantages:
• Feedback in real time
• Good recall
• Ask targeted questions
• More detailed comments
• Works well for exploratory tests and cognitive/usability combined tests
• Easiest for participants, especially those with low cognitive ability
Disadvantages:
• May introduce bias
• Negative effect on task performance and usability metrics
• Hardest for moderators to learn
• Longest session lengths
Retrospective Verbal Probing
Advantages:
• Less biased
• Ask targeted questions
• No effect on task performance or usability metrics
• Can be used in any stage of testing
• Easier for participants
Disadvantages:
• Recall failure
• Requires some cueing
• Less detailed comments
• Hard for moderators to learn (vs. concurrent and retrospective think aloud)
Geisen & Romano Bergstrom, 2017
114. Moderating Tips
• Maintain objective viewpoint
• Be prepared for surprises
• Report accurately
• Redirect participants to keep them on task
• Avoid coaching
• Don’t help participants
• Don’t ask if they would do “anything else”
• Don’t suggest: “Let’s try this”
115. Moderating Tips (Continued)
• Participant is silent
• Be patient. Then if necessary, ask “What are you
thinking?”
• Participant asks you, “Is this right?”
• “We just want to see how you would do it.”
• Participant asks you for help
• “What would you do if I weren’t here?”
• Participant blames himself/herself
• “A lot of people have had this problem.”
• “Your feedback helps us learn what we need to
improve.”
116. Provide neutral feedback
• Provide praise/feedback for every task, successful or
unsuccessful
• Keep it neutral
• “mm hmm”
• “uh huh”
• “That’s interesting”
• “that’s helpful”
• If you are taking notes, write down everything
• If you write only when something goes wrong, the participant will notice that your notes are biased
118. Analyzing Results
Analyze
• Collect all of your data together
• Summarize/Reduce to meaningful chunks
• Understand what it means
Revise
• What can/should we do about it?
Test again
Barnum, 2011
119. Collect All of Your Data Together
• Self-reported
• Verbalizations
• Satisfaction and difficulty ratings from questionnaires
• Observational
• Usability metrics
• Click patterns
• Behavior and other observations
• Implicit:
• Eye-tracking data
121. Focus on the most serious problems
• Run your usability test
• Have a meeting with the key stakeholders
• Decide on most important problems (and problems
that are easily fixed)
• Go away and fix them
• Ignore everything else
• Test again and repeat
122. Determining the problems to target
• Frequency of the problem (e.g., 5 out of 5 users)
• How likely are others to have this problem?
• What’s the impact on the survey (e.g., causes break-offs, inaccurate data)?
• How much of the survey does it affect (e.g., local vs.
global finding)?
• How easy/difficult is it to fix (low-hanging fruit)?
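One way to make these criteria explicit in a debrief is a simple scoring sheet. The sketch below is a hypothetical scheme, not a standard method: it multiplies observed frequency by assumed impact and scope weights, then breaks ties by effort to fix. The labels and weights are illustrative and would need to be agreed on with stakeholders.

```python
from dataclasses import dataclass

# Hypothetical ordinal weights; adjust to the study's priorities.
IMPACT = {"cosmetic": 1, "slows respondent": 2, "inaccurate data or break-off": 3}
SCOPE = {"local": 1, "global": 2}
EFFORT = {"easy": 1, "moderate": 2, "hard": 3}


@dataclass
class Finding:
    description: str
    users_affected: int
    users_tested: int
    impact: str  # key in IMPACT
    scope: str   # key in SCOPE
    effort: str  # key in EFFORT

    def severity(self) -> float:
        """Observed frequency weighted by impact on data and scope."""
        frequency = self.users_affected / self.users_tested
        return frequency * IMPACT[self.impact] * SCOPE[self.scope]


def prioritize(findings):
    """Most severe first; ties go to the easier fix (low-hanging fruit)."""
    return sorted(findings, key=lambda f: (-f.severity(), EFFORT[f.effort]))


findings = [
    Finding("Typo in question 3", 1, 5, "cosmetic", "local", "easy"),
    Finding("Definitions not found", 3, 5, "inaccurate data or break-off", "local", "moderate"),
    Finding("Next/Previous order confusing", 4, 5, "slows respondent", "global", "easy"),
]
for f in prioritize(findings):
    print(f"{f.severity():.1f}  {f.description}")
```

A single number will not settle every debate, but writing the weights down makes disagreements about priority visible and quick to resolve.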
123. Focus on findings that improve quality
• When usability testing a survey, focus on
• Improving data quality
• Reducing respondent burden
124. Determining what and how to fix problems is harder.
• Group debrief after the test to discuss
• Set priorities
• Most serious problems (ignore “nice to haves”)
• Problems that are easy to fix (e.g., typos, wording)
• How long will it take to fix?
• Will fixing it cause other potential problems?
• Recommendations should be specific/doable within
timeframe and budget
• Not everything will get fixed
125. Weigh effect on data vs Effort to fix
Geisen & Romano Bergstrom, 2017
126. With few users, each one really matters
• Are there outliers in small studies?
• How representative is each user?
• Are others likely to have this problem?
• Caveats for reporting data with few users
• Report numbers (4 out of 5) rather than percentages
• Report with numbers rather than words (most, usually,
almost all)
• With a small number of users, you will find your
biggest problems
• Iterate and/or conduct remote unmoderated testing, if
possible
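A small helper can keep reports consistent with these caveats by always phrasing findings as explicit counts. A minimal sketch; the function name and the wording template are hypothetical.

```python
def report_count(affected: int, tested: int, finding: str) -> str:
    """Phrase a small-sample finding as an explicit count, e.g.
    "4 out of 5 participants ...", instead of a percentage (80%)
    or a vague word ("most")."""
    if tested < 1 or not 0 <= affected <= tested:
        raise ValueError("expected tested >= 1 and 0 <= affected <= tested")
    noun = "participant" if tested == 1 else "participants"
    return f"{affected} out of {tested} {noun} {finding}"


print(report_count(4, 5, "did not notice the hover-over definitions"))
```

The output reads "4 out of 5 participants did not notice the hover-over definitions", which tells readers both the frequency and how small the sample was.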