UXPA 2013 Annual Conference Wednesday July 10, 2013 4:30pm - 5:30pm ET by Paul Doncaster
Online five-second testing tools promise valuable response data that can inform UX designs, but that value is compromised when tests are designed without regard for the restrictions of the method.
An analysis of more than 300 "crowdsourced" five-second tests showed that most tests are designed to encourage responses like "I don't recall" or "I cannot answer this".
From this analysis, rules and guidelines were developed and tested to increase the likelihood of obtaining useful data.
2. Agenda
The Method
The Analysis
The Rules
Special Topics
◦ Emotional Response
◦ Trust/Credibility
◦ Outside-the-Box uses
3. The Five-Second Test
Brief history
Perfetti, C. (2005). “5-Second Tests: Measuring Your Site's Content Pages.”
http://www.uie.com/articles/five_second_test/
1. Present the focused task
2. Present the method instructions
3. Show the entire page
4. Participant recollection
5. Success verification
4. Now . . . a discount/rapid addition to
the UX toolkit
8. We would like you to rate
Quality, Professionalism and Credibility of
page 1 lowest 10 highest.
9.
10. We would like you to rate
Quality, Professionalism and Credibility of
page 1 lowest 10 highest.
Do you feel we are a credible training
organisation?
Do you want "further" information i.e. would you
Opt In?
Please tell us one Like and one Dislike of the page
11. Imagine you are engaged and browsing
websites about weddings.
12.
13. Imagine you are engaged and browsing
websites about weddings.
What in particular is this site about?
What is your strongest memory of the site?
Is the site trustworthy?
What is the free offer for?
Can you name the brand?
16. Imagine that you are looking for promotional
products.
What does the company do?
What do you think the purpose of this website is?
Does this website grab your attention?
How would you improve this site?
Does this design compel you to call [the Company
Name]?
19. You are evaluating companies to hire for
professional services.
What does this company do?
What word would you use to describe the design?
Would you hire this company?
20. N = 319 (Complete sample)
[Chart: share of tests by violation level]
OK: 24%
Level I Offender: 28%
Level II Offender: 30%
Level III Offender: 18%
* 25% were clearly using the wrong method
21. N = 239 (Modified sample)
[Chart: share of tests by violation type (axis 0% to 80%): Poor Instructions, Image Size, Question Order, Unfocused, Elab. Answers]
[Chart: share of tests by violation level]
OK: 30%
Level I Offender: 30%
Level II Offender: 27%
Level III Offender: 13%
25. Five-Second Rule #2
Don’t use a five-second test when
another type of test is more
appropriate
◦ Reading the content
◦ Context is required
◦ Predict future behavior
◦ For Homepages, limit to Emotional Response
questions
◦ For Comparisons, limit to simple, singular
elements (e.g., logos, buttons, etc.)
26. Five-Second Rule #3
Best results come from knowing what
design aspect you want to test, and
focusing the test on that single aspect
◦ Is the page’s purpose evident?
◦ Are target(s) easily discerned/memorable?
◦ Does the design elicit the desired emotional
response?
◦ Does the design convey trust/credibility?
◦ Split elements into separate tests if needed
27. The Five-Second Rules
“Where are you going to click?”
“Imagine you're researching software vendors for your bank.”
“You have 5 seconds to view the image. Afterwards, you'll be asked a few short questions.”
“You are going to view a website.”
“Imagine you found the site using a search engine.”
“Imagine you are standing on the street and a bus drives past. You see the advertisement on the back.”
“Imagine that you are looking at the following page…”
“What does this page do?”
28. Five-Second Rule #4
Give proper instructions
◦ Adequate table-setting
◦ Tailor instructions to the questions
◦ When instructions are general, don’t
expect retention of specifics
Use the online defaults with caution
34. The Five-Second Rules
[Charts: response type for each of five recalled items (Item1 to Item5), as a share of responses (0% to 100%), for two groups: New Orleans (n=21) and Chicago (n=21). Response types: No answer given; "Nothing" or "I don't know"; General site attribute; Specific or implied target]
35. Five-Second Rule #6
Limit questions of specific recall to 2-3
◦ Expect -20% memory impact for each
question asked
◦ With 3 target questions, ~75% likelihood
of a “full” response
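The drop-off in this rule can be sketched as a simple linear model. This is an illustration only: the function name, the 95% baseline, and the reading of "-20%" as 20 percentage points lost for each question after the second are assumptions, chosen because they roughly reproduce the figures quoted in the talk. They are not taken from the study data.

```python
def full_response_likelihood(num_questions, baseline=0.95, drop_per_question=0.20):
    """Rough likelihood that a tester gives a usable answer to every question.

    Assumptions (not from the study data): the first two questions are
    answered at `baseline`, and each further question costs
    `drop_per_question` (in percentage points), floored at zero.
    """
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")
    extra_questions = max(0, num_questions - 2)
    return max(0.0, baseline - drop_per_question * extra_questions)


# Print the estimate for tests with one to five questions.
for n in range(1, 6):
    print(f"{n} question(s): ~{full_response_likelihood(n):.0%}")
```

With these assumed defaults the sketch gives roughly 75% for three questions and 35% for five, which is in line with the figures in the slides and speaker notes. Changing the baseline shifts the whole curve, so treat the output as a planning aid rather than a prediction.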
38. Five-Second Rule #7
Order the questions with care
◦ Most important if the test focuses on or contains
targets
Critical target(s) early, or focus the test accordingly
If you must do a mixed test, target identification
question(s) come first
39. The Five-Second Rules
“What articles do you think would be valuable additions to this knowledge base. Tell me anything you can think of.”
“Would you find this product useful if it successfully aggregated top news and content? (and obviously was much better designed)”
“Is the product/service offering clear?”
“Describe the website in a couple of words.”
“Does the site look professional?”
Wording
40. Five-Second Rule #8
Craft the questions with care
◦ Remember, the recall clock is ticking . . .
◦ Specific is not always better than abstract
◦ Use devices (scales, etc.) judiciously to
get more robust data
42. Five-Second Rule #9
Consider very carefully the
“Most Prominent Element” question
◦ If you use it, put it toward the end
43. The Five-Second Rules
“If you could change one thing about the page, what would it be?”
“What changes would you make to the layout or design of this page?”
“What would you change to improve the site?”
“What would you change about the design (if anything)?”
“Any suggestions?”
44. Five-Second Rule #10
Don’t ask “What would you change?”
◦ Improper question for the method
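Rules #6, #7, #9 and #10 lend themselves to a quick pre-flight check of a planned question list. What follows is a minimal sketch, assuming each question has been hand-labelled with a hypothetical `kind` (`target`, `attribute`, `emotional`, `prominent` or `change`). The labels, the `check_questions` function and the reading of "toward the end" as "one of the last two questions" are illustrative assumptions, not part of any testing tool.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    kind: str  # hypothetical labels: target, attribute, emotional, prominent, change


def check_questions(questions, max_recall=3):
    """Return a list of warnings for a planned five-second test."""
    warnings = []
    kinds = [q.kind for q in questions]

    # Rule #6: limit questions of specific recall to 2-3.
    if kinds.count("target") > max_recall:
        warnings.append("Rule #6: too many specific-recall questions")

    # Rule #7: target identification questions come first.
    non_target_seen = False
    for kind in kinds:
        if kind != "target":
            non_target_seen = True
        elif non_target_seen:
            warnings.append("Rule #7: target question asked after a non-target question")
            break

    # Rule #9: if used, "most prominent element" goes toward the end.
    # Assumption: "toward the end" means one of the last two questions.
    for position, kind in enumerate(kinds):
        if kind == "prominent" and position < len(kinds) - 2:
            warnings.append("Rule #9: 'most prominent element' asked too early")
            break

    # Rule #10: don't ask "What would you change?"
    if "change" in kinds:
        warnings.append("Rule #10: 'What would you change?' does not fit the method")

    return warnings


plan = [
    Question("What is most prominent on the page?", "prominent"),
    Question("Can you name the brand?", "target"),
    Question("How would you improve this site?", "change"),
]
for warning in check_questions(plan):
    print(warning)
```

The example plan trips Rules #7, #9 and #10. A check like this only catches structural problems; wording and instructions still need a human read.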
55. Trust / Credibility
Fogg, B.J. et al. (2003). How do people evaluate the
credibility of Web sites? Proceedings of
DUX2003, Designing for User Experience Conference.
http://www.consumerwebwatch.org/pdfs/stanfordPTL.pdf
Participants were given as much time as they wanted on sites to
gauge credibility
In the end, 46.1% noted "Design Look" in their comments
“Our result is consonant with findings of other research (Cockburn and
McKenzie, 2001) that describes typical Web-navigation behavior as “rapidly
interactive,” meaning that Web users typically spend small amounts of time at any
given page, moving from page to page quickly. If such rapid navigation is indeed the
norm for most types of Web use, then it makes sense that Web users have developed
efficient strategies, such as focusing on the design look, for evaluating whether a Web
site is worthwhile . . .
In other words, the visual design may be the first test of a site’s credibility. If it
fails on this criterion, web users are likely to abandon the site and seek other sources
of information and services.”
56. Trust / Credibility
Standardized Universal Percentile Rank
Questionnaire
◦ “I feel comfortable purchasing from this website.”
◦ “This website keeps the promises it makes to me.”
◦ “I can count on the information I get on this website.”
◦ “I feel confident conducting business with this website.”
◦ “The information on this website is valuable.”
http://www.suprq.com/
57. Trust / Credibility
The design reflects a company that is
dependable and responsive.
The design reflects a company that is honest
and ethical.
The design reflects a company that connects
with the wants and needs of its customers.
How does the look and feel of this page
design represent the company?
58. Trust / Credibility
[Charts (two): Agree / No Opinion / Disagree shares (0% to 100%) for three statements: Dependable/Responsive, Honest/Ethical, Wants/Needs of Customers]
60. Emotional Response:
Triading
Hawley, M. (2007). The Repertory Grid: Eliciting User Experience Comparisons in the Customer’s Voice. Available online at
http://www.uxmatters.com/mt/archives/2007/12/the-repertory-grid-eliciting-user-experience-comparisons-in-the-customers-voice.php
Hawley, M. (2010). Rapid Desirability Testing: A Case Study. Available online at
http://www.uxmatters.com/mt/archives/2010/02/rapid-desirability-testing-a-case-study.php
62. Triading
[Chart: share of responses (0% to 100%) for each question that were Consistent with triading, Not consistent, or Pass / Don't know (10% for each question)]
Q1: Particularly Good
• “The gradient used in C makes it look modern.”
• “A is easily recognizable (a scorpion -- rather than a wave).”
• “Logo A felt very slick, and embodied the values visually . . .”
Q2: Particularly Bad
• “A was not good - looked outdated.”
• “B - the red is over-used.”
• “Logo B was a little less stylized, and the very literal scorpion imagery didn't support the brand as well.”
63. You're wandering on the floor of a trade show
and happen to notice this booth/display.
64.
65. You're wandering on the floor of a trade show
and happen to notice this booth/display.
What type of product is being sold?
What type of person (or group of
people) is this product targeted to?
Given your responses to #1 and #2,
is the display's design appropriate for
the intended customer?
Given your responses to #1 and #2,
is the display's design visually
appealing?
What is the brand name of the
67. Thanks for staying awake.
Questions?
Blog: “UX Five Second Rules” Coming Soon
Website: Coming Soon!
Email: pwdoncaster@hotmail.com
Twitter: @pwdoncaster
Editor's notes
How did this come about?
Quick need for feedback on strictly visual elements (icon metaphors) from outside of the building
Explain karma points on Usabilityhub.com for my own Click or Nav tests
In order to get the points, I contributed to a lot of others’ tests, mostly 5-second tests
SOOO many times I found myself hitting PASS on the question, or answering “impossible for me to answer that based on 5 seconds' worth of exposure to an image”
That got me very curious about the method and the impact that these types of online tools and technologies are having on it
The result is what we’re going to talk about now . . .
The method itself is about as simple as UX testing can possibly get . . . courtesy of Christine Perfetti:
Partnership with Fidelity (around 2000): Jared Spool, Tom Tullis, Harry Hirsch
To solve a specific problem: measuring whether a content page is working or not (as opposed to a Home Page)
Answering the core question: “Is the primary purpose of the page evident?”
1. Present the focused task: Set the table with appropriate context.
2. Present the method instructions: Tell them we'll only display it for 5 seconds, and ask them to try to remember everything they see in this short period.
3. Show the entire page for 5 seconds, then remove it.
4. Participant recollection: Participant writes down everything they remember about the page.
5. Success verification: Ask two useful questions to assess whether users accomplished the task.
Never a stand-alone decision-maker type of method: always used it as part of a larger study, would throw in a few of these
Always used it true to the original intent, never broadened the use into other areas
ALWAYS moderated (unmoderated, using the free tools, you don’t know what you’re getting)
It’s up to us as professionals to know what we’re doing
BUT over 210,000 tests were completed in 2012 -- approximately half of all tests are Five Second Tests
The cat’s out: they’re here, they’re free, get used to it
AND they’re positioning themselves beyond content pages (look at the description on fivesecondtest)
AND I’m not at all convinced that expanding the method with these tools in unmoderated circumstances is without value
SO . . . my intent is to focus on the method within this context: how to maximize the value of the method using these unmoderated tools
Taking 5-second tests for Karma points:
Copy/paste the instructions
Use SnagIt to grab the image at 1024x768 (no scrolling) and save
RULE: I did not scroll the image; if it was too big, I answered based on what was visible to me
After saving the image, view it in the SnagIt capture window for 5 seconds (to give some value back to the designer)
At each question, copy/paste into a spreadsheet before answering
Once I had a sizable sample, review and analyze the instructions, wording, order, intent
Obviously, with all of this activity, my short-term memory (STM) was impacted such that my answers were probably lesser than they would have been otherwise. So to anyone in this room who may have contributed tests between May and August of 2012, my apologies for giving you “tainted” feedback.
UPFRONT:
-- For purposes of illustration and (let’s face it) entertainment, I focus on bad ones
-- All are Homepages (the opposite of the original intent); I think that can be OK if you (a) understand the limits of the method, and (b) are asking the right questions
-- I did not scroll the image
!! No alignment with instructions !!
- q1 has cruddy wording
- q2 is context specific
- q3: too much info requested by a single question, plus recall of specifics is diminished and scrolling has an impact
Q1 is answered by the instructions: you’ve just wasted your most critical question position
Q2
Q3 is trustworthiness
Q4: can’t see it without scrolling -- even if we were able to see it, we’ve likely forgotten by now
Q5 is a target ID question -- combine question order and design of site, and it’s almost guaranteed to be “I don't know” (IDK)
Image requires mega scrolling
Q1 & Q2: answer is provided in the question
Q3 is weakly worded
Q4 can’t be reliably answered with a big image and 5 secs
Q5: huh?
Yes or no -- Does anybody recognize it? (CONSIDER -- how would your response be affected if you did recognize it?)
- Q1 not worded well: how many are inclined to say “provides professional services”?
- Q3 a fair question? Unmoderated test, AND after only 5 secs?
Level III = 4+ violations, OK = 1 or fewer violations
6% (18) had “no violations”, but these were typically extremely simple, highly focused, 1 question
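The violation levels can be expressed as a simple lookup. A minimal sketch: only the "OK" (1 or fewer) and "Level III" (4 or more) thresholds are stated in these notes. Mapping 2 violations to Level I and 3 to Level II is an assumption made here to fill the gap, and `offender_level` is a hypothetical name.

```python
def offender_level(violations):
    """Map a test's violation count to an offender level.

    Stated in the notes: OK = 1 or fewer, Level III = 4 or more.
    Assumed for illustration: Level I = 2, Level II = 3.
    """
    if violations < 0:
        raise ValueError("violations cannot be negative")
    if violations <= 1:
        return "OK"
    if violations >= 4:
        return "Level III Offender"
    return "Level I Offender" if violations == 2 else "Level II Offender"


# Show the level assigned to each violation count from 0 to 5.
for count in range(6):
    print(count, offender_level(count))
```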
Note the nature of the offences
Half are violating the original intent of the method by focusing on a Home/Welcome page
Almost 40% are using 5 questions. Nothing wrong with that, but remember the reverse polaroid; you’ve got to be aware of what you should and shouldn’t be asking
With that, let’s move on to the rules . . .
Not talking about including the ridiculous or inappropriate question (Quicentera site: “What is the quality of the dresses sold here?”), although that’s definitely wrong
I’m talking about respecting the method, taking the time to care for the structure of the test and its components, which is controllable and can not only help/hurt your test, but can also stretch the limits of what you’re able to do with the method online
Reading the content = how much can you reasonably read in 5 secs?
Appropriateness = any question necessitating knowledge of context cannot be considered
Predict future behavior = NO test is good for this; PLUS it “reeks” of an attempt to brown-nose the stakeholders
Homepages = cannot ask “what is the purpose of the page”
Nuns taught me that the first 3 of the Ten Commandments are paramount; all the others support them. The same applies here.
While Sister Rodriguez might be displeased at my agnosticism (no, she’d be OK with it; Sister Maralia would beat my ass), I hope they take some small comfort in knowing that that little lesson took hold.
TEST INSTRUCTIONS
Compare Imagine “researching software” v. “standing on a street”
Point to Favorite: “Well, my wish is about to come true . . .”
-- Write the questions first
-- Lennon clause: I won’t say “DON’T imagine there’s no heaven”, but when going broad, don’t force the user to “imagine” too narrowly
IMAGE SIZE
Cog tool: how much of the 5 secs is taken up scrolling
It doesn’t take a genius to understand that by the time you home in on the scrolling mechanism, move the mouse, click and hold, actually scroll, etc., a good amount of cognitive processing has been spent without ever having retained anything about the image. Even abstract questions become that much more difficult to answer.
Experiment: does scrolling influence ability to recall a target?
Large Target, Gomu Big Scroll v Beados No Scroll
[Default instructions?]
Question = “What is the name of the brand featured on the page?”
Prod brand and Moose were acceptable – Focus =
Large image increases the likelihood of breaking the golden rule
I would theorize (but cannot prove yet) that even the indication that scrolling is required is a distractor and will impact the responses (Cog tool to show impact?)
HOW MANY QUESTIONS?
Recall Perfetti’s two phases: brain dump, then task success questions
EXPERIMENT: Is there any predictability to what the answers will be? Specifically, is there a drop-off in precision?
5 things remembered
Instructions = “You will be asked to name 5 things that you remember about the page . . .”
Questions = What is the first thing you remember, what is the second thing you remember, etc.
Plan on no more than 35% of testers giving you 5 decent answers
After the 2nd question, expect a decrease of about 20% for each question asked
Would be interested in seeing if this aligns with Christine’s experience of in-person testing
ORDER OF QUESTIONS
Experiment: Does order impact ability to recall?
Main Target = Name of the site
SecTarget1 = Specific page (in navigation)
SecTarget2 = Item for Sale
SecTarget3 = Call to Action
5th item was an opinion question: in the first 2 tests it was kept at the end, but in the final test I threw it into the middle to see if it acted as a distractor
Instructions: “After viewing the webpage, you will be asked to recall 5 visual targets.”
Ability to correctly identify the site, order mattered . . .
Consider the precision of your intended responses
For target recall, order is extremely important
For attribute recall, it’s less important
For Emotional Response, it’s less important still
WORDING OF QUESTIONS
“Tell me everything” works when you’re being faithful to the original method, but not within these tools. It will almost guarantee “no response” on any subsequent questions. Rewrite: “Name one type of article…”
“Is product clear?”: Yes/No is good when the question is specific, like a target, but in this instance it’s vague. Rewrite: “What is the product or service offered?”
“Does the site look professional”: Yes/No when dealing with opinions is OK but, again, vague; it would benefit from a scale. Rewrite: “Rate the site’s look (1 = not professional, 10 = extremely professional)”
“Describe in a couple of words” could elicit “It stinks” (2 words). Rewrite: “Which two adjectives best describe the website?”
ELABORATE THIS POINT: conflicts with original intent, but the online/unmoderated setting seems to warrant different rules
Uhub default questions: specific is better than general, “Did you see the free shipping offer?”
A better alternative, abstract: “What do you remember about the page specific to shipping?” (answers will tell you whether “free” is remembered)
A better alternative, specific: “
MOST PROMINENT ELEMENT
Only 7% of the sample used this question, but it’s provided as a default question a lot
81% of instances in the sample put this as the #1 or #2 question, which is fine if the test has 1-2 questions, but most often that was not the case
One test I looked at illustrated the danger well. It had only 2 questions: What’s most prominent? What’s most distracting?
The vast majority of the time, "prominent element" is a throwaway question; unless the design uses a pattern containing equalized visual presentation, the answer will most likely be dictated by whatever wins the Gestalt sweepstakes (color, size, etc.), especially if photos are used, like here . . .
If you feel like you have to include it, put it toward the end; it will force the user to use any available STM, and the noted element will truly be prominent
- “What did you notice first?” might be better wording
Carlin: “10 is a psychologically satisfying number”, so . . .
Change Recommendations
Only 6% of the sample contained this type of inquiry, but it’s a natural sort of “wrap-up” question: “I’ll assume you’ve noticed something you don’t like, so help me improve it”
In all of these, you’re asking for something beyond the scope of what can be reasonably expected, given 5 seconds of exposure to an image
A question just begging for “I don’t know” or “No idea” or “Not enough time” answer
The title of a frequently cited Canadian study from 2006 says it all: “You have 50 milliseconds to make a first impression.” A 2010 eye-tracking study suggests a slightly more generous -- but no less onerous -- 180 milliseconds.
With its heavy reliance on sensory input, the method seems especially suited to measuring reactions to designs.
Most of the sample had some element of emotional response inquiry, but in highly unfocused and ultimately ineffective ways:
- Like/dislike
- How does the design feel? (open ended or scale)
If you’ve looked at the UXPA conference site, you’ll see Mike pointing at a word cloud
He describes a case study where the ER (emotional response) measurements involved a combination of methods (IBM cards, then rating scales) to confirm that the design conveys the corporate values
Pair of experiments that employ this approach within the 3-second context
Test 1 = Financial
Assumed values: Professional, Clear, Stimulating, Reassuring
Data1: Descriptors, “What two words would you use to describe the appearance of this site?” (note the wording)
Data2: Targeted attributes = Professional/Amateurish, Clear/Confusing, Stimulating/Dull, Reassuring/Intimidating
Data1 = Word Cloud 2 descriptors = 43%, 2-word phrase (stock Photos, Old school, looks OK) = 33%, single descriptor = 24%
Data2 = Scales for targeted attributes
Consider that it's a financial firm; dull might be a good thing, as long as it's reassuring
You wind up with (1) confirmation that you’re hitting at least one of your target attributes (professional), and (2) some detailed guidance on what aspects of emotional response need tweaking
- Others might be “stable” v. “volatile”, “conservative” v. “high-risk”, etc.
Just as effective might be focusing on what you do NOT want to be
Test 2 = Health Club
Improvement goal = cut down on 2-word descriptor phrases, get more instances of 2 descriptors
Data1: What two adjectives would you use to describe the general appearance of this site?
Data2: Targeted attributes = Professional, Easy-Going, Modest, Reassuring
Data1 = Word cloud
- Change in wording = instances of 2 descriptors jumped to 95% (from ~45%)
Data2 = Scales for targeted attributes
We clearly are concerned that our designs project trustworthiness and credibility, and that concern is borne out by industry experts. So it should come as no surprise that 26% of the 5-second tests I analyzed included at least one question relating to trust, either explicitly . . .
“Do you trust this website?”
“Is this website trustworthy?”
“Would you trust this website with your email address/credit card information?”
or by inference:
“Would you recognize this site as an authority?”
“Based on first impressions, would you feel confident buying products from this company?”
Even with all other factors outside of our control:
- Perceived Motive
- Name Recognition and Reputation
- Advertising
- Past Experience with Site
- Affiliations
46% said that design was a huge factor
BUT that doesn’t mean it can’t be a piece of the puzzle
4 experiments: 2 online shopping, 2 antiques/memorabilia
5 secs wasn’t necessarily an impediment to measuring trust in some way (only 13% said that 5 secs is too short)
Uncontrollable variables (reputation or lack of it, no previous experience with the site, perceived motive) were largely in play: 35%
- Comfort does not imply/necessitate deep knowledge or interaction with the company; 5 seconds could be long enough to get a "gut feeling"
- Keeping promises requires an interaction/transaction; only those with prior experience can answer this question
- Counting on information requires some familiarity with the material provided, which cannot be determined in 5 seconds without prior experience
- Confidence could imply/necessitate deep knowledge or interaction with the company; 5 seconds is likely long enough to get a "gut feeling"
- Information being valuable necessitates reading, which can't happen in 5 secs
2 more tests: 1 financial services, 1 moving/relocation company
Agree/disagree/no opinion responses
Note the wording: not “The company is . . .”; that wording puts the focus on the company and increases the uncontrollable variables
On the 2nd test, I added “Based on the design, I would consider contacting this company to get a quote.”
Goals:
- Measurable numbers based on comfort and confidence
- Low PASS/IDK responses
2nd test “get quote”: 67% YES, 29% NO, only 1 pass
Can be more fully elaborated … perhaps a 1-10 scale would be better than Agree/Disagree
Beats the hell out of “Is this company trustworthy?”
Triading
Compare 3 items, talk about how one is different from the others
Violates the principle of “do not compare”, but I was curious as to whether that is nullified by presenting simple/isolated design elements
Goal = see if I can get responses that satisfy the core goals of triading:
- Does the response clearly specify a single logo option?
- Does the response provide an attribute?
BOTH criteria are required
Two test instances (similar iteration, fine-tuning of the questions, instructions, etc.)
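The two coding criteria in these notes can be applied mechanically once each response has been hand-coded. A minimal sketch, assuming a hypothetical `CodedResponse` record filled in by a human reviewer; the field names, the `classify` and `tally` helpers, and the sample records are invented for illustration and are not study data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodedResponse:
    passed: bool             # tester hit Pass or answered "don't know"
    names_single_logo: bool  # response clearly specifies one logo option
    gives_attribute: bool    # response provides an attribute


def classify(response):
    """Both criteria are required for a response to count as consistent."""
    if response.passed:
        return "pass"
    if response.names_single_logo and response.gives_attribute:
        return "consistent"
    return "not consistent"


def tally(responses):
    """Return the share of responses in each category."""
    if not responses:
        raise ValueError("no responses to tally")
    counts = Counter(classify(r) for r in responses)
    total = len(responses)
    return {label: counts[label] / total
            for label in ("consistent", "not consistent", "pass")}


# Invented example records, not study data.
sample = [
    CodedResponse(False, True, True),
    CodedResponse(False, True, False),
    CodedResponse(False, False, True),
    CodedResponse(True, False, False),
]
print(tally(sample))
```

Keeping the human judgment in the coded flags and only automating the tally avoids pretending that free-text answers can be scored reliably by string matching.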
INSTRUCTIONS
You will be asked questions about the ways three different logo designs compare to each other. YOU WILL NOT BE ASKED TO PICK A WINNER. NOTE: The company's values include "passion," "performance" and the "heat of competition."
QUESTIONS
Briefly state how one of the logos is particularly good or pleasing in a way that the other two are not.
Briefly state how one of the logos is particularly bad in a way that the other two are not.
(response from Hawley): seems to be some promise here as an abbreviated use of the method
1. Would try to see whether "different" works instead of good or bad (less emphasis on "picking a winner", especially important in 5 secs, unmoderated)
2. Only works with 3
3. Appropriate for logos and small images or art work, but they will question it for larger sites or brands/experiences
You're wandering on the floor of a trade show and happen to notice this booth/display.
Ideally you’d do separate tests: the first 2 call for specific recall of what was seen, the next 2 are emotional response questions, plus a target recall question at the end to see if the name is memorable