An invited seminar on using surveys to understand security and privacy behavior. The talk covers best practices for writing survey questions, analysis approaches, and a little bit of behavioral economics experimental design.
Given at the Ruhr University SecHuman Summer School by Elissa Redmiles.
This talk is accompanied by a companion handbook: https://drum.lib.umd.edu/bitstream/handle/1903/19227/CS-TR-5055.pdf
Contact eredmiles@cs.umd.edu
3. What is a Survey?
A set of questions assessing different constructs
Questions can be open-ended: the respondent inputs any answer
Or closed-ended: the question offers answer choices and the respondent selects one or more of them
“identifying a specific group or category of people and collecting
information from some of them in order to gain insight into what the
entire group does or thinks”
Handbook of International Survey Methodology
5. Who Should Answer?
You rarely survey the whole population, so you select a sample
The sample should be representative of your population, so that your results are generalizable (with some limitations)
6. Ingredients of a Survey
Research Questions (What Do I Want to Know?)
Constructs (What Do I Need to Measure to Answer RQs?)
Questions (How Can I Design Valid Questions to Measure My Constructs?)
Sample (Who Can I Get To Take My Survey Given My Resources?)
Analysis (How Can I Answer My Research Question?)
8. Word Choice Matters
The word ‘usually’ was interpreted in 24 different ways
by one study’s participants
Variation creates data that’s hard to compare
Even worse for security and privacy surveys because
these often involve domain-specific / technical language
Research indicates that respondents often ignore
written definitions provided with questions
9. Likert Scales
Used to assess nuanced feelings e.g., agreement
Good scales are between 4 and 10 points.
Even scales elicit stronger responses (no neutral option)
Scales should always be balanced
10. Double-Barreled Questions
“Do you believe that your employer should require you to update
your computer and change your password every six months?”
Requires respondents to provide a single answer about both:
a requirement to update their computer
a requirement to change their passwords
11. Social Desirability Bias
Pressure to Say the "Right"
Answer
• If you’re asking questions about a tool
you built, they may feel pressure to be
positive
• If you’re asking them questions that they
think have a correct answer or a
societally correct answer, they may feel
pressure to respond in a certain way
Mitigation: Softening Wording
• People have many different practices
when it comes to updating their
computers. Which of the following most
closely matches what you do …
12. Order Bias
Ordering of questions or
answer choices changes
responses
• Online, people pick the top answer choice most often
• On the phone, they pick the last choice
So, randomize the order of answer choices to mitigate this bias!
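A minimal sketch of per-respondent randomization for a homegrown survey tool (the question choices below are invented for illustration; established platforms such as Qualtrics offer this as a built-in option). Catch-all options like “Other” are usually kept anchored at the end:

```python
import random

def randomized_choices(choices, keep_last=("Other", "I don't know")):
    """Shuffle answer choices for each respondent, keeping catch-all
    options (e.g., "Other") anchored at the end of the list."""
    anchored = [c for c in choices if c in keep_last]
    shuffled = [c for c in choices if c not in keep_last]
    random.shuffle(shuffled)
    return shuffled + anchored

# Hypothetical answer choices for a security-practices question:
choices = ["Antivirus", "Password manager", "2FA", "Other"]
print(randomized_choices(choices))  # e.g., ['2FA', 'Antivirus', 'Password manager', 'Other']
```

Generating the order server-side per respondent (rather than once per survey) is what actually averages the order bias out across the sample.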
13. Demographic Questions &
Stereotype Threat
But don’t randomize the order in which demographic questions appear
Questions affect the answers to other questions
Research shows that asking women and minorities about their demographics
before asking them math questions makes them perform worse
Security and privacy questions are likely to have similar stereotype threat
Rule of thumb: ask about demographics at the very end
14. Length & Cognitive Load
The longer the survey the worse the quality of the
answers & the lower the response rates
20 minutes maximum is a good rule of thumb
Also think about how hard the questions are
22. Mitigation: Pretesting
Automated Tools
• QUAID: http://quaid.cohmetrix.com/
Cognitive Interviews
• Have respondents think aloud as they answer questions
• Prompt them on terms that they may interpret differently / struggle with
Expert Reviews
• Recruit ~3 survey methodology experts to review the survey
• Your library or statistics department may offer these services
Piloting (BEWARE)
• Run a small sample of the survey
• This provides the least information: just because you get answers you like
doesn’t mean they are accurate!
24. Ethics!
Don't Know
• You should offer respondents the option to say “prefer
not to answer” or “don’t know” for required questions
• Otherwise, don’t require the question!
Code of
Conduct
• Requiring an answer without this option is a violation of
the American Association for Public Opinion Research
code of ethics
• AND explicitly disallowed by many Institutional
Review Boards
31. Instructions
Form groups
Select a research question:
Do people in intimate relationships (e.g., married couples)
share accounts? What do they share and why?
Do people with different educational backgrounds do
different security behaviors?
Do people understand what end-to-end encryption means?
Design a 5 question survey + demographic questions
Cognitive interview 3 people in the room and revise your
questions
33. Paper
• Good for low-tech populations, rarely used
Phone
• Good for low-tech populations
• Allows for a “probabilistic” sample
• CATI: computer-assisted telephone interviewing
• …many more variations
Web
• Often used in security, privacy, and HCI studies
• Highest non-response rate of any mode
• Cheapest
34. Survey Sampling
You rarely survey the whole population, so you select a sample
The sample should be representative of your population, so that your results are generalizable (with some limitations)
35. Types of Samples
Probabilistic
• (only possible with phone or paper)
Nearly probabilistic
• GFK Knowledge Panel
Census representative, non-probability
• SSI, Qualtrics
• Google Consumer Surveys
Crowdsourced samples
• Prolific, Amazon Mechanical Turk, Crowdflower
Convenience or Snowball Samples
• Posting on social media, asking friends to take your survey
(listed roughly in order of decreasing cost)
36. How Well Do My Results Generalize?
Comparing Security and Privacy Survey Results
from MTurk and Web Panels to the U.S.
Elissa M. Redmiles, Sean Kross, and Michelle L. Mazurek
go.umd.edu/sampleComparison
38. Four Sets of Questions
Internet Behavior
Information Sources: Online Protection
Knowledge: Protective Behaviors
Negative Experiences
39. • Do you ever use the internet to...?
• Use social media such as Facebook, Twitter, or Instagram
• Apply for a job
• Apply for government benefits or assistance
• Apply for a loan or cash advance
• Search for sensitive health information
• Buy a product, such as books, toys, music, or clothing
Internet Behavior
40. Internet Behavior
• To which of the following have you turned to for advice about how
to protect your personal information online?
• Friend or Peer
• Family Member
• Co-worker
• Librarian or resource at library
• Government website
• Website run by a private organization
• Teacher
Information Sources: Online Protection
41. Internet Behavior
Information Sources: Online Protection
• Do you feel as though you already know enough about...?
• Choosing strong passwords to protect your online accounts
• Managing privacy settings for the information you share online
• Understanding the privacy policies of the websites and
applications you use
• Protecting the security of your devices when using public WiFi
networks
• Protecting your computer or mobile devices from viruses and
malware
• Avoiding online scams and fraudulent requests for your personal
information
Knowledge: Protective Behaviors
42. Internet Behavior
Information Sources: Online Protection
Knowledge: Protective Behaviors
• As far as you know have you ever...?
• Had important personal information stolen such as your Social
Security Number, your credit card, or bank account information?
• Had inaccurate information show up in your credit report?
• Had an email or social networking account of yours
compromised or taken over without your permission by
someone else?
• Been the victim of an online scam and lost money?
• Experienced persistent and unwanted contact from someone
online?
• Lost a job opportunity or educational opportunity because of
something that was posted online?
• Experienced trouble in a relationship or friendship because of
something that was posted online?
• Had someone post something about you online that you didn't
want shared?
Negative Experiences
43. Both web samples systematically over-report online behavior
Census-rep. web panel systematically over-reports experiences
44. Both web samples over-report getting advice from websites
Web samples (esp. census rep) under-report knowledge/confidence
48. Open-Answer Survey Analyses
Qualitative open coding
Example approach:
Select 10% of the responses; two researchers go
through them and create a “codebook” (set of categories) that
represents the responses
Two researchers code the remaining responses
separately and then compute an intercoder agreement
metric (e.g., Krippendorff’s Alpha, Cohen’s Kappa)
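Cohen’s Kappa corrects raw percent agreement for the agreement two coders would reach by chance. A hand-rolled sketch (the coders’ labels below are invented for illustration; in practice you would likely reach for a library implementation):

```python
from collections import Counter

def cohens_kappa(labels1, labels2):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels1) == len(labels2)
    n = len(labels1)
    observed = sum(a == b for a, b in zip(labels1, labels2)) / n
    # Chance agreement: for each category, the product of the two
    # coders' marginal proportions, summed over categories.
    c1, c2 = Counter(labels1), Counter(labels2)
    chance = sum(c1[k] / n * c2[k] / n for k in c1.keys() | c2.keys())
    return (observed - chance) / (1 - chance)

# Hypothetical codes assigned by two researchers to five open responses:
coder1 = ["advice", "advice", "habit", "habit", "other"]
coder2 = ["advice", "habit",  "habit", "habit", "other"]
print(round(cohens_kappa(coder1, coder2), 3))  # 0.688
```

Here raw agreement is 80%, but the chance-corrected Kappa is lower; conventions vary, but values above roughly 0.7–0.8 are usually treated as acceptable agreement.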
49. Closed-Answer Survey Analyses
Logistic Regression Analysis
Testing the relationship between a group of independent variables and one
dependent variable
What factors are (cor)related with whether people back up their computer?
How well can I predict whether people will back up their computers
based on factors 1, 2, and 3?
Backup ~ factor1 + factor2 + factor3
Hypothesis Tests (χ² test, t-test)
Compare the responses of two or more groups
Do people who are A say X more than people who are B?
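The hypothesis-test route can be sketched with a hand-rolled χ² test of independence on a 2×2 table (the counts below are invented; in practice `scipy.stats.chi2_contingency` does this and also returns a p-value):

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for a contingency table
    (list of rows): sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical: do people in group A back up more than people in group B?
#            backs up  doesn't
table = [[30, 20],   # group A
         [15, 35]]   # group B
print(round(chi_squared(table), 2))  # 9.09
```

With 1 degree of freedom, 9.09 exceeds the 3.84 critical value at α = 0.05, so in this made-up table the two groups would differ significantly.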
50. Sample Sizes
Power analyses are often used to determine if your sample will be
big enough to see effects (e.g., differences in the data)
Different sample sizes are needed to answer different questions; typically
~500 is sufficient for most analyses
Regression (participants needed):
• 5 variables, small effect: 641
• 20 variables, small effect: 1043
• 5 variables, medium effect: 85
• 20 variables, medium effect: 135
Correlation (participants needed):
• r=0.1: 782
• r=0.3: 84
• r=0.5: 28
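For the correlation rows, the required n can be approximated with the Fisher z transform; a sketch using only the standard library (exact power tools such as G*Power use slightly different distributions and may differ from these slide values by a couple of participants):

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size to detect a correlation of r at the
    given two-sided alpha and power, via the Fisher z transform."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for power = 0.8
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

for r in (0.1, 0.3, 0.5):
    print(r, n_for_correlation(r))  # lands within a few participants of the table above
```

The intuition matches the table: the smaller the effect you need to detect, the larger the sample you must recruit.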
52. Asking for a Friend:
Evaluating Response
Biases in Security User
Studies
Redmiles, E.M., Zhu, Z., Kross, S., Kuchhal, D., Dumitras, T., and Mazurek, M.L.
To appear at CCS2018
53. The Question
When people self-report estimated
security behavior on a survey,
do the population estimates
match real life?
54. Two Sets of Data
Log Data
• Symantec antivirus host records
• Responses to software update prompts
• System variables: crashes, etc.
Survey Data
• Responses to the same update prompts
• Collect the same "system" variables
55. The Question
Imagine that you see the message
below appear on your computer.
Would you install the update?
• Yes, the first time I saw this message.
• Yes, within a week of seeing this
message.
• Yes, within a few weeks of seeing this
message.
• Yes, within a few months of seeing this
message.
• No.
• I don’t know.
56. The Answer
• The speed with which people say they would update matches the
real world, but with a systematic bias:
• when asked about themselves, people overestimate by 2
points of frequency
• when asked about their friends they overestimate by 1 point of
frequency
• The text of the message matters in real life, but not in the survey
• Multiple hypotheses:
• Insufficient incentives to affect response
• Insufficient attention in surveys to notice specific text
• Insufficient context for the text to be relevant
57. Now That We (Maybe)
Believe The Surveys,
What Can We DO with Them?
58. Case Study 1:
Where is the Digital Divide? A Survey of Security, Privacy, and Socioeconomics.
Elissa M. Redmiles, Sean Kross, and Michelle L. Mazurek
Published @ CHI2017
75. Case Study 2:
Human Perceptions of Fairness in Algorithmic Decision Making:
A Case Study of Criminal Risk Prediction.
Nina Grgić-Hlača, Elissa M. Redmiles, Krishna P. Gummadi, and Adrian Weller
Published @ WWW2018
76. Algorithmic Decision Making
Algorithms help people make decisions about
Hiring
Assigning social benefits
Granting bail
Human decision making → (ML) → Algorithmic decision making
Are these algorithms fair?
78. Is it Fair to Use a Feature?
Normative
• Prescribe how fair decisions ought to be made
• Anti-discrimination laws
• Sensitive (race, gender) vs. non-sensitive features
Descriptive
• Describe human perceptions of fairness
• Beyond discrimination?
• Volitional (father’s criminal history)
• Relevant (education)
• Reliable
• …
79. Case Study: COMPAS
Defendant’s answers to the
COMPAS questionnaire
• Current charge
• Family criminal history
• Performance in School
• Nothing Legally Sensitive
81. How are People Making Fairness
Judgments?
How do we determine if a feature is fair to be used?
Example
Many believe it is not fair to assign bail based on the
criminal history of family and friends
Why?
A person is responsible only for their voluntary choices
Philosophical arguments on luck egalitarianism
Is this feature volitional?
82. How are People Making Fairness
Judgments?
How do we determine if a feature is fair to be used?
Example
Many believe it is not fair to assign bail based on a
defendant’s grades in high school
Why?
In the legal domain
Evidence is admissible only if it is relevant
Is this feature relevant?
84. Survey 2:
Do People Use Properties?
[Bar chart of usage frequency (0–100% scale): respondents cited the properties at rates of 74%, 43%, 41%, 27%, 23%, 21%, 17%, 15%, and 3%.]
85. Analysis 2:
Predict Fairness from Properties?
[Diagram: latent properties (Reliable? Relevant? Private? Volitional? …) feed into the fairness of using the feature.]
We can predict fairness judgments with 88% accuracy
86. Do people reach consensus in their
fairness judgments?
[Chart: per-feature consensus on a 0–1 scale.]
87. Disagreements on Latent Properties
Cause Disagreements on Fairness
[Diagram: latent properties (Reliable? Relevant? Private? Volitional?) and causal properties (Causes Outcome? Causes Vicious Cycle? Causes Disparity in Outcomes? Caused by Sensitive Group Membership?) feed into the fairness of using the feature.]
Some properties show low consensus; consensus on properties is correlated with fairness consensus
89. Does the Method Support the
Conclusion?
1. The authors conducted a survey on MTurk in order to explore sharing
behaviors. They conclude that socioeconomics (income, education) do not
affect sharing practices.
90. Does the Method Support the
Conclusion?
2. The authors are studying why people avoid using 2FA. Their survey design
contained all closed-answer questions. All questions were required and
none offered an “I don’t know” or other answer choice. They conclude that
people do not use 2FA because they are afraid of being locked out of their
accounts.
91. Does the Method Support the
Conclusion?
3. The authors observed how perception of data breach incidents changed
over time by conducting surveys using SSI at yearly intervals –
administering the same survey questionnaire each time (but asking about
different incidents from the past year). They concluded that people have
become fatigued to data breaches and are exhibiting increasingly weak
reactions.
94. Behavioral Economics
Method of observing decision-making in controlled
experiments with economic incentives
Unlike traditional economics, it doesn’t assume that the user is
rational
95. Case Study 1:
What Is Privacy Worth?
Alessandro Acquisti, Leslie K. John, and George Loewenstein
In The Journal of Legal Studies
96. Experimental Design
Mall intercepts
Subjects were offered Visa gift cards for taking a survey:
$10 gift card: “name will not be linked to the transactions
completed with this card.”
$12 gift card: “name will be linked to the transactions completed
with this card.”
Alessandro Acquisti, Leslie K. John, and George Loewenstein
97. Experimental Design (cont.)
Four conditions:
$10 endowed: Keep the anonymous $10 card or exchange it for
an identified $12 card.
$12 endowed: Keep the identified $12 card or exchange it for an
anonymous $10 card.
$10 choice: Choose between an anonymous $10 card and an
identified $12 card.
$12 choice: Choose between an identified $12 card and an
anonymous $10 card.
$10 endowed condition: implicit choice to sell privacy for $2
$12 endowed condition: implicit choice to pay $2 for privacy
Alessandro Acquisti, Leslie K. John, and George Loewenstein
98. Results
Less than half of the people who started with $10 were willing
to give up privacy for $2
Yet, less than 10% of people who started with $12 were willing
to pay $2 for more privacy
People’s willingness to give up privacy has been used to argue
for low privacy regulation
Yet, companies may just be playing framing tricks
Alessandro Acquisti, Leslie K. John, and George Loewenstein
99. Case Study 2:
Dancing Pigs or Externalities?
Measuring the Rationality of
Security Decisions
Elissa M. Redmiles, Michelle L. Mazurek, and John P. Dickerson
Appeared at EC2018
100. Theories of Security Behavior
Elissa Redmiles
The user's going to
pick dancing
pigs over security every
time. –Bruce Schneier
The user rationally ignores
security advice because the
costs outweigh the risk. --
Herley, 2009
boundedly rational security actor
with predictable and consistent,
but not always utility-optimal, behavior
based on risks and costs
Can We Prove It?
101. Behavioral Economics Experimental System
Behavioral economics experiment
Amazon Mechanical Turk (crowd worker) participants
Online experimental system
Simulating a bank account
Make a security choice: enable/don’t enable 2FA
2FA was SMS-based
102. Measurement System
Create Account
on bank.cs
Learn risk of
hacking (H)
Learn protection
offered by 2FA (P)
Make 2FA
Decision
Log in to system
regularly
P = 50% or 90%
H = 1%, 20%,
50%
You begin the study with $1 in your bank account. Each time you log in (at most once per
day) you will earn an additional $1.
103. Variables Measured
Demographics: Gender, Age, Education
Security Decision: Enable/Don’t Enable 2FA
Password Strength: measured with neural net pwd. meter [Ur et al.]
Signup & Login Times: measured with tab in focus seconds
Security Behavior Intention [Egelman et al.]
Internet Skill [Hargittai & Hsieh]
104. Experiment
Round 1 (5 days, up to $5) → Break (5 days) → Round 2 (5 days, up to $5)
Conditions:
• H=1%, P=50%, Endow | Earn
• H=1%, P=90%, Endow
• H=20%, P=50%, Endow | Earn
• H=50%, P=50%, Endow
• H=50%, P=90%, Endow | Earn
105. What We Learned About
Human Decision-Makers (in Security)
When given the choice, people leverage
cost and risk in “reasonable” directions
[but lots of influence from anchoring effects]
Lots of anchoring effects
(explain 35% of the variance in security behavior we observe)
Also:
• costs (time to log in)
• risks (chance of hacking, amount of protection offered by 2FA)
• endowment effects (an additional 26% of behavior variance)
106. We Can Measure Rationality
Cost is defined as wage-earning time lost
Utility of 2FA is defined as the $ savings if a hack occurred
Rational 2FA use is when the utility of the user’s choice is greater than the cost
48% rational in Round 1
56% rational in Round 2 (a significant, medium-sized learning effect)
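The rationality definition on this slide can be sketched numerically. All parameter values below are hypothetical illustrations, not the paper’s actual parameters:

```python
def twofa_is_utility_optimal(hack_prob, protection, balance,
                             extra_seconds_per_login, hourly_wage, num_logins):
    """2FA is utility-optimal when the expected $ it saves exceeds its
    cost, defined as wage-earning time lost to the extra login step."""
    expected_savings = hack_prob * protection * balance  # $ 2FA is expected to save
    time_cost = num_logins * extra_seconds_per_login / 3600 * hourly_wage
    return expected_savings > time_cost

def is_rational(enabled_2fa, **kwargs):
    """A participant's choice is rational when it matches the utility-optimal one."""
    return enabled_2fa == twofa_is_utility_optimal(**kwargs)

# Hypothetical condition: H=50%, P=90%, $5 at stake, 30 extra seconds
# per login at a $5/hour wage, 10 logins over the round.
params = dict(hack_prob=0.5, protection=0.9, balance=5.0,
              extra_seconds_per_login=30, hourly_wage=5.0, num_logins=10)
print(is_rational(True, **params))  # True: $2.25 expected savings > ~$0.42 time cost
```

In a low-risk condition (say H=1%), the same arithmetic flips: the time cost exceeds the expected savings, so declining 2FA becomes the rational choice.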
107. This Was But a Brief Overview
You can also incentivize survey participants to respond in
ways that agree with their peers (peer prediction)
You can do other kinds of experiments (e.g., where people
actually buy sensitive products online and can choose
where to buy from: private or not)
But remember that the world isn’t just about money. Money
can only proxy for certain losses, but it can help “up the
ante” on creating ecological validity in some cases
108. Thank you!
Elissa Redmiles: eredmiles@cs.umd.edu
UMaryland / UMichigan joint survey methodology program
Summer Institute in Survey Research Techniques:
http://si.isr.umich.edu/overview
Editor’s notes
fewer options can create bias by requiring the respondent to pick something that doesn’t quite fit, while too many options might render differences in responses meaningless.
Snowball: rarely used; an example paper sampled sex workers
Crowdsourced: cheap, good for piloting; also good for longitudinal and other tasks
Demographically diverse panels: optimal; $1500 minimum for some providers, usually around $3/response
GFK does probabilistic sampling but is web-based, so you get nonresponse that’s not predictable
Probabilistic: $1500 per item for 1000 answers, so around $20-80k per survey
Really depends on conditions and number of variables
Machine learning algorithms are increasingly being used to assist or even replace human decision making.
Decisions that used to be made by humans are nowadays often made with the help of machine learning based decision making systems.
For example, nowadays, people use algorithms to help them decide
If they should hire someone or not
If they should give someone social benefits or not
And even if they should grant someone bail or not
All of these decisions have immense impacts on human lives.
Because of that, we need to make sure that the algorithms that are making these decisions are fair.
Before specifying what we exactly mean when we say that algorithmic decision making systems need to be fair, first, let’s take a look at how a typical algorithmic decision making system works.
Let’s consider a system built to help judges decide if they should grant someone bail or not.
A set of inputs, in this case features about the defendant, is fed to a decision making system. The decision making system in this case might be a classifier, which in turn produces some outputs, for example a binary decision of “grant bail” or “deny bail”.
We see that we have several distinct parts in this decision making pipeline.
In this talk we’ll focus on the first part, and consider the fairness of the inputs.
More precisely, we consider the fairness of the features used as inputs, and try to understand and account for human perceptions on whether a feature is fair to be used in that decision making system, or not.
How do we answer that question? How do we determine if it is fair to use a feature or not?
One could take a normative approach, and try to prescribe how fair decisions ought to be made.
For example, one could refer to anti-discrimination laws, and say that it is prohibited to use sensitive features, such as race or gender. On the other hand, non-sensitive features can be used freely.
However, we can take an alternative approach.
Instead of prescribing which features can be used, we could try a descriptive approach, and describe human perceptions of which features are fair to use.
We could ask people which features they think it’s fair to use, and see if we discover some interesting findings, to see if human perceptions of fairness of using features goes beyond the binary distinction between sensitive and non-sensitive features.
But, what else could make a feature be perceived as unfair to use, except for it being sensitive? Let’s go back to our example of a system designed to help judges make bail decisions.
That classifier might be using the defendant’s criminal history as an input feature.
But what if it used the criminal history of the defendant’s father, a feature that is not volitional?
Or, what if it used information about the defendant’s education, which might not be relevant to the task of making bail decisions? Or what if it used features that are not reliable, or ones that are privacy sensitive?
Well, we decided to look into what people think about using these features.
We conducted a series of surveys asking people to tell us how fair they believe it is to use these features.
We asked them to rate the fairness on a 7 point Likert scale, where 1 denotes that the feature is completely unfair, while 7 that it is completely fair.
Below, we see 10 groups of questions from the COMPAS questionnaire, ranging from the criminal history of the defendant, to their personality, and all the way to the criminal history of their family, and their education.
We asked 196 Amazon Mechanical Turk master workers from the US to rate the fairness of using these feature to grant bail.
On the y axis, we show the average fairness rating assigned to each of these features.
We see that many of the features, such as the previously mentioned education and family criminal history, are considered unfair to be used in this decision making scenario.
This leads to ask why are these features considered unfair to be used?
How are people making these judgments about the fairness of using features?
Are there any particular reasons that naturally come to mind when you think about why it is fair or unfair to use a feature?
Let’s take a look, for example at the feature about the criminal history of the defendant’s family. Many people believe it is not fair to assign bail based on this feature.
Why is that the case?
Well, it is possible that people believe that people should be held responsible only for their voluntary choices, but not penalized for their unchosen circumstances.
Such reasoning would be consistent with philosophical arguments on luck egalitarianism.
So, is this feature volitional?
Let’s take a look at another feature.
Many people believe it is not fair to assign bail based on the defendant’s grades in high-school.
Why might that be the case?
Well, in the legal domain, a piece of evidence is admissible only if it is relevant.
Is a person’s GPA relevant for the task of assigning bail?
The “can’t care less” model
The “security is always ineffective” model. Herley: In his theoretical paper, he provides a cost-benefit analysis
of having end-users follow security advice. He argues that threats are so rare – and suggested
behaviors so ineffective – as to make it logical for end users to never adopt security behaviors.
The boundedly rational model: proposed in privacy; many point-studies on warning fatigue, message design, and specific behaviors allude to this, but no one has defined/measured it. Why do we care to define and measure? Once we know what they’re paying attention to when making decisions/how they make those decisions, we can start to adjust what they are doing by adjusting those parameters.
Explain MTurk (only US users): a crowdsourcing service where you can hire people to do tasks for money. US MTurk workers are whiter and more technical than workers in other countries; limited to one place to somewhat limit variance in the value of money.
Participants -> Amazon Mechanical Turk crowd workers talk about using hourly wage
Remove realistic -- Online
$5 is worth an average of an hour of an MTurker’s time
They needed to create an account in our experimental system and make a security choice. We made our system similar to the concept of a bank account (their study money was stored there) – will walk through system in a minute.
The security choice they were asked to make was whether to turn on 2FA. In prior work that we have done, we found that people’s understanding of 2FA is in the middle of the road between passwords and updating/antivirus. 2FA is an explicit optimal choice that people understand relatively well, so that is what we picked.
we used SMS for convenience here, even though it's not maximally secure (and used in many places)
they also read consent/short description in MTurk to clarify hacking