Measurement Meets Usability:
Applying Survey and
Economic Methods to
Usable Security
Elissa M. Redmiles
@eredmil1
eredmiles@cs.umd.edu
Survey Methods for
Security and Privacy Research
What is a Survey?
A set of questions assessing different constructs
Questions can be open-ended: the respondent inputs any answer
Or closed-ended: offer answer choices and the respondent selects one or more of them
“identifying a specific group or category of people and collecting
information from some of them in order to gain insight into what the
entire group does or thinks”
Handbook of International Survey Methodology
How People Answer Questions
Who Should Answer?
• You rarely survey the whole population, so you select a sample
• The sample should be representative of your population, so that your results are generalizable (with some limitations)
Ingredients of a Survey
Research Questions (What Do I Want to Know?)
Constructs (What Do I Need to Measure to Answer RQs?)
Questions (How Can I Design Valid Questions to Measure My Constructs?)
Sample (Who Can I Get To Take My Survey Given My Resources?)
Analysis (How Can I Answer My Research Question?)
Question Design Principles
Word Choice Matters
The word ‘usually’ was interpreted in 24 different ways
by one study’s participants
Variation creates data that’s hard to compare
Even worse for security and privacy surveys because
these often involve domain-specific / technical language
Research indicates that respondents often ignore
written definitions provided with questions
Likert Scales
Used to assess nuanced feelings, e.g., agreement
• Good scales are between 4 and 10 points
• Even-numbered scales elicit stronger responses (no neutral option)
• Scales should always be balanced
Double-Barreled Questions
“Do you believe that your employer should require you to update your computer and change your password every six months?”
• Requires respondents to provide a single answer about both:
   • a requirement to update their computer
   • a requirement to change their passwords
Social Desirability Bias
Pressure to Say the "Right"
Answer
• If you’re asking respondents about a tool you built, they may feel pressure to be positive
• If you’re asking them questions that they think have a correct answer, or a societally correct answer, they may feel pressure to respond in a certain way
Mitigation: Softening Wording
• People have many different practices
when it comes to updating their
computers. Which of the following most
closely matches what you do …
Order Bias
Ordering of questions or
answer choices changes
responses
• Online, people pick the top answer choice most often
• On the phone, they pick the last choice
So, randomize question and answer order to counteract that bias!
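A minimal sketch of per-respondent randomization of answer-choice order, assuming a hypothetical `choices` list and a web backend; most survey platforms expose this as a built-in setting, so this is only an illustration of the idea.

```python
import random

def randomized_choices(choices, respondent_seed=None):
    """Return a per-respondent random ordering of answer choices.

    Anchor options such as "Other" or "Prefer not to answer" are kept
    at the end so only the substantive choices are shuffled.
    """
    rng = random.Random(respondent_seed)  # seed on a respondent ID for reproducibility
    anchors = [c for c in choices if c in {"Other", "Prefer not to answer"}]
    shuffled = [c for c in choices if c not in anchors]
    rng.shuffle(shuffled)
    return shuffled + anchors

# Example: each respondent sees the substantive options in a different order
print(randomized_choices(
    ["Friend or peer", "Family member", "Co-worker", "Prefer not to answer"],
    respondent_seed=42,
))
```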
Demographic Questions &
Stereotype Threat
But don’t randomize the order in which demographic questions appear
Questions affect the answers to other questions
Research shows that asking women and minorities about their demographics
before asking them math questions makes them perform worse
Security and privacy questions are likely to have similar stereotype threat
Rule of thumb: ask about demographics at the very end
Length & Cognitive Load
The longer the survey the worse the quality of the
answers & the lower the response rates
20 minutes maximum is a good rule of thumb
Also think about how hard the questions are
Mitigation: Pretesting
Automated Tools
• QUAID: http://quaid.cohmetrix.com/
Cognitive Interviews
• Have respondents think aloud as they answer questions
• Prompt them on terms that they may interpret differently / struggle with
Expert Reviews
• Recruit ~3 survey methodology experts to review the survey
• Your library or statistics department may offer these services
Piloting (BEWARE)
• Run a small sample of the survey
• This provides the least information: just because you get answers you like
doesn’t mean they are accurate!
Even Better: Use Existing (Validated) Questions
• Pew Surveys
• Reason-Rupe Surveys
• iPoll Database of Survey Questions
Security and Privacy Specific:
• SEBIS (Egelman and Peer 2015)
• Westin Privacy Index (or IUIPC)
• Rader & Wash 2015
• Web Use Skill Index (Hargittai & Hsieh 2013)
Ethics!
Don't Know
• You should offer respondents the option to say “prefer
not to answer” or “don’t know” for required questions
• Otherwise, don’t require the question!
Code of
Conduct
• Requiring a question without this option is a violation of the
American Association for Public Opinion Research code of ethics
• AND explicitly disallowed by many Institutional
Review Boards
Question Game
Exercise: Design &
Pre-test a survey
Instructions
• Form groups
• Select a research question:
   Do people in intimate relationships (e.g., married couples) share accounts? What do they share and why?
   Do people with different educational backgrounds practice different security behaviors?
   Do people understand what end-to-end encryption means?
• Design a 5-question survey + demographic questions
• Conduct cognitive interviews with 3 people in the room and revise your questions
Conducting the Survey
Mode and Sampling
Paper
• Good for low-tech populations; rarely used
Phone
• Good for low-tech populations
• Allows for “probabilistic” sample
• CATI: computer-assisted telephone interviewing
• …many more variations
Web
• Often used in security, privacy, and HCI studies
• Highest non-response rate of any mode
• Cheapest
Survey Sampling
• You rarely survey the whole population, so you select a sample
• The sample should be representative of your population, so that your results are generalizable (with some limitations)
Types of Samples
Probabilistic
• (only possible with phone or paper)
Nearly probabilistic
• GFK Knowledge Panel
Census representative, non-probability
• SSI, Qualtrics
• Google Consumer Surveys
Crowdsourced samples
• Prolific, Amazon Mechanical Turk, Crowdflower
Convenience or Snowball Samples
• Posting on social media, asking friends to take your survey
(Cost decreases roughly from the top to the bottom of this list)
How Well Do My Results Generalize?
Comparing Security and Privacy Survey Results
from MTurk and Web Panels to the U.S.
Elissa M. Redmiles, Sean Kross, and Michelle L. Mazurek
go.umd.edu/sampleComparison
Three Samples
PSRAI
• Mode: telephone
• Type: probabilistic (CI 2.7%)
• n = 3,000
• Price: ~$80,000
SSI
• Mode: Web
• Type: Census-representative panel
• n = 428
• Price: $1,500
MTurk
• Mode: Web
• Type: Crowdsourced
• n = 480
• Price: $500
Four Sets of Questions
Internet Behavior
Information Sources: Online Protection
Knowledge: Protective Behaviors
Negative Experiences
Internet Behavior
• Do you ever use the internet to...?
• Use social media such as Facebook, Twitter, or Instagram
• Apply for a job
• Apply for government benefits or assistance
• Apply for a loan or cash advance
• Search for sensitive health information
• Buy a product, such as books, toys, music, or clothing
Information Sources: Online Protection
• To which of the following have you turned for advice about how to protect your personal information online?
• Friend or Peer
• Family Member
• Co-worker
• Librarian or resource at library
• Government website
• Website run by a private organization
• Teacher
Knowledge: Protective Behaviors
• Do you feel as though you already know enough about...?
• Choosing strong passwords to protect your online accounts
• Managing privacy settings for the information you share online
• Understanding the privacy policies of the websites and applications you use
• Protecting the security of your devices when using public WiFi networks
• Protecting your computer or mobile devices from viruses and malware
• Avoiding online scams and fraudulent requests for your personal information
Negative Experiences
• As far as you know, have you ever...?
• Had important personal information stolen, such as your Social Security Number, your credit card, or bank account information?
• Had inaccurate information show up in your credit report?
• Had an email or social networking account of yours compromised or taken over without your permission by someone else?
• Been the victim of an online scam and lost money?
• Experienced persistent and unwanted contact from someone online?
• Lost a job opportunity or educational opportunity because of something that was posted online?
• Experienced trouble in a relationship or friendship because of something that was posted online?
• Had someone post something about you online that you didn't want shared?
Both web samples systematically over-report online behavior
Census-rep. web panel systematically over-reports experiences
Both web samples over-report getting advice from websites
Web samples (esp. census rep.) under-report knowledge/confidence
Different Populations, Different Samples
Crowdsourced:
• Age 18-49
• Education: Some College+
Census Rep. Web:
• Age 50+
• Education: HS or less
MTurk most generalizable overall
Weighting on demographics helps slightly (from 13/28 differences to 11/28)
Ethical caveat:
your results are an upper / lower bound based on a high-skill / high-awareness population;
you CANNOT draw conclusions re: socioeconomic variance
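A minimal sketch of the kind of demographic weighting mentioned above (simple post-stratification: weight each respondent by population share divided by sample share of their demographic cell). The cell definitions and census targets here are illustrative assumptions, not the paper's actual weighting scheme.

```python
import pandas as pd

# Illustrative census targets for one weighting variable (assumed values)
population_share = {"18-49": 0.55, "50+": 0.45}

def poststratification_weights(df, column, targets):
    """Weight = population share / sample share for each respondent's cell."""
    sample_share = df[column].value_counts(normalize=True)
    return df[column].map(lambda cell: targets[cell] / sample_share[cell])

# Toy sample skewed toward younger respondents, as crowdsourced samples often are
sample = pd.DataFrame({"age_group": ["18-49"] * 80 + ["50+"] * 20})
sample["weight"] = poststratification_weights(sample, "age_group", population_share)

# Weighted estimates (e.g., a weighted mean of any survey response) then use these weights
print(sample.groupby("age_group")["weight"].first())
```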
Survey Analysis
Open-Answer Survey Analyses
• Qualitative open coding
Example approach:
   Select 10% of the responses; two researchers go through and create a “codebook” (set of categories) that represents the responses
   The two researchers then code the remaining responses separately and compute an intercoder agreement metric (e.g., Krippendorff’s Alpha, Cohen’s Kappa)
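A minimal sketch of computing intercoder agreement on the doubly coded responses, assuming each coder's labels are stored in a list aligned by response; `sklearn.metrics.cohen_kappa_score` covers Cohen's Kappa, while Krippendorff's Alpha needs a dedicated package.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned independently by two researchers
# to the same open-ended responses, using the shared codebook.
coder_a = ["advice", "habit", "advice", "fear", "habit", "fear", "advice"]
coder_b = ["advice", "habit", "advice", "habit", "habit", "fear", "advice"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's Kappa: {kappa:.2f}")  # values above ~0.7-0.8 are usually treated as acceptable

# Disagreements (here, the fourth response) are typically discussed and resolved
# before reporting final code frequencies.
```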
Closed-Answer Survey Analyses
Logistic Regression Analysis
• Tests the relationship between a group of independent variables and one (binary) dependent variable
   What factors are (cor)related with whether people back up their computer?
   How well can I predict whether people will back up their computers based on factors 1, 2, and 3?
   Backup ~ factor1 + factor2 + factor3
• Compare the responses of two or more groups
   Do people who are A say X more than people who are B?
   Hypothesis test (χ² test, t-test)
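A minimal sketch of both closed-answer analyses, assuming a hypothetical DataFrame with a binary `backup` column, three illustrative predictors, and a grouping variable; the formula mirrors the Backup ~ factor1 + factor2 + factor3 model above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 500
# Hypothetical survey data (random placeholders, not real responses)
df = pd.DataFrame({
    "backup": rng.integers(0, 2, n),      # 1 = reports backing up their computer
    "factor1": rng.normal(size=n),        # e.g., internet skill score
    "factor2": rng.integers(0, 2, n),     # e.g., prior negative experience
    "factor3": rng.integers(18, 80, n),   # e.g., age
    "group": rng.choice(["A", "B"], n),   # e.g., educational background
})

# Logistic regression: Backup ~ factor1 + factor2 + factor3
model = smf.logit("backup ~ factor1 + factor2 + factor3", data=df).fit(disp=False)
print(model.summary())  # odds ratios are np.exp(model.params)

# Chi-squared test: do groups A and B report backing up at different rates?
table = pd.crosstab(df["group"], df["backup"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```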
Sample Sizes
• Power analyses are often used to determine if your sample will be big enough to see effects (e.g., differences in the data)
• Different sample sizes are needed to answer different questions; typically ~500 is sufficient for most analyses
Regression (participants needed):
• 5 variables, small effect: 641
• 20 variables, small effect: 1,043
• 5 variables, medium effect: 85
• 20 variables, medium effect: 135
Correlation (participants needed):
• r = 0.1: 782
• r = 0.3: 84
• r = 0.5: 28
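A minimal sketch of one such power analysis: the sample size needed to detect a correlation of a given magnitude, using the standard Fisher z approximation and assuming alpha = 0.05 (two-sided) and power = 0.80. Its outputs are close to the correlation rows in the table above; exact values depend on the approximation and rounding used.

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Sample size needed to detect a correlation of r (Fisher z approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = norm.ppf(power)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: n = {n_for_correlation(r)}")
# prints roughly: r = 0.1: 783, r = 0.3: 85, r = 0.5: 30

# For regressions and group comparisons, statsmodels.stats.power offers
# analogous solve_power() helpers (e.g., TTestIndPower for two-group t-tests).
```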
That’s Nice, But I Don’t
Believe You
Asking for a Friend:
Evaluating Response
Biases in Security User
Studies
Redmiles, E.M., Zhu, Z., Kross, S., Kuchhal, D., Dumitras, T., and Mazurek, M.L.
To appear at CCS2018
The Question
When people self-report estimated
security behavior on a survey,
do the population estimates
match real life?
Two Sets of Data
Log Data
• Symantec Antivirus host records
• Responses to software update prompts
• System variables: crashes, etc.
Survey Data
• Responses to the same update prompts
• Collect the same "system" variables
The Question
Imagine that you see the message
below appear on your computer.
Would you install the update?
• Yes, the first time I saw this message.
• Yes, within a week of seeing this
message.
• Yes, within a few weeks of seeing this
message.
• Yes, within a few months of seeing this
message.
• No.
• I don’t know.
The Answer
• The speed with which people say they would update matches the
real world, but with a systematic bias:
• when asked about themselves, people overestimate by 2
points of frequency
• when asked about their friends they overestimate by 1 point of
frequency
• The text of the message matters in real life, but not in the survey
• Multiple hypotheses:
• Insufficient incentives to affect response
• Insufficient attention in surveys to notice specific text
• Insufficient context for the text to be relevant
Now That We (Maybe)
Believe The Surveys,
What Can We DO with Them?
Case Study 1:
Where is the Digital Divide? A Survey of Security, Privacy, and Socioeconomics.
Elissa M. Redmiles, Sean Kross, and Michelle L. Mazurek
Published @ CHI2017
Elissa Redmiles
Case Study 2:
Human Perceptions of Fairness in Algorithmic Decision Making:
A Case Study of Criminal Risk Prediction.
Nina Grgić-Hlača, Elissa M. Redmiles, Krishna P. Gummadi, and Adrian Weller
Published @ WWW2018
Algorithmic Decision Making
• Algorithms help people make decisions about
   Hiring
   Assigning social benefits
   Granting bail
[Diagram: human decision making increasingly assisted or replaced by ML-based algorithmic decision making]
Are these algorithms fair?
Decision Making Pipeline
[Diagram: Inputs → Decision Making System → Outputs]
Is it fair to use a feature?
Is it Fair to Use a Feature?
Normative
• Prescribe how fair decisions ought to be made
• Anti-discrimination laws
• Sensitive (race, gender) vs. non-sensitive features
Descriptive
• Describe human perceptions of fairness
• Beyond discrimination?
   • Volitional (Father’s history)
   • Relevant (Education)
   • Reliable
   • …
Case Study: COMPAS
Defendant’s answers to the
COMPAS questionnaire
• Current charge
• Family criminal history
• Performance in School
• Nothing Legally Sensitive
Survey 1: Do Judgments of Fairness Differ
[Chart: average fairness rating, on a 1-7 scale, for each group of COMPAS features]
How are People Making Fairness Judgments?
• How do we determine if a feature is fair to be used?
• Example
   Many believe it is not fair to assign bail based on the criminal history of family and friends
   Why?
• A person is responsible only for their voluntary choices
   Philosophical arguments on luck egalitarianism
• Is this feature volitional?
How are People Making Fairness Judgments?
• How do we determine if a feature is fair to be used?
• Example
   Many believe it is not fair to assign bail based on grades in high school
   Why?
• In the legal domain, evidence is admissible only if it is relevant
• Is this feature relevant?
Hypothesis: Latent Properties
• Reliable?
• Relevant?
• Private?
• Volitional?
• Causes Outcome?
• Causes Vicious Cycle?
• Causes Disparity in Outcomes?
• Caused by Sensitive Group Membership?
→ Fairness of Using the Feature
Survey 2: Do People Use Properties?
[Chart: how frequently each latent property is used in participants' fairness reasoning; usage frequencies of 74%, 43%, 41%, 27%, 23%, 21%, 17%, 15%, and 3%]
Analysis 2: Predict Fairness from Properties?
We can predict fairness judgments with 88% accuracy
[Diagram: ratings of the latent properties (Reliable? Relevant? Private? Volitional? …) predicting the fairness of using the feature]
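A minimal sketch of the kind of model such an analysis could use (the paper's exact features and classifier are not reproduced here): predict a binary fair/unfair judgment from respondents' ratings of the latent properties and report held-out accuracy. All column names and data below are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
# Hypothetical per-(respondent, feature) ratings of latent properties on 1-7 scales
props = pd.DataFrame(
    rng.integers(1, 8, size=(n, 4)),
    columns=["reliable", "relevant", "volitional", "private"],
)
# Toy target: fairness judgments loosely driven by relevance and reliability
fair = (props["relevant"] + props["reliable"] + rng.normal(0, 2, n) > 8).astype(int)

clf = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(clf, props, fair, cv=5, scoring="accuracy").mean()
print(f"Cross-validated accuracy: {accuracy:.2f}")
```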
Do people reach consensus in their fairness judgments?
[Chart: consensus in fairness judgments, on a 0-1 scale]
Disagreements on Latent Properties Cause Disagreements on Fairness
[Diagram: the latent properties (Reliable? Relevant? Private? Volitional?) and the causal properties (Causes Outcome? Causes Vicious Cycle? Causes Disparity in Outcomes? Caused by Sensitive Group Membership?) feeding into the fairness of using the feature; the causal properties show low consensus, and consensus on the properties is correlated with consensus on fairness]
Exercise: Can you draw that
conclusion?
Does the Method Support the
Conclusion?
1. The authors conducted a survey on MTurk in order to explore sharing
behaviors. They conclude that socioeconomics (income, education) do not
affect sharing practices.
Does the Method Support the
Conclusion?
2. The authors are studying why people avoid using 2FA. Their survey design
contained all closed-answer questions. All questions were required and
none offered an “I don’t know” or other answer choice. They conclude that
people do not use 2FA because they are afraid of being locked out of their
accounts.
Does the Method Support the
Conclusion?
3. The authors observed how perception of data breach incidents changed
over time by conducting surveys using SSI at yearly intervals –
administering the same survey questionnaire each time (but asking about
different incidents from the past year). They concluded that people have
become fatigued by data breaches and are exhibiting increasingly weak
reactions.
Economic Methods for
Security and Privacy Research
There’s More to Life Than Question Asking
Behavioral Economics
• Method of observing decision-making in controlled experiments with economic incentives
• Unlike traditional economics: don’t assume that the user is rational
Case Study 1:
What Is Privacy Worth?
Alessandro Acquisti, Leslie K. John, and George Loewenstein
In The Journal of Legal Studies
Experimental Design
• Mall intercepts
• Subjects offered Visa gift cards for taking a survey
   $10 gift card: “name will not be linked to the transactions completed with this card.”
   $12 gift card: “name will be linked to the transactions completed with this card.”
Alessandro Acquisti, Leslie K. John, and George Loewenstein
Experimental Design (cont.)
• Four conditions:
   $10 endowed: Keep the anonymous $10 card or exchange it for an identified $12 card.
   $12 endowed: Keep the identified $12 card or exchange it for an anonymous $10 card.
   $10 choice: Choose between an anonymous $10 card and an identified $12 card.
   $12 choice: Choose between an identified $12 card and an anonymous $10 card.
• $10 endowed condition: implicit choice to sell privacy for $2
• $12 endowed condition: implicit choice to pay $2 for privacy
Alessandro Acquisti, Leslie K. John, and George Loewenstein
Results
• Less than half of the people who started with $10 were willing to give up privacy for $2
• Yet, less than 10% of people who started with $12 were willing to pay $2 for more privacy
• People’s willingness to give up privacy has been used to argue for low privacy regulation
• Yet, companies may just be playing framing tricks
Alessandro Acquisti, Leslie K. John, and George Loewenstein
Case Study 2:
Dancing Pigs or Externalities?
Measuring the Rationality of
Security Decisions
Elissa M. Redmiles, Michelle L. Mazurek, and John P. Dickerson
Appeared at EC2018
Theories of Security Behavior
Elissa Redmiles
The user's going to
pick dancing
pigs over security every
time. –Bruce Schneier
The user rationally ignores
security advice because the
costs outweigh the risk. --
Herley, 2009
boundedly rational security actor
with predictable and consistent,
but not always utility-optimal, behavior
based on risks and costs
Can We Prove It?
Behavioral Economics Experimental System
Behavioral economics experiment
Amazon Mechanical Turk (Crowd Worker) participants
Online experimental system
Simulating a bank account
Make a security choice: enable/don’t enable 2FA
2FA was SMS-based
Measurement System
Elissa Redmiles
Create Account on bank.cs → Learn risk of hacking (H) → Learn protection offered by 2FA (P) → Make 2FA Decision → Log in to system regularly
P = 50% or 90%
H = 1%, 20%, or 50%
You begin the study with $1 in your bank account. Each time you log in (at most once per day) you will earn an additional $1.
Variables Measured
Elissa Redmiles
Demographics: Gender, Age, Education
Security Decision: Enable/Don’t Enable 2FA
Password Strength: measured with neural net pwd. meter [Ur et al.]
Signup & Login Times: measured as seconds with the tab in focus
Security Behavior Intention [Egelman et al.]
Internet Skill [Hargittai & Hsieh]
Experiment
Elissa Redmiles
Round 1 (5 days, up to $5) → Break (5 days) → Round 2 (5 days, up to $5)
Conditions (H = hacking risk, P = protection from 2FA):
• H=1%, P=50%, Endow | Earn
• H=1%, P=90%, Endow
• H=20%, P=50%, Endow | Earn
• H=50%, P=50%, Endow
• H=50%, P=90%, Endow | Earn
[Flow-diagram participant counts: 150, 125, 107]
What We Learned About Human Decision-Makers (in Security)
When given the choice, people leverage cost and risk in “reasonable” directions
[but with lots of influence from anchoring effects]
Anchoring effects explain 35% of the variance in security behavior we observe
Also: costs (time to log in), risks (chance of hacking, amount of protection offered by 2FA), and endowment effects (an additional 26% of behavior variance)
We Can Measure Rationality
Elissa Redmiles
Cost is defined as wage-earning time loss
Utility of 2FA is defined as the $$$ saved if a hack occurred
Rational 2FA use: the utility of the user’s choice is greater than the cost
48% rational in Round 1
56% rational in Round 2 (a significant, medium-sized learning effect)
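A minimal sketch of a rationality check in that spirit, under illustrative assumptions: the cost of enabling 2FA is the wage value of the extra login time it adds over the study, and the expected benefit is the hack probability times the protection 2FA offers times the money at stake. None of the constants below are the study's actual parameters.

```python
def rational_to_enable_2fa(hack_prob, protection, balance_at_risk,
                           extra_seconds_per_login, logins, hourly_wage):
    """True if the expected savings from 2FA exceed its wage-time cost."""
    expected_benefit = hack_prob * protection * balance_at_risk
    time_cost = (extra_seconds_per_login * logins / 3600) * hourly_wage
    return expected_benefit > time_cost

# Illustrative example (assumed numbers): 20% hack risk, 2FA blocks 90% of losses,
# $5 at stake, 30 extra seconds per login over 10 logins, at a $10/hour wage.
print(rational_to_enable_2fa(0.20, 0.90, 5.00, 30, 10, 10.00))
```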
This Was But a Brief Overview
• You can also incentivize survey participants to respond in ways that agree with their peers (peer prediction)
• You can run other kinds of experiments (e.g., where people actually buy sensitive products online and have choices of where to buy from: private or not)
• But: remember that the world isn’t just about money. Money can only proxy for certain losses, etc., but it can help “up the ante” on creating ecological validity in some cases
Thank you!
Elissa Redmiles: eredmiles@cs.umd.edu
UMaryland / UMichigan joint survey methodology program
Summer Institute in Survey Research Techniques:
http://si.isr.umich.edu/overview
Editor's notes

  1. fewer options can create bias by requiring the respondent to pick something that doesn’t quite fit, while too many options might render differences in responses meaningless.
  2. Example of paper with
3. Snowball: a few uses; sex worker example. Crowdsourced: cheap, good for piloting, also good for longitudinal and other tasks. Demographically diverse panels: optimal; $1,500 minimum for some places, usually around $3/response. GfK does probabilistic sampling but is web, so you get nonresponse that's not predictable. Probabilistic: $1,500 per item for 1,000 answers, so around $20-80k per survey.
  4. Really depends on conditions and number of variables
  5. Machine learning algorithms are increasingly being used to assist or even replace human decision making. Decisions that used to be made by humans are nowadays often made with the help of machine learning based decision making systems. For example, nowadays, people use algorithms to help them decide If they should hire someone or not If they should give someone social benefits or not And even if they should grant someone bail or not All of these decisions have immense impacts on human lives. Because of that, we need to make sure that the algorithms that are making these decisions are fair.
  6. Before specifying what we exactly mean when we say that algorithmic decision making systems need to be fair, first, let’s take a look at how a typical algorithmic decision making system works. Let’s consider a system built to help judges decide if they should grant someone bail or not. A set of inputs, in this case features about the defendant, is fed to a decision making system. The decision making system in this case might be a classifier, which in turn produces some outputs, for example a binary decision of “grant bail” or “deny bail”. We see that we have several distinct parts in this decision making pipeline. In this talk we’ll focus on the first part, and consider the fairness of the inputs. More precisely, we consider the fairness of the features used as inputs, and try to understand and account for human perceptions on whether a feature is fair to be used in that decision making system, or not.
  7. How do we answer that question? How do we determine if it is fair to use a feature or not? One could take a normative approach, and try to proscribe how fair decisions ought to be made. For example, one could refer to anti-discrimination laws, and say that it is prohibited to use sensitive features, such as race or gender. On the other hand, non-sensitive features can be used freely. However, we can take an alternative approach. Instead of proscribing which features can be used, we could try a descriptive approach, and describe human perceptions on which features are fair to be used. We could ask people which features they think it’s fair to use, and see if we discover some interesting findings, to see if human perceptions of fairness of using features goes beyond the binary distinction between sensitive and non-sensitive features. But, what else could make a feature be perceived as unfair to be used, except for it being sensitive? Let’s go back to our example of a system designed to help judges make bail decisions. That classifier might be using the defendant’s criminal history as an input feature. But what if it used the criminal history of the defendant’s father, a feature that is not volitional? Or, what if it used information about the defendant’s education, which might not be relevant to the task of making bail decisions? Or what if it used features that are not reliable, or ones that are privacy sensitive?
  9. Well, we decided to look into what people think about using these features. We conducted a series of surveys asking people to tell us how fair they believe it is to use these features. We asked them to rate the fairness on a 7 point Likert scale, where 1 denotes that the feature is completely unfair, while 7 that it is completely fair. Below, we see 10 groups of questions from the COMPAS questionnaire, ranging from the criminal history of the defendant, to their personality, and all the way to the criminal history of their family, and their education. We asked 196 Amazon Mechanical Turk master workers from the US to rate the fairness of using these feature to grant bail. On the y axis, we show the average fairness rating assigned to each of these features. We see that many of the features, such as the previously mentioned education and family criminal history, are considered unfair to be used in this decision making scenario. This leads to ask why are these features considered unfair to be used?
  10. How are people making these judgments about the fairness of using features? Are there any particular reasons that naturally come to mind when you think about why it is fair or unfair to use a feature? Let’s take a look, for example at the feature about the criminal history of the defendant’s family. Many people believe it is not fair to assign bail based on this feature. Why is that the case? Well, it is possible that people believe that people should be held responsible only for their voluntary choices, but not penalized for their unchosen circumstances. Such reasoning would be consistent with philosophical arguments on luck egalitarianism. So, is this feature volitional?
  11. Let’s take a look at another feature. Many people believe it is not fair to assign bail based on the defendant’s grades in high-school. Why might that be the case? Well, in the legal domain, a piece of evidence is admissible only if it is relevant. Is a person’s GPA relevant for the task of assigning bail?
15. The “can’t care less” model; the “security is always ineffective” model. Herley: in his theoretical paper, he provides a cost-benefit analysis of having end-users follow security advice. He argues that threats are so rare – and suggested behaviors so ineffective – as to make it logical for end users to never adopt security behaviors. The boundedly rational model: proposed in privacy; many point-studies on warning fatigue, message design, and specific behaviors allude to this, but no one has defined/measured it. Why do we care to define and measure? Once we know what they’re paying attention to when making decisions/how they make those decisions, we can start to adjust what they are doing by adjusting those parameters.
16. Explain MTurk (only US users): crowdsourcing service where you can hire people to do tasks for money. Whiter and more technical than other countries; limited to one place to somewhat limit variance in the value of money. Participants -> Amazon Mechanical Turk crowd workers; talk about using hourly wage. Remove realistic -- Online: $5 is worth an average of an hour of an MTurker’s time. They needed to create an account in our experimental system and make a security choice. We made our system similar to the concept of a bank account (their study money was stored there) – will walk through the system in a minute. The security choice they were asked to make was whether to turn on 2FA. In prior work that we have done, we found that people’s understanding of 2FA is in the middle of the road between passwords and updating/antivirus. 2FA is an explicit optimal choice that people understand relatively well, so that is what we picked. We used SMS for convenience here, even though it’s not maximally secure (and it is used in many places).
  17. they also read consent/short description in MTurk to clarify hacking
18. $5 is worth an average of an hour of an MTurker’s time. They needed to create an account in our experimental system and make a security choice. We made our system similar to the concept of a bank account (their study money was stored there) – will walk through the system in a minute. The security choice they were asked to make was whether to turn on 2FA. In prior work that we have done, we found that people’s understanding of 2FA is in the middle of the road between passwords and updating/antivirus. 2FA is an explicit optimal choice that people understand relatively well, so that is what we picked.