Measurement Meets Usability:
Applying Survey and
Economic Methods to
Usable Security
Elissa M. Redmiles
@eredmil1
eredmiles@cs.umd.edu
Survey Methods for
Security and Privacy Research
What is a Survey?
A set of questions assessing different constructs
Questions can be open-ended: the respondent inputs any answer
Or closed-ended: offer answer choices and the respondent selects one or more of them
“identifying a specific group or category of people and collecting
information from some of them in order to gain insight into what the
entire group does or thinks”
Handbook of International Survey Methodology
How People Answer Questions
Who Should Answer?
• You rarely survey the whole population, so you select a sample
• The sample should be representative of your population, so that your results are generalizable (with some limitations)
Ingredients of a Survey
Research Questions (What Do I Want to Know?)
Constructs (What Do I Need to Measure to Answer RQs?)
Questions (How Can I Design Valid Questions to Measure My Constructs?)
Sample (Who Can I Get To Take My Survey Given My Resources?)
Analysis (How Can I Answer My Research Question?)
Question Design Principles
Word Choice Matters
The word ‘usually’ was interpreted in 24 different ways
by one study’s participants
Variation creates data that’s hard to compare
Even worse for security and privacy surveys because
these often involve domain-specific / technical language
Research indicates that respondents often ignore
written definitions provided with questions
Likert Scales
Used to assess nuanced feelings, e.g., agreement
• Good scales are between 4 and 10 points
• Even-numbered scales elicit stronger responses (no neutral option)
• Scales should always be balanced
Double-Barreled Questions
“Do you believe that your employer should require you to update your computer and change your password every six months?”
• Requires respondents to provide a single answer about both:
   • a requirement to update their computer
   • a requirement to change their passwords
Social Desirability Bias
Pressure to Say the "Right"
Answer
• If you’re asking respondents about a tool you built, they may feel pressure to be positive
• If you’re asking them questions that they think have a correct answer, or a societally correct answer, they may feel pressure to respond in a certain way
Mitigation: Softening Wording
• People have many different practices
when it comes to updating their
computers. Which of the following most
closely matches what you do …
Order Bias
Ordering of questions or
answer choices changes
responses
• Online, people pick the top answer choice most often
• On the phone, they pick the last choice
So, randomize question and answer order to counteract that bias!
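A minimal sketch of per-respondent randomization of answer-choice order, assuming a hypothetical `choices` list and a web backend; most survey platforms expose this as a built-in setting, so this is only an illustration of the idea.

```python
import random

def randomized_choices(choices, respondent_seed=None):
    """Return a per-respondent random ordering of answer choices.

    Anchor options such as "Other" or "Prefer not to answer" are kept
    at the end so only the substantive choices are shuffled.
    """
    rng = random.Random(respondent_seed)  # seed on a respondent ID for reproducibility
    anchors = [c for c in choices if c in {"Other", "Prefer not to answer"}]
    shuffled = [c for c in choices if c not in anchors]
    rng.shuffle(shuffled)
    return shuffled + anchors

# Example: each respondent sees the substantive options in a different order
print(randomized_choices(
    ["Friend or peer", "Family member", "Co-worker", "Prefer not to answer"],
    respondent_seed=42,
))
```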
Demographic Questions &
Stereotype Threat
But don’t randomize the order in which demographic questions appear
Questions affect the answers to other questions
Research shows that asking women and minorities about their demographics
before asking them math questions makes them perform worse
Security and privacy questions are likely to have similar stereotype threat
Rule of thumb: ask about demographics at the very end
Length & Cognitive Load
The longer the survey the worse the quality of the
answers & the lower the response rates
20 minutes maximum is a good rule of thumb
Also think about how hard the questions are
Mitigation: Pretesting
Automated Tools
• QUAID: http://quaid.cohmetrix.com/
Cognitive Interviews
• Have respondents think aloud as they answer questions
• Prompt them on terms that they may interpret differently / struggle with
Expert Reviews
• Recruit ~3 survey methodology experts to review the survey
• Your library or statistics department may offer these services
Piloting (BEWARE)
• Run a small sample of the survey
• This provides the least information: just because you get answers you like
doesn’t mean they are accurate!
Even Better: Use Existing (Validated) Questions
• Pew Surveys
• Reason-Rupe Surveys
• iPoll Database of Survey Questions
Security and Privacy Specific:
• SEBIS (Egelman and Peer 2015)
• Westin Privacy Index (or IUIPC)
• Rader & Wash 2015
• Web Use Skill Index (Hargittai & Hsieh 2013)
Ethics!
Don't Know
• You should offer respondents the option to say “prefer
not to answer” or “don’t know” for required questions
• Otherwise, don’t require the question!
Code of
Conduct
• Requiring a question without this option is a violation of the
American Association for Public Opinion Research code of ethics
• AND explicitly disallowed by many Institutional
Review Boards
Question Game
Exercise: Design &
Pre-test a survey
Instructions
• Form groups
• Select a research question:
   Do people in intimate relationships (e.g., married couples) share accounts? What do they share and why?
   Do people with different educational backgrounds practice different security behaviors?
   Do people understand what end-to-end encryption means?
• Design a 5-question survey + demographic questions
• Conduct cognitive interviews with 3 people in the room and revise your questions
Conducting the Survey
Mode and Sampling
Paper
• Good for low-tech populations; rarely used
Phone
• Good for low-tech populations
• Allows for “probabilistic” sample
• CATI: computer-assisted telephone interviewing
• …many more variations
Web
• Often used in security, privacy, and HCI studies
• Highest non-response rate of any mode
• Cheapest
Survey Sampling
• You rarely survey the whole population, so you select a sample
• The sample should be representative of your population, so that your results are generalizable (with some limitations)
Types of Samples
Probabilistic
• (only possible with phone or paper)
Nearly probabilistic
• GFK Knowledge Panel
Census representative, non-probability
• SSI, Qualtrics
• Google Consumer Surveys
Crowdsourced samples
• Prolific, Amazon Mechanical Turk, Crowdflower
Convenience or Snowball Samples
• Posting on social media, asking friends to take your survey
(Cost decreases roughly from the top to the bottom of this list)
How Well Do My Results Generalize?
Comparing Security and Privacy Survey Results
from MTurk and Web Panels to the U.S.
Elissa M. Redmiles, Sean Kross, and Michelle L. Mazurek
go.umd.edu/sampleComparison
Three Samples
PSRAI
• Mode: telephone
• Type: probabilistic (CI 2.7%)
• n = 3,000
• Price: ~$80,000
SSI
• Mode: Web
• Type: Census-representative panel
• n = 428
• Price: $1,500
MTurk
• Mode: Web
• Type: Crowdsourced
• n = 480
• Price: $500
Four Sets of Questions
Internet Behavior
Information Sources: Online Protection
Knowledge: Protective Behaviors
Negative Experiences
Internet Behavior
• Do you ever use the internet to...?
• Use social media such as Facebook, Twitter, or Instagram
• Apply for a job
• Apply for government benefits or assistance
• Apply for a loan or cash advance
• Search for sensitive health information
• Buy a product, such as books, toys, music, or clothing
Information Sources: Online Protection
• To which of the following have you turned for advice about how to protect your personal information online?
• Friend or Peer
• Family Member
• Co-worker
• Librarian or resource at library
• Government website
• Website run by a private organization
• Teacher
Knowledge: Protective Behaviors
• Do you feel as though you already know enough about...?
• Choosing strong passwords to protect your online accounts
• Managing privacy settings for the information you share online
• Understanding the privacy policies of the websites and applications you use
• Protecting the security of your devices when using public WiFi networks
• Protecting your computer or mobile devices from viruses and malware
• Avoiding online scams and fraudulent requests for your personal information
Negative Experiences
• As far as you know, have you ever...?
• Had important personal information stolen, such as your Social Security Number, your credit card, or bank account information?
• Had inaccurate information show up in your credit report?
• Had an email or social networking account of yours compromised or taken over without your permission by someone else?
• Been the victim of an online scam and lost money?
• Experienced persistent and unwanted contact from someone online?
• Lost a job opportunity or educational opportunity because of something that was posted online?
• Experienced trouble in a relationship or friendship because of something that was posted online?
• Had someone post something about you online that you didn't want shared?
Both web samples systematically over-report online behavior
Census-rep. web panel systematically over-reports experiences
Both web samples over-report getting advice from websites
Web samples (esp. census rep.) under-report knowledge/confidence
Different Populations, Different Samples
Crowdsourced:
• Age 18-49
• Education: Some College+
Census Rep. Web:
• Age 50+
• Education: HS or less
MTurk most generalizable overall
Weighting on demographics helps slightly (from 13/28 differences to 11/28)
Ethical caveat:
your results are an upper / lower bound based on a high-skill / high-awareness population;
you CANNOT draw conclusions re: socioeconomic variance
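A minimal sketch of the kind of demographic weighting mentioned above (simple post-stratification: weight each respondent by population share divided by sample share of their demographic cell). The cell definitions and census targets here are illustrative assumptions, not the paper's actual weighting scheme.

```python
import pandas as pd

# Illustrative census targets for one weighting variable (assumed values)
population_share = {"18-49": 0.55, "50+": 0.45}

def poststratification_weights(df, column, targets):
    """Weight = population share / sample share for each respondent's cell."""
    sample_share = df[column].value_counts(normalize=True)
    return df[column].map(lambda cell: targets[cell] / sample_share[cell])

# Toy sample skewed toward younger respondents, as crowdsourced samples often are
sample = pd.DataFrame({"age_group": ["18-49"] * 80 + ["50+"] * 20})
sample["weight"] = poststratification_weights(sample, "age_group", population_share)

# Weighted estimates (e.g., a weighted mean of any survey response) then use these weights
print(sample.groupby("age_group")["weight"].first())
```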
Survey Analysis
Open-Answer Survey Analyses
• Qualitative open coding
Example approach:
   Select 10% of the responses; two researchers go through and create a “codebook” (set of categories) that represents the responses
   The two researchers then code the remaining responses separately and compute an intercoder agreement metric (e.g., Krippendorff’s Alpha, Cohen’s Kappa)
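A minimal sketch of computing intercoder agreement on the doubly coded responses, assuming each coder's labels are stored in a list aligned by response; `sklearn.metrics.cohen_kappa_score` covers Cohen's Kappa, while Krippendorff's Alpha needs a dedicated package.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned independently by two researchers
# to the same open-ended responses, using the shared codebook.
coder_a = ["advice", "habit", "advice", "fear", "habit", "fear", "advice"]
coder_b = ["advice", "habit", "advice", "habit", "habit", "fear", "advice"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's Kappa: {kappa:.2f}")  # values above ~0.7-0.8 are usually treated as acceptable

# Disagreements (here, the fourth response) are typically discussed and resolved
# before reporting final code frequencies.
```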
Closed-Answer Survey Analyses
Logistic Regression Analysis
• Tests the relationship between a group of independent variables and one (binary) dependent variable
   What factors are (cor)related with whether people back up their computer?
   How well can I predict whether people will back up their computers based on factors 1, 2, and 3?
   Backup ~ factor1 + factor2 + factor3
• Compare the responses of two or more groups
   Do people who are A say X more than people who are B?
   Hypothesis test (χ² test, t-test)
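A minimal sketch of both closed-answer analyses, assuming a hypothetical DataFrame with a binary `backup` column, three illustrative predictors, and a grouping variable; the formula mirrors the Backup ~ factor1 + factor2 + factor3 model above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 500
# Hypothetical survey data (random placeholders, not real responses)
df = pd.DataFrame({
    "backup": rng.integers(0, 2, n),      # 1 = reports backing up their computer
    "factor1": rng.normal(size=n),        # e.g., internet skill score
    "factor2": rng.integers(0, 2, n),     # e.g., prior negative experience
    "factor3": rng.integers(18, 80, n),   # e.g., age
    "group": rng.choice(["A", "B"], n),   # e.g., educational background
})

# Logistic regression: Backup ~ factor1 + factor2 + factor3
model = smf.logit("backup ~ factor1 + factor2 + factor3", data=df).fit(disp=False)
print(model.summary())  # odds ratios are np.exp(model.params)

# Chi-squared test: do groups A and B report backing up at different rates?
table = pd.crosstab(df["group"], df["backup"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```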
Sample Sizes
• Power analyses are often used to determine if your sample will be big enough to see effects (e.g., differences in the data)
• Different sample sizes are needed to answer different questions; typically ~500 is sufficient for most analyses
Regression (participants needed):
• 5 variables, small effect: 641
• 20 variables, small effect: 1,043
• 5 variables, medium effect: 85
• 20 variables, medium effect: 135
Correlation (participants needed):
• r = 0.1: 782
• r = 0.3: 84
• r = 0.5: 28
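A minimal sketch of one such power analysis: the sample size needed to detect a correlation of a given magnitude, using the standard Fisher z approximation and assuming alpha = 0.05 (two-sided) and power = 0.80. Its outputs are close to the correlation rows in the table above; exact values depend on the approximation and rounding used.

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Sample size needed to detect a correlation of r (Fisher z approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = norm.ppf(power)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: n = {n_for_correlation(r)}")
# prints roughly: r = 0.1: 783, r = 0.3: 85, r = 0.5: 30

# For regressions and group comparisons, statsmodels.stats.power offers
# analogous solve_power() helpers (e.g., TTestIndPower for two-group t-tests).
```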
That’s Nice, But I Don’t
Believe You
Asking for a Friend:
Evaluating Response
Biases in Security User
Studies
Redmiles, E.M., Zhu, Z., Kross, S., Kuchhal, D., Dumitras, T., and Mazurek, M.L.
To appear at CCS2018
The Question
When people self-report estimated
security behavior on a survey,
do the population estimates
match real life?
Two Sets of Data
Log Data
• Symantec Antivirus host records
• Responses to software update prompts
• System variables: crashes, etc.
Survey Data
• Responses to the same update prompts
• Collect the same "system" variables
The Question
Imagine that you see the message
below appear on your computer.
Would you install the update?
• Yes, the first time I saw this message.
• Yes, within a week of seeing this
message.
• Yes, within a few weeks of seeing this
message.
• Yes, within a few months of seeing this
message.
• No.
• I don’t know.
The Answer
• The speed with which people say they would update matches the
real world, but with a systematic bias:
• when asked about themselves, people overestimate by 2
points of frequency
• when asked about their friends they overestimate by 1 point of
frequency
• The text of the message matters in real life, but not in the survey
• Multiple hypotheses:
• Insufficient incentives to affect response
• Insufficient attention in surveys to notice specific text
• Insufficient context for the text to be relevant
Now That We (Maybe)
Believe The Surveys,
What Can We DO with Them?
Case Study 1:
Where is the Digital Divide? A Survey of Security, Privacy, and Socioeconomics.
Elissa M. Redmiles, Sean Kross, and Michelle L. Mazurek
Published @ CHI2017
Elissa Redmiles
Case Study 2:
Human Perceptions of Fairness in Algorithmic Decision Making:
A Case Study of Criminal Risk Prediction.
Nina Grgić-Hlača, Elissa M. Redmiles, Krishna P. Gummadi, and Adrian Weller
Published @ WWW2018
Algorithmic Decision Making
• Algorithms help people make decisions about
   Hiring
   Assigning social benefits
   Granting bail
[Diagram: human decision making increasingly assisted or replaced by ML-based algorithmic decision making]
Are these algorithms fair?
Decision Making Pipeline
[Diagram: Inputs → Decision Making System → Outputs]
Is it fair to use a feature?
Is it Fair to Use a Feature?
Normative
• Prescribe how fair decisions ought to be made
• Anti-discrimination laws
• Sensitive (race, gender) vs. non-sensitive features
Descriptive
• Describe human perceptions of fairness
• Beyond discrimination?
   • Volitional (Father’s history)
   • Relevant (Education)
   • Reliable
   • …
Case Study: COMPAS
Defendant’s answers to the
COMPAS questionnaire
• Current charge
• Family criminal history
• Performance in School
• Nothing Legally Sensitive
Survey 1: Do Judgments of Fairness Differ
[Chart: average fairness rating, on a 1-7 scale, for each group of COMPAS features]
How are People Making Fairness Judgments?
• How do we determine if a feature is fair to be used?
• Example
   Many believe it is not fair to assign bail based on the criminal history of family and friends
   Why?
• A person is responsible only for their voluntary choices
   Philosophical arguments on luck egalitarianism
• Is this feature volitional?
How are People Making Fairness Judgments?
• How do we determine if a feature is fair to be used?
• Example
   Many believe it is not fair to assign bail based on grades in high school
   Why?
• In the legal domain, evidence is admissible only if it is relevant
• Is this feature relevant?
Hypothesis: Latent Properties
• Reliable?
• Relevant?
• Private?
• Volitional?
• Causes Outcome?
• Causes Vicious Cycle?
• Causes Disparity in Outcomes?
• Caused by Sensitive Group Membership?
→ Fairness of Using the Feature
Survey 2: Do People Use Properties?
[Chart: how frequently each latent property is used in participants' fairness reasoning; usage frequencies of 74%, 43%, 41%, 27%, 23%, 21%, 17%, 15%, and 3%]
Analysis 2: Predict Fairness from Properties?
We can predict fairness judgments with 88% accuracy
[Diagram: ratings of the latent properties (Reliable? Relevant? Private? Volitional? …) predicting the fairness of using the feature]
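A minimal sketch of the kind of model such an analysis could use (the paper's exact features and classifier are not reproduced here): predict a binary fair/unfair judgment from respondents' ratings of the latent properties and report held-out accuracy. All column names and data below are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
# Hypothetical per-(respondent, feature) ratings of latent properties on 1-7 scales
props = pd.DataFrame(
    rng.integers(1, 8, size=(n, 4)),
    columns=["reliable", "relevant", "volitional", "private"],
)
# Toy target: fairness judgments loosely driven by relevance and reliability
fair = (props["relevant"] + props["reliable"] + rng.normal(0, 2, n) > 8).astype(int)

clf = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(clf, props, fair, cv=5, scoring="accuracy").mean()
print(f"Cross-validated accuracy: {accuracy:.2f}")
```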
Do people reach consensus in their fairness judgments?
[Chart: consensus in fairness judgments, on a 0-1 scale]
Disagreements on Latent Properties Cause Disagreements on Fairness
[Diagram: the latent properties (Reliable? Relevant? Private? Volitional?) and the causal properties (Causes Outcome? Causes Vicious Cycle? Causes Disparity in Outcomes? Caused by Sensitive Group Membership?) feeding into the fairness of using the feature; the causal properties show low consensus, and consensus on the properties is correlated with consensus on fairness]
Exercise: Can you draw that
conclusion?
Does the Method Support the
Conclusion?
1. The authors conducted a survey on MTurk in order to explore sharing
behaviors. They conclude that socioeconomics (income, education) do not
affect sharing practices.
Does the Method Support the
Conclusion?
2. The authors are studying why people avoid using 2FA. Their survey design
contained all closed-answer questions. All questions were required and
none offered an “I don’t know” or other answer choice. They conclude that
people do not use 2FA because they are afraid of being locked out of their
accounts.
Does the Method Support the
Conclusion?
3. The authors observed how perception of data breach incidents changed
over time by conducting surveys using SSI at yearly intervals –
administering the same survey questionnaire each time (but asking about
different incidents from the past year). They concluded that people have
become fatigued by data breaches and are exhibiting increasingly weak
reactions.
Economic Methods for
Security and Privacy Research
There’s More to Life Than Question Asking
Behavioral Economics
• Method of observing decision-making in controlled experiments with economic incentives
• Unlike traditional economics: don’t assume that the user is rational
Case Study 1:
What Is Privacy Worth?
Alessandro Acquisti, Leslie K. John, and George Loewenstein
In The Journal of Legal Studies
Experimental Design
• Mall intercepts
• Subjects offered Visa gift cards for taking a survey
   $10 gift card: “name will not be linked to the transactions completed with this card.”
   $12 gift card: “name will be linked to the transactions completed with this card.”
Alessandro Acquisti, Leslie K. John, and George Loewenstein
Experimental Design (cont.)
• Four conditions:
   $10 endowed: Keep the anonymous $10 card or exchange it for an identified $12 card.
   $12 endowed: Keep the identified $12 card or exchange it for an anonymous $10 card.
   $10 choice: Choose between an anonymous $10 card and an identified $12 card.
   $12 choice: Choose between an identified $12 card and an anonymous $10 card.
• $10 endowed condition: implicit choice to sell privacy for $2
• $12 endowed condition: implicit choice to pay $2 for privacy
Alessandro Acquisti, Leslie K. John, and George Loewenstein
Results
• Less than half of the people who started with $10 were willing to give up privacy for $2
• Yet, less than 10% of people who started with $12 were willing to pay $2 for more privacy
• People’s willingness to give up privacy has been used to argue for low privacy regulation
• Yet, companies may just be playing framing tricks
Alessandro Acquisti, Leslie K. John, and George Loewenstein
Case Study 2:
Dancing Pigs or Externalities?
Measuring the Rationality of
Security Decisions
Elissa M. Redmiles, Michelle L. Mazurek, and John P. Dickerson
Appeared at EC2018
Theories of Security Behavior
Elissa Redmiles
The user's going to
pick dancing
pigs over security every
time. –Bruce Schneier
The user rationally ignores
security advice because the
costs outweigh the risk. --
Herley, 2009
boundedly rational security actor
with predictable and consistent,
but not always utility-optimal, behavior
based on risks and costs
Can We Prove It?
Behavioral Economics Experimental System
Behavioral economics experiment
Amazon Mechanical Turk (Crowd Worker) participants
Online experimental system
Simulating a bank account
Make a security choice: enable/don’t enable 2FA
2FA was SMS-based
Measurement System
Elissa Redmiles
Create Account on bank.cs → Learn risk of hacking (H) → Learn protection offered by 2FA (P) → Make 2FA Decision → Log in to system regularly
P = 50% or 90%
H = 1%, 20%, or 50%
You begin the study with $1 in your bank account. Each time you log in (at most once per day) you will earn an additional $1.
Variables Measured
Elissa Redmiles
Demographics: Gender, Age, Education
Security Decision: Enable/Don’t Enable 2FA
Password Strength: measured with neural net pwd. meter [Ur et al.]
Signup & Login Times: measured as seconds with the tab in focus
Security Behavior Intention [Egelman et al.]
Internet Skill [Hargittai & Hsieh]
Experiment
Elissa Redmiles
Round 1 (5 days, up to $5) → Break (5 days) → Round 2 (5 days, up to $5)
Conditions (H = hacking risk, P = protection from 2FA):
• H=1%, P=50%, Endow | Earn
• H=1%, P=90%, Endow
• H=20%, P=50%, Endow | Earn
• H=50%, P=50%, Endow
• H=50%, P=90%, Endow | Earn
[Flow-diagram participant counts: 150, 125, 107]
What We Learned About Human Decision-Makers (in Security)
When given the choice, people leverage cost and risk in “reasonable” directions
[but with lots of influence from anchoring effects]
Anchoring effects explain 35% of the variance in security behavior we observe
Also: costs (time to log in), risks (chance of hacking, amount of protection offered by 2FA), and endowment effects (an additional 26% of behavior variance)
We Can Measure Rationality
Elissa Redmiles
Cost is defined as wage-earning time loss
Utility of 2FA is defined as the $$$ saved if a hack occurred
Rational 2FA use: the utility of the user’s choice is greater than the cost
48% rational in Round 1
56% rational in Round 2 (a significant, medium-sized learning effect)
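A minimal sketch of a rationality check in that spirit, under illustrative assumptions: the cost of enabling 2FA is the wage value of the extra login time it adds over the study, and the expected benefit is the hack probability times the protection 2FA offers times the money at stake. None of the constants below are the study's actual parameters.

```python
def rational_to_enable_2fa(hack_prob, protection, balance_at_risk,
                           extra_seconds_per_login, logins, hourly_wage):
    """True if the expected savings from 2FA exceed its wage-time cost."""
    expected_benefit = hack_prob * protection * balance_at_risk
    time_cost = (extra_seconds_per_login * logins / 3600) * hourly_wage
    return expected_benefit > time_cost

# Illustrative example (assumed numbers): 20% hack risk, 2FA blocks 90% of losses,
# $5 at stake, 30 extra seconds per login over 10 logins, at a $10/hour wage.
print(rational_to_enable_2fa(0.20, 0.90, 5.00, 30, 10, 10.00))
```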
This Was But a Brief Overview
• You can also incentivize survey participants to respond in ways that agree with their peers (peer prediction)
• You can run other kinds of experiments (e.g., where people actually buy sensitive products online and have choices of where to buy from: private or not)
• But: remember that the world isn’t just about money. Money can only proxy for certain losses, etc., but it can help “up the ante” on creating ecological validity in some cases
Thank you!
Elissa Redmiles: eredmiles@cs.umd.edu
UMaryland / UMichigan joint survey methodology program
Summer Institute in Survey Research Techniques:
http://si.isr.umich.edu/overview
Editor's notes

  1. fewer options can create bias by requiring the respondent to pick something that doesn’t quite fit, while too many options might render differences in responses meaningless.
  2. Example of paper with
3. Snowball: a few uses; sex worker example. Crowdsourced: cheap, good for piloting, also good for longitudinal and other tasks. Demographically diverse panels: optimal; $1,500 minimum for some places, usually around $3/response. GfK does probabilistic sampling but is web, so you get nonresponse that's not predictable. Probabilistic: $1,500 per item for 1,000 answers, so around $20-80k per survey.
  4. Really depends on conditions and number of variables
  5. Machine learning algorithms are increasingly being used to assist or even replace human decision making. Decisions that used to be made by humans are nowadays often made with the help of machine learning based decision making systems. For example, nowadays, people use algorithms to help them decide If they should hire someone or not If they should give someone social benefits or not And even if they should grant someone bail or not All of these decisions have immense impacts on human lives. Because of that, we need to make sure that the algorithms that are making these decisions are fair.
  6. Before specifying what we exactly mean when we say that algorithmic decision making systems need to be fair, first, let’s take a look at how a typical algorithmic decision making system works. Let’s consider a system built to help judges decide if they should grant someone bail or not. A set of inputs, in this case features about the defendant, is fed to a decision making system. The decision making system in this case might be a classifier, which in turn produces some outputs, for example a binary decision of “grant bail” or “deny bail”. We see that we have several distinct parts in this decision making pipeline. In this talk we’ll focus on the first part, and consider the fairness of the inputs. More precisely, we consider the fairness of the features used as inputs, and try to understand and account for human perceptions on whether a feature is fair to be used in that decision making system, or not.
  7. How do we answer that question? How do we determine if it is fair to use a feature or not? One could take a normative approach, and try to proscribe how fair decisions ought to be made. For example, one could refer to anti-discrimination laws, and say that it is prohibited to use sensitive features, such as race or gender. On the other hand, non-sensitive features can be used freely. However, we can take an alternative approach. Instead of proscribing which features can be used, we could try a descriptive approach, and describe human perceptions on which features are fair to be used. We could ask people which features they think it’s fair to use, and see if we discover some interesting findings, to see if human perceptions of fairness of using features goes beyond the binary distinction between sensitive and non-sensitive features. But, what else could make a feature be perceived as unfair to be used, except for it being sensitive? Let’s go back to our example of a system designed to help judges make bail decisions. That classifier might be using the defendant’s criminal history as an input feature. But what if it used the criminal history of the defendant’s father, a feature that is not volitional? Or, what if it used information about the defendant’s education, which might not be relevant to the task of making bail decisions? Or what if it used features that are not reliable, or ones that are privacy sensitive?
  9. Well, we decided to look into what people think about using these features. We conducted a series of surveys asking people to tell us how fair they believe it is to use these features. We asked them to rate the fairness on a 7 point Likert scale, where 1 denotes that the feature is completely unfair, while 7 that it is completely fair. Below, we see 10 groups of questions from the COMPAS questionnaire, ranging from the criminal history of the defendant, to their personality, and all the way to the criminal history of their family, and their education. We asked 196 Amazon Mechanical Turk master workers from the US to rate the fairness of using these feature to grant bail. On the y axis, we show the average fairness rating assigned to each of these features. We see that many of the features, such as the previously mentioned education and family criminal history, are considered unfair to be used in this decision making scenario. This leads to ask why are these features considered unfair to be used?
  10. How are people making these judgments about the fairness of using features? Are there any particular reasons that naturally come to mind when you think about why it is fair or unfair to use a feature? Let’s take a look, for example at the feature about the criminal history of the defendant’s family. Many people believe it is not fair to assign bail based on this feature. Why is that the case? Well, it is possible that people believe that people should be held responsible only for their voluntary choices, but not penalized for their unchosen circumstances. Such reasoning would be consistent with philosophical arguments on luck egalitarianism. So, is this feature volitional?
  11. Let’s take a look at another feature. Many people believe it is not fair to assign bail based on the defendant’s grades in high-school. Why might that be the case? Well, in the legal domain, a piece of evidence is admissible only if it is relevant. Is a person’s GPA relevant for the task of assigning bail?
15. The “can’t care less” model; the “security is always ineffective” model. Herley: in his theoretical paper, he provides a cost-benefit analysis of having end-users follow security advice. He argues that threats are so rare – and suggested behaviors so ineffective – as to make it logical for end users to never adopt security behaviors. The boundedly rational model: proposed in privacy; many point-studies on warning fatigue, message design, and specific behaviors allude to this, but no one has defined/measured it. Why do we care to define and measure? Once we know what they’re paying attention to when making decisions/how they make those decisions, we can start to adjust what they are doing by adjusting those parameters.
16. Explain MTurk (only US users): crowdsourcing service where you can hire people to do tasks for money. Whiter and more technical than other countries; limited to one place to somewhat limit variance in the value of money. Participants -> Amazon Mechanical Turk crowd workers; talk about using hourly wage. Remove realistic -- Online: $5 is worth an average of an hour of an MTurker’s time. They needed to create an account in our experimental system and make a security choice. We made our system similar to the concept of a bank account (their study money was stored there) – will walk through the system in a minute. The security choice they were asked to make was whether to turn on 2FA. In prior work that we have done, we found that people’s understanding of 2FA is in the middle of the road between passwords and updating/antivirus. 2FA is an explicit optimal choice that people understand relatively well, so that is what we picked. We used SMS for convenience here, even though it’s not maximally secure (and it is used in many places).
  17. they also read consent/short description in MTurk to clarify hacking
18. $5 is worth an average of an hour of an MTurker’s time. They needed to create an account in our experimental system and make a security choice. We made our system similar to the concept of a bank account (their study money was stored there) – will walk through the system in a minute. The security choice they were asked to make was whether to turn on 2FA. In prior work that we have done, we found that people’s understanding of 2FA is in the middle of the road between passwords and updating/antivirus. 2FA is an explicit optimal choice that people understand relatively well, so that is what we picked.