SlideShare una empresa de Scribd logo
1 de 62
Randomized & natural
experiments
MODELING SOCIAL DATA
JAKE HOFMAN
COLUMBIA UNIVERSITY
Example: Hospitalization on health
What’s wrong with estimating this model from observational data?
Health
tomorrow
Hospital
visit today Effect?
Arrow means “X causes Y”
Confounds
The effect and cause might be
confounded by a common cause,
and be changing together as a
result
Health
tomorrow
Hospital
visit today Effect?
Health
today
Dashed circle means “unobserved”
Confounds
If we only get to observe them
changing together, we can’t
estimate the effect of
hospitalization changing alone
Health
tomorrow
Hospital
visit today Effect?
Health
today
“To find out what happens when you change
something, it is necessary to change it.”
-GEORGE BOX
Hospital
visit today
Random assignment
Random assignment determines the treatment independent of any
confounds
Health
tomorrowEffect?
Health
today
Coin flip
Double lines mean
“intervention”
Counterfactuals
To isolate the causal effect, we have to change one and only one
thing (hospital visits), and compare outcomes
+ vs
(what happened)
Reality
(what would have happened)
Counterfactual
Counterfactuals
We never get to observe what would have happened if we did
something else, so we have to estimate it
+ vs
(what happened)
Reality
(what would have happened)
Counterfactual
Random assignment
We can use randomization to create two groups that differ only in
which treatment they receive, restoring symmetry
+
World 1 World 2
Heads Tails
Random assignment
We can use randomization to create two groups that differ only in
which treatment they receive, restoring symmetry
+
World 1 World 2
Heads Tails
Random assignment
We can use randomization to create two groups that differ only in
which treatment they receive, restoring symmetry
+
World 1 World 2
Random assignment
Dunning (2012)
Problems
Random assignment is the “gold standard” for causal inference, but
can be misleading under certain circumstances
◦ Small sample sizes
◦ Researcher degrees of freedom
◦ Publication bias
◦ P-hacking
http://bit.ly/fdrtree
Electronic copy available at: https://ssrn.com/abstract=1850704
designed to demonstrate something false: that certain songs
can change listeners’ age. Everything reported here actually
happened.1
Study 1:musical contrast and subjective age
In Study 1, we investigated whether listening to a children’s
song induces an age contrast, making people feel older. In
exchange for payment, 30 University of Pennsylvania under-
graduates sat at computer terminals, donned headphones, and
were randomly assigned to listen to either a control song
(“Kalimba,” an instrumental song by Mr. Scruff that comes
free with the Windows 7 operating system) or a children’s
song (“Hot Potato,” performed by The Wiggles).
After listening to part of the song, participants com-
pleted an ostensibly unrelated survey: They answered the
question “How old do you feel right now?” by choosing
among five options (very young, young, neither young nor
old, old, and very old). They also reported their father’s
age, allowing us to control for variation in baseline age
across participants.
An analysis of covariance (ANCOVA) revealed the pre-
dicted effect: People felt older after listening to “Hot Potato”
tematic analysis of how researcher degrees of freedom influ-
ence statistical significance. Impatient readers can consult
Table 3.
“How Bad Can It Be?” Simulations
Simulationsof common researcher degreesof
freedom
We used computer simulations of experimental data to esti-
mate how researcher degrees of freedom influence the proba-
bility of a false-positive result. These simulations assessed
the impact of four common degrees of freedom: flexibility in
(a) choosing among dependent variables, (b) choosing sample
size, (c) using covariates, and (d) reporting subsets of experi-
mental conditions. We also investigated various combinations
of these degrees of freedom.
We generated random samples with each observation inde-
pendently drawn from a normal distribution, performed sets of
analyses on each sample, and observed how often at least one
of the resulting p values in each sample was below standard
significance levels. For example, imagine a researcher who
collects two dependent variables, say liking and willingness to
by guest on November 20, 2011pss.sagepub.comDownloaded from
1360 Simmonset al.
of ambiguous information and remarkably adept at reaching
justifiable conclusions that mesh with their desires (Babcock
& Loewenstein, 1997; Dawson, Gilovich, & Regan, 2002;
Gilovich, 1983; Hastorf & Cantril, 1954; Kunda, 1990; Zuck-
erman, 1979). This literature suggests that when we as
researchers face ambiguous analytic decisions, we will tend to
conclude, with convincing self-justification, that the appropri-
ate decisions are those that result in statistical significance
(p ≤ .05).
Ambiguity is rampant in empirical research. As an exam-
ple, consider a very simple decision faced by researchers ana-
lyzing reaction times: how to treat outliers. In a perusal of
roughly 30 Psychological Science articles, we discovered con-
siderable inconsistency in, and hence considerable ambiguity
about, this decision. Most (but not all) researchers excluded
some responses for being too fast, but what constituted “too
fast” varied enormously: the fastest 2.5%, or faster than 2 stan-
dard deviations from the mean, or faster than 100 or 150 or
200 or 300 ms. Similarly, what constituted “too slow” varied
enormously: the slowest 2.5% or 10%, or 2 or 2.5 or 3 stan-
dard deviations slower than the mean, or 1.5 standard devia-
tions slower from that condition’s mean, or slower than 1,000
or 1,200 or 1,500 or 2,000 or 3,000 or 5,000 ms. None of these
decisions is necessarily incorrect, but that fact makes any of
them justifiable and hence potential fodder for self-serving
(adjusted M = 2.54 years) than after listening to the control
song (adjusted M = 2.06 years), F(1, 27) = 5.06, p = .033.
In Study 2, we sought to conceptually replicate and extend
Study 1. Having demonstrated that listening to a children’s
song makes people feel older, Study 2 investigated whether
listening to a song about older age makes people actually
younger.
Study 2:musical contrast and chronological
rejuvenation
Using the same method as in Study 1, we asked 20 University
of Pennsylvania undergraduates to listen to either “When I’m
Sixty-Four” by The Beatles or “Kalimba.” Then, in an ostensi-
bly unrelated task, they indicated their birth date (mm/dd/
yyyy) and their father’s age. We used father’s age to control
for variation in baseline age across participants.
An ANCOVA revealed the predicted effect: According to
their birth dates, people were nearly a year-and-a-half younger
after listening to “When I’m Sixty-Four” (adjusted M = 20.1
years) rather than to “Kalimba” (adjusted M = 21.5 years),
F(1, 17) = 4.92, p = .040.
Discussion
Psychological Science
22(11) 1359–1366
© TheAuthor(s) 2011
Reprintsand permission:
sagepub.com/journalsPermissions.nav
DOI:10.1177/0956797611417632
http://pss.sagepub.com
False-Positive Psychology:Undisclosed
Flexibility in Data Collection and Analysis
AllowsPresentingAnything asSignificant
Joseph P. Simmons1
, Leif D. Nelson2
, and Uri Simonsohn1
1
TheWharton School, University of Pennsylvania, and 2
Haas School of Business, University of California,Berkeley
Abstract
In thisarticle,weaccomplish two things.First,weshow that despite empirical psychologists’ nominal endorsement of alow rate
of false-positive findings (£ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive
rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence
that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy
it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost,
and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for
authors and four guidelines for reviewers,all of which impose aminimal bur den on the publication process.
Keywords
General Article
False-Positive Psychology 1361
Table 1. Likelihood of ObtainingaFalse-Positive Result
Significance level
Researcher degrees of freedom p < .1 p < .05 p < .01
Situation A:two dependent variables (r = .50) 17.8% 9.5% 2.2%
Situation B: addition of 10 more observations
per cell
14.5% 7.7% 1.6%
Situation C: controllingfor gender or interaction
of gender with treatment
21.6% 11.7% 2.7%
Situation D: dropping(or not dropping) one of
three conditions
23.2% 12.6% 2.8%
Combine Situations A and B 26.0% 14.4% 3.3%
Combine Situations A,B,and C 50.9% 30.9% 8.4%
Combine Situations A,B,C,and D 81.5% 60.7% 21.5%
Note: The table reportsthe percentage of 15,000 simulated samplesin which at least one of a
set of analyseswassignificant. Observationswere drawn independently from anormal distribu-
tion.Baseline isatwo-condition design with 20 observationsper cell.Resultsfor SituationA were
obtained by conductingthree t tests,one on each of two dependent variablesand athird on the
average of these two variables.Resultsfor Situation B were obtained by conductingone t test after
collecting20 observationsper cell and another after collectingan additional 10 observationsper
cell.Resultsfor Situation C were obtained by conductinga t test,an analysisof covariance with a
Caveats / limitations
Random assignment is the “gold standard” for causal inference, but
it has some limitations:
◦ Randomization often isn’t feasible and/or ethical
◦ Experiments are costly in terms of time and money
◦ It’s difficult to create convincing parallel worlds
◦ Inevitably people deviate from their random assignments
Anyone can flip a coin, but it’s difficult to create convincing parallel
worlds
Natural experiments
Natural experiments
Sometimes we get lucky and nature effectively runs experiments for
us, e.g.:
◦ As-if random: People are randomly exposed to water sources
◦ Instrumental variables: A lottery influences military service
◦ Discontinuities: Star ratings get arbitrarily rounded
◦ Difference in differences: Minimum wage changes in just one state
Natural experiments
Sometimes we get lucky and nature effectively runs experiments for
us, e.g.:
◦ As-if random: People are randomly exposed to water sources
◦ Instrumental variables: A lottery influences military service
◦ Discontinuities: Star ratings get arbitrarily rounded
◦ Difference in differences: Minimum wage changes in just one state
Experiments happen all the time, we just have to notice them
As-if random
Idea: Nature randomly assigns
conditions
Example: People are randomly
exposed to water sources (Snow,
1854)
http://bit.ly/johnsnowmap
Instrumental
variables
Idea: An instrument
independently shifts the
distribution of a
treatment
Example: A lottery
influences military
service (Angrist, 1990)
Military
service
Future
earningsEffect?
Confounds
Lottery
Dunning (2012)
Figure 4: Average Revenue around Discontinuous Changes in Rating
Notes: Each restaurant’s log revenue is de-meaned to normalize a restaurant’s average log
revenue to zero. Normalized log revenues are then averaged within bins based on how far the
restaurant’s rating is from a rounding threshold in that quarter. The graph plots average log
revenue as a function of how far the rating is from a rounding threshold. All points with a
positive (negative) distance from a discontinuity are rounded up (down).
Regression
discontinuities
Idea: Things change around an
arbitrarily chosen threshold
Example: Star ratings get
arbitrarily rounded (Luca, 2011)
http://bit.ly/yelpstars
Difference in
differences
Idea: Compare differences after a
sudden change with trends in a
control group
Example: Minimum wage changes
in just one state (Card & Krueger,
1994)
http://stats.stackexchange.com/a/125266
Natural experiments: Caveats
Natural experiments are great, but:
◦ Good natural experiments are hard to find
◦ They rely on many (untestable) assumptions
◦ The treated population may not be the one of interest
Natural experiments: Caveats
Natural experiments are great, but:
◦ Good natural experiments are hard to find
◦ They rely on many (untestable) assumptions
◦ The treated population may not be the one of interest
Sometimes we can use additional data + algorithms to
automatically find natural experiments
Discovering natural
experiments
Example: How
much traffic to
recommender
systems cause?
(Sharma, Hofman & Watts, 2015)
Observational estimates
Naively, up to 30% of pageviews come through recommendations
Observational estimates
Naively, up to 30% of pageviews come through recommendations
Observational estimates
Naively, up to 30% of pageviews come through recommendations
But this is almost surely an overestimate of the effect
Typical browsing
WITH RECOMMENDATIONS
Typical browsing: Search
Typical browsing: Focal product
Typical browsing: Recommended product
Counterfactual browsing
WITHOUT RECOMMENDATIONS
Counterfactual browsing: Search
Counterfactual browsing: Focal product
Counterfactual browsing: Search again
Counterfactual browsing
Confound: Correlated demand
Some views would have
happened anyway due to
correlated demand
We call these convenience clicks
Direct
views of
hats
Referred
click-
throughs
to gloves
Effect?
Demand
for winter
items
Ideally we would run an A/B test where randomly selected people
see recommendations
Ideal experiment
World 1 World 2
Natural experiment
Instead, we can exploits sudden shocks in traffic to focal products as
an instrumental variable
Direct
views of
focal
product
Referred
click-
throughsEffect?
Demand
External
shock
Usual
approach
Think hard for a source
of random variation that
only directly affects focal
product (e.g., author
wins award)
Usual
approach
Think hard for a source
of random variation that
only directly affects focal
product (e.g., author
wins award)
Problem: Impossible to
rule out side effects
New approach: Automatically discovering
natural experiments
Look for products that receive shocks in direct traffic, while their
recommendations do not
New approach: Automatically discovering
natural experiments
Look for products that receive shocks in direct traffic, while their
recommendations do not
New approach: Automatically discovering
natural experiments
Applying this method to 23 million pageviews in the Bing Toolbar
logs, we find over 4,000 such natural experiments
New approach: Automatically discovering
natural experiments
Causal click-through rate is just the marginal gain in clicks during the
shock
Causal effect = Change in recommendation clicks / Size of shock
Causal click-through rate is just the
marginal gain in clicks during the
shock
Causal effect = Change in rec clicks /
Size of shock
Results
Although recommendation click-
throughs account for a large fraction
of traffic, at least 75% of this activity
would likely occur in the absence of
recommendations*
Results
*With lots of caveats
Closing thoughts
Large-scale observational data is useful for building predictive
models of a static world
Closing thoughts
But without appropriate random variation, it’s hard to
predict what happens when you change something in the
world
Closing thoughts
Randomized experiments are like custom-made datasets to
answer a specific question
Closing thoughts
Additional data + algorithms can help us discover and
analyze these examples in the wild

Más contenido relacionado

Similar a Modeling Social Data, Lecture 12: Causality & Experiments, Part 2

8 chapter eightpowerpoint
8 chapter eightpowerpoint8 chapter eightpowerpoint
8 chapter eightpowerpoint
sagebennet
 
ANSWER TO MODULE 5 ASSIGNMENT Essay
ANSWER TO MODULE 5 ASSIGNMENT EssayANSWER TO MODULE 5 ASSIGNMENT Essay
ANSWER TO MODULE 5 ASSIGNMENT Essay
Tammy Moncrief
 
Psy intro to research methods reading
Psy intro to research methods readingPsy intro to research methods reading
Psy intro to research methods reading
Robert Matek
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
karlhennesey
 
General Psychology Interpret an instance of behavior (individual .docx
General Psychology Interpret an instance of behavior (individual .docxGeneral Psychology Interpret an instance of behavior (individual .docx
General Psychology Interpret an instance of behavior (individual .docx
lianaalbee2qly
 
Functional Braces And Its Effects On The Body
Functional Braces And Its Effects On The BodyFunctional Braces And Its Effects On The Body
Functional Braces And Its Effects On The Body
Deb Birch
 
1 A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd
1  A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd1  A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd
1 A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd
VannaJoy20
 
The Relationship Between Body Image And The Media
The Relationship Between Body Image And The MediaThe Relationship Between Body Image And The Media
The Relationship Between Body Image And The Media
Jessica Myers
 
Sheet1ParticipantPretestPost Test115172493632121417145211562525720.docx
Sheet1ParticipantPretestPost Test115172493632121417145211562525720.docxSheet1ParticipantPretestPost Test115172493632121417145211562525720.docx
Sheet1ParticipantPretestPost Test115172493632121417145211562525720.docx
lesleyryder69361
 

Similar a Modeling Social Data, Lecture 12: Causality & Experiments, Part 2 (20)

8 chapter eightpowerpoint
8 chapter eightpowerpoint8 chapter eightpowerpoint
8 chapter eightpowerpoint
 
ANSWER TO MODULE 5 ASSIGNMENT Essay
ANSWER TO MODULE 5 ASSIGNMENT EssayANSWER TO MODULE 5 ASSIGNMENT Essay
ANSWER TO MODULE 5 ASSIGNMENT Essay
 
Statistics basics
Statistics basicsStatistics basics
Statistics basics
 
Bradford Hill Criteria.ppt
Bradford Hill Criteria.pptBradford Hill Criteria.ppt
Bradford Hill Criteria.ppt
 
Lab Report 3
Lab Report 3Lab Report 3
Lab Report 3
 
Psy intro to research methods reading
Psy intro to research methods readingPsy intro to research methods reading
Psy intro to research methods reading
 
3 understanding and applying epidemiology hb lead-conf
3  understanding and applying epidemiology hb lead-conf3  understanding and applying epidemiology hb lead-conf
3 understanding and applying epidemiology hb lead-conf
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
 
Basics of Research and Bias
Basics of Research and BiasBasics of Research and Bias
Basics of Research and Bias
 
PSY 341 Judgement, Decisions, Reasoning Notes Abyana
PSY 341 Judgement, Decisions, Reasoning Notes AbyanaPSY 341 Judgement, Decisions, Reasoning Notes Abyana
PSY 341 Judgement, Decisions, Reasoning Notes Abyana
 
Debunk bullshit in statistics QN
Debunk bullshit in statistics QNDebunk bullshit in statistics QN
Debunk bullshit in statistics QN
 
General Psychology Interpret an instance of behavior (individual .docx
General Psychology Interpret an instance of behavior (individual .docxGeneral Psychology Interpret an instance of behavior (individual .docx
General Psychology Interpret an instance of behavior (individual .docx
 
Module7_RamdomError.pptx
Module7_RamdomError.pptxModule7_RamdomError.pptx
Module7_RamdomError.pptx
 
Module5_Study_Design.pptx
Module5_Study_Design.pptxModule5_Study_Design.pptx
Module5_Study_Design.pptx
 
Functional Braces And Its Effects On The Body
Functional Braces And Its Effects On The BodyFunctional Braces And Its Effects On The Body
Functional Braces And Its Effects On The Body
 
Study-Designs-in-Epidemiology-2.pdf
Study-Designs-in-Epidemiology-2.pdfStudy-Designs-in-Epidemiology-2.pdf
Study-Designs-in-Epidemiology-2.pdf
 
Lecture 2 Research Methods.pptx
Lecture 2 Research Methods.pptxLecture 2 Research Methods.pptx
Lecture 2 Research Methods.pptx
 
1 A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd
1  A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd1  A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd
1 A PRIMER ON CAUSALITY Marc F. Bellemare∗ Introd
 
The Relationship Between Body Image And The Media
The Relationship Between Body Image And The MediaThe Relationship Between Body Image And The Media
The Relationship Between Body Image And The Media
 
Sheet1ParticipantPretestPost Test115172493632121417145211562525720.docx
Sheet1ParticipantPretestPost Test115172493632121417145211562525720.docxSheet1ParticipantPretestPost Test115172493632121417145211562525720.docx
Sheet1ParticipantPretestPost Test115172493632121417145211562525720.docx
 

Más de jakehofman

NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
jakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
jakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
jakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
jakehofman
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
jakehofman
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part I
jakehofman
 

Más de jakehofman (20)

Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part I
 

Último

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 

Último (20)

BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2

  • 1. Randomized & natural experiments MODELING SOCIAL DATA JAKE HOFMAN COLUMBIA UNIVERSITY
  • 2. Example: Hospitalization on health What’s wrong with estimating this model from observational data? Health tomorrow Hospital visit today Effect? Arrow means “X causes Y”
  • 3. Confounds The effect and cause might be confounded by a common cause, and be changing together as a result Health tomorrow Hospital visit today Effect? Health today Dashed circle means “unobserved”
  • 4. Confounds If we only get to observe them changing together, we can’t estimate the effect of hospitalization changing alone Health tomorrow Hospital visit today Effect? Health today
  • 5. “To find out what happens when you change something, it is necessary to change it.” -GEORGE BOX
  • 6. Hospital visit today Random assignment Random assignment determines the treatment independent of any confounds Health tomorrowEffect? Health today Coin flip Double lines mean “intervention”
  • 7. Counterfactuals To isolate the causal effect, we have to change one and only one thing (hospital visits), and compare outcomes + vs (what happened) Reality (what would have happened) Counterfactual
  • 8. Counterfactuals We never get to observe what would have happened if we did something else, so we have to estimate it + vs (what happened) Reality (what would have happened) Counterfactual
  • 9. Random assignment We can use randomization to create two groups that differ only in which treatment they receive, restoring symmetry + World 1 World 2 Heads Tails
  • 10. Random assignment We can use randomization to create two groups that differ only in which treatment they receive, restoring symmetry + World 1 World 2 Heads Tails
  • 11. Random assignment We can use randomization to create two groups that differ only in which treatment they receive, restoring symmetry + World 1 World 2
  • 13. Problems Random assignment is the “gold standard” for causal inference, but can be misleading under certain circumstances ◦ Small sample sizes ◦ Researcher degrees of freedom ◦ Publication bias ◦ P-hacking
  • 15.
  • 16. Electronic copy available at: https://ssrn.com/abstract=1850704 designed to demonstrate something false: that certain songs can change listeners’ age. Everything reported here actually happened.1 Study 1:musical contrast and subjective age In Study 1, we investigated whether listening to a children’s song induces an age contrast, making people feel older. In exchange for payment, 30 University of Pennsylvania under- graduates sat at computer terminals, donned headphones, and were randomly assigned to listen to either a control song (“Kalimba,” an instrumental song by Mr. Scruff that comes free with the Windows 7 operating system) or a children’s song (“Hot Potato,” performed by The Wiggles). After listening to part of the song, participants com- pleted an ostensibly unrelated survey: They answered the question “How old do you feel right now?” by choosing among five options (very young, young, neither young nor old, old, and very old). They also reported their father’s age, allowing us to control for variation in baseline age across participants. An analysis of covariance (ANCOVA) revealed the pre- dicted effect: People felt older after listening to “Hot Potato” tematic analysis of how researcher degrees of freedom influ- ence statistical significance. Impatient readers can consult Table 3. “How Bad Can It Be?” Simulations Simulationsof common researcher degreesof freedom We used computer simulations of experimental data to esti- mate how researcher degrees of freedom influence the proba- bility of a false-positive result. These simulations assessed the impact of four common degrees of freedom: flexibility in (a) choosing among dependent variables, (b) choosing sample size, (c) using covariates, and (d) reporting subsets of experi- mental conditions. We also investigated various combinations of these degrees of freedom. We generated random samples with each observation inde- pendently drawn from a normal distribution, performed sets of analyses on each sample, and observed how often at least one of the resulting p values in each sample was below standard significance levels. For example, imagine a researcher who collects two dependent variables, say liking and willingness to by guest on November 20, 2011pss.sagepub.comDownloaded from 1360 Simmonset al. of ambiguous information and remarkably adept at reaching justifiable conclusions that mesh with their desires (Babcock & Loewenstein, 1997; Dawson, Gilovich, & Regan, 2002; Gilovich, 1983; Hastorf & Cantril, 1954; Kunda, 1990; Zuck- erman, 1979). This literature suggests that when we as researchers face ambiguous analytic decisions, we will tend to conclude, with convincing self-justification, that the appropri- ate decisions are those that result in statistical significance (p ≤ .05). Ambiguity is rampant in empirical research. As an exam- ple, consider a very simple decision faced by researchers ana- lyzing reaction times: how to treat outliers. In a perusal of roughly 30 Psychological Science articles, we discovered con- siderable inconsistency in, and hence considerable ambiguity about, this decision. Most (but not all) researchers excluded some responses for being too fast, but what constituted “too fast” varied enormously: the fastest 2.5%, or faster than 2 stan- dard deviations from the mean, or faster than 100 or 150 or 200 or 300 ms. Similarly, what constituted “too slow” varied enormously: the slowest 2.5% or 10%, or 2 or 2.5 or 3 stan- dard deviations slower than the mean, or 1.5 standard devia- tions slower from that condition’s mean, or slower than 1,000 or 1,200 or 1,500 or 2,000 or 3,000 or 5,000 ms. None of these decisions is necessarily incorrect, but that fact makes any of them justifiable and hence potential fodder for self-serving (adjusted M = 2.54 years) than after listening to the control song (adjusted M = 2.06 years), F(1, 27) = 5.06, p = .033. In Study 2, we sought to conceptually replicate and extend Study 1. Having demonstrated that listening to a children’s song makes people feel older, Study 2 investigated whether listening to a song about older age makes people actually younger. Study 2:musical contrast and chronological rejuvenation Using the same method as in Study 1, we asked 20 University of Pennsylvania undergraduates to listen to either “When I’m Sixty-Four” by The Beatles or “Kalimba.” Then, in an ostensi- bly unrelated task, they indicated their birth date (mm/dd/ yyyy) and their father’s age. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040. Discussion
  • 17. Psychological Science 22(11) 1359–1366 © TheAuthor(s) 2011 Reprintsand permission: sagepub.com/journalsPermissions.nav DOI:10.1177/0956797611417632 http://pss.sagepub.com False-Positive Psychology:Undisclosed Flexibility in Data Collection and Analysis AllowsPresentingAnything asSignificant Joseph P. Simmons1 , Leif D. Nelson2 , and Uri Simonsohn1 1 TheWharton School, University of Pennsylvania, and 2 Haas School of Business, University of California,Berkeley Abstract In thisarticle,weaccomplish two things.First,weshow that despite empirical psychologists’ nominal endorsement of alow rate of false-positive findings (£ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers,all of which impose aminimal bur den on the publication process. Keywords General Article
  • 18. False-Positive Psychology 1361 Table 1. Likelihood of ObtainingaFalse-Positive Result Significance level Researcher degrees of freedom p < .1 p < .05 p < .01 Situation A:two dependent variables (r = .50) 17.8% 9.5% 2.2% Situation B: addition of 10 more observations per cell 14.5% 7.7% 1.6% Situation C: controllingfor gender or interaction of gender with treatment 21.6% 11.7% 2.7% Situation D: dropping(or not dropping) one of three conditions 23.2% 12.6% 2.8% Combine Situations A and B 26.0% 14.4% 3.3% Combine Situations A,B,and C 50.9% 30.9% 8.4% Combine Situations A,B,C,and D 81.5% 60.7% 21.5% Note: The table reportsthe percentage of 15,000 simulated samplesin which at least one of a set of analyseswassignificant. Observationswere drawn independently from anormal distribu- tion.Baseline isatwo-condition design with 20 observationsper cell.Resultsfor SituationA were obtained by conductingthree t tests,one on each of two dependent variablesand athird on the average of these two variables.Resultsfor Situation B were obtained by conductingone t test after collecting20 observationsper cell and another after collectingan additional 10 observationsper cell.Resultsfor Situation C were obtained by conductinga t test,an analysisof covariance with a
  • 19.
  • 20. Caveats / limitations Random assignment is the “gold standard” for causal inference, but it has some limitations: ◦ Randomization often isn’t feasible and/or ethical ◦ Experiments are costly in terms of time and money ◦ It’s difficult to create convincing parallel worlds ◦ Inevitably people deviate from their random assignments Anyone can flip a coin, but it’s difficult to create convincing parallel worlds
  • 21.
  • 22.
  • 23.
  • 25. Natural experiments Sometimes we get lucky and nature effectively runs experiments for us, e.g.: ◦ As-if random: People are randomly exposed to water sources ◦ Instrumental variables: A lottery influences military service ◦ Discontinuities: Star ratings get arbitrarily rounded ◦ Difference in differences: Minimum wage changes in just one state
  • 26. Natural experiments Sometimes we get lucky and nature effectively runs experiments for us, e.g.: ◦ As-if random: People are randomly exposed to water sources ◦ Instrumental variables: A lottery influences military service ◦ Discontinuities: Star ratings get arbitrarily rounded ◦ Difference in differences: Minimum wage changes in just one state Experiments happen all the time, we just have to notice them
  • 27. As-if random Idea: Nature randomly assigns conditions Example: People are randomly exposed to water sources (Snow, 1854) http://bit.ly/johnsnowmap
  • 28. Instrumental variables Idea: An instrument independently shifts the distribution of a treatment Example: A lottery influences military service (Angrist, 1990) Military service Future earningsEffect? Confounds Lottery
  • 30. Figure 4: Average Revenue around Discontinuous Changes in Rating Notes: Each restaurant’s log revenue is de-meaned to normalize a restaurant’s average log revenue to zero. Normalized log revenues are then averaged within bins based on how far the restaurant’s rating is from a rounding threshold in that quarter. The graph plots average log revenue as a function of how far the rating is from a rounding threshold. All points with a positive (negative) distance from a discontinuity are rounded up (down). Regression discontinuities Idea: Things change around an arbitrarily chosen threshold Example: Star ratings get arbitrarily rounded (Luca, 2011) http://bit.ly/yelpstars
  • 31. Difference in differences Idea: Compare differences after a sudden change with trends in a control group Example: Minimum wage changes in just one state (Card & Krueger, 1994) http://stats.stackexchange.com/a/125266
  • 32. Natural experiments: Caveats Natural experiments are great, but: ◦ Good natural experiments are hard to find ◦ They rely on many (untestable) assumptions ◦ The treated population may not be the one of interest
  • 33. Natural experiments: Caveats Natural experiments are great, but: ◦ Good natural experiments are hard to find ◦ They rely on many (untestable) assumptions ◦ The treated population may not be the one of interest Sometimes we can use additional data + algorithms to automatically find natural experiments
  • 35. Example: How much traffic to recommender systems cause? (Sharma, Hofman & Watts, 2015)
  • 36. Observational estimates Naively, up to 30% of pageviews come through recommendations
  • 37. Observational estimates Naively, up to 30% of pageviews come through recommendations
  • 38. Observational estimates Naively, up to 30% of pageviews come through recommendations But this is almost surely an overestimate of the effect
  • 48. Confound: Correlated demand Some views would have happened anyway due to correlated demand We call these convenience clicks Direct views of hats Referred click- throughs to gloves Effect? Demand for winter items
  • 49. Ideally we would run an A/B test where randomly selected people see recommendations Ideal experiment World 1 World 2
  • 50. Natural experiment Instead, we can exploits sudden shocks in traffic to focal products as an instrumental variable Direct views of focal product Referred click- throughsEffect? Demand External shock
  • 51. Usual approach Think hard for a source of random variation that only directly affects focal product (e.g., author wins award)
  • 52. Usual approach Think hard for a source of random variation that only directly affects focal product (e.g., author wins award) Problem: Impossible to rule out side effects
  • 53. New approach: Automatically discovering natural experiments Look for products that receive shocks in direct traffic, while their recommendations do not
  • 54. New approach: Automatically discovering natural experiments Look for products that receive shocks in direct traffic, while their recommendations do not
  • 55. New approach: Automatically discovering natural experiments Applying this method to 23 million pageviews in the Bing Toolbar logs, we find over 4,000 such natural experiments
  • 56. New approach: Automatically discovering natural experiments Causal click-through rate is just the marginal gain in clicks during the shock Causal effect = Change in recommendation clicks / Size of shock
  • 57. Causal click-through rate is just the marginal gain in clicks during the shock Causal effect = Change in rec clicks / Size of shock Results
  • 58. Although recommendation click- throughs account for a large fraction of traffic, at least 75% of this activity would likely occur in the absence of recommendations* Results *With lots of caveats
  • 59. Closing thoughts Large-scale observational data is useful for building predictive models of a static world
  • 60. Closing thoughts But without appropriate random variation, it’s hard to predict what happens when you change something in the world
  • 61. Closing thoughts Randomized experiments are like custom-made datasets to answer a specific question
  • 62. Closing thoughts Additional data + algorithms can help us discover and analyze these examples in the wild

Notas del editor

  1. What we’d like to do is to make a copy of the world