Respondent Driven Sampling &
Network Sampling with Memory
(time permitting…)
M. Giovanna Merli
Sanford School of Public Policy &
Duke Population Research Institute (DUPRI)
Duke University
Funding Acknowledgements
• RDS Data Collection in China (2009-2010)
– “Place-RDS Comparison Study”
• USAID under the terms of cooperative agreements GPO-A-00-03-00003-00 and
GPO-A-00-09-00003-0 (Weir, PI)
• China National Center for STD Control (Chen, PI)
• Duke CFAR AI064518 (Merli, PI)
– “Partnership for Social Science Research on HIV/AIDS in China”
• NICHD R24 HD056670 (Henderson, PI)
• RDS Data Analyses and Simulations (2011-2015)
– “Using Multiple Data Sources to Improve RDS Estimation”
• NICHD R01HD068523 (Merli, PI)
• NSM Data Collection in Tanzania
– PFirst Award/DGHI (Merli, PI)
Problems with the study of hidden
populations
Female sex workers, men who have sex with men, injecting drug users,
homeless people, and undocumented migrants are hidden populations
For these populations we typically want to:
• Obtain accurate and precise estimates of disease prevalence
• Discern impact on larger population health dynamics
• Identify gaps in HIV/STD prevention
Collecting representative data from hidden populations is difficult because
there is no sampling frame and their members are hard to identify
– Stigma
– Non-response
– Lack of trust
– Rarity
Problems with the study of hidden
populations
• Convenience samples, clinic-based inquiries,
and sampling frames with limited coverage
(e.g. venue-based sampling) lack a basis for
inference to the full population
Respondent Driven Sampling (RDS)
Heckathorn 1997, 2002; Salganik and Heckathorn 2004;
Volz and Heckathorn 2008
• Most popular solution to
problems of sampling
hidden populations
– 450+ studies
– 624+ papers, 10k+ citations
– Over $185 million from NIH
• Compare to “ego centric”
– 167 studies funded
– $42 million since 1990
How RDS works
• RDS is primarily used to estimate population proportions of binary
nodal covariates (e.g. gender, infection status, tier of sex work)
• Leverages social network of respondents to recruit other
respondents
• Chain referral / peer recruitment / link tracing sampling strategy
– “Seed” participants (selected by convenience) receive coupons (2)
– Recruit 2-3 new participants each
– Each new respondent given 2-3 coupons to recruit others
– Recruitment incentives for participating and for successful recruitment
– No one participates more than once
– Process continues until desired sample size is obtained
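The chain-referral steps above can be sketched as a small simulation. This is a minimal illustration, not the protocol used in any study cited here: the toy ring network, the function names `ring_network` and `simulate_rds`, and the rule that each participant hands coupons to randomly chosen unsampled contacts are all assumptions made for the example.

```python
import random
from collections import deque


def ring_network(n, k=2):
    """Hypothetical toy network: node i is tied to the k nearest nodes on each side."""
    return {i: [(i + j) % n for j in range(-k, k + 1) if j != 0] for i in range(n)}


def simulate_rds(neighbors, seeds, coupons=3, target_n=50, rng=None):
    """Simulate chain referral over an adjacency dict.

    Returns a list of (participant, recruiter) pairs; seeds have recruiter None.
    Assumption: recruiters pick uniformly at random among unsampled contacts.
    """
    rng = rng or random.Random(0)
    sampled = set()
    sample = []
    queue = deque()
    for seed in seeds:  # seeds are chosen by convenience, not at random
        if seed not in sampled and len(sample) < target_n:
            sampled.add(seed)
            sample.append((seed, None))
            queue.append(seed)
    while queue and len(sample) < target_n:
        recruiter = queue.popleft()
        # No one participates more than once.
        eligible = [v for v in neighbors[recruiter] if v not in sampled]
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:  # at most `coupons` recruits each
            if len(sample) >= target_n:
                break
            sampled.add(recruit)
            sample.append((recruit, recruiter))
            queue.append(recruit)
    return sample


network = ring_network(30)
sample = simulate_rds(network, seeds=[0], coupons=3, target_n=20)
```

Each returned pair records who recruited whom, which is the information the recruitment-based estimators rely on.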
Problems with estimation in link tracing
sampling designs of hidden populations
• Sampling frame unavailable
• Sample inclusion probabilities are not known (hence sampling weights unknown)
• Researchers have limited control of the sampling process
• Seed respondents not chosen at random
RDS solution
• Sampling probabilities are computed under an approximation of
the true sampling process
– RDS assumes non-seed participants are Sampled with Probability
Proportional to self-reported Degree (SPPD)
– Provable for a random walk on most graphs of interest
– Sampling probabilities are approximated by degree, hence sampling
weight = 1/degree
• Weighting/estimation can yield asymptotically unbiased
estimates of the population mean
• The SPPD assumption underpins most RDS estimation claims
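The random-walk claim behind SPPD can be checked numerically on a toy graph. The sketch below is an illustration under assumptions: the four-node graph and the function names are hypothetical, and the walk is an idealized with-replacement random walk on a connected, non-bipartite undirected graph, not an actual RDS chain.

```python
def stationary_by_iteration(neighbors, start, steps=500):
    """Propagate the walker's location probabilities for `steps` moves."""
    prob = {v: 0.0 for v in neighbors}
    prob[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in neighbors}
        for u, mass in prob.items():
            share = mass / len(neighbors[u])  # move to a uniformly chosen contact
            for v in neighbors[u]:
                nxt[v] += share
        prob = nxt
    return prob


def degree_share(neighbors):
    """Each node's degree divided by the sum of all degrees."""
    total = sum(len(contacts) for contacts in neighbors.values())
    return {v: len(contacts) / total for v, contacts in neighbors.items()}


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 tied to node 2.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = stationary_by_iteration(toy, start=3)
target = degree_share(toy)
```

After enough steps `walk` matches `target` regardless of the starting node, which is why 1/degree is used as the sampling weight.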
RDS estimators
Estimator: proportion equation (notes below each)
• Naïve: $p = \left(\sum_{i \in \chi} x_i\right) n^{-1}$
– $x_i$ is the value of the focal variable for respondent $i$; $n$ is the sample size
• RDS1-SH: $p = S_{0,1} d_0 \left(S_{0,1} d_0 + S_{1,0} d_1\right)^{-1}$
– $S_{a,b}$ is the estimated proportion of recruitments from group $a$ to $b$; $d_a$ is the estimated average degree in each group (Salganik and Heckathorn 2004)
• RDS1-LEN: $p = S^{ego}_{0,1} d_0 \left(S^{ego}_{0,1} d_0 + S^{ego}_{1,0} d_1\right)^{-1}$
– $S^{ego}_{a,b}$ is the estimated proportion of network ties from group $a$ to $b$ based on ego network reports (Lu 2013)
• RDS2-VH: $p = \left(\sum_{i \in \chi} x_i d_i^{-1}\right) \left(\sum_{i \in \chi} d_i^{-1}\right)^{-1}$
– $d_i^{-1}$ is the inverse of self-reported degree for person $i$ (Volz and Heckathorn 2008)
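The naïve, RDS1-SH, and RDS2-VH point estimates can be written as short functions. This is a sketch for illustration only: the function names and toy inputs are hypothetical, the recruitment proportions and mean degrees are taken as given, and variance estimation is not covered.

```python
def naive_estimate(x):
    """Unweighted sample proportion of a 0/1 focal variable."""
    return sum(x) / len(x)


def rds2_vh_estimate(x, degree):
    """Inverse-degree weighted proportion (the RDS2-VH form)."""
    weights = [1.0 / d for d in degree]
    return sum(xi * w for xi, w in zip(x, weights)) / sum(weights)


def rds1_sh_estimate(s01, s10, d0, d1):
    """RDS1-SH form from recruitment proportions and group mean degrees.

    s01: proportion of recruitments from group 0 to group 1
    s10: proportion of recruitments from group 1 to group 0
    d0, d1: estimated average degree in groups 0 and 1
    """
    return (s01 * d0) / (s01 * d0 + s10 * d1)


# Hypothetical toy sample: members with x = 1 report larger degrees.
x = [1, 1, 0, 0]
degree = [10, 10, 2, 2]
p_naive = naive_estimate(x)
p_vh = rds2_vh_estimate(x, degree)
p_sh = rds1_sh_estimate(s01=0.2, s10=0.4, d0=5, d1=10)
```

In the toy sample the high-degree respondents are down-weighted, so the weighted estimate falls below the naïve one.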
In RDS, all approximations are subject to critical
assumptions that are often not met in the field
• About the unobserved sample recruitment process (most crucial)
– Respondent gives a coupon to a friend
– Respondents recruit new participants non-preferentially from amongst their
social contacts (each friend has an equal chance of being picked)
– The initial set of respondents (“seeds”) is drawn at random
– Respondents report their number of ties accurately (“How many people do you
know who are members of the population of interest?”)
• About the social network structure
– Rapid mixing: The chain referral process converges very quickly to the
stationary distribution of a random walk (i.e. node selection probabilities are
independent of sample starting point)
– Connectedness: The target population must be connected by a network that
consists of a single component
– Network size: Network must be sufficiently large (sampling fraction small) that
sampling without replacement can be treated as if it is equivalent to sampling
with replacement
Prior evaluations of RDS
• Comparison of RDS estimates to known parameters of non-
hidden populations
– (Wejnert 2009; Wejnert & Heckathorn 2008; McCreesh et al. 2012)
• Test effects of violating RDS assumptions about social
network structure on synthetic populations
– (Gile & Handcock 2010; Thomas & Gile 2011; Lu et al. 2011)
• Examine effects of network structure in multiple empirical
settings with theoretical/ideal RDS samples
– (Goel & Salganik 2010; Mouw & Verdery 2012; Verdery, Mouw et al. 2015)
• Use full information on participants’ recruitment behavior to
evaluate non-preferential recruitment assumption
– (Yamanis, Merli, Neely et al. Sociological Methods and Research 2013)
RDS evaluation in the context of
Female Sex Workers in Liuzhou, China
• Evaluate SPPD assumption and population coverage (Merli, Moody, Smith et al. 2015, Social Science and Medicine)
• Evaluate performance of RDS estimators (Verdery, Merli, Moody et al. 2015, Epidemiology)
• Propose RDS data collection innovation to improve estimator performance (Verdery, Merli, Moody, In Progress)
• Evaluations with a simulation approach grounded in empirical data from a hidden population of FSWs in China (Liuzhou, Guangxi Province) (Weir, Merli, Li et al. 2012, Sexually Transmitted Infections)
Data
• Two sources
– RDS: 583 FSWs (Oct. 2009 – Feb. 2010) (about 8% of total
FSW population in Liuzhou)
– PLACE (venue based sampling approach): 161 FSWs (Nov.
2009 – Mar. 2010)
• Same target population and inclusion definition
– Women who reside in Liuzhou who exchanged sex for money in last 4 weeks
• Same geographic area and similar time period
• Same measurement of key variables
– Test for biomarker of lifetime exposure to syphilis and core questionnaire
• Same face-to-face interview and common applicant pool for interviewers
• Rare to have two concurrent surveys in same population!
Description of the Liuzhou RDS sample
Tier of sex work | Venues where clients are solicited | RDS (N = 576)
High | Karaoke bars, star hotels, discos, night clubs | 250
Middle | Hair salons, saunas, massage parlors, foot cleaning/massage, bathhouses | 268
Low | Streets, parks, other public spaces | 27
Non-venue based | Telephone, text, internet, private referrals | 31
Fisher and Merli 2014, Network Science.
Approach, part 1
• Construct “population social network” from data
collected in RDS and PLACE
– Used new methodologies for estimating social network
parameters and simulating population network
• Use Case Control Logistic Regression to estimate homophily
parameters from the RDS data (Smith, SM 2012)
• Use Exponential Random Graph Modeling to generate full
network from local structural features (ERGM; Handcock et al., JOSS 2008)
– Tested sensitivity of results to how this population social
network is constructed
• (which data source, venue size estimates, and assumptions
about geographic distribution of social network ties)
“Population social network”
Generate “population characteristics”
based on PLACE survey estimates
Add “population social network”
based on RDS survey estimates
Approach, part 2
• Simulate RDS chains over “population social
network” (1000 per recruitment scenario)
– Scenarios vary according to different sample
recruitment assumptions
• Seeding of the chain
• Recruitment patterns
– How much does the ideal case (random seeding
and random recruitment) diverge from actual RDS
seeding and recruitment matched to the Liuzhou
FSW data?
Results:
Violation of SPPD assumption
• Compared individual degree to the proportion of times an individual was sampled across the simulated chains
– Very high correlation when seeds and referrals are random
– SPPD assumption increasingly violated when seeds and referrals are matched to the actual data
– Over-recruitment of middle tier sex workers drives the result
• For more:
– Merli, Moody, Smith et al., Social Science & Medicine, 2015
[Figure: individual degree vs. proportion of times sampled, three panels; r = 0.82, r = 0.96, r = 0.97]
Merli, Moody, Smith et al., SSM, 2015
Distribution of RDS2-VH proportion estimates
(low/middle tier) across seeding and recruitment
scenarios
Verdery, Merli, Moody et al. 2015, Epidemiology
Variability of estimates: Design effects
(ratio of variance in RDS estimates to variance in estimates from same size SRS)
• DE very large, but not out of line with findings of prior work (Goel
and Salganik 2010)
• Large design effects imply that much larger sample sizes would
be required to reach the level of precision currently assumed for
RDS samples, which are typically in the hundreds
• CDC recommends RDS sample sizes in the hundreds for public
health surveillance. Implication: insufficient power to
identify changes in behaviors or disease prevalence
Scenario | DemDem | DemRan | RanRan
Middle Tier | 6.18 | 19.60 | 28.20
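One way to read these design effects is as an effective sample size: the size of a simple random sample that would give the same variance. The sketch below applies that standard conversion (n divided by DE) to the three middle-tier values, using the RDS sample size of 583 reported on the Data slide. The function name and the choice of n are assumptions for illustration, not a calculation taken from the cited papers.

```python
def effective_sample_size(n, design_effect):
    """Simple-random-sample size with the same variance as a design-based sample of size n."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n / design_effect


# Middle-tier design effects from the table, by scenario label.
design_effects = {"DemDem": 6.18, "DemRan": 19.60, "RanRan": 28.20}
n_rds = 583  # RDS sample size reported on the Data slide

effective = {
    scenario: effective_sample_size(n_rds, de)
    for scenario, de in design_effects.items()
}
```

Under these assumptions, a sample of 583 carries about as much information as a simple random sample of roughly 94 in the DemDem scenario, and fewer still in the other two.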
Discussion
• Seeding and recruitment scenarios
– Matching on seeds not critical
– Matching on recruitment patterns has a larger
effect, exacerbates biases but reduces design
effects
• Problematic because recruitment seems harder to control than
seed matching
Estimator performance
• Estimator development
– Only one (RDS1-LEN) works markedly better than others
• Robust to preferential recruitment by taking into account respondents’ ego-network composition
– BUT unusable for most (unobservable) characteristics we care about
– Still problems with variance estimation
Verdery, Merli, Moody et al. 2015, Epidemiology
Distributions of estimates of proportions in low
tiers of sex work by estimator (recruitment and
seeds matched to the Liuzhou FSW data)
Recent innovation: IP-RDS
(Verdery, Merli, Moody, In Progress)
• What can be done to improve the performance of RDS
estimates while retaining the method’s desirable peer-
driven sample recruitment properties?
• Modify RDS data collection process
• Apply antithetic variate mean estimator to data
• Results from simulations: Improved estimation
performance
New data collection protocol
IP-RDS
• Incentivize respondents to invert their
preferences when choosing new respondents,
i.e. respondents are asked to invert their
recruitment preferences on the recruitment
biasing variable (e.g. tier of sex work)
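“Random recruitment”
[Figure: nodes A, B, C, D; recruitment probabilities 3/9, 3/9, 3/9]
“Preferential recruitment”
[Figure: nodes A, B, C, D; recruitment probabilities 4/9, 4/9, 1/9]
“Inverse-Preferential recruitment”
[Figure: nodes A, B, C, D; recruitment probabilities 2/9, 2/9, 5/9]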
Antithetic variate mean estimator
• $\mu_{AV} = \frac{\sum_{i \in m_1} y_i}{2} + \frac{\sum_{i \in m_2} y_i}{2}$, where
– $y_i$ is the value of the focal variable for the $i$-th respondent
– $m_1$ is the count of recruitments by members of one group of the recruitment biasing variable (e.g. tier of sex work), and $m_2$ is the count of recruitments by members of the other group
Distributions of estimates of proportions in low/mid tiers of sex work
by estimator (naïve mean, RDS2-VH, AV-IP_RDS) and level of biased
recruitment behavior (absolute difference in recruitment probabilities
conditional on attribute of targeted peer)
Discussion of IP-RDS
• Simple change to RDS protocol
– May or may not require financial incentives for
targeted recruitment (empirical question)
• Outperforms conventional estimators
– Gains in bias reduction comparable to RDS1-LEN
estimator
• Tested on more networks (similar results)
• BUT … not yet field tested
Network Sampling with Memory
• Mouw and Verdery 2012, Sociological
Methodology
• Collects network data
• Introduces researcher’s control over the
sampling process
• Directs the recruitment process to more
efficiently explore the network (avoiding
bottlenecks)
How does NSM work?
• Recruitment starts with a few seed respondents
• Network roster data collected from respondents about
minimally identifying information of their network members
(last name and last four digits of cell phone number) to
connect nodes in the network (up to 10 network members per
respondent)
• NSM sampling algorithm selects up to 3 nominated network
members per respondent and asks respondents for full contact
information on these
• Process proceeds iteratively to recruit new waves of
respondents
Network data collection
How does NSM work?
• NSM sampling algorithm uses two sampling
modes, List and Search
• List mode
– keeps a list, L, of all nominated network members
– samples with replacement from L
– even sampling of new nodes -- new nodes sampled at
the same cumulative sampling rate as earlier nodes
– as list of sampled nodes approaches the full population
network, NSM sample converges to simple random
sampling
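The first two list-mode bullets (keep a list L of nominated members, sample from it with replacement) can be sketched as below. This is a deliberately simplified illustration: the class `ListModeSampler` and its methods are hypothetical names, draws are uniform over the current list, and the even-sampling adjustment for newly added nodes and the search mode are not implemented. It is not the algorithm as specified by Mouw and Verdery (2012).

```python
import random


class ListModeSampler:
    """Simplified list mode: a growing roster sampled uniformly with replacement."""

    def __init__(self, rng=None):
        self.roster = []        # the list L of all nominated network members
        self._seen = set()      # guards against duplicate roster entries
        self.draw_counts = {}   # how many times each member has been drawn
        self.rng = rng or random.Random(0)

    def add_nominations(self, members):
        """Add newly nominated members; already listed members are ignored."""
        for member in members:
            if member not in self._seen:
                self._seen.add(member)
                self.roster.append(member)

    def draw(self):
        """Draw one member from L with replacement and record the draw."""
        member = self.rng.choice(self.roster)
        self.draw_counts[member] = self.draw_counts.get(member, 0) + 1
        return member


sampler = ListModeSampler()
sampler.add_nominations(["a", "b", "c"])
sampler.add_nominations(["c", "d"])  # "c" is already on the list
draws = [sampler.draw() for _ in range(10)]
```

Because draws are with replacement, a member can be selected more than once; the counts are kept so repeated selections can be tracked.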
How does NSM work?
• Search mode: look for “bridge” nodes to
unexplored parts of the network. Start in
search mode, then switch to list mode.
Simulation results
• Test NSM vs. RDS using 162 university and school
networks from Facebook and Add Health
• Size of networks ranges from 300 to 16,500 nodes
• Estimate % white (Add Health) and % first-year students
(Facebook)
• Start from a randomly selected student, repeat 500
times for each network
• Calculate bias, design effects and mean absolute bias
• Test (162 networks): DE is 1.16 for NSM vs. 77.38 for RDS
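The three summary measures named above can be computed from repeated simulated estimates as in the sketch below. It assumes a known true proportion per network, uses the simple-random-sampling variance p(1 - p)/n without a finite population correction as the benchmark, and takes mean absolute bias as the average of per-network absolute biases. The function names and toy numbers are hypothetical, not results from the cited study.

```python
import statistics


def bias(estimates, truth):
    """Mean of the replicate estimates minus the true value."""
    return statistics.fmean(estimates) - truth


def design_effect(estimates, truth, n):
    """Variance across replicates divided by the SRS variance for the same n."""
    var_design = statistics.pvariance(estimates)
    var_srs = truth * (1.0 - truth) / n  # assumes no finite population correction
    return var_design / var_srs


def mean_absolute_bias(biases):
    """Average absolute bias across networks."""
    return statistics.fmean([abs(b) for b in biases])


# Hypothetical replicate estimates for one network with true proportion 0.5.
replicates = [0.4, 0.6, 0.5, 0.5]
b = bias(replicates, truth=0.5)
de = design_effect(replicates, truth=0.5, n=100)
mab = mean_absolute_bias([0.1, -0.1, 0.0])
```

A design effect near 1 means the method is about as precise as simple random sampling at the same sample size.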
Is it feasible?
• Is it feasible to collect network data on hidden
populations?
• 2010 NSIT (Network Survey of Immigration and
Transnationalism) (Mouw, PI)
• CAHS (Chinese in Africa Health Survey) (Merli, PI)
• Cost effectiveness of gains in precision
NSM field applications
Network Survey of Immigration and
Transnationalism (NSIT)
Mouw et al. 2014. Social Problems;
Verdery et al. 2016. Social Networks
Chinese in Africa Health Survey (CAHS)
Merli, Verdery, Mouw, Li 2016. Migration Studies
[Figure legend: red = RDU, blue = Mexico, green = Houston; small nodes = nominated, large nodes = sampled]
Network of Chinese migrants in Dar es Salaam
sampled by NSM, size = probability of selecting
next node
Key challenge: Getting referrals from
respondents
• NSIT required recontacting respondents to get
contact information on alters
• CAHS used a “forward” sampling variant (FNSM),
which is more practical
– Asked for contact information on a small number
of alters at each interview (selected by NSM
algorithm)
NSM -- Future directions
• NIH R21 grant to test NSM among Chinese
immigrants in RDU (Merli, Mouw, Verdery,
Moody, Keister, Sanders)
– Pilot various approaches to get referrals from
respondents
– Evaluate NSM against ACS
– Test multiple modes of data collection (in-person,
telephone, web)
48

Más contenido relacionado

Similar a 09 Respondent Driven Sampling and Network Sampling with Memory (2016)

A Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition OutcomeA Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition Outcome
soder145
 
Sampling for Quantities & Qualitative Research Abeer AlNajjar.docx
Sampling for Quantities & Qualitative Research  Abeer AlNajjar.docxSampling for Quantities & Qualitative Research  Abeer AlNajjar.docx
Sampling for Quantities & Qualitative Research Abeer AlNajjar.docx
anhlodge
 
Response Rates Impact Data Quality, But not How you Might Think
Response Rates Impact Data Quality, But not How you Might ThinkResponse Rates Impact Data Quality, But not How you Might Think
Response Rates Impact Data Quality, But not How you Might Think
Stephanie Eckman
 
Jpgrund et al peer methods review-icdrh2010-v2
Jpgrund et al peer methods review-icdrh2010-v2Jpgrund et al peer methods review-icdrh2010-v2
Jpgrund et al peer methods review-icdrh2010-v2
Jean-Paul Grund
 

Similar a 09 Respondent Driven Sampling and Network Sampling with Memory (2016) (20)

AAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysAAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveys
 
APLIC 2014 - Social Observatories Coordinating Network
APLIC 2014 - Social Observatories Coordinating NetworkAPLIC 2014 - Social Observatories Coordinating Network
APLIC 2014 - Social Observatories Coordinating Network
 
00 Partner or Perish? Public Health Delivery Systems & Cancer Screenings for ...
00 Partner or Perish? Public Health Delivery Systems & Cancer Screenings for ...00 Partner or Perish? Public Health Delivery Systems & Cancer Screenings for ...
00 Partner or Perish? Public Health Delivery Systems & Cancer Screenings for ...
 
00 Social Influence Effects on Men's HIV Testing
00 Social Influence Effects on Men's HIV Testing00 Social Influence Effects on Men's HIV Testing
00 Social Influence Effects on Men's HIV Testing
 
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant DivideDay 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
 
A Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition OutcomeA Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition Outcome
 
Topic_4_Survey.pdf
Topic_4_Survey.pdfTopic_4_Survey.pdf
Topic_4_Survey.pdf
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collection
 
02 Network Data Collection (2016)
02 Network Data Collection (2016)02 Network Data Collection (2016)
02 Network Data Collection (2016)
 
Sdal air health and social development (jan. 27, 2014) final
Sdal air health and social development (jan. 27, 2014) finalSdal air health and social development (jan. 27, 2014) final
Sdal air health and social development (jan. 27, 2014) final
 
Sampling for Quantities & Qualitative Research Abeer AlNajjar.docx
Sampling for Quantities & Qualitative Research  Abeer AlNajjar.docxSampling for Quantities & Qualitative Research  Abeer AlNajjar.docx
Sampling for Quantities & Qualitative Research Abeer AlNajjar.docx
 
Statistical models for the integration of multiple omics datasets
Statistical models for the integration of multiple omics datasetsStatistical models for the integration of multiple omics datasets
Statistical models for the integration of multiple omics datasets
 
Understanding ICPSR's Research Methods-related Metadata
Understanding ICPSR's Research Methods-related MetadataUnderstanding ICPSR's Research Methods-related Metadata
Understanding ICPSR's Research Methods-related Metadata
 
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
 
Response Rates Impact Data Quality, But not How you Might Think
Response Rates Impact Data Quality, But not How you Might ThinkResponse Rates Impact Data Quality, But not How you Might Think
Response Rates Impact Data Quality, But not How you Might Think
 
2016 Sessions: Estimating invisible risk populations
2016 Sessions: Estimating invisible risk populations2016 Sessions: Estimating invisible risk populations
2016 Sessions: Estimating invisible risk populations
 
Methodology 2.pptx
Methodology 2.pptxMethodology 2.pptx
Methodology 2.pptx
 
Jpgrund et al peer methods review-icdrh2010-v2
Jpgrund et al peer methods review-icdrh2010-v2Jpgrund et al peer methods review-icdrh2010-v2
Jpgrund et al peer methods review-icdrh2010-v2
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 

Más de Duke Network Analysis Center

Más de Duke Network Analysis Center (20)

01 Add Health Network Data Challenges: IRB and Security Issues
01 Add Health Network Data Challenges: IRB and Security Issues01 Add Health Network Data Challenges: IRB and Security Issues
01 Add Health Network Data Challenges: IRB and Security Issues
 
00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...
00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...
00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...
 
24 The Evolution of Network Thinking
24 The Evolution of Network Thinking24 The Evolution of Network Thinking
24 The Evolution of Network Thinking
 
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
 
20 Network Experiments
20 Network Experiments20 Network Experiments
20 Network Experiments
 
19 Electronic Medical Records
19 Electronic Medical Records19 Electronic Medical Records
19 Electronic Medical Records
 
18 Diffusion Models and Peer Influence
18 Diffusion Models and Peer Influence18 Diffusion Models and Peer Influence
18 Diffusion Models and Peer Influence
 
17 Statistical Models for Networks
17 Statistical Models for Networks17 Statistical Models for Networks
17 Statistical Models for Networks
 
15 Network Visualization and Communities
15 Network Visualization and Communities15 Network Visualization and Communities
15 Network Visualization and Communities
 
13 Community Detection
13 Community Detection13 Community Detection
13 Community Detection
 
09 Ego Network Analysis
09 Ego Network Analysis09 Ego Network Analysis
09 Ego Network Analysis
 
07 Whole Network Descriptive Statistics
07 Whole Network Descriptive Statistics07 Whole Network Descriptive Statistics
07 Whole Network Descriptive Statistics
 
04 Network Data Collection
04 Network Data Collection04 Network Data Collection
04 Network Data Collection
 
02 Introduction to Social Networks and Health: Key Concepts and Overview
02 Introduction to Social Networks and Health: Key Concepts and Overview02 Introduction to Social Networks and Health: Key Concepts and Overview
02 Introduction to Social Networks and Health: Key Concepts and Overview
 
00 Differentiating Between Network Structure and Network Function
00 Differentiating Between Network Structure and Network Function00 Differentiating Between Network Structure and Network Function
00 Differentiating Between Network Structure and Network Function
 
00 Arrest Networks and the Spread of Violent Victimization
00 Arrest Networks and the Spread of Violent Victimization00 Arrest Networks and the Spread of Violent Victimization
00 Arrest Networks and the Spread of Violent Victimization
 
00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...
00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...
00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...
 
00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...
 
12 SN&H Keynote: Thomas Valente, USC
12 SN&H Keynote: Thomas Valente, USC12 SN&H Keynote: Thomas Valente, USC
12 SN&H Keynote: Thomas Valente, USC
 
11 Siena Models for Selection & Influence
11 Siena Models for Selection & Influence 11 Siena Models for Selection & Influence
11 Siena Models for Selection & Influence
 

Último

Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Silpa
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
ANSARKHAN96
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
Silpa
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Silpa
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 

Último (20)

Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 

09 Respondent Driven Sampling and Network Sampling with Memory (2016)

  • 1. Respondent Driven Sampling & Network Sampling with Memory (time permitting…) M. Giovanna Merli Sanford School of Public Policy & Duke Population Research Institute (DUPRI) Duke University
  • 2. Funding Acknowledgements • RDS Data Collection in China (2009-2010) – “Place-RDS Comparison Study” • USAID under the terms of cooperative agreements GPO-A-00-03-00003-00 and GPO-A-00-09-00003-0 (Weir, PI) • China National Center for STD Control (Chen, PI) • Duke CFAR AI064518 (Merli, PI) – “Partnership for Social Science Research on HIV/AIDS in China” • NICHD R24 HD056670 (Henderson, PI) • RDS Data Analyses and Simulations (2011-2015) – “Using Multiple Data Sources to Improve RDS Estimation” • NICHD R01HD068523 (Merli, PI) • NSM Data Collection in Tanzania – PFirst Award/DGHI (Merli, PI) 2
  • 3. Problems with the study of hidden populations Female sex workers, men who have sex with men, injecting drug users, homeless, undocumented migrants are hidden populations For these populations we typically want to: • Obtain accurate and precise estimates of disease prevalence • Discern impact on larger population health dynamics • Identify gaps in HIV/STD prevention Collecting data from hidden populations to infer population representation is difficult because of the absence of a sampling frame – their members are hard to identify – Stigma – Non response – Lack of trust – Rarity 3
  • 4. Problems with the study of hidden populations • Convenience samples, clinic-based inquiries, and sampling frames with limited coverage (e.g. venue based sampling) lack basis for inferring representation 4
  • 5. Respondent Driven Sampling (RDS) Heckathorn 1997, 2002; Salganik and Heckathorn 2004; Volz and Heckathorn 2008 • Most popular solution to problems of sampling hidden populations – 450+ studies – 624+ papers, 10k+ citations – Over $185 million from NIH • Compare to “ego centric” – 167 studies funded – $42 million since 1990 5
  • 6. How RDS works • RDS primarily used to estimate population proportions of binary nodal covariates (e.g. gender, infection status, tier of sex work, etc.) • Leverages social network of respondents to recruit other respondents • Chain referral / peer recruitment / link tracing sampling strategy – “Seed” participants (selected by convenience) receive coupons (2) – Recruit 2-3 new participants each – Each new respondent given 2-3 coupons to recruit others – Recruitment incentives for participating and for successful recruitment – No one participates more than once – Process continues until desired sample size is obtained 6
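The recruitment steps on this slide can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions: the function name `simulate_rds`, the toy 12-person network, and the fixed random seed are hypothetical and not part of the presentation, and recruitment is assumed to be non-preferential (every eligible contact is equally likely to receive a coupon).

```python
import random
from collections import deque


def simulate_rds(neighbors, seeds, coupons=2, target_n=8, rng=None):
    """Simulate coupon-based chain referral without replacement.

    neighbors: dict mapping each person to a list of network contacts.
    Returns a list of (recruit, recruiter) pairs; seeds have recruiter None.
    """
    rng = rng or random.Random(0)
    sampled = set(seeds)
    recruits = [(s, None) for s in seeds]
    queue = deque(seeds)
    while queue and len(recruits) < target_n:
        recruiter = queue.popleft()
        # Assumption: non-preferential recruitment among contacts not yet sampled.
        eligible = [p for p in neighbors[recruiter] if p not in sampled]
        rng.shuffle(eligible)
        for person in eligible[:coupons]:
            if len(recruits) >= target_n:
                break
            sampled.add(person)  # no one participates more than once
            recruits.append((person, recruiter))
            queue.append(person)
    return recruits


# Hypothetical 12-person network: a ring plus one tie across the ring per person.
neighbors = {i: [(i - 1) % 12, (i + 1) % 12, (i + 6) % 12] for i in range(12)}
chain = simulate_rds(neighbors, seeds=[0], coupons=2, target_n=8)
for person, recruiter in chain:
    print(person, "recruited by", recruiter)
```

The sketch stops as soon as the target sample size is reached or no recruiter has eligible contacts left, which mirrors the last bullet of the slide.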
  • 15. Problems with estimation in link tracing sampling designs of hidden populations • Sampling frame unavailable • Sample inclusion probabilities are not known (hence sampling weights unknown) • Researchers have limited control of the sampling process • Seed respondents not chosen at random
  • 16. RDS solution • Sampling probabilities computed under an approximation of the true sampling process – RDS assumes non-seed participants are Sampled with Probability Proportional to self-reported Degree (SPPD) – Provable in a random walk on most graphs of interest – Sampling probabilities approximated by degree, hence sampling weight = 1/degree • Weighting/estimation can yield asymptotically unbiased estimates of the population mean • SPPD assumption underpins much of RDS estimation claims
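The random-walk claim behind SPPD can be checked numerically on a toy graph. The sketch below is not from the presentation: the four-node network and the function name `stationary_distribution` are hypothetical, and it assumes an undirected, connected, non-bipartite graph, which is the standard setting in which a simple random walk visits nodes in proportion to their degree.

```python
def stationary_distribution(neighbors, steps=200):
    """Approximate the long-run visit probabilities of a simple random walk."""
    nodes = sorted(neighbors)
    prob = {v: 1.0 / len(nodes) for v in nodes}  # start from a uniform guess
    for _ in range(steps):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            # The walker at v moves to each neighbor with equal probability.
            share = prob[v] / len(neighbors[v])
            for w in neighbors[v]:
                nxt[w] += share
        prob = nxt
    return prob


# Hypothetical network: a triangle (0, 1, 2) with one extra person (3) tied to 2.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
prob = stationary_distribution(neighbors)
total_degree = sum(len(nbrs) for nbrs in neighbors.values())
for v in sorted(neighbors):
    print(v, round(prob[v], 4), len(neighbors[v]) / total_degree)
```

The two printed columns agree, which is why 1/degree is used as the sampling weight when the walk assumptions hold.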
  • 17. RDS estimators (estimator: proportion equation; notes)
      – Naïve: $\hat{p} = \left(\sum_{i \in \chi} x_i\right) n^{-1}$; $x_i$ is the value of the focal variable for respondent $i$; $n$ is the sample size
      – RDS1-SH: $\hat{p} = S_{0,1} d_0 \left(S_{0,1} d_0 + S_{1,0} d_1\right)^{-1}$; $S_{a,b}$ is the estimated proportion of recruitments from group $a$ to $b$; $d_a$ is the estimated average degree in each group (Salganik and Heckathorn 2004)
      – RDS1-LEN: $\hat{p} = S^{ego}_{0,1} d_0 \left(S^{ego}_{0,1} d_0 + S^{ego}_{1,0} d_1\right)^{-1}$; $S^{ego}_{a,b}$ is the estimated proportion of network ties from group $a$ to $b$ based on ego network reports (Lu 2013)
      – RDS2-VH: $\hat{p} = \left(\sum_{i \in \chi} x_i d_i^{-1}\right) \left(\sum_{i \in \chi} d_i^{-1}\right)^{-1}$; $d_i^{-1}$ is the inverse of self-reported degree for person $i$ (Volz and Heckathorn 2008)
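The estimators on this slide are simple enough to write out directly. The following is a minimal sketch with made-up data; the function names and the example values are hypothetical, and the RDS1-SH function takes the recruitment proportions and group average degrees as given rather than estimating them from raw recruitment records.

```python
def naive(x):
    """Unweighted sample proportion."""
    return sum(x) / len(x)


def rds2_vh(x, d):
    """Volz-Heckathorn estimator: weight each respondent by 1/degree."""
    weights = [1.0 / di for di in d]
    return sum(xi * wi for xi, wi in zip(x, weights)) / sum(weights)


def rds1_sh(s01, s10, d0, d1):
    """Salganik-Heckathorn estimator of the proportion in group 1.

    s01, s10: proportions of recruitments from group 0 to 1 and from 1 to 0.
    d0, d1: estimated average degrees in groups 0 and 1.
    """
    return s01 * d0 / (s01 * d0 + s10 * d1)


# Hypothetical sample: x is the binary trait, d the self-reported degree.
x = [1, 0, 1, 0]
d = [10, 2, 5, 4]
print(naive(x))
print(rds2_vh(x, d))
print(rds1_sh(s01=0.4, s10=0.2, d0=5, d1=10))
```

In the toy data the respondents with the trait report larger degrees, so the degree-weighted RDS2-VH estimate falls below the naive proportion.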
  • 18. In RDS, all approximations are subject to critical assumptions that are often not met in the field • About the unobserved sample recruitment process (most crucial) – Respondent gives a coupon to a friend – Respondents recruit new participants non-preferentially from amongst their social contacts (each friend has an equal chance of being picked) – The initial set of respondents (“seeds”) are drawn with random probabilities – Respondents report their number of ties accurately (how many people you know that are members of the population of interest?) • About the social network structure – Rapid mixing: The chain referral process converges very quickly to the stationary distribution of a random walk (i.e. node selection probabilities are independent of sample starting point) – Connectedness: The target population must be connected by a network that consists of a single component – Network size: Network must be sufficiently large (sampling fraction small) that sampling without replacement can be treated as if it is equivalent to sampling with replacement 18
  • 19. Prior evaluations of RDS • Comparison of RDS estimates to known parameters of non-hidden populations – (Wejnert 2009; Wejnert & Heckathorn 2008; McCreesh et al. 2012) • Test effects of violating RDS assumptions about social network structure on synthetic populations – (Gile & Handcock 2010; Thomas & Gile 2011; Lu et al. 2011) • Examine effects of network structure in multiple empirical settings with theoretical/ideal RDS samples – (Goel & Salganik 2010; Mouw & Verdery 2012; Verdery, Mouw et al. 2015) • Use full information on participants’ recruitment behavior to evaluate non-preferential recruitment assumption – (Yamanis, Merli, Neely et al. Sociological Methods and Research 2013)
  • 20. RDS evaluation in the context of Female Sex Workers in Liuzhou, China • Evaluate SPPD assumption and population coverage (Merli, Moody, Smith et al., 2015 Social Science and Medicine) • Evaluate performance of RDS estimators (Verdery, Merli, Moody et al., 2015 Epidemiology) • Propose RDS data collection innovation to improve estimator performance (Verdery, Merli, Moody, In Progress) • Evaluations with a simulation approach grounded in empirical data from a hidden population of FSWs in China (Liuzhou, Guangxi Province) (Weir, Merli, Li et al. 2012, Sexually Transmitted Infections) 20
  • 21. Data • Two sources – RDS: 583 FSWs (Oct. 2009 – Feb. 2010) (about 8% of total FSW population in Liuzhou) – PLACE (venue based sampling approach): 161 FSWs (Nov. 2009 – Mar. 2010) • Same target population and inclusion definition – Women who reside in Liuzhou who exchanged sex for money in last 4 weeks • Same geographic area and similar time period • Same measurement of key variables – Test for biomarker of lifetime exposure to syphilis and core questionnaire • Same face-to-face interview and common applicant pool for interviewers • Rare to have two concurrent surveys in same population! 21
  • 22. Description of the Liuzhou RDS sample (N = 576), by tier of sex work and venues where clients are solicited
      – High (karaoke bars, star hotels, discos, night clubs): 250
      – Middle (hair salons, saunas, massage parlors, foot cleaning/massage, bathhouses): 268
      – Low (streets, parks, other public spaces): 27
      – Non-venue based (telephone, text, internet, private referrals): 31
    Source: Fisher and Merli 2014, Network Science.
  • 23. Approach, part 1 • Construct “population social network” from data collected in RDS and PLACE – Used new methodologies for estimating social network parameters and simulating population network • Use Case Control Logistic Regression to estimate homophily parameters from the RDS data (Smith, SM 2012) • Use Exponential Random Graph Modeling to generate full network from local structural features (ERGM; Handcock et al., JOSS 2008) – Tested various sensitivities about the means by which this population social network is constructed • (which data source, venue size estimates, and assumptions about geographic distribution of social network ties) 23
  • 24. “Population social network” Generate “population characteristics” based on PLACE survey estimates Add “population social network” based on RDS survey estimates 24
  • 25. Approach, part 2 • Simulate RDS chains over “population social network” (1000 per recruitment scenario) – Scenarios vary according to different sample recruitment assumptions • Seeding of the chain • Recruitment patterns – How much does the ideal case (random seeding and random recruitment) diverge from actual RDS seeding and recruitment matched to the Liuzhou FSW data? 25
  • 26. Results: Violation of SPPD assumption • Compared individual degree to the proportion of times an individual was sampled across the simulated chains – Very high correlation when seeds and referrals are random – SPPD assumption increasingly violated when seeds & referrals are matched to the actual data – Over-recruitment of middle tier sex workers drives the result • Correlations shown in the figure panels: r=0.82, r=0.96, r=0.97 • For more: – Merli, Moody, Smith et al., Social Science & Medicine, 2015
  • 27. Distribution of RDS2-VH proportion estimates (low/middle tier) across seeding and recruitment scenarios 27 Verdery, Merli, Moody et al. 2015, Epidemiology
  • 28. Variability of estimates: Design effects (ratio of variance in RDS estimates to variance in estimates from same size SRS) • DE very large, but not out of line with findings of prior work (Goel and Salganik 2010) • Large Design Effects imply that much larger sample sizes would be required to reach level of precision currently assumed from RDS samples typically in the hundreds • CDC recommends RDS sample sizes in the hundreds for public health surveillance – IMPLICATIONS: Not sufficient power to identify changes in behaviors or disease prevalence • Design effects for the middle tier by scenario: DemDem 6.18; DemRan 19.60; RanRan 28.20
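To make the sample-size implication concrete, a design effect can be converted into an effective sample size (n divided by DE). The sketch below applies that standard conversion to the middle-tier design effects on this slide; pairing them with n = 583, the size of the Liuzhou RDS sample, is an illustrative assumption rather than a calculation from the presentation, and the function names are hypothetical.

```python
def effective_sample_size(n, design_effect):
    """Size of a simple random sample with the same variance as an RDS sample of size n."""
    return n / design_effect


def required_sample_size(n_srs, design_effect):
    """RDS sample size needed to match the precision of an SRS of size n_srs."""
    return n_srs * design_effect


# Middle-tier design effects from the slide, keyed by scenario label.
design_effects = {"DemDem": 6.18, "DemRan": 19.60, "RanRan": 28.20}
n_rds = 583  # assumption: the Liuzhou RDS sample size is used for illustration

for scenario, de in design_effects.items():
    print(scenario, round(effective_sample_size(n_rds, de), 1))
```

Under these assumptions, a sample in the hundreds carries the information of a simple random sample of a few dozen to roughly a hundred respondents.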
  • 29. Discussion • Seeding and recruitment scenarios – Matching on seeds not critical – Matching on recruitment patterns has a larger effect, exacerbates biases but reduces design effects • Problematic because seems harder to control than seed matching 29
  • 30. Estimator performance • Estimator development – Only one (RDS1-LEN) works markedly better than others • Robust to preferential recruitment by taking into account respondents’ ego-network composition – BUT unusable for most (unobservable) characteristics we care about – Still problems with variance estimation • Figure: Distributions of estimates of proportions in low tiers of sex work by estimator (recruitment and seeds matched to the Liuzhou FSW data). Source: Verdery, Merli, Moody et al. 2015, Epidemiology
  • 31. Recent innovation: IP-RDS (Verdery, Merli, Moody, In Progress) • What can be done to improve the performance of RDS estimates while retaining the method’s desirable peer-driven sample recruitment properties? • Modify RDS data collection process • Apply antithetic variate mean estimator to data • Results from simulations: Improved estimation performance
  • 32. New data collection protocol IP-RDS • Incentivize respondents to invert their preferences when choosing new respondents, i.e. respondents are asked to invert their recruitment preferences on the recruitment biasing variable (e.g. tier of sex work) 32
  • 36. Antithetic variate mean estimator • $\mu_{AV} = \frac{\sum_{i \in m_1} y_i}{2} + \frac{\sum_{i \in m_2} y_i}{2}$, where $y_i$ is the value of the focal variable for respondent $i$, $m_1$ is the count of recruitments by members of one group of the recruitment biasing variable (e.g. tier of sex work), and $m_2$ is the count of recruitments by members of the other group
  • 37. Distributions of estimates of proportions in low/mid tiers of sex work by estimator (naïve mean, RDS2-VH, AV-IP_RDS) and level of biased recruitment behavior (absolute difference in recruitment probabilities conditional on attribute of targeted peer) 37
  • 38. Discussion of IP-RDS • Simple change to RDS protocol – May or may not require financial incentives for targeted recruitment (empirical question) • Outperforms conventional estimators – Gains in bias reduction comparable to RDS1-LEN estimator • Tested on more networks (similar results) • BUT …Not yet field tested 38
  • 39. Network Sampling with Memory • Mouw and Verdery 2012, Sociological Methodology • Collects network data • Introduces researcher’s control over the sampling process • Directs the recruitment process to more efficiently explore the network (avoiding bottlenecks)
  • 40. How does NSM work? • Recruitment starts with a few seed respondents • Network roster data collected from respondents about minimally identifying information of their network members (last name and last four digits of cell phone number) to connect nodes in the network (up to 10 network members per respondent) • NSM sampling algorithm selects up to 3 nominated network members per respondent and asks respondents for full contact information on these • Process proceeds iteratively to recruit new waves of respondents
  • 42. How does NSM work? • NSM sampling algorithm uses two sampling modes, List and Search • List mode – keeps a list, L, of all nominated network members – samples with replacement from L – even sampling of new nodes -- new nodes sampled at the same cumulative sampling rate as earlier nodes – as list of sampled nodes approaches the full population network, NSM sample converges to simple random sampling
  • 43. How does NSM work? • Search mode—look for “bridge” nodes to unexplored parts of the network. Start in search mode, then switch to list mode.
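The list-mode idea (keep a list L of everyone nominated so far and draw from it with replacement) can be illustrated with a short sketch. This is a simplified, hypothetical illustration, not the NSM algorithm of Mouw and Verdery: it omits search mode and the adjustment that samples new nodes at the same cumulative rate as earlier ones, and the function name, the toy nomination data, and the fixed random seed are assumptions made for the example.

```python
import random


def list_mode_sample(nominations, seed, draws, rng=None):
    """Draw uniformly with replacement from a growing list of nominated people.

    nominations: dict mapping each person to the network members they would name.
    Returns the sequence of draws and the final list L.
    """
    rng = rng or random.Random(0)
    roster = [seed]      # L: everyone nominated so far (the seed included)
    known = {seed}
    interviewed = set()
    sample = []
    for _ in range(draws):
        person = rng.choice(roster)  # with replacement: repeats are allowed
        sample.append(person)
        if person not in interviewed:
            interviewed.add(person)
            # A first interview reveals this person's nominations.
            for alter in nominations[person]:
                if alter not in known:
                    known.add(alter)
                    roster.append(alter)
    return sample, roster


# Hypothetical nomination data for six people.
nominations = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"],
    "d": ["b", "f"], "e": ["c"], "f": ["d"],
}
sample, roster = list_mode_sample(nominations, seed="a", draws=10)
print(sample)
print(roster)
```

As the list grows toward the full population, each draw approaches a uniform draw from everyone, which is the intuition behind the convergence to simple random sampling noted on the list-mode slide.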
  • 44. Simulation results • Test NSM vs. RDS using 162 university and school networks from Facebook and Add Health • Size of networks ranges from 300 to 16,500 nodes • Estimate % white (Add Health) and % first year students (Facebook) • Start from a randomly selected student, repeat 500 times for each network • Calculate bias, design effects and mean absolute bias • Across the 162 test networks, DE is 1.16 for NSM vs 77.38 for RDS
  • 45. Is it feasible? • Is it feasible to collect network data on hidden populations? • 2010 NSIT (Network Survey of Immigration and Transnationalism) (Mouw, PI) • CAHS (Chinese in Africa Health Survey) (Merli, PI) • Cost effectiveness of gains in precision
  • 46. NSM field applications • Network Survey of Immigration and Transnationalism (NSIT): Mouw et al. 2014. Social Problems; Verdery et al. 2016. Social Networks – Figure legend: Red: RDU; Blue: Mexico; Green: Houston; Small: Nominated; Large: Sampled • Chinese in Africa Health Survey (CAHS): Merli, Verdery, Mouw, Li 2016. Migration Studies – Figure: Network of Chinese migrants in Dar es Salaam sampled by NSM, size = probability of selecting next node
  • 47. Key challenge: Getting referrals from respondents • NSIT required recontacting respondents to get contact information on alters • CAHS -- “forward” sampling variant (FNSM)— more practical – Asked for contact information on a small number of alters at each interview (selected by NSM algorithm)
  • 48. NSM -- Future directions • NIH R21 grant to test NSM among Chinese immigrants in RDU (Merli, Mouw, Verdery, Moody, Keister, Sanders) – Pilot various approaches to get referrals from respondents – Evaluate NSM against ACS – Test multiple modes of data collection (in-person, telephone, web) 48