52-Week Biotech Stock Price
100 Years of “Emma”
17 Years of Super Bowl Viewership
What is common among these time series data?
All wrong! All these time series are fabrications. All these time series are “random walks.”
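A random walk is easy to fabricate: each value is the previous value plus an independent random step. The sketch below is a minimal illustration, not the method used to draw the slides' figures; the Gaussian steps, starting values, and seeds are assumptions chosen only for the example.

```python
import random

def random_walk(n_steps, start=0.0, step_sd=1.0, seed=None):
    """Return n_steps + 1 values: the start, then a running sum of Gaussian steps."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        # Each new value depends only on the previous value plus random noise
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path

# Three fabricated "series" whose lengths echo the slide titles (all values are made up)
weekly_prices = random_walk(52, start=100.0, seed=1)
name_popularity = random_walk(100, start=50.0, seed=2)
viewership = random_walk(17, start=90.0, seed=3)
```

Plotted, series like these often show apparent trends and turning points even though nothing but noise generated them.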
Welcome to secondary data analysis.
Secondary Data Analysis
  B. Rey de Castro, Sc.D.
  Guest Researcher, CDC National Center for Health Statistics
  University of Maryland College Park School of Public Health
  FMSC 720: Study Design in MCH Epidemiology
  November 30, 2010
Secondary Data Analysis
  Data that you did not collect yourself
  Both the data and study design are givens
  The statistical analysis is up to you
Uses for Secondary Data
  Hypothesis generation/testing
  Pilot data for grant proposals
  Expanding knowledge
  Publications
National Health and Nutrition Examination Survey (NHANES)
http://www.cdc.gov/nchs/nhanes.htm
  Population: children and adults nationwide
  Method: face-to-face interview; physical exams
  Content:
    Chronic and infectious disease
    Mental health and cognitive functioning
    Energy balance
    Reproductive history and sexual behavior
    Respiratory disease
  Data: N ~ 5,000 annually; initiated in the 1960s; annual since 1999
  On-line tutorial
National Health Interview Survey (NHIS)
http://www.cdc.gov/nchs/nhis.htm
  Population: households, families, adults, and children nationwide
  Method: face-to-face interview
  Content:
    Health conditions and behaviors, access to and use of health services
    Cancer Control Module (1987, 1992, 2000, 2003, and 2005): energy balance; cancer screening; sun avoidance; tobacco use and control; genetic testing
  Data: N ~ 40,000 households (~87,000 individuals) annually; initiated in 1957
Other Federal Surveys
  National Longitudinal Mortality Study: http://www.census.gov/nlms/
  National Health Care Survey: http://www.cdc.gov/nchs/nhcs.htm
  National Ambulatory Medical Care Survey: http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm
  Medical Expenditure Panel Survey: http://www.meps.ahrq.gov/
  Medicare Current Beneficiary Survey: http://www.cms.hhs.gov/MCBS/
  Medicare Health Outcomes Survey: http://www.hosonline.org/
  National Survey on Drug Use and Health: http://www.oas.samhsa.gov/nhsda.htm
  National Survey of Family Growth: http://www.cdc.gov/nchs/about/major/nsfg/nsfgbiblio.htm
Strengths
  Inexpensive data collection and design costs
  More statistical power: larger samples
  Broader geographic area
  Generalizable to national population
  Improves understanding of hypothesis
  Test trends over time
  Potential for linkage: by person; geographically
Limitations 1
  Substantial time spent on statistical analysis
  Cross-sectional
  Recall bias
  Mismatch: ideal and feasible hypothesis
  Mismatch: hypothesis and original purpose
  Generalizability to small areas impossible
  Specialized statistical techniques
Limitations 2
  Quality
  Validity & reliability
  Changes to survey over time
  Poor documentation
  Restricted/conditional access
  Confidentiality
Recap
  Just a few examples of publicly available data
  Most are cross-sectional
  All employ a complex sampling design
  Many use multi-stage sampling
  Requires special software to analyze, e.g., SUDAAN
  Use of weighting, clustering, and stratification
  Differences in variance estimation methods
Complex Surveys
Statistical Weight
  The statistical weight of a sampled person is the number of people in the population that the person represents.
  Weights derived from:
    Selection probabilities
    Response rates
    Post-stratification adjustment, e.g., gender, education, income, region
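To make the idea of a weight concrete, here is a minimal sketch under simplifying assumptions: the weight is built as the inverse selection probability, inflated for nonresponse, and scaled by a post-stratification factor. Real surveys compute these adjustments within cells and publish the final weights, so the function `person_weight` and all numbers below are hypothetical.

```python
def person_weight(p_select, response_rate, poststrat_factor=1.0):
    """Simplified weight: inverse selection probability, adjusted for nonresponse and post-stratification."""
    return (1.0 / p_select) / response_rate * poststrat_factor

def weighted_total(values, weights):
    """Estimated population total of a variable."""
    return sum(w * y for y, w in zip(values, weights))

def weighted_mean(values, weights):
    """Estimated population mean (a prevalence when values are 0/1)."""
    return weighted_total(values, weights) / sum(weights)

# Hypothetical sample: 1 = has the condition, 0 = does not
has_condition = [1, 0, 0, 1, 0]
weights = [1000, 3000, 2500, 500, 3000]   # people represented by each respondent

population_size = sum(weights)                          # 10,000 people represented
estimated_cases = weighted_total(has_condition, weights)
prevalence = weighted_mean(has_condition, weights)      # weighted estimate
unweighted = sum(has_condition) / len(has_condition)    # ignores the weights
```

In this made-up sample the two respondents with the condition carry small weights, so the weighted prevalence is well below the unweighted sample proportion.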
Stratification
  Population divided before sampling into disjoint, exhaustive groups (strata)
  The units sampled within each stratum are termed primary sampling units (PSUs)
  Independent samples are taken in each stratum
  Strata formed by similar geographic areas
    e.g., NHANES: partition US counties into 49 strata based on region and economic/racial characteristics
    Sample 2 counties (PSUs) from each stratum
Clustering
  Persons residing in a small area may have similar characteristics
  Thus, responses of subjects in a small area are potentially correlated
  Correlation must be accounted for in the analysis
  Survey analysis programs do this through strata/PSU information
Variance Estimation for Surveys
  Linearization: uses a Taylor series expansion to estimate the variance of non-linear estimators
    Default method for most programs
    Requires stratification and PSU information
  Replication: calculates parameter estimates for each replicate and combines them to estimate variance
    Jackknife with replicate weights available for SUDAAN, STATA, SAS, and WesVar
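As an illustration of linearization, the sketch below estimates the standard error of a weighted mean from strata and PSU identifiers. It assumes PSUs are sampled with replacement within strata and ignores finite population corrections; the function name and the toy data are hypothetical, and production analyses should rely on survey software.

```python
from collections import defaultdict
from math import sqrt

def linearized_se_of_mean(records):
    """records: (stratum, psu, weight, y) tuples. Returns (weighted mean, linearized SE)."""
    records = list(records)
    total_weight = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_weight

    # Taylor-linearized value for the ratio sum(w*y)/sum(w), summed within each PSU
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_weight

    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least 2 PSUs")
        z_bar = sum(totals.values()) / n_h
        # Between-PSU variability within the stratum
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, sqrt(variance)

# Hypothetical data: 2 strata, 2 PSUs each, responses similar within a PSU
records = (
    [(1, "A", 1.0, 1)] * 3 + [(1, "B", 1.0, 0)] * 3
    + [(2, "C", 1.0, 1)] * 2 + [(2, "C", 1.0, 0)] + [(2, "D", 1.0, 0)] * 3
)
mean, design_se = linearized_se_of_mean(records)

# Same data with the design ignored: one stratum, every person treated as a PSU
ignored = [("all", i, w, y) for i, (_, _, w, y) in enumerate(records)]
_, naive_se = linearized_se_of_mean(ignored)
```

With these strongly clustered toy data, `design_se` comes out roughly twice `naive_se`, which is the pattern the clustering slide warns about.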
Replication vs. Linearization
  If the survey doesn’t have replicate weights, use the full sample weights and linearization
  If the survey has replicate weights, use them with the jackknife procedure
  Most software uses the linearization method
  Only SUDAAN, STATA, SAS, and WesVar can incorporate replicate weights
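The replication idea can be sketched with a delete-one-PSU jackknife: drop one PSU at a time, reweight the remaining PSUs in its stratum, re-estimate, and combine the squared deviations. Surveys that support replication normally ship precomputed replicate weights; building them from strata and PSU identifiers, as below, is an assumption made for illustration, and the data are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def weighted_mean(records, weights):
    """Weighted mean of y (fourth field of each record) under the given weights."""
    return sum(w * r[3] for r, w in zip(records, weights)) / sum(weights)

def jackknife_se_of_mean(records):
    """records: (stratum, psu, weight, y) tuples. Returns (weighted mean, jackknife SE)."""
    records = list(records)
    full_estimate = weighted_mean(records, [w for _, _, w, _ in records])

    psus_by_stratum = defaultdict(set)
    for stratum, psu, _, _ in records:
        psus_by_stratum[stratum].add(psu)

    variance = 0.0
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least 2 PSUs")
        for dropped in psus:
            # Replicate weights: zero for the dropped PSU, inflated for its stratum-mates
            replicate_weights = []
            for s, p, w, _ in records:
                if s != stratum:
                    replicate_weights.append(w)
                elif p == dropped:
                    replicate_weights.append(0.0)
                else:
                    replicate_weights.append(w * n_h / (n_h - 1))
            replicate_estimate = weighted_mean(records, replicate_weights)
            variance += (n_h - 1) / n_h * (replicate_estimate - full_estimate) ** 2
    return full_estimate, sqrt(variance)

# Hypothetical data: 2 strata, 2 PSUs each
records = (
    [(1, "A", 1.0, 1)] * 3 + [(1, "B", 1.0, 0)] * 3
    + [(2, "C", 1.0, 1)] * 2 + [(2, "C", 1.0, 0)] + [(2, "D", 1.0, 0)] * 3
)
estimate, jackknife_se = jackknife_se_of_mean(records)
```

Each replicate here corresponds to one column of replicate weights in a public-use file that provides them.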
Complex Survey Design
  Correct variance estimates
  Proper hypothesis testing
  Standard errors will tend to be larger than when the design is ignored
  Less likely to make a Type I error
Statistical Software for Analyzing Health Surveys
  Specifically designed for analyzing data utilizing complex sampling designs:
    SUDAAN
    WesVar
  Others that can be used:
    SAS
    STATA
    SPSS
    Mplus
Data/Research Resources
  Inter-university Consortium for Political and Social Research (ICPSR), Univ. of Michigan: http://www.icpsr.umich.edu/
  UCLA Statistical Computing: http://www.ats.ucla.edu/stat/
  BRFSS Maps: http://apps.nccd.cdc.gov/gisbrfss/default.aspx
  State Cancer Profiles: http://statecancerprofiles.cancer.gov/
References
  Korn, E.L. and Graubard, B.I. (1999). Analysis of Health Surveys. New York: John Wiley.
  State Cancer Profiles: http://statecancerprofiles.cancer.gov/
  SUDAAN: http://www.rti.org/SUDAAN/
  SAS: http://www.sas.com/
  SPSS: http://www.spss.com/
  STATA: http://www.stata.com/
  WesVar: http://www.westat.com/wesvar/
  Mplus: http://www.statmodel.com/
Other Data Sources
  State registries
    Birth
    Death
    Cancer
  Emergency room admissions
    Acute outcomes
Intermission
Secondary Data Analysis
  Data that you did not collect yourself
  Both the data and study design are givens
  The statistical analysis is up to you
Lesson One: Integrity
Dirty Data
  Key-punch errors
  Invalid data
  Missing data
  Mislabeled variables
  Unknown variables
Preparing Data
Processing Data
  Recode data
  Label variables
  Format data
Investigation
  Reality checks
    Out-of-range values
  Descriptive statistics
    Ranges: out-of-range or improbable values
    Frequencies: missing values or classes
  Simple graphical display
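The checks listed above can be sketched in a few lines. The variable names, plausible ranges, and records below are hypothetical; in practice the limits and valid classes come from the survey codebook.

```python
from collections import Counter

# Hypothetical plausible ranges; real limits come from the codebook
PLAUSIBLE_RANGES = {"age_years": (3, 17), "height_cm": (80, 210)}

def out_of_range(records, ranges):
    """Return (row index, variable, value) for each non-missing value outside its range."""
    problems = []
    for i, row in enumerate(records):
        for var, (low, high) in ranges.items():
            value = row.get(var)
            if value is not None and not (low <= value <= high):
                problems.append((i, var, value))
    return problems

def frequencies(records, var):
    """Count every class of a variable; missing values are counted under None."""
    return Counter(row.get(var) for row in records)

def observed_range(records, var):
    """Minimum and maximum of the non-missing values."""
    observed = [row[var] for row in records if row.get(var) is not None]
    return min(observed), max(observed)

records = [
    {"age_years": 5, "height_cm": 110, "sex": "F"},
    {"age_years": 12, "height_cm": 15, "sex": "M"},     # possible key-punch error
    {"age_years": 99, "height_cm": 150, "sex": None},   # 99 may be a missing-value code
    {"age_years": 9, "height_cm": None, "sex": "f"},    # unexpected class label
]

problems = out_of_range(records, PLAUSIBLE_RANGES)
sex_counts = frequencies(records, "sex")
age_min, age_max = observed_range(records, "age_years")
```

The frequency table surfaces both the missing value and the stray lowercase class, which a range check alone would not catch.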
Normal Ranges
Imputing Missing Values
  Increases available data
  Statistically more complex
  Defensibility
  Useful
Lesson Two
  Spend time up-front being sure about your data
  Foundation of sand or stone?
  Crystal clear case definition & recodes
  More time preparing than analyzing
  Prevents problems
  Simplifies analysis
Statistical Analysis Plan
Outcome
Design
Clustered Data
Longitudinal
Hierarchical
Diagnostics
  Independence
  Homoskedasticity
  Skewness
  Influential observations
Lesson Three
  Plan, then execute the plan
  Conform statistical technique to outcome and design
  Diagnostics
Case Study
  Ongoing spatial epidemiology project
  Complex survey
  Cross-sectional
  Data linkage
  Childhood asthma episodes
  Air pollution exposure
Case Study
  Air pollutant: acrolein
  EPA attributes >90% of non-cancer respiratory health effects to acrolein
  No epidemiology to date
Data Linkage
National Health Interview Survey
  Health outcome: asthma episode in last 12 months
  2000 – 2004
  Children 3 – 17 years old
  Parents of ~66,000 kids surveyed
  Nationally representative sample
  Complex survey weighting
National Health Interview Survey
  Potential confounders
    Smoking household
    Acrolein industry household
    Age, sex, race
    Education, income, single-parent family
    Access to care, insurance
    Urban/rural
    Census regional division
National Air Toxics Assessment
  Air pollutant: acrolein
    Strong respiratory irritant
    Cigarette smoke; industrial emissions
  2002
  Modeled exposure assessment
  Census tracts nationwide
How would you link these two databases?
Geographic Linkage
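A geographic linkage of this kind amounts to a join on a shared area identifier. The sketch below assumes each survey record carries an 11-digit census tract identifier (state + county + tract) stored as a string so leading zeros survive; the field names, identifiers, and exposure values are all hypothetical.

```python
def link_by_tract(survey_records, exposure_by_tract):
    """Attach tract-level exposure to each survey record; return (linked, unmatched)."""
    linked, unmatched = [], []
    for record in survey_records:
        exposure = exposure_by_tract.get(record["tract_id"])
        if exposure is None:
            unmatched.append(record)          # keep for review rather than dropping silently
        else:
            linked.append({**record, "acrolein": exposure})
    return linked, unmatched

# Hypothetical tract-level modeled concentrations, keyed by tract identifier
exposure_by_tract = {
    "01001020100": 0.021,
    "24033800100": 0.054,
}

# Hypothetical child records with a geocoded tract identifier
survey_records = [
    {"child_id": 1, "tract_id": "01001020100", "asthma_episode": 0},
    {"child_id": 2, "tract_id": "24033800100", "asthma_episode": 1},
    {"child_id": 3, "tract_id": "99999999999", "asthma_episode": 0},  # no exposure estimate
]

linked, unmatched = link_by_tract(survey_records, exposure_by_tract)
```

Reporting the unmatched records separately makes it possible to check whether linkage failures are concentrated in particular areas.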
But, requires access to confidential NHIS data.
NCHS Says
  Orient to data structure and contents
  Locate variables
  Download data
  Append & merge data
  Clean & recode data
  Format & label variables
NHIS Data Processing
  Extract and compile data by year
    Multiple files
    2004 redesign
  Compile data 2000 – 2004
    Formatting and variable names a pain
  Identify records with complete data
  Link to NATA
    Done confidentially by NCHS staff
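The compile-and-screen steps can be sketched as: rename each year's variables to a common scheme, append the years, then keep records with complete data. The per-year variable names below are invented for illustration and are not actual NHIS names.

```python
# Hypothetical per-year variable names mapped to a common scheme
RENAME_BY_YEAR = {
    2003: {"EPISODE_OLD": "asthma_episode", "AGE_OLD": "age"},
    2004: {"episode_new": "asthma_episode", "age_new": "age"},
}
REQUIRED = ("asthma_episode", "age")

def harmonize(row, year):
    """Rename a record's variables to the common scheme and tag it with its survey year."""
    renamed = {RENAME_BY_YEAR[year].get(name, name): value for name, value in row.items()}
    renamed["survey_year"] = year
    return renamed

def compile_years(files_by_year):
    """Append all years into one list of harmonized records."""
    combined = []
    for year, rows in sorted(files_by_year.items()):
        combined.extend(harmonize(row, year) for row in rows)
    return combined

def complete_cases(rows, required=REQUIRED):
    """Keep records in which every required variable is present and non-missing."""
    return [r for r in rows if all(r.get(v) is not None for v in required)]

files_by_year = {
    2003: [{"EPISODE_OLD": 1, "AGE_OLD": 7}, {"EPISODE_OLD": None, "AGE_OLD": 9}],
    2004: [{"episode_new": 0, "age_new": 12}],
}

combined = compile_years(files_by_year)
analysis_set = complete_cases(combined)
```

Keeping the rename maps in one place documents how each year's file was reconciled, which helps when a survey redesign changes names or coding.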
Analysis Plan
  Hypothesis: “Childhood asthma episodes are associated with census-tract-level estimates of acrolein exposure”
  Descriptive statistics
  Logistic regression
  Complex weighted variance estimation
  SAS-callable SUDAAN
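To show what a survey-weighted logistic regression estimates, here is a minimal sketch that fits an intercept and one slope by Newton-Raphson on the weighted log-likelihood. It yields point estimates only; design-based standard errors still require strata and PSU information in survey software such as SUDAAN. The binary high/low exposure, the weights, and the outcomes are hypothetical, and this is not the case study's actual model.

```python
from math import exp

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

def weighted_logistic_fit(x, y, w, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson on the weighted log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = sigmoid(b0 + b1 * xi)
            r = wi * (yi - p)           # weighted score contribution
            v = wi * p * (1.0 - p)      # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        # Solve the 2x2 system (information matrix) * step = score
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

# Hypothetical data: x = 1 for high exposure, y = 1 for an asthma episode
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 0, 1, 1]
w = [100, 200, 100, 100, 150, 50, 100, 100]

b0, b1 = weighted_logistic_fit(x, y, w)
odds_ratio = exp(b1)   # weighted odds ratio for high vs. low exposure
```

With a single binary predictor the fitted odds ratio matches the one from the weighted 2x2 table, which gives a quick way to check the fit by hand.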
Wisdom
  Network
  Cultivate relationships
    Front-line staff
    Principal investigators

More Related Content

What's hot

Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSSPhi Jack
 
Identification of research problem
Identification of research problemIdentification of research problem
Identification of research problemRAVI RAI DANGI
 
Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)YesAnalytics
 
Scale of measurement
Scale of measurementScale of measurement
Scale of measurementHennaAnsari
 
Experimental Research Design (True, Quasi and Pre Experimental Design)
Experimental Research Design (True, Quasi and Pre Experimental Design)Experimental Research Design (True, Quasi and Pre Experimental Design)
Experimental Research Design (True, Quasi and Pre Experimental Design)Alam Nuzhathalam
 
01 parametric and non parametric statistics
01 parametric and non parametric statistics01 parametric and non parametric statistics
01 parametric and non parametric statisticsVasant Kothari
 
Research, Types and objectives of research
Research, Types and objectives of research Research, Types and objectives of research
Research, Types and objectives of research Bindu Kshtriya
 
Effect sizes in meta-analysis
Effect sizes in meta-analysisEffect sizes in meta-analysis
Effect sizes in meta-analysisRizwan S A
 
Types of Statistics
Types of StatisticsTypes of Statistics
Types of Statisticsloranel
 
Longitudinal research
Longitudinal researchLongitudinal research
Longitudinal researchnadia naseem
 
Survival Analysis Lecture.ppt
Survival Analysis Lecture.pptSurvival Analysis Lecture.ppt
Survival Analysis Lecture.ppthabtamu biazin
 

What's hot (20)

Scales of measurement
Scales of measurementScales of measurement
Scales of measurement
 
Systematic search strategies
Systematic search strategiesSystematic search strategies
Systematic search strategies
 
Analysis and Interpretation of Data
Analysis and Interpretation of DataAnalysis and Interpretation of Data
Analysis and Interpretation of Data
 
Questionnaire
QuestionnaireQuestionnaire
Questionnaire
 
Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSS
 
Identification of research problem
Identification of research problemIdentification of research problem
Identification of research problem
 
Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)
 
Scale of measurement
Scale of measurementScale of measurement
Scale of measurement
 
The Sign Test
The Sign TestThe Sign Test
The Sign Test
 
Experimental Research Design (True, Quasi and Pre Experimental Design)
Experimental Research Design (True, Quasi and Pre Experimental Design)Experimental Research Design (True, Quasi and Pre Experimental Design)
Experimental Research Design (True, Quasi and Pre Experimental Design)
 
01 parametric and non parametric statistics
01 parametric and non parametric statistics01 parametric and non parametric statistics
01 parametric and non parametric statistics
 
Research, Types and objectives of research
Research, Types and objectives of research Research, Types and objectives of research
Research, Types and objectives of research
 
Effect sizes in meta-analysis
Effect sizes in meta-analysisEffect sizes in meta-analysis
Effect sizes in meta-analysis
 
Systematic review
Systematic reviewSystematic review
Systematic review
 
Types of Statistics
Types of StatisticsTypes of Statistics
Types of Statistics
 
Longitudinal research
Longitudinal researchLongitudinal research
Longitudinal research
 
Data analysis
Data analysisData analysis
Data analysis
 
Statistical tests
Statistical tests Statistical tests
Statistical tests
 
Data collection
Data collectionData collection
Data collection
 
Survival Analysis Lecture.ppt
Survival Analysis Lecture.pptSurvival Analysis Lecture.ppt
Survival Analysis Lecture.ppt
 

Viewers also liked

Summer training project report on
Summer training project report onSummer training project report on
Summer training project report onKantinath Banerjee
 
Project report on- "A study of digital marketing services"
Project report on- "A study of digital marketing services" Project report on- "A study of digital marketing services"
Project report on- "A study of digital marketing services" MarketerBoard
 
A project report on evaluation of financial performance based on ratio analysis
A project report on  evaluation of financial performance based on ratio analysisA project report on  evaluation of financial performance based on ratio analysis
A project report on evaluation of financial performance based on ratio analysisBabasab Patil
 
Project report on Financial Statement Analysis and interpretation of A Company
Project report on Financial Statement Analysis and interpretation of A CompanyProject report on Financial Statement Analysis and interpretation of A Company
Project report on Financial Statement Analysis and interpretation of A CompanyPinkey Rana
 
A project report on analysis of financial statement of icici bank
A project report on analysis of financial statement of  icici bankA project report on analysis of financial statement of  icici bank
A project report on analysis of financial statement of icici bankProjects Kart
 

Viewers also liked (6)

Nagender
NagenderNagender
Nagender
 
Summer training project report on
Summer training project report onSummer training project report on
Summer training project report on
 
Project report on- "A study of digital marketing services"
Project report on- "A study of digital marketing services" Project report on- "A study of digital marketing services"
Project report on- "A study of digital marketing services"
 
A project report on evaluation of financial performance based on ratio analysis
A project report on  evaluation of financial performance based on ratio analysisA project report on  evaluation of financial performance based on ratio analysis
A project report on evaluation of financial performance based on ratio analysis
 
Project report on Financial Statement Analysis and interpretation of A Company
Project report on Financial Statement Analysis and interpretation of A CompanyProject report on Financial Statement Analysis and interpretation of A Company
Project report on Financial Statement Analysis and interpretation of A Company
 
A project report on analysis of financial statement of icici bank
A project report on analysis of financial statement of  icici bankA project report on analysis of financial statement of  icici bank
A project report on analysis of financial statement of icici bank
 

Similar to Secondary Data Analysis

WGHA Discovery Series: Ali Mokdad
WGHA Discovery Series: Ali MokdadWGHA Discovery Series: Ali Mokdad
WGHA Discovery Series: Ali MokdadUWGlobalHealth
 
Epidemiological study Design Case Control And Cohort Study.ppt
Epidemiological study Design Case Control And Cohort Study.pptEpidemiological study Design Case Control And Cohort Study.ppt
Epidemiological study Design Case Control And Cohort Study.pptTauseef Jawaid
 
Big data, RWE and AI in Clinical Trials made simple
Big data, RWE and AI in Clinical Trials made simpleBig data, RWE and AI in Clinical Trials made simple
Big data, RWE and AI in Clinical Trials made simpleHadas Jacoby
 
Epidemiological study designs
Epidemiological study designs Epidemiological study designs
Epidemiological study designs Tauseef Jawaid
 
Epidemiological methods
Epidemiological methodsEpidemiological methods
Epidemiological methodsBhoj Raj Singh
 
Math, Stats and CS in Public Health and Medical Research
Math, Stats and CS in Public Health and Medical ResearchMath, Stats and CS in Public Health and Medical Research
Math, Stats and CS in Public Health and Medical ResearchJessica Minnier
 
Genetic testing evaluation part 1 2018
Genetic testing evaluation part 1 2018Genetic testing evaluation part 1 2018
Genetic testing evaluation part 1 2018John Shoffner, MD
 
Khoury ashg2014
Khoury ashg2014Khoury ashg2014
Khoury ashg2014muink
 
2022-06-07 Berman Lew Great Plains Conference FINAL.pptx
2022-06-07 Berman Lew Great Plains Conference FINAL.pptx2022-06-07 Berman Lew Great Plains Conference FINAL.pptx
2022-06-07 Berman Lew Great Plains Conference FINAL.pptxLew Berman
 
PrEP Implementation Planning for the US
PrEP Implementation Planning for the USPrEP Implementation Planning for the US
PrEP Implementation Planning for the USCHAMP Network
 
Methods for Observational Comparative Effectiveness Research on Healthcare De...
Methods for Observational Comparative Effectiveness Research on Healthcare De...Methods for Observational Comparative Effectiveness Research on Healthcare De...
Methods for Observational Comparative Effectiveness Research on Healthcare De...Marion Sills
 
The Learning Health System: Thinking and Acting Across Scales
The Learning Health System: Thinking and Acting Across ScalesThe Learning Health System: Thinking and Acting Across Scales
The Learning Health System: Thinking and Acting Across ScalesPhilip Payne
 
ODF III - 3.15.16 - Day Two Morning Sessions
ODF III - 3.15.16 - Day Two Morning SessionsODF III - 3.15.16 - Day Two Morning Sessions
ODF III - 3.15.16 - Day Two Morning SessionsMichael Kerr
 
PDAs for Nursing Students: Technology at Your Fingertips
PDAs for Nursing Students: Technology at Your FingertipsPDAs for Nursing Students: Technology at Your Fingertips
PDAs for Nursing Students: Technology at Your FingertipsCynthia.Russell
 
A Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition OutcomeA Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition Outcomesoder145
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-finalPeter Embi
 
Matching the Research Design to the Study Question
Matching the Research Design to the Study QuestionMatching the Research Design to the Study Question
Matching the Research Design to the Study QuestionAcademyHealth
 
Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014Peter Embi
 

Similar to Secondary Data Analysis (20)

WGHA Discovery Series: Ali Mokdad
WGHA Discovery Series: Ali MokdadWGHA Discovery Series: Ali Mokdad
WGHA Discovery Series: Ali Mokdad
 
Epidemiological study Design Case Control And Cohort Study.ppt
Epidemiological study Design Case Control And Cohort Study.pptEpidemiological study Design Case Control And Cohort Study.ppt
Epidemiological study Design Case Control And Cohort Study.ppt
 
Big data, RWE and AI in Clinical Trials made simple
Big data, RWE and AI in Clinical Trials made simpleBig data, RWE and AI in Clinical Trials made simple
Big data, RWE and AI in Clinical Trials made simple
 
Epidemiological study designs
Epidemiological study designs Epidemiological study designs
Epidemiological study designs
 
Epidemiological methods
Epidemiological methodsEpidemiological methods
Epidemiological methods
 
Math, Stats and CS in Public Health and Medical Research
Math, Stats and CS in Public Health and Medical ResearchMath, Stats and CS in Public Health and Medical Research
Math, Stats and CS in Public Health and Medical Research
 
Genetic testing evaluation part 1 2018
Genetic testing evaluation part 1 2018Genetic testing evaluation part 1 2018
Genetic testing evaluation part 1 2018
 
Khoury ashg2014
Khoury ashg2014Khoury ashg2014
Khoury ashg2014
 
2022-06-07 Berman Lew Great Plains Conference FINAL.pptx
2022-06-07 Berman Lew Great Plains Conference FINAL.pptx2022-06-07 Berman Lew Great Plains Conference FINAL.pptx
2022-06-07 Berman Lew Great Plains Conference FINAL.pptx
 
PrEP Implementation Planning for the US
PrEP Implementation Planning for the USPrEP Implementation Planning for the US
PrEP Implementation Planning for the US
 
Methods for Observational Comparative Effectiveness Research on Healthcare De...
Methods for Observational Comparative Effectiveness Research on Healthcare De...Methods for Observational Comparative Effectiveness Research on Healthcare De...
Methods for Observational Comparative Effectiveness Research on Healthcare De...
 
The Learning Health System: Thinking and Acting Across Scales
The Learning Health System: Thinking and Acting Across ScalesThe Learning Health System: Thinking and Acting Across Scales
The Learning Health System: Thinking and Acting Across Scales
 
ODF III - 3.15.16 - Day Two Morning Sessions
ODF III - 3.15.16 - Day Two Morning SessionsODF III - 3.15.16 - Day Two Morning Sessions
ODF III - 3.15.16 - Day Two Morning Sessions
 
Research design fw 2011
Research design fw 2011Research design fw 2011
Research design fw 2011
 
PDAs for Nursing Students: Technology at Your Fingertips
PDAs for Nursing Students: Technology at Your FingertipsPDAs for Nursing Students: Technology at Your Fingertips
PDAs for Nursing Students: Technology at Your Fingertips
 
A Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition OutcomeA Two-sample Approach for State Estimates of a Chronic Condition Outcome
A Two-sample Approach for State Estimates of a Chronic Condition Outcome
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-final
 
Matching the Research Design to the Study Question
Matching the Research Design to the Study QuestionMatching the Research Design to the Study Question
Matching the Research Design to the Study Question
 
Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014
 
From Research to Practice: New Models for Data-sharing and Collaboration to I...
From Research to Practice: New Models for Data-sharing and Collaboration to I...From Research to Practice: New Models for Data-sharing and Collaboration to I...
From Research to Practice: New Models for Data-sharing and Collaboration to I...
 

More from REY DECASTRO

Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...
Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...
Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...REY DECASTRO
 
Association of Urinary Arsenic Species with Diet in a Representative Sample o...
Association of Urinary Arsenic Species with Diet in a Representative Sample o...Association of Urinary Arsenic Species with Diet in a Representative Sample o...
Association of Urinary Arsenic Species with Diet in a Representative Sample o...REY DECASTRO
 
Acrolein and COPD in a Nationally Representative Sample of United States Adul...
Acrolein and COPD in a Nationally Representative Sample of United States Adul...Acrolein and COPD in a Nationally Representative Sample of United States Adul...
Acrolein and COPD in a Nationally Representative Sample of United States Adul...REY DECASTRO
 
Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...
Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...
Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...REY DECASTRO
 
Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...
Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...
Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...REY DECASTRO
 
Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...
Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...
Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...REY DECASTRO
 
Acrolein and Adult Asthma in a Nationally Representative Sample of the United...
Acrolein and Adult Asthma in a Nationally Representative Sample of the United...Acrolein and Adult Asthma in a Nationally Representative Sample of the United...
Acrolein and Adult Asthma in a Nationally Representative Sample of the United...REY DECASTRO
 
Applications of Contemporary Statistical Approaches in Environmental Health A...
Applications of Contemporary Statistical Approaches in Environmental Health A...Applications of Contemporary Statistical Approaches in Environmental Health A...
Applications of Contemporary Statistical Approaches in Environmental Health A...REY DECASTRO
 
Applications of Contemporary Statistical Approaches in Environmental Health M...
Applications of Contemporary Statistical Approaches in Environmental Health M...Applications of Contemporary Statistical Approaches in Environmental Health M...
Applications of Contemporary Statistical Approaches in Environmental Health M...REY DECASTRO
 
The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...
The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...
The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...REY DECASTRO
 
The Dependence of Indoor PAH Concentrations on Outdoor PAHs and Traffic Volum...
The Dependence of Indoor PAH Concentrations on Outdoor PAHs and Traffic Volum...The Dependence of Indoor PAH Concentrations on Outdoor PAHs and Traffic Volum...
The Dependence of Indoor PAH Concentrations on Outdoor PAHs and Traffic Volum...REY DECASTRO
 
Using Microarrays to Monitor Gene Expression Induced by Outdoor Airborne Part...
Using Microarrays to Monitor Gene Expression Induced by Outdoor Airborne Part...Using Microarrays to Monitor Gene Expression Induced by Outdoor Airborne Part...
Using Microarrays to Monitor Gene Expression Induced by Outdoor Airborne Part...REY DECASTRO
 
The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...
The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...
The Longitudinal Dependence of Indoor PAH Concentration on Outdoor PAH and Tr...REY DECASTRO
 
Microenvironment Exposure Weights Can Be Obtained from a Straightforward Stat...
Microenvironment Exposure Weights Can Be Obtained from a Straightforward Stat...Microenvironment Exposure Weights Can Be Obtained from a Straightforward Stat...
Microenvironment Exposure Weights Can Be Obtained from a Straightforward Stat...REY DECASTRO
 

More from REY DECASTRO (14)

Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...
Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...
Population-Weighted Exposure to 174 Air Toxics in a Representative Sample of...
 
Association of Urinary Arsenic Species with Diet in a Representative Sample o...
Association of Urinary Arsenic Species with Diet in a Representative Sample o...Association of Urinary Arsenic Species with Diet in a Representative Sample o...
Association of Urinary Arsenic Species with Diet in a Representative Sample o...
 
Acrolein and COPD in a Nationally Representative Sample of United States Adul...
Acrolein and COPD in a Nationally Representative Sample of United States Adul...Acrolein and COPD in a Nationally Representative Sample of United States Adul...
Acrolein and COPD in a Nationally Representative Sample of United States Adul...
 
Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...
Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...
Bootstrap estimation of variance from ROC curve analysis of NHANES complex su...
 
Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...
Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...
Perchlorate Exposure from Diet and Drinking Water in a Representative Sample ...
 
Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...
Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...
Acrolein and Neurocognitive Loss in a Nationally Representative Sample of Uni...
 
Acrolein and Adult Asthma in a Nationally Representative Sample of the United...
Acrolein and Adult Asthma in a Nationally Representative Sample of the United...Acrolein and Adult Asthma in a Nationally Representative Sample of the United...
Secondary Data Analysis

  • 2. 100 Years of “Emma”
  • 3. 17 Years Superbowl Viewership
  • 4. What is common among these time series data?
  • 5. All wrong! All these time series are fabrications. All these time series are “random walks”.
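A random walk like the fabricated series above can be produced by accumulating independent random steps. The following is a minimal sketch, not the method used to build the slides; the function name, starting value, step size, and seed are all illustrative assumptions.

```python
import random


def random_walk(n_steps, start=100.0, step_sd=1.0, seed=None):
    """Return a list of n_steps + 1 values, each the previous value plus Gaussian noise."""
    rng = random.Random(seed)  # private generator so the seed makes runs repeatable
    series = [start]
    for _ in range(n_steps):
        # Each step depends only on the previous value and a fresh random shock.
        series.append(series[-1] + rng.gauss(0.0, step_sd))
    return series


# Hypothetical example: 52 weekly "prices" with no underlying signal at all.
walk = random_walk(52, seed=1)
```

Plotting such a series often shows apparent trends and cycles even though every step is pure noise, which is the point of the opening slides.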
  • 6. Welcome to secondary data analysis.
  • 7. Secondary Data Analysis B. Rey de Castro, Sc.D. Guest Researcher CDC National Center for Health Statistics University of Maryland College Park School of Public Health FMSC 720 Study Design in MCH Epidemiology November 30, 2010
  • 8. Secondary Data Analysis Data that you did not collect yourself Both the data and study design are givens The statistical analysis is up to you
  • 9. Uses for Secondary Data Hypothesis generation/testing Pilot data for grant proposals Expanding knowledge Publications
  • 10. National Health and Nutrition Examination Survey (NHANES) http://www.cdc.gov/nchs/nhanes.htm Population Children, adults nationwide Method Face to face interview Physical exams Content Chronic and Infectious Disease Mental health and cognitive functioning Energy Balance Reproductive history and sexual behavior Respiratory disease Data N ~ 5,000 annually Initiated in 1960’s; Annual since 1999 On-line tutorial
  • 11. National Health Interview Survey (NHIS) http://www.cdc.gov/nchs/nhis.htm Population Households, families, adults, children nationwide Method Face to face interview Content Health conditions and behaviors, access to and use of health services Cancer Control Module (1987, 1992, 2000, 2003, and 2005) Energy Balance Cancer Screening Sun Avoidance Tobacco Use and Control Genetic Testing Data N ~ 40,000 households (~87,000 individuals) annually Initiated in 1957
  • 12. Other Federal Surveys National Longitudinal Mortality Study http://www.census.gov/nlms/ National Health Care Survey http://www.cdc.gov/nchs/nhcs.htm National Ambulatory Medical Care Survey http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm Medical Expenditure Panel Survey http://www.meps.ahrq.gov/ Medicare Current Beneficiary Survey http://www.cms.hhs.gov/MCBS/ Medicare Health Outcomes Survey http://www.hosonline.org/ National Survey on Drug Use and Health http://www.oas.samhsa.gov/nhsda.htm National Survey of Family Growth http://www.cdc.gov/nchs/about/major/nsfg/nsfgbiblio.htm
  • 13. Strengths Inexpensive data collection and design costs More statistical power: larger samples Broader geographic area Generalizable to national population Improves understanding of hypothesis Test trends over time Potential for linkage Person Geographically
  • 14. Limitations 1 Substantial time spent on statistical analysis Cross-sectional Recall bias Mismatch: ideal and feasible hypothesis Mismatch: hypothesis and original purpose Generalizability to small areas impossible Specialized statistical techniques
  • 15. Limitations 2 Quality Validity & reliability Changes to survey over time Poor documentation Restricted/conditional access Confidentiality
  • 16. Recap Just a few examples of publicly available data Most are cross-sectional All employ a complex sampling design Many use multi-stage sampling Requires special software to analyze e.g., SUDAAN Use of weighting, clustering, and stratification Differences in variance estimation methods
  • 18. Statistical Weight The statistical weight of a sampled person is the number of people in the population that the person represents. Weights derived from Selection probabilities Response rates Post-stratification adjustment e.g., gender, education, income, region
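To make the idea of a statistical weight concrete, the sketch below derives a base weight from a selection probability and a response rate, then uses weights to estimate a population proportion. It is a simplified illustration: dividing by the response rate is only one way to adjust for nonresponse, post-stratification is left out, and all function names and numbers are assumptions.

```python
def base_weight(selection_prob, response_rate=1.0):
    """Number of people one respondent stands for (simplified, no post-stratification)."""
    return 1.0 / (selection_prob * response_rate)


def weighted_proportion(outcomes, weights):
    """Weighted share of respondents with outcome == 1."""
    total = sum(weights)
    return sum(w * y for y, w in zip(outcomes, weights)) / total


# Hypothetical respondent: sampled with probability 1 in 1,000, 80% response rate.
w = base_weight(0.001, response_rate=0.8)

# Hypothetical outcomes (1 = has condition) with unequal weights.
p = weighted_proportion([1, 0, 1], [100, 300, 100])
```

Here the unweighted proportion would be 2/3, while the weighted one is 200/500, because the respondent without the condition represents more people than the other two combined.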
  • 19. Stratification Population divided before sampling into disjoint, exhaustive groups (strata) Members termed primary sampling units (PSUs) Independent samples are taken in each stratum Strata formed by similar geographic areas e.g., NHANES: partition US counties into 49 strata based on region and economic/racial characteristics Sample 2 counties (PSUs) from each stratum
  • 20. Clustering Persons residing in a small area may have similar characteristics Thus, responses of subjects in small area are potentially correlated Correlation must be accounted for in the analysis Survey analysis programs do this through strata/PSU information
  • 21. Variance Estimation for Surveys Linearization: Uses a Taylor series expansion to estimate variance of non-linear estimators Default method for most programs Requires stratification and PSU information Replication: Calculates parameter estimates for each replicate and combines to estimate variance Jackknife with replicate weights available for SUDAAN, STATA, SAS and WesVAR
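The replication idea can be sketched with a delete-one-PSU jackknife for a weighted mean. This is an illustration under stated assumptions, not what SUDAAN or any other package does internally: it builds the replicates itself rather than reading survey-supplied replicate weights, and the record layout (`stratum`, `psu`, `w`, `y`) is hypothetical.

```python
from collections import defaultdict


def weighted_mean(records):
    total_w = sum(r["w"] for r in records)
    return sum(r["w"] * r["y"] for r in records) / total_w


def jackknife_variance(records):
    """Return (estimate, variance) using a stratified delete-one-PSU jackknife."""
    psus_by_stratum = defaultdict(set)
    for r in records:
        psus_by_stratum[r["stratum"]].add(r["psu"])

    theta = weighted_mean(records)  # full-sample estimate
    variance = 0.0
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least 2 PSUs")
        for dropped in psus:
            replicate = []
            for r in records:
                if r["stratum"] != stratum:
                    replicate.append(r)  # other strata are left unchanged
                elif r["psu"] != dropped:
                    # Remaining PSUs in this stratum are reweighted to cover the dropped one.
                    replicate.append({**r, "w": r["w"] * n_h / (n_h - 1)})
            variance += (n_h - 1) / n_h * (weighted_mean(replicate) - theta) ** 2
    return theta, variance


# Hypothetical toy data: 2 strata, 2 PSUs each, one respondent per PSU.
records = [
    {"stratum": 1, "psu": 1, "w": 10, "y": 1},
    {"stratum": 1, "psu": 2, "w": 10, "y": 0},
    {"stratum": 2, "psu": 1, "w": 20, "y": 1},
    {"stratum": 2, "psu": 2, "w": 20, "y": 1},
]
estimate, variance = jackknife_variance(records)
```

In this toy data all the variance comes from stratum 1, where the two PSUs disagree; stratum 2 contributes nothing because dropping either of its PSUs leaves the estimate unchanged.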
  • 22. Replication vs. Linearization If survey doesn’t have replicate weights, use the full sample weights and linearization If survey has replicate weights, use them with the jackknife procedure Most software uses the linearization method Only SUDAAN, STATA, SAS, and WesVAR can incorporate replicate weights
  • 23. Complex Survey Design Correct variance estimates Proper hypothesis testing Standard errors will tend to be larger Less likely to make Type I error
  • 24. Statistical Software for Analyzing Health Surveys Specifically designed for analyzing data utilizing complex sampling designs: SUDAAN WesVar Others that can be used: SAS STATA SPSS Mplus
  • 25. Data/Research Resources Univ. of Michigan Consortium for social research: http://www.icpsr.umich.edu/ UCLA Statistical Computing: http://www.ats.ucla.edu/stat/ BRFSS Maps http://apps.nccd.cdc.gov/gisbrfss/default.aspx State Cancer Profiles http://statecancerprofiles.cancer.gov/
  • 26. References Korn, E.L. and Graubard, B.I. (1999). Analysis of Health Surveys. New York: John Wiley State Cancer Profiles: http://statecancerprofiles.cancer.gov/ SUDAAN: http://www.rti.org/SUDAAN/ SAS: http://www.sas.com/ SPSS: http://www.spss.com/ STATA: http://www.stata.com/ WesVar: http://www.westat.com/wesvar/ Mplus: http://www.statmodel.com/
  • 27. Other Data Sources State registries Birth Death Cancer Emergency room admissions Acute outcomes
  • 29. Secondary Data Analysis Data that you did not collect yourself Both the data and study design are givens The statistical analysis is up to you
  • 30. Lesson One Integrity
  • 31. Dirty Data Key-punch errors Invalid data Missing data Mislabeled variables Unknown variables
  • 33. Processing Data Recode data Label variables Format data
  • 34. Investigation Reality checks Out-of-range values Descriptive statistics Ranges: out-of-range or improbable values Frequencies: missing values or classes Simple graphical display
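The reality checks listed above can be sketched in a few lines. This is only an illustration: the variable, the valid range, and the use of `None` for missing values are assumptions, not part of any survey's documentation.

```python
from collections import Counter


def out_of_range(values, low, high):
    """Return (index, value) pairs for non-missing values outside [low, high]."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]


def frequency_table(values):
    """Count each distinct value, with missing values tallied under 'MISSING'."""
    return Counter("MISSING" if v is None else v for v in values)


# Hypothetical ages for a sample that should only contain children 3 to 17.
ages = [5, 12, None, 17, 170, 3, 12]
suspect = out_of_range(ages, 3, 17)
freq = frequency_table(ages)
```

Flagged records still need a human decision: a value such as 170 may be a key-punch error for 17, but that cannot be determined from the data alone.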
  • 36. Imputing Missing Values Increases available data Statistically more complex Defensibility Useful
  • 37. Lesson Two Spend time up-front being sure about your data Foundation of sand or stone? Crystal clear case definition & recodes More time preparing than analyzing Prevents problems Simplifies analysis
  • 44. Diagnostics Independence Homoskedasticity Skewness Influential observations
  • 45. Lesson Three Plan, then execute the plan Conform statistical technique to outcome and design Diagnostics
  • 46. Case Study Ongoing spatial epidemiology project Complex survey Cross-sectional Data linkage Childhood asthma episodes Air pollution exposure
  • 47. Case Study Air pollutant: acrolein EPA attributes >90% of non-cancer respiratory health effects to acrolein No epidemiology to date
  • 49. National Health Interview Survey Health outcome Asthma episode in last 12 months 2000 – 2004 Children 3 – 17 years old Parents of ~66,000 kids surveyed Nationally representative sample Complex survey weighting
  • 50. National Health Interview Survey Potential Confounders Smoking household Acrolein industry household Age, sex, race Education, income, single-parent family Access to care, insurance Urban/rural Census regional division
  • 51. National Air Toxics Assessment Air pollutant Acrolein Strong respiratory irritant Cigarette smoke; industrial emissions 2002 Modeled exposure assessment Census tracts nationwide
  • 52. How would you link these two databases?
  • 54. But, requires access to confidential NHIS data.
  • 55. NCHS Says Orient to data structure and contents Locate variables Download data Append & merge data Clean & recode data Format & label variables
  • 56. NHIS Data Processing Extract and compile data by year Multiple files 2004 redesign Compile data 2000 – 2004 Formatting and variable names a pain Identify records with complete data Link to NATA Done confidentially by NCHS staff
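Because the NATA exposure estimates are reported by census tract, one plausible way to link the two sources is a lookup on a shared tract identifier. The sketch below is an assumption about the general approach, not the procedure NCHS staff followed; the field names (`tract_id`, `acrolein`) and all values are made up for illustration.

```python
def link_by_tract(survey_records, tract_exposure):
    """Attach a tract-level exposure to each survey record; report records with no match."""
    linked, unmatched = [], []
    for rec in survey_records:
        exposure = tract_exposure.get(rec["tract_id"])
        if exposure is None:
            unmatched.append(rec)  # keep these for review rather than dropping silently
        else:
            linked.append({**rec, "acrolein": exposure})
    return linked, unmatched


# Hypothetical inputs: two children, one tract with a modeled concentration.
survey = [
    {"child_id": "A", "tract_id": "T001", "asthma_episode": 1},
    {"child_id": "B", "tract_id": "T999", "asthma_episode": 0},
]
exposure_by_tract = {"T001": 0.05}
linked, unmatched = link_by_tract(survey, exposure_by_tract)
```

Returning the unmatched records separately makes it easy to check how many respondents were lost in the linkage, which matters when identifying records with complete data.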
  • 57. Analysis Plan Hypothesis “Childhood asthma episodes are associated with census-tract-level estimates of acrolein exposure” Descriptive statistics Logistic regression Complex weighted variance estimation SAS-callable SUDAAN
  • 58. Wisdom Network Cultivate relationships Front-line staff Principal investigators
  • 59. Wisdom No one cares more about your problem than you Or, you should
  • 60. Wisdom Teach yourself Learn to learn
  • 61. Contact B. Rey de Castro, Sc.D. jsq7@cdc.gov http://www.slideshare.net/intelligo/secondary-data-analysis-5972949

Editor's Notes

  1. Stage 1: Primary sampling units (PSUs) are selected. These are mostly single counties or, in a few cases, groups of contiguous counties, selected with probability proportional to a measure of size (PPS).
     Stage 2: The PSUs are divided up into segments (generally city blocks or their equivalent). As with each PSU, sample segments are selected with PPS.
     Stage 3: Households within each segment are listed, and a sample is randomly drawn. In geographic areas where the proportion of age, ethnic, or income groups selected for oversampling is high, the probability of selection for those groups is greater than in other areas.
     Stage 4: Individuals are chosen to participate in NHANES from a list of all persons residing in selected households. Individuals are drawn at random within designated age-sex-race/ethnicity screening subdomains. On average, 1.6 persons are selected per household.
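Selection with probability proportional to size, as in Stages 1 and 2, can be sketched as a weighted random draw. This is a simplification that samples with replacement, so the same unit can be drawn twice; real survey designs typically sample without replacement. The county names and sizes are invented for illustration.

```python
import random


def pps_sample(sizes, n, seed=None):
    """Draw n units with replacement, each draw proportional to the unit's size measure."""
    rng = random.Random(seed)
    names = list(sizes)
    weights = [sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


# Hypothetical stratum of three counties with population as the size measure.
county_sizes = {"County A": 50_000, "County B": 200_000, "County C": 750_000}
selected = pps_sample(county_sizes, 2, seed=0)
```

Under this scheme County C is fifteen times as likely as County A to be picked on any single draw, which is why the resulting records need weights that undo the unequal selection probabilities.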