What’s Significant?
 Hypothesis Testing, Effect Size, Confidence
      Intervals, & the p-Value Fallacy

Patrick B. Barlow, The University of Tennessee
On the Agenda…

•   Recap of causation
•   The basics of hypothesis testing
     – From research question to testable hypothesis
•   Effect size
     – What is it?
     – What can impact effect size?
•   Confidence Intervals
     – What are they?
     – How do you interpret?
     – What are the implications for interpreting statistical findings?
•   Statistical significance & p-values
     – What counts as “statistically significant”?
     – Weaknesses of the p-value
     – The p-value fallacy
•   Putting it all Together
Recap: Bradford Hill Criteria

•   Strength of causal inference is affected by a number of different factors:
     – Strength of association
     – Consistency
     – Specificity
     – Temporal relationship
     – Biological gradient
     – Plausibility
     – Coherence
     – Experiment (reversibility)
     – Analogy (consideration of alternate explanations)
From research question to testable hypothesis
Statistical significance & p-values


THE BASICS OF HYPOTHESIS
TESTING
The Basics of Hypothesis Testing

In statistics, hypothesis testing forms the basis for the majority of
inferential statistical tests.
• Three basic components:
     –   Null hypothesis (H0)
     –   Alternative/research hypothesis (H1)
     –   Error
           •   Type I
           •   Type II


•   Was originally conceived as a way to minimize error over infinite trials
    rather than specify the absolute “truth” in a single scenario.
     –   Goodman equated hypothesis testing to, “a system of justice that is not concerned with
         which individual defendant is found guilty or innocent…but tries instead to control the
         overall number of incorrect verdicts.”
The Basics of Hypothesis Testing

Null Hypothesis (H0)
• Almost always the statement that no difference or relationship exists between the variables of interest.
• Example: A study looking at deep vein thrombosis (DVT) & the risk of pulmonary embolism (PE)
   – The null hypothesis would be…
   – “Having DVT does not increase one’s risk for developing a PE.”

Alternative Hypothesis (H1)
• The statement that you will be trying to “prove” by conducting your inferential statistics.
• It is almost always the statement that a difference or relationship does exist between the variables of interest.
• What would be an alternative hypothesis for our example?
   – “Having DVT increases the risk of developing a PE.”
The Basics of Hypothesis Testing

The two most common errors we encounter in statistical testing are Type I
& Type II error. Both of these errors pose serious risks to the integrity of
your conclusions if ignored.

•    Type I error: falsely concluding a statistically significant relationship
     does exist when in fact it does not
      –     “Alpha”, “False positive”, “False alarm”, “Red-herring”, etc.
      –     Origin of the “p<.05” as statistically significant.


•    Type II error: failing to detect a statistically significant relationship
     when in fact one does exist
      –     “Beta”, “Miss”, “False negative”
      –     Statistical power & Type II error

The probabilities of committing the two errors are interdependent: for a fixed sample size and effect size, lowering one raises the other. The researcher/analyst must therefore consider which error would be more costly to their study.
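The trade-off between the two errors can be sketched numerically. The snippet below is a minimal illustration, not part of the original slides: it assumes a two-sided, two-sample z-test, a hypothetical standardized effect of 0.5, and 50 participants per group, and it ignores the tiny chance of rejecting in the wrong direction.

```python
from statistics import NormalDist

def type_ii_error(alpha, effect, n_per_group):
    """Approximate beta (Type II error rate) for a two-sided, two-sample z-test.

    effect is a hypothetical standardized mean difference (an assumption
    made for illustration, not a value from the slides).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)              # cutoff implied by alpha
    shift = effect * (n_per_group / 2) ** 0.5      # expected z under H1
    power = 1 - z.cdf(z_crit - shift)
    return 1 - power

# Stricter alpha (fewer false positives) means more misses, all else equal
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=0.5, n_per_group=50)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.2f}")
```

Holding the effect and sample size fixed, each tightening of alpha pushes beta up, which is the interdependence described above.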
Your Turn

Instructions: In groups of 2-3, work together to brainstorm at least two research questions/topics, & answer each of the following questions.

Questions (for each research topic)
1. What is your research question?
2. What would you propose to use as a research design?
3. What would be the null hypothesis?
4. What are two possible alternative/research hypotheses that could be tested?
5. Considering the relationship between Type I & II error, which would be more costly/serious to commit if conducting your particular study?

Be prepared to discuss your answers!
What is it?
How do we interpret effect sizes?
How does effect size relate to issues of statistical power, sample
size, and error?


EFFECT SIZE
What is it?
Generally speaking, the effect
size represents the magnitude
or strength of the relationship
between two variables.

•   The proportion of variance
    in the DV explained by your
    IV.
      •   Example…
•   The difference in the mean
    on your DV among levels of
    your IV.
      •   Example…
•  The difference in proportion
   of patients with an outcome
   in the exposed vs. the
   unexposed groups of your
   IV.
Two types
1. Unstandardized Effect
    Sizes:
2. Standardized Effect Sizes:
How do we interpret unstandardized effect sizes?

Interpreted in the same metric as your variables.

Example:
In a fitness study looking at differences between the sexes, men (M=26.0, SD=3.0) reported significantly higher average BMI than women (M=23.0, SD=2.5), p = .02.

What is the unstandardized effect size?

[Figure: Average BMI Between Men & Women Following Physical Fitness Intervention. Y-axis: Average BMI (kg/m2); x-axis: Pre Intervention, Post Intervention; series: Men, Women. Annotation: Mean difference = 3.0 kg/m2.]
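As a quick check on the BMI example, the unstandardized effect size is simply the difference between the two group means, kept in the DV's own units. The function name below is made up for illustration.

```python
def mean_difference(mean_a, mean_b):
    """Unstandardized effect size: difference in group means, in the DV's own units."""
    return mean_a - mean_b

men_bmi, women_bmi = 26.0, 23.0   # group means from the fitness example
diff = mean_difference(men_bmi, women_bmi)
print(f"Men averaged {diff:.1f} kg/m2 higher BMI than women.")
```

The interpretation stays in kg/m2, which is what makes the effect size "unstandardized".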
Your Turn

In pairs, calculate & interpret (in sentence format) the unstandardized effect
size. Be ready to share your interpretations.
1.   Patients admitted to “academic” hospital clinics (M=.50, SD=.40) had
     lower average 90-day readmissions than patients seen by non-
     academic clinics (M=1.5, SD=.75), p = .02.
2.   A researcher looks at differences in number of side effects patients had
     on three different drugs (A, B, and C). Comparison of Drug “A” to
     Drug “B” shows average side effects to be 4 (SD=2.5) and
     7 (SD=4.8), respectively, p=.04
3.   An article shows a difference in average number of COPD-related
     readmissions before (M=1.5, SD=2.0) and after (M=.05, SD=.90) a
     patient education intervention, p=.08.
4.   An article shows a difference in average number of COPD-related
     readmissions before (M=1.5, SD=2.0) and after (M=.05, SD=.90), and
     six months following a patient education intervention
     (M=0.80, SD=3.0), p =.12
How do we interpret standardized
         effect sizes?



Two of the most common standardized effect
sizes are Risk / Odds Ratios and Pearson r/R2
Interpreting ORs and RRs
• Odds/Risk ratio ABOVE 1.0 = Your exposure INCREASES
  risk of the event occurring
   – For OR/RRs between 1.00 and 1.99, the odds/risk is increased by (OR – 1) × 100%.
   – For OR/RRs of 2.00 or higher, the odds/risk is OR times as high, but you could also still use (OR – 1) × 100%.
• Example:
   – Smoking is found to increase your odds of breast cancer by OR = 1.25. What is the increase in odds?
       • Your odds of having breast cancer are 25% higher if you are a smoker.
   – Smoking is found to increase your risk of developing lung cancer by RR = 4.8. What is the increase in risk?
       • You are 4.8 times as likely to develop lung cancer if you are a smoker vs. a non-smoker.
Interpreting ORs and RRs
• Odds/Risk ratio BELOW 1.0 = Your exposure DECREASES risk of the event occurring
  – The odds/risk is decreased by (1 – OR) × 100%
  – Often called a PROTECTIVE effect

• Example:
  – Addition of the new guidelines for pacemaker/ICD interrogation produced an OR for device interrogation of OR = .30 versus the old guidelines. What is the reduction in odds?
     • (1 – OR) × 100% = (1 – .30) × 100% = 70% reduction in odds.
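The percent-change rule for ratios above and below 1.0 can be written as one small helper. This is only a sketch of the arithmetic; the function name is hypothetical and the three inputs are the OR/RR values used in the examples.

```python
def percent_change(ratio):
    """Return (direction, percent) for an odds ratio or risk ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    percent = round(abs(ratio - 1) * 100, 1)   # distance from the null value 1.0
    if ratio > 1:
        return "increase", percent
    if ratio < 1:
        return "decrease", percent
    return "no change", 0.0

for label, value in [("OR", 1.25), ("RR", 4.8), ("OR", 0.30)]:
    direction, percent = percent_change(value)
    print(f"{label} = {value}: {percent}% {direction}")
```

Note that RR = 4.8 can be read either as "4.8 times as likely" or as a 380% increase; both describe the same ratio.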
Your Turn

Instructions: Feel free to make up your own examples or just use, “Odds/Risk of having disease if you have the exposure of interest.” What does the OR/RR say about the strength of relationship?

Practice
1.   OR = 3.00
2.   OR = .39
3.   RR = 1.50
4.   OR = 1.00
5.   RR = .22
6.   RR = 18.99
7.   OR = .78
8.   RR = 6.30
Interpreting r / R2

Pearson r
• Provides the strength of a linear relationship between exactly two continuous, quantitative variables.
• Can vary from -1 (perfect negative) to 1 (perfect positive)
• Most correlational studies only report r

R2
• Literally calculated as the square of an r statistic.
• Also known as the coefficient of determination
• Provides the proportion of shared variance between your IV and DV
   – What’s the range?
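To make the link between r and R2 concrete, here is a minimal sketch that computes Pearson r from its definition and then squares it. The data are made up purely for illustration, and the variable names are assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Made-up data for illustration only
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(f"r = {r:.2f}, R2 = {r ** 2:.2f}")
```

Because R2 is a square, the sign of r is lost: a strong negative and a strong positive correlation share the same proportion of variance.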
How do we interpret effect sizes?
How does effect size relate to issues of
 statistical power, sample size, and
                error?
Effect size vs. Statistical Power, sample size, and error.
• As effect size increases, statistical power also increases. This means that (1) you need a smaller sample size, and (2) you have a lower chance of making a Type II error (i.e. a “miss”).

    So, when possible, measure for a large effect size!
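The relationship between effect size and required sample size can be sketched with the usual normal-approximation formula. This is an illustrative assumption, not something from the slides: it treats the effect as a standardized mean difference in a two-sided, two-sample z-test with alpha = .05 and 80% power.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample z-test.

    effect is a hypothetical standardized mean difference.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # cutoff for the Type I error rate
    z_beta = z.inv_cdf(power)            # cutoff for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect) ** 2)

# Larger effects need far fewer participants for the same power
for effect in (0.2, 0.5, 0.8):
    print(f"effect = {effect}: about {n_per_group(effect)} per group")
```

Required sample size scales with 1 / effect squared, so halving the effect roughly quadruples the sample needed.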
An OR/RR is only as important as the confidence interval that comes with it!




What are they?
How do you interpret?
How do they affect our conclusions?


CONFIDENCE
INTERVALS
What are they?
• Confidence intervals indicate, as the name suggests, how much confidence we can place in a particular sample estimate.
• Provide the range of values within which we are confident the true population parameter (e.g. mean, proportion, etc.) lies.
• Usually set at 95%
• They are calculated by using:
   • Point estimate for your sample (e.g. mean, proportion, OR)
   • Standard error of that estimate (SE)
   • Critical value (e.g. t) for the chosen confidence level, based on the degrees of freedom for the sample
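The ingredients listed above combine as point estimate ± critical value × standard error. The sketch below assumes a large-sample normal (z) critical value so it can run on the standard library alone; with a small sample like this made-up one, a t critical value with n − 1 degrees of freedom would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Large-sample (z-based) confidence interval for a mean."""
    n = len(values)
    point = mean(values)                     # point estimate
    se = stdev(values) / sqrt(n)             # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return point - z_crit * se, point + z_crit * se

readings = [24.1, 26.3, 25.0, 27.8, 23.9, 26.5, 25.4, 24.8]  # made-up BMI values
low, high = mean_ci(readings)
print(f"mean = {mean(readings):.2f}, 95% CI = {low:.2f} to {high:.2f}")
```

Raising the confidence level (for example to 99%) widens the interval, since a larger critical value multiplies the same standard error.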
What are they? OR / RR example

 95% confidence intervals are reported with any OR/RR calculation to indicate the precision of the estimate.
 • Size Matters!
      – Wide CI = weaker inference
      – Narrow CI = stronger inference
      – CI crosses over 1.0 = non-significant
 • Any 95% CI can instantly give us a sense of:
      1. Sample size (wider intervals generally come from smaller samples)
      2. Accuracy of estimation
      3. Statistical significance
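One common way to attach a 95% CI to an odds ratio is the Wald interval on the log scale. The sketch below assumes that method (the slides do not say which one was used) and uses hypothetical 2x2 counts; the second table is simply the first multiplied by 10, to show how sample size alone changes the width.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald (log-scale) CI from a 2x2 table.

    a = exposed with event,   b = exposed without event
    c = unexposed with event, d = unexposed without event
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - z_crit * se_log_or)
    upper = exp(log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts; same OR, ten times the sample in the second table
for table in [(20, 80, 10, 90), (200, 800, 100, 900)]:
    or_, lo, hi = odds_ratio_ci(*table)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} - {hi:.2f})")
```

Both tables give the same OR, but only the larger sample produces an interval narrow enough to exclude 1.0.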
Interpreting 95% Confidence Intervals

95% CI of an Odds or Risk Ratio
• What you read…
   – OR = 4.5 (95% CI = 2.8 – 6.1)
• What you interpret…
   – Lower bound: OR = 2.8
   – Upper bound: OR = 6.1
• How you interpret…
   – “We are 95% confident that the true odds ratio of disease for exposed vs. unexposed lies between 2.8 and 6.1.”

Your Turn

Interpret these 95% CIs
1. OR 2.4 (95% CI 1.7 - 3.3)
2. OR 6.7 (95% CI 1.4 - 107.2)
3. OR 1.2 (95% CI .147 - 1.97)
4. OR .37 (95% CI .22 - .56)
5. OR .57 (95% CI .12 - .99)
6. OR .78 (95% CI .36 – 1.65)
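The reading rule for a ratio's 95% CI (does the interval include 1.0, and on which side does it sit?) can be sketched as a small helper. The function name and wording of the verdicts are assumptions; the input is the worked example OR = 4.5 (2.8 to 6.1).

```python
def interpret_ratio_ci(estimate, lower, upper):
    """Describe a ratio estimate and whether its CI excludes the null value 1.0."""
    if lower <= 1.0 <= upper:
        verdict = "not statistically significant (interval includes 1.0)"
    elif lower > 1.0:
        verdict = "significant increase"
    else:
        verdict = "significant decrease (protective)"
    return f"OR {estimate} (95% CI {lower} - {upper}): {verdict}"

print(interpret_ratio_ci(4.5, 2.8, 6.1))
```

The same check applies to risk ratios; for mean differences the null value would be 0 rather than 1.0.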
What counts as “statistically significant”?
Weaknesses of the p-value
The p-value fallacy


STATISTICAL
SIGNIFICANCE
What counts as “Statistically
             significant?”
• To be considered statistically significant, the probability of obtaining a value of the test statistic (e.g. t, z, F, or χ2) at least as extreme as the one observed, assuming the null hypothesis is true, must be smaller than the accepted probability of committing a Type I error.

• In other words, the probability (p) must be less than (<) what you have chosen for your alpha value (usually .05).
   – So, in most cases we conclude that a relationship is statistically significant if the test returns a p<.05.
Interpretation & Practice
• If a statistically significant relationship is found, then we conclude that the observed relationship is too great to be plausibly explained by chance alone.
• Which of the following are statistically
  significant results?
  1.   t(34)=5.89, p = .002
  2.   F(3, 285)=1.09, p = .101
  3.   χ2(4)=18.78, p = .04
  4.   t(68) = 4.25, p = .05
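The decision rule itself is a strict inequality, which matters at the boundary. A minimal sketch, using illustrative p-values rather than the practice items:

```python
ALPHA = 0.05  # conventional Type I error rate

def is_significant(p_value, alpha=ALPHA):
    """Apply the strict decision rule p < alpha."""
    return p_value < alpha

for p in (0.049, 0.05, 0.051):
    print(f"p = {p}: significant? {is_significant(p)}")
```

Under the rule as stated (p less than alpha), a result of exactly p = .05 does not count as statistically significant.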
Weakness of p-values

• Not truly compatible with hypothesis testing
    – Absence of evidence vs. evidence of absence

• Never meant to be the sole indicator of significance
    – Average knowledge of statistical interpretation in evidence-based
      professions

• No consideration of effect size

• What influences p-values?
    –   Sample size
    –   Chance
    –   Effect size
    –   Statistical power
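To see how sample size alone moves a p-value, the sketch below holds a hypothetical standardized effect fixed at 0.3 and varies only n. It assumes a two-sided, two-sample z-test, which is a simplification chosen so the example runs on the standard library.

```python
from statistics import NormalDist

def two_sided_p(effect, n_per_group):
    """Two-sided p-value for a two-sample z-test on a standardized mean difference."""
    z_stat = effect * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Same effect size, different sample sizes
for n in (20, 50, 200):
    print(f"n per group = {n}: p = {two_sided_p(0.3, n):.4f}")
```

The effect is identical in every row, yet the p-value crosses .05 only once the sample is large enough, which is why a p-value says little about effect size on its own.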
The “p-value fallacy”

P-values have become the “have your cake and eat
it too” of the statistical world.

• You get the supposed accuracy of a single study
  (short term) while being able to simultaneously
  avoid errors in the long term.

• Comes from misinterpretation of p-values as
  absolute indicators of the strength of a
  relationship. That is, seeing p = .03 as more
  significant than p = .04.
How to use multiple sources to become a better consumer of
Epidemiologic Evidence

PUTTING IT ALL TOGETHER
Going beyond the p-value

• Measures of effect size provide a far more vivid
  description of the magnitude of the relationship.
   – An OR of 4.30 is stronger than an OR of 1.50.
   – A mean difference of 35pts is larger than a mean
     difference of 20pts.
   – 65% of the variance is more than 20% of the variance

• The 95% CI provides far more information on the
  accuracy of the inference.
   – Which is more accurate?
      • OR = 2.5 (95% CI = 1.2 – 10.0) vs. OR = 2.5 (95% CI = 1.2 –
        3.1)
When reading an article…

Always consider:
1. What is the research question? Have the
   researchers used the correct null &
   alternative hypotheses?
2. How large is the…
  − Sample? Subgroup? Etc.
  − Effect size? (standardized or unstandardized)
  − Confidence interval?
3. Finally, what is the p-value?
Just because a finding is not significant does not mean that it is not meaningful. You should always consider the effect size and context of the research when making a decision about whether or not any finding is clinically relevant.

  • 3. Recap: Bradford Hill Criteria • Strength of causal inference is affected by a number of different factors: – Strength of association – Consistency – Specificity – Temporal relationship – Biological gradient – Plausibility – Coherence – Experiment (reversibility) – Analogy (consideration of alternate explanations)
  • 4. From research question to testable hypothesis Statistical significance & p-values THE BASICS OF HYPOTHESIS TESTING
  • 5. The Basics of Hypothesis Testing In statistics, hypothesis testing forms the basis for the majority of inferential statistical tests. • Three basic components: – Null hypothesis (H0) – Alternative/research hypothesis (H1) – Error • Type I • Type II • Was originally conceived as a way to minimize error over infinite trials rather than specify the absolute “truth” in a single scenario. – Goodman equated hypothesis testing to, “a system of justice that is not concerned with which individual defendant is found guilty or innocent…but tries instead to control the overall number of incorrect verdicts.”
  • 6. The Basics of Hypothesis Testing Null Hypothesis (H0) • Almost always the statement that no difference or relationship exists between the variables of interest. • Example: A study looking at deep vein thrombosis (DVT) & the risk of pulmonary embolism (PE) – The null hypothesis would be… – “Having DVT does not increase one’s risk for developing a PE.” Alternative Hypothesis (H1) • The statement that you will be trying to “prove” by conducting your inferential statistics. • It is almost always the statement that a difference or relationship does exist between the variables of interest. • What would be an alternative hypothesis for our example? – “Having DVT increases the risk of developing a PE.”
  • 7. The Basics of Hypothesis Testing The two most common errors we encounter in statistical testing are Type I & Type II error. Both of these errors pose serious risks to the integrity of your conclusions if ignored. • Type I error: falsely concluding that a relationship exists when in fact it does not – “Alpha”, “False positive”, “False alarm”, “Red-herring”, etc. – Origin of the “p<.05” as statistically significant. • Type II error: failing to detect a relationship when in fact one does exist – “Beta”, “Miss”, “False negative” – Statistical power & Type II error The probabilities of committing the two errors are interdependent, so the researcher/analyst must consider which error would be more costly to their study.
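The interdependence of the two errors can be illustrated numerically. The sketch below is not from the slides: it assumes a two-sided, two-sample z-test with equal group sizes, and the standardized effect (`effect`) and per-group sample size (`n_per_group`) are hypothetical values chosen only for illustration. Under those assumptions, a stricter alpha produces a larger beta.

```python
from statistics import NormalDist


def type_ii_error(alpha, effect, n_per_group):
    """Approximate beta for a two-sided, two-sample z-test.

    Assumes equal group sizes and a standardized mean difference
    `effect` (hypothetical inputs, not taken from the slides).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # rejection cutoff set by alpha
    shift = effect * (n_per_group / 2) ** 0.5   # expected test statistic under H1
    # Power = P(|Z| > z_crit) when Z is normal with mean `shift` and SD 1
    power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    return 1 - power


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=0.5, n_per_group=50)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

With sample size and effect held fixed, the only way to lower both errors at once is to collect more data, which is why the relative cost of each error has to be weighed at the design stage.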
  • 8. Your Turn Instructions In groups of 2-3, work together to brainstorm at least two research questions/topics, & answer each of the following questions: Questions (for each research topic) 1. What is your research question? 2. What would you propose to use as a research design? 3. What would be the null hypothesis? 4. What are two possible alternative/research hypotheses that could be tested? 5. Considering the relationship between Type I & II error, which would be more costly/serious to commit if conducting your particular study? Be prepared to discuss your answers!
  • 9. What is it? How do we interpret effect sizes? How does effect size relate to issues of statistical power, sample size, and error? EFFECT SIZE
  • 10. What is it? Generally speaking, the effect size represents the magnitude or strength of the relationship between two variables. • The proportion of variance in the DV explained by your IV. • Example… • The difference in the mean on your DV among levels of your IV. • Example… • The difference in proportion of patients with an outcome in the exposed vs. the unexposed groups of your IV. Two types 1. Unstandardized Effect Sizes: 2. Standardized Effect Sizes:
  • 11. How do we interpret unstandardized effect sizes? Interpreted in the same metric as your variables. Example: In a fitness study looking at differences between the sexes, men (M=26.0, SD=3.0) reported significantly higher average BMI than women (M=23.0, SD=2.5), p = .02. What is the unstandardized effect size? [Figure: bar chart “Average BMI Between Men & Women Following Physical Fitness Intervention”; y-axis: Average BMI; groups: Men, Women; x-axis: Pre Intervention, Post Intervention; annotation: Mean difference = 3.0 kg/m2]
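As a minimal sketch of the calculation the slide asks for, the unstandardized effect size here is simply the difference between the two group means, expressed in the units of the dependent variable. The function name `mean_difference` is introduced only for this illustration; the numbers are the ones given in the BMI example.

```python
def mean_difference(mean_a, mean_b):
    """Unstandardized effect size: difference in group means, in the DV's own units."""
    return mean_a - mean_b


# Means from the slide's BMI example (men vs. women)
diff = mean_difference(26.0, 23.0)
print(f"Men averaged {diff:.1f} kg/m2 higher BMI than women.")
```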
  • 12. Your Turn In pairs, calculate & interpret (in sentence format) the unstandardized effect size. Be ready to share your interpretations. 1. Patients admitted to “academic” hospital clinics (M=.50, SD=.40) had lower average 90-day readmissions than patients seen by non-academic clinics (M=1.5, SD=.75), p = .02. 2. A researcher looks at differences in number of side effects patients had on three different drugs (A, B, and C). Comparison of Drug “A” to Drug “B” shows average side effects to be 4 (SD=2.5) and 7 (SD=4.8), respectively, p=.04. 3. An article shows a difference in average number of COPD-related readmissions before (M=1.5, SD=2.0) and after (M=.05, SD=.90) a patient education intervention, p=.08. 4. An article shows a difference in average number of COPD-related readmissions before (M=1.5, SD=2.0), after (M=.05, SD=.90), and six months following a patient education intervention (M=0.80, SD=3.0), p =.12.
  • 13. How do we interpret standardized effect sizes? Two of the most common standardized effect sizes are Risk / Odds Ratios and Pearson r/R2
  • 14. Interpreting ORs and RRs • Odds/Risk ratio ABOVE 1.0 = Your exposure INCREASES risk of the event occurring – For OR/RRs between 1.00 and 1.99 the risk is increased by (OR – 1) × 100%. – For OR/RRs 2.00 or higher, the risk is OR times as high, but you could also still use (OR – 1) × 100%. • Example: – Smoking is found to increase your odds of breast cancer by OR = 1.25. What is the increase in odds? • Your odds of breast cancer are 25% higher if you are a smoker. – Smoking is found to increase your risk of developing lung cancer by RR = 4.8. What is the increase in risk? • You are 4.8 times as likely to develop lung cancer if you are a smoker vs. non-smoker.
  • 15. Interpreting ORs and RRs • Odds/Risk ratio BELOW 1.0 = Your exposure DECREASES risk of the event occurring – The risk is decreased by (1 – OR) × 100% – Often called a PROTECTIVE effect • Example: – Addition of the new guidelines for pacemaker/ICD interrogation produced an OR for device interrogation of OR = .30 versus the old guidelines. What is the reduction in odds? • (1 – OR) × 100% = (1 – .30) × 100% = 70% reduction in odds.
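The two interpretation rules (above 1.0 and below 1.0) can be sketched as a single helper. The function `describe_ratio` is a hypothetical name introduced for this illustration; it only restates the percent-change arithmetic from the slides, applied to the three ratios used in their examples.

```python
def describe_ratio(ratio):
    """Turn an OR/RR into the percent-change wording used on the slides."""
    if ratio <= 0:
        raise ValueError("an odds or risk ratio must be positive")
    if ratio > 1:
        return f"{(ratio - 1) * 100:.0f}% increase"
    if ratio < 1:
        return f"{(1 - ratio) * 100:.0f}% decrease (protective)"
    return "no difference"


# Ratios taken from the slide examples
for ratio in (1.25, 4.8, 0.30):
    print(f"ratio = {ratio}: {describe_ratio(ratio)}")
```

Whether the percentage refers to odds or to risk depends on which ratio was reported; the arithmetic is the same, but the wording of the interpretation should match the measure.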
  • 16. Your Turn Instructions Feel free to make up your own examples or just use, “Odds/Risk of having disease if you have the exposure of interest.” What does the OR/RR say about the strength of relationship? Practice 1. OR = 3.00 2. OR = .39 3. RR = 1.50 4. OR = 1.00 5. RR = .22 6. RR = 18.99 7. OR = .78 8. RR = 6.30
  • 17. Interpreting r / R2 Pearson r • Provides the strength of a linear relationship between exactly two continuous, quantitative variables. • Can vary between -1 (perfect negative) to 1 (perfect positive) • Most correlational studies only report r R2 • Literally calculated as the square of an r statistic. • Also known as the coefficient of determination • Provides the proportion of shared variance between your IV and DV – What’s the range?
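A minimal sketch of how r and R2 relate, using only the Python standard library. The data below are invented for illustration and do not come from the slides; `pearson_r` is a hypothetical helper name.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical IV and DV values, made up for this example
iv = [1, 2, 3, 4, 5]
dv = [2, 4, 5, 4, 5]

r = pearson_r(iv, dv)
r_squared = r ** 2  # coefficient of determination: proportion of shared variance
print(f"r = {r:.2f}, R2 = {r_squared:.2f}")
```

Because R2 is a square, it loses the sign of r, so the direction of the relationship has to be read from r itself.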
  • 18. How do we interpret effect sizes?
  • 19. How does effect size relate to issues of statistical power, sample size, and error? Effect size vs. Statistical Power, sample size, and error. • As effect size increases, statistical power also increases, which means that (1) you need a smaller sample size, and (2) you have a lower chance of making a Type II error (i.e. a “miss”). So, when possible, measure for a large effect size!
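The link between effect size and required sample size can be made concrete with a rough calculation. This sketch is an illustration under stated assumptions, not the presenter's method: it uses the normal-approximation formula for a two-sided, two-sample comparison of means, treats `effect` as a standardized mean difference, and defaults to alpha = .05 and power = .80.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample z-test.

    `effect` is a standardized mean difference (an assumption of this
    sketch; the slides do not fix a particular effect-size metric).
    """
    if effect <= 0:
        raise ValueError("effect must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff controlling Type I error
    z_power = z.inv_cdf(power)          # quantile controlling Type II error
    return ceil(2 * ((z_alpha + z_power) / effect) ** 2)


for effect in (0.2, 0.5, 0.8):
    print(f"effect = {effect}: about {n_per_group(effect)} per group")
```

The required n falls with the square of the effect size, which is why small effects need very large samples to be detected reliably.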
  • 20. An OR/RR is only as important as the confidence interval that comes with it! What are they? How do you interpret? How do they affect our conclusions? CONFIDENCE INTERVALS
  • 21. What are they? • Confidence intervals convey, as the name suggests, how much confidence we can place in a particular inferential statistic. • Provide the range of values within which we are confident the true population parameter (e.g. mean, proportion, etc.) lies. • Usually set at 95% • They are calculated by using: • Standard error of the estimate (e.g. standard error of the mean, Sm or SE) • Point estimate for your sample (e.g. mean, OR) • Degrees of freedom for the sample (which set the critical value, e.g. of the t statistic)
  • 22. What are they? OR / RR example 95% Confidence intervals are added to any OR/RR calculation to indicate the precision of the estimate. • Size Matters! – Wide CI = weaker inference – Narrow CI = stronger inference – CI crosses over 1.0 = non-significant • Any 95% CI can instantly tell us about: 1. Sample size (larger samples give narrower intervals) 2. Accuracy of estimation 3. Statistical significance
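To show how interval width, sample size, and significance connect, here is a sketch that computes an odds ratio and its 95% CI from a 2×2 table. It assumes the common Wald (Woolf) method on the log-odds scale, which the slides do not name, and the cell counts are hypothetical: the second table is the first multiplied by ten, so the OR is identical and only the sample size changes.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Wald (Woolf) confidence interval from a 2x2 table.

    a = exposed with event,    b = exposed without event
    c = unexposed with event,  d = unexposed without event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: same OR, ten times the sample size in the second table
for counts in ((20, 80, 10, 90), (200, 800, 100, 900)):
    odds_ratio, lower, upper = odds_ratio_ci(*counts)
    verdict = "non-significant" if lower <= 1.0 <= upper else "significant"
    print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} - {upper:.2f}): {verdict}")
```

In this made-up example the smaller study's interval crosses 1.0 while the larger study's does not, even though the point estimate is the same.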
  • 23. Interpreting 95% Confidence Intervals 95% CI of an Odds or Risk Ratio • What you read… – OR = 4.5 (95% CI = 2.8 – 6.1) • What you interpret… – Lower bound: OR = 2.8 – Upper bound: OR = 6.1 • How you interpret… – “We are 95% confident that the true odds ratio of disease for exposed vs. unexposed lies between 2.8 and 6.1.” Your Turn Interpret these 95% CIs 1. OR 2.4 (95% CI 1.7 - 3.3) 2. OR 6.7 (95% CI 1.4 - 107.2) 3. OR 1.2 (95% CI .147 - 1.97) 4. OR .37 (95% CI .22 - .56) 5. OR .57 (95% CI .12 - .99) 6. OR .78 (95% CI .36 – 1.65)
  • 24. What counts as “statistically significant”? Weaknesses of the p-value The p-value fallacy STATISTICAL SIGNIFICANCE
  • 25. What counts as “Statistically significant?” • To be considered statistically significant, the probability of obtaining a value of the test statistic (e.g. t, z, F, or χ2) at least as extreme as the one observed, assuming the null hypothesis is true, must be smaller than the probability you have accepted for committing a Type I error. • In other words, the probability (p) must be less than (<) what you have chosen for your alpha value (usually .05). – So, in most cases we conclude that a relationship is statistically significant if the test returns a p<.05.
  • 26. Interpretation & Practice • If a statistically significant relationship is found, then we conclude that the observed relationship is too large to be plausibly explained by chance alone. • Which of the following are statistically significant results? 1. t(34)=5.89, p = .002 2. F(3, 285)=1.09, p = .101 3. χ2(4)=18.78, p = .04 4. t(68) = 4.25, p = .05
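The decision rule itself is a strict comparison of p against alpha. The sketch below uses hypothetical p-values (not the practice items) to show the boundary case; `is_significant` and `ALPHA` are names introduced for this illustration.

```python
ALPHA = 0.05  # conventional Type I error rate; a chosen value, not a law


def is_significant(p_value, alpha=ALPHA):
    """Apply the strict rule p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return p_value < alpha


# Hypothetical p-values around the threshold
for p in (0.049, 0.05, 0.051):
    label = "significant" if is_significant(p) else "not significant"
    print(f"p = {p}: {label}")
```

Under the strict inequality, a result reported as exactly p = .05 does not meet the threshold.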
  • 27. Weakness of p-values • Not truly compatible with hypothesis testing – Absence of evidence vs. evidence of absence • Never meant to be the sole indicator of significance – Average knowledge of statistical interpretation in evidence-based professions • No consideration of effect size • What influences p-values? – Sample size – Chance – Effect size – Statistical power
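The influence of sample size on the p-value can be sketched by holding the observed effect constant and varying only n. This is an illustration under assumptions that are not in the slides: a two-sided, two-sample z-test with equal groups, and a hypothetical observed standardized mean difference of 0.3 in every scenario.

```python
from statistics import NormalDist


def two_sided_p(effect, n_per_group):
    """Two-sided p-value from a two-sample z-test (normal approximation).

    `effect` is the observed standardized mean difference; both inputs
    are hypothetical and serve only to illustrate the pattern.
    """
    z = effect * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n in (20, 50, 200):
    print(f"n per group = {n}: p = {two_sided_p(0.3, n):.4f}")
```

The same effect moves from non-significant to significant purely because the sample grows, which is why a p-value on its own says little about the magnitude of a relationship.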
  • 28. The “p-value fallacy” P-values have become the “have your cake and eat it too” of the statistical world. • You get the supposed accuracy of a single study (short term) while being able to simultaneously avoid errors in the long term. • Comes from misinterpretation of p-values as absolute indicators of the strength of a relationship. That is, seeing p = .03 as more significant than p = .04.
  • 29. How to use multiple sources to become a better consumer of Epidemiologic Evidence PUTTING IT ALL TOGETHER
  • 30. Going beyond the p-value • Measures of effect size provide a far more vivid description of the magnitude of the relationship. – An OR of 4.30 is stronger than an OR of 1.50. – A mean difference of 35pts is larger than a mean difference of 20pts. – 65% of the variance is more than 20% of the variance. • The 95% CI provides far more information on the accuracy of the inference. – Which is more accurate? • OR = 2.5 (95% CI = 1.2 – 10.0) vs. OR = 2.5 (95% CI = 1.2 – 3.1)
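One way to compare the precision of the two intervals in the slide's question is to measure their widths. Comparing on the log scale is a choice made for this sketch (it is commonly used for ratio measures, but the slides do not prescribe it), and `ci_log_width` is a hypothetical helper name.

```python
from math import log


def ci_log_width(lower, upper):
    """Width of a ratio-measure CI on the log scale (smaller = more precise)."""
    if lower <= 0 or upper <= lower:
        raise ValueError("need 0 < lower < upper")
    return log(upper) - log(lower)


# The two intervals from the slide, both reported with OR = 2.5
wide = ci_log_width(1.2, 10.0)
narrow = ci_log_width(1.2, 3.1)
print(f"log-width of 1.2 - 10.0: {wide:.2f}")
print(f"log-width of 1.2 - 3.1:  {narrow:.2f}")
```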
  • 31. When reading an article… Always consider: 1. What is the research question? Have the researchers used the correct null & alternative hypotheses? 2. How large is the… − Sample? Subgroup? Etc. − Effect size? (standardized or unstandardized) − Confidence interval? 3. Finally, what is the p-value?
  • 32. Just because a finding is not significant does not mean that it is not meaningful. You should always consider the effect size and context of the research when making a decision about whether or not any finding is clinically relevant.

Editor's notes

  1. Alternatively, the second example could be interpreted as: “Smoking increases your risk of lung cancer by 380% vs. non-smoking”
  2. Insert literature examples.