3. Null Hypothesis Significance Testing
• Goal
– determine whether mean differences among groups in an experiment are greater than the differences expected simply because of chance (error variation)
• First step
– assume that the groups do not differ (H0)
• = the null hypothesis
• equivalently, assume the independent variable had no effect
4. Null Hypothesis Significance Testing
• Next steps
– use probability theory to estimate the likelihood of the observed outcome, assuming the null hypothesis is true
– “statistically significant”
• the outcome has a small likelihood of occurring under H0
• reject H0
• conclude the IV had an effect on the DV
– the difference between means is larger than what would be expected if error variation alone caused the outcome (see the sketch below)
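As a minimal sketch of this decision rule, the Python example below runs an independent-samples t test on two simulated groups; the group sizes, means, standard deviation, and random seed are all hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # hypothetical seed, for reproducibility
control = rng.normal(loc=50, scale=10, size=30)    # no-treatment group
treatment = rng.normal(loc=58, scale=10, size=30)  # hypothetical IV effect of +8

# p-value: likelihood of a mean difference at least this large if H0 were true
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the difference exceeds what error variation alone would produce.")
else:
    print("Do not reject H0.")
```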
8. Null Hypothesis Significance Testing
• How small does the likelihood have to be to decide the outcome isn’t due to chance?
• scientific consensus: p < .05
• = alpha (α), the level of significance
• What does a statistically significant outcome tell us?
– an outcome at p ≈ .05 has about a 50/50 chance of being repeated (at p < .05) in an exact replication (see the sketch below)
– as the probability of the outcome decreases (e.g., p = .025, p = .01), the likelihood of observing a statistically significant outcome (p < .05) in an exact replication increases
– APA recommends reporting the exact probability of the outcome
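One way to see the 50/50 replication point: if the true effect size were exactly the value that puts an observed result on the p = .05 boundary, the power of an exact replication works out to roughly .5. A sketch using statsmodels, assuming a two-group design with a hypothetical n = 30 per group:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

n = 30            # hypothetical sample size per group
alpha = 0.05
df = 2 * n - 2

# Cohen's d that lands exactly on the two-sided p = .05 boundary:
# for equal groups, t = d * sqrt(n / 2), so d = t_crit * sqrt(2 / n)
t_crit = stats.t.ppf(1 - alpha / 2, df)
d_boundary = t_crit * np.sqrt(2 / n)

# If that observed d were the true effect, an exact replication's power is about .5
power = TTestIndPower().power(effect_size=d_boundary, nobs1=n, alpha=alpha)
print(f"d at the p = .05 boundary: {d_boundary:.3f}, replication power: {power:.2f}")
```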
10. Null Hypothesis Significance Testing
• What do we conclude when a finding is not statistically significant?
– do not reject the null hypothesis of no difference
– but do not accept the null hypothesis either
• don’t conclude that the IV didn’t produce an effect
– we cannot draw a conclusion about the effect of the IV
• some factor in the experiment may have prevented us from observing an effect of the IV
• most common factor: too few participants (illustrated below)
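To illustrate how too few participants can hide a real effect, the sketch below compares the power of a two-group t test at two sample sizes, assuming a hypothetical medium effect of d = 0.5.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 0.5        # hypothetical medium effect (Cohen's d)
alpha = 0.05

for n in (10, 64):   # per-group sample sizes, chosen for contrast
    power = analysis.power(effect_size=d, nobs1=n, alpha=alpha)
    print(f"n = {n:>3} per group -> power = {power:.2f}")
# With n = 10 the test usually misses the real effect (a Type II error);
# around n = 64 per group, power reaches the conventional .80.
```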
11. NHST Criticisms
• A difference between populations can almost always be found, given a large enough sample (illustrated below)
• A statistically significant finding may not be relevant in practice, while a true effect of practical significance may not appear statistically significant if the test lacks power
• Fairness of exclusion
• Publication bias and the file-drawer problem
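A quick sketch of the first criticism, assuming a trivially small hypothetical effect (d = 0.05): with enough participants, even a difference of no practical importance is detected nearly every time.

```python
from statsmodels.stats.power import TTestIndPower

d_tiny = 0.05    # hypothetical effect far below practical relevance
alpha = 0.05

for n in (100, 1_000, 20_000):   # per-group sample sizes
    power = TTestIndPower().power(effect_size=d_tiny, nobs1=n, alpha=alpha)
    print(f"n = {n:>6} per group -> power = {power:.2f}")
# Power climbs toward 1.0 as n grows, so the tiny difference becomes
# statistically significant even though it matters little in practice.
```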
12. Experimental Sensitivity and Power
• Sensitivity
– the likelihood that an experiment will detect the effect of an IV when the IV does, in fact, have an effect
• affected by the experiment’s methods and procedures
• sensitivity increases with good research design and methods
– a high degree of experimental control
– little opportunity for biases
13. Experimental Sensitivity and Power
• Power
– the likelihood that a statistical test will allow researchers to correctly reject H0
• low statistical power increases Type II errors (missing a real effect)
• Power = 1 − β, where β is the probability of a Type II error
• three factors affect the power of statistical tests (varied one at a time in the sketch below)
– level of significance (alpha)
– size of the effect of the IV
– sample size (N)
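A brief sketch of how each factor moves power, holding the other two fixed; the baseline values (d = 0.4, n = 50 per group, α = .05) are hypothetical.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Hypothetical baseline: d = 0.4, n = 50 per group, alpha = .05
for alpha in (0.01, 0.05, 0.10):      # stricter alpha -> lower power
    p = analysis.power(effect_size=0.4, nobs1=50, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> power = {p:.2f}")
for d in (0.2, 0.4, 0.8):             # larger effect -> higher power
    p = analysis.power(effect_size=d, nobs1=50, alpha=0.05)
    print(f"d     = {d:.1f}  -> power = {p:.2f}")
for n in (25, 50, 100):               # more participants -> higher power
    p = analysis.power(effect_size=0.4, nobs1=n, alpha=0.05)
    print(f"n     = {n:>3}  -> power = {p:.2f}")
```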
14. Experimental Sensitivity and Power
• Prospective Power Analysis
• step 1: estimate the effect size of the IV
– examine previous research involving the IV
• step 2: refer to “power tables”
– identify the sample size needed to observe the effect of the IV
• step 3: use an adequate sample size
– most studies in psychology are “underpowered” because of small sample sizes
• Retrospective Power Analysis
• determine the power of a completed study from its effect size, sample size, and significance level (both analyses are sketched below)
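Power tables have largely been replaced by software; a sketch of both analyses using statsmodels, where the effect size d = 0.5 (standing in for an estimate from previous research), the target power of .80, and the completed study's n = 20 per group are all hypothetical:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Prospective: sample size needed per group for 80% power, given d from prior work
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"prospective: need about {n_needed:.0f} participants per group")

# Retrospective: power of a completed study that ran n = 20 per group
achieved = analysis.power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"retrospective: the study's power was only {achieved:.2f}")
```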