13. Hypothesis testing
• Distinguish between two hypotheses
1. H0 – there is no difference between groups
2. H1 – there is a difference between groups
• Or…
1. H0 – there is no relation between two variables
2. H1 – there is some relation between the two
variables
14. From statistical values to p-values
• Various procedures give us statistical values
– T-tests (one sample, two sample, paired etc.)
– F-Tests
– Correlation tests (r values)
• What is a p value?
15. P value
• A p-value is the probability that, if we repeated our
experiment (with all the analyses) and there were
no true effect, we would obtain this statistical
value or a greater one.
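This definition can be checked by simulation. A minimal sketch using NumPy/SciPy (my illustration, not from the slides): when H0 is true, p-values are uniform, so about 5% of experiments come out "significant" at 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 2000 experiments where H0 is true: both groups come from the
# same distribution, so any "significant" result is a false positive.
n_experiments, n_per_group = 2000, 20
group_a = rng.normal(size=(n_per_group, n_experiments))
group_b = rng.normal(size=(n_per_group, n_experiments))
pvals = stats.ttest_ind(group_a, group_b, axis=0).pvalue

# Under H0 roughly 5% of the p-values land below 0.05.
false_positive_rate = (pvals < 0.05).mean()
```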
17. OK back to neuroimaging
• Assuming that we are doing a mass-univariate
analysis (we look at each voxel independently),
we have a t-map
• Now, using a theoretical distribution (given the
degrees of freedom), we can turn it into a p-
map
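The t-to-p conversion takes a few lines with SciPy; a sketch where the map shape and degrees of freedom are made-up illustration values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical t-map: one t value per voxel (here just noise).
df = 18                                   # degrees of freedom of the test
t_map = rng.standard_t(df, size=(4, 4, 4))

# Two-tailed p-map from the theoretical t distribution.
p_map = 2 * stats.t.sf(np.abs(t_map), df=df)
```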
18. Inference!
• We take our p-map and discard all voxels with
values > 0.05
– “The value for which P=0.05, or 1 in 20, is 1.96 or
nearly 2; it is convenient to take this point as a
limit in judging whether a deviation ought to be
considered significant or not. Deviations
exceeding twice the standard deviation are thus
formally regarded as significant.” (R. A. Fisher,
Statistical Methods for Research Workers, 1925)
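The cut-off in the quote can be verified directly; a quick check with SciPy (my addition, not part of the slides):

```python
from scipy import stats

# Two-tailed 5% cut-off of the standard normal distribution:
# the value quoted above as "1.96 or nearly 2".
cutoff = stats.norm.ppf(1 - 0.05 / 2)
print(round(cutoff, 2))
```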
• We are done – right?
19. Not quite done yet…
• Let me generate two vectors of values and test
using a t-test if they are different
• What is the probability that we observe p < 0.05?
– Well… 0.05
• Let me generate another set of values… and
another… 100 pairs of vectors
• What is the probability that at least one of the
tests comes out significant?
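The answer follows from independence; a small sketch (assuming the 100 tests are independent):

```python
# Chance of at least one false positive across 100 independent
# tests, each run at alpha = 0.05, when H0 is true everywhere.
alpha, n_tests = 0.05, 100
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(round(p_at_least_one, 3))  # roughly 0.994
```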
21. Correcting for multiple comparisons
• Bonferroni correction (based on Boole’s
inequality)
– Divide your p-threshold by the number of tests
you have performed
– Or multiply your p-values by the number of tests
you have performed
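Both variants of the correction in one sketch, on four made-up p-values:

```python
import numpy as np

# Toy p-values from four hypothetical tests.
pvals = np.array([0.001, 0.02, 0.04, 0.3])
n_tests = len(pvals)

# Option 1: divide the threshold by the number of tests.
significant = pvals < 0.05 / n_tests      # only 0.001 survives

# Option 2 (equivalent): multiply the p-values instead, capping at 1.
p_adjusted = np.minimum(pvals * n_tests, 1.0)
```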
22. Bonferroni is a Family Wise Error
correction
It guarantees that the chance of getting at least
one false positive across all the tests is less than
your p-threshold
23. Permutation based FWE correction
• The assumptions behind the theoretical
distributions are often not met
• There are many dependencies between voxels
– Each test is not independent so Bonferroni
correction can be conservative
• We can however establish an empirical
distribution
24. Permutation based FWE correction
1. Break the relation: shuffle the participants
between the groups
2. Perform the test
3. Save the maximum statistical value across
voxels
4. Repeat
25. Permutation based FWE correction
Our FWE-corrected p-value is the percentage of
permutations that yielded a maximum statistical
value higher than the original (unshuffled) one
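The four steps above can be sketched on toy data; the group sizes, planted effect, and permutation count here are all made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy data: two groups of 12 "subjects", 500 voxels each,
# with an effect planted in voxel 0.
n_per_group, n_voxels = 12, 500
group_a = rng.normal(size=(n_per_group, n_voxels))
group_b = rng.normal(size=(n_per_group, n_voxels))
group_b[:, 0] += 1.5

data = np.vstack([group_a, group_b])
labels = np.array([0] * n_per_group + [1] * n_per_group)

def t_map(data, labels):
    """Two-sample t statistic at every voxel."""
    return stats.ttest_ind(data[labels == 0], data[labels == 1], axis=0).statistic

observed_max = np.abs(t_map(data, labels)).max()

# Steps 1-4: shuffle the labels, re-test, keep the maximum |t| across voxels.
n_perm = 500
max_null = np.empty(n_perm)
for i in range(n_perm):
    max_null[i] = np.abs(t_map(data, rng.permutation(labels))).max()

# FWE-corrected p-value of the strongest voxel.
p_fwe = (max_null >= observed_max).mean()
```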
26. False Discovery Rate
• Even conceptually, FWE correction seems
conservative
– At least one false positive out of 60 000 tests?
• Is there a more intuitive way of looking at
this?
27. False Discovery Rate
I present a number of voxels that I think show a
strong effect, but I admit that a certain
percentage of them might be false positives.
29. FDR procedures
• Benjamini-Hochberg procedure
– With its variant for dependent tests
(Benjamini–Yekutieli)
• Efron’s local FDR procedure
– Explicit modeling of the signal distribution
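A minimal sketch of the Benjamini-Hochberg step-up procedure on toy p-values of my own choosing:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of tests rejected at FDR level q (step-up procedure)."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    # Largest k with p_(k) <= (k / m) * q; reject the k smallest p-values.
    below = pvals[order] <= (np.arange(1, m + 1) / m) * q
    passed = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        passed[order[:k + 1]] = True
    return passed

# Toy p-values, already sorted for readability.
mask = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74], q=0.05)
```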
30. Interim Summary
• FWE corrections
– Bonferroni – simple but struggles with
dependencies (over conservative)
– Permutations – less dependent on assumptions,
but time consuming
• FDR corrections
– B-H – simple but also struggles with dependencies
– Local FDR – data driven, but can fail in case of low
SNR
31. CLUSTER EXTENT TESTS
Test how big the blobs are
Random field theory
Smoothness estimation
Permutation test
The problem of cluster forming threshold
Fun fact: FWE with RFT
32. Intuition
If we are interested in continuous regions of
activation, why are we looking at voxels and not
blobs?
35. What contributes to expected cluster
size?
How likely is it to get a cluster of this size from
pure noise?
It depends on:
1. cluster forming threshold
2. smoothness of the map
3. size of the map
36. Where do we get those parameters?
1. cluster forming threshold
– Arbitrary decision
2. smoothness of the map
– Estimated from the residuals of the GLM
3. size of the map
– Calculated from the mask
37. Permutation based cluster extent
probability
1. Break the relation: shuffle the participants
between the groups
2. Perform the test
3. Threshold the map to get clusters
4. Save the sizes of all clusters
5. Repeat
38. Permutation based cluster extent
probability
Our cluster-extent p-value is the percentage of
permutations that yielded a maximum cluster size
bigger than the original (unshuffled) one
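The five steps can be sketched with SciPy's connected-component labeling; the volume sizes, group sizes, and cluster-forming threshold below are made-up illustration values:

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(3)

def max_cluster_size(t_map, threshold):
    """Size of the largest suprathreshold cluster in a 3D map."""
    labeled, n_clusters = ndimage.label(t_map > threshold)
    return 0 if n_clusters == 0 else np.bincount(labeled.ravel())[1:].max()

# Toy data: two groups of 10 "subjects", each a 10x10x10 volume of noise.
group_a = rng.normal(size=(10, 10, 10, 10))
group_b = rng.normal(size=(10, 10, 10, 10))

def group_t_map(a, b):
    return stats.ttest_ind(a, b, axis=0).statistic

cft = 2.0  # cluster-forming threshold, an arbitrary choice
observed = max_cluster_size(group_t_map(group_a, group_b), cft)

# Steps 1-5: shuffle subjects between groups, re-test, threshold,
# and record the largest cluster of each permutation.
data = np.concatenate([group_a, group_b])
null_sizes = np.empty(200)
for i in range(200):
    perm = rng.permutation(len(data))
    null_sizes[i] = max_cluster_size(
        group_t_map(data[perm[:10]], data[perm[10:]]), cft)

p_cluster = (null_sizes >= observed).mean()
```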
53. P-value paradox
• There are no two entities or groups that are
truly identical
• There are no two variables that are completely
unrelated
• We just fail to obtain enough samples to see it
– Or our tools are not sensitive enough
54. More samples more “significance”
• The more subjects you have in your study, the
more likely you are to find something
significant
• The same applies to scan length and field
strength
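This is easy to demonstrate by simulation; a sketch (the effect size and sample sizes are made up) showing how a trivially small but real effect becomes "significant" once the sample is large enough:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# A tiny but real effect: group means differ by 0.1 standard deviations.
effect = 0.1
results = {}
for n in (20, 200, 2000, 20_000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(effect, 1.0, size=n)
    results[n] = stats.ttest_ind(a, b).pvalue
# With n = 20 the effect is invisible; with n = 20 000 the p-value is tiny.
```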
56. P-value failure
• P-values do not tell us much about the actual
size of the effect
• Neither do they tell us about the predictive
power of the relation we found
57. The interesting question
Is PCC involved in autism?
vs.
Given the cortical thickness of a subject’s PCC,
how well can I predict his or her diagnosis?
58. Why does this matter
• More subjects, longer scans, stronger fields –
everything becomes significant
– We are getting there
• Lack of faith in science from the public
– Poor reproducibility
59. What needs to be done
We need more replications
We need to start reporting null results
60. What you can do
• Report effect sizes and their confidence
intervals
– For all tests/voxels – not just the significant ones
• Share the unthresholded statistical maps
– It only takes 5 minutes on neurovault.org
• Report all the tests you have performed – not
just the significant ones