4. Probability
✤ How we express likelihood mathematically
✤ For an event “A”, the probability of A occurring is denoted “P(A)”
✤ Always a number between 0 and 1 (inclusive)
✤ P(A) = 0 means that A never happens
✤ P(A) = 1 means that A always happens
5. Independence & Exclusivity
✤ independence - A and B are independent if the occurrence of one does
not affect the probability of the other:
✤ P(A|B) = P(A) = P(A|not B)
✤ P(B|A) = P(B) = P(B|not A)
✤ mutually exclusive - A and B are mutually exclusive if it is impossible
for both of them to occur:
✤ P(A and B) = 0
6. Probability Rules
✤ The probability of an event not happening is 1 minus the probability of it happening
✤ P(not A) = 1 - P(A)
✤ When A and B are independent:
✤ P(A and B) = P(A) × P(B)
✤ For any A and B, independent or not:
✤ P(A or B) = P(A) + P(B) - P(A and B)
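These rules can be checked by brute force on a small sample space. The following is a minimal sketch, assuming two fair dice and two illustrative events (A: the first die is even; B: the second die shows 5 or 6) that are not taken from the slides:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

# Illustrative events (assumptions, not from the slides).
A = lambda o: o[0] % 2 == 0   # first die is even
B = lambda o: o[1] >= 5       # second die shows 5 or 6

p_a, p_b = prob(A), prob(B)
p_not_a = prob(lambda o: not A(o))
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))

print(p_not_a == 1 - p_a)                 # complement rule
print(p_a_and_b == p_a * p_b)             # product rule (A and B are independent)
print(p_a_or_b == p_a + p_b - p_a_and_b)  # addition rule
```

Using exact fractions rather than floats lets the three comparisons be tested for strict equality.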
7. Probability Fundamentals
✤ Sum of probabilities of all possible outcomes is 1
✤ Flip a coin and you get either heads or tails:
✤ P(heads) + P(tails) = 1 = P(heads or tails)
✤ With mutually exclusive outcomes A, B, C, and D that cover every possibility:
✤ P(A) + P(B) + P(C) + P(D) = 1 = P(A or B or C or D)
8. Conditional Probability
✤ With non-independent events, knowing one has happened may change
the likelihood of the other occurring
✤ Conditional probability - what is the probability of A given that B has
already happened?
✤ P(A|B)
✤ Bayes’ Rule for conditional probability:
✤ P(A|B) = P(A and B) / P(B) = P(B|A) × P(A) / P(B)
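As a sanity check, the definition P(A|B) = P(A and B) / P(B) and the Bayes form P(B|A) × P(A) / P(B) should give the same number. A minimal sketch, assuming two fair dice and two illustrative events chosen for this example (A: the dice sum to 8; B: the first die shows 6):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

# Illustrative events (assumptions, not from the slides).
A = lambda o: o[0] + o[1] == 8   # the dice sum to 8
B = lambda o: o[0] == 6          # the first die shows 6

direct = cond_prob(A, B)                         # from the definition
via_bayes = cond_prob(B, A) * prob(A) / prob(B)  # from Bayes' Rule
print(direct, via_bayes)  # 1/6 1/6
```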
9. Conditional Probability Hoedown
✤ At John Jay, 62.5% of all students hate statistics while 25% of all
students hate statistics and passed the class. What is the probability
that a student passes stats given that the student hates statistics?
✤ Two fair dice are rolled. What is the (conditional) probability that
exactly one die’s value is a 1 or 2, given that they show different
numbers?
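Both problems follow from P(A|B) = P(A and B) / P(B). One way to work them, sketched here with exact fractions and a brute-force count of dice rolls (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Problem 1: P(pass | hate) = P(hate and pass) / P(hate)
p_hate = Fraction(625, 1000)        # 62.5% hate statistics
p_hate_and_pass = Fraction(25, 100) # 25% hate statistics and passed
p_pass_given_hate = p_hate_and_pass / p_hate
print(p_pass_given_hate)  # 2/5

# Problem 2: condition on "different numbers" by keeping only those rolls.
rolls = list(product(range(1, 7), repeat=2))
different = [r for r in rolls if r[0] != r[1]]
# Exactly one die is a 1 or 2: the two "is low" tests disagree.
exactly_one_low = [r for r in different if (r[0] <= 2) != (r[1] <= 2)]
p_dice = Fraction(len(exactly_one_low), len(different))
print(p_dice)  # 8/15
```

Restricting the sample space to the 30 rolls with different numbers is the counting version of dividing by P(B).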
10. Something Really Important
✤ Classic stats problem emerged from the game show Let’s Make a Deal,
often called the Monty Hall Problem after the show’s host
✤ Has ended many friendships and caused bitter internet arguments
11. The Game
✤ There are 3 doors labeled “1”, “2”, and “3”. Behind one of these doors
is a fabulous prize that Monty has hidden
✤ You get to choose a door, which may or may not have the prize
✤ Monty opens one of the other doors, never revealing the prize
✤ You now have the option to stay with your door or switch to the
remaining closed door. Should you stick with your original choice or switch?
12. Choosing The First Door
✤ Three doors and one prize so you’ll pick the right door one out of
three times, i.e. P(right first choice) = 1/3
✤ Likewise, you’ll pick the wrong door with P(wrong first choice) = 2/3
13. The Reveal
✤ No matter how you choose, there are two other doors and only one prize.
This means at least one of the two unchosen doors has nothing
behind it.
✤ Monty knows where the prize is and opens a door that DOESN’T
have the prize behind it.
✤ This leaves your door and one other. One of them has the prize and
the other doesn’t. Should you switch?
14. To Switch, or Not To Switch
✤ You don’t know if you have the right door!
✤ What’s the probability that your door has the prize?
✤ What’s the probability that the other door has the prize?
✤ What’s the probability that your door doesn’t have the prize?
15. Example of the Game
✤ As an example, the prize is hidden behind door “3”.
✤ If you choose door “3” initially, switching can only lose you the prize
✤ If you choose door “2” initially, Monty must open door “1” and
switching will get you the prize
✤ If you choose door “1” initially, Monty must open door “2” and
switching will get you the prize
16. Switch Already!
✤ Switching is a way of saying “I don’t think the prize is behind this
door”
✤ Since the probability is 1/3 that the prize is behind your first door, the
probability is 2/3 that it is behind one of the other two, and Monty has
just shown you which of those two is empty
✤ Always switch and you’ll win 2/3 of the time!
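The 1/3 versus 2/3 claim can be checked by simulation. This is a rough sketch, not a proof; it assumes that when Monty has two empty doors to choose from he picks one at random, which the slides do not specify, and the function names are illustrative:

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # Monty opens a door that is neither the contestant's pick nor the prize.
    # Assumption: if two such doors exist, he chooses between them at random.
    opened = rng.choice([d for d in doors if d != first_pick and d != prize])
    if switch:
        # Move to the only door that is neither picked nor opened.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the chance of winning over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With this many trials the two estimates should land close to 1/3 and 2/3 respectively.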
17. Expected Values
✤ Probability can be used to estimate rewards in a game of chance
✤ Expected Value = P(A)×Reward(A) + P(B)×Reward(B) + ...
✤ Silly coin-flipping game: Flip a coin three times. If you get
exactly one Heads, you get a dollar. If not, you give me a dollar.
✤ Should you take the bet?
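The expected value formula can be applied to the coin game by listing all eight equally likely flip sequences. A minimal sketch, assuming a fair coin:

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of three fair coin flips.
flips = list(product("HT", repeat=3))

# You win when exactly one of the three flips is Heads.
p_win = Fraction(sum(1 for f in flips if f.count("H") == 1), len(flips))
p_lose = 1 - p_win

# Expected Value = P(win) x Reward(win) + P(lose) x Reward(lose)
expected_value = p_win * 1 + p_lose * (-1)
print(p_win, expected_value)  # 3/8 -1/4
```

A negative expected value means that, on average, the player loses money each round (here, 25 cents).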
18. Normal Distribution
✤ The distribution is
✤ unimodal
✤ symmetric
✤ “light tailed”
✤ Notation: X ~ N(μ, σ) means “the random variable X has a normal
distribution with mean μ and standard deviation σ”
19. Area Under the Curve Equals 1
[Figure: density curves f(x) for N(-3, 0.5), N(2, 1), and N(-1, 3), plotted for x from -4 to 4]
20. Rules of Thumb
✤ P(within one standard deviation) ≈ 0.68
✤ P(within 1.96 standard deviations) ≈ 0.95
✤ P(within three standard deviations) ≈ 0.997
✤ With “real” normal distributions, extreme outliers are very rare!
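These rules of thumb can be reproduced from the standard normal curve. A minimal sketch using the error function from Python's standard library (the helper name `prob_within` is illustrative):

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} standard deviations: {prob_within(k):.4f}")
```

The loop includes both 1.96 and 2 because "two standard deviations" is a common rounding of the 95% cutoff; two standard deviations actually cover slightly more than 95%.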
21. Standard Normal Distribution
✤ standard normal distribution is the normal distribution with mean μ = 0
and standard deviation σ = 1: Z ~ N(0, 1)
✤ Any normal distribution can be transformed into a standard normal
distribution. If X ~ N(μ, σ), then:
✤ Z = (X − μ) / σ
22. Z-Scores & the Standard Normal
✤ Each observation has an associated z-score, which is the number of
standard deviations that observation is away from the mean
✤ Converting a sample from a normal distribution to z-scores transforms
it to a standard normal distribution
✤ z-score = (observation - mean) ÷ standard deviation
✤ If the observation is above the mean then the Z-score is positive; if
below, the Z-score is negative
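The z-score formula can be sketched in a few lines. The sample below is made up for illustration, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one intended:

```python
import statistics

def z_scores(sample):
    """Convert each observation to (observation - mean) / standard deviation."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return [(x - mean) / sd for x in sample]

heights = [160, 165, 170, 175, 180]  # hypothetical data, not from the slides
scores = z_scores(heights)
print([round(z, 2) for z in scores])
```

Observations below the mean come out negative, the observation equal to the mean comes out as 0, and those above come out positive.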
23. Interval Estimation
✤ We might estimate the mean for an entire population using the mean
for a small sample; this is called a point estimate.
✤ A confidence interval gives a range of “plausible” values for the
population mean
✤ Usually reported as "mean ± wiggle room"
✤ Each interval has an associated level of confidence, usually written as
a percent (95% being the most common)
✤ "I am 95% confident that the population mean is in this range,
with the sample mean being the most likely guess"
25. Normal Critical Deviates
✤ Critical normal deviate: If you wanted to find the middle X% of the
distribution, how many standard deviations would you have to travel
in each direction?
✤ Define zγ to be the point for which the area under the normal curve to
the right is γ.
✤ In more mathematical notation, zγ is the point for which:
✤ P(Z > zγ) = γ
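Critical deviates can be looked up in a table or computed. A minimal sketch using `statistics.NormalDist` from Python's standard library (3.8 or later); the two helper names are illustrative:

```python
from statistics import NormalDist

def critical_deviate(gamma):
    """Return z_gamma, the point with area gamma to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1 - gamma)

def middle_cutoff(percent):
    """Standard deviations to travel each way to cover the middle percent%."""
    tail = (1 - percent / 100) / 2   # area left over in each tail
    return critical_deviate(tail)

for pct in (90, 95, 99):
    print(pct, round(middle_cutoff(pct), 3))
```

For the middle 95%, each tail holds 2.5% of the area, so the cutoff is the critical deviate for γ = 0.025, roughly 1.96.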
26. Interpreting Confidence Intervals
✤ The width of a confidence interval indicates precision
✤ An observation's z-score shows how typical it is: a z-score beyond
±1.96 puts the observation outside the middle 95% of the distribution
✤ 95% confidence intervals are by far the most common, but any level of
confidence interval can be computed:
✤ 90%: mean ± (1.645 × standard error)
✤ 95%: mean ± (1.96 × standard error)
✤ 99%: mean ± (2.58 × standard error)
✤ standard error = standard deviation ÷ √n
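A confidence interval for a mean can be sketched as follows. This assumes the normal multiplier is appropriate (a known standard deviation or a large sample) and uses the standard error, standard deviation ÷ √n, as the unit of "wiggle room"; the numbers in the example call are hypothetical:

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (low, high) for a normal-based confidence interval on a mean."""
    # Multiplier for the chosen level, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sd / math.sqrt(n)   # multiplier x standard error
    return mean - margin, mean + margin

# Hypothetical sample summary, not from the slides.
low, high = confidence_interval(mean=100, sd=15, n=36)
print(round(low, 1), round(high, 1))  # 95.1 104.9
```

Calling the function with a larger n or a smaller standard deviation gives a narrower interval, while a higher confidence level gives a wider one.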
27. Components of Confidence
✤ How might a confidence interval change as:
✤ Ȳ increases
✤ σ increases
✤ n increases
✤ the confidence level increases (e.g., from 95% to 99%)
28. Conflicting Hypotheses
✤ In statistical inference, there are always two conflicting hypotheses:
✤ null hypothesis “H0” - often states “no effect” or “no difference”.
This is the hypothesis that we will assume to be true unless we
have convincing evidence to the contrary.
✤ alternative hypothesis “H1” or “Ha” - The hypothesis that we will
believe only if the evidence strongly supports it.
✤ The null hypothesis typically has “=” in it
29. Hypothesis as Metaphor
✤ Hypothesis tests are like U.S. criminal trials
✤ The judicial system is structured such that the accused person is
presumed innocent until proven guilty. In such a system the absence
of convincing evidence (“beyond a reasonable doubt”) results in the
person being set free.
✤ H0: innocent
✤ Ha: guilty
30. P-values
✤ In each hypothesis testing situation we will compute a p-value. This is
the probability of seeing data at least as extreme as ours if the null
hypothesis were true.
✤ Fail to reject H0 if the p-value is large
✤ Reject H0 if the p-value is small and go with Ha
✤ How small is small enough? It depends... (usually p < 0.05)
31. Notes on Hypothesis Testing
✤ “Statistical significance” is not the same as “clinical significance”. A
tiny effect may be “statistically significant” if the sample size is huge.
✤ The p-value does not describe the magnitude of the effect!
✤ When reporting analysis results, a confidence interval should always
be provided along with the results of a hypothesis test.
✤ The choice of 0.05 is arbitrary. (p = 0.051 and p = 0.049 should lead to
similar conclusions, but in practice they often do not.)
✤ Never report results only as “p < 0.05”; report the p-value itself and
let the reader decide if they agree with your interpretation.
32. Type I & Type II Errors
• Type I Error: Reject H0 when H0 is actually true.
– For example, to conclude there is an effect (or a difference)
when there really isn’t one.
– Also called “false positive”.
• Type II Error: Fail to reject H0 when H0 is actually false.
– For example, to fail to find an effect (or a difference) when
there really is one.
– Also called “false negative”.
                     State of nature
Decision             H0 is true      Ha is true
Fail to reject H0    Correct         Type II Error
Reject H0            Type I Error    Correct
33. Probabilities of Errors of Type I and Type II
✤ Each of the errors has an associated probability:
✤ α = P(Type I Error)
✤ β = P(Type II Error)
✤ Hypothesis testing is set up to control Type I error rate (α)
✤ The experimenter chooses α - everything else follows from this!
✤ Most common (by far) choice for α is 0.05.
✤ (Also, 0.01 and 0.10 on occasion)
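What "controlling the Type I error rate" means can be seen in a simulation: when H0 really is true, a test run at level α should reject about α of the time. This is a rough sketch under simplifying assumptions not in the slides (normally distributed data with a known standard deviation of 1, and a two-sided z-test); the function name is illustrative:

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=20_000, seed=1):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the data come from N(0, 1) and we test "mean = 0".
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1 / math.sqrt(n))  # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1                    # a false positive
    return rejections / trials

print(type_i_error_rate(alpha=0.05))
```

The printed rate should be close to the chosen α; picking a smaller α lowers the false positive rate accordingly.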
34. Comparing Means
✤ Tests:
✤ Single group versus a fixed mean
✤ Two groups with the same variable
✤ Two groups with paired observations
✤ Hypotheses:
✤ H0: the two groups have equal means (mean A = mean B)
✤ Ha : the means of the groups are different
35. Assumptions for t-Tests
✤ The group (sample) is the Independent Variable (dichotomous)
✤ The outcome of interest is the Dependent Variable
✤ t-Tests are only valid if these assumptions are not violated:
✤ The research question DOES involve the comparison of 2 means
✤ The Dependent Variable is a quantitative scale
✤ The distribution of the Dependent Variable is normal
✤ The Independent Variable is assigned randomly (independently)
36. Met Assumptions, but Which Test?
✤ Only one group with data: One-Sample t-Test
✤ Two groups:
✤ Not related to each other: Independent-Samples t-Test
✤ Related samples (e.g. before & after): Paired-Samples t-Test
37. One-Sample t-Test
✤ Compares a sample mean to a known population mean.
✤ Need to know the population mean!
✤ Example: Is there a difference between the population mean IQ (100)
and the mean IQ for a sample of 50 John Jay students (125)?
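The test statistic behind a one-sample t-test is the gap between the sample mean and the population mean, measured in standard errors. A minimal sketch; the slides give no sample standard deviation for the IQ example, so the scores below are made up, and the sketch stops at the t statistic because turning it into a p-value needs a t table or a statistics package:

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / (sample SD / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error

iq_scores = [118, 131, 127, 122, 135, 120, 129, 124]  # hypothetical data
t = one_sample_t(iq_scores, population_mean=100)
print(f"t = {t:.2f} with {len(iq_scores) - 1} degrees of freedom")
```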
38. Paired-Samples t-Test
✤ Sometimes we have two sets of measurements that are related:
✤ Each subject is measured before and after treatment
✤ With pairs of identical twins
✤ Subject has different treatment on left & right arms
✤ For each observation in one group there is exactly one closely related
observation in the other group (so each pair has one from each group)
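Because the observations come in pairs, a paired-samples t-test reduces to a one-sample test on the within-pair differences, with H0 being "mean difference = 0". A minimal sketch with made-up before and after measurements; only the t statistic is computed, not the p-value:

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for the mean of the within-pair differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

before = [72, 80, 65, 90, 77]  # hypothetical scores before treatment
after = [75, 84, 66, 95, 79]   # hypothetical scores after treatment
print(f"t = {paired_t(before, after):.2f} with {len(before) - 1} degrees of freedom")
```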
39. Independent-Samples t-Test
✤ Compares the means of two groups or samples.
✤ One of the most common situations in statistical inference is that of
comparing two means from independent samples
✤ Clinical trials - treatment group vs. placebo group
✤ Exposed vs. unexposed
✤ Males vs. females
✤ General population vs. specific subpopulation
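For two independent groups, the t statistic divides the difference in group means by a standard error built from both groups. The slides do not say which variant is intended, so this sketch uses Welch's form, which does not assume equal variances; the data are made up and only the t statistic is computed:

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means over its standard error."""
    var_a = statistics.variance(group_a)  # sample variances (n - 1 divisor)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

treatment = [5.1, 4.8, 6.0, 5.5, 5.9]  # hypothetical outcomes
placebo = [4.2, 4.9, 4.4, 5.0, 4.1]    # hypothetical outcomes
print(f"t = {welch_t(treatment, placebo):.2f}")
```

The groups may have different sizes, since nothing pairs an observation in one group with an observation in the other.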
40. Review: Hypotheses
✤ Null Hypothesis: there is no relationship between the independent
and dependent variables
✤ p-value: the probability of seeing data at least as extreme as ours if
the null hypothesis (H0) were true
✤ Reject H0 if p is too small (usually p < 0.05)
✤ If we reject H0, we must instead choose the alternative (Ha)
41. Review: t-Tests
✤ Compare the means of exactly two groups
✤ Only one group (with data) compared to a fixed number:
✤ One-Sample t-Test
✤ Two groups (with data):
✤ Not related to each other: Independent-Samples t-Test
✤ Related samples (e.g. before & after): Paired-Samples t-Test