Introduction to Statistics
STA250

Lecture 11 - April 21st, 2010
                                1
2
3
Probability


✤   How we express likelihood mathematically

✤   For an event “A”, the probability of A occurring is denoted “P(A)”

✤   Always a number between 0 and 1 (inclusive)

    ✤   P(A) = 0 means that A never happens

    ✤   P(A) = 1 means that A always happens


                                     4
Independence & Exclusivity

✤   independence - A and B are independent if the occurrence of one does
    not affect the probability of the other:

    ✤   P(A|B) = P(A) = P(A|not B)

    ✤   P(B|A) = P(B) = P(B|not A)

✤   mutually exclusive - A and B are mutually exclusive if it is impossible
    for both of them to occur:

    ✤   P(A and B) = 0

                                       5
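Here is a minimal Python sketch of both definitions. It uses exact fractions over the 36 outcomes of two fair dice. The events (`first_is_six`, `second_is_even`, `sum_is_two`) and the helper names are hypothetical and chosen only for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of two fair dice
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    hits = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(hits, len(OUTCOMES))


def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def first_is_six(o):
    return o[0] == 6


def second_is_even(o):
    return o[1] % 2 == 0


def second_is_odd(o):
    return not second_is_even(o)


def sum_is_two(o):
    return o[0] + o[1] == 2


# Independence: P(A|B) = P(A) = P(A|not B)
print(cond_prob(first_is_six, second_is_even), prob(first_is_six),
      cond_prob(first_is_six, second_is_odd))        # 1/6 1/6 1/6

# Mutual exclusivity: a first die of 6 makes a sum of 2 impossible
print(prob(lambda o: first_is_six(o) and sum_is_two(o)))   # 0
```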
Probability Rules

✤   Probability of not happening is 1 minus probability of occurring

    ✤   P(not A) = 1 - P(A)



✤   When A and B are independent:

    ✤   P(A and B) = P(A) × P(B)



✤   For any events A and B: P(A or B) = P(A) + P(B) - P(A and B)
                                     6
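You can check the three rules on a hypothetical pair of independent fair coin flips. This sketch uses Python's `fractions.Fraction` so that the results are exact. It also cross-checks the addition rule by listing the outcomes:

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two independent fair coin flips
# A = "first flip is Heads", B = "second flip is Heads"
p_a = Fraction(1, 2)
p_b = Fraction(1, 2)

p_not_a = 1 - p_a                  # complement rule
p_a_and_b = p_a * p_b              # multiplication rule (requires independence)
p_a_or_b = p_a + p_b - p_a_and_b   # addition rule (holds for any A and B)

# Cross-check the addition rule by listing all 4 equally likely outcomes
outcomes = list(product("HT", repeat=2))
brute_force = Fraction(sum(1 for f, s in outcomes if f == "H" or s == "H"),
                       len(outcomes))

print(p_not_a, p_a_and_b, p_a_or_b, brute_force)   # 1/2 1/4 3/4 3/4
```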
Probability Fundamentals


✤   Sum of probabilities of all possible outcomes is 1

✤   Flip a coin and you get either heads or tails:

    ✤   P(heads) + P(tails) = 1 = P(heads or tails)

✤   With mutually exclusive outcomes A, B, C, and D that cover every possibility

    ✤   P(A) + P(B) + P(C) + P(D) = 1 = P(A or B or C or D)


                                        7
Conditional Probability

✤   With non-independent events, knowing one has happened may change
    the likelihood of the other occurring

✤   Conditional probability - what is the probability of A given that B has
    already happened?

    ✤   P(A|B)

✤   Bayes’ Rule for conditional probability:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(B \mid A) \times P(A)}{P(B)}$$
                                       8
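As a quick sanity check of Bayes’ Rule, here is a sketch with a hypothetical card-drawing example. Let A = King and B = face card, drawing from a standard 52-card deck. Both forms of the rule should give the same answer. The `bayes` helper is illustrative, not a library function:

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical example: draw one card from a standard 52-card deck
# A = "card is a King", B = "card is a face card (J, Q, or K)"
p_a = Fraction(4, 52)
p_b = Fraction(12, 52)
p_b_given_a = Fraction(1)        # every King is a face card
p_a_and_b = Fraction(4, 52)      # the Kings are exactly the cards in both events

print(p_a_and_b / p_b, bayes(p_b_given_a, p_a, p_b))   # 1/3 1/3
```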
Conditional Probability Hoedown


✤   At John Jay, 62.5% of all students hate statistics while 25% of all
    students hate statistics and passed the class. What is the probability
    that a student passes stats given that the student hates statistics?



✤   Two fair dice are rolled. What is the (conditional) probability that
    exactly one die shows a 1 or 2, given that the dice show different
    numbers?


                                       9
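Here is one way to work both hoedown problems in Python, as a sketch. The first problem applies P(A|B) = P(A and B) / P(B) to the stated percentages. The second enumerates all rolls and reads “exactly one die’s value is a 1 or 2” as “exactly one of the two dice shows 1 or 2”:

```python
from fractions import Fraction
from itertools import product

# Question 1: P(pass | hate) = P(pass and hate) / P(hate)
p_hate = Fraction(625, 1000)          # 62.5% hate statistics
p_hate_and_pass = Fraction(25, 100)   # 25% hate statistics AND passed
p_pass_given_hate = p_hate_and_pass / p_hate
print(p_pass_given_hate)              # 2/5

# Question 2: enumerate the 36 equally likely ordered rolls of two fair dice
rolls = list(product(range(1, 7), repeat=2))
different = [r for r in rolls if r[0] != r[1]]                      # condition
exactly_one_low = [r for r in different if (r[0] <= 2) != (r[1] <= 2)]
p_q2 = Fraction(len(exactly_one_low), len(different))
print(p_q2)                           # 8/15
```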
Something Really Important



✤   A classic probability problem from the game show Let’s Make a Deal,
    often called the Monty Hall Problem after the show’s host

✤   Has ended many friendships and caused bitter internet arguments




                                    10
The Game


✤   There are 3 doors labeled “1”, “2”, and “3”. Behind one of these doors
    is a fabulous prize that Monty has hidden

✤   You get to choose a door, which may or may not have the prize

✤   Monty opens one of the other doors, always one without the prize

✤   You now have the option to stay with your door or switch to the other
    unopened door. Should you stick with your original choice or switch?


                                     11
Choosing The First Door



✤   There are three doors and one prize, so you’ll pick the right door one
    out of three times, i.e. P(right first choice) = 1/3

✤   Likewise, you’ll pick the wrong door with P(wrong first choice) = 2/3




                                     12
The Reveal


✤   No matter which door you choose, there are two other doors and only one
    prize. This means at least one of the two unchosen doors has nothing
    behind it.

✤   Monty knows where the prize is and always opens an unchosen door that
    DOESN’T have the prize behind it.

✤   This leaves your door and one other. One of them has the prize and
    the other doesn’t. Should you switch?


                                    13
To Switch, or Not To Switch


✤   You don’t know if you have the right door!

✤   What’s the probability that your door has the prize?

✤   What’s the probability that the other door has the prize?

✤   What’s the probability that your door doesn’t have the prize?



                                     14
Example of the Game


✤   As an example, the prize is hidden behind door “3”.

✤   If you choose door “3” initially, switching can only lose you the prize

✤   If you choose door “2” initially, Monty must open door “1” and
    switching will get you the prize

✤   If you choose door “1” initially, Monty must open door “2” and
    switching will get you the prize


                                      15
Switch Already!


✤   Switching is a way of saying “I don’t think the prize is behind this
    door”

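✤   Your first pick is right with probability 1/3. Monty’s reveal does not
    change that, because he can always open an empty door. So the probability
    is 2/3 that the prize is not behind your door. After the reveal, that
    entire 2/3 rests on the one remaining unopened door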

✤   Always switch and you’ll win 2/3 of the time!



                                      16
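A small simulation can make the 2/3 answer more convincing than the argument alone. This sketch assumes the rules on “The Game” slide. The prize is placed uniformly at random, and Monty always opens an unchosen door without the prize. When he has two such doors, he picks one at random:

```python
import random


def play(switch, rng):
    """Play one round of the Monty Hall game; return True if we win."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # Monty opens a door that is neither our choice nor the prize
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        # Switch to the only remaining unopened door
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print(f"stay:   {win_rate(False):.3f}")   # close to 1/3
print(f"switch: {win_rate(True):.3f}")    # close to 2/3
```

Fixing the seed makes each run reproducible. With 100,000 rounds per strategy, the two rates typically land within about a percentage point of 1/3 and 2/3.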
Expected Values

✤   Probability can be used to estimate rewards in a game of chance

    ✤   Expected Value = P(A)×Reward(A) + P(B)×Reward(B) + ...



✤   Silly coin-flipping game: flip a coin three times. If you get exactly one
    Heads, I give you a dollar; if not, you give me a dollar.

✤   Should you take the bet?

                                     17
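You can score the coin game by listing all 8 equally likely flip sequences and plugging the results into the expected value formula. This sketch assumes a fair coin and a payoff of +$1 for a win and -$1 for a loss:

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely sequences of three fair coin flips
flips = list(product("HT", repeat=3))
p_win = Fraction(sum(1 for f in flips if f.count("H") == 1), len(flips))
p_lose = 1 - p_win

# Expected Value = P(win) x (+$1) + P(lose) x (-$1)
expected_value = p_win * 1 + p_lose * (-1)
print(p_win, expected_value)   # 3/8 -1/4
```

The expected value is -1/4. On average you lose 25 cents per game, so by this measure you should not take the bet.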
Normal Distribution

✤   The distribution is

    ✤   unimodal

    ✤   symmetric

    ✤   “light tailed”

✤   Notation: X ~ N(μ, σ) means “the random variable X has a normal
    distribution with mean μ and standard deviation σ”

                                   18
Area Under the Curve Equals 1

      [Figure: normal density curves f(x) for N(-3, 0.5), N(2, 1), and N(-1, 3), plotted for x from -4 to 4]
                            19
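You can check the plot's title numerically. This sketch approximates each density's total area with a midpoint-rule sum, using Python's `statistics.NormalDist`. The integration range and step count are arbitrary choices. The parameters assume the legend uses the N(μ, σ) notation from the previous slide:

```python
from statistics import NormalDist


def area_under(dist, lo=-60.0, hi=60.0, steps=100_000):
    """Approximate the total area under dist's density with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps))


# Parameters read from the plot legend: N(mu, sigma), sigma = standard deviation
curves = [NormalDist(-3, 0.5), NormalDist(2, 1), NormalDist(-1, 3)]
for dist in curves:
    print(dist.mean, dist.stdev, round(area_under(dist), 6))   # area close to 1
```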
Rules of Thumb


✤   P(within one standard deviation) ≈ 0.68

✤   P(within 1.96 standard deviations) ≈ 0.95

✤   P(within three standard deviations) ≈ 0.997



✤   With “real” normal distributions, extreme outliers are rare: only about
    0.3% of observations fall more than three standard deviations from the mean


                                      20
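The rules of thumb come straight from the standard normal CDF. Here is a short sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def within(k):
    """P(-k < Z < k): probability of landing within k SDs of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 1.96, 3):
    print(f"within {k} SD: {within(k):.4f}")   # 0.6827, 0.9500, 0.9973
```

The output also shows why 1.96, rather than exactly 2, is the usual 95% multiplier.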
Standard Normal Distribution


✤   standard normal distribution is the normal distribution with mean μ = 0
    and standard deviation σ = 1: Z ~ N(0, 1)

✤   Any normal distribution can be transformed into a standard normal
    distribution. If X ~ N(μ, σ), then:

$$Z = \frac{X - \mu}{\sigma}$$




                                     21
Z - Scores & the Standard Normal

✤   Each observation has an associated z-score, which is the number of
    standard deviations that observation is away from the mean

✤   Converting a sample from a normal distribution to z-scores transforms
    it to a standard normal distribution

    ✤   z-score = (observation - mean) ÷ standard deviation

✤   If the observation is above the mean, its z-score is positive; if it is
    below the mean, its z-score is negative

                                         22
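Here is a sketch of the z-score formula in Python. The `z_scores` helper is hypothetical. When you don't supply a mean or standard deviation, it estimates them from the data using the population standard deviation (`statistics.pstdev`), which is an assumption of this sketch:

```python
from statistics import mean, pstdev


def z_scores(values, mu=None, sigma=None):
    """z-score = (observation - mean) / standard deviation.

    If mu or sigma is not given, it is estimated from the values themselves
    (pstdev divides by n; that choice is an assumption of this sketch).
    """
    mu = mean(values) if mu is None else mu
    sigma = pstdev(values) if sigma is None else sigma
    return [(x - mu) / sigma for x in values]


# Hypothetical scale with mean 100 and standard deviation 15
print(z_scores([85, 100, 130], mu=100, sigma=15))   # [-1.0, 0.0, 2.0]
```

Standardizing a whole sample this way gives it mean 0 and standard deviation 1, matching Z ~ N(0, 1).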
Interval Estimation

✤   We might estimate the mean for an entire population using the mean
    of a small sample; this is called a point estimate.

✤   A confidence interval gives a range of “plausible” values for the
    population mean

       ✤   Usually reported as "mean ± wiggle room"

✤   Each interval has an associated level of confidence, usually written as
    a percent (95% being the most common)

       ✤   "I am 95% confident that the population mean is in this range,
           with the sample mean being the most likely guess"
                                      23
Two-Sided: 1.96 Std. Devs.




                 24
Normal Critical Deviates

✤   Critical normal deviate: If you wanted to find the middle X% of the
    distribution, how many standard deviations would you have to travel
    in each direction?

✤   Define zγ to be the point for which the area under the normal curve to
    the right is γ.

✤   In more mathematical notation, zγ is the point for which:

$$P(Z > z_\gamma) = \gamma$$
                                         25
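You can compute zγ with the inverse CDF. Since P(Z > zγ) = γ, zγ is the (1 - γ) quantile. To capture the middle X%, use γ = (1 - X)/2 in each tail. Here is a sketch using `statistics.NormalDist.inv_cdf`:

```python
from statistics import NormalDist


def z_critical(gamma):
    """z_gamma: the point with area gamma to its right, i.e. P(Z > z_gamma) = gamma."""
    return NormalDist().inv_cdf(1 - gamma)


def z_for_middle(level):
    """SDs to travel in each direction to capture the middle `level` (e.g. 0.95)."""
    return z_critical((1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"middle {level:.0%}: z = {z_for_middle(level):.3f}")
```

This reproduces the 1.645, 1.96, and 2.58 multipliers used for 90%, 95%, and 99% intervals (the exact 99% value, 2.576, rounds to 2.58).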
Interpreting Confidence Intervals

✤   The width of a confidence interval indicates precision

    ✤   An observation's z-score shows how unusual it is: under a normal
        distribution, only 5% of observations have a z-score beyond ±1.96

✤   95% confidence intervals are by far the most common, but any level of
    confidence interval can be computed, using the standard error of the
    mean (σ/√n):

    ✤   90%: mean ± (1.645 × standard error)

    ✤   95%: mean ± (1.96 × standard error)

    ✤   99%: mean ± (2.58 × standard error)
                                        26
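Here is a sketch of a z-based confidence interval for a mean. It assumes the population standard deviation σ is known. The numbers in the example (sample mean 125, σ = 15, n = 50) are hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """mean ± z * standard error, with standard error = sigma / sqrt(n) (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)          # the "wiggle room"
    return sample_mean - margin, sample_mean + margin


for level in (0.90, 0.95, 0.99):
    lo, hi = confidence_interval(125, 15, 50, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
```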
Components of Confidence


✤   How might a confidence interval change as:

    ✤   Ȳ increases

    ✤   σ increases

    ✤   n increases

    ✤   the confidence level increases (e.g., from 95% to 99%)


                                      27
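One way to explore these questions is to compute the interval's width directly. This sketch assumes the same z-based interval, mean ± z × σ/√n, with made-up values of σ and n:

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, level=0.95):
    """Full width of a z-based interval: 2 * z * sigma / sqrt(n).

    The sample mean does not appear here: it only shifts the interval.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / sqrt(n)


base = ci_width(sigma=10, n=25)
print(round(base, 2))                        # baseline width
print(round(ci_width(20, 25) / base, 2))     # doubling sigma doubles the width
print(round(ci_width(10, 100) / base, 2))    # 4x the sample size halves the width
print(ci_width(10, 25, level=0.99) > base)   # higher confidence gives a wider interval
```

The sample mean Ȳ never enters the width. Changing it only slides the interval left or right.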
Conflicting Hypotheses

✤   In statistical inference, there are always two conflicting hypotheses:

    ✤   null hypothesis “H0” - often states “no effect” or “no difference”.
        This is the hypothesis that we will assume to be true unless we
        have convincing evidence to the contrary.

    ✤   alternative hypothesis “H1” or “Ha” - The hypothesis that we will
        believe only if the evidence strongly supports it.



✤   The null hypothesis typically has “=” in it
                                        28
Hypothesis as Metaphor

✤   Hypothesis tests are like U.S. criminal trials

✤   The judicial system is structured such that the accused person is
    presumed innocent until proven guilty. In such a system the absence
    of convincing evidence (“beyond a reasonable doubt”) results in the
    person being set free.

    ✤   H0: innocent

    ✤   Ha: guilty

                                       29
P-values


✤   In each hypothesis testing situation we will compute a p-value. This is
    the probability, assuming the null hypothesis is true, of getting data
    at least as extreme as the data we actually observed.

    ✤   Fail to reject H0 (often loosely called “accepting” H0) if the p-value is large

    ✤   Reject H0 in favor of Ha if the p-value is small

✤   How small is small enough? It depends... (usually p < 0.05)



                                        30
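Suppose the test statistic follows the standard normal distribution under H0. This is an assumption; t-tests use the t distribution instead. Under that assumption, you can compute a two-sided p-value as sketched below. The helper names are hypothetical, and α = 0.05 is the default cutoff:

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|), computed assuming H0 is true so that Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Compare the p-value with the chosen alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


for z in (0.5, 2.0, 2.8):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```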
Notes on Hypothesis Testing

✤   “Statistical significance” is not the same as “clinical significance”. A
    tiny effect may be “statistically significant” if the sample size is huge.

✤   The p-value does not describe the magnitude of the effect!

✤   When reporting analysis results, a confidence interval should always
    be provided along with the results of a hypothesis test.

✤   The choice of 0.05 is arbitrary. (p = 0.051 and p = 0.049 should lead to
    similar conclusions; in practice, they often do not.)

✤   Never report results only as “p < 0.05”; report the exact p-value and let
    the reader decide if they agree with your interpretation.
                                         31
• Type I Error: Reject H0 when H0 is actually true.
  – For example, to conclude there is an effect (or a difference)
    when there really isn’t one.
  – Also called “false positive”.
• Type II Error: Accept H0 when H0 is actually false.
  – For example, to fail to find an effect (or a difference) when
    there really is one.
  – Also called “false negative”.

                               State of nature
              Decision     H0 is true      Ha is true
              Accept H0    correct         Type II error
              Reject H0    Type I error    correct
                                32
Probabilities of Errors of Type I and Type II

✤   Each of the errors has an associated probability:

    ✤   α = P(Type I Error)

    ✤   β = P(Type II Error)

✤   Hypothesis testing is set up to control the Type I error rate (α)

    ✤   The experimenter chooses α; everything else follows from this!

✤   The most common (by far) choice for α is 0.05

    ✤   (0.01 and 0.10 are also used on occasion)
                                          33
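You can check by simulation that α controls the Type I error rate. Generate many samples where H0 is true, then count how often the test wrongly rejects. This sketch assumes a two-sided z-test with known σ = 1 and normally distributed data:

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=30, trials=20_000, seed=1):
    """Fraction of two-sided z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: data come from N(0, 1) and H0 says mu = 0
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / sqrt(n))   # sigma = 1 treated as known
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


print(type_i_error_rate())   # should land near alpha = 0.05
```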
Comparing Means

✤   Tests:

    ✤   Single group versus a fixed mean

    ✤   Two independent groups measured on the same variable

    ✤   Two groups with paired observations

✤   Hypotheses:

    ✤   H0: the two groups have equal means (mean A = mean B)

    ✤   Ha : the means of the groups are different
                                       34
Assumptions for t-Tests

    ✤   The group (sample) is the Independent Variable (dichotomous)

    ✤   The outcome of interest is the Dependent Variable

✤   t-Tests are only valid if these assumptions hold:

    ✤   The research question DOES involve the comparison of 2 means

    ✤   The Dependent Variable is a quantitative scale

    ✤   The distribution of the Dependent Variable is (approximately) normal

    ✤   Independent Variable assigned randomly (independently)
                                      35
Met Assumptions, but Which Test?


✤   Only one group with data: One-Sample t-Test

✤   Two groups:

    ✤   Not related to each other: Independent-Samples t-Test

    ✤   Related samples (e.g. before & after): Paired-Samples t-Test



                                       36
One-Sample t-Test


✤   Compares a sample mean to a known population mean.

✤   Need to know the population mean!



✤   Example: Is there a difference between the population mean IQ (100)
    and the mean IQ for a sample of 50 John Jay students (125)?



                                    37
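Here is a minimal sketch of the one-sample t statistic in Python, using only the standard library. The IQ scores below are made-up numbers, not real data. Turning t into a p-value requires the t distribution, which SciPy provides (for example, `scipy.stats.ttest_1samp(sample, 100)`):

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)   # stdev uses the n - 1 (sample) denominator
    return (mean(sample) - mu0) / se, n - 1


# Hypothetical IQ scores for a small sample (not real data)
t, df = one_sample_t([112, 121, 98, 135, 127, 118, 109, 130], mu0=100)
print(f"t = {t:.2f}, df = {df}")
```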
Paired-Samples t-Test

✤   Sometimes we have two sets of measurements that are related:

    ✤   Each subject is measured before and after treatment

    ✤   With pairs of identical twins

    ✤   Subject has different treatment on left & right arms



✤   For each observation in one group there is exactly one closely related
    observation in the other group (so the data form pairs, one from each group)
                                        38
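Because each observation has a partner, the paired test reduces to a one-sample t-test on the within-pair differences against 0. Here is a sketch with hypothetical before/after measurements:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test of the differences (after - before) against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples need the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical before/after measurements for four subjects
t, df = paired_t(before=[10, 12, 9, 11], after=[12, 13, 11, 14])
print(f"t = {t:.3f}, df = {df}")
```

SciPy's `scipy.stats.ttest_rel(after, before)` performs the same test and also returns a p-value.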
Independent-Samples t-Test

✤   Compares the means of two groups or samples.

✤   One of the most common situations in statistical inference is that of
    comparing two means from independent samples

    ✤   Clinical trials - treatment group vs. placebo group

    ✤   Exposed vs. unexposed

    ✤   Males vs. females

    ✤   General population vs. specific subpopulation
                                       39
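Here is a sketch of the independent-samples t statistic. This version pools the two sample variances (Student's t-test), which assumes both populations have the same variance. The group values in the example are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's independent-samples t with pooled variance (equal variances assumed)."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / se, n1 + n2 - 2


# Hypothetical outcome scores: treatment group vs. placebo group
t, df = pooled_t([23, 25, 28, 30, 26], [20, 22, 21, 24, 23])
print(f"t = {t:.2f}, df = {df}")
```

`scipy.stats.ttest_ind(group_a, group_b)` uses the same pooled form by default. Passing `equal_var=False` switches to Welch's t-test, which drops the equal-variance assumption.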
Review: Hypotheses


✤   Null Hypothesis: there is no relationship between the independent
    and dependent variables

✤   p-value: the probability, assuming H0 is true, of data at least as extreme
    as what was observed (not the probability that H0 is true)

    ✤   Reject H0 if p is too small (usually p < 0.05)

    ✤   If we reject H0, we must instead choose the alternative (Ha)



                                        40
Review: t-Tests

✤   Compare a single mean to a fixed number, or the means of exactly two groups



✤   Only one group (with data) compared to a fixed number:

    ✤   One-Sample t-Test

✤   Two groups (with data):

    ✤   Not related to each other: Independent-Samples t-Test

    ✤   Related samples (e.g. before & after): Paired-Samples t-Test
                                       41

Más contenido relacionado

La actualidad más candente

Statistik 1 7 estimasi & ci
Statistik 1 7 estimasi & ciStatistik 1 7 estimasi & ci
Statistik 1 7 estimasi & ciSelvin Hadi
 
Chapter6
Chapter6Chapter6
Chapter6Vu Vo
 
Properties of discrete probability distribution
Properties of discrete probability distributionProperties of discrete probability distribution
Properties of discrete probability distributionJACKIE MACALINTAL
 
Chapter 4 part2- Random Variables
Chapter 4 part2- Random VariablesChapter 4 part2- Random Variables
Chapter 4 part2- Random Variablesnszakir
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distributionnumanmunir01
 
Chapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random VariablesChapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random Variablesnszakir
 
Bba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distributionBba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distributionStephen Ong
 
Discrete Random Variable (Probability Distribution)
Discrete Random Variable (Probability Distribution)Discrete Random Variable (Probability Distribution)
Discrete Random Variable (Probability Distribution)LeslyAlingay
 
Theory of probability and probability distribution
Theory of probability and probability distributionTheory of probability and probability distribution
Theory of probability and probability distributionpolscjp
 
04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrvPooja Sakhla
 
Documents.mx eduv
Documents.mx eduvDocuments.mx eduv
Documents.mx eduvOsmar Meraz
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsBabasab Patil
 
Bernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionBernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionDataminingTools Inc
 
Mean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random VariableMean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random VariableMichael Ogoy
 

La actualidad más candente (20)

Statistik 1 7 estimasi & ci
Statistik 1 7 estimasi & ciStatistik 1 7 estimasi & ci
Statistik 1 7 estimasi & ci
 
Chapter6
Chapter6Chapter6
Chapter6
 
Sfs4e ppt 06
Sfs4e ppt 06Sfs4e ppt 06
Sfs4e ppt 06
 
Properties of discrete probability distribution
Properties of discrete probability distributionProperties of discrete probability distribution
Properties of discrete probability distribution
 
Chapter 4 part2- Random Variables
Chapter 4 part2- Random VariablesChapter 4 part2- Random Variables
Chapter 4 part2- Random Variables
 
Semana8 muestreo
Semana8 muestreoSemana8 muestreo
Semana8 muestreo
 
Semana7 dn
Semana7 dnSemana7 dn
Semana7 dn
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
 
Chapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random VariablesChapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random Variables
 
Bba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distributionBba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distribution
 
Discrete Random Variable (Probability Distribution)
Discrete Random Variable (Probability Distribution)Discrete Random Variable (Probability Distribution)
Discrete Random Variable (Probability Distribution)
 
Theory of probability and probability distribution
Theory of probability and probability distributionTheory of probability and probability distribution
Theory of probability and probability distribution
 
04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv
 
Documents.mx eduv
Documents.mx eduvDocuments.mx eduv
Documents.mx eduv
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
Random variables
Random variablesRandom variables
Random variables
 
Bernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionBernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial Distribution
 
Mean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random VariableMean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random Variable
 
Hipotesis y muestreo estadístico
Hipotesis y muestreo estadísticoHipotesis y muestreo estadístico
Hipotesis y muestreo estadístico
 
Semana5 modelos
Semana5 modelosSemana5 modelos
Semana5 modelos
 

Similar a Lecture 11

Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...
Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...
Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...Daniel Katz
 
PG STAT 531 Lecture 6 Test of Significance, z Test
PG STAT 531 Lecture 6 Test of Significance, z TestPG STAT 531 Lecture 6 Test of Significance, z Test
PG STAT 531 Lecture 6 Test of Significance, z TestAashish Patel
 
PG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability DistributionPG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability DistributionAashish Patel
 
Statistics_summary_1634533932.pdf
Statistics_summary_1634533932.pdfStatistics_summary_1634533932.pdf
Statistics_summary_1634533932.pdfYoursTube1
 
Statistics_Cheat_sheet_1567847508.pdf
Statistics_Cheat_sheet_1567847508.pdfStatistics_Cheat_sheet_1567847508.pdf
Statistics_Cheat_sheet_1567847508.pdfAkashyadav375896
 
Normal Distribution
Normal DistributionNormal Distribution
Normal DistributionNevIlle16
 
Probability and Randomness
Probability and RandomnessProbability and Randomness
Probability and RandomnessSalmaAlbakri2
 
Lecture 2 - Probability
Lecture 2 - ProbabilityLecture 2 - Probability
Lecture 2 - ProbabilityLuke Dicken
 
2.statistical DEcision makig.pptx
2.statistical DEcision makig.pptx2.statistical DEcision makig.pptx
2.statistical DEcision makig.pptxImpanaR2
 
Binomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distributionBinomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distributionBharath kumar Karanam
 
Final Exam ReviewChapter 10Know the three ideas of s.docx
Final Exam ReviewChapter 10Know the three ideas of s.docxFinal Exam ReviewChapter 10Know the three ideas of s.docx
Final Exam ReviewChapter 10Know the three ideas of s.docxlmelaine
 
Probability basics and bayes' theorem
Probability basics and bayes' theoremProbability basics and bayes' theorem
Probability basics and bayes' theoremBalaji P
 
Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis Vijay Hemmadi
 
Basic concept of probability
Basic concept of probabilityBasic concept of probability
Basic concept of probabilityIkhlas Rahman
 

Similar a Lecture 11 (20)

Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...
Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...
Quantitative Methods for Lawyers - Class #10 - Binomial Distributions, Normal...
 
Probability
ProbabilityProbability
Probability
 
PG STAT 531 Lecture 6 Test of Significance, z Test
PG STAT 531 Lecture 6 Test of Significance, z TestPG STAT 531 Lecture 6 Test of Significance, z Test
PG STAT 531 Lecture 6 Test of Significance, z Test
 
PG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability DistributionPG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability Distribution
 
PROBABILITY.pptx
PROBABILITY.pptxPROBABILITY.pptx
PROBABILITY.pptx
 
Statistics_summary_1634533932.pdf
Statistics_summary_1634533932.pdfStatistics_summary_1634533932.pdf
Statistics_summary_1634533932.pdf
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
Statistics_Cheat_sheet_1567847508.pdf
Statistics_Cheat_sheet_1567847508.pdfStatistics_Cheat_sheet_1567847508.pdf
Statistics_Cheat_sheet_1567847508.pdf
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
Probability and Randomness
Probability and RandomnessProbability and Randomness
Probability and Randomness
 
Lecture 2 - Probability
Lecture 2 - ProbabilityLecture 2 - Probability
Lecture 2 - Probability
 
2.statistical DEcision makig.pptx
2.statistical DEcision makig.pptx2.statistical DEcision makig.pptx
2.statistical DEcision makig.pptx
 
Binomial probability distributions
Binomial probability distributions  Binomial probability distributions
Binomial probability distributions
 
Binomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distributionBinomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distribution
 
Final Exam ReviewChapter 10Know the three ideas of s.docx
Final Exam ReviewChapter 10Know the three ideas of s.docxFinal Exam ReviewChapter 10Know the three ideas of s.docx
Final Exam ReviewChapter 10Know the three ideas of s.docx
 
Lec13_Bayes.pptx
Lec13_Bayes.pptxLec13_Bayes.pptx
Lec13_Bayes.pptx
 
Probability basics and bayes' theorem
Probability basics and bayes' theoremProbability basics and bayes' theorem
Probability basics and bayes' theorem
 
Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis
 
Paranormal stats
Paranormal statsParanormal stats
Paranormal stats
 
Basic concept of probability
Basic concept of probabilityBasic concept of probability
Basic concept of probability
 

Lecture 11

  • 2. 2
  • 3. 3
  • 4. Probability ✤ How we express likelihood mathematically ✤ For an event “A”, the probability of A occurring is denoted “P(A)” ✤ Always number between 0 and 1 ✤ P(A) = 0 means that A never happens ✤ P(A) = 1 means that A always happens 4
  • 5. Independence & Exclusivity ✤ independence - A and B are independent if the occurrence of one does not affect the probability of the other: ✤ P(A|B) = P(A) = P(A|not B) ✤ P(B|A) = P(B) = P(B|not A) ✤ mutually exclusive - A and B are mutually exclusive if it is impossible for both of them to occur: ✤ P(A and B) = 0 5
  • 6. Probability Rules ✤ Probability of not happening is 1 minus probability of occurring ✤ P(not A) = 1 - P(A) ✤ When A and B are independent: ✤ P(A and B) = P(A) × P(B) ✤ P(A or B) = P(A) + P(B) - P(A and B) 6
  • 7. Probability Fundamentals ✤ Sum of probabilities of all possible outcomes is 1 ✤ Flip a coin and you get either heads or tails: ✤ P(heads) + P(tails) = 1 = P(heads or tails) ✤ With mutually exclusive outcomes A, B, C, and D ✤ P(A) + P(B) + P(C) + P(D) = 1 = P(A or B or C or D) 7
  • 8. Conditional Probability ✤ With non-independent events, knowing one has happened may change the likelihood of the other occurring ✤ Conditional probability - what is the probability of A given that B has already happened? ✤ P(A|B) ✤ Bayes Rule for conditional probability: P (A and B) P (B|A) × P (B) P (A|B) = = P (B) P (A) 8
  • 9. Conditional Probability Hoedown ✤ At John Jay, 62.5% of all students hate statistics while 25% of all students hate statistics and passed the class. What is the probability that a student passes stats given that the student hates statistics? ✤ Two fair dice are rolled, what is the (conditional) probability that exactly one die’s value is a 1 or 2 given that they show different numbers? 9
  • 10. Something Really Important ✤ Classic stats problem emerged from the game show Let’s Make a Deal, often called the Monty Hall Problem after the show’s host ✤ Has ended many friendships and caused bitter internet arguments 10
  • 11. The Game ✤ There are 3 doors labeled “1”, “2”, and “3”, behind one of these doors is a fabulous prize that Monty has hidden ✤ You get to choose a door, which may or may not have the prize ✤ Monty opens another door without revealing the prize ✤ You now have the option to stay with your door or switch to another, should you stick with your original choice or switch? 11
  • 12. Choosing The First Door ✤ Three doors and one prize so you’ll pick the right door one out of three times, i.e. P(right first choice) = 1/3 ✤ Likewise, you’ll pick the wrong door with P(wrong first choice) = 2/3 12
  • 13. The Reveal ✤ No matter how you choose, there are two other doors one prize. This means there is at least one of the two unchosen doors with nothing behind it. ✤ Monty knows where the prize is and opens the door that DOESN’T have the prize behind it. ✤ This leaves your door and one other. One of them has the prize and the other doesn’t, should you switch? 13
  • 14. To Switch, or Not To Switch ✤ You don’t know if you have the right door! ✤ What’s the probability that your door has the prize? ✤ What’s the probability that the other door has the prize? ✤ What’s the probability that your door doesn’t have the prize? 14
  • 15. Example of the Game ✤ As an example, the prize is hidden behind door “3”. ✤ If you choose door “3” initially, switching can only lose you the prize ✤ If you choose door “2” initially, Monty must open door “1” and switching will get you the prize ✤ If you choose door “1” initially, Monty must open door “2” and switching will get you the prize 15
  • 16. Switch Already! ✤ Switching is a way of saying “I don’t think the prize is behind this door” ✤ Since the probability is 1/3 that the prize is behind any one door, the probability is 2/3 that the prize is not behind that door ✤ Always switch and you’ll win 2/3’s of the time! 16
  • 17. Expected Values ✤ Probability can be used to estimate rewards in a game of chance ✤ Expected Value = P(A)×Reward(A) + P(B)×Reward(B) + ... ✤ Silly coin-flipping game: If you can flip a coin three times and have exactly one Heads, you get a dollar. If not, you give me a dollar. ✤ Should you take the bet? 17
  • 18. Normal Distribution ✤ The distribution is ✤ unimodal ✤ symmetric ✤ “light tailed” ✤ Notation: X ~ N(μ, σ) means “the random variable X has a normal distribution with mean μ and standard deviation σ“ 18
  • 19. Area Under the Curve Equals 1 0.8 N(!3,0.5) N(2,1) N(!1,3) 0.6 f(x) 0.4 0.2 0.0 !4 !2 0 2 4 19
  • 20. Rules of Thumb ✤ P(within one standard deviation) = 0.68 ✤ P(within 1.68 standard deviations) = 0.95 ✤ P(within three standard deviations) = 0.997 ✤ With “real” normal distributions, you just don’t get outliers! 20
  • 21. Standard Normal Distribution ✤ standard normal distribution is the normal distribution with mean μ = 0 and standard deviation σ = 1: Z ~ N(0, 1) ✤ Any normal distribution can be transformed into a standard normal distribution. If X ~ N(μ, σ), then: X −µ =Z σ ✤ 21
  • 22. Z - Scores & the Standard Normal ✤ Each observation has an associated z-score, which is the number of standard deviations that observation is away from the mean ✤ Converting a sample from a normal distribution to z-scores transforms it to a standard normal distribution ✤ z-score = (observation - mean) ÷ standard deviation ✤ If the observation is above the mean then the Z-score is positive, if below then the Z-score is negative 22
  • 23. Interval Estimation ✤ We might estimate the mean of an entire population using the mean of a small sample; this is called a point estimate. ✤ A confidence interval gives a range of “plausible” values for the population mean ✤ Usually reported as “mean ± wiggle room” (the margin of error) ✤ Each interval has an associated level of confidence, usually written as a percent (95% being the most common) ✤ “I am 95% confident that the population mean is in this range, with the sample mean being the most likely guess”
  • 24. Two-Sided: 1.96 Standard Deviations
  • 25. Normal Critical Deviates ✤ Critical normal deviate: if you wanted to find the middle X% of the standard normal distribution, how many standard deviations would you have to travel in each direction? ✤ Define $z_\gamma$ to be the point for which the area under the standard normal curve to the right is γ. ✤ In more mathematical notation, $z_\gamma$ is the point for which $P(Z > z_\gamma) = \gamma$
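Python's standard library can compute $z_\gamma$ directly: `statistics.NormalDist().inv_cdf(p)` returns the point with area p to its left, so $z_\gamma$ is `inv_cdf(1 - γ)`, and the middle X% puts γ = (1 − X)/2 in each tail. The helper names below are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_upper(gamma: float) -> float:
    """z_gamma: the point with area gamma to its right, i.e. P(Z > z_gamma) = gamma."""
    return STANDARD_NORMAL.inv_cdf(1.0 - gamma)

def two_sided_critical(confidence: float) -> float:
    """Critical value bounding the middle `confidence` of N(0, 1): z_{(1 - confidence)/2}."""
    return z_upper((1.0 - confidence) / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: ±{two_sided_critical(level):.3f}")
```

This should print about 1.645, 1.960, and 2.576, the multipliers used for 90%, 95%, and 99% intervals.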
  • 26. Interpreting Confidence Intervals ✤ The width of a confidence interval indicates precision: narrower intervals are more precise ✤ An observation’s z-score can flag whether it is unusual: a z-score beyond ±1.96 puts the observation outside the middle 95% of the distribution, where only 5% of observations fall ✤ 95% confidence intervals are by far the most common, but an interval can be computed for any level of confidence, using the standard error of the sample mean (σ/√n): ✤ 90%: mean ± (1.645 × standard error) ✤ 95%: mean ± (1.96 × standard error) ✤ 99%: mean ± (2.58 × standard error)
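A minimal sketch of a z-based confidence interval for a mean, assuming the population standard deviation σ is known so that the margin of error is z × σ/√n. The numbers (sample mean 50, σ = 10, n = 25) are invented for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean: float, sigma: float, n: int, confidence: float = 0.95):
    """Return (low, high) = mean ± z * sigma / sqrt(n), assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)                   # z times the standard error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(50, 10, 25)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Here σ/√n = 2, so the 95% interval is 50 ± 3.92, about (46.08, 53.92).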
  • 27. Components of Confidence ✤ How might a confidence interval change as: ✤ Ȳ increases ✤ σ increases ✤ n increases ✤ the confidence level increases (e.g., from 95% to 99%)
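One way to explore these questions is to compute the interval width 2 × z × σ/√n directly. This sketch assumes a known σ and uses made-up baseline values (σ = 10, n = 25, 95% confidence). Note that Ȳ does not appear in the width at all, so changing it only shifts the interval.

```python
import math
from statistics import NormalDist

def ci_width(sigma: float, n: int, confidence: float) -> float:
    """Full width 2 * z * sigma / sqrt(n) of a z-based interval; the sample mean is not involved."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = ci_width(sigma=10, n=25, confidence=0.95)
print("baseline:         ", round(base, 2))
print("sigma 10 -> 20:   ", round(ci_width(20, 25, 0.95), 2))
print("n 25 -> 100:      ", round(ci_width(10, 100, 0.95), 2))
print("confidence -> 99%:", round(ci_width(10, 25, 0.99), 2))
```

Doubling σ doubles the width, quadrupling n halves it, and moving from 95% to 99% confidence widens it (z grows from about 1.96 to about 2.58).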
  • 28. Conflicting Hypotheses ✤ In statistical inference, there are always two conflicting hypotheses: ✤ null hypothesis “H0”: often states “no effect” or “no difference”. This is the hypothesis that we assume to be true unless we have convincing evidence to the contrary. ✤ alternative hypothesis “H1” or “Ha”: the hypothesis that we will believe only if the evidence strongly supports it. ✤ The null hypothesis typically has “=” in it
  • 29. Hypothesis as Metaphor ✤ Hypothesis tests are like U.S. criminal trials ✤ The judicial system is structured such that the accused person is presumed innocent until proven guilty. In such a system the absence of convincing evidence (“beyond a reasonable doubt”) results in the person being set free. ✤ H0: innocent ✤ Ha: guilty
  • 30. P-values ✤ In each hypothesis testing situation we will compute a p-value. This is the probability, assuming the null hypothesis is true, of getting data at least as extreme as the data actually observed. It is not the probability that H0 is true. ✤ Accept (more precisely, fail to reject) H0 if the p-value is large ✤ Reject H0 if the p-value is small, and go with Ha ✤ How small is small enough? It depends... (usually p < 0.05)
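For a test whose statistic follows N(0, 1) when H0 is true (a z-test, used here as the simplest case; t-tests use the t distribution instead), the two-sided p-value is P(|Z| ≥ |z|). This sketch computes it with `statistics.NormalDist`; the function name is illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) assuming H0 is true, for a statistic that is N(0, 1) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p_value(z):.4f}")
```

A z of 1.96 gives p ≈ 0.05, matching the 95% cutoff from the confidence-interval slides, while z = 3 gives p ≈ 0.0027.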
  • 31. Notes on Hypothesis Testing ✤ “Statistical significance” is not the same as “clinical significance”. A tiny effect may be “statistically significant” if the sample size is huge. ✤ The p-value does not describe the magnitude of the effect! ✤ When reporting analysis results, always provide a confidence interval along with the results of a hypothesis test. ✤ The choice of 0.05 is arbitrary: p = 0.051 and p = 0.049 should lead to similar conclusions, but in practice they often do not. ✤ Never report results as “p < 0.05”; report the exact p-value and let the reader decide whether they agree with your interpretation.
  • 32. • Type I Error: Reject H0 when H0 is actually true. – For example, to conclude there is an effect (or a difference) when there really isn’t one. – Also called “false positive”. • Type II Error: Accept H0 when H0 is actually false. – For example, to fail to find an effect (or a difference) when there really is one. – Also called “false negative”.
    Decision \ State of nature | H0 is true   | Ha is true
    Accept H0                  | correct      | Type II error
    Reject H0                  | Type I error | correct
  • 33. Probabilities of Errors of Type I and Type II ✤ Each of the errors has an associated probability: • α = P(Type I Error) • β = P(Type II Error) ✤ Hypothesis testing is set up to control the Type I error rate (α) ✤ The experimenter chooses α; everything else follows from this! ✤ The most common (by far) choice for α is 0.05. ✤ (0.01 and 0.10 are also used on occasion)
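Because α is chosen by the experimenter, a simulation in which H0 is true by construction should reject about α of the time. This sketch assumes two-sided z-tests of H0: μ = 0 on samples of 30 draws from N(0, 1) with known σ = 1; all parameters are arbitrary illustration values.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(alpha: float = 0.05, n: int = 30, trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of two-sided z-tests that reject H0: mu = 0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Simulate data for which H0 holds: mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n - 0.0) / (1.0 / math.sqrt(n))  # (sample mean - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(round(rate, 3))
```

With α = 0.05 the printed rejection rate should land close to 0.05, and every one of those rejections is a Type I error.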
  • 34. Comparing Means ✤ Tests: ✤ Single group versus a fixed mean ✤ Two groups measured on the same variable ✤ Two groups with paired observations ✤ Hypotheses: ✤ H0: the two groups have equal means (mean A = mean B) ✤ Ha: the means of the groups are different
  • 35. Assumptions for t-Tests ✤ The group (sample) is the Independent Variable (dichotomous) ✤ The outcome of interest is the Dependent Variable ✤ t-Tests are valid only if these assumptions hold: ✤ The research question DOES involve the comparison of 2 means ✤ The Dependent Variable is measured on a quantitative scale ✤ The distribution of the Dependent Variable is (approximately) normal ✤ The Independent Variable is assigned randomly (observations are independent)
  • 36. Met Assumptions, but Which Test? ✤ Only one group with data: One-Sample t-Test ✤ Two groups: ✤ Not related to each other: Independent-Samples t-Test ✤ Related samples (e.g. before & after): Paired-Samples t-Test
  • 37. One-Sample t-Test ✤ Compares a sample mean to a known population mean. ✤ Need to know the population mean! ✤ Example: Is there a difference between the population mean IQ (100) and the mean IQ for a sample of 50 John Jay students (125)?
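A sketch of the one-sample t statistic, t = (Ȳ − μ0) / (s/√n) with n − 1 degrees of freedom. The five scores are invented (they are not the John Jay data), and only the standard library is used, so the sketch stops at the statistic; turning t into a p-value requires the t distribution.

```python
import math
import statistics

def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, df) with t = (mean - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (statistics.mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Made-up scores, tested against a hypothesized population mean of 100.
scores = [104.0, 98.0, 110.0, 106.0, 102.0]
t, df = one_sample_t(scores, 100.0)
print(f"t = {t:.3f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_1samp(scores, 100.0)` should report the same statistic together with a p-value.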
  • 38. Paired-Samples t-Test ✤ Sometimes we have two sets of measurements that are related: ✤ Each subject is measured before and after treatment ✤ Pairs of identical twins ✤ Each subject receives a different treatment on the left and right arms ✤ For each observation in one group there is exactly one closely related observation in the other group (so the observations can be matched into pairs, one from each group)
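A paired-samples t-test is a one-sample t-test on the within-pair differences, tested against 0. The before/after values below are made up for illustration, and the function name is hypothetical.

```python
import math
import statistics

def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired t statistic: one-sample t-test of the differences (after - before) against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 12.0, 12.0, 16.0]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Pairing removes the subject-to-subject variation, which is why the analysis works on differences. `scipy.stats.ttest_rel(after, before)` computes the same paired statistic if SciPy is installed.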
  • 39. Independent-Samples t-Test ✤ Compares the means of two groups or samples. ✤ One of the most common situations in statistical inference is that of comparing two means from independent samples ✤ Clinical trials - treatment group vs. placebo group ✤ Exposed vs. unexposed ✤ Males vs. females ✤ General population vs. specific subpopulation
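For two independent groups, this sketch uses the classic Student's t statistic with a pooled variance, which assumes both groups share the same population variance (Welch's test drops that assumption). The treatment and placebo numbers are invented.

```python
import math
import statistics

def pooled_t(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Student's independent-samples t statistic with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n1 + n2 - 2

treatment = [12.0, 14.0, 16.0, 18.0]
placebo = [10.0, 12.0, 14.0, 16.0]
t, df = pooled_t(treatment, placebo)
print(f"t = {t:.3f}, df = {df}")
```

In SciPy, `scipy.stats.ttest_ind(treatment, placebo)` uses the pooled version by default, and `equal_var=False` switches to Welch's test.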
  • 40. Review: Hypotheses ✤ Null Hypothesis: there is no relationship between the independent and dependent variables ✤ p-value: the probability, if H0 were true, of data at least as extreme as those observed (not the probability that H0 is true) ✤ Reject H0 if p is too small (usually p < 0.05) ✤ If we reject H0, we must instead choose the alternative (Ha)
  • 41. Review: t-Tests ✤ Compare two means: one group's mean against a fixed number, or the means of exactly two groups ✤ Only one group (with data) compared to a fixed number: ✤ One-Sample t-Test ✤ Two groups (with data): ✤ Not related to each other: Independent-Samples t-Test ✤ Related samples (e.g. before & after): Paired-Samples t-Test
