Stat310               Fin


                          Hadley Wickham
Saturday, 24 April 2010
Thank you!

                     To those of you who bought your
                     textbooks from my Amazon link.
                     To the textbook publishers who
                     generously sent me free copies of books.
                     To Kensey for suggesting Chick-fil-A.



                 1. Eat!
                 2. Final & help sessions
                 3. Finish off hypothesis testing
                 4. Other statistics opportunities
                 5. Feedback (TA & me)



Final



Final
                     Take home. Two hours long.
                     Three (double-sided) pages of notes.
                     Available Wednesday April 28 9am.
                     Due Wednesday May 5, 5pm,
                     under my door.
                     Ten small questions of approximately
                     equal weight. Similar to questions from the
                     homework/book.


Common themes
                          Probability of an event.
                          Independence & conditioning.
                          Distributions: pdf/pmf, cdf, mgf, named.
                          Transformations.
                          Sampling distribution of mean and variance.
                          Estimation and testing.

                                                     Philosophy of grading
Help sessions

                     Mon, Tue, Wed, Thurs, Fri, Sat, Sun?
                     Morning or afternoon?
                     One-on-one help, plus brief revision of
                     topics of particular interest. Suggest and
                     vote at http://goo.gl/mod/joIx



Honour code

                     Remember to pledge your exam, and
                     note the time at which you started and
                     ended.
                     You may refer only to your note sheets,
                     not to the textbook, old homework,
                     etc.



Hypothesis testing



Course grades

                     Assume I took a random sample of 20
                     students from each year, and that
                     course grades are normally distributed with
                     variance 80.
                     What is the distribution of the difference of
                     the two group means?
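
A worked sketch of this question, assuming the two samples are independent and that both years share the stated variance of 80. The symbols $\bar X$, $\bar Y$, $\mu_x$ and $\mu_y$ are notation introduced here for the two sample means and population means; they are not on the slide.

```latex
\begin{align*}
\bar X &\sim N\!\left(\mu_x,\ \tfrac{80}{20}\right) = N(\mu_x,\ 4),
&
\bar Y &\sim N\!\left(\mu_y,\ \tfrac{80}{20}\right) = N(\mu_y,\ 4), \\
\bar Y - \bar X &\sim N(\mu_y - \mu_x,\ 4 + 4) = N(\mu_y - \mu_x,\ 8).
\end{align*}
```

Under these assumptions the difference of the group means has standard deviation $\sqrt{8} \approx 2.83$.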



Your turn

                     The average grade from 2009 was 85 and
                     the average grade from 2010 was 90.
                     What is the p-value? (The probability that
                     you’d see a difference this large or larger if
                     there really was no difference in the
                     population means)
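
One way to check an answer numerically is the short sketch below. It assumes a two-sided test, independent samples of 20 per year, and the known variance of 80 from the previous slide; the variable names are illustrative only.

```python
import math

n = 20            # students sampled per year
variance = 80     # assumed known variance of course grades
mean_2009 = 85
mean_2010 = 90

# Standard deviation of the difference of two independent sample means.
sd_diff = math.sqrt(variance / n + variance / n)

# z-score of the observed difference under H0 (no difference in means).
z = (mean_2010 - mean_2009) / sd_diff

# Two-sided p-value: P(|Z| >= z) for a standard normal Z.
p_value = math.erfc(z / math.sqrt(2))

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Under these assumptions the p-value comes out at roughly 0.077.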



                 1. Write down H0 and Ha
                    (positions of defence and prosecution)
                 2. Figure out good test statistic
                    (what numeric summary?)
                 3. Work out null distribution
                    (distribution of innocents)
                 4. Calculate p-value by comparing actual value
                    to null distribution (what proportion of true
                    innocents look more guilty than the suspect)
                 5. Reject H0 if p-value smaller than cutoff
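
The five steps above can be sketched by simulation, building the "distribution of innocents" directly instead of deriving it. This is only an illustration under assumed settings: normal grades with variance 80, 20 students per year, the absolute difference in sample means as the test statistic, and a hypothetical common mean of 80 under H0 (the common value does not affect the result).

```python
import random
import statistics

random.seed(310)  # arbitrary seed so the sketch is repeatable

N_STUDENTS = 20
SD = 80 ** 0.5
NULL_MEAN = 80            # hypothetical common mean under H0
OBSERVED = abs(90 - 85)   # Step 2: test statistic = |difference in sample means|

def null_statistic():
    """Step 3: one draw from the null distribution (both years share a mean)."""
    x = [random.gauss(NULL_MEAN, SD) for _ in range(N_STUDENTS)]
    y = [random.gauss(NULL_MEAN, SD) for _ in range(N_STUDENTS)]
    return abs(statistics.fmean(y) - statistics.fmean(x))

# Step 4: proportion of "innocents" that look at least as guilty as the suspect.
n_sims = 10_000
null_draws = [null_statistic() for _ in range(n_sims)]
p_value = sum(d >= OBSERVED for d in null_draws) / n_sims

# Step 5: compare with the cutoff.
ALPHA = 0.05
reject = p_value < ALPHA
print(f"simulated p-value = {p_value:.3f}, reject H0: {reject}")
```

The simulated p-value should land close to the one obtained from the normal distribution, since the null distribution here is known exactly.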


                                  Say is guilty       Say is innocent

                   Is guilty      Correct             False acquittal

                   Is innocent    False conviction    Correct


Your turn


                     Which type of error is more expensive/
                     more costly/worse in the criminal
                     justice system?




                                  Reject H0       Accept H0

                   H0 false       Correct         Type II error

                   H0 true        Type I error    Correct


Rates
                     For a given test,
                     P(false conviction) = α = significance level
                     P(false acquittal) = β
                     1 - β = power
                     What do you think happens to β if you try to
                     make α smaller?


α↑ β↓
                          α↓ β↑
Cut off
                     Choose cut-off based on rate of false
                     convictions.
                     If you want a 5% rate of false
                     convictions, reject H0 if the p-value is less
                     than 0.05. (This is the industry standard
                     rate)
                     Can work out power.
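
As a rough illustration of working out power, the sketch below computes the probability of rejecting H0 for a two-sided z-test at several cutoffs. The true difference of 5 between the group means, the 20 students per group and the variance of 80 are assumptions chosen to match the course-grades example; `power_two_sided` is a hypothetical helper name, not something from the course.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1

def power_two_sided(true_diff, sd_diff, alpha):
    """P(reject H0) for a two-sided z-test when the true difference is true_diff."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_diff / sd_diff
    # Reject when the z-score lands beyond either critical value.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

sd_diff = (80 / 20 + 80 / 20) ** 0.5   # sd of the difference of two sample means

for alpha in (0.10, 0.05, 0.01):
    power = power_two_sided(true_diff=5, sd_diff=sd_diff, alpha=alpha)
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  "
          f"P(false acquittal) = {1 - power:.3f}")
```

Lowering the cutoff lowers the rate of false convictions but raises the rate of false acquittals, which is the trade-off described on the previous slides.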


μx=80, μy=85

[Figure: points labelled x and y; vertical axis from 76 to 90, horizontal axis from 20 to 100.]

μx=80, μy=85

[Figure: Difference (vertical axis, -2 to 10) against a horizontal axis from 20 to 100.]

μx=80, μy=85

[Figure: |Difference| (vertical axis, 0 to 10) against a horizontal axis from 20 to 100.]

μx=80, μy=85

[Figure: z-score (vertical axis, 0.0 to 3.5) against a horizontal axis from 20 to 100.]

μx=80, μy=85

[Figure: p-value (vertical axis, 0.0 to 0.8) against a horizontal axis from 20 to 100.]


                 Correctly reject null 39% of the time


μx=μy=80

[Figure: points labelled x and y; vertical axis from 76 to 84, horizontal axis from 20 to 100.]

μx=μy=80

[Figure: difference (vertical axis, -5 to 5) against a horizontal axis from 20 to 100.]

μx=μy=80

[Figure: z-score (vertical axis, 0.0 to 3.0) against a horizontal axis from 20 to 100.]

μx=μy=80

[Figure: |difference| (vertical axis, 0 to 8) against a horizontal axis from 20 to 100.]

μx=μy=80

[Figure: p-value (vertical axis, 0.0 to 0.8) against a horizontal axis from 20 to 100.]


                 Incorrectly reject null 6% of the time
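
A simulation of this kind can be sketched as follows. It is not the code behind the slides; it assumes normal grades with variance 80, 20 students per group, a two-sided z-test at the 0.05 cutoff, and an arbitrary seed and number of repetitions. `rejection_rate` is a hypothetical helper name.

```python
import math
import random
import statistics

def rejection_rate(mu_x, mu_y, n_reps, rng, n=20, variance=80, alpha=0.05):
    """Fraction of simulated studies whose two-sided z-test p-value is below alpha."""
    sd = math.sqrt(variance)
    sd_diff = math.sqrt(2 * variance / n)
    rejections = 0
    for _ in range(n_reps):
        x_bar = statistics.fmean(rng.gauss(mu_x, sd) for _ in range(n))
        y_bar = statistics.fmean(rng.gauss(mu_y, sd) for _ in range(n))
        z = abs(y_bar - x_bar) / sd_diff
        p_value = math.erfc(z / math.sqrt(2))
        rejections += p_value < alpha
    return rejections / n_reps

rng = random.Random(2010)  # arbitrary seed for repeatability
power_hat = rejection_rate(80, 85, n_reps=2000, rng=rng)
type1_hat = rejection_rate(80, 80, n_reps=2000, rng=rng)
print(f"means 80 vs 85: reject {power_hat:.1%} of the time")
print(f"means 80 vs 80: reject {type1_hat:.1%} of the time")
```

With many repetitions the first rate settles near the power of the test and the second near the 5% cutoff; rates from a small number of runs will scatter around those values.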



Your turn

                     The average grade from 2009 was 85 and
                     the average grade from 2010 was 90.
                     Would you reject the null hypothesis that
                     the average grade was the same?




Connection to confidence intervals
                     If you construct a 90% confidence
                     interval, and it doesn’t include the
                     parameter value under the null, then the
                     p-value must be < 1 - 0.9 = 0.1.
                     If the p-value is 0.08, then a 92% or
                     greater confidence interval would include
                     the null parameter, and a smaller
                     confidence interval would not.
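
The link can be checked on the course-grades example with the sketch below. It assumes a two-sided normal interval for the difference in means, with sample means 85 and 90, 20 students per year and variance 80; `z_interval` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_interval(estimate, sd, level):
    """Two-sided normal confidence interval at the given confidence level."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * sd, estimate + z_crit * sd

diff = 90 - 85
sd_diff = (80 / 20 + 80 / 20) ** 0.5
p_value = 2 * (1 - STD_NORMAL.cdf(diff / sd_diff))   # about 0.077

for level in (0.90, 0.92, 0.93, 0.95):
    low, high = z_interval(diff, sd_diff, level)
    contains_null = low <= 0 <= high
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})  contains 0: {contains_null}")
```

Here the interval starts to contain the null value of 0 once the confidence level exceeds 1 minus the p-value, which is a little over 92% in this example.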


Statistics



Majoring
                     3 required stat classes (Stat310, Stat405, Stat410)
                     + 6 stat electives
                     + calc, linear algebra, computing
                     + design project
                     Makes for a great double major.
                     Particularly useful if you’re thinking about
                     grad school. (Appealing to employers too)
                     http://statistics.rice.edu/ShowInterior.aspx?id=58


Minoring
                     From next year
                     Three required:
                     Track A: stat310, stat405, stat400/410
                     Track B: stat100, stat280, stat385
                     Three elective:
                     300 level+, one outside stat if it has
                     a strong statistical component


Stat410

                     Introduction to linear models
                     Powerful and general statistical tool.
                     Theory and data.
                     Offered in Fall.




Stat405

                     Project-based introduction to data
                     analysis. Lots of computing and hardly
                     any maths.
                     http://had.co.nz/stat405
                     Offered in Fall, and next year in Spring.



Electives
                     SOCI 436 (Houston area survey), 313
                     (demography)
                     ECON 340/440 (game theory), 400
                     (econometrics), 475 (optimisation), 477 (math of
                     economics), 479 (modelling)
                     STAT 385, 431 (more theory), 420 (process
                     control), 421 (time series), 422 (Bayesian data
                     analysis), 423 (bioinformatics), 453
                     (biostatistics), 485 (environmental)


Feedback

                     One form for me.
                     One form for Xin Zhao, whom most of you
                     never met but who was the TA in charge of
                     your grading.
                     No form for Garrett.


