Stat405                Simulation


                              Hadley Wickham
Thursday, 23 September 2010
                   1. Homework comments
                   2. Mathematical approach
                   3. More randomness
                   4. Random number generators




Homework
                   Just graded your organisation and code, and
                   focused my comments there.
                   Biggest overall tip: use floating figures (LaTeX's
                   figure environment, \begin{figure} ... \end{figure})
                   with captions. Use \ref{} to refer to the figure in
                   the text.
                   Captions should start with a brief description of the
                   plot (including bin width if applicable) and finish with
                   a brief description of what the plot reveals.
                   Will grade captions more aggressively in the future.


Code
                   Gives explicit technical details.
                   Your comments should remind you why
                   you did what you did.
                   Most readers will not look at it, but it’s
                   very important to include it, because it
                   means that others can check your work.



Mathematical approach

                   Why are we doing this simulation? We could
                   work out the expected value and variance
                   mathematically. So let’s do it!
                   Simplifying assumption: the three windows are iid
                   (independent and identically distributed).




calculate_prize <- function(windows) {
  # Payoff for three identical symbols
  payoffs <- c("DD" = 800, "7" = 80, "BBB" = 40,
    "BB" = 25, "B" = 10, "C" = 10, "0" = 0)

  same <- length(unique(windows)) == 1
  allbars <- all(windows %in% c("B", "BB", "BBB"))

  if (same) {
    prize <- payoffs[windows[1]]
  } else if (allbars) {
    # Any mix of bars pays 5
    prize <- 5
  } else {
    cherries <- sum(windows == "C")
    diamonds <- sum(windows == "DD")

    # 0, 1 or 2 cherries pay 0, 2 or 5; each diamond doubles the prize
    prize <- c(0, 2, 5)[cherries + 1] *
      c(1, 2, 4)[diamonds + 1]
  }
  prize
}

slots <- read.csv("slots.csv", stringsAsFactors = FALSE)

# Calculate empirical distribution of symbols, pooling the three windows
dist <- table(c(slots$w1, slots$w2, slots$w3))
dist <- dist / sum(dist)

# Keep the symbol names in their own vector so that slots
# remains the data frame
symbols <- names(dist)


# All possible combinations of three symbols
poss <- expand.grid(
  w1 = symbols, w2 = symbols, w3 = symbols,
  stringsAsFactors = FALSE
)

poss$prize <- NA
for (i in seq_len(nrow(poss))) {
  window <- as.character(poss[i, 1:3])
  poss$prize[i] <- calculate_prize(window)
}




Your turn
                   How can you calculate the probability of each
                   combination?
                   (Hint: think about subsetting. Another hint:
                   think about the table and character
                   subsetting. Final hint: you can do this in one
                   line of code)
                   Then work out the expected value (the payoff).



poss$prob <- with(poss,
  dist[w1] * dist[w2] * dist[w3])

(poss_mean <- with(poss, sum(prob * prize)))

# How do we determine the variance of this
# estimator?
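
A minimal sketch of the same enumeration carried one step further, to the variance of a single prize, using Var(X) = E[X^2] - E[X]^2. The symbol probabilities and the `toy_prize` rule are hypothetical stand-ins (not the course data and not `calculate_prize`), chosen only so that the block runs on its own.

```r
# Hypothetical symbol probabilities, made up for illustration; the real
# ones come from the empirical table built from slots.csv
dist <- c("0" = 0.5, "B" = 0.3, "C" = 0.2)

# Toy prize rule (an assumption, far simpler than calculate_prize):
# pay 10 for three of a kind other than blanks, otherwise 0
toy_prize <- function(w) {
  if (length(unique(w)) == 1 && w[1] != "0") 10 else 0
}

# Enumerate every combination of three windows
poss <- expand.grid(
  w1 = names(dist), w2 = names(dist), w3 = names(dist),
  stringsAsFactors = FALSE
)
poss$prize <- apply(poss[, 1:3], 1, toy_prize)

# Under the iid assumption the joint probability is a product
poss$prob <- dist[poss$w1] * dist[poss$w2] * dist[poss$w3]

poss_mean <- sum(poss$prob * poss$prize)
poss_var <- sum(poss$prob * poss$prize^2) - poss_mean^2
c(mean = poss_mean, var = poss_var)
```

Because every combination is enumerated, these are exact values for the toy machine rather than simulation estimates.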




More randomness

sample()

                   Very useful for selecting from a discrete
                   set (a vector) of possibilities.
                   Four arguments: x, size, replace, prob




How can you?
                   Choose 1 from vector
                   Choose n from vector, with replacement
                   Choose n from vector, without replacement
                   Perform a weighted sample
                   Put a vector in random order
                   Put a data frame in random order


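# Choose 1 from vector
sample(letters, 1)

# Choose n from vector, without replacement
sample(letters, 10)
# Error: there are only 26 letters, so 40 cannot be
# drawn without replacement
sample(letters, 40)

# Choose n from vector, with replacement
sample(letters, 40, replace = TRUE)

# Perform a weighted sample (with no size given, this returns
# all of the names in a weighted random order)
sample(names(dist), prob = dist)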
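
One way to see what `prob` does is to draw many values with replacement and compare the observed frequencies with the weights. This is only a sketch: the weights in `probs` are hypothetical, not the slot machine distribution.

```r
set.seed(42)

# Hypothetical weights for three outcomes
probs <- c(a = 0.7, b = 0.2, c = 0.1)

# Many weighted draws, with replacement
draws <- sample(names(probs), 10000, replace = TRUE, prob = probs)

# Observed frequencies should be close to the weights
freq <- table(factor(draws, levels = names(probs))) / length(draws)
rbind(observed = freq, weight = probs)
```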


# Put a vector in random order
sample(letters)

# Put a data frame in random order (slots must be the data
# frame read from slots.csv, not the vector of symbol names)
slots[sample(nrow(slots)), ]
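
A self-contained sketch of the row shuffle, using a small hypothetical data frame `df` in place of the slots data. Shuffling the row indices and then subsetting keeps each row intact.

```r
set.seed(1)

# Hypothetical data frame standing in for the slots data
df <- data.frame(
  id = 1:5,
  symbol = c("B", "C", "0", "7", "DD"),
  stringsAsFactors = FALSE
)

# Permute the row indices, then subset by them
shuffled <- df[sample(nrow(df)), ]
shuffled
```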




Your turn
                   Source of randomness in random_prize is
                   sample. Other options are:
                   runif, rbinom, rnbinom, rpois, rnorm,
                   rt, rcauchy
                   What sort of random variables do they
                   generate and what are their parameters?
                   Practice generating numbers from them.


Function   Distribution        Parameters
runif      Uniform             min, max
rbinom     Binomial            size, prob
rnbinom    Negative binomial   size, prob
rpois      Poisson             lambda
rnorm      Normal              mean, sd
rt         t                   df
rcauchy    Cauchy              location, scale
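
A short sketch that calls each generator in the table once with its parameters named. The parameter values are arbitrary choices for illustration.

```r
set.seed(405)
n <- 5

# One call per generator; the first argument is always the number of draws
draws <- list(
  runif   = runif(n, min = 0, max = 1),
  rbinom  = rbinom(n, size = 10, prob = 0.5),
  rnbinom = rnbinom(n, size = 3, prob = 0.5),
  rpois   = rpois(n, lambda = 4),
  rnorm   = rnorm(n, mean = 0, sd = 1),
  rt      = rt(n, df = 5),
  rcauchy = rcauchy(n, location = 0, scale = 1)
)
str(draws)
```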

Distributions
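                   Each distribution has four functions, named by prefix
                   (e.g. rnorm, dnorm, pnorm, qnorm):
                    •         r to generate random numbers
                    •         d to compute the density f(x)
                    •         p to compute the distribution function F(x)
                    •         q to compute the inverse distribution function F^{-1}(x)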
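
A minimal sketch of the d, p and q functions for the standard normal, showing that q undoes p. The evaluation points are arbitrary.

```r
# Density at 0, which equals 1 / sqrt(2 * pi)
d0 <- dnorm(0)

# Distribution function: P(X <= 1.96), about 0.975
p196 <- pnorm(1.96)

# Inverse distribution function (quantile): about 1.96
q975 <- qnorm(0.975)

# q is the inverse of p
x <- c(-1, 0, 2.5)
round_trip <- qnorm(pnorm(x))
all.equal(round_trip, x)
```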



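# Easy to combine random variables
library(ggplot2)  # provides qplot()

# Binomial with a Poisson number of trials
n <- rpois(10000, lambda = 10)
x <- rbinom(10000, size = n, prob = 0.3)
qplot(x, binwidth = 1)

# Binomial with a uniformly distributed success probability
p <- runif(10000)
x <- rbinom(10000, size = 10, prob = p)
qplot(x, binwidth = 0.1)

# cf.
qplot(runif(10000), binwidth = 0.1)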
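
The two mixtures above have known closed forms, which gives a numerical check that needs no plotting. A binomial whose number of trials is Poisson(10) with success probability 0.3 is Poisson(3), so its mean and variance should both be near 3. A binomial of size 10 whose success probability is uniform on (0, 1) is uniform on 0, ..., 10. The sketch below uses its own variable names (`x_pois`, `x_unif`) and an arbitrary seed.

```r
set.seed(405)

# Poisson number of trials: the result is Poisson(10 * 0.3)
n <- rpois(10000, lambda = 10)
x_pois <- rbinom(10000, size = n, prob = 0.3)
c(mean = mean(x_pois), var = var(x_pois))

# Uniform success probability: each of 0..10 has probability 1/11
p <- runif(10000)
x_unif <- rbinom(10000, size = 10, prob = p)
freq <- table(factor(x_unif, levels = 0:10)) / length(x_unif)
round(freq, 3)
```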


# Simulation is a powerful tool for exploring
# distributions. Easy to do computationally; hard
# to do analytically

qplot(1 / rpois(10000, lambda = 20))
qplot(1 / runif(10000, min = 0.5, max = 2))

qplot(rnorm(10000) ^ 2)
qplot(rnorm(10000) / rnorm(10000))

# http://www.johndcook.com/distribution_chart.html
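
Two of these transformations have well-known answers: the square of a standard normal is chi-squared with 1 degree of freedom, and the ratio of two independent standard normals is standard Cauchy. A sketch comparing simulated quartiles with the theoretical ones from `qchisq` and `qcauchy` (the seed and the choice of quartiles are arbitrary):

```r
set.seed(405)
probs <- c(0.25, 0.5, 0.75)

# Square of a standard normal vs. chi-squared with 1 df
z2 <- rnorm(10000)^2
chisq_cmp <- rbind(
  simulated = quantile(z2, probs),
  theory = qchisq(probs, df = 1)
)

# Ratio of two standard normals vs. standard Cauchy
ratio <- rnorm(10000) / rnorm(10000)
cauchy_cmp <- rbind(
  simulated = quantile(ratio, probs),
  theory = qcauchy(probs)
)

chisq_cmp
cauchy_cmp
```

Quantiles are used rather than means because the Cauchy distribution has no mean, so a sample average of `ratio` would not settle down.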



Your turn




RNG
                              Computers are deterministic, so how
                                do they produce randomness?




How do computers generate random numbers?

                   They don’t! Actually produce pseudo-random
                   sequences.
                   Common approach: X_{n+1} = (a X_n + c) mod m
                   (http://en.wikipedia.org/wiki/Linear_congruential_generator)




next_val <- function(x, a, c, m) {
  (a * x + c) %% m
}

x <- 1001
(x <- next_val(x, 1664525, 1013904223, 2^32))

# http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators

# R uses
# http://en.wikipedia.org/wiki/Mersenne_twister
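
A sketch that iterates the same recurrence to produce a whole sequence. The helper name `lcg` and the toy parameters (a = 5, c = 3, m = 16) are assumptions for illustration. The toy modulus is small enough to watch the sequence repeat, which shows why such a generator is only pseudo-random.

```r
# Generate n successive values of X_{n+1} = (a * X_n + c) mod m
lcg <- function(n, seed, a, c, m) {
  out <- numeric(n)
  x <- seed
  for (i in seq_len(n)) {
    x <- (a * x + c) %% m
    out[i] <- x
  }
  out
}

# Toy generator: visits all 16 states, then repeats from the start
toy <- lcg(17, seed = 1, a = 5, c = 3, m = 16)
toy

# Same constants as above, rescaled to [0, 1) like runif()
u <- lcg(5, seed = 1001, a = 1664525, c = 1013904223, m = 2^32) / 2^32
u
```

With these constants a * x + c stays below 2^53, so the double-precision arithmetic is exact. A larger multiplier would need integer arithmetic with more bits.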


# Random numbers are reproducible!

set.seed(1)
runif(10)

set.seed(1)
runif(10)

# Very useful when required to make a reproducible
# example that involves randomness
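
The same idea as a programmatic check rather than a comparison by eye: storing the two runs and testing them with `identical()`. The variable names are illustrative.

```r
# Same seed, same sequence
set.seed(1)
first <- runif(10)

set.seed(1)
second <- runif(10)

identical(first, second)

# A different seed gives a different sequence
set.seed(2)
third <- runif(10)

identical(first, third)
```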




True randomness
                   Atmospheric radio noise: http://www.random.org.
                   Use from R with the random package.
                   Not really important unless you’re running
                   a lottery. (With a pseudo-random generator, someone
                   who observes a long enough sequence can predict
                   the next value.)

