Chapter 9 Sampling Distributions
9.1 Sampling Distributions
Definitions: Parameter
  • The value of a characteristic for the entire population
  • Attained through a census
  • In practice, it is usually an unknown or estimated value
Definitions: Statistic
  • The value of a characteristic for a sample drawn from the population
  • Attained through sampling
  • In practice, the value of a statistic is used to estimate the parameter
Sampling Variability
  • Random samples will produce different values for a statistic
  • A statistic usually does not equal the parameter exactly
  • Different samples produce different values (all of which should be “close” to the parameter)
  • This fact is known as sampling variability: the value of a statistic for the same parameter varies in repeated sampling
Parameters and Statistics
  Parameter               Statistic
  Mean of a population    Mean of a sample
  Prop. of a population   Prop. of a sample
Sampling Distribution
  • All possible samples of size n are taken from a population of size N, and the statistic is computed for each one
  • A histogram of these sample statistics is created
  • This distribution is called the “sampling distribution”
  • In practice, the sampling distribution is theorized, but never “created”
Creating a Sampling Distribution
  Let’s look at a population of N = 5 people who answered ‘yes’ or ‘no’ to the question “Do you like toast?” We want to know the proportion who say ‘yes’. Here are the responses:
  ID   Response
  01   Yes
  02   No
  03   Yes
  04   No
  05   Yes
Creating a Sampling Distribution
  Let’s look at each possible sample and its p-hat for sample size n = 3 (p-hat rounded to two decimals):
  Sample #   IDs in sample   p-hat
   1         01, 02, 03      0.67
   2         01, 02, 04      0.33
   3         01, 02, 05      0.67
   4         01, 03, 04      0.67
   5         01, 03, 05      1.00
   6         01, 04, 05      0.67
   7         02, 03, 04      0.67
   8         02, 03, 05      0.67
   9         02, 04, 05      0.33
  10         03, 04, 05      0.67
  You can imagine that this quickly gets labor intensive!
Creating a Sampling Distribution
  Create a histogram:
  Class       Count
  0.00-0.24   0
  0.25-0.49   3
  0.50-0.74   6
  0.75-1.00   1
  Notice that p = 0.6, and the mean of this distribution is also 0.6.
  [Figure: histogram of the ten p-hat values, count (0 to 7) against p-hat (0 to 1)]
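The toast example is small enough to enumerate by machine. The following is a minimal sketch, not part of the original slides; the variable names are illustrative, and 'Yes' and 'No' are coded as 1 and 0. It lists every sample of size 3, tallies the p-hat values, and compares the mean of the sampling distribution with p.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Responses from the toast example, coded 1 = "Yes", 0 = "No"
responses = {"01": 1, "02": 0, "03": 1, "04": 0, "05": 1}

# Population proportion p, kept as an exact fraction
p = Fraction(sum(responses.values()), len(responses))

# Every possible sample of size n = 3, with the p-hat of each
n = 3
p_hats = [
    Fraction(sum(responses[i] for i in ids), n)
    for ids in combinations(sorted(responses), n)
]

# Tally the sampling distribution and compute its mean
counts = Counter(p_hats)
mean_p_hat = sum(p_hats) / len(p_hats)

print("p =", p)
for value in sorted(counts):
    print(f"p-hat = {float(value):.2f}: {counts[value]} sample(s)")
print("mean of the sampling distribution =", mean_p_hat)
```

Using exact fractions avoids rounding, so the mean of the ten p-hat values comes out equal to p rather than merely close to it.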
Describing Sampling Distributions
  Like most one-variable data, we describe:
  • Center
  • Shape
  • Spread
  • Unusual features/outliers
  If you are using a sample to estimate a parameter, ask of the sampling distribution:
  • Where should the center be?
  • What about the “ideal shape?”
  • What would you like the spread to be?
  • Would outliers be helpful?
Sampling Distribution and Bias
  • When a statistic is unbiased, the mean of the sampling distribution is the value of the parameter
  • This is actually a pretty powerful statement. In order to find the value of the parameter, you just need to take a lot of samples! (Wait, that’s not good either.)
  • Revision: if a statistic is unbiased, then “chances are” the value from any one sample should be close to the value of the parameter
  • Statistics that are unbiased are called “unbiased estimators” (these are good)
Variability of a Statistic
  • The spread of a sampling distribution is known as the variability of the statistic
  • Larger sample size = less variability
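A quick simulation can make "larger sample size = less variability" concrete. This sketch is an illustration rather than anything from the slides: it assumes a population proportion of 0.6, draws each response independently (which corresponds to a very large population), and uses a fixed seed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.6                 # assumed population proportion (illustrative)

def simulate_p_hats(n, reps=2000):
    """Draw `reps` independent samples of size n and return each p-hat."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Spread of the simulated sampling distribution for three sample sizes
spreads = {n: statistics.pstdev(simulate_p_hats(n)) for n in (10, 40, 160)}

for n, sd in spreads.items():
    print(f"n = {n:3d}: std dev of p-hat is about {sd:.3f}")
```

Each fourfold increase in n should roughly halve the spread of the simulated p-hat values.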
The Enemies of Sampling
  • Enemy #1: Bias
  • Enemy #2: Variability
  A visual of the difference:
The Enemies of Sampling Another look with Histograms:
9.2 Sample Proportions
Sampling Distribution for Proportions
  For each sample, calculate p-hat. The sampling distribution of p-hat will have:
  • Mean = p (the parameter)
  • Standard deviation: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
Sampling Distribution for Proportions
  • Notice that p-hat is an unbiased estimator!
  • The standard deviation decreases as the sample size increases
  • The std. dev. is inversely proportional to the square root of the sample size
  • Ex. If we want ½ the std. dev., we need 4x the sample size
  • Ex. If we want 1/3 the std. dev., we need 9x the sample size
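The two examples above can be checked numerically. This is a minimal sketch that assumes the usual formula for the standard deviation of p-hat, sqrt(p(1 - p)/n); the function name `sd_p_hat` and the values p = 0.35 and n = 100 are illustrative choices, not taken from the slides.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of p-hat, assuming the formula sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.35                      # illustrative proportion
base = sd_p_hat(p, 100)       # baseline sample size of 100

# Quadrupling n should halve the std dev; 9x n should cut it to a third
ratio_4x = sd_p_hat(p, 400) / base
ratio_9x = sd_p_hat(p, 900) / base

print(f"4x the sample size -> {ratio_4x:.3f} of the original std dev")
print(f"9x the sample size -> {ratio_9x:.3f} of the original std dev")
```

The ratios do not depend on the value of p, because p(1 - p) cancels when the two standard deviations are divided.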
Sampling Distribution for Proportions
  We will (almost) always use the Normal approximation for the sampling distribution of p-hat. This means we will need some conditions:
  • We want “N > 10n”. This ensures our std. dev. formula holds
  • np > 10 and nq > 10, where q = 1 − p. This ensures our sampling distribution is approximately Normal
Samp Dist for Prop. (Example)
  We are sampling from a large population. Our sample size is 1500. We know that p = 0.35. What is the probability that our sample proportion is more than 2 percentage points (0.02) from the parameter?
Samp Dist for Prop. (Example)
  To summarize the problem, we are trying to find out what proportion of samples have a p-hat greater than 0.37 or less than 0.33. It will be easier to use the complement rule and find “1 – P(0.33 < p-hat < 0.37)”
Samp Dist for Prop. (Example)
  Can we use a Normal approximation for this problem? Let’s check the conditions:
  1. Although we are not told the exact population size N, we are told the population is large.
     “We are told the population is large, so N > 10(1500)”
  Tip: when a problem says the population is large, you are to interpret that as meaning the population is greater than 10n
Samp Dist for Prop. (Example)
  Can we use a Normal approximation for this problem? Let’s check the conditions:
  2. np = 1500(0.35) = 525 > 10
     nq = 1500(0.65) = 975 > 10
     “Since np = 525 > 10 and nq = 975 > 10 and N > 10(1500), we can use the Normal distribution”
  Note: It is extremely important that you state and justify the use of the Normal distribution.
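The condition checks for this example can be written as a small helper. This is a sketch under the thresholds stated in these slides (np > 10, nq > 10, N > 10n); the function name `normal_conditions` and its return format are hypothetical. Because the problem only says the population is "large", N is left optional and that check is skipped when N is not supplied.

```python
def normal_conditions(n, p, N=None):
    """Return each condition from the slides with whether it is met."""
    q = 1 - p
    checks = {
        "np > 10": n * p > 10,
        "nq > 10": n * q > 10,
    }
    # The population-size check is only possible when N is known
    if N is not None:
        checks["N > 10n"] = N > 10 * n
    return checks

# The example from the slides: n = 1500, p = 0.35, population "large"
example = normal_conditions(n=1500, p=0.35)
for name, ok in example.items():
    print(f"{name}: {'met' if ok else 'NOT met'}")
```

A small sample such as `normal_conditions(n=20, p=0.1)` fails the np check, which is the situation where the Normal approximation should not be used.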
Samp Dist for Prop. (Example)
  Time for a graph (before normalization). Remember, you don’t have to be too fancy here!
Samp Dist for Prop. (Example) Let’s Normalize!
Samp Dist for Prop. (Example) Now the normalized graph
Samp Dist for Prop. (Example) Compute the area
Samp Dist for Prop. (Example) Finish the normalized graph
Samp Dist for Prop. (Example)
  Summary: “The probability that a sample proportion (n = 1500) is more than 2 percentage points from the parameter is 0.1032”
  Notes: remember that in this context, probability is the same as proportion, and proportion is the same as area. Actually, you’ve done many of these kinds of problems already, right?
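The whole example can be reproduced in a few lines. This sketch assumes the standard deviation formula sqrt(p(1 - p)/n) and uses `statistics.NormalDist` from the Python standard library in place of a z-table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.35, 1500
margin = 0.02  # "more than 2 percentage points from the parameter"

# Standard deviation of p-hat, assuming sqrt(p(1 - p) / n)
sd = math.sqrt(p * (1 - p) / n)

# Standardize the upper boundary 0.37; the lower boundary 0.33 gives -z
z = margin / sd

# Complement rule: 1 - P(0.33 < p-hat < 0.37)
standard_normal = NormalDist()
inside = standard_normal.cdf(z) - standard_normal.cdf(-z)
prob_outside = 1 - inside

print(f"std dev of p-hat = {sd:.5f}")
print(f"z = {z:.3f}")
print(f"P(p-hat more than 0.02 from p) = {prob_outside:.4f}")
```

Computed without intermediate rounding, the probability is about 0.104. The 0.1032 on the slide corresponds to rounding z to 1.63 and reading a table, since 2(1 − 0.9484) = 0.1032.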
9.3 Sample Means
Samples vs. Census
  Histogram for returns on common stocks in 1987:
  Histogram for 5-stock portfolios in 1987:
Samples vs. Census
  We can see from the previous slide that the distribution of the samples (portfolios):
  • Is less variable than the census
  • Is more Normal than the census
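Sampling Distribution for Means
  Suppose we take samples of size n from a large population with mean $\mu$ and standard deviation $\sigma$.
  • The mean of the sampling distribution is the mean of the population: $\mu_{\bar{x}} = \mu$
  • The std. dev. of the sampling distribution is given by: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$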
Sampling Distribution for Means
  • The sample mean is an unbiased estimator of the population mean
  • Like for proportions, the std. dev. is inversely proportional to the square root of the sample size
  • Like for proportions, we need N > 10n for our std. dev. formula to hold up
  • These formulas for the mean and std. dev. hold even if the population is not Normal!
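The Central Limit Theorem
  An SRS of size n from any population with mean $\mu$ and finite standard deviation $\sigma$ will produce a sampling distribution of the sample mean that is approximately $N(\mu, \sigma/\sqrt{n})$ whenever n is large enough.
  Caution: in this chapter the CLT is invoked only for means. For proportions, justify Normality with the np > 10 and nq > 10 conditions rather than the CLT.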
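A simulation can illustrate the theorem with a population that is clearly not Normal. This is an illustrative sketch, not from the slides: it assumes an exponential population with rate 1 (so both the mean and the standard deviation equal 1), a sample size of 40, and a fixed seed for repeatability.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
mu = sigma = 1.0         # exponential with rate 1: mean 1, std dev 1
n, reps = 40, 5000       # illustrative sample size and number of samples

# Sample mean of each of `reps` samples from the skewed population
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

theory_sd = sigma / math.sqrt(n)
observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)

# For a Normal distribution, roughly 68% of values lie within one std dev
within_one_sd = sum(abs(m - mu) <= theory_sd for m in means) / reps

print(f"mean of sample means = {observed_mean:.3f} (theory {mu})")
print(f"std dev of sample means = {observed_sd:.3f} (theory {theory_sd:.3f})")
print(f"share within one std dev of mu = {within_one_sd:.3f}")
```

Even though individual exponential values are strongly right-skewed, the simulated sample means should centre on 1 with a spread near 1/√40, and the share within one standard deviation should sit near the Normal value of about 68%.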
The Central Limit Theorem
  Why we use the CLT:
  • From the previous section, we saw that we use the Normal distribution to gauge the probability of producing samples
  • We invoke the CLT to justify usage of the Normal distribution
  • Using the Normal distribution without justification is a “no-no”
The Central Limit Theorem
  When to use the CLT:
  • We are working with the sampling distribution for a mean ($\bar{x}$)
  • We need to Normalize the sample mean
  • The sample is described as “large” (generally, n > 30)
  • The raw data is not given
Stats chapter 9
