Module 7
Interval estimators
Master for Business Statistics
Dane McGuckian
Topics
7.1 Interval Estimate of the Population Mean with a Known
Population Standard Deviation
7.2 Sample Size Requirements for Estimating the Population
Mean
7.3 Interval Estimate of the Population Mean with an Unknown
Population Standard Deviation
7.4 Interval Estimate of the Population Proportion
7.5 Sample Size Requirements for Estimating the Population
Proportion
7.1
Interval Estimate of the Population Mean with a Known
Population Standard Deviation
Interval Estimators
Quantities like the sample mean and the sample standard
deviation are called point estimators because they are single
values derived from sample data that are used to estimate the
value of an unknown population parameter.
The point estimators used in Statistics have some very desirable
traits; however, they do not come with a measure of certainty.
In other words, there is no way to determine how close the
population parameter is to a value of our point estimate. For
this reason, the interval estimator was developed.
An interval estimator is a range of values derived from sample
data that has a certain probability of containing the population
parameter.
This probability is usually referred to as confidence, and it is
the main advantage that interval estimators have over point
estimators.
The confidence level for a confidence interval tells us the
likelihood that a given interval will contain the target parameter
we are trying to estimate.
The Meaning of “Confidence Level”
Interval estimates come with a level of confidence.
The level of confidence is specified by its confidence
coefficient – it is the probability (relative frequency) that an
interval estimator will enclose the target parameter when the
estimator is used repeatedly a very large number of times.
The most common confidence levels are 99%, 98%, 95%, and
90%.
Example: A manufacturer takes a random sample of 40
computer chips from its production line to construct a 95%
confidence interval to estimate the true average lifetime of the
chip. If the manufacturer formed confidence intervals for every
possible sample of 40 chips, 95% of those intervals would
contain the population average.
The Meaning of “Confidence Level”
In the previous example, it is important to note that once the
manufacturer has constructed a 95% confidence interval, it is no
longer acceptable to state that there is a 95% chance that the
interval contains the true average lifetime of the computer chip.
Prior to constructing the interval, there was a 95% chance that
the random interval limits would contain the true average, but
once the process of collecting the sample and constructing the
interval is complete, the resulting interval either does or does
not contain the true average.
Thus there is a probability of 1 or 0 that the true average is
contained within the interval, not a 0.95 probability.
The interval limits are random variables because their values
depend upon the results of a random sample of data.
However, once they are calculated from a particular sample,
those limits are no longer random variables – they become fixed
constants, so speaking about their probability in terms of the
confidence level is no longer valid.
The Meaning of “Confidence Level”
In the diagram, there are 8 different
confidence intervals represented. Each
confidence interval was constructed using
a sample of size n, drawn from the same
population, and all of the intervals have a
95% level of confidence. The vertical line
in the diagram indicates where the popula-
tion mean is located.
7 of the intervals capture the population mean, but the second
interval does not.
If we looked at a very large number of these intervals,
approximately 5% (100% - 95%) of them would fail to include
the mean.
All of the others would contain the mean as expected.
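The long-run meaning of the confidence level can be illustrated with a small simulation. This is only a sketch: the population mean of 50, standard deviation of 10, and the number of trials are assumed values chosen for illustration, not figures from the slides.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu=50.0, sigma=10.0, n=40, conf=0.95, trials=10_000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # critical value z_{alpha/2}
    margin = z * sigma / n ** 0.5                 # margin of error E
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the mean: {rate:.3f}")
```

The printed share should land close to 0.95, while any single interval in the loop either contains the mean or does not.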
Confidence Interval for Estimating a Population Mean (with
Sigma Known)
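The formula for the confidence interval to estimate the mean
consists of two values: a lower limit and an upper limit.
Confidence Interval for $\mu$:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
where n = the sample size
σ = the population standard deviation
$\bar{x}$ = the sample mean
$z_{\alpha/2}$ = the z-score separating an area of $\alpha/2$ in the upper tail of the
standard normal curve (with $\alpha$ = 1 minus the confidence level)
The formula is often expressed as:
Lower Limit: $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Upper Limit: $\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$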
Confidence Interval for Estimating a Population Mean (with
Sigma Known)
However, the most common way to express the formula for the
confidence interval to estimate the mean is
$\bar{x} \pm E$
where $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the margin of error.
The Margin of Error
In the formula for the confidence interval to estimate the
population mean (with σ known), there is a quantity called the
margin of error.
The margin of error is the maximum likely difference observed
between the sample mean $\bar{x}$ and the population mean $\mu$, and it is
denoted by E.
The margin of error for the confidence interval to estimate the
mean is given by the following formula:
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
where n = the sample size
σ = the population standard deviation
$z_{\alpha/2}$ = the z-score separating an area of $\alpha/2$ in the upper tail of the
standard normal curve
The Margin of Error
The margin of error is what determines the width of the
confidence interval.
The width of a confidence interval is given by:
$\text{Width} = \text{Upper Limit} - \text{Lower Limit} = 2E$
When estimating the mean using a confidence interval, the
smaller the margin of error, the better.
Since the confidence interval is designed to contain the mean, a
narrow interval gives us a better idea of where the mean is
located.
The Importance of a Known Population Standard Deviation for
a Confidence Interval
The population standard deviation for a random variable is part
of the margin of error formula used to estimate the population
mean of that random variable.
The reason for this is that the population standard deviation is
needed to determine the precise standard error of the sample
mean.
If we do not know the precise standard error of the sample
mean, we cannot guarantee the level of confidence specified for
the interval.
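The standard error for $\bar{x}$: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The margin of error used to estimate $\mu$: $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
The critical value is determined by assuming the distribution of
the sample mean is normally distributed with an unknown mean
of $\mu$ and a known standard error of $\frac{\sigma}{\sqrt{n}}$.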
The Importance of a Known Population Standard Deviation for
a Confidence Interval
If we do not know σ, then we cannot know the standard error of
the sample mean.
This would prevent us from stating an accurate confidence level
for our interval estimate.
For this reason, we should know the population standard
deviation when using the following confidence interval formula
to estimate $\mu$:
Formula to estimate $\mu$ (with σ known): $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
How to Find Critical Z Values
Find the critical Z value needed to construct a 90% confidence
interval: $z_{\alpha/2} = z_{0.05}$.
From the normal table, looking up the probability value of 0.4500,
we get $z_{0.05} = 1.645$ because 0.4500 falls between the table
entries for 1.64 and 1.65.
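As an alternative to the printed normal table, the critical value can be computed directly. The sketch below uses `NormalDist.inv_cdf` from Python's standard library; the function name `z_critical` is a hypothetical helper introduced here for illustration.

```python
from statistics import NormalDist

def z_critical(conf_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - conf_level
    # The critical value leaves an area of alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.98, 0.99):
    print(f"CL = {cl:.2f}: z = {z_critical(cl):.3f}")
```

Table-based answers are usually rounded (for example 1.645 or 1.96), so small differences in the third decimal place are expected.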
The Margin of Error when Sigma is Known
The mean quarterly earnings per share for a sample of 36 stocks
is $10.52, and the population standard deviation is $2.50.
Calculate the margin of error that would be used when
estimating the population mean with a confidence level of 95%.
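The margin of error is given by: $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Here n = 36, σ = 2.50, CL = 0.95, so $z_{\alpha/2} = z_{0.025} = 1.96$.
Then $E = 1.96 \cdot \frac{2.50}{\sqrt{36}} \approx 0.82$ (dollars).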
The Steps to Create a Confidence Interval when Sigma is
Known
Step 1: Gather the sample size, sample mean, population
standard deviation, and a confidence level
Step 2: Find the Z critical value
Step 3: Calculate the margin of error
Step 4: Calculate the confidence interval
The Confidence Interval for the Mean when Sigma is Known
A student wants to estimate the average amount of time it takes
to commute to campus from her apartment. For a random
selection of 32 days, she times her commute. The average
commute time for those days was 15.3 minutes. Assume the
population standard deviation is 3.5 minutes and form a 98%
confidence interval to estimate the true mean commute time for
the student.
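Here n = 32, $\bar{x} = 15.3$, σ = 3.5, CL = 0.98, so $z_{\alpha/2} = z_{0.01} = 2.33$.
The margin of error is given by:
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.33 \cdot \frac{3.5}{\sqrt{32}} \approx 1.44$
The 98% Confidence interval is:
$\bar{x} \pm E = 15.3 \pm 1.44 = (13.86, 16.74)$
So we are 98% confident that the true mean is between 13.86
minutes and 16.74 minutes.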
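The four steps for the sigma-known interval can be sketched in a few lines of Python. The function name `z_interval` is an assumption made for this illustration; it computes the critical value exactly rather than reading a rounded value from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf_level):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # Step 2: critical value
    margin = z * sigma / sqrt(n)                         # Step 3: margin of error
    return x_bar - margin, x_bar + margin                # Step 4: interval limits

# Step 1: the commute example (n = 32, mean 15.3, sigma 3.5, 98% confidence)
lower, upper = z_interval(x_bar=15.3, sigma=3.5, n=32, conf_level=0.98)
print(f"({lower:.2f}, {upper:.2f})")
```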
A Confidence Interval
A logistics company claims the true average price for a gallon
of regular, unleaded gas is $3.35. A researcher has recently used
sample data to form a 98% confidence interval estimate of the
true average price of regular, unleaded gas. The interval is
given by $3.26 ± $0.06. Do the results contradict the logistics
company’s claim?
The 98% confidence interval in this case is ($3.26 - $0.06,
$3.26 + $0.06) = ($3.20, $3.32), which means we are 98%
confident that the true average price per gallon of regular, unleaded gas
is between $3.20 and $3.32.
Since $3.35 lies outside this interval, the researcher’s results
do contradict the logistics company’s claim.
On the other hand, if the company had claimed that the average
price per gallon of unleaded gas was $3.31, this would have
been included in the interval and the results would not have
contradicted the company’s claim.
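Checking a claimed value against an interval reduces to a simple comparison. A minimal sketch, where `claim_is_plausible` is a hypothetical helper name:

```python
def claim_is_plausible(claimed_value, estimate, margin):
    """True if the claimed value falls inside estimate ± margin."""
    return estimate - margin <= claimed_value <= estimate + margin

# Gas price example: the interval is 3.26 ± 0.06
print(claim_is_plausible(3.35, estimate=3.26, margin=0.06))  # claim of $3.35
print(claim_is_plausible(3.31, estimate=3.26, margin=0.06))  # claim of $3.31
```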
The Factors Affecting the Margin of Error or Width of
Confidence Intervals
The Margin of Error determines how wide our confidence
interval will be.
We do not want wide confidence intervals because narrow
intervals give us a better idea of where the mean lies on the
number line.
There are two ways to reduce the error in a confidence interval:
(1) decrease the confidence level, or (2) increase the sample
size.
Increasing the sample size is not always possible because of
costs and implementation considerations.
The population standard deviation is given with the data and
cannot be changed.
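The effect of the confidence level and the sample size on the margin of error can be tabulated quickly. In this sketch the value σ = 2.50 is borrowed from the earnings-per-share example, and the sample sizes 36 and 144 are assumed for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, conf_level):
    """E = z_{alpha/2} * sigma / sqrt(n) for a known population sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * sigma / sqrt(n)

sigma = 2.50  # population standard deviation (fixed; cannot be changed)
for conf_level in (0.99, 0.95, 0.90):
    for n in (36, 144):
        e = margin_of_error(sigma, n, conf_level)
        print(f"CL = {conf_level:.2f}, n = {n:3d}: E = {e:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.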
7.2
Sample Size Requirements for Estimating the Population Mean
The Formula for Determining the Sample Size Needed to
Estimate the Population Mean
We can use the formula for the Margin of Error to derive a
formula that will tell us the sample size needed to produce a
confidence interval that has a particular margin of error and a
desired confidence level. That formula is:
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
The Special Rounding Rule for Sample Size Calculations
Remember that n represents the number of subjects that were
measured or surveyed in our study, so it cannot be a decimal or
a fraction.
When a decimal occurs, we always want to round up because
rounding up will produce less error in our confidence interval
while rounding down would produce more error.
Determining the Sample Size Needed to Estimate the Population
Mean
A stockbroker on Wall Street wants to estimate the average
daily-high price for a stock. What sample size is necessary to
form a 99% confidence interval to estimate the mean daily-high
within 0.50 dollars? Assume the population standard deviation
is known to be 4.59 dollars.
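Here CL = 0.99, so $z_{\alpha/2} = z_{0.005} = 2.576$, σ = 4.59, and E = 0.50.
So the sample size n is:
$n = \left(\frac{2.576 \cdot 4.59}{0.50}\right)^2 \approx 559.2$, which becomes 560 by rounding up.
Thus the sample size necessary to form a 99% confidence
interval is 560.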
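The sample size formula and the round-up rule can be sketched as follows. The function name `sample_size_for_mean` is an assumption for this illustration, and the critical value is computed exactly rather than taken from a table.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, conf_level):
    """Smallest n whose margin of error does not exceed `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((z * sigma / margin) ** 2)  # always round up, never down

# Stockbroker example: sigma = 4.59, E = 0.50, 99% confidence
n = sample_size_for_mean(sigma=4.59, margin=0.50, conf_level=0.99)
print(n)
```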
7.3
Interval Estimate of the Population Mean with an Unknown
Population Standard Deviation
Estimating the Mean when the Population Standard Deviation is
Unknown
Often we do not know the population standard deviation (σ)
when attempting to estimate the population mean using a
confidence interval.
When the population standard deviation is unknown, we must
use the sample standard deviation (s) as a substitute.
However, since the sample standard deviation is not the same as
the true population standard deviation, we cannot use the z
distribution to construct the confidence interval.
When the population standard deviation is unknown, we use the
t distribution to form our interval estimate of the mean.
Like the z distribution, the t distribution is a bell-shaped
distribution, so we will still need to assume the sample mean
has a normal (or approximately normal) distribution to use the t
distribution.
Estimating the Mean when the Population Standard Deviation is
Unknown
When do we use the t distribution to estimate the population
mean?
When the population standard deviation (σ) is unknown, and we
can assume the distribution of the sample mean is
approximately normally distributed.
The Similarities Between the t and z Distribution
It should be stated at the outset that there is not just one t
distribution, but a family of t distributions.
For every different sample size n (and its degrees of freedom, n – 1), there is
a slightly different corresponding t distribution.
These infinitely many t distributions are defined by their
specific degrees of freedom (n – 1).
The family of t distributions is similar to the standard normal
(z) distribution in several important ways.
The most basic similarity between the t distributions and the
standard normal distribution is the fact that they are continuous
distributions.
The shapes of the curves are also similar – both distributions
are symmetric and mound-shaped (i.e., bell-shaped).
The family of t distribution curves and the standard normal
curve have the same mean – that mean is zero.
The Similarities Between the t and z Distributions
The diagram contains the graph of the standard normal
distribution and the t distribution for a sample size of 12.
The Differences that Exist Between the t and z Curves
The family of t distributions and the standard normal
distribution (z) are similar in three ways: (1) both are continuous
distributions, (2) both are bell-shaped distributions, and (3)
both have a mean of zero.
The differences that exist between the curves all stem from the
fact that they have different standard deviations.
The standard deviation for the standard normal curve is 1,
whereas for the family of t distributions, the standard deviation
varies, but it is always greater than 1.
For every different sample size n (and degree of freedom n – 1),
there is a slightly different standard deviation for the
corresponding t distribution.
This is the only thing that differentiates the otherwise identical
t curves from each other.
The Differences that Exist Between the t and z Curves
Since all probability distributions must have a total area of one,
the different standard deviations affect the overall shape of the
curves in a predictable way.
Curves with greater variation (a higher standard deviation) will
be flatter on top and more spread out.
This means that there is more area in the tails and less in the
center of the distribution.
When the curve has less variation (a smaller standard
deviation), it will have more area in the center and smaller tail
areas.
The Differences that Exist Between the t and z Curves
The closer the standard deviation is to 1, the more the t
distribution will look like the z distribution.
For the family of t distributions, there is an inverse relationship
between sample size (degrees of freedom) and standard
deviation.
As the sample size increases, the corresponding t distributions
have smaller and smaller standard deviations.
This implies that as n increases, the t distributions become more
and more like the standard normal distribution.
The Differences that Exist Between the t and z Curves
In the diagram, the two t distributions are graphed along with
the standard normal curve. In comparison to the standard normal
curve, you can see that the two t curves are thicker (i.e., have
more density) in the tails and have less area at the center. The
smaller the sample size the more pronounced these differences
are.
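The shrinking standard deviation can be checked numerically. The sketch below relies on a standard result that the slides do not state: for more than 2 degrees of freedom, the standard deviation of a t distribution is $\sqrt{df/(df-2)}$. The sample sizes listed are assumed values for illustration.

```python
from math import sqrt

def t_std_dev(df):
    """Standard deviation of a t distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation is finite only for df > 2")
    return sqrt(df / (df - 2))

for n in (5, 12, 30, 100, 1000):
    df = n - 1
    print(f"n = {n:4d}, df = {df:3d}: sd = {t_std_dev(df):.4f}")
```

The values stay above 1 and approach 1 as n grows, which is why the t curves look more and more like the z curve.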
Confidence Interval for Estimating a Population Mean (with
Sigma Unknown)
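The formula for the confidence interval to estimate the mean
consists of two values: a lower limit and an upper limit.
Confidence Interval for $\mu$:
$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$
where n = the sample size
s = the sample standard deviation
$\bar{x}$ = the sample mean
$t_{\alpha/2}$ = the t score separating an area of $\alpha/2$ in the upper tail of the t
distribution with n – 1 degrees of freedom
Lower Limit: $\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}}$
Upper Limit: $\bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$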
Confidence Interval for Estimating a Population Mean (with
Sigma Unknown)
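The formula is often expressed as:
$\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
However, the most common way to express the formula for the
confidence interval to estimate the mean is
$\bar{x} \pm E$, where
$E = t_{\alpha/2}\frac{s}{\sqrt{n}}$ is called the margin of error.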
Find Critical t Values
Assuming the population is approximately normal and sigma is
unknown, find the appropriate critical value for a 90%
confidence interval with a sample size of 20.
Since the population is approximately normal, sample size is
small and sigma is unknown, we need to use the t distribution
and hence calculate the t critical value. So here:
CL = 0.90
n = 20
df = n – 1 = 19
So the critical value is: $t_{\alpha/2} = t_{0.05} = 1.729$ (from the t-table on the next slide)
Find Critical t Values: t Table
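As an alternative to a printed t table, a critical t value can be approximated numerically. The sketch below is not a library routine: it integrates the t density with Simpson's rule and searches for the critical value by bisection, using only the standard library. The function names and the search bracket of 0 to 100 are assumptions made for illustration.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf_level, df):
    """t_{alpha/2}: the value with an area of alpha/2 in the upper tail."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0  # assumed bracket, wide enough for typical table entries
    for _ in range(60):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_critical(0.90, 19):.3f}")  # 90% confidence, n = 20, df = 19
```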
Form the Margin of Error when Sigma is Unknown
Find the margin of error for a 98% confidence interval estimate
of the population mean when sigma is unknown. The sample
size is 15. The sample standard deviation is 20.1, and the data appear to
be normally distributed.
Here:
CL = 0.98; n = 15; df = n – 1 = 14; s = 20.1.
So the critical value is: $t_{\alpha/2} = t_{0.01} = 2.624$.
So the margin of error is:
$E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.624 \cdot \frac{20.1}{\sqrt{15}} \approx 13.62$
The Steps to Create a Confidence Interval (Sigma is Unknown)
The following are the steps to create a confidence interval when
sigma is unknown:
Step 1: Gather the sample data for the problem, which will include n, $\bar{x}$, s, and
the confidence level.
Step 2: Find the critical value $t_{\alpha/2}$.
Step 3: Calculate the margin of error (E).
Step 4: Form the interval by subtracting the margin of error from the
sample mean and adding the margin of error to the sample mean.
Construct a Confidence Interval when Sigma is Unknown
A waiter wants to know the average amount of time it takes a
table of guests in his section of the restaurant to “turn” (sit,
order, eat, pay, and leave). He times a random selection of 25
tables over several busy nights. For those tables, the average
time to turn was 42.1 minutes. The sample standard deviation
was 4.7 minutes. Assume the turn times are normally
distributed, and form a 90% confidence interval for the true
mean time to turn a table in this waiter’s section of the
restaurant.
Since the population standard deviation is not given and the
distribution of turn times is given to be normal, we use the t
distribution.
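CL = 0.90; n = 25; df = n – 1 = 24; s = 4.7. So
$t_{\alpha/2} = t_{0.05} = 1.711$, and the margin of error is $E = 1.711 \cdot \frac{4.7}{\sqrt{25}} \approx 1.61$.
So the 90% confidence interval for the true mean time to turn is
given by $42.1 \pm 1.61 = (40.49, 43.71)$ minutes.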
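The sigma-unknown steps can be sketched in Python for the waiter example. The function name `t_interval` is hypothetical, and the critical value 1.711 is the standard t-table entry for df = 24 with an upper-tail area of 0.05, passed in by hand rather than computed.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the critical value t_{alpha/2} for n - 1 degrees of freedom,
    looked up by the caller (for example, from a t table).
    """
    margin = t_crit * s / sqrt(n)  # margin of error E
    return x_bar - margin, x_bar + margin

# Waiter example: n = 25, mean 42.1, s = 4.7, 90% confidence, df = 24
lower, upper = t_interval(x_bar=42.1, s=4.7, n=25, t_crit=1.711)
print(f"({lower:.2f}, {upper:.2f})")
```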
7.4
Interval Estimate of the Population Proportion
Population Proportion
The term proportion refers to the fraction, ratio or percent of
the population having a particular trait of interest.
The symbols for population proportion and sample proportion
are ρ (rho) and $\hat{p}$ (p-hat), respectively.
Examples of Population Proportions:
In the United States of America, 16.7% of all babies born have
blue eyes
In 2013, 31.7% of the U.S. population, aged 25 or older, held a
bachelor’s degree or higher.
85% of 18 to 24 year olds, who were raised by at least one
parent having a bachelor’s degree or higher, will attend college.
The Sample Proportion
To calculate the proportion of a sample that has some trait of
interest, we divide the number of subjects (or items) that have
the trait by the number of subjects (or items) belonging to the
sample.
Formula for the sample proportion ($\hat{p}$):
$\hat{p} = \frac{x}{n}$
where
x = the number of subjects (or items) having the trait of interest
n = total number of subjects (or items) sampled
The Sample Proportion
For example, consider the survey results below:
Number of Survey Participants: 215
Number who earned an A in Business Statistics: 20
The proportion of students reporting that they earned an A in
Business Statistics is given by:
$\hat{p} = \frac{x}{n} = \frac{20}{215} \approx 0.093$
The Sampling Distribution of the Sample Proportion
Recall that if we randomly select n subjects and x of them have
some trait we are interested in, the sample proportion formed
from the data is:
$\hat{p} = \frac{x}{n}$
where x = the number of subjects having the trait we are
interested in.
We use $\hat{p}$ as a point estimate of the population proportion (ρ).
For different samples of size n, a different number (x) of
subjects will have the trait of interest.
This means the value of $\hat{p}$ will vary from sample to sample.
If we want to use it to form an interval estimate of the true
population proportion (ρ), it is important that we know the
sampling distribution of $\hat{p}$.
The Sampling Distribution of the Sample Proportion
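The sampling distribution of $\hat{p}$
is approximately normally distributed.
The expected value (mean) for $\hat{p}$ is the population proportion (ρ).
The standard error for $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\frac{\rho(1-\rho)}{n}}$.
$\hat{p}$ can be assumed to be approximately normally distributed when
both $n\rho \geq 5$ and $n(1-\rho) \geq 5$.
We can approximate the standard error of $\hat{p}$ as $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.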
The Sample Size Requirement for Estimating the Population
Proportion
When constructing a confidence interval to estimate the
population proportion, we can assume $\hat{p}$ is approximately
normally distributed if both $n\hat{p} \geq 5$ and $n(1-\hat{p}) \geq 5$.
Example: A large corporation wants to estimate the proportion
of its part-time employees that would enroll in the company
health insurance plan, if it were made available to them. A
survey of 500 randomly selected part-time employees reveals
that 285 of them would enroll in the plan.
In this example, the sample size is 500 and the number of
employees interested in enrolling in the plan is 285. Using these
quantities, we can calculate the sample proportion: $\hat{p} = \frac{285}{500} = 0.57$.
The Sample Size Requirement for Estimating the Population
Proportion
Using the sample proportion as an estimate for the population
proportion (ρ), we can check the sample size requirement to
ensure the sampling distribution of the sample proportion is
approximately normal.
$n\hat{p} = 500(0.57) = 285$ and $n(1-\hat{p}) = 500(0.43) = 215$
Since both of the results above are at least 5, it is appropriate to
assume the sampling distribution of the sample proportion is
approximately normally distributed.
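The sample size requirement is easy to automate. A minimal sketch, assuming the "at least 5" threshold used above; `normal_approx_ok` is a hypothetical helper name.

```python
def normal_approx_ok(x, n, threshold=5):
    """Return (p_hat, ok), where ok says whether both n*p_hat and
    n*(1 - p_hat) reach the threshold."""
    p_hat = x / n
    successes = n * p_hat        # equals x
    failures = n * (1 - p_hat)   # equals n - x
    return p_hat, (successes >= threshold and failures >= threshold)

# Health plan example: 285 of 500 employees would enroll
p_hat, ok = normal_approx_ok(x=285, n=500)
print(p_hat, ok)
```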
Formula to Calculate a Confidence Interval for the Population
Proportion
The formula to calculate the confidence interval for the
population proportion (ρ) is given by
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where
$\hat{p}$ is the sample proportion
n is the sample size
$z_{\alpha/2}$ is the critical value linked to the confidence level
Constructing a Confidence Interval for the Population
Proportion
An efficiency consultant studied a random selection of 200 e-
mails received by company employees, to determine how many
were relevant to the recipient. Only 36 of the emails were
relevant to their recipients. Form a 95% confidence interval to
estimate the true proportion of relevant emails received by a
typical employee.
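Here n = 200, x = 36 (as only 36 out of the 200 e-mails were relevant),
$\hat{p} = \frac{36}{200} = 0.18$, CL = 0.95, $\alpha/2 = 0.025$.
So $z_{0.025} = 1.96$ (from the z-table on the following slide) and the margin of
error is:
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96\sqrt{\frac{0.18(0.82)}{200}} \approx 0.053$
So the 95% confidence interval estimate is
$0.18 \pm 0.053 = (0.127, 0.233)$
We are thus 95% confident that the true proportion of relevant
emails received by a typical employee lies between 0.127 and
0.233.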
Question: Can we say that it seems that less than a quarter of
emails are relevant? Yes because the upper limit 0.233 < 0.25.
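The e-mail example can be reproduced with a short sketch. The function name `proportion_interval` is an assumption for this illustration, and the critical value is computed with the standard library instead of a z-table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf_level):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error E
    return p_hat - margin, p_hat + margin

# 36 relevant e-mails out of 200, 95% confidence
lower, upper = proportion_interval(x=36, n=200, conf_level=0.95)
print(f"({lower:.3f}, {upper:.3f})")
print(upper < 0.25)  # is the whole interval below one quarter?
```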
Constructing a Confidence Interval for the Population
Proportion
Interpreting a Confidence Interval for the Population Proportion
The CEO of a logistics company claims that only 5% of its
holiday deliveries arrive late. A 98% confidence interval to
estimate the proportion of late deliveries produced the
following interval: 0.06 to 0.11. Does the interval contradict the
CEO’s claim?
According to the confidence interval estimate, we are 98% confident
that the true proportion of late deliveries lies between 6% and 11%.
Since both these numbers are higher than the stated value of 5%
(that is, the interval does not contain 5%), the CEO’s claim is
contradicted.
7.5
Sample Size Requirements for Estimating the Population
Proportion
The Formula for Estimating Sample Size for the Population
Proportion
The sample size formula when estimating a population proportion is used
to specify the sample size required to guarantee that your
confidence interval has a certain margin of error and a certain
confidence level.
It is derived by taking the margin of error (E) from the
confidence interval formula for estimating the population
proportion and solving for n.
The equation is:
$n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2}$
Since $\hat{p}$ is unknown before sampling, we substitute the value 0.5 in the equation
because 0.5(1 - 0.5) = 0.25 is the maximum value of $\hat{p}(1-\hat{p})$, so the value of n obtained
with $\hat{p} = 0.5$ is guaranteed to be as large as it
could possibly need to be to cover all possible scenarios.
Calculate the Sample Size Needed to Estimate the Population
Proportion
A sales manager at a local car dealership wants to estimate the
proportion of used car sales that include an extended warranty.
What size sample would be needed to estimate the proportion of
extended warranties sold with error of no more than 0.05 and a
confidence level of 99%?
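Here E = 0.05, CL = 0.99, $z_{\alpha/2} = z_{0.005} = 2.576$, and we use $\hat{p} = 0.5$.
We use the formula:
$n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2} = \frac{(2.576)^2(0.5)(0.5)}{(0.05)^2} \approx 663.6$
So n = 664, as we always “round up” in case of sample size
determination.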
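The sample size calculation for a proportion can be sketched as follows. The function name and the `p_guess` parameter are assumptions for illustration; the default of 0.5 is the conservative choice described above.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, conf_level, p_guess=0.5):
    """Smallest n that keeps the margin of error within `margin`.

    p_guess = 0.5 maximizes p(1 - p), giving the most conservative n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)  # round up

# Car dealership example: E = 0.05, 99% confidence
n = sample_size_for_proportion(margin=0.05, conf_level=0.99)
print(n)
```

Any other guess for the proportion yields a smaller required sample, which is why 0.5 covers every scenario.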
Module 6
Point Estimators and Sampling Distributions
Master for Business Statistics
Dane McGuckian
Topics
6.1 Point Estimators and Sampling Distributions
6.2 The Central Limit Theorem
6.1
Point Estimators and Sampling Distributions
Sampling Distributions
The sampling distribution of a statistic is a probability
distribution for all of the possible values of a sample statistic
that can be derived from samples of a given size.
Recall that a probability distribution provides all possible
outcomes for an experiment and the probability associated with
each of these outcomes.
Example: If we took every possible random sample of 25 values
from a population and calculated the sample mean for each
sample, the resulting sampling distribution for the sample mean
would provide all possible means that could result from a
sample of 25 values drawn from this population along with the
probability that each of those means occurs.
Depending on the type of data involved, the sampling
distribution can be represented in a table format, as a histogram,
or as a formula.
Sampling Distributions
There are essentially three things we want to know about the
sampling distribution for any sample statistic:
What is the shape of the sampling distribution?
Where is the center (the mean) of the sampling distribution?
How much spread or dispersion (variation) does the sampling
distribution have?
Sampling Distributions
Example: Imagine that we select 2 balls, with replacement, from
a box containing two numbered balls and average the values that
appear on the selected balls. One of the balls has the number 0
printed on it, and the other has the number 1 printed on it. In
this scenario, what is the sampling distribution for the sample
mean?
Let’s begin by listing all of the possible outcomes for the two
selections. The possible outcomes are: 00, 01, 10, and 11. Next,
we can determine each of the possible means: (0 + 0)/2 = 0, (0 + 1)/2 = 0.5,
(1 + 0)/2 = 0.5, and (1 + 1)/2 = 1.
Sampling Distributions
Because each ball has an equal probability of being chosen,
each of the listed outcomes on the previous slide (00, 01, 10,
and 11) has an equal chance of occurring (1/4 = 0.25).
Consider the table below:
Sample    Mean    P(Mean)
0,0       0       0.25
0,1       0.5     0.25
1,0       0.5     0.25
1,1       1       0.25
Next, we will convert this table into a probability distribution
for the sample mean.
Sampling Distribution
Mean      P(Mean)
0         0.25
0.5       0.5
1         0.25
Sampling distribution of the Sample Mean
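Now that we have the probability distribution for the sample
mean, we can use it to calculate the mean of the sample means
and the standard deviation of the sample means:
$\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}) = 0(0.25) + 0.5(0.5) + 1(0.25) = 0.5$
$\sigma_{\bar{x}} = \sqrt{\sum (\bar{x} - \mu_{\bar{x}})^2 P(\bar{x})} = \sqrt{0.125} \approx 0.354$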
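The two-ball example is small enough to enumerate completely. The sketch below lists every equally likely sample, computes the mean and standard deviation of the sample means, and compares the latter with the population standard deviation divided by the square root of the sample size. Variable names are chosen for illustration.

```python
from itertools import product
from math import sqrt

balls = [0, 1]
samples = list(product(balls, repeat=2))  # (0,0), (0,1), (1,0), (1,1)
prob = 1 / len(samples)                   # each sample is equally likely

means = [sum(s) / len(s) for s in samples]
mean_of_means = sum(m * prob for m in means)
sd_of_means = sqrt(sum((m - mean_of_means) ** 2 * prob for m in means))

# Population mean and standard deviation of the two balls
pop_mean = sum(balls) / len(balls)
pop_sd = sqrt(sum((b - pop_mean) ** 2 for b in balls) / len(balls))

print(mean_of_means, pop_mean)         # the two means agree
print(sd_of_means, pop_sd / sqrt(2))   # sd of the means equals sigma / sqrt(n)
```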
Point Estimators
A point estimate is a statistic computed from a sample that is
designed to estimate a population parameter.
The preferred estimate for the population mean ($\mu$) is the sample
mean ($\bar{x}$).
So if we want to estimate the population mean, we would get
some sample data and then we would determine the sample
mean for the sample data, and that would be our point estimate.
The Standard Error of an Estimator
The Standard Error of an Estimator tells us how the estimator
will vary from sample to sample.
The estimator will not be the same for every sample, so the
standard error helps us understand how consistent the estimator
will be from sample to sample.
Population mean: $\mu$
Point Estimator: $\bar{x}$
Standard Error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The Desired Traits of a Point Estimator
Ideally, our point estimators should be unbiased estimators.
Among unbiased estimators, we want the estimator with the
minimum variance.
When an unbiased estimator is available, it is preferred over
biased estimators.
Example: Estimator A is biased
because its estimates almost always miss the target.
Estimators B and C are unbiased.
Estimator C has a smaller variance than
Estimator B, so it is called the Minimum
Variance Unbiased Estimator (MVUE).
6.2 The Central Limit Theorem
The Central Limit Theorem
The Central Limit Theorem states that for a sufficiently large
sample of size n, taken from a population that is not normally
distributed, the sample mean has an approximately normal
probability distribution.
In most cases, a sample size greater than thirty is large enough
to assume that $\bar{x}$ is approximately normal.
The Central Limit Theorem
The Central Limit Theorem describes the sampling distribution
of the sample mean.
If all samples of size n are selected from a population of
measurements with mean $\mu$ and standard deviation σ, the
distribution of the sample mean has the following mean and
standard deviation (standard error):
Mean of the sample means is: $\mu_{\bar{x}} = \mu$
Standard error of the sample mean (the standard deviation of the
distribution of sample means) is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The Central Limit Theorem
If the population of measurements is normally distributed, the
distribution of the sample means will be normal regardless of
the size (n) of the sample.
However, if the population of measurements is not normally
distributed, the distribution of the sample means will only be
approximately normal when the sample size is suitably large.
As a good rule of thumb, we will assume that any sample size
larger than 30 is large enough to ensure the distribution for the
sample means is approximately normal.
This approximation will improve for larger values of n.
The Central Limit Theorem
Examples:
The random variable X has a highly skewed distribution. If
samples of size 5 are taken from the population of X values, the
distribution of the sample means will not necessarily be normal;
however, if samples of size 35 are taken from the population,
we can assume the distribution of the sample means will be
approximately normal.
The random variable X has a normal distribution. If samples of
size 2 are taken from the population of X values, the
distribution of the sample means will be normal because the
distribution of the sample means is normal at any sample size
when X is normal.
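The first example can be explored with a simulation. This sketch assumes an exponential population with mean 1 and standard deviation 1 as a stand-in for "highly skewed"; the number of draws and the seed are arbitrary. It compares sample means for n = 5 and n = 35, using the skewness of the sample means as a rough measure of non-normality.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, draws=5000, seed=7):
    """Means of repeated samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(draws)]

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

for n in (5, 35):
    means = sample_means(n)
    print(f"n = {n:2d}: mean = {fmean(means):.3f}, "
          f"sd = {pstdev(means):.3f} (sigma/sqrt(n) = {1 / n ** 0.5:.3f}), "
          f"skewness = {skewness(means):.2f}")
```

The mean of the sample means stays near the population mean for both sizes, while the skewness is noticeably smaller for n = 35.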
The Mean of the Sample Mean
When discussing the Central Limit Theorem, which describes
the sampling distribution of the sample mean, we stated that the
mean of the sample means for all samples of size n is always
equal to the population mean.
In other words, if all samples of size n are selected from a
population of measurements with mean $\mu$, the mean of the sample
means is $\mu_{\bar{x}} = \mu$.
Example: If the true average IQ score for a population is 100
and every possible sample of size 15 is taken from the
population, the sample means calculated from each of those
samples will have an average equal to 100 because that is the
mean for the population.
For any particular sample size, the mean of all of the sample
means is equal to the population mean.
The Mean of the Sample Mean
The Sample Mean IQ Scores for all Possible Groups of 15
People (this is a partial list because the actual list would be
very long):
105, 98, 97, 111, 95, 91, 94, 100, 113, 101, 95, …
To understand the idea discussed on the previous slide, imagine
that for each sample of 15 individuals selected we calculate an
average IQ score.
These sample means will be recorded (perhaps in a list like the
one illustrated above), and once we have calculated a sample
mean from every possible sample of 15 people, we will then
average all of those sample means in our list.
The result will be the population mean IQ score, which in this
case is 100.
The Standard Error of the Mean
When we introduced the Central Limit Theorem, which
describes the sampling distribution of the sample mean, we
discussed the standard error of the mean (the standard deviation
of the sample means) for all samples of size n.
In that discussion we stated that if all samples of size n are
selected from a population of measurements with standard
deviation σ, the standard error of the mean is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
Example: If the true standard deviation for IQ scores for a
population is 15 and every possible sample of size 9 is taken
from the population, the sample means calculated from each of
those samples will have a standard error that is equal to $\frac{15}{\sqrt{9}} = 5$.
For any particular sample size, the standard error for the mean
is equal to the population standard deviation divided by the
square root of the sample size.
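The two facts above (the mean of the sample means equals the population mean, and the standard error equals the population standard deviation divided by the square root of n) can be checked by brute force on a small population. The sketch below is only an illustration: the four-value population and the sample size of 2 are made-up assumptions, not data from the slides, and sampling is done with replacement.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy population (not from the slides), small enough that
# every possible sample can be listed exhaustively.
population = [85, 100, 115, 100]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement, reduced to its mean.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

mean_of_means = mean(sample_means)      # should match mu
standard_error = pstdev(sample_means)   # should match sigma / sqrt(n)

print(mean_of_means, mu)
print(standard_error, sigma / sqrt(n))
```

Listing every sample is only practical for tiny populations, which is why the formula is used in practice.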
The Standard Error of the Mean
This definition of the standard error for the mean assumes that
the sampling is done with replacement or that the population we
are sampling from is infinite.
Sampling with replacement implies that a value that has been
selected during the sampling procedure is available to be
selected again and again in the same sample.
In the extreme case, this sampling procedure could produce a
sample of n measurements which consists entirely of one value
repeated n times.
This underlying assumption is a concern, because typically we
do not take samples from infinite populations, and typically, we
do not sample with replacement.
For example, if we are conducting a study on human height by
measuring 10 randomly selected people, we probably would not
want our sample to consist of one person’s height repeated 10
times.
The Standard Error of the Mean
Fortunately, we can modify the standard error formula to
accommodate the finite population case pretty easily.
To find the standard error of the mean when sampling from a
finite population, we use a multiplier often referred to as the
finite population correction factor:
\[ \sqrt{\frac{N - n}{N - 1}} \]
where N is the size of our population and we are selecting a
sample of size n.
If we are taking a sample of size n, without replacement,
from a finite population of size N, the standard error for
the mean becomes:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \]
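As a rough check of the finite population case, the sketch below lists every sample of size n drawn without replacement from a tiny population and compares the spread of the sample means with the usual standard error multiplied by the correction factor sqrt((N - n) / (N - 1)). The five-value population is a made-up assumption chosen only to keep the enumeration short.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Hypothetical finite population (not from the slides).
population = [2, 4, 6, 8, 10]
N, n = len(population), 2

# Every possible sample of size n drawn WITHOUT replacement.
sample_means = [mean(sample) for sample in combinations(population, n)]

sigma = pstdev(population)                 # population standard deviation
fpc = sqrt((N - n) / (N - 1))              # finite population correction factor
corrected_se = sigma / sqrt(n) * fpc       # corrected standard error
observed_se = pstdev(sample_means)         # spread of the enumerated sample means

print(round(corrected_se, 4), round(observed_se, 4))
```

Here the sample is 40% of the population, so the correction matters; when the sample is a small share of the population the factor is close to 1.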
The Standard Error of the Mean
The formula for the standard error of the mean when sampling
from a finite population only differs from our previous formula
by the finite population correction factor, and often we can
ignore this difference.
When sampling from a large finite population without
replacement, it is acceptable to use the original formula we
provided as an approximation to the standard error of the mean.
How large does our finite population have to be to use this
approximation?
Typically, if our sample size is not more than 5 percent of the
population, we can use the population standard deviation
divided by the square root of the sample size to approximate our
standard error for the mean.
For the exercises included in this course, we have assumed that
you will not be using the finite population correction factor.
This means you can safely use the formula provided on the first
slide when you are asked to determine the standard error for the
mean.
The Variation in the Sample Means
If all samples of size n are selected from a population of
measurements with standard deviation \(\sigma\), the standard error of the
mean is \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\).
Because the standard error of the mean is equal to the standard
deviation, σ, divided by the square root of the sample size, the
standard error for the mean is always less than the standard
deviation for the random variable whenever the sample size is
greater than 1.
The Variation in the Sample Means
This implies that a set of sample means from a population will
also exhibit less variation than the random variable for that
same population.
For example, if the sample size for the sample means is 4, the
standard deviation for the sample means will be half as large as
the standard deviation (\(\sigma\)) for the random variable.
This means the distribution for the sample means is more
clustered around the mean for the population than the
distribution for the random variable is.
This is a useful trait because it implies that as the sample size
increases, our sample means will tend to fall closer and closer to the
true population mean.
The Variation in the Sample Means
Example: An investor has two sets of data involving the closing
stock price for a company in the NASDAQ. One set of data
contains the closing stock prices for a random selection of 12
days taken over the course of the year, and the other set of data
contains 12 averages obtained from random samples of 4 days
of closing prices taken over the same year. Which data exhibits
a larger amount of variation?
The Variation in the Sample Means
Closing prices of 12 randomly chosen days (sample standard
deviation s = $85.24):
Average closing prices for 12 samples of n = 4 days (sample
standard deviation s = $38.89):
It is clear that the set of averages has far less dispersion than
the set of individual observations.
If we had every sample mean possible for all samples of four
(selected with replacement), the standard deviation of these
sample means would be \(\sigma / \sqrt{4} = \sigma / 2\), where \(\sigma\) is the standard deviation for
the daily closing prices.
The Central Limit Theorem and Calculating Probabilities for the
Sample Mean
A software company’s average daily stock price last year was
$38.12. The standard deviation for those prices was $2.45. If a
random selection of 32 days were chosen from last year, what is
the probability that the average price of the company’s stock for
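those 32 days is more than $37.00?
By the Central Limit Theorem, since the sample size (32) is greater
than 30, the distribution of the average stock prices is
approximately normal. So
\[ \mu_{\bar{x}} = 38.12, \qquad \sigma_{\bar{x}} = \frac{2.45}{\sqrt{32}} \approx 0.433 \]
The z-score for $37 is:
\[ z = \frac{37.00 - 38.12}{0.433} \approx -2.59 \]
So
\[ P(\bar{x} > 37.00) = P(z > -2.59) = 0.5 + 0.4952 = 0.9952 \]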
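The stock price calculation can be reproduced without a z table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are my own, and the result differs slightly from a table lookup because the z-score is not rounded to two decimals first.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 38.12, 2.45, 32   # values from the stock price example

standard_error = sigma / sqrt(n)           # sigma / sqrt(n)
z = (37.00 - mu) / standard_error          # z-score for a sample mean of $37.00
p_above = 1 - NormalDist().cdf(z)          # P(sample mean > 37.00)

print(round(standard_error, 4), round(z, 2), round(p_above, 3))
```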
Module 5
Continuous Random Variables
Master for Business Statistics
Dane McGuckian
Topics
5.1 Continuous Random Variables
5.2 The Normal Distribution
5.3 Applications of the Normal Distribution
5.4 Normal Approximation to the Binomial Distribution
5.1
Continuous Random Variables
Continuous Random Variables
Continuous random variables usually result from measuring
something like a distance, a weight, a length of time, a volume,
or some other similar quantity.
Because they can take on any value inside a particular interval,
there are an infinite, uncountable number of possible values for
any continuous random variable.
For this reason, when working with a continuous random
variable, we will discuss the probability that the random
variable is within some specified range.
Continuous Random Variables
Example: A fast-food restaurant manager tracks the length of
time his customers wait for their orders.
The random variable is continuous because it consists of
measured lengths of time.
We could consider the probability that a customer waits more
than five minutes for his or her order, less than five minutes,
between four and five minutes, or some other suitable interval
of time.
But the probability that a customer waits exactly five minutes
for the order is zero.
A consequence of continuous random variables having an
infinite, uncountable set of possible values is that the
probability of any continuous random variable equaling a
specific value is always zero.
Continuous Random Variables
The probability distribution for a continuous random variable is
usually represented by a function called the probability density
function (pdf).
These functions produce smooth curves when graphed, and
probability for the random variable is defined as the area under
the curve between any two specified points.
Discrete versus Continuous Random Variables
It is common to discuss the probability that a discrete random
variable takes on a specific value, but because a continuous
random variable has an infinite number of possible values in a
particular range, we do not typically discuss the probability that
a continuous random variable takes on a specific value.
Continuous probability distribution
Discrete probability
distribution
The Area under a Continuous Probability Function
In a continuous probability distribution, the probability that the
random variable, x, falls between two numbers is represented by an area, A.
The total area under the curve, from negative infinity to positive
infinity, accounts for every possible value of the random variable.
Because the total probability for all continuous probability
distributions is one, the area under the curve must also be one.
Continuous Uniform Distribution
A continuous random variable has a uniform distribution if the
graph of its probability distribution is rectangular in shape and
can be completely defined by its minimum and maximum
values.
Like all continuous distributions, the total area under the graph
of a uniform distribution is equal to one, and there is a direct
relationship between the area under the curve between two
specific points and the probability of the random variable
assuming a value between those two points.
The mean for the uniform distribution is
\[ \mu = \frac{a + b}{2} \]
Continuous Uniform Distribution
The standard deviation for the uniform distribution is
\[ \sigma = \frac{b - a}{\sqrt{12}} \]
where:
\(a\) is the minimum value for the distribution
\(b\) is the maximum value for the distribution.
For any uniform distribution, there is a uniform height to its
curve.
The height of the uniform distribution for any value \(x\) such that
\(a \le x \le b\) is given by \(f(x) = \frac{1}{b - a}\).
For any value outside of the interval \([a, b]\) the height of the curve is
zero.
Continuous Uniform Distribution
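Since the shape of the uniform curve is rectangular and
probability corresponds to area under the curve, the probability
that \(c \le x \le d\) for a uniformly distributed random variable defined by the
interval \([a, b]\) is
\[ P(c \le x \le d) = \frac{d - c}{b - a} \]
when \(a \le c \le d \le b\).
The Probability of a Uniform Distribution
For a uniform distribution defined on the interval \([a, b]\),
\[ P(c \le x \le d) = \frac{d - c}{b - a} \]
when \(a \le c \le d \le b\).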
Probabilities for the Uniform Distribution
The amount of time it takes an accountant to prepare tax returns
for her clients is uniformly distributed over the interval between
15 minutes and 60 minutes. What is the probability that she will
finish a tax return in 40 minutes or less?
Here a = c = 15; b = 60, d = 40. So
\[ P(15 \le x \le 40) = \frac{40 - 15}{60 - 15} = \frac{25}{45} \approx 0.556 \]
Thus there is a 55.6% probability that she will finish a tax
return in 40 minutes or less.
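Because a uniform probability is just the area of a rectangle, the tax return example reduces to one division. The sketch below assumes the standard results for a uniform distribution on [a, b]: probability (d - c) / (b - a), mean (a + b) / 2, and standard deviation (b - a) / sqrt(12). The function name is a hypothetical helper, not something from the slides.

```python
from math import sqrt

def uniform_probability(a, b, c, d):
    """P(c <= x <= d) for x uniform on [a, b]; requires a <= c <= d <= b."""
    if not (a <= c <= d <= b):
        raise ValueError("expected a <= c <= d <= b")
    # Width of the sub-interval times the height 1 / (b - a) of the rectangle.
    return (d - c) / (b - a)

# Tax return example: uniform on 15 to 60 minutes, finished in 40 minutes or less.
p_done_by_40 = uniform_probability(a=15, b=60, c=15, d=40)

mean_time = (15 + 60) / 2          # mean of the uniform distribution
sd_time = (60 - 15) / sqrt(12)     # standard deviation of the uniform distribution

print(round(p_done_by_40, 3), mean_time, round(sd_time, 2))
```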
5.2 The Normal Distribution
The Normal Distribution
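The normal distribution is a continuous distribution that appears
in many applications.
Many natural phenomena can be modeled using the normal
probability density function.
The formula for the normal curve is
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]
where \(-\infty < x < \infty\)
Note: \(\pi \approx 3.14159\) and \(e \approx 2.71828\)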
The Normal Distribution
Notice that the formula for the normal distribution contains the
symbols \(\mu\) and \(\sigma\), which represent the mean and the standard
deviation respectively.
The mean determines the location of the curve on the number
line, and the standard deviation determines the width or spread
of the distribution.
The values of these parameters depend on the population being
studied.
For this reason, the formula on the previous slide actually
represents a family of normal distributions, not just one curve.
Example: The heights of men have a normal distribution, and
men have mean height of 69 inches.
The heights of women are also normally distributed, but the
mean for women’s heights is 64 inches.
The two curves also have different standard deviations.
The Normal Distribution
In the illustration below, the two normal curves have different
means and different standard deviations.
The difference in the shape of the two curves is a result of the
curves having different standard deviations.
The difference in their position on the number line is due to
their different means.
The taller and the narrower curve belongs to the distribution
with the smaller standard deviation.
The Shape of the Normal Distribution
The graph of the normal probability density function is bell-
shaped, but there is not just one normal curve.
There is no limit to the different possible combinations of \(\mu\) and \(\sigma\),
so there are an infinite number of different normal curves.
The particular scale and location of a normal distribution will
depend on the distribution’s specific mean and standard
deviation.
However, all normal curves are bell-shaped, and they are always
perfectly symmetric around their mean.
The Shape of the Normal Distribution
The graph of the normal probability distribution function is
bell-shaped and perfectly symmetric around its mean.
This indicates that the left side of the normal distribution is a
perfect mirror image of the right side of the distribution.
Since the total area under all normal curves is 1.00 and all
normal curves are symmetric around their mean, half of the area
(0.50) is below the mean, and half of the area (0.50) is above
the mean.
This is very useful information.
Example: If women’s heights are normally distributed and the
average height for women is 64 inches, we can say with
certainty that half of all women are shorter than 64 inches.
Of course that also implies half of all women are taller than 64
inches.
The Normal Distribution
There is not just one normal curve but there are an unlimited
number of normal curves.
Example: Human height is normally distributed, but the heights
of men and women form different normal distributions.
IQ scores are also normally distributed, but those scores form a
different normal curve than the ones formed by male and female
heights.
The list of examples is endless, so when we speak of the normal
distribution, we are referring to a family of curves that have the
same underlying structure.
The mean and standard deviation, \(\mu\) and \(\sigma\), allow a single
probability density function to produce a family of normal
distributions.
Converting Normal Random Variables into Standard Normal
Random Variables
When working with normal random variables, we have a need to
find areas or probabilities, but the probability density function
for the normal distribution is mathematically difficult to work
with.
For this reason, when solving problems involving a normally
distributed random variable, it would be very helpful to have a
table of probabilities for the normal curve.
However, there isn’t just one normal curve to tabulate
probabilities for. Because there is no limit to the different
possible combinations of \(\mu\) and \(\sigma\), there are an infinite number of
different normal curves. Therefore, we would need an infinite
number of normal probability tables to handle every possible
application of the normal curve.
Fortunately, there is a way to work around this difficulty.
Converting Normal Random Variables into Standard Normal
Random Variables
It is possible to convert any normal random variable with mean
(\(\mu\)) and standard deviation (\(\sigma\)) into a standard normal random
variable.
A standard normal random variable is a normally distributed
random variable that has a mean equal to zero (\(\mu = 0\)) and a
standard deviation equal to one (\(\sigma = 1\)).
To convert a normal random variable (\(x\)) into a standard normal
random variable (\(z\)), we use the following formula:
\[ z = \frac{x - \mu}{\sigma} \]
where
\(x\) is the value of a measurement (or observation) taken from a
normally distributed population
\(\mu\) is the mean of the distribution for \(x\)
\(\sigma\) is the standard deviation of the distribution for \(x\)
\(z\) is the standard normal value
Parameters of the Standard Normal Distribution
A standard normal random variable is a normally distributed
random variable that has a mean equal to zero (\(\mu = 0\)) and a
standard deviation equal to one (\(\sigma = 1\)).
Because the standard normal distribution has a mean of zero and
a standard deviation of one, \(z\) values equate to the number of
standard deviations above (or below) the mean.
Example: A standard normal value of 1 is the same as one
standard deviation above average.
By using a standard normal probability table, it is possible to
find the probability that a standard normal value falls between
any two points on the \(z\)-axis.
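For readers without a printed table, the same lookups can be sketched in code. The example below assumes the table in use lists the area between the mean (0) and a given z value; `standardize` and `area_mean_to_z` are hypothetical helper names, and `statistics.NormalDist` is from Python's standard library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def standardize(x, mu, sigma):
    """Convert a value x from a normal distribution into a z value."""
    return (x - mu) / sigma

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z.

    Symmetry means the area is the same for z and -z.
    """
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)

def area_between(z_low, z_high):
    """P(z_low < Z < z_high) for a standard normal value Z."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

print(round(area_mean_to_z(1.0), 4))     # area within one standard deviation above the mean
print(round(area_between(-1.0, 1.0), 4)) # area within one standard deviation either side
```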
Z tables and Finding the Areas Under the Standard Normal
Curve Between the Mean and a Value
Use the standard normal curve to find \(P(-1.58 < z < 0)\).
\(P(-1.58 < z < 0) = P(0 < z < 1.58)\), because
the normal curve is symmetric.
\(P(0 < z < 1.58)\) = 0.4429 from the table.
Areas Under the Standard Normal Curve Inside an Interval
Surrounding the Mean
Use the standard normal curve to find
because of symmetry
, from the normal table
= 0.7597
Areas Under the Standard Normal Curve between a Positive Z
Value and Infinity
Use the standard normal curve to find
, as the total area to the left of the mean (0) is 0.5
, from the table
Areas Under the Standard Normal Curve between Two Values
on the Same Side of the Mean
Use the standard normal curve to find
Areas Under the Standard Normal Curve between a Negative Z
Value and Infinity
Use the standard normal curve to find
, as the total area to the left of the mean (0) is 0.50.
, from the table
5.3
Applications of the Normal Distribution
The Probability that a Non-Standard Normal Random Variable
is Greater than an Above-Average Value
The time it takes a computer chip manufacturer to produce a
single chip is normally distributed with a mean of 18.0 seconds
and a standard deviation of 1.2 seconds. Find the probability
that a chip will take longer than 19.8 seconds to produce.
The Probability that a Non-Standard Normal Random Variable
is Less than an Above-Average Value
A large investment bank in Miami released a report on the
starting salary offers it made to MBA graduates. The salaries
are normally distributed with a mean of $89,200 and a standard
deviation of $2,100. Find the probability that a randomly
selected MBA graduate was offered a starting salary of less than
$92,000.
The Probability that a Non-Standard Normal Random Variable
is Between Two Values that Surround the Mean
The containers on a mega-cargo ship in the port of Los Angeles
have weights that are normally distributed with a mean of
55,600 pounds and a standard deviation of 2,800 pounds. What
is the probability that a randomly selected container from the
ship weighs between 53,123 pounds and 60,123 pounds?
The Probability that a Non-Standard Normal Random Variable
is Between Two Values that are on the Same Side of the Curve
A manufacturer produces gears for use in an engine’s
transmission that have a mean diameter of 10.00mm and a
standard deviation of 0.03mm. The lengths of these diameters
have a normal distribution. Find the probability that a randomly
selected gear has a diameter between 9.94mm and 9.96mm.
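The four application problems in this section all follow the same pattern: standardize the boundaries, then take a difference of cumulative areas. A minimal sketch, assuming Python's `statistics.NormalDist` in place of a z table (so answers can differ from table answers in the third or fourth decimal); `prob_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def prob_between(mu, sigma, low=None, high=None):
    """P(low < X < high) for X normal with mean mu and standard deviation sigma.

    Passing None for a bound leaves that side open.
    """
    dist = NormalDist(mu, sigma)
    lower_area = 0.0 if low is None else dist.cdf(low)
    upper_area = 1.0 if high is None else dist.cdf(high)
    return upper_area - lower_area

# Chip production time longer than 19.8 seconds (mean 18.0, sd 1.2).
chip = prob_between(18.0, 1.2, low=19.8)
# Salary offer below $92,000 (mean $89,200, sd $2,100).
salary = prob_between(89_200, 2_100, high=92_000)
# Container weight between 53,123 and 60,123 pounds (mean 55,600, sd 2,800).
container = prob_between(55_600, 2_800, low=53_123, high=60_123)
# Gear diameter between 9.94 mm and 9.96 mm (mean 10.00, sd 0.03).
gear = prob_between(10.00, 0.03, low=9.94, high=9.96)

for name, p in [("chip", chip), ("salary", salary),
                ("container", container), ("gear", gear)]:
    print(name, round(p, 4))
```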
The Value Corresponding to an Upper Percentile of the Normal
Distribution
A company in California is concerned about the length of time
that its employees spend commuting to work. The one-way
commute times for its employees are normally distributed with a
mean of 32.1 minutes and a standard deviation of 5.3 minutes.
What is the commute time that separates the longest 20% of
commutes from the rest?
Here we will work in the “reverse” direction: from % → z
(from table) → x (using formula).
The z-value such that \(P(0 < z < z_0) = 0.30\) is 0.84 (closest
probability in the table being 0.2995).
Now, \(x = \mu + z\sigma = 32.1 + 0.84(5.3) \approx 36.6\)
So 36.6 minutes is the commute time that separates the longest
20% of commutes from the rest.
The Value Corresponding to a Lower Percentile of the Normal
Distribution
A financial services company gives an analytical reasoning test
to all job applicants. The completion times for the test are
normally distributed with a mean of 50.40 minutes and a
standard deviation of 3.10 minutes. What completion time
separates the fastest 6% of applicants from the others?
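We again work in the “reverse” direction. Here the z-value such
that \(P(z_0 < z < 0) = 0.44\) is -1.555 (the closest table
probabilities are 0.4394 for z = 1.55 and 0.4406 for z = 1.56).
Now \(x = \mu + z\sigma = 50.40 + (-1.555)(3.10) \approx 45.58\)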
So 45.58 minutes is the completion time that separates the
fastest 6% of applicants from the others.
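Both percentile problems work backwards from an area to a value, which is what an inverse cumulative distribution function does. The sketch below uses `NormalDist.inv_cdf` from Python's standard library as a stand-in for reading the table in reverse; variable names are my own.

```python
from statistics import NormalDist

# Commute times: mean 32.1 minutes, sd 5.3 minutes.
commute = NormalDist(mu=32.1, sigma=5.3)
# The longest 20% of commutes lie above the 80th percentile.
cutoff_longest_20 = commute.inv_cdf(0.80)

# Test completion times: mean 50.40 minutes, sd 3.10 minutes.
completion = NormalDist(mu=50.40, sigma=3.10)
# The fastest 6% of applicants lie below the 6th percentile.
cutoff_fastest_6 = completion.inv_cdf(0.06)

print(round(cutoff_longest_20, 1), round(cutoff_fastest_6, 2))
```

The key step is translating the wording into a cumulative area: "longest 20%" means 80% of the area lies to the left, while "fastest 6%" means 6% lies to the left.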
5.4
Normal Approximation to the Binomial Distribution
Using the Normal Distribution to Approximate the Binomial
Distribution
When using the normal curve to estimate a binomial probability
distribution, we must check two things to confirm the fit is
reasonably good:
If either of these is not true, we need to find a different method
of approximation.
The Use of the Continuity Correction Factor
A continuity correction is applied when the normal distribution
is used to approximate a binomial probability.
Example: The rectangle for x=2 actually goes from 1.5 to 2.5 on
the normal distribution.
Therefore we need to add or subtract that extra 0.5 when we are
looking at the probability that x is less than or greater than 2.
Know the Reason for the Use of the Continuity Correction
Factor
A marketing firm for the movie industry reports that the average
film is 128 minutes with a standard deviation of 15 minutes.
Assuming these film durations have a bell-shaped distribution,
what percent of films have a duration between 158 minutes and
173 minutes?
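Since 158 minutes is two standard deviations above the mean and
173 minutes is three standard deviations above it, the Empirical
Rule gives the required area (marked in red) as:
49.85% - 47.5% = 2.35%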
Thus, 2.35% of films have a duration between 158 minutes and
173 minutes.
The Use of the Continuity Correction Factor
Based on prior experience, a car dealership has a 45% chance of
selling an extended warranty with each used car that is sold. We
want to use the normal approximation to the binomial
distribution to find the probability of selling 25 or fewer extended
warranties when 60 cars are sold. Using continuity correction,
state the appropriate probability that will need to be found on
the normal curve.
Here X is the number of warranties to be sold, so X = 25 or fewer.
n = 60, p = 0.45, and 1-p = q =0.55.
So, \(P(x \le 25.5)\) on
the bell-shaped curve (normal) is the
probability that will need to be found.
The Normal Distribution and the Probability that a Binomial
Random Variable is Greater than a Value
Thirty percent of visitors to a local toy retailer will make a
purchase before exiting the store. Use the normal approximation
for binomial probability to determine the probability that more
than 50 visitors out of 200 will make a purchase.
Here X is the number of visitors who make a purchase, so X =
more than 50. Also n = 200, p = 0.30, and 1-p = q =0.70. It is a
binomial distribution because a customer will either make a
purchase or not.
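Mean = \(np\) = 200 × 0.30 = 60
Standard deviation = \(\sqrt{npq} = \sqrt{200 \times 0.30 \times 0.70} \approx 6.48\)
Using the continuity correction factor, we have to find \(P(x > 50.5)\) because
the problem states “more than 50”. The z-score is:
\[ z = \frac{50.5 - 60}{6.48} \approx -1.47 \]
So, \(P(x > 50.5) = P(z > -1.47) = 0.5 + 0.4292 = 0.9292\) (from table).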
The Normal Distribution and the Probability that a Binomial
Random Variable is Less than a Value
A small regional airline overbooks its flights because
historically only 90% of the reservations will actually show up
for the flight. If a flight has 100 available seats, the airline will
typically sell 110 reservations for the flight. What is the
probability that at most 95 people show up for a flight with 110
reservations?
Here X is the number of people who show up so X = at most 95.
Also n = 110, p = 0.90, and 1-p = q =0.10. It is a binomial
distribution because a person will either show up for the flight
or not.
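Mean = \(np\) = 110 × 0.90 = 99
Standard deviation = \(\sqrt{npq} = \sqrt{110 \times 0.90 \times 0.10} \approx 3.15\)
Using the continuity correction factor, we have to find \(P(x \le 95.5)\) because
the problem states “at most 95”. The z-score is:
\[ z = \frac{95.5 - 99}{3.15} \approx -1.11 \]
So, \(P(x \le 95.5) = P(z < -1.11) = 0.5 - 0.3665 = 0.1335\) (from table).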
The Normal Distribution and the Probability that a Binomial
Random Variable is Between Two Values
A small regional airline overbooks its flights because
historically only 90% of the reservations will actually show up
for the flight. If a flight has 100 available seats, the airline will
typically sell 110 reservations for the flight. Use the normal
approximation for binomial probability to determine the
probability that between 100 and 107 people (inclusive) show
up for a flight with 110 reservations.
Here X is the number of people who show up so X = between
100 and 107 or [100,107]. Also n = 110, p = 0.90, and 1-p = q
=0.10.
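Mean = \(np\) = 110 × 0.90 = 99
Standard deviation = \(\sqrt{npq} = \sqrt{110 \times 0.90 \times 0.10} \approx 3.15\)
Using the continuity correction factor, we have to find \(P(99.5 \le x \le 107.5)\) because
100 and 107 are both included. The z-scores are:
\[ z_1 = \frac{99.5 - 99}{3.15} \approx 0.16, \qquad z_2 = \frac{107.5 - 99}{3.15} \approx 2.70 \]
So, \(P(99.5 \le x \le 107.5) = P(0.16 < z < 2.70) = 0.4965 - 0.0636 = 0.4329\) (from table).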
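The continuity correction used in these three examples can be sketched as one small function: widen the inclusive range of counts by 0.5 on each side, then use a normal curve with mean np and standard deviation sqrt(npq). Both helper names below are hypothetical, and the exact binomial sum is included only as a sanity check on the approximation. Results differ slightly from table answers because z is not rounded to two decimals.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx(n, p, low=None, high=None):
    """Approximate P(low <= X <= high) for X ~ Binomial(n, p).

    low and high are inclusive whole-number counts; None leaves a side open.
    """
    dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
    lower_area = 0.0 if low is None else dist.cdf(low - 0.5)   # continuity correction
    upper_area = 1.0 if high is None else dist.cdf(high + 0.5) # continuity correction
    return upper_area - lower_area

def exact_binomial(n, p, low=0, high=None):
    """Exact P(low <= X <= high) by summing binomial probabilities."""
    high = n if high is None else high
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(low, high + 1))

# "More than 50" of 200 visitors buy (p = 0.30) means counts 51 and up.
toy_approx = normal_approx(200, 0.30, low=51)
toy_exact = exact_binomial(200, 0.30, low=51)
# "At most 95" of 110 reservations show up (p = 0.90).
show_up_at_most_95 = normal_approx(110, 0.90, high=95)
# Between 100 and 107 inclusive show up.
show_up_100_to_107 = normal_approx(110, 0.90, low=100, high=107)

print(round(toy_approx, 4), round(toy_exact, 4))
print(round(show_up_at_most_95, 4), round(show_up_100_to_107, 4))
```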
Module 4
Discrete Random Variables
Master for Business Statistics
Dane McGuckian
Topics
4.1 Probability Distributions for Discrete Random Variables
4.2 Expected Value, Variance, and Standard Deviation for
Discrete Random Variables
4.3 The Binomial Probability Distribution
4.4 The Poisson Probability Distribution
4.1
Probability Distributions for Discrete Random Variables
Discrete Random Variable
A discrete random variable is a variable that can only assume a
countable number of values.
The achievable values of a discrete random variable are
separated by gaps.
Example: a publisher may sell 300,000 or 300,001 copies of its
latest book, but it cannot sell 300,000.159 copies of its latest
book
Discrete random variables contain observations that are not
measured on a continuous scale
Most often a discrete random variable contains observations
that are derived from counting something.
Discrete Random Variables
Examples of Discrete Random variables:
The number of clicks received by an online advertisement over
the past hour
The number of books sold by an author yesterday
The number of people missing the most recent flight from
Miami to London
The number of parking violations last semester on campus
Discrete Probability Distributions
The probability distribution of a discrete random variable lists
all of the possible outcomes for the random variable and the
associated probability for each of those outcomes.
The distribution can be represented by a table, a graph or a
formula.
Number of Female Jurors, X    Probability of Outcome, P(X)
0                             0.008
1                             0.061
2                             0.186
3                             0.303
4                             0.278
5                             0.136
6                             0.028
Characteristics of a Probability Distribution
A probability distribution lists all possible outcomes for the
experiment and the corresponding probability for each of those
outcomes.
Remember that the probabilities cannot be negative
Each probability must lie between zero and one
The sum of the probabilities for all of the outcomes must be
one.
Example: Probability distribution of the number of free throws
made in two attempts (X) by basketball players who make free
throws 80% of the time. For instance, there is a 4%
chance that a player misses both throws.
All these probabilities are non-negative
Each probability lies between 0 and 1
The sum of these probabilities is:
0.04+0.32+0.64 = 1
X    P(X)
0    0.04
1    0.32
2    0.64
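The characteristics listed above translate directly into a check that can be run on any table of probabilities. A minimal sketch follows; the function name and the tolerance are my own assumptions (the tolerance only absorbs floating-point rounding), and the two tables are the free-throw and juror distributions from the slides.

```python
from math import isclose

def is_valid_distribution(probabilities, tol=1e-9):
    """True if every probability is in [0, 1] and they sum to 1."""
    probs = list(probabilities)
    each_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

# Number of free throws made in two attempts.
free_throws = {0: 0.04, 1: 0.32, 2: 0.64}
# Number of female jurors.
jurors = {0: 0.008, 1: 0.061, 2: 0.186, 3: 0.303, 4: 0.278, 5: 0.136, 6: 0.028}

print(is_valid_distribution(free_throws.values()))
print(is_valid_distribution(jurors.values()))
print(is_valid_distribution([0.5, 0.6]))  # sums to 1.1, so not a distribution
```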
4.2
Expected Value, Variance, and Standard Deviation for Discrete
Random Variables
The Mean of a Discrete Probability Distribution
The average value for a probability distribution is referred to as
the expected value of the probability distribution.
It represents the long-run typical value for the random variable.
If it were possible to run the trials indefinitely, the expected
value would be the mean for the infinite set of outcomes for the
random variable that would result from those trials.
The Expected value
The expected value is essentially a weighted average of the
possible outcomes for the random variable. The weights are the
corresponding probabilities for those outcomes.
As with the arithmetic mean we studied earlier, it is common for
the expected value to be a decimal or a fraction even when the
original set of outcomes must be whole numbers.
The Expected value of a Discrete Random Variable
How much money on average will an insurance company make
off of a 1-year life insurance policy worth $50,000, if they
charge $1000.00 for the policy and each policy holder has a
0.9999 probability of surviving the year?
Average implies mean, and that means “expected value” in the
context of a probability distribution. The formula is:
\[ E(x) = \mu = \sum [x \cdot P(x)] \]
If a person lives, the company makes
$1000 (hence it is “positive”); if the person
dies, it pays the family $50,000 (but also
gets $1000 from the family), so their loss
is $50,000 - 1000 = $49,000.
Events    X          P(X)                   x·P(x)
Lives     +1,000     0.9999                 999.90
Dies      -49,000    1 - 0.9999 = 0.0001    -4.90
Total                1.0000                 995.00 = E(x)
The Expected value of a Discrete Random Variable
A life insurance policy sells for $200, and should the policyholder
pass away before the end of the year, the family gets a check for
$10,000 from the company.
The company can expect their profits divided by the number of
policies sold (profit per policy) to be approximately $190.
The expected value is the long-run average after many, many
trials, so while the company’s average profit is unlikely to ever
be exactly $190, the more policies they sell, the closer and
closer the company’s profit per policy will be to $190.
Events    X         P(X)      x·P(x)
Dies      -9,800    0.001     -9.80
Lives     200       0.999     +199.80
Total               1.0000    $190.00 = E(x)
Using Expected value to Distinguish between Two Possible
Courses of Action
A bank can either risk $20,000 on a currency investment that
has a 51% chance of earning them $40,000 in profit, or they can
risk $700,000 on a bond investment that has a 98% chance of
earning them $40,000. In the long run, which strategy will yield
the most profit?
Currency:
          x           P(x)    x·P(x)
Profit    +40,000     0.51    20,400
Loss      -20,000     0.49    -9,800
Total                 1.00    $10,600 = E(x)
Bond:
          x           P(x)    x·P(x)
Profit    +40,000     0.98    39,200
Loss      -700,000    0.02    -14,000
Total                 1.00    $25,200 = E(x)
Average profit from currency investment is $10,600 and that
from the bond investment is $25,200. Thus the bond yields the
higher average profit, and is the better choice.
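The expected value comparisons in this section all use the same weighted sum, so they can be sketched with one short function. The helper name is hypothetical; the outcome and probability pairs are the ones given in the insurance and investment examples.

```python
def expected_value(outcomes):
    """Sum of x * P(x) over (x, P(x)) pairs."""
    return sum(x * p for x, p in outcomes)

# (profit or loss, probability) pairs from the examples.
currency = [(40_000, 0.51), (-20_000, 0.49)]
bond = [(40_000, 0.98), (-700_000, 0.02)]
insurance_50k = [(1_000, 0.9999), (-49_000, 0.0001)]

ev_currency = expected_value(currency)
ev_bond = expected_value(bond)
ev_insurance = expected_value(insurance_50k)

better = "bond" if ev_bond > ev_currency else "currency"
print(round(ev_currency, 2), round(ev_bond, 2), better)
print(round(ev_insurance, 2))
```

Expected value only describes the long-run average; it says nothing about how large a single loss could be, which is a separate consideration when one option risks $700,000.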
The Variance and Standard Deviation of a Discrete Random
Variable
The mean value for a discrete probability distribution provides
the typical value for the random variable.
In other words, the mean tells us what we can expect to happen
on average over the long run, but if we want to know how
varied the outcomes for the random variable will be, we can
calculate the variance or standard deviation for the random
variable.
Variance for a probability distribution:
\[ \sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2 \]
The standard deviation for the random variable is found by
taking the square root of the variance of the random variable.
Standard Deviation for a probability distribution:
\[ \sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2} \]
The Variance and Standard Deviation of a Discrete Random
Variable
Calculate the standard deviation of the probability distribution
shown here (round to the thousandths place):
x    x²    P(x)    x·P(x)    x²·P(x)
0    0     0.15    0         0
1    1     0.48    0.48      0.48
2    4     0.25    0.50      1.00
3    9     0.12    0.36      1.08
Total              1.34 = \(\mu\)    2.56 = \(\sum [x^2 \cdot P(x)]\)
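The column totals in this example (1.34 and 2.56) are the two ingredients of the shortcut variance formula, the sum of x²·P(x) minus the squared mean. The sketch below finishes the calculation under that assumption; variable names are my own.

```python
from math import sqrt

# Probability distribution from the example: x -> P(x).
distribution = {0: 0.15, 1: 0.48, 2: 0.25, 3: 0.12}

mu = sum(x * p for x, p in distribution.items())             # sum of x * P(x)
sum_x2_p = sum(x**2 * p for x, p in distribution.items())    # sum of x^2 * P(x)

variance = sum_x2_p - mu**2
sigma = sqrt(variance)

print(round(mu, 2), round(sum_x2_p, 2))   # 1.34 2.56
print(round(sigma, 3))                    # 0.874
```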
Determine if an Event is Unusual using the Mean and Standard
Deviation of a Random Variable
A business venture offers an expected profit of $28,000 with a
standard deviation of $5,250. Would it be unusual to earn less
than $20,000 on the deal? (Hint: consider any value more than
two standard deviations away from the mean as unusual)
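Values within two standard deviations of the mean lie in the
interval 28,000 ± 2(5,250), that is, from $17,500 to $38,500.
Thus an earning of $20,000 is not unusual because it falls
within this interval.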
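The two-standard-deviation rule from the hint can be written as a short check. This is a sketch only: the function names are hypothetical, and the cutoff of two standard deviations is the convention stated in the hint rather than a universal definition of "unusual".

```python
def usual_range(mu, sigma, k=2):
    """Interval of values within k standard deviations of the mean."""
    return mu - k * sigma, mu + k * sigma

def is_unusual(value, mu, sigma, k=2):
    """True if value lies more than k standard deviations from the mean."""
    low, high = usual_range(mu, sigma, k)
    return value < low or value > high

# Business venture: expected profit $28,000, standard deviation $5,250.
low, high = usual_range(28_000, 5_250)
print(low, high)                           # 17500 38500
print(is_unusual(20_000, 28_000, 5_250))   # False
```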
4.3
The Binomial Probability Distribution
The Five Characteristics of a Binomial Experiment
A Binomial Experiment has a fixed number of trials, only two
possible outcomes for each trial, one trial cannot affect the outcome
of the next trial, the probability has to remain constant from one
trial to the next, and x must represent the number of successes.
Example: Flip a coin 3 times and count the number of heads that
turn up (say, 1). Is this a binomial experiment?
There is a fixed number of trials, n = 3
There are 2 outcomes in each trial – heads and tails
The trials are independent, and the outcome of each flip does
not affect that of the later flips
Each flip has a 50% chance of turning up heads
x = 1 here (success class is “heads”)
Binomial Probability Formula
The probability of having x successes out of n trials during a
binomial experiment is given by the following formula (recall
that \(\binom{n}{x} = \frac{n!}{x!(n-x)!}\)):
\[ P(x) = \binom{n}{x} p^x q^{n-x} \]
where
n = the number of trials for the binomial experiment
x = the number of successes
p = the probability of a success
q = the probability of a failure (\(q = 1 - p\))
Binomial Probability Formula
Example: If a binomial experiment involves flipping a fair coin
7 times and counting the number of heads that result, the
probability of 5 heads turning up in 7 flips is provided below:
n = 7 (there are 7 flips of the fair coin)
x = 5 (we are looking for the probability of getting 5 heads)
p = 0.50 (a fair coin has a 50% chance of turning up heads on a
single flip)
q = 0.50 (the probability of failure is found by subtracting the
probability of success from 1)
\[ P(5) = \binom{7}{5}(0.50)^5(0.50)^2 = 21(0.0078125) \approx 0.1641 \]
Thus the probability of getting 5 heads out of 7 flips is about 16.41%.
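The coin example can be verified with a few lines of code. The sketch below implements the binomial probability formula using `math.comb` for the number of combinations; the function name is a hypothetical helper. For 5 heads in 7 flips it gives 21/128, or about 16.4%.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p                      # probability of a failure
    return comb(n, x) * p**x * q**(n - x)

p_five_heads = binomial_pmf(x=5, n=7, p=0.50)
print(p_five_heads)   # 0.1640625
```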
The Probability of X successes in a Binomial Experiment
A cable company believes that their new promotion will
convince 20% of satellite television subscribers to sign up for
cable. If the company is correct, what is the probability that 2
out of 8 randomly selected satellite users end up switching to
cable after hearing the promotion?
The fact that there are 2 groups that will behave differently
(some will switch and some will not) suggests the binomial
distribution.
Test to determine if this is a binomial experiment: (1) fixed
number of trials = 8; (2) there are 2 possible outcomes (switches
or not); (3) constant probability of success (switching) is 20%;
(4) 8 unique users, so trials are independent (assume no user is
called twice).
n = 8, X = 2 switch, p = 0.20, q = 1 - 0.20 = 0.80. So
\[ P(2) = \binom{8}{2}(0.20)^2(0.80)^6 = 28(0.04)(0.262144) \approx 0.294 \]
Thus there is a 29.4% probability that 2 out of the 8 satellite
users switch to cable after hearing the promotion.
The Probability of a Cumulative Set of Events for a Binomial
Experiment
A drug company reports that 65% of balding men would benefit
from using an over-the-counter hair-loss solution they
manufacture. Assuming the company’s claim is correct and a
random sample of 10 men is selected for a clinical trial of the
product, what is the probability that at least 9 of the men
benefit from the solution?
This is a binomial experiment because: (1) there is a fixed
number of trials, 10 men selected; (2) each trial has 2 outcomes
(working or not working); (3) constant probability of 65% of
benefitting from the solution; (4) trials are independent (no
chance of repetition)
Here n = 10, X = 9 or 10, p = 0.65, q = 1 – 0.65 = 0.35. So
P(X = 9 or X = 10) = P(X = 9) + P(X = 10)
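$= \frac{10!}{9!\,1!}(0.65)^9(0.35)^1 + \frac{10!}{10!\,0!}(0.65)^{10}(0.35)^0$
$\approx 0.0725 + 0.0135$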
= 0.086
So there is an 8.6% chance that at least 9 men would benefit
from the solution.
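A cumulative probability such as "at least 9" is just a sum of individual binomial probabilities. The sketch below illustrates this under the example's stated assumptions (n = 10, p = 0.65); the helper names `binomial_pmf` and `prob_at_least` are hypothetical and not from the original module.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial experiment with n trials and success probability p."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k): add the probabilities of k, k + 1, ..., n successes."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))


# Hair-loss example: n = 10 men, p = 0.65, at least 9 benefit
print(f"P(X >= 9) = {prob_at_least(9, 10, 0.65):.3f}")  # prints P(X >= 9) = 0.086
```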
The Probability of a Cumulative Set of Events for a Binomial
Experiment
A laptop manufacturer knows that 30% of its laptops will fail
within the first two years of use. If seven randomly selected
customers are surveyed, what is the probability that more than 3
of them experienced a laptop failure within the first two years
of use?
This is a binomial experiment because: (1) there is a fixed
number of trials – 7 customers selected randomly; (2) there are two
outcomes (a laptop fails or not within the first two years of
use); (3) constant probability of 30% of a laptop failing in two
years’ time; (4) customers are independent.
Here n = 7, X > 3 (that is, X = 4, 5, 6, or 7), p =
0.30, q = 1 – 0.30 = 0.70. So, from the table:
P(X = 4) = 0.0972
P(X = 5) = 0.0250
P(X = 6) = 0.0036
P(X = 7) = 0.0002
Adding all these,
P(X > 3) = 0.1260
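The table values can be reproduced from the binomial formula, and the same answer can also be reached with the complement rule, P(X > 3) = 1 – P(X ≤ 3). The complement rule is not used in the text; it is shown here only as a cross-check. This is a sketch with a hypothetical helper name, `binomial_pmf`.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


n, p = 7, 0.30  # laptop example: 7 customers, 30% failure rate

# Direct sum over X = 4, 5, 6, 7 (what the table lookup adds up)
upper_tail = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))

# Complement rule: P(X > 3) = 1 - P(X <= 3)
via_complement = 1.0 - sum(binomial_pmf(x, n, p) for x in range(0, 4))

print(f"{upper_tail:.4f} {via_complement:.4f}")  # prints 0.1260 0.1260
```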
Mean for a Binomial Probability Distribution
We could calculate the mean of the binomial probability
distribution by listing all of the possible outcomes for the
experiment, listing all of their corresponding probabilities, and
then applying the formula:
$$\mu = \sum x\,P(x)$$
However, that method could be very time consuming.
There is a simpler approach when trying to find the mean of a
binomial probability distribution.
First we identify the number of trials for the experiment (n) and
the probability of success (p). Then we apply the following
formula:
$$\mu = np$$
Mean for a Binomial Probability Distribution
Example: If a quality control manager samples computer chips,
with replacement, from a production line where each chip has a
0.009 probability of being defective, what is the average number
of defective chips that will be found in a sample of 200 chips?
This sampling procedure produces a binomial probability
distribution, so we can apply the formula for finding the mean
of a binomial distribution.
In this example, there are 200 trials, and the probability of
finding a defective chip is 0.009.
$$\mu = np = 200(0.009) = 1.8$$ defective chips
For groups of 200 computer chips randomly selected from the
production line, the average number of defective chips is 1.8.
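To see that the shortcut agrees with the longer method, both can be computed side by side. The following is an illustrative sketch only; `binomial_pmf`, `mean_long`, and `mean_short` are hypothetical names, and the long method sums x times P(x) over every outcome from 0 to n.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


n, p = 200, 0.009  # chip example: 200 trials, 0.009 chance of a defect

# Long method: sum of x * P(x) over every possible outcome 0, 1, ..., n
mean_long = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

# Shortcut: mu = n * p
mean_short = n * p

print(f"{mean_long:.4f} {mean_short:.4f}")  # prints 1.8000 1.8000
```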
The Mean of a Binomial Random Variable
An electronics retailer notes that only 8% of its online
customers choose to purchase their extended service plan. If the
retailer has 300 online sales over the next month, what is the
expected number of customers that will choose to purchase the
extended service plan?
This satisfies the conditions of a binomial experiment with
n = 300 and p = 0.08, so
$$\mu = np = 300(0.08) = 24$$
Thus the expected number of customers that will choose to
purchase the extended service plan is 24.
Standard Deviation for a Binomial Probability Distribution
We could calculate the standard deviation for the binomial
probability distribution by listing all of the possible outcomes
for the experiment, listing all of their corresponding
probabilities, and then applying the formula
$$\sigma = \sqrt{\sum (x - \mu)^2\,P(x)}.$$
However, that method would be very time consuming.
There is a simpler approach when trying to find the standard
deviation of a binomial probability distribution.
First, we identify the number of trials for the experiment (n)
and the probability of success (p).
Then we apply the following formula:
$$\sigma = \sqrt{npq}$$
Standard Deviation for a Binomial Probability Distribution
Example: If a quality control manager samples computer chips,
with replacement, from a production line where each chip has a
0.009 probability of being defective, what is the standard
deviation for the number of defective chips that will be found
in a sample of 200 chips?
This sampling procedure produces a binomial probability
distribution, so we can apply the formula for finding the
standard deviation of a binomial distribution.
In this example, there are 200 trials, and the probability of
finding a defective chip is 0.009.
n = 200, p = 0.009, q = 1 – 0.009 = 0.991
Standard Deviation for a Binomial Probability Distribution
The variance will be
$$\sigma^2 = npq = 200(0.009)(0.991) = 1.7838$$
Then we simply take the square root to find the standard
deviation:
$$\sigma = \sqrt{1.7838} \approx 1.336$$ defective chips
For groups of 200 computer chips randomly selected from this
factory, the standard deviation for the number of defective chips
is 1.336.
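The same comparison can be made for the standard deviation: the long method takes the square root of the probability-weighted squared deviations from the mean, and the shortcut takes the square root of npq. This is a sketch with hypothetical names (`binomial_pmf`, `sd_long`, `sd_short`), not the module's own code.

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


n, p = 200, 0.009  # chip example
q = 1.0 - p
mu = n * p  # binomial mean

# Long method: variance = sum of (x - mu)^2 * P(x), then take the square root
variance_long = sum((x - mu) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
sd_long = sqrt(variance_long)

# Shortcut: sigma = sqrt(n * p * q)
sd_short = sqrt(n * p * q)

print(f"{sd_long:.3f} {sd_short:.3f}")  # prints 1.336 1.336
```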
Standard Deviation of a Binomial Random Variable
A recent report states that only 28% of software projects were
expected to finish on time and on budget. If we randomly
sample 80 software projects, what is the standard deviation for
the number of projects that are expected to finish on time and
on budget?
This satisfies all the conditions for a binomial experiment, so we
can use the formula for the standard deviation to calculate this:
$$\sigma = \sqrt{npq} = \sqrt{80(0.28)(0.72)} = \sqrt{16.128} \approx 4.02$$ projects
4.4
The Poisson Probability Distribution
The Poisson Distribution
The Poisson distribution is a discrete probability distribution
that provides probabilities for the number of occurrences of
some event over a given period, interval, distance, or space.
Example: A customer service call center might use the Poisson
distribution to describe the behavior of incoming calls over
different time periods.
Example: A website might use the Poisson distribution to
estimate the likelihood of some number of individuals logging
onto the site between the hours of 12:00AM and 1:00AM.
Example: A mining company might use the Poisson distribution
to model the number of methane gas releases over a specified
depth.
The Poisson distribution is typically used to model the
occurrences of rare events.
The Poisson Distribution
The probability that the specified event occurs x times over
some defined interval is given by the following formula:
$$P(x) = \frac{\mu^x e^{-\mu}}{x!}$$
where
$\mu$ (mu) is the mean (expected) number of occurrences (successes)
over a particular interval
x is the number of occurrences (successes)
e is a constant (the base of the natural log) that is approximately
equal to 2.71828
The Poisson Distribution
Here are some important characteristics of the Poisson
distribution:
The random variable is the number of occurrences of some
event over some defined interval.
The probability of the event is proportional to the size of the
closed interval.
The intervals do not overlap.
The occurrences are independent of each other.
Mean and Standard Deviation of the Poisson Distribution
The mean of the Poisson distribution is $\mu$.
The standard deviation for the Poisson distribution is given by
$\sigma = \sqrt{\mu}$.
Some Important Differences between the Binomial and the
Poisson Distributions
The binomial distribution is dependent upon the sample size and
the probability of success, while the Poisson distribution only
depends on the mean $\mu$.
In the binomial distribution, the random variable can take on
values of 0,1,..,n, while in the Poisson distribution, the random
variable can be any integer greater than or equal to zero.
In other words, there is no upper bound for the number of
occurrences in the given interval.
Probability and the Poisson Distribution
A cellular communication company finds that during a 10-minute
phone call there will typically be one incident of poor
reception. Use a Poisson distribution to calculate the probability
that there will be 5 incidents of poor reception during a call
that lasts an hour.
Here x = 5 and $\mu = 6$, since an average of 1 incident of poor
reception in a 10-minute period implies an average of 6
incidents in 60 minutes (1 hour).
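The probability asked for here can be evaluated directly from the Poisson formula with x = 5 and a mean of 6. The following is a minimal Python sketch; the helper name `poisson_pmf` is hypothetical and uses only the standard `math` module.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson distribution with mean mu: mu^x * e^(-mu) / x!."""
    return mu**x * exp(-mu) / factorial(x)


# Poor-reception example: mean of 1 per 10 minutes -> mu = 6 per hour
mu = 6.0
print(f"P(X = 5) = {poisson_pmf(5, mu):.4f}")  # prints P(X = 5) = 0.1606
```

Under these assumptions, the chance of exactly 5 occurrences of poor reception during a one-hour call is about 16.1%.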

Module 7 Interval estimatorsMaster for Business Statistics.docx

  • 1. Module 7 Interval estimators Master for Business Statistics Dane McGuckian Topics 7.1 Interval Estimate of the Population Mean with a Known Population Standard Deviation 7.2 Sample Size Requirements for Estimating the Population Mean 7.3 Interval Estimate of the Population Mean with an Unknown Population Standard Deviation 7.4 Interval Estimate of the Population Proportion 7.5 Sample Size Requirements for Estimating the Population Proportion 7.1 Interval Estimate of the Population Mean with a Known Population Standard Deviation Interval Estimators Quantities like the sample mean and the sample standard
  • 2. deviation are called point estimators because they are single values derived from sample data that are used to estimate the value of an unknown population parameter. The point estimators used in Statistics have some very desirable traits; however, they do not come with a measure of certainty. In other words, there is no way to determine how close the population parameter is to a value of our point estimate. For this reason, the interval estimator was developed. An interval estimator is a range of values derived from sample data that has a certain probability of containing the population parameter. This probability is usually referred to as confidence, and it is the main advantage that interval estimators have over point estimators. The confidence level for a confidence interval tells us the likelihood that a given interval will contain the target parameter we are trying to estimate. The Meaning of “Confidence Level” Interval estimates come with a level of confidence. The level of confidence is specified by its confidence coefficient – it is the probability (relative frequency) that an interval estimator will enclose the target parameter when the estimator is used repeatedly a very large number of times. The most common confidence levels are 99%, 98%, 95%, and 90%. Example: A manufacturer takes a random sample of 40 computer chips from its production line to construct a 95% confidence interval to estimate the true average lifetime of the chip. If the manufacturer formed confidence intervals for every possible sample of 40 chips, 95% of those intervals would contain the population average.
  • 3. The Meaning of “Confidence Level” In the previous example, it is important to note that once the manufacturer has constructed a 95% confidence interval, it is no longer acceptable to state that there is a 95% chance that the interval contains the true average lifetime of the computer chip. Prior to constructing the interval, there was a 95% chance that the random interval limits would contain the true average, but once the process of collecting the sample and constructing the interval is complete, the resulting interval either does or does not contain the true average. Thus there is a probability of 1 or 0 that the true average is contained within the interval, not a 0.95 probability. The interval limits are random variables because their values depend upon the results of a random sample of data. However, once they are calculated from a particular sample, those limits are no longer random variables – they become fixed constants, so speaking about their probability in terms of the confidence level is no longer valid. The Meaning of “Confidence Level” In the diagram, there are 8 different confidence intervals represented. Each confidence interval was constructed using a sample of size n, drawn from the same population, and all of the intervals have a 95% level of confidence. The vertical line in the diagram indicates where the population mean is located. 7 of the intervals capture the population mean, but the second interval does not.
  • 4. If we looked at a very large number of these intervals, approximately 5% (100% - 95%) of them would fail to include the mean. All of the others would contain the mean as expected. Confidence Interval for Estimating a Population Mean (with Sigma Known) The formula for the confidence interval to estimate the mean consists of two values: a lower limit and an upper limit. Confidence Interval for \(\mu\): \(\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)\) where n = the sample size, σ = the population standard deviation, \(\bar{x}\) = the sample mean, and \(z_{\alpha/2}\) = the z-score separating an area of \(\alpha/2\) in the upper tail of the standard normal curve. The formula is often expressed as: \(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\) (Lower Limit < \(\mu\) < Upper Limit). Confidence Interval for Estimating a Population Mean (with Sigma Known) However, the most common way to express the formula for the confidence interval to estimate the mean is
  • 5. \(\bar{x} \pm E\), where \(E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\) is called the margin of error. The Margin of Error In the formula for the confidence interval to estimate the population mean (with σ known), there is a quantity called the margin of error. The margin of error is the maximum likely difference observed between the sample mean \(\bar{x}\) and the population mean \(\mu\), and it is denoted by E. The margin of error for the confidence interval to estimate the mean is given by the following formula: \(E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\) where n = the sample size, σ = the population standard deviation, \(\bar{x}\) = the sample mean, and \(z_{\alpha/2}\) = the z-score separating an area of \(\alpha/2\) in the upper tail of the standard normal curve. The Margin of Error The margin of error is what determines the width of the confidence interval. The width of a confidence interval is given by: \(\text{width} = 2E\). When estimating the mean using a confidence interval, the smaller the margin of error, the better. Since the confidence interval is designed to contain the mean, a narrow interval gives us a better idea of where the mean is located.
  • 6. The Importance of a Known Population Standard Deviation for a Confidence Interval The population standard deviation for a random variable is part of the margin of error formula used to estimate the population mean of that random variable. The reason for this is that the population standard deviation is needed to determine the precise standard error of the sample mean. If we do not know the precise standard error of the sample mean, we cannot guarantee the level of confidence specified for the interval. The standard error for \(\bar{x}\): \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\). The margin of error used to estimate \(\mu\): \(E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\). The critical value \(z_{\alpha/2}\) is determined by assuming the distribution of the sample mean is normally distributed with an unknown mean of \(\mu\) and a known standard error of \(\sigma/\sqrt{n}\). The Importance of a Known Population Standard Deviation for a Confidence Interval If we do not know \(\sigma\), then we cannot know the standard error of the sample mean. This would prevent us from stating an accurate confidence level for our interval estimate. For this reason, we should know the population standard deviation when using the following confidence interval formula to estimate \(\mu\): Formula to estimate \(\mu\) (with \(\sigma\) known): \(\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\)
  • 7. How to Find Critical Z Values Find the critical Z value needed to construct a 90% confidence interval: CL = 0.90, so \(\alpha = 0.10\) and \(\alpha/2 = 0.05\). From the normal table, looking up the probability value of 0.4500 (that is, 0.5000 - 0.0500), we get \(z_{0.05} = 1.645\) because 0.4500 fell between 1.64 and 1.65. The Margin of Error when Sigma is Known The mean quarterly earnings per share for a sample of 36 stocks is $10.52, and the population standard deviation is $2.50. Calculate the margin of error that would be used when estimating the population mean with a confidence level of 95%. The margin of error is given by: \(E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\). Here n = 36, σ = 2.50, CL = 0.95, so \(\alpha/2 = 0.025\) and \(z_{0.025} = 1.96\). Then \(E = 1.96 \times \frac{2.50}{\sqrt{36}} \approx 0.82\).
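The two calculations on this slide can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the printed z-table; the function names `z_critical` and `margin_of_error` are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z_{alpha/2}: the z-score with an area of alpha/2 in the upper tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error(confidence_level, sigma, n):
    """E = z_{alpha/2} * sigma / sqrt(n), valid when sigma is known."""
    return z_critical(confidence_level) * sigma / sqrt(n)


# Critical value for a 90% confidence interval
print(round(z_critical(0.90), 3))                  # 1.645

# Stock example: n = 36, sigma = 2.50, 95% confidence
print(round(margin_of_error(0.95, 2.50, 36), 2))   # 0.82
```

The inverse normal CDF gives slightly more decimal places than a printed table, so results can differ from hand calculations in the last digit.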
  • 8. The Steps to Create a Confidence Interval when Sigma is Known Step 1: Gather the sample size, sample mean, population standard deviation, and a confidence level. Step 2: Find the Z critical value. Step 3: Calculate the margin of error. Step 4: Calculate the confidence interval. The Confidence Interval for the Mean when Sigma is Known A student wants to estimate the average amount of time it takes to commute to campus from her apartment. For a random selection of 32 days, she times her commute. The average commute time for those days was 15.3 minutes. Assume the population standard deviation is 3.5 minutes and form a 98% confidence interval to estimate the true mean commute time for the student. Here n = 32, \(\bar{x} = 15.3\), σ = 3.5, CL = 0.98, so \(\alpha/2 = 0.01\) and \(z_{0.01} = 2.33\). The margin of error is given by: \(E = 2.33 \times \frac{3.5}{\sqrt{32}} \approx 1.44\). The 98% confidence interval is: \(15.3 \pm 1.44 = (13.86,\ 16.74)\). So we are 98% confident that the true mean is between 13.86 minutes and 16.74 minutes.
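The four steps above can be sketched as a single function. This is an illustrative sketch rather than part of the original slides: it assumes the standard-library `NormalDist` replaces the z-table lookup, and `z_confidence_interval` is a made-up name.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence_level):
    """Confidence interval for the mean when the population sigma is known."""
    # Step 2: critical value z_{alpha/2}
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Step 3: margin of error
    e = z * sigma / sqrt(n)
    # Step 4: lower and upper limits
    return xbar - e, xbar + e


# Step 1: commute example (n = 32, sample mean 15.3, sigma = 3.5, 98% confidence)
lower, upper = z_confidence_interval(15.3, 3.5, 32, 0.98)
print(round(lower, 2), round(upper, 2))   # 13.86 16.74
```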
  • 9. A Confidence Interval A logistics company claims the true average price for a gallon of regular, unleaded gas is $3.35. A researcher has recently used sample data to form a 98% confidence interval estimate of the true average price of regular, unleaded gas. The interval is given by $3.26 ± $0.06. Do the results contradict the logistics company’s claim? The 98% confidence interval in this case is: ($3.26 - $0.06, $3.26 + $0.06) = ($3.20, $3.32), which shows that we are 98% confident that the true average price per gallon of unleaded gas is between $3.20 and $3.32. Since $3.35 is outside of this interval, the researcher’s results do contradict the logistics company’s claim. On the other hand, if the company had claimed that the average price per gallon of unleaded gas was $3.31, this would have been included in the interval and the results would not have contradicted the company’s claim. The Factors Affecting the Margin of Error or Width of Confidence Intervals The Margin of Error determines how wide our confidence interval will be. We do not want wide confidence intervals because narrow intervals give us a better idea of where the mean lies on the number line. There are two ways to reduce the error in a confidence interval: (1) decrease the confidence level, or (2) increase the sample size. Increasing the sample size is not always possible because of
  • 10. costs and implementation considerations. The population standard deviation is given with the data and cannot be changed. 7.2 Sample Size Requirements for Estimating the Population Mean The Formula for Determining the Sample Size Needed to Estimate the Population Mean We can use the formula for the Margin of Error to derive a formula that will tell us the sample size needed to produce a confidence interval that has a particular margin of error and a desired confidence level. That formula is: \(n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}\). The Special Rounding Rule for Sample Size Calculations Remember that n represents the number of subjects that were measured or surveyed in our study, so it cannot be a decimal or a fraction. When a decimal occurs, we always want to round up because rounding up will produce less error in our confidence interval while rounding down would produce more error.
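The sample size formula and the round-up rule can be sketched as follows. This is a hedged illustration: `sample_size_for_mean` is a hypothetical helper name, and the standard-library `NormalDist` is assumed in place of a z-table. The inputs are taken from the stockbroker example that follows in the text (99% confidence, σ = 4.59, E = 0.50).

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(confidence_level, sigma, max_error):
    """n = (z_{alpha/2} * sigma / E)^2, always rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil((z * sigma / max_error) ** 2)


print(sample_size_for_mean(0.99, 4.59, 0.50))   # 560
```

Using `ceil` rather than `round` enforces the special rounding rule: any fractional result is pushed up to the next whole subject, so the margin of error never exceeds the target.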
  • 11. Determining the Sample Size Needed to Estimate the Population Mean A stockbroker on Wall Street wants to estimate the average daily-high price for a stock. What sample size is necessary to form a 99% confidence interval to estimate the mean daily-high within 0.50 dollars? Assume the population standard deviation is known to be 4.59 dollars. Here CL = 0.99, E = 0.50, σ = 4.59, so \(\alpha/2 = 0.005\) and \(z_{0.005} = 2.576\). So the sample size n is: \(n = \left(\frac{2.576 \times 4.59}{0.50}\right)^{2} \approx 559.21\), which becomes 560 by rounding up. Thus the sample size necessary to form a 99% confidence interval is 560. 7.3 Interval Estimate of the Population Mean with an Unknown Population Standard Deviation Estimating the Mean when the Population Standard Deviation is Unknown Often we do not know the population standard deviation (σ) when attempting to estimate the population mean using a confidence interval. When the population standard deviation is unknown, we must use the sample standard deviation as a substitute. However, since the sample standard deviation is not the same as the true population standard deviation, we cannot use the z distribution to construct the confidence interval. When the population standard deviation is unknown, we use the
  • 12. t distribution to form our interval estimate of the mean. Like the z distribution, the t distribution is a bell-shaped distribution, so we will still need to assume the sample mean has a normal (or approximately normal) distribution to use the t distribution. Estimating the Mean when the Population Standard Deviation is Unknown When do we use the t distribution to estimate the population mean? When the population standard deviation (σ) is unknown, and we can assume the distribution of the sample mean is approximately normally distributed. The Similarities Between the t and z Distribution It should be stated at the outset that there is not just one t distribution, but a family of t distributions. For every different sample size n, (degree of freedom), there is a slightly different corresponding t distribution. These infinitely many t distributions will be defined by their specific degrees of freedom (n – 1). The family of t distributions is similar to the standard normal (z) distribution in several important ways The most basic similarity between the t distributions and the standard normal distribution is the fact that they are continuous distributions. The shapes of the curves are also similar – both distributions are symmetric and mound-shaped (i.e., bell-shaped). The family of t distribution curves and the standard normal
  • 13. curve have the same mean – that mean is zero. The Similarities Between the t and z Distributions The diagram contains the graph of the standard normal distribution and the t distribution for a sample size of 12. The Differences that Exist Between the t and z Curves The family of t distributions and the standard normal distribution (z) are similar in three ways: (1) both are continuous distributions, (2) both are bell-shaped distributions, and (3) both have a mean of zero. The differences that exist between the curves all stem from the fact that they have different standard deviations. The standard deviation for the standard normal curve is 1, whereas for the family of t distributions, the standard deviation varies, but it is always greater than 1. For every different sample size n (and degree of freedom n – 1), there is a slightly different standard deviation for the corresponding t distribution. This is the only thing that differentiates the otherwise identical t curves from each other.
  • 14. The Differences that Exist Between the t and z Curves Since all probability distributions must have a total area of one, the different standard deviations affect the overall shape of the curves in a predictive way. Curves with greater variation (a higher standard deviation) will be flatter on top and more spread out. This means that there is more area in the tails and less in the center of the distribution. When the curve has less variation (a smaller standard deviation), it will have more data in the center and small tail areas. The Differences that Exist Between the t and z Curves The closer the standard deviation is to 1, the more the t distribution will look like the z distribution. For the family of t distributions, there is an inverse relationship between sample size (degrees of freedom) and standard deviation. As the sample size increases, the corresponding t distributions have smaller and smaller standard deviations. This implies that as n increases, the t distributions become more and more like the standard normal distribution. The Differences that Exist Between the t and z Curves
  • 15. In the diagram, the two t distributions are graphed along with the standard normal curve. In comparison to the standard normal curve, you can see that the two t curves are thicker (i.e., have more density) in the tails and have less area at the center. The smaller the sample size, the more pronounced these differences are. Confidence Interval for Estimating a Population Mean (with Sigma Unknown) The formula for the confidence interval to estimate the mean consists of two values: a lower limit and an upper limit. Confidence Interval for \(\mu\): \(\left(\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right)\) (Lower Limit, Upper Limit) where n = the sample size, s = the sample standard deviation, \(\bar{x}\) = the sample mean, and \(t_{\alpha/2}\) = the t score separating an area of \(\alpha/2\) in the upper tail of the t distribution with \(n - 1\) degrees of freedom.
  • 16. Confidence Interval for Estimating a Population Mean (with Sigma Unknown) The formula is often expressed as: \(\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}\). However, the most common way to express the formula for the confidence interval to estimate the mean is \(\bar{x} \pm E\), where \(E = t_{\alpha/2}\,\frac{s}{\sqrt{n}}\) is called the margin of error. Find Critical t Values Assuming the population is approximately normal and sigma is unknown, find the appropriate critical value for a 90% confidence interval with a sample size of 20. Since the population is approximately normal, the sample size is small, and sigma is unknown, we need to use the t distribution and hence calculate the t critical value. So here: CL = 0.90, n = 20, df = n – 1 = 19, and \(\alpha/2 = 0.05\). So the critical value is: \(t_{0.05} = 1.729\) (from the t-table on the next slide). Find Critical t Values: t Table
  • 17. Form the Margin of Error when Sigma is Unknown Find the margin of error for a 98% confidence interval estimate of the population mean when sigma is unknown. The sample size is 15. The sample standard deviation is 20.1, and the data appear to be normally distributed. Here: CL = 0.98; n = 15; df = n – 1 = 14; s = 20.1. So the critical value is: \(t_{0.01} = 2.624\). So the margin of error is: \(E = 2.624 \times \frac{20.1}{\sqrt{15}} \approx 13.62\). The Steps to Create a Confidence Interval (Sigma is Unknown) The following are the steps to create a confidence interval when sigma is unknown: (1) Gather the sample data for the problem, which will include n, \(\bar{x}\), s, and the confidence level. (2) Find the critical value \(t_{\alpha/2}\). (3) Calculate the margin of error (E). (4) Form the interval by subtracting the margin of error from the sample mean and adding the margin of error to the sample mean.
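The steps above can be sketched in code. Python's standard library has no t distribution, so this sketch assumes the critical value is read from a t-table and passed in as `t_crit`; both function names are illustrative and not taken from the slides.

```python
from math import sqrt


def t_margin_of_error(t_crit, s, n):
    """E = t_{alpha/2} * s / sqrt(n), with t_crit looked up for n - 1 degrees of freedom."""
    return t_crit * s / sqrt(n)


def t_confidence_interval(xbar, s, n, t_crit):
    """Lower and upper limits: sample mean minus and plus the margin of error."""
    e = t_margin_of_error(t_crit, s, n)
    return xbar - e, xbar + e


# Margin of error example: 98% confidence, n = 15 (df = 14), s = 20.1, table value 2.624
print(round(t_margin_of_error(2.624, 20.1, 15), 2))   # 13.62
```

Passing the table value in explicitly keeps the sketch dependency-free; a statistics package with an inverse t CDF could supply `t_crit` instead.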
  • 18. Construct a Confidence Interval when Sigma is Unknown A waiter wants to know the average amount of time it takes a table of guests in his section of the restaurant to “turn” (sit, order, eat, pay, and leave). He times a random selection of 25 tables over several busy nights. For those tables, the average time to turn was 42.1 minutes. The sample standard deviation was 4.7 minutes. Assume the turn times are normally distributed, and form a 90% confidence interval for the true mean time to turn a table in this waiter’s section of the restaurant. Since the population standard deviation is not given and the distribution of turn times is given to be normal, we use the t distribution. CL = 0.90; n = 25; df = n – 1 = 24; s = 4.7. So \(t_{0.05} = 1.711\), and the margin of error is \(E = 1.711 \times \frac{4.7}{\sqrt{25}} \approx 1.61\). So the 90% confidence interval for the true mean time to turn is given by \(42.1 \pm 1.61 = (40.49,\ 43.71)\). 7.4 Interval Estimate of the Population Proportion Population Proportion The term proportion refers to the fraction, ratio or percent of the population having a particular trait of interest. The symbols for population proportion and sample proportion are ρ (rho) and \(\hat{p}\) (p-hat) respectively.
  • 19. Examples of Population Proportions: In the United States of America, 16.7% of all babies born have blue eyes. In 2013, 31.7% of the U.S. population, aged 25 or older, held a bachelor’s degree or higher. 85% of 18 to 24 year olds, who were raised by at least one parent having a bachelor’s degree or higher, will attend college. The Sample Proportion To calculate the proportion of a sample that has some trait of interest, we divide the number of subjects (or items) that have the trait by the number of subjects (or items) belonging to the sample. Formula for the sample proportion (\(\hat{p}\)): \(\hat{p} = \frac{x}{n}\) where x = the number of subjects (or items) having the trait of interest and n = total number of subjects (or items) sampled. The Sample Proportion For example, consider the survey results below: Number of Survey Participants: 215; Number who earned an A in Business Statistics: 20. The proportion of students reporting that they earned an A in Business Statistics is given by: \(\hat{p} = \frac{20}{215} \approx 0.093\).
  • 20. The Sampling Distribution of the Sample Proportion Recall that if we randomly select n subjects and x of them have some trait we are interested in, the sample proportion formed from the data is: \(\hat{p} = \frac{x}{n}\) where x = the number of subjects having the trait we are interested in. We use \(\hat{p}\) as a point estimate of the population proportion (ρ). For different samples of size n, a different number (x) of subjects will have the trait of interest. This means the value of \(\hat{p}\) will vary from sample to sample. If we want to use it to form an interval estimate of the true population proportion (ρ), it is important that we know the sampling distribution of \(\hat{p}\). The Sampling Distribution of the Sample Proportion The sampling distribution of \(\hat{p}\) is approximately normally distributed. The expected value (mean) for \(\hat{p}\) is the population proportion (ρ). The standard error for \(\hat{p}\) is \(\sqrt{\frac{\rho(1-\rho)}{n}}\). \(\hat{p}\) can be assumed to be approximately normally distributed when both \(n\rho \ge 5\) and \(n(1-\rho) \ge 5\). We can approximate the standard error of \(\hat{p}\) as \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
  • 21. The Sample Size Requirement for Estimating the Population Proportion When constructing a confidence interval to estimate the population proportion, we can assume \(\hat{p}\) is approximately normally distributed if both \(n\rho \ge 5\) and \(n(1-\rho) \ge 5\). Example: A large corporation wants to estimate the proportion of its part-time employees that would enroll in the company health insurance plan, if it were made available to them. A survey of 500 randomly selected part-time employees reveals that 285 of them would enroll in the plan. In this example, the sample size is 500 and the number of employees interested in enrolling in the plan is 285. Using these quantities, we can calculate the sample proportion: \(\hat{p} = \frac{285}{500} = 0.57\). The Sample Size Requirement for Estimating the Population Proportion Using the sample proportion as an estimate for the population proportion (ρ), we can check the sample size requirement to ensure the sampling distribution of the sample proportion is approximately normal: \(n\hat{p} = 500(0.57) = 285\) and \(n(1-\hat{p}) = 500(0.43) = 215\). Since both of the results above are at least 5, it is appropriate to assume the sampling distribution of the sample proportion is approximately normally distributed.
  • 22. Formula to Calculate a Confidence Interval for the Population Proportion The formula to calculate the confidence interval for the population proportion (ρ) is given by \(\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) where \(\hat{p}\) is the sample proportion, n is the sample size, and \(z_{\alpha/2}\) is the critical value linked to the confidence level. Constructing a Confidence Interval for the Population Proportion An efficiency consultant studied a random selection of 200 e-mails received by company employees, to determine how many were relevant to the recipient. Only 36 of the e-mails were relevant to their recipients. Form a 95% confidence interval to estimate the true proportion of relevant e-mails received by a typical employee. Here n = 200, \(\hat{p} = 36/200 = 0.18\) (as only 36 out of the 200 e-mails were relevant), \(1 - \hat{p} = 0.82\), CL = 0.95, \(\alpha/2 = 0.025\). So \(z_{0.025} = 1.96\) (from the z-table on the following slide) and the margin of error is: \(E = 1.96\sqrt{\frac{0.18 \times 0.82}{200}} \approx 0.053\). So the 95% confidence interval estimate is \(0.18 \pm 0.053 = (0.127,\ 0.233)\). We are thus 95% confident that the true proportion of relevant e-mails received by a typical employee lies between 0.127 and 0.233.
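The proportion interval, together with the sample size check described earlier, can be sketched as below. This is an illustration under stated assumptions: the standard-library `NormalDist` replaces the z-table, the function name `proportion_confidence_interval` is invented for the sketch, and raising an error when the check fails is a design choice of the sketch rather than something the slides prescribe.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence_level):
    """Confidence interval for a population proportion from x successes in n trials."""
    p_hat = x / n
    # Normal approximation requirement: n*p_hat and n*(1 - p_hat) both at least 5
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


# E-mail example: 36 relevant e-mails out of 200, 95% confidence
lower, upper = proportion_confidence_interval(36, 200, 0.95)
print(round(lower, 3), round(upper, 3))   # 0.127 0.233
```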
  • 23. Question: Can we say that it seems that less than a quarter of e-mails are relevant? Yes, because the upper limit 0.233 < 0.25. Constructing a Confidence Interval for the Population Proportion Interpreting a Confidence Interval for the Population Proportion The CEO of a logistics company claims that only 5% of its holiday deliveries arrive late. A 98% confidence interval to estimate the proportion of late deliveries produced the following interval: 0.06 to 0.11. Does the interval contradict the CEO’s claim? According to the confidence interval estimate, the true proportion of late deliveries lies between 6% and 11% with 98% confidence. Since both these numbers are higher than the stated value of 5% (that is, the interval does not contain 5%), the CEO’s claim is contradicted. 7.5 Sample Size Requirements for Estimating the Population
  • 24. Proportion The Formula for Estimating Sample Size for the Population Proportion The sample size formula when estimating a population proportion is used to specify the sample size required to guarantee that your confidence interval has a certain margin of error and a certain confidence level. It is derived by taking the margin of error (E) from the confidence interval formula for estimating the population proportion and solving for n. The equation is: \(n = \frac{z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{E^{2}}\). Since \(\hat{p}\) is unknown, we substitute the value 0.5 in the equation because 0.5(1-0.5) is the maximum value of \(\hat{p}(1-\hat{p})\), so the value of n obtained with this value of \(\hat{p}\) will be guaranteed to be as large as it possibly needs to be to cover all possible scenarios. Calculate the Sample Size Needed to Estimate the Population Proportion A sales manager at a local car dealership wants to estimate the proportion of used car sales that include an extended warranty. What size sample would be needed to estimate the proportion of extended warranties sold with error of no more than 0.05 and a confidence level of 99%? Here E = 0.05, CL = 0.99, \(z_{0.005} = 2.576\), and we use \(\hat{p} = 0.5\). We use the formula:
  • 25. \(n = \frac{(2.576)^{2}(0.5)(0.5)}{(0.05)^{2}} \approx 663.58\). So n = 664, as we always “round up” in case of sample size determination. Module 6 point estimators and sampling distributions Master for Business Statistics Dane McGuckian Topics 6.1 Point Estimators and Sampling Distributions 6.2 The Central Limit Theorem 6.1 Point Estimators and Sampling Distributions Sampling Distributions The sampling distribution of a statistic is a probability distribution for all of the possible values of a sample statistic
  • 26. that can be derived from samples of a given size. Recall that a probability distribution provides all possible outcomes for an experiment and the probability associated with each of these outcomes. Example: If we took every possible random sample of 25 values from a population and calculated the sample mean for each sample, the resulting sampling distribution for the sample mean would provide all possible means that could result from a sample of 25 values drawn from this population along with the probability that each of those means occurs. Depending on the type of data involved, the sampling distribution can be represented in a table format, as a histogram, or as a formula. Sampling Distributions There are essentially three things we want to know about the sampling distribution for any sample statistic: What is the shape of the sampling distribution? Where is the center (the mean) of the sampling distribution? How much spread or dispersion (variation) does the sampling distribution have? Sampling Distributions Example: Imagine that we select 2 balls, with replacement, from a box containing two numbered balls and average the values that appear on the selected balls. One of the balls has the number 0 printed on it, and the other has the number 1 printed on it. In this scenario, what is the sampling distribution for the sample mean?
  • 27. Let’s begin by listing all of the possible outcomes for the two selections. The possible outcomes are: 00, 01, 10, and 11. Next, we can determine each of the possible means: 0, 0.5, 0.5, and 1, respectively. Sampling Distributions Because each ball has an equal probability of being chosen, each of the listed outcomes on the previous slide (00, 01, 10, and 11) has an equal chance of occurring (1/4 = 0.25). Consider the table below (Sample | \(\bar{x}\) | P(\(\bar{x}\))): 0,0 | 0 | 0.25; 0,1 | 0.5 | 0.25; 1,0 | 0.5 | 0.25; 1,1 | 1 | 0.25. Next, we will convert this table into a probability distribution for the sample mean. Sampling Distribution (\(\bar{x}\) | P(\(\bar{x}\))): 0 | 0.25; 0.5 | 0.5; 1 | 0.25. Sampling distribution of the Sample Mean
  • 28. Now that we have the probability distribution for the sample mean, we can use it to calculate the mean of the sample means and the standard deviation of the sample means: \(\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}) = 0(0.25) + 0.5(0.5) + 1(0.25) = 0.5\) and \(\sigma_{\bar{x}} = \sqrt{\sum (\bar{x} - \mu_{\bar{x}})^{2}\,P(\bar{x})} = \sqrt{0.125} \approx 0.354\). Point Estimators A point estimate is a statistic computed from a sample that is designed to estimate a population parameter. The preferred estimate for the population mean (\(\mu\)) is the sample mean (\(\bar{x}\)). So if we want to estimate the population mean, we would get some sample data and then we would determine the sample mean for the sample data, and that would be our point estimate. The Standard Error of an Estimator The Standard Error of an Estimator tells us how the estimator will vary from sample to sample. The estimator will not be the same for every sample, so the standard error helps us understand how consistent the estimator will be from sample to sample. Population mean: \(\mu\). Point Estimator: \(\bar{x}\). Standard Error: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\).
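The two-ball example from the sampling distribution slides can be reproduced by brute-force enumeration. The sketch below is illustrative only and uses nothing beyond the Python standard library; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

balls = [0, 1]

# Every ordered sample of size 2 drawn with replacement: (0,0), (0,1), (1,0), (1,1)
samples = list(product(balls, repeat=2))

# Count how often each sample mean occurs, then convert counts to probabilities
mean_counts = Counter(Fraction(sum(s), len(s)) for s in samples)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in mean_counts.items()}

# Mean and standard deviation of the sampling distribution of the sample mean
mu_xbar = sum(m * p for m, p in sampling_dist.items())
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in sampling_dist.items())
sigma_xbar = sqrt(var_xbar)

print(mu_xbar, var_xbar, round(sigma_xbar, 3))   # 1/2 1/8 0.354
```

The population of ball values has mean 0.5 and standard deviation 0.5, so the enumerated result agrees with the standard error formula: 0.5/√2 ≈ 0.354.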
  • 29. The Desired Traits of a Point Estimator Ideally, our point estimators should be unbiased estimators. Among unbiased estimators, we want the estimator with the minimum variance. If an unbiased estimator is available, it is preferred over biased estimators. Example: Estimator A is not unbiased because it misses almost always. Estimators B and C are unbiased. Estimator C has smaller variance than Estimator B, hence it is called the Minimum Variance Unbiased Estimator (MVUE). 6.2 The Central Limit Theorem The Central Limit Theorem The Central Limit Theorem states that for a sufficiently large sample of size n, taken from a population that is not normally distributed, the sample mean \(\bar{x}\) has an approximately normal probability distribution. In most cases, a sample size greater than thirty is large enough to assume that \(\bar{x}\) is approximately normal.
  • 30. The Central Limit Theorem The Central Limit Theorem describes the sampling distribution of the sample mean. If all samples of size n are selected from a population of measurements with mean, \(\mu\), and standard deviation, \(\sigma\), the distribution of the sample mean has the following mean and standard deviation (standard error): Mean of the sample means is: \(\mu_{\bar{x}} = \mu\). Standard error of the sample mean (the standard deviation of the distribution of sample means) is: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\). The Central Limit Theorem If the population of measurements is normally distributed, the distribution of the sample means will be normal regardless of the size (n) of the sample. However, if the population of measurements is not normally distributed, the distribution of the sample means will only be approximately normal when the sample size is suitably large. As a good rule of thumb, we will assume that any sample size larger than 30 is large enough to ensure the distribution for the sample means is approximately normal. This approximation will improve for larger values of n. The Central Limit Theorem Examples: The random variable X has a highly skewed distribution. If samples of size 5 are taken from the population of X values, the
  • 31. distribution of the sample means will not necessarily be normal; however, if samples of size 35 are taken from the population, we can assume the distribution of the sample means will be approximately normal. The random variable X has a normal distribution. If samples of size 2 are taken from the population of X values, the distribution of the sample means will be normal because the distribution of the sample means is normal at any sample size when X is normal. The Mean of the Sample Mean When discussing the Central Limit Theorem, which describes the sampling distribution of the sample mean, we stated that the mean of the sample means for all samples of size n is always equal to the population mean. In other words, if all samples of size n are selected from a population of measurements with mean \(\mu\), the mean of the sample means is \(\mu_{\bar{x}} = \mu\). Example: If the true average IQ score for a population is 100 and every possible sample of size 15 is taken from the population, the sample means calculated from each of those samples will have an average equal to 100 because that is the mean for the population. For any particular sample size, the mean of all of the sample means is equal to the population mean. The Mean of the Sample Mean The Sample Mean IQ Scores for all Possible Groups of 15
  • 32. People: (this is a partial list because the actual list would be very long) 105, 98, 97, 111, 95, 91, 94, 100, 113, 101, 95, … To understand the idea discussed on the previous slide, imagine that for each sample of 15 individuals selected we calculate an average IQ score. These sample means will be recorded (perhaps in a list like the one illustrated above), and once we have calculated a sample mean from every possible sample of 15 people, we will then average all of those sample means in our list. The result will be the population mean IQ score, which in this case is 100. The Standard Error of the Mean When we introduced the Central Limit Theorem, which describes the sampling distribution of the sample mean, we discussed the standard error of the mean (the standard deviation of the sample means) for all samples of size n. In that discussion we stated that if all samples of size n are selected from a population of measurements with standard deviation \(\sigma\), the standard error of the mean is \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\). Example: If the true standard deviation for IQ scores for a population is 15 and every possible sample of size 9 is taken from the population, the sample means calculated from each of those samples will have a standard error that is equal to \(\frac{15}{\sqrt{9}} = 5\). For any particular sample size, the standard error for the mean is equal to the population standard deviation divided by the square root of the sample size.
  • 33. The Standard Error of the Mean This definition of the standard error for the mean assumes that the sampling is done with replacement or that the population we are sampling from is infinite. Sampling with replacement implies that a value that has been selected during the sampling procedure is available to be selected again and again in the same sample. In the extreme case, this sampling procedure could produce a sample of n measurements which consists entirely of one value repeated n times. This underlying assumption is a concern, because typically we do not take samples from infinite populations, and typically, we do not sample with replacement. For example, if we are conducting a study on human height by measuring 10 randomly selected people, we probably would not want our sample to consist of one person’s height repeated 10 times. The Standard Error of the Mean Fortunately, we can modify the standard error formula to accommodate the finite population case pretty easily. To find the standard error of the mean when sampling from a finite population, we use a multiplier often referred to as the finite population correction factor: \(\sqrt{\frac{N-n}{N-1}}\), where N is the size of our population and we are selecting a sample of size n. If we are taking a sample of size n, without replacement, from a finite population of size N, the standard error for the mean becomes: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}\)
  • 34. The Standard Error of the Mean The formula for the standard error of the mean when sampling from a finite population only differs from our previous formula by the finite population correction factor, and often we can ignore this difference. When sampling from a large finite population without replacement, it is acceptable to use the original formula we provided as an approximation to the standard error of the mean. How large does our finite population have to be to use this approximation? Typically, if our sample size is not more than 5 percent of the population, we can use the population standard deviation divided by the square root of the sample size to approximate our standard error for the mean. For the exercises included in this course, we have assumed that you will not be using the finite population correction factor. This means you can safely use the formula provided on the first slide when you are asked to determine the standard error for the mean. The Variation in the Sample Means If all samples of size n are selected from a population of measurements with standard deviation \(\sigma\), the standard error of the mean is \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\). Because the standard error of the mean is equal to the standard deviation, \(\sigma\), divided by the square root of the sample size, the standard error for the mean is always less than the standard deviation for the random variable whenever the sample size is greater than one.
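To make the 5 percent guideline concrete, here is a minimal sketch of the standard error with and without the finite population correction factor. The function names and the two population sizes (50 and 1,000) are assumptions made for illustration; only the formulas come from the slides.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    If a finite population size N is given, the finite population
    correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

def can_ignore_fpc(n, N):
    """Rule of thumb from the slide: sample is at most 5% of the population."""
    return n / N <= 0.05

sigma, n = 15, 9          # values from the IQ example
for N in (50, 1000):      # hypothetical population sizes
    print(
        f"N={N}: ignore FPC? {can_ignore_fpc(n, N)}, "
        f"uncorrected={standard_error(sigma, n):.4f}, "
        f"corrected={standard_error(sigma, n, N):.4f}"
    )
```

When the sample is a small share of the population, the corrected and uncorrected values are nearly identical, which is why the slides allow the simpler formula.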
  • 35. The Variation in the Sample Means This implies that a set of sample means from a population will also exhibit less variation than the random variable for that same population. For example, if the sample size for the sample means is 4, the standard deviation for the sample means will be half as large as the standard deviation (\(\sigma\)) for the random variable. This means the distribution for the sample means is more clustered around the mean for the population than the distribution for the random variable is. This is a useful trait because it implies that as the sample size increases, our sample means will tend to move closer and closer to the true population mean. The Variation in the Sample Means Example: An investor has two sets of data involving the closing stock price for a company in the NASDAQ. One set of data contains the closing stock prices for a random selection of 12 days taken over the course of the year, and the other set of data contains 12 averages obtained from random samples of 4 days of closing prices taken over the same year. Which data exhibits a larger amount of variation?
  • 36. The Variation in the Sample Means Closing prices of 12 randomly chosen days (sample standard deviation s = $85.24): Average closing prices for 12 samples of n = 4 days (sample standard deviation s = $38.89): It is clear that the set of averages has far less dispersion than the set of individual observations. If we had every sample mean possible for all samples of four (selected with replacement), the standard deviation of these sample means would be \(\frac{\sigma}{\sqrt{4}} = \frac{\sigma}{2}\), where \(\sigma\) is the standard deviation for the daily closing prices. The Central Limit Theorem and Calculating Probabilities for the Sample Mean A software company’s average daily stock price last year was $38.12. The standard deviation for those prices was $2.45. If a random selection of 32 days were chosen from last year, what is the probability that the average price of the company’s stock for those 32 days is more than $37.00? By the Central Limit Theorem, since the sample size (32) is greater than 30, the distribution of the average stock prices is approximately normal. So \(\mu_{\bar{x}} = 38.12\) and \(\sigma_{\bar{x}} = \frac{2.45}{\sqrt{32}} \approx 0.4331\). The z-score for $37 is: \(z = \frac{37.00 - 38.12}{0.4331} \approx -2.59\). So \(P(\bar{x} > 37.00) = P(z > -2.59) = 0.5 + 0.4952 = 0.9952\).
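The stock price calculation can be reproduced without a z table. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the variable names are illustrative. Because it does not round z to two decimals, the result may differ from a table-based answer in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 38.12, 2.45, 32   # values from the stock price example

# Standard error of the mean for samples of 32 days.
se = sigma / sqrt(n)

# z-score of a sample mean of $37.00.
z = (37.00 - mu) / se

# The sample mean is approximately normal with mean mu and standard deviation se.
sampling_dist = NormalDist(mu, se)
p_more_than_37 = 1 - sampling_dist.cdf(37.00)

print(f"standard error = {se:.4f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 37.00) = {p_more_than_37:.4f}")
```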
  • 37. Module 5 continuous random variables Master for Business Statistics Dane McGuckian Topics 5.1 Continuous Random Variables 5.2 The Normal Distribution 5.3 Applications of the Normal Distribution 5.4 Normal Approximation to the Binomial Distribution 5.1 Continuous Random Variables Continuous Random Variables Continuous random variables usually result from measuring something like a distance, a weight, a length of time, a volume, or some other similar quantity.
  • 38. Because they can take on any value inside a particular interval, there are an infinite, uncountable number of possible values for any continuous random variable. For this reason, when working with a continuous random variable, we will discuss the probability that the random variable is within some specified range. Continuous Random Variables Example: A fast-food restaurant manager tracks the length of time his customers wait for their orders. The random variable is continuous because it consists of measured lengths of time. We could consider the probability that a customer waits more than five minutes for his or her order, less than five minutes, between four and five minutes, or some other suitable interval of time. But the probability that a customer waits exactly five minutes for the order is zero. A consequence of continuous random variables having an infinite, uncountable set of possible values is that the probability of any continuous random variable equaling a specific value is always zero. Continuous Random Variables The probability distribution for a continuous random variable is usually represented by a function called the probability density function (pdf). These functions produce smooth curves when graphed, and probability for the random variable is defined as the area under the curve between any two specified points.
  • 39. Discrete versus Continuous Random Variables It is common to discuss the probability that a discrete random variable takes on a specific value, but because a continuous random variable has an infinite number of possible values in a particular range, we do not typically discuss the probability that a continuous random variable takes on a specific value. (Figure: a continuous probability distribution shown beside a discrete probability distribution.) The Area under a Continuous Probability Function In a continuous probability distribution, the probability that a value, x, is between two numbers is represented by an area, A. The area under the entire curve, from negative infinity to positive infinity, represents the probability of all possible values. Because the total probability for all continuous probability distributions is one, the area under the curve must also be one.
  • 40. Continuous Uniform Distribution A continuous random variable has a uniform distribution if the graph of its probability distribution is rectangular in shape and can be completely defined by its minimum and maximum values. Like all continuous distributions, the total area under the graph of a uniform distribution is equal to one, and there is a direct relationship between the area under the curve between two specific points and the probability of the random variable assuming a value between those two points. The mean for the uniform distribution is \(\mu = \frac{a+b}{2}\). Continuous Uniform Distribution The standard deviation for the uniform distribution is \(\sigma = \frac{b-a}{\sqrt{12}}\), where: \(a\) is the minimum value for the distribution and \(b\) is the maximum value for the distribution. For any uniform distribution, there is a uniform height to its curve. The height of the uniform distribution for any value \(x\) such that \(a \le x \le b\)
  • 41. is given by \(\frac{1}{b-a}\). For any value outside of the interval \([a, b]\) the height of the curve is zero. Continuous Uniform Distribution Since the shape of the uniform curve is rectangular and probability corresponds to area under the curve, the probability that \(c \le x \le d\) for a uniformly distributed random variable defined by the interval \([a, b]\) is \(\frac{d-c}{b-a}\) when \(a \le c \le d \le b\). The Probability of a Uniform Distribution For a uniform distribution defined on the interval \([a, b]\), \(P(c \le x \le d) = \frac{d-c}{b-a}\) when \(a \le c \le d \le b\). Probabilities for the Uniform Distribution The amount of time it takes an accountant to prepare tax returns for her clients is uniformly distributed over the interval between
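  • 42. 15 minutes and 60 minutes. What is the probability that she will finish a tax return in 40 minutes or less? Here a = c = 15; b = 60, d = 40. So \(P(15 \le x \le 40) = \frac{40-15}{60-15} = \frac{25}{45} \approx 0.556\). Thus there is a 55.6% probability that she will finish a tax return in 40 minutes or less. 5.2 The Normal Distribution The Normal Distribution The normal distribution is a continuous distribution that appears in many applications. Many natural phenomena can be modeled using the normal probability density function. The formula for the normal curve is \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\), where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the distribution. Note: \(\pi \approx 3.14159\) and \(e \approx 2.71828\). The Normal Distribution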
  • 43. Notice that the formula for the normal distribution contains the symbols \(\mu\) and \(\sigma\), which represent the mean and the standard deviation respectively. The mean determines the location of the curve on the number line, and the standard deviation determines the width or spread of the distribution. The values of these parameters depend on the population being studied. For this reason, the formula on the previous slide actually represents a family of normal distributions, not just one curve. Example: The heights of men have a normal distribution, and men have a mean height of 69 inches. The heights of women are also normally distributed, but the mean for women’s heights is 64 inches. The two curves also have different standard deviations. The Normal Distribution In the illustration below, the two normal curves have different means and different standard deviations. The difference in the shape of the two curves is a result of the curves having different standard deviations. The difference in their position on the number line is due to their different means. The taller and narrower curve belongs to the distribution with the smaller standard deviation. The Shape of the Normal Distribution
  • 44. The graph of the normal probability density function is bell-shaped, but there is not just one normal curve. There is no limit to the different possible combinations of \(\mu\) and \(\sigma\), so there are an infinite number of different normal curves. The particular scale and location of a normal distribution will depend on the distribution’s specific mean and standard deviation. However, all normal curves are bell-shaped, and they are always perfectly symmetric around their mean. The Shape of the Normal Distribution The graph of the normal probability density function is bell-shaped and perfectly symmetric around its mean. This indicates that the left side of the normal distribution is a perfect mirror image of the right side of the distribution. Since the total area under all normal curves is 1.00 and all normal curves are symmetric around their mean, half of the area (0.50) is below the mean, and half of the area (0.50) is above the mean. This is very useful information. Example: If women’s heights are normally distributed and the average height for women is 64 inches, we can say that half of all women are shorter than 64 inches. Of course, that also implies half of all women are taller than 64 inches. The Normal Distribution
  • 45. There is not just one normal curve but there are an unlimited number of normal curves. Example: Human height is normally distributed, but the heights of men and women form different normal distributions. IQ scores are also normally distributed, but those scores form a different normal curve than the ones formed by male and female heights. The list of examples is endless, so when we speak of the normal distribution, we are referring to a family of curves that have the same underlying structure. The mean and standard deviation, \(\mu\) and \(\sigma\), allow a single probability density function to produce a family of normal distributions. Converting Normal Random Variables into Standard Normal Random Variables When working with normal random variables, we have a need to find areas or probabilities, but the probability density function for the normal distribution is mathematically difficult to work with. For this reason, when solving problems involving a normally distributed random variable, it would be very helpful to have a table of probabilities for the normal curve. However, there isn’t just one normal curve to tabulate probabilities for. Because there is no limit to the different possible combinations of \(\mu\) and \(\sigma\), there are an infinite number of different normal curves. Therefore, we would need an infinite number of normal probability tables to handle every possible application of the normal curve. Fortunately, there is a way to work around this difficulty.
  • 46. Converting Normal Random Variables into Standard Normal Random Variables It is possible to convert any normal random variable with mean (\(\mu\)) and standard deviation (\(\sigma\)) into a standard normal random variable. A standard normal random variable is a normally distributed random variable that has a mean equal to zero (\(\mu = 0\)) and a standard deviation equal to one (\(\sigma = 1\)). To convert a normal random variable (\(x\)) into a standard normal random variable (\(z\)), we use the following formula: \(z = \frac{x - \mu}{\sigma}\), where \(x\) is the value of a measurement (or observation) taken from a normally distributed population, \(\mu\) is the mean of the distribution for \(x\), \(\sigma\) is the standard deviation of the distribution for \(x\), and \(z\) is the standard normal value. Parameters of the Standard Normal Distribution A standard normal random variable is a normally distributed random variable that has a mean equal to zero (\(\mu = 0\)) and a standard deviation equal to one (\(\sigma = 1\)). Because the standard normal distribution has a mean of zero and
  • 47. a standard deviation of one, \(z\) values equate to the number of standard deviations above (or below) the mean. Example: A standard normal value of 1 is the same as one standard deviation above average. By using a standard normal probability table, it is possible to find the probability that a standard normal value falls between any two points on the \(z\)-axis. Z tables and Finding the Areas Under the Standard Normal Curve Between the Mean and a Value Use the standard normal curve to find \(P(-1.58 < z < 0)\). \(P(-1.58 < z < 0) = P(0 < z < 1.58)\), because the normal curve is symmetric. \(P(0 < z < 1.58) = 0.4429\) from the table.
  • 48. Areas Under the Standard Normal Curve Inside an Interval Surrounding the Mean Use the standard normal curve to find because of symmetry , from the normal table = 0.7597 Areas Under the Standard Normal Curve between a Positive Z Value and Infinity Use the standard normal curve to find , as the total area to the left of the mean (0) is 0.5 , from the table
  • 49. Areas Under the Standard Normal Curve between Two Values on the Same Side of the Mean Use the standard normal curve to find Areas Under the Standard Normal Curve between a Negative Z Value and Infinity Use the standard normal curve to find
  • 50. , as the total area to the left of the mean (0) is 0.50. , from the table 5.3 Applications of the Normal Distribution The Probability that a Non-Standard Normal Random Variable is Greater than an Above-Average Value The time it takes a computer chip manufacturer to produce a single chip is normally distributed with a mean of 18.0 seconds and a standard deviation of 1.2 seconds. Find the probability that a chip will take longer than 19.8 seconds to produce.
  • 51. The Probability that a Non-Standard Normal Random Variable is Less than an Above-Average Value A large investment bank in Miami released a report on the starting salary offers it made to MBA graduates. The salaries are normally distributed with a mean of $89,200 and a standard deviation of $2,100. Find the probability that a randomly selected MBA graduate was offered a starting salary of less than $92,000. The Probability that a Non-Standard Normal Random Variable is Between Two Values that Surround the Mean The containers on a mega-cargo ship in the port of Los Angeles have weights that are normally distributed with a mean of 55,600 pounds and a standard deviation of 2,800 pounds. What is the probability that a randomly selected container from the ship weighs between 53,123 pounds and 60,123 pounds?
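The chip, salary, and container problems above all follow the same pattern: convert each value to a z-score, then read an area from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later) in place of a printed z table; the helper name `z_score` is an assumption for illustration, and the results may differ slightly from table-based answers that round z to two decimals.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Chip production time: P(x > 19.8) with mean 18.0 and standard deviation 1.2.
chips = NormalDist(18.0, 1.2)
p_chip = 1 - chips.cdf(19.8)

# MBA salary offers: P(x < 92,000) with mean 89,200 and standard deviation 2,100.
salaries = NormalDist(89_200, 2_100)
p_salary = salaries.cdf(92_000)

# Container weights: P(53,123 < x < 60,123) with mean 55,600 and standard deviation 2,800.
containers = NormalDist(55_600, 2_800)
p_container = containers.cdf(60_123) - containers.cdf(53_123)

print(f"chip:      z = {z_score(19.8, 18.0, 1.2):.2f}, P = {p_chip:.4f}")
print(f"salary:    z = {z_score(92_000, 89_200, 2_100):.2f}, P = {p_salary:.4f}")
print(f"container: z = {z_score(53_123, 55_600, 2_800):.2f} to "
      f"{z_score(60_123, 55_600, 2_800):.2f}, P = {p_container:.4f}")
```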
  • 52. The Probability that a Non-Standard Normal Random Variable is Between Two Values that are on the Same Side of the Curve A manufacturer produces gears for use in an engine’s transmission that have a mean diameter of 10.00mm and a standard deviation of 0.03mm. The lengths of these diameters have a normal distribution. Find the probability that a randomly selected gear has a diameter between 9.94mm and 9.96mm. The Value Corresponding to an Upper Percentile of the Normal Distribution A company in California is concerned about the length of time that its employees spend commuting to work. The one-way commute times for its employees are normally distributed with a mean of 32.1 minutes and a standard deviation of 5.3 minutes. What is the commute time that separates the longest 20% of commutes from the rest?
  • 53. Here we will work in the “reverse” direction: from the percentage, to \(z\) (from the table), to \(x\) (using the formula). The z-value such that \(P(0 < z < z_0) = 0.30\) is 0.84 (the closest probability in the table being 0.2995). Now, \(x = \mu + z\sigma = 32.1 + 0.84(5.3) \approx 36.6\). So 36.6 minutes is the commute time that separates the longest 20% of commutes from the rest. The Value Corresponding to a Lower Percentile of the Normal Distribution A financial services company gives an analytical reasoning test to all job applicants. The completion times for the test are normally distributed with a mean of 50.40 minutes and a standard deviation of 3.10 minutes. What completion time separates the fastest 6% of applicants from the others? We again work in the “reverse” direction. Here the z-value such that \(P(z_0 < z < 0) = 0.44\) is -1.555 (closest probability value is 0.4394). Now \(x = \mu + z\sigma = 50.40 + (-1.555)(3.10) \approx 45.58\). So 45.58 minutes is the completion time that separates the fastest 6% of applicants from the others.
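The "reverse" lookup can also be done with an inverse cumulative distribution function instead of scanning a table. This is a sketch using `statistics.NormalDist.inv_cdf` from the Python standard library (3.8 or later); the variable names are illustrative. Note that `inv_cdf` takes the area to the left of the cutoff, so the longest 20% of commutes corresponds to a left-tail area of 0.80.

```python
from statistics import NormalDist

# Commute times: mean 32.1 minutes, standard deviation 5.3 minutes.
commutes = NormalDist(32.1, 5.3)
# The longest 20% lie above the 80th percentile.
commute_cutoff = commutes.inv_cdf(0.80)

# Test completion times: mean 50.40 minutes, standard deviation 3.10 minutes.
completion = NormalDist(50.40, 3.10)
# The fastest 6% lie below the 6th percentile.
completion_cutoff = completion.inv_cdf(0.06)

# The corresponding z-values, for comparison with the table lookups.
z_upper = NormalDist().inv_cdf(0.80)
z_lower = NormalDist().inv_cdf(0.06)

print(f"z = {z_upper:.3f}, commute cutoff = {commute_cutoff:.1f} minutes")
print(f"z = {z_lower:.3f}, completion cutoff = {completion_cutoff:.2f} minutes")
```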
  • 54. 5.4 Normal Approximation to the Binomial Distribution Using the Normal Distribution to Approximate the Binomial Distribution When using the normal curve to estimate a binomial probability distribution, we must check two things to confirm the fit is reasonably good: If either of these is not true, we need to find a different method of approximation. The Use of the Continuity Correction Factor Continuity correction is used when using the normal approximation to binomial probability. Example: The rectangle for x=2 actually goes from 1.5 to 2.5 on the normal distribution. Therefore we need to add or subtract that extra 0.5 when we are looking at the probability that x is less than or greater than 2.
  • 55. Know the Reason for the Use of the Continuity Correction Factor A marketing firm for the movie industry reports that the average film is 128 minutes with a standard deviation of 15 minutes. Assuming these film durations have a bell-shaped distribution, what percent of films have a duration between 158 minutes and 173 minutes? Because 158 minutes is two standard deviations above the mean and 173 minutes is three standard deviations above the mean, the required area (marked in red) is given by: 49.85% - 47.5% = 2.35% Thus, 2.35% of films have a duration between 158 minutes and 173 minutes. The Use of the Continuity Correction Factor Based on prior experience, a car dealership has a 45% chance of selling an extended warranty with each used car that is sold. We want to use the normal approximation to the binomial distribution to find the probability of selling 25 or fewer extended warranties when 60 cars are sold. Using continuity correction, state the appropriate probability that will need to be found on the normal curve. Here X is the number of warranties to be sold, so X = 25 or fewer. n = 60, p = 0.45, and 1-p = q = 0.55. So, \(P(x < 25.5)\) on the bell-shaped (normal) curve is the probability that will need to be found.
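To see what the continuity correction buys, the warranty example can be computed two ways: exactly from the binomial formula, and approximately from the normal curve with the extra 0.5. This is a sketch using only the Python standard library (`math.comb` and `statistics.NormalDist`, both available from Python 3.8); the variable names are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 60, 0.45      # cars sold and chance of a warranty sale
q = 1 - p

# Exact binomial probability of 25 or fewer warranties.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(0, 26))

# Normal approximation: mean np, standard deviation sqrt(npq).
mean = n * p
sd = sqrt(n * p * q)
approx_curve = NormalDist(mean, sd)

# With the continuity correction, "25 or fewer" becomes x < 25.5.
with_correction = approx_curve.cdf(25.5)
# Without it, the area stops at 25 and misses half of the bar for x = 25.
without_correction = approx_curve.cdf(25.0)

print(f"exact binomial:        {exact:.4f}")
print(f"normal, corrected:     {with_correction:.4f}")
print(f"normal, not corrected: {without_correction:.4f}")
```

Comparing the three printed values shows how much closer the corrected area is to the exact binomial probability.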
  • 56. The Normal Distribution and the Probability that a Binomial Random Variable is Greater than a Value Thirty percent of visitors to a local toy retailer will make a purchase before exiting the store. Use the normal approximation for binomial probability to determine the probability that more than 50 visitors out of 200 will make a purchase. Here X is the number of visitors who make a purchase, so X = more than 50. Also n = 200, p = 0.30, and 1-p = q = 0.70. It is a binomial distribution because a customer will either make a purchase or not. Mean = \(np\) = 200(0.30) = 60 and standard deviation = \(\sqrt{npq} = \sqrt{200(0.30)(0.70)} \approx 6.48\). Using the continuity correction factor, we have to find \(P(x > 50.5)\) because the problem states “more than 50”. The z-score is: \(z = \frac{50.5 - 60}{6.48} \approx -1.47\). So, \(P(z > -1.47) = 0.5 + 0.4292 = 0.9292\) (from table). The Normal Distribution and the Probability that a Binomial Random Variable is Less than a Value A small regional airline overbooks its flights because historically only 90% of the reservations will actually show up for the flight. If a flight has 100 available seats, the airline will typically sell 110 reservations for the flight. What is the probability that at most 95 people show up for a flight with 110 reservations? Here X is the number of people who show up so X = at most 95. Also n = 110, p = 0.90, and 1-p = q = 0.10. It is a binomial distribution because a person will either show up for the flight
  • 57. or not. Mean = \(np\) = 110(0.90) = 99 and standard deviation = \(\sqrt{npq} = \sqrt{110(0.90)(0.10)} \approx 3.15\). Using the continuity correction factor, we have to find \(P(x < 95.5)\) because the problem states “at most 95”. The z-score is: \(z = \frac{95.5 - 99}{3.15} \approx -1.11\). So, \(P(z < -1.11) = 0.5 - 0.3665 = 0.1335\) (from table). The Normal Distribution and the Probability that a Binomial Random Variable is Between Two Values A small regional airline overbooks its flights because historically only 90% of the reservations will actually show up for the flight. If a flight has 100 available seats, the airline will typically sell 110 reservations for the flight. Use the normal approximation for binomial probability to determine the probability that between 100 and 107 people (inclusive) show up for a flight with 110 reservations. Here X is the number of people who show up so X = between 100 and 107 or [100,107]. Also n = 110, p = 0.90, and 1-p = q = 0.10. Mean = \(np\) = 110(0.90) = 99 and standard deviation = \(\sqrt{npq} \approx 3.15\). Using the continuity correction factor, we have to find \(P(99.5 < x < 107.5)\) because 100 and 107 are both included. The z-scores are: \(z_1 = \frac{99.5 - 99}{3.15} \approx 0.16\) and \(z_2 = \frac{107.5 - 99}{3.15} \approx 2.70\). So, \(P(0.16 < z < 2.70) = 0.4965 - 0.0636 = 0.4329\) (from table). Module 4
  • 58. Discrete Random Variables Master for Business Statistics Dane McGuckian Topics 4.1 Probability Distributions for Discrete Random Variables 4.2 Expected Value, Variance, and Standard Deviation for Discrete Random Variables 4.3 The Binomial Probability Distribution 4.4 The Poisson Probability Distribution 4.1 Probability Distributions for Discrete Random Variables Discrete Random Variable A discrete random variable is a variable that can only assume a countable number of values. The achievable values of a discrete random variable are separated by gaps. Example: a publisher may sell 300,000 or 300,001 copies of its latest book, but it cannot sell 300,000.159 copies of its latest book. Discrete random variables contain observations that are not measured on a continuous scale. Most often a discrete random variable contains observations that are derived from counting something.
  • 59. Discrete Random Variables Examples of Discrete Random Variables: the number of clicks received by an online advertisement over the past hour; the number of books sold by an author yesterday; the number of people missing the most recent flight from Miami to London; the number of parking violations last semester on campus. Discrete Probability Distributions The probability distribution of a discrete random variable lists all of the possible outcomes for the random variable and the associated probability for each of those outcomes. The distribution can be represented by a table, a graph or a formula.
      Number of Female Jurors, X    Probability of Outcome, P(X)
      0                             0.008
      1                             0.061
      2                             0.186
      3                             0.303
      4                             0.278
      5                             0.136
      6                             0.028
    Characteristics of a Probability Distribution A probability distribution lists all possible outcomes for the experiment and the corresponding probability for each of those outcomes. Remember that the probabilities cannot be negative. Each probability must lie between zero and one. The sum of the probabilities for all of the outcomes must be one.
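The characteristics listed above translate directly into a small check. The sketch below stores the female jurors distribution as a dictionary and tests the two conditions; the function name `is_valid_distribution` and the tolerance are assumptions for illustration (a tolerance is needed because decimal probabilities rarely sum to exactly 1 in floating-point arithmetic).

```python
from math import isclose

def is_valid_distribution(dist, tol=1e-9):
    """Check that every probability is in [0, 1] and that they sum to one."""
    probs = list(dist.values())
    each_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

# Number of female jurors (X) and its probability, from the slide's table.
jurors = {0: 0.008, 1: 0.061, 2: 0.186, 3: 0.303, 4: 0.278, 5: 0.136, 6: 0.028}

# A made-up counterexample whose probabilities sum to more than one.
not_a_distribution = {0: 0.5, 1: 0.6}

print(is_valid_distribution(jurors))
print(is_valid_distribution(not_a_distribution))
```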
  • 60. Example: Probability distribution of the number of free throws made (X) in two attempts by basketball players who make free throws 80% of the time.
      X    P(X)
      0    0.04
      1    0.32
      2    0.64
    For instance, there is a 4% chance that a player misses both throws. All these probabilities are non-negative. Each probability lies between 0 and 1. The sum of these probabilities is: 0.04 + 0.32 + 0.64 = 1. 4.2 Expected Value, Variance, and Standard Deviation for Discrete Random Variables The Mean of a Discrete Probability Distribution The average value for a probability distribution is referred to as the expected value of the probability distribution. It represents the long-run typical value for the random variable. If it were possible to run the trials indefinitely, the expected value would be the mean for the infinite set of outcomes for the random variable that would result from those trials. The Expected Value The expected value is essentially a weighted average of the possible outcomes for the random variable. The weights are the corresponding probabilities for those outcomes. Just as with the arithmetic mean we studied earlier, it is common for the expected value to be a decimal or a fraction even when the
  • 61. original set of outcomes must be whole numbers. The Expected Value of a Discrete Random Variable How much money on average will an insurance company make off of a 1-year life insurance policy worth $50,000, if they charge $1000.00 for the policy and each policy holder has a 0.9999 probability of surviving the year? Average implies mean, and that means “expected value” in the context of a probability distribution. The formula is: \(E(x) = \mu = \sum x \cdot P(x)\). If a person lives, the company makes $1000 (hence it is “positive”); if the person dies, it pays the family $50,000 (but also gets $1000 from the family), so their loss is $50,000 - $1000 = $49,000.
      Event    X          P(X)                   x·P(x)
      Lives    +1,000     0.9999                 999.90
      Dies     -49,000    1 – 0.9999 = 0.0001    -4.90
      Total               1.0000                 995.00
    \(E(x) = \$995.00\) The Expected Value of a Discrete Random Variable A life insurance policy sells for $200, and should the person pass away before the end of the year, the family gets a check for $10,000 from the company. The company can expect their profits divided by the number of
  • 62. policies sold (profit per policy) to be approximately $190. The expected value is the long-run average after many, many trials, so while the company’s average profit is unlikely to ever be exactly $190, the more policies they sell, the closer and closer the company’s profit per policy will be to $190.
      Event    X         P(X)      x·P(x)
      Dies     -9,800    0.001     -9.80
      Lives    +200      0.999     +199.80
      Total              1.0000    $190.00
    \(E(x) = \$190.00\) Using Expected Value to Distinguish between Two Possible Courses of Action A bank can either risk $20,000 on a currency investment that has a 51% chance of earning them $40,000 in profit, or they can risk $700,000 on a bond investment that has a 98% chance of earning them $40,000. In the long run, which strategy will yield the most profit? Currency: \(E(x) = 40{,}000(0.51) + (-20{,}000)(0.49) = \$10{,}600\). Bond: \(E(x) = 40{,}000(0.98) + (-700{,}000)(0.02) = \$25{,}200\). Average profit from the currency investment is $10,600 and that from the bond investment is $25,200. Thus the bond yields higher average profit, and is the better choice.
      Currency    x           P(x)    x·P(x)
      Profit      +40,000     0.51    20,400
      Loss        -20,000     0.49    -9,800
      Total                   1.00    $10,600
      Bond        x           P(x)    x·P(x)
      Profit      +40,000     0.98    39,200
      Loss        -700,000    0.02    -14,000
      Total                   1.00    $25,200
    The Variance and Standard Deviation of a Discrete Random
  • 63. Variable The mean value for a discrete probability distribution provides the typical value for the random variable. In other words, the mean tells us what we can expect to happen on average over the long run, but if we want to know how varied the outcomes for the random variable will be, we can calculate the variance or standard deviation for the random variable. Variance for a probability distribution: \(\sigma^2 = \sum \left[x^2 \cdot P(x)\right] - \mu^2\). The standard deviation for the random variable is found by taking the square root of the variance of the random variable. Standard Deviation for a probability distribution: \(\sigma = \sqrt{\sum \left[x^2 \cdot P(x)\right] - \mu^2}\). The Variance and Standard Deviation of a Discrete Random Variable Calculate the standard deviation of the probability distribution shown here (round to the thousandths place):
      x        P(x)    x·P(x)    x²·P(x)
      0        0.15    0         0
      1        0.48    0.48      0.48
      2        0.25    0.50      1.00
      3        0.12    0.36      1.08
      Total            1.34      2.56
    \(\mu = \sum x \cdot P(x) = 1.34\) and \(\sum x^2 \cdot P(x) = 2.56\), so \(\sigma = \sqrt{2.56 - 1.34^2} = \sqrt{0.7644} \approx 0.874\).
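The table-based calculation above can be mirrored in a few lines. This is a minimal sketch, assuming the distribution on the slide (x = 0, 1, 2, 3 with probabilities 0.15, 0.48, 0.25, 0.12) and the shortcut form of the variance, the sum of x²·P(x) minus the squared mean; the function name is an assumption for illustration.

```python
from math import sqrt

def mean_variance_sd(dist):
    """Expected value, variance, and standard deviation of a discrete distribution.

    dist maps each outcome x to its probability P(x).
    """
    mean = sum(x * p for x, p in dist.items())            # sum of x * P(x)
    sum_x2_p = sum(x**2 * p for x, p in dist.items())     # sum of x^2 * P(x)
    variance = sum_x2_p - mean**2                         # shortcut formula
    return mean, variance, sqrt(variance)

# Distribution from the slide.
distribution = {0: 0.15, 1: 0.48, 2: 0.25, 3: 0.12}

mean, variance, sd = mean_variance_sd(distribution)
print(f"mean = {mean:.2f}")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {sd:.3f}")
```

The same function applies to the insurance and investment tables, since an expected value is just the first quantity it returns.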
  • 64. Determine if an Event is Unusual using the Mean and Standard Deviation of a Random Variable A business venture offers an expected profit of $28,000 with a standard deviation of $5,250. Would it be unusual to earn less than $20,000 on the deal? (Hint: consider any value more than two standard deviations away from the mean as unusual) The usual values lie in the interval \(\mu \pm 2\sigma = 28{,}000 \pm 2(5{,}250)\), that is, from $17,500 to $38,500. Thus an earning of $20,000 is not unusual because it falls within the interval above. 4.3 The Binomial Probability Distribution The Five Characteristics of a Binomial Experiment A Binomial Experiment has a fixed number of trials, only two possible outcomes for each trial, one trial cannot affect the outcome of the next trial, the probability has to remain constant from one trial to the next, and x must represent the number of successes. Example: Flip a coin 3 times and count the number of heads that turn up (say, 1). Is this a binomial experiment? There is a fixed number of trials, n = 3. There are 2 outcomes in each trial: heads and tails. The trials are independent, and the outcome of each flip does
  • 65. not affect that of the later flips. Each flip has a 50% chance of turning up heads. x = 1 here (success class is “heads”). Binomial Probability Formula The probability of having x successes out of n trials during a binomial experiment is given by the following formula (recall that \({}_nC_x = \frac{n!}{x!(n-x)!}\)): \(P(x) = {}_nC_x \, p^x q^{n-x}\), where n = the number of trials for the binomial experiment, x = the number of successes, p = the probability of a success, and q = the probability of a failure (\(q = 1 - p\)). Binomial Probability Formula Example: If a binomial experiment involves flipping a fair coin 7 times and counting the number of heads that result, the probability of 5 heads turning up in 7 flips is provided below: n = 7 (there are 7 flips of the fair coin), x = 5 (we are looking for the probability of getting 5 heads), p = 0.50 (a fair coin has a 50% chance of turning up heads on a single flip), q = 0.50 (the probability of failure is found by subtracting the probability of success from 1). \(P(5) = {}_7C_5 (0.50)^5 (0.50)^2 = 21(0.0078125) \approx 0.164\). Thus the probability of getting 5 heads out of 7 flips is 16.4%.
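The binomial probability formula is short enough to write out directly. The sketch below uses `math.comb` from the Python standard library (3.8 or later) for the number of combinations; the function name `binomial_pmf` is an assumption for illustration. For the coin example it evaluates to 21/128, or about 0.164.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials.

    p is the probability of success on a single trial; q = 1 - p.
    """
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Coin example: 5 heads in 7 flips of a fair coin.
p_five_heads = binomial_pmf(5, 7, 0.50)
print(f"P(5 heads in 7 flips) = {p_five_heads:.4f}")

# The probabilities over all possible values of x sum to one.
total = sum(binomial_pmf(x, 7, 0.50) for x in range(0, 8))
print(f"sum over x = 0..7: {total:.4f}")
```

Cumulative questions such as "at least" or "more than" can be answered by summing `binomial_pmf` over the relevant values of x.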
  • 66. The Probability of X successes in a Binomial Experiment A cable company believes that their new promotion will convince 20% of satellite television subscribers to sign up for cable. If the company is correct, what is the probability that 2 out of 8 randomly selected satellite users end up switching to cable after hearing the promotion? The fact that there are 2 groups that will behave differently (some will switch and some will not) suggests a binomial distribution. Test to determine if this is a binomial experiment: (1) fixed number of trials = 8; (2) there are 2 possible outcomes (switches or not); (3) constant probability of success (switching) is 20%; (4) 8 unique users, so trials are independent (assume no user is called twice). n = 8, X = 2 switch, p = 0.20, q = 1 – 0.20 = 0.80. So \(P(2) = {}_8C_2 (0.20)^2 (0.80)^6 = 28(0.04)(0.262144) \approx 0.294\). Thus there is a 29.4% probability that 2 out of the 8 satellite users switch to cable after hearing the promotion. The Probability of a Cumulative Set of Events for a Binomial Experiment A drug company reports that 65% of balding men would benefit from using an over the counter hair-loss solution they manufacture. Assuming the company’s claim is correct and a
  • 67. random sample of 10 men is selected for a clinical trial of the product, what is the probability that at least 9 of the men benefit from the solution? This is a binomial experiment because: (1) there is a fixed number of trials, 10 men selected; (2) each trial has 2 outcomes (working or not working); (3) constant probability of 65% of benefitting from the solution; (4) trials are independent (no chance of repetition). Here n = 10, x = 9 or 10, p = 0.65, q = 1 – 0.65 = 0.35. So $P(X = 9 \text{ or } X = 10) = P(X = 9) + P(X = 10) = \binom{10}{9}(0.65)^9(0.35)^1 + \binom{10}{10}(0.65)^{10}(0.35)^0 \approx 0.0725 + 0.0135 = 0.086$. So there is an 8.6% chance that at least 9 men would benefit from the solution. The Probability of a Cumulative Set of Events for a Binomial Experiment. A laptop manufacturer knows that 30% of its laptops will fail within the first two years of use. If seven randomly selected customers are surveyed, what is the probability that more than 3 of them experienced a laptop failure within the first two years of use? This is a binomial experiment because: (1) there is a fixed number of trials: 7 customers selected randomly; (2) there are two outcomes (a laptop fails or not within the first two years of use); (3) constant probability of 30% of a laptop failing in two years’ time; (4) customers are independent. Here n = 7, x = more than 3 had a failure (4, 5, 6, or 7), p = 0.30, q = 1 – 0.30 = 0.70. So, from the table: P(X = 4) = 0.0972, P(X = 5) = 0.0250, P(X = 6) = 0.0036, P(X = 7) = 0.0002.
  • 68. Adding all these, P(X > 3) = 0.1260. Mean for a Binomial Probability Distribution. We could calculate the mean of the binomial probability distribution by listing all of the possible outcomes for the experiment, listing all of their corresponding probabilities, and then applying the formula $\mu = \sum x \cdot P(x)$. However, that method could be very time consuming. There is a simpler approach when trying to find the mean of a binomial probability distribution. First we identify the number of trials for the experiment (n) and the probability of success (p). Then we apply the following formula: $\mu = np$. Mean for a Binomial Probability Distribution Example: If a quality control manager samples, with replacement, computer chips from a production line, each with a 0.009 probability of being defective, what is the average number of defective chips that will be found in a sample of 200 chips? This sampling procedure produces a binomial probability
  • 69. distribution, so we can apply the formula for finding the mean of a binomial distribution. In this example, there are 200 trials, and the probability of finding a defective chip is 0.009. $\mu = np = 200 \times 0.009 = 1.8$ defective chips. For groups of 200 computer chips randomly selected from the production line, the average number of defective chips is 1.8. The Mean of a Binomial Random Variable. An electronics retailer notes that only 8% of its online customers choose to purchase their extended service plan. If the retailer has 300 online sales over the next month, what is the expected number of customers that will choose to purchase the extended service plan? This satisfies the conditions of a binomial experiment. So $\mu = np = 300 \times 0.08 = 24$, since n = 300 and p = 0.08 here. Thus the expected number of customers that will choose to purchase the extended service plan is 24. Standard Deviation for a Binomial Probability Distribution. We could calculate the standard deviation for the binomial probability distribution by listing all of the possible outcomes, listing all of their corresponding probabilities, and then applying the formula $\sigma = \sqrt{\sum (x - \mu)^2 \cdot P(x)}$. However, that method would be very time consuming.
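To illustrate why the list-everything approach is rarely needed, the sketch below enumerates all 201 outcomes for the computer-chip example (n = 200, p = 0.009) and compares the results with the standard closed forms $\mu = np$ and $\sigma = \sqrt{npq}$. The function name `binomial_moments_by_enumeration` is a hypothetical helper introduced only for this illustration.

```python
from math import comb, sqrt

def binomial_moments_by_enumeration(n, p):
    """Mean and standard deviation from the definition: list every
    outcome 0..n with its probability, then sum (hypothetical helper)."""
    q = 1 - p
    probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, sqrt(variance)

n, p = 200, 0.009  # computer-chip example from the slides
mean_enum, sd_enum = binomial_moments_by_enumeration(n, p)

# Closed-form shortcuts for a binomial distribution.
mean_short = n * p
sd_short = sqrt(n * p * (1 - p))

print(f"enumeration: mean = {mean_enum:.4f}, sd = {sd_enum:.4f}")
print(f"shortcut:    mean = {mean_short:.4f}, sd = {sd_short:.4f}")
```

Both routes give the same values up to floating-point rounding; the shortcut simply avoids building the full probability table.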
  • 70. There is a simpler approach when trying to find the standard deviation of a binomial probability distribution. First, we identify the number of trials for the experiment (n) and the probability of success (p). Then we apply the following formula: $\sigma = \sqrt{npq}$. Standard Deviation for a Binomial Probability Distribution Example: If a quality control manager samples, with replacement, computer chips from a production line, each with a 0.009 probability of being defective, what is the standard deviation for the number of defective chips that will be found in a sample of 200 chips? This sampling procedure produces a binomial probability distribution, so we can apply the formula for finding the standard deviation of a binomial distribution. In this example, there are 200 trials, and the probability of finding a defective chip is 0.009. n = 200, p = 0.009, q = 1 – 0.009 = 0.991. Standard Deviation for a Binomial Probability Distribution. The variance will be $\sigma^2 = npq = 200 \times 0.009 \times 0.991 = 1.7838$. Then we simply take the square root to find the standard deviation:
  • 71. $\sigma = \sqrt{1.7838} \approx 1.336$ defective chips. For groups of 200 computer chips randomly selected from this factory, the standard deviation for the number of defective chips is 1.336. Standard Deviation of a Binomial Random Variable. A recent report states that only 28% of software projects were expected to finish on time and on budget. If we randomly sample 80 software projects, what is the standard deviation for the number of projects that are expected to finish on time and on budget? This satisfies all the conditions for a binomial experiment, so we can use the formula for the standard deviation to calculate this: $\sigma = \sqrt{npq} = \sqrt{80 \times 0.28 \times 0.72} = \sqrt{16.128} \approx 4.016$ projects. 4.4 The Poisson Probability Distribution. The Poisson Distribution. The Poisson distribution is a discrete probability distribution that provides probabilities for the number of occurrences of some event over a given period, interval, distance, or space. Example: A customer service call center might use the Poisson
  • 72. distribution to describe the behavior of incoming calls over different time periods. Example: A website might use the Poisson distribution to estimate the likelihood of some number of individuals logging onto the site between the hours of 12:00AM and 1:00AM. Example: A mining company might use the Poisson distribution to model the number of methane gas releases over a specified depth. The Poisson distribution is typically used to model the occurrences of rare events. The Poisson Distribution. The probability that the specified event occurs x times over some defined interval is given by the following formula: $P(x) = \frac{\mu^x e^{-\mu}}{x!}$, where $\mu$ (mu) is the mean (expected) number of occurrences (successes) over a particular interval, x is the number of occurrences (successes), and e is a constant (the base of the natural log) that is approximately equal to 2.71828. The Poisson Distribution. Here are some important characteristics of the Poisson distribution: The random variable is the number of occurrences of some event over some defined interval. The probability of the event is proportional to the size of the closed interval. The intervals do not overlap.
  • 73. The occurrences are independent of each other. Mean and Standard Deviation of the Poisson Distribution. The mean of the Poisson distribution is $\mu$. The standard deviation for the Poisson distribution is given by $\sigma = \sqrt{\mu}$. Some Important Differences between the Binomial and the Poisson Distributions. The binomial distribution is dependent upon the sample size and the probability of success, while the Poisson distribution only depends on the mean $\mu$. In the binomial distribution, the random variable can take on values of 0, 1, ..., n, while in the Poisson distribution, the random variable can be any integer greater than or equal to zero. In other words, there is no upper bound for the number of occurrences in the given interval. Probability and the Poisson Distribution. A cellular communication company finds that during a 10-minute phone call there will typically be one incidence of poor reception. Use a Poisson distribution to calculate the probability that there will be 5 incidences of poor reception during a call that lasts an hour.
  • 74. Here x = 5 and $\mu = 6$, since an average of 1 incidence of poor reception in a 10-minute period implies an average of 6 incidences in 60 minutes (1 hour).
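With x = 5 and a mean of 6 occurrences per hour, the Poisson probability can be evaluated directly from the formula $P(x) = \mu^x e^{-\mu} / x!$. The Python sketch below is one way to do this; the function name `poisson_pmf` is an illustrative choice and is not part of the slides.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean number of
    occurrences over the interval is mu (hypothetical helper name)."""
    return mu**x * exp(-mu) / factorial(x)

# One incidence per 10 minutes, scaled to a 60-minute call.
rate_per_10_min = 1
mu = rate_per_10_min * (60 / 10)  # mean of 6 incidences per hour

p5 = poisson_pmf(5, mu)
print(f"P(X = 5) = {p5:.4f}")  # 0.1606
```

Under these assumptions, there is roughly a 16.1% chance of exactly 5 incidences of poor reception during a one-hour call.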