SlideShare una empresa de Scribd logo
1 de 52
Descargar para leer sin conexión
© 2008 Brooks/Cole, a division of1
Chapter 4
Numerical Methods for
Describing Data
© 2008 Brooks/Cole, a division of2
Describing the Center of a Data
Set with the arithmetic mean
1 2 n
sum of all observations in the sample
x
number of observations in the sample
x x x x
n n
=
+ + +
= = ∑L
The sample mean of a numerical
sample, x1, x2, x3,…, xn, denoted
, is
The sample mean of a numerical
sample, x1, x2, x3,…, xn, denoted
, is
© 2008 Brooks/Cole, a division of3
Describing the Center of a Data
Set with the arithmetic mean
The population mean is
denoted by µ, is the average of
all x values in the entire
population.
© 2008 Brooks/Cole, a division of4
The “average” or mean
price for this sample of 10
houses in Fancytown is
$295,000
Example calculations
During a two week period 10 houses
were sold in Fancytown.
x 2,950,000
x
n 10
295,000
= =
=
∑House Price
in Fancytown
x
231,000
313,000
299,000
312,000
285,000
317,000
294,000
297,000
315,000
287,000
2,950,000x =∑
© 2008 Brooks/Cole, a division of5
Example calculations
During a two week period 10 houses
were sold in Lowtown.
The “average” or mean
price for this sample of 10
houses in Lowtown is
$295,000
Outlier
House Price
in Lowtown
x
97,000
93,000
110,000
121,000
113,000
95,000
100,000
122,000
99,000
2,000,000
2,950,000x =∑
x 2,950,000
x
n 10
295,000
= =
=
∑
© 2008 Brooks/Cole, a division of6
Reflections on the Sample
calculations
Looking at the dotplots of the samples for
Fancytown and Lowtown we can see that the
mean, $295,000 appears to accurately
represent the “center” of the data for
Fancytown, but it is not representative of the
Lowtown data.
Clearly, the mean can be greatly affected by
the presence of even a single outlier.
Outlier
© 2008 Brooks/Cole, a division of7
Comments
1. In the previous example of the house
prices in the sample of 10 houses
from Lowtown, the mean was
affected very strongly by the one
house with the extremely high price.
2. The other 9 houses had selling
prices around $100,000.
3. This illustrates that the mean can be
very sensitive to a few extreme
values.
© 2008 Brooks/Cole, a division of8
Describing the Center of a Data
Set with the median
The sample median is obtained by first
ordering the n observations from smallest
to largest (with any repeated values
included, so that every sample
observation appears in the ordered list).
Then
the single middle value if n is odd
sample median=
the mean of the middle two values if n is even



© 2008 Brooks/Cole, a division of9
Consider the Fancytown data. First, we put
the data in numerical increasing order to
get
231,000 285,000 287,000 294,000
297,000 299,000 312,000 313,000
315,000 317,000
Since there are 10 (even) data values, the
median is the mean of the two values in
the middle.
297000 299000
median $298,000
2
+
= =
Example of Median Calculation
© 2008 Brooks/Cole, a division of10
Example of Median Calculation
Consider the Lowtown data. We put the
data in numerical increasing order to get
93,000 95,000 97,000 99,000
100,000 110,000 113,000 121,000
122,000 2,000,000
Since there are 10 (even) data values, the
median is the mean of the two values in
the middle.
297000 299000
median $298,000
2
+
= =
© 2008 Brooks/Cole, a division of11
Comparing the Sample Mean
& Sample Median
© 2008 Brooks/Cole, a division of12
Comparing the Sample Mean
& Sample Median
© 2008 Brooks/Cole, a division of13
Comparing the Sample Mean &
Sample Median
Typically,
1. when a distribution is skewed positively, the mean
is larger than the median,
2. when a distribution is skewed negatively, the mean
is smaller then the median, and
3. when a distribution is symmetric, the mean and the
median are equal.
Notice from the preceding pictures that the
median splits the area in the distribution in
half and the mean is the point of balance.
© 2008 Brooks/Cole, a division of14
The Trimmed Mean
A trimmed mean is computed by first
ordering the data values from smallest to
largest, deleting a selected number of
values from each end of the ordered list,
and finally computing the mean of the
remaining values.
The trimming percentage is the
percentage of values deleted from each
end of the ordered list.
© 2008 Brooks/Cole, a division of15
Example of Trimmed Mean
House Price
in Fancytown
231,000
285,000
287,000
294,000
297,000
299,000
312,000
313,000
315,000
317,000
2,950,000
291,000
295,000
300,250
2,402,000
Sum of the eight
middle values is
Divide this value
by 8 to obtain
the 10%
trimmed mean.
x =∑
x =
median =
10% Trim Mean =
© 2008 Brooks/Cole, a division of16
Example of Trimmed Mean
House Price
in Lowtown
93,000
95,000
857,000 97,000
99,000
100,000
110,000
113,000
121,000
122,000
2,000,000
2,950,000
295,000
105,000
107,125
Divide this value
by 8 to obtain
the 10%
trimmed mean.
Sum of the eight
middle values is
x =∑
x =
median =
10% Trim Mean =
© 2008 Brooks/Cole, a division of17
Another Example
Here’s an example of what happens if you
compute the mean, median, and 5% & 10%
trimmed means for the Ages for the 79 students
taking Data Analysis
© 2008 Brooks/Cole, a division of18
Categorical Data - Sample Proportion
The sample proportion of
successes, denoted by p, is
Where S is the label used for the
response designated as success.
The population proportion of
successes is denoted by π.
p sample proportion of successes
number of S's in the sample
n
=
=
The sample proportion of
successes, denoted by p, is
Where S is the label used for the
response designated as success.
The population proportion of
successes is denoted by π.
p sample proportion of successes
number of S's in the sample
n
=
=
© 2008 Brooks/Cole, a division of19
If we look at the student data
sample, consider the variable
gender and treat being female as
a success, we have 25 of the
sample of 79 students are
female, so the sample proportion
(of females) is
Categorical Data - Sample
Proportion
25p 0.316
79
= =
© 2008 Brooks/Cole, a division of20
Describing Variability
The simplest numerical measure of the
variability of a numerical data set is the
range, which is defined to be the
difference between the largest and
smallest data values.
range = maximum - minimum
© 2008 Brooks/Cole, a division of21
Describing Variability
The n deviations from the
sample mean are the differences:
Note: The sum of all of the deviations from the
sample mean will be equal to 0, except possibly
for the effects of rounding the numbers. This
means that the average deviation from the
mean is always 0 and cannot be used as a
measure of variability.
1 2 2 nx x,x x, x x, ,x x,− − − −K
© 2008 Brooks/Cole, a division of22
Sample Variance
The sample variance, denoted s2
is the sum
of the squared deviations from the mean
divided by n-1.
2
2 xx
(x x) S
s
n 1 n 1
−
= =
− −
∑
( )
2 2 2
xx 1 2 n
2
S (x x) (x x) (x x)
x x
= − + − + + −
= −∑
KNote:
( )
2 2 2
xx 1 2 n
2
S (x x) (x x) (x x)
x x
= − + − + + −
= −∑
KNote:
© 2008 Brooks/Cole, a division of23
Sample Standard Deviation
The sample standard deviation,
denoted s is the positive square root of
the sample variance.
The population standard deviation is
denoted by σ.
2
2 xx
(x x) S
s s
n 1 n 1
−
= = =
− −
∑
© 2008 Brooks/Cole, a division of24
Example calculations
10 Macintosh Apples were randomly
selected and weighed (in ounces).
Range 8.48 6.24 2.24= − =
x 74.24
x 7.424
n 10
= = =
∑
2
2 (x x) 5.5398
s
n 1 10 1
5.5398
0.61554
9
−
= =
− −
= =
∑
s= 0.61554 0.78456=
x
7.52 0.096 0.0092
8.48 1.056 1.1151
7.36 -0.064 0.0041
6.24 -1.184 1.4019
7.68 0.256 0.0655
6.56 -0.864 0.7465
6.40 -1.024 1.0486
8.16 0.736 0.5417
7.68 0.256 0.0655
8.16 0.736 0.5417
74.24 0.000 5.5398
x x−
2
(x x)−
© 2008 Brooks/Cole, a division of25
Calculator Formula for s2
and s
A computational formula for the sample variance
is given by
A little algebra can establish the sum of the
square deviations,
( )
2
2 2
xx
x
S (x x) x
n
= − = − ∑∑ ∑
( )
2
2
2 xx
x
xS ns
n 1 n 1
−
= =
− −
∑∑
© 2008 Brooks/Cole, a division of26
Calculations Revisited
The values for s2
and s are exactly the same as
were obtained earlier.
x x2
7.52 56.5504
8.48 71.9104
7.36 54.1696
6.24 38.9376
7.68 58.9824
6.56 43.0336
6.40 40.9600
8.16 66.5856
7.68 58.9824
8.16 66.5856
74.24 556.6976
2
n 10, x 74.24, x 556.6976= = =∑ ∑
( )
2
2
xx
2
x
S x
n
(74.24)
556.6976 5.53984
10
= −
= − =
∑∑
2 5.53984
s 0.615538
9
= =
s 0.615538 0.78456= =
© 2008 Brooks/Cole, a division of27
Quartiles and the Interquartile
Range
Lower quartile (Q1) = median of the lower half
of the data set.
Upper Quartile (Q3) = median of the upper half
of the data set.
Note: If n is odd, the median is excluded from both the
lower and upper halves of the data.
The interquartile range (iqr), a resistant
measure of variability is given by
iqr = upper quartile – lower quartile
= Q3 – Q1
© 2008 Brooks/Cole, a division of28
Quartiles and IQR Example
15 students with part time jobs were
randomly selected and the number of
hours worked last week was recorded.
2, 4, 7, 8, 9, 10, 10, 10, 11, 12, 12, 14, 15, 19, 25
19, 12, 14, 10, 12, 10, 25,
9, 8, 4, 2, 10, 7, 11, 15
The data is put in increasing order to get
© 2008 Brooks/Cole, a division of29
With 15 data values, the median is the
8th
value. Specifically, the median is 10.
2, 4, 7, 8, 9, 10, 10, 10, 11, 12, 12, 14, 15, 19, 25
Median
Lower Half Upper Half
Lower quartile Q1 Upper quartile Q3
Lower quartile = 8 Upper quartile = 14
Iqr = 14 - 8 = 6
Quartiles and IQR Example
© 2008 Brooks/Cole, a division of30
Boxplots
Constructing a Skeletal Boxplot
1. Draw a horizontal (or vertical) scale.
2. Construct a rectangular box whose left (or
lower) edge is at the lower quartile and
whose right (or upper) edge is at the upper
quartile (the box width = iqr). Draw a vertical
(or horizontal) line segment inside the box at
the location of the median.
3. Extend horizontal (or vertical) line segments
from each end of the box to the smallest and
largest observations in the data set. (These
lines are called whiskers.)
© 2008 Brooks/Cole, a division of31
Skeletal Boxplot Example
Using the student work hours data we have
0 5 10 15 20 25
© 2008 Brooks/Cole, a division of32
Outliers
An observations is an outlier if it is more
than 1.5 iqr away from the closest end of
the box (less than the lower quartile minus
1.5 iqr or more than the upper quartile plus
1.5 iqr.
An outlier is extreme if it is more than 3 iqr
from the closest end of the box, and it is
mild otherwise.
© 2008 Brooks/Cole, a division of33
Modified Boxplots
A modified boxplot represents mild
outliers by shaded circles and extreme
outliers by open circles. Whiskers extend on
each end to the most extreme observations
that are not outliers.
© 2008 Brooks/Cole, a division of34
Modified Boxplot Example
Using the student work hours data we have
0 5 10 15 20 25
Lower quartile + 1.5 iqr = 14 - 1.5(6) = -1
Upper quartile + 1.5 iqr = 14 + 1.5(6) = 23
Smallest data
value that isn’t
an outlier
Largest data
value that isn’t
an outlier
Upper quartile + 3 iqr = 14 + 3(6) = 32
Mild
Outlier
© 2008 Brooks/Cole, a division of35
Modified Boxplot Example
Consider the ages of the 79 students from
the classroom data set from the slideshow
Chapter 3.
Iqr = 22 – 19 = 3
17 18 18 18 18 18 19 19 19 19
19 19 19 19 19 19 19 19 19 19
19 19 19 19 19 19 20 20 20 20
20 20 20 20 20 20 21 21 21 21
21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 24 24 24
25 26 28 28 30 37 38 44 47
Median
Lower
Quartile
Upper
Quartile
Moderate Outliers Extreme Outliers
Lower quartile – 3 iqr = 10 Lower quartile – 1.5 iqr =14.5
Upper quartile + 3 iqr = 31 Upper quartile + 1.5 iqr = 26.5
© 2008 Brooks/Cole, a division of36
Smallest data
value that isn’t
an outlier
Largest data
value that isn’t
an outlier
Mild
Outliers
Extreme
Outliers
15 20 25 30 35 40 45 50
Modified Boxplot Example
Here is the modified boxplot for the
student age data.
© 2008 Brooks/Cole, a division of37
Modified Boxplot Example
50
45
40
35
30
25
20
15
Here is the same boxplot
reproduced with a vertical
orientation.
© 2008 Brooks/Cole, a division of38
Comparative Boxplot Example
100 120 140 160 180 200 220 240
Females
MalesG
e
n
d
e
r
Student Weight
By putting boxplots of two separate groups or
subgroups we can compare their distributional
behaviors.
Notice that the distributional pattern of female
and male student weights have similar
shapes, although the females are roughly 20
lbs lighter (as a group).
© 2008 Brooks/Cole, a division of39
Comparative Boxplot Example
Male
Female
50
40
30
20
Gender
Age
Boxplots of Age by Gender
(means are indicated by solid circles)red
© 2008 Brooks/Cole, a division of40
Interpreting Variability
Chebyshev’s Rule
2
1
100 1 %
k
 
− 
 
Chebyshev’s Rule
Consider any number k, where
k ≥ 1. Then the percentage of
observations that are within k standard
deviations of the mean is at least
.
2
1
100 1 %
k
 
− 
 
Chebyshev’s Rule
Consider any number k, where
k ≥ 1. Then the percentage of
observations that are within k standard
deviations of the mean is at least
.
© 2008 Brooks/Cole, a division of41
For specific values of k Chebyshev’s Rule
reads
At least 75% of the observations are within 2
standard deviations of the mean.
 At least 89% of the observations are within 3
standard deviations of the mean.
 At least 90% of the observations are within 3.16
standard deviations of the mean.
 At least 94% of the observations are within 4
standard deviations of the mean.
 At least 96% of the observations are within 5
standard deviations of the mean.
 At least 99% of the observations are with 10
standard deviations of the mean.
Interpreting Variability
Chebyshev’s Rule
© 2008 Brooks/Cole, a division of42
Consider the student age data
Example - Chebyshev’s Rule
17 18 18 18 18 18 19 19 19 19
19 19 19 19 19 19 19 19 19 19
19 19 19 19 19 19 20 20 20 20
20 20 20 20 20 20 21 21 21 21
21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 24 24 24
25 26 28 28 30 37 38 44 47
Color code: within 1 standard deviation of the mean
within 2 standard deviations of the mean
within 3 standard deviations of the mean
within 4 standard deviations of the mean
within 5 standard deviations of the mean
© 2008 Brooks/Cole, a division of43
Summarizing the student age data
Example - Chebyshev’s Rule
Interval Chebyshev’s Actual
within 1 standard
deviation of the mean
≥ 0% 72/79 = 91.1%
within 2 standard
deviations of the mean
≥ 75% 75/79 = 94.9%
within 3 standard
deviations of the mean
≥ 88.8% 76/79 = 96.2%
within 4 standard
deviations of the mean
≥ 93.8% 77/79 = 97.5%
within 5 standard
deviations of the mean
≥ 96.0% 79/79 = 100%
Notice that Chebyshev gives very conservative lower bounds
and the values aren’t very close to the actual percentages.
© 2008 Brooks/Cole, a division of44
Empirical Rule
If the histogram of values in a data set is
reasonably symmetric and unimodal
(specifically, is reasonably approximated by a
normal curve), then
1. Approximately 68% of the observations
are within 1 standard deviation of the
mean.
2. Approximately 95% of the observations
are within 2 standard deviation of the
mean.
3. Approximately 99.7% of the observations
are within 3 standard deviation of the
mean.
© 2008 Brooks/Cole, a division of45
Z Scores
The z score is how many standard
deviations the observation is from the
mean.
A positive z score indicates the observation
is above the mean and a negative z score
indicates the observation is below the
mean.
The z score corresponding to a particular
observation in a data set is
observation meanzscore
standard deviation
−=
The z score corresponding to a particular
observation in a data set is
observation meanzscore
standard deviation
−=
© 2008 Brooks/Cole, a division of46
Computing the z score is often referred to
as standardization and the z score is
called a standardized score.
Z Scores
x xz score s
−=
The formula used with sample data is
x xz score s
−=
The formula used with sample data is
© 2008 Brooks/Cole, a division of47
Example
A sample of GPAs of 38 statistics students
appear below (sorted in increasing order)
2.00 2.25 2.36 2.37 2.50 2.50 2.60
2.67 2.70 2.70 2.75 2.78 2.80 2.80
2.82 2.90 2.90 3.00 3.02 3.07 3.15
3.20 3.20 3.20 3.23 3.29 3.30 3.30
3.42 3.46 3.48 3.50 3.50 3.58
3.75 3.80 3.83 3.97
x 3.0434 and s 0.4720= =
© 2008 Brooks/Cole, a division of48
Example
The following stem and leaf
indicates that the GPA data is
reasonably symmetric and unimodal.
2 0
2 233
2 55
2 667777
2 88899
3 0001
3 2222233
3 444555
3 7
3 889
Stem: Units digit
Leaf: Tenths digit
© 2008 Brooks/Cole, a division of49
Example
Using the formula we compute
the z scores and color code the values as we
did in an earlier example.
-2.21 -1.68 -1.45 -1.43 -1.15 -1.15
-0.94 -0.79 -0.73 -0.73 -0.62 -0.56
-0.52 -0.52 -0.47 -0.30 -0.30 -0.09
-0.05 0.06 0.23 0.33 0.33 0.33
0.40 0.52 0.54 0.54 0.80 0.88
0.93 0.97 0.97 1.14 1.50 1.60
1.67 1.96
x xz score s
−=Using the formula we compute
the z scores and color code the values as we
did in an earlier example.
-2.21 -1.68 -1.45 -1.43 -1.15 -1.15
-0.94 -0.79 -0.73 -0.73 -0.62 -0.56
-0.52 -0.52 -0.47 -0.30 -0.30 -0.09
-0.05 0.06 0.23 0.33 0.33 0.33
0.40 0.52 0.54 0.54 0.80 0.88
0.93 0.97 0.97 1.14 1.50 1.60
1.67 1.96
x xz score s
−=
© 2008 Brooks/Cole, a division of50
Example
Interval Empirical Rule Actual
within 1 standard
deviation of the mean
≈ 68% 27/38 = 71%
within 2 standard
deviations of the mean
≈ 95% 37/38 = 97%
within 3 standard
deviations of the mean
≈99.7% 38/38 = 100%
Notice that the empirical rule gives
reasonably good estimates for this
example.
© 2008 Brooks/Cole, a division of51
Comparison of Chebyshev’s
Rule and the Empirical Rule
The following refers to the
weights in the sample of
79 students. Notice that
the stem and leaf diagram
suggest the data
distribution is unimodal but
is positively skewed
because of the outliers on
the high side.
Nevertheless, the results
for the Empirical Rule are
good.
10 3
11 37
12 011444555
13 000000455589
14 000000000555
15 000000555567
16 000005558
17 0000005555
18 0358
19 5
20 00
21 0
22 55
23 79
Stem: Hundreds & tens digits
Leaf: Units digit
© 2008 Brooks/Cole, a division of52
Comparison of Chebyshev’ Rule and
the Empirical Rule
Interval
Chebyshev
’s Rule
Empiric
al Rule
Actual
within 1 standard
deviation of the
mean
≥ 0% ≈ 68% 56/79 = 70.9%
within 2 standard
deviations of the
mean
≥ 75% ≈ 95% 75/79 = 94.9%
within 3 standard
deviations of the
mean
≥ 88.8% ≈99.7% 79/79 = 100%
Notice that even with moderate positive skewing of the
data, the Empirical Rule gave a much more usable and
meaningful result.

Más contenido relacionado

La actualidad más candente

Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...
Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...
Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...MaxineBoyd
 
Hypergeometric distribution
Hypergeometric distributionHypergeometric distribution
Hypergeometric distributionmohammad nouman
 
06 ch ken black solution
06 ch ken black solution06 ch ken black solution
06 ch ken black solutionKrunal Shah
 
Approach to anova questions
Approach to anova questionsApproach to anova questions
Approach to anova questionsGeorgeGidudu
 
Linear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation ExampleLinear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation ExampleMarjan Sterjev
 
4 2 continuous probability distributionn
4 2 continuous probability    distributionn4 2 continuous probability    distributionn
4 2 continuous probability distributionnLama K Banna
 
Mean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random VariableMean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random VariableMichael Ogoy
 
A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluation
A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluationA. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluation
A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluationjemille6
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distributionnumanmunir01
 
6334 Day 3 slides: Spanos-lecture-2
6334 Day 3 slides: Spanos-lecture-26334 Day 3 slides: Spanos-lecture-2
6334 Day 3 slides: Spanos-lecture-2jemille6
 
Error analysis statistics
Error analysis   statisticsError analysis   statistics
Error analysis statisticsTarun Gehlot
 
Solutions. Design and Analysis of Experiments. Montgomery
Solutions. Design and Analysis of Experiments. MontgomerySolutions. Design and Analysis of Experiments. Montgomery
Solutions. Design and Analysis of Experiments. MontgomeryByron CZ
 

La actualidad más candente (19)

Chapter3
Chapter3Chapter3
Chapter3
 
Chapter9
Chapter9Chapter9
Chapter9
 
Probability Distribution
Probability DistributionProbability Distribution
Probability Distribution
 
Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...
Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...
Introduction to Probability and Statistics 13th Edition Mendenhall Solutions ...
 
Hypergeometric distribution
Hypergeometric distributionHypergeometric distribution
Hypergeometric distribution
 
06 ch ken black solution
06 ch ken black solution06 ch ken black solution
06 ch ken black solution
 
Poisson Probability Distributions
Poisson Probability DistributionsPoisson Probability Distributions
Poisson Probability Distributions
 
Approach to anova questions
Approach to anova questionsApproach to anova questions
Approach to anova questions
 
Practice Test 2 Solutions
Practice Test 2  SolutionsPractice Test 2  Solutions
Practice Test 2 Solutions
 
Linear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation ExampleLinear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation Example
 
4 2 continuous probability distributionn
4 2 continuous probability    distributionn4 2 continuous probability    distributionn
4 2 continuous probability distributionn
 
Mean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random VariableMean, variance, and standard deviation of a Discrete Random Variable
Mean, variance, and standard deviation of a Discrete Random Variable
 
A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluation
A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluationA. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluation
A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluation
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
 
Stats chapter 8
Stats chapter 8Stats chapter 8
Stats chapter 8
 
6334 Day 3 slides: Spanos-lecture-2
6334 Day 3 slides: Spanos-lecture-26334 Day 3 slides: Spanos-lecture-2
6334 Day 3 slides: Spanos-lecture-2
 
Error analysis statistics
Error analysis   statisticsError analysis   statistics
Error analysis statistics
 
Input analysis
Input analysisInput analysis
Input analysis
 
Solutions. Design and Analysis of Experiments. Montgomery
Solutions. Design and Analysis of Experiments. MontgomerySolutions. Design and Analysis of Experiments. Montgomery
Solutions. Design and Analysis of Experiments. Montgomery
 

Similar a Chapter4

Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsBabasab Patil
 
3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplotsLong Beach City College
 
Chap02 describing data; numerical
Chap02 describing data; numericalChap02 describing data; numerical
Chap02 describing data; numericalJudianto Nugroho
 
Sample sample distribution
Sample sample distributionSample sample distribution
Sample sample distributionNur Suaidah
 
1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdfthaersyam
 
Measures of Variation (Ungrouped Data)
Measures of Variation (Ungrouped Data)Measures of Variation (Ungrouped Data)
Measures of Variation (Ungrouped Data)Zaira Mae
 
Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1AbdelmonsifFadl
 
Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...
Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...
Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...Osama Yousaf
 
Estimation and confidence interval
Estimation and confidence intervalEstimation and confidence interval
Estimation and confidence intervalHomework Guru
 
SAMPLING MEAN DEFINITION The term sampling mean is.docx
SAMPLING MEAN  DEFINITION  The term sampling mean is.docxSAMPLING MEAN  DEFINITION  The term sampling mean is.docx
SAMPLING MEAN DEFINITION The term sampling mean is.docxagnesdcarey33086
 
Measures of Relative Standing and Boxplots
Measures of Relative Standing and BoxplotsMeasures of Relative Standing and Boxplots
Measures of Relative Standing and BoxplotsLong Beach City College
 
Experimental design data analysis
Experimental design data analysisExperimental design data analysis
Experimental design data analysismetalkid132
 
SAMPLING MEANDEFINITIONThe term sampling mean is a stati.docx
SAMPLING MEANDEFINITIONThe term sampling mean is a stati.docxSAMPLING MEANDEFINITIONThe term sampling mean is a stati.docx
SAMPLING MEANDEFINITIONThe term sampling mean is a stati.docxanhlodge
 

Similar a Chapter4 (20)

Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec doms
 
Chapter04
Chapter04Chapter04
Chapter04
 
Chapter04
Chapter04Chapter04
Chapter04
 
3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots
 
Chap02 describing data; numerical
Chap02 describing data; numericalChap02 describing data; numerical
Chap02 describing data; numerical
 
Sample sample distribution
Sample sample distributionSample sample distribution
Sample sample distribution
 
1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf
 
Measures of Variation (Ungrouped Data)
Measures of Variation (Ungrouped Data)Measures of Variation (Ungrouped Data)
Measures of Variation (Ungrouped Data)
 
Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1
 
Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...
Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...
Importance of The non-modal and the bi-modal situation,(simple) arithmetic me...
 
3.2 Measures of variation
3.2 Measures of variation3.2 Measures of variation
3.2 Measures of variation
 
8490370.ppt
8490370.ppt8490370.ppt
8490370.ppt
 
Statistical ppt
Statistical pptStatistical ppt
Statistical ppt
 
Estimation and confidence interval
Estimation and confidence intervalEstimation and confidence interval
Estimation and confidence interval
 
Statistics 3, 4
Statistics 3, 4Statistics 3, 4
Statistics 3, 4
 
SAMPLING MEAN DEFINITION The term sampling mean is.docx
SAMPLING MEAN  DEFINITION  The term sampling mean is.docxSAMPLING MEAN  DEFINITION  The term sampling mean is.docx
SAMPLING MEAN DEFINITION The term sampling mean is.docx
 
Measures of Relative Standing and Boxplots
Measures of Relative Standing and BoxplotsMeasures of Relative Standing and Boxplots
Measures of Relative Standing and Boxplots
 
2 simple regression
2   simple regression2   simple regression
2 simple regression
 
Experimental design data analysis
Experimental design data analysisExperimental design data analysis
Experimental design data analysis
 
SAMPLING MEANDEFINITIONThe term sampling mean is a stati.docx
SAMPLING MEANDEFINITIONThe term sampling mean is a stati.docxSAMPLING MEANDEFINITIONThe term sampling mean is a stati.docx
SAMPLING MEANDEFINITIONThe term sampling mean is a stati.docx
 

Más de Richard Ferreria (19)

Chapter6
Chapter6Chapter6
Chapter6
 
Chapter2
Chapter2Chapter2
Chapter2
 
Chapter1
Chapter1Chapter1
Chapter1
 
Chapter10
Chapter10Chapter10
Chapter10
 
Adding grades to your google site v2 (dropbox)
Adding grades to your google site v2 (dropbox)Adding grades to your google site v2 (dropbox)
Adding grades to your google site v2 (dropbox)
 
Stats chapter 14
Stats chapter 14Stats chapter 14
Stats chapter 14
 
Stats chapter 15
Stats chapter 15Stats chapter 15
Stats chapter 15
 
Stats chapter 13
Stats chapter 13Stats chapter 13
Stats chapter 13
 
Stats chapter 12
Stats chapter 12Stats chapter 12
Stats chapter 12
 
Stats chapter 11
Stats chapter 11Stats chapter 11
Stats chapter 11
 
Stats chapter 11
Stats chapter 11Stats chapter 11
Stats chapter 11
 
Stats chapter 10
Stats chapter 10Stats chapter 10
Stats chapter 10
 
Stats chapter 9
Stats chapter 9Stats chapter 9
Stats chapter 9
 
Stats chapter 8
Stats chapter 8Stats chapter 8
Stats chapter 8
 
Stats chapter 7
Stats chapter 7Stats chapter 7
Stats chapter 7
 
Stats chapter 6
Stats chapter 6Stats chapter 6
Stats chapter 6
 
Podcasting and audio editing
Podcasting and audio editingPodcasting and audio editing
Podcasting and audio editing
 
Adding grades to your google site
Adding grades to your google siteAdding grades to your google site
Adding grades to your google site
 
Stats chapter 5
Stats chapter 5Stats chapter 5
Stats chapter 5
 

Último

Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 
How to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineHow to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineCeline George
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...HetalPathak10
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPCeline George
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 

Último (20)

Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
How to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineHow to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command Line
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERP
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 

Chapter4

  • 1. © 2008 Brooks/Cole, a division of1 Chapter 4 Numerical Methods for Describing Data
  • 2. © 2008 Brooks/Cole, a division of2 Describing the Center of a Data Set with the arithmetic mean 1 2 n sum of all observations in the sample x number of observations in the sample x x x x n n = + + + = = ∑L The sample mean of a numerical sample, x1, x2, x3,…, xn, denoted , is The sample mean of a numerical sample, x1, x2, x3,…, xn, denoted , is
  • 3. © 2008 Brooks/Cole, a division of3 Describing the Center of a Data Set with the arithmetic mean The population mean is denoted by µ, is the average of all x values in the entire population.
  • 4. © 2008 Brooks/Cole, a division of4 The “average” or mean price for this sample of 10 houses in Fancytown is $295,000 Example calculations During a two week period 10 houses were sold in Fancytown. x 2,950,000 x n 10 295,000 = = = ∑House Price in Fancytown x 231,000 313,000 299,000 312,000 285,000 317,000 294,000 297,000 315,000 287,000 2,950,000x =∑
  • 5. © 2008 Brooks/Cole, a division of5 Example calculations During a two week period 10 houses were sold in Lowtown. The “average” or mean price for this sample of 10 houses in Lowtown is $295,000 Outlier House Price in Lowtown x 97,000 93,000 110,000 121,000 113,000 95,000 100,000 122,000 99,000 2,000,000 2,950,000x =∑ x 2,950,000 x n 10 295,000 = = = ∑
  • 6. © 2008 Brooks/Cole, a division of6 Reflections on the Sample calculations Looking at the dotplots of the samples for Fancytown and Lowtown we can see that the mean, $295,000 appears to accurately represent the “center” of the data for Fancytown, but it is not representative of the Lowtown data. Clearly, the mean can be greatly affected by the presence of even a single outlier. Outlier
  • 7. © 2008 Brooks/Cole, a division of7 Comments 1. In the previous example of the house prices in the sample of 10 houses from Lowtown, the mean was affected very strongly by the one house with the extremely high price. 2. The other 9 houses had selling prices around $100,000. 3. This illustrates that the mean can be very sensitive to a few extreme values.
  • 8. © 2008 Brooks/Cole, a division of8 Describing the Center of a Data Set with the median The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then the single middle value if n is odd sample median= the mean of the middle two values if n is even   
  • 9. © 2008 Brooks/Cole, a division of9 Consider the Fancytown data. First, we put the data in numerical increasing order to get 231,000 285,000 287,000 294,000 297,000 299,000 312,000 313,000 315,000 317,000 Since there are 10 (even) data values, the median is the mean of the two values in the middle. 297000 299000 median $298,000 2 + = = Example of Median Calculation
  • 10. © 2008 Brooks/Cole, a division of10 Example of Median Calculation Consider the Lowtown data. We put the data in numerical increasing order to get 93,000 95,000 97,000 99,000 100,000 110,000 113,000 121,000 122,000 2,000,000 Since there are 10 (even) data values, the median is the mean of the two values in the middle. 297000 299000 median $298,000 2 + = =
  • 11. © 2008 Brooks/Cole, a division of11 Comparing the Sample Mean & Sample Median
  • 12. © 2008 Brooks/Cole, a division of12 Comparing the Sample Mean & Sample Median
  • 13. © 2008 Brooks/Cole, a division of13 Comparing the Sample Mean & Sample Median Typically, 1. when a distribution is skewed positively, the mean is larger than the median, 2. when a distribution is skewed negatively, the mean is smaller then the median, and 3. when a distribution is symmetric, the mean and the median are equal. Notice from the preceding pictures that the median splits the area in the distribution in half and the mean is the point of balance.
  • 14. © 2008 Brooks/Cole, a division of14 The Trimmed Mean A trimmed mean is computed by first ordering the data values from smallest to largest, deleting a selected number of values from each end of the ordered list, and finally computing the mean of the remaining values. The trimming percentage is the percentage of values deleted from each end of the ordered list.
  • 15. © 2008 Brooks/Cole, a division of15 Example of Trimmed Mean House Price in Fancytown 231,000 285,000 287,000 294,000 297,000 299,000 312,000 313,000 315,000 317,000 2,950,000 291,000 295,000 300,250 2,402,000 Sum of the eight middle values is Divide this value by 8 to obtain the 10% trimmed mean. x =∑ x = median = 10% Trim Mean =
  • 16. © 2008 Brooks/Cole, a division of16 Example of Trimmed Mean House Price in Lowtown 93,000 95,000 857,000 97,000 99,000 100,000 110,000 113,000 121,000 122,000 2,000,000 2,950,000 295,000 105,000 107,125 Divide this value by 8 to obtain the 10% trimmed mean. Sum of the eight middle values is x =∑ x = median = 10% Trim Mean =
  • 17. © 2008 Brooks/Cole, a division of17 Another Example Here’s an example of what happens if you compute the mean, median, and 5% & 10% trimmed means for the Ages for the 79 students taking Data Analysis
  • 18. © 2008 Brooks/Cole, a division of18 Categorical Data - Sample Proportion The sample proportion of successes, denoted by p, is Where S is the label used for the response designated as success. The population proportion of successes is denoted by π. p sample proportion of successes number of S's in the sample n = = The sample proportion of successes, denoted by p, is Where S is the label used for the response designated as success. The population proportion of successes is denoted by π. p sample proportion of successes number of S's in the sample n = =
  • 19. © 2008 Brooks/Cole, a division of19 If we look at the student data sample, consider the variable gender and treat being female as a success, we have 25 of the sample of 79 students are female, so the sample proportion (of females) is Categorical Data - Sample Proportion 25p 0.316 79 = =
  • 20. © 2008 Brooks/Cole, a division of20 Describing Variability The simplest numerical measure of the variability of a numerical data set is the range, which is defined to be the difference between the largest and smallest data values. range = maximum - minimum
  • 21. © 2008 Brooks/Cole, a division of21 Describing Variability The n deviations from the sample mean are the differences: Note: The sum of all of the deviations from the sample mean will be equal to 0, except possibly for the effects of rounding the numbers. This means that the average deviation from the mean is always 0 and cannot be used as a measure of variability. 1 2 2 nx x,x x, x x, ,x x,− − − −K
  • 22. © 2008 Brooks/Cole, a division of22 Sample Variance The sample variance, denoted s2 is the sum of the squared deviations from the mean divided by n-1. 2 2 xx (x x) S s n 1 n 1 − = = − − ∑ ( ) 2 2 2 xx 1 2 n 2 S (x x) (x x) (x x) x x = − + − + + − = −∑ KNote: ( ) 2 2 2 xx 1 2 n 2 S (x x) (x x) (x x) x x = − + − + + − = −∑ KNote:
  • 23. © 2008 Brooks/Cole, a division of23 Sample Standard Deviation The sample standard deviation, denoted s is the positive square root of the sample variance. The population standard deviation is denoted by σ. 2 2 xx (x x) S s s n 1 n 1 − = = = − − ∑
  • 24. © 2008 Brooks/Cole, a division of24 Example calculations 10 Macintosh Apples were randomly selected and weighed (in ounces). Range 8.48 6.24 2.24= − = x 74.24 x 7.424 n 10 = = = ∑ 2 2 (x x) 5.5398 s n 1 10 1 5.5398 0.61554 9 − = = − − = = ∑ s= 0.61554 0.78456= x 7.52 0.096 0.0092 8.48 1.056 1.1151 7.36 -0.064 0.0041 6.24 -1.184 1.4019 7.68 0.256 0.0655 6.56 -0.864 0.7465 6.40 -1.024 1.0486 8.16 0.736 0.5417 7.68 0.256 0.0655 8.16 0.736 0.5417 74.24 0.000 5.5398 x x− 2 (x x)−
  • 25. © 2008 Brooks/Cole, a division of25 Calculator Formula for s2 and s A computational formula for the sample variance is given by A little algebra can establish the sum of the square deviations, ( ) 2 2 2 xx x S (x x) x n = − = − ∑∑ ∑ ( ) 2 2 2 xx x xS ns n 1 n 1 − = = − − ∑∑
  • 26. © 2008 Brooks/Cole, a division of26 Calculations Revisited The values for s2 and s are exactly the same as were obtained earlier. x x2 7.52 56.5504 8.48 71.9104 7.36 54.1696 6.24 38.9376 7.68 58.9824 6.56 43.0336 6.40 40.9600 8.16 66.5856 7.68 58.9824 8.16 66.5856 74.24 556.6976 2 n 10, x 74.24, x 556.6976= = =∑ ∑ ( ) 2 2 xx 2 x S x n (74.24) 556.6976 5.53984 10 = − = − = ∑∑ 2 5.53984 s 0.615538 9 = = s 0.615538 0.78456= =
  • 27. © 2008 Brooks/Cole, a division of27 Quartiles and the Interquartile Range Lower quartile (Q1) = median of the lower half of the data set. Upper Quartile (Q3) = median of the upper half of the data set. Note: If n is odd, the median is excluded from both the lower and upper halves of the data. The interquartile range (iqr), a resistant measure of variability is given by iqr = upper quartile – lower quartile = Q3 – Q1
  • 28. © 2008 Brooks/Cole, a division of28 Quartiles and IQR Example 15 students with part time jobs were randomly selected and the number of hours worked last week was recorded. 2, 4, 7, 8, 9, 10, 10, 10, 11, 12, 12, 14, 15, 19, 25 19, 12, 14, 10, 12, 10, 25, 9, 8, 4, 2, 10, 7, 11, 15 The data is put in increasing order to get
  • 29. © 2008 Brooks/Cole, a division of29 With 15 data values, the median is the 8th value. Specifically, the median is 10. 2, 4, 7, 8, 9, 10, 10, 10, 11, 12, 12, 14, 15, 19, 25 Median Lower Half Upper Half Lower quartile Q1 Upper quartile Q3 Lower quartile = 8 Upper quartile = 14 Iqr = 14 - 8 = 6 Quartiles and IQR Example
  • 30. © 2008 Brooks/Cole, a division of30 Boxplots Constructing a Skeletal Boxplot 1. Draw a horizontal (or vertical) scale. 2. Construct a rectangular box whose left (or lower) edge is at the lower quartile and whose right (or upper) edge is at the upper quartile (the box width = iqr). Draw a vertical (or horizontal) line segment inside the box at the location of the median. 3. Extend horizontal (or vertical) line segments from each end of the box to the smallest and largest observations in the data set. (These lines are called whiskers.)
  • 31. © 2008 Brooks/Cole, a division of31 Skeletal Boxplot Example Using the student work hours data we have 0 5 10 15 20 25
  • 32. © 2008 Brooks/Cole, a division of32 Outliers An observations is an outlier if it is more than 1.5 iqr away from the closest end of the box (less than the lower quartile minus 1.5 iqr or more than the upper quartile plus 1.5 iqr. An outlier is extreme if it is more than 3 iqr from the closest end of the box, and it is mild otherwise.
  • 33. © 2008 Brooks/Cole, a division of33 Modified Boxplots A modified boxplot represents mild outliers by shaded circles and extreme outliers by open circles. Whiskers extend on each end to the most extreme observations that are not outliers.
  • 34. © 2008 Brooks/Cole, a division of34 Modified Boxplot Example Using the student work hours data we have 0 5 10 15 20 25 Lower quartile + 1.5 iqr = 14 - 1.5(6) = -1 Upper quartile + 1.5 iqr = 14 + 1.5(6) = 23 Smallest data value that isn’t an outlier Largest data value that isn’t an outlier Upper quartile + 3 iqr = 14 + 3(6) = 32 Mild Outlier
  • 35. © 2008 Brooks/Cole, a division of35 Modified Boxplot Example Consider the ages of the 79 students from the classroom data set from the slideshow Chapter 3. Iqr = 22 – 19 = 3 17 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21 21 21 21 21 22 22 22 22 22 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 25 26 28 28 30 37 38 44 47 Median Lower Quartile Upper Quartile Moderate Outliers Extreme Outliers Lower quartile – 3 iqr = 10 Lower quartile – 1.5 iqr =14.5 Upper quartile + 3 iqr = 31 Upper quartile + 1.5 iqr = 26.5
  • 36. © 2008 Brooks/Cole, a division of36 Smallest data value that isn’t an outlier Largest data value that isn’t an outlier Mild Outliers Extreme Outliers 15 20 25 30 35 40 45 50 Modified Boxplot Example Here is the modified boxplot for the student age data.
  • 37. © 2008 Brooks/Cole, a division of37 Modified Boxplot Example 50 45 40 35 30 25 20 15 Here is the same boxplot reproduced with a vertical orientation.
  • 38. © 2008 Brooks/Cole, a division of38 Comparative Boxplot Example 100 120 140 160 180 200 220 240 Females MalesG e n d e r Student Weight By putting boxplots of two separate groups or subgroups we can compare their distributional behaviors. Notice that the distributional pattern of female and male student weights have similar shapes, although the females are roughly 20 lbs lighter (as a group).
  • 39. © 2008 Brooks/Cole, a division of39 Comparative Boxplot Example Male Female 50 40 30 20 Gender Age Boxplots of Age by Gender (means are indicated by solid circles)red
  • 40. © 2008 Brooks/Cole, a division of40 Interpreting Variability Chebyshev’s Rule 2 1 100 1 % k   −    Chebyshev’s Rule Consider any number k, where k ≥ 1. Then the percentage of observations that are within k standard deviations of the mean is at least . 2 1 100 1 % k   −    Chebyshev’s Rule Consider any number k, where k ≥ 1. Then the percentage of observations that are within k standard deviations of the mean is at least .
  • 41. © 2008 Brooks/Cole, a division of41 For specific values of k Chebyshev’s Rule reads At least 75% of the observations are within 2 standard deviations of the mean.  At least 89% of the observations are within 3 standard deviations of the mean.  At least 90% of the observations are within 3.16 standard deviations of the mean.  At least 94% of the observations are within 4 standard deviations of the mean.  At least 96% of the observations are within 5 standard deviations of the mean.  At least 99% of the observations are with 10 standard deviations of the mean. Interpreting Variability Chebyshev’s Rule
  • 42. © 2008 Brooks/Cole, a division of42 Consider the student age data Example - Chebyshev’s Rule 17 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21 21 21 21 21 22 22 22 22 22 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 25 26 28 28 30 37 38 44 47 Color code: within 1 standard deviation of the mean within 2 standard deviations of the mean within 3 standard deviations of the mean within 4 standard deviations of the mean within 5 standard deviations of the mean
  • 43. © 2008 Brooks/Cole, a division of43 Summarizing the student age data Example - Chebyshev’s Rule Interval Chebyshev’s Actual within 1 standard deviation of the mean ≥ 0% 72/79 = 91.1% within 2 standard deviations of the mean ≥ 75% 75/79 = 94.9% within 3 standard deviations of the mean ≥ 88.8% 76/79 = 96.2% within 4 standard deviations of the mean ≥ 93.8% 77/79 = 97.5% within 5 standard deviations of the mean ≥ 96.0% 79/79 = 100% Notice that Chebyshev gives very conservative lower bounds and the values aren’t very close to the actual percentages.
  • 44. © 2008 Brooks/Cole, a division of44 Empirical Rule If the histogram of values in a data set is reasonably symmetric and unimodal (specifically, is reasonably approximated by a normal curve), then 1. Approximately 68% of the observations are within 1 standard deviation of the mean. 2. Approximately 95% of the observations are within 2 standard deviation of the mean. 3. Approximately 99.7% of the observations are within 3 standard deviation of the mean.
  • 45. © 2008 Brooks/Cole, a division of45 Z Scores The z score is how many standard deviations the observation is from the mean. A positive z score indicates the observation is above the mean and a negative z score indicates the observation is below the mean. The z score corresponding to a particular observation in a data set is observation meanzscore standard deviation −= The z score corresponding to a particular observation in a data set is observation meanzscore standard deviation −=
  • 46. © 2008 Brooks/Cole, a division of46 Computing the z score is often referred to as standardization and the z score is called a standardized score. Z Scores x xz score s −= The formula used with sample data is x xz score s −= The formula used with sample data is
  • 47. © 2008 Brooks/Cole, a division of47 Example A sample of GPAs of 38 statistics students appear below (sorted in increasing order) 2.00 2.25 2.36 2.37 2.50 2.50 2.60 2.67 2.70 2.70 2.75 2.78 2.80 2.80 2.82 2.90 2.90 3.00 3.02 3.07 3.15 3.20 3.20 3.20 3.23 3.29 3.30 3.30 3.42 3.46 3.48 3.50 3.50 3.58 3.75 3.80 3.83 3.97 x 3.0434 and s 0.4720= =
  • 48. © 2008 Brooks/Cole, a division of48 Example The following stem and leaf indicates that the GPA data is reasonably symmetric and unimodal. 2 0 2 233 2 55 2 667777 2 88899 3 0001 3 2222233 3 444555 3 7 3 889 Stem: Units digit Leaf: Tenths digit
  • 49. © 2008 Brooks/Cole, a division of49 Example Using the formula we compute the z scores and color code the values as we did in an earlier example. -2.21 -1.68 -1.45 -1.43 -1.15 -1.15 -0.94 -0.79 -0.73 -0.73 -0.62 -0.56 -0.52 -0.52 -0.47 -0.30 -0.30 -0.09 -0.05 0.06 0.23 0.33 0.33 0.33 0.40 0.52 0.54 0.54 0.80 0.88 0.93 0.97 0.97 1.14 1.50 1.60 1.67 1.96 x xz score s −=Using the formula we compute the z scores and color code the values as we did in an earlier example. -2.21 -1.68 -1.45 -1.43 -1.15 -1.15 -0.94 -0.79 -0.73 -0.73 -0.62 -0.56 -0.52 -0.52 -0.47 -0.30 -0.30 -0.09 -0.05 0.06 0.23 0.33 0.33 0.33 0.40 0.52 0.54 0.54 0.80 0.88 0.93 0.97 0.97 1.14 1.50 1.60 1.67 1.96 x xz score s −=
  • 50. © 2008 Brooks/Cole, a division of50 Example Interval Empirical Rule Actual within 1 standard deviation of the mean ≈ 68% 27/38 = 71% within 2 standard deviations of the mean ≈ 95% 37/38 = 97% within 3 standard deviations of the mean ≈99.7% 38/38 = 100% Notice that the empirical rule gives reasonably good estimates for this example.
  • 51. © 2008 Brooks/Cole, a division of51 Comparison of Chebyshev’s Rule and the Empirical Rule The following refers to the weights in the sample of 79 students. Notice that the stem and leaf diagram suggest the data distribution is unimodal but is positively skewed because of the outliers on the high side. Nevertheless, the results for the Empirical Rule are good. 10 3 11 37 12 011444555 13 000000455589 14 000000000555 15 000000555567 16 000005558 17 0000005555 18 0358 19 5 20 00 21 0 22 55 23 79 Stem: Hundreds & tens digits Leaf: Units digit
  • 52. © 2008 Brooks/Cole, a division of52 Comparison of Chebyshev’ Rule and the Empirical Rule Interval Chebyshev ’s Rule Empiric al Rule Actual within 1 standard deviation of the mean ≥ 0% ≈ 68% 56/79 = 70.9% within 2 standard deviations of the mean ≥ 75% ≈ 95% 75/79 = 94.9% within 3 standard deviations of the mean ≥ 88.8% ≈99.7% 79/79 = 100% Notice that even with moderate positive skewing of the data, the Empirical Rule gave a much more usable and meaningful result.