The document discusses the 5-number summary, which includes the minimum, first quartile, median, third quartile, and maximum values of a dataset. It explains how to calculate these values and use them to construct a boxplot, which visually displays the spread and outliers of a dataset. An outlier is defined using Tukey's rule as a data point more than 1.5 times the interquartile range below the first quartile or above the third quartile.
2. The 5-number summary
• An example 5-number summary:
5 12 14 17 21
• 5 is the minimum value in the set
• 12 is the first quartile
• 14 is the median
• 17 is the third quartile
• 21 is the maximum value
9/4/2011 Slide 2
3. The 5-number summary
• Quartiles?
– The 1st quartile is the point at which 25% of
the data is below and 75% above.
– The 2nd quartile (the MEDIAN) is the point at
which 50% of the data is below and 50%
above.
– The 3rd quartile is the point at which 75% of
the data is below and 25% above
9/4/2011 Slide 3
4. The 5-number summary
• Back to the example
5 12 14 17 21
• So, if these are a scores on a 22 point quiz from a
class…
– The lowest score in the class was 5 points
– 25% of students earned 12 or fewer points (75% earned 12 or
more)
– 50% of students earned 14 or fewer points (50% earned 14 or
more) – the median
– 75% of students earned 17 or fewer points (25% earned 17 or
more)
– The highest score in the class was 21 points
9/4/2011 Slide 4
5. The 5-number summary
• Finding the minimum and maximum is
pretty simple
• Finding the median was discussed in
lesson 1.6
• So to find the quartiles…
9/4/2011 Slide 5
6. The 5-number summary
• Finding the quartiles
– To find the 1st quartile, simply find the median
of the lower half of the data (the lower half of
the data does NOT include the median of the
data).
– To find the 3rd quartile, simply find the median
of the upper half of the data (the upper half of
the data does NOT include the median of the
data).
9/4/2011 Slide 6
7. The 5-number summary
2
• Example of finding
3 Median of lower
the quartiles Lower Half of Data 3 half is 1st
quartile = 3
5
Median is (8+9)/2 or 8.5 8
9
9
Median of
Upper Half of Data 12 upper half is 3rd
quartile = 12
13
13
9/4/2011 Slide 7
8. The 5-number summary
• Find the 5-number 17.1 2.1
summary of the 5.8 2.0
following data (which
are salaries (in 5.0 1.0
millions) of an NBA 4.5 1.0
team:
4.3 0.8
4.2 0.7
• Go to the next slide to
check your work. 3.1 0.3
9/4/2011 Slide 8
9. The 5-number summary
• Your answer should be:
0.3 1.0 2.6 4.5 17.1
• Another measure of spread is found from the 5-number
summary, the interquartile range (or IQR).
• The IQR is simply the 3rd quartile (Q3) minus the 1st
quartile (Q1).
• So, IQR = Q3 – Q1
• This is simply a variation on the definition of range.
9/4/2011 Slide 9
10. Outliers and extreme values
• The 17.1 million dollar salary is quite high.
• Is it an outlier among the data?
• Tukey’s rule: A data point is an outlier if it
falls more than 1.5 IQR below the 1st
quartile OR 1.5 IQR above the 3rd quartile.
9/4/2011 Slide 10
11. Outliers and extreme values
• Recall the summary: 0.3 1.0 2.6 4.5 17.1
• The IQR = 4.5 – 1.0 = 3.5
• Check for the high outlier:
– Is 17.1 more than 1.5IQR above the 3rd quartile (4.5)?
– Is 17.1 > 1.5(3.5) + 4.5 ?
– Is 17.1 > 9.75 ?
– Yes, so the $17.1 million salary is an outlier on the
team’s payroll.
9/4/2011 Slide 11
12. Boxplots
• The boxplot displays the 5-number
summary: minimum, lower quartile (Q1),
median upper quartile (Q3), maximum.
• It also shows the Inter-quartile range (IQR)
and outliers.
• It also gives us information about the
symmetry of the distribution.
9/4/2011 Slide 12
13. Example Boxplot
The largest
observation
(max) is
approximately
4.
The smallest
observation (min)
is approximately
0.
The first The second The third quartile
quartile (Q1) is quartile (Q2) is (Q3) is approximately
approximately approximately 2.7.
9/4/2011 1. 1.6. Slide 13
14. Example Boxplot (modified)
Outliers show up as
circles. In this case, it
is now the max.
This is the largest
observation that is
NOT an out outlier.
9/4/2011 Slide 14
15. Boxplots
• Example
– Verbal GMAT scores of 12 students: 10, 22, 24, 27,
31, 33, 39, 40, 42, 43, 44, 45
– The 5-number summary is:
10 25.5 36 42.5 45
– Now the box-plot is constructed as follows:
• The line inside the box indicates the median.
• The left side of this box indicates the lower quartile (Q1).
• The right side of this box indicates the upper quartile (Q3).
• A straight line is then drawn from the lowest value of this
distribution to the box (at Q1) and another straight line from
the box (at Q3) to the highest value of this distribution.
9/4/2011 Slide 15
17. Modified Boxplot
• Consider the 5-number summary of NBA
salaries: 0.3 1.0 2.6 4.5 17.1
• The modified boxplot shows outliers
separately.
• On the next slide, note how the outlier
17.1 is plotted separately.
• The line only goes to the maximum (or
minimum) point that is NOT an outlier.
9/4/2011 Slide 17
19. Using Technology to Make
Boxplots
• Use StatCrunch
• On your calculator:
– To create a boxplot:
• Enter your list of data.
• Now select STAT PLOT by pressing the 2nd key followed by
the key labeled Y=.
• Press the ENTER key, then select the option of ON, and select
the boxplot type that shows outliers (as dots)
• The Xlist should indicate the name of your list (L1 or other),
and the Freq value should be set to 1.
• Now press the ZOOM key and select option 9 (ZoomStat).
• Press the ENTER key and the boxplot should be displayed.
• You can use the TRACE key along with the arrow keys in order
to read the values of the five number summary (Min, Q1, Med,
Q3, Max).
9/4/2011 Slide 19
20. The 5-number summary, boxplots
and outliers
• This concludes the presentation.
9/4/2011 Slide 20