2. What you’ll learn
To create and interpret the following graphs:
Dotplot
Stem and leaf
  Regular Stem and Leaf
  Split Stem and Leaf
  Back-to-Back Stem and Leaf
Histogram
Time Plot
Ogive
3. To learn how to display and describe quantitative data, we will be using some baseball statistics. The following table shows the number of home runs in a single season for three well-known baseball players: Hank Aaron, Barry Bonds, and Babe Ruth.
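Player | Home runs in a single season
Hank Aaron (21 seasons) | 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20
Barry Bonds (16 seasons) | 16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73
Babe Ruth (15 seasons) | 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22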
4. Dotplot
Label the horizontal axis with the name of the
variable and title the graph
Scale the axis based on the values of the
variable
Mark a dot (we’ll use x’s) above the number on
the axis corresponding to each data value
[Dot plot: Number of Home Runs in a Single Season, Babe Ruth; horizontal axis from 20 to 60]
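The dotplot steps above can be mimicked in plain text. This is a minimal sketch, assuming whole-number data and one character column per value; the function name `text_dotplot` and the `tick` spacing are our own illustrative choices, not part of the original slides.

```python
from collections import Counter

# Babe Ruth's single-season home run totals (assumed copied from the slide 3 table).
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def text_dotplot(values, lo, hi, tick=5):
    """Return a text dotplot with one character column per whole number from
    lo to hi; each occurrence of a value stacks another 'x' above the axis."""
    counts = Counter(values)
    rows = []
    # Draw from the top of the tallest stack down toward the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("x" if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    # Axis line with a '+' at every tick mark.
    rows.append("".join("+" if (v - lo) % tick == 0 else "-" for v in range(lo, hi + 1)))
    # Tick labels, each starting in the column of the value it names.
    labels = [" "] * (hi - lo + 1)
    for v in range(lo, hi + 1, tick):
        col = v - lo
        labels[col:col + len(str(v))] = list(str(v))
    rows.append("".join(labels).rstrip())
    return "\n".join(rows)

print("Number of Home Runs in a Single Season: Babe Ruth")
print(text_dotplot(RUTH, 20, 60))
```

The tallest stack sits over 46, the only value that occurs three times.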
5. Describing a Distribution
We describe a distribution (the values the
variable takes on and how often it takes
these values) using the acronym SOCS
Shape: We describe the shape of a distribution in one of two ways:
Symmetric/Approx. Symmetric
[Dot plots: a symmetric distribution and a uniform distribution]
6. Skewed
[Dot plots: a right-skewed distribution, with its “tail” to the right, and a left-skewed distribution, with its “tail” to the left]
Notice that the direction of the “skew” is the same
direction as the “tail”
7. Outliers: These are observations that we would consider “unusual”: pieces of data that don’t “fit” the overall pattern of the data.
Babe Ruth had two seasons that appear to be somewhat different from the rest of his career. These may be “outliers”.
[Dot plot: Number of Home Runs in a Single Season, Babe Ruth; two seasons marked “Unusual observation???”]
(We’ll learn a numerical way to determine if observations are truly “unusual” later.)
The season in which Barry Bonds hit 73 home runs does not appear to fit the overall pattern. This piece of data may be an outlier.
[Dot plot: Number of Home Runs in a Single Season, Barry Bonds; the 73-home-run season marked “Unusual observation???”]
8. Center: A single value that describes the entire
distribution. A “typical” value that gives a concise
summary of the whole batch of numbers.
[Dot plot: Number of Home Runs in a Single Season, Babe Ruth]
A typical season for Babe Ruth appears to be approximately 46 home runs.
*We’ll learn about three different numerical measures of center in the next section.
9. Spread: Since we know that not everyone is typical, we also need to talk about the variation of a distribution. Are the values of the distribution tightly clustered around the center, making them easy to predict, or do they vary a great deal from the center, making prediction more difficult?
[Dot plot: Number of Home Runs in a Single Season, Babe Ruth]
Babe Ruth’s number of home runs in a single season varies from a low of 22 to a high of 60.
*We’ll learn about three different numerical measures of spread in the next section.
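Ordering the data is a quick way to check the low and high values quoted for the spread. A minimal sketch, assuming the 15 Babe Ruth values are exactly those in the slide 3 table; the name `RUTH` is our own.

```python
# Babe Ruth's single-season home run totals (assumed copied from the slide 3 table).
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

ordered = sorted(RUTH)              # least to greatest
low, high = ordered[0], ordered[-1]  # the extremes that bound the spread
print(ordered)
print(f"Low of {low}, high of {high}")
```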
10. Distribution Description using SOCS
The distribution of Babe Ruth’s number of home runs in a single season is approximately symmetric (1) with two possible unusual observations at 22 and 25 home runs (2). He typically hits about 46 home runs in a season (3). Over his career, the number of home runs has varied from a low of 22 to a high of 60 (4).
(1) Shape  (2) Outliers  (3) Center  (4) Spread
11. Stem and Leaf Plot
Creating a stem and leaf plot:
Order the data points from least to greatest.
Separate each observation into a stem (all but the rightmost digit) and a leaf (the final digit). Ex. 123 -> 12 (stem) : 3 (leaf)
In a T-chart, write the stems vertically in increasing order on the left side of the chart.
On the right side of the chart, write each leaf to the right of its stem, spacing the leaves equally.
Include a key and title for the graph.
Number of Home Runs in a Single Season
Hank Aaron
1 | 3
2 | 04679
3 | 0244899
4 | 00444457
Key: 4 | 6 = 46
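The steps above translate almost directly into code. In this minimal sketch (the helper name `stem_and_leaf` is hypothetical), `divmod(v, 10)` splits each value into its stem and leaf; it assumes non-negative whole numbers.

```python
from collections import defaultdict

# Hank Aaron's single-season home run totals (assumed copied from the slide 3 table).
AARON = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def stem_and_leaf(values):
    """Return stem-and-leaf rows: stem = all but the rightmost digit,
    leaf = the final digit. Empty stems between min and max are kept."""
    leaves = defaultdict(list)
    for v in sorted(values):            # order the data from least to greatest
        stem, leaf = divmod(v, 10)      # separate into stem and leaf
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    # Stems in increasing order, leaves written to the right of their stem.
    return [f"{stem} | {''.join(leaves[stem])}" for stem in range(lo, hi + 1)]

print("Number of Home Runs in a Single Season: Hank Aaron")
for row in stem_and_leaf(AARON):
    print(row)
print("Key: 4 | 6 = 46")
```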
12. Split Stem and Leaf Plot
If the data in a distribution is concentrated in just a few stems, the picture may be more descriptive if we “split” the stems.
When we “split” stems, we want the same number of digits to be possible in each stem. This means that each original stem can be split into 2 new stems (leaves 0-4 and 5-9, five possible digits each) or 5 new stems (leaves 0-1, 2-3, 4-5, 6-7, 8-9, two possible digits each).
A good rule of thumb is to have a minimum of 5 stems overall.
Let’s look at how splitting stems changes the
look of the distribution of Hank Aaron’s home
run data.
13. Split each stem into 2 new stems. This means that the first stem includes the leaves 0-4 and the second stem has the leaves 5-9.
Splitting the stems helps us to “see” the shape of the distribution in this case.
Number of Home Runs in a Single Season
Hank Aaron
1 | 3
1 |
2 | 04
2 | 679
3 | 0244
3 | 899
4 | 004444
4 | 57
Key: 4 | 6 = 46
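Splitting stems only changes how leaves are grouped. This minimal sketch (hypothetical helper `split_stem_and_leaf`) assumes two-way splits, leaves 0-4 in the first copy of a stem and 5-9 in the second, and keeps empty half-stems such as the second "1" row.

```python
from collections import defaultdict

# Hank Aaron's single-season home run totals (assumed copied from the slide 3 table).
AARON = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def split_stem_and_leaf(values):
    """Split each stem in two: the first copy holds leaves 0-4, the second 5-9."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1          # which copy of the stem gets this leaf
        rows[(stem, half)].append(str(leaf))
    lo, hi = min(rows), max(rows)
    out = []
    # Walk every (stem, half) pair from lo to hi so empty half-stems still appear.
    for stem in range(lo[0], hi[0] + 1):
        for half in (0, 1):
            if lo <= (stem, half) <= hi:
                out.append(f"{stem} | {''.join(rows[(stem, half)])}".rstrip())
    return out

for row in split_stem_and_leaf(AARON):
    print(row)
```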
14. Back-to-Back Stem and Leaf
Back-to-back stem and leaf plots allow us to quickly compare two distributions.
Use SOCS to make comparisons between distributions.
Number of Home Runs in a Single Season
 Aaron |   | Ruth
     3 | 1 |
       | 1 |
    40 | 2 | 2
   976 | 2 | 5
  4420 | 3 | 4
   998 | 3 | 5
444400 | 4 | 11
    75 | 4 | 66679
       | 5 | 44
       | 5 | 9
       | 6 | 0
Key: 4 | 6 = 46
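A back-to-back plot is two split stem plots sharing one column of stems, with the left-hand leaves written in reverse so the smallest leaf sits next to the stem. This minimal sketch (hypothetical helpers `split_leaves` and `back_to_back`) rebuilds the Aaron/Ruth display from the slide 3 data, assuming two-way split stems.

```python
from collections import defaultdict

# Single-season home run totals (assumed copied from the slide 3 table).
AARON = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def split_leaves(values):
    """Map (stem, half) -> leaf digits in increasing order.
    half 0 holds leaves 0-4 and half 1 holds leaves 5-9."""
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, 0 if leaf <= 4 else 1)] += str(leaf)
    return rows

def back_to_back(left, right):
    """Back-to-back split stem plot sharing one column of stems."""
    l, r = split_leaves(left), split_leaves(right)
    lo, hi = min([*l, *r]), max([*l, *r])
    width = max(len(s) for s in l.values())
    out = []
    for stem in range(lo[0], hi[0] + 1):
        for half in (0, 1):
            key = (stem, half)
            if lo <= key <= hi:
                # Reverse the left leaves so the smallest leaf sits next to the stem.
                left_leaves = l.get(key, "")[::-1]
                out.append(f"{left_leaves:>{width}} | {stem} | {r.get(key, '')}".rstrip())
    return out

print("Number of Home Runs in a Single Season (Aaron | stem | Ruth)")
for row in back_to_back(AARON, RUTH):
    print(row)
print("Key: 4 | 6 = 46")
```

Note that Ruth's 59 lands on the second "5" stem, since its leaf is in the 5-9 half.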
15. Advantages and Disadvantages of dotplots/stem and leaf plots
Advantages:
Preserves each piece of data
Shows features of the distribution with regard to shape, such as clusters, gaps, outliers, etc.
Disadvantages:
If creating by hand, large data sets can be cumbersome
Data that is widely varied may be difficult to graph
16. Histograms
A histogram is one of the most common graphs
used for quantitative variables.
Although a histogram looks like a bar chart, there are some important differences:
In a histogram, the “bars” touch each other.
Histograms do not necessarily preserve individual data pieces.
Changing the “scale” or “bin width” can drastically alter the picture of the distribution, so use caution when describing a distribution from a histogram alone.
17. Creating a histogram
Divide the range of data into classes of equal width. Count the number of observations in each class. (Remember that the width is somewhat arbitrary and you might choose a different width than someone else.)
Barry Bonds: the data range from 16 to 73, so we choose a width of 5 for our classes:
15 ≤ # of HR ≤ 19
...
70 ≤ # of HR ≤ 74
We can then determine the counts for each “bin”.
18. So the frequency distribution looks like:
Class | Frequency
15-19 | 2
20-24 | 1
25-29 | 2
30-34 | 4
35-39 | 2
40-44 | 2
45-49 | 2
50-54 | 0
55-59 | 0
60-64 | 0
65-69 | 0
70-74 | 1
The horizontal axis represents the variable values, so using the lower bound of each class to scale is appropriate.
The vertical axis can represent:
Frequency
Relative frequency
Cumulative frequency
Relative cumulative frequency
We’ll use frequency.
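Counting observations per class is easy to automate. This minimal sketch (hypothetical helper `frequency_table`) assumes whole-number data, so a class written 15-19 covers 15 ≤ # of HR ≤ 19, and that classes start at 15 with width 5 as chosen above.

```python
# Barry Bonds's single-season home run totals, 1986-2001 (assumed from the slide 3 table).
BONDS = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

def frequency_table(values, start, width):
    """Count observations in equal-width classes start..start+width-1, and so on,
    until the class containing the largest value has been counted."""
    top = max(values)
    table = []
    lo = start
    while lo <= top:
        hi = lo + width - 1
        count = sum(lo <= v <= hi for v in values)   # True counts as 1
        table.append((f"{lo}-{hi}", count))
        lo += width
    return table

for label, count in frequency_table(BONDS, start=15, width=5):
    print(f"{label:>6}  {count}")
```

Changing `width` (for example to 10) regroups the same 16 seasons into different bins, which is why histogram shape can depend on this choice.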
19. Label and scale your axes. Title your graph.
Draw a bar that represents the frequency for each class. Remember that the bars of the histogram should touch each other.
[Histogram: Barry Bonds Histogram; horizontal axis HomeRuns, vertical axis Count]
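As a rough illustration of drawing touching bars, this minimal sketch prints a vertical text histogram from the frequency table, with the axis scaled at each class’s lower bound. The helper name `text_histogram` and the 3-character bar width are our own assumptions.

```python
# Lower bound of each class and the Bonds class counts from the frequency table.
LOWER = list(range(15, 75, 5))
COUNTS = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]

def text_histogram(counts, bar_width=3):
    """Return the rows of a vertical text histogram, tallest level first.
    Bars sit side by side with no gap, so neighbouring bars touch."""
    rows = []
    for level in range(max(counts), 0, -1):
        row = "".join(("#" if c >= level else " ") * bar_width for c in counts)
        rows.append(f"{level} |{row.rstrip()}")
    return rows

for row in text_histogram(COUNTS):
    print(row)
print("  +" + "-" * (3 * len(COUNTS)))
print("   " + "".join(f"{lo:<3}" for lo in LOWER))   # scale at each lower bound
```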
20. Interpretation
We interpret a histogram in the same way
we interpret a dotplot or stem and leaf
plot.
ALWAYS use SOCS: Shape, Outliers, Center, Spread
21. Time Plots
Sometimes, our data is collected at
intervals over time and we are looking for
changes or patterns that have occurred.
We use a time plot for this type of data
A time plot uses both the horizontal and
vertical axes.
The horizontal axis represents the time
intervals
The vertical axis represents the variable
values
22. Creating a Time Plot
Label and scale the axes. Title your graph.
Plot a point corresponding to the data taken at each time interval.
A line segment drawn between consecutive points may be helpful to see patterns in the data.
[Time plot: Barry Bonds Line Scatter Plot; horizontal axis Year (1986 to 2002), vertical axis BondsHR (10 to 80)]
Year | HR | Year | HR
1986 | 16 | 1994 | 37
1987 | 25 | 1995 | 33
1988 | 24 | 1996 | 42
1989 | 19 | 1997 | 40
1990 | 33 | 1998 | 37
1991 | 25 | 1999 | 34
1992 | 34 | 2000 | 49
1993 | 46 | 2001 | 73
23. Describing Time Plots
When describing time plots, you should look for trends in the data.
[Time plot: Barry Bonds Line Scatter Plot; horizontal axis Year (1986 to 2002), vertical axis BondsHR (10 to 80)]
Although the number of home runs does not show a constant increase from year to year, we note that overall, the number of home runs hit by Barry Bonds has increased over time, with the most notable increase being between 1999 and 2001.
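The year-to-year changes behind this description can be computed directly: each change is the rise or fall of one line segment on the time plot. A minimal sketch, assuming the Year/HR pairs from the slide 22 table; the variable names are our own.

```python
# Barry Bonds's home runs by season (assumed copied from the slide 22 table).
BONDS_BY_YEAR = {1986: 16, 1987: 25, 1988: 24, 1989: 19, 1990: 33, 1991: 25,
                 1992: 34, 1993: 46, 1994: 37, 1995: 33, 1996: 42, 1997: 40,
                 1998: 37, 1999: 34, 2000: 49, 2001: 73}

years = sorted(BONDS_BY_YEAR)
# Change from each season to the next: the rise or fall of each line segment.
changes = {(a, b): BONDS_BY_YEAR[b] - BONDS_BY_YEAR[a] for a, b in zip(years, years[1:])}

drops = [pair for pair, d in changes.items() if d < 0]
biggest = max(changes, key=changes.get)

print("Year-to-year drops:", len(drops), "of", len(changes))
print("Largest one-year increase:", biggest, changes[biggest])
print("1999 to 2001 change:", BONDS_BY_YEAR[2001] - BONDS_BY_YEAR[1999])
```

Eight of the fifteen changes are drops, so the increase is not constant, yet the largest jump (2000 to 2001) sits inside the 1999-2001 stretch.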
24. Relative frequency, Cumulative frequency, Percentiles, and Ogives
Sometimes we are interested in describing the relative position of an observation.
For example: you have no doubt been told at one time or another that you scored at the 80th percentile. This means that 80% of the people taking the test scored the same as or lower than you did.
How can we model this?
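One way to model the “same or lower” idea is to count the share of observations at or below a value. This is a minimal sketch: the helper `percentile_rank` and the test scores are hypothetical, made up for illustration only, and other textbooks use slightly different percentile conventions (for example, “strictly below”).

```python
def percentile_rank(values, x):
    """Percent of observations at or below x (the 'same or lower' meaning)."""
    return 100 * sum(v <= x for v in values) / len(values)

# Hypothetical test scores for ten students (illustration only, not real data).
scores = [55, 62, 68, 70, 74, 77, 81, 85, 90, 96]
print(percentile_rank(scores, 85))   # 8 of the 10 scores are 85 or lower
```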
25. Ogive
(Relative cumulative frequency graph)
We first start by creating a frequency table. We’ll look at how each column is created in the next few slides.
# of home runs in a season | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency
15-19 | 2 | 0.125 | 2 | 0.125
20-24 | 1 | 0.0625 | 3 | 0.1875
25-29 | 2 | 0.125 | 5 | 0.3125
30-34 | 4 | 0.25 | 9 | 0.5625
35-39 | 2 | 0.125 | 11 | 0.6875
40-44 | 2 | 0.125 | 13 | 0.8125
45-49 | 2 | 0.125 | 15 | 0.9375
50-54 | 0 | 0 | 15 | 0.9375
55-59 | 0 | 0 | 15 | 0.9375
60-64 | 0 | 0 | 15 | 0.9375
65-69 | 0 | 0 | 15 | 0.9375
70-74 | 1 | 0.0625 | 16 | 1
26. Relative Frequency
The “# of home runs in a season” and “Frequency” columns are the same as the ones we created for the histogram.
To find the values for the “Relative Frequency” column, compute:
Relative Frequency = Frequency / Total # of observations
# of home runs in a season | Frequency | Relative Frequency*
15-19 | 2 | 0.125
20-24 | 1 | 0.0625
25-29 | 2 | 0.125
30-34 | 4 | 0.25
35-39 | 2 | 0.125
40-44 | 2 | 0.125
45-49 | 2 | 0.125
50-54 | 0 | 0
55-59 | 0 | 0
60-64 | 0 | 0
65-69 | 0 | 0
70-74 | 1 | 0.0625
*Within rounding, this column should sum to 1.
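The division above can be applied to every class at once. A minimal sketch, assuming the Bonds class labels and frequencies from the histogram table; the names `CLASSES`, `FREQ`, and `rel_freq` are our own.

```python
# Bonds class labels and frequencies from the histogram's frequency table.
CLASSES = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44",
           "45-49", "50-54", "55-59", "60-64", "65-69", "70-74"]
FREQ = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]

total = sum(FREQ)                         # total # of observations
rel_freq = [f / total for f in FREQ]      # frequency / total # of observations

for label, f, r in zip(CLASSES, FREQ, rel_freq):
    print(f"{label:>6} {f:>3} {r:>8.4f}")
print("Sum of relative frequencies:", sum(rel_freq))
```

Because 16 is a power of two, these particular fractions are exact in floating point and the column sums to exactly 1; with other totals, expect a sum that is only 1 within rounding.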
27. Cumulative Frequency
Cumulative frequency simply adds the counts in the frequency column that fall in or below the current class level.
For example, to find the “13”, add the frequencies of the classes from 15-19 through 40-44: 2+1+2+4+2+2=13
# of home runs in a season | Frequency | Relative Frequency | Cumulative Frequency
15-19 | 2 | 0.125 | 2
20-24 | 1 | 0.0625 | 3
25-29 | 2 | 0.125 | 5
30-34 | 4 | 0.25 | 9
35-39 | 2 | 0.125 | 11
40-44 | 2 | 0.125 | 13
45-49 | 2 | 0.125 | 15
50-54 | 0 | 0 | 15
55-59 | 0 | 0 | 15
60-64 | 0 | 0 | 15
65-69 | 0 | 0 | 15
70-74 | 1 | 0.0625 | 16
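A cumulative frequency column is a running total of the frequency column. This minimal sketch uses Python’s standard `itertools.accumulate`, assuming the Bonds frequencies above.

```python
from itertools import accumulate

# Bonds frequencies for classes 15-19, 20-24, ..., 70-74.
FREQ = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]

# Running total: the number of seasons that fall in or below each class.
cum_freq = list(accumulate(FREQ))
print(cum_freq)
```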
28. Relative Cumulative Frequency
Relative cumulative frequency divides the cumulative frequency by the total number of observations.
For Example: .8125 = 13/16
# of home runs in a season | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency
15-19 | 2 | 0.125 | 2 | 0.125
20-24 | 1 | 0.0625 | 3 | 0.1875
25-29 | 2 | 0.125 | 5 | 0.3125
30-34 | 4 | 0.25 | 9 | 0.5625
35-39 | 2 | 0.125 | 11 | 0.6875
40-44 | 2 | 0.125 | 13 | 0.8125
45-49 | 2 | 0.125 | 15 | 0.9375
50-54 | 0 | 0 | 15 | 0.9375
55-59 | 0 | 0 | 15 | 0.9375
60-64 | 0 | 0 | 15 | 0.9375
65-69 | 0 | 0 | 15 | 0.9375
70-74 | 1 | 0.0625 | 16 | 1
Sum | 16 | 1 | |
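Dividing each running total by the number of observations gives the relative cumulative column. A minimal sketch, assuming the same Bonds frequencies; the name `rel_cum` is our own.

```python
from itertools import accumulate

# Bonds frequencies for classes 15-19, 20-24, ..., 70-74.
FREQ = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]
total = sum(FREQ)

# Cumulative frequency / total number of observations, class by class.
rel_cum = [c / total for c in accumulate(FREQ)]
print(rel_cum)
```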
29. Creating the Ogive
Label and scale the axes
Horizontal: Variable
Vertical: Relative Cumulative Frequency
(percentile)
Plot a point corresponding to the relative cumulative frequency of each class interval at the left endpoint of the next class interval (for example, 0.125 for the 15-19 class is plotted at 20).
The last point you plot should be at a height of 100%.
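The plotting rule above can be written as: x = lower bound of the class + class width, y = relative cumulative frequency. A minimal sketch, assuming classes that start at 15 with width 5 and the Bonds frequencies; `points` is our own name.

```python
from itertools import accumulate

# Class lower bounds (15, 20, ..., 70), class width, and Bonds frequencies.
LOWER = list(range(15, 75, 5))
WIDTH = 5
FREQ = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]

total = sum(FREQ)
# Plot each class's relative cumulative frequency at the left endpoint of the
# NEXT class, i.e. this class's lower bound plus the class width.
points = [(lo + WIDTH, cum / total) for lo, cum in zip(LOWER, accumulate(FREQ))]

for x, y in points:
    print(f"({x}, {y})")
```

The final point, (75, 1.0), is at a height of 100%, as the rule requires.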
30. # of home runs in a season | Relative Cumulative Frequency
15-19 | 0.125
20-24 | 0.1875
25-29 | 0.3125
30-34 | 0.5625
35-39 | 0.6875
40-44 | 0.8125
45-49 | 0.9375
50-54 | 0.9375
55-59 | 0.9375
60-64 | 0.9375
65-69 | 0.9375
70-74 | 1
[Ogive: Barry Bonds Scatter Plot; horizontal axis HR (10 to 80), vertical axis Relcumfreq (0.0 to 1.2)]
A line segment from point to point can be added for analysis.
31. Types of Info from Ogives
Finding an individual observation within the
distribution
Find the relative standing of a season in which
Barry Bonds hit 40 home runs
[Ogive: Barry Bonds Scatter Plot; horizontal axis HR (10 to 80), vertical axis Relcumfreq (0.0 to 1.2)]
A season with 40 home runs lies at about the 69th percentile (a height of 0.6875 = 11/16 on the ogive), meaning that approximately 69% of his seasons had fewer than 40 home runs.
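Reading the ogive by eye amounts to following the straight segment between the two plotted points around a value. This minimal sketch assumes the points are joined by line segments and placed by the slide 29 rule; the helper name `ogive_height` is hypothetical. Under those assumptions the ogive at 40 sits at 11/16 = 0.6875.

```python
from bisect import bisect_left
from itertools import accumulate

FREQ = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]   # Bonds, classes 15-19 ... 70-74
total = sum(FREQ)
# Ogive points: (left endpoint of the next class, relative cumulative frequency).
POINTS = [(lo + 5, cum / total) for lo, cum in zip(range(15, 75, 5), accumulate(FREQ))]

def ogive_height(points, x):
    """Read the ogive at x by following the straight segment between the two
    plotted points that surround x."""
    xs = [px for px, _ in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the plotted points")
    # Index of the segment's right-hand point (clamped so x == xs[0] works too).
    i = min(max(bisect_left(xs, x), 1), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(ogive_height(POINTS, 40))
```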
32. Locating an observation corresponding to a percentile
How many home runs must be hit in a season to correspond to the 75th percentile?
[Ogive: Barry Bonds Scatter Plot; horizontal axis HR (10 to 80), vertical axis Relcumfreq (0.0 to 1.2)]
To reach the 75th percentile of Mr. Bonds’s seasons, a season needs approximately 42 home runs.
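Going the other way, from a percentile to a number of home runs, means finding where the ogive first reaches a given height. This minimal sketch assumes the same line-segment ogive and plotting rule as before; the helper name `ogive_value_at` is hypothetical.

```python
from itertools import accumulate

FREQ = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]   # Bonds, classes 15-19 ... 70-74
total = sum(FREQ)
# Ogive points: (left endpoint of the next class, relative cumulative frequency).
POINTS = [(lo + 5, cum / total) for lo, cum in zip(range(15, 75, 5), accumulate(FREQ))]

def ogive_value_at(points, p):
    """Return the x value where the ogive (points joined by straight segments)
    first reaches height p."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Only rising segments can reach a new height; flat ones are skipped.
        if y1 > y0 and y0 <= p <= y1:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p is outside the range of the ogive")

print(ogive_value_at(POINTS, 0.75))
```

The 0.75 height falls on the segment from (40, 0.6875) to (45, 0.8125), halfway along, giving 42.5, in line with the graphical reading of about 42.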
33. A little History on the word Ogive
(sometimes called an Ogee)
It was first used by Sir Francis
Galton, who borrowed a term from
architecture to describe the
cumulative normal curve (more
about that next chapter).
The ogive in architecture was a
common decorative element in
many of the English churches
around 1400. The picture at right
shows the door to the Church of
The Holy Cross at the village of
Caston in Norfolk. In this image you
can see the use of the ogive in the
design of the door and repeated in
the windows above.
Find more about this term at
Mathwords.