4 CREATING GRAPHS: A PICTURE REALLY IS WORTH A THOUSAND WORDS
4: MEDIA LIBRARY
Premium Videos
Core Concepts in Stats Video
· Examining Data: Tables and Figures
Lightboard Lecture Video
· Creating a Simple Chart
Time to Practice Video
· Chapter 4: Problem 3
Difficulty Scale
(moderately easy but not a cinch)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding why a picture is really worth a thousand words
· Creating a histogram and a polygon
· Understanding the different shapes of different distributions
· Using SPSS to create incredibly cool charts
· Creating different types of charts and understanding their
application and uses
WHY ILLUSTRATE DATA?
In the previous two chapters, you learned about the two most
important types of descriptive statistics—measures of central
tendency and measures of variability. Both of these provide you
with the one best number for describing a group of data (central
tendency) and a number reflecting how diverse, or different,
scores are from one another (variability).
What we did not do, and what we will do here, is examine how
differences in these two measures result in different-looking
distributions. Numbers alone (such as M = 3 and s = 3) may be
important, but a visual representation is a much more effective
way of examining the characteristics of a distribution as well as
the characteristics of any set of data.
So, in this chapter, we’ll learn how to visually represent a
distribution of scores as well as how to use different types of
graphs to represent different types of data.
CORE CONCEPTS IN STATS VIDEO
Examining Data: Tables and Figures
Examining data helps find
data entry errors, evaluate research methodology, identify
outliers, and determine the shape of a distribution in a data
set. Researchers typically examine collected data in two ways,
by creating tables and figures. Imagine you asked a group of
friends to rate a movie they've seen on a one to five scale.
A table helps identify the variable and the possible values of
the variable. The sample size, often referred to as n, is 14
because there are ratings reported from 14 people. This is how
large the total sample is. From this, we can determine how
many in the sample have each value of the variable. We can
also determine the percentage that the sample has of each
possible value. Figures display variables from the table.
Nominal and ordinal variables can be depicted with bar charts,
while interval and ratio variables can be depicted using
histograms and frequency polygons. For this data set, we can
use a bar chart. Distributions of data can be characterized
along three aspects or dimensions, modality, symmetry, and
variability. In a unimodal distribution, a small range of values
has the greatest frequency or mode of the set. However, it's
possible for a distribution to have more than one mode. For a
bimodal distribution, we see two values that seem to occur
with the greatest frequency. A distribution is symmetrical if
folding it in half makes each half mirror the other. When a
distribution isn't symmetrical, it's called asymmetrical or
skewed. This distribution is often called positively skewed
because the tail-- or narrow end-- of the distribution is on the
right end. Variability in a distribution is the amount of
spread or dispersion of values for a variable. Peaked
distributions look like a tall mountain and reflect little
variability in the scores. This means that almost everyone has
given the same score to the movie. A flat distribution has a
lot of variability, where almost everyone's scores are different.
In between is the normal distribution, which is neither peaked
nor flat and is both unimodal and symmetrical.
TEN WAYS TO A GREAT FIGURE (EAT LESS AND
EXERCISE MORE?)
Whether you create illustrations by hand or use a computer
program, the same principles of decent design apply. Here are
10 to copy and put above your desk:
1. Minimize chart or graph junk. “Chart junk” (a close cousin to
“word junk”) happens when you use every function, every
graph, and every feature a computer program has to make your
charts busy, full, and uninformative. With graphs, more is
definitely less.
2. Plan out your chart before you start creating the final
copy. Use graph paper even if you will be using a computer
program to generate the graph. Actually, why not just use your
computer to generate and print out graph paper?
(Try www.printfreegraphpaper.com.)
3. Say what you mean and mean what you say—no more and no
less. There’s nothing worse than a cluttered (with too much text
and fancy features) graph to confuse the reader.
4. Label everything so nothing is left to the misunderstanding of
the audience.
5. A graph should communicate only one idea—a description of
data or a demonstration of a relationship.
6. Keep things balanced. When you construct a graph, center
titles and labels.
7. Maintain the scale in a graph. “Scale” refers to the
proportional relationship between the horizontal and vertical
axes. This ratio should be about 3 to 4, so a graph that is 3
inches wide will be about 4 inches tall.
8. Simple is best and less is more. Keep the chart simple but not
simplistic. Convey one idea as straightforwardly as possible,
with distracting information saved for the accompanying text.
Remember, a chart or graph should be able to stand alone, and
the reader should be able to understand the message.
9. Limit the number of words you use. Too many words, or
words that are too large (both in terms of physical size and
idea-wise), can detract from the visual message your chart
should convey.
10. A chart alone should convey what you want to say. If it
doesn’t, go back to your plan and try it again.
FIRST THINGS FIRST: CREATING A FREQUENCY
DISTRIBUTION
The most basic way to illustrate data is through the creation of a
frequency distribution. A frequency distribution is a method of
tallying and representing how often certain scores occur. In the
creation of a frequency distribution, scores are usually grouped
into class intervals, or ranges of numbers.
Here are 50 scores on a test of reading comprehension on which
a frequency distribution is based:
47  10  31  25  20
2   11  31  25  21
44  14  15  26  21
41  14  16  26  21
7   30  17  27  24
6   30  16  29  24
35  32  15  29  23
38  33  19  28  20
35  34  18  29  21
36  32  16  27  20
And here’s the frequency distribution. You can see that for each
range of scores, there are associated frequency counts.
Class Interval   Frequency
45–49            1
40–44            2
35–39            4
30–34            8
25–29            10
20–24            10
15–19            8
10–14            4
5–9              2
0–4              1
People Who Loved Statistics
Helen M. Walker (1891–1983) began her college career
studying philosophy and then became a high school math
teacher. She got her master’s degree, taught mathematics at the
University of Kansas (your authors’ favorite college) where she
was tenured, and then studied the history of statistics (at least
up to 1929, when she wrote her doctoral dissertation at
Columbia). Dr. Walker’s greatest interest was in the teaching of
statistics, and many years after her death, a scholarship was
endowed in her name at Columbia for students who want to
teach statistics! Her publications included a whole book
teaching about the best way to show statistics using tables. Oh,
and along the way, she became the first woman president of the
American Statistical Association. All this achievement from
someone who actually loved teaching statistics. Just like your
professor!
The Classiest of Intervals
As you can see from the above table, a class interval is a range
of numbers, and the first step in the creation of a frequency
distribution is to define how large each interval will be. As you
can see in the frequency distribution that we created, each
interval spans five possible scores, such as 5–9 (which includes
scores 5, 6, 7, 8, and 9) and 40–44 (which includes scores 40,
41, 42, 43, and 44). How did we decide to have an interval that
includes only five scores? Why not five intervals, each
consisting of 10 scores? Or two intervals, each consisting of 25
scores?
Here are some general rules to follow in the creation of a class
interval, regardless of the size of values in the data set you are
dealing with:
1. Select a class interval that has a range of 2, 5, 10, 15, or 20
data points. In our example, we chose 5.
2. Select a class interval so that 10 to 20 such intervals cover
the entire range of data. A convenient way to do this is to
compute the range and then divide by a number that represents
the number of intervals you want to use (between 10 and 20). In
our example, the scores range from 2 to 47, a range of about 45,
and we wanted 10 intervals: 45/10 = 4.5, which rounds up to a
class interval of 5. If you had a
set of scores ranging from 100 to 400, you could start with an
estimate of 20 intervals and see if the interval range makes
sense for your data: 300/20 = 15, so 15 would be the class
interval.
3. Begin listing the class interval with a multiple of that
interval. In our frequency distribution of reading comprehension
test scores, the class interval is 5, and we started the lowest
class interval at 0.
4. Finally, the interval made up of the largest scores goes at the
top of the frequency distribution.
There are some simple steps for creating class intervals on the
way to creating a frequency distribution. Here are six general
rules:
1. Determine the range.
2. Decide on the number of class intervals.
3. Decide on the size of the class interval.
4. Decide the starting point for the first class.
5. Create the class intervals.
6. Put the data into the class intervals.
Once class intervals are created, it’s time to complete the
frequency part of the frequency distribution. That’s simply
counting the number of times a score occurs in the raw data and
entering that number in each of the class intervals represented
by the count.
In the frequency distribution that we created for our reading
comprehension data, the number of scores that occur between 30
and 34 and thus are in the 30–34 class interval is 8. So, an 8
goes in the column marked Frequency. There’s your frequency
distribution. As you might realize, it is easier to do this
counting if you have your scores listed in order.
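If you would rather let the computer do the counting, here is a minimal sketch in Python (our own illustration, not part of the book's SPSS materials) that tallies the 50 reading comprehension scores into class intervals of 5 and prints the same frequency distribution:

scores = [47, 10, 31, 25, 20, 2, 11, 31, 25, 21,
          44, 14, 15, 26, 21, 41, 14, 16, 26, 21,
          7, 30, 17, 27, 24, 6, 30, 16, 29, 24,
          35, 32, 15, 29, 23, 38, 33, 19, 28, 20,
          35, 34, 18, 29, 21, 36, 32, 16, 27, 20]

interval_size = 5
counts = {}
for score in scores:
    low = (score // interval_size) * interval_size  # start each interval at a multiple of 5
    counts[low] = counts.get(low, 0) + 1

# List the intervals from largest to smallest, just like the table above
for low in sorted(counts, reverse=True):
    print(f"{low}-{low + interval_size - 1}: {counts[low]}")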
Sometimes it is a good idea to graph your data first and then do
whatever calculations or analysis is called for. By first looking
at the data, you may gain insight into the relationship between
variables, what kind of descriptive statistic is the right one to
use to describe the data, and so on. This extra step might
increase your insights and the value of what you are doing.
LIGHTBOARD LECTURE VIDEO
Creating a Simple Chart
So statisticians and professor types use graphs and charts all
the time. Let's take a second to figure out what's in a chart,
like, what are the pieces that put this all together? Here are
some scores. See here, different scores people could get, and
here's the number of people that got those scores. Very
typical data that people like to graph all the time. And when
you make a graph, you usually have these two lines, two
dimensions. Along the bottom, you often put the actual
scores. So here the actual scores are 8, 9, 10, 11, and 12. And
up the side, it's very common to put something that shows how
many people got that score, or the frequency. OK. So, for
instance, the 8 here, one person got that. And the 9, three
people got that. One, two, three. And the 10, six people got
that. I don't know if we have room for that. That's up here
somewhere. And then for the 11, two people got that. That'd be
about there. And the 12, only one person got that. So these
dots represent the different people. It's sort of hard to look at
those dots, so instead we make these bars. And the taller the
bar, you can see, the more people got it. So, like, the 10,
that's way up here around a 6. And only one person got an 8,
so that's down here around a 1. But this, now, is a picture of
what your scores look like. And if you look at just the tops of
these bars, you can see, like, where we get a normal curve.
THE PLOT THICKENS: CREATING A HISTOGRAM
Now that we’ve got a tally of how many scores fall in what
class intervals, we’ll go to the next step and create what is
called a histogram, a visual representation of the frequency
distribution where the frequencies are represented by bars.
Depending on the book or journal article or report you read and
the software you use, visual representations of data are called
graphs (such as in SPSS) or charts (such as in the Microsoft
spreadsheet Excel). It really makes no difference. All you need
to know is that a graph or a chart is the visual representation of
data.
To create a histogram, do the following:
1. Using a piece of graph paper, place values at equal distances
along the x-axis, as shown in Figure 4.1. Now, identify
the midpoint of each class interval, which is the middle point in
the interval. It’s pretty easy to just eyeball, but you can also
just add the top and bottom values of the class interval and
divide by 2. For example, the midpoint of the class interval 0–4
is the average of 0 and 4, or (0 + 4)/2 = 2.
2. Draw a bar or column centered on each midpoint that
represents the entire class interval to the height representing the
frequency of that class interval. For example, in Figure 4.2, you
can see that in our first entry, the class interval of 0–4 is
represented by the frequency of 1 (representing the one time a
value between 0 and 4 occurs). Continue drawing bars or
columns until each of the frequencies for each of the class
intervals is represented. Figure 4.2 is a nice hand-drawn
(really!) histogram for the frequency distribution of the 50
scores that we have been working with so far.
Notice that each class interval is represented by a range of
scores along the x-axis.
Figure 4.1 ⬢ Class intervals along the x-axis
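If you would like to try this on the computer instead of graph paper, here is a minimal sketch (our own illustration, assuming the Python plotting library matplotlib is available) that follows the two steps above: it centers one bar on each class-interval midpoint and raises it to that interval's frequency.

import matplotlib.pyplot as plt

midpoints = [2, 7, 12, 17, 22, 27, 32, 37, 42, 47]   # midpoints of 0–4, 5–9, ..., 45–49
frequencies = [1, 2, 4, 8, 10, 10, 8, 4, 2, 1]

plt.bar(midpoints, frequencies, width=5)   # each bar spans its whole class interval
plt.xlabel("Reading comprehension score")
plt.ylabel("Frequency")
plt.title("Histogram of the 50 reading comprehension scores")
plt.show()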
The Tallyho Method
You can see by the simple frequency distribution at the
beginning of the chapter that you already know more about the
distribution of scores than you’d learn from just a simple listing
of them. You have a good idea of what values occur with what
frequency. But another visual representation (besides a
histogram) can be done by using tallies for each of the
occurrences, as shown in Figure 4.3.
Figure 4.2 ⬢ A hand-drawn histogram
Figure 4.3 ⬢ Tallying scores
We used tallies that correspond with the frequency of scores
that occur within a certain class. This gives you an even better
visual representation of how often certain scores occur relative
to other scores.THE NEXT STEP: A FREQUENCY POLYGON
Creating a histogram or a tally of scores wasn’t so difficult, and
the next step (and the next way of illustrating data) is even
easier. We’re going to use the same data—and, in fact, the
histogram that you just saw created—to create a frequency
polygon. (Polygon is a word for shape.) A frequency polygon is
a continuous line that represents the frequencies of scores
within a class interval, as shown in Figure 4.4.
Figure 4.4 ⬢ A hand-drawn frequency polygon
How did we draw this? Here’s how:
1. Place a midpoint at the top of each bar or column in a
histogram (see Figure 4.2).
2. Connect the lines and you’ve got it—a frequency polygon!
Note that in Figure 4.4, the histogram on which the frequency
polygon is based is drawn using vertical and horizontal lines,
and the polygon is drawn using curved lines. That’s because,
although we want you to see what a frequency polygon is based
on, you usually don’t see the underlying histogram.
Why use a frequency polygon rather than a histogram to
represent data? For two reasons. Visually, a frequency polygon
appears more dynamic than a histogram (a line that represents
change in frequency always looks neat). Also, the use of a
continuous line suggests that the variable represented by the
scores along the x-axis is also a theoretically continuous,
interval-level measurement as we talked about in Chapter 2. (To
purists, the fact that the bars touch each other in a histogram
suggests the interval-level nature of the variable, as well.)
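Here is the same idea in code, a minimal matplotlib sketch (our own illustration, not the book's hand-drawn figure) that plots the class-interval midpoints against their frequencies and connects them with a line, which is all a frequency polygon is.

import matplotlib.pyplot as plt

midpoints = [2, 7, 12, 17, 22, 27, 32, 37, 42, 47]
frequencies = [1, 2, 4, 8, 10, 10, 8, 4, 2, 1]

plt.plot(midpoints, frequencies, marker="o")   # connect the midpoints
plt.xlabel("Score (class-interval midpoint)")
plt.ylabel("Frequency")
plt.title("Frequency polygon")
plt.show()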
Cumulating Frequencies
Once you have created a frequency distribution and have
visually represented those data using a histogram or a frequency
polygon, another option is to create a visual representation of
the cumulative frequency of occurrences by class intervals. This
is called a cumulative frequency distribution.
A cumulative frequency distribution is based on the same data
as a frequency distribution but with an added column
(Cumulative Frequency), as shown below.
Class Interval   Frequency   Cumulative Frequency
45–49            1           50
40–44            2           49
35–39            4           47
30–34            8           43
25–29            10          35
20–24            10          25
15–19            8           15
10–14            4           7
5–9              2           3
0–4              1           1
The cumulative frequency distribution begins with the creation
of a new column labeled “Cumulative Frequency.” Then, we add
the frequency in a class interval to all the frequencies below it.
For example, for the class interval of 0–4, there is 1 occurrence
and none below it, so the cumulative frequency is 1. For the
class interval of 5–9, there are 2 occurrences in that class
interval and one below it for a total of 3 (2 + 1) occurrences.
The last class interval (45–49) contains 1 occurrence, and there
are now a total of 50 occurrences at or below that class interval.
Once we create the cumulative frequency distribution, then we
can plot it as a histogram or a frequency polygon. Only this
time, we’ll skip right ahead and plot the midpoint of each class
interval as a function of the cumulative frequency of that class
interval. You can see the cumulative frequency distribution
in Figure 4.5 based on these same 50 scores. Notice this
frequency polygon is shaped a little like a letter S. If the scores
in a data set are distributed the way scores typically are,
cumulative frequencies will often graph this way.
Figure 4.5 ⬢ A hand-drawn cumulative frequency distribution
Another name for a cumulative frequency polygon is an ogive.
And, if the distribution of the data is normal or bell shaped
(see Chapter 8 for more on this), then the ogive represents what
is popularly known as a bell curve or a normal distribution.
SPSS creates a really nice ogive—it’s called a P-P plot (for
probability plot) and is easy to create. See Appendix A for an
introduction to creating graphs using SPSS, as well as the
material toward the end of this chapter.
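If you want to check the cumulative column without adding by hand, here is a minimal Python sketch (our own illustration) that produces the running totals described above:

from itertools import accumulate

# Frequencies listed from the 0–4 interval up through 45–49
frequencies = [1, 2, 4, 8, 10, 10, 8, 4, 2, 1]
cumulative = list(accumulate(frequencies))
print(cumulative)   # [1, 3, 7, 15, 25, 35, 43, 47, 49, 50]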
OTHER COOL WAYS TO CHART DATA
What we did so far in this chapter is take some data and show
how charts such as histograms and polygons can be used to
communicate visually. But several other types of charts are also
used in the behavioral and social sciences, and although it’s not
necessary for you to know exactly how to create them
(manually), you should at least be familiar with their names and
what they do. So here are some popular charts, what they do,
and how they do it.
There are several very good personal computer applications for
creating charts, among them the spreadsheet Excel (a Microsoft
product) and, of course, SPSS. The charts in the “Using the
Computer to Illustrate Data” section were created using SPSS as
well.
Bar Charts
A bar or column chart should be used when you want to
compare the frequencies of different categories with one
another. Categories are organized horizontally on the x-axis,
and values are shown vertically on the y-axis. Here are some
examples of when you might want to use a column chart:
· Number of participants in different water exercise activities
· The sales of three different types of products
· Number of children in each of six different grades
Figure 4.6 shows a graph of number of participants in different
water activities.
Figure 4.6 ⬢ A bar chart that compares different water activities
Column Charts
A column chart is identical to a bar chart, but in this chart,
categories are organized on the y-axis (which is the vertical
one), and values are shown on the x-axis (the horizontal one).
Line Charts
A line chart should be used when you want to show a trend in
the data at equal intervals. This sort of graph is often used when
the x-axis represents time. Here are some examples of when you
might want to use a line chart:
· Number of cases of mononucleosis (mono) per season among
college students at three state universities
· Toy sales for the T&K company over four quarters
· Number of travelers on two different airlines for each quarter
In Figure 4.7, you can see a chart of sales in units over four
quarters.
Figure 4.7 ⬢ Using a line chart to show a trend over time
Pie Charts
A pie chart should be used when you want to show the
proportion or percentage of people or things in various
categories. The rule is that the percentages in each “slice” must
add up to 100%, to make a whole pie. Here are some examples
of when you might want to use a pie chart:
· Of children living in poverty, the percentage who represent
various ethnicities
· Of students enrolled, the proportion who are in night or day
classes
· Of participants, the percentage in various age groups
Note that a pie chart describes a nominal-level variable (such as
ethnicity, time of enrollment, and age groups).
In Figure 4.8, you can see a pie chart of voter preference. And
we did a few fancy-schmancy things, such as separating and
labeling the slices.
Figure 4.8 ⬢ A pie chart illustrating the relative proportion of
one category to others
USING THE COMPUTER (SPSS, THAT IS) TO ILLUSTRATE
DATA
Now let’s use SPSS and go through the steps in creating some
of the charts that we explored in this chapter. First, here are
some general SPSS charting guidelines.
1. Although there are a couple of options, we will use the Chart
Builder option on the Graphs menu. This is the easiest way to
get started and well worth learning how to use.
2. In general, you click Graphs → Chart Builder, and you see a
dialog box from which you will select the type of graph you
want to create.
3. Click the type of graph you want to create and then select the
specific design of that type of graph.
4. Drag the variable names to the axis where each belongs.
5. Click OK, and you’ll see your graph.
Let’s practice.
Creating a Histogram
1. Enter the data you want to use to create the graph. Use those
50 scores we’ve been using in this chapter or make some up just
to practice with.
2. Click Graphs → Chart Builder and you will see the Chart
Builder dialog box, as shown in Figure 4.9. If you see any other
screen, click OK.
3. Click the Histogram option in the Choose from: list and
double-click the first image.
4. Drag the variable you wish to graph to the “x-axis?” location
in the preview window.
5. Click OK and you will see a histogram, as shown in Figure
4.10.
Figure 4.9 ⬢ The Chart Builder dialog box
The histogram in Figure 4.10 looks a bit different from the
hand-drawn one representing the same data shown earlier in this
chapter, in Figure 4.2. The difference is that SPSS defines class
intervals using its own idiosyncratic method. SPSS took as the
middle of a class interval the bottom number of the interval
(such as 10) rather than the midpoint (such as 12.5).
Consequently, scores are allocated to different groups. The
lesson here? How you group data makes a big difference in the
way they look in a histogram. And, once you get to know SPSS
well, you can make all kinds of fine-tuned adjustments to make
graphs appear exactly as you want them.
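The same lesson about grouping is easy to see outside SPSS. Here is a minimal Python sketch (our own illustration, assuming matplotlib; this is not the SPSS procedure above) that draws the 50 scores twice, once with bins that match the chapter's class intervals and once with bins the software chooses on its own:

import matplotlib.pyplot as plt

scores = [47, 10, 31, 25, 20, 2, 11, 31, 25, 21, 44, 14, 15, 26, 21,
          41, 14, 16, 26, 21, 7, 30, 17, 27, 24, 6, 30, 16, 29, 24,
          35, 32, 15, 29, 23, 38, 33, 19, 28, 20, 35, 34, 18, 29, 21,
          36, 32, 16, 27, 20]

fig, (left, right) = plt.subplots(1, 2, sharey=True)
left.hist(scores, bins=range(0, 55, 5))   # bins matching the 0–4, 5–9, ... intervals
right.hist(scores, bins=10)               # let the software pick its own bin edges
left.set_title("Chapter's class intervals")
right.set_title("Software-chosen bins")
plt.show()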
Creating a Bar Graph
To create a bar graph, follow these steps:
1. Enter the data you want to use to create the graph. We used
the following data that show the number of people in a club who
belong to each of three political parties. 1 = Democrat, 2 =
Republican, 3 = Independent
1, 1, 2, 3, 2, 1, 1, 2, 1
2. Click Graphs → Chart Builder, and you will see the Chart
Builder dialog box, as shown in Figure 4.11. If you see any
other screen, click OK.
3. Click the Bar option in the Choose from: list and double-
click the first image.
4. Drag the variable named Party to the x-axis? location in the
preview window.
5. Drag the variable named Number to the Count axis.
6. Click OK and you will see the bar graph, as shown in Figure
4.12.
Figure 4.11 ⬢ The Chart Builder dialog box
Figure 4.12 ⬢ A bar graph created using the Chart Builder
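For comparison, here is a minimal sketch of the same kind of bar graph outside SPSS (our own illustration, assuming Python and matplotlib), which counts the nine party codes and draws one bar per party:

from collections import Counter
import matplotlib.pyplot as plt

party_codes = [1, 1, 2, 3, 2, 1, 1, 2, 1]
labels = {1: "Democrat", 2: "Republican", 3: "Independent"}
counts = Counter(party_codes)   # 1 appears 5 times, 2 appears 3 times, 3 appears once

plt.bar([labels[code] for code in sorted(counts)],
        [counts[code] for code in sorted(counts)])
plt.ylabel("Count")
plt.title("Club members by political party")
plt.show()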
Creating a Line Graph
To create a line graph, follow these steps:
1. Enter the data you want to use to create the graph. In this
example, we will be using the percentage of the total student
body who attended the first day of classes each year over the
duration of a 10-year program. Here are the data. You can type
them into SPSS exactly as shown here, with the top row being
the names you will give the two variables (columns).
Year   Attendance
1      87
2      88
3      89
4      76
5      80
6      96
7      91
8      97
9      89
10     79
2. Click Graphs → Chart Builder and you will see the Chart
Builder dialog box, as shown in Figure 4.11. If you see any
other screen, click OK.
3. Click the Line option in the Choose from: list and double-
click the first image.
4. Drag the variable named Year to the x-axis? location in the
preview window.
5. Drag the variable named Attendance to the y-axis? location.
6. Click OK, and you will see the line graph, as shown in Figure
4.13. We used the SPSS Chart Editor to change the minimum
and maximum values on the y-axis.
Figure 4.13 ⬢ A line graph created using the Chart Builder
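Here is the equivalent line graph outside SPSS, a minimal sketch (our own illustration, assuming matplotlib) using the same Year and Attendance values:

import matplotlib.pyplot as plt

year = list(range(1, 11))
attendance = [87, 88, 89, 76, 80, 96, 91, 97, 89, 79]

plt.plot(year, attendance, marker="o")
plt.xlabel("Year")
plt.ylabel("Attendance (%)")
plt.ylim(70, 100)   # adjust the y-axis minimum and maximum, as we did in the Chart Editor
plt.show()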
Creating a Pie Chart
To create a pie chart, follow these steps:
1. Enter the data you want to use to create the chart. In this
example, the pie chart represents the percentage of people
buying different brands of doughnuts. Here are the data:
Brand      Percentage
Krispies   55
Dunks      35
Other      10
2. Click Graphs → Chart Builder, and you will see the Chart
Builder dialog box, as shown in Figure 4.11. If you see any
other screen, click OK.
3. Click the Pie/Polar option in the Choose from: list and
double-click the only image.
4. Drag the variable named Brand to the Slice by? axis label.
5. Drag the variable named Percentage to the Angle Variable?
axis label.
6. Click OK, and you will see the pie chart, as shown in Figure
4.14.
Figure 4.14 ⬢ A pie chart created using the Chart Builder
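And here is the equivalent pie chart outside SPSS, a minimal sketch (our own illustration, assuming matplotlib) built from the same brand percentages:

import matplotlib.pyplot as plt

brands = ["Krispies", "Dunks", "Other"]
percentages = [55, 35, 10]   # the slices must add up to 100%

plt.pie(percentages, labels=brands, autopct="%1.0f%%")
plt.title("Percentage of people buying each brand of doughnut")
plt.show()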
Real-World Stats
Graphs work, and a picture really is worth more than a thousand
words.
In this article, an oldie but goodie, the researchers examined
how people perceive and process statistical graphs. Stephen
Lewandowsky and Ian Spence reviewed empirical studies
designed to explore how suitable different types of graphs are
and how what is known about human perception can have an
impact on the design and utility of these charts.
They focused on some of the theoretical explanations for why
certain elements work and don’t, the use of pictorial symbols
(like a happy face symbol, which could make up the bar in a bar
chart), and multivariate displays, where more than one set of
data needs to be represented. And, as is very often the case with
any paper, they concluded that not enough data were available
yet. Given the increasingly visual world in which we live
(emojis, anyone? ☹ ☺), this is interesting and useful reading to
gain a historical perspective on how information was (and still
is) discussed as a scientific topic.
Want to know more? Go online or to the library and find …
Lewandowsky, S., & Spence, I. (1989). The perception of
statistical graphs. Sociological Methods & Research, 18, 200–242.
Summary
There’s no question that charts are fun to create and can add
enormous understanding to what might otherwise appear to be
disorganized data. Follow our suggestions in this chapter and
use charts well but only when they enhance, not duplicate,
what’s already there.
Time to Practice
1. A data set of 50 comprehension scores (named
Comprehension Score) called Chapter 4 Data Set 1 is available
in Appendix C and on the website. Answer the following
questions and/or complete the following tasks:
a. Create a frequency distribution and a histogram for the set.
b. Why did you select the class interval you used?
2. Here is a frequency distribution. Create a histogram by hand
or by using SPSS.
Class Interval   Frequency
261–280          140
241–260          320
221–240          3,380
201–220          600
181–200          500
161–180          410
141–160          315
121–140          300
100–120          200
3. A third-grade teacher wants to improve her students’ level of
engagement during group discussions and instruction. She keeps
track of each of the 15 third graders’ number of responses every
day for 1 week, and the data are available as Chapter 4 Data Set
2. Use SPSS to create a bar chart with one bar for each day (and
warning—this may be a toughie).
Time to Practice Video
Chapter 4: Problem 3
1. For problem three in chapter four, a teacher wants to display
her students' number of responses for each day of the week.
And she wants to do that with a bar chart. Since she hasn't
taken a stats class, she comes to you for help. You first enter
her data into SPSS and the results look like this-- When you
look at your data set, you'll see that it actually has the wrong
level of measurement. Notice that there's a little Venn
diagram at the top of each column, which indicates that your
data have been entered as nominal. That would be correct if you
were noting which day of the week a student participated, but
since you're noting how often a given student participated, the
correct level of measurement is scale. Go ahead and change
that. Watch how I do that. Under variable view, under
measure, you just want to click each one and turn it into a
scale. You can also cut and paste these, and I can show you
that in another video. Once you have them changed, go back to
data view, and you'll see that at the top it has changed into
little rulers. The next question is, how do I get SPSS to
display the average score per day rather than the total number of
individual scores, which might look like a mess, and it's why
this question is a toughie. To do that we go under graphs, and
you'll see that you have two options: you can do a Chart
Builder or a Legacy Dialog. For this question we want to use
the Legacy Dialog. We go to Bar, and when we click that, there
are two questions-- one, what type of bar chart? We want a
simple one. And then, how do you want the data in the chart
displayed? Do we want summaries for groups of cases? We
really don't. We want summaries of separate variables, where
each day of the week is a variable. We click on Define and
then here you'll see every day of the week. You want to bring
that over, and you'll see your bars are going to represent the
mean for every day of the week. As a good habit you want to
make sure you title it. I called it "Students' Engagement
During Group Discussion." The second line is by day of
week. We hit Continue, and then when we hit OK, you're
going to see your output pop up. And here is our bar chart--
every day of the week showing the average student engagement.
And this is how you answer problem 3 in chapter 4. Good luck.
2. Identify whether these distributions are negatively skewed,
positively skewed, or not skewed at all and explain why you
describe them that way.
a. This talented group of athletes scored very high on the
vertical jump task.
b. On this incredibly crummy test, everyone received the same
score.
c. On the most difficult spelling test of the year, the third
graders wept as the scores were delivered and then their parents
complained.
3. Use the data available as Chapter 4 Data Set 3 on pie
preference to create a pie chart ☺ using SPSS.
4. For each of the following, indicate whether you would use a
pie, line, or bar chart and why.
a. The proportion of freshmen, sophomores, juniors, and seniors
in a particular university
b. Change in temperature over a 24-hour period
c. Number of applicants for four different jobs
d. Percentage of test takers who passed
e. Number of people in each of 10 categories
5. Provide an example of when you might use each of the
following types of charts. For example, you would use a pie
chart to show the proportion of children who receive a reduced-
price lunch that are in Grades 1 through 6. When you are done,
draw the fictitious chart by hand.
a. Line
b. Bar
c. Scatter/dot (extra credit)
6. Go to the library or online and find a journal article in your
area of interest that contains empirical data but does not contain
any visual representation of them. Use the data to create a chart.
Be sure to specify what type of chart you are creating and why
you chose the one you did. You can create the chart manually or
using SPSS or Excel.
7. Create the worst-looking chart that you can, crowded with
chart and font junk. Nothing makes as lasting an impression as
a bad example.
8. And, finally, what is the purpose of a chart or graph?
Student Study Site
Get the tools you need to sharpen your study skills!
Visit edge.sagepub.com/salkindfrey7e to access practice
quizzes, eFlashcards, original and curated videos, data sets, and
more!
5 COMPUTING CORRELATION COEFFICIENTS: ICE CREAM AND CRIME
5: MEDIA LIBRARY
Premium Videos
Core Concepts in Stats Video
· Correlation
Lightboard Lecture Video
· Partial Correlations
Time to Practice Video
· Chapter 5: Problem 6
Difficulty Scale
(moderately hard)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding what correlations are and how they work
· Computing a simple correlation coefficient
· Interpreting the value of the correlation coefficient
· Understanding what other types of correlations exist and when
they should be used
WHAT ARE CORRELATIONS ALL ABOUT?
Measures of central tendency and measures of variability are
not the only descriptive statistics that we are interested in using
to get a picture of what a set of scores looks like. You have
already learned that knowing the values of the one most
representative score (central tendency) and a measure of spread
or dispersion (variability) is critical for describing the
characteristics of a distribution.
However, sometimes we are as interested in the relationship
between variables—or, to be more precise, how the value of one
variable changes when the value of another variable changes.
The way we express this interest is through the computation of a
simple correlation coefficient. For example, what’s the
relationship between age and strength? Income and years of
education? Memory skills and amount of drug use? Your
political attitudes and the attitudes of your parents?
A correlation coefficient is a numerical index that reflects the
relationship or association between two variables. The value of
this descriptive statistic ranges between −1.00 and +1.00. A
correlation between two variables is sometimes referred to as
a bivariate (for two variables) correlation. Even more
specifically, the type of correlation that we will talk about in
the majority of this chapter is called the Pearson product-
moment correlation, named for its inventor, Karl Pearson.
The Pearson correlation coefficient examines the relationship
between two variables, but both of those variables are
continuous in nature. In other words, they are variables that can
assume any value along some underlying continuum; examples
include height (you really can be 5 feet 6.1938574673 inches
tall), age, test score, and income. Remember in Chapter 2, when
we talked about levels of measurement? Interval and ratio levels
of measurement are continuous. But a host of other variables are
not continuous. They’re called discrete or categorical variables,
and examples are race (such as black and white), social class
(such as high and low), and political affiliation (such as
Democrat and Republican). In Chapter 2, we called these types
of variables nominal level. You need to use other correlational
techniques, such as the phi correlation, in these cases. These
topics are for a more advanced course, but you should know
they are acceptable and very useful techniques. We mention
them briefly later on in this chapter.
Other types of correlation coefficients measure the relationship
between more than two variables, and we’ll talk about one of
these in some more advanced chapters later on (which you are
looking forward to already, right?).
Types of Correlation Coefficients: Flavor 1 and Flavor 2
A correlation reflects the dynamic quality of the relationship
between variables. In doing so, it allows us to understand
whether variables tend to move in the same or opposite
directions in relationship to each other. If variables change in
the same direction, the correlation is called a direct
correlation or a positive correlation. If variables change in
opposite directions, the correlation is called an indirect
correlation or a negative correlation. Table 5.1 shows a
summary of these relationships.
Table 5.1 ⬢ Types of Correlations
What Happens to Variable X | What Happens to Variable Y | Type of Correlation | Value | Example
X increases in value. | Y increases in value. | Direct or positive | Positive, ranging from .00 to +1.00 | The more time you spend studying, the higher your test score will be.
X decreases in value. | Y decreases in value. | Direct or positive | Positive, ranging from .00 to +1.00 | The less money you put in the bank, the less interest you will earn.
X increases in value. | Y decreases in value. | Indirect or negative | Negative, ranging from −1.00 to .00 | The more you exercise, the less you will weigh.
X decreases in value. | Y increases in value. | Indirect or negative | Negative, ranging from −1.00 to .00 | The less time you take to complete a test, the more items you will get wrong.
Now, keep in mind that the examples in the table reflect
generalities, for example, regarding time to complete a test and
the number of items correct on that test. In general, the less
time that is taken on a test, the lower the score. Such a
conclusion is not rocket science, because the faster one goes,
the more likely one is to make careless mistakes such as not
reading instructions correctly. But, of course, some people can
go very fast and do very well. And other people go very slowly
and don’t do well at all. The point is that we are talking about
the average performance of a group of people on two different
variables. We are computing the correlation between the two
variables for the group of people, not for any one particular
person.
There are several easy (but important) things to remember about
the correlation coefficient:
· A correlation can range in value from −1.00 to +1.00.
· The absolute value of the coefficient reflects the strength of
the correlation. So a correlation of −.70 is stronger than a
correlation of +.50. One frequently made mistake regarding
correlation coefficients occurs when students assume that a
direct or positive correlation is always stronger (i.e., “better”)
than an indirect or negative correlation because of the sign and
nothing else.
· To calculate a correlation, you need exactly two variables and
at least two people.
· Another easy mistake is to assign a value judgment to the sign
of the correlation. Many students assume that a negative
relationship is not good and a positive one is good. But think of
the example from Table 5.1 where exercise and weight have a
negative correlation. That negative correlation is a positive
thing! That’s why, instead of using the
terms negative and positive, you might prefer to use the
terms indirect and direct to communicate meaning more clearly.
· The Pearson product-moment correlation coefficient is
represented by the small letter r with a subscript representing
the variables that are being correlated. You'd think that P for
Pearson might be used as the symbol for this correlation, but the
Greek letter that looks like a P (rho, written ρ) actually
corresponds to the English "r" sound, so r is used. The Greek ρ is
reserved for the theoretical correlation in a population, so don't
feel sorry for Pearson. (If it helps, think of r as standing for
relationship.) For example,
· rxy is the correlation between variable X and variable Y.
· rweight-height is the correlation between weight and height.
· rSAT.GPA is the correlation between SAT score and grade
point average (GPA).
The correlation coefficient reflects the amount of variability
that is shared between two variables and what they have in
common. For example, you can expect an individual’s height to
be correlated with an individual’s weight because these two
variables share many of the same characteristics, such as the
individual’s nutritional and medical history, general health, and
genetics, and, of course, taller people usually have more mass.
On the other hand, if one variable does not change in value and
therefore has nothing to share, then the correlation between it
and another variable is zero. For example, if you computed the
correlation between age and number of years of school
completed, and everyone was 25 years old, there would be no
correlation between the two variables because there is literally
no information (no variability) in age available to share.
Likewise, if you constrain or restrict the range of one variable,
the correlation between that variable and another variable will
be less than if the range is not constrained. For example, if you
correlate reading comprehension and grades in school for very
high-achieving children, you’ll find the correlation to be lower
than if you computed the same correlation for children in
general. That’s because the reading comprehension score of
very high-achieving students is quite high and much less
variable than it would be for all children. The moral? When you
are interested in the relationship between two variables, try to
collect sufficiently diverse data—that way, you’ll get the truest
representative result. And how do you do that? Measure a
variable as precisely as possible (use higher, more informative
levels of measurement) and use a sample that varies greatly on
the characteristics you are interested in.
COMPUTING A SIMPLE CORRELATION COEFFICIENT
The computational formula for the simple Pearson product-moment
correlation coefficient between a variable labeled X and a
variable labeled Y is shown in Formula 5.1:
(5.1)
r_{xy} = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
where
· rxy is the correlation coefficient between X and Y;
· n is the size of the sample;
· X is each individual’s score on the X variable;
· Y is each individual’s score on the Y variable;
· XY is the product of each X score times its
corresponding Y score;
· X2 is each individual’s X score, squared; and
· Y2 is each individual’s Y score, squared.
Here are the data we will use in this example:
X   Y   X²   Y²   XY
2   3   4    9    6
4   2   16   4    8
5   6   25   36   30
6   5   36   25   30
4   3   16   9    12
7   6   49   36   42
8   5   64   25   40
5   4   25   16   20
6   4   36   16   24
7   5   49   25   35
Total, Sum, or ∑   54   43   320   201   247
Before we plug the numbers in, let’s make sure you understand
what each one represents:
· ∑X, or the sum of all the X values, is 54.
· ∑Y, or the sum of all the Y values, is 43.
· ∑X², or the sum of each X value squared, is 320.
· ∑Y², or the sum of each Y value squared, is 201.
· ∑XY, or the sum of the products of X and Y, is 247.
It’s easy to confuse the sum of a set of values squared and the
sum of the squared values. The sum of a set of values squared is
taking values such as 2 and 3, summing them (to be 5), and then
squaring that (which is 25). The sum of the squared values is
taking values such as 2 and 3, squaring them (to get 4 and 9,
respectively), and then adding those together (to get 13). Just
look for the parentheses as you work.
Here are the steps in computing the correlation coefficient:
1. List the two values for each participant. You should do this
in a column format so as not to get confused. Use graph paper if
working manually or SPSS or some other data analysis tool if
working digitally.
2. Compute the sum of all the X values and compute the sum of
all the Y values.
3. Square each of the X values and square each of the Y values.
4. Find the sum of the XY products.
These values are plugged into the equation you see in Formula
5.2:
(5.2)
r_{xy} = \dfrac{(10 \times 247) - (54 \times 43)}{\sqrt{\left[(10 \times 320) - 54^2\right]\left[(10 \times 201) - 43^2\right]}}
Ta-da! And you can see the answer in Formula 5.3:
(5.3)
r_{xy} = \dfrac{148}{213.83} = .692
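If you want to check the arithmetic, here is a minimal Python sketch (our own illustration) that plugs the ten pairs of scores from the table into Formula 5.1 and reproduces the same answer:

from math import sqrt

x = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                                   # 54 and 43
sum_x2, sum_y2 = sum(v * v for v in x), sum(v * v for v in y)   # 320 and 201
sum_xy = sum(a * b for a, b in zip(x, y))                       # 247

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 3))   # 0.692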
What’s really interesting about correlations is that they measure
the amount of distance that one variable covaries in relation to
another. So, if both variables are highly variable (have lots of
wide-ranging values), the correlation between them is more
likely to be high than if not. Now, that’s not to say that lots of
variability guarantees a higher correlation, because the scores
have to vary in a systematic way. But if the variance is
constrained in one variable, then no matter how much the other
variable changes, the correlation will be lower. For example,
let’s say you are examining the correlation between academic
achievement in high school and first-year grades in college and
you look at only the top 10% of the class. Well, that top 10% is
likely to have very similar grades, introducing no variability
and no room for the one variable to vary as a function of the
other. Guess what you get when you correlate one variable with
another variable that does not change (that is, has no
variability)? rxy = 0, that’s what. The lesson here? Variability
works, and you should not artificially limit it.
The Scatterplot: A Visual Picture of a Correlation
There’s a very simple way to visually represent a correlation:
Create what is called a scatterplot, or scattergram (in SPSS
lingo it’s a scatter/dot graph). This is simply a plot of each set
of scores on separate axes.
Here are the steps to complete a scattergram like the one you
see in Figure 5.1, which plots the 10 sets of scores for which we
computed the sample correlation earlier.
Figure 5.1 ⬢ A simple scattergram
1. Draw the x-axis and the y-axis. Usually, the X variable goes
on the horizontal axis and the Y variable goes on the vertical
axis.
2. Mark both axes with the range of values that you know to be
the case for the data. For example, the value of the X variable in
our example ranges from 2 to 8, so we marked the x-axis from 0
to 9. There’s no harm in marking the axes a bit low or high—
just as long as you allow room for the values to appear. The
value of the Y variable ranges from 2 to 6, and we marked that
axis from 0 to 9. Having similarly labeled (and scaled) axes can
sometimes make the finished scatterplot easier to understand.
3. Finally, for each pair of scores (such as 2 and 3, as shown
in Figure 5.1), we entered a dot on the chart by marking the
place where 2 falls on the x-axis and 3 falls on the y-axis. The
dot represents a data point, which is the intersection of the two
values.
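If you prefer the computer to graph paper, here is a minimal sketch (our own illustration, assuming matplotlib) of the same scattergram, with both axes marked from 0 to 9 as described in step 2:

import matplotlib.pyplot as plt

x = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]

plt.scatter(x, y)   # one dot per pair of scores
plt.xlim(0, 9)
plt.ylim(0, 9)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()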
When all the data points are plotted, what does such an
illustration tell us about the relationship between the variables?
To begin with, the general shape of the collection of data points
indicates whether the correlation is direct (positive) or indirect
(negative).
A positive slope occurs when the data points group themselves
in a cluster from the lower left-hand corner on the x- and y-axes
through the upper right-hand corner. A negative slope occurs
when the data points group themselves in a cluster from the
upper left-hand corner on the x- and y-axes through the lower
right-hand corner.
Here are some scatterplots showing very different correlations
where you can see how the grouping of the data points reflects
the sign and strength of the correlation coefficient.
Figure 5.2 shows a perfect direct correlation, where rxy = 1.00
and all the data points are aligned along a straight line with a
positive slope.
Figure 5.2 ⬢ A perfect direct, or positive, correlation
If the correlation were perfectly indirect, the value of the
correlation coefficient would be −1.00, and the data points
would align themselves in a straight line as well but from the
upper left-hand corner of the chart to the lower right. In other
words, the line that connects the data points would have a
negative slope. And, remember, in both examples, the strength
of the association is the same; it is only the direction that is
different.
Don’t ever expect to find a perfect correlation between any two
variables in the behavioral or social sciences. Such a correlation
would say that two variables are so perfectly related, they share
everything in common. In other words, knowing one is exactly
like knowing the other. Just think about your classmates. Do
you think they all share any one thing in common that is
perfectly related to another of their characteristics across all
those different people? Probably not. In fact, r values
approaching .7 and .8 are just about the highest you’ll see.
In Figure 5.3, you can see the scatterplot for a strong (but not
perfect) direct relationship where rxy = .70. Notice that the data
points align themselves along a positive slope, although not
perfectly.
Now, we’ll show you a strong indirect, or negative, relationship
in Figure 5.4, where rxy = −.82. Notice that the data points
align themselves on a negative slope from the upper left-hand
corner of the chart to the lower right-hand corner.
That’s what different types of correlations look like, and you
can really tell the general strength and direction by examining
the way the points are grouped.
Figure 5.3 ⬢ A strong, but not perfect, direct relationship
Not all correlations are reflected by a straight line showing
the X and the Y values in a relationship called a linear
correlation (see Chapter 16 for tons of fun stuff about this). The
relationship may not be linear and may not be reflected by a
straight line. Let’s take the correlation between age and
memory. For the early years, the correlation is probably highly
positive—the older children get, the better their memory. Then,
into young and middle adulthood, there isn’t much of a change
or much of a correlation, because most young and middle adults
maintain a good (but not necessarily increasingly better)
memory. But with old age, memory begins to suffer, and there is
an indirect relationship between memory and aging in the later
years. If you take these together and look at the relationship
over the life span, you find that the correlation between memory
and age tends to look something like a curve where age
continues to grow at the same rate but memory increases at
first, levels off, and then decreases. It’s
a curvilinear relationship, and sometimes, the best description
of a relationship is that it is curvilinear.
CORE CONCEPTS IN STATS VIDEO
Correlation
People who perform well in high school tend to do well in
college-- right? Say you wanted to measure the relationship
between students' GPA in high school and their GPA in
college-- you would calculate a Pearson correlation between
the two variables. To see the relationship between two
variables, we draw a picture of the data using a scatter plot.
We plot each person's high school GPA on the x-axis and their
college GPA on the y-axis. Notice how the dots have a pattern.
Next, we compute a line called a regression line that runs
through the center of the dots. This line is computed using the
average. We use a formula to measure how much each GPA
varies independently and how much the two GPAs vary
together. The variance is the average amount that each
student's high school GPA differs from the mean of all of the
high school GPAs. Of course, each student's college GPA
also varies from the mean of the college GPAs. But high
school GPA and college GPA also vary together. People who
do well in high school tend to do well in college. Varying
together is called covariance. The Pearson correlation is
calculated by dividing the covariance of the two variables by
the variance of the two variables. Sometimes the dots are
spread out from the line-- that means there is a lot of
independent variance. The relationship between the two
variables is weaker. Sometimes the dots are close to the line--
the relationship is stronger. Sometimes the line looks like this-
- as one goes up, the other goes up. Sometimes the line is
like this-- as one goes up, the other goes down. College GPA
is negatively related to the amount of time spent partying.
We use correlation to understand the strength and the direction
of the relationship between the variables.
The Correlation Matrix: Bunches of Correlations
What happens if you have more than two variables and you want
to see correlations among all pairs of variables? How are the
correlations illustrated? Use a correlation matrix like the one
shown in Table 5.2—a simple and elegant solution.
As you can see in these made-up data, there are four variables
in the matrix: level of income (Income), level of education
(Education), attitude toward voting (Attitude), and how sure
they are that they will vote (Vote).
Table 5.2 ⬢ Correlation Matrix
            Income   Education   Attitude   Vote
Income      1.00     .574        −.08       −.291
Education   .574     1.00        −.149      −.199
Attitude    −.08     −.149       1.00       −.169
Vote        −.291    −.199       −.169      1.00
For each pair of variables, there is a correlation coefficient. For
example, the correlation between income level and education is
.574. Similarly, the correlation between income level and how
sure people are that they will vote in the next election is −.291
(meaning that the higher the level of income, the less confident
people were that they would vote).
In such a matrix with four variables, there are really only six
correlation coefficients. Because variables correlate perfectly
with themselves (those are the 1.00s down the diagonal), and
because the correlation between Income and Vote is the same as
the correlation between Vote and Income, the matrix creates a
mirror image of itself.
You can use SPSS—or almost any other statistical analysis
package, such as Excel—to easily create a matrix like the one
you saw earlier. In applications like Excel, you can use the Data
Analysis ToolPak.
You will see such matrices (the plural of matrix) when you read
journal articles that use correlations to describe the
relationships among several variables.
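If your data live in Python rather than SPSS or Excel, a matrix like Table 5.2 takes one call in pandas. The sketch below uses placeholder values for illustration, not the data behind Table 5.2.

import pandas as pd

# Placeholder data frame; swap in your own Income, Education, Attitude, and Vote columns
df = pd.DataFrame({
    "Income":    [55, 72, 38, 90, 61],
    "Education": [12, 16, 10, 18, 14],
    "Attitude":  [3, 2, 4, 1, 3],
    "Vote":      [5, 3, 4, 2, 4],
})

# Pairwise Pearson correlations; the 1.00s run down the diagonal,
# and the matrix mirrors itself across it, just as described above.
print(df.corr().round(3))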
Understanding What the Correlation Coefficient Means
Well, we have this numerical index of the relationship between
two variables, and we know that the higher the value of the
correlation (regardless of its sign), the stronger the relationship
is. But how can we interpret it and make it a more meaningful
indicator of a relationship?
Here are different ways to look at the interpretation of that
simple rxy.
Using-Your-Thumb (or Eyeball) Method
Perhaps the easiest (but not the most informative) way to
interpret the value of a correlation coefficient is by eyeballing it
and using the information in Table 5.3. This is based on
customary interpretations of the size of a correlation in the
behavioral sciences.
So, if the correlation between two variables is .3, you could
safely conclude that the relationship is a moderate one—not
strong but certainly not weak enough to say that the variables in
question don’t share anything in common.
Table 5.3 ⬢ Interpreting a Correlation Coefficient

Size of the Correlation Coefficient    General Interpretation
.5 to 1.0                              Strong relationship
.4                                     Moderate to strong relationship
.3                                     Moderate relationship
.2                                     Weak to moderate relationship
0 to .1                                Weak or no relationship
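If you want these cutoffs as a quick, reusable check, they can be coded directly. This is only a restatement of the rule of thumb in Table 5.3, not a formal test of any kind, and the function name is just for illustration.

def eyeball(r: float) -> str:
    """Rough interpretation of a correlation's size, following Table 5.3."""
    size = abs(r)  # the sign indicates direction, not strength
    if size >= 0.5:
        return "strong relationship"
    if size >= 0.4:
        return "moderate to strong relationship"
    if size >= 0.3:
        return "moderate relationship"
    if size >= 0.2:
        return "weak to moderate relationship"
    return "weak or no relationship"

print(eyeball(0.3))    # moderate relationship
print(eyeball(-0.62))  # strong relationship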
This eyeball method is perfectly acceptable for a quick
assessment of the strength of the relationship between variables,
such as when you briefly evaluate data presented visually. But
because this rule of thumb depends on a subjective judgment (of
what’s “strong” or “weak”), we would like a more precise
method. That’s what we’ll look at now.
Special Effects! Correlation Coefficient
Throughout the book, we will learn about various effect sizes
and how to interpret them. An effect size is an index of the
strength of the relationship among variables, and with most
statistical procedures we learn about, there will be an associated
effect size that should be reported and interpreted. The
correlation coefficient is a perfect example of an effect size as
it quite literally is a measure of the strength of a relationship.
Thanks to Table 5.3, we already know how to interpret it.
SQUARING THE CORRELATION COEFFICIENT: A
DETERMINED EFFORT
Here’s the much more precise way to interpret the correlation
coefficient: computing the coefficient of determination.
The coefficient of determination is the percentage of variance in
one variable that is accounted for by the variance in the other
variable. Quite a mouthful, huh?
Earlier in this chapter, we pointed out how variables that share
something in common tend to be correlated with one another. If
we correlated math and language arts grades for 100 fifth-grade
students, we would find the correlation to be moderately strong,
because many of the reasons why children do well (or poorly) in
math tend to be the same reasons why they do well (or poorly)
in language arts. The number of hours they study, how bright
they are, how interested their parents are in their schoolwork,
the number of books they have at home, and more are all related
to both math and language arts performance and account for
differences between children (and that’s where the variability
comes in).
The more these two variables share in common, the more they
will be related. These two variables share variability—or the
reason why children differ from one another. And on the whole,
the brighter child who studies more will do better.
To determine exactly how much of the variance in one variable
can be accounted for by the variance in another variable, the
coefficient of determination is computed by squaring the
correlation coefficient.
For example, if the correlation between GPA and number of
hours of study time is .70 (or rGPA·time = .70), then the coefficient of determination, represented by r²GPA·time, is .70², or .49. This means that
49% of the variance in GPA “can be explained by” or “is shared
by” the variance in studying time. And the stronger the
correlation, the more variance can be explained (which only
makes good sense). The more two variables share in common
(such as good study habits, knowledge of what’s expected in
class, and lack of fatigue), the more information about
performance on one score can be explained by the other score.
However, if 49% of the variance can be explained, this means
that 51% cannot—so even for a very strong correlation of .70,
many of the reasons why scores on these variables tend to be
different from one another go unexplained. This amount of
unexplained variance is called the coefficient of alienation (also
called the coefficient of nondetermination). Don’t worry. No
aliens here. This isn’t X-Files or Walking Dead stuff—it’s just
the amount of variance in Y not explained by X (and, of course,
vice versa since the relationship goes both ways).
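Both quantities come straight from squaring r, so they are easy to verify in a couple of lines. This small sketch simply reuses the .70 correlation between GPA and study time from the example above.

r = 0.70  # correlation between GPA and hours of study time (from the example above)

coefficient_of_determination = round(r ** 2, 2)      # shared (explained) variance
coefficient_of_alienation = round(1 - r ** 2, 2)     # unexplained variance

print(coefficient_of_determination)  # 0.49 -> 49% of the variance is shared
print(coefficient_of_alienation)     # 0.51 -> 51% goes unexplained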
How about a visual presentation of this sharing variance idea?
Okay. In Figure 5.5, you’ll find a correlation coefficient, the
corresponding coefficient of determination, and a diagram that
represents how much variance is shared between the two
variables. The larger the shaded area in each diagram (and the
more variance the two variables share), the more highly the
variables are correlated.
· The first diagram in Figure 5.5 shows two circles that do not
touch. They don’t touch because they do not share anything in
common. The correlation is zero.
· The second diagram shows two circles that overlap. With a
correlation of .5 (and r²xy = .25), they share about 25%
of the variance between them.
· Finally, the third diagram shows two circles placed almost on
top of each other. With an almost perfect correlation of rxy =
.90 (r²xy = .81), they share about 81% of the variance
between them.
Figure 5.5 ⬢ How variables share variance and the resulting correlation
As More Ice Cream Is Eaten … the Crime Rate Goes Up (or Association vs. Causality)
Now here’s the really important thing to be careful about when
computing, reading about, or interpreting correlation
coefficients.
Imagine this. In a small midwestern town, a phenomenon
occurred that defied any logic. The local police chief observed
that as ice cream consumption increased, crime rates tended to
increase as well. Quite simply, if you measured both, you would
find the relationship was direct, meaning that as people eat
more ice cream, the crime rate increases. And as you might
expect, as they eat less ice cream, the crime rate goes down.
The police chief was baffled until he recalled the Stats 1 class
he took in college and still fondly remembered. (He probably
also pulled out his copy of this book that he still owned. In fact,
it was likely one of three copies he had purchased to make sure
he always had one handy.)
He wondered how this could be turned into an aha! “Very
easily,” he thought. The two variables must share something or
have something in common with one another. Remember that it
must be something that relates to both level of ice cream
consumption and level of crime rate. Can you guess what that
is?
The outside temperature is what they both have in common.
When it gets warm outside, such as in the summertime, more
crimes are committed (it stays light longer, people leave the
windows open, bad guys and girls are out more, etc.). And
because it is warmer, people enjoy the ancient treat and art of
eating ice cream. Conversely, during the long and dark winter
months, less ice cream is consumed and fewer crimes are
committed as well.
Joe, though, recently elected as a city commissioner, learns
about these findings and has a great idea, or at least one that he
thinks his constituents will love. (Keep in mind, he skipped the
statistics offering in college.) Why not just limit the
consumption of ice cream in the summer months to reduce the
crime rate? Sounds good, right? Well, on closer inspection, it
really makes no sense at all.
That’s because of the simple principle that correlations express
the association that exists between two or more variables; they
have nothing to do with causality. In other words, just because
level of ice cream consumption and crime rate increase together
(and decrease together as well) does not mean that a change in
one results in a change in the other.
For example, if we took all the ice cream out of all the stores in
town and no more was available, do you think the crime rate
would decrease? Of course not, and it’s preposterous to think
so. But strangely enough, that’s often how associations are
interpreted—as being causal in nature—and complex issues in
the social and behavioral sciences are reduced to trivialities
because of this misunderstanding. Did long hair and hippiedom
have anything to do with the Vietnam conflict? Of course not.
Does the rise in the number of crimes committed have anything
to do with more efficient and safer cars? Of course not. But they
all happen at the same time, creating the illusion of being
associated.
People Who Loved Statistics
Katharine Coman (1857–1915) was such a kind and caring
researcher that a famous book of poetry and prose was written
about her after her death from cancer at the age of 57. Her love
for statistics was demonstrated in her belief that the study of
economics could solve social problems, and she urged her college,
Wellesley, to let her teach economics and statistics. She may
have been the first woman statistics professor. Coman was a
prominent social activist in her life and in her writings, and she
frequently cited industrial and economic statistics to support her
positions, especially as they related to the labor movement and
the role of African American workers. The artistic biography
written about Professor Coman was Yellow Clover (1922), a
tribute to her by her longtime companion (and coauthor of the
song “America the Beautiful”), Katherine Lee Bates.
Using SPSS to Compute a Correlation Coefficient
Let’s use SPSS to compute a correlation coefficient. The data
set we are using is an SPSS data file named Chapter 5 Data Set
1.
There are two variables in this data set:
Variable      Definition
Income        Annual income in dollars
Education     Level of education measured in years
To compute the Pearson correlation coefficient, follow these
steps:
1. Open the file named Chapter 5 Data Set 1.
2. Click Analyze → Correlate → Bivariate, and you will see the
Bivariate Correlations dialog box, as shown in Figure 5.6.
3. Double-click on the variable named Income to move it to the
Variables: box.
4. Double-click on the variable named Education to move it to
the Variables: box. You can also hold down the Ctrl key to
select more than one variable at a time and then use the “move”
arrow in the center of the dialog box to move them both.
5. Click OK.
Figure 5.6 ⬢ The Bivariate Correlations dialog box
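If you would rather script this step than click through the dialog box, the same bivariate correlation can be computed with scipy. This sketch assumes you have exported Chapter 5 Data Set 1 to a CSV file with Income and Education columns; the file name chapter5_data_set_1.csv is hypothetical, not something that ships with the book.

import pandas as pd
from scipy.stats import pearsonr

# Assumed export of Chapter 5 Data Set 1 (file name is hypothetical)
df = pd.read_csv("chapter5_data_set_1.csv")

r, p_value = pearsonr(df["Income"], df["Education"])
print(f"r = {r:.3f}, p = {p_value:.3f}")  # the book reports r = .574 for n = 20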
Understanding the SPSS Output
The output in Figure 5.7 shows the correlation coefficient to be
equal to .574. Also shown are the sample size, 20, and a
measure of the statistical significance of the correlation
coefficient (we’ll cover the topic of statistical significance
in Chapter 9).
Figure 5.7 ⬢ SPSS output for the computation of the correlation
coefficient
The SPSS output shows that the two variables are related to one
another and that as level of income increases, so does level of
education. Similarly, as level of income decreases, so does level
of education. The fact that the correlation is significant means
that this relationship is not due to chance.
As for the meaningfulness of the relationship, the coefficient of
determination is .574², or .329 (about .33), meaning that 33% of the
variance in one variable is accounted for by the other.
According to our eyeball strategy in Table 5.3, a correlation of .574 counts as a strong relationship, even though the two variables share only about a third of their variance. Once again, remember that low levels of income
do not cause low levels of education, nor does not finishing
high school mean that someone is destined to a life of low
income. That’s causality, not association, and correlations speak
only to association.
Creating a Scatterplot (or Scattergram or Whatever)
You can draw a scatterplot by hand, but it’s good to know how
to have SPSS do it for you as well. Let’s take the same data that
we just used to produce the correlation matrix in Figure 5.7 and
use it to create a scatterplot. Be sure that the data set
named Chapter 5 Data Set 1 is on your screen.
1. Click Graphs → Chart Builder → Scatter/Dot, and you will
see the Chart Builder dialog box shown in Figure 5.8.
2. Double-click on the first Scatter/Dot example.
3. Highlight and drag the variable named Income to the y-axis.
4. Highlight and drag the variable named Education to the x-
axis.
5. Click OK, and you’ll have a very nice, simple, and easy-to-
understand scatterplot like the one you see in Figure 5.9.
Figure 5.8 ⬢ The Chart Builder dialog box
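Outside SPSS, the same scatterplot can be drawn with matplotlib. The CSV file name below is the same hypothetical export of Chapter 5 Data Set 1 used in the earlier correlation sketch.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("chapter5_data_set_1.csv")  # hypothetical export of Chapter 5 Data Set 1

# Education on the x-axis, Income on the y-axis, mirroring the Chart Builder steps
plt.scatter(df["Education"], df["Income"])
plt.xlabel("Education (years)")
plt.ylabel("Income (dollars)")
plt.title("Income by Level of Education")
plt.show()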
OTHER COOL CORRELATIONS
There are different ways in which variables can be assessed. For
example, nominal-level variables are categorical in nature;
examples are race (e.g., black or white) and political affiliation
(e.g., Independent or Republican). Or, if you are measuring
income and age, you are measuring interval-level variables,
because the underlying continuum on which they are based has
equal-appearing intervals. As you continue your studies,
you’re likely to come across correlations between data that
occur at different levels of measurement. And to compute these
correlations, you need some specialized techniques. Table
5.4 summarizes what these different techniques are and how
they differ from one another.
Table 5.4 ⬢ Correlation Coefficient Shopping, Anyone?

· Variable X: Nominal (voting preference, such as Republican or Democrat); Variable Y: Nominal (biological sex, such as male or female); Type of correlation: Phi coefficient; Correlation being computed: the correlation between voting preference and sex
· Variable X: Nominal (social class, such as high, medium, or low); Variable Y: Ordinal (rank in high school graduating class); Type of correlation: Rank biserial coefficient; Correlation being computed: the correlation between social class and rank in high school
· Variable X: Nominal (family configuration, such as two-parent or single-parent); Variable Y: Interval (grade point average); Type of correlation: Point biserial; Correlation being computed: the correlation between family configuration and grade point average
· Variable X: Ordinal (height converted to rank); Variable Y: Ordinal (weight converted to rank); Type of correlation: Spearman rank coefficient; Correlation being computed: the correlation between height and weight
· Variable X: Interval (number of problems solved); Variable Y: Interval (age in years); Type of correlation: Pearson correlation coefficient; Correlation being computed: the correlation between number of problems solved and age in years
PARTING WAYS: A BIT ABOUT PARTIAL CORRELATION
Okay, now you have the basics about simple correlation, but
there are many other correlational techniques that are
specialized tools to use when exploring relationships between
variables.
A common “extra” tool is called partial correlation, where the
relationship between two variables is explored, but the impact
of a third variable is removed from the relationship between the
two. Sometimes that third variable is called a mediating or
a confounding variable.
For example, let’s say that we are exploring the relationship
between level of depression and incidence of chronic disease
and we find that, on the whole, the relationship is positive. In
other words, the more chronic disease is evident, the higher the
likelihood that depression is present as well (and of course vice
versa). Now remember, the relationship might not be causal, one
variable might not “cause” the other, and the presence of one
does not mean that the other will be present as well. The
positive correlation is just an assessment of an association
between these two variables, the key idea being that they share
some variance in common.
And that’s exactly the point—it’s the other variables they share
in common that we want to control and, in some cases, remove
from the relationship so we can focus on the key relationship we
are interested in.
For example, how about level of family support? Nutritional
habits? Severity or length of illness? These and many more
variables can all explain the relationship between these two
variables, or they may at least account for some of the variance.
And think back a bit. That’s exactly the same argument we
made when focusing on the relationship between the
consumption of ice cream and the level of crime. Once outside
temperature (the mediating or confounding variable) is removed
from the equation … boom! The relationship between the
consumption of ice cream and the crime level plummets. Let’s
take a look.
Here are some data on the consumption of ice cream and the
crime rate for 10 cities.
                              Consumption of Ice Cream    Crime Rate
Consumption of ice cream                1.00                 .743
Crime rate                                                   1.00
So, the correlation between these two variables, consumption of
ice cream and crime rate, is .743. This is a pretty healthy
relationship, accounting for about 55% of the variance between the two variables (.743² = .55, or 55%).
Now, we’ll add a third variable, average outside temperature.
Here are the Pearson correlation coefficients for the set of three
variables.
                              Consumption of Ice Cream    Crime Rate    Average Outside Temperature
Consumption of ice cream                1.00                 .743                 .704
Crime rate                                                   1.00                 .655
Average outside temperature                                                       1.00
As you can see by these values, there’s a fairly strong
relationship between ice cream consumption and outside
temperature and between crime rate and outside temperature.
We’re interested in the question, “What’s the correlation
between ice cream consumption and crime rate with the effects
of outside temperature removed or partialed out?”
That’s what partial correlation does. It looks at the relationship
between two variables (in this case, consumption of ice cream
and crime rate) as it removes the influence of a third (in this
case, outside temperature).
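For the record, a first-order partial correlation can be computed directly from the three pairwise Pearson correlations. Plugging in the values from the matrix above (.743, .704, and .655) reproduces the .525 reported in the SPSS output later in this section; the function name below is just for illustration.

from math import sqrt

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between X and Y, controlling for Z:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Ice cream consumption (X), crime rate (Y), outside temperature (Z)
print(round(partial_r(0.743, 0.704, 0.655), 3))  # roughly 0.525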
A third variable that explains the relationship between two
variables can be a mediating variable or a confounding variable.
Those are different types of variables with different definitions,
though, and are easy to confuse. In our example with
correlations, a confounding variable is something like
temperature that affects both our variables of interest and
explains the correlation between them. A mediating variable is a
variable that comes between our two variables of interest and
explains the apparent relationship. For example, if A is
correlated with B and B is correlated with C, A and C would
seem to be related but only because they are both related to B.
B is a mediating variable. Perhaps A affects B and B affects C,
so A and C are correlated.
LIGHTBOARD LECTURE VIDEO
Partial Correlations
When we talk about correlations between two variables, we
almost always think about it by drawing these two overlapping
circles. And we have variable A and variable B. And they
somehow are correlated. They overlap in some way. And they
share this information. Sometimes, though, there can be
another variable. Let's call it-- let's see, my computer brain
suggests C. That correlates with both of those. And when these
variables all correlate together, there's all these overlapping
areas. So A, B, and C all measure the same thing to some
degree. A and C both measure this same thing to some degree. And B and C both measure this same thing to some degree.
Sometimes we want to know, though, what would the
relationship between A and B be if we controlled for C?
Would the correlation go up? Would it go down? And you
can even see visually, some of this correlation between A and
B is this part right here that is actually part of C as well. So
statistically, we call this correlation after controlling for
another a partial correlation. And I'm going to show you what
a partial correlation looks like. If we control for a variable, it
means we statistically remove it. It's as if it's not there. It's as if everyone got the same score on C. And it creates now, between A and B, this new relationship. And you can look-- literally, it is different.
It's a different type of correlation. It probably would go down,
mathematically. The other thing interesting-- when you
remove a relationship because of a partial correlation, is the
variables themselves are now a different shape because a little
bit of A used to be C. And a little bit of B used to be C. So
when you control for a third variable, you end up with a
different relationship and different variables. One way to
think about controlling for another variable is, instead of
thinking about variables, think about friends. We all have
different friends. Sometimes you're friends with someone
because you share some other friend. So imagine instead,
you've got three friends here-- 1, 2, and 3. You often see 1
and 2 together. But it's because they're with number 3.
They're with friend 3. What would happen if we removed 3, given that 1 and 2 are never together unless 3's there? Let's remove all
the times where 3 is also there. Well, 1 and 2 spend a little bit
of time together. But they don't associate as closely. They're
not really close friends with each other. They just seem to be
friends because they both hang out with number 3. And
number 3 has the nice house and has all the fun parties. But if
it's just 1 and 2, they barely like each other.
Using SPSS to Compute Partial Correlations
Let’s use some data and SPSS to illustrate the computation of a
partial correlation. Here are the raw data.
City   Ice Cream Consumption   Crime Rate   Average Outside Temperature
1              3.4                 62                  88
2              5.4                 98                  89
3              6.7                 76                  65
4              2.3                 45                  44
5              5.3                 94                  89
6              4.4                 88                  62
7              5.1                 90                  91
8              2.1                 68                  33
9              3.2                 76                  46
10             2.2                 35                  41
1. Enter the data we are using into SPSS.
2. Click Analyze → Correlate → Partial and you will see the
Partial Correlations dialog box, as shown in Figure 5.10.
3. Move Ice_Cream and Crime_Rate to the Variables: box by
dragging them or double-clicking on each one.
4. Move the variable named Outside_Temp to the Controlling
for: box.
5. Click OK and you will see the SPSS output as shown
in Figure 5.11.
Figure 5.10 ⬢ The Partial Correlations dialog box
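As a cross-check outside SPSS, here is a sketch that starts from the raw data above, computes the three pairwise Pearson correlations with numpy, and applies the standard first-order partial-correlation formula. It should come out near the r of .743 and the partial r of .525 discussed below, give or take rounding.

import numpy as np

ice_cream = np.array([3.4, 5.4, 6.7, 2.3, 5.3, 4.4, 5.1, 2.1, 3.2, 2.2])
crime     = np.array([62, 98, 76, 45, 94, 88, 90, 68, 76, 35])
temp      = np.array([88, 89, 65, 44, 89, 62, 91, 33, 46, 41])

r_ic = np.corrcoef(ice_cream, crime)[0, 1]   # ice cream vs. crime
r_it = np.corrcoef(ice_cream, temp)[0, 1]    # ice cream vs. temperature
r_ct = np.corrcoef(crime, temp)[0, 1]        # crime vs. temperature

# Partial correlation of ice cream and crime, controlling for temperature
partial = (r_ic - r_it * r_ct) / np.sqrt((1 - r_it**2) * (1 - r_ct**2))
print(round(r_ic, 3), round(partial, 3))  # expected roughly 0.743 and 0.525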
Understanding the SPSS Output
As you can see in Figure 5.11, the correlation between ice
cream consumption (Ice_Cream) and crime rate (Crime_Rate)
with the influence or moderation of outside temperature
(Outside_Temp) removed is .525. This is less than the simple
Pearson correlation between ice cream consumption and crime
rate (which is .743), which does not consider the influence of
outside temperature. What seemed to explain 55% of the
variance (and was what we call “significant at the .05 level”),
with the removal of Outside_Temp as a moderating variable,
now explains .525² = .28, or 28%, of the variance (and the
relationship is no longer significant).
Figure 5.11 ⬢ The completed partial correlation analysis
Our conclusion? Outside temperature accounted for enough of
the shared variance between the consumption of ice cream and
the crime rate for us to conclude that the two-variable
relationship was significant. But, with the removal of the
moderating or confounding variable outside temperature, the
relationship was no longer significant. And we don’t need to
stop selling ice cream to try to reduce crime.
Real-World Stats
This is a fun one and consistent with the increasing interest in
using statistics in various sports in various ways, a discipline
informally named sabermetrics. The term was coined by Bill
James (and his approach is represented in the movie and
book Moneyball).
Stephen Hall and his colleagues examined the link between
teams’ payrolls and the competitiveness of those teams (for both
professional baseball and soccer), and he was one of the first to
look at this from an empirical perspective. In other words, until
these data were published, most people made decisions based on
anecdotal evidence rather than quantitative assessments. Hall
looked at data on team payrolls in American Major League
Baseball and English soccer between 1980 and 2000, and he
used a model that allows for the establishment of causality (and
not just association) by looking at the time sequence of events
to examine the link.
In baseball, payroll and performance both increased
significantly in the 1990s, but there was no evidence that
causality runs in the direction from payroll to performance. In
comparison, for English soccer, the researchers did show that
higher payrolls actually were at least one cause of better
performance. Pretty cool, isn’t it, how association can be
explored to make real-world decisions?
Want to know more? Go online or to the library and find …
Hall, S., Szymanski, S., & Zimbalist, A. S. (2002). Testing
causality between team performance and payroll: The cases of
Major League Baseball and English soccer. Journal of Sports
Economics, 3, 149–168.
Summary
The idea of showing how things are related to one another and
what they have in common is a very powerful one, and the
correlation coefficient is a very useful descriptive statistic (one
used in inference as well, as we will show you later). Keep in
mind that correlations express a relationship that is associative
but not necessarily causal, and you’ll be able to understand how
this statistic gives us valuable information about relationships
between variables and how variables change or remain the same
in concert with others. Now it’s time to change speeds just a bit
and wrap up Part II with a focus on reliability and validity. You
need to know about these ideas because you’ll be learning how
to determine what differences in outcomes, such as scores and
other variables, represent.
Time to Practice
1. Use these data to answer Questions 1a and 1b. These data are
saved as Chapter 5 Data Set 2.
a. Compute the Pearson product-moment correlation coefficient
by hand and show all your work.
b. Construct a scatterplot for these 10 pairs of values by hand.
Based on the scatterplot, would you predict the correlation to be
direct or indirect? Why?
Number Correct (out of a possible 20)   Attitude (out of a possible 100)
17                                      94
13                                      73
12                                      59
15                                      80
16                                      93
14                                      85
16                                      66
16                                      79
18                                      77
19                                      91
2. Use these data to answer Questions 2a and 2b. These data are
saved as Chapter 5 Data Set 3.
Speed (to complete a 50-yard swim)   Strength (number of pounds bench-pressed)
21.6                                 135
23.4                                 213
26.5                                 243
25.5                                 167
20.8                                 120
19.5                                 134
20.9                                 209
18.7                                 176
29.8                                 156
28.7                                 177
a. Using either a calculator or a computer, compute the Pearson
correlation coefficient.
b. Interpret these data using the general range of very weak to
very strong. Also compute the coefficient of determination.
How does the subjective analysis compare with the value of r²?
3. Rank the following correlation coefficients on strength of
their relationship (list the weakest first).
a. .71
b. +.36
c. −.45
d. .47
e. −.62
4. For the following set of scores, calculate the Pearson
correlation coefficient and interpret the outcome. These data are
saved as Chapter 5 Data Set 4.
Achievement Increase Over 12 Months   Classroom Budget Increase Over 12 Months
0.07                                  0.11
0.03                                  0.14
0.05                                  0.13
0.07                                  0.26
0.02                                  0.08
0.01                                  0.03
0.05                                  0.06
0.04                                  0.12
0.04                                  0.11
5. For the following set of data, by hand, correlate minutes of
exercise with grade point average (GPA). What do you conclude
given your analysis? These data are saved as Chapter 5 Data Set
5.
Exercise   GPA
25         3.6
30         4.0
20         3.8
60         3.0
45         3.7
90         3.9
60         3.5
0          2.8
15         3.0
10         2.5
6. Use SPSS to determine the correlation between hours of
studying and GPA for these honor students. Why is the
correlation so low?
Hours of Studying   GPA
23                  3.95
12                  3.90
15                  4.00
14                  3.76
16                  3.97
21                  3.89
14                  3.66
11                  3.91
18                  3.80
9                   3.89
Time to Practice Video
Chapter 5: Problem 6
Chapter 5, Problem 6 asks you to compute a correlation. They
want to assess how honor students' GPAs are correlated with
their hours of studying, and then to answer the question of
why that correlation is so low. So you see listed here are
individual students' GPAs and the average number of hours
that they study. We want to set up our SPSS data file with this
information. When we look here, you'll see the two variables.
And under Variable View, make sure that they're both set up
with scale, since they're both measured on a continuous range.
A GPA can go from 0 to 4.0. To do this, we're going to go under Analyze and then Correlate, Bivariate. We have two variables, so this would be a bivariate correlation. This is straightforward here. We're going to take both of them and move them to our variables. You notice down here it's saying we have the Pearson, which is for bivariate continuous data. We're going to look for a two-tailed
significance. That is, how are they related to each other? We are
asking it to flag our significant correlations. You always want
to look under Options, but it defaults to what we want. We
could say we want to show the means and standard deviations,
we don't need to do that for this one. So instead, let's hit OK.
And here is our information. When we look at our significance, it is not below the cutoff we need. And so, the question is, why is this correlation so low? Well, if we look back at our data itself, it's going to give us an understanding. All of these are really high GPAs. When there's
very little variability, you're typically not going to get a strong
correlation because there's not much change. Even though
there seems to be a range of how they studied, in terms of
hours, it didn't have a big effect on their GPA because they're
all really high. And this is how you answer Chapter 5, Problem
6.
1. The coefficient of determination between two variables is
.64. Answer the following questions:
a. What is the Pearson correlation coefficient?
b. How strong is the relationship?
c. How much of the variance in the relationship between these
two variables is unaccounted for?
2. Here is a set of three variables for each of 20 participants in
a study on recovery from a head injury. Create a simple matrix
that shows the correlations between each variable. You can do
this by hand (and plan on being here for a while) or use SPSS or
any other application. These data are saved as Chapter 5 Data
Set 6.
Age at Injury   Level of Treatment   12-Month Treatment Score
25              1                    78
16              2                    66
8               2                    78
23              3                    89
31              4                    87
19              4                    90
15              4                    98
31              5                    76
21              1                    56
26              1                    72
24              5                    84
25              5                    87
36              4                    69
45              4                    87
16              4                    88
23              1                    92
31              2                    97
53              2                    69
11              3                    79
33              2                    69
3. Look at Table 5.4. What type of correlation coefficient would
you use to examine the relationship between biological sex
(defined in this study as having only two categories: male or
female) and political affiliation? How about family
configuration (two-parent or single-parent) and high school
GPA? Explain why you selected the answers you did.
4. When two variables are correlated (such as strength and
running speed), they are associated with one another. Explain
how, even if there is a correlation between the two, one might
not cause the other.
5. Provide three examples of an association between two
variables where a causal relationship makes perfect sense
conceptually.
6. Why can’t correlations be used as a tool to prove a causal
relationship between variables rather than just an association?
7. When would you use partial correlation?
Student Study Site
Get the tools you need to sharpen your study skills!
Visit edge.sagepub.com/salkindfrey7e to access practice
quizzes, eFlashcards, original and curated videos, data sets, and
more!
6 AN INTRODUCTION TO UNDERSTANDING RELIABILITY AND VALIDITY: JUST THE TRUTH
6: MEDIA LIBRARY
Premium Videos
Lightboard Lecture Video
· Reliability
· Validity
Time to Practice Video
· Chapter 6: Problem 5
Difficulty Scale
(not so hard)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Defining reliability and validity and understanding why they
are important
· This is a stats class! What’s up with this measurement stuff?
· Understanding how the quality of tests is evaluated
· Computing and interpreting various types of reliability
coefficients
· Computing and interpreting various types of validity
coefficients
AN INTRODUCTION TO RELIABILITY AND VALIDITY
Ask any parent, teacher, pediatrician, or almost anyone in your
neighborhood what the five top concerns are about today’s
children, and there is sure to be a group who identifies obesity
as one of those concerns. Sandy Slater and her colleagues
developed and tested the reliability and validity of a self-
reported questionnaire on home, school, and neighborhood
physical activity environments for youth located in low-income
urban minority neighborhoods and rural areas. In particular, the
researchers looked at such variables as information on the
presence of electronic and play equipment in youth participants’
bedrooms and homes and outdoor play equipment at schools.
They also looked at what people close to the children thought
about being active. A total of 205 parent–child pairs completed
a 160-item take-home survey on two different occasions, a
perfect model for establishing test–retest reliability. The
researchers found that the measure had good reliability and
validity. The researchers hoped that this survey could be used to
help identify opportunities and develop strategies to encourage
underserved youth to be more physically active.
Want to know more? Go online or to the library and find …
Slater, S., Full, K., Fitzgibbon, M., & Uskali, A. (2015, June 4).
Test–retest reliability and validity results of the Youth Physical
Activity Supports Questionnaire. SAGE Open, 5(2).
doi:10.1177/2158244015586809
What’s Up With This Measurement Stuff?
An excellent question and one that you should be asking. After
all, you enrolled in a stats class, and up to now, that’s been the
focus of the material that has been covered. Now it looks like
you’re faced with a topic that belongs in a tests and
measurements class. So, what’s this material doing in a stats
book?
Well, much of what we have covered so far in Statistics for
People Who (Think They) Hate Statistics has to do with the
collection and description of data. Now we are about to begin
the journey toward analyzing and interpreting data. But before
we begin learning those skills, we want to make sure that the
data are what you think they are—that the data represent what it
is you want to know about. In other words, if you’re studying
poverty, you want to make sure that the measure you use to
assess poverty works and that it works time after time. Or, if
you are studying aggression in middle-aged males, you want to
make sure that whatever tool you use to assess aggression works
and that it works time after time.
More really good news: Should you continue in your education
and want to take a class on tests and measurements, this
introductory chapter will give you a real jump on understanding
the scope of the area and what topics you’ll be studying.
And to make sure that the entire process of collecting data and
making sense out of them works, you first have to make sure
that what you use to collect data works as well. The
fundamental questions that are answered in this chapter are
“How do I know that the test, scale, instrument, and so on, that I use produces scores that aren’t random but actually represent an individual’s typical performance?” (that’s reliability) and “How do I know that the test, scale, instrument, and so on, that I use measures what it is supposed to?” (that’s validity).
Anyone who does research will tell you about the importance of
establishing the reliability and validity of your measurement
tool, whether it’s a simple observational instrument of consumer
behavior or one that measures a complex psychological
construct such as depression. However, there’s another very
good reason. If the tools that you use to collect data are
unreliable or invalid, then the results of any test or any
hypothesis, and the conclusions you may reach based on those
results, are necessarily inconclusive. If you are not sure that the
test does what it is supposed to and that it does so consistently
without randomness in its scores, how do you know that the
nonsignificant results you got aren’t a function of the lousy test
tools rather than an actual reflection of reality? Want a clean
test of your hypothesis? Make reliability and validity an
important part of your research.
You may have noticed a new term at the beginning of this
chapter—dependent variable. In an experiment, this is the
outcome variable, or what the researcher looks at to see whether
any change has occurred as a function of the treatment that has
taken place. And guess what? The treatment has a name as
well—the independent variable. For example, if a researcher
examined the effect of different reading programs on
comprehension, the independent variable would be the reading
program, and the dependent or outcome variable would be
reading comprehension score. The term dependent variable is
used for the outcome variable because the hypothesis suggests
that it depends on, or is affected by, the independent variable.
Although these terms will not be used often throughout the
remainder of this book, you should have some familiarity with
them.
RELIABILITY: DOING IT AGAIN UNTIL YOU GET IT RIGHT
Reliability is pretty easy to understand and figure out. It’s
simply whether a test, or whatever you use as a measurement
tool, measures something consistently. If you administer a test
of personality type before a special treatment occurs, will the
administration of that same test 4 months later be reliable?
That, my friend, is one type of reliability—the degree to which
scores are consistent for one person measured twice. There are
other types of reliability, each of which we will get to after we
define reliability just a bit more.
Test Scores: Truth or Dare?
When you take a test in this class, you get a score, such as 89
(good for you) or 65 (back to the books!). That test score
consists of several elements, including the observed score (or
what you actually get on the test, such as 89 or 65) and a true
score (the typical score you would get if you took the same test
an infinite number of times). We can’t directly measure true
score (because we don’t have the time or energy to give
someone the same test an infinite number of times), but we can
estimate it.
Why aren’t true scores and observed scores the same? Well,
they can be if the test (and the accompanying observed score) is
a perfect (and we mean absolutely perfect) reflection of what’s
being measured.
But the bread sometimes falls on the buttered side, and
Murphy’s law tells us that the world is not perfect. So, what you
see as an observed score may come close to the true score, but
rarely are they the same. Rather, the difference as you see is the
amount of error that is introduced.
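One way to make observed score = true score + error concrete is a tiny simulation: give one imaginary test taker the same test many times, add random error each time, and watch the average of the observed scores settle near the true score. The true score of 85 and the error spread of 5 points below are arbitrary choices for illustration.

import random

random.seed(1)

true_score = 85     # the theoretical average over infinitely many administrations
error_spread = 5    # how much random error creeps into any single administration

# Simulate many administrations of the same test to the same person
observed = [true_score + random.gauss(0, error_spread) for _ in range(10_000)]

print(round(observed[0], 1))                    # any one observed score misses a bit
print(round(sum(observed) / len(observed), 1))  # the long-run average hovers near 85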
Notice that reliability is not the same as validity; it does not
reflect whether you are measuring what you want to. Here’s
why. True score has nothing to do with whether the construct of
interest is really being reflected. Rather, true score is
the mean score an individual would get if he or she took a test
an infinite number of times, and it represents the theoretical
typical level of performance on a given test. Now, one would
hope that the typical level of performance would reflect the
construct of interest, but that’s another question (a question of
validity). The distinction here is that a test is reliable if it
consistently produces whatever score a person would get on
average, regardless of what the test is measuring. In fact, a
perfectly reliable test might not produce a score that has
anything to do with the construct of interest, such as “what you
really know.”
LIGHTBOARD LECTURE VIDEO
Reliability
So one of the qualities of a good test is that it's reliable. And
a lot of times when you learn stuff, you hear about validity
and reliability. And you think it's all the same thing. Well,
validity is whether the score matches the thing that's supposed
to be measured. We're not talking about that. Reliability is
different. Reliability refers to whether the score you just got
on a test is the typical score you would have gotten on that
test, not whether it matches the level of the trait you're trying
to measure. That's validity. But if you took the test again
tomorrow, would you get the same score as you got
yesterday? If you'd had a better breakfast, would your score
be any different? That's what reliability is all about. And we
can think about reliability this way: the purpose of a test is for you
to get the score that represents your typical level of
performance. If you took a test an infinite number of times
and you averaged all those scores-- you alone-- that average is
the typical level of performance you would get. So every
time someone takes a test, when you take a test, you get some
score on it. And if this middle bullseye is your typical level
of performance, the test score you would get if you took the
test an infinite number of times and averaged it, that's the score
you typically get. And if you get that score, then it's a reliable
test. And if that's true for everybody, it's a reliable test. But
there's some randomness. There's always randomness. That's
the problem with social sciences. There's all this randomness
in human behavior. And so the score you get won't be your
exact typical level of performance. It will not be right here
on the bullseye. It might be out here somewhere. It might be
further out here. I mean, the further away it is from your
 
CHAPTER 7Managing Financial OperationsRevenue cycle (bil.docx
CHAPTER 7Managing Financial OperationsRevenue cycle (bil.docxCHAPTER 7Managing Financial OperationsRevenue cycle (bil.docx
CHAPTER 7Managing Financial OperationsRevenue cycle (bil.docx
 
CHAPTER 7Primate BehaviorWhat is Meant By Behavior.docx
CHAPTER 7Primate BehaviorWhat is Meant By Behavior.docxCHAPTER 7Primate BehaviorWhat is Meant By Behavior.docx
CHAPTER 7Primate BehaviorWhat is Meant By Behavior.docx
 
Chapter 7Medical Care Production and Costs(c) 2012 Cengage.docx
Chapter 7Medical Care Production and Costs(c) 2012 Cengage.docxChapter 7Medical Care Production and Costs(c) 2012 Cengage.docx
Chapter 7Medical Care Production and Costs(c) 2012 Cengage.docx
 
Chapter 7Evaluating HRD ProgramsWerner© 2017 Cengage Learn.docx
Chapter 7Evaluating HRD ProgramsWerner© 2017 Cengage Learn.docxChapter 7Evaluating HRD ProgramsWerner© 2017 Cengage Learn.docx
Chapter 7Evaluating HRD ProgramsWerner© 2017 Cengage Learn.docx
 
CHAPTER 7INTEREST RATES AND BOND VALUATIONCopyright © 2019 M.docx
CHAPTER 7INTEREST RATES AND BOND VALUATIONCopyright © 2019 M.docxCHAPTER 7INTEREST RATES AND BOND VALUATIONCopyright © 2019 M.docx
CHAPTER 7INTEREST RATES AND BOND VALUATIONCopyright © 2019 M.docx
 
CHAPTER 7FriendshipTHE NATURE OF FRIENDSHIPFRIENDSHIP ACROSS T.docx
CHAPTER 7FriendshipTHE NATURE OF FRIENDSHIPFRIENDSHIP ACROSS T.docxCHAPTER 7FriendshipTHE NATURE OF FRIENDSHIPFRIENDSHIP ACROSS T.docx
CHAPTER 7FriendshipTHE NATURE OF FRIENDSHIPFRIENDSHIP ACROSS T.docx
 
Chapter 7Lean Thinking and Lean SystemsMcGraw-Hill Ed.docx
Chapter 7Lean Thinking and Lean SystemsMcGraw-Hill Ed.docxChapter 7Lean Thinking and Lean SystemsMcGraw-Hill Ed.docx
Chapter 7Lean Thinking and Lean SystemsMcGraw-Hill Ed.docx
 
Chapter 7Food, Nutrition & Obesity PolicyEyler, Chriqui, M.docx
Chapter 7Food, Nutrition & Obesity PolicyEyler, Chriqui, M.docxChapter 7Food, Nutrition & Obesity PolicyEyler, Chriqui, M.docx
Chapter 7Food, Nutrition & Obesity PolicyEyler, Chriqui, M.docx
 

Recently uploaded

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Recently uploaded (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Chapter 4 Problem 31. For problem three in chapter four, a teac.docx

  • 1. Chapter 4: Problem 3 1. For problem three in chapter four, a teacher wants to display her students number of responses for each day of the week. And she wants to do that with a bar chart. Since she hasn't taken a stats class, she comes to you for help. You first enter her data into SPSS and the results look like this-- When you look at your data set, you'll see that it actually has the wrong level of measurement. Notice that there's a little Venn diagram at the top of each column, which indicates that your data has been entered as nominal. That would be correct if you were noting which day of the week a student participated, but since you're noting how often a given student participated, the correct level of measurement is a scale. Go ahead and change that. Watch how I do that. Under variable view, under measure, you just want to click each one and turn it into a scale. You can also cut and paste these, and I can show you that in another video. Once you have them changed, go back to data view, and you'll see that at the top it has changed in two little rulers. The next question is, how do I get SPSS to display the average score per day rather the total number of individual scores, which might look like a mess, and it's why this question is a toughie. To do that we go under graphs, and you'll see that you have two options, you can do a Chart Builder or a Legacy Dialog. For this question we want to use the Legacy Dialog. We go to Bar and when we click that, there are two questions-- one, what type of bar chart? We want a simple one. And then, how do you want the data in their area displayed? Do we want to summarize for the groups? We really don't. We want summary of separate variables where each day of the week is a variable. We click on Define and then here you'll see every day of the week. You want to bring that over and you see your bar charts are going to represent the mean for every day of the week. As a good habit you want to make sure you title it, I called it "Students' Engagement During Group Discussion." The second one is by day of
c. Scatter/dot (extra credit)
6. Go to the library or online and find a journal article in your area of interest that contains empirical data but does not contain any visual representation of them. Use the data to create a chart. Be sure to specify what type of chart you are creating and why you chose the one you did. You can create the chart manually or using SPSS or Excel.
7. Create the worst-looking chart that you can, crowded with chart and font junk. Nothing makes as lasting an impression as a bad example.
8. And, finally, what is the purpose of a chart or graph?

4 CREATING GRAPHS: A PICTURE REALLY IS WORTH A THOUSAND WORDS

4: MEDIA LIBRARY
Premium Videos
Core Concepts in Stats Video · Examining Data: Tables and Figures
Lightboard Lecture Video · Creating a Simple Chart
Time to Practice Video · Chapter 4: Problem 3
Difficulty Scale (moderately easy but not a cinch)

WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding why a picture is really worth a thousand words
· Creating a histogram and a polygon
· Understanding the different shapes of different distributions
· Using SPSS to create incredibly cool charts
· Creating different types of charts and understanding their application and uses

WHY ILLUSTRATE DATA?
In the previous two chapters, you learned about the two most important types of descriptive statistics—measures of central tendency and measures of variability. Both of these provide you with the one best number for describing a group of data (central tendency) and a number reflecting how diverse, or different, scores are from one another (variability). What we did not do, and what we will do here, is examine how differences in these two measures result in different-looking
  • 4. distributions. Numbers alone (such as M = 3 and s = 3) may be important, but a visual representation is a much more effective way of examining the characteristics of a distribution as well as the characteristics of any set of data. So, in this chapter, we’ll learn how to visually represent a distribution of scores as well as how to use different types of graphs to represent different types of data. CORE CONCEPTS IN STATS VIDEO Examining Data: Tables and Figures X-TIMESTAMP-MAP=LOCAL: Examining data helps find data entry errors, evaluate research methodology, identify outliers, and determine the shape of a distribution in a data set. Researchers typically examine collected data in two ways, by creating tables and figures. Imagine you asked a group of friends to rate a movie they've seen on a one to five scale. A table helps identify the variable and the possible values of the variable. The sample size, often referred to as n, is 14 because there are ratings reported from 14 people. This is how large the total sample is. From this, we can determine how many in the sample have each value of the variable. We can also determine the percentage that the sample has of each possible value. Figures display variables from the table. Nominal and ordinal variables can be depicted with bar charts, while interval and ratio variables can be depicted using histograms and frequency polygons. For this data set, we can use a bar chart. Distributions of data can be characterized along three aspects or dimensions, modality, symmetry, and variability. In a unimodal distribution, a small range of values has the greatest frequency or mode of the set. However, it's possible for a distribution to have more than one mode. For a bimodal distribution, we see two values that seem to occur with the greatest frequency. A distribution is symmetrical if folding it in half makes each half mirror the other. When a distribution isn't symmetrical, it's called asymmetrical or skewed. This distribution is often called positively skewed because the tail-- or narrow end-- of the distribution is on the
  • 5. right end. Variability in a distribution is the amount of spread or dispersion of values for a variable. Peak distributions look like a tall mountain and reflect little variability in the scores. This means that almost everyone has given the same score to the movie. A flat distribution has a lot of variability, where almost everyone's scores are different. In between is a normal distribution. It stands apart from the distributions that are neither peak nor flat, unimodal, or symmetrical. TEN WAYS TO A GREAT FIGURE (EAT LESS AND EXERCISE MORE?) Whether you create illustrations by hand or use a computer program, the same principles of decent design apply. Here are 10 to copy and put above your desk: 1. Minimize chart or graph junk. “Chart junk” (a close cousin to “word junk”) happens when you use every function, every graph, and every feature a computer program has to make your charts busy, full, and uninformative. With graphs, more is definitely less. 2. Plan out your chart before you start creating the final copy. Use graph paper even if you will be using a computer program to generate the graph. Actually, why not just use your computer to generate and print out graph paper (try www.printfreegraphpaper.com). 3. Say what you mean and mean what you say—no more and no less. There’s nothing worse than a cluttered (with too much text and fancy features) graph to confuse the reader. 4. Label everything so nothing is left to the misunderstanding of the audience. 5. A graph should communicate only one idea—a description of data or a demonstration of a relationship. 6. Keep things balanced. When you construct a graph, center titles and labels. 7. Maintain the scale in a graph. “Scale” refers to the proportional relationship between the horizontal and vertical axes. This ratio should be about 3 to 4, so a graph that is 3
  • 6. inches wide will be about 4 inches tall. 8. Simple is best and less is more. Keep the chart simple but not simplistic. Convey one idea as straightforwardly as possible, with distracting information saved for the accompanying text. Remember, a chart or graph should be able to stand alone, and the reader should be able to understand the message. 9. Limit the number of words you use. Too many words, or words that are too large (both in terms of physical size and idea-wise), can detract from the visual message your chart should convey. 10. A chart alone should convey what you want to say. If it doesn’t, go back to your plan and try it again. FIRST THINGS FIRST: CREATING A FREQUENCY DISTRIBUTION The most basic way to illustrate data is through the creation of a frequency distribution. A frequency distribution is a method of tallying and representing how often certain scores occur. In the creation of a frequency distribution, scores are usually grouped into class intervals, or ranges of numbers. Here are 50 scores on a test of reading comprehension on which a frequency distribution is based: 47 10 31 25 20 2 11 31 25 21 44 14 15 26 21
range of scores, there are associated frequency counts.

Class Interval   Frequency
45–49            1
40–44            2
35–39            4
30–34            8
25–29            10
20–24            10
15–19            8
10–14            4
5–9              2
0–4              1

People Who Loved Statistics
Helen M. Walker (1891–1983) began her college career studying philosophy and then became a high school math teacher. She got her master's degree, taught mathematics at the University of Kansas (your authors' favorite college) where she was tenured, and then studied the history of statistics (at least up to 1929, when she wrote her doctoral dissertation at Columbia). Dr. Walker's greatest interest was in the teaching of statistics, and many years after her death, a scholarship was endowed in her name at Columbia for students who want to teach statistics! Her publications included a whole book teaching about the best way to show statistics using tables. Oh, and along the way, she became the first woman president of the
  • 9. American Statistical Association. All this achievement from someone who actually loved teaching statistics. Just like your professor! The Classiest of Intervals As you can see from the above table, a class interval is a range of numbers, and the first step in the creation of a frequency distribution is to define how large each interval will be. As you can see in the frequency distribution that we created, each interval spans five possible scores, such as 5–9 (which includes scores 5, 6, 7, 8, and 9) and 40–44 (which includes scores 40, 41, 42, 43, and 44). How did we decide to have an interval that includes only five scores? Why not five intervals, each consisting of 10 scores? Or two intervals, each consisting of 25 scores? Here are some general rules to follow in the creation of a class interval, regardless of the size of values in the data set you are dealing with: 1. Select a class interval that has a range of 2, 5, 10, 15, or 20 data points. In our example, we chose 5. 2. Select a class interval so that 10 to 20 such intervals cover the entire range of data. A convenient way to do this is to compute the range and then divide by a number that represents the number of intervals you want to use (between 10 and 20). In our example, there are 50 scores, and we wanted 10 intervals: 50/10 = 5, which is the size of each class interval. If you had a set of scores ranging from 100 to 400, you could start with an estimate of 20 intervals and see if the interval range makes sense for your data: 300/20 = 15, so 15 would be the class interval. 3. Begin listing the class interval with a multiple of that interval. In our frequency distribution of reading comprehension test scores, the class interval is 5, and we started the lowest class interval at 0. 4. Finally, the interval made up of the largest scores goes at the top of the frequency distribution. There are some simple steps for creating class intervals on the
  • 10. way to creating a frequency distribution. Here are six general rules: 1. Determine the range. 2. Decide on the number of class intervals. 3. Decide on the size of the class interval. 4. Decide the starting point for the first class. 5. Create the class intervals. 6. Put the data into the class intervals. Once class intervals are created, it’s time to complete the frequency part of the frequency distribution. That’s simply counting the number of times a score occurs in the raw data and entering that number in each of the class intervals represented by the count. In the frequency distribution that we created for our reading comprehension data, the number of scores that occur between 30 and 34 and thus are in the 30–34 class interval is 8. So, an 8 goes in the column marked Frequency. There’s your frequency distribution. As you might realize, it is easier to do this counting if you have your scores listed in order. Sometimes it is a good idea to graph your data first and then do whatever calculations or analysis is called for. By first looking at the data, you may gain insight into the relationship between variables, what kind of descriptive statistic is the right one to use to describe the data, and so on. This extra step might increase your insights and the value of what you are doing. LIGHTBOARD LECTURE VIDEO Creating a Simple Chart So statisticians and professor types use graphs and charts all the time. Let's take a second to figure out what's in a chart, like, what are the pieces that puts this all together. Here's some scores. See here, different scores people could get, and here's the number of people that got those scores. Very typical data that people like to graph all the time. And when you make a graph, you usually have these two lines, two dimensions. And along the bottom, you often put the actual scores. So here are the actual scores are 8, 9, 1 And along the
  • 11. top, it's very common to put something that shows how many people got that score or the frequency. OK. So, for instance, the 8 here, one person got that. And the 9, three people got that. One, two, three. And the 1 Six people got that. I don't know if we have room for that. That's up here somewhere. And then for the 11, two people got to that. That'd be about there. And the 12, only one person got that. So these dots represent the different people. It's sort of hard to look at those dots, so instead we make these bars. And the taller the bar, you can see, then the more people got it. So, like, the 1 that's way up here around a 6. And only one person got an 8, so that's down here around a 1. But this, now, is a picture of what your scores look like. And if you look at just the tops of these bars, you can see, like, where we get a normal curve. THE PLOT THICKENS: CREATING A HISTOGRAM Now that we’ve got a tally of how many scores fall in what class intervals, we’ll go to the next step and create what is called a histogram, a visual representation of the frequency distribution where the frequencies are represented by bars. Depending on the book or journal article or report you read and the software you use, visual representations of data are called graphs (such as in SPSS) or charts (such as in the Microsoft spreadsheet Excel). It really makes no difference. All you need to know is that a graph or a chart is the visual representation of data. To create a histogram, do the following: 1. Using a piece of graph paper, place values at equal distances along the x-axis, as shown in Figure 4.1. Now, identify the midpoint of each class interval, which is the middle point in the interval. It’s pretty easy to just eyeball, but you can also just add the top and bottom values of the class interval and divide by 2. For example, the midpoint of the class interval 0–4 is the average of 0 and 4, or 4/2 = 2. 2. Draw a bar or column centered on each midpoint that represents the entire class interval to the height representing the
  • 12. frequency of that class interval. For example, in Figure 4.2, you can see that in our first entry, the class interval of 0–4 is represented by the frequency of 1 (representing the one time a value between 0 and 4 occurs). Continue drawing bars or columns until each of the frequencies for each of the class intervals is represented. Figure 4.2 is a nice hand-drawn (really!) histogram for the frequency distribution of the 50 scores that we have been working with so far. Notice that each class interval is represented by a range of scores along the x-axis. Figure 4.1 ⬢ Class intervals along the x-axis The Tallyho Method You can see by the simple frequency distribution at the beginning of the chapter that you already know more about the distribution of scores than you’d learn from just a simple listing of them. You have a good idea of what values occur with what frequency. But another visual representation (besides a histogram) can be done by using tallies for each of the occurrences, as shown in Figure 4.3. Figure 4.2 ⬢ A hand-drawn histogram Figure 4.3 ⬢ Tallying scores We used tallies that correspond with the frequency of scores that occur within a certain class. This gives you an even better visual representation of how often certain scores occur relative to other scores.THE NEXT STEP: A FREQUENCY POLYGON Creating a histogram or a tally of scores wasn’t so difficult, and the next step (and the next way of illustrating data) is even easier. We’re going to use the same data—and, in fact, the histogram that you just saw created—to create a frequency polygon. (Polygon is a word for shape.) A frequency polygon is a continuous line that represents the frequencies of scores within a class interval, as shown in Figure 4.4.
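If you would rather script these two figures than draw them by hand, here is a minimal, illustrative Python sketch (matplotlib is assumed; this is not part of the chapter's SPSS workflow). It reuses the class intervals and frequencies from the table above and plots the histogram-style bars first, then the frequency polygon through the midpoints.

# Illustrative sketch: histogram bars plus a frequency polygon
# built from the class intervals and frequencies in the chapter's table.
import matplotlib.pyplot as plt

# Class intervals (lower bound, upper bound) and their frequencies, lowest to highest
intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24),
             (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
frequencies = [1, 2, 4, 8, 10, 10, 8, 4, 2, 1]

# Midpoint of each class interval (e.g., (0 + 4) / 2 = 2)
midpoints = [(low + high) / 2 for low, high in intervals]

# Bars centered on the midpoints, each spanning the full five-point interval
plt.bar(midpoints, frequencies, width=5, edgecolor="black", color="lightgray")

# Frequency polygon: a line connecting the midpoints at each bar's height
plt.plot(midpoints, frequencies, marker="o", color="black")

plt.xlabel("Reading comprehension score (class interval midpoints)")
plt.ylabel("Frequency")
plt.title("Histogram and frequency polygon for the 50 comprehension scores")
plt.show()

The two plotting calls mirror the two hand-drawn steps: bars drawn to the height of each frequency, then a line connecting the midpoints of the bars.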
Figure 4.4 ⬢ A hand-drawn frequency polygon

How did we draw this? Here's how:
1. Place a midpoint at the top of each bar or column in a histogram (see Figure 4.2).
2. Connect the lines and you've got it—a frequency polygon!

Note that in Figure 4.4, the histogram on which the frequency polygon is based is drawn using vertical and horizontal lines, and the polygon is drawn using curved lines. That's because, although we want you to see what a frequency polygon is based on, you usually don't see the underlying histogram.

Why use a frequency polygon rather than a histogram to represent data? For two reasons. Visually, a frequency polygon appears more dynamic than a histogram (a line that represents change in frequency always looks neat). Also, the use of a continuous line suggests that the variable represented by the scores along the x-axis is also a theoretically continuous, interval-level measurement as we talked about in Chapter 2. (To purists, the fact that the bars touch each other in a histogram suggests the interval-level nature of the variable, as well.)

Cumulating Frequencies
Once you have created a frequency distribution and have visually represented those data using a histogram or a frequency polygon, another option is to create a visual representation of the cumulative frequency of occurrences by class intervals. This is called a cumulative frequency distribution. A cumulative frequency distribution is based on the same data as a frequency distribution but with an added column (Cumulative Frequency), as shown below.

Class Interval   Frequency   Cumulative Frequency
45–49            1           50
40–44            2           49
35–39            4           47
30–34            8           43
25–29            10          35
20–24            10          25
15–19            8           15
10–14            4           7
5–9              2           3
0–4              1           1

The cumulative frequency distribution begins with the creation of a new column labeled "Cumulative Frequency." Then, we add the frequency in a class interval to all the frequencies below it. For example, for the class interval of 0–4, there is 1 occurrence and none below it, so the cumulative frequency is 1. For the class interval of 5–9, there are 2 occurrences in that class interval and one below it for a total of 3 (2 + 1) occurrences. The last class interval (45–49) contains 1 occurrence, and there are now a total of 50 occurrences at or below that class interval. Once we create the cumulative frequency distribution, then we
  • 15. can plot it as a histogram or a frequency polygon. Only this time, we’ll skip right ahead and plot the midpoint of each class interval as a function of the cumulative frequency of that class interval. You can see the cumulative frequency distribution in Figure 4.5 based on these same 50 scores. Notice this frequency polygon is shaped a little like a letter S. If the scores in a data set are distributed the way scores typically are, cumulative frequencies will often graph this way. Figure 4.5 ⬢ A hand-drawn cumulative frequency distribution Another name for a cumulative frequency polygon is an ogive. And, if the distribution of the data is normal or bell shaped (see Chapter 8 for more on this), then the ogive represents what is popularly known as a bell curve or a normal distribution. SPSS creates a really nice ogive—it’s called a P-P plot (for probability plot) and is easy to create. See Appendix A for an introduction to creating graphs using SPSS, as well as the material toward the end of this chapter. OTHER COOL WAYS TO CHART DATA What we did so far in this chapter is take some data and show how charts such as histograms and polygons can be used to communicate visually. But several other types of charts are also used in the behavioral and social sciences, and although it’s not necessary for you to know exactly how to create them (manually), you should at least be familiar with their names and what they do. So here are some popular charts, what they do, and how they do it. There are several very good personal computer applications for creating charts, among them the spreadsheet Excel (a Microsoft product) and, of course, SPSS. The charts in the “Using the Computer to Illustrate Data” section were created using SPSS as well. Bar Charts A bar or column chart should be used when you want to compare the frequencies of different categories with one another. Categories are organized horizontally on the x-axis,
  • 16. and values are shown vertically on the y-axis. Here are some examples of when you might want to use a column chart: · Number of participants in different water exercise activities · The sales of three different types of products · Number of children in each of six different grades Figure 4.6 shows a graph of number of participants in different water activities. Figure 4.6 ⬢ A bar chart that compares different water activities Column Charts A column chart is identical to a bar chart, but in this chart, categories are organized on the y-axis (which is the vertical one), and values are shown on the x-axis (the horizontal one). Line Charts A line chart should be used when you want to show a trend in the data at equal intervals. This sort of graph is often used when the x-axis represents time. Here are some examples of when you might want to use a line chart: · Number of cases of mononucleosis (mono) per season among college students at three state universities · Toy sales for the T&K company over four quarters · Number of travelers on two different airlines for each quarter In Figure 4.7, you can see a chart of sales in units over four quarters. Figure 4.7 ⬢ Using a line chart to show a trend over time Pie Charts A pie chart should be used when you want to show the proportion or percentage of people or things in various categories. The rule is that the percentages in each “slice” must add up to 100%, to make a whole pie. Here are some examples of when you might want to use a pie chart: · Of children living in poverty, the percentage who represent various ethnicities · Of students enrolled, the proportion who are in night or day
  • 17. classes · Of participants, the percentage in various age groups Note that a pie chart describes a nominal-level variable (such as ethnicity, time of enrollment, and age groups). In Figure 4.8, you can see a pie chart of voter preference. And we did a few fancy-schmancy things, such as separating and labeling the slices. Figure 4.8 ⬢ A pie chart illustrating the relative proportion of one category to others USING THE COMPUTER (SPSS, THAT IS) TO ILLUSTRATE DATA Now let’s use SPSS and go through the steps in creating some of the charts that we explored in this chapter. First, here are some general SPSS charting guidelines. 1. Although there are a couple options, we will use the Chart Builder option on the Graphs menu. This is the easiest way to get started and well worth learning how to use. 2. In general, you click Graphs → Chart Builder, and you see a dialog box from which you will select the type of graph you want to create. 3. Click the type of graph you want to create and then select the specific design of that type of graph. 4. Drag the variable names to the axis where each belongs. 5. Click OK, and you’ll see your graph. Let’s practice. Creating a Histogram 1. Enter the data you want to use to create the graph. Use those 50 scores we’ve been using in this chapter or make some up just to practice with. 2. Click Graphs → Chart Builder and you will see the Chart Builder dialog box, as shown in Figure 4.9. If you see any other screen, click OK. 3. Click the Histogram option in the Choose from: list and
  • 18. double-click the first image. 4. Drag the variable you wish to graph to the “x-axis?” location in the preview window. 5. Click OK and you will see a histogram, as shown in Figure 4.10. Figure 4.9 ⬢ The Chart Builder dialog box The histogram in Figure 4.10 looks a bit different from the hand-drawn one representing the same data shown earlier in this chapter, in Figure 4.2. The difference is that SPSS defines class intervals using its own idiosyncratic method. SPSS took as the middle of a class interval the bottom number of the interval (such as 10) rather than the midpoint (such as 12.5). Consequently, scores are allocated to different groups. The lesson here? How you group data makes a big difference in the way they look in a histogram. And, once you get to know SPSS well, you can make all kinds of fine-tuned adjustments to make graphs appear exactly as you want them. Creating a Bar Graph To create a bar graph, follow these steps: 1. Enter the data you want to use to create the graph. We used the following data that show the number of people in a club who belong to each of three political parties. 1 = Democrat, 2 = Republican, 3 = Independent 1, 1, 2, 3, 2, 1, 1, 2, 1 2. Click Graphs → Chart Builder, and you will see the Chart Builder dialog box, as shown in Figure 4.11. If you see any other screen, click OK. 3. Click the Bar option in the Choose from: list and double- click the first image. 4. Drag the variable named Party to the x-axis? location in the preview window. 5. Drag the variable named Number to the Count axis.
6. Click OK and you will see the bar graph, as shown in Figure 4.12.

Figure 4.11 ⬢ The Chart Builder dialog box
Figure 4.12 ⬢ A bar graph created using the Chart Builder

Creating a Line Graph
To create a line graph, follow these steps:
1. Enter the data you want to use to create the graph. In this example, we will be using the percentage of the total student body who attended the first day of classes each year over the duration of a 10-year program. Here are the data. You can type them into SPSS exactly as shown here, with the top row being the names you will give the two variables (columns).

Year   Attendance
1      87
2      88
3      89
4      76
5      80
6      96
7      91
8      97
9      89
10     79

2. Click Graphs → Chart Builder and you will see the Chart Builder dialog box, as shown in Figure 4.11. If you see any other screen, click OK.
3. Click the Line option in the Choose from: list and double-click the first image.
4. Drag the variable named Year to the x-axis? location in the preview window.
5. Drag the variable named Attendance to the y-axis? location.
6. Click OK, and you will see the line graph, as shown in Figure 4.13. We used the SPSS Chart Editor to change the minimum and maximum values on the y-axis.

Figure 4.13 ⬢ A line graph created using the Chart Builder

Creating a Pie Chart
To create a pie chart, follow these steps:
1. Enter the data you want to use to create the chart. In this example, the pie chart represents the percentage of people buying different brands of doughnuts. Here are the data:

Brand     Percentage
Krispies  55
Dunks     35
Other     10

2. Click Graphs → Chart Builder, and you will see the Chart Builder dialog box, as shown in Figure 4.11. If you see any other screen, click OK.
3. Click the Pie/Polar option in the Choose from: list and double-click the only image.
  • 21. 4. Drag the variable named Brand to the Slice by? axis label. 5. Drag the variable named Percentage to the Angle Variable? axis label. 6. Click OK, and you will see the pie chart, as shown in Figure 4.14. Figure 4.14 ⬢ A pie chart created using the Chart Builder Real-World Stats Graphs work, and a picture really is worth more than a thousand words. In this article, an oldie but goodie, the researchers examined how people perceive and process statistical graphs. Stephen Lewandowsky and Ian Spence reviewed empirical studies designed to explore how suitable different types of graphs are and how what is known about human perception can have an impact on the design and utility of these charts. They focused on some of the theoretical explanations for why certain elements work and don’t, the use of pictorial symbols (like a happy face symbol, which could make up the bar in a bar chart), and multivariate displays, where more than one set of data needs to be represented. And, as is very often the case with any paper, they concluded that not enough data were available yet. Given the increasingly visual world in which we live (emojis, anyone? ☹ ☺), this is interesting and useful reading to gain a historical perspective on how information was (and still is) discussed as a scientific topic. Want to know more? Go online or to the library and find … Lewandowsky, S., & Spence, I. (1989). The perception of statistical graphs. Sociological Methods Research, 18, 200–242. Summary There’s no question that charts are fun to create and can add enormous understanding to what might otherwise appear to be disorganized data. Follow our suggestions in this chapter and use charts well but only when they enhance, not duplicate, what’s already there. Time to Practice
1. A data set of 50 comprehension scores (named Comprehension Score) called Chapter 4 Data Set 1 is available in Appendix C and on the website. Answer the following questions and/or complete the following tasks:
a. Create a frequency distribution and a histogram for the set.
b. Why did you select the class interval you used?
2. Here is a frequency distribution. Create a histogram by hand or by using SPSS.

Class Interval   Frequency
261–280          140
241–260          320
221–240          3,380
201–220          600
181–200          500
161–180          410
141–160          315
121–140          300
100–120          200

3. A third-grade teacher wants to improve her students' level of engagement during group discussions and instruction. She keeps track of each of the 15 third graders' number of responses every day for 1 week, and the data are available as Chapter 4 Data Set 2. Use SPSS to create a bar chart with one bar for each day (and warning—this may be a toughie).

Time to Practice Video · Chapter 4: Problem 3
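As a rough way to preview what the bar chart in question 3 should look like without SPSS, here is a hedged Python sketch. The file name, the one-column-per-weekday layout, and the choice to plot the per-day mean are all assumptions for illustration, not details taken from Chapter 4 Data Set 2 itself.

# Hypothetical layout: one row per student, one numeric column per weekday.
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file and column names; adjust to match the actual data set.
data = pd.read_csv("chapter4_data_set_2.csv")
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# One bar per day, showing the mean number of responses across the students.
daily_means = data[days].mean()
daily_means.plot(kind="bar", color="steelblue", edgecolor="black")

plt.ylabel("Mean number of responses")
plt.title("Mean responses per day (Chapter 4 Data Set 2)")
plt.tight_layout()
plt.show()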
Student Study Site
Get the tools you need to sharpen your study skills! Visit edge.sagepub.com/salkindfrey7e to access practice quizzes, eFlashcards, original and curated videos, data sets, and more!

5 COMPUTING CORRELATION COEFFICIENTS: ICE CREAM AND CRIME

5: MEDIA LIBRARY
Premium Videos
Core Concepts in Stats Video · Correlation
Lightboard Lecture Video · Partial Correlations
Time to Practice Video · Chapter 5: Problem 6
Difficulty Scale (moderately hard)

WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding what correlations are and how they work
· Computing a simple correlation coefficient
· Interpreting the value of the correlation coefficient
· Understanding what other types of correlations exist and when they should be used

WHAT ARE CORRELATIONS ALL ABOUT?
Measures of central tendency and measures of variability are not the only descriptive statistics that we are interested in using to get a picture of what a set of scores looks like. You have already learned that knowing the values of the one most representative score (central tendency) and a measure of spread or dispersion (variability) is critical for describing the characteristics of a distribution. However, sometimes we are as interested in the relationship between variables—or, to be more precise, how the value of one variable changes when the value of another variable changes. The way we express this interest is through the computation of a simple correlation coefficient. For example, what's the
  • 26. relationship between age and strength? Income and years of education? Memory skills and amount of drug use? Your political attitudes and the attitudes of your parents? A correlation coefficient is a numerical index that reflects the relationship or association between two variables. The value of this descriptive statistic ranges between −1.00 and +1.00. A correlation between two variables is sometimes referred to as a bivariate (for two variables) correlation. Even more specifically, the type of correlation that we will talk about in the majority of this chapter is called the Pearson product- moment correlation, named for its inventor, Karl Pearson. The Pearson correlation coefficient examines the relationship between two variables, but both of those variables are continuous in nature. In other words, they are variables that can assume any value along some underlying continuum; examples include height (you really can be 5 feet 6.1938574673 inches tall), age, test score, and income. Remember in Chapter 2, when we talked about levels of measurement? Interval and ratio levels of measurement are continuous. But a host of other variables are not continuous. They’re called discrete or categorical variables, and examples are race (such as black and white), social class (such as high and low), and political affiliation (such as Democrat and Republican). In Chapter 2, we called these types of variables nominal level. You need to use other correlational techniques, such as the phi correlation, in these cases. These topics are for a more advanced course, but you should know they are acceptable and very useful techniques. We mention them briefly later on in this chapter. Other types of correlation coefficients measure the relationship between more than two variables, and we’ll talk about one of these in some more advanced chapters later on (which you are looking forward to already, right?). Types of Correlation Coefficients: Flavor 1 and Flavor 2 A correlation reflects the dynamic quality of the relationship between variables. In doing so, it allows us to understand
whether variables tend to move in the same or opposite directions in relationship to each other. If variables change in the same direction, the correlation is called a direct correlation or a positive correlation. If variables change in opposite directions, the correlation is called an indirect correlation or a negative correlation. Table 5.1 shows a summary of these relationships.

Table 5.1 ⬢ Types of Correlations

What Happens to Variable X | What Happens to Variable Y | Type of Correlation | Value | Example
X increases in value. | Y increases in value. | Direct or positive | Positive, ranging from .00 to +1.00 | The more time you spend studying, the higher your test score will be.
X decreases in value. | Y decreases in value. | Direct or positive | Positive, ranging from .00 to +1.00 | The less money you put in the bank, the less interest you will earn.
X increases in value. | Y decreases in value. | Indirect or negative | Negative, ranging from −1.00 to .00 | The more you exercise, the less you will weigh.
X decreases in value. | Y increases in value. | Indirect or negative | Negative, ranging from −1.00 to .00 | The less time you take to complete a test, the more items you will get wrong.
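To make the direction idea concrete, here is a small, illustrative Python sketch with made-up toy numbers (they are not data from the book). One pair of variables moves together, the other moves in opposite directions, and the sign of r reflects that.

# Toy illustration of direct (positive) and indirect (negative) correlations.
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])
test_score = np.array([55, 62, 66, 71, 78, 84])          # rises as study time rises
minutes_exercised = np.array([10, 20, 30, 40, 50, 60])
weight = np.array([210, 205, 202, 198, 195, 191])         # falls as exercise rises

# np.corrcoef returns a 2 x 2 correlation matrix; the off-diagonal entry is r.
r_direct = np.corrcoef(hours_studied, test_score)[0, 1]
r_indirect = np.corrcoef(minutes_exercised, weight)[0, 1]

print(f"Direct (positive) correlation:   r = {r_direct:.2f}")    # close to +1.00
print(f"Indirect (negative) correlation: r = {r_indirect:.2f}")  # close to -1.00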
  • 28. Now, keep in mind that the examples in the table reflect generalities, for example, regarding time to complete a test and the number of items correct on that test. In general, the less time that is taken on a test, the lower the score. Such a conclusion is not rocket science, because the faster one goes, the more likely one is to make careless mistakes such as not reading instructions correctly. But, of course, some people can go very fast and do very well. And other people go very slowly and don’t do well at all. The point is that we are talking about the average performance of a group of people on two different variables. We are computing the correlation between the two variables for the group of people, not for any one particular person. There are several easy (but important) things to remember about the correlation coefficient: · A correlation can range in value from −1.00 to +1.00. · The absolute value of the coefficient reflects the strength of the correlation. So a correlation of −.70 is stronger than a correlation of +.50. One frequently made mistake regarding correlation coefficients occurs when students assume that a direct or positive correlation is always stronger (i.e., “better”) than an indirect or negative correlation because of the sign and nothing else. · To calculate a correlation, you need exactly two variables and at least two people. · Another easy mistake is to assign a value judgment to the sign of the correlation. Many students assume that a negative relationship is not good and a positive one is good. But think of the example from Table 5.1 where exercise and weight have a negative correlation. That negative correlation is a positive thing! That’s why, instead of using the terms negative and positive, you might prefer to use the terms indirect and direct to communicate meaning more clearly. · The Pearson product-moment correlation coefficient is represented by the small letter r with a subscript representing the variables that are being correlated. You’d think that P for
  • 29. Pearson might be used as the symbol for this correlation, but in Greek, the P letter actually is similar to the English “r” sound, so r is used. P is used for the theoretical correlation in a population, so don’t feel sorry for Pearson. (If it helps, think of r as standing for relationship.) For example, · rxy is the correlation between variable X and variable Y. · rweight-height is the correlation between weight and height. · rSAT.GPA is the correlation between SAT score and grade point average (GPA). The correlation coefficient reflects the amount of variability that is shared between two variables and what they have in common. For example, you can expect an individual’s height to be correlated with an individual’s weight because these two variables share many of the same characteristics, such as the individual’s nutritional and medical history, general health, and genetics, and, of course, taller people have more mass usually. On the other hand, if one variable does not change in value and therefore has nothing to share, then the correlation between it and another variable is zero. For example, if you computed the correlation between age and number of years of school completed, and everyone was 25 years old, there would be no correlation between the two variables because there is literally no information (no variability) in age available to share. Likewise, if you constrain or restrict the range of one variable, the correlation between that variable and another variable will be less than if the range is not constrained. For example, if you correlate reading comprehension and grades in school for very high-achieving children, you’ll find the correlation to be lower than if you computed the same correlation for children in general. That’s because the reading comprehension score of very high-achieving students is quite high and much less variable than it would be for all children. The moral? When you are interested in the relationship between two variables, try to collect sufficiently diverse data—that way, you’ll get the truest representative result. And how do you do that? Measure a variable as precisely as possible (use higher, more informative
  • 30. levels of measurement) and use a sample that varies greatly on the characteristics you are interested in.

COMPUTING A SIMPLE CORRELATION COEFFICIENT

The computational formula for the simple Pearson product-moment correlation coefficient between a variable labeled X and a variable labeled Y is shown in Formula 5.1:

(5.1) r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}

where
· rxy is the correlation coefficient between X and Y;
· n is the size of the sample;
· X is each individual's score on the X variable;
· Y is each individual's score on the Y variable;
· XY is the product of each X score times its corresponding Y score;
· X² is each individual's X score, squared; and
· Y² is each individual's Y score, squared.

Here are the data we will use in this example:

X | Y | X² | Y² | XY
2 | 3 | 4 | 9 | 6
4 | … | … | … | …
… | … | … | … | …
5 | 4 | 25 | 16 | 20
6 | 4 | 36 | 16 | 24
7 | 5 | 49 | 25 | 35
Total, Sum, or ∑: 54 | 43 | 320 | 201 | 247

Before we plug the numbers in, let's make sure you understand what each one represents:
· ∑X, or the sum of all the X values, is 54.
· ∑Y, or the sum of all the Y values, is 43.
· ∑X², or the sum of each X value squared, is 320.
· ∑Y², or the sum of each Y value squared, is 201.
· ∑XY, or the sum of the products of X and Y, is 247.

It's easy to confuse the sum of a set of values squared and the sum of the squared values. The sum of a set of values squared is taking values such as 2 and 3, summing them (to be 5), and then squaring that (which is 25). The sum of the squared values is taking values such as 2 and 3, squaring them (to get 4 and 9, respectively), and then adding those together (to get 13). Just look for the parentheses as you work.
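If code helps you keep those parentheses straight, here is a two-line Python check (our own illustration, using the same toy values 2 and 3 from the paragraph above):

```python
values = [2, 3]

square_of_sum = sum(values) ** 2              # (2 + 3)**2 = 25, the (∑X)² version
sum_of_squares = sum(v ** 2 for v in values)  # 2**2 + 3**2 = 13, the ∑X² version

print(square_of_sum, sum_of_squares)          # 25 13 -- not the same thing
```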
  • 33. Here are the steps in computing the correlation coefficient:
1. List the two values for each participant. You should do this in a column format so as not to get confused. Use graph paper if working manually or SPSS or some other data analysis tool if working digitally.
2. Compute the sum of all the X values and compute the sum of all the Y values.
3. Square each of the X values and square each of the Y values.
4. Find the sum of the XY products.
These values are plugged into the equation you see in Formula 5.2:

(5.2) r_{xy} = \frac{(10 \times 247) - (54 \times 43)}{\sqrt{\left[(10 \times 320) - 54^{2}\right]\left[(10 \times 201) - 43^{2}\right]}}

Ta-da! And you can see the answer in Formula 5.3:

(5.3) r_{xy} = \frac{148}{213.83} = .692

What's really interesting about correlations is that they measure the amount of distance that one variable covaries in relation to another. So, if both variables are highly variable (have lots of wide-ranging values), the correlation between them is more likely to be high than if not. Now, that's not to say that lots of variability guarantees a higher correlation, because the scores have to vary in a systematic way. But if the variance is constrained in one variable, then no matter how much the other variable changes, the correlation will be lower. For example, let's say you are examining the correlation between academic achievement in high school and first-year grades in college and you look at only the top 10% of the class. Well, that top 10% is likely to have very similar grades, introducing no variability and no room for the one variable to vary as a function of the other. Guess what you get when you correlate one variable with another variable that does not change (that is, has no variability)? rxy = 0, that's what. The lesson here? Variability works, and you should not artificially limit it.
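And if you would rather let a few lines of code do the arithmetic in Formula 5.2, here is a short Python sketch of the same computational formula. It is our own illustration rather than anything built into SPSS, and it assumes you have already tallied the five sums for your data:

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r via the computational formula shown in Formula 5.1."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# The five sums from the worked example in the text (n = 10).
r = pearson_from_sums(n=10, sum_x=54, sum_y=43, sum_x2=320, sum_y2=201, sum_xy=247)
print(round(r, 3))  # 0.692, the same answer as Formula 5.3
```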
  • 34. The Scatterplot: A Visual Picture of a Correlation There’s a very simple way to visually represent a correlation: Create what is called a scatterplot, or scattergram (in SPSS lingo it’s a scatter/dot graph). This is simply a plot of each set of scores on separate axes. Here are the steps to complete a scattergram like the one you see in Figure 5.1, which plots the 10 sets of scores for which we computed the sample correlation earlier. Figure 5.1 ⬢ A simple scattergram 1. Draw the x-axis and the y-axis. Usually, the X variable goes on the horizontal axis and the Y variable goes on the vertical axis. 2. Mark both axes with the range of values that you know to be the case for the data. For example, the value of the X variable in our example ranges from 2 to 8, so we marked the x-axis from 0 to 9. There’s no harm in marking the axes a bit low or high— just as long as you allow room for the values to appear. The value of the Y variable ranges from 2 to 6, and we marked that axis from 0 to 9. Having similarly labeled (and scaled) axes can sometimes make the finished scatterplot easier to understand. 3. Finally, for each pair of scores (such as 2 and 3, as shown in Figure 5.1), we entered a dot on the chart by marking the place where 2 falls on the x-axis and 3 falls on the y-axis. The dot represents a data point, which is the intersection of the two values. When all the data points are plotted, what does such an illustration tell us about the relationship between the variables? To begin with, the general shape of the collection of data points indicates whether the correlation is direct (positive) or indirect (negative). A positive slope occurs when the data points group themselves in a cluster from the lower left-hand corner on the x- and y-axes
  • 35. through the upper right-hand corner. A negative slope occurs when the data points group themselves in a cluster from the upper left-hand corner on the x- and y-axes through the lower right-hand corner. Here are some scatterplots showing very different correlations where you can see how the grouping of the data points reflects the sign and strength of the correlation coefficient. Figure 5.2 shows a perfect direct correlation, where rxy = 1.00 and all the data points are aligned along a straight line with a positive slope. Figure 5.2 ⬢ A perfect direct, or positive, correlation If the correlation were perfectly indirect, the value of the correlation coefficient would be −1.00, and the data points would align themselves in a straight line as well but from the upper left-hand corner of the chart to the lower right. In other words, the line that connects the data points would have a negative slope. And, remember, in both examples, the strength of the association is the same; it is only the direction that is different. Don’t ever expect to find a perfect correlation between any two variables in the behavioral or social sciences. Such a correlation would say that two variables are so perfectly related, they share everything in common. In other words, knowing one is exactly like knowing the other. Just think about your classmates. Do you think they all share any one thing in common that is perfectly related to another of their characteristics across all those different people? Probably not. In fact, r values approaching .7 and .8 are just about the highest you’ll see. In Figure 5.3, you can see the scatterplot for a strong (but not perfect) direct relationship where rxy = .70. Notice that the data points align themselves along a positive slope, although not perfectly. Now, we’ll show you a strong indirect, or negative, relationship in Figure 5.4, where rxy = −.82. Notice that the data points align themselves on a negative slope from the upper left-hand
  • 36. corner of the chart to the lower right-hand corner. That's what different types of correlations look like, and you can really tell the general strength and direction by examining the way the points are grouped. Figure 5.3 ⬢ A strong, but not perfect, direct relationship Not all correlations are reflected by a straight line showing the X and the Y values in a relationship called a linear correlation (see Chapter 16 for tons of fun stuff about this). The relationship may not be linear and may not be reflected by a straight line. Let's take the correlation between age and memory. For the early years, the correlation is probably highly positive—the older children get, the better their memory. Then, into young and middle adulthood, there isn't much of a change or much of a correlation, because most young and middle adults maintain a good (but not necessarily increasingly better) memory. But with old age, memory begins to suffer, and there is an indirect relationship between memory and aging in the later years. If you take these together and look at the relationship over the life span, you find that the correlation between memory and age tends to look something like a curve where age continues to grow at the same rate but memory increases at first, levels off, and then decreases. It's a curvilinear relationship, and sometimes, the best description of a relationship is that it is curvilinear.

CORE CONCEPTS IN STATS VIDEO: Correlation

People who perform well in high school tend to do well in college-- right? Say you wanted to measure the relationship between students' GPA in high school and their GPA in college-- you would calculate a Pearson correlation between the two variables. To see the relationship between two variables, we draw a picture of the data using a scatter plot. We plot each person's high school GPA on the x-axis and their college GPA on the y-axis. Notice how the dots have a pattern.
  • 37. Next, we compute a line called a regression line that runs through the center of the dots. This line is computed using the average. We use a formula to measure how much each GPA varies independently and how much the two GPAs vary together. The variance is the average amount that each student's high school GPA differs from the mean of all of the high school GPAs. Of course, each student's college GPA also varies from the mean of the college GPAs. But high school GPA and college GPA also vary together. People who do well in high school tend to do well in college. Varying together is called covariance. The Pearson correlation is calculated by dividing the covariance of the two variables by the variance of the two variables. Sometimes the dots are spread out from the line-- that means there is a lot of independent variance. The relationship between the two variables is weaker. Sometimes the dots are close to the line-- the relationship is stronger. Sometimes the line looks like this-- as one goes up, the other goes up. Sometimes the line is like this-- as one goes up, the other goes down. College GPA is negatively related to the amount of time spent partying. We use correlation to understand the strength and the direction of the relationship between the variables.

The Correlation Matrix: Bunches of Correlations

What happens if you have more than two variables and you want to see correlations among all pairs of variables? How are the correlations illustrated? Use a correlation matrix like the one shown in Table 5.2—a simple and elegant solution. As you can see in these made-up data, there are four variables in the matrix: level of income (Income), level of education (Education), attitude toward voting (Attitude), and how sure people are that they will vote (Vote).

Table 5.2 ⬢ Correlation Matrix

          | Income | Education | Attitude | Vote
Income    | 1.00   | .574      | −.08     | −.291
Education | .574   | 1.00      | −.149    | −.199
Attitude  | −.08   | −.149     | 1.00     | −.169
Vote      | −.291  | −.199     | −.169    | 1.00

For each pair of variables, there is a correlation coefficient. For example, the correlation between income level and education is .574. Similarly, the correlation between income level and how sure people are that they will vote in the next election is −.291 (meaning that the higher the level of income, the less confident people were that they would vote). In such a matrix with four variables, there are really only six correlation coefficients. Because variables correlate perfectly with themselves (those are the 1.00s down the diagonal), and because the correlation between Income and Vote is the same as the correlation between Vote and Income, the matrix creates a mirror image of itself. You can use SPSS—or almost any other statistical analysis package, such as Excel—to easily create a matrix like the one you saw earlier. In applications like Excel, you can use the Data
  • 39. Analysis ToolPak. You will see such matrices (the plural of matrix) when you read journal articles that use correlations to describe the relationships among several variables.

Understanding What the Correlation Coefficient Means

Well, we have this numerical index of the relationship between two variables, and we know that the higher the value of the correlation (regardless of its sign), the stronger the relationship is. But how can we interpret it and make it a more meaningful indicator of a relationship? Here are different ways to look at the interpretation of that simple rxy.

Using-Your-Thumb (or Eyeball) Method

Perhaps the easiest (but not the most informative) way to interpret the value of a correlation coefficient is by eyeballing it and using the information in Table 5.3. This is based on customary interpretations of the size of a correlation in the behavioral sciences. So, if the correlation between two variables is .3, you could safely conclude that the relationship is a moderate one—not strong but certainly not weak enough to say that the variables in question don't share anything in common.

Table 5.3 ⬢ Interpreting a Correlation Coefficient

Size of the Correlation Coefficient | General Interpretation
.5 to 1.0 | Strong relationship
.4 | Moderate to strong relationship
.3 | Moderate relationship
.2 | Weak to moderate relationship
0 to .1 | Weak or no relationship

This eyeball method is perfectly acceptable for a quick
  • 40. assessment of the strength of the relationship between variables, such as when you briefly evaluate data presented visually. But because this rule of thumb depends on a subjective judgment (of what’s “strong” or “weak”), we would like a more precise method. That’s what we’ll look at now. Special Effects! Correlation Coefficient Throughout the book, we will learn about various effect sizes and how to interpret them. An effect size is an index of the strength of the relationship among variables, and with most statistical procedures we learn about, there will be an associated effect size that should be reported and interpreted. The correlation coefficient is a perfect example of an effect size as it quite literally is a measure of the strength of a relationship. Thanks to Table 5.3, we already know how to interpret it. SQUARING THE CORRELATION COEFFICIENT: A DETERMINED EFFORT Here’s the much more precise way to interpret the correlation coefficient: computing the coefficient of determination. The coefficient of determination is the percentage of variance in one variable that is accounted for by the variance in the other variable. Quite a mouthful, huh? Earlier in this chapter, we pointed out how variables that share something in common tend to be correlated with one another. If we correlated math and language arts grades for 100 fifth-grade students, we would find the correlation to be moderately strong, because many of the reasons why children do well (or poorly) in math tend to be the same reasons why they do well (or poorly) in language arts. The number of hours they study, how bright they are, how interested their parents are in their schoolwork, the number of books they have at home, and more are all related to both math and language arts performance and account for differences between children (and that’s where the variability comes in). The more these two variables share in common, the more they will be related. These two variables share variability—or the reason why children differ from one another. And on the whole,
  • 41. the brighter child who studies more will do better. To determine exactly how much of the variance in one variable can be accounted for by the variance in another variable, the coefficient of determination is computed by squaring the correlation coefficient. For example, if the correlation between GPA and number of hours of study time is .70 (or rGPA.time = .70), then the coefficient of determination, represented by r²GPA.time, is .70², or .49. This means that 49% of the variance in GPA "can be explained by" or "is shared by" the variance in studying time. And the stronger the correlation, the more variance can be explained (which only makes good sense). The more two variables share in common (such as good study habits, knowledge of what's expected in class, and lack of fatigue), the more information about performance on one score can be explained by the other score. However, if 49% of the variance can be explained, this means that 51% cannot—so even for a very strong correlation of .70, many of the reasons why scores on these variables tend to be different from one another go unexplained. This amount of unexplained variance is called the coefficient of alienation (also called the coefficient of nondetermination). Don't worry. No aliens here. This isn't X-Files or Walking Dead stuff—it's just the amount of variance in Y not explained by X (and, of course, vice versa since the relationship goes both ways). How about a visual presentation of this sharing variance idea? Okay. In Figure 5.5, you'll find a correlation coefficient, the corresponding coefficient of determination, and a diagram that represents how much variance is shared between the two variables. The larger the shaded area in each diagram (and the more variance the two variables share), the more highly the variables are correlated.
· The first diagram in Figure 5.5 shows two circles that do not touch. They don't touch because they do not share anything in common. The correlation is zero.
· The second diagram shows two circles that overlap. With a
  • 42. correlation of .5 (and r²xy = .25), they share about 25% of the variance between them.
· Finally, the third diagram shows two circles placed almost on top of each other. With an almost perfect correlation of rxy = .90 (r²xy = .81), they share about 81% of the variance between them.

Figure 5.5 ⬢ How variables share variance and the resulting correlation

As More Ice Cream Is Eaten … the Crime Rate Goes Up (or Association vs. Causality)

Now here's the really important thing to be careful about when computing, reading about, or interpreting correlation coefficients. Imagine this. In a small midwestern town, a phenomenon occurred that defied any logic. The local police chief observed that as ice cream consumption increased, crime rates tended to increase as well. Quite simply, if you measured both, you would find the relationship was direct, meaning that as people eat more ice cream, the crime rate increases. And as you might expect, as they eat less ice cream, the crime rate goes down. The police chief was baffled until he recalled the Stats 1 class he took in college and still fondly remembered. (He probably also pulled out his copy of this book that he still owned. In fact, it was likely one of three copies he had purchased to make sure he always had one handy.) He wondered how this could be turned into an aha! "Very easily," he thought. The two variables must share something or have something in common with one another. Remember that it must be something that relates to both level of ice cream consumption and level of crime rate. Can you guess what that is? The outside temperature is what they both have in common. When it gets warm outside, such as in the summertime, more crimes are committed (it stays light longer, people leave the windows open, bad guys and girls are out more, etc.). And because it is warmer, people enjoy the ancient treat and art of eating ice cream. Conversely, during the long and dark winter
  • 43. months, less ice cream is consumed and fewer crimes are committed as well. Joe, though, recently elected as a city commissioner, learns about these findings and has a great idea, or at least one that he thinks his constituents will love. (Keep in mind, he skipped the statistics offering in college.) Why not just limit the consumption of ice cream in the summer months to reduce the crime rate? Sounds good, right? Well, on closer inspection, it really makes no sense at all. That's because of the simple principle that correlations express the association that exists between two or more variables; they have nothing to do with causality. In other words, just because level of ice cream consumption and crime rate increase together (and decrease together as well) does not mean that a change in one results in a change in the other. For example, if we took all the ice cream out of all the stores in town and no more was available, do you think the crime rate would decrease? Of course not, and it's preposterous to think so. But strangely enough, that's often how associations are interpreted—as being causal in nature—and complex issues in the social and behavioral sciences are reduced to trivialities because of this misunderstanding. Did long hair and hippiedom have anything to do with the Vietnam conflict? Of course not. Does the rise in the number of crimes committed have anything to do with more efficient and safer cars? Of course not. But they all happen at the same time, creating the illusion of being associated.

People Who Loved Statistics

Katharine Coman (1857–1915) was such a kind and caring researcher that a famous book of poetry and prose was written about her after her death from cancer at the age of 57. Her love for statistics was demonstrated in her belief that the study of economics could solve social problems, and she urged her college, Wellesley, to let her teach economics and statistics. She may have been the first woman statistics professor. Coman was a
  • 44. prominent social activist in her life and in her writings, and she frequently cited industrial and economic statistics to support her positions, especially as they related to the labor movement and the role of African American workers. The artistic biography written about Professor Coman was Yellow Clover (1922), a tribute to her by her longtime companion (and coauthor of the song “America the Beautiful”), Katherine Lee Bates. Using SPSS to Compute a Correlation Coefficient Let’s use SPSS to compute a correlation coefficient. The data set we are using is an SPSS data file named Chapter 5 Data Set 1. There are two variables in this data set: Variable Definition Income Annual income in dollars Education Level of education measured in years To compute the Pearson correlation coefficient, follow these steps: 1. Open the file named Chapter 5 Data Set 1. 2. Click Analyze → Correlate → Bivariate, and you will see the Bivariate Correlations dialog box, as shown in Figure 5.6. 3. Double-click on the variable named Income to move it to the Variables: box. 4. Double-click on the variable named Education to move it to the Variables: box. You can also hold down the Ctrl key to select more than one variable at a time and then use the “move” arrow in the center of the dialog box to move them both. 5. Click OK. Figure 5.6 ⬢ The Bivariate Correlations dialog box Understanding the SPSS Output The output in Figure 5.7 shows the correlation coefficient to be equal to .574. Also shown are the sample size, 20, and a
  • 45. measure of the statistical significance of the correlation coefficient (we'll cover the topic of statistical significance in Chapter 9).

Figure 5.7 ⬢ SPSS output for the computation of the correlation coefficient

The SPSS output shows that the two variables are related to one another and that as level of income increases, so does level of education. Similarly, as level of income decreases, so does level of education. The fact that the correlation is significant means that this relationship is not due to chance. As for the meaningfulness of the relationship, the coefficient of determination is .574², or .329 or .33, meaning that 33% of the variance in one variable is accounted for by the other. According to our eyeball strategy, an r of .574 counts as a strong relationship, even though a third of shared variance still leaves plenty unexplained. Once again, remember that low levels of income do not cause low levels of education, nor does not finishing high school mean that someone is destined to a life of low income. That's causality, not association, and correlations speak only to association.

Creating a Scatterplot (or Scattergram or Whatever)

You can draw a scatterplot by hand, but it's good to know how to have SPSS do it for you as well. Let's take the same data that we just used to produce the correlation matrix in Figure 5.7 and use it to create a scatterplot. Be sure that the data set named Chapter 5 Data Set 1 is on your screen.
1. Click Graphs → Chart Builder → Scatter/Dot, and you will see the Chart Builder dialog box shown in Figure 5.8.
2. Double-click on the first Scatter/Dot example.
3. Highlight and drag the variable named Income to the y-axis.
4. Highlight and drag the variable named Education to the x-axis.
5. Click OK, and you'll have a very nice, simple, and easy-to-understand scatterplot like the one you see in Figure 5.9.
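If you happen to be working in Python instead of SPSS, matplotlib will draw the same kind of scatterplot in a few lines. The education and income values below are made up just for this sketch; they are not Chapter 5 Data Set 1, whose raw values are not listed here.

```python
import matplotlib.pyplot as plt

# Hypothetical education/income pairs, invented for this illustration only.
education_years = [10, 12, 12, 14, 14, 16, 16, 18, 18, 20]
income_dollars = [28000, 31000, 35000, 38000, 42000, 45000, 51000, 55000, 60000, 66000]

plt.scatter(education_years, income_dollars)  # one dot per (education, income) pair
plt.xlabel("Education (years)")
plt.ylabel("Income (dollars)")
plt.title("Income by Level of Education")
plt.show()
```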
  • 46. Figure 5.8 ⬢ The Chart Builder dialog box

OTHER COOL CORRELATIONS

There are different ways in which variables can be assessed. For example, nominal-level variables are categorical in nature; examples are race (e.g., black or white) and political affiliation (e.g., Independent or Republican). Or, if you are measuring income and age, you are measuring interval-level variables, because the underlying continuum on which they are based has equally appearing intervals. As you continue your studies, you're likely to come across correlations between data that occur at different levels of measurement. And to compute these correlations, you need some specialized techniques. Table 5.4 summarizes what these different techniques are and how they differ from one another.

Table 5.4 ⬢ Correlation Coefficient Shopping, Anyone? Level of Measurement and Examples

Variable X | Variable Y | Type of Correlation | Correlation Being Computed
Nominal (voting preference, such as Republican or Democrat) | Nominal (biological sex, such as male or female) | Phi coefficient | The correlation between voting preference and sex
Nominal (social class, such as high, medium, or low) | Ordinal (rank in high school graduating class) | Rank biserial coefficient | The correlation between social class and rank in high school
Nominal (family configuration, such as two-parent or single-parent) | Interval (grade point average) | Point biserial | The correlation between family configuration and grade point average
Ordinal (height converted to rank) | Ordinal (weight converted to rank) | Spearman rank coefficient | The correlation between height and weight
Interval (number of problems solved) | Interval (age in years) | Pearson correlation coefficient | The correlation between number of problems solved and age in years

PARTING WAYS: A BIT ABOUT PARTIAL CORRELATION

Okay, now you have the basics about simple correlation, but there are many other correlational techniques that are specialized tools to use when exploring relationships between variables. A common "extra" tool is called partial correlation, where the relationship between two variables is explored, but the impact of a third variable is removed from the relationship between the two. Sometimes that third variable is called a mediating or a confounding variable. For example, let's say that we are exploring the relationship between level of depression and incidence of chronic disease and we find that, on the whole, the relationship is positive. In other words, the more chronic disease is evident, the higher the likelihood that depression is present as well (and of course vice versa). Now remember, the relationship might not be causal, one variable might not "cause" the other, and the presence of one does not mean that the other will be present as well. The positive correlation is just an assessment of an association between these two variables, the key idea being that they share some variance in common. And that's exactly the point—it's the other variables they share in common that we want to control and, in some cases, remove from the relationship so we can focus on the key relationship we are interested in. For example, how about level of family support? Nutritional habits? Severity or length of illness? These and many more
  • 48. variables can all explain the relationship between these two variables, or they may at least account for some of the variance. And think back a bit. That's exactly the same argument we made when focusing on the relationship between the consumption of ice cream and the level of crime. Once outside temperature (the mediating or confounding variable) is removed from the equation … boom! The relationship between the consumption of ice cream and the crime level plummets. Let's take a look. Here are some data on the consumption of ice cream and the crime rate for 10 cities.

                          | Consumption of Ice Cream | Crime Rate
Consumption of ice cream  | 1.00                     | .743
Crime rate                |                          | 1.00

So, the correlation between these two variables, consumption of ice cream and crime rate, is .743. This is a pretty healthy relationship, accounting for about 55% of the variance between the two variables (.743² = .55, or 55%). Now, we'll add a third variable, average outside temperature. Here are the Pearson correlation coefficients for the set of three variables.

                            | Consumption of Ice Cream | Crime Rate | Average Outside Temperature
Consumption of ice cream    | 1.00                     | .743       | .704
Crime rate                  |                          | 1.00       | .655
Average outside temperature |                          |            | 1.00

As you can see by these values, there's a fairly strong relationship between ice cream consumption and outside temperature and between crime rate and outside temperature. We're interested in the question, "What's the correlation between ice cream consumption and crime rate with the effects of outside temperature removed or partialed out?" That's what partial correlation does. It looks at the relationship between two variables (in this case, consumption of ice cream and crime rate) as it removes the influence of a third (in this case, outside temperature). A third variable that explains the relationship between two variables can be a mediating variable or a confounding variable. Those are different types of variables with different definitions, though, and are easy to confuse. In our example with correlations, a confounding variable is something like temperature that affects both our variables of interest and explains the correlation between them. A mediating variable is a variable that comes between our two variables of interest and explains the apparent relationship. For example, if A is correlated with B and B is correlated with C, A and C would seem to be related but only because they are both related to B. B is a mediating variable. Perhaps A affects B and B affects C, so A and C are correlated.

LIGHTBOARD LECTURE VIDEO: Partial Correlations

When we talk about correlations between two variables, we almost always think about it by drawing these two overlapping circles. And we have variable A and variable B. And they somehow are correlated. They overlap in some way. And they
  • 50. share this information. Sometimes, though, there can be another variable. Let's call it-- let's see, my computer brain suggests C. That correlates with both of those. And when these variables all correlate together, there's all these overlapping areas. So A, B, and C all measure the same thing to some degree. A and C measure this same thing to some degree. And B and C measure this same thing to some degree. Sometimes we want to know, though, what would the relationship between A and B be if we controlled for C? Would the correlation go up? Would it go down? And you can even see visually, some of this correlation between A and B is this part right here that is actually part of C as well. So statistically, we call this correlation after controlling for another a partial correlation. And I'm going to show you what a partial correlation looks like. If we control for a variable, it means we statistically remove it. It's as if it's not there. It's as if everyone got the same score on it. And it creates now, between A and B, this new relationship. And you can look-- literally, it is different. It's a different type of correlation. It probably would go down, mathematically. The other interesting thing-- when you remove a relationship because of a partial correlation-- is that the variables themselves are now a different shape, because a little bit of A used to be C. And a little bit of B used to be C. So when you control for a third variable, you end up with a different relationship and different variables. One way to think about controlling for another variable is, instead of thinking about variables, think about friends. We all have different friends. Sometimes you're friends with someone because you share some other friend. So imagine instead, you've got three friends here-- 1, 2, and 3. You often see 1 and 2 together. But it's because they're with number 3. They're with friend 3. What would happen if we removed 3, since 1 and 2 are never together unless 3's there? Let's remove all the times where 3 is also there. Well, 1 and 2 spend a little bit of time together. But they don't associate as closely. They're not really close friends with each other. They just seem to be
  • 51. friends because they both hang out with number 3. And number 3 has the nice house and has all the fun parties. But if it's just 1 and 2, they barely like each other.

Using SPSS to Compute Partial Correlations

Let's use some data and SPSS to illustrate the computation of a partial correlation. Here are the raw data.

City | Ice Cream Consumption | Crime Rate | Average Outside Temperature
1 | 3.4 | 62 | 88
2 | 5.4 | 98 | 89
3 | 6.7 | 76 | 65
4 | 2.3 | 45 | 44
5 | 5.3 | 94 | 89
6 | 4.4 | 88 | 62
7 | 5.1 | 90 | 91
8 | 2.1 | 68 | 33
9 | 3.2 | 76 | 46
10 | 2.2 | 35 | 41

1. Enter the data we are using into SPSS.
2. Click Analyze → Correlate → Partial and you will see the Partial Correlations dialog box, as shown in Figure 5.10.
3. Move Ice_Cream and Crime_Rate to the Variables: box by dragging them or double-clicking on each one.
4. Move the variable named Outside_Temp to the Controlling for: box.
5. Click OK and you will see the SPSS output as shown in Figure 5.11.

Figure 5.10 ⬢ The Partial Correlations dialog box

Understanding the SPSS Output

As you can see in Figure 5.11, the correlation between ice cream consumption (Ice_Cream) and crime rate (Crime_Rate) with the influence or moderation of outside temperature (Outside_Temp) removed is .525. This is less than the simple Pearson correlation between ice cream consumption and crime rate (which is .743), which does not consider the influence of outside temperature. What seemed to explain 55% of the variance (and was what we call "significant at the .05 level"), with the removal of Outside_Temp as a moderating variable, now explains .525² = .28, or 28% of the variance (and the relationship is no longer significant).
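If you would like to check SPSS's arithmetic yourself, here is a short Python sketch (our own, not part of the text's SPSS steps) that starts from the raw data above, computes the three pairwise Pearson correlations with NumPy, and then applies the standard first-order partial correlation formula. It should print a value of about .525, matching Figure 5.11 up to rounding.

```python
import numpy as np

ice_cream = np.array([3.4, 5.4, 6.7, 2.3, 5.3, 4.4, 5.1, 2.1, 3.2, 2.2])
crime = np.array([62, 98, 76, 45, 94, 88, 90, 68, 76, 35])
temp = np.array([88, 89, 65, 44, 89, 62, 91, 33, 46, 41])

# Pairwise Pearson correlations (np.corrcoef returns a matrix; [0, 1] is r).
r_ic = np.corrcoef(ice_cream, crime)[0, 1]  # roughly .74
r_it = np.corrcoef(ice_cream, temp)[0, 1]   # roughly .70
r_ct = np.corrcoef(crime, temp)[0, 1]       # roughly .66

# First-order partial correlation: ice cream and crime, controlling for temperature.
r_partial = (r_ic - r_it * r_ct) / np.sqrt((1 - r_it ** 2) * (1 - r_ct ** 2))
print(round(float(r_partial), 3))  # about 0.525
```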
  • 53. now explains .5252 = 0.28 = 28% of the variance (and the relationship is no longer significant). Figure 5.11 ⬢ The completed partial correlation analysis Our conclusion? Outside temperature accounted for enough of the shared variance between the consumption of ice cream and the crime rate for us to conclude that the two-variable relationship was significant. But, with the removal of the moderating or confounding variable outside temperature, the relationship was no longer significant. And we don’t need to stop selling ice cream to try to reduce crime. Real-World Stats This is a fun one and consistent with the increasing interest in using statistics in various sports in various ways, a discipline informally named sabermetrics. The term was coined by Bill James (and his approach is represented in the movie and book Moneyball). Stephen Hall and his colleagues examined the link between teams’ payrolls and the competitiveness of those teams (for both professional baseball and soccer), and he was one of the first to look at this from an empirical perspective. In other words, until these data were published, most people made decisions based on anecdotal evidence rather than quantitative assessments. Hall looked at data on team payrolls in American Major League Baseball and English soccer between 1980 and 2000, and he used a model that allows for the establishment of causality (and not just association) by looking at the time sequence of events to examine the link. In baseball, payroll and performance both increased significantly in the 1990s, but there was no evidence that causality runs in the direction from payroll to performance. In comparison, for English soccer, the researchers did show that higher payrolls actually were at least one cause of better performance. Pretty cool, isn’t it, how association can be explored to make real-world decisions? Want to know more? Go online or to the library and find …
  • 54. Hall, S., Szymanski, S., & Zimbalist, A. S. (2002). Testing causality between team performance and payroll: The cases of Major League Baseball and English soccer. Journal of Sports Economics, 3, 149–168.

Summary

The idea of showing how things are related to one another and what they have in common is a very powerful one, and the correlation coefficient is a very useful descriptive statistic (one used in inference as well, as we will show you later). Keep in mind that correlations express a relationship that is associative but not necessarily causal, and you'll be able to understand how this statistic gives us valuable information about relationships between variables and how variables change or remain the same in concert with others. Now it's time to change speeds just a bit and wrap up Part II with a focus on reliability and validity. You need to know about these ideas because you'll be learning how to determine what differences in outcomes, such as scores and other variables, represent.

Time to Practice

1. Use these data to answer Questions 1a and 1b. These data are saved as Chapter 5 Data Set 2.
a. Compute the Pearson product-moment correlation coefficient by hand and show all your work.
b. Construct a scatterplot for these 10 pairs of values by hand. Based on the scatterplot, would you predict the correlation to be direct or indirect? Why?

Number Correct (out of a possible 20) | Attitude (out of a possible 100)
17 | 94
13 | 73
12 | 59
15 | 80
16 | 93
14 | 85
16 | 66
16 | 79
18 | 77
19 | 91

2. Use these data to answer Questions 2a and 2b. These data are saved as Chapter 5 Data Set 3.

Speed (to complete a 50-yard swim) | Strength (number of pounds bench-pressed)
21.6 | 135
23.4 | 213
26.5 | 243
25.5 | 167
20.8 | 120
19.5 | 134
20.9 | 209
18.7 | 176
29.8 | 156
28.7 | 177
  • 56. a. Using either a calculator or a computer, compute the Pearson correlation coefficient.
b. Interpret these data using the general range of very weak to very strong. Also compute the coefficient of determination. How does the subjective analysis compare with the value of r²?

3. Rank the following correlation coefficients on strength of their relationship (list the weakest first).
a. .71
b. +.36
c. −.45
d. .47
e. −.62

4. For the following set of scores, calculate the Pearson correlation coefficient and interpret the outcome. These data are saved as Chapter 5 Data Set 4.

Achievement Increase Over 12 Months | Classroom Budget Increase Over 12 Months
0.07 | 0.11
0.03 | 0.14
0.05 | 0.13
0.07 | 0.26
0.02 | 0.08
0.01 | 0.03
0.05 | 0.06
0.04 | 0.12
0.04 | 0.11

5. For the following set of data, by hand, correlate minutes of exercise with grade point average (GPA). What do you conclude given your analysis? These data are saved as Chapter 5 Data Set 5.

Exercise | GPA
25 | 3.6
30 | 4.0
20 | 3.8
60 | 3.0
45 | 3.7
90 | 3.9
60 | 3.5
0 | 2.8
15 | 3.0
10 | 2.5

6. Use SPSS to determine the correlation between hours of studying and GPA for these honor students. Why is the correlation so low?

Hours of Studying | GPA
23 | 3.95
12 | 3.90
15 | 4.00
14 | 3.76
16 | 3.97
21 | 3.89
14 | 3.66
11 | 3.91
18 | 3.80
9 | 3.89

Time to Practice Video
Chapter 5: Problem 6

Chapter 5, Problem 6 asks you to compute a correlation. They want to assess how honor students' GPAs are correlated with their hours of studying, and then to answer the question of why that correlation is so low. So you see listed here are individual students' GPAs and the average number of hours that they study. We want to set up our SPSS data file with this information. When we look here, you'll see the two variables. And under Variable View, make sure that they're both set up as scale, since they're both measured on a continuous range; a GPA and hours of studying can each take any value along that range. To do this, we're going to go under Analyze and then Correlate, Bivariate. We have two variables, so this would be a bivariate correlation. This is straightforward here. We're going to take both of them and move them to our variables. You notice down here it's saying we have the Pearson, which is for bivariate continuous data. We're going to look for a two-tailed significance. That is, how are they related to each other? We are asking it to flag our significant correlations. You always want to look under Options, but it defaults to what we want. We could say we want to show the means and standard deviations,
  • 59. we don't need to do that for this one. So instead, let's hit OK. And here is our information. When we look at our significance value, we need it to be lower than our cutoff for the correlation to count as significant. And so, the question is, why is this correlation so low? Well, if we look back at our data itself, it's going to give us an understanding. These are really high GPAs. When there's very little variability, you're typically not going to get a strong correlation because there's not much change. Even though there seems to be a range of how they studied, in terms of hours, it didn't have a big effect on their GPA because they're all really high. And this is how you answer Chapter 5, Problem 6.

7. The coefficient of determination between two variables is .64. Answer the following questions:
a. What is the Pearson correlation coefficient?
b. How strong is the relationship?
c. How much of the variance in the relationship between these two variables is unaccounted for?

8. Here is a set of three variables for each of 20 participants in a study on recovery from a head injury. Create a simple matrix that shows the correlations between each variable. You can do this by hand (and plan on being here for a while) or use SPSS or any other application. These data are saved as Chapter 5 Data Set 6.

Age at Injury | Level of Treatment | 12-Month Treatment Score
25 | 1 | 78
16 | 2 | 66
8 | 2 | 78
… | … | …
23 | 1 | 92
31 | 2 | 97
53 | 2 | 69
11 | 3 | 79
33 | 2 | 69

9. Look at Table 5.4. What type of correlation coefficient would you use to examine the relationship between biological sex (defined in this study as having only two categories: male or female) and political affiliation? How about family configuration (two-parent or single-parent) and high school GPA? Explain why you selected the answers you did.
10. When two variables are correlated (such as strength and running speed), they are associated with one another. Explain how, even if there is a correlation between the two, one might not cause the other.
11. Provide three examples of an association between two variables where a causal relationship makes perfect sense conceptually.
12. Why can't correlations be used as a tool to prove a causal relationship between variables rather than just an association?
13. When would you use partial correlation?

Student Study Site
Get the tools you need to sharpen your study skills! Visit edge.sagepub.com/salkindfrey7e to access practice quizzes, eFlashcards, original and curated videos, data sets, and
  • 62. more!

6 AN INTRODUCTION TO UNDERSTANDING RELIABILITY AND VALIDITY: JUST THE TRUTH

6: MEDIA LIBRARY
Premium Videos
Lightboard Lecture Video
· Reliability
· Validity
Time to Practice Video
· Chapter 6: Problem 5

Difficulty Scale (not so hard)

WHAT YOU WILL LEARN IN THIS CHAPTER
· Defining reliability and validity and understanding why they are important
· This is a stats class! What's up with this measurement stuff?
· Understanding how the quality of tests is evaluated
· Computing and interpreting various types of reliability coefficients
· Computing and interpreting various types of validity coefficients

AN INTRODUCTION TO RELIABILITY AND VALIDITY

Ask any parent, teacher, pediatrician, or almost anyone in your neighborhood what the five top concerns are about today's children, and there is sure to be a group who identifies obesity as one of those concerns. Sandy Slater and her colleagues developed and tested the reliability and validity of a self-reported questionnaire on home, school, and neighborhood physical activity environments for youth located in low-income urban minority neighborhoods and rural areas. In particular, the researchers looked at such variables as information on the presence of electronic and play equipment in youth participants' bedrooms and homes and outdoor play equipment at schools. They also looked at what people close to the children thought about being active. A total of 205 parent–child pairs completed a 160-item take-home survey on two different occasions, a perfect model for establishing test–retest reliability. The researchers found that the measure had good reliability and validity. The researchers hoped that this survey could be used to
  • 63. help identify opportunities and develop strategies to encourage underserved youth to be more physically active. Want to know more? Go online or to the library and find … Slater, S., Full, K., Fitzgibbon, M., & Uskali, A. (2015, June 4). Test–retest reliability and validity results of the Youth Physical Activity Supports Questionnaire. SAGE Open, 5(2). doi:10.1177/2158244015586809 What’s Up With This Measurement Stuff? An excellent question and one that you should be asking. After all, you enrolled in a stats class, and up to now, that’s been the focus of the material that has been covered. Now it looks like you’re faced with a topic that belongs in a tests and measurements class. So, what’s this material doing in a stats book? Well, much of what we have covered so far in Statistics for People Who (Think They) Hate Statistics has to do with the collection and description of data. Now we are about to begin the journey toward analyzing and interpreting data. But before we begin learning those skills, we want to make sure that the data are what you think they are—that the data represent what it is you want to know about. In other words, if you’re studying poverty, you want to make sure that the measure you use to assess poverty works and that it works time after time. Or, if you are studying aggression in middle-aged males, you want to make sure that whatever tool you use to assess aggression works and that it works time after time. More really good news: Should you continue in your education and want to take a class on tests and measurements, this introductory chapter will give you a real jump on understanding the scope of the area and what topics you’ll be studying. And to make sure that the entire process of collecting data and making sense out of them works, you first have to make sure that what you use to collect data works as well. The fundamental questions that are answered in this chapter are “How do I know that the test, scale, instrument, and so on. I use
  • 64. produces scores that aren’t random but actually represent an individual’s typical performance?” (that’s reliability) and “How do I know that the test, scale, instrument, and so on. I use measures what it is supposed to?” (that’s validity). Anyone who does research will tell you about the importance of establishing the reliability and validity of your measurement tool, whether it’s a simple observational instrument of consumer behavior or one that measures a complex psychological construct such as depression. However, there’s another very good reason. If the tools that you use to collect data are unreliable or invalid, then the results of any test or any hypothesis, and the conclusions you may reach based on those results, are necessarily inconclusive. If you are not sure that the test does what it is supposed to and that it does so consistently without randomness in its scores, how do you know that the nonsignificant results you got aren’t a function of the lousy test tools rather than an actual reflection of reality? Want a clean test of your hypothesis? Make reliability and validity an important part of your research. You may have noticed a new term at the beginning of this chapter—dependent variable. In an experiment, this is the outcome variable, or what the researcher looks at to see whether any change has occurred as a function of the treatment that has taken place. And guess what? The treatment has a name as well—the independent variable. For example, if a researcher examined the effect of different reading programs on comprehension, the independent variable would be the reading program, and the dependent or outcome variable would be reading comprehension score. The term dependent variable is used for the outcome variable because the hypothesis suggests that it depends on, or is affected by, the independent variable. Although these terms will not be used often throughout the remainder of this book, you should have some familiarity with them.RELIABILITY: DOING IT AGAIN UNTIL YOU GET IT RIGHT Reliability is pretty easy to understand and figure out. It’s
  • 65. simply whether a test, or whatever you use as a measurement tool, measures something consistently. If you administer a test of personality type before a special treatment occurs, will the administration of that same test 4 months later be reliable? That, my friend, is one type of reliability—the degree to which scores are consistent for one person measured twice. There are other types of reliability, each of which we will get to after we define reliability just a bit more. Test Scores: Truth or Dare? When you take a test in this class, you get a score, such as 89 (good for you) or 65 (back to the books!). That test score consists of several elements, including the observed score (or what you actually get on the test, such as 89 or 65) and a true score (the typical score you would get if you took the same test an infinite number of times). We can’t directly measure true score (because we don’t have the time or energy to give someone the same test an infinite number of times), but we can estimate it. Why aren’t true scores and observed scores the same? Well, they can be if the test (and the accompanying observed score) is a perfect (and we mean absolutely perfect) reflection of what’s being measured. But the bread sometimes falls on the buttered side, and Murphy’s law tells us that the world is not perfect. So, what you see as an observed score may come close to the true score, but rarely are they the same. Rather, the difference as you see is the amount of error that is introduced. Notice that reliability is not the same as validity; it does not reflect whether you are measuring what you want to. Here’s why. True score has nothing to do with whether the construct of interest is really being reflected. Rather, true score is the mean score an individual would get if he or she took a test an infinite number of times, and it represents the theoretical typical level of performance on a given test. Now, one would hope that the typical level of performance would reflect the
  • 66. construct of interest, but that’s another question (a question of validity). The distinction here is that a test is reliable if it consistently produces whatever score a person would get on average, regardless of what the test is measuring. In fact, a perfectly reliable test might not produce a score that has anything to do with the construct of interest, such as “what you really know.”LIGHTBOARD LECTURE VIDEOReliability So one of the qualities of a good test is that it's reliable. And a lot of times when you learn stuff, you hear about validity and reliability. And you think it's all the same thing. Well, validity is whether the score matches the thing that supposed to be measured. We're not talking about that. Reliability is different. Reliability refers to whether the score you just got on a test is the typical score you would have gotten on that test, not whether it matches the level of the trait you're trying to measure. That's validity. But if you took the test again tomorrow, would you get the same score if you took it yesterday? If you'd had a better breakfast, would your score be any different? That's what reliability is all about. And we can think about reliability as the purpose of a test is for you to get the score that represents your typical level of performance. If you took a test an infinite number of times and you averaged all those scores-- you alone-- that average is the typical level of performance you would get. So every time someone takes a test, when you take a test, you get some score on it. And if this middle bullseye is your typical level of performance, the test score you would get if you took the test infinite number of times and averaged it, that's the score you typically get. And if you get that score, then it's a reliable test. And if that's true for everybody, it's a reliable test. But there's some randomness. There's always randomness. That's the problem with social sciences. There's all this randomness in human behavior. And so the score you get won't be your exact typical level of performance. It will not be right here on the bullseye. It might be out here somewhere. It might be further out here. I mean, the further away it is from your