3. Zoom out: where are we?
We have:
A research question
An idea for a research design
A hypothesis
But how do we measure what we’re interested in?
4. Scales of Measurement
We study variables and need to measure them accurately
4 scales of measurement
Nominal
Ordinal
Interval
Ratio
5. Nominal Scale
Symbols classify or categorize into GROUPS or TYPES
Name, Categorize, Classify
Caution: numbers may be used to label groups, but they carry no quantitative meaning
Examples: gender, marital status, experimental condition
6. Ordinal Scale
A rank-order scale of measurement
Examples: order of finish, letter grade in a class, social class (low, medium, high)
Allows you to determine which person is higher or lower, but not how much higher or lower
Can’t make direct comparisons of the distances between ranks
7. Interval Scale
Rank ordering PLUS equal intervals of distance between adjacent numbers
Examples: Celsius and Fahrenheit temperature, IQ scores, year
Now you can make direct comparisons of differences between values
Equal distances, but no absolute zero point
8. Ratio Scale
Rank ordering and equal intervals PLUS an absolute zero point
Absolute zero = absence of the variable
Examples: Kelvin temperature, income, weight, height, response time
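To make the interval-versus-ratio distinction concrete, here is a minimal sketch with made-up temperature readings: ratios of Celsius values are not meaningful because 0 °C is not the absence of heat, while Kelvin has an absolute zero, so its ratios are.

def c_to_k(celsius):
    # Celsius (interval scale) -> Kelvin (ratio scale, absolute zero)
    return celsius + 273.15

t1, t2 = 10.0, 20.0
print(t2 / t1)                  # 2.0, but 20 C is NOT "twice as hot" as 10 C
print(c_to_k(t2) / c_to_k(t1))  # ~1.04: a meaningful ratio on the Kelvin scale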
9. Psychometric Properties
Reliability: Consistency/stability of scores
Validity: Are you measuring what you are trying to measure?
Ideally, we want:
Measures that are reliable
Inferences that are valid
Reliability is necessary, but not sufficient, for validity
14. Measuring Reliability
4 Primary types
Test-Retest Reliability
Equivalent-Forms Reliability
Internal Consistency Reliability
Interrater Reliability
Indicate the level of reliability with a reliability coefficient
A correlation; it should be positive and strong (> .70)
15. Test-Retest
Refers to consistency over time
Same measure administered twice (with a time interval between)
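A minimal sketch of how a test-retest reliability coefficient might be computed, using hypothetical scores: correlate the two administrations and check that r is positive and strong (> .70).

import numpy as np

# Hypothetical scores: the same 8 people measured twice, some time apart.
time1 = np.array([12, 18, 25, 30, 22, 15, 28, 20])
time2 = np.array([14, 17, 27, 29, 21, 16, 26, 22])

# The test-retest reliability coefficient is the correlation between them.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability: r = {r:.2f}")  # want positive and > .70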
16. Equivalent-Forms Reliability
Equivalent forms: two versions of the same measure
Administer both forms to the same group of people
Problem: hard to develop truly equivalent forms
Examples: SAT, GRE
17. Internal Consistency
Consistency with which test items measure a single construct
More items increase reliability, but we use as few items as possible
Why?
18. Example: Internal Consistency
I feel sad
I feel down
I feel depressed
I feel miserable
I feel awful
19. Example: Internal Consistency
I feel hungry
I feel happy
I have green eyes
Big Bird is scary
I like turtles
http://www.youtube.com/watch?v=CMNry4PE93Y
20. Internal Consistency
Measured using coefficient alpha (α)
a.k.a. Cronbach’s alpha
Should be .7 or higher
High values mean the items are measuring the same construct
If your scale measures more than one thing, each construct gets its own coefficient α
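A minimal sketch, using hypothetical item responses, of how coefficient alpha can be computed from the standard formula: α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores).

import numpy as np

# Hypothetical responses: rows = respondents, columns = 5 mood items
# (e.g., "I feel sad", "I feel down", ...), each rated 1-5.
scores = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [1, 2, 1, 2, 1],
    [3, 3, 4, 3, 3],
])

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' totals
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"coefficient alpha = {alpha:.2f}")    # should be .70 or higher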
21. Interrater Reliability
Interrater reliability: consistency of ratings made by different judges
GRE writing section
Expressive writing studies
Correlation between ratings should be strong/positive
22. Interobserver Agreement
Percentage of times different observers agree
Easy to calculate and understand
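Because it is just a percentage, interobserver agreement takes only a line or two to compute; a minimal sketch with hypothetical ratings:

# Hypothetical ratings of 8 observations by two observers.
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
rater_b = ["yes", "no", "yes", "no",  "no", "yes", "no", "yes"]

# Interobserver agreement = % of observations on which the raters agree.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
print(f"agreement: {100 * agreements / len(rater_a):.0f}%")  # 7 of 8 = 88%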
23. Validity
Accuracy of inferences or interpretations made on the basis of scores
Measuring schizophrenia, or love
We can’t directly observe it!
Validity is the accuracy of the interpretation drawn from the test
24. Validity
Construct: the abstract concept you want to measure
Operationalization: the concrete procedure you use to measure it
Important to consider:
Does your operationalization truly reflect what you’re measuring?
Validation
Never-ending process
25. Obtaining Validity: Based on Content
Content validity: judgment of the degree to which items adequately represent a construct’s domain
Do items appear to represent the thing you’re trying to measure? (face validity)
Does your measure exclude any important parts of what you’re trying to measure?
Does your test measure something besides what you wanted? (i.e., include irrelevant items)
26. Obtaining Validity: Based on Internal Structure
Some constructs are multidimensional and need measures that address all dimensions
Homogeneity: degree to which a set of items measures a single construct
Item-to-total correlation
Coefficient alpha
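A minimal sketch (hypothetical data) of item-to-total correlations as a homogeneity check; here each item is correlated with the total of the remaining items (the "corrected" variant), so an item’s own variance doesn’t inflate the result.

import numpy as np

# Hypothetical responses: rows = respondents, columns = items.
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

# Correlate each item with the total of the REMAINING items.
for i in range(scores.shape[1]):
    rest_total = np.delete(scores, i, axis=1).sum(axis=1)
    r = np.corrcoef(scores[:, i], rest_total)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")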
27. Obtaining Validity: Based on Relations to Other Variables
Criterion-related validity: degree to which scores predict or relate to an established criterion measure
Two types of criterion validity:
Predictive: using your measure to predict future performance
Concurrent: using your measure to predict current performance on the same construct, or a related one
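A minimal sketch (made-up admissions data) of how predictive validity might be assessed: correlate scores on your measure with a criterion collected later.

import numpy as np

# Made-up data: an admissions test taken now, and GPA one year later.
test_scores = np.array([520, 640, 580, 700, 610, 550])
later_gpa   = np.array([2.8, 3.5, 3.0, 3.8, 3.2, 2.9])

# Predictive validity: how well does the measure predict the future criterion?
r = np.corrcoef(test_scores, later_gpa)[0, 1]
print(f"predictive validity: r = {r:.2f}")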
28. Obtaining Validity: Based on Relations to Other Variables
Convergent validity: relationship between your measure and other measures of the same construct
Discriminant validity: evidence that scores from your measure are NOT similar to scores from measures of different constructs
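A minimal sketch (hypothetical scores) of both ideas at once: a new scale should correlate strongly with an established measure of the same construct and only weakly with a measure of an unrelated construct.

import numpy as np

# Hypothetical scores on a new depression scale, an established depression
# measure (same construct), and a vocabulary test (different construct).
new_scale   = np.array([10, 25, 18, 30, 14, 22, 8, 27])
established = np.array([12, 23, 20, 28, 15, 21, 9, 26])
vocabulary  = np.array([45, 38, 50, 42, 36, 48, 44, 40])

# Convergent r should be strong; discriminant r should be near zero.
print(f"convergent:   r = {np.corrcoef(new_scale, established)[0, 1]:.2f}")
print(f"discriminant: r = {np.corrcoef(new_scale, vocabulary)[0, 1]:.2f}")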
29. Appropriate Use of Reliability and Validity Info
Reliability and validity info apply to the measure of interest in the reported sample
Situation-specific, not broad
Standardized tests are normed on a particular group (the norming group)
If you want to use a test with a group not represented in the norming group, be cautious
Report R & V for your own sample, and be wary of articles that make blanket statements about a measure’s R & V