D8 and d9 personality test development 10 2007-posting

Personality Test Development
Introduction to Clinical Psychology
Discussion Section #8 and #9

Personality Test Construction
Goal:
• Gain an increased understanding of the concepts of reliability and validity as they pertain to tests
• Gain an increased understanding of test development methods

Test Construction Procedure
1. Identify a need for a new test
2. Assemble an item pool (decide on scale and item formats)
3. Pilot item pool
4. Select “good” items
5. Examine test’s psychometric properties (reliability and validity)

1. Identify Need for a New Test
• What is the objective of the new test? Is there really a need for it?
• How will the test be administered?
• What is the ideal item format for this test?
• Should more than one form be developed?
• What special training will be required of test users in terms of administering or
2. Assemble Item Pool
Two decisions:
• Content
• Format

Content
• Develop a pool of items that fully measure the construct
• Example: Depression
• What items should be included in the pool?

Format
• Dichotomous (true/false)
• Polychotomous (multiple choice)
• Likert scales (degree of agreement)
• …many others

3. Pilot Item Pool
• Try the pool of items out on people for whom the test is being developed
• The test should be administered under conditions similar to those under which the finished test will be administered (e.g., same instructions, time frame, time limits)

4. Select “Good” Items
Selecting “good” items involves complex statistical analysis of the test results, called item analysis, which varies according to the purpose of the test.
However, in tests of attitudes or personality characteristics, one consideration is whether individuals endorse the full range of the scale provided.
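The "full range of the scale" check can be sketched in a few lines. This is only an illustration: the item names, the pilot responses, and the helper functions are made up here, and a real item analysis would involve much more than this.

```python
from collections import Counter

def endorsement_counts(responses, scale_points):
    """Count how often each scale point was chosen for one item."""
    counts = Counter(responses)
    return {point: counts.get(point, 0) for point in scale_points}

def unused_points(responses, scale_points):
    """List the scale points that no respondent chose."""
    chosen = set(responses)
    return [point for point in scale_points if point not in chosen]

# Hypothetical pilot data: answers to two items on a 1-5 Likert scale.
scale = [1, 2, 3, 4, 5]
pilot = {
    "item_a": [1, 2, 3, 4, 5, 3, 2, 4],  # answers spread over the whole scale
    "item_b": [4, 5, 5, 4, 5, 5, 4, 5],  # answers bunched at the top
}

for name, responses in pilot.items():
    print(name, endorsement_counts(responses, scale),
          "unused:", unused_points(responses, scale))
```

An item like the hypothetical item_b, where part of the scale is never used, would be a candidate for rewording or removal.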

5. Examine Test’s Psychometric Properties
• Does the test yield consistent results (reliability)?
• Do the test items measure the intended construct (validity)?

Test Construction Exercise: Part 1
• Develop a test that distinguishes first-born from later-born children

Test Construction Exercise: Procedure
Divide into groups of 4 to 5 students
In Class
• As a group, develop an item to distinguish first-born from later-born children
Note: use a personality construct and not a physical characteristic (e.g., “I have no older siblings”)
• Develop two responses for the item
• Once your item is ready, tell Sara or Eunyoe so they can write it on the board (so others won’t give the same item)

Administer Test
Item | % First Born Agree | % Later Born Agree
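One way to fill in a row of this table and compare the two groups is sketched below. The answers are invented for illustration, and the "gap" between the two percentages is just a simple assumed index of how well an item separates the groups, not a procedure prescribed in class.

```python
def percent_agree(answers):
    """Return the percentage of respondents who agreed (True) with an item."""
    return 100.0 * sum(answers) / len(answers)

def agreement_gap(first_born, later_born):
    """Difference in percent agreement between the two groups."""
    return percent_agree(first_born) - percent_agree(later_born)

# Hypothetical answers to a single item (True = agree, False = disagree).
first_born_answers = [True, True, True, False]
later_born_answers = [True, False, False, False]

gap = agreement_gap(first_born_answers, later_born_answers)
print(f"First born agree: {percent_agree(first_born_answers):.0f}%")
print(f"Later born agree: {percent_agree(later_born_answers):.0f}%")
print(f"Gap: {gap:.0f} percentage points")
```

An item with a gap near zero does little to tell the groups apart; a large gap in either direction suggests the item is worth keeping.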

Administer Final Test and Score!

Psychometric Properties of Tests

Reliability and Validity

Reliability
• Consistency of the observations or measurements
• Reliability is inversely related to the degree of error in the instrument.
• High measurement error translates to low reliability
• Low measurement error translates to high reliability
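One standard way to write this relationship down comes from classical test theory (it is not taken from these slides). It assumes an observed score X is a true score T plus random error E, with T and E uncorrelated, so that the observed variance is the sum of the true-score and error variances:

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

As the error variance grows relative to the observed variance, reliability falls toward 0; as it shrinks, reliability approaches 1.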

What!? What does this mean!?
• High measurement error translates to low reliability
• Low measurement error translates to high reliability

Easy Example: A broken scale
• There will be high measurement error on a broken scale, correct?
• How consistent are the weights likely to be on a broken scale?
• Is a broken or working scale going to have more error?
• Is the broken or working scale going to be more reliable?

Types of Measurement Error
Random
Factors unpredictably influence measurements.
Examples: Mood, environmental distractions, hunger, or motivation interfere with the responses.

Systematic
A persistent bias in the test or in the interpretations made by the examiner.
Systematic errors, because they are consistently made, will not affect reliability, but they will affect validity.
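The point about systematic error can be illustrated with a scale that always reads 5 units too heavy. The weights and the size of the bias are made up, and the Pearson correlation between two weighings is used here as an assumed stand-in for "consistency".

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical true weights of five people.
true_weights = [50.0, 60.0, 70.0, 80.0, 90.0]

# A scale with a purely systematic error: every reading is 5 units too high.
bias = 5.0
first_weighing = [w + bias for w in true_weights]
second_weighing = [w + bias for w in true_weights]

# The two weighings agree perfectly, so consistency is unaffected...
consistency = pearson_r(first_weighing, second_weighing)

# ...but every reading is wrong by the same amount.
average_error = sum(first_weighing) / len(first_weighing) - sum(true_weights) / len(true_weights)

print("consistency between weighings:", round(consistency, 3))
print("average error of the readings:", average_error)
```

The readings are perfectly repeatable yet consistently wrong, which is the sense in which systematic error leaves reliability alone but undermines validity.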

Types of Reliability
• Inter-rater reliability (relevant to observational systems and psychological assessments requiring ratings or judgment)
• Test-retest reliability
• Split-half reliability
Note: Not every form of reliability is equally important for every assessment method

Inter-rater Reliability
• Degree of correspondence between two raters
• Inter-rater reliability of diagnoses based on DSM criteria improved with DSM-III and the development of operational criteria for most of the mental disorders

Note: We will learn how to calculate this next week!
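As a preview, here is a rough sketch of two common agreement indices: simple percent agreement and Cohen's kappa, which adjusts for agreement expected by chance. This is not necessarily the method that will be taught in section, and the diagnoses below are invented.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical diagnoses for ten cases ("D" = depressed, "N" = not depressed).
rater_a = ["D", "D", "D", "D", "N", "N", "N", "N", "N", "N"]
rater_b = ["D", "D", "D", "N", "N", "N", "N", "N", "N", "D"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print("percent agreement:", agreement)
print("Cohen's kappa:", round(kappa, 3))
```

Kappa comes out lower than raw agreement because some of the matches would be expected even if both raters were guessing at their own base rates.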

Test-Retest Reliability
• The consistency of results over periods of time.
• The consistency of the results for a test given at two different time periods
• The correlation between the test scores from the two administrations

Quantifying Test-Retest Reliability
• Reliability is expressed as a correlation coefficient
• Values range from 0 (not at all consistent or reliable) to 1 (perfectly consistent and reliable).
• The value for adequate reliability is about .80 or greater
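A minimal sketch of the calculation, assuming the correlation coefficient in question is the Pearson correlation between scores from the two administrations. The scores are made up, and the .80 cutoff is the rule of thumb stated above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people at two testing sessions.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]

ADEQUATE_RELIABILITY = 0.80  # rule of thumb from the slide

test_retest_r = pearson_r(time_1, time_2)
is_adequate = test_retest_r >= ADEQUATE_RELIABILITY
print("test-retest r:", round(test_retest_r, 2), "adequate:", is_adequate)
```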

Factors Affecting Test-Retest Reliability Estimates
• Length of the intervening interval
• Stability of the measured trait

For example:
In characteristics that are stable, like intelligence, the interval of time between the two tests should not affect the stability of the results.
In contrast, in characteristics that are not stable, like depressed mood, the longer the interval between tests, the less reliable or consistent the scores (which is not necessarily bad).

Split-Half Reliability
• The consistency of scores on two halves of the test
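A sketch of one way to compute this: split the items into odd and even positions, total each half for every respondent, and correlate the two sets of totals. The odd/even split and the Spearman-Brown step-up at the end are common conventions added here, not something stated on the slide, and the responses are invented.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_r(item_scores):
    """Correlate each respondent's total on odd-position vs even-position items."""
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return pearson_r(first_half, second_half)

def spearman_brown(half_r):
    """Estimate full-length reliability from the correlation between halves."""
    return 2 * half_r / (1 + half_r)

# Hypothetical data: five respondents, six items scored 1-5.
responses = [
    [1, 2, 1, 1, 2, 1],
    [2, 2, 3, 2, 2, 3],
    [3, 4, 3, 3, 4, 3],
    [4, 4, 5, 4, 5, 4],
    [5, 5, 4, 5, 5, 5],
]

half_r = split_half_r(responses)
full_length_estimate = spearman_brown(half_r)
print("split-half r:", round(half_r, 2))
print("Spearman-Brown estimate:", round(full_length_estimate, 2))
```

The step-up is used because each half is only half as long as the real test, and shorter tests tend to be less reliable.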

Validity
A test can be reliable (consistently give the same results) but not valid.
Why?
If the test does not measure the correct construct, then it is not useful even if the results are consistent.

Validity
• The degree to which a test measures what it is designed or intended to measure.

Types of Validity
• Face validity
• Content validity
• Criterion validity (predictive and concurrent)
• Discriminant validity
• Construct validity

Face Validity
A judgment about the relevance of test items
• A type of validity that is more from the perspective of the test taker as opposed to the test user
Example: Personality tests
An Introversion-Extroversion test will be perceived as a highly (face) valid measure of personality functioning
The inkblot test may not be perceived as a (face) valid measure of personality functioning


Content Validity
• Degree to which the measure covers the full range of the (personality) construct
and
• Degree to which the measure excludes factors that are not representative of the construct

Criterion Validity
• The degree to which the test results (from your measure) are correlated with another related construct.
• WHAT!?
For example: the degree to which scores on an intelligence test are correlated with school performance or achievement.

Types of Criterion Validity
• Concurrent: the two constructs are assessed at the same time
• Predictive: one construct may be measured at a later date

For example:
Concurrent: the correlation of SAT score with G.P.A. at the time of taking the SAT in high school.
Predictive: the correlation of SAT score taken in high school with final G.P.A. upon graduating from college.

Discriminant Validity
• The degree to which the score on a measure of a personality trait does not correlate with scores on measures of traits that are unrelated to the trait under investigation.

For example: (from text)
Trait being measured: phobia
Unrelated trait: intelligence
You would not expect the score on your phobia scale to be correlated with the score on an intelligence test

Construct Validity
• The degree to which the measure reflects the structure and features of the hypothetical construct that is being measured
• Measured by combining all these aspects of validity.

Exercise: Reliability and Validity applied to the Edinburgh Postnatal Depression Scale (EPDS)
• Let’s consider reliability and validity in the context of a real measure: the EPDS

What is the Edinburgh Postnatal Depression Scale (EPDS)?
• Developed by John Cox, Jenifer Holden & Ruth Sagovsky
• 10 item depression screening tool (reliable and valid)
• Simple to complete
• Acceptable to mothers and health workers

What is the Edinburgh Postnatal Depression Scale (EPDS)?
Psychometric Characteristics
• 10 item scale
• Assesses mood aspects of depression, not confounding somatic symptoms
• Acceptable to women
• Validated
• Translated into many languages

Stems of all 10 EPDS Items
• I have been able to laugh and see the funny side of things.
• I have looked forward with enjoyment to things.
• I have blamed myself unnecessarily when things went wrong.
• I have been anxious or worried for no good reason.
• Things have been getting on top of me.

Stems of all 10 EPDS Items (cont.)
• I have felt scared or panicky for no very good reason.
• I have been so unhappy that I have had difficulty sleeping.
• I have felt sad or miserable.
• I have been so unhappy that I have been crying.
• The thought of harming myself has occurred to me.

Psychometric Evaluation of the EPDS: An Exercise
• Is the EPDS a good measure of depression?
• Psychometrically, what does it mean to ask if the EPDS is a “good” measure of depression?
Note: Follow the questions on the handout

Test Construction Exercise: Part 2: Evaluating Developed Tests
1. Regroup into your “test groups”
2. Evaluate items in terms of content validity and adequacy of scales
3. Select final items for test
4. Propose methods for evaluating reliability and validity of new measure
