Development of health measurement scales - part 1


Dr. Rizwan S A, M.D.

OUTLINE OF PRESENTATION
• Introduction
-Basic concepts

• Devising the items
• Scaling responses
• Selecting the items
• Biases in responding
• From items to scale
• Article

Some terms
• Scale
• Subscales

• Items
• Responses

History of scales
• Initially we used mortality and morbidity indicators
• As mortality and morbidity rates fell, these indicators were no longer
representative or sensitive, creating a need for new indices of
health, happiness, quality of life (QOL) and sadness
• WWII provided the impetus
• Scaling techniques were developed by psychologists
• Sampling techniques were developed by political
scientists
• Developments in data analysis

Psychophysics and psychometrics
• Power law: humans can make consistent numerical estimates of the intensity of
sensory stimuli
• This evidence was extrapolated to the idea that people can also make subjective
judgments about health in a consistent manner
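The power law referred to here is usually written as Stevens' power law, psi = k * phi^n, where phi is the physical intensity of a stimulus, psi its judged magnitude, and n an exponent that depends on the sense being tested. A minimal sketch, using invented magnitude estimates: because log(psi) = log(k) + n * log(phi), k and n can be recovered with an ordinary least-squares line on the log-log scale.

```python
import math

def fit_power_law(stimuli, estimates):
    """Least-squares fit of log(estimate) = log(k) + n * log(stimulus).

    Returns (k, n) for the power function estimate = k * stimulus ** n.
    """
    xs = [math.log(s) for s in stimuli]
    ys = [math.log(e) for e in estimates]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    n = sxy / sxx                        # slope on the log-log scale = exponent
    k = math.exp(mean_y - n * mean_x)    # intercept back-transformed = constant
    return k, n

# Hypothetical magnitude estimates that follow estimate = 2 * stimulus ** 0.5
stimuli = [1, 4, 9, 16, 25]
estimates = [2 * s ** 0.5 for s in stimuli]
k, n = fit_power_law(stimuli, estimates)
print(round(k, 3), round(n, 3))  # 2.0 0.5
```

With real judgments the points scatter around the line, and the consistency of the fitted exponent across observers is what the psychophysical evidence refers to.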

[Figure: 32 degrees Celsius compared with a depression score of 32]

Basic Steps in Scale development
1. Searching the literature
• Awareness of existing scales for the same purpose

2. Critical review
• Reliability
• Validity

Basic Steps
Reliability
-Reliability refers to the degree to which the results obtained
by a measurement procedure can be replicated.
Assessing Reliability
• Internal Consistency
• The average correlation among all the items in the measure.
• It is calculated using Cronbach's alpha, Kuder-Richardson (KR-20) or split-half methods

• Stability
• Reproducibility of a measure on different occasions.
• Inter-Observer reliability
• Intra-Observer reliability
• Test-Retest reliability
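A minimal sketch of internal consistency, assuming a complete subjects-by-items table of numeric scores with no missing data (the data are hypothetical). It uses the usual formula alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score); when every item is scored 0/1, the same formula with population variances gives KR-20.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one list of item scores per subject (all the same length)."""
    k = len(responses[0])
    items = list(zip(*responses))              # transpose: one tuple per item
    totals = [sum(row) for row in responses]   # each subject's total score
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical data: 5 subjects answering 3 items on a 1-5 scale
data = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(data), 3))
```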

Basic Steps
Validity
• An expression of the degree to which a measurement measures what it
purports to measure
Types:
1. Face Validity: the extent to which the relevance of the measurement appears obvious to the
investigator
2. Content Validity: the extent to which the measurement incorporates the
domain of the phenomenon under study
3. Construct Validity: the extent to which the measurement corresponds to
theoretical concepts
4. Criterion Validity: the extent to which the measurement correlates with
an external criterion of the phenomenon under study
-Concurrent Validity
-Predictive Validity

Basic Steps
Traditions of assessment
Categorical model (e.g., DSM-III-R) versus dimensional model (e.g., CES-D)

• Categorical: diagnosis requires multiple criteria, each with threshold values.
  Dimensional: occurrence of some features at high intensity can compensate for non-occurrence of others.

• Categorical: differences between cases and non-cases are implicit in the definition.
  Dimensional: differences between cases and non-cases are less clearly delineated.

• Categorical: severity is lowest in instances that minimally satisfy the diagnostic criteria.
  Dimensional: severity is lowest among non-disturbed individuals.

• Categorical: one diagnosis often precludes others.
  Dimensional: a person can have varying amounts of different disorders.

Multidimensional scaling is a bridge between them

Basic Steps
• Reduction of measurement error
In Clinical observation
• Through training
• Interviewing skills
• Clinical experience

In the psychometric tradition
• Items are screened to meet certain criteria
• Consistency of answers is sought across many items
• The scale as a whole is checked to see whether it meets other criteria

The two solitudes (clinical and psychometric traditions) can be merged using the
Diagnostic Interview Schedule (DIS) in psychiatry.
It is derived from the clinical examination used to diagnose psychiatric
patients, but it can be administered by trained lay people.


Devising the items
• Item
• An individual question or response phrase in any health
measurement scale.

• First step in writing a scale is devising the items
• By exploring various sources
• Identifying the strengths and weaknesses of each of them

• Items may be taken from previous scales
• Advantages
• saves the work of constructing new items
• already shown to be useful and psychometrically sound
• may be the only way of asking about a specific problem

• Disadvantages

• outdated terminology
• inadequate or incomplete for the domain under study

Devising the items
• Sources of items
1. Clinical observation
2. Patients
• Focus groups
• Key informant interviews
3. Theory
4. Research findings
5. Expert opinion

• A scale may consist of items derived from some or all of these
sources.

Devising the items
• After the items are generated, content validity should be
addressed
• Content relevance: each item on the test should relate to one of the
course objectives
• Content coverage: each part of the syllabus should be represented by
one or more questions
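Relevance and coverage can both be checked from a simple blueprint that maps each item to the domain it was written for. A minimal sketch; the domain names and item labels are hypothetical, and in practice the mapping would come from expert judges rather than being typed in.

```python
# Hypothetical blueprint: the domains the scale is meant to cover
domains = ["pain", "mobility", "self-care"]

# Hypothetical item pool: each item mapped to the domain it was written for
# (None marks an item that does not fit any intended domain)
item_domain = {"Q1": "pain", "Q2": "pain", "Q3": "mobility", "Q4": None}

def check_content(item_domain, domains):
    """Return (items failing relevance, domains failing coverage)."""
    irrelevant = [item for item, d in item_domain.items() if d not in domains]
    covered = set(item_domain.values())
    uncovered = [d for d in domains if d not in covered]
    return irrelevant, uncovered

irrelevant, uncovered = check_content(item_domain, domains)
print(irrelevant)  # ['Q4']
print(uncovered)   # ['self-care']
```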

Devising the items
• Generic versus specific scales (fidelity versus bandwidth issue)

Generic scale (bandwidth)
• Allows comparison across different disorders, severities of disease,
interventions, and demographic and cultural groups
• Psychometric properties are well established

Specific scale (fidelity)
• Questions are relevant and appropriate for the specific problem
• Short

Devising the items
• Translation

• Each item is translated into the other language.
• This is done by a person who is fluent in both English and the target
language, knowledgeable about the content area, and aware of the
intent of each item and of the scale as a whole

• Back-translation

• Done by another knowledgeable bilingual person, who
translates it back into English

• Reliability and validity are re-established within the new
context

• Introduction
• Article

Scaling Responses
• The method by which responses will be obtained
• Responses can be categorical or continuous variables
• The level of measurement is decided
• Nominal, ordinal, interval, ratio

Scaling Response
1. Dichotomous scale: one that arranges items into either of two mutually exclusive
categories, e.g., yes/no, alive/dead.
2. Nominal scale: classification into unordered qualitative categories, e.g., race, religion,
and country of birth.
3. Ordinal scale: classification into ordered qualitative categories, e.g., social class (I, II,
III, etc.).
4. Interval scale: an (equal) interval scale involves assignment of values with a natural distance
between them, so that a particular distance (interval) between two values in one
region represents the same distance between two values in another
region of the scale. Examples include Celsius and Fahrenheit temperature.
5. Ratio scale: an interval scale with a true zero point, so that ratios between
values are meaningfully defined: one value can be described as being so many times
greater or less than another value. Examples are absolute temperature, weight and height.
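The difference between the interval and ratio levels can be shown with the temperature examples above. A minimal sketch: differences survive the change from Celsius to Fahrenheit (up to the unit), but ratios do not, because both scales have an arbitrary zero; only on the Kelvin scale, which has a true zero, is a ratio meaningful.

```python
def c_to_f(c):
    return c * 9 / 5 + 32

def c_to_k(c):
    return c + 273.15

a, b = 10.0, 20.0
# Differences are preserved (up to the unit) on both interval scales...
print(b - a, c_to_f(b) - c_to_f(a))       # 10.0 18.0
# ...but ratios are not: the arbitrary zero changes them
print(b / a, c_to_f(b) / c_to_f(a))       # 2.0 1.36
# On the Kelvin (ratio) scale the ratio has a physical meaning
print(round(c_to_k(b) / c_to_k(a), 3))    # 1.035
```

So 20 °C is not "twice as hot" as 10 °C; the apparent factor of 2 is an artefact of where the Celsius zero was placed.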

Scaling Response
• Categorical judgment

• Used when the response to a question is either yes/no or a simple
check mark

• Problems
• Uncertainty and confusion on the part of respondents
• Potential loss of information and reduced reliability
• Loss of efficiency of the instrument

Scaling Response
• Continuous judgment

• Required when the response to a question is a continuous
variable

• Methods to quantify it are
• Direct Estimation Technique
• Comparative Methods
• Econometric Method

Scaling Response
Direct estimation methods
1. Visual analogue scale

Scaling Response
2. Adjectival scale

Scaling Response
3. Specific scaling methods
A. Likert scale
• The rater expresses an opinion by rating his or her agreement with a series of
statements, with responses framed on an agree-disagree
continuum.
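A minimal sketch of how a Likert scale is commonly scored: responses are summed, after reversing items worded in the opposite direction so that a high score always means the same thing. The item names, the 5-point format and which items are reverse-keyed are assumptions for illustration.

```python
def score_likert(answers, reverse_keyed, n_points=5):
    """Sum Likert responses (1..n_points), reversing negatively keyed items."""
    total = 0
    for item, value in answers.items():
        if not 1 <= value <= n_points:
            raise ValueError(f"{item}: response {value} outside 1..{n_points}")
        # A reverse-keyed 1 becomes n_points, 2 becomes n_points - 1, and so on
        total += (n_points + 1 - value) if item in reverse_keyed else value
    return total

# Hypothetical 4-item scale; Q2 and Q4 are worded in the opposite direction
answers = {"Q1": 5, "Q2": 1, "Q3": 4, "Q4": 2}
print(score_likert(answers, reverse_keyed={"Q2", "Q4"}))  # 5 + 5 + 4 + 4 = 18
```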

Scaling Response
B. Semantic differential scale
• Defines a number of related dimensions of a characteristic on a
series of continuous bipolar scales

Scaling Response
• General issues in construction of continuous scales
• How many steps should there be?
• Is there a maximum number of categories?

Scaling Response
• Should there be an even or odd number of categories?
• Should all the points on the scale be labelled, or only the ends?
• Do adjectives always convey the same meaning?
• Do numbers placed under the boxes influence the responses?
• Should the order of the response options change from question to question?
• Can it be assumed that the data are interval?

Scaling Response
• Critique of Direct Estimation Methods
• Subjective judgment
• Easy to design
• Little pre-testing required
• Easily understood
• Halo effect
• End-aversion bias
• Positive skew

Scaling Response
• Comparative methods
• These methods scale the value of each description
before responses are obtained, to ensure that the response
values are on an interval scale
• Types
• Thurstone's method
• Paired comparison technique
• Guttman method

Thurstone's method of equal-appearing
intervals
1. Select 100-200 statements
2. A number of judges are asked to sort them into a single pile, from
lowest to highest
3. The median rank of each statement is computed; this is the
scale value of that statement
4. Select a limited number of statements (about 25) having equal
intervals between successive items and spanning the
entire range of values
5. Apply the scale to respondents, who are asked to indicate
the statements that apply to them
6. Each respondent's score is the average scale value of the items selected
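Steps 3 and 6 reduce to two small computations. A minimal sketch, with hypothetical statements and judges' ranks; step 4 (choosing about 25 equally spaced statements) is a judgement made from the resulting scale values and is not automated here.

```python
from statistics import median

def thurstone_scale_values(judge_ranks):
    """judge_ranks: statement -> list of ranks assigned by the judges."""
    return {s: median(r) for s, r in judge_ranks.items()}

def respondent_score(endorsed, scale_values):
    """Average scale value of the statements the respondent endorses."""
    return sum(scale_values[s] for s in endorsed) / len(endorsed)

# Hypothetical ranks given by 5 judges to 3 statements
ranks = {
    "S1": [1, 2, 1, 1, 2],
    "S2": [4, 5, 5, 4, 6],
    "S3": [8, 9, 9, 10, 9],
}
values = thurstone_scale_values(ranks)
print(values)                                  # {'S1': 1, 'S2': 5, 'S3': 9}
print(respondent_score(["S2", "S3"], values))  # 7.0
```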

Scaling Response
• Paired comparison technique
• Similar to Thurstone's method
• Except that here judges compare each item, one at a time, with each of the
remaining items
• The proportion of times each alternative is chosen over each other option is computed
• The proportions are converted to z-scores using the properties of the normal curve
Scaling Response Paired comparison technique

Scaling Response
•Guttman method
• Differs from Thurstone's method in using a small set of items (10-20)
• No calibration is done
• Items are tentatively ranked according to the increasing amount of the attribute
assessed, and responses are displayed in a subject-by-item matrix where 1 is an
endorsed item and 0 is an item not endorsed
• It is an ordinal scale, not an interval scale
• The coefficient of reproducibility and the coefficient of scalability are used to reflect
deviation from perfect cumulativeness
• Best suited to behaviours that are developmentally determined, where
mastery of one behaviour virtually guarantees mastery of lower-order behaviours

Scaling Response -Guttman method
• e.g., assessment of lower-limb function in people with
osteoarthritis

• A = 4, B = 3, C = 2, D = 2, E = 1

Scaling Response -Guttman method
• The indices that reflect how much an actual scale deviates
from perfect cumulativeness are:
• Coefficient of reproducibility
The degree to which a person's scale score predicts his or her
response pattern.
Varies between 0 and 1; should be > 0.9
• Coefficient of scalability
Reflects whether the scale is unidimensional and cumulative.
Varies between 0 and 1; should be at least 0.6
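A minimal sketch of both indices, under stated assumptions: an "error" is any cell that differs from the ideal cumulative pattern implied by the subject's total score (one of several error-counting rules in use), and the coefficient of scalability is taken as (CR - MMR) / (1 - MMR), where MMR, the minimal marginal reproducibility, is the accuracy achievable from the item marginals alone. The response matrix is hypothetical.

```python
def guttman_coefficients(matrix):
    """Coefficients of reproducibility (CR) and scalability (CS).

    matrix: one row per subject, one 0/1 column per item, with items ordered
    from most to least often endorsed.
    """
    n, k = len(matrix), len(matrix[0])
    errors = 0
    for row in matrix:
        s = sum(row)
        ideal = [1] * s + [0] * (k - s)   # perfect pattern for a total of s
        errors += sum(a != b for a, b in zip(row, ideal))
    cr = 1 - errors / (n * k)
    # Minimal marginal reproducibility: the CR expected from item marginals alone
    mmr = sum(max(sum(col), n - sum(col)) for col in zip(*matrix)) / (n * k)
    if mmr == 1:
        raise ValueError("every item has the same response for all subjects")
    cs = (cr - mmr) / (1 - mmr)
    return cr, cs

# Hypothetical 5 subjects x 4 items, forming a perfect cumulative pattern
perfect = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(guttman_coefficients(perfect))  # (1.0, 1.0)
```

Any departure from the staircase pattern lowers CR, and CS shows how much of that reproducibility goes beyond what the item popularities alone would give.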

Scaling Response
• Critique of Comparative method
• Requires more time to develop
• Thurstone's and paired comparison methods guarantee interval-level
measurement

Scaling Response
• Goal attainment scaling (GAS)
• An attempt to construct scales that are tailored to specific individuals, yet can
yield results on a common ratio scale across all people
• If the intervention worked as intended, the subject should score 0
• A higher mean score across all subjects indicates that the goals were set too low
• Not all subjects need to have the same goals

Critique
• Ability to tailor the scale to the specific goals of the individual
• Each subject has his or her own scale, with a different number of goals and varying
criteria for each one
• Extremely labour intensive

GAS is useful when
• The objective is to evaluate the intervention as a whole, the goals for each
person are different, and adequate resources for training goal setters and
raters are available.
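The slides do not give the formula that puts individually tailored goals onto a common scale. A minimal sketch using the Kiresuk-Sherman summary T-score, which is the usual choice: each goal is rated from -2 to +2 (0 = expected outcome), and rho is an assumed correlation between goal scores, conventionally 0.3. The goal ratings and weights below are hypothetical.

```python
import math

def gas_t_score(levels, weights=None, rho=0.3):
    """Goal attainment T-score (Kiresuk-Sherman summary formula).

    levels: attainment on each goal, -2 (much less than expected) to +2
    (much more than expected), with 0 = the expected outcome.
    weights: relative importance of each goal (default: equal).
    rho: assumed correlation between goal scores; 0.3 is the usual convention.
    """
    if weights is None:
        weights = [1] * len(levels)
    weighted_sum = sum(w * x for w, x in zip(weights, levels))
    denom = math.sqrt((1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / denom

# Hypothetical subjects with different numbers of goals
print(gas_t_score([0, 0, 0]))                  # 50.0: every goal met as expected
print(round(gas_t_score([1, 1, 0]), 1))        # above 50: better than expected
print(round(gas_t_score([-1, 0], [2, 1]), 1))  # below 50, with weighted goals
```

On this metric, "scoring 0" on every goal corresponds to a T-score of 50, and a group mean well above 50 is the sign that goals were set too low.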

Scaling Response
• Econometric Method
• Required to scale benefits along a numerical scale so that
cost/benefit ratios can be determined
• A health state is rated by averaging judgements from a large number
of individuals to create a utility score for that state.
• Here the focus of measurement is the described health state, not the
characteristics of the individual respondent.
• e.g., the choice between medical management and CABG in managing
angina can be approached by the following methods:
• Von Neumann-Morgenstern standard gamble
• Time trade-off technique

Scaling Response-Econometric method
Von Neumann-Morgenstern standard gamble
You have been suffering from angina for several years. As a result of your illness you
have chest pain after even minor physical exertion. You have been forced to quit your
job and spend most days at home watching TV. Imagine you are offered the possibility of
an operation that will result in complete recovery, though the operation carries some risk:
there is a probability p that you will die during the operation. How large must p be before
you will decline the operation and choose to remain in your present state?
The closer the present state is to perfect health, the smaller the risk of death one would
be willing to entertain. Having obtained an estimate of p from subjects, the value of the
present state can be directly converted to a 0-1 scale as 1 - p.
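A minimal sketch of the conversion described above: each respondent's indifference probability p becomes a utility of 1 - p, and, as the econometric approach prescribes, the state's utility is the average across respondents. The answers are hypothetical.

```python
def standard_gamble_utility(p):
    """Utility of the present state on a 0-1 scale, from the indifference risk p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1.0 - p

# Hypothetical answers: the largest risk of operative death each respondent accepts
answers = [0.10, 0.20, 0.15]
utilities = [standard_gamble_utility(p) for p in answers]
print(round(sum(utilities) / len(utilities), 2))  # 0.85, the averaged utility of the state
```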

Time trade-off technique
Imagine living the remainder of your lifespan, 40 years, in your present state. Contrast
this with an operation after which you return to perfect health, but for fewer years. How many
years would you sacrifice in order to have perfect health?
The respondent is first presented with the alternative of 40 years in her present state versus
0 years of complete health; the number of years of complete health is then varied until she is
indifferent between the two.
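A minimal sketch of the procedure, assuming offers rise one year at a time: the offer of healthy years starts at 0 and increases until the respondent prefers it, and the utility of the present state is x / t, the accepted healthy years divided by the 40 years in the present state. The `prefers_healthy` callback is a hypothetical stand-in for asking a real respondent.

```python
def time_tradeoff(prefers_healthy, remaining_years=40):
    """Time trade-off utility, x / t.

    prefers_healthy(x) stands in for asking the respondent whether x years in
    perfect health are preferred to `remaining_years` years in the present state.
    The first accepted offer is taken as the indifference point x.
    """
    for x in range(remaining_years + 1):
        if prefers_healthy(x):
            return x / remaining_years
    return 1.0  # never prefers fewer healthy years: present state valued as full health

def respondent(x):
    # Hypothetical respondent: accepts once at least 30 healthy years are offered
    return x >= 30

print(time_tradeoff(respondent))  # 0.75
```

Stepping by whole years limits the precision to 1/40; finer steps or a bisection search would sharpen the estimate.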

Scaling Response
• Critique of Econometric method
• Difficult to administer
• Requires a trained interviewer

• Multidimensional scaling
• A technique for examining the similarities of different objects, which may
vary along a number of separate dimensions
• Begins with some index of how close each object is to every other
object and then tries to determine how many dimensions underlie
these evaluations of closeness
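A minimal sketch of one variant, classical (Torgerson) multidimensional scaling, using NumPy; the slides do not say which variant is meant. The squared distances are double-centred and decomposed into eigenvalues: each clearly positive eigenvalue corresponds to a dimension needed to reproduce the closeness judgements. The dissimilarities are hypothetical and constructed to be one-dimensional.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS from a symmetric matrix of distances.

    Returns the eigenvalues (largest first) and coordinates in n_dims dimensions.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * centre @ (d ** 2) @ centre          # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :n_dims] * np.sqrt(np.clip(eigvals[:n_dims], 0, None))
    return eigvals, coords

# Hypothetical dissimilarities between four objects that really lie on one line
positions = np.array([0.0, 1.0, 3.0, 6.0])
dist = np.abs(positions[:, None] - positions[None, :])
eigvals, coords = classical_mds(dist, n_dims=1)
print(np.round(eigvals, 6))  # one clearly positive eigenvalue: one dimension suffices
```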

Scaling Response-Multidimensional scaling

Selecting the items
A. Pre-test the items to ensure that they are
1. comprehensible to the target population
2. unambiguous
3. asking only a single question

B. Eliminate or rewrite any items that do not meet
the criteria above, and pre-test again
C. Discard items endorsed by very few (or very many)
subjects

Selecting the items
D. Check the internal consistency of the scale using
1. Item-total correlation
a) Correlate each item with the scale total, omitting that
item
b) Eliminate or rewrite any item with Pearson r < 0.20
c) Rank-order the remaining items and select items starting
with the highest correlation
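Steps a) to c) can be sketched as follows, assuming a complete subjects-by-items table (the responses are hypothetical). The total is recomputed without the item each time, so that an item is not correlated with itself.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_by_item_total(names, responses, threshold=0.20):
    """Corrected item-total correlations, then drop (b) and rank (c)."""
    totals = [sum(row) for row in responses]
    r = {}
    for i, name in enumerate(names):
        item = [row[i] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]   # scale total omitting this item
        r[name] = pearson(item, rest)
    kept = [name for name in names if r[name] >= threshold]
    return sorted(kept, key=r.get, reverse=True), r

# Hypothetical responses: 6 subjects x 4 items
names = ["Q1", "Q2", "Q3", "Q4"]
responses = [
    [5, 4, 5, 1],
    [4, 4, 4, 5],
    [3, 3, 3, 2],
    [2, 2, 1, 4],
    [1, 2, 2, 3],
    [4, 5, 4, 2],
]
ranked, r = select_by_item_total(names, responses)
print(ranked)  # ['Q2', 'Q1', 'Q3']; Q4 falls below 0.20
```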

Selecting the items
2. Coefficient α or KR-20
a) Calculate α, eliminating one item at a time.
b) Discard any item whose elimination significantly increases α.
E. For a multiscale questionnaire, check that each item is in the 'right' scale
by
a) correlating it with the totals of all the scales, eliminating
items that correlate more highly with scales other than the
one they belong to
b) factor-analysing the questionnaire, eliminating items that
load more highly on another factor than the one they should
belong to.
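Check a) can be sketched as below. One assumption beyond the slides: the item's own scale total is computed without the item itself, as in the item-total step, so that an item's correlation with its own scale is not inflated. The scales and scores are hypothetical; item a3 was written for scale A but behaves like scale B.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def misplaced_items(data, scales):
    """data: item -> scores; scales: scale name -> list of its item names.

    Flags an item if it correlates more highly with another scale's total
    than with its own scale's total (computed without the item itself).
    """
    n = len(next(iter(data.values())))

    def total(members):
        return [sum(data[m][s] for m in members) for s in range(n)]

    flagged = []
    for own, members in scales.items():
        for item in members:
            r_own = pearson(data[item], total([m for m in members if m != item]))
            for other, other_members in scales.items():
                if other != own and pearson(data[item], total(other_members)) > r_own:
                    flagged.append(item)
                    break
    return flagged

data = {
    "a1": [1, 2, 3, 4, 5, 6],
    "a2": [2, 2, 3, 5, 5, 6],
    "a3": [3, 1, 6, 2, 5, 4],   # written for scale A, but behaves like scale B
    "b1": [3, 1, 6, 2, 5, 4],
    "b2": [4, 1, 5, 2, 6, 4],
}
scales = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
print(misplaced_items(data, scales))  # ['a3']
```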

Biases in Responding
• The people who develop a scale, those who use it
in their work, and the ones who are asked to fill it out
all approach scales from different perspectives, for
different reasons
• Optimizing
• Describes performing a task in a careful and comprehensive
manner; respondents

a) try to interpret the meaning of the question itself
b) try to retrieve all the relevant information from their
memories
c) use this information to form a single integrated
summary judgement
d) try to convey that judgement on the answer sheet.

• Satisficing
Giving an answer that is satisfactory but not optimal, which
may include selecting the first response option (written
form) or the last option (verbal form), agreeing with every
statement, answering either true or false to every item,
choosing to keep things as they are, or answering "I don't know"

It can be minimized by keeping the task simple, using words that are
short and easy, offering response options that cover all the possibilities, and
motivating respondents

• Social desirability (SD) and faking good
• When the subject does not deliberately try to deceive or lie but gives a socially
desirable answer, it is called SD
• When the subject knowingly and intentionally attempts to create a false
positive impression, it is called faking good

• SD depends on the individual, sex, culture, and the question and its
context.
• Assessed by the Differential Reliability Index (DRI), Social
Desirability Scale, Desirability Scale and Social Relation Scale.
• Faking good, being volitional, is easier to modify through
instructions and careful wording of the items than social
desirability is

• Deviation and faking bad
• The tendency to answer test items with deviant responses is the opposite of social
desirability and is known as deviation
• Faking bad: when the subject knowingly and intentionally attempts to
create a false negative impression; it is the opposite of
faking good.

• Biases can be minimized by
• Disguising the intent of the test
• Using subtle items, whose purpose the respondent is unaware of
• The randomized response technique

• Yea-saying (acquiescence) and nay-saying
• The tendency to give positive responses such as yes, like or true, or
negative responses such as no, dislike or false, irrespective of the
content of the item is called yea-saying and nay-saying respectively.
• It can be reduced by having an equal number of items keyed in the
positive and negative directions.

• End-aversion bias or central tendency bias
• The reluctance of some people to use the extreme categories of a
scale.
• It is reduced by avoiding absolute statements at the end points and
including throw-away categories at the ends.

• Positive skew
• When responses are distributed more toward the favourable end. It
produces a ceiling effect.
• It can be minimized by not putting 'average' in the middle of the scale, or by
expanding the middle.

• Halo
• When judgements made on individual aspects of a person's
performance are influenced by the rater's overall impression of the
person.
• It can be minimized by training raters, basing the evaluation on
large samples of behaviour, and using more than one evaluator

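• Framing

• When a person's choice between two alternative states depends on
how these states are framed.
• People are RISK AVERSE when gains are involved and RISK TAKERS in
loss situations.
• A. 200 people will be saved.
• B. There is a one-third probability that 600 people will be saved, and a two-thirds
probability that nobody will be saved.
OR
A. 400 will die.
B. There is a one-third probability that nobody will die, and a two-thirds probability that
600 will die.
• Both pairs describe the same outcomes for 600 people at risk, and in each pair A and B
have the same expected number of survivors (200); only the framing (lives saved versus
lives lost) differs.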

• The safest strategy for the test developer is to assume that all of
these biases are operative and take the necessary steps to
minimize them whenever possible.

From items to scales
• Differential weighting of items is rarely worth the trouble
• For a test being developed for local use, the total score can be
obtained by adding up all the items
• For general use, and to make scores comparable, transform them
into percentiles, z-scores or T-scores
• For attributes that differ between males and females, or
that show developmental changes, separate age or age-sex
norms can be developed.

• Combining items into a scale and expressing the final score
1. Adding the scores of the individual items, when the items contribute equally
to the total score
2. Weighting the items, when some items may be more important; each item is given
either the same weight or different weights by different subjects
3. Transforming the final scores, when comparing scores on
different scales, into
• Percentiles
• Standard and standardized scores
• Normalized scores

• A percentile is the percentage of people who score below a certain value,
the lowest being the 0th percentile and the highest the 99th percentile. Percentiles are easy to
understand, but they require many scores, are not normally distributed and, being
ordinal data, cannot be analysed with parametric statistics.
• To address the problems with percentiles, z-scores and T-scores can be
calculated: z-scores by transforming the scale to a mean of 0 and a
standard deviation of 1, and T-scores by transforming z-scores to a new
mean and standard deviation chosen arbitrarily.
• To ensure that z- and T-scores are normally distributed, normalized
standard scores are used.
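A minimal sketch of the three transformations, with these assumptions labelled: T-scores use the common convention of mean 50 and SD 10, the population SD is used for z, and the normalized score is the normal-curve z at each score's mid-rank proportion (below + half the ties), which avoids the infinite z that a 0th percentile would give. The raw scores are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def transform_scores(raw, t_mean=50, t_sd=10):
    """Return (raw, percentile, z, T, normalized z) for each raw score."""
    m, sd = mean(raw), pstdev(raw)
    n = len(raw)
    rows = []
    for x in raw:
        below = sum(1 for y in raw if y < x)
        ties = sum(1 for y in raw if y == x)
        percentile = 100 * below / n           # % of people scoring below x
        z = (x - m) / sd                       # mean 0, SD 1
        t = t_mean + t_sd * z                  # new, arbitrarily chosen mean and SD
        # Normalized score: z of the normal curve at the mid-rank proportion
        z_norm = NormalDist().inv_cdf((below + 0.5 * ties) / n)
        rows.append((x, percentile, round(z, 2), round(t, 1), round(z_norm, 2)))
    return rows

for row in transform_scores([10, 20, 30, 40, 50]):
    print(row)
```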

• Establishing the cut points
• Receiver operating characteristic (ROC) curve
• Requires the true positive rate (sensitivity) and the true negative
rate (specificity)
• A graph is plotted with 1 - specificity (false positive rate) on the X axis and
sensitivity (true positive rate) on the Y axis. The diagonal running from (0,0) in the
lower left-hand corner to (1,1) in the upper right reflects the characteristics of
a test with no discriminating ability. The better the test is at separating
cases from non-cases, the closer the curve approaches the upper left corner.
An index of the goodness of the test is the area under the curve, expressed as D′.
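A minimal sketch of how the curve and its area can be computed, assuming higher scale scores indicate a case and each distinct score is tried as a cut point ("score >= cut point" counts as positive). The area is taken with the trapezoidal rule; the scores and case labels are hypothetical.

```python
def roc_points(scores, is_case):
    """(1 - specificity, sensitivity) for every cut point, plus (0, 0)."""
    n_pos = sum(is_case)
    n_neg = len(is_case) - n_pos
    points = [(0.0, 0.0)]
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_case) if s >= c and y)
        fp = sum(1 for s, y in zip(scores, is_case) if s >= c and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical scale scores for 4 cases and 4 non-cases
scores = [2, 5, 7, 8, 3, 9, 6, 4]
is_case = [False, False, True, True, False, True, False, True]
points = roc_points(scores, is_case)
print(auc(points))  # 0.875
```

A test with no discriminating ability gives points along the diagonal and an area near 0.5; a perfect test reaches (0, 1) and gives an area of 1.0.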

[Figure: ROC curve]
