1. Current Trends in Rater Training: A
Survey of Rater Training Programs in
U.S. Organizations
C. Allen Gorman
East Tennessee State University
Joshua L. Ray
Tusculum College
John P. Meriac
University of Missouri-St. Louis
Thomas W. Roddy
East Tennessee State University
2. Introduction
• The accuracy of performance ratings is
important to the success of a performance
management system (Werner & Bolino, 1997)
• Two general strategies for improving rating
accuracy (Woehr & Huffcutt, 1994)
– Rating scale development
– Rater training
• Rater training has become the most widely
accepted strategy (Roch, Woehr, Mishra, & Kieszczynska,
2011)
3. Purpose
• No published research on the prevalence of
rater training programs in organizations
• Purpose is to fill the void by conducting a
survey of U.S. organizations to determine
– Do organizations utilize rater training
programs?
– If so, what types of training programs?
4. Rater Training
• In general, rater training is effective for improving the
quality of performance ratings (Smith, 1986; Spool, 1978)
• Two major benefits of rater training (McIntyre, Smith, & Hassett,
1984)
– Enhance raters’ knowledge and skills for carrying out evaluations
– Motivate raters to use the knowledge and skills learned in the
training program
• Two meta-analyses have empirically demonstrated the
overall effectiveness of rater training programs for
improving rating accuracy (Roch et al., 2011; Woehr & Huffcutt,
1994)
5. Approaches to Rater Training
• From Woehr & Huffcutt (1994)
– Rater Error Training (RET)
– Performance Dimension Training (PDT)
– Frame-of-Reference Training (FORT)
– Behavioral Observation Training (BOT)
6. Rater Error Training
• Developed as a way to combat the
prevalence of psychometric errors in
performance appraisal ratings (Borman, 2001)
• Generally focuses on recognizing and
avoiding halo, leniency, and central
tendency errors (Woehr & Huffcutt, 1994)
• RET reduces halo and leniency errors (Smith,
1986)
7. But….
• RET inadvertently lowers levels of rating
accuracy (Bernardin & Pence, 1980; Borman, 1979;
Landy & Farr, 1980)
• Smith (1986) argued that RET actually
produces a meaningless redistribution of
ratings
• Rater errors may not be errors, but could
actually reflect true score variance (Arvey &
Murphy, 1998; Hedge & Kavanagh, 1988)
8. Performance Dimension Training
• Criticisms of RET shifted focus of rater training
literature toward rating accuracy (Athey &
McIntyre, 1987)
• PDT emphasizes the cognitive processing of
raters as the key to the success of rater training
• Typically involves having raters review the
rating scale or participate in the
development of the scale (Woehr & Huffcutt, 1994)
• Generally effective for improving rating accuracy
(Woehr & Huffcutt, 1994)
9. Frame-of-Reference Training
• Proposed by Bernardin & Buckley (1981) in
response to the disappointing results of RET
• Essentially an extension of PDT, but
incorporates a practice and feedback session
(Woehr & Huffcutt, 1994)
• Involves categorizing behaviors into
appropriate dimensions and correctly judging
the effectiveness of those behaviors (Sulsky & Day,
1992; 1994)
10. FORT
• Has emerged as the most popular approach
for improving rating accuracy (Roch et al., 2011)
• Meta-analytic effect sizes
– Cohen’s d = .83 (Woehr & Huffcutt, 1994)
– Cohen’s d = .50 (Roch et al., 2011)
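For readers unfamiliar with the metric, Cohen's d is a standardized mean difference: the gap between two group means divided by their pooled standard deviation. The sketch below is only an illustration. The function name `cohens_d` and the accuracy scores are made up, and the meta-analyses cited above aggregate d across many studies rather than computing it from a single sample like this.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical rating-accuracy scores, NOT data from either meta-analysis.
trained = [4.0, 5.0, 6.0]
untrained = [3.0, 4.0, 5.0]
d = cohens_d(trained, untrained)
```

By Cohen's conventional benchmarks, d = .50 is a medium effect and d = .80 is a large one, so both meta-analytic estimates point to a practically meaningful gain in accuracy.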
11. Criticisms of FORT
• Does not instruct raters on how to process behavior
information with the goal of remembering the behavior
at a later time (Noonan & Sulsky, 2001)
• May cause raters to see certain behaviors that were
never exhibited (Noonan & Sulsky, 2001; Sulsky & Day, 1992)
• Little attempt to measure the information
processing that supposedly occurs during training
(Arvey & Murphy, 1998)
• Overreliance on standard videos of performance and
student raters in contrived rating situations (Arvey &
Murphy, 1998; Noonan & Sulsky, 2001)
12. Behavioral Observation Training
• Emphasizes the accuracy of behavioral
observations
• Important when considering that raters often
must observe performance in noisy
environments where competing demands
deplete cognitive resources (Noonan & Sulsky,
2001)
• Typically involves note taking or keeping a
diary (Woehr & Huffcutt, 1994)
13. BOT
• Reduces rating errors (Bernardin & Walter, 1977;
Latham, Wexley, & Pursell, 1975)
• Leads to increased observational accuracy
(Thornton & Zorich, 1980)
• Significantly increases rating accuracy (Hedge &
Kavanagh, 1988; Noonan & Sulsky, 2001; Pulakos, 1986)
• Criticisms
– Lack of agreement on what constitutes an
observational training program (Noonan & Sulsky,
2001)
– Note taking and diary keeping are likely impractical
14. Combinations of Rater Training
Approaches
• RET + FORT = no significant increase in
rating accuracy (McIntyre et al., 1984; Pulakos,
1984)
• RET + other approaches = no increase in
rating accuracy (Smith, 1986)
• FORT + BOT = no significant increase in
rating accuracy beyond FORT alone (Noonan &
Sulsky, 2001; Roch & O’Sullivan, 2003)
15. Summary of Rater Training Research
• FORT has become the “go to” training
– Although it may have limited generalizability
• RET is effective
– At reducing rating accuracy
• Practice and feedback appear to be important
components of any successful rater training
program (Borman, 2001; Latham, 1986; Smith, 1986)
• Accumulation of empirical evidence suggests that
rater training programs should be worthwhile
interventions for improving ratings in organizations
16. However….
• Lack of widespread adoption of rater training
programs in applied settings (Bernardin, Buckley, Tyler, &
Wiese, 2001)
– Time consuming and expensive to implement (Stamoulis &
Hauenstein, 1993)
– Developing target scores for computing rating accuracy
indices is complex and time consuming (Bernardin et al.,
2001; Ilgen & Favero, 1985)
– May be insufficient due to low levels of user acceptance
and political influence (Carroll & Schneier, 1982; Longenecker, Sims,
& Gioia, 1987)
– Has yet to be shown to generalize across jobs and
members in organizations (Arvey & Murphy, 1998)
17. The Present Study
• No scholarly evidence of the prevalence and
types of rater training programs in
organizations today
• Some anecdotal evidence
– TVA, JP Morgan Chase, Lucent Technologies, AT&T
have adopted rater training programs (Levy, 2010)
– Employers Resource Council (2008) – 46% of the
73 organizations surveyed provide rater training
• Exploratory research question: Is rater training
related to organizational performance?
18. Method
• Procedure
– Survey part of a larger data collection effort on current
performance management practices (Gorman, Ray, Nugent, et al.,
2012)
– Recruited HR executives to complete survey
• Directly e-mailing HR departments in Fortune 500 companies
• Advertising on popular online business forums
• Asking HR execs to forward survey to other HR execs
• Participants
– HR executives from 101 U.S. organizations
– 88% report revenues of $1 million or more
– 88% employ at least 100 employees
– Largest percentage of the organizations were headquartered in the
Southeastern U.S. (44%)
19. Measures
• Rater Training
– 8 items (e.g., Does your company train managers how to conduct
performance appraisals?)
• Performance appraisal system effectiveness
– 1 item (Overall, how would you rate the effectiveness of your company’s
performance appraisal system?)
– 1 (extremely ineffective) to 5 (extremely effective)
• Performance appraisal system fairness
– 1 item (Overall, how would you rate the fairness of your company’s
performance appraisal system?)
– 1 (extremely unfair) to 5 (extremely fair)
• Firm-level performance
– 1 item (Approximately how much revenue does your company make
annually?)
– 1 (less than $1 million) to 4 (more than $100 million)
20. Results
Do Organizations Conduct Rater Training?
Response   Train Managers   Train Non-Managers   Refresher/Recalibration Training
Yes        77               31                   50
No         24               70                   19
21. Results
Frequency of Rater Training Approaches
Rater Training Approach Frequency Percent
No training 24 23.76%
Rater error training 13 12.87%
Performance dimension training 23 22.77%
Frame-of-reference training 31 30.69%
Behavioral observation training 8 7.92%
Other 2 1.98%
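The percentages in this table use all 101 responding organizations as the denominator, whereas the 40% and 30% figures quoted for FORT and PDT in the Discussion appear to use only the 77 organizations that train managers. The sketch below recomputes both sets of shares from the counts in the table; the variable names are illustrative, and the choice of 77 as the second denominator is an inference from the reported numbers, not something the slides state.

```python
# Counts copied from the "Frequency of Rater Training Approaches" table.
counts = {
    "No training": 24,
    "Rater error training": 13,
    "Performance dimension training": 23,
    "Frame-of-reference training": 31,
    "Behavioral observation training": 8,
    "Other": 2,
}

total = sum(counts.values())                  # all responding organizations
trained_total = total - counts["No training"]  # organizations that train managers

# Share of every organization surveyed (matches the table's Percent column).
pct_of_all = {k: round(100 * v / total, 2) for k, v in counts.items()}

# Share of organizations that provide training (assumed Discussion denominator).
pct_of_trained = {
    k: round(100 * v / trained_total, 2)
    for k, v in counts.items()
    if k != "No training"
}

for approach, pct in pct_of_trained.items():
    print(f"{approach}: {pct_of_all[approach]}% of all, {pct}% of those that train")
```

Under this reading, FORT accounts for roughly 31% of all organizations but roughly 40% of those that train, which reconciles the two sets of figures.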
22. Results
Who Conducts Rater Training?
Training Conducted by Frequency Percent
External consultant 2 2.63%
Internal consultant 2 2.63%
Human resource personnel 61 80.26%
Department manager 6 7.89%
Other 5 6.58%
23. Results
Frequency of Rater Training
Frequency of Rater Training Frequency Percent
Less than one time a year 6 8.00%
One time per year 28 37.33%
Two times per year 13 17.33%
Three times per year 0 0.00%
Four times per year 3 4.00%
As needed 25 33.33%
24. Exploratory Analyses
• Control variable:
– Company size
• Performance appraisal systems that utilize
managerial rater training were judged to be
more effective (M = 3.70, SD = 1.51) than
those that do not (M = 3.37, SD = 1.61), t(98)
= 1.77, p < .05.
25. Exploratory Results
• No significant difference in perceived fairness
of performance appraisal system
• Organizations that utilized managerial rater
training generated higher revenue (M = 3.09,
SD = 1.03) than those that did not (M = 2.71, SD
= 1.04), t(98) = 3.07, p < .01.
• Performance appraisal systems were perceived
as significantly more legally defensible when
the system included a rater training program,
χ²(1, N = 101) = 4.13, p < .05.
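As a rough guide to how a chi-square statistic with one degree of freedom is obtained, the sketch below computes the Pearson statistic for a 2×2 table (training: yes/no by legally defensible: yes/no). The slides do not report the underlying cell counts, so the counts used here are invented purely for illustration, and the function name `chi_square_2x2` is hypothetical. Whether the authors applied a continuity correction is also unknown; none is applied here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    using the shortcut N * (ad - bc)^2 / (product of the four marginal totals).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Invented cell counts, NOT the study's data:
# rows = rater training (yes, no); columns = judged legally defensible (yes, no).
stat = chi_square_2x2(20, 10, 10, 20)

# With 1 degree of freedom, the critical value at alpha = .05 is 3.841.
significant = stat > 3.841
```

A statistic above 3.841 is significant at the .05 level with one degree of freedom, which is the threshold the reported value of 4.13 clears.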
28. Discussion
• Rater training is alive and well
– 76% of organizations surveyed utilize
managerial rater training
– 31% train non-managers
– FORT (40%) and PDT (30%) most popular among the 77 organizations that train managers
– Preliminary evidence that rater training is
linked to firm-level performance
29. Discussion
• Encouraging results
– In contrast to the presumed scientist-
practitioner gap in performance appraisal (Banks
& Murphy, 1985; Bretz, Milkovich, & Read, 1992)
– Evidence-based approaches are the
predominant rater training methods in use
today
– Widespread adoption across many
organizations and industries
30. Areas for Improvement
• Only 22 of the 77 organizations that offer
rater training have evaluated the training
• Majority of rater training sessions are only
offered either once per year or as needed
31. Limitations
• Single source data
• Links between rater training and firm
performance are not causal
• Training programs in practice may not
contain all elements of what is described in
the literature
• Small number of organizations; may not be
generalizable
32. Thank You!
• If you would like a copy of the chapter,
please e-mail me
– gormanc@etsu.edu