3. Context | DRIVERS
Introduce multimedia assessment method
Increase consistency between end-of-training assessments and on-the-job evaluations
Increase authenticity of assessment
Sam Gonzales, Director of Human Performance Technology
4. Context | DRIVERS
Current assessment: pencil-and-paper; multiple-choice and short-answer question types; time-consuming to score
New assessment: computer administered; multiple choice only; automatically scored; 8 video clips, each asking trainees to identify the task, whether errors were made, and the errors or critical aspect
8. Context | Results of the test Correlation between end-of-training assessment and on-the-job performance evaluation was significantly improved in the experimental group
9. Key Stakeholders | Trainees Concerned about the difficulty and fairness of video questions
10. Key Stakeholders | Instructors Concerned about poor end-of-training performance reflecting on their own performance
11. Key Stakeholders | Supervisors Concerned about the changing workforce and preparedness of new employees
12. Performance Issues
Facilitating preflight checks when unusual situations occur
Unusual situations complicate preflight checks
Skills for facilitating preflight checks
13. Performance Issues
Dealing with difficult passengers when unusual situations occur
Difficult passengers increase task load and communication demand
Communication skills for passenger interaction
14. Context | Results of the test (chart comparing scores in frequent vs. unusual situations)
16. Solution | Add New Performance Goals
Redesign the course, assessment, and evaluation
Add learning content to include instruction, discussion, and additional practice handling unusual situations
Add traditional test questions to assess the learner's understanding of how to complete tasks in unusual situations
Prompt the evaluator to distinguish between frequent vs. unusual situations when completing observational evaluations of performance
17. Evaluation | Conduct a Second Study
Conduct a second study to evaluate the effectiveness of the added performance goals
Control group of 20 completes the original course and assessment, but the redesigned evaluation
Experimental group of 20 is trained, assessed, and evaluated using all the redesigned tools
18. Evaluation | Analyze Results Conduct reliability analysis for both training assessments without distinction between frequent and unusual performance goals Conduct a second reliability analysis on the experimental results WITH distinction between frequent and unusual performance goals
19. Evaluation | Analyze Results For each group, find correlation between end of training assessment and on the job performance evaluations without distinguishing between frequent and unusual performance goals
20. Evaluation | Analyze Results Compare on the job performance evaluations between the two groups WITH distinction between frequent and unusual performance goals
21. Summary | Recommendations Add content, assessment, and performance evaluation to support goals of completing tasks in unusual situations Adopt multimedia test method
22. References
Ertmer, P. A., & Quinn, J. (2007). The ID casebook: Case studies in instructional design (3rd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Fraenkel, J. R., & Wallen, N. E. (2008). How to design and evaluate research in education (7th ed.). New York: McGraw-Hill.
Hale, J. (2007). The performance consultant's fieldbook: Tools and techniques for improving organizations and people (2nd ed.). San Francisco: Pfeiffer.
Editor's notes
This case study is based on measuring the learning and performance of newly hired flight attendants at Atlantic Airlines. The airline is experiencing rapid growth and has recently hired 200 flight attendants. These trainees will soon join the existing workforce, which has an average tenure of five years and rates highly in both customer satisfaction and supervisor ratings.
The current instructor-led training program is built around the performance goals expected for the job, including various tasks around safety, service and communication. At the end of the training program, trainees must complete a comprehensive paper-based assessment.
In preparation for this influx of flight attendants, the director of human performance technology decided to try a new method of assessment, requiring trainees to answer questions about videotaped situations. He felt this method would allow for better alignment between the end-of-training assessment and the on-the-job performance evaluation. He believed that this type of assessment would be more authentic.
In addition to increasing the effectiveness of the assessment, this new method could enhance efficiency as well. The current end-of-training assessment is a pencil-and-paper test with multiple-choice and short-answer questions. The open-ended questions must be scored manually, which proves to be a time-consuming process. The new end-of-training assessment would be administered by computer, using only multiple-choice questions, and could be scored automatically, giving instant results. A series of eight questions for just two of the performance goals would be presented in video-clip format. After viewing each clip, trainees were asked to identify the task being performed, state whether or not errors had been made in performing the task, and identify the errors or the critical aspect of the performance.
Sam asked his instructional designer, Linda McMillan, to produce a series of two-minute video clips for use in training and in the end-of-training assessments. The clips deal with two flight attendant performance goals: performing preflight checks and dealing with difficult passengers. For each goal, the clips portray flight attendants performing the tasks in either frequently occurring or unusual situations, and performing them either correctly or incorrectly. Two sets of eight clips were produced: one for practice during training, and one for use on the end-of-training assessment.
The participants in the experimental group performed about the same as the control group on the questions related to frequent situations, but they performed poorly on the four questions related to unusual situations. This brought down the overall scores for the experimental group. [pause]
The reliability coefficient of the new test questions was poor, but improved when they were treated as four different performance measures. [pause]
Correlation between end-of-training assessment and on-the-job performance evaluation was significantly improved in the experimental group[pause]
The flight attendant trainees who performed poorly on the new assessment items were concerned about the difficulty and fairness of the questions, and asked to be able to re-take the test in the original pencil-and-paper format.
The instructors were also concerned about the new items, judging them to be too difficult and giving the impression that they were not good instructors. They wanted to abandon the new multimedia assessments.[pause]
Supervisors were pleased that the end of training assessment was more predictive of the new employees’ job readiness[pause]
The results revealed that trainees performed poorly on multiple-choice questions based on the use of customer service skills in the event of an unusual situation on board. Unusual situations, such as emergencies, mechanical issues, and unsupervised children, are realistic obstacles that the flight attendant faces when delivering critical preflight information to boarded passengers. Communication, flexibility, and resourcefulness are the most important skills needed when unexpected situations occur.
When faced with a difficult passenger AND an unusual situation in the training videos, the trainees again performed poorly. Unusual situations during flight, such as turbulence and technology failure, increase the task load that flight attendants must manage while airborne. Post-training, trainees were evaluated using a checklist of standards critical to effective interaction and communication. The communication skills outlined in the evaluation checklist included listening to and paraphrasing passenger concerns, offering solutions, maintaining professional etiquette, and providing reassuring support.
Apparently, the new assessment questions were not only introducing a new multimedia method but also new performance goals. To properly evaluate the efficacy of the multimedia assessment method, the questions related to performing tasks in unusual situations need to be eliminated. The data from the original test should be re-evaluated after removing the results from the four test questions related to unusual tasks. Only then will we know whether the new video assessment method actually made a difference.
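In practice, the re-analysis described above amounts to dropping the four unusual-situation items and re-scoring each trainee. A minimal sketch in Python, assuming item-level results were retained; which four of the eight items cover frequent situations is an assumption for illustration, since the case only states that four items involved unusual situations:

```python
import numpy as np

# Hypothetical item-level data: one row per trainee, one column per item
# (1 = correct, 0 = incorrect). The values here are fabricated placeholders.
rng = np.random.default_rng(42)
scores = rng.integers(0, 2, size=(20, 8))

FREQUENT_ITEMS = [0, 1, 2, 3]  # assumed indices of the frequent-situation items

def rescore_frequent_only(item_scores, frequent=FREQUENT_ITEMS):
    """Total score per trainee using only the frequent-situation items."""
    return np.asarray(item_scores)[:, frequent].sum(axis=1)

adjusted_totals = rescore_frequent_only(scores)
```

The adjusted totals for the experimental group could then be compared against the control group's scores on the equivalent items, isolating the effect of the multimedia method itself.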
Because a significant improvement occurred in the correlation between the end-of-training assessment and the on-the-job performance scores, this is worth investigating further. To do it properly, however, the course, the traditional assessment, and the on-the-job evaluation must all be redesigned to include this new set of performance goals. Although a couple of practice video assessments were included in the original experimental treatment, for this new study the course should be redesigned to provide instruction on reacting in unusual situations. In turn, additional traditional test questions must be added to the end-of-training assessment. Finally, to complete a proper correlation study between the newly designed end-of-training assessment and the on-the-job evaluation, the evaluation tool should be modified so that the type of situation observed (frequent or unusual) can be recorded.
Now, with the course redesigned, we can conduct a second study to evaluate how the added performance goals impact on-the-job performance. Similar to the first test, which we re-analyzed to focus on the multimedia test method, we'll compare two groups of 20 trainees. The control group will participate in the original course and assessment; however, when its members are evaluated on the job, the observed situations will be recorded so that we can distinguish between performance in frequent and unusual conditions. The experimental group will be trained, assessed, and evaluated using all of the redesigned tools.
The same reliability analysis will be performed on the training-assessment results from both groups. As in the original test, the analysis will be performed twice: once combining the results for each job task under both frequent and unusual conditions, and again with the frequent and unusual performance goals reported separately.
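The case does not name a specific reliability statistic; a common choice for internal consistency is Cronbach's alpha, so the two-pass analysis above might look like this sketch (applying alpha here is my assumption, not the case's):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals)."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```

Pass one would run `cronbach_alpha` on all items for a task pooled across conditions; pass two would run it separately on the frequent-situation and unusual-situation item subsets, mirroring the separate reporting of the two kinds of performance goals.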
Now it’s time to see how the end-of-training assessment correlates with on-the-job performance. First we’ll do this while disregarding the distinction between frequent and unusual performance goals. This will allow us to compare results with the first test, which we re-analyzed to measure the effectiveness of ONLY the multimedia assessment method.
Finally, because we suspect that the real reason for the improved correlation may be the additional performance goals, we will analyze the correlation while distinguishing between the frequent and unusual performance goals.
Based on the re-evaluation of the original test without the confounding factor, and the results of the proposed second test, we will make a recommendation to either add content, assessment and performance evaluation to support the goals of completing all tasks in unusual situations, adopt the multimedia test method, or do both.