
Purpose of Training Evaluation


Presentation Transcript


1. Purpose of Training Evaluation
• Used to make decisions about the selection, adoption, value, and modification of a training program.
• Based on instructional objectives derived from the needs assessment.
• Blocks to evaluation:
  • Lack of trained personnel.
  • Resistance to evaluation (what are the outcomes of a negative evaluation?).
  • Evaluation not expected or rewarded.
  • Unrealistic criteria (e.g., turnover used as the criterion for measuring the effectiveness of communication training).

2. OVERVIEW
• Day 1: Criterion Development
• Day 2: Evaluation Designs

3. CRITERION DEVELOPMENT
• CRITERIA are:
  • Standards by which training is evaluated.
  • Measures of training effectiveness.
  • Based on the needs assessment and learning objectives.
• Example: a PowerPoint training learning objective states, "Trainees will demonstrate proficiency in use of the software."
  • What does that mean? What criteria or outcomes will you use to measure "proficiency"?

4. Evaluation of Criteria
• CRITERION RELEVANCY
  • Right on target: the KSAs covered in training and evaluation are the KSAs needed for successful job performance.
• CRITERION DEFICIENCY
  • Did you miss the boat? Did you leave important job KSAs out of your training and/or your training evaluation?
• CRITERION CONTAMINATION
  • Did you miss the ocean? Are you evaluating training based on KSAs that were not covered in training, or KSAs that are not needed on the job?
• A sound needs assessment is critical for avoiding these problems.

5. LEVELS OF CRITERIA (Kirkpatrick, 1960)
• REACTION
  • ADVANTAGES:
    • Widely used and accepted; cheap and easy.
    • Early reaction measures allow for mid-training changes.
    • Good feedback on the way the course is taught and on trainee motivation.
  • DISADVANTAGES:
    • Little relation to learning or outcomes.
    • "Feel-good sheets."
  • USE:
    • Anonymous answers.
    • Open-ended comment space.
    • Survey right after training and again a few months later (what should have been done differently?).
    • Analyze by group/department (which department had the most positive reaction to the training? — see the sketch below).
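
A minimal sketch of the "analyze by department" idea, assuming reaction surveys are collected as a simple table; the column names and ratings below are hypothetical, not from the slides.

```python
"""Hedged sketch: summarizing reaction-level ratings by department
(hypothetical survey data; column names are made up for illustration)."""
import pandas as pd

reactions = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "HR", "HR"],
    "overall_rating": [4, 5, 3, 2, 5, 4],          # 1-5 satisfaction scale
    "comment": ["useful", "great pace", "too basic", "slides rushed",
                "relevant examples", "good trainer"],
})

# Which department reacted most positively to the training?
print(reactions.groupby("department")["overall_rating"].agg(["mean", "count"]))
```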

6. LEARNING
• ADVANTAGES:
  • More direct assessment of whether the learning objectives were accomplished.
  • More valid than reaction measures (which are self-report).
  • Objective and quantifiable.
  • Taps learning rather than reaction to the trainer.
• DISADVANTAGES:
  • Time and cost.
• USE:
  • Need a pretest and posttest to assess whether change has occurred.
  • Is the change due to training? Need a control group (see the sketch below).
  • Format and scoring considerations:
    • Make sure questions are comparable.
    • Essay vs. multiple choice.
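
A minimal sketch of the pretest/posttest logic described above, assuming learning is measured with a scored knowledge test; the scores are simulated and the analysis (paired t-test plus a gain-score comparison against a control group) is one common choice, not the only one.

```python
"""Hedged sketch of a learning-level evaluation: pretest/posttest scores
for a trained group and an untrained control group (simulated data)."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical knowledge-test scores (0-100) for 30 people per group.
trained_pre  = rng.normal(55, 10, 30)
trained_post = trained_pre + rng.normal(12, 6, 30)   # assumed training gain
control_pre  = rng.normal(55, 10, 30)
control_post = control_pre + rng.normal(2, 6, 30)    # retest effect only

# 1. Has change occurred? Paired t-test within the trained group.
t_change, p_change = stats.ttest_rel(trained_post, trained_pre)

# 2. Is the change due to training? Compare gain scores across groups.
gain_trained = trained_post - trained_pre
gain_control = control_post - control_pre
t_vs_ctrl, p_vs_ctrl = stats.ttest_ind(gain_trained, gain_control)

print(f"Pre-to-post change in trained group: t={t_change:.2f}, p={p_change:.4f}")
print(f"Trained vs. control gain:            t={t_vs_ctrl:.2f}, p={p_vs_ctrl:.4f}")
```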

7. BEHAVIOR
• ADVANTAGES:
  • Measures transfer of training to the job setting.
  • Makes a stronger case for the effectiveness of training.
  • Good feedback for future needs assessments and redesign.
• DISADVANTAGES:
  • Need to know what constitutes successful job performance; depends on a good task analysis in the needs assessment.
  • Can't control when or whether trainees will have a chance to use the new skills (opportunity bias).
• USE:
  • Involve trainees, supervisors, peers, and subordinates in both pre- and post-training performance appraisals.
  • Take before/after measures of job performance.
  • Collect post-training measures 3 months or more after training (lag effect).
  • Compare to a control group that did not receive training.

8. RESULTS
• ADVANTAGES:
  • Supports cost-benefit analysis: relates training to costs, turnover, production, and the bottom line.
  • Provides information for utility analyses (see the sketch below):
    • Does training decrease costs associated with poor selection?
    • How does formal training compare with on-the-job training?
• DISADVANTAGES:
  • Outcomes are multi-determined, so it is difficult to show a relation between training and outcomes.
    • Positive relationship: is it really due to training?
    • No relationship: was it too much to expect?
    • Negative relationship: ???
• USE:
  • Together with criteria from the other levels.
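
A hedged sketch of what a results-level utility analysis can look like, using the common textbook formulation delta_U = T × N × d × SD_y − N × C (a Brogden-Cronbach-Gleser-style estimate adapted for training). The slides do not prescribe this formula, and every number below is hypothetical.

```python
"""Hedged sketch of a training utility analysis; all inputs are hypothetical."""

def training_utility(n_trained: int, years_benefit: float, effect_size_d: float,
                     sd_performance_dollars: float, cost_per_trainee: float) -> float:
    """Estimated dollar gain (or loss) from the training program."""
    benefit = years_benefit * n_trained * effect_size_d * sd_performance_dollars
    cost = n_trained * cost_per_trainee
    return benefit - cost

# Hypothetical inputs: 50 trainees, 2-year payoff period, effect size d = 0.40,
# SD of job performance in dollars = $10,000, cost = $1,500 per trainee.
print(f"Estimated utility: ${training_utility(50, 2.0, 0.40, 10_000, 1_500):,.0f}")
```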

9. OTHER CRITERIA CLASSIFICATIONS
• OUTCOME (learning, behavior, results) vs. PROCESS (reaction).
• OBJECTIVE vs. SUBJECTIVE.
• FORMATIVE (evaluate the training process/reaction) vs. SUMMATIVE (evaluate trainee change: learning, behavior, results).
• TIME:
  • Immediate (taken during training, e.g., mid-term evaluations).
  • Proximal (during advanced training or shortly after training is over).
  • Distal (taken a considerable time after training: transfer).
• NORM-REFERENCED vs. CRITERION-REFERENCED:
  • Norm-referenced: graded on a curve.
  • Criterion-referenced: an absolute threshold must be met.

10. GUIDELINES FOR CRITERION DEVELOPMENT
• Use MULTIPLE CRITERIA (reaction, learning, behavior, and results).
  • Different levels give different information.
  • Look for agreement/disagreement among levels.
• Derive criteria from the LEARNING OBJECTIVES and the NEEDS ASSESSMENT.
• Ensure CRITERION RELEVANCY and RELIABILITY.
• Use CRITERION-REFERENCED measures for critical outcomes (e.g., how to fly a plane or drive a car).
• Use both LONG- and SHORT-TERM measures.

11. EVALUATION (DAY 2)
• OVERVIEW:
  • Internal and External Validity
  • Threats to Validity
  • Research Designs

12. EVALUATION QUESTIONS
• Based on the criteria, has change occurred?
• Is the change due to training? (internal validity)
• Will the change occur for new trainees in the same organization? (external validity: intra-organizational validity)
• Will the change occur for new trainees in other organizations? (external validity: inter-organizational validity)

13. INTERNAL VALIDITY
• The ability to say "A causes B."
  • A = independent variable, predictor, training; B = dependent variable, criterion, performance.
• Causality vs. correlation.
• Need to show two things:
  1. Change has occurred.
  2. Alternative explanations for the change have been ruled out:
     • Control the alternatives (control group).
     • Ensure equivalency of the two groups (random assignment).

14. Examples (X = training; P = performance measure; R = random assignment)
• X  P — Has change occurred?
• P1  X  P2 — Change has occurred, but is it due to training?
• P1  X  P2
  P1      P2 — Adding a control group: is the change due to the treatment or to group differences?
• R  P1  X  P2
  R  P1      P2 — Random assignment to the two groups rules out group differences and also addresses external validity (see the sketch below).
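
A minimal sketch of the "R" step in the last design: randomly assigning people to the training and control groups before P1. The employee names and group size are hypothetical.

```python
"""Hedged sketch of random assignment for the R P1 X P2 / R P1 P2 design."""
import random

employees = [f"employee_{i:02d}" for i in range(1, 21)]   # hypothetical roster
random.seed(7)
random.shuffle(employees)                  # random assignment
midpoint = len(employees) // 2
training_group = employees[:midpoint]      # receives X between P1 and P2
control_group = employees[midpoint:]       # measured at P1 and P2 only

print("Training:", training_group)
print("Control: ", control_group)
```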

15. THREATS TO INTERNAL VALIDITY
• HISTORY:
  • Events occurring between the pretest and posttest affect posttest scores.
  • Controlled with a control group.
• MATURATION:
  • Participants change (get older, fatigued, more or less interested in training) between the pretest and posttest.
  • Controlled with a control group and randomization.
• INSTRUMENTATION:
  • Changes in grading standards, the rater, or the instrument.
  • Controlled with a control group.

16. Threats (cont'd)
• TESTING:
  • Pretest sensitization: the pretest itself affects posttest scores.
  • A threat to both internal and external validity.
  • Controlled by using the Solomon 4-group design (see the sketch below):
    1. R  P1  X  P2
    2. R  P1      P2
    3. R      X  P
    4. R          P
  • If the pretest has no effect, group 1 = group 3 and group 2 = group 4.
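
One common way to check the "1 = 3; 2 = 4" comparison is a 2×2 analysis of posttest scores with trained (yes/no) and pretested (yes/no) as factors; that choice of analysis is an assumption here, and the data below are simulated.

```python
"""Hedged sketch of analyzing a Solomon 4-group design: 2x2 ANOVA on
posttest scores, factors = trained (yes/no) x pretested (yes/no)."""
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 25  # per cell (hypothetical)
rows = []
for trained in (0, 1):
    for pretested in (0, 1):
        post = 50 + 10 * trained + rng.normal(0, 8, n)   # simulate: no pretest effect
        rows.append(pd.DataFrame({"post": post, "trained": trained,
                                  "pretested": pretested}))
data = pd.concat(rows, ignore_index=True)

model = smf.ols("post ~ C(trained) * C(pretested)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # pretest main effect & interaction should be near zero
```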

17. Threats (cont'd)
• STATISTICAL REGRESSION:
  • A statistical artifact that occurs when trainees are selected on the basis of extreme scores: error variance in the instrument causes scores to regress toward the mean on the posttest (see the simulation below).
  • Controlled with random assignment to the two groups.
• SELECTION:
  • Differences in characteristics between the two groups (e.g., women more likely than men to take the training; apparent effectiveness due to gender, or to a gender × treatment interaction, rather than to training).
  • Controlled with randomization.
• SELECTION × MATURATION INTERACTION:
  • People who volunteer for interpersonal communication training may be at a different level of maturation: they are ready for the training.
  • Controlled with control groups and randomization.
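
A small simulation of the regression artifact described above: even with no training effect at all, a group selected for extreme (low) pretest scores moves back toward the mean at posttest. All numbers are illustrative.

```python
"""Hedged simulation of statistical regression toward the mean."""
import numpy as np

rng = np.random.default_rng(3)
true_skill = rng.normal(50, 10, 1000)
pretest  = true_skill + rng.normal(0, 5, 1000)   # score = skill + measurement error
posttest = true_skill + rng.normal(0, 5, 1000)   # no training effect whatsoever

selected = pretest < np.percentile(pretest, 10)  # "send the worst 10% to training"
print(f"Selected group pretest mean:  {pretest[selected].mean():.1f}")
print(f"Selected group posttest mean: {posttest[selected].mean():.1f}  (regresses upward)")
```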

18. Threats (cont'd)
• MORTALITY:
  • Differential dropout from the training and control groups (due to underlying traits, demographics, needs).
  • Controlled with random assignment: the groups are equally likely to have those traits and demographics to start with.
• DIFFUSION / IMITATION OF TREATMENT:
  • The treatment is revealed to the control group, and its performance increases.
• COMPENSATORY EQUALIZATION OF TREATMENT:
  • Others help the control group.
• COMPENSATORY RIVALRY:
  • The control group tries to catch up.
• RESENTFUL DEMORALIZATION:
  • The control group's performance decreases.

19. EXTERNAL VALIDITY
• GENERALIZABILITY OF THE TREATMENT.
• Will the training work with different:
  • People?
  • Places?
  • Times?
• Internal validity is a prerequisite for external validity: you need to show that the training "works" before being concerned with its generalizability.

20. THREATS TO EXTERNAL VALIDITY
• REACTIVE EFFECTS:
  • Novelty and Hawthorne effects present early in training but absent later on.
• PRETEST SENSITIZATION:
  • The pretest changes how trainees respond to training; can be checked by using the pretest in one group but not in another.
• INTERACTION OF SELECTION AND TREATMENT:
  • Example: younger trainees do well with computers; the training may be less effective with older trainees.
• MULTIPLE-TREATMENT INTERFERENCE:
  • The combination of training techniques was critical, but the same combination is not used in future training.

21. Threats to External Validity (cont'd)
• RANDOM SAMPLING reduces threats to external validity.
• INTRA-ORGANIZATIONAL VALIDITY:
  • Will the training work again in our own organization?
  • Increase it by re-checking the needs assessment and by conducting effective evaluations.
• INTER-ORGANIZATIONAL VALIDITY:
  • Will the training work in other organizations?
  • Similarity of organizations and audiences are key factors.
  • The needs assessment is crucial.

22. RESEARCH DESIGNS: PREEXPERIMENTAL DESIGNS
• Lack control groups, so they cannot show causality.
• One-group posttest-only: X  P
  • Cannot tell whether change has occurred.
  • Use only if it is the only option.
• One-group pretest/posttest: P1  X  P2
  • Can assess whether change occurred, but cannot tell whether it is due to training.
  • Use if a control group cannot be obtained.

23. EXPERIMENTAL DESIGNS
• Use RANDOMIZATION AND CONTROL GROUPS.
• PRETEST/POSTTEST CONTROL GROUP DESIGN (see the analysis sketch below):
    R  P1  X  P2
    R  P1      P2
  • Rigorously controls for many threats to internal validity, but not external validity: what about pretest sensitization, diffusion of treatment, compensatory rivalry, etc.?
  • Ethical issues in using control groups; practical issues in random assignment and control groups.
• SOLOMON 4-GROUP DESIGN:
  • Includes a test for pretest sensitization.
  • Is it feasible to run four randomly assigned groups?
  • Can be used when training an entire organization.
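
The slides do not prescribe a particular statistical analysis for the pretest/posttest control group design; one common option is ANCOVA, regressing the posttest on group while controlling for the pretest. The sketch below assumes that choice and uses simulated data.

```python
"""Hedged sketch: ANCOVA for the R P1 X P2 / R P1 P2 design (simulated data)."""
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 40  # per group (hypothetical)
pre = rng.normal(60, 10, 2 * n)
group = np.repeat(["training", "control"], n)            # after random assignment
post = pre + np.where(group == "training", 8, 0) + rng.normal(0, 6, 2 * n)

df = pd.DataFrame({"pre": pre, "post": post, "group": group})
model = smf.ols("post ~ pre + C(group)", data=df).fit()
print(model.params)   # coefficient for C(group)[T.training] estimates the training effect
```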

24. QUASI-EXPERIMENTAL DESIGNS
• Use intact groups; not randomized.
• NONEQUIVALENT CONTROL GROUP DESIGN:
    P1  X  P2
    P1      P2
  • Susceptible to the selection threat and its interactions, and to pretest sensitization.
• TIME SERIES DESIGN (see the sketch below):
    P1  P2  P3  P4  X  P5  P6  P7  P8
  • Better than the one-group pretest/posttest design; can check for maturation effects, but cannot rule out history effects.
• MULTIPLE TIME SERIES DESIGN:
  • Add a control group to the design above.
• The best design uses random assignment; weigh its feasibility.
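
A minimal sketch of checking the time series design (P1...P4 X P5...P8) for a level shift after training; the slides do not specify an analysis, so the simple pre/post comparison below is an assumption, and the performance numbers are simulated.

```python
"""Hedged sketch of an interrupted time-series check (simulated data)."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = 70 + rng.normal(0, 3, 4)   # P1-P4: monthly performance before training
after  = 78 + rng.normal(0, 3, 4)   # P5-P8: monthly performance after training

t, p = stats.ttest_ind(after, before)
print(f"Post- vs. pre-training level shift: t={t:.2f}, p={p:.3f}")
# With so few points, also plot the series and look for a pre-existing trend
# (maturation) that could mimic a training effect; a multiple time series
# design adds a control series to help rule out history effects.
```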

25. SUMMARY
• USE THE MOST RIGOROUS DESIGN POSSIBLE.
• ALL DESIGNS HAVE SOME LIMITATIONS.
• ACKNOWLEDGE THREATS TO:
  • INTERNAL VALIDITY
  • EXTERNAL VALIDITY
• EVALUATION IS ONLY AS GOOD AS THE CRITERIA USED.

26. Exercise (Noe, 1999)
• Consider this course as a training program. Identify:
  1. The types of outcomes (criteria) you would use in evaluating this course.
  2. The evaluation design you would use.
• Justify your choice of design based on minimizing threats to validity and on practical considerations.
