Developing a Comprehensive Plan for Evaluating Teaching Effectiveness


Presentation Transcript


  1. Developing a Comprehensive Plan for Evaluating Teaching Effectiveness One Department’s Experience

  2. Defining the Territory How can we distinguish “the best” teachers in the department from “the average” teachers in the department? What information about the quality of teaching in the department should come from students? What other sources of evaluation should be used and how should they be weighted?

  3. Distinguishing “best” from “average” • Students are enthusiastic about the class • Shows creativity in teaching techniques and methods of evaluation • Students report long-range effects of learning in the class • Is able to teach students at different levels/different intellects using a range of teaching activities • Makes goals of course clear to students, both overall and week by week

  4. What should students tell us about our teaching? Was the instructor prepared? Did the class match the syllabus? Did evaluations cover what was read and discussed in class? Was the instructor available to students? Was the instructor organized? Did the instructor explain the material clearly for students at all levels? Did students learn in the class?

  5. How much weight should be assigned to various factors? Peer evaluation should count for 25%-40% of the overall evaluation (range 0%-66%). Self-evaluation should count for 15%-20% (range 10%-33%). Student evaluations should count for 30%-50% (range 10%-66%). Other factors to consider: number of courses, number of students, and overall impact on the teaching mission of the department.
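
As an illustration of the weighting question above, here is a minimal sketch, not from the original slides, of how a weighted composite teaching score might be computed. The component scores and the particular weights are hypothetical; the weights are simply chosen from within the ranges proposed on the slide, and the component scores are assumed to be pre-normalized to a 0-100 scale.

```python
# Hypothetical weighted composite of teaching-evaluation components.
# Weights fall within the ranges proposed on the slide (peer 25-40%,
# self 15-20%, student 30-50%); any remainder goes to "other factors."

weights = {"peer": 0.35, "self": 0.175, "student": 0.40}
other_weight = 1.0 - sum(weights.values())   # residual weight for other factors

scores = {"peer": 82.0, "self": 90.0, "student": 76.0, "other": 85.0}

composite = (sum(weights[k] * scores[k] for k in weights)
             + other_weight * scores["other"])
print(f"Composite teaching score: {composite:.1f} / 100")
```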

  6. The Peer Evaluation Process

  7. Results of Previous Research Feldman, K.A. (1992). Instructional effectiveness of college teachers as judged by teachers themselves, current and former students, colleagues, administrators and external (neutral) observers. Research in Higher Education, 33 (3), 317-375. Root, L.S. (1987). Faculty evaluation: Reliability of peer assessments of research, teaching, and service. Research in Higher Education, 26, 71-84. Chism, N. (1998). Peer Review of Teaching. Bolton, MA: Anker Publishing.

  8. Summary • When peer evaluations of teaching are based solely on classroom observation, only slight interrater agreement can be expected. • Peer evaluators should concentrate on accepted teaching effectiveness criteria (see Cohen, P.A., & McKeachie, W. J. (1980). The role of colleagues in the evaluation of college teaching. Improving College and University Teaching, 28, 147-154.) • There is little or no research on the reliability or validity of peer evaluations of documentary evidence of effective teaching.

  9. What have we done with the peer evaluation piece? • 1. We are focusing on a form of “portfolio evaluation” in which outside evaluators will be asked to review and evaluate the content and organization of classes. They will comment on the currency of information, completeness of coverage, and appropriateness of content (given the level of the course and the state of knowledge in the field).

  10. Peer Evaluation, cont. • 2. Each instructor will also videotape a representative class (instructor selected), focusing the camera not only on the instructor but also on the students, to get a sense of how students respond to what the instructor is doing and how the instructor responds to the students (recognizing confusion, boredom, etc.).

  11. Issues to be resolved about peer evaluation 1. How often should we do this? Portfolio compilation is time consuming, and outside evaluators will have to be compensated. Should non-tenured and tenured faculty be on different schedules? 2. How frequently should we do the videotaped evaluation? 3. Can we develop checklists to use for peer evaluation, so everyone is evaluated using the same items?

  12. The Student Evaluation Process

  13. Results of Previous Research • 1. Are student evaluations reliable? Are they valid? • Marsh, H.W. (1987). Student evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388. • Marsh, H.W. & Roche, L.A. (1997). Making students’ evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52 (11), 1187-97.

  14. Summary • Intraclass correlation coefficients suggest that reliability increases as the number of students increases, with 10 students being the minimum for acceptable reliability. • If student evaluations are being used for administrative purposes, it is suggested that ratings from at least five courses, each with at least 15 students, be used for each instructor. If an instructor teaches different types of courses (graduate vs. undergraduate), several ratings of each course type should be included.
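
The reliability claim above rests on intraclass correlations across the students who rate the same course. A minimal sketch of how such coefficients can be estimated appears below; the ratings are simulated, and the formulas are the standard one-way random-effects ICCs computed from ANOVA mean squares, not the specific procedure used in the studies cited.

```python
import numpy as np

# Hypothetical ratings: rows = courses, columns = students (raters),
# each cell an overall 1-5 rating. A balanced design keeps the math simple.
rng = np.random.default_rng(0)
course_means = rng.uniform(3.0, 4.5, size=8)                # 8 courses
ratings = rng.normal(course_means[:, None], 0.6, (8, 15))   # 15 students each

n, k = ratings.shape
grand = ratings.mean()
ms_between = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
ms_within = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))

icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)  # ICC(1)
icc_mean = (ms_between - ms_within) / ms_between                            # ICC(1,k)
print(f"ICC for a single rater: {icc_single:.2f}; ICC for the class mean: {icc_mean:.2f}")
```

The class-mean ICC rises as the number of raters per course grows, which is why a minimum class size is recommended before ratings are treated as reliable.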

  15. More research . . . • Is there a relationship between course evaluation and student learning? • Cohen, P.A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51, 281-309. • Abrami, P.C., d’Apollonia, S., & Cohen, P.A. (1990). The validity of student ratings of instruction: What we know and what we don’t. Journal of Educational Psychology, 82, 219-231. • Wilson, R. (1998). New research casts doubt on value of student evaluations of professors. Chronicle of Higher Education, 44 (19), A12.

  16. Summary • Generally, student course evaluations are positively correlated with student learning as measured by multiple choice tests. Correlations range from .2 to .7, depending on design issues (such as number of students in the study). We need more data to determine whether correlations exist with learning measured in other ways.
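
A minimal sketch of the multisection-validity logic behind those correlations: within a multisection course, each section's mean rating is paired with its mean achievement score and the two are correlated. The section-level numbers below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical multisection design: each value is one section's mean.
mean_rating = np.array([3.2, 3.8, 4.1, 3.5, 4.4, 3.9, 4.0, 3.6])   # 1-5 scale
mean_exam   = np.array([71., 78., 84., 74., 88., 80., 79., 76.])   # % correct

r, p = stats.pearsonr(mean_rating, mean_exam)
print(f"Section-level rating/achievement correlation: r = {r:.2f} (p = {p:.3f})")
```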

  17. Are student evaluations biased? If so, how? • “Bias” would exist if a circumstance having nothing to do with a teacher’s effectiveness influenced students’ ratings of that teacher. For example, if students consistently gave better evaluations to teachers with blond hair, that would be a biased evaluation. Faculty often cite the following as possible sources of bias: course difficulty, grading leniency, instructor popularity, course work load, class size, student GPA.

  18. Relevant Research • Feldman, K.A. (1988). Effective college teaching from the students’ and faculty’s view: Matched or mismatched priorities? Research in Higher Education, 28, 291-344. • Educational Testing Service (1990). Interpretive Guide and Comparative Data, Student Instructional Report. Princeton, N.J.: ETS. • Abrami, P., Dickens, W., Perry, R., & Leventhal, L. (1980). Do teacher standards for assigning grades affect student evaluations of instruction? Journal of Educational Psychology, 72, 107-118.

  19. Summary • Class size matters. Small discussion classes and large enrollment classes with discussion sections tend to get the best evaluations. • Discipline matters. Math and science classes get lower scores for faculty-student interaction and for course difficulty and work load, but not for course organization and planning or for tests and exams. These differences, however, DO NOT affect all teachers. Some science teachers get very high ratings, some humanities teachers get low ones.

  20. Summary, cont. • Grading leniency does NOT seem to affect student ratings. Students usually don’t know their grades prior to filling out the rating scale, and students appear to be equally, if not more, influenced by their perception of how much they learned in the class. Administrators concerned that a teacher’s high evaluations resulted from artificially inflated grades should compare that teacher’s grade distributions over time with those of the department as a whole.
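
A minimal sketch of the suggested check, using hypothetical grade counts: compare one teacher's grade distribution with the department's using a chi-square test of homogeneity, then inspect the proportions side by side.

```python
import numpy as np
from scipy import stats

# Hypothetical counts of A/B/C/D-F grades for one teacher vs. the rest
# of the department over the same period.
grades = ["A", "B", "C", "D/F"]
teacher = np.array([45, 30, 15, 5])
department = np.array([520, 610, 340, 120])

chi2, p, dof, _ = stats.chi2_contingency(np.vstack([teacher, department]))
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.3f}")
for g, t, d in zip(grades, teacher / teacher.sum(), department / department.sum()):
    print(f"{g}: teacher {t:.0%} vs. department {d:.0%}")
```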

  21. What can you discover if you develop your own evaluation instrument?

  22. Step 1: Developing an Instrument • Based on faculty responses to our initial questionnaire, and on interviews with several concerned faculty members, we developed a draft of a department-specific evaluation form, reflecting what WE believed students could and should tell us about our teaching.

  23. Step 2: Using the SPHS Instrument along with BEST • For 4 semesters, teachers in the department used both the SPHS form and the BEST form to collect course evaluations. Students were told that we were attempting to develop a new evaluation form. The two forms were given at the same time, with the instructor out of the room while students filled them out.

  24. Step 3: Analyzing the Data • Several questions were asked: • 1. What is the underlying factor structure of the SPHS form? • 2. Are faculty rankings using SPHS and BEST forms correlated? • 3. Can rankings of faculty using the two forms be predicted by some subset of the items?

  25. 1. The Underlying Factor Structure • Three factors account for 92% of variance: • Factor 1: Effectiveness of class for student learning • Factor 2: Quality of teaching methods and exams • Factor 3: Instructor focus on student learning. • Split-half reliability assessment showed the factor structure to be stable.
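
A minimal sketch of the kind of analysis behind a factor structure like this one. The item responses below are simulated, and principal components are used only as a stand-in for whatever factor-extraction method the department actually applied; the point is simply to show how "variance accounted for" by a small number of factors is obtained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Simulated stand-in for 300 students x 11 evaluation items, generated
# from three underlying latent dimensions plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 3))
loadings = rng.uniform(0.5, 1.0, size=(3, 11))
items = latent @ loadings + rng.normal(scale=0.4, size=(300, 11))

pca = PCA(n_components=3).fit(StandardScaler().fit_transform(items))
print("Variance explained by 3 components:",
      f"{pca.explained_variance_ratio_.sum():.0%}")
```

A split-half check of the kind mentioned on the slide would repeat the extraction on two random halves of the respondents and compare the resulting loadings.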

  26. Factor 1: Effectiveness of Class for Student Learning • Course readings were of assistance in learning subject matter (1=always; 5=never) • Assignments reflected material emphasized in the course (1=always; 5=never) • Instructor cleared up points of confusion for me (1=always; 5=never) • I attended this class (A=always; B=usually; C=sometimes; D=rarely; E=never)

  27. Factor 2: Quality of Teaching Methods & Exams • The instructor used a variety of teaching methods that were appropriate & helpful. (1=strongly agree; 5=strongly disagree) • Quality of assignments/examinations given in class (1=lowest, 5=highest) • Overall quality of this instructor relative to other SPHS instructors (1=lowest; 5=highest)

  28. Factor 3: Instructor Focus on Student Learning • The instructor explained the material clearly. (1=always; 5=never) • The instructor stimulated useful class participation. (1=always; 5=never) • The instructor was interested in helping students learn. (1=strongly agree; 5=strongly disagree) • The instructor and/or AI were available to students outside of class. (1=strongly agree; 5=strongly disagree)

  29. BEST Factor Structure • 3 factors account for 58% of variance. Factor 1 accounts for 46.9% of variance. • I. Instructor behaviors (my instructor explains material clearly; My instructor is well prepared for class meetings) • II. Exams and grading (grading procedures are fair; Exams cover most important aspects of course) • III. Instructor/student interaction (instructor recognizes when students fail to comprehend; Instructor makes me feel free to ask questions)

  30. 2. What are the rankings like using the two forms? • 10 required undergraduate and 6 required graduate courses were ranked by the average total rating assigned (averaged across rating scales and students) on both the SPHS and BEST forms. These courses were chosen because they all had enrollments greater than 30.
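
A minimal sketch of how the course rankings from the two forms can be compared, including a Spearman rank correlation of the kind implied by question 2 on slide 24. The ratings below are hypothetical placeholders, not the department's actual data.

```python
import pandas as pd
from scipy import stats

# Hypothetical mean total ratings for the same ten courses on each form.
ratings = pd.DataFrame({
    "course": [f"C{i}" for i in range(1, 11)],
    "sphs":   [4.4, 4.1, 3.9, 4.3, 3.6, 4.0, 3.8, 4.2, 3.5, 3.7],
    "best":   [4.5, 4.0, 3.8, 4.4, 3.7, 4.1, 3.6, 4.2, 3.4, 3.9],
})

# Rank courses on each form (1 = highest rated) and correlate the rankings.
ratings["sphs_rank"] = ratings["sphs"].rank(ascending=False)
ratings["best_rank"] = ratings["best"].rank(ascending=False)
rho, p = stats.spearmanr(ratings["sphs"], ratings["best"])
print(ratings.sort_values("sphs_rank").to_string(index=False))
print(f"Spearman rho between forms: {rho:.2f} (p = {p:.3f})")
```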

  31. BEST & SPHS UG Rankings

  32. BEST & SPHS Grad Rankings

  33. What is notable about these rankings? The spread of ratings from the best-rated to the worst-rated course is not great: 1.3 points separate the best from the worst graduate course, and 1.13 points separate the best from the worst undergraduate course. Undergraduate ratings are compressed at the upper end; the best-rated course is separated from the #5 course by only 0.24 points. Do we have “rating inflation” at work here?

  34. Are the ratings skewed?

  35. 3. Predicting the Rankings • 46% of the variance in the SPHS rankings can be accounted for by one item (#27): • Overall, in this course I have learned ____ in most other courses. • A. Much more than • B. More than • C. About the same as • D. Less than • E. Much less than

  36. Predicting SPHS rankings, cont. • Item 27 is highly correlated with: • Overall quality of this instructor relative to other instructors at IU. (1=lowest; 5=highest) • Adding two additional items brings the total variance accounted for in the rankings to 81%: • Overall quality of this course relative to other courses taken at IU • Overall quality of this instructor relative to other SPHS instructors

  37. Predicting BEST Rankings • One item (#5) accounts for 73% of total variance in the rankings: • My instructor explains the material clearly. • Adding item #2 (Overall, I would rate this instructor as outstanding) brings total variance accounted for to 76%. • No other individual items appreciably increase the % of variance accounted for.
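
A minimal sketch, on simulated data, of the kind of regression behind these "variance accounted for" figures: regress the overall course rating on one item, then on that item plus a second, and compare the R-squared values. The item names in the comments are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated course-level data: two item means and an overall rating
# that depends mostly on the first item.
rng = np.random.default_rng(2)
item_a = rng.uniform(2.5, 4.8, 40)               # e.g. "explains material clearly"
item_b = 0.5 * item_a + rng.normal(0, 0.3, 40)   # a correlated second item
overall = 0.8 * item_a + 0.1 * item_b + rng.normal(0, 0.2, 40)

r2_one = LinearRegression().fit(item_a[:, None], overall).score(item_a[:, None], overall)
X_two = np.column_stack([item_a, item_b])
r2_two = LinearRegression().fit(X_two, overall).score(X_two, overall)
print(f"R^2 with one item: {r2_one:.0%}; with two items: {r2_two:.0%}")
```

When the second item is highly correlated with the first, adding it raises R-squared only modestly, which matches the pattern reported for the BEST rankings.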

  38. Other Sources of Input about Teaching • Exit interviews with graduating seniors and graduate students (conducted by emeritus faculty & undergrad advisor) • Alumni surveys (conducted by mail) • Core content area exam given to graduating seniors (a 70-item multiple choice test made up by the faculty)

  39. Exit Interview Data • Among other questions, students were asked to indicate which three courses in the major had the most value and which three had the least value. The courses valued most were also in the top 5 in the student evaluations, and the courses valued least were in the bottom 5.

  40. Core Content Examination • 74% of graduating seniors took the exam. • Analysis of the results shows the areas of the curriculum in which students did best and worst. These areas correspond roughly to the courses students rate highest and lowest on student evaluations. • A breakdown of performance by student GPA showed that students with the highest GPAs performed better.

  41. Alumni Surveys • Graduate alumni (MA-degree holders in clinical practice) are surveyed periodically by mail. Among other questions, they are asked to indicate which courses had the most impact on their current clinical practice, and which faculty members had the greatest impact on their intellectual development.

  42. What do we know now? We can develop a reliable and valid instrument for student evaluation of courses. It is possible to achieve some parsimony for administrative purposes. Our students appear to be focused on how much and how well they learn in our courses. We appear to be making distinctions between “good” and “excellent” rather than between “poor” and “good.”

  43. What did we learn from the process? • We didn’t begin it with much consensus about how (or even whether) teaching should be evaluated. • The struggle for that consensus was valuable in itself. • We have a good sense of one component of a total evaluation, but we still have work to do.
