
The Value in Evaluation


Presentation Transcript


  1. The Value in Evaluation Erica Friedman Assistant Dean of Evaluation MSSM

  2. [Diagram: pyramid relating assessment levels (Does, Shows how, Knows how, Knows) to assessment methods (longitudinal clinical observation, OSCE, SP, practical, oral, application, essay, note review, MEQ, MCQ), with written vs. observed formats and formative vs. summative uses]

  3. Session Goals • Understand the purposes of assessment • Understand the framework for selecting and developing assessment methods • Recognize the benefits and limitations of different methods of assessment Conference Objectives • Review the goals and objectives for your course or clerkship in the context of assessment • Identify the best methods of assessing your goals and objectives

  4. Purpose of Evaluation • To certify individual competence • To assure successful completion of goals/objectives • To provide feedback • To students • To faculty, course and clerkship directors • As a statement of values (what is most critical to learn) • For Program Evaluation- evaluation of an aggregate, not an individual (ex. average ability of students to perform a focused history and physical)

  5. Consequences of evaluation • Steering effect- exams “drive the learning”- students study/learn for the exam • Impetus for change (feedback from students, Executive Curriculum, LCME)

  6. Definitions- Reliability • The consistency of a measurement over time or by different observers (ex. a thermometer always reads 98 degrees C when placed in boiling, distilled water at sea level) • The proportion of variability in a score due to the true difference between subjects (ex. the difference between Greenwich time and the time on your watch) • Inter-rater reliability (correlation between scores of 2 raters) • Internal reliability (correlation between items within an exam)
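
A minimal sketch (with made-up scores, not from the slides) of how the two reliability measures above are typically computed: inter-rater reliability as the correlation between two raters' scores, and internal reliability as Cronbach's alpha across exam items.

```python
import numpy as np

# Inter-rater reliability: correlation between two raters scoring the
# same ten students (hypothetical data for illustration).
rater_a = np.array([7, 8, 6, 9, 5, 7, 8, 6, 9, 7])
rater_b = np.array([6, 8, 7, 9, 5, 6, 8, 5, 9, 8])
inter_rater_r = np.corrcoef(rater_a, rater_b)[0, 1]

# Internal reliability (Cronbach's alpha): consistency among items within
# one exam. Rows = students, columns = item scores (also hypothetical).
items = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                         / items.sum(axis=1).var(ddof=1))

print(f"inter-rater r = {inter_rater_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```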

  7. Definitions-Validity • The ability to measure what was intended (the thermometer reading is reliable but not valid) • Four types- • Face/content • Criterion • Construct/predictive • Internal

  8. Types of validity • Face/content- Would experts agree that it assesses what’s important?-(driver’s test mirroring actual driving situation and conditions) • Criterion- draw an inference from test scores to actual performance. Ex. if a simulated driver’s test score predicts the road test score, the simulation test is claimed to have a high degree of criterion validity. • Construct/predictive- does it assess what it intended to assess (ex. Driver’s test as a predictor of the likelihood of accidents- results of your course exam predict the student’s performance on that section of Step 1) • Internal- do other methods assessing the same domain obtain similar results (similar scores from multiple SPs assessing history taking skills)

  9. Types of Evaluations- Formative and Summative Definitions: • Formative evaluation- provide feedback so the learner can modify their learning approach- “When the chef tastes the sauce, that’s formative evaluation” • Summative evaluation- done to decide if a student has met the minimum course requirements (pass or fail)- usually judged against normative standards- “when the customer tastes the sauce, that’s summative evaluation”

  10. Conclusions about formative assessments • Stakes are lower (not determining passing or failing, so lower reliability is tolerated) • Desire more information, so they may require multiple modalities (it is rare for one assessment method to identify all critical domains) for validity and reliability • Use evaluation methods that support and reinforce teaching modalities and steer students’ learning • May only identify deficiencies but not define how to remediate

  11. Conclusions about summative assessments • Stakes are higher- students may pass who are incompetent or may fail and require remediation • Desire high reliability (>0.8) so often require multiple questions/problems or cases (20-30 stations/OSCE, 15-20 cases for oral presentations, 700 questions for an MCQ) • Desire high content validity (single cases have low content validity and are not representative) • Desire high predictive validity (correlation with future performance), which is often hard to achieve • Consider reliability, validity, benefit and cost (resources, time and $) in determining the best assessment tools
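
The need for many stations or cases can be made concrete with the Spearman-Brown prophecy formula (standard psychometrics, not cited on the slide). The sketch below assumes a hypothetical single-station reliability of 0.15 and estimates how many stations are needed to cross the 0.8 threshold mentioned above.

```python
def spearman_brown(r1: float, n: int) -> float:
    """Predicted reliability when a test with reliability r1 is lengthened n-fold."""
    return n * r1 / (1 + (n - 1) * r1)

r1 = 0.15  # assumed reliability of a single station or case
n = 1
while spearman_brown(r1, n) < 0.8:
    n += 1
print(n, round(spearman_brown(r1, n), 2))  # about 23 stations, consistent with the 20-30 cited
```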

  12. How to Match Assessment to Goals and Teaching Methods • Define the type of learning (lecture, small group, computer module/self study, etc) • Define the domain to be assessed (knowledge, skill, behavior) and the level of performance expected (knows, knows how, shows how or does) • Determine the type of feedback required

  13. Purpose of feedback • For students: To provide a good platform to support and enhance student learning • For faculty: To determine what works (what facilitated learning and who were appropriate role models) • For students and faculty: To determine areas that require improvement

  14. Types of Feedback • Quantitative • Total score compared to other students, providing the high, low and mean score and minimum requirement for passing grade • Qualitative • Written personal feedback identifying areas of strength and weakness • Oral feedback one on one or in a group to discuss the areas of deficiency to help guide further learning
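
A minimal sketch of the quantitative feedback described above, using invented class scores and an invented passing mark: the report gives the student's score alongside the class high, low and mean and the minimum required to pass.

```python
from statistics import mean

class_scores = [62, 71, 74, 78, 80, 83, 85, 88, 90, 95]  # hypothetical class results
pass_mark = 70                                            # hypothetical passing minimum

def score_report(student_score: int) -> str:
    """Quantitative feedback: the student's score in the context of the class."""
    outcome = "pass" if student_score >= pass_mark else "fail"
    return (f"Your score: {student_score} "
            f"(class high {max(class_scores)}, low {min(class_scores)}, "
            f"mean {mean(class_scores):.1f}; {pass_mark} required to pass): {outcome}")

print(score_report(74))
```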

  15. Evaluation Bias-Pitfall • Can occur with any evaluation requiring interpretation by an individual (all methods other than MCQ) • Expectation bias (halo effect)- prior knowledge or expectation of the outcome influences the ratings (especially a global rating) • Audience effect- a learner’s performance is influenced by the presence of an observer (seen especially with skills and behaviors) • Rater traits- the training of the rater or the rater’s traits affect the reliability of the observation

  16. Types of assessment tools-Written • Does not require an evaluator to be present during the assessment and can be open or closed book • Multiple choice question (MCQ) • Modified short answer essay question (MEQ)- Patient management problem is a variation of this • Essay • Application test • Medical note/chart review

  17. Types of assessment tools-Observer Dependent Interaction • Usually requires active involvement of an assessor and occurs as a single event • Practical • Medical record review • Standardized Patient(s) (SP) • Objective Structured Clinical Examination (OSCE) • Oral examination- chart stimulated recall; triple jump or direct observation

  18. Types of assessment tools- Observer Dependent Longitudinal Interaction • Continual evaluation over time • Preceptor evaluation – either completion of a critical incident report or a structured rating form based on direct observation over time • Peer evaluation • Self evaluation

  19. MCQ • Definition: A test composed of questions on which each stem is followed by several alternative answers. The examinee must select the most correct answer. • Measures: Knows and Knows how • Pros: Efficient; cheap; samples large content domain (60 questions/hour); high reliability; easy objective scoring, direct correlate of knowledge with expertise • Cons: Often a recall of facts; provides opportunity for guessing (good test-taker); unrealistic; doesn’t provide information about the thought process; encourages learning to recall • Suggestions: Create questions that can be answered from the stem alone; avoid always, frequently, all or none; randomly assign correct answers; can correct for guessing (penalty formula)
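
The "penalty formula" mentioned above is not spelled out on the slide; the usual formula-scoring correction (assumed here) subtracts a fraction of the wrong answers so that random guessing nets zero on average.

```python
def corrected_score(right: int, wrong: int, choices: int = 5) -> float:
    """Standard correction for guessing: right - wrong / (choices - 1)."""
    return right - wrong / (choices - 1)

# A student with 48 right and 12 wrong on 5-option MCQs scores 45 after correction.
print(corrected_score(48, 12))
```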

  20. MEQ • Definition: A series of sequential questions in a linear format based on an initial limited amount of information. It requires immediate short answers followed by additional information and subsequent questions. (patient management problem is a variation of this type) • Measures: Knows and Knows how • Pros: Can assess problem solving, hypothesis generation and data interpretation • Cons: Low inter-case reliability; less content validity; harder to administer; time consuming to grade and variable inter-rater reliability • Suggestions: Use directed (not open ended) questions; provide extensive answer key

  21. Open ended essay question • Definition: Question allowing a student the freedom to decide the topic to address and the position to take- it can be take home • Measures: Knows, Knows how • Pros: Assesses ability to think (generate ideas, weigh arguments, organize information, build and support conclusions and communicate thoughts); high face validity • Cons: Low reliability; time intensive to grade; narrow coverage of content • Suggestions: Strictly define the response and the rating criteria

  22. Application test • Definition: Open book problem solving test incorporating a variety of MCQs and MEQs. It provides a description of a problem with data. The examinee is asked to interpret the data to solve the problem. (ex. Quiz item 3) • Measures: Knows and knows how • Pros: Assesses higher learning; good face/content validity; reasonable reliability; useful for formative and summative feedback • Cons: Harder to create and grade

  23. Practical Exam • Definition: Hands on exam to demonstrate and apply knowledge (ex. Culture and identify the bacteria on the glove of a Sinai cafeteria worker, or performance of a history and physical on a patient) • Measures: Knows, Knows how, and possibly Shows how and Does • Pros: Can test multiple domains, actively involves the learner (good steering effect); best suited for procedural/technical skills; higher face validity • Cons: Labor intensive (creation and grading); hard to identify a gold standard, so subjective grading; high rate of item failure (unanticipated problems with administration) • Suggestions: Pilot first; adequate, specific instructions and goals; specific, defined criteria for grading and train raters; for direct observation, require multiple encounters for higher reliability

  24. Medical record/note review • Definition: Examiner reviews learner’s previously created document; can be random • Measures: Knows how and Does • Pros: Can review multiple records for higher reliability; high face validity; less costly than oral (done without learner and at examiner’s convenience) • Cons: Lower inter-rater reliability; less immediate feedback; unable to determine basis for decisions • Suggestions: Create a template with specific ratings for skills

  25. Standardized Patients • Definition: Simulated patient/actor trained to present a history in a reliable, consistent manner and to use a checklist to assess students' skills and behaviors • Measures: Knows, Knows how, Shows how and Does • Pros: High face validity; can assess multiple domains; can be standardized; can give immediate feedback • Cons: Costly; labor intensive; must use multiple SPs for high reliability

  26. OSCE (Objective Structured Clinical Exam) • Definition: Task oriented, multi-station exam; stations can be 5-30 minutes and require written answers or observation (ex. Take orthostatic VS; perform a cardiac exam; smoking cessation counseling; read and interpret CXR or EKG results; communicate lab results and advise a patient) • Measures: Knows, Knows how, Shows how and Does

  27. OSCE (Objective Structured Clinical Exam) • Pros: Assesses clinical competency; tests a wide range of knowledge, skills and behaviors; can give immediate feedback; good test-retest reliability; good content and construct validity; less patient and examiner variability than with direct observation • Cons: Costly (manpower and $); case specific; requires > 20 stations for internal consistency; weaker criterion validity

  28. Oral Examination • Definition: Method of evaluating a learner's knowledge by asking a series of questions. The process is open ended, with the examiner directing the questions (ex. chart stimulated recall or a triple jump) • Measures: Knows, Knows how, sometimes Shows how and Does

  29. Oral Exam • Pros: Can measure clinical judgement, interpersonal skills (communication) and behavior; high face validity; flexible; can provide direct feedback • Cons: Poor inter-rater reliability (dove vs hawk and observer bias); content specific so low reliability (must use > 6 cases to increase reliability); labor intensive • Suggestions: multiple short cases; define questions and answers; provide simple rating scales and train raters

  30. Triple Jump • Definition: Three step written and oral exam- written, research and then oral part- (ex. COMPASS 1) • Measures: Knows, knows how, shows how and does • Pros: Assesses hypothesis generation, use of resources, application of knowledge to problem solve and self directed learning; provides immediate feedback; high face validity • Cons: only for formative assessment (poor reliability); time/faculty intensive; too content specific and inconsistent rater evaluations

  31. Clinical Observations • Definition: Assessment of various domains longitudinally by an observer- either preceptor, peer or self (small group evaluations during first two years and preceptor ratings during clinical exposure) • Measures: Knows, knows how, Shows how and Does • Pros: Simple; efficient; high face validity; formative and summative

  32. Clinical Observations • Cons: low reliability (only recent encounters often influence the grade); halo effect (lack of domain discrimination); more often a judgement of personality and the "Lake Wobegon" effect (all students are rated above average); unwillingness to document negative ratings (fear of failing someone) • Suggestions: Frequent ratings and feedback; increase the number of observations; multiple assessors (with group discussion about specific ratings)

  33. Peer/Self Evaluation • Pros: Useful for formative feedback • Cons: Lack of correlation with faculty evaluations; same cons as others (measure of "nice guy", low reliability, halo effect); peer evaluations can also reflect a friend effect, fear of retribution or a desire to penalize • Suggestions: limit the # of behaviors assessed; clarify the difference between evaluation of professional and personal aspects; develop operationally defined criteria for rating; provide multiple opportunities for students to do this and provide feedback from faculty

  34. Erica Friedman’s Educational Pyramid [Diagram: Does: direct observation, practical. Shows how: OSCE, triple jump, oral, SP, practical, chart review. Knows how: MEQ, essay. Knows: MCQ.]

  35. Critical factors for choosing an evaluation tool • Type of evaluation and feedback desired: formative/summative • Focus of evaluation: Knowledge, skills, behaviors (attitudes) • Level of evaluation: Know, Knows how, Shows how, Does • Pros/Cons: Validity, Reliability, Cost (time, $ resources)
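
One way to operationalize these factors is a simple lookup keyed by the highest level each tool reaches (following the pyramid on slide 34), with rough cost and reliability tags paraphrased from the pros/cons slides; the tags below are illustrative summaries, not figures from the presentation.

```python
# Illustrative mapping only: levels follow slide 34; cost/reliability notes
# paraphrase the pros/cons slides above.
TOOLS = {
    "MCQ":                {"level": "Knows",     "cost": "low",  "reliability": "high"},
    "MEQ":                {"level": "Knows how", "cost": "low",  "reliability": "variable inter-rater"},
    "Essay":              {"level": "Knows how", "cost": "low",  "reliability": "low"},
    "Chart review":       {"level": "Shows how", "cost": "low",  "reliability": "lower inter-rater"},
    "SP":                 {"level": "Shows how", "cost": "high", "reliability": "needs multiple SPs"},
    "OSCE":               {"level": "Shows how", "cost": "high", "reliability": "needs >20 stations"},
    "Oral exam":          {"level": "Shows how", "cost": "high", "reliability": "needs >6 cases"},
    "Direct observation": {"level": "Does",      "cost": "high", "reliability": "needs many encounters"},
}

def tools_for(level: str) -> list[str]:
    """Return the tools whose highest assessed level matches the request."""
    return [name for name, info in TOOLS.items() if info["level"] == level]

print(tools_for("Shows how"))
```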

  36. How to be successful • Students should be clear about the course/clerkship goals and the specifics about the types of assessments used and the criteria for passing (and if relevant, just short of honors and honors) • Make sure the choice of assessments is consistent with the values of your course and the school • Final judgments about students’ progress should be based on multiple assessments using a variety of methods over a period of time (instead of one time point)

  37. Number of courses or clerkships using a specific assessment tool (assessing our assessment methods)

  38. Why assess ourselves? • Assure successful completion of our course goals and objectives • Assure integration with the mission of the school • Direct our teaching/learning-(determine what worked and what needs changing)

  39. How we currently assess ourselves • Student evaluations (quantitative and qualitative)- most often summative • Performance of students on our exam and specific sections of USMLE • Focus and feedback groups (formative and currently done by Dean’s office) • Peer evaluations of course/clerkship- by ECC • Self evaluations- yearly grid completed by course directors and core faculty • Consider peer evaluation of teaching and teaching materials
