
Developing “Better” Selected-Response Assessments


Presentation Transcript


  1. Developing “Better” Selected-Response Assessments Louis Franceschini, Ph.D. Center for Research in Educational Policy University of Memphis http://umdrive.memphis.edu/lfrncsch/DSCC/

  2. Unpacking “Better” • This presentation concerns developing selected-response assessments that may be “better” than usual in one or both of two senses: • Better in the sense of “more challenging.” • Better in the sense of “more informative.”

  3. “Better” in the First Sense • “Better” in the first sense refers to the link that has been forged between selected-response items and memorization. • On this view, significant educational outcomes can only be assessed with “constructed response” items (e.g., essays), • while “selected response” items (e.g., multiple choice) are useful mainly for assessing “just the facts, ma’am.”

  4. “Better” in the First Sense • Put another way, selected-response items are suitable for remembering sorts of outcomes (in the lingo of the “revised” Bloom’s Taxonomy) but not those pertinent to understanding, applying, and higher cognitive processes linked to “critical thinking.”

  5. “Better” in the First Sense To be sure, MCQs can very easily get at remembering factual items like the following: Who wrote War and Peace? a. Turgenev b. Tolstoy c. Dostoyevsky d. Solzhenitsyn

  6. “Better” in the First Sense But with a bit of practice, your MCQs can also assess outcomes linked to understanding. • During a sleep study, you notice that your research participant’s EEG shows brain activity known as spindles and K-complexes. The participant is probably in stage ___ sleep. • a. 1 • b. 2 • c. 3 • d. 4 The student is being asked to interpret what he/she sees on the EEG and make an inference about the stage of sleep.

  7. “Better” in the First Sense Applying, and so on . . . • Naomi asked seven randomly selected people the following question: "On a scale from 0 (not at all) to 9 (definitely), how strongly do you believe in ghosts?" Their responses were as follows: 0, 0, 1, 1, 9, 9, 9. Which measure of central tendency should Naomi use if she wants to convince her friends that most people strongly believe in ghosts? • a. Median • b. Mean • c. Mode • d. Range The student must know how to compute these three measures of central tendency and choose the one most relevant to making her point apropos this dataset.
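
To check the arithmetic behind this item, here is a minimal Python sketch using only the standard library (no assumptions beyond the seven responses given in the item):

    import statistics

    responses = [0, 0, 1, 1, 9, 9, 9]

    print(statistics.mean(responses))    # 4.14... (29/7): pulled upward by the three 9s
    print(statistics.median(responses))  # 1: the middle respondent barely believes at all
    print(statistics.mode(responses))    # 9: the most frequent answer, the one Naomi would cite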

  8. “Better” in the First Sense Re: this first sense of “better,” a start can be made by developing a two-way (and more or less detailed) TABLE of SPECIFICATIONS.

  9. “Better” in the First Sense One dimension of the “two-way” table is devoted to the content addressed (as course topics, instructional objectives, standards of learning, etc.)

  10. “Better” in the First Sense The second dimension of the “two-way” table is devoted to the level of cognitive demand associated with the item reflecting the topic.

  11. “Better” in the First Sense Minimally, such tables are intended to 1) identify the most important achievement domains being measured and 2) ensure that a fair and representative sample of questions appears on the assessment.

  12. “Better” in the First Sense However, such tables can go on to include information about the number and types of items used to assess content mastery, given the emphasis placed on the objective during instruction and the level of cognitive demand expected of the learner.

  13. “Better” in the First Sense Detail could be added by assigning point values to the various items (e.g., 1 pt. for remember/understand, 2 pts. for apply/analyze, 5 pts. for problem solving).
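
To make the weighting concrete, here is a minimal, purely illustrative Python sketch; the topic, item counts, and point values below are hypothetical, not taken from the presentation:

    # Hypothetical rows of a table of specifications:
    # (topic, cognitive level, number of items, points per item)
    spec = [
        ("central tendency", "remember/understand", 6, 1),
        ("central tendency", "apply/analyze",       4, 2),
        ("central tendency", "problem solving",     1, 5),
    ]

    # Total points the assembled test would carry.
    total_points = sum(n_items * pts for _, _, n_items, pts in spec)
    print(total_points)  # 6*1 + 4*2 + 1*5 = 19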

  14. “Better” in the First Sense Note that the MCQ suffices for most items on this test. However, for “problem solving,” the constructed-response format is being invoked.

  15. “Better” in the First Sense With this said, it’s also important to recognize that less successful MCQs may sometimes work better as a different sort of selected response item altogether!

  16. “Better” in the First Sense With respect to standard two, one can test for students’ ability to “analyze data using mean, median . . .” without resort to either multiple-choice or constructed-response items. A “multiple T/F item” can be used. Suppose that a dozen of your students completed a 10-item multiple-choice test and earned the following numbers of correct scores: 5, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 10. T F 09. The median for your students’ scores is 7.5. T F 10. The mode for the set of scores is 8.0. T F 11. The range of the students’ scores is 5.0. T F 12. The median is different from the mean.
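
A quick check of the statistics behind those four statements, sketched in Python with the standard library (statistics.multimode requires Python 3.8 or later):

    import statistics

    scores = [5, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 10]

    print(statistics.median(scores))     # 7.5 (average of the 6th and 7th sorted scores)
    print(statistics.multimode(scores))  # [7, 8]: both values occur four times
    print(max(scores) - min(scores))     # 5 (10 - 5)
    print(statistics.mean(scores))       # 7.5 (90 / 12), i.e., equal to the median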

  17. “Better” in the First Sense • In developing the rows of the Table of Specifications, a first step would involve listing all of the topics covered in the instruction and then making some judgment about which topics seem essential, which important, and which “good to know.”

  18. “Better” in the First Sense Required for developing the columns of the table, however, is at least some acquaintance with an analytic structure that deals with different levels of cognitive demand.

  19. “Better” in the First Sense One such structure is the “original” (1956) Bloom’s Taxonomy of Educational Objectives in the Cognitive Domain.

  20. “Better” in the First Sense • According to David Krathwohl, the intent of the “original” Taxonomy was to serve as a • common language about learning goals and a basis for determining for a particular course or curriculum the specific meaning of broad educational goals; • means for determining the congruence of educational objectives, activities, and assessments in a unit, course, or curriculum; and • panorama of the range of educational possibilities against which the limited breadth and depth of any particular educational course or curriculum could be considered. • Updated in the 1990s, the “revised” Taxonomy has similar aims.

  21. “Better” in the First Sense The “revised” Taxonomy is preferred by many educators because the way it is articulated accommodates the S-V-O (subject-verb-object) way that disciplinary “standards” are expressed.

  22. “Better” in the First Sense In the revision, what were once six nouns are now six VERBS, with the implied SUBJECT being the learner. (NB that “creating” now bests “evaluating.”)

  23. “Better” in the First Sense More important, these VERBS take on OBJECTS in the revision. These objects are four forms of knowledge. The result is a two-way matrix that is not unlike a typical TABLE of SPECIFICATIONS.

  24. The “new” Bloom’s Taxonomy These four forms of knowledge are defined as: • FACTUAL knowledge: the basic elements students must know to be acquainted with a discipline or solve problems in it. • CONCEPTUAL knowledge: the interrelationships among the basic elements within a larger structure that enable them to function together. • PROCEDURAL knowledge: how to do something; methods of inquiry and criteria for using skills, algorithms, techniques, and methods. • METACOGNITIVE knowledge: knowledge of cognition in general as well as awareness and knowledge of one’s own cognition.

  25. The “new” Bloom’s Taxonomy In the context of developing a Table of Specifications, however, it is more important to be acquainted with the definitions of the six levels of the cognitive process dimension. Hence . . .

  26. The “new” Bloom’s Taxonomy Remembering consists of recognizing and recalling relevant information from long-term memory. Associated verbs: • Recognizing • Listing • Describing • Identifying • Retrieving • Naming • Locating • Finding

  27. The “new” Bloom’s Taxonomy Understanding refers to the construction of meaning from instructional messages, including oral, written, and graphic communication. Associated verbs: • Interpreting • Exemplifying • Summarizing • Inferring • Paraphrasing • Classifying • Comparing • Explaining

  28. The “new” Bloom’s Taxonomy Applying refers to using a learned procedure in either a familiar or a new situation. Associated verbs: • Executing (carrying out) • Implementing (using) Analyzing concerns breaking knowledge down into its parts and determining how the parts relate to an overall structure or purpose. Associated verbs: • Differentiating • Organizing • Attributing

  29. The “new” Bloom’s Taxonomy Evaluating refers to making judgments using criteria or standards. Associated verbs: • Checking • Critiquing Creating concerns putting elements together to form a coherent or functional whole, or reshaping elements into a new pattern or structure. Associated verbs: • Generating • Planning • Producing

  30. “Better” in the First Sense For many purposes, Bloom’s taxonomy is too detailed. Hence, other scholars have devised simpler structures to be used in developing classroom tests.

  31. “Better” in the First Sense An expert on MCQ assessment, Thomas Haladyna (2005) has developed a template for developing items that incorporates much of the same information as a garden-variety Table of Specifications.

  32. “Better” in the First Sense Much simpler than Bloom’s taxonomy, Haladyna’s template captures both MENTAL DEVELOPMENT (cognitive processes dimension) and CONTENT (knowledge dimension).

  33. “Better” in the First Sense Haladyna’s template is gaining in popularity not only because of its simplicity but also because it is paired with an item-writing technique called ITEM SHELLS, “stripped-down items that have the content removed but retain a tried-and-true structure.” For cognitive processes beyond RECALL, Haladyna has developed one or more item shells touching upon understanding, critical thinking (as evaluation), critical thinking (as prediction), and problem solving (which is to be paired with additional stimulus material).

  34. “Better” in the First Sense • Claiming, for example, that UNDERSTANDING has primarily to do with being able to define, cite characteristics, and provide examples, Haladyna suggests the item shells • Which best defines __________ ? • Which is (un)characteristic of __________ ? • Which of the following is an example of __________ ?

  35. “Better” in the First Sense • For Haladyna, one part of CRITICAL THINKING is EVALUATING; hence he suggests the item shells • What is most effective (appropriate) for __________ ? • What is better (worse) __________ ? • What is the most effective method for __________ ? • What is the most critical step in this procedure? • What is (un)necessary in a procedure?

  36. “Better” in the First Sense • As CRITICAL THINKING may also involve PREDICTING, Haladyna provides the item shells • What would happen if __________ ? • If this happens, what should you do __________ ? • On the basis of . . ., what should you do __________ ? • Given . . ., what is the primary cause of . . .?

  37. “Better” in the First Sense • Finally, for purposes of measuring/developing PROBLEM SOLVING Haladyna recommends the use of the following item shells accompanied by an appropriate scenario: • What is the nature of the problem? • What do you need to solve this problem? • What is a possible solution? • Which is a solution? • Which is the most effective (efficient) solution? • Why is ________ the most effective (efficient) solution?
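
Because item shells are just stems with the content removed, they lend themselves to light automation. A minimal, purely illustrative Python sketch follows; the shell texts come from the slides above, while the function name and the content strings plugged into them are hypothetical:

    # Item shells quoted from the slides; "{}" marks the removed content.
    SHELLS = {
        "understanding": "Which of the following is an example of {}?",
        "evaluating": "What is the most effective method for {}?",
        "predicting": "What would happen if {}?",
    }

    def fill_shell(process: str, content: str) -> str:
        """Return a draft item stem for the given cognitive process and content."""
        return SHELLS[process].format(content)

    # Hypothetical course content plugged into the shells.
    print(fill_shell("understanding", "an ordinal variable"))
    print(fill_shell("evaluating", "summarizing a skewed distribution"))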

  38. “Better” in the Second Sense • Although “better” in the sense of making items more cognitively challenging is a necessary condition of assessing students effectively, it is unfortunately NOT sufficient, as demonstrated by the following item. • To generate a correlation coefficient between two variables with ordinal data, which set of instructions would you give PASW? • Analyze → Crosstabs → Descriptive Statistics → Spearman → ok • Graphs → Frequencies → [select variables] → Spearman → ok • Analyze → Compare Means → ANOVA table → First Layer → Spearman → ok • Analyze → Correlate → Bivariate → [select variables] → Spearman → ok

  39. “Better” in the Second Sense • Although the item is no doubt “cognitively challenging” (it aims at both understanding and applying), its value is undercut by issues surrounding “better” in the second sense: CORRECTNESS! • To generate a correlation coefficient between two variables with ordinal data, which set of instructions would you give PASW? • Analyze → Crosstabs → Descriptive Statistics → Spearman → ok • Graphs → Frequencies → [select variables] → Spearman → ok • Analyze → Compare Means → ANOVA table → First Layer → Spearman → ok • Analyze → Correlate → Bivariate → [select variables] → Spearman → ok
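
For readers working outside PASW/SPSS, here is a minimal Python sketch of the statistic the correct menu path produces, assuming SciPy is installed; the two ordinal variables below are invented purely for illustration:

    from scipy.stats import spearmanr

    # Hypothetical ordinal (rank-order) ratings from ten respondents.
    satisfaction = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5]
    loyalty = [1, 1, 2, 2, 3, 3, 4, 5, 4, 5]

    # Spearman's rho: a correlation computed on ranks, appropriate for ordinal data.
    rho, p_value = spearmanr(satisfaction, loyalty)
    print(rho, p_value)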

  40. “Better” in the Second Sense • “Better” in the second sense goes to the fundamental purpose of assessment: namely, to provide accurate (that is, “valid”) information about the status of a student with respect to some underlying variable of interest. • Re: classrooms, the “variable of interest” typically means student achievement relative to some well-defined body of knowledge or set of skills.

  41. “Better” in the Second Sense • Unfortunately, the quality (validity) of the information provided by classroom assessments is often compromised in one or both of two ways: • The assessment as a whole does NOT adequately reflect the domain of interest. There is a mismatch between instruction and assessment. • Responses to individual assessment items reflect something OTHER than students’ status relative to the variable of interest.

  42. “Better” in the Second Sense Point 1 goes to the notion of an assessment’s content validity. If the six figures on the slide represent the degree to which a set of items (the dots) accurately covers a domain of interest (the squares), only the figure at the upper left can be said to have adequate content validity.

  43. “Better” in the Second Sense If one’s TABLE of SPECIFICATIONS is well written, issues concerning point #1 (the match between instruction and assessment) should be largely resolved. Not so with point #2, however, concerning responses that reflect something OTHER than a student’s knowledge relative to the variable of interest. WHY?? Consider the item again: • To generate a correlation coefficient between two variables with ordinal data, which set of instructions would you give PASW? • Analyze → Crosstabs → Descriptive Statistics → Spearman → ok • Graphs → Frequencies → [select variables] → Spearman → ok • Analyze → Compare Means → ANOVA table → First Layer → Spearman → ok • Analyze → Correlate → Bivariate → [select variables] → Spearman → ok

  44. “Better” in the Second Sense Point 2 speaks to the notion of “construct-irrelevant variance,” the sort of psychometric “noise” that masks and distorts what a response truly indicates about a student’s underlying knowledge or ability. Re: the previous item, it’s a toss-up whether the item is measuring the student’s statistical know-how and PASW acumen or his/her reading ability.

  45. “Better” in the Second Sense The TRUE SCORE model shown on the slide suggests that uncontrollable “noise” (random error) is always a factor in measuring people’s abilities. The point is to avoid ADDING to that noise!
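
For reference, the classical true-score model the slide alludes to is conventionally written as:

    X = T + E

where X is the observed score, T is the (unobservable) true score, and E is random error, the irreducible “noise” the slide mentions. Item-writing flaws add avoidable error on top of this irreducible component.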

  46. “Better” in the Second Sense • Thus, re: Point 2, “better” can be achieved by complying with “item-writing wisdom” pertinent to different types of selected-response items. Clustered into themes, bits of such wisdom concern: • Eliminating potentially confusing wording or ambiguous requirements (clarity). • Decreasing the chance of guessing correctly. • Improving test-taking “efficiency.” • Controlling for “testwiseness” (unintended clues).

  47. “Better” in the Second Sense 1. According to the experts, POTENTIALLY CONFUSING WORDING and AMBIGUOUS REQUIREMENTS are problematic in assessment “because if some respondents understand a question or a set of instructions, and others do not, their responses may vary as a result of that difference, not as a result of different underlying levels of knowledge or skill.”

  48. “Better” in the Second Sense • Across ALL selected-response item types, the experts caution against confusion linked to • Unnecessarily complex syntax, • Vocabulary inappropriate to the age of students being tested, • Wordiness, • Vague qualitative modifiers such as “many,” “large,” “much,” “small,” “old,” and “important,” • Negatives in general (like NOT, EXCEPT) and “double negatives” in particular.

  49. “Better” in the Second Sense Grammatical agreement issues have been known to cause confusion, as have unsourced items presented as FACT that are really opinion. For example: • Recent research has suggested that adults often become domineering towards children because of their inherited characteristics. • The monkey living in Chris Griffin’s closet became evil after his wife cheated on him. • Who is most admired for outstanding dramatic performance in film in the past ten years? a. Jane Fonda b. Shirley MacLaine c. Meryl Streep

  50. “Better” in the Second Sense 2. Likewise, GUESSING is problematic because “if respondents choose a correct answer by chance, instead of knowing the correct answer, there is no validity in the interpretation that the correct response reflects knowledge.” Re: this guideline, all multiple-choice and matching-exercise answer options should be plausible, and there should be as many such options as is reasonable.
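
The arithmetic behind that advice, for a test of n items with k equally plausible options each:

    P(single item correct by blind guessing) = 1/k
    Expected number correct from guessing alone = n/k

On a 20-item test, for example, four-option items hold a blind guesser to an expected 5 correct (20/4), whereas two-option true/false items allow an expected 10 (20/2).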
