Introduction to Psychometrics and Item Writing
Agenda • Ground Rules and Expectations • Section 1: • Overview of Performance Assessments • Bloom’s Taxonomy • Item Writing 101 • Test Your Item Writing Skills • Section 2: • Introduction to Reliability and Validity • Introduction to Standard Setting • Introduction to Probability and Statistics • Psychometrics 101 • Learning Team Application
Ground Rules and Expectations • By the end of the day, what is the most important thing you want to take away from the training? • There is a great deal of information to cover: • Please feel free to ask focused questions. • I will do my best to manage our time and keep us on task.
Overview of Performance Assessments
What is a performance assessment and why are we doing it? • Performance assessments call upon the examinee to demonstrate specific skills and competencies and to apply the skills and knowledge they have mastered. • They are a critical step in demonstrating criterion validity, that is, whether examinees can demonstrate proficiency in a given skill.
In this module you will learn to… • …identify the different cognitive levels of Bloom’s Taxonomy, • …understand the hierarchy of Bloom’s Taxonomy, and • …develop test items designed to measure the appropriate cognitive level of Bloom’s Taxonomy.
What is Bloom’s Taxonomy? • Bloom’s Taxonomy • An industry-accepted classification scheme that maps cognitive levels to performance characteristics. • It divides human cognition into the following six categories, from lowest to highest: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation.
Bloom’s Taxonomy • The central idea of the taxonomy is that what exam developers want candidates to know can be ordered in a hierarchy from less to more complex. • The higher a level sits in the taxonomy, the higher the order of cognitive skill it assesses. • Figure: Bloom’s Pyramid of Cognitive Abilities (“upper levels subsume lower levels”)
Level 1: Knowledge • Knowledge – exhibiting previously learned material by recalling facts, terms, basic concepts, and answers. • Key words: who, what, why, when, omit, where, which, choose, find, how, define, label, show, spell, list, match, name, relate, tell, recall, select. • Example Question Cues • What is…? How is…? • Where is…? When did _________ happen? • How did _______ happen? • Why did …? Can you select…? • How would you show…?
Level 2: Comprehension • Comprehension – demonstrating understanding of facts and ideas by organizing, comparing, translating, interpreting, giving descriptions, and stating main ideas. • Key words: compare, contrast, demonstrate, interpret, explain, extend, illustrate, infer, outline, relate, rephrase, translate, summarize, show, classify. • Example Question Cues • How would you classify the type of…? • How would you compare…? contrast…? • What facts or ideas show…? • What is the main idea of…? • Which statements support…?
Level 3: Application • Application – solving problems by applying acquired knowledge, facts, techniques, and rules in a different way. • Key words: apply, build, choose, construct, develop, interview, make use of, organize, experiment with, plan, select, solve, utilize, model, identify. • Example Question Cues • How would you use…? • How would you solve ______ using what you have learned? • How would you organize ______ to show ______? • What approach would you use to…? • What elements would you choose to change…?
Level 4: Analysis • Analysis – examining and breaking information into parts by identifying motives or causes; making inferences and finding evidence to support generalizations. • Key words: analyze, categorize, classify, compare, contrast, discover, dissect, divide, examine, inspect. • Example Question Cues • How is ______ related to…? • What inference can you make…? • What conclusions can you draw…? • How would you categorize…? • What is the relationship between…?
Level 5: Synthesis • Synthesis – compiling information together in a different way by combining elements in a new pattern or proposing alternative solutions. • Key words: build, choose, combine, compile, compose, construct, create, design, develop, estimate, formulate, imagine, invent. • Example Question Cues • What changes would you make to solve…? • How would you improve…? • What way would you design…? • How would you test…? • Can you think of an original way for the…?
Level 6: Evaluation • Evaluation – presenting and defending opinions by making judgments about information, the validity of ideas, or the quality of work based on a set of criteria. • Key words: choose, conclude, criticize, decide, defend, determine, dispute, evaluate, judge, justify, measure, compare. • Example Question Cues • How would you prove…? Disprove…? • Would it be better if…? • How would you evaluate…? • How would you determine…? • Based on what you know, how would you explain…?
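Several key words appear at more than one level (for example, “choose” and “compare”), so a verb alone does not settle an item’s cognitive level; the level depends on what the item actually asks the candidate to do. A minimal Python sketch makes the overlap visible, assuming the levels are modeled as an ordered enum. The names `BloomLevel`, `KEY_WORDS`, and `levels_for` are hypothetical; the word lists are copied from the six level slides.

```python
from enum import IntEnum


class BloomLevel(IntEnum):
    """Bloom's six cognitive levels; a larger value is a higher-order skill."""
    KNOWLEDGE = 1
    COMPREHENSION = 2
    APPLICATION = 3
    ANALYSIS = 4
    SYNTHESIS = 5
    EVALUATION = 6


# Key words copied from the Level 1-6 slides ("make use of" and
# "experiment with" are kept as multi-word entries).
KEY_WORDS: dict[BloomLevel, set[str]] = {
    BloomLevel.KNOWLEDGE: {
        "who", "what", "why", "when", "omit", "where", "which", "choose",
        "find", "how", "define", "label", "show", "spell", "list", "match",
        "name", "relate", "tell", "recall", "select"},
    BloomLevel.COMPREHENSION: {
        "compare", "contrast", "demonstrate", "interpret", "explain", "extend",
        "illustrate", "infer", "outline", "relate", "rephrase", "translate",
        "summarize", "show", "classify"},
    BloomLevel.APPLICATION: {
        "apply", "build", "choose", "construct", "develop", "interview",
        "make use of", "organize", "experiment with", "plan", "select",
        "solve", "utilize", "model", "identify"},
    BloomLevel.ANALYSIS: {
        "analyze", "categorize", "classify", "compare", "contrast", "discover",
        "dissect", "divide", "examine", "inspect"},
    BloomLevel.SYNTHESIS: {
        "build", "choose", "combine", "compile", "compose", "construct",
        "create", "design", "develop", "estimate", "formulate", "imagine",
        "invent"},
    BloomLevel.EVALUATION: {
        "choose", "conclude", "criticize", "decide", "defend", "determine",
        "dispute", "evaluate", "judge", "justify", "measure", "compare"},
}


def levels_for(word: str) -> list[BloomLevel]:
    """Return every level whose key-word list contains `word`, lowest first."""
    word = word.strip().lower()
    return sorted(level for level, words in KEY_WORDS.items() if word in words)
```

For example, `levels_for("choose")` returns four levels (Knowledge, Application, Synthesis, Evaluation), while `levels_for("invent")` returns only Synthesis. Because `IntEnum` members compare as integers, the hierarchy itself can be checked directly, e.g. `BloomLevel.EVALUATION > BloomLevel.KNOWLEDGE`.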
What is an item? • Most people would describe a typical test question as a question and answer. • In the test development world, the vocabulary is more precise: the whole unit is called an “item.” • The item consists of the stem (the body of a question or statement) and the options (answer choices). • The key is the one and only correct answer. • Distractors are plausible but incorrect options. • It is important to understand and use this terminology because the remainder of this training refers routinely to these components.
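The vocabulary above maps naturally onto a small data structure. This is a minimal Python sketch, not a standard item-bank format: the class name `Item` and the field `key_index` are hypothetical. It stores the stem and options, records which option is the key, and derives the distractors as every other option, rejecting an item whose key does not point at exactly one of its options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A multiple-choice item: a stem, its options, and the position of the key."""
    stem: str
    options: tuple[str, ...]
    key_index: int  # position of the one and only correct answer

    def __post_init__(self) -> None:
        # The key must be one of the options, and options must not repeat.
        if not 0 <= self.key_index < len(self.options):
            raise ValueError("key_index must point at one of the options")
        if len(set(self.options)) != len(self.options):
            raise ValueError("options must be distinct")

    @property
    def key(self) -> str:
        return self.options[self.key_index]

    @property
    def distractors(self) -> tuple[str, ...]:
        """Every option except the key."""
        return tuple(o for i, o in enumerate(self.options) if i != self.key_index)


item = Item("Which instrument measures air pressure?",
            ("Ruler", "Barometer", "Thermometer", "Scale"), key_index=1)
```

Here `item.key` is `"Barometer"` and `item.distractors` holds the other three options.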
The Best Written Items… • are clear, well written, and grammatically correct • stand independently of other items (do not need other items in order to be answered) • are technically correct (e.g., equations, definitions) • test the candidate’s knowledge of the subject, not their test-taking skills • focus on one thought, problem, or idea • present a specific problem • are formatted properly for readability • use graphics and scenarios effectively • have only one correct answer • cite a verifiable source that is readily available, not obscure, and not out of print • are phrased as a question whenever possible
The Best Written Items Avoid… • ambiguous or overly complex language • cultural, ethnic, and/or gender specifics or insensitivities • true/false construction • measuring trivia or knowledge that could easily become outdated or is not relevant • “fluff”
It Should be Distracting • Distractors are the “wrong” answers. It is important, however, that they be well thought out, because they must be plausible and of equal weight with the key. If any option obviously does not match the question, that signals to the candidate that the option can be eliminated as a possible key. For example: • What type of water is the best kind to drink? • A. Filtered • B. Contaminated • C. Frozen • D. Distilled • The candidate would immediately know that “frozen” could not be correct because you cannot drink frozen water.
Focus on Stems… • The stem is the part of the item that presents the problem. The stem can consist of a question or an open statement. • The stem should: • express a complete thought • include all information necessary to answer the problem • avoid words such as except, not, always, never, best, and recommended • contain as much information as possible AND be free of irrelevant material • CAPITALIZE and BOLD words of emphasis
…and Options • The options are the choices from which the candidate must pick an answer. Most items have four options. • All options should: • be grammatically and contextually consistent with the stem • be grammatically and contextually consistent with each other • be plausible, with only one correct answer (the key) • avoid phrases such as “All of the above” or “None of the above”
Working with Stems and Options • If the stem is a question, end it with a question mark and start each option with a capital letter. (See the previous example.) • If the option contains a complete sentence, end the option with a period. • Example: • What type of water is hazardous to your health? • A. Filtered water, because it does not contain minerals. • B. Boiled water, because it kills all the bacteria that are good for your digestive system. • C. Contaminated, because it contains bacteria and other elements that can make you sick. • D. Fluoride, because it weakens your bones. • If the option does not contain a complete sentence, do NOT end the option with a period.
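Some of the stem and option guidelines are mechanical enough to screen automatically before human review. The sketch below is a hypothetical helper (the name `review_item` and the word lists' variable names are illustrative); it checks only three rules taken from the slides: words to avoid in the stem, capitalized options after a question stem, and “All of the above” / “None of the above” options. Plausibility, consistency, and a single correct key still need a human reviewer.

```python
import re

# Word and phrase lists taken from the stem and option guidelines.
STEM_WORDS_TO_AVOID = ("except", "not", "always", "never", "best", "recommended")
OPTION_PHRASES_TO_AVOID = ("all of the above", "none of the above")


def review_item(stem: str, options: list[str]) -> list[str]:
    """Flag mechanical guideline violations; a human reviewer still decides."""
    warnings: list[str] = []
    # Stem: words the guidelines say to avoid (whole word, any case).
    for word in STEM_WORDS_TO_AVOID:
        if re.search(rf"\b{word}\b", stem, flags=re.IGNORECASE):
            warnings.append(f"stem uses '{word}'")
    is_question = stem.rstrip().endswith("?")
    for letter, option in zip("ABCDEFGHIJ", options):
        text = option.strip()
        # A question stem calls for options that start with a capital letter.
        if is_question and text[:1].islower():
            warnings.append(f"option {letter} should start with a capital letter")
        # "All of the above" / "None of the above" style options.
        for phrase in OPTION_PHRASES_TO_AVOID:
            if phrase in text.lower():
                warnings.append(f"option {letter} uses '{phrase}'")
    return warnings
```

Run on the `<list>` attribute item from the “Test Your Item Writing Skills” section, this flags the NOT in the stem, the three lowercase options, and both “of the above” options; a clean item returns an empty list.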
The Answer Lies in the Key • The key is the one correct answer. The key should: • always have a verifiable source that is readily available and neither obscure nor out of print • not be based on an opinion • not be noticeably shorter or longer than the other options
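“Noticeably” is a judgment call, but a rough length screen is easy to automate. This Python sketch assumes a hypothetical rule of thumb: flag the key when its character length differs from the average distractor length by more than 50%. The function name `key_length_flag` and the default `tolerance` are illustrative assumptions, not an established standard.

```python
from statistics import mean


def key_length_flag(options: list[str], key_index: int, tolerance: float = 0.5) -> bool:
    """Return True if the key's length differs from the mean distractor length
    by more than `tolerance` (a fraction; 0.5 means 50%)."""
    key = options[key_index]
    distractors = [o for i, o in enumerate(options) if i != key_index]
    typical = mean(len(d) for d in distractors)
    return abs(len(key) - typical) > tolerance * typical
```

A flag is only a prompt to look again: a long key may be fine if the distractors can be lengthened to match.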
Working with Scenarios • Crafting a solid scenario is more challenging than writing a straightforward item. • A scenario is meant to provide information from which questions can be drawn. It is not about setting a “mood.” • There is no set length: a scenario should be long enough to present all the necessary facts without giving the reader irrelevant details. • To qualify as a scenario, it must be accompanied by at least two stems but no more than four, and the stems should refer to information found in the scenario. A scenario should: • not be too short, because it must present enough detail for the candidate to understand the situation. • not be too long, because it must not take the candidate too long to read. • not contain unnecessary information.
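Of the scenario rules, only the stem count (two to four) is strictly countable; length and relevance still need human judgment. A minimal Python sketch, with the hypothetical class name `Scenario` and constants `MIN_STEMS`/`MAX_STEMS`, enforces that count when a scenario is created.

```python
from dataclasses import dataclass

# "At least two stems must accompany it, but no more than four."
MIN_STEMS, MAX_STEMS = 2, 4


@dataclass(frozen=True)
class Scenario:
    """A shared passage plus the item stems that draw on it."""
    passage: str
    stems: tuple[str, ...]

    def __post_init__(self) -> None:
        if not MIN_STEMS <= len(self.stems) <= MAX_STEMS:
            raise ValueError(
                f"a scenario needs {MIN_STEMS}-{MAX_STEMS} stems, got {len(self.stems)}")
```

For instance, the Company ABC scenario with its two stems would be accepted, while a passage with a single stem would raise `ValueError` and should be written as a standalone item instead.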
Example Scenario • Company A and Company B merge to create a new organization, Company ABC. Both companies operated strategic business units and employed full-time project managers. Although both companies were composite matrix management organizations, their corporate and project management cultures and organizational structures differed. Company A’s project management organizations tailored their processes and tools to their assigned strategic business units; Company B’s project management organizations centralized the development of processes and tools for corporate-wide adoption. The new organization, Company ABC, retained the strategic business units, composite matrix management organization, and full-time project managers. It has a single project organization that aligns project managers with strategic units and has separate headquarters in the United States and the United Kingdom to oversee its North American and European regions, respectively. In so doing, cultural diversity is recognized and accepted.
Example Scenario Items • What kind of project management behavior culture would MOST likely have developed at Company A? • A. Isolated • B. Fragmented • C. Non-cooperative • D. Competitive • Which cultural characteristics are MOST likely to be evident at the North American headquarters? • A. Communicate formally and respect tradition. • B. Focus on task accomplishments and reward individualism. • C. Value history, hierarchy, individualism and loyalty. • D. Communicate indirectly and emphasize hard work and success.
Example of a Scenario • As Program Manager, you have just finished gathering the program requirements and defining its deliverables. Now you are ready to build the program's WBS (PWBS). • What should you do FIRST? • A. Check similar programs in the organization or industry and start from there. • B. Ask your project managers to build a WBS for their respective projects and combine them (bottom-up). • C. Build the program's WBS first and then build the individual project's WBS (top-down). • D. Identify the deliverables at the project level. • What level of detail should the WBS (PWBS) include? • A. Program-level deliverables, first to second level of each project's WBS • B. Program- and project-level deliverables • C. Program level only • D. Program-level WBS and project-level WBSs that are required for control
Which of the following is NOT a Web browser? A. Microsoft Internet Explorer B. Adobe Acrobat C. Opera D. Netscape Navigator E. Lynx
Which one of the following statements is TRUE? A. A packet is a complete message sent over the Internet. B. All browsers display information in exactly the same way. C. URL stands for “Ultimate Resource Location.” D. A browser is used to read FTP messages. E. The Web was created by American physicians.
Scenario… Which one of the following code fragments creates a frameset with a horizontal frame all across the top with a height of 50 pixels with two columns underneath, the left column taking 30% of the page width and the right column taking 70% of the page width?
Scenario… A Web page has a file size of 4 kb (including just text and html code). A photo with 23 kb and a button-bar with 14 kb are also placed on the page. An additional navigational element with 1.5 kb is used four times on the page. What amount of data must a visitor’s browser load to show this page with all graphics? A. 4 kb B. 42.5 kb C. 47 kb D. 60 kb
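The arithmetic behind this item’s options can be checked directly. The Python sketch below (variable names are illustrative) uses the sizes given in the item and computes the total under two readings: every use of the navigational element is downloaded separately, or the browser downloads each distinct file once and reuses it for the other three uses. The two readings give 47 kb and 42.5 kb, which shows how a predictable reasoning error can produce a plausible distractor.

```python
# Sizes in kb, taken from the item.
page_html = 4.0      # text and HTML code
photo = 23.0
button_bar = 14.0
nav_element = 1.5    # one file, placed on the page four times
nav_uses = 4

# Reading 1: every use of the navigational element is a separate download.
every_use = page_html + photo + button_bar + nav_element * nav_uses

# Reading 2: the browser loads each distinct file once and reuses it.
each_file_once = page_html + photo + button_bar + nav_element

print(every_use, each_file_once)  # 47.0 42.5
```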
Networks allow the connected computers to share which of the following? A. Files and resources B. Resources and programs C. Files and programs D. Files only E. Files, resources, and programs
HTML stands for: A. high text master language B. hypertext markup language C. hypertext methodology D. high tech machine language
Which one of the following <list> attributes is NOT deprecated in HTML 4.0? A. type B. start C. style D. All of the above attributes are deprecated. E. None of the above attributes are deprecated.
What are reliability and validity? Reliability and validity of measurement are used to determine the extent to which: • We can learn something about the phenomenon we are studying, and • We can draw meaningful conclusions from our data. • They are necessary components of the test development process and help to demonstrate legal defensibility.
Reliability • Reliability is the extent to which the measurement instrument yields consistent results when the characteristic being measured hasn’t changed. • Example: A thermometer is reliable if repeated readings of an unchanged temperature agree with one another across measurement periods and if different thermometers give similar readings.
Validity • Validity is the extent to which the instrument measures what it purports to measure. • Example: ruler->length, scale->weight, clock->time, thermometer->temperature, barometer->air pressure, performance assessment -> competency?
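Relationship • What’s the relationship between reliability and validity? • Reliability is a necessary but insufficient condition for validity. • The target metaphor: shots tightly clustered away from the bull’s-eye are reliable but not valid; shots tightly clustered on the bull’s-eye are both reliable and valid; widely scattered shots are neither.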
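The claim that reliability is necessary but not sufficient for validity can be illustrated numerically with the scale example from the Validity slide. In this Python sketch the readings are invented for illustration: both hypothetical scales give tightly clustered readings (small spread, so both are reliable), but one is consistently about 5 kg high (large bias, so it is not valid). The function name `describe` and the true weight are assumptions.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 70.0  # kg; the value a valid scale should report (illustrative)

# Invented repeated readings of the same 70 kg object on two scales.
biased_scale = [75.1, 74.9, 75.0, 75.2, 74.8]  # consistent, but about 5 kg high
good_scale = [70.1, 69.9, 70.0, 70.2, 69.8]    # consistent and centered


def describe(readings: list[float]) -> tuple[float, float]:
    """Return (spread, bias): small spread suggests reliability,
    small bias is also needed for validity."""
    return stdev(readings), mean(readings) - TRUE_WEIGHT


print(describe(biased_scale))  # spread about 0.16, bias about 5.0
print(describe(good_scale))    # spread about 0.16, bias about 0.0
```

Both scales are equally reliable; only the second one measures what it purports to measure.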
Types of Reliability and Validity Reliability and validity take various forms, depending on: • The nature of the research problem, • The nature of the data that we collect, and • The general methodology the researcher uses to address the problem.
Types of Reliability • Internal reliability, • Inter-rater reliability, • Test-retest reliability, and • Equivalent forms reliability
Inter-rater Reliability The extent to which two or more raters evaluating the same performance give similar judgments.
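One common statistic for inter-rater reliability with categorical ratings (for example, pass/fail on a performance assessment) is Cohen’s kappa, which corrects the raw agreement rate for the agreement expected by chance. This Python sketch assumes exactly two raters and nominal categories; the function name `cohens_kappa` and the sample ratings are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters rating the same performances."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of performances given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented ratings of six performances by two raters.
a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(a, b)  # raw agreement 5/6, chance 1/2, kappa 2/3
```

Kappa is 1 for perfect agreement and near 0 when the raters agree no more often than chance would predict.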
Internal Reliability The extent to which all the items within a single instrument yield similar results.
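Internal reliability is often summarized with Cronbach’s alpha, one common choice among several. Alpha compares the sum of the individual item variances with the variance of examinees’ total scores: when items vary together, the total-score variance is large relative to the item variances and alpha approaches 1. This Python sketch assumes a complete score matrix with no missing responses; the function name `cronbach_alpha` and the scores are illustrative.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[e][i] is examinee e's score on item i."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and a score for every item")
    # Population variances are used throughout; the n vs. n-1 choice cancels
    # in the ratio as long as it is applied consistently.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented right/wrong (1/0) scores: four examinees, three items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [0, 1, 0],
          [0, 0, 0]]
alpha = cronbach_alpha(scores)  # 0.75 for this matrix
```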
Equivalent Forms Reliability The extent to which two different versions of the same instrument yield similar results.