CLEAR Pre-Conference Workshop: Testing Essentials
  • Job Analysis- Reed A. Castle, PhD
  • Item Writing- Steven S. Nettles, EdD
  • Test Development- Julia M. Leahy, PhD
  • Standard Setting- Paul D. Naylor, PhD
  • Scaling/Scoring- Lauren J. Wood, PhD, LP
  • 5 topics, 20 minutes each, plus 20 minutes of Q&A

Presented at CLEAR’s 23rd Annual Conference

Toronto, Ontario, September 2003


Job Analysis

Reed A. Castle, Ph.D.

Schroeder Measurement Technologies, Inc.

What is a Job Analysis?
  • An investigation of the ability requirements that go with a particular job (Credentialing Exam Context).
  • It is the study that helps establish a link between test scores and the content of the profession.
  • The Joint Technical Standards, 14.14:
  • “The content domain to be covered by a credentialing test should be defined clearly and justified in terms of the importance of the content for credential-worthy performance in an occupation or profession. A rationale should be provided to support a claim that the knowledge or skills being assessed are required for credential-worthy performance in an occupation and are consistent with the purpose for which the licensing or certification program was instituted.”

Why Conduct a Job Analysis?
  • Need to establish a validity link.
  • Need to articulate a rationale for examination content.
  • Need to reduce the threat of legal challenges.
  • Need to determine which aspects of practice are relatively important.
  • Need to understand the profession before we assess it.

Types of Job Analyses
  • Focus Group
  • Traditional Survey-Based
  • Electronic Survey-Based
  • Transportability

Focus Group
  • Need to Identify the best group of SMEs possible
    • Areas of Practice
    • Geographic representation
    • Demographically Balanced
  • 8 to 12 participants

Focus Group
  • Prior to Meeting-
  • Comprehensive review of profession
    • Job Descriptions
    • Performance Appraisals
    • Curriculum
    • Other job-related documents
  • Create a Master Task List
  • Send list to SMEs prior to the meeting to give them a chance to review

Focus Group
  • At Meeting-
  • Review Comprehensive Task List
  • Determine which tasks are important
  • Determine which tasks are performed with an appropriate level of frequency
  • Determine which tasks are duplicative
  • Identify and add missing tasks
  • Organize into coherent outline

Focus Group
  • Advantages-
    • May be only solution for new/emerging professions
    • Relatively quick
    • Less expensive

Focus Group
  • Disadvantages
    • Based on one group (Results may not generalize)
    • May be considered a weaker model when considering validation.
    • May result in complaints from constituents about the content of the test.

Traditional Survey-Based
  • First steps are similar to the focus group (i.e., task list is generated in same manner)
  • After the task list is created, three more issues must be addressed to complete the first survey development meeting.

Traditional Survey-Based
  • First, demographic questions must be developed with two goals in mind.
    • Questions should help describe the sample of respondents.
    • Some questions will be used in analyses that help generalize across groups (e.g., geographic regions).

Traditional Survey-Based
  • Second, rating scale(s) should be developed.
    • Minimally, two pieces of information should be collected
      • Importance or significance
      • Frequency of performance
    • Additional scales can be added but may reduce the response rate.
    • Shorter is sometimes better.

Traditional Survey-Based
  • Sample scale combining importance and frequency
  • High correlation between frequency and importance ratings (.95 and higher)
  • Considering both the importance and frequency, how important is this task in relation to the safe, effective, and competent performance of a Testing Professional? If you believe the task is never performed by a Testing Professional, please select the 'Not performed' rating.
    • 0 = Not performed
    • 1 = Minimal importance
    • 2 = Below average or low importance
    • 3 = Average or medium importance
    • 4 = Above average or high importance
    • 5 = Extreme or critical importance

Traditional Survey-Based
  • Sampling-
  • One of the more important considerations is the sampling model employed.
  • Surveys should be distributed to a sample that is reflective of the entire population.
  • Demographic questions help describe the sample.
  • One should anticipate a low response rate (20%) when planning for an appropriate number of responses.

Traditional Survey-Based
  • Mailing Surveys
  • Enclose a postage paid return envelope.
  • Plan well in advance for international mailings (can be logistically painful with different countries).
  • When bulk mailed, plan extra time.
  • Keep daily track of return volume.

Electronic Survey-Based
  • Identical to traditional, but delivery and return are different.
  • Need Email addresses.
  • Need profession with ready access to Internet.

Electronic Survey-Based
  • Advantages
    • Faster response time.
    • Data entry is no longer needed.
    • Reduced processing time on R & D side.
    • Possibly less expense (lower administrative costs).
    • Can modify sampling and survey on the fly if needed
    • Sample can be the population with little additional cost.

Electronic Survey-Based
  • Disadvantages
    • Need Email addresses
    • High rate of “bounce-back”
    • Control for ballot stuffing
    • Data compatibility

Transportability
  • Using the results of another job analysis
  • Determine compatibility or transportability
  • Similar to Focus Group

Four Types Review
  • Focus Group
  • Traditional Survey-Based
  • Electronic Survey-Based
  • Transportability

[Overview diagram of the survey analysis components: demographics; importance ratings; frequency ratings; composite; sub-group analyses; decision rules; reliability (raters, instrument); survey adequacy; data]

Primary Demographics
  • Geographic Region
  • Years Experience
  • Work Setting
  • Position Role/Function
  • Percent Time in certain activities

Mean Importance Ratings: 3.0 Criterion

[Chart: tasks with mean importance ratings at or above 3.0 are “In” (retained); tasks below 3.0 are “Out” (excluded)]

% Not Performed Ratings, Criterion 25% (75% perform)
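To make this criterion concrete alongside the mean-importance rule from the previous slide, here is a minimal sketch of task screening. It is my own illustration, not the presenters’ procedure; it assumes the convention of excluding “Not performed” responses from the importance mean, and the ratings are invented.

    def retain_task(ratings):
        # ratings: one task's responses on the 0-5 scale shown earlier,
        # where 0 means "Not performed".
        pct_not_performed = sum(r == 0 for r in ratings) / len(ratings)
        performed = [r for r in ratings if r > 0]
        mean_importance = sum(performed) / len(performed) if performed else 0.0
        # Keep the task only if it clears both decision rules.
        return mean_importance >= 3.0 and pct_not_performed <= 0.25

    print(retain_task([4, 5, 3, 0, 4, 4, 5, 3]))  # True: mean 4.0, 12.5% "Not performed"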

Composite Ratings
  • Composite ratings (when multiple scales are used) can be calculated from the natural logs of the rating scales and combined according to some weighting scheme.
  • For example, if you want to weight frequency 33.33% and importance 66.66%, you can adjust for this in the composite rating equation (see the sketch below).
  • My personal opinion is that you will likely end up in a very similar place by establishing decision criteria on each scale individually.
  • In addition, multiple decision rules are more conservative.
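A minimal sketch of such a composite, assuming the 66.66/33.33 weighting from the example above. Read it as a weighted geometric mean, since a weighted mean of natural logs is the log of a weighted geometric mean; the function name and sample ratings are mine.

    import math

    def composite(importance, frequency, w_imp=2/3, w_freq=1/3):
        # Ratings must be positive before taking logs.
        return math.exp(w_imp * math.log(importance) + w_freq * math.log(frequency))

    print(round(composite(importance=4.0, frequency=2.0), 2))  # 3.17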

Mean Importance Sub-group Analyses

Assessment Type
  • SMEs are asked to determine which assessment type will best measure a given task
    • Multiple choice
    • Performance
    • Essay/short answer

Cognitive Levels
  • Each task on the content outline requires some level of cognition to perform
  • 3 basic levels exist (from Bloom’s Taxonomy)
    • Knowledge/Recall
    • Application
    • Analysis
  • Steve will discuss these in the next presentation

Cognitive Levels
  • SMEs are asked to rate the tasks that remain after applying the inclusion decision criteria on a 3-point scale
  • For each major content area, an average rating is calculated
  • The average is applied to specific criteria to determine the number of items by cognitive level for each content area

Weighting

Weighting is usually done with SMEs based on some type of data.

For example, the average importance or composite rating for a given content area (see the sketch below).

Applied to assessment type and cognitive levels.
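For illustration only, a sketch of one such data-based weighting: allocate a fixed test length across content areas in proportion to mean importance. The areas, ratings, and 100-item length are invented, and simple rounding may need a final adjustment so the counts sum exactly to the intended test length.

    mean_importance = {"Assessment": 4.2, "Intervention": 3.6, "Ethics": 3.0}
    test_length = 100

    total = sum(mean_importance.values())
    items_per_area = {area: round(test_length * rating / total)
                      for area, rating in mean_importance.items()}
    print(items_per_area)  # {'Assessment': 39, 'Intervention': 33, 'Ethics': 28}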

Test Specifications/Weights

[Diagram: standard exclusion/inclusion criteria and assessment type/cognitive levels feed into the test specifications; weights are based on a rational approach (reflecting test type), statistical data, and consensus]


Item Writing

Steven S. Nettles, EdD

Applied Measurement Professionals, Inc.

Overview of Measurement
  • Job Analysis
  • Test Specifications
    • Detailed Content Outline
  • Item Writing
  • Examination Development
  • Standard Setting
  • Administration and Scoring

Test Specifications & Detailed Content Outline
  • Developed based on the judgment of an advisory committee as they interpreted job analysis results from many respondents.
  • Guides item writing and examination development.
  • Provides information to candidates.
    • Required!



Item Writing Workshop Goals
  • appropriate item content and cognitive complexity
  • consistent style and format
  • efficient examination committee work

A test item
  • measures one unit of content.
  • contains a stimulus (the question).
  • prescribes a particular response form.
  • The response allows an inference about candidates’ abilities on the specific bit of content.
    • When items are linked to job content, summing all correct item responses allows broader inferences about candidates’ abilities to do a job.

Preparing to Write
  • Each item must be linked to the prescribed
    • part of the Detailed Content Outline.
    • cognitive level (optional).
  • Write multiple-choice items.
    • Three options better for similar ability groups.
    • Five options better for diverse groups.

Why multiple choice?
  • Dichotomous (right/wrong) scoring encourages measurement precision.
  • Validity is strongly supported because each item measures one specific bit of content.
    • Many items sample the entire content area.
  • The flexible format allows measurement of a variety of objectives.
  • Examinees cannot bluff their way to receiving credit (although they can correctly guess).
    • We will talk more about minimizing effective guessing.

Item components include
  • stem.
  • three to four options.
    • one key
    • two to three distractors.

Item Components
  • Stem
    • The statement or question to which candidates respond.
    • The stem can also include a chart, table, or graphic.
    • The stem should clearly present one problem or idea.

Example Stems
  • Direct question
    • Which of the following best describes the primary purpose of the Code of Federal Regulations?
  • Incomplete statement
    • The primary purpose of the CFR includes

New writers tend to write clearer direct questions. If you are new to item writing, it may be best to concentrate on that type.

Among the options will be the
  • key
    • With a positively worded stem, the key is the best or most appropriate of the available stem responses.
    • With a negatively worded stem, the key is the least appropriate or worst of the available stem responses.
      • Negatively written items are not encouraged!
  • distractors - plausible yet incorrect responses to the stem

Cognitive levels
  • Recall
  • Application
  • Analysis

Cognitive levels are designated because we recognize that varying dimensions of the job require varying levels of cognition. By linking items to cognitive levels, a test better represents the job, i.e., is more job-related.

Cognitive levels
  • Recall items
    • rely on rote memorization.
    • are NEVER situationally dependent.
    • have options that frequently start with nouns.

Recall item

Which of the following beers is brewed in St. John’s?

A. Labatt’s

B. Molson

C. Moosehead

Cognitive levels
  • Application items
    • use interpretation, classification, translation, or recognition of elements and relationships
      • Any item involving manipulation of formulas, no matter how simple, is at the application level.
      • Items using graphics or data tables will be at least at the application level.
    • have keys that depend on the situation presented in the stem
      • If the key would be correct in any situation, then the item is probably just a dressed up recall item.
    • have options that frequently start with verbs.

Application item

Which of the following is the best approach when trout-fishing in the Canadian Rockies?

A. Use a fly fishing system with a small insect lure.

B. Use a spinning system with a medium Mepps lure.

C. Use a bait casting system with a large nightcrawler.

Cognitive levels
  • Analysis items
    • use information synthesis, problem solving, and evaluation of the best response.
    • require candidates to find the problem from clues and act toward resolution.
    • have options that frequently start with verbs.

Analysis item

Total parenteral nutrition (TPN) is initiated in a non-diabetic patient at a rate of 42 ml/hour. On the second day of therapy, serum and urine electrolytes are normal, urine glucose level is 3% and urine output exceeds parenteral intake. Which of the following is the MOST likely cause of these findings?

A. The patient has developed an acute glucose tolerance.

B. The patient’s renal threshold for glucose has been exceeded.

C. The patient is now a Type 2 diabetic requiring supplemental insulin.

Other Item Types: complex multiple-choice (CMC)
  • are best for complex situations with multiple correct solutions.
  • may incorporate a direct question or incomplete statement stem format.

CMC items

Which of the following lab test results have been associated with fibromyalgia or myofascial pain?

1. Elevated CPK

2. Elevated LDH isoenzyme subsets

3. White blood cell magnesium deficiency

4. EMG abnormalities

A. 1 and 3 only

B. 1 and 4 only

C. 2 and 3 only

D. 2 and 4 only

(The numbered statements are the elements; the lettered combinations are the options.)

K - Type Item

A child suffering from an acute exacerbation of rheumatic fever usually has

1. An elevated sedimentation rate

2. A prolonged PR interval

3. An elevated antistreptolysin O titer

4. Subcutaneous nodules

A. 1, 2, and 3 only

B. 1, 3 only

C. 2, 4 only

D. 4 only

E. All are correct

(From: Constructing Written Test Questions for the Basic and Clinical Sciences, Case & Swanson, 1996, NBME, Philadelphia, PA)

Other Item Types: negatively worded
  • Avoid negative wording when a positively worded item (e.g., CMC type) can be used.
    • Negative wording encourages measurement error when able candidates become confused.

Negatively worded items

A civil subpoena is valid for all of the following EXCEPT when it is

A. served by registered mail.

B. accompanied by any required witness fee.

C. accompanied by a written authorization from the patient.

Convert negatively worded items to CMC items
  • If you find yourself writing a negatively worded item, finish it.
  • Then consider rewriting it as a CMC item where 2-3 elements are true, and 1 or 2 elements are not included in the key.
  • Don’t write all CMC items to have 3 true and 1 false element.
    • Mix it up – e.g., 2 true and 2 false.

Things to do

Use an efficient and clear option format.
  • List options on separate lines.
  • Begin each option with a letter (i.e., A, B, C, D) to avoid confusion with numerical answers.
  • Write options in similar lengths.
    • New item writers tend to produce keys that are longer and more detailed than the distractors (a quick check is sketched below).
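A quick, purely illustrative check for that tendency; the options are invented, and the 1.5 ratio is an arbitrary threshold, not a published rule.

    def key_too_long(options, key_index, ratio=1.5):
        # Flag items whose key is much longer than the average distractor.
        key_len = len(options[key_index])
        distractors = [opt for i, opt in enumerate(options) if i != key_index]
        avg_len = sum(len(d) for d in distractors) / len(distractors)
        return key_len > ratio * avg_len

    options = ["serve it by registered mail",
               "file a written motion with the court clerk before the hearing date",
               "pay the required witness fee"]
    print(key_too_long(options, key_index=1))  # True: the key stands out by length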

Put as many words as possible into the stem.

The psychometrician should recommend

A. that the committee write longer, more difficult to read stems.

B. that the committee write distractors of length similar to the key.

Write distractors with care
  • Item difficulty largely depends on the quality of the distractors.
  • The finer the distinctions candidates must make, the more difficult the item.
  • When writing item stems, you should do all you can to help candidates clearly understand the situation and the question.
  • Distractors should be written with a more ruthless (but not tricky) attitude.

Write distractors you know some candidates will select.
  • Use common misconceptions.
  • Use candidates’ familiar language.
  • Use impressive-sounding and technical words in the distractors.
  • Use scientific and stereotyped phrases, and verbal associations.


Things to avoid

Avoid
  • using “All of the above” or “None of the above” as options.
  • using stereotypical or prejudicial language.
  • overlapping data ranges.
  • using humorous options.
  • placing similar phrases in the stem and key, even including identical words.
  • writing the key in far more technical, detailed language.
  • producing items related to definitions.

Avoid
  • using modifiers associated with
    • true statements (e.g., may, sometimes, usually) for keys.
    • false statements (e.g., never, all, none, always) for distractors.
  • options having the same meaning.
    • If two options mean the same thing, both must be incorrect, so neither can be the key.
  • using parallel options (mutually exclusive) unless balanced by another pair of parallel options.
  • writing items with undirected stems
    • Use the “undirected stem test”
  • writing items that allow test wise candidates to converge on the key.

Converging on the key

A. 2 pills qid

B. 4 pills bid

C. 2 pills qid

D. 6 pills tid

Are you test wise?

You are test wise if you can select the key based on clues given in the item without knowing the content.

Please refer to your Pre-test Exercise.


Test Development

Julia M. Leahy, PhD

Chauncey Group International

Test Development

[Flow diagram of the test development cycle: job analysis → test plan → draft items → edit & review (with content experts) → pretest → item evaluation → test]

Validity
  • Test specifications derived from job analysis
  • Test items linked to job analysis and test specifications
  • Test items measure content that is relevant to occupation or job.

Validity

Content validity refers to the degree to which the items on a licensure/certification examination are representative of the knowledge and/or skills that are necessary for competent performance.

Validity

The specific use of test scores and/or the interpretations of the results.

Validity

Supports the appropriateness of the test content to the domain the test is intended to represent.

Test Form Assembly

Program Considerations

Factors to consider:

  • Computer-Based Testing
    • Linear test forms versus pools of items
    • Continuous testing or windows
    • Issues of exposure
    • Item bank size: probably large
  • Paper-and-Pencil Examinations
    • Single versus multiple administrations
    • Issues of exposure
    • Item bank size: small to large

Test Specifications
  • What content is important to test?
  • What content is necessary to be a minimally competent practitioner?
  • How much emphasis should be placed on certain content categories?

Creating Test Specifications
  • Use data from Job Analysis
  • Determine the following:
    • What content do we put in each test?
    • What feedback do we give people who are not successful?
    • What kinds of questions do we ask?
      • How many of each kind do we ask?

Test Specifications
  • Should include:
    • Purpose of the Test
    • Intended Population
    • Test Domain and Relative Emphasis
      • Content to be tested
      • Cognitive level to be assessed
    • Mode of Assessment & Item Types
    • Psychometric Characteristics

Test Specifications
  • Use test specifications
    • Every time a test form is created
    • To be certain each test form asks questions on important content
      • Validity - Passing the test is supposed to mean that a person knows enough to be considered proficient
    • Fairness
      • It would be unfair if certain content were not on every form

Using Test Specifications for Item Development

You’re not flying blind!

Factors Influencing Numbers of Forms and Items Needed Annually
  • Test Modality
    • Is the test a paper-and-pencil or a computer-based test?
    • Are both methodologies used?
    • If CBT, will the test be administered in windows or continuously?

Factors Influencing Numbers of Forms and Items Needed Annually
  • Test length
    • How many items will be needed for one form?
    • How many forms?
    • How often will forms be changed?
    • What is the allowable percentage of overlap? (A toy calculation is sketched below.)
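As a back-of-the-envelope illustration of how these questions interact, here is a toy calculation; every number is assumed, not taken from the presentation.

    items_per_form = 150
    forms_per_year = 4
    allowable_overlap = 0.20  # share of a new form that may reuse earlier items

    new_items_per_additional_form = items_per_form * (1 - allowable_overlap)
    unique_items_needed = items_per_form + (forms_per_year - 1) * new_items_per_additional_form
    print(int(unique_items_needed))  # 510 unique items needed for the year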

Factors Influencing Numbers of Forms and Items Needed Annually
  • Number of test administrations per year
    • Will each administration have a different form?
    • What is the expected test volume per form?
    • How are special test situations handled?

Factors Influencing Numbers of Forms and Items Needed Annually
  • Level of Security Needed:
    • Is this a high-stakes examination for licensure or certification in an occupation or profession?
    • Or is the examination for low-stakes certificates, such as with continuing education or self-assessment?

Factors Influencing Numbers of Forms and Items Needed Annually
  • Organizational Policies:
    • When and under what circumstances can failing candidates repeat the examination?
    • Must items be blocked for repeat candidates?
    • Is there a minimum number of candidate responses required for new/pretest items?

Using Test Specifications

Test Form Assembly

Select items

  • Meet the test plan specifications
    • Total number of items
    • Correct distribution of items by domains or subdomains
  • Preference to use items with known statistical performance
    • Distribution of statistical parameters, such as difficulty and discrimination (a simplified selection sketch follows)
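A deliberately simplified sketch of that selection step: fill each domain's blueprint quota with the items whose difficulty (p-value) sits closest to a target. The bank, blueprint, and target are invented, and production assembly software balances many more constraints at once.

    bank = [
        {"id": 1, "domain": "A", "p": 0.62},
        {"id": 2, "domain": "A", "p": 0.75},
        {"id": 3, "domain": "A", "p": 0.90},
        {"id": 4, "domain": "B", "p": 0.55},
        {"id": 5, "domain": "B", "p": 0.71},
    ]
    blueprint = {"A": 2, "B": 1}  # items required per domain
    target_p = 0.70

    form = []
    for domain, quota in blueprint.items():
        candidates = [item for item in bank if item["domain"] == domain]
        candidates.sort(key=lambda item: abs(item["p"] - target_p))
        form.extend(candidates[:quota])
    print([item["id"] for item in form])  # [2, 1, 5]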

Test Form Assembly

Select items

  • Determine need to consider non-test plan parameters in form assembly
    • Cognitive level
  • Use automatic selection software, if possible
    • Generate test forms that meet required and preferred parameters

Test Form Assembly

Consider Test Delivery: CBT

  • For continuous CBT delivery, large numbers of equivalent forms are needed for security reasons
  • Every form must meet the same detailed content and statistical specifications
    • Quality assurance is vital; forms cannot vary in quality, content coverage, difficulty, or pacing
    • Reproducibility and accuracy of scores and pass/fail decisions must be consistent across forms and over time

Test Form Assembly

Automatic Item Selection

  • Consider multiple rules for selecting items, such as content codes and statistics
  • Determine number of forms to reduce exposure
  • Still need to evaluate selected form for overlap

5-Step Item Review Process
  • Grammar
  • Style
  • Internal Expert Review
  • Sensitivity/Fairness
  • External/Client Review

Grammar
  • Review items for spelling, punctuation, and grammar
  • Usually done by an editor or trained test developer

Style
  • Format items to conform with established style guidelines
  • Use capitalization and bolding as appropriate to alert candidates to words such as:
    • Maximum, Minimum, and Except

Internal Review

Review by internal experts for verification of content accuracy

  • Is the key correct?
  • Is the key referenced (if applicable)?
  • Are the distractors clearly wrong, yet plausible?
  • Is the item relevant?

Sensitivity/Fairness
  • Candidate Fairness: all candidates should be treated equally & fairly regardless of differences in personal characteristics that are not relevant to the test
  • Acknowledge the multicultural nature of society and treat its diverse population with respect

Sensitivity/Fairness
  • Review by test developers and/or external organizations
  • Review items for references to gender, race, religion, or any possibly offensive terminology
    • Use only when relevant to the item

Sensitivity/Fairness

ETS Sensitivity Review Guidelines and Procedures:

  • Cultural diversity of the United States
  • Diversity of background, cultural traditions and viewpoints
  • Changing roles and attitudes toward groups in the US.
  • Contributions of various groups
  • Role of language in setting and changing attitudes towards various groups

Sensitivity/Fairness

Stereotypes:

  • No population group should be depicted as either being inferior or superior
  • Avoid inflammatory material
  • Avoid inappropriate tone
  • Appropriate tone reflects respect and avoids upsetting or otherwise disadvantaging a group of test takers.

Sensitivity/Fairness

Stereotypes:

  • Examples:
    • Men who are abusers
    • Women who are depressed
    • African Americans who live in depressed environments
    • Adults 65 and older who are frail, elderly, and unemployable

Sensitivity/Fairness

Stereotypes:

  • Examples:
    • People with disabilities who are nonproductive
    • Using diagnoses or conditions as adjectives

Sensitivity/Fairness

How to Avoid Stereotypes:

  • Examples:
    • A depressed patient
      • The patient with depression
    • A diabetic patient
      • The patient with diabetes mellitus
    • An elderly person
      • A 72-year-old person
    • A psychiatric patient
      • A patient with paranoid schizophrenia

Sensitivity/Fairness

How to Avoid Stereotypes:

  • Examples:
    • A male who abuses women
      • An individual with abusive tendencies (avoid gender)
    • A housewife
      • An individual who is a primary caretaker
    • A Hispanic who speaks no English
      • An individual who speaks English as a second language
    • An Asian American who eats sushi
      • An individual whose diet consists mainly of fish

Sensitivity/Fairness
  • Population diversity: No one population group should be dominant:
    • Ethnic balance: use ethnicity only when necessary.
    • Gender balance: avoid male and female identification if at all possible.

Sensitivity/Fairness

Ethnic group references:

  • African American or Black;
  • Caucasian or White;
  • Hispanic American;
  • Asian American

Sensitivity/Fairness

Inappropriate tone:

  • Avoid highly inflammatory material that is inappropriate to content of examination
  • Avoid material that is elitist, patronizing, sarcastic, derogatory or inflammatory
    • Examples: lady lawyer; little woman; strong-willed male
  • Avoid terminology that might be known to only one group
    • Examples: stickball; country clubs; maven

Differential Item Functioning
  • Identifies items that function differently for two groups of examinees or candidates
  • DIF is said to occur for an item when the performance on that item differs systematically for focal and reference group members with the same level of proficiency

Differential Item Functioning
  • Reference group:
    • Majority group: in nursing, that is generally White females
  • Focal groups:
    • All minority groups
      • Men
      • Non-white ethnic groups

Differential Item Functioning
  • Use the Mantel-Haenszel (MH) procedure, which matches the reference and focal groups on some measure of proficiency, which generally is the total number right score on the test
  • Requires a minimum number per focal group; the analysis can be done with as few as 40-50 in the focal group (see the sketch below)
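For concreteness, a minimal sketch of the MH common odds ratio and its usual conversion to the ETS delta (MH D-DIF) scale. The record format is my own, and real analyses add a significance test before classifying items.

    import math
    from collections import defaultdict

    def mh_d_dif(records):
        # records: (total_score, group, correct) per examinee for one item,
        # where group is "ref" or "focal"; examinees are matched on total score.
        strata = defaultdict(lambda: [0, 0, 0, 0])  # ref-right, ref-wrong, focal-right, focal-wrong
        for score, group, correct in records:
            idx = (0 if group == "ref" else 2) + (0 if correct else 1)
            strata[score][idx] += 1
        num = den = 0.0
        for rr, rw, fr, fw in strata.values():
            n = rr + rw + fr + fw
            num += rr * fw / n
            den += rw * fr / n
        alpha = num / den               # MH common odds ratio
        return -2.35 * math.log(alpha)  # MH D-DIF; roughly |D| >= 1.5 marks category C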

Differential Item Functioning

Examples of content related issues:

  • Negative category C DIF was noted for males on content related to women’s health
  • Positive category C DIF was noted for males on content involving the use of equipment and actions likely to be taken in emergencies

Differential Item Functioning
  • Negative category C DIF was noted for the focal minority groups on content involving references to/inferences about:
      • assumptions regarding the nuclear family, childrearing, and dominant culture;
      • idiomatic use of language;
      • hypothetical situations requiring "role-playing"

External/Client Review
  • Review by external experts for verification of content accuracy
    • Is the key correct?
    • Is the key referenced (if applicable)?
    • Are the distractors clearly wrong, yet plausible?
    • Is the item relevant?

Review and Approval
  • Item review can be an iterative process
  • Yes, No, and Yes with modifications
  • Who has final sign off on items?

Test Production

Paper-and-pencil forms

  • Determine format
    • One column
    • Two columns
  • Directions -- back page, if booklet sealed
  • Answer sheets

Test Production

Computer-based tests

  • Linear forms or test pools
  • Tutorials
  • Item appearance:
    • Top -- bottom
    • Side by side
  • Survey forms
  • Form review

Item/Pool Review
  • Establishing a process for item review
  • How item review relates to item approval
  • Who should be involved in the item review
  • How often should items be reviewed

Timing of Item/Pool Reviews
  • Establish a schedule for item reviews
    • Anticipate regulatory or industry changes
  • Review and revise items for content accuracy
    • Use candidate comments for feedback on items
  • Review and revise items based on statistical information
    • To be discussed in the afternoon session


Setting The Cut-Score

Paul D. Naylor, Ph.D.

Psychometric Consultant

Standard Setting
  • The process used to arrive at a passing score
  • Lowest score that permits entry to the field
  • Recommended standard

Standards
  • Mandated
  • Norm-referenced
  • Criterion-referenced

Mandated Standards
  • Often used in licensing
  • Difficult to defend
  • Not related to minimum qualification

Norm-referenced Standards
  • Popular in schools
  • Limits entry
  • Inconsistent results

Criterion-referenced Standards
  • Wide acceptance in professional testing
  • Determines minimum qualification
  • Not test population dependent
  • Exam or item centered

Procedures
  • Angoff (modified)
  • Nedelsky
  • Ebel
  • Others

Minimally Competent Performance
  • Minimum acceptable performance
  • Minimal qualification
  • Borderline
  • It’s all relative

Angoff Method
  • Judges
    • Selection
    • Training
  • Probabilities
  • Would vs Should
  • Rater agreement
  • Tabulation (see the sketch below)
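A minimal sketch of the tabulation step with invented ratings: each judge estimates the probability that a minimally competent candidate answers each item correctly, and the recommended cut score is commonly the sum of the item means.

    ratings = {  # item -> one probability per judge
        "item1": [0.60, 0.70, 0.65],
        "item2": [0.80, 0.75, 0.85],
        "item3": [0.55, 0.50, 0.60],
    }

    item_means = {item: sum(vals) / len(vals) for item, vals in ratings.items()}
    cut_score = sum(item_means.values())
    print(round(cut_score, 2))  # 2.0 -> about 2 of these 3 items correct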

Ratings

[Table: item-by-item Angoff ratings for judges 1 through 6, with the average (AV) for each item]

Application
  • Adjustment
  • Angoff Values
  • Alternative Forms of Exam
  • Passing Score

References
  • Livingston, S.A., & Zieky, M.J. (1982). Passing Scores. Princeton, NJ: Educational Testing Service.
  • Cizek, G.J. (1996). Standard-setting guidelines. Educational Measurement: Issues and Practice, 15(1), 13-21.
  • CLEAR Exam Review (Winter 2001, Summer 2001, and others)

Presentation Follow-up
  • Please pick up a handout from this presentation -AND/OR-
  • Please give me your business card to receive an e-mail of the presentation materials -OR-
  • Presentation materials will be posted on CLEAR’s website

QUESTIONS AND ANSWERS

THANK YOU!


Scaling and Scoring

Lauren J. Wood, PhD, LP

Director of Test Development

Experior Assessments:

A Division of Capstar


Scaling and Scoring
  • Objectives:

-Describe a number of types of scores that you may wish to report

-Define “scaled scores” and describe the scaling process


Scaling and Scoring

Back to the Basics

Public Protection

versus

Candidate Fairness


Scoring

  • Scoring of the examination needs to be considered long before the examination is administered.
  • Important to the examinees that the decision (pass/fail) and the scoring (raw, scaled) be reported in simple/clear language


Scaling and Scoring

Scoring: Who gets the score reports?

Scoring: What do you report?

Types of scores:

- Score outcome: pass/fail

- Score in comparison to a criterion

- Score in comparison to others

Scoring: What do you report?
  • Helpful to candidates to report subscore or section information
    • Strengths and weaknesses
    • Plan for remediation
  • Need to ensure that any subscore or section score reported is meaningful
    • Number of subscores/questions
    • Can be graphical rather than numeric

Scoring: What do you report?

Scoring: When/where do you report the scores?
  • Delayed: mailed to examinee’s home
    - group comparisons
    - scoring/scaling
  • Immediate: at the test site

Scoring
  • Score Reporting:

-Compute examinee scores directly (raw scores)

-Compute scaled or derived scores

Scaling and Scoring

Three Candidates’ Experience:

Exam: Residential Plumbing Exam (Theory)

Administration date: August 11, 2003

Cut score: 70% correct

Three Candidates’ Score Reports

Scaling and Scoring

Candidate #2

The Occupational and Professional Licensing Division regrets to inform you that you did not attain a satisfactory grade on the Residential Plumbing examination that you took on August 11, 2003.

Your examination grade is: 74 Fail

You may retake the examination…

Three Candidates’ Score Reports

Scaling and Scoring

Candidate #3

The Occupational and Professional Licensing Division is pleased to inform you that you attained a satisfactory grade on the Residential Plumbing examination that you took on August 11, 2003.

Your examination grade is: 66 Pass

Congratulations! You may apply for licensure by…

Scaling and Scoring

Three Candidates’ Score Reports

              Score   Result
Candidate #1   68     Fail
Candidate #2   74     Fail
Candidate #3   66     Pass

Three Candidates’ Score Reports

Scaling and Scoring

What happens next?

Scaling and Scoring

Three Candidates’ Score Reports

Candidate #1

The Occupational and Professional Licensing Division regrets to inform you that you did not attain a satisfactory grade on the Residential Plumbing examination that you took on August 11, 2003.

Your examination raw score is: 68 Fail

You may retake the examination…

Scaling and Scoring

Three Candidates’ Score Reports

Residential Plumber Examination: Form #1, Form #2, Form #3

Scaling

Why develop multiple forms of the same examination?

  • Exam security
  • Examination and question content changes over time

Scaling and Scoring

How Do the Forms of the Examination Differ?

  • Item content differs, though the content of the items remains true to the exam content outline
  • Item difficulty, discrimination, etc. differ across the different forms of the examination

Scaling and Scoring

Three Candidates’ Score Reports

Residential Plumber Examination:
  • Form #1: P-value = .66
  • Form #2: P-value = .72
  • Form #3: P-value = .70


Scaling and Scoring

Equating:
  • The design and statistical procedure that permits scores on one form of a test to be comparable to scores on an alternative form of an examination


Scaling and Scoring

Why equate forms?
  • Adjust for unintended differences in form difficulty
  • Ease of candidate-to-candidate score interpretation
  • Maintain candidate fairness in the testing process


Scaling and Scoring

How is this done?
  • There are a number of methods used to equate examination scores
  • Statistical conversions of the scores are applied, and the resulting scores are often called “scaled scores” or “derived scores” (one common conversion is sketched below)
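The presentation does not prescribe a method, but as one common possibility, here is a sketch of a linear (mean/sigma) conversion with made-up form statistics. It shows how the same raw score earns more scaled credit on a harder form, which is what the three candidates’ reports illustrate.

    def to_scaled(raw, form_mean, form_sd, scale_mean=70.0, scale_sd=10.0):
        # Map a raw score to the reporting scale via a z-score, so equivalent
        # performance on different forms receives the same scaled score.
        z = (raw - form_mean) / form_sd
        return scale_mean + scale_sd * z

    print(round(to_scaled(66, form_mean=62, form_sd=10)))  # 74 on a harder form
    print(round(to_scaled(66, form_mean=68, form_sd=10)))  # 68 on an easier form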


Scaling and Scoring

Three Candidates’ Score Reports

              Raw Score   Scaled Score   Status
Candidate #1     68           67         Fail
Candidate #2     74           66         Fail
Candidate #3     66           70         Pass

Three Candidates’ Score Reports

Scaling and Scoring

Candidate #1

The Occupational and Professional Licensing Division regrets to inform you that you did not attain a satisfactory grade on the Residential Plumbing examination that you took on August 11, 2003.

Your examination scaled score is: 67 Fail

You may retake the examination…

Scaling and Scoring
  • Raw score
    • Advantage
      • Meaning clearly understood
    • Disadvantage
      • Can’t make comparisons
      • Specific for each test administration

Scaling and Scoring
  • Scaled score:

Based on test mean, standard deviation and raw score

    • Advantage
      • Make meaningful comparisons
    • Disadvantage
      • Interpretation not clear cut

Scaling
  • When scaling procedures are used, it is important to explain in advance that such procedures will be performed and what the resulting scores will mean to the candidate.

Scaling

“The Purpose of Scaling” (Candidate Information Bulletin)

Scaling allows scores to be reported on a common scale. Instead of having to remember that a 35 on the examination that you took is equivalent to a 40 on the examination that your friend took, we can use a common scale and report your score as a scaled score of 75.

Since we know that your friend’s score of 40 is equal to your score of 35, your friend’s score would also be reported as a scaled score of 75.

Scaling and Scoring
  • Starts at the beginning of the test development process
  • Report as outcome, relation to criterion, relation to others
  • Delayed or immediate
  • Raw or scaled scores

Presentation Follow-up
  • Please pick up a handout from this presentation -AND/OR-
  • Please give me your business card to receive an e-mail of the presentation materials -OR-
  • Presentation materials will be posted on CLEAR’s website
