Test Development
Presentation Transcript
    1. Test Development

    2. Stages
    • Test conceptualization: defining the test
    • Test construction: selecting a measurement scale; developing items
    • Test tryout
    • Item analysis
    • Revising the test
    • Publishing the test

    3. 1. Test conceptualization
    • Defining the scope, purpose, and limits of the test.

    4. Initial questions in test construction
    • Should the item content be similar or varied?
    • Should the range of difficulty be narrow or broad? (ceiling effect vs. floor effect)
    • How many items should be created?

    5. Which domains should be tapped?
    • The test developer may specify content domains and cognitive skills that must be included on the test.
    • What kind of test item should be used?

    6. 2. Test construction

    7. Selecting a scaling method

    8. Levels of measurement
    • Nominal
    • Ordinal
    • Interval
    • Ratio

    9. Scaling methods
    • Most are rating scales that are summative
    • May be unidimensional or multidimensional

    10. Method of paired comparisons
    • Also known as forced choice
    • The test taker is forced to pick one of two items paired together

    11. Comparative scaling
    • Test takers sort cards or rank items from “least” to “most”

    12. Categorical scaling
    • Test takers sort cards into one of two or more categories.
    • Stimuli are thought to differ quantitatively, not qualitatively

    13. Likert-type scales
    • Response choices are ordered on a continuum from one extreme to the other (e.g., strongly agree to strongly disagree).
    • Likert scoring assumes an interval scale, although this assumption may not be realistic.

    14. Guttman scales
    • Response choices for each item are various statements that lie on a continuum.
    • Endorsing the most extreme statement reflects endorsement of the milder statements as well.

    15. Method of equal-appearing intervals
    • Presumed to yield interval-level data
    • For a knowledge scale: obtain true/false statements; experts rate each item
    • For an attitude scale: judges rate each item on a Likert-type scale, assuming equal intervals
    • For both: the test taker's total score is based on “weighted” items (weights determined by averaging the experts' ratings), as sketched below
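
A minimal sketch of the scoring step described above, in Python; the items, data, and the respondent_score helper are hypothetical illustrations, not part of the slides:

```python
# Equal-appearing intervals: each item's scale value is the average
# of the judges' ratings; a respondent's score is the mean scale
# value of the items he or she endorses. (Hypothetical data.)

judge_ratings = {  # item -> ratings from four judges
    "item_1": [2, 3, 2, 3],
    "item_2": [6, 5, 7, 6],
    "item_3": [10, 9, 11, 10],
}

scale_values = {item: sum(r) / len(r) for item, r in judge_ratings.items()}

def respondent_score(endorsed):
    """Mean scale value of the endorsed items."""
    return sum(scale_values[i] for i in endorsed) / len(endorsed)

print(scale_values)                            # item_1: 2.5, item_2: 6.0, ...
print(respondent_score(["item_1", "item_2"]))  # (2.5 + 6.0) / 2 = 4.25
```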

    16. Method of absolute scaling
    • A way to determine the difficulty level of items.
    • Give the items to several age groups, with one age group acting as the anchor.
    • Item difficulty is assessed by noting the performance of each age group on each item as compared to the anchor group.

    17. Method of empirical keying
    • Item selection is based entirely on empirical findings.
    • The test developer writes a pool of items and administers them to a group known to possess the construct and a group known not to possess it.
    • Items are retained based on how well they distinguish one group from the other (see the toy example below).
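
A toy illustration of the selection rule just described; the groups, data, and the 0.5 cutoff are hypothetical:

```python
# Empirical keying: keep the items whose endorsement rates differ most
# between a group known to possess the construct and one known not to.

criterion_group = [  # rows = people, columns = items (1 = endorsed)
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
]
comparison_group = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]

def endorsement_rates(group):
    n = len(group)
    return [sum(person[j] for person in group) / n
            for j in range(len(group[0]))]

crit = endorsement_rates(criterion_group)
comp = endorsement_rates(comparison_group)

# Retain items whose rate difference exceeds an (arbitrary) cutoff.
keep = [j for j, (c, k) in enumerate(zip(crit, comp)) if abs(c - k) >= 0.5]
print(keep)  # [0, 1, 2]
```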

    18. Writing the items

    19. Item format
    • Selected response
    • Constructed response

    20. Multiple choice
    • Pros:
    • Cons:

    21. Matching
    • Pros:
    • Cons:

    22. True/False
    • Pros:
    • Cons:
    • A forced-choice methodology.

    23. Fill in
    • Pros:
    • Cons:

    24. Short answer objective item
    • Pros:
    • Cons:

    25. Essay
    • Pros:
    • Cons:

    26. Scoring items
    • Cumulative model
    • Class/category
    • Ipsative
    • Correction for guessing (one common formula is sketched below)
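
The classic correction-for-guessing formula subtracts a penalty for wrong answers; a minimal sketch (the function name is ours):

```python
def corrected_score(right, wrong, options):
    """Correction for guessing: R - W / (k - 1), where k is the number
    of response options per item. Omitted items are simply not counted."""
    return right - wrong / (options - 1)

# A four-option test: 40 right, 8 wrong, 2 omitted.
print(corrected_score(40, 8, 4))  # 40 - 8/3 = 37.33...
```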

    27. 3. Test tryout
    • The tryout sample should represent the ultimate group of test takers (those for whom the test is intended)
    • Good items are reliable and valid, and they discriminate well

    28. Before item analysis, look at the variability of scores within the test
    • Floor effect?
    • Ceiling effect?

    29. 4. Item analysis
    • Helps determine which items should be kept, revised, or deleted.

    30. Item-difficulty index
    • The proportion of examinees who get the item correct.
    • A mean item difficulty can also be computed across all items (see the sketch below).
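
A minimal sketch of the computation, assuming responses are already scored 0/1 (rows = examinees, columns = items); the data are hypothetical:

```python
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

n = len(responses)
p_values = [sum(row[j] for row in responses) / n
            for j in range(len(responses[0]))]
mean_difficulty = sum(p_values) / len(p_values)

print(p_values)         # [0.75, 0.75, 0.25, 1.0]
print(mean_difficulty)  # 0.6875
```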

    31. Ideal item difficulty
    • When using multiple-choice items, try to account for the probability of chance success.
    • Optimal item difficulty = (1 + g) / 2, where g is the chance success rate: for a four-option item, g = .25, so the optimum is (1 + .25) / 2 = .625.
    • The exception to choosing item difficulty around mid-range involves tests of extreme groups.
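
The formula as a one-liner, evaluated for common option counts:

```python
def optimal_difficulty(options):
    """Midpoint between chance success (g = 1/k) and a perfect 1.0."""
    g = 1 / options
    return (1 + g) / 2

for k in (2, 3, 4, 5):
    print(k, round(optimal_difficulty(k), 3))  # 0.75, 0.667, 0.625, 0.6
```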

    32. Item endorsement
    • The proportion of examinees who endorsed the item (the analogue of item difficulty for attitude and personality items).

    33. Item-reliability index
    • An indication of internal consistency
    • The product of the item's standard deviation and the correlation between the item and the total scale score
    • Items with a low reliability index can be eliminated
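
A sketch of the index as defined above; statistics.correlation requires Python 3.10+, and the data are hypothetical:

```python
from statistics import pstdev, correlation

def item_reliability_index(item_scores, total_scores):
    """Item SD multiplied by the item-total correlation."""
    return pstdev(item_scores) * correlation(item_scores, total_scores)

item  = [1, 0, 1, 1, 0]       # 0/1 scores on one item
total = [38, 22, 35, 40, 25]  # total scale scores
print(round(item_reliability_index(item, total), 3))  # ~0.473
```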

    34. Item-validity index
    • Correlate the item with the criterion (helps identify predictively useful test items)
    • The index is the product of the item's standard deviation and the item-criterion correlation.
    • The usefulness of an item also depends on its dispersion, i.e., its ability to discriminate.
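
The item-validity index follows the same pattern, with an external criterion in place of the total score (hypothetical data again):

```python
from statistics import pstdev, correlation  # Python 3.10+

def item_validity_index(item_scores, criterion_scores):
    """Item SD multiplied by the item-criterion correlation."""
    return pstdev(item_scores) * correlation(item_scores, criterion_scores)

item      = [1, 0, 1, 1, 0]
criterion = [4.0, 2.5, 3.5, 4.5, 2.0]  # e.g., supervisor ratings
print(round(item_validity_index(item, criterion), 3))  # ~0.453
```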

    35. Item-discrimination index
    • Indicates how well the item discriminates between high scorers and low scorers on the test.
    • For each item, compare the performance of examinees in the upper range with that of examinees in the lower range.
    • Formula: d = (U - L) / N
    • U = number of people in the upper range who got the item right
    • L = number of people in the lower range who got the item right
    • N = number of people in the upper OR lower range (the two groups are the same size)
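
A sketch of d = (U - L) / N; the slide does not fix the size of the extreme groups, so the common top-27%/bottom-27% split is assumed here:

```python
def discrimination_index(item_correct, totals, fraction=0.27):
    """d = (U - L) / N for one item, splitting on total test score."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    n = max(1, int(len(totals) * fraction))  # size of each extreme group
    lower, upper = order[:n], order[-n:]
    u = sum(item_correct[i] for i in upper)  # correct in upper group
    l = sum(item_correct[i] for i in lower)  # correct in lower group
    return (u - l) / n

item   = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]     # hypothetical 0/1 item scores
totals = [45, 40, 12, 38, 15, 42, 10, 35, 30, 18]
print(discrimination_index(item, totals))   # 1.0: perfect discrimination
```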

    36. Interpreting the IDI
    • The index can vary from -1 to +1.
    • A negative value means the item favors low scorers: more low scorers than high scorers answered it correctly.
    • A value of 0 indicates the item does not discriminate at all.
    • The closer the IDI is to +1, the better the item separates high from low scorers.
    • The IDI approach can also be used to examine the pattern of incorrect responses (distractor analysis).

    37. Item characteristic curves
    • “Graphic representation of item difficulty and discrimination”
    • Horizontal axis = ability
    • Vertical axis = probability of a correct response

    38. An ICC plots the probability of a correct response relative to standing on the entire test.
    • If the curve rises as an incline or an S shape, the item is doing a good job of separating low and high scorers (a sketch of such curves follows).
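
A minimal sketch that draws S-shaped ICCs using the two-parameter logistic form commonly used for such curves; the parameter values are arbitrary, and numpy/matplotlib are assumed to be installed:

```python
import numpy as np
import matplotlib.pyplot as plt

# P(correct) = 1 / (1 + exp(-a * (theta - b)))
# a = discrimination (slope), b = difficulty (location).
theta = np.linspace(-4, 4, 200)  # ability
for a, b in [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0)]:
    p = 1 / (1 + np.exp(-a * (theta - b)))
    plt.plot(theta, p, label=f"a={a}, b={b}")

plt.xlabel("Ability")
plt.ylabel("Probability of a correct response")
plt.title("Item characteristic curves")
plt.legend()
plt.show()
```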

    39. Item fairness
    • Items should measure the same thing across groups
    • Items should have similar ICCs across groups
    • Items should have similar predictive validity across groups

    40. Speed tests
    • Easy, similar items that nearly everyone gets correct
    • What is measured is response time
    • Traditional item analyses do not apply

    41. Qualitative item analysis
    • Test takers' descriptions of the test
    • “Think aloud” administrations
    • Expert panels

    42. 5. Revising the test
    • Revision is based on the information obtained from the item analysis. New items and additional testing of those items may be required.

    43. Cross-validation
    • Once you have the revised test, you need to seek new, independent confirmation of the test's validity.
    • The researcher uses a new sample to determine whether the test predicts the criterion as well as it did in the original sample (a sketch follows).
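
A minimal sketch of the comparison described above, with hypothetical samples; statistics.correlation requires Python 3.10+:

```python
from statistics import correlation

# Test scores and criterion values for the original and the new sample.
orig_test = [55, 48, 62, 40, 58, 45]
orig_crit = [3.9, 3.1, 4.2, 2.5, 4.0, 2.9]
new_test  = [52, 60, 43, 57, 46, 50]
new_crit  = [3.0, 4.1, 2.9, 3.6, 3.2, 2.8]

r_orig = correlation(orig_test, orig_crit)
r_new  = correlation(new_test, new_crit)
print(round(r_orig, 2), round(r_new, 2))  # some shrinkage is typical
```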

    44. Validity shrinkage
    • Typically, with cross-validation, you will find that the test predicts the criterion less accurately in the new sample.

    45. Co-validation
    • Validating two or more tests on the same sample at the same time
    • Called co-norming when norms are developed in the process
    • Saves money
    • Beneficial for tests that are used together

    46. 6. Publishing the test
    • The final step, which involves development of a test manual.

    47. Production of testing materials
    • Testing materials that are user friendly will be better accepted. The layout of the materials should allow for smooth administration.

    48. Technical manual
    • Summarizes the technical data and references. Item analyses, scale reliabilities, validation evidence, etc. can be found here.

    49. User's manual
    • Provides instructions for administration, scoring, and interpretation.
    • The Standards for Educational and Psychological Testing recommend that manuals meet several goals (p. 135); two of the most important:
    • 1. Describe the rationale and recommended uses of the test
    • 2. Provide data on reliability and validity

    50. Testing is big business