Appendix 3 Statistical Properties of Standardized Tests: How to Interpret a Child’s Test Score

Beate Peter

Standardized Tests in Clinical Practice • SLPs use standardized tests routinely as part of a comprehensive assessment • The test yields raw scores plus various standardized scores • These scores need to be interpreted and incorporated into clinical decisions

Articulation and Phonology Tests • Tests vary along the following parameters: • Purpose: articulation or phonological processes • Target sounds: consonants only, consonants plus rhotic vowels, consonants plus all vowels • Weighting of sounds • Sample all sounds in all possible word positions, sum the errors • Sample all sounds in all possible word positions, then weight the errors by how frequently the errored sound occurs in spoken language • Norming sample

Constructing a Hypothetical Standardized Test of Articulation • Raw scores • Premise: children’s abilities change as a function of age; the trajectories differ for boys and girls • Give the test to, say, 400 children • Within given age ranges • Separately for boys and girls • The hypothetical test scale ranges from 0 to 50

Figure A3.1 Dot plot of hypothetical test scores

Figure A3.2 Histogram of test scores consolidated into 14 bins
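As a rough illustration of the hypothetical norming sample and its histogram, the sketch below simulates 400 raw scores on the 0-to-50 scale and sorts them into 14 equal-width bins. The shape of the simulated distribution (roughly bell-shaped around 30), the random seed, and the function names are assumptions made only for this illustration; they are not taken from the figures.

```python
import random

def simulate_raw_scores(n_children=400, low=0, high=50, seed=1):
    """Simulate raw scores for one hypothetical age/sex cohort."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_children):
        # Assumed distribution: mean 30, SD 8, rounded and clipped to the scale.
        score = round(rng.gauss(30, 8))
        scores.append(min(high, max(low, score)))
    return scores

def bin_scores(scores, n_bins=14, low=0, high=50):
    """Count scores in n_bins equal-width bins spanning low..high."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for s in scores:
        # The highest possible score is placed in the last bin.
        index = min(int((s - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

scores = simulate_raw_scores()
counts = bin_scores(scores)

# Text histogram: one row per bin, one '#' per child in that bin.
for i, count in enumerate(counts):
    left_edge = i * 50 / 14
    print(f"from {left_edge:5.1f} | {'#' * count}")
```

A real norming study would repeat this step for every age range and separately for boys and girls.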

Descriptive Statistics • Mean: What was the average of the raw scores? • Sum up the raw scores from all children, then divide by the number of children • Variance: How widely spread were the scores (in units of squared test scores)? • Take a child’s raw score, compute its difference from the mean, then square that difference (so it is never negative); do the same for all the other children’s scores and add up the squared differences, then divide that sum by the number of children minus 1 • Standard deviation: How widely spread were the scores (in units of test scores)? • Take the square root of the variance

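Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • Variance (average sum of squares): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ • Standard deviation: $s = \sqrt{s^2}$ • Here $x_i$ is the raw score of child $i$ and $n$ is the number of children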
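The three descriptive statistics can be sketched in a few lines of code that follow the verbal recipe step by step. The eight raw scores and the function name `describe` are hypothetical and serve only as an illustration.

```python
import math

def describe(raw_scores):
    """Return the mean, sample variance, and standard deviation of raw scores."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    # Squared difference of each score from the mean, summed over all children.
    sum_of_squares = sum((x - mean) ** 2 for x in raw_scores)
    variance = sum_of_squares / (n - 1)   # divide by number of children minus 1
    standard_deviation = math.sqrt(variance)
    return mean, variance, standard_deviation

raw_scores = [22, 25, 28, 30, 30, 32, 35, 38]  # hypothetical raw scores
mean, variance, sd = describe(raw_scores)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

The variance is in squared score units, which is why the standard deviation (its square root) is the more convenient measure of spread on the test's own scale.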

More Information About Norming Samples • Norming distributions do not necessarily follow a normal distribution • Skewness is a measure of asymmetry • Negative skew: left tail is longer than the right tail • Positive skew: right tail is longer than the left tail • Kurtosis is a measure of how flat or peaked the distribution is • Platykurtic distributions: flatter than a normal distribution • Leptokurtic distributions: higher and narrower than a normal distribution
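Skewness and kurtosis can be computed from the central moments of the scores. The sketch below uses one common moment-based definition; statistical packages often apply small-sample corrections, so their values may differ slightly. Kurtosis is reported as excess kurtosis, so a normal distribution is approximately 0, platykurtic distributions are negative, and leptokurtic distributions are positive. All data sets and function names are hypothetical.

```python
def central_moment(values, k):
    """Average of the k-th power of the deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Negative: longer left tail. Positive: longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Negative: flatter than normal. Positive: more peaked than normal."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

left_tailed = [10, 30, 38, 40, 42, 44, 45, 46, 47, 48]   # most children near ceiling
right_tailed = [50 - x for x in left_tailed]             # mirror image
flat = [1, 2, 3, 4, 5]                                   # evenly spread scores
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]                 # most scores at the center

print("skewness, left-tailed sample: ", round(skewness(left_tailed), 2))
print("skewness, right-tailed sample:", round(skewness(right_tailed), 2))
print("excess kurtosis, flat sample:  ", round(excess_kurtosis(flat), 2))
print("excess kurtosis, peaked sample:", round(excess_kurtosis(peaked), 2))
```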

Standardized Test Scores • Z score: A normed test score in units of standard deviations, e.g., -1.2 or 0.5 • Standard score: A linear transformation of the z score to one of several available normed scales (e.g., mean = 100 [SD = 15]; mean = 10 [SD = 3]; mean = 50 [SD = 8]) • T score: A linear transformation of the z score to convert it to a scale that is nearly always positive (multiply by 10, add 50) • Percentile: Out of 100 hypothetical random observations, X were lower than the tested individual. • Example: “Ella obtained a percentile ranking of 29.” = If Ella were one of 100 children taking the test, 29 would obtain a lower score than Ella. • Confidence interval: A measure of how reliable the obtained test score is. • Example: “Ella’s standard score falls into a 90% confidence interval of 89 to 94.” = If the test were to be repeated 100 times, 90 times the true score would be found in the interval between 89 and 94. A higher confidence level, such as 95%, usually requires a wider interval. • Age equivalent: The age at which the child’s raw score equals the median score obtained by children in the norming sample. • Example: “Kyle, age 5;6 (years;months), obtained a raw score of 37, with an age equivalent of 4;6.” = Kyle’s performance was as high as that of half the children in a sample at age 4;6, which makes it look as if he is delayed in his abilities. • Bear in mind that the median score of a group of same-age children is not a meaningful way to describe the performance of a tested child.
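The conversions between score types are simple arithmetic, as the following sketch shows. The raw score, the cohort mean and SD, and the function names are hypothetical. The percentile conversion assumes that the norming scores follow a normal distribution; test manuals typically provide lookup tables based on the actual norming sample instead.

```python
import math

def z_score(raw, cohort_mean, cohort_sd):
    """Distance of the raw score from the cohort mean, in SD units."""
    return (raw - cohort_mean) / cohort_sd

def standard_score(z, scale_mean=100, scale_sd=15):
    """Linear transformation of z to a chosen scale (default: mean 100, SD 15)."""
    return scale_mean + scale_sd * z

def t_score(z):
    """Multiply by 10, add 50."""
    return 50 + 10 * z

def percentile_from_z(z):
    """Percentage of scores below z, assuming a normal distribution."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical child: raw score 27 in a cohort with mean 30 and SD 5.
z = z_score(27, cohort_mean=30, cohort_sd=5)
print(f"z score:        {z:.2f}")
print(f"standard score: {standard_score(z):.0f}")
print(f"scaled score:   {standard_score(z, scale_mean=10, scale_sd=3):.1f}")
print(f"T score:        {t_score(z):.0f}")
print(f"percentile:     {percentile_from_z(z):.0f}")
```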

The Quality of a Standardized Test • For some tests, the score distribution for an age/sex cohort does not follow the normal (Gaussian) distribution. In that case, a standard score of 100 does not necessarily correspond to the 50th percentile. • Norming samples vary, so be sure to look under the hood to learn which cohort you are comparing your proband child to. Here are just some of the variables: • Size • Age • Sex • Ethnicity • SES • Geographical regions • Inclusion or exclusion of children with disabilities
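A small numerical sketch shows why a standard score of 100 need not sit at the 50th percentile. In the hypothetical, negatively skewed sample below, a child who scores exactly at the mean (z = 0, standard score 100) has outperformed only 30% of the sample, not 50%. The ten scores and the function name are invented for the illustration.

```python
def percentile_rank(score, norming_scores):
    """Percentage of norming scores that are lower than the given score."""
    below = sum(1 for s in norming_scores if s < score)
    return 100 * below / len(norming_scores)

# Hypothetical negatively skewed cohort: most children score near the ceiling.
norming_scores = [10, 30, 38, 40, 42, 44, 45, 46, 47, 48]

mean_score = sum(norming_scores) / len(norming_scores)
rank_of_mean = percentile_rank(mean_score, norming_scores)

print(f"mean raw score (standard score 100): {mean_score}")
print(f"percentile rank of that score:       {rank_of_mean}")
```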

Comparing Some Norming Samples

Validity and Reliability • Validity: What the test measures • Concurrent validity: Correlations with other closely or distantly related tests administered to the same individuals at the same time • Predictive validity: Correlations with other tests administered to the same individuals later • Construct validity: Correlations with measures that are known to target the same ability or trait • Reliability: How consistently the test performs its purpose • Inter-rater reliability: Correlation between test scores obtained by different clinicians • Test-retest reliability: Correlation between test scores obtained in a first and second administration of the test • Internal consistency: Correlations between test scores from two halves of the test items
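All of these validity and reliability measures rest on correlations between two sets of scores from the same individuals. As a minimal sketch, the code below computes a Pearson correlation for a hypothetical test-retest comparison; the same function could be applied to scores from two clinicians or from two halves of the test items. The scores and names are invented, and test manuals may report other coefficients.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and each list's sum of squares.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical raw scores of the same eight children on two administrations.
first_administration = [22, 25, 28, 30, 30, 32, 35, 38]
second_administration = [24, 24, 29, 31, 29, 33, 34, 39]

test_retest = pearson_r(first_administration, second_administration)
print(f"test-retest reliability: r = {test_retest:.2f}")
```

Values close to 1 indicate that children keep roughly the same rank order across the two sets of scores.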

From Standardized Test Scores to Clinical Decisions • How should standard scores be interpreted? • How can they be used in clinical decisions? • The answers depend, in part, on given guidelines and resources • Most agree that scores > -1 standard deviation do not qualify for treatment • Some settings (by federal, state, or local guidelines) require a score < -1.5 standard deviations (about 7th percentile) or even < -2 standard deviations (about 2nd percentile) • In some settings, the nature of the speech errors determines whether a child qualifies for services, e.g., • Must have speech sounds across at least two classes in error • Must have difficulty saying his/her own name • Must demonstrate reduced access to/benefit from instruction at school • Rules for qualifying a child for treatment depend on • Other clinical observations • Nature of speech errors • Overall profile of strengths and needs • Impact of the disorder in daily life
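The percentiles quoted for the cutoffs can be checked with the normal distribution. The sketch below assumes normally distributed norming scores; the function names and the example child are hypothetical, and an actual eligibility decision would also weigh the clinical observations listed above.

```python
import math

def percentile_from_z(z):
    """Percentage of scores below z, assuming a normal distribution."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

def below_cutoff(z, cutoff):
    """True if the child's z score falls below the setting's cutoff."""
    return z < cutoff

for cutoff in (-1.0, -1.5, -2.0):
    print(f"cutoff z = {cutoff:+.1f} -> about percentile {percentile_from_z(cutoff):.1f}")

# Hypothetical child with z = -1.7 under two different guidelines.
child_z = -1.7
print("below the -1.5 SD cutoff:", below_cutoff(child_z, -1.5))
print("below the -2.0 SD cutoff:", below_cutoff(child_z, -2.0))
```

The same score can therefore qualify a child in one setting and not in another, depending on which cutoff the guidelines specify.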
