EDN 523 Educational Research Validity and Educational Reform Accountability Models
Validity and Educational Reform
In the current age of educational reform, high-stakes decisions based on large-scale testing performance are becoming increasingly common. The decisions associated with test performance carry significant consequences (e.g., rewards and sanctions). The degree of confidence in, and the defensibility of, test score interpretations depends upon valid, consistent, and reliable measurement. Stated differently, as large-scale assessment becomes more visible to the public, the roles of validity (and reliability) become more important.
Validity is a test’s most important characteristic.
It is the degree to which a test measures what it is supposed to measure and, as a consequence, allows and supports appropriate interpretations of the scores.
What is this test really measuring?
Constructs or concepts can be non-observable traits, such as intelligence, which are “invented” terms to explain educational outcomes.
Using this information, we created a construct called intelligence that is related to learning and that everyone possesses to a greater or lesser degree.
Let’s say Billy Bob owns a testing company in Texas. Billy Bob has designed an IQ test that he is marketing to public schools across the country.
If we wanted to determine whether Billy Bob’s IQ Test has construct validity, then we would need to carry out several validation studies.
Relating Billy Bob’s Test to Dr. Kozloff’s presentation last week: if the findings “say” that a student has an IQ of 140, and this is confirmed when compared with the student’s scores on other, well-established IQ tests and other measurements, then these “matches” strengthen the construct validity of Billy Bob’s Test as an intelligence-measuring instrument.
Content validity is based on professional judgments about the relevance of the test content to the particular domain of interest, such as the NC Standard Course of Study, and about the representativeness with which the items and/or tasks on the instrument “cover” that domain.
The Content Validity of the NC ABC Accountability Tests is directly related to whether or not the items, tasks, and concepts are aligned with the domains, constructs, and/or variables we are attempting to assess and measure.
The establishment of evidence of test relevance and representativeness of the “target” domains is a critical first step in being able to say that the test score interpretations are based on strong content validity.
Professor Kozloff discussed program decision making last week relative to evaluating whether or not a new reading program would be needed if students have low reading test scores.
He recommended that we ask the following questions:
How was the reading evaluated?
Did the reading test DIRECTLY measure reading skills?
Math Example: If we simply look at data on math ability among individuals, it initially appears that shoe size is directly related to a person’s math skills. Each time we measure a person’s math ability and compare it to their shoe size, there is a strong positive correlation: the larger a person’s shoe size, the better they are at solving higher-level math problems.
How is that possible?
Shoe Size ↔ Math Ability
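The usual explanation is a lurking (confounding) variable: age. Older children have both bigger feet and more schooling, so shoe size and math ability rise together without either causing the other. A minimal simulation can make this concrete; the data and effect sizes below are hypothetical, chosen only so that age drives both variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: age (6-18) drives BOTH shoe size and math score,
# so the two appear correlated even though neither causes the other.
age = rng.uniform(6, 18, 1000)
shoe_size = 0.5 * age + rng.normal(0, 0.5, 1000)
math_score = 5.0 * age + rng.normal(0, 5.0, 1000)

overall_r = np.corrcoef(shoe_size, math_score)[0, 1]

# Hold the confound (age) roughly constant: look at 10-year-olds only.
mask = (age >= 10) & (age < 11)
within_r = np.corrcoef(shoe_size[mask], math_score[mask])[0, 1]

print(f"overall r = {overall_r:.2f}")            # strongly positive
print(f"within one age band r = {within_r:.2f}") # near zero
```

Within a single age band, the apparent shoe-size/math relationship largely disappears, which is exactly what we expect when a third variable is doing the work.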
Education Level ↔ Earning Power
College Graduate’s Earning Power
High School Graduate’s Earning Power
Are they related? You bet your derriere they are!
College graduates’ average lifetime earnings = $2.2 million
HS graduates’ average lifetime earnings = $1.2 million
Difference = $1 million
$1 million ÷ 30-year work life = $33,333 more per year
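The per-year figure is simply the lifetime earnings gap divided by a 30-year work life:

```python
college_lifetime = 2_200_000  # average lifetime earnings, college graduate
hs_lifetime = 1_200_000       # average lifetime earnings, HS graduate

lifetime_gap = college_lifetime - hs_lifetime  # $1,000,000
per_year = lifetime_gap / 30                   # 30-year work life

print(f"${per_year:,.0f} more per year")  # $33,333 more per year
```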
It is estimated that there are 3.8 million youth between the ages of 18 and 24 who are neither employed nor in school—roughly 15 percent of all young adults. Since 2000 alone, the ranks of these non-engaged young adults grew by 700,000, a 19 percent increase over 3 years (Annie E. Casey Foundation, 2004).
Two Data Sources:
U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics (2005). Justice Expenditure and Employment Extracts.
Anderson, D. A. (1999). “The Aggregate Burden of Crime.” Journal of Law and Economics.
Total = $170 billion
Cost of goods and services that would be unnecessary in a country with low crime rates:
Total = $233.3 billion
In addition to the direct cost of resources devoted to crime, there is a sizable loss of time by people who are potential victims of crime and by those who have committed crimes. And as the saying goes, time is money.
Total = $44.7 billion
We also need to add the cost of property and money that is stolen or obtained through fraud:
Total = $603 billion
Grand Total = $1.12 trillion per year
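As a quick arithmetic check, the four itemized components sum to $1,051 billion, roughly $70 billion short of the stated $1.12 trillion grand total, so the grand total presumably includes components not itemized in these notes. The component labels below are paraphrases of the slide text:

```python
# Cost-of-crime components itemized above, in billions of dollars per year
costs_billion = {
    "justice system and related crime costs": 170.0,
    "goods/services unnecessary with low crime": 233.3,
    "time lost by potential victims and offenders": 44.7,
    "property and money stolen or taken by fraud": 603.0,
}

itemized_total = sum(costs_billion.values())
print(f"itemized components: ${itemized_total:,.1f} billion")  # $1,051.0 billion
# The slide's grand total of $1.12 trillion is about $70B higher,
# suggesting additional components not itemized here.
```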
Gateway or Barrier to a prosperous, successful life
If you have it, you have opportunities
If you don’t have it, you have a cross to bear
What is the primary reason students drop out of school and do not graduate with a diploma?
THEY CANNOT READ!
Let’s see…educated citizenry, economic well-being, secure and prosperous nation, successful individuals capable of providing for themselves and their families…what do you think?
Legislation passed in 1995, sponsored by then-state senator Beverly Perdue, required the State Board of Education to reorganize the NCDPI and to develop an accountability plan for the state.
The result was the ABCs of Public Education, which included a plan to revise the Standard Course of Study.
The General Assembly accepted the accountability plan and subsequently enacted Senate Bill 1139, the School-Based Management and Accountability Program.
The Goal of North Carolina's Reading First (NCRF) initiative is to ensure that all children learn to read well by the end of the third grade. This goal will be accomplished by applying scientifically based reading research to reading instruction in all North Carolina schools. The initiative requires phonics instruction.
Alignment is a key issue in standardized testing, inasmuch as it provides a means for establishing evidence for score interpretation.
Test validity is not a static quality; it is an evolving property and a continuing process.
Test content validity is based on professional judgments about the relevance of the test content to the content of a particular behavioral domain of interest, and about the representativeness with which items and tasks cover that domain.
If a test is designed to measure reading achievement and a test score is judged relative to a set proficiency standard (i.e., a cut score), the interpretation of reading proficiency will be heavily dependent on a match (or alignment) between test content and content-area expectations.
Alignment provides an avenue for establishing evidence for score interpretation. Evaluating test alignment should occur regularly, taking its place in the recurring process of assessment development and revision of the testing instruments. In addition, tests should be regularly assessed and evaluated following established standards.
The "Standards for Educational and Psychological Testing," established by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, are intended to provide a comprehensive approach for evaluating tests based on key standards applicable to most test evaluation situations (Rudner, 1994).
If we consider these questions relative to the NC ABCs Testing Model, then there must be a clear statement of recommended uses, a meaningful description of the population for which the test is intended, and a valid representation of the NC Standard Course of Study (NCSCOS).
The individuals in the norming and validation samples should represent the group for which the test is intended in terms of age, experience and background.
In an effort to evaluate the validity of the NC ABCs Accountability System, the State Board of Education established a plan in March 2005 that utilizes the Standards for Educational Accountability Systems model developed by the National Center for Research on Evaluation, Standards, and Student Testing.
1. Accountability expectations should be made public for all participants
in the system.
2. Accountability systems should employ many different types of data
from multiple sources.
3. Accountability systems should include data elements that allow for
interpretations of student, institution, and administrative performance.
4. Accountability systems should include the performance of all students,
including subgroups that historically have been difficult to assess.
5. The weighting of elements in the system, including different types of
test content and different information sources, should be made explicit.
6. Rules for determining adequate progress of schools and individuals
should be developed to avoid erroneous judgments attributable to
fluctuations of the student population or errors in measurement.
7. Decisions about individual students should not be made on the basis of a single test.
8. Multiple test forms should be used when there are repeated administrations
of an assessment.
9. The validity of measures that have been administered should be documented for the various purposes.
10. If tests are to help improve system performance, there should be
information provided to document that test results are modifiable by quality
instruction and student effort.
11. If test data are used as a basis of rewards or sanctions, evidence of
technical quality of the measures and error rates associated with
misclassification of individuals or institutions should be published.
12. Evidence of test validity for students with different language backgrounds
should be made public.
13. Evidence of test validity for children with disabilities should be made public.
14. If tests are claimed to measure content and performance standards,
analysis should document the relationship between the items and specific standards or sets of standards.
N.C.G.S. 115C-105.35, Section 7.12 (a)
VALIDITY OF ABC ACCOUNTABILITY SYSTEM
During the 2004-2005 school year and at least every five years thereafter, the State Board shall evaluate the accountability system and, if necessary, modify the testing standards to assure the testing standards continue to reasonably reflect the level of performance necessary to be successful at the next grade level or for more advanced study in the content area.
As part of this evaluation, the State Board shall, where available, review the historical trend data on student academic performance on State tests.
To the extent that the historical trend data suggests that the current standards for student performance may not be appropriate, the State Board shall adjust the standards to assure that they continue to reflect the State's high expectations for student performance.
SECTION 7.12.(b) The State Board shall complete its initial evaluation and any necessary modifications to the testing standards required under G.S. 115C-105.35, as rewritten by subsection (a) of this section, so that the modified standards are in effect no later than the 2005-2006 school year.
A review of the original growth formulas found that:
Statewide ABCs growth over time, by grade level, forms a saw-toothed pattern of gains and dips in the percent of schools meeting and exceeding growth targets in reading or mathematics as a cohort of students moves from grade to grade.
The percent of schools meeting or exceeding growth expectations in reading or mathematics does not appear to be highly correlated with curricular implementation (i.e., a historically high percent of schools met and exceeded expectations in the first year of testing a new curriculum).
North Carolina’s ABCs Accountability Model uses a quasi-longitudinal approach wherein at least a year’s worth of growth for a year of schooling is expected. The average rate of growth observed across the state as a whole, from one grade in the spring of one year to the next grade in the spring of the next year, serves as the benchmark improvement for students in a given grade. Comparisons to expected growth are used to classify schools into one of four categories: exemplary schools, schools meeting expected growth, schools having adequate performance, and low-performing schools.
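The classification logic can be sketched in code. This is an illustrative simplification, not the actual ABC formula (the real model uses composite growth scores with state-set coefficients); the thresholds and the `classify_school` helper are hypothetical:

```python
def classify_school(actual_growth: float, expected_growth: float,
                    pct_proficient: float) -> str:
    """Simplified sketch of an ABCs-style four-category classification.

    NOT the actual ABC formula; the 10% exemplary margin and the 50%
    proficiency cutoff are illustrative assumptions only.
    """
    if actual_growth >= 1.1 * expected_growth:  # well above the benchmark
        return "exemplary growth"
    if actual_growth >= expected_growth:        # met the statewide benchmark
        return "met expected growth"
    if pct_proficient >= 50.0:                  # fell short, but adequate performance
        return "adequate performance"
    return "low-performing"

print(classify_school(5.5, 5.0, 72.0))  # exemplary growth
print(classify_school(4.0, 5.0, 40.0))  # low-performing
```

The key idea the sketch captures is that schools are judged primarily on growth relative to the statewide benchmark, with absolute performance used to separate the bottom two categories.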
The quasi-longitudinal model is recommended by most research and evaluation groups and organizations as a more “accurate” means of assessing public school performance. If that is true, how does NC’s Accountability Model compare to other states?
The National Assessment of Educational Progress (NAEP), also known as "the Nation's Report Card," is the only nationally representative and continuing assessment of what America's students know and can do in various subject areas. Since 1969, assessments have been conducted periodically in reading, mathematics, science, writing, US history, civics, geography, and the arts.
The National Assessment of Educational Progress (NAEP) is a congressionally mandated project of the National Center for Education Statistics (NCES), within the Institute of Education Sciences at the U.S. Department of Education.
NAEP has two major goals:
1) to discover what American students know and can do in key subject areas, and
2) to measure educational progress over long periods of time.
A scale score, derived from student responses to NAEP assessment items, summarizes the overall level of performance attained by a group of students.
NAEP subject area scales typically range from 0 to 500 (reading, mathematics, history, and geography) or from 0 to 300 (science, writing, and civics).
When used in conjunction with interpretive aids, such as item maps, they provide information about what a particular aggregate of students in the population knows and can do.
NAEP reports information for the nation and specific geographic regions of the country. It includes students drawn from both public and nonpublic schools and reports results for student achievement at grades 4, 8, and 12.
It also provides state comparison information.
Since 1990, NAEP assessments have been conducted to give results for participating states. Those that choose to participate receive assessment results that report on the performance of students in that state. In its content, the state assessment is identical to the assessment conducted nationally.
However, because the national NAEP samples were not, and are not currently, designed to support the reporting of accurate and representative state-level results, separate representative samples of students are selected for each participating jurisdiction/state.
NC’s NAEP rankings (out of 40 participating states):
4th-grade reading
4th-grade math
8th-grade reading
8th-grade math
Public access to all NAEP information, data, and reports is available on the following web sites:
The validity (and reliability) of high-stakes testing has a long way to go. NCLB has established the beginning of an accountability system that attempts to ensure high-quality public schools across the country. Regardless of how we feel about the mandate and its subsequent requirements, we should be compelled to continue our quest to measure student performance and “how we are doing,” for the following reasons:
North Carolina ranks 34th in college attendance rates among the 50 states. 85% of all jobs by 2010 will require 14 or more years of education.
Less than 80% of adult North Carolinians have completed high school.
Only 3 states have lower high school completion rates than North Carolina.
Less than a third of NC's fourth and eighth graders scored in the proficient range on standardized tests.
NC places 47th in a state comparison of SAT scores. The placement improves to 32nd when the scores are adjusted for the number of students being tested.
Twelve percent of teens ages 16 to 19 in NC are not enrolled in school and are not high school graduates. When the number of adults holding college degrees in NC is compared with other states, we're dropping further behind, from 37th to 39th.
Questions & Comments