An International Partnership for Improving the Quality of Admissions Tests


Presentation Transcript


1. An International Partnership for Improving the Quality of Admissions Tests IAEA 2007 Baku, Azerbaijan Good Morning [afternoon]. I’d like to welcome all of you to our session on An International Partnership for Improving the Quality of Admissions Tests. My name is Linda Cook. I work at Educational Testing Service in the United States and it is my pleasure to co-chair this session with Dr. Maleyka Abbaszade, the Director of the State Students Admission Commission of Azerbaijan. This morning [afternoon] we are going to tell you about a partnership between SSAC and ETS to improve the quality of the SSAC University Entrance Examinations.

2. An International Partnership for Improving the Quality of Admissions Tests State Students Admission Commission (SSAC) Maleyka Abbaszade Oleg Shelaginov Educational Testing Service (ETS) Luis Saldivia Linda Cook The speakers in this morning’s [afternoon’s] session will be Maleyka Abbaszade, the Director of the State Students Admission Commission; Oleg Shelaginov, [add Oleg’s title]; Luis Saldivia, an Assessment Specialist in the ETS Test Development area; and myself, a Research Scientist in the Center for Validity Research at ETS.

3. Project Background IAEA 2004 Conference USAID and IIE Sponsored Training in the US ETS Evaluation of SSAC Admissions Testing Program Development of Action Plan Assistance in Implementing the Action Plan Before we begin our presentations today, I thought it might be useful to provide you with some background about the project that we are going to talk about. Following a conference of the IAEA that was held in 2004, Dr. Abbaszade approached the U.S. Ambassador in Azerbaijan and asked for support to improve the university entrance examination system in Azerbaijan. As a result of this contact, the United States Agency for International Development (USAID) and the Institute of International Education (IIE) agreed to support representatives from the SSAC, as well as other stakeholders in higher education in Azerbaijan, to travel to Princeton, New Jersey, in the United States, to participate in seminars provided by the Global Institute of ETS. In addition, the SSAC obtained sponsorship from USAID and World Learning to support a team of experts from ETS (Luis, Mary Pitoniak, who cannot be with us today, and myself) to travel to Baku in the spring of 2006 to provide an onsite evaluation of the university entrance examinations program and to develop an Action Plan to assist the SSAC in improving their tests and test administration procedures. The ETS team was fortunate to be able to return to Baku in the spring of 2007 to work with SSAC to implement specific steps in the Action Plan. The presentation you will hear this morning [afternoon] will describe the collaboration that took place between ETS and the SSAC during the two visits of the ETS team to Baku.

4. Overview of Presentation SSAC Entrance Examinations Maleyka Abbaszade The ETS Audit Process and Results Linda Cook Implementation of Action Plan; Onsite Training Luis Saldivia Changes in the SSAC Testing Program Oleg Shelaginov Dr. Abbaszade will begin our presentation this morning [afternoon] by providing you with some of the history and background of the SSAC entrance examinations. When Dr. Abbaszade has finished, I will say a few words about the ETS audit process and the results of our audit of the SSAC entrance examinations. Next, Luis Saldivia will talk about the action plans that ETS and SSAC prepared to implement some of the suggested changes and he will also talk about the onsite training that took place in Baku this past spring, and finally Oleg Shelaginov will tell you about some changes that have been made in the SSAC entrance examinations as a result of this project. Now, I’ll turn things over to Maleyka Abbaszade for a brief discussion of the SSAC entrance examinations.

7. Who we are: a public organization immediately subordinate to the President of the Republic of Azerbaijan, founded in 1992

8. What we do: administering entrance examinations testing candidates for the civil service providing a wide range of on-line educational and informational services approximately 100,000 applicants annually

10. The challenge was to develop a system which: ensures equal access to the higher education sector provides transparency of the examination process minimizes risk in decision-making applies objective scientific approaches in knowledge assessment sets high standards aligned with international practice.

11. Mission Statement To ensure equal access to higher education by eradicating partiality and nepotism in the university entrance process

15. The Internet Services subsystem includes a wide range of free on-line services: registration of applicants pretest or trial test access to examination results database on secondary schools access to educational materials media releases

16. Internet Services Subsystem Websites created by SSAC: www.tqdk.gov.az www.abiturient.az – the website of the journal “Abiturient”. www.mekteb.edu.az – an information resource on Azerbaijani schools. www.elachi.edu.az – online preparation courses. www.polyglot.az – portal of the “Polyglot” dictionaries system prepared by SSAC. www.unicode.az – a portal devoted to solving problems concerning the usage of the Azerbaijani alphabet following its 1992 change from Cyrillic to Latin script.

17. Besides these websites, we produce: digital materials TV programmes educational publications, like our bestseller journal “Abiturient” in your conference folder

22. The ETS Audit Process Linda Cook What I am going to do now is to give you a very brief overview of the ETS audit process, which is the process that we used to evaluate the SSAC entrance examinations, and then I will provide a very brief summary of the audit results.

23. The ETS Audit Process Developed at ETS to evaluate our products and services Takes place annually Each ETS product or service is reviewed once every three years The ETS audit process was developed by ETS to evaluate our own products and services. At ETS, the audit process takes place annually, with each ETS product or service being reviewed once every three years.

24. ETS Standards for Quality and Fairness Developed by ETS experts Closely aligned with the Standards for Educational and Psychological Testing American Educational Research Association American Psychological Association National Council on Measurement in Education The ETS audit process consists of evaluating our products and services against the ETS Standards for Quality and Fairness, which were developed by ETS experts and which are closely aligned with the Standards for Educational and Psychological Testing developed by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education.

25. The ETS External Audit or Evaluation Service Provides international institutions with objective feedback on their testing processes Testing programs undergoing this process have agreed to be evaluated with respect to a uniform, rigorous set of standards through a well-documented process The ETS audit program does not issue any type of certification or accreditation The audit program offers guidance on how to remedy specific instances of noncompliance with internationally accepted standards of fair and valid testing The purpose of the ETS Standards is to help ETS, as well as other organizations who use the standards, to develop and deliver technically sound, fair, and useful products and services. ETS offers an audit service, using the ETS Standards, to national and international institutions. Through this audit or evaluation service, ETS provides international institutions with objective feedback on their testing processes. A testing program such as SSAC that agrees to undergo the ETS audit process has agreed to be evaluated with respect to a uniform, rigorous set of standards through a well-documented process. It is important to note that the ETS audit program does not issue any type of certification or accreditation, but it does offer guidance on how to remedy specific instances of noncompliance with the ETS Standards, a set of well-documented and accepted standards for fair and valid testing.

26. Evaluation of the SSAC Entrance Examinations Evaluation focused on five key sets of standards Reliability Validity Cut scores, scaling, and equating Assessment development Assessment use For the evaluation of the SSAC entrance examinations, ETS and the SSAC agreed to focus on five key sets of standards, with each set related to one of the following five areas: validity, reliability, cut scores, scaling and equating, assessment development, and assessment use. What I’d like to do next is to briefly describe each of these sets of standards.

27. Purpose of the Validity Standards The purpose of the validity standards is to help ensure that a testing program gathers and documents evidence that supports the intended uses, inferences, and actions that may be based on the scores that are reported for the assessment. There are eight validity standards. The standards focus on aspects of the assessment such as how clearly the construct is defined, and what evidence is provided to support the validity of inferences that are made from the assessment scores. Read from slide.

28. Purpose of the Reliability Standards The purpose of the reliability standards is to help ensure that scores or other reported assessment results will be sufficiently reliable to meet their intended purposes and that the program is using appropriate procedures for determining and reporting reliability. There are six reliability standards that focus on the level of reliability of scores reported by the program, the quality and completeness of the information that is reported to score users, and the appropriateness of procedures used to determine reliability. Read from slide.
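To make the idea of an internal-consistency reliability index concrete, here is a minimal sketch of Cronbach's alpha, one common way to estimate score reliability. This is purely illustrative: the response matrix and the function name are invented, and the SSAC program may well use different indices or procedures.

```python
# Illustrative only: Cronbach's alpha as one internal-consistency
# reliability index. Rows of `responses` are examinees; columns are
# items scored 0/1. All data are invented.

def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix of item scores."""
    n_items = len(responses[0])

    def variance(xs):
        # Population variance of a list of numbers.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    total_var = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 3))
```

Values near 1 indicate highly consistent item scores; admissions programs typically require substantially higher reliability than the toy matrix above would yield.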

29. Purpose of the Cut Score, Scaling, and Equating Standards The purpose of the cut score, scaling, and equating standards is to help ensure that assessments will use score-reporting scales that are meaningful, that cut scores have been established rationally, that the procedures used are clearly described, and that assessments that are meant to be linked to other assessments will have a level of comparability that supports the use of the test scores. There are four cut score, scaling and equating standards that focus on the types of scales, the procedures used to establish the scales, and the information provided to score users about the studies carried out to establish the scales and cut scores for the assessment. Read from slide.

30. Purpose of the Assessment Development Standards The purpose of the assessment development standards is to help ensure that assessments will be constructed using planned, documented processes that include advice from diverse people; formative and summative evaluations; and attention to fairness, reliability, and validity. There are eight assessment development standards that focus on such things as the content of the assessment, the intended population, the development of test specifications, and the procedures used to write and review items and tests. Read from slide.

31. Purpose of the Assessment Use Standards The purpose of the assessment use standards is to help ensure that a testing program provides information that describes and encourages proper assessment use and warns intended users of assessment results to avoid common misuses of the assessment. There are six assessment use standards that focus on providing users with information that will help them interpret scores and avoid misuse of assessment results. Read from slide. Next, I’d like to spend just a few minutes telling you about the results of our audit.

32. Results of SSAC Audit SSAC Entrance Examinations Program strong when evaluated against 5 sets of standards Key strengths SSAC Staff’s technical skills and openness Dr. Abbaszade’s leadership Our audit results showed that the SSAC entrance examinations program was very strong when evaluated against the five sets of standards that I’ve just described. I don’t have time this morning to describe all of the strengths of the program. Certainly the major strength of the program is the SSAC staff and Dr. Abbaszade’s leadership. The staff are very well trained and were very open in their discussions with us. Consequently, it was easy for us to work together to understand the strengths of the program and the areas that we needed to help them focus on for improvement. Also, Dr. Abbaszade’s leadership contributes very significantly to the strength of the program and contributed greatly to the success of the audit. Dr. Abbaszade’s willingness to embrace change and her proactive seeking out of ways to improve the entrance examinations are a tremendous asset to the testing program and helped make the audit a very productive process.

33. Results of SSAC Audit Model of communications with the public Test preparation practices Test development practices Test assembly and item classification software Careful match with curriculum Diversity of contributors to test content Test security procedures There are several specific strengths of the program that I would like to mention. One strength is the model of communication that the entrance examinations have used to communicate with students, parents, teachers, and other score users. The publication, the Abiturient, which is produced a number of times during the year, is an excellent example of how to communicate clearly and often with the public about the tests and testing procedures. SSAC is also very careful to provide ample opportunity for examinees to become familiar with their tests before they actually take them for credit. They do this through a unique pretest program that administers tests on a weekly basis. We were impressed by a number of test development practices. The staff at SSAC have developed and use sophisticated test assembly and item classification software to assemble their tests and to evaluate the results of testing. In addition, the tests are very carefully matched to the curriculum and updated often to reflect changes in the curriculum. We were also impressed by the diversity of the contributors who establish the test content. SSAC conducts 17 seminars a year to gather input for the test content from content experts. Finally, the manner in which the SSAC protects the security of the test is quite extraordinary. SSAC staff realize that protecting the security of the tests is an important fairness issue and is key to the credibility of the test scores that they report, and they have put extremely effective test security procedures in place.

34. Recommendations for Improvement Simplification of current score reporting scale Study comparability of scores on some tests Document procedures used for scaling and setting cutscores Carry out cutscore studies Strengthen item writing and reviewing procedures Formalize and document procedures used to train item writers and reviewers In spite of the quality of the testing program, we were able to identify some areas for improvement, which resulted in 16 specific recommendations for change or development. Again, because I have so little time today, I’m not able to go into much detail. Some of the more important recommendations that we made were associated with the equating, scaling, and cutscore guidelines and with the test development guidelines. For example, we made recommendations related to the simplification of the current score reporting scale and also related to studying the comparability of scores on some of the tests to determine if equating procedures might be needed. We also recommended that procedures used for scaling and setting cutscores be documented and that, in the future, SSAC might consider carrying out additional studies to evaluate the properties of some of the cutscores that are currently used. Some examples of the types of recommendations we made for the test development process are recommendations that the item writing and reviewing procedures be strengthened and documented and that the procedures that are currently used to train item writers and reviewers be formalized and documented. I’m going to turn things over to Luis now, who will tell you about the action plans we developed jointly with SSAC and about the training we provided to help SSAC staff carry out some of these action plans.

35. Action Plans and Implementation Luis Saldivia What I am going to do now is to give you a very brief overview of the ETS audit process, which is the process that we used to evaluate the SSAC entrance examinations, and then I will provide a very brief summary of the audit results.

36. Development of the Action Plans Prioritization of Technical Recommendations After we presented the results of our evaluation to the SSAC and other stakeholders, a meeting was held to develop an action plan for each of the recommendations. First, the recommendations were prioritized. Second, a separate action plan consisting of major milestones, start and end dates, responsibility for the milestone, and resources that would be needed to accomplish each milestone was drafted for each recommendation. The first step undertaken during the action planning meeting was to place each of the 16 recommendations into one cell of a table by priority and by time period. Discussion among the stakeholders resulted in the recommendations being prioritized as follows.

37. Action Plans Specifying Critical Elements for Each Recommendation What are the major milestones? Who is responsible for accomplishing them? What are the start and end times? Where do the resources come from? After priorities and time periods were considered, the following elements for each recommendation were determined:

38. Action Plans

39. 2007 Activities Two main types of activities were conducted: workshops providing overviews of three main topic areas related to the recommendations of the action plans: Scaling and equating Cutscores Test development consultations about the feasibility of implementing the best practices reviewed in the workshops In May 2007, the ETS team returned to Baku for a period of five days in order to provide a series of training workshops for the staff of SSAC and to provide consultation on the implementation of several of the high-priority activities identified in the 2006 work plan.

40. Equating and Scaling Two parts Training Workshop on Equating and Scaling Consultation on accomplishing Tasks #8 and #10 from the Work Plan Carrying out a research study to ensure that the difficulty levels of the variants are comparable Carrying out a research study to compare the difficulty levels of the pretest and the operational test The equating and scaling consultation provided to staff from the State Students Admission Commission (SSAC) had two parts.

41. Equating and Scaling The specific topics covered in the Equating and Scaling Workshop were: Why is equating necessary? Definitions of score equating Classical Test Theory methods for score equating Data collection designs Equating models Equating item statistics
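One of the Classical Test Theory methods covered in topics like those listed above, linear equating under an equivalent-groups design, can be sketched in a few lines: scores on a new form are placed on the reference form's scale by matching means and standard deviations. The score data and function name below are invented for illustration and do not come from the SSAC tests:

```python
# Hedged sketch of linear equating (Classical Test Theory, equivalent-groups
# design): a form-Y raw score y maps to x = mean_X + (sd_X / sd_Y) * (y - mean_Y),
# so the two forms' score distributions are aligned in mean and spread.
import statistics

def linear_equate(scores_x, scores_y):
    """Return a function mapping form-Y raw scores onto the form-X scale."""
    mx, sx = statistics.mean(scores_x), statistics.pstdev(scores_x)
    my, sy = statistics.mean(scores_y), statistics.pstdev(scores_y)
    return lambda y: mx + (sx / sy) * (y - my)

# Invented data: form Y ran uniformly 4 points harder than form X.
form_x = [52, 60, 45, 70, 58, 63, 49, 55]
form_y = [48, 56, 41, 66, 54, 59, 45, 51]

to_x_scale = linear_equate(form_x, form_y)
print(to_x_scale(50))  # a form-Y raw score of 50 corresponds to 54.0 on form X
```

The key design decision in practice is the data collection design (equivalent groups, single group, or anchor items), which determines whether this simple mean/sigma alignment is defensible.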

42. Accomplishment At the end of two days of instruction, SSAC staff had a working knowledge of scaling and equating procedures The workshop allowed the SSAC staff to focus on planning next steps for the SSAC entrance examinations These procedures included choosing and implementing an appropriate data collection design as well as choosing and implementing an appropriate scaling or equating model.

43. Cutscores A workshop with four goals: define standard setting describe different types of standards review the steps used when setting standards on educational tests describe ways to evaluate the validity of standards The goals of the workshop were fourfold. The first goal was to define standard setting; that is, to clarify that the purpose of setting cutscores is to determine who will pass a test vs. fail it, or obtain a particular level of achievement on a test. The second goal was to describe different types of standards, including those that are relative, or norm-referenced, in which the scores of other examinees determine whether a particular individual’s score will result in her passing the test or failing it; and those that are absolute, or criterion-referenced, in which levels of mastery of content are the primary consideration. The third goal of the workshop was to review the steps used when setting standards on educational tests. This included an overview of the most commonly used standard setting methods. The final goal of the workshop was to describe ways to evaluate the validity of standards. After the standard-setting session has been completed, the critical activities of evaluating the proceedings must begin, and different approaches to doing so were reviewed.

44. Accomplishment We reviewed that standard setting involves determining performance levels on a test. We discussed the cutscores on the SSAC tests. Two different types of standards were reviewed: relative and absolute. We discussed the types of standards that exist for SSAC tests. We discussed the advantages and disadvantages of relative and absolute cutscores, both in general and in reference to the SSAC context. The process of setting a cutscore on a test was also reviewed.
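To make the relative (norm-referenced) idea concrete, here is a toy sketch, not tied to any actual SSAC data: the cutscore is whatever score admits a fixed top fraction of examinees, so it moves with the score distribution rather than with content mastery.

```python
# Invented scores for ten examinees.
scores = [45, 52, 60, 61, 67, 70, 74, 78, 85, 91]

def relative_cutscore(scores, top_fraction):
    """A norm-referenced cutscore: the lowest score that still falls within
    the top `top_fraction` of the score distribution."""
    ranked = sorted(scores, reverse=True)
    n_admitted = int(len(ranked) * top_fraction)
    return ranked[n_admitted - 1]

print(relative_cutscore(scores, 0.3))  # admit the top 30% -> cutscore 78
```

An absolute (criterion-referenced) cutscore, by contrast, would be fixed in advance by judgments about content mastery and would not change if the score distribution shifted.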

45. Accomplishments An additional topic reviewed was that of errors of classification. SSAC staff were taken step by step through the procedures generally used in a standard-setting study conducted in order to set criterion-referenced cutscores. The details of several different standard-setting methods were discussed. We reviewed the criteria that can be evaluated in order to determine whether a cutscore is valid for a given test use. Two test-centered methods were discussed (the Angoff method and the Bookmark method); in addition, two examinee-centered methods were discussed (the Borderline Group method and the Contrasting Groups method).
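As an illustration of the test-centered approach, the Angoff method mentioned above can be sketched in a few lines. The ratings here are invented: each panelist judges, for each item, the probability that a borderline (minimally competent) examinee would answer correctly; each panelist's sum is an implied raw cutscore, and those are averaged across panelists.

```python
# Hypothetical ratings: rows = panelists, columns = items; each value is the
# judged probability that a borderline examinee answers that item correctly.
ratings = [
    [0.6, 0.7, 0.5, 0.8],  # panelist 1
    [0.5, 0.8, 0.4, 0.9],  # panelist 2
    [0.7, 0.6, 0.5, 0.7],  # panelist 3
]

def angoff_cutscore(ratings):
    """Sum each panelist's item probabilities (that panelist's implied raw
    cutscore), then average across panelists."""
    per_panelist = [sum(row) for row in ratings]
    return sum(per_panelist) / len(per_panelist)

print(round(angoff_cutscore(ratings), 2))  # -> 2.57 (out of 4 raw-score points)
```

A real Angoff study would add training, feedback rounds, and impact data before the panel converges on final ratings; this sketch shows only the arithmetic.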

46. Test Development The first goal was to describe the principles of effective item writing that have a solid basis in the research literature. The second goal was to discuss the steps that typically must be accomplished in the development of sound educational measurements. The third and last goal was to introduce the SSAC test development staff to some of the practices applied when tests are developed under Evidence Centered Design. The nature and quality of test items is critical to the development of tests that can be validly interpreted and used. To achieve the appropriate quality of the items, it is essential to train item writers in recognized principles of effective item writing. We intended to train the SSAC staff in those principles so they can then use them to train current and future item writers. Having an awareness of all the steps involved in the test development process helps test developers to establish a framework and an organization for their work.

47. Accomplishments SSAC test developers were presented with an overview of the test development process: the definition of the assessment; writing and reviewing questions; pretesting items; evaluating the results; assembling operational forms; conducting post-test activities. We discussed the importance for test developers of having a clear definition of the purpose of a test, as well as of the intended uses and inferences to be made from test scores. Since items should be developed to respond to the purposes of the test, we emphasized the importance of the purpose of the test in guiding the test development activities. We discussed the steps to follow when test developers have a set of specifications, a pool of items, and the assignment to assemble a test. At this stage, the assembler's concerns are essentially of two types: 1) those related to meeting the content specifications and 2) those related to meeting the statistical specifications. Which of these should have priority is an open question depending on the purpose of the test. In practice, compromise is often necessary, and it is not uncommon to have to give slightly more weight to one or the other, depending upon the circumstances. The test needs to adhere to a detailed set of content specifications and to a set of statistical specifications, such as mean difficulty level, standard deviation of difficulties, and a level of discrimination. A set number of items for each difficulty level may even be specified. Moreover, the equating block in the final form, if used, must reflect the content and the statistical specifications too. We also discussed testing time issues. The extent to which the candidates may not have sufficient time to respond to the items in the test is another factor that enters into the assembly of a test. Speededness may reduce the number of items available for future equating blocks. Although many of the discussed details of assembling tests into final forms might seem routine and mundane, we emphasized the fact that the product of this step is the test that the examinees will encounter. Test assembly errors reduce validity by introducing construct-irrelevant variation to the test and might lead to the invalidation of the score for some test items, which potentially reduces the content-based validity evidence for the test. We pointed out that the test assembler assumes the responsibility for the quality of the test.
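A toy sketch of the statistical side of test assembly described above: greedily pick items from a pool so that the form's mean difficulty tracks a target. The pool, the item p-values, and the target are all invented for illustration; real assembly would also have to satisfy content specifications, discrimination targets, and equating-block constraints.

```python
# Hypothetical item pool: (item_id, p_value), where p_value is the
# proportion of examinees answering correctly at pretest.
pool = [("A", 0.35), ("B", 0.50), ("C", 0.62), ("D", 0.70), ("E", 0.45), ("F", 0.80)]

def assemble(pool, n, target_mean_p):
    """Greedily choose n items, at each step picking the item that keeps the
    running mean difficulty closest to the target."""
    chosen = []
    remaining = list(pool)
    for _ in range(n):
        best = min(
            remaining,
            key=lambda item: abs(
                (sum(p for _, p in chosen) + item[1]) / (len(chosen) + 1) - target_mean_p
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

form = assemble(pool, 3, 0.55)
print([item_id for item_id, _ in form], sum(p for _, p in form) / 3)
```

Production assemblers typically solve this as a constrained optimization over many specifications at once; the greedy heuristic above only shows why statistical specifications constrain which items can appear on a form.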

48. Accomplishments In the second part of the workshop, we discussed different item review processes. In the third part of the workshop, we discussed the importance of training the item writers. Each of these processes aimed to produce a test that conforms to professional and organizational quality standards. During this part of the discussion we analyzed the current SSAC item review process and suggested some areas where the process could be improved. We discussed the minimum standards, below which SSAC should not go, when developing a test. We also discussed the importance of documenting the procedures followed during the item review process. We presented and discussed a definition of an item which emphasizes the compromise between the content and the psychometric purposes of the item. We discussed the value of the item as a unit of measurement and its function for inferring the existence of a construct and the relative degree to which knowledge of the construct might be exhibited by a particular examinee. Then we discussed some criteria for developing good-quality items. In particular, a description was given of why the skilled item constructor must possess a deep understanding of how examinees respond to test items, including awareness of how a particular test item may be perceived by different examinees. Then we discussed the necessity of trying to minimize measurement errors. We talked about statistical techniques to detect bias, but also about judgmental approaches to detect bias. We then discussed the psychometric assumptions items should meet. At the end of this part of the workshop, we discussed some examples of graphical item analysis.

49. Accomplishments In the next part of the workshop, we discussed the guidelines for developing good multiple-choice items. We described the guidelines for correctly reviewing items. We described the purpose of the fairness review. We also discussed the importance of including an editorial review early in the item review process. We described the very good reasons why multiple-choice items have traditionally been used in high-stakes tests. A total of 53 guidelines and many examples for writing good items were described and discussed. Because the person who writes an item has great difficulty seeing all of its flaws, and even experienced item writers will occasionally produce grossly flawed items or often produce items that can be improved by a good reviewer, we described the guidelines for correctly reviewing items, not only to catch errors in bad items but also to suggest improvements in acceptable items. A checklist with 38 questions to be answered while reviewing items was provided and discussed. The purpose of the fairness review is to identify any construct-irrelevant factors that might plausibly prevent the members of a group of examinees from responding to the test in ways that allow appropriate inferences about the examinees' knowledge, skills, and abilities.

50. Accomplishments In the last part of the workshop we briefly discussed some of the work that test developers do when using Evidence Centered Design (ECD). In particular, we discussed how ECD provides a strong foundation for the validity argument by requiring documented, explicit linkage among the claims to be made about the examinees, the evidence supporting those claims, and the examinees' responses to items that provide the evidence. We also discussed how the documentation requirement of ECD helps increase the clarity of communication among the many people who run a complicated testing program.

52. Decision-making environment

59. Think globally, act locally
