Diapositiva 1

Measuring Success in English for Young People - PowerPoint PPT Presentation


Measuring Success in English for Young People. Annabelle G. Simpson Director, Channel Management, ETS Global Division. Outline. Who is ETS? Two Families of Products: TOEFL® & TOEIC® How does ETS develop quality tests? What is TOEIC® Bridge? What is TOEFL® Junior?. ETS: Our Mission

Presentation Transcript
Diapositiva 1

Measuring Success in English for Young People

Annabelle G. Simpson, Director, Channel Management, ETS Global Division

Diapositiva 1


  • Who is ETS?

  • Two Families of Products: TOEFL® & TOEIC®

  • How does ETS develop quality tests?

  • What is TOEIC® Bridge?

  • What is TOEFL® Junior?

Diapositiva 1

ETS: Our Mission

To Advance Quality and Equity in Education for All People Worldwide

  • We do this by providing:

  • Fair, valid and reliable assessments

  • Education research

  • Products and services that measure knowledge and skills, promote learning and educational performance, and support education and professional development

Diapositiva 1

Two Families of English Assessments: TOEFL® & TOEIC®



  • TOEFL Junior TOEIC Bridge

  • Coming soon….

Diapositiva 1

The Origins of ETS Work with Young People

  • English proficiency is an increasingly important skill for students and young adults worldwide

    • - Expanding access to educational, personal and professional opportunities

  • EFL instruction is beginning at earlier ages

  • English-medium instructional environments take many forms internationally:

    • - Public and private schools in English-dominant countries

    • - International schools in non English-dominant countries

    • - Schools in any country using bilingual or CLIL approaches

    • - Vocational schools

  • Responds to aspirations of students as they attain English-language proficiency

Diapositiva 1


  • Before discussing how ETS develops quality tests, I will discuss what we mean by “quality” in testing.

  • Then I will discuss the major steps in test development that are required to create a high quality test.

Diapositiva 1

What Is a Quality Test?

  • A quality test must be

  • Reliable

  • Valid

  • Fair

  • Practical

Diapositiva 1


  • A test is only a sample.

  • The items are a sample of all the items that could be asked.

  • The time of testing is a sample of all the times that the test could be given.

  • The person scoring the essay is a sample of all possible scorers.

Diapositiva 1

Reliability Is Consistency

  • If a test taker’s knowledge is constant, how consistent would scores be if the samples changed?

  • Parallel items were used?

  • The test was taken on a different day?

  • Different judges were used for scoring essays?

  • The higher the reliability, the more consistent the scores will be.

Diapositiva 1

Factors That Determine Reliability

  • All other things being equal,

  • the more independently scored items, the higher the reliability

  • the more the items correlate with each other, the higher the reliability

  • the greater the variability of scores, the higher the reliability
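The first two factors can be illustrated with a short sketch. The slides do not say which reliability coefficient is used, so this example assumes two standard textbook formulas: Cronbach's alpha (internal consistency, which rises as items correlate more strongly) and the Spearman-Brown formula (projected reliability when a test is lengthened with comparable items). The function names and the sample data are illustrative only.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of test takers, each a list of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item across test takers.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total scores across test takers.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def spearman_brown(reliability, length_factor):
    """Projected reliability if the test is made length_factor times longer."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Hypothetical right/wrong (1/0) responses: 4 test takers, 3 items.
    responses = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
    print("alpha:", cronbach_alpha(responses))
    # Doubling the number of comparable items raises the projected reliability.
    print("doubled length:", spearman_brown(0.5, 2))
```

Both formulas assume the added or compared items measure the same thing; they are a rough guide, not a substitute for an operational reliability study.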

Diapositiva 1


  • Most important indicator of test quality

  • Extent to which inferences based on test scores are appropriate & supported by evidence

  • Requires evidence to support the use of the test for the intended purpose

Diapositiva 1

Evidence of Validity

  • Qualifications of test designers

  • Process used to develop test

  • Qualifications of item writers and reviewers

  • Statistical indicators of item quality and fairness

  • Expert judgments of test content

Diapositiva 1

Evidence of Validity

  • Match of items to content standards

  • Relations among parts of the test

  • Relations of scores with other variables

  • Results fit with theories

  • Claims for use of test are met

  • Good consequences

Diapositiva 1

Fairness = Validity for All

  • Fairness is an aspect of validity.

  • Tests that show valid differences across groups are fair.

  • Tests that cause invalid differences across groups are not fair.

Diapositiva 1


  • Tests must be affordable in dollar costs and in time used.

  • Scores must be understandable & helpful to score-users.

  • Items must be acceptable to diverse constituencies.

  • Every test is a compromise among competing demands.

Diapositiva 1

Major Steps in Test Development

  • 1) Make Initial Plan for Test

  • 2) Involve External Experts

  • 3) Write/Review Items

  • 4) Pretest Items (Whenever Possible)

  • 5) Review Data & Revise Items

  • 6) Assemble Final Test

Diapositiva 1

Major Steps (continued)

  • 7) Administer Tests

  • 8) Checks Before Scoring

  • 9) Scaling & Equating

  • 10) Test Analyses

  • 11) Report Scores

  • 12) Begin Planning for Next Form

Diapositiva 1

1) Plan Test

  • Purpose

    • What is test used for?

    • What decisions made on the basis of the scores?

  • Population

    • What are characteristics of test takers?

  • Construct

    • Content & skills

Diapositiva 1

Plan Test

  • What constraints on test design?

    • Time, cost, format, scoring, etc.

  • Initial plan for test development work

    • Major tasks, schedule, staff

  • Evidence-Centered Design

    • What claims about test takers?

    • What evidence supports claims?

    • What tasks provide evidence?

Diapositiva 1

2) Involve External Experts

  • Diverse (demographic, geographic, institutional, point of view) external contributors required in test design, item writing and reviewing.

  • Diverse experts help establish acceptability, validity and fairness.

Diapositiva 1

Tasks of External Experts

  • Set/approve test specifications

    • What content to measure?

    • What skills to measure?

    • What statistical properties?

  • Write and review test items

  • Select items for final form

Diapositiva 1

3) Write/Review items

  • Make item-writing assignments

    • Write items to meet specifications

    • Write overage for attrition

  • Internal & external reviews & revisions

    • At least 2 independent content reviews per item

    • Separate editorial review

    • Separate fairness review

Diapositiva 1

3) Write/Review items

  • Question (Item) Author

    • Artwork/graphics

  • Content Reviewer 1

  • Content Reviewer 2

  • Content Reviewer 3

  • Edit

  • Fairness

  • Resolver

    • Studio recording

  • Lock

Diapositiva 1

4) Pretest

  • When possible, try out items before operational use. This gives information to:

  • Identify problem items (ambiguous, wrong difficulty, poor discrimination. For MC: no key, multiple keys, bad distracter)

  • Pick most appropriate items to meet specifications

  • Estimate final form characteristics from item data
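A minimal sketch of how pretest data can surface problem items, assuming right/wrong (1/0) scoring. It computes two classical statistics: item difficulty (proportion correct) and item discrimination (correlation between the item and the score on the rest of the test). The flagging thresholds are hypothetical placeholders, not ETS criteria.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """responses: list of test takers, each a list of 0/1 item scores."""
    stats = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        # Score on the remaining items, so the item is not correlated with itself.
        rest = [sum(row) - row[i] for row in responses]
        stats.append({
            "difficulty": sum(item) / len(item),
            "discrimination": pearson(item, rest),
        })
    return stats


def flag_items(stats, min_p=0.2, max_p=0.9, min_disc=0.2):
    """Return indexes of items outside the (hypothetical) acceptable ranges."""
    return [
        i for i, s in enumerate(stats)
        if not (min_p <= s["difficulty"] <= max_p) or s["discrimination"] < min_disc
    ]


if __name__ == "__main__":
    # Hypothetical pretest data: 5 test takers, 4 items.
    data = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
    stats = item_statistics(data)
    print(stats)
    print("flagged items:", flag_items(stats))
```

A negative discrimination value means stronger test takers tend to miss the item, which often points to a wrong key or an ambiguous question.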

Diapositiva 1

Use Differential Item Functioning (DIF)

  • DIF = statistical measure of how matched people in different groups perform on an item.

  • DIF helps spot items that may be unfair.

  • DIF is NOT proof of bias.
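The idea of comparing "matched people" can be sketched with the Mantel-Haenszel procedure, a widely used DIF statistic. The slides do not name the statistic used, so treat the choice as an assumption. Test takers are matched on a score (here, a hypothetical total score), and within each matched level the odds of answering the item correctly are compared between a reference group and a focal group. The group labels and record layout are illustrative.

```python
from collections import defaultdict
from math import log


def mantel_haenszel_dif(records):
    """records: iterable of (group, matching_score, correct).

    group is "ref" or "focal"; correct is 1 or 0.
    Returns (common odds ratio, value on the delta scale).
    """
    # For each matched score level, count [correct, incorrect] per group.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        strata[score][group][0 if correct else 1] += 1

    num = den = 0.0
    for cell in strata.values():
        a, b = cell["ref"]    # reference group: correct, incorrect
        c, d = cell["focal"]  # focal group: correct, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("not enough data to estimate the odds ratio")

    odds_ratio = num / den
    # Delta-scale transformation; negative values mean the item is
    # relatively harder for the focal group.
    return odds_ratio, -2.35 * log(odds_ratio)


if __name__ == "__main__":
    # Hypothetical data: at matched score 10, both groups do equally well.
    records = (
        [("ref", 10, 1)] * 6 + [("ref", 10, 0)] * 4
        + [("focal", 10, 1)] * 6 + [("focal", 10, 0)] * 4
    )
    print(mantel_haenszel_dif(records))
```

An odds ratio near 1 (delta near 0) indicates little DIF. A large value only flags the item for review; as the slide says, it is not proof of bias.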

Diapositiva 1

Uses of DIF

  • If data available, tests assembled with low DIF items.

  • If no data at assembly, DIF calculated after administration.

  • High DIF items reviewed and removed before test is scored, if judged unfair.

  • External people involved in reviews.

Diapositiva 1

5) Review Data & Revise Items

  • Review test items based on data

    • Ensure accuracy, clarity

    • Appropriate difficulty

    • Acceptable discrimination

  • Revise or drop problem items

  • Write new items if necessary to meet specifications

Diapositiva 1

6) Assemble Final Test

  • Choose set of items from pool according to specifications

  • Perform test reviews

    • Meet content, skill, & statistical specifications

    • Check for overlap, cueing of keys

    • Correctness of keys

Diapositiva 1

7) Test Administration

  • Print or format for computer

  • Quality control checks

  • Ship securely

  • Administer test

    • Acceptable conditions (space, comfort, light, temperature)

    • Security (copying, impersonation, prior knowledge)

Diapositiva 1

8) Checks Before Scoring

  • Investigate complaints & reports

  • Preliminary Item Analysis (PIA)

    • Identify “problem” items based on statistics (too hard, too easy, poor discrimination, change from pretest)

    • Review items to decide if keep in test or drop before scoring

  • DIF, if not done previously

Diapositiva 1

Checks Before Scoring

  • Check for anomalies (sudden drops or increases in scores) that may indicate problems

Diapositiva 1

9) Scaling & Equating

  • Raw scores are number right or percent right on a particular test form.

  • 50% right on a hard test form may take more knowledge & skill than 60% right on an easy test form.

  • Raw scores mean different things on different test forms.

  • ETS very rarely reports raw scores

Diapositiva 1

Scaling & Equating

  • A scale is an arbitrary range of numbers used to report scores, e.g., 200-800 for SAT, 150-190 for PPST.

  • Equating is a statistical adjustment for differences in the difficulty of different forms of the same test.

  • Equating allows us to treat the scores on different forms of a test as though they meant the same thing.

Diapositiva 1

Scaling & Equating

  • If a form happens to be a little harder than the others, it will take fewer raw score points to reach a particular scale score point.

  • If a form happens to be a little easier than the others, it will take more raw score points to reach a particular scale score point.

  • Scaled scores, after equating, mean the same on each form
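The adjustment described above can be sketched with linear equating, one of the simplest equating methods. This is an illustration only: the slides do not say which method ETS uses, and the sketch assumes the two forms were taken by equivalent groups. A raw score on the new form is mapped to the raw score on the reference form that sits the same number of standard deviations from the mean. All scores are made up.

```python
from statistics import mean, pstdev


def linear_equate(raw_x, form_x_scores, form_y_scores):
    """Map a raw score on form X onto the raw-score metric of form Y."""
    mx, sx = mean(form_x_scores), pstdev(form_x_scores)
    my, sy = mean(form_y_scores), pstdev(form_y_scores)
    return my + (sy / sx) * (raw_x - mx)


if __name__ == "__main__":
    # Hypothetical raw scores from equivalent groups on two forms.
    harder_form = [20, 25, 30, 35, 40]     # lower scores: the form is harder
    reference_form = [26, 31, 36, 41, 46]
    # A raw score on the harder form is credited as a higher score
    # on the reference form's metric.
    print(linear_equate(30, harder_form, reference_form))
```

Once raw scores from every form are expressed on the reference form's metric, a single raw-to-scale conversion can be applied, which is why fewer raw points are needed on a harder form to reach a given scale score.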

Diapositiva 1

10) Test Analyses

  • Analysis of final form characteristics.

  • Distribution of item difficulty & discrimination

  • Reliability

  • Speededness

  • Did test meet content & statistical specifications? If not, where were problems?

Diapositiva 1

11) Report Scores

  • Explain what scores mean so scores are understandable to test users

  • Indicate Standard Error of Measurement on score report
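For readers unfamiliar with the Standard Error of Measurement (SEM), the classical formula is SEM = SD × sqrt(1 − reliability), where SD is the standard deviation of scores. The sketch below applies it and builds a score band around a reported score. The numbers are hypothetical and are not taken from any ETS test.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Classical SEM from the score standard deviation and the reliability."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * sqrt(1 - reliability)


def score_band(score, sem, z=1.0):
    """Band of plus or minus z SEMs around a reported score."""
    return score - z * sem, score + z * sem


if __name__ == "__main__":
    # Hypothetical values: score SD of 10 and reliability of 0.91.
    sem = standard_error_of_measurement(10, 0.91)
    print("SEM:", sem)
    print("band around 250:", score_band(250, sem))
```

The higher the reliability, the smaller the SEM and the narrower the band within which a test taker's score would be expected to fall on retesting.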

Diapositiva 1

12) Plan Next Form

  • What was learned from this administration to make the next administration of the test better?

  • What has to change for next form?

Diapositiva 1

About TOEFL® Junior™

Diapositiva 1

A TOEFL® product for a Younger Generation

  • A distinct product within the growing TOEFL® family of products

  • A natural extension of the TOEFL brand, but specifically geared to the language learning needs of middle grade students

    • - Informed by reviews of research and relevant standards

    • - Based on years of experience developing international assessments of English language proficiency for both adults and K12 students

  • Meets ETS Standards for Quality and Fairness

  • Builds upon ETS’s expertise in English language assessment for young learners.

  • TOEFL® products set the standard for English proficiency worldwide

    Diapositiva 1

    The Paper-Based Test is designed to provide useful Information

    • Purpose is to assess the degree to which students aged 11-15 have attained language proficiency representative of middle school English-medium instruction

    Diapositiva 1

    TOEFL Junior Structure

    • Format:

    • Paper

    • Three Sections:

      • Listening

      • Reading

      • Language Form and Meaning

    Diapositiva 1

    TOEFL Junior Structure

    • Listening Comprehension:

      • This section tests how well students understand spoken English.

      • Number of Questions: 42

      • Section administered by CD. Students are asked to answer questions based on a variety of statements, questions, conversations and talks recorded in English.

      • Total time: approximately 35–40 minutes.

    • Question Types

      • Classroom Instruction

      • Short Conversations

      • Academic Listening

    Diapositiva 1

    Sample Listening Item

    • (Narrator): Listen to a high school principal talking to the school’s students.

    • (Man): I have a very special announcement to make. This year, not just one, but three of our students will be receiving national awards for their academic achievements. Krista Conner, Martin Chan, and Shriya Patel have all been chosen for their hard work and consistently high marks. It is very unusual for one school to have so many students receive this award in a single year.

    •  (Narrator): What is the subject of the announcement?

    • What is the subject of the announcement?

    • (A) The school will be adding new classes.

    • (B) Three new teachers will be working at the school.

    • (C) Some students have received an award.

    • (D) The school is getting its own newspaper.

    Diapositiva 1

    TOEFL Junior PBT Structure

    • Reading Comprehension:

      • - This section tests how well students read and comprehend written English. Students read a variety of materials.

      • - Number of Questions: 42 questions.

      • Total time: 50 minutes.

    • Question Types

      • - Non-academic

      • - Academic

    Diapositiva 1

    Sample Reading Item

    • Questions are about the following announcement.

    What time will the festival begin?

    (A) 10 a.m.

    (B) 11 a.m.

    (C) 1 p.m.

    (D) 2 p.m.

    Diapositiva 1

    TOEFL Junior PBT Structure

    • Language Form and Meaning:

      • This section assesses key language skills such as grammar and vocabulary in context.

      • The section includes 42 questions.

      • Total time: approximately 25 minutes.

    • Question Types:

      • Language Meaning

      • Language Form

    Diapositiva 1

    Sample Language Form and Meaning Item

    • Questions - refer to the following e-mail.

    Diapositiva 1

    Score Report

    • Section scores for Listening, Language Form and Meaning, and Reading

      • Section Scale Scores

      • Listening Comprehension 200-300

      • Language Form & Meaning 200-300

      • Reading Comprehension 200-300

      • Total Score 600-900

    • The TOEFL Junior score report provides a description of the English-language abilities typical of test takers scoring around a particular scaled score level. There are four possible descriptions for each section of the test

    • Link to the Common European Framework of Reference

    • Lexile measure

    Diapositiva 1

    Listening Descriptions

    • Test takers who score between 210 and 245 may have the following strengths:

      • They can understand the main idea of a brief classroom announcement if it is explicitly stated.

      • They can understand important details that are explicitly stated and reinforced in short talks and conversations.

      • They can understand direct paraphrases of spoken information when the language is simple and the context is clear.

      • They can understand a speaker’s purpose in a short talk when the language is simple and the context is clear.

    Diapositiva 1

    Common European Framework of Reference for Languages (CEFR)

    Important Note: CEFR levels are context-dependent.

    A B2 for middle school is not the same as a B2 for adults.

    Diapositiva 1

    Appropriate Use of the TOEFL® Junior Test

    • Appropriate for low- to medium-stakes decisions

    • Provides a general standard to measure the proficiency levels of students aged 11-15, representative of English-medium instructional environments

    • Serves as one piece of information supporting placement into programs designed to increase proficiency levels of these EFL students

    • Provides information about student progress in developing English language proficiency over time

    Diapositiva 1

    The TOEFL® Junior Test is NOT…

    • …based on any specific curriculum

    • …directly linked to TOEFL iBT scores

    • …intended to predict performance on the TOEFL iBT test

    • …for use to support high-stakes decisions such as for admissions purposes or criterion-based exit testing

    • …a substitute for TOEFL iBT, TOEFL pBT or TOEFL ITP

    Diapositiva 1

    Participating Countries

    • Latin America

      • Brazil, Chile

    • Asia

      • China, Indonesia, Japan, Korea, Vietnam

    • Europe

      • Bulgaria, France, Greece, Italy, Poland, Turkey

    • Middle East

      • Egypt, Gaza/West Bank, Lebanon, Morocco

    Diapositiva 1

    The TOEIC Bridge™ Test

    Diapositiva 1

    What is the TOEIC Bridge™ Test?

    • A test to measure the emerging competencies of beginning learners of English

    • A tool to help language learners focus on areas for improvement

    Diapositiva 1

    Why use the TOEIC Bridge™ Test?

    • To measure beginner English proficiency

    • To motivate English Language Learners

    • To set language learning goals

    Diapositiva 1

    How is the TOEIC Bridge™ Test different from the TOEIC® Listening and Reading Test?

    • The TOEIC Bridge™ test takes only one hour. The TOEIC® test takes two hours.

    • There are 100 questions in the TOEIC Bridge™ test, 200 in the TOEIC® test.

    • The TOEIC Bridge™ test has only five parts; the TOEIC® test has seven parts.

    • There is more time between questions in the TOEIC Bridge™ test.

    • In the TOEIC Bridge™ test, the speakers speak more slowly.

    Diapositiva 1


    • TOEIC Bridge™ test questions are easier.

    • TOEIC Bridge™ test questions cover more general topics.

    • The scaled score range on the TOEIC Bridge™ is from 20 to 180; on the TOEIC® test, scores are on a scale of 10 to 990.

    • The TOEIC Bridge™ test is a low-stakes test; the TOEIC® test is a high-stakes test.

    Diapositiva 1

    Test Format

    • Two sections:

    • Section I: Listening Comprehension – Candidates listen to a variety of statements, questions, short conversations, and short talks, and answer 50 questions (tape mediated).

    • Three Parts:

      • Photo-based (15 questions)

      • Question-Answer (20 questions)

      • Conversations and Short Talks

    • Section II: Reading Comprehension – Candidates read single sentences as well as texts and answer 50 comprehension questions.

      • Two Parts:

        • Incomplete sentences (30 questions)

        • Reading Comprehension (20 questions)

    Diapositiva 1

    TOEIC Bridge™ Content Areas

    • Animals

    • Basic objects

    • Clothing

    • Dates/days/time

    • Entertainment

    • Family members

    • Food/dining out

    • Games

    • Health

    • Housing/residence

    • Measurement

    • Money

    • Months

    • Music

    • Numbers

    • Recreation/hobbies

    • School subjects

    • Shopping

    • Sports

    • Travel/transportation

    • Weather

    • Work

    Diapositiva 1


    • Total scores range from 20 - 180

    • Listening and Reading subscores range from 10 – 90

    • Test administration time is approximately 1.5 hours

    • Test scoring – (under operational conditions) 24-48 hours in most locations

    Diapositiva 1

    • The scores are based on the number of correct responses.

    • The correct responses in each section (Listening and Reading) are converted to a score scale. The range of the scale is from 10 – 90 for each section.

    • Summing the scores of the sections produces a total scaled score. The range of the total score is then 20 – 180.

    Diapositiva 1

    CEFR Ratings

    • The TOEIC Bridge test ranges from the A1 level to the B1 level.

    Diapositiva 1

    For Sample Test Questions for TOEFL Junior and TOEIC Bridge:


    • http://www.ets.org/toeicbridge