
Using On-line Tutoring Records to Predict End-of-Year Exam Scores

Experience with the Assistments Project and MCAS 8th Grade Mathematics

Neil Heffernan, Ken Koedinger, Brian Junker

with Mingyu Feng, Beth Ayers, Nathaniel Anozie, Zach Pardos, and many others

http://www.assistment.org

Funding from US Department of Education, National Science Foundation (NSF), Office of Naval Research, Spencer Foundation, and the US Army

2006 MSDE / MARCES Conference


The ASSISTment Project

  • Web-based 8th grade mathematics tutoring system

  • ASSIST with, and ASSESS, progress toward the Massachusetts Comprehensive Assessment System (MCAS) exam

    • Guide students through problem solving with MCAS released items

    • Predict students’ MCAS scores at end of year

    • Provide feedback to teachers (what to teach next?)

  • (Generalize to other States…)

  • Over 50 workers at Carnegie Mellon, Worcester Polytechnic Institute, Carnegie Learning, Worcester Public Schools

The ASSISTment Tutor

  • Main Items: Released MCAS or “morphs”

  • Incorrect Main “Scaffold” Items

    • “One-step” breakdowns of main task

    • Buggy feedback, hints on request, etc.

  • All items coded by source, teacher-generated “curriculum”, and transfer model (Q-matrix)

  • Student records contain responses, timing data, bugs/hints, etc.

  • System tracks students through time, provides teacher reports per student & per class.

    • MCAS prediction

    • Skills learned/not-learned, etc.
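
A minimal illustration (hypothetical field names, not the actual ASSISTment schema) of what one logged response record and a transfer-model (Q-matrix) entry might look like:

    # Illustrative sketch only; field names are hypothetical.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class ResponseRecord:
        student_id: str
        item_id: str
        is_scaffold: bool        # main item vs. scaffold question
        correct: bool
        seconds: float           # time spent on the attempt
        hints_requested: int
        bug_feedback_shown: int  # "buggy" feedback messages triggered

    # A transfer model (Q-matrix) maps each item to the skills it requires.
    q_matrix: Dict[str, List[str]] = {
        "item_0042_main": ["coordinate-plane", "table-reading"],
        "item_0042_scaffold_1": ["coordinate-plane"],
    }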

The ASSISTment Architectures

  • Extensible Tutor Architecture

    • Scalable from simple pseudo-tutors with a few users to model-tracing tutors with thousands of users

    • Curriculum Unit

      • Items organized into multiple curricula

      • Sections within curriculum: Linear, Random, Experimental, etc.

    • Problem & Tutoring Strategy Units

      • Task organization & user interaction (e.g. main item & scaffolds, interface widgets, …)

      • Task components mapped to multiple transfer models

    • Logging Unit

      • Fine-grained human-computer interaction trace

      • Abstracting/coarsening mechanisms

  • Web-based Item Builder

    • Used by classroom teachers to develop content

    • Support for building curricula, mapping tasks to transfer models, etc.

  • Relational Database and Network Architecture supports

    • User Reports (e.g., students, teachers, coaches, administrators)

    • Research Data Analysis

  • See Razzaq et al. (to appear) for an overview

Goals and Complications

  • Two Assessment Goals

    • To predict end-of-year MCAS scores

    • To provide feedback to teachers (what to teach next?)

  • Some Complications

    • Assessment ongoing throughout the school year as students learn (from teachers & from ASSISTments!)

    • Multiple skills models for different purposes

    • Scaffold questions: For tutoring or for measurement?

    • Deliberate ready-fire-aim user-assisted development

2004-2005 Data

  • Tutoring tasks

    • 493 main items

    • 1216 scaffold items

  • Students

    • 912 eighth-graders in two middle schools

  • Skills Models (Transfer Models / Q Matrices)

    • 1 “Proficiency”: Unidimensional IRT

    • 5 MCAS “strands”: Number/Operations, Algebra, Geometry, Measurement, Data/Probability

    • 39 MCAS learning standards: nested in the strands

    • 77 active skills: “WPI April 2005” (106 potential)

Static Prediction Models

  • Feng et al. (2006 & to appear):

    • Online testing metrics

      • Percent correct on main/scaffold/both items

      • “assistance score” = (errors+hints)/(number of scaffolds)

      • Time spent on (in-)correct answers

      • etc.

    • Compare paper & pencil pre/post benchmark tests

  • Ayers and Junker (2006):

    • Rasch & LLTM (linear decomps of item difficulty)

    • Augmented with online testing metrics

  • Pardos et al. (2006); Anozie (2006):

    • Binary-skills conjunctive Bayes nets

    • DINA models (Junker & Sijtsma, 2001; Maris, 1999; etc.)

Static Models: Feng et al. (2006 & to appear)

  • What is related to raw MCAS (0-54 pts)?

  • P&P pre/post benchmark tests

  • Online metrics:

    • Pct Correct on Mains

    • Pct Correct on Scaffolds

    • Seconds Spent on Incorrect Scaffolds

    • Avg Number of Scaffolds per Minute

    • Number of Hints Plus Incorrect Main Items

    • etc.

  • All annual summaries
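
A minimal sketch of how such annual summaries could be computed from a response log (column names are hypothetical, and exact metric definitions may differ from Feng et al.'s):

    import pandas as pd

    def annual_metrics(log: pd.DataFrame) -> pd.Series:
        """log: one row per attempt, with columns is_scaffold, correct (0/1),
        hints, seconds (all hypothetical names)."""
        mains = log[~log.is_scaffold]
        scafs = log[log.is_scaffold]
        return pd.Series({
            "pct_correct_mains": mains.correct.mean(),
            "pct_correct_scaffolds": scafs.correct.mean(),
            # "assistance score" = (errors + hints) / (number of scaffolds)
            "assistance_score": ((1 - scafs.correct).sum() + scafs.hints.sum()) / len(scafs),
            "sec_on_incorrect_scaffolds": scafs.loc[scafs.correct == 0, "seconds"].sum(),
            "scaffolds_per_minute": len(scafs) / (scafs.seconds.sum() / 60.0),
            "hints_plus_incorrect_mains": log.hints.sum() + (1 - mains.correct).sum(),
        })

    # per-student annual summaries:
    # summaries = log.groupby("student_id").apply(annual_metrics)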

Static Models: Feng et al. (2006 & to appear)

  • Stepwise linear regression

  • Mean Abs Deviation

  • Within-sample

    MAD = 5.533

  • Raw MCAS = 0-54, so

    Within-sample Pct Err = MAD/54 = 10.25%

    (uses Sept P&P Test)
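
A simplified stand-in (not the authors' code) for the stepwise-regression fit and the within-sample MAD reported above:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def within_sample_mad(X_sel, y):
        pred = LinearRegression().fit(X_sel, y).predict(X_sel)
        return np.mean(np.abs(pred - y))

    def forward_stepwise(X, y, max_terms=5):
        """Greedy forward selection of metrics by within-sample MAD
        (a simplification of the stepwise procedure described above)."""
        selected, remaining = [], list(range(X.shape[1]))
        while remaining and len(selected) < max_terms:
            best = min(remaining, key=lambda j: within_sample_mad(X[:, selected + [j]], y))
            selected.append(best)
            remaining.remove(best)
        return selected

    # Raw MCAS is scored 0-54, so MAD = 5.533 corresponds to 5.533 / 54 ≈ 10.25% error.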

Static Models: Ayers & Junker (2006)

  • Compared two IRT models on ASSISTment main questions:

    • Rasch model for 354 main questions.

    • LLTM: a constrained Rasch model that decomposes main-question difficulty over the skills in the WPI April transfer model (77 skills).

  • Replace “Percent Correct” with IRT proficiency score in linear predictions of MCAS
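
In standard notation (a sketch, not necessarily the authors' exact parameterization), the two models differ only in how main-question difficulty b_i is treated:

    % Rasch: each main question i gets its own free difficulty b_i
    P(X_{pi} = 1 \mid \theta_p) = \operatorname{logit}^{-1}(\theta_p - b_i)

    % LLTM: b_i is constrained to a linear decomposition over the Q-matrix
    % codings q_{ik} of the WPI April transfer model (77 skills)
    b_i = \sum_{k=1}^{77} q_{ik}\,\beta_k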

Static Models: Ayers & Junker (2006)

  • Rasch fits much better than LLTM

    • ΔBIC ≈ −3,300 in favor of Rasch

    • even though Rasch uses 277 more parameters (354 item difficulties vs. 77 skill effects)

  • Attributable to

    • Transfer model?

    • Linear decomp of item difficulties?

  • Residual and difficulty plots suggest transfer model fixes.

Static Models: Ayers & Junker (2006)

  • Focus on Rasch, predict MCAS with a linear model of the form

    MCAS ≈ β0 + β1·θ + β2·Y

    where θ = Rasch proficiency and Y = an online metric

  • 10-fold cross-validation vs. 54-pt raw MCAS:
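
A sketch (hypothetical variable names) of how such a cross-validated MAD could be computed:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    def cv_mad(theta_hat, online_metric, mcas, n_splits=10):
        """theta_hat: estimated Rasch proficiencies; online_metric: one online
        metric per student; mcas: 0-54 raw MCAS scores."""
        X = np.column_stack([theta_hat, online_metric])
        errs = []
        for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
            fit = LinearRegression().fit(X[train], mcas[train])
            errs.append(np.abs(fit.predict(X[test]) - mcas[test]))
        return np.mean(np.concatenate(errs))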

Static Models: Pardos et al. (2006); Anozie (2006)

Conjunctive binary-skills Bayes Net (Maris, 1999; Junker & Sijtsma DINA, 2001; etc.)
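
For reference, the DINA response model in its standard form (notation roughly as in Junker & Sijtsma, 2001):

    % eta_{pi} = 1 iff student p has every skill that item i requires (conjunctive)
    \eta_{pi} = \prod_k \alpha_{pk}^{\,q_{ik}}

    % slip s_i and guess g_i govern the response probability
    P(X_{pi} = 1 \mid \alpha_p) = (1 - s_i)^{\eta_{pi}}\, g_i^{\,1 - \eta_{pi}}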

Static Models: Pardos et al. (2006)

  • Compared nested versions of binary-skills models (both ASSISTment and MCAS items coded under each model):

  • Fixed guess gi = 0.10 and slip si = 0.05 for all items; prior probability of mastery 0.5 for all skills

  • Inferred skills from ASSISTments; computed expected score for 30-item MCAS subset
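
A sketch of the expected-score step, assuming (as a simplification) independent skill posteriors:

    import numpy as np

    def expected_mcas_score(skill_prob, q_rows, g=0.10, s=0.05):
        """skill_prob: array of P(skill mastered) for one student, inferred from
        ASSISTment responses; q_rows: one list of required skill indices per
        MCAS item in the 30-item subset."""
        total = 0.0
        for skills in q_rows:
            p_eta = np.prod(skill_prob[skills])          # P(all required skills mastered)
            total += p_eta * (1 - s) + (1 - p_eta) * g   # DINA response probability
        return total                                      # expected number correct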

Static Models: Anozie (2006)

  • Focused on 77 active skills in WPI April Model

  • Estimated the skill-mastery probabilities, gi’s and si’s using flexible priors

  • Predicted full raw 54-pt MCAS score as linear function of (expected) number of skills learned

Static Models: Anozie (2006)

Main Item: Which graph contains the points in the table?

Scaffolds:

  • Quadrant of (-2,-3)?

  • Quadrant of (-1,-1)?

  • Quadrant of (1,3)?

  • [Repeat main]

(In the omitted Bayes-net figure, slip and guess parameters attach to each item node.)

Dynamic Prediction Models

  • Razzaq et al. (to appear): evidence of learning over time

  • Feng et al. (to appear): student or item covariates plus linear growth curves (a la Singer & Willett, 2003)

  • Anozie and Junker (2006): changing influence of online metrics over time

Dynamic Models: Razzaq et al. (to appear)

  • ASSISTment system is sensitive to learning

  • Not clear what the source of learning is here…

Dynamic Models: Feng et al. (to appear)

  • Growth-Curve Model I: Overall Learning

School was a better predictor (BIC) than Class or Teacher;

possibly because School demographics dominate the intercept.

  • Growth-Curve Model II: Learning in Strands

Sept_Test is a good predictor of baseline proficiency.

Baseline and learning rates varied by Strand.
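
A hedged sketch of such a growth-curve fit (column names hypothetical, in the spirit of Singer & Willett, 2003):

    import statsmodels.formula.api as smf

    def fit_growth_model(df):
        # Model I: random intercept and slope per student, with School and the
        # September paper-and-pencil test as covariates.
        model = smf.mixedlm("score ~ month + school + sept_test",
                            data=df, groups=df["student"],
                            re_formula="~month")
        return model.fit()

    # Model II adds strand and strand-by-month terms so that baseline proficiency
    # and learning rate can vary by MCAS strand.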

Dynamic Models: Anozie and Junker (2006)

  • Look at changing influence of online metrics on MCAS prediction over time

    • Compute monthly summaries of all online metrics (not just %-correct)

    • Build a linear prediction model for each month, using all current and previous months’ summaries

  • To enhance interpretation, variable selection

    • by metric, not by monthly summary

    • include/exclude metrics simultaneously in all monthly models
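
A rough sketch of the month-by-month setup (column and metric names hypothetical):

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    def monthly_models(log: pd.DataFrame, mcas: pd.Series, metrics: list):
        """For each month m, regress MCAS on the summaries of the selected
        metrics from all months up to and including m.  mcas is indexed by
        student_id."""
        models = {}
        for m in sorted(log.month.unique()):
            feats = (log[log.month <= m]
                     .groupby(["student_id", "month"])[metrics].mean()
                     .unstack("month"))            # one column per metric x month
            X = feats.loc[mcas.index].fillna(feats.mean())
            models[m] = LinearRegression().fit(X, mcas)
        return models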

Dynamic Models: Anozie and Junker (2006)

  • More months helps more than more metrics

  • First 5 online metrics retained for final model(s)



Dynamic Models: Anozie and Junker (2006)

  • Recent main question performance dominates – proficiency?

Dynamic Models: Anozie and Junker (2006)

  • Older performance on scaffolds similar to recent – learning?

Summary of Prediction Models

  • Feng et al. (in press) compute the split-half MAD of the MCAS and estimate ideal % Error ~ 11%, or MAD ~ 6 points.

  • Ayers & Junker (2006) compute reliabilities of the ASSISTment sets seen by all students and estimate upper and lower bounds for the optimal MAD: 0.67 ≤ MAD ≤ 5.21.

New Directions

  • We have some real evidence of learning

    • We are not yet modeling individual student learning

  • Current teacher report: For each skill, report percent correct on all items for which that skill is hardest.

    • Can we do better?

  • Approaches now getting underway:

    • Learning curve models

    • Knowledge-tracing models

New Directions: Cen, Koedinger & Junker (2005)

  • Inspired by Draney, Pirolli & Wilson (1995)

    • Logistic regression for successful skill uses

    • Random intercept (baseline proficiency)

    • fixed effects for skill and skill*opportunity

      • Difficulty factor: skill but not skill*opportunity

      • Learning factor: skill and skill*opportunity

    • Part of Data Shop at http://www.learnlab.org

  • Feng et al. (to appear) fit similar logistic growth curve models to ASSISTment items
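
A sketch of the model form being described (notation mine):

    % logit P(student p answers a step coded with skills q_{i.} correctly)
    \operatorname{logit} P(X_{pi} = 1) \;=\; \theta_p \;+\; \sum_k q_{ik}\bigl(\beta_k + \gamma_k\, T_{pk}\bigr)

    % theta_p : random intercept (baseline proficiency)
    % beta_k  : fixed effect for skill k (difficulty factor)
    % gamma_k : skill-by-opportunity learning rate (learning factor)
    % T_{pk}  : prior opportunities student p has had to practice skill k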

New Directions: Anozie (2006)

  • DINA model can be used to infer skills directly

  • Experimental posterior intervals for individual skills (figure omitted)

  • When a student’s data contradict the prior, or the information “borrowed” from other students, the intervals widen

New Directions: Knowledge Tracing

  • Combine knowledge tracing approach of Corbett, Anderson and O’Brien (1995) with DINA model of Junker and Sijtsma (2001)

  • Each skill represented by a two-state (unlearned/learned) Markov process with an absorbing state at “learned”.

  • Can locate time during school year when each skill is learned.

  • Work just getting underway (Jiang & Junker).
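
A minimal knowledge-tracing sketch, with hypothetical parameter values, of the per-skill update implied above:

    def kt_update(p_known, correct, guess=0.10, slip=0.05, learn=0.08):
        """Two-state (unlearned/learned) Markov chain with an absorbing
        "learned" state; returns P(skill learned) after one observed response."""
        if correct:
            post = p_known * (1 - slip) / (p_known * (1 - slip) + (1 - p_known) * guess)
        else:
            post = p_known * slip / (p_known * slip + (1 - p_known) * (1 - guess))
        # an unlearned skill may be learned on this opportunity
        return post + (1 - post) * learn

    # Tracking p_known over the school year locates the opportunity at which it
    # first crosses a mastery threshold (e.g., 0.95), i.e., when the skill was learned.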

Discussion

  • ASSISTment system

    • Great testbed for online cognitive modeling and prediction technologies

    • Didn’t mention reporting and “gaming detection” technologies

    • Teachers positive, students impressed

  • Ready-Fire-Aim

    • Important! Got system up and running, lots of user feedback & buy-in

    • But… e.g., lack of control over content and content rollout (content balance vs. MCAS?)

    • Given this, perhaps only crude methods needed/possible for MCAS prediction?

Discussion

  • Multiple skill codings for different purposes

    • Exam prediction vs. teacher feedback; state to state.

  • Scaffolds

    • Dependence between scaffolds and main items

    • Forced scaffolding: main item right → scaffolds right

    • Content sometimes skills-based, sometimes tutorial

      • We are now building some true one-skill decomps to investigate stability of skills across items

  • Student learning over time

    • Clear evidence of that!

    • Some experiments not shown here suggest modest but significant value-added for ASSISTments

    • Starting to model learning, time-to-mastery, etc.

References

Anozie, N. (2006). Investigating the utility of a conjunctive model in Q-matrix assessment using monthly student records in an online tutoring system. Proposal submitted to the National Council on Measurement in Education 2007 Annual Meeting.

Anozie, N.O. & Junker, B. W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.

Ayers, E. & Junker, B.W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.

Ayers, E. & Junker, B. W. (2006). IRT modeling of tutor performance to predict end of year exam scores. Working paper.

Corbett, A. T., Anderson, J. R., & O'Brien, A. T. (1995) Student modeling in the ACT programming tutor. Chapter 2 in P. Nichols, S. Chipman, & R. Brennan, Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.

Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex cognitive skill. In P. Nichols, S. Chipman, & R. Brennan, Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.

Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.) Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp 31-40.

Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006). Using mixed effects modeling to compare different grain-sized skill models. AAAI06 Workshop on Educational Data Mining, Boston MA.

Feng, M., Heffernan, N. T., & Koedinger, K. R. (in press). Addressing the testing challenge with a web-based E-assessment system that tutors as it assesses. Proceedings of the 15th Annual World Wide Web Conference. ACM Press (Anticipated): New York, 2005.

Cen, H., Koedinger, K., & Junker, B. (2005). Automating Cognitive Model Improvement by A* Search and Logistic Regression. In Technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005.

Junker, B.W. & Sijtsma K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement 25: 258-272.

Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika 64, 187-212.

Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using Fine Grained Skill Models to Fit Student Performance with Bayesian Networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.

Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K. R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., & Rasmussen, K.P. (2005). The Assistment Project: Blending Assessment and Assisting. In C.K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.) Proceedings of the 12th International Conference on Artificial Intelligence in Education. Amsterdam: IOS Press. pp 555-562.

Razzaq, L., Feng, M., Heffernan, N. T., Koedinger, K. R., Junker, B., Nuzzo-Jones, G., Macasek, N., Rasmussen, K. P., Turner, T. E. & Walonoski, J. (to appear). A web-based authoring tool for intelligent tutors: blending assessment and instructional assistance. In Nedjah, N., et al. (Eds). Intelligent Educational Machines within the Intelligent Systems Engineering Book Series (see http://isebis.eng.uerj.br).

Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. Oxford University Press, New York.

Websites:

http://www.assistment.org

http://www.learnlab.org

http://www.educationaldatamining.org

Full Set of Online Metrics
