Educational data mining overview introduction to exploratory data analysis with datashop

Educational data mining overview & Introduction to Exploratory Data Analysis with DataShop



Educational data mining overview & Introduction to Exploratory Data Analysis with DataShop. Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University. Overview. DataShop Overview Logging model DataShop Features


Presentation Transcript



Educational data mining overview & Introduction to Exploratory Data Analysis with DataShop

Ken Koedinger CMU Director of PSLC

Professor of Human-Computer Interaction & Psychology

Carnegie Mellon University


Overview

Overview

  • DataShop Overview

    • Logging model

    • DataShop Features

  • Quantitative models of learning curves

    • Power law, logistic regression

    • Contrasting KC models

  • Exploratory Data Analysis Exercise (start)

  • Knowledge Component Model Editing


Logging storage models

Logging & Storage Models

  • Education technologies are “instrumented” to produce log data

  • We encourage a standard log format

    • XML format generalized from Ritter & Koedinger (1995)

    • Also convert log data from other formats



Relational Database -- complex!


Example activity generating click stream data

Example activity generating “click stream” data

  • Geometry Cognitive Tutor: “Making Cans” problem

    • Find the area of scrap metal left over after removing a circular area (the end of a can) from a metal square.

    • Student enters values in worksheet

  • Tutor provides feedback & instruction

    • Records student’s actions & tutor responses

  • Logs stored in files on school server or database at Carnegie Learning

    • Later imported into DataShop


Datashop logging model

DataShop logging model

  • Main constructs:

    • Context message: the student, problem, and session with the tutor

    • Tool message: represents an action in the tool performed by a student or tutor

    • Tutor message: represents a tutor’s response to a student action


Datashop xml format context message

DataShop XML format: Context message

<context_message context_message_id="C2badca9c5c:-7fe5" name="START_PROBLEM">
  <dataset>
    <!-- Dataset name -->
    <name>Geometry Hampton 2005-2006</name>
    <!-- Course unit -->
    <level type="Lesson">
      <name>PACT-AREA</name>
      <!-- Course section -->
      <level type="Section">
        <name>PACT-AREA-6</name>
        <!-- Problem -->
        <problem>
          <name>MAKING-CANS</name>
        </problem>
      </level>
    </level>
  </dataset>
</context_message>


Datashop xml format tool tutor messages

DataShop XML format: Tool & Tutor Messages

<tool_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="ATTEMPT" />
  <event_descriptor>
    <selection>(POG-AREA QUESTION2)</selection>
    <action>INPUT-CELL-VALUE</action>
    <input>200.96</input>
  </event_descriptor>
</tool_message>
<tutor_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="RESULT" />
  <event_descriptor> … [as above] … </event_descriptor>
  <action_evaluation>CORRECT</action_evaluation>
</tutor_message>
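A tool message and the tutor message that answers it share a transaction_id, so the two can be joined into one record. The following is a minimal Python sketch of that join, not DataShop's own importer. The `<log>` wrapper element and the `pair_messages` helper are assumptions made for illustration, and the tutor message's elided event descriptor is simply left out.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the messages above, wrapped in a hypothetical <log> root
# so that the fragment parses as a single XML document.
LOG = """<log>
<tool_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="ATTEMPT" />
  <event_descriptor>
    <selection>(POG-AREA QUESTION2)</selection>
    <action>INPUT-CELL-VALUE</action>
    <input>200.96</input>
  </event_descriptor>
</tool_message>
<tutor_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="RESULT" />
  <action_evaluation>CORRECT</action_evaluation>
</tutor_message>
</log>"""


def pair_messages(xml_text):
    """Join each tool message to the tutor message with the same transaction_id."""
    root = ET.fromstring(xml_text)

    # Index tutor evaluations by transaction id.
    evaluations = {}
    for tutor in root.iter("tutor_message"):
        tid = tutor.find("semantic_event").get("transaction_id")
        evaluations[tid] = tutor.findtext("action_evaluation")

    rows = []
    for tool in root.iter("tool_message"):
        tid = tool.find("semantic_event").get("transaction_id")
        desc = tool.find("event_descriptor")
        rows.append({
            "transaction_id": tid,
            "selection": desc.findtext("selection"),
            "action": desc.findtext("action"),
            "input": desc.findtext("input"),
            # None when no tutor response was logged (e.g. an untutored tool).
            "evaluation": evaluations.get(tid),
        })
    return rows


rows = pair_messages(LOG)
```

Each resulting row carries the selection, action, and input from the tool message plus the tutor's evaluation, which is roughly the shape of one stored transaction.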


Example stored transactions

Example Stored Transactions

  • Student interactions (or transactions) are stored in a relational database and can be exported as a table

    • Example: Student S01 on Making-Cans problem


Transactions

Transactions

  • Info for each transaction

    • student(s), session, time, problem, problem step, attempt number, student action

    • tutor response, number of hints, knowledge component code

  • Logging of on-line tools (e.g., a virtual lab) does not include tutor response


Step transaction definitions

Step & Transaction Definitions

  • A problem-solving activity typically involves many tool & tutor messages.

  • “Steps” represent completion of possible subgoals or pieces of a problem solution

  • “Transactions” are attempts at a step or requests for instructional help
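The relationship between transactions and steps can be sketched as a roll-up: several transactions (attempts or hint requests) collapse into one record per student and step. This is only an illustration in Python. The field names, the outcome labels, the second step name, and the rows themselves are made up for the example and are not DataShop's actual schema.

```python
# Hypothetical, hand-written transaction rows in time order.
transactions = [
    {"student": "S01", "problem": "MAKING-CANS",
     "step": "(POG-AREA QUESTION2)", "outcome": "INCORRECT"},
    {"student": "S01", "problem": "MAKING-CANS",
     "step": "(POG-AREA QUESTION2)", "outcome": "HINT"},
    {"student": "S01", "problem": "MAKING-CANS",
     "step": "(POG-AREA QUESTION2)", "outcome": "CORRECT"},
    {"student": "S01", "problem": "MAKING-CANS",
     "step": "(SCRAP-AREA QUESTION2)", "outcome": "CORRECT"},
]


def rollup_by_student_step(rows):
    """Collapse time-ordered transactions into one record per (student, problem, step)."""
    steps = {}
    for row in rows:
        key = (row["student"], row["problem"], row["step"])
        # The first transaction seen for a step fixes its first-attempt outcome.
        rec = steps.setdefault(key, {
            "first_attempt": row["outcome"],
            "incorrects": 0,
            "hints": 0,
            "corrects": 0,
        })
        if row["outcome"] == "INCORRECT":
            rec["incorrects"] += 1
        elif row["outcome"] == "HINT":
            rec["hints"] += 1
        elif row["outcome"] == "CORRECT":
            rec["corrects"] += 1
    return steps


step_table = rollup_by_student_step(transactions)
```

Here the three transactions on the first step become a single step record whose first attempt was an error, followed by one hint and one correct entry.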


Example data aggregated by student step

Example: data aggregated by student-step



Overview

  • DataShop Overview

    • Logging model

    • DataShop Features

  • Quantitative models of learning curves

    • Power law, logistic regression

    • Contrasting KC models

  • Exploratory Data Analysis Exercise (start)

  • Knowledge Component Model Editing


Datashop analysis tools

DataShop Analysis Tools

  • Dataset Info

  • Performance Profiler

  • Learning Curve

  • Error Report

  • Export

  • Sample Selector


Dataset info

Dataset Info

  • Metadata for a given dataset

  • PIs get ‘edit’ privileges; others must request them

Papers and Files storage

Problem Breakdown table

Dataset Metrics



Performance profiler

Performance Profiler

Multipurpose tool to help identify areas that are too hard or easy

  • View measures of

  • Error Rate

  • Assistance Score

  • Avg # Hints

  • Avg # Incorrect

  • Residual Error Rate

  • Aggregate by

  • Step

  • Problem

  • KC

  • Dataset Level


Learning curve

Learning Curve

Visualizes changes in student performance over time

View by KC or Student, Assistance Score or Error Rate

Time is represented on the x-axis as ‘opportunity’, or the # of times a student (or students) had an opportunity to demonstrate a KC
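To make the ‘opportunity’ axis concrete, here is a small Python sketch that numbers each student's successive encounters with a KC and averages the first-attempt error rate at each opportunity. It is an assumption-laden illustration rather than the Learning Curve tool's implementation: the `learning_curve` function, the tuple layout, the KC name, and the sample rows are all invented.

```python
from collections import defaultdict


def learning_curve(step_rows):
    """Return {opportunity number: error rate} from time-ordered
    (student, kc, first_attempt_correct) tuples."""
    seen = defaultdict(int)    # (student, kc) -> opportunities so far
    errors = defaultdict(int)  # opportunity -> number of first-attempt errors
    totals = defaultdict(int)  # opportunity -> number of observations
    for student, kc, correct in step_rows:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}


# Made-up observations for two students on one hypothetical KC.
sample = [
    ("S01", "circle-area", False),
    ("S02", "circle-area", False),
    ("S01", "circle-area", False),
    ("S02", "circle-area", True),
    ("S01", "circle-area", True),
    ("S02", "circle-area", True),
]
curve = learning_curve(sample)
```

In this toy data both students err on their first opportunity, one errs on the second, and neither errs on the third, so the error rate falls with practice.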


Error report

Error Report

  • Provides a breakdown of problem information (by step) for fine-grained analysis of problem-solving behavior

  • Attempts are categorized by student

View by Problem or KC


Sample selector

Sample Selector

Easily create a sample/filter to view a smaller subset of data

  • Filter by

  • Condition

  • Dataset Level

  • Problem

  • School

  • Student

  • Tutor Transaction

Shared (only owner can edit) and private samples


Export

Export

You can also export the Problem Breakdown table and LFA values!

  • Two types of export available

    • By Transaction

    • By Step

  • Anonymous, tab-delimited file

  • Easy to import into Excel!
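A tab-delimited export can also be read programmatically. The sketch below uses Python's standard `csv` module. The column headers and the single data row are hypothetical stand-ins, so a real export's headers should be checked before relying on any of these names.

```python
import csv
import io

# Hypothetical two-line export: a header row and one transaction.
SAMPLE_EXPORT = (
    "Student\tProblem\tStep\tOutcome\n"
    "S01\tMAKING-CANS\t(POG-AREA QUESTION2)\tCORRECT\n"
)


def read_export(handle):
    """Read a tab-delimited export into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# With a real file, pass open(path, newline="") instead of StringIO.
export_rows = read_export(io.StringIO(SAMPLE_EXPORT))
```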


Help documentation

Help/Documentation

  • Extensive documentation with examples

  • Contextual by tool/report

  • http://learnlab.web.cmu.edu/datashop/help

Glossary of common terms, tied in with PSLC Theory wiki


New features

New Features

  • Manage Knowledge Component models

    • Create, Modify & Delete KC models within DataShop

  • Addition of Latency Curves to Learning Curve Reporting

    • Time to Correct

    • Assistance Time

  • Problem Rollup & Export

  • Enhanced Contextual Help



Overview

  • DataShop Overview

    • Logging model

    • DataShop Features

  • Quantitative models of learning curves

    • Power law, logistic regression

    • Contrasting KC models

  • Exploratory Data Analysis Exercise (start)

  • Knowledge Component Model Editing


Recall learning curve story

Recall learning curve story

Without decomposition, using just a single “Geometry” KC, there is no smooth learning curve.

But with decomposition into 12 KCs for area concepts, there is a smooth learning curve.

Upshot: A decomposed KC model fits learning & transfer data better than a “faculty theory” of mind


Learning curve analysis

Learning curve analysis

  • The Power Law of Learning (Newell & Rosenbloom, 1993)

    $Y = aX^{b}$

    Y – error rate

    X – opportunities to practice a skill

    a – error rate on 1st opportunity

    b – learning rate

    After the log transformation, $\log Y = \log a + b \log X$:

    $\log a$ is the “intercept” or starting point of the learning curve

    “b” is the “slope” or steepness of the learning curve
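Because the power law becomes a straight line after taking logs, its two parameters can be estimated with ordinary least squares on log(X) and log(Y). The Python sketch below shows this under stated assumptions: the `fit_power_law` helper is illustrative, the data are synthetic (generated from a = 0.5 and b = -0.4), and every error rate must be greater than zero for the logarithm to be defined.

```python
import math


def fit_power_law(opportunities, error_rates):
    """Estimate a and b in Y = a * X**b by least squares on the log-log scale."""
    xs = [math.log(x) for x in opportunities]
    ys = [math.log(y) for y in error_rates]  # requires every error rate > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)    # intercept, mapped back from log(a)
    return a, b


# Synthetic curve: error rate starts at 0.5 and decays with exponent -0.4.
opps = [1, 2, 3, 4, 5, 6]
rates = [0.5 * x ** -0.4 for x in opps]
a, b = fit_power_law(opps, rates)
```

A negative b corresponds to an error rate that falls with practice, and a larger magnitude means a steeper learning curve.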


More sophisticated learning curve model

More sophisticated learning curve model

  • Generalized Power Law to fit learning curves

    • Logistic regression (Draney, Wilson, Pirolli, 1995)

  • Assumptions

    • Different students may initially know more or less

      => use an intercept parameter for each student

    • Students learn at the same rate

      => no slope parameters for each student

    • Some productions may be more known than others

      => use an intercept parameter for each production

    • Some productions are easier to learn than others

      => use a slope parameter for each production

  • These assumptions are reflected in detailed math model …



More sophisticated learning curve model

$\ln\frac{p}{1-p} = \sum_i \alpha_i X_i + \sum_j \beta_j Y_j + \sum_j \gamma_j Y_j T_j$

The log odds of getting a step correct (p) is a sum of:

  • if student i performed this step ($X_i = 1$), add the overall “smarts” of that student, $\alpha_i$

  • if skill j is needed for this step ($Y_j = 1$), add the easiness of that skill, $\beta_j$

  • add the product of the number of opportunities to learn, $T_j$, and the amount gained for each opportunity, $\gamma_j$

Use logistic regression because the response is discrete (correct or not). The probability (p) is transformed to “log odds”, which “stretches it out” along an “S curve” so that it does not bump up against 0 or 1.

(Related to “Item Response Theory”, behind standardized tests …)
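As a rough illustration of how such a model turns parameters into a prediction, the Python sketch below adds a student term, a skill easiness term, and a per-opportunity gain on the log-odds scale, then maps the sum back to a probability. The `afm_probability` function, the KC name, and all parameter values are invented for the example. Real values would come from fitting the logistic regression to data.

```python
import math


def afm_probability(student_intercept, kc_easiness, kc_gain, opportunities, step_kcs):
    """Predicted probability that a student gets a step correct.

    student_intercept: the student's overall proficiency (log-odds units)
    kc_easiness:       {kc: easiness} intercept per knowledge component
    kc_gain:           {kc: gain per practice opportunity}
    opportunities:     {kc: prior opportunities this student has had}
    step_kcs:          the KCs this step requires
    """
    logit = student_intercept
    for kc in step_kcs:
        logit += kc_easiness[kc] + kc_gain[kc] * opportunities[kc]
    # The inverse of the log-odds transform keeps the result strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameters for one KC.
easiness = {"circle-area": -1.0}
gain = {"circle-area": 0.5}

p_first = afm_probability(0.0, easiness, gain, {"circle-area": 0}, ["circle-area"])
p_third = afm_probability(0.0, easiness, gain, {"circle-area": 2}, ["circle-area"])
```

With these made-up numbers the log odds rise from -1.0 before any practice to 0.0 after two opportunities, so the predicted probability climbs to 0.5.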


Different representation same model

Different representation, same model

  • Predicts whether student is correct depending on knowledge & practice

  • Additive Factor Model (Draney, et al. 1995, Cen, Koedinger, Junker, 2006)


The q matrix

The Q Matrix

How to represent relationship between knowledge components and student tasks?

Tasks also called items, questions, problems, or steps (in problems)

Q-Matrix (Tatsuoka, 1983)

2*8 is a single-KC item

2*8 – 3 is a conjunctive-KC item, involves two KCs
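One simple way to hold a Q matrix in code is a mapping from each item to a row of 0/1 flags, one flag per KC. The Python sketch below encodes the two items above. The KC labels "multiply" and "subtract" and the helper names are assumptions chosen for illustration.

```python
# Columns of the Q matrix: hypothetical KC labels.
KCS = ["multiply", "subtract"]

# Rows of the Q matrix: 1 means the item requires that KC.
Q = {
    "2*8":     [1, 0],
    "2*8 - 3": [1, 1],
}


def kcs_for(item):
    """List the KCs an item requires, in column order."""
    return [kc for kc, used in zip(KCS, Q[item]) if used]


def is_conjunctive(item):
    """An item is conjunctive when it requires more than one KC."""
    return sum(Q[item]) > 1
```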



Model evaluation

Model Evaluation

  • How to compare cognitive models?

    • A good model minimizes prediction risk by balancing fit with data & complexity (Wasserman 2005)

  • Compare BIC for the cognitive models

    • BIC is the “Bayesian Information Criterion”

    • BIC = -2*log-likelihood + numPar * log(numOb)

    • A lower (better) BIC means better prediction of data the model hasn’t seen

  • Mimics cross validation, but is faster to compute
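The BIC formula is short enough to compute directly. The Python sketch below does so with the natural logarithm and compares two hypothetical models. The log-likelihoods, parameter counts, and observation count are invented numbers, not results from any dataset.

```python
import math


def bic(log_likelihood, num_params, num_observations):
    """BIC = -2 * log-likelihood + numPar * log(numOb); lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_observations)


# Made-up comparison: the complex model fits slightly better (higher
# log-likelihood) but pays a larger penalty for its extra parameters.
simple_bic = bic(log_likelihood=-520.0, num_params=5, num_observations=1000)
complex_bic = bic(log_likelihood=-510.0, num_params=30, num_observations=1000)
preferred = "simple" if simple_bic < complex_bic else "complex"
```

In this toy comparison the small gain in fit does not offset the penalty for 25 extra parameters, so the simpler model has the lower BIC.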


    • Data: the Geometry Area Unit

      • 24 students, 230 items, 15 KCs



    Learning curve contrast in Physics dataset …



    Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.



    More detailed cognitive model yields smoother learning curve. Better tracks nature of student difficulties & transfer

    (Few observations after 10 opportunities yield noisy data)



    Better than simpler Single-KC model

    And better than more complex Unique-step (IRT) model

    Best BIC (parsimonious fit) for Default (original) KC model



    Overview

    • DataShop Overview

      • Logging model

      • DataShop Features

    • Quantitative models of learning curves

      • Power law, logistic regression

      • Contrasting KC models

    • Exploratory Data Analysis Exercise (start)

    • Knowledge Component Model Editing


    Exploratory data analysis exercise

    Exploratory Data Analysis Exercise

    • Goals: 1) Get familiar with data 2) Learn/practice Excel skills

    • Tasks: 1) create a “step table” 2) graph learning curves


    Two circles in square problem initial screen

    TWO_CIRCLES_IN_SQUARE problem: Initial screen


    Two circles in square problem an error a few steps later

    TWO_CIRCLES_IN_SQUARE problem: An error a few steps later


    TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes problem


    Exported file loaded into excel

    Exported File Loaded into Excel


    See the exercise handout … we will do some of it in the next session



    Overview

    • DataShop Overview

      • Logging model

      • DataShop Features

    • Quantitative models of learning curves

      • Power law, logistic regression

      • Contrasting KC models

    • Exploratory Data Analysis Exercise (start)

    • Knowledge Component Model Editing


    Datashop demo

    DataShop Demo

    • Examples of exercise

    • KC model editing



    END

