Educational data mining overview & Introduction to Exploratory Data Analysis with DataShop
Ken Koedinger, CMU Director of PSLC
Professor of Human-Computer Interaction & Psychology
Carnegie Mellon University
Overview: DataShop overview, logging model, DataShop features
Relational database: complex!
<context_message context_message_id="C2badca9c5c:-7fe5" name="START_PROBLEM">
  <dataset>
    <name>Geometry Hampton 2005-2006</name>
    <level type="Lesson">
      <name>PACT-AREA</name>
      <level type="Section">
        <name>PACT-AREA-6</name>
        <problem>
          <name>MAKING-CANS</name>
        </problem>
      </level>
    </level>
  </dataset>
</context_message>
<tool_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="ATTEMPT" />
  <event_descriptor>
    <selection>(POG-AREA QUESTION2)</selection>
    <action>INPUT-CELL-VALUE</action>
    <input>200.96</input>
  </event_descriptor>
</tool_message>
<tutor_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="RESULT" />
  <event_descriptor> … [as above] … </event_descriptor>
  <action_evaluation>CORRECT</action_evaluation>
</tutor_message>
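The log messages above pair a student action (tool_message) with the tutor's evaluation (tutor_message) through a shared transaction_id. As a rough sketch of how such messages can be flattened into one transaction row, here is Python using the standard-library xml.etree.ElementTree. The wrapping <log> root element, the function name transactions, and the row field names are assumptions made for illustration; this is not DataShop's actual importer. The tutor's echoed event_descriptor, elided in the example, is simply left out.

```python
import xml.etree.ElementTree as ET

# Messages from the example, wrapped in a hypothetical <log> root so that
# several messages can be parsed as one document. The tutor's echoed
# event_descriptor is omitted because the example elides it.
LOG = """<log>
<context_message context_message_id="C2badca9c5c:-7fe5" name="START_PROBLEM">
  <dataset>
    <name>Geometry Hampton 2005-2006</name>
    <level type="Lesson"><name>PACT-AREA</name>
      <level type="Section"><name>PACT-AREA-6</name>
        <problem><name>MAKING-CANS</name></problem>
      </level>
    </level>
  </dataset>
</context_message>
<tool_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="ATTEMPT"/>
  <event_descriptor>
    <selection>(POG-AREA QUESTION2)</selection>
    <action>INPUT-CELL-VALUE</action>
    <input>200.96</input>
  </event_descriptor>
</tool_message>
<tutor_message context_message_id="C2badca9c5c:-7fe5">
  <semantic_event transaction_id="T2a9c5c:-7fe7" name="RESULT"/>
  <action_evaluation>CORRECT</action_evaluation>
</tutor_message>
</log>"""


def transactions(xml_text):
    """Hypothetical helper: pair each tool ATTEMPT with the tutor RESULT
    that shares its transaction_id, producing one flat row per transaction."""
    root = ET.fromstring(xml_text)
    problems = {}  # context_message_id -> problem name
    rows = {}      # transaction_id -> flat row (dict)
    for msg in root:
        cid = msg.get("context_message_id")
        if msg.tag == "context_message":
            problems[cid] = msg.findtext(".//problem/name")
        elif msg.tag in ("tool_message", "tutor_message"):
            tid = msg.find("semantic_event").get("transaction_id")
            row = rows.setdefault(
                tid, {"transaction_id": tid, "problem": problems.get(cid)})
            if msg.tag == "tool_message":
                row["selection"] = msg.findtext("event_descriptor/selection")
                row["action"] = msg.findtext("event_descriptor/action")
                row["input"] = msg.findtext("event_descriptor/input")
            else:
                row["outcome"] = msg.findtext("action_evaluation")
    return list(rows.values())


if __name__ == "__main__":
    for row in transactions(LOG):
        print(row)
```

Keying rows on transaction_id (rather than assuming messages strictly alternate) keeps the pairing correct even if other messages are interleaved.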
Papers and Files storage
Problem Breakdown table
Multipurpose tool to help identify areas that are too hard or easy
Visualizes changes in student performance over time
View by KC or Student, Assistance Score or Error Rate
Time is represented on the x-axis as ‘opportunity’: the number of times a student (or group of students) has had an opportunity to demonstrate a KC
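To make ‘opportunity’ concrete, here is a minimal sketch: number each student's attempts at a KC in time order, then compute the error rate at each opportunity number across students. The records, the KC name circle-area, the function name learning_curve, and the rule that an error means an incorrect first attempt are all assumptions for illustration; DataShop's own error-rate and assistance-score definitions may differ in detail.

```python
from collections import defaultdict

# Hypothetical, made-up first-attempt records in time order:
# (student, kc, correct_on_first_attempt)
RECORDS = [
    ("s1", "circle-area", False),
    ("s1", "circle-area", True),
    ("s2", "circle-area", False),
    ("s2", "circle-area", False),
    ("s1", "circle-area", True),
    ("s2", "circle-area", True),
]


def learning_curve(records, kc):
    """Error rate at each opportunity number (1, 2, 3, ...) for one KC."""
    seen = defaultdict(int)    # (student, kc) -> opportunities so far
    errors = defaultdict(int)  # opportunity number -> error count
    totals = defaultdict(int)  # opportunity number -> observation count
    for student, k, correct in records:
        if k != kc:
            continue
        seen[(student, k)] += 1
        opp = seen[(student, k)]
        totals[opp] += 1
        errors[opp] += 0 if correct else 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}


print(learning_curve(RECORDS, "circle-area"))  # → {1: 1.0, 2: 0.5, 3: 0.0}
```

Counting opportunities per (student, KC) pair, rather than per problem, is what lets the x-axis line up practice on the same KC across different problems.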
View by Problem or KC
Easily create a sample/filter to view a smaller subset of data
Shared (only the owner can edit) and private samples
You can also export the Problem Breakdown table and LFA values!
Glossary of common terms, tied in with the PSLC Theory wiki
Without decomposition, using just a single “Geometry” KC: no smooth learning curve.
But with decomposition into 12 KCs for area concepts: a smooth learning curve.
Upshot: A decomposed KC model fits learning & transfer data better than a “faculty theory” of mind
$Y = aX^{b}$
Y – error rate
X – opportunities to practice a skill
a – error rate on 1st opportunity
b – learning rate
After the log transformation, $\log Y = \log a + b \log X$, a straight line in log-log coordinates:
“a” (strictly, log a) is the “intercept” or starting point of the learning curve
“b” is the “slope” or steepness of the learning curve
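Because the log transformation turns the power law into a straight line, a and b can be estimated with ordinary least squares on (log X, log Y). A minimal sketch, assuming every error rate is strictly positive (log 0 is undefined) and using a synthetic, noise-free curve with made-up values a = 0.5 and b = -0.4 (b is negative when the error rate falls with practice). The function name fit_power_law is hypothetical.

```python
import math


def fit_power_law(opportunities, error_rates):
    """Least-squares fit of log Y = log a + b log X; returns (a, b).
    Assumes all error rates are > 0, since log(0) is undefined."""
    xs = [math.log(x) for x in opportunities]
    ys = [math.log(y) for y in error_rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space
    a = math.exp(mean_y - b * mean_x)  # intercept is log a; map back to a
    return a, b


# Synthetic, noise-free curve with made-up a = 0.5 and b = -0.4.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [0.5 * x ** -0.4 for x in X]
a, b = fit_power_law(X, Y)
print(round(a, 3), round(b, 3))  # → 0.5 -0.4
```

On real learning-curve data, opportunities with a 0% error rate would need to be dropped or smoothed before taking logs.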
=> use an intercept parameter for each student
=> no slope parameter for each student
=> use an intercept parameter for each production (KC)
=> use a slope parameter for each production (KC)
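Probability of getting a step correct (p) is determined, on the log-odds scale, by the sum of: the student's intercept; the intercept of each KC the step requires; and each such KC's slope times the number of times the student has already practiced that KC.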
Use logistic regression because the response is discrete (correct or not). Probability (p) is transformed into “log odds”, which “stretches out” the scale along an “S curve” so that predictions do not bump up against 0 or 1.
(Related to “Item Response Theory”, behind standardized tests …)
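Written as one equation, the parameter choices above match what is commonly called the Additive Factor Model (AFM). The notation here is ours, not the slides': for student i on step j, α_i is the student intercept, β_k and γ_k are the intercept and slope of KC k, q_jk is 1 if step j requires KC k (0 otherwise), and T_ik is the number of prior opportunities student i has had on KC k. There is no per-student slope, matching “no slope parameter for each student”.

$$
\ln\frac{p_{ij}}{1-p_{ij}} \;=\; \alpha_i \;+\; \sum_{k} q_{jk}\,\bigl(\beta_k + \gamma_k\,T_{ik}\bigr),
\qquad
p_{ij} \;=\; \frac{1}{1 + e^{-\left(\alpha_i + \sum_{k} q_{jk}(\beta_k + \gamma_k T_{ik})\right)}}
$$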
How do we represent the relationship between knowledge components and student tasks?
Tasks are also called items, questions, problems, or steps (within problems)
Q-matrix (Tatsuoka, 1983)
2*8 is a single-KC item
2*8 - 3 is a conjunctive-KC item: it involves two KCs (multiplication and subtraction)
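A minimal sketch of the two example items as Q-matrix rows, combined with a logistic prediction in which a conjunctive item adds a term for every KC it requires. The KC labels multiply and subtract, the function name p_correct, and all parameter values are hypothetical and chosen only for illustration.

```python
import math

KCS = ["multiply", "subtract"]  # hypothetical KC labels
Q = {                           # item -> 1/0 per KC, in KCS order
    "2*8": [1, 0],              # single-KC item
    "2*8 - 3": [1, 1],          # conjunctive item: requires both KCs
}


def p_correct(item, student_intercept, kc_intercepts, kc_slopes, opportunities):
    """Log-odds = student intercept + sum over KCs the item requires of
    (KC intercept + KC slope * prior opportunities on that KC)."""
    log_odds = student_intercept
    for k, needed in enumerate(Q[item]):
        if needed:
            log_odds += kc_intercepts[k] + kc_slopes[k] * opportunities[k]
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic "S curve"


# Made-up parameters for one student who has practiced each KC twice.
beta = [1.0, -0.5]   # KC intercepts (subtraction assumed harder here)
gamma = [0.2, 0.1]   # KC slopes (learning rates)
for item in Q:
    print(item, round(p_correct(item, 0.0, beta, gamma, [2, 2]), 3))
```

Storing the Q-matrix as a plain item-to-row mapping keeps KC assignments separate from the fitted parameters, so an alternative KC model can be tried by swapping only Q.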
Learning curve contrast in a Physics dataset …
Not a smooth learning curve, so this knowledge component model is wrong: it does not capture genuine student difficulties.
A more detailed cognitive model yields a smoother learning curve and better tracks the nature of student difficulties and transfer.
(Few observations after 10 opportunities yield noisy data.)
Best BIC (most parsimonious fit) for the Default (original) KC model:
better than the simpler Single-KC model
and better than the more complex Unique-step (IRT) model
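BIC rewards fit but charges for every parameter (BIC = k ln n - 2 ln L, lower is better), which is how a model with a worse likelihood can still win. A minimal sketch; the log-likelihoods, parameter counts, and observation count below are made up for illustration and are NOT results from the Physics dataset.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits, for illustration only.
N_OBS = 5000
MODELS = {
    # name: (log-likelihood, number of parameters)
    "Single-KC": (-3000.0, 52),
    "Default KC": (-2700.0, 74),
    "Unique-step": (-2600.0, 350),
}
scores = {name: bic(ll, k, N_OBS) for name, (ll, k) in MODELS.items()}
best = min(scores, key=scores.get)
print(best)  # → Default KC
```

With these made-up numbers the Unique-step model has the highest likelihood but pays for its many extra parameters, so the middle model scores best.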
See the exercise handout … We will do some of it in the next session.