Discovering Optimal Training Policies: A New Experimental Paradigm

Robert V. Lindsey, Michael C. Mozer

Institute of Cognitive Science, Department of Computer Science, University of Colorado, Boulder

Harold Pashler

Department of Psychology, UC San Diego

Common Experimental Paradigm In Human Learning Research
  • Propose several instructional conditions to compare based on intuition or theory
      • E.g., spacing of study sessions in fact learning
      • Equal: 1 – 1 – 1
      • Increasing: 1 – 2 – 4
  • Run many participants in each condition
  • Perform statistical analyses to establish a reliable difference between conditions (see the sketch below)
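
A minimal sketch of that conventional workflow, assuming simulated accuracy scores and an independent-samples t-test; the condition names and numbers are illustrative, not from the talk:

```python
# Conventional paradigm: two spacing conditions, many participants each,
# then a significance test. Data are simulated, purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
equal_spacing = rng.binomial(n=24, p=0.62, size=100) / 24       # schedule 1-1-1
increasing_spacing = rng.binomial(n=24, p=0.66, size=100) / 24  # schedule 1-2-4

res = stats.ttest_ind(increasing_spacing, equal_spacing)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```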
What Most Researchers Interested In Improving Instruction Really Want To Do
  • Find the best training policy (study schedule)
  • Abscissa: space of all training policies
  • Performance function defined over policy space
Approach
  • Perform single-participant experiments at selected points in policy space (o)
  • Use function approximation techniques to estimate the shape of the performance function
  • Given the current estimate, select promising policies to evaluate next
    • promising = has potential to be the optimum policy

[Figure: Gaussian process regression vs. linear regression fits]
Gaussian Process Regression
  • Assumes only that functions are smooth
  • Uses data efficiently
  • Accommodates noisy data
  • Produces estimates of both function shape and uncertainty
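
As a rough illustration of those properties, a GP regression fit over a one-dimensional policy space, assuming scikit-learn; the kernel choice and data points are made up for illustration:

```python
# GP regression: smooth fit plus per-point uncertainty from a few noisy observations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.array([[0.1], [0.3], [0.5], [0.8]])   # policies already evaluated
y = np.array([0.55, 0.70, 0.65, 0.50])       # noisy single-participant scores

kernel = 1.0 * RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

grid = np.linspace(0, 1, 200).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)  # estimate of function shape + uncertainty
```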
Embellishments On Off-The-Shelf GP Regression
  • Active selection heuristic: upper confidence bound
  • GP is embedded in generative task model
    • GP represents skill level (-∞, +∞)
    • Mapped to population mean accuracy on test (0, 1)
    • Mapped to individual’s mean accuracy, allowing for interparticipant variability
    • Mapped to # correct responses via binomial sampling
  • Hierarchical Bayesian approach to parameter selection
    • Interparticipant variability
    • GP smoothness (covariance function)
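
A sketch of these embellishments, assuming a logistic link from skill to accuracy, a Beta model of interparticipant variability, and a fixed UCB weight; the specific parameter choices are assumptions, not the talk's exact model:

```python
# Generative task model + upper-confidence-bound selection (illustrative).
import numpy as np

def simulate_participant(skill, n_test=24, concentration=20.0, rng=None):
    """Map latent skill to one participant's # correct on the test."""
    rng = rng or np.random.default_rng()
    pop_acc = 1.0 / (1.0 + np.exp(-skill))         # skill (-inf, +inf) -> population accuracy (0, 1)
    ind_acc = rng.beta(concentration * pop_acc,     # individual accuracy: interparticipant variability
                       concentration * (1.0 - pop_acc))
    return rng.binomial(n_test, ind_acc)            # # correct via binomial sampling

def ucb_choice(gp, candidates, ucb_weight=2.0):
    """Active selection heuristic: pick the policy with the highest upper confidence bound."""
    mean, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mean + ucb_weight * std)]
```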
GLOPNOR = Graspability
  • Ease of picking up & manipulating object with one hand
  • Based on norms from Salmon, McMullen, & Filliter (2010)
Two-Dimensional Policy Space
  • Fading policy
  • Repetition/alternation policy
Policy Space

[Figure: two-dimensional policy space with axes for fading policy and repetition/alternation policy]

Experiment

  • Training
    • 25-trial sequence generated by the chosen policy (see the sketch after this list)
    • Balanced positive / negative
  • Testing
    • 24 test trials, ordered randomly, balanced
    • No feedback, forced choice
  • Amazon Mechanical Turk
    • $0.25 / participant
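
One hypothetical way a point in the two-dimensional policy space could be turned into a 25-trial training sequence; the parameterization (a linear difficulty fade and a fixed repetition probability) is an assumption for illustration, not the talk's exact scheme:

```python
# Generate a training sequence from a 2-D policy point (illustrative parameterization).
import numpy as np

def generate_sequence(fading, rep_alt, n_trials=25, rng=None):
    """fading in [0, 1]: how far difficulty ramps up; rep_alt in [0, 1]: P(repeat same category)."""
    rng = rng or np.random.default_rng()
    trials = []
    category = int(rng.integers(2))                  # 0 = low graspability, 1 = high graspability
    for t in range(n_trials):
        difficulty = fading * t / (n_trials - 1)     # fade from easy items toward harder ones
        if rng.random() > rep_alt:                   # alternate categories with prob. 1 - rep_alt
            category = 1 - category
        trials.append((category, difficulty))
    return trials

training_trials = generate_sequence(fading=0.7, rep_alt=0.8)
```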
Results

[Figure: results across the policy space (# correct of 25)]

Best Policy
  • Fade from easy to semi-difficult
  • Repetitions initially, alternations later

Final Evaluation

[Figure: final-evaluation test accuracy by condition: 65.7% (N=49), 60.9% (N=53), 68.6% (N=48), 66.6% (N=50)]

Novel Experimental Paradigm
  • Instead of running a few conditions, each with many participants, run many conditions, each with a different participant.
  • Although individual participants provide a very noisy estimate of the population mean, optimization techniques allow us to determine the shape of the policy space.
What Next?
  • Plea for more interesting policy spaces!
  • Other optimization problems
    • Abstract concepts from examples
      • E.g., irony
    • Motivation
      • Manipulations
        • Rewards/points, trial pace, task difficulty, time pressure
      • Measure
        • Voluntary time on task
Optimization
  • E.g., time-varying repetition/alternation policy
How To Do Optimization?
  • Reinforcement Learning
    • POMDPs
  • Function Approximation
    • Gaussian Process Surrogate-Based Optimization
Approach
  • (1) Using the current policy function estimate, choose a promising next policy to evaluate
  • (2) Conduct a small experiment using that policy to obtain a (noisy) estimate of population mean performance for that policy
  • (3) Use data collected so far to reestimate the shape of the policy function
  • (4) Go to step 1
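
Putting the four steps together as a surrogate-based optimization loop; the kernel, UCB weight, and the simulated-experiment stub below are assumptions standing in for a real single-participant Mechanical Turk run:

```python
# Steps (1)-(4) as a loop: select by UCB, run a small experiment, refit the GP, repeat.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def run_small_experiment(policy):
    """Stub for step (2): a noisy single-participant score (assumed true performance curve)."""
    true_perf = 0.5 + 0.2 * np.exp(-((policy[0] - 0.6) ** 2) / 0.05)
    return rng.binomial(24, true_perf) / 24

candidates = np.linspace(0, 1, 50).reshape(-1, 1)               # discretized policy space
kernel = 1.0 * RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05)
X = [rng.choice(candidates)]                                    # start from a random policy
y = [run_small_experiment(X[0])]

for _ in range(100):
    gp = GaussianProcessRegressor(kernel=kernel).fit(np.vstack(X), np.array(y))  # (3) reestimate shape
    mean, std = gp.predict(candidates, return_std=True)
    policy = candidates[np.argmax(mean + 2.0 * std)]            # (1) promising = high upper confidence bound
    score = run_small_experiment(policy)                        # (2) small, noisy experiment at that policy
    X.append(policy); y.append(score)                           # accumulate data, then (4) repeat

best_policy = candidates[np.argmax(gp.predict(candidates))]
```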