Models for the balance scale

**1. Models for the balance scale**
On balance, connectionist networks do not work
Han van der Maas
University of Amsterdam
Brenda Jansen, Hedderik van Rijn, Maartje Raijmakers, Philip Quinlan

**2. Talk**
Summary of empirical results
Summary of models
How rule-like are children?
Do connectionist networks explain children’s behavior?
New data: transitions
An alternative ACT-R model
Newer data: RT
How to proceed?

**3. Balance scale task**
Rule I: only weight
Rule II: also distance when weights are equal
Rule III: guess on conflict items
Rule IV: compare torques
Addition: compare the sums of weight and distance
Buggy: shift weights until the item is no longer a conflict item
QP: all conflict items balance
SDD: smallest distance down
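The rules listed above can be written as small decision procedures. The following Python sketch encodes Rules I to IV and the Addition rule for an item described by the number of weights and the peg distance on each side (the parameter names `wl`, `dl`, `wr`, `dr` are chosen here for illustration). The Buggy, QP and SDD rules are left out because the one-line descriptions above do not pin them down, and the guess in Rule III is assumed to be uniform over the three possible answers.

```python
import random

LEFT, RIGHT, BALANCE = "left", "right", "balance"


def compare(a, b):
    """Return the side whose value is larger, or BALANCE if they are equal."""
    if a > b:
        return LEFT
    if a < b:
        return RIGHT
    return BALANCE


def rule_i(wl, dl, wr, dr):
    """Rule I: only the number of weights is considered."""
    return compare(wl, wr)


def rule_ii(wl, dl, wr, dr):
    """Rule II: distance is considered only when the weights are equal."""
    return compare(dl, dr) if wl == wr else compare(wl, wr)


def rule_iii(wl, dl, wr, dr, rng=random):
    """Rule III: both cues are used; on conflict items the child guesses.

    Assumption: the guess is uniform over the three possible answers.
    """
    by_weight, by_distance = compare(wl, wr), compare(dl, dr)
    if by_weight == BALANCE:
        return by_distance
    if by_distance in (BALANCE, by_weight):
        return by_weight
    return rng.choice([LEFT, RIGHT, BALANCE])  # conflict item: guess


def rule_iv(wl, dl, wr, dr):
    """Rule IV: compare torques (weight x distance) on the two sides."""
    return compare(wl * dl, wr * dr)


def addition_rule(wl, dl, wr, dr):
    """Addition rule: compare weight + distance on the two sides."""
    return compare(wl + dl, wr + dr)


if __name__ == "__main__":
    # 3 weights on peg 2 vs 1 weight on peg 5: torques 6 vs 5, sums 5 vs 6
    item = (3, 2, 1, 5)
    for rule in (rule_i, rule_ii, rule_iv, addition_rule):
        print(rule.__name__, rule(*item))
```

For the item in the usage example, Rules I, II and IV answer "left", while the Addition rule answers "right": this is the kind of torque item that separates Addition from Rule IV.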

**4. Short history of balance scale research**
Piaget & Inhelder: proportional reasoning
Siegler: Item types, Rules 1 to 4, role of encoding, overlapping waves
Wilkening: Functional measurement (Kerkman & Wright)
Ferretti & Butterfield: torque difference effect
Others: Addition rule, Buggy rule, QP rule

**5. Short history of balance scale modeling**
Production Rule Models
Klahr & Siegler, ’78, representational issues
Sage & Langley, ’83, ’87, construction of rules by discrimination analysis, transition mechanism
Newell, ’90, SOAR
Decision-Tree Models
Schmidt & Ling, ’96, incremental decision tree learning
Connectionist models
McClelland, ’89, ’95, back-propagation
Shultz et al., ’91, ’94, ’95, modification of network topology by cascade-correlation
New ‘symbolic’ model: ACT-R (van Rijn ’03)

**6. McClelland’s PDP model & Shultz’s CC model**
Rule-like behavior of children can be explained by implicit, gradual patterns of activation (non-symbolic)
Main points of evidence:
Simulate rule-like data
Stages
Torque difference effect
Addition rule

**7. Critical evaluation**
Why?
Connectionist account is dominant
PDP and CC are applied to a large number of cognitive tasks
PDP and CC are extremely simple models
Dynamical systems and connectionist accounts claim to model higher-order cognition (i.e., to replace information-processing accounts)
Progress requires criticism
Note
This is not a dynamical critique!
In the past I criticized the dynamical approach partly for the same reasons (higher-order cognition)

**8. So let’s see...**
Do connectionist models really simulate rule-like data?
Analysis of rules
Discontinuities
How important is the torque difference effect?
Can PDP/CC models do the torque rule?

**9. Rules**
Criteria of Reese ’89: regular, consistent, discontinuous, transferable, evidenced from different sources, conscious
Data of Siegler suggest consistency
Analyzed with Rule Assessment Methodology (RAM)
RAM also used for analysis of data simulated with networks

**10. Rule Assessment Methodology**
6 item types
Compare answer patterns with the answer patterns predicted by the rules
Allow a certain misfit (20%)
Additional criteria for some rules
If two rules fit equally well, then ‘unclassified’
If overall fit > 80%, then acceptable
This type of procedure is widely used in research in which subjects have to be classified
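A RAM-style classification can be sketched as follows: items are assigned to one of the six item types (using Siegler's usual labels), and a child's answer pattern is matched against each rule's predicted pattern using the 20% misfit allowance and the tie rule from the slide. This is an illustrative reconstruction, not the original scoring procedure; the "additional criteria for some rules" are not modeled, and the function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def item_type(wl, dl, wr, dr):
    """Assign an item (weights/distances left and right) to one of six types."""
    dw, dd = sign(wl - wr), sign(dl - dr)
    torque = sign(wl * dl - wr * dr)
    if dw == 0 and dd == 0:
        return "balance"
    if dd == 0:
        return "weight"
    if dw == 0:
        return "distance"
    if dw == dd:
        return None  # both cues favour the same side: not one of the six types
    if torque == 0:
        return "conflict-balance"
    # In conflict items the side with more weight or more distance goes down
    return "conflict-weight" if torque == dw else "conflict-distance"


def classify(answers, predictions, criterion=0.8):
    """RAM-style classification of one child's answer pattern.

    answers: the child's responses, one per item.
    predictions: dict mapping rule name -> predicted responses (same order).
    Returns (rule name or "unclassified", dict of fits per rule).
    """
    fits = {
        rule: sum(a == p for a, p in zip(answers, predicted)) / len(answers)
        for rule, predicted in predictions.items()
    }
    best = max(fits.values())
    winners = [rule for rule, fit in fits.items() if fit == best]
    # Reject when more than 20% misfit remains or when two rules tie
    if best < criterion or len(winners) > 1:
        return "unclassified", fits
    return winners[0], fits
```

The tie check in `classify` is exactly the weakness listed under the problems of RAM: two rules that predict the same pattern on the test items can never be told apart, so the child is left unclassified.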

**11. Problems of RAM**
Informal procedure (no statistical underpinning)
Arbitrary criteria that cannot be generalized
Rules are pre-specified
What to do with ties?
Rule III cannot be detected
The necessity of individual rules cannot be assessed
There are no formal fit measures

**12. Latent class analysis**
Advanced rule assessment
Established statistical technique
Many books, courses and papers
Commercial and free programs
Hundreds of applications across the sciences in recent years
Especially useful in this case
Categorical latent structure model
Can be used explorative (to find new rules) and/or confirmative (to test rules)
Solves problem of ties
Criteria based on statistical theory
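To make the latent class model concrete, here is a minimal EM sketch for an unrestricted latent class model with categorical answers (for the balance scale: left, right, balance coded as 0, 1, 2). It is a toy implementation written for illustration, not the software used in the studies cited; a small pseudo-count (`smoothing`, an assumption) keeps response probabilities away from zero, and in practice one would use several random starts and an established LCA program. The BIC helper uses the standard formula with (K - 1) + K·J·(C - 1) free parameters for K classes, J items and C answer categories.

```python
import math
import random


def _e_step(data, pi, theta):
    """Posterior class probabilities per respondent, plus the log-likelihood."""
    post, loglik = [], 0.0
    for x in data:
        joint = []
        for k, weight in enumerate(pi):
            p = weight
            for j, c in enumerate(x):
                p *= theta[k][j][c]  # local independence within a class
            joint.append(p)
        total = sum(joint)
        loglik += math.log(total)
        post.append([p / total for p in joint])
    return post, loglik


def lca_em(data, n_classes, n_categories, n_iter=200, smoothing=1e-3, seed=0):
    """Fit an unrestricted latent class model to categorical responses by EM.

    data: list of response vectors; each entry is an int in range(n_categories).
    Returns (class proportions, response probabilities, log-likelihood, posteriors).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    pi = [1.0 / n_classes] * n_classes
    # theta[k][j][c] = P(answer c on item j | class k), random start
    theta = []
    for _ in range(n_classes):
        rows = []
        for _ in range(n_items):
            raw = [0.5 + rng.random() for _ in range(n_categories)]
            rows.append([r / sum(raw) for r in raw])
        theta.append(rows)
    for _ in range(n_iter):
        post, _ = _e_step(data, pi, theta)
        # M-step: class sizes and smoothed conditional response probabilities
        sizes = [sum(p[k] for p in post) for k in range(n_classes)]
        pi = [s / len(data) for s in sizes]
        for k in range(n_classes):
            for j in range(n_items):
                counts = [smoothing] * n_categories
                for x, p in zip(data, post):
                    counts[x[j]] += p[k]
                total = sum(counts)
                theta[k][j] = [c / total for c in counts]
    post, loglik = _e_step(data, pi, theta)
    return pi, theta, loglik, post


def n_parameters(n_classes, n_items, n_categories):
    """Free parameters: (K - 1) class sizes + K * J * (C - 1) response probabilities."""
    return (n_classes - 1) + n_classes * n_items * (n_categories - 1)


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)
```

Used exploratively, one fits models with increasing `n_classes` and compares BIC; used confirmatively, one would fix some entries of `theta` to the answers a rule predicts, which this sketch does not implement.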

**13. Example of LCA: conflict balance**

**14. Note: LCA is not without problems**
Requires lots of data
Bayesian methods allow analysis of all items at once
Model selection criteria
AIC or BIC, bootstrap statistics
Rule change during test
Hidden Markov models
Nevertheless, LCA and related statistical techniques are a welcome step forward

**15. Evidence for rule use by children**
LCA models give rise to good fits; at least 80% of the classes can be ascribed to known rules
Verbal justifications: awareness
From different sources: justifications, RT
Transferable: Siegler ’81
Discontinuous change from R1 to R2
Hysteresis (Jansen & vdMaas, 2001)
Bimodality
Many children and adults master rule IV

**16. But...**
Rule III: guessing
Mostly addition
Imperfect fit of rule models
Given the strictness of LCA
Rule switching
Variability in rule use
Measurement error
Variability near transitions
Torque difference effect

**17. Rule classification depends on torque difference**
Ferretti & Butterfield investigated 4 levels:
On non-conflict items: 1, 3, 12 and 24-30 units
On conflict items: 1, 3, 5, and 18-24 units
They found a significant effect of torque difference
Torque difference effect
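The torque difference of an item is the absolute difference between the two torques, |wL·dL - wR·dR|. The sketch below computes it and groups conflict items by torque difference on a hypothetical scale with at most 5 weights per side and 5 pegs; both limits are assumptions for illustration, not Ferretti & Butterfield's apparatus.

```python
from itertools import product


def torque_difference(wl, dl, wr, dr):
    """Absolute difference between the torques (weight x distance) of the two sides."""
    return abs(wl * dl - wr * dr)


def is_conflict(wl, dl, wr, dr):
    """Conflict item: more weight on one side, greater distance on the other."""
    return (wl - wr) * (dl - dr) < 0


def items_by_torque_difference(max_weights=5, max_pegs=5, conflict=True):
    """Group items (wl, dl, wr, dr) on a hypothetical scale by torque difference."""
    weights = range(1, max_weights + 1)
    pegs = range(1, max_pegs + 1)
    groups = {}
    for item in product(weights, pegs, weights, pegs):
        if is_conflict(*item) == conflict:
            groups.setdefault(torque_difference(*item), []).append(item)
    return groups
```

On such a small scale, conflict items reach a torque difference of at most 15, so the 18-24 unit level for conflict items described above requires a larger range of weights or pegs.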

**18. Re-analysis (Jansen & vdMaas ’97)**

**19. Analysis of PDP model**
No latent class model fits (Jansen & vdMaas, ’97)
Only 19% of classes fit a known rule
Stages are an artifact of the scoring method (Raijmakers, ’96):
only continuous change
no bimodality
No Rule IV

**20. Analysis of the CC model**
Quinlan, Rendall, Jansen, Booij, vdMaas (in revision) report on two independent replications of the CC model:
Latent class models fit (multigroup LCA)
9-class multigroup model: LR = 61.24, p(bootstrap) = .54, best BIC

**21. LCA model interpretations**
Very clear:
Rule I
Rule II
Mixture of additive rules
No pure addition
Two ‘odd’ rules
No Rule IV
Developmental order partly incorrect

**22. Rule IV: torque**
Jansen & vdMaas ’02: 11% use Rule IV (LCA); note: no training, no feedback!
Rule IV is easily taught (but hard to discover)
LCA of CC and PDP shows no signs of Rule IV
Only 10% of the torque items (which are failed with the addition rule) are solved correctly
Long training on selected sets of items does not help

**23. Can PDP and CC multiply?**
Additive activation function
At best, they mimic multiplication with moderate success, given a careful choice of weights and a limited training and test set
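The additive-activation argument can be made concrete with a toy example. The sketch below is not McClelland's PDP network or Shultz's CC network: it is a single hypothetical unit whose net input is a weighted sum of the four raw inputs. Two balance items and two left-down items have identical summed inputs, so no choice of weights and threshold can classify all four correctly. The second part illustrates one way to "allow torque by different input representations": if the inputs are log-coded (an assumption made here for illustration), the same additive unit reproduces Rule IV exactly, because log(w) + log(d) = log(w·d).

```python
import math
from itertools import product


def torque_rule(wl, dl, wr, dr):
    """Correct answer: the side with the larger torque goes down."""
    t = wl * dl - wr * dr
    return "left" if t > 0 else "right" if t < 0 else "balance"


def linear_net_input(weights, bias, wl, dl, wr, dr):
    """Net input of a single unit with an additive (weighted-sum) activation."""
    return bias + sum(w * x for w, x in zip(weights, (wl, dl, wr, dr)))


def log_coded_unit(wl, dl, wr, dr, tol=1e-9):
    """Additive unit fed log-coded inputs with fixed weights (+1, +1, -1, -1).

    The net input equals log(left torque / right torque).
    """
    net = math.log(wl) + math.log(dl) - math.log(wr) - math.log(dr)
    if net > tol:
        return "left"
    if net < -tol:
        return "right"
    return "balance"


# Right side: 2 weights on peg 2 (torque 4).
# (1, 4) and (4, 1) on the left balance; (2, 3) and (3, 2) go down left.
BALANCE_ITEMS = [(1, 4, 2, 2), (4, 1, 2, 2)]
LEFT_ITEMS = [(2, 3, 2, 2), (3, 2, 2, 2)]


def summed_net_inputs(weights, bias):
    """Both pairs sum to the raw input (5, 5, 4, 4), so any linear unit gives
    them equal summed net inputs: it cannot push both 'left' items above a
    threshold while keeping both 'balance' items below it."""
    s_bal = sum(linear_net_input(weights, bias, *it) for it in BALANCE_ITEMS)
    s_left = sum(linear_net_input(weights, bias, *it) for it in LEFT_ITEMS)
    return s_bal, s_left


def agreement_with_torque_rule(unit, max_value=5):
    """Proportion of items (inputs 1..max_value) where `unit` matches Rule IV."""
    items = list(product(range(1, max_value + 1), repeat=4))
    return sum(unit(*it) == torque_rule(*it) for it in items) / len(items)
```

With raw inputs the limitation holds for every weight vector, not just a badly trained one; with log-coded inputs the multiplication is done by the representation, and the unit itself stays additive.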

**24. Evaluation of CC**
Works better than PDP
Rules I and II, and weighted addition (Wilkening)
LCA models fit
But
Rules I and II (i.e., weight preference) arise from a bias in the input
Weighted addition is in the activation rule
LCA shows no weighted addition by children on balance scale
No Rule IV

**25. Transition model**

**26. New data**
We found small but significant evidence for hysteresis in the transition from Rule I to Rule II

**27. ACT-R model of the balance scale**

**28. Model evaluation**
Rules, rule construction
Developmental order
Product difference effect for extreme PD differences
Rule I to II transition (possibly hysteresis as a function of saliency)
Learning without feedback
Disadvantages
Elements of rules transferred from other domains
Constraints built in
Rules a bit ‘too good’
Open question: can we model this Rule I-II transition in a neural network?

**29. Newer data: RTs**
We found support for rules in response times
Computer test (10 × 7 items), 147 children, 44 undergraduates
Rule classification by cluster analysis (high agreement with the rule assessment method)
Fit of RT models
Regression models (using the R functions lm and nls and the nlme package)
Linear part (rules + item characteristics)
Non-linear part (exponential learning function)
Mixed effects (individual intercepts & slopes)
Estimates of duration of processing stages, learning and age parameters, and inconsistency parameters + fit measures
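The non-linear part of the RT model can be illustrated for a single hypothetical participant. The sketch assumes the form RT(t) = a + b·exp(-c·t) for trial t, which is one common parameterization of an exponential learning function and not necessarily the one used in the study. Instead of R's nls, it fits c by a grid search and solves for a and b by ordinary least squares at each candidate c; the linear rule and item terms and the mixed effects are omitted.

```python
import math


def fit_exponential_learning(trials, rts, c_grid=None):
    """Fit RT = a + b * exp(-c * trial) for one participant.

    For each candidate learning rate c, a and b follow from simple linear
    regression of RT on exp(-c * trial); the c with the smallest sum of
    squared errors is kept. Returns (a, b, c, sse).
    """
    if c_grid is None:
        c_grid = [i / 100 for i in range(1, 101)]  # assumed search range
    n = len(trials)
    ybar = sum(rts) / n
    best = None
    for c in c_grid:
        x = [math.exp(-c * t) for t in trials]
        xbar = sum(x) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        if sxx == 0:
            continue  # predictor is constant: a and b not identifiable
        b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, rts)) / sxx
        a = ybar - b * xbar
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, rts))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best
```

Profiling out a and b reduces the non-linear problem to a one-dimensional search over c. Here a is the asymptotic RT, b the initial slowing that learning removes, and c the learning rate; a full analysis would add the rule and item-type predictors and individual intercepts and slopes.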

**30. Mean RTs: rules × item types**

**31. Compensation: Addition or buggy rule**
Addition rule: sum weight and distance on each side and compare the sums.
Buggy rule: shift the pile with the largest number of weights until either distances or weights are equal.
Problem: the response patterns are the same!

**32. Addition or buggy**

**33. Model 2 fit (example: compensation rule)**

**34. Weight-distance items**

**35. RT: new challenge**
We think the RT data provide a new challenge for computational modeling of the balance scale task
Lots of new effects to explain
Both connectionist and Act-R models are able to predict response times
Usher & McClelland model

**36. On balance, they do not work…**
Connectionist networks may fail to mimic rules, but since children’s behavior is not perfectly rule-like, there is always room for discussion
We think behavior is more rule-like than connectionist networks can explain, based on:
Torque difference effect is not relevant
LCA results
Humans use Rule IV
Transitions between rules
McClelland ’95: PDP models leave out explicit rules (which humans can and do use) and therefore miss an important aspect of human cognition

**37. Explicit cognition**
Brain activity is largely implicit, graded, continuous and parallel
But in the end: explicit cognition is what makes us different from our cats and dogs

**38. Is there a future for connectionist modeling of the balance scale?**
Yes!
Use more complex models
PDP and CC are more than 10 years old
Allow multiplicative activation rules, or allow torque by different input representations
Use networks that show phase transitions in their learning behavior
Use networks with interesting dynamical properties
ART networks provide interesting possibilities for the future
Many more new models
Combine neural with symbolic architectures
Rule extraction from neural networks
Focus on new data