
Learning overhypotheses with hierarchical Bayesian models


Presentation Transcript


  1. Learning overhypotheses with hierarchical Bayesian models Charles Kemp, Amy Perfors, Josh Tenenbaum (Developmental Science, 2007)

  2. Learning word meanings from examples “horse” “horse” “horse”

  3. The “shape bias” in word learning (Landau, Smith & Jones, 1988). This is a dax. Show me the dax…
  • English-speaking children have a “shape bias”, picking the object with the same shape.
  • The shape bias is a useful inductive constraint or “overhypothesis”: the majority of early words are labels for object categories, and shape may be the best cue to object category membership.

  4. What is the relation between y and x?

  5. What is the relation between y and x?

  6. What is the relation between y and x?

  7. Overhypotheses
  • Syntax: Universal Grammar (Chomsky)
  • Phonology: faithfulness constraints, markedness constraints (Prince, Smolensky)
  • Word learning: shape bias, principle of contrast, whole object bias (Markman)
  • Folk physics: objects are unified, bounded and persistent bodies (Spelke)
  • Predicability: M-constraint (Keil)
  • Folk biology: taxonomic principle (Atran)
  • …

  8. Overhypotheses
  1. How do overhypotheses guide learning from sparsely observed data?
  2. What form do overhypotheses take, across different domains and tasks?
  3. How are overhypotheses themselves acquired?
  4. How can overhypotheses provide constraints yet maintain flexibility, balancing assimilation and accommodation?

  9. The “shape bias” in word learning (Landau, Smith & Jones, 1988). This is a dax. Show me the dax…
  • English-speaking children have a “shape bias” at 24 months of age, but 20-month-olds do not. …

  10. Is the shape bias learned?
  • Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories: “lug”, “wib”, “zup”, “div”.
  • After 8 weeks of training (20 min/week), 19-month-olds show the shape bias (“learned attentional bias”, “transfer learning”): This is a dax. Show me the dax…

  11. Transfer to real-world vocabulary. The puzzle: the shape bias is a powerful inductive constraint, yet it can be learned from very little data.

  12. Learning about feature variability (“lug”, “wib”, “zup”, “div”). The intuition:
  • Shape varies across categories but is relatively constant within categories.
  • Other features (size, color, texture) vary both across and within nameable object categories.

  13. Learning about feature variability Marbles of different colors: … ?

  14. Learning about feature variability Marbles of different colors: … ?

  15. A hierarchical model
  Level 2: Bags in general (color varies across bags but not much within bags)
  Level 1: Bag proportions (mostly yellow, mostly green, mostly blue, mostly red, mostly brown, … ?)
  Data: …

  16. A hierarchical Bayesian model: simultaneously infer
  Level 3: Prior expectations on bags in general
  Level 2: Bags in general
  Level 1: Bag proportions
  Data: …

  17. A hierarchical Bayesian model
  Level 3: Prior expectations on bags in general
  Level 2: Bags in general
  Level 1: Bag proportions (“Bag 1 is mostly red”)
  Data: …

  18. A hierarchical Bayesian model
  Level 3: Prior expectations on bags in general
  Level 2: Bags in general
  Level 1: Bag proportions (“Bag 2 is mostly yellow”)
  Data: …

  19. A hierarchical Bayesian model
  Level 3: Prior expectations on bags in general
  Level 2: Bags in general (“Color varies across bags but not much within bags”)
  Level 1: Bag proportions
  Data: …
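
A minimal sketch of this inference under a simplified version of the model (hypothetical code, not the paper's implementation: the Level 3 color distribution beta is fixed to uniform, whereas the full model also learns it, and alpha gets a flat grid prior rather than an exponential one):

```python
import numpy as np
from scipy.special import gammaln

def dirmult_logpmf(counts, alpha_vec):
    """Log marginal likelihood of one bag's color counts under a
    Dirichlet-multinomial (multinomial coefficient dropped: it does not
    depend on alpha, so it cannot affect the posterior over alpha)."""
    n, a = counts.sum(), alpha_vec.sum()
    return (gammaln(a) - gammaln(a + n)
            + np.sum(gammaln(alpha_vec + counts) - gammaln(alpha_vec)))

# Level 1 data: six bags of 20 marbles each, each bag nearly pure in color.
# Columns: red, yellow, blue, green, brown.
bags = np.array([
    [20,  0,  0,  0,  0],
    [ 0, 19,  1,  0,  0],
    [ 0,  0, 20,  0,  0],
    [ 1,  0,  0, 19,  0],
    [ 0,  0,  0,  0, 20],
    [20,  0,  0,  0,  0],
])

K = bags.shape[1]
beta = np.ones(K) / K            # simplification: fix the Level 3 color distribution to uniform
alphas = np.logspace(-2, 2, 81)  # grid prior on the Level 3 concentration alpha

# Posterior over alpha (flat on the grid): bags are exchangeable given (alpha, beta).
log_post = np.array([sum(dirmult_logpmf(b, a * beta) for b in bags) for a in alphas])
post = np.exp(log_post - log_post.max())
post /= post.sum()
alpha_map = alphas[int(np.argmax(post))]
print(f"MAP alpha = {alpha_map:.2f}")  # small: each bag is close to a single color

# Posterior predictive for a NEW bag after observing one brown marble,
# using the MAP alpha for simplicity (the full model would average over alpha).
new_counts = np.array([0, 0, 0, 0, 1])
pred = (alpha_map * beta + new_counts) / (alpha_map + new_counts.sum())
print(f"P(next marble from the new bag is brown) = {pred[-1]:.2f}")  # well above the 0.20 base rate
```

With bags that are each nearly pure in color, the posterior concentrates on a small alpha, so a single brown marble from a new bag is enough to predict that the whole bag is mostly brown.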

  20. Learning the shape bias
  Training: “lug”, “wib”, “zup”, “div”. Assuming independent Dirichlet-multinomial models for each dimension…

  21. Learning the shape bias
  Training: “lug”, “wib”, “zup”, “div”. Assuming independent Dirichlet-multinomial models for each dimension, we learn that:
  • Shape varies across categories but not within categories.
  • Texture, color, and size vary both across and within categories.
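
The same grid fit can be run independently per feature dimension, treating the four training categories like the bags of marbles above. A toy sketch (hypothetical code with made-up discrete feature encodings, not the Smith et al. stimuli):

```python
import numpy as np
from scipy.special import gammaln

def dirmult_logpmf(counts, alpha_vec):
    """Log marginal likelihood of one category's feature counts (constant
    multinomial coefficient dropped; it does not depend on alpha)."""
    n, a = counts.sum(), alpha_vec.sum()
    return (gammaln(a) - gammaln(a + n)
            + np.sum(gammaln(alpha_vec + counts) - gammaln(alpha_vec)))

def map_alpha(category_counts, K):
    """Grid MAP of the concentration alpha for one feature dimension,
    treating word categories exactly like bags of marbles."""
    beta, grid = np.ones(K) / K, np.logspace(-2, 2, 81)
    scores = [sum(dirmult_logpmf(c, a * beta) for c in category_counts) for a in grid]
    return grid[int(np.argmax(scores))]

def value_counts(values, K):
    """Count how often each of the K discrete feature values occurs."""
    c = np.zeros(K)
    for v in values:
        c[v] += 1
    return c

# Hypothetical encoding: each feature takes one of K = 8 discrete values,
# two exemplars per training category ("lug", "wib", "zup", "div").
K = 8
shape = [value_counts(v, K) for v in ([0, 0], [1, 1], [2, 2], [3, 3])]  # constant within a category
color = [value_counts(v, K) for v in ([0, 4], [1, 5], [2, 6], [3, 7])]  # varies within a category

print("alpha(shape) =", map_alpha(shape, K))  # small: shape is homogeneous within categories
print("alpha(color) =", map_alpha(color, K))  # large: color looks arbitrary within categories
```

The small shape concentration is what drives the test on slide 22: one “dax” exemplar effectively pins down the shape distribution of the new category, so generalization follows shape.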

  22. Learning the shape bias. Training → test: This is a dax. Show me the dax…

  23. Extensions
  • Learning with weaker shape representations.
  • Learning to transfer selectively, dependent on knowledge of ontological kinds.
  • By age ~3, children know that a shape bias is appropriate for solid object categories (ball, book, toothbrush, …), while a material bias is appropriate for nonsolid substance categories (juice, sand, toothpaste, …).
  [Table: training categories 1–4 and test category 5 described by discrete values on shape features (holes, curvature, edges, aspect ratio) and other features (main color, color distribution, oriented texture, roughness).]

  24. Modeling selective transfer
  Let k_i be the ontological kind of category i. Given k_i, we could learn a separate Dirichlet-multinomial model for each ontological kind:
  • Variability in solidity, shape, and material within kind 1 (solid: “dax”, “zav”, “fep”; e.g. “dax” is organized by shape).
  • Variability in solidity, shape, and material within kind 2 (non-solid: “wif”, “wug”, “toof”; e.g. “toof” is organized by material).

  25. Learning to transfer selectively
  The input: labeled exemplars of several categories (“dax”, “zav”, “wif”, “wug”, …), some solid and some non-solid.
  Chicken-and-egg problem: we don’t know the partition into ontological kinds.
  Solution: define a nonparametric prior over this partition (cf. Roy & Kaelbling, IJCAI 2007).
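
Slides 24 and 25 can be combined in one sketch. A standard nonparametric prior over partitions of the kind the slide alludes to is the Chinese restaurant process (CRP); the code below (hypothetical, with made-up feature counts, and a crude two-point grid over alpha standing in for integrating it out) scores a candidate partition into ontological kinds by the CRP prior times the per-kind Dirichlet-multinomial fit:

```python
import math
import numpy as np
from scipy.special import gammaln

def dirmult_logml(counts_list, alpha, K):
    """Log marginal likelihood of a set of categories' feature counts under one
    Dirichlet-multinomial with concentration alpha and uniform base distribution."""
    av = (alpha / K) * np.ones(K)
    return sum(gammaln(alpha) - gammaln(alpha + c.sum())
               + np.sum(gammaln(av + c) - gammaln(av)) for c in counts_list)

def crp_log_prior(partition, gamma=1.0):
    """Log probability of a partition under the Chinese restaurant process:
    item i joins an existing kind with probability size/(i + gamma) and
    starts a new kind with probability gamma/(i + gamma)."""
    sizes, logp = {}, 0.0
    for i, k in enumerate(partition):
        logp += math.log(sizes.get(k, gamma) / (i + gamma))
        sizes[k] = sizes.get(k, 0) + 1
    return logp

# Hypothetical shape counts (K = 4 shape values, 3 exemplars per category):
# categories 0-2 are shape-pure ("solid"-like), 3-5 are shape-mixed.
K = 4
cats = [np.array([3., 0., 0., 0.]), np.array([0., 3., 0., 0.]), np.array([0., 0., 3., 0.]),
        np.array([1., 1., 1., 0.]), np.array([0., 1., 1., 1.]), np.array([1., 0., 1., 1.])]

def score(partition, alpha_grid=(0.1, 100.0)):
    """CRP prior times, for each kind, the best-fitting concentration alpha
    (a crude stand-in for integrating alpha out)."""
    logp = crp_log_prior(partition)
    for kind in set(partition):
        members = [c for c, k in zip(cats, partition) if k == kind]
        logp += max(dirmult_logml(members, a, K) for a in alpha_grid)
    return logp

print(score([0, 0, 0, 1, 1, 1]))  # pure vs. mixed kinds: highest score
print(score([0, 0, 0, 0, 0, 0]))  # everything in one kind: lower
print(score([0, 1, 0, 1, 0, 1]))  # shuffled kinds: lower still
```

The partition separating the shape-pure categories from the shape-mixed ones gets the highest score, so the model simultaneously discovers the kinds and learns a different bias inside each.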

  26. Summary
  • Inductive constraints or “overhypotheses” are critical for learning so much so fast. New overhypotheses can be learned by children, often very early in development: they are neither just innate nor the result of gradual abstraction from many specific experiences.
  • Hierarchical Bayesian models (HBMs) may help explain the role of overhypotheses in learning, as well as how overhypotheses may themselves be acquired from experience (even relatively little experience): the “blessing of abstraction”.
  • Overhypotheses must constrain learning yet also be flexible, capable of revision, extension, or growth. Nonparametric HBMs can navigate this “assimilation vs. accommodation” tradeoff.
