LING 696B: Categorical perception, perceptual magnets and neural nets

From last time: quick overview of phonological acquisition

  • Prenatal to newborn

    • 4-day-old French babies prefer listening to French over Russian (Mehler et al, 1988)

  • Prepared to learn any phonetic contrast in human languages (Eimas, 71)


A quick overview of phonological acquisition

  • 6 months: Effects of experience begin to surface (Kuhl, 92)

  • 6 - 12 months: from a universal perceiver to a language-specific one (Werker & Tees, 84)


A quick overview of phonological acquisition

  • 6-9 months: know the sequential patterns of their language

    • English vs. Dutch (Jusczyk, 93)

      • “avoid” vs. “waardig”

    • Weird English vs. better English (Jusczyk, 94)

      • “mim” vs. “thob”

    • Impossible Dutch vs. Dutch (Friederici, 93)

      • “bref” vs. “febr”


A quick overview of phonological acquisition

  • 6 months: segment words out of the speech stream based on transitional probabilities (Saffran et al, 96); see the sketch below

    • pabikutibudogolatupabikudaropi…

  • 8 months: remembering words heard in stories read to them by graduate students (Jusczyk and Hohne, 97)
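Below is a minimal sketch of the transitional-probability idea, assuming (for illustration only) that the slide's stream is built by concatenating the four “words” pabiku, tibudo, golatu and daropi in random order; this is not Saffran et al's actual stimulus set or procedure. Within-word syllable transitions come out near 1, while transitions that span a word boundary are much lower, which is the cue learners could use to segment words.

```python
# A sketch of transitional-probability segmentation (illustrative only):
# TP(y | x) = count(xy) / count(x) over adjacent syllables in the stream.
import random
from collections import Counter

words = ["pabiku", "tibudo", "golatu", "daropi"]   # assumed word inventory
stream_words = [random.choice(words) for _ in range(200)]
syllables = [w[i:i + 2] for w in stream_words for i in range(0, 6, 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))
first_counts = Counter(syllables[:-1])

def tp(x, y):
    """Transitional probability of syllable y following syllable x."""
    return pair_counts[(x, y)] / first_counts[x] if first_counts[x] else 0.0

print(tp("pa", "bi"))   # within-word transition: 1.0
print(tp("ku", "ti"))   # across a word boundary: roughly 0.25
```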


A quick overview of phonological acquisition

  • From 12-18 months onward: vocabulary growth from 50 words to a “burst”

    • Lexical representation?

  • Toddlers: learning morpho-phonology

    • Example: U-shaped curve went --> goed --> wented/went


Next few weeks: learning phonological categories

  • The innovation of phonological structure requires us to think of the system of vocalizations as combinatorial, in that the concatenation of inherently meaningless phonological units leads to an intrinsically unlimited class of words ... It is a way of systematizing existing vocabulary items and being able to create new ones.

    -- Ray Jackendoff (also citing C. Hockett in “Foundations of Language”)


Categorical perception

  • Categorization

  • Discrimination


Eimas et al (1971)

  • Enduring results: tiny (1-4 month) babies can hear almost anything

  • Stimuli: 3 pairs along the synthetic /p/-/b/ continuum, differing by 20 ms


Eimas et al (1971)

  • Babies get

    • Very excited hearing a change cross the category boundary

    • Somewhat excited when the change stays within the boundary

    • Bored when there is no change


Eimas et al (1971)

  • Their conclusion: [voice] feature detection is innate

  • But similar things were shown for chinchillas (Kuhl and Miller, 78) and monkeys (Ramus et al)

  • So maybe this is a mammalian auditory system thing…


Sensitivity to distributions

  • Perceptual magnet effects (Kuhl, 91): tokens near prototypes are harder to discriminate

  • Perhaps a fancier term is “perceptual space warping”


Categorical perception vs. perceptual magnets

  • Gradient effects: Kuhl (91-95) tried hard to demonstrate these are not experimental errors, but systematic

[Figure: a reference stimulus and comparison stimuli 1-4 along the continuum]


Categorical perception vs. perceptual magnets

  • PM is based on a notion of perceptual space

    • Higher dimensions

    • Non-category boundaries


Kuhl et al (1992)

  • American English has /i/, but no /y/

  • Swedish has /y/, but no /i/

  • Test whether infants can detect differences between prototype and neighbors

[Figure: percent of trials on which tokens are treated as the same]


Kuhl et al (92)’s interpretation

  • Linguistic experience changes perception of native and non-native sounds (also see Werker & Tees, 84)

  • Influence of input starts before any phonemic contrast is learned

  • Information stored as prototypes


Modeling attempt: Guenther and Gjaja (96)

  • Claim: CP/PM is a matter of auditory cortex, and does not have to involve higher-level cognition

  • Methodology: build a connectionist model inspired by neurons


Self-organizing map in Guenther and Gjaja

  • Input coding (a hedged sketch follows below)

  • So for each (F1,F2) pair, use 4 input nodes

  • This seems to be an artifact of the mechanism (with regard to scaling), but it is standard practice in the field
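One plausible reading of the 4-node coding is a complementary pair of activities per formant (the scaled formant value together with its complement within an assumed range). The sketch below illustrates that idea; the formant ranges and the exact normalization are assumptions for illustration and may differ from what Guenther and Gjaja actually used.

```python
# A hedged sketch of a 4-node input coding for an (F1, F2) pair:
# each formant is scaled into [0, 1] and represented together with its complement.
# The ranges below are made-up values for illustration.
import numpy as np

F1_RANGE = (200.0, 900.0)     # Hz (assumed)
F2_RANGE = (500.0, 2700.0)    # Hz (assumed)

def encode(f1, f2):
    """Map an (F1, F2) pair onto 4 input node activities."""
    def pair(f, lo, hi):
        x = (f - lo) / (hi - lo)      # scale into [0, 1]
        return [x, 1.0 - x]           # complementary (on/off) pair
    return np.array(pair(f1, *F1_RANGE) + pair(f2, *F2_RANGE))

print(encode(300.0, 2300.0))  # e.g. a high front vowel such as /i/
```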


Self-organizing map in Guenther and Gjaja

[Figure: network diagram; mj is the activation of map cell j, driven by connection weights from the input nodes]


Self-organizing map: learning stage

[Figure: learning stage; mj is the activation, the connection weights are initialized to be random, and the learning rule determines each weight update from the learning rate, the cell's activation, and the mismatch between input and weight (a sketch follows below)]
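Below is a minimal sketch of a competitive learning rule of the kind described above: only the most active N cells update, and each update moves a cell's weights toward the current input by learning rate × activation × mismatch. This is an illustrative simplification, not Guenther and Gjaja's exact equations; the map size, learning rate, input dimensionality and winner schedule are all made-up values.

```python
# A sketch of self-organizing (competitive) learning:
# update only the most active cells, pulling their weights toward the input.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_inputs = 100, 4
W = rng.random((n_cells, n_inputs))          # connection weights, random init

def train_step(x, n_winners, learning_rate):
    """One learning step on a single input vector x."""
    m = W @ x                                # activation of each map cell
    winners = np.argsort(m)[-n_winners:]     # only the most active N cells learn
    mismatch = x - W[winners]                # input minus current weights
    W[winners] += learning_rate * m[winners, None] * mismatch

# Train on random (already encoded) inputs, shrinking the winner
# neighborhood from many cells down to 1 over time.
inputs = rng.random((2000, n_inputs))
for t, x in enumerate(inputs):
    n_winners = max(1, 20 - t // 100)        # N goes down to 1 over time
    train_step(x, n_winners, learning_rate=0.05)
```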


Self-organizing map: learning stage

  • Crucially, only update connections to the most active N cells

    • The “inhibitory connection”

  • N goes down to 1 over time

  • Intuition: each cell selectively encodes a certain kind of input. Otherwise the weights are pulled in different directions


Self-organizing map: testing stage

  • Population vector: take a weighted sum of individual votes (see the sketch below)

  • Inspired by neural firing patterns in monkey reaching

  • How to compute Fvote?

    • X+, X- must point in the same direction as Z+, Z-

    • Algebra exercise: derive a formula to calculate Fvote (homework Problem 1)
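Without giving away the homework derivation, here is a hedged sketch of the general population-vector idea: each cell casts a vote for a formant value, weighted by its activation, and the percept is the weighted average. The `preferred_formants` argument is an assumption made for illustration; in the actual model the vote is read off the learned connection weights.

```python
# A sketch of population-vector decoding: activation-weighted average of votes.
import numpy as np

def population_vote(activations, preferred_formants):
    """Return the activation-weighted average of the cells' preferred values."""
    activations = np.asarray(activations, dtype=float)
    weights = activations / activations.sum()
    return weights @ np.asarray(preferred_formants, dtype=float)

# Example: three cells preferring 300, 350 and 400 Hz; the most active cell
# pulls the decoded percept toward its preferred value.
print(population_vote([0.1, 0.3, 0.9], [300.0, 350.0, 400.0]))  # about 381 Hz
```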


The distribution of Fvote after learning

[Figure: distribution of Fvote after learning, shown against the training stimuli; recall the activation formula]


Results shown in the paper

[Figure: Kuhl's results alongside their model's results]


Results shown in the paper

With some choice of parameters to fit the “percent generalization”


Local summary

  • Captures a type of pre-linguistic / pre-lexical learning from distributions

    • (next time) Distribution learning can happen in a short time (Maye&Gerken, 00)

  • G&G: No need to have categories in order to get PM effects

    • Also see Lacerda (95)’s exemplar model

  • Levels of view: cognition or cortex

    • Maybe Guenther is right …


Categorization / discrimination training (Guenther et al, 99)

  • 1. Space the stimuli according to JND (just-noticeable difference)

  • 2. Test whether they can hear the distinctions in control and training regions

  • 3. TRAIN in some way (45 minutes!)

  • 4. Test again as in 2.

[Figure: band-passed white noise stimuli]


Guenther et al (99)

  • Categorization training:

    • Tell people that sounds from the training region belong to a “prototype”

    • Present them with sounds from the training region and band edges, and ask whether each is the prototype

  • Discrimination training:

    • People say “same” or “different”, and get feedback

    • Made harder by tighter spacing over time


Guenther et al (99)

  • Result of distribution learning depends on what people do with it

[Figure: results for the control and training regions]


Guenther’s neural story for this effect

  • Proposes to modify G&G’s architecture to simulate the effects of different training procedures

  • Did some brain scans to verify the hypothesis (Guenther et al, 04)

  • New model forthcoming? (A+ guaranteed if you turn one in)




The untold story of Guenther and Gjaja

  • Size of model matters a lot

  • Larger models give more flexibility, but no free lunch:

    • Take longer to learn

    • Must tune the learning rate parameter

    • Must tune the activation neighborhood

[Figure: results for maps of 100, 500, and 1500 cells]


Randomness in the outcome

  • This is perhaps the nicest run

  • Some other runs


A technical view of G&G’s model

  • “Self-organization”: a type of unsupervised learning

    • Receives inputs with no information about which categories they belong to

    • Also referred to as “clustering” (next time)

  • “Emergent behavior”: spatial input patterns are coded as connection weights that realize the non-linear mapping


A technical view of G&G’s model

  • A non-parametric approximation to the input-percept non-linear mapping

    • Lots of parameters, but they do not individually correspond to categories

    • Will look at a parametric approach to clustering next time

  • Difficulties: tuning of parameters, rates of convergence, formal analysis (more later)


Other questions for G&G

  • If G&G is a model of auditory cortex, then where do F1 and F2 come from? (see Adam’s presentation)

  • What if we expose G&G to noisy data?

  • How does it deal with speech segmentation?

  • More next time.


Homework problems related to perceptual magnets

  • Problem 2: warping in one dimension predicted by perceptual magnets

  • Problem 3: program a simplified G&G type of model or any other SOM to simulate the effect shown above

[Figure: the perceptual space before training, the training data, and the space after training?]


Commercial: CogSci colloquium this year

  • Many big names in connectionism this semester


A digression into the history of neural nets

  • 40’s -- study of neurons (Hebb, Huxley & Hodgkin)

  • 50’s -- self-programming computers, biological inspirations

  • ADALINE from Stanford (Widrow & Hoff)

  • Perceptron (Rosenblatt, 60)


Perceptron

  • Input: x1, …, xn

  • Output: {0, 1}

  • Weights: w1,…,wn

  • Bias: a

  • How it works: output 1 if w1x1 + … + wnxn + a > 0, and 0 otherwise (see the sketch below)
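A minimal sketch of that decision rule, assembled from the ingredients listed above (inputs, weights, bias, binary output); the AND example and its hand-picked weights are only for illustration.

```python
# A sketch of the perceptron decision rule: threshold a weighted sum plus bias.
import numpy as np

def perceptron_output(x, w, a):
    """Return 1 if w . x + a > 0, else 0."""
    return 1 if np.dot(w, x) + a > 0 else 0

# Example: a perceptron computing logical AND of two binary inputs.
w, a = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, a))   # 0, 0, 0, 1
```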


What can perceptrons do?

  • A linear machine: finding a plane that separates the 0 examples from the 1 examples


Perceptron learning rule

  • Start with random weights

  • Take an input, compute the output, and when there is an error:

    • weight = weight + a*input*error

    • bias = bias + a*error

  • Repeat until no more error

  • Recall G&G learning rule

  • See demo (a sketch follows below)
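A minimal sketch of the rule above: compute the output and, on an error, nudge the weights by a*input*error and the bias by a*error (the learning rate is written lr here to avoid clashing with the bias symbol from the previous slide). The OR task and all parameter values are made up for illustration; OR is linearly separable, so the loop terminates.

```python
# A sketch of the perceptron learning rule.
import numpy as np

rng = np.random.default_rng(0)

def train_perceptron(X, y, lr=0.1, max_epochs=100):
    w = rng.normal(size=X.shape[1])            # start with random weights
    b = rng.normal()
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, y):
            output = 1 if w @ x + b > 0 else 0
            error = target - output
            if error != 0:
                w += lr * x * error            # weight = weight + lr*input*error
                b += lr * error                # bias = bias + lr*error
                errors += 1
        if errors == 0:                        # repeat until no more errors
            break
    return w, b

# Example: learn logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w, b = train_perceptron(X, y)
print([1 if w @ x + b > 0 else 0 for x in X])  # expected: [0, 1, 1, 1]
```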


Criticism from traditional AI (Minsky & Papert, 69)

  • A simple perceptron is not capable of learning some simple Boolean functions

  • More sophisticated machines are hard to understand

  • Example: XOR (see demo)


Revitalization of neural nets: connectionism

  • Minsky and Papert’s book stopped the work on neural nets for 10 years …

  • But as computers got faster and cheaper, …

  • CMU parallel distributed processing (PDP) group

  • UC San Diego institute of cognitive science


Networks get complicated: more layers

  • A 3-layer perceptron can solve the XOR problem (see the sketch below)
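A minimal sketch of why one hidden layer suffices for XOR: with hand-set weights, one hidden unit computes OR, another computes AND, and the output unit fires for “OR but not AND”. The weights here are chosen by hand rather than learned.

```python
# A sketch of XOR with one hidden layer of threshold units (hand-set weights).
import numpy as np

step = lambda z: int(z > 0)

def xor_net(x):
    h1 = step(np.dot([1, 1], x) - 0.5)   # hidden unit 1: OR(x1, x2)
    h2 = step(np.dot([1, 1], x) - 1.5)   # hidden unit 2: AND(x1, x2)
    return step(h1 - h2 - 0.5)           # output: OR and not AND

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x)))       # 0, 1, 1, 0
```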


Networks get complicated: combining non-linearities

  • Decision surface can be non-linear


Networks get complicated: more connections

  • Bi-directional

  • Inhibitory

  • Feedback loops


Key to the revitalization of connectionism

  • Back-propagation: an algorithm that can be used to train a variety of networks (a sketch follows below)

  • Variants developed for other architectures

[Figure: the error signal propagated backward through the network]
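Below is a minimal sketch of back-propagation: gradient descent on squared error for a tiny two-layer sigmoid network learning XOR. The hidden-layer size, learning rate and number of epochs are arbitrary choices for illustration, not any specific model discussed in the course.

```python
# A sketch of back-propagation on XOR with one hidden sigmoid layer.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: push the output error back through the layers
    d_out = (out - y) * out * (1 - out)          # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)           # error signal at the hidden layer
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```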


Two kinds of learning with neural nets

  • Supervised

    • Classification: patterns --> {1,…,N}

    • Adaptive control: control variables --> observed measurements

    • Time series prediction: past events --> future events

  • Unsupervised

    • Clustering

    • Compression / feature extraction


Statistical insights from the neural nets

  • Overtraining / early-stopping

  • Local maxima

  • Model complexity -- training error tradeoff

  • Limits of learning: generalization bounds

  • Dimension reduction, feature selection

  • Many of these problems became standard topics in machine learning


Big question: neural nets and symbolic computing

  • Grammar: harmony theory (Smolensky, and later Optimality Theory)

  • Representation: Can neural nets discover phonemes?


Next week

  • Reading will be posted on the course webpage

