Distributed Representations Psych 85-419/719 Feb 8, 2001
Distributed vs. Localist Representations • With localist representations, a given object is represented by a single unit. • With distributed representations, a given object is represented by a pattern over multiple units. • Typically, units are shared between objects.
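To make the contrast concrete, here is a minimal sketch (my own toy example, not from the slides), using the dog/cat/tree features from the next slide: localist coding dedicates one unit per object, while distributed coding represents each object as a pattern over shared feature units.

```python
# Localist: one dedicated unit per object; no sharing between objects.
localist = {
    'dog':  [1, 0, 0],
    'cat':  [0, 1, 0],
    'tree': [0, 0, 1],
}

# Distributed: a pattern over feature units that are shared between objects.
FEATURES = ['living', 'has_legs', 'furry', 'noisy', 'has_leaves']
distributed = {
    'dog':  [1, 1, 1, 1, 0],
    'cat':  [1, 1, 1, 0, 0],   # overlaps with dog on three shared units
    'tree': [1, 0, 0, 0, 1],   # overlaps with dog and cat only on 'living'
}
```

Notice that similarity falls directly out of the distributed patterns (dog and cat share three active units, dog and tree only one), which the localist scheme cannot express.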
Whether A Representation is Localist or Distributed Depends On What Level We're Looking At
[Figure: word units (dog, cat, tree, has, is, furry, legs, noisy, leaves) combined into sentence-level patterns]
Distributed at the sentence level, localist at the word level
Same Applies At Different Levels of Analysis • Can represent a word as a set of letter units • Distributed at the word level • Localist at the letter level • Can represent a letter as a set of features • Distributed at the letter level • Localist at the feature level • Do we always have to have some localist level?
Efficiency • Encoding n objects using a localist representation requires n units • Encoding n objects using a distributed representation generally requires fewer • At least log2(n) units (if binary) • In practice, depends on the scheme you use • Chomsky & Halle’s features: 25 features to encode 35 phonemes • Ladefoged’s: 15 features for 35 phonemes • Hare et al: 11 features for 35 phonemes
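A quick back-of-the-envelope check of the unit counts above (a sketch, not course material):

```python
import math

n_phonemes = 35
localist_units = n_phonemes                         # one unit per phoneme: 35
binary_minimum = math.ceil(math.log2(n_phonemes))   # log2(35) ~ 5.13, so 6 binary units
print(localist_units, binary_minimum)               # 35 6
# Real feature schemes fall in between: Chomsky & Halle 25, Ladefoged 15, Hare et al. 11
```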
Tradeoff of Size and Speed • Time to train a network is roughly the # of examples, times the # of weights, times the # of passes through the training set we need • Localist representations tend to require fewer training passes • … but generally require more units
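The tradeoff is just the product of three factors; here is a toy calculation with made-up numbers (the point is the form of the product, not which scheme wins):

```python
def training_cost(n_examples, n_weights, n_passes):
    # time ~ examples x weights x passes through the training set
    return n_examples * n_weights * n_passes

# Hypothetical figures: localist uses more units (hence more weights) but needs
# fewer passes; distributed uses fewer weights but needs more passes.
print(training_cost(n_examples=500, n_weights=35 * 20, n_passes=10))
print(training_cost(n_examples=500, n_weights=11 * 20, n_passes=50))
```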
Tradeoff of Speed and Scalability • With localist representations, we must allocate a new unit for each new object we want to learn about. • With distributed representations, existing units (and weights) can be used. • Learning a new item involves adjusting weights.
Tradeoff of Speed and Robustness • With localist units, when a unit is damaged, the object it represents is lost • Response to damage is not gradual, nor spread across classes of items • Humans don’t exhibit this property • Brain damage tends to result in gradual degradation, and across sets of related things (knowledge of animals, tools, etc.)
Choice of Representation Can Influence Speed of Learning • Ex: past tense of English verbs • Can end with a /d/ sound (blamed), a /t/ sound (baked), or /Id/ (painted) • Cued by the preceding sound • Representing with localist phonemes: the rule is complex (d, g, m, n, v … -> /d/; k, f, s … -> /t/, etc.) • But all the /d/ items are voiced; the /t/ ones are unvoiced
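A minimal sketch of the point (my own code, with a partial and purely illustrative voicing table): stated over localist phonemes the rule is a long arbitrary list, but stated over a voicing feature it collapses to a couple of lines.

```python
# Illustrative only: a partial voicing table for word-final consonants.
VOICED = {'b', 'd', 'g', 'm', 'n', 'v', 'z', 'l', 'r'}

def past_tense_suffix(final_phoneme):
    if final_phoneme in ('t', 'd'):   # painted -> /Id/
        return 'Id'
    if final_phoneme in VOICED:       # blamed -> /d/
        return 'd'
    return 't'                        # baked -> /t/

print(past_tense_suffix('m'), past_tense_suffix('k'), past_tense_suffix('t'))  # d t Id
```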
Choice of Representation Can Influence Quality of Generalization • Suppose a child had never heard a verb ending in the /CH/ phoneme • (like butch: “Rosie O'Donnell out-butched Tom Selleck in the interview”) • Would not be able to form the correct past tense if the representation used localist phonemes • But: if the representation was distributed and encoded voicing, the correct generalization could be made.
An Example of Choices Made When Designing a Representation • Observation: in English, the rule for pronouncing the vowel is often keyed by the rhyme (the part of the word from the vowel on) • Take words with an oo for the vowel: • cook, took, look, book, shook • fool, tool, drool, cool, school
We Could Use That...
[Figure: onset units (b, c, t) and rhyme units (at, ake) combining into words such as bat, cat, take, bake]
.. But That Has Problems • Sometimes people generalize over other units of analysis • Ex: wh is usually pronounced as in what, where, why • But: who- is often pronounced with an h sound (who, whom, whole, whose, whore) • How do you pronounce the name of the fish market Wholey's in Pittsburgh?
.. And Other Problems • Sometimes you encounter rhymes you haven’t seen before (e.g., Dolph) • … or onsets you haven’t seen (e.g., phlange)
One Plan... • Represent things at the lowest level you think might be necessary • Allow system to learn to extract higher level regularities • Requires learning rule that can extract such regularities
Another Plan... • Represent objects at multiple levels of abstraction • Ex: Coding letters, but also onsets and rhymes • Requires learning rule that can attend to different sources of information, without one dominating.
Distributed Representations on Input and Output
• CAT: +animal, +feline, +meows
• CATS: +animal, +feline, +meows, +plural
• CUP: +container, +beverage
• CUPS: +container, +beverage, +plural
• WUG: +pointy_head, ...
• WUGS: +pointy_head, …, +plural
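One way to picture these input/output codes (a sketch of my own, not the assignment's actual representation): each word is a binary vector over shared semantic feature units, so a novel word like WUGS reuses the same +plural unit learned from CATS and CUPS.

```python
FEATURES = ['animal', 'feline', 'meows', 'container', 'beverage', 'pointy_head', 'plural']

def encode(active):
    # Binary distributed pattern over the shared feature units.
    return [1 if f in active else 0 for f in FEATURES]

cats = encode({'animal', 'feline', 'meows', 'plural'})
cups = encode({'container', 'beverage', 'plural'})
wugs = encode({'pointy_head', 'plural'})   # novel word, but the plural unit is familiar
print(cats, cups, wugs, sep='\n')
```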
The “Starvation” Problem • If an input unit is never active during training, what happens to the weights projecting from that unit? • Need to choose a representational scheme such that units don’t starve • Using distributed representations helps, but doesn’t guarantee we totally solve the problem
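Why a silent unit starves, in one line of arithmetic (assuming a simple delta-rule style weight update; the numbers are made up): the weight change is proportional to the sending unit's activity, so an activity of zero means the weight never moves.

```python
learning_rate = 0.1
error_signal = 0.5   # error at the receiving unit

active_input, silent_input = 1.0, 0.0
print(learning_rate * error_signal * active_input)   # 0.05: this weight gets trained
print(learning_rate * error_signal * silent_input)   # 0.0: this weight is starved
```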
The Binding Problem
[Figure: feature units for blue, square, circle, and green; which color belongs with which shape?]
A Bad Solution • Conjunctive Coding: allocate a unit for each possible combination. • Way too expensive. Explodes exponentially with number of objects and features you have to represent. • Also doesn’t support decent generalization • Leads to starvation
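The explosion is easy to see with a little arithmetic (hypothetical feature counts): one unit per feature value grows linearly with the number of dimensions, while one unit per conjunction grows exponentially.

```python
values_per_dimension = 10   # e.g., 10 colors, 10 shapes, 10 positions, ...

for n_dimensions in (2, 3, 4, 5):
    feature_units = values_per_dimension * n_dimensions        # one unit per value
    conjunctive_units = values_per_dimension ** n_dimensions   # one unit per combination
    print(n_dimensions, feature_units, conjunctive_units)
# 2 -> 20 vs 100;  3 -> 30 vs 1,000;  4 -> 40 vs 10,000;  5 -> 50 vs 100,000
```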
A Different Solution • Coarse coding: have different units sensitive to properties in different regions of space • Human vision is a bit like this: • Early on in the visual stream, units are sensitive to features (oriented line segments, for example) in small receptive fields • Higher up, the size of receptive fields gets larger
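A toy version of coarse coding (my own illustration): each unit has a broad, overlapping receptive field along a single dimension, say horizontal position, and a stimulus is carried by the graded pattern of activity across several units at once.

```python
import math

centers = [0.0, 0.25, 0.5, 0.75, 1.0]   # receptive-field centers along the dimension
width = 0.3                              # broad fields, so neighbors overlap

def coarse_code(x):
    # Gaussian tuning curve for each unit; several units respond partially.
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

print([round(a, 2) for a in coarse_code(0.4)])   # several units partly active at once
```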
Potential Problems • How to know how big to make receptive fields? • Harder to encode a lot of features; resolution/accuracy tradeoff • Generalization?
Another Possibility: Temporal Binding
[Figure: the same blue, square, circle, and green units as in the binding-problem slide]
Using Slots • Try to assign roles to objects by using different collections of units for each role • Problem: generalizing across slots • Ex: phone, sphere, phrase
[Figure: separate banks of letter units for the 1st, 2nd, and 3rd letter slots]
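A toy slot encoding (my own sketch) that shows the generalization problem: each letter position has its own bank of 26 units, so the same letters appearing in different positions activate completely disjoint units, and whatever is learned about the "ph" at the start of phone transfers nothing to the "ph" inside sphere.

```python
import string

N_SLOTS = 6
LETTERS = string.ascii_lowercase

def slot_encode(word):
    # One bank of 26 letter units per slot; a word activates one unit per slot.
    units = [0] * (N_SLOTS * len(LETTERS))
    for slot, letter in enumerate(word[:N_SLOTS]):
        units[slot * len(LETTERS) + LETTERS.index(letter)] = 1
    return units

phone, sphere = slot_encode('phone'), slot_encode('sphere')
shared = sum(a and b for a, b in zip(phone, sphere))
print(shared)   # 0: no active units in common, despite the shared 'ph' (and 'e')
```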
Traditional View of Semantics
[Figure: a semantic hierarchy with nodes such as thing, physical, social, action, living, tool, mammal, bird, dog, cat, hammer, saw, and carpentry]
Hierarchical Knowledge in Distributed Reps • We could instantiate this hierarchy directly • … or allow much wider connectivity and let the system learn the relationships between concepts • … or do it implicitly (see example from text)
Discovered vs. Stipulated Representations • Instead of deciding what our representations should be to learn a given task, we could learn what representations work best for that task
[Figure: n input units feeding a hidden layer of fewer than n units, which feeds n output units]
Ex: Plaut & Kello Phonology Model
[Figure: model relating spelling, meaning, “phonology,” sound, and articulation]
For Next Time • Read PDP2, Ch 18, “On Learning the Past Tenses of English Verbs” • Skim handout: Pinker & Prince 1988