CS 182 Sections 101 - 102

Presentation Transcript


  1. CS 182 Sections 101 - 102 bad puns alert! Eva Mok (emok@icsi.berkeley.edu) Feb 11, 2004 (http://www2.hi.net/s4/strangebreed.htm)

  2. Announcements • a3 part 1 is due tonight (submit as a3-1) • The second tester file is up, so please start part 2. • The quiz is graded (pick it up after class).

  3. Where we stand • Last Week • Backprop • This Week • Recruitment learning • color • Coming up • Imaging techniques (e.g. fMRI)

  4. The Big (and complicated) Picture [Figure: the course's levels-of-abstraction diagram. Levels: Cognition and Language, Computation, Structured Connectionism, Computational Neurobiology, Biology. Topics placed on it include psycholinguistics experiments, spatial relations, motor control, metaphor, grammar, the Chang, Bailey, Narayanan, and Regier models, SHRUTI, neural nets & learning, triangle nodes, the visual system, and neural development; markers show what the quiz, midterm, and final cover.]

  5. Quiz • What is a localist representation? What is a distributed representation? Why are they both bad? • What is coarse-fine encoding? Where is it used in our brain? • What can Back-Propagation do that Hebb’s Rule can’t? • Derive the Back-Propagation Algorithm • What (intuitively) does the learning rate do? How about the momentum term?

  6. Distributed vs Localist Rep’n What are the drawbacks of each representation?

  7. Distributed vs Localist Rep’n • Distributed: how many persons can you represent with n bits? 2^n. But what happens if you want to represent a group? • Localist: how many persons can you represent with n bits? n. But what happens if one neuron dies?
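A minimal sketch of that capacity difference (the Python names are my own, not the course's): with n binary units, a localist scheme gets one person per unit, while a distributed scheme gets one person per activation pattern.

```python
from itertools import product

# Sketch of slide 7's capacity argument: n binary units give n localist codes
# (one-hot patterns) but 2^n distributed codes (all on/off patterns).
n = 4

localist_codes = [tuple(int(i == k) for i in range(n)) for k in range(n)]
distributed_codes = list(product([0, 1], repeat=n))

print(f"localist:    {len(localist_codes)} persons with {n} units")     # 4
print(f"distributed: {len(distributed_codes)} persons with {n} units")  # 16

# The drawbacks from the slide: kill one localist unit and exactly one person
# becomes unrepresentable; superimpose two distributed patterns (a "group")
# and the result may be indistinguishable from some third person's code.
```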

  8. Visual System • 1000 x 1000 visual map • For each location, encode: • orientation • direction of motion • speed • size • color • depth • Blows up combinatorially!
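A back-of-the-envelope count of that blow-up, assuming (my illustrative number, not the slide's) that each feature needs about 10 distinguishable values:

```python
# Conjunctive (localist) coding: one unit per combination of feature values at
# each location. Factored coding: one small pool of units per feature instead.
locations = 1000 * 1000                       # the 1000 x 1000 visual map
features = ["orientation", "direction", "speed", "size", "color", "depth"]
values_per_feature = 10                       # assumed resolution per feature

conjunctive_units = locations * values_per_feature ** len(features)
factored_units = locations * values_per_feature * len(features)

print(f"one unit per combination: {conjunctive_units:.0e} units")   # 1e+12
print(f"one pool per feature:     {factored_units:.0e} units")      # 6e+07
```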

  9. Coarse Coding The information you can encode with one fine-resolution unit can also be encoded with a few coarse-resolution units. As long as we need fewer coarse units in total, we come out ahead.
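A small illustration of the idea, with made-up numbers: a handful of wide (coarse) receptive fields at staggered offsets pin a stimulus down to a region about as narrow as a dedicated fine unit would.

```python
# Coarse coding sketch: each coarse unit responds over a 25-wide interval of a
# 1-D stimulus dimension; four staggered populations together localize the
# stimulus to a ~6-wide region, using far fewer units than a fine array.
width = 25                           # receptive-field width of each coarse unit
offsets = [0, 6, 12, 18]             # four staggered coarse populations

def active_units(stimulus):
    """For each offset, which coarse unit (interval index) fires."""
    return [(stimulus - off) // width for off in offsets]

def decode(units):
    """Intersect the active units' intervals to recover the stimulus region."""
    lo = max(off + u * width for off, u in zip(offsets, units))
    hi = min(off + (u + 1) * width for off, u in zip(offsets, units))
    return lo, hi

stim = 40
lo, hi = decode(active_units(stim))
print(f"stimulus {stim} localized to [{lo}, {hi})")   # [37, 43)
```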

  10. Coarse-Fine Coding [Figure: two feature maps over Feature 1 (e.g. orientation) and Feature 2 (e.g. direction of motion): one population is coarse in F1 and fine in F2, the other coarse in F2 and fine in F1, with axes for X/Y orientation and X/Y direction of motion.] With more than one stimulus present, we can run into ghost “images” (marked G).

  11. Back-Propagation Algorithm Each node i receives net input xi = ∑j wij yj and produces output yi = f(xi), where f is the sigmoid f(x) = 1 / (1 + e^-x) and ti is the target. We define the error term for a single node to be ti - yi.
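A minimal sketch of that forward pass in Python (variable names are mine, not the course's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(w, y_prev):
    """w[i, j] is the weight from unit j in the previous layer to unit i."""
    x = w @ y_prev          # net input x_i = sum_j w_ij * y_j
    return sigmoid(x)       # output y_i = f(x_i)

y_prev = np.array([0.0, 0.0, 1.0])     # two inputs plus a bias unit
w = np.array([[0.8, 0.6, 0.5]])        # weights into a single output node
print(forward(w, y_prev))              # [0.6224...], matching slide 15
```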

  12. Gradient Descent [Figure: error surface with axes labeled i1 and i2.] The global minimum is your goal. Strictly the plot should be 4-D (there are 3 weights), but you get the idea.

  13. The output layer With layers k → j → i, weights wjk into the hidden layer and wij into the output layer, outputs yi, targets ti, and learning rate η: E = Error = ½ ∑i (ti – yi)². The derivative of the sigmoid is just yi (1 – yi), so the output-layer update is ∆wij = η (ti – yi) yi (1 – yi) yj = η δi yj, with δi = (ti – yi) yi (1 – yi).
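A sketch of that update as code, assuming the standard delta rule for sigmoid output units; the numbers are chosen to match the example on slide 15:

```python
import numpy as np

def output_layer_deltas(t, y):
    # delta_i = (t_i - y_i) * f'(x_i), with f'(x_i) = y_i * (1 - y_i)
    return (t - y) * y * (1.0 - y)

def update_weights(w, deltas, y_prev, eta):
    # w_ij <- w_ij + eta * delta_i * y_j
    return w + eta * np.outer(deltas, y_prev)

t = np.array([0.0])                    # target t_i
y = np.array([0.6224])                 # output y_i
y_prev = np.array([0.0, 0.0, 1.0])     # incoming activations y_j (bias last)
w = np.array([[0.8, 0.6, 0.5]])
print(update_weights(w, output_layer_deltas(t, y), y_prev, eta=0.5))
# the bias weight moves from 0.5 to ~0.4269 (shown as 0.4268 on slide 15)
```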

  14. The hidden layer Same setup and error E = ½ ∑i (ti – yi)². For a hidden node j the error signal is back-propagated through the output weights: δj = (∑i wij δi) yj (1 – yj), and the update is ∆wjk = η δj yk.
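And a sketch of the hidden-layer step; the hidden activations, inputs, and weights here are made-up illustrative values, since the slides don't give a multi-layer numeric example:

```python
import numpy as np

def hidden_layer_deltas(w_out, deltas_out, y_hidden):
    # delta_j = (sum_i w_ij * delta_i) * y_j * (1 - y_j)
    back = w_out.T @ deltas_out
    return back * y_hidden * (1.0 - y_hidden)

def update_weights(w, deltas, y_prev, eta):
    # w_jk <- w_jk + eta * delta_j * y_k
    return w + eta * np.outer(deltas, y_prev)

w_out = np.array([[0.3, -0.2]])        # illustrative output weights w_ij
deltas_out = np.array([-0.1463])       # output delta from slide 15's numbers
y_hidden = np.array([0.55, 0.40])      # assumed hidden activations y_j
y_in = np.array([1.0, 0.0])            # assumed inputs y_k
w_hidden = np.array([[0.1, 0.2], [0.3, 0.4]])
print(update_weights(w_hidden,
                     hidden_layer_deltas(w_out, deltas_out, y_hidden),
                     y_in, eta=0.5))
```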

  15. Let’s just do an example A single output node y0 with inputs i1 = 0 and i2 = 0, a bias b = 1 with weight w0b = 0.5, input weights w01 and w02 (0.8 and 0.6; they don’t matter here since both inputs are 0), target t0 = 0, and learning rate η = 0.5. Forward pass: x0 = w01·i1 + w02·i2 + w0b·b = 0.5, so y0 = f(x0) = 1/(1+e^-0.5) = 0.6224. E = Error = ½ ∑i (ti – yi)² = ½ (t0 – y0)² = ½ (0 – 0.6224)² = 0.1937. Backward pass: δ0 = (t0 – y0) y0 (1 – y0) = –0.1463, so w0b ← w0b + η δ0 · b = 0.5 – 0.0731 ≈ 0.4268.
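The same example as a short script, so the numbers can be checked (the slide's 0.4268 is the updated bias weight after rounding):

```python
import math

eta = 0.5                                   # learning rate from the slide
i1, i2, b = 0.0, 0.0, 1.0                   # inputs and bias unit
w01, w02, w0b = 0.8, 0.6, 0.5               # input weights don't matter (inputs are 0)
t0 = 0.0                                    # target

x0 = w01 * i1 + w02 * i2 + w0b * b          # net input = 0.5
y0 = 1.0 / (1.0 + math.exp(-x0))            # sigmoid -> 0.6224
E = 0.5 * (t0 - y0) ** 2                    # 0.1937

delta0 = (t0 - y0) * y0 * (1.0 - y0)        # -0.1463
w0b_new = w0b + eta * delta0 * b            # ~0.4269 (0.4268 on the slide)

print(f"y0 = {y0:.4f}, E = {E:.4f}, new bias weight = {w0b_new:.4f}")
```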
