
Learning to make specific predictions using Slow Feature Analysis


Presentation Transcript


  1. Learning to make specific predictions using Slow Feature Analysis

  2. Memory/prediction hierarchy with temporal invariances
  • Slow: temporally invariant abstractions
  • Fast: quickly changing input
  But… how does each module work: learn, map, and predict?

  3. My (old) module:
  • Quantize the high-dimensional input space
  • Map to a low-dimensional output space
  • Discover temporal sequences in the input space
  • Map sequences to a low-dimensional sequence language
  • Feedback = the same map run backwards
  Problems:
  • Sequence-mapping (step #4) depends on several previous steps → brittle, not robust
  • Sequence-mapping is not well-defined statistically

  4. New module design: Slow Feature Analysis (SFA)
  Pros of SFA:
  • Nearly guaranteed to find some slow features
  • No quantization
  • Defined over the entire input space
  • Hierarchical “stacking” is easy
  • Statistically robust building blocks (simple polynomials, Principal Components Analysis, variance reduction, etc.)
  → a great way to find invariant functions
  → invariants change slowly, hence are easily predictable
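
  For concreteness, here is a minimal quadratic-SFA sketch in Python (numpy only). This is the textbook algorithm, not necessarily the implementation behind these slides: expand the input quadratically, whiten it, and keep the directions whose temporal derivative varies least.

    import numpy as np

    def sfa(X, n_slow=2):
        """Minimal quadratic SFA: X is a (T, n) time series."""
        T, n = X.shape
        # Quadratic expansion: linear terms plus all pairwise products x_i * x_j
        iu, ju = np.triu_indices(n)
        Z = np.hstack([X, X[:, iu] * X[:, ju]])
        Z = Z - Z.mean(axis=0)
        # Whiten via SVD (PCA + variance normalization), dropping tiny components
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        keep = s > 1e-8 * s[0]
        Zw = Z @ (Vt[keep].T / s[keep]) * np.sqrt(T)
        # Slowest directions: smallest-eigenvalue eigenvectors of the
        # covariance of the temporal-derivative signal
        dZ = np.diff(Zw, axis=0)
        evals, evecs = np.linalg.eigh(dZ.T @ dZ)
        return Zw @ evecs[:, :n_slow]   # slow-feature time series S(t)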

  5. BUT… no feedback!
  • Can’t get specific output from invariant input
  • It’s hard to take a low-dimensional signal and turn it into the right high-dimensional one (the problem is underdetermined)
  Here’s my solution (straightforward, probably done before somewhere): do feedback with a separate map.

  6. First, show it working… then show how and why.
  • Input space: 20-dimensional “retina” (wrapped, so pixel 21 = pixel 1)
  • Input shapes: Gaussian blurs of 3 different widths
  • Input sequences: constant-velocity motion (0.3 pixels/step)
  [Figure: example frames at T = 0, 2, 4 and at T = 23, 25, 27]
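
  A sketch of how such stimuli could be generated, matching the slide’s description; the function and parameter names are illustrative assumptions:

    import numpy as np

    def wrapped_blob(center, width, n_pix=20):
        """One retina frame: a Gaussian blur wrapped on a 20-pixel ring."""
        d = np.abs(np.arange(n_pix) - center)
        d = np.minimum(d, n_pix - d)      # wraparound: pixel 21 = pixel 1
        return np.exp(-0.5 * (d / width) ** 2)

    def make_sequence(T=500, width=2.0, v=0.3, n_pix=20):
        """Constant-velocity motion: the blob center moves v pixels per step."""
        centers = (np.arange(T) * v) % n_pix
        return np.stack([wrapped_blob(c, width, n_pix) for c in centers])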

  7. Sanity check: the slow features extracted match the generating parameters:
  • Slow feature #1 tracks the Gaussian std. dev. (“what”)
  • Slow feature #2 tracks the Gaussian center position (“where”)
  (… so far, this is plain-vanilla SFA, nothing new…)

  8. New contribution: predict all pixels of the next image, given the previous images (e.g. from frames T = 0, 2, 4 … predict all 20 pixels of T = 5).
  The reference prediction is to reuse the previous image (“tomorrow’s weather is just like today’s”), i.e. predict frame T = 5 as frame T = 4.

  9. Plot the ratio (mean-squared prediction error) / (mean-squared reference error):
  • Median ratio over all points = 0.06 (including discontinuities)
  • … over high-confidence points only = 0.03 (tossing the worst 20%)
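
  The ratio could be computed along these lines (a sketch; the slides do not give the exact evaluation code):

    import numpy as np

    def median_error_ratio(X_true, X_pred):
        """X_pred[t] is the model's prediction of frame X_true[t]; the
        reference predictor simply reuses the previous frame."""
        pred_err = np.mean((X_pred[1:] - X_true[1:]) ** 2, axis=1)
        ref_err = np.mean((X_true[:-1] - X_true[1:]) ** 2, axis=1)
        return np.median(pred_err / ref_err)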

  10. Take-home messages:
  • SFA can be inverted
  • SFA can be used to make specific predictions
  • The prediction works very well
  • The prediction can be further improved by using confidence estimates
  So why is it hard, and how is it done?…

  11. Why it’s hard: going from the high-dimensional input x1, x2, …, x20 to low-dimensional slow features is easy, e.g.
  S1 = 0.3 x1 + 0.1 x1^2 + 1.4 x2 x3 + 1.1 x4^2 + … + 0.5 x5 x9 + …
  But going back is HARD: given S1 = 1.4 and S2 = -0.33, what are x1, x2, …, x20?
  • Infinitely many possibilities for the x’s
  • Vastly underdetermined
  • No simple polynomial-inverse formula (no analogue of the “quadratic formula”)

  12. A very simple, graphable example (a numerical sketch follows):
  • Input (x1, x2) is 2-dimensional; the slow feature S1 is 1-dimensional
  • x1(t), x2(t): approximately circular motion in the plane
  • S1(t) = x1^2 + x2^2 is nearly constant, i.e. slow
  This example will illustrate a series of six clue/trick pairs for learning the specific-prediction mapping.
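
  A quick numerical check of this toy example (the noise level and sample count are arbitrary choices):

    import numpy as np

    # Approximately circular motion in the (x1, x2) plane, plus a little noise
    t = np.linspace(0, 20 * np.pi, 2000)
    x1 = np.cos(t) + 0.001 * np.random.randn(t.size)
    x2 = np.sin(t) + 0.001 * np.random.randn(t.size)
    S1 = x1 ** 2 + x2 ** 2   # nearly constant along the trajectory
    # Slowness as derivative variance: S1's is orders of magnitude below x1's
    print(np.var(np.diff(S1)), np.var(np.diff(x1)))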

  13. Clue #1: the actual input data is a small subset of all possible input data (i.e. it lies on a “manifold”).
  Trick #1: find a set of 20-80 “anchor points” A_i that represent where the actual input data is, found using k-means, k-medoids, etc. (This is quantization, but only for feedback.) A sketch follows.
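
  A sketch of Trick #1, assuming scikit-learn’s KMeans is acceptable in place of k-medoids:

    import numpy as np
    from sklearn.cluster import KMeans

    def find_anchors(X, k=40):
        """Pick k anchor points A_i that summarize where the data lives
        (the slide suggests 20-80 anchors via k-means, k-medoids, etc.)."""
        km = KMeans(n_clusters=k, n_init=10).fit(X)
        return km.cluster_centers_, km.labels_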

  14. Clue #2: the actual input data is not distributed evenly about those anchor points.
  Trick #2: calculate the covariance matrix C_i of the data around each A_i; the eigenvectors of C_i capture the local scatter of the data.
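
  A sketch of Trick #2, reusing the cluster labels from the k-means sketch above:

    import numpy as np

    def local_covariances(X, anchors, labels):
        """Covariance C_i of the data assigned to each anchor A_i; the top
        eigenvectors of C_i span the local tangent of the data manifold."""
        covs = []
        for i, A in enumerate(anchors):
            D = X[labels == i] - A
            covs.append(D.T @ D / max(len(D) - 1, 1))
        return covs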

  15. Clue #3: S(x) is locally linear about each anchor point.
  Trick #3: construct linear (affine) Taylor-series mappings SL_i approximating S(x) about each A_i. (NB: this doesn’t require polynomial SFA, just a differentiable S.)
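
  A sketch of Trick #3; here the Jacobian is estimated by central finite differences, which only requires S to be differentiable, as the slide notes:

    import numpy as np

    def local_linear_map(S_func, A, eps=1e-4):
        """Affine Taylor approximation of S about anchor A:
        S(x) ~= S0 + J @ (x - A) for x near A."""
        S0 = np.asarray(S_func(A))
        J = np.empty((S0.size, A.size))
        for j in range(A.size):
            e = np.zeros(A.size)
            e[j] = eps
            J[:, j] = (S_func(A + e) - S_func(A - e)) / (2 * eps)
        return S0, J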

  16. Good news: the linear SL_i can be pseudo-inverted (SVD).
  Bad news: we don’t want any old (x1, x2); we want an (x1, x2) on the data manifold.
  Clue #4: the covariance eigenvectors tell us about the local data manifold.
  Trick #4:
  • Get the SVD pseudo-inverse: ΔX = SL_i^-1 (S_new - S(A_i))
  • Then stretch ΔX onto the manifold by multiplying by the “chopped”* C_i
  [Figure: ΔS from S(A_i) to S_new; the raw ΔX and the stretched ΔX]
  * A projection matrix, keeping only as many eigenvectors as S has dimensions.
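
  A sketch of Trick #4; n_keep would be the dimensionality of S, per the footnote:

    import numpy as np

    def invert_locally(S_new, A, S0, J, C, n_keep):
        """Pseudo-invert the local linear map, then stretch the step onto the
        manifold using the top n_keep eigenvectors of the local covariance."""
        dX = np.linalg.pinv(J) @ (S_new - S0)   # SVD pseudo-inverse
        evals, evecs = np.linalg.eigh(C)
        E = evecs[:, -n_keep:]                  # "chopped" C_i: top directions
        return A + E @ (E.T @ dX)               # projected step added to A_i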

  17. Good news: given A_i and C_i, we can invert S_new → X_new.
  Bad news: how do we choose which A_i and SL_i^-1 to use? Several different anchors can all share the same value of S_new.

  18. Clue #5:
  a) We need an anchor A_i such that S(A_i) is close to S_new.
  b) We need a “hint” of which anchors are close in X-space.
  Trick #5: choose the anchor A_i such that A_i is close to the hint AND S(A_i) is close to S_new (sketch below).
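
  A sketch of Trick #5 (which is also step 5 on slide 23): rank the anchors by both closeness criteria and take the best combined rank; the exact combination rule is my assumption:

    import numpy as np

    def choose_anchor(S_new, hint_x, anchors, S_at_anchors):
        """Pick the anchor that is close to the hint in X-space AND whose
        S(A_i) is close to S_new, by summing the two distance ranks."""
        dx = np.linalg.norm(anchors - hint_x, axis=1)
        ds = np.linalg.norm(S_at_anchors - S_new, axis=1)
        ranks = np.argsort(np.argsort(dx)) + np.argsort(np.argsort(ds))
        return int(np.argmin(ranks))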

  19. All tricks together: map the local linear inverse about each anchor point.
  [Figure: anchors (+), data points (x), and the S(A_i) neighbors used for each local inverse]

  20. Clue #6: the local data scatter can decide whether a given point is probable (“on the manifold”) or improbable.
  Trick #6: use Gaussian hyper-ellipsoid probabilities about the closest A_i. (This can tell whether a prediction makes sense or not.)
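
  A sketch of Trick #6 as a Gaussian log-probability; the regularization constant guards against thin covariances and is an assumption:

    import numpy as np

    def log_prob(x, A, C):
        """Gaussian hyper-ellipsoid log-probability of x about anchor A;
        very negative values flag an improbable (off-manifold) prediction."""
        d = x - A
        C_reg = C + 1e-6 * np.eye(len(A))
        m2 = d @ np.linalg.solve(C_reg, d)      # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(C_reg)
        return -0.5 * (m2 + logdet + len(A) * np.log(2 * np.pi))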

  21. [Figure: -log(P) surface; the estimated uncertainty increases away from the anchor points]

  22. Summary of the SFA inverse/prediction method: we have X(t-2), X(t-1), X(t), and we want X(t+1).
  1. Calculate the slow features S(t-2), S(t-1), S(t).
  2. Extrapolate that trend linearly to S_new. (NB: S varies slowly/smoothly in time.)
  3. Find candidate S(A_i)’s close to S_new, e.g. candidate i = {1, 16, 3, 7}.
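
  Steps 2 and 3 are simple enough to sketch directly:

    import numpy as np

    def extrapolate_S(S_hist):
        """Step 2: S varies slowly/smoothly, so extend its last linear
        trend one step to get S_new."""
        return S_hist[-1] + (S_hist[-1] - S_hist[-2])

    def candidate_anchors(S_new, S_at_anchors, m=4):
        """Step 3: the m anchors whose S(A_i) lies closest to S_new."""
        d = np.linalg.norm(S_at_anchors - S_new, axis=1)
        return np.argsort(d)[:m]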

  23. Summary, cont’d:
  4. Take X(t) as the “hint,” and find candidate A_i’s close to it, e.g. candidate i = {8, 3, 5, 17}.
  5. Find the “best” candidate A_i: the one whose index is high on both candidate lists.

  24. Summary, cont’d:
  6. Use the chosen A_i and the pseudo-inverse (i.e. SL_i^-1 (S_new - S(A_i)) via SVD) to get ΔX.
  7. Stretch ΔX onto the low-dimensional manifold using the chopped C_i.
  8. Add the stretched ΔX back onto A_i to get the final prediction.
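
  Hypothetical glue code assembling steps 2-8 from the sketches above (it assumes those helper functions are in scope):

    import numpy as np

    def predict_next(X_hist, S_func, anchors, S_at_anchors, covs, n_slow):
        """One specific prediction X(t+1) from the recent frames X_hist."""
        S_hist = np.array([S_func(x) for x in X_hist[-2:]])
        S_new = extrapolate_S(S_hist)                                # step 2
        i = choose_anchor(S_new, X_hist[-1], anchors, S_at_anchors)  # steps 3-5
        S0, J = local_linear_map(S_func, anchors[i])                 # trick #3
        return invert_locally(S_new, anchors[i], S0, J,
                              covs[i], n_slow)                       # steps 6-8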

  25. 9. Use the covariance hyper-ellipsoids to estimate confidence in this prediction (probable vs. improbable).
  This method uses virtually everything we know about the data; any improvements would presumably need further clues:
  • Discrete sub-manifolds
  • Discrete sequence steps
  • Better nonlinear mappings

  26. Next steps
  • Online learning
    - Adjust anchor points and covariances as new data arrive
    - Use weighted k-medoid clusters to mix old data with new
  • Hierarchy
    - Set the output of one layer as the input to the next
    - Enforce ever-slower features up the hierarchy
  • Test with more complex stimuli and natural movies
  • Let feedback from above modify the slow-feature polynomials
  • Find slow features in the unpredicted input (input - prediction)
