
Learning to make specific predictions using Slow Feature Analysis


Presentation Transcript


  1. Learning to make specific predictions using Slow Feature Analysis

  2. Memory/prediction hierarchy with temporal invariances
  • Slow: temporally invariant abstractions
  • Fast: quickly changing input
  But… how does each module work: learn, map, and predict?

  3. My (old) module:
  • Quantize the high-dimensional input space
  • Map to a low-dimensional output space
  • Discover temporal sequences in the input space
  • Map sequences to a low-dimensional sequence language
  • Feedback = the same map run backwards
  Problems:
  • Sequence-mapping (step #4) depends on several previous steps → brittle, not robust
  • Sequence-mapping is not well-defined statistically

  4. New module design: Slow Feature Analysis (SFA)
  Pros of SFA:
  • Nearly guaranteed to find some slow features
  • No quantization
  • Defined over the entire input space
  • Hierarchical “stacking” is easy
  • Statistically robust building blocks (simple polynomials, Principal Components Analysis, variance reduction, etc.)
  → a great way to find invariant functions
  → invariants change slowly, hence are easily predictable
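
  For concreteness, here is a minimal quadratic-SFA sketch in Python (numpy only). This is the textbook algorithm, not necessarily the implementation behind these slides: expand the input quadratically, whiten it, and keep the directions whose temporal derivative varies least.

    import numpy as np

    def sfa(X, n_slow=2):
        """Minimal quadratic SFA: X is a (T, n) time series."""
        T, n = X.shape
        # Quadratic expansion: linear terms plus all pairwise products x_i * x_j
        iu, ju = np.triu_indices(n)
        Z = np.hstack([X, X[:, iu] * X[:, ju]])
        Z = Z - Z.mean(axis=0)
        # Whiten via SVD (PCA + variance normalization), dropping tiny components
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        keep = s > 1e-8 * s[0]
        Zw = Z @ (Vt[keep].T / s[keep]) * np.sqrt(T)
        # Slowest directions: smallest-eigenvalue eigenvectors of the
        # covariance of the temporal-derivative signal
        dZ = np.diff(Zw, axis=0)
        evals, evecs = np.linalg.eigh(dZ.T @ dZ)
        return Zw @ evecs[:, :n_slow]   # slow-feature time series S(t)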

  5. BUT… no feedback!
  • Can’t get specific output from invariant input
  • It’s hard to take a low-dimensional signal and turn it into the right high-dimensional one (the problem is underdetermined)
  Here’s my solution (straightforward, probably done before somewhere): do feedback with a separate map.

  6. First, show it working… then show how and why.
  • Input space: 20-dimensional “retina” (wrapped, so pixel 21 = pixel 1)
  • Input shapes: Gaussian blurs of 3 different widths
  • Input sequences: constant-velocity motion (0.3 pixels/step)
  [Figure: example frames at T = 0, 2, 4 and at T = 23, 25, 27]
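
  A sketch of how such stimuli could be generated, matching the slide’s description; the function and parameter names are illustrative assumptions:

    import numpy as np

    def wrapped_blob(center, width, n_pix=20):
        """One retina frame: a Gaussian blur wrapped on a 20-pixel ring."""
        d = np.abs(np.arange(n_pix) - center)
        d = np.minimum(d, n_pix - d)      # wraparound: pixel 21 = pixel 1
        return np.exp(-0.5 * (d / width) ** 2)

    def make_sequence(T=500, width=2.0, v=0.3, n_pix=20):
        """Constant-velocity motion: the blob center moves v pixels per step."""
        centers = (np.arange(T) * v) % n_pix
        return np.stack([wrapped_blob(c, width, n_pix) for c in centers])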

  7. Sanity check: the slow features extracted match the generating parameters:
  • Slow feature #1 tracks the Gaussian std. dev. (“what”)
  • Slow feature #2 tracks the Gaussian center position (“where”)
  (… so far, this is plain-vanilla SFA, nothing new…)

  8. New contribution: predict all pixels of the next image, given the previous images (e.g. from frames T = 0, 2, 4 … predict all 20 pixels of T = 5).
  The reference prediction is to reuse the previous image (“tomorrow’s weather is just like today’s”), i.e. predict frame T = 5 as frame T = 4.

  9. Plot the ratio (mean-squared prediction error) / (mean-squared reference error):
  • Median ratio over all points = 0.06 (including discontinuities)
  • … over high-confidence points only = 0.03 (tossing the worst 20%)
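
  The ratio could be computed along these lines (a sketch; the slides do not give the exact evaluation code):

    import numpy as np

    def median_error_ratio(X_true, X_pred):
        """X_pred[t] is the model's prediction of frame X_true[t]; the
        reference predictor simply reuses the previous frame."""
        pred_err = np.mean((X_pred[1:] - X_true[1:]) ** 2, axis=1)
        ref_err = np.mean((X_true[:-1] - X_true[1:]) ** 2, axis=1)
        return np.median(pred_err / ref_err)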

  10. Take-home messages:
  • SFA can be inverted
  • SFA can be used to make specific predictions
  • The prediction works very well
  • The prediction can be further improved by using confidence estimates
  So why is it hard, and how is it done?…

  11. Why it’s hard: going from the high-dimensional input x1, x2, …, x20 to low-dimensional slow features is easy, e.g.
  S1 = 0.3 x1 + 0.1 x1^2 + 1.4 x2 x3 + 1.1 x4^2 + … + 0.5 x5 x9 + …
  But going back is HARD: given S1 = 1.4 and S2 = -0.33, what are x1, x2, …, x20?
  • Infinitely many possibilities for the x’s
  • Vastly underdetermined
  • No simple polynomial-inverse formula (no analogue of the “quadratic formula”)

  12. A very simple, graphable example (a numerical sketch follows):
  • Input (x1, x2) is 2-dimensional; the slow feature S1 is 1-dimensional
  • x1(t), x2(t): approximately circular motion in the plane
  • S1(t) = x1^2 + x2^2 is nearly constant, i.e. slow
  This example will illustrate a series of six clue/trick pairs for learning the specific-prediction mapping.
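
  A quick numerical check of this toy example (the noise level and sample count are arbitrary choices):

    import numpy as np

    # Approximately circular motion in the (x1, x2) plane, plus a little noise
    t = np.linspace(0, 20 * np.pi, 2000)
    x1 = np.cos(t) + 0.001 * np.random.randn(t.size)
    x2 = np.sin(t) + 0.001 * np.random.randn(t.size)
    S1 = x1 ** 2 + x2 ** 2   # nearly constant along the trajectory
    # Slowness as derivative variance: S1's is orders of magnitude below x1's
    print(np.var(np.diff(S1)), np.var(np.diff(x1)))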

  13. Clue #1: the actual input data is a small subset of all possible input data (i.e. it lies on a “manifold”).
  Trick #1: find a set of 20-80 “anchor points” A_i that represent where the actual input data is, found using k-means, k-medoids, etc. (This is quantization, but only for feedback.) A sketch follows.
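
  A sketch of Trick #1, assuming scikit-learn’s KMeans is acceptable in place of k-medoids:

    import numpy as np
    from sklearn.cluster import KMeans

    def find_anchors(X, k=40):
        """Pick k anchor points A_i that summarize where the data lives
        (the slide suggests 20-80 anchors via k-means, k-medoids, etc.)."""
        km = KMeans(n_clusters=k, n_init=10).fit(X)
        return km.cluster_centers_, km.labels_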

  14. Clue #2: the actual input data is not distributed evenly about those anchor points.
  Trick #2: calculate the covariance matrix C_i of the data around each A_i; the eigenvectors of C_i capture the local scatter of the data.
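
  A sketch of Trick #2, reusing the cluster labels from the k-means sketch above:

    import numpy as np

    def local_covariances(X, anchors, labels):
        """Covariance C_i of the data assigned to each anchor A_i; the top
        eigenvectors of C_i span the local tangent of the data manifold."""
        covs = []
        for i, A in enumerate(anchors):
            D = X[labels == i] - A
            covs.append(D.T @ D / max(len(D) - 1, 1))
        return covs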

  15. Clue #3: S(x) is locally linear about each anchor point.
  Trick #3: construct linear (affine) Taylor-series mappings SL_i approximating S(x) about each A_i. (NB: this doesn’t require polynomial SFA, just a differentiable S.)
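
  A sketch of Trick #3; here the Jacobian is estimated by central finite differences, which only requires S to be differentiable, as the slide notes:

    import numpy as np

    def local_linear_map(S_func, A, eps=1e-4):
        """Affine Taylor approximation of S about anchor A:
        S(x) ~= S0 + J @ (x - A) for x near A."""
        S0 = np.asarray(S_func(A))
        J = np.empty((S0.size, A.size))
        for j in range(A.size):
            e = np.zeros(A.size)
            e[j] = eps
            J[:, j] = (S_func(A + e) - S_func(A - e)) / (2 * eps)
        return S0, J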

  16. Good news: the linear SL_i can be pseudo-inverted (SVD).
  Bad news: we don’t want any old (x1, x2); we want an (x1, x2) on the data manifold.
  Clue #4: the covariance eigenvectors tell us about the local data manifold.
  Trick #4:
  • Get the SVD pseudo-inverse: ΔX = SL_i^-1 (S_new - S(A_i))
  • Then stretch ΔX onto the manifold by multiplying by the “chopped”* C_i
  [Figure: ΔS from S(A_i) to S_new; the raw ΔX and the stretched ΔX]
  * A projection matrix, keeping only as many eigenvectors as S has dimensions.
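
  A sketch of Trick #4; n_keep would be the dimensionality of S, per the footnote:

    import numpy as np

    def invert_locally(S_new, A, S0, J, C, n_keep):
        """Pseudo-invert the local linear map, then stretch the step onto the
        manifold using the top n_keep eigenvectors of the local covariance."""
        dX = np.linalg.pinv(J) @ (S_new - S0)   # SVD pseudo-inverse
        evals, evecs = np.linalg.eigh(C)
        E = evecs[:, -n_keep:]                  # "chopped" C_i: top directions
        return A + E @ (E.T @ dX)               # projected step added to A_i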

  17. Good news: given A_i and C_i, we can invert S_new → X_new.
  Bad news: how do we choose which A_i and SL_i^-1 to use? Several different anchors can all share the same value of S_new.

  18. Clue #5:
  a) We need an anchor A_i such that S(A_i) is close to S_new.
  b) We need a “hint” of which anchors are close in X-space.
  Trick #5: choose the anchor A_i such that A_i is close to the hint AND S(A_i) is close to S_new (sketch below).
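
  A sketch of Trick #5 (which is also step 5 on slide 23): rank the anchors by both closeness criteria and take the best combined rank; the exact combination rule is my assumption:

    import numpy as np

    def choose_anchor(S_new, hint_x, anchors, S_at_anchors):
        """Pick the anchor that is close to the hint in X-space AND whose
        S(A_i) is close to S_new, by summing the two distance ranks."""
        dx = np.linalg.norm(anchors - hint_x, axis=1)
        ds = np.linalg.norm(S_at_anchors - S_new, axis=1)
        ranks = np.argsort(np.argsort(dx)) + np.argsort(np.argsort(ds))
        return int(np.argmin(ranks))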

  19. All tricks together: map the local linear inverse about each anchor point.
  [Figure: anchors (+), data points (x), and the S(A_i) neighbors used for each local inverse]

  20. Clue #6: the local data scatter can decide whether a given point is probable (“on the manifold”) or improbable.
  Trick #6: use Gaussian hyper-ellipsoid probabilities about the closest A_i. (This can tell whether a prediction makes sense or not.)
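
  A sketch of Trick #6 as a Gaussian log-probability; the regularization constant guards against thin covariances and is an assumption:

    import numpy as np

    def log_prob(x, A, C):
        """Gaussian hyper-ellipsoid log-probability of x about anchor A;
        very negative values flag an improbable (off-manifold) prediction."""
        d = x - A
        C_reg = C + 1e-6 * np.eye(len(A))
        m2 = d @ np.linalg.solve(C_reg, d)      # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(C_reg)
        return -0.5 * (m2 + logdet + len(A) * np.log(2 * np.pi))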

  21. [Figure: -log(P) surface; the estimated uncertainty increases away from the anchor points]

  22. Summary of the SFA inverse/prediction method: we have X(t-2), X(t-1), X(t), and we want X(t+1).
  1. Calculate the slow features S(t-2), S(t-1), S(t).
  2. Extrapolate that trend linearly to S_new. (NB: S varies slowly/smoothly in time.)
  3. Find candidate S(A_i)’s close to S_new, e.g. candidate i = {1, 16, 3, 7}.
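
  Steps 2 and 3 are simple enough to sketch directly:

    import numpy as np

    def extrapolate_S(S_hist):
        """Step 2: S varies slowly/smoothly, so extend its last linear
        trend one step to get S_new."""
        return S_hist[-1] + (S_hist[-1] - S_hist[-2])

    def candidate_anchors(S_new, S_at_anchors, m=4):
        """Step 3: the m anchors whose S(A_i) lies closest to S_new."""
        d = np.linalg.norm(S_at_anchors - S_new, axis=1)
        return np.argsort(d)[:m]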

  23. Summary, cont’d:
  4. Take X(t) as the “hint,” and find candidate A_i’s close to it, e.g. candidate i = {8, 3, 5, 17}.
  5. Find the “best” candidate A_i: the one whose index is high on both candidate lists.

  24. Summary, cont’d:
  6. Use the chosen A_i and the pseudo-inverse (i.e. SL_i^-1 (S_new - S(A_i)) via SVD) to get ΔX.
  7. Stretch ΔX onto the low-dimensional manifold using the chopped C_i.
  8. Add the stretched ΔX back onto A_i to get the final prediction.
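
  Hypothetical glue code assembling steps 2-8 from the sketches above (it assumes those helper functions are in scope):

    import numpy as np

    def predict_next(X_hist, S_func, anchors, S_at_anchors, covs, n_slow):
        """One specific prediction X(t+1) from the recent frames X_hist."""
        S_hist = np.array([S_func(x) for x in X_hist[-2:]])
        S_new = extrapolate_S(S_hist)                                # step 2
        i = choose_anchor(S_new, X_hist[-1], anchors, S_at_anchors)  # steps 3-5
        S0, J = local_linear_map(S_func, anchors[i])                 # trick #3
        return invert_locally(S_new, anchors[i], S0, J,
                              covs[i], n_slow)                       # steps 6-8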

  25. 9. Use the covariance hyper-ellipsoids to estimate confidence in this prediction (probable vs. improbable).
  This method uses virtually everything we know about the data; any improvements would presumably need further clues:
  • Discrete sub-manifolds
  • Discrete sequence steps
  • Better nonlinear mappings

  26. Next steps
  • Online learning
    - Adjust anchor points and covariances as new data arrive
    - Use weighted k-medoid clusters to mix old data with new
  • Hierarchy
    - Set the output of one layer as the input to the next
    - Enforce ever-slower features up the hierarchy
  • Test with more complex stimuli and natural movies
  • Let feedback from above modify the slow-feature polynomials
  • Find slow features in the unpredicted input (input - prediction)
