
How to win big by thinking straight about relatively trivial problems


Presentation Transcript


  1. How to win big by thinking straight about relatively trivial problems. Tony Bell, University of California at Berkeley.

  2. Density Estimation. Make the model like the reality by minimising the Kullback-Leibler divergence, by gradient descent in a parameter of the model. THIS RESULT IS COMPLETELY GENERAL.
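The slide's equations are images and are missing from the transcript; a standard reconstruction, assuming p is the 'data' distribution, q(x|w) the model, and w a model parameter (notation borrowed from slide 14), is:

```latex
D_{\mathrm{KL}}\!\left(p \,\|\, q_w\right) = \int p(x)\,\log\frac{p(x)}{q(x\,|\,w)}\,dx,
\qquad
\Delta w \;\propto\; -\frac{\partial D_{\mathrm{KL}}}{\partial w}
= \left\langle \frac{\partial \log q(x\,|\,w)}{\partial w} \right\rangle_{p(x)}.
```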

  3. The passive case ( = 0). For a general model distribution written in the ‘Energy-based’ form, with an energy and a partition function (or zeroth moment), the gradient evaluates in the simple ‘Boltzmann-like’ form: learn on data while awake, unlearn on data while asleep.
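Again the formulas are images; a standard energy-based reconstruction, assuming energy E(x, w) and partition function Z(w) (symbols are mine), is:

```latex
q(x\,|\,w) = \frac{e^{-E(x,\,w)}}{Z(w)},\qquad Z(w)=\int e^{-E(x,\,w)}\,dx,
\qquad
\Delta w \;\propto\;
\underbrace{-\left\langle \frac{\partial E}{\partial w} \right\rangle_{p}}_{\text{awake: learn on data}}
\;+\; \underbrace{\left\langle \frac{\partial E}{\partial w} \right\rangle_{q}}_{\text{asleep: unlearn}}.
```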

  4. The single-layer case. Shaping density; linear transform. Many problems are solved by modeling in the transformed space. Learning rule (natural gradient) for a non-loopy hypergraph. The score function is the important quantity.
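The slide's rule is an image; below is a minimal runnable sketch of the standard natural-gradient form for this single-layer case, with transformed variables u = Wx and score function phi(u) = -d log q(u)/du (my notation and choice of prior, not necessarily the slide's):

```python
# Minimal sketch (assumed, not the talk's original code) of the single-layer
# natural-gradient learning rule driven by a score function.
import numpy as np

def score(u):
    """Score function phi(u) = -d/du log q(u) for a smooth sparse prior.
    tanh is a common smooth stand-in for sign(u), i.e. a Laplacian prior."""
    return np.tanh(u)

def natural_gradient_step(W, X, lr=0.01):
    """One batch update of the unmixing matrix W, where u = W x.

    Rule: dW is proportional to (I - <phi(u) u^T>) W
    W : (n, n) unmixing matrix
    X : (n, T) batch of observations, one column per sample
    """
    n, T = X.shape
    U = W @ X                                  # model in the transformed space
    dW = (np.eye(n) - (score(U) @ U.T) / T) @ W
    return W + lr * dW

# Usage sketch: start from W = np.eye(n) and iterate over batches of data.
```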

  5. Conditional Density Modeling. To model a conditional density, use the rules sketched below. This little-known fact has hardly ever been exploited. It can be used instead of regression everywhere.
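The rules themselves are images in the original; a plausible reconstruction, assuming the goal is to model p(y|x) via a joint model q(y, x|w) and a marginal model q(x|w) (my assumption, consistent with "instead of regression"), is:

```latex
q(y\,|\,x, w) = \frac{q(y, x\,|\,w)}{q(x\,|\,w)},
\qquad
\frac{\partial \log q(y\,|\,x, w)}{\partial w}
= \frac{\partial \log q(y, x\,|\,w)}{\partial w}
- \frac{\partial \log q(x\,|\,w)}{\partial w},
```

so conditional modeling reduces to two unconditional density-estimation problems of the kind already described.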

  6. Independent Components, Subspaces and Vectors: ICA, ISA, IVA, DCA (ie: score function hard to get at due to Z).
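The defining factorisations are not in the transcript; a standard reading of the acronyms, assuming transformed variables u with sources indexed by i, subspaces S_k, and (for IVA) frequency bins f = 1..F (indices are mine), is:

```latex
\text{ICA: } q(u)=\prod_i q_i(u_i),\qquad
\text{ISA: } q(u)=\prod_k q_k\!\left(\mathbf{u}_{S_k}\right),\qquad
\text{IVA: } q\!\left(u^{(1)},\dots,u^{(F)}\right)=\prod_i q_i\!\left(u_i^{(1)},\dots,u_i^{(F)}\right),
```

with DCA (dependent component analysis) allowing further dependencies, at the price of a partition function Z that makes the score function hard to get at.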

  7. IVA used for audio separation in a real room:

  8. Score functions derived from sparse factorial and radial densities:
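The plotted score functions are not reproduced in the transcript; standard examples of the two families, assuming a Laplacian factorial density and an L2-radial density of the kind used for IVA (my choice of exemplars), are:

```latex
\text{factorial: } q(u)\propto\prod_i e^{-|u_i|}
\;\Rightarrow\; \varphi_i(u)=-\frac{\partial \log q(u)}{\partial u_i}=\operatorname{sign}(u_i),
\qquad
\text{radial: } q(u)\propto e^{-\lVert u\rVert}
\;\Rightarrow\; \varphi_i(u)=\frac{u_i}{\lVert u\rVert}.
```

The radial score depends only on the amplitude of the whole vector, which is the property slide 10 appeals to.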

  9. Results on real-room source separation:

  10. Why does IVA work on this problem? Because the score function, and thus the learning, is only sensitive to the amplitude of the complex vectors, representing correlations of amplitudes of frequency components associated with a single speaker. Arbitrary dependencies can exist between the phases of this vector. Thus all phase (ie: higher-order statistical) structure is confined within the vector and removed between them.
  • It's a simple trick, just relaxing the independence assumptions in a way that fits speech. But we can do much more:
  • build conditional models across frequency components
  • make models for data that is even more structured:
    • Video is [time x space x colour]
    • Many experiments are [time x sensor x task-condition x trial]

  11. The big picture. Behind this effort is an attempt to explore something called “The Levels Hypothesis”: the idea that in biology, in the brain, in nature, a kind of density estimation is taking place across scales. To explore this idea, we have a twofold strategy:
  1. EMPIRICAL/DATA ANALYSIS: build algorithms that can probe the EEG across scales, ie: across frequencies.
  2. THEORETICAL: formalise mathematically the learning process in such systems.

  12. A Multi-Level View of Learning (increasing timescale up the table):

  LEVEL     | UNIT       | DYNAMICS                            | LEARNING
  ecology   | society    | predation, symbiosis                | natural selection
  society   | organism   | behaviour                           | sensory-motor learning
  organism  | cell       | spikes                              | synaptic plasticity (= STDP)
  cell      | protein    | direct, voltage, Ca, 2nd messenger  | gene expression, protein recycling
  protein   | amino acid | molecular forces                    | molecular change

  LEARNING at a LEVEL is CHANGE IN INTERACTIONS between its UNITS, implemented by INTERACTIONS at the LEVEL beneath, and by extension resulting in CHANGE IN LEARNING at the LEVEL above. Interactions = fast; learning = slow. Separation of timescales allows INTERACTIONS at one LEVEL to be LEARNING at the LEVEL above.

  13. Infomax between Layers vs. Infomax between Levels.
  Infomax between Layers (eg: V1 density-estimates Retina) [figure: retina x -> synaptic weights -> V1 y]:
  • square (in ICA formalism)
  • feedforward
  • information flows within a level
  • predicts independent activity
  • only models outside input
  Infomax between Levels (eg: synapses density-estimate spikes) [figure: all neural spikes t -> synapses, dendrites -> all synaptic readout y; pdf of all spike times -> pdf of all synaptic ‘readouts’]:
  • overcomplete
  • includes all feedback
  • information flows between levels
  • arbitrary dependencies
  • models input and intrinsic activity
  If we can make this pdf uniform, then we have a model constructed from all synaptic and dendritic causality. This SHIFT in looking at the problem alters the question so that, if it is answered, we have an unsupervised theory of ‘whole brain learning’.

  14. Formalisation of the problem: p is the ‘data’ distribution, q is the ‘model’ distribution, w is a synaptic weight, and I(y,t) is the spike-synapse mutual information. If we were doing classical Infomax, we would use gradient (1). BUT if one's actions can change the data, THEN an extra term appears, giving gradient (2). It is easier to live in a world where one can change the world to fit the model, as well as changing one's model to fit the world: therefore (2) must be easier than (1). This is what we are now researching.
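Gradients (1) and (2) are images in the original; a hedged reconstruction of the distinction, assuming classical Infomax ascends I(y,t) with the data distribution p held fixed, while in the active case p itself depends on the weights (a functional chain rule, my notation), is:

```latex
\text{(1)}\quad \Delta w \;\propto\; \left.\frac{\partial I(y,t)}{\partial w}\right|_{p\ \text{fixed}},
\qquad
\text{(2)}\quad \Delta w \;\propto\;
\underbrace{\left.\frac{\partial I(y,t)}{\partial w}\right|_{p\ \text{fixed}}}_{\text{fit the model to the world}}
+ \underbrace{\int \frac{\delta I(y,t)}{\delta p(t)}\,\frac{\partial p(t)}{\partial w}\,dt}_{\text{extra term: actions change the world}}.
```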
