
Boltzmann Machines


Presentation Transcript


  1. Boltzmann Machines (Stochastic Hopfield Machines) Lecture 11e https://class.coursera.org/neuralnets-2012-001/lecture/131

  2. Document classification given binary vectors. Nuclear power station example – you don't want positive examples (of failures), so model the normal states and flag anything unusual.

  3. Two ways a model can generate data: Causal model: first generate latent variables (hidden units), then … Boltzmann machines: …
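
A minimal sketch of the energy-based alternative, for a toy network small enough to enumerate: the Boltzmann machine assigns each joint configuration an energy, and a visible vector gets its probability by summing exp(-E) over hidden configurations and normalising. The function names and the 0/1 unit convention are illustrative choices, not from the slides.

import numpy as np
from itertools import product

# Energy of a joint 0/1 configuration s with symmetric weights W (zero diagonal) and biases b:
# E(s) = -sum_i b_i s_i - sum_{i<j} w_ij s_i s_j
def energy(s, W, b):
    return -(b @ s) - 0.5 * (s @ W @ s)

def visible_distribution(W, b, n_visible, n_hidden):
    """p(v) for every visible vector, obtained by summing exp(-E) over all
    hidden configurations and normalising (only feasible for tiny networks)."""
    n = n_visible + n_hidden
    Z, unnorm = 0.0, {}
    for bits in product([0, 1], repeat=n):
        s = np.array(bits, dtype=float)
        weight = np.exp(-energy(s, W, b))
        Z += weight
        v = bits[:n_visible]
        unnorm[v] = unnorm.get(v, 0.0) + weight
    return {v: w / Z for v, w in unnorm.items()}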

  4. What to do when the network is big

  5. What is Needed for Learning:

  6. Learning in Boltzmann Machines Lecture 12a

  7. Modelling the input vectors. There are no labels; we want to build a model of a set of input vectors.

  8. Given that each weight needs to know about all the other weights, it is very surprising that there is such a simple learning algorithm:

  9. The rule compares two statistics: how often i and j are on together when a data vector v is clamped on the visible units, versus how often i and j are on together when nothing is clamped (i.e. when the network is sampling from the model's own distribution). The weight update is Δw_ij ∝ <s_i s_j>_data − <s_i s_j>_model.

  10. The first term in the rule says: raise the weights in proportion to the product of the activities that the units have (Hebbian learning). But if we only used this rule, the weights would all become positive and the whole system would blow up. So the second term says: decrease the weights in proportion to how often the units are on together when you are sampling from the model's distribution. An alternative view is that the first term is like the storage term for a Hopfield net, and the second term is the term for getting rid of the spurious minima – and this is the correct way of thinking about it (it tells you exactly how much unlearning to do).
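
A minimal sketch of the resulting weight update, assuming the two sets of statistics have already been collected by Gibbs sampling – once with data clamped on the visible units and once free-running. The function name and array layout are assumptions for illustration.

import numpy as np

def boltzmann_weight_update(pos_states, neg_states, lr=0.01):
    """pos_states: 0/1 unit states sampled with data clamped on the visible units.
    neg_states: 0/1 unit states sampled with nothing clamped (the model's own samples).
    Both have shape (num_samples, num_units)."""
    pos_corr = pos_states.T @ pos_states / len(pos_states)   # <s_i s_j> with data clamped
    neg_corr = neg_states.T @ neg_states / len(neg_states)   # <s_i s_j> from the model
    delta_w = lr * (pos_corr - neg_corr)   # Hebbian storage term minus unlearning term
    np.fill_diagonal(delta_w, 0.0)         # no self-connections
    return delta_w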

  11. Unlearning to get rid of the spurious minima

  12. You expect the energy landscape to have many different minima that are fairly separated and have about the same energy. • Model a set of images, all of which should get about the same (low) energy, while unreasonable images get very high energy. • Sampling how often two units are on together = measuring the correlation between those two units. • Repeat over all the data vectors.

  13. Restricted Boltzmann Machines Lecture 12c

  14. Much simplified architecture: no connections between hidden units • If the visible units are given, the equilibrium distribution of the hidden units can be computed in one step – because the hidden units are all independent of one another given the states of the visible units • The proper Boltzmann machine learning algorithm is still slow for a restricted Boltzmann machine • In 1998, a shortcut for Boltzmann machines (Hinton) • approximate, but works well in practice • caused a resurgence in this area

  15. Note that this does not depend on what the other hidden units are doing, so it can all be computed in parallel.
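
A sketch of that one-step computation for an RBM, assuming 0/1 units, a weight matrix W of shape (num_visible, num_hidden) and hidden biases b_hidden; each hidden unit looks only at the visible vector, so one matrix product handles the whole layer.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b_hidden, rng=np.random.default_rng()):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij) for every hidden unit at once;
    v has shape (batch, num_visible), W has shape (num_visible, num_hidden)."""
    p_h = sigmoid(v @ W + b_hidden)
    h = (rng.random(p_h.shape) < p_h).astype(float)   # independent Bernoulli samples
    return h, p_h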

  16. Fantasy particles = global configurations. After each weight update, you update the fantasy particles a little, which should bring them back close to equilibrium. The algorithm works very well at building density models.
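
A sketch of one mini-batch update using persistent fantasy particles in an RBM. This is one plausible reading of the slide, not a transcription of it: the fantasy chains are kept between updates and refreshed with a few Gibbs steps to supply the negative statistics. All names, and the restriction to the RBM case, are assumptions.

import numpy as np

def persistent_update(W, a, b, data_batch, fantasies, lr=0.01, gibbs_steps=1,
                      rng=np.random.default_rng()):
    """One weight update for an RBM with persistent fantasy particles.
    W: (num_visible, num_hidden) weights; a, b: visible and hidden biases;
    fantasies: visible states of the persistent chains, carried between updates."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Positive statistics: hidden probabilities with the data clamped.
    ph_data = sigmoid(data_batch @ W + b)
    pos = data_batch.T @ ph_data / len(data_batch)
    # Refresh the fantasy particles with a few Gibbs steps (not restarted from data).
    v = fantasies
    for _ in range(gibbs_steps):
        h = (rng.random((len(v), W.shape[1])) < sigmoid(v @ W + b)).astype(float)
        v = (rng.random((len(v), W.shape[0])) < sigmoid(h @ W.T + a)).astype(float)
    ph_model = sigmoid(v @ W + b)
    neg = v.T @ ph_model / len(v)
    # Update parameters and return the refreshed fantasies for the next step.
    W += lr * (pos - neg)
    a += lr * (data_batch.mean(0) - v.mean(0))
    b += lr * (ph_data.mean(0) - ph_model.mean(0))
    return v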

  17. An alternative, but much faster, algorithm (contrastive divergence):

  18. Hinton 2002 -
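
A minimal sketch of the contrastive-divergence shortcut in its simplest (CD-1) form, as it is usually written: start the chain at the data, reconstruct once, and use the reconstruction for the negative statistics. Variable names, the use of probabilities rather than binary samples for the final statistics, and the bias updates are conventional implementation choices, not taken from the slide.

import numpy as np

def cd1_update(W, a, b, data_batch, lr=0.01, rng=np.random.default_rng()):
    """One CD-1 update for an RBM: data -> sampled hidden -> reconstructed visible
    -> hidden probabilities, then compare data statistics with reconstruction statistics."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    ph0 = sigmoid(data_batch @ W + b)                     # p(h | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)      # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + a)                           # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b)                            # p(h | reconstruction)
    W += lr * (data_batch.T @ ph0 - pv1.T @ ph1) / len(data_batch)
    a += lr * (data_batch.mean(0) - pv1.mean(0))
    b += lr * (ph0.mean(0) - ph1.mean(0))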

  19. Example of Contrastive Divergence Lecture 12d

  20. RBMs for Collaborative Filtering Lecture 12e
