
NEURAL VARIATIONAL IDENTIFICATION AND FILTERING

Learn how to perform unsupervised statistical inference in dynamical systems using variational filtering and deep learning. Deep learning serves as the optimization engine: neural networks are trained to perform posterior inference, while the dynamical system provides the structure for variational filtering in complex models. Expectation Maximization, Variational Inference, and variance-reduction techniques are covered. Suitable for statisticians, ML researchers, and dynamical-systems practitioners.


Presentation Transcript


  1. NEURAL VARIATIONAL IDENTIFICATION AND FILTERING • Henning Lange, Mario Bergés, Zico Kolter

  2. Variational Filtering brings together three ingredients: Dynamical Systems, Deep Learning, and Statistical Inference (Expectation Maximization, Variational Inference).

  3. Variational Filtering: the Statistical Inference component (Expectation Maximization, Variational Inference) is what makes it unsupervised.

  4. Variational Filtering: the Dynamical Systems component provides the structure.

  5. Variational Filtering: the Deep Learning component is the optimization engine.

  6. Variational Filtering • For the statistician: Expectation Maximization… … but with a Neural Network that tells us where to look.

  7. Variational Filtering • For the ML researcher: a Deep Neural Network… … that learns to perform posterior inference.

  8. Variational Filtering • For the dynamical-systems practitioner: a non-linear Kalman filter… … that is unbiased* and quite fast to evaluate.

  9. Recap • Monte Carlo integration: $\mathbb{E}_p[f(z)] \approx \frac{1}{N}\sum_{i=1}^N f(z_i)$ with $z_i \sim p$ • Importance sampling: $\mathbb{E}_p[f(z)] = \mathbb{E}_q\!\left[f(z)\,\frac{p(z)}{q(z)}\right] \approx \frac{1}{N}\sum_{i=1}^N f(z_i)\,\frac{p(z_i)}{q(z_i)}$ with $z_i \sim q$
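
Both identities are easy to check numerically. Below is a minimal sketch (plain NumPy; the target density, proposal, and integrand are illustrative choices, not taken from the slides) comparing direct Monte Carlo integration against importance sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Integrand and target density p = N(0, 1); illustrative choices.
f = lambda z: z ** 2                      # E_p[f(z)] = Var(z) = 1
p_pdf = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

# 1) Direct Monte Carlo integration: sample from p itself.
z_p = rng.standard_normal(N)
mc_estimate = f(z_p).mean()

# 2) Importance sampling: sample from a proposal q = N(0, 2^2)
#    and reweight each sample by p(z)/q(z).
q_scale = 2.0
z_q = rng.normal(0.0, q_scale, N)
q_pdf = lambda z: np.exp(-z**2 / (2 * q_scale**2)) / (q_scale * np.sqrt(2 * np.pi))
is_estimate = (f(z_q) * p_pdf(z_q) / q_pdf(z_q)).mean()

print(mc_estimate, is_estimate)  # both close to the true value 1.0
```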

  10. Outline • 1. Statistics: Expectation Maximization, Variational Inference • 2. Deep Learning: distributions parameterized by Neural Nets • 3. Dynamical Systems: additional challenges from intractable joint distributions • 4. Variance Reduction

  11. Expectation Maximization in one slide • EM is a technique to perform maximum-likelihood inference of the parameters $\theta$ in a latent variable model (unsupervised learning) • Latent variable $z$: state of appliances (on/off) • Coordinate ascent on the free energy $\mathcal{F}(q, \theta) = \mathbb{E}_{q(z)}[\log p(x, z \mid \theta)] + H(q)$ • E-Step: set $q(z) = p(z \mid x, \theta)$ • M-Step: increase $\mathbb{E}_{q(z)}[\log p(x, z \mid \theta)]$ with respect to $\theta$ • Neal, R. M., & Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models.
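
As a concrete instance of the two steps, here is a hedged sketch of EM on a toy latent-variable model (a two-component 1-D Gaussian mixture with known unit variances; the model choice is illustrative, not the NILM model from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: mixture of N(-2, 1) and N(3, 1); z is the (latent) component.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

mu = np.array([-1.0, 1.0])   # component means (part of theta)
pi = np.array([0.5, 0.5])    # mixing weights (part of theta)

for _ in range(50):
    # E-step: q(z) = p(z | x, theta), the exact posterior over components.
    log_lik = -0.5 * (x[:, None] - mu[None, :]) ** 2   # up to constants
    unnorm = pi[None, :] * np.exp(log_lik)
    q = unnorm / unnorm.sum(axis=1, keepdims=True)     # responsibilities

    # M-step: maximize E_q[log p(x, z | theta)] w.r.t. theta (closed form here).
    nk = q.sum(axis=0)
    mu = (q * x[:, None]).sum(axis=0) / nk
    pi = nk / len(x)

print(mu, pi)  # means approach (-2, 3), weights approach (0.3, 0.7)
```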

  12. Example: Non-Intrusive Load Monitoring • $p(z)$ = some prior, e.g. sparsity • Expectation Maximization allows for learning $\theta$ • $\theta$ could constitute reactive/active power of appliances, or waveforms

  13. Intractable posterior distributions • EM requires computation of the posterior $p(z \mid x, \theta)$ • For many interesting latent variable models, computing $p(z \mid x, \theta)$ is intractable

  14. Intractable posterior distributions • For many interesting latent variable models, computing $p(z \mid x, \theta)$ is intractable • NILM is one of them: with $K$ on/off appliances the latent domain has $2^K$ states, i.e., it grows exponentially with the number of appliances

  15. Variational Inference in two slides • Expectation Maximization: $q(z) = p(z \mid x, \theta)$, the exact posterior • Variational Inference: restrict $q$ to a tractable family and maximize a lower bound on $\log p(x \mid \theta)$ • Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999). An introduction to variational methods for graphical models.

  16. Variational Inference in two slides • Variational Inference: $\log p(x \mid \theta) \ge \mathbb{E}_{q(z)}[\log p(x, z \mid \theta)] - \mathbb{E}_{q(z)}[\log q(z)]$, the Evidence Lower BOund (ELBO)
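
The bound follows from a short, standard derivation (Jensen's inequality, plus the KL decomposition that shows when the bound is tight):

```latex
\begin{align}
\log p(x \mid \theta)
  &= \log \sum_z q(z)\,\frac{p(x, z \mid \theta)}{q(z)}
   \;\geq\; \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)}
   \;=\; \mathrm{ELBO}(q, \theta), \\
\log p(x \mid \theta)
  &= \mathrm{ELBO}(q, \theta) + \mathrm{KL}\!\left(q(z)\,\big\|\,p(z \mid x, \theta)\right).
\end{align}
```

Maximizing the ELBO over $\theta$ is the model-learning step (slide 17), while maximizing it over $q$ minimizes the KL divergence to the true posterior, i.e., performs posterior inference (slide 18).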

  17. Variational Inference in two slides • Variational Inference: maximizing the ELBO with respect to $\theta$ extracts the waveforms that best explain the data!

  18. Variational Inference in two slides • Variational Inference: maximizing the ELBO with respect to $q$ performs posterior inference!

  19. Connection: Deep Learning • We choose $q$ to be parameterized by a Neural Network
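
A hedged sketch of what such a parameterization could look like for NILM: a small network that maps an observed aggregate measurement to independent Bernoulli on/off probabilities, one per appliance (the architecture and sizes are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

class InferenceNet:
    """q(z | x): maps an observation window to per-appliance on/off probs."""

    def __init__(self, obs_dim, hidden_dim, n_appliances):
        self.W1 = rng.normal(0, 0.1, (obs_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0, 0.1, (hidden_dim, n_appliances))
        self.b2 = np.zeros(n_appliances)

    def probs(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)  # Bernoulli parameters

    def sample(self, x, n_samples):
        p = self.probs(x)
        return (rng.random((n_samples, p.shape[-1])) < p).astype(float)

q_net = InferenceNet(obs_dim=16, hidden_dim=32, n_appliances=5)
x = rng.normal(size=16)           # one observed window (illustrative)
z = q_net.sample(x, n_samples=8)  # 8 candidate appliance-state vectors
```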

  20. Connection: Dynamical Systems • Appliances evolve over time • The temporal dynamics are important (ignoring them invites overfitting) …

  21.–22. Variational Filtering (the variational-filtering objective was shown as equations on these two slides)

  23. Intractable joint distribution • When modeling temporal dependencies, even the joint becomes intractable

  24. Intractable joint distribution • When modeling temporal dependencies, even the joint becomes intractable • Intractable for two reasons!

  25.–26. Reason 1: Intractable joint distribution • When modeling temporal dependencies, even the joint becomes intractable • Remedy: importance sampling and MC integration!

  27. Reason 2: Intractable joint distribution • When modeling temporal dependencies, even the joint becomes intractable • Remedy: importance sampling and MC integration!

  28.–29. Reason 2: Approximating the data likelihood • $p(x) = \sum_z p(x, z) = \mathbb{E}_{q(z)}\!\left[\frac{p(x, z)}{q(z)}\right] \approx \frac{1}{N}\sum_{i=1}^N \frac{p(x, z_i)}{q(z_i)}$ with $z_i \sim q$ • Importance sampling and MC integration!
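
Concretely, with a Bernoulli proposal like the one on slide 19, the data likelihood can be estimated as follows (a minimal sketch with an assumed toy linear-Gaussian NILM-style likelihood; `q_probs`, `w`, and `sigma` are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: x = w @ z + Gaussian noise, z in {0,1}^K appliance states.
K, sigma = 5, 0.5
w = np.array([1.0, 2.0, 0.5, 3.0, 1.5])   # per-appliance power draws (theta)
z_true = np.array([1, 0, 1, 0, 1.0])
x = w @ z_true + rng.normal(0, sigma)

def log_joint(x, z):
    """log p(x, z): uniform prior over states + Gaussian observation model."""
    log_prior = -K * np.log(2.0)
    log_lik = -0.5 * ((x - w @ z) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return log_prior + log_lik

# Proposal q(z): independent Bernoullis (would come from the inference net).
q_probs = np.full(K, 0.5)
N = 10_000
z = (rng.random((N, K)) < q_probs).astype(float)
log_q = (z * np.log(q_probs) + (1 - z) * np.log(1 - q_probs)).sum(axis=1)
log_w = np.array([log_joint(x, zi) for zi in z]) - log_q

# p(x) ~= mean of the importance weights (log-sum-exp for stability).
log_px = np.logaddexp.reduce(log_w) - np.log(N)
print(log_px)
```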

  30.–31. Putting the pieces together (the assembled objective was shown as an equation) • This is tractable!

  32. Are we done? • Sadly no: the gradient estimator with respect to the parameters of $q$ has high variance. However, there is a remedy.

  33. VI: Variance • The score-function (REINFORCE) gradient estimator: $\nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{q_\phi(z)}[f(z)\,\nabla_\phi \log q_\phi(z)]$

  34. VI: Variance • $\nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{q_\phi(z)}[f(z)\,\nabla_\phi \log q_\phi(z)]$ • Unbiased but high variance!

  35. VI: Variance • More generally, if $c$ is independent of $z$: $\mathbb{E}_{q_\phi(z)}[(f(z) - c)\,\nabla_\phi \log q_\phi(z)] = \nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)]$

  36. VI: Variance • More generally, if $c$ is independent of $z$: $\mathbb{E}_{q_\phi(z)}[(f(z) - c)\,\nabla_\phi \log q_\phi(z)] = \nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)]$ • What's an appropriate $c$?
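
The equality on slides 35–36 holds because the score function has zero mean under $q_\phi$, so any $z$-independent baseline $c$ can be subtracted without introducing bias (a standard identity, stated here for a discrete latent space):

```latex
\mathbb{E}_{q_\phi(z)}\!\left[c\,\nabla_\phi \log q_\phi(z)\right]
  = c \sum_z q_\phi(z)\,\frac{\nabla_\phi q_\phi(z)}{q_\phi(z)}
  = c\,\nabla_\phi \sum_z q_\phi(z)
  = c\,\nabla_\phi 1
  = 0 .
```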

  37. VI: Variance Reduction • The inability to compute the data likelihood $p(x)$ exactly is what causes the high variance • Why don't we just use an approximation of $p(x)$ as a control variate?

  38. Variance reduction • Samples are drawn without replacement from Q • This is not a trivial problem!

  39.–40. Variance reduction • Samples are drawn without replacement from Q • In order to reduce the variance of the estimator, we subtract an approximation of the data likelihood (control variate)
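
The payoff of a control variate is easy to demonstrate numerically. Below is a minimal sketch (sampling with replacement and using a generic constant baseline, which is simpler than the without-replacement scheme on these slides) comparing the variance of the plain score-function estimator against a baselined one:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.3                      # Bernoulli parameter of q
f = lambda z: 10.0 + z       # integrand; large offset => large variance

def grad_estimates(n_samples, n_trials, baseline=0.0):
    """Score-function estimates of d/dp E_q[f(z)], optionally baselined."""
    z = (rng.random((n_trials, n_samples)) < p).astype(float)
    score = z / p - (1 - z) / (1 - p)          # d/dp log q(z)
    return ((f(z) - baseline) * score).mean(axis=1)

naive = grad_estimates(100, 5_000)
baselined = grad_estimates(100, 5_000, baseline=10.0 + p)  # c ~= E_q[f]

# Both estimators are unbiased (true gradient = 1.0), but the baselined
# one has far smaller variance.
print(naive.mean(), naive.var())
print(baselined.mean(), baselined.var())
```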

  41.–48. Variational Filtering: algorithmically (an eight-slide build stepping through the algorithm; the individual steps were shown as equations and diagrams)
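
The step-by-step content of those slides is not recoverable from the transcript, but the pieces introduced earlier (a neural proposal, importance-sampled likelihood estimates, and a baselined score-function gradient) suggest the following shape for one filtering/learning step. This is a hedged sketch under those assumptions, not the paper's algorithm; `q_net`, `log_joint`, and the returned quantities are stand-ins:

```python
import numpy as np

def nvif_step(x_t, q_net, log_joint, n_samples=64):
    """One illustrative variational-filtering step at time t (sketch only).

    q_net     - amortized proposal q(z_t | x_t), e.g. the InferenceNet above
    log_joint - callable log p(x_t, z_t) under the current model parameters
    """
    # 1) Sample candidate latent states from the neural proposal.
    z = q_net.sample(x_t, n_samples)                     # (n_samples, K)
    p = q_net.probs(x_t)
    log_q = (z * np.log(p) + (1 - z) * np.log(1 - p)).sum(axis=1)

    # 2) Importance weights and an MC estimate of log p(x_t).
    log_w = np.array([log_joint(x_t, zi) for zi in z]) - log_q
    log_px = np.logaddexp.reduce(log_w) - np.log(n_samples)

    # 3) Baselined score-function signal for the proposal parameters;
    #    the actual gradient update / backprop is elided in this sketch.
    advantage = log_w - log_px                           # control variate
    return log_px, advantage
```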

  49. NVIF: Results

  50. Performance of NVIF
