
MURI Update 7/26/12




  1. MURI Update 7/26/12 UC Berkeley Prof. Trevor Darrell (UCB) Prof. Michael Jordan (UCB) Prof. Brian Kulis (former UCB postdoc, now Asst. Prof., OSU) Dr. Stefanie Jegelka (UCB postdoc)

  2. Recent Effort • NPB (nonparametric Bayesian) inference of visual structures: objects, trajectories • Develop richer representations (appearance, shape) • Efficient implementation in constrained environments • Distributed and decentralized variants

  3. NPB Trajectory Models • Extend NPB models to trajectory domains: find structure in motion trajectories; identify anomalies • Considering Bluegrass data and possibly ARGUS track data via LLNL collaboration • HDP and hard clustering variants • Decentralized / distributed implementation

  4. New Algorithms for Hard Clustering via Bayesian Nonparametrics [Kulis and Jordan, ICML 2012] • Diagram: Mixture of Gaussians → (fix covariances, take limit) → k-means; Mixture of Gaussians → (make Bayesian) → DP mixture → (take limit) → ?? • Generalized to the hard HDP as well • Kulis presentation

  5. Small Variance Asymptotics, Bayesian Nonparametrics, and k-means • Brian Kulis • Ohio State University • Joint work with Michael Jordan (Berkeley), Ke Jiang (OSU), and Tamara Broderick (Berkeley)

  6. Generalizing k-means [Kulis and Jordan, ICML 2012] • Diagram: Mixture of Gaussians → (fix covariances, take limit) → k-means; Mixture of Gaussians → (make Bayesian) → DP mixture → (take limit) → ????

  7. Why Should We Care? • For “k-means people” • Bayesian techniques permit great flexibility • Extensions to multiple data sets would not be obvious without this connection • For Bayesians • Hard clustering methods scale better • Connections to graph cuts and spectral methods • k-means just works

  8. Gaussian Mixture Models • No closed form for maximizing the likelihood • Typically resort to the EM algorithm

  9. EM Algorithm • E-step • M-step (standard forms below)
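The slide's equations did not survive extraction; a standard statement for a Gaussian mixture with means mu_c, mixing weights pi_c, and shared covariance sigma^2 I is:

$$\text{E-step:}\quad \gamma_{ic} = \frac{\pi_c \exp\!\left(-\|x_i - \mu_c\|^2 / 2\sigma^2\right)}{\sum_{j} \pi_j \exp\!\left(-\|x_i - \mu_j\|^2 / 2\sigma^2\right)}$$

$$\text{M-step:}\quad \mu_c = \frac{\sum_i \gamma_{ic}\, x_i}{\sum_i \gamma_{ic}}, \qquad \pi_c = \frac{1}{n}\sum_i \gamma_{ic}$$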

  10. Fix Covariances, Take the Limit: Mixture of Gaussians → k-means • In the limit as sigma goes to 0, the E-step responsibility is 1 if centroid c is the closest and 0 otherwise
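A worked statement of the limit described above (for fixed mixing weights, the exponential term dominates as sigma shrinks):

$$\gamma_{ic} \;\xrightarrow{\;\sigma \to 0\;}\; \begin{cases} 1 & \text{if } c = \arg\min_j \|x_i - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}$$

so the E-step becomes the hard k-means assignment and the M-step becomes the centroid update.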

  11. Linear-Gaussian Latent Feature Models • For a mixture of Gaussians: latent variables (one cluster indicator per point), data, likelihood
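The slide's formulas are not in the transcript; in the standard linear-Gaussian latent feature model, with latent matrix Z, feature means A, and isotropic noise, the likelihood takes the form

$$p(X \mid Z, A) \;\propto\; \exp\!\left(-\frac{1}{2\sigma^2}\,\|X - ZA\|_F^2\right),$$

which specializes to the mixture of Gaussians when each row of Z is a one-hot cluster indicator.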

  12. Log-Likelihood Asymptotics

  13. Other Linear-Gaussian Models • Let Z have a continuous distribution: (essentially) probabilistic PCA • As sigma goes to 0, we get standard PCA

  14. The Polya Urn • Imagine an urn with theta balls for each of k colors • Pick a ball from the urn, replace the ball and add another ball of that color to the urn • Repeat n times • Induces a distribution over Z
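A minimal simulation of the urn, as a sketch (the function name and signature are illustrative, not from the slides):

```python
import random

def polya_urn(n, k, theta):
    """Draw n colors from a Polya urn that starts with theta balls of each of k colors."""
    counts = [theta] * k
    draws = []
    for _ in range(n):
        c = random.choices(range(k), weights=counts)[0]  # draw prop. to current counts
        counts[c] += 1  # replace the ball and add another of the drawn color
        draws.append(c)
    return draws
```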

  15. The Dirichlet-Multinomial Distribution • (If theta is an integer, the gamma functions reduce to factorials.)
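The distribution itself did not survive extraction; the urn above induces the symmetric Dirichlet-multinomial, which for color counts n_1, ..., n_k over n draws is

$$p(z_1, \dots, z_n) \;=\; \frac{\Gamma(k\theta)}{\Gamma(k\theta + n)} \prod_{c=1}^{k} \frac{\Gamma(\theta + n_c)}{\Gamma(\theta)}.$$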

  16. Nothing interesting happens unless: In this case, we obtain: Small-Variance Asymptotics

  17. The Chinese Restaurant Process • Customers sequentially enter the restaurant • First customer sits at the first table • Subsequent customers sit at an occupied table with probability proportional to the number of occupants • Start a new table with probability proportional to theta
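A minimal simulation of the seating process (function name and signature are illustrative):

```python
import random

def chinese_restaurant_process(n, theta):
    """Seat n customers; returns the table index chosen by each customer."""
    tables = []                     # occupancy count per table
    seating = []
    for _ in range(n):
        weights = tables + [theta]  # existing tables, plus weight theta for a new one
        t = random.choices(range(len(weights)), weights=weights)[0]
        if t == len(tables):
            tables.append(1)        # start a new table
        else:
            tables[t] += 1
        seating.append(t)
    return seating
```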

  18. The Chinese Restaurant Process • The exchangeable partition probability function (EPPF) • Selecting theta as before yields the asymptotic penalty below
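The formulas are not in the transcript; the standard CRP EPPF for a partition with K blocks of sizes n_1, ..., n_K is

$$p(\Pi) \;=\; \theta^{K}\, \frac{\Gamma(\theta)}{\Gamma(\theta + n)} \prod_{k=1}^{K} \Gamma(n_k),$$

and with the choice theta = exp(-lambda / (2 sigma^2)) used by Kulis and Jordan, the log-probability is dominated by a -lambda K / (2 sigma^2) term, which after the 2 sigma^2 scaling of the small-variance argument becomes a penalty of lambda per cluster.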

  19. Algorithmic Perspective: Gibbs Sampling • Suppose we want to sample from p(x), where x = (x_1, ..., x_d) • Repeatedly sample each coordinate from its conditional given the rest: x_i ~ p(x_i | x_{-i})

  20. Collapsed Gibbs Sampling for CRP/DP Mixtures • Want to sample from the posterior over cluster assignments • Need the conditional assignment probabilities (for existing clusters and for a new cluster) to do Gibbs sampling

  21. Asymptotics of the Gibbs Sampler • Now we would like to see what happens when sigma goes to 0 • Need one additional thing: the concentration parameter must be a function of sigma

  22. Asymptotics of the Gibbs Sampler • Assignment probabilities for existing clusters vs. starting a new cluster • In the limit, the sampler becomes the deterministic rule below
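A statement of the limiting rule, paraphrasing Kulis and Jordan (2012): as sigma goes to 0, the Gibbs sampler assigns each point to its nearest existing mean unless every squared distance exceeds lambda, in which case it opens a new cluster:

$$z_i \;=\; \begin{cases} \arg\min_c \|x_i - \mu_c\|^2 & \text{if } \min_c \|x_i - \mu_c\|^2 \le \lambda \\ \text{new cluster} & \text{otherwise} \end{cases}$$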

  23. DP-means
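A minimal sketch of the DP-means algorithm (my own paraphrase of the procedure in Kulis and Jordan, 2012; names are illustrative):

```python
import numpy as np

def dp_means(X, lam, max_iters=100):
    """DP-means: k-means-style alternation that opens a new cluster when a
    point's squared distance to every existing mean exceeds lam."""
    means = [X.mean(axis=0)]                      # start with one global cluster
    for _ in range(max_iters):
        # Assignment step: nearest mean, or a new cluster if min distance > lam
        z = np.empty(len(X), dtype=int)
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - mu) ** 2) for mu in means])
            if d2.min() > lam:
                means.append(x.copy())            # open a new cluster at x
                z[i] = len(means) - 1
            else:
                z[i] = int(d2.argmin())
        # Update step: recompute means; drop clusters that lost all their points
        keep = [c for c in range(len(means)) if np.any(z == c)]
        z = np.array([keep.index(c) for c in z])
        new_means = [X[z == j].mean(axis=0) for j in range(len(keep))]
        converged = len(new_means) == len(means) and all(
            np.allclose(a, b) for a, b in zip(means, new_means))
        means = new_means
        if converged:
            break
    return np.array(means), z
```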

  24. Underlying Objective Function Theorem: The DP-means algorithm monotonically minimizes this objective until local convergence.
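The objective referenced by the theorem is, from Kulis and Jordan (2012), the k-means objective plus a penalty of lambda per cluster:

$$\min_{k,\;\{\ell_c\}}\; \sum_{c=1}^{k} \sum_{x \in \ell_c} \|x - \mu_c\|^2 \;+\; \lambda\, k, \qquad \mu_c = \frac{1}{|\ell_c|} \sum_{x \in \ell_c} x.$$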

  25. 3 Gaussians

  26. Clustering with Multiple Data Sets Want to cluster each data set, but also want to share cluster structure Use the Hierarchical Dirichlet Process (HDP)!

  27. The Hard Gaussian HDP: Objective Theorem: The Hard HDP algorithm monotonically minimizes this objective until local convergence.
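The objective itself is not in the transcript; paraphrasing Kulis and Jordan (2012), the hard Gaussian HDP objective sums squared distances to the associated global means across data sets and penalizes both local and global cluster counts:

$$\sum_{j=1}^{D} \sum_{c} \sum_{x \in \ell_{jc}} \|x - \mu_c\|^2 \;+\; \lambda_{\ell}\,(\text{number of local clusters}) \;+\; \lambda_{g}\,(\text{number of global clusters}).$$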

  28. Exponential Families • What happens when we replace the Gaussian likelihood with an arbitrary exponential family distribution? • Do asymptotics where the mean is fixed but the covariance goes to 0

  29. Exponential Families • Use the standard exponential-family conjugate prior • Utilize Bregman divergences (definition below) • Asymptotics of the Gibbs sampler require Laplace's method on the marginal likelihood • End up with the same result as before, with a Bregman divergence replacing squared Euclidean distance
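The standard definition, for a strictly convex, differentiable function phi:

$$D_\phi(x, y) \;=\; \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle,$$

with phi(x) = ||x||^2 recovering squared Euclidean distance; the argument relies on the bijection between exponential families and Bregman divergences.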

  30. Illustration: Hard Topic Models

  31. Overlapping Clusters / Binary Feature Models • So far, we have considered single-assignment models • What if each point can be assigned to multiple clusters?

  32. Overlapping Clusters • Table contrasting single-assignment and multi-assignment models at the non-Bayesian, Bayesian, and Bayesian nonparametric levels • Can perform analogous small-variance asymptotics using the above multi-assignment distributions

  33. Indian Buffet Process • First customer samples a Poisson(theta) number of dishes • Subsequent customer i samples each existing dish with probability equal to the fraction of previous customers who sampled that dish • Also samples a Poisson(theta / i) number of new dishes (sketch below)
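A minimal simulation of the process, using the standard IBP rates (names are illustrative):

```python
import numpy as np

def indian_buffet_process(n, theta, seed=None):
    """Sample a binary customers-by-dishes matrix Z from the IBP."""
    rng = np.random.default_rng(seed)
    counts = []                                  # customers so far per dish
    rows = []
    for i in range(1, n + 1):
        row = [rng.random() < m / i for m in counts]  # revisit existing dishes
        for k, taken in enumerate(row):
            if taken:
                counts[k] += 1
        new = rng.poisson(theta / i)             # Poisson(theta / i) new dishes
        counts.extend([1] * new)
        rows.append(row + [True] * new)
    Z = np.zeros((n, len(counts)), dtype=int)    # pad to a rectangular matrix
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z
```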

  34. Indian Buffet Process: Small-Variance Asymptotics • The IBP probability is written in terms of the number of points possessing each feature c and the number of new dishes sampled by each customer i

  35. Indian Buffet Process: Small-Variance Asymptotics • (Reduces to the DP-means objective for single assignments)
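The limiting objective is not in the transcript; in the linear-Gaussian model above, the small-variance limit yields (paraphrasing the MAD-Bayes line of work by Broderick, Kulis, and Jordan) a penalized binary matrix factorization:

$$\min_{Z \in \{0,1\}^{n \times K},\; A,\; K}\; \|X - ZA\|_F^2 \;+\; \lambda\, K,$$

which reduces to the DP-means objective when each row of Z contains a single 1.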

  36. Spectral Clustering Connections • Standard spectral relaxation • Write the (kernel) k-means objective as a trace maximization, max_Y tr(Y^T K Y), over normalized cluster-indicator matrices Y • Relax Y to be an arbitrary orthogonal matrix • Take the matrix of the top k eigenvectors of K

  37. Spectral Relaxation for DP-means • How do we extend this? • Write the penalized objective as a trace maximization, max_Y tr(Y^T (K - lambda I) Y) • Relax Y to be an arbitrary orthogonal matrix • Take the matrix of eigenvectors of K whose eigenvalues are greater than lambda (sketch below)
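A minimal sketch of the relaxed solution described on the slide (helper name is illustrative):

```python
import numpy as np

def dp_means_spectral_relaxation(K, lam):
    """Relaxed DP-means solution: keep the eigenvectors of the (symmetric)
    kernel matrix K whose eigenvalues exceed the per-cluster penalty lam."""
    evals, evecs = np.linalg.eigh(K)
    keep = evals > lam
    return evecs[:, keep], evals[keep]
```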

  38. Graph Clustering • Can further take advantage of connections between k-means and graph clustering • Generalize hard DP objective to weighted and kernel forms • Special cases lead to graph clustering objectives that penalize the number of clusters in the graph

  39. Conclusions and Open Questions • Focus was on obtaining non-probabilistic models from probabilistic ones • Small-variance asymptotics for Bayesian nonparametrics yields models which regularize / penalize • Number of clusters • Number of features • Number of topics • Also yields algorithmic insights

  40. Conclusions and Open Questions • Spectral or semidefinite relaxation for the hard HDP? • Better algorithms? • Local Search • Multilevel Methods • Split/Merge • Incorporate ideas from other inference schemes • Applications? • Other models?

  41. Thanks!

  42. 2) Decentralized models • Distributed Hard Topic Models • Jegelka presentation

  43. Decentralized k-means Algorithms • Data is distributed across nodes • Clusters are shared globally • Alternate cluster assignment and center updates • Key questions: how much communication, and where to store the centers?

  44. Decentralized k-means • Local copies, gossip [Datta et al.] • Local assignment & update • Sharing & averaging with neighbor(s) (sketch below)
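A minimal sketch of one gossip round, assuming synchronized rounds, a fixed k, and that each node holds a local copy of all k means (function and signature are illustrative, not from the slides):

```python
import numpy as np

def gossip_kmeans_round(X_local, means, neighbor_means):
    """One round of gossip-style decentralized k-means on a single node:
    (1) assign local data to the node's copy of the means, (2) recompute
    local means, (3) average with a neighbor's copy."""
    means = np.asarray(means)
    k = len(means)
    d2 = ((X_local[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    z = d2.argmin(axis=1)                              # (1) local assignment
    local = np.array([X_local[z == c].mean(axis=0) if np.any(z == c) else means[c]
                      for c in range(k)])              # (2) local update
    return 0.5 * (local + np.asarray(neighbor_means))  # (3) gossip averaging
```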

  45. Decentralized k-means • Alternative to local copies and gossip [Datta et al.]: keep a single copy of each mean • Summaries & pruning for restricted comparisons [Papatreou et al.]

  46. New questions • Decentralized cluster creation? • Sequential vs. parallel • Naturally hierarchical • Exploit structure to reduce communication • Partial sharing, e.g. locality • Integration into model? • Generalization to IBP, PYP, …

  47. Decentralized Clustering • Observe trajectories in a scene • Cluster locations by local traffic behavior: HDP → anomalies, traffic prediction, … • Clusters are not omnipresent: partial sharing

  48. 3) Trajectory Data • Scalable Topic Models for Trajectories • Apply current results on scalable topic modeling to problems in vision • find structure in motion trajectories • identify anomalies • Considering Bluegrass data and possibly ARGUS track data via LLNL collaboration

  49. Trajectory Video

  50. Experimental Results • Figures: track fragment vocabulary (colored by speed); road network
