
Probabilistic Reasoning with Permutations


Presentation Transcript


  1. Thesis Proposal 11/24/08 Probabilistic Reasoning with Permutations Jonathan Huang Thesis Committee: Carlos Guestrin (chair), John Lafferty, Drew Bagnell, Leonidas Guibas (Stanford University), Alex Smola (Yahoo! Research) Joint work with Carlos Guestrin, Leonidas Guibas, Xiaoye Jiang

  2. Cards • It takes just seven ordinary, imperfect shuffles to mix a deck of cards thoroughly [Bayer and Diaconis, ‘92]

  3. Ranking & Voting • [Malandro and Rockmore, ‘07],[Lebanon and Mao, ‘07]

  4. Matching • Matching/Correspondence/Identity Management/Data Association is a challenging problem in vision/robotics • Need to deal with: occlusions, noisy sensors, anonymized sensing modalities, etc… • [Shin et al, ‘03]

  5. Identity Management [Shin et al., ‘03] • (Figure: Tracks 1 and 2, with identity mixing at Tracks 1, 2) • Where is Donald Duck?

  6. Identity Management • (Figure: Tracks 1–4, with identity mixing events at Tracks 1,2, then Tracks 1,3, then Tracks 1,4) • Where is ?

  7. Reasoning with Permutations • We model uncertainty in identity management with distributions over permutations of identities to tracks • Example: [1 3 2 4] means “Alice is at Track 1, and Bob is at Track 3, and Cathy is at Track 2, and David is at Track 4”, with probability 1/10 • (Figure: table of track permutations and the probability of each)

  8. Storage Complexity • There are n! permutations! • Graphical models are not effective due to mutual exclusivity constraints (“Alice and Bob cannot both be at Track 1 simultaneously”) • One such constraint for each pair of identities

  9. Thesis Proposal • In this thesis, we tackle: • Compact representations • Efficient inference • Scaling to huge problems in practice

  10. Fourier approximations • Distributions on the real line: smooth distributions can be approximated by weighted sums of low-frequency sinusoid basis functions • Commonly studied as Fourier analysis… • Idea: do the same for permutations • (Figure: a smooth density and its low-frequency approximation)
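
To make the real-line analogy concrete, here is a minimal sketch (my own illustration, not code from the proposal) of bandlimiting a smooth density with numpy's FFT, keeping only a handful of low-frequency coefficients:

```python
import numpy as np

# Minimal sketch: approximate a smooth density on a grid by keeping only its
# lowest-frequency Fourier coefficients (the permutation analogue keeps only
# low-order marginal information).
x = np.linspace(-5, 5, 512)
density = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)   # a smooth bump

coeffs = np.fft.rfft(density)      # coefficients ordered from low to high frequency
coeffs[8:] = 0.0                   # bandlimit: keep only the 8 lowest frequencies
approx = np.fft.irfft(coeffs, n=len(x))

print("max approximation error:", np.max(np.abs(approx - density)))
```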

  11. 1st order summaries • An idea: for each (identity j, track i) pair, store the marginal probability that j maps to i • Example: “David is at Track 4 with probability 1/10 + 1/20 + 1/5 = 7/20”

  12. 1st order summaries • Summarize a distribution using a matrix of 1st order marginals • Requires storing only n^2 numbers! • Example (rows: Tracks 1–4; columns: Identities A–D):

               A       B       C       D
   Track 1    1/5     1/2     3/10     0
   Track 2    3/10     0      1/2     1/5
   Track 3    3/10    1/5     1/20    9/20
   Track 4    1/5     3/10    3/20    7/20

  • “Bob is at Track 2 with zero probability” • “Cathy is at Track 3 with probability 1/20”
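
For concreteness, a brute-force sketch (my own illustration, not the thesis's code) of how such a matrix of 1st order marginals is obtained from an explicit distribution over permutations; here a permutation is a tuple mapping identities to (0-indexed) tracks, and only the 1/10 entry is taken from the slides:

```python
import numpy as np

def first_order_marginals(dist, n):
    """dist maps a permutation tuple sigma (identity -> track) to its probability.
    Returns M with M[i, j] = P(identity j is at track i)."""
    M = np.zeros((n, n))
    for sigma, prob in dist.items():
        for identity, track in enumerate(sigma):
            M[track, identity] += prob
    return M

# Toy distribution over permutations of identities (Alice, Bob, Cathy, David):
dist = {(0, 2, 1, 3): 0.10,   # "[1 3 2 4]": Alice@1, Bob@3, Cathy@2, David@4
        (1, 0, 2, 3): 0.05,   # remaining entries are made up for illustration
        (3, 2, 0, 1): 0.85}
print(first_order_marginals(dist, 4))   # rows: tracks, columns: identities
```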

  13. The problem with 1st order • What 1st order summaries can capture: • P(Alice is at Track 1) = 3/5 • P(Bob is at Track 2) = 1/2 • Now suppose: • Tracks 1 and 2 are close, • Alice and Bob are not next to each other • Then P({Alice, Bob} occupy Tracks {1,2}) = 0 • 1st order summaries cannot capture higher order dependencies!

  14. 2nd order summaries • Idea #2: store marginal probabilities that ordered pairs of identities (k,l) map to pairs of tracks (i,j) • Example: “Cathy is at Track 3 and David is at Track 4 with zero probability”

  15. 2nd order summaries • Can also store summaries for ordered pairs • A 2nd order summary requires O(n^4) storage • Example (rows: ordered pairs of Tracks; columns: ordered pairs of Identities):

                  (A,B)    (B,A)    (A,C)    (C,A)
   Tracks (1,2)    1/6     1/12     1/8      1/12
   Tracks (2,1)    1/12    1/6      1/12     1/8
   Tracks (1,3)    1/12    1/12     1/8      1/24
   Tracks (3,1)    1/12    1/12     1/24     1/8

  • “Bob is at Track 1 and Alice is at Track 3 with probability 1/12”
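
Continuing the same toy setup (again my own illustration), the ordered-pair marginals can be tabulated the same way, which also makes the O(n^4) storage cost explicit:

```python
def second_order_marginals(dist, n):
    """M2[((i, j), (k, l))] = P(identity k is at track i AND identity l is at track j),
    over ordered pairs of distinct identities (k, l) and their tracks (i, j)."""
    M2 = {}
    for sigma, prob in dist.items():
        for k in range(n):
            for l in range(n):
                if k != l:
                    key = ((sigma[k], sigma[l]), (k, l))
                    M2[key] = M2.get(key, 0.0) + prob
    return M2   # O(n^4) entries: one per (track pair, identity pair)

# e.g., with `dist` from the previous sketch:
# second_order_marginals(dist, 4)[((2, 0), (1, 0))]  # P(Bob at Track 3 AND Alice at Track 1)
```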

  16. Et cetera… • And so forth… we can define: • 3rd-order marginals • 4th-order marginals • … • nth-order marginals • (which recover the original distribution but require n! numbers) • By the way, the 0th-order marginal is the normalization constant (which equals 1) • Fundamental trade-off: we can capture higher-order dependencies at the cost of storing more numbers

  17. The Fourier interpretation • Marginal summaries are connected to Fourier analysis! • Used for multi-object tracking [Kondor et al, ’07] • Simple marginals are “low-frequency”: intuitively, • 1st order marginals are the lowest frequency responses (except for the DC component) • 2nd order marginals contain higher frequencies than 1st order marginals • 3rd order marginals contain still higher frequency information • Note that higher-order marginals can contain lower-order information

  18. Fourier coefficient matrices • Fourier coefficients on permutations are given as a collection of square matrices ordered by “frequency”: 0th order, 1st order, 2nd order, …, nth order • Marginals are constructed by conjugating Fourier coefficient matrices by a (pre-computed) constant matrix • (Figure: the first two Fourier matrices map to the 1st-order marginals)
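
For reference, the standard definitions behind this slide, written out in LaTeX (my notation; the thesis may use slightly different conventions): the Fourier transform of a distribution on S_n is one coefficient matrix per irreducible representation, and the 1st-order marginal matrix is recovered by conjugating the two lowest-frequency coefficient matrices with a fixed, pre-computed matrix.

```latex
% Fourier transform of f on S_n: one square matrix per irreducible rho_lambda
\hat{f}_{\rho_\lambda} \;=\; \sum_{\sigma \in S_n} f(\sigma)\, \rho_\lambda(\sigma)

% The matrix of 1st-order marginals is the transform at the first-order
% permutation representation \tau_1, related to the two lowest-frequency
% irreducible blocks by a fixed change-of-basis matrix C_1:
\hat{f}_{\tau_1} \;=\; C_1 \Big[\, \hat{f}_{\rho_{(n)}} \oplus \hat{f}_{\rho_{(n-1,1)}} \,\Big]\, C_1^{-1}
```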

  19. Thesis Proposal

  20. Hidden Markov Model Inference • Latent permutations with identity observations at each timestep • Mixing model – e.g., “Tracks 2 and 3 swapped identities with probability ½” • Observation model – e.g., “see green blob at Track 3” • Problem statement: for each timestep, find posterior marginals conditioned on all past observations • Need to rewrite all inference operations completely in the Fourier domain

  21. Hidden Markov model inference • Two basic inference operations for HMMs: Prediction/Rollup and Conditioning • How can we do these operations without enumerating all n! permutations?
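
Written out in standard HMM notation (my reconstruction of the two operations, not verbatim from the slide):

```latex
% Prediction/Rollup: push the current posterior through the transition model
P(\sigma_{t+1} \mid z_{1:t}) \;=\; \sum_{\sigma_t} P(\sigma_{t+1} \mid \sigma_t)\, P(\sigma_t \mid z_{1:t})

% Conditioning: fold in the new observation with Bayes rule
P(\sigma_{t+1} \mid z_{1:t+1}) \;\propto\; P(z_{t+1} \mid \sigma_{t+1})\, P(\sigma_{t+1} \mid z_{1:t})
```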

  22. Random walk transition model • We assume that σt+1 is generated by the rule: • Draw π ~ Q(π) • Set σt+1 = π σt • For example, Q([2 1 3 4]) = ½ means that Tracks 1 and 2 swapped identities with probability ½
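
A tiny simulation of this generative rule (my own illustration; permutations are 0-indexed tuples mapping identities to tracks):

```python
import random

def compose(pi, sigma):
    """(pi o sigma): identity k, currently at track sigma[k], moves to track pi[sigma[k]]."""
    return tuple(pi[s] for s in sigma)

# Mixing model Q from the slide: with prob 1/2 Tracks 1 and 2 swap identities,
# otherwise nothing happens.
Q = {(1, 0, 2, 3): 0.5,    # "[2 1 3 4]" in the slide's 1-indexed notation
     (0, 1, 2, 3): 0.5}

def step(sigma_t):
    pi = random.choices(list(Q.keys()), weights=list(Q.values()))[0]
    return compose(pi, sigma_t)

sigma = (0, 1, 2, 3)       # Alice, Bob, Cathy, David start at Tracks 1-4
print(step(sigma))
```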

  23. Prediction/Rollup • Inputs: • Prior distribution P(σt) • Mixing model Q(π) • Prediction/Rollup can be written as a convolution: Q * Pt
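
In symbols (my reconstruction, consistent with the random-walk rule σt+1 = π σt above):

```latex
P(\sigma_{t+1})
  \;=\; \sum_{\sigma_t} Q\!\left(\sigma_{t+1}\,\sigma_t^{-1}\right) P_t(\sigma_t)
  \;=\; \left[\, Q \ast P_t \,\right](\sigma_{t+1})
```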

  24. Fourier Domain Prediction/Rollup • Convolutions are pointwise (matrix) products in the Fourier domain • Prediction/Rollup does not increase the representation complexity!
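
The underlying identity is the convolution theorem for functions on a group (stated here in my notation; the order of the matrix product depends on the transform convention):

```latex
\widehat{\left(Q \ast P_t\right)}_{\rho}
  \;=\; \hat{Q}_{\rho}\; \widehat{P_t}_{\rho}
  \qquad \text{independently at each frequency } \rho
```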

  25. Hidden Markov model inference • Two basic inference operations for HMMs: Prediction/Rollup and Conditioning • How can we do these operations without enumerating all n! permutations?

  26. Conditioning • Bayes rule is a pointwise product of the likelihood function and the prior distribution: posterior ∝ likelihood × prior • Example likelihood function: • P(z = green | σ(Alice) = Track 1) = 9/10 • (“Prob. we see green at Track 1 given Alice is at Track 1 is 9/10”)
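
In symbols, with likelihood L(σ) = P(z | σ) (my reconstruction of the slide's equation):

```latex
P(\sigma \mid z)
  \;=\; \frac{L(\sigma)\, P(\sigma)}{\sum_{\sigma'} L(\sigma')\, P(\sigma')}
  \qquad \text{(a pointwise product, followed by normalization)}
```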

  27. Conditioning • Conditioning increases the representation complexity! • Example: suppose we start with 1st order marginals of the prior distribution: • P(Alice is at Track 1 or Track 2) = .9 • P(Bob is at Track 1 or Track 2) = .9 • … • Then we make a 1st order observation: • “Cathy is at Track 1 or Track 2 with probability 1” • (This means that Alice and Bob cannot both be at Tracks 1 and 2!) • P({Alice, Bob} occupy Tracks {1,2}) = 0 • Need to store 2nd-order probabilities after conditioning!

  28. Kronecker Conditioning • Pointwise products correspond to convolution in the Fourier domain [Willsky, ‘78] • (except with Kronecker products in our case) • Our algorithm handles any prior and any likelihood, generalizing the previous FFT-based conditioning method [Kondor et al., ‘07] • Conditioning can increase representation complexity!
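
Schematically, and with normalization constants omitted (see the thesis for the exact formula), the Fourier coefficients of the pointwise product L·P are assembled from Kronecker products of the coefficient matrices, block-diagonalized by pre-computed Clebsch-Gordan matrices C_{λμ}:

```latex
C_{\lambda\mu}^{\top}\,
  \big[\, \hat{L}_{\rho_\lambda} \otimes \hat{P}_{\rho_\mu} \,\big]\,
  C_{\lambda\mu}
  \;=\; \bigoplus_{\nu}\; \big(\text{blocks contributing to } \widehat{L \cdot P}_{\rho_\nu}\big)
% summed over all pairs of frequencies (lambda, mu), then normalized
```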

  29. Bandlimiting • After conditioning, we discard “high-frequency” coefficients • Equivalently, we maintain low-order marginals • (Figure: low-frequency coefficient matrices are kept, high-frequency ones are discarded)

  30. Error analysis • Fourier domain Prediction/Rollup is exact • Kronecker Conditioning introduces error • But… if enough coefficients are maintained, then Kronecker Conditioning is exact at a subset of low-frequency terms! • Theorem: If Kronecker Conditioning is called using rth order terms of the prior and sth order terms of the likelihood, then the (|r-s|)th order marginals of the posterior can be reconstructed without error.

  31. Kronecker Conditioning experiments • (Figure: L1 error of Kronecker Conditioning, n=8, as a function of diffuseness (number of mixing events), measured at the 1st order marginals and averaged over 10 runs, for 1st order, 2nd order unordered, and 2nd order ordered approximations) • Keeping 3rd order marginals is enough to ensure zero error for 1st order marginals

  32. Dealing with bandlimiting errors • Consecutive conditioning steps can propagate errors • (sometimes causing approximate marginals to be negative!) • Our solution: project to the relaxed marginal polytope (the space of Fourier coefficients corresponding to nonnegative marginal probabilities) • Projection can be formulated as an efficient Quadratic Program in the Fourier domain
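
As a stand-in for that step, here is a sketch of the projection at first order only, done directly on the marginal matrix rather than in the Fourier domain as the thesis proposes: at first order the relaxed marginal polytope is just the set of doubly stochastic matrices, so the projection is a small QP. The cvxpy-based helper below is my own, purely illustrative:

```python
import numpy as np
import cvxpy as cp

def project_first_order(M):
    """Project a possibly-invalid matrix of 1st-order marginals onto the doubly
    stochastic matrices (nonnegative entries, every row and column sums to 1)."""
    n = M.shape[0]
    X = cp.Variable((n, n), nonneg=True)
    constraints = [cp.sum(X, axis=0) == 1, cp.sum(X, axis=1) == 1]
    cp.Problem(cp.Minimize(cp.sum_squares(X - M)), constraints).solve()
    return X.value

# Example: bandlimited conditioning has pushed one marginal slightly negative.
M = np.array([[0.55, 0.50, -0.05],
              [0.25, 0.30,  0.45],
              [0.20, 0.20,  0.60]])
print(project_first_order(M))
```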

  33. Algorithm Summary • Input: Fourier coefficients of the mixing and observation models • Initialize prior Fourier coefficient matrices • For each timestep t = 1,2,…,T: • Prediction/Rollup: for all coefficient matrices, multiply by the corresponding mixing model coefficients • Conditioning: for all pairs of coefficient matrices, compute Kronecker products and reproject to the orthogonal Fourier basis • Drop high frequency coefficients • Project to the relaxed marginal polytope using a Quadratic Program • Return marginal probabilities for all timesteps

  34. Mixing and Observation Models • The Fourier-theoretic framework can handle a variety of probabilistic models • But… we need to be able to efficiently compute Fourier coefficients for mixing/observation models... • Useful family of function “primitives”: we can efficiently Fourier transform the indicator function of subgroups of the form Sk ⊆ Sn • Fourier coefficient matrices of Sk-indicators are diagonal, with all nonzero entries equal to k!

  35. 1st order observation model for tracking • If Identity i is on Track j, the probability of zj (the color histogram of the blob at Track j) is Gaussian with mean equal to the appearance model (color histogram) of Identity i • If we make one such observation per track, P(z|σ) can be represented exactly by 1st-order Fourier parameters
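
A minimal sketch of that per-track likelihood (my own illustration; the appearance models, noise scale, and histogram dimension are all assumed):

```python
import numpy as np

def track_likelihoods(z, appearance, noise=0.1):
    """L[j, i] = P(z_j | identity i is on track j), up to a constant, under an
    isotropic Gaussian centered at identity i's appearance (color histogram) model.
    z: (n_tracks, d) observed histograms; appearance: (n_identities, d) models."""
    sq_dists = np.sum((z[:, None, :] - appearance[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / noise ** 2)
```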

  36. Sk-Indicator Primitives • Most mixing/observation models can be written as (sparse) linear combinations of Sk-indicators! • The indicator function of Sk x Sn-k is the convolution of the indicators of Sk and Sn-k

  37. Simulated data drawn from HMM • (Figure: projection to the marginal polytope versus no projection, n=6; L1 error at the 1st order marginals, averaged over 250 timesteps, for 1st, 2nd, and 3rd order approximations, with and without projection, compared against approximation by a uniform distribution)

  38. Tracking with a camera network • Camera network data: • 8 cameras, multi-view, occlusion effects • 11 individuals in lab • Identity observations obtained from color histograms • Mixing events declared when people walk close to each other • (Figure: % of tracks correctly identified, comparing time-independent classification, inference without projection, inference with projection, and an omniscient tracker) • Projections are crucial in practice!

  39. Scaling • For fixed representation depth, Fourier domain inference is polytime • But complexity can still be bad… • (Figure: running time in seconds versus n, for n = 4 to 8, comparing exact inference with 1st, 2nd, and 3rd order approximations) • Can we exploit some other kind of structure in practice?

  40. Thesis Proposal

  41. Adaptive Identity Management • In practice, it is often sufficient to reason over smaller subgroups of people independently • Groups join when tracks from two groups mix • Groups split when an observation allows us to reason over smaller groups independently • Idea: adaptively factor the problem into subgroups, allowing for higher order representations for smaller subgroups • “This is Bob” (and Bob was originally in the Blue group)

  42. Problems • Suppose the joint distribution h factors as a product of distributions f and g, where f is a distribution over tracks {1,…,p} and g is a distribution over tracks {p+1,…,n} • (Join problem) What are the Fourier coefficients of the joint h given the Fourier coefficients of the factors f and g? • (Split problem) What are the Fourier coefficients of the factors f and g given the Fourier coefficients of the joint h?

  43. First-order Independence • Let f be a distribution on permutations of {1,…,p}, and g be a distribution on permutations of {p+1,…,n} • Join problem for 1st-order marginals: given the 1st-order marginals of f and g, what does the matrix of 1st-order marginals of h look like? • Answer: a block-diagonal matrix, with the marginals of f and g as blocks and zeroes elsewhere • e.g., “Cathy is at Track 4 with probability 1/20”, but “Cathy is at Track 10 with zero probability”
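
At first order, then, the join is just block-diagonal stacking of the factors' marginal matrices; a short sketch (my own illustration, with made-up values):

```python
import numpy as np
from scipy.linalg import block_diag

# 1st-order marginals of f (tracks/identities 1..p) and g (tracks/identities p+1..n)
Mf = np.array([[0.7, 0.3],
               [0.3, 0.7]])
Mg = np.array([[0.50, 0.25, 0.25],
               [0.25, 0.50, 0.25],
               [0.25, 0.25, 0.50]])

# Joint marginals: zero probability of an identity from one group
# showing up on a track belonging to the other group.
print(block_diag(Mf, Mg))
```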

  44. Joining • Joining for higher-order coefficients gives similar block-diagonal structure • Also get Kronecker product structure for each block

  45. Problems • Suppose the joint distribution h factors as a product of distributions f and g, where f is a distribution over tracks {1,…,p} and g is a distribution over tracks {p+1,…,n} • (Join problem) What are the Fourier coefficients of the joint h given the Fourier coefficients of the factors f and g? • (Split problem) What are the Fourier coefficients of the factors f and g given the Fourier coefficients of the joint h?

  46. Splitting • We would like to “invert” the Join process • Consider recovering the 2nd Fourier block: we need to recover A and B from their Kronecker product – only possible to do up to a scaling factor • Our approach: search for blocks of a special form • Theorem: these blocks always exist! (and are efficient to find)
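
For intuition only: one generic way to see why A and B are recoverable from A ⊗ B up to a scale factor is the rearrangement trick (Van Loan and Pitsianis), under which the Kronecker product becomes a rank-1 matrix. This is not the block-search method the slide refers to, just an illustrative sketch:

```python
import numpy as np

def kron_factor(M, p, q):
    """Recover A (p x p) and B (q x q) from M = kron(A, B), up to scale: the
    rearranged matrix R, whose (i*p + j)-th row is vec(A[i, j] * B), has rank 1."""
    R = np.zeros((p * p, q * q))
    for i in range(p):
        for j in range(p):
            R[i * p + j] = M[i * q:(i + 1) * q, j * q:(j + 1) * q].ravel()
    U, S, Vt = np.linalg.svd(R)
    A = np.sqrt(S[0]) * U[:, 0].reshape(p, p)
    B = np.sqrt(S[0]) * Vt[0].reshape(q, q)
    return A, B

A, B = np.random.rand(2, 2), np.random.rand(3, 3)
A_hat, B_hat = kron_factor(np.kron(A, B), 2, 3)
print(np.allclose(np.kron(A_hat, B_hat), np.kron(A, B)))   # True: the scale factors cancel
```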

  47. Marginal Preservation • Now we know how to join/split given the Fourier transform of the input distribution • Problem: in practice, we never have the entire set of Fourier coefficients! • Marginal preservation guarantee: • Theorem: Given mth-order marginals for independent factors, we exactly recover the mth-order marginals of the joint distribution. • Conversely, we get a similar guarantee for splitting • (Usually we get some higher order information too)

  48. Detecting Independence • To adaptively split large distributions, we need to be able to detect independent subsets • Recall what first-order independence looks like: a block structure in the matrix of marginals, given an appropriate ordering on identities and tracks • In practice, we get unordered identities and tracks… • We can use (bi)clustering on the matrix of marginals to discover an appropriate ordering!
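
One off-the-shelf way to carry out this biclustering step (purely illustrative; the proposal does not commit to a particular algorithm here) is spectral co-clustering of the first-order marginal matrix:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# 1st-order marginals with rows = tracks, columns = identities, in an arbitrary
# (unordered) arrangement; (near-)zero blocks indicate independent subgroups.
M = np.full((6, 6), 1e-3)
M[np.ix_([0, 2, 5], [1, 3, 4])] = np.random.rand(3, 3) + 0.1
M[np.ix_([1, 3, 4], [0, 2, 5])] = np.random.rand(3, 3) + 0.1

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(M)
print("track groups:   ", model.row_labels_)
print("identity groups:", model.column_labels_)
```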

  49. First-order independence • The first-order condition is insufficient: e.g., “Alice guards Bob”, “Alice is in the yellow team”, “Bob is in the white team” • Tracking the yellow and white teams independently ignores the fact that Alice and Bob are always next to each other!

  50. Handling Near-Independence • We only detect at first-order, but: • We can measure departure from independence at higher orders • And even when higher order independence does not hold, we have the following result: • Theorem: If first-order independence holds, we always obtain exact marginals of each subset of tracks. • (we get a marginal distribution for the white team and a marginal distribution for the yellow team) • When first-order independence does not hold, we obtain approximate marginals.
