This study focuses on estimating statistical dependency in high-dimensional data using factorization tests and significance measures. It explores methods to analyze data dependencies without strong modeling assumptions. The research presents experiments and conclusions on measuring significance in high-dimensional datasets.
Estimating Dependency and Significance for High-Dimensional Data
Michael R. Siracusa*, Kinh Tieu*, Alexander T. Ihler§, John W. Fisher*§, Alan S. Willsky§
* Computer Science and Artificial Intelligence Laboratory
§ Laboratory for Information and Decision Systems
Premise: In many high-dimensional data sources, statistical dependency can be well explained by a lower-dimensional latent variable:
• Intuition: The complexity of the problem is driven more by the hypothesis than by the data.
• How do we estimate the dependency?
• From a single realization?
• How do we avoid strong modeling assumptions?
• How do we estimate significance?
Dependency Structure (Graphical Model) / Parameterization (Nuisance)
Dependence: An Example
Asymptotics: Statistical Dependence vs. Model Differences
Independent vs. Some Dependency:
1. Under H0: the data is independent.
2. We don't have the true distributions.
3. We are only given a single realization.
Factorization Test (cont)
• Questions:
• How do we obtain samples under each factorization?
• How do we estimate the KL divergence D(·||·) when x is high-dimensional?
• How do we estimate significance?
Drawing Samples From a Single Realization
• We only have one realization from which to estimate the joint.
• But we can obtain N! sample draws under H0 via permutations.
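A minimal sketch of this permutation idea (my own illustrative code, not the authors' implementation): shuffling one variable's sample order destroys the pairing between sources while preserving both marginals, so any dependency statistic evaluated on shuffled data is a draw from its distribution under the independence hypothesis H0.

```python
import numpy as np

def permutation_draws(x, y, statistic, n_perm=1000, rng=None):
    """Draw `n_perm` samples of `statistic` under H0 (independence)
    by permuting y's sample order: pairing is destroyed, marginals kept."""
    rng = np.random.default_rng(rng)
    null_stats = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(y))
        null_stats[i] = statistic(x, y[perm])
    return null_stats

# Usage with a simple (hypothetical) dependency statistic: |correlation|.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = x + 0.1 * rng.standard_normal(200)          # strongly dependent pair
stat = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
null = permutation_draws(x, y, stat, n_perm=200, rng=1)
```

For dependent data the observed statistic should land far in the tail of these permutation draws, which is exactly what makes the test work from a single realization.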
High-Dimensional Data
From the Data Processing Inequality: dependence measured after deterministic lower-dimensional projections can only be less than or equal to the full-dimensional divergence.
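The slide's equation did not survive extraction; the bound is presumably the standard data-processing inequality for KL divergence, stated here for two sources with deterministic projections f and g:

```latex
D\bigl(p(x,y)\,\big\|\,p(x)\,p(y)\bigr)
\;\ge\;
D\bigl(p(f(x),\,g(y))\,\big\|\,p(f(x))\,p(g(y))\bigr)
```

Maximizing the right-hand side over projections then gives the tightest lower bound on the full dependency that is computable in low dimensions.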
High-Dimensional Data (cont)
Sufficiency: for high-dimensional data, maximize the left side of the bound.
• Gaussian w/ linear projections
• Closed-form solution (eigenvalue problem): Kullback 68
• Nonparametric
• Gradient descent: Ihler and Fisher 03
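In the Gaussian case the optimal linear projections reduce to an eigenvalue problem; canonical correlation analysis is one standard formulation of it. A hedged sketch (the function name and the small regularization constant are my own; this is not the paper's code):

```python
import numpy as np

def top_canonical_pair(X, Y):
    """Leading canonical correlation pair via an eigenvalue problem.
    X, Y: (n, dx) and (n, dy) data matrices. Returns projection
    directions a, b and the canonical correlation rho."""
    n = X.shape[0]
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sxx = X.T @ X / n + 1e-8 * np.eye(X.shape[1])   # tiny ridge for stability
    Syy = Y.T @ Y / n + 1e-8 * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    # Eigenvalues of Sxx^-1 Sxy Syy^-1 Syx are squared canonical correlations.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    i = int(np.argmax(vals.real))
    a = vecs[:, i].real                              # projection for x
    b = np.linalg.solve(Syy, Sxy.T @ a)              # projection for y
    rho = float(np.sqrt(max(vals[i].real, 0.0)))
    return a, b, rho

# Usage: two 2-D sources sharing a latent variable in their first dimension.
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)
X = np.column_stack([z + 0.75 * rng.standard_normal(n), rng.standard_normal(n)])
Y = np.column_stack([z + 0.75 * rng.standard_normal(n), rng.standard_normal(n)])
a, b, rho = top_canonical_pair(X, Y)
```

For Gaussian projections, the resulting 1-D divergence is -1/2 log(1 - rho^2), so maximizing rho maximizes the left side of the bound.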
[Figure: Swiss roll example: 3D data, its PCA 2D projection, and the MaxKL 2D optimization.]
Measuring Significance: p-value
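The permutation draws double as a null distribution for significance: the p-value is the fraction of permutation statistics at least as large as the observed one. A minimal sketch (the add-one smoothing convention is my choice; it keeps the estimate away from an impossible p = 0):

```python
import numpy as np

def permutation_pvalue(observed, null_stats):
    """Fraction of permutation-null statistics >= the observed
    dependency statistic, with add-one smoothing."""
    null_stats = np.asarray(null_stats)
    return (1 + np.sum(null_stats >= observed)) / (1 + len(null_stats))
```

For example, an observed statistic exceeding all 99 permutation draws yields p = 1/100 = 0.01.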
Synthetic Data
High-dimensional observations: a low-dimensional latent variable embedded in a high-dimensional space, plus distracter noise.
Dependency via M: controls the number of dimensions over which the dependency information is uniformly distributed.
D: controls the total dimensionality of our K observations.
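A hypothetical generator in the spirit of this slide (the function name, parameter choices, and exact construction are my assumptions, not the paper's): a shared low-dimensional latent variable drives M of the D dimensions of each of K observed sources, and the remaining dimensions are independent distracter noise.

```python
import numpy as np

def synthetic_sources(n, K=2, D=20, M=4, dependent=True, rng=None):
    """Generate K high-dim sources of n samples each.
    If dependent, a shared 1-D latent z is spread over the first M
    of the D dimensions; all other variation is independent noise."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n, 1))            # shared low-dim latent variable
    sources = []
    for _ in range(K):
        X = rng.standard_normal((n, D))        # distracter noise in high dim
        if dependent:
            W = rng.standard_normal((1, M))    # spread dependency over M dims
            X[:, :M] += z @ W
        sources.append(X)
    return sources

srcs = synthetic_sources(1000, K=3, D=10, M=2, dependent=True, rng=0)
```

Sweeping M and D then probes how concentrated vs. diffuse dependency interacts with total dimensionality.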
Experiments
• 100 trials with samples of dependent data
• 100 trials with samples of independent data
• Each trial gives a statistic and a significance p-value
Conclusions
• We presented a method for estimating statistical dependency across high-dimensional measurements via factorization tests.
• Exploited a bound on lower-dimensional projections.
• We made use of permutations for drawing from the alternate hypothesis given a single realization.
• We also made use of permutations to obtain reliable significance estimates.
• This was done using a small number of samples relative to the dimensionality of the data.
• Finally, we presented some brief analysis of synthetic and real data.
Thank You Questions?
Problem Statement Given N i.i.d. observations for K sources Determine if the K sources are independent or not: • Obtain a dependency measure • Estimate the significance of this measurement
Hypothesis Test
Two Hypotheses: H0 (the sources are independent) vs. H1 (they are dependent).
Assuming we know the distributions, given N i.i.d. observations: [likelihood-ratio statistic not recovered in extraction]
Factorization Test
Two Factorizations: the full joint vs. the product of marginals.
But we don't know the distributions. Our best approximation (like a GLR) plugs in estimated distributions under each factorization.
Notation simplification: [not recovered in extraction]
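One concrete instance of the plug-in statistic, under a Gaussian assumption (a sketch of the idea, not the paper's nonparametric estimator): the estimated divergence between the joint and the product of marginals has the closed form 1/2 log(det(Sxx) det(Syy) / det(S)).

```python
import numpy as np

def gaussian_factorization_stat(X, Y):
    """Plug-in Gaussian estimate of D(p(x,y) || p(x)p(y)):
    0.5 * log( det(Sxx) * det(Syy) / det(S) ), where S is the
    sample covariance of the stacked data. X: (n, dx), Y: (n, dy)."""
    Z = np.column_stack([X, Y])
    S = np.cov(Z, rowvar=False)
    dx = X.shape[1]
    Sxx, Syy = S[:dx, :dx], S[dx:, dx:]
    return 0.5 * (np.log(np.linalg.det(Sxx))
                  + np.log(np.linalg.det(Syy))
                  - np.log(np.linalg.det(S)))
```

Independent data drives the statistic toward zero; dependence inflates it, which is what the permutation null then calibrates.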
Factorization Test (cont)
[Slide diagram: decomposition of the estimated statistic into terms relating the estimated joint and estimated product distributions to the true joint and true independent distributions.]
Applications • What Vision Problems Can We Solve w/ Accurate Measures of Dependency? • Data Association, Correspondence • Feature Selection • Learning Structure • We will specifically discuss: • Correspondence (for multi-camera tracking) • Audio-visual Association
Audio-Visual Association • Useful For: • Speaker Localization • - Help improve Human-Computer Interaction • - Help Source Separation • Automatic Transcription of Archival Video • - Who is speaking? • - Are they seen by the camera?
Hypotheses: Camera X vs. Camera Y
Distributions of Transition Times
[Figure: histograms over transition time.]
Discussion and Future Work • Dependence underlies various vision related problems. • We studied a framework for measuring dependence. • Measure significance (how confident are you) • Make it more robust.
Math (oh no!): the 2-variable case
Outline • Applications: (for computer vision) • Problem Formulation: (Hypothesis Testing) • Computation: (Non-parametric entropy estimation) • Curse of Dimensionality: (Informative Statistics) • Correspondence: (Markov Chain Monte Carlo)
Previous Talks
• Greg: Model dependence between features and class
• Kristen: Model dependence between features and a scene
• Ariadna: Model dependency between intra-class features
• Wanmei: Dependency between protocol signal and voxel response
• Chris: Audio and video dependence with events
• Antonio: Contextual dependence
• Corey: “Inferring Dependencies”