This presentation explores the use of random projections to tackle high-dimensional data problems. By leveraging the Johnson-Lindenstrauss lemma, it shows how to reduce dimension while preserving essential geometric properties such as pairwise distances. Several approaches are examined, including sampling and projection onto random subspaces. Orthogonality of random vectors, distance preservation, and concentration of measure are discussed, showing how random projections maintain the geometry of data in reduced spaces, which is essential for improving algorithm efficiency in machine learning tasks.
Topics in Algorithms 2007 Ramesh Hariharan
Solving High Dimensional Problems • How do we make the problem smaller? • Sampling, Divide and Conquer, what else?
Projections • Can n points in m dimensions be projected to d << m dimensions while maintaining geometry (pairwise distances)? • Johnson-Lindenstrauss: YES, for d ~ log n/ε²; each distance stretches or contracts by only a 1 ± O(ε) factor • So an algorithm with running time f(n,d) now takes f(n, log n), and the results don't change very much (hopefully!)
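As a rough illustration (not from the slides), here is a minimal numpy sketch of this claim: it projects n random points from m dimensions down to d ~ log n/ε² dimensions using a scaled Gaussian matrix (a common variant of the construction developed on the following slides) and reports how much the pairwise distances change. The values of n, m, ε and the constant 8 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 200, 2000, 0.5
d = int(np.ceil(8 * np.log(n) / eps**2))          # d ~ log n / eps^2 (constant 8 is illustrative)

points = rng.standard_normal((n, m))              # n arbitrary points in m dimensions
proj = rng.standard_normal((d, m)) / np.sqrt(d)   # random Gaussian projection, scaled
low = points @ proj.T                             # their images in d dimensions

def pairwise_dists(X):
    # all nC2 pairwise distances, via ||x-y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    iu = np.triu_indices(len(X), k=1)
    return np.sqrt(np.maximum(d2[iu], 0.0))

ratios = pairwise_dists(low) / pairwise_dists(points)
print(d, ratios.min(), ratios.max())              # typically within [1 - eps, 1 + eps]
```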
Which d dimensions? • Any d coordinates? • Random d coordinates? • Random d dimensional subspace
Random Subspaces • How is this defined/chosen computationally? • How do we choose a random line (a 1-d subspace)? • We need to choose m coordinates • Normals to the rescue: choose independent random variables X_1 … X_m, each N(0,1)
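A minimal sketch of this step, assuming nothing beyond the slide: draw m independent N(0,1) coordinates and normalize; the resulting direction is uniformly distributed over all lines through the origin (the value of m and the seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
x = rng.standard_normal(m)     # X_1, ..., X_m, each N(0,1)
u = x / np.linalg.norm(x)      # unit vector along the chosen random line
print(u, np.linalg.norm(u))    # a random direction in R^m, norm 1
```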
Why do Normals work? • Take 2-d: which points on the circle are more likely? • The joint density is e^(−x²/2) dx · e^(−y²/2) dy = e^(−(x²+y²)/2) dx dy • This depends only on the distance from the origin, not on the direction, so every direction on the circle is equally likely
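A quick numerical check of this claim (not from the slides): the angle of a 2-d standard Gaussian point is, up to sampling noise, uniform on the circle; the choice of 8 bins is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
xy = rng.standard_normal((100_000, 2))                         # independent N(0,1) pairs (X, Y)
angles = np.arctan2(xy[:, 1], xy[:, 0])                        # direction of each point, in (-pi, pi]
counts, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi))
print(counts / counts.sum())                                   # each of the 8 sectors gets ~1/8
```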
A random d-dim subspace • How do we extend to d dimensions? • Choose d random vectors: choose independent random variables X_i1 … X_im, each N(0,1), for i = 1..d
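In code, the "random subspace" used here is just a d × m matrix of independent N(0,1) entries, one random vector per row; a small sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 20, 1000
R = rng.standard_normal((d, m))        # row i holds X_i1, ..., X_im
lengths = np.linalg.norm(R, axis=1)    # l_1, ..., l_d, the lengths used on the next slides
print(R.shape, lengths[:3])            # each l_i concentrates around sqrt(m)
```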
Distance preservation • There are nC2 distances • What happens to each after projection? • First consider a single one: take a unit vector along the x axis • Its projected length (normalizing each random vector by its length l_i) is sqrt[(X_11/l_1)² + … + (X_d1/l_d)²]
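A sketch of this single-distance computation, with illustrative d and m: project the unit vector e_1 onto each random direction, divide by that direction's length l_i, and compare the resulting projected length with sqrt(d/m), the value anticipated two slides below.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 50, 10_000
R = rng.standard_normal((d, m))               # d random vectors in R^m
lengths = np.linalg.norm(R, axis=1)           # l_i = length of the i-th random vector
e1 = np.zeros(m)
e1[0] = 1.0                                   # unit vector along the x axis

coords = (R @ e1) / lengths                   # i-th projected coordinate is X_i1 / l_i
proj_len = np.sqrt((coords ** 2).sum())       # sqrt[(X_11/l_1)^2 + ... + (X_d1/l_d)^2]
print(proj_len, np.sqrt(d / m))               # typically close to each other
```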
Orthogonality • Are the random vectors orthogonal? Not exactly • How far away from orthogonal are they? • What is the expected dot product? 0! (by linearity of expectation and independence)
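A small empirical check (illustrative sizes): the dot product of two independent N(0,1)^m vectors has mean 0 and spread about sqrt(m), which is tiny compared with the product of their lengths (about m), so in high dimensions they are nearly, but not exactly, orthogonal.

```python
import numpy as np

rng = np.random.default_rng(5)
m, trials = 1000, 2000
A = rng.standard_normal((trials, m))          # first random vector, one per trial
B = rng.standard_normal((trials, m))          # second, independent random vector
dots = (A * B).sum(axis=1)                    # their dot products
print(dots.mean(), dots.std(), np.sqrt(m))    # mean ~ 0, spread ~ sqrt(m)
```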
Assume Orthogonality • Let's assume orthogonality for the moment • How do we bound the distribution of the projected length sqrt[(X_11/l_1)² + … + (X_d1/l_d)²]? • The expected value of (X_11)² + … + (X_d1)² is d (by linearity of expectation) • The expected value of each l_i² is m (again by linearity of expectation, over the m coordinates) • Roughly speaking, the overall expectation is sqrt(d/m) • So a distance scales by sqrt(d/m) after projection in the "expected" sense; how much does it depart from this value?
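Both expectations are easy to confirm numerically; a sketch with small illustrative sizes (d = 10, m = 200), averaging over many independent draws of the d × m Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
d, m, trials = 10, 200, 2000
R = rng.standard_normal((trials, d, m))            # `trials` independent copies of the d x m matrix
print((R[:, :, 0] ** 2).sum(axis=1).mean(), d)     # mean of (X_11)^2 + ... + (X_d1)^2 is ~ d
print((R[:, 0, :] ** 2).sum(axis=1).mean(), m)     # mean of l_1^2 = (X_11)^2 + ... + (X_1m)^2 is ~ m
```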
What does Expectation give us? • Not much by itself: E(A/B) ≠ E(A)/E(B) • And even if it were equal, the distribution need not be tight around the expectation • How do we establish tightness of a distribution? • Tail bounds for sums of independent random variables: summing many independent terms gives concentration
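A tiny illustration of the first point (my example, not from the slides): take A = 1 always and B uniform on {1, 2}; then E(A/B) = 3/4 while E(A)/E(B) = 2/3.

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.integers(1, 3, size=1_000_000)   # B uniform on {1, 2}; A is identically 1
print((1 / B).mean(), 1 / B.mean())      # ~0.75 vs ~0.667: E(A/B) != E(A)/E(B)
```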
Tail Bounds • For k independent N(0,1) variables, P(|X_1² + … + X_k² − k| > εk) < 2e^(−Θ(ε²k)) • Apply this to sqrt[(X_11/l_1)² + … + (X_d1/l_d)²]: • Each l_i² is within (1 ± ε)m except with probability inverse exponential in ε²m • (X_11)² + … + (X_d1)² is within (1 ± ε)d except with probability inverse exponential in ε²d • So sqrt[(X_11/l_1)² + … + (X_d1/l_d)²] is within (1 ± O(ε)) sqrt(d/m) except with probability inverse exponential in ε²d (by the union bound over these d+1 events)
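A quick empirical look at this tail bound (the values of ε and k are illustrative): the fraction of trials in which the sum of k squared N(0,1) variables deviates from k by more than εk shrinks rapidly, roughly exponentially in ε²k.

```python
import numpy as np

rng = np.random.default_rng(8)
trials, eps = 200_000, 0.2
for k in (10, 50, 200, 800):
    s = rng.chisquare(k, size=trials)              # same distribution as X_1^2 + ... + X_k^2
    print(k, (np.abs(s - k) > eps * k).mean())     # empirical failure probability, shrinking with k
```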
One distance to many distances • So one distance D has length D(1 ± O(ε)) sqrt(d/m) after projection, except with probability inverse exponential in ε²d • How about many distances (could some of them go astray?) • There are nC2 < n² distances • Each fails, i.e., stretches or compresses beyond D(1 ± O(ε)) sqrt(d/m), with probability inverse exponential in ε²d • By the union bound, the probability that some distance fails is at most n² e^(−Θ(ε²d)); choosing d ~ log n/ε² with a large enough constant makes this small (union bound again)
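Putting it together, a sketch of the union-bound conclusion, using the same scaled-Gaussian variant as in the first sketch (the constant c = 8 and the other sizes are illustrative, not from the slides): with d ~ c log n/ε², repeated random projections essentially never distort any of the nC2 pairwise distances beyond the 1 ± ε range.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m, eps, c = 100, 1000, 0.5, 8
d = int(np.ceil(c * np.log(n) / eps**2))

points = rng.standard_normal((n, m))
iu = np.triu_indices(n, k=1)                                  # all nC2 pairs
orig = np.linalg.norm(points[iu[0]] - points[iu[1]], axis=1)  # original distances

failures = 0
for _ in range(20):                                           # 20 independent random projections
    proj = rng.standard_normal((d, m)) / np.sqrt(d)
    low = points @ proj.T
    new = np.linalg.norm(low[iu[0]] - low[iu[1]], axis=1)
    ratios = new / orig
    if ratios.max() > 1 + eps or ratios.min() < 1 - eps:
        failures += 1
print(d, failures, "of 20 projections distorted some pair beyond 1 +/- eps")
```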
Orthogonality • We assumed the random vectors were orthogonal; how do you remove this assumption? • Exercise….