This presentation explores the use of random projections to tackle high-dimensional data problems. By leveraging the Johnson-Lindenstrauss lemma, it shows how to reduce dimension while preserving essential geometric properties such as pairwise distances. Several approaches are examined, including sampling and projection onto random subspaces. Orthogonality of random vectors, distance preservation, and concentration of measure are discussed, showing how random projections maintain the geometry of data in reduced spaces, which is essential for improving algorithm efficiency in machine learning tasks.
Topics in Algorithms 2007 Ramesh Hariharan
Solving High Dimensional Problems • How do we make the problem smaller? • Sampling, Divide and Conquer, what else?
Projections • Can n points in m dimensions be projected to d << m dimensions while maintaining geometry (pairwise distances)? • Johnson-Lindenstrauss: YES, for d ~ log n/ε²; each distance stretches or contracts by only a 1 ± O(ε) factor • So an algorithm with running time f(n,d) now takes f(n, log n), and the results don't change very much (hopefully!)
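As a rough illustration (not from the slides), here is a minimal numpy sketch of this claim: it projects n random points from m dimensions down to d ~ log n/ε² dimensions using a scaled Gaussian matrix (a common variant of the construction developed on the following slides) and reports how much the pairwise distances change. The values of n, m, ε and the constant 8 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 200, 2000, 0.5
d = int(np.ceil(8 * np.log(n) / eps**2))          # d ~ log n / eps^2 (constant 8 is illustrative)

points = rng.standard_normal((n, m))              # n arbitrary points in m dimensions
proj = rng.standard_normal((d, m)) / np.sqrt(d)   # random Gaussian projection, scaled
low = points @ proj.T                             # their images in d dimensions

def pairwise_dists(X):
    # all nC2 pairwise distances, via ||x-y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    iu = np.triu_indices(len(X), k=1)
    return np.sqrt(np.maximum(d2[iu], 0.0))

ratios = pairwise_dists(low) / pairwise_dists(points)
print(d, ratios.min(), ratios.max())              # typically within [1 - eps, 1 + eps]
```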
Which d dimensions? • Any d coordinates? • Random d coordinates? • Random d dimensional subspace
Random Subspaces • How is this defined/chosen computationally? • How do we choose a random line (a 1-d subspace)? • We need to choose m coordinates • Normals to the rescue: choose independent random variables X_1 … X_m, each N(0,1)
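A minimal sketch of this step, assuming nothing beyond the slide: draw m independent N(0,1) coordinates and normalize; the resulting direction is uniformly distributed over all lines through the origin (the value of m and the seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
x = rng.standard_normal(m)     # X_1, ..., X_m, each N(0,1)
u = x / np.linalg.norm(x)      # unit vector along the chosen random line
print(u, np.linalg.norm(u))    # a random direction in R^m, norm 1
```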
Why do Normals work? • Take 2-d: which points on the circle are more likely? • The joint density is e^(−x²/2) dx · e^(−y²/2) dy = e^(−(x²+y²)/2) dx dy • This depends only on the distance from the origin, not on the direction, so every direction on the circle is equally likely
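A quick numerical check of this claim (not from the slides): the angle of a 2-d standard Gaussian point is, up to sampling noise, uniform on the circle; the choice of 8 bins is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
xy = rng.standard_normal((100_000, 2))                         # independent N(0,1) pairs (X, Y)
angles = np.arctan2(xy[:, 1], xy[:, 0])                        # direction of each point, in (-pi, pi]
counts, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi))
print(counts / counts.sum())                                   # each of the 8 sectors gets ~1/8
```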
A random d-dim subspace • How do we extend to d dimensions? • Choose d random vectors: choose independent random variables X_i1 … X_im, each N(0,1), for i = 1..d
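In code, the "random subspace" used here is just a d × m matrix of independent N(0,1) entries, one random vector per row; a small sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 20, 1000
R = rng.standard_normal((d, m))        # row i holds X_i1, ..., X_im
lengths = np.linalg.norm(R, axis=1)    # l_1, ..., l_d, the lengths used on the next slides
print(R.shape, lengths[:3])            # each l_i concentrates around sqrt(m)
```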
Distance preservation • There are nC2 distances • What happens to each after projection? • First consider a single one: take a unit vector along the x axis • Its projected length (normalizing each random vector by its length l_i) is sqrt[(X_11/l_1)² + … + (X_d1/l_d)²]
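A sketch of this single-distance computation, with illustrative d and m: project the unit vector e_1 onto each random direction, divide by that direction's length l_i, and compare the resulting projected length with sqrt(d/m), the value anticipated two slides below.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 50, 10_000
R = rng.standard_normal((d, m))               # d random vectors in R^m
lengths = np.linalg.norm(R, axis=1)           # l_i = length of the i-th random vector
e1 = np.zeros(m)
e1[0] = 1.0                                   # unit vector along the x axis

coords = (R @ e1) / lengths                   # i-th projected coordinate is X_i1 / l_i
proj_len = np.sqrt((coords ** 2).sum())       # sqrt[(X_11/l_1)^2 + ... + (X_d1/l_d)^2]
print(proj_len, np.sqrt(d / m))               # typically close to each other
```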
Orthogonality • Are the random vectors orthogonal? Not exactly • How far away from orthogonal are they? • What is the expected dot product? 0! (by linearity of expectation and independence)
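A small empirical check (illustrative sizes): the dot product of two independent N(0,1)^m vectors has mean 0 and spread about sqrt(m), which is tiny compared with the product of their lengths (about m), so in high dimensions they are nearly, but not exactly, orthogonal.

```python
import numpy as np

rng = np.random.default_rng(5)
m, trials = 1000, 2000
A = rng.standard_normal((trials, m))          # first random vector, one per trial
B = rng.standard_normal((trials, m))          # second, independent random vector
dots = (A * B).sum(axis=1)                    # their dot products
print(dots.mean(), dots.std(), np.sqrt(m))    # mean ~ 0, spread ~ sqrt(m)
```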
Assume Orthogonality • Let's assume orthogonality for the moment • How do we bound the distribution of the projected length sqrt[(X_11/l_1)² + … + (X_d1/l_d)²]? • The expected value of (X_11)² + … + (X_d1)² is d (by linearity of expectation) • The expected value of each l_i² is m (again by linearity of expectation, over the m coordinates) • Roughly speaking, the overall expectation is sqrt(d/m) • So a distance scales by sqrt(d/m) after projection in the "expected" sense; how much does it depart from this value?
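Both expectations are easy to confirm numerically; a sketch with small illustrative sizes (d = 10, m = 200), averaging over many independent draws of the d × m Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
d, m, trials = 10, 200, 2000
R = rng.standard_normal((trials, d, m))            # `trials` independent copies of the d x m matrix
print((R[:, :, 0] ** 2).sum(axis=1).mean(), d)     # mean of (X_11)^2 + ... + (X_d1)^2 is ~ d
print((R[:, 0, :] ** 2).sum(axis=1).mean(), m)     # mean of l_1^2 = (X_11)^2 + ... + (X_1m)^2 is ~ m
```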
What does Expectation give us? • Not much by itself: E(A/B) ≠ E(A)/E(B) • And even if it were equal, the distribution need not be tight around the expectation • How do we establish tightness of a distribution? • Tail bounds for sums of independent random variables: summing many independent terms gives concentration
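A tiny illustration of the first point (my example, not from the slides): take A = 1 always and B uniform on {1, 2}; then E(A/B) = 3/4 while E(A)/E(B) = 2/3.

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.integers(1, 3, size=1_000_000)   # B uniform on {1, 2}; A is identically 1
print((1 / B).mean(), 1 / B.mean())      # ~0.75 vs ~0.667: E(A/B) != E(A)/E(B)
```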
Tail Bounds • For k independent N(0,1) variables, P(|X_1² + … + X_k² − k| > εk) < 2e^(−Θ(ε²k)) • Apply this to sqrt[(X_11/l_1)² + … + (X_d1/l_d)²]: • Each l_i² is within (1 ± ε)m except with probability inverse exponential in ε²m • (X_11)² + … + (X_d1)² is within (1 ± ε)d except with probability inverse exponential in ε²d • So sqrt[(X_11/l_1)² + … + (X_d1/l_d)²] is within (1 ± O(ε)) sqrt(d/m) except with probability inverse exponential in ε²d (by the union bound over these d+1 events)
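A quick empirical look at this tail bound (the values of ε and k are illustrative): the fraction of trials in which the sum of k squared N(0,1) variables deviates from k by more than εk shrinks rapidly, roughly exponentially in ε²k.

```python
import numpy as np

rng = np.random.default_rng(8)
trials, eps = 200_000, 0.2
for k in (10, 50, 200, 800):
    s = rng.chisquare(k, size=trials)              # same distribution as X_1^2 + ... + X_k^2
    print(k, (np.abs(s - k) > eps * k).mean())     # empirical failure probability, shrinking with k
```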
One distance to many distances • So one distance D has length D(1 ± O(ε)) sqrt(d/m) after projection, except with probability inverse exponential in ε²d • How about many distances (could some of them go astray?) • There are nC2 < n² distances • Each fails, i.e., stretches or compresses beyond D(1 ± O(ε)) sqrt(d/m), with probability inverse exponential in ε²d • By the union bound, the probability that some distance fails is at most n² e^(−Θ(ε²d)); choosing d ~ log n/ε² with a large enough constant makes this small (union bound again)
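Putting it together, a sketch of the union-bound conclusion, using the same scaled-Gaussian variant as in the first sketch (the constant c = 8 and the other sizes are illustrative, not from the slides): with d ~ c log n/ε², repeated random projections essentially never distort any of the nC2 pairwise distances beyond the 1 ± ε range.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m, eps, c = 100, 1000, 0.5, 8
d = int(np.ceil(c * np.log(n) / eps**2))

points = rng.standard_normal((n, m))
iu = np.triu_indices(n, k=1)                                  # all nC2 pairs
orig = np.linalg.norm(points[iu[0]] - points[iu[1]], axis=1)  # original distances

failures = 0
for _ in range(20):                                           # 20 independent random projections
    proj = rng.standard_normal((d, m)) / np.sqrt(d)
    low = points @ proj.T
    new = np.linalg.norm(low[iu[0]] - low[iu[1]], axis=1)
    ratios = new / orig
    if ratios.max() > 1 + eps or ratios.min() < 1 - eps:
        failures += 1
print(d, failures, "of 20 projections distorted some pair beyond 1 +/- eps")
```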
Orthogonality • We assumed the random vectors were orthogonal; how do you remove this assumption? • Exercise….