Topics in algorithms 2007
Download
1 / 15

Topics in Algorithms 2007 - PowerPoint PPT Presentation


  • 93 Views
  • Uploaded on

Topics in Algorithms 2007. Ramesh Hariharan. Random Projections. Solving High Dimensional Problems. How do we make the problem smaller? Sampling, Divide and Conquer, what else?. Projections.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Topics in Algorithms 2007' - liberty-goodman


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Topics in algorithms 2007

Topics in Algorithms 2007

Ramesh Hariharan



Solving high dimensional problems
Solving High Dimensional Problems

  • How do we make the problem smaller?

  • Sampling, Divide and Conquer, what else?


Projections
Projections

  • Can n points in m dimensions be projected to d<<m dimensions while maintaining geometry (pairwise distances)?

  • Johnson-Lindenstrauss: YES

    for d ~ log n/ε2 , each distance stretches/contracts by only an O(ε) factor

    So an algorithm with running time f(n,d) now takes f(n,log n) and results don’t change very much (hopefully!)


Which d dimensions
Which d dimensions?

  • Any d coordinates?

  • Random d coordinates?

  • Random d dimensional subspace


Random subspaces
Random Subspaces

  • How is this defined/chosen computationally?

  • How do we choose a random line (1-d subspace)

  • We need to choose m coordinates

  • Normals to the rescue

    Choose independent random variables X1…Xm each N(0,1)


Why do normals work
Why do Normals work?

  • Take 2d: Which points on the circle are more likely?

    e-x2/2 dx X e-y2/2 dy

dx

dy


A random d dim subspace
A random d-dim subspace

  • How do we extend to d dimensions?

  • Choose d random vectors

    Choose independent random variables Xi1…Xim each N(0,1), i=1..d


Distance preservation
Distance preservation

  • There are nC2 distances

  • What happens to each after projection?

  • What happens to one after projection; consider single unit vector along x axis

  • Length of projection sqrt[(X11/l1 )2+ … + (Xd1/ld)2]


Orthogonality
Orthogonality

  • Not exactly

  • The random vectors aren’t orthogonal

  • How far away from orthogonal are they?

  • What is the expected dot product? 0! (by linearity of expectation)


Assume orthogonality
Assume Orthogonality

  • Lets assume orthogonality for the moment

  • How do we determine bounds on the distribution of the projection length

    sqrt[(X11/l1 )2+ … + (Xd1/ld)2]

  • Expected value of [(X11)2+ … + (Xd1)2] is d (by linearity of expectation)

  • Expected value of each li2 is n (by linearity of expectation)

  • Roughly speaking, overall expectation is sqrt(d/n)

  • A distance scales by sqrt(d/n) after projection in the “expected” sense; how much does it depart from this value?


What does expectation give us
What does Expectation give us

  • But E(A/B) != E(A)/E(B)

  • And even if it were, the distribution need not be tight around the expectation

  • How do we determine tightness of a distribution?

  • Tail Bounds for sums of independent random variables; summing gives concentration


Tail bounds
Tail Bounds

  • P(|Σk Xi2– k| > εk) < 2e-Θ(ε2k)

    sqrt[(X11/l1 )2+ … + (Xd1/ld)2]

  • Each li2 is within (1 +/- ε)n with probability inverse exponential in n

  • [(X11)2+ … + (Xd1)2] is within (1 +/- ε)d with probability inverse exponential in ε2d

  • sqrt[(X11/l1 )2+ … + (Xd1/ld)2] is within (1 +/- O(ε)) sqrt(d/n) with probability inverse exponential in ε2d (by the union bound)


One distance to many distances
One distance to many distances

  • So one distance D has length D (1 +/- O(ε)) sqrt(d/n) after projection with probability inverse exponential in ε2d

  • How about many distances (could some of them go astray?)

  • There are nC2 distances

  • Each has inverse exponential in d probability of failure, i.e., stretching/compressing beyond D (1 +/- O(ε)) sqrt(d/n)

  • What is the probability of no failure? Choose d appropriately (union bound again)


Orthogonality1
Orthogonality

  • How do you fix this?

  • Exercise….