Efficient Sketches for Earth-Mover Distance, with Applications

Efficient Sketches for Earth-Mover Distance, with Applications. David Woodruff, IBM Almaden. Joint work with Alexandr Andoni, Khanh Do Ba, and Piotr Indyk.

Presentation Transcript

Efficient Sketches for Earth-Mover Distance, with Applications

David Woodruff

IBM Almaden

Joint work with Alexandr Andoni, Khanh Do Ba, and Piotr Indyk

(Planar) Earth-Mover Distance
  • For multisets A, B of points in [∆]^2 with |A| = |B| = N,

EMD(A,B) = min over bijections π: A → B of ∑_{a∈A} ||a − π(a)||,

i.e., the minimum cost of a perfect matching between A and B

[Figure: two example point sets whose EMD is 6 + 3√2]
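The definition above can be checked on tiny inputs by brute force. The following is a minimal sketch, not the method from the talk: it assumes Euclidean ground distance (suggested by the √2 in the slide's example) and enumerates all perfect matchings, so it is only usable for a handful of points. The function name `emd` is chosen here for illustration.

```python
from itertools import permutations
from math import dist


def emd(A, B):
    """Minimum total Euclidean cost of a perfect matching between A and B.

    Brute force over all |B|! matchings; intended only for tiny examples.
    """
    if len(A) != len(B):
        raise ValueError("EMD as defined here needs |A| = |B|")
    # Each permutation of B pairs A[i] with perm[i].
    return min(
        sum(dist(a, b) for a, b in zip(A, perm))
        for perm in permutations(B)
    )


A = [(0, 0), (3, 0)]
B = [(0, 4), (3, 4)]
print(emd(A, B))  # matching straight up costs 4 + 4; crossing costs 5 + 5
```

For realistic sizes one would use a polynomial-time min-cost matching solver instead of enumeration.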

Geometric Representation of EMD
  • Map A, B to k-dimensional vectors F(A), F(B)
    • Image space of F “simple,” e.g., k small
    • Can estimate EMD(A,B) from F(A), F(B) via some efficient recovery algorithm E

[Figure: A, B → (F) → F(A), F(B) ∈ R^k → (E) → ≈ EMD(A,B)]

Geometric Representation of EMD: Motivation
  • Visual search and recognition:
    • Approximate nearest neighbor under EMD
      • Reduces to approximate NN under simpler distances
      • Has been applied to fast image search and recognition in large collections of images [Indyk-Thaper’03, Grauman-Darrell’05, Lazebnik-Schmid-Ponce’06]
  • Data streaming computation:
    • Estimating the EMD between two point sets given as a stream
      • Need mapping F to be linear: adding new point a to A translates to adding F(a) to F(A)
      • Important open problem in streaming [“Kanpur List ’06”]
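The linearity requirement in the streaming bullet can be illustrated with a toy example. This is a sketch under stated assumptions: the random sign matrix below is only a stand-in for a linear map F and does not estimate EMD; points are assumed to be 0-indexed grid coordinates; all function names are hypothetical.

```python
import random


def char_vector(points, delta):
    """Characteristic (count) vector of a multiset of points in {0..delta-1}^2."""
    x = [0] * (delta * delta)
    for r, c in points:
        x[r * delta + c] += 1
    return x


def make_linear_map(k, delta, seed=0):
    """A random k x delta^2 sign matrix, used here only as an example of a linear F."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(delta * delta)] for _ in range(k)]


def apply_map(F, x):
    """Compute F x for a matrix F given as a list of rows."""
    return [sum(f * v for f, v in zip(row, x)) for row in F]


def add_point(F, sketch, point, delta):
    """Streaming update: adding point a to A adds the column F(a) to F(A)."""
    r, c = point
    j = r * delta + c
    return [s + row[j] for s, row in zip(sketch, F)]


delta, k = 4, 3
F = make_linear_map(k, delta)
A = [(0, 1), (2, 3), (2, 3)]

sketch = [0] * k
for a in A:  # process the stream one point at a time
    sketch = add_point(F, sketch, a, delta)

# The streamed sketch matches the sketch of the whole multiset.
print(sketch == apply_map(F, char_vector(A, delta)))
```

Because the update touches only one column of F, the stream never needs to store A itself.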
Prior and New Results

Geometric representation of EMD:

Main Theorem

For any ε ∈ (0,1), there exists a distribution over linear mappings F: R^{∆^2} → R^{∆^ε} s.t. for multisets A, B ⊆ [∆]^2 of equal size, we can produce an O(1/ε)-approximation to EMD(A,B) from F(A), F(B) with probability 2/3.
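To get a feel for the dimension reduction in the theorem, the small calculation below compares the input dimension ∆^2 with the target dimension ∆^ε. It is an illustration only: the values of ∆ and ε are arbitrary choices, and any constant or lower-order factors not shown on the slide are ignored.

```python
def sketch_dimensions(delta, eps):
    """Return (input dimension, sketch dimension) as stated in the theorem."""
    input_dim = delta ** 2      # characteristic vectors live in R^{delta^2}
    sketch_dim = delta ** eps   # target space R^{delta^eps}, hidden factors ignored
    return input_dim, sketch_dim


for eps in (0.5, 0.25):
    print(eps, sketch_dimensions(1024, eps))
```

Smaller ε shrinks the sketch but, per the theorem, worsens the approximation factor O(1/ε).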

Implications
  • Streaming:
  • Approximate nearest neighbor:

* N = number of points

* s = number of data points (multisets) to preprocess

α>1 free parameter

Proof Outline
  • Old [Agarwal-Varadarajan’04, Indyk’07]:
    • Extend EMD to EEMD which:
      • Handles sets of unequal size |A| ≤ |B| in a grid of side-length k
      • EEMD(A,B) = min_{S ⊆ B, |S| = |A|} [EMD(A,S) + k·|B \ S|]
      • Is induced by a norm ||·||_EEMD, i.e., EEMD(A,B) = ||χ(A) − χ(B)||_EEMD, where χ(A) ∈ R^{∆^2} is the characteristic vector of A
    • Decomposition of EEMD into weighted sum of small EEMD’s
      • O(1/ε) distortion
  • New:
    • Linear sketching of “sum-norms”

[Figure: EMD over [∆]^2 decomposes into EEMD over [∆^ε]^2 + EEMD over [∆^ε]^2 + … + EEMD over [∆^ε]^2, with ∆^{O(1)} terms]
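The EEMD definition in the outline above can be evaluated directly on tiny inputs. This is a brute-force sketch for illustration, assuming Euclidean ground distance and |A| ≤ |B|; the helper names `emd` and `eemd` are chosen here and are not from the talk.

```python
from itertools import combinations, permutations
from math import dist


def emd(A, B):
    """Brute-force EMD between equal-size point lists (Euclidean distance)."""
    return min(
        sum(dist(a, b) for a, b in zip(A, perm))
        for perm in permutations(B)
    )


def eemd(A, B, k):
    """min over S within B, |S| = |A|, of EMD(A, S) + k * (number of points of B left out)."""
    if len(A) > len(B):
        raise ValueError("this sketch assumes |A| <= |B|")
    unmatched = len(B) - len(A)
    # combinations() picks by position, so repeated points in B are handled.
    best = min(emd(A, list(S)) for S in combinations(B, len(A)))
    return best + k * unmatched


A = [(0, 0)]
B = [(0, 3), (4, 0)]
print(eemd(A, B, k=10))  # match (0,0) to (0,3) at cost 3, pay k for the leftover point
```

When |A| = |B| the penalty term vanishes and `eemd` coincides with `emd`.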

Old Idea [Indyk ’07]

[Figure: EMD over [∆]^2 is written as a sum of EEMD problems over [∆^{1/2}]^2; applying this recursively yields a sum of ∆^{O(1)} EEMD terms over [∆^ε]^2]


Old Idea [Indyk ’07]

Solve EEMD in each of the ∆ cells, each a problem in [∆^{1/2}]^2

[Figure: the [∆]^2 grid for EMD, partitioned into cells]


Old Idea [Indyk ’07]

Solve one additional EEMD problem in [∆^{1/2}]^2

Should also scale edge lengths by ∆^{1/2}

Old Idea [Indyk ’07]
  • Total cost is the sum of the two phases
  • Algorithm outputs a matching, so its cost is at least the EMD cost
  • Indyk shows that if we put a random shift of the [∆^{1/2}]^2 grid on top of the [∆]^2 grid, the algorithm’s cost is at most a constant factor times the true EMD cost
  • Recursive application gives multiple [∆^ε]^2 grids on top of each other, and results in an O(1/ε)-approximation
Main New Technical Theorem

For a normed space X = (R^t, ||·||_X) and M ∈ X^n, denote ||M||_{1,X} = ∑_i ||M_i||_X, i.e.,

||M||_{1,X} = ||M_1||_X + ||M_2||_X + … + ||M_n||_X.

Given C > 0 and λ > 0, if C/λ ≤ ||M||_{1,X} ≤ C, there is a distribution over linear mappings

μ: X^n → X^{(λ log n)^{O(1)}}

such that we can produce an O(1)-approximation to ||M||_{1,X} from μ(M) w.h.p.
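As a concrete reading of the notation, the sum-norm ||M||_{1,X} can be computed exactly when M is stored in full. The sketch below takes X to be ℓ2 by default, which is an assumption made for the example; any norm function can be passed in. It shows only the quantity being approximated, not the sketching map μ.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))


def sum_norm(M, norm=l2_norm):
    """||M||_{1,X}: the sum of the X-norms of the blocks M_1, ..., M_n."""
    return sum(norm(block) for block in M)


M = [[3, 4], [0, 0], [1, 0]]
print(sum_norm(M))                                        # l2 blocks: 5 + 0 + 1
print(sum_norm(M, norm=lambda v: sum(abs(x) for x in v)))  # l1 blocks: 7 + 0 + 1
```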

Proof Outline: Sum of Norms
  • First attempt:
    • Sample (uniformly) a few M_i’s to compute ||M_i||_X
    • Problem: sum could be concentrated in 1 block
  • Second attempt:
    • Sample M_i w/probability proportional to ||M_i||_X [Indyk’07]
    • Problem: how to do online?
    • Techniques from [JW09, MW10]?
      • Need to sample/retrieve blocks, not just individual coordinates

[Figure: blocks M_1, M_2, M_3, …, M_n, where M_2 contains most of the mass]
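The two attempts above can be compared on a toy input in which one block carries most of the mass. This is an offline illustration with hypothetical function names: the sampled indices are passed in explicitly so the behavior is deterministic, and the importance-weighted estimator uses the true total to form its probabilities, which is exactly what is not available online.

```python
def uniform_estimate(norms, sample):
    """First attempt: average the sampled block norms and scale up by n."""
    n = len(norms)
    return n * sum(norms[i] for i in sample) / len(sample)


def importance_estimate(norms, sample):
    """Second attempt: block i is drawn w.p. q_i proportional to its norm,
    and each draw is reweighted by 1/q_i."""
    total = sum(norms)              # needs the answer itself: fine offline only
    q = [v / total for v in norms]
    return sum(norms[i] / q[i] for i in sample) / len(sample)


norms = [1.0, 13.0, 1.0, 1.0]      # block 1 holds most of the mass; true sum is 16
print(uniform_estimate(norms, [0, 2, 3]))   # misses the heavy block entirely
print(uniform_estimate(norms, [1]))         # hits it and overshoots
print(importance_estimate(norms, [1]))      # every draw recovers the true sum
print(importance_estimate(norms, [0]))
```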

Proof Outline: Sum of Norms (cont.)
  • Our approach:
    • Split into exponential levels:
      • Assume ||M||_{1,X} ≤ C
      • S_k = {i ∈ [n] s.t. ||M_i||_X ∈ (T_k, 2T_k]}, T_k = C/2^k
      • Suffices to estimate |S_k| for each level k. How?
    • For each level k, subsample from [n] at a rate such that event E_k (“isolation” of level k) holds with probability proportional to |S_k|
    • Repeat experiment several times, count number of successes

[Figure: the blocks of M = (M_1, M_2, …, M_n) grouped into levels, e.g. S_1 = {M_2}, S_2 = {M_4, M_7}, S_3 = {M_1, M_3, M_8, M_9}, …, S_ℓ = {M_5, M_10, M_n}; M is subsampled and the outcome is tested against E_k (Y/N)]
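The level decomposition can be made concrete when the block norms are known. The sketch below, with hypothetical function names, assigns each block to its level S_k and shows why the level sizes suffice: since every norm in S_k lies in (T_k, 2T_k], the quantity ∑_k T_k·|S_k| is within a factor 2 of the true sum.

```python
def split_into_levels(norms, C):
    """Map level k -> indices i with T_k < norms[i] <= 2*T_k, where T_k = C / 2**k."""
    levels = {}
    for i, v in enumerate(norms):
        if v <= 0:
            continue  # zero blocks belong to no level
        if v > C:
            raise ValueError("each block norm must be at most C")
        k = 1
        while v <= C / 2 ** k:  # descend until T_k < v
            k += 1
        levels.setdefault(k, []).append(i)
    return levels


def level_lower_bound(levels, C):
    """Sum over levels of T_k * |S_k|; the true sum is between this and twice this."""
    return sum((C / 2 ** k) * len(members) for k, members in levels.items())


norms = [5, 3, 4, 1.5, 8]          # true sum 21.5, and we assume C = 32
levels = split_into_levels(norms, C=32)
lb = level_lower_bound(levels, C=32)
print(levels)
print(lb, sum(norms), 2 * lb)
```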

Proof Outline: Event E_k
  • E_k ⇔ “isolation” of level k:
    • Exactly one i ∈ S_k gets subsampled
    • Nothing from S_{k’} for k’ < k
  • Verification of trial success/failure
    • Hash subsampled elements
      • Each cell maintains vector sum of subsampled M_i’s that hash there
    • E_k holds roughly (we “accept”) when:
      • 1 cell has X-norm in (0.9T_k, 2.1T_k]
      • All other cells have X-norm ≤ 0.9T_k
    • Check fails only if:
      • Elements from lighter levels contribute a lot to 1 cell
      • Elements from heavier levels subsampled and collide
    • Both unlikely if hash table big enough
    • Under-estimates |S_k|. If |S_k| > 2^k/polylog(n), gives O(1)-approximation
    • Remark: triangle inequality of norm gives control over impact of collisions

[Figure: a subsample containing M_1, M_4, M_5, M_6, M_9, M_11, M_{n–1}]
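The acceptance rule from the slide is simple enough to state as a predicate on the per-cell norms of one hash table. This is a direct transcription of the two conditions; the function name and the list-of-norms input format are assumptions for the example.

```python
def accepts(cell_norms, T):
    """Accept iff exactly one cell has norm in (0.9T, 2.1T] and all others are <= 0.9T."""
    in_band = sum(1 for v in cell_norms if 0.9 * T < v <= 2.1 * T)
    too_heavy = sum(1 for v in cell_norms if v > 2.1 * T)
    # With exactly one cell in the band and none above it,
    # every remaining cell is at most 0.9T.
    return in_band == 1 and too_heavy == 0


T = 4.0  # acceptance band is (3.6, 8.4]
print(accepts([0.0, 5.0, 1.0, 0.2], T))  # one isolated level-k block
print(accepts([5.0, 6.0, 0.0], T))       # two cells in the band
print(accepts([5.0, 20.0], T))           # a heavier block was subsampled
```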

Sketch and Recovery Algorithm

Sketch:

  • For each level k, create t hash tables
  • For each hash table:
    • Subsample from [n], including each i ∈ [n] w.p. p_k = 2^{-k}
    • Each cell maintains sum of M_i’s that hash to it

Recovery algorithm:

  • For each level k, count number c_k of “accepting” hash tables
  • Return ∑_k T_k · (c_k/t) · (1/p_k)

Properties of the estimator (c_k/t) · (1/p_k) of |S_k|:

  • For every k, the estimator under-estimates |S_k|
  • If |S_k| > 2^k/polylog n, the estimator is Ω(|S_k|)
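The sketch and recovery steps above can be put together in a toy implementation. This is a minimal sketch under several assumptions that are not from the talk: X is taken to be ℓ2, each cell stores its vector explicitly rather than a further sketch of it, the subsampling and hashing are precomputed from a seeded generator, and the parameters (`levels`, `t`, `cells`) are arbitrary illustrative choices with no accuracy guarantee attached. The class name `SumNormSketch` is hypothetical.

```python
import math
import random


def l2(v):
    return math.sqrt(sum(x * x for x in v))


class SumNormSketch:
    def __init__(self, n, dim, C, levels, t, cells, seed=0):
        rng = random.Random(seed)
        self.dim, self.C, self.levels, self.t = dim, C, levels, t
        # plan[k-1][j][i]: cell receiving block i in table j of level k,
        # or None if i is not subsampled there (kept with probability 2^-k).
        self.plan = [[[rng.randrange(cells) if rng.random() < 2.0 ** -k else None
                       for _ in range(n)]
                      for _ in range(t)]
                     for k in range(1, levels + 1)]
        # tables[k-1][j][c]: vector sum of the subsampled blocks hashed to cell c.
        self.tables = [[[[0.0] * dim for _ in range(cells)]
                        for _ in range(t)]
                       for _ in range(levels)]

    def update(self, i, delta):
        """Linear update: add the vector delta to block M_i."""
        for k in range(self.levels):
            for j in range(self.t):
                c = self.plan[k][j][i]
                if c is not None:
                    cell = self.tables[k][j][c]
                    for d in range(self.dim):
                        cell[d] += delta[d]

    def estimate(self, norm=l2):
        """Return sum over k of T_k * (c_k / t) * (1 / p_k)."""
        total = 0.0
        for k in range(1, self.levels + 1):
            T = self.C / 2 ** k
            accepted = 0
            for table in self.tables[k - 1]:
                norms = [norm(cell) for cell in table]
                in_band = sum(1 for v in norms if 0.9 * T < v <= 2.1 * T)
                too_heavy = sum(1 for v in norms if v > 2.1 * T)
                if in_band == 1 and too_heavy == 0:
                    accepted += 1
            total += T * (accepted / self.t) * 2 ** k
        return total


sk = SumNormSketch(n=8, dim=2, C=16.0, levels=4, t=50, cells=16, seed=1)
sk.update(3, [3.0, 4.0])   # block 3 has l2 norm 5
sk.update(6, [0.0, 2.0])   # block 6 has l2 norm 2
print(sk.estimate())       # a randomized estimate of the true sum-norm 7
```

Because `update` only adds vectors into cells, the structure is a linear function of M, which is what allows it to be maintained over a stream.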

EMD Wrapup
  • We achieve a linear embedding of EMD
    • with constant distortion, namely O(1/ε),
    • into a space of strongly sublinear dimension, namely ∆^ε.
  • Open problems:
    • Getting (1+ε)-approximation / proving impossibility
    • Reducing dimension to log^{O(1)} ∆ / proving lower bound
What We Did
  • We showed that in a data stream, one can sketch ||M||_{1,X} = ∑_i ||M_i||_X with space about the space complexity of computing (or sketching) ||·||_X
  • This quantity is known as a cascaded norm, written as L_1(X)
  • Cascaded norms have many applications [CM, JW]
  • Can we generalize this? E.g., what about L_2(X), i.e., (∑_i ||M_i||_X^2)^{1/2}?
Cascaded Norms [JW09]
  • No!
  • L_2(L_1), i.e., (∑_i ||M_i||_1^2)^{1/2}, requires Ω(n^{1/2}) space, where n is the number of different i, but the sketching complexity of L_1 is O(log n)
  • More generally, for p ≥ 1, L_p(L_1), i.e., (∑_i ||M_i||_1^p)^{1/p}, requires Θ(n^{1-1/p}) space
  • So, L_1(X) is very special
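For reference, the cascaded norms named above are easy to compute exactly when M is available in full; the space bounds concern doing this from a small sketch. The following is an illustrative offline computation with a hypothetical function name, treating the rows of M as the blocks M_i.

```python
def cascaded_norm(M, p, q):
    """L_p(L_q) of M: the l_p norm of the vector of row-wise l_q norms."""
    row_norms = [sum(abs(x) ** q for x in row) ** (1.0 / q) for row in M]
    return sum(r ** p for r in row_norms) ** (1.0 / p)


M = [[1, -2], [4, 0]]          # row l1 norms are 3 and 4
print(cascaded_norm(M, 1, 1))  # L_1(L_1): 3 + 4
print(cascaded_norm(M, 2, 1))  # L_2(L_1): sqrt(3^2 + 4^2)
```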