# Algorithmic Aspects of Finite Metric Spaces


### Algorithmic Aspects of Finite Metric Spaces

Moses Charikar

Princeton University

(Figure: three points x, y, z illustrating the triangle inequality.)

Metric Space
• A set of points X
• Distance function d : X × X → [0, ∞)
• d(x,y) = 0 iff x = y
• d(x,y) = d(y,x) (symmetry)
• d(x,z) ≤ d(x,y) + d(y,z) (triangle inequality)
• Metric space M(X,d)
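As a quick illustration (not from the slides), a minimal Python check of these axioms for a finite metric given as a distance matrix:

```python
import numpy as np

def is_metric(D, tol=1e-9):
    """Check the metric axioms for an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if np.any(np.abs(np.diag(D)) > tol):          # d(x,x) = 0
        return False
    off = ~np.eye(n, dtype=bool)
    if np.any(D[off] <= tol):                     # d(x,y) > 0 for x != y
        return False
    if np.any(np.abs(D - D.T) > tol):             # symmetry
        return False
    # triangle inequality: d(x,z) <= d(x,y) + d(y,z) for all x, y, z
    for y in range(n):
        if np.any(D > D[:, [y]] + D[[y], :] + tol):
            return False
    return True
```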
Example Metrics: Normed spaces
• x = (x1, x2, …, xd) y = (y1, y2, …, yd)
• ℓp norm: ℓ1, ℓ2 (Euclidean), ℓ∞
• ℓpd : the ℓp norm on Rd
• Hamming cube {0,1}d
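A minimal sketch of these example metrics in NumPy (the vectors here are arbitrary):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, 0.0])
y = np.array([0.5,  1.0, 3.0, 4.0])

l1   = np.sum(np.abs(x - y))           # l_1 distance
l2   = np.sqrt(np.sum((x - y) ** 2))   # l_2 (Euclidean) distance
linf = np.max(np.abs(x - y))           # l_infinity distance

# Hamming distance on {0,1}^d = l_1 distance restricted to the cube
u = np.array([0, 1, 1, 0])
v = np.array([1, 1, 0, 0])
hamming = np.sum(u != v)
print(l1, l2, linf, hamming)
```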
Example Metrics: domain specific
• Shortest path distances on graph
• Symmetric difference on sets
• Edit distance on strings
• Hausdorff distance, Earth Mover Distance on sets of n points
Metric Embeddings
• General idea: Map complex metrics to simple metrics
• Why ? richer algorithmic toolkit for simple metrics
• Simple metrics
• normed spaces ℓp
• low dimensional normed spaces ℓpd
• tree metrics
• Mapping should not change distances much (low distortion)
Low Distortion Embeddings


• Metric spaces (X1, d1) and (X2, d2): an embedding f : X1 → X2 has distortion D if, up to a uniform scaling r > 0, for all x, y ∈ X1: r·d1(x,y) ≤ d2(f(x), f(y)) ≤ D·r·d1(x,y)
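A small helper (illustrative, assuming both spaces are given as distance matrices aligned so that row i of D2 is the image of row i of D1) that computes this distortion:

```python
import numpy as np

def distortion(D1, D2):
    """Distortion of the index-wise correspondence between two n-point
    metrics D1 (domain) and D2 (image): under the optimal scaling, it is
    the max pairwise ratio divided by the min pairwise ratio."""
    iu = np.triu_indices(D1.shape[0], k=1)   # all unordered pairs x != y
    ratios = D2[iu] / D1[iu]                 # d2(f(x),f(y)) / d1(x,y)
    return ratios.max() / ratios.min()
```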

(Figure: two images of the Earth.)

Applications
• High dimensional → low dimensional (dimension reduction)
• Algorithmic efficiency (running time)
• Compact representation (storage space)
• Streaming algorithms
• Specific metrics → normed spaces
• Nearest neighbor search
• Optimization problems
• General metrics → tree metrics
• Optimization problems, online algorithms

Solve problems on very large data sets in one pass using a very small amount of storage

A (very) Brief History: fundamental results
• Metric spaces studied in functional analysis
• Any n-point metric embeds into ℓ∞n with no distortion [Fréchet]
• Any n-point metric embeds into ℓp with distortion log n [Bourgain ’85]
• Dimension reduction for n-point Euclidean metrics with distortion 1+ε [Johnson, Lindenstrauss ’84]
A (very) Brief History: applications in Computer Science
• Optimization problems
• Application to graph partitioning [Linial, London, Rabinovich ‘95][Arora, Rao, Vazirani ’04]
• n point metrics into tree metrics[Bartal ’96 ‘98] [FRT ’03]
• Efficient algorithms
• Dimension reduction
• Nearest neighbor search, Streaming algorithms
Outline

• Embedding theorems for finite metrics
• Metric as data
  • Dimension reduction
  • Streaming data model
  • Compact representation
• Metric as model
  • Finite metrics in optimization: graph partitioning and clustering

Disclaimer
• This is not an attempt at a survey
• Biased by my own interests
• Much more relevant and related work than I can do justice to in limited time.
• Goal: Give glimpse of different applications of finite metric spaces
• Core ideas, no messy details
Disclaimer: Community Bias
• Theoretical viewpoint
• Focus on algorithmic techniques with performance guarantees
• Worst case guarantees

Metric as data
• What is the data ?
• Mathematical representation of objects (e.g. documents, images, customer profiles, queries).
• Sets, vectors, points in Euclidean space, points in a metric space, vertices of a graph.
• Metric is part of data
Johnson Lindenstrauss [JL84]
• n points in Euclidean space (ℓ2 norm) can be mapped down to O((log n)/ε²) dimensions with distortion at most 1+ε.
• Quite simple [JL84, FM88, IM98, AV99, DG99, Ach01]
• Project onto random unit vectors
• projection of (u−v) onto one random vector behaves like a Gaussian scaled by ||u−v||2
• Need log n dimensions for tight concentration bounds
• Even a random {−1,+1} vector works…
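A minimal sketch of such a random projection in NumPy; the constant 8 in the target dimension is illustrative, not the tight bound:

```python
import numpy as np

def jl_embed(points, eps, rng=np.random.default_rng(0)):
    """Johnson-Lindenstrauss: project n points in R^d onto
    k = O(log n / eps^2) random directions."""
    n, d = points.shape
    k = int(np.ceil(8 * np.log(n) / eps**2))   # illustrative constant
    # Gaussian projection; random {-1,+1} entries also work [Ach01]
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return points @ R

pts = np.random.default_rng(1).normal(size=(100, 1000))
low = jl_embed(pts, eps=0.5)                   # 100 points, ~147 dimensions
```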
Dimension reduction for ℓ2
• Two interesting properties:
• Linear mapping
• Oblivious – choice of linear mapping does not depend on point set
• Many applications …
• Making high dimensional problems tractable
• Streaming algorithms
• Learning mixtures of Gaussians [Dasgupta ’99]
• Learning robust concepts [Arriaga,Vempala ’99] [Klivans,Servedio ’04]
Dimension reduction for ℓ1
• [C, Sahai ’02] Linear embeddings are not good for dimension reduction in ℓ1
• There exist n points in ℓ1 in n dimensions such that any linear mapping with distortion D needs n/D² dimensions
Dimension reduction for ℓ1
• [C, Brinkman ’03] Strong lower bounds for dimension reduction in ℓ1
• There exist n points in ℓ1 such that any embedding with constant distortion D needs n^Ω(1/D²) dimensions
• Alternate, simpler proof [Lee, Naor ’03]

Frequency Moments

[Alon,Matias,Szegedy ‘99]

Data stream is sequence of elements in [n]

ni : frequency of element i

Fk =nik: kth frequency moment

F0 = number of distinct elements

F2 = a measure of the skew of the data stream

Goal:

Given a data stream, estimate Fk in one pass and sub-linear space

Estimating F2
• Consider a single counter c and a randomly chosen xi ∈ {+1, −1} for each i in [n]
• On seeing each element i, update c += xi
• c = Σi ni·xi
• Claim: E[c²] = Σi ni² = F2 and Var[c²] ≤ 2(F2)² (4-wise independence suffices)
• Average O(1/ε²) copies of this estimator to get a (1+ε) approximation (sketched below)
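A sketch of this estimator (the AMS sketch); for simplicity it uses fully independent signs where 4-wise independence would suffice, and a plain average where a median-of-means would normally be used:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # universe size [n]
k = 400                                    # number of parallel counters
signs = rng.choice([-1, 1], size=(k, n))   # random +/-1 sign per element
c = np.zeros(k)

def update(i):
    """Process one stream element i: add its sign to every counter."""
    c[:] += signs[:, i]

stream = rng.integers(0, n, size=10_000)
for i in stream:
    update(i)

f2_estimate = np.mean(c ** 2)              # E[c_j^2] = F2; averaging cuts variance
true_f2 = np.sum(np.bincount(stream, minlength=n).astype(float) ** 2)
print(f2_estimate, true_f2)
```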
Differences between data streams
• ni : frequency of element i in stream 1
• mi : frequency of element i in stream 2
• Goal: measure Σi (ni − mi)²
• F2 sketches are additive: Σi ni·xi − Σi mi·xi = Σi (ni − mi)·xi
• Basically, dimension reduction in the ℓ2 norm
• Very useful primitive, e.g. frequent items [C, Chen, Farach-Colton ’02]
Estimate ℓ1 norms ?

[Indyk ’00]

• p-stable distribution: distribution over R such that, for i.i.d. p-stable xi, Σi ni·xi is distributed as (Σi |ni|^p)^(1/p)·X for a p-stable X
• Cauchy distribution c(x) = 1/(π(1+x²)) is 1-stable
• Gaussian distribution is 2-stable
• As before, c = Σi ni·xi
• Cauchy does not have finite expectation!
• Estimate the scale factor by taking a median over independent sketches (sketched below)
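A sketch of the corresponding ℓ1 sketch with Cauchy projections (parameters are arbitrary); the median works because the median of |standard Cauchy| is 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 400
# Cauchy (1-stable) projections: sum_i n_i * x_i ~ (sum_i |n_i|) * Cauchy
X = rng.standard_cauchy(size=(k, n))
c = np.zeros(k)

def update(i, delta=1):
    c[:] += delta * X[:, i]        # handles insertions and deletions alike

stream = rng.integers(0, n, size=5000)
for i in stream:
    update(i)

# Cauchy has no mean, so recover the scale from the median of |c_j|
l1_estimate = np.median(np.abs(c))
true_l1 = np.sum(np.bincount(stream, minlength=n))
print(l1_estimate, true_l1)
```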

Similarity Preserving Hash Functions
• Similarity function sim(x,y)
• Family of hash functions F with a probability distribution such that Prh∈F [h(x) = h(y)] = sim(x,y)
Applications
• Compact representation scheme for estimating similarity
• Approximate nearest neighbor search [Indyk,Motwani ’98] [Kushilevitz,Ostrovsky,Rabani ‘98]
Estimating Set Similarity

[Broder,Manasse,Glassman,Zweig,’97]

[Broder,C,Frieze,Mitzenmacher,’98]

• Collection of subsets of a universe; similarity sim(A,B) = |A ∩ B| / |A ∪ B| (Jaccard)
• Min-wise hashing: for a random permutation π of the universe, Pr[min π(A) = min π(B)] = sim(A,B) (sketched below)
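A minimal min-hash sketch of this estimator (universe size, sets, and number of permutations are arbitrary):

```python
import numpy as np

def minhash_signature(s, perms):
    """Signature of set s under random permutations of the universe."""
    return [min(p[x] for x in s) for p in perms]

rng = np.random.default_rng(0)
U = 1000                                    # universe size
perms = [rng.permutation(U) for _ in range(200)]

A = set(range(0, 600))
B = set(range(300, 900))

sigA = minhash_signature(A, perms)
sigB = minhash_signature(B, perms)
# Pr[min pi(A) = min pi(B)] = |A & B| / |A | B|, so the match rate
# across permutations estimates the Jaccard similarity
est = np.mean([a == b for a, b in zip(sigA, sigB)])
true = len(A & B) / len(A | B)
print(est, true)
```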
Existence of SPH schemes [C ’02]
• sim(x,y) admits an SPH scheme if there exists a family of hash functions F with a probability distribution such that Prh∈F [h(x) = h(y)] = sim(x,y)

Theorem: If sim(x,y) admits an SPH scheme then 1-sim(x,y) satisfies triangle inequality; embeds into ℓ1

• Rounding procedures for LPs and SDPs yield similarity and distance preserving hashing schemes.
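One concrete scheme of this kind is random hyperplane hashing from [C ’02], where Pr[h(x) = h(y)] = 1 − θ(x,y)/π for the angle θ between x and y; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 500
R = rng.normal(size=(k, d))                # k random hyperplane normals

def simhash(x):
    """h_r(x) = sign(r . x), returned as a bit vector of k hashes."""
    return R @ x >= 0

x = rng.normal(size=d); x /= np.linalg.norm(x)
y = rng.normal(size=d); y /= np.linalg.norm(y)

# fraction of agreeing hashes vs. the exact 1 - angle(x,y)/pi
agree = np.mean(simhash(x) == simhash(y))
angle = np.arccos(np.clip(x @ y, -1, 1))
print(agree, 1 - angle / np.pi)
```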
Earth Mover Distance (EMD)

LP rounding algorithms for an optimization problem (metric labelling) yield a log n approximate estimator for EMD on n points.

Implies that EMD embeds into ℓ1 with distortion log n

(Figure: point sets P and Q with a flow between them realizing EMD(P,Q).)
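For two equal-size point sets with unit weights, EMD(P,Q) reduces to a minimum-cost perfect matching; an illustrative sketch using SciPy (the points here are random):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
P = rng.random((8, 2))                    # two sets of n points in the plane
Q = rng.random((8, 2))

cost = cdist(P, Q)                        # pairwise Euclidean distances
rows, cols = linear_sum_assignment(cost)  # min-cost perfect matching
emd = cost[rows, cols].sum()              # sometimes normalized by n
print(emd)
```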


Graph partitioning problems
• Given graph, partition into U,V
• Maximum cut maximize |E(U,V)|
• Sparsest cut: minimize |E(U,V)| / (|U|·|V|)


Correlation clustering

[Cohen,Richman,’02][Bansal,Blum,Chawla,’02]

(Figure, example courtesy Shuchi Chawla: a coreference example with the mentions “Mr. Rumsfeld”, “his”, “The secretary”, “he”; pairs are marked Similar (+) or Dissimilar (−).)

(Figure: a partition (U,V) and its cut metric: distance 1 across the cut, 0 within each side.)

Graph partitioning as metric problem
• Partitioning is equivalent to finding appropriate {0,1} metric
• Objective function linear in metric
• Find best {0,1} metric

• Relaxation: optimize over arbitrary metrics instead of {0,1} cut metrics (see the LP below)
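For concreteness, the metric relaxation of sparsest cut can be written as the standard LP (stated here as a reminder, not verbatim from the slides):

minimize    Σ(u,v)∈E d(u,v)
subject to  Σu<v d(u,v) = 1
            d(u,w) ≤ d(u,v) + d(v,w)   for all u, v, w
            d(u,v) ≥ 0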

Metric relaxation approaches
• Max Cut [Goemans,Williamson ’94]
• map vertices to points on unit sphere (SDP)
• exploit geometry to get good solution (random hyperplane cut, sketched below)
• Sparsest Cut [Linial,London,Rabinovich ’95]
• LP gives best metric; need ℓ1 metric
• [Bourgain ’85] embeds any metric into ℓ1 with distortion log n
• Existential theorem can be made algorithmic
• log n approximation
• recent SDP-based √(log n) approximation [Arora,Rao,Vazirani ’04]
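A sketch of the random hyperplane rounding step; the SDP solve itself is omitted, and `vecs` stands in for the unit vectors an SDP solver would return:

```python
import numpy as np

def hyperplane_cut(vectors, rng=np.random.default_rng(0)):
    """Round SDP vectors to a cut: split by the sign of a random projection."""
    r = rng.normal(size=vectors.shape[1])   # random hyperplane normal
    side = vectors @ r >= 0
    U = np.flatnonzero(side)
    V = np.flatnonzero(~side)
    return U, V

# one unit vector per vertex (random here, standing in for SDP output)
vecs = np.random.default_rng(1).normal(size=(10, 5))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(hyperplane_cut(vecs))
```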
Metric relaxation approaches
• Correlation clustering [C,Guruswami,Wirth,’03] [Emanuel,Fiat,’03] [Immorlica,Karger,’03]
• Find best [0,1] metric from similarity/dissimilarity data via LP
• Use metric to guide clustering
• close points in same cluster
• distant points in different clusters
• “Learning” best metric ?
• Note: In many cases, LP/SDP can be eliminated to yield efficient algorithms

Some connections to learning
• Dimension reduction in ℓ2 :
• Learning mixtures of Gaussians [Dasgupta ’99]: random projections make skewed Gaussians more spherical, making learning easier
• Learning with large margin [Arriaga,Vempala ’99] [Klivans,Servedio ’04]: random projections preserve the margin; large margin → few dimensions
• Kernel methods for SVMs
• mappings to ℓ2
Ongoing developments
• Notion of intrinsic dimensionality of a metric space [Gupta,Krauthgamer,Lee ’03] [Krauthgamer,Lee,Mendel,Naor ’04]
• Doubling dimension: How many balls of radius R needed to cover ball of radius 2R ?
• Complexity measure of metric space
• natural parameter for embeddings
• Open: can every subset of ℓ2 with constant doubling dimension be embedded into ℓ2 with O(1) dimensions and O(1) distortion?
• Not true for ℓ1
• related to learning low-dimensional manifolds, PCA, MDS, LLE, Isomap
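An illustrative brute-force estimate of the doubling constant of a finite metric (greedy covering, so an upper bound rather than the exact constant):

```python
import numpy as np

def doubling_constant(D):
    """Estimate the smallest k such that every ball of radius 2R is
    covered by k balls of radius R, via a greedy cover over all
    centers x and candidate radii."""
    n = D.shape[0]
    worst = 1
    for x in range(n):
        for R in np.unique(D[x]) / 2:
            if R <= 0:
                continue
            ball = np.flatnonzero(D[x] <= 2 * R)    # points of B(x, 2R)
            uncovered = set(ball.tolist())
            k = 0
            while uncovered:                        # greedily place R-balls
                c = uncovered.pop()
                uncovered -= set(np.flatnonzero(D[c] <= R).tolist())
                k += 1
            worst = max(worst, k)
    return worst
```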
Some things I didn’t mention
• Approximating general metrics via tree metrics
• modified notion of distortion
• useful for approximation, online algorithms
• Many mathematically appealing questions
• Embeddings between normed spaces
• Spectral methods for approximating matrices (SVD, LSI)
• PCA, MDS, LLE, Isomap
Conclusions
• Whirlwind tour of finite metrics
• Rich algorithmic toolkit for finite metric spaces
• Synergy between Computer Science and Mathematics
• Exciting area of active research
• ranging from practical applications to deep theoretical questions
• Many more applications to be discovered