Fitting Tree Metrics: Hierarchical Clustering and Phylogeny

Nir Ailon, Moses Charikar

Princeton University

- Represented by matrix $D$
- Complete information

[Figure: complete weighted graph on vertices u, v, w, x, y; edge weights are pairwise dissimilarities, e.g. $D(u,v) = 1$ (big number = high dissimilarity)]

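- Preserve dissimilarity info
- Tree metric $d_T$ close to $D$

[Figure: tree T with leaves u, v, w, x, y; the tree distance $d_T(u,v)$ is marked]

Minimize:

$\mathrm{cost}(T) = \|D - d_T\|_p$

where $D$ and $d_T$ are $\binom{n}{2}$-dimensional real vectors.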
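The objective can be evaluated directly once both metrics are stored as symmetric matrices. The sketch below is only an illustration: the function name `fit_cost` and the nested-list representation are assumptions, not part of the slides.

```python
import math


def fit_cost(D, d, p=1):
    """Return ||D - d||_p taken over the n-choose-2 unordered pairs.

    D and d are symmetric n x n lists of lists (hypothetical layout).
    Pass p=math.inf for the max norm.
    """
    n = len(D)
    # One entry per unordered pair {i, j}, matching the n-choose-2 vector view.
    diffs = [abs(D[i][j] - d[i][j]) for i in range(n) for j in range(i + 1, n)]
    if not diffs:
        return 0.0
    if p == math.inf:
        return float(max(diffs))
    return sum(x ** p for x in diffs) ** (1.0 / p)
```

Each pair is counted once, so the value matches the vector norm rather than the norm of the full matrix.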

- Evolutionary biology
- Molecular phylogeny: dissimilarity information from DNA

- Gene expression analysis
- Historical linguistics
- ...

(Hierarchical clustering)

[Figure: ultrametric tree T with M = 3 levels and leaves y, u, v, x, w; for example $d_T(v,x) = 1$ and $d_T(u,w) = 3$]

Equivalently: the two largest distances in every triangle are equal.
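The triangle characterization gives a simple test for whether a matrix is an ultrametric. A minimal sketch, assuming the metric is a symmetric list of lists (the name `is_ultrametric` is chosen here for illustration):

```python
from itertools import combinations


def is_ultrametric(D):
    """Check that in every triangle the two largest distances are equal."""
    n = len(D)
    for a, b, c in combinations(range(n), 3):
        # Sort the three side lengths; the top two must coincide.
        _, middle, largest = sorted((D[a][b], D[a][c], D[b][c]))
        if middle != largest:
            return False
    return True
```

The check examines every triple, so it takes cubic time in the number of points.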

- Fitting ultrametrics under $\|\cdot\|_\infty$ is in P [FKW95]
- Fitting trees under $\|\cdot\|_\infty$ is APX-hard [ABFPT99]
- Fitting ultrametrics under $\|\cdot\|_1$ is APX-hard [W93]; under $\|\cdot\|_2$ it is NP-hard
- An $f(n)$-approximation algorithm for ultrametrics implies a $3f(n)$-approximation algorithm for trees (under any $\|\cdot\|_p$) [ABFPT99]

- $O(\min\{n^{1/p}, (k \log n)^{1/p}\})$-approximation for trees under $\|\cdot\|_p$ [HKM05]
- Fitting ultrametrics for $M = 2$ under $\|\cdot\|_1$: Correlation Clustering [BBC02, CGW03, ACN05..]

- . . .

- $(M+1)$-approximation for fitting level-$M$ ultrametrics under $\|\cdot\|_1$
- $O((\log n \log\log n)^{1/p})$-approximation for general weighted trees under $\|\cdot\|_p$

- Given ultrametric $D \in \{1..M\}^{n \times n}$
- Pick pivot vertex u
- Recursively solve for neighbor-classes

[Figure: pivot u with its neighbor-classes at distances 1, 2 and 3; levels M=2 and M=3 marked]

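For $D \in \{1..M\}^{n \times n}$ that is not necessarily an ultrametric:

- Same algorithm!
- Pick pivot vertex u (uniformly at random)
- Freeze distances incident to u
- Fix inter-class distances
- Fix intra-class distances
- Recurse...

[Figure: worked example of one pivot step; each changed distance is marked X, for a total cost contribution of 4]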
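The pivot steps can be sketched in code. The slides do not spell out the two fixing rules, so this sketch assumes one natural reading: a pair in different neighbor-classes gets the larger of its two pivot distances, and a pair inside class i is capped at i before recursing. The function name and the matrix layout are also assumptions.

```python
import random


def fit_ultrametric_by_pivot(D, rng=None):
    """Build an ultrametric from D by recursive random pivoting (sketch).

    D is a symmetric n x n list of lists with off-diagonal entries in {1..M}.
    """
    rng = rng or random.Random()
    n = len(D)
    d = [[0] * n for _ in range(n)]
    if n <= 1:
        return d

    def solve(vertices, cap):
        # cap bounds every distance inside the current class.
        if len(vertices) <= 1:
            return
        u = rng.choice(vertices)
        classes = {}
        for v in vertices:
            if v == u:
                continue
            level = min(D[u][v], cap)
            d[u][v] = d[v][u] = level  # freeze distances incident to u
            classes.setdefault(level, []).append(v)
        levels = sorted(classes)
        # Inter-class pairs (assumed rule): larger of the two pivot distances.
        for a, low in enumerate(levels):
            for high in levels[a + 1:]:
                for v in classes[low]:
                    for w in classes[high]:
                        d[v][w] = d[w][v] = high
        # Intra-class pairs (assumed rule): recurse, capped at the class level.
        for level in levels:
            solve(classes[level], level)

    solve(list(range(n)), max(max(row) for row in D))
    return d
```

Under these assumed rules the output is always an ultrametric, and an input that is already an ultrametric is returned unchanged, whichever pivots are drawn.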

- Lemma: no cancellations
- Theorem: M+1 approximation

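For a triangle (u, v, w) with distances $d_1, d_2, d_3$:

- Violating if $d_1 > d_2 \ge d_3$
- Optimal solution pays $\ge d_1 - d_2$
- Algorithm charging scheme: a vertex chosen as pivot $\Rightarrow$ the triangle is charged

[Figure: the triangle under each choice of pivot, with charges $d_1 - d_2$ and $(d_2 - d_3) + (d_1 - d_2)$]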

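- $D \in \mathbb{R}_+^{n \times n}$
- Fit $D$ to weighted ultrametric

[Figure: tree T with levels weighted $L_M, \dots, L_2, L_1$ from top to bottom and leaves y, u, v, x, w]

M possible distances:

$\delta_1 = L_1$

$\delta_2 = L_1 + L_2$

$\vdots$

$\delta_M = L_1 + \dots + L_M$

Ex: $d_T(v,w) = L_1 + L_2$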
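The M possible distances are prefix sums of the level weights. A small sketch, where `possible_distances` and the list order (bottom level first) are assumptions made for illustration:

```python
from itertools import accumulate


def possible_distances(level_weights):
    """Given [L1, L2, ..., LM], return [L1, L1+L2, ..., L1+...+LM]."""
    return list(accumulate(level_weights))
```

With this ordering, two leaves first joined at level t of the tree are at distance `possible_distances(level_weights)[t - 1]`.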

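[Figure: tree T with levels weighted $L_M, \dots, L_2, L_1$ and leaves y, u, v, x, w; for the pair u, y: $x^M_{uy} = 0$, $x^2_{uy} = 0$, $x^1_{uy} = 1$]

- Integer program formulation: $x^t_{uv} \in \{0,1\}$ (linear relaxation: $x^t_{uv} \in [0,1]$)
- $x^t_{uv} = 1 \iff$ u, v separated at level t
- $0 \le x^M_{uv} \le x^{M-1}_{uv} \le \dots \le x^1_{uv} = 1$
- Triangle inequality at each level: $x^t_{uv} \le x^t_{uw} + x^t_{wv}$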
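- Cost: $\min \sum_{t=1}^{M} L_t \Big( \sum_{D(u,v) < \delta_t} x^t_{uv} + \sum_{D(u,v) \ge \delta_t} (1 - x^t_{uv}) \Big)$, where $\delta_t = L_1 + \dots + L_t$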

- A divisive (top-down) algorithm
- At each level t = M, M-1, ..., 1:
- Solve a multi-cut-like problem
- Cluster so as to separate pairs u, v such that $x^t_{uv} \ge 2/3$

- Danger: High levels influence low ones!

- Similar analysis gives the same bound for $\|\cdot\|_p^p$

- Therefore: $O((\log n \log\log n)^{1/p})$-approximation

- By [ABFPT99], applies also to fitting trees
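The step from the $\|\cdot\|_p^p$ bound to the $1/p$ exponent is a matter of taking $p$-th roots. Here $\alpha$ stands for the bound and $T^*$ for an optimal tree; both symbols are introduced only for this illustration.

```latex
\[
\|D - d_T\|_p^p \;\le\; \alpha \cdot \|D - d_{T^*}\|_p^p
\quad\Longrightarrow\quad
\|D - d_T\|_p \;\le\; \alpha^{1/p} \cdot \|D - d_{T^*}\|_p,
\qquad \alpha = O(\log n \log\log n).
\]
```

The same tree $T^*$ minimizes both objectives, since raising a nonnegative cost to the power $p$ preserves order.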

- $O(\log n)$-approximation algorithm? Better?
- Stronger lower bounds
- Derandomize (M+1)-approx algorithm
- Aggregation [ACN05]
- Applications

Thank You !!!