### Fitting Tree Metrics: Hierarchical Clustering and Phylogeny

Nir Ailon, Moses Charikar

Princeton University

Data with dissimilarity information

- Represented by matrix D
- Complete information

[Figure: complete graph on points u, v, w, x, y with one dissimilarity per pair (1, 2, 3, 5, 5, 6, 7, 8, 10, 13), e.g. D(u,v)=1; a big number means high dissimilarity.]

Goal: Fit data to tree structure

- Preserve dissimilarity info
- Tree metric $d_T$ close to D

[Figure: tree T with leaves u, v, w, x, y; $d_T(u,v)$ is the length of the tree path between u and v.]

Applications

- Evolutionary biology
- Molecular phylogeny: dissimilarity information from DNA

- Gene expression analysis
- Historical linguistics
- ...

Special case: Ultrametrics

(Hierarchical clustering)

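[Figure: ultrametric tree T with M=3 levels over leaves y, u, v, x, w, shown next to the matching nested clustering; for example $d_T(v,x)=1$ and $d_T(u,w)=3$.]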

Equivalently: the two largest distances in every triangle are equal
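The triangle characterization above can be checked directly on a distance matrix. The following is a minimal sketch, not code from the talk: the function name `is_ultrametric` and both example matrices are illustrative assumptions.

```python
from itertools import combinations

def is_ultrametric(D):
    """Return True if, in every triangle, the two largest distances are equal."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        a, b, c = sorted((D[i][j], D[i][k], D[j][k]))
        if b != c:  # the two largest sides differ: the triangle is violating
            return False
    return True

# Hypothetical 5-point, 3-level example (indices 0..4; values are illustrative).
D_ok = [
    [0, 1, 2, 3, 3],
    [1, 0, 2, 3, 3],
    [2, 2, 0, 3, 3],
    [3, 3, 3, 0, 1],
    [3, 3, 3, 1, 0],
]

# Perturb one pair so that triangle (0, 1, 2) has sides 1, 2, 3.
D_bad = [row[:] for row in D_ok]
D_bad[0][2] = D_bad[2][0] = 3

print(is_ultrametric(D_ok), is_ultrametric(D_bad))
```

The check takes O(n^3) time, which is fine for illustration but not meant as an efficient test.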

Previous results

- Fitting ultrametrics under $\|\cdot\|_\infty$ is in P [FKW95]
- Fitting trees under $\|\cdot\|_\infty$ is APX-hard [ABFPT99]
- Fitting ultrametrics under $\|\cdot\|_1$ is APX-hard [W93]; under $\|\cdot\|_2$ it is NP-hard
- An f(n)-approximation algorithm for ultrametrics yields a 3f(n)-approximation algorithm for trees (under any $\|\cdot\|_p$) [ABFPT99]

Previous results

- $O(\min\{n^{1/p}, (k \log n)^{1/p}\})$-approximation for trees under $\|\cdot\|_p$ [HKM05]
- Fitting ultrametrics for M=2 under $\|\cdot\|_1$: Correlation Clustering [BBC02, CGW03, ACN05, ...]

- . . .

Our results

- (M+1)-approximation for fitting level-M ultrametrics under $\|\cdot\|_1$
- $O((\log n \log\log n)^{1/p})$-approximation for general weighted trees under $\|\cdot\|_p$

Reconstructing T from ultrametric D

- Given ultrametric $D \in \{1..M\}^{n \times n}$
- Pick pivot vertex u
- Recursively solve for neighbor-classes

[Figure: pivot u with its neighbor-classes at distances 1, 2, and 3; labels M=2 and M=3.]
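The pivot recursion above can be sketched as follows. This is one possible reading rather than the authors' code: the function name `clusters`, the output format (each tree cluster mapped to its merge height), and the example matrix are assumptions made for illustration. The vertices at distance t from the pivot form one neighbor-class, and the structure strictly below height t inside that class is found recursively.

```python
def clusters(D, S, cap):
    """Clusters of the ultrametric tree inside S with merge height < cap.

    Returns {frozenset(cluster): height}. Assumes D is a valid ultrametric
    with integer entries, so any vertex can serve as the pivot.
    """
    out = {}
    if len(S) <= 1:
        return out
    u = S[0]                      # pivot
    ball = [u]                    # u's cluster, grown level by level
    for t in range(1, cap):
        cls = [v for v in S if v != u and D[u][v] == t]   # neighbor-class at distance t
        if cls:
            out.update(clusters(D, cls, t))  # structure strictly below height t inside the class
            ball = ball + cls
            out[frozenset(ball)] = t         # u's cluster at height t
    # Vertices at distance >= cap from u lie in other subtrees below height cap.
    rest = [v for v in S if v != u and D[u][v] >= cap]
    out.update(clusters(D, rest, cap))
    return out

# Hypothetical 3-level ultrametric on points 0..4 (illustrative values).
D = [
    [0, 1, 2, 3, 3],
    [1, 0, 2, 3, 3],
    [2, 2, 0, 3, 3],
    [3, 3, 3, 0, 1],
    [3, 3, 3, 1, 0],
]
tree = clusters(D, list(range(5)), cap=4)   # cap = M + 1 so the root is included
for cluster, height in sorted(tree.items(), key=lambda kv: (kv[1], sorted(kv[0]))):
    print(height, sorted(cluster))
```

The distance between two leaves is then the smallest height of a cluster containing both.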

Minimizing $\|\cdot\|_1$ for inconsistent $D \in \{1..M\}^{n \times n}$

- Same algorithm!
- Pick pivot vertex u (at random)
- Freeze distances incident to u
- Fix inter-class distances
- Fix intra-class distances

[Figure: worked example around pivot u; the crossed-out (X) inter-class and intra-class distances are the ones changed. Total cost contribution: 4.]

- Recurse...
- Lemma: no cancellations
- Theorem: M+1 approximation
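The steps listed above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the names `pivot_fit` and `l1_cost`, the `seed` parameter, and the example matrices are assumptions. The sketch also assumes that inter-class distances are set to the larger of the two pivot distances, and that distances inside a class at pivot distance t are capped at t before recursing.

```python
import random

def pivot_fit(D, M, seed=0):
    """Fit an M-level ultrametric T to a symmetric matrix D with entries in 1..M."""
    n = len(D)
    T = [[0] * n for _ in range(n)]
    rng = random.Random(seed)

    def solve(S, cap):
        if len(S) <= 1:
            return
        u = rng.choice(S)                        # random pivot
        classes = {}
        for v in S:
            if v != u:
                t = min(D[u][v], cap)            # assumed rule: cap at the enclosing level
                T[u][v] = T[v][u] = t            # freeze distances incident to u
                classes.setdefault(t, []).append(v)
        levels = sorted(classes)
        for i, s in enumerate(levels):
            for t in levels[i + 1:]:             # t > s
                for v in classes[s]:
                    for w in classes[t]:
                        T[v][w] = T[w][v] = t    # fix inter-class distances: max(s, t)
        for t, cls in classes.items():
            solve(cls, t)                        # fix intra-class distances recursively

    solve(list(range(n)), M)
    return T

def l1_cost(D, T):
    """Sum of |D - T| over unordered pairs."""
    n = len(D)
    return sum(abs(D[i][j] - T[i][j]) for i in range(n) for j in range(i + 1, n))

# Hypothetical consistent input, and a perturbed (inconsistent) copy.
D_ok = [
    [0, 1, 2, 3, 3],
    [1, 0, 2, 3, 3],
    [2, 2, 0, 3, 3],
    [3, 3, 3, 0, 1],
    [3, 3, 3, 1, 0],
]
D_bad = [row[:] for row in D_ok]
D_bad[0][2] = D_bad[2][0] = 3

T = pivot_fit(D_bad, M=3, seed=1)
print(l1_cost(D_bad, T))
```

On a consistent ultrametric the sketch returns the input unchanged, which matches the "Same algorithm!" remark. On inconsistent input the cost depends on the random pivots.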

Proof idea

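- A triangle (u,v,w) is violating if its distances satisfy $d_1 > d_2 \ge d_3$
- Optimal solution pays $\ge d_1 - d_2$
- Algorithm charging scheme:

[Figure: triangle u, v, w. Depending on the pivot, one edge is changed: $d_2 \Rightarrow d_1$ or $d_1 \Rightarrow d_2$ (cost $d_1 - d_2$), or $d_3 \Rightarrow d_2 \Rightarrow d_1$ (cost $(d_2 - d_3) + (d_1 - d_2)$). Chosen as pivot $\Rightarrow$ charged.]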

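General ultrametrics: $D \in \mathbb{R}_+^{n \times n}$

- Fit D to weighted ultrametric
- M possible distances: $\lambda_1 = L_1$, $\lambda_2 = L_1 + L_2$, ..., $\lambda_M = L_1 + \dots + L_M$
- Ex: $d_T(v,w) = L_1 + L_2$

[Figure: weighted ultrametric tree over leaves y, u, v, x, w with level weights $L_1, L_2, \dots, L_M$ from the bottom level up.]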

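Fitting D to M-level weighted ultrametric under $\|\cdot\|_1$

- Integer program formulation: $x^t_{uv} \in \{0,1\}$ (linear relaxation: $x^t_{uv} \in [0,1]$)
- $x^t_{uv} = 1 \iff$ u,v separated at level t
- $0 \le x^M_{uv} \le x^{M-1}_{uv} \le \dots \le x^1_{uv} = 1$
- Triangle inequality at each level: $x^t_{uv} \le x^t_{uw} + x^t_{wv}$
- Cost: $\min \sum_{t=1}^{M} L_t \left( \sum_{D(u,v) < \lambda_t} x^t_{uv} + \sum_{D(u,v) \ge \lambda_t} (1 - x^t_{uv}) \right)$, where $\lambda_t = L_1 + \dots + L_t$

[Figure: the weighted tree with levels $L_1, \dots, L_M$ over leaves y, u, v, x, w; for the pair u, y: $x^1_{uy} = 1$, $x^2_{uy} = 0$, ..., $x^M_{uy} = 0$.]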
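To see why a level-by-level objective of this kind measures the $\|\cdot\|_1$ error, the sketch below evaluates it for an integral solution and compares it with the direct sum of absolute differences. It rests on assumptions that are not stated on the slide: the cumulative distances are called `lam` (with `lam[t] = L[0] + ... + L[t]`), every entry of D is one of those values, a pair counts as separated at a level exactly when its tree distance reaches that level's cumulative distance, and a level contributes its weight whenever that indicator disagrees with whether D reaches the same cumulative distance. The function names and the 3-point example are illustrative.

```python
from itertools import accumulate, combinations

def ip_cost(D, dT, L):
    """Level-by-level cost of the integral solution induced by tree distances dT."""
    lam = list(accumulate(L))                 # lam[t] = L[0] + ... + L[t]
    cost = 0
    for u, v in combinations(range(len(D)), 2):
        for Lt, lt in zip(L, lam):
            x = 1 if dT[u][v] >= lt else 0    # 1 iff u, v are separated at this level
            # Pay Lt when the level disagrees with D: x if D is below lam_t, else 1 - x.
            cost += Lt * (x if D[u][v] < lt else 1 - x)
    return cost

def l1_cost(D, dT):
    """Direct sum of |D - dT| over unordered pairs."""
    return sum(abs(D[u][v] - dT[u][v]) for u, v in combinations(range(len(D)), 2))

# Hypothetical level weights and matrices; possible distances are 1, 3, 7.
L = [1, 2, 4]
D = [
    [0, 1, 7],
    [1, 0, 3],
    [7, 3, 0],
]
dT = [
    [0, 3, 7],
    [3, 0, 7],
    [7, 7, 0],
]
print(ip_cost(D, dT, L), l1_cost(D, dT))
```

Both quantities agree on this example because each level t contributes its weight exactly when the tree and D disagree about reaching the t-th cumulative distance.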

Rounding the LP: An O(log n loglog n)-approximation

- A divisive (top-down) algorithm
- At each level t = M, M-1, ..., 1:
  - Solve a multi-cut-like problem
  - Cluster so as to separate u,v's s.t. $x^t_{uv} \ge 2/3$

- Danger: High levels influence low ones!

General $\|\cdot\|_p$ cost

- Similar analysis gives same bound for $\|\cdot\|_p^p$
- Therefore: $O((\log n \log\log n)^{1/p})$-approximation
- By [ABFPT99], applies also to fitting trees
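The step from the $p$-th power of the norm to the norm itself can be written out as follows. This is a short worked derivation under assumed notation: $T$ is the fitted ultrametric, $T^*$ an optimal one, and $\alpha = O(\log n \log\log n)$ is the factor obtained for the $p$-th power cost.

$$
\|T - D\|_p^p \le \alpha \, \|T^* - D\|_p^p
\quad\Longrightarrow\quad
\|T - D\|_p \le \alpha^{1/p} \, \|T^* - D\|_p .
$$

Taking $p$-th roots preserves the inequality because both sides are nonnegative, and the same $T^*$ minimizes both $\|\cdot\|_p^p$ and $\|\cdot\|_p$. Combined with the reduction quoted from [ABFPT99], which loses a factor of 3, this would give a $3\alpha^{1/p}$ factor for trees.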

Future work

- O(log n)-approximation algorithm? Better?
- Stronger lower bounds
- Derandomize (M+1)-approx algorithm
- Aggregation [ACN05]
- Applications

Thank You !!!
