The population haplotyping problem
This presentation is the property of its rightful owner.
Sponsored Links
1 / 222

The Population Haplotyping problem PowerPoint PPT Presentation


  • 33 Views
  • Uploaded on
  • Presentation posted in: General

The Population Haplotyping problem. NOTATION : each SNP only two values in a population ( bio ). Call them 0 and 1 . Also , call * the fact that a site is heterozygous. HAPLOTYPE : string over 0, 1 GENOTYPE : string over 0, 1, *

Download Presentation

The Population Haplotyping problem

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


The population haplotyping problem

The PopulationHaplotyping problem


The population haplotyping problem

NOTATION: each SNP onlytwovalues in a population (bio).

Call them0and 1. Also, call *the factthat a site isheterozygous

HAPLOTYPE: string over 0, 1

GENOTYPE: string over 0, 1, *

where0={0}, 1={1}, *={0,1}

01

0*

00

10

1*

11

11

11

11

01

**

10

10

*0

00

10

10

10

*0

10

00


The population haplotyping problem

ALGEBRA OF HAPLOTYPES:

0 +

0 =

---

0

1 +

1 =

---

1

0 + 1 +

1 = 0 =

--- ---

* *

Heterozygous (ambiguous) sites

Homozygous sites

01

00

0*

10

1*

11

01

**

11

10

**

00

10

00

10

*0

10

10


The population haplotyping problem

Phasing the alleles

11101

10000

11100

10001

11001

10100

11000

10101

1**0*

For k heterozygous (ambiguous) sites, there are 2k-1 possible phasings


The population haplotyping problem

THE PHASING (or HAPLOTYPING) PROBLEM

It is too expensive to determine haplotypes directly

Much cheaper to determine genotypes, and then infer haplotypes in silico:

Given genotypes of k individuals, determine the phasings

of all heterozygous sites.

This yields a set H, of (at most) 2k haplotypes. H is a resolution of G.


The population haplotyping problem

The input is GENOTYPE data

*1**1

11**1

11011

****1

00011

INPUT: G = { 11**1, ****1, 11011, *1**1, 00011 }


The population haplotyping problem

The input is GENOTYPE data

11011

01101

*1**1

11011

11101

11**1

11011

11011

11011

00011

11101

****1

00011

00011

00011

INPUT: G = { 11**1, ****1, 11011, *1**1, 00011 }

OUTPUT: H = { 11011, 11101, 00011, 01101}

Each genotype is resolved by two haplotypes

We will define some objectives for H


The population haplotyping problem

OBJECTIVES

-without objectives/constraints, the

haplotyping problem would be

(mathematically)trivial

E.g., always put 0 above and 1 below

**0*1  00001

11011

1*0** 10000

11011

-the objectives/constraints must be

“driven by biology”


The population haplotyping problem

OBJECTIVES

1°) Clark’s inference rule

2°) Perfect Phylogeny

3°) DiseaseAssociation

4°) (parsimony): minimize |H|


The population haplotyping problem

1st

Obj: Clark’s rule


The population haplotyping problem

Inference Rule

for a compatible pair h , g

known haplotypeh

1011001011 +

?????????? =

1**1001*1*

known (ambiguos) genotype g


The population haplotyping problem

Inference Rule

for a compatible pair h , g

known haplotypeh

1011001011 +

1101001110 =

1**1001*1*

new (derived) haplotype h’

known (ambiguos) genotype g

We write h + h’ = g


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

0000

1000

**00

11**


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

0000

1000

**00

11**

1100


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

0000

1000

**00

11**

1100

SUCCESS

1111


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

0000

1000

**00

11**


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

0000

1000

**00

11**

0100


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

0000

1000

**00

11**

0100

FAILURE (can’t resolve 1122 )


The population haplotyping problem

1st Objective (Clark, 1990)

1. Start with H = “bootstrap” haplotypes

2. while Clark’s rule applies to a pair (h, g) in H x G

3. apply the rule to any such (h, g) obtaining h’

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic: the algorithm could end without explaining

all genotypes even if an explanation was possible.

The number of genotypes solved depends on order of application.

OBJ: find order of application rule that leaves the fewest elements in G


The population haplotyping problem

The problem was studied by Gusfield

(ISMB 2000, and Journal of Comp. Biol., 2001)

- problem is APX-hard

- it corresponds to finding largest forest in a graph with

haplotypes as nodes and arcs for possible derivations

-solved via ILP of exponential-size

(practical for small real instances)


The population haplotyping problem

2nd

Obj: Perfect Phylogeny


The population haplotyping problem

- Parsimony does not take into account

mutations/evolution of haplotypes

- parsimony is very relialable on “small”

haplotype blocks

- when haplotypes are large (span several

SNPs, we should consider evolutionionary

events and recombination)

- the cleanest model for evolution is the

perfect phylogeny


The population haplotyping problem

3rd objective is based on perfect phylogeny

- A phylogeny expalains set of binary features (e.g. flies, has fur…)

with a tree

- Leaf nodes are labeled with species

- Each feature labels an edge leading to a subtree that possesses it


The population haplotyping problem

3rd objective is based on perfect phylogeny

- A phylogeny expalains set of binary features (e.g. flies, has fur…)

with a tree

- Leaf nodes are labeled with species

- Each feature labels an edge leading to a subtree that possesses it

has 2 legs

has tail

flies


The population haplotyping problem

But…a new species may come along so that no

Perfect phylogeny is possible…

- A phylogeny expalains set of binary features (e.g. flies, has fur…)

with a tree

- Leaf nodes are labeled with species

- Each feature labels an edge leading to a subtree that possesses it

has 2 legs

has tail

flies


The population haplotyping problem

flies

two legs

tail

Human 1 0 0

Mouse 0 1 0

Spider 0 0 0

Eagle 1 0 1

Theorem: such matrix has p.p. iff there is not a 00 4x2 minor

10

01

11


The population haplotyping problem

flies

two legs

tail

Human 1 0 0

Mouse 0 1 0

Spider 0 0 0

Eagle 1 0 1

Mickey mouse 1 1 0

Theorem: such matrix has p.p. iff there is not a 00 4x2 minor

10

01

11


The population haplotyping problem

We can consider each SNP as a binary feature

Objective: We want the solution to admit a perfect phylogeny

(Rationale : we assume haplotypes have evolved independently

along a tree)


The population haplotyping problem

We can consider each SNP as a binary feature

Objective: We want the solution to admit a perfect phylogeny

(Rationale : we assume haplotypes have evolved independently

along a tree)

0 1 * 0

* 1 0 *

* 0 * 0


The population haplotyping problem

We can consider each SNP as a binary feature

Objective: We want the solution to admit a perfect phylogeny

(Rationale : we assume haplotypes have evolved independently

along a tree)

0 1 0 0

0 1 1 0

1 1 0 1

0 1 0 0

1 0 0 0

0 0 1 0

0 1 * 0

* 1 0 *

* 0 * 0


The population haplotyping problem

We can consider each SNP as a binary feature

Objective: We want the solution to admit a perfect phylogeny

(Rationale : we assume haplotypes have evolved independently

along a tree)

0 1 0 0

0 1 1 0

1 1 0 1

0 1 0 0

1 0 0 0

0 0 1 0

0 1 * 0

* 1 0 *

* 0 * 0

NOT a perfect phylogeny solution !


The population haplotyping problem

We can consider each SNP as a binary feature

Objective: We want the solution to admit a perfect phylogeny

(Rationale : we assume haplotypes have evolved independently

along a tree)

0 1 * 0

0 1 0 *

0 0 0 *


The population haplotyping problem

We can consider each SNP as a binary feature

Objective: We want the solution to admit a perfect phylogeny

(Rationale : we assume haplotypes have evolved independently

along a tree)

0 1 0 0

0 1 1 0

0 1 0 0

1 1 0 1

0 0 0 0

0 0 0 1

0 1 * 0

0 1 0 *

0 0 0 *

A perfect phylogeny


The population haplotyping problem

Theorem: The Perfect Phylogeny Haplotyping problem is polynomial


The population haplotyping problem

Theorem: The Perfect Phylogeny Haplotyping problem is polynomial

Algorithms are of combinatorial nature

- There is a graph for which SNPs are columns and edges are of two types

(forced and free)

- forced edges connect pairs of SNPs that must be phased in the same way

** 00 + 11 or ** 01 + 10

- a complex visit of the graph decides how to phase free SNPs


The population haplotyping problem

3rd

Obj: Disease Association


The population haplotyping problem

Some diseases may be due to a gene which has “faulty” configurations

RECESSIVE DISEASE (e.g. cystic fibrosis, sickle cell anemia): to be diseased

one must have both copies faulty. With one copy one is a carrier of the disease

DOMINANT DISEASE (e.g. Huntington’s disease, Marfan’s syndrome): to be

diseased it is enough to have one faulty copy

Two individuals of which one is healthy and the other diseased

may have the same genotype.

The explanation of the disease lies in a difference in their haplotypes


The population haplotyping problem

11**1

*1**1

0*011

11**1

0**01

00011

INPUT: GD = {11**1,*1**1,0*011},

GH ={11**1,0**01,00011}


The population haplotyping problem

11011

01101

11011

11101

01011

00011

11**1

*1**1

0*011

11001

11111

01101

00001

00011

00011

11**1

0**01

00011

INPUT: GD = {11**1,*1**1,0*011},

GH ={11**1,0**01,00011}

OUTPUT: H = { 11011,01011,00001,11111,11101,00011,01101}

H contains HD, s.t. each diseased has >=1 haplotype in HD and each healty none


The population haplotyping problem

Theorem 1 is proved via a reduction from 3 SAT

Theorem 2 has a mathematical proof (coloring argument) with

little relation to biology:

There is R (depending on input) s.t. a haplotype is healthy

if the sum of its bits is congruent to R modulo 3

This means the model must be refined!


The population haplotyping problem

4th

Obj: MaxParsimony

 See separate slides…


The population haplotyping problem

Summary:

- haplotyping in-silico needed for economical reasons

- several objectives, all biologically driven

- nice combinatorial problems

(mostly due to binary nature of SNPs)

- these problems are technology-dependant and may

become obsolete (hopefully after we have retired)


The population haplotyping problem

111

111

011

101

011

000

111

111

010

001

010

011

022

222

012

221

222

011

012

022

022

211

111

012


The population haplotyping problem

2nd Objective (parsimony) :

minimize |H|


The population haplotyping problem

1. The problem is APX-Hard

Reduction from VERTEX-COVER


The population haplotyping problem

B

A

C

D

E


The population haplotyping problem

A B C D E *

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB

BC

AE

DE

AD

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB

BC

AE

DE

AD

A

B

C

D

E

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB 2 2

BC 2 2

AE 22

DE 2 2

AD 22

A

B

C

D

E

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB 2 2

BC 2 2

AE 22

DE 2 2

AD 22

A 0

B 0

C 0

D 0

E 0

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB 2 2 2

BC 2 2 2

AE 222

DE 2 22

AD 222

A 00

B 00

C 00

D 00

E 00

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB 2 2 1 1 1 2

BC 1 2 2 1 1 2

AE 2 1 1 1 22

DE 1 1 1 2 22

AD 2 1 1 2 1 2

A 0 1 1 1 1 0

B 1 0 1 1 1 0

C 1 1 0 1 1 0

D 1 1 1 0 1 0

E 1 1 1 1 00

B

A

C

D

E


The population haplotyping problem

A B C D E *

AB 2 2 1 1 1 2

BC 1 2 2 1 1 2

AE 2 1 1 1 22

DE 1 1 1 2 22

AD 2 1 1 2 1 2

A 0 1 1 1 1 0

B 1 0 1 1 1 0

C 1 1 0 1 1 0

D 1 1 1 0 1 0

E 1 1 1 1 00

B

A

C

D

E

G = (V,E) has a node cover X of size k there is a set H of |V | + k haplotypes that

explain all genotypes


The population haplotyping problem

A B C D E *

AB 2 2 1 1 1 2

BC 1 2 2 1 1 2

AE 2 1 1 1 22

DE 1 1 1 2 22

AD 2 1 1 2 1 2

A 0 1 1 1 1 0

B 1 0 1 1 1 0

C 1 1 0 1 1 0

D 1 1 1 0 1 0

E 1 1 1 1 00

B

A

C

D

E

G = (V,E) has a node cover X of size k there is a set H of |V | + k haplotypes that

explain all genotypes


The population haplotyping problem

A B C D E *

AB 2 2 1 1 1 2

BC 1 2 2 1 1 2

AE 2 1 1 1 22

DE 1 1 1 2 22

AD 2 1 1 2 1 2

A 0 1 1 1 1 0

B 1 0 1 1 1 0

C 1 1 0 1 1 0

D 1 1 1 0 1 0

E 1 1 1 1 00

A’ 0 1 1 1 1 1

B’ 1 0 1 1 1 1

E’ 1 1 1 1 0 1

B

A

C

D

E

G = (V,E) has a node cover X of size k there is a set H of |V | + k haplotypes that

explain all genotypes


The population haplotyping problem

A basic ILP formulation


The population haplotyping problem

A basic ILP formulation

Expand your input G in all possible ways

220

022

120


The population haplotyping problem

A basic ILP formulation

Expand your input G in all possible ways

220

022

120

010 + 100, 000 + 110

000 + 011, 001 + 010

100 + 110


The population haplotyping problem

A basic ILP formulation

Expand your input G in all possible ways

220

022

120

010 + 100, 000 + 110

000 + 011, 001 + 010

100 + 110


The population haplotyping problem

The resulting Integer Program (IP1):


The population haplotyping problem

Other ILP formulation are possible. E.g. POLY-SIZE ILP formulations


The population haplotyping problem

The input is GENOTYPE data

11011

01101

*1**1

11011

11101

11**1

11011

11011

11O11

00011

11101

****1

00011

00011

OOO11

INPUT: G = { 11**1, ****1, 11011, *1**1, 00011}

OUTPUT: H = { 11011, 11101, 00011, 01101}

Each genotype is explained by two haplotypes

We will define some objectives for H


The population haplotyping problem

1st Objective (open research problem):

minimize |H|

2nd Objective based on inference rule:


The population haplotyping problem

1st Objective (parsimony) :

minimize |H|

An easy SQRT(n) approximation: k haplotypes can explain at most k(k-1)/2 genotypes, hence, we need at least LB = SQRT(n) haplotypes.

BUT any greedy algorithm can find 2 haplotypes to explain a genotype, giving a

solution of <= 2n haplotypes, i.e. <= SQRT(n) * LB

It’s difficult, but not impossible, to come up with better approximations, like constants

(Lancia, Pinotti, Rizzi ’02)


The population haplotyping problem

2nd Objective based on inference rule:


The population haplotyping problem

Inference Rule

known haplotypeh

xoxxooxoxx +

********** =

x??xoox?x?

known (ambiguos) genotype g


The population haplotyping problem

Inference Rule

known haplotypeh

xoxxooxoxx +

xxoxooxxxo =

x??xoox?x?

new (derived) haplotype h’

known (ambiguos) genotype g


The population haplotyping problem

Inference Rule

known haplotypeh

xoxxooxoxx +

xxoxooxxxo =

x??xoox?x?

new (derived) haplotype h’

known (ambiguos) genotype g

We write h + h’ = g

g and h must be compatible to derive h’


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

oooo

xooo

??oo

xx??


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

oooo

xooo

??oo

xx??

xxoo


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

oooo

xooo

??oo

xx??

xxoo

SUCCESS

xxxx


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

oooo

xooo

??oo

xx??


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

oooo

xooo

??oo

xx??

oxoo


The population haplotyping problem

2nd Objective (Clark, 1990)

1. Start with H = nonambiguos genotypes

2. while exists ambiguos genotype g in G

3. take h in H compatible with g and let h + h’ = g

4. set H = H + {h’} and G = G - {g}

5. end while

If, at end, G is empty, SUCCESS, otherwise FAILURE

Step 3 is non-deterministic

oooo

xooo

??oo

xx??

oxoo

FAILURE (can’t resolve xx?? )

OBJ: find order of application rule that leaves the fewest elements in G


The population haplotyping problem

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)


The population haplotyping problem

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

x??o?

1. expand

genotypes


The population haplotyping problem

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

xoooo

xooox

xoxoo

x??o?

xoxox

xxooo

1. expand

genotypes

xxoox

xxxoo

xxxox


The population haplotyping problem

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

xoooo

xooox

xoxoo

2. create (h, h’) if exists g s.t. h’ can be

derived from g and h

x??o?

xoxox

xxooo

1. expand

genotypes

xxoox

3. Largest number of nodes in forest

rooted at unambiguos genotpes =

= largest number of ambiguous genotypes

resolved

xxxoo

xxxox

Hence, find largest number of nodes in forest rooted at unambiguos genotpes. Use I.P. model with vars x(ij).

This reduction is exponential. Is there a better practical approach?


The population haplotyping problem

3rd Objective (open research problem)

Disease Detection:

?x??x

xx??x

??oxx

????x

oooxx

INPUT: G = { xx??x, ????x, ??oxx, ?x??x, oooxx }


The population haplotyping problem

3rd Objective (open research problem)

Disease Detection:

xxoxx

oxxox

?x??x

xxoxx

xxxox

xx??x

xxoxx

oooxx

??oxx

oooxx

xxxox

????x

oooxx

oooxx

oooxx

INPUT: G = { xx??x, ????x, ??oxx, ?x??x, oooxx }

OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox}

H contains H’, s.t. each diseased has one haplotype in H’ and each healty none

minimize | H|


Genome rearrangements and evolutionary distances

Genome Rearrangements and Evolutionary Distances


The population haplotyping problem

Each species has a genome (organized in pairs of chromosomes)

tcgtgatggat………………ttgatggattga

tcgattatggat………………ttttgatatcca

Genomes evolve by means of

  • Insertions

  • Deletions

  • Inversions

  • Transpositions

  • Translocations

of DNA regions


The population haplotyping problem

deletion


The population haplotyping problem

insertion

deletion


The population haplotyping problem

insertion

deletion

translocation


The population haplotyping problem

insertion

deletion

translocation

inversion


The population haplotyping problem

insertion

deletion

transposition

translocation

inversion


The population haplotyping problem

Combinatorial problem: given 2 permutations P, Q and operators in a set F find a

shortest sequence f1, ..fk of operators such that Q = fk(fk-1(…(f1(P))))

Very difficult problem! We focus on operators all of the same type (e.g. inversions)

(…still difficult…)

Wlog we can take Q = (1 2 … n). Hence we talk of sorting by … (inversions,

transpositions…)

We focus on inversions, that are the most important in Nature

5 6 4 8 3 2 1 9 7

Example:

1 2 3 8 4 6 5 9 7

1 2 3 8 4 5 6 9 7

1 2 3 6 5 4 8 9 7

1 2 3 6 5 4 8 7 9

1 2 3 4 5 6 8 7 9

1 2 3 4 5 6 7 8 9


The population haplotyping problem

Combinatorial problem: given 2 permutations P, Q and operators in a set F find a

shortest sequence f1, ..fk of operators such that Q = fk(fk-1(…(f1(P))))

Very difficult problem! We focus on operators all of the same type (e.g. inversions)

(…still difficult…)

Wlog we can take Q = (1 2 … n). Hence we talk of sorting by … (inversions,

transposition…)

We focus on inversions, that are the most important in Nature

+5 +6 -4 -8 -3 -2 -1 -9 +7

Example:

+1 +2 +3 +8 +4 -6 -5 -9 +7

+1 +2 +3 +8 +4 +5 +6 -9 +7

+1 +2 +3 -6 -5 -4 -8 -9 +7

+1 +2 +3 -6 -5 -4 -8 -7 +9

+1 +2 +3 +4 +5 +6 -8 -7 +9

+1 +2 +3 +4 +5 +6 +7 +8 +9

There is also a SIGNED VERSION of the problem !


The population haplotyping problem

(Unsigned) Sorting by Inversions is NP-hard (longstanding question, settled by Caprara ‘98)

Surprisingly, Signed Sorting by Inversions is Polynomial (beautifultheory, by Hannenhalli and Pevzner)

The complexity of Sorting by Transpositions, e.g., is unknown


The population haplotyping problem

(Unsigned) Sorting by Inversions is NP-hard (longstanding question, settled by Caprara ‘98)

Surprisingly, Signed Sorting by Inversions is Polynomial (beautifultheory, by Hannenhalli and Pevzner)

The complexity of Sorting by Transpositions, e.g., is unknown

The concept of breakpoint

Breakpoint at position i if | p(i) - p(i+1) | > 1

0

5 7 8 2 1 4 3 6 9

10


The population haplotyping problem

(Unsigned) Sorting by Inversions is NP-hard (longstanding question, settled by Caprara ‘98)

Surprisingly, Signed Sorting by Inversions is Polynomial (beautifultheory, by Hannenhalli and Pevzner)

The complexity of Sorting by Transpositions, e.g., is unknown

The concept of breakpoint

Breakpoint at position i if | p(i) - p(i+1) | > 1

0

5 7 8 2 1 4 3 6 9

10

d(p) = inversion distance

b(p) = # breakpoints

TRIVIAL BOUND: d(p) >= b(p) / 2

Example: d(p) >= 6 / 2 = 3


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10

Each node has degree...

10

4

6

0 2 or 4 …

hence the graph can be decomposed in cycles!


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10

Alternating cycle decomposition


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10

Alternating cycle decomposition


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10

Alternating cycle decomposition

c(p) = max # cycles in alternating decomposition

VERY STRONG BOUND : d (p) >= b(p) - c(p)

Example: c(p)= 2 and d (p) >= 6 - 2 = 4


The population haplotyping problem

The Breakpoint Graph

0

5 7 8 2 1 4 3 6 9

10

The best algorithm for this problem is based on an Integer Programming

formulation of the max cycle decomposition

A variable xC for each cycle (exponential # of vars…)

A constraint SxC = 1 for each edge e

C containing e

Objective: maximize SCxC


The population haplotyping problem

PRIMAL

max S xC

C

S xC = 1 for all edges e

C\ni e

xC \in {0,1} for all alt. cycles C

DUAL

min S ye

e

S ye <= 1 for all alt. Cycles C

e\in C

ye \in R for all edges e


The population haplotyping problem

PRIMAL

max S xC

C

S xC = 1 for all edges e

C\ni e

xC \in {0,1} for all alt. cycles C

DUAL

min S ye

e

S ye <= 1 for all alt. Cycles C

e\in C

ye \in R for all edges e


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

Pricing out the cycles for which y*(C) < 1


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

Split the graph in two copies


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

Connect twins


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

A perfect matching corresponds to (a set of) alternating cycles


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

A perfect matching corresponds to (a set of) alternating cycles


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

A perfect matching corresponds to (a set of) alternating cycles


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

A perfect matching corresponds to (a set of) alternating cycles


The population haplotyping problem

0

5 7 8 2 1 4 3 6 9

10

0

5 7 8 2 1 4 3 6 9

10

A perfect matching corresponds to (a set of) alternating cycles


The population haplotyping problem

.6

1

.4

0

5 7 8 2 1 4 3 6 9

10

0

0

5 7 8 2 1 4 3 6 9

10

.5

.2

The weight of the matching is the y*-weight of the cycles


The population haplotyping problem

.6

1

.4

0

5 7 8 2 1 4 3 6 9

10

100000

0

5 7 8 2 1 4 3 6 9

10

.5

.2

Forcing a cycle to use a certain node


The population haplotyping problem

- These cycles would not use the same node twice, but with simple trick is possible to model (OMISSIS)

BRANCH&PRICE algorithm by Caprara, Lancia, Ng (1999,2001)

BRANCH&BOUND combinatorial algorithm by Kececioglu, Sankoff (1996)

KS can solve at most n=40. Take days for n=50

CLN can solve for n=200. Takes few seconds (say 5) for n=100

NP-hard problem practically solved to optimality!


Statistical view of evolution

Statistical view of evolution

  • Genome evolve by random inversions

  • It’s like a random walk on a huge graph with an edge for each permutation an edge for each inversion

  • It is not clear why the shortest solution should be the one followed by Nature (in fact, often it isn’t)

  • We want to find the most likely number of inversions that lead from (1 2 … n ) to p

  • We use the expected number of breakpoints after k inversions as a way to guess the # of inversions


The population haplotyping problem

Given a p obtained by h random inversions from (1 … n ) we want to estimate h

The inversion distance is only a lower bound: h >= d(p) but the gap could be big

Let B(k)be the (r.v.) number of breakpoint after k random inversions from (1..n)

We estimate E[B(k)]. Then, faced with some p, we pick h such that E[B(h)] is as

close as possible to b(p) (maximum likelihood). CL ,2000, have shown:

E[B(k)] = ( n - 1 ) ( 1 - ())

k

n - 3

n - 1

Question: estimate E[D(k)], the (r.v.) inversion distance after k random inversions


The population haplotyping problem

Example: n = 200, k (u.a.r. in 1…n) inversions

k k’ d(p) b

88816

19191934

68676798

697368104

737973109

859183120

868583115

879084119

118117109138

184184135168


Protein structure alignments the maximum contact map overlap problem

Protein Structure Alignments: the Maximum Contact Map Overlap Problem


The population haplotyping problem

A Protein is a complex molecule with a primary,

linear structure (a sequence of aminoacids) and a

3-Dimensional structure (the protein fold).

Protein STRUCTURE determines its FUNCTION

For instance, the Drug Design problem

calls for constructing peptides with a 3D

shape complementary to a protein, so as

to dock onto it.


Motivation

Problem:

Motivation:

Align two 3D protein structures

Structure Alignment is Important for:

- Discovery of Protein Function (shape determines function)

- Search in 3D data bases

- Protein Classification and Evolutionary Studies

- ...


Contact maps

Contact Maps


Contact maps1

CONTACT MAPS

Unfolded protein


Contact maps2

CONTACT MAPS

Unfolded protein

Folded protein = contacts


Contact maps3

CONTACT MAPS

Unfolded protein

Folded protein = contacts

Contact map = graph


Contact maps4

CONTACT MAPS

Unfolded protein

Folded protein = contacts

Contact map = graph

OBJECTIVE:align 3d folds of proteins

= align contact maps


Contact map alignments

Contact Map Alignments


Non crossing alignments

Non-crossing Alignments

Protein 1

Protein 2

non-crossing map of residues in protein 1 and protein 2


The value of an alignment

The value of an alignment


The population haplotyping problem

The value of an alignment


The population haplotyping problem

The value of an alignment


The population haplotyping problem

The value of an alignment

Value = 3


The population haplotyping problem

The value of an alignment

Value = 3

We want to maximize the value


The population haplotyping problem

The value of an alignment

NP-Hard


Integer programming formulation

Integer Programming Formulation


Integer programming formulation1

Integer Programming Formulation

e

0-1 VARIABLES

yef

yef for e and f contacts

f


Integer programming formulation2

Integer Programming Formulation

e

0-1 VARIABLES

yef

yef for e and f contacts

e’

e

f

CONSTRAINTS

yef + ye’f’<= 1

f’

f


Integer programming formulation3

Integer Programming Formulation

e

0-1 VARIABLES

yef

yef for e and f contacts

e’

e

f

CONSTRAINTS

yef + ye’f’<= 1

f’

f

max SeSf yef

OBJECTIVE


Independent set problem

Independent Set Problem

It’s just a huge max independent set problem in Gy:

  • a node for each sharing

  • an edge for each pair of incompatible sharings

e’

e

e’’

e’

f’

e

f

e’’

f’’

f’’

f

f’


Independent set problem1

Independent Set Problem

It’s just a huge max independent set problem in Gy:

  • a node for each sharing

  • an edge for each pair of incompatible sharings

e’

e

e’’

e’

f’

e

f

e’’

f’’

f’’

f

f’

|Gy|=|E1|*|E2| (approximately 5000 for two proteins with 50 residues and 75 contacts each)

The best exact algorithm for independent set can solve for at most a few hundred

nodes


Node to node variables

Node to Node Variables

New variables x provide an easy check for the non-crossing conditions

e

NEW VARIABLES

i

xij for i and j residues

yef

xij

j

f


Node to node variables1

Node to Node Variables

New variables x provide an easy check for the non-crossing conditions

e

NEW VARIABLES

i

xij for i and j residues

yef

xij

j

f

NEW CONSTRAINTS

i

i’

j’

j

xij + xi’j’ <= 1


Node to node variables2

Node to Node Variables

New variables x provide an easy check for the non-crossing conditions

e

NEW VARIABLES

i

xij for i and j residues

yef

xij

j

f

NEW CONSTRAINTS

i

i’

p

i

q

j

j’

j

xij + xi’j’<= 1

y(ip)(jq) <= xij and y(ip)(jq)<= xpq


Clique constraints

Clique Constraints

Variables x define a graph Gx:

  • A node for each line

  • An edge between each pair of crossing lines

i

i’

i’

j’

i

j

j’

j


Clique constraints1

Clique Constraints

Variables x define a graph Gx:

  • A node for each line

  • An edge between each pair of crossing lines

i

i’

i’

j’

i

j

j’

j

  • Gx is much smaller than Gy

  • Gx has nice proprieties (it’s a perfect graph)

  • It’s easier to find large independent sets in Gx


Clique constraints2

Clique Constraints

Non-crossing constraints can be extended to

CLIQUE CONSTRAINTS

S xij<= 1

[i,j] in M

For all sets M of mutually incompatible (i.e. crossing) lines

All clique constraints satisfied (and Gx perfect) imply a strong bound!


Structure of maximal cliques in g x

Structure of Maximal cliques in Gx

1. Pick two subsets of same size


Structure of maximal cliques in g x1

Structure of Maximal cliques in Gx

2. Connect them in a zig-zag fashion


Structure of maximal cliques in g x2

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x3

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x4

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x5

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x6

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x7

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x8

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x9

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x10

Structure of Maximal cliques in Gx


Structure of maximal cliques in g x11

Structure of Maximal cliques in Gx

3. Throw in all lines included in a zig or a zag


Structure of maximal cliques in g x12

Structure of Maximal cliques in Gx

3. Throw in all lines included in a zig or a zag


Structure of maximal cliques in g x13

Structure of Maximal cliques in Gx

The result is a maximal clique in Gx


Separation of clique inequalities

Separation of Clique Inequalities


Separation of clique inequalities1

Separation of Clique Inequalities

PROBLEM

There exist exponentially many such cliques (O(22n) inequalities).

We need to generate in polynomial time a clique inequality when needed,

i.e., when violated by the current LP solution x*

S x*ij> 1

[i,j] in M

THEOREM

We can find the most violated clique inequality in time O(n2)


Separation of clique inequalities2

Separation of Clique Inequalities

PROOF (sketch)

1) Clique = zigzag path


Separation of clique inequalities3

Separation of Clique Inequalities

PROOF (sketch)

1) Clique = zigzag path

1 2 3 4 5 6 7 8


Separation of clique inequalities4

Separation of Clique Inequalities

PROOF (sketch)

2) Flip one graph: zigzag leftright

1) Clique = zigzag path

1 2 3 4 5 6 7 8

8 7 6 5 4 3 2 1


Separation of clique inequalities5

Separation of Clique Inequalities

PROOF (sketch)

2) Flip one graph: zigzag leftright

1) Clique = zigzag path

1 2 3 4 5 6 7 8

8 7 6 5 4 3 2 1

3) Define a grid with lengths for arcs so that length(P) = x*(clique(P)). Use Dyn. Progr.

to find longest path in grid, time O(n^2)


Separation of cliques

Separation of cliques

2

n1

1

i

1

2

u

n2

Create n1 x n2 grid

Orient all edges and give weights


Separation of cliques1

Separation of cliques

2

n1

1

i

1

2

x*iu

u

x*iu

n2

Create n1 x n2 grid

Orient all edges and give weights


Separation of cliques2

Separation of cliques

B=(n1,1)

A=(1,n2)

Create n1 x n2 grid

Orient all edges and give weights

There is violated clique iff longest A,B path has length > 1


Gx is a perfect graph

Gx is a Perfect Graph

We show why polynomial separation is possible:

Gx is weakly triangulated (no chordless cycles >= 5 in Gx or Gx)

=> Gx is perfect (Hayward, 1985)


Gx is a perfect graph1

Gx is a Perfect Graph

L1 and L3 don’t cross. Wlog RIGHT(L3, L1)

PROOF (Sketch, for Gx)

L1

L7

L2

L6

L3

L5

L4


Gx is a perfect graph2

Gx is a Perfect Graph

L1 and L3 don’t cross. Wlog RIGHT(L3, L1)

L1

L7

L2

L6

L3

L1

L3

L5

L4


Gx is a perfect graph3

Gx is a Perfect Graph

For i=4,5,… Li crosses Li-1 but not L1

=> RIGHT (Li, L1)

L1

L7

L2

L6

L3

L1

L3

L5

L4


Gx is a perfect graph4

Gx is a Perfect Graph

For i=4,5,… Li crosses Li-1 but not L1

=> RIGHT (Li, L1)

L1

L7

L2

L6

L3

L4

L1

L3

L5

L4


Gx is a perfect graph5

Gx is a Perfect Graph

For i=4,5,… Li crosses Li-1 but not L1

=> RIGHT (Li, L1)

L1

L7

L2

L6

L3

L4

L5

L1

L5

L4


Gx is a perfect graph6

Gx is a Perfect Graph

For i=4,5,… Li crosses Li-1 but not L1

=> RIGHT (Li, L1)

L1

L7

L2

L6

L3

L6

L5

L1

L5

L4


Gx is a perfect graph7

Gx is a Perfect Graph

We get LEFT(L1, {L3, L4, L5, L6})

L1

L7

L2

L6

L3

L6

L1

L5

L4

L3, L4, L5 L6


Gx is a perfect graph8

Gx is a Perfect Graph

A symmetric argument started at L6, with LEFT(L1, L6) implies LEFT(Li, L6) for i=2,3,4,5

L1

L7

L2

L6

L3

L6

L1

L5

L4

L3, L4, L5 L6


Gx is a perfect graph9

Gx is a Perfect Graph

A symmetric argument started at L6, with LEFT(L1, L6) implies LEFT(Li, L6) for i=2,3,4,5

L1

L7

L2

L6

L3

L6

L1

L5

L4

L3, L4, L5 L6

L2, L3, L4 L5


Gx is a perfect graph10

Gx is a Perfect Graph

Then {L3, L4, L5} are between L1 and L6

L1

L7

L2

L6

L3

L6

L1

L5

L4

L3, L4, L5L6

L2, L3, L4 L5


Gx is a perfect graph11

Gx is a Perfect Graph

Then {L3, L4, L5} are between L1 and L6

But L7 crosses L1 and L6, and so should cross them all !

L1

L7

L2

L6

L3

L6

L7

L1

L5

L4

L3, L4, L5L6

L2, L3, L4 L5


The population haplotyping problem

The approach just seen is due to Lancia, Carr, Istrail, Walenz (2001)

It can be applied to small or moderate proteins (up to 80 residues/150 contacts)

In 2002, a new approach, by Caprara and Lancia, based on LAGRANGIAN

RELAXATION. Approach borrowed from Quadratic Assignment. With new

approach we can solve important proteins (up to 150 residues/300 contacts)


What about heuristics e g genetic algorithms

What about Heuristics?E.g., genetic algorithms…


Genetic algorithm overview

Population

at time t

Population

at time t+1

Recombination

operators

Evaluation

function

Genetic Algorithm Overview

  • A Population of candidate solutions thatevolve (improve) over time

  • Recombination creates new candidate solutions viacrossover and mutation


Crossover

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

Offspring

Blue Parent

Red Parent


Crossover1

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

    • Select a set of edges in one parent to copy to the child


Crossover2

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

    • Select a set of edges in one parent to copy to the child


Crossover3

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

    • Select a set of edges in one parent to copy to the child

    • Copy as many edges as possible from the other parent


Crossover4

These edges conflict with existing

edges and are not copied

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

    • Select a set of edges in one parent to copy to the child

    • Copy as many edges as possible from the other parent


Crossover5

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

    • Select a set of edges in one parent to copy to the child

    • Copy as many edges as possible from the other parent

    • Add random edges to fill any remaining space


Crossover6

Crossover

  • Crossover selects pieces from both parents and creates two offspring solutions

    • Select a set of edges in one parent to copy to the child

    • Copy as many edges as possible from the other parent

    • Add random edges to fill any remaining space


Mutation

Mutation

  • Mutation introduces small changes to existing solutions by shifting edge endpoints


Mutation1

Mutation

  • Mutation introduces small changes to existing solutions by shifting edge endpoints

    • Select a set of endpoints to shift


Mutation2

Mutation

  • Mutation introduces small changes to existing solutions by shifting edge endpoints

    • Select a set of endpoints to shift


Mutation3

Mutation

  • Mutation introduces small changes to existing solutions by shifting edge endpoints

    • Select a set of endpoints to shift

This edge “fell off” the

end of the contact map

and is removed


Mutation4

Mutation

  • Mutation introduces small changes to existing solutions by shifting edge endpoints

    • Select a set of endpoints to shift

    • Randomly add new edges


Mutation5

Mutation

  • Mutation introduces small changes to existing solutions by shifting edge endpoints

    • Select a set of endpoints to shift

    • Randomly add new edges


Computational results

Computational Results


Computational results1

Computational Results

  • 269 proteins

    • 70 -100 residues

    • 80 to 140 contacts

  • Picked 10,000 pairs of proteins out of 36046 possible

  • Took a weekend on PC

  • 500 were solved to optimality

  • 2500 had a gap <= 10 contacts


Skolnick clustering test

Skolnick Clustering Test


Skolnick results

Skolnick Results

  • Four Families

    • Flavodoxin-like fold Che-Y related

    • Plastocyanin

    • TIM Barrel

    • Ferratin

  • alpha-beta

  • 8 structures

  • up to 124 residues

  • 15-30% sequence similarity

  • < 3Å RMSD


Skolnick results1

Skolnick Results

  • Four Families

    • Flavodoxin-like fold Che-Y related

    • Plastocyanin

    • TIM Barrel

    • Ferratin

  • beta

  • 8 structures

  • up to 99 residues

  • 35-90% sequence similarity

  • < 2Å RMSD


Skolnick results2

Skolnick Results

  • Four Families

    • Flavodoxin-like fold Che-Y related

    • Plastocyanin

    • TIM Barrel

    • Ferratin

  • alpha-beta

  • 11 structures

  • up to 250 residues

  • 30-90% sequence similarity

  • < 2Å RMSD


Skolnick results3

Skolnick Results

  • Four Families

    • Flavodoxin-like fold Che-Y related

    • Plastocyanin

    • TIM Barrel

    • Ferratin

  • alpha

  • 6 structures

  • up to 170 residues

  • 7-70% sequence similarity

  • < 4Å RMSD


Skolnick results4

Skolnick Results

  • Four Families

    • Flavodoxin-like fold Che-Y related

    • Plastocyanin

    • TIM Barrel

    • Ferratin


Clustering

Clustering

Define score(P1, P2) as

# shared contacts

Min # of contacts of P1,P2

0 <=

<= 1

Put P1, P2 in same family if score(P1, P2) >= threshold


Clustering1

Clustering

Define score(P1, P2) as

# shared contacts

Min # of contacts of P1,P2

0 <=

<= 1

Put P1, P2 in same family if score(P1, P2) >= threshold

If P1, P2 too big, use G.A. and local search to compute score

L.P. gives then bounds:

HEUR score <= OPT score <= LP bound

and we know how far off OPT we are


Clustering validation

Clustering validation

We got some known families from biologists, PDB.

Experiment: Take a family F of proteins and align them against each

other and against the remaining.


Clustering validation1

Clustering validation

We got some known families from biologists, PDB.

Experiment: Take a family F of proteins and align them against each

other and against the remaining.

TYPICAL BEHAVIOUR

score proteins were…

0.05 MISMATCH

0.1 MISMATCH

0.15 MISMATCH

0.2 MISMATCH

0.25 MISMATCH

0.3 MISMATCH

0.35 MATCH

…… ……

1.0 MATCH


Skolnick results5

Skolnick Results

  • Performance

    • 528 alignments

    • 1.3% false negative

    • 0.0% false positive


Clustering2

Clustering

Computed, for 1st time, provably optimal alignments for 150 pairs

(inter-family)

Used the CMO value to cluster: retrieves the clusters.

Set S(i,j) = 1 if CMO >= a, S(i,j) = 0 otherwise

Use TSP to find a block diagonal structure for S


Clustering3

Clustering


Last open problem

Last Open Problem

?

?


  • Login