Defining Gene Clusters: 24 Ways of Looking at Mount Fuji

Anne Bergeron, UQAM

Dublin, September 19, 2005

7. Mt Fuji from the Foot


"It struck me that it would be good to take one thing in life and regard it from many viewpoints, ... "

Roger Zelazny


The basic problem

[Figure: Genome A, Genome B, Genome C]

We start with a set of genomes, labeled by gene names, domains, or synteny blocks, and a similarity relation on those labels.

Highlighting a gene means selecting all labels that are similar.

Genes, or other types of signals, can appear in multiple copies in a genome, or even be missing. In this talk, the similarity relation is "given" and is an equivalence relation.
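As a minimal sketch of this setup, a genome can be held as a list of labels and the equivalence relation as a map from each label to its class. The gene names, the `family` map and the `highlight` helper below are illustrative assumptions, not data from the talk.

```python
# Hypothetical toy data: each genome is a sequence of labels.
genomes = {
    "A": ["trpA", "trpB", "hisG", "trpB2"],
    "B": ["hisG", "trpB", "lacZ"],
    "C": ["lacZ", "trpA", "trpB"],
}

# The similarity relation, given as an equivalence: label -> class name.
family = {
    "trpA": "trpA",
    "trpB": "trpB",
    "trpB2": "trpB",  # a second copy, similar to trpB
    "hisG": "hisG",
    "lacZ": "lacZ",
}

def highlight(label, genomes, family):
    """Return, for each genome, the positions of all labels similar to `label`."""
    target = family[label]
    return {
        name: [i for i, gene in enumerate(genes) if family[gene] == target]
        for name, genes in genomes.items()
    }

trpB_hits = highlight("trpB", genomes, family)
hisG_hits = highlight("hisG", genomes, family)
```

In this toy data the highlighted gene appears twice in genome A, and `hisG` is missing from genome C, which are the two situations the slide mentions.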


The basic problem

We are interested in what happens when a set of genes is highlighted.

[Figure: a set of genes highlighted in Genome A, Genome B and Genome C]

Boring...


The basic problem

[Figure: another set of genes highlighted in Genome A, Genome B and Genome C]

Interesting ?

Measures of surprise are studied by Durand, Haque, Hoberman, Sankoff, Raghupathy, etc.


The basic problem

Goal : Given a (big) set of genomes, automatically identify all potentially interesting sets of genes.


Towards formal models

1. Mount Fuji from Owari


Towards formal models

What do labels stand for?

How many labels and genomes do we want to compare ?

What do we want to do with the resulting clusters ?


Towards formal models: Example 1

Definition of labels and similarity:
Large homology segments disrupted only by local micro-rearrangements. A total of 281 synteny blocks, colored in the human genome by their mouse chromosome number.

Interesting features:
Chromosome X
Chromosome 17
Chromosome 20

Application:
Genome evolution

From: Eichler and Sankoff, Science (301:793-797), 2003


Towards formal models: Example 2

Definition of labels and similarity:
Gene annotations of chloroplasts.

Interesting features:
Rearrangements

Application:
Phylogeny


Towards formal models: Example 3

Definition of labels and similarity:
PFAM Domain numbers labeling four bacterial genomes.

Interesting features:
Duplications
Insertions
Rearrangements

Application:
Operon identification

From: Pasek et al, Genome Research (15:867-874), 2005


Towards formal models: Example 4

Definition of labels and similarity:
PFAM Domain numbers labeling four bacterial genomes.

With such a high E-value, the potential duplicate would have been missed by a comparison based on sequence similarity.

Application:
Identification of orthologs and/or duplicate segments.

From: Pasek et al, Genome Research (15:867-874), 2005


Towards formal models: Example 5

Definition of labels and similarity:
Large homology segments disrupted only by local micro-rearrangements. Comparing 16 segments of the mouse and rat chromosome X.

Mouse = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rat = -4 -3 -2 1 -13 -15 14 -16 8 9 10 -11 12 5 6 7

Application:
Reconstructing ancestors

From: Bérard et al, WABI 2005


Down to earth details

2. Mt Fuji from a Teahouse at Yoshida


Down to earth details

Do we allow gaps ?

Do we allow rearrangements?

Do we allow duplicates and missing genes ?

Do we allow multiple genomes or self-comparison ?

How about "extensions" ?


Down to earth details : Model 1

No gaps, no duplications, any rearrangement.

[Figure: a set of genes highlighted in Genome A, Genome B and Genome C]


Down to earth details : Model 1

No gaps, no duplications, any rearrangement.

[Figure: a set of genes highlighted in Genome A, Genome B and Genome C]

What about this gene? Should we add it ?

Extension


Down to earth details : Model 2

No gaps, duplications, any rearrangement.

[Figure: a set of genes, and genes not in the set, in Genome A, Genome B and Genome C]


Down to earth details : Model 3

Gaps, no duplications, any rearrangement.

[Figure: a set of genes highlighted in Genome A, Genome B and Genome C]


Down to earth details : Model 4

Gaps, missing/inserted genes, any rearrangement.

[Figure: a set of genes highlighted in Genome A, Genome B and Genome C]


Down to earth details : Model 5

Gaps, missing genes, duplications, any rearrangement.

[Figure: a set of genes highlighted in Genome A, Genome B and Genome C]

With gap size = 1, we get 4 occurrences.

Reducing the number of genes....


Down to earth details : Model 5

[Figure: a smaller set of genes highlighted in Genome A, Genome B and Genome C]

... yields 5 occurrences.


A general framework

24. Mount Fuji in a Summer Storm


A general framework

Given a gap g, an occurrence of S is a maximal run of genes of S, separated by gaps of at most g genes not in S, and that contains at least one of each gene of S.

[Figure: a set S of genes and a chromosome with two occurrences of S, Occurrence #1 and Occurrence #2. Gaps inside an occurrence are ≤ g; the other gaps are > g.]

A set of genes S is an extension of a set T, included in S, if each occurrence of T is contained in an occurrence of S.

[Figure: S = { } is an extension of T = { }]
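The two definitions above can be sketched directly in code. This is a minimal illustration under stated assumptions: a chromosome is a list of labels, occurrences are reported as 0-based inclusive position pairs, and the names `occurrences` and `is_extension` are hypothetical helpers, not from the talk.

```python
def occurrences(chromosome, S, g):
    """Occurrences of the gene set S with gap g, as (start, end) positions.

    A run groups consecutive genes of S separated by at most g genes not in S.
    Only runs that contain every gene of S count as occurrences.
    """
    S = set(S)
    positions = [i for i, gene in enumerate(chromosome) if gene in S]
    runs, current = [], []
    for p in positions:
        # More than g foreign genes since the last gene of S: close the run.
        if current and p - current[-1] - 1 > g:
            runs.append(current)
            current = []
        current.append(p)
    if current:
        runs.append(current)
    return [(run[0], run[-1]) for run in runs
            if {chromosome[i] for i in run} == S]

def is_extension(chromosome, S, T, g):
    """True if T is included in S and each occurrence of T lies in an occurrence of S."""
    S, T = set(S), set(T)
    if not T <= S:
        return False
    occ_S = occurrences(chromosome, S, g)
    return all(any(a <= c and d <= b for (a, b) in occ_S)
               for (c, d) in occurrences(chromosome, T, g))

chrom1 = list("abxcyybca")
chrom2 = list("abczzbc")
```

With `chrom1` and g = 1, the set {a, b, c} has two occurrences and is an extension of {a, b}. With `chrom2`, the second copy of `b c` has no `a` nearby, so {a, b, c} is not an extension of {b, c}.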



A general framework

Choices

• g = 0 or g > 0

When g = 0, the number of candidates is polynomial in the number of genes. When g > 0, the number of candidates can be exponential in the number of genes.

Even with g = 1, there are problems. For example, with g = 0, the sequence of genes:

a b c d e f

produces one potential cluster that contains both a and f. But with g = 1, there are 8 of them:

a b c d e f
a b c d f
a b c e f
a b d e f
a c d e f
a c e f
a b d f
a c d f

The number of these sequences grows in a Fibonacci progression!
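The count of 8 and the Fibonacci growth can be checked by brute force. The sketch below enumerates every subsequence that keeps the first and last gene and skips at most g consecutive genes; the function name `candidates` is an illustrative assumption.

```python
from itertools import combinations

def candidates(genes, g):
    """Subsequences keeping the first and last gene, with at most g genes skipped in a row."""
    n = len(genes)
    inner = range(1, n - 1)
    result = []
    for k in range(len(inner) + 1):
        for kept in combinations(inner, k):
            idx = (0, *kept, n - 1)
            # Between two kept genes, at most g genes may be dropped.
            if all(b - a - 1 <= g for a, b in zip(idx, idx[1:])):
                result.append([genes[i] for i in idx])
    return result

with_gap_0 = candidates(list("abcdef"), 0)
with_gap_1 = candidates(list("abcdef"), 1)

# Number of candidates for sequences of 2 to 7 genes, with g = 1.
counts = [len(candidates(list("abcdefg")[:n], 1)) for n in range(2, 8)]
```

For sequences of 2 to 7 genes with g = 1, `counts` is 1, 2, 3, 5, 8, 13: each term is the sum of the two before it.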


A general framework

Choices

• g = 0 or g > 0
• Duplications or no duplications

Duplications usually mean an exponential number of candidates but, most of the time, they are unavoidable.

Models without duplications are, nevertheless, useful in many situations.


A general framework

Choices

• g = 0 or g > 0
• Duplications or no duplications
• Three ways of filtering candidates

Filtering is mostly based on the properties of the extension relation. If the number of candidates is low, filtering is not necessary, but it can be relevant. For models with a huge number of candidates, filtering is a must.


A general framework

Choices

• g = 0 or g > 0
• Duplications or no duplications
• Three ways of filtering candidates
• Formal or heuristic

Formal models have inherent computational problems when applied to real data. Heuristics will always be useful.


A general framework

Choices

• g = 0 or g > 0

• Duplications or no duplications

• Three ways of filtering candidates

• Formal or heuristic

2 x 2 x 3 x 2 = 24

How convenient!


Common intervals: Voluntary simplicity*

*Voluntary simplicity is a lifestyle considered by its adherents to be a sustainable, ecologically sensitive alternative to the typical, western consumerist lifestyle. [Ref. Wikipedia]

20. Mount Fuji from Inume Pass


Common intervals: Voluntary simplicity*

A (partial) list of credits:

Uno and Yagiura (2000)

Heber and Stoye (2001)

Bergeron, Heber and Stoye (2002)

Didier (2003)

Schmidt and Stoye (2004)

Figeac and Varré (2004)

Bérard, Bergeron and Chauve (2004)

Blin, Chauve and Fertin (2005)

Landau, Parida and Weizman (2005)

Tannier and Sagot (2005)

Bérard, Bergeron, Chauve and Paul (2005)

Bergeron, Chauve, de Montgolfier and Raffinot (2005)


Common intervals

Choices

• g = 0
• No duplications
• No filtering
• Formal

[Figure: common intervals in Genome A, Genome B and Genome C]

The basic model of common intervals often yields a large number of 'uninteresting clusters'. However, filtering provides unusual information on whole genome organization.
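With g = 0 and no duplications, a common interval is a set of genes that is contiguous in every genome. The following is a brute-force sketch of that definition, assuming each genome is a permutation of the same genes; it is meant as an illustration, not as one of the efficient algorithms from the credited papers, and the toy genomes are hypothetical.

```python
def common_intervals(genomes):
    """Sets of two or more genes that are contiguous in every genome.

    Assumes every genome is a permutation of the same genes (no duplicates).
    """
    first, others = genomes[0], genomes[1:]
    pos = [{gene: i for i, gene in enumerate(g)} for g in others]
    n = len(first)
    found = []
    for i in range(n):
        lo = [p[first[i]] for p in pos]
        hi = list(lo)
        for j in range(i + 1, n):
            # Track the leftmost and rightmost position of first[i..j] in each other genome.
            for k, p in enumerate(pos):
                lo[k] = min(lo[k], p[first[j]])
                hi[k] = max(hi[k], p[first[j]])
            # Contiguous everywhere: the span has exactly j - i + 1 positions.
            if all(h - l == j - i for l, h in zip(lo, hi)):
                found.append(frozenset(first[i:j + 1]))
    return found

A = list("abcde")
B = list("cbade")
C = list("edabc")
found = common_intervals([A, B, C])
```

Even on five genes this returns six sets, several of them nested or overlapping, which gives a feel for why the unfiltered model reports many uninteresting clusters.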


Common intervals -> Strong Intervals

Choices

• g = 0
• No duplications
• Filtering
• Formal

[Figure: Genome A and Genome B. The common intervals are s, t, u and v; the strong intervals are s and v.]

Both t and u are two different extensions of the common interval s: Remove them.


Strong Intervals

This tree displays the strong intervals between the synteny blocks of the mouse and rat chromosomes X.

Mouse = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rat = -4 -3 -2 1 -13 -15 14 -16 8 9 10 -11 12 5 6 7

This kind of tree is known as a PQ-tree. Strong intervals possess a rich combinatorial structure that can be exploited from both the biological and the computational perspectives.

From: Bérard et al, WABI 2005
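The nodes of such a tree can be recovered by brute force on the mouse and rat example. The sketch below assumes the usual definition, that a common interval is strong when it overlaps no other common interval, takes the mouse as the identity 1..16, and drops the signs of the rat permutation. It is quadratic or worse and only meant as an illustration; the function names are hypothetical.

```python
def common_intervals(perm):
    """Position pairs (i, j), i < j, whose values form a run of consecutive integers.

    Assumes the other genome is the identity 1..n and `perm` has no duplicates.
    """
    found = []
    for i in range(len(perm)):
        lo = hi = perm[i]
        for j in range(i + 1, len(perm)):
            lo, hi = min(lo, perm[j]), max(hi, perm[j])
            if hi - lo == j - i:
                found.append((i, j))
    return found

def overlap(a, b):
    """True when the two intervals intersect but neither contains the other."""
    (i1, j1), (i2, j2) = sorted([a, b])
    return i1 < i2 <= j1 < j2

def strong_intervals(perm):
    """Common intervals that overlap no other common interval, as lists of values."""
    ci = common_intervals(perm)
    return [perm[i:j + 1] for (i, j) in ci
            if not any(overlap((i, j), other) for other in ci)]

rat = [4, 3, 2, 1, 13, 15, 14, 16, 8, 9, 10, 11, 12, 5, 6, 7]  # signs dropped
strong = strong_intervals(rat)
for block in strong:
    print(*block)
```

The seven blocks printed are the internal nodes of the tree; the leaves are the 16 single blocks, which this sketch does not list.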


Strong Intervals : transforming a rat into a mouse

[Figure: tree of strong intervals, in rat order. The leaves are the 16 single blocks.]

4 3 2 1 13 15 14 16 8 9 10 11 12 5 6 7
    4 3 2 1
    13 15 14 16 8 9 10 11 12 5 6 7
        13 15 14 16
            15 14
        8 9 10 11 12
        5 6 7

This tree provides guidelines to possible rearrangement scenarios that transform the rat chromosome into a mouse chromosome. These scenarios preserve all common intervals.


Strong Intervals : transforming a rat into a mouse

Intervals are first labeled (in red) with respect to their relative orientation.



Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 1

-4 -3 -2 1 -13 -15 14 -16 8 9 10 -11 12 5 6 7

-4 -3 -2 -1 -13 -15 14 -16 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 4 3 2 1

-4 -3 -2 -1 -13 -15 14 -16 8 9 10 -11 12 5 6 7

1 2 3 4 -13 -15 14 -16 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 13

1 2 3 4 -13 -15 14 -16 8 9 10 -11 12 5 6 7

1 2 3 4 13 -15 14 -16 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 14

1 2 3 4 13 -15 14 -16 8 9 10 -11 12 5 6 7

1 2 3 4 13 -15 -14 -16 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 16

1 2 3 4 13 -15 -14 -16 8 9 10 -11 12 5 6 7

1 2 3 4 13 -15 -14 16 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 14 15

1 2 3 4 13 -15 -14 16 8 9 10 -11 12 5 6 7

1 2 3 4 13 14 15 16 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 13 14 15 16

1 2 3 4 13 14 15 16 8 9 10 -11 12 5 6 7

1 2 3 4 -16 -15 -14 -13 8 9 10 -11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 11

1 2 3 4 -16 -15 -14 -13 8 9 10 -11 12 5 6 7

1 2 3 4 -16 -15 -14 -13 8 9 10 11 12 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 8 9 10 11 12

1 2 3 4 -16 -15 -14 -13 8 9 10 11 12 5 6 7

1 2 3 4 -16 -15 -14 -13 -12 -11 -10 -9 -8 5 6 7


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 5 6 7

1 2 3 4 -16 -15 -14 -13 -12 -11 -10 -9 -8 5 6 7

1 2 3 4 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5


Strong Intervals : transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted : 5 6 7 ... 14 15 16

1 2 3 4 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
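The whole scenario can be replayed on the signed rat permutation given earlier in the talk. In this sketch an inversion reverses a contiguous block and flips the sign of each element; the helper name `invert` and the encoding of the steps as lists of block numbers are assumptions made for illustration.

```python
def invert(perm, block):
    """Reverse the contiguous segment holding exactly the genes in `block`, flipping signs."""
    wanted = set(block)
    idx = [i for i, x in enumerate(perm) if abs(x) in wanted]
    i, j = idx[0], idx[-1]
    if not (j - i + 1 == len(idx) == len(wanted)):
        raise ValueError("block is not contiguous in the permutation")
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]

rat = [-4, -3, -2, 1, -13, -15, 14, -16, 8, 9, 10, -11, 12, 5, 6, 7]

# The inversions, in the order in which the slides apply them.
steps = [
    [1],
    [4, 3, 2, 1],
    [13],
    [14],
    [16],
    [14, 15],
    [13, 14, 15, 16],
    [11],
    [8, 9, 10, 11, 12],
    [5, 6, 7],
    list(range(5, 17)),
]

perm = rat
history = [perm]
for block in steps:
    perm = invert(perm, block)
    history.append(perm)

mouse = list(range(1, 17))
```

After the eleven inversions `perm` equals `mouse`, and `history` holds every intermediate chromosome of the scenario.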




Domain Teams: The 'eXtreme' model

18. Mt Fuji from the Offing in Kanagawa


Domain Teams: The 'eXtreme' model

A (partial) list of credits:

Bergeron, Corteel and Raffinot (2002)

Luc, Risler, Bergeron and Raffinot (2003)

He and Goldwasser (2004)

Béal, Bergeron, Corteel and Raffinot (2004)

Pasek, Bergeron, Risler, Louis, Ollivier and Raffinot (2005)

Blin, Chauve and Fertin (2005)


Domain Teams

Choices

• g > 0
• Duplications
• Heavy filtering
• Formal

[Figure: candidate teams in Genome A and Genome B. Four candidates are each marked "has an extension."]

Remove them all!

Surviving teams:

[Figure: the surviving teams]


Domain Teams : Example

67591 Domains

50078 Proteins

16 Chromosomes

Maximum gap: 3

16713 Domain Teams


Domain Teams : Example

From: Pasek et al, Genome Research (15:867-874), 2005


The combinatorial beauty of nature

12. Mt Fuji from Lake Kawaguchi


The combinatorial beauty of nature

Does nature allow all possible rearrangements ?


The combinatorial beauty of nature

Promiscuous domains

Six domains can theoretically form 63 potential teams. If they are labelled as {a, b, c, d, e, f}, the possible teams with more than one member are:

{a, b}, {a, c}, {a, d}, {a, e}, {a, f}, {b, c}...
{a, b, c}, {a, b, d}, {a, b, e}, ...
...
{a, b, c, d, e, f}

For 6 domains, of the 63 possibilities, we found 35 teams that had at least two occurrences and no extension.

Who are they?

PF00005 ABC transporter
PF00072 Response regulator receiver domain
PF00486 Transcriptional regulatory protein
PF00512 His Kinase A
PF00528 Binding-protein-dependent transport system inner membrane
PF00672 HAMP domain
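The figure of 63 is the number of non-empty subsets of six domains, 2^6 - 1. A short enumeration confirms it; the labels a to f follow the slide, and the variable names are illustrative.

```python
from itertools import combinations

domains = ["a", "b", "c", "d", "e", "f"]

# Every non-empty subset of the six domains is a potential team.
teams = [set(c) for k in range(1, len(domains) + 1)
         for c in combinations(domains, k)]

# Potential teams with more than one member.
multi = [t for t in teams if len(t) > 1]
```

Here `len(teams)` is 63, and 57 of those potential teams have more than one member, since six of the subsets are single domains.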


The need for heuristics

21. Mount Fuji from the Totomi Mountains


The need for heuristics

Choices

• g > 0

• Duplications

• No filtering

• Heuristic

Very reasonable approximations of the general model can be obtained efficiently (a few minutes) in the case of very large scale comparisons.

From: St-Onge, et al. Poster RECOMB CG 2005


The need for heuristics

An uncertainty principle

With the general model of gene clusters, it is impossible to predict simultaneously the computing time AND the properties of the output.


Credits

Marie-Pierre Béal, Informatique, Marne-la-Vallée

Sèverine Bérard, INRA, Toulouse

Mathieu Blanchette, McGill University

Sylvie Corteel, PRiSM, Versailles

Steffen Heber, Raleigh, USA

Hokusai Katsushika: 1760-1849

Nicolas Luc, Génome et informatique, Evry

Fabien de Montgolfier, LIAFA, Paris

Christophe Paul, LIRMM, Montpellier

Sophie Pasek, Génome et informatique, Evry

Jean-Loup Risler, Génome et informatique, Evry

Mathieu Raffinot, Laboratoire Poncelet, Moscow

Jens Stoye, Technische Fakultät, Bielefeld

Cedric Chauve

Annie Chateau

Olivier Gingras

Yannick Gingras

André Levasseur

Jacqueline Rwirangira

Karine St-Onge

