Slowly approaching grass specific gene diversification OR We need to fix those phylogenies first!

gene family:

a set of divergent but functionally related genes that descend from the same ancestral gene

species A: 5 copies

species B: 15 copies

retention of duplicated gene copies

mechanisms increasing gene copy number:

  • tandem duplication
  • segmental duplication
  • whole genome duplication

reasons for retaining duplicated copies:

  • large quantities of a gene product are needed
  • specialization for functions, locations, times

lineage specific diversification

species A: 5 copies

species B: 15 copies

species 3 has 5 gene copies

NBS-LRR resistance gene families

in Arabidopsis: ~150 - 200 gene copies

in rice: ~500 - 700 gene copies

domain structure:

  • CC: coiled-coil domain
  • NBS: nucleotide-binding site domain
  • LRR: leucine-rich repeats

grasses are agronomically very important

[figure: major plant groups, labeled monocots, dicots, gymnosperms, mosses, ferns]

Research objectives
  • I will search plant gene families for grass-specific expansions
  • I will identify those containing known resistance genes or their interacting partners
  • I will test for co-evolution of known resistance genes with their interacting partners
  • I will determine whether co-evolution with resistance genes is a new means to identify interacting partners of these genes

Phytome

protein-coding sequence data from 39 plant species

26,393 families with ≥ 2 members

307,492 singleton families

data provided:

  • related families and subfamilies
  • multiple alignments and phylogenies
  • motif and domain structure information

identifying grass specific expansions
  • counting genes per taxon is not sufficient!
  • identify gene family phylogenies that contain many successive grass-specific internal nodes
  • identify duplication and speciation nodes for each gene family
  • label duplication nodes that are also grass-specific nodes

identify successive grass-specific nodes

in practice: a Perl script

  • accesses the Phytome database
  • takes every tree stored in Phytome
  • and, comparing it to the species tree, labels each internal node with the last common ancestor of the species of all its descendant leaf nodes
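
The labeling step described above can be sketched in a few lines. This is only an illustration in Python, not the Perl script itself: the toy species tree, the taxon names, the nested-tuple gene tree and the helper names (ancestors, last_common_ancestor, label_nodes, is_grass_specific) are all assumptions made for the example, and no Phytome database access is shown.

```python
# Toy species tree as a child -> parent map (hypothetical taxa, for illustration).
SPECIES_PARENT = {
    "rice": "grasses",
    "maize": "grasses",
    "grasses": "angiosperms",
    "Arabidopsis": "angiosperms",
    "angiosperms": None,
}


def ancestors(taxon):
    """Return the taxon followed by all of its ancestors up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = SPECIES_PARENT[taxon]
    return path


def last_common_ancestor(a, b):
    """Return the deepest species-tree node that contains both a and b."""
    seen = set(ancestors(a))
    for taxon in ancestors(b):
        if taxon in seen:
            return taxon
    raise ValueError(f"{a} and {b} share no ancestor")


def label_nodes(tree, labels):
    """Label every internal node of a gene tree.

    A leaf is the species name of a gene; an internal node is a tuple of
    subtrees. Appends (subtree, label) to `labels` and returns the label.
    """
    if isinstance(tree, str):
        return tree
    child_labels = [label_nodes(child, labels) for child in tree]
    label = child_labels[0]
    for other in child_labels[1:]:
        label = last_common_ancestor(label, other)
    labels.append((tree, label))
    return label


def is_grass_specific(label):
    """True if the label lies within the (hypothetical) 'grasses' clade."""
    return "grasses" in ancestors(label)


# Hypothetical gene tree: two rice genes and one maize gene, plus an outgroup.
gene_tree = ((("rice", "maize"), "rice"), "Arabidopsis")
labels = []
label_nodes(gene_tree, labels)
node_labels = [label for _, label in labels]
grass_specific = [label for label in node_labels if is_grass_specific(label)]
```

In this toy tree the two inner nodes are labeled as grass-specific and the root is not, which is the kind of run of successive grass-specific nodes the search is looking for.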

[figure: species tree and gene tree]

identify duplication and speciation nodes

[figure legend: speciation nodes and duplication nodes marked on the gene tree; duplication nodes: (7)]

SDI: speciation duplication inference (Zmasek & Eddy 2001, Bioinformatics)
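
The rule behind this kind of inference can be sketched as follows: map every gene-tree node to the last common ancestor of its children's species, and call the node a duplication when it maps to the same species-tree node as one of its children. The code below is a minimal sketch of that rule for rooted binary gene trees, not the published SDI implementation; the toy species tree, the taxon names and the function names are assumptions for illustration.

```python
# Toy species tree as a child -> parent map (hypothetical taxa).
SPECIES_PARENT = {
    "rice": "grasses",
    "maize": "grasses",
    "grasses": "angiosperms",
    "Arabidopsis": "angiosperms",
    "angiosperms": None,
}


def lineage(taxon):
    """Return the taxon and all of its ancestors, from the taxon up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = SPECIES_PARENT[taxon]
    return path


def lca(a, b):
    """Last common ancestor of two species-tree nodes."""
    seen = set(lineage(a))
    return next(t for t in lineage(b) if t in seen)


def infer_events(tree, events):
    """Classify each internal node of a rooted binary gene tree.

    Leaves are species names; internal nodes are (left, right) tuples.
    Appends (mapped_species_node, kind) in post-order and returns the mapping.
    """
    if isinstance(tree, str):
        return tree
    left, right = tree
    m_left = infer_events(left, events)
    m_right = infer_events(right, events)
    mapped = lca(m_left, m_right)
    # Duplication: the node maps to the same species-tree node as a child.
    kind = "duplication" if mapped in (m_left, m_right) else "speciation"
    events.append((mapped, kind))
    return mapped


# Hypothetical family: a duplication in the grass ancestor, then speciation.
gene_tree = ((("rice", "maize"), ("rice", "maize")), "Arabidopsis")
events = []
infer_events(gene_tree, events)
grass_duplications = [e for e in events if e == ("grasses", "duplication")]
```

Because the rule needs two children per node, an unresolved (multifurcating) gene tree cannot be classified this way without first being resolved.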

required: labeled duplication/speciation nodes

PROBLEM FOR SDI: UNRESOLVED GENE TREES!

required: accurate gene phylogenies

PROBLEM FOR DISTANCE METHODS: NO OVERLAP OF PARTIAL SEQUENCES!

digressing from grass specific expansion:

How can we generate phylogenies from these “partial sequence alignments”?

→ required for grass specific expansion project

→ important for Phytome

→ necessary for anyone using EST data for phylogenetic analysis

How can we generate correct phylogenies from “partial sequence alignments”?

1. can’t directly compute a single distance matrix with all sequences

2. divide alignment into sub-sections, compute separate pairwise distance matrices: matrixA, matrixB

3. combine these to one single distance matrix, use it for phylogenetic reconstruction

GOAL: define columns and sequences for sub-matrices
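
A minimal sketch of steps 2 and 3, under stated assumptions: distances are plain p-distances (fraction of differing residues), the sub-sections and their sequence sets are chosen by hand, and the sub-matrices are combined by an unweighted mean. The slides do not specify the distance measure or the combination rule, so these choices and all names below (p_distance, sub_matrix, combine, the toy sequences) are illustrative only.

```python
from itertools import combinations

# Toy alignment: seq2 covers only the left half, seq3 only the right half.
alignment = {
    "seq1": "ACDEFGHIKL",
    "seq2": "ACDEF-----",
    "seq3": "-----GHIKM",
    "seq4": "ACDQFGHIKL",
}


def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where both have data."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return None  # no overlap, distance undefined
    return sum(x != y for x, y in pairs) / len(pairs)


def sub_matrix(alignment, names, start, end):
    """Pairwise distances for the chosen sequences on columns [start, end)."""
    return {
        frozenset((a, b)): p_distance(alignment[a][start:end], alignment[b][start:end])
        for a, b in combinations(names, 2)
    }


def combine(matrices):
    """Average each pair's distance over the sub-matrices that contain it."""
    collected = {}
    for matrix in matrices:
        for pair, distance in matrix.items():
            if distance is not None:
                collected.setdefault(pair, []).append(distance)
    return {pair: sum(values) / len(values) for pair, values in collected.items()}


matrix_a = sub_matrix(alignment, ["seq1", "seq2", "seq4"], 0, 5)
matrix_b = sub_matrix(alignment, ["seq1", "seq3", "seq4"], 5, 10)
combined = combine([matrix_a, matrix_b])
```

Note that the pair seq2/seq3 never appears in any sub-matrix, so the combined matrix still has no entry for it; how such pairs are handled is exactly what the choice of columns and sequences for the sub-matrices has to settle.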

The OverlapGraph

1. Sequence alignment

seqA XXXXXXXXXXXXX
seqB XXXXXXXXXXXXX
seqC -------XXXXXX
seqD ------XXXXXXX
seqE XXXXXXX------
seqF XXXXXX-------

2. Overlap matrix

3. Overlap graph

4. Find largest cliques (complete subgraphs)
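
The four steps can be traced on the example alignment with a short sketch. It counts shared non-gap columns for each pair of sequences, connects pairs that share at least min_overlap columns, and enumerates maximal cliques with a basic Bron-Kerbosch recursion. The min_overlap threshold and the function names are assumptions for illustration; the slides do not state which threshold or clique algorithm was used.

```python
from itertools import combinations

ALIGNMENT = {
    "seqA": "XXXXXXXXXXXXX",
    "seqB": "XXXXXXXXXXXXX",
    "seqC": "-------XXXXXX",
    "seqD": "------XXXXXXX",
    "seqE": "XXXXXXX------",
    "seqF": "XXXXXX-------",
}


def overlap_matrix(alignment, gap="-"):
    """Number of columns in which both sequences of a pair have data."""
    overlap = {}
    for a, b in combinations(sorted(alignment), 2):
        shared = sum(
            1 for x, y in zip(alignment[a], alignment[b]) if x != gap and y != gap
        )
        overlap[(a, b)] = overlap[(b, a)] = shared
    return overlap


def overlap_graph(alignment, min_overlap):
    """Adjacency sets: an edge joins sequences sharing >= min_overlap columns."""
    graph = {name: set() for name in alignment}
    for (a, b), shared in overlap_matrix(alignment).items():
        if shared >= min_overlap:
            graph[a].add(b)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(clique))
            return
        for node in list(candidates):
            expand(clique | {node}, candidates & graph[node], excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


overlap = overlap_matrix(ALIGNMENT)
cliques = maximal_cliques(overlap_graph(ALIGNMENT, min_overlap=2))
```

With min_overlap=2 the example yields two cliques, {seqA, seqB, seqC, seqD} and {seqA, seqB, seqE, seqF}; seqD and seqE share only a single column, so lowering the threshold to 1 adds a third clique that bridges the two.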

problem: clique overlap

Clique A: 1, 2, 3, 4, 5, 7

Clique B: 1, 3, 4, 5, 6, 7

Clique C: 4, 5, 8, 12, 13

Clique D: 4, 5, 8, 9, 10, 11, 12, 13

[figure: alignment and its overlap graph]

new strategy includes merging cliques

1. partial sequence alignment

2. generate OverlapGraph, find cliques

3. merge overlapping cliques

4. find connected components
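
Steps 3 and 4 might look like the sketch below, applied to cliques A to D from the "problem: clique overlap" slide. The merge criterion is an assumption: two cliques are merged when they share at least min_shared sequences, and connected components are then obtained by merging any groups that share a single sequence. Neither the threshold nor the function name merge_overlapping comes from the slides.

```python
CLIQUES = {
    "A": {1, 2, 3, 4, 5, 7},
    "B": {1, 3, 4, 5, 6, 7},
    "C": {4, 5, 8, 12, 13},
    "D": {4, 5, 8, 9, 10, 11, 12, 13},
}


def merge_overlapping(groups, min_shared):
    """Repeatedly merge groups that share at least min_shared members."""
    merged = [set(group) for group in groups]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if len(merged[i] & merged[j]) >= min_shared:
                    # Fold group j into group i, then rescan from the start.
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break
    return merged


# Step 3 (assumed threshold): merge cliques sharing three or more sequences.
merged_cliques = merge_overlapping(CLIQUES.values(), min_shared=3)

# Step 4: groups linked by any shared sequence form one connected component.
components = merge_overlapping(merged_cliques, min_shared=1)
```

Under these assumptions cliques A and B collapse into one group and C and D into another; the two groups still share sequences 4 and 5, so they end up in a single connected component.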

Validation

How can we test whether this method will really generate the best phylogeny possible? Use artificial data!

ROSE - Random model Of Sequence Evolution (Stoye et al. 1998, Bioinformatics)

input:

  • root sequence
  • tree topology

output:

  • a family of related sequences, created from the root sequence by insertion, deletion and substitution
    → sequences with a known evolutionary history
  • a correct multiple alignment of these sequences

Validation
  • vary numbers of sequences per alignment (e.g., two alternatives: 10 and 50 sequences)
  • vary tree topologies (e.g., four alternatives: low resolution at deep nodes, low resolution at high nodes, no low resolution, imbalanced tree)
  • vary alignment lengths (e.g., two alternatives: 50 and 200 aa)
  • vary average branch lengths/distances (two different mutation probabilities)
  • vary masks (e.g., five alternatives, based on deletion-patterns of Phytome families)
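
If every alternative listed above were crossed with every other, the number of simulation conditions could be enumerated as in this sketch. The counts follow the "e.g." figures on the slide; the mutation-probability and mask labels are placeholders because no concrete values are given, and a full factorial design is itself an assumption.

```python
from itertools import product

# Alternatives per factor; labels without values in the slides are placeholders.
factors = {
    "n_sequences": [10, 50],
    "topology": [
        "low resolution at deep nodes",
        "low resolution at high nodes",
        "no low resolution",
        "imbalanced tree",
    ],
    "alignment_length_aa": [50, 200],
    "mutation_probability": ["probability_1", "probability_2"],
    "mask": [f"mask_{i}" for i in range(1, 6)],
}

# One dict per simulated condition: 2 * 4 * 2 * 2 * 5 combinations.
conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
n_conditions = len(conditions)
```

That gives 160 conditions, each of which would need its own set of ROSE-generated sequence families.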

gene tree vs. species tree

[figure: gene tree drawn against the species tree for species A, B, C; leaf labels: gene in species A, gene in species B, gene in species C]

what if gap boundaries aren’t so clear?

what if some cliques are contained within others?

grass specific diversification: patterns

duplication event prior to diversification of the grass lineage

duplication events after diversification of the grass lineage

lineage specific diversification of an orthologous ancestor

lineage specific genes