The evolution of expression patterns in the Arabidopsis genome

Todd Vision

Department of Biology

University of North Carolina at Chapel Hill


Driving forces in genome evolution

  • Proximate vs. ultimate explanations

  • Deleterious mutations are frequent and selection cannot effectively act on all of them

    • Substitutions

    • Insertions and deletions

    • Duplications

    • Transpositions

  • Cellular processes will be affected by this rain of mutations

  • At the molecular level, we must entertain ultimate explanations that do not invoke adaptation


An example: Codon bias

  • Genes differ in how often they use the preferred codon for a given amino acid, which affects

    • Translational efficiency

    • Translational accuracy

  • The strongest codon bias is typically seen in short, highly expressed genes under strong purifying selection

  • Realized codon bias is a balance between selection for preferred codons and a continual rain of mutations toward unpreferred codons
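
One simple way to put a number on the codon bias described above is the fraction of codons in a gene that are the preferred codon for their amino acid. The sketch below is only an illustration: the `PREFERRED` table is hypothetical (the slides do not say which codons are preferred in Arabidopsis), and only three amino acids are scored to keep it short.

```python
# Hypothetical preferred codon per amino acid (illustrative, NOT Arabidopsis data).
PREFERRED = {"L": "CTC", "K": "AAG", "E": "GAG"}

# Synonymous codon families for the same amino acids (standard genetic code).
SYNONYMS = {
    "L": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "K": {"AAA", "AAG"},
    "E": {"GAA", "GAG"},
}


def fraction_preferred(cds: str) -> float:
    """Fraction of scored codons that are the preferred codon for their amino acid."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    scored = preferred = 0
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        for amino_acid, family in SYNONYMS.items():
            if codon in family:
                scored += 1
                preferred += codon == PREFERRED[amino_acid]
    return preferred / scored if scored else float("nan")


# CTC (preferred Leu), AAG (preferred Lys), GAA (unpreferred Glu)
print(fraction_preferred("CTCAAGGAA"))
```

Under the mutation-selection balance on the slide, one would expect this fraction to be higher in highly expressed genes, but the threshold for "high" is not something the sketch can supply.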


What are the consequences of mutational rain on the regulatory networks that modulate gene expression?


Outline

  • Arabidopsis gene expression (MPSS)

  • Two evolutionary issues in the evolution of expression profiles:

    • Physical clustering of co-expressed genes

    • Divergence of duplicated genes


Digital expression profiling

  • “Bar-code” counting raises fewer concerns about cross-hybridization, probe selection, background hybridization, etc.

  • Serial Analysis of Gene Expression (SAGE)

    • Count occurrence of 10-12 bp mRNA signatures

    • Long SAGE: 21-22 bp signatures

    • Uses conventional sequencing technology

  • Massively Parallel Signature Sequencing (MPSS)

    • Count occurrence of 17-20 bp mRNA signatures

    • Cloning and sequencing is done on microbeads

    • Commercialized by Lynx Therapeutics


MPSS library construction

[Figure: MPSS library construction schematic]

  • Extract mRNA from tissue

  • Convert to cDNA

  • Add linker

  • Cut with Sau3A (GATC)

  • 3’: add unique 32 bp tag and standard primer; 5’: add standard primer (added by cloning)

  • Anneal to beads coated with unique anti-tag (32 bp, complementary to tag on mRNA)

  • PCR

  • Remove 3’ primer and expose single-stranded unique tag (digest, 3’→5’ exonuclease)

Brenner et al., PNAS 97:1665-70.


MPSS library construction

Sort by FACS to remove ‘empty’ beads

The result of the library construction is a set of microbeads. Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript.

Beads are loaded in a monolayer on a microscope slide for the sequencing of 17-20 bp from the 5’ end.

Brenner et al., PNAS 97:1665-70.


MPSS Sequencing

[Figure: one round of MPSS sequencing on a four-base overhang (NNNN, positions 4 3 2 1)]

  • Add adaptors: NNNX, NNXN, NXNN and XNNN, each carrying a code (X1 to X4) and an RS

  • Sequence by hybridization: 16 cycles for 4 bp

  • Digest with Type IIS enzyme to uncover next 4 bases

  • Repeat cycle

Steps of four bases; overhang is shifted by four bases in each round (positions 4 3 2 1, then 8 7 6 5).

Brenner et al., Nat. Biotech. 18:630-4.


MPSS Sequencing

Each bead provides a signature of 17-20 bp

Tag #     Signature sequence    # of beads (frequency)
1         GATCAATCGGACTTGTC     2
2         GATCGTGCATCAGCAGT     53
3         GATCCGATACAGCTTTG     212
4         GATCTATGGGTATAGTC     349
5         GATCCATCGTTTGGTGC     417
6         GATCCCAGCAAGATAAC     561
7         GATCCTCCGTCTTCACA     672
8         GATCACTTCTCTCATTA     702
9         GATCTACCAGAACTCGG     814
...       ...                   ...
30,285    GATCGGACCGATCGACT     2,935

Total # of tags: >1,000,000

Two sets of signatures are generated from each sample in different reading frames staggered by two bases


A catalog of signatures in the Arabidopsis genome

“Hits”    In genome    % of total    Random
1         748,204      87.407%       845,057
2         88,392       10.326%       6,134
3         11,019       1.287%        21
4         3,512        0.410%        0
5         1,452        0.170%        0
6         874          0.102%        0
7         470          0.055%        0
8         326          0.038%        0
9         237          0.028%        0
10        192          0.022%        0
11        158          0.018%        0
12-20     707          0.083%        0
21-30     247          0.029%        0
31-50     124          0.014%        0
> 50      86           0.010%        0
Total     851,212                    851,212

All potential signatures (GATC + 13 bp) are identified on both strands of the genomic sequence.

There is one potential signature approximately every 293 bp on each strand of the genome

A signature is classified according to its position relative to the 29,084 genes & pseudogenes in the TIGR annotation

Signatures may not be unique. The number of ‘hits’ in the genome is recorded
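
The cataloguing step above (find every GATC plus the following 13 bp on both strands, then record how many times each signature occurs) can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the talk: the function names are made up for this example, and minus-strand positions are reported relative to the reverse-complemented sequence.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def potential_signatures(seq: str, site: str = "GATC", extra: int = 13):
    """Yield (strand, position, signature) for every site followed by `extra` bases.

    Positions on the "-" strand are offsets into the reverse complement.
    """
    seq = seq.upper()
    length = len(site) + extra
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        start = s.find(site)
        while start != -1:
            signature = s[start:start + length]
            if len(signature) == length:  # skip sites too close to the end
                yield strand, start, signature
            start = s.find(site, start + 1)


def hit_counts(seq: str) -> Counter:
    """Number of genomic 'hits' for each potential signature."""
    return Counter(sig for _, _, sig in potential_signatures(seq))


toy_genome = "GATC" + "A" * 13 + "TT" + "GATC" + "A" * 13
for signature, hits in hit_counts(toy_genome).items():
    print(signature, hits)
```

Signatures with more than one hit correspond to the rows for 2 or more "hits" in the table, where expression cannot be assigned to a single site.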


Classifying signatures

[Figure: signatures along a gene model, marked by colored triangles]

Figure annotations:

  • Typical signatures

  • Duplicated: expression may be from other site in genome

  • Potential alternative splicing or nested gene

  • Potential alternative termination

  • Anti-sense transcript or nested gene?

  • Potential anti-sense transcript

  • Potential un-annotated ORF

Triangles refer to colors used on our web page:

  • Class 1 - in an exon, same strand as ORF.

  • Class 2 - within 500 bp after stop codon, same strand as ORF.

  • Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).

  • Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.

  • Class 5 - entirely within intron, same strand.

  • Class 6 - entirely within intron, anti-sense.

  • Grey = potential signature NOT expressed

  • Class 0 - signatures found in the expression libraries but not the genome.
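
The class definitions for genomic signatures (classes 1 to 6) can be expressed as a small decision function. The sketch below is a simplification under stated assumptions: the `Gene` record and `classify` function are hypothetical names, a gene is modelled only by its strand and sorted exon intervals, the stop codon is taken to be the 3’ end of the last exon, and a signature counts as exonic if it overlaps any exon. The actual rules used for the web site may differ in these edge cases.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical minimal gene model: strand plus sorted exon intervals."""
    strand: str   # "+" or "-"
    exons: list   # [(start, end), ...], 0-based, end exclusive, sorted by start


def classify(sig_start: int, sig_end: int, sig_strand: str,
             gene: Gene, window: int = 500) -> int:
    """Return the class (1-6) of a genomic signature relative to one gene."""
    same_strand = sig_strand == gene.strand
    gene_start, gene_end = gene.exons[0][0], gene.exons[-1][1]

    # Classes 1 and 3: the signature overlaps an exon.
    if any(sig_start < e_end and sig_end > e_start for e_start, e_end in gene.exons):
        return 1 if same_strand else 3

    # Classes 5 and 6: inside the gene but touching no exon, so within an intron.
    if gene_start <= sig_start and sig_end <= gene_end:
        return 5 if same_strand else 6

    # Class 2: same strand, within `window` bp downstream of the assumed stop.
    if same_strand:
        if gene.strand == "+":
            downstream = gene_end <= sig_start < gene_end + window
        else:
            downstream = gene_start - window < sig_end <= gene_start
        if downstream:
            return 2

    # Class 4: everything else in the genome.
    return 4


gene = Gene("+", [(100, 200), (300, 400)])
print(classify(150, 167, "+", gene))  # exonic, sense
print(classify(450, 467, "+", gene))  # downstream of the last exon, sense
```

Class 0 is not handled here because it applies to signatures observed in the libraries that have no genomic position at all.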


Arabidopsis signatures

Based on TIGR annotation (release 3.0, July 2002)

Class                       # in genome    % of total
1  sense exonic             203,174        24.0
2  3’UTR, <500 bp           44,202         5.2
3  anti-sense exonic        197,065        23.3
4  inter-genic              288,109        34.0
5  intronic                 60,817         7.2
6  anti-sense intronic      57,845         6.8
TOTAL                       851,212        100.5

355 genes lack potential Class 1 or 2 signatures (undetectable)

On average, there are 8.5 class 1 & 2 signatures per gene

8422 genomic signatures have secondary classes due to overlap or near overlap of two genes in the TIGR annotation.
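
The figure of 8.5 class 1 and 2 signatures per gene can be checked from the counts on this slide. The short calculation below assumes the average is taken over all 29,084 genes and pseudogenes in the TIGR annotation; the slide does not state the denominator explicitly.

```python
class_1 = 203_174   # sense exonic signatures
class_2 = 44_202    # signatures within 500 bp after the stop codon
genes = 29_084      # genes and pseudogenes (assumed denominator)

per_gene = (class_1 + class_2) / genes
print(round(per_gene, 1))  # 8.5
```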


Core Arabidopsis MPSS libraries sequenced by Lynx for Blake Meyers, U. of Delaware

Library    Signatures sequenced    Distinct signatures
Root       3,645,414               48,102
Shoot      2,885,229               53,396
Flower     1,791,460               37,754
Callus     1,963,474               40,903
Silique    2,018,785               38,503
TOTAL      12,304,362              133,377


Genome-wide expression profiling in Arabidopsis

[Figure: expression along chromosomes I to V]

Of the 29,084 gene models, 14,674 match unique, expressed signatures


http://www.dbi.udel.edu/mpss

  • Query by

    • Sequence

    • Arabidopsis gene identifier

    • Chromosomal position

    • BAC clone ID

    • MPSS signature

    • Library comparison

  • Site includes

    • Library and tissue information

    • FAQs and help pages


Outline

  • Arabidopsis gene expression (MPSS)

  • Two evolutionary issues in the evolution of expression profiles:

    • Physical clustering of co-expressed genes

    • Divergence of duplicated genes


Physical clustering of co-expression

Caenorhabditis elegans: Roy et al. (2002) Nature 418, 975; Lercher et al. (2003) Genome Research 13, 238

Drosophila melanogaster: Boutanaev et al. (2002) Nature 420, 666; Spellman and Rubin (2002) J Biology 1, 5

Homo sapiens: Caron et al. (2001) Science 291, 1289; Lercher et al. (2002) Nature Genetics 31, 180

Saccharomyces cerevisiae: Cohen et al. (2000) Nature Genetics 26, 183; Hurst et al. (2002) Trends in Genetics 18, 604; Mannila et al. (2002) Bioinformatics 18, 482

  • What are the proximate explanations?

    • shared cis-regulatory elements

    • chromatin packaging, etc.

  • What are the ultimate explanations?

    • Adaptive: greater transcriptional efficiency/accuracy?

    • Maladaptive: mutational rain chipping away at insulators and other mechanisms that over-ride regional controllers of gene expression?


Measuring expression distance

[Figure: expression distance, with axes library 1, library 2 and library 3]


Clustering of tissue-specific expression

[Figure: tissue-specific expression along chromosome 1. Flower (red), silique (violet), leaf (green), root (blue), callus (white)]


Statistical tests of coexpression clustering

  • Measured median pairwise expression distance (MPED) in non-overlapping windows of 20 genes

    • Summed unique class 1 and 2 signatures for each gene

    • Only one gene within each tandemly arrayed family was counted

  • Out of 100 shuffles of gene order

    • Zero shuffles had as many windows with small MPED (less than 1.5) as the unshuffled data

    • Zero shuffles had as large a variance in MPED among windows as the unshuffled data
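
The windowed test above can be sketched as follows. This is an illustration under assumptions, not the analysis code behind the slide: the slides do not define the expression distance, so the sketch uses Euclidean distance between log2(count + 1) profiles as a stand-in, and all function names are hypothetical. Only the "number of windows with small MPED" statistic is shown; the variance statistic would follow the same pattern.

```python
import random
from itertools import combinations
from math import dist, log2
from statistics import median


def expression_distance(a, b):
    """ASSUMED metric: Euclidean distance between log2(count + 1) profiles."""
    return dist([log2(x + 1) for x in a], [log2(x + 1) for x in b])


def window_mpeds(profiles, window=20):
    """Median pairwise expression distance in non-overlapping windows of genes."""
    mpeds = []
    for i in range(0, len(profiles) - window + 1, window):
        block = profiles[i:i + window]
        mpeds.append(median(expression_distance(a, b)
                            for a, b in combinations(block, 2)))
    return mpeds


def count_small(mpeds, threshold=1.5):
    """Number of windows whose MPED falls below the threshold."""
    return sum(m < threshold for m in mpeds)


def shuffle_test(profiles, n_shuffles=100, window=20, threshold=1.5, seed=0):
    """Return (observed count, number of shuffles with at least that count)."""
    rng = random.Random(seed)
    observed = count_small(window_mpeds(profiles, window), threshold)
    at_least = 0
    for _ in range(n_shuffles):
        shuffled = profiles[:]          # shuffle gene order, keep profiles intact
        rng.shuffle(shuffled)
        if count_small(window_mpeds(shuffled, window), threshold) >= observed:
            at_least += 1
    return observed, at_least
```

Here `profiles` is a list of per-gene count vectors in chromosomal order, one value per library, after summing unique class 1 and 2 signatures and keeping one gene per tandem family as the slide describes. A result of zero in the second position corresponds to the "zero shuffles" outcome reported above.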


Coexpression in Arabidopsis

[Figures on three consecutive slides; no captions recoverable]


Selection and recombination

  • In regions of low recombination

    • deleterious mutations can hitch-hike to high frequency along with favorable ones

    • favorable mutations are kept at low frequency by linkage to deleterious ones

  • Therefore, the effectiveness of natural selection is causally related to recombination rate

  • Are clusters more concentrated in regions of

    • high recombination (i.e. are they adaptive), or

    • low recombination (i.e. are they maladaptive)?


Measuring recombination rate

[Figure: recombination rate along chromosome 1]


Co-expression is greater in low recombination regions


Co-expression clusters

  • MPSS data provide evidence for clusters of co-expression among unrelated genes in Arabidopsis

  • Co-expression is greater in regions of low recombination

  • Thus, co-expression clusters may be maladaptive, at least on average


Outline

  • Arabidopsis gene expression (MPSS)

  • Two evolutionary issues in the evolution of expression profiles:

    • Physical clustering of co-expressed genes

    • Divergence of duplicated genes


Divergence of duplicated genes

[Figure: expression distance versus age of duplication]


Duplicated genes in Arabidopsis


Modes of gene duplication

  • Tandem (unequal crossing-over)

  • Dispersed (transposition)

  • Segmental (polyploidy)


Divergence of duplicated genes

  • All gene families of size 2 in Arabidopsis were classified as ‘dispersed’, ‘segmental’ or ‘tandem’

  • Expression distance was calculated for each pair

  • The number of silent (i.e. synonymous) substitutions per site was calculated for each pair (as a proxy for age since duplication)


Divergence and mode of duplication


Divergence of duplicated genes

  • Almost all expression divergence occurs during (or immediately following) duplication

  • Initial expression divergence is more extreme for tandem than dispersed duplicates

  • Tandem and dispersed duplicates with the most divergent expression profiles are quickly lost

  • Segmental duplicates plateau at a lower level of expression divergence than dispersed duplicates

  • The average divergence in relative expression level in each tissue is about 8-fold.


Lessons learned

  • Clusters of co-expression in Arabidopsis may be largely the result of a rain of weakly deleterious mutations that homogenize the expression profiles of neighboring genes

  • Divergence in expression profile between duplicated genes is dependent on the nature of the mutation that gave rise to the duplication


Thanks!

  • UNC Chapel Hill

    • Jianhua Hu

  • University of Delaware

    • Blake Meyers

  • NSF Plant Genome Research Program

    • DBI-01103267 (TJV)

    • DBI-0110528 (BCM)

