The evolution of expression patterns in the Arabidopsis genome


The evolution of expression patterns in the Arabidopsis genome. Todd Vision Department of Biology University of North Carolina at Chapel Hill. Driving forces in genome evolution. Proximate vs. ultimate explanations

Presentation Transcript

The evolution of expression patterns in the Arabidopsis genome

Todd Vision

Department of Biology

University of North Carolina at Chapel Hill

Driving forces in genome evolution
  • Proximate vs. ultimate explanations
  • Deleterious mutations are frequent and selection cannot effectively act on all of them
    • Substitutions
    • Insertions and deletions
    • Duplications
    • Transpositions
  • Cellular processes will be affected by this rain of mutations
  • At the molecular level, we must entertain ultimate explanations that do not invoke adaptation
An example: Codon bias
  • Genes differ in how often they use the preferred codon for a given amino acid, which affects
    • Translational efficiency
    • Translational accuracy
  • The strongest codon bias is typically seen in short, highly expressed genes under strong purifying selection
  • Realized codon bias is a balance between selection for preferred codons and a continual rain of mutations toward unpreferred codons
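
As a rough illustration of what "frequency of preferred codon use" means, the sketch below computes the fraction of codons in a coding sequence that are the preferred member of their synonymous family. The function name `preferred_codon_fraction` and the two-family table `PREFERRED` are hypothetical placeholders for illustration; they are not measured Arabidopsis codon preferences.

```python
def preferred_codon_fraction(cds, preferred):
    """Fraction of scored codons in `cds` that are the preferred synonym.

    `preferred` maps each preferred codon to the set of codons in its
    synonymous family (including itself). Codons that belong to no listed
    family are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Map every synonymous codon back to the preferred codon of its family
    family_of = {syn: pref for pref, family in preferred.items() for syn in family}
    scored = [c for c in codons if c in family_of]
    if not scored:
        return float("nan")
    return sum(c == family_of[c] for c in scored) / len(scored)


# Hypothetical preference table (assumption, for illustration only)
PREFERRED = {
    "GCT": {"GCT", "GCC", "GCA", "GCG"},  # Ala
    "AAG": {"AAG", "AAA"},                # Lys
}

print(preferred_codon_fraction("GCTGCCAAGAAAATG", PREFERRED))  # 0.5
```

In the toy sequence, two of the four scored codons (GCT and AAG) are preferred, and ATG is ignored because it has no synonyms in the table.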
What are the consequences of mutational rain on the regulatory networks that modulate gene expression?
Outline
  • Arabidopsis gene expression (MPSS)
  • Two evolutionary issues in the evolution of expression profiles:
    • Physical clustering of co-expressed genes
    • Divergence of duplicated genes
Digital expression profiling
  • “Bar-code” counting raises fewer concerns about cross-hybridization, probe selection, background hybridization, etc.
  • Serial Analysis of Gene Expression (SAGE)
    • Count occurrence of 10-12 bp mRNA signatures
    • Long SAGE: 21-22 bp signatures
    • Uses conventional sequencing technology
  • Massively Parallel Signature Sequencing (MPSS)
    • Count occurrence of 17-20 bp mRNA signatures
    • Cloning and sequencing is done on microbeads
    • Commercialized by Lynx Therapeutics
MPSS library construction

[Figure: schematic of MPSS library construction. Brenner et al., PNAS 97:1665-70.]
  • Extract mRNA from tissue
  • Convert to cDNA
  • Add linker
  • Cut with Sau3A (GATC)
  • 5’: add standard primer
  • 3’: add unique 32 bp tag and standard primer (added by cloning)
  • PCR
  • Remove 3’ primer and expose single-stranded unique tag (digest, 3’ → 5’ exonuclease)
  • Anneal to beads coated with unique anti-tag (32 bp, complementary to tag on mRNA)

MPSS library construction

Brenner et al., PNAS 97:1665-70.

Sort by FACS to remove ‘empty’ beads

The result of the library construction is a set of microbeads. Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript.

Beads are loaded in a monolayer on a microscope slide for the sequencing of 17–20 bp from the 5’ end.

MPSS Sequencing

[Figure: sequencing cycle on a 4-base overhang (positions 4 3 2 1, then 8 7 6 5). Brenner et al., Nat. Biotech. 18:630-4.]
  • Add adaptors (NNNX, NNXN, NXNN, XNNN)
  • Sequence by hybridization: 16 cycles for 4 bp
  • Digest with Type IIS enzyme to uncover next 4 bases
  • Repeat cycle
  • Steps of four bases; overhang is shifted by four bases in each round

MPSS Sequencing

Each bead provides a signature of 17-20 bp

Tag #     Signature sequence    # of beads (frequency)
1         GATCAATCGGACTTGTC     2
2         GATCGTGCATCAGCAGT     53
3         GATCCGATACAGCTTTG     212
4         GATCTATGGGTATAGTC     349
5         GATCCATCGTTTGGTGC     417
6         GATCCCAGCAAGATAAC     561
7         GATCCTCCGTCTTCACA     672
8         GATCACTTCTCTCATTA     702
9         GATCTACCAGAACTCGG     814
.         .                     .
30,285    GATCGGACCGATCGACT     2,935

Total # of tags: >1,000,000

Two sets of signatures are generated from each sample in different reading frames staggered by two bases
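
The "# of beads" column is simply a tally of identical signatures across beads. A minimal sketch of that tally follows, assuming each bead read is a plain string and that a valid signature is 17 bases beginning with GATC. The function name `count_signatures` and the filtering rule are illustrative assumptions, not the Lynx pipeline.

```python
from collections import Counter


def count_signatures(bead_reads, length=17):
    """Tally how many beads carry each signature.

    Reads that are shorter than `length` or that do not start with the
    GATC site are discarded (an assumed quality filter).
    """
    counts = Counter()
    for read in bead_reads:
        sig = read.upper()[:length]
        if len(sig) == length and sig.startswith("GATC"):
            counts[sig] += 1
    return counts


reads = [
    "GATCAATCGGACTTGTC",
    "GATCGTGCATCAGCAGT",
    "GATCGTGCATCAGCAGT",
    "gatcgtgcatcagcagt",  # lower case is accepted
    "ACGT",               # too short, discarded
]
counts = count_signatures(reads)
for signature, n_beads in counts.most_common():
    print(signature, n_beads)
```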

A catalog of signatures in the Arabidopsis genome

“Hits”    # in genome    % of total    Random
1         748204         87.407%       845057
2         88392          10.326%       6134
3         11019          1.287%        21
4         3512           0.410%        0
5         1452           0.170%        0
6         874            0.102%        0
7         470            0.055%        0
8         326            0.038%        0
9         237            0.028%        0
10        192            0.022%        0
11        158            0.018%        0
12-20     707            0.083%        0
21-30     247            0.029%        0
31-50     124            0.014%        0
> 50      86             0.010%        0
Total     851,212                      851,212

All potential signatures (GATC + 13 bp) are identified on both strands of the genomic sequence.

There is approximately one potential signature every 293 bp on each strand of the genome

A signature is classified according to its position relative to the 29,084 genes & pseudogenes in the TIGR annotation

Signatures may not be unique. The number of ‘hits’ in the genome is recorded
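
The cataloguing step described here can be sketched as a scan of both strands for GATC followed by 13 bases, with a tally of how often each 17-base signature occurs. This is a minimal sketch under stated assumptions: the genome is a single in-memory string, and the names `potential_signatures` and `hit_counts` are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def potential_signatures(genome, site="GATC", length=17):
    """Yield (strand, left, signature) for every site + 13 bp window.

    `left` is the 0-based leftmost forward-strand coordinate covered by the
    signature. Windows truncated by the sequence end are skipped.
    """
    genome = genome.upper()
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        i = seq.find(site)
        while i != -1:
            sig = seq[i:i + length]
            if len(sig) == length:
                left = i if strand == "+" else n - i - length
                yield strand, left, sig
            i = seq.find(site, i + 1)


def hit_counts(genome):
    """Number of genomic 'hits' for each potential signature."""
    return Counter(sig for _, _, sig in potential_signatures(genome))


# Toy genome: the same 19 bp unit twice, so one signature has two hits
toy = "GATCACGTACGTACGTATT" * 2
for signature, hits in hit_counts(toy).items():
    print(signature, hits)
```

Because GATC is its own reverse complement, every site yields a candidate signature on each strand, which is why the catalogue is built from both.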

Classifying signatures

[Figure: gene model showing typical signatures. Triangles refer to colors used on our web page. Grey = potential signature NOT expressed.]

Annotations in the figure:
  • Duplicated: expression may be from other site in genome
  • Potential alternative splicing or nested gene
  • Potential alternative termination
  • Anti-sense transcript or nested gene?
  • Potential anti-sense transcript
  • Potential un-annotated ORF

Signature classes:
  • Class 1 - in an exon, same strand as ORF.
  • Class 2 - within 500 bp after stop codon, same strand as ORF.
  • Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).
  • Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.
  • Class 5 - entirely within intron, same strand.
  • Class 6 - entirely within intron, anti-sense.
  • Class 0 - signatures found in the expression libraries but not the genome.
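
The class definitions for genomic signatures (Classes 1 to 6) can be sketched as a small decision function. This is a minimal sketch, not the classifier used for the web site: the `Gene` record, the half-open coordinate convention, the treatment of the last coding exon boundary as the stop codon, and the rule that any overlap with an exon counts as exonic are all assumptions. Class 0 is not handled because such signatures have no genomic position.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    strand: str  # "+" or "-"
    exons: list  # sorted (start, end) coding exons, 0-based half-open, forward coordinates


def classify(sig_start, sig_end, sig_strand, gene, downstream=500):
    """Return the class (1-6) of a genomic signature relative to one gene."""
    same_strand = sig_strand == gene.strand
    gene_start, gene_end = gene.exons[0][0], gene.exons[-1][1]

    # Assumption: touching any exon base makes the signature exonic
    if any(sig_start < e and s < sig_end for s, e in gene.exons):
        return 1 if same_strand else 3

    # Inside the gene span but touching no exon: entirely within an intron
    if gene_start <= sig_start and sig_end <= gene_end:
        return 5 if same_strand else 6

    # Within `downstream` bp after the stop codon, on the gene's strand
    if same_strand:
        if gene.strand == "+" and gene_end <= sig_start < gene_end + downstream:
            return 2
        if gene.strand == "-" and gene_start - downstream < sig_end <= gene_start:
            return 2

    return 4  # in the genome, but none of the above


gene = Gene("+", [(100, 200), (300, 400)])
print(classify(120, 137, "+", gene))  # 1
print(classify(220, 237, "-", gene))  # 6
print(classify(450, 467, "+", gene))  # 2
```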

Arabidopsis signatures

Based on TIGR annotation (release 3.0, July 2002)

Class                      # in genome    % of total
1  sense exonic            203,174        24.0
2  3’UTR, <500 bp          44,202         5.2
3  anti-sense exonic       197,065        23.3
4  inter-genic             288,109        34.0
5  intronic                60,817         7.2
6  anti-sense intronic     57,845         6.8
TOTAL                      851,212        100.5

355 genes lack potential Class 1 or 2 signatures (undetectable)

On average, there are 8.5 class 1 & 2 signatures per gene

8422 genomic signatures have secondary classes due to overlap or near overlap of two genes in the TIGR annotation.
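
The "8.5 class 1 & 2 signatures per gene" figure follows directly from the class counts and the 29,084 annotated genes and pseudogenes. A short arithmetic check, using only numbers quoted on these slides (the variable names are illustrative):

```python
# Counts per signature class, copied from the table on this slide
CLASS_COUNTS = {
    1: 203_174,  # sense exonic
    2: 44_202,   # 3' UTR, <500 bp
    3: 197_065,  # anti-sense exonic
    4: 288_109,  # inter-genic
    5: 60_817,   # intronic
    6: 57_845,   # anti-sense intronic
}
N_GENES = 29_084  # genes and pseudogenes in the TIGR annotation

total = sum(CLASS_COUNTS.values())
per_gene = (CLASS_COUNTS[1] + CLASS_COUNTS[2]) / N_GENES

print(total)               # 851212
print(round(per_gene, 1))  # 8.5
```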

Core Arabidopsis MPSS libraries sequenced by Lynx for Blake Meyers, U. of Delaware

Library    Signatures sequenced    Distinct signatures
Root       3,645,414               48,102
Shoot      2,885,229               53,396
Flower     1,791,460               37,754
Callus     1,963,474               40,903
Silique    2,018,785               38,503
TOTAL      12,304,362              133,377
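
Because the libraries differ roughly two-fold in sequencing depth, raw bead counts are not directly comparable between tissues. One common remedy, assumed here rather than taken from the slides, is to scale each count to tags per million. The library sizes come from the table; the raw count of 349 is a hypothetical example.

```python
# Signatures sequenced per library, from the table on this slide
LIBRARY_SIZES = {
    "Root": 3_645_414,
    "Shoot": 2_885_229,
    "Flower": 1_791_460,
    "Callus": 1_963_474,
    "Silique": 2_018_785,
}


def tags_per_million(raw_count, library):
    """Scale a raw signature count by the depth of its library."""
    return raw_count * 1_000_000 / LIBRARY_SIZES[library]


# The same hypothetical raw count means different abundance in each library
for library in ("Root", "Flower"):
    print(library, round(tags_per_million(349, library), 1))
```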

Genome-wide expression profiling in Arabidopsis

[Figure: panels for Chr. I, Chr. II, Chr. III, Chr. IV and Chr. V]

Of the 29,084 gene models, 14,674 match unique, expressed signatures

http://www.dbi.udel.edu/mpss
  • Query by
    • Sequence
    • Arabidopsis gene identifier
    • Chromosomal position
    • BAC clone ID
    • MPSS signature
    • Library comparison
  • Site includes
    • Library and tissue information
    • FAQs and help pages
Outline
  • Arabidopsis gene expression (MPSS)
  • Two evolutionary issues in the evolution of expression profiles:
    • Physical clustering of co-expressed genes
    • Divergence of duplicated genes
Physical clustering of co-expression

  • Caenorhabditis elegans
    • Roy et al. (2002) Nature 418, 975
    • Lercher et al. (2003) Genome Research 13, 238
  • Drosophila melanogaster
    • Boutanaev et al. (2002) Nature 420, 666
    • Spellman and Rubin (2002) J Biology 1, 5
  • Homo sapiens
    • Caron et al. (2001) Science 291, 1289
    • Lercher et al. (2002) Nature Genetics 31, 180
  • Saccharomyces cerevisiae
    • Cohen et al. (2000) Nature Genetics 26, 183
    • Hurst et al. (2002) Trends in Genetics 18, 604
    • Mannila et al. (2002) Bioinformatics 18, 482

  • What are the proximate explanations?
    • shared cis-regulatory elements
    • chromatin packaging, etc.
  • What are the ultimate explanations?
    • Adaptive: greater transcriptional efficiency/accuracy?
    • Maladaptive: mutational rain chipping away at insulators and other mechanisms that over-ride regional controllers of gene expression?
Clustering of tissue-specific expression

[Figure: Chromosome 1. Color key: Flower (red), Silique (violet), Leaf (green), Root (blue), Callus (white)]

Statistical tests of coexpression clustering
  • Measured median pairwise expression distance (MPED) in non-overlapping windows of 20 genes
    • Summed unique class 1 and 2 signatures for each gene
    • Only one gene within each tandemly arrayed family was counted
  • Out of 100 shuffles of gene order
    • Zero shuffles had as many windows with small MPED (less than 1.5) as the unshuffled data
    • Zero shuffles had as large a variance in MPED among windows as the unshuffled data
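
The first of these shuffle tests can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not say how expression distance is computed, so Euclidean distance between per-tissue profiles is used as a stand-in, and the names `window_mpeds` and `shuffle_test` are hypothetical. The variance test would follow the same pattern with `statistics.variance` applied to the window MPEDs.

```python
import itertools
import math
import random
import statistics


def expression_distance(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.dist(a, b)


def window_mpeds(profiles, window=20):
    """Median pairwise expression distance in each non-overlapping window."""
    mpeds = []
    for start in range(0, len(profiles) - window + 1, window):
        block = profiles[start:start + window]
        dists = [expression_distance(a, b)
                 for a, b in itertools.combinations(block, 2)]
        mpeds.append(statistics.median(dists))
    return mpeds


def shuffle_test(profiles, threshold=1.5, window=20, n_shuffles=100, seed=0):
    """Count shuffles of gene order with at least as many small-MPED windows.

    Returns (observed, as_extreme): the number of windows with MPED below
    `threshold` in the real order, and the number of shuffles that match
    or exceed it.
    """
    rng = random.Random(seed)

    def n_small(ordered):
        return sum(m < threshold for m in window_mpeds(ordered, window))

    observed = n_small(profiles)
    shuffled = list(profiles)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if n_small(shuffled) >= observed:
            as_extreme += 1
    return observed, as_extreme


# Toy data: 20 co-expressed neighbours followed by 20 dissimilar genes
toy = [(1.0, 1.0)] * 20 + [(10.0 * i, 0.0) for i in range(1, 21)]
observed, as_extreme = shuffle_test(toy)
print(observed, as_extreme)
```

In the unshuffled toy data exactly one window has a small MPED; `as_extreme` reports how many of the 100 shuffles reproduce that by chance.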
Selection and recombination
  • In regions of low recombination
    • deleterious mutations can hitch-hike to high frequency along with favorable ones
    • favorable mutations are kept at low frequency by linkage to deleterious ones
  • Therefore, natural selection is less effective where the recombination rate is low
  • Are clusters more concentrated in regions of
    • high recombination (i.e. are they adaptive), or
    • low recombination (i.e. are they maladaptive)?
Co-expression clusters
  • MPSS data provides evidence for clusters of co-expression among non-related genes in Arabidopsis
  • Co-expression is greater in regions of low recombination
  • Thus, co-expression clusters may be maladaptive, at least on average
Outline
  • Arabidopsis gene expression (MPSS)
  • Two evolutionary issues in the evolution of expression profiles:
    • Physical clustering of co-expressed genes
    • Divergence of duplicated genes
Divergence of duplicated genes

[Figure: expression distance plotted against age of duplication]

Modes of gene duplication
  • Tandem (unequal crossing-over)
  • Dispersed (transposition)
  • Segmental (polyploidy)
Divergence of duplicated genes
  • All gene families of size 2 in Arabidopsis were classified as ‘dispersed’, ‘segmental’ or ‘tandem’
  • Expression distance was calculated for each
  • The number of silent (i.e. synonymous) substitutions per site was calculated for each (as a proxy for age since duplication)
Divergence of duplicated genes
  • Almost all expression divergence occurs during (or immediately following) duplication
  • Initial expression divergence is more extreme for tandem than dispersed duplicates
  • Tandem and dispersed duplicates with the most divergent expression profiles are quickly lost
  • Segmental duplicates plateau at a lower level of expression divergence than dispersed duplicates
  • The average divergence in relative expression level in each tissue is about 8-fold.
Lessons learned
  • Clusters of co-expression in Arabidopsis may be largely the result of a rain of weakly deleterious mutations that homogenize the expression profiles of neighboring genes
  • Divergence in expression profile between duplicated genes is dependent on the nature of the mutation that gave rise to the duplication
Thanks!
  • UNC Chapel Hill
    • Jianhua Hu
  • University of Delaware
    • Blake Meyers
  • NSF Plant Genome Research Program
    • DBI-01103267 (TJV)
    • DBI-0110528 (BCM)