Alternative Splicing from ESTs


Alternative Splicing from ESTs. Eduardo Eyras, Bioinformatics UPF – February 2004. Outline: Intro; ESTs; Prediction of Alternative Splicing from ESTs.


Presentation Transcript



Alternative Splicing from ESTs

Eduardo Eyras

Bioinformatics UPF – February 2004



  • Intro

  • ESTs

  • Prediction of Alternative Splicing from ESTs


[Figure: a gene with exons and introns is transcribed into pre-mRNA; splicing gives the mature mRNA (5’ CAP, poly-A tail AAAAAAA); translation gives the peptide.]


[Figure: the same gene (exons, introns) is transcribed into pre-mRNA (5’ CAP, poly-A tail AAAAAAA); a different splicing gives a different mature mRNA, and translation gives a different peptide.]


Alt. splicing as a mechanism of gene regulation

Functional domains can be added/subtracted → protein diversity

Can introduce early stop codons, resulting in truncated proteins or unstable mRNAs

It can modify the activity of transcription factors, affecting the expression of genes

It is observed in nearly all metazoans

Estimated to occur in 30%-40% of human genes



Forms of alternative splicing

Exon skipping / inclusion

Alternative 3’ splice site

Alternative 5’ splice site

Mutually exclusive exons

Intron retention

Constitutive exon

Alternatively spliced exons



  • How to study alternative splicing?



ESTs (Expressed Sequence Tags)

Single-pass sequencing of a small (end) piece of cDNA

Typically 200-500 nucleotides long

It may contain coding and/or non-coding regions


ESTs

[Figure: cDNA synthesis, step by step]

  • Cells from a specific organ, tissue or developmental stage

  • mRNA extraction (5’ to 3’ mRNAs with poly-A tails, AAAAAA)

  • Add oligo-dT primer (TTTTTT)

  • Reverse transcriptase: RNA/DNA hybrid

  • Ribonuclease H

  • DNA polymerase, Ribonuclease H

  • Double stranded cDNA


ESTs

  • Clone cDNA into a vector

  • Multiple cDNA clones

  • Single-pass sequence reads: 5’ EST and 3’ EST


Alternative Splicing from ESTs

[Figure: Genomic → Primary transcript → Splicing → Splice variants → cDNA clones → EST sequences (5’ and 3’)]



Alternative Splicing from ESTs

ESTs can also provide information about potential alternative splicing when aligned to the genome (and when aligned to mRNA data)



EST sequencing

  • Is fast and cheap

  • Gives direct information about the gene sequence

  • Partial information

Resulting ESTs (DB searches):

  • Known gene

  • Similar to known gene

  • Contaminant

  • Novel gene


ESTs provide expression data

eVOC Ontologies: http://www.sanbi.ac.za/evoc/

Anatomical System

The tissue, organ or anatomical system from which the sample was prepared. Examples are: digestive, lung and retina.

Cell Type

The precise cell type from which a sample was prepared. Examples are: B-lymphocyte, fibroblast and oocyte.

Pathology

The pathological state of the tissue from which the sample was prepared. Examples are: normal, lymphoma and congenital.

Developmental Stage

The stage during the organism's development at which the sample was prepared. Examples are: embryo, fetus and adult.

Pooling

Indicates whether the tissue used to prepare the library was derived from single or multiple samples. Examples are: pooled, pooled donor and pooled tissue.



Linking the expression vocabulary to gene annotations

ESTs

Genes



Normalized vs. non-normalized libraries


The downside of ESTs

  • Cannot detect lowly/rarely expressed genes or non-expressed sequences (regulatory)

  • Random sampling: the more ESTs we sequence, the fewer new useful sequences we get
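The diminishing return of random sampling can be illustrated with a small calculation. This is only a sketch: it assumes, unrealistically, that every transcript is equally likely to be sampled (real libraries are skewed toward abundant transcripts, which would make the effect stronger), and the function names and numbers are hypothetical.

```python
def expected_distinct(n_transcripts: int, n_ests: int) -> float:
    """Expected number of distinct transcripts seen after n_ests reads,
    assuming uniform random sampling with replacement (an assumption)."""
    p_never_sampled = (1.0 - 1.0 / n_transcripts) ** n_ests
    return n_transcripts * (1.0 - p_never_sampled)


def expected_new(n_transcripts: int, already_sequenced: int, batch: int) -> float:
    """Expected number of previously unseen transcripts found by one more batch."""
    before = expected_distinct(n_transcripts, already_sequenced)
    after = expected_distinct(n_transcripts, already_sequenced + batch)
    return after - before


if __name__ == "__main__":
    # Hypothetical library of 10,000 transcripts, sequenced in batches of 1,000 ESTs.
    for done in (0, 10_000, 50_000):
        print(done, round(expected_new(10_000, done, 1_000), 1))
```

Each successive batch of the same size is expected to reveal fewer new transcripts than the previous one.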


Gene Hunting

  • Sequencing of the Human Genome (HGP)

  • EST Sequencing


Origin of the ESTs

  • Science. 1991 Jun 21;252(5013):1651-6

  • Complementary DNA sequencing: expressed sequence tags and human genome project.

  • Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, et al. Section of Receptor Biochemistry and Molecular Biology, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD.

Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.



EST-sequencing explosion

 non-exclusivity (1992)

  • Merck and WashU (1994)

  • → public ESTs

  • → GenBank

  • → dbEST



dbEST release 20 February 2004

  • Number of public entries: 20,039,613

  • Summary by organism

  • Homo sapiens (human) 5,472,005

  • Mus musculus + domesticus (mouse) 4,056,481

  • Rattus sp. (rat) 583,841

  • Triticum aestivum (wheat) 549,926

  • Ciona intestinalis 492,511

  • Gallus gallus (chicken) 460,385

  • Danio rerio (zebrafish) 450,652

  • Zea mays (maize) 391,417

  • Xenopus laevis (African clawed frog) 359,901


EST lengths

[Figure: human EST length distribution (dbEST, Sep. 2003), annotated ~450 bp]



Recover the mRNA from the ESTs



What is an EST cluster?

A cluster is a set of fragmented EST data (plus mRNA data if known), consolidated according to sequence similarity

Clusters are indexed by gene such that all expressed data concerning a single gene is in a single index class, and each index class contains the information for only one gene.

(Burke, Davison, Hide, Genome Research 1999).
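A minimal sketch of consolidating ESTs into clusters, assuming the pairs of similar sequences have already been found by some alignment step (not shown). It uses single-linkage grouping with a union-find structure. This is an illustrative choice, not the method of any particular clustering resource, and `cluster_ests` and the identifiers are hypothetical.

```python
def cluster_ests(est_ids, similar_pairs):
    """Group ESTs so that any two linked by a chain of similar pairs share a cluster.

    est_ids: iterable of EST identifiers.
    similar_pairs: iterable of (id_a, id_b) judged similar by an upstream step.
    Returns a sorted list of sorted clusters.
    """
    parent = {e: e for e in est_ids}

    def find(x):
        # Follow parent links to the representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for e in parent:
        clusters.setdefault(find(e), []).append(e)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical identifiers: est1..est3 chain together, est4 stays alone.
    print(cluster_ests(["est1", "est2", "est3", "est4"],
                       [("est1", "est2"), ("est2", "est3")]))
```

Single linkage is simple but will join two genes if a single chimeric EST is similar to both, which is one reason for the pre-processing and genome-based checks described in the following slides.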



EST pre-processing

Vector

Repeats

Mitochondrial

Xenocontaminants


EST Clustering

  • UniGene (NCBI): www.ncbi.nlm.nih.gov/UniGene

  • TIGR Human Gene Index (The Institute for Genomic Research): www.tigr.org

  • StackDB (South African Bioinformatics Institute): www.sanbi.ac.za



UniGene

  • Species UniGene Entries

  • Homo sapiens 118,517

  • Mus musculus 82,482

  • Rattus norvegicus 43,942

  • Sus scrofa 20,426

  • Gallus gallus 11,970

  • Xenopus laevis 21,734

  • Xenopus tropicalis 17,102



  • ESTs and the Genome


ESTs aligned to the genome

  • Some advantages:

  • It defines the location of exons and introns

  • We can verify the splice sites of introns (e.g. GT-AG)

  • → hence also check the correct strand of spliced ESTs

  • It helps prevent chimeras

  • It can avoid putting together ESTs from paralogous genes

  • We can prevent including pseudogenes in our analysis
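The splice-site and strand check can be sketched as follows. This is an illustration under stated assumptions: introns are given as 0-based, half-open coordinates on the forward strand of a genome string, the accepted dinucleotide pairs are the ones named in this deck (GT-AG, AT-AC, GC-AG), and the function names are hypothetical.

```python
# Donor/acceptor dinucleotide pairs accepted as splice sites (from the slides).
SPLICE_PAIRS = {("GT", "AG"), ("AT", "AC"), ("GC", "AG")}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def intron_strand(genome: str, start: int, end: int):
    """Return '+', '-', or None for the intron genome[start:end].

    '+' if the intron carries an accepted donor/acceptor pair as written,
    '-' if it does so only after reverse complementing, None otherwise.
    """
    intron = genome[start:end].upper()
    if len(intron) < 4:
        return None
    if (intron[:2], intron[-2:]) in SPLICE_PAIRS:
        return "+"
    rc = revcomp(intron)
    if (rc[:2], rc[-2:]) in SPLICE_PAIRS:
        return "-"
    return None


if __name__ == "__main__":
    forward = "AAA" + "GTCCCCAG" + "TTT"   # toy sequence with a GT...AG intron
    print(intron_strand(forward, 3, 11))
```

A spliced EST whose introns all report the same strand can be oriented with some confidence; an intron returning `None` is a hint of a misalignment or a non-canonical site.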


Aligning ESTs to the Genome

  • Many ESTs → fast programs, fast computers

  • Nearly exact matches: coverage >= 97%, percent_id >= 97%

  • Splice sites: GT-AG, AT-AC, GC-AG
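The two thresholds can be applied as a simple filter. In this sketch the definitions are assumptions, since the slides do not spell them out: coverage is taken as the fraction of the EST included in the alignment, and percent identity as the fraction of aligned bases that match the genome. The `Alignment` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    est_length: int      # total length of the EST
    aligned_bases: int   # EST bases included in the alignment
    matches: int         # aligned bases identical to the genome


def passes_filter(aln: Alignment, min_coverage: float = 97.0,
                  min_identity: float = 97.0) -> bool:
    """Keep only nearly exact matches (both thresholds in percent)."""
    if aln.est_length <= 0 or aln.aligned_bases <= 0:
        return False
    coverage = 100.0 * aln.aligned_bases / aln.est_length
    identity = 100.0 * aln.matches / aln.aligned_bases
    return coverage >= min_coverage and identity >= min_identity


if __name__ == "__main__":
    print(passes_filter(Alignment(est_length=400, aligned_bases=396, matches=390)))
    print(passes_filter(Alignment(est_length=400, aligned_bases=380, matches=380)))
```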


Aligning ESTs to the Genome

Extra pre-processing of ESTs:

  • Clip poly-A tails / clip 20 bp from either end

  • Best in genome (keep the best match in the genome)

  • Remove potential processed pseudogenes

  • Give preference to ESTs that are spliced
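The clipping step can be sketched as two small helpers. The minimum run length of 10 is an assumption (the slides give no threshold), sequences are assumed to be upper case, and a leading run of T is treated as a poly-A tail read from the opposite strand. Both function names are hypothetical.

```python
import re


def clip_poly_a(seq: str, min_run: int = 10) -> str:
    """Remove a trailing poly-A run, or a leading poly-T run (tail read in reverse)."""
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    return seq


def clip_ends(seq: str, n: int = 20) -> str:
    """Drop n bases from each end; return '' if nothing would remain."""
    return seq[n:-n] if len(seq) > 2 * n else ""


if __name__ == "__main__":
    est = "ACGTACGTAC" + "A" * 15
    print(clip_poly_a(est))
```

Clipping a fixed number of bases from either end discards the lowest-quality part of a single-pass read without needing quality scores.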


Human ESTGenes

[Figure: genomic length distribution of aligned human ESTs, annotated ~400 bp, with a tail up to ~800 kb]



The Problem

ESTs

Genome

What are the transcripts represented in this set of mapped ESTs?



Predict Transcripts from ESTs

ESTs

Transcript predictions

Merge ESTs according to splicing structure compatibility


Representation

Every 2 ESTs in a Genomic Cluster may represent the same splicing (redundant) or not

The redundancy relation is a graph:

[Figure: the two relations between ESTs. Extension, illustrated with x and y; Inclusion, illustrated with x and z.]

Sort by the smallest coordinate ascending and by the largest coordinate descending
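The sort order described above can be written as a two-part key. A minimal sketch, assuming each EST is represented by a hypothetical `(name, start, end)` tuple of genomic coordinates:

```python
def sort_for_comparison(ests):
    """Sort by smallest coordinate ascending, then by largest coordinate descending."""
    return sorted(ests, key=lambda est: (est[1], -est[2]))


if __name__ == "__main__":
    ests = [("c", 150, 400), ("a", 100, 300), ("b", 100, 600)]
    for est in sort_for_comparison(ests):
        print(est)
```

With this order, among ESTs starting at the same position the longest comes first, so an EST that could include others is seen before the ESTs it might include.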



Criteria of merging

Allow edge-exon mismatches

Allow internal mismatches

Allow intron mismatches
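A simplified sketch of a compatibility test with tolerances. It is not the ClusterMerge implementation: it assumes each EST is a sorted list of `(start, end)` exons, compares only the introns that lie entirely inside the region both ESTs span, ignores the outer ends of the edge exons, and allows each intron boundary to differ by up to `intron_slop` bases. All names are hypothetical.

```python
def introns(exons):
    """Introns as (intron_start, intron_end) gaps between consecutive exons."""
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


def compatible(x, y, intron_slop: int = 0) -> bool:
    """True if x and y overlap and show the same introns in their shared span."""
    lo = max(x[0][0], y[0][0])
    hi = min(x[-1][1], y[-1][1])
    if lo >= hi:
        return False  # no genomic overlap, nothing to merge

    def shared(exons):
        # Keep only introns fully contained in the span common to both ESTs.
        return [(s, e) for s, e in introns(exons) if s >= lo and e <= hi]

    ix, iy = shared(x), shared(y)
    if len(ix) != len(iy):
        return False
    return all(abs(a[0] - b[0]) <= intron_slop and abs(a[1] - b[1]) <= intron_slop
               for a, b in zip(ix, iy))


if __name__ == "__main__":
    full = [(100, 200), (300, 400), (500, 600)]
    partial = [(350, 400), (500, 650)]       # same intron, extends further 3'
    skipping = [(100, 200), (500, 600)]      # middle exon skipped
    print(compatible(full, partial), compatible(full, skipping))
```

Raising `intron_slop` merges ESTs whose splice junctions differ by a few bases, at the cost of hiding genuine alternative 5’ or 3’ splice sites that are closer together than the tolerance.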


Transitivity

[Figure: transitivity of the Extension and Inclusion relations, illustrated with ESTs x, y, z and w]

This reduces the number of comparisons needed


ClusterMerge graph

Each node defines an inclusion sub-tree

[Figure: inclusion sub-tree, illustrated with ESTs x, y and z]

Extensions form acyclic graphs

[Figure: extension graph, illustrated with ESTs x, y, z and w]


Recovering the Solution

[Figure, repeated on each of these slides: example graph with nodes 1 to 9, with the labels Root and Leaves]

Mergeable sets of ESTs can be recovered as special paths in the graph

Root: does not extend any node

Leaf: not-extended and root of an inclusion tree

Any set of ESTs in a path from a root to a leaf is mergeable

Add the inclusion tree attached to each node in the path

Lists produced: (1,2,3,4,5,6,7,8) (1,2,3,4,5,6,7,9)

This representation minimizes the necessary comparisons between ESTs
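The recovery of mergeable lists can be sketched as a walk over root-to-leaf paths. This is an illustration, not the original implementation: `extended_by` maps each node to the nodes that extend it, `includes` maps a node to the nodes directly included in it, and the edges in the example are invented for illustration rather than read from the figure.

```python
def mergeable_lists(extended_by, includes):
    """One sorted list per root-to-leaf path, with inclusion trees added.

    extended_by: dict node -> list of nodes extending it (every node in the
                 extension graph must appear as a key, leaves with []).
    includes:    dict node -> list of nodes directly included in it.
    """
    extenders = {c for children in extended_by.values() for c in children}
    roots = sorted(n for n in extended_by if n not in extenders)

    def inclusion_tree(node):
        members = [node]
        for child in includes.get(node, []):
            members.extend(inclusion_tree(child))
        return members

    def paths(node):
        children = extended_by.get(node, [])
        if not children:          # leaf: nothing extends this node
            yield [node]
        for child in children:
            for rest in paths(child):
                yield [node] + rest

    result = []
    for root in roots:
        for path in paths(root):
            members = []
            for node in path:
                members.extend(inclusion_tree(node))
            result.append(sorted(members))
    return result


if __name__ == "__main__":
    # Hypothetical graph: 1 is extended by 4, 4 by 7, 7 by 8 and by 9;
    # 2 and 3 are included in 1, and 5 and 6 are included in 4.
    extended_by = {1: [4], 4: [7], 7: [8, 9], 8: [], 9: []}
    includes = {1: [2, 3], 4: [5, 6]}
    print(mergeable_lists(extended_by, includes))
```

Each list is then a candidate set of ESTs to merge into one transcript.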



How to build the graph

Mutual Recursion

Inclusion => go up in the tree

Recursive search along the extension branch

Search graph (leaves)

Search sub-graph


How to build the graph

Example

[Figure sequence (11 slides): a graph is shown for ESTs 1 to 6, and EST 7 is then placed in it. Step labels, in order: Leaves; Inclusion; Inclusion; Extension; Inclusion; Place; Inclusion; tagged as visited - skip.]

Possible sub-trees beyond 1 or 3 remain unseen!

The representation minimizes the necessary comparisons


Deriving the transcripts from the lists

Internal splice sites: external coordinates of the 5’ and 3’ exons are not allowed to contribute

Splice sites: are set to the most common coordinate

5’ and 3’ coordinates: are set to the exon coordinate that extends the potential UTR the most
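The two rules can be sketched with a few lines. This assumes forward-strand genomic coordinates, so that "extends the most" means the smallest start and the largest end; the function names and the example coordinates are hypothetical.

```python
from collections import Counter


def consensus_splice_site(coordinates):
    """Most common coordinate among the ESTs supporting one splice site.

    On a tie, Counter.most_common keeps the value encountered first.
    """
    return Counter(coordinates).most_common(1)[0][0]


def transcript_ends(first_exon_starts, last_exon_ends):
    """Outer coordinates chosen to extend the potential UTRs the most."""
    return min(first_exon_starts), max(last_exon_ends)


if __name__ == "__main__":
    print(consensus_splice_site([200, 200, 201, 200, 198]))
    print(transcript_ends([105, 98, 110], [900, 950, 920]))
```

Voting on internal splice sites suppresses the occasional misaligned EST, while the outer ends are taken greedily because a truncated read says nothing about where the transcript really starts or stops.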



Single exon transcripts

Reject resulting single exon transcripts when using ESTs


Annotation with ESTs

ESTs aligned to the genome can provide information about UTRs and alternative splicing

EST-Transcripts at www.ensembl.org


Results for Human and Mouse

  • Human EST-genes (assembly ncbi33):

  • 38,581 Genes

  • 122,247 Transcripts (42% with full CDS)

  • Mouse EST-genes (assembly ncbi30):

  • 32,848 Genes

  • 103,664 Transcripts (36% with full CDS)



  • How many transcripts are conserved?

  • Is Alternative Splicing conserved?


EST-transcript pairs

  • 42,625 transcript pairs (in 18,242 gene pairs)

Gene pairs:

  • 78% with one transcript pair conserved

  • 22% with more than one transcript pair conserved

For 22% of the gene pairs some form of alt. splicing is conserved



Conservation of Alt. Splicing

  • Take gene-pairs with more than one transcript-pair

\[
\%\,\text{conservation} = \frac{\sum (\text{number of paired transcripts} - 1)}{\sum (\text{number of transcripts} - 1)}
\]

∑ = sum over genes in a gene pair with more than one variant (the −1 subtracts the ‘main’ transcript form)

19% of alt. variants in human are conserved in mouse

32% of alt. variants in mouse are conserved in human
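The conservation measure can be computed as below. A minimal sketch under assumptions: each gene of one species that belongs to a gene pair is described by a hypothetical `(n_transcripts, n_paired)` tuple with at least one paired transcript, and genes with a single variant are left out of both sums. The example numbers are invented.

```python
def percent_conservation(genes):
    """Percentage of alternative variants that are conserved.

    genes: iterable of (n_transcripts, n_paired) for the genes of one species
           that belong to a gene pair; n_paired >= 1 is assumed.
    """
    multi = [(n, p) for n, p in genes if n > 1]   # genes with more than one variant
    numerator = sum(p - 1 for _, p in multi)      # paired variants beyond the 'main' form
    denominator = sum(n - 1 for n, _ in multi)    # all variants beyond the 'main' form
    if denominator == 0:
        raise ValueError("no gene with more than one variant")
    return 100.0 * numerator / denominator


if __name__ == "__main__":
    # Hypothetical genes: 3 variants (2 paired), 4 variants (1 paired), 1 variant.
    print(percent_conservation([(3, 2), (4, 1), (1, 1)]))
```

Computing the measure once with human genes and once with mouse genes gives two different percentages, because the denominators differ between the species.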


  • How many predicted ‘novel’ genes are validated by Human-Mouse comparison?


Novel genes

[Figure: human ESTGenes grouped as: not in Ensembl; validated by comparison to mouse; with at least one complete ORF. Counts shown: 13,174; 18,242; 24,201.]


Novel genes

ESTGenes not in Ensembl, validated by comparison to mouse, with a complete ORF: 984



  • THE END

