Comparison of array detected transcription map with GENCODE/HAVANA annotations in ENCODE regions


Acknowledgements

AFFX Transcriptome Group

Computation: S. Bekiranov, S. Brubaker, J. Cheng, S. Ghosh, G. Helt, G. Madhavan, S. Patel, H. Tammana, A. Piccolboni

Molecular Biology: P. Kapranov, I. Bell, J. Drenkow, D. Kampa-Bailey, J. Long, J. Manak, V. Sementchenko

Harvard Medical School

NCI

K. Struhl

H. Hirsch

H. H. Ng

E. Sekinger

Broad Institute

B. Bernstein

M. Kamal

K. Lindblad-Toh

D. J. Huebert

S. McMahon

E. K. Karlsson

E. J. Kulbokas III

S. L. Schreiber

E. S. Lander

Support:

NCI Contract (21XS019C Phases I-III) 2001-2006

NHGRI ENCODE Grant

AFFYMETRIX


Transcription Map & Modification Site Generation (I)

[Flowchart: CEL files (RNA or IP) -> Median Scaling (compute the median M of all chip medians, if multiple arrays in a set) -> Quantile Normalization -> Probe Mapping to Genome -> Wilcoxon Signed Rank Test -> RNA: Transfrag Generation / Chromatin IP: Site Generation]

  • Median Scaling: Scale all features on the chip so that the chip median = M.

  • Quantile Normalization (QN): QN feature intensities within replicates only. QN treatment and control separately.

  • Probe Mapping to Genome: Map PM, MM pairs to the genome via exact 25-mer alignment of the PM probe.

  • Wilcoxon Signed Rank Test:

    • Perform the test on the probe-pair signal S = log2(PM-MM).

    • Apply a sliding window to estimate the intensity of each probe pair as the pseudo-median of all probes in the window.

    • A sliding window makes use of neighboring probes; this reduces the false positive rate and increases sensitivity.

    • Window size varies with the experiment: RNA ~50 bp, IP ~250 bp.

  • Map and Site Generation:

  • RNA

    • Join probes with intensity above the 5% FPR threshold, subject to the maxgap and minrun parameters, to generate transcribed fragments (transfrags).

  • Chromatin IP

    • Generate the Hodges-Lehmann estimator to estimate expression level: logDiff = log2(max(PM-MM, 1))_T - log2(max(PM-MM, 1))_C, where T = treatment and C = control.

    • Generate a p-value estimate per probe.

    • Join probes with p-value ≤ 10^-5, subject to the maxgap and minrun parameters, to generate modification/transcription factor binding sites.
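The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's actual software: the function names, the `half_window` and `probe_len` parameters, and the flooring of PM-MM at 1 before taking log2 are all choices made here for illustration. The joining rule (merge positive probes whose gap is at most maxgap, keep fragments at least minrun long) is one plausible reading of "maxgap, minrun".

```python
import numpy as np

def median_scale(chips):
    """Scale each chip so that its median equals M, the median of all chip medians."""
    chip_medians = [np.median(c) for c in chips]
    M = np.median(chip_medians)
    return [np.asarray(c, dtype=float) * (M / m) for c, m in zip(chips, chip_medians)]

def pseudo_median(values):
    """Hodges-Lehmann estimator: median of all pairwise averages (i <= j)."""
    v = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(v))
    return float(np.median((v[i] + v[j]) / 2.0))

def windowed_signal(positions, pm, mm, half_window):
    """Pseudo-median of S over all probes within half_window bp of each probe.

    Assumption: S = log2(max(PM - MM, 1)) so that the log is always defined.
    """
    positions = np.asarray(positions)
    diff = np.asarray(pm, dtype=float) - np.asarray(mm, dtype=float)
    s = np.log2(np.maximum(diff, 1.0))
    out = np.empty(len(s))
    for k, p in enumerate(positions):
        in_window = np.abs(positions - p) <= half_window
        out[k] = pseudo_median(s[in_window])
    return out

def join_probes(positions, positive, maxgap, minrun, probe_len=25):
    """Join positive probes into fragments (half-open intervals).

    Assumptions: positions are sorted probe start coordinates; a positive probe
    extends the current fragment if it starts within maxgap bp of the fragment
    end; fragments shorter than minrun bp are discarded.
    """
    frags = []
    start = end = None
    for pos, is_pos in zip(positions, positive):
        if not is_pos:
            continue
        if start is not None and pos - end <= maxgap:
            end = pos + probe_len
        else:
            if start is not None and end - start >= minrun:
                frags.append((start, end))
            start, end = pos, pos + probe_len
    if start is not None and end - start >= minrun:
        frags.append((start, end))
    return frags
```

In this sketch a probe would be marked positive when its windowed signal exceeds the chosen threshold (the 5% FPR cutoff for RNA, or the p-value cutoff for chromatin IP); how those cutoffs are derived is not shown here.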


  • Filtration of 10 Chromosome Data

  • (Cheng, J., et al. Science Express; March 24, 2005)

  • (See the UCSC Browser, Version 33, for the 8 cell line data)

  • Low complexity repeats

  • Processed pseudogenes

  • BLAT hits to more than itself (loses some members of gene families)

  • Using all filters removes ~20% of transfrags, ~30% of which are pseudogenes. With BLAT, the data reduction is 14%.


RACE Model

(Need isothermal RT for unannotated transfrags)


RACE Analysis of Coding Gene

DiGeorge Critical Region 14 gene


Un-annotated transfrags of PISD are part of at least 9 different, yet overlapping sense-antisense transcripts

Sense Strand

Anti-sense strand


RACE Regions Validated for 768 Loci


Data sets analyzed

  • Part 1: a) Analysis done on v34 of the human genome. Total number of ENCODE regions analyzed = 12 (region ENm006 was ignored for this analysis since no annotations are available for v34).

    b) Set of known/validated exons.

    c) Set of predicted exons (from multiple gene predictions).

    d) Array detected transcript maps from the HL-60 cell line at 4 time points after RA stimulation (i.e., one cell line in 4 biological states).

  • Part 2: a) Analysis done on v35 of the human genome. Total number of ENCODE regions analyzed = 44.

    b) Set of known/validated exons.

    c) Set of Vega putative exons.

    d) Set of predicted exons outside sets b & c (from multiple gene predictions).

    e) Array detected transcript maps from the HL-60 cell line at 4 time points after RA stimulation.


How comparisons are carried out using arrays, annotations and predicted regions

[Figure: tracks along the genomic sequence showing repeats (RepeatMasker), probes (35 bp avg. distance), coverage of interrogated regions using the algorithms used to call transfrags, annotation (e.g. Vega) and predicted exons. Exon 2 is 100% covered; Exon 1 is < 100% covered.]

Analyses done only within interrogated regions


[Figure: tracks showing probes, genomic sequence, positive probes (one marked X), transfrags after minrun/maxgap parameters, annotation (Exon 2) and predicted exons.]
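The comparison between transfrags and exons comes down to interval overlap. The sketch below shows one way to compute the fraction of an exon's bases covered by transfrags; the function names and the half-open `(start, end)` coordinate convention are assumptions made for illustration, and the slides do not say how the actual comparison was implemented.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def exon_coverage(exon, transfrags):
    """Fraction of exon bases overlapped by at least one transfrag."""
    ex_start, ex_end = exon
    covered = 0
    for start, end in merge_intervals(transfrags):
        # Overlap length is zero when the intervals do not intersect.
        covered += max(0, min(end, ex_end) - max(start, ex_start))
    return covered / (ex_end - ex_start)
```

Merging the transfrags first keeps bases from being counted twice when fragments overlap. A restriction to interrogated regions, as the slide requires, would be applied before this step and is not shown.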


Coverage of Annotation by array detected transfrags from HL60 cell line in 13 ENCODE regions


Analysis results of 12/13 ENCODE Regions


  • Mode size of annotated exons is ~120 bp.

  • Detection of exons is not dependent upon the size (bp) of the exon (i.e., small exons are not biased against).

  • Of the exons detected by a transfrag, 65% are covered at >75%.
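The two kinds of percentage quoted on these slides (the share of exons detected, and the share of detected exons covered at >75%) can be derived from per-exon coverage fractions. The sketch below assumes that "detected" means at least 1 bp of overlap, as the predicted-exon slides state; the function name and dictionary keys are hypothetical.

```python
def summarize_detection(coverage_fractions, high=0.75):
    """Summarize per-exon coverage fractions (each between 0 and 1).

    An exon counts as detected if any of its bases is covered (fraction > 0).
    """
    n = len(coverage_fractions)
    detected = [f for f in coverage_fractions if f > 0]
    well_covered = [f for f in detected if f > high]
    return {
        "detected_rate": len(detected) / n if n else 0.0,
        "high_coverage_rate_among_detected": (
            len(well_covered) / len(detected) if detected else 0.0
        ),
    }
```

Note that the second rate is conditional on detection, so it is computed over detected exons only, not over all exons.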


  • Mode size of predicted exons is ~120 bp.

  • Approximately 30.5% of predicted exons are covered by transfrags (i.e., at least 1 bp coverage).

  • Of the exons detected by a transfrag, 48.6% are covered at >75%.


Coverage of Annotation by array detected transfrags from HL60 cell line in all 44 ENCODE regions


Analysis results of 44 ENCODE regions


  • Mode size of annotated exons is ~120 bp.

  • Detection of exons is not dependent upon the size (bp) of the exon (i.e., small exons are not biased against).

  • Of the exons detected by a transfrag, 61.4% are covered at >75%.


  • Mode size of predicted exons is ~80 bp.

  • Approximately 18.2% of predicted exons are detected by transfrags (i.e., by at least 1 bp).

  • Of the exons detected by a transfrag, 44.6% are covered at >75%.


Important Caveats To Recall In Pondering the Prediction vs Array Results

  • Only one cell line was used in this evaluation.

  • We have set very conservative thresholds for transfrag prediction. Other thresholds can be used.

  • Strand information is not deducible from the transfrag map. TUFs (transcripts of unknown function) are collections of transfrags shown to be on the same molecule by RACE-RT/PCR-cloning/sequencing.

  • Array interrogation resolution is 20 bp on average for the non-repeat portion of the genome, and probes are 25-mers. Thus, the boundaries of transfrags are not as precise as with arrays of 5 bp interrogation resolution, and some small exons will not be interrogated or detected.

  • Other functional features (e.g. TF binding), which would provide additional confidence in the transfrag data, have not been included. These will be added under the ENCODE project.


Conclusions

  • The array-based method detects ~53.9% of known/validated exons.

  • Similarly, the array-based method provides evidence for ~18.2% of predicted exons. These detected exons should be analyzed further to improve the annotation.

  • A combination of array based RNA map generation, followed by RACE experiments can significantly improve the rate of validation of gene predictions.

  • Transfrags that map outside validated and predicted exons can be used to improve gene prediction programs and can form the basis for further experiments.

