
Homology Based Analysis of the Human/Mouse lncRNome

Homology Based Analysis of the Human/Mouse lncRNome. Cédric Notredame, Giovanni Bussotti. Comparative Bioinformatics lab, CRG. Part 1: GENCODE v10 lncRNA screening vs human and mouse genomes. Strategy: PipeR one2many homolog assignment. Template: PipeR Parameters: Blast


Presentation Transcript


  1. Homology Based Analysis of the Human/Mouse lncRNome. Cédric Notredame, Giovanni Bussotti. Comparative Bioinformatics lab, CRG

  2. Part 1: GENCODE v10 lncRNA screening vs human and mouse genomes. Strategy: PipeR one2many homolog assignment. Template: PipeR Parameters: Blast - Freyhult parametrization - Lower case masking - Low complexity masking. Exonerate - est2genome model - 70% coverage required - seed extension 2X (the span of the genomic size of the query on both sides)
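
The two Exonerate settings on this slide can be illustrated with a minimal sketch. It assumes that "seed extension 2X" means the Blast anchor is widened by the genomic span of the query on each side before Exonerate is run, and that coverage is measured as aligned query bases over query length. The function names and the example coordinates are hypothetical and are not taken from PipeR.

```python
def extension_window(hit_start, hit_end, query_genomic_span, chrom_length):
    """Widen a Blast hit by the query's genomic span on each side.

    Coordinates are 0-based, half-open, and clipped to the chromosome.
    This is an assumed reading of the "seed extension 2X" setting.
    """
    start = max(0, hit_start - query_genomic_span)
    end = min(chrom_length, hit_end + query_genomic_span)
    return start, end


def passes_coverage(aligned_query_bases, query_length, min_coverage_pct=70):
    """True if the alignment covers at least min_coverage_pct of the query."""
    # Integer arithmetic avoids floating-point edge cases at exactly 70%.
    return aligned_query_bases * 100 >= min_coverage_pct * query_length


# Hypothetical example: a 500 bp anchor for a query locus spanning 5000 bp.
window = extension_window(1000, 1500, 5000, 100_000)
print(window)
print(passes_coverage(700, 1000), passes_coverage(699, 1000))
```

The left edge is clipped at the chromosome start, so the window can be shorter than the nominal anchor plus twice the query span.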

  3. PipeR: a pipeline for mapping lncRNAs • blast-exonerate based framework to map lncRNAs against target genomes • algorithm used (flow diagram): lncRNA, Blast hits, chromosome mapping, extension, Exonerate, spliced transcript

  4. GENCODE lncRNAs vs Complete Genomes. PipeR: lncRNA Homology Mapping • Anchor points: ENCODE vs Mouse with tuned Blast • Extension: Exonerate • Filtering: Id and Coverage • Validation of the GFF annotation: Overlap with Annotation, Overlap with Cufflinks Models, RPKM on target genome • Further Mapping: Parameter Space Exploration using Experimental Evidences. GFF File. Notredame, Bussotti

  5. Mapping overview (figure). Query species: Gene A, Gene B; Transcript 1, Transcript 2, Transcript 3. Target species: Homolog 1, Homolog 2, Homolog 3, Homolog 4. Figure labels: Multiple Homologues, Blast/Exonerate failed, Best reciprocal, Conserved exon number, High repeat coverage, Overlap with protein

  6. GENCODE v10 vs human genome • mapped 17327 transcripts out of 17547 • many lncRNAs found in multiple copies (lncRNA families) - found 144566 homologs corresponding to 501355 exons • Annotations of discovered homologs are readily available
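
The counts on this slide can be turned into ratios with a short calculation. The sketch below uses only the four numbers reported above; the function name `mapping_summary` and the choice of ratios are illustrative assumptions, not part of PipeR.

```python
def mapping_summary(mapped, total, homologs, exons):
    """Derive simple ratios from the reported mapping counts."""
    return {
        "mapped_fraction": mapped / total,
        "homologs_per_mapped_transcript": homologs / mapped,
        "exons_per_homolog": exons / homologs,
    }


# Counts reported for GENCODE v10 lncRNAs against the human genome.
human = mapping_summary(mapped=17327, total=17547, homologs=144566, exons=501355)
for name, value in human.items():
    print(f"{name}: {value:.3f}")
```

This works out to about 98.7% of transcripts mapped, about 8.3 homologs per mapped transcript, and about 3.5 exons per homolog.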

  7. Homolog repeat coverage • About 10% of all our homolog predictions are fully covered by repeats

  8. Homolog repeat coverage • We could sub-group the homologs into 3 sets according to the repeat coverage: • <= 20% • <= 80% • <= 100%
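
The three-way grouping can be sketched as a simple binning function. This assumes the thresholds are percentages and that the sets are disjoint ranges (0 to 20, above 20 to 80, above 80 to 100); the slide does not say whether the sets are disjoint or nested. The coverage values below are made-up toy data.

```python
from collections import Counter


def repeat_bin(repeat_coverage_pct):
    """Assign a homolog to one of three repeat-coverage sets (assumed disjoint)."""
    if not 0 <= repeat_coverage_pct <= 100:
        raise ValueError("repeat coverage must be a percentage in [0, 100]")
    if repeat_coverage_pct <= 20:
        return "<=20"
    if repeat_coverage_pct <= 80:
        return "<=80"
    return "<=100"


# Toy repeat-coverage percentages, not real data.
toy_coverages = [0.0, 12.5, 20.0, 45.0, 80.0, 99.0, 100.0]
bin_counts = Counter(repeat_bin(c) for c in toy_coverages)
print(bin_counts)
```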

  9. Mapping statistics HUMAN

  10. GENCODE v10 vs mouse genome • mapped 3190 transcripts out of 17547 • representing 2249 human genes • many lncRNAs found in multiple copies (lncRNA families) - found 14936 homologs corresponding to 38910 exons • Annotations of discovered homologs are readily available

  11. Human/Mouse Exon Number Conservation • Difference between the number of exons in the human transcripts and in the mouse homologs • “0” means that the exon number is the same • Negative bins indicate mouse homologs having more exons than the human query • 1160 GENCODE v10 transcripts find at least 1 homolog in mouse with the same exon number. Axis labels: human > mouse, human < mouse
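
The histogram described on this slide can be sketched as follows, assuming the plotted quantity is the query exon count minus the homolog exon count (so negative bins mean the mouse homolog has more exons). The transcript identifiers and exon counts are toy values, and both function names are hypothetical.

```python
from collections import Counter


def exon_difference_histogram(pairs):
    """Count (query exons - homolog exons) over all query/homolog pairs."""
    return Counter(query_exons - homolog_exons for _, query_exons, homolog_exons in pairs)


def queries_with_conserved_exon_number(pairs):
    """Queries that have at least one homolog with the same exon number."""
    return {query_id for query_id, q, h in pairs if q == h}


# Toy (query id, human exon count, mouse homolog exon count) triples.
toy_pairs = [("T1", 3, 3), ("T1", 3, 5), ("T2", 4, 2), ("T3", 2, 2)]
histogram = exon_difference_histogram(toy_pairs)
conserved = queries_with_conserved_exon_number(toy_pairs)
print(sorted(histogram.items()), sorted(conserved))
```

Because the mapping is one2many, a query counts as conserved as soon as any one of its homologs has the same exon number, which matches the "at least 1 homolog" wording above.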

  12. Homolog repeat coverage • We could sub-group the homologs into 3 sets according to the repeat coverage: • <= 20% • <= 80% • <= 100%

  13. Mapping statistics MOUSE. Best Candidates: There are 148 transcripts that have < 20% repeat coverage, conserved exon structure, do not overlap protein coding exons and are best reciprocal homologs with the human queries
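
The four best-candidate criteria can be combined into a single predicate. This is a minimal sketch under stated assumptions: "conserved exon structure" is read as an equal exon number in query and homolog, and the `Homolog` record with its field names is a hypothetical data model, not PipeR's output format.

```python
from dataclasses import dataclass


@dataclass
class Homolog:
    """Hypothetical record for one query/homolog pair."""
    query_id: str
    repeat_coverage_pct: float
    query_exons: int
    homolog_exons: int
    overlaps_coding_exon: bool
    best_reciprocal: bool


def is_best_candidate(h, max_repeat_pct=20.0):
    """Apply the four criteria listed on the slide to one homolog."""
    return (
        h.repeat_coverage_pct < max_repeat_pct   # < 20% repeat coverage
        and h.query_exons == h.homolog_exons     # assumed meaning of conserved structure
        and not h.overlaps_coding_exon           # no protein coding exon overlap
        and h.best_reciprocal                    # best reciprocal homolog
    )


def best_candidate_queries(homologs):
    """Query transcripts with at least one homolog passing every criterion."""
    return {h.query_id for h in homologs if is_best_candidate(h)}


# Toy records, not real data.
toy = [
    Homolog("T1", 5.0, 3, 3, False, True),
    Homolog("T2", 20.0, 3, 3, False, True),   # fails: coverage is not below 20%
    Homolog("T3", 5.0, 3, 4, False, True),    # fails: exon number differs
    Homolog("T4", 5.0, 2, 2, True, True),     # fails: overlaps a coding exon
]
print(best_candidate_queries(toy))
```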

  14. GENCODE lncRNAs vs Complete Genomes. PipeR: lncRNA Homology Mapping • Anchor points: ENCODE vs Mouse with tuned Blast • Extension: Exonerate • Filtering: Id and Coverage • Validation of the GFF annotation: Overlap with Annotation, Overlap with Cufflinks Models, RPKM on target genome • Further Mapping: Parameter Space Exploration using Experimental Evidences. GFF File. Notredame, Bussotti

  15. BlastR vs The World

  16. BlastR vs The World

  17. Figure 2: Exon read support. a) Venn diagram indicating the number of exons detected by different methods (numbers in parentheses: blastn 8749, blastnOpt 12487, blastr 12093) and their intersection (all: 7492; transcripts annotated identically by the three methods). Panels b) and c): average amount of reads per exon; percent of reads covered by at least one exon

  18. Part 2: Ensembl.v65 lncRNAs screening vs human and mouse genomes. Strategy: PipeR one2many homolog assignment. Template: PipeR Parameters: Blast - Freyhult parametrization - Lower case masking - Low complexity masking. Exonerate - est2genome model - 70% coverage required - seed extension 2X (the span of the genomic size of the query on both sides)

  19. Ensembl.v65 vs human genome • mapped 1187 transcripts out of 5669 • many lncRNAs found in multiple copies (lncRNA families) - found 13193 homologs corresponding to 46770 exons • Annotations of discovered homologs are readily available

  20. Ensembl.v65 vs mouse genome • mapped 5622 transcripts out of 5669 • many lncRNAs found in multiple copies (lncRNA families) - found 41005 homologs corresponding to 121515 exons • Annotations of discovered homologs are readily available

  21. Mouse/Human Exon Number Conservation • Difference between the number of exons in the mouse transcripts and in the human homologs • “0” means that the exon number is the same • Negative bins indicate human homologs having more exons than the mouse query • 481 Ensembl.v65 transcripts find at least 1 homolog in human with the same exon number. Axis labels: mouse < human, mouse > human

  22. Homolog repeat coverage • No peak of homolog predictions fully covered by repeats was observed

  23. Ensembl.v65 and GENCODE v10 repeat coverage • Input lncRNA datasets have similar repeat distributions

  24. Mapping statistics HUMAN MOUSE

  25. Part 3: GENCODE v10 lncRNA coding potential check. Strategies: 1) GeneID ORF score comparison between mRNAs and lncRNAs 2) BlastX against human proteins (Ensembl 65) 3) Overlap with protein coding gene exon annotations (GENCODE v10) 4) PipeR filtering routines

  26. 1) ORF scores as returned by GeneID 2) BlastX against human proteins indicates that 1202 GENCODE v10 lncRNAs match proteins. Parameters: seg low complexity filtering, repeat filtering, evalue 10e-10, search just the plus strand. Human Ensembl 65 protein set

  27. 3) • Checked the overlap between GENCODE v10 lncRNA exons and GENCODE v10 protein coding exons. - Found 846 lncRNAs having at least one exon overlapping with a protein coding gene exon. Example 1, Example 2
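
The exon overlap check can be sketched as an interval comparison per chromosome. This assumes 0-based, half-open coordinates and ignores strand, since the slide does not say whether overlaps were strand-specific. The tuple layout, function names and toy coordinates are illustrative assumptions.

```python
from collections import defaultdict


def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def lncrnas_overlapping_coding(lnc_exons, coding_exons):
    """Return ids of lncRNAs with at least one exon overlapping a coding exon.

    lnc_exons: iterable of (transcript_id, chrom, start, end)
    coding_exons: iterable of (chrom, start, end)
    """
    coding_by_chrom = defaultdict(list)
    for chrom, start, end in coding_exons:
        coding_by_chrom[chrom].append((start, end))

    hits = set()
    for transcript_id, chrom, start, end in lnc_exons:
        for c_start, c_end in coding_by_chrom.get(chrom, ()):
            if intervals_overlap(start, end, c_start, c_end):
                hits.add(transcript_id)
                break
    return hits


# Toy coordinates, not real annotation.
toy_lnc = [
    ("lnc1", "chr1", 100, 200),
    ("lnc1", "chr1", 300, 400),
    ("lnc2", "chr1", 500, 600),
    ("lnc3", "chr2", 100, 200),
]
toy_coding = [("chr1", 350, 450), ("chr1", 600, 700)]
print(lncrnas_overlapping_coding(toy_lnc, toy_coding))
```

The pairwise scan is quadratic per chromosome; for full annotation sets a sorted sweep or an interval tree would be the more practical choice.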

  28. 4) Extensive filtering • 7813 GENCODE v10 transcripts passed *ALL* PipeR filtering routines • Filtering rules: - overlap with protein coding exons - GeneID ORF score similar to the ones of mRNA - BlastX to UniProt database (50% redundancy) - BlastX to nr database - rpsBlast to Pfam domain families - Blast against Rfam
