
Using cDNA sequence quality value to improve cDNA-genomic sequence alignment

Chaochun Wei, Lab Meeting, 10/19/2005



  1. Using cDNA sequence quality value to improve cDNA-genomic sequence alignment. Chaochun Wei, Lab Meeting, 10/19/2005

  2. Motivation
     • High-quality spliced alignments are critical to many bioinformatics applications (e.g., MGC clone quality validation, TWINSCAN_EST).
     • Currently, reliable spliced alignments require:
       • the same organism
       • a low sequencing error rate

  3. pairHMM Using Quality Sequence
     • Input:
       • Genomic sequence
       • cDNA sequence
       • cDNA quality value sequence
     • Output:
       • Spliced alignment of the cDNA and genomic sequences

  4. An Example of Quality Value Sequence

      >gnl|ti|154040434 name:UI-H-EI1-ayz-b-20-0-UI.s1
      8 9 11 18 19 29 29 34 29 32 32 27 27 26 20 26 27 33 33 39
      39 33 33 29 32 29 29 37 37 37 40 40 40 40 40 40 37 51 51 35
      35 35 35 35 33 29 29 30 30 30 35 35 45 40 40 37 37 46 46 46
      56 40 40 40 40 51 51 51 51 51 51 56 56 56 56 56 56 56 56 56
      56 56 56 56 56 56 56 56 56 56 40 40 35 35 35 35 35 35 35 35
      37 37 37 42 40 40 40 46 46 42 42 56 48 48 56 42 46 46 40 35
      40 45 37 40 51 51 51 51 51 42 51 56 56 42 40 40 40 40 35 42
      31 31 14 15 23 33 40 46 40 40 40 40 40 40 42 42 56 48 44 44
      44 42 40 37 37 37 40 42 42 35 35 35 42 42 42 42 42 44 56 42
      42 40 35 26 25 15 15 25 27 42 44 48 37 37 35 35 35 29 37 33
      33 33 32 29 29 27 27 29 24 22 25 40 40 29 29 29 25 25 29 29
      40 40 40 40 34 34 33 34 40 40 40 40 32 32 18 18 18 31 24 20
      13 18 18 25 31 40 29 29 29 29 13 11 9 11 11 12 12 25 27 27
      28 24 25 23 29 29 29 29 28 32 23 19 22 23 27 27 32 32 29 29
      32 34 40 40 40 40 40 40 40 46 44 44 40 27 27 25 19 19 23 28
      36 40 31 29 22 22 25 19 19 16 16 16 25 22 21 21 29 30 29 29
      29 32 27 25 22 22 25 27 29 22 25 27 24 20 13 4 0 4 13 13
      13 11 13 13 13 19 15 15 24 24 16 22 29 29 25 29 27 27 27 27
      29 25 29 34 29 29 25 26 36 26 25 18 13 13 18 27 13 12 9 9
      8 16 20 24 29 23 30 21 24 24 29 25 24 16 20 21 11 11 17 18
      24 25 19 14 21 11 7 6 6 7 6 6 10 12 10 15 15 9 9 10
      18 16 18 14 12 20 20 21 14 11 10 8 12 13 15 15 10 11 15 8
      14 13 10 10 10 9 9 8 9 8 8 8 6 6 9 9 9 8 6 6
      4 0 4 6 8 8 8 8 15 4 0 4 13 13 9 8 8 7 6 6
      6 7 7 8 8 8 9 8 7 7 10 7 7 9 10 8 6 6 6 7
      6 6 7 7 4 0 4 6 6 4 0 4 6 4 7 7 8 7 7 7
      4 0 4 4 4 4 8 4 0 4 7 7 7 7 8 7 7 8 8 8
      7 7 8 4 0 4 6 6 6 8 6 4 6 4 0 4 7 7 7 7
      4 0 4 7 7
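A quality record like the one on slide 4 can be read with a few lines of Python. The sketch below is illustrative only: the function names `parse_qual_record` and `phred_to_error_prob` are made up for this example, and it assumes the values are Phred-scaled (the deck cites the Phred paper on slide 8), so that a quality value q corresponds to a base-call error probability of 10^(-q/10).

```python
from typing import List, Tuple


def parse_qual_record(text: str) -> Tuple[str, List[int]]:
    """Split a FASTA-style quality record into (header, quality values)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    header = lines[0][1:]
    # Quality values are whitespace-separated integers, one per base call.
    quals = [int(tok) for ln in lines[1:] for tok in ln.split()]
    return header, quals


def phred_to_error_prob(q: int) -> float:
    """Assumed Phred convention: q = -10 * log10(error probability)."""
    return 10.0 ** (-q / 10.0)


# First two rows of the record shown on slide 4.
record = """>gnl|ti|154040434 name:UI-H-EI1-ayz-b-20-0-UI.s1
8 9 11 18 19 29 29 34 29 32 32 27 27 26 20 26 27 33 33 39
39 33 33 29 32 29 29 37 37 37 40 40 40 40 40 40 37 51 51 35
"""
header, quals = parse_qual_record(record)
error_probs = [phred_to_error_prob(q) for q in quals]
```

Under this convention the first base (q = 8) has an error probability of roughly 0.16, while the q = 40 bases are wrong about once in 10,000 calls, which is why the low-quality tail of the read is the part most likely to produce spurious mismatches.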

  5. PairHMM using Quality Value Sequence (state diagram; surviving state labels: Begin, End)

  6. Graphical Model for States in PairHMM with Quality Value Sequence
     Panels: Model; Null-Model (nodes in each: RG, EG, qual, EC)
     • RG: Genomic sequence
     • EG: EST/cDNA sequence
     • EC: EST base call
     • qual: Quality value

  7. Graphical Model of the States in PairHMM with Sequence Quality Value
     Panels: Model; Null-Model; Score

  8. Initial Parameter Estimation
     • From Phred paper:
     • From dbSNP human data: Pr(RG|EG)
     • From human genome: Pr(RG)
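To make concrete how a quality value could enter a match-state score, here is a minimal sketch. It is not the parameterization from the talk. It assumes the Phred relation (error probability 10^(-q/10)), an error spread evenly over the three other bases, a symmetric substitution model between the genomic base RG and the true cDNA base EG with a hypothetical `divergence` of 0.001, uniform background frequencies, and a null model in which the base call EC is independent of RG. All function and parameter names are invented for illustration.

```python
import math

BASES = "ACGT"


def p_call_given_true(ec: str, eg: str, q: int) -> float:
    """Pr(EC | EG, qual) under the assumed Phred error model."""
    e = 10.0 ** (-q / 10.0)
    e = min(e, 0.75)  # assumption: at q = 0 the call is uniform, not "surely wrong"
    return 1.0 - e if ec == eg else e / 3.0


def p_true_given_genomic(eg: str, rg: str, divergence: float) -> float:
    """Hypothetical symmetric substitution model between RG and EG."""
    return 1.0 - divergence if eg == rg else divergence / 3.0


def match_log_odds(rg: str, ec: str, q: int, divergence: float = 0.001) -> float:
    """Log-odds of model vs. null model for one aligned (RG, EC, qual) column."""
    background = 0.25  # assumed uniform Pr(RG)
    # Model: sum over the unobserved true cDNA base EG.
    p_model = background * sum(
        p_true_given_genomic(eg, rg, divergence) * p_call_given_true(ec, eg, q)
        for eg in BASES
    )
    # Null model: the base call is independent of the genomic base.
    p_null = background * 0.25
    return math.log(p_model / p_null)
```

With these assumptions a mismatch at q = 40 is penalized far more heavily than a mismatch at q = 5, and a column with q = 0 scores zero, so low-quality base calls have little influence on the alignment.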

  9. Initial Parameter Estimation

  10. Data Sets
      • NCBI35 Chr20, 21, 22
      • Human reads (9/15/2005) aligned to Chr20, 21, and 22 by BLAT (23,753 ESTs in total)

  11. Results

      Program  INTRON#  INTbases   GGAP#  GGbases  EGAP    EGbases   MATCHbase  MISMATCHbase  EXPLAINED
      qpair    40286    138858553  14502  23940    25945   29386     16158984   249919        13376
      est2gen  35785    229102950  68700  78753    282099  296870    16674021   384138        12591
      sim4     40348    476722905  56903  292246   106106  27211366  15347251   254234        12536

      Considering mismatches with quality value <= 5 as explained:

      Program  INTRON#  INTbases   GGAP#  GGbases  EGAP    EGbases   MATCHbase  MISMATCHbase  EXPLAINED
      qpair    40286    138858549  14502  23940    25945   29386     16158984   249919        39683
      est2gen  35785    229102946  68700  78753    282099  296870    16674021   384138        57544
      sim4     40348    476722901  56903  292246   106106  27211366  15347251   254234        38794
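The quality-threshold rule used for the second set of numbers can be sketched as follows. This covers only the "quality value <= 5" criterion stated on the slide; what else counts toward EXPLAINED is not given in the transcript, so the helper below (a hypothetical `count_mismatches`, operating on made-up alignment columns) should be read as an illustration rather than a reproduction of the evaluation.

```python
from typing import Iterable, Tuple


def count_mismatches(
    columns: Iterable[Tuple[str, str, int]], max_explained_qual: int = 5
) -> Tuple[int, int]:
    """Return (mismatches, mismatches whose quality is <= the threshold).

    Each column is (genomic base, cDNA base call, quality value).
    """
    mismatches = 0
    explained = 0
    for rg, ec, q in columns:
        if rg != ec:
            mismatches += 1
            if q <= max_explained_qual:
                explained += 1
    return mismatches, explained


# Made-up aligned columns, for illustration only.
columns = [("A", "A", 40), ("C", "T", 4), ("G", "G", 35), ("T", "A", 30), ("A", "N", 0)]
result = count_mismatches(columns)
```

Here three columns are mismatches and two of them fall at or below the threshold, so only the q = 30 mismatch would remain unexplained under this rule.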
