Trace recalling on MGC traces

Dec 21 2005


Why?

  • There are almost certainly alternatively spliced targets in the MGC set that we would like to find

  • Trace recalling might recover some additional hits and confirmed hits, because it aligns the ambiguity sequence to the genome


How?

  • Pipeline begins with .blat files generated by Mike

    • Result of BLATing each MGC trace (or the assembled fwd/rev reads) to the human genome

    • Represent a set of loci from which trace sequence could have originated
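As a rough illustration of this starting point, the sketch below groups BLAT alignments by trace. It assumes the .blat files are in BLAT's default PSL layout (21 tab-separated columns), which the slides do not state; `BlatHit`, `parse_psl_line` and `candidate_loci` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class BlatHit:
    q_name: str   # trace (query) name
    t_name: str   # chromosome (target) name
    strand: str
    t_size: int   # chromosome length
    t_start: int  # 0-based start on the chromosome
    t_end: int    # end on the chromosome (exclusive)


def parse_psl_line(line: str) -> BlatHit:
    """Parse one PSL data line (assumed format, standard column order)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    return BlatHit(q_name=f[9], t_name=f[13], strand=f[8],
                   t_size=int(f[14]), t_start=int(f[15]), t_end=int(f[16]))


def candidate_loci(lines: Iterable[str]) -> Dict[str, List[BlatHit]]:
    """Group alignments by trace: each trace maps to its candidate loci."""
    loci: Dict[str, List[BlatHit]] = {}
    for line in lines:
        # PSL header lines do not start with a digit; data lines do.
        if not line.strip() or not line[0].isdigit():
            continue
        hit = parse_psl_line(line)
        loci.setdefault(hit.q_name, []).append(hit)
    return loci
```

Each list in the returned dictionary is one trace's set of loci from which its sequence could have originated.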


How?

  • Extract BLAT aligned sequence + 1000 bp flanking sequence from human genome

  • Run trace recalling between each trace and the corresponding extracted genomic loci

  • Adjust the score of the first alignment (ambiguity sequence to genome) by adding back the intron penalties

    • This lessens the bias toward processed pseudogenes, which align without introns and so incur no intron penalty
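A minimal sketch of the two numeric steps on this slide: padding the BLAT-aligned interval by 1000 bp and adding intron penalties back to the alignment score. It assumes 0-based, end-exclusive coordinates and a fixed penalty per intron; the slides do not describe the real scoring scheme, and the function names are hypothetical.

```python
FLANK = 1000  # bp of flanking genomic sequence, as stated on the slide


def padded_window(t_start: int, t_end: int, chrom_size: int,
                  flank: int = FLANK) -> tuple:
    """Extend the aligned interval by `flank` bp, clipped to the chromosome."""
    return max(0, t_start - flank), min(chrom_size, t_end + flank)


def adjusted_score(raw_score: float, n_introns: int,
                   intron_penalty: float) -> float:
    """Add back what the aligner subtracted for introns.

    Assumption: a fixed positive penalty was subtracted once per intron.
    """
    return raw_score + n_introns * intron_penalty
```

Under this assumption a spliced locus with three introns regains three penalties, while an intronless pseudogene copy regains nothing, so the two compete on matched bases alone.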


How?

  • Select “correct locus” as the locus that aligns with the highest adjusted score

    • For the rest of the analysis this is the only locus that is considered

  • Apply the hit criteria to the first-alignment file for the correct locus

    • Spliced alignment (at least 1 intron)

    • > 60% of splice sites have at least 8 matches in a 10 bp window around the splice site

    • Overall percent identity > 75%
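The locus selection and the three hit criteria above can be sketched as follows. The `LocusAlignment` record and its fields are hypothetical; the sketch assumes the per-splice-site match counts (within the 10 bp window) and the percent identity have already been computed from the alignment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocusAlignment:
    locus_id: str
    adjusted_score: float             # score after adding back intron penalties
    percent_identity: float           # overall percent identity, 0-100
    n_introns: int
    splice_window_matches: List[int]  # matches in the 10 bp window, per splice site


def select_correct_locus(alignments: List[LocusAlignment]) -> LocusAlignment:
    """The 'correct locus' is the one with the highest adjusted score."""
    return max(alignments, key=lambda a: a.adjusted_score)


def is_hit(aln: LocusAlignment, min_matches: int = 8,
           min_good_fraction: float = 0.60,
           min_identity: float = 75.0) -> bool:
    """Apply the three hit criteria from the slide."""
    if aln.n_introns < 1 or not aln.splice_window_matches:
        return False  # not a spliced alignment
    good = sum(1 for m in aln.splice_window_matches if m >= min_matches)
    good_fraction = good / len(aln.splice_window_matches)
    return good_fraction > min_good_fraction and aln.percent_identity > min_identity
```

Both thresholds are treated as strict inequalities, following the ">" signs on the slide.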


How?

  • Last step classifies each trace as a hit or a non-hit

  • Lift the alignment coordinates from the extracted genomic fragment back to genome coordinates

  • A hit becomes a confirmed hit if it overlaps the targeted predicted gene by at least 1 bp
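The lift-over and the confirmation test reduce to simple interval arithmetic. The sketch below assumes 0-based, end-exclusive intervals, a fragment extracted from the forward strand, and that the hit and the predicted gene are on the same chromosome; none of these details are given in the slides, and the function names are hypothetical.

```python
def lift_to_genome(frag_start: int, frag_end: int, window_start: int) -> tuple:
    """Shift fragment-relative coordinates by the fragment's genomic start."""
    return window_start + frag_start, window_start + frag_end


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two end-exclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def is_confirmed(hit_interval: tuple, gene_interval: tuple) -> bool:
    """A hit is confirmed if it overlaps the targeted gene by at least 1 bp."""
    return overlap_bp(*hit_interval, *gene_interval) >= 1
```

For example, an alignment at 100-400 within a fragment that starts at genomic position 5000 lifts to 5100-5400, and it is confirmed against a predicted gene spanning 5399-6000 because the two share one base.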


How?

  • As part of trace recalling, each read/genomic fragment pair is flagged if an alternate splice is observed

    • Compare alignment of ambiguity sequence and alignment of recalled sequence to determine if there is an alternate splice

    • example
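One simplified way to picture the comparison is to derive the introns implied by each alignment and check whether they differ within the region both alignments cover. This is only an illustrative sketch, not the flagging logic actually used in trace recalling; it assumes each alignment is given as a list of (start, end) exon blocks on the genomic fragment, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (start, end) of an aligned exon block, end-exclusive


def introns_from_exons(exons: List[Block]) -> Set[Block]:
    """Gaps between consecutive exon blocks are treated as introns."""
    ordered = sorted(exons)
    return {(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)}


def has_alternate_splice(ambig_exons: List[Block],
                         recalled_exons: List[Block]) -> bool:
    """Flag a difference in introns inside the span both alignments cover."""
    start = max(min(s for s, _ in ambig_exons), min(s for s, _ in recalled_exons))
    end = min(max(e for _, e in ambig_exons), max(e for _, e in recalled_exons))

    def inside(introns: Set[Block]) -> Set[Block]:
        # Keep only introns fully contained in the shared span.
        return {(s, e) for s, e in introns if s >= start and e <= end}

    return inside(introns_from_exons(ambig_exons)) != inside(
        introns_from_exons(recalled_exons))
```

Restricting the comparison to the shared span avoids flagging a read merely because one of its two alignments stops early.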


How?

  • Analysis splits at this point between:

    • Comparing hit, confirmed hit, non-hit status of reads to original pipeline

    • Trying to find alternate splices in the whole set of traces


Results

  • Comparison of hit, confirmed hit, non-hit status*

    • 120 experiments went from non-hit to confirmed hit

    • 37 experiments went from non-confirmed hit to confirmed hit

      * this part isn’t quite done and some of the non-hit → confirmed hit cases look a little funny


Results

  • Finding alternate splices

  • Trace recalling identifies 622 alternate splicing events in the MGC set

    • Retained intron: 148

    • Alt 3’ ss: 40

    • Alt 5’ ss: 36

    • Alt splice both sides: 56

    • Alternate exon: 103

    • Clean alternate exon: 189

    • Mutex exon: 26

    • Clean mutex exon: 23


    • The projector in Bryan 509 working: priceless


Results

  • 288 of these are what I consider the “hard” altsplices to get (clean alt exon, clean mutex exon, individual 3’ or 5’ splice sites)

  • Wanted to validate these predictions somehow

    • Would normally check them against a known gene, but if there were a known gene it wouldn’t be an MGC target!


Results

  • Look at cases where the same type of altsplice is observed on both reads

  • There were a total of 72 experiments in which the same altsplice is observed on both reads (high confidence altsplices)

  • Example


Exact same alternate exon represented in both reads
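The high-confidence rule (the same type of altsplice observed on both reads of an experiment) can be sketched as a set intersection. The input layout here is an assumption: a dictionary from experiment ID to the sets of altsplice types flagged on the forward and reverse reads, with hypothetical names throughout.

```python
from typing import Dict, Set, Tuple


def high_confidence_altsplices(
        flags: Dict[str, Tuple[Set[str], Set[str]]]) -> Dict[str, Set[str]]:
    """Keep experiments where fwd and rev reads share an altsplice type.

    `flags` maps experiment ID -> (types on fwd read, types on rev read).
    """
    confirmed = {}
    for experiment, (fwd_types, rev_types) in flags.items():
        shared = fwd_types & rev_types
        if shared:
            confirmed[experiment] = shared
    return confirmed
```

An experiment whose reads are flagged with different types, such as "alternate exon" on one read and "clean alternate exon" on the other, is left out by this rule.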


Results

  • Breakdown of validated altsplices by type

    • Clean alternate exon: 39

    • Alternate exon: 6

    • Retained intron: 16

    • Alt 5’ ss: 3

    • Alt 3’ ss: 6

    • Clean mutex exon: 2


Results

  • Flagged altsplices which were not validated (low confidence altsplices) could be:

    • Mistakes

    • Reads didn’t overlap

    • Didn’t see both sides of an alternate splice

    • One good read and one read that totally failed

    • Might be slightly different types (e.g., a clean alternate exon and an alternate exon)

  • Examples


A slight misalignment causes one read to be flagged as an “alternate exon” and the other to be flagged as a “clean alternate exon”… the black one is probably right


Recalled sequence picks up where you would expect if the single trace part were corrupted by noise

12 bp alternate splice site


Results

  • Looked at 40 examples of low confidence hits

    • 26 of them looked like they fell into one of the last 3 categories from before

    • 14 looked like actual miscalled alternate splices


To do

  • Modification to trace recalling which might clean up the alignments a bit more

  • Define something like the hit criteria for MGC alignments to take into account the number of matches in the trace recalling alignments (look at old E-value stuff)

