Detection of Transcription Factor Binding Sites

Michael Morra, CSE 4939W. Project recap: implement a method that accurately and precisely discovers the locations of transcription factor binding sites within a DNA sequence, covering 4 species (human, mouse, fruit fly, and yeast).

Presentation Transcript


  1. Detection of Transcription Factor Binding Sites • Michael Morra • CSE 4939W

  2. Project Recap • Implement a method that accurately and precisely discovers the locations of transcription factor binding sites within a DNA sequence. • 4 species (human, mouse, fruit fly, and yeast) • 52 transcription factors, 524 binding sites • Image from: http://www.cs.uiuc.edu/homes/sinhas/work.html

  3. Multiple Sequence Alignment • To analyze the data effectively, each transcription factor's binding sites need to be aligned. • Tool: ClustalW2 • Example sequences:
     >s1 GACTTTTCGCT
     >s2 CGATTTTCTCG
     >s3 GCATTTTCCCA
     >s4 AGAGAAAACCC
     >s5 GAATAACCCAAGAGAAA
     >s6 ACAGAAAAATC
     >s7 CGAGAAAATCG
     >s8 TGGTTTTCCCG
     >s9 GGGTTTCTCCC

  4. Scoring • Berg and von Hippel method • l = length of the sequence to be scored • j = position in the sequence • nj = number of times the base tj occurs at position j in the alignment • tj = base at position j in the sequence to be scored • nj(0) = number of times the most common base occurs at position j
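The formula itself is not part of the transcript, only its symbol definitions. As a hedged sketch, one commonly cited form of a Berg and von Hippel style score sums, over positions j = 1..l, the log ratio ln((nj + p) / (nj(0) + p)). The pseudocount p = 0.5, the sign convention (consensus scores 0, everything else is negative, higher is better), and the names `Column`, `CountMatrix`, `baseIndex`, and `bvhScore` are assumptions made for illustration and may differ from what the project used.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Counts of A, C, G, T at one alignment column (hypothetical layout).
using Column = std::array<int, 4>;
using CountMatrix = std::vector<Column>;

// Map a base to its index in a Column; -1 for anything else (gap, N, ...).
int baseIndex(char b) {
    switch (b) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Assumed form: sum over j of ln((n_j + p) / (n_j(0) + p)),
// where n_j is the count of the observed base t_j at column j and
// n_j(0) is the count of the most common base at column j.
double bvhScore(const CountMatrix& counts, const std::string& seq,
                double pseudo = 0.5) {
    assert(seq.size() == counts.size());
    double score = 0.0;
    for (std::size_t j = 0; j < seq.size(); ++j) {
        const Column& col = counts[j];
        const int nMax = *std::max_element(col.begin(), col.end());
        const int idx = baseIndex(seq[j]);
        const int nObs = (idx >= 0) ? col[idx] : 0;  // unknown base: unseen
        score += std::log((nObs + pseudo) / (nMax + pseudo));
    }
    return score;
}
```

Under these assumptions a sequence that matches the most common base in every column scores exactly 0, and each mismatch lowers the score by an amount that grows with how rare the observed base is in that column.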

  5. Implementation • Microsoft Visual Studio, C++ • Input: a multiple sequence alignment of a transcription factor's binding sites (.txt file); all binding sites of a species (.txt file) • Output: scores; results of leave-one-out cross validation (for testing and efficiency purposes)

  6. Implementation • Scoring Algorithm: takes the alignment as input and creates the scoring matrix. • Leave One Out Cross Validation: takes the alignment and the binding sites as input and tests the effectiveness of the scoring matrix.
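A minimal sketch of the "create the scoring matrix" step, assuming the matrix is a table of per-column base counts taken from an alignment whose rows all have the same length. The type and function names (`Column`, `CountMatrix`, `buildCountMatrix`) are hypothetical, and treating '-' as a skipped gap character is an assumption about the alignment file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts of A, C, G, T at one alignment column (hypothetical layout).
using Column = std::array<int, 4>;
using CountMatrix = std::vector<Column>;

// Tally how often each base occurs in every column of the alignment.
// Gap characters and any other symbols are skipped.
CountMatrix buildCountMatrix(const std::vector<std::string>& aligned) {
    assert(!aligned.empty());
    const std::size_t width = aligned.front().size();
    CountMatrix counts(width, Column{0, 0, 0, 0});
    for (const std::string& row : aligned) {
        assert(row.size() == width);  // rows must already be aligned
        for (std::size_t j = 0; j < width; ++j) {
            switch (row[j]) {
                case 'A': case 'a': ++counts[j][0]; break;
                case 'C': case 'c': ++counts[j][1]; break;
                case 'G': case 'g': ++counts[j][2]; break;
                case 'T': case 't': ++counts[j][3]; break;
                default: break;  // gap or ambiguous symbol
            }
        }
    }
    return counts;
}
```

Keeping raw counts rather than precomputed log ratios makes it easy to change the pseudocount or the scoring formula later without rebuilding the matrix.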

  7. Functionality • If the sequence to be scored is shorter than the alignment: slide the sequence over the alignment and take the highest-scoring portion. • If the sequence to be scored is longer than the alignment: slide the alignment over the sequence and take the highest-scoring portion.
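The two sliding cases can be handled by one loop, sketched below under a few assumptions: only fully overlapping placements are tried, the per-position score is supplied by the caller, and the name `bestWindowScore` and the callback signature are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>

// posScore(j, base) returns the score of `base` at alignment column j
// (hypothetical callback, e.g. one term of the scoring formula).
double bestWindowScore(
    std::size_t alignLen, const std::string& seq,
    const std::function<double(std::size_t, char)>& posScore) {
    assert(alignLen > 0 && !seq.empty());
    const std::size_t window = std::min(alignLen, seq.size());
    const std::size_t shifts = std::max(alignLen, seq.size()) - window + 1;
    const bool seqIsShorter = seq.size() < alignLen;
    double best = -std::numeric_limits<double>::infinity();
    for (std::size_t s = 0; s < shifts; ++s) {
        double total = 0.0;
        for (std::size_t k = 0; k < window; ++k) {
            // Shorter sequence: shift it along the alignment columns.
            // Longer sequence: shift the alignment along the sequence.
            const std::size_t col = seqIsShorter ? s + k : k;
            const std::size_t pos = seqIsShorter ? k : s + k;
            total += posScore(col, seq[pos]);
        }
        best = std::max(best, total);
    }
    return best;
}
```

Note that a short sequence is summed over fewer columns than a full-length one, so scores from the two cases are not directly comparable unless they are normalized in some way; the slides do not say how this was handled.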

  8. Testing: Scoring Algorithm/LOOCV • Unit testing will be done on each function and on critical portions of code as they are implemented. • Once the code is confirmed to function correctly and all formulas give correct results, implementation can continue.

  9. Testing: Overall Performance • To determine the effectiveness of the algorithm, a cross validation technique is used. • This technique leaves one binding site out when the multiple sequence alignment is performed, then scores the left-out sequence. • If the algorithm is effective, the left-out sequence should score higher than the majority (>80-90%) of the other binding sites within that species.
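The bookkeeping around this test can be sketched as two small helpers: one that forms the training set with a single site held out, and one that reports what fraction of the other binding sites the held-out site outscores. Realigning the remaining sites and scoring them are left to the caller. The names `leaveOut` and `fractionOutscored`, and the choice to count only strictly lower scores, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copy of `sites` with the entry at index `i` removed; the result is
// what would be realigned to build the scoring matrix for this fold.
std::vector<std::string> leaveOut(const std::vector<std::string>& sites,
                                  std::size_t i) {
    assert(i < sites.size());
    std::vector<std::string> rest;
    rest.reserve(sites.size() - 1);
    for (std::size_t k = 0; k < sites.size(); ++k) {
        if (k != i) rest.push_back(sites[k]);
    }
    return rest;
}

// Fraction of `otherScores` that are strictly lower than the score of
// the held-out site. Ties do not count in the held-out site's favor.
double fractionOutscored(double heldOutScore,
                         const std::vector<double>& otherScores) {
    assert(!otherScores.empty());
    std::size_t lower = 0;
    for (double s : otherScores) {
        if (s < heldOutScore) ++lower;
    }
    return static_cast<double>(lower) /
           static_cast<double>(otherScores.size());
}
```

A fold would then pass the slide's criterion when `fractionOutscored` returns a value above the chosen threshold, somewhere in the 0.8 to 0.9 range mentioned above.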

  10. Progress • Alignments: complete • Scoring Algorithm: mostly complete • Leave One Out Cross Validation: partially complete

  11. Remaining Schedule • Nov 15th – Nov 19th: finish implementation and testing of the scoring algorithm • Nov 20th – Nov 29th: finish implementation of the leave-one-out algorithm; begin testing the entire program's effectiveness • Nov 30th – Dec 6th: complete testing; tweak the program to run more effectively/accurately

  12. Questions?
