Finding approximate palindromes in genomic sequences
Project goals:
Implement an algorithm for finding approximate palindromes in genomic sequences.
Use the algorithm to create “palindrome fingerprints”.
A palindrome is a region of sequence that, read left to right, is complementary to the same region read right to left (A matches T, and C matches G).
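The definition above can be checked directly: a region is an exact palindrome when it equals its own reverse complement. The following is a minimal sketch; the function names `reverse_complement` and `is_exact_palindrome` are illustrative and not part of the project.

```python
# Complement table for the four DNA bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complement of seq, read right to left."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_exact_palindrome(seq):
    """True if seq read left to right equals its reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)

print(is_exact_palindrome("GAATTC"))   # EcoRI site: a classic DNA palindrome
print(is_exact_palindrome("GATTACA"))
```

Note that this differs from a text palindrome: `GAATTC` does not read the same backwards as a plain string, but it does once the bases are complemented.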
An approximate palindrome allows mismatches and a gap.
“Palindrome fingerprints”: each DNA sequence has its own unique number, sizes, and locations of palindromes.
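One possible way to represent such a fingerprint is a small summary of the palindromes found in a sequence. This is only a sketch: the input format (a list of `(start, arm_length)` pairs) and the name `palindrome_fingerprint` are assumptions made for illustration, not the project's data structure.

```python
from collections import Counter

def palindrome_fingerprint(hits):
    """Summarize palindrome hits as number, size distribution, and locations.

    hits: list of (start, arm_length) pairs (hypothetical format).
    """
    return {
        "count": len(hits),                                  # how many palindromes
        "sizes": dict(Counter(arm for _, arm in hits)),      # arm length -> frequency
        "locations": sorted(start for start, _ in hits),     # start positions
    }

print(palindrome_fingerprint([(10, 4), (2, 6), (30, 4)]))
```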
p: the number of palindromes in a string of length n
1) Sequence: the genome of different organisms, as a text file in FASTA format.
2) Length of palindrome (one side).
3) Maximum gap between repeated regions.
4) Number of mismatches allowed.
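For the first input, a FASTA file consists of header lines starting with `>` followed by one or more lines of sequence. A minimal reader might look like the sketch below; `read_fasta` is an illustrative name, and it takes any iterable of lines (for example an open file) rather than assuming a particular file path.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {record_name: sequence} dict."""
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records

example = [">seq1 demo record", "ACGT", "ttaa", ">seq2", "GG"]
print(read_fasta(example))
```

With a real file this could be called as `read_fasta(open(path))`, where `path` is whatever genome file is being analyzed.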
The algorithm:
1) Search for the palindrome within a “window” of size MaxSize.
2) At each iteration, increment the size of the palindrome until MaxSize is reached.
3) Shift the window to the left.
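The steps above can be sketched as a brute-force search. This is not the project's implementation, only one reading of it under stated assumptions: `max_arm` plays the role of MaxSize, the "gap" is an unmatched spacer between the two arms, mismatches are substitutions only, and the window start is advanced one position at a time. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def count_mismatches(left, right):
    """Compare the left arm with the right arm read backwards and complemented."""
    return sum(1 for a, b in zip(left, reversed(right)) if COMPLEMENT.get(b) != a)

def find_approx_palindromes(seq, min_arm, max_arm, max_gap, max_mismatches):
    """Return (start, arm_length, gap, mismatches) for every accepted palindrome."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):                 # shift the window
        for arm in range(min_arm, max_arm + 1):   # grow the palindrome size
            for gap in range(max_gap + 1):        # try each allowed spacer length
                end = start + 2 * arm + gap
                if end > len(seq):
                    break
                left = seq[start:start + arm]
                right = seq[start + arm + gap:end]
                mm = count_mismatches(left, right)
                if mm <= max_mismatches:
                    hits.append((start, arm, gap, mm))
    return hits

print(find_approx_palindromes("GAACCTTC", 3, 3, 2, 0))
```

The running time of this sketch grows with sequence length times the number of arm and gap sizes tried, so it is meant to illustrate the search, not to scale to whole genomes.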
“Plant” an approximate palindrome in different genomes and compare the results with our expectations.
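A planting step of this kind could be sketched as follows: a chosen left arm, a random spacer, and the reverse complement of the left arm with a set number of substitutions are written into the sequence at a known position, so the finder's output can be checked against it. The function name, parameters, and seeding scheme are assumptions for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def plant_palindrome(seq, position, left_arm, gap=0, mismatches=0, seed=0):
    """Overwrite part of seq with an approximate palindrome built from left_arm."""
    rng = random.Random(seed)  # fixed seed so the planted test case is reproducible
    left_arm = left_arm.upper()
    right = list(left_arm.translate(COMPLEMENT)[::-1])
    # Introduce substitutions at distinct positions of the right arm.
    for i in rng.sample(range(len(right)), mismatches):
        right[i] = rng.choice([b for b in "ACGT" if b != right[i]])
    spacer = "".join(rng.choice("ACGT") for _ in range(gap))
    planted = left_arm + spacer + "".join(right)
    if position < 0 or position + len(planted) > len(seq):
        raise ValueError("planted palindrome does not fit in the sequence")
    return seq[:position] + planted + seq[position + len(planted):]

print(plant_palindrome("A" * 20, 5, "GAATC"))
```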
Compare the expectation from our formula with the results for several random sequences.
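The project's own formula is not reproduced here. As a stand-in for illustration only, the sketch below computes a simple baseline under an assumed model: bases are independent and uniformly distributed, so each arm position pairs correctly with probability 1/4, and a fixed arm length and gap are considered. The names `arm_match_probability` and `expected_hits` are hypothetical.

```python
from math import comb

def arm_match_probability(arm, max_mismatches, p_match=0.25):
    """Probability that two random arms pair with at most max_mismatches mismatches."""
    return sum(
        comb(arm, i) * (1 - p_match) ** i * p_match ** (arm - i)
        for i in range(max_mismatches + 1)
    )

def expected_hits(n, arm, gap, max_mismatches):
    """Expected number of accepted start positions in a random string of length n."""
    positions = max(n - (2 * arm + gap) + 1, 0)  # starts where the palindrome fits
    return positions * arm_match_probability(arm, max_mismatches)

print(expected_hits(1000, 6, 0, 1))
```

Counts observed on shuffled or randomly generated sequences can then be compared against this kind of baseline; a real genome's base composition is usually not uniform, so `p_match` would need adjusting.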
Compare the palindrome profile of different organisms and evaluate the results: