Finding approximate palindromes in genomic sequences

PowerPoint Slideshow about 'finding approximate palindromes in genomic sequences.' - omer


Presentation Transcript
Project goals
  • Implement an algorithm for finding approximate palindromes in genomic sequences.
  • Use the algorithm to create “palindrome fingerprints”.
  • Develop methods for testing the significance of specific approximate palindromes.
Background
  • Palindrome: a double-stranded DNA locus whose 5'-to-3' sequence is identical on both strands. The sequence is the same when one strand is read left to right and the other strand is read right to left.
  • Alternatively, looking at a single strand of the DNA, a palindrome is:

A region of sequence that, when read left to right, is complementary to the same region read right to left (A pairs with T, and C pairs with G).
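The single-strand definition above can be checked directly. The following is a minimal sketch, assuming an upper-case sequence over A, C, G, T; the function names `complement` and `is_dna_palindrome` are illustrative and do not come from the project.

```c
#include <assert.h>
#include <stddef.h>

/* Watson-Crick complement of an upper-case base.
 * Returns '\0' for anything else, so unknown bases never pair. */
char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return '\0';
    }
}

/* Return 1 if seq[0..len) equals its own reverse complement, else 0. */
int is_dna_palindrome(const char *seq, size_t len)
{
    assert(seq != NULL);
    if (len % 2 != 0)
        return 0; /* a middle base cannot be complementary to itself */
    for (size_t i = 0; i < len / 2; i++) {
        /* base i must pair with its mirror position */
        if (complement(seq[i]) != seq[len - 1 - i])
            return 0;
    }
    return 1;
}
```

For example, `is_dna_palindrome("GAATTC", 6)` returns 1: the sequence reads GAATTC on both strands in the 5'-to-3' direction.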

Approximate palindrome: a palindrome that contains a certain number of mismatches and may include a gap.
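One way to make this concrete is to count the positions at which the left arm fails to pair with the right arm, skipping a gap between the two arms. This is a sketch under stated assumptions: the gap is taken to be a spacer between the arms, and the names `arm_mismatches`, `arm`, and `gap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Watson-Crick complement; '\0' for non-ACGT so unknown bases never pair. */
char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return '\0';
    }
}

/* Count mismatches between the left arm seq[start .. start+arm) and the
 * right arm that follows after `gap` unpaired bases.
 * The caller must ensure start + 2*arm + gap does not exceed the
 * sequence length. */
size_t arm_mismatches(const char *seq, size_t start, size_t arm, size_t gap)
{
    assert(seq != NULL && arm > 0);
    size_t right_end = start + 2 * arm + gap - 1; /* last base of right arm */
    size_t mismatches = 0;
    for (size_t i = 0; i < arm; i++) {
        /* left base i pairs with the i-th base counted from the right end */
        if (complement(seq[start + i]) != seq[right_end - i])
            mismatches++;
    }
    return mismatches;
}
```

A candidate would then count as an approximate palindrome when the returned value does not exceed the allowed number of mismatches.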

“Palindrome fingerprint”: each DNA sequence has its own unique number, sizes, and locations of palindromes.

Important biological roles:
  • Gene annotation.
  • Transcription factor binding sites.
Statistical model
  • n – length of the string.
  • l – length of the palindrome (not including the gap).
  • G – maximum length of the gap.
  • y – maximum number of mismatches allowed.
  • x – number of mismatches.
  • p – number of palindromes in a string of length n.

Calculating the probability of finding a specific palindrome of length l, k times, in a string of length n.
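As a rough illustration of this kind of calculation (not necessarily the formula used in the project), the sketch below assumes independent, uniformly distributed bases, so two bases pair with probability 1/4. Under that assumption, two arms pair with at most y mismatches with probability equal to the sum over x = 0..y of C(arm, x) (3/4)^x (1/4)^(arm - x). By linearity of expectation, the expected number of approximate palindromes is this probability times the number of (start, gap) placements that fit in a string of length n. Here `arm` is assumed to be the length of one side, and all function names are hypothetical.

```c
#include <assert.h>

/* Probability that two arms of length `arm` pair with at most y mismatches,
 * ASSUMING independent uniform bases (pair probability 1/4). */
double approx_palindrome_prob(unsigned arm, unsigned y)
{
    assert(arm > 0);
    double term = 1.0;
    for (unsigned i = 0; i < arm; i++)
        term *= 0.25;                 /* x = 0: every position pairs */
    double total = 0.0;
    for (unsigned x = 0; x <= y && x <= arm; x++) {
        total += term;
        /* next binomial term: multiply by (arm - x)/(x + 1) * (0.75/0.25) */
        term *= 3.0 * (double)(arm - x) / (double)(x + 1);
    }
    return total;
}

/* Expected number of approximate palindromes in a random string of length n,
 * counting every start position and every gap from 0 to max_gap. */
double expected_palindromes(unsigned long n, unsigned arm,
                            unsigned max_gap, unsigned y)
{
    double q = approx_palindrome_prob(arm, y);
    double placements = 0.0;
    for (unsigned g = 0; g <= max_gap; g++) {
        unsigned long span = 2UL * arm + g;   /* both arms plus the gap */
        if (span <= n)
            placements += (double)(n - span + 1);
    }
    return placements * q;
}
```

Overlapping placements are not independent, so turning this expectation into the probability of exactly k occurrences needs a further approximation that is not shown here.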
Application
  • Implemented in C.
  • Input:

1) Sequence: the genome of an organism, as a text file in FASTA format.

2) Length of the palindrome (one side).

3) Maximum gap between the repeated regions.

4) Number of mismatches allowed.

  • Output: all palindromes within the specified length range and mismatch range.

The Algorithm:

> Search for the palindrome within a “window” of size MaxSize.
> In each iteration, increment the size of the palindrome until MaxSize is reached.
> Shift the left edge of the window and repeat.
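The steps above could be sketched as a brute-force scan. This is an illustration only, not the project's code: it assumes the window advances one base at a time, tries every arm length from `min_arm` to `max_arm` and every gap up to `max_gap`, and prints each hit. The function `scan_palindromes` and all of its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Watson-Crick complement; '\0' for non-ACGT so unknown bases never pair. */
char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return '\0';
    }
}

/* Report every (start, arm, gap) whose two arms pair with at most
 * max_mismatch mismatches. Returns the number of hits. */
size_t scan_palindromes(const char *seq, size_t min_arm, size_t max_arm,
                        size_t max_gap, size_t max_mismatch)
{
    assert(seq != NULL && min_arm > 0 && min_arm <= max_arm);
    size_t n = strlen(seq), hits = 0;
    for (size_t start = 0; start < n; start++) {            /* shift window */
        for (size_t arm = min_arm; arm <= max_arm; arm++) { /* grow size    */
            for (size_t gap = 0; gap <= max_gap; gap++) {
                size_t span = 2 * arm + gap;
                if (start + span > n)
                    break; /* larger gaps will not fit either */
                size_t mism = 0;
                /* stop counting once the mismatch budget is exceeded */
                for (size_t i = 0; i < arm && mism <= max_mismatch; i++)
                    if (complement(seq[start + i]) != seq[start + span - 1 - i])
                        mism++;
                if (mism <= max_mismatch) {
                    printf("start=%zu arm=%zu gap=%zu mismatches=%zu\n",
                           start, arm, gap, mism);
                    hits++;
                }
            }
        }
    }
    return hits;
}
```

The cost grows with the sequence length times the number of arm lengths, gaps, and bases compared, which is acceptable for a sketch but may need refinement for whole genomes.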

Algorithm Testing

“Plant” an approximate palindrome in different genomes and compare the results with our expectations.

Compare the expectation from our formula with the results on several random sequences.
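A test fixture for both checks might look like the sketch below. It assumes uniformly random bases drawn with the C standard library's `rand`, and it plants a palindrome with a known number of mismatches at a known position. The helper names `random_sequence` and `plant_palindrome` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Watson-Crick complement; '\0' for non-ACGT. */
char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return '\0';
    }
}

/* Fill seq[0..n) with pseudo-random bases and NUL-terminate it.
 * seq must have room for n + 1 characters. */
void random_sequence(char *seq, size_t n, unsigned seed)
{
    static const char bases[] = "ACGT";
    srand(seed);
    for (size_t i = 0; i < n; i++)
        seq[i] = bases[rand() % 4];
    seq[n] = '\0';
}

/* Overwrite seq at pos with `arm`, leave `gap` bases untouched, then write
 * the reverse complement of `arm`, spoiling the first `mismatches` pairs. */
void plant_palindrome(char *seq, size_t pos, const char *arm,
                      size_t gap, size_t mismatches)
{
    size_t len = strlen(arm);
    assert(mismatches <= len);
    assert(pos + 2 * len + gap <= strlen(seq));
    memcpy(seq + pos, arm, len);
    size_t right_end = pos + 2 * len + gap - 1; /* last base of right arm */
    for (size_t i = 0; i < len; i++) {
        char c = complement(arm[i]);
        if (i < mismatches)
            c = arm[i]; /* a base never pairs with itself: forced mismatch */
        seq[right_end - i] = c;
    }
}
```

Running the search on the planted sequence should then report a palindrome at `pos` with the planted arm length, gap, and mismatch count, alongside any palindromes that occur by chance.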

Practical usage of the Algorithm

Compare the palindrome profile of different organisms and evaluate the results:

  • Genomes of different bacteria.
  • The same gene (for example, hemoglobin or insulin) in different mammals.
  • Gene families, for example histones and immunoglobulins.