
Genomic Sequence Analysis using Electron-Ion Interaction Potential

Genomic Sequence Analysis using Electron-Ion Interaction Potential. Masumi Kobayashi Performance Evaluation Laboratory University of Aizu. Purpose. To find the gene regions by using Lindley Equation and Electron-Ion Interaction Potential (EIIP).


Presentation Transcript


  1. Genomic Sequence Analysis using Electron-Ion Interaction Potential Masumi Kobayashi Performance Evaluation Laboratory University of Aizu

  2. Purpose • To find gene regions by using the Lindley equation and Electron-Ion Interaction Potential (EIIP). • To judge the similarity of two DNA sequences in less processing time by using the Lindley equation and EIIP.

  3. DNA • A DNA sequence consists of four nucleotide letters: A (adenine), T (thymine), G (guanine), and C (cytosine). • Base A is always paired with base T, and C is always paired with G; the two strands form a double helix.

  4. DNA Sequence and Amino Acid Sequence • A DNA sequence is a row of nucleotides drawn from the four letters, and each nucleotide triplet is called a codon. Each codon corresponds to an amino acid.
     DNA Sequence (codons) |・・・|ATG|CGA|TAT|AAA|GCT|TTC|・・・|
     Amino Acid Sequence   |・・・| M | R | Y | K | A | F |・・・|

  5. Codon • 61 of the 64 codons are translated into amino acids. • Several codons can code for the same amino acid; for example, both TTT and TTC code for phenylalanine (F). • The 3 remaining codons, TAA, TAG, and TGA, are called stop codons.
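The codon-to-amino-acid mapping on slides 4 and 5 can be sketched in a few lines of Python. The sketch assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not come from the presentation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon.

    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(translate("ATGCGATATAAAGCTTTC"))
```

Counting the entries of `CODON_TABLE` that are not `*` gives the 61 coding codons mentioned on slide 5.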

  6. The waiting time of a customer in queuing theory and a DNA sequence • To use the Lindley equation, we need to describe the relation between the waiting time of a customer in queuing theory and a DNA sequence. • A score is given for the similarity of the amino acids of two target gene sequences, and the sum of the scores is made to correspond to the waiting time in queuing theory.

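  7. Lindley Equation $W_n = \max(W_{n-1} + X_n,\ 0)$ • $X_n$: the score of the n-th letter. • $W_n$: the sum of the scores up to the n-th letter. [Figure: cumulative score along an amino acid sequence, with negative values marked]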
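A minimal sketch of the Lindley recursion, assuming the usual queuing-theory form in which the running sum is reset to zero whenever it would become negative and the sum starts at zero. The function name `lindley` and the toy scores are illustrative only.

```python
def lindley(scores):
    """Return the running Lindley sums for a sequence of per-letter scores.

    Each step adds the next score to the running sum and clips the
    result at zero, like the waiting time of successive customers.
    """
    w = 0.0  # assumed starting value of the sum
    sums = []
    for x in scores:
        w = max(w + x, 0.0)
        sums.append(w)
    return sums


if __name__ == "__main__":
    print(lindley([1, -2, 3, -1]))
```

Because the sum never drops below zero, a long run of negative scores "forgets" earlier positive regions, which is what lets separate high-scoring regions stand out individually.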

  8. Electron-Ion Interaction Potential (EIIP) • Prof. Toyoizumi and Tuchiya showed a technique to find gene coding regions by using the Lindley equation. However, the scores required by the Lindley equation were determined artificially. • In this research, we derive the scores theoretically by using the Electron-Ion Interaction Potential. Each amino acid is represented by its EIIP value, which describes the average energy state of all valence electrons in that amino acid.

  9. Gene Finding Experiment • The target sequence of this experiment is the genome data of Escherichia coli O157:H7 Sakai. • Escherichia coli O157:H7 Sakai is a major food-borne pathogen that causes diarrhea, colitis, and hemolytic uremic syndrome. • We calculate using the Lindley equation and EIIP.

  10. Example of Amino Acid Scores and the Stop Codon Score (1) • Score = EIIP - 0.0885 • Stop codon score: -2 × 0.0085 • [Table: amino acids listed by negative score and positive score]

  11. Example of Amino Acid Scores and the Stop Codon Score (2-1) • Score = EIIP - 0.0445 • Stop codon score: -2 × 0.0445 = -0.089 • [Table: amino acids listed by negative score and positive score]

  12. Example of Amino Acid Scores and the Stop Codon Score (2-2) • Change the stop codon score: -0.089 → -0.178 (-4 × 0.0445)
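The scoring rule on slides 10 to 12 (amino acid score = EIIP minus an offset, stop codon score = a negative multiple of the same offset) could be sketched as follows. The function `build_scores` and its parameters are hypothetical names. The four EIIP values in `EIIP_SAMPLE` are the figures commonly tabulated in the EIIP literature and cover only part of the alphabet; treat them as an assumption and check them against the table actually used.

```python
def build_scores(eiip, offset, stop_multiplier=2.0):
    """Build a per-letter score table from EIIP values.

    eiip            -- mapping from amino acid letter to EIIP value
    offset          -- constant subtracted from every EIIP value
    stop_multiplier -- the stop codon ("*") scores -stop_multiplier * offset
    """
    scores = {aa: value - offset for aa, value in eiip.items()}
    scores["*"] = -stop_multiplier * offset
    return scores


# Partial table for illustration only (assumed values, not from the slides).
EIIP_SAMPLE = {"L": 0.0000, "G": 0.0050, "A": 0.0373, "D": 0.1263}

if __name__ == "__main__":
    for multiplier in (2.0, 4.0):
        table = build_scores(EIIP_SAMPLE, offset=0.0445, stop_multiplier=multiplier)
        print(multiplier, round(table["*"], 4))
```

With an offset of 0.0445, a multiplier of 2 reproduces the stop codon score -0.089 and a multiplier of 4 reproduces -0.178, the two values compared on slide 12.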

  13. Threshold of Amino Acid Sequence • The cumulative score may become high by chance in a region of the amino acid sequence that is meaningless. • A threshold is used to distinguish meaningful regions from meaningless ones. • The scores of an amino acid sequence are assumed to be independent and identically distributed. • Under this assumption, the cumulative score can be regarded as the waiting time of a GI/GI/1 queuing system.

  14. Threshold and the Probability of Exceeding the Threshold by Chance • The waiting time of a GI/GI/1 queuing system satisfies inequalities that bound the probability of exceeding a given level [formulas not preserved in the transcript]. • The probability of exceeding the threshold by chance is the probability that a meaningless sequence is judged to be meaningful. • The threshold is set so that this probability is 0.05.
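The inequalities on this slide are not preserved, so the following is only a sketch under a stated assumption: that the bound is of the Kingman type, P(W ≥ x) ≤ exp(-θx), where θ > 0 solves E[exp(θX)] = 1 for the per-letter score X. Under that assumption, a threshold with chance-exceedance probability at most α is x = -ln(α)/θ. The function names and the two-point toy distribution are hypothetical.

```python
import math


def kingman_theta(values, probs):
    """Positive root of E[exp(theta * X)] = 1, found by bisection.

    values, probs -- a discrete score distribution. The mean score must
    be negative and at least one score must be positive.
    """
    mean = sum(p * x for x, p in zip(values, probs))
    if mean >= 0 or max(values) <= 0:
        raise ValueError("need a negative mean score and a positive score")

    def f(theta):
        return sum(p * math.exp(theta * x) for x, p in zip(values, probs)) - 1.0

    hi = 1.0
    while f(hi) <= 0:   # move right until the function is positive
        hi *= 2.0
    lo = hi
    while f(lo) >= 0:   # move left until the function is negative
        lo /= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def threshold(theta, alpha=0.05):
    """Level x with exp(-theta * x) = alpha (assumed Kingman-type bound)."""
    return -math.log(alpha) / theta


if __name__ == "__main__":
    theta = kingman_theta([1.0, -1.0], [0.25, 0.75])
    print(theta, threshold(theta))
```

For the toy distribution (score +1 with probability 0.25, score -1 with probability 0.75) the root is θ = ln 3, so the 0.05 threshold is ln 20 / ln 3.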

  15. Distinction of gene coding regions and junk regions by Threshold
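One way to turn a threshold into a list of candidate regions is to collect the maximal runs of positions whose cumulative score lies above it. This is a sketch of that idea, not necessarily the procedure used in the presentation; `regions_above` is a hypothetical helper.

```python
def regions_above(cumulative, threshold):
    """Return (start, end) index pairs of maximal runs above the threshold.

    Indices are 0-based and `end` is exclusive, so cumulative[start:end]
    is the run of values strictly greater than the threshold.
    """
    regions = []
    start = None
    for i, value in enumerate(cumulative):
        if value > threshold and start is None:
            start = i                      # a run begins here
        elif value <= threshold and start is not None:
            regions.append((start, i))     # the run ended just before i
            start = None
    if start is not None:                  # run reaches the end of the data
        regions.append((start, len(cumulative)))
    return regions


if __name__ == "__main__":
    print(regions_above([0, 1, 3, 4, 2, 0, 5], 2.5))
```

Positions outside the returned runs would be treated as junk regions under this reading.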

  16. Similarity Comparison Experiment • The target sequences of this experiment are the sequence data of human α- and β-hemoglobins. • Hemoglobin is contained in erythrocytes and consists of a “heme” containing iron and a “globin”, which is a protein; it has the important role of carrying oxygen through the body. • We calculate using the Lindley equation and EIIP.

  17. Sequences of Human α- and β-Hemoglobins • The data that we use are the gene coding regions of human α- and β-hemoglobins, written as amino acid sequences. • A gene coding region of Human β-Hemoglobin VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH • A gene coding region of Human α-Hemoglobin VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

  18. Amino Acid and the Stop Codon Scores • Amino acid score = EIIP - 0.0532 • Stop codon score: -2 × 0.0532

  19. Calculation Results of the Cumulative Score in α-Hemoglobin and β-Hemoglobin

  20. The difference (absolute value) of the calculation results of the cumulative score in α-Hemoglobin and β-Hemoglobin
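The comparison on slides 19 and 20 can be sketched as two passes of the Lindley recursion followed by a position-wise absolute difference. This assumes the two curves are compared index by index without alignment, which the transcript does not state; the helper names and the toy score table are hypothetical.

```python
from itertools import accumulate


def cumulative_scores(sequence, score_table):
    """Lindley running sums for a sequence, given per-letter scores."""
    steps = (score_table[letter] for letter in sequence)
    clipped = accumulate(steps, lambda w, x: max(w + x, 0.0), initial=0.0)
    return list(clipped)[1:]  # drop the initial 0.0


def abs_difference(seq_a, seq_b, score_table):
    """Position-wise |W_a - W_b| over the length of the shorter sequence."""
    w_a = cumulative_scores(seq_a, score_table)
    w_b = cumulative_scores(seq_b, score_table)
    return [abs(a - b) for a, b in zip(w_a, w_b)]


if __name__ == "__main__":
    toy_table = {"A": 1.0, "B": -1.0}  # toy scores, not EIIP values
    print(abs_difference("AAB", "ABB", toy_table))
```

Each sequence is read once and no alignment matrix is built, so the cost grows linearly with sequence length, which is consistent with the shorter processing time the presentation aims for.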

  21. Conclusion • We were able to find gene regions in a DNA sequence by using the Lindley equation and EIIP. • We presented a similarity comparison technique that shortens the processing time by using the Lindley equation and EIIP.
