
CSCE555 Bioinformatics

CSCE555 Bioinformatics. Lecture 3 Gene Finding Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555. University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu. Roadmap.


Presentation Transcript


  1. CSCE555 Bioinformatics Lecture 3 Gene Finding Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555 University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu.

  2. Roadmap • Transcription and Translation • Structure and Organization of Genes • Gene Finding in genomes of Prokaryotic organisms • Introduction to Sequence Alignment • Summary

  3. How to Do Great Bioinformatics? • You need to understand biology • You need to understand the NEEDS of biologists • You need to know how to identify the key problems in biology that have become addressable today

  4. Transcription & Translation (figure panels: Eukaryotic Cells, Prokaryotic Cells)

  5. Transcription Process: RNA Polymerase

  6. Translation: How Ribosome Synthesizes Proteins Genetic Code Ribosomes manufacture proteins based on mRNA instructions. Each ribosome reads mRNA, recruits tRNA molecules to fetch amino acids, and assembles the amino acids in the proper order.
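The reading step described on this slide can be sketched in Python. This is only an illustration, assuming the standard genetic code; the function name `translate` and the compact table layout (bases in T, C, A, G order) are choices made here, not part of the lecture.

```python
# Minimal sketch: translate an mRNA string codon by codon, as a ribosome would.
# Assumes the standard genetic code; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table by enumerating codons in T, C, A, G order.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna):
    """Translate mRNA (or DNA) from its first base until a stop codon or the end."""
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUAA"))
```

The sketch reads from the first base and ignores the search for a start codon, which a real ribosome (and a real gene finder) has to handle.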

  7. Genetic Code

  8. Gene Structure of Prokaryotic Cells (figure labels: stop codons TAA, TGA, TAG)

  9. Genes in Eukaryotic Cells

  10. Pre-mRNA Splicing Process

  11. Alternative Splicing Gene info: 1) A DNA sequence coding for the pre-mRNA 2) An additional DNA code or other regulating process, which regulates the alternative splicing.

  12. Core Promoter Structure

  13. Roadmap • Transcription and Translation • Structure and Organization of Genes • Gene Finding in genomes of Prokaryotic organisms • Introduction to Sequence Alignment • Summary

  14. How to Find Genes (figure labels: start codon ATG; stop codons TAA, TGA, TAG)

  15. Gene-Finding Algorithm • Input: DNA sequences, a threshold gene length K • Output: All possible ORF sequences of length at least K • Procedure: • Scan each of the 3 reading frames and find subsequences that start with ATG and end with one of (TAA, TAG, TGA) • Repeat the above for the complementary (reverse-complement) strand as well
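The procedure above can be sketched as follows. This is a minimal illustration rather than the course's implementation: the names `find_orfs`, `orfs_in_strand`, and `reverse_complement` are hypothetical, the threshold K is assumed to be measured in codons (start and stop included), and each ORF is taken to run from the first ATG in a frame to the next in-frame stop codon.

```python
# Minimal ORF scanner: 3 reading frames on each strand, ATG ... stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons):
    """Collect ORFs of at least min_codons codons in the 3 frames of one strand."""
    found = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append(dna[start:i + 3])
                start = None  # look for the next ORF in this frame
    return found


def find_orfs(dna, min_codons=64):
    """Scan both strands; min_codons plays the role of the threshold K."""
    dna = dna.upper()
    return (orfs_in_strand(dna, min_codons)
            + orfs_in_strand(reverse_complement(dna), min_codons))


print(find_orfs("CCATGAAATAGCC", min_codons=3))
```

ORFs that are still open at the end of the sequence are dropped here; whether to report such partial ORFs is a design choice the slide does not specify.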

  16. Risk of the Simple Gene Finding Algorithm • The identified ORFs may arise purely by chance. • How likely is it that an ORF is the result of a random sequence? • Significance of an ORF as a gene: • We require the probability that an ORF arises from a random sequence to be less than p.

  17. Calculating p • 3 out of 64 codons are stop codons • P(run of k non-stop codons) = (61/64)^k • (61/64)^62 = 0.051 • Setting k = 64 (62 + 1 ATG + 1 stop codon) makes it unlikely that an identified ORF arises by chance from a random sequence.
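The arithmetic on this slide can be checked with a few lines of Python. The sketch assumes, as the slide does, that all 64 codons are equally likely; the helper names `prob_no_stop` and `min_run_length` are introduced here for illustration only.

```python
import math

# Probability that a single random codon is not a stop codon (61 of 64).
P_NON_STOP = 61 / 64


def prob_no_stop(k):
    """P(run of k consecutive non-stop codons) under uniform random codons."""
    return P_NON_STOP ** k


def min_run_length(alpha):
    """Smallest k for which (61/64)^k falls strictly below alpha."""
    return math.floor(math.log(alpha) / math.log(P_NON_STOP)) + 1


print(round(prob_no_stop(62), 3))   # the slide's value for k = 62
print(min_run_length(0.05))         # shortest run with probability below 5%
```

Note that a run of 62 codons sits just above the 5% level, so a strict cutoff of p < 0.05 needs one more codon than the slide's rounded figure suggests.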

  18. Permutation Test/Randomization Test • A generic method to estimate a significance level (p-value) • Example: how likely is it that a 10-codon ORF is the result of random permutation? • Method: • Randomly generate (or permute the given sequences to obtain) 10,000 sequences • Draw a histogram of the lengths of the random ORFs, i.e. sequences terminated by a stop codon (the null distribution) • Calculate the percentage of random ORFs that have lengths >= 10.
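A randomization test along these lines might look like the sketch below. Several details are assumptions made for illustration: sequences are generated with uniform, independent bases rather than by permuting real data, only the first ATG-initiated ORF in each sequence is measured, ORF length is counted in codons including the start and stop codons, and the function names are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf_length(seq):
    """Codons (start and stop included) in the first ATG-initiated ORF, or None."""
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i + 3 - start) // 3
    return None  # ran off the end without meeting a stop codon


def null_orf_lengths(n_sequences=10_000, seq_len=600, seed=0):
    """ORF lengths observed in random sequences: the null distribution."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_sequences):
        seq = "".join(rng.choices("ACGT", k=seq_len))
        n = first_orf_length(seq)
        if n is not None:
            lengths.append(n)
    return lengths


def empirical_p_value(lengths, min_codons):
    """Fraction of random ORFs that are at least min_codons long."""
    return sum(n >= min_codons for n in lengths) / len(lengths)


lengths = null_orf_lengths(n_sequences=2_000)
print(empirical_p_value(lengths, 10))
```

The list returned by `null_orf_lengths` is what the slide's histogram would be drawn from; permuting a real genome instead of sampling uniform bases would preserve its base composition.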

  19. Estimating cut-off K for gene finding algorithm • Exact theoretical calculation: sensitive to the assumptions (equal probability of codons, etc.) • Randomized test: do a permutation test and find a length k such that <5% of random ORFs have lengths greater than k.
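The randomized route to a cut-off can be sketched as below. To stay short, the sketch samples random ORF lengths directly (an ATG followed by uniformly random codons until a stop codon appears) instead of scanning generated sequences; that shortcut, the uniform-codon assumption, and the names `random_orf_length` and `estimate_cutoff` are all illustrative choices, not the lecture's method.

```python
import bisect
import random


def random_orf_length(rng):
    """Codons in one random ORF: ATG, random non-stop codons, then a stop."""
    length = 2  # counts the start codon and the terminating stop codon
    # Codon indices 0..2 stand in for the 3 stop codons out of 64.
    while rng.randrange(64) >= 3:
        length += 1
    return length


def estimate_cutoff(n_orfs=20_000, alpha=0.05, seed=0):
    """Smallest k such that fewer than alpha of random ORFs are longer than k."""
    rng = random.Random(seed)
    lengths = sorted(random_orf_length(rng) for _ in range(n_orfs))
    for k in range(lengths[0], lengths[-1] + 1):
        longer = n_orfs - bisect.bisect_right(lengths, k)
        if longer / n_orfs < alpha:
            return k
    return lengths[-1]


print(estimate_cutoff())
```

Under the uniform-codon assumption the estimate should land close to the theoretical value from the previous slide; replacing `random_orf_length` with lengths measured on permuted genomic sequence would drop that assumption.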

  20. Sequence Alignment: the Problem • Given two sequences, measure their similarity • ATAACTTTAATTAA • ATCCTTTTACTAAA
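One common way to put a number on the similarity of two such sequences is a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The sketch below is an illustration, not necessarily the method the course goes on to use; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch, linear gaps)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


# The two sequences from the slide.
print(global_alignment_score("ATAACTTTAATTAA", "ATCCTTTTACTAAA"))
```

Only two rows of the table are kept, which is enough for the score; recovering the alignment itself would require the full table and a traceback.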

  21. Web Tool to Align Two Sequences • http://www.ebi.ac.uk/emboss/align

  22. Applications of Sequence Alignment • Prediction of functions (of genes/proteins/promoters) via homology • Database search • Find sequences that are similar to our query sequence (e.g. a new gene) • Gene finding by genome comparison • Sequence divergence/phylogeny • Sequence assembly

  23. Summary • Transcription, Translation • Gene structures of Prokaryotic and Eukaryotic cells • Finding genes (ORFs) for prokaryotic cells • Sequence alignment applications
