
Chapter 6 Genomics and Gene Recognition

Chapter 6 Genomics and Gene Recognition. Department of Computer Science and Information Engineering, National Chi Nan University, 黃光璿 (HUANG, Guan Shieng), 2004/04/26. Motivation. Cells can determine the beginnings and ends of genes. How can we identify genes algorithmically, in prokaryotic genomes and in eukaryotic genomes? Review. DNA Sequencing.


Presentation Transcript


  1. Chapter 6 Genomics and Gene Recognition Department of Computer Science and Information Engineering, National Chi Nan University, 黃光璿 (HUANG, Guan Shieng) 2004/04/26

  2. Motivation • Cells can determine the beginnings and ends of genes. • How can we identify genes algorithmically? • prokaryotic genomes • eukaryotic genomes

  3. Review

  4. DNA Sequencing • Determine the order of nucleotides in a DNA fragment • Maxam-Gilbert method, 1977 • Sanger’s chain-termination method

  5. Base-calling • Phred program • Developed at the University of Washington in 1998; converts traces (analog signals) into sequences (digital signals). • first ~50 bases: noisy • beyond ~800 bases: signal declines

  6. High-throughput Sequencing • Four-color fluorescent dyes have replaced the radioactive label. • Reads longer than 800 bp are possible, though 500~700 bp is more common. • Applied Biosystems' ABI Prism™ 3700 • six 96-well plates per day • 96 × 6 × 800 ≈ 0.46 M bases per day • Amersham Pharmacia's MegaBACE 1000™
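The throughput figure on this slide is simple arithmetic. A minimal sketch, using only the numbers quoted on the slide (the variable names are illustrative, and 800 bp is the optimistic read length rather than the more common 500~700 bp):

```python
# Figures quoted on the slide; read length is the optimistic 800 bp case.
wells_per_plate = 96
plates_per_day = 6
read_length_bp = 800

bases_per_day = wells_per_plate * plates_per_day * read_length_bp
print(bases_per_day)  # 460800, i.e. roughly 0.5 million bases per day

# The same machine at a more typical 600 bp read length (assumed midpoint).
typical_bases_per_day = wells_per_plate * plates_per_day * 600
print(typical_bases_per_day)  # 345600
```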

  7. 6.1 Prokaryotic Genomes A prokaryotic genome should contain at least the information needed to • make and replicate its DNA; • make new proteins; • obtain and store energy.

  8. 6.1.1 Contig Assembly

  9. TIGR (The Institute for Genomic Research) • has made bacterial genome sequencing a cottage industry • Example • bio-terrorism mailings (anthrax strains), late 2001.

  10. 6.2 Prokaryotic Gene Structure

  11. 6.2.1 Promoter Elements • promoter • a binding site in a DNA chain at which RNA polymerase binds to initiate transcription of messenger RNA by one or more nearby structural genes

  12. 6.2.1.1 RNA polymerases • β’: binds to the DNA template • β: links one nucleotide to another • α: holds all subunits together • σ: recognizes the specific nucleotide sequences of promoters (and is less conserved than the other subunits)

  13. 6.2.1.2

  14. 6.2.1.3 • consensus sequence • recognized by the same σ-factor • agrees across many different genes • operon • a set of genes with related functions • regulatory proteins • positive regulator → enhance • negative regulator → repress, attenuate

  15. Lactose operon (in E. coli) • beta-galactosidase (z) • lactose permease (y) • lactose transacetylase (a) → One long polycistronic RNA encodes all three proteins.

  16. 6.2.1.4 E. coli’s Lac Operon • σ70 • Expressed most efficiently only when the cell’s environment is rich in lactose and poor in glucose • lactose present → binds the negative regulator pLacI → gene expressed! • glucose absent → positive regulator CRP active → expression enhanced!
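The two regulators on this slide can be read as a small truth table. The sketch below is a deliberately simplified model, not a quantitative one: the function name and the three output labels are illustrative, and the "low" level for the lactose-plus-glucose case is an assumption added to distinguish it from full repression.

```python
def lac_operon_expression(lactose_present: bool, glucose_present: bool) -> str:
    """Toy model of lac operon control by pLacI (negative) and CRP (positive)."""
    # pLacI sits on the DNA and blocks transcription unless lactose binds it.
    repressor_on_dna = not lactose_present
    # CRP enhances transcription only when glucose is scarce.
    crp_active = not glucose_present

    if repressor_on_dna:
        return "off"
    return "high" if crp_active else "low"


for lactose in (False, True):
    for glucose in (False, True):
        print(lactose, glucose, lac_operon_expression(lactose, glucose))
```

Only the lactose-rich, glucose-poor environment gives the "high" state, which matches the slide's statement about when the operon is expressed most efficiently.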

  17. 6.2.2 Open Reading Frames • stop codons • UAA, UAG, UGA • (1 - 3/64)^N = 0.05 → N ≈ 63 • E. coli • average gene length = 316.8 codons; only 1.8% of genes are shorter than 60 codons • Open Reading Frame (ORF) • a continuous run of triplet codons without a stop codon
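The threshold of about 63 codons follows from asking how long a random sequence can run before the chance of seeing no stop codon drops to 5%. A minimal sketch, assuming all 64 codons are equally likely and independent (the function names are illustrative):

```python
import math

STOP_CODONS = 3      # UAA, UAG, UGA
TOTAL_CODONS = 64


def prob_no_stop(n_codons: int) -> float:
    """Probability that n random codons contain no stop codon."""
    return (1 - STOP_CODONS / TOTAL_CODONS) ** n_codons


def min_codons_for_significance(alpha: float = 0.05) -> int:
    """Smallest N for which prob_no_stop(N) <= alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - STOP_CODONS / TOTAL_CODONS))


print(min_codons_for_significance())  # 63
print(prob_no_stop(62), prob_no_stop(63))
```

An ORF of this length or longer is therefore unlikely to arise by chance, which is why a cutoff near 60 codons is a convenient first filter.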

  18. start codon • AUG • E. coli • AUG ~ 83%; GUG and UUG together ~ 17% • How to determine the starting position for translation? • start codon • Shine-Dalgarno sequence • a purine-rich (A, G) region that serves as a ribosome loading site • E.g., 5’ – AGGAGGT – 3’
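One way to combine the two signals on this slide is to accept a start codon only when a Shine-Dalgarno-like motif lies a short distance upstream. The sketch below is illustrative: the core motif `AGGAGG` is taken from the slide's example, while the allowed gap of 4 to 12 nucleotides and the function name are assumptions chosen for the example.

```python
import re

START_CODONS = ("ATG", "GTG", "TTG")  # DNA spelling of AUG, GUG, UUG
SD_CORE = "AGGAGG"                    # core of the slide's example AGGAGGT


def candidate_starts(dna, min_gap=4, max_gap=12):
    """Positions of start codons with SD_CORE upstream, where the gap between
    the end of the motif and the start codon is min_gap..max_gap nucleotides."""
    dna = dna.upper()
    start_pattern = "(?=(" + "|".join(START_CODONS) + "))"  # lookahead: overlaps allowed
    hits = []
    for match in re.finditer(start_pattern, dna):
        start = match.start()
        window_begin = max(0, start - max_gap - len(SD_CORE))
        window_end = max(0, start - min_gap)
        if SD_CORE in dna[window_begin:window_end]:
            hits.append(start)
    return hits


print(candidate_starts("CCAGGAGGTAACCATATGAAA"))  # [15]
print(candidate_starts("ATGAAA"))                 # []
```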

  19. 6.2.3 Conceptual Translation
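Conceptual translation means reading a nucleotide sequence three bases at a time and looking each codon up in the genetic code. A minimal sketch using the standard genetic code; the function name and the choice to stop at the first stop codon are illustrative decisions:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate DNA or RNA from the given frame until the first stop codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))  # MAK
print(translate("AUGUUUUGA"))     # MF
```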

  20. 6.2.4 Termination Sequences (refer to transcription) • > 90% of prokaryotic operons contain intrinsic terminators • inverted repeat (7~20 bp, G-C rich) (e.g., 5’- CGGATG|CATCCG-3’) • ~ 6 U’s following the inverted repeat • cause RNA polymerase to pause for ~ 1 min (RNA polymerase normally incorporates ~ 100 nt/sec)
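The two features of an intrinsic terminator can be searched for directly: an inverted repeat whose second arm is the reverse complement of the first, followed by a run of U's (T's in the DNA). The sketch below is a simplification under stated assumptions: a fixed arm length, an optional short loop between the arms, and a minimum T-run of 5 are illustrative parameters, and real terminator finders also score stem stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_terminators(dna, arm_len=6, max_loop=4, min_t_run=5):
    """Return (start, left_arm, loop_length) for each inverted repeat that is
    immediately followed by at least min_t_run T's."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * arm_len + 1):
        left = dna[i:i + arm_len]
        for loop in range(max_loop + 1):
            j = i + arm_len + loop                      # start of the right arm
            right = dna[j:j + arm_len]
            tail = dna[j + arm_len:j + arm_len + min_t_run]
            if right == reverse_complement(left) and tail == "T" * min_t_run:
                hits.append((i, left, loop))
    return hits


# The slide's example repeat CGGATG|CATCCG followed by six T's.
print(find_terminators("GGGCGGATGCATCCGTTTTTTAC"))
```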

  21. 6.3 GC-Content in Prokaryotic Genomes • G/C to A/T relative ratio • recognized as a distinguishing attribute of bacterial genomes • GC: 25% ~ 75%, wide range • GC-content of each bacterial species • seems to be independently shaped by mutational biases

  22. GC-content is generally uniform throughout a bacterium’s genome • horizontal gene transfer • the movement of genetic material between bacteria other than by descent, in which information travels through the generations as the cell divides → GC-content reflects the evolutionary history of the bacteria
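Because GC-content is fairly uniform within one genome, a region whose GC-content departs from the genome-wide value is a candidate for horizontal transfer. A minimal sketch of the measurement itself; the window and step sizes are illustrative defaults, and deciding how large a deviation is meaningful is left open:

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(seq, window=1000, step=500):
    """GC-content of each window, as (window_start, gc_fraction) pairs."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("GGCCAATT"))                    # 0.5
print(windowed_gc("AAAAGGGG", window=4, step=4))  # [(0, 0.0), (4, 1.0)]
```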

  23. Prokaryotic Gene Density • 85%~88% of the nucleotides are associated with coding regions • E. coli • 4288 genes, average length 950 bp, separated by an average of 118 bp.

  24. Finding genes in prokaryotic genomes is relatively easy. • Long open reading frames (> 60 codons); • Matches to simple promoter sequences; • Transcriptional termination signals; • Comparisons with the nucleotide sequences of known protein-coding regions from other organisms.
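The first criterion, long open reading frames, is easy to turn into a scan over all six reading frames. The sketch below is a minimal illustration, not a full gene finder: it only accepts ATG as a start, reports the first ATG in each ORF, and returns coordinates on the strand being read; the function names are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orfs_in_strand(dna, min_codons):
    """(start, end) of ATG...stop ORFs with at least min_codons codons before the stop."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(dna, min_codons=60):
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return {"+": orfs_in_strand(dna, min_codons), "-": orfs_in_strand(reverse, min_codons)}


toy = "CC" + "ATG" + "GCC" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))  # {'+': [(2, 23)], '-': []}
```

In practice the hits from such a scan would then be checked against the other criteria on the slide, such as promoter matches and similarity to known coding regions.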

  25. 6.5 Eukaryotic Genomes • Differences (from prokaryotic genomes) • Internal membrane-bound compartments allow them to maintain a wide variety of chemical environments. • eukaryotes → multicellular organisms; each cell type usually has a distinctive pattern of gene expression. • relatively little constraint on the size of their genomes → gene expression is more complicated & flexible

  26. 6.6 Eukaryotic Gene Structure • 1000 times harder than finding a needle in a haystack??? • Long open reading frames • Searching for them is not appropriate, since introns interrupt the coding sequence.

  27. Grail EXP & GenScan • Rely on neural networks and dynamic programming. • prediction accuracy < 50%

  28. Features to detect include • promoter • a series of intron/exon boundaries • putative ORF with codon usage bias

  29. 6.6.1 Promoter Elements • prokaryotes • single RNA polymerase • eukaryotes • three kinds of RNA polymerases

  30. RNA polymerase I, III • transcribe genes (e.g., rRNA and tRNA) whose products are needed at fairly constant levels in all eukaryotic cells at all times.

  31. RNA polymerase II • basal promoter • where the RNA polymerase II initiation complex is assembled and transcription begins. • upstream promoter elements • protein binding • It has been estimated that at least 5 upstream promoter elements are required to uniquely identify a gene.

  32. RNA polymerase II does not recognize the basal promoter directly. • basal transcription factors • TATA-binding protein (TBP) • at least 12 TBP-associated factors (TAFs) • TATA-box for eukaryotes (-25) • 5’ – TATAWAW – 3’ (W= A or T) • initiator (Inr) sequence • 5’ – YYCARR – 3’ (Y=C or T, R=A or G)
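The TATA-box and Inr consensus sequences use IUPAC ambiguity codes (W, Y, R), which map directly onto regular-expression character classes. A minimal sketch of searching for them; the helper names are illustrative, and an exact consensus match is a much cruder test than the weight matrices used by real promoter predictors:

```python
import re

# IUPAC nucleotide codes needed for the two consensus sequences on the slide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Convert an IUPAC consensus string into a regular-expression source string."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


TATA_BOX = consensus_to_regex("TATAWAW")
INR = consensus_to_regex("YYCARR")


def find_motif(pattern, dna):
    """Start positions of all (possibly overlapping) matches of pattern in dna."""
    return [m.start() for m in re.finditer("(?=(" + pattern + "))", dna.upper())]


print(find_motif(TATA_BOX, "GGGTATAAAAGGG"))  # [3]
print(find_motif(INR, "AACCCAGGTT"))          # [2]
```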

  33. Transcription factor differences • cause tissue-specific expression of some genes.

  34. 6.6.2 Regulatory Protein Binding Sites • bacteria • RNA polymerases have high affinity for promoters. • emphasis on negative regulation • eukaryotes • RNA polymerases II & III do not assemble around promoters very efficiently. • additional emphasis on positive regulation

  35. Transcription Factors • constitutive • Do not respond to external signals. • regulatory • Do respond to external signals. • sequence-specific DNA-binding proteins

  36. 6.7 Open Reading Frames • Nuclear membrane • separates the processes of transcription and translation. • DNA → hnRNA (heterogeneous nuclear RNA) → mRNA • before translation, the transcript is • capped, spliced, polyadenylated • capping: chemical alteration (e.g., methylation) • splicing: removal of introns • polyadenylation: ~ 250 A’s added at the 3’ end

  37. Splicing causes a serious problem for gene recognition algorithms. → Eukaryotic genes do not have to possess statistically significant long ORFs.
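A toy example shows why the long-ORF test fails on unspliced eukaryotic DNA: an intron can carry an in-frame stop codon that disappears once the exons are joined. Everything below is made up for illustration (the sequences, the exon coordinates, and the helper names); real splice sites must be predicted rather than given.

```python
STOPS = {"TAA", "TAG", "TGA"}


def splice(genomic, exons):
    """Join the exon intervals (half-open start, end pairs) of a genomic sequence."""
    return "".join(genomic[start:end] for start, end in exons)


def codons_before_stop(seq):
    """Number of codons read in frame 0 before the first stop codon."""
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            break
        count += 1
    return count


exon1 = "ATGGCC"
intron = "GTAAGTTAACAG"        # hypothetical intron with an in-frame TAA
exon2 = "AAAGGGTTTCCCTGA"
genomic = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(genomic))]

mrna = splice(genomic, exons)
print(codons_before_stop(genomic))  # 4: the ORF is cut short inside the intron
print(codons_before_stop(mrna))     # 6: the full coding sequence after splicing
```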
