
Stat 877(992)

Stat 877(992). Statistical methods in molecular biology. Course plans. Team taught: Newton, Larget, Ane, Keles, Kendziorski, Broman, Yandell Per instructor homework set (six at 12pts each) Final project, poster presentation (28 pts). National Research Council Report, 2004


Presentation Transcript


  1. Stat 877(992) Statistical methods in molecular biology

  2. Course plans • Team taught: Newton, Larget, Ane, Keles, Kendziorski, Broman, Yandell • Per instructor homework set (six at 12pts each) • Final project, poster presentation (28 pts)

  3. National Research Council Report, 2004: Mathematics and 21st Century Biology • “Progress in the biosciences will increasingly depend on deep and broad integration of mathematical analysis into studies at all levels of biological organization…: molecules, cells, organisms, populations, and ecosystems.” • “The committee regards the interface between mathematics and biology as biology-driven.”

  4. Some definitions [first approximations!] • cell: structural/functional unit of all living organisms • protein: organic compound produced and used by cell • amino acid: protein building block • nucleic acid: chainlike molecule involved in preservation, replication, and expression of hereditary information in every living cell • nucleotide: nucleic acid building block

  5. Example function: oxygen transport • 2 x 10^6 new cells/second • 2-3 x 10^13 red blood cells/body • 95% of dry weight is the protein hemoglobin

  6. hemoglobin more about hemoglobin

  7. sequence of amino acids in hemoglobin • alpha chain (141 amino acids) [2 subunits] • VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHGKKVADGLTLAVGHLDDLPGALSDLSNLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR • beta chain (146 amino acids) [2 subunits] • VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDPENFRLLGNVLALVVARHFGKDFTPELQASYQKVVAGVANALAHKYH
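A protein sequence written in one-letter codes, as on slide 7, is just a string, so simple summaries such as length and amino acid composition are easy to compute. The sketch below is illustrative only: the function name `composition` is hypothetical, and the demo uses just the first ten residues of the alpha chain as printed on the slide rather than the full chain.

```python
from collections import Counter


def composition(protein: str) -> Counter:
    """Count how often each one-letter amino acid code occurs in a sequence."""
    return Counter(protein.upper())


# First ten residues of the alpha chain as printed on slide 7.
alpha_start = "VLSAADKTNV"
counts = composition(alpha_start)

print("length:", len(alpha_start))
print("most common residues:", counts.most_common(3))
```

Applying `len` to the full alpha and beta strings from the slide is a quick way to check them against the stated chain lengths.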

  8. A few amino acids (among 20 standard) • V = Val = Valine • L = Leu = Leucine • M = Met = Methionine • more about amino acids

  9. Amino acids are concatenated into protein by the translation of information stored in messenger RNA • Ribonucleic acid (RNA) • single stranded • Nucleotide bases: A = adenine, C = cytosine, U = uracil, G = guanine

  10. Amino acids are concatenated into protein by the translation of information stored in messenger RNA (mRNA) • Ribonucleic acid (RNA) • Nucleotide bases: A = adenine, C = cytosine, U = uracil, G = guanine • Example product: Met Thr Glu Leu Arg Ser stop

  11. Amino acids are encoded by triples of mRNA nucleotides called codons more about the genetic code
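The codon idea can be made concrete with a small translation routine. This is a minimal sketch, not course code: it builds the standard genetic code from a compact 64-letter string (an assumption about which code table applies), and the names `CODON_TABLE` and `translate` as well as the example mRNA are hypothetical. The example mRNA was chosen so that it encodes the peptide Met Thr Glu Leu Arg Ser followed by a stop codon, as on slide 10.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (first base slowest, third base fastest). "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string (read 5' to 3' from its first base)
    into one-letter amino acid codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical mRNA: AUG ACG GAG CUU CGG AGC UAG
print(translate("AUGACGGAGCUUCGGAGCUAG"))
```

The sketch starts reading at the first base; a real translation tool would first locate the start codon and handle the untranslated regions.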

  12. Translation: mRNA to protein via ribosome & tRNA Base pairing A-U, G-C video podcast of translation

  13. orientation 5’ to 3’ • UTR = untranslated region: mRNA stability, mRNA localization, translational efficiency • mRNA structure

  14. Mature mRNA may have been processed by splicing a primary transcript (pre-mRNA)

  15. Primary transcripts are produced by the transcription of DNA • Deoxyribonucleic acid (DNA) • double stranded • 4 nucleotide bases: A, T, G, C • base pairing: A-T, C-G

  16. Transcription: DNA to RNA via RNA polymerase • initiate • elongate • terminate
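The base-pairing rules on slides 12 and 15 are enough to sketch transcription as a string operation. The code below is a simplified illustration under stated assumptions: it ignores initiation, termination, and splicing, and the function names and the example sequence are hypothetical. Both strands are written 5' to 3'.

```python
# Watson-Crick pairing in DNA: A-T and C-G.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of double-stranded DNA, written 5' to 3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def transcribe(coding_strand: str) -> str:
    """The RNA copy matches the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def transcribe_from_template(template_strand: str) -> str:
    """RNA polymerase reads the template strand; the RNA it builds is the
    complement of the template, i.e. the coding strand with U for T."""
    return transcribe(reverse_complement(template_strand))


coding = "ATGACGGAG"                    # hypothetical coding strand
template = reverse_complement(coding)   # the paired strand
print(template, transcribe_from_template(template))
```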

  17. Central dogma of molecular biology

  18. Replication: DNA copies itself during cell division

  19. More on organization of DNA

  20. Chromosomes are organized structures of DNA and proteins that are found in cells. Each chromosome contains a single continuous piece of DNA. In diploid species, chromosomes are paired.

  21. Human chromosomes: total number of base pairs

      chromosome            base pairs
      1                     247,200,000
      2                     242,750,000
      3                     199,450,000
      4                     191,260,000
      5                     180,840,000
      6                     170,900,000
      7                     158,820,000
      8                     146,270,000
      9                     140,440,000
      10                    135,370,000
      11                    134,450,000
      12                    132,290,000
      13                    114,130,000
      14                    106,360,000
      15                    100,340,000
      16                     88,820,000
      17                     78,650,000
      18                     76,120,000
      19                     63,810,000
      20                     62,440,000
      21                     46,940,000
      22                     49,530,000
      X (sex chromosome)    154,910,000
      Y (sex chromosome)     57,740,000

      A genome equals the sequence of one full copy: 3 Gbp, or 100 yrs at 1 bp/second.
      Estimates from Sanger’s Vertebrate Genome Annotation (VEGA) database, 7/07
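The "3 Gbp, or 100 yrs at 1 bp/second" figure on slide 21 can be checked with a few lines of arithmetic. The sketch below sums the chromosome lengths listed on the slide (22 autosomes plus X and Y, which is an assumption about what "one full copy" counts) and converts 3 Gbp at one base pair per second into years; the variable names are illustrative.

```python
# Base pairs per chromosome, as listed on slide 21 (VEGA estimates, 7/07).
CHROMOSOME_BP = {
    "1": 247_200_000, "2": 242_750_000, "3": 199_450_000, "4": 191_260_000,
    "5": 180_840_000, "6": 170_900_000, "7": 158_820_000, "8": 146_270_000,
    "9": 140_440_000, "10": 135_370_000, "11": 134_450_000, "12": 132_290_000,
    "13": 114_130_000, "14": 106_360_000, "15": 100_340_000, "16": 88_820_000,
    "17": 78_650_000, "18": 76_120_000, "19": 63_810_000, "20": 62_440_000,
    "21": 46_940_000, "22": 49_530_000, "X": 154_910_000, "Y": 57_740_000,
}

total_bp = sum(CHROMOSOME_BP.values())

# Time to read 3 Gbp at one base pair per second, in years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_at_1bp_per_sec = 3e9 / SECONDS_PER_YEAR

print(f"total: {total_bp:,} bp (about {total_bp / 1e9:.2f} Gbp)")
print(f"3 Gbp at 1 bp/second: about {years_at_1bp_per_sec:.0f} years")
```

The total comes out slightly above 3 Gbp and the reading time slightly under a century, consistent with the rounded figures on the slide.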

  22. 2001: drafts of the human genome sequence published • 1% of bases are in exons • 24% of bases are in introns. 2007: pilot phase of the ENCODE project (Encyclopedia Of DNA Elements) completed • majority of bases are transcribed • extensive transcript overlap • functions poorly understood

  23. Evolving definition of gene • 1860s-1900s: a discrete unit of heredity (Mendel) • 1910s: a distinct locus (Morgan) • 1940s: the blueprint for a protein (Beadle & Tatum) • 1960s: a transcribed code (Watson & Crick) • Genome era: a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions

  24. Figure 5, Mark B. Gerstein et al. Genome Res. 2007; 17: 669-681

  25. Post ENCODE: The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products (Gerstein et al. 2007)

  26. What about Statistics?

  27. Statistics supports the development of genomic resources • In accommodating sequencing errors for genome assembly • In rating the significance of sequence matches by alignment algorithms

  28. Statistics supports analyses to determine the function of genes/transcripts/proteins • Gene regulation • Gene expression • Network considerations (many processes/functions) • Example: oxygen transport. According to the Gene Ontology (GO) project, 46 different genes are involved in this biological process

  29. Statistics is critical in analyzing patterns of genomic variation within populations, and in relating this variation to disease states or other phenotypes • Genomes differ from the reference copy (single nucleotide polymorphisms, structural variants) • Gene mapping by linkage and association methods

  30. Statistics is critical in analyzing patterns of genomic variation between populations/species • Phylogenetic analysis

  31. “Nothing in biology makes sense except in the light of evolution” -T. Dobzhansky Tree of life project

  32. “It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us. These laws, taken in the largest sense, being Growth with reproduction; Inheritance which is almost implied by reproduction; Variability from the indirect and direct action of the conditions of life, and from use and disuse; a Ratio of Increase so high as to lead to a Struggle for Life, and as a consequence to Natural Selection, entailing Divergence of Character and the Extinction of less improved forms. Thus, from the war of nature, from famine and death, the most exalted object which we are capable of conceiving, namely, the production of the higher animals, directly follows.” - Charles Darwin
