Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization


Presentation Transcript


  1. Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics)
     The spectrum of experimental Biology: Practice – Theory
     • Theory (population/statistical genetics). Theory: 80+ years of Mathematical Biology. Methods: Ag, RFLPs, SNPs…
     • Practice (Molecular Biology). Theory: Central Dogma. Methods: separation, visualization.
     • Bioinformatics. Theory: 40 years of algorithms, information theory; 20+ years of statistics.
     Figure labels: Experiment as “Art”; Current practice; My ideal

  2. Teaching and Bioinformatics
     What is the goal?
     • Learning Biology / learning Computer Science
     • Becoming "computer literate": scripting/programming
     • Exploring uncertainty
        • experimental shortcomings
        • computational biases
     • Utility – getting something done
     Bioinformatics is challenging because biology is complicated and idiosyncratic.

  3. Biology: A “clean” experiment – internal positive and negative controls
     Southern blot of human class-mu glutathione transferase genes from individuals with low (-) or high (+) GT-tSBO activity.
     Figure labels: RFLP independent of GT-tSBO; Bands found with high GT-tSBO (GSTM1)
     • When GSTM1 is present, it is detected
     • When it is not detected, it is absent

  4. Bioinformatics – ambiguity or computational error?
     • D3BUQ5 is “clearly” homologous to GSTA6_RAT, aligning from beginning to end
     • Does it have a GST_C domain?
     • Does it have glutathione transferase activity?
     • Could it be a steroid isomerase? Prostaglandin synthetase?

  5. Why is Bioinformatics “hard”?
     Bioinformatics is at the intersection of Biology, Computer Science, and Statistics.
     • What is interesting to Computer Scientists (algorithms, optimality) is less relevant to Biologists (textbook bias)
     • Parameters that are “irrelevant” to Computer Scientists (DNA vs protein) are important in practice
     • Statistics are central, and the statistical perspective is not well integrated into either Biology or CS curricula
     • The biological assumptions behind a “null hypothesis” are rarely explicit and often idealistic
     • Biologists do experiments (CS folks like theory): if it works, use it.
     Bioinformatics uses "hard/true/reproducible" techniques to solve "soft/ambiguous/varying" biological questions. A teaching "opportunity".
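
The point that null-hypothesis assumptions are rarely explicit can be made concrete in a lab exercise. The sketch below is a minimal illustration, not a production tool: it scores a local alignment with a textbook Smith-Waterman recursion (linear gap penalty and toy match/mismatch scores, not a real substitution matrix such as BLOSUM62) and estimates significance by shuffling the subject sequence. The shuffle is the null hypothesis, stated explicitly: an "unrelated" sequence is assumed to look like a random permutation of the real one, with the same length and residue composition. The function names, scoring values, shuffle count, and example sequences are hypothetical choices made for illustration.

```python
import random

# Toy scoring parameters (hypothetical; real protein searches use matrices such as
# BLOSUM62 and affine gap penalties).
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(a: str, b: str) -> int:
    """Best local alignment score (Smith-Waterman recursion, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_pvalue(query: str, subject: str, n_shuffles: int = 200, seed: int = 0):
    """Empirical p-value under an explicit (assumed) null model: unrelated subjects
    are random permutations of the real subject, preserving length and composition."""
    rng = random.Random(seed)
    observed = sw_score(query, subject)
    residues = list(subject)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if sw_score(query, "".join(residues)) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Arbitrary toy sequences (not real proteins) differing at two positions.
    q = "ACDEFGHIKLMNPQRSTVWY"
    s = "ACDEFGHIRLMNPQKSTVWY"
    print(shuffle_pvalue(q, s))
```

Changing the null model (for example, shuffling in local windows, or sampling from a whole database's composition) changes the p-value, which is exactly the kind of assumption students can be asked to make explicit.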

  6. Alberts is wrong about sequence similarity (three times in three claims)
     “With such a large number of proteins in the database, the search programs find many nonsignificant matches, resulting in a background noise level that makes it very difficult to pick out all but the closest relatives. Generally speaking, one requires a 30% identity in sequence to consider that two proteins match. However, we know the function of many short signature sequences ("fingerprints"), and these are widely used to find more distant relationships.” – Alberts, Molecular Biology of the Cell (5th ed., 2007), p. 139
     • Sequences producing statistically significant alignments ALWAYS share a common structure
     • Many significant alignments share < 30% identity (< 25% identity is routine, and < 20% identity can be significant)
     • In the absence of significant similarity, “fingerprints” should never be trusted.
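
The claim that many significant alignments share < 30% identity follows from how significance is computed. In BLAST-style statistics, a bit score S' corresponds to an expected number of chance alignments E = m * n * 2^(-S'), where m is the query length and n the total database length in residues. Percent identity does not appear in that relationship: significance depends on the score and the size of the search space. The sketch below is a simplification (it uses raw lengths, whereas BLAST applies effective-length corrections); the function names and the example numbers are hypothetical, and the percent-identity convention shown is only one of several in use.

```python
def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring >= bit_score: E = m * n * 2**(-S').

    Simplification: raw sequence lengths, no effective-length (edge) correction.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, ungapped columns divided by all alignment columns (one convention of several)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical numbers: the same 50-bit alignment from a 300-residue query,
    # searched against a 10^8-residue and a 10^10-residue database.
    for db_len in (10**8, 10**10):
        print(db_len, evalue_from_bits(50.0, query_len=300, db_len=db_len))
```

A 100-fold larger database makes the same alignment score 100-fold less significant. That is the "background noise" the quote describes, and the E-value accounts for it directly, whereas a fixed identity cutoff ignores the size of the search.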

  7. How can we teach better?
     • Discuss the strengths and weaknesses of data resources
     • Examine how published protocols go out of date (or are optimized for different problems). Examine potential weaknesses: what do the protocols assume?
     • Review high-profile papers with mistaken conclusions to understand what went wrong.

  8. Biology 4XXX – Bioinformatics and Functional Genomics (3 hr lecture, 1 hr lab)
     • Introduction to Unix / perl (python) scripting / web resources
        • programming by imitation
     • similarity searching / domain identification
        • homology, scoring matrices
        • errors in domain annotation (why)
     • multiple sequence alignment
        • sequences vs domains
     • evolutionary tree-building
        • finding the best tree
        • evaluating alternative trees
        • where is the uncertainty (why)
     • Introduction to the 'R' statistical language
        • programming by imitation
     • Expression analysis
        • read mapping, read counting
     • Motif extraction, mapping
        • motif independence?
     • Pathway analysis – gene enrichment
     • Gene models and alternative splicing
        • which gene/splicing models are supported?
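
For the "Pathway analysis – gene enrichment" topic, a common starting point is a one-sided hypergeometric test. The sketch below is a minimal version under stated assumptions: N genes in the background, K of them in the pathway, n genes selected (for example, differentially expressed), and k of the selected genes in the pathway. The function name and the counts in the example are hypothetical; the choice of background set is itself an assumption that changes the answer, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric P(X >= k): the chance of drawing at least k pathway
    genes when n genes are picked at random from N, of which K are in the pathway."""
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("counts must satisfy 0 <= K <= N, 0 <= n <= N, k >= 0")
    upper = min(K, n)
    # Sum the upper tail; math.comb returns 0 when asked for an impossible draw.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 20,000-gene background, 100-gene pathway,
    # 500 selected genes, 12 of them in the pathway (about 2.5 expected by chance).
    print(enrichment_pvalue(N=20_000, K=100, n=500, k=12))
```

Exact integer arithmetic avoids the underflow that naive factorial ratios would hit for genome-sized backgrounds; rerunning the example with a smaller background (for example, only genes detected on the array) is a quick way to show students how much that assumption matters.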

  9. Computational and Comparative Genomics, Oct 29 – Nov 4, 2014 (application deadline July 15, 2014)