
Bioinformatics for Stem Cell Lecture 1

Bioinformatics for Stem Cell Lecture 1. Debashis Sahoo, PhD. Outline: Introduction, History of Bioinformatics, Introduction to computing, Data collection, Experiment design, Data analysis. Bioinformatics Definition: Biological Data (Representation, Storage, Access, Processing).


Presentation Transcript


  1. Bioinformatics for Stem Cell, Lecture 1. Debashis Sahoo, PhD

  2. Outline • Introduction • History of Bioinformatics • Introduction to computing • Data collection • Experiment design • Data analysis

  3. Bioinformatics Definition • Biological Data • Representation • Storage • Access • Processing • bi·o·in·for·mat·ics [bahy-oh-in-fer-mat-iks] • noun (used with a singular verb): the retrieval and analysis of biochemical and biological data using mathematics and computer science, as in the study of genomes.

  4. http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html http://www.merriam-webster.com/dictionary/bioinformatics

  5. http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html

  6. The science behind Michael Levitt's Nobel Prize Michael Levitt, PhD, has dramatically advanced the field of structural biology by developing sophisticated computer algorithms to build models of complex biological molecules.

  7. Professor Donald E. Knuth The "father" of the analysis of algorithms He is the author of the seminal multi-volume work The Art of Computer Programming. “It is hard for me to say confidently that, after fifty more years of explosive growth of computer science, there will still be a lot of fascinating unsolved problems at peoples' fingertips, that it won't be pretty much working on refinements of well-explored things. I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.”

  8. Historical perspective

  9. History of Bioinformatics • Gregor Mendel (1866, Verhandlungen des naturforschenden Vereins Brünn) • 1951 – structure for the alpha-helix and beta-sheet • Pauling and Corey (PNAS, 1951) • 1953 – double helix model for DNA • Watson and Crick (Nature, 171: 737-738, 1953) • 1955 – protein sequence of bovine insulin • F. Sanger

  10. History of Bioinformatics • 1958 – 1990 • Revolution in Computer Science and Engineering • Computer, email, network, internet • 1990 – BLAST • Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410. • 1995 – The Haemophilus influenzae genome (1.8 Mb) is sequenced. • 1993 – 2013 – Microarrays • 2005 – 2013 – High-throughput sequencing

  11. Introduction to computing

  12. What is a computer? Turing Machine (1936): a tape of symbols (1 0 0 1 0 1 1 0), a read/write head, and a controller. Alan Turing, "On computable numbers, with an application to the Entscheidungsproblem", Proceedings of the London Mathematical Society, Series 2, 42 (1937), pp 230–265.
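
The three parts named on this slide (tape, read/write head, controller) can be made concrete with a small simulator. This is a minimal sketch, not Turing's own formulation: the function name `run_turing_machine`, the transition-table layout, and the bit-flipping example machine `FLIP_BITS` are all assumptions made for illustration. The input tape reuses the symbols shown on the slide.

```python
from collections import defaultdict


def run_turing_machine(transitions, tape, start_state, halt_states,
                       blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine (illustrative sketch).

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right).
    """
    # The tape is unbounded: cells never written to read as the blank symbol.
    cells = defaultdict(lambda: blank, enumerate(tape))
    state, head, steps = start_state, 0, 0

    while state not in halt_states:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells[head]                       # read
        new_state, to_write, move = transitions[(state, symbol)]
        cells[head] = to_write                     # write
        head += move                               # move the head
        state = new_state                          # controller changes state
        steps += 1

    # Return the visited part of the tape without surrounding blanks.
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: invert every bit, halt at the first blank.
FLIP_BITS = {
    ("flip", "0"): ("flip", "1", +1),
    ("flip", "1"): ("flip", "0", +1),
    ("flip", "_"): ("halt", "_", +1),
}

result = run_turing_machine(FLIP_BITS, "10010110", "flip", {"halt"})
print(result)
```

The controller here is nothing more than a lookup table. The step limit is a practical guard, since whether an arbitrary machine halts cannot be decided in general.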

  13. Modern Computer Main Memory Processor Disk Drives IO controller Display Keyboard Mouse

  14. What is a Computer Program? Assembly Program C Program Executable file Load to Memory Run the program

  15. Data collection

  16. Public Databases • Gene Expression Omnibus (GEO) • ArrayExpress • National Center for Biotechnology Information (NCBI) • UCSC Genome Browser • The Human Protein Atlas • Catalogue of Somatic Mutations in Cancer (COSMIC) • The Cancer Genome Atlas (TCGA)

  17. http://www.ncbi.nlm.nih.gov/geo/

  18. http://www.ebi.ac.uk/arrayexpress

  19. http://www.ncbi.nlm.nih.gov/pubmed

  20. http://genome.ucsc.edu

  21. http://www.proteinatlas.org/

  22. http://www.sanger.ac.uk/genetics/CGP/cosmic/

  23. https://tcga-data.nci.nih.gov/tcga/

  24. Experiment design

  25. To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of. - R. A. Fisher

  26. http://graphpad.com/guides/prism/5/user-guide/prism5help.html

  27. Independent Samples • Statistical tests are based on the assumption that each subject was sampled independently. • Provides maximum amount of information. • Provides better estimation of the mean.

  28. The Gaussian Approximation: "Everybody believes in the normal approximation, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact." G. Lippmann (1845 – 1921)

  29. Sample Size Estimation

  30. Data analysis

  31. Correlation

  32. Hypothesis Testing • Randomly select samples from the population • State the null hypothesis • The distributions of values in the two populations are the same • Perform the statistical test • t-test, F-test, chi-squared test • Get the P-value • Set a threshold (usually P < 0.05) for significance
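
The steps on this slide can be sketched end to end in a few lines. Note the substitution: instead of the t-test, F-test, or chi-squared test listed above, this sketch uses an exact permutation test on the difference in means, because it needs only the Python standard library and makes the null hypothesis explicit (if both groups come from the same distribution, the group labels are interchangeable). The function name `permutation_test` and the `control` and `treated` measurements are made up for illustration.

```python
from itertools import combinations


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Null hypothesis: both groups are drawn from the same distribution,
    so every relabelling of the pooled values is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    as_extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding on ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical measurements, for illustration only.
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [5.2, 5.0, 5.4, 5.1, 5.3]

p_value = permutation_test(control, treated)
print(f"P = {p_value:.4f}; significant at 0.05: {p_value < 0.05}")
```

Full enumeration is only practical for small groups (two groups of five give 252 splits). For larger samples one would typically draw random relabellings instead, or use a parametric test from a statistics library.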

  33. Multiple Comparisons • The Bonferroni correction • P < 0.05/N (N = number of comparisons) • False Discovery Rate (FDR) – Q value • What fraction of all the discoveries are false? • Q = 10%, N = 100, smallest p-value < Q/N • http://genomics.princeton.edu/storeylab/qvalue/ • Permutation based approaches
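
The two corrections above can be compared on the same set of P-values. This is a minimal sketch under stated assumptions: the FDR part implements the Benjamini-Hochberg step-up procedure, which is consistent with the "smallest p-value < Q/N" rule on the slide but is not the Storey q-value software linked there. The function names and the example P-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its P-value is below alpha / N."""
    n = len(p_values)
    return [p < alpha / n for p in p_values]


def benjamini_hochberg(p_values, q=0.10):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Sort the P-values, find the largest rank k with p_(k) <= q * k / N,
    and reject the k hypotheses with the smallest P-values.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / n:
            cutoff_rank = rank

    rejected = [False] * n
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical P-values from five comparisons, for illustration only.
p_values = [0.5, 0.001, 0.04, 0.012, 0.02]

print("Bonferroni:", bonferroni(p_values, alpha=0.05))
print("BH (Q=10%):", benjamini_hochberg(p_values, q=0.10))
```

On this made-up input Bonferroni keeps a single discovery while the FDR procedure keeps four, which illustrates the trade-off: Bonferroni guards against any false positive, whereas FDR control tolerates a stated fraction of false discoveries in exchange for more power.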
