341: Introduction to Bioinformatics

341: Introduction to Bioinformatics. Dr. Nataša Pržulj & Dr. Peter Rice, Department of Computing, Imperial College London, natasha@imperial.ac.uk. Course overview. Motivation: Explosion in the availability of biological data: Sequences and microarrays (Dr. Rice), Protein 3D structure

Presentation Transcript


  1. 341: Introduction to Bioinformatics Dr. Nataša Pržulj & Dr. Peter Rice Department of Computing Imperial College London natasha@imperial.ac.uk

  2. Course overview Motivation: • Explosion in the availability of biological data: • Sequences and microarrays (Dr. Rice) • Protein 3D structure • Networks: e.g., of protein interactions; expected to be as useful as the sequence data in uncovering new biology (Dr. Pržulj) • The goal of systems biology: • Systems-level understanding of biological systems, e.g. the cell • Analyze not only individual components, but also their interactions and the functioning of the system as a whole • E.g.: Learn new biology from the topology of such interaction networks • However, biological data analysis research faces considerable challenges • Incomplete and noisy data • Computational intractability of many (e.g., graph-theoretic) problems

  3. Course overview We will cover: • Sequence analysis (Dr. Peter Rice) • Microarray analysis (Dr. Peter Rice) • Graph theoretic aspects: • Fundamental topics in graph theory (e.g. basic graph notation, graph representation, and special graph types) • Basic graph algorithms (e.g., graph search/traversal algorithms and running time analysis) • Important computational complexity concepts (e.g., complexity classes, subgraph isomorphism, and NP-completeness) which pose challenges for analyzing biological networks • Protein 3D structure • Biological networks aspects: • Basic biological concepts (e.g., DNA, genes, proteins, gene expression, …) • Different types of biological networks • Experimental techniques for acquiring the data and their biases • Public databases and other sources of biological network data • Existing approaches for analyzing and modeling biological networks: • Structural properties of large networks • Network models • Network clustering • Network alignment • Software tools for network analysis • Applications – data analysis: interplay of topology and biology • Learn how the above methods have been applied • Discuss valuable insights that have been gained into biological function, evolution, complex diseases (e.g., cancer) and drug discovery

  4. Course overview • Grading scheme: • One coursework assignment • Given out on Feb 21 by email and posted to class website • Due on Thursday, March 6, by 2pm • Written exam • Standard DoC Grading Scheme will be used as described by Degree Regulations at https://www.doc.ic.ac.uk/internal/teachingsupport/regulations/index.htm • Other departments: we provide coursework and exam marks and they decide on the weighting for the final grade

  5. Course overview External Students – get onto DoC CATE etc.: 1) Apply at: https://dbc.doc.ic.ac.uk/externalreg/ 2) Then, ➢ Your department's endorser will approve/reject your application 3) If approved, ➢ DoC's External Student Liaison will approve/reject your application 4) If approved (again !), ➢ Students will get access to DoC resources (DoC account, CATE, …) ➢ No access after a few days? Check status of approval and contact relevant person(s) ● Key Dates: ➢ Exam registration opens end January for 2-3 weeks ➢ Exams for DoC 3rd/4th yr. courses take place at the end of the Term in which the course is taught ● If in doubt, read the guidelines available at the link above :-)

  6. Course overview • Course organization: • Lectures • Relevant theoretical concepts and examples • Tutorials • Exercises on the concepts covered in class • One coursework assignment • Opportunity to solve problems using the methods learned in class • Written exam • Testing students’ understanding of the concepts learned in lectures • Tutorial helpers: • Anida Sarajlic (a.sarajlic12@imperial.ac.uk) • Dr. Noel Malod-Dognin (n.malod-dognin@imperial.ac.uk) • Vuk Janjic (v.janjic11@imperial.ac.uk)

  7. Course overview • Textbooks and readings • Recommended textbooks: • Pevzner and Shamir, “Bioinformatics for Biologists,” Cambridge University Press, 2011. • Junker and Schreiber, “Analysis of Biological Networks,” Wiley, 2008. • West, “Introduction to Graph Theory,” 2nd edition, Prentice Hall, 2001, or T. Cormen et al., “Introduction to Algorithms,” 3rd edition, MIT Press, 2009. • A list of up-to-date research papers selected by the instructor: see http://www.doc.ic.ac.uk/~natasha/course2012/class_material.html . • Recommended readings: • F. Kepes (Author, Editor), “Biological Networks (Complex Systems and Interdisciplinary Science),” World Scientific Publishing Company, 1st edition, 2007. • Bornholdt and Schuster (Editors), “Handbook of Graphs and Networks: From the Genome to the Internet,” Wiley, 2003, or Dorogovtsev and Mendes (Authors), “Evolution of Networks: From Biological Nets to the Internet and WWW (Physics),” Oxford University Press, 2003. • Chapter 17 from: Chen and Lonardi (Editors), “Biological Data Mining,” Chapman and Hall/CRC Press, 2009. • Chapter 4 from: Jurisica and Wigle (Editors), “Knowledge Discovery in Proteomics,” CRC Press, 2005. • Kurt Mehlhorn and Stefan Näher, “LEDA: A Platform for Combinatorial and Geometric Computing,” Cambridge University Press, 1999.

  8. Course overview • When and where: • Fridays, 9-11h (LT 308) and 14-16h (LT 145) • Huxley • Contact: • E-mail: natasha@imperial.ac.uk • Subject: “341 Bioinformatics” • Office hours: • Fridays after class, 4pm • Office: 407 C Huxley

  9. Course overview • Prerequisites: no formal ones, but • General computational/mathematical maturity • Basic programming skills are desirable • An introduction to biological concepts will be provided • Course website (curriculum, class material, etc.): • http://www.doc.ic.ac.uk/~natasha/course2012/index.html (also linked from CATE) • Academic code of honor

  10. Topics • Introduction: biology (Dr. Pržulj, 1 lecture) • Sequence analysis (Dr. Rice, 2 lectures) • Microarray analysis (Dr. Rice, 3 lectures) • Introduction to graph theory (Dr. Pržulj, 2 lectures) • Protein 3D structure (Dr. Malod-Dognin, 2 lectures) • Network biology (Dr. Pržulj, 8 lectures): • Network properties • Network/node centralities • Network motifs • Network models • Network/node clustering • Network comparison/alignment • Software tools for network analysis • Interplay between topology and biology

  11. Course overview • Any questions so far?

  12. Course overview • About you…

  13. Introduction: biology

  14. Introduction: biology • Cell - the building block of life • Cytoplasm and organelles separated by membranes: • Mitochondria, nucleus, etc.

  15. Introduction: biology • Distinguish between: • Prokaryotes • Single-celled, no cell nucleus or any other membrane-bound organelles • The genetic material in prokaryotes is not membrane-bound • The bacteria and the archaea • Model organism: E. coli • Eukaryotes • Have "true" nuclei containing their DNA • May be unicellular, as in amoebae • May be multicellular, as in plants and animals • Model organism: S. cerevisiae (baker’s yeast)

  16. Introduction: biology • Nucleus contains DNA • Deoxyribonucleic acid • DNA nucleotides: A pairs with T, C pairs with G • DNA structure: double helix
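
The base pairing on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, assuming that "A and T, C and G" refers to Watson-Crick pairing between the two strands of the double helix; the function names `complement` and `reverse_complement` are hypothetical and not part of the course material.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]
```

For example, `complement("ATGC")` gives `"TACG"`, and reading that opposite strand in its own direction gives `"GCAT"`.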

  17. Introduction: biology • Chromosomes • RNA: similar to DNA, except that T is replaced by U (T → U) and RNA is single-stranded

  18. Introduction: biology • Main role of DNA: long-term storage of genetic information • Genes: DNA segments that carry this information • Intron: part of a gene not translated into protein, spliced out of mRNA • Exon: part of a gene retained in the mature mRNA; the mRNA translated into protein consists only of exon-derived sequences • Genome: the complete genetic material of an organism, including the total set of its genes • Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism

  19. Introduction: biology • Codons: sets of three nucleotides • 4 nucleotides → 4^3 = 64 possible codons • Each codon codes for an amino acid (or a stop signal) • 64 codons encode 20 different amino acids • Several codons can stand for the same amino acid • Polypeptide: • String of amino acids, composed from a 20-character alphabet • Proteins: • Composed of one or more polypeptide chains (70-3000 amino acids) • Sequence of amino acids is defined by a gene • Gene expression: information transmission from DNA to proteins • Proteome: total set of proteins in an organism
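
The codon arithmetic on this slide can be checked with a short Python sketch. It assumes the standard genetic code, written in the compact layout used by NCBI translation table 1 (bases ordered T, C, A, G, with `*` marking a stop codon); the variable names are illustrative and not taken from the course.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form (assumed here: NCBI translation
# table 1). Bases are ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4^3 = 64 triplets, generated in the same order as AMINO_ACIDS.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

# How many codons stand for each amino acid (or for the stop signal).
codons_per_residue = Counter(CODON_TABLE.values())

n_codons = len(codons)
n_amino_acids = len(set(AMINO_ACIDS) - {"*"})
n_stop_codons = codons_per_residue["*"]
```

Under these assumptions the sketch yields 64 codons, 20 distinct amino acids and 3 stop codons; leucine (`L`), for instance, is encoded by six different codons, which illustrates why several codons can stand for the same amino acid.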

  20. Introduction: biology • The 20 amino acids

  21. Introduction: biology • Levels of protein structure:

  22. Introduction: biology • Genes vs. proteins • Genes – passive; proteins – active • Protein synthesis: from genes to proteins • Transcription (in nucleus) • Splicing (eukaryotes) • Translation (in cytoplasm)

  23. Introduction: biology • Transcription (in nucleus) • RNA polymerase enzyme builds an RNA strand from a gene (DNA is "unzipped") • The gene is transcribed to messenger RNA (mRNA) • Transcription is regulated by proteins called transcription factors
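
As a rough illustration of transcription at the sequence level, the Python sketch below derives an mRNA string from a gene. It is a simplification under stated assumptions: both strands are written 5'->3', the regulation by transcription factors is ignored, and the function names are hypothetical.

```python
# Pairing used when RNA is built against a DNA template: U replaces T.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with T -> U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_from_template(template_strand: str) -> str:
    """Build the mRNA by pairing against the template strand.

    The template is assumed to be written 5'->3', so the paired RNA is
    reversed to give the mRNA in its own 5'->3' direction.
    """
    paired = "".join(DNA_TO_RNA_COMPLEMENT[b] for b in template_strand.upper())
    return paired[::-1]
```

Both routes agree: the coding strand `ATGGCC` and its template strand `GGCCAT` each yield the mRNA `AUGGCC`.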

  24. Introduction: biology • Splicing (eukaryotes) • Regions that do not code for proteins (introns) are removed from the sequence
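
Splicing can be pictured as cutting the introns out of the transcript and joining the remaining exons. The Python sketch below assumes the exon positions are already known and given as half-open `(start, end)` index pairs; the sequence and coordinates are invented for illustration, and real splice-site recognition is not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a pre-mRNA, dropping everything else.

    `exons` holds half-open (start, end) index pairs; the coordinates
    are assumed to be known in advance and non-overlapping.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical transcript: exon 1, one intron, exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
mature_mrna = splice(pre_mrna, [(0, 6), (18, 24)])
```

Here `mature_mrna` is `AUGGCCUUUUAA`: only the exon-derived sequence remains.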

  25. Introduction: biology • Translation (in cytoplasm) • Ribosomes synthesize proteins from mRNA • mRNA is decoded and used as a template to guide the synthesis of a chain of amino acids that form a protein • Translation: the process of converting the mRNA codon sequences into an amino acid polypeptide chain
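
The decoding step of translation can be sketched as a table lookup over consecutive codons. The Python example below assumes the standard genetic code (the compact layout of NCBI translation table 1, here with U in place of T), starts at the first AUG it finds and stops at the first stop codon; the function name `translate` and the example sequence are illustrative only.

```python
from itertools import product

# Standard genetic code for RNA codons (assumed: NCBI translation
# table 1 with U in place of T). '*' marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip(("".join(t) for t in product(RNA_BASES, repeat=3)), AMINO_ACIDS)
)


def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first stop codon.

    Returns one-letter amino acid codes; an empty string means no start
    codon was found. Trailing bases that do not fill a codon are ignored.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon: release the polypeptide
            break
        protein.append(residue)
    return "".join(protein)
```

For the short mRNA `AUGGCCUUUUAA`, the codons AUG, GCC and UUU are read before the stop codon UAA, so `translate` returns `"MAF"` (methionine, alanine, phenylalanine).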

  26. Introduction: biology • Microarrays: • Measure mRNA abundance for each gene • The amount of transcribed mRNA correlates with gene expression: • The rate at which a gene produces the corresponding protein • It is hard to measure protein levels directly!

  27. Introduction: biology • Every cell* contains the complete genome of an organism • How is the variety of different tissues encoded and expressed?

  28. Introduction: biology • 22,000?

  29. Introduction: biology • -ome and -omics • Genome and genomics • Proteome and proteomics • …
