
Introduction to bioinformatics

Introduction to bioinformatics. Sylvia B. Nagl. What is bioinformatics? An emerging interdisciplinary research area.


Presentation Transcript


  1. Introduction to bioinformatics Sylvia B. Nagl

  2. What is bioinformatics? • an emerging interdisciplinary research area • deals with the computational management and analysis of biological information: genes, genomes, proteins, cells, ecological systems, medical information, robots, artificial intelligence...

  3. The Core of Bioinformatics to date • Relationships between sequence, 3D structure, and protein functions • Properties and evolution of genes, genomes, proteins, metabolic pathways in cells • Use of this knowledge for prediction, modelling, and design • Example protein sequence: TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATMLV

  4. “The holy grail of bioinformatics” • Example DNA sequence: GCTCCTCACTGTCTGTGTTTATTCTTTTAGCTTCTTCAGATCTTTTAGTCTGAGGAAGCCTGGCATGTGCAAATGAAGTTAACCTAA... • More than 500,000 genes sequenced to date • Expected number of unique protein structures: ~700-1,000

  5. Basic concepts • conceptual foundations of bioinformatics: evolution protein folding protein function • bioinformatics builds mathematical models of these processes - to infer relationships between components of complex biological systems

  6. Information processing in cells nucleic acids proteins coding regions regulatory sites transcripts One-to-many mappings! Context-dependence!
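The "one-to-many mappings" and "context-dependence" noted on slide 6 can be illustrated with a small sketch, which is not part of the original slides. It translates one DNA string in three forward reading frames using the standard genetic code, so the same nucleotides give different peptides depending on where reading starts. The function names `translate` and `three_frames` are hypothetical, and real coding regions also depend on strand, splicing, and regulatory context that this toy example ignores.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string, starting at offset `frame` (0, 1 or 2).

    Codons containing characters other than T, C, A, G are reported as 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, peptide in enumerate(three_frames("ATGGCCTAA")):
        print(frame, peptide)
```

Reverse-strand frames are left out to keep the sketch short; adding them would require a reverse-complement step before translation.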

  7. Global approaches: Toward a new Systems Biology Global cell state Genome Protein population: proteomics Genome activation patterns: transcriptomics • How does the spatial and temporal organisation of living matter give rise to biological processes? Organisation: tissue imaging EM X-ray, NMR cells molecular complexes

  8. Global approaches: Toward a new Systems Biology Perturbation Living cell Dynamic response Biological knowledge (computerised) • Basic principles • Practical applications “Virtual cell” Sequence information Structural information Bioinformatics Mathematical modelling Simulation

  9. We do not know yet whether the information in the genome is sufficient to reconstruct an entire biological system. Information on building blocks not enough, information on their interactions is essential. External environment Internal environment Metabolic net Genetic networks DNA hRNA mRNAs proteins

  10. Bioinformatics in context Mathematics/computer science Genomics Molecular biology Bioinformatics Biophysics Ethical, legal, and social implications Molecular evolution

  11. Current challenges to users • Potential hurdles: methods are in flux and not fully developed; resources are scattered and heterogeneous • Remedies: Web resources, navigation guides, integration of tools and databanks http://www.biochem.ucl.ac.uk/~nagl/bioinformatics.html

  12. Example 1: Sequence homology search of the genome of Plasmodium falciparum. Target identification for antimalarial drugs

  13. The search for new antimalarial drugs • Malaria is one of the leading causes of morbidity and mortality in the tropics. • 300 to 500 million estimated clinical cases and 1.5 million to 2.7 million deaths per year. • Nearly all fatal cases are caused by Plasmodium falciparum. • The parasite's resistance to conventional antimalarial drugs such as chloroquine is growing at an alarming rate.

  14. P. falciparum has a plastid-like organelle, called the apicoplast, acquired by endosymbiosis of an alga. • Self-replicating, maternally inherited (35 kb, circular DNA). • Comparative genome analysis: search for orthologs. • Apicoplast contains enzymes found in plant and bacterial, but not animal, metabolic pathways. • Potential target for antimalarial drugs: DOXP reductoisomerase, Jomaa et al. (1999)
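The idea behind a sequence homology search can be sketched with a toy local-alignment scorer. This is an illustration only, not the method used in the cited study: production searches rely on heuristic tools such as BLAST with substitution matrices and statistical significance estimates. The function names, the scoring values (match 2, mismatch -1, gap -2), and the example sequences are all assumptions chosen for readability.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    Only the score is computed (no traceback), using two rows of the matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for char_a in a:
        curr = [0]
        for j, char_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if char_a == char_b else mismatch)
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def rank_hits(query, database):
    """Score `query` against every named sequence; highest score first."""
    scored = [
        (local_alignment_score(query, sequence), name)
        for name, sequence in database.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real genome data.
    toy_database = {"candidate_1": "TTACGTTT", "candidate_2": "CCCCCC"}
    for score, name in rank_hits("ACGT", toy_database):
        print(name, score)
```

In an ortholog search, a high-scoring hit would only be a starting point; reciprocal best hits and phylogenetic evidence are normally needed before calling two genes orthologs.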

  15. Jomaa et al. (1999) Science 285: 1573-1576:

  16. Biological databases

  17. The challenge (Boguski, 1999) In 1995, the number of genes in the database started to exceed the number of papers on molecular biology and genetics in the literature!

  18. Data types • Primary data: sequence (DNA, amino acid); primary database, e.g., AATGCGTATAGGC, DMPVERILEALAVE • Secondary data: secondary protein structure, e.g., alpha-helices, beta-strands; secondary db: “motifs” (regular expressions, blocks, profiles, fingerprints) • Tertiary data: tertiary protein structure; tertiary db: atomic co-ordinates, domains, folding units
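Slide 18 lists regular expressions as one way secondary databases encode motifs. As a hedged sketch, the snippet below converts a pattern written in PROSITE-style syntax into a Python regular expression and scans a protein sequence with it. The converter `prosite_to_regex` and the helper `find_motif` are hypothetical and cover only the common syntax elements (`x`, `[...]`, `{...}`, repeat counts, `<` and `>` anchors). The N-glycosylation pattern `N-{P}-[ST]-{P}` is used purely as a familiar example.

```python
import re

# One pattern element: optional '<', a residue specifier, optional repeat, optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (position, matched text) for each non-overlapping match, 0-based."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif("N-{P}-[ST]-{P}", "AANVTGAA"))
```

Regular expressions either match or do not, which is why the slide also lists blocks, profiles, and fingerprints: those representations can score partial or weak similarity.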

  19. Primary biological databases • Nucleic acid: EMBL, GenBank, DDBJ (DNA Data Bank of Japan) • Protein: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D

  20. International nucleotide data banks [diagram] • EMBL (Europe): EMBL, EBI; TrEMBL • GenBank (USA): NLM, NCBI; NRDB • DDBJ (Japan): NIG, CIB • Coordination: International Advisory Meeting, Collaborative Meeting

  21. GenBank file format

  22. GenBank file format

  23. Swiss-Prot

  24. SWISS-PROT file format

  25. SWISS-PROT file format

  26. SWISS-PROT file format

  27. SWISS-PROT file format

  28. Other primary protein databases • TrEMBL (translated EMBL) in SWISS-PROT format rapid access to sequence data from genome projects computer-annotated supplement to SWISS-PROT translations of all coding sequences (CDS) in EMBL • SP-TrEMBL • REM-TrEMBL: immunoglobulins, T-cell receptors, short fragments, synthetic and patented sequences

  29. Other primary protein databases The Protein Information Resource (PIR) • integrated system of protein sequence databases and derived related databases, e. g., alignment databases • rapid searching, comparison, and pattern matching of protein sequences • retrieval of descriptive, bibliographic, feature, and concurrent cross-reference information • aims to be comprehensive and consistently annotated

  30. PIR: related databases NRL-3D Sequence-Structure Database • produced by PIR from sequence and annotation information extracted from three-dimensional structures in the Protein Databank (PDB) • allows keyword and similarity searches

  31. PIR: related databases PATCHX integrated with PIR • a non-redundant database of protein sequences produced by MIPS, the European branch of PIR-International The PIR Protein Sequence Database and PATCHX together provide the most complete collection of protein sequence data currently available in the public domain.

  32. Composite protein sequence dbs and their sources • NRDB: PIR, SP, PDB, GenPept • OWL: PIR, SP, GenBank, NRL-3D • MIPSX (PIR+PATCHX): PIR, SP, MIPSOwn, NRL-3D, MIPSH, PIRMOD, MIPSTrn, EMTrans, GBTrans, Kabat, PseqIP • SP+TrEMBL: TrEMBL, SP
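How a composite database might be assembled from several sources can be sketched as follows. This is an assumption-laden illustration, not a description of how NRDB, OWL, or MIPSX are actually built: it removes only exact duplicate sequences and keeps the entry from whichever source is listed first, so the order of `sources` acts as a priority order. The function name, entry identifiers, and sequences are hypothetical.

```python
def build_composite(sources):
    """Merge (source_name, entries) pairs into a non-redundant collection.

    `entries` is an iterable of (entry_id, sequence) pairs. The result maps
    each distinct sequence to the (source_name, entry_id) that supplied it
    first, so earlier sources take priority over later ones.
    """
    composite = {}
    for source_name, entries in sources:
        for entry_id, sequence in entries:
            key = sequence.upper()
            if key not in composite:
                composite[key] = (source_name, entry_id)
    return composite


if __name__ == "__main__":
    # Hypothetical entries; identifiers and sequences are made up.
    sources = [
        ("SP", [("sp_0001", "MKTAYIAK"), ("sp_0002", "MSDNELQR")]),
        ("PIR", [("pir_0001", "MKTAYIAK"), ("pir_0002", "MALWTRLL")]),
    ]
    for sequence, origin in build_composite(sources).items():
        print(sequence, origin)
```

Real composite databases differ in how strictly they define redundancy (identical entries, identical sequences, or near-identical fragments), which is one reason the composites on the slide do not contain the same data.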

  33. OWL composite database • By accession number • By database code • By text • By sequence • By title • By author • By query language • By regular expression • Direct OWL access: OWL only released every 6-8 weeks OWL Blast server

  34. Two other useful sites INFOBIOGEN-The Public Catalog of Databases http://www.infobiogen.fr/services/dbcat/ KEGG-Kyoto Encyclopedia of Genes and Genomes http://www.genome.ad.jp/kegg/ Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.

  35. Sequence Retrieval System (SRS) • Database browser that allows users to retrieve, link, and access entries from all interconnected resources. • Users can formulate queries across a range of different database types.

  36. Guide to Protein Databases: http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html With thanks to Dr Roman Laskowski.
