
Bioinformatics




  1. Bioinformatics
     Dillon Dugan | BIOL 446L

  2. What is Bioinformatics?
     • There seems to be no single, agreed-upon definition
     • The best explanation I found: “Bioinformatics involves the integration of computers, software tools, and databases in an effort to address biological questions.”
     • The National Center for Biotechnology Information (NCBI) divides bioinformatics into three important sub-disciplines:
        • The development of new algorithms and statistics with which to assess relationships among members of large data sets
        • The analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures
        • The development and implementation of tools that enable efficient access to and management of different types of information

  3. History of Bioinformatics
     • The demand for bioinformatic databases began in the mid-1950s, when Sanger reported the first protein sequence (insulin), and grew about a decade later, when the first nucleic acid sequence was reported
     • In 1966, Margaret Belle (Oakley) Dayhoff and Richard V. Eck pioneered the field of bioinformatics by using computational analysis to compare protein sequences and reconstruct their evolutionary histories from the resulting sequence alignments
     • Their database was published as the Atlas of Protein Sequence and Structure, which is known as the first bioinformatic database
     • The field was driven by the need for databases with large storage capacity and for computer programs to process the data collected from sequencing

  4. Databases
     • What is a database?
        • An organized collection of data
        • Generally stored and accessed electronically
     • How do electronic databases work?
        • In general, information is stored as bytes, either on a local hard drive or on a cloud service elsewhere
        • In a relational database, this information is stored in rows called records, and each record contains columns of similar information called fields
        • Queries are used to search for the desired information within the storage
        • Searches can be based on the value you want in a particular field
        • Separate code handles maintenance of the database
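The record/field/query idea above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not how GenBank or any real biological database is built: the table name `sequences`, its fields, and the three records are all hypothetical.

```python
import sqlite3

# In-memory relational database; table and field names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each row is a record; each column (accession, organism, sequence) is a field.
cur.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)
records = [
    ("SEQ001", "Homo sapiens", "ATGGCCATTGTAATG"),
    ("SEQ002", "Mus musculus", "ATGGCGATTGTTATG"),
    ("SEQ003", "Homo sapiens", "ATGAAACGCATTAGC"),
]
cur.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
conn.commit()

# A query searches on the value of one field (organism) and returns other fields.
cur.execute(
    "SELECT accession, length(sequence) FROM sequences "
    "WHERE organism = ? ORDER BY accession",
    ("Homo sapiens",),
)
rows = cur.fetchall()
for accession, length in rows:
    print(accession, length)
conn.close()
```

The `?` placeholders let the database engine handle the search value safely instead of pasting it into the SQL text.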

  5. Biological Databases
     • There are approximately 180 biological databases available
     • The three primary databases are GenBank, EMBL, and DDBJ, which share their data through the International Nucleotide Sequence Database Collaboration
     • Biological databases can be divided into nucleic acid sequence, protein/amino acid sequence, signal transduction pathway, metabolic pathway, and a few other minor databases
     • The main databases are the nucleic acid and protein/amino acid databases
     • How do biological databases work?
        • A DNA sequence is read through sequencing
        • That data is deposited on a server/database
        • The information can then be analyzed with bioinformatic tools such as BLAST or MEME
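The read, store, analyze workflow above can be sketched in a few lines. Sequence databases commonly distribute records in FASTA format (a `>` header line followed by sequence lines), so this sketch parses FASTA text into a dictionary that stands in for the database, then runs one simple analysis (GC content). The records and identifiers are hypothetical, and real tools such as BLAST or MEME do far more than this.

```python
# Hypothetical FASTA text standing in for sequencing output.
FASTA_TEXT = """>seqA example record
ATGCGCGTTA
GGCC
>seqB another record
ATATATGC
"""

def parse_fasta(text):
    """Return {identifier: sequence} parsed from FASTA-formatted text."""
    records = {}
    current_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # The identifier is the first word after '>'.
            current_id = (line[1:].split() or [""])[0]
            chunks = []
        else:
            # Sequence lines may be wrapped; join them into one string.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

db = parse_fasta(FASTA_TEXT)
for seq_id, seq in db.items():
    print(seq_id, len(seq), round(gc_content(seq), 3))
```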

  6. GenBank
     • Started as the Los Alamos Sequence Database in 1979 at Los Alamos National Laboratory (LANL)
     • Walter Goad, a nuclear physicist, turned his focus to biology and created the Los Alamos Sequence Database with some of his colleagues
     • This effort culminated in the creation of the public GenBank in 1982
     • In collaboration with BBN, LANL had stored nearly 2,000 sequences in this database by 1983
     • In the mid-1980s, LANL collaborated with Stanford University
     • As one of the earliest widely accessible biological databases, GenBank started a program to promote open-access communication among bioscientists
     • In 1992, the GenBank project was transferred to the National Center for Biotechnology Information (NCBI), established in 1988 under the National Library of Medicine
     • At the time of this presentation, nucleotide sequences and protein translations from nearly 100,000 distinct organisms were publicly accessible through GenBank
     • As you all know, NCBI, which hosts GenBank, also provides BLAST, a web-based tool that searches for sequences similar to yours and gives detailed information about each match
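A BLAST search begins by looking for short "words" shared between the query and database sequences before extending those hits into alignments. The sketch below shows only that exact-word seeding idea on a toy in-memory database; the record IDs, sequences, and word length k = 4 are hypothetical, and it omits BLAST's neighborhood words, hit extension, and E-value statistics, so it is not BLAST itself.

```python
from collections import defaultdict

def build_word_index(database, k):
    """Map every length-k word to the (record id, position) pairs where it occurs."""
    index = defaultdict(list)
    for rec_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((rec_id, i))
    return index

def seed_hits(query, index, k):
    """Count exact word matches between the query and each database record."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        # dict.get does not insert missing keys into the defaultdict.
        for rec_id, _pos in index.get(query[i:i + k], []):
            counts[rec_id] += 1
    # Rank records by number of shared words (most first), ties broken by id.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy "database" with hypothetical record identifiers.
database = {
    "rec1": "ATGGCCATTGTAATGGGCCGC",
    "rec2": "TTTTACGATCGATCGTTTTTT",
    "rec3": "ATGGCCATTGTTATGGGC",
}
k = 4
index = build_word_index(database, k)
hits = seed_hits("GCCATTGTAATG", index, k)
print(hits)
```

Building the index once lets every later query look up words directly instead of rescanning every stored sequence, which is the main reason word-based tools are fast.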

  7. European Molecular Biology Laboratory (EMBL)
     • EMBL was founded in 1974 as an international research center, based on the vision of Leó Szilárd, James Watson, and John Kendrew, to rival American dominance in the field of molecular biology
     • Its main laboratory is in Heidelberg, Germany, with outstations in England, France, Italy, Spain, and a second site in Germany
     • EMBL focuses on research in molecular biology and molecular medicine, as well as training for scientists, students, and visitors
     • Provides two important tools: ClustalX and HMMER
        • ClustalX: multiple sequence alignment of DNA or protein sequences
        • HMMER: fast and sensitive homology searches

  8. DNA Data Bank of Japan (DDBJ)
     • DNA sequence database located at the National Institute of Genetics (NIG) in Shizuoka Prefecture, Japan
     • DDBJ began operating in 1986 and remains the only nucleotide sequence database in Asia
     • Although it is mostly used by Japanese researchers, DDBJ accepts data from researchers in any country
     • It has its own BLAST tool for nucleotide sequence searches and TXSearch for taxonomy searches

  9. Types of Bioinformatics Tools
     • BLAST
        • Basic Local Alignment Search Tool
        • An algorithm for comparing biological sequence information, such as nucleic acid sequences or protein/amino acid sequences
     • FASTA
        • A sequence-comparison program that first finds short matching words (k-tuples) and then refines the best regions with Smith-Waterman-style local alignment
        • Generally slower than BLAST
        • Can be more sensitive for some searches
     • Clustal
        • Progressive multiple sequence alignment: a guide tree is built from pairwise sequence comparisons (by UPGMA cluster analysis in the original Clustal), and sequences are aligned in the order given by the tree
        • ClustalW and ClustalX version 2 are written in C++
     • HMMER
        • Detects homologous protein or nucleotide sequences by comparing a profile HMM (hidden Markov model) to either a single sequence or a database of sequences
        • Profile HMMs turn a multiple sequence alignment into a position-specific scoring system, which can be used to align sequences and search databases for remotely homologous sequences
     • SignalP
        • Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from eukaryotes, Gram-positive prokaryotes, and Gram-negative prokaryotes
        • Predictions are made by combining several artificial neural networks
     • SMART
        • Simple Modular Architecture Research Tool
        • Biological database for identifying and annotating protein domains within protein sequences
        • Protein domains: conserved parts of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain
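The Smith-Waterman local alignment mentioned under FASTA can be sketched as a small dynamic-programming routine. This is a minimal teaching version assuming a linear gap penalty and toy scores (match +2, mismatch -1, gap -2); real tools use substitution matrices such as BLOSUM62 with affine gap penalties, and FASTA and BLAST use heuristics to avoid filling the whole matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Flooring at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = smith_waterman("TTACGTAA", "GGACGTCC")
print(score)
print(top)
print(bottom)
```

Filling the full matrix costs time proportional to the product of the two sequence lengths, which is why exhaustive Smith-Waterman searches are slow on large databases.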

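The "position-specific scoring system" described under HMMER can be illustrated with a much simpler relative of a profile HMM: a position-specific scoring matrix (PSSM) built from a gapless alignment. This sketch is not HMMER's method, since real profile HMMs also model insertions and deletions; the four-sequence alignment, the pseudocount of 1, and the uniform 0.25 background frequency are illustrative assumptions.

```python
import math

ALPHABET = "ACGT"

def build_pssm(alignment, pseudocount=1.0, background=0.25):
    """Log2-odds score for each base at each column of a gapless alignment."""
    length = len(alignment[0])
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        # Pseudocounts keep unseen bases from getting a frequency of zero.
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for base in ALPHABET:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score_sequence(pssm, seq):
    """Sum the position-specific scores for a sequence of the same length."""
    return sum(column[base] for column, base in zip(pssm, seq))

# Hypothetical gapless alignment of four short DNA sites.
alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]
pssm = build_pssm(alignment)
print(round(score_sequence(pssm, "ACGT"), 3))
print(round(score_sequence(pssm, "GGGG"), 3))
```

A sequence that matches the conserved columns scores high, while one that does not scores low, even though neither is compared to any single alignment member directly.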
  10. Works Cited
     • Fox, Joanne. 4 Aug. 2006. What is Bioinformatics? The Science Creative Quarterly. www.scq.ubc.ca/what-is-bioinformatics/.
     • Stretton, Antony O. W. 2002. The First Sequence: Fred Sanger and Insulin. Genetics. Genetics Society of America.
     • Christophe. 19 Aug. 2015. How Does a Relational Database Work. Coding Geek. coding-geek.com/how-databases-work/.
     • Masic, Izet. 2016. The Most Influential Scientists in the Development of Medical Informatics: Margaret Belle Dayhoff. National Center for Biotechnology Information.
     • Lee, John. 2007. Richard V. Eck (1922-2006): Bioinformatics: In the Beginning. National Center for Biotechnology Information.
     • Thampi, Sabu M. 2001. Bioinformatics. LBS College of Engineering.
     • http://www.cbs.dtu.dk/services/SignalP/
     • http://smart.embl-heidelberg.de/help/smart_about.shtml
     • https://en.wikipedia.org/wiki/Bioinformatics
     • https://en.wikipedia.org/wiki/Margaret_Oakley_Dayhoff
     • https://en.wikipedia.org/wiki/Database
     • https://en.wikipedia.org/wiki/GenBank
     • https://en.wikipedia.org/wiki/DNA_Data_Bank_of_Japan
     • https://en.wikipedia.org/wiki/HMMER
     • https://en.wikipedia.org/wiki/Clustal
     • https://en.wikipedia.org/wiki/Simple_Modular_Architecture_Research_Tool
