
Introduction to Bioinformatics

Introduction to Bioinformatics. Junhui Wang, May 2004. Outline: what is bioinformatics, introduction to biological databases, sequence alignment.


Presentation Transcript


  1. Introduction to Bioinformatics. Junhui Wang, May 2004

  2. Outline • What is bioinformatics? • Introduction to biological databases • Sequence alignment

  3. Why use bioinformatics? • Explosive growth in the amount of biological information necessitates the use of computers for cataloguing and retrieval. • It is impossible to analyze the data by manual inspection. • Data mining: functional/structural information is important for studying the molecular basis of diseases (and evolutionary patterns).

  4. What is bioinformatics? • A mixture of computer science, mathematics and biology. • Development of new algorithms and statistics to assess relationships among members of large data sets. • Analysis and interpretation of various types of data. • Development and implementation of tools to efficiently access and manage different types of information.

  5. Databases for bioinformatics • Nucleotide databases and protein databases • Primary databases and secondary databases

  6. DNA → RNA → protein

  7. DNA → RNA → protein • DNA: genomic DNA databases • RNA: cDNA, ESTs • protein: protein sequence databases

  8. There are three major public DNA databases • EMBL: housed at EBI (European Bioinformatics Institute) • GenBank: housed at NCBI (National Center for Biotechnology Information) • DDBJ: housed in Japan

  9. www.ncbi.nlm.nih.gov

  10. PubMed is… • National Library of Medicine's search service • 11 million citations in MEDLINE • links to participating online journals • PubMed tutorial (via “Education” on side bar)

  11. Entrez is… • a search and retrieval system that integrates NCBI databases: • the scientific literature • DNA and protein sequence databases • 3D protein structure data

  12. Entrez

  13. BLAST is… • Basic Local Alignment Search Tool • NCBI's sequence similarity search tool • supports analysis of DNA and protein databases • 80,000 searches per day
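
To make "local alignment" concrete, here is a minimal sketch of Smith-Waterman scoring, the exact dynamic-programming method for local alignment. It is not BLAST itself, which uses a faster heuristic to search whole databases. The function name and the scoring values (match 2, mismatch -1, gap -2) are illustrative assumptions, not taken from the slides.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Smith-Waterman recurrence with a linear gap penalty; only the score is
    computed (no traceback), so two rows of the table are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared stretch "ACG" is found even though the flanking bases differ.
print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Because every cell is clamped at zero, poorly matching flanks do not penalize a well-matching core, which is the property the word "local" refers to.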

  14. OMIM is… • Online Mendelian Inheritance in Man • catalog of human genes and genetic disorders • edited by Dr. Victor McKusick

  15. Books is… • searchable resource of on-line books

  16. TaxBrowser is… • browser for the major divisions of living organisms (bacteria, viruses) • taxonomy information such as genetic codes • molecular data on extinct organisms

  17. Structure site includes… • Molecular Modeling Database (MMDB) • biopolymer structures obtained from the Protein Data Bank (PDB) • a 3D-structure viewer

  18. Four questions we can answer at NCBI (and elsewhere): [1] How can I do a literature search using PubMed? [2] How can WelchWeb help? [3] How can I use Entrez to find information about a particular gene or protein? [4] How can I find information about a particular disease?

  19. Question #1: How can I use PubMed at NCBI to find literature information?

  20. PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,000 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966.

  21. MeSH is the acronym for "Medical Subject Headings." MeSH is the list of vocabulary terms used for subject analysis of biomedical literature at NLM. The MeSH vocabulary is used for indexing journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.

  22. PubMed search strategies • Try the tutorial (“Education” on the left sidebar) • Use Boolean queries: AND, OR, NOT • Try using “limits” • Try “LinkOut” to find external resources • Obtain articles online via the Welch Medical Library (and download PDF files): http://www.welch.jhu.edu/
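
As a rough illustration of Boolean queries, the sketch below assembles a query string from AND, OR and NOT terms and wraps it in a search URL. The slides only describe the PubMed web form; the use of the NCBI E-utilities `esearch.fcgi` endpoint, the helper names, and the search terms are all assumptions made for this example. Nothing is fetched over the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption; the slides use the web form).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def boolean_query(all_of=(), any_of=(), none_of=()):
    """Combine search terms with the Boolean operators AND, OR and NOT."""
    parts = list(all_of)
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    if not parts:
        raise ValueError("need at least one term in all_of or any_of")
    query = " AND ".join(parts)
    if none_of:
        query += " NOT (" + " OR ".join(none_of) + ")"
    return query


def esearch_url(query, db="pubmed"):
    """Build (but do not fetch) a search URL for the given query."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query})


# Hypothetical search: hemoglobin papers on human or mouse, excluding reviews.
query = boolean_query(all_of=["hemoglobin"], any_of=["human", "mouse"], none_of=["review"])
print(query)
print(esearch_url(query))
```

The same query string can be pasted directly into the PubMed search box; the URL form is only useful when searches need to be scripted.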

  23. Question #2: How can I use WelchWeb (from the Welch Medical Library) to do literature searches? WelchWeb is available at http://www.welch.jhu.edu

  24. WelchWeb is available at http://www.welch.jhu.edu

  25. E-mail gateway

  26. PubMed gateway

  27. Library catalog

  28. Remote access to Welch services

  29. Request literature

  30. Browse journals

  31. Browse databases

  32. Question #3: How can I use NCBI (or other sites) to find information about a protein or gene?

  33. Four ways to access protein and DNA sequences [1] LocusLink with RefSeq [2] Entrez [3] UniGene [4] ExPASy Sequence Retrieval System (this is separate from NCBI)

  34. 4 ways to access protein and DNA sequences [1] LocusLink with RefSeq LocusLink is a great starting point: it collects key information on each gene/protein from major databases. It now covers 8 organisms. RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_007635) [2] Entrez [3] UniGene [4] ExPASy SRS
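
The two RefSeq accession numbers on this slide follow a regular pattern: a two-letter prefix, an underscore, and a number. A small sketch of how such accessions might be recognized is shown below. The function name, the prefix table (limited to the NM_ and NP_ prefixes seen on the slide, where NM_ records are mRNA transcripts), and the optional `.version` suffix are assumptions for illustration; RefSeq defines further prefixes that are not covered here.

```python
import re

# Two-letter prefix, underscore, digits, optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"^(?P<prefix>[A-Z]{2})_(?P<number>\d+)(?:\.(?P<version>\d+))?$")

# Only the prefixes that appear on the slide; others are reported as "unknown".
MOLECULE_BY_PREFIX = {"NM": "mRNA", "NP": "protein"}


def describe_refseq(accession):
    """Return a small description of a RefSeq-style accession, or None."""
    m = REFSEQ_PATTERN.match(accession.strip())
    if m is None:
        return None
    version = m.group("version")
    return {
        "prefix": m.group("prefix"),
        "molecule": MOLECULE_BY_PREFIX.get(m.group("prefix"), "unknown"),
        "version": int(version) if version is not None else None,
    }


for acc in ("NM_006744", "NP_007635"):
    print(acc, describe_refseq(acc))
```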

  35. 4 ways to access protein and DNA sequences [1] LocusLink with RefSeq [2] Entrez Entrez is divided into sites for nucleotide, protein, structure, genomes, OMIM, and more. You can use limits (such as RefSeq) to focus your Entrez search. [3] UniGene [4] ExPASy SRS

  36. The GenBank flat file: • the elementary unit of information • one of the most commonly used formats • LOCUS: locus name / sequence length / molecule type / GenBank division code / date • DEFINITION: summarizes the biology of the record (genus species / product name / …) • ACCESSION: an accession number is a label used to identify a sequence; it is a string of letters and/or numbers that corresponds to a molecular sequence • VERSION: the accession number plus a version suffix • GID: the gi (GenInfo Identifier)

  37. The GenBank flat file (cont.): • KEYWORDS: terms that identify the particular entry; not very useful • SOURCE: gives either the common name of the organism or its scientific name • REFERENCE: at least one reference or citation, which can be published or unpublished; MEDLINE and PubMed identifiers provide links to the MEDLINE and PubMed databases • COMMENT: remarks that refer to the whole record
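
The header fields listed on the last two slides can be read with a few lines of code. The sketch below is a simplified reader, not a full GenBank parser: it treats any line that starts with an upper-case keyword in the first column as a new field and appends indented lines to the previous field. The sample record is invented purely to show the layout, and the function name is hypothetical.

```python
import re

# Invented record for illustration only; it is not a real GenBank entry.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01       60 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  Hypothetical example record used only to show the layout,
            not a real sequence entry.
ACCESSION   EX000001
VERSION     EX000001.1  GI:0000000
KEYWORDS    .
SOURCE      Homo sapiens (human)
"""


def parse_flatfile_header(text):
    """Map each top-level keyword (LOCUS, DEFINITION, ...) to its text."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^([A-Z]+)\s+(.*)$", line)
        if match:
            # A keyword in the first column starts a new field.
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current and line.startswith(" ") and line.strip():
            # Indented lines continue the previous field (sub-keywords too).
            fields[current] += " " + line.strip()
    return fields


fields = parse_flatfile_header(SAMPLE_RECORD)
for keyword, value in fields.items():
    print(f"{keyword}: {value}")
```

For real records, an established parser such as the one in Biopython is a safer choice, since the full format also has a feature table and sequence section that this sketch ignores.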

  38. Graphics format

  39. 4 ways to access protein and DNA sequences [1] LocusLink with RefSeq [2] Entrez [3] UniGene UniGene collects expressed sequence tags (ESTs) into clusters, in an attempt to form one gene per cluster. Use UniGene to study where your gene is expressed in the body, when it is expressed, and see its abundance. [4] ExPASy SRS
