Tutorial: Bioinformatics Resources

Tutorial: Bioinformatics Resources. BIO-TRAC 25 (Proteomics: Principles and Methods), October 10, 2003, NIH, Bethesda, MD. Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation, GUMC.

Presentation Transcript


  1. Tutorial: Bioinformatics Resources • BIO-TRAC 25 (Proteomics: Principles and Methods) • October 10, 2003, NIH, Bethesda, MD • Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation, GUMC

  2. What is Bioinformatics? • NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002) - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. • Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.

  3. Bioinformatics Resources • The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources • 2003 update: http://www3.oup.co.uk/nar/database/ • Nucleic Acids Research Database Issues (January Annually) (2003 - http://nar.oupjournals.org/content/vol31/issue1/) • DBcat: A Catalog of > 500 Biological Databases • http://www.infobiogen.fr/services/dbcat/

  4. Molecular Biology Database Collection (http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)

  5. The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.)--An online resource of 386 key databases of 18 categories • Major sequence repositories • Comparative Genomics • Gene Expression • Gene Identification and Structure • Genetic and Physical Maps • Genomic Databases • Intermolecular Interactions • Metabolic Pathways and Cellular Regulation • Mutation Databases • Pathology • Protein Sequence Motifs • Proteome Resources • Retrieval Systems and Database Structure • RNA Sequences • Structure • Transgenics • Varied Biomedical Content

  6. Overview • Protein Sequence Analysis I. Sequence Similarity Search and Alignment II. Family Classification Methods III. Structure Prediction Methods • Molecular Biology Databases IV. Protein Family Databases V. Databases of Protein Functions VI. Databases of Protein Structures • Proteomic Resources VII. 2D-gel databases VIII. Proteomic analyses

  7. I. Sequence Similarity Search • Find a protein sequence: text search • Based on Pair-Wise Comparisons • BLOSUM scoring matrix • PAM scoring matrix • Dynamic Programming Algorithms • Global Similarity: Needleman-Wunsch (GAP/BestFit) • Local Similarity: Smith-Waterman (SSEARCH) • Heuristic Algorithms (Sequence Database Searching) • FASTA: Based on K-Tuples (2-Amino Acid) • BLAST: Triples of Conserved Amino Acids • Gapped-BLAST: Allow Gaps in Segment Pairs (NREF) • PHI-BLAST: Pattern-Hit Initiated Search (NCBI) • PSI-BLAST: Iterative Search (NCBI)
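
The k-tuple idea behind FASTA-style heuristic searching can be sketched as follows. This is only a minimal illustration of the word-lookup step, assuming k = 2 as on the slide; the function names `ktuple_index` and `diagonal_hits` are hypothetical, and the real FASTA program adds rescoring and banded alignment steps that are not shown here.

```python
from collections import Counter, defaultdict


def ktuple_index(seq, k=2):
    """Map each k-tuple (word of length k) to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, subject, k=2):
    """Count shared k-tuples per diagonal (subject position minus query position).

    A diagonal with many hits suggests a region where the two sequences
    line up without gaps, which is where a heuristic search would look next.
    """
    index = ktuple_index(query, k)
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[j - i] += 1
    return diagonals


# Toy sequences (invented for illustration): the subject is the query
# shifted right by two residues, so the hits fall on diagonal 2.
print(diagonal_hits("MKTAYIAKQR", "GGMKTAYIAK").most_common(1))  # [(2, 7)]
```

Indexing the query once and scanning the subject keeps the cost close to linear in sequence length, which is the reason heuristic searches are preferred over full dynamic programming for whole-database scans.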

  8. Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) Sequence Search by Text or Unique ID (http://pir.georgetown.edu/pirwww/search/textsearch.html)
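
A text search of this kind can also be expressed as a URL. The sketch below assumes the NCBI E-utilities `esearch.fcgi` endpoint with its `db`, `term`, and `retmax` parameters, none of which appear on the slide; the helper name `entrez_search_url` and the example query are hypothetical. It only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities search endpoint (not taken from the slides).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_search_url(term, db="protein", retmax=20):
    """Build a search URL for a text query against one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_BASE + "?" + urlencode(params)


# Example query using Entrez field tags in square brackets.
print(entrez_search_url("insulin[Title] AND human[Organism]", retmax=5))
```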

  9. Pair-Wise Comparisons • Scoring matrix • Global and local similarity: Dynamic Programming (Needleman-Wunsch, Smith-Waterman) (http://www.ebi.ac.uk/emboss/align/)
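
The difference between global and local dynamic programming can be seen in a short sketch. It assumes a toy scoring scheme (match +1, mismatch -1, linear gap penalty -2) in place of a BLOSUM or PAM matrix, and it returns only the optimal score rather than the alignment itself. The function names are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score: every residue of both sequences is aligned."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment score: the best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            score = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(needleman_wunsch("GATTACA", "GATTACA"))      # 7
print(smith_waterman("GGGACGTGGG", "TTTACGTTTT"))  # 4
```

The only structural differences are the first row and column (gap penalties versus zeros), the floor at zero, and where the final score is read from (last cell versus the maximum over all cells).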

  10. FASTA Search (http://pir.georgetown.edu/pirwww/search/fasta.html) (http://www.ebi.ac.uk/fasta33/)

  11. Gapped-BLAST Search (http://pir.georgetown.edu/pirwww/search/pirnref.shtml) (http://www.ncbi.nlm.nih.gov/BLAST/)

  12. A BLAST Result

  13. PSI-BLAST Iterative Search (http://www.ncbi.nlm.nih.gov/BLAST/)

  14. PSI-BLAST

  15. II. Family Classification Methods • Multiple Sequence Alignment and Phylogenetic Analysis • ClustalW Multiple Sequence Alignment • Alignment Editor & Phylogenetic Trees • Searches Based on Family Information • PROSITE Pattern Search • Motif and Profile Search • Hidden Markov Models (HMMs)

  16. Multiple Sequence Alignment • ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)

  17. Alignment Editor (Jalview) (http://www.ebi.ac.uk/clustalw/)

  18. Alignment Editor (GeneDoc) (http://www.psc.edu/biomed/genedoc/)

  19. Phylogenetic Analysis • Tree Programs: (http://evolution.genetics.washington.edu/phylip.html) • Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)

  20. Phylogenetic Trees (IGFBP Superfamily) (Radial Tree) (Phylogram)

  21. PROSITE Pattern Search (http://pir.georgetown.edu/pirwww/search/patmatch.html)
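
A PROSITE-style pattern search can be approximated with regular expressions. The sketch below assumes the common PROSITE pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) and does not handle rarer constructs such as `>` inside brackets. The function names are hypothetical, and the test sequence is invented for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        if element.endswith(")"):        # repeat count such as x(3) or x(2,4)
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                   # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # excluded residues
        else:
            core = element               # one residue or a [..] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern, N-{P}-[ST]-{P}, on a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}"))                # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "MKNGSAANPTQNVTA"))   # [(3, 'NGSA'), (12, 'NVTA')]
```

The lookahead wrapper in `find_sites` reports overlapping matches, which matters for short patterns that can occur back to back.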

  22. Profile Search (http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
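
The idea of a profile search can be sketched with a small position-specific scoring table. This is a simplified illustration and not the method used by the server above: it assumes an ungapped alignment of equal-length sequences, a uniform background frequency for the 20 amino acids, and a pseudocount of 1. The aligned sequences and the function names are invented for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Log-odds score per residue per column, against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Return (score, 1-based start) of the highest-scoring ungapped window.

    Only the 20 standard residues are supported; other letters raise KeyError.
    """
    width = len(profile)
    scored = [
        (sum(profile[k][sequence[i + k]] for k in range(width)), i + 1)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Toy alignment of a four-residue motif, then a scan of a longer sequence.
profile = build_profile(["GKST", "GKTT", "GRST", "GKSS"])
score, start = best_window(profile, "AAAGKSTAAA")
print(start)  # 4
```

Unlike a pattern, a profile gives partial credit: a residue seen rarely in a column still scores better than one never seen, so diverged family members can be detected.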

  23. Hidden Markov Model Search (http://www.sanger.ac.uk/Software/Pfam/search.shtml) (http://smart.embl-heidelberg.de)

  24. III. Structural Prediction Methods • Signal Peptide: SIGFIND, SignalP • Transmembrane Helix: TMHMM, TMAP • 2D Prediction (α-helix, β-sheet, Coiled-coils): PHD, JPred • 3D Modeling: Homology Modeling (Modeller, SWISS-MODEL), Threading, Ab-initio Prediction

  25. Structure Prediction: A Guide (http://speedy.embl-heidelberg.de/gtsp/flowchart2.html)

  26. Protein Prediction Server (http://www.cbs.dtu.dk/services/)

  27. Signal Peptide Prediction (http://www.stepc.gr/~synaptic/sigfind.html) (http://www.cbs.dtu.dk/services/SignalP-2.0)

  28. Transmembrane Helix (http://www.cbs.dtu.dk/services/TMHMM/)

  29. Protein Structure Prediction (http://cmgm.stanford.edu/WWW/www_predict.html) (http://restools.sdsc.edu/biotools/biotools9.html)

  30. Structure Prediction Server (http://cubic.bioc.columbia.edu/predictprotein/) (http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)

  31. 3D-Modelling (http://www.salilab.org/modeller/modeller.html) (http://www.expasy.ch/swissmod/SWISS-MODEL.html)

  32. IV. Protein Family Databases • Whole Proteins • PIR: Superfamilies and Families • COG (Clusters of Orthologous Groups) of Complete Genomes • ProtoNet: Automated Hierarchical Classification of Proteins • Protein Domains • Pfam: Alignments and HMM Models of Protein Domains • SMART: Protein Domain Families • Protein Motifs • PROSITE: Protein Patterns and Profiles • BLOCKS: Protein Sequence Motifs and Alignments • PRINTS: Protein Sequence Motifs and Signatures • Integrated Family Databases • iProClass: Superfamilies/Families, Domains, Motifs, Rich Links • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART

  33. Protein Clustering (http://www.ncbi.nlm.nih.gov/COG/)

  34. Protein Domains • Pfam (http://www.sanger.ac.uk/Software/Pfam/) • SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)

  35. Protein Motifs • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)

  36. Integrated Family Classification • InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)

  37. V. Databases of Protein Functions • Metabolic Pathways, Enzymes, and Compounds • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB) • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes • EcoCyc: Encyclopedia of E. coli Genes and Metabolism • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways) • WIT: Functional Curation and Metabolic Models • BRENDA: Enzyme Database • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways • Klotho: Collection and Categorization of Biological Compounds • Cellular Regulation and Gene Networks • EpoDB: Genes Expressed during Human Erythropoiesis • BIND: Descriptions of interactions, molecular complexes and pathways • DIP: Catalogs experimentally determined interactions between proteins • RegulonDB: Escherichia coli Pathways and Regulation

  38. KEGG Metabolic & Regulatory Pathways • KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html) (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)

  39. BioCyc (EcoCyc/MetaCyc Metabolic Pathways) • The BioCyc Knowledge Library is a collection of Pathway/Genome Databases. (http://biocyc.org/)

  40. Protein-Protein Interactions: DIP (http://dip.doe-mbi.ucla.edu/)

  41. Protein-Protein Interactions: BIND (http://www.bind.ca/)

  42. BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

  43. VI. Databases of Protein Structures • Protein Structure and Classification • PDB: Structure Determined by X-ray Crystallography and NMR • CATH: Hierarchical Classification of Protein Domain Structures • SCOP: Familial and Structural Protein Relationships • FSSP: Protein Fold Family Database • Protein Sequence-Structure Relationship • PIR-NRL3D: Protein Sequence-Structure Database • PIR-RESID: Protein Structure/Post-Translational Modifications • HSSP: Families and Alignments of Structurally-Conserved Regions

  44. PDB Structure Data (http://www.rcsb.org/pdb/)

  45. PDBsum: Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)

  46. Protein Structural Classification • CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

  47. Protein Structural Classification • The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB. (http://scop.mrc-lmb.cam.ac.uk/scop/)

  48. VII. Proteomic Resources • GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/) • PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences • Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes • Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes • Expression Profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi, human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/, Stanford microarray data analysis), EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html, managing, storing and analyzing microarray data)

  49. 2D-Gel Image Databases (1) (http://gelbank.anl.gov/2dgels/index.asp)

  50. 2D-Gel Image Databases (2) (http://us.expasy.org/ch2d/2d-index.html) (http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
