
Tutorial: Bioinformatics Resources

BIO-TRAC 25 (Proteomics: Principles and Methods), March 28, 2003, NIH, Bethesda, MD. Zhang-Zhi Hu, M.D., Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation.


Presentation Transcript


  1. Tutorial: Bioinformatics Resources • BIO-TRAC 25 (Proteomics: Principles and Methods) • March 28, 2003, NIH, Bethesda, MD • Zhang-Zhi Hu, M.D., Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation

  2. What is Bioinformatics? • NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002) - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. • Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.

  3. Bioinformatics Resources • The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources • 2003 update: http://www3.oup.co.uk/nar/database/ • Nucleic Acids Research Database Issues (published every January) (2003: http://nar.oupjournals.org/content/vol31/issue1/) • DBcat: A Catalog of > 500 Biological Databases • http://www.infobiogen.fr/services/dbcat/

  4. Molecular Biology Database Collection (http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)

  5. The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.): An online resource of 386 key databases in 18 categories • Major sequence repositories • Comparative Genomics • Gene Expression • Gene Identification and Structure • Genetic and Physical Maps • Genomic Databases • Intermolecular Interactions • Metabolic Pathways and Cellular Regulation • Mutation Databases • Pathology • Protein Sequence Motifs • Proteome Resources • Retrieval Systems and Database Structure • RNA Sequences • Structure • Transgenics • Varied Biomedical Content

  6. Overview • Protein Sequence Analysis I. Sequence Similarity Search and Alignment II. Family Classification Methods III. Structure Prediction Methods • Molecular Biology Databases IV. Protein Family Databases V. Database of Protein Functions VI. Databases of Protein Structures • Proteomic Resources VII. 2D-gel databases VIII. Proteomic analyses

  7. I. Sequence Similarity Search • Find a protein sequence: text search • Based on Pair-Wise Comparisons • BLOSUM scoring matrix • PAM scoring matrix • Dynamic Programming Algorithms • Global Similarity: Needleman-Wunsch (GAP/BestFit) • Local Similarity: Smith-Waterman (SSEARCH) • Heuristic Algorithms (Sequence Database Searching) • FASTA: Based on K-Tuples (2-Amino Acid) • BLAST: Triples of Conserved Amino Acids • Gapped-BLAST: Allow Gaps in Segment Pairs (NREF) • PHI-BLAST: Pattern-Hit Initiated Search (NCBI) • PSI-BLAST: Iterative Search (NCBI)
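The k-tuple idea behind the heuristic searches listed on this slide can be sketched in a few lines. This is only an illustration of word lookup with k = 2, not the FASTA program itself: the function names `ktuple_index` and `best_diagonal` are hypothetical, the example sequences are made up, and real tools go on to rescore and extend the best regions with a scoring matrix.

```python
from collections import Counter, defaultdict


def ktuple_index(seq, k=2):
    """Map every k-residue word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, subject, k=2):
    """Return (offset, hits) for the diagonal with the most shared k-tuples.

    offset is subject position minus query position; None if no word is shared.
    """
    index = ktuple_index(subject, k)
    diagonals = Counter()
    for i in range(len(query) - k + 1):
        # .get avoids creating empty entries in the defaultdict
        for j in index.get(query[i:i + k], ()):
            diagonals[j - i] += 1
    if not diagonals:
        return None
    return diagonals.most_common(1)[0]


if __name__ == "__main__":
    print(best_diagonal("MKVLA", "GGMKVLAGG"))
```

Many word hits on the same diagonal suggest an ungapped region of similarity, which is why a heuristic search can skip most of the dynamic programming matrix.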

  8. Sequence Search by Text or Unique ID (http://www.ncbi.nlm.nih.gov/Entrez/) (http://pir.georgetown.edu/pirwww/search/textsearch.html)

  9. Pair-Wise Comparisons • Scoring matrix • Global and local similarity: Dynamic Programming (Needleman-Wunsch, Smith-Waterman) (http://www.ebi.ac.uk/emboss/align/)
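The difference between global (Needleman-Wunsch) and local (Smith-Waterman) dynamic programming can be shown with a minimal scoring sketch. It makes simplifying assumptions that the real tools do not: a flat match/mismatch score stands in for a BLOSUM or PAM matrix, gaps carry a linear penalty, and only the score is returned, with no traceback. The function names and default parameter values are chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("XXACGTXX", "YYACGTYY"))
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

The only structural differences are the initialisation (gap penalties versus zeros), the clamp at zero, and where the final score is read: the last cell for global alignment, the maximum cell for local alignment.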

  10. FASTA Search (http://pir.georgetown.edu/pirwww/search/fasta.html) (http://www.ebi.ac.uk/fasta33/)

  11. Gapped-BLAST Search (http://pir.georgetown.edu/pirwww/search/pirnref.shtml) (http://www.ncbi.nlm.nih.gov/BLAST/)

  12. PSI-BLAST Iterative Search (http://www.ncbi.nlm.nih.gov/BLAST/)

  13. PSI-BLAST

  14. II. Family Classification Methods • Multiple Sequence Alignment and Phylogenetic Analysis • ClustalW Multiple Sequence Alignment • Alignment Editor & Phylogenetic Trees • Based on Family Information • PROSITE Pattern Search • Motif and Profile Search • Hidden Markov Models (HMMs)

  15. Multiple Sequence Alignment • ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)

  16. Alignment Editor (Jalview) (http://www.ebi.ac.uk/clustalw/)

  17. Alignment Editor (GeneDoc) (http://www.psc.edu/biomed/genedoc/)

  18. Phylogenetic Analysis • Tree Programs: (http://evolution.genetics.washington.edu/phylip.html) • Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)

  19. PROSITE Pattern Search (http://pir.georgetown.edu/pirwww/search/patmatch.html)
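A PROSITE pattern search amounts to matching a small pattern language against a sequence. The sketch below translates the common parts of that syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends) into a Python regular expression. It is an illustration and not the code behind the PIR or ExPASy servers; the helper names are hypothetical, and rarer forms such as `<` or `>` inside square brackets are deliberately rejected.

```python
import re

# One PROSITE element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(
    r"(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core = m.group("core")
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if m.group("lo"):
            hi = "," + m.group("hi") if m.group("hi") else ""
            repeat = "{" + m.group("lo") + hi + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (start, matched_text) for every match, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(find_sites("N-{P}-[ST]-{P}.", "AANGSAANPTA"))
```

With the N-glycosylation site pattern `N-{P}-[ST]-{P}` and the made-up sequence above, the only hit is `NGSA` at position 2 (0-based); the second `N` is followed by `P` and is rejected. Short patterns like this match by chance very often, which is one reason the slides also cover profile and HMM searches.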

  20. Profile Search (http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
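To make the notion of a profile concrete, here is a minimal sketch of a position-specific scoring matrix built from a gap-free alignment and slid along a query sequence. It is not the method used by the server above. The assumptions are mine: a uniform background frequency for the 20 amino acids, a simple additive pseudocount, no gap handling, and hypothetical function names and toy sequences.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, gap-free sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, seq):
    """Return (start, score) of the best-scoring window, or None if seq is too short."""
    width = len(profile)
    if len(seq) < width:
        return None

    def window_score(start):
        return sum(profile[i][seq[start + i]] for i in range(width))

    start = max(range(len(seq) - width + 1), key=window_score)
    return start, window_score(start)


if __name__ == "__main__":
    profile = build_profile(["GKS", "GKT", "GKS"])
    print(best_window(profile, "AAGKSAA"))
```

Unlike a fixed pattern, the profile scores every residue at every position, so a near match still receives a graded score. Profile HMMs extend the same idea with explicit insertion and deletion states.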

  21. Hidden Markov Model Search (http://www.sanger.ac.uk/Software/Pfam/search.shtml) (http://smart.embl-heidelberg.de)

  22. III. Structure Prediction Methods • Signal Peptide (e.g. http://www.cbs.dtu.dk/services/) • Transmembrane Helix (e.g. http://www.cbs.dtu.dk/services/) • 2D Prediction (e.g. http://cubic.bioc.columbia.edu/predictprotein/, http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html) • 3D Modeling (e.g. http://guitar.rockefeller.edu/modeller/modeller.html)

  23. Structure Prediction: A Guide (www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html)

  24. Protein Prediction Server (http://www.cbs.dtu.dk/services/)

  25. Signal Peptide Prediction (http://www.stepc.gr/~synaptic/sigfind.html) (http://www.cbs.dtu.dk/services/SignalP)

  26. Transmembrane Helix (http://www.cbs.dtu.dk/services/TMHMM/)

  27. Protein Structure Prediction (http://cmgm.stanford.edu/WWW/www_predict.html) (http://restools.sdsc.edu/biotools/biotools9.html)

  28. Structure Prediction Server (http://cubic.bioc.columbia.edu/predictprotein/) (http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)

  29. 3D-Modelling (http://guitar.rockefeller.edu/modeller/modeller.html) (http://www.expasy.ch/swissmod/SWISS-MODEL.html)

  30. IV. Protein Family Databases • Whole Proteins • PIR: Superfamilies and Families • COG (Clusters of Orthologous Groups) of Complete Genomes • ProtoNet: Automated Hierarchical Classification of Proteins • Protein Domains • Pfam: Alignments and HMM Models of Protein Domains • SMART: Protein Domain Families • Protein Motifs • PROSITE: Protein Patterns and Profiles • BLOCKS: Protein Sequence Motifs and Alignments • PRINTS: Protein Sequence Motifs and Signatures • Integrated Family Databases • iProClass: Superfamilies/Families, Domains, Motifs, Rich Links • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART

  31. Protein Clustering (http://www.ncbi.nlm.nih.gov/COG/)

  32. Protein Domains • Pfam (http://www.sanger.ac.uk/Software/Pfam/) • SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)

  33. Protein Motifs • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)

  34. Integrated Family Classification • InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, and TIGRFAMs. (http://www.ebi.ac.uk/interpro/search.html)

  35. V. Databases of Protein Functions • Metabolic Pathways, Enzymes, and Compounds • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB) • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes • EcoCyc: Encyclopedia of E. coli Genes and Metabolism • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways) • WIT: Functional Curation and Metabolic Models • BRENDA: Enzyme Database • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways • Klotho: Collection and Categorization of Biological Compounds • Cellular Regulation and Gene Networks • EpoDB: Genes Expressed during Human Erythropoiesis • BIND: Descriptions of interactions, molecular complexes and pathways • DIP: Catalogs experimentally determined interactions between proteins • RegulonDB: Escherichia coli Pathways and Regulation

  36. KEGG Metabolic & Regulatory Pathways • KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html) (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)

  37. BioCyc (EcoCyc/MetaCyc Metabolic Pathways) • The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

  38. Protein-Protein Interactions: DIP (http://dip.doe-mbi.ucla.edu/)

  39. Protein-Protein Interactions: BIND (http://www.bind.ca/)

  40. BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

  41. VI. Databases of Protein Structures • Protein Structure and Classification • PDB: Structure Determined by X-ray Crystallography and NMR • CATH: Hierarchical Classification of Protein Domain Structures • SCOP: Familial and Structural Protein Relationships • FSSP: Protein Fold Family Database • Protein Sequence-Structure Relationship • PIR-NRL3D: Protein Sequence-Structure Database • PIR-RESID: Protein Structure/Post-Translational Modifications • HSSP: Families and Alignments of Structurally-Conserved Regions

  42. PDB Structure Data (http://www.rcsb.org/pdb/)

  43. PDBsum: Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)

  44. Protein Structural Classification • CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

  45. Protein Structural Classification • The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB. (http://scop.mrc-lmb.cam.ac.uk/scop/)

  46. Proteomic Resources • GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/) • PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences • Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes • Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes • Expression Profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi, human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/, Stanford microarray data analysis), EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html, managing, storing and analyzing microarray data)

  47. VII. 2D-Gel Image Databases (http://gelbank.anl.gov/2dgels/index.asp) (http://www-lecb.ncifcrf.gov/2dwgDB) (image: 2D-gel of human ventricle proteins)

  48. VIII. Proteome Analysis (http://www.ebi.ac.uk/proteome)

  49. Human and Mouse Transcriptome Expression Profiling (http://expression.gnf.org/cgi-bin/index.cgi) (http://genome-www.stanford.edu/serum/)

  50. Lab: • Visit selected websites and analyze some protein sequences of your own choice. • A list of the bioinformatics resources in this tutorial is available at http://pir.georgetown.edu/~huz/bioinfo_resource.html • Try some of the following sequences for analysis: 1) well characterized proteins: PIR:A26366 (CYP17), JS0747 (Sp1) 2) less characterized proteins: PIR:A59000 (MATER), TrEMBL:Q9QY16 (GRTH) 3) hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7
