Proteomics Methods and Tools Overview
Tutorial: Bioinformatics Resources. BIO-TRAC 25 (Proteomics: Principles and Methods), October 10, 2003, NIH, Bethesda, MD. Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation, GUMC
What is Bioinformatics? • NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002) - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. • Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.
Bioinformatics Resources • The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources • 2003 update: http://www3.oup.co.uk/nar/database/ • Nucleic Acids Research Database Issues (published annually in January; 2003 issue: http://nar.oupjournals.org/content/vol31/issue1/) • DBcat: A Catalog of More Than 500 Biological Databases • http://www.infobiogen.fr/services/dbcat/
Molecular Biology Database Collection (http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)
The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.): an online resource of 386 key databases in 18 categories • Major Sequence Repositories • Comparative Genomics • Gene Expression • Gene Identification and Structure • Genetic and Physical Maps • Genomic Databases • Intermolecular Interactions • Metabolic Pathways and Cellular Regulation • Mutation Databases • Pathology • Protein Sequence Motifs • Proteome Resources • Retrieval Systems and Database Structure • RNA Sequences • Structure • Transgenics • Varied Biomedical Content
Overview • Protein Sequence Analysis I. Sequence Similarity Search and Alignment II. Family Classification Methods III. Structure Prediction Methods • Molecular Biology Databases IV. Protein Family Databases V. Databases of Protein Functions VI. Databases of Protein Structures • Proteomic Resources VII. 2D-gel databases VIII. Proteomic analyses
I. Sequence Similarity Search • Find a protein sequence: text search • Based on Pair-Wise Comparisons • BLOSUM scoring matrix • PAM scoring matrix • Dynamic Programming Algorithms • Global Similarity: Needleman-Wunsch (GAP) • Local Similarity: Smith-Waterman (BestFit, SSEARCH) • Heuristic Algorithms (Sequence Database Searching) • FASTA: Based on Identical k-Tuples (ktup = 2 for Proteins) • BLAST: Seeded by Three-Residue Words • Gapped-BLAST: Allows Gaps in Segment Pairs (PIR-NREF) • PHI-BLAST: Pattern-Hit Initiated Search (NCBI) • PSI-BLAST: Iterative Search (NCBI)
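The dynamic-programming idea behind global (Needleman-Wunsch) alignment can be sketched in a few lines. This is a minimal illustration, not the GAP program: it assumes a toy scoring scheme (match +1, mismatch -1, linear gap -2) in place of a BLOSUM or PAM matrix, and returns one optimal alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        # Toy substitution score; a real search would look up BLOSUM or PAM here
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                          # gap in b
                          F[i][j - 1] + gap)                          # gap in a
    # Trace back from the bottom-right cell to recover one optimal path
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Every residue of both sequences appears in the result, which is what distinguishes a global alignment from the local alignments used for database searching.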
Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) Sequence Search by Text or Unique ID (http://pir.georgetown.edu/pirwww/search/textsearch.html)
Pair-Wise Comparisons • Scoring matrix • Global and local similarity: dynamic programming (Needleman-Wunsch, Smith-Waterman) (http://www.ebi.ac.uk/emboss/align/)
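Local (Smith-Waterman) alignment changes the recurrence in one place: cell scores are floored at zero, so an alignment may start and end anywhere, and the traceback begins at the highest-scoring cell. The sketch below assumes a toy match/mismatch/gap scheme (+2/-1/-2) rather than a substitution matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); empty strings if nothing scores above 0.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        # Toy substitution score standing in for a BLOSUM/PAM lookup
        return match if x == y else mismatch

    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 lets a new local alignment start at any cell
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("XXXACGTXXX", "YYACGTYY"))  # (8, 'ACGT', 'ACGT')
```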
(http://pir.georgetown.edu/pirwww/search/fasta.html) FASTA Search (http://www.ebi.ac.uk/fasta33/)
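FASTA's first stage looks for identical k-tuples (ktup = 2 for proteins) shared by query and subject and tallies them by diagonal (query position minus subject position); diagonals with many hits mark candidate regions that later stages rescore and join. The sketch below shows only that hashing-and-counting step, with made-up sequences.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(query, subject, k=2):
    """Count identical k-tuple hits per diagonal (query_pos - subject_pos)."""
    # Index every k-tuple of the query by its start positions
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Look up each subject k-tuple and record the diagonal of every hit
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            diagonals[i - j] += 1
    return diagonals


hits = ktuple_diagonals("MKTAYIAKQR", "GGMKTAYIAK")
print(hits.most_common(1))  # [(-2, 7)]
```

All seven shared dipeptides fall on diagonal -2, i.e. the shared segment starts two residues later in the subject than in the query.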
(http://pir.georgetown.edu/pirwww/search/pirnref.shtml) Gapped-BLAST Search (http://www.ncbi.nlm.nih.gov/BLAST/)
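BLAST-style searching can be pictured as "seed, then extend". The sketch below is deliberately simplified: it uses exact three-letter word seeds, a toy +5/-4 match/mismatch score, and an ungapped X-drop extension (stop once the running score falls more than `x_drop` below its best). Real BLAST instead seeds with neighborhood words scoring above a threshold T under a substitution matrix, and Gapped BLAST adds a two-hit rule and gapped extension; all parameter values here are hypothetical.

```python
def _extend(query, subject, i, j, step, match, mismatch, x_drop):
    """Ungapped extension from (i, j) in direction step (+1 or -1).

    Returns (best score gained, number of residues in the best extension).
    """
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # X-drop: the extension has fallen too far below its best
        i += step
        j += step
    return best, best_len


def seed_and_extend(query, subject, w=3, match=5, mismatch=-4, x_drop=10):
    """Return the best HSP as (score, query_start, subject_start, length)."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    best_hsp = None
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            right, r_len = _extend(query, subject, i + w, j + w, 1, match, mismatch, x_drop)
            left, l_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, x_drop)
            hsp = (w * match + left + right, i - l_len, j - l_len, l_len + w + r_len)
            if best_hsp is None or hsp[0] > best_hsp[0]:
                best_hsp = hsp
    return best_hsp


print(seed_and_extend("MKTWHEATLL", "PPWHEATQQ"))  # (25, 3, 2, 5): "WHEAT" in both
```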
PSI-BLAST Iterative Search (http://www.ncbi.nlm.nih.gov/BLAST/)
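PSI-BLAST iterates: hits from one round are aligned, summarized as a position-specific scoring matrix (PSSM), and the PSSM is used to search again. The sketch below shows only the PSSM idea under strong simplifying assumptions: a uniform 1/20 background, a flat pseudocount, no sequence weighting, and gaps ignored (real PSI-BLAST uses sequence weights and substitution-matrix-based pseudocounts).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds scores (bits) per column from equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm, sequence):
    """Slide the PSSM along a sequence; return (best score, start) or None."""
    width = len(pssm)
    scores = [(sum(pssm[k].get(aa, 0.0) for k, aa in enumerate(sequence[s:s + width])), s)
              for s in range(len(sequence) - width + 1)]
    return max(scores, default=None)


pssm = build_pssm(["WKC", "WKC", "WRC"])
print(best_window(pssm, "AAAWKCAAA"))  # best window starts at position 3
```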
II. Family Classification Methods • Multiple Sequence Alignment and Phylogenetic Analysis • ClustalW Multiple Sequence Alignment • Alignment Editor & Phylogenetic Trees • Searches Based on Family Information • PROSITE Pattern Search • Motif and Profile Search • Hidden Markov Models (HMMs)
Multiple Sequence Alignment • ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)
Alignment Editor (Jalview) (http://www.ebi.ac.uk/clustalw/)
Alignment Editor (GeneDoc) (http://www.psc.edu/biomed/genedoc/)
Tree Programs: (http://evolution.genetics.washington.edu/phylip.html) Phylogenetic Analysis Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)
Phylogenetic Trees (IGFBP Superfamily) (Radial Tree) (Phylogram)
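A tree like the ones above starts from pairwise distances between aligned sequences. The sketch uses UPGMA, a simple average-linkage clustering method (ClustalW's guide tree uses neighbor-joining instead; PHYLIP's NEIGHBOR program offers both). Distances are p-distances (fraction of differing sites, gap columns skipped), which is an assumption; the sequences are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, ignoring gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Rooted UPGMA tree as a Newick string with branch lengths."""
    # cluster key -> (newick text, number of leaves, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(names, 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        height = dist.pop(pair) / 2
        new = f"({na}:{height - ha:.3f},{nb}:{height - hb:.3f})"
        # Size-weighted average distance from the merged cluster to the rest
        for c in clusters:
            dist[frozenset((new, c))] = (sa * dist.pop(frozenset((a, c)))
                                         + sb * dist.pop(frozenset((b, c)))) / (sa + sb)
        clusters[new] = (new, sa + sb, height)
    return next(iter(clusters.values()))[0] + ";"


seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTAAAAAA", "D": "TTTTAAAAAT"}
print(upgma(["A", "B", "C", "D"], seqs))
# ((A:0.050,B:0.050):0.175,(C:0.050,D:0.050):0.175);
```

UPGMA assumes a constant rate of change (all leaves equidistant from the root), which is why methods such as neighbor-joining are usually preferred for real protein families.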
(http://pir.georgetown.edu/pirwww/search/patmatch.html) PROSITE Pattern Search
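PROSITE patterns map closely onto regular expressions: `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` repeats an element, and `<`/`>` anchor the N- or C-terminus. The translator below is a minimal sketch that handles only those cases (for example, it does not support `>` inside brackets); the example is the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start, anchor_end = element.startswith("<"), element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repeat count such as x(2) or x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # "[ST]" and single residues are already valid regex syntax
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex, re.search(regex, "MKNGSAL").start())  # N[^P][ST][^P] 2
```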
(http://bmerc-www.bu.edu/bioinformatics/profile_request.html) Profile Search
(http://www.sanger.ac.uk/Software/Pfam/search.shtml) Hidden Markov Model Search (http://smart.embl-heidelberg.de)
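Pfam and SMART score sequences with profile HMMs (match, insert, and delete states per alignment column). The core decoding step common to HMMs is the Viterbi algorithm, sketched here for a toy two-state model; the states ("M" for membrane-like, "O" for other), the two-letter alphabet (h = hydrophobic, p = polar), and all probabilities are hypothetical and far simpler than a real profile HMM.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path, computed in log space."""
    # V[t][s] = best log-probability of any path that ends in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                              key=lambda item: item[1])
            V[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the most probable final state
    path = [max(V[-1], key=V[-1].get)]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))


states = ("M", "O")
start_p = {"M": 0.5, "O": 0.5}
trans_p = {"M": {"M": 0.9, "O": 0.1}, "O": {"M": 0.1, "O": 0.9}}
emit_p = {"M": {"h": 0.9, "p": 0.1}, "O": {"h": 0.2, "p": 0.8}}
print(viterbi("hhhhpppp", states, start_p, trans_p, emit_p))  # MMMMOOOO
```

Working with log probabilities avoids numerical underflow on long sequences, since the path probability is a product of many small numbers.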
III. Structure Prediction Methods • Signal Peptide: SIGFIND, SignalP • Transmembrane Helix: TMHMM, TMAP • Secondary-Structure (2D) Prediction (α-helix, β-sheet, coiled coils): PHD, JPred • 3D Modeling: Homology Modeling (Modeller, SWISS-MODEL), Threading, Ab Initio Prediction
Structure Prediction: A Guide (http://speedy.embl-heidelberg.de/gtsp/flowchart2.html)
Protein Prediction Server (http://www.cbs.dtu.dk/services/)
(http://www.stepc.gr/~synaptic/sigfind.html) Signal Peptide Prediction (http://www.cbs.dtu.dk/services/SignalP-2.0)
Transmembrane Helix (http://www.cbs.dtu.dk/services/TMHMM/)
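TMHMM itself is a hidden Markov model. A simpler, older way to spot candidate transmembrane helices is a hydropathy plot: average the Kyte-Doolittle hydropathy over a sliding window and flag stretches above a cutoff. The sketch uses a 19-residue window and a 1.6 threshold, the values usually quoted from Kyte and Doolittle for membrane-spanning segments; treat it as a rough screen, not a substitute for TMHMM.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return (start, end) spans whose windowed mean hydropathy exceeds threshold."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]  # unknown residues -> 0
    segments = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > threshold:
            end = start + window
            # Merge overlapping qualifying windows into one candidate segment
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


print(hydrophobic_segments("K" * 10 + "L" * 22 + "K" * 10))  # [(5, 37)]
```

The reported span is wider than the 22-residue hydrophobic core because windows that overlap a few flanking lysines still average above the cutoff.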
(http://cmgm.stanford.edu/WWW/www_predict.html) Protein Structure Prediction (http://restools.sdsc.edu/biotools/biotools9.html)
(http://cubic.bioc.columbia.edu/predictprotein/) Structure Prediction Server (http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)
(http://www.salilab.org/modeller/modeller.html) 3D-Modelling (http://www.expasy.ch/swissmod/SWISS-MODEL.html)
IV. Protein Family Databases • Whole Proteins • PIR: Superfamilies and Families • COG (Clusters of Orthologous Groups) of Complete Genomes • ProtoNet: Automated Hierarchical Classification of Proteins • Protein Domains • Pfam: Alignments and HMM Models of Protein Domains • SMART: Protein Domain Families • Protein Motifs • PROSITE: Protein Patterns and Profiles • BLOCKS: Protein Sequence Motifs and Alignments • PRINTS: Protein Sequence Motifs and Signatures • Integrated Family Databases • iProClass: Superfamilies/Families, Domains, Motifs, Rich Links • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART
(http://www.ncbi.nlm.nih.gov/COG/) Protein Clustering
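COGs were assembled from genome-specific best hits between complete genomes, grouped into consistent patterns across at least three genomes. The pairwise building block is the reciprocal best hit: gene a in genome A and gene b in genome B are each other's top-scoring match. The sketch below shows only that step; gene IDs and similarity scores are hypothetical.

```python
def best_hits(scores):
    """scores: {(query, subject): similarity}; return {query: best-scoring subject}."""
    best = {}
    for (q, s), value in scores.items():
        if q not in best or value > scores[(q, best[q])]:
            best[q] = s
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each gene is the other's best hit in the other genome."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b2"): 70, ("a2", "b1"): 75}
b_vs_a = {("b1", "a1"): 88, ("b1", "a2"): 72, ("b2", "a2"): 69}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2's best hit is b1, but b1 prefers a1, so (a2, b1) is not reciprocal and is dropped.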
Pfam (http://www.sanger.ac.uk/Software/Pfam/) Protein Domains • SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/) Protein Motifs
Integrated Family Classification • InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF (http://www.ebi.ac.uk/interpro/search.html)
V. Databases of Protein Functions • Metabolic Pathways, Enzymes, and Compounds • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB) • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes • EcoCyc: Encyclopedia of E. coli Genes and Metabolism • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways) • WIT: Functional Curation and Metabolic Models • BRENDA: Enzyme Database • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways • Klotho: Collection and Categorization of Biological Compounds • Cellular Regulation and Gene Networks • EpoDB: Genes Expressed during Human Erythropoiesis • BIND: Descriptions of interactions, molecular complexes and pathways • DIP: Catalogs experimentally determined interactions between proteins • RegulonDB: Escherichia coli Pathways and Regulation
KEGG is a suite of databases and associated software integrating our current knowledge of molecular interaction networks, genes and proteins, and chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html) KEGG Metabolic & Regulatory Pathways (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)
The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/) BioCyc (EcoCyc/MetaCyc Metabolic Pathways)
Protein-Protein Interactions: DIP (http://dip.doe-mbi.ucla.edu/)
(http://www.bind.ca/) Protein-Protein Interaction: BIND
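Interaction databases such as DIP and BIND record pairwise protein-protein interactions, which can be treated as an undirected graph. As a minimal sketch (no DIP or BIND file format is assumed; the protein IDs are hypothetical), breadth-first search finds the shortest chain of interactions linking two proteins.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Shortest interaction path between two proteins, or None if unconnected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P5"), ("P5", "P4")]
print(shortest_path(edges, "P1", "P4"))  # ['P1', 'P5', 'P4']
```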
(http://www.biocarta.com/index.asp) BioCarta Cellular Pathways
VI. Databases of Protein Structures • Protein Structure and Classification • PDB: Structure Determined by X-ray Crystallography and NMR • CATH: Hierarchical Classification of Protein Domain Structures • SCOP: Familial and Structural Protein Relationships • FSSP: Protein Fold Family Database • Protein Sequence-Structure Relationship • PIR-NRL3D: Protein Sequence-Structure Database • PIR-RESID: Protein Structure/Post-Translational Modifications • HSSP: Families and Alignments of Structurally-Conserved Regions
(http://www.rcsb.org/pdb/) PDB Structure Data
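PDB coordinate files use fixed-width ATOM records (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46, 47-54). A minimal sketch that extracts C-alpha atoms by column slicing is shown below; the two records are illustrative, not taken from a real entry, and HETATM, alternate locations, and multiple models are ignored.

```python
import math


def parse_ca_atoms(pdb_lines):
    """Extract C-alpha atoms from fixed-column PDB ATOM records."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append({
                "res_name": line[17:20],
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


records = [
    "ATOM      2  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00           C",
    "ATOM      7  CA  GLY A   2      11.104   9.934  -6.504  1.00  0.00           C",
]
ca = parse_ca_atoms(records)
print(ca[0]["res_name"], ca[1]["res_name"], math.dist(ca[0]["xyz"], ca[1]["xyz"]))
```

The made-up coordinates place consecutive C-alpha atoms about 3.8 Å apart, the typical spacing along a protein backbone; slicing by column rather than splitting on whitespace matters because adjacent fields can run together in real files.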
PDBsum: Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)
Protein Structural Classification CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB. Protein Structural Classification (http://scop.mrc-lmb.cam.ac.uk/scop/)
VII. Proteomic Resources • GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/) • PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences • Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes • Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes • Expression Profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi, human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/, Stanford microarray data analysis), EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html, managing, storing and analyzing microarray data)
2D-Gel Image Databases (1) (http://gelbank.anl.gov/2dgels/index.asp)
2D-Gel Image Databases (2) (http://us.expasy.org/ch2d/2d-index.html) (http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
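2D gels separate proteins by isoelectric point and apparent molecular weight, so gel databases are often compared against values computed from sequence. The sketch computes only the theoretical average mass, as the sum of average residue masses plus one water; the residue values are a commonly used table and differ slightly between sources. Theoretical pI is omitted because it depends on which pKa set is assumed.

```python
# Average residue masses in Da (amino-acid mass minus one water); values vary slightly by source
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of H2O, added once for the free termini


def average_mass(sequence):
    """Theoretical average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(round(average_mass("GA"), 2))  # 146.15 (Gly-Ala dipeptide)
```

Observed spot positions can differ from this value because of post-translational modifications, processing such as signal-peptide removal, and anomalous gel migration.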