
Tutorial: Bioinformatics Resources ( http://pir.georgetown.edu/~huz/class/bioinfo_resource.html )

  1. Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/~huz/class/bioinfo_resource.html). Bio-Trac 25 (Proteomics: Principles and Methods), March 24, 2006. Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource; Research Assistant Professor, Department of Biochemistry and Molecular Biology, Georgetown University Medical Center

  2. What is Bioinformatics? computer (information) + mouse (biology) = bioinformatics
     • NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

  3. Molecular Biology Database Collection: 858 key databases in 15 categories (http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1)

  4. Database Collection in Nucleic Acids Res.

  5. Online Access to Database Collection 2006 http://pir.georgetown.edu/~huz/class/2005_database_update.html http://www.oxfordjournals.org/nar/database/cap/

  6. Overview: Database Contents, Search and Retrieval
     • Text search / information retrieval
     • Sequence & genomics databases
     • Protein family databases
     • Databases of protein functions
     • Databases of protein structures
     • Proteomics databases

  7. Entrez Text Searches (http://www.ncbi.nlm.nih.gov/Entrez/)

  8. PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
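The Entrez and PubMed searches on these slides are run through the web interface. As a minimal sketch, assuming NCBI's E-utilities service (the documented programmatic interface to Entrez) at the base URL below, the same searches can be scripted by building ESearch/EFetch request URLs. The helper names `esearch_url` and `efetch_url` are hypothetical; the code only constructs URLs and makes no network call.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the scriptable interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; the response lists IDs of records matching `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list[str], rettype: str = "abstract", retmode: str = "text") -> str:
    """Build an EFetch URL that retrieves full records for a list of IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Boolean operators go inside the term string, as in the Entrez search box.
print(esearch_url("pubmed", "crystallin AND rabbit"))
```

To actually run a search, the URL could be passed to `urllib.request.urlopen`; NCBI asks scripted users to respect its request-rate guidelines.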

  9. UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch): Google-type search vs. Boolean searches (AND, OR, NOT)
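To make the contrast on this slide concrete, here is a minimal sketch over a tiny inverted index, not a description of how the UniProt text search is implemented. Boolean search applies exact set logic (AND intersects, OR unions, NOT subtracts), while a "Google-type" search returns any document matching any query word, ranked by how many words match. The three-document corpus and all function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: token -> set of document IDs containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def boolean_search(index, all_ids, all_of=(), any_of=(), none_of=()):
    """Exact set logic: AND over `all_of`, OR over `any_of`, NOT over `none_of`."""
    result = set(all_ids)
    for term in all_of:                      # AND: intersect
        result &= index.get(term.lower(), set())
    if any_of:                               # OR: union the hits, then intersect
        hits = set()
        for term in any_of:
            hits |= index.get(term.lower(), set())
        result &= hits
    for term in none_of:                     # NOT: subtract
        result -= index.get(term.lower(), set())
    return result


def ranked_search(index, query: str) -> list[str]:
    """Google-type search: any match is returned, ranked by number of matched words."""
    scores: dict[str, int] = {}
    for token in tokenize(query):
        for doc_id in index.get(token, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical mini-corpus of protein names.
docs = {
    "d1": "alpha crystallin A chain",
    "d2": "delta crystallin II argininosuccinate lyase",
    "d3": "argininosuccinate lyase",
}
index = build_index(docs)
```

For example, `crystallin NOT lyase` keeps only `d1`, whereas the ranked search for `crystallin lyase` returns all three documents with `d2` first.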

  10. PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html). Search: Alpha crystallin A chain and its protein family?

  11. PIR Text Search (II). Search: Crystallins that are enzymes? Can you find which crystallin has a determined 3D structure?

  12. I. Sequence & Genomics Databases
     • GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
     • RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
     • UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
     • Entrez Gene: Gene-centered information at NCBI.
     • UniGene: Unified clusters of ESTs and full-length mRNA sequences.
     • OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
     • Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
     • GeneCards: Integrated database of human genes, maps, proteins and diseases.
     • SNP Consortium Database

  13. Universal Protein Resource: UniProt Consortium Databases (http://www.uniprot.org): UniProtKB, UniRef, UniParc (figure annotation: 2.85 million)

  14. UniProt Sequence Report (I) What’s the difference between CRYAA_RABIT & CYRBAA? (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)
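One source of confusion behind this slide's question is that different databases use differently shaped identifiers. UniProtKB itself uses a stable accession (the lab slide pairs CRYAA_RABIT with P02493) and a mnemonic entry name of the form `MNEMONIC_SPECIES`. As a sketch, the accession regular expression below is the format published in UniProt's documentation; the helper names are hypothetical.

```python
import re

# UniProtKB accession format (6 or 10 characters) as published by UniProt.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(identifier: str) -> bool:
    """True if the whole string is shaped like a UniProtKB accession."""
    return ACCESSION_RE.fullmatch(identifier) is not None


def split_entry_name(name: str) -> tuple[str, str]:
    """Split an entry name such as CRYAA_RABIT into (protein mnemonic, species code)."""
    mnemonic, sep, species = name.partition("_")
    if not sep or not mnemonic or not species:
        raise ValueError(f"not a UniProtKB entry name: {name!r}")
    return mnemonic, species
```

Because accessions and entry names never share a format, a script can route an arbitrary ID to the right lookup before querying a database.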

  15. UniProt Sequence Report (II) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
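The UniRef100 and UniRef90 reports linked above are clusters of sequences grouped at 100% and 90% identity. UniRef's production pipeline is more involved, so the following is only a simplified sketch of the idea. It uses greedy clustering of pre-aligned, equal-length sequences, where the longest sequence becomes a representative and each later sequence joins the first representative it matches at or above the threshold. Sequence IDs and residues in the usage example are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions with the same residue; gap-gap columns do not count."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Longest (ungapped) sequences are visited first; each joins the first
    existing representative it matches at >= threshold, else starts a cluster."""
    order = sorted(seqs, key=lambda k: (-len(seqs[k].replace("-", "")), k))
    clusters: dict[str, list[str]] = {}
    for seq_id in order:
        for rep in clusters:
            if identity(seqs[seq_id], seqs[rep]) >= threshold:
                clusters[rep].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]
    return clusters
```

Usage with made-up sequences: `greedy_cluster({"s1": "MDVTIQHPWF", "s3": "MDVTIQHPWL"}, 1.0)` keeps the two apart, while a threshold of `0.9` merges them, mirroring how a UniRef90 cluster can contain members that fall in separate UniRef100 clusters.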

  16. Entrez Gene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq

  17. OMIM: Online Mendelian inheritance in man (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

  18. II. Protein Family Databases
     • Whole Proteins
        • PIRSF: A Network Classification System of Protein Families
        • COG (Clusters of Orthologous Groups) of Complete Genomes
        • ProtoNet: Automated Hierarchical Classification of Proteins
     • Protein Domains
        • Pfam: Alignments and HMM Models of Protein Domains
        • SMART: Protein Domain Families
        • CDD: Conserved Domain Database
     • Protein Motifs
        • PROSITE: Protein Patterns and Profiles
        • BLOCKS: Protein Sequence Motifs and Alignments
        • PRINTS: Protein Sequence Motifs and Signatures
     • Integrated Family Databases
        • iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
        • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily

  19. Protein Clustering: COGs (http://www.ncbi.nlm.nih.gov/COG/)

  20. KOGs: Eukaryotic Clusters (http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3591)
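COGs and KOGs are built from best-hit relationships between proteins of different genomes. The sketch below shows only the simplest pairwise version of that idea, reciprocal best hits (RBH), a common first approximation to orthology; it is not the full COG procedure. It assumes similarity scores (for example BLAST bit scores) have already been computed in both directions. All protein names and scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query protein, the subject protein with the highest score."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        # Strictly greater: on ties the first-seen subject is kept.
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical scores: genome A proteins a1, a2 vs genome B proteins b1, b2.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0, ("a2", "b2"): 120.0, ("a2", "b1"): 130.0}
b_vs_a = {("b1", "a1"): 210.0, ("b1", "a2"): 90.0, ("b2", "a2"): 115.0}
```

Here `a2`'s best hit is `b1`, but `b1` prefers `a1`, so only `(a1, b1)` is reported; one-way best hits like `a2 -> b1` are exactly what the reciprocal check filters out.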

  21. Domain Classification (http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT) (http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRYAA_RABIT)

  22. Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
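Pfam models each domain with a profile HMM built from a multiple alignment. A profile HMM is beyond a short example, so this sketch swaps in a simpler related technique, a position-specific scoring matrix (PSSM). It computes log-odds scores per alignment column with pseudocounts and an assumed uniform background frequency of 1/20, then slides the matrix along a query to find the best-scoring window. The alignment and query are hypothetical; unlike a profile HMM, a PSSM cannot model insertions or deletions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log2-odds scores against a uniform 1/20 background."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("alignment rows must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for row in alignment:
            if row[col] in counts:  # gaps and non-standard letters are skipped
                counts[row[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


def score(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum of column scores; unknown letters get that column's worst score."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match PSSM length")
    return sum(col.get(aa, min(col.values())) for col, aa in zip(pssm, segment))


def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Slide the PSSM along `sequence`; return (start index, best score)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    windows = [(i, score(pssm, sequence[i:i + width])) for i in range(len(sequence) - width + 1)]
    return max(windows, key=lambda t: t[1])


# Hypothetical 4-column alignment of a short motif.
pssm = build_pssm(["HLKF", "HLRF", "HIKF"])
```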

  23. Integrated Family Classification. InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF (http://www.ebi.ac.uk/interpro/search.html)

  24. PIRSF: Full-Length Classification. iProClass Family Report (http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

  25. Protein Motifs: PROSITE (http://us.expasy.org/prosite/) is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
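PROSITE patterns use a compact syntax: elements separated by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for N- and C-terminal anchors. A minimal sketch, assuming only that core syntax, translates a pattern into a Python regular expression so it can be scanned against a sequence. Anchors written inside brackets (such as `[G>]`) are deliberately rejected. The function name and the example patterns are illustrative, not taken from specific PROSITE entries.

```python
import re

# One pattern element: a core, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern, e.g. 'C-x(2,4)-C-x(3)-[LIVMFYWC]', to a regex."""
    body = pattern.strip().rstrip(".")
    parts = []
    for element in body.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."
        elif re.fullmatch(r"\[[A-Z]+\]", core):
            rx = core                                # allowed set maps directly
        elif re.fullmatch(r"\{[A-Z]+\}", core):
            rx = "[^" + core[1:-1] + "]"             # forbidden set -> negated class
        elif re.fullmatch(r"[A-Z]", core):
            rx = core                                # single residue
        else:
            raise ValueError(f"unsupported pattern element {element!r}")
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + rx + suffix)
    return "".join(parts)
```

With the translated pattern, `re.search(prosite_to_regex("K-D-E-L>."), seq)` finds a C-terminal KDEL only when it ends the sequence.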

  26. III. Databases of Protein Functions
     • Metabolic Pathways, Enzymes, and Compounds
        • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
        • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
        • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
        • EcoCyc: Encyclopedia of E. coli Genes and Metabolism
        • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
        • BRENDA: Enzyme Database
        • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
     • Cellular Regulation and Gene Networks
        • EpoDB: Genes Expressed during Human Erythropoiesis
        • BIND: Descriptions of interactions, molecular complexes and pathways
        • DIP: Catalogs experimentally determined interactions between proteins
        • BioCarta: Biological pathways of human and mouse
        • GO: Gene Ontology Consortium Database
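EC numbers are four-level hierarchical codes: class, subclass, sub-subclass, serial number, with `-` for an unassigned level. For instance, the KEGG link on the next slide highlights EC 4.3.2.1, argininosuccinate lyase, which falls in class 4 (lyases). The minimal sketch below, with hypothetical helper names, parses an EC number, names its top-level class, and lists its ancestors in the hierarchy.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",  # class added by IUBMB in 2018, after these slides were written
}


def parse_ec(ec: str) -> tuple[str, ...]:
    """Split 'EC 4.3.2.1' into its four levels; '-' marks an unassigned level."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = tuple(text.split("."))
    if len(parts) != 4 or not all(p == "-" or p.isdigit() for p in parts):
        raise ValueError(f"not an EC number: {ec!r}")
    return parts


def ec_class_name(ec: str) -> str:
    """Name of the top-level enzyme class."""
    return EC_CLASSES[int(parse_ec(ec)[0])]


def ec_ancestors(ec: str) -> list[str]:
    """Every level of the hierarchy down to the deepest assigned one."""
    parts = parse_ec(ec)
    out = []
    for depth in range(1, 5):
        if parts[depth - 1] == "-":
            break
        out.append(".".join(parts[:depth] + ("-",) * (4 - depth)))
    return out
```

Listing ancestors this way is what lets a database group every enzyme under, say, `4.3.-.-` in a single query.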

  27. KEGG Metabolic & Regulatory Pathways. KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions (http://www.genome.ad.jp/kegg/kegg2.html). Example pathway map: (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)

  28. BioCyc (EcoCyc/MetaCyc Metabolic Pathways) • The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

  29. BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

  30. Protein-Protein Interaction: BIND (http://www.bind.ca/)

  31. Gene Ontology (http://www.geneontology.org/). Three GOs: Molecular Function, Biological Process, Cellular Component
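Each of the three GO ontologies is a directed acyclic graph in which a term can have several parents. Under GO's "true path rule", annotating a gene to a term implies annotation to all of that term's ancestors. A minimal sketch of that propagation follows. The three root IDs are the real GO root terms; every `term_*` name and the gene `geneX` are hypothetical placeholders, not real GO terms.

```python
from collections import deque
from typing import Optional

# The three GO root terms (real GO identifiers).
ROOTS = {
    "GO:0003674": "molecular_function",
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links upward (breadth-first)."""
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def namespace(term: str, parents: dict[str, list[str]]) -> Optional[str]:
    """Which of the three ontologies a term belongs to, via its root ancestor."""
    for t in {term} | ancestors(term, parents):
        if t in ROOTS:
            return ROOTS[t]
    return None


def propagate(annotations: dict[str, set[str]], parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Expand direct annotations (gene -> terms) with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# Hypothetical child -> parents links; the "term_*" names are made up.
parents = {
    "term_chaperone_activity": ["term_protein_binding"],
    "term_protein_binding": ["GO:0003674"],
    "term_lens_development": ["GO:0008150"],
}
```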

  32. IV. Databases of Protein Structures
     • Protein Structure
        • PDB: Structures Determined by X-ray Crystallography and NMR
        • PDBsum: Summaries and analyses of PDB structures
        • MMDB: NCBI's database of 3D structures, part of NCBI Entrez
        • SWISS-MODEL Repository: Database of annotated protein 3D models
        • ModBase: Annotated comparative protein structure models
     • Structure Classification
        • CATH: Hierarchical Classification of Protein Domain Structures
        • SCOP: Familial and Structural Protein Relationships
        • FSSP: Protein Fold Classification Based on Structure-Structure Alignment

  33. PDB: Experimental 3D Structure Repository (http://www.rcsb.org/pdb/). Example: rat gamma-crystallin, chains A and B. Can you do a text search at PIR to find this?
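Entries such as the two-chain gamma-crystallin structure are distributed as PDB-format files, where each `ATOM` record stores fields in fixed columns. This sketch reads those columns and recovers a one-letter sequence per chain. Following the wwPDB format description, it uses atom name at columns 13-16, residue name at 18-20, chain ID at 22, and residue number plus insertion code at 23-27. It assumes standard amino acids only, ignores `HETATM` records (so modified residues are skipped), and, for multi-model NMR entries, stops at the first `ENDMDL`. The function name is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines: list[str]) -> dict[str, str]:
    """Return {chain_id: one-letter sequence}, using one CA atom per residue."""
    sequences: dict[str, str] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # NMR entries: keep only the first model
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA":
            continue
        res_name = line[17:20].strip()
        chain = line[21]
        res_id = line[22:27]  # residue number plus insertion code
        if (chain, res_id) in seen:
            continue  # skip alternate locations of the same residue
        seen.add((chain, res_id))
        sequences[chain] = sequences.get(chain, "") + THREE_TO_ONE.get(res_name, "X")
    return sequences
```

Comparing these per-chain sequences with the UniProt entry is one way to check which part of a protein a structure actually covers.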

  34. PDBsum: Summary and Analysis (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/). Search; 3-D structure summary; 2-D structure

  35. Protein Structural Classification (1). CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

  36. Protein Structural Classification (2). SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

  37. SWISS-MODEL Repository A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

  38. V. Proteomic Resources
     • GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
     • PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences
     • Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
     • PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database
     • Expression Profiling databases
     • GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases

  39. 2D-Gel Image Databases (1) (http://us.expasy.org/ch2d/2d-index.html) (http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)

  40. 2D-Gel Image Databases (2) (http://gelbank.anl.gov/2dgels/index.asp)

  41. GPMdb MS Data Search (http://gpmdb.thegpm.org/). Craig et al., J Proteome Res. 2004, 3:1234-42.
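Mass-spectrometry search resources ultimately compare observed peptide masses with masses predicted from protein sequences. This sketch covers the prediction side. It performs an in-silico tryptic digest using the common simplified rule (cleave after K or R unless the next residue is P, with no missed cleavages) and computes monoisotopic peptide masses from standard residue masses. It does not reproduce the scoring of any particular search engine; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O for the free N- and C-termini
PROTON = 1.007276   # added for a singly protonated [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residue masses plus one water."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    except KeyError as err:
        raise ValueError(f"unknown residue {err.args[0]!r}") from None


def mh_plus(peptide: str) -> float:
    """Mass of the singly protonated peptide ion, as commonly reported in MS searches."""
    return monoisotopic_mass(peptide) + PROTON
```

A typical use is `[(p, round(mh_plus(p), 4)) for p in tryptic_digest(sequence)]`, giving the peptide-mass list a search would match observed spectra against.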

  42. iProLINK: Protein Literature Mining Resource (http://pir.georgetown.edu/iprolink/). Text mining of protein phosphorylation; gene/protein name thesaurus: synonyms, ambiguous names…
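A name thesaurus like the one mentioned for iProLINK has to handle both synonyms (several names for one protein) and ambiguous names (one name shared by several proteins). A minimal sketch, assuming a simple in-memory dictionary rather than iProLINK's actual data model, builds a normalized name-to-IDs index and flags names that map to more than one entry. The first two entries reuse names from the lab slide; `HYPOTHETICAL_1` is a made-up entry added only to create an ambiguity.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Case-fold and treat hyphens as spaces so 'alpha-crystallin A' == 'Alpha crystallin A'."""
    return " ".join(name.lower().replace("-", " ").split())


def build_thesaurus(entries: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized name to the set of protein IDs it may refer to."""
    index = defaultdict(set)
    for protein_id, names in entries.items():
        for name in names:
            index[normalize(name)].add(protein_id)
    return dict(index)


def lookup(thesaurus: dict[str, set[str]], name: str) -> set[str]:
    """All protein IDs a (possibly differently written) name could denote."""
    return thesaurus.get(normalize(name), set())


def ambiguous_names(thesaurus: dict[str, set[str]]) -> list[str]:
    """Names that point to more than one protein ID."""
    return sorted(name for name, ids in thesaurus.items() if len(ids) > 1)


entries = {
    "CRYAA_RABIT": ["Alpha crystallin A", "CRYAA"],
    "ARLY2_ANAPL": ["Delta crystallin II", "Argininosuccinate lyase"],
    "HYPOTHETICAL_1": ["Argininosuccinate lyase", "ASL"],  # made-up entry
}
thesaurus = build_thesaurus(entries)
```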

  43. Lab:
     • Alpha crystallin A (UniProt: CRYAA_RABIT/P02493)
     • Delta crystallin II (Argininosuccinate lyase) (UniProt: ARLY2_ANAPL/P24058)
     • Choose additional protein IDs to browse the variety of molecular biology databases each sequence report links to.
