Tutorial: Bioinformatics Resources

(http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)


Bio-Trac 25 (Proteomics: Principles and Methods)

March 23, 2007

Zhang-Zhi Hu, M.D.

Research Associate Professor

Protein Information Resource, Department of

Biochemistry and Molecular & Cellular Biology

Georgetown University Medical Center


What is Bioinformatics?

computer (information) + mouse (biology) = bioinformatics

  • NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000) - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

--968 key databases of 14 categories

Molecular Biology Database Collection




Online Access to Database Collection




Database Contents, Search and Retrieval

  • Text search / Information retrieval
  • Sequence & genomics databases
  • Protein family databases
  • Database of protein functions
  • Databases of protein structures
  • Proteomics databases
Entrez Text Searches


PubMed Literature Database


Literature mining

iProLINK: Protein Literature Mining Resource

Text mining for protein phosphorylation

Gene/protein name thesaurus: synonyms, ambiguous names…


BioThesaurus: Gene/protein name searches - synonyms, ambiguous names…



crystallin, alpha A
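The idea behind a name thesaurus can be sketched as a two-way mapping between names and database entries: a name that maps to more than one entry is ambiguous, and all names attached to the same entry are synonyms. The snippet below is only an illustration and not how BioThesaurus is implemented. The entry identifiers and most names come from this tutorial's own examples; the bare name "crystallin" and the helper functions `normalize`, `lookup` and `synonyms` are assumptions added to demonstrate ambiguity.

```python
from collections import defaultdict

# Toy thesaurus: entry -> names. The bare name "crystallin" is added here
# only to illustrate an ambiguous name; it is not taken from BioThesaurus.
ENTRY_NAMES = {
    "CRYAA_RABIT": ["crystallin, alpha A", "alpha crystallin A chain", "crystallin"],
    "ARLY2_ANAPL": ["delta crystallin II", "argininosuccinate lyase", "crystallin"],
}


def normalize(name):
    """Lowercase, drop commas and collapse whitespace so variants compare equal."""
    return " ".join(name.lower().replace(",", " ").split())


# Reverse index: normalized name -> set of entries that carry it.
NAME_TO_ENTRIES = defaultdict(set)
for entry, names in ENTRY_NAMES.items():
    for name in names:
        NAME_TO_ENTRIES[normalize(name)].add(entry)


def lookup(name):
    """Return the entries for a name and whether the name is ambiguous."""
    entries = NAME_TO_ENTRIES.get(normalize(name), set())
    return {"entries": sorted(entries), "ambiguous": len(entries) > 1}


def synonyms(name):
    """Return all other names recorded for the entries that `name` maps to."""
    found = set()
    for entry in NAME_TO_ENTRIES.get(normalize(name), set()):
        found.update(ENTRY_NAMES[entry])
    return sorted(n for n in found if normalize(n) != normalize(name))


print(lookup("Crystallin, Alpha A"))
print(lookup("crystallin"))
print(synonyms("crystallin, alpha A"))
```

Normalizing names before indexing is a deliberate simplification; a real thesaurus would need far more careful handling of punctuation, Greek letters and abbreviations.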




RLIMS-P: Text mining for protein phosphorylation


UniProt Text Search


Google-type search vs. Boolean searches: AND, OR, NOT
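Boolean searches can be pictured as set operations over a word index: AND is an intersection, OR is a union, and NOT is a set difference. The following is a minimal sketch with made-up records; the record IDs, the texts and the helper names `build_index` and `term` are hypothetical and do not reflect how UniProt or PIR implement their text search.

```python
from collections import defaultdict

# Hypothetical mini "database": record ID -> free-text description.
RECORDS = {
    "rec1": "alpha crystallin A chain rabbit",
    "rec2": "alpha crystallin B chain human",
    "rec3": "delta crystallin argininosuccinate lyase duck",
    "rec4": "argininosuccinate lyase human",
}


def build_index(records):
    """Map each lowercase word to the set of record IDs that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index


INDEX = build_index(RECORDS)


def term(word):
    """Return a copy of the set of records containing `word` (empty if none)."""
    return set(INDEX.get(word.lower(), set()))


# AND = intersection, OR = union, NOT = set difference.
crystallin_and_alpha = term("crystallin") & term("alpha")
lyase_or_alpha = term("lyase") | term("alpha")
crystallin_not_human = term("crystallin") - term("human")

print(sorted(crystallin_and_alpha))
print(sorted(lyase_or_alpha))
print(sorted(crystallin_not_human))
```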

PIR Text Search (I)


Search: which alpha crystallin A chain entries are classified in protein families?

Search for synonyms

PIR Text Search (II)

Search: which crystallins are enzymes, and which families do they belong to?

Can you find which crystallins have a determined 3D structure?

I. Sequence & Genomics Databases
  • GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
  • RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products
  • UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.
  • Entrez Gene: Gene-centered information at NCBI.
  • UniGene: Unified clusters of ESTs and full-length mRNA sequences.
  • OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
  • Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
  • GeneCards: Integrated database of human genes, maps, proteins and diseases.
  • SNP Consortium Database; International HapMap Project: Genes associated with human disease



4.1 million

UniProt Consortium Databases

Universal Protein Resource


UniProtKB UniRef UniParc

UniProt Sequence Report (I)


What’s the difference between CRYAA_RABIT & CYRBAA?


UniProt Report (II): UniRef100 & 90
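UniRef100 merges identical sequences into one cluster, while UniRef90 groups sequences that share at least 90% identity with a cluster representative. The toy code below illustrates that difference only. It is not the UniRef pipeline, which relies on dedicated clustering software and proper alignments. The sequences are made up, and the naive position-by-position `identity` function assumes equal-length, pre-aligned strings.

```python
def identity(a, b):
    """Fraction of identical positions; assumes equal-length, pre-aligned strings."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first representative it matches at >= threshold."""
    clusters = []  # list of (representative_id, [member_ids])
    for seq_id, seq in seqs.items():
        for rep_id, members in clusters:
            if identity(seq, seqs[rep_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters.append((seq_id, [seq_id]))
    return clusters


# Made-up 20-residue sequences, not real crystallins.
SEQS = {
    "seqA": "MDIAIQHPWFKRTLGPFYPS",
    "seqB": "MDIAIQHPWFKRTLGPFYPS",  # identical to seqA
    "seqC": "MDIAIQHPWFKRALGPFYPS",  # 1 of 20 residues differs from seqA
    "seqD": "MDVTIQHPWFKRALGPFYAS",  # 4 of 20 residues differ from seqA
}

print(greedy_cluster(SEQS, 1.0))  # "UniRef100-like": only identical sequences merge
print(greedy_cluster(SEQS, 0.9))  # "UniRef90-like": >= 90% identity merges
```

Greedy assignment to the first matching representative keeps the sketch short; the result depends on input order, which is one reason production clustering tools sort sequences (typically by length) first.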





Entrez Gene – Gene-centric information


OMIM: Online Mendelian Inheritance in Man


II. Protein Family Databases
  • Whole Proteins
    • PIRSF: Network Classification Based on Evolutionary Relationship of Whole Protein
    • COG (Clusters of Orthologous Groups) of Complete Genomes
    • PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
    • ProtoNet: Automated Hierarchical Classification of Proteins
  • Protein Domains
    • Pfam: Alignments and HMM Models of Protein Domains
    • SMART: Protein Domain Families
    • CDD: Conserved Domain Database
  • Protein Motifs
    • PROSITE: Protein Patterns and Profiles
    • BLOCKS: Protein Sequence Motifs and Alignments
    • PRINTS: Compendium of Protein Fingerprints (a group of conserved motifs)
  • Integrated Family Databases
    • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily…
Protein Clustering

Initial version


New version: Includes Eukaryotic Clusters - KOGs

PIRSF: Full-Length Classification

iProClass Family Report


Domain Classification – Pfam Domain



Pfam Domain



Protein Motifs: PROSITE - a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
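A PROSITE pattern is essentially a compact regular expression over amino acid letters: elements are separated by `-`, `x` means any residue, `[ST]` means S or T, `{P}` means anything except P, and `x(2,4)` means a repeat. As a rough illustration, the sketch below translates this core subset of the syntax into a Python regular expression and scans a sequence. The function names `prosite_to_regex` and `find_sites` and the test sequence are made up for this example, and the translation is an assumption-laden simplification rather than the scanner PROSITE itself uses. The pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern.

```python
import re

# One pattern element: optional "<" anchor, a residue / x / [..] / {..},
# an optional repeat "(n)" or "(n,m)", and an optional ">" anchor.
ELEMENT_RE = re.compile(
    r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT_RE.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # single residue or [ABC] class
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(prosite_to_regex("C-x(2,4)-C"))
print(find_sites("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))  # made-up sequence
```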


Integrated Family Classification


InterPro: an integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)

Mapping of families

III. Databases of Protein Functions
  • Metabolic Pathways, Enzymes, and Compounds
    • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
    • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
    • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
    • EcoCyc: Encyclopedia of E. coli Genes and Metabolism
    • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
    • BRENDA: Enzyme Database
    • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
  • Inter-Molecular interactions and Regulatory Pathways
    • IntAct: Protein interaction data from literature and user submission
    • BIND: Descriptions of interactions, molecular complexes and pathways
    • DIP: Catalogs experimentally determined interactions between proteins
    • Reactome - A curated knowledgebase of biological pathways
    • BioCarta: Biological pathways of human and mouse
    • GO: Gene Ontology Consortium Database
  • Pathway Resources - Pathguide
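The Enzyme Classification listed above assigns each enzyme a four-level EC number, where the first number gives the top-level class. The sketch below parses such a number; it is an illustration only, and the function name `parse_ec` and the returned dictionary layout are made up. The example EC 4.3.2.1 (argininosuccinate lyase) is not stated on the slides and is included here as an outside fact.

```python
# Top-level EC classes as of the date of this tutorial (2007).
# A seventh class, translocases, was added to the nomenclature later.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Split an EC number such as 'EC 4.3.2.1' into its four hierarchical levels.

    Unspecified levels written as '-' (e.g. '4.3.-.-') are returned as None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    levels = [None if field == "-" else int(field) for field in fields]
    return {
        "class_name": EC_CLASSES.get(levels[0], "Unknown"),
        "levels": levels,
    }


print(parse_ec("EC 4.3.2.1"))  # argininosuccinate lyase
print(parse_ec("4.3.-.-"))     # partially specified EC number
```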
Biological Pathway Resource Collection


  • Protein-protein interactions
  • Metabolic pathways
  • Signaling pathways
  • Pathway diagrams
  • Transcription factors / gene regulatory networks
  • Protein-compound interactions
  • Genetic interaction networks
KEGG Metabolic & Regulatory Pathways
  • KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)


BioCyc: EcoCyc/MetaCyc Metabolic Pathways
  • The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
BioCarta Cellular Pathways


Reactome (http://www.reactome.org)
  • Collaboration of CSHL, EBI and GO Consortium
  • Curated resource of core pathways and reactions in human biology
  • Authored by biological researchers who are experts in their fields
  • Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG…
  • Inferred orthologous events in 22 non-human species (mouse, rat…)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]


Reactome: events and objects (including modified forms and complexes)

Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]

Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]

Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]

Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]

Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus

Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm] ……
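The alternating chain of events and objects above can be modeled as a small graph in which each event consumes input objects and produces output objects. The sketch below is one possible representation and not Reactome's actual data model. The identifiers and names are copied from the slide, but the `inputs` lists are simplified to the objects that have an ID on the slide (CO-SMAD, for example, is omitted), and the function name `downstream` is made up.

```python
# Objects named on the slide, keyed by their Reactome identifier.
OBJECTS = {
    "REACT_7364.1": "Phospho-R-SMAD [cytosol]",
    "REACT_7344.1": "Phospho-R-SMAD:CO-SMAD complex [cytosol]",
    "REACT_7382.2": "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]",
}

# Events with simplified inputs: only objects that carry an ID on the slide.
EVENTS = [
    {"id": "REACT_6879.1",
     "name": "Activated type I receptor phosphorylates R-SMAD directly",
     "inputs": [], "outputs": ["REACT_7364.1"]},
    {"id": "REACT_6760.1",
     "name": "Phospho-R-SMAD forms a complex with CO-SMAD",
     "inputs": ["REACT_7364.1"], "outputs": ["REACT_7344.1"]},
    {"id": "REACT_6726.1",
     "name": "The phospho-R-SMAD:CO-SMAD transfers to the nucleus",
     "inputs": ["REACT_7344.1"], "outputs": ["REACT_7382.2"]},
]


def downstream(event_id):
    """Follow outputs to the next consuming event; return the visited IDs in order.

    Assumes a linear, acyclic chain like the one on the slide.
    """
    by_id = {event["id"]: event for event in EVENTS}
    path, current = [], by_id[event_id]
    while current is not None:
        path.append(current["id"])
        path.extend(current["outputs"])
        produced = set(current["outputs"])
        # The next event is the first one that consumes something just produced.
        current = next(
            (e for e in EVENTS if produced & set(e["inputs"])), None
        )
    return path


for node_id in downstream("REACT_6879.1"):
    label = OBJECTS.get(node_id) or next(e["name"] for e in EVENTS if e["id"] == node_id)
    print(node_id, "-", label)
```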

Protein-Protein Interaction Database - IntAct


Gene Ontology (GO)


- Molecular Function

- Biological Process

- Cellular Component

IV. Databases of Protein Structures
  • Protein Structure
    • PDB: Structure Determined by X-ray Crystallography and NMR
    • PDBsum: Summaries and analyses of PDB structures
    • MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
    • SWISS-MODEL Repository: Database of annotated protein 3D models
    • ModBase: Annotated comparative protein structure models
  • Structure Classification
    • CATH: Hierarchical Classification of Protein Domain Structures
    • SCOP: Familial and Structural Protein Relationships
    • FSSP: Protein Fold Classification Based on Structure--Structure Alignment
PDB: Experimental 3D Structure Repository

Rat gamma-crystallin (chains A, B)

Can you do a text search at PIR to find this (CRGE_RAT)?



PDBsum: Pictorial Database Providing Summaries and Analyses of PDB Entries


3-D structure summary

2-D structure


Protein Structural Classification (1)

CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/latest/index.html)
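The CATH hierarchy is reflected in its dotted codes: the four numbers correspond to Class, Architecture, Topology and Homologous superfamily. The sketch below expands a code into the node it names at each level. It is an illustration only: the function name `parse_cath` is made up, the code `2.60.20.10` is used purely as an example input, and the class names are listed from memory of the four main CATH classes, so they should be checked against the CATH site.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Main CATH classes (assumed here; verify against the CATH documentation).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def parse_cath(code):
    """Expand a CATH code into the node identifier at each hierarchy level."""
    numbers = [int(part) for part in code.split(".")]
    if not 1 <= len(numbers) <= len(CATH_LEVELS):
        raise ValueError(f"expected 1 to 4 levels, got {code!r}")
    return {
        level: ".".join(str(n) for n in numbers[: depth + 1])
        for depth, level in enumerate(CATH_LEVELS[: len(numbers)])
    }


node = parse_cath("2.60.20.10")  # example input only
print(node)
print(CATH_CLASSES[int(node["Class"])])
```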

Protein Structural Classification (2)

SCOP: a comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.


SWISS-MODEL Repository

A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

V. Proteomic Resources
  • GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
  • SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels
  • PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences
  • Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
  • PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database Expression Profiling databases
  • GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases
2D-Gel Image Databases


Part of WORLD-2DPAGE: index to 2-D PAGE databases and services



GPMdb: MS Data Search


Craig, et al., J Proteome Res. 2004, 3:1234-42.

PRIDE: centralized, standards compliant, public data repository for proteomics data


HUPO Plasma Proteome Project



  • Text search / Information retrieval
    • Literature search and text mining
      • Finding synonyms (BioThesaurus)
      • Information extraction (e.g., protein phosphorylation sites)
    • Find the sequence for the rabbit alpha crystallin A chain
    • Find all alpha crystallin A chain entries classified in protein families
    • Search for crystallins that have enzyme activity
    • Find crystallins that have determined 3D structures
  • Database contents (reports)
    • Sequence & genomics databases (UniProt)
    • Protein family databases (PIRSF)
    • Database of protein functions (KEGG)
    • Databases of protein structures (PDB)
    • Proteomics databases (SWISS-2DPAGE)
  • Protein Examples
    • Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)
    • Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)
    • Any additional proteins of your interest for search and retrieval
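The protein examples above pair a UniProtKB entry name with an accession number, for instance CRYAA_RABIT/P02493. An entry name has the form PROTEIN_SPECIES, and accession numbers follow a fixed pattern. As a small illustration, the sketch below pulls these parts out of such a label. The function name `parse_label` and the returned field names are made up for this example, and the accession regular expression is the commonly documented UniProt pattern, reproduced here on the assumption that it is current.

```python
import re

# Commonly documented pattern for UniProtKB accession numbers.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Matches labels of the form "UniProtKB: CRYAA_RABIT/P02493".
LABEL_RE = re.compile(r"UniProtKB:\s*([A-Z0-9]+)_([A-Z0-9]+)/([A-Z0-9]+)")


def parse_label(text):
    """Extract entry name parts and accession from a 'UniProtKB: NAME/ACC' label."""
    match = LABEL_RE.search(text)
    if match is None:
        raise ValueError(f"no UniProtKB label found in {text!r}")
    protein, species, accession = match.groups()
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a valid accession: {accession!r}")
    return {
        "entry_name": f"{protein}_{species}",
        "protein": protein,    # protein mnemonic
        "species": species,    # species mnemonic
        "accession": accession,
    }


print(parse_label("Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)"))
print(parse_label(
    "Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)"
))
```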