
Other biological databases and ontologies


marja

Presentation Transcript


  1. Other biological databases and ontologies

  2. Biological systems: Sequence data • Protein folding and 3D structure • Taxonomic data • Literature • Pathways and networks • Protein families and domains • Small molecules • Whole genome data • Ontologies (GO)

  3. Ontologies • An ontology is a formal specification of terms and the relationships between them –widely used in biology and bioinformatics (e.g. taxonomy) • The relationships are important and are represented as graphs • Ontology terms should have definitions • Ontologies are machine-readable • They are needed for ordering and comparing large data sets

  4. What’s in a name? • What is a cell?

  5. What’s in a name? • What is a cell?

  6. Ambiguities in naming • Different names can be used to describe the same concept, e.g.: • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose • This makes it difficult to compare information across sources • Solution: use ontologies and data standards
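
A minimal sketch of how a controlled vocabulary resolves the naming problem on this slide: every free-text label is mapped to a single term identifier, so records that use different wording can still be compared. The identifier `EX:0001`, the `VOCABULARY` structure and the function names are assumptions made for illustration; they are not taken from GO or any real ontology.

```python
# Hypothetical controlled vocabulary: one identifier, one preferred name,
# and the free-text synonyms that should all resolve to it.
# "EX:0001" is a made-up identifier, not a real ontology accession.
VOCABULARY = {
    "EX:0001": {
        "name": "gluconeogenesis",
        "synonyms": [
            "glucose synthesis",
            "glucose biosynthesis",
            "glucose formation",
            "glucose anabolism",
        ],
    },
}


def build_lookup(vocabulary):
    """Map every lower-cased name or synonym to its term identifier."""
    lookup = {}
    for term_id, entry in vocabulary.items():
        for label in [entry["name"], *entry["synonyms"]]:
            lookup[label.lower()] = term_id
    return lookup


def normalise(label, lookup):
    """Return the term identifier for a free-text label, or None if unknown."""
    return lookup.get(label.strip().lower())


LOOKUP = build_lookup(VOCABULARY)
```

With this lookup, `normalise("Glucose formation", LOOKUP)` and `normalise("gluconeogenesis", LOOKUP)` return the same identifier, while an unlisted label returns `None`.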

  7. Gene Ontology (GO) http://www.geneontology.org • Controlled vocabulary/ontology • Introduced to provide a standardised way of annotating gene products • Used for functional annotation of genes or proteins

  8. GO ontologies • Molecular function: tasks performed by a gene product –e.g. G-protein coupled receptor • Biological process: broad biological goals accomplished by one or more gene products –e.g. G-protein signaling pathway • Cellular component: part(s) of a cell of which a gene product is a component; includes the extracellular environment of cells –e.g. nucleus, membrane etc.

  9. GO term examples • GO terms are arranged in a directed acyclic graph (DAG) • Relationships between terms
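
A small sketch of what "arranged in a DAG" means in practice: a term can have more than one parent, and all of its ancestors can be collected by following the parent links. The term names and the `IS_A` table below are simplified assumptions for illustration and are not copied from the real GO graph.

```python
# Toy is_a graph: child -> list of parents. A term may have several parents,
# which is what makes this a DAG rather than a tree.
# Term names are simplified for illustration, not taken from GO itself.
IS_A = {
    "G-protein coupled receptor signaling": ["cell surface receptor signaling"],
    "cell surface receptor signaling": ["signal transduction"],
    "signal transduction": ["cell communication", "regulation of cellular process"],
    "cell communication": ["biological process"],
    "regulation of cellular process": ["biological process"],
    "biological process": [],
}


def ancestors(term, is_a):
    """Return the set of all terms reachable from `term` via is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen
```

Because annotations to a specific term also imply all of its ancestors, a traversal like this is what lets tools group gene products at a broader level of the hierarchy.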

  10. How to annotate to GO • Check whether the gene product is already annotated, e.g. by a model organism database (MOD) or GOA • Manual annotation –needs evidence codes • Blast2GO • Using GO mapping files (e.g. InterPro, EC, Swiss-Prot keyword)

  11. Multiple GO terms • Process mappings: Cell communication (IPR2GO); GPCR pathways (SPKW2GO); GPCR pathways (IDA) • Select the most manual annotation first, then the most specific
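
The selection rule on this slide (most manual first, then most specific) can be sketched as a two-part sort key. The `manual` flag and the `depth` values are assumptions added for illustration: the slide does not give depths, and treating IDA as manual and the IPR2GO/SPKW2GO mappings as automatic is this sketch's reading of the rule.

```python
from collections import namedtuple

# Hypothetical record: `manual` and `depth` are illustrative fields.
#   manual: True if a curator assigned the annotation (e.g. IDA evidence)
#   depth:  assumed distance from the ontology root; larger = more specific
Annotation = namedtuple("Annotation", ["term", "source", "manual", "depth"])


def pick_best(annotations):
    """Prefer manual over automatic annotations, then the most specific term."""
    return max(annotations, key=lambda a: (a.manual, a.depth))


candidates = [
    Annotation("Cell communication", "IPR2GO", manual=False, depth=2),
    Annotation("GPCR pathways", "SPKW2GO", manual=False, depth=5),
    Annotation("GPCR pathways", "IDA", manual=True, depth=5),
]

best = pick_best(candidates)
```

Here `best` is the IDA-backed annotation; if it were absent, the more specific SPKW2GO mapping would win over the broader IPR2GO one.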

  12. Finding existing GO annotation • Small-scale –QuickGO or AmiGO browsers • Large-scale: • GOA FTP site • GOA proteomes (>25% coverage) • GOA human, mouse, rat, cow, zebrafish, Arabidopsis, etc. • GOA UniProt • Proteome Analysis

  13. Searching GOA in QuickGO • http://www.ebi.ac.uk/ego

  14. Uses of GO annotation • Microarray data analysis • Analysis of high-throughput data • Proteomics data analysis • GO classification • Examples: Larkin JE et al, Physiol Genomics, 2004; Cunliffe HE et al, Cancer Res, 2003

  15. Open Biomedical Ontologies (OBO) http://obo.sourceforge.net • Central web location for accessing well-structured controlled vocabularies (CVs) and ontologies for use in the biological and medical sciences • Provides a simple format for ontologies that encodes terms, relationships between terms and definitions of terms (not all OBO ontologies use this format, however)
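
To make the "simple format" concrete, here is a minimal sketch of reading term stanzas from OBO flat-file text, which stores each term as a `[Term]` block of `tag: value` lines such as `id`, `name`, `def` and `is_a`. The sample stanzas and the `EX:` identifiers are made up for illustration, and the parser ignores many features of the real format (escapes, trailing modifiers, other stanza types); for real work a dedicated OBO library would be the safer choice.

```python
# Made-up sample in OBO flat-file layout; the EX: identifiers are not real.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: example process
def: "A made-up term used only to illustrate the layout." [EX:curator]
is_a: EX:0000000 ! example root

[Term]
id: EX:0000000
name: example root
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into a list of dicts mapping tag -> list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        # Drop the trailing "! comment" that may follow a value.
        # Simplification: this would also cut a quoted value containing " ! ".
        value = value.split(" ! ", 1)[0].strip()
        current.setdefault(tag, []).append(value)
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
```

Each parsed term keeps its values as lists because tags such as `is_a` can occur more than once in a stanza.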

  16. Scope of OBO • Anatomy • Animal natural history and life history • Chemical • Development • Ethology • Evidence codes • Experimental conditions • Genomic and proteomic • Metabolomics • OBO relationship types • Phenotype • Taxonomic classification • Vocabularies

  17. Other Biological Databases • Transcription factor binding sites -TRANSFAC • Protein structure databases- PDB, SCOP, CATH • Protein family databases- Pfam, Prints, PROSITE etc. • Chemicals and small molecules -ChEBI • Gene expression databases –GEO, ArrayExpress • Metabolic pathways - Reactome, KEGG • Genome Databases- Ensembl, FlyBase, WormBase etc.

  18. Transcription factor binding sites • TRANSFAC –database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac • TESS –Transcription Element Search System –for predicting transcription factor binding sites, uses TRANSFAC: http://www.cbi.upenn.edu/tess • TFsearch –for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

  19. Protein structure databases • Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/ • Repository for solved structures • Can search by PDB code • Structural family databases based on PDB –SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) • Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

  20. Searching MSD http://www.ebi.ac.uk/msd -Search by PDB code

  21. Link to CATH

  22. Protein family databases • Databases that produce signatures for identifying protein families or domains • Used for functional classification of proteins • E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc. • Integrated into single resource InterPro (http://www.ebi.ac.uk/interpro)

  23. InterProScan sequence search Stand-alone version available

  24. Results for protein acc

  25. Example InterPro entry

  26. Chemicals and small molecules • Chemical abstracts- http://www.cas.org/ • ChEBI- http://www.ebi.ac.uk/chebi • KEGG –part of it includes chemicals http://www.genome.jp/kegg • ChemID plus -chemicals cited in NLM databases http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp • MSD-Chem –ligands and chemicals in MSD

  27. ChEBI example entry

  28. Hierarchy for chemicals

  29. Gene expression databases • NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/ • ArrayExpress http://www.ebi.ac.uk/arrayexpress • Stanford microarray database http://genome-www5.stanford.edu/ • Can usually search for experiments or particular expression profiles

  30. GEO search page

  31. Profiles search results

  32. Specific entry and experiment info

  33. ArrayExpress search results

  34. Metabolic Pathways • PATHGUIDE –lists >200 pathway resources • KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg -includes: • Database of chemicals, genes and networks (metabolic, regulatory etc.) • Well-curated and quite specific • EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org –curation of the entire genome • Reactome –curated biological pathways: http://www.reactome.org/ • GenMAPP –pathways contributed by users

  35. Pathway in Reactome

  36. Example of a pathway in BioCyc

  37. Protein-protein interaction databases • Protein-protein interaction databases store pairwise interactions or complexes • IntAct http://www.ebi.ac.uk/intact • DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/ • BIND (Biomolecular Interaction Network Database) http://submit.bind.ca:8080/bind/

  38. Protein-protein interactions

  39. Genome browsers • Integrate sequence & functional data for a genome • Ensembl –genome browser for major eukaryotic genomes, e.g. human, mouse etc. http://www.ensembl.org • UCSC browser -http://genome.ucsc.edu/ • FlyBase –Drosophila genome database: http://www.ebi.ac.uk/flybase • WormBase –C. elegans: http://www.wormbase.org • PlasmoDB –Plasmodium (malaria): http://plasmodb.org • Etc.

  40. Ensembl genome browser

  41. Ensembl gene view 1

  42. Ensembl gene view 2

  43. Gene within context on chromosome
