The STRING database

Michael Kuhn

EMBL Heidelberg


protein interactions


example

  • Tryptophan synthase beta chain

  • E. coli K12


many sources


genomic context


curated knowledge


experimental evidence


literature


373 genomes

  • (only completely sequenced genomes)


1.5 million genes

  • (not proteins)


Genome Reviews


RefSeq


Ensembl


model organism databases


data integration


genomic context methods


gene fusion


gene neighborhood


phylogenetic profiles


Cell

Cellulosomes

Cellulose


automatic inference of interactions


correct interactions


wrong associations


gene fusion

  • score: sequence similarity


gene neighborhood

  • score: sum of intergenic distances


phylogenetic profiles


SVD

  • singular value decomposition

  • (removes redundancy)


score: Euclidean distance
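
To make the phylogenetic-profile score concrete, here is a minimal sketch of comparing two profiles by Euclidean distance. The gene names and presence/absence vectors are invented for illustration, and the SVD step from the previous slide is deliberately left out, so this shows only the distance calculation, not the full pipeline.

```python
import math

# Hypothetical presence/absence profiles: one entry per genome (1 = gene present).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 1],
}


def euclidean_distance(p, q):
    """Euclidean distance between two profiles covering the same genomes."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Identical profiles give distance 0; complementary profiles give a large distance.
d_ab = euclidean_distance(profiles["geneA"], profiles["geneB"])
d_ac = euclidean_distance(profiles["geneA"], profiles["geneC"])
print(d_ab, d_ac)
```

A small distance means the two genes tend to be present and absent in the same genomes, which is the signal this method looks for.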


all scores are “raw scores”


not comparable

  • sequence similarity

  • sum of intergenic distances

  • Euclidean distance


benchmarking

  • calibrate against “gold standard”

  • (KEGG)


raw scores


probabilistic scores

  • e.g. “70% chance for an association”
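
One way to picture the step from raw scores to probabilistic scores is a binned calibration: group predictions by raw score and ask what fraction in each group is confirmed by the gold standard. The sketch below assumes that higher raw scores are better and that "confirmed" is a simple yes/no label per predicted pair; the scores, labels, and bin edges are made up, and the actual calibration procedure may well differ.

```python
def calibrate(raw_scores, in_gold_standard, bin_edges):
    """Per bin, return the fraction of predictions confirmed by the gold standard.

    bin_edges are ascending lower bounds; a score is assigned to the last bin
    whose lower bound it reaches. Empty bins yield None.
    """
    hits = [0] * len(bin_edges)
    totals = [0] * len(bin_edges)
    for score, is_true in zip(raw_scores, in_gold_standard):
        idx = None
        for i, edge in enumerate(bin_edges):
            if score >= edge:
                idx = i
        if idx is None:
            continue  # score lies below the lowest bin edge
        totals[idx] += 1
        hits[idx] += int(is_true)
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical raw scores and gold-standard labels for ten predicted pairs.
raw = [5, 8, 12, 15, 22, 25, 28, 31, 35, 40]
gold = [False, False, False, True, False, True, True, True, True, True]

calibration = calibrate(raw, gold, bin_edges=[0, 10, 20, 30])
print(calibration)
```

Each resulting fraction can then be read as the probability attached to a raw score in that range, which puts scores from different methods on one comparable scale.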


curated knowledge


KEGG

  • Kyoto Encyclopedia of Genes and Genomes


Reactome


GO

  • Gene Ontology


primary experimental data


many sources


many parsers


BIND

  • Biomolecular Interaction Network Database


GRID

  • General Repository for Interaction Datasets


HPRD

  • Human Protein Reference Database


co-expression

  • microarray data


GEO

  • Gene Expression Omnibus


correlation coefficient
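
As an illustration of scoring co-expression, the sketch below computes a correlation coefficient between expression vectors. It assumes the Pearson coefficient (the slide does not say which coefficient is used), and the expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four microarray experiments.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [2.1, 3.9, 6.2, 8.0]  # rises with gene_x
gene_z = [4.0, 3.0, 2.0, 1.0]  # falls as gene_x rises

r_xy = pearson(gene_x, gene_y)
r_xz = pearson(gene_x, gene_z)
print(round(r_xy, 3), round(r_xz, 3))
```

Like the other raw scores, such a coefficient would still need calibration before it can be compared with evidence from other sources.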


literature mining


different gene identifiers


synonyms list


Medline


SGD

  • Saccharomyces Genome Database


The Interactive Fly


OMIM

  • Online Mendelian Inheritance in Man


simple scheme


co-mentioning
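
The simple scheme can be sketched as counting how often two genes appear in the same abstract, after mapping every name to one identifier through a synonyms list. The synonyms table and the three abstracts below are illustrative stand-ins, not data from the slides, and the tokenization is deliberately naive.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative synonyms list: lower-cased name as found in text -> one identifier.
synonyms = {"cyc1": "CYC1", "cyc7": "CYC7", "hap1": "HAP1", "cyp1": "HAP1"}

# Toy abstracts standing in for Medline records.
abstracts = [
    "The expression of CYC1 and CYC7 is controlled by HAP1.",
    "CYP1 regulates CYC1 under aerobic conditions.",
    "CYC7 levels were measured.",
]


def genes_in(text):
    """Return the set of identifiers mentioned in a text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return {synonyms[t] for t in tokens if t in synonyms}


def co_mention_counts(texts):
    """Count, for every gene pair, the number of texts mentioning both."""
    counts = Counter()
    for text in texts:
        for pair in combinations(sorted(genes_in(text)), 2):
            counts[pair] += 1
    return counts


counts = co_mention_counts(abstracts)
print(counts)
```

The count per pair is again a raw score: frequently mentioned genes co-occur by chance more often, which is one reason calibration is needed.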


more advanced


NLP

  • Natural Language Processing


Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

The expression of

the cytochrome genes

CYC1 and CYC7

is controlled by

HAP1
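
The example sentence above can be processed with a toy version of the three ingredients: a dictionary of gene names, and a verb phrase that signals a relation. The sketch below is plain pattern matching under assumed name and cue-phrase lists; it is not a real NLP parser and is not claimed to be the method STRING uses.

```python
import re

GENE_NAMES = {"CYC1", "CYC7", "HAP1"}  # assumed gene name dictionary
PASSIVE_VERBS = ["is controlled by", "is regulated by"]  # assumed cue phrases


def extract_relations(sentence):
    """Return (regulator, 'controls', target) triples from a passive sentence."""
    relations = []
    for verb in PASSIVE_VERBS:
        if verb not in sentence:
            continue
        # In a passive construction the targets precede the verb phrase
        # and the regulator follows it.
        left, right = sentence.split(verb, 1)
        targets = [t for t in re.findall(r"[A-Z0-9]+", left) if t in GENE_NAMES]
        regulators = [t for t in re.findall(r"[A-Z0-9]+", right) if t in GENE_NAMES]
        for regulator in regulators:
            for target in targets:
                relations.append((regulator, "controls", target))
    return relations


sentence = (
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
)
relations = extract_relations(sentence)
print(relations)
```

Unlike co-mentioning, this kind of extraction yields a direction for the relation, here from HAP1 to each of the two cytochrome genes.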


calibrate against gold standard


combine all evidence


Bayesian scoring scheme


e.g.: two scores of 0.7, combined probability: ?


e.g.: two scores of 0.7, combined probability: 0.91

  • 1 - (1 - 0.7)^2 = 0.91
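
The arithmetic on this slide generalizes to any number of evidence scores: the combined probability is one minus the chance that every source is wrong. The sketch below implements exactly that formula, assuming the sources are independent; the function name is chosen here for illustration, and any further corrections a production scoring scheme might apply are not shown.

```python
import math


def combine_scores(probabilities):
    """Combine independent evidence scores as 1 - product(1 - p_i)."""
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each score must be a probability in [0, 1]")
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# The slide's example: two scores of 0.7.
combined = combine_scores([0.7, 0.7])
print(round(combined, 2))
```

Adding a third, weaker source, for example `combine_scores([0.7, 0.7, 0.2])`, can only raise the result, since every extra piece of evidence lowers the chance that all sources are wrong.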


evidence transfer


evidence spread over many species


transfer by orthology

  • (or “fuzzy orthology”)


von Mering et al., Nucleic Acids Research, 2005



two modes


COG mode



higher coverage, lower specificity

  • includes all available evidence

  • some orthologous groups are too large to be meaningful


proteins mode



maximum specificity, lower coverage

  • information will be relevant for selected species


Demo


outlook


take home message

  • STRING integrates information and predicts interactions

  • You can always go to the sources

  • Proteins mode: specific species

  • COG mode: more coverage, especially for prokaryotic genes


Acknowledgements

  • The STRING team

  • Lars Jensen

  • Peer Bork

  • Christian von Mering & group in Zurich

  • Berend Snel

  • Martijn Huynen


Thank you for your attention


Exercises: tinyurl.com/36twzq (or via course wiki). Alternative server: xi.embl.de


Bork et al., Current Opinion in Structural Biology, 2004

