The STRING database
The STRING database. Michael Kuhn, EMBL Heidelberg. Protein interactions. Example: tryptophan synthase beta chain, E. coli K12. Many sources: genomic context, curated knowledge, experimental evidence, literature. 373 genomes (only completely sequenced genomes). 1.5 million genes.


The STRING database

Michael Kuhn

EMBL Heidelberg



example

  • Tryptophan synthase beta chain

  • E. coli K12







373 genomes

  • (only completely sequenced genomes)


1.5 million genes

  • (not proteins)












Cell

Cellulosomes

Cellulose


automatic inference of interactions




gene fusion

  • score: sequence similarity


gene neighborhood

  • score: sum of intergenic distances
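A sum of intergenic distances can be computed directly from gene coordinates. The sketch below is only an illustration of that quantity: the coordinate format, the treatment of overlapping genes as distance 0, and the function name `sum_intergenic_distances` are assumptions, not details given in the talk.

```python
def sum_intergenic_distances(genes):
    """Sum the gaps between consecutive genes in a run.

    genes: list of (start, end) coordinates on the same contig.
    Overlapping genes contribute a distance of 0 (an assumption of this sketch).
    """
    ordered = sorted(genes)  # order genes by start coordinate
    return sum(
        max(0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


# Hypothetical coordinates for three neighboring genes.
run = [(100, 900), (950, 1800), (1790, 2500)]
print(sum_intergenic_distances(run))
```

Presumably a smaller sum indicates a more tightly clustered run of genes; how the sum is turned into a score is not stated on the slide.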



SVD

  • singular value decomposition

  • (removes redundancy)




not comparable

  • sequence similarity

  • sum of intergenic distances

  • Euclidean distance


benchmarking

  • calibrate against “gold standard”

  • (KEGG)



probabilistic scores

  • e.g. “70% chance for an association”
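One way to turn raw, non-comparable scores into probabilities of this kind is to bin predictions by raw score and measure, per bin, the fraction of pairs confirmed by the gold standard. The sketch below assumes that binning approach; the bin edges, the toy data, and the names `calibrate`, `gold`, and `preds` are illustrative and not taken from the talk.

```python
from bisect import bisect_right


def calibrate(predictions, gold_pairs, bin_edges):
    """Return, per raw-score bin, the fraction of pairs found in the gold standard.

    predictions: list of ((gene_a, gene_b), raw_score)
    gold_pairs:  set of frozensets, e.g. gene pairs sharing a KEGG pathway
    bin_edges:   sorted raw-score thresholds separating the bins
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for (a, b), raw in predictions:
        i = bisect_right(bin_edges, raw)  # index of the bin this score falls into
        totals[i] += 1
        if frozenset((a, b)) in gold_pairs:
            hits[i] += 1
    # Bins without any prediction get None instead of a made-up probability.
    return [h / t if t else None for h, t in zip(hits, totals)]


# Toy data (hypothetical pairs and scores).
gold = {frozenset(p) for p in [("trpA", "trpB"), ("trpB", "trpC")]}
preds = [
    (("trpA", "trpB"), 0.9),
    (("trpB", "trpC"), 0.8),
    (("trpA", "lacZ"), 0.85),
    (("lacZ", "recA"), 0.2),
    (("trpC", "recA"), 0.1),
]
print(calibrate(preds, gold, bin_edges=[0.5]))
```

A real benchmark would likely be restricted to pairs in which both genes are annotated in the gold standard; this sketch skips that step.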



KEGG

  • Kyoto Encyclopedia of Genes and Genomes



GO

  • Gene Ontology





BIND

  • Biomolecular Interaction Network Database


GRID

  • General Repository for Interaction Datasets


HPRD

  • Human Protein Reference Database


co-expression

  • microarray data


GEO

  • Gene Expression Omnibus







SGD

  • Saccharomyces Genome Database



OMIM

  • Online Mendelian Inheritance in Man





NLP

  • Natural Language Processing


Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1
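The example sentence can be mimicked with a dictionary lookup for gene names and a short list of relation verbs. This is a toy sketch, not STRING's text-mining pipeline; the name dictionary, the verb list, and the function `extract_relations` are assumptions made for illustration.

```python
import re

GENE_NAMES = {"CYC1", "CYC7", "HAP1"}  # hypothetical name dictionary
RELATION_VERBS = ["is controlled by", "are controlled by"]  # hypothetical verb list


def extract_relations(sentence):
    """Return (regulator, verb, target) triples for passive sentences
    of the form 'TARGETS <verb phrase> REGULATORS'."""
    triples = []
    for verb in RELATION_VERBS:
        idx = sentence.find(verb)
        if idx == -1:
            continue
        left, right = sentence[:idx], sentence[idx + len(verb):]
        # Entity recognition: keep only tokens that appear in the name dictionary.
        targets = [t for t in re.findall(r"[A-Za-z0-9]+", left) if t in GENE_NAMES]
        regulators = [t for t in re.findall(r"[A-Za-z0-9]+", right) if t in GENE_NAMES]
        triples += [(r, verb, t) for r in regulators for t in targets]
    return triples


text = "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
print(extract_relations(text))
```

For the sentence above, the sketch pairs HAP1 as regulator with each of CYC1 and CYC7 as target.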





e.g.: two scores of 0.7
combined probability: ?


e.g.: two scores of 0.7
combined probability: 0.91

  • 1 - (1 - 0.7)² = 0.91
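The rule on this slide treats the evidence channels as independent: an association is missed only if every channel misses it. A minimal Python sketch of that rule for any number of scores follows; the function name `combine_scores` is illustrative and not part of STRING.

```python
import math


def combine_scores(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s_i)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    # Probability that every evidence channel fails to support the association.
    all_miss = math.prod(1.0 - s for s in scores)
    return 1.0 - all_miss


print(round(combine_scores([0.7, 0.7]), 2))  # 0.91
```

Adding more independent evidence can only raise the combined score, which matches the idea of integrating many sources.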



evidence spread over many species


transfer by orthology

  • (or “fuzzy orthology”)
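Transfer by orthology can be pictured as mapping each protein to an orthologous group and letting evidence observed in one species support the corresponding pair in another. The sketch below assumes a plain protein-to-group lookup and keeps the best score per pair of groups. The identifiers, the scores, and the absence of any down-weighting for transferred evidence are simplifications for illustration, not the method described in the talk.

```python
def transfer_by_orthology(evidence, group_of, target_proteins):
    """Transfer pairwise scores to a target species via orthologous groups.

    evidence:        {(protein_a, protein_b): score} observed in other species
    group_of:        {protein: orthologous_group_id}
    target_proteins: proteins of the species we want predictions for
    """
    # Best score seen for each unordered pair of orthologous groups.
    best = {}
    for (a, b), score in evidence.items():
        if a in group_of and b in group_of:
            key = frozenset((group_of[a], group_of[b]))
            best[key] = max(best.get(key, 0.0), score)

    transferred = {}
    targets = sorted(target_proteins)
    for i, p in enumerate(targets):
        for q in targets[i + 1:]:
            key = frozenset((group_of.get(p), group_of.get(q)))
            # Skip proteins without a group and pairs within the same group.
            if None not in key and len(key) == 2 and key in best:
                transferred[(p, q)] = best[key]
    return transferred


# Hypothetical identifiers: two species, two orthologous groups.
group_of = {
    "ecoli_trpA": "OG_A", "ecoli_trpB": "OG_B",
    "bsub_trpA": "OG_A", "bsub_trpB": "OG_B",
}
evidence = {("bsub_trpA", "bsub_trpB"): 0.8}
print(transfer_by_orthology(evidence, group_of, ["ecoli_trpA", "ecoli_trpB"]))
```

In this toy case the score observed for the second species' pair is carried over unchanged to the first species' pair.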







higher coverage, lower specificity

  • includes all available evidence

  • some orthologous groups are too large to be meaningful




maximum specificity, lower coverage

  • information will be relevant for selected species




take home message

  • STRING integrates information and predicts interactions

  • You can always go to the sources

  • Proteins mode: specific species

  • COG mode: more coverage, especially for prokaryotic genes


Acknowledgements

  • The STRING team

  • Lars Jensen

  • Peer Bork

  • Christian von Mering & group in Zurich

  • Berend Snel

  • Martijn Huynen





Exercises: tinyurl.com/36twzq (or via course wiki)
Alternative server: xi.embl.de


