The STRING database


The STRING database. Michael Kuhn, EMBL Heidelberg. Protein interactions. Example: tryptophan synthase beta chain, E. coli K12. Many sources. Genomic context. Curated knowledge. Experimental evidence. Literature. 373 genomes (only completely sequenced genomes). 1.5 million genes.


Michael Kuhn

EMBL Heidelberg

example
  • Tryptophan synthase beta chain
  • E. coli K12
373 genomes
  • (only completely sequenced genomes)
1.5 million genes
  • (not proteins)
[Figure labels: Cell, Cellulosomes, Cellulose]

gene fusion
  • score: sequence similarity
gene neighborhood
  • score: sum of intergenic distances
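
To make the neighborhood score concrete, here is a minimal sketch in Python. It assumes genes are given as (name, start, end) on one strand and that the raw score for two genes is the sum of the gaps between consecutive genes lying between them. The gene names, coordinates, and function name are invented for illustration, and the slide does not say how this raw sum is processed further.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end); hypothetical coordinates


def intergenic_distance_sum(genes: List[Gene], a: str, b: str) -> int:
    """Sum the gaps between consecutive genes on the path from gene a to gene b."""
    ordered = sorted(genes, key=lambda g: g[1])
    names = [g[0] for g in ordered]
    i, j = sorted((names.index(a), names.index(b)))
    total = 0
    for left, right in zip(ordered[i:j], ordered[i + 1:j + 1]):
        # Overlapping genes contribute a gap of zero, not a negative one.
        total += max(0, right[1] - left[2])
    return total


# Toy run of neighbouring genes (all values invented).
toy_genes = [
    ("geneA", 100, 900),
    ("geneB", 920, 1800),
    ("geneC", 1790, 2500),
    ("geneD", 2600, 3300),
]

print(intergenic_distance_sum(toy_genes, "geneA", "geneD"))
```

A small sum means the genes sit close together; a large sum means they are far apart or separated by long gaps.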
SVD
  • singular value decomposition
  • (removes redundancy)
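
As a reminder of what the decomposition does, here is the standard definition. It is written for an assumed matrix $M$ with one row per gene and one column per genome; the slide does not say which matrix is decomposed, and the symbols are chosen here for illustration only.

```latex
M = U \Sigma V^{\top},
\qquad
M_k = U_k \Sigma_k V_k^{\top} \quad (k < \operatorname{rank} M)
```

Keeping only the $k$ largest singular values gives a lower-dimensional description of each row. Under the assumption above, near-identical columns (for example, closely related genomes) are merged into shared components and no longer count separately, which is one reading of "removes redundancy".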
not comparable
  • sequence similarity
  • sum of intergenic distances
  • Euclidean distance
benchmarking
  • calibrate against “gold standard”
  • (KEGG)
probabilistic scores
  • e.g. “70% chance for an association”
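
The calibration idea can be sketched as follows: bin the raw scores of one evidence type, then, for each bin, measure the fraction of gene pairs that the gold standard confirms. This is a minimal sketch assuming each pair is already labelled as confirmed or not (for example, both genes in the same KEGG pathway). The binning scheme, function names, and data are assumptions made for illustration, not the procedure used by STRING.

```python
from typing import Dict, List, Optional, Tuple


def calibrate(pairs: List[Tuple[float, bool]], bin_width: float) -> Dict[int, float]:
    """Map each raw-score bin to the fraction of pairs confirmed by the gold standard."""
    hits: Dict[int, int] = {}
    totals: Dict[int, int] = {}
    for raw_score, confirmed in pairs:
        b = int(raw_score // bin_width)
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + int(confirmed)
    return {b: hits[b] / totals[b] for b in totals}


def probability(raw_score: float, table: Dict[int, float], bin_width: float) -> Optional[float]:
    """Look up the calibrated probability for a new raw score (None if the bin is unseen)."""
    return table.get(int(raw_score // bin_width))


# Invented benchmark: (raw score, confirmed by gold standard?)
benchmark = [
    (10.0, True), (40.0, True), (70.0, True), (90.0, False),
    (120.0, True), (150.0, False), (180.0, False), (190.0, False),
    (250.0, False), (260.0, False),
]

table = calibrate(benchmark, bin_width=100.0)
print(table)
print(probability(35.0, table, bin_width=100.0))
```

Once every evidence type is expressed this way, a sequence similarity, a sum of intergenic distances, and a Euclidean distance all end up on the same scale.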
KEGG
  • Kyoto Encyclopedia of Genes and Genomes
GO
  • Gene Ontology
BIND
  • Biomolecular Interaction Network Database
GRID
  • General Repository for Interaction Datasets
HPRD
  • Human Protein Reference Database
co-expression
  • microarray data
GEO
  • Gene Expression Omnibus
SGD
  • Saccharomyces Genome Database
OMIM
  • Online Mendelian Inheritance in Man
NLP
  • Natural Language Processing
  • Gene and protein names
  • Cue words for entity recognition
  • Verbs for relation extraction
  • Example: “The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1”
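
The three ingredients on this slide (gene names, cue words, verbs) can be illustrated with a toy extractor applied to the example sentence. This is only a sketch: the name dictionary and the single verb pattern are hard-coded assumptions, whereas a real text-mining pipeline would need far larger dictionaries and grammars.

```python
import re
from typing import List, Tuple

GENE_NAMES = {"CYC1", "CYC7", "HAP1"}  # hypothetical name dictionary
# One hard-coded verb pattern: "<targets> is controlled by <regulator>".
RELATION = re.compile(r"^(?P<targets>.+?)\s+is controlled by\s+(?P<regulator>.+)$")


def find_genes(text: str) -> List[str]:
    """Return dictionary gene names in order of appearance."""
    return [tok for tok in re.findall(r"[A-Za-z0-9]+", text) if tok in GENE_NAMES]


def extract_regulation(sentence: str) -> List[Tuple[str, str]]:
    """Return (regulator, target) pairs found in a passive 'is controlled by' sentence."""
    match = RELATION.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    regulators = find_genes(match.group("regulator"))
    targets = find_genes(match.group("targets"))
    return [(r, t) for r in regulators for t in targets]


sentence = "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
print(extract_regulation(sentence))
# → [('HAP1', 'CYC1'), ('HAP1', 'CYC7')]
```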

transfer by orthology
  • (or “fuzzy orthology”)
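
A minimal sketch of the idea: if two proteins are linked in one species, propose links between the members of the same orthologous groups in another species. All identifiers and the group table below are invented, and the sketch ignores how evidence would be down-weighted during transfer.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def transfer_by_orthology(
    known_pairs: Iterable[Pair],
    group_of: Dict[str, str],
    target_proteins: Iterable[str],
) -> Set[Pair]:
    """Propose target-species pairs whose orthologous groups match a known pair."""
    members: Dict[str, List[str]] = {}
    for protein in target_proteins:
        group = group_of.get(protein)
        if group is not None:
            members.setdefault(group, []).append(protein)

    transferred: Set[Pair] = set()
    for a, b in known_pairs:
        for x in members.get(group_of.get(a), []):
            for y in members.get(group_of.get(b), []):
                if x != y:
                    transferred.add(tuple(sorted((x, y))))
    return transferred


# Hypothetical orthologous groups spanning two species.
group_of = {
    "yeast:P1": "OG1", "yeast:P2": "OG2",
    "human:Q1": "OG1", "human:Q2": "OG2", "human:Q3": "OG2",
}
known = [("yeast:P1", "yeast:P2")]
human = ["human:Q1", "human:Q2", "human:Q3"]

print(sorted(transfer_by_orthology(known, group_of, human)))
```

One known pair already yields two proposed pairs here because OG2 has two human members; the number of proposals grows with group size.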
higher coverage, lower specificity
  • includes all available evidence
  • some orthologous groups are too large to be meaningful
maximum specificity, lower coverage
  • information will be relevant for selected species
take home message
  • STRING integrates information and predicts interactions
  • You can always go to the sources
  • Proteins mode: specific species
  • COG mode: more coverage, especially for prokaryotic genes
Acknowledgements
  • The STRING team
  • Lars Jensen
  • Peer Bork
  • Christian von Mering & group in Zurich
  • Berend Snel
  • Martijn Huynen