The STRING database
Slide 1

The STRING database

Michael Kuhn

EMBL Heidelberg

Slide 2

protein interactions

Slide 9

example

  • Tryptophan synthase beta chain

  • E. Coli K12

Slide 13

many sources

Slide 14

genomic context

Slide 15

curated knowledge

Slide 16

experimental evidence


Slide 17

literature

Slide 18

373 genomes

  • (only completely sequenced genomes)

Slide 19

1.5 million genes

  • (not proteins)

Slide 20

Genome Reviews

Slide 21

RefSeq

Slide 22

Ensembl

Slide 23

model organism databases

Slide 24

data integration

Slide 25

genomic context methods

Slide 26

gene fusion

Slide 27

gene neighborhood

Slide 28

phylogenetic profiles

Slide 32

[Figure labels: Cell, Cellulosomes, Cellulose]

Slide 33

automatic inference of interactions

Slide 34

correct interactions

Slide 35

wrong associations

Slide 36

gene fusion

  • score: sequence similarity
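A minimal Python sketch of the gene-fusion test, assuming local-alignment hits of two query proteins onto a third protein are already available. The `Hit` record, the disjoint-region rule and the choice of the weaker of the two similarity scores as the raw score are illustrative assumptions, not STRING's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical local alignment of a query protein onto a target protein."""
    query: str
    target: str
    start: int         # first aligned residue on the target (1-based, inclusive)
    end: int           # last aligned residue on the target (inclusive)
    similarity: float  # raw similarity, e.g. a bit score


def fusion_score(hit_a: Hit, hit_b: Hit) -> Optional[float]:
    """Raw fusion score if two different queries hit disjoint regions of one target, else None."""
    if hit_a.target != hit_b.target or hit_a.query == hit_b.query:
        return None
    overlap = min(hit_a.end, hit_b.end) - max(hit_a.start, hit_b.start) + 1
    if overlap > 0:
        return None  # both queries map to the same region: no evidence of a fusion
    # Assumption: the weaker alignment limits how much we trust the fusion.
    return min(hit_a.similarity, hit_b.similarity)


# Usage with made-up numbers: geneA and geneB match two halves of fusedX.
a = Hit("geneA", "fusedX", 1, 250, 310.0)
b = Hit("geneB", "fusedX", 260, 650, 540.0)
print(fusion_score(a, b))  # 310.0
```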

Slide 37

gene neighborhood

  • score: sum of intergenic distances
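The neighborhood raw score can be sketched as below, assuming each genome contributes one pair of gene coordinates for the two genes of interest. The `Gene` tuple and the rule of only counting same-strand pairs are assumptions for illustration; a real implementation would also check that no other gene lies between the two.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def intergenic_distance(a: Gene, b: Gene) -> Optional[int]:
    """Gap in base pairs between two same-strand genes; None if on opposite strands."""
    if a.strand != b.strand:
        return None
    first, second = sorted((a, b), key=lambda g: g.start)
    return second.start - first.end - 1  # negative if the genes overlap


def neighborhood_raw_score(pairs: List[Tuple[Gene, Gene]]) -> Tuple[int, int]:
    """Sum of intergenic distances over genomes where the pair is co-directional,
    plus the number of genomes that contributed."""
    distances = []
    for a, b in pairs:
        d = intergenic_distance(a, b)
        if d is not None:
            distances.append(d)
    return sum(distances), len(distances)


# One (geneA, geneB) pair per hypothetical genome.
pairs = [
    (Gene("A", 100, 1000, "+"), Gene("B", 1021, 2000, "+")),  # gap 20 bp
    (Gene("A", 500, 1400, "-"), Gene("B", 1450, 2300, "-")),  # gap 49 bp
    (Gene("A", 100, 900, "+"), Gene("B", 950, 1800, "-")),    # opposite strands: skipped
]
print(neighborhood_raw_score(pairs))  # (69, 2)
```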

Slide 38

phylogenetic profiles

Slide 39

SVD

  • singular value decomposition

  • (removes redundancy)

Slide 40

score: Euclidean distance
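A sketch of the profile comparison, using hypothetical presence/absence vectors. The SVD step from the previous slide is deliberately omitted to keep the example dependency-free; in practice the profile matrix could first be transformed (for instance with `numpy.linalg.svd`) before distances are computed.

```python
import math
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene in five genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 1, 1, 0, 1],
}


def euclidean(p, q):
    """Euclidean distance between two equally long profiles; 0 means identical patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Smaller distances suggest genes that are gained and lost together.
for g1, g2 in combinations(profiles, 2):
    print(g1, g2, euclidean(profiles[g1], profiles[g2]))
```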

Slide 41

all scores are “raw scores”

Slide 42

not comparable

  • sequence similarity

  • sum of intergenic distances

  • Euclidean distance

Slide 43

benchmarking

  • calibrate against “gold standard”

  • (KEGG)

Slide 45

raw scores

Slide 46

probabilistic scores

  • e.g. “70% chance for an association”
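One way to turn raw scores into probabilistic ones is to bin the raw scores and, per bin, count how often a pair is a known association in the gold standard (e.g. two proteins sharing a KEGG pathway). The sketch below assumes that simple binning scheme with made-up data; a real benchmark would typically restrict itself to proteins annotated in the gold standard and smooth the resulting curve. Because the mapping is empirical, it works whether high raw scores are good (similarity) or low ones are (distances).

```python
from collections import defaultdict


def calibrate(scored_pairs, gold_pairs, bin_width):
    """Map raw-score bins to the fraction of pairs that are gold-standard associations.

    scored_pairs: list of ((protein1, protein2), raw_score)
    gold_pairs:   set of frozensets, pairs known to be associated (e.g. same KEGG pathway)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for (p1, p2), raw in scored_pairs:
        b = int(raw // bin_width)  # index of the raw-score bin
        totals[b] += 1
        if frozenset((p1, p2)) in gold_pairs:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical raw scores and gold standard.
scored = [(("a", "b"), 0.5), (("a", "c"), 1.2), (("b", "c"), 1.7),
          (("c", "d"), 2.4), (("a", "d"), 2.9)]
gold = {frozenset(("b", "c")), frozenset(("c", "d")), frozenset(("a", "d"))}
print(calibrate(scored, gold, bin_width=1.0))  # {0: 0.0, 1: 0.5, 2: 1.0}
```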

Slide 47

curated knowledge

Slide 48

KEGG

  • Kyoto Encyclopedia of Genes and Genomes

Slide 49

Reactome

Slide 50

GO

  • Gene Ontology

Slide 51

primary experimental data

Slide 52

many sources

Slide 53

many parsers

Slide 54

BIND

  • Biomolecular Interaction Network Database

Slide 55

GRID

  • General Repository for Interaction Datasets

Slide 56

HPRD

  • Human Protein Reference Database

Slide 57

co-expression

  • microarray data

Slide 58

GEO

  • Gene Expression Omnibus

Slide 59

correlation coefficient
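The slide does not say which correlation coefficient is used; the sketch below assumes the Pearson coefficient over hypothetical expression values of two genes measured across the same microarray conditions. A value near 1 means the genes rise and fall together.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical log-expression values across four conditions.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.1, 3.9, 6.2, 7.8]
print(round(pearson(gene1, gene2), 3))
```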

Slide 60

literature mining

Slide 61

different gene identifiers

Slide 62

synonyms list

Slide 63

Medline

Slide 64

SGD

  • Saccharomyces Genome Database

Slide 65

The Interactive Fly

Slide 66

OMIM

  • Online Mendelian Inheritance in Man

Slide 67

simple scheme

Slide 68

co-mentioning
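The simple scheme can be sketched as: map every name variant to one identifier via a synonyms list, then count how often two identifiers appear in the same abstract. The `SYNONYMS` table and the abstracts below are hypothetical stand-ins for a real synonyms list and Medline.

```python
import itertools
import re
from collections import Counter

# Hypothetical synonyms list: each lower-cased name variant maps to one identifier.
SYNONYMS = {
    "cyc1": "CYC1",
    "iso-1-cytochrome c": "CYC1",
    "cyc7": "CYC7",
    "hap1": "HAP1",
}


def genes_in(text):
    """Identifiers of all genes whose name variants occur as whole words in the text."""
    lowered = text.lower()
    found = set()
    for name, gene_id in SYNONYMS.items():
        if re.search(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])", lowered):
            found.add(gene_id)
    return found


def co_mentions(abstracts):
    """Count, for each gene pair, the number of abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        for pair in itertools.combinations(sorted(genes_in(text)), 2):
            counts[pair] += 1
    return counts


abstracts = [
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1.",
    "Hap1 binds upstream of CYC1.",
    "Transcripts of CYC7 were measured.",
]
print(co_mentions(abstracts))
```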

Slide 69

more advanced

Slide 70

NLP

  • Natural Language Processing

Slide 71

Figure legend:

  • Gene and protein names

  • Cue words for entity recognition

  • Verbs for relation extraction

Example sentence: “The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1”
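To illustrate the idea on the example sentence, here is a toy rule-based extractor: a crude gene-name pattern stands in for entity recognition and one verb phrase stands in for relation extraction. Both patterns are hypothetical and far simpler than a real NLP pipeline, which would use proper parsing and a curated dictionary.

```python
import re

# Hypothetical relation cue: a passive verb phrase mapped to a relation type.
RELATION_PATTERNS = {
    "regulates": re.compile(r"\b(?:is|are) (?:controlled|regulated) by\b"),
}
# Toy gene-name recognizer: three capital letters followed by digits (e.g. CYC1, HAP1).
GENE = re.compile(r"\b[A-Z]{3}\d+\b")


def extract(sentence):
    """Return (regulator, relation, target) triples found by the patterns above."""
    relations = []
    for relation, pattern in RELATION_PATTERNS.items():
        m = pattern.search(sentence)
        if not m:
            continue
        targets = GENE.findall(sentence[:m.start()])   # names before the verb phrase
        regulators = GENE.findall(sentence[m.end():])  # names after the verb phrase
        relations += [(r, relation, t) for r in regulators for t in targets]
    return relations


print(extract("The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"))
```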

Slide 72

calibrate against gold standard

Slide 74

combine all evidence

Slide 75

Bayesian scoring scheme

Slide 76

e.g.: two scores of 0.7, combined probability: ?

Slide 77

e.g.: two scores of 0.7, combined probability: 0.91

  • 1 - (1 - 0.7)² = 0.91
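The formula generalizes to any number of evidence channels: the combined probability is one minus the probability that every channel is wrong. The sketch below implements that rule as shown on the slide, assuming the channels are independent.

```python
def combine(probabilities):
    """Combined probability 1 - prod(1 - p_i), assuming independent evidence."""
    all_wrong = 1.0
    for p in probabilities:
        all_wrong *= 1.0 - p  # probability that this channel is also wrong
    return 1.0 - all_wrong


print(round(combine([0.7, 0.7]), 2))  # 0.91
```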

Slide 78

evidence transfer

Slide 79

evidence spread over many species

Slide 80

transfer by orthology

  • (or “fuzzy orthology”)
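Transfer by orthology can be sketched as projecting each scored pair onto the orthologs of both partners in the target species. The ortholog table allows one-to-many mappings, loosely in the spirit of “fuzzy orthology”. Copying the score unchanged and keeping the best score per pair are assumptions; a real scheme would likely discount transferred evidence.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def transfer(interactions: Dict[Pair, float],
             orthologs: Dict[str, List[str]]) -> Dict[Pair, float]:
    """Project scored interactions from a source species onto a target species."""
    transferred: Dict[Pair, float] = {}
    for (a, b), score in interactions.items():
        for a2 in orthologs.get(a, []):
            for b2 in orthologs.get(b, []):
                if a2 == b2:
                    continue  # both partners map to one protein: no pair to report
                key = tuple(sorted((a2, b2)))
                # Assumption: score copied unchanged; keep the best score per pair.
                transferred[key] = max(score, transferred.get(key, 0.0))
    return transferred


# Hypothetical source interaction and one-to-many ortholog table.
source = {("srcA", "srcB"): 0.8}
orthologs = {"srcA": ["tgtA"], "srcB": ["tgtB1", "tgtB2"]}
print(transfer(source, orthologs))
```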

Slide 81

von Mering et al., Nucleic Acids Research, 2005

Slide 82

von Mering et al., Nucleic Acids Research, 2005

Slide 83

two modes

Slide 86

COG mode

Slide 87

von Mering et al., Nucleic Acids Research, 2005

Slide 88

higher coverage, lower specificity

  • includes all available evidence

  • some orthologous groups are too large to be meaningful
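COG mode can be pictured as collapsing protein-level links from all species onto their orthologous groups, which pools all available evidence. The sketch assumes a simple protein-to-group table with hypothetical names and keeps the best score per group pair; both choices are illustrative. It also shows the caveat above: the larger a group, the more unrelated proteins end up sharing one node.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def to_group_network(protein_scores: Dict[Pair, float],
                     group_of: Dict[str, str]) -> Dict[Pair, float]:
    """Collapse protein-level scores (from any species) into group-level links."""
    links: Dict[Pair, float] = {}
    for (a, b), score in protein_scores.items():
        ga, gb = group_of.get(a), group_of.get(b)
        if ga is None or gb is None or ga == gb:
            continue  # unassigned proteins, or a link inside one group
        key = tuple(sorted((ga, gb)))
        links[key] = max(score, links.get(key, 0.0))  # assumption: best evidence wins
    return links


# Hypothetical proteins from two species and their orthologous groups.
scores = {("sp1_a", "sp1_b"): 0.6, ("sp2_a", "sp2_b"): 0.9, ("sp1_a", "sp1_c"): 0.4}
groups = {"sp1_a": "OG1", "sp2_a": "OG1", "sp1_b": "OG2", "sp2_b": "OG2", "sp1_c": "OG3"}
print(to_group_network(scores, groups))
```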

Slide 89

proteins mode

Slide 90

von Mering et al., Nucleic Acids Research, 2005

Slide 91

maximum specificity, lower coverage

  • information will be relevant for selected species

Slide 92

Demo

Slide 94

outlook

Slide 97

take home message

  • STRING integrates information and predicts interactions

  • You can always go to the sources

  • Proteins mode: specific species

  • COG mode: more coverage, especially for prokaryotic genes

Slide 98

Acknowledgements

  • The STRING team

  • Lars Jensen

  • Peer Bork

  • Christian von Mering & group in Zurich

  • Berend Snel

  • Martijn Huynen

Slide 99

Thank you for your attention

Slide 101

Exercises: tinyurl.com/36twzq (or via course wiki)

Alternative server: xi.embl.de

Slide 103

Bork et al., Current Opinion in Structural Biology, 2004

