The STRING database










Slide 1

The STRING database

Michael Kuhn

EMBL Heidelberg

Slide 2

protein interactions

Slide 9

example

  • Tryptophan synthase beta chain

  • E. coli K12

Slide 13

many sources

Slide 14

genomic context

Slide 15

curated knowledge

Slide 16

experimental evidence

Slide 17

literature

Slide 18

373 genomes

  • (only completely sequenced genomes)

Slide 19

1.5 million genes

  • (not proteins)

Slide 20

Genome Reviews

Slide 21

RefSeq

Slide 22

Ensembl

Slide 23

model organism databases

Slide 24

data integration

Slide 25

genomic context methods

Slide 26

gene fusion

Slide 27

gene neighborhood

Slide 28

phylogenetic profiles

Slide 32

Cell

Cellulosomes

Cellulose

Slide 33

automatic inference of interactions

Slide 34

correct interactions

Slide 35

wrong associations

Slide 36

gene fusion

  • score: sequence similarity

Slide 37

gene neighborhood

  • score: sum of intergenic distances
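
The neighborhood score named on this slide can be illustrated with a small sketch. The slides do not say how the sum is taken, so this assumes it runs over the gaps between consecutive genes lying between the two genes of interest. The `Gene` type, the gene names, and the coordinates are made up for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene (hypothetical example coordinate)
    end: int    # last base of the gene, inclusive


def intergenic_distance_sum(genes, a, b):
    """Sum the gaps between consecutive genes from gene `a` to gene `b`."""
    ordered = sorted(genes, key=lambda g: g.start)
    names = [g.name for g in ordered]
    i, j = sorted((names.index(a), names.index(b)))
    total = 0
    for left, right in zip(ordered[i:j], ordered[i + 1:j + 1]):
        # Overlapping genes count as a gap of zero, not a negative distance.
        total += max(0, right.start - left.end - 1)
    return total


# Three hypothetical genes in a row on one strand.
run = [Gene("geneA", 1, 800), Gene("geneB", 811, 2000), Gene("geneC", 2051, 3400)]
adjacent = intergenic_distance_sum(run, "geneA", "geneB")
spanning = intergenic_distance_sum(run, "geneA", "geneC")
```

Under this reading a smaller sum means a tighter neighborhood, so the raw score runs in the opposite direction from a similarity score.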

Slide 38

phylogenetic profiles

Slide 39

SVD

  • singular value decomposition

  • (removes redundancy)

Slide 40

score: Euclidean distance
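
A rough sketch of the two steps on slides 38 to 40: reduce a presence/absence matrix with SVD, then score gene pairs by Euclidean distance in the reduced space. It assumes NumPy is available; the profiles, the choice of `k`, and the function name are illustrative and not taken from STRING.

```python
import numpy as np


def profile_distances(profiles, k):
    """Project profiles onto the top-k singular vectors and return
    the matrix of pairwise Euclidean distances between genes."""
    matrix = np.asarray(profiles, dtype=float)
    # Thin SVD: matrix = u @ diag(s) @ vt, singular values in descending order.
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    reduced = u[:, :k] * s[:k]  # gene coordinates in the reduced space
    diff = reduced[:, None, :] - reduced[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Rows are genes, columns are species (1 = gene present). Made-up data.
profiles = [
    [1, 1, 0, 0, 1, 0],  # gene_a
    [1, 1, 0, 0, 1, 0],  # gene_b: same profile as gene_a
    [0, 0, 1, 1, 0, 1],  # gene_c: complementary profile
    [1, 1, 0, 1, 1, 0],  # gene_d: differs from gene_a in one species
]
dist = profile_distances(profiles, k=2)
```

Genes with identical profiles end up at distance zero, while the complementary profile stays far away; these distances are still raw scores.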

Slide 41

all scores are “raw scores”

Slide 42

not comparable

  • sequence similarity

  • sum of intergenic distances

  • Euclidean distance

Slide 43

benchmarking

  • calibrate against “gold standard”

  • (KEGG)

Slide 45

raw scores

Slide 46

probabilistic scores

  • e.g. “70% chance for an association”
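
One simple way to picture the calibration step is to bin the raw scores and, within each bin, measure what fraction of pairs the gold standard confirms. The sketch below assumes exactly that binning scheme; the slides do not describe the actual procedure, and the example pairs are made up.

```python
def calibrate(scored_pairs, bin_edges):
    """Return, per raw-score bin, the fraction of pairs confirmed by the
    gold standard (None for empty bins).

    scored_pairs: iterable of (raw_score, in_gold_standard) tuples.
    bin_edges: ascending edges; bin b covers [edges[b], edges[b + 1]).
    """
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [confirmed, total]
    for raw, confirmed in scored_pairs:
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= raw < bin_edges[b + 1]:
                counts[b][0] += int(confirmed)
                counts[b][1] += 1
                break
    return [hits / total if total else None for hits, total in counts]


# Hypothetical raw scores, flagged by whether both genes share a pathway
# in the gold standard.
pairs = [
    (0.1, False), (0.2, False), (0.3, True), (0.4, False),
    (0.6, True), (0.7, True), (0.8, False), (0.9, True),
]
probabilities = calibrate(pairs, [0.0, 0.5, 1.0])
```

Each bin's fraction can then be read as a probability: a pair whose raw score falls in the upper bin here would be reported with a 75% chance of being a true association.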

Slide 47

curated knowledge

Slide 48

KEGG

  • Kyoto Encyclopedia of Genes and Genomes

Slide 49

Reactome

Slide 50

GO

  • Gene Ontology

Slide 51

primary experimental data

Slide 52

many sources

Slide 53

many parsers

Slide 54

BIND

  • Biomolecular Interaction Network Database

Slide 55

GRID

  • General Repository for Interaction Datasets

Slide 56

HPRD

  • Human Protein Reference Database

Slide 57

co-expression

  • microarray data

Slide 58

GEO

  • Gene Expression Omnibus

Slide 59

correlation coefficient
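
As an illustration of the co-expression score, here is a minimal Pearson correlation between two expression profiles. The slides say only "correlation coefficient", so the choice of Pearson is an assumption, and the expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical expression levels of three genes across four arrays.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises and falls with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]   # moves opposite to gene_a

co_expressed = pearson(gene_a, gene_b)
anti_correlated = pearson(gene_a, gene_c)
```

A value near 1 suggests the two genes are co-expressed across the arrays; like the other raw scores, it still has to be calibrated before it can be compared with other evidence.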

Slide 60

literature mining

Slide 61

different gene identifiers

Slide 62

synonyms list

Slide 63

Medline

Slide 64

SGD

  • Saccharomyces Genome Database

Slide 65

The Interactive Fly

Slide 66

OMIM

  • Online Mendelian Inheritance in Man

Slide 67

simple scheme

Slide 68

co-mentioning
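
The simple scheme can be sketched as follows: map every name in an abstract to a canonical identifier through a synonyms list, then count how often two genes appear in the same abstract. The synonyms table is a toy, and the second and third abstracts are invented; only the first sentence comes from the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Toy synonyms list: lowercase surface form -> canonical gene name.
SYNONYMS = {"cyc1": "CYC1", "cyc7": "CYC7", "hap1": "HAP1", "hap1p": "HAP1"}


def genes_in(text):
    """Return the set of canonical gene names mentioned in one abstract."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {SYNONYMS[t] for t in tokens if t in SYNONYMS}


def count_co_mentions(abstracts):
    """Count, for each gene pair, the abstracts that mention both genes."""
    counts = Counter()
    for text in abstracts:
        counts.update(combinations(sorted(genes_in(text)), 2))
    return counts


abstracts = [
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1.",
    "Hap1p binds upstream of CYC1.",   # invented sentence
    "CYC7 is weakly expressed.",       # invented sentence
]
co_mentions = count_co_mentions(abstracts)
```

Without the synonyms list, "Hap1p" and "HAP1" would be counted as different genes, which is why the identifier mapping comes first.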

Slide 69

more advanced

Slide 70

NLP

  • Natural Language Processing

Slide 71

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1
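
A toy version of the idea on this slide: recognize known gene names, look for a relation verb, and emit regulator/target pairs. Real NLP pipelines are far more involved. The name set, the verb list (including "is regulated by", which is not on the slide), and the "controls" label are assumptions made for illustration.

```python
import re

GENE_NAMES = {"CYC1", "CYC7", "HAP1"}                      # toy entity dictionary
RELATION_VERBS = ("is controlled by", "is regulated by")   # assumed passive cue verbs


def extract_relations(sentence):
    """Return (regulator, 'controls', target) triples found in a sentence."""
    relations = []
    for verb in RELATION_VERBS:
        if verb not in sentence:
            continue
        # In the passive form, targets precede the verb and regulators follow it.
        before, after = sentence.split(verb, 1)
        targets = [t for t in re.findall(r"\w+", before) if t in GENE_NAMES]
        regulators = [t for t in re.findall(r"\w+", after) if t in GENE_NAMES]
        relations += [(r, "controls", t) for r in regulators for t in targets]
    return relations


sentence = ("The expression of the cytochrome genes CYC1 and CYC7 "
            "is controlled by HAP1")
relations = extract_relations(sentence)
```

Unlike co-mentioning, this yields a directed relation (HAP1 acts on CYC1 and CYC7) instead of an undirected count.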

Slide 72

calibrate against gold standard

Slide 74

combine all evidence

Slide 75

Bayesian scoring scheme

Slide 76

e.g.: two scores of 0.7; combined probability: ?

Slide 77

e.g.: two scores of 0.7; combined probability: 0.91

  • 1 - (1 - 0.7)^2 = 0.91
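
The arithmetic above generalizes to any number of evidence channels if they are treated as independent: the combined probability is one minus the chance that every channel is wrong. The sketch below implements only this naive form shown on the slide; the function name is illustrative, and any further corrections STRING may apply are not covered here.

```python
import math


def combine_scores(scores):
    """Combine independent probabilistic scores: 1 - prod(1 - s_i)."""
    return 1.0 - math.prod(1.0 - s for s in scores)


two_channels = combine_scores([0.7, 0.7])          # the slide's example
three_channels = combine_scores([0.7, 0.7, 0.5])   # one more weak channel
```

Adding evidence can only raise the combined score, and a single channel passes through unchanged.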

Slide 78

evidence transfer

Slide 79

evidence spread over many species

Slide 80

transfer by orthology

  • (or “fuzzy orthology”)

Slide 81

von Mering et al., Nucleic Acids Research, 2005

Slide 82

von Mering et al., Nucleic Acids Research, 2005

Slide 83

two modes

Slide 86

COG mode

Slide 87

von Mering et al., Nucleic Acids Research, 2005

Slide 88

higher coverage, lower specificity

  • includes all available evidence

  • some orthologous groups are too large to be meaningful

Slide 89

proteins mode

Slide 90

von Mering et al., Nucleic Acids Research, 2005

Slide 91

maximum specificity, lower coverage

  • information will be relevant for selected species

Slide 92

Demo

Slide 94

outlook

Slide 97

take home message

  • STRING integrates information and predicts interactions

  • You can always go to the sources

  • Proteins mode: specific species

  • COG mode: more coverage, especially for prokaryotic genes

Slide 98

Acknowledgements

  • The STRING team

  • Lars Jensen

  • Peer Bork

  • Christian von Mering & group in Zurich

  • Berend Snel

  • Martijn Huynen

Slide 99

Thank you for your attention

Slide 101

Exercises: tinyurl.com/36twzq (or via course wiki)

Alternative server: xi.embl.de

Slide 103

Bork et al., Current Opinion in Structural Biology, 2004

