Improving sensitivity by combining results from multiple search methodologies
Improving Sensitivity by Combining Results from Multiple Search Methodologies. Brian C. Searle, Proteome Software Inc., Portland, OR, [email protected]. MBI workshop on Computational Proteomics and Mass Spectrometry (January 11-14, 2005).

Improving Sensitivity by Combining Results from Multiple Search Methodologies

Brian C. Searle

Proteome Software Inc.

Portland, OR

[email protected]

MBI workshop on Computational Proteomics and Mass Spectrometry

(January 11-14, 2005)


The Analytical Challenge

[Figure labels: Biological Samples, Control Experiments, Q-TOF, IonTrap, Unknown Spectra]

The Analytical Challenge

  • Why can you only interpret half as much MS/MS data in experiments you actually care about?

  • What is going on with the remaining 90% unidentified spectra?


The OpenSea Approach

De Novo Sequence:

YD[Cc]DD[220]GADHFTY[200]R

OpenSea Alignment:

Crystallin, S (CRBS_HUMAN)

GRRYD(Cc)D(Cc)( D )(Cc)AD(FH)TY( LS )RCNS

|| | | X X X || | || | |

YD(Cc)D(D )([220])(G )AD(HF)TY([200])R


de novo Sequence

YD[Cc]DD[220]GADHFTY[200]R

163-115-160-115-115-220-57-71-…
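
The conversion from a de novo sequence string to the mass ladder shown above can be sketched as follows. This is a minimal illustration, not OpenSea's code: it assumes nominal (integer) residue masses, treats `[Cc]` as carbamidomethylated cysteine (160), and treats a bracketed number such as `[220]` as an unexplained mass gap. The names `RESIDUE_MASS`, `TOKEN`, and `mass_ladder` are hypothetical.

```python
import re

# Nominal residue masses in AMU. "Cc" is carbamidomethylated cysteine (103 + 57).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
    "Cc": 160,
}

# A token is either a bracketed item ([Cc], [220]) or a single residue letter.
TOKEN = re.compile(r"\[([^\]]+)\]|([A-Z])")


def mass_ladder(de_novo):
    """Return the list of nominal masses encoded by a de novo sequence string."""
    masses = []
    for bracketed, letter in TOKEN.findall(de_novo):
        if letter:
            masses.append(RESIDUE_MASS[letter])
        elif bracketed.isdigit():
            # Unexplained mass gap reported directly in AMU.
            masses.append(int(bracketed))
        else:
            # Named modified residue such as Cc.
            masses.append(RESIDUE_MASS[bracketed])
    return masses


ladder = mass_ladder("YD[Cc]DD[220]GADHFTY[200]R")
print("-".join(str(m) for m in ladder))
```

The first eight values reproduce the ladder on the slide (163-115-160-115-115-220-57-71); matching against a database then becomes a comparison of mass lists rather than letters.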


de novo Sequence

YD[Cc]DD[220]GADHFTY[200]R

163-115-160-115-115-220-57-71-…

Database Sequence (residue-mass):

G-57, R-156, R-156, Y-163, D-115, C-160, D-115, C-160, D-115, C-160, A-71



Auto-Interpretation of OpenSea Results

OpenSea Alignment:

GRRYD(Cc)D(Cc)( D )(Cc)AD(FH)TY( LS )RCNS

|| | | X X X || | || | |

YD(Cc)D(D )([220])(G )AD(HF)TY([200])R

+14 AMU on either cysteine or -43 AMU on aspartic acid… Modification lookup table suggests methylation of cysteine!

Auto-Interpretation:

GRRYD(Cc)D( CmDCc )AD(FH)TY( LS )RCNS

|| | | : || | || | |

YD(Cc)D(D[220]G)AD(HF)TY([200])R
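
The arithmetic behind this auto-interpretation can be sketched as below. The mismatched region is (Cc)(D)(Cc) in the database, 160 + 115 + 160 = 435, against (D)([220])(G) in the de novo sequence, 115 + 220 + 57 = 392, a difference of -43. If one cysteine is not carbamidomethylated, the shift relative to bare cysteine is -43 + 57 = +14. The lookup table here is a deliberately tiny, hypothetical stand-in for the real modification table, and `interpret` is an illustrative name.

```python
# Nominal residue masses (AMU) needed for this example only.
RESIDUE_MASS = {"C": 103, "D": 115, "G": 57, "Cc": 160}

CARBAMIDOMETHYL = 57  # fixed modification carried by "Cc"

# Hypothetical miniature lookup table: (nominal shift, residue) -> modification.
MOD_TABLE = {
    (14, "C"): "methylation",
    (16, "M"): "oxidation",
    (80, "S"): "phosphorylation",
}


def interpret(db_residues, observed_mass):
    """Return the mass difference and any modification that could explain it."""
    delta = observed_mass - sum(RESIDUE_MASS[r] for r in db_residues)
    suggestions = []
    for index, residue in enumerate(db_residues):
        if residue == "Cc":
            # Consider the shift relative to unmodified cysteine instead.
            shift, base = delta + CARBAMIDOMETHYL, "C"
        else:
            shift, base = delta, residue
        name = MOD_TABLE.get((shift, base))
        if name is not None:
            suggestions.append((index, base, shift, name))
    return delta, suggestions


delta, suggestions = interpret(["Cc", "D", "Cc"], 115 + 220 + 57)
print(delta)
for index, base, shift, name in suggestions:
    print(f"position {index}: {shift:+d} on {base} -> {name}")
```

Both cysteines are reported as candidates, which mirrors the slide's "+14 AMU on either cysteine"; the -43 shift on aspartic acid finds no entry in the table and is dropped.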


Spectrum Identification Overlap Between Search Methods

[Venn diagram: overlap of spectra identified by SEQUEST, X!Tandem, and OpenSea. Region percentages: 6%, 17%, 7%, 41%, 10%, 10%, 9%. OpenSea annotations: PTMs, polymorphisms]


Spectrum Identification Overlap Between Search Methods

[Venn diagram: overlap of spectra identified by SEQUEST, X!Tandem, and OpenSea. Region percentages: 6%, 17%, 7%, 41%, 10%, 10%, 9%. Annotations: neutral losses (SEQUEST), semi-tryptic (X!Tandem), no ladder]


Scaffold Data Compiler

  • Combine SEQUEST, Mascot, X!Tandem, and OpenSea results

  • Utilize spectrum clustering and noise filters to remove uninteresting spectra

  • Export interesting, unidentified spectra for further analysis

[Diagram goals: Search Wider, Drill Deeper, Remove Junk, Focus Efforts]

[Diagram steps: Combine Database Searching IDs; Cluster Spectra to Previous IDs; Report Interesting, Unidentified Spectra; Filter Electronic Noise For All Spectra]


Combining SEQUEST and X!Tandem Scores

[Scatter plot axes: X!Tandem -log(E-Value) Score versus SEQUEST Discriminant Score (Peptide Prophet, ISB)]



Peptide Prophet (ISB)

Incorrect IDs

p=50%

Correct IDs
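
The p=50% point between the two score distributions follows from Bayes' rule applied to a two-component mixture. The sketch below assumes, purely for illustration, that both the correct and incorrect score distributions are Gaussian with made-up parameters; the actual Peptide Prophet model fits its own distribution shapes to the data. `normal_pdf` and `posterior_correct` are hypothetical names.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct=0.5,
                      correct=(4.0, 1.0), incorrect=(0.0, 1.0)):
    """Probability that an ID with this score is correct (Bayes' rule).

    correct / incorrect are (mean, sd) pairs; the values are illustrative.
    """
    pos = prior_correct * normal_pdf(score, *correct)
    neg = (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return pos / (pos + neg)


for score in (0.0, 2.0, 4.0):
    print(score, round(posterior_correct(score), 4))
```

With equal priors and equal widths, the probability is exactly 50% midway between the two means, which is where the weighted curves cross.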


Protein Prophet (ISB)

[Diagram: network linking Peptides 1 to 7 with Proteins 1 to 8]
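
In its simplest form, a protein-level probability can be derived from peptide-level probabilities as the chance that at least one supporting peptide identification is correct. The sketch below shows only that core step and assumes independent peptides; it leaves out the handling of peptides shared between proteins and the NSP adjustment. `protein_probability` is a hypothetical name.

```python
def protein_probability(peptide_probs):
    """Probability that at least one of the peptide IDs is correct.

    Assumes the peptide identifications are independent.
    """
    all_wrong = 1.0
    for p in peptide_probs:
        all_wrong *= (1.0 - p)
    return 1.0 - all_wrong


print(protein_probability([0.5, 0.5]))
print(round(protein_probability([0.85, 0.76]), 3))
```

Two mediocre peptides at 50% each already give a 75% protein probability, which is why the peptide-to-protein network matters: the same evidence must not be counted in full for every protein that shares a peptide.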



[Plot: normalized distributions of NSP for incorrect IDs, p(NSP|-), and correct IDs, p(NSP|+). Correct IDs have higher NSP values]

[Plot: log p(NSP|+)/p(NSP|-) versus NSP bin number, applied for each spectrum to IDs with high NSP and IDs with low NSP]
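
The way the two NSP distributions adjust a probability can be written as a Bayesian update. The sketch below is a minimal version under stated assumptions: the likelihoods p(NSP|+) and p(NSP|-) for the relevant NSP bin are passed in as plain numbers (the values used are invented), and `adjust_for_nsp` and `log_ratio` are hypothetical names.

```python
import math


def adjust_for_nsp(p, p_nsp_given_correct, p_nsp_given_incorrect):
    """Update probability p using the likelihood of the observed NSP bin."""
    numerator = p * p_nsp_given_correct
    denominator = numerator + (1.0 - p) * p_nsp_given_incorrect
    return numerator / denominator


def log_ratio(p_nsp_given_correct, p_nsp_given_incorrect):
    """Log p(NSP|+)/p(NSP|-): positive values favor a correct ID."""
    return math.log(p_nsp_given_correct / p_nsp_given_incorrect)


# A high-NSP bin (more common among correct IDs) raises the probability.
print(adjust_for_nsp(0.5, 0.6, 0.2), round(log_ratio(0.6, 0.2), 3))
# A low-NSP bin (more common among incorrect IDs) lowers it.
print(adjust_for_nsp(0.5, 0.2, 0.6), round(log_ratio(0.2, 0.6), 3))
```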


[Workflow diagram; stage labels: Peptide Prophet, Scaffold Merge Prophet, Protein Prophet]

For each spectrum:

  • Get SEQUEST IDs, then calculate SEQUEST probability

  • Get Mascot IDs, then calculate Mascot probability

  • Get X!Tandem IDs, then calculate X!Tandem probability

  • Get OpenSea IDs, then calculate OpenSea probability

  • Calculate combined peptide probability

  • Calculate protein probabilities

[Diagram, build 1: for each spectrum, the SEQUEST, Mascot, X!Tandem, and OpenSea identifications are linked to Peptides 1 to 3. Probabilities shown: p=85%, p=76%, p=54%]


[Diagram, build 2: for each spectrum, the SEQUEST, Mascot, X!Tandem, and OpenSea identifications are linked to Peptides 1 to 5. Probabilities shown: p=27%, p=81%, p=35%]


[Diagram, build 3: for each spectrum, the SEQUEST, Mascot, X!Tandem, and OpenSea identifications are linked to Peptides 1 to 8]


Protein Prophet’s NSP value (number of sibling peptides) becomes Merge Prophet’s number of sibling programs.
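
One way to picture a "number of sibling programs" value is by analogy with NSP: for each program's identification of a spectrum, add up the probabilities that the other programs assigned to the same peptide. This is an assumed reading of the slide, not a confirmed description of Merge Prophet. The peptide names and the pairing of probabilities with programs are hypothetical, and `sibling_program_scores` is an illustrative name.

```python
def sibling_program_scores(ids):
    """ids maps program name -> (peptide, probability) for one spectrum.

    Returns, per program, the summed probability of agreeing programs.
    """
    scores = {}
    for program, (peptide, _) in ids.items():
        scores[program] = sum(
            prob
            for other, (other_peptide, prob) in ids.items()
            if other != program and other_peptide == peptide
        )
    return scores


# Hypothetical identifications for a single spectrum.
ids = {
    "SEQUEST": ("PEPTIDE_1", 0.85),
    "Mascot": ("PEPTIDE_1", 0.76),
    "X!Tandem": ("PEPTIDE_2", 0.54),
}
scores = sibling_program_scores(ids)
for program, score in scores.items():
    print(program, score)
```

Identifications backed by another program receive a high value and would be boosted by an NSP-style update, while a lone dissenting identification receives zero.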



Accuracy of the Probability Combining Model

[Plot: Calculated Probability versus Actual Probability for Mascot, X!Tandem, SEQUEST, and the Combination]


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by SEQUEST (40%)

Unknown Spectra (60%)


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by Scaffold (60%)

Unknown Spectra (40%)


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by Scaffold (73%)

Unknown Spectra (27%)


[Workflow diagram; stage labels: Scaffold Merge Prophet, Scaffold Cluster Prophet, Protein Prophet]

  • Calculate Combined Probability

  • Find Spectra Similar to Previously Identified

  • Filter Electronic Noise

  • Calculate Protein Probabilities

  • Report Interesting, Unidentified Spectra


Cluster Prophet Principle

If an unidentified spectrum is 95% similar to a correctly identified spectrum, it is also considered to be identified.
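
The principle can be sketched with a simple similarity measure. The talk uses a rank-based cluster similarity score; the example below swaps in cosine similarity on binned intensities because the rank-based score is not defined in the transcript. The spectra, peptide names, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally binned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagate_ids(identified, unidentified, threshold=0.95):
    """Assign each unidentified spectrum the peptide of its best match,
    but only when the similarity reaches the threshold."""
    assigned = {}
    for name, spectrum in unidentified.items():
        best_peptide, best_similarity = None, 0.0
        for peptide, reference in identified:
            similarity = cosine_similarity(spectrum, reference)
            if similarity > best_similarity:
                best_peptide, best_similarity = peptide, similarity
        if best_similarity >= threshold:
            assigned[name] = best_peptide
    return assigned


identified = [("PEPTIDE_1", [0, 10, 0, 5, 1])]
unidentified = {"u1": [0, 9, 0, 5, 1], "u2": [7, 0, 3, 0, 0]}
assigned = propagate_ids(identified, unidentified)
print(assigned)
```

Only the near-duplicate spectrum inherits the identification; the unrelated one stays unidentified and remains a candidate for the "interesting" report.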


Rank-Based Cluster Similarity Score

Incorrect IDs

p=50%

Correct IDs


MS/MS Spectrum Filter

  • Dynamic range filter removes spectra from peptides with poor/no fragmentation

  • Signal to noise filter removes electronic noise
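
The two filters above could look roughly like the sketch below. The slides do not define either measure, so everything here is an assumption: dynamic range is taken as the ratio of the largest to the smallest nonzero peak, noise is estimated as the median peak intensity, and both thresholds are invented for illustration.

```python
import statistics


def passes_filters(intensities, min_dynamic_range=10.0, min_signal_to_noise=3.0):
    """Return True if a peak list looks like a fragmented peptide spectrum.

    Thresholds and definitions are illustrative assumptions.
    """
    peaks = [i for i in intensities if i > 0]
    if len(peaks) < 2:
        return False
    top = max(peaks)
    dynamic_range = top / min(peaks)
    signal_to_noise = top / statistics.median(peaks)
    return (dynamic_range >= min_dynamic_range
            and signal_to_noise >= min_signal_to_noise)


print(passes_filters([1, 1, 2, 1, 50, 1, 30]))  # a few strong peaks over a low baseline
print(passes_filters([5, 6, 5, 6, 5, 6]))       # flat, noise-like spectrum
```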



Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by Scaffold (74%)

Unknown Spectra (5%)

Not Interesting (21%)


Percentage of 2D-LC QTOF Spectra Correctly Identified as Lens Proteins

Identified by Scaffold (48%)

Unknown Spectra (21%)

Not Interesting (31%)


The Analytical Challenge

[Figure: pie charts for Q-TOF and IonTrap data from Biological Samples and Control Experiments, each split into "IDed by SEQUEST" and "Unknown Spectra"]


The Analytical Challenge

[Figure: pie charts for Q-TOF and IonTrap data from Biological Samples and Control Experiments, each split into "IDed by Scaffold" and "Unknown Spectra"]

Q-TOF: 85% more IDs, 95% comprehension; 336% more IDs, 79% comprehension

IonTrap: 48% more IDs, 65% comprehension; 227% more IDs, 75% comprehension
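
The Q-TOF figures can be cross-checked against the pie-chart percentages given earlier in the talk. The sketch below assumes, as an inference rather than something the slides state, that "comprehension" is the share of spectra either identified or classified as not interesting, and that "more IDs" is the relative gain over SEQUEST alone.

```python
def percent_more_ids(before, after):
    """Relative increase in identified spectra, in percent."""
    return 100.0 * (after - before) / before


def comprehension(identified, not_interesting):
    """Share of spectra that are accounted for, in percent (assumed definition)."""
    return identified + not_interesting


# QTOF control data: SEQUEST 40%, Scaffold 74%, not interesting 21%.
control_gain = percent_more_ids(40, 74)
control_comprehension = comprehension(74, 21)
# 2D-LC QTOF lens data: Scaffold 48%, not interesting 31%.
lens_comprehension = comprehension(48, 31)

print(round(control_gain), control_comprehension, lens_comprehension)
```

Under these assumptions the control data give 85% more IDs and 95% comprehension, and the lens data give 79% comprehension, which agrees with the Q-TOF values on the slide.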


Conclusions

  • Using Scaffold technologies, you can drill deeper and search wider using multiple database searching approaches and MS/MS spectrum clustering

  • Scaffold and implementations of Peptide/Protein Prophet were written in platform-independent Java

  • Scaffold will be available at ASMS 2005


Acknowledgements

OpenSea Team

(OHSU)

Srinivasa Nagalla

Surendra Dasari

Ashok Reddy

Larry David

Phil Wilmarth

Ashley McCormack

Contact:

[email protected]

Scaffold Team

(Proteome Software Inc.)

Mark Turner

James Brundege

Contact:

[email protected]

ProteomeSoftware.com
