Improving Sensitivity by Combining Results from Multiple Search Methodologies

Brian C. Searle

Proteome Software Inc.

Portland, OR

Brian.Searle@ProteomeSoftware.com

MBI workshop on Computational Proteomics and Mass Spectrometry

(January 11-14, 2005)


The Analytical Challenge

[Figure: unknown spectra in biological samples and control experiments, for Q-TOF and IonTrap instruments]


The Analytical Challenge

  • Why can you only interpret half as much MS/MS data in experiments you actually care about?

  • What is going on with the remaining 90% unidentified spectra?


The OpenSea Approach

De Novo Sequence:

YD[Cc]DD[220]GADHFTY[200]R

OpenSea Alignment:

Crystallin, S (CRBS_HUMAN)

GRRYD(Cc)D(Cc)( D )(Cc)AD(FH)TY( LS )RCNS

|| | | X X X || | || | |

YD(Cc)D(D )([220])(G )AD(HF)TY([200])R


de novo Sequence

YD[Cc]DD[220]GADHFTY[200]R

163-115-160-115-115-220-57-71-…

Database Sequence

G-R-R-Y-D-C-D-C-D-C-A

57-156-156-163-115-160-115-160-115-160-71
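The comparison above works on residue masses rather than letters, so a bracketed de novo gap can be tested against runs of database residues. A minimal sketch, assuming nominal integer residue masses and treating database "C" as carbamidomethyl-cysteine (160, as on the slide); the function names are hypothetical and this is not OpenSea's actual alignment code:

```python
# Nominal (integer) residue masses in Da. "C" is taken as
# carbamidomethyl-cysteine (160), matching the C-160 on the slide.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "C": 160, "L": 113, "I": 113, "N": 114, "D": 115,
    "Q": 128, "K": 128, "E": 129, "M": 131, "H": 137,
    "F": 147, "R": 156, "Y": 163, "W": 186,
}


def mass_ladder(sequence):
    """Convert a database sequence into its list of residue masses."""
    return [RESIDUE_MASS[aa] for aa in sequence]


def explain_gap(sequence, gap_mass, tolerance=0.5, max_residues=4):
    """Return (start, substring) pairs whose residue masses sum to gap_mass.

    A bracketed de novo mass such as [200] means the fragment ladder
    skipped one or more residues; any run of consecutive database
    residues with the same total mass is a candidate explanation.
    """
    masses = mass_ladder(sequence)
    hits = []
    for start in range(len(masses)):
        total = 0
        for end in range(start, min(start + max_residues, len(masses))):
            total += masses[end]
            if abs(total - gap_mass) <= tolerance:
                hits.append((start, sequence[start:end + 1]))
            if total > gap_mass + tolerance:
                break  # longer runs only get heavier
    return hits


if __name__ == "__main__":
    crystallin = "GRRYDCDCDCADFHTYLSRCNS"  # database sequence from the alignment
    print(explain_gap(crystallin, 200))  # [(16, 'LS')]
    print(explain_gap(crystallin, 220))  # []
```

Here the [200] gap is explained by L + S (113 + 87), matching the ( LS ) aligned to [200] in the OpenSea alignment, while [220] has no unmodified explanation, which is what sends that region to modification interpretation.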



Auto-Interpretation of OpenSea Results

OpenSea Alignment:

GRRYD(Cc)D(Cc)( D )(Cc)AD(FH)TY( LS )RCNS

|| | | X X X || | || | |

YD(Cc)D(D )([220])(G )AD(HF)TY([200])R

A shift of +14 AMU on either cysteine, or -43 AMU on aspartic acid… The modification lookup table suggests methylation of cysteine!

Auto-Interpretation:

GRRYD(Cc)D( CmDCc )AD(FH)TY( LS )RCNS

|| | | : || | || | |

YD(Cc)D(D[220]G)AD(HF)TY([200])R
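The lookup step can be pictured as matching each candidate mass shift against a table of known modifications and the residues they occur on. A minimal sketch; the table below is a small illustrative assumption (nominal deltas only), not OpenSea's actual modification table:

```python
# Illustrative nominal mass shifts (Da) and the residues they can occur on.
# This small table is a hypothetical example, not OpenSea's lookup table.
MODIFICATIONS = [
    ("Deamidation", 1, "NQ"),
    ("Methylation", 14, "CKRHDE"),
    ("Oxidation", 16, "MW"),
    ("Acetylation", 42, "K"),
    ("Carbamylation", 43, "K"),
    ("Phosphorylation", 80, "STY"),
]


def lookup_modifications(delta, residues, tolerance=0.5):
    """List (residue, modification) pairs that explain a mass delta."""
    matches = []
    for residue in residues:
        for name, shift, sites in MODIFICATIONS:
            if residue in sites and abs(shift - delta) <= tolerance:
                matches.append((residue, name))
    return matches


if __name__ == "__main__":
    # The two hypotheses from the slide: +14 on C, or -43 on D.
    candidates = lookup_modifications(14, "C") + lookup_modifications(-43, "D")
    print(candidates)  # [('C', 'Methylation')]
```

Only the cysteine hypothesis finds a table entry, which mirrors the slide's conclusion that methylation of cysteine is the likely explanation.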


Spectrum Identification Overlap Between Search Methods

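[Figure: Venn diagram of spectra identified by SEQUEST, X!Tandem, and OpenSea, with region values of 6%, 17%, 7%, 41%, 10%, 10%, and 9%; OpenSea's contribution is annotated "PTMs" and "polymorphisms".]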


Spectrum Identification Overlap Between Search Methods

[Figure: the same overlap diagram, annotated "neutral losses" for SEQUEST and "semi-tryptic" and "no ladder" for X!Tandem.]
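Overlap percentages like these can be computed by assigning every identified spectrum to exactly one region: the set of engines that identified it. A minimal sketch, assuming each percentage is a share of all spectra identified by at least one engine; the spectrum IDs in the usage line are hypothetical:

```python
def overlap_regions(ids_by_engine):
    """Fraction of identified spectra falling in each Venn region.

    ids_by_engine maps an engine name to the set of spectrum IDs it
    identified. Each spectrum lands in exactly one region: the exact
    set of engines that identified it.
    """
    all_ids = set().union(*ids_by_engine.values())
    counts = {}
    for spectrum in all_ids:
        region = frozenset(
            name for name, ids in ids_by_engine.items() if spectrum in ids
        )
        counts[region] = counts.get(region, 0) + 1
    return {region: n / len(all_ids) for region, n in counts.items()}


if __name__ == "__main__":
    regions = overlap_regions({
        "SEQUEST": {1, 2, 3},
        "X!Tandem": {2, 3, 4},
        "OpenSea": {3, 5},
    })
    for region, fraction in sorted(regions.items(), key=lambda kv: -len(kv[0])):
        print(sorted(region), f"{fraction:.0%}")
```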


Scaffold Data Compiler

  • Combine SEQUEST, Mascot, X!Tandem, and OpenSea results

  • Utilize spectrum clustering and noise filters to remove uninteresting spectra

  • Export interesting, unidentified spectra for further analysis

[Workflow diagram: Search Wider (combine database searching IDs); Drill Deeper (cluster spectra to previous IDs); Remove Junk (filter electronic noise for all spectra); Focus Efforts (report interesting, unidentified spectra)]


Combining SEQUEST and X!Tandem Scores

[Figure: X!Tandem -log(E-value) score plotted against SEQUEST discriminant score (Peptide Prophet, ISB)]



Peptide Prophet (ISB)

[Figure: score distributions of incorrect IDs and correct IDs, with the p = 50% point marked]
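The figure reflects a two-component mixture model: each score is explained either by the distribution of correct IDs or by that of incorrect IDs, and Bayes' rule turns the two densities into a probability. A minimal sketch, assuming two Gaussians with fixed, hypothetical parameters; PeptideProphet itself learns its distributions from the data with EM, so this only illustrates the posterior calculation:

```python
import math


def normal_pdf(x, mean, sd):
    """Gaussian probability density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_correct(score, prior_correct=0.5,
                      correct=(3.0, 1.0), incorrect=(0.0, 1.0)):
    """P(correct | score) under a two-component mixture model.

    correct and incorrect are (mean, sd) of hypothetical Gaussian score
    distributions chosen for illustration only.
    """
    f_pos = prior_correct * normal_pdf(score, *correct)
    f_neg = (1 - prior_correct) * normal_pdf(score, *incorrect)
    return f_pos / (f_pos + f_neg)


if __name__ == "__main__":
    for s in (-1.0, 1.5, 4.0):
        print(s, round(posterior_correct(s), 4))
```

With equal priors and equal widths, the score halfway between the two means (1.5 here) is exactly the p = 50% point.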


Protein Prophet (ISB)

[Figure: Proteins 1 to 8 linked to the Peptides 1 to 7 that map to them]
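Once peptides are linked to proteins, a protein's probability can be derived from its peptides' probabilities. A minimal sketch of the basic independence approximation used in ProteinProphet-style models, P = 1 - prod(1 - p_i); the apportioning of peptides shared by several proteins is deliberately omitted here:

```python
def protein_probability(peptide_probs):
    """Probability that a protein is present given its peptides' probabilities.

    Uses the independence approximation P = 1 - prod(1 - p_i): the protein
    is absent only if every one of its peptide identifications is wrong.
    Shared (degenerate) peptides are not apportioned in this sketch.
    """
    prob_all_wrong = 1.0
    for p in peptide_probs:
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


if __name__ == "__main__":
    print(round(protein_probability([0.9, 0.5]), 4))  # 0.95
```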



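[Figure: normalized distributions p(NSP|+) for correct IDs and p(NSP|-) for incorrect IDs across NSP bins, with log p(NSP|+)/p(NSP|-) plotted per bin. For each spectrum, IDs with high NSP have their probability raised and IDs with low NSP have it lowered; correct IDs have higher NSP values.]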
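The per-bin ratio p(NSP|+)/p(NSP|-) acts as a likelihood ratio that updates each peptide's probability: p' = p·p(NSP|+) / (p·p(NSP|+) + (1 - p)·p(NSP|-)). A minimal sketch, assuming the two NSP histograms are already split into correct and incorrect counts (ProteinProphet instead weights them by peptide probability); the counts in the usage lines are hypothetical:

```python
def likelihood_ratios(pos_counts, neg_counts, pseudocount=1.0):
    """Per-bin ratio p(NSP|+)/p(NSP|-) from two NSP histograms.

    Each histogram is normalized to a distribution, with a small
    pseudocount so that empty bins do not cause division by zero.
    """
    pos_total = sum(pos_counts) + pseudocount * len(pos_counts)
    neg_total = sum(neg_counts) + pseudocount * len(neg_counts)
    return [((cp + pseudocount) / pos_total) / ((cn + pseudocount) / neg_total)
            for cp, cn in zip(pos_counts, neg_counts)]


def nsp_adjust(p, nsp_bin, ratio_by_bin):
    """Bayes update of a peptide probability by its NSP likelihood ratio.

    A ratio above 1 (typical of high-NSP bins) raises p; below 1 lowers it.
    """
    r = ratio_by_bin[nsp_bin]
    return p * r / (p * r + (1.0 - p))


if __name__ == "__main__":
    ratios = likelihood_ratios([0, 2, 8], [8, 2, 0])  # bins: low, mid, high NSP
    print([round(r, 3) for r in ratios])
    print(round(nsp_adjust(0.5, 2, ratios), 3))  # raised
    print(round(nsp_adjust(0.5, 0, ratios), 3))  # lowered
```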


[Workflow diagram: for each search engine (SEQUEST, Mascot, X!Tandem, OpenSea), get IDs and calculate a per-engine probability (Peptide Prophet); Scaffold Merge Prophet then calculates a combined peptide probability for each spectrum; Protein Prophet calculates protein probabilities.]


[Figure, three builds: for each spectrum, identifications are gathered from SEQUEST, Mascot, X!Tandem, and OpenSea; the builds show Peptides 1 to 8 with per-identification probabilities including p = 85%, 76%, 54%, 27%, 81%, and 35%.]


Protein Prophet's NSP value (number of sibling peptides) becomes Merge Prophet's number of sibling programs.
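By analogy with NSP, each engine's identification of a spectrum can be scored by how many other engines agree on the same peptide, and that count can drive the same likelihood-ratio update. A minimal sketch under that assumption; the function names, peptide strings, and ratio table are hypothetical, and how Scaffold finally merges the adjusted per-engine probabilities is not specified on the slides:

```python
def sibling_program_counts(ids_by_engine):
    """For each engine's ID of one spectrum, count how many other engines
    assigned the same peptide.

    ids_by_engine maps engine name -> peptide sequence, or None if the
    engine made no identification for this spectrum.
    """
    counts = {}
    for engine, peptide in ids_by_engine.items():
        if peptide is None:
            continue
        counts[engine] = sum(
            1 for other, other_peptide in ids_by_engine.items()
            if other != engine and other_peptide == peptide
        )
    return counts


def adjust(p, ratio):
    """Bayes update: multiply the odds of p by a likelihood ratio."""
    return p * ratio / (p * ratio + (1.0 - p))


def merge_spectrum(ids_by_engine, probs, ratio_by_count):
    """Adjust each engine's probability by its number of sibling programs."""
    counts = sibling_program_counts(ids_by_engine)
    return {engine: adjust(probs[engine], ratio_by_count[n])
            for engine, n in counts.items()}


if __name__ == "__main__":
    ids = {"SEQUEST": "PEPTIDEK", "Mascot": "PEPTIDEK",
           "X!Tandem": "SAMPLER", "OpenSea": None}
    probs = {"SEQUEST": 0.5, "Mascot": 0.5, "X!Tandem": 0.5}
    print(merge_spectrum(ids, probs, [0.5, 3.0, 6.0, 10.0]))
```

Agreement between SEQUEST and Mascot raises both of their probabilities, while the lone X!Tandem assignment is pulled down.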



Accuracy of the Probability Combining Model

[Plot: actual probability vs. calculated probability for SEQUEST, Mascot, X!Tandem, and their combination]
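A plot of actual against calculated probability is a calibration check: IDs are binned by calculated probability, and each bin's observed fraction of correct IDs should match its mean calculated probability. A minimal sketch, assuming correctness is known for each ID (for example, from a control-protein mixture); the data in the usage lines are hypothetical:

```python
def calibration_table(predicted, correct, n_bins=10):
    """Compare calculated probabilities with the observed fraction correct.

    predicted: calculated probabilities in [0, 1]
    correct:   booleans, True if the ID is known to be right
    Returns (mean calculated probability, actual fraction correct, count)
    for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, ok in zip(predicted, correct):
        index = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the top bin
        bins[index].append((p, ok))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            actual = sum(1 for _, ok in members if ok) / len(members)
            table.append((mean_p, actual, len(members)))
    return table


if __name__ == "__main__":
    predicted = [0.05, 0.05, 0.55, 0.55, 0.95, 0.95]
    correct = [False, False, True, False, True, True]
    for mean_p, actual, n in calibration_table(predicted, correct):
        print(f"calculated {mean_p:.2f}  actual {actual:.2f}  (n={n})")
```

A well-calibrated model puts every row close to the diagonal, which is what the comparison between SEQUEST, Mascot, X!Tandem, and the combination tests.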


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by SEQUEST: 40%

Unknown spectra: 60%


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by Scaffold: 60%

Unknown spectra: 40%


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by Scaffold: 73%

Unknown spectra: 27%



[Workflow diagram: Filter Electronic Noise; Scaffold Merge Prophet (calculate combined probability); Protein Prophet (calculate protein probabilities); Scaffold Cluster Prophet (find spectra similar to previously identified spectra); Report Interesting, Unidentified Spectra]


Cluster Prophet Principle

If an unidentified spectrum is 95% similar to a correctly identified spectrum, it is also considered to be identified.
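The principle can be sketched as a nearest-neighbor transfer: compare each unidentified spectrum with the identified ones and copy the peptide ID when similarity reaches 0.95. The slides name a rank-based similarity score without defining it, so this sketch assumes Spearman rank correlation over binned peak intensities; the binning, names, and data are hypothetical:

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_similarity(a, b):
    """Spearman rank correlation between two binned intensity vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean = (len(ra) + 1) / 2  # mean rank is unchanged by tie averaging
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var_a = sum((x - mean) ** 2 for x in ra)
    var_b = sum((y - mean) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def propagate_ids(unidentified, identified, threshold=0.95):
    """Give each unidentified spectrum the ID of its most similar
    identified spectrum when that similarity reaches the threshold."""
    assignments = {}
    for name, spectrum in unidentified.items():
        best_id, best_score = None, threshold
        for peptide, reference in identified:
            score = rank_similarity(spectrum, reference)
            if score >= best_score:
                best_id, best_score = peptide, score
        if best_id is not None:
            assignments[name] = best_id
    return assignments


if __name__ == "__main__":
    identified = [("PEPTIDEK", [0, 5, 1, 9, 3])]
    unidentified = {"scan12": [0, 6, 1, 8, 3], "scan13": [9, 0, 3, 1, 5]}
    print(propagate_ids(unidentified, identified))  # {'scan12': 'PEPTIDEK'}
```

A rank-based score ignores absolute intensity, so two spectra with the same peak ordering match even when one is much weaker.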


Rank-Based Cluster Similarity Score

[Figure: similarity-score distributions of incorrect IDs and correct IDs, with the p = 50% point marked]


MS/MS Spectrum Filter

  • Dynamic range filter removes spectra from peptides with poor/no fragmentation

  • Signal-to-noise filter removes electronic noise
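The two filters can be sketched as simple peak-count tests on a spectrum's intensities. The slides give no thresholds or definitions, so everything below is an assumption: "dynamic range" is taken as the number of peaks above 5% of the base peak, and noise is estimated as the median intensity. Parameter names and values are hypothetical:

```python
def passes_filters(intensities, min_dynamic_peaks=10, dynamic_fraction=0.05,
                   min_signal_peaks=5, snr=3.0):
    """Hypothetical dynamic-range and signal-to-noise filters.

    Dynamic range: require at least min_dynamic_peaks peaks above
    dynamic_fraction of the base peak; a spectrum dominated by one or two
    peaks suggests poor or no fragmentation.
    Signal to noise: estimate noise as the median intensity and require at
    least min_signal_peaks peaks at snr times that level or higher; flat
    electronic noise has almost no peaks standing out from its median.
    """
    if not intensities:
        return False
    base = max(intensities)
    strong = sum(1 for x in intensities if x >= dynamic_fraction * base)
    ordered = sorted(intensities)
    noise = ordered[len(ordered) // 2]
    signal = sum(1 for x in intensities if x >= snr * noise)
    return strong >= min_dynamic_peaks and signal >= min_signal_peaks


if __name__ == "__main__":
    good = [100, 80, 60, 50, 40, 30, 25, 20, 15, 12] + [1] * 30
    poor_fragmentation = [100, 2] + [1] * 10
    electronic_noise = [10] * 40
    print(passes_filters(good),
          passes_filters(poor_fragmentation),
          passes_filters(electronic_noise))  # True False False
```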


Percentage of QTOF Spectra Correctly Identified as Control Proteins

Identified by Scaffold: 74%

Unknown spectra: 5%

Not interesting: 21%


Percentage of 2D-LC QTOF Spectra Correctly Identified as Lens Proteins

Identified by Scaffold: 48%

Unknown spectra: 21%

Not interesting: 31%


The Analytical Challenge

[Figure: spectra IDed by SEQUEST vs. unknown spectra, for biological samples and control experiments, on Q-TOF and IonTrap instruments]


The Analytical Challenge

[Figure: spectra IDed by Scaffold vs. unknown spectra, for biological samples and control experiments, on Q-TOF and IonTrap instruments]

Q-TOF: 85% more IDs (95% comprehension); 336% more IDs (79% comprehension)

IonTrap: 48% more IDs (65% comprehension); 227% more IDs (75% comprehension)


Conclusions

  • Using Scaffold technologies, you can drill deeper and search wider using multiple database searching approaches and MS/MS spectrum clustering

  • Scaffold and implementations of Peptide/Protein Prophet were written in platform-independent Java

  • Scaffold will be available at ASMS 2005


Acknowledgements

OpenSea Team (OHSU): Srinivasa Nagalla, Surendra Dasari, Ashok Reddy, Larry David, Phil Wilmarth, Ashley McCormack. Contact: nagallas@ohsu.edu

Scaffold Team (Proteome Software Inc.): Mark Turner, James Brundege. Contact: Brian.Searle@ProteomeSoftware.com

