Proteome Analyst

Transparent High-throughput Protein Annotation: Function, Localization and Custom Predictors

Duane Szafron, Paul Lu, Russell Greiner, David Wishart, Zhiyong Lu, Brett Poulin, Roman Eisner, John Anvik, Cam Macdonell

Proteome Analyst
  • Proteome
    • one of many ‘-omes’
    • set of all proteins in an organism
  • Analysis
    • prediction of protein function or localization from sequence data
Analyze a Protein
  • We have examples of annotated proteins in various protein classes.
  • We have many more unannotated proteins.
  • What do we do?
    • Find homologues of each protein and assume similar function.
    • Find characteristics of each protein that affect function.
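The homology route can be sketched in a few lines of Python. This is a minimal illustration, not Proteome Analyst's code: it assumes search results arrive as hypothetical (homologue id, E-value) pairs, and the name `transfer_annotation` and the 1e-3 cut-off are illustrative choices. The class of the most significant annotated homologue is copied to the query.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hit format: (homologue id, E-value), as a BLAST search might report.
Hit = Tuple[str, float]


def transfer_annotation(
    hits: List[Hit],
    annotations: Dict[str, str],
    max_e_value: float = 1e-3,
) -> Optional[str]:
    """Return the class of the most significant annotated homologue, or None."""
    # A lower E-value means a more significant match, so scan best-first.
    for homologue_id, e_value in sorted(hits, key=lambda hit: hit[1]):
        if e_value > max_e_value:
            break  # every remaining hit is even less significant
        if homologue_id in annotations:
            return annotations[homologue_id]
    return None


hits = [("P2", 1e-10), ("P1", 1e-30), ("P3", 0.5)]
known = {"P1": "class A", "P2": "class B"}
print(transfer_annotation(hits, known))  # class A
```

Unannotated homologues are skipped rather than ending the search, so a strong but uninformative hit does not hide a slightly weaker annotated one.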
Analyzing Proteins
  • One Protein?
    • Just do it.
  • 5 Proteins?
    • Post-doc familiar with the protein classes.
  • 50 Proteins?
    • Grad student.
  • 5000 Proteins?
    • Summer students.
Proteome Analyst
  • High-throughput
  • Transparent
  • Prediction of
    • Protein Function
    • Protein Localization
    • Custom Classification
Machine Learning Task
  • Training
    • INPUT: sequences, classes
    • OUTPUT: Classifier
  • Analysis
    • INPUT: sequences, Classifier
    • OUTPUT: classes, explanation
Training
  • INPUT
    • sequences, classes
  • PA Tools
    • sequences → features
  • ML Algorithm
    • features, classes → Classifier
  • OUTPUT
    • Classifier
Training: INPUT
  • classes (in the headers) and protein sequences

>class A<Training Seq 1
MVGSGLLWLALVSCILTQASAVQRGYGN
PIEASSYGL...

>class B<Training Seq 2
LLDEPFRSTENSAGSQGCDKNMSGWYRF
VGEGGVRMS...

>class B<Training Seq 3
EVIAYLRDPNCSSILQTEERNWVSVTSP
VQASACRNI...

...
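A small parser makes the training input concrete. The slides do not define the file format, so this Python sketch assumes, from the headers as shown, that the class label sits between `>` and `<` and the sequence name follows it; `parse_labelled_fasta` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (class label, sequence name, sequence) for one training example.
Example = Tuple[str, str, str]


def parse_labelled_fasta(text: str) -> List[Example]:
    """Parse records whose headers look like '>class A<Training Seq 1'.

    ASSUMPTION: the label is the text between '>' and the first '<', and
    the name is everything after it. This mirrors the slide, not a
    documented Proteome Analyst file format.
    """
    examples: List[Example] = []
    label: Optional[str] = None
    name = ""
    residues: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between wrapped sequence lines
        if line.startswith(">"):
            if label is not None:  # close the previous record
                examples.append((label, name, "".join(residues)))
            head, _, tail = line[1:].partition("<")
            label, name, residues = head.strip(), tail.strip(), []
        else:
            residues.append(line)
    if label is not None:
        examples.append((label, name, "".join(residues)))
    return examples


sample = """>class A<Training Seq 1
MVGSGLLWLALVSCILTQASAVQRGYGN
PIEASSYGL
>class B<Training Seq 2
LLDEPFRSTENSAGSQGCDKNMSGWYRF"""
for label, name, seq in parse_labelled_fasta(sample):
    print(label, name, len(seq))
```

Wrapped sequence lines are concatenated, so each record yields one residue string per training example.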

Training: PA Tools
  • sequences → features
  • Homology Tools (BLAST)
    • sequence → homologues
    • homologues → annotations
    • annotations → features
Homology Tool
  • sequence → features

Diagram: sequence → BLAST (against seq DB) → homologues → retrieve → annotations → parse → features

Example annotation retrieved for a homologue:
DBSOURCE swissprot: locus MPPB_NEUCR, ...
xrefs (non-sequence databases): ...
InterPro:IPR001431, ...
KEYWORDS Hydrolase; Metalloprotease; Zinc; Mitochondrion; Transit peptide; Oxidoreductase; Electron transport; Respiratory chain.
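The last step, annotations → features, can be illustrated on the record above. This sketch assumes that each keyword and each InterPro accession becomes one binary feature; the `KW:` and `InterPro:` prefixes and the name `annotation_features` are hypothetical, and Proteome Analyst's actual feature set may differ.

```python
import re
from typing import Set


def annotation_features(record: str) -> Set[str]:
    """Turn a retrieved homologue annotation into a set of binary features.

    ASSUMPTION: one feature per KEYWORDS entry plus one per InterPro
    accession; the 'KW:' and 'InterPro:' prefixes are illustrative.
    """
    features: Set[str] = set()
    for line in record.splitlines():
        line = line.strip()
        if line.startswith("KEYWORDS"):
            # Keywords are ';'-separated and the list ends with a period.
            body = line[len("KEYWORDS"):].strip().rstrip(".")
            features.update("KW:" + kw.strip() for kw in body.split(";") if kw.strip())
    # InterPro accessions are 'IPR' followed by six digits.
    features.update("InterPro:" + acc for acc in re.findall(r"IPR\d{6}", record))
    return features


record = """xrefs (non-sequence databases): InterPro:IPR001431, ...
KEYWORDS Hydrolase; Metalloprotease; Zinc; Mitochondrion; Transit peptide."""
print(sorted(annotation_features(record)))
```

Features from all homologues of a query would then be pooled into one set before classification.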


Training: PA Tools
  • sequences → features
  • Homology Tools (BLAST)
    • sequence → homologues
    • homologues → annotations
    • annotations → features
  • Pattern Tools (Pfam, PROSITE, …)
    • sequences → motifs
    • motifs → features
Pattern Tool
  • sequence → features

Diagram: sequence → find (against pattern DB) → patterns → parse → features

Example pattern hits:
Pfam; PF00234; tryp_alpha_amyl; 1.
PROSITE; PS00940; GAMMA_THIONIN; 1.
PROSITE; PS00305; 11S_SEED_STORAGE; 1.

  • Pattern features are not included in the current results.
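Pattern hits like the Pfam and PROSITE lines above could be mapped to binary features by keeping the database name and accession. A minimal sketch under that assumption (the entry name and the trailing match count are ignored); `pattern_features` is a hypothetical name.

```python
from typing import Set


def pattern_features(hit_lines: str) -> Set[str]:
    """Map lines such as 'Pfam; PF00234; tryp_alpha_amyl; 1.' to 'Pfam:PF00234'.

    ASSUMPTION: only the database name and accession are kept; the entry
    name and the trailing hit count are ignored.
    """
    features: Set[str] = set()
    for line in hit_lines.splitlines():
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            features.add(f"{fields[0]}:{fields[1]}")
    return features


hits = """Pfam; PF00234; tryp_alpha_amyl; 1.
PROSITE; PS00940; GAMMA_THIONIN; 1.
PROSITE; PS00305; 11S_SEED_STORAGE; 1."""
print(sorted(pattern_features(hits)))
# ['PROSITE:PS00305', 'PROSITE:PS00940', 'Pfam:PF00234']
```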

Training: ML Algorithm
  • features, classes → Classifier
  • any ML algorithm may be used
  • default = naïve Bayes
    • consistently near-best accuracy (SVM, ANN slightly better)
    • efficient (for high-throughput)
    • easy to interpret
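To make the default concrete, here is a minimal Bernoulli naïve Bayes in Python over binary feature sets. It is a sketch under assumptions: features are present/absent tokens and add-one (Laplace) smoothing is used; the slides do not say which naïve Bayes variant or smoothing Proteome Analyst uses, and the class name `NaiveBayes` is hypothetical. Training is a single counting pass, which is part of why the method suits high-throughput use.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class NaiveBayes:
    """Bernoulli naive Bayes over binary (present/absent) features.

    Add-one (Laplace) smoothing stops a feature never seen with a class
    from driving that class's probability to zero.
    """

    def fit(self, examples: List[Tuple[Set[str], str]]) -> "NaiveBayes":
        # Training is one counting pass over the examples.
        self.class_counts = Counter(label for _, label in examples)
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()
        for features, label in examples:
            self.feature_counts[label].update(features)
            self.vocabulary |= features
        self.total = len(examples)
        return self

    def log_scores(self, features: Set[str]) -> Dict[str, float]:
        """log P(class) + sum over training features of log P(state | class)."""
        scores: Dict[str, float] = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)
            for f in self.vocabulary:
                p = (self.feature_counts[label][f] + 1) / (n + 2)
                score += math.log(p if f in features else 1.0 - p)
            scores[label] = score
        return scores

    def predict(self, features: Set[str]) -> str:
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


train = [
    ({"KW:Zinc", "KW:Hydrolase"}, "A"),
    ({"KW:Zinc"}, "A"),
    ({"KW:Transport"}, "B"),
    ({"KW:Transport", "KW:Membrane"}, "B"),
]
model = NaiveBayes().fit(train)
print(model.predict({"KW:Zinc"}))  # A
```

Features never seen during training are ignored at prediction time, since they carry no per-class counts.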
Analysis (Classification)
  • INPUT
    • sequences
  • PA Tools
    • sequences → features
  • Classifier
    • features → classes, explanation
  • OUTPUT
    • classes
Analysis: INPUT
  • protein sequences

>Seq 1
DTILNINFQCAYPLDMKVSLQAALQPIV
SSLNVSVDG...

>Seq 2
AVELSVESVLYVGAILEQGDTSRFNLVL
RNCYATPTE...

>Seq 3
HVEENGQSSESRFSVQMFMFAGHYDLVF
LHCEIHLCD...

...

Analysis: PA Tools
  • sequences → features
  • Homology Tools (BLAST)
    • sequence → homologues
    • homologues → annotations
    • annotations → features
  • Pattern Tools (Pfam, PROSITE, …)
    • sequences → motifs
    • motifs → features
Analysis: Classification
  • features → classes
  • naïve Bayes
    • returns probabilities of each class for each sequence
    • efficient (for high-throughput)
    • easy to interpret
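Per-class probabilities come from normalising the naïve Bayes scores. Summing many log-probabilities produces very negative numbers, so a sketch like the following (the function name `class_probabilities` is hypothetical) subtracts the largest score before exponentiating, the standard log-sum-exp trick, to avoid underflow.

```python
import math
from typing import Dict


def class_probabilities(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Normalise per-class log scores into probabilities that sum to 1."""
    # Subtract the largest score before exponentiating (log-sum-exp trick):
    # exp(-1000) underflows to 0.0, but exp(0) does not.
    top = max(log_scores.values())
    weights = {label: math.exp(score - top) for label, score in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


print(class_probabilities({"A": -1000.0, "B": -1000.0}))  # {'A': 0.5, 'B': 0.5}
```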
Analysis: Classification
  • features → classes, explanation
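The slides do not describe how explanations are built. One natural reading for naïve Bayes, sketched here purely as an assumption, is to rank the features present in a sequence by their log-likelihood ratio between the predicted class and a runner-up class; `explain` and the shape of its `likelihood` table are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def explain(
    features: Set[str],
    likelihood: Dict[str, Dict[str, float]],
    predicted: str,
    rival: str,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank present features by how strongly they favour `predicted` over `rival`.

    likelihood[c][f] is P(feature f present | class c). In naive Bayes the
    log-odds of two classes is a prior term plus one term per feature; for a
    present feature that term is log P(f | predicted) - log P(f | rival).
    """
    terms = [
        (f, math.log(likelihood[predicted][f]) - math.log(likelihood[rival][f]))
        for f in features
        if f in likelihood[predicted] and f in likelihood[rival]
    ]
    terms.sort(key=lambda term: term[1], reverse=True)
    return terms[:top_n]


likelihood = {
    "A": {"KW:Zinc": 0.75, "KW:Transport": 0.25},
    "B": {"KW:Zinc": 0.25, "KW:Transport": 0.75},
}
for feature, weight in explain({"KW:Zinc", "KW:Transport"}, likelihood, "A", "B"):
    print(f"{feature}: {weight:+.2f}")
```

Because each term is additive, a positive weight reads directly as evidence for the predicted class and a negative one as evidence against it.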
Results: General Function
  • GeneQuiz classification
  • 5-fold cross-validation accuracy on 14 classes
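The accuracies on these results slides are 5-fold cross-validation estimates. A generic sketch of that procedure, assuming plain shuffled (unstratified) folds because the slides do not say how the folds were formed: each example is held out exactly once and classified by a model trained on the other k − 1 folds. `k_fold_accuracy` and its `train` callback are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def k_fold_accuracy(
    examples: Sequence[Tuple[X, str]],
    train: Callable[[List[Tuple[X, str]]], Callable[[X], str]],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Estimate accuracy by shuffled, unstratified k-fold cross-validation.

    Every example lands in exactly one fold, is held out once, and is
    classified by a model trained on the other k - 1 folds.
    """
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        model = train([examples[i] for i in order if i not in held_out])
        for i in fold:
            x, label = examples[i]
            correct += int(model(x) == label)
    return correct / len(examples)


# Toy check: a "classifier" that ignores its training data and tests parity.
data = [(n, "even" if n % 2 == 0 else "odd") for n in range(20)]
print(k_fold_accuracy(data, lambda _train: (lambda n: "even" if n % 2 == 0 else "odd")))  # 1.0
```

The `train` callback returns a predictor, so each fold trains once rather than once per held-out sequence.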
Results: Specific Function
  • K+ Ion Channel Proteins
  • 5-fold cross-validation accuracy on 78 sequences, 4 classes

Results: Localization
  • Sub-cellular localization prediction
  • 3146 sequences from 10 classes
Proteome Analyst
  • High-throughput
  • Transparent
  • Prediction of
    • Protein Function
    • Protein Localization
    • Custom Classification
Acknowledgements
  • Student developers
    • Cynthia Luk
    • Samer Nassar
    • Kevin McKee
  • Biologists
    • Warren Gallin
    • Kathy Magor
  • Data
    • Nair and Rost
  • Funding
    • PENCE – Protein Engineering Network of Centres of Excellence
    • NSERC – Natural Sciences and Engineering Research Council
    • Sun Microsystems
    • AICML – Alberta Ingenuity Centre for Machine Learning
  • Many ‘-ome’ jokes
    • my wife, Jen
Contact
  • http://www.cs.ualberta.ca/~bioinfo/PA
  • poulin@cs.ualberta.ca