
Transmembrane Protein Prediction


baylee

Presentation Transcript


  1. Transmembrane Protein Prediction Project Presentation CMPUT 606

  2. Overview • Transmembrane (TM) protein: • Associated with the plasma membrane • “A protein that has domains exposed on both sides of the membrane” [Genes VII] • Some of the TM proteins that span the lipid layer several times form a hydrophilic channel that permits various ions and molecules to circulate through the plasma membrane.

  3. Transmembrane Proteins

  4. Transmembrane Segments

  5. Ion Channels

  6. Transmembrane Domains

  7. Data Sets

  8. Predictors • ePST • bPST • TMHMM • TMpred • HMMTOP • HMMer • TMDET

  9. Predictors

  10. Predictors Performance: Theoretical Time

  11. TMHMM • Short form prediction • sp_1xqe_A len=418 ExpAA=243.54 First60=39.67 PredHel=11 • Topology=o10-32i45-67o98-120i127-149o159-181i193-215o225-247i259-281o285-302i315-337o352-374i
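The short-form line above can be read mechanically. The following is a minimal sketch, assuming the whole record sits on one whitespace-separated line of `key=value` fields and that `o`/`i` in the topology string mark outside/inside loops; the function name `parse_tmhmm_short` and the dictionary keys `helices` and `n_term` are illustrative, not part of TMHMM.

```python
import re


def parse_tmhmm_short(line):
    """Parse one TMHMM short-format line into a dict (illustrative helper)."""
    fields = line.split()
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        record[key] = value
    # Convert the numeric fields that appear in the slide's example.
    record["len"] = int(record["len"])
    record["ExpAA"] = float(record["ExpAA"])
    record["First60"] = float(record["First60"])
    record["PredHel"] = int(record["PredHel"])
    # Every "start-end" pair in the topology string is one predicted helix.
    record["helices"] = [
        (int(start), int(end))
        for start, end in re.findall(r"(\d+)-(\d+)", record["Topology"])
    ]
    # The first character gives the side of the N-terminus.
    record["n_term"] = "inside" if record["Topology"].startswith("i") else "outside"
    return record


line = (
    "sp_1xqe_A len=418 ExpAA=243.54 First60=39.67 PredHel=11 "
    "Topology=o10-32i45-67o98-120i127-149o159-181i193-215o225-247"
    "i259-281o285-302i315-337o352-374i"
)
rec = parse_tmhmm_short(line)
print(rec["PredHel"], len(rec["helices"]), rec["n_term"])
```

Counting the `start-end` pairs gives a quick consistency check against the `PredHel` field.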

  12. TMHMM

  13. TMpred

  14. TMpred

  15. HMMTOP

  16. TMDET

  17. HMMer Flow

  18. HMMer

      Scores for complete sequences (score includes all domains):

      Sequence      Description          Score  E-value  N
      --------      -----------          -----  -------  ---
      nontm|1ALO._  OXIDOREDUCTASE       -20.6      4.7  1
      nontm|1CDE._  TRANSFERASE(FORMYL)  -26.1      9.9  1
      nontm|1AKO._  NUCLEASE             -27.4       10  1
      nontm|1ARU._  PEROXIDASE           -37.1       10  1
      sp|1pv7_A                          -41.7       10  1
      sp|1pw4_A                          -46.0       10  1
      sp|1pxs_A                          -48.9       10  1
      sp|1xqe_A                          -49.0       10  1
      sp|1r2c_L                          -53.2       10  1
      nontm|1HSB.B  HISTOCOMPATIBILITY   -61.4       10  1

      Parsed for domains:

      Sequence      Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
      --------      ------  -----  -----     -----  -----     -----  -------
      nontm|1ALO._  1/1       125    323 ..      1    199 []  -20.6      4.7
      nontm|1CDE._  1/1         4    202 ..      1    199 []  -26.1      9.9
      nontm|1AKO._  1/1         5    202 ..      1    199 []  -27.4       10
      nontm|1ARU._  1/1       112    295 ..      1    199 []  -37.1       10
      sp|1pv7_A     1/1       116    314 ..      1    199 []  -41.7       10
      sp|1pw4_A     1/1       162    329 ..      1    199 []  -46.0       10
      sp|1pxs_A     1/1        51    249 .]      1    199 []  -48.9       10
      sp|1xqe_A     1/1        39    226 ..      1    199 []  -49.0       10
      sp|1r2c_L     1/1        62    260 ..      1    199 []  -53.2       10
      nontm|1HSB.B  1/1         2     99 .]      1    199 []  -61.4       10

  19. HMMer

      Total sequences searched: 10

      Whole sequence top hits:
      tophits_s report:
           Total hits:           10
           Satisfying E cutoff:  9
           Total memory:         16K

      Domain top hits:
      tophits_s report:
           Total hits:           10
           Satisfying E cutoff:  10
           Total memory:         22K

  20. ePST Output

      TM#  Start  End
        1     12   24
        2     50   61
        3    101  112
        4    130  142
        5    163  166
        6    168  175
        7    199  201
        8    203  211
        9    228  240
       10    260  271
       11    287  297
       12    315  333
       13    353  365

      Total # ePST segments = 13
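A segment listing like the one above is easy to load for further analysis. The sketch below is only an illustration: it assumes the listing is whitespace-separated `TM# Start End` triples with inclusive coordinates, and the names `parse_segments`, `segment_length` and the `min_len` cutoff of 5 are hypothetical, not part of ePST or its post-processing scripts.

```python
def parse_segments(text):
    """Return (start, end) pairs from a 'TM# Start End ...' listing."""
    # Header tokens such as 'TM#' are skipped because they are not digits.
    numbers = [int(tok) for tok in text.split() if tok.isdigit()]
    if len(numbers) % 3 != 0:
        raise ValueError("expected TM#, Start, End triples")
    return [(numbers[i + 1], numbers[i + 2]) for i in range(0, len(numbers), 3)]


def segment_length(segment):
    """Length of a segment, assuming inclusive start and end positions."""
    start, end = segment
    return end - start + 1


# Table rows only; the 'Total # ePST segments' summary line is left out.
listing = (
    "TM# Start End 1 12 24 2 50 61 3 101 112 4 130 142 5 163 166 "
    "6 168 175 7 199 201 8 203 211 9 228 240 10 260 271 11 287 297 "
    "12 315 333 13 353 365"
)
segments = parse_segments(listing)

# Hypothetical cutoff: flag very short segments for manual inspection.
min_len = 5
short = [seg for seg in segments if segment_length(seg) < min_len]
print(len(segments), short)
```

Flagging short segments this way is one possible sanity check on the listing, not a step the slides describe.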

  21. ePST Output

      s#   i  char    pos      neg    odds       tot       win    maxwin  region
      s    0  A     -1.87  -708.40  706.52    706.52    706.52      0.00  -
      s    1  P     -2.96  -708.40  705.44   1411.96   1411.96      0.00  -
      s    2  A     -1.87  -708.40  706.52   2118.48   2118.48      0.00  -
      s    3  V     -0.75  -708.40  707.64   2826.13   2826.13      0.00  -
      s    4  A     -1.80  -708.40  706.60   3532.72   3532.72      0.00  -
      s    5  D     -6.47  -708.40  701.92   4234.65   4234.65      0.00  -
      s    6  K     -3.53  -708.40  704.87   4939.52   4939.52      0.00  -
      s    7  A     -3.40  -708.40  705.00   5644.51   5644.51      0.00  -
      s    8  D     -6.47  -708.40  701.92   6346.43   6346.43      0.00  -
      s    9  N     -5.22  -708.40  703.18   7049.61   7049.61      0.00  -
      s   10  A     -1.87  -708.40  706.52   7756.14   7756.14      0.00  -
      s   11  F     -3.91  -708.40  704.49   8460.63   8460.63      0.00  -
      s   12  M     -3.76  -708.40  704.63   9165.26   9165.26      0.00  -
      s   13  M     -3.76  -708.40  704.63   9869.89   9869.89      0.00  -
      s   14  I     -2.06  -708.40  706.34  10576.23  10576.23      0.00  -
      s   15  C     -4.54  -708.40  703.86  11280.08  10573.56  10573.56  -
      s   16  T     -2.71  -708.40  705.69  11985.77  10573.81  10573.81  -
      s   17  A     -2.48  -708.40  705.91  12691.68  10573.20  10573.81  -
      s   18  L     -4.01  -708.40  704.38  13396.07  10569.94  10573.81  -
      s   19  V     -1.29  -708.40  707.11  14103.18  10570.45  10573.81  -
      s   20  L     -0.59  -708.40  707.81  14810.99  10576.34  10576.34  -
      s   21  F     -1.12  -708.40  707.28  15518.26  10578.75  10578.75  +
      s   22  M     -3.76  -708.40  704.63  16222.90  10578.39  10578.75  +
      s   23  T     -3.12  -708.40  705.27  16928.17  10581.74  10581.74  +
      s   24  I     -0.87  -708.40  707.52  17635.69  10586.08  10586.08  +
      s   25  P     -0.51  -708.40  707.89  18343.58  10587.44  10587.44  +
      s   26  G     -2.25  -708.40  706.15  19049.73  10589.11  10589.11  +
      s   27  I     -1.49  -708.40  706.91  19756.64  10591.38  10591.38  +
      s   28  A     -1.54  -708.40  706.85  20463.50  10593.61  10593.61  +
      s   29  L     -4.01  -708.40  704.38  21167.88  10591.65  10593.61  +
      s   30  F     -1.92  -708.40  706.48  21874.36  10594.27  10594.27  +
      s   31  Y     -6.07  -708.40  702.33  22576.69  10590.91  10594.27  +
      s   32  G     -2.25  -708.40  706.15  23282.84  10591.15  10594.27  +
      s   33  G     -4.38  -708.40  704.02  23986.86  10590.79  10594.27  +
      s   34  L     -1.54  -708.40  706.85  24693.71  10590.53  10594.27  +
      s   35  I     -2.06  -708.40  706.34  25400.05  10589.06  10594.27  +
      s   36  R     -2.75  -708.40  705.65  26105.70  10587.43  10594.27  +
      s   37  G     -2.25  -708.40  706.15  26811.85  10588.95  10594.27  +
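Judging only from the numbers in this trace, `odds` looks like `pos - neg`, `tot` like the running total of `odds`, and `win` like the sum of the 15 most recent `odds` values. That reading is inferred from the printed values, not taken from the ePST source. A minimal sketch of such a trailing-window sum, with the hypothetical name `window_sums`, reproduces the `win` column up to rounding:

```python
def window_sums(odds, window=15):
    """Sum of the last `window` values ending at each position.

    For the first positions, where fewer than `window` values exist,
    the sum covers everything seen so far (matching the 'tot' column).
    """
    sums = []
    running = 0.0
    for i, value in enumerate(odds):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= odds[i - window]
        sums.append(running)
    return sums


# 'odds' column for residues 0..15, copied from the trace above.
odds = [
    706.52, 705.44, 706.52, 707.64, 706.60, 701.92, 704.87, 705.00,
    701.92, 703.18, 706.52, 704.49, 704.63, 704.63, 706.34, 703.86,
]
win = window_sums(odds)
print(round(win[14], 2), round(win[15], 2))
```

Small differences in the last digit against the printed table are expected, since the slide shows values already rounded to two decimals.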

  22. ePST Execution Flow • Diagram labels: Training Set, Testing Set, ePST, ePST Prediction, Post-processing Scripts • Output shown on the slide:

      TM#  Start  End
        1     12   24
        2     50   61
        3    101  112
        4    130  142
        5    163  166
        6    168  175
        7    199  201
        8    203  211
        9    228  240
       10    260  271
       11    287  297
       12    315  333
       13    353  365

      Total # segments predicted by ePST = 13

  23. HMMer Results for both.fasta

  24. HMMer vs. ePST

  25. ePST

  26. Cross-validation (5 folds) - ePST

  27. TMHMM and ePST

  28. Scanning PDB • Training: DMTMR40672 • Testing: PDB • Threshold 705.37->Nrtm=1665 chains • PDB_TM retrieves 1673 chains • Validation necessary – lack of ground truth

  29. TMH Benchmark • tmeval.fasta: 2247 non-annotated sequences • Script for converting ePST output to TMH submit format • Comparison with other predictors • 4 tables • 8 evaluation parameters

  30. Window 25, 35, T 10584 - High Resolution

  31. Window 25, 35, T 10584 - Low Resolution

  32. Window 15, T 10588 – High Resolution

  33. Window 15, T 10588 – Low Resolution

  34. Window 15, T 10588 – False Positives

  35. Window 15, T 10588 – Confusion with Signal Peptides

  36. Conclusions • ePST is a competitive predictor • Fast training • Scales well, in contrast with HMMs • Unlike HMMs, ePST does not suffer from poor local minima • ePST does not require an MSA of the sequences • ePST allows more than one test sequence at a time

  37. Future Work • More tuning, use pruning • Applications to other tasks (phosphorylation) involved in signal transduction pathways • Search for a verified data set for training and testing (no consensus in the literature) • Extract features from the sequence • Analyze the false negatives with particular helix topologies (such as 1orq)
