Capstone Project Presentation

Presentation Transcript


  1. Capstone Project Presentation
     Predicting Deleterious Mutations
     Young SP, Radivojac P, Mooney SD
     Stuart Young

  2. Predicting Deleterious Mutations
     • Deleterious: “Hurtful or injurious to life or health; noxious” (Oxford English Dictionary)
     • “’Tis pity wine should be so deleterious, / For tea and coffee leave us much more serious.” (Byron, Don Juan IV, 1821)

  3. SNPs
     • What is an SNP (single nucleotide polymorphism)?
     • Why are SNPs important?
     • Some SNPs are nonsynonymous
     • The molecular effects of SNPs vary widely

  4. Motivation
     • Improve on existing methods for predicting deleterious mutations
     • Combine protein sequence, evolution, and structure data with machine learning to identify potentially disease-causing SNPs

  5. SNP data is increasingly available
     • Over 40 major online databases
     • dbSNP is the primary SNP database (contains 5,000,000+ validated human SNPs)
     • Many databases contain potentially disease-causing SNPs related to a particular disease

  6. Deleterious effects of mutations on proteins
     • Function
     • Stability
     • Expression
     • Protein-protein interactions

  7. Current Classification Tools: Sequence Approaches
     • BLOSUM62
       • An amino acid substitution score matrix
     • SIFT
       • Collects sequence homologues in multiple alignments and identifies non-conservative changes in amino acids
       • Ng P & Henikoff S, ‘Predicting Deleterious Amino Acid Substitutions’. Genome Research, 2001, 11:863-874.
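
To illustrate how a substitution score matrix is used, the sketch below looks up the score for an amino acid change in a small excerpt of BLOSUM62. Only a handful of cells are included, and the helper name `blosum62_score` is hypothetical; a real pipeline would load the full 20 x 20 matrix. It does not reproduce SIFT, which works from multiple alignments rather than a fixed matrix.

```python
# A small excerpt of the BLOSUM62 matrix (the full matrix is 20 x 20).
# Keys are unordered residue pairs, because the matrix is symmetric.
BLOSUM62_EXCERPT = {
    frozenset("A"): 4,     # A -> A (identity)
    frozenset("C"): 9,     # C -> C
    frozenset("W"): 11,    # W -> W
    frozenset("IL"): 2,    # conservative: isoleucine <-> leucine
    frozenset("DE"): 2,    # conservative: aspartate <-> glutamate
    frozenset("KR"): 2,    # conservative: lysine <-> arginine
    frozenset("DL"): -4,   # non-conservative
    frozenset("DW"): -4,   # non-conservative
}


def blosum62_score(wild_type, mutant):
    """Return the BLOSUM62 score for a substitution (hypothetical helper)."""
    key = frozenset((wild_type, mutant))
    if key not in BLOSUM62_EXCERPT:
        raise KeyError(f"pair {wild_type}/{mutant} is not in this excerpt")
    return BLOSUM62_EXCERPT[key]


print(blosum62_score("I", "L"))   # conservative change, positive score
print(blosum62_score("D", "W"))   # non-conservative change, negative score
```

Low or negative scores mark substitutions that are rarely seen between related proteins, which is why the score is a natural input feature for a classifier.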

  8. Current Classification Tools: Structural Approaches
     • Expert rules
       • Use evolutionary and structural data
       • Sunyaev et al, ‘Prediction of deleterious human alleles’. Human Molecular Genetics, 2001, Vol. 10, No. 6, 593.
     • Decision trees
       • Improved performance based on sequence and structural data
       • Produce intuitive rules

  9. Our foundation for the project
     • Saunders CT & Baker D, ‘Evaluation of Structural and Evolutionary Contributions to Deleterious Mutation Prediction’. J. Mol. Biol. (2002) 322, 891–901
     • Structural and evolutionary features
     • Trained classifiers on two data sets: experimental mutations and human alleles

  10. S&B (Saunders and Baker) Training Sets
      • Experimental mutations (~5,000)
        • HIV-1 protease
        • E. coli Lac repressor
        • T4 lysozyme
      • Human alleles (~350 mutations)
        • 103 ‘hot’ human genes

  11. Why two training sets?
      • Unbiased human data is hard to get:
        • Many disease-associated mutations are discovered through genetic association studies and may not be causative (i.e., only linked with the causative allele)
        • The effect of mutations is hard to measure
      • Experimental ‘whole gene mutagenesis’ data is considered ‘unbiased’

  12. Features used in the S&B study
      • SIFT
      • SIFT + solvent accessibility (SA)
      • SIFT + normalized B-factor
      • SIFT + Sunyaev expert rules
      • SIFT + SA + B-factor

  13. Hypothesis
      • Can we improve on the results of Saunders and Baker by using more structural and sequence properties?

  14. Experimental Design
      • Classification algorithms
        • Decision trees
        • Support vector machines
        • Neural nets
      • Additional features
        • Amino acid relative frequencies
        • Additional structural properties

  15. Structural Property Values
      • Russ Altman (Stanford) developed a vector representation of protein structural sites
      • Spheres (1.875 Å → 7.5 Å) centered on the C-alpha atom of the mutation position
      • 66 features
      • Atom/residue counts within each sphere and other features, e.g.:
        • Solubility
        • Solvent accessibility
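
The sphere-based counting described on this slide can be sketched roughly as follows. This is an illustration only, not the actual structural-site representation: it assumes the 1.875 Å to 7.5 Å range is split into four concentric shells of equal width, and it counts atoms only, whereas the real vector holds 66 features. The function name and the coordinates are made up.

```python
import math

SHELL_WIDTH = 1.875   # angstroms, the smallest radius quoted on the slide
N_SHELLS = 4          # assumption: 4 * 1.875 = 7.5 angstroms outer radius


def shell_atom_counts(center, atoms):
    """Count atoms per concentric shell around `center`.

    center: (x, y, z) of the C-alpha atom at the mutation position.
    atoms:  iterable of (x, y, z) coordinates of neighbouring atoms.
    Returns a list of N_SHELLS counts, innermost shell first.
    """
    counts = [0] * N_SHELLS
    for atom in atoms:
        distance = math.dist(center, atom)
        shell = int(distance // SHELL_WIDTH)
        if shell < N_SHELLS:   # ignore atoms at or beyond 7.5 angstroms
            counts[shell] += 1
    return counts


# Toy coordinates (invented for illustration, not taken from a PDB entry).
ca = (0.0, 0.0, 0.0)
neighbours = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 5.0),
              (7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(shell_atom_counts(ca, neighbours))
```

Counting per shell rather than per sphere keeps the features from double-counting the same atom at several radii; cumulative sphere counts can be recovered by a running sum.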

  16. Amino Acid Windows
      • AA frequencies within a window on either side of the mutation position
      • 20 AAs = 20 features per window
      • LEFT and RIGHT windows → 40 features
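
A minimal sketch of the window features, assuming a fixed window width and a 0-based mutation position. The function name, the width, and the example sequence are illustrative and not taken from the project code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues


def window_frequencies(sequence, position, width):
    """Return 40 features: 20 left-window then 20 right-window frequencies.

    position is the 0-based index of the mutated residue; width is the
    number of residues taken on each side (truncated at sequence ends).
    The mutated residue itself is excluded from both windows.
    """
    left = sequence[max(0, position - width):position]
    right = sequence[position + 1:position + 1 + width]
    features = []
    for window in (left, right):
        total = len(window)
        for aa in AMINO_ACIDS:
            # An empty window (mutation at a sequence end) gives all zeros.
            features.append(window.count(aa) / total if total else 0.0)
    return features


# Toy sequence; position 4 is the 'Y', so LEFT = "KTA" and RIGHT = "IAK".
features = window_frequencies("MKTAYIAKQR", position=4, width=3)
print(len(features))
```

Each window contributes relative frequencies that sum to 1 (or to 0 for an empty window), so windows of different widths stay comparable.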

  17. Amino Acid Windows

  18. Tools
      • Databases
        • PDB: protein structure data
        • S-BLEST: structural features
      • Software
        • Perl 5.8.0
        • Matlab (NN, PRTools (DT), SVC)

  19. List of Features Used
      • BLOSUM62, disorder, secondary structure, molecular weight
      • Grouped amino acid frequency windows of varying widths
      • SIFT
      • S-BLEST (vector contains four sub-shells spreading outward from the site)
      • Solvent accessibility (C-beta density, i.e., the number of C-beta atoms around the site)

  20. Comparison with S&B Results

  21. Part 1: Human Data Set
      • Human allele data set used as both training and test set
      • Ensembles of decision trees for classification
      • 20-fold cross-validation
      • Features added progressively to see their effect on performance
      • Because structural data was not available for all mutation sites, we used a subset of the original Saunders and Baker training set
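
The 20-fold cross-validation step can be sketched as below. This is a generic outline under stated assumptions, not the project's Matlab code: the decision-tree ensemble is replaced by a trivial majority-class stand-in, and all function names and the toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=20, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(features, labels, train_fn, k=20):
    """Pooled accuracy over all k held-out folds.

    train_fn(train_X, train_y) must return a function that maps one
    feature vector to a predicted label.
    """
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        predict = train_fn([features[i] for i in train],
                           [labels[i] for i in train])
        correct += sum(predict(features[i]) == labels[i] for i in test)
    return correct / len(labels)


def majority_class(train_X, train_y):
    """Stand-in classifier: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda x: most_common


# Toy data: 30 deleterious (1) and 10 neutral (0) examples.
X = [[i] for i in range(40)]
y = [1] * 30 + [0] * 10
print(cross_validated_accuracy(X, y, majority_class))
```

Passing a different `train_fn`, or feature vectors with more columns, is one way to compare feature sets as they are added one group at a time.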

  22. Best Features

  23. Part 2: Experimental Data Set
      • Same as the human data set, but using experimental mutations for training and testing

  24. Evaluation of S-BLEST Using a Random Subset of the Experimental Training Set

  25. Part 3: Cross-classification
      • Used the same features described above
      • Trained on one data set and tested on the other:
        • Human to experimental
        • Experimental to human
        • Experimental gene to experimental gene
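
Cross-classification amounts to fitting on one data set and scoring on the other. The sketch below shows the pattern with a deliberately simple stand-in: a single cut-off on one score, where lower values are assumed to suggest a deleterious change. The scores, labels, and function names are invented for illustration and are not project data.

```python
def train_threshold(scores, labels):
    """Pick the cut-off on a single score that best separates the labels.

    A stand-in for the real classifiers: predicts deleterious (1) when
    the score is at or below the chosen threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(scores)):
        correct = sum((s <= threshold) == bool(y)
                      for s, y in zip(scores, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, scores, labels):
    """Fraction of examples classified correctly at the given threshold."""
    hits = sum((s <= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


# Made-up (scores, labels) pairs; label 1 = deleterious.
human = ([0.01, 0.03, 0.20, 0.60], [1, 1, 0, 0])
experimental = ([0.02, 0.04, 0.10, 0.50, 0.90], [1, 1, 1, 0, 0])

for name, (train, test) in {
    "human -> experimental": (human, experimental),
    "experimental -> human": (experimental, human),
}.items():
    threshold = train_threshold(*train)
    print(name, accuracy(threshold, *test))
```

A gap between the two directions, as in this toy case, is the kind of asymmetry that cross-classification is meant to expose.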

  30. Summary of Results
      • Human data set
        • 80% accuracy (up from 70%)
      • Experimental data set
        • 87% accuracy (up from 79.5%)
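
One way to read these accuracy gains is as a relative reduction in error rate. The short calculation below uses only the percentages quoted on this slide; the function name is hypothetical, and the framing is offered as an aid to interpretation rather than a result reported in the presentation.

```python
def relative_error_reduction(old_accuracy_pct, new_accuracy_pct):
    """Fraction of the old error rate removed by the new classifier."""
    old_error = 100 - old_accuracy_pct
    new_error = 100 - new_accuracy_pct
    return (old_error - new_error) / old_error


# Accuracies as quoted on the slide (baseline -> this work).
print(round(relative_error_reduction(70, 80), 3))     # human data set
print(round(relative_error_reduction(79.5, 87), 3))   # experimental data set
```

On both data sets roughly a third of the baseline's errors are removed.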

  31. Conclusion
      • Prediction tools CAN identify deleterious mutations
      • We believe that further study is warranted to identify over-fitted classifiers and thereby improve classification accuracy on real-world data

  32. Acknowledgements
      • People
        • Andrew Campen (CCBB IT, IUPUI)
        • Brandon Peters (CCBB, IUPUI)
        • Haixu Tang (Capstone Coordinator, IUB)
      • Funding
        • This work was funded by a grant from the Showalter Trust (Sean Mooney, PI), INGEN, and an IUPUI McNair Scholarship.
        • The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.

  33. Thank You
