A machine learning approach towards explaining tissue-specific somatic mutations in cancer

This thesis proposal aims to use machine learning to identify normal-tissue characteristics associated with the mutation of specific genes in their cancer counterparts. The goal is to understand the factors that predispose tissues to tissue-specific mutations, potentially leading to new prevention strategies and unexpected drug targets.

Presentation Transcript


  1. A machine learning approach towards explaining tissue-specific somatic mutations in cancer. Thesis proposal by Miguel Ángel Cortés Guzmán, MCC, first semester. Principal advisor: Victor Manuel Treviño Alvarado.

  2. What is the project about? • Identify potentially relevant normal-tissue characteristics (features) that contribute to the mutation of specific genes in the cancers arising from those normal tissues. • Certain genes are frequently mutated only in certain cancers, and for the most part the reason is not clear. • The hypothesis we are testing is that normal tissues are naturally predisposed to tissue-specific mutations because of inherent characteristics such as gene expression profiles, epigenetic profiles, pathway interaction profiles, etc. • We think machine learning can be useful for analysing these features and finding patterns linking them to tissue-specific cancer genes.

  3. An example: the APC case. It is a clinical fact, but why?

  4. Why is knowing important? • Understanding what predisposes a specific tissue to acquire a specific cancer-related mutation could lead to new ways of prevention before full-blown cancer affects a patient. • Case example: aspirin. • Aspirin was found to lower colon cancer risk through an inflammation-related mechanism; however, this was only a "lucky" statistical finding. What if we could find more unexpected drug targets like this? (Benetou, Lagiou & Lagiou, 2015).

  5. Previous attempts • A logical line of thinking would be: "if gene x is highly mutated in cancer tissue y, then the same gene must have significantly high expression in normal tissue y." • This sounds reasonable, because we can assume that x is important for the normal functions of tissue y; however, it is not that simple. • (Figure: APC expression values in normal tissues.) Research suggests nonlinear relationships involving features such as gene expression.
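As a toy illustration of why a purely linear check can miss such a relationship, the Python sketch below uses invented numbers (not data from this project). A hypothetical mutation score that is fully determined by normal-tissue expression, but non-monotonically, still gives a Pearson correlation of zero. The names `expression` and `mutation_score` are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical, centred normal-tissue expression of gene x in five tissues.
expression = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical mutation score that depends only on expression, but is highest
# at both extremes, so "higher expression -> more mutation" does not hold.
mutation_score = [x ** 2 for x in expression]

r = pearson(expression, mutation_score)
print(f"Pearson r = {r:.3f}")
```

Here `r` is exactly 0 even though `mutation_score` is a deterministic function of `expression`; nonlinear models (or at least nonlinear feature transformations) are needed to capture this kind of dependence.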

  6. What is the proposed methodology?

  7. What has been done so far? • Scope of genes and tissue types to be considered (Kandoth et al., 2013)

  8. What has been done so far? • Standardizing mutation frequencies to classify genes into positive, uncertain and negative cases, using: • O = matrix of observed mutations for each tissue type and each gene of interest • E = matrix of expected mutations for each tissue type and each gene • m = vector of total mutations across all genes (of the human genome) for each tissue type • p = vector of protein product lengths of the genes of interest
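The exact formula linking E to m and p is not given in the text, so the following is only a minimal sketch under two stated assumptions: (1) expected counts are proportional to protein length, E[i][j] = m[i] · p[j] / L, where L is a hypothetical total protein length over all genes in the genome; and (2) the chi matrix holds Pearson-style residuals (O − E) / √E. Rows are tissue types and columns are genes, matching the per-column (per-gene) analysis on the next slide. All numbers are invented.

```python
import math


def expected_matrix(m, p, total_length):
    """E[i][j] = m[i] * p[j] / total_length (ASSUMED length-proportional model)."""
    return [[mi * pj / total_length for pj in p] for mi in m]


def chi_matrix(O, E):
    """ASSUMED Pearson-style residuals (O - E) / sqrt(E), element-wise."""
    return [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(O, E)]


# Toy example: 2 tissue types (rows) x 3 genes (columns); all values invented.
m = [10_000, 2_000]          # total mutations per tissue type, genome-wide
p = [2_000, 400, 1_000]      # protein lengths (amino acids) of the genes of interest
total_length = 10_000_000    # hypothetical summed protein length of all genes
O = [[40, 2, 1],             # observed mutation counts, tissue type 1
     [1, 3, 0]]              # observed mutation counts, tissue type 2

E = expected_matrix(m, p, total_length)
chi = chi_matrix(O, E)
print(chi)
```

Large positive values flag genes mutated far more often than their length and the tissue's overall mutation load would predict. The sketch assumes every entry of m and p is positive, so E never contains zeros.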

  9. For each column of the chi matrix we obtain:

  10. Unsupervised clustering of chi values • Initialize k-means with 3 centroids to create 3 clusters (positive, negative, uncertain). • One model is created per gene across the 33 cancer types. • We further define specific genes as those that are positive for at most 3 cancer types at a time (with 4 positives, genes such as TP53 start being classified as positive).
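The clustering and specificity rule above can be sketched without external libraries as a 1-D k-means (Lloyd's algorithm) over one gene's chi values. The initialization scheme and the mapping from clusters to labels (highest centroid is "positive", lowest is "negative", middle is "uncertain") are assumptions made for illustration, as are the function names `kmeans_1d`, `label_gene` and `is_specific`. The example uses 8 invented chi values rather than the 33 cancer types.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (centroids, cluster index per value)."""
    s = sorted(values)
    # ASSUMED deterministic initialization: spread the k starting centroids
    # evenly over the sorted values, from the minimum to the maximum.
    centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


def label_gene(chi_column):
    """Map the 3 clusters of one gene's chi values to negative/uncertain/positive."""
    centroids, clusters = kmeans_1d(chi_column, k=3)
    low, mid, high = sorted(range(3), key=lambda c: centroids[c])
    names = {low: "negative", mid: "uncertain", high: "positive"}
    return [names[c] for c in clusters]


def is_specific(gene_labels, max_positive=3):
    """A gene is 'specific' if it is positive in 1 to max_positive cancer types."""
    return 1 <= gene_labels.count("positive") <= max_positive


# Invented chi values for one gene across 8 cancer types (the project uses 33).
chi_gene = [25.0, 0.3, -0.5, 1.2, 0.1, -0.8, 0.9, 22.0]
gene_labels = label_gene(chi_gene)
print(gene_labels, is_specific(gene_labels))
```

In this toy input the gene is positive in the first and last cancer types only, so it counts as specific. In practice a library implementation such as scikit-learn's `KMeans` could replace `kmeans_1d`; the hand-written version just keeps the sketch dependency-free.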

  11. Genes specific to only one type of cancer:

  12. Genes specific for two types of cancer:

  13. Genes specific for three types of cancer:

  14. What is next? • Construct a database resembling this general structure, based on the labels we found for the genes in the previous step. • The idea is to select features and construct models on a per-gene basis.
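One way to picture the per-gene database is the sketch below, which turns the labels from the clustering step and a table of normal-tissue features into a training set for a single gene. Everything here is hypothetical: the function name `build_gene_dataset`, the feature names, the values, and the assumptions that each cancer type maps to a normal tissue with the same name and that "uncertain" cases are left out of training.

```python
def build_gene_dataset(gene, labels, features):
    """Assemble a per-gene training table.

    labels:   {gene: {tissue: "positive" | "uncertain" | "negative"}}
    features: {tissue: {feature_name: value}} measured in normal tissue
    Returns (X, y, tissues, feature_names); y is 1 for positive, 0 for negative.
    """
    feature_names = sorted(next(iter(features.values())))
    X, y, tissues = [], [], []
    for tissue, label in sorted(labels[gene].items()):
        if label == "uncertain":
            continue  # ASSUMPTION: uncertain cases are excluded from training
        if tissue not in features:
            continue  # no normal-tissue profile available for this type
        X.append([features[tissue][name] for name in feature_names])
        y.append(1 if label == "positive" else 0)
        tissues.append(tissue)
    return X, y, tissues, feature_names


# Hypothetical labels and normal-tissue features; names and values are invented.
labels = {"GENE_A": {"colon": "positive", "lung": "negative",
                     "breast": "uncertain", "skin": "negative"}}
features = {
    "colon": {"expression": 8.1, "methylation": 0.20},
    "lung": {"expression": 3.4, "methylation": 0.55},
    "breast": {"expression": 5.0, "methylation": 0.40},
    "skin": {"expression": 2.9, "methylation": 0.61},
}
X, y, tissues, names = build_gene_dataset("GENE_A", labels, features)
print(tissues, y, names)
```

Each gene gets its own (X, y) table, so feature selection and model fitting can be run independently per gene, as the slide proposes.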

  15. Currently…

  16. References
    • Weinstein, I. B., & Case, K. (2008). The History of Cancer Research: Introducing an AACR Centennial Series. Cancer Research, 68, 6861-6862. doi:10.1158/0008-5472.CAN-08-2827
    • Schaefer, M. H., & Serrano, L. (2016). Cell type. Scientific Reports, 6, 20707.
    • Benetou, V., Lagiou, A., & Lagiou, P. (2015). Chemoprevention of cancer: current evidence and future prospects. F1000Research, 4(F1000 Faculty Rev), 916. doi:10.12688/f1000research.6684.1
    • Tiong, K. L., & Yeang, C. H. (2018). Explaining cancer type specific mutations with transcriptomic and epigenomic features in normal tissues. Scientific Reports, 8(1), 11456. doi:10.1038/s41598-018-29861-1
    • Bzdok, D., Altman, N., & Krzywinski, M. (2018). Statistics versus machine learning. Nature Methods, 15, 233-234.
    • Pirooznia, M., Yang, J. Y., Yang, M. Q., & Deng, Y. (2008). A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics, 9(Suppl 1), S13. doi:10.1186/1471-2164-9-S1-S13
    • Kandoth, C., McLellan, M. D., Vandin, F., Ye, K., Niu, B., Lu, C., et al. (2013). Mutational landscape and significance across 12 major cancer types. Nature, 502, 333-339.