Feature Identification for Colon Tumor Classification

UCI Interdisciplinary Computational and Applied Mathematics Program. Representative: Anthony Hou. Joint work with Melody Lim, Janine Chua, and Natalie Congdon. Faculty advisors: Dr. Fred Park, Dr. Ernie Esser, and Anna Konstorum.

Presentation Transcript


  1. Feature Identification for Colon Tumor Classification

  2. Problem Statement • [Figure: tumor spheroids, control vs. chemical added]

  3. Biological Background • Hepatocyte Growth Factor (HGF) levels have been shown to be elevated in the colon tumor microenvironment (in vivo) • Increased HGF is correlated with increased growth and dispersiveness • [Figure: tumor spheroids, control vs. +HGF]

  4. Experimental Approach • Data obtained from the Laboratory of Dr. Marian Waterman, Department of Microbiology, UC Irvine • Cell line used: primary "colon cancer initiating cells" (CCICs) • Cultured CCICs were trypsinized and spun down

  5. Experimental Approach (cont.) • Single cells were plated in 96-well ultra-low-attachment plates in DMEM plus supplement, with or without HGF at various concentrations • CCICs were imaged at 10x magnification once a day for 12 days • [Figure: spheroid grown in media + 50 ng/ml HGF, day 8]

  6. Our Motivational Goal • Given a set of images, biologists can see the qualitative effect of high versus low HGF concentrations. • We want to find the feature(s) that can discriminate between tumor spheroids grown with high and low concentrations of HGF. • We hope this can indicate which features are useful for helping biologists measure the amount of HGF in a given colon tumor spheroid.

  7. Image Processing/Computer Vision Background • Classification • Humans have an innate ability to learn to tell one object from another

  8. Now, how can we automate this process for biological images? • [Figure: control vs. +HGF]

  9. Classification Approach • Image processing • Mathematical features • Shape features: Area, Perimeter/Area, Circularity Ratio, Eccentricity • Texture features: Total Variation/Area, Average Intensity • Why these 6 features? • Additional given feature: Day • Fisher's Linear Discriminant (FLD) classification

  10. Processing Data • [Figure panels: raw +HGF tumor; binary image with boundary applied; segmented +HGF tumor; boundary of +HGF tumor; thresholded binary image]
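
The slide does not say how the images were thresholded. A minimal sketch of the thresholding and boundary steps, assuming a single global threshold (`threshold` is a hypothetical parameter) and a spheroid that is darker than the background, might look like this in Python with NumPy. The function names `segment` and `boundary` are illustrative, not the authors' code.

```python
import numpy as np

def segment(image, threshold):
    """Threshold a grayscale image into a binary tumor mask.

    Assumption: the spheroid is darker than the background (dense tissue
    transmits less light), so pixels below `threshold` are labeled tumor.
    """
    mask = image < threshold
    segmented = np.where(mask, image, 0.0)  # keep intensities inside the mask only
    return mask, segmented

def boundary(mask):
    """Mask pixels that have at least one 4-neighbor outside the mask."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1]    # neighbors above and below
        & padded[1:-1, :-2] & padded[1:-1, 2:]  # neighbors left and right
    )
    return mask & ~interior
```

A real pipeline would likely also keep only the largest connected component and fill holes; those steps are omitted here.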

  11. Shape Information • [Figure: binary image of a +HGF tumor] • Features computed from the shape: • Area • Perimeter/Area • Circularity Ratio • Eccentricity
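
Area and perimeter can be estimated directly from the binary mask. This sketch assumes area is the pixel count and uses a crude perimeter estimate (the number of pixel edges separating tumor from background); the slides do not state which perimeter estimator was used, and the function names are hypothetical.

```python
import numpy as np

def area(mask):
    # Area in pixels: number of foreground (tumor) pixels.
    return int(mask.sum())

def perimeter(mask):
    # Crude perimeter: count pixel edges between foreground and background
    # (4-connected "crack" length). It overestimates diagonal boundaries.
    padded = np.pad(mask.astype(np.int8), 1)
    vertical = np.abs(np.diff(padded, axis=0)).sum()
    horizontal = np.abs(np.diff(padded, axis=1)).sum()
    return int(vertical + horizontal)

def perimeter_over_area(mask):
    return perimeter(mask) / area(mask)
```

For a 4x4 block of tumor pixels this gives area 16, perimeter 16, and P/A 1.0.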

  12. Image Information • [Figure: segmented +HGF tumor] • Features computed from the image: • Total Variation • Average Intensity

  13. Classification • Each tumor is mapped to a feature vector <V1, V2, …, Vn>, which is a point in a high-dimensional space. • Now, how do we separate the two groups?

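  14. Fisher's Linear Discriminant • Maps each feature vector to a single number by projecting it onto a direction w • Fisher's Linear Discriminant chooses w to maximize the ratio of inter-class (between-class) variance to intra-class (within-class) variance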
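
A minimal two-class FLD in NumPy is sketched below. It uses the standard closed form w ∝ S_W⁻¹(m₁ − m₀), where m₀, m₁ are the class means and S_W is the within-class scatter matrix. The small ridge term and the decision threshold halfway between the projected class means are assumptions; the slides do not say how the threshold was chosen.

```python
import numpy as np

def fisher_fit(X0, X1, ridge=1e-8):
    """Two-class Fisher's Linear Discriminant.

    X0, X1: (n_samples, n_features) arrays for class 0 (control) and
    class 1 (+HGF). Returns the direction w and a threshold on X @ w.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class (intra-class) scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw = Sw + ridge * np.eye(Sw.shape[0])  # assumed small ridge keeps Sw invertible
    # Direction maximizing between-class over within-class variance.
    w = np.linalg.solve(Sw, m1 - m0)
    # Assumed decision rule: halfway between the projected class means.
    threshold = 0.5 * (m0 @ w + m1 @ w)
    return w, threshold

def fisher_predict(X, w, threshold):
    # 1 = +HGF side of the threshold, 0 = control side.
    return (X @ w > threshold).astype(int)
```

Because w = S_W⁻¹(m₁ − m₀) and S_W is positive definite, the +HGF mean always projects above the control mean, so "greater than the threshold" corresponds to +HGF.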

  15. Project Overview • Develop a classification scheme for colon tumor spheroids grown in media with and without HGF • The broader goal is to obtain a quantitative understanding of HGF action on tumor spheroids. • Feature vectors can be used to quantify HGF action on tissue growth in vitro.

  16. Results • Ran FLD on 6 features: Area, Circularity Ratio, Average Intensity, Eccentricity, Perimeter/Area, TV/Area • Trained on half the data • Repeated random sub-sampling cross-validation was used for all tests
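
Repeated random sub-sampling with per-class scores can be sketched as follows. Assumptions: each class is split separately (half for training, half for testing), scores are averaged over `n_repeats` random splits, and `fit`/`predict` are whatever classifier is being evaluated. The nearest-class-mean classifier is only a hypothetical stand-in so the example runs on its own; in the study, FLD played this role.

```python
import numpy as np

def repeated_subsampling(X, y, fit, predict, n_repeats=100, train_frac=0.5, seed=0):
    """Average percent correct for class 0 (control) and class 1 (+HGF).

    fit(X0, X1) -> model; predict(model, X) -> array of 0/1 labels.
    """
    rng = np.random.default_rng(seed)
    scores = {0: [], 1: []}
    idx = {c: np.flatnonzero(y == c) for c in (0, 1)}
    for _ in range(n_repeats):
        train, test = {}, {}
        for c in (0, 1):
            perm = rng.permutation(idx[c])  # fresh random split each repeat
            k = int(round(train_frac * len(perm)))
            train[c], test[c] = perm[:k], perm[k:]
        model = fit(X[train[0]], X[train[1]])
        for c in (0, 1):
            pred = predict(model, X[test[c]])
            scores[c].append(100.0 * np.mean(pred == c))
    return float(np.mean(scores[0])), float(np.mean(scores[1]))

# Hypothetical stand-in classifier: assign each sample to the nearer class mean.
def fit_nearest_mean(X0, X1):
    return X0.mean(axis=0), X1.mean(axis=0)

def predict_nearest_mean(model, X):
    m0, m1 = model
    d0 = np.linalg.norm(X - m0, axis=1)
    d1 = np.linalg.norm(X - m1, axis=1)
    return (d1 < d0).astype(int)
```

Reporting control and +HGF accuracy separately, as the slides do, shows whether errors are concentrated in one class.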

  17. Results (cont.) • Percent correct for control: 91.50% • Percent correct for +HGF: 90.99%

  18. Results: Adding Day • The results were good, but our goal was to maximize the percentage correct, so we added time (day) as a feature • Features used: Area, Perimeter/Area, TV/Area, Eccentricity, Average Intensity, Circularity Ratio, Day • Some tumors were similar in shape and size, so we needed a descriptor to separate them: a larger control tumor from a later day can have an area and perimeter similar to those of an earlier-stage +HGF tumor.

  19. Results: Adding Day (cont.) • Percent correct for control: 98.88% • Percent correct for +HGF: 100%

  20. Next Approach • Excellent results, but we were curious whether the same results could be obtained with fewer features • Plot each feature separately to get an idea of its individual classifying potential

  21. Area • [Plot: area; control = blue, +HGF = red] • The groups separate because tumors from the control and +HGF groups differ in area

  22. Circularity Ratio Description • C1 = (area of the shape)/(area of the circle that has the same perimeter as the shape)
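
The definition on this slide reduces to a closed form. A circle with perimeter P has radius r = P/(2π), so for a shape with area A and perimeter P:

```latex
r = \frac{P}{2\pi}, \qquad
A_{\text{circle}} = \pi r^{2} = \frac{P^{2}}{4\pi}, \qquad
C_1 = \frac{A}{A_{\text{circle}}} = \frac{4\pi A}{P^{2}}
```

C1 equals 1 for a perfect circle and is smaller for any other shape (by the isoperimetric inequality). For example, a square with side s has P = 4s, giving C1 = 4πs²/(16s²) = π/4 ≈ 0.785.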

  23. Circularity Ratio • [Plot: circularity ratio; control = blue, +HGF = red] • Tumors from both groups (control and +HGF) are relatively circular, so this feature separates them poorly

  24. Average Intensity Description • Average intensity: the sum of the image intensities over the shape divided by its area • Inversely related to density: smaller values indicate that less light passes through, suggesting a denser object • [Figure: control, day 8 (10x); +HGF 10 ng/ml, day 11 (10x)]
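
Following the definition on this slide, a sketch in NumPy, assuming `image` is a grayscale array and `mask` is the boolean tumor mask from segmentation (both names are illustrative):

```python
import numpy as np

def average_intensity(image, mask):
    """Sum of image intensities over the tumor mask divided by its area.

    Area is taken as the pixel count of the mask (an assumption).
    Lower values mean less transmitted light, i.e. a denser spheroid.
    """
    return float(image[mask].sum() / mask.sum())
```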

  25. Average Intensity • [Plot: average intensity; control = blue, +HGF = red] • Control tumors have similar average intensities, whereas +HGF tumors tend to be denser • Not all +HGF tumors are very dense, so there is some overlap with the controls

  26. Eccentricity Description • Measure of elongation of an object
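
The slides do not give a formula for eccentricity. A common definition fits an ellipse with the same second central moments as the region and reports e = sqrt(1 − λ_min/λ_max), where λ_min and λ_max are the eigenvalues of the covariance of the pixel coordinates: e = 0 for a circle and e approaches 1 for a thin, elongated shape. The sketch below assumes that definition.

```python
import numpy as np

def eccentricity(mask):
    """Eccentricity of the ellipse with the same second central moments as the mask."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(float)  # rows are variables (x, y)
    cov = np.cov(coords, bias=True)            # 2x2 second central moments
    lam_small, lam_large = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return float(np.sqrt(max(0.0, 1.0 - lam_small / lam_large)))
```

A round spheroid gives a value near 0, while a thin 3 x 30 pixel bar gives a value close to 1.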

  27. Eccentricity • [Plot: eccentricity; control = blue, +HGF = red] • Poor separation, because most tumors from both groups are nearly circular, apart from a few outliers

  28. Perimeter to Area Ratio • Why normalize perimeter by area? • A small, jagged object may have the same perimeter as a large, circular object. Dividing by area separates the two, creating a more effective classifier.
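
The reasoning on this slide can be checked numerically. The sketch below uses exact polygons rather than pixel masks: a regular 64-gon stands in for a large, round spheroid and a 12-spike star, shrunk until its perimeter matches, stands in for a small, jagged one. Both shapes are hypothetical examples, not data from the study.

```python
import math

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(pts):
    """Sum of edge lengths, closing the polygon back to the first vertex."""
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))

def round_shape(radius, n=64):
    # Hypothetical stand-in for a smooth, circular spheroid: a regular n-gon.
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def jagged_shape(outer, inner, spikes=12):
    # Hypothetical stand-in for a jagged spheroid: a star with alternating radii.
    pts = []
    for k in range(2 * spikes):
        r = outer if k % 2 == 0 else inner
        t = math.pi * k / spikes
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

big_round = round_shape(10.0)
star = jagged_shape(10.0, 6.0)
# Shrink the star until its perimeter equals the round shape's perimeter.
factor = polygon_perimeter(big_round) / polygon_perimeter(star)
small_jagged = [(factor * x, factor * y) for x, y in star]

for name, shape in [("large round", big_round), ("small jagged", small_jagged)]:
    p, a = polygon_perimeter(shape), polygon_area(shape)
    print(f"{name}: perimeter={p:.2f} area={a:.2f} P/A={p / a:.3f}")
```

The two shapes have identical perimeters, so perimeter alone cannot tell them apart, but the small jagged shape's P/A is several times larger.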

  29. Perimeter to Area Ratio • [Plot: perimeter/area; control = blue, +HGF = red] • This is to be expected: the +HGF tumor spheroids are more dispersive than the control spheroids, giving them a more irregular boundary relative to their area.

  30. Total Variation to Area Ratio Description • At every pixel, estimate the gradient (the differences in intensity in the x and y directions) and sum its magnitude; this is a discretization of total variation • Normalized by area • Measures texture • [Figure: control, day 11 (10x); +HGF 10 ng/ml, day 12 (10x)]
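
A sketch of TV/area under stated assumptions: forward differences with the last row and column replicated (so the difference there is zero), the isotropic gradient magnitude sqrt(dx² + dy²), and a sum restricted to the tumor mask. The slides do not say which discretization was used; an anisotropic |dx| + |dy| version would be an equally plausible reading.

```python
import numpy as np

def tv_over_area(image, mask):
    """Discrete (isotropic) total variation over the mask, divided by mask area."""
    img = image.astype(float)
    # Forward differences; appending the last column/row makes the final difference 0.
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    grad_mag = np.sqrt(dx ** 2 + dy ** 2)  # gradient magnitude at each pixel
    return float(grad_mag[mask].sum() / mask.sum())
```

A flat (textureless) region scores 0; a horizontal ramp rising by 1 per pixel across a 5 x 5 region scores 20/25 = 0.8.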

  31. Total Variation to Area Ratio • [Plot: TV/area; control = blue, +HGF = red] • Poor separation, because tumors from both groups have similar densities/intensities

  32. Intuition Through Trial and Error • Given the individual results, we combined the two strongest features, area and perimeter/area, and plotted them against each other in a scatter plot

  33. Area vs. Perimeter/Area • [Scatter plot: area vs. perimeter/area; control = blue, +HGF = red]

  34. Results • We obtained reasonably accurate results: a line drawn to separate the two groups leaves only two control tumors on the +HGF side • Ran FLD on Area and Perimeter/Area

  35. Results (cont.) • Percent correct for control: 89.03% • Percent correct for +HGF: 96.92%

  36. Evaluation • Reasonably good results, but we decided to add the feature Day

  37. Evaluation (cont.) • Features: Area, Perimeter/Area, Day • Percent correct for control: 100% • Percent correct for +HGF: 100%

  38. "Bad" Features • Plotting the "good" features and running FLD on them showed how strong those features really are. • Our first thought: were the "good" features so strong that the "bad" features could not show their full potential as classifiers? • "Bad" features: Circularity Ratio (CR), TV/Area, Average Intensity, Eccentricity

  39. Intuition • We ran FLD on the "bad" features alone to see whether they perform better as a group • Features: CR, TV/Area, Average Intensity, Eccentricity

  40. Intuition (cont.) • Percent correct for control: 75.33% • Percent correct for +HGF: 55.27% • Why?

  41. Final Thoughts • Our belief: "bad" features are not necessarily useless. • Data sets vary; some may include tumors that differ in texture, shape, area, and so on • Our set of features is versatile • Once identified, the features can be used to pursue broader goals, such as quantifying a given chemical's effect on tumors

  42. Conclusion • The effectiveness of the area feature is consistent with the biological hypothesis that HGF increases the rate of cell division (mitosis), resulting in larger tumors. • The perimeter/area feature quantifies contiguous cell spread; its effectiveness supports the hypothesis that HGF produces spheroids with a greater perimeter/area ratio. • We tried many more elaborate approaches, but the strongest features turned out to be the simplest ones, which also agreed with biologists' intuition.

  43. Conclusion (cont.) • Including Day vs. Not Including Day • With day, fewer features gave better results • Without day, fewer features gave worse results • Without day, more features gave good results, because the groups separate in higher dimensions

  44. Future Goals • Develop methods to quantify the spread of cells that are no longer attached to the tumor • Develop an automated segmentation scheme • Handle occlusions • Existing strong methods worked but required more preprocessing • [Figure: +HGF 10 ng/ml, day 13 (10x)]

  45. Future Experiments • EXPERIMENT IDEA #1: • Run the experiment with different concentrations of HGF • We want to quantify how HGF's action changes with increasing concentration • Use the developed feature vectors to classify images from different concentrations of HGF.

  46. Future Experiments • EXPERIMENT IDEA #2: • Stain spheroids for proteins associated with stem and differentiated cell compartments • Stains can be incorporated into new feature vectors to identify whether HGF-induced changes in stem / differentiated cell concentrations are significant enough to improve image classification.

  47. Acknowledgements • NSF • Professors Jack Xin, Hongkai Zhao, Sarah Eichorn • Advisors: Dr. Fred Park, Dr. Ernie Esser, and Anna Konstorum • Laboratory of Dr. Marian Waterman • Group: Janine Chua, Melody Lim, Natalie Congdon • MBI
