Survival-Time Classification of Breast Cancer Patients

DIMACS Workshop on Data Mining and Scalable Algorithms, August 22-24, 2001, Rutgers University

Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Data Mining Institute

University of Wisconsin - Madison

Second Annual Review

June 1, 2001





American Cancer Society 2001 Breast Cancer Estimates

  • Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer)

  • 192,200 new cases of breast cancer in women will be diagnosed in the United States

  • 40,600 deaths will occur from breast cancer (40,200 among women, 400 among men) in the United States

  • According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide


Key Objective

  • Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time

  • Main Difficulty: Cannot carry out comparative tests on human subjects

    • Similar patients must be treated similarly

  • Our Approach: Classify patients into Good, Intermediate & Poor groups

    • Classification criteria: Tumor size & lymph node status

    • Classification based on: 5 cytological features plus tumor size


Principal Results for 253 Breast Cancer Patients

  • All 69 patients in the Good group:

    • Had the best survival rate

    • Had no chemotherapy

  • All 73 patients in the Poor group:

    • Had the worst survival rate

    • Had chemotherapy

  • For the 111 patients in the Intermediate group:

    • The 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not have chemotherapy

  • This last result reverses the role of chemotherapy seen in both the overall population and the Good & Poor groups


Outline

  • Tools used

    • Support vector machines (SVMs)

      • Feature selection

      • Classification

    • Clustering

      • k-Median (k-Mean fails!)

  • Cluster chemo patients into chemo-good & chemo-poor

  • Cluster no-chemo patients into no-chemo-good & no-chemo-poor

  • Three final classes

    • Good = No-chemo good

    • Poor = Chemo poor

    • Intermediate = Remaining patients

  • Generate survival curves for three classes

  • Use SVM to classify new patients into one of above three classes


Support Vector Machines Used in this Work

  • Feature selection: SVM with 1-norm approach,

$$\min_{w,\gamma,y}\ \nu e'y + \|w\|_1 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\quad y \ge 0,$$

where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0

  • 6 out of 31 features selected:

    • 5 out of 30 cytological features that describe nuclear size, shape and texture

    • Tumor size

  • Classification: Use SSVMs (smooth support vector machines) with Gaussian kernel
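The 1-norm SVM above becomes a linear program once $\|w\|_1$ is replaced by $e's$ with an auxiliary vector $s$ satisfying $-s \le w \le s$. The sketch below is a minimal illustration under stated assumptions: it uses SciPy's `linprog` with the HiGHS solver (not necessarily what the authors used), the function name `one_norm_svm` is hypothetical, and the zero-weight tolerance `tol` is an illustrative choice. Features whose weight $w_j$ is nonzero are the ones the 1-norm penalty keeps.

```python
import numpy as np
from scipy.optimize import linprog


def one_norm_svm(A, d, nu=1.0, tol=1e-6):
    """Solve  min nu*e'y + ||w||_1  s.t.  D(Aw - e*gamma) + y >= e,  y >= 0.

    A : (m, n) data matrix, one patient per row.
    d : length-m array of +1/-1 labels (the diagonal of D),
        e.g. +1 for Lymph > 0 and -1 for Lymph = 0.
    Returns (w, gamma, selected); `selected` lists feature indices with
    |w_j| > tol (tol is an illustrative assumption, not from the slides).
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    DA = d[:, None] * A

    # Variable vector z = [w (n), gamma (1), y (m), s (n)].
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])

    # D(Aw - e*gamma) + y >= e   <=>   -DAw + d*gamma - y <= -e
    rows_fit = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    # |w_j| <= s_j written as  w - s <= 0  and  -w - s <= 0
    zeros_mid = np.zeros((n, 1 + m))
    rows_pos = np.hstack([np.eye(n), zeros_mid, -np.eye(n)])
    rows_neg = np.hstack([-np.eye(n), zeros_mid, -np.eye(n)])

    A_ub = np.vstack([rows_fit, rows_pos, rows_neg])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    # w and gamma are free; y and s are nonnegative.
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w, gamma = res.x[:n], res.x[n]
    selected = [j for j in range(n) if abs(w[j]) > tol]
    return w, gamma, selected
```

For example, with `A = [[2, 0.1], [3, -0.2], [-2, 0.3], [-3, -0.1]]` and `d = [1, 1, -1, -1]`, only the first column separates the two classes, so the solution puts zero weight on the second feature and `selected` is `[0]`.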


Clustering in Data Mining

General Objective

  • Given: A dataset of m points in n-dimensional real space

  • Problem: Extract hidden distinct properties by clustering the dataset


Concave Minimization Formulation of Clustering Problem

  • Given: Set $\mathcal{A}$ of $m$ points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters

  • Problem: Determine centers $C_\ell,\ \ell = 1,\dots,k$, in $R^n$ such that the sum of the minima over $\ell \in \{1,\dots,k\}$ of the 1-norm distance between each point $A_i,\ i = 1,\dots,m$, and the cluster centers $C_\ell,\ \ell = 1,\dots,k$, is minimized

  • Objective: Sum of $m$ minima of linear functions, hence it is piecewise-linear concave

  • Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard


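Clustering via Concave Minimization

  • Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:

$$\min_{C_\ell \in R^n,\ D_{i\ell} \in R^n}\ \sum_{i=1}^{m} \min_{\ell=1,\dots,k} e'D_{i\ell} \quad \text{s.t.}\quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell},\quad i = 1,\dots,m,\ \ell = 1,\dots,k$$

  • Reformulation:

$$\min_{C_\ell,\ D_{i\ell},\ T_{i\ell}}\ \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\,(e'D_{i\ell}) \quad \text{s.t.}\quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell},\quad \sum_{\ell=1}^{k} T_{i\ell} = 1,\quad T_{i\ell} \ge 0,\quad i = 1,\dots,m,\ \ell = 1,\dots,k$$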
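A short worked step shows why the inner minimum over cluster centers can be traded for assignment weights. Assume weights $T_{i\ell}$ (introduced here for illustration) that are nonnegative and sum to one over $\ell$ for each point $i$, and write $d_{i\ell}$ for the 1-norm distance bound $e'D_{i\ell}$ between point $A_i$ and center $C_\ell$:

```latex
% d_{i\ell} = e'D_{i\ell}: 1-norm distance bound of point i to center \ell
\min_{\ell = 1,\dots,k} d_{i\ell}
  \;=\; \min_{T_{i1},\dots,T_{ik}} \Big\{ \sum_{\ell=1}^{k} T_{i\ell}\, d_{i\ell}
  \;:\; \sum_{\ell=1}^{k} T_{i\ell} = 1,\; T_{i\ell} \ge 0 \Big\}
% (>=): every convex combination of d_{i1},...,d_{ik} is at least their minimum
% (<=): setting T_{i\ell^*} = 1 on a minimizing index \ell^* attains the minimum
```

Summing over $i = 1,\dots,m$ turns the sum of minima into a bilinear objective in $T$ and $D$. For fixed centers, an optimal $T$ is a 0/1 assignment of each point to its nearest center in 1-norm, which is exactly the cluster-assignment step of the k-Median algorithm.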


Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)

Step 0 (Initialization): Given k initial cluster centers

Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm

Step 2 (Center Update): Recompute the location of the center for each cluster as the cluster median (the point with the smallest total 1-norm distance to all cluster points)

Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1

  • Different initial centers will lead to different clusters
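Steps 0 to 3 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the points are the rows of `A` and the caller supplies the initial centers; the function name and the `max_iter` safeguard are hypothetical additions. The center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the points of a cluster.

```python
import numpy as np


def k_median(A, centers, max_iter=100):
    """Finite k-Median clustering in the 1-norm.

    A       : (m, n) array, one point per row.
    centers : (k, n) array of initial centers (Step 0).
    Returns (C, labels): final centers and the index of each point's center.
    """
    A = np.asarray(A, dtype=float)
    C = np.asarray(centers, dtype=float).copy()

    def assign(C):
        # Step 1: nearest center in 1-norm; dist has shape (m, k).
        dist = np.abs(A[:, None, :] - C[None, :, :]).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(C)
        # Step 2: coordinate-wise median minimizes the summed 1-norm distance.
        new_C = C.copy()
        for l in range(C.shape[0]):
            members = A[labels == l]
            if len(members) > 0:          # keep an empty cluster's old center
                new_C[l] = np.median(members, axis=0)
        # Step 3: stop once no center moves.
        if np.array_equal(new_C, C):
            return C, labels
        C = new_C
    return C, assign(C)
```

Because each iteration can only lower (or keep) the objective and there are finitely many assignments, the loop stops after finitely many steps; the `max_iter` cap is only a safety net in this sketch.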


Clustering Process: Feature Selection & Initial Cluster Centers

  • 6 out of 31 features selected by a linear SVM

    • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)

  • Perform k-Median algorithm in 6-dimensional feature space

  • Initial cluster centers used: Medians of Good1 & Poor1

    • Good1: Patients with Lymph = 0 AND Tumor < 2

    • Poor1: Patients with Lymph > 4 OR Tumor >= 4

      • Typical indicator for chemotherapy
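Computing the two initial centers amounts to filtering patients by the Good1 and Poor1 rules and taking coordinate-wise medians in the selected feature space. A minimal sketch, assuming the selected features are the columns of `X` and that `lymph` and `tumor` are per-patient arrays in the same units as the thresholds; the function and argument names are hypothetical.

```python
import numpy as np


def initial_centers(X, lymph, tumor):
    """Medians of Good1 and Poor1 in the selected feature space.

    X     : (m, p) array of the selected features (p = 6 on the slide).
    lymph : length-m array of positive lymph node counts.
    tumor : length-m array of tumor sizes.
    Returns a (2, p) array: row 0 = Good1 median, row 1 = Poor1 median.
    """
    X = np.asarray(X, dtype=float)
    lymph = np.asarray(lymph)
    tumor = np.asarray(tumor, dtype=float)
    good1 = (lymph == 0) & (tumor < 2)      # Lymph = 0 AND Tumor < 2
    poor1 = (lymph > 4) | (tumor >= 4)      # Lymph > 4 OR Tumor >= 4
    return np.vstack([np.median(X[good1], axis=0),
                      np.median(X[poor1], axis=0)])
```

The same two centers serve as the starting point for clustering both the chemo and the no-chemo patients.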


Clustering Process

253 Patients (113 NoChemo, 140 Chemo)

  • Compute Initial Cluster Centers

    • Good1: Lymph=0 AND Tumor<2; compute median using 6 features

    • Poor1: Lymph>=5 OR Tumor>=4; compute median using 6 features

  • Cluster 113 NoChemo Patients using k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1

    • 69 NoChemo Good (Good group)

    • 44 NoChemo Poor (Intermediate group)

  • Cluster 140 Chemo Patients using k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1

    • 67 Chemo Good (Intermediate group)

    • 73 Chemo Poor (Poor group)
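The mapping from the four clusters to the three final groups is a simple rule, and tallying the cluster sizes from the flowchart confirms that the three groups partition all 253 patients. A small sketch; the function name and the `"good"`/`"poor"` cluster labels are hypothetical.

```python
from collections import Counter


def final_group(chemo, cluster):
    """Map a patient's chemo status and k-Median cluster to the final class.

    chemo   : True if the patient had chemotherapy.
    cluster : "good" or "poor", the cluster found within the patient's own
              chemo / no-chemo subset.
    """
    if not chemo and cluster == "good":
        return "Good"          # NoChemo Good
    if chemo and cluster == "poor":
        return "Poor"          # Chemo Poor
    return "Intermediate"      # NoChemo Poor or Chemo Good


# Cluster sizes taken from the clustering flowchart.
patients = ([(False, "good")] * 69 + [(False, "poor")] * 44 +
            [(True, "good")] * 67 + [(True, "poor")] * 73)
counts = Counter(final_group(c, k) for c, k in patients)
print(counts)  # Counter({'Intermediate': 111, 'Poor': 73, 'Good': 69})
```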


Survival Curves for Good, Intermediate & Poor Groups


Survival Curves for Intermediate Group: Split by Chemo & NoChemo


Survival Curves for All Patients: Split by Chemo & NoChemo


Survival Curves for Intermediate Group: Split by Lymph Node & Chemotherapy


Survival Curves for All Patients: Split by Lymph Node Positive & Negative
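The slides do not state how the survival curves were estimated. Assuming the common Kaplan-Meier product-limit estimator (an assumption, as are the function name and input layout), one group's curve could be computed as below: at each time with observed deaths, the running survival probability is multiplied by the fraction of at-risk patients who survive that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (assumed method, not stated in the slides).

    times  : follow-up times for the patients in one group.
    events : 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving     # deaths and censored patients leave the risk set
    return curve


print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
# [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Patients censored at a time t are still counted as at risk at t, which matches the usual convention of treating deaths as occurring just before censorings at tied times.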


Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness

Four groups from the clustering result: Good, Intermediate (ChemoGood), Intermediate (NoChemoPoor) & Poor

  • SVM: Good2 (Good & ChemoGood) vs. Poor2 (NoChemoPoor & Poor)

    • Good2: Compute LI(x) & CI(x), then SVM: Good vs. Intermediate

    • Poor2: Compute LI(x) & CI(x), then SVM: Intermediate vs. Poor
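The classification flowchart can be read as a two-level cascade of nonlinear SVMs. The sketch below assumes each trained classifier has the usual Gaussian-kernel form $K(x', B')u - \gamma$ with $K(x, b) = \exp(-\mu\|x - b\|_2^2)$; the trained parameters (`B`, `u`, `gamma`, `mu`) and all function names are hypothetical, and the LI(x) & CI(x) step is not modeled because the slides do not define those quantities.

```python
import numpy as np


def gaussian_kernel(X, B, mu):
    """K[i, j] = exp(-mu * ||X_i - B_j||_2^2) for rows X_i of X and B_j of B."""
    X = np.asarray(X, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq_dist)


def svm_score(x, B, u, gamma, mu):
    """Hypothetical trained nonlinear SVM: K(x', B') u - gamma."""
    k_row = gaussian_kernel(np.atleast_2d(np.asarray(x, dtype=float)), B, mu)
    return float(k_row[0] @ np.asarray(u, dtype=float) - gamma)


def classify(x, root, good2_branch, poor2_branch):
    """Two-level cascade following the flowchart.

    root         : score > 0 means Good2 (Good & ChemoGood), else Poor2.
    good2_branch : score > 0 means Good, else Intermediate (ChemoGood).
    poor2_branch : score > 0 means Intermediate (NoChemoPoor), else Poor.
    """
    if root(x) > 0:
        return "Good" if good2_branch(x) > 0 else "Intermediate"
    return "Intermediate" if poor2_branch(x) > 0 else "Poor"
```

In practice each of the three callables would wrap `svm_score` with its own trained parameters, for instance `lambda x: svm_score(x, B_root, u_root, gamma_root, mu)`.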


Conclusion

  • Used five features from a fine needle aspirate & tumor size to cluster breast cancer patients into 3 groups:

    • Good – No chemotherapy recommended

    • Intermediate – Chemotherapy likely to prolong survival

    • Poor – Chemotherapy may or may not enhance survival

  • 3 groups have very distinct survival curves

  • First categorization of a breast cancer group for which chemotherapy enhances longevity

  • Prescribe an SVM classification procedure to classify new patients into one of the above three groups


Simplest Support Vector Machine: Linear Surface Maximizing the Margin

[Figure: point sets A+ and A- separated by a linear surface]
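For reference, the textbook hard-margin linear SVM behind this picture can be written with the rows of a matrix A as the points, a diagonal matrix D holding their labels (+1 for A+, -1 for A-), and e a vector of ones. This is the standard formulation for linearly separable A+ and A-, offered as an illustration rather than the exact content of the original slide:

```latex
% Bounding planes for the two point sets:
x'w = \gamma + 1 \quad (\text{bounds } A+), \qquad x'w = \gamma - 1 \quad (\text{bounds } A-)
% Distance between the bounding planes (the margin):
\text{margin} = \frac{2}{\|w\|_2}
% Maximizing the margin is equivalent to
\min_{w,\gamma}\ \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) \ge e
```

The separating surface $x'w = \gamma$ lies midway between the two bounding planes.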







