
Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg



Survival-Time Classification of Breast Cancer Patients. DIMACS Workshop on Data Mining and Scalable Algorithms, August 22-24, 2001, Rutgers University. Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg. Data Mining Institute, University of Wisconsin - Madison. Second Annual Review, June 1, 2001.


Presentation Transcript


Survival-Time Classification of Breast Cancer Patients

DIMACS Workshop on Data Mining and Scalable Algorithms

August 22-24, 2001, Rutgers University

Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Data Mining Institute

University of Wisconsin - Madison

Second Annual Review

June 1, 2001


American Cancer Society 2001 Breast Cancer Estimates

  • Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer)

  • 192,200 new cases of breast cancer in women will be diagnosed in the United States

  • 40,600 deaths will occur from breast cancer (40,200 among women, 400 among men) in the United States

  • According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide


Key Objective

  • Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time

  • Main Difficulty: Cannot carry out comparative tests on human subjects

    • Similar patients must be treated similarly

  • Our Approach: Classify patients into: Good, Intermediate & Poor groups

    • Classification based on: 5 cytological features plus tumor size

    • Classification criteria: Tumor size & lymph node status


Principal Results for 253 Breast Cancer Patients

  • All 69 patients in the Good group:

    • Had the best survival rate

    • Had no chemotherapy

  • All 73 patients in the Poor group:

    • Had the worst survival rate

    • Had chemotherapy

  • For the 111 patients in the Intermediate group:

    • The 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not have chemotherapy

  • This last result reverses the role of chemotherapy seen in both the overall population and the Good & Poor groups


Outline

  • Tools used

    • Support vector machines (SVMs).

      • Feature selection

      • Classification

    • Clustering

      • k-Median (k-Mean fails!)

  • Cluster chemo patients into chemo-good & chemo-poor

  • Cluster no-chemo patients into no-chemo-good & no-chemo-poor

  • Three final classes

    • Good = No-chemo good

    • Poor = Chemo poor

    • Intermediate = Remaining patients

  • Generate survival curves for three classes

  • Use SVM to classify new patients into one of above three classes


Support Vector Machines Used in this Work

  • Feature selection: SVM with 1-norm approach, min ... s.t. ... [equation not preserved in the transcript], where the class label denotes Lymph node > 0 or Lymph node = 0

  • 6 out of 31 features selected:

    • 5 out of 30 cytological features that describe nuclear size, shape and texture

    • Tumor size

  • Classification: Use SSVMs with Gaussian kernel
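The optimization problem for the feature-selection step did not survive extraction. As a hedged reconstruction, the standard 1-norm linear SVM from the literature has the form below. All symbols here are assumptions rather than notation taken from the transcript: $A \in R^{m \times n}$ holds one patient per row, $D$ is a diagonal matrix of $\pm 1$ labels (Lymph node > 0 or Lymph node = 0), $e$ is a vector of ones, $y$ is a slack vector, and $\nu > 0$ weights the classification error.

```latex
\min_{w,\gamma,y} \; \nu e^{\top} y + \|w\|_1
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0,
\qquad D_{ii} = \pm 1
```

In a formulation of this kind the 1-norm penalty tends to drive many components of $w$ to exactly zero, so the features with nonzero weight are the ones retained.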


Clustering in Data Mining

General Objective

  • Given: A dataset of m points in n-dimensional real space

  • Problem: Extract hidden distinct properties by clustering the dataset


Concave Minimization Formulation of Clustering Problem

  • Given: Set $\mathcal{A}$ of $m$ points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters

  • Problem: Determine centers $C_\ell$, $\ell = 1, \dots, k$, in $R^n$ such that the sum of the minima over $\ell \in \{1, \dots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \dots, m$, and cluster centers $C_\ell$, $\ell = 1, \dots, k$, is minimized

  • Objective: Sum of $m$ minima of $k$ linear functions, hence it is piecewise-linear concave

  • Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard


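Clustering via Concave Minimization

  • Minimize the sum of 1-norm distances between each data point and the closest cluster center:

    [Equation not preserved in the transcript: a nested min ... min ... s.t. ... problem]

  • Reformulation:

    [Equation not preserved in the transcript: a single min ... s.t. ... problem]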
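The two optimization problems on this slide were lost in extraction. The block below is a hedged reconstruction in the form commonly used for k-median clustering via concave minimization, not a verbatim copy of the slide. The symbols are assumptions: $A_i$ is the $i$-th data point, $C_\ell$ is the $\ell$-th cluster center, $D_{i\ell}$ is a vector bounding the componentwise absolute difference between them, $e$ is a vector of ones, and $T_{i\ell}$ are assignment weights.

```latex
% Nested form: sum over points of the smallest 1-norm distance to a center
\min_{C,D} \; \sum_{i=1}^{m} \min_{\ell=1,\dots,k} \left\{ e^{\top} D_{i\ell} \right\}
\quad \text{s.t.} \quad -D_{i\ell} \le A_i^{\top} - C_\ell \le D_{i\ell}

% Reformulation: the inner min becomes a minimum over convex weights T
\min_{C,D,T} \; \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\, e^{\top} D_{i\ell}
\quad \text{s.t.} \quad -D_{i\ell} \le A_i^{\top} - C_\ell \le D_{i\ell}, \quad
\sum_{\ell=1}^{k} T_{i\ell} = 1, \quad T_{i\ell} \ge 0
```

At a solution $e^{\top} D_{i\ell}$ equals $\|A_i - C_\ell\|_1$, and minimizing a linear function of $T_{i\cdot}$ over the simplex is attained at a vertex, which is why the second problem can stand in for the inner minimum of the first.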


Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)

Step 0 (Initialization): Given k initial cluster centers

  • Different initial centers will lead to different clusters

Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm

Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)

Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
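Steps 0 to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `max_iter` safeguard, and the choice to leave an empty cluster's center unchanged are all hypothetical. The center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the cluster's points.

```python
from statistics import median


def l1_distance(p, q):
    """1-norm distance between two points given as sequences of numbers."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, centers, max_iter=100):
    """Run k-median clustering from the given initial centers (Step 0)."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda j: l1_distance(p, centers[j]))
            for p in points
        ]
        # Step 2: move each center to the coordinate-wise median of its cluster.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        # Step 3: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Because the result depends on the starting centers, calling `k_median` with different `centers` arguments on the same `points` can return different clusters.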


Clustering Process: Feature Selection & Initial Cluster Centers

  • 6 out of 31 features selected by a linear SVM

    • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)

  • Perform k-Median algorithm in 6-dimensional feature space

  • Initial cluster centers used: Medians of Good1 & Poor1

    • Good1: Patients with Lymph = 0 AND Tumor < 2

    • Poor1: Patients with Lymph > 4 OR Tumor ≥ 4

      • Typical indicator for chemotherapy
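The initial centers described on this slide can be computed as coordinate-wise medians of the two reference groups. The sketch below assumes each patient is a dictionary; the keys `lymph` and `tumor` and the `feature_keys` argument are hypothetical names, not taken from the slides. The Poor1 rule follows the thresholds stated on the Clustering Process flowchart (Lymph>=5 OR Tumor>=4), with Lymph > 4 treated as equivalent to Lymph >= 5 for integer node counts.

```python
from statistics import median


def initial_centers(patients, feature_keys):
    """Return (good_center, poor_center) as medians over the selected features."""
    # Good1: lymph node negative and small tumor.
    good1 = [p for p in patients if p["lymph"] == 0 and p["tumor"] < 2]
    # Poor1: many positive lymph nodes or a large tumor.
    poor1 = [p for p in patients if p["lymph"] > 4 or p["tumor"] >= 4]

    def center(group):
        # Coordinate-wise median; raises StatisticsError if the group is empty.
        return [median(p[k] for p in group) for k in feature_keys]

    return center(good1), center(poor1)
```

Patients who fall into neither group do not influence the starting centers; they are only assigned during the clustering itself.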


Clustering Process

  • 253 Patients (113 NoChemo, 140 Chemo)

  • Compute Initial Cluster Centers

    • Good1: Lymph=0 AND Tumor<2; Compute Median Using 6 Features

    • Poor1: Lymph>=5 OR Tumor>=4; Compute Median Using 6 Features

  • Cluster 113 NoChemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1

    • 69 NoChemo Good

    • 44 NoChemo Poor

  • Cluster 140 Chemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1

    • 67 Chemo Good

    • 73 Chemo Poor

  • Final groups

    • Good: 69 NoChemo Good

    • Intermediate: 44 NoChemo Poor and 67 Chemo Good

    • Poor: 73 Chemo Poor
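The way the four clusters collapse into three final groups, and the resulting group sizes, can be checked with a short sketch. The function and variable names are illustrative assumptions; the counts are the ones given in the flowchart.

```python
def final_class(had_chemo, cluster):
    """Map a (chemotherapy flag, cluster label) pair to one of the three groups."""
    if not had_chemo and cluster == "good":
        return "Good"
    if had_chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"  # NoChemo Poor and Chemo Good


# Cluster sizes as stated in the flowchart.
cluster_sizes = {
    (False, "good"): 69,
    (False, "poor"): 44,
    (True, "good"): 67,
    (True, "poor"): 73,
}

class_sizes = {}
for (had_chemo, cluster), n in cluster_sizes.items():
    name = final_class(had_chemo, cluster)
    class_sizes[name] = class_sizes.get(name, 0) + n
```

With these counts the Intermediate group has 44 + 67 = 111 patients, and the three groups together account for all 253 patients.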


Survival Curves for Good, Intermediate & Poor Groups
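The transcript does not say how the survival curves were estimated. One common choice for data with censored follow-up is the Kaplan-Meier estimator, sketched here purely as an assumption. `times` holds each patient's follow-up time and `observed` is True when the event was observed at that time (False means the patient was censored); both names are hypothetical.

```python
def kaplan_meier(times, observed):
    """Return [(time, survival probability)] at each distinct observed event time."""
    event_times = sorted({t for t, d in zip(times, observed) if d})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events that occur exactly at time t.
        deaths = sum(1 for u, d in zip(times, observed) if d and u == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Applying such a function separately to the patients in each group would give one step curve per group.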


Survival Curves for Intermediate Group: Split by Chemo & NoChemo


Survival Curves for All Patients: Split by Chemo & NoChemo


Survival Curves for Intermediate Group: Split by Lymph Node & Chemotherapy


Survival Curves for All Patients: Split by Lymph Node Positive & Negative


Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness

  • Four groups from the clustering result: Good, Intermediate (ChemoGood), Intermediate (NoChemoPoor), Poor

  • First SVM separates two merged groups:

    • Good2: Good & ChemoGood

    • Poor2: NoChemoPoor & Poor

  • Within Good2: Compute LI(x) & CI(x), then a second SVM assigns Good or Intermediate

  • Within Poor2: Compute LI(x) & CI(x), then a second SVM assigns Poor or Intermediate
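The flowchart on this slide reads as a two-stage cascade: one classifier splits patients into Good2 and Poor2, and a second classifier within each branch decides between the extreme group and Intermediate. The sketch below shows only that control flow. The three classifiers are passed in as plain callables and the threshold rules used in the example are stand-ins, not trained SVMs; the LI(x) and CI(x) computations are omitted because the transcript does not define them.

```python
def classify_patient(x, first_stage, good2_stage, poor2_stage):
    """Route a feature vector x through the two-stage cascade."""
    if first_stage(x) == "Good2":
        return good2_stage(x)  # expected to return "Good" or "Intermediate"
    return poor2_stage(x)      # expected to return "Poor" or "Intermediate"


# Stand-in rules for illustration only (hypothetical, not from the slides).
def demo_first(x):
    return "Good2" if x[0] < 5 else "Poor2"


def demo_good2(x):
    return "Good" if x[1] == 0 else "Intermediate"


def demo_poor2(x):
    return "Poor" if x[1] == 1 else "Intermediate"


example = classify_patient((1, 0), demo_first, demo_good2, demo_poor2)
```

Keeping the stages as interchangeable callables means any trained classifier with the same return labels could be substituted for the stand-in rules.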


Conclusion

  • Used five features from a fine needle aspirate & tumor size to cluster breast cancer patients into 3 groups:

    • Good: No chemotherapy recommended

    • Intermediate: Chemotherapy likely to prolong survival

    • Poor: Chemotherapy may or may not enhance survival

  • 3 groups have very distinct survival curves

  • First categorization of a breast cancer group for which chemotherapy enhances longevity

  • Prescribe an SVM classification procedure to classify new patients into one of the above three groups


Simplest Support Vector Machine: Linear Surface Maximizing the Margin

[Figure: point sets A+ and A- separated by a linear surface]


Key Objective

  • Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time

  • Main Difficulty: Cannot carry out comparative tests on human subjects

    • Similar patients must be treated similarly

  • Our Approach: Classify patients into: good, intermediate & poor groups

    • Characterize classes by: Tumor size & lymph node status

    • Classification based on: 5 cytological features plus tumor size


Clustering Process: Feature Selection & Initial Cluster Centers

  • 6 out of 31 features selected by a linear SVM

    • SVM separating lymph node positive (Lymph>0) from lymph node negative (Lymph=0)

  • Clustering performed in 6-dimensional feature space

  • Initial cluster centers used:

    • Good: Median in 6-dimensional space of patients with Lymph=0 AND Tumor <2

    • Poor: Median in 6-dimensional space of patients with Lymph>4 OR Tumor >4

      • Typical indicator for chemotherapy


Conclusion

  • By using five features from a fine needle aspirate & tumor size, breast cancer patients can be classified into 3 classes

    • Good – Requiring no chemotherapy

    • Intermediate – Chemotherapy recommended for longer survival

    • Poor – Chemotherapy may or may not enhance survival

  • 3 classes have very distinct survival curves

  • First categorization of a breast cancer group for which chemotherapy enhances longevity

