Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg


Survival-Time Classification of Breast Cancer Patients DIMACS Workshop on Data Mining and Scalable Algorithms August 22-24, 2001- Rutgers University. Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg. Data Mining Institute University of Wisconsin - Madison. Second Annual Review June 1, 2001.


Survival-Time Classification of Breast Cancer Patients
DIMACS Workshop on Data Mining and Scalable Algorithms
August 22-24, 2001, Rutgers University

Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Data Mining Institute

University of Wisconsin - Madison

Second Annual Review

June 1, 2001

American Cancer Society Year 2001 Breast Cancer Estimates
  • Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer)
  • 192,200 new cases of breast cancer in women will be diagnosed in the United States
  • 40,600 deaths will occur from breast cancer (40,200 among women, 400 among men) in the United States
  • According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide
Key Objective
  • Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time
  • Similar patients must be treated similarly
  • Main Difficulty: Cannot carry out comparative tests on human subjects
  • Our Approach: Classify patients into Good, Intermediate & Poor groups
  • Classification based on: 5 cytological features plus tumor size
  • Classification criteria: Tumor size & lymph node status
Principal Results for 253 Breast Cancer Patients
  • All 69 patients in the Good group:
    • Had the best survival rate
    • Had no chemotherapy
  • All 73 patients in the Poor group:
    • Had the worst survival rate
    • Had chemotherapy
  • For the 111 patients in the Intermediate group:
    • The 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not have chemotherapy
  • The last result reverses the role chemotherapy appears to play in the overall population
    • Very useful for treatment prescription
  • Tools used
    • Support vector machines (SVMs).
      • Feature selection
      • Classification
    • Clustering
      • k-Median (k-Mean fails!)
  • Cluster chemo patients into chemo-good & chemo-poor
  • Cluster no-chemo patients into no-chemo-good & no-chemo-poor
  • Three final classes
    • Good = No-chemo good
    • Poor = Chemo poor
    • Intermediate = Remaining patients
  • Generate survival curves for three classes
  • Use SVM to classify new patients into one of above three classes
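The mapping from the four clusters to the three final classes can be sketched in a few lines. This is only an illustration: the function name `final_class` and the `"good"`/`"poor"` string encoding are assumptions, and the group sizes are the ones reported in the slides (69 and 44 no-chemo, 67 and 73 chemo).

```python
from collections import Counter

def final_class(had_chemo: bool, cluster: str) -> str:
    """Map (treatment, cluster label) to one of the three final classes.

    `cluster` is "good" or "poor": the label a patient received when
    clustered within their own treatment group (hypothetical encoding).
    """
    if not had_chemo and cluster == "good":
        return "Good"            # Good = no-chemo good
    if had_chemo and cluster == "poor":
        return "Poor"            # Poor = chemo poor
    return "Intermediate"        # remaining: chemo good and no-chemo poor

# Cluster sizes as reported in the slides.
groups = (
    [(False, "good")] * 69 + [(False, "poor")] * 44
    + [(True, "good")] * 67 + [(True, "poor")] * 73
)
counts = Counter(final_class(chemo, label) for chemo, label in groups)
print(dict(counts))
```

The Intermediate class is defined by exclusion, so it mixes treated and untreated patients; that mix is what allows a chemo versus no-chemo survival comparison inside one class.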
Support Vector Machines Used in this Work
  • Feature selection: SVM with 1-norm approach, separating patients with Lymph node > 0 from patients with Lymph node = 0
  • 6 out of 31 features selected by SVM:
    • 5 out of 30 cytological features that describe nuclear size, shape and texture
    • Tumor size from surgery
  • Classification: Use SSVMs (smooth support vector machines) with Gaussian kernel
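As a hedged reference, the 1-norm linear SVM is commonly written as the linear program below. The notation is an assumption rather than the slide's own: each row of $A \in \mathbb{R}^{m \times 31}$ is one patient, $D$ is a diagonal matrix with $D_{ii} = +1$ for Lymph node > 0 and $D_{ii} = -1$ for Lymph node = 0, $e$ is a vector of ones, $y$ is the slack vector, and $\nu > 0$ weights the classification error.

```latex
\min_{w,\,\gamma,\,y} \;\; \nu\, e^{\top} y + \|w\|_{1}
\qquad \text{s.t.} \qquad
D\,(A w - e\gamma) + y \;\ge\; e, \qquad y \;\ge\; 0
```

The 1-norm penalty tends to drive many components of $w$ exactly to zero, so the features with nonzero $w_j$ can be read off as the selected ones.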
Clustering in Data Mining

General Objective

  • Given: A dataset of m points in n-dimensional real space
  • Problem: Extract hidden distinct properties by clustering the dataset
Concave Minimization Formulation of Clustering Problem

  • Given: Set $\mathcal{A}$ of $m$ points in $\mathbb{R}^n$ represented by the matrix $A \in \mathbb{R}^{m \times n}$, and a number $k$ of desired clusters
  • Problem: Determine centers $C_\ell \in \mathbb{R}^n$, $\ell = 1, \dots, k$, such that the sum of the minima over $\ell \in \{1, \dots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \dots, m$, and cluster centers $C_\ell$, $\ell = 1, \dots, k$, is minimized
  • Objective Function: Sum of $m$ minima of $k$ linear functions, hence it is piecewise-linear concave
  • Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
Clustering via Concave Minimization

Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:

$$\min_{C_1, \dots, C_k} \; \sum_{i=1}^{m} \; \min_{\ell = 1, \dots, k} \; \|A_i - C_\ell\|_1$$

  • Bilinear reformulation:
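One standard way to write the bilinear reformulation named on this slide is sketched below. The symbols are assumptions, not recovered from the slide: $A_i$ is the $i$-th data point, $C_\ell$ the $\ell$-th center, $D_{i\ell} \in \mathbb{R}^n$ bounds the componentwise absolute difference between them, $e$ is a vector of ones, and $T_{i\ell}$ is the weight with which point $i$ is assigned to cluster $\ell$.

```latex
\begin{aligned}
\min_{C,\,D,\,T} \quad & \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\; e^{\top} D_{i\ell} \\
\text{s.t.} \quad & -D_{i\ell} \;\le\; A_i - C_\ell \;\le\; D_{i\ell}, \\
& \sum_{\ell=1}^{k} T_{i\ell} = 1, \qquad T_{i\ell} \ge 0, \qquad
  i = 1, \dots, m, \;\; \ell = 1, \dots, k
\end{aligned}
```

With $T$ fixed the problem is linear in $(C, D)$, and with $(C, D)$ fixed it is linear in $T$. Alternating between the two is what the cluster assignment and center update steps of a k-Median iteration do.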
Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)

Step 0 (Initialization): Given k initial cluster centers

Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm

Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)

Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1

  • Different initial centers will lead to different clusters
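The steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the names `l1_distance` and `k_median`, the `max_iter` safeguard, and the rule of keeping an old center when a cluster becomes empty are all assumptions. The center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the points of a cluster.

```python
from statistics import median

def l1_distance(p, q):
    """1-norm distance between two points given as sequences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def k_median(points, initial_centers, max_iter=100):
    centers = [tuple(c) for c in initial_centers]      # Step 0
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in 1-norm.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: l1_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Step 2: move each center to the coordinate-wise median of its
        # cluster (assumption: an empty cluster keeps its old center).
        new_centers = [
            tuple(median(column) for column in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        # Step 3: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Made-up 2-D data with two obvious groups.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, clusters = k_median(points, [(1, 1), (9, 9)])
print(centers)
```

Because the result depends on the starting centers, the choice of initial centers is part of the method rather than an implementation detail.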
Clustering Process: Feature Selection & Initial Cluster Centers
  • 6 out of 31 features selected by a linear SVM
  • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
  • Perform k-Median algorithm in 6-dimensional feature space
  • Initial cluster centers used: Medians of Good1 & Poor1
  • Good1: Patients with Lymph = 0 AND Tumor < 2
  • Poor1: Patients with Lymph > 4 OR Tumor >= 4
    • Typical indicator for chemotherapy
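Computing the two initial centers can be sketched as follows, using the thresholds given in the slides (Lymph = 0 AND Tumor < 2 for Good1; Lymph >= 5 OR Tumor >= 4 for Poor1). The record layout `(lymph, tumor, features)`, the function name `initial_centers`, and the toy numbers are hypothetical. Real feature vectors would have 6 components; 2 are used here to keep the example short.

```python
from statistics import median

def initial_centers(patients):
    """Return (Good1 median, Poor1 median) as feature tuples.

    `patients` is a list of (lymph, tumor, features) records, where
    `features` is the selected feature vector (hypothetical layout).
    """
    good1 = [f for lymph, tumor, f in patients if lymph == 0 and tumor < 2]
    poor1 = [f for lymph, tumor, f in patients if lymph >= 5 or tumor >= 4]
    if not good1 or not poor1:
        raise ValueError("Good1 and Poor1 must both be non-empty")

    def coordinate_median(rows):
        # Median taken separately for each feature.
        return tuple(median(column) for column in zip(*rows))

    return coordinate_median(good1), coordinate_median(poor1)

# Made-up records for illustration only.
patients = [
    (0, 1.0, (1.0, 2.0)),
    (0, 1.5, (3.0, 4.0)),
    (0, 0.5, (2.0, 6.0)),
    (6, 2.0, (10.0, 20.0)),
    (1, 4.5, (12.0, 22.0)),
    (2, 3.0, (50.0, 50.0)),   # in neither Good1 nor Poor1
]
good_center, poor_center = initial_centers(patients)
print(good_center, poor_center)
```

Patients who meet neither rule do not influence the starting centers, but they are still clustered afterwards.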
Clustering Process

  • 253 Patients (113 NoChemo, 140 Chemo)
  • Compute Initial Cluster Centers
    • Good1 (Lymph=0 AND Tumor<2): Compute Median Using 6 Features
    • Poor1 (Lymph>=5 OR Tumor>=4): Compute Median Using 6 Features
  • Cluster 113 NoChemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1
    • 69 NoChemo Good
    • 44 NoChemo Poor
  • Cluster 140 Chemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1
    • 67 Chemo Good
    • 73 Chemo Poor

Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness

Four groups from the clustering result:

[Figure: classifier diagram. Surviving labels: "Good & ChemoGood", "NoChemoPoor & Poor", and "LI(x) & CI(x)" (appearing twice).]
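Tenfold test correctness is the accuracy measured on held-out data, averaged over ten disjoint folds. The sketch below shows that bookkeeping only. The function names, the fixed shuffle seed, and the toy threshold classifier (a stand-in for the Gaussian-kernel SVM, which is not reproduced here) are assumptions, and the data are made up.

```python
import random

def tenfold_correctness(samples, labels, train, folds=10, seed=0):
    """Average held-out accuracy over `folds` disjoint test folds.

    `train(xs, ys)` must return a function mapping a sample to a label.
    Assumes there are at least `folds` samples.
    """
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for f in range(folds):
        test_idx = order[f::folds]                 # every folds-th sample
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        predict = train([samples[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        hits = sum(predict(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / folds

def train_threshold(xs, ys):
    """Toy 1-D classifier: cut halfway between the two class means."""
    def mean(values):
        return sum(values) / len(values)
    low = mean([x for x, y in zip(xs, ys) if y == 0])
    high = mean([x for x, y in zip(xs, ys) if y == 1])
    cut = (low + high) / 2
    return lambda x: int(x > cut)

# Made-up, well-separated data.
xs = [float(i) for i in range(20)] + [float(i) for i in range(100, 120)]
ys = [0] * 20 + [1] * 20
score = tenfold_correctness(xs, ys, train_threshold)
print(score)
```

Each sample is tested exactly once, by a model that never saw it during training, which is why the figure is reported as test correctness rather than training correctness.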

  • Used five cytological features & tumor size to cluster breast cancer patients into 3 groups:
    • Good: No chemotherapy recommended
    • Intermediate: Chemotherapy likely to prolong survival
    • Poor: Chemotherapy may or may not enhance survival
  • 3 groups have very distinct survival curves
  • First categorization of a breast cancer group for which chemotherapy enhances longevity
  • SVM-based procedure assigns new patients into one of above three survival groups
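The slides do not say how the survival curves were estimated. One common choice is the Kaplan-Meier estimator, sketched here as an assumption with made-up follow-up data. The function name `kaplan_meier` and the event encoding (1 = death observed, 0 = censored) are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time where at least one death occurred.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving          # deaths and censored both leave
    return curve

# Made-up follow-up data for one group.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Running the estimator once per group gives one curve per group, and the curves can then be compared directly.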