Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Survival-Time Classification of Breast Cancer Patients DIMACS Workshop on Data Mining and Scalable Algorithms August 22-24, 2001- Rutgers University. Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg. Data Mining Institute University of Wisconsin - Madison. Second Annual Review June 1, 2001.


Survival-Time Classification of Breast Cancer Patients

DIMACS Workshop on Data Mining and Scalable Algorithms

August 22-24, 2001, Rutgers University

Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Data Mining Institute

University of Wisconsin - Madison

Second Annual Review

June 1, 2001

American Cancer Society 2001 Breast Cancer Estimates
  • Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer)
  • 192,200 new cases of breast cancer in women will be diagnosed in the United States
  • 40,600 deaths will occur from breast cancer (40,200 among women, 400 among men) in the United States
  • According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide
Key Objective
  • Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time
  • Main Difficulty: Cannot carry out comparative tests on human subjects
    • Similar patients must be treated similarly
  • Our Approach: Classify patients into Good, Intermediate & Poor groups
    • Classification criteria: Tumor size & lymph node status
    • Classification based on: 5 cytological features plus tumor size
Principal Results for 253 Breast Cancer Patients
  • All 69 patients in the Good group:
    • Had the best survival rate
    • Had no chemotherapy
  • All 73 patients in the Poor group:
    • Had the worst survival rate
    • Had chemotherapy
  • For the 111 patients in the Intermediate group:
    • The 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not have chemotherapy
  • This last result reverses the role of chemotherapy seen in both the overall population and the Good & Poor groups
Outline
  • Tools used
    • Support vector machines (SVMs).
      • Feature selection
      • Classification
    • Clustering
      • k-Median (k-Mean fails!)
  • Cluster chemo patients into chemo-good & chemo-poor
  • Cluster no-chemo patients into no-chemo-good & no-chemo-poor
  • Three final classes
    • Good = No-chemo good
    • Poor = Chemo poor
    • Intermediate = Remaining patients
  • Generate survival curves for three classes
  • Use SVM to classify new patients into one of above three classes
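
The slides do not say how the survival curves in the outline are computed. A minimal sketch, assuming the Kaplan-Meier product-limit estimator (an assumption, not named in the slides) and hypothetical `(time, died)` follow-up records:

```python
def kaplan_meier(records):
    """Return [(time, survival_probability)] for (time, died) records.

    `died` is True for an observed death and False for a censored patient.
    Assumes the Kaplan-Meier estimator; the slides do not name the method.
    """
    survival = 1.0
    curve = []
    at_risk = len(records)
    # Walk through the distinct follow-up times in increasing order.
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, died in records if time == t and died)
        leaving = sum(1 for time, _ in records if time == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data in months, not taken from the study.
toy_records = [(12, True), (20, False), (30, True), (45, False)]
toy_curve = kaplan_meier(toy_records)
```

Running this once per class (Good, Intermediate, Poor) would give one curve per class for comparison.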
Support Vector Machines Used in this Work
  • Feature selection: SVM with 1-norm approach,
    $\min_{w, \gamma, y} \; \nu e'y + \|w\|_1$
    s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
    where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0
  • 6 out of 31 features selected:
    • 5 out of 30 cytological features that describe nuclear size, shape and texture
    • Tumor size
  • Classification: Use SSVMs with Gaussian kernel
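
To illustrate why the 1-norm approach selects features, here is a minimal sketch. It assumes the standard 1-norm SVM objective, nu * e'y + ||w||_1 with constraints D(Aw - e*gamma) + y >= e and y >= 0, and it only evaluates that objective for a given `w` rather than solving the linear program. The function names and toy data are hypothetical.

```python
def one_norm_svm_objective(A, d, w, gamma, nu):
    """Evaluate nu * e'y + ||w||_1 using the smallest feasible slack y.

    A: list of feature rows, d: labels in {+1, -1} (the diagonal of D),
    w: weight vector, gamma: threshold, nu: slack weight.
    """
    slack_sum = 0.0
    for row, label in zip(A, d):
        margin = label * (sum(a * wj for a, wj in zip(row, w)) - gamma)
        # Smallest y_i >= 0 satisfying d_i (A_i w - gamma) + y_i >= 1.
        slack_sum += max(0.0, 1.0 - margin)
    return nu * slack_sum + sum(abs(wj) for wj in w)


def selected_features(w, tol=1e-8):
    """Indices of features with a nonzero weight, i.e. the features kept."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
A = [[2.0, 0.3], [3.0, -0.1], [-2.0, 0.2], [-3.0, -0.4]]
d = [1, 1, -1, -1]
w = [0.5, 0.0]
objective = one_norm_svm_objective(A, d, w, gamma=0.0, nu=1.0)
kept = selected_features(w)
```

Because the 1-norm penalizes every nonzero weight linearly, minimizers tend to set many components of `w` exactly to zero, and the surviving indices are the selected features.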
Clustering in Data Mining

General Objective

  • Given: A dataset of m points in n-dimensional real space
  • Problem: Extract hidden distinct properties by clustering the dataset
Concave Minimization Formulation of Clustering Problem
  • Given: Set $\mathcal{A}$ of $m$ points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters
  • Problem: Determine centers $C_\ell$, $\ell = 1, \ldots, k$, in $R^n$ such that the sum of the minima over $\ell \in \{1, \ldots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \ldots, m$, and cluster centers $C_\ell$, $\ell = 1, \ldots, k$, is minimized
  • Objective: Sum of $m$ minima of $k$ linear functions, hence it is piecewise-linear concave
  • Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
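Clustering via Concave Minimization
  • Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:
    $\min_{C_\ell, D_{i\ell}} \sum_{i=1}^{m} \min_{\ell = 1, \ldots, k} \{ e' D_{i\ell} \}$
    s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $i = 1, \ldots, m$, $\ell = 1, \ldots, k$
  • Reformulation:
    $\min_{C_\ell, D_{i\ell}, T_{i\ell}} \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell} \, e' D_{i\ell}$
    s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $\sum_{\ell=1}^{k} T_{i\ell} = 1$, $T_{i\ell} \ge 0$, $i = 1, \ldots, m$, $\ell = 1, \ldots, k$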
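
A minimal sketch of the idea behind the reformulation. It assumes the inner minimum is replaced by nonnegative weights `T[i][l]` that sum to 1 for each point, and that the distance bounds are tight (the bound for point i and center l equals their 1-norm distance). Under these assumptions a one-hot `T` on the nearest center recovers the original objective, and any other feasible `T` can only do worse. All names and data are hypothetical.

```python
def one_norm_distance(point, center):
    """1-norm distance between a point and a center."""
    return sum(abs(p - c) for p, c in zip(point, center))


def bilinear_objective(points, centers, T):
    """Sum_i sum_l T[i][l] * ||A_i - C_l||_1 for assignment weights T."""
    return sum(
        T[i][l] * one_norm_distance(p, c)
        for i, p in enumerate(points)
        for l, c in enumerate(centers)
    )


def nearest_assignment(points, centers):
    """One-hot weights that put all of each point's weight on its nearest center."""
    T = []
    for p in points:
        dists = [one_norm_distance(p, c) for c in centers]
        best = dists.index(min(dists))
        T.append([1.0 if l == best else 0.0 for l in range(len(centers))])
    return T


# Toy data (hypothetical).
points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]
tight = bilinear_objective(points, centers, nearest_assignment(points, centers))
relaxed = bilinear_objective(points, centers, [[0.5, 0.5]] * len(points))
```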
Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)

Step 0 (Initialization): Given k initial cluster centers

Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm

Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)

Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1

  • Different initial centers will lead to different clusters
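
The four steps can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code: the componentwise median serves as the 1-norm cluster median, an empty cluster keeps its previous center, and `max_iter` is an added safety cap.

```python
from statistics import median


def one_norm_distance(point, center):
    """1-norm distance between a point and a center."""
    return sum(abs(p - c) for p, c in zip(point, center))


def k_median(points, initial_centers, max_iter=100):
    """Run k-Median from the given initial centers (Step 0)."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda l: one_norm_distance(p, centers[l]))
            for p in points
        ]
        # Step 2: move each center to the componentwise median of its cluster.
        new_centers = []
        for l, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        # Step 3: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy data (hypothetical): two groups, started from rough guesses.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
          [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
final_centers, final_labels = k_median(points, [[1.0, 1.0], [9.0, 9.0]])
```

The componentwise median minimizes the sum of 1-norm distances to the cluster's points, which is why it plays the role that the mean plays in k-Means.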
Clustering Process: Feature Selection & Initial Cluster Centers
  • 6 out of 31 features selected by a linear SVM
    • SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
  • Perform k-Median algorithm in 6-dimensional feature space
  • Initial cluster centers used: Medians of Good1 & Poor1
    • Good1: Patients with Lymph = 0 AND Tumor < 2
    • Poor1: Patients with Lymph > 4 OR Tumor >= 4
      • Typical indicator for chemotherapy
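
A minimal sketch of how the two initial centers could be computed. The record layout (`lymph`, `tumor`, `features`) is hypothetical, the toy data uses 2 features instead of 6 for brevity, and the Poor1 rule uses the thresholds Lymph >= 5 OR Tumor >= 4 given on the flowchart slide.

```python
from statistics import median


def componentwise_median(rows):
    """Median of each feature column across the given rows."""
    return [median(col) for col in zip(*rows)]


def initial_centers(patients):
    """Return (Good1 median, Poor1 median) in the selected feature space."""
    good1 = [p["features"] for p in patients
             if p["lymph"] == 0 and p["tumor"] < 2]
    poor1 = [p["features"] for p in patients
             if p["lymph"] >= 5 or p["tumor"] >= 4]
    return componentwise_median(good1), componentwise_median(poor1)


# Hypothetical patients, not taken from the study.
patients = [
    {"lymph": 0, "tumor": 1.0, "features": [1.0, 2.0]},
    {"lymph": 0, "tumor": 1.5, "features": [3.0, 4.0]},
    {"lymph": 0, "tumor": 0.5, "features": [2.0, 9.0]},
    {"lymph": 6, "tumor": 2.0, "features": [8.0, 7.0]},
    {"lymph": 1, "tumor": 5.0, "features": [6.0, 9.0]},
    {"lymph": 2, "tumor": 3.0, "features": [5.0, 5.0]},  # in neither group
    {"lymph": 7, "tumor": 4.5, "features": [7.0, 8.0]},
]
good_center, poor_center = initial_centers(patients)
```

Patients that satisfy neither rule do not influence the starting centers, but they are still clustered afterwards.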
Clustering Process

253 Patients (113 NoChemo, 140 Chemo)

  • Compute initial cluster centers:
    • Good1: Lymph = 0 AND Tumor < 2; compute median using 6 features
    • Poor1: Lymph >= 5 OR Tumor >= 4; compute median using 6 features
  • Cluster 113 NoChemo patients: use k-Median algorithm with initial centers Medians of Good1 & Poor1
    • 69 NoChemo Good
    • 44 NoChemo Poor
  • Cluster 140 Chemo patients: use k-Median algorithm with initial centers Medians of Good1 & Poor1
    • 67 Chemo Good
    • 73 Chemo Poor
  • Final classes:
    • Good: 69 NoChemo Good
    • Intermediate: 44 NoChemo Poor and 67 Chemo Good
    • Poor: 73 Chemo Poor
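
The merge of the four clusters into three final classes, with the counts reported on the slides, can be checked with a short sketch. The dictionary keys are hypothetical labels.

```python
from collections import Counter

# Mapping from (treatment group, cluster) to final class, as in the outline:
# Good = NoChemo Good, Poor = Chemo Poor, Intermediate = the remaining two.
FINAL_CLASS = {
    ("nochemo", "good"): "Good",
    ("nochemo", "poor"): "Intermediate",
    ("chemo", "good"): "Intermediate",
    ("chemo", "poor"): "Poor",
}

# Cluster sizes reported on the slides.
cluster_sizes = {
    ("nochemo", "good"): 69,
    ("nochemo", "poor"): 44,
    ("chemo", "good"): 67,
    ("chemo", "poor"): 73,
}

class_sizes = Counter()
for cluster, size in cluster_sizes.items():
    class_sizes[FINAL_CLASS[cluster]] += size

total_patients = sum(class_sizes.values())
```

The Intermediate class therefore holds 44 + 67 = 111 patients, and the three classes together account for all 253.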

Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness

Four groups from the clustering result: Good, Intermediate (ChemoGood), Intermediate (NoChemoPoor), Poor

  • SVM separates Good2 (Good & ChemoGood) from Poor2 (NoChemoPoor & Poor)
  • Good2 branch: Compute LI(x) & CI(x), then SVM assigns Good or Intermediate
  • Poor2 branch: Compute LI(x) & CI(x), then SVM assigns Poor or Intermediate
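
The two-stage decision in the flowchart can be sketched as a small cascade. Everything here is a hypothetical stand-in: the slides do not define LI(x) and CI(x) or say how they enter the second-stage SVMs, so this sketch simply assumes they are appended as extra inputs, and the three classifiers are plain callables returning a signed score.

```python
def classify_patient(x, top_svm, good2_svm, poor2_svm, li, ci):
    """Assign Good / Intermediate / Poor with a two-stage cascade.

    top_svm separates Good2 (score > 0) from Poor2; a second classifier then
    refines each side. All five callables are hypothetical placeholders.
    """
    # Assumption: LI(x) and CI(x) are appended to the feature vector.
    augmented = list(x) + [li(x), ci(x)]
    if top_svm(x) > 0:
        # Good2 side: Good versus Intermediate (ChemoGood).
        return "Good" if good2_svm(augmented) > 0 else "Intermediate"
    # Poor2 side: Poor versus Intermediate (NoChemoPoor).
    return "Poor" if poor2_svm(augmented) > 0 else "Intermediate"


# Toy one-feature stand-ins for trained classifiers (hypothetical thresholds).
top = lambda x: 1.0 if x[0] < 3.0 else -1.0
good2 = lambda z: 1.0 if z[0] < 1.5 else -1.0
poor2 = lambda z: 1.0 if z[0] > 4.5 else -1.0
li = lambda x: 0.0
ci = lambda x: 0.0

labels = [classify_patient([t], top, good2, poor2, li, ci)
          for t in (1.0, 2.0, 4.0, 5.0)]
```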
Conclusion
  • Used five features from a fine needle aspirate & tumor size to cluster breast cancer patients into 3 groups:
    • Good – No chemotherapy recommended
    • Intermediate – Chemotherapy likely to prolong survival
    • Poor – Chemotherapy may or may not enhance survival
  • 3 groups have very distinct survival curves
  • First categorization of a breast cancer group for which chemotherapy enhances longevity
  • Prescribe an SVM classification procedure to classify new patients into one of above three groups