Breast Cancer Diagnosis via Linear Hyper-plane Classifier
Presented by Joseph Maalouf
December 14, 2001
Breast cancer is second only to lung cancer as a tumor-related cause of death in women.
Although there is reasonable agreement on the criteria for benign/malignant diagnoses using fine needle aspirate (FNA) and mammogram data (benign means that the lump or other abnormality is not cancer; malignant means that the tissue contains cancer cells), applying these criteria is often quite subjective and time-consuming for the physician.
The idea of the project is to come up with a discriminant function (a separating plane in this case) to determine if an unknown sample is benign or malignant.
The project will use the Wisconsin Diagnostic Breast Cancer (WDBC) database, made publicly available by Dr. William H. Wolberg of the Department of Surgery at the University of Wisconsin Medical School.
There are 569 samples (357 benign, 212 malignant) with 32 attributes (patient ID, diagnosis type, 30 real-valued input features).
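As a rough illustration of the record layout just described, the sketch below splits one record into its three parts. It assumes a comma-separated text file with the diagnosis coded as "M" or "B"; the function name parse_wdbc_record and the example record are hypothetical, and the record is made up rather than taken from the database.

```python
from typing import List, Tuple


def parse_wdbc_record(line: str) -> Tuple[str, str, List[float]]:
    """Split one comma-separated record into (patient_id, diagnosis, features).

    Assumed layout: patient ID, diagnosis ("M" or "B"), then 30 real values.
    """
    fields = line.strip().split(",")
    if len(fields) != 32:
        raise ValueError(f"expected 32 attributes, got {len(fields)}")
    patient_id, diagnosis = fields[0], fields[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unexpected diagnosis label: {diagnosis!r}")
    features = [float(value) for value in fields[2:]]
    return patient_id, diagnosis, features


# Made-up record for illustration only (not taken from the database).
example = "0001,B," + ",".join(["0.5"] * 30)
pid, label, feats = parse_wdbc_record(example)
```

Rejecting records that do not have exactly 32 attributes keeps malformed lines from silently shifting feature columns.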
The most effective pair of attributes for determining a correct diagnosis will be identified and used to plot all the testing-set points in a two-dimensional figure.
Problem Description
Formally, given two sets $B$ and $M$ in the 30-dimensional real space $\mathbb{R}^{30}$, we wish to construct a discriminant function $f$ from $\mathbb{R}^{30}$ into $\mathbb{R}$ such that:
$f(x) > 0 \Rightarrow x \in M$,
$f(x) \le 0 \Rightarrow x \in B$.
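The decision rule can be sketched in a few lines. The helper name diagnose and the two-dimensional example function below are hypothetical and serve only to show how the sign of f maps to a label; they are not the classifier trained in this project.

```python
from typing import Callable, Sequence


def diagnose(f: Callable[[Sequence[float]], float], x: Sequence[float]) -> str:
    """Return "M" (malignant) when f(x) > 0 and "B" (benign) otherwise."""
    return "M" if f(x) > 0 else "B"


# Hypothetical discriminant in two dimensions, for illustration only.
def toy_f(x: Sequence[float]) -> float:
    return x[0] + x[1] - 1.0


label_a = diagnose(toy_f, [2.0, 0.5])  # f = 1.5, positive side
label_b = diagnose(toy_f, [0.2, 0.3])  # f = -0.5, non-positive side
label_c = diagnose(toy_f, [0.5, 0.5])  # f = 0.0, boundary counts as benign
```

Points lying exactly on the plane are assigned to B, matching the non-strict inequality in the second condition.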
Two approaches will be used to find the optimal hyper-plane:
Linear Optimization Problem Formulation:
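The discriminant function $f$ is given by $f(x) = w'x - \gamma$, which determines a plane $w'x = \gamma$ that separates, to the extent possible, malignant points from benign ones in $\mathbb{R}^{30}$.
It remains to show how to determine $w \in \mathbb{R}^{30}$ and $\gamma \in \mathbb{R}$ from the training data. If we let the set of $m$ points, $M$, be represented by a matrix $M \in \mathbb{R}^{m \times n}$ and the set of $k$ points, $B$, be represented by a matrix $B \in \mathbb{R}^{k \times n}$, then the problem becomes one of choosing $w$ and $\gamma$ to:
$\min_{w,\gamma} \; \frac{1}{m}\,\|(-Mw + e\gamma + e)_+\|_1 + \frac{1}{k}\,\|(Bw - e\gamma + e)_+\|_1$
Here $e$ is a vector of ones of the appropriate dimension and $(z)_+$ denotes the componentwise positive part, $\max(z, 0)$.
Solution Methods
This can be implemented using the lp function of the MATLAB Optimization Toolbox.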
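To make the linear optimization objective concrete, the following sketch evaluates it for a given candidate plane. It is written in plain Python rather than MATLAB, the function name rlp_objective is hypothetical, and the one-dimensional data are made up for illustration. It only evaluates the objective; it does not solve the minimization.

```python
from typing import List, Sequence


def rlp_objective(M: List[Sequence[float]], B: List[Sequence[float]],
                  w: Sequence[float], gamma: float) -> float:
    """Average violation of w'x >= gamma + 1 on M plus w'x <= gamma - 1 on B."""
    def dot(row: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(row, w))

    # (1/m) * || (-M w + e*gamma + e)_+ ||_1
    malignant_term = sum(max(-dot(row) + gamma + 1.0, 0.0) for row in M) / len(M)
    # (1/k) * || (B w - e*gamma + e)_+ ||_1
    benign_term = sum(max(dot(row) - gamma + 1.0, 0.0) for row in B) / len(B)
    return malignant_term + benign_term


# Made-up one-dimensional points, for illustration only.
M_pts = [[3.0], [2.0]]
B_pts = [[0.0], [1.5]]
value = rlp_objective(M_pts, B_pts, w=[2.0], gamma=3.0)  # one benign point violates
```

The objective is zero exactly when every malignant point satisfies w'x >= gamma + 1 and every benign point satisfies w'x <= gamma - 1. Introducing one nonnegative slack variable per training point turns the minimization into a standard linear program that an LP solver can handle.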
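Quadratic Optimization Problem Formulation:
The function $f$ is given by $f(x) = w'x + b$.
For $x_i$ a support vector:
For $x_i \in M$, $d_i = +1$ and $f(x_i) = w_o'x_i + b_o = +1$.
For $x_i \in B$, $d_i = -1$ and $f(x_i) = w_o'x_i + b_o = -1$.
The objective is to find the optimal weight vector $w_o$ and bias $b_o$ such that $\Phi(w_o) = \frac{1}{2}\, w_o'w_o$ is minimized subject to the constraints:
$d_i\,(w_o'x_i + b_o) \ge 1$, where $1 \le i \le N$ ($N$ is the number of training points; equality holds for the support vectors).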
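The constraints and the cost can be checked numerically for any candidate plane. The sketch below is a plain Python illustration with hypothetical helper names and made-up two-dimensional points; it verifies feasibility and evaluates the cost, and it does not solve the quadratic program. The margin width 2/||w|| is the standard geometric interpretation of minimizing w'w and is included as an aid, not as a result from this project.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def satisfies_margin(X: List[Sequence[float]], d: Sequence[int],
                     w: Sequence[float], b: float, tol: float = 1e-9) -> bool:
    """Check d_i * (w'x_i + b) >= 1 for every training point."""
    return all(di * (dot(w, xi) + b) >= 1.0 - tol for xi, di in zip(X, d))


def cost(w: Sequence[float]) -> float:
    """Quadratic cost (w'w) / 2 that the optimization minimizes."""
    return 0.5 * dot(w, w)


def margin_width(w: Sequence[float]) -> float:
    """Distance between the planes w'x + b = +1 and w'x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


# Made-up points: one malignant (d = +1) and one benign (d = -1).
X = [[2.0, 0.0], [0.0, 0.0]]
d = [1, -1]
feasible = satisfies_margin(X, d, w=[1.0, 0.0], b=-1.0)
too_small = satisfies_margin(X, d, w=[0.5, 0.0], b=-0.5)
```

In this toy case both points meet the constraint with equality for w = (1, 0), b = -1, so both act as support vectors, while shrinking w further violates the constraints.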
To implement the quadratic optimization solution, I’ll use the OSU support
The linear optimization method produced a correct diagnosis with a success rate of 97.2%.
The following plot shows one of the most effective pairs of attributes for determining a correct diagnosis.