A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns

M.A.Sc. Candidate:

Qianren (Tim) Xu

Supervisors:

Dr. M. Kamel

Dr. M. M. A. Salama

[Title slide graphic: Neural Networks · STFS · ROC analysis]

Highlight

Proteomic Pattern Analysis for Prostate Cancer Detection

Significance Test-Based Feature Selection (STFS):

  • STFS can be applied to any supervised pattern recognition problem
  • Very good performance has been obtained on several benchmark datasets, especially those with a large number of features
  • Sensitivity 97.1%, specificity 96.8% on prostate cancer detection
  • Evidence that some samples may be mislabelled by prostatic biopsy

Outline of Part I

Significance Test-Based Feature Selection (STFS) for Supervised Pattern Recognition

  • Introduction
  • Methodology
  • Experiment Results on Benchmark Datasets
  • Comparison with MIFS
Introduction

Problems with Features

  • Large number of features
  • Irrelevant features
  • Noise
  • Correlation among features

These problems increase computational complexity and reduce the recognition rate.

Mutual Information Feature Selection
  • One of the most important heuristic feature selection methods; it can be useful in any classification system.
  • But estimation of the mutual information is difficult with:
    • A large number of features and a large number of classes
    • Continuous data
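
To make the difficulty concrete, here is a minimal histogram-based estimate of the mutual information between one continuous feature and a class label. This is a generic sketch, not the thesis's estimator; the result depends strongly on the bin count, which is exactly the estimation problem this slide points at.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(X; Y) between a continuous feature x
    and a discrete class label y (both 1-D arrays of equal length)."""
    joint, _, _ = np.histogram2d(x, y, bins=(bins, len(np.unique(y))))
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```
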
Problems on Feature Selection Methods

Two key issues:

  • Computational complexity
  • Deficiency in optimality
Proposed Method

Criterion of Feature Selection

    Significance of feature = Significant difference × Independence

Significant difference: pattern separability on individual candidate features.
Independence: noncorrelation between a candidate feature and the already-selected features.

Measurement of Pattern Separability of Individual Features

The statistical significant difference is measured with the test matched to the data type:

  Data type                               Two classes          More than two classes
  Continuous, normal distribution         t-test               ANOVA
  Continuous non-normal, or rank data     Mann-Whitney test    Kruskal-Wallis test
  Categorical                             Chi-square test      Chi-square test
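
A sketch of this dispatch using scipy.stats. The slides do not say how the p-value is converted into the sd score used later by MSDI, so returning the raw p-value (smaller = more significant) is an assumption:

```python
import numpy as np
from scipy import stats

def significant_difference(groups, dtype="normal"):
    """p-value of the separability test matched to the data type.
    `groups` is a list of 1-D arrays, one per class; a smaller
    p-value indicates a more significant difference."""
    if dtype == "normal":      # continuous, normally distributed
        test = stats.ttest_ind if len(groups) == 2 else stats.f_oneway
        return test(*groups).pvalue
    if dtype == "rank":        # continuous non-normal, or rank data
        test = stats.mannwhitneyu if len(groups) == 2 else stats.kruskal
        return test(*groups).pvalue
    # categorical: chi-square test on the class-by-category count table
    cats = np.unique(np.concatenate(groups))
    table = [[np.sum(g == c) for c in cats] for g in groups]
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return p
```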

Independence

Independence is measured with the correlation statistic matched to the data type:

  Data type                               Measure
  Continuous, normal distribution         Pearson correlation
  Continuous non-normal, or rank data     Spearman rank correlation
  Categorical                             Pearson contingency coefficient
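
A matching sketch for the independence measures (again scipy-based). Treating independence as 1 minus the absolute correlation is an assumption drawn from the MSDI criterion, not a formula stated on the slide:

```python
import numpy as np
from scipy import stats

def independence(a, b, dtype="normal"):
    """Independence level between two features a and b, in [0, 1],
    where 1 means uncorrelated."""
    if dtype == "normal":      # continuous, normally distributed
        r, _ = stats.pearsonr(a, b)
        return 1.0 - abs(r)
    if dtype == "rank":        # continuous non-normal, or rank data
        rho, _ = stats.spearmanr(a, b)
        return 1.0 - abs(rho)
    # categorical: Pearson contingency coefficient C = sqrt(chi2 / (chi2 + n))
    cats_a, cats_b = np.unique(a), np.unique(b)
    table = [[np.sum((a == ca) & (b == cb)) for cb in cats_b] for ca in cats_a]
    chi2, _, _, _ = stats.chi2_contingency(table)
    return 1.0 - float(np.sqrt(chi2 / (chi2 + len(a))))
```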

Selecting Procedure

MSDI: Maximum Significant Difference and Independence Algorithm

MIC: Monotonically Increasing Curve Strategy

Maximum Significant Difference and Independence (MSDI) Algorithm

  1. Compute the significant difference (sd) of every initial feature.
  2. Select the feature with maximum sd as the first feature.
  3. Compute the independence level (ind) between every candidate feature and the already-selected feature(s).
  4. Select the feature with maximum feature significance (sf = sd × ind) as the next feature.
  5. Repeat steps 3-4 until the desired number of features is selected.
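
A minimal sketch of MSDI for two-class continuous data, assuming sd is the absolute t-statistic and ind is 1 minus the maximum absolute Pearson correlation with the already-selected features; both are plausible readings of the slides rather than confirmed details of the thesis:

```python
import numpy as np
from scipy import stats

def msdi(X, y, n_select):
    """Greedy MSDI selection. X: (samples, features); y: 0/1 labels."""
    # Significant difference per feature: |t| from a two-sample t-test.
    sd = np.array([abs(stats.ttest_ind(X[y == 0, j], X[y == 1, j]).statistic)
                   for j in range(X.shape[1])])
    selected = [int(np.argmax(sd))]            # first feature: maximum sd
    candidates = set(range(X.shape[1])) - set(selected)
    while len(selected) < n_select:
        best, best_sf = None, -np.inf
        for j in candidates:
            # Independence: 1 - max |corr| with the already-selected features.
            ind = 1.0 - max(abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                            for s in selected)
            sf = sd[j] * ind                   # feature significance
            if sf > best_sf:
                best, best_sf = j, sf
        selected.append(best)
        candidates.remove(best)
    return selected
```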

Monotonically Increasing Curve (MIC) Strategy

[Figure: performance curve — rate of recognition (0.4-1) vs. number of features (0-30) for the feature subset selected by MSDI]

  1. Plot the performance curve of the feature subset selected by MSDI.
  2. Delete the features that make "no good" contribution to the increase of the recognition rate.
  3. Repeat until the curve is monotonically increasing.
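
A sketch of the MIC pruning loop; `evaluate` is a hypothetical helper returning the recognition rate of a classifier trained on a given ordered feature subset (the slide does not specify how the curve is produced):

```python
def mic(ranked_features, evaluate):
    """Prune a ranked feature list until the performance curve is
    monotonically increasing in the number of features kept."""
    features = list(ranked_features)
    while True:
        # Recognition rate after adding each feature in ranked order.
        curve = [evaluate(features[:k + 1]) for k in range(len(features))]
        # Features whose addition does not improve the rate are "no good".
        drops = [k for k in range(1, len(curve)) if curve[k] <= curve[k - 1]]
        if not drops:
            return features            # curve is monotonically increasing
        for k in reversed(drops):      # delete from the end to keep indices valid
            del features[k]
```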

Example I: Handwritten Digit Recognition
  • 32×32 bitmaps are divided into 8×8 = 64 blocks
  • The pixels in each block are counted
  • This yields an 8×8 matrix, i.e. 64 features
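
A sketch of this block-counting step in NumPy (each of the 64 blocks is 4 × 4 pixels, so each feature is a count from 0 to 16):

```python
import numpy as np

def bitmap_to_features(bitmap):
    """Reduce a 32x32 binary bitmap to 64 features by counting the
    set pixels in each of the 8x8 = 64 non-overlapping 4x4 blocks."""
    assert bitmap.shape == (32, 32)
    return bitmap.reshape(8, 4, 8, 4).sum(axis=(1, 3)).ravel()
```
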
Performance Curve

[Figure: rate of recognition (0.4-1) vs. number of features (0-60) for MSDI, MIFS (β = 0.2, 0.4, 0.6, 0.8, 1.0), and random ranking; Battiti's MIFS requires the parameter β to be determined]

MSDI: Maximum Significant Difference and Independence
MIFS: Mutual Information Feature Selector

Computational Complexity
  • Selecting 15 features from the 64-feature original set:
    • MSDI: 24 seconds
    • Battiti's MIFS: 1110 seconds (5 values of β searched in the range 0-1)

Example II: Handwritten Digit Recognition

The 649 features are distributed over the following six feature sets:

  • 76 Fourier coefficients of the character shapes,
  • 216 profile correlations,
  • 64 Karhunen-Loève coefficients,
  • 240 pixel averages in 2 × 3 windows,
  • 47 Zernike moments,
  • 6 morphological features.
Performance Curve

[Figure: rate of recognition (0.2-1) vs. number of features (0-50) comparing MSDI + MIC, MSDI alone, and random ranking]

MSDI: Maximum Significant Difference and Independence
MIC: Monotonically Increasing Curve

Comparison with MIFS

[Figure: rate of recognition (0.4-1) vs. number of features (0-50) for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5); MSDI is much better with a large number of features, while MIFS is better with a small number of features]

MSDI: Maximum Significant Difference and Independence
MIFS: Mutual Information Feature Selector

Summary on Comparing MSDI with MIFS
  • MSDI is much more computationally effective
    • MIFS needs to estimate the pdfs
    • Even the computationally effective criterion (Battiti's MIFS) still needs β to be determined
    • MSDI involves only simple statistical calculations
  • MSDI can select a more nearly optimal feature subset from a large number of features, because it is based on the relevant statistical models
  • MIFS is more suitable for small volumes of data and small feature subsets
Outline of Part II

Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer

  • Problem Statement
  • Methods
    • Feature
    • Classification
    • Optimization
  • Results and Discussion
Problem Statement

15,154 points (features) per mass spectrum

  • Very large number of features
  • Electronic and chemical noise
  • Biological variability of human disease
  • Little prior knowledge about the proteomic mass spectrum
The System of Proteomic Pattern Analysis

STFS: Significance Test-Based Feature Selection
PNN: Probabilistic Neural Network
RBFNN: Radial Basis Function Neural Network

Pipeline:

  1. Training dataset (initial features > 10^4)
  2. Most significant features selected by STFS
  3. RBFNN / PNN learning → trained neural classifier
  4. Optimization of the feature subset size and the classifier parameters by minimizing the ROC distance
  5. Mature classifier
Feature Selection: STFS

MSDI criterion:

    Significance of feature = Significant difference × Independence

Here the significant difference is measured by the Student t-test and the independence by the Pearson correlation; the feature ranking produced by MSDI is then pruned with the MIC strategy.

STFS: Significance Test-Based Feature Selection
MSDI: Maximum Significant Difference and Independence Algorithm
MIC: Monotonically Increasing Curve Strategy

Classification: PNN / RBFNN

  • PNN is a standard structure with four layers
  • RBFNN is a modified four-layer structure

[Figure: network structures — inputs x1 … xn, pattern-unit pools (Pool 1, Pool 2), summation units S1 and S2, outputs y(1) and y(2)]

PNN: Probabilistic Neural Network
RBFNN: Radial Basis Function Neural Network
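
For reference, a minimal PNN in its standard four-layer form (input units, Gaussian pattern units, one summation unit per class, and an output decision); this is the textbook structure, not necessarily the exact variant used in the thesis:

```python
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=1.0):
    """Classify sample x with a probabilistic neural network:
    average a Gaussian kernel over each class's training samples
    and return the class with the highest estimated density."""
    classes = np.unique(y_train)
    densities = []
    for c in classes:
        Xc = X_train[y_train == c]                  # pattern units of class c
        d2 = ((Xc - x) ** 2).sum(axis=1)            # squared distances to x
        densities.append(np.exp(-d2 / (2 * sigma**2)).mean())  # summation unit
    return classes[int(np.argmax(densities))]       # output decision
```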

Optimization: ROC Distance

[Figure: ROC plane — true positive rate (sensitivity) vs. false positive rate (1 − specificity), with d_ROC the distance (legs a and b) from the operating point to the ideal corner]

Minimizing the ROC distance to optimize:

  • the feature subset size m
  • the Gaussian spread σ
  • the RBFNN pattern decision weight λ

ROC: Receiver Operating Characteristic
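
A sketch of this objective, assuming d_ROC is the Euclidean distance from the operating point to the ideal ROC corner (false positive rate 0, true positive rate 1), which is the usual reading of the diagram's legs a and b; `evaluate` is a hypothetical helper returning (sensitivity, specificity) for one parameter setting:

```python
import numpy as np
from itertools import product

def roc_distance(sensitivity, specificity):
    """Distance from the operating point to the ideal ROC corner (0, 1)."""
    return float(np.hypot(1.0 - sensitivity, 1.0 - specificity))

def optimize(evaluate, m_grid, sigma_grid, lam_grid):
    """Grid search over the feature subset size m, Gaussian spread sigma,
    and RBFNN pattern decision weight lam, minimizing the ROC distance."""
    return min(product(m_grid, sigma_grid, lam_grid),
               key=lambda params: roc_distance(*evaluate(*params)))
```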

pattern distribution
Pattern recognizedby RBFNN

Non-Cancer

Cancer

70

60

True negative 96.8%

False negative 2.9%

50

60

40

30

50

Non-Cancer

20

40

10

30

Labelled byBiopsies

0

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

20

10

True positive 97.1%

False positive 3.2%

Cancer

0

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

Pattern Distribution

Cut-point

The Possible Causes of the Unrecognizable Samples
  • The algorithm of the classifier is not able to recognize all the samples
  • Proteomics is not able to provide enough information
  • Prostatic biopsy mistakenly labels the cancer status
Possibility of Mistaken Diagnosis of Prostatic Biopsy

  • Biopsy has limited sensitivity and specificity
  • The proteomic classifier's sensitivity and specificity, measured against the biopsy labels, are very high
  • Even so, the proteomic classifier's results are not exactly the same as the biopsy's
  • All unrecognizable samples are outliers

[Figure: the pattern distribution histograms with the regions true non-cancer, false non-cancer, false cancer, and true cancer marked around the cut-point]

Summary (1)

Significance Test-Based Feature Selection (STFS):

  • STFS selects features by maximum significant difference and independence (MSDI); it aims to find the minimum possible feature subset that achieves the maximum recognition rate
  • Feature significance (the selection criterion) is estimated from the statistical model that best matches the properties of the data
  • Advantages:
    • Computational effectiveness
    • Optimality
Summary (2)

Proteomic Pattern Analysis for Detection of Prostate Cancer

  • The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance
  • With sensitivity 97.1% and specificity 96.8%, it could be an asset for early and accurate detection of prostate cancer, sparing a large number of aging men unnecessary prostatic biopsies
  • The suggestion, from pattern analysis, that some prostatic biopsy labels are mistaken may lead to a novel direction in prostate cancer diagnostic research