
Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Haesun Park

Georgia Institute of Technology,

Atlanta, GA, USA

(joint work with C. Park)

KAIST, Korea, June 2007


Clustering

  • Clustering: grouping of data based on similarity measures


Classification

  • Classification: assign a class label to new, unseen data


Data Mining

  • Mining or discovery of new information - patterns or rules - from large databases

  • Data Preparation: preprocessing and data reduction (dimension reduction, feature selection, feature extraction)

  • Main tasks: classification, clustering, association analysis, regression, probabilistic modeling, …


Feature Extraction

  • Optimal feature extraction
    - Reduce the dimensionality of the data space (avoid the curse of dimensionality as the number of features grows)
    - Minimize the effects of redundant features and noise

  • After feature extraction, a classifier is applied to predict the class label of new data


Linear Dimension Reduction

Maximize class separability in the reduced-dimensional space


What if the data is not linearly separable?

Nonlinear Dimension Reduction



Contents

  • Linear Discriminant Analysis

  • Nonlinear Dimension Reduction based on Kernel Methods

    - Nonlinear Discriminant Analysis

  • Application to Fingerprint Classification


Linear Discriminant Analysis (LDA)

For a given data set {a1, …, an}

Centroids: the centroid c(i) of each class and the global centroid c of the full data set

  • Within-class scatter matrix Sw; trace(Sw) measures how tightly each class clusters around its own centroid


  • Between-class scatter matrix Sb; trace(Sb) measures how far apart the class centroids lie

A linear transformation GT maps a1, …, an to GTa1, …, GTan. LDA seeks G that maximizes trace(GTSbG) while minimizing trace(GTSwG).


Eigenvalue Problem

The columns of G are taken from the leading eigenvectors of Sw-1Sb:

Sw-1Sb x = λ x,   rank(Sb) ≤ number of classes - 1
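A minimal sketch of the criterion above in NumPy; the function names (scatter_matrices, lda_transform) and the data layout (rows of A are the points ai) are choices made for this illustration, not part of the talk.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within- and between-class scatter for data A (one row per point a_i)."""
    labels = np.asarray(labels)
    c = A.mean(axis=0)                                 # global centroid
    d = A.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for cls in np.unique(labels):
        Ai = A[labels == cls]
        ci = Ai.mean(axis=0)                           # class centroid
        Sw += (Ai - ci).T @ (Ai - ci)                  # spread within the class
        Sb += len(Ai) * np.outer(ci - c, ci - c)       # spread of centroids around c
    return Sw, Sb

def lda_transform(A, labels, dim):
    """Columns of G are the leading eigenvectors of Sw^{-1} Sb."""
    Sw, Sb = scatter_matrices(A, labels)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]               # largest eigenvalues first
    G = evecs[:, order[:dim]].real                     # at most (#classes - 1) useful columns
    return A @ G                                       # rows are the reduced points G^T a_i
```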


Face Recognition

Dimension reduction to maximize the distances among classes: each 92 x 112 face image is a 10304-dimensional vector, which GT maps to a much lower-dimensional representation.


Text Classification

  • A bag of words: each document is represented by the frequencies of the words it contains

  • Example: an Education class with terms such as faculty, student, syllabus, grade, tuition, … versus a Recreation class with terms such as movie, music, sport, Hollywood, theater, …; GT reduces these word-frequency vectors to a low-dimensional space
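A small illustration of the bag-of-words representation described above; the vocabulary and the toy document are invented for the example.

```python
from collections import Counter

vocabulary = ["faculty", "student", "syllabus", "grade", "movie", "music", "sport"]

def bag_of_words(document):
    """Represent a document by the frequency of each vocabulary word it contains."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

# Each document becomes one vector; stacking them gives the matrix that GT reduces.
print(bag_of_words("student grade syllabus grade faculty"))   # [1, 1, 1, 2, 0, 0, 0]
```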


Generalized LDA Algorithms

  • Undersampled problems: high dimensionality & small number of data points

  • Sw (and Sb) are singular, so Sw-1Sb cannot be computed


Nonlinear Dimension Reduction based on Kernel Methods


Nonlinear Dimension Reduction

A nonlinear mapping sends the data into a feature space, where a linear dimension reduction GT is applied.


Kernel Method

  • If a kernel function k(x,y) satisfies Mercer's condition, then there exists a mapping Φ for which <Φ(x), Φ(y)> = k(x,y) holds

  • A → Φ(A): inner products <x, y> are replaced by <Φ(x), Φ(y)> = k(x,y)

  • For a finite data set A = [a1, …, an], Mercer's condition can be rephrased as: the kernel matrix [k(ai, aj)], i, j = 1, …, n, is positive semi-definite
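A small sketch of the finite-sample statement of Mercer's condition above: build the kernel matrix and check that its eigenvalues are nonnegative. The helper names and the use of the plain inner product as the example kernel are assumptions made for illustration.

```python
import numpy as np

def kernel_matrix(A, k):
    """Kernel (Gram) matrix K[i, j] = k(a_i, a_j) for the rows of A."""
    n = A.shape[0]
    return np.array([[k(A[i], A[j]) for j in range(n)] for i in range(n)])

def satisfies_mercer_on(A, k, tol=1e-10):
    """Finite-sample check: all eigenvalues of the kernel matrix are >= 0."""
    K = kernel_matrix(A, k)
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# The ordinary inner product is a valid kernel, so its Gram matrix is PSD.
A = np.random.randn(20, 5)
print(satisfies_mercer_on(A, lambda x, y: float(x @ y)))   # True
```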


Nonlinear Dimension Reduction by Kernel Methods

Given a kernel function k(x,y), the linear dimension reduction GT is carried out in the feature space induced by the kernel.



Positive Definite Kernel Functions

  • Gaussian kernel

  • Polynomial kernel
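Hedged sketches of the two kernels listed above; their exact parameterizations on the original slides are not visible in this transcript, so the common conventions (bandwidth sigma, degree, offset c) are assumed.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-(diff @ diff) / (2.0 * sigma ** 2)))

def polynomial_kernel(x, y, degree=3, c=1.0):
    """Polynomial kernel: k(x, y) = (<x, y> + c)^degree."""
    return float((np.dot(x, y) + c) ** degree)

# Either function can be passed as `k` to kernel_matrix() in the previous sketch.
```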


Nonlinear Discriminant Analysis using Kernel Methods

{a1, a2, …, an} → {Φ(a1), …, Φ(an)}

Want to apply LDA in the feature space, using <Φ(x), Φ(y)> = k(x,y):

SbΦ x = λ SwΦ x   (scatter matrices of the mapped data)


Nonlinear Discriminant Analysis using Kernel Methods

{a1, a2, …, an} → {Φ(a1), …, Φ(an)}

The kernel matrix

K = [ k(a1,a1) … k(a1,an)
      …        …  …
      k(an,a1) … k(an,an) ]

replaces the explicit mapping: the feature-space problem SbΦ x = λ SwΦ x becomes Sb u = λ Sw u for the columns of K.

Apply Generalized LDA Algorithms
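A minimal sketch of the step described on this slide: each point is represented by its vector of kernel values against the training set (its column of the kernel matrix), to which a generalized LDA is then applied. The helper name kernel_representation is an invention for this illustration.

```python
import numpy as np

def kernel_representation(A_train, X, k):
    """Represent each row x of X by [k(x, a_1), ..., k(x, a_n)] over the training rows a_i.

    For X = A_train this is exactly the kernel matrix, so each training point is
    represented by its own column of K; a generalized LDA is then applied to these
    vectors because, with n points in an n-dimensional space, Sw is singular.
    """
    return np.array([[k(x, a) for a in A_train] for x in X])
```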


Generalized LDA Algorithms

Minimize trace(xT Sw x): xT Sw x = 0 when x ∈ null(Sw)

Maximize trace(xT Sb x): xT Sb x ≠ 0 when x ∈ range(Sb)
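A small sketch of the two observations above, assuming NumPy: an orthonormal basis of null(Sw) can be read off the SVD, and any x in that null space gives xT Sw x = 0 while xT Sb x can remain nonzero.

```python
import numpy as np

def null_space_basis(S, tol=1e-10):
    """Orthonormal basis of null(S): right singular vectors with (near-)zero singular values."""
    _, s, Vt = np.linalg.svd(S)
    return Vt[s <= tol * s.max()].T

# This is the property the null-space methods such as To-N(Sw) exploit.
# Example check:  W = null_space_basis(Sw);  np.allclose(W.T @ Sw @ W, 0)  -> True
```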


Generalized LDA Algorithms

RLDA
  • Add a positive diagonal matrix εI to Sw so that Sw + εI is nonsingular (sketched below)

LDA/GSVD
  • Apply the generalized singular value decomposition (GSVD) to {Hw, Hb} in Sb = HbHbT and Sw = HwHwT

To-N(Sw)
  • Projection to the null space of Sw
  • Maximize the between-class scatter in the projected space
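A minimal sketch of the RLDA variant listed above, reusing the scatter_matrices helper from the earlier LDA sketch; the value of the regularization parameter eps is an illustrative choice.

```python
import numpy as np

def rlda_transform(A, labels, dim, eps=1e-3):
    """Regularized LDA: use Sw + eps*I so the eigenproblem is well posed even when Sw is singular."""
    Sw, Sb = scatter_matrices(A, labels)            # helper from the earlier LDA sketch
    Sw_reg = Sw + eps * np.eye(Sw.shape[0])         # positive diagonal shift eps*I
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw_reg, Sb))
    order = np.argsort(evals.real)[::-1]
    return A @ evecs[:, order[:dim]].real           # reduced representation of each row
```

The regularization addresses exactly the undersampled case noted earlier, where Sw itself cannot be inverted.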


Generalized LDA Algorithms

To-R(Sb)
  • Transformation to the range space of Sb
  • Diagonalize the within-class scatter matrix in the transformed space

To-NR(Sw)
  • Reduce the data dimension by PCA
  • Maximize the between-class scatter in range(Sw) and null(Sw)


Data Sets

From the Machine Learning Repository database

Data       dim    no. of data    no. of classes
Musk       166    6599           2
Isolet     617    7797           26
Car        6      1728           4
Mfeature   649    2000           10
Bcancer    9      699            2
Bscale     4      625            3


Experimental Settings

  • Split the original data into training data and test data

  • Choose a kernel function k and compute a dimension-reducing linear transformation GT from the training data

  • Predict the class labels of the test data using the training data
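A sketch of this protocol end to end, combining the helpers from the earlier sketches (kernel_representation, scatter_matrices, gaussian_kernel); the nearest-centroid classifier at the end is an illustrative stand-in, not necessarily the classifier used in the experiments.

```python
import numpy as np

def kernel_lda_predict(A_train, y_train, A_test, k, dim, eps=1e-3):
    """Kernel representation -> regularized LDA -> nearest-centroid prediction.

    Reuses kernel_representation() and scatter_matrices() from the earlier sketches.
    """
    y_train = np.asarray(y_train)
    K_train = kernel_representation(A_train, A_train, k)   # training points as columns of K
    K_test = kernel_representation(A_train, A_test, k)
    Sw, Sb = scatter_matrices(K_train, y_train)
    Sw_reg = Sw + eps * np.eye(Sw.shape[0])                 # RLDA-style regularization
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw_reg, Sb))
    G = evecs[:, np.argsort(evals.real)[::-1][:dim]].real
    Z_train, Z_test = K_train @ G, K_test @ G               # reduced-dimensional data
    classes = np.unique(y_train)
    centroids = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]                # predicted class labels
```

For a five-class problem such as the fingerprint data later in the talk, dim would be 4 (number of classes minus one).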


[Chart: prediction accuracies of the methods on each data set; each color represents a different data set]


Linear and Nonlinear Discriminant Analysis

[Comparison of linear and nonlinear discriminant analysis across the data sets]



Face Recognition



Application of Nonlinear Discriminant Analysis to Fingerprint Classification


Fingerprint Classification

Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch

From NIST Fingerprint Database 4


Previous Works in Fingerprint Classification

  • Feature representation: minutiae, Gabor filtering, directional partitioning

  • Apply classifiers: neural networks, support vector machines, probabilistic NN

Our Approach

  • Construct core directional images by DFT

  • Dimension reduction by nonlinear discriminant analysis


Construction of Core Directional Images

[Figures: core directional images for Left Loop, Right Loop, and Whorl, with the core point marked]


Discrete Fourier Transform (DFT)


Construction of Directional Images

  • Computation of local dominant directions by DFT and directional filtering (a sketch follows this list)

  • Core point detection

  • Reconstruction of core directional images

  • Fast computation of DFT by FFT

  • Reliable for low quality images
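A rough sketch of estimating a block's local dominant direction from its 2-D FFT, as referenced in the first bullet above; the peak-picking rule, the 90-degree rotation, and the omission of the directional-filtering step are simplifying assumptions for illustration, not the talk's exact pipeline.

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge direction of an image block via the 2-D FFT.

    The strongest non-DC frequency component lies perpendicular to the ridges,
    so the ridge direction is that angle rotated by 90 degrees.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(block)))
    center = np.array(spectrum.shape) // 2
    spectrum[tuple(center)] = 0.0                      # suppress the DC component
    peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    dy, dx = peak[0] - center[0], peak[1] - center[1]
    theta = np.arctan2(dy, dx) + np.pi / 2.0           # rotate 90 deg: ridge direction
    return theta % np.pi                               # directions are modulo 180 deg
```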


[Figure: local dominant directions computed by DFT and directional filtering]


Construction of Directional Images

A 512 x 512 fingerprint image is reduced to a 105 x 105 core directional image.


Nonlinear Discriminant Analysis

Each 105 x 105 core directional image lives in an 11025-dimensional space; GT maps it to a 4-dimensional space while maximizing the separability of the five classes (left loop, right loop, whorl, arch, tented arch).


Comparison of Experimental Results

NIST Database 4 - prediction accuracies (%)

Rejection rate (%)           0       1.8     8.5     20.0
Nonlinear LDA/GSVD           90.7    91.3    92.8    95.3
PCASYS + …                   89.7    90.5    92.8    95.6
Jain et al. [1999, TPAMI]    -       90.0    91.2    93.5
Yao et al. [2003, PR]        -       90.0    92.2    95.6



Summary

  • Nonlinear Feature Extraction based on Kernel Methods

    - Nonlinear Discriminant Analysis

    - Kernel Orthogonal Centroid Method (KOC)

  • A comparison of Generalized Linear and Nonlinear Discriminant Analysis Algorithms

  • Application to Fingerprint Classification


  • Dimension reduction - feature transformation: a linear combination of the original features

  • Feature selection: select a subset of the original features (e.g., gene selection in gene expression microarray data analysis)

  • Visualization of high-dimensional data

  • Visual data mining


  • Core point detection

  • θi,j: the dominant direction of the neighborhood centered at (i, j)

  • Measure the consistency of the local dominant directions around each point:

    | Σ i,j = -1,0,1 [cos(2θi,j), sin(2θi,j)] |

    i.e., the distance from the starting point to the finishing point when the doubled-angle direction vectors are chained together

  • The point with the lowest value -> the core point (a sketch follows below)
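A rough sketch of the consistency measure above over a precomputed field of dominant directions; the function names and the literal 3x3 neighborhood of offsets are assumptions made for this illustration.

```python
import numpy as np

def consistency(theta, r, c):
    """Length of the summed doubled-angle direction vectors over the 3x3 neighborhood.

    `theta` is a 2-D array of dominant directions (radians); a short resultant
    vector means the directions around (r, c) disagree, as they do at the core.
    """
    vec = np.zeros(2)
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            t = theta[r + i, c + j]
            vec += np.array([np.cos(2 * t), np.sin(2 * t)])
    return np.linalg.norm(vec)   # distance from starting point to finishing point

def find_core_point(theta):
    """Return the interior position with the lowest direction-consistency value."""
    rows, cols = theta.shape
    scores = {(r, c): consistency(theta, r, c)
              for r in range(1, rows - 1) for c in range(1, cols - 1)}
    return min(scores, key=scores.get)
```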



References

  • L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33:1713-1726, 2000

  • P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1):165-179, 2003

  • H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data - with application to face recognition, Pattern Recognition, 34:2067-2070, 2001

  • J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36:563-566, 2003

  • H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003

  • S. Mika et al., Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, J. Larsen and S. Douglas (eds.), pp. 41-48, IEEE, 1999

  • B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998

  • G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12:2385-2404, 2000

  • V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12:568-574, 2000



  • S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2):263-270, 2002

  • C.H. Park and H. Park, Nonlinear discriminant analysis based on generalized singular value decomposition, SIMAX, 27(1):98-102, 2005

  • A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999

  • Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2):397-406, 2003

  • C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4):801-810

  • C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear

  • C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006

