
Final Exam Review

CS479/679 Pattern Recognition, Dr. George Bebis


Final Exam Material

  • Midterm Exam Material

  • Dimensionality Reduction

  • Feature Selection

  • Linear Discriminant Functions

  • Support Vector Machines

  • Expectation-Maximization Algorithm


Dimensionality Reduction

  • What is the goal of dimensionality reduction and why is it useful?

    • Reduce the dimensionality of the data

    • Eliminate redundant and irrelevant features

    • Fewer training samples are needed, and classification is faster

  • How is dimensionality reduction performed?

    • Map the data to a space of lower dimensionality through a linear (or non-linear) transformation

      y = Uᵀx, where x ∈ Rᴺ, U is N×K, and y ∈ Rᴷ

    • Or, select a subset of features (feature selection)
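
A minimal numpy sketch of such a linear projection (the orthonormal matrix U below is random, purely for illustration):

      import numpy as np

      # Hypothetical sizes: project from N = 10 dimensions down to K = 3.
      N, K = 10, 3
      rng = np.random.default_rng(0)

      # U: an N x K matrix with orthonormal columns (here from a QR factorization).
      U, _ = np.linalg.qr(rng.standard_normal((N, K)))

      x = rng.standard_normal(N)   # a sample in R^N
      y = U.T @ x                  # its projection in R^K
      print(y.shape)               # (3,)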


Dimensionality Reduction

  • Give two examples of linear dimensionality reduction techniques.

    • Principal Component Analysis (PCA)

    • Linear Discriminant Analysis (LDA)

  • What is the difference between PCA and LDA?

    • PCA seeks a projection that preserves as much information in the data as possible.

    • LDA seeks a projection that best separates the data.


Dimensionality Reduction

  • What is the solution found by PCA?

    • The “largest” eigenvectors of the covariance matrix, i.e., those corresponding to the largest eigenvalues (the principal components)

  • You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.
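
A sketch of those PCA steps in numpy; the 95% retained-variance rule for choosing the number of components is an illustrative convention, not the only choice:

      import numpy as np

      def pca(X, var_kept=0.95):
          """X: M x N data matrix, one sample per row."""
          # 1. Center the data.
          mu = X.mean(axis=0)
          Xc = X - mu
          # 2. Eigendecomposition of the covariance matrix.
          vals, vecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
          vals, vecs = vals[::-1], vecs[:, ::-1]    # largest eigenvalues first
          # 3. Choose K: smallest K retaining var_kept of the total variance.
          K = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_kept)) + 1
          # 4. Project onto the K principal components.
          U = vecs[:, :K]
          return Xc @ U, U, mu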


Dimensionality Reduction

  • You need to know how to apply PCA for face recognition and face detection.

  • What practical issue arises when applying PCA for face recognition? How do we deal with it?

    • The covariance matrix AAᵀ is typically very large (i.e., N²×N² for N×N images)

    • Consider the alternative matrix AᵀA, which is only M×M (M is the number of training face images)
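
A sketch of that trick, assuming A is the (N²) x M matrix whose columns are the centered, vectorized training images:

      import numpy as np

      def eigenfaces(A, K):
          """Return K orthonormal eigenvectors of AA^T without forming it."""
          # Eigenvectors v of the small M x M matrix A^T A ...
          vals, V = np.linalg.eigh(A.T @ A)
          vals, V = vals[::-1], V[:, ::-1]     # largest eigenvalues first
          # ... give eigenvectors u = Av of AA^T (same nonzero eigenvalues).
          U = A @ V[:, :K]
          U /= np.linalg.norm(U, axis=0)       # normalize each eigenface
          return U, vals[:K]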


Dimensionality Reduction

  • What is the solution found by LDA?

    • Maximize the between-class scatter Sb while minimizing the within-class scatter Sw

    • The solution is given by the eigenvectors of the generalized eigenvalue problem Sb v = λ Sw v


Dimensionality Reduction

  • What practical issue arises when applying LDA for face recognition? How do we deal with it?

    • The solution can be obtained from the eigenvectors of Sw⁻¹Sb

    • But Sw is singular in practice, because the dimensionality of the data typically exceeds the number of training samples; apply PCA first to reduce the dimensionality.
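
A minimal sketch of the LDA solution using scipy's generalized symmetric eigensolver, assuming Sw has already been made non-singular (e.g., by the PCA step just mentioned):

      import numpy as np
      from scipy.linalg import eigh

      def lda_directions(Sb, Sw, K):
          """Solve Sb v = lambda Sw v; return the K most discriminant directions."""
          vals, vecs = eigh(Sb, Sw)            # generalized eigenvalue problem
          order = np.argsort(vals)[::-1]       # largest eigenvalues first
          return vecs[:, order[:K]]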


Feature Selection

  • What is the goal of feature selection?

    • Select features having high discrimination power while ignoring or paying less attention to the rest.

  • What are the main steps in feature selection?

    • Search the space of possible feature subsets.

    • Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).


Feature Selection

  • What are the main search and evaluation strategies?

    • Search strategies: optimal, heuristic, randomized

    • Evaluation strategies: filter, wrapper

  • What is the difference between filter and wrapper methods?

    • In filter methods, evaluation is independent of the classification algorithm.

    • In wrapper methods, evaluation depends on the classification algorithm.


Feature Selection

  • You need to be familiar with:

    • Exhaustive and Naïve search

    • Sequential Forward/Backward Selection (SFS/SBS); a sketch of SFS follows this list

    • Plus-L Minus-R Selection

    • Bidirectional Search

    • Sequential Floating Selection (SFFS and SFBS)

    • Feature selection using GAs
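
A minimal sketch of SFS, assuming a caller-supplied criterion score(X_subset, y) (e.g., cross-validated accuracy for a wrapper method, or a class-separability measure for a filter method):

      import numpy as np

      def sfs(X, y, score, K):
          """Greedily add the feature that most improves the criterion."""
          selected, remaining = [], list(range(X.shape[1]))
          while len(selected) < K:
              best = max(remaining, key=lambda f: score(X[:, selected + [f]], y))
              selected.append(best)
              remaining.remove(best)
          return selected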


Linear Discriminant Functions

  • General form of a linear discriminant: g(x) = wᵀx + w0

  • What is the form of the decision boundary? What is the meaning of w and w0?

    • The decision boundary is a hyperplane; its orientation is determined by w and its location by w0.


Linear Discriminant Functions

  • What does g(x) measure?

    • The (signed) distance of x from the decision boundary (hyperplane): r = g(x)/||w||


Linear Discriminant Functions

  • How do we find w and w0?

    • Apply learning using a set of labeled training examples

  • What is the effect of each training example?

    • Places a constraint on the solution

[Figure: each training example places a constraint on the solution, shown both in the solution space (ɑ1, ɑ2) and in the feature space (y1, y2).]


Linear Discriminant Functions

  • Iterative optimization – what is the main idea?

    • Minimize some error function J(α) iteratively

      α(k+1) = α(k) − η(k) ∇J(α(k))

      where η(k) is the learning rate and −∇J(α(k)) is the search direction.
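
A generic sketch of that update rule; the quadratic error function in the usage example is made up for illustration:

      import numpy as np

      def gradient_descent(grad_J, alpha0, eta=0.1, tol=1e-6, max_iter=1000):
          """Iterate alpha(k+1) = alpha(k) - eta * grad J(alpha(k))."""
          alpha = np.asarray(alpha0, dtype=float)
          for _ in range(max_iter):
              step = eta * grad_J(alpha)
              alpha -= step
              if np.linalg.norm(step) < tol:   # stop when updates become tiny
                  break
          return alpha

      # Example: minimize J(a) = ||a - 3||^2, whose gradient is 2(a - 3).
      print(gradient_descent(lambda a: 2 * (a - 3), np.zeros(2)))  # ~[3. 3.]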


Linear Discriminant Functions

  • Gradient descent method

  • Newton method

  • Perceptron rule
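
As one concrete instance, a sketch of the single-sample perceptron rule, assuming labels in {-1, +1} and augmented feature vectors (a constant 1 appended so that w0 is absorbed into a):

      import numpy as np

      def perceptron(X, y, eta=1.0, epochs=100):
          """X: M x (N+1) augmented samples; y: labels in {-1, +1}."""
          a = np.zeros(X.shape[1])
          for _ in range(epochs):
              errors = 0
              for xi, yi in zip(X, y):
                  if yi * (a @ xi) <= 0:       # misclassified sample
                      a += eta * yi * xi       # move the boundary toward it
                      errors += 1
              if errors == 0:                  # converged (separable data)
                  break
          return a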


Support Vector Machines

  • What is the capacity of a classifier?

  • What is the VC dimension of a classifier?

  • What is structural risk minimization?

    • Find solutions that (1) minimize the empirical risk and

      (2) have low VC dimension.

    • It can be shown that, with probability 1 − δ:

      R(α) ≤ Remp(α) + √[ (h(ln(2N/h) + 1) − ln(δ/4)) / N ]

      where h is the VC dimension and N is the number of training samples.


Support Vector Machines

  • What is the margin of separation? How is it defined?

  • What is the relationship between VC dimension and margin of separation?

    • VC dimension is minimized by maximizing the margin of separation.

[Figure: maximum-margin separating hyperplane; the training samples lying on the margin are the support vectors.]


Support Vector Machines

  • What is the criterion being optimized by SVMs?

    • Maximize the margin of separation, 2/||w||; equivalently, minimize ||w||²/2 subject to yi(wᵀxi + w0) ≥ 1 for every training sample (xi, yi).
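
A quick illustration with scikit-learn; the toy data is made up, and the large C value approximates the hard-margin case:

      import numpy as np
      from sklearn.svm import SVC

      X = np.array([[0, 0], [1, 1], [2, 0], [3, 3], [4, 2], [5, 4]], dtype=float)
      y = np.array([-1, -1, -1, 1, 1, 1])

      clf = SVC(kernel="linear", C=1e6).fit(X, y)  # near hard-margin linear SVM
      print(clf.support_vectors_)                  # points that define the margin
      print(2 / np.linalg.norm(clf.coef_))         # margin width 2/||w||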


Support Vector Machines

  • The SVM solution depends only on the support vectors: w = Σi λi yi xi, where the sum runs over the support vectors (the training samples with λi > 0)

  • Soft margin classifier: tolerate “outliers” by introducing slack variables, penalized through a parameter C


Support Vector Machines

  • Non-linear SVM – what is the main idea?

    • Map the data to a high-dimensional space h, through a non-linear transformation, where it becomes linearly separable


Support Vector Machines

  • What is the kernel trick?

    • Compute dot products using a kernel function

      e.g., the polynomial kernel: K(x, y) = (x · y)ᵈ
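
A small numeric check of the trick for d = 2 in two dimensions: the kernel value equals a dot product under the implicit feature map Φ(x) = (x1², √2·x1x2, x2²):

      import numpy as np

      def phi(v):
          """Explicit feature map for K(x, y) = (x . y)^2 in 2-D."""
          return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

      x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
      print((x @ y) ** 2)      # kernel value: 16.0
      print(phi(x) @ phi(y))   # same value via the explicit map: 16.0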


Support Vector Machines

  • Important comments about SVMs

    • SVM is based on exact optimization (no local optima).

    • Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space.

    • Performance depends on the choice of the kernel and its parameters.


Expectation-Maximization (EM)

  • What is the EM algorithm?

    • An iterative method to perform ML estimation

      max_θ p(D | θ)

  • When is EM useful?

    • It works best for problems where the data is incomplete or can be thought of as being incomplete.


Expectation-Maximization (EM)

  • What are the steps of the EM algorithm?

    • Initialization: choose an initial estimate θ0

    • Expectation step: compute the expected complete-data log-likelihood Q(θ; θt), given the observed data and the current estimate θt

    • Maximization step: set θt+1 = argmax_θ Q(θ; θt)

    • Test for convergence: stop when the parameter estimates (or the likelihood) change by less than a chosen threshold

  • Convergence properties of EM?

    • The solution depends on the initial estimate θ0

    • There is no guarantee of finding the global maximum, but convergence is stable: the likelihood never decreases from one iteration to the next


Expectation-Maximization (EM)

  • What is a mixture of Gaussians?

  • How are the parameters of MoGs estimated?

    • Using the EM algorithm

  • What is the main idea behind using EM for estimating the MoGs parameters?

    • Introduce “hidden” variables indicating which mixture component generated each sample


Expectation-Maximization (EM)

  • Explain the EM steps for MoGs
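
A compact sketch of those steps for a 1-D mixture of Gaussians; the initialization and the fixed iteration count are illustrative simplifications:

      import numpy as np
      from scipy.stats import norm

      def em_mog(x, K, iters=100):
          """EM for a 1-D mixture of K Gaussians; x is a 1-D sample array."""
          pi = np.full(K, 1.0 / K)                     # mixing weights
          mu = np.random.choice(x, K, replace=False)   # initial means (theta_0)
          sigma = np.full(K, x.std())
          for _ in range(iters):
              # E-step: responsibility of each component for each sample.
              r = pi * norm.pdf(x[:, None], mu, sigma)     # shape (M, K)
              r /= r.sum(axis=1, keepdims=True)
              # M-step: re-estimate the parameters from the responsibilities.
              Nk = r.sum(axis=0)
              pi = Nk / len(x)
              mu = (r * x[:, None]).sum(axis=0) / Nk
              sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
          return pi, mu, sigma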



