Final Exam Review

CS479/679 Pattern Recognition
Dr. George Bebis

Final Exam Material

  • Midterm Exam Material

  • Dimensionality Reduction

  • Feature Selection

  • Linear Discriminant Functions

  • Support Vector Machines

  • Expectation-Maximization Algorithm

Dimensionality Reduction

  • What is the goal of dimensionality reduction and why is it useful?

    • Reduce the dimensionality of the data

    • Eliminate redundant and irrelevant features

    • Fewer training samples are needed and classification is faster

  • How is dimensionality reduction performed?

    • Map the data to a space of lower-dimensionality through a linear (or non-linear) transformation

      $y = U^T x$, where $x \in \mathbb{R}^N$, $U$ is $N \times K$, and $y \in \mathbb{R}^K$

    • Or, select a subset of features (feature selection)

Dimensionality Reduction

  • Give two examples of linear dimensionality reduction techniques.

    • Principal Component Analysis (PCA)

    • Linear Discriminant Analysis (LDA)

  • What is the difference between PCA and LDA?

    • PCA seeks a projection that preserves as much information in the data as possible.

    • LDA seeks a projection that best separates the data.

Dimensionality Reduction

  • What is the solution found by PCA?

    • The eigenvectors of the covariance matrix that correspond to the largest eigenvalues (the principal components)

  • You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.
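
The PCA steps can be sketched as follows. This is a minimal NumPy sketch on made-up 2-D data; the function name `pca`, the variable names, and the 95% variance threshold used to choose K are illustrative assumptions rather than anything prescribed by the slides.

```python
import numpy as np

def pca(X, var_to_keep=0.95):
    """X holds one sample per row. Returns (mean, U, eigenvalues), U is N x K."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # step 1: center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # step 2: sample covariance (N x N)
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 3: eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # step 4: smallest K whose eigenvalues account for var_to_keep of the total variance
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    K = int(np.searchsorted(ratio, var_to_keep) + 1)
    return mean, eigvecs[:, :K], eigvals

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# hypothetical data: points scattered tightly around the line x2 = 2 * x1
X = np.column_stack([t, 2.0 * t + 0.05 * rng.normal(size=200)])
mean, U, eigvals = pca(X)
Y = (X - mean) @ U                             # y = U^T (x - mean) for every sample
```

Geometrically, the single retained eigenvector should point along the line the data was generated around, which is the direction of largest variance.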

Dimensionality Reduction

  • You need to know how to apply PCA for face recognition and face detection.

  • What practical issue arises when applying PCA for face recognition? How do we deal with it?

    • The covariance matrix $AA^T$ is typically very large (i.e., $N^2 \times N^2$ for $N \times N$ images)

    • Consider the alternative matrix $A^T A$, which is only $M \times M$ ($M$ is the number of training face images)
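
The smaller matrix suffices because if $A^T A v = \mu v$, then $AA^T (Av) = \mu (Av)$, so $u = Av$ is an eigenvector of $AA^T$ with the same eigenvalue. A minimal NumPy sketch with random stand-in data (the sizes and variable names are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, M = 400, 10                         # d = N*N pixels per image (N = 20), M training images
A = rng.normal(size=(d, M))            # stand-in for the image vectors, one per column
A -= A.mean(axis=1, keepdims=True)     # subtract the mean face from every column

small = A.T @ A                        # M x M instead of d x d
mu, V = np.linalg.eigh(small)          # eigenvalues in ascending order
order = np.argsort(mu)[::-1][:M - 1]   # centering leaves at most M-1 non-zero eigenvalues
mu, V = mu[order], V[:, order]

U = A @ V                              # map each v_i to u_i = A v_i
U /= np.linalg.norm(U, axis=0)         # normalize every u_i to unit length

# check: every column of U is an eigenvector of the large matrix A A^T
big = A @ A.T
residual = np.abs(big @ U - U * mu).max()
```

Here the large matrix is formed only to verify the result; in practice it would never be built.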

Dimensionality Reduction

  • What is the solution found by LDA?

    • Maximize the between-class scatter $S_b$ while minimizing the within-class scatter $S_w$

    • The solution is given by the eigenvectors of the following generalized eigenvalue problem:

      $S_b u_k = \lambda_k S_w u_k$

Dimensionality Reduction

  • What practical issue arises when applying LDA for face recognition? How do we deal with it?

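    • If $S_w$ is non-singular, the solution can be obtained from a conventional eigenvalue problem:

      $S_w^{-1} S_b u_k = \lambda_k u_k$

    • But $S_w$ is singular in practice because the dimensionality of the data is much larger than the number of training images; use PCA first to reduce dimensionality.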
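
A minimal NumPy sketch of the LDA solution on low-dimensional synthetic data, where the within-class scatter is invertible and no PCA step is needed. The function name `lda` and the two-class data are assumptions made for illustration.

```python
import numpy as np

def lda(X, labels, K):
    """Return the K leading eigenvectors of Sw^{-1} Sb as columns of an N x K matrix."""
    N = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((N, N))
    Sb = np.zeros((N, N))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)           # between-class scatter
    # Sw must be invertible here; with images, reduce dimensionality with PCA first
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:K]
    return eigvecs[:, order].real

rng = np.random.default_rng(2)
class0 = rng.normal(size=(100, 2)) * [3.0, 0.3]               # elongated along x1
class1 = rng.normal(size=(100, 2)) * [3.0, 0.3] + [0.0, 2.0]  # same shape, shifted along x2
X = np.vstack([class0, class1])
labels = np.array([0] * 100 + [1] * 100)
w = lda(X, labels, K=1)[:, 0]
```

The data is built so that most of the variance lies along x1 while the classes differ along x2, so the LDA direction `w` should lie close to the x2 axis, whereas the first principal component would follow x1.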

Feature Selection

  • What is the goal of feature selection?

    • Select features having high discrimination power while ignoring or paying less attention to the rest.

  • What are the main steps in feature selection?

    • Search the space of possible feature subsets.

    • Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).

Feature Selection

  • What are the main search and evaluation strategies?

    • Search strategies: optimal, heuristic, randomized

    • Evaluation strategies: filter, wrapper

  • What is the difference between filter and wrapper methods?

    • In filter methods, evaluation is independent of the classification algorithm.

    • In wrapper methods, evaluation depends on the classification algorithm.

Feature Selection

  • You need to be familiar with:

    • Exhaustive and Naïve search

    • Sequential Forward/Backward Selection (SFS/SBS)

    • Plus-L Minus-R Selection

    • Bidirectional Search

    • Sequential Floating Selection (SFFS and SFBS)

    • Feature selection using GAs
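
As one concrete case, Sequential Forward Selection can be sketched in a few lines. The evaluation criterion is passed in as a function: in a wrapper method it would be the accuracy of a trained classifier, in a filter method a classifier-independent measure. The `toy_score` criterion and the `merit` values below are hypothetical and exist only to make the sketch runnable.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedy SFS: start from the empty set and repeatedly add the single
    feature that maximizes score(subset) until k features are selected."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical criterion for illustration only: each feature has an individual
# merit, and features 0 and 1 are redundant with each other.
merit = [0.9, 0.85, 0.5, 0.1]

def toy_score(subset):
    total = sum(merit[f] for f in subset)
    if 0 in subset and 1 in subset:
        total -= 0.8          # redundancy penalty
    return total

chosen = sequential_forward_selection(4, toy_score, k=2)   # -> [0, 2]
```

Feature 1 has the second-highest individual merit but is skipped because it is redundant with feature 0, which a naïve ranking of individual features would miss.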

Linear Discriminant Functions

  • General form of linear discriminant:

      $g(x) = w^T x + w_0$

  • What is the form of the decision boundary? What is the meaning of w and w0?

    • The decision boundary is a hyperplane; its orientation is determined by w and its location by w0.

Linear Discriminant Functions

  • What does g(x) measure?

    • An algebraic measure of the signed distance of x from the decision boundary (hyperplane): the distance is $g(x) / \|w\|$
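
A small worked example, assuming the usual linear form $g(x) = w^T x + w_0$; the parameter values are hypothetical and chosen so that $\|w\| = 5$.

```python
import math

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def signed_distance(x, w, w0):
    """Signed distance of x from the hyperplane g(x) = 0."""
    return g(x, w, w0) / math.sqrt(sum(wi * wi for wi in w))

w, w0 = [3.0, 4.0], -5.0                       # hypothetical parameters, ||w|| = 5
r_pos = signed_distance([3.0, 4.0], w, w0)     # g = 9 + 16 - 5 = 20, distance = 4
r_neg = signed_distance([0.0, 0.0], w, w0)     # g = -5, distance = -1 (other side)
```

The sign of the distance tells which side of the hyperplane the point falls on, and therefore which class it is assigned to.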

Linear Discriminant Functions

  • How do we find w and w0?

    • Apply learning using a set of labeled training examples

  • What is the effect of each training example?

    • Places a constraint on the solution


[Figure: training examples in feature space (y1, y2) and the constraints they place on the solution in solution space (α1, α2)]


Linear Discriminant Functions

  • Iterative optimization – what is the main idea?

    • Minimize some error function J(α) iteratively:

      $\alpha(k+1) = \alpha(k) + \eta(k)\, p_k$

      where $p_k$ is the search direction and $\eta(k)$ is the learning rate
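
A minimal sketch of such an iteration, assuming the search direction is the negative gradient (which gives gradient descent) and a constant learning rate. The error function, the function names, and the stopping tolerance are hypothetical.

```python
def gradient_descent(grad, alpha, eta=0.1, tol=1e-8, max_iter=10000):
    """Iterate alpha(k+1) = alpha(k) + eta * p_k with p_k = -grad(alpha(k))."""
    for _ in range(max_iter):
        p = [-gi for gi in grad(alpha)]                     # search direction
        alpha = [a + eta * pi for a, pi in zip(alpha, p)]
        if max(abs(eta * pi) for pi in p) < tol:            # step became negligible
            break
    return alpha

# Hypothetical error function J(a) = (a1 - 1)^2 + 2 * (a2 + 2)^2, minimized at (1, -2)
def grad_J(a):
    return [2.0 * (a[0] - 1.0), 4.0 * (a[1] + 2.0)]

alpha_star = gradient_descent(grad_J, [0.0, 0.0])
```

With a learning rate that is too large the iteration can overshoot and diverge; with one that is too small it converges slowly.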

Linear Discriminant Functions

  • Gradient descent method

  • Newton method

  • Perceptron rule
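
As an illustration of the last item, here is a minimal sketch of the fixed-increment, single-sample perceptron rule. It assumes labels of +1 and -1 and augmented vectors y = [1, x]; the data set and function name are hypothetical.

```python
def train_perceptron(samples, labels, eta=1.0, max_epochs=1000):
    """Fixed-increment perceptron rule on augmented vectors y = [1, x1, ..., xd];
    labels are +1 / -1. Returns the weight vector a = [w0, w1, ..., wd]."""
    a = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, z in zip(samples, labels):
            y = [1.0] + list(x)
            if z * sum(ai * yi for ai, yi in zip(a, y)) <= 0:     # misclassified
                a = [ai + eta * z * yi for ai, yi in zip(a, y)]   # move a toward z * y
                errors += 1
        if errors == 0:            # every sample satisfies its constraint
            break
    return a

# Hypothetical linearly separable data: the class is +1 when x1 + x2 > 3
samples = [(0, 0), (1, 1), (0, 2), (3, 1), (2, 3), (4, 4)]
labels = [-1, -1, -1, 1, 1, 1]
a = train_perceptron(samples, labels)
```

For linearly separable data the rule stops after a finite number of corrections; for non-separable data it never settles, which is why the sketch caps the number of epochs.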

Support Vector Machines

  • What is the capacity of a classifier?

  • What is the VC dimension of a classifier?

  • What is structural risk minimization?

    • Find solutions that (1) minimize the empirical risk and

      (2) have low VC dimension.

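    • It can be shown that, with probability $(1-\delta)$:

      $err_{true} \le err_{train} + \sqrt{\dfrac{h \left( \ln(2n/h) + 1 \right) - \ln(\delta/4)}{n}}$

      where $h$ is the VC dimension and $n$ is the number of training examples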

Support Vector Machines

  • What is the margin of separation? How is it defined?

  • What is the relationship between VC dimension and margin of separation?

    • VC dimension is minimized by maximizing the margin of separation.

[Figure: margin of separation, with the support vectors marked]

Support Vector Machines

  • What is the criterion being optimized by SVMs?
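
For reference, the hard-margin criterion in its usual textbook form is given below. The notation is an assumption: class labels are taken to be $z_k \in \{-1, +1\}$, and $n$ is the number of training examples.

```latex
\min_{w,\, w_0} \; \frac{1}{2}\,\|w\|^2
\quad \text{subject to} \quad
z_k \left( w^T x_k + w_0 \right) \ge 1, \qquad k = 1, \dots, n
```

Under this scaling the margin equals $2 / \|w\|$, so minimizing $\|w\|^2$ maximizes the margin of separation.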



Support Vector Machines

  • The SVM solution depends only on the support vectors (the training examples with non-zero Lagrange multipliers)

  • Soft margin classifier – tolerates “outliers”

Support Vector Machines

  • Non-linear SVM – what is the main idea?

    • Map the data to a high-dimensional space of dimensionality h, where the classes are more likely to be linearly separable

Support Vector Machines

  • What is the kernel trick?

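    • Compute dot products in the high-dimensional space using a kernel function on the original inputs, without computing the mapping explicitly

    • Polynomial kernel: $K(x, y) = (x \cdot y)^d$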
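
A small check of the kernel trick for the polynomial kernel with d = 2 on 2-D inputs. The explicit mapping `phi` below is the standard one for this case; the sample points are hypothetical.

```python
import math

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x . y)^d."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** d

def phi(x):
    """Explicit feature map that matches d = 2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly_kernel(x, y)                               # (3 - 2)^2 = 1
via_mapping = sum(a * b for a, b in zip(phi(x), phi(y)))     # same value, computed in 3-D
```

Both computations give the same dot product, but the kernel never leaves the original 2-D space; for larger d and higher input dimensionality the explicit mapping quickly becomes impractical.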

Support Vector Machines

  • Important comments about SVMs

    • SVM is based on exact optimization (no local optima).

    • Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space.

    • Performance depends on the choice of the kernel and its parameters.

Expectation-Maximization (EM)

  • What is the EM algorithm?

    • An iterative method to perform ML estimation:

      $\max_{\theta} \; p(D \mid \theta)$

  • When is EM useful?

    • Works best for problems where the data is incomplete or can be thought of as being incomplete.

Expectation-Maximization (EM)

  • What are the steps of the EM algorithm?

    • Initialization: θ0

    • Expectation Step:

    • Maximization Step:

    • Test for convergence:

  • Convergence properties of EM?

    • Solution depends on the initial estimate θ0

    • No guarantee of finding the global maximum, but the iteration is stable: the likelihood never decreases from one iteration to the next
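
In general form, the expectation, maximization, and convergence steps of EM are commonly written as below. The notation is an assumption: $D_x$ denotes the observed data, $D_y$ the missing (hidden) data, superscripts index iterations (so the initial estimate θ0 is $\theta^{0}$), and $\varepsilon$ is a small user-chosen threshold.

```latex
\begin{aligned}
\text{E-step:}\quad & Q(\theta;\, \theta^{t}) = E_{D_y}\!\left[\, \ln p(D_x, D_y \mid \theta) \;\middle|\; D_x,\, \theta^{t} \right] \\
\text{M-step:}\quad & \theta^{t+1} = \arg\max_{\theta} \; Q(\theta;\, \theta^{t}) \\
\text{Convergence test:}\quad & \left\| \theta^{t+1} - \theta^{t} \right\| < \varepsilon
\end{aligned}
```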

Expectation-Maximization (EM)

  • What is a mixture of Gaussians?

  • How are the parameters of MoGs estimated?

    • Using the EM algorithm

  • What is the main idea behind using EM for estimating the MoGs parameters?

    • Introduce “hidden” variables that indicate which Gaussian generated each sample

Expectation-Maximization (EM)

  • Explain the EM steps for MoGs
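
A minimal sketch of the EM steps for a 1-D mixture of Gaussians, using only the Python standard library. The function names, the synthetic two-component data, the initial estimates, and the fixed iteration count (used here in place of a convergence test) are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_mog_1d(data, mus, variances, weights, n_iter=200):
    """EM for a 1-D mixture of Gaussians; the lists hold the initial estimate."""
    K, n = len(mus), len(data)
    for _ in range(n_iter):
        # E-step: resp[i][k] = P(component k | x_i, current parameters)
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, mus[k], variances[k]) for k in range(K)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the soft assignments
        for k in range(K):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
    return mus, variances, weights

random.seed(0)
# hypothetical data: 300 samples from N(-2, 0.5^2) and 300 from N(3, 1^2)
data = [random.gauss(-2.0, 0.5) for _ in range(300)] + \
       [random.gauss(3.0, 1.0) for _ in range(300)]
mus, variances, weights = em_mog_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

The responsibilities computed in the E-step play the role of the expected values of the hidden variables: they estimate, for each sample, how likely it is that each Gaussian generated it.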
