
# Final Exam Review

CS479/679 Pattern Recognition, Dr. George Bebis

Final Exam Material:


• Midterm Exam Material

• Dimensionality Reduction

• Feature Selection

• Linear Discriminant Functions

• Support Vector Machines

• Expectation-Maximization Algorithm

• What is the goal of dimensionality reduction and why is it useful?

• Reduce the dimensionality of the data

• Eliminate redundant and irrelevant features

• Fewer training samples required, faster classification

• How is dimensionality reduction performed?

• Map the data to a lower-dimensional space through a linear (or non-linear) transformation:

$y = U^T x$, where $x \in \mathbb{R}^N$, $U$ is $N \times K$, and $y \in \mathbb{R}^K$ ($K < N$)

• Or, select a subset of features (feature selection)

• Give two examples of linear dimensionality reduction techniques.

• Principal Component Analysis (PCA)

• Linear Discriminant Analysis (LDA)

• What is the difference between PCA and LDA?

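• PCA seeks a projection that preserves as much information in the data as possible (in the least-squares sense, i.e., with minimum reconstruction error).

• LDA seeks a projection that best separates the classes; unlike PCA, it uses the class labels.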

• What is the solution found by PCA?

• The “largest” eigenvectors of the covariance matrix, i.e., those corresponding to the largest eigenvalues (the principal components).

• You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.

• You need to know how to apply PCA for face recognition and face detection.
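The PCA steps (subtract the mean, form the covariance matrix, take its largest eigenvectors, project) can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the course's reference implementation: the covariance uses 1/n normalization, power iteration with deflation stands in for a library eigen-solver, and K is chosen as the smallest number of components whose eigenvalues retain a fraction `threshold` of the total variance (the trace). All function names are hypothetical.

```python
import math


def mean_vector(X):
    """Mean of the samples (rows of X)."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def covariance(X, mu):
    """Covariance matrix with 1/n normalization (assumption)."""
    n, d = len(X), len(mu)
    C = [[0.0] * d for _ in range(d)]
    for x in X:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                C[i][j] += diff[i] * diff[j] / n
    return C


def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def top_eigenpair(C, iters=500):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix.
    Assumes the start vector is not orthogonal to the top eigenvector."""
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(C, v)
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:  # remaining matrix is numerically zero
            return 0.0, v
        v = [wi / norm for wi in w]
    lam = sum(vi * wi for vi, wi in zip(v, mat_vec(C, v)))  # Rayleigh quotient
    return lam, v


def pca(X, threshold=0.9):
    """Return the mean, the kept eigenvalues and the principal components (as rows)."""
    mu = mean_vector(X)
    C = covariance(X, mu)
    d = len(C)
    total = sum(C[i][i] for i in range(d))  # trace = sum of all eigenvalues
    eigvals, components, kept = [], [], 0.0
    while kept < threshold * total and len(components) < d:
        lam, u = top_eigenpair(C)
        if lam <= 1e-12:
            break
        eigvals.append(lam)
        components.append(u)
        kept += lam
        # Deflation: subtract lam * u u^T so the next component becomes dominant.
        C = [[C[i][j] - lam * u[i] * u[j] for j in range(d)] for i in range(d)]
    return mu, eigvals, components


def project(x, mu, components):
    """y = U^T (x - mu), with the principal components as the columns of U."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return [sum(ui * di for ui, di in zip(u, diff)) for u in components]
```

For the points (3, 0), (-3, 0), (0, 1), (0, -1) the covariance is diag(4.5, 0.5), so with `threshold=0.8` one component (along the first axis, eigenvalue 4.5) already retains 90% of the variance.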

• What practical issue arises when applying PCA for face recognition? How do we deal with it?

• The covariance matrix $AA^T$ is typically very large (i.e., $N^2 \times N^2$ for $N \times N$ images).

• Consider the alternative matrix $A^TA$, which is only $M \times M$ ($M$ is the number of training face images): its eigenvectors can be mapped to eigenvectors of $AA^T$.
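The reason the smaller matrix suffices is a one-line derivation. Here $A$ is assumed to hold the $M$ mean-subtracted training faces as its $N^2$-dimensional columns, and $v_i$, $\mu_i$ denote an eigenvector and eigenvalue of $A^TA$ (notation assumed for illustration):

$$
A^T A\, v_i = \mu_i v_i
\;\Longrightarrow\;
A A^T (A v_i) = A\,(A^T A\, v_i) = \mu_i\, (A v_i)
$$

So for $\mu_i \neq 0$, $u_i = A v_i$ (normalized afterwards) is an eigenvector of $AA^T$ with the same eigenvalue. Since $\operatorname{rank}(AA^T) \le M$, at most $M$ eigenvalues of $AA^T$ are nonzero, so no principal component with nonzero variance is lost.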

• What is the solution found by LDA?

• Maximize the between-class scatter $S_b$ while minimizing the within-class scatter $S_w$.

• The solution is given by the eigenvectors of the following generalized eigenvalue problem: $S_b u_k = \lambda_k S_w u_k$

• The solution can be obtained as follows: $S_w^{-1} S_b u_k = \lambda_k u_k$

• What practical issue arises when applying LDA for face recognition? How do we deal with it?

• $S_w$ is singular in practice because the dimensionality of the data is much larger than the number of training samples, so $S_w^{-1}$ does not exist; use PCA first to reduce the dimensionality.
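For two classes, $S_b$ has rank one and the eigenvalue problem reduces to the Fisher direction $w \propto S_w^{-1}(m_1 - m_2)$. The sketch below covers only that two-class, 2-D special case in pure Python (function names hypothetical); it raises an error when $S_w$ is singular, which is exactly the situation where PCA is applied first.

```python
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Per-class scatter matrix: sum of (x - m)(x - m)^T (2-D only)."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return [[sxx, sxy], [sxy, syy]]


def fisher_direction(class1, class2):
    """Unit vector proportional to S_w^{-1} (m1 - m2)."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("S_w is singular; reduce dimensionality (e.g. with PCA) first")
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    d = (m1[0] - m2[0], m1[1] - m2[1])
    w = (inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1])
    norm = math.hypot(*w)
    return (w[0] / norm, w[1] / norm)


c1 = [(-2.0, 0.0), (2.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
c2 = [(x + 1.0, y + 1.0) for x, y in c1]  # same shape, shifted by (1, 1)
print(fisher_direction(c1, c2))
```

In this toy example the means differ equally along both axes, but the within-class scatter is 16 times larger along the first axis, so the Fisher direction points almost entirely along the second axis, where the classes are compact.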

• What is the goal of feature selection?

• Select features having high discrimination power while ignoring or paying less attention to the rest.

• What are the main steps in feature selection?

• Search the space of possible feature subsets.

• Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).

• What are the main search and evaluation strategies?

• Search strategies: optimal, heuristic, randomized

• Evaluation strategies: filter, wrapper

• What is the difference between filter and wrapper methods?

• In filter methods, evaluation is independent of the classification algorithm.

• In wrapper methods, evaluation depends on the classification algorithm.

• You need to be familiar with:

• Exhaustive and Naïve search

• Sequential Forward/Backward Selection (SFS/SBS)

• Plus-L Minus-R Selection

• Bidirectional Search

• Sequential Floating Selection (SFFS and SFBS)

• Feature selection using GAs
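Sequential Forward Selection is the simplest of the heuristic searches listed above. A minimal sketch, assuming the evaluation criterion is passed in as a function of a list of feature indices (a filter score or a wrapper that trains and tests a classifier); the toy criterion and its numbers are hypothetical.

```python
def sfs(n_features, criterion, k):
    """Sequential Forward Selection.

    Start from the empty set and repeatedly add the single feature that
    gives the best value of criterion(subset).
    """
    selected = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        # Ties are broken in favor of the smallest feature index.
        best = max(sorted(remaining), key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy criterion: per-feature relevance, with a penalty when the
# two redundant features 0 and 1 are selected together.
RELEVANCE = [0.9, 0.85, 0.3, 0.1]


def toy_criterion(subset):
    score = sum(RELEVANCE[f] for f in subset)
    if 0 in subset and 1 in subset:
        score -= 0.8
    return score


print(sfs(4, toy_criterion, 2))  # [0, 2]
```

Feature 1 is individually strong but redundant with feature 0, so SFS skips it. Once SFS adds a feature it can never remove it (the nesting effect); Plus-L Minus-R and the floating methods (SFFS/SFBS) relax exactly this restriction.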

• General form of linear discriminant: $g(x) = w^T x + w_0$

• What is the form of the decision boundary? What is the meaning of w and w0?

• The decision boundary $g(x) = 0$ is a hyperplane; its orientation is determined by $w$ and its location by $w_0$.

• What does g(x) measure?

• A quantity proportional to the signed distance of x from the decision boundary (hyperplane): $r = g(x) / \|w\|$
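A small numeric check of the linear discriminant and of what $g(x)$ measures. This is a sketch with hypothetical function names and values; it assumes the usual rule "decide class 1 if $g(x) > 0$", with ties assigned to class 2.

```python
import math


def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def signed_distance(x, w, w0):
    """Signed distance of x from the hyperplane g(x) = 0."""
    return g(x, w, w0) / math.sqrt(sum(wi * wi for wi in w))


def classify(x, w, w0):
    """Decide class 1 if g(x) > 0, class 2 otherwise."""
    return 1 if g(x, w, w0) > 0 else 2


w, w0 = [3.0, 4.0], -5.0
print(g([3.0, 4.0], w, w0))                # 20.0
print(signed_distance([3.0, 4.0], w, w0))  # 4.0
print(classify([0.0, 0.0], w, w0))         # 2
```

Dividing by $\|w\| = 5$ turns $g$ into a true distance; in particular the origin lies at signed distance $w_0 / \|w\| = -1$, which is why $w_0$ controls the location of the hyperplane.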

• How do we find w and w0?

• Apply learning using a set of labeled training examples

• What is the effect of each training example?

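• It places a constraint on the solution: each (augmented and normalized) sample $y_i$ requires $a^T y_i > 0$, which restricts the solution vector $a$ to one side of a hyperplane in solution space.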

[Figure: feature space (y1, y2) and solution space (a1, a2)]

• Iterative optimization: what is the main idea?

• Minimize some error function $J(\alpha)$ iteratively: $\alpha(k+1) = \alpha(k) + \eta(k)\, p(k)$, where $p(k)$ is the search direction and $\eta(k)$ is the learning rate.

• Newton method

• Perceptron rule
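The perceptron rule is an instance of this iterative scheme in which the search direction is a misclassified sample. A minimal sketch of the fixed-increment, single-sample variant, assuming augmented samples $y = (1, x)$ with class-2 samples negated so that a solution satisfies $a^T y > 0$ for every $y$ (the weight vector is called `a`, as in the solution-space figure; function name and data are hypothetical).

```python
def perceptron(class1, class2, eta=1.0, max_epochs=100):
    """Fixed-increment single-sample perceptron rule.

    Samples are augmented with a leading 1 and class-2 samples are negated
    ("normalized"), so a correct solution a satisfies a^T y > 0 for every y.
    Each misclassified y gives the update a(k+1) = a(k) + eta * y.
    """
    ys = [[1.0] + list(x) for x in class1] + [[-1.0] + [-xi for xi in x] for x in class2]
    a = [0.0] * len(ys[0])
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for y in ys:
            if sum(ai * yi for ai, yi in zip(a, y)) <= 0:  # misclassified (or on boundary)
                a = [ai + eta * yi for ai, yi in zip(a, y)]
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            return a, epoch
    return a, max_epochs  # data may not be linearly separable


a, epochs = perceptron([(2.0, 2.0), (3.0, 1.0)], [(-1.0, -1.0), (-2.0, 0.0)])
print(a, epochs)  # [1.0, 2.0, 2.0] 2
```

For linearly separable data the loop is guaranteed to stop after a finite number of updates; otherwise it simply gives up after `max_epochs`.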

• What is the capacity of a classifier?

• What is the VC dimension of a classifier?

• What is structural risk minimization?

• Find solutions that (1) minimize the empirical risk and (2) have low VC dimension.

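• It can be shown that, with probability $1-\delta$:

$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln(2n/h) + 1\right) - \ln(\delta/4)}{n}}$$

where $h$ is the VC dimension and $n$ is the number of training samples.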
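To see why structural risk minimization favors low VC dimension, it helps to evaluate the square-root ("confidence") term numerically. The sketch below assumes the standard form of Vapnik's bound written in its docstring; the function name and the sample values of h, n and delta are hypothetical.

```python
import math


def vc_confidence(h, n, delta):
    """Confidence term of Vapnik's bound (assumed standard form):

        R <= R_emp + sqrt((h * (ln(2n/h) + 1) - ln(delta/4)) / n),

    which holds with probability 1 - delta for a classifier family of VC
    dimension h trained on n samples (meaningful for h < n).
    """
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(delta / 4)) / n)


for h in (10, 100, 1000):
    print(h, round(vc_confidence(h, n=10000, delta=0.05), 3))
```

For a fixed number of samples the term grows with h, so two classifiers with the same training error are not equally trustworthy: the one from the lower-capacity family has the tighter guarantee. More data (larger n) shrinks the term.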

• What is the margin of separation? How is it defined?

• What is the relationship between VC dimension and margin of separation?

• VC dimension is minimized by maximizing the margin of separation.

[Figure: support vectors]

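• What is the criterion being optimized by SVMs?

• Maximize the margin $2/\|w\|$, i.e., minimize $\frac{1}{2}\|w\|^2$ subject to $z_k(w^T x_k + w_0) \ge 1$ for every training sample $x_k$ with label $z_k \in \{-1, +1\}$.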

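• The SVM solution depends only on the support vectors: $w = \sum_k \lambda_k z_k x_k$, where $z_k \in \{-1, +1\}$ are the class labels and the Lagrange multipliers $\lambda_k$ are nonzero only for the support vectors.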
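A hand-checkable sketch of the hard-margin solution on a hypothetical toy set with labels $z_k \in \{-1, +1\}$. The multipliers were worked out by hand from the KKT conditions for this data (they are not the output of a QP solver); the code only rebuilds $w$ and $w_0$ from them and checks that every sample satisfies $z_k(w^T x_k + w_0) \ge 1$, with equality exactly at the support vectors.

```python
import math

# Hypothetical toy problem: the closest opposite-class points are (1, 0) and
# (-1, 0), so the maximum-margin hyperplane is x1 = 0. The multipliers were
# derived by hand: only the two support vectors get lambda > 0.
X = [(1.0, 0.0), (-1.0, 0.0), (2.0, 1.0), (-3.0, 0.0)]
z = [+1, -1, +1, -1]
lam = [0.5, 0.5, 0.0, 0.0]

# w = sum_k lambda_k z_k x_k  (only support vectors contribute)
w = [sum(l * zk * x[i] for l, zk, x in zip(lam, z, X)) for i in range(2)]
# w0 from any support vector: z_k (w^T x_k + w0) = 1  =>  w0 = z_k - w^T x_k
k = 0
w0 = z[k] - sum(wi * xi for wi, xi in zip(w, X[k]))

margins = [zk * (sum(wi * xi for wi, xi in zip(w, x)) + w0) for zk, x in zip(z, X)]
print(w, w0)                                 # [1.0, 0.0] 0.0
print(margins)                               # [1.0, 1.0, 2.0, 3.0]
print(2 / math.hypot(*w))                    # 2.0 (margin width)
print(sum(l * zk for l, zk in zip(lam, z)))  # 0.0 (dual constraint)
```

Moving or deleting the two non-support vectors (while keeping them outside the margin) would leave $w$ and $w_0$ unchanged, which is the practical meaning of "the solution depends only on the support vectors".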

• Soft margin classifier – tolerate “outliers”

• Non-linear SVM – what is the main idea?

• Map the data to a high-dimensional space (of dimensionality h) and find a linear SVM there.

• What is the kernel trick?

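• Compute dot products in the high-dimensional space using a kernel function, $K(x, y) = \Phi(x) \cdot \Phi(y)$, without computing the mapping $\Phi$ explicitly.

• Polynomial kernel: $K(x, y) = (x \cdot y)^d$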
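A quick numeric check of the kernel trick for the polynomial kernel with $d = 2$ and 2-D inputs, where the explicit feature map is known to be $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. A sketch only; function names are hypothetical.

```python
import math


def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x . y)^d."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** d


def phi(x):
    """Explicit feature map for d = 2 and 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(poly_kernel(x, y))                         # 121.0
print(abs(explicit - poly_kernel(x, y)) < 1e-9)  # True
```

Both routes give the same dot product, but the kernel needs only one dot product in the original space; the explicit feature space grows combinatorially with the input dimension and the degree d.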

• SVM is based on exact optimization (no local optima).

• Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space.

• Performance depends on the choice of the kernel and its parameters.

• What is the EM algorithm?

• An iterative method to perform maximum-likelihood (ML) estimation: $\max_\theta p(D \mid \theta)$

• When is EM useful?

• Works best for problems where the data is incomplete or can be thought of as being incomplete.

• What are the steps of the EM algorithm?

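• Initialization: $\theta^0$

• Expectation step: $Q(\theta; \theta^t) = E\left[\ln p(D_x, D_y \mid \theta) \mid D_x, \theta^t\right]$, where $D_x$ is the observed data and $D_y$ the missing (hidden) data.

• Maximization step: $\theta^{t+1} = \arg\max_\theta Q(\theta; \theta^t)$

• Test for convergence: stop if $\|\theta^{t+1} - \theta^t\| < \epsilon$; otherwise return to the expectation step.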

• Convergence properties of EM?

• The solution depends on the initial estimate $\theta^0$.

• No guarantee to find the global maximum, but stable: no iteration decreases the likelihood.

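• What is a mixture of Gaussians?

• A weighted sum of K Gaussian densities: $p(x) = \sum_{k=1}^{K} \pi_k\, N(x; \mu_k, \Sigma_k)$, with $\pi_k \ge 0$ and $\sum_k \pi_k = 1$.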

• How are the parameters of MoGs estimated?

• Using the EM algorithm

• What is the main idea behind using EM for estimating the MoGs parameters?

• Introduce “hidden variables” $z_{ik}$ that indicate which Gaussian component generated each sample $x_i$.

• Explain the EM steps for MoGs
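A minimal 1-D sketch of EM for a mixture of Gaussians, assuming K components with given initial parameters $\theta^0$ (function and parameter names are hypothetical). The E-step computes the expected values of the hidden indicators $z_{ik}$ (the responsibilities); the M-step re-estimates weights, means and variances from those soft assignments; iteration stops when the log-likelihood improves by less than `tol`. Log-densities with log-sum-exp avoid underflow, and the variance floor `min_var` is an added safeguard against a component collapsing onto a single point.

```python
import math


def log_normal(x, mu, var):
    """Log of the 1-D Gaussian density N(x; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def em_mog_1d(data, means, variances, weights, max_iter=200, tol=1e-8, min_var=1e-6):
    """EM for a 1-D mixture of Gaussians. Assumes no component becomes empty."""
    means, variances, weights = list(means), list(variances), list(weights)
    K, n = len(means), len(data)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities r[i][k] = E[z_ik] = P(component k | x_i, theta^t).
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(weights[k]) + log_normal(x, means[k], variances[k]) for k in range(K)]
            total = logsumexp(logs)
            ll += total  # log-likelihood of the current parameters
            resp.append([math.exp(lg - total) for lg in logs])
        # M-step: closed-form ML updates weighted by the responsibilities.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)
        # Convergence test: stop when the log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, variances, weights


data = [-5.2, -4.9, -5.1, -4.8, -5.0, 4.9, 5.1, 5.0, 5.2, 4.8]
print(em_mog_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]))
```

Starting the means at -1 and 1, the two components move to the two clusters (means near -5 and 5, equal weights). A poor $\theta^0$ can instead lead to a different local maximum, in line with the convergence properties listed earlier.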
