Final Exam Review


CS479/679 Pattern Recognition
Dr. George Bebis

- Midterm Exam Material
- Dimensionality Reduction
- Feature Selection
- Linear Discriminant Functions
- Support Vector Machines
- Expectation-Maximization Algorithm

- What is the goal of dimensionality reduction and why is it useful?
- Reduce the dimensionality of the data
- Eliminate redundant and irrelevant features
- Fewer training samples needed, faster classification

- How is dimensionality reduction performed?
- Map the data to a lower-dimensional space through a linear (or non-linear) transformation:
$y = U^T x$, where $x \in \mathbb{R}^N$, $U$ is $N \times K$, and $y \in \mathbb{R}^K$
- Or, select a subset of features (feature selection)

- Give two examples of linear dimensionality reduction techniques.
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)

- What is the difference between PCA and LDA?
- PCA seeks a projection that preserves as much information in the data as possible.
- LDA seeks a projection that best separates the data.

- What is the solution found by PCA?
- The eigenvectors of the covariance matrix that correspond to the largest eigenvalues (the principal components)

- You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.
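
The PCA steps can be sketched as follows. This is a minimal NumPy illustration, not the course's reference implementation; the function names `pca` and `choose_k` and the 90% variance threshold are assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (M samples x N features) onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # 1. center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # 2. sample covariance matrix (N x N)
    eigvals, eigvecs = np.linalg.eigh(cov)      # 3. eigen-decomposition (ascending order)
    order = np.argsort(eigvals)[::-1]           # 4. sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :k]                          # 5. keep the k "largest" eigenvectors (N x K)
    Y = Xc @ U                                  # 6. y = U^T (x - mean) for every sample
    return Y, U, eigvals, mean

def choose_k(eigvals, threshold=0.9):
    """Smallest k whose eigenvalues retain at least `threshold` of the total variance."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratio, threshold) + 1)
```

Choosing the number of components by the fraction of variance retained is one common criterion; the threshold value is a design choice.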

- You need to know how to apply PCA for face recognition and face detection.
- What practical issue arises when applying PCA for face recognition? How do we deal with it?
- The covariance matrix $AA^T$ is typically very large (i.e., $N^2 \times N^2$ for $N \times N$ images)
- Consider the alternative matrix $A^TA$, which is only $M \times M$ ($M$ is the number of training face images)
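
Why the smaller matrix suffices: if $A^TA v_i = \mu_i v_i$, then multiplying both sides by $A$ gives $AA^T (A v_i) = \mu_i (A v_i)$, so $u_i = A v_i$ is an eigenvector of $AA^T$ with the same eigenvalue. The sketch below checks this numerically; the image size and number of faces are made-up values, and random data stands in for real face images.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, M = 400, 10                    # assumed sizes: 20x20 images, 10 training faces
A = rng.normal(size=(n_pixels, M))       # each column is one (vectorized) face
A -= A.mean(axis=1, keepdims=True)       # subtract the mean face from every column

mu, V = np.linalg.eigh(A.T @ A)          # small M x M eigenproblem
order = np.argsort(mu)[::-1][:M - 1]     # centering leaves at most M-1 non-zero eigenvalues
mu, V = mu[order], V[:, order]

U = A @ V                                # u_i = A v_i
U /= np.linalg.norm(U, axis=0)           # normalize each eigenvector to unit length

# Verify (A A^T) u_i = mu_i u_i without ever forming the large A A^T matrix
residual = np.linalg.norm(A @ (A.T @ U) - U * mu)
```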

- What is the solution found by LDA?
- Maximize the between-class scatter $S_b$ while minimizing the within-class scatter $S_w$
- Solution is given by the eigenvectors of the following generalized eigenvalue problem: $S_b u_k = \lambda_k S_w u_k$

- What practical issue arises when applying LDA for face recognition? How do we deal with it?
- Solution can be obtained as follows: $S_w^{-1} S_b u_k = \lambda_k u_k$
- But $S_w$ is singular in practice because the dimensionality of the data is much larger than the number of training samples; apply PCA first to reduce the dimensionality.
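
A minimal NumPy sketch of LDA under the assumption that $S_w$ is non-singular (for example, after PCA has already reduced the dimensionality). The helper names `scatter_matrices` and `lda` are illustrative, not from the slides.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (Sw) and between-class (Sb) scatter of the rows of X."""
    mean_all = X.mean(axis=0)
    n = X.shape[1]
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)
    return Sw, Sb

def lda(X, labels, k):
    """Return the k eigenvectors of Sw^{-1} Sb with the largest eigenvalues (as columns)."""
    Sw, Sb = scatter_matrices(X, labels)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))  # solves Sw^{-1} Sb
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real
```

For C classes, $S_b$ has rank at most C-1, so at most C-1 useful projection directions exist.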

- What is the goal of feature selection?
- Select features having high discrimination power while ignoring or paying less attention to the rest.

- What are the main steps in feature selection?
- Search the space of possible feature subsets.
- Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).

- What are the main search and evaluation strategies?
- Search strategies: optimal, heuristic, randomized
- Evaluation strategies: filter, wrapper

- What is the difference between filter and wrapper methods?
- In filter methods, evaluation is independent of the classification algorithm.
- In wrapper methods, evaluation depends on the classification algorithm.

- You need to be familiar with:
- Exhaustive and Naïve search
- Sequential Forward/Backward Selection (SFS/SBS)
- Plus-L Minus-R Selection
- Bidirectional Search
- Sequential Floating Selection (SFFS and SFBS)
- Feature selection using GAs
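
As an illustration of one of these strategies, here is a minimal sketch of Sequential Forward Selection. The `criterion` argument is a hypothetical callable that scores a feature subset: a class-separability measure would make this a filter method, while a classifier's validation accuracy would make it a wrapper method.

```python
def sfs(n_features, k, criterion):
    """Greedily grow a feature subset: at each step add the feature that
    maximizes criterion(subset), until k features have been selected."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k:
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: a made-up additive criterion where each feature has a fixed "usefulness".
usefulness = [0.1, 0.9, 0.5, 0.7]
chosen = sfs(4, 2, lambda subset: sum(usefulness[f] for f in subset))
```

SFS never removes a feature once it has been added, which is the limitation that Plus-L Minus-R and the floating variants address.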

- General form of linear discriminant: $g(x) = w^T x + w_0$
- What is the form of the decision boundary? What is the meaning of w and w0?
- The decision boundary is a hyperplane; its orientation is determined by w and its location by w0.

- What does g(x) measure?
- An algebraic measure of the distance of x from the decision boundary (hyperplane); the signed distance is $g(x) / \|w\|$
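
A small worked example of how a linear discriminant relates to distance. The values of `w`, `w0`, and `x` are arbitrary numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np

w = np.array([3.0, 4.0])    # orientation of the hyperplane (example values)
w0 = -5.0                   # location of the hyperplane (example value)

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def signed_distance(x):
    """Signed distance of x from the hyperplane g(x) = 0."""
    return g(x) / np.linalg.norm(w)

x = np.array([3.0, 4.0])
value = g(x)                    # 3*3 + 4*4 - 5 = 20
distance = signed_distance(x)   # 20 / ||w|| = 20 / 5 = 4
```

The sign of `g(x)` tells which side of the boundary `x` lies on; a point on the boundary gives zero.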

- How do we find w and w0?
- Apply learning using a set of labeled training examples

- What is the effect of each training example?
- Places a constraint on the solution

[Figure: feature space (y1, y2) and the corresponding solution space (α1, α2)]

- Iterative optimization – what is the main idea?
- Minimize some error function J(α) iteratively:
$\alpha(k+1) = \alpha(k) + \eta(k)\, p_k$, where $\eta(k)$ is the learning rate and $p_k$ is the search direction

- Gradient descent method
- Newton method
- Perceptron rule
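
As one concrete instance of iterative optimization, here is a sketch of the fixed-increment, single-sample perceptron rule. It assumes labels in {+1, -1}, augments each sample with a leading 1 so that α = (w0, w), and negates the samples of class -1 so that a correct solution satisfies α·y > 0 for every training example. The function name and defaults are illustrative.

```python
import numpy as np

def perceptron(X, labels, eta=1.0, max_epochs=100):
    """Return the augmented weight vector alpha = (w0, w1, ..., wn)."""
    # Augment with a constant 1 and flip the sign of class -1 samples
    Y = np.hstack([np.ones((len(X), 1)), X]) * labels[:, None]
    alpha = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:
            if alpha @ y <= 0:            # misclassified (or on the boundary)
                alpha = alpha + eta * y   # move alpha toward satisfying this constraint
                errors += 1
        if errors == 0:                   # every constraint satisfied: converged
            break
    return alpha

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
labels = np.array([1, 1, -1, -1])
alpha = perceptron(X, labels)
```

The loop only terminates early if the data are linearly separable; otherwise it stops after `max_epochs`.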

- What is the capacity of a classifier?
- What is the VC dimension of a classifier?
- What is structural risk minimization?
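- Find solutions that (1) minimize the empirical risk and (2) have low VC dimension.
- It can be shown that, with probability $(1-\delta)$:
$err_{true} \le err_{train} + \sqrt{\frac{h\left(\log(2n/h) + 1\right) - \log(\delta/4)}{n}}$, where $h$ is the VC dimension and $n$ is the number of training examples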

- What is the margin of separation? How is it defined?
- What is the relationship between VC dimension and margin of separation?
- VC dimension is minimized by maximizing the margin of separation.

[Figure: margin of separation and support vectors]

- What is the criterion being optimized by SVMs?

- Maximize the margin: $\frac{2}{\|w\|}$

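- SVM solution depends only on the support vectors: $g(x) = \sum_{i \in SV} \lambda_i z_i (x_i \cdot x) + w_0$, where $\lambda_i$ are the Lagrange multipliers and $z_i = \pm 1$ are the class labels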
- Soft margin classifier – tolerate “outliers”
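
A toy illustration of the margin and the support vectors. The four points and the hyperplane (`w`, `w0`) are hand-picked for this example: the hyperplane is the perpendicular bisector of the closest pair of opposite-class points, scaled so that the closest points satisfy $z_i\, g(x_i) = 1$. No SVM solver is run here.

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -3.0]])
z = np.array([1, 1, -1, -1])              # class labels

w = np.array([0.5, 0.5])                  # hand-derived max-margin solution for this toy set
w0 = 0.0

functional = z * (X @ w + w0)             # z_i * g(x_i); must be >= 1 for every sample
margin = 2.0 / np.linalg.norm(w)          # width of the separating band
support = np.flatnonzero(np.isclose(functional, 1.0))  # samples lying exactly on the margin
```

Here `support` picks out the points (1, 1) and (-1, -1); moving or removing the other two points (without crossing the margin) would not change the solution.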

- Non-linear SVM – what is the main idea?
- Map the data to a high-dimensional space (of dimensionality h) where the classes are more likely to be linearly separable

- What is the kernel trick?
- Compute dot products in the high-dimensional space using a kernel function, without performing the mapping explicitly
- Polynomial kernel: $K(x, y) = (x \cdot y)^d$
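
The kernel trick can be checked numerically for the polynomial kernel with d = 2 in two dimensions. The explicit mapping `phi` below, $(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, is the standard one for this case; the sample vectors are arbitrary.

```python
import numpy as np

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x . y)^d, computed in the original space."""
    return (x @ y) ** d

def phi(x):
    """Explicit feature map for d = 2 and 2-D inputs."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_value = poly_kernel(x, y)        # (1*3 + 2*(-1))^2 = 1
explicit = phi(x) @ phi(y)         # 9 - 12 + 4 = 1
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors, which is what keeps the cost independent of the dimensionality of the transformed space.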

- Important comments about SVMs
- SVM is based on exact optimization (no local optima).
- Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space.
- Performance depends on the choice of the kernel and its parameters.

- What is the EM algorithm?
- An iterative method to perform ML estimation: $\max_\theta p(D \mid \theta)$
- When is EM useful?
- Works best for problems where the data is incomplete or can be thought of as being incomplete.

- What are the steps of the EM algorithm?
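- Initialization: $\theta^0$
- Expectation step: compute $Q(\theta; \theta^t) = E_Z\left[\ln p(D, Z \mid \theta) \mid D, \theta^t\right]$, the expected complete-data log-likelihood over the hidden variables $Z$
- Maximization step: $\theta^{t+1} = \arg\max_\theta Q(\theta; \theta^t)$
- Test for convergence: stop when, e.g., $\|\theta^{t+1} - \theta^t\| < \epsilon$; otherwise repeat the E and M steps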

- What are the convergence properties of EM?
- Solution depends on the initial estimate $\theta^0$
- No guarantee of finding the global maximum, but stable (the likelihood never decreases from one iteration to the next)

- What is a mixture of Gaussians?
- How are the parameters of MoGs estimated?
- Using the EM algorithm

- What is the main idea behind using EM for estimating the MoGs parameters?
- Introduce "hidden variables" that indicate which mixture component generated each sample

- Explain the EM steps for MoGs
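
The EM steps for a mixture of Gaussians can be sketched in one dimension as follows. This is a minimal illustration, assuming scalar data and a fixed number of components given by the initial guesses; the function names, iteration limit, and log-likelihood stopping rule are choices made for this example.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_mog_1d(x, means, variances, weights, n_iter=200, tol=1e-8):
    """EM for a 1-D mixture of Gaussians, starting from the given initial parameters."""
    means, variances, weights = (np.array(a, dtype=float)
                                 for a in (means, variances, weights))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: expected value of the hidden variables, i.e. the probability
        # that component k generated sample i (M samples x K components)
        joint = weights * gaussian_pdf(x[:, None], means, variances)
        ll = np.sum(np.log(joint.sum(axis=1)))       # log-likelihood of current parameters
        resp = joint / joint.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters using the responsibilities as soft counts
        Nk = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / Nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / Nk
        weights = Nk / len(x)

        # Convergence test: stop when the log-likelihood stops improving
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, variances, weights
```

Different initial guesses can lead to different local maxima, so in practice the algorithm is often restarted from several initializations.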
