
Commonly Used Classification Techniques and Recent Developments


Presentation Transcript


  1. Commonly Used Classification Techniques and Recent Developments Presented by Ke-Shiuan Lynn

  2. Terminology A classifier can be viewed as a function block: it maps an input vector (the feature) to an output (the class). A classifier assigns one class to each point of the input space. The input space is thus partitioned into disjoint subsets, called decision regions, each associated with a class.

  3. Terminology (cont.) [Figure: a two-dimensional input space (Input #1 vs. Input #2) containing inputs of class A and class B, showing decision regions separated by decision boundaries.] The way a classifier classifies inputs is defined by its decision regions. The borderlines between decision regions are called decision-region boundaries, or simply decision boundaries.

  4. Terminology (cont.) In practice, input vectors of different classes are rarely so neatly distinguishable: samples of different classes may have the same input vector. Due to such uncertainty, areas of the input space can be clouded by a mixture of samples of different classes. [Figure: overlapping class samples in the Input #1 vs. Input #2 plane.]

  5. Terminology (cont.) • The optimal classifier is the one expected to produce the fewest misclassifications. • Such misclassifications are due to uncertainty in the problem rather than a deficiency in the decision regions.

  6. Terminology (cont.) • A designed classifier is said to generalize well if it achieves similar classification accuracy on both the training samples and real-world samples.

  7. Types of Models • Decision-Region Boundaries • Probability Density Functions • Posterior Probabilities

  8. Decision-Region Boundaries • This type of model defines decision regions by explicitly constructing boundaries in the input space. • These models attempt to minimize the expected number of misclassifications by placing boundaries appropriately in the input space.

  9. Probability Density Functions (PDFs) • Models of this type attempt to construct, for each class C, a class-conditional probability density function p(x|C) over points x in the input space. • The prior probabilities p(C) are estimated from the given database. • The model assigns the most probable class to an input vector x by selecting the class that maximizes p(C)p(x|C).

  10. Posterior Probabilities • Let there be m possible classes, denoted C1, C2, …, Cm. Models of this type attempt to generate m posterior probabilities p(Ci|x), i = 1, 2, …, m, for any input vector x. • The input vector is then assigned to the class with the maximal p(Ci|x).
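To make slides 9 and 10 concrete, here is a minimal sketch in Python. It assumes two hypothetical classes with known one-dimensional Gaussian class-conditional densities and priors (all numbers are invented for illustration); maximizing p(C)p(x|C) and maximizing the normalized posterior p(C|x) select the same class.

```python
# A minimal sketch of PDF- and posterior-based classification, assuming
# two classes with known 1-D Gaussian densities (hypothetical numbers).
from scipy.stats import norm

priors = {"A": 0.6, "B": 0.4}                            # p(C)
densities = {"A": norm(0.0, 1.0), "B": norm(2.0, 1.5)}   # p(x|C)

def classify(x):
    # Joint scores p(C) * p(x|C) for each class
    joint = {c: priors[c] * densities[c].pdf(x) for c in priors}
    # Normalizing the joint scores yields the posteriors p(C|x)
    total = sum(joint.values())
    posteriors = {c: v / total for c, v in joint.items()}
    # Assign the class with the maximal posterior
    return max(posteriors, key=posteriors.get), posteriors

label, posteriors = classify(1.0)
print(label, posteriors)
```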

  11. Approaches to Modeling • Fixed models • Parametric models • Nonparametric models

  12. Fixed Models A fixed model is used when the exact input-output relationship is known. • Decision-region boundary: a known threshold value (e.g., a particular BMI value for defining obesity) • PDF: when each class's PDF can be obtained • Posterior probability: when the probability that any observation belongs to each class is known

  13. Parametric Models • A parametric model is used when an appropriate parametric mathematical form can be derived. • The development of such models consists of two stages: (1) derive an appropriate parametric form, and (2) tune the parameters to fit the data.

  14. Parametric Models (cont.) • Decision-region boundary: linear discriminant function, e.g., y = ax1 + bx2 + cx3 + d • PDF: multivariate Gaussian function • Posterior probability: logistic regression
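As a concrete example of the first item, the sketch below fits a linear discriminant boundary with scikit-learn; the toy data set and its two classes are hypothetical.

```python
# A sketch of a parametric boundary model: linear discriminant analysis
# learns a linear function of the inputs, like y = ax1 + bx2 + ... + d.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 6.0]])  # toy inputs
y = np.array([0, 0, 1, 1])                                       # two classes

model = LinearDiscriminantAnalysis().fit(X, y)
print(model.coef_, model.intercept_)   # the fitted boundary parameters
print(model.predict([[4.0, 4.0]]))     # class assigned to a new point
```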

  15. Nonparametric Models • A nonparametric model is used when the relationships between input vectors and their associated classes are not well understood. • Models of varying smoothness and complexity are generated, and the one with the best generalization is chosen.

  16. Nonparametric Models (cont.) • Decision-region boundary: learning vector quantization (LVQ), k-nearest-neighbor classifier, decision tree • PDF: Gaussian mixture methods, Parzen's window • Posterior probability: artificial neural network (ANN), radial basis function (RBF) network, group method of data handling (GMDH)
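As one nonparametric PDF example, here is a sketch of a Parzen-window classifier built from scikit-learn's kernel density estimator; the samples and the bandwidth are hypothetical.

```python
# A sketch of a Parzen-window model: a kernel density estimate of p(x|C)
# per class, combined with the class priors. Data/bandwidth hypothetical.
import numpy as np
from sklearn.neighbors import KernelDensity

X_a = np.array([[0.0], [0.5], [1.0]])   # samples of class A
X_b = np.array([[2.0], [2.5], [3.0]])   # samples of class B

kde_a = KernelDensity(bandwidth=0.5).fit(X_a)   # log p(x|A) via score_samples
kde_b = KernelDensity(bandwidth=0.5).fit(X_b)
prior_a = len(X_a) / (len(X_a) + len(X_b))      # p(A) from sample counts

def classify(x):
    point = np.array([[x]])
    score_a = np.log(prior_a) + kde_a.score_samples(point)[0]
    score_b = np.log(1.0 - prior_a) + kde_b.score_samples(point)[0]
    return "A" if score_a > score_b else "B"

print(classify(1.2))
```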

  17. Commonly Used Algorithms

  18. Practical Constraints • Memory usage • Training time • Classification time

  19. Memory Usage

  20. Training Time

  21. Classification Time

  22. Comparison of Algorithms Linear regression: y = w0 + w1x1 + w2x2 + … + wNxN Logistic regression: y = 1 / (1 + exp(-(w0 + w1x1 + w2x2 + … + wNxN))) • Linear and logistic regression both tend to explicitly construct the decision-region boundaries. • Advantages: easy implementation; easy explanation of the input-output relationship • Disadvantages: limited complexity of the constructed boundary
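A direct rendering of the two formulas above in Python; the weight values are hypothetical.

```python
# The two formulas from this slide, written out directly.
import numpy as np

w0 = 0.1                          # bias weight (hypothetical)
w = np.array([0.5, 1.2, -0.7])    # w1..wN (hypothetical)

def linear(x):
    # y = w0 + w1*x1 + w2*x2 + ... + wN*xN
    return w0 + np.dot(w, x)

def logistic(x):
    # Squashes the linear score into (0, 1); threshold (e.g. 0.5) to classify
    return 1.0 / (1.0 + np.exp(-linear(x)))

x = np.array([1.0, 0.5, 2.0])
print(linear(x), logistic(x))
```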

  23. Comparison of Algorithms (cont.) Binary decision tree: [Figure: a binary tree whose root and internal nodes split on threshold tests such as xi >= c1 vs. xi < c1, xj >= c2 vs. xj < c2, and xk >= c3 vs. xk < c3.] • Binary and linear decision trees also tend to explicitly construct the decision-region boundaries. • Advantages: easy implementation; easy explanation of the input-output relationship • Disadvantages: limited complexity of the constructed boundary; the tree structure may not be globally optimal.
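A sketch of such a threshold-splitting tree with scikit-learn, on a hypothetical toy data set; export_text prints the learned per-node threshold tests.

```python
# A sketch of a binary decision tree that splits on thresholds like xi >= ci.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[1, 5], [2, 4], [6, 1], [7, 2], [3, 8], [8, 7]])  # toy inputs
y = np.array([0, 0, 1, 1, 0, 1])                                 # toy classes

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))  # the threshold tests
```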

  24. Comparison of Algorithms (cont.) Neural networks: • Feedforward neural networks and radial-basis function networks both tend to implicitly construct the decision-region boundaries. • Advantages: they can approximate arbitrarily complex decision boundaries, provided that enough nodes are used. • Disadvantages: long training time
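A sketch of a small feedforward network on the XOR problem, whose classes no linear boundary can separate; the layer size and iteration budget are hypothetical, and convergence may vary with them.

```python
# A sketch of a feedforward neural network bending a nonlinear boundary.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])   # XOR: not separable by any linear boundary

mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
mlp.fit(X, y)                # training is the slow part for larger problems
print(mlp.predict(X))        # with enough nodes the boundary can fit XOR
```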

  25. Comparison of Algorithms (cont.) • Support vector machine (SVM) • The support vector machine also tends to implicitly construct the decision-region boundaries. • Advantages: this type of classifier has been shown to have good generalization capability.
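A sketch of an SVM classifier with scikit-learn; the RBF kernel choice and the toy data are hypothetical.

```python
# A sketch of a support vector machine; the boundary is defined implicitly
# by the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 6.0]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="rbf", C=1.0).fit(X, y)
print(svm.predict([[4.0, 4.0]]))
print(svm.support_vectors_)   # the samples that pin down the boundary
```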

  26. Comparison of Algorithms (cont.) Bayes' rule: P(Cj|x) = p(x|Cj)P(Cj) / p(x) Unimodal Gaussian: p(x|Cj) is modeled as a single multivariate Gaussian per class. • The unimodal Gaussian model explicitly constructs the PDF, then computes the prior probability P(Cj) and the posterior probability P(Cj|x). • Advantages: easy implementation; a confidence level can be obtained from the posterior probabilities. • Disadvantages: sample distributions may not be Gaussian.
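A sketch of this classifier: fit one Gaussian per class from hypothetical samples, then apply Bayes' rule to obtain the posterior.

```python
# A sketch of a unimodal-Gaussian classifier with Bayes' rule.
import numpy as np
from scipy.stats import multivariate_normal

X_a = np.array([[1.0, 2.0], [2.0, 1.0], [1.0, 1.0]])  # class A samples (toy)
X_b = np.array([[6.0, 5.0], [7.0, 6.0], [6.0, 6.0]])  # class B samples (toy)

# Fit p(x|Cj): one Gaussian per class via sample mean and covariance
pdf_a = multivariate_normal(X_a.mean(axis=0), np.cov(X_a.T))
pdf_b = multivariate_normal(X_b.mean(axis=0), np.cov(X_b.T))
prior_a = len(X_a) / (len(X_a) + len(X_b))            # P(A)

def posterior_a(x):
    # Bayes' rule: P(A|x) = p(x|A)P(A) / (p(x|A)P(A) + p(x|B)P(B))
    ja = pdf_a.pdf(x) * prior_a
    jb = pdf_b.pdf(x) * (1.0 - prior_a)
    return ja / (ja + jb)

print(posterior_a([2.0, 2.0]))   # confidence that the point is class A
```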

  27. Comparison of Algorithms (cont.) • Gaussian mixtures modify the unimodal Gaussian model so that the PDF is estimated by a weighted average of multiple Gaussians. • Similarly to Gaussian mixtures, Parzen's windows approximate the PDF using a weighted average of radial Gaussians. • Advantage: given enough Gaussian components, the above architectures can approximate arbitrarily complex distributions.
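A sketch of a Gaussian-mixture density model for one class; the component count and the synthetic two-clump data are hypothetical. One such model per class combines with the priors exactly as in the unimodal sketch above.

```python
# A sketch of a Gaussian-mixture PDF estimate for a single class.
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic one-class data drawn from two clumps, to motivate a mixture
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.score_samples([[0.0, 0.0], [5.0, 5.0]]))  # log p(x|C) estimates
```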

  28. Comparison of Algorithms (cont.) k-nearest-neighbor classifier • The k-nearest-neighbor classifier tends to construct the posterior probabilities P(Cj|x). • Advantages: no training is required; a confidence level can be obtained • Disadvantages: classification accuracy is low if a complex decision-region boundary exists; large storage is required.
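A sketch with scikit-learn; k and the toy data are hypothetical. The neighbor vote fractions from predict_proba serve as the confidence levels mentioned above.

```python
# A sketch of a k-nearest-neighbor classifier; "fitting" only stores X.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.5, 2.0], [6.0, 6.0], [7.0, 5.5], [6.5, 6.5]])
y = np.array([0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2.0, 2.0]]))        # majority vote among the 3 neighbors
print(knn.predict_proba([[2.0, 2.0]]))  # vote fractions as P(Cj|x) estimates
```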

  29. Other Useful Classifiers • Projection pursuit aims to decompose the task of high-dimensional modeling into a sequence of low-dimensional modeling tasks. • The algorithm consists of two stages: the first stage projects the input data onto a one-dimensional space, while the second stage constructs the mapping from the projected space to the output space.

  30. Other Useful Classifiers (cont.) • Multivariate adaptive regression splines (MARS) approximate the decision-region boundaries in two stages. • At the first stage, the algorithm partitions the state space into small portions. • At the second stage, the algorithm constructs a low-order polynomial to approximate the decision-region boundary within each partition. • Disadvantage: this algorithm is intractable for problems with high-dimensional (> 10) inputs.

  31. Other Useful Classifiers (cont.) • Group method of data handling (GMDH) also aims to approximate the decision-region boundaries using high-order polynomial functions. • The modeling process begins with a low-order polynomial, and then iteratively combines terms to produce a higher-order polynomial until the modeling accuracy saturates.

  32. Keep The Following In Mind • Use multiple algorithms without bias and let your specific data help determine which model is best suited to your problem. • Occam's razor: "Entities should not be multiplied unnecessarily." When you have two competing models that make exactly the same predictions on the data, the simpler one is the better choice.

  33. A New Member In Our Group
