
CES 514 – Data Mining Lecture 8 classification (contd…)


Presentation Transcript


  1. CES 514 – Data Mining Lecture 8: classification (contd…)

  2. Example: PEBLS • PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg) • Works with both continuous and nominal features • For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM) • Each record is assigned a weight factor • Number of nearest neighbors: k = 1

  3. Example: PEBLS • Distance between nominal attribute values (MVDM): d(V1,V2) = Σi | n1i/n1 – n2i/n2 |, where nj is the number of records with value Vj and nji is the number of those records in class i • d(Single,Married) = | 2/4 – 0/4 | + | 2/4 – 4/4 | = 1 • d(Single,Divorced) = | 2/4 – 1/2 | + | 2/4 – 1/2 | = 0 • d(Married,Divorced) = | 0/4 – 1/2 | + | 4/4 – 1/2 | = 1 • d(Refund=Yes,Refund=No) = | 0/3 – 3/7 | + | 3/3 – 4/7 | = 6/7
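
The MVDM values above can be reproduced with a short sketch. This is not from the slides; it is a minimal Python illustration, assuming the class counts implied by the example (Single: 2 Yes / 2 No, Married: 0 Yes / 4 No, Divorced: 1 Yes / 1 No):

```python
# Minimal sketch of the Modified Value Difference Metric (MVDM) for nominal
# attribute values, assuming the per-class counts shown in the slide example.

def mvdm(counts_v1, counts_v2):
    """Distance between two nominal values.

    counts_v1, counts_v2: dicts mapping class label -> number of records
    having that attribute value and that class.
    """
    n1 = sum(counts_v1.values())
    n2 = sum(counts_v2.values())
    classes = set(counts_v1) | set(counts_v2)
    return sum(abs(counts_v1.get(c, 0) / n1 - counts_v2.get(c, 0) / n2)
               for c in classes)

# Class counts taken from the slide (class = Yes / No per Marital Status)
single   = {"Yes": 2, "No": 2}   # 4 records
married  = {"Yes": 0, "No": 4}   # 4 records
divorced = {"Yes": 1, "No": 1}   # 2 records

print(mvdm(single, married))     # 1.0
print(mvdm(single, divorced))    # 0.0
print(mvdm(married, divorced))   # 1.0
```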

  4. Example: PEBLS • Distance between record X and record Y: Δ(X,Y) = wX wY Σi d(Xi,Yi)² • where wX = (number of times X is used for prediction) / (number of times X predicts correctly) • wX ≈ 1 if X makes accurate predictions most of the time • wX > 1 if X is not reliable for making predictions
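
A minimal sketch of this record-level distance, assuming per-attribute distance functions (such as MVDM for nominal attributes) and the weights wX, wY are already available; the helper name record_distance and the toy usage are illustrative:

```python
# Sketch of PEBLS' distance between records X and Y:
#   Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2
# attr_dist is a list of per-attribute distance functions; w_x and w_y are the
# exemplar reliability weights (about 1 for reliable exemplars, > 1 otherwise).

def record_distance(x, y, w_x, w_y, attr_dist):
    return w_x * w_y * sum(d(xi, yi) ** 2
                           for d, xi, yi in zip(attr_dist, x, y))

# Toy usage: one nominal attribute with precomputed pairwise MVDM distances
marital = {frozenset({"Single", "Married"}): 1.0,
           frozenset({"Single", "Divorced"}): 0.0}
dist = lambda a, b: 0.0 if a == b else marital[frozenset({a, b})]
print(record_distance(("Single",), ("Married",), 1.0, 1.0, [dist]))  # 1.0
```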

  5. Support Vector Machines • Find a linear hyperplane (decision boundary) that will separate the data

  6. Support Vector Machines • One possible solution

  7. Support Vector Machines • Another possible solution

  8. Support Vector Machines • Other possible solutions

  9. Support Vector Machines • Which one is better, B1 or B2? How do you define “better”?

  10. Support Vector Machines • Find the hyperplane that maximizes the margin (e.g., B1 is better than B2)

  11. Support Vector Machines

  12. Support Vector Machines • We want to maximize the margin, 2/‖w‖ • Which is equivalent to minimizing L(w) = ‖w‖²/2 • But subject to the following constraints: yi (w·xi + b) ≥ 1 for every training record (xi, yi) • This is a constrained optimization problem • Numerical approaches exist to solve it (e.g., quadratic programming)
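
As an illustration (not from the slides), the sketch below poses this constrained problem directly to a general-purpose numerical solver; the toy data and the choice of SciPy's SLSQP method are assumptions, and a real SVM implementation would instead apply a dedicated quadratic-programming routine to the dual problem:

```python
# Minimal sketch of the hard-margin SVM primal as a constrained optimization:
# minimize ||w||^2 / 2 subject to y_i (w.x_i + b) >= 1 for all i.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, two classes labeled +1 / -1
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

def objective(params):
    w = params[:2]
    return 0.5 * np.dot(w, w)            # ||w||^2 / 2

def margin_constraints(params):
    w, b = params[:2], params[2]
    return y * (X @ w + b) - 1           # require y_i (w.x_i + b) - 1 >= 0

result = minimize(objective, x0=np.zeros(3), method="SLSQP",
                  constraints={"type": "ineq", "fun": margin_constraints})
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b)
```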

  13. Overview of optimization • Simplest optimization problem: maximize f(x) (a function of one variable) • If the function has nice properties (e.g., it is differentiable), then we can use calculus to solve the problem: solve the equation f′(x) = 0. If a is a root and f″(a) < 0, then a is a (local) maximum. • Tricky issues: • How do we solve the equation f′(x) = 0? • What if there are many solutions? Each is only a “local” optimum.

  14. How to solve g(x) = 0 • Even polynomial equations are hard to solve in general • A quadratic has a closed-form solution. What about higher degrees? • Numerical techniques (iteration): • bisection • secant • Newton-Raphson, etc. • Challenges: • choosing the initial guess • rate of convergence
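
A minimal Newton-Raphson sketch (not from the slides; the cubic and the starting guess are illustrative) shows the iteration and why the initial guess matters:

```python
# Newton-Raphson iteration for g(x) = 0: repeat x <- x - g(x)/g'(x).

def newton_raphson(g, g_prime, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            return x
        x -= gx / g_prime(x)
    raise RuntimeError("did not converge; try a different initial guess")

# Example: a root of x^3 - 2x - 5 = 0, starting from x0 = 2
root = newton_raphson(lambda x: x**3 - 2*x - 5,
                      lambda x: 3*x**2 - 2,
                      x0=2.0)
print(root)   # approximately 2.0945514815
```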

  15. Functions of several variables • Consider a function of two variables, F(x,y) • To find a maximum (or minimum) of F(x,y), we solve the equations ∂F/∂x = 0 and ∂F/∂y = 0 • If we can solve this system of equations, then we have found a local maximum or minimum of F • We can solve the equations using numerical techniques similar to the one-dimensional case

  16. When is the solution a maximum or a minimum? • Hessian: the matrix of second partial derivatives, H = [ ∂²F/∂x², ∂²F/∂x∂y ; ∂²F/∂y∂x, ∂²F/∂y² ] • If the Hessian is positive definite in the neighborhood of a, then a is a minimum • If the Hessian is negative definite in the neighborhood of a, then a is a maximum • If it is neither, then a is a saddle point
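
In practice the definiteness test can be carried out via the eigenvalues of the Hessian. A minimal sketch (not from the slides), assuming a 2x2 symmetric Hessian evaluated at the critical point; the example matrix is the Hessian of F(x,y) = x² – y² at the origin, a saddle point:

```python
# Classify a critical point by the definiteness of its Hessian:
# all eigenvalues > 0 -> minimum, all < 0 -> maximum, mixed signs -> saddle.

import numpy as np

def classify_critical_point(hessian):
    eigvals = np.linalg.eigvalsh(hessian)   # Hessian is symmetric
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    return "saddle point (or inconclusive if some eigenvalue is 0)"

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
```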

  17. Application - linear regression • Problem: given (x1,y1), …, (xn,yn), find the best linear relation between x and y • Assume y = Ax + B. To find A and B, we minimize E(A,B) = Σi (yi – Axi – B)² • Since this is a function of two variables, we can solve it by setting ∂E/∂A = 0 and ∂E/∂B = 0
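
A minimal least-squares sketch (not from the slides; the data points are made up) that fits y = Ax + B, with NumPy solving the normal equations that result from setting the two partial derivatives to zero:

```python
# Fit y = A*x + B by minimizing E(A, B) = sum_i (y_i - A*x_i - B)^2.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])

# dE/dA = 0 and dE/dB = 0 give the normal equations; lstsq solves them.
design = np.column_stack([x, np.ones_like(x)])   # one column for A, one for B
(A, B), *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"y ≈ {A:.3f} x + {B:.3f}")
```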

  18. Constrained optimization • Maximize f(x,y) subject to g(x,y) = c • Using a Lagrange multiplier λ, the problem is formulated as maximizing h(x,y,λ) = f(x,y) + λ (g(x,y) – c) • Now solve the equations ∂h/∂x = 0, ∂h/∂y = 0, and ∂h/∂λ = 0 (the last equation recovers the constraint g(x,y) = c)
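
A small worked example (not from the slides) makes the mechanics concrete: maximize f(x,y) = xy subject to g(x,y) = x + y = 10. Then h(x,y,λ) = xy + λ(x + y – 10), and solving ∂h/∂x = y + λ = 0, ∂h/∂y = x + λ = 0, ∂h/∂λ = x + y – 10 = 0 gives x = y = 5 (with λ = –5), so the constrained maximum is f = 25.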

  19. Support Vector Machines (contd) • What if the problem is not linearly separable?

  20. Support Vector Machines • What if the problem is not linearly separable? • Introduce slack variables ξi • Need to minimize: L(w) = ‖w‖²/2 + C Σi ξi • Subject to: yi (w·xi + b) ≥ 1 – ξi, with ξi ≥ 0
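
As an illustration (not from the slides), scikit-learn's SVC exposes this soft-margin trade-off through the parameter C, which weights the slack term against the margin; the toy data below is an assumption:

```python
# Soft-margin linear SVM: smaller C tolerates more slack (training errors),
# larger C penalizes violations more heavily.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [3, 6], [6, 5], [7, 8], [8, 6]])
y = np.array([-1, -1, -1, 1, 1, 1, 1])   # not cleanly separable

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print("w =", clf.coef_, "b =", clf.intercept_)
```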

  21. Nonlinear Support Vector Machines • What if the decision boundary is not linear?

  22. Nonlinear Support Vector Machines • Transform the data into a higher-dimensional space
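
A short sketch of the same idea via the kernel trick, which works in the higher-dimensional space implicitly: an RBF-kernel SVM from scikit-learn learns a circular boundary that no linear classifier in the original two dimensions could. The data set here is synthetic and purely illustrative:

```python
# Nonlinear SVM via the RBF kernel on data with a circular class boundary.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)   # inside vs. outside a circle

clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```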

  23. Artificial Neural Networks (ANN) Output Y is 1 if at least two of the three inputs are equal to 1.

  24. Artificial Neural Networks (ANN)

  25. Artificial Neural Networks (ANN) • Perceptron model: the model is an assembly of inter-connected nodes and weighted links • The output node sums its input values according to the weights of its links • The weighted sum is compared against a threshold t: Y = I( Σi wi Xi – t > 0 ), or equivalently Y = sign( Σi wi Xi – t )
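
A minimal perceptron sketch, assuming one particular choice of weights and threshold (0.3 on each input and t = 0.4) that realizes the earlier "at least two of the three inputs are 1" example; these numbers are an assumption, not necessarily the ones on the slide:

```python
# Perceptron: weighted sum of inputs compared against a threshold t.

import numpy as np

def perceptron(x, w, t):
    return 1 if np.dot(w, x) - t > 0 else 0   # Y = I(sum_i w_i x_i - t > 0)

w = np.array([0.3, 0.3, 0.3])
for x in [(1, 1, 0), (1, 0, 0), (1, 1, 1)]:
    print(x, "->", perceptron(np.array(x), w, t=0.4))   # 1, 0, 1
```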

  26. General Structure of ANN • Training an ANN means learning the weights of the neurons

  27. Algorithm for learning ANN • Initialize the weights (w0, w1, …, wk) • Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples • Objective function: E = Σi [ Yi – f(w, Xi) ]² • Find the weights wi that minimize the objective function above • e.g., the backpropagation algorithm
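
As a hedged stand-in for full backpropagation (this is not the slides' algorithm), the sketch below learns the weights of a single sigmoid unit by gradient descent on the squared-error objective E(w) = Σi (Yi – f(w, Xi))²; the data, learning rate, and iteration count are illustrative:

```python
# Gradient descent on E(w) = sum_i (y_i - sigmoid(w . x_i))^2 for one unit.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # hidden target rule

w = np.zeros(3)
lr = 0.5
for _ in range(1000):
    pred = sigmoid(X @ w)
    grad = -2 * X.T @ ((y - pred) * pred * (1 - pred))   # dE/dw
    w -= lr * grad / len(X)

print("learned weights:", w)
```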

  28. WEKA

  29. WEKA implementations • WEKA has implementations of all the major data mining algorithms, including: • decision trees (CART, C4.5, etc.) • the naïve Bayes algorithm and its variants • nearest-neighbor classifiers • linear classifiers • support vector machines • clustering algorithms • boosting algorithms, etc.
