
CES 514 – Data Mining Lecture 8 classification (contd…)


Presentation Transcript


  1. CES 514 – Data Mining Lecture 8: classification (contd…)

  2. Example: PEBLS • PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg) • Works with both continuous and nominal features • For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM) • Each record is assigned a weight factor • Number of nearest neighbors: k = 1

  3. Example: PEBLS • Distance between nominal attribute values (MVDM): d(V1,V2) = Σi | n1i/n1 – n2i/n2 |, where nj is the number of records with value Vj and nji is the number of those records in class i • d(Single,Married) = | 2/4 – 0/4 | + | 2/4 – 4/4 | = 1 • d(Single,Divorced) = | 2/4 – 1/2 | + | 2/4 – 1/2 | = 0 • d(Married,Divorced) = | 0/4 – 1/2 | + | 4/4 – 1/2 | = 1 • d(Refund=Yes,Refund=No) = | 0/3 – 3/7 | + | 3/3 – 4/7 | = 6/7
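
The MVDM values above can be reproduced with a short sketch. This is not from the slides; it is a minimal Python illustration, assuming the class counts implied by the example (Single: 2 Yes / 2 No, Married: 0 Yes / 4 No, Divorced: 1 Yes / 1 No):

```python
# Minimal sketch of the Modified Value Difference Metric (MVDM) for nominal
# attribute values, assuming the per-class counts shown in the slide example.

def mvdm(counts_v1, counts_v2):
    """Distance between two nominal values.

    counts_v1, counts_v2: dicts mapping class label -> number of records
    having that attribute value and that class.
    """
    n1 = sum(counts_v1.values())
    n2 = sum(counts_v2.values())
    classes = set(counts_v1) | set(counts_v2)
    return sum(abs(counts_v1.get(c, 0) / n1 - counts_v2.get(c, 0) / n2)
               for c in classes)

# Class counts taken from the slide (class = Yes / No per Marital Status)
single   = {"Yes": 2, "No": 2}   # 4 records
married  = {"Yes": 0, "No": 4}   # 4 records
divorced = {"Yes": 1, "No": 1}   # 2 records

print(mvdm(single, married))     # 1.0
print(mvdm(single, divorced))    # 0.0
print(mvdm(married, divorced))   # 1.0
```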

  4. Example: PEBLS • Distance between record X and record Y: Δ(X,Y) = wX wY Σi d(Xi,Yi)² • where wX = (number of times X is used for prediction) / (number of times X predicts correctly) • wX ≈ 1 if X makes accurate predictions most of the time • wX > 1 if X is not reliable for making predictions
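
A minimal sketch of this record-level distance, assuming per-attribute distance functions (such as MVDM for nominal attributes) and the weights wX, wY are already available; the helper name record_distance and the toy usage are illustrative:

```python
# Sketch of PEBLS' distance between records X and Y:
#   Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2
# attr_dist is a list of per-attribute distance functions; w_x and w_y are the
# exemplar reliability weights (about 1 for reliable exemplars, > 1 otherwise).

def record_distance(x, y, w_x, w_y, attr_dist):
    return w_x * w_y * sum(d(xi, yi) ** 2
                           for d, xi, yi in zip(attr_dist, x, y))

# Toy usage: one nominal attribute with precomputed pairwise MVDM distances
marital = {frozenset({"Single", "Married"}): 1.0,
           frozenset({"Single", "Divorced"}): 0.0}
dist = lambda a, b: 0.0 if a == b else marital[frozenset({a, b})]
print(record_distance(("Single",), ("Married",), 1.0, 1.0, [dist]))  # 1.0
```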

  5. Support Vector Machines • Find a linear hyperplane (decision boundary) that will separate the data

  6. Support Vector Machines • One possible solution

  7. Support Vector Machines • Another possible solution

  8. Support Vector Machines • Other possible solutions

  9. Support Vector Machines • Which one is better, B1 or B2? How do you define “better”?

  10. Support Vector Machines • Find the hyperplane that maximizes the margin (e.g., B1 is better than B2)

  11. Support Vector Machines

  12. Support Vector Machines • We want to maximize the margin, 2/‖w‖ • Which is equivalent to minimizing L(w) = ‖w‖²/2 • But subject to the following constraints: yi (w·xi + b) ≥ 1 for every training record (xi, yi) • This is a constrained optimization problem • Numerical approaches exist to solve it (e.g., quadratic programming)
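
As an illustration (not from the slides), the sketch below poses this constrained problem directly to a general-purpose numerical solver; the toy data and the choice of SciPy's SLSQP method are assumptions, and a real SVM implementation would instead apply a dedicated quadratic-programming routine to the dual problem:

```python
# Minimal sketch of the hard-margin SVM primal as a constrained optimization:
# minimize ||w||^2 / 2 subject to y_i (w.x_i + b) >= 1 for all i.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, two classes labeled +1 / -1
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

def objective(params):
    w = params[:2]
    return 0.5 * np.dot(w, w)            # ||w||^2 / 2

def margin_constraints(params):
    w, b = params[:2], params[2]
    return y * (X @ w + b) - 1           # require y_i (w.x_i + b) - 1 >= 0

result = minimize(objective, x0=np.zeros(3), method="SLSQP",
                  constraints={"type": "ineq", "fun": margin_constraints})
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b)
```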

  13. Overview of optimization • Simplest optimization problem: maximize f(x) (a function of one variable) • If the function has nice properties (e.g., it is differentiable), then we can use calculus to solve the problem: solve the equation f′(x) = 0. If a is a root and f″(a) < 0, then a is a (local) maximum. • Tricky issues: • How do we solve the equation f′(x) = 0? • What if there are many solutions? Each is only a “local” optimum.

  14. How to solve g(x) = 0 • Even polynomial equations are hard to solve in general • A quadratic has a closed-form solution. What about higher degrees? • Numerical techniques (iteration): • bisection • secant • Newton-Raphson, etc. • Challenges: • choosing the initial guess • rate of convergence
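
A minimal Newton-Raphson sketch (not from the slides; the cubic and the starting guess are illustrative) shows the iteration and why the initial guess matters:

```python
# Newton-Raphson iteration for g(x) = 0: repeat x <- x - g(x)/g'(x).

def newton_raphson(g, g_prime, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            return x
        x -= gx / g_prime(x)
    raise RuntimeError("did not converge; try a different initial guess")

# Example: a root of x^3 - 2x - 5 = 0, starting from x0 = 2
root = newton_raphson(lambda x: x**3 - 2*x - 5,
                      lambda x: 3*x**2 - 2,
                      x0=2.0)
print(root)   # approximately 2.0945514815
```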

  15. Functions of several variables • Consider a function of two variables, F(x,y) • To find a maximum (or minimum) of F(x,y), we solve the equations ∂F/∂x = 0 and ∂F/∂y = 0 • If we can solve this system of equations, then we have found a local maximum or minimum of F • We can solve the equations using numerical techniques similar to the one-dimensional case

  16. When is the solution a maximum or a minimum? • Hessian: the matrix of second partial derivatives, H = [ ∂²F/∂x², ∂²F/∂x∂y ; ∂²F/∂y∂x, ∂²F/∂y² ] • If the Hessian is positive definite in the neighborhood of a, then a is a minimum • If the Hessian is negative definite in the neighborhood of a, then a is a maximum • If it is neither, then a is a saddle point
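
In practice the definiteness test can be carried out via the eigenvalues of the Hessian. A minimal sketch (not from the slides), assuming a 2x2 symmetric Hessian evaluated at the critical point; the example matrix is the Hessian of F(x,y) = x² – y² at the origin, a saddle point:

```python
# Classify a critical point by the definiteness of its Hessian:
# all eigenvalues > 0 -> minimum, all < 0 -> maximum, mixed signs -> saddle.

import numpy as np

def classify_critical_point(hessian):
    eigvals = np.linalg.eigvalsh(hessian)   # Hessian is symmetric
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    return "saddle point (or inconclusive if some eigenvalue is 0)"

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
```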

  17. Application - linear regression • Problem: given (x1,y1), …, (xn,yn), find the best linear relation between x and y • Assume y = Ax + B. To find A and B, we minimize E(A,B) = Σi (yi – Axi – B)² • Since this is a function of two variables, we can solve it by setting ∂E/∂A = 0 and ∂E/∂B = 0
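
A minimal least-squares sketch (not from the slides; the data points are made up) that fits y = Ax + B, with NumPy solving the normal equations that result from setting the two partial derivatives to zero:

```python
# Fit y = A*x + B by minimizing E(A, B) = sum_i (y_i - A*x_i - B)^2.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])

# dE/dA = 0 and dE/dB = 0 give the normal equations; lstsq solves them.
design = np.column_stack([x, np.ones_like(x)])   # one column for A, one for B
(A, B), *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"y ≈ {A:.3f} x + {B:.3f}")
```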

  18. Constrained optimization • Maximize f(x,y) subject to g(x,y) = c • Using a Lagrange multiplier λ, the problem is formulated as maximizing h(x,y,λ) = f(x,y) + λ (g(x,y) – c) • Now solve the equations ∂h/∂x = 0, ∂h/∂y = 0, and ∂h/∂λ = 0 (the last equation recovers the constraint g(x,y) = c)
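
A small worked example (not from the slides) makes the mechanics concrete: maximize f(x,y) = xy subject to g(x,y) = x + y = 10. Then h(x,y,λ) = xy + λ(x + y – 10), and solving ∂h/∂x = y + λ = 0, ∂h/∂y = x + λ = 0, ∂h/∂λ = x + y – 10 = 0 gives x = y = 5 (with λ = –5), so the constrained maximum is f = 25.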

  19. Support Vector Machines (contd) • What if the problem is not linearly separable?

  20. Support Vector Machines • What if the problem is not linearly separable? • Introduce slack variables ξi • Need to minimize: L(w) = ‖w‖²/2 + C Σi ξi • Subject to: yi (w·xi + b) ≥ 1 – ξi, with ξi ≥ 0
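
As an illustration (not from the slides), scikit-learn's SVC exposes this soft-margin trade-off through the parameter C, which weights the slack term against the margin; the toy data below is an assumption:

```python
# Soft-margin linear SVM: smaller C tolerates more slack (training errors),
# larger C penalizes violations more heavily.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [3, 6], [6, 5], [7, 8], [8, 6]])
y = np.array([-1, -1, -1, 1, 1, 1, 1])   # not cleanly separable

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print("w =", clf.coef_, "b =", clf.intercept_)
```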

  21. Nonlinear Support Vector Machines • What if the decision boundary is not linear?

  22. Nonlinear Support Vector Machines • Transform the data into a higher-dimensional space
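
A short sketch of the same idea via the kernel trick, which works in the higher-dimensional space implicitly: an RBF-kernel SVM from scikit-learn learns a circular boundary that no linear classifier in the original two dimensions could. The data set here is synthetic and purely illustrative:

```python
# Nonlinear SVM via the RBF kernel on data with a circular class boundary.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)   # inside vs. outside a circle

clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```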

  23. Artificial Neural Networks (ANN) Output Y is 1 if at least two of the three inputs are equal to 1.

  24. Artificial Neural Networks (ANN)

  25. Artificial Neural Networks (ANN) • Perceptron model: the model is an assembly of inter-connected nodes and weighted links • The output node sums its input values according to the weights of its links • The weighted sum is compared against a threshold t: Y = I( Σi wi Xi – t > 0 ), or equivalently Y = sign( Σi wi Xi – t )
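
A minimal perceptron sketch, assuming one particular choice of weights and threshold (0.3 on each input and t = 0.4) that realizes the earlier "at least two of the three inputs are 1" example; these numbers are an assumption, not necessarily the ones on the slide:

```python
# Perceptron: weighted sum of inputs compared against a threshold t.

import numpy as np

def perceptron(x, w, t):
    return 1 if np.dot(w, x) - t > 0 else 0   # Y = I(sum_i w_i x_i - t > 0)

w = np.array([0.3, 0.3, 0.3])
for x in [(1, 1, 0), (1, 0, 0), (1, 1, 1)]:
    print(x, "->", perceptron(np.array(x), w, t=0.4))   # 1, 0, 1
```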

  26. General Structure of ANN • Training an ANN means learning the weights of the neurons

  27. Algorithm for learning ANN • Initialize the weights (w0, w1, …, wk) • Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples • Objective function: E = Σi [ Yi – f(w, Xi) ]² • Find the weights wi that minimize the objective function above • e.g., the backpropagation algorithm
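
As a hedged stand-in for full backpropagation (this is not the slides' algorithm), the sketch below learns the weights of a single sigmoid unit by gradient descent on the squared-error objective E(w) = Σi (Yi – f(w, Xi))²; the data, learning rate, and iteration count are illustrative:

```python
# Gradient descent on E(w) = sum_i (y_i - sigmoid(w . x_i))^2 for one unit.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # hidden target rule

w = np.zeros(3)
lr = 0.5
for _ in range(1000):
    pred = sigmoid(X @ w)
    grad = -2 * X.T @ ((y - pred) * pred * (1 - pred))   # dE/dw
    w -= lr * grad / len(X)

print("learned weights:", w)
```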

  28. WEKA

  29. WEKA implementations • WEKA has implementations of all the major data mining algorithms, including: • decision trees (CART, C4.5, etc.) • the naïve Bayes algorithm and its variants • nearest-neighbor classifiers • linear classifiers • support vector machines • clustering algorithms • boosting algorithms, etc.
