Outline

Outline • K-Nearest Neighbor algorithm • Fuzzy Set theory • Classifier Accuracy Measures

Eager Learners: when given a set of training tuples, will construct a generalization model before receiving new tuples to classify Classification by decision tree induction Rule-based classification Classification by back propagation Support Vector Machines (SVM) Associative classification Chapter 6. Classification and Prediction

Lazy vs. Eager Learning • Lazy vs. eager learning • Lazy learning (e.g., instance-based learning): Simply stores training data (or only minor processing) and waits until it is given a test tuple • Eager learning (the above discussed methods): Given a set of training set, constructs a classification model before receiving new (e.g., test) data to classify • Lazy: less time in training but more time in predicting

Lazy Learner: Instance-Based Methods • Typical approaches • k-nearest neighbor approach • Instances represented as points in a Euclidean space.

The k-Nearest Neighbor Algorithm • All instances correspond to points in the n-D space • The nearest neighbor are defined in terms of Euclidean distance, dist(X1, X2) • Target function could be discrete- or real- valued • For discrete-valued, k-NN returns the most common value among the k training examples nearest to xq _ _ _ . _ + + _ + xq _ +

The k-Nearest Neighbor Algorithm • k-NN for real-valued prediction for a given unknown tuple • Returns the mean values of the k nearest neighbors • Distance-weighted nearest neighbor algorithm • Weight the contribution of each of the k neighbors according to their distance to the query xq • Give greater weight to closer neighbors • Robust to noisy data by averaging k-nearest neighbors

The k-Nearest Neighbor Algorithm • How can I determine the value of k, the number of neighbors? • In general, the larger the number of training tuples is, the larger the value of k is • Nearest-neighbor classifiers can be extremely slow when classifying test tuples O(n) • By simple presorting and arranging the stored tuples into search tree, the number of comparisons can be reduced to O(logN)

The k-Nearest Neighbor Algorithm • Example: K=5

Fuzzy Set Approaches • Rule-based systems for classification have the disadvantage that they involve sharp cutoffs for continuous attributes • For example: IF (years_employed>2) AND (income>50K) THEN credit_card=approved What if a customer has 10 years employed and income is 49K?

Fuzzy Set Approaches • Instead, we can discretize income into categories such as {low,medium,high}, and then apply fuzzy logic to allow “fuzzy” threshold for each category

Fuzzy Set Approaches • Fuzzy theory is also known as possibility theory, it was proposed by Lotif Zadeh in 1965 • Unlike the notion of traditional “crisp” sets where an element either belongs to a set S, in fuzzy theory, elements can belong to more than one fuzzy set

Fuzzy Set Approaches • For example, the income value $49K belongs to both the medium and high fuzzy sets: Mmedium($49K)=0.15 and Mhigh($49K)=0.96

Fuzzy Set Approaches Another example for temperature

Fuzzy Set Applications • http://www.dementia.org/~julied/logic/applications.html

Classifier Accuracy Measures

Classifier Accuracy Measures • Alternative accuracy measures (e.g., for cancer diagnosis) sensitivity = t-pos/pos specificity = t-neg/neg precision = t-pos/(t-pos + f-pos) accuracy =

Outline

Outline

Presentation Transcript

Outline

Outline

Outline

Outline

Outline

Outline

Outline

outline

outline

OUTLINE

Outline

Outline

Outline

Outline

Outline

Outline

Outline

Outline

Outline:

Outline

Outline

OUTLINE: