
Instance Based Learning


Presentation Transcript


  1. Instance Based Learning

  2. Nearest Neighbor • Remember all your data • When someone asks a question • Find the nearest old data point • Return the answer associated with it • In order to say what point is nearest, we have to define what we mean by "near". • Typically, we use Euclidean distance between two points. Nominal attributes: distance is set to 1 if values are different, 0 if they are equal
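
As a concrete illustration of the procedure above, here is a minimal nearest-neighbor sketch in Python (not from the slides). It assumes each instance is a list of attribute values; attributes whose indices appear in `nominal` use the 0/1 mismatch distance, and the rest contribute their squared numeric difference. The tiny training set at the bottom is made up.

```python
import math

def distance(x, y, nominal=frozenset()):
    """Mixed distance: squared difference for numeric attributes,
    0/1 mismatch for nominal ones, combined under one square root."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in nominal:
            total += 0.0 if a == b else 1.0
        else:
            total += (a - b) ** 2
    return math.sqrt(total)

def nearest_neighbor(query, data, nominal=frozenset()):
    """Return the label of the training point closest to the query.
    `data` is a list of (features, label) pairs."""
    best_label, best_dist = None, float("inf")
    for features, label in data:
        d = distance(query, features, nominal)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Illustrative usage with made-up points
train = [([0.2, 1.0], "yes"), ([0.7, 3.0], "no")]
print(nearest_neighbor([0.3, 2.0], train))   # -> "yes"
```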

  3. Predicting Bankruptcy

  4. Predicting Bankruptcy • Now, let's say we have a new person with R equal to 0.3 and L equal to 2. • What y value should we predict? • The nearest training point to (0.3, 2) is a "no", so our answer would be "no".

  5. Scaling • The naïve Euclidean distance isn't always appropriate. • Consider the case where we have two features describing a car. • f1 = weight in pounds • f2 = number of cylinders. • Any effect of f2 will be completely lost because of the relative scales. • So, rescale the inputs to put all of the features on roughly equal footing, for example as sketched below.
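
The rescaling formula itself did not survive in this transcript; one common choice (an assumption here, not necessarily the slide's formula) is min-max scaling, which maps every feature into the [0, 1] range. The car data below is invented for illustration; standardization (subtract the mean, divide by the standard deviation) would work equally well.

```python
def min_max_scale(column):
    """Rescale one feature column to the [0, 1] range (min-max scaling).
    Note: this is one common choice; the slide's exact formula is not
    shown in the transcript."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]      # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in column]

weights   = [1800, 2400, 3500, 4200]      # f1: weight in pounds (made up)
cylinders = [4, 4, 6, 8]                  # f2: number of cylinders (made up)
print(min_max_scale(weights))             # [0.0, 0.25, 0.708..., 1.0]
print(min_max_scale(cylinders))           # [0.0, 0.0, 0.5, 1.0]
```

After rescaling, a one-unit difference in either feature carries roughly the same weight in the Euclidean distance.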

  6. Time and Space • Learning is fast • We just have to remember the training data. • Space is n. • What takes longer is answering a query. • If we do it naively, then for each of the n points in the training set we compute the distance to the query point, which takes about m operations, since there are m features to compare. • So, overall, a query takes about m * n time.

  7. Noise Someone with an apparently healthy financial record goes bankrupt.

  8. Remedy: K-Nearest Neighbors • k-nearest neighbor algorithm: • Just like the old algorithm, except that when we get a query, we'll search for the k closest points to the query point. • Output what the majority says. • In this case, we've chosen k to be 3. • The three closest points consist of two "no"s and a "yes", so our answer would be "no". • Find the optimal k using cross-validation.
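
A compact k-nearest-neighbors sketch with majority voting, assuming numeric features and Python 3.8+ for `math.dist`; the training points are invented for illustration, and k would normally be chosen by cross-validation as noted above.

```python
from collections import Counter
import math

def k_nearest_neighbors(query, data, k=3):
    """Classify `query` by majority vote among the k training points
    closest to it. `data` is a list of (features, label) pairs."""
    by_distance = sorted(data, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Illustrative: two "no" neighbours outvote one "yes"
train = [([0.2, 1.9], "no"), ([0.35, 2.2], "no"),
         ([0.28, 2.1], "yes"), ([0.9, 5.0], "yes")]
print(k_nearest_neighbors([0.3, 2.0], train, k=3))   # -> "no"
```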

  9. Other Variants • IB2: save memory, speed up classification • Work incrementally • Only incorporate misclassified instances • Problem: noisy data gets incorporated • IB3: deal with noise • Discard instances that don't perform well • Keep a record of the number of correct and incorrect classification decisions that each exemplar makes. • Two predetermined thresholds are set on the success ratio. • If the performance of an exemplar falls below the lower threshold, it is deleted. • If the performance exceeds the upper threshold, it is used for prediction.

  10. Instance-based learning: IB2 • IB2: save memory, speed up classification • Work incrementally • Only incorporate misclassified instances • Problem: noisy data gets incorporated Data: “Who buys gold jewelry” (25,60,no) (45,60,no) (50,75,no) (50,100,no) (50,120,no) (70,110,yes) (85,140,yes) (30,260,yes) (25,400,yes) (45,350,yes) (50,275,yes) (60,260,yes)

  11. Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • (70,110,yes) • (25,400,yes) • (50,100,no) • (45,350,yes) • (50,275,yes) • (60,260,yes) This is the final answer, i.e. we memorize only 5 of these points. However, let's build the classifier step by step.

  12. Instance-based learning: IB2 • Data: • (25,60,no) The first instance is always memorized, since the model starts with an empty memory.

  13. Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) Since so far the model has only the first instance memorized, this second instance gets wrongly classified. So, we memorize it as well.

  14. Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) So far the model has the first two instances memorized. The third instance gets correctly classified, since it happens to be closer to the first (a "no"). So, we don't memorize it.

  15. Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) So far the model has the first two instances memorized. The fourth instance gets correctly classified, since it happens to be closer to the second (a "yes"). So, we don't memorize it.

  16. Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) So far the model has the first two instances memorized. The fifth instance gets correctly classified, since it happens to be closer to the first (a "no"). So, we don't memorize it.

  17. Instance-based learning: IB2 • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) So far the model has the first two instances memorized. The sixth instance gets wrongly classified, since it happens to be closer to the second (a "yes"). So, we memorize it.

  18. Instance-based learning: IB2 • Continuing in a similar way, we finally get the figure on the right. • The colored points are the ones that get memorized. This is the final answer, i.e. we memorize only these 5 points. A code sketch of the whole procedure follows.
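
Here is a sketch of the incremental IB2 loop walked through on the preceding slides, run on the same data in the same order. Ties and small distance differences can make the memorized set differ slightly from the figure, so treat the printed result as illustrative rather than as a reproduction of the slide.

```python
import math

def ib2(stream):
    """IB2 sketch: process instances one by one and memorize only those
    that the current memory misclassifies with 1-NN. The first instance
    is always stored. Noisy points tend to be stored too, which IB3
    addresses."""
    memory = []
    for features, label in stream:
        if memory:
            _, predicted = min(memory, key=lambda m: math.dist(features, m[0]))
            if predicted == label:
                continue                      # correctly classified: skip it
        memory.append((features, label))      # first or misclassified: store it
    return memory

# The "who buys gold jewelry" data in the order used on the slides
data = [((25, 60), "no"),  ((85, 140), "yes"), ((45, 60), "no"),
        ((30, 260), "yes"), ((50, 75), "no"),  ((50, 120), "no"),
        ((70, 110), "yes"), ((25, 400), "yes"), ((50, 100), "no"),
        ((45, 350), "yes"), ((50, 275), "yes"), ((60, 260), "yes")]
print(ib2(data))
```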

  19. Instance-based learning: IB3 • IB3: deal with noise • Discard instances that don't perform well • Keep a record of the number of correct and incorrect classification decisions that each exemplar makes. • Two predetermined thresholds are set on the success ratio. • An instance is used for training: • If the number of incorrect classifications is ≤ the first (lower) threshold, and • If the number of correct classifications is ≥ the second (upper) threshold.

  20. Instance-based learning: IB3 • Suppose the lower threshold is 0, and upper threshold is 1. • Shuffle the data first • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • (70,110,yes) • (25,400,yes) • (50,100,no) • (45,350,yes) • (50,275,yes) • (60,260,yes)

  21. Instance-based learning: IB3 • Suppose the lower threshold is 0, and the upper threshold is 1. • Shuffle the data first • The pair after each instance records its [incorrect, correct] classification counts. • (25,60,no) [1,1] • (85,140,yes) [1,1] • (45,60,no) [0,1] • (30,260,yes) [0,2] • (50,75,no) [0,1] • (50,120,no) [0,1] • (70,110,yes) [0,0] • (25,400,yes) [0,1] • (50,100,no) [0,0] • (45,350,yes) [0,0] • (50,275,yes) [0,1] • (60,260,yes) [0,0]

  22. Instance-based learning: IB3 • The points that will be used in classification are: • (45,60,no) [0,1] • (30,260,yes) [0,2] • (50,75,no) [0,1] • (50,120,no) [0,1] • (25,400,yes) [0,1] • (50,275,yes) [0,1]
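
A thresholded sketch of the IB3 idea as presented on these slides: each incoming instance is classified by its nearest earlier instance, that neighbour's [incorrect, correct] record is updated, and at the end only exemplars meeting both thresholds are kept. The published IB3 uses statistical confidence tests rather than fixed thresholds, and the exact counts depend on processing order and on which exemplars are consulted, so this code will not necessarily reproduce the numbers above.

```python
import math

def ib3_sketch(stream, lower=0, upper=1):
    """Keep only exemplars whose record satisfies
    incorrect <= lower and correct >= upper (thresholded sketch)."""
    seen = []                                  # (features, label, [incorrect, correct])
    for features, label in stream:
        if seen:
            nearest = min(seen, key=lambda s: math.dist(features, s[0]))
            record = nearest[2]
            if nearest[1] == label:
                record[1] += 1                 # neighbour classified this one correctly
            else:
                record[0] += 1                 # neighbour classified this one incorrectly
        seen.append((features, label, [0, 0]))
    return [(f, lab) for f, lab, rec in seen
            if rec[0] <= lower and rec[1] >= upper]
```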

  23. Rectangular generalizations • When a new exemplar is classified correctly, it is generalized by simply merging it with the nearest exemplar. • The nearest exemplar may be either a single instance or a hyper-rectangle.
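
A minimal sketch of the merge step, assuming a hyper-rectangle is stored as a (lower corner, upper corner) pair; a single stored instance is just a degenerate rectangle whose two corners coincide. The example values come from the jewelry data, but the representation is an illustrative choice, not from the slides.

```python
def merge(rect, point):
    """Grow an axis-aligned hyper-rectangle just enough to cover `point`."""
    lower, upper = rect
    new_lower = tuple(min(l, p) for l, p in zip(lower, point))
    new_upper = tuple(max(u, p) for u, p in zip(upper, point))
    return (new_lower, new_upper)

# A correctly classified "yes" at (70, 110) merged with the exemplar (85, 140)
r = ((85, 140), (85, 140))          # degenerate rectangle: a single instance
print(merge(r, (70, 110)))          # -> ((70, 110), (85, 140))
```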

  24. Rectangular generalizations • Data: • (25,60,no) • (85,140,yes) • (45,60,no) • (30,260,yes) • (50,75,no) • (50,120,no) • (70,110,yes) • (25,400,yes) • (50,100,no) • (45,350,yes) • (50,275,yes) • (60,260,yes)

  25. Classification • If the new instance lies within a rectangle, then output that rectangle's class. • If the new instance lies in the overlap of several rectangles, then output the class of the rectangle whose center is closest to the new data instance. • If the new instance lies outside all of the rectangles, output the class of the rectangle closest to the data instance. • The distance of a point from a rectangle is: • If the instance lies within the rectangle, d = 0 • If outside, d = the distance from the closest part of the rectangle, i.e. the distance to some point on the rectangle boundary. [Figure: rectangles of Class 1 and Class 2 with a separation line]
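
The rules above translate directly into code. The sketch below assumes axis-aligned rectangles stored as (lower corner, upper corner) pairs with a class label; `rect_distance` computes the point-to-rectangle distance just described (0 inside, otherwise the distance to the closest boundary point). The rectangles in the usage line are hypothetical.

```python
import math

def rect_distance(point, rect):
    """0 if the point is inside the rectangle, otherwise the Euclidean
    distance to the closest point on its boundary."""
    lower, upper = rect
    gaps = [max(l - p, 0, p - u) for p, l, u in zip(point, lower, upper)]
    return math.sqrt(sum(g * g for g in gaps))

def rect_center(rect):
    lower, upper = rect
    return [(l + u) / 2 for l, u in zip(lower, upper)]

def classify(point, rectangles):
    """`rectangles` is a list of ((lower, upper), label). Containing
    rectangles win, with ties among them broken by distance to their
    centers; otherwise the closest rectangle wins."""
    containing = [(r, lab) for r, lab in rectangles
                  if rect_distance(point, r) == 0]
    if containing:
        return min(containing,
                   key=lambda rl: math.dist(point, rect_center(rl[0])))[1]
    return min(rectangles, key=lambda rl: rect_distance(point, rl[0]))[1]

# Hypothetical rectangles for the two classes
rects = [(((25, 60), (50, 120)), "no"), (((60, 110), (85, 400)), "yes")]
print(classify((58, 115), rects))   # -> "yes" (closer to the "yes" rectangle)
```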
