
Instance Based Learning



Presentation Transcript


  1. Instance Based Learning • IB1 and IBk (find in the text) • An early approach

  2. 1-Nearest Neighbor • Basic distance function between attribute values • If real, the absolute difference • If nominal, d(v1,v2) = 0 if v1 = v2, else 1 • Distance between two instances is the Euclidean distance, i.e. the square root of the sum of squared attribute distances • sqrt( (x1-y1)^2 + … + (xn-yn)^2 ) • May normalize real-valued distances for fairness amongst attributes.
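
A minimal Python sketch of this distance (the names attribute_distance and instance_distance are illustrative, not from IB1 itself); real attributes could be min-max normalized first so each contributes fairly:

import math

def attribute_distance(v1, v2, nominal):
    # Nominal values: 0 if equal, else 1. Real values: absolute difference.
    if nominal:
        return 0.0 if v1 == v2 else 1.0
    return abs(v1 - v2)

def instance_distance(x, y, nominal_flags):
    # Euclidean distance over the per-attribute distances.
    return math.sqrt(sum(attribute_distance(a, b, nom) ** 2
                         for a, b, nom in zip(x, y, nominal_flags)))

# Example with two weather instances: (outlook, temperature, humidity, windy)
print(instance_distance(("sunny", 85, 85, "FALSE"),
                        ("rainy", 70, 96, "FALSE"),
                        (True, False, False, True)))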

  3. Prediction or classification • For instance x, let y be the closest instance to x in the training set. • Predict that the class of x is the class of y. • On some data sets, this is the best algorithm. • In general, there is no best learning algorithm.
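
A hedged sketch of this prediction rule, reusing instance_distance from the sketch above; training instances are assumed here to be (attributes, class) pairs:

def predict_1nn(x, training_set, nominal_flags):
    # Return the class of the training instance closest to x.
    nearest = min(training_set,
                  key=lambda item: instance_distance(x, item[0], nominal_flags))
    return nearest[1]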

  4. Voronoi Diagram

  5. Voronoi Diagram • For each point, draw the boundary of all points closest to it. • Each point’s sphere of influence is convex. • If the data are noisy, this can be bad. • http://www.cs.cornell.edu/Info/People/chew/Delaunay.html - nice applet.

  6. Problems and solutions • Noise: remove bad examples, or use voting • Bad distance measure: use a probability class vector • Memory: remove unneeded examples

  7. Voting schemes • K nearest neighbor • Let the closest k neighbors vote (use odd k) • Kernel K(x,y) – a similarity function • Let everyone vote, with weight decreasing according to K(x,y) • Ex: K(x,y) = e^(-distance(x,y)^2) • Ex: K(x,y) = inner product of x and y • Ex: K(x,y) = inner product of f(x) and f(y), where f is some mapping of x and y into R^n.
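
A rough illustration of the two schemes, again reusing instance_distance from above; the Gaussian-style kernel K(x,y) = e^(-distance(x,y)^2) from the first example is used for the weighted vote:

import math
from collections import Counter, defaultdict

def knn_vote(x, training_set, nominal_flags, k=3):
    # Majority vote among the k closest training instances (use odd k).
    neighbors = sorted(training_set,
                       key=lambda item: instance_distance(x, item[0], nominal_flags))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def kernel_vote(x, training_set, nominal_flags):
    # Everyone votes, weighted by K(x,y) = exp(-distance(x,y)^2).
    votes = defaultdict(float)
    for y, label in training_set:
        votes[label] += math.exp(-instance_distance(x, y, nominal_flags) ** 2)
    return max(votes, key=votes.get)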

  8. Choosing the parameter K • Divide the data into train and test • Run multiple values of k on the training set • Choose the k that does best on the test set.

  9. NOT • This is a serious methodological error • You have used the test data to pick k. • Common in commercial evaluations of systems • Occasionally seen in academic papers

  10. Fix: Internal Cross-validation • This can be used for selecting any parameter. • Divide the data into Train and Test. • Now do 10-fold CV on the training data to determine the appropriate value of k. • Note: never touch the test data.
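
One way to carry this out, sketched with scikit-learn; the slides describe only the idea, so the dataset and the use of GridSearchCV here are assumptions for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 10-fold CV on the training data only; the test data are never touched here.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=10)
search.fit(X_train, y_train)

print("chosen k:", search.best_params_["n_neighbors"])
print("held-out accuracy:", search.score(X_test, y_test))  # reported once, at the end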

  11. Probability Class Vector • Let A be an attribute with values v1, v2, ..., vn • Suppose the classes are C1, C2, ..., Ck • The probability class vector for vi is: <P(C1|A=vi), P(C2|A=vi), ..., P(Ck|A=vi)> • Distance(vi,vj) = distance between the two probability class vectors.

  12. Weather data
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

  13. Distance(sunny,rainy) = ? <P(play=yes|sunny), P(play=no|sunny)> = <2/5, 3/5> = probability class vector for sunny <P(play=yes|rainy), P(play=no|rainy)> = <3/5, 2/5> Distance(sunny,rainy) = sqrt((2/5-3/5)^2 + (3/5-2/5)^2) = (1/5)*sqrt(2). Similarly: distance(sunny,overcast) = d(<2/5,3/5>, <4/4,0/4>) = (3/5)*sqrt(2).
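
A small check of these numbers in Python, using the (outlook, play) pairs from the weather data above (pcv and pcv_distance are names of my own choosing):

import math
from collections import Counter

rows = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
        ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
        ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
        ("overcast", "yes"), ("rainy", "no")]

def pcv(value, classes=("yes", "no")):
    # Probability class vector <P(play=yes|outlook=value), P(play=no|outlook=value)>.
    counts = Counter(play for outlook, play in rows if outlook == value)
    total = sum(counts.values())
    return tuple(counts[c] / total for c in classes)

def pcv_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pcv(v1), pcv(v2))))

print(pcv("sunny"))                       # (0.4, 0.6)
print(pcv_distance("sunny", "rainy"))     # 0.2828... = (1/5)*sqrt(2)
print(pcv_distance("sunny", "overcast"))  # 0.8485... = (3/5)*sqrt(2)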

  14. PCV • If an attribute is irrelevant and v and v’ are two of its values, then PCV(v) ≈ PCV(v’), so the distance will be close to 0. • This discounts irrelevant attributes. • It also works for real attributes, after binning. • Binning is a way to make real values symbolic. Simply break the data into k bins; k = 5 or 10 seems to work. Or use decision trees.
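
The slides do not fix a binning scheme; a plain equal-width split into k bins, sketched below, is one simple assumption:

def equal_width_bins(values, k=5):
    # Map each real value to a bin index 0..k-1 over the observed range.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant attribute
    return [min(int((v - lo) / width), k - 1) for v in values]

temperatures = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
print(equal_width_bins(temperatures, k=5))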

  15. Regression by NN • If 1-NN, use the value of the nearest example • If k-NN, interpolate the values of the k nearest neighbors. • Kernel methods work too. You avoid the choice of k, but hide it in the choice of kernel function.
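
A sketch of both regression variants, reusing instance_distance from earlier; the kernel version hides the choice of k inside K(x,y) = exp(-distance(x,y)^2), as the slide notes:

import math

def knn_regress(x, training_set, nominal_flags, k=3):
    # Average the target values of the k nearest neighbors.
    neighbors = sorted(training_set,
                       key=lambda item: instance_distance(x, item[0], nominal_flags))[:k]
    return sum(target for _, target in neighbors) / k

def kernel_regress(x, training_set, nominal_flags):
    # Weighted average of all targets, weights given by the kernel.
    num = den = 0.0
    for y, target in training_set:
        w = math.exp(-instance_distance(x, y, nominal_flags) ** 2)
        num += w * target
        den += w
    return num / den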

  16. Summary • NN works for multi-class problems and regression. • Sometimes called the “poor man’s neural net” • With enough data, its error rate is at most twice the Bayes-optimal error rate (equivalently, the Bayes rate is at least half the 1-NN rate). • Misled by bad examples and bad features. • Separates classes via piecewise-linear boundaries.
