
Instance based learning



  1. Instance based learning (基于实例的学习), Shanghai Jiao Tong University. Adapted from T. Mitchell, Machine Learning

  2. Topics • Introduction • k-Nearest Neighbor Learning (kNN) • Locally Weighted Regression (LWR) • Radial Basis Functions (RBF): from IBL to neural networks • Case-Based Reasoning (CBR) • Conclusion

  3. IBL: Basic Idea • Key idea: • Store all training examples • When seeing a new instance: • Look at the most similar stored instances • Make a prediction based on those instances • E.g. k-nearest-neighbour: • Find the k most similar instances • Use the most frequent class (classification) or the mean target value (regression) as the prediction • “Nearest neighbour” = 1-nearest-neighbour

  4. Example: MVT (now part of Agilent) • Machine vision for inspection of PCBs • Are components present or absent? • Are solder joints good or bad?

  5. Components present? • (example board images: component absent vs. present)

  6. Characterise image as a set of features

  7. Properties of IBL • Advantages: • Learning is very fast • No information is lost • Disadvantages: • Slow at query time • Easily fooled by irrelevant attributes (the curse of dimensionality) • Good similarity measure necessary • Easiest in numeric space (R^n)

  8. Keeping All Information • (scatter plot of + and - training examples) • Advantage: no details lost • Disadvantage: "details" may be noise

  9. Lazy vs. Eager • D-Trees, Naïve Bayes and ANNs are examples of eager ML algorithms • A D-Tree is built in advance, off-line • Less work to do at run time • k-NN is a lazy approach • Little work is done off-line: keep the training examples and find the k nearest at run time

  10. Lazy vs. Eager: Differences • An eager learner creates a single global approximation: one theory has to work for all predictions • A lazy learner creates local approximations on demand: in a sense, many different theories are used • With the same hypothesis space H, the lazy learner is in fact more expressive

  11. Classifying apples and pears • To what class does this belong?

  12. A M J C G Consider a Loan Approval System • What does similar mean? Amount Monthly_Sal Job Category Credit Score Age Amount Monthly_Sal Job Category Credit Score Age

  13. Imagine just 2 features: Amount and Monthly_Sal • (scatter plot: x and o cases in the Amount vs. Monthly_Sal plane)

  14. k-NN and Noise • 1-NN is easy to implement • but it is susceptible to noise: a misclassification every time a noisy pattern is retrieved • k-NN with k ≥ 3 will overcome this

  15. k-Nearest Neighbor Learning • Instance space: the n-dimensional space R^n • Instance representation: feature vector ⟨a1(x), a2(x), …, an(x)⟩ • Distance metric (Euclidean distance): d(xi, xj) = sqrt(Σ_{r=1..n} (ar(xi) − ar(xj))²) • Target function: discrete-valued or real-valued

  16. k-Nearest Neighbor Learning • Training algorithm: for each training example ⟨x, f(x)⟩, add the example to the list training_examples • Classification algorithm: given a query instance xq to be classified, let x1 … xk denote the k instances from training_examples that are nearest to xq, and return f̂(xq) ← argmax_{v ∈ V} Σ_{i=1..k} δ(v, f(xi)), where δ(a, b) = 1 if a = b and δ(a, b) = 0 otherwise
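A minimal Python sketch of the kNN classifier described above, using the Euclidean distance from the previous slide; the names knn_classify and training_examples are illustrative, not from the original slides.

```python
import math
from collections import Counter

def euclidean(a, b):
    # d(xi, xj) = sqrt(sum_r (ar(xi) - ar(xj))^2)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(training_examples, xq, k=3):
    """training_examples: list of (feature_vector, label) pairs."""
    # Keep the k stored examples closest to the query.
    neighbors = sorted(training_examples, key=lambda ex: euclidean(ex[0], xq))[:k]
    # Return the most frequent class among those k neighbours.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D example.
data = [((1.0, 1.0), "x"), ((1.2, 0.8), "x"), ((5.0, 5.0), "o"), ((4.8, 5.2), "o")]
print(knn_classify(data, (1.1, 0.9), k=3))  # -> "x"
```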

  17. k-Nearest Neighbor • D: the set of training samples • Find the k nearest neighbours of q according to the distance d(q, x) = sqrt(Σ_f δ(q_f, x_f)²) for each x ∈ D, where δ is the Heterogeneous Euclidean-Overlap Metric: for a symbolic feature, δ(q_f, x_f) = 0 if q_f = x_f and 1 otherwise; for a numeric feature, δ(q_f, x_f) = |q_f − x_f| / range_f • The category of q is decided by its k nearest neighbours
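A sketch of the Heterogeneous Euclidean-Overlap Metric above, assuming numeric features are normalized by their observed range; the function name heom and the example feature range are illustrative.

```python
def heom(q, x, feature_ranges):
    """Heterogeneous Euclidean-Overlap Metric between instances q and x.

    feature_ranges: per-feature (max - min) for numeric features, None for symbolic ones.
    """
    total = 0.0
    for qf, xf, rng in zip(q, x, feature_ranges):
        if rng is None:                      # symbolic feature: overlap distance
            d = 0.0 if qf == xf else 1.0
        else:                                # numeric feature: range-normalized difference
            d = abs(qf - xf) / rng if rng else 0.0
        total += d ** 2
    return total ** 0.5

# Features: (Amount, Job Category); the Amount range of 10000 is assumed for illustration.
print(heom((5000, "engineer"), (7000, "teacher"), (10000, None)))
```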

  18. Voronoi diagram

  19. Voronoi Diagrams • Indicate the areas in which the prediction is influenced by the same set of examples • (diagram: a query point q and its nearest neighbour x)

  20. 3-Nearest Neighbors • (diagram: for query point q, the 3 nearest neighbours are 2 x and 1 o)

  21. 7-Nearest Neighbors • (diagram: for query point q, the 7 nearest neighbours are 3 x and 4 o)

  22. Distance functions and feature influence: an example • To what class does this belong?

  23. Rescaling Numeric Features • Features with different original scales should have equal importance • E.g. A1: [0, 1]; A2: [-10, +10] • The distance w.r.t. A1 is always small, hence A1 has little influence • Solution: divide the difference by the feature range (1 and 20, respectively), or by the standard deviation (see the sketch below)
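A small sketch of the standard-deviation option mentioned above (z-score standardization); the helper name standardize is illustrative.

```python
import statistics

def standardize(dataset):
    """Rescale each feature to zero mean and unit standard deviation."""
    columns = list(zip(*dataset))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]   # guard against zero spread
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs)) for row in dataset]

# A1 in [0, 1], A2 in [-10, +10]: after standardization both features have comparable spread.
print(standardize([(0.1, -8.0), (0.5, 2.0), (0.9, 9.0)]))
```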

  24. Curse of Dimensionality • The curse of dimensionality: a high-dimensional instance space (many features) has a bad effect on learnability (and on running time!) • Especially bad for IBL: • Assume there are 20 attributes, of which only 2 are relevant • Similarity w.r.t. the 18 irrelevant attributes dominates similarity w.r.t. the 2 relevant ones!

  25. Dimension reduction in k-NN • Feature Selection: not all features are required, and noisy features are a hindrance (keep the p best of the n features) • Case Selection (Prototyping): some examples are redundant, and retrieval time depends on the number of examples (keep m covering examples out of the full set)

  26. Condensed NN • D: set of training samples • Find E ⊆ D such that the NN rule used with E is as good as with D (a runnable sketch follows below):
      choose x ∈ D at random; D ← D \ {x}; E ← {x}
      DO
        learning? ← FALSE
        FOR EACH x ∈ D:
          classify x by NN using E
          if the classification is incorrect then E ← E ∪ {x}; D ← D \ {x}; learning? ← TRUE
      WHILE (learning? ≠ FALSE)
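A runnable Python sketch of the condensed-NN procedure above using the 1-NN rule; the function name condense and the fixed random seed are illustrative assumptions.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)

def nearest_label(E, x):
    # 1-NN rule: label of the example in E whose feature vector is closest to x's.
    return min(E, key=lambda ex: dist(ex[0], x[0]))[1]

def condense(D, seed=0):
    """Return a subset E of D that classifies all of D correctly with the 1-NN rule."""
    random.seed(seed)
    D = list(D)
    E = [D.pop(random.randrange(len(D)))]    # start E with one randomly chosen example
    learning = True
    while learning:
        learning = False
        for x in list(D):
            if nearest_label(E, x) != x[1]:  # misclassified by E: move x from D to E
                E.append(x)
                D.remove(x)
                learning = True
    return E

# Example: two well-separated clusters; E typically keeps far fewer than the 6 examples.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((9, 9), "b"), ((9, 8), "b"), ((8, 9), "b")]
print(condense(data))
```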

  27. Condensed NN • (diagram: 100 examples in 2 categories, and different CNN solutions)

  28. Improving Condensed NN • Different outcomes depending on the order of the data: that is a bad property for an algorithm • Idea: identify exemplars near the decision surface first • In the diagram, B is more useful than A, so it should be added first

  29. CNN using NUN (nearest unlike neighbour) • (diagram: 100 examples in 2 categories, and the resulting CNN solutions)

  30. Distance-weighted kNN • Idea: give higher weight to closer instances • Can now use all training instances instead of only k: Shepard's method

  31. Distance-Weighted kNN • Give greater weight to closer neighbours: f̂(xq) ← argmax_{v ∈ V} Σ_{i=1..k} wi δ(v, f(xi)), where wi = 1 / d(xq, xi)² • For a real-valued target: f̂(xq) ← Σ_{i=1..k} wi f(xi) / Σ_{i=1..k} wi
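A brief sketch of the distance-weighted prediction for a real-valued target, applied to all training instances as in Shepard's method; the function name is illustrative, and an exact match (d = 0) simply returns the stored value.

```python
from math import dist

def distance_weighted_predict(training_examples, xq):
    """Predict a real-valued target as a weighted average with weights wi = 1 / d^2."""
    num, den = 0.0, 0.0
    for xi, fxi in training_examples:
        d = dist(xi, xq)
        if d == 0.0:             # query matches a stored instance exactly
            return fxi
        w = 1.0 / d ** 2
        num += w * fxi
        den += w
    return num / den

data = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
print(distance_weighted_predict(data, (2.5,)))   # approximately 4.84
```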

  32. Discussion on kNN • A highly effective inductive inference method • Robust to noisy training data • Quite effective when the training set is large enough • Inductive bias: the classification of a query is assumed to resemble that of nearby instances • Dealing with irrelevant attributes: stretch the axes locally (may overfit, less common), eliminate the least relevant attributes completely [Moore and Lee, 1994], or index the instance library, e.g. with a kd-tree [Bentley, 1975] (see the example below)
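A short example of kd-tree indexing for fast neighbour lookup, assuming SciPy is available; the data and query values are arbitrary toys.

```python
import numpy as np
from scipy.spatial import KDTree

points = np.random.rand(1000, 2)          # 1000 stored instances in R^2
tree = KDTree(points)                     # build the kd-tree index once, off-line
dists, idx = tree.query([0.5, 0.5], k=3)  # the 3 nearest neighbours of the query
print(idx, dists)                         # indices into `points` and their distances
```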

  33. Locally weighted regression • An obvious problem with the following kind of data: (plot of training points + in the X-Y plane, with a new instance marked) • Given the new x, what value of y would you predict? • What will k-NN (e.g., 3-NN) predict?

  34. LWR: Building local models • Build a local model in the region around x • e.g. a linear or quadratic model, ... • Minimizing: • the squared error for the k neighbours • the distance-weighted squared error for all neighbours • ...

  35. Locally Weighted Regression • Terms: • REGRESSION means approximating a real-valued target function • RESIDUAL is the error in approximating the target function • KERNEL FUNCTION is the function of distance that is used to determine the weight of each training example: wi = K(d(xi, xq))

  36. Locally Weighted Regression • A generalization of kNN • Approximate the target function in the neighborhood surrounding xq • Using a linear function, a quadratic function, a multi-layer neural network, …

  37. Locally Weighted Regression • Local linear approximation: f̂(x) = w0 + w1 a1(x) + … + wn an(x) • Squared-error criteria for a query xq: E1(xq) = ½ Σ_{x ∈ k nearest nbrs of xq} (f(x) − f̂(x))², E2(xq) = ½ Σ_{x ∈ D} (f(x) − f̂(x))² K(d(xq, x)), E3(xq) = ½ Σ_{x ∈ k nearest nbrs of xq} (f(x) − f̂(x))² K(d(xq, x)) • Gradient descent on E3: Δwj = η Σ_{x ∈ k nearest nbrs of xq} K(d(xq, x)) (f(x) − f̂(x)) aj(x) [Atkeson, 1997; Bishop, 1995]
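A sketch of locally weighted linear regression with a Gaussian kernel, solved here by weighted least squares rather than gradient descent; the kernel width tau and the helper names are illustrative assumptions.

```python
import numpy as np

def lwr_predict(X, y, xq, tau=1.0):
    """Fit a linear model weighted around the query xq and return its prediction.

    X: (m, n) array of training inputs; y: (m,) targets; tau: Gaussian kernel width.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])        # add a bias term w0
    xqb = np.concatenate([[1.0], np.atleast_1d(xq)])
    d2 = np.sum((X - xq) ** 2, axis=1)                   # squared distances to the query
    w = np.exp(-d2 / (2 * tau ** 2))                     # kernel weights K(d(xq, x))
    W = np.diag(w)
    # Weighted least squares: minimize sum_i w_i (y_i - Xb_i . theta)^2
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xqb @ theta

# Toy data: y = 2x plus noise; the local model follows the trend near the query.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.1, size=50)
print(lwr_predict(X, y, np.array([5.0]), tau=1.0))       # close to 10
```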

  38. Radial Basis Functions • Related to distance-weighted regression and to artificial neural networks [Powell, 1987; Broomhead & Lowe, 1988; Moody & Darken, 1989] • Approximation: f̂(x) = w0 + Σ_{u=1..k} wu Ku(d(xu, x)), where each xu is an instance from X • Ku(d(xu, x)) decreases as the distance d increases, and is generally a Gaussian function, Ku(d(xu, x)) = exp(−d²(xu, x) / (2σu²))

  39. This function can be described by a two-layer network: the first layer computes the kernel values Ku(d(xu, x)), and the second layer computes a weighted linear combination of them

  40. Radial Basis Functions • Other training choices: • Allocate a Gaussian kernel function for each training example ⟨xi, f(xi)⟩, then combine them • Or choose the kernel centres from a subset of the training examples • Summary of RBF networks: • They provide a global approximation to the target function • Represented by a linear combination of many local kernel functions • A kernel's contribution is negligible outside its defined region (region/width) • They can be trained more efficiently
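A compact sketch of an RBF network with one Gaussian kernel per training example and a linear output layer fit by least squares; the width sigma and the class name are illustrative assumptions.

```python
import numpy as np

class RBFNetwork:
    """RBF approximation: f(x) = w0 + sum_u w_u * exp(-||x - x_u||^2 / (2 * sigma^2))."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def _design(self, X):
        # Kernel activations of every input against every centre, plus a bias column.
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / (2 * self.sigma ** 2))
        return np.hstack([np.ones((X.shape[0], 1)), K])

    def fit(self, X, y):
        self.centres = X                                    # one kernel per training example
        Phi = self._design(X)
        self.w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # least-squares output weights
        return self

    def predict(self, X):
        return self._design(X) @ self.w

# Toy 1-D regression: approximate sin(x) from 20 samples.
X = np.linspace(0, 2 * np.pi, 20)[:, None]
y = np.sin(X[:, 0])
model = RBFNetwork(sigma=0.8).fit(X, y)
print(model.predict(np.array([[np.pi / 2]])))  # close to 1.0
```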

  41. Case-Based Reasoning • Features: • A lazy learning method • Instance representations are richer and symbolic • Instance retrieval methods are correspondingly more elaborate • Applications: • Conceptual design of mechanical devices based on previous experience [Sycara, 1992] • Reasoning about new legal cases based on previous rulings [Ashley, 1990] • Solving planning and scheduling problems by reusing and combining portions of previous solutions to similar problems [Veloso, 1992]

  42. The CADET System [Sycara, 1992]

  43. Reference to CADET • CADET is a Case-based Design Tool. CADET is a system that aids conceptual design of electro-mechanical devices and is based on the paradigm of Case-based Reasoning. • CADET consists of sub-systems that we call CARD (Case-based Retrieval for Design). http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/cadet/ftp/docs/CADET.html

  44. CBR vs. kNN • Rich symbolic/relational descriptions of instances may require a similarity metric other than Euclidean distance • Multiple retrieved cases may be combined by relying on knowledge-based reasoning rather than statistical methods • Tight coupling between case retrieval, knowledge-based reasoning, and problem solving

  45. Lazy Learning vs. Eager Learning • When do they generalize beyond the training data? • Is the new query instance taken into account when deciding how to generalize beyond the training data? • Do the methods have the option of selecting a different hypothesis or local approximation to the target function for each query instance?

  46. Conclusion • IBL: a lazy learning method • kNN: an IBL method for discrete- and real-valued targets • LWR: a generalization of kNN • RBF: a related type of artificial neural network • CBR: an IBL method with more complex symbolic descriptions of instances

  47. References • Tom M. Mitchell, Machine Learning, McGraw-Hill • 蔡自兴, Artificial Intelligence and Its Applications, Tsinghua University Press • 陆汝钤, Artificial Intelligence, Science Press • 黄梯云, Intelligent Decision Support Systems, Publishing House of Electronics Industry • Chinese Journal of Computers (计算机学报), 2002.6 & 2002.8 • Journal of Chinese Information Processing (中文信息学报), 2002.3 • 冯是聪, PhD thesis proposal • 程兆伟, Master's thesis • 侯明强, Bachelor's thesis
