
Machine Learning


Presentation Transcript


1. Machine Learning
• Learning is "any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population" (Herbert Simon, 1983)
• Different types of AI learning models:
  • Inductive learning
  • Explanation-based learning
  • Supervised learning
  • Unsupervised learning
  • Parallel distributed processing (PDP) models
  • Neural networks

2. An Inductive Learning Framework
• Data and goals of the learning task
• Representation of learned knowledge
  • Specific instance of the concept "ball": size(obj1, small) ∧ color(obj1, red) ∧ shape(obj1, round)
  • General concept "ball": size(X, Y) ∧ color(X, Z) ∧ shape(X, round)
• Operations on data
• Concept space
  • Combination of representations and operations
  • Heuristic search
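A tiny sketch of how such concepts might be held in code (my own encoding, not from the slides: tuples over (size, color, shape), with None playing the role of a variable such as X, Y, Z):

```python
# Sketch: instances and concepts as (size, color, shape) tuples.
# None stands in for a variable in the predicate notation above.

def covers(concept, instance):
    """True if every constant in the concept matches the instance."""
    return all(c is None or c == i for c, i in zip(concept, instance))

obj1 = ("small", "red", "round")     # size(obj1, small) ^ color(obj1, red) ^ shape(obj1, round)
ball = (None, None, "round")         # size(X, Y) ^ color(X, Z) ^ shape(X, round)

print(covers(ball, obj1))            # True: obj1 is an instance of "ball"
```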

3. Version Space
• A version space is the set of concept descriptions consistent with the training data
• Generalization and specialization:
  • shape(box, cube) → shape(X, cube)
  • size(X, large) ∧ shape(X, round) → shape(X, round)
  • color(X, blue) ∧ shape(X, cube) → color(X, blue) ∧ (shape(X, cube) ∨ shape(X, rectangle))
• "Theory" of generalization:
  • Given predicate sentences p and q, let P and Q be the sets of all sentences that satisfy p and q, respectively. Expression p is more general than q iff P ⊇ Q
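For the attribute-tuple encoding sketched earlier, the P ⊇ Q test reduces to an attribute-wise check; a hedged sketch (function name and encoding are mine):

```python
def more_general(p, q):
    """True if p covers at least the instances q covers, i.e. P >= Q.

    For conjunctive attribute-tuple concepts this reduces to:
    wherever p constrains a value, q must constrain it identically.
    """
    return all(pv is None or pv == qv for pv, qv in zip(p, q))

# shape(X, cube) is more general than shape(box, cube), not vice versa:
print(more_general((None, None, "cube"), ("box", None, "cube")))   # True
print(more_general(("box", None, "cube"), (None, None, "cube")))   # False
```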

4. Concept Space Searches
• Specific to general
  • Goal is to create a set S (hypotheses) of maximally specific generalizations
  • Maintain a set NL of previously observed negative examples (initially empty)
  • Initialize S to the first positive training instance in PG
  • ∀ p : p ∈ PG
    • ∀ s : s ∈ S, if s does not match p, let s be the most specific generalization that matches p
    • Remove from S all hypotheses more general than other hypotheses in S
    • Remove from S all hypotheses that match a previously observed negative in NL
  • ∀ n : n ∈ NG
    • Add n to NL
    • ∀ s : s ∈ S, if s matches n, remove s from S
• General to specific
  • Goal is to create a set G of maximally general concepts
  • Maintain a set PL of previously observed positive examples (initially empty)
  • Initialize G to the most general concept
  • ∀ n : n ∈ NG
    • ∀ g : g ∈ G, if g matches n, replace g with the most general specializations that do not match n
    • Remove from G all hypotheses more specific than other hypotheses in G
    • Remove from G all hypotheses that fail to match a previously observed positive in PL
  • ∀ p : p ∈ PG
    • Add p to PL
    • ∀ g : g ∈ G, if g does not match p, remove g from G
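A sketch of the specific-to-general direction under the same tuple encoding (names are mine; with conjunctive attribute tuples the set S collapses to a single hypothesis, so the more-general-than pruning step disappears):

```python
# Sketch of the specific-to-general search over attribute tuples.
# None acts as a variable, as in the earlier sketches.

def covers(h, x):
    return all(a is None or a == b for a, b in zip(h, x))

def msg(h, x):
    """Most specific generalization of h that also covers x."""
    return tuple(a if a == b else None for a, b in zip(h, x))

def specific_to_general(positives, negatives):
    s = positives[0]                  # initialize S to the first positive
    seen_negatives = list(negatives)  # NL; given up front here for brevity
    for p in positives[1:]:
        if not covers(s, p):
            s = msg(s, p)             # generalize just enough to cover p
        if any(covers(s, n) for n in seen_negatives):
            raise ValueError("no consistent conjunctive hypothesis")
    return s
```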

5. Concept Space Search Examples
Attribute domains: X = {small, large}, Y = {red, white, blue}, Z = {ball, cube, brick}

Specific to general:
  S: {}
  Positive: obj(small, red, ball) → S: {obj(small, red, ball)}
  Positive: obj(small, white, ball) → S: {obj(small, Y, ball)}
  Positive: obj(large, blue, ball) → S: {obj(X, Y, ball)}

General to specific:
  G: {obj(X, Y, Z)}
  Negative: obj(small, red, brick) → G: {obj(large, Y, Z), obj(X, white, Z), obj(X, blue, Z), obj(X, Y, ball), obj(X, Y, cube)}
  Positive: obj(large, white, ball) → G: {obj(large, Y, Z), obj(X, white, Z), obj(X, Y, ball)}
  Negative: obj(large, blue, cube) → G: {obj(large, white, Z), obj(X, white, Z), obj(X, Y, ball)}
  Positive: obj(small, blue, ball) → G: {obj(X, Y, ball)}
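Feeding the positive stream above to the specific-to-general sketch from the previous slide reproduces the S trace (a usage continuation; assumes specific_to_general as defined there):

```python
positives = [("small", "red", "ball"),
             ("small", "white", "ball"),
             ("large", "blue", "ball")]
print(specific_to_general(positives, negatives=[]))
# -> (None, None, 'ball'), i.e. obj(X, Y, ball)
```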

6. Version Space Convergence
• Generalization and specialization lead to version space convergence
  • Specialization of general models
  • Generalization of specific models
• Candidate Elimination Algorithm
  • Bi-directional search that combines the previous two search techniques
  • If S = G and |S| = |G| = 1, then a single goal concept has been found
  • Otherwise, there is no single concept that covers all positive instances and none of the negative instances
http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/vspace/3_vspace.html
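A self-contained sketch of the bi-directional search for the (size, color, shape) domain of slide 5 (helper names are mine, and the pruning of redundant boundary members is omitted for brevity); the demo replays the example stream shown on the next slide:

```python
# Simplified candidate elimination sketch for (size, color, shape) tuples.

DOMAINS = (("small", "large"),
           ("red", "white", "blue"),
           ("ball", "cube", "brick"))

def covers(h, x):
    return all(a is None or a == b for a, b in zip(h, x))

def generalize(s, p):
    """Most specific generalization of s that covers positive p."""
    return tuple(a if a == b else None for a, b in zip(s, p))

def specialize(g, n):
    """Minimal specializations of g that exclude negative n."""
    out = []
    for i, a in enumerate(g):
        if a is None:
            for v in DOMAINS[i]:
                if v != n[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    S, G = None, [(None, None, None)]
    for label, x in examples:
        if label:                                    # positive example
            G = [g for g in G if covers(g, x)]
            S = x if S is None else generalize(S, x)
        else:                                        # negative example
            if S is not None and covers(S, x):
                raise ValueError("inconsistent data")
            G = [h for g in G
                 for h in ([g] if not covers(g, x) else specialize(g, x))
                 if S is None or covers(h, S)]       # keep only h >= S
    return S, G

examples = [(True,  ("small", "red", "ball")),
            (False, ("small", "blue", "brick")),
            (True,  ("large", "red", "ball")),
            (False, ("large", "red", "cube"))]
print(candidate_elimination(examples))
# S converges to (None, 'red', 'ball'), i.e. obj(X, red, ball)
```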

7. Candidate Elimination Algorithm
  G: {obj(X, Y, Z)}, S: {}
  Positive: obj(small, red, ball) → S: {obj(small, red, ball)}, G: {obj(X, Y, Z)}
  Negative: obj(small, blue, brick) → G: {obj(X, red, Z), obj(X, Y, ball)}, S: {obj(small, red, ball)}
  Positive: obj(large, red, ball) → S: {obj(X, red, ball)}, G: {obj(X, red, Z), obj(X, Y, ball)}
  Negative: obj(large, red, cube) → G: {obj(X, red, ball)}, S: {obj(X, red, ball)}

8. Explanation-Based Learning
• The explanation-based algorithm requires:
  • Target concept: the agent must find an effective definition of this concept
  • Training example
  • Domain theory, a set of rules of the form premise(X) → conclusion(X):
    • liftable(X) ∧ container(X) → cup(X)
    • part(Z, W) ∧ concave(W) ∧ points_up(W) → container(Z)
    • light(Y) ∧ part(Y, handle) → liftable(Y)
    • small(A) → light(A)
    • made_of(A, feathers) → light(A)
  • Operational criteria: a means of describing the form of concept definitions
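A minimal sketch of how this domain theory can drive a proof, ground-instantiated for a particular training example to sidestep unification (the encoding and the prove helper are my own; the facts anticipate the example on slide 9):

```python
# Sketch: the cup domain theory as ground Horn clauses, proved by
# naive backward chaining for the training example obj1.

FACTS = {"small(obj1)", "part(obj1, handle)", "part(obj1, bowl)",
         "concave(bowl)", "points_up(bowl)",
         "owns(bob, obj1)", "color(obj1, red)"}   # last two are irrelevant

RULES = [  # (conclusion, premises), instantiated for obj1 and bowl
    ("cup(obj1)",       ["liftable(obj1)", "container(obj1)"]),
    ("container(obj1)", ["part(obj1, bowl)", "concave(bowl)", "points_up(bowl)"]),
    ("liftable(obj1)",  ["light(obj1)", "part(obj1, handle)"]),
    ("light(obj1)",     ["small(obj1)"]),
]

def prove(goal):
    """Backward chain; return a proof tree (goal, subproofs) or None."""
    if goal in FACTS:
        return (goal, [])
    for conclusion, premises in RULES:
        if conclusion == goal:
            subproofs = [prove(p) for p in premises]
            if all(subproofs):
                return (goal, subproofs)
    return None

print(prove("cup(obj1)"))
```

The leaves of the returned tree are exactly the relevant facts; owns(bob, obj1) and color(obj1, red) never enter the proof, which is the pruning that slide 10 credits to the domain theory.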

9. Explanation-Based Learning Example
Specific proof tree (for the training instance obj1; facts such as owns(bob, obj1), part(obj1, bottom), and color(obj1, red) play no part in the proof):
  cup(obj1)
    liftable(obj1)
      light(obj1) ← small(obj1)
      part(obj1, handle)
    container(obj1)
      part(obj1, bowl)
      points_up(bowl)
      concave(bowl)
Generalized proof tree:
  cup(X)
    liftable(X)
      light(X) ← small(X)
      part(X, handle)
    container(X)
      part(X, W)
      points_up(W)
      concave(W)

10. Benefits of Explanation-Based Learning
• The domain theory allows the learner to select the relevant aspects of the training instance
  • Irrelevant aspects, such as the color of the cup, are ignored
• EBL forms generalizations that we know to be relevant to specific goals and consistent with the domain theory
  • Many instances admit numerous possible generalizations that are either meaningless or wrong
• Allows the learner to learn from a single instance
• Allows the learner to hypothesize unstated relationships between goals and experiences

11. Unsupervised Learning
• Supervised learning assumes the existence of an external method to correctly classify training data
• Unsupervised learning requires that the learner evaluate concepts on its own
• AM is an early example of a discovery program
  • Discovered the natural numbers by modifying its notion of "bags," or multisets
  • Figured out addition, multiplication, division, and prime numbers by evaluating "interesting" concepts
  • Failed beyond rudimentary number theory: the search space grew combinatorially and the percentage of interesting concepts diminished

12. Concept Clustering
• The goal of the clustering problem is to organize a collection of objects into a hierarchy of classes that meets some standard of quality
• Requires some means of measuring similarity between objects
• Numeric taxonomy represents objects as collections of features and assigns numeric values to these features
• A similarity metric treats each object as a point in n-dimensional space, where n is the number of features; the similarity between two objects is then given by the Euclidean distance between them
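A sketch of that metric (the 1/(1 + d) conversion from distance to similarity is one common convention, not something the slide specifies):

```python
import math

def euclidean_distance(a, b):
    """Objects as points in n-dimensional feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """One common convention: closer points are more similar."""
    return 1.0 / (1.0 + euclidean_distance(a, b))

print(similarity((0.0, 0.0), (3.0, 4.0)))   # distance 5.0 -> similarity ~0.167
```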

13. Agglomerative Clustering
• A bottom-up approach to the clustering problem:
  • Examine all pairs of objects, and make the pair with the highest degree of similarity a cluster
  • Define the features of this cluster as some function (e.g., the average) of the features of the members, and replace the component objects with this definition
  • Repeat until all objects have been reduced to a single cluster
• Objects defined by symbolic rather than numeric features are difficult to compare
  • Similarity can be defined as the proportion of common features
  • However, this does not adequately take into account underlying semantic knowledge, goals, or background knowledge
• Traditional algorithms are extensional: they enumerate all members of a cluster, with no intensional definition to classify both known and future members
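A self-contained sketch of the three steps above for numeric feature vectors (all names are mine; the cluster representative is the centroid, i.e. the average the slide mentions):

```python
# Sketch of bottom-up clustering: repeatedly merge the closest pair,
# replacing it with the average of its members' features.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))

def agglomerate(objects):
    """Return the cluster hierarchy as nested pairs."""
    clusters = [(o, o) for o in objects]   # (tree, representative point)
    while len(clusters) > 1:
        # find the most similar (closest) pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: distance(clusters[p[0]][1], clusters[p[1]][1]))
        (ta, ra), (tb, rb) = clusters[i], clusters[j]
        merged = ((ta, tb), centroid(ra, rb))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

print(agglomerate([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]))
# the two nearby points pair up first, nested inside the hierarchy
```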

14. Parallel Distributed Processing (PDP)
• Also known as subsymbolic approaches/models
• Views intelligence as the behavior of a collection of large numbers of simple interacting components
• Symbolic systems suffer from brittleness
  • Brittleness encompasses phenomena related to the nature of a two-valued, all-or-nothing system
  • Human performance degrades as a problem gets harder but almost always produces some answer; expert systems either perform perfectly or not at all
• Neural networks
  • A connectionist approach inspired by biological brains

15. Neural Networks
• A neuron consists of:
  • Input values xi — usually discrete values, real values, or values in {0, 1} or {-1, 1}
  • Weights wi — usually real valued
  • An activation value — the output of the neuron
  • An activation function F
• A neural network is characterized by:
  • Network topology
  • Learning algorithm
  • Environment
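In code, a neuron of this kind is just a weighted sum passed through F (a sketch with assumed names):

```python
# Sketch: a neuron applies activation function F to the weighted input sum.

def neuron(xs, ws, F):
    return F(sum(x * w for x, w in zip(xs, ws)))

def step(s):
    """One possible activation function: a hard threshold at 0."""
    return 1 if s >= 0 else 0

print(neuron([1, -1], [0.5, 0.25], step))   # 0.5 - 0.25 = 0.25 >= 0 -> 1
```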

16. Simple Neural Networks
• McCulloch-Pitts neuron (McCulloch and Pitts, 1943)
  • Inputs are either +1 or -1; the activation function multiplies each input by its weight and sums the results
  • If the sum is greater than or equal to 0, the output is 1; otherwise the output is 0
• Example units (the constant +1 input serves as a bias line):

  Output | Inputs   | Weights
  X ∧ Y  | X, Y, +1 | +1, +1, -2
  X ∨ Y  | X, Y, +1 | +1, +1, -1
  ¬X     | X        | -1
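A sketch of the three tabulated units. Note these weights realize the gates when the data lines carry 0/1 values; with the ±1 coding mentioned above, the ∨ unit would misfire on mixed inputs, so this sketch assumes 0/1:

```python
# McCulloch-Pitts units built from the weight table above.

def mp_neuron(inputs, weights):
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= 0 else 0

def AND(x, y):  return mp_neuron([x, y, 1], [1, 1, -2])
def OR(x, y):   return mp_neuron([x, y, 1], [1, 1, -1])
def NOT(x):     return mp_neuron([x], [-1])

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", AND(x, y), OR(x, y))
print(NOT(0), NOT(1))   # -> 1 0
```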

17. Perceptrons
• Devised by Frank Rosenblatt in the late 1950s
• A single-layer network in which all inputs and activation values are either 0 or 1, and the weights are real valued
• The activation function is a simple linear threshold:
  • 1 if ∑ xiwi > t
  • 0 otherwise
• Learning is supervised; the perceptron changes weights based on the correct results:
  • If the output is correct, do nothing
  • If the output is 0 and should be 1, increment the weights on the active lines (input of 1) by some amount d
  • If the output is 1 and should be 0, decrement the weights on the active lines by some amount d
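The learning rule above in runnable form (a sketch; the threshold t, step size d, and epoch cap are assumed values):

```python
# Sketch of the perceptron rule: bump weights on active lines up or
# down by d until every training sample is classified correctly.

def train_perceptron(samples, n_inputs, t=0.5, d=0.1, epochs=100):
    w = [0.0] * n_inputs
    for _ in range(epochs):
        converged = True
        for xs, target in samples:
            out = 1 if sum(x * wi for x, wi in zip(xs, w)) > t else 0
            if out == target:
                continue                 # output correct: do nothing
            converged = False
            delta = d if target == 1 else -d
            # inactive lines (x = 0) are left unchanged by delta * x
            w = [wi + delta * x for wi, x in zip(w, xs)]
        if converged:
            return w
    return None   # no separating weights found within the epoch budget

# Logical AND is linearly separable, so training converges:
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND_DATA, 2))
```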

18. Limits of Perceptrons
• Single-layer networks are only capable of learning classes that are linearly separable
• For example, exclusive-or is not linearly separable, and thus cannot be represented by a perceptron
• For any n-dimensional space, a classification is linearly separable if the groups can be separated with a single (n-1)-dimensional hyperplane
[Figure: the four corners of the unit square labeled with X xor Y; no single line separates the X xor Y = 0 points from the X xor Y = 1 points]
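Reusing train_perceptron from the sketch under slide 17 makes this concrete: the same loop that fits AND exhausts its epoch budget on exclusive-or:

```python
# Continuation of the earlier perceptron sketch (assumes train_perceptron).
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(XOR_DATA, 2))   # -> None: no weight vector fits all four points
```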

19. Modern Machine Learning Topics
Recent titles from the Journal of Machine Learning Research (http://jmlr.csail.mit.edu):
• Asymptotic Model Selection for Naive Bayesian Networks
• Dimension Reduction in Text Classification with Support Vector Machines
• Stability of Randomized Learning Algorithms
• Diffusion Kernels on Statistical Manifolds
• Multiclass Boosting for Weak Classifiers
• Denoising Source Separation
• Learning with Decision Lists of Data-Dependent Features
• Generalization Bounds and Complexities Based on Sparsity and Clustering for Convex Combinations of Functions from Random Classes
• Characteristics of a Family of Algorithms for Generalized Discriminant Analysis of Undersampled Problems
