
CS 446: Machine Learning


Presentation Transcript


    1. CS 446: Machine Learning. Gerald DeJong, mrebl@.uiuc.edu, 3-0491, 3320 SC. Recent approval for a TA, to be named later.

    2. Office hours: after most classes and Thur @ 3. Text: Mitchell's Machine Learning. Midterm: Oct. 4; Final: Dec. 12; midterm, final, and homeworks / projects each count a third. Submit homeworks at the beginning of class; late penalty: 20% / day, up to 3 days. Programming, some in-class assignments. Class web site soon. Cheating: none allowed! We adopt the dept. policy.

    3. Please answer these and hand in now: Name; Department; Where (if?*) you had an Intro AI course; Who taught it (esp. if not here). 1) Why are you interested in Machine Learning? 2) Any topics you would like to see covered? (* may require significant additional effort)

    4. Approx. Course Overview / Topics. Introduction: basic problems and questions. A detailed example: linear threshold units. Basic paradigms: PAC (Risk Minimization); Bayesian Theory; SRM (Structural Risk Minimization); Compression; Maximum Entropy; Generative/Discriminative; Classification/Skill. Learning protocols: Online/Batch; Supervised/Unsupervised/Semi-supervised; Delayed supervision. Algorithms: Decision Trees (C4.5) [Rules and ILP (Ripper, Foil)]; Linear Threshold Units (Winnow, Perceptron; Boosting; SVMs; Kernels); Probabilistic Representations (naïve Bayes, Bayesian trees; density estimation); Delayed supervision: RL; Unsupervised/Semi-supervised: EM. Clustering, Dimensionality Reduction, or others of student interest.

    5. What to Learn. Classifiers: learn a hidden function. Concept learning: chair? face? game? Diagnosis: medical; risk assessment. Models: learn a map (and use it to navigate); learn a distribution (and use it to answer queries); learn a language model; learn an automaton. Skills: learn to play games; learn a plan / policy; learn to reason; learn to plan. Clusterings: shapes of objects; functionality; segmentation; abstraction. We focus on classification (importance, theoretical richness, generality, …).

    6. What to Learn? Direct learning (discriminative, model-free [a bad name]): learn a function that maps an input instance to the sought-after property. Model learning (indirect, generative): learn a model of the domain, then use it to answer various questions about the domain. In both cases, several protocols can be used. Supervised: the learner is given examples and answers. Unsupervised: examples, but no answers. Semi-supervised: some examples with answers, others without. Delayed supervision.

    7. Supervised Learning. Given: examples (x, f(x)) of some unknown function f. Find: a good approximation to f. x provides some representation of the input; the process of mapping a domain element into a representation is called Feature Extraction (hard; ill-understood; important). x ∈ {0,1}ⁿ or x ∈ ℝⁿ. The target function (label): f(x) ∈ {-1,+1}: binary classification; f(x) ∈ {1, 2, 3, …, k-1}: multi-class classification; f(x) ∈ ℝ: regression. [Badges game: don't give me the answer; start thinking about how to write a program that will figure out whether my name has + or − next to it.]

    8. Example and Hypothesis Spaces

    9. Supervised Learning: Examples. Disease diagnosis: x = properties of a patient (symptoms, lab tests); f = disease (or maybe: recommended therapy). Part-of-speech tagging: x = an English sentence (e.g., "The can will rust"); f = the part of speech of a word in the sentence. Face recognition: x = bitmap picture of a person's face; f = name of the person (or maybe: a property of the person). Automatic steering: x = bitmap picture of the road surface in front of the car; f = degrees to turn the steering wheel.

    10.–11. [Slide content not captured in the transcript.]

    12. Hypothesis Space. Complete ignorance: how many possible functions? 2^16 = 65536 over four input features. After seven examples, how many possibilities for f? 2^9 = 512 possibilities remain. How many examples until we figure out which is correct? We need to see labels for all 16 examples! Is learning possible?
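A quick sanity check of these counts, as a Python sketch. The seven observed labels below are made up, since the actual training examples appear on slides not captured in this transcript; the counts do not depend on which labels are observed.

```python
from itertools import product

inputs = list(product([0, 1], repeat=4))       # all 16 inputs over four Boolean features
functions = list(product([0, 1], repeat=16))   # each tuple labels every input: 2^16 functions

# Hypothetical observations: labels for 7 of the 16 inputs.
observed = {inputs[i]: i % 2 for i in range(7)}

consistent = [f for f in functions
              if all(f[inputs.index(x)] == y for x, y in observed.items())]
print(len(functions), len(consistent))  # 65536 512 -- the 9 unseen inputs stay free: 2^9
```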

    13. Another Hypothesis Space. Simple rules: there are only 16 simple conjunctive rules of the form y = xi ∧ xj ∧ xk … No simple rule explains the data; the same is true for simple clauses.

    14. Third Hypothesis Space. m-of-n rules: there are 32 possible rules of the form "y = 1 if and only if at least m of the following n variables are 1". Found a consistent hypothesis.
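The count can be checked by enumeration; a sketch (for each nonempty subset of the four variables of size k there are k choices of the threshold m, giving Σₖ C(4,k)·k = 4 + 12 + 12 + 4 = 32 rules):

```python
from itertools import combinations

variables = range(4)
rules = [(m, subset)
         for k in range(1, 5)
         for subset in combinations(variables, k)
         for m in range(1, k + 1)]
print(len(rules))  # 32

def m_of_n(m, subset, x):
    """y = 1 iff at least m of the chosen variables are 1."""
    return int(sum(x[i] for i in subset) >= m)

print(m_of_n(2, (0, 2, 3), (1, 0, 1, 1)))  # 1: two of the three chosen variables are on
```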

    15. Views of Learning. Learning is the removal of our remaining uncertainty: suppose we knew that the unknown function was an m-of-n Boolean function; then we could use the training data to infer which function it is. Learning requires guessing a good, small hypothesis class: we can start with a very small class and enlarge it until it contains a hypothesis that fits the data. We could be wrong! Our prior knowledge might be wrong: y = x4 ∧ one-of(x1, x3) is also consistent. Our guess of the hypothesis class could be wrong. If this is the unknown function, then we will make errors when we are given new examples and asked to predict the value of the function.

    16. General Strategy for Machine Learning. H should respect our prior understanding: excess expressivity makes learning difficult; the expressivity of H should match our ignorance. Understand the flexibility of standard hypothesis spaces: decision trees, neural networks, rule grammars, stochastic models; hypothesis spaces of flexible size; nested collections of hypotheses. ML succeeds when these interrelate. Develop algorithms for finding a hypothesis h that fits the data; h will likely perform well when the richness of H is less than the information in the training set.

    17. Terminology. Training example: a pair of the form (x, f(x)). Target function (concept): the true function f (unknown). Hypothesis: a proposed function h, believed to be similar to f. Concept: a Boolean function; examples for which f(x) = 1 are positive examples, and those for which f(x) = 0 are negative examples (instances). (Sometimes used interchangeably with hypothesis.) Classifier: a discrete-valued function; the possible values of f, {1, 2, …, K}, are the classes or class labels. Hypothesis space: the space of all hypotheses that can, in principle, be output by the learning algorithm. Version space: the space of all hypotheses in the hypothesis space that have not yet been ruled out.

    18. Key Issues in Machine Learning. Modeling: how to formulate application problems as machine learning problems? Learning protocols (where is the data coming from, and how?). Project examples [complete products]: EMAIL: given a seminar announcement, place the relevant information in my Outlook; given a message, place it in the appropriate folder. Image processing: given a folder with pictures, automatically rotate all those that need it. My office: have my office greet me in the morning and unlock the door (but do it only for me!). Context-sensitive spelling: incorporate into Word.

    19. Key Issues in Machine Learning. Modeling: how to formulate application problems as machine learning problems? Learning protocols (where is the data coming from, and how?). Representation: what are good hypothesis spaces? Any rigorous way to find these? Any general approach? Algorithms: what are good algorithms? How do we define success? Generalization vs. overfitting. The computational problem.

    20. Example: Generalization vs. Overfitting. What is a tree? A botanist: "A tree is something with leaves." Her brother: "A tree is a green thing I've seen before." Neither will generalize well.

    21. Self-organize into Groups of 4 or 5. Assignment 1: The Badges Game. Prediction or modeling? Representation. Background knowledge. When did learning take place? Learning protocol? What is the problem? Algorithms.

    22. Linear Discriminators. "I don't know {whether, weather} to laugh or cry." How can we make this a learning problem? We will look for a function F: Sentences → {whether, weather}. We need to define the domain of this function better. An option: for each word w in English, define a Boolean feature x_w: [x_w = 1] iff w is in the sentence. This maps a sentence to a point in {0,1}^50,000. In this space, some points are "whether" points and some are "weather" points. [Note: this is the game we are playing; in NLP it has always been clear that the raw information in a sentence is not sufficient, as is, to represent a good predictor. Better functions of the input were generated, and learning was done in their terms.]
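A minimal sketch of this encoding. The tiny vocabulary here is hypothetical; the slide imagines one indicator feature per English word, about 50,000 of them:

```python
VOCAB = ["i", "know", "whether", "weather", "to", "laugh", "or", "cry", "the", "is", "nice"]

def encode(sentence):
    """Map a sentence to a point in {0,1}^|VOCAB|: x_w = 1 iff word w occurs."""
    words = set(sentence.lower().split())
    return [int(w in words) for w in VOCAB]

print(encode("I know whether to laugh or cry"))
```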

    23. What's Good? Learning problem: find a function that best separates the data. What function? What's best? How to find it? A possibility: define the learning problem to be: find a (linear) function that best separates the data.

    24. Exclusive-OR (XOR): y = (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2). In general, a parity function: xi ∈ {0,1}, f(x1, x2, …, xn) = 1 iff Σ xi is even. This function is not linearly separable. (For XOR, an LTU sgn{w1·x1 + w2·x2 − θ} would need θ > 0 from f(0,0) = 0, and w1 ≥ θ, w2 ≥ θ from the two positive points, hence w1 + w2 ≥ 2θ > θ, contradicting f(1,1) = 0.)
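A brute-force illustration (not a proof) that no linear threshold unit realizes XOR; the weight/threshold grid is arbitrary:

```python
from itertools import product

def xor(x1, x2):
    return int(x1 != x2)

grid = [i * 0.5 for i in range(-6, 7)]   # candidate weights and thresholds in [-3, 3]
separable = any(
    all(int(w1 * x1 + w2 * x2 - t >= 0) == xor(x1, x2)
        for x1, x2 in product([0, 1], repeat=2))
    for w1, w2, t in product(grid, repeat=3)
)
print(separable)  # False: no candidate LTU matches XOR on all four inputs
```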

    25. Sometimes Functions Can Be Made Linear. y = (x1 ∧ x2 ∧ x4) ∨ (x2 ∧ x4 ∧ x5) ∨ (x1 ∧ x3 ∧ x7). Space: X = (x1, x2, …, xn), the input. Transformation. New space: Y = {y1, y2, …} = {xi, xi ∧ xj, xi ∧ xj ∧ xk}.
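A sketch of the transformation: expand the input into all conjunctions of up to three variables; in the new space the DNF above becomes a linear threshold function (fire when at least one of its three monomials is on). The weight choice below is the obvious one, not taken from the slides:

```python
from itertools import combinations, product

n = 7

def target(x):  # y = (x1 ∧ x2 ∧ x4) ∨ (x2 ∧ x4 ∧ x5) ∨ (x1 ∧ x3 ∧ x7); 0-indexed below
    return int((x[0] and x[1] and x[3]) or (x[1] and x[3] and x[4]) or (x[0] and x[2] and x[6]))

monomials = [s for k in (1, 2, 3) for s in combinations(range(n), k)]

def transform(x):  # the new space Y: one Boolean feature per conjunction
    return [int(all(x[i] for i in s)) for s in monomials]

w = [int(s in {(0, 1, 3), (1, 3, 4), (0, 2, 6)}) for s in monomials]
print(all(target(x) == int(sum(wi * yi for wi, yi in zip(w, transform(x))) >= 1)
          for x in product([0, 1], repeat=n)))  # True: linear in the blown-up space
```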

    26. Data are not separable in one dimension; not separable if you insist on using a specific class of functions. [Figure: feature space.]

    27. Blown-Up Feature Space. Data are separable in the ⟨x, x²⟩ space. [Figure.]

    28. A General Framework for Learning. Goal: predict an unobserved output value y ∈ Y based on an observed input vector x ∈ X. Estimate a functional relationship y ~ f(x) from a set of examples {(xi, yi)}, i = 1, …, n. Most relevant: classification, y ∈ {0,1} (or y ∈ {1, 2, …, k}); but within the same framework we can also talk about regression, y ∈ ℝ. What do we want f(x) to satisfy? We want to minimize the loss (risk): L(f) = E_{X,Y}[f(x) ≠ y], where E_{X,Y} denotes the expectation with respect to the true distribution.

    29. A General Framework for Learning (II). We want to minimize the loss L(f) = E_{X,Y}[f(X) ≠ Y], where E_{X,Y} denotes the expectation with respect to the true distribution. We cannot do that. Why not? Instead, we try to minimize the empirical classification error: for a set of training examples {(Xi, Yi)}, i = 1, …, n, try to minimize the observed loss. (Issue I: when is this good enough? Not now.) This minimization problem is typically NP-hard. To alleviate this computational problem, minimize a new function: a convex upper bound of the classification error function I(f(x), y) = [f(x) ≠ y] = {1 when f(x) ≠ y; 0 otherwise}.
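A sketch of the two objectives for a linear classifier f(x) = sgn(w·x), on made-up data; the hinge loss stands in for the convex upper bound (the slide does not name a particular surrogate):

```python
def zero_one(margin):   # I(f(x), y): 1 exactly when sign(w.x) disagrees with y
    return 1.0 if margin <= 0 else 0.0

def hinge(margin):      # convex upper bound: max(0, 1 - y*(w.x)) >= zero_one(margin)
    return max(0.0, 1.0 - margin)

w = [0.5, -1.0]
data = [([1.0, 0.0], +1), ([0.0, 1.0], -1), ([1.0, 1.0], +1)]  # (x, y), y in {-1, +1}

margins = [y * sum(wi * xi for wi, xi in zip(w, x)) for x, y in data]
print(sum(map(zero_one, margins)) / len(data))  # empirical classification error: 0.333...
print(sum(map(hinge, margins)) / len(data))     # its convex surrogate, always >=: 0.666...
```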

    30. Learning as an Optimization Problem. A loss function L(f(x), y) measures the penalty incurred by a classifier f on example (x, y). There are many different loss functions one could define. Misclassification error: L(f(x), y) = 0 if f(x) = y; 1 otherwise. Squared loss: L(f(x), y) = (f(x) − y)². Input-dependent loss: L(f(x), y) = 0 if f(x) = y; c(x) otherwise.
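The three losses as code (c is a hypothetical application-supplied cost function, not from the slides):

```python
def misclassification(fx, y):
    return 0 if fx == y else 1

def squared(fx, y):
    return (fx - y) ** 2

def input_dependent(fx, y, x, c):
    return 0 if fx == y else c(x)   # c(x): per-example cost of an error

print(misclassification(1, -1), squared(0.8, 1.0), input_dependent(1, -1, [1, 0], lambda x: 5.0))
```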

    31. How to Learn? Local search: start with a linear threshold function; see how well you are doing; correct; repeat until you converge. There are other ways that do not search directly in the hypothesis space: directly compute the hypothesis?

    32. Learning Linear Separators (LTU). f(x) = sgn{x·w − θ} = sgn{Σ_{i=1..n} wi·xi − θ}. x = (x1, x2, …, xn) ∈ {0,1}ⁿ is the feature-based encoding of the data point; w = (w1, w2, …, wn) ∈ ℝⁿ is the target function; θ determines the shift with respect to the origin.

    33. Expressivity. f(x) = sgn{x·w − θ} = sgn{Σ_{i=1..n} wi·xi − θ}. Many functions are linear. Conjunctions: y = x1 ∧ x3 ∧ x5 is y = sgn{1·x1 + 1·x3 + 1·x5 − 3}. At least m of n: y = at least 2 of {x1, x3, x5} is y = sgn{1·x1 + 1·x3 + 1·x5 − 2}. Many functions are not: XOR, y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2); non-trivial DNF, y = (x1 ∧ x2) ∨ (x3 ∧ x4). But some can be made linear.
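Both equivalences can be verified over all eight assignments of (x1, x3, x5); a sketch:

```python
from itertools import product

def ltu(x, w, theta):
    """f(x) = 1 iff w·x - theta >= 0, the unit from the slide."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

ok_and  = all(ltu(x, (1, 1, 1), 3) == int(all(x))      for x in product([0, 1], repeat=3))
ok_2of3 = all(ltu(x, (1, 1, 1), 2) == int(sum(x) >= 2) for x in product([0, 1], repeat=3))
print(ok_and, ok_2of3)  # True True
```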

    34. Canonical Representation. f(x) = sgn{x·w − θ} = sgn{Σ_{i=1..n} wi·xi − θ}. sgn{x·w − θ} ≡ sgn{x'·w'}, where x' = (x, −θ) and w' = (w, 1). We moved from an n-dimensional representation to an (n+1)-dimensional representation, but now we can look for hyperplanes that go through the origin.
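A small check of the identity, with arbitrary numbers (the augmentation convention follows the slide as reconstructed above: the −θ is appended to x and a 1 to w):

```python
from itertools import product

w, theta = (2.0, -1.0, 0.5), 0.75
for x in product([0, 1], repeat=3):
    before = sum(wi * xi for wi, xi in zip(w, x)) - theta    # w·x - θ
    x_aug, w_aug = (*x, -theta), (*w, 1.0)                   # x' = (x, -θ), w' = (w, 1)
    after = sum(wi * xi for wi, xi in zip(w_aug, x_aug))     # w'·x'
    assert before == after
print("sgn{w.x - theta} == sgn{w'.x'} on all inputs")
```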

    35. LMS: An Online, Local-Search Algorithm. A local-search learning algorithm requires: Hypothesis space: linear threshold units. Loss function: squared loss, LMS (Least Mean Square, L2). Search procedure: gradient descent.

    36. Good treatment in Bishop, Chp. 3. Classic Wiener filtering solution; the text omits the 0.5 factor. In any case we use the gradient and eta (text) or R (these notes) to modulate the step size.

    37. Gradient Descent. We use gradient descent to determine the weight vector that minimizes Err(w). Fixing the set D of examples, Err is a function of the wj. At each step, the weight vector is modified in the direction that produces the steepest descent along the error surface.
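A minimal batch-gradient-descent sketch for the squared loss Err(w) = ½ Σ_d (y_d − w·x_d)², whose descent step is w ← w + r Σ_d (y_d − w·x_d) x_d. The learning rate, epoch count, and toy dataset are assumptions, not from the lecture:

```python
def lms_gd(data, r=0.05, epochs=1000, dim=2):
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:                    # accumulate -dErr/dw = sum_d (y - w.x) x
            err = y - sum(wi * xi for wi, xi in zip(w, x))
            for i in range(dim):
                grad[i] += err * x[i]
        w = [wi + r * gi for wi, gi in zip(w, grad)]   # step along steepest descent
    return w

data = [((1.0, 1.0), 3.0), ((1.0, 2.0), 5.0), ((1.0, 3.0), 7.0)]  # y = 1 + 2x; bias via x0 = 1
print(lms_gd(data))  # converges close to [1.0, 2.0]
```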

    38.–48. [Slide content not captured in the transcript.]

    49. Fisher Linear Discriminant. This is a classical method for discriminant analysis. It is based on dimensionality reduction: finding a better representation for the data. Notice that just finding good representations for the data may not always be good for discrimination (e.g., O, Q). Intuition: consider projecting data from d dimensions onto a line. This likely results in a mixed set of points and poor separation. However, by moving the line around we might be able to find an orientation for which the projected samples are well separated.

    50. Fisher Linear Discriminant. Sample S = {x1, x2, …, xn} ⊂ ℝ^d; P and N are the positive and negative examples, respectively. Let w ∈ ℝ^d and assume ‖w‖ = 1. Then the projection of a vector x on a line in the direction w is wᵀx. If the data is linearly separable, there exists a good direction w.

    51. Finding a Good Direction. Sample mean (positive P; negative N): M_P = (1/|P|) Σ_P xi. The mean of the projected (positive, negative) points, m_P = (1/|P|) Σ_P wᵀxi = (1/|P|) Σ_P yi = wᵀM_P, is simply the projection of the sample mean. Therefore, the distance between the projected means is |m_P − m_N| = |wᵀ(M_P − M_N)|.

    52. Finding a Good Direction (2). Scaling w isn't the solution: we want the difference to be large relative to some measure of standard deviation for each class. s²_P = Σ_P (y − m_P)², s²_N = Σ_N (y − m_N)². The within-class scatter s²_P + s²_N estimates the variance of the sample. The Fisher linear discriminant employs the linear function wᵀx for which J(w) = |m_P − m_N|² / (s²_P + s²_N) is maximized. How to make this a classifier? How to find the optimal w? Some algebra.

    53. J as an Explicit Function of w (1). Compute the scatter matrices S_P = Σ_P (x − M_P)(x − M_P)ᵀ and S_N = Σ_N (x − M_N)(x − M_N)ᵀ, and S_W = S_P + S_N. We can write: s²_P = Σ_P (y − m_P)² = Σ_P (wᵀx − wᵀM_P)² = Σ_P wᵀ(x − M_P)(x − M_P)ᵀw = wᵀS_P w. Therefore s²_P + s²_N = wᵀS_W w. S_W is the within-class scatter matrix; it is proportional to the sample covariance matrix for the d-dimensional sample.

    54. J as an Explicit Function of w (2). We can do a similar computation for the means: S_B = (M_P − M_N)(M_P − M_N)ᵀ, and we can write (m_P − m_N)² = (wᵀM_P − wᵀM_N)² = wᵀ(M_P − M_N)(M_P − M_N)ᵀw = wᵀS_B w. S_B is the between-class scatter matrix. It is the outer product of two vectors, and therefore its rank is at most 1. S_B w is always in the direction of (M_P − M_N).

    55. J as an Explicit Function of w (3). Now we can compute explicitly: J(w) = |m_P − m_N|² / (s²_P + s²_N) = wᵀS_B w / wᵀS_W w. We are looking for the value of w that maximizes this expression. This is a generalized eigenvalue problem; when S_W is nonsingular, it is just an eigenvalue problem. The solution can be written without solving the eigenproblem: w = S_W⁻¹(M_P − M_N). This is the Fisher Linear Discriminant. 1: We converted a d-dimensional problem to a 1-dimensional problem and suggested a solution that makes some sense. 2: We have a solution that makes sense; how do we make it a classifier? And how good is it?
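A numpy sketch of the closed form, on made-up two-class Gaussian data; thresholding halfway between the projected means is one simple way to turn the direction into a classifier, anticipating the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal([2.0, 1.0], 0.5, size=(50, 2))   # positive class (hypothetical data)
N = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # negative class

MP, MN = P.mean(axis=0), N.mean(axis=0)
SW = (P - MP).T @ (P - MP) + (N - MN).T @ (N - MN)   # within-class scatter matrix

w = np.linalg.solve(SW, MP - MN)                     # w = SW^{-1} (MP - MN)
theta = w @ (MP + MN) / 2                            # threshold between projected means
accuracy = np.concatenate([P @ w > theta, N @ w <= theta]).mean()
print(w, accuracy)  # the Fisher direction and near-perfect accuracy on this easy data
```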

    56. Fisher Linear Discriminant: Summary. It turns out that both problems can be solved if we make assumptions, e.g., that the data consists of two classes of points generated according to normal distributions with the same covariance. Then the solution is optimal, and classification can be done by choosing a threshold, which can be computed. Is this satisfactory?

    57. Introduction: Summary. We introduced the technical part of the class by giving two examples of (very different) approaches to linear discrimination. There are many other solutions. Question 1: This assumes that we are linear. Can we learn a function that is more flexible in terms of what it does with the feature space? Question 2: Can we say something about the quality of what we learn (sample complexity, time complexity, quality)?
