
ICS 178: Introduction to Machine Learning & Data Mining



Presentation Transcript


  1. ICS 178: Introduction to Machine Learning & Data Mining. Instructor: Max Welling. Lecture 6: Logistic Regression

  2. Logistic Regression • This is also regression, but with targets Y ∈ {0, 1}; i.e., it is classification! • We will fit a regression function to P(Y=1|X). (Figure: linear regression vs. logistic regression.)
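The two models contrasted on this slide appear only as figure labels in the transcript. A plausible reconstruction of what was shown, using the parameters A, b from the later slides (our notation, not verbatim from the lecture):

```latex
\text{linear regression:}\quad y(x) = A^{\top}x + b
\qquad\qquad
\text{logistic regression:}\quad P(Y=1 \mid x) = \sigma\!\left(A^{\top}x + b\right)
= \frac{1}{1 + e^{-(A^{\top}x + b)}}
```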

  3. Sigmoid function f(x). (Figure: the sigmoid f(x) plotted together with data-points having Y=1 and data-points having Y=0.)
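A minimal NumPy sketch of the sigmoid f(x) shown on this slide (the function name `sigmoid` is ours, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function f(z) = 1 / (1 + exp(-z)).
    Squashes any real input into (0, 1), so the output can be read as P(Y=1|x)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs land near 0 (the Y=0 side), large positive inputs near 1 (the Y=1 side).
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx. [0.0067, 0.5, 0.9933]
```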

  4. In 2 Dimensions • A, b determine: 1) the orientation, 2) the thickness (margin), and 3) the offset of the decision surface. (Figure: the sigmoid f(x) over a two-dimensional input space.)
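To make the slide's geometric reading explicit (again in our A, b notation): the decision surface is the set of points where the linear score is zero, the direction of A fixes its orientation, the magnitude of A controls how quickly the sigmoid rises across it (the thickness or margin), and b shifts the surface away from the origin.

```latex
\text{decision surface:}\quad \{\,x : A^{\top}x + b = 0\,\},
\qquad
P(Y=1 \mid x) = \tfrac{1}{2}\ \text{exactly on this surface.}
```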

  5. Cost Function • We want a different error measure that is better suited for 0/1 data. • This can be derived from maximizing the probability of the data again.
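The error measure itself was shown on the board rather than in the transcript. The standard choice obtained by maximizing the probability of the data is the negative log-likelihood (cross-entropy), which for p_n = P(Y=1|x_n) reads:

```latex
E(A, b) = -\sum_{n=1}^{N}\Big[\, Y_n \log p_n + (1 - Y_n)\log(1 - p_n) \,\Big],
\qquad p_n = \sigma\!\left(A^{\top}x_n + b\right)
```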

  6. Learning A, b • Again, we take the derivatives of the error with respect to the parameters. • This time, however, we can't solve them analytically, so we use gradient descent.

  7. Gradients for Logistic Regression • After the math (worked out on the whiteboard) we find the gradients, reconstructed below. Note: the first term in each equation (multiplied by Y) only sums over data with Y=1, while the second term (multiplied by (1-Y)) only sums over data with Y=0. Follow the gradient until the change in A, b falls below a small threshold (e.g. 1e-6).
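The gradient equations referred to here were derived on the whiteboard and are not in the transcript. Assuming the cross-entropy error above, the gradients have exactly the two-term structure the note describes (a Y-weighted term over the Y=1 data and a (1-Y)-weighted term over the Y=0 data) and simplify to:

```latex
\frac{\partial E}{\partial A}
= -\sum_{n}\Big[\, Y_n (1 - p_n)\,x_n \;-\; (1 - Y_n)\, p_n\, x_n \,\Big]
= -\sum_{n} (Y_n - p_n)\, x_n,
\qquad
\frac{\partial E}{\partial b} = -\sum_{n} (Y_n - p_n)
```

A minimal batch gradient-descent sketch using these gradients; the learning rate, stopping threshold, and iteration cap are illustrative defaults, not values from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, Y, lr=0.1, tol=1e-6, max_iter=10000):
    """Batch gradient descent for logistic regression.
    X: (N, D) array of inputs; Y: (N,) array of labels in {0, 1}."""
    D = X.shape[1]
    A, b = np.zeros(D), 0.0
    for _ in range(max_iter):
        p = sigmoid(X @ A + b)       # p_n = P(Y=1 | x_n) under the current A, b
        grad_A = -(Y - p) @ X        # dE/dA = -sum_n (Y_n - p_n) x_n
        grad_b = -np.sum(Y - p)      # dE/db = -sum_n (Y_n - p_n)
        A_new, b_new = A - lr * grad_A, b - lr * grad_b
        # stop once the change in A, b falls below a small threshold (e.g. 1e-6)
        if max(np.max(np.abs(A_new - A)), abs(b_new - b)) < tol:
            A, b = A_new, b_new
            break
        A, b = A_new, b_new
    return A, b
```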

  8. Classification • Once we have found the optimal values for A, b, we classify future data with the rule reconstructed below. • Least squares and logistic regression are parametric methods, since all the information in the data is stored in the parameters A, b; i.e., after learning you can toss out the data. • Also, the decision surface is always linear; its complexity does not grow with the amount of data. • We have imposed our prior knowledge that the decision surface should be linear.
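The classification rule on the slide was an equation image; the usual rule for this model is to threshold the predicted probability at 1/2, which is equivalent to checking the sign of the linear score:

```latex
\hat{Y}(x) =
\begin{cases}
1 & \text{if } \sigma\!\left(A^{\top}x + b\right) \geq \tfrac{1}{2}
  \;\Longleftrightarrow\; A^{\top}x + b \geq 0,\\[2pt]
0 & \text{otherwise.}
\end{cases}
```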

  9. A Real Example (collaboration with S. Cole) • Fingerprints are matched against a database. • Each match is scored. • Using logistic regression we try to predict whether a future match is real or false. • Human fingerprint examiners claim 100% accuracy. Is this true?

  10. Exercise • You have laid your hands on a dataset where each data-case has a single attribute and a class label (0 or 1). You train a logistic regression classifier. • A new data-case is presented. What do you do to decide which class it falls in (use an equation or pseudo-code)? • How many parameters are there to tune for this problem? • Explain what these parameters mean in terms of the function P(Y=1|X).
