CSC 4510 – Machine Learning
4: Regression (continued)
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
• The slides in this presentation are adapted from:
• The Stanford online ML course http://www.ml-class.org/
Last time
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: simple example using Excel
Today
• How to apply gradient descent to minimize the cost function for regression
• Linear algebra refresher
Reminder: sample problem
[scatter plot: housing prices (Portland, OR) — Size (feet²) on the x-axis vs. Price (in $1000s) on the y-axis]
Reminder: Notation
Training set of housing prices (Portland, OR)
Notation:
• m = number of training examples
• x’s = “input” variable / features
• y’s = “output” variable / “target” variable
Reminder: Learning algorithm for hypothesis function h
Training Set → Learning Algorithm → h
h maps the size of a house to an estimated price
Linear hypothesis: hθ(x) = θ0 + θ1x (univariate linear regression)
Linear Regression Model
Hypothesis: hθ(x) = θ0 + θ1x
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Gradient descent algorithm
repeat until convergence { θj := θj − α ∂/∂θj J(θ0, θ1) (simultaneously for j = 0 and j = 1) }
Today
• How to apply gradient descent to minimize the cost function for regression
  • a closer look at the cost function
  • applying gradient descent to find the minimum of the cost function
• Linear algebra refresher
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal: minimize J(θ0, θ1) over θ0, θ1
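As a concrete sketch, this cost function is easy to compute directly; the numpy code and toy data below are illustrative assumptions, not from the course materials:

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)                  # number of training examples
    h = theta0 + theta1 * x     # hypothesis evaluated at every x_i
    return np.sum((h - y) ** 2) / (2 * m)

# Toy housing-style data: sizes (feet^2) and prices (in $1000s)
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([200.0, 300.0, 400.0])
print(compute_cost(x, y, 0.0, 0.2))  # 0.0: y = 0.2x fits this data exactly
```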
Simplified case (θ0 = 0)
Hypothesis: hθ(x) = θ1x
Parameter: θ1
Cost function: J(θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal: minimize J(θ1) over θ1
θ0 = 0: paired plots for three slopes — hθ(x) = x, hθ(x) = 0.5x, and hθ(x) = 0
[left of each pair: hθ(x) vs. x, for fixed θ1 a function of x; right: J(θ1), a function of the parameter θ1]
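The J(θ1) values behind plots like these can be reproduced numerically; the three-point dataset below is an assumption chosen so that θ1 = 1 fits perfectly (the slide itself does not give the data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # illustrative inputs
y = np.array([1.0, 2.0, 3.0])  # targets lying exactly on y = x
m = len(x)

for theta1 in (1.0, 0.5, 0.0):
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)
    print(f"theta1 = {theta1}: J = {J:.3f}")
# theta1 = 1.0 -> J = 0.000 (perfect fit)
# theta1 = 0.5 -> J = 0.583
# theta1 = 0.0 -> J = 2.333
```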
What if θ0 ≠ 0?
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal: minimize J(θ0, θ1) over θ0, θ1
[paired plots: the hypothesis line hθ(x) = 10 + 0.1x over Size in feet² (x) vs. Price ($) in 1000s (for fixed θ0, θ1, a function of x); and J(θ0, θ1), a function of the parameters θ0, θ1]
[repeated paired plots: the hypothesis hθ(x) for fixed θ0, θ1 (a function of x) alongside the contour plot of J(θ0, θ1) (a function of the parameters θ0, θ1), for several choices of θ0, θ1]
Today
• How to apply gradient descent to minimize the cost function for regression
  • a closer look at the cost function
  • applying gradient descent to find the minimum of the cost function
• Linear algebra refresher
Have some function J(θ0, θ1); want min over θ0, θ1 of J(θ0, θ1)
• Gradient descent algorithm outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum
Have some function J(θ0, θ1); want min J(θ0, θ1)
Gradient descent algorithm:
repeat until convergence { θj := θj − α ∂/∂θj J(θ0, θ1) (for j = 0 and j = 1) }
α is the learning rate: it controls how large a step each update takes.
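As a generic illustration of the update rule (before specializing it to regression), here is a minimal one-parameter sketch; the quadratic objective, α, and step count are arbitrary choices:

```python
def gradient_descent_1d(dJ, theta, alpha=0.1, steps=100):
    """Repeatedly step downhill: theta := theta - alpha * dJ/dtheta."""
    for _ in range(steps):
        theta = theta - alpha * dJ(theta)
    return theta

# Example: J(theta) = (theta - 3)^2, so dJ/dtheta = 2 * (theta - 3)
theta_min = gradient_descent_1d(lambda t: 2 * (t - 3), theta=0.0)
print(theta_min)  # converges toward the minimum at theta = 3
```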
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
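Both failure modes are easy to demonstrate on the same illustrative quadratic; the specific α values below are assumptions for the demo:

```python
def descend(alpha, steps=25, theta=0.0):
    """Gradient descent on J(theta) = (theta - 3)^2, minimum at theta = 3."""
    for _ in range(steps):
        theta = theta - alpha * 2 * (theta - 3)
    return theta

print(descend(alpha=0.001))  # too small: barely moves toward 3 in 25 steps
print(descend(alpha=0.5))    # well chosen: lands on 3 (in one step, here)
print(descend(alpha=1.1))    # too large: overshoots and diverges
```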
At a local minimum the derivative is zero, so the update θ1 := θ1 − α · 0 leaves the current value of θ1 unchanged.
Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach the minimum, the derivative shrinks, so the steps automatically become smaller.
Linear Regression Model
Hypothesis: hθ(x) = θ0 + θ1x
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Gradient descent algorithm
repeat until convergence { θj := θj − α ∂/∂θj J(θ0, θ1) (simultaneously for j = 0 and j = 1) }
Gradient descent algorithm for linear regression
repeat until convergence {
θ0 := θ0 − α (1/m) Σi=1..m (hθ(x(i)) − y(i))
θ1 := θ1 − α (1/m) Σi=1..m (hθ(x(i)) − y(i)) · x(i)
}
Update θ0 and θ1 simultaneously.
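A minimal sketch of the whole algorithm, with the simultaneous update made explicit through temporaries; variable names, α, and the iteration count are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=2000):
    """Batch gradient descent for univariate linear regression."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        h = theta0 + theta1 * x                            # hypothesis at every x(i)
        temp0 = theta0 - alpha * np.sum(h - y) / m         # uses dJ/dtheta0
        temp1 = theta1 - alpha * np.sum((h - y) * x) / m   # uses dJ/dtheta1
        theta0, theta1 = temp0, temp1                      # simultaneous update
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # approaches (0.0, 1.0), i.e. h(x) = x
```

Computing both temporaries before assigning either is what "update θ0 and θ1 simultaneously" means: updating θ0 first and then using the new θ0 inside the θ1 update would be a different (incorrect) algorithm.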
[surface plot of J(θ0, θ1) over the (θ0, θ1) plane]
[animation: at each gradient descent step, the current hypothesis hθ(x) for fixed θ0, θ1 (a function of x) is shown alongside the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters θ0, θ1), moving toward the minimum]
“Batch” Gradient Descent
“Batch”: each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm.
• The slides in this presentation are adapted from:
• The Stanford online ML course http://www.ml-class.org/
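One common way to realize the "process part of the dataset" alternative is a stochastic or mini-batch step; the sketch below is an assumption about what that alternative could look like, not something the slide specifies:

```python
import numpy as np

def minibatch_step(x, y, theta0, theta1, alpha=0.01, batch_size=2):
    """One gradient step computed from a random subset of the data."""
    idx = np.random.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    h = theta0 + theta1 * xb
    new_theta0 = theta0 - alpha * np.mean(h - yb)
    new_theta1 = theta1 - alpha * np.mean((h - yb) * xb)
    return new_theta0, new_theta1
```

Each step is cheaper than a full batch pass because the sums run over batch_size examples instead of all m, at the cost of a noisier estimate of the gradient.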
What’s next? We are not in univariate regression anymore:
Today
• How to apply gradient descent to minimize the cost function for regression
  • a closer look at the cost function
  • applying gradient descent to find the minimum of the cost function
• Linear algebra refresher
Linear Algebra Review
Matrix: rectangular array of numbers
Matrix elements (entries of the matrix): Aij = the “i, j entry”, in the ith row, jth column
Dimension of a matrix: number of rows × number of columns, e.g., 4 × 2
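In code, dimension and entry access look like this (numpy, with illustrative values); note that numpy indexes from 0, so math's A12 is A[0, 1]:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])

print(A.shape)  # (4, 2): 4 rows x 2 columns
print(A[0, 1])  # 2 -- math's "1, 2 entry" A12; numpy indexes from 0
```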
Another example: representing communication links in a network
Two graphs over nodes a, b, c, d, e, each with its adjacency matrix:

Adjacency matrix 1:
    a  b  c  d  e
a   0  1  2  0  3
b   1  0  0  0  0
c   2  0  0  1  1
d   0  0  1  0  1
e   3  0  1  1  0

Adjacency matrix 2:
    a  b  c  d  e
a   0  1  0  0  2
b   0  1  0  0  0
c   1  0  0  1  0
d   0  0  1  0  1
e   0  0  0  0  0
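The first matrix above can be entered directly and its symmetry checked (a numpy sketch using the values from the slide):

```python
import numpy as np

# Adjacency matrix 1, rows/columns ordered a, b, c, d, e
A = np.array([[0, 1, 2, 0, 3],
              [1, 0, 0, 0, 0],
              [2, 0, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [3, 0, 1, 1, 0]])

print(np.array_equal(A, A.T))  # True: the links in this graph are symmetric
```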
Vector: an n × 1 matrix (an n-dimensional vector)
yi = the ith element
1-indexed vs. 0-indexed: in math notation vectors are usually 1-indexed (y1 … yn); most programming languages index from 0 (y[0] … y[n−1]).
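A quick numpy illustration of the indexing difference (the vector's values are made up):

```python
import numpy as np

y = np.array([460, 232, 315, 178])  # an illustrative 4-dimensional vector
print(y.shape)  # (4,)
print(y[0])     # 460 -- math's y1 (1-indexed) is code's y[0] (0-indexed)
```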