CSC 4510 – Machine Learning
4: Regression (continued)
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
• The slides in this presentation are adapted from:
• The Stanford online ML course http://www.ml-class.org/
Last time
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: simple example using Excel
Today
• How to apply gradient descent to minimize the cost function for regression
• Linear algebra refresher
Reminder: sample problem
[scatter plot: housing prices (Portland, OR) — Size (feet²) on the x-axis vs. Price (in $1000s) on the y-axis]
Reminder: Notation
Training set of housing prices (Portland, OR)
Notation:
• m = number of training examples
• x’s = “input” variable / features
• y’s = “output” variable / “target” variable
Reminder: Learning algorithm for hypothesis function h
Training Set → Learning Algorithm → h
h maps the size of a house to an estimated price
Linear hypothesis: hθ(x) = θ0 + θ1x (univariate linear regression)
Linear Regression Model
Hypothesis: hθ(x) = θ0 + θ1x
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Gradient descent algorithm
repeat until convergence { θj := θj − α ∂/∂θj J(θ0, θ1) (simultaneously for j = 0 and j = 1) }
Today
• How to apply gradient descent to minimize the cost function for regression
  • a closer look at the cost function
  • applying gradient descent to find the minimum of the cost function
• Linear algebra refresher
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal: minimize J(θ0, θ1) over θ0, θ1
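As a concrete sketch, this cost function is easy to compute directly; the numpy code and toy data below are illustrative assumptions, not from the course materials:

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)                  # number of training examples
    h = theta0 + theta1 * x     # hypothesis evaluated at every x_i
    return np.sum((h - y) ** 2) / (2 * m)

# Toy housing-style data: sizes (feet^2) and prices (in $1000s)
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([200.0, 300.0, 400.0])
print(compute_cost(x, y, 0.0, 0.2))  # 0.0: y = 0.2x fits this data exactly
```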
Simplified case (θ0 = 0)
Hypothesis: hθ(x) = θ1x
Parameter: θ1
Cost function: J(θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal: minimize J(θ1) over θ1
θ0 = 0: paired plots for three slopes — hθ(x) = x, hθ(x) = 0.5x, and hθ(x) = 0
[left of each pair: hθ(x) vs. x, for fixed θ1 a function of x; right: J(θ1), a function of the parameter θ1]
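The J(θ1) values behind plots like these can be reproduced numerically; the three-point dataset below is an assumption chosen so that θ1 = 1 fits perfectly (the slide itself does not give the data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # illustrative inputs
y = np.array([1.0, 2.0, 3.0])  # targets lying exactly on y = x
m = len(x)

for theta1 in (1.0, 0.5, 0.0):
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)
    print(f"theta1 = {theta1}: J = {J:.3f}")
# theta1 = 1.0 -> J = 0.000 (perfect fit)
# theta1 = 0.5 -> J = 0.583
# theta1 = 0.0 -> J = 2.333
```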
What if θ0 ≠ 0?
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal: minimize J(θ0, θ1) over θ0, θ1
[paired plots: the hypothesis line hθ(x) = 10 + 0.1x over Size in feet² (x) vs. Price ($) in 1000s (for fixed θ0, θ1, a function of x); and J(θ0, θ1), a function of the parameters θ0, θ1]
[repeated paired plots: the hypothesis hθ(x) for fixed θ0, θ1 (a function of x) alongside the contour plot of J(θ0, θ1) (a function of the parameters θ0, θ1), for several choices of θ0, θ1]
Today
• How to apply gradient descent to minimize the cost function for regression
  • a closer look at the cost function
  • applying gradient descent to find the minimum of the cost function
• Linear algebra refresher
Have some function J(θ0, θ1); want min over θ0, θ1 of J(θ0, θ1)
• Gradient descent algorithm outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum
Have some function J(θ0, θ1); want min J(θ0, θ1)
Gradient descent algorithm:
repeat until convergence { θj := θj − α ∂/∂θj J(θ0, θ1) (for j = 0 and j = 1) }
α is the learning rate: it controls how large a step each update takes.
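As a generic illustration of the update rule (before specializing it to regression), here is a minimal one-parameter sketch; the quadratic objective, α, and step count are arbitrary choices:

```python
def gradient_descent_1d(dJ, theta, alpha=0.1, steps=100):
    """Repeatedly step downhill: theta := theta - alpha * dJ/dtheta."""
    for _ in range(steps):
        theta = theta - alpha * dJ(theta)
    return theta

# Example: J(theta) = (theta - 3)^2, so dJ/dtheta = 2 * (theta - 3)
theta_min = gradient_descent_1d(lambda t: 2 * (t - 3), theta=0.0)
print(theta_min)  # converges toward the minimum at theta = 3
```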
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
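Both failure modes are easy to demonstrate on the same illustrative quadratic; the specific α values below are assumptions for the demo:

```python
def descend(alpha, steps=25, theta=0.0):
    """Gradient descent on J(theta) = (theta - 3)^2, minimum at theta = 3."""
    for _ in range(steps):
        theta = theta - alpha * 2 * (theta - 3)
    return theta

print(descend(alpha=0.001))  # too small: barely moves toward 3 in 25 steps
print(descend(alpha=0.5))    # well chosen: lands on 3 (in one step, here)
print(descend(alpha=1.1))    # too large: overshoots and diverges
```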
At a local minimum the derivative is zero, so the update θ1 := θ1 − α · 0 leaves the current value of θ1 unchanged.
Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach the minimum, the derivative shrinks, so the steps automatically become smaller.
Linear Regression Model
Hypothesis: hθ(x) = θ0 + θ1x
Cost function: J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Gradient descent algorithm
repeat until convergence { θj := θj − α ∂/∂θj J(θ0, θ1) (simultaneously for j = 0 and j = 1) }
Gradient descent algorithm for linear regression
repeat until convergence {
θ0 := θ0 − α (1/m) Σi=1..m (hθ(x(i)) − y(i))
θ1 := θ1 − α (1/m) Σi=1..m (hθ(x(i)) − y(i)) · x(i)
}
Update θ0 and θ1 simultaneously.
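A minimal sketch of the whole algorithm, with the simultaneous update made explicit through temporaries; variable names, α, and the iteration count are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=2000):
    """Batch gradient descent for univariate linear regression."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        h = theta0 + theta1 * x                            # hypothesis at every x(i)
        temp0 = theta0 - alpha * np.sum(h - y) / m         # uses dJ/dtheta0
        temp1 = theta1 - alpha * np.sum((h - y) * x) / m   # uses dJ/dtheta1
        theta0, theta1 = temp0, temp1                      # simultaneous update
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # approaches (0.0, 1.0), i.e. h(x) = x
```

Computing both temporaries before assigning either is what "update θ0 and θ1 simultaneously" means: updating θ0 first and then using the new θ0 inside the θ1 update would be a different (incorrect) algorithm.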
[surface plot of J(θ0, θ1) over the (θ0, θ1) plane]
[animation: at each gradient descent step, the current hypothesis hθ(x) for fixed θ0, θ1 (a function of x) is shown alongside the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters θ0, θ1), moving toward the minimum]
“Batch” Gradient Descent
“Batch”: each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm.
• The slides in this presentation are adapted from:
• The Stanford online ML course http://www.ml-class.org/
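One common way to realize the "process part of the dataset" alternative is a stochastic or mini-batch step; the sketch below is an assumption about what that alternative could look like, not something the slide specifies:

```python
import numpy as np

def minibatch_step(x, y, theta0, theta1, alpha=0.01, batch_size=2):
    """One gradient step computed from a random subset of the data."""
    idx = np.random.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    h = theta0 + theta1 * xb
    new_theta0 = theta0 - alpha * np.mean(h - yb)
    new_theta1 = theta1 - alpha * np.mean((h - yb) * xb)
    return new_theta0, new_theta1
```

Each step is cheaper than a full batch pass because the sums run over batch_size examples instead of all m, at the cost of a noisier estimate of the gradient.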
What’s next? We are not in univariate regression anymore:
Today
• How to apply gradient descent to minimize the cost function for regression
  • a closer look at the cost function
  • applying gradient descent to find the minimum of the cost function
• Linear algebra refresher
Linear Algebra Review
Matrix: rectangular array of numbers
Matrix elements (entries of the matrix): Aij = the “i, j entry”, in the ith row, jth column
Dimension of a matrix: number of rows × number of columns, e.g., 4 × 2
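In code, dimension and entry access look like this (numpy, with illustrative values); note that numpy indexes from 0, so math's A12 is A[0, 1]:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])

print(A.shape)  # (4, 2): 4 rows x 2 columns
print(A[0, 1])  # 2 -- math's "1, 2 entry" A12; numpy indexes from 0
```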
Another example: representing communication links in a network
Two graphs over nodes a, b, c, d, e, each with its adjacency matrix:

Adjacency matrix 1:
    a  b  c  d  e
a   0  1  2  0  3
b   1  0  0  0  0
c   2  0  0  1  1
d   0  0  1  0  1
e   3  0  1  1  0

Adjacency matrix 2:
    a  b  c  d  e
a   0  1  0  0  2
b   0  1  0  0  0
c   1  0  0  1  0
d   0  0  1  0  1
e   0  0  0  0  0
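The first matrix above can be entered directly and its symmetry checked (a numpy sketch using the values from the slide):

```python
import numpy as np

# Adjacency matrix 1, rows/columns ordered a, b, c, d, e
A = np.array([[0, 1, 2, 0, 3],
              [1, 0, 0, 0, 0],
              [2, 0, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [3, 0, 1, 1, 0]])

print(np.array_equal(A, A.T))  # True: the links in this graph are symmetric
```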
Vector: an n × 1 matrix (an n-dimensional vector)
yi = the ith element
1-indexed vs. 0-indexed: in math notation vectors are usually 1-indexed (y1 … yn); most programming languages index from 0 (y[0] … y[n−1]).
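A quick numpy illustration of the indexing difference (the vector's values are made up):

```python
import numpy as np

y = np.array([460, 232, 315, 178])  # an illustrative 4-dimensional vector
print(y.shape)  # (4,)
print(y[0])     # 460 -- math's y1 (1-indexed) is code's y[0] (0-indexed)
```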