RECITATION 2 APRIL 28 • Spline and Kernel Methods • Gaussian Processes • Mixture Modeling for Density Estimation
Penalized Cubic Regression Splines • gam() in library "mgcv" • gam(y ~ s(x, bs = "cr", k = n.knots), knots = list(x = c(…)), data = dataset) • By default, the smoothing parameter is selected by GCV • R Demo 1
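A minimal sketch of the call above, assuming simulated data; the variables x and y and the basis dimension k = 10 are illustrative placeholders, not from the recitation:

```r
# Minimal sketch: penalized cubic regression spline via mgcv::gam().
# The data and k = 10 are illustrative.
library(mgcv)

set.seed(1)
dataset <- data.frame(x = seq(0, 1, length.out = 200))
dataset$y <- sin(2 * pi * dataset$x) + rnorm(200, sd = 0.3)

# bs = "cr" requests the cubic regression spline basis; k is the basis
# dimension (roughly, the number of knots). The smoothing parameter is
# chosen by GCV by default.
fit <- gam(y ~ s(x, bs = "cr", k = 10), data = dataset)
summary(fit)
plot(fit, residuals = TRUE)   # fitted smooth with partial residuals
```

If you want specific knot locations rather than the defaults, pass them through knots = list(x = c(…)) as in the slide's template.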
Kernel Method • Nadaraya-Watson: locally constant model • Locally linear polynomial model • How to define "local"? Through a kernel function, e.g. the Gaussian kernel • R Demo 1 • R package: "locfit" • Function: locfit(y ~ x, kern = "gauss", deg = , alpha = ) • Bandwidth selected by GCV: gcvplot(y ~ x, kern = "gauss", deg = , alpha = bandwidth range); see the sketch below
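A minimal sketch of these calls with illustrative data and an assumed bandwidth grid; deg = 0 gives the Nadaraya-Watson (locally constant) estimator, deg = 1 the locally linear fit:

```r
# Minimal sketch: local regression with locfit; the data and the
# alpha grid are illustrative.
library(locfit)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# deg = 0: Nadaraya-Watson (locally constant); deg = 1: locally linear.
# alpha is the bandwidth parameter (a nearest-neighbor fraction here).
fit <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = 0.5)
plot(fit)
points(x, y, cex = 0.3)

# Compare GCV scores over a grid of bandwidths.
gcvplot(y ~ x, kern = "gauss", deg = 1, alpha = seq(0.2, 0.8, by = 0.05))
```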
Gaussian Processes • Distribution over functions: f ~ GP(m, κ) • m: mean function • κ: covariance function • (f(x_1), …, f(x_n)) ~ N_n(μ, K) • μ = [m(x_1), …, m(x_n)] • K_ij = κ(x_i, x_j) • Idea: if x_i and x_j are similar according to the kernel, then f(x_i) is similar to f(x_j)
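To make the "distribution over functions" idea concrete, here is a sketch (not from the recitation) that draws a sample path from a zero-mean GP prior with a squared-exponential kernel; the grid and length-scale are illustrative:

```r
# Minimal sketch: sample f ~ GP(0, kappa) on a finite grid, with a
# squared-exponential kernel; length-scale ell = 1 is illustrative.
se_kern <- function(a, b, ell = 1) {
  outer(a, b, function(u, v) exp(-(u - v)^2 / (2 * ell^2)))
}

set.seed(1)
x <- seq(-5, 5, length.out = 100)
K <- se_kern(x, x)
# A small jitter on the diagonal keeps the Cholesky factor stable.
R <- chol(K + 1e-8 * diag(length(x)))
f <- t(R) %*% rnorm(length(x))   # one draw from N_n(0, K)
plot(x, f, type = "l", ylab = "f(x)")
```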
Gaussian Processes – Noise-free observations • Example task: learn a function f(x) to estimate y from data (x, y) • A function can be viewed as an infinite-dimensional random variable • A GP provides a distribution over functions
Gaussian Processes – Noise-free observations • Model: joint prior [f; f*] ~ N(0, [K(x,x) K(x,x*); K(x*,x) K(x*,x*)]) • (x, f) are the observed locations and values (training data) • (x*, f*) are the test or prediction locations and values • After observing some noise-free data (x, f): f* | x*, x, f ~ N( K(x*,x) K(x,x)^{-1} f , K(x*,x*) − K(x*,x) K(x,x)^{-1} K(x,x*) ) • Length-scale: controls how quickly the kernel correlation decays with distance • R Demo 2
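A sketch of noise-free GP prediction under the formulas above, reusing the squared-exponential kernel; the training points are illustrative:

```r
# Minimal sketch: noise-free GP posterior mean and covariance.
# Training data are illustrative.
se_kern <- function(a, b, ell = 1) {
  outer(a, b, function(u, v) exp(-(u - v)^2 / (2 * ell^2)))
}

x_tr <- c(-4, -2, 0, 1, 3)
f_tr <- sin(x_tr)
x_st <- seq(-5, 5, length.out = 100)

K_inv  <- solve(se_kern(x_tr, x_tr) + 1e-8 * diag(length(x_tr)))
K_st   <- se_kern(x_st, x_tr)                    # K(x*, x)
mu_st  <- K_st %*% K_inv %*% f_tr                # posterior mean
cov_st <- se_kern(x_st, x_st) - K_st %*% K_inv %*% t(K_st)

plot(x_st, mu_st, type = "l")
points(x_tr, f_tr, pch = 19)   # the posterior mean interpolates the data
```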
Gaussian Processes – Noisy observations (GP for Regression) • Model: y = f(x) + ε, ε ~ N(0, σ_n^2), so cov(y) = K(x,x) + σ_n^2 I • (x, y) are the observed locations and values (training data) • (x*, f*) are the test or prediction locations and values • After observing some noisy data (x, y): f* | x*, x, y ~ N( K(x*,x) [K(x,x) + σ_n^2 I]^{-1} y , K(x*,x*) − K(x*,x) [K(x,x) + σ_n^2 I]^{-1} K(x,x*) ) • R Demo 3
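The same computation with noisy targets: the only change from the noise-free case is adding σ_n^2 I to the training covariance. The noise level and data below are illustrative:

```r
# Minimal sketch: GP regression with observation noise sigma_n.
se_kern <- function(a, b, ell = 1) {
  outer(a, b, function(u, v) exp(-(u - v)^2 / (2 * ell^2)))
}

set.seed(1)
sigma_n <- 0.1
x_tr <- runif(20, -5, 5)
y_tr <- sin(x_tr) + rnorm(20, sd = sigma_n)
x_st <- seq(-5, 5, length.out = 100)

Ky_inv <- solve(se_kern(x_tr, x_tr) + sigma_n^2 * diag(length(x_tr)))
K_st   <- se_kern(x_st, x_tr)
mu_st  <- K_st %*% Ky_inv %*% y_tr
v_st   <- diag(se_kern(x_st, x_st) - K_st %*% Ky_inv %*% t(K_st))

plot(x_st, mu_st, type = "l")
lines(x_st, mu_st + 2 * sqrt(v_st), lty = 2)   # approximate 95% band
lines(x_st, mu_st - 2 * sqrt(v_st), lty = 2)
points(x_tr, y_tr)
```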
Reference • Chapter 2 of Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, MIT Press, 2006 • 527 lecture notes by Emily Fox
Mixture Models – Density Estimation • EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC) • Remember: • The EM algorithm is an iterative algorithm that MAXIMIZES THE LIKELIHOOD • MCMC DRAWS FROM THE POSTERIOR (posterior ∝ likelihood × prior)
EM algorithm • Iterative procedure that attempts to maximize the log-likelihood → MLE estimates of the mixture model parameters • I.e., one final density estimate
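A hand-rolled sketch of the EM iteration for a two-component univariate Gaussian mixture (not the recitation's code; data and starting values are illustrative):

```r
# Minimal sketch: EM for a two-component Gaussian mixture.
# Data and initial values are illustrative.
set.seed(1)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 0.5))

pi_k <- c(0.5, 0.5); mu <- c(-1, 1); sigma <- c(1, 1)
for (iter in 1:200) {
  # E-step: responsibility of component 1 for each observation
  d1 <- pi_k[1] * dnorm(y, mu[1], sigma[1])
  d2 <- pi_k[2] * dnorm(y, mu[2], sigma[2])
  r  <- d1 / (d1 + d2)
  # M-step: weighted maximum-likelihood updates
  pi_k  <- c(mean(r), 1 - mean(r))
  mu    <- c(sum(r * y) / sum(r), sum((1 - r) * y) / sum(1 - r))
  sigma <- c(sqrt(sum(r * (y - mu[1])^2) / sum(r)),
             sqrt(sum((1 - r) * (y - mu[2])^2) / sum(1 - r)))
}
c(pi_k, mu, sigma)   # one final set of MLE parameter estimates
```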
Bayesian Mixture Modeling (MCMC) • Uses an iterative procedure to DRAW SAMPLES from the posterior (then you can average draws, etc.) • You don't need to understand the fine details, but know that at every iteration you get a set of parameter draws from your posterior distribution
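A sketch of one such sampler (a simplified, assumed model, not the recitation's): a Gibbs sampler for a two-component Gaussian mixture with known variance, conjugate priors μ_k ~ N(0, τ²) and π ~ Beta(1, 1), and illustrative data:

```r
# Minimal sketch: Gibbs sampling for a two-component Gaussian mixture
# with KNOWN variance sigma = 1. Priors: mu_k ~ N(0, tau2), pi ~ Beta(1, 1).
set.seed(1)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 1))
n <- length(y); sigma <- 1; tau2 <- 100
p <- 0.5; mu <- c(-1, 1)
n_iter <- 2000
draws <- matrix(NA, n_iter, 3, dimnames = list(NULL, c("pi", "mu1", "mu2")))

for (t in 1:n_iter) {
  # Sample component labels given the current parameters
  w1 <- p * dnorm(y, mu[1], sigma)
  w2 <- (1 - p) * dnorm(y, mu[2], sigma)
  z  <- ifelse(runif(n) < w1 / (w1 + w2), 1, 2)
  # Conjugate updates: pi | z ~ Beta, mu_k | z, y ~ Normal
  n1 <- sum(z == 1)
  p  <- rbeta(1, 1 + n1, 1 + n - n1)
  for (k in 1:2) {
    yk <- y[z == k]
    v  <- 1 / (length(yk) / sigma^2 + 1 / tau2)
    mu[k] <- rnorm(1, v * sum(yk) / sigma^2, sqrt(v))
  }
  draws[t, ] <- c(p, mu)   # one posterior draw per iteration
}
colMeans(draws[-(1:500), ])   # posterior means after burn-in
```

Each row of draws is one set of parameter values from the posterior, which is exactly what "at every iteration you get a set of parameter draws" means above.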