This comprehensive guide delves into advanced statistical methods for density estimation using penalized cubic regression splines, kernel methods, and Gaussian processes. It illustrates the application of Generalized Additive Models (GAM) via the “mgcv” library in R, focusing on optimal smoothing through Generalized Cross-Validation (GCV). The guide covers both Nadaraya-Watson kernel methods and Bayesian Mixture Modeling using MCMC. Real-life R demos are included for hands-on experience with the concepts presented. Ideal for statisticians and data scientists interested in advanced modeling techniques.
RECITATION 2 (APRIL 28)
Spline and Kernel Methods • Gaussian Processes • Mixture Modeling for Density Estimation
Penalized Cubic Regression Splines
• gam() in library "mgcv"
• gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
• By default, the optimal smoothing parameter is selected by GCV
• R Demo 1 (a minimal sketch follows below)
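A minimal sketch of the spline fit, assuming simulated data (the actual Demo 1 data is not shown in the slides):

library(mgcv)

# Hypothetical simulated data standing in for the demo data
set.seed(1)
dat <- data.frame(x = seq(0, 1, length.out = 200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.3)

# Penalized cubic regression spline; gam() selects the smoothing
# parameter by GCV by default (method = "GCV.Cp")
fit <- gam(y ~ s(x, bs = "cr", k = 20), data = dat)
summary(fit)
plot(fit, residuals = TRUE)  # fitted smooth with partial residuals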
Kernel Method
• Nadaraya-Watson: locally constant model
• Locally linear polynomial model
• How to define "local"? Through a kernel function, e.g., the Gaussian kernel
• R Demo 1
• R package: "locfit"
• Function: locfit(y ~ x, kern="gauss", deg= , alpha= )
• Bandwidth selected by GCV: gcvplot(y ~ x, kern="gauss", deg= , alpha= bandwidth range) (see the sketch below)
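A minimal locfit sketch under the same assumptions (simulated data; alpha is the nearest-neighbor smoothing fraction):

library(locfit)

# Hypothetical data for illustration
set.seed(2)
x <- runif(150)
y <- sin(3 * x) + rnorm(150, sd = 0.2)

# Nadaraya-Watson (locally constant): deg = 0
fit0 <- locfit(y ~ x, kern = "gauss", deg = 0, alpha = 0.5)

# Locally linear: deg = 1
fit1 <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = 0.5)
plot(fit1, get.data = TRUE)  # fitted curve with the data overlaid

# GCV score over a grid of smoothing parameters
gcvplot(y ~ x, kern = "gauss", deg = 1, alpha = seq(0.2, 0.9, by = 0.05))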
Gaussian Processes
• Distribution on functions: f ~ GP(m, κ)
• m: mean function
• κ: covariance function
• (f(x_1), ..., f(x_n)) ~ N_n(μ, K)
• μ = [m(x_1), ..., m(x_n)]
• K_ij = κ(x_i, x_j)
• Idea: if x_i and x_j are similar according to the kernel, then f(x_i) is similar to f(x_j) (see the sketch below)
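To make "distribution on functions" concrete, here is a small sketch (not from the slides) that draws sample paths from a zero-mean GP prior with an assumed squared-exponential kernel:

library(MASS)  # mvrnorm for multivariate normal draws

# Evaluate the GP on a fine grid; kernel and length-scale are illustrative
xs  <- seq(0, 5, length.out = 100)
ell <- 1  # length-scale
K   <- outer(xs, xs, function(u, v) exp(-(u - v)^2 / (2 * ell^2)))
mu  <- rep(0, length(xs))  # zero mean function, m(x) = 0

# Three draws from N_n(mu, K); the jitter keeps K numerically positive definite
paths <- mvrnorm(3, mu, K + 1e-8 * diag(length(xs)))
matplot(xs, t(paths), type = "l", lty = 1, ylab = "f(x)")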
Gaussian Processes – Noise-free observations
• Example task: learn a function f(x) to estimate y from data (x, y)
• A function can be viewed as a random variable of infinite dimension
• A GP provides a distribution over functions
Gaussian Processes – Noise-free observations
• Model: f ~ GP(m, κ)
• (x, f) are the observed locations and values (training data)
• (x*, f*) are the test (prediction) locations and values
• After observing noise-free data (x, f), the standard GP conditional (taking m = 0) is
  f* | x*, x, f ~ N( K*ᵀ K⁻¹ f , K** − K*ᵀ K⁻¹ K* ),
  where K = κ(x, x), K* = κ(x, x*), K** = κ(x*, x*)
• The length-scale of κ controls how quickly correlation decays with distance
• R Demo 2 (see the sketch below)
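A minimal sketch of this noise-free conditioning (Demo 2's data is not shown, so the observations here are hypothetical):

# Squared-exponential kernel (illustrative choice) with length-scale ell
sq_exp <- function(a, b, ell = 1) {
  outer(a, b, function(u, v) exp(-(u - v)^2 / (2 * ell^2)))
}

x  <- c(-2, 0, 1.5)                  # observed locations (hypothetical)
f  <- sin(x)                         # noise-free observed values
xs <- seq(-3, 3, length.out = 100)   # test locations x*

K   <- sq_exp(x, x)    # K   = kappa(x, x)
Ks  <- sq_exp(x, xs)   # K*  = kappa(x, x*)
Kss <- sq_exp(xs, xs)  # K** = kappa(x*, x*)

# Posterior mean and covariance of f* | x*, x, f (zero prior mean)
mu_post  <- t(Ks) %*% solve(K, f)
cov_post <- Kss - t(Ks) %*% solve(K, Ks)

plot(xs, mu_post, type = "l", ylab = "f(x)")
points(x, f, pch = 19)  # the posterior mean interpolates noise-free data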
Gaussian Processes – Noisy observations (GP for Regression)
• Model: y = f(x) + ε, ε ~ N(0, σ²), f ~ GP(m, κ)
• (x, y) are the observed locations and values (training data)
• (x*, f*) are the test (prediction) locations and values
• After observing noisy data (x, y), the conditional becomes
  f* | x*, x, y ~ N( K*ᵀ (K + σ²I)⁻¹ y , K** − K*ᵀ (K + σ²I)⁻¹ K* )
• R Demo 3 (see the sketch below)
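The same sketch with noisy observations, where σ²I is added to K (again with hypothetical data standing in for Demo 3):

sq_exp <- function(a, b, ell = 1) {
  outer(a, b, function(u, v) exp(-(u - v)^2 / (2 * ell^2)))
}

set.seed(5)
sigma2 <- 0.05                       # assumed noise variance
x  <- seq(-2, 2, length.out = 15)    # observed locations (hypothetical)
y  <- sin(x) + rnorm(length(x), sd = sqrt(sigma2))
xs <- seq(-3, 3, length.out = 100)

K   <- sq_exp(x, x)
Ks  <- sq_exp(x, xs)
Kss <- sq_exp(xs, xs)
Ky  <- K + sigma2 * diag(length(x))  # K + sigma^2 I

# The posterior now smooths rather than interpolates the data;
# pmax() guards against tiny negative variances from round-off
mu_post  <- t(Ks) %*% solve(Ky, y)
var_post <- pmax(diag(Kss - t(Ks) %*% solve(Ky, Ks)), 0)

plot(xs, mu_post, type = "l", ylab = "f(x)")
points(x, y)
lines(xs, mu_post + 2 * sqrt(var_post), lty = 2)  # approximate 95% band
lines(xs, mu_post - 2 * sqrt(var_post), lty = 2)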
Reference
• Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning, Chapter 2. MIT Press.
• 527 lecture notes by Emily Fox
Mixture Models – Density Estimation
• EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
• Remember:
• EM algorithm = iterative algorithm that MAXIMIZES the LIKELIHOOD
• MCMC DRAWS FROM the POSTERIOR (i.e., the likelihood combined with the prior)
EM algorithm
• Iterative procedure that attempts to maximize the log-likelihood ---> MLE estimates of the mixture model parameters
• I.e., a single final density estimate (a minimal sketch follows below)
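An EM sketch using the mixtools package (an assumed package choice; the slides do not name one):

library(mixtools)

# Hypothetical two-component data
set.seed(3)
x <- c(rnorm(150, mean = -2), rnorm(100, mean = 2, sd = 0.7))

# EM for a 2-component Gaussian mixture: converges to one set of
# maximum-likelihood parameter estimates
fit <- normalmixEM(x, k = 2)
fit$lambda  # mixing weights
fit$mu      # component means
fit$sigma   # component standard deviations

# The single resulting density estimate, overlaid on a histogram
plot(fit, whichplots = 2)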
Bayesian Mixture Modeling (MCMC)
• Uses an iterative procedure to DRAW SAMPLES from the posterior (you can then average the draws, etc.)
• You don't need to understand the fine details, but know that at every iteration you obtain a set of parameter draws from your posterior distribution (see the sketch below)
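A hand-rolled Gibbs sampler sketch for a 2-component Gaussian mixture. All modeling choices here (known unit component variances, N(0, 100) priors on the means, a Dirichlet(1, 1) prior on the weights) are illustrative assumptions, not the recitation's exact model:

# Gibbs sampler for a 2-component Gaussian mixture (illustrative assumptions:
# component variances known and equal to 1, mu_k ~ N(0, tau2) priors,
# Dirichlet(1, 1) prior on the mixing weights)
set.seed(4)
x <- c(rnorm(150, -2), rnorm(100, 2))   # hypothetical data
n <- length(x); K <- 2; n_iter <- 2000
tau2 <- 100
mu <- c(-1, 1); pi_k <- c(0.5, 0.5)     # initial values
draws <- matrix(NA, n_iter, 2 * K)

for (t in 1:n_iter) {
  # 1. Draw component labels z_i given the current parameters
  p1 <- pi_k[1] * dnorm(x, mu[1])
  p2 <- pi_k[2] * dnorm(x, mu[2])
  z  <- rbinom(n, 1, p2 / (p1 + p2)) + 1   # label 1 or 2 per observation

  # 2. Draw mixing weights from their Dirichlet posterior (via gamma draws)
  counts <- tabulate(z, nbins = K)
  g <- rgamma(K, shape = 1 + counts)
  pi_k <- g / sum(g)

  # 3. Draw each mean from its conjugate normal posterior
  for (k in 1:K) {
    v <- 1 / (counts[k] + 1 / tau2)
    mu[k] <- rnorm(1, v * sum(x[z == k]), sqrt(v))
  }
  draws[t, ] <- c(mu, pi_k)   # one posterior draw per iteration
}

# Average the post-burn-in draws for posterior mean estimates
colMeans(draws[-(1:500), ])

Each row of draws is one sample of (mu_1, mu_2, pi_1, pi_2) from the posterior, which is exactly the "set of parameter estimates every iteration" described above.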