
Introduction to Non-Parametric Statistics: Kernel Density Estimation


Presentation Transcript


  1. Introduction to Non-Parametric Statistics: Kernel Density Estimation

  2. Nonparametric Statistics • Fewer restrictive assumptions about data and underlying probability distributions. • Population distributions may be skewed and multi-modal.

  3. Kernel Density Estimation (KDE) Kernel density estimation is a non-parametric technique in which a known density function (the kernel) is averaged across the observed data points to create a smooth approximation of the underlying density.

  4. Density Estimation and Histograms Let b denote the bin width. The histogram estimate at a point x, from a random sample X_1, ..., X_n of size n, is

  f̂(x) = (1/(n b)) × #{ X_i in the same bin as x }.

  Two choices have to be made when constructing a histogram: • positioning of the bin edges • the bin width b
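As a sketch (not part of the slides), the bin-height formula can be checked against R's hist(): with freq = FALSE, the reported density in each bin is the bin count divided by n·b. The data and breaks below are illustrative choices.

```r
# Sketch: verify that hist(..., freq = FALSE) reports counts / (n * b),
# i.e. the histogram estimate above, bin by bin.
x <- c(3, 4.5, 5.0, 8, 9)
b <- 2                                  # bin width (illustrative)
h <- hist(x, breaks = seq(2, 10, by = b), right = FALSE, plot = FALSE)
manual <- h$counts / (length(x) * b)    # the formula applied bin by bin
# manual agrees with h$density
```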

  5. KDE – Smoothing the Histogram Let X_1, ..., X_n be a random sample taken from a continuous, univariate density f. The kernel density estimator is

  f̂(x; h) = (1/(n h)) Σ_{i=1}^{n} K((x − X_i)/h),

  where K is a function satisfying ∫ K(x) dx = 1. • The function K is referred to as the kernel. • h is a positive number, usually called the bandwidth or window width.
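The estimator can be evaluated directly from this formula. A minimal sketch with a Gaussian kernel (K = dnorm), bandwidth h = 1, and the five observations used later in the slides; the evaluation point x0 is an assumption for illustration:

```r
# Sketch: evaluate f_hat(x0; h) = (1/(n*h)) * sum_i K((x0 - X_i)/h)
# at a single point, with K the standard normal density.
X <- c(3, 4.5, 5.0, 8, 9)   # sample X_1, ..., X_n
h <- 1                      # bandwidth
x0 <- 5                     # evaluation point (illustrative)
fhat <- sum(dnorm((x0 - X) / h)) / (length(X) * h)
```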

  6. Kernels • Refer to Table 2.1 of Wand and Jones (1995), page 31: most unimodal densities perform about the same as each other when used as a kernel. • Typically K is chosen to be a unimodal PDF; here the Gaussian kernel is used. • Common choices: Gaussian, Epanechnikov, Rectangular (uniform), Triangular, Biweight, Cosine. Wand, M.P. and M.C. Jones (1995), Kernel Smoothing, Monographs on Statistics and Applied Probability 60, Chapman and Hall/CRC, 212 pp.
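As an illustrative sketch (not from the slides), R's built-in density() supports several of the kernels listed above; fitting the same data with different kernels shows how little the choice matters relative to the bandwidth:

```r
# Sketch: the same data smoothed with three of the kernels listed above,
# at a common bandwidth. Each estimate integrates to (approximately) 1.
X <- c(3, 4.5, 5.0, 8, 9)
d_gauss <- density(X, bw = 1, kernel = "gaussian")
d_epan  <- density(X, bw = 1, kernel = "epanechnikov")
d_rect  <- density(X, bw = 1, kernel = "rectangular")
```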

  7. KDE – Based on Five Observations Kernel density estimate constructed using five observations with the kernel chosen to be the N(0,1) density. x=c(3, 4.5, 5.0, 8, 9)

  8. Histogram – Positioning of Bin Edges x = c(3, 4.5, 5.0, 8, 9); in both cases the total area is 1. • hist(x, right=T, freq=F), the R default: right-closed (left-open) bins (a, b] • hist(x, right=F, freq=F): left-closed (right-open) bins [a, b)
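A sketch of the edge effect, using the data from the slide; the breaks are an assumption chosen so that two observations land exactly on bin edges:

```r
# Sketch: 5.0 and 8 sit exactly on bin edges, so right-closed and
# left-closed binning assign them to different bins.
x <- c(3, 4.5, 5.0, 8, 9)
brk <- c(2, 5, 8, 10)
right_closed <- hist(x, breaks = brk, right = TRUE,  plot = FALSE)$counts
left_closed  <- hist(x, breaks = brk, right = FALSE, plot = FALSE)$counts
# right_closed: 5.0 counted in (2, 5], 8 in (5, 8]
# left_closed:  5.0 counted in [5, 8), 8 in [8, 10)
```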

  9. Histogram – Bin Width hist(x, breaks=5, right=F, prob=T) versus hist(x, breaks=2, right=F, prob=T); in each case the total area is 1.
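Whatever bin width results, the prob = TRUE histogram always has total area 1. A sketch (note that breaks is only a suggestion to hist(), which may round it to "pretty" values):

```r
# Sketch: coarse vs fine binning of the same data; both probability-scaled
# histograms integrate to 1 regardless of the bin width chosen.
x <- c(3, 4.5, 5.0, 8, 9)
coarse <- hist(x, breaks = 2, right = FALSE, plot = FALSE)
fine   <- hist(x, breaks = 5, right = FALSE, plot = FALSE)
area_coarse <- sum(coarse$density * diff(coarse$breaks))  # total area
area_fine   <- sum(fine$density * diff(fine$breaks))
```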

  10. KDE – Numerical Implementation

  kde <- function(x, h) {
    npt <- 100                          # number of grid points
    r <- max(x) - min(x)                # range of the data
    xmax <- max(x) + 0.1 * r            # extend the grid 10% beyond the data
    xmin <- min(x) - 0.1 * r
    n <- length(x)
    xgrid <- seq(from = xmin, to = xmax, length.out = npt)
    f <- numeric(npt)
    for (i in 1:npt) {                  # loop over grid points
      tmp <- numeric(n)
      for (ii in 1:n) {                 # loop over observations
        z <- (xgrid[i] - x[ii]) / h     # scaled distance to observation ii
        tmp[ii] <- dnorm(z)             # Gaussian kernel evaluated at z
      }
      f[i] <- sum(tmp)
    }
    f <- f / (n * h)                    # normalise so the estimate integrates to 1
    lines(xgrid, f, col = "grey")       # overlay on an existing plot
  } # end function

  Variable correspondence: the evaluation point x on the earlier slides is xgrid in the code, and the sample X_1, ..., X_n is the function argument x.
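The double loop in kde() can be collapsed into one vectorized expression. A sketch, with the grid limits chosen to match the 10% extension used in kde():

```r
# Sketch: vectorized equivalent of the double loop in kde(). outer()
# builds the (grid point) x (observation) matrix of differences, dnorm
# applies the Gaussian kernel elementwise, and rowSums collapses over
# observations.
x <- c(3, 4.5, 5.0, 8, 9)
h <- 1
r <- max(x) - min(x)
xgrid <- seq(min(x) - 0.1 * r, max(x) + 0.1 * r, length.out = 100)
f <- rowSums(dnorm(outer(xgrid, x, "-") / h)) / (length(x) * h)
```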

  11. Bandwidth Estimators • Optimal Smoothing • Normal Optimal Smoothing • Cross-validation • Plug-in bandwidths
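As a sketch, R's stats package ships selectors for these families (the mapping below is an assumption, not stated on the slide): bw.nrd() and bw.nrd0() implement normal optimal ("rule of thumb") smoothing, bw.ucv() and bw.bcv() implement cross-validation, and bw.SJ() is the Sheather–Jones plug-in.

```r
# Sketch (assumed mapping to R's built-in selectors, not from the slide):
# bw.nrd0() is Silverman's normal-optimal rule of thumb, reproduced
# manually below for comparison.
X <- c(3, 4.5, 5.0, 8, 9)
h_normal <- bw.nrd0(X)  # 0.9 * min(sd, IQR/1.34) * n^(-1/5)
h_manual <- 0.9 * min(sd(X), IQR(X) / 1.34) * length(X)^(-1/5)
# h_normal equals h_manual
```

The selected bandwidth is then passed on, e.g. as density(X, bw = h_normal) or as the h argument of the kde() function above.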
