

Introduction to Smoothing Splines

Tongtong Wu

Feb 29, 2004


Outline

  • Introduction

    • Linear and polynomial regression, and interpolation

    • Roughness penalties

  • Interpolating and Smoothing splines

    • Cubic splines

    • Interpolating splines

    • Smoothing splines

    • Natural cubic splines

    • Choosing the smoothing parameter

    • Available software


Key Words

  • roughness penalty

  • penalized sum of squares

  • natural cubic splines





Motivation

[Figure: spline fit to the y18 data]


Introduction

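  • Linear and polynomial regression:

    • Global influence: every observation affects the fitted curve over the whole range

    • The polynomial degree can only be increased in discrete steps, so flexibility cannot be controlled continuously

  • Interpolation

    • Unsatisfactory as an explanation of the given data, because the curve reproduces the noise exactly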
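
The global influence of polynomial regression can be seen in a small experiment. The sketch below is an illustration only: it reuses the y18 series from the software examples later in this deck, and the cubic degree and the size of the perturbation are arbitrary choices.

```r
# y18 is the example series used in the software slides; tt is its index.
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)

# Cubic polynomial regression on the original data.
fit_orig <- lm(y18 ~ poly(tt, 3))

# Perturb only the first observation and refit.
y_pert <- y18
y_pert[1] <- y_pert[1] + 5
fit_pert <- lm(y_pert ~ poly(tt, 3))

# Change in the fitted curve at every point, including the far end tt = 18.
shift <- fitted(fit_pert) - fitted(fit_orig)
round(shift, 3)
```

The fitted values move along the whole range, not only near the first point.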


Roughness penalty approach

  • A method for relaxing the model assumptions in classical linear regression along lines a little different from polynomial regression.


Roughness penalty approach

  • Aims of curve fitting

    • A good fit to the data

    • To obtain a curve estimate that does not display too much rapid fluctuation

  • Basic idea: making a necessary compromise between the two rather different aims in curve estimation


Roughness penalty approach

  • Quantifying the roughness of a curve

    • An intuitive way: the integrated squared second derivative

      $\int_a^b \{g''(t)\}^2\,dt$

      (g: a twice-differentiable curve)

    • Motivation from a formalization of a mechanical device: if a thin piece of flexible wood, called a spline, is bent to the shape of the graph of g, then the leading term in the strain energy is proportional to $\int g''^2$


Roughness penalty approach

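  • Penalized sum of squares

    $S(g) = \sum_{i=1}^{n} \{Y_i - g(t_i)\}^2 + \alpha \int_a^b \{g''(x)\}^2\,dx$

    • g: any twice-differentiable function on [a,b]

    • $\alpha$: smoothing parameter (‘rate of exchange’ between residual error and local variation)

  • Penalized least squares estimator

    $\hat{g} = \arg\min_g S(g)$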
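
To make the trade-off concrete, the following sketch evaluates the penalized sum of squares (residual sum of squares plus alpha times the integrated squared second derivative) for two extreme candidate curves. The value of alpha, the helper `roughness()` and the use of the y18 series are assumptions made for illustration.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)
alpha <- 1  # hypothetical value of the smoothing parameter

# Roughness of a piecewise-cubic curve f: integrate f''^2 from knot to knot.
roughness <- function(f, knots) {
  pieces <- vapply(seq_len(length(knots) - 1), function(i) {
    integrate(function(x) f(x, deriv = 2)^2, knots[i], knots[i + 1])$value
  }, numeric(1))
  sum(pieces)
}

# Candidate 1: least-squares straight line (zero roughness, nonzero residuals).
line_fit <- lm(y18 ~ tt)
S_line <- sum(resid(line_fit)^2) + alpha * 0

# Candidate 2: natural cubic spline interpolant (zero residuals, nonzero roughness).
g_interp <- splinefun(tt, y18, method = "natural")
S_interp <- sum((y18 - g_interp(tt))^2) + alpha * roughness(g_interp, tt)

c(line = S_line, interpolant = S_interp)
```

The penalized least squares estimator lies between these two extremes, and alpha decides how close it sits to each.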


Roughness penalty approach

Curve for a large value of $\alpha$


Roughness penalty approach

Curve for a small value of $\alpha$
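
The two figures can be approximated with `smooth.spline`, where the penalty weight is a monotone function of the `spar` argument. This is a sketch only: the two `spar` values are arbitrary choices, and `spar` is a rescaled version of the smoothing parameter rather than the parameter itself.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))

# Heavy penalty: the fit is pulled towards a straight line.
fit_smooth <- smooth.spline(y18, spar = 1)

# Light penalty: the fit follows the data closely.
fit_rough <- smooth.spline(y18, spar = 0.2)

# Equivalent degrees of freedom; a value near 2 means almost a straight line.
c(smooth = fit_smooth$df, rough = fit_rough$df)
```

Calling `plot(y18)` followed by `lines(fit_smooth)` and `lines(fit_rough)` overlays the two curves on the data.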


Interpolating and Smoothing Splines

  • Cubic splines

  • Interpolating splines

  • Smoothing splines

  • Choosing the smoothing parameter


Cubic Splines

  • Given a<t1<t2<…<tn<b, a function g is a cubic spline if

    • On each interval (a,t1), (t1,t2), …, (tn,b), g is a cubic polynomial

    • The polynomial pieces fit together at the points ti (called knots) in such a way that g itself and its first and second derivatives are continuous at each ti, and hence on the whole of [a,b]


Cubic Splines

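  • How to specify a cubic spline

    $g(t) = d_i (t - t_i)^3 + c_i (t - t_i)^2 + b_i (t - t_i) + a_i$ for $t_i \le t \le t_{i+1}$, $i = 0, \dots, n$, with $t_0 = a$ and $t_{n+1} = b$

  • A cubic spline is a natural cubic spline (NCS) if its second and third derivatives are zero at a and b, which implies d0=c0=dn=cn=0, so that g is linear on the two extreme intervals [a,t1] and [tn,b].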
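
The natural boundary conditions can be checked numerically with base R's `splinefun`, whose `method = "natural"` option builds a natural cubic spline through given points. The sketch below uses the y18 series from the software examples as assumed data.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)
n <- length(tt)

# Natural cubic spline through the points (tt, y18).
g <- splinefun(tt, y18, method = "natural")

# Second derivative at every knot; it vanishes at the first and last knot.
g2 <- g(tt, deriv = 2)
c(first = g2[1], last = g2[n])

# Continuity of the second derivative across an interior knot:
# the values just left and just right of tt[5] differ only by O(eps).
eps <- 1e-6
g(tt[5] + eps, deriv = 2) - g(tt[5] - eps, deriv = 2)
```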


Natural Cubic Splines

Value-second derivative representation

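  • We can specify an NCS by giving its value and second derivative at each knot ti.

  • Define

    $g_i = g(t_i)$ and $\gamma_i = g''(t_i)$ for $i = 1, \dots, n$, with $\gamma_1 = \gamma_n = 0$, and the vectors $\mathbf{g} = (g_1, \dots, g_n)^T$ and $\boldsymbol{\gamma} = (\gamma_2, \dots, \gamma_{n-1})^T$,

    which specify the curve g completely.

  • However, not all possible vectors $\mathbf{g}$ and $\boldsymbol{\gamma}$ represent a natural spline!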


Natural Cubic Splines

Value-second derivative representation

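  • Theorem 2.1

    The vectors $\mathbf{g}$ and $\boldsymbol{\gamma}$ specify a natural spline g if and only if

    $Q^T \mathbf{g} = R \boldsymbol{\gamma}$

    Then the roughness penalty will satisfy

    $\int_a^b \{g''(t)\}^2\,dt = \boldsymbol{\gamma}^T R \boldsymbol{\gamma} = \mathbf{g}^T K \mathbf{g}$, with $K = Q R^{-1} Q^T$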


Natural Cubic Splines

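Value-second derivative representation

  • Band matrices Q and R, as defined in Green and Silverman (1994), with $h_i = t_{i+1} - t_i$ for $i = 1, \dots, n-1$:

  • Q is the $n \times (n-2)$ matrix with entries $q_{ij}$, $i = 1, \dots, n$, $j = 2, \dots, n-1$, given by $q_{j-1,j} = h_{j-1}^{-1}$, $q_{jj} = -h_{j-1}^{-1} - h_j^{-1}$, $q_{j+1,j} = h_j^{-1}$, and $q_{ij} = 0$ for $|i-j| \ge 2$.

  • R is the symmetric $(n-2) \times (n-2)$ matrix with entries $r_{ij}$, $i, j = 2, \dots, n-1$, given by $r_{ii} = (h_{i-1} + h_i)/3$, $r_{i,i+1} = r_{i+1,i} = h_i/6$, and $r_{ij} = 0$ for $|i-j| \ge 2$.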


Natural Cubic Splines

Value-second derivative representation

  • R is strictly diagonally dominant, i.e.

    $|r_{ii}| > \sum_{j \ne i} |r_{ij}|$ for each i

  • Hence R is positive definite, so we can define

    $K = Q R^{-1} Q^T$
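
The value-second derivative representation can be checked numerically. The sketch below assumes the band matrices of Green and Silverman (1994): with h_i = t_{i+1} - t_i, column j of Q holds 1/h_{j-1}, -1/h_{j-1} - 1/h_j and 1/h_j in rows j-1, j and j+1, and R is tridiagonal with diagonal (h_{i-1} + h_i)/3 and off-diagonal h_i/6. The helper name `band_matrices` and the y18 data are choices made for this illustration.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)
n <- length(tt)

# Build Q (n x (n-2)) and R ((n-2) x (n-2)) from the knots.
band_matrices <- function(knots) {
  n <- length(knots)
  h <- diff(knots)
  Q <- matrix(0, n, n - 2)
  R <- matrix(0, n - 2, n - 2)
  for (k in seq_len(n - 2)) {            # column k corresponds to knot j = k + 1
    Q[k, k]     <- 1 / h[k]
    Q[k + 1, k] <- -1 / h[k] - 1 / h[k + 1]
    Q[k + 2, k] <- 1 / h[k + 1]
    R[k, k]     <- (h[k] + h[k + 1]) / 3
    if (k < n - 2) R[k, k + 1] <- R[k + 1, k] <- h[k + 1] / 6
  }
  list(Q = Q, R = R)
}
m <- band_matrices(tt)

# A natural cubic spline and its values / second derivatives at the knots.
g_fun <- splinefun(tt, y18, method = "natural")
g_vec <- g_fun(tt)
gamma <- g_fun(tt, deriv = 2)[2:(n - 1)]

# Theorem 2.1: Q^T g = R gamma for a natural cubic spline.
lhs <- drop(crossprod(m$Q, g_vec))
rhs <- drop(m$R %*% gamma)
max(abs(lhs - rhs))

# Roughness penalty in both forms: gamma^T R gamma and g^T K g.
K <- m$Q %*% solve(m$R, t(m$Q))
penalty_gamma <- drop(t(gamma) %*% m$R %*% gamma)
penalty_K <- drop(t(g_vec) %*% K %*% g_vec)
c(penalty_gamma, penalty_K)
```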


Interpolating Splines

  • To find a smooth curve that interpolates the points (ti,zi), i.e. g(ti)=zi for all i.

  • Theorem 2.2

    Suppose $n \ge 2$ and t1<…<tn. Given any values z1,…,zn, there is a unique natural cubic spline g with knots at the points ti satisfying g(ti)=zi for i=1,…,n.


Interpolating Splines

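  • The natural cubic spline interpolant is the unique minimizer of $\int_a^b \{g''(t)\}^2\,dt$ over all functions in S2[a,b] that interpolate the data.

  • Theorem 2.3

    Suppose $n \ge 2$ and g is the natural cubic spline interpolant to the values z1,…,zn at the points t1<…<tn. Let $\tilde{g}$ be any function in S2[a,b] with $\tilde{g}(t_i) = z_i$ for all i;

    then $\int \tilde{g}''^2 \ge \int g''^2$, with equality only if $\tilde{g}$ and g are identical.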
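
The minimum-roughness property of the natural interpolant can be illustrated by comparing it with another interpolating cubic spline. In this sketch the competitor is the `method = "fmm"` spline from `splinefun`, which also passes through every point but uses different end conditions. The helper `roughness()` and the y18 data are assumptions for illustration.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)

# Integrated squared second derivative of a piecewise-cubic curve f,
# computed interval by interval between consecutive knots.
roughness <- function(f, knots) {
  pieces <- vapply(seq_len(length(knots) - 1), function(i) {
    integrate(function(x) f(x, deriv = 2)^2, knots[i], knots[i + 1])$value
  }, numeric(1))
  sum(pieces)
}

g_nat <- splinefun(tt, y18, method = "natural")  # natural cubic spline interpolant
g_fmm <- splinefun(tt, y18, method = "fmm")      # another interpolating cubic spline

# Both curves interpolate the data, but the natural spline is less rough.
c(natural = roughness(g_nat, tt), fmm = roughness(g_fmm, tt))
```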


Smoothing Splines

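  • Penalized sum of squares

    $S(g) = \sum_{i=1}^{n} \{Y_i - g(t_i)\}^2 + \alpha \int_a^b \{g''(x)\}^2\,dx$

    • g: any twice-differentiable function on [a,b]

    • $\alpha$: smoothing parameter (‘rate of exchange’ between residual error and local variation)

  • Penalized least squares estimator

    $\hat{g} = \arg\min_g S(g)$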


Smoothing Splines

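1. The curve estimator $\hat{g}$ is necessarily a natural cubic spline with knots at ti, for i=1,…,n.

Proof: let g be any curve in S2[a,b] and suppose $\bar{g}$ is the NCS interpolating the values g(ti). Then $\bar{g}$ has the same residual sum of squares as g, and by Theorem 2.3 $\int \bar{g}''^2 \le \int g''^2$ with strict inequality unless g is itself an NCS, so for $\alpha > 0$ any curve that is not an NCS can be improved upon.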


Smoothing Splines

2. Existence and uniqueness

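Let $\mathbf{Y} = (Y_1, \dots, Y_n)^T$; then

$\sum_{i=1}^{n} \{Y_i - g(t_i)\}^2 = (\mathbf{Y} - \mathbf{g})^T (\mathbf{Y} - \mathbf{g})$

since $\mathbf{g}$ is precisely the vector of values $g(t_i)$. Expressing the roughness penalty as $\int g''^2 = \mathbf{g}^T K \mathbf{g}$,

$S(g) = (\mathbf{Y} - \mathbf{g})^T (\mathbf{Y} - \mathbf{g}) + \alpha \mathbf{g}^T K \mathbf{g} = \mathbf{g}^T (I + \alpha K) \mathbf{g} - 2 \mathbf{Y}^T \mathbf{g} + \mathbf{Y}^T \mathbf{Y}$

which has a unique minimum at $\mathbf{g} = (I + \alpha K)^{-1} \mathbf{Y}$, because $I + \alpha K$ is strictly positive-definite.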


Smoothing Splines

2. Theorem 2.4

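Let $\hat{g}$ be the natural cubic spline with knots at ti for which $\mathbf{g} = (I + \alpha K)^{-1} \mathbf{Y}$. Then for any g in S2[a,b], $S(\hat{g}) \le S(g)$, with equality only if g and $\hat{g}$ are identical.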
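
The closed form for the knot values, g = (I + alpha K)^{-1} Y with K = Q R^{-1} Q^T, can be computed directly for small problems. The sketch below assumes the band matrices Q and R of Green and Silverman (1994); the helper names, the value of alpha and the y18 data are choices made for illustration, and the dense `solve` calls ignore the band structure.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)
n <- length(tt)
alpha <- 2  # hypothetical value of the smoothing parameter

# Band matrices Q and R built from the knot spacings h.
band_matrices <- function(knots) {
  n <- length(knots)
  h <- diff(knots)
  Q <- matrix(0, n, n - 2)
  R <- matrix(0, n - 2, n - 2)
  for (k in seq_len(n - 2)) {
    Q[k, k]     <- 1 / h[k]
    Q[k + 1, k] <- -1 / h[k] - 1 / h[k + 1]
    Q[k + 2, k] <- 1 / h[k + 1]
    R[k, k]     <- (h[k] + h[k + 1]) / 3
    if (k < n - 2) R[k, k + 1] <- R[k + 1, k] <- h[k + 1] / 6
  }
  list(Q = Q, R = R)
}
m <- band_matrices(tt)
K <- m$Q %*% solve(m$R, t(m$Q))

# Knot values of the smoothing spline.
g_hat <- drop(solve(diag(n) + alpha * K, y18))

# Penalized sum of squares as a function of the vector of knot values.
S_vec <- function(g) sum((y18 - g)^2) + alpha * drop(t(g) %*% K %*% g)
S_vec(g_hat)
```

Any other vector of knot values gives a larger value of `S_vec`, which is the finite-dimensional form of Theorem 2.4.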


Smoothing Splines

3. The Reinsch algorithm

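Combining $\mathbf{g} = \mathbf{Y} - \alpha Q \boldsymbol{\gamma}$ with $Q^T \mathbf{g} = R \boldsymbol{\gamma}$ gives $(R + \alpha Q^T Q) \boldsymbol{\gamma} = Q^T \mathbf{Y}$. The matrix $R + \alpha Q^T Q$ has bandwidth 5 and is symmetric and strictly positive-definite, therefore it has a Cholesky decomposition

$R + \alpha Q^T Q = L D L^T$

where D is a strictly positive diagonal matrix and L is a lower triangular band matrix with unit diagonal.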


Smoothing Splines

3. The Reinsch algorithm for spline smoothing

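Step 1: Evaluate the vector $Q^T \mathbf{Y}$.

Step 2: Find the non-zero diagonals of $R + \alpha Q^T Q$,

and hence the Cholesky decomposition factors L and D.

Step 3: Solve $L D L^T \boldsymbol{\gamma} = Q^T \mathbf{Y}$

for $\boldsymbol{\gamma}$ by forward and back substitution.

Step 4: Find g by $\mathbf{g} = \mathbf{Y} - \alpha Q \boldsymbol{\gamma}$.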
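
The four steps can be sketched in base R. This is a dense illustration under stated assumptions, not a banded implementation: `chol()` returns an upper triangular U with U^T U = R + alpha Q^T Q, which stands in for the banded L D L^T factorization, and the band matrices follow Green and Silverman (1994). The helper name, the value of alpha and the y18 data are illustrative choices.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)
n <- length(tt)
alpha <- 2  # hypothetical value of the smoothing parameter

band_matrices <- function(knots) {
  n <- length(knots)
  h <- diff(knots)
  Q <- matrix(0, n, n - 2)
  R <- matrix(0, n - 2, n - 2)
  for (k in seq_len(n - 2)) {
    Q[k, k]     <- 1 / h[k]
    Q[k + 1, k] <- -1 / h[k] - 1 / h[k + 1]
    Q[k + 2, k] <- 1 / h[k + 1]
    R[k, k]     <- (h[k] + h[k + 1]) / 3
    if (k < n - 2) R[k, k + 1] <- R[k + 1, k] <- h[k + 1] / 6
  }
  list(Q = Q, R = R)
}
m <- band_matrices(tt)
Q <- m$Q
R <- m$R

# Step 1: evaluate Q^T Y.
QtY <- drop(crossprod(Q, y18))

# Step 2: form R + alpha Q^T Q (five non-zero diagonals) and factorize it.
M <- R + alpha * crossprod(Q)
U <- chol(M)                      # M = U^T U, dense stand-in for L D L^T

# Step 3: solve M gamma = Q^T Y by forward and back substitution.
gamma <- drop(backsolve(U, forwardsolve(t(U), QtY)))

# Step 4: recover the fitted values g = Y - alpha Q gamma.
g_hat <- y18 - alpha * drop(Q %*% gamma)
round(g_hat, 3)
```

For large n a banded solver would keep the cost linear in n, which is the point of the algorithm. The dense version above is only meant to show the order of operations.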


Smoothing Splines

4. Some concluding remarks

  • The minimizing curve essentially does not depend on a and b, as long as all the data points lie between a and b.

  • If n=2, for any $\alpha$, setting $\hat{g}$ to be the straight line through the two points (t1,Y1) and (t2,Y2) will reduce S(g) to zero.

  • If n=1, the minimizer is no longer unique, since any straight line through (t1,Y1) will yield a zero value of S(g).


Choosing the Smoothing Parameter

  • Two different philosophical approaches

    • Subjective choice

    • Automatic method – chosen by data

      • Cross-validation

      • Generalized cross-validation


Choosing the Smoothing Parameter

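  • Cross-validation

    $CV(\alpha) = n^{-1} \sum_{i=1}^{n} \{Y_i - \hat{g}^{(-i)}(t_i; \alpha)\}^2 = n^{-1} \sum_{i=1}^{n} \left\{ \frac{Y_i - \hat{g}(t_i)}{1 - A_{ii}(\alpha)} \right\}^2$

    where $\hat{g}^{(-i)}$ is the estimate computed with the i-th observation left out and $A(\alpha) = (I + \alpha K)^{-1}$ is the hat matrix

  • Generalized cross-validation

    $GCV(\alpha) = \frac{n^{-1} \sum_{i=1}^{n} \{Y_i - \hat{g}(t_i)\}^2}{\{1 - n^{-1} \operatorname{tr} A(\alpha)\}^2}$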
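
Both scores can be computed from the hat matrix A(alpha) = (I + alpha K)^{-1}, using the leave-one-out shortcut that divides each residual by 1 - A_ii. The sketch below is a dense illustration under the same assumptions as Green and Silverman's band matrices Q and R; the helper name, the grid of alpha values and the y18 data are arbitrary choices.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
tt <- seq_along(y18)
n <- length(tt)

band_matrices <- function(knots) {
  n <- length(knots)
  h <- diff(knots)
  Q <- matrix(0, n, n - 2)
  R <- matrix(0, n - 2, n - 2)
  for (k in seq_len(n - 2)) {
    Q[k, k]     <- 1 / h[k]
    Q[k + 1, k] <- -1 / h[k] - 1 / h[k + 1]
    Q[k + 2, k] <- 1 / h[k + 1]
    R[k, k]     <- (h[k] + h[k + 1]) / 3
    if (k < n - 2) R[k, k + 1] <- R[k + 1, k] <- h[k + 1] / 6
  }
  list(Q = Q, R = R)
}
m <- band_matrices(tt)
K <- m$Q %*% solve(m$R, t(m$Q))

# Candidate smoothing parameters on a log scale.
alphas <- 10^seq(-3, 3, length.out = 25)

# CV and GCV scores for each alpha, computed from the hat matrix.
scores <- t(sapply(alphas, function(alpha) {
  A <- solve(diag(n) + alpha * K)      # hat matrix A(alpha)
  res <- y18 - drop(A %*% y18)         # ordinary residuals
  c(cv  = mean((res / (1 - diag(A)))^2),
    gcv = mean(res^2) / (1 - mean(diag(A)))^2)
}))

# Values of alpha that minimize each score over the grid.
alpha_cv  <- alphas[which.min(scores[, "cv"])]
alpha_gcv <- alphas[which.min(scores[, "gcv"])]
c(cv = alpha_cv, gcv = alpha_gcv)
```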


Available Software

smooth.spline in R

  • Description:

    Fits a cubic smoothing spline to the supplied data.

  • Usage:

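    # speed and dist are columns of the built-in cars data set
    plot(cars$speed, cars$dist)

    # smoothing parameter chosen automatically (GCV by default)
    cars.spl <- smooth.spline(cars$speed, cars$dist)

    # smoothing parameter chosen to give about 10 equivalent degrees of freedom
    cars.spl2 <- smooth.spline(cars$speed, cars$dist, df=10)

    lines(cars.spl, col = "blue")

    lines(cars.spl2, lty=2, col = "red")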


Available Software

Example 1

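# smooth.spline() has been part of the stats package since R 1.9.0, so library(modreg) is no longer needed

y18 <- c(1:3,5,4,7:3,2*(2:5),rep(10,4))

xx <- seq(1, length(y18), length.out = 201)  # fine grid for drawing the fitted curves

(s2 <- smooth.spline(y18)) # GCV

(s02 <- smooth.spline(y18, spar = 0.2))

plot(y18, main=deparse(s2$call), col.main=2)

lines(s2, col = "blue")     # fitted values at the data points

lines(s02, col = "orange")

lines(predict(s2, xx), col = 2)   # fitted curves evaluated on the fine grid

lines(predict(s02, xx), col = 3)

mtext(deparse(s02$call), col = 3)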



Available Software

Example 2

data(cars) ## N=50, n (# of distinct x) =19

attach(cars)

plot(speed, dist, main = "data(cars) & smoothing splines")

cars.spl <- smooth.spline(speed, dist)

cars.spl2 <- smooth.spline(speed, dist, df=10)

lines(cars.spl, col = "blue")

lines(cars.spl2, lty=2, col = "red")

lines(smooth.spline(cars, spar=0.1))

## spar: rescaled smoothing parameter, typically in (0,1]; the penalty weight (alpha) is a monotone function of spar

legend(5,120,c(paste("default [GCV] => df =",round(cars.spl$df,1)), "s( * , df = 10)"), col = c("blue","red"), lty = 1:2, bg='bisque')

detach()



Extensions of the Roughness Penalty Approach

  • Semiparametric modeling: a simple application to multiple regression

  • Generalized linear models (GLM)

  • To allow all the explanatory variables to be nonlinear

  • Additive model approach


Reference

  • P.J. Green and B.W. Silverman (1994) Nonparametric Regression and Generalized Linear Models. London: Chapman & Hall

