Presented by Wenli Li, Shuhong Li, and Vivian Tam. Venables and Ripley, Section 8.7. November 22, 2004


One-Dimensional Curve-Fitting

Presented by Wenli Li, Shuhong Li, and Vivian Tam

Venables and Ripley, Section 8.7

November 22, 2004

INTRODUCTION

• Curve-fitting:
  • Sample data: {(x0, y0), (x1, y1), ..., (xn, yn)}
  • Interpolation & extrapolation
• One-dimensional curve-fitting (Section 8.7):
  • The functional form is not pre-specified
  • Splines (ns, smooth.spline)
  • Local regression (loess, supsmu, kernel smoother, and locpoly)
• Data set:
  • One independent and one dependent variable

Examples: GAGurine & Mercury level

Dataset: GAGurine (MASS)

Variables:

• Age: independent
• GAG: dependent

Sample size: 314

Classical way:

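library(MASS)
attach(GAGurine)
plot(Age, GAG, main="Degree 6 polynomial")
# Degree-8 fit; the sequential ANOVA shows which powers of Age contribute
GAG.lm <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6) + I(Age^7) + I(Age^8))
anova(GAG.lm)
# The degree-7 and degree-8 terms are not significant, so refit with degree 6
GAG.lm2 <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6))
xx <- seq(0, 17, len=200)
lines(xx, predict(GAG.lm2, data.frame(Age=xx)), col="red")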

GAGurine (MASS), selected observations:

Age:  0.00  0.00 ...  0.46  0.47 ... 17.30  7.67
GAG:  23.0  23.8 ...  18.6  26.4 ...   1.9   9.3

Output of anova(GAG.lm):

Terms added sequentially (first to last)

           Df  Sum of Sq  Mean Sq  F-value    Pr(F)
Age         1      12590    12590   593.58   0.0000
I(Age^2)    1       3751     3751   176.84   0.0000
I(Age^3)    1       1492     1492    70.32   0.0000
I(Age^4)    1        449      449    21.18  0.00001
I(Age^5)    1        174      174     8.22  0.00444
I(Age^6)    1        286      286    13.48  0.00028
I(Age^7)    1         57       57     2.70  0.10151
I(Age^8)    1         45       45     2.12  0.14667
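The sequential table tests one term at a time. A complementary check, sketched here as an illustration rather than as part of the original slides, is a single partial F-test of the degree-6 fit against the degree-8 fit. The sketch uses orthogonal polynomials from poly(), which span the same model space as the raw powers of Age but are numerically better behaved; the names fit6, fit8, and cmp are made up for this example.

```r
library(MASS)  # provides the GAGurine data frame

# Degree-6 and degree-8 polynomial fits on an orthogonal polynomial basis
fit6 <- lm(GAG ~ poly(Age, 6), data = GAGurine)
fit8 <- lm(GAG ~ poly(Age, 8), data = GAGurine)

# Partial F-test: do the degree-7 and degree-8 terms jointly improve the fit?
cmp <- anova(fit6, fit8)
print(cmp)
```

A large p-value in the second row would support dropping the two highest-order terms, in line with the term-by-term table.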
SPLINES
• Algorithm:
• Function: ns( )
  • Generates a basis matrix for natural cubic splines
  • Usage: ns(x, df, knots, intercept=F, Boundary.knots, derivs)
• Arguments:
  • Required: x, the predictor variable
  • Optional:
    • df: degrees of freedom. One can supply df rather than knots; ns then chooses df-1-intercept knots at suitably chosen quantiles of x. This argument is ignored if knots is supplied.
    • knots: breakpoints that define the spline
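To make the df rule concrete, the following sketch builds a basis with df = 5 and inspects the knots that ns chose. The predictor values are simulated for illustration only, and B is a made-up name.

```r
library(splines)

set.seed(1)                    # simulated predictor, for illustration only
x <- sort(runif(100, 0, 17))

B <- ns(x, df = 5)             # natural cubic spline basis, no intercept column
dim(B)                         # one row per observation, one column per df
attr(B, "knots")               # df - 1 - intercept = 4 interior knots

# The interior knots sit at evenly spaced quantiles of x
quantile(x, c(0.2, 0.4, 0.6, 0.8))
```

Passing this basis to lm, as in lm(y ~ ns(x, df = 5)), fits the spline by ordinary least squares.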
SPLINES

Function: smooth.spline( )

• Fits a cubic B-spline smooth to the input data.
• Usage: smooth.spline(x, y, w = <<see below>>, df = <<see below>>, spar = 0, cv = F, all.knots = F, df.offset = 0, penalty = 1)
• Arguments:
  • Required: x, values of the predictor variable. There should be at least ten distinct x values.
  • Optional:
    • y: response variable, of the same length as x
    • df: the desired degrees of freedom, df = trace(S), where S is the smoother matrix; supplied instead of a smoothing parameter
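As a rough illustration of the df argument, the sketch below requests 10 degrees of freedom and then reads back the value the fit actually achieved. It uses R's smooth.spline, whose argument list differs somewhat from the usage line above; fit and fit_gcv are names chosen for this example.

```r
library(MASS)  # provides the GAGurine data frame

# Ask for a smoother whose matrix S has trace(S) close to 10
fit <- with(GAGurine, smooth.spline(Age, GAG, df = 10))
fit$df        # achieved equivalent degrees of freedom
fit$lambda    # the smoothing parameter that corresponds to it

# For comparison: omit df and let generalized cross-validation choose
fit_gcv <- with(GAGurine, smooth.spline(Age, GAG))
fit_gcv$df
```

Specifying df is often easier to reason about than the raw smoothing parameter, since it is on the same scale as the df argument of ns.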
SPLINES

library(splines)
# Natural splines with 5, 10, and 20 degrees of freedom
plot(Age, GAG, type="n", main="Spline")
lines(Age, fitted(lm(GAG~ns(Age, df=5))), col="red")
lines(Age, fitted(lm(GAG~ns(Age, df=10))), lty=3, col="green")
lines(Age, fitted(lm(GAG~ns(Age, df=20))), lty=4, col="blue")
# Smoothing spline
lines(smooth.spline(Age, GAG), lwd=3, col="black")
legend(12, 50, c("red: df=5", "green:df=10", "blue:df=20", "Smoothing"), lty=c(1, 3, 4, 1), lwd=c(1, 1, 1, 3), bty="n")

KERNEL SMOOTH

Function: ksmooth( )

• Estimates a probability density or performs scatterplot smoothing using kernel estimates.
• Usage: ksmooth(x, y=NULL, kernel="box", bandwidth=0.5, range.x=range(x), n.points=length(x), x.points=<<see below>>)
• Arguments:
  • Required: x, vector of x data
  • Optional:
    • y: vector of y data. This must be the same length as x, and missing values are not accepted.
    • kernel: "box", "triangle", "parzen", or "normal"
    • bandwidth: larger values give smoother estimates; smaller values give less smooth estimates
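A kernel smoother of this kind is a weighted average of the y values, with weights given by the kernel centred at the point of interest. The sketch below writes that average out by hand for the normal kernel and compares it with R's ksmooth. It assumes R's documented convention that kernels are scaled so their quartiles fall at +/- 0.25 * bandwidth; nw_normal is a hypothetical helper, and the data are simulated.

```r
# Kernel-weighted average of y at each point in x0 (normal kernel)
nw_normal <- function(x, y, x0, bandwidth) {
  # Scale the normal kernel so its quartiles are at +/- 0.25 * bandwidth
  s <- 0.25 * bandwidth / qnorm(0.75)
  sapply(x0, function(p) {
    w <- dnorm(x - p, sd = s)   # weights fall off with distance from p
    sum(w * y) / sum(w)
  })
}

set.seed(1)                     # simulated data, for illustration only
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)
grid <- seq(1, 9, by = 1)

by_hand  <- nw_normal(x, y, grid, bandwidth = 2)
built_in <- ksmooth(x, y, "normal", bandwidth = 2, x.points = grid)$y

# The two should agree closely, up to any tail truncation in ksmooth
max(abs(by_hand - built_in))
```

R's ksmooth accepts only the "box" and "normal" kernels, so the other kernel names listed above apply to the S-PLUS version.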
Kernel Smoother

# Kernel smoother
plot(Age, GAG, type="n", main="ksmooth")
lines(ksmooth(Age, GAG, "normal", bandwidth=1), col="red")
lines(ksmooth(Age, GAG, "normal", bandwidth=5))
legend(12, 50, c("red: bandwidth=1", "black: bandwidth=5"), bty="n")

LOESS
• Fits a curve determined by one or more numerical predictors, using local polynomial regression
• Gets a predicted value at each point by fitting a weighted linear regression, where the weights decrease with distance from the point of interest
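The weighted-regression step can be sketched for a single point as follows. This is an illustration of the idea only, not the loess implementation: it uses tricube weights on the nearest span * n observations and omits the robustness iterations and interpolation that loess adds. The function local_linear is hypothetical and the data are simulated.

```r
# Local linear fit at one point x0, with tricube weights on the
# nearest span * n observations
local_linear <- function(x, y, x0, span = 2/3) {
  d <- abs(x - x0)                          # distance of each point from x0
  q <- max(2, ceiling(span * length(x)))    # number of points in the window
  h <- sort(d)[q]                           # window half-width
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)  # weights decrease with distance
  fit <- lm(y ~ x, weights = w)             # weighted least-squares line
  unname(predict(fit, newdata = data.frame(x = x0)))
}

set.seed(1)                                 # simulated data, for illustration only
x <- sort(runif(100, 0, 10))
y <- sin(x) + rnorm(100, sd = 0.2)

# Repeating the local fit over a grid traces out the smooth curve
grid <- seq(1, 9, by = 0.5)
curve_hat <- sapply(grid, function(p) local_linear(x, y, p, span = 0.3))
```

A smaller span narrows the window, so the curve follows the data more closely at the cost of more wiggle.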
LOESS Parameters
• f: controls the window size
• weights: decrease with distance from the point x being fitted
• span: the parameter alpha, which controls the degree of smoothing
• degree: the degree of the local polynomials, at most 2

LOESS

Code:
library(MASS)
attach(GAGurine)
plot(Age, GAG, type="n", main="loess")
# degree is limited to 0, 1, or 2
lines(loess.smooth(Age, GAG, span=2/3, degree=1), col="red", lwd=1)
lines(loess.smooth(Age, GAG, span=2/3, degree=2), col="blue", lwd=2)
lines(loess.smooth(Age, GAG, span=1/3, degree=2), col="green", lwd=1)
legend(10, 45, c("Red: span=2/3,deg=1", "Blue: span=2/3,deg=2", "green: span=1/3,deg=2"), bty="n")

SUPSMU
• Serves a purpose similar to that of the function loess
• A running-lines smoother that chooses among three spans; the best of the three smoothers is chosen by cross-validation for each prediction
• If there are substantial serial correlations between observations close in x-value, a pre-specified fixed-span smoother should be used. Reasonable span values are 0.2 to 0.4
SUPSMU Parameters:
• span: the fraction of the observations in the span of the running-lines smoother, or "cv" to choose this by leave-one-out cross-validation
• bass: controls the smoothness of the fitted curve. Values up to 10 indicate increasing smoothness
• periodic: if TRUE, the smoother assumes x is a periodic variable with values in the range [0.0, 1.0] and period 1.0. An error occurs if x has values outside this range
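The parameters above can be tried side by side. The sketch below, on simulated data with made-up object names, fits the default cross-validated smoother, a fixed-span smoother, and a cross-validated smoother with bass = 10.

```r
set.seed(1)                              # simulated data, for illustration only
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

fit_cv    <- supsmu(x, y)                # span = "cv" is the default
fit_fixed <- supsmu(x, y, span = 0.3)    # fixed span: 30% of the observations
fit_bass  <- supsmu(x, y, bass = 10)     # cross-validated, pushed toward smoothness

# Each result is a list with components x and y, ready for lines()
str(fit_cv)
```

Each fit can be drawn on an existing scatter plot with lines(fit_cv), lines(fit_fixed, lty = 2), and so on.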

References:

Friedman, J. H. (1984) A variable span scatter-plot smoother. Laboratory for Computational Statistics, Stanford University Technical Report No. 5

Code:
plot(Age, GAG, type="n", main="supsmu")
lines(supsmu(Age, GAG))
lines(supsmu(Age, GAG, bass=3), lty=3)
lines(supsmu(Age, GAG, bass=10), lty=4)
legend(12, 50, c("default", "bass=3", "bass=10"), lty=c(1, 3, 4), bty="n")

LOCPOLY
• Estimates a probability density function, regression function, or their derivatives using local polynomials
• A fast binned implementation over an equally spaced grid is used
• In its simplest form: locpoly(x, y, degree=#, bandwidth=#)

Parameters:

• locpoly(x, y, drv=0, degree=<<see below>>, kernel="normal", bandwidth, gridsize=401, bwdisc=25, range.x=<<see below>>, binned=FALSE, truncate=TRUE)

• drv: order of derivative to be estimated
• degree: degree of local polynomial used
• bandwidth: the kernel bandwidth smoothing parameter
• range.x: vector containing the minimum and maximum values of 'x' at which to compute the estimate
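The drv argument is easiest to see on data where the true derivative is known. The sketch below simulates a noisy sine curve, then estimates both the curve and its first derivative. The bandwidth of 0.5 is an arbitrary choice for illustration, and f_hat and df_hat are made-up names.

```r
library(KernSmooth)

set.seed(1)                          # simulated data, for illustration only
x <- runif(500, 0, 2 * pi)
y <- sin(x) + rnorm(500, sd = 0.2)

# drv = 0 (the default): estimate the regression function, local linear fit
f_hat <- locpoly(x, y, degree = 1, bandwidth = 0.5)

# drv = 1: estimate the first derivative; degree defaults to drv + 1
df_hat <- locpoly(x, y, drv = 1, bandwidth = 0.5)

length(f_hat$x)                      # one estimate per grid point (gridsize)
```

Plotting df_hat against cos(x) gives a quick visual check of how well the derivative is recovered away from the boundaries.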
LOCPOLY

Code:
library(MASS)
attach(GAGurine)
library(KernSmooth)
plot(Age, GAG, type="n", main="(Age, GAG) Locpoly")
# Select the bandwidth with the direct plug-in rule
(h <- dpill(Age, GAG))
lines(locpoly(Age, GAG, degree=0, bandwidth=h), col="red", lty=1, lwd=2)
lines(locpoly(Age, GAG, degree=1, bandwidth=h), col="blue", lty=3, lwd=3)
lines(locpoly(Age, GAG, degree=2, bandwidth=h), col="green", lty=4, lwd=3)
# Name the data frame: a bare detach() would remove KernSmooth, the most recently attached item
detach(GAGurine)

Example: Mercury Level
• Model: Mercury and Alkalinity
• From 1990 to 1991, largemouth bass were studied in 53 different Florida lakes to examine the mercury contamination level and the factors that influenced the level of mercury absorption in the fish
• One factor studied was the alkalinity level of the water
• Mercury level is plotted against alkalinity level to study the relationship
Mercury Level Graphs Coding:
# The vectors Alkalinity and Mercury are assumed to be available, e.g. from an attached data frame

#1 loess
plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Loess")
lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=1), col="red", lwd=2)
lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=2), col="blue", lwd=2)
legend(65, 1.0, c("deg=1 Red", "deg=2 Blue"), bty="n")

#2 supsmu
plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Supsmu")
lines(supsmu(Alkalinity, Mercury, bass=1), lty=1, col="red", lwd=2)
lines(supsmu(Alkalinity, Mercury, bass=10), lty=3, col="blue", lwd=3)
legend(58, 1.0, c("bass=1 red", "bass=10 blue"), lty=c(1, 3), bty="n", lwd=2)

#3 ksmooth
plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Ksmooth")
lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=1), col="green", lwd=2)
lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=5), col="red", lty=2, lwd=2)
legend(75, 1.0, c("bw=1", "bw=5"), lty=c(1, 2), bty="n")

#4 locpoly
library(KernSmooth)
plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Locpoly")
# Select bandwidth
(h <- dpill(Alkalinity, Mercury))
lines(locpoly(Alkalinity, Mercury, degree=0, bandwidth=h), lty=1, col="green", lwd=2)
lines(locpoly(Alkalinity, Mercury, degree=1, bandwidth=h), lty=2, col="red", lwd=2)
lines(locpoly(Alkalinity, Mercury, degree=2, bandwidth=h), lty=3, col="purple", lwd=3)
SUMMARY
• Use one-dimensional curve-fitting when:
  • The scatter plot does not suggest a linear model
  • Data transformation does not give a satisfactory linear model
  • The fit needs to accommodate future data
  • The fit needs to include previous outliers

• Several methods discussed including:

1. SPLINES

2. LOESS

3. SUPSMU

4. KSMOOTH

5. LOCPOLY

• Parameters such as bandwidth, df, derivative order, smoothness, and degree control how closely the fitted curve follows the data.