
What Could We Do Better? Alternative Statistical Methods



Jim Crooks and Xingye Qiao

- We take measurements of the beam displacement, y, at times t1,…,tn
- What we actually observe is ỹ, a noisy version of y:
- ỹ(tj) = y(tj) + εj, or equivalently ỹj = yj + εj
- εj is the error resulting from imperfect measurement at time tj

- Under the spring model it is assumed that the displacement over time is governed by the damped-spring equation y″(t) + C y′(t) + K y(t) = 0
So we could write: ỹ(tj) = y(tj; C, K) + εj

- Remember the assumptions:
- Data at different time points are independent
- Residuals are normally distributed
- Residuals’ variance is constant over time

- With these assumptions the model can be written: ỹ(tj) = y(tj; C, K) + εj, with εj ~ N(0, s²) independent across time points

- We may have repeated measurements of the same beam
- Notation: let tj be the jth time point, and let i index the replicates at tj
- Denote the ith repeated measurement of the beam displacement at tj by ỹi(tj) = ỹij
- If we believe C and K are the same across replicates then we may write the model as: ỹij = y(tj; C, K) + εij, with εij ~ N(0, s²) independent over time and replicate

- Because of our independence assumption, the likelihood of the model (which we think of as a function of the parameters C and K, not the data) is the product of the individual density functions evaluated at the data ỹ: L(C, K) = ∏i,j N(ỹij; y(tj; C, K), s²)
where N(x; m, s²) denotes the normal density with mean m and variance s² evaluated at x.
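Because the errors are independent Gaussians, the log-likelihood is just a sum of normal log-densities. A minimal Python sketch (the workshop's own code is MATLAB; `model` here is a placeholder for the spring-model solution y(t; C, K)):

```python
import numpy as np

def log_likelihood(C, K, t, y_obs, s2, model):
    # Independent Gaussian errors: the likelihood factorizes over the
    # observations, so the log-likelihood is the sum of the normal
    # log-densities log N(y_obs_j; y(t_j; C, K), s2).
    mu = model(t, C, K)
    return np.sum(-0.5 * np.log(2.0 * np.pi * s2)
                  - (y_obs - mu) ** 2 / (2.0 * s2))
```

When the model reproduces the data exactly, each term reduces to −½ log(2π s²).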

- The Maximum Likelihood Estimates (MLEs) for C and K, denoted Ĉ and K̂, are the values of C and K that maximize the likelihood function
- With s² known, the MLEs are the same as what you’d get with a least-squares procedure (the former tends to justify the use of the latter)
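Since least squares and maximum likelihood coincide here, the fit can be sketched with a generic optimizer. Everything below — the underdamped solution form, the assumed initial conditions y(0) = 1 and y′(0) = 0, and the synthetic data — is illustrative, not the workshop's `inv_beam.m`:

```python
import numpy as np
from scipy.optimize import least_squares

def y_model(t, C, K, y0=1.0):
    # Underdamped solution of y'' + C y' + K y = 0 with the assumed
    # initial conditions y(0) = y0, y'(0) = 0 (requires K > C^2/4).
    w = np.sqrt(K - C ** 2 / 4.0)
    return y0 * np.exp(-C * t / 2.0) * (np.cos(w * t) + (C / (2.0 * w)) * np.sin(w * t))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 200)
C_true, K_true, s = 1.0, 20.0, 0.05
y_obs = y_model(t, C_true, K_true) + rng.normal(0.0, s, t.size)

# With s^2 known, minimizing the sum of squared residuals gives the
# maximum likelihood estimates of C and K.
fit = least_squares(lambda p: y_model(t, p[0], p[1]) - y_obs, x0=[0.8, 18.0])
C_hat, K_hat = fit.x
```

With 200 low-noise observations the optimizer recovers the true parameters closely.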

- We can assess the goodness of fit by using the spring model to “predict” the observed measurements
- The “predicted” (AKA “fitted”) values ŷ(t) are obtained by evaluating the spring model at the MLEs: ŷ(t) = y(t; Ĉ, K̂)
- We can compare the fitted values at the observed times, ŷij = ŷ(tj), to the observed values ỹij
- Run the MATLAB file ‘inv_beam.m’ by typing:
>> inv_beam

- We need to know the difference between the beam displacement data and our model’s predictions for the beam displacement: rij = ỹij − ŷij
- These are called the model residuals
- The residuals are our best guess for the values of εij. Hence, under our current model, we would expect them to look independent and normally distributed with constant variance

- Are the residuals normally distributed?
- Are they independent (i.e., is there correlation in time)?
- Is their variance constant?
- Run the MATLAB file plotresidual.m by typing:
>> plotresidual
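plotresidual.m answers these questions with plots; the same three checks can also be done numerically. A Python sketch on stand-in residuals (in practice `r` would be the residuals ỹij − ŷij from the fit):

```python
import numpy as np

rng = np.random.default_rng(3)
r = rng.normal(0.0, 0.1, 200)  # stand-in residuals for illustration

# 1. Normality: sample skewness and kurtosis should be near the
#    normal values of 0 and 3.
skew = np.mean((r - r.mean()) ** 3) / r.std() ** 3
kurt = np.mean((r - r.mean()) ** 4) / r.std() ** 4

# 2. Independence: the lag-1 autocorrelation should be near zero.
rho1 = np.corrcoef(r[:-1], r[1:])[0, 1]

# 3. Constant variance: the two halves of the series should have
#    similar spread.
var_ratio = r[:100].var() / r[100:].var()
```

Large deviations in any of these three numbers would contradict the model assumptions.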

- One criterion to use when judging a model is the fraction of the variability in the data it can explain: R² = 1 − SSE/SSTot
- SSTot = Σij (ỹij − ȳ)² is the total variability in the data ỹ
- SSE = Σij (ỹij − ŷij)² is the variability left over after fitting the model (SSE ≤ SSTot)
- So R² represents the fraction of variability in the data that is explained by the model

- In the example shown above we find that R² ≈ 0.52
- This means the spring model accounts for about 52% of the variability in our displacement measurements
- Is 52% a lot?

- Brief aside: note that we can also get an estimate of s² from SSE: ŝ² = SSE/(n − df)
where n is the number of data points and df is the number of ‘degrees of freedom’ (i.e., the number of unknown parameters)
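Both formulas are one-liners in code. A hedged Python sketch (the function name is mine):

```python
import numpy as np

def r2_and_s2(y_obs, y_fit, n_params):
    # R^2 = 1 - SSE/SSTot, and the residual variance estimate
    # SSE/(n - df), taking df to be the number of unknown parameters.
    sse = np.sum((y_obs - y_fit) ** 2)
    sstot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    n = y_obs.size
    return 1.0 - sse / sstot, sse / (n - n_params)
```

A perfect fit gives R² = 1; predicting the mean everywhere gives R² = 0.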

- Is R² = 52% any good? It depends.
- It can be useful (or even necessary) to set up a naïve “straw man” alternative against which to compare a physical model
- There are many possible alternatives and choosing between them is subjective
- To illustrate we will use a smoothing spline alternative

- A cubic spline is a function that is a piecewise cubic polynomial:
- Between each sequential pair of time points the function is a cubic polynomial
- At each time point the function is continuous and has continuous first and second derivatives
- The time points are called “knots”

- A smoothing spline is a type of cubic spline where:
- The knots are exactly the time points at which measurements are made, tj
- Given (yj, tj) for all j, the spline f is determined to be the cubic spline that minimizes Σj (yj − f(tj))² + α ∫ f″(t)² dt
- Here α is called the smoothing parameter

- What happens to the smoothness as α → ∞?

- The value of α parameterizes the relative importance of smoothness versus fit
- Larger values of α impose a bigger penalty on curvature and hence result in a smoother fit that may not follow the data closely
- Smaller values of α result in a wigglier spline that more closely follows the data
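The α trade-off can be demonstrated with a discrete analogue of the penalty; this finite-difference smoother is my own illustration, not the workshop's spline code:

```python
import numpy as np

def discrete_smoother(y, alpha):
    # Discrete analogue of the smoothing-spline objective: minimize
    # ||y - f||^2 + alpha * ||D2 f||^2, where D2 f is the vector of
    # second differences of f (a stand-in for the curvature penalty).
    n = y.size
    D2 = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + alpha * D2.T @ D2, y)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.1, t.size)

f_smooth = discrete_smoother(y, alpha=1e3)   # heavy penalty: nearly straight
f_wiggly = discrete_smoother(y, alpha=1e-4)  # light penalty: follows the data
```

The lightly penalized fit hugs the data; the heavily penalized fit flattens out.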

- As α → ∞ the spline approaches a straight line; as α → 0 it becomes an exact interpolator

- But how do we choose the value of a?
- Another choice without an objectively correct answer!
- One useful answer is the value that minimizes the “leave-one-out” predictive error:
- Fit the spline to all the displacement data except one point
- Use the spline to predict the displacement at this time point
- Repeat over all displacement points and sum the residual errors

- This is called “leave-one-out” cross-validation
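For any linear smoother (f = H y), the leave-one-out residuals have the closed form (yi − fi)/(1 − Hii), so the cross-validation score can be computed without refitting. A Python sketch using a discrete second-difference smoother as a stand-in for the spline (the grid of candidate α values is arbitrary):

```python
import numpy as np

def loocv_score(y, alpha):
    # Smoother matrix H = (I + alpha * D2'D2)^(-1), so f = H y.
    n = y.size
    D2 = np.diff(np.eye(n), n=2, axis=0)
    H = np.linalg.inv(np.eye(n) + alpha * D2.T @ D2)
    f = H @ y
    # Closed-form leave-one-out residuals for a linear smoother.
    return np.sum(((y - f) / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 60)
y = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.1, t.size)

alphas = np.logspace(-6, 4, 21)
scores = [loocv_score(y, a) for a in alphas]
best_alpha = alphas[int(np.argmin(scores))]
```

Both extremes (near-interpolation and a near-straight line) predict held-out points worse than an intermediate α.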

- Are the residuals normally distributed?
- Are they independent (i.e., is there correlation in time)?
- Is their variance constant?
- You can make your own cross-validated spline using the MATLAB file splineplot.m
- Don’t do it now!!!

- If you compare residuals, those for the spline are generally smaller (i.e., it fits the data better)
- The spline’s coefficient of determination is R² ≈ 0.88
- Our spring model explains less of the variation than the naïve spline does (52% < 88%)

- Is the difference big enough to reject the use of the spring model?
- Again, this is subjective, but we can use statistical tests to answer the question as objectively as possible
- Such tests are beyond the scope of this workshop, but if you are interested in supercharging your group project using them, please ask me