ELEC 303 – Random Signals, Lecture 18 – Classical Statistical Inference. Dr. Farinaz Koushanfar, ECE Dept., Rice University, Nov 4, 2010
Lecture outline • Reading: 9.1-9.2 • Confidence Intervals • Central limit theorem • Student t-distribution • Linear regression
Confidence interval • Consider an estimator $\hat\Theta_n$ of an unknown parameter $\theta$ • We fix a confidence level, $1-\alpha$ • For every $\theta$, replace the single point estimator with a lower estimate $\hat\Theta_n^-$ and an upper one $\hat\Theta_n^+$, s.t. $P(\hat\Theta_n^- \le \theta \le \hat\Theta_n^+) \ge 1-\alpha$ • We call $[\hat\Theta_n^-, \hat\Theta_n^+]$ a $1-\alpha$ confidence interval
Confidence interval - example • Observations $X_i$ are i.i.d. normal with unknown mean $\theta$ and known variance $\sigma^2$, so the sample mean estimator has variance $\sigma^2/n$ • Let $\alpha = 0.05$ • Find the 95% confidence interval
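A minimal Python sketch of this construction (the true mean 10, $\sigma = 2$, and $n = 50$ are made-up values used only for illustration); it builds the interval $\hat\Theta_n \pm 1.96\,\sigma/\sqrt{n}$, since $\Phi(1.96) \approx 0.975$:

```python
import numpy as np

# Hypothetical data: n i.i.d. normal observations with unknown mean
# and (assumed) known standard deviation sigma -- illustrative values only.
rng = np.random.default_rng(0)
sigma, n = 2.0, 50
x = rng.normal(loc=10.0, scale=sigma, size=n)

alpha = 0.05
z = 1.96                      # Phi(1.96) ~ 0.975, so P(|Z| <= 1.96) ~ 0.95
theta_hat = x.mean()          # sample-mean estimator, variance sigma^2 / n
half_width = z * sigma / np.sqrt(n)
print(f"95% CI: [{theta_hat - half_width:.3f}, {theta_hat + half_width:.3f}]")
```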
Confidence interval (CI) • Wrong: the true parameter lies in the CI with 95% probability • Correct: suppose that $\theta$ is fixed • We construct the CI many times, using the same statistical procedure • Each time, obtain a collection of n observations and construct the corresponding CI • About 95% of these CIs will include $\theta$
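To see the correct interpretation in action, the sketch below repeats the CI construction many times on fresh simulated data (all parameter values are hypothetical) and counts how often the interval contains the fixed $\theta$:

```python
import numpy as np

# Coverage check: repeat the CI construction many times and count how often
# the interval contains the true (fixed) parameter theta. Values are illustrative.
rng = np.random.default_rng(1)
theta, sigma, n, trials = 10.0, 2.0, 50, 10_000
covered = 0
for _ in range(trials):
    x = rng.normal(theta, sigma, size=n)
    half = 1.96 * sigma / np.sqrt(n)
    if x.mean() - half <= theta <= x.mean() + half:
        covered += 1
print(f"empirical coverage: {covered / trials:.3f}")   # should be close to 0.95
```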
A note on the Central Limit Theorem (CLT) • Let $X_1, X_2, \ldots, X_n$ be a sequence of n independent and identically distributed RVs with finite expectation $\mu$ and variance $\sigma^2 > 0$ • CLT: as the sample size n increases, the PDF of the sample average of the RVs approaches $N(\mu, \sigma^2/n)$, irrespective of the shape of the original distribution
CLT [Figure: the density of a single variable and the densities of sums of two, three, and four such variables, illustrating the progression toward a normal shape]
CLT • Let the sum of the n random variables be $S_n = X_1 + \cdots + X_n$, and define a new RV $Z_n = \dfrac{S_n - n\mu}{\sigma\sqrt{n}}$ • The distribution of $Z_n$ converges to the standard normal $N(0,1)$ as $n \to \infty$ (this is convergence in distribution), written as $Z_n \xrightarrow{d} N(0,1)$ • In terms of the CDFs: $\lim_{n\to\infty} P(Z_n \le z) = \Phi(z)$ for every $z$
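A small simulation sketch of this convergence, using uniform RVs as a deliberately non-normal starting distribution (the sample sizes and the evaluation point $z = 1$ are arbitrary choices):

```python
import numpy as np

# CLT demo: standardize the sum of n i.i.d. Uniform(0,1) RVs and compare an
# empirical tail probability with the standard normal CDF value.
rng = np.random.default_rng(2)
mu, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0, 1)
for n in (2, 4, 32):
    s_n = rng.uniform(0.0, 1.0, size=(100_000, n)).sum(axis=1)
    z_n = (s_n - n * mu) / np.sqrt(n * var)
    print(n, np.mean(z_n <= 1.0))  # approaches Phi(1.0) ~ 0.841 as n grows
```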
Confidence interval approximation • Suppose that the observations $X_i$ are i.i.d. with mean $\theta$ and variance $\sigma^2$, both unknown • Estimate the mean and the (unbiased) variance: $\hat\Theta_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, $\hat S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \hat\Theta_n)^2$ • We may estimate the variance $\sigma^2/n$ of the sample mean by $\hat S_n^2/n$ • For any given $\alpha$, we may use the CLT to approximate the confidence interval as $\bigl[\hat\Theta_n - \tfrac{z\,\hat S_n}{\sqrt{n}},\ \hat\Theta_n + \tfrac{z\,\hat S_n}{\sqrt{n}}\bigr]$, where $z$ is chosen from the normal table so that $\Phi(z) = 1-\alpha/2$ (e.g., $z = 1.96$ for $\alpha = 0.05$)
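A sketch of this approximate CI with estimated variance, again on hypothetical simulated data:

```python
import numpy as np

# Approximate CI when the variance is unknown: replace sigma^2/n by the
# unbiased sample variance divided by n. Data are hypothetical.
rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=40)
n = len(x)

theta_hat = x.mean()
s2_hat = x.var(ddof=1)             # unbiased variance estimate (divide by n-1)
half = 1.96 * np.sqrt(s2_hat / n)  # z = 1.96 for alpha = 0.05 from the normal table
print(f"approximate 95% CI: [{theta_hat - half:.3f}, {theta_hat + half:.3f}]")
```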
Confidence interval approximation • Two different approximations are in effect: • Treating the sum as if it were a normal RV • Replacing the true variance by the variance estimated from the sample • Even in the special case where the $X_i$'s are i.i.d. normal, the variance is an estimate and the RV $T_n$ below is not normally distributed: $T_n = \dfrac{\sqrt{n}\,(\hat\Theta_n - \theta)}{\hat S_n}$
t-distribution • For normal $X_i$, it can be shown that the PDF of $T_n$ does not depend on $\theta$ or $\sigma$ • This is called the t-distribution with $n-1$ degrees of freedom
t-distribution • It is also symmetric and bell-shaped (like the normal) • The probabilities of various intervals are available in tables • When the $X_i$'s are normal and $n$ is relatively small, a more accurate CI is $\bigl[\hat\Theta_n - \tfrac{z\,\hat S_n}{\sqrt{n}},\ \hat\Theta_n + \tfrac{z\,\hat S_n}{\sqrt{n}}\bigr]$, where $z$ is chosen so that $\Psi_{n-1}(z) = 1-\alpha/2$ and $\Psi_{n-1}$ is the CDF of the t-distribution with $n-1$ degrees of freedom
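The sketch below compares the t quantile with the normal quantile for a few sample sizes (using scipy's inverse CDFs in place of a table lookup), showing why the t-based interval is wider for small n:

```python
import numpy as np
from scipy import stats

# For small n, the t quantile is noticeably larger than the normal quantile,
# so the t-based CI is wider (and more accurate when the X_i are normal).
alpha = 0.05
for n in (5, 10, 30, 100):
    t_q = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t-distribution, n-1 dof
    z_q = stats.norm.ppf(1 - alpha / 2)          # standard normal, ~1.96
    print(f"n={n:4d}  t={t_q:.3f}  z={z_q:.3f}")
```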
Example • The weight of an object is measured 8 times using an electric scale • The scale reports the true weight plus a random error $\sim N(0,\sigma^2)$: .5547, .5404, .6364, .6438, .4917, .5674, .5564, .6066 • Compute the 95% confidence interval using the t-distribution
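A sketch of the computation for this example, using the eight measurements listed above (scipy's t quantile stands in for the table lookup):

```python
import numpy as np
from scipy import stats

# The eight weight measurements from the slide.
x = np.array([0.5547, 0.5404, 0.6364, 0.6438, 0.4917, 0.5674, 0.5564, 0.6066])
n = len(x)

theta_hat = x.mean()
s_hat = x.std(ddof=1)                       # unbiased sample standard deviation
t_q = stats.t.ppf(0.975, df=n - 1)          # 95% two-sided, 7 degrees of freedom
half = t_q * s_hat / np.sqrt(n)
print(f"95% t-based CI: [{theta_hat - half:.4f}, {theta_hat + half:.4f}]")
```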
Linear regression • Building a model of the relation between two or more variables of interest • Consider two variables $x$ and $y$, based on a collection of data points $(x_i, y_i)$, $i=1,\ldots,n$ • Assume that the scatter plot of these two variables shows a systematic, approximately linear relationship between $x_i$ and $y_i$ • It is natural to build a model: $y \approx \theta_0 + \theta_1 x$
Linear regression • The model will generally not fit the data exactly; instead we estimate the parameters from the data, yielding estimates $\hat\theta_0$ and $\hat\theta_1$ • The i-th residual is: $\tilde y_i = y_i - (\hat\theta_0 + \hat\theta_1 x_i)$
Linear regression • The parameters are chosen to minimize the sum of squared residuals $\sum_{i=1}^{n}(y_i - \theta_0 - \theta_1 x_i)^2$ • Always keep in mind that the postulated model may not be true • To perform the minimization, we set the partial derivatives w.r.t. $\theta_0$ and $\theta_1$ to zero
Linear regression • Given n data pairs $(x_i, y_i)$, the estimates that minimize the sum of the squared residuals are $\hat\theta_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2}$ and $\hat\theta_0 = \bar y - \hat\theta_1 \bar x$, where $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$
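A direct Python transcription of these formulas; the data pairs at the bottom are hypothetical and serve only to exercise the function:

```python
import numpy as np

def least_squares_line(x, y):
    """Return (theta0_hat, theta1_hat) minimizing the sum of squared residuals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_bar, y_bar = x.mean(), y.mean()
    theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1

# Hypothetical data points, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(least_squares_line(x, y))
```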
Example • The leaning tower of Pisa tilts continuously • Measurements were taken between 1975 and 1987 • Find the linear regression
Justification of least squares • Maximum likelihood • Approximation of Bayesian linear LMS estimation (under a possibly nonlinear model) • Approximation of Bayesian LMS estimation (under a linear model)
Maximum likelihood justification • Assume that the $x_i$'s are given numbers • Assume the $y_i$'s are realizations of RVs $Y_i$ as below, where the $W_i$'s are i.i.d. $\sim N(0,\sigma^2)$: $Y_i = \theta_0 + \theta_1 x_i + W_i$ • The likelihood function has the form $f_Y(y;\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{(y_i-\theta_0-\theta_1 x_i)^2}{2\sigma^2}\Bigr)$ • ML is equivalent to minimizing the sum of squared residuals (see the derivation below)
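To spell out the equivalence: taking the logarithm of the likelihood above gives
$$\log f_Y(y;\theta) = -n\log\bigl(\sqrt{2\pi}\,\sigma\bigr) \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i-\theta_0-\theta_1 x_i\bigr)^2,$$
and since the first term does not involve $\theta_0$ or $\theta_1$, maximizing over them is the same as minimizing $\sum_i (y_i-\theta_0-\theta_1 x_i)^2$.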
Approximate Bayesian linear LMS • Assume the $x_i$ and $y_i$ are realizations of RVs $X_i$ and $Y_i$ • The $(X_i, Y_i)$ pairs are i.i.d. with unknown joint PDF • Assume an additional independent pair $(X_0, Y_0)$ • We observe $X_0$ and want to estimate $Y_0$ by a linear estimator • The linear estimator is of the form $\hat Y_0 = \theta_0 + \theta_1 X_0$, with $\theta_0, \theta_1$ chosen to minimize the mean squared error $E[(Y_0 - \theta_0 - \theta_1 X_0)^2]$
Approximate Bayesian LMS • For the previous scenario, make the additional assumption of a linear model $Y_i = \theta_0 + \theta_1 X_i + W_i$, where the $W_i$'s are i.i.d. $\sim N(0,\sigma^2)$ and independent of the $X_i$ • We know that $E[Y_0|X_0]$ minimizes the mean squared estimation error; under this model, $E[Y_0|X_0] = \theta_0 + \theta_1 X_0$ • As $n \to \infty$, the least-squares parameter estimates converge to $\theta_0$ and $\theta_1$
Multiple linear regression • Many phenomena involve multiple underlying variables, also called explanatory variables • Such models are called multiple regression • E.g., for a triplet of data points $(x_i, y_i, z_i)$ we may wish to estimate the model: $y \approx \theta_0 + \theta_1 x + \theta_2 z$ • Minimize: $\sum_i (y_i - \theta_0 - \theta_1 x_i - \theta_2 z_i)^2$ • In general, we can consider the model $y \approx \theta_0 + \sum_j \theta_j h_j(x)$
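One way to compute such a fit is to stack the explanatory variables into a design matrix and solve the least-squares problem numerically; the sketch below does this with numpy, on hypothetical $(x_i, y_i, z_i)$ triples:

```python
import numpy as np

# Multiple linear regression y ~ theta0 + theta1*x + theta2*z, solved with
# numpy's least-squares routine. The data triples below are hypothetical.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z = np.array([0.5, 1.5, 1.0, 2.0, 2.5, 3.0])
y = np.array([3.1, 6.0, 6.8, 9.9, 11.2, 13.1])

A = np.column_stack([np.ones_like(x), x, z])       # columns: 1, x, z
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes sum of squared residuals
print(theta_hat)                                   # [theta0, theta1, theta2]
```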
Nonlinear regression • Sometimes the expression is nonlinear in the unknown parameter $\theta$ • Variables $x$ and $y$ obey the form $y \approx h(x;\theta)$ • Minimize $\sum_i (y_i - h(x_i;\theta))^2$ over $\theta$ • The minimization typically has no closed-form solution • Assuming the $W_i$'s are i.i.d. $\sim N(0,\sigma^2)$ and $Y_i = h(x_i;\theta) + W_i$, the likelihood function has the form $f_Y(y;\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{(y_i - h(x_i;\theta))^2}{2\sigma^2}\Bigr)$, so ML again leads to minimizing the sum of squared residuals
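A sketch of a numerical fit for a hypothetical nonlinear model $h(x;\theta) = \theta_0 e^{\theta_1 x}$ (both the model form and the data are invented for illustration), using scipy's iterative least-squares solver:

```python
import numpy as np
from scipy.optimize import least_squares

# Nonlinear least squares for the hypothetical model h(x; theta) = theta0 * exp(theta1 * x).
# Data values are made up purely for illustration.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.1, 1.6, 2.6, 4.3, 7.0, 11.5])

def residuals(theta):
    return y - theta[0] * np.exp(theta[1] * x)     # y_i - h(x_i; theta)

fit = least_squares(residuals, x0=[1.0, 1.0])      # iterative numerical minimization
print(fit.x)                                       # estimated [theta0, theta1]
```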
Practical considerations • Heteroskedasticity • Nonlinearity • Multicollinearity • Overfitting • Causality