
Non/Semiparametric Regression and Clustered/Longitudinal Data Raymond J. Carroll Texas A&M University http://stat.tamu.edu/~carroll carroll@stat.tamu.edu Postdoctoral Training Program: http://stat.tamu.edu/B3NC

Where am I From? • Wichita Falls, my hometown • College Station, home of Texas A&M • Big Bend National Park • I-45, I-35

Acknowledgments • Raymond Carroll, Oliver Linton, Alan Welsh, Xihong Lin, Naisyin Wang, Enno Mammen • Series of papers are on my web site • Lin, Wang and Welsh: longitudinal data (Mammen & Linton for pseudo-observation methods) • Linton and Mammen: time series data

Outline • Longitudinal models: • panel data • Background: • splines = kernels for independent data • Correlated data: • do splines = kernels? • Semiparametric case: • partially linear model: • does it matter what nonparametric method is used?


Panel Data (for simplicity) • i = 1,…,n clusters/individuals • j = 1,…,m observations per cluster • Important points: • The cluster size m is meant to be fixed • This is not a multiple time series problem where the cluster size increases to infinity • Some comments on the single time series problem are given near the end of the talk

The Marginal Nonparametric Model • Y = Response • X = time-varying covariate • Question: can we improve efficiency by accounting for correlation?

The Marginal Nonparametric Model • Important assumption • Covariates at other waves are not conditionally predictive, i.e., they are surrogates • This assumption is required for any GLS fit, including parametric GLS
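In symbols, a sketch of the model and of the assumption, writing Θ for the unknown mean function. The indexing Y_ij, X_ij is assumed here from the panel-data setup (i = 1,…,n clusters, j = 1,…,m observations per cluster):

```latex
% Marginal nonparametric model
E(Y_{ij} \mid X_{ij}) = \Theta(X_{ij})

% Assumption: covariates at other waves add nothing to the conditional mean
E(Y_{ij} \mid X_{i1}, \ldots, X_{im}) = E(Y_{ij} \mid X_{ij})
```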

Independent Data • Splines (smoothing, P-splines, etc.) with penalty parameter λ • Ridge regression fit • Some bias, smaller variance • λ = 0 is over-parameterized least squares • λ = ∞ is a polynomial regression
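A sketch of the ridge-regression form of a penalized spline with penalty parameter λ. The basis matrix B (one row per observation), coefficient vector γ and penalty matrix D are notation introduced here for illustration, not taken from the slides:

```latex
\hat{\gamma}_\lambda
  = \arg\min_{\gamma}\; (Y - B\gamma)^{\top}(Y - B\gamma) + \lambda\,\gamma^{\top} D\,\gamma
  = (B^{\top}B + \lambda D)^{-1} B^{\top} Y
```

The fitted curve B γ̂ is linear in Y. With no penalty this is ordinary least squares on the full basis; as the penalty grows, the coefficients that D penalizes are shrunk toward zero and only the unpenalized polynomial part remains.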

Independent Data • Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h • As the bandwidth h → 0, only observations with X near t get any weight in the fit
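A minimal sketch of a local-average kernel fit, to make the locality concrete. The Epanechnikov kernel, the design points and the function names are illustrative choices, not taken from the talk:

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel density, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_weights(t, xs, h, kernel=epanechnikov):
    """Local-average weights given to each observation for the fit at t."""
    raw = [kernel((x - t) / h) for x in xs]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observations within one bandwidth of t")
    return [r / total for r in raw]

def kernel_fit(t, xs, ys, h):
    """Fitted value at t: a weighted average, hence linear in the responses."""
    return sum(w * y for w, y in zip(kernel_weights(t, xs, h), ys))

# Toy data: a sine curve observed without noise on an even grid
xs = [i / 20 for i in range(21)]
ys = [math.sin(2 * math.pi * x) for x in xs]
weights = kernel_weights(0.25, xs, h=0.1)
fit = kernel_fit(0.25, xs, ys, h=0.1)
```

Only the design points within one bandwidth of t = 0.25 receive nonzero weight; shrinking h narrows that window further.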

Independent Data • Major methods • Splines • Kernels • Smoothing parameters required for both • Fits: similar in many (most?) datasets • Expectation: some combination of bandwidths and kernel functions looks like splines

Independent Data • Splines and kernels are linear in the responses • Silverman showed that there is a kernel function and a bandwidth so that the weight functions are asymptotically equivalent • In this sense, splines = kernels • This talk is about the same result for correlated data

Figure: the weight functions G_n(t = 0.25, x) in a specific case for independent data • Panels: kernel, smoothing spline • Note the similarity of shape and the locality: only X’s near t = 0.25 get any weight

Working Independence • Working independence: ignore all correlations • Fix up standard errors at the end • Advantage: the assumption that covariates at other waves are not conditionally predictive is not required • Disadvantage: possible severe loss of efficiency if carried too far

Working Independence • Working independence: • Ignore all correlations • Should posit some reasonable marginal variances • Weighting important for efficiency • Weighted versions: Splines and kernels have obvious analogues • Standard method: Zeger & Diggle, Hoover, Rice, Wu & Yang, Lin & Ying, etc.

Working Independence • Working independence: • Weighted splines and weighted kernels are linear in the responses • The Silverman result still holds • In this sense, splines = kernels

Accounting for Correlation • Splines have an obvious analogue for non-independent data • Let Σ_w be a working covariance matrix • Penalized generalized least squares (GLS) • GLS ridge regression • Because splines are based on likelihood ideas, they generalize quickly to new problems
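A sketch of the penalized GLS criterion for a spline, with Σ_w denoting the working covariance matrix. The per-cluster basis matrices B_i, coefficient vector γ, penalty matrix D and penalty parameter λ are illustrative notation, not taken from the slides:

```latex
\hat{\gamma}_\lambda
  = \arg\min_{\gamma}\; \sum_{i=1}^{n} (Y_i - B_i\gamma)^{\top}\Sigma_w^{-1}(Y_i - B_i\gamma)
      + \lambda\,\gamma^{\top} D\,\gamma
  = \Bigl(\sum_{i=1}^{n} B_i^{\top}\Sigma_w^{-1}B_i + \lambda D\Bigr)^{-1}
    \sum_{i=1}^{n} B_i^{\top}\Sigma_w^{-1}Y_i
```

Taking Σ_w to be diagonal recovers the weighted working-independence spline, and the fit remains linear in the responses.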

Accounting for Correlation • Splines have an obvious analogue for non-independent data • Kernels are not so obvious • Local likelihood kernel ideas are standard in independent data problems • Most attempts at kernels for correlated data have tried to use local likelihood kernel methods

Kernels and Correlation • Problem: how to define locality for kernels? • Goal: estimate the function at t • Let K(t) be a diagonal matrix of standard kernel weights • Standard kernel method: GLS pretending the inverse covariance matrix is K(t)^{1/2} Σ_w^{-1} K(t)^{1/2}, with Σ_w the working covariance matrix • The estimate is inherently local
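A toy sketch of why this construction is local, for clusters of size m = 2 with an exchangeable working correlation ρ. It assumes the symmetric form K^{1/2} Σ_w^{-1} K^{1/2} for the pretend inverse covariance and a local-constant fit; the kernel, data and function names are hypothetical:

```python
import math

def epan(u):
    """Epanechnikov kernel density, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def inverse_exchangeable_2x2(rho):
    """Inverse of the working correlation matrix [[1, rho], [rho, 1]]."""
    d = 1.0 - rho * rho
    return [[1.0 / d, -rho / d], [-rho / d, 1.0 / d]]

def standard_kernel_gls_weights(t, clusters, h, rho):
    """Weights g[i][j] so that the local-constant fit at t is sum of g[i][j] * Y[i][j]."""
    sinv = inverse_exchangeable_2x2(rho)
    raw = []
    for xs in clusters:
        root_k = [math.sqrt(epan((x - t) / h)) for x in xs]
        # Column sums of W = K^{1/2} Sigma^{-1} K^{1/2} are the weights on each response
        raw.append([sum(root_k[k] * sinv[k][j] * root_k[j] for k in range(2))
                    for j in range(2)])
    total = sum(sum(row) for row in raw)   # equals the sum over clusters of 1' W 1
    return [[w / total for w in row] for row in raw]

# Cluster 0 has one covariate at t and one far away; cluster 2 is entirely far away
clusters = [(0.25, 0.9), (0.3, 0.28), (0.7, 0.8)]
weights = standard_kernel_gls_weights(0.25, clusters, h=0.1, rho=0.8)
```

In cluster 0 the observation with X = 0.9 gets weight exactly zero even though its cluster-mate sits at t: the kernel factor wipes out the off-diagonal terms of the inverse covariance, so no strength is borrowed within the cluster.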

Kernels and Correlation • Figure: the weight functions G_n(t = 0.25, x) in a specific case • Specific case: m = 3, n = 35 • Exchangeable correlation structure • Red: ρ = 0.0 • Green: ρ = 0.4 • Blue: ρ = 0.8 • Note the locality of the kernel method

Splines and Correlation • Figure: the weight functions G_n(t = 0.25, x) in a specific case • Specific case: m = 3, n = 35 • Exchangeable correlation structure • Red: ρ = 0.0 • Green: ρ = 0.4 • Blue: ρ = 0.8 • Note the lack of locality of the spline method

Splines and Correlation • Figure: the weight functions G_n(t = 0.25, x) in a specific case • Specific case: m = 3, n = 35 • Complex correlation structure • Red: nearly singular • Green: ρ = 0.0 • Blue: AR(0.8) • Note the lack of locality of the spline method

Splines and Standard Kernels • Accounting for correlation: • Standard kernels remain local • Splines are not local • Numerical results can be confirmed theoretically • Don’t we want our nonparametric regression estimates to be local?

Results on Kernels and Correlation • GLS with kernel weights • Optimal working covariance matrix is working independence! • Using the correct covariance matrix • Increases variance • Increases MSE • Splines ≠ kernels (or at least these kernels)

Pseudo-Observation Kernel Methods • Better kernel methods are possible • Pseudo-observation: original method • Construction: specific linear transformation of Y • Mean = Θ(X) • Covariance = diagonal matrix • This adjusts the original responses without affecting the mean

Pseudo-Observation Kernel Methods • Construction: specific linear transformation of Y • Mean = Θ(X) • Covariance = diagonal • Iterative: • Efficiency: More efficient than working independence • Proof of Principle: kernel methods can be constructed to take advantage of correlation
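One transformation with the two stated properties, given as a sketch and not necessarily the exact construction used in the talk. Here Θ is the mean function, Σ is the within-cluster covariance matrix of Y_i, and the scale constant τ is an assumption of this sketch:

```latex
Y_i^{*} = \Theta(X_i) + \tau\,\Sigma^{-1/2}\{Y_i - \Theta(X_i)\}

% Mean is unchanged:
E(Y_i^{*} \mid X_i) = \Theta(X_i)

% Covariance is diagonal:
\operatorname{cov}(Y_i^{*} \mid X_i) = \tau^{2}\,\Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = \tau^{2} I
```

Because Θ appears on the right-hand side, any such scheme has to be iterated: plug in the current estimate of Θ, form the pseudo-observations, apply a standard kernel fit, and repeat.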

Efficiency of Splines and Pseudo-Observation Kernels • Exchng: exchangeable with correlation 0.6 • AR: autoregressive with correlation 0.6 • Near Sing: a nearly singular matrix

Better Kernel Methods: SUR • Simulations of the original pseudo-observation method: it is not as efficient as splines • Suggests room for a better estimate • Naisyin Wang: her talk will describe an even better kernel method • Basis: seemingly unrelated regression ideas • Generalizable: based on likelihood ideas

SUR Kernel Methods • It is well known that the GLS spline has an exact, analytic expression • We have shown that the Wang SUR kernel method has an exact, analytic expression • Both methods are linear in the responses

SUR Kernel Methods • The two methods differ only in one matrix term • This turns out to be exactly the same matrix term considered by Silverman in his work • Relatively nontrivial calculations show that Silverman’s result still holds • Splines = SUR Kernels

Nonlocality • The lack of locality of GLS splines and SUR kernels is surprising • Suppose we want to estimate the function at t • If any observation has an X near t, then all observations in the cluster contribute to the fit, not just those with covariates near t • Splines, pseudo-kernels and SUR kernels all borrow strength

Nonlocality • Wang’s SUR kernels = BLUP-like pseudo kernels with a clever linear transformation. Let • SUR kernels are working independence kernels

Locality of Kernels • Original pseudo-observation method: pseudo observations uncorrelated • SUR kernels: pseudo-observations are correlated • SUR kernels are not local • SUR kernels are local in (the same!) pseudo-observations

Locality of Splines • Splines = SUR kernels (Silverman-type result) • GLS spline: • Iterative • standard independent spline smoothing • SUR pseudo-observations at each iteration • GLS splines are not local • GLS splines are local in (the same!) pseudo-observations

Time Series Problems • Time series problems: many of the same issues arise • Original pseudo-observation method • Two stages • Linear transformation • Mean Θ(X) • Independent errors • Single standard kernel applied • Potential for great gains in efficiency (even infinite for AR problems with large correlation)

Time Series: AR(1) Illustration, First Pseudo-Observation Method • AR(1), correlation ρ: • Regress Y_t^0 on X_t

Time Series Problems • More efficient methods can be constructed • Series of regression problems: for all j, • Pseudo observations • Mean • White noise errors • Regress for each j: fits are asymptotically independent • Then weighted average • Time series version of SUR-kernels for longitudinal data?

Time Series: AR(1) Illustration, New Pseudo-Observation Method • AR(1), correlation ρ: • Regress Y_t^0 on X_t and Y_t^1 on X_{t-1} • Weights: 1 and ρ^2
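A sketch of one pair of pseudo-observations consistent with the two regressions and the weights stated above. It assumes AR(1) errors e_t = ρ e_{t-1} + u_t with white noise u_t, and uses the true mean function as an oracle (in practice a current estimate would be plugged in and the procedure iterated). The definitions Y_t^0 = Y_t − ρ e_{t-1} and Y_t^1 = Y_{t-1} − e_t/ρ are assumptions of this sketch, not formulas recovered from the slides:

```python
import math
import random

def simulate_ar1(n, rho, theta, seed=0):
    """Simulate Y_t = theta(X_t) + e_t with stationary AR(1) errors e_t."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    us = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [us[0] / math.sqrt(1.0 - rho * rho)]      # stationary starting value
    for t in range(1, n):
        eps.append(rho * eps[-1] + us[t])
    ys = [theta(x) + e for x, e in zip(xs, eps)]
    return xs, ys, us

def pseudo_observations(xs, ys, rho, theta):
    """Return pairs (X_t, Y_t^0) and (X_{t-1}, Y_t^1) for t = 1, ..., n-1."""
    resid = [y - theta(x) for x, y in zip(xs, ys)]  # oracle errors e_t
    first = [(xs[t], ys[t] - rho * resid[t - 1]) for t in range(1, len(ys))]
    second = [(xs[t - 1], ys[t - 1] - resid[t] / rho) for t in range(1, len(ys))]
    return first, second

def theta(x):
    """Hypothetical smooth mean function used only for this illustration."""
    return math.sin(2 * math.pi * x)

rho = 0.8
xs, ys, us = simulate_ar1(500, rho, theta)
first, second = pseudo_observations(xs, ys, rho, theta)
```

Under these definitions Y_t^0 − Θ(X_t) = u_t and Y_t^1 − Θ(X_{t-1}) = −u_t/ρ, so both regressions have white-noise errors, with variances in the ratio 1 : 1/ρ^2. Inverse-variance weighting of the two fits then gives weights proportional to 1 and ρ^2.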

Time Series Problems • AR(1) errors with correlation ρ • Efficiency of original pseudo-observation method to working independence: • Efficiency of new (SUR?) pseudo-observation method to original method:
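A back-of-envelope version of these two efficiencies, offered as a sketch and not as the formulas from the original slide. It assumes that the asymptotic variance of a kernel fit is proportional to the error variance of the regression it is applied to, that the first pseudo-observation regression has white-noise error variance σ_u^2, that the second has error variance σ_u^2/ρ^2, and that the two fits are asymptotically independent:

```latex
% AR(1) errors: \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \quad \operatorname{var}(u_t) = \sigma_u^2
\operatorname{var}(\varepsilon_t) = \frac{\sigma_u^2}{1-\rho^2}
\quad\Longrightarrow\quad
\text{efficiency of original method to working independence} = \frac{1}{1-\rho^2}

% Two asymptotically independent fits with variances v and v/\rho^2,
% combined with weights proportional to 1 and \rho^2
\operatorname{var}\!\left(\frac{\hat\Theta_0 + \rho^2\,\hat\Theta_1}{1+\rho^2}\right)
  = \frac{v + \rho^4\,(v/\rho^2)}{(1+\rho^2)^2}
  = \frac{v}{1+\rho^2}
\quad\Longrightarrow\quad
\text{efficiency of new method to original} = 1+\rho^2
```

The first ratio grows without bound as ρ approaches 1, which matches the remark that gains can be infinite for AR problems with large correlation.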

The Semiparametric Model • Y = Response • X, Z = time-varying covariates • Question: can we improve efficiency for β by accounting for correlation?

Profile Methods • Given β, solve for Θ, say Θ̂(·, β) • Basic idea: regress Y − Zβ on X using one of • Working independence • Standard kernels • Pseudo-observation kernels • SUR kernels

Profile Methods • Given β, solve for Θ, say Θ̂(·, β) • Then fit GLS or W.I. to the model with mean Zβ + Θ̂(X, β) • Question: does it matter what kernel method is used? • Question: How bad is using W.I. everywhere? • Question: are there efficient choices?
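A minimal sketch of the profile idea in its simplest form: working independence everywhere, a scalar Z, and a Gaussian-kernel local average as the smoother. Because the smoother is linear in the responses, profiling out the nonparametric part reduces to regressing the smoothing residual of Y on the smoothing residual of Z. The toy data, bandwidth and function names are assumptions for illustration only:

```python
import math

def smooth(xs, vs, h):
    """Gaussian-kernel local average of vs against xs, evaluated at every xs[i]."""
    out = []
    for t in xs:
        w = [math.exp(-0.5 * ((x - t) / h) ** 2) for x in xs]
        out.append(sum(wi * vi for wi, vi in zip(w, vs)) / sum(w))
    return out

def profile_beta(xs, zs, ys, h):
    """Working-independence profile estimate of beta in Y = Z*beta + Theta(X) + error."""
    z_res = [z - s for z, s in zip(zs, smooth(xs, zs, h))]
    y_res = [y - s for y, s in zip(ys, smooth(xs, ys, h))]
    # Least squares of the Y residuals on the Z residuals
    return sum(a * b for a, b in zip(z_res, y_res)) / sum(a * a for a in z_res)

# Noise-free toy data: Z depends on X plus an alternating component
n = 200
xs = [i / (n - 1) for i in range(n)]
zs = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]
beta_true = 2.0
ys = [beta_true * z + math.sin(2 * math.pi * x) for x, z in zip(xs, zs)]
beta_hat = profile_beta(xs, zs, ys, h=0.1)
```

Replacing `smooth` by a pseudo-observation or SUR kernel fit, and the final least squares step by GLS, gives the other members of the profile class that the questions above compare.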

The Semiparametric Model: Special Case • If X does not vary with time, a simple semiparametric efficient method is available • The basic point is that Y − Zβ has common mean Θ(X) and a common covariance matrix • If Θ(·) were a polynomial, GLS likelihood methods would be natural

The Semiparametric Model: Special Case • Method: Replace polynomial GLS likelihood with GLS local likelihood with kernel weights • Then do GLS on the derived variable • Semiparametric efficient

Profile Method: General Case • Given β, solve for Θ, say Θ̂(·, β) • Then fit GLS or W.I. to the model with mean Zβ + Θ̂(X, β) • In this general case, how you estimate Θ matters • Working independence • Standard kernel • Pseudo-observation kernel • SUR kernel

Profile Methods • In this general case, how you estimate Θ matters • Working independence • Standard kernel • Pseudo-observation kernel • SUR kernel • We have published the asymptotically efficient score, but not how to implement it

Profile Methods • Naisyin Wang’s talk will describe • These phenomena • Search for an efficient estimator • Loss of efficiency for using working independence to estimate Θ • Examples where ignoring the correlation can change conclusions

Conclusions (1/3): Nonparametric Regression • In nonparametric regression • Kernels = splines for working independence (W.I.) • Weighting is important for W.I. • Working independence is inefficient • Standard kernels ≠ splines for correlated data

Conclusions (2/3): Nonparametric Regression • In nonparametric regression • Pseudo-observation methods improve upon working independence • SUR kernels = splines for correlated data • Splines and SUR kernels are not local • Splines and SUR kernels are local in pseudo-observations

Conclusions (3/3): Semiparametric Regression • In semiparametric regression • Profile methods are a general class • Fully efficient parameter estimates are easily constructed if X is not time-varying • When X is time-varying, the method of estimating Θ affects properties of parameter estimates • Ignoring correlations can change conclusions (see N. Wang talk)
