
Presentation Transcript


  1. Bayesian Methods for Density and Regression Deconvolution Raymond J. Carroll Texas A&M University and University of Technology Sydney http://stat.tamu.edu/~carroll

  2. Co-Authors Bani Mallick, Abhra Sarkar, John Staudenmayer, Debdeep Pati

  3. Longtime Collaborators in Deconvolution Peter Hall, Aurore Delaigle, Len Stefanski

  4. Overview • My main application interest is in nutrition • Nutritional intake is necessarily multivariate • Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat • To affect public health practice, nutritionists have developed scores that characterize how well one eats • Healthy Eating Index, DASH score, Mediterranean score, etc.

  5. Overview • One day of French fries/Chips will not kill you • It is your long-term average pattern that is important • In population public health science, long term averages cannot be measured • The best you can get is some version of self-report, e.g., multiple 24 hour recalls • This fact has been the driver behind much of measurement error modeling, especially including density deconvolution

  6. Overview • Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc. • My long-term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros.

  7. Why it Matters • What % of U.S. kids have alarmingly bad diets? • Ignore measurement error: 28% • Account for it: 8% • What are the relative rates of colon cancer for those with an HEI score of 70 versus those with 40? • Ignore measurement error: decrease of 10% • Account for it: decrease of 35%

  8. Overview • We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors • The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology • For the multivariate case, computation is “Bayesian” • Eventually though, anything random is assumed to be Gaussian • Can we not do better?

  9. Background • In the classical measurement error – deconvolution problem, there is a variable, X, that is not observable • Instead, a proxy for it, W, is observed • In the density problem, the goal is to estimate the density of X using only observations on W • In population science contexts, the distribution of X given covariates Z is also important (very small literature on this)

  10. Background • In the regression problem, there is a response Y • One goal is to estimate E(Y | X) • Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters

  11. Background • In the classic problem, W = X + U, with U independent of X • Deconvoluting kernel methods that result in consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan and Carroll) • They are kernel density estimates, $\hat{f}_X(x) = (nh)^{-1}\sum_{j=1}^n K_U\{(x-W_j)/h\}$, with the deconvoluting kernel $$K_U(v) = \frac{1}{2\pi}\int e^{-itv}\,\frac{\phi_K(t)}{\phi_U(t/h)}\,dt,$$ where $\phi_K$ is the Fourier transform of an ordinary kernel K and $\phi_U$ is the characteristic function of U (a minimal R sketch follows)
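To make the construction concrete, here is a minimal R sketch of the deconvoluting kernel density estimator for the special case of a standard normal kernel K and Laplace(0, b) error, where the deconvoluting kernel has the closed form K_U(v) = φ(v){1 + (b²/h²)(1 − v²)}. All settings (sample size, b, the fixed bandwidth) are illustrative, not from the talk.

```r
## Deconvoluting KDE: normal kernel K, Laplace(0, b) measurement error U.
## For this pair, K_U(v) = dnorm(v) * (1 + (b^2/h^2) * (1 - v^2)).
## Note that K_U, and hence the estimate, can dip below zero; that is a
## known feature of deconvoluting kernel estimators.
dkde <- function(x, W, h, b) {
  sapply(x, function(x0) {
    v <- (x0 - W) / h
    mean(dnorm(v) * (1 + (b^2 / h^2) * (1 - v^2))) / h
  })
}

set.seed(1)
n <- 500; b <- 0.4
X <- ifelse(runif(n) < 0.5, rnorm(n, -1, 0.5), rnorm(n, 1, 0.5))
U <- rexp(n, 1 / b) - rexp(n, 1 / b)  # difference of exponentials = Laplace(0, b)
W <- X + U
xg   <- seq(-3, 3, length.out = 201)
fhat <- dkde(xg, W, h = 0.3, b = b)   # h fixed by hand here; in practice use
                                      # a data-driven selector (e.g., Delaigle's)
```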

  12. Background • In the classic problem, W = X + U, with U independent of X • The deconvoluting kernel is a corrected score for an ordinary kernel density estimator, with the property that for a bandwidth h, $$E\left\{K_U\!\left(\frac{x-W}{h}\right)\Big|\,X\right\} = K\!\left(\frac{x-X}{h}\right)$$ • Lots of results on rates of convergence, etc.
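Under the same illustrative Laplace-error setup as above, the corrected-score property can be checked by Monte Carlo: averaging K_U((x − W)/h) over draws of W = X + U recovers K((x − X)/h).

```r
## Monte Carlo check of the corrected-score property at one (x, X) pair.
set.seed(2)
x0 <- 0.5; X0 <- 0; h <- 0.3; b <- 0.4
U <- rexp(1e6, 1 / b) - rexp(1e6, 1 / b)        # Laplace(0, b) errors
v <- (x0 - (X0 + U)) / h
mean(dnorm(v) * (1 + (b^2 / h^2) * (1 - v^2)))  # E{ K_U((x - W)/h) | X }
dnorm((x0 - X0) / h)                            # K((x - X)/h): should agree
```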

  13. Background • There is an R package called decon • However, a paper to appear by A. Delaigle discusses problems with the package's bandwidth selectors • Her web site has Matlab code for cases where the measurement error is independent of X, including bandwidth selection

  14. Problem Considered Here • Here is a general class of models. First, W and X: $$W_{ij} = X_i + U_{ij},\quad j = 1,\dots,m_i,\qquad U_{ij} \sim f_U(\cdot \mid X_i),$$ so the distribution of the measurement error may depend on X • The W's are independent given X (a simulated version follows)
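A hypothetical simulation of this replicate structure; the error-scale function below is my own choice, purely for illustration.

```r
## Each subject i has m = 3 conditionally independent proxies W[i, j] of
## X[i]; the error scale depends on X[i], so U is not independent of X.
set.seed(3)
n <- 300; m <- 3
X   <- rnorm(n)
sdU <- 0.3 + 0.2 * abs(X)    # illustrative X-dependent error scale
W   <- matrix(rnorm(n * m, mean = X, sd = sdU), n, m)  # W[i, j] = X_i + U_ij
```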

  15. Background • There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu) • The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)

  16. Problem Considered Here • Here is a general class of models. First, Y: $$Y_i = m(X_i) + v_e^{1/2}(X_i)\,e_i$$ • The classical heteroscedastic model, where the variance is important • Identified if there are 2 replicate W's (an illustrative simulated response follows)
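Continuing the simulation sketch above, an illustrative heteroscedastic response; the mean and variance functions are my choices, not the talk's.

```r
## Y_i = m(X_i) + v_e^{1/2}(X_i) * e_i, with e independent of X.
mfun <- function(x) sin(pi * x / 2)   # illustrative mean function m(.)
sdE  <- 0.2 + 0.1 * X^2               # illustrative v_e^{1/2}(X)
Y    <- mfun(X) + sdE * rnorm(n)      # X, n as in the earlier sketch
```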

  17. Background • The econometric literature invariably uses sieves with orthogonal basis functions • The theory follows X. Shen’s 1997 paper

  18. Background • In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions • Constraints (such as being positive and integrating to 1 for densities) are often ignored • In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around) • Maybe use a multivariate Hermite series?

  19. Problem Considered Here • There is no deconvoluting kernel method that does density or regression deconvolution when the distribution of the measurement error depends on X

  20. Problem Considered Here • It seems to me that there are two ways to handle this problem in general • Sieves → be an econometrician • Bayesian with flexible models • Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach

  21. Model Formulation • We borrow from Hu and Schennach's example and also from Staudenmayer, Ruppert and Buonaccorsi: $$W_{ij} = X_i + v_U^{1/2}(X_i)\,U_{ij},\qquad Y_i = m(X_i) + v_e^{1/2}(X_i)\,e_i$$ • Here, U is assumed independent of X • Also, e is independent of X (the error distributions depend on X only through the variance functions)

  22. Model Formulation • Our model is the one above • Like previous authors, we model the variance functions $v_U(\cdot)$ and $v_e(\cdot)$ as B-splines with positive coefficients • We model $m(\cdot)$ as a B-spline (a sketch of the positivity device follows) • As frequentists, we could model the densities of X, U, and e by sieves, and appeal to Hu and Schennach for theory • We have not investigated this
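A minimal sketch of the positivity device: because B-spline basis functions are nonnegative, nonnegative coefficients guarantee that the fitted variance function is nonnegative everywhere. Knot placement and coefficient values are illustrative.

```r
## Variance function as a B-spline with positive coefficients.
library(splines)
xg   <- seq(-3, 3, length.out = 201)
B    <- bs(xg, df = 6, intercept = TRUE)  # 6 nonnegative cubic B-spline bases
beta <- exp(rnorm(6, sd = 0.5))           # positive coefficients
vfun <- drop(B %*% beta)                  # v(x) > 0 across the grid
all(vfun > 0)                             # TRUE
```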

  23. Model Formulation • Our model is as above • As Bayesians, we have modeled the densities of X, U, and e by Dirichlet process mixture models (DPMM) • We have found that mixtures of normals, with an unknown number of components, are much faster, just as effective, and very stable numerically

  24. Model Formulation • We found that fixing the number of components at a largish number works best • The method then concentrates on a smaller number of components (Rousseau and Mengersen found this in a non-measurement-error context; a small sketch follows) • There are lots of issues involved: (a) starting values; (b) hyperparameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
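A small sketch of the overfitted-mixture idea: fix a largish K and give the mixture weights a sparse Dirichlet prior; in the regime Rousseau and Mengersen study, the extra components empty out, and even a single prior draw shows the weights concentrating on a few components. All settings here are my choices.

```r
## One draw from Dirichlet(alpha/K, ..., alpha/K) via normalized gammas.
set.seed(4)
K <- 10; alpha <- 0.5
g    <- rgamma(K, shape = alpha / K)
pi_k <- g / sum(g)
round(sort(pi_k, decreasing = TRUE), 3)   # most of the K weights are near 0

## Density of the K-component normal mixture at a vector of points x.
dmix <- function(x, pi_k, mu, sigma)
  colSums(pi_k * sapply(x, dnorm, mean = mu, sd = sigma))
```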

  25. Model Formulation • Here is a simulation example of density deconvolution with homoscedastic error, a mixture of normals for X, and a Laplace distribution for U • The settings come from a paper not by us • There are 3 replicates, so the density of U is also estimated by our method (we let the DKDE know the truth) • I ran our R code as is, with no fine-tuning

  26. Model Formulation

  27. Model Formulation • Here is another example • Y = sodium intake as measured by a food frequency questionnaire (known to be biased) • W = same thing, but measured by a 24 hour recall (known to be almost unbiased) • We have R code for this

  28. Model Formulation The dashed line is the Y=X line, indicating the bias of the FFQ

  29. Multivariate Deconvolution • There are also multivariate problems of density deconvolution • We have found 4 papers about this • 3 deconvoluting kernel papers, all of which assume the density of the measurement errors is known • 1 of those papers has a bandwidth selector • Bovy et al. (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix; they use an EM algorithm

  30. Multivariate Deconvolution • We have generalized our 1-dimensional deconvolution approach as $$\mathbf{W}_{ij} = \mathbf{X}_i + \mathbf{U}_{ij}$$ • Again, X is a mixture of multivariate normals, as is U • However, standard multivariate inverse-Wishart computations fail miserably

  31. Multivariate Deconvolution • We have generalized our 1-dimensional deconvolution approach as above • We use a factor-analytic representation of the component-specific covariance matrices, with sparsity-inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson) • This is crucial in flexibly lowering the dimension of the covariance matrices (a minimal sketch follows)
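A minimal sketch of the factor-analytic representation itself (the actual method puts sparsity-inducing shrinkage priors on Lambda, which this sketch does not implement; dimensions are illustrative).

```r
## Sigma = Lambda %*% t(Lambda) + diag(sigma2): a p x k loading matrix plus
## p residual variances stand in for the p(p+1)/2 free parameters of an
## unstructured covariance, and Sigma is positive definite by construction.
set.seed(5)
p <- 19; k <- 3
Lambda <- matrix(rnorm(p * k, sd = 0.5), p, k)
sigma2 <- rexp(p, rate = 10)
Sigma  <- Lambda %*% t(Lambda) + diag(sigma2)
all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)  # TRUE
```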

  32. Multivariate Deconvolution (Figure) Multivariate inverse Wisharts on top, latent factor model on bottom. Blue = MIW, green = MLFA. Variables are (a) carbs; (b) fiber; (c) protein; and (d) potassium

  33. Conclusion • I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error • Dimension reduction and flexible models seem a practical way to go • Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence
