10. Generalized linear models

  • 10.1 Homogeneous models

    • Exponential families of distributions, link functions, likelihood estimation

  • 10.2 Example: Tort filings

  • 10.3 Marginal models and GEE

  • 10.4 Random effects models

  • 10.5 Fixed effects models

    • Maximum likelihood, conditional likelihood, Poisson data

  • 10.6 Bayesian inference

  • Appendix 10A Exponential families of distributions


10.1 Homogeneous models

  • Section Outline

    • 10.1.1 Exponential families of distributions

    • 10.1.2 Link functions

    • 10.1.3 Likelihood estimation

  • In this section, we consider only independent responses.

    • No serial correlation.

    • No random effects that would induce serial correlation.


Exponential families of distributions

  • The basic one-parameter exponential family is

    p(y; θ, φ) = exp{ (y θ − b(θ)) / φ + S(y, φ) } .

    • Here, y is a response and θ is the parameter of interest.

    • The parameter φ is a scale parameter that we often will assume is known.

    • The term b(θ) depends only on the parameter θ, not the responses.

    • S(y, φ) depends only on the responses and the scale parameter, not the parameter θ.

    • The response y may be discrete or continuous.

  • Some straightforward calculations show that

    E y = b′(θ) and Var y = b″(θ) φ.


Special cases of the basic exponential family

  • Normal

    • The probability density function is

      f(y) = (2πσ²)^(−1/2) exp( −(y − μ)² / (2σ²) ) .

    • Take μ = θ, σ² = φ, b(θ) = θ²/2 and

      S(y, φ) = −y² / (2φ) − ln(2πφ)/2 .

    • Note that E y = b′(θ) = θ = μ and Var y = b″(θ) φ = σ².

  • Binomial, n trials and probability p of success

    • The probability mass function is

      f(y) = (n choose y) p^y (1 − p)^(n−y) , y = 0, 1, ..., n.

    • Take ln( p/(1 − p) ) = logit(p) = θ, φ = 1, b(θ) = n ln(1 + e^θ) and S(y, φ) = ln( (n choose y) ) .

    • Note that E y = b′(θ) = n e^θ/(1 + e^θ) = np and Var y = b″(θ) (1) = n e^θ/(1 + e^θ)² = np(1 − p) , as anticipated.


Another special case of the basic exponential family

  • Poisson

    • The probability mass function is

      f(y) = λ^y e^(−λ) / y! , y = 0, 1, 2, ...

    • Take ln(λ) = θ, φ = 1, b(θ) = e^θ and S(y, φ) = −ln(y!) .

    • Note that E y = b′(θ) = e^θ = λ and

    • Var y = b″(θ) (1) = e^θ = λ , as anticipated.
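As a quick numerical check of the identities E y = b′(θ) and Var y = b″(θ) φ, here is a minimal Python sketch for the Poisson case above (the value of λ and the finite-difference step are illustrative choices, not part of the model):

  # Check E y = b'(theta) and Var y = b''(theta) * phi for the Poisson,
  # where b(theta) = exp(theta) and phi = 1.
  import numpy as np
  from scipy.stats import poisson

  lam = 3.0
  theta = np.log(lam)   # canonical parameter: theta = ln(lambda)
  b = np.exp            # b(theta) = e^theta

  h = 1e-5              # finite-difference step for numerical derivatives
  b1 = (b(theta + h) - b(theta - h)) / (2 * h)               # ~ b'(theta)
  b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2   # ~ b''(theta)

  print(b1, poisson.mean(lam))   # both ~ 3.0:  E y = b'(theta) = lambda
  print(b2, poisson.var(lam))    # both ~ 3.0:  Var y = b''(theta) = lambda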


10.1.2 Link functions

  • To link up the univariate exponential family with regression problems, we define the systematic component of y_it to be

    η_it = x_it′ β .

  • The idea is now to choose a “link” between the systematic component and the mean of y_it, say μ_it, of the form:

    η_it = g(μ_it) .

    • g(·) is the link function.

  • Linear combinations of explanatory variables, η_it = x_it′ β, may vary between negative and positive infinity.

    • However, means may be restricted to a smaller range. For example, Poisson means vary between zero and infinity.

    • The link function serves to map the set of possible means onto the whole real line.


Bernoulli illustration of links

  • Bernoulli means vary between 0 and 1, although linear combinations of explanatory variables may vary between negative and positive infinity.

  • Here are three important examples of link functions for the Bernoulli distribution:

    • Logit: η = g(μ) = logit(μ) = ln( μ/(1 − μ) ) .

    • Probit: η = g(μ) = Φ⁻¹(μ) , where Φ⁻¹ is the inverse of the standard normal distribution function.

    • Complementary log-log: η = g(μ) = ln( −ln(1 − μ) ) .

  • Each function maps the unit interval (0, 1) onto the whole real line.
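The three links and their inverses are easy to compute; here is a minimal sketch (the probe values of μ are arbitrary):

  # Bernoulli links: each maps (0, 1) onto the whole real line,
  # and each inverse maps the real line back into (0, 1).
  import numpy as np
  from scipy.stats import norm
  from scipy.special import logit, expit

  mu = np.array([0.05, 0.5, 0.95])

  eta_logit = logit(mu)                  # ln( mu / (1 - mu) )
  eta_probit = norm.ppf(mu)              # inverse standard normal cdf
  eta_cloglog = np.log(-np.log(1 - mu))  # complementary log-log

  # inverse links recover the means
  print(expit(eta_logit))                # logistic cdf
  print(norm.cdf(eta_probit))            # standard normal cdf
  print(1 - np.exp(-np.exp(eta_cloglog)))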


Canonical links

  • As we have seen with the Bernoulli, there are several link functions that may be suitable for a particular distribution.

  • The case where the systematic component equals the parameter of interest (η = θ) is intuitively appealing.

    • That is, the parameter of interest, θ, equals a linear combination of explanatory variables, η.

    • Recall that η = g(μ) and μ = b′(θ).

    • Thus, if g⁻¹ = b′, then η = g(b′(θ)) = θ.

    • The choice of g such that g⁻¹ = b′ is called the canonical link.

  • Examples: Normal: g(μ) = μ; Binomial: g(μ) = logit(μ); Poisson: g(μ) = ln μ.


10.1.3 Estimation

  • Begin with likelihood estimation for canonical links.

  • Consider responses y_it, with mean μ_it, systematic component η_it = g(μ_it) = x_it′ β and canonical link so that η_it = θ_it.

    • Assume the responses are independent.

  • Then, the log-likelihood is

    ln ℓ(β) = Σ_it { (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } , with θ_it = x_it′ β.

MLEs - Canonical links

  • The log-likelihood is

    ln ℓ(β) = Σ_it { (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } .

  • Taking the partial derivative with respect to β yields the score equations:

    ∂ ln ℓ(β)/∂β = Σ_it x_it (y_it − b′(x_it′ β)) / φ ,

    • because μ_it = b′(θ_it) = b′(x_it′ β).

  • Thus, we can solve for the MLEs of β through:

    0 = Σ_it x_it (y_it − μ_it).

    • This is a special case of the method of moments.
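A minimal sketch of solving these score equations by Newton's method (equivalently, iteratively reweighted least squares) for the Poisson case with μ_it = exp(x_it′ β); the data are simulated and purely illustrative:

  # Solve 0 = sum_it x_it (y_it - mu_it) for the Poisson, canonical link.
  import numpy as np

  rng = np.random.default_rng(0)
  n, K = 500, 3
  X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
  beta_true = np.array([0.5, 0.3, -0.2])
  y = rng.poisson(np.exp(X @ beta_true))

  beta = np.zeros(K)
  for _ in range(25):
      mu = np.exp(X @ beta)
      score = X.T @ (y - mu)          # 0 = sum x (y - mu) at the MLE
      info = X.T @ (mu[:, None] * X)  # Fisher information, canonical link
      step = np.linalg.solve(info, score)
      beta = beta + step
      if np.max(np.abs(step)) < 1e-10:
          break
  print(beta)   # close to beta_true for this simulated sample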


MLEs - general links

  • For general links, we no longer assume the relation θ_it = x_it′ β.

  • We assume that β is related to θ_it through

    μ_it = b′(θ_it) and η_it = x_it′ β = g(μ_it).

  • Recall that the log-likelihood is

    ln ℓ(β) = Σ_it { (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } .

    • Further, E y_it = μ_it and Var y_it = b″(θ_it) φ .

  • The jth element of the score function is

    ∂ ln ℓ(β)/∂β_j = Σ_it (∂θ_it/∂β_j) (y_it − μ_it) / φ ,

    • because b′(θ_it) = μ_it.


MLEs - more on general links

  • To eliminate θ_it, we use the chain rule to get

    ∂μ_it/∂β_j = b″(θ_it) ∂θ_it/∂β_j .

  • Thus,

    ∂θ_it/∂β_j = (∂μ_it/∂β_j) / b″(θ_it) = φ (∂μ_it/∂β_j) / Var y_it .

  • This yields

    ∂ ln ℓ(β)/∂β_j = Σ_it (∂μ_it/∂β_j) (y_it − μ_it) / Var y_it .

  • This is called the generalized estimating equations form.
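To see the generalized estimating equations form in action for a non-canonical link, here is a sketch with Bernoulli responses and a probit link, so that ∂μ/∂β is a normal density times x and the variance is μ(1 − μ); the data are simulated:

  # 0 = sum_it (d mu_it / d beta) (Var y_it)^{-1} (y_it - mu_it)
  import numpy as np
  from scipy.stats import norm
  from scipy.optimize import root

  rng = np.random.default_rng(1)
  n = 2000
  X = np.column_stack([np.ones(n), rng.normal(size=n)])
  beta_true = np.array([-0.4, 0.8])
  y = rng.binomial(1, norm.cdf(X @ beta_true))

  def score(beta):
      eta = X @ beta
      mu = np.clip(norm.cdf(eta), 1e-10, 1 - 1e-10)  # guard the tails
      dmu = norm.pdf(eta)[:, None] * X               # d mu / d beta, chain rule
      var = mu * (1 - mu)                            # Bernoulli variance function
      return dmu.T @ ((y - mu) / var)

  print(root(score, np.zeros(2)).x)   # roughly recovers beta_true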


Overdispersion

  • When fitting models to data with binary or count dependent variables, it is common to observe that the variance exceeds that anticipated by the fit of the mean parameters.

    • This phenomenon is known as overdispersion.

    • A probabilistic model may be available to explain this phenomenon.

  • In many situations, analysts are content to postulate an approximate model through the relation

    Var y_it = σ² φ b″(x_it′ β) / w_it ,

    • where w_it is a known weight.

    • The scale parameter φ is specified through the choice of the distribution.

    • The scale parameter σ² allows for extra variability.

  • When the additional scale parameter σ² is included, it is customary to estimate it by Pearson’s chi-square statistic divided by the error degrees of freedom. That is,

    σ̂² = (1/(N − K)) Σ_it w_it (y_it − b′(x_it′ b_MLE))² / (φ b″(x_it′ b_MLE)) ,

    where N is the total number of observations and K the dimension of β.
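In practice, the Pearson-based estimate of σ² is one line with the statsmodels package; a sketch on simulated (hypothetical) data:

  # Estimate sigma^2 by Pearson's chi-square over its degrees of freedom.
  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(2)
  X = sm.add_constant(rng.normal(size=(300, 2)))
  y = rng.poisson(np.exp(X @ np.array([0.2, 0.5, -0.3])))

  res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
  sigma2_hat = res.pearson_chi2 / res.df_resid  # ~ 1 when no overdispersion
  print(sigma2_hat)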



Offsets

  • We assume that y_it is Poisson distributed with parameter

    POP_it exp(x_it′ β),

    • where POP_it is the population of the ith state at time t.

  • In GLM terminology, a variable with a known coefficient equal to 1 is known as an offset.

  • Using logarithmic population, our Poisson parameter for y_it is

    exp( ln POP_it + x_it′ β ) .

  • An alternative approach is to use the average number of tort filings as the response and assume approximate normality.

    • Note that in the Poisson model above the expectation of the average response is

      E (y_it / POP_it) = exp(x_it′ β) ,

    • whereas the variance is

      Var (y_it / POP_it) = exp(x_it′ β) / POP_it .
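A sketch of the offset idea with the statsmodels package; the population figures and covariate are simulated stand-ins for the tort-filing data:

  # Fit a Poisson mean POP_it * exp(x_it' beta) by including ln(POP_it)
  # as an offset, i.e., a term with known coefficient one.
  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(3)
  pop = rng.integers(10_000, 1_000_000, size=200).astype(float)
  X = sm.add_constant(rng.normal(size=(200, 1)))
  y = rng.poisson(pop * np.exp(X @ np.array([-8.0, 0.4])))

  res = sm.GLM(y, X, family=sm.families.Poisson(), offset=np.log(pop)).fit()
  print(res.params)   # close to (-8.0, 0.4) for this simulated sample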


Tort filings

  • Purpose: to understand ways in which state legal, economic and demographic characteristics affect the number of filings.

  • Table 10.3 suggests more filings under JSLIAB and PUNITIVE, but fewer under CAPS.

  • Table 10.5

    • All variables under the homogeneous model are statistically significant.

    • However, the estimated scale parameter appears important.

      • Here, only JSLIAB is (positively) statistically significant.

    • The time (categorical) variable appears important.


10.3 Marginal models

  • This approach reduces the reliance on distributional assumptions by focusing on the first two moments.

  • We first assume that the variance is a known function of the mean up to a scale parameter, that is, Var y_it = v(μ_it) φ .

    • This is a consequence of the exponential family, although now it is a basic assumption.

    • That is, in the GLM setting, we have Var y_it = b″(θ_it) φ and μ_it = b′(θ_it).

    • Because b(·) and φ are assumed known, Var y_it is a known function of μ_it .

  • We also assume that the correlation between two observations within the same subject is a known function of their means, up to a vector of parameters τ.

    • That is, corr(y_ir, y_is) = ρ(μ_ir, μ_is, τ) , for ρ(·) known.


Marginal model

  • This framework incorporates the linear model nicely; we simply use a GLM with a normal distribution.

  • However, for nonlinear situations, a correlation is not always the best way to capture dependencies among observations.

  • Here is some notation to help see the estimation procedures.

    • Define μ_i = (μ_i1, μ_i2, ..., μ_iTi)′ to be the vector of means for the ith subject.

    • To express the variance-covariance matrix, we define a diagonal matrix of variances

      V_i = diag( v(μ_i1), ..., v(μ_iTi) )

    • and the matrix of correlations R_i(τ) to be a matrix with ρ(μ_ir, μ_is, τ) in the rth row and sth column.

    • Thus, Var y_i = V_i^{1/2} R_i(τ) V_i^{1/2} .
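The covariance matrix Var y_i = V_i^{1/2} R_i(τ) V_i^{1/2} is easy to assemble; the sketch below assumes a Poisson variance function v(μ) = μ and an AR(1) working correlation, both illustrative choices:

  # Build Var y_i = V^{1/2} R(tau) V^{1/2} for one subject.
  import numpy as np

  mu_i = np.array([2.0, 3.5, 5.0, 4.0])  # means mu_i1, ..., mu_iT (hypothetical)
  tau = 0.6
  T = len(mu_i)

  V_half = np.diag(np.sqrt(mu_i))        # V_i^{1/2}, with v(mu) = mu
  R = tau ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))  # AR(1)
  print(V_half @ R @ V_half)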

Generalized estimating equations

  • These assumptions are suitable for a method of moments estimation procedure called “generalized estimating equations” (GEE) in biostatistics, also known as the generalized method of moments (GMM) in econometrics.

  • GEE with known correlation parameter

    • Assuming τ is known, the GEE for β is

      0 = Σ_i (∂μ_i/∂β′)′ (Var y_i)⁻¹ (y_i − μ_i) .

    • Here, the matrix of partial derivatives ∂μ_i/∂β′

    • is T_i × K*, where K* is the number of regression parameters.

  • For linear models with μ_it = z_it′ α_i + x_it′ β, this is the GLS estimator introduced in Section 3.3.
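A sketch of GEE estimation using the statsmodels package on simulated panel data (variable names are hypothetical); an independence working correlation is used here, and exchangeable or AR(1) structures swap in the same way:

  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(4)
  n_subj, T = 100, 5
  groups = np.repeat(np.arange(n_subj), T)
  X = sm.add_constant(rng.normal(size=(n_subj * T, 2)))
  a = np.repeat(rng.normal(scale=0.3, size=n_subj), T)  # within-subject dependence
  y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 1] - 0.3 * X[:, 2] + a))

  model = sm.GEE(y, X, groups=groups, family=sm.families.Poisson(),
                 cov_struct=sm.cov_struct.Independence())
  res = model.fit()
  print(res.params)
  print(np.sqrt(np.diag(res.cov_robust)))  # empirical ("sandwich") standard errors
  print(np.sqrt(np.diag(res.cov_naive)))   # model-based standard errors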


Consistency of GEEs

  • The solution, b_EE, is asymptotically normal with covariance matrix

    ( Σ_i (∂μ_i/∂β′)′ (Var y_i)⁻¹ (∂μ_i/∂β′) )⁻¹ .

    • Because this is a function of the means, μ_i, it can be consistently estimated.


Robust estimation of standard errors

  • Empirical standard errors may be calculated using the following sandwich estimator of the asymptotic variance of b_EE:

    ( Σ_i G_i′ V_i⁻¹ G_i )⁻¹ ( Σ_i G_i′ V_i⁻¹ (y_i − μ_i)(y_i − μ_i)′ V_i⁻¹ G_i ) ( Σ_i G_i′ V_i⁻¹ G_i )⁻¹ ,

    where G_i = ∂μ_i/∂β′ and V_i denotes the working variance Var y_i .


GEE - correlation parameter estimation

  • For GEEs with unknown correlation parameters, Prentice (1988) suggests using a second estimating equation of the form:

    • where

    • Diggle, Liang and Zeger (1994) suggest using the identity matrix for most discrete data.

  • However, for binary responses,

    • they note that the last T_i observations are redundant because y_it² = y_it and should be ignored.

    • they recommend using


Tort filings

  • Assume an independence working correlation.

    • This yields the same parameter estimators as in Table 10.5, under the homogeneous Poisson model with an estimated scale parameter.

    • JSLIAB is (positively) statistically significant, using both model-based and robust standard errors.

  • To test the robustness of this model fit, we fit the same model with an AR(1) working correlation.

    • Again, JSLIAB is (positively) statistically significant.

    • Interestingly, CAPS is now borderline significant, but in the direction opposite to that suggested by Table 10.3.


10.4 Random effects models

  • The motivation and sampling issues regarding random effects were introduced in Chapter 3.

  • The model is easiest to introduce and interpret in the following hierarchical fashion:

    • 1. Subject effects {α_i} are a random sample from a distribution that is known up to a vector of parameters τ.

    • 2. Conditional on {α_i}, the responses {y_i1, y_i2, ..., y_iTi} are a random sample from a GLM with systematic component η_it = z_it′ α_i + x_it′ β .


Random effects models

  • This model is a generalization of:

    • 1. The linear random effects model in Chapter 3 - use a normal distribution.

    • 2. The binary dependent variables random effects model of Section 9.2 - use a Bernoulli distribution. (In Section 9.2, we focused on the case z_it = 1.)

  • Because we are sampling from a known distribution with a finite number of parameters, the maximum likelihood method of estimation is readily available.

  • We will use this method, assuming normally distributed random effects.

  • Also available in the literature is the EM (expectation-maximization) algorithm for estimation - see Diggle, Liang and Zeger (1994).


Random effects likelihood

  • Conditional on α_i, the likelihood for the ith subject at the tth observation is

    exp{ (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } ,

  • where b′(θ_it) = E (y_it | α_i) and η_it = z_it′ α_i + x_it′ β = g( E (y_it | α_i) ).

  • Conditional on α_i, the likelihood for the ith subject is the product of these terms over t.

  • We take expectations over α_i to get the (unconditional) likelihood.

  • To see this explicitly, let’s use the canonical link so that θ_it = η_it. The (unconditional) likelihood for the ith subject is

    l_i = E_α Π_t exp{ (y_it (z_it′ α_i + x_it′ β) − b(z_it′ α_i + x_it′ β)) / φ + S(y_it, φ) } .

  • Hence, the total log-likelihood is Σ_i ln l_i.

    • The constant Σ_it S(y_it, φ) is unimportant for determining MLEs.

    • Although evaluating, and maximizing, the likelihood requires numerical integration, it is easy to do on the computer.


Random effects and serial correlation

  • We saw in Chapter 3 that permitting subject-specific effects, α_i, to be random induced serial correlation in the responses y_it.

    • This is because the variance-covariance matrix of y_i is no longer diagonal.

  • This is also true for nonlinear GLM models. To see this,

    • let’s use a canonical link with z_it = 1, and

    • recall that E (y_it | α_i) = b′(θ_it) = b′(η_it) = b′(α_i + x_it′ β).


Covariance calculations

  • The covariance between two responses, y_i1 and y_i2, is

    Cov(y_i1, y_i2) = E y_i1 y_i2 − E y_i1 E y_i2

    = E { b′(α_i + x_i1′ β) b′(α_i + x_i2′ β) }

    − E b′(α_i + x_i1′ β) E b′(α_i + x_i2′ β) .

  • To see this, use the law of iterated expectations and the conditional independence of y_i1 and y_i2 given α_i:

    E y_i1 y_i2 = E E (y_i1 y_i2 | α_i)

    = E { E (y_i1 | α_i) E (y_i2 | α_i) }

    = E { b′(α_i + x_i1′ β) b′(α_i + x_i2′ β) } .


More covariance calculations

  • Normality

    • For the normal distribution we have b′(a) = a.

    • Thus, assuming E α_i = 0,

      Cov(y_i1, y_i2) = E {(α_i + x_i1′ β)(α_i + x_i2′ β)} − E (α_i + x_i1′ β) E (α_i + x_i2′ β)

      = E α_i² + (x_i1′ β)(x_i2′ β) − (x_i1′ β)(x_i2′ β) = Var α_i .

  • For the Poisson, we have b′(a) = e^a. Thus,

    E y_it = E b′(α_i + x_it′ β) = E exp(α_i + x_it′ β)

    = exp(x_it′ β) E exp(α_i) , and

    Cov(y_i1, y_i2)

    = E { exp(α_i + x_i1′ β) exp(α_i + x_i2′ β) } − exp((x_i1 + x_i2)′ β) { E exp(α_i) }²

    = exp((x_i1 + x_i2)′ β) { E exp(2α_i) − (E exp(α_i))² }

    = exp((x_i1 + x_i2)′ β) Var exp(α_i) .


Random effects likelihood

  • Recall, from earlier in this section, that the (unconditional) likelihood for the ith subject is

    l_i = ∫ Π_t exp{ (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } g(a) da , with θ_it = z_it′ a + x_it′ β.

  • Here, we use z_it = 1, φ = 1, and g(a) is the density of α_i.

  • For the Poisson, we have b(a) = e^a and S(y, φ) = −ln(y!), so the likelihood is

    l_i = ( Π_t 1/y_it! ) ∫ Π_t exp{ y_it (a + x_it′ β) − exp(a + x_it′ β) } g(a) da .

  • As before, evaluating and maximizing the likelihood requires numerical integration, yet it is easy to do on the computer.
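A minimal sketch of that numerical integration, assuming normally distributed random effects with mean zero and standard deviation σ, using Gauss-Hermite quadrature; the single subject's data shown are hypothetical:

  # Evaluate ln l_i for a random-intercept Poisson model by quadrature.
  import numpy as np
  from scipy.special import gammaln

  def log_li(beta, sigma, y_i, X_i, n_nodes=30):
      # nodes/weights for integrals of the form int f(z) exp(-z^2) dz
      nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
      a = np.sqrt(2.0) * sigma * nodes      # substitution a = sqrt(2) * sigma * z
      eta = X_i @ beta
      # conditional Poisson log-likelihood at each quadrature node
      ll = np.array([np.sum(y_i * (ak + eta) - np.exp(ak + eta) - gammaln(y_i + 1))
                     for ak in a])
      return np.log(np.sum(weights * np.exp(ll)) / np.sqrt(np.pi))

  X_i = np.array([[0.1], [0.4], [-0.2]])    # one subject, T = 3, K = 1
  y_i = np.array([1, 2, 0])
  print(log_li(np.array([0.5]), 0.3, y_i, X_i))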


10.5 Fixed effects models

  • Consider responses y_it, with mean μ_it, systematic component η_it = g(μ_it) = z_it′ α_i + x_it′ β and canonical link so that η_it = θ_it.

    • Assume the responses are independent.

  • Then, the log-likelihood is

    ln ℓ(α, β) = Σ_it { (y_it (z_it′ α_i + x_it′ β) − b(z_it′ α_i + x_it′ β)) / φ + S(y_it, φ) } .

  • Thus, the responses y_it depend on the parameters only through summary statistics.

    • That is, the statistics Σ_t y_it z_it are sufficient for α_i .

    • The statistics Σ_it y_it x_it are sufficient for β.

    • This is a convenient property of canonical links. It is not available for other choices of links.


MLEs - Canonical links

  • The log-likelihood is

    ln ℓ(α, β) = Σ_it { (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } , with θ_it = z_it′ α_i + x_it′ β.

  • Taking the partial derivative with respect to α_i yields:

    ∂ ln ℓ/∂α_i = Σ_t z_it (y_it − μ_it) / φ ,

    • because μ_it = b′(θ_it) = b′(z_it′ α_i + x_it′ β).

  • Taking the partial derivative with respect to β yields:

    ∂ ln ℓ/∂β = Σ_it x_it (y_it − μ_it) / φ .

  • Thus, we can solve for the MLEs of α_i and β through:

    0 = Σ_t z_it (y_it − μ_it), and 0 = Σ_it x_it (y_it − μ_it).

    • This is a special case of the method of moments.

    • This may produce inconsistent estimates of β, as we have seen in Chapter 9.


Conditional likelihood estimation

  • Assume the canonical link, so that θ_it = η_it = z_it′ α_i + x_it′ β .

  • Define the likelihood for a single observation to be

    ℓ(y_it; α_i, β) = exp{ (y_it θ_it − b(θ_it)) / φ + S(y_it, φ) } .

  • Let S_i be the random vector representing Σ_t z_it y_it and let sum_i be the realization of Σ_t z_it y_it .

    • Recall that Σ_t z_it y_it are sufficient for α_i.

  • The conditional likelihood of the data set is the product, over subjects, of the likelihood of the ith subject’s responses conditional on S_i = sum_i.

    • This likelihood does not depend on {α_i}, only on β.

    • Maximizing it with respect to β yields root-n consistent estimates.

  • The distribution of S_i can be messy and difficult to compute.


Poisson distribution

  • The Poisson is the most widely used distribution for count responses.

    • Examples include the number of migrants from state to state and the number of tort filings within a state.

  • A feature of the Poisson model is that the (conditional) mean equals the (conditional) variance.

  • To illustrate the application of Poisson panel data models, let’s use the canonical link and z_it = 1, so that

    ln E (y_it | α_i) = g( E (y_it | α_i) ) = θ_it = η_it = α_i + x_it′ β .

  • Through the log link, the mean is tied to a linear combination of explanatory variables. This is the basis of the so-called “log-linear” model.


Conditional likelihood estimation

  • We first examine the fixed effects model and thus assume that {α_i} are fixed parameters.

    • Thus, E y_it = exp(α_i + x_it′ β).

    • The distribution is

      Pr(y_it = y) = exp(−μ_it) μ_it^y / y! , with μ_it = exp(α_i + x_it′ β).

    • From Section 10.1, Σ_t y_it is a sufficient statistic for α_i.

  • The distribution of Σ_t y_it turns out to be Poisson, with mean exp(α_i) Σ_t exp(x_it′ β) .

  • Note that the ratio of means,

    μ_it / Σ_s μ_is = exp(x_it′ β) / Σ_s exp(x_is′ β) ,

    • does not depend on α_i.


Conditional likelihood details

  • Thus, as in Section 10.1, the conditional likelihood for the ith subject is

    Pr( y_i1, ..., y_iTi | Σ_t y_it ) = ( (Σ_t y_it)! / (Π_t y_it!) ) Π_t π_it^{y_it} ,


Conditional likelihood details

  • where

    π_it = exp(x_it′ β) / Σ_s exp(x_is′ β) .

  • This is a multinomial distribution.


Multinomial distribution

  • Thus, the joint distribution of y_i1, ..., y_iTi given Σ_t y_it has a multinomial distribution.

  • The conditional likelihood of the data set is:

    Π_i ( (Σ_t y_it)! / (Π_t y_it!) ) Π_t π_it^{y_it} .

  • Taking partial derivatives with respect to β yields:

    ∂/∂β Σ_it y_it ln π_it = Σ_it y_it ( x_it − Σ_s π_is x_is ) ,

    • where

    • π_it = exp(x_it′ β) / Σ_s exp(x_is′ β) .

  • Thus, the conditional MLE, b, is the solution of:

    0 = Σ_it x_it ( y_it − (Σ_s y_is) π_it ) .
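A sketch of the conditional MLE: maximize the multinomial conditional log-likelihood Σ_it y_it ln π_it, in which the α_i drop out; the data are simulated with true fixed effects to verify that β is still recovered:

  import numpy as np
  from scipy.optimize import minimize
  from scipy.special import logsumexp

  rng = np.random.default_rng(6)
  n_subj, T, K = 200, 4, 2
  X = rng.normal(size=(n_subj, T, K))
  alpha = rng.normal(scale=0.5, size=n_subj)       # fixed effects
  beta_true = np.array([0.6, -0.3])
  y = rng.poisson(np.exp(alpha[:, None] + X @ beta_true))

  def neg_cll(beta):
      eta = X @ beta                               # shape (n_subj, T)
      log_pi = eta - logsumexp(eta, axis=1, keepdims=True)
      return -np.sum(y * log_pi)                   # alpha_i has dropped out

  res = minimize(neg_cll, np.zeros(K), method="BFGS")
  print(res.x)   # close to beta_true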

