
Lecture: Generalized Linear Regression Models - PowerPoint PPT Presentation



Generalized Linear Regression Models

Antonia Rom

Chapter 4 - Modeling of Binary Data


What is important in modeling?

Problems, Obstacles

4.1 Maximum Likelihood Estimation

What is the ML-estimation?

Single Binary Response

Grouped Data

Asymptotic Properties

Existence of ML-Estimates

Estimation conditioned on predictor values


A generalized linear model consists of:

- a probability distribution from the exponential family

- a linear predictor η = Xβ

- a link function g, or equivalently a response function h = g^{-1} such that E(Y) = h(η)

Binary regression model: π(x) = P(y = 1 | x) = h(x'β)

h is a fully specified function.

In this chapter the logit model is used. In this case h is the logistic distribution function.

Linear predictor: η = x'β

Response function: h(η) = exp(η) / (1 + exp(η))

The link function is the inverse of the response function, g = h^{-1}.

It determines the functional form of the response probabilities.
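A minimal sketch of the logit model's response and link functions, to make the inverse relationship concrete. The function names `response` and `link` are chosen here for illustration and do not come from the slides.

```python
import math

def response(eta):
    """Logistic response function h: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def link(pi):
    """Logit link g = h^{-1}: probability -> linear predictor."""
    return math.log(pi / (1.0 - pi))

# Applying the link to the response recovers the linear predictor.
for eta in (-2.0, 0.0, 1.5):
    pi = response(eta)
    print(eta, pi, link(pi))
```

Whatever value the linear predictor takes, the response function maps it into the interval (0, 1), which is why it can model a probability.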

The linear predictor determines which variables are included and in what form they determine the response.

The unknown parameters β can be estimated by maximum likelihood.

Maximum likelihood estimates are computed with an iterative algorithm.

-> Linear predictors can contain polynomial terms of continuous variables, dummy variables and interaction effects.

Care should be taken when specifying the constituents of the model, such as the linear predictor!
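As an illustration of such a linear predictor, the sketch below builds one row of a design matrix with a polynomial term, a dummy variable and an interaction. The covariates `age` and `group` and the coefficient values are hypothetical and serve only to show the mechanics.

```python
def design_row(age, group):
    """One row of X: intercept, age, age^2, dummy for group 'B', age x dummy."""
    dummy = 1.0 if group == "B" else 0.0
    return [1.0, age, age ** 2, dummy, age * dummy]

def linear_predictor(row, beta):
    """eta = x'beta for a single observation."""
    return sum(x * b for x, b in zip(row, beta))

# Hypothetical coefficients, one per column of the design row.
beta = [-1.0, 0.05, -0.001, 0.8, 0.02]

row = design_row(30.0, "B")
eta = linear_predictor(row, beta)
print(row, eta)
```

The model stays linear in β even though it is quadratic in `age`, which is why the same estimation procedure covers all of these terms.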


Discrepancy between data and model. Does the fit of the model support the inferences drawn from the model?

Relevance of variables and form of the linear predictor. Which variables should be included and how?

Explanatory power of the covariates

Prognostic power of the model

Choice of link function. Which link function fits the data well and has a simple interpretation?

These aspects are not independent:

The model should provide an appropriate approximation with a simple predictor; its specification determines the goodness of fit.

The choice of linear predictor aims at finding an adequate form of the covariates and reducing the variable set; the explanatory value aims at quantifying the effect of the covariates within the model.

First topic in estimation: maximum likelihood estimation.

Maximum-Likelihood Estimation

The basic principle is to construct the likelihood of the unknown parameters for the sample data: which parameter values (e.g. mean, variance) make the observed sample most likely? The type of distribution has to be known!

The likelihood represents the joint probability or probability density of the observed data, considered as a function of the unknown parameters.

What does this mean in practice?


Two MP3 players, exactly the same, with shuffle mode only and no display.

One holds 5 songs, the other 20 songs.

Each MP3 player contains your favorite song.

Unfortunately you have mixed them up. So you take one, turn it on, and your favorite song is played.

If you had to bet, which one would you choose? The one with 5 songs!
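The bet can be checked with a few lines of arithmetic. This sketch assumes that shuffle mode picks the first song uniformly at random, which the slide does not state explicitly.

```python
from fractions import Fraction

# Likelihood of each candidate player:
# P(favorite song is played first | player holds n_songs songs).
likelihood = {n_songs: Fraction(1, n_songs) for n_songs in (5, 20)}

# The maximum likelihood choice is the candidate with the largest likelihood.
best = max(likelihood, key=likelihood.get)

print(likelihood, best, sum(likelihood.values()))
```

The two likelihoods add up to 1/4, not 1, because they are probabilities of the same event under different hypotheses rather than a distribution over the hypotheses.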

Maximum-Likelihood Estimation

An event A has happened. One tries to draw inferences about an underlying quantity B (e.g. a particular parameter).

To do so, one looks at the conditional probability of A for every possible estimate ˆbi of B, assuming ˆbi is true.

The value of ˆbi for which P(A|ˆbi) is largest is the best estimate of B.

The conditional probability P(A|ˆbi) is evaluated for the given, already observed event A.

P(A|ˆbi) is also called L(ˆbi), the likelihood of ˆbi.

The ML estimator is the value for which the likelihood is a maximum.

-> hence the name maximum likelihood.

Maximum-Likelihood Estimation

If there are too many candidates ˆbi, it is not possible to calculate every L(ˆbi). In this case a likelihood function L(ˆb) is built and its maximum is found by solving L’(ˆb) = 0.
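A small sketch of the two routes, using a hypothetical data set of 7 successes in 10 independent trials (not taken from the slides). The brute-force search over a grid of candidates and the solution of the derivative condition give the same estimate.

```python
def likelihood(p, k, n):
    """Likelihood of success probability p after k successes in n independent trials."""
    return p ** k * (1.0 - p) ** (n - k)

k, n = 7, 10  # hypothetical data: 7 successes in 10 trials

# Brute force: evaluate the likelihood for every candidate on a grid.
candidates = [i / 1000 for i in range(1, 1000)]
p_grid = max(candidates, key=lambda p: likelihood(p, k, n))

# Calculus: d/dp [k log p + (n - k) log(1 - p)] = 0 gives p = k / n.
p_closed = k / n

print(p_grid, p_closed)
```

The grid only works because there is a single parameter here; with several parameters the number of candidates explodes, and solving the derivative condition becomes the practical route.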

Fact (1)

The likelihood L(x) is not the probability that x occurs, but the conditional probability of the already observed event z, given that x holds.

L(x) = P(z | x)

Fact (2)

The likelihoods of the different candidate values do not have to sum to 1.

MLE – Definition: Single binary response

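The likelihood for the responses y_1, ..., y_n, with π(x_i) = P(y_i = 1 | x_i), is given by

L(β) = ∏_i π(x_i)^{y_i} (1 − π(x_i))^{1 − y_i}

Observations are considered independent. The maximum likelihood estimate β^ is the value of β that maximizes the likelihood.

Values of L can get very small, so the log-likelihood is used instead:

l(β) = Σ_i [ y_i log π(x_i) + (1 − y_i) log(1 − π(x_i)) ]

The value β^ can be obtained by solving the system of equations

s(β) = ∂l(β)/∂β = 0

The derivatives of the log-likelihood form the so-called score function s(β).

(iterative solving)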
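The iterative solution can be sketched as a Newton-Raphson loop for a logit model with an intercept and one covariate. The data set, the starting value β = 0 and the fixed number of steps are assumptions made for this illustration; the slides do not specify which algorithm or stopping rule is used.

```python
import math

def pi(eta):
    """Logistic response function."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_lik(b0, b1, xs, ys):
    """Log-likelihood l(beta) for independent binary responses."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = pi(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def score(b0, b1, xs, ys):
    """Score function s(beta): gradient of the log-likelihood (logit model)."""
    resid = [y - pi(b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(resid), sum(x * r for x, r in zip(xs, resid))

def fit_logit(xs, ys, steps=25):
    """Solve s(beta) = 0 by Newton-Raphson, starting from beta = 0."""
    b0 = b1 = 0.0
    for _ in range(steps):
        s0, s1 = score(b0, b1, xs, ys)
        w = [pi(b0 + b1 * x) * (1.0 - pi(b0 + b1 * x)) for x in xs]
        # Negative Hessian [[a, b], [b, c]], inverted by hand.
        a = sum(w)
        b = sum(x * wi for x, wi in zip(xs, w))
        c = sum(x * x * wi for x, wi in zip(xs, w))
        det = a * c - b * b
        b0 += (c * s0 - b * s1) / det
        b1 += (a * s1 - b * s0) / det
    return b0, b1

# Hypothetical data with overlap between the 0 and 1 responses.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(xs, ys)
print(b0_hat, b1_hat, score(b0_hat, b1_hat, xs, ys))
```

At the returned estimate both components of the score are numerically zero, which is the system of equations the slide refers to.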

MLE – Definition: Grouped data – binomially distributed responses

Several independent binary responses

The probability π is assumed to depend on x only, so the mean is the same for all binary observations at this value.

The model has the form:

For the collection of binary variables the likelihood has the form

The likelihood for the number of successes, L_bin(β), and the likelihood of the binary observations, L(β), differ only in the binomial factor, which is irrelevant for maximization because it does not depend on β. Therefore the log-likelihood is:

The score function of the logit model is:
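The following sketch checks numerically that the binomial and the binary log-likelihood differ only by a constant. The grouped data (n_j trials with k_j successes at covariate value x_j) and the two parameter values are hypothetical. For the logit model the score of the grouped data can then be written as s(β) = Σ_j x_j (k_j − n_j π(x_j)); this notation is an assumption here, since the slide's own formula is not preserved.

```python
import math

def pi(eta):
    """Logistic response function."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical grouped data: at x_j there are n_j trials with k_j successes.
xs = [0.0, 1.0, 2.0]
ns = [4, 5, 3]
ks = [1, 3, 2]

def loglik_binary(b0, b1):
    """Log-likelihood of the individual 0/1 responses behind the groups."""
    total = 0.0
    for x, n, k in zip(xs, ns, ks):
        p = pi(b0 + b1 * x)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def loglik_binomial(b0, b1):
    """Log-likelihood of the success counts, including the binomial factor."""
    total = 0.0
    for x, n, k in zip(xs, ns, ks):
        p = pi(b0 + b1 * x)
        total += math.log(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k))
    return total

# The difference is the same for any beta: the sum of log binomial coefficients.
const = sum(math.log(math.comb(n, k)) for n, k in zip(ns, ks))
diff_a = loglik_binomial(-1.0, 0.5) - loglik_binary(-1.0, 0.5)
diff_b = loglik_binomial(0.3, -0.2) - loglik_binary(0.3, -0.2)
print(const, diff_a, diff_b)
```

Because the difference does not move with β, both log-likelihoods are maximized by the same estimate.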

MLE – Definition: Asymptotic Properties

The MLE has some favorable properties.

The ML estimator exists and is unique asymptotically.

It is consistent and asymptotically normally distributed.

It is asymptotically efficient.


The likelihood is a smooth, well-behaved function, and its maximum is attained at a unique point.

As the two functions Ln and L get closer, the points at which they attain their maxima should also get closer, which is exactly what consistency means: the estimator converges to the true parameter.

MLE – Definition: Asymptotic Properties

Asymptotic normality:

The estimator not only converges to the unknown parameter, it also converges fast enough, at the rate 1/√n.

In ML theory the asymptotic variance of the estimator is determined by the information matrix, also called the Fisher matrix.

For binary data

For grouped data
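As a sketch of what these two cases look like for the logit model, the code below uses the standard forms F(β) = Σ_i x_i x_i' π_i (1 − π_i) for binary data and F(β) = Σ_j n_j x_j x_j' π_j (1 − π_j) for grouped data. These formulas are supplied here as an assumption, since the slide's own expressions are not preserved. The data and the value of β are hypothetical.

```python
import math

def pi(eta):
    """Logistic response function."""
    return 1.0 / (1.0 + math.exp(-eta))

def fisher(beta, xs, ns):
    """Fisher matrix of a logit model with intercept; ns holds the group sizes."""
    f = [[0.0, 0.0], [0.0, 0.0]]
    for x, n in zip(xs, ns):
        p = pi(beta[0] + beta[1] * x)
        row = (1.0, x)
        for r in range(2):
            for c in range(2):
                f[r][c] += n * row[r] * row[c] * p * (1.0 - p)
    return f

beta = (-0.5, 0.8)  # hypothetical parameter value

# The same data, once as single binary observations, once grouped by x.
f_binary = fisher(beta, [0.0, 0.0, 0.0, 1.0, 1.0, 2.0], [1, 1, 1, 1, 1, 1])
f_grouped = fisher(beta, [0.0, 1.0, 2.0], [3, 2, 1])

# Asymptotic covariance = inverse Fisher matrix (2 x 2 inverse by hand).
det = f_grouped[0][0] * f_grouped[1][1] - f_grouped[0][1] ** 2
std_errors = (math.sqrt(f_grouped[1][1] / det), math.sqrt(f_grouped[0][0] / det))
print(f_binary, f_grouped, std_errors)
```

Grouping changes only the bookkeeping: both representations of the same data carry the same information about β.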

MLE – Definition: Existence of Maximum Likelihood Estimates

For a finite sample size it may happen that ML estimates do not exist.

ML estimates do not exist when the data set shows complete separation.

ML estimates may not exist when the data set shows quasi-complete separation.

ML estimates do exist when the data set shows overlap.

ML estimates do exist when you have a data set with linear dependency.

ML estimates exist if there is no hyperplane that separates the 0 and the 1 responses.
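Why complete separation rules out a finite estimate can be seen in a tiny example. The data set and the intercept-free model η = βx are assumptions chosen to keep the sketch short: the log-likelihood keeps increasing as β grows, so no finite β attains the maximum.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

# Completely separated data: every y = 0 has x < 0, every y = 1 has x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def log_lik(beta):
    """Log-likelihood of a logit model without intercept, eta = beta * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        # log(pi) for y = 1, log(1 - pi) = log_sigmoid(-eta) for y = 0
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total

values = [log_lik(b) for b in (1.0, 10.0, 100.0)]
print(values)
```

The values approach 0 from below without ever reaching it, so an iterative algorithm would drive β towards infinity instead of converging.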

MLE – Definition: Estimation conditioned on Predictor Values

Sometimes samples are drawn conditional on the response y.

In such stratified samples one observes x values given y = 1 and x values given y = 0.

A common case is case-control studies in biomedicine, where y = 1 denotes cases and y = 0 denotes controls

(choice-based sampling in econometrics).

Let us consider the simplest case of a binary predictor, with y ∈ {0,1} and x ∈ {0,1}.


The odds ratio captures the association between y and x.

The parameter of association corresponds to the coefficient β of the original logit model: the log odds ratio equals β.
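A sketch with hypothetical counts shows why sampling conditional on y leaves the association parameter intact. The table layout `table[y][x]` and the sampling fractions (all cases, one control in ten) are assumptions made for this illustration.

```python
from fractions import Fraction
import math

def odds_ratio(table):
    """table[y][x] holds counts; odds ratio of y = 1 for x = 1 versus x = 0."""
    return Fraction(table[1][1] * table[0][0], table[1][0] * table[0][1])

# Hypothetical population counts.
population = {1: {1: 30, 0: 10}, 0: {1: 200, 0: 400}}

# Case-control sample: keep every case, but only one control in ten.
case_control = {1: {1: 30, 0: 10}, 0: {1: 20, 0: 40}}

or_pop = odds_ratio(population)
or_cc = odds_ratio(case_control)
beta = math.log(or_pop)  # slope of the logit model with binary x

# The odds at x = 0 (the intercept on the logit scale) do change.
odds0_pop = Fraction(population[1][0], population[0][0])
odds0_cc = Fraction(case_control[1][0], case_control[0][0])
print(or_pop, or_cc, beta, odds0_pop, odds0_cc)
```

The sampling fractions cancel in the odds ratio, so the slope is recovered from the case-control sample, while the baseline odds are distorted by the sampling design.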

MLE – Definition: Estimation conditioned on Predictor Values

This approach is motivated by the specific structure of the logit model.

We go back to Section 2.2.2, Derivation of the binary logit model, where the predictors are assumed to be normally distributed.

f(x|r) denoting the density of x given y = r

p(r) = P(y = r) denoting the marginal probability

From Bayes' theorem it follows:

Therefore or


This shows that a logit model holds if the log-ratio of the densities of x given y = 1 and given y = 0 has a linear form in x,

and only the intercept depends on the marginal probabilities.

The important point is that the marginals determine only the intercept!
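The argument can be written out as a short derivation. The notation is an assumption of this sketch: f(x|r) is the density of x given y = r, p(r) = P(y = r) is the marginal probability, and the normal densities are taken to share a common covariance matrix Σ with means μ_0 and μ_1.

```latex
% Bayes' theorem for the response probability
P(y = 1 \mid x) = \frac{f(x \mid 1)\, p(1)}{f(x \mid 1)\, p(1) + f(x \mid 0)\, p(0)}

% Log-odds: density ratio plus a term containing only the marginals
\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}
  = \log \frac{f(x \mid 1)}{f(x \mid 0)} + \log \frac{p(1)}{p(0)}

% Normal predictors with common covariance: the density ratio is linear in x
\log \frac{f(x \mid 1)}{f(x \mid 0)}
  = x^{\top} \Sigma^{-1} (\mu_1 - \mu_0)
    - \tfrac{1}{2} (\mu_1 + \mu_0)^{\top} \Sigma^{-1} (\mu_1 - \mu_0)
```

The marginal probabilities enter only through the additive term log(p(1)/p(0)), so changing them by sampling on y shifts the intercept and leaves the coefficients of x untouched.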

MLE – Definition: Estimation conditioned on Predictor Values

The likelihood given y differs from the likelihood given the predictors.

The log-likelihood conditional on y decomposes into three parts:

- Marginal distribution of y (fixed by the sampling)

- Equivalent to the conditional log-likelihood

- Marginal distribution of x (can be maximized by the empirical distribution)


general binary model:

link function and linear predictor

Care should be taken when estimating these constituents!

Maximum-Likelihood Estimation

Basic principle is to construct the likelihood of the unknown parameters for the sample data!

MLE can cope with difficult and complicated linear predictors (interactions, dummy variables etc.)

iterative algorithm

Properties of MLE

It is consistent and asymptotically normally distributed.

It is asymptotically efficient. (Fisher matrix)

Maximum-Likelihood estimators might not exist. They do exist when the data set has overlap or linear dependency.

Depending on the data set, ML can also be conditional on the response y.

Note the subtle difference:

we were interested in the probability function because it gives us the probabilities of the realizations for given parameters θ.

With the likelihood function we take the sample as given and are interested in the unknown parameter θ that makes the realization of the given sample 'most probable'!

MLE - Example


  • You can now calculate the probability of Bryant scoring the amounts he actually scored.

  • Basic principle of MLE!!!

  • to construct the likelihood of the unknown parameters for the sample data

  • Let f(ε) denote the density function for ε. (Recall that the density function is like a probability function, and that the density for a normal variable is a bell curve with its maximum at ε=0.)

  • Given the prediction M and the density function, you can compute the probability of Bryant scoring any particular point total Y. This is given by the formula f(Y-M) = f(ε).

  • - For example, if you believe that M=32, then the probability that Bryant scores 35 is given by f(35-32) = f(3).

  • If σ=6, for example, then examination of the normal table reveals f(3) = 08

  • Assume that Bryant’s scoring in one game is independent of what he scored in the prior game.

  • - Recall that the probability of two independent events occurring is just the product of the probabilities that each occurs.

  • - It follows that the probability, or likelihood, of Bryant scoring exactly 33, 22, 25, 40, and 30 points is just the product of the probabilities of his getting each of these scores.

  • Given any prediction M, you can write the likelihood score as:

  • Likelihood score = L = f(33-M) · f(22-M) · f(25-M) · f(40-M) · f(30-M).

MLE - Example

  • You want to find the “maximum likelihood estimator” (MLE) of M!

  • This is the value of M that maximizes L

  • - Intuitively, you know that the MLE of M would not be 15 or 50 or some number far from his typical scoring output. It is almost impossible that a player who is predicted to score 15 points per game would actually score 33, 22, 25, 40, and 30.

  • In fact, if M = 15 and σ= 6, then

  • L= f(33-15) · f(22-15) · f(25-15) · f(40-15) · f(30-15) = f(18) · f(7) · f(10) · f(25) · f(15) < .0000001

  • But 32 might be a good candidate to be the MLE. Someone predicted to score 32 points per game has a reasonable chance of scoring 33, 22, 25, 40, and 30.

  • - In this case, L= f(1) · f(-10) · f(-7) · f(8) · f(-2) ≈.00005

  • - It turns out that the ML estimate of M is given by the mean of the realized values of Y.

  • That is, M = 30 and L= .00014
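The comparison of candidate predictions can be reproduced with a short sketch. It assumes that f is the density of a normal distribution with mean 0 and σ = 6; the absolute likelihood values depend on how f is scaled and need not match the figures quoted above, but the ranking of the candidates and the location of the maximum do not.

```python
import math

def f(eps, sigma=6.0):
    """Normal density with mean 0 and standard deviation sigma (assumed form of f)."""
    return math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

scores = [33, 22, 25, 40, 30]

def likelihood(m):
    """L(M) = product of f(Y - M) over the observed scores."""
    result = 1.0
    for y in scores:
        result *= f(y - m)
    return result

for m in (15, 32, 30):
    print(m, likelihood(m))

# Grid search over candidate predictions M = 0.0, 0.1, ..., 60.0
grid = [i / 10 for i in range(601)]
m_hat = max(grid, key=likelihood)
print(m_hat, sum(scores) / len(scores))
```

The grid search lands on the sample mean, in line with the statement that the ML estimate of M is the mean of the realized scores.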