
Logistic regression


Presentation Transcript


  1. Logistic regression

  2. Recall the simple linear regression model: y = b0 + b1x + e, where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to the multiple linear regression model: y = b0 + b1x1 + b2x2 + … + bpxp + e. Here we are trying to predict a continuous dependent variable y from several continuous independent variables x1, x2, … , xp.

  3. Now suppose the dependent variable y is binary: it takes on two values, “Success” (1) or “Failure” (0). We are interested in predicting y from a continuous independent variable x. This is the situation in which logistic regression is used.

  4. Example: We are interested in how the success (y) of a new antibiotic cream in curing “acne problems” depends on the amount (x) that is applied daily. The values of y are 1 (Success) or 0 (Failure). The values of x range over a continuum.

  5. The logistic regression model: Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. The ratio p/(1 − p) is called the odds ratio. This quantity will also increase with the value of x, ranging from zero to infinity. The quantity ln[p/(1 − p)] is called the log odds ratio.

  6. Example: odds ratio, log odds ratio. Suppose a die is rolled and Success = “roll a six”, so p = 1/6. The odds ratio is p/(1 − p) = (1/6)/(5/6) = 1/5 = 0.2. The log odds ratio is ln(0.2) ≈ −1.61.
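
As a quick numerical check of the die example (an illustration added here, not part of the original slides), the odds ratio and log odds ratio can be computed directly:

```python
import math

p = 1 / 6                  # P[Success] = P[roll a six]
odds = p / (1 - p)         # odds ratio: (1/6)/(5/6) = 0.2
log_odds = math.log(odds)  # log odds ratio: ln(0.2) ≈ -1.61

print(f"odds ratio = {odds:.4f}, log odds ratio = {log_odds:.4f}")
```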

  7. The logistic regression model assumes the log odds ratio is linearly related to x, i.e.: ln[p/(1 − p)] = b0 + b1x. In terms of the odds ratio: p/(1 − p) = e^(b0 + b1x).

  8. The logistic regression model: Solving for p in terms of x gives p = e^(b0 + b1x) / (1 + e^(b0 + b1x)), or equivalently p = 1 / (1 + e^−(b0 + b1x)).
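
The relationship between p and x can be written as a small helper function. This is a sketch; the parameter values b0 = −4 and b1 = 2 below are arbitrary choices for illustration and do not come from the slides:

```python
import math

def logistic_p(x, b0, b1):
    """Probability of success p = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Illustrative parameter values (not estimated from any data in the slides)
b0, b1 = -4.0, 2.0
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {x:.1f}  ->  p = {logistic_p(x, b0, b1):.3f}")
```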

  9. Interpretation of the parameter b0 (determines the intercept). [Figure: plots of p versus x.]

  10. Interpretation of the parameter b1 (along with b0, determines the value of x at which p is 0.50): p = 0.50 when x = −b0/b1. [Figure: plot of p versus x.]

  11. Also, b1/4 is the rate of increase in p with respect to x when p = 0.50 (since dp/dx = b1 p(1 − p), which equals b1/4 at p = 0.50).

  12. Interpretation of the parameter b1 (determines the slope when p is 0.50). [Figure: plots of p versus x.]
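
A small numerical sketch (using the same illustrative b0 and b1 as above, which are assumptions rather than values from the slides) confirms the two interpretations of b1: p = 0.50 at x = −b0/b1, and the slope of p with respect to x there equals b1/4:

```python
import math

def logistic_p(x, b0, b1):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0, b1 = -4.0, 2.0                 # illustrative values only
x_half = -b0 / b1                  # x at which p = 0.50

# Numerical derivative of p at x_half via a small central difference
h = 1e-6
slope = (logistic_p(x_half + h, b0, b1) - logistic_p(x_half - h, b0, b1)) / (2 * h)

print(f"p at x = {x_half}: {logistic_p(x_half, b0, b1):.3f}")   # 0.500
print(f"numerical slope:   {slope:.4f}")                        # ≈ b1/4
print(f"b1 / 4:            {b1 / 4:.4f}")
```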

  13. The data: For each case the data consist of • a value for x, the continuous independent variable, and • a value for y (1 or 0, i.e. Success or Failure). There is a total of n = 250 cases.

  14. Estimation of the parameters: The parameters are estimated by maximum likelihood estimation, which requires a statistical package such as SPSS.
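
The slides carry out the fit in SPSS. As an alternative sketch (an assumption, not the procedure used in the slides), the same maximum likelihood fit can be obtained in Python with the statsmodels package; the data below are simulated, since the original data file is not shown:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data standing in for the (unavailable) SPSS data file:
# n = 250 cases, x continuous, y generated from a logistic model.
n = 250
x = rng.uniform(0, 5, size=n)
true_b0, true_b1 = -3.0, 1.2                  # hypothetical "true" parameters
p = 1 / (1 + np.exp(-(true_b0 + true_b1 * x)))
y = rng.binomial(1, p)

# Maximum likelihood estimation of b0 and b1
X = sm.add_constant(x)                        # adds the intercept column
result = sm.Logit(y, X).fit(disp=False)

print(result.params)                          # estimates of b0 and b1
print(result.bse)                             # their standard errors (S.E.)
```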

  15. Using SPSS to perform logistic regression: Open the data file.

  16. Choose from the menu: Analyze -> Regression -> Binary Logistic

  17. The following dialogue box appears. Select the dependent variable (y) and the independent variable (x) (covariate), then press OK.

  18. Here is the output: the estimates and their standard errors (S.E.).

  19. The parameter Estimates

  20. Interpretation of the parameter b0 (determines the intercept); interpretation of the parameter b1 (along with b0, determines the value of x at which p is 0.50).

  21. Another interpretation of the parameter b1: b1/4 is the rate of increase in p with respect to x when p = 0.50.

  22. Nonparametric Statistical Methods

  23. Definition: When the data are generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a non-parametric model. Statistical techniques that assume a non-parametric model are called non-parametric.

  24. Example – Parametric model: the Normal distribution – known except for the two parameters μ and σ. [Figure: normal density curve with mean μ and standard deviation σ.]

  25. Example – Non-parametric model: No assumptions are made about the distribution; it could be • normal, • skewed, • bimodal, etc.

  26. The sign test: a nonparametric test for the central location of a distribution.

  27. We want to test H0: median = m0 against HA: median ≠ m0 (or against a one-sided alternative).

  28. The sign test: The test statistic is S = the number of observations that exceed m0. Comment: If H0: median = m0 is true, we would expect 50% of the observations to be above m0 and 50% of the observations to be below m0.

  29. If H0 is true then S will have a binomial distribution with p = 0.50 and n = sample size. [Figure: 50% of the distribution on each side of median = m0.]
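
A minimal sketch of the sign test in Python (an illustration added here, not part of the slides), using scipy's exact binomial test; the data values and m0 are made up for the example:

```python
from scipy.stats import binomtest

# Hypothetical sample and hypothesized median m0
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 6.3, 5.8, 4.4, 5.2]
m0 = 4.0

# S = number of observations that exceed m0 (observations equal to m0 are dropped)
S = sum(1 for xi in data if xi > m0)
n = sum(1 for xi in data if xi != m0)

# Under H0: median = m0, S ~ Binomial(n, 0.5)
result = binomtest(S, n, p=0.5, alternative="two-sided")
print(f"S = {S} out of n = {n}, p-value = {result.pvalue:.4f}")
```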

  30. If H0 is not true then S will still have a binomial distribution; however, p will not be equal to 0.50. If m0 > median, then p < 0.50. [Figure: distribution with m0 to the right of the median.]

  31. If m0 < median, then p > 0.50, where p = the probability that an observation is greater than m0. [Figure: distribution with m0 to the left of the median.]

  32. Summarizing: If H0 is true then S will have a binomial distribution with p = 0.50 and n = sample size. [Table: binomial probabilities for n = 10.]

  33. The critical and acceptance region (n = 10): Choose the critical region so that α is close to 0.05 or 0.01. E.g., if the critical region is {0, 1, 9, 10} then α = .0010 + .0098 + .0098 + .0010 = .0216.

  34. E.g., if the critical region is {0, 1, 2, 8, 9, 10} then α = .0010 + .0098 + .0439 + .0439 + .0098 + .0010 = .1094 (n = 10).
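
The two α values above can be reproduced from binomial probabilities with n = 10, p = 0.5 (a check added here, not part of the original slides):

```python
from scipy.stats import binom

n, p = 10, 0.5

def alpha(critical_region):
    """Type I error rate: probability that S falls in the critical region under H0."""
    return sum(binom.pmf(k, n, p) for k in critical_region)

# Exact values; the slides' .0216 and .1094 come from summing rounded terms
print(f"alpha for {{0,1,9,10}}:     {alpha([0, 1, 9, 10]):.4f}")        # ≈ 0.0215
print(f"alpha for {{0,1,2,8,9,10}}: {alpha([0, 1, 2, 8, 9, 10]):.4f}")  # ≈ 0.1094
```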

  35. If n is large we can use the normal approximation to the binomial. Namely, S has a binomial distribution with p = ½ and n = sample size. Hence for large n, S has approximately a normal distribution with mean n/2 and standard deviation sqrt(n)/2.

  36. Hence for large n, use z = (S − n/2) / (sqrt(n)/2) as the test statistic (in place of S). Choose the critical region for z from the standard normal distribution, i.e. reject H0 if z < −zα/2 or z > zα/2 (two-tailed; a one-tailed test can also be set up).
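
A sketch of this large-sample version of the test (the observed counts below are hypothetical, chosen only to illustrate the calculation):

```python
import math
from scipy.stats import norm

# Suppose S observations out of n exceed m0 (hypothetical numbers)
S, n = 68, 100
alpha = 0.05

z = (S - n / 2) / (math.sqrt(n) / 2)     # standardized test statistic
z_crit = norm.ppf(1 - alpha / 2)         # z_{alpha/2} for a two-tailed test

print(f"z = {z:.3f}, critical value = ±{z_crit:.3f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
# Two-tailed p-value from the normal approximation:
print(f"approximate p-value = {2 * (1 - norm.cdf(abs(z))):.4f}")
```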

  37. Nonparametric Confidence Intervals

  38. Assume that the data x1, x2, x3, … xn are a sample from an unknown distribution. Now arrange the data in increasing order: x(1) < x(2) < x(3) < … < x(n). Hence x(1) = the smallest observation, x(2) = the 2nd smallest observation, …, x(n) = the largest observation.

  39. Consider the kth smallest observation x(k) and the kth largest observation x(n – k + 1) in the data x1, x2, x3, … xn. If at least k observations lie below the median then x(k) < median; if at least k observations lie above the median then median < x(n – k + 1). Hence P[x(k) < median < x(n – k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median].

  40. Thus P[x(k) < median < x(n – k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median] = P[the number of observations below the median is at least k and at most n – k] = P[k ≤ S ≤ n – k], where S = the number of observations below the median. S has a binomial distribution with n = the sample size and p = 1/2.

  41. Hence P[x(k) < median < x(n – k + 1)] = P[k ≤ S ≤ n – k] = p(k) + p(k + 1) + … + p(n – k) = P, where the p(i)’s are binomial probabilities with n = the sample size and p = 1/2. This means that x(k) to x(n – k + 1) is a 100P% confidence interval for the median.

  42. Summarizing: x(k) to x(n – k + 1) is a 100P% confidence interval for the median, where P = p(k) + p(k + 1) + … + p(n – k) and the p(i)’s are binomial probabilities with n = the sample size and p = 1/2.

  43. Example (n = 10 and k = 2): Binomial probabilities give P = p(2) + p(3) + p(4) + p(5) + p(6) + p(7) + p(8) = .9784. Hence x(2) to x(9) is a 97.84% confidence interval for the median.
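
A small helper (a sketch added here, not from the slides) that reproduces the confidence level of the interval x(k) to x(n – k + 1) for any n and k:

```python
from scipy.stats import binom

def median_ci_level(n, k):
    """Confidence level of the interval (x(k), x(n-k+1)) for the median:
    P[k <= S <= n-k], where S ~ Binomial(n, 1/2)."""
    return sum(binom.pmf(i, n, 0.5) for i in range(k, n - k + 1))

# The slides' example: n = 10, k = 2
# Exact value ≈ 97.85%; the slides' 97.84% comes from summing rounded terms.
print(f"n = 10, k = 2: {median_ci_level(10, 2):.4%}")
```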

  44. Example: Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

  45. The data

  46. The data arranged in order: x(2) = −3 to x(9) = 15 is a 97.84% confidence interval for the median.

  47. Example: We now repeat the previous study with n = 20 patients with high cholesterol.

  48. The data

  49. The binomial distribution with n = 20, p = 0.5. Note: p(6) + p(7) + p(8) + p(9) + p(10) + p(11) + p(12) + p(13) + p(14) = 0.037 + 0.0739 + 0.1201 + 0.1602 + 0.1762 + 0.1602 + 0.1201 + 0.0739 + 0.037 = 0.9586. Hence x(6) to x(15) is a 95.86% confidence interval for the median reduction in cholesterol.
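
The same helper confirms the n = 20 interval (a check added here, not part of the original slides):

```python
from scipy.stats import binom

def median_ci_level(n, k):
    """P[k <= S <= n-k], where S ~ Binomial(n, 1/2)."""
    return sum(binom.pmf(i, n, 0.5) for i in range(k, n - k + 1))

# n = 20, k = 6: the interval x(6) to x(15)
print(f"confidence level: {median_ci_level(20, 6):.4%}")   # ≈ 95.86%
```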
