
Regression



Presentation Transcript


  1. Regression Jennifer Kensler

  2. Laboratory for Interdisciplinary Statistical Analysis (LISA) LISA helps VT researchers benefit from the use of Statistics: Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...) Walk-In Consulting: Monday-Friday, 12-2 PM, for questions requiring < 30 mins. Collaboration: From our website, request a meeting for personalized statistical advice. Great advice right now: meet with LISA before collecting your data. Short Courses: Designed to help graduate students apply statistics in their research. All services are FREE for VT researchers. We assist with research, not class projects or homework. www.lisa.stat.vt.edu

  3. Topics • Simple Linear Regression • Multiple Linear Regression • Regression with Categorical Variables

  4. Types of Statistical Analyses

  5. Simple Linear Regression

  6. Simple Linear Regression • Simple Linear Regression (SLR) is used to model the relationship between two continuous variables. • Scatterplots are used to graphically examine the relationship between two quantitative variables. Sullivan (pg. 193)

  7. Types of Relationships Between Two Continuous Variables • Positive and negative linear relationships

  8. Types of Relationships Between Two Continuous Variables • Curved Relationship • No Relationship

  9. Correlation • The Pearson Correlation Coefficient measures the strength of a linear relationship between two quantitative variables. The sample correlation coefficient is
      $$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
      where $\bar{x}$ and $\bar{y}$ are the sample means of the x and y variables, respectively, and $s_x$ and $s_y$ are the sample standard deviations of the x and y variables, respectively.
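The formula can be checked numerically. The following minimal Python sketch uses only the standard library. It evaluates the sample correlation exactly as written above, using sample standard deviations with divisor n - 1. The function name `pearson_r` and the five data points are hypothetical and do not come from the presentation.

```python
import math
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation: sum of products of z-scores, divided by n - 1.

    Hypothetical helper name, for illustration only.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (divisor n - 1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
# Cross-check against the equivalent form Sxy / sqrt(Sxx * Syy) = 6 / sqrt(10 * 6)
print(math.isclose(r, 6 / math.sqrt(60)))
```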

  10. Properties of the Correlation Coefficient • r always lies between -1 and +1. • Positive values of r indicate a positive linear relationship. • Negative values of r indicate a negative linear relationship. • Values close to +1 or -1 indicate a strong linear relationship. • Values close to 0 indicate little or no linear relationship between the variables. • We only use r to discuss linear relationships between two variables. • Note: Correlation does not imply causation.

  11. Simple Linear Regression Can we describe the relationship between the two variables with a linear equation? • The variable on the x-axis is often called the explanatory or predictor variable. • The variable on the y-axis is called the response variable.

  12. Simple Linear Regression • Objectives of Simple Linear Regression • Determine the significance of the predictor variable in explaining variability in the response variable. • (e.g., Is per capita GDP useful in explaining the variability in life expectancy?) • Predict values of the response variable for given values of the explanatory variable. • (e.g., If we know the per capita GDP, can we predict life expectancy?) • Note: The predictor variable does not necessarily cause the response.

  13. Simple Linear Regression Model • The Simple Linear Regression model is given by
      $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
      where • $y_i$ is the response of the ith observation • $\beta_0$ is the y-intercept • $\beta_1$ is the slope • $x_i$ is the value of the predictor variable for the ith observation • $\varepsilon_i$ is the random error

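  14. SLR Estimation of Parameters • The equation for the least-squares regression line is given by
      $$\hat{y} = b_0 + b_1 x, \qquad b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x}$$
      where $\hat{y}$ is the predicted value of the response for a given value of x, and $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals.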
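As a minimal sketch, the least-squares estimates can be computed directly from the sums of squares and cross-products. This assumes plain Python lists as input. The function names and the toy data are hypothetical, not taken from the slides or from JMP.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x.

    Hypothetical helper name, for illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted response y_hat at x_new (hypothetical helper name)."""
    return b0 + b1 * x_new


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(round(b0, 4), round(b1, 4))     # 2.2 0.6
print(round(predict(b0, b1, 6), 4))   # 5.8
```

Note that x = 6 lies outside the observed range of x. The last prediction is therefore an extrapolation and should be treated with caution.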

  15. The Residual • The residual is the observed value of y minus the predicted value of y. • The residual for observation i is given by $e_i = y_i - \hat{y}_i$.
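To make the definition concrete, here is a small sketch that fits the least-squares line and then returns $e_i = y_i - \hat{y}_i$ for every observation. The function name and data are hypothetical. It also checks a known property: when the model includes an intercept, least-squares residuals sum to zero (up to rounding error).

```python
def fit_and_residuals(x, y):
    """Fit the least-squares line, then return residuals e_i = y_i - y_hat_i.

    Hypothetical helper name, for illustration only.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = fit_and_residuals(x, y)
print([round(ei, 4) for ei in e])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(e)) < 1e-9)          # residuals sum to zero when the model has an intercept
```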

  16. Simple Linear Regression Assumptions • Linearity • Observations are independent • Based on how the data are collected. • Check by plotting the residuals in the order in which the data were collected. • Constant variance • Check using a residual plot (plot residuals vs. $\hat{y}$). • The error terms are normally distributed. • Check by making a histogram or normal quantile plot of the residuals.

  17. Diagnostics: Residual Plot • A residual plot is used to check the assumption of constant variance and to check model fit (whether a line is a good fit). • Good residual plot: no pattern

  18. Diagnostics • Left: Residuals show non-constant variance. • Right: Residuals show non-linear pattern.

  19. Diagnostics: Normal Quantile Plot • Left: Residuals are not normal • Right: Normality assumption appropriate
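A normal quantile plot pairs each ordered residual with the corresponding quantile of a standard normal distribution. This sketch computes those pairs using `statistics.NormalDist` from the Python standard library. The plotting position (i - 0.5)/n is one common convention and is an assumption here, since JMP and other packages may use a slightly different rule. The residual values and the function name are hypothetical. Points lying roughly on a straight line are consistent with the normality assumption.

```python
from statistics import NormalDist


def normal_quantile_points(residuals):
    """Return (theoretical z, ordered residual) pairs for a normal quantile plot.

    Hypothetical helper name. Uses plotting positions (i - 0.5) / n, one common
    convention; other software may use a different rule.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))


# Hypothetical residuals from a made-up toy fit.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
for z, e in normal_quantile_points(residuals):
    print(f"{z:6.3f}  {e:5.2f}")
```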

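  20. ANOVA Table for Simple Linear Regression
      Source       df      SS     MS                   F
      Regression   1       SSR    MSR = SSR/1          F = MSR/MSE
      Error        n - 2   SSE    MSE = SSE/(n - 2)
      Total        n - 1   SST
      where $SST = \sum_i (y_i - \bar{y})^2$, $SSE = \sum_i (y_i - \hat{y}_i)^2$, and $SSR = SST - SSE$. The F-test tests whether there is a linear relationship between the two variables. Null Hypothesis: $H_0: \beta_1 = 0$. Alternative Hypothesis: $H_a: \beta_1 \neq 0$.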
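The quantities in the ANOVA table can be computed in a few lines. This minimal sketch uses hypothetical toy data and a hypothetical function name. It returns the sums of squares, the mean squares, and the F statistic. The standard library has no F distribution, so a p-value would come from statistical software, or from `scipy.stats.f.sf(F, 1, n - 2)` if SciPy is available.

```python
def slr_anova(x, y):
    """Sums of squares, mean squares and F for simple linear regression.

    Hypothetical helper name, for illustration only.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                        # total, df = n - 1
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # error, df = n - 2
    ssr = sst - sse                                                  # regression, df = 1
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSR": msr, "MSE": mse, "F": msr / mse}


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = slr_anova(x, y)
print({k: round(v, 4) for k, v in table.items()})
# {'SSR': 3.6, 'SSE': 2.4, 'SST': 6.0, 'MSR': 3.6, 'MSE': 0.8, 'F': 4.5}
```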

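  21. Tests for Parameters • Test whether the true y-intercept is different from 0: $H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$. • Test whether the true slope is different from 0: $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$. • Each test uses a t statistic, $t = b_j / SE(b_j)$, with n - 2 degrees of freedom. • Note: For simple linear regression, the test for the slope is equivalent to the overall F-test ($F = t^2$).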
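The sketch below computes the two t statistics using the standard-error formulas $SE(b_1) = \sqrt{MSE/S_{xx}}$ and $SE(b_0) = \sqrt{MSE\,(1/n + \bar{x}^2/S_{xx})}$. It also confirms on toy data that $t^2$ for the slope equals the ANOVA F statistic. The function name and the data are hypothetical.

```python
import math


def slr_t_tests(x, y):
    """t statistics for H0: beta0 = 0 and H0: beta1 = 0 (each with n - 2 df).

    Hypothetical helper name, for illustration only.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(mse / s_xx)                        # standard error of the slope
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / s_xx))  # standard error of the intercept
    return b0 / se_b0, b1 / se_b1


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t0, t1 = slr_t_tests(x, y)
print(round(t0, 4), round(t1, 4))
print(round(t1 ** 2, 4))  # 4.5, equal to the overall F statistic for these data
```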

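  22. Coefficient of Determination • The coefficient of determination, $R^2 = SSR/SST = 1 - SSE/SST$, is the proportion (often reported as a percent) of the variation in the response variable explained by the least-squares regression line. • Note: $0 \le R^2 \le 1$. • For simple linear regression, we also have $R^2 = r^2$.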
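This short sketch computes $R^2$ from the sums of squares and the Pearson correlation from the cross-products. It then checks on hypothetical toy data that $R^2 = r^2$ in simple linear regression. The function name is hypothetical.

```python
import math


def r_squared_and_r(x, y):
    """Return (R^2, r) for a simple linear regression of y on x.

    Hypothetical helper name, for illustration only.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)           # this is SST
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / s_yy                                   # equals SSR / SST
    r = s_xy / math.sqrt(s_xx * s_yy)                     # Pearson correlation
    return r2, r


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, r = r_squared_and_r(x, y)
print(round(r2, 4), round(r ** 2, 4))  # 0.6 0.6
```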

  23. Muscle Mass Example • A nutritionist randomly selected 15 women from each ten-year age group, beginning with age 40 and ending with age 79. The nutritionist recorded the age and muscle mass of each woman and would like to fit a model to explore the relationship between age and muscle mass. (Kutner et al. pg. 36)

  24. JMP: Making a Scatterplot • To analyze the data click Analyze and then select Fit Y by X.

  25. JMP: Making a Scatterplot • As shown below: Y, Response: Muscle Mass; X, Factor: Age

  26. JMP: Scatterplot • This results in a scatter plot.

  27. JMP: Simple Linear Regression • To perform the simple linear regression click on the Red Arrow and then select Fit Line.

  28. Simple Linear Regression Results • The results on the right are displayed.

  29. JMP: Diagnostics • Click on the Red Arrow next to Linear Fit and select Plot Residuals.

  30. Diagnostic Plots • The plots to the right are then added to the JMP output.

  31. Multiple Linear Regression

  32. Multiple Linear Regression • Similar to simple linear regression, except now there is more than one explanatory variable. • Body fat can be difficult to measure. A researcher would like to come up with a model that uses the more easily obtained measurements of triceps skinfold thickness, thigh circumference and midarm circumference to predict body fat. (Kutner et al. pg. 256)

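  33. First-Order Multiple Linear Regression Model • The multiple linear regression model with p - 1 independent variables is given by
      $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$$
      where • $\beta_0, \beta_1, \ldots, \beta_{p-1}$ are parameters • $x_{i1}, \ldots, x_{i,p-1}$ are known constants • $\varepsilon_i$ is the random error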
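To show what "fitting" the first-order model means computationally, here is a small pure-Python sketch. It estimates $b_0, \ldots, b_{p-1}$ by solving the normal equations $X^\top X\,b = X^\top y$ with Gaussian elimination. All function names and data are hypothetical. The toy response is built exactly as $1 + 2x_1 + 3x_2$ with no error, so the fit should recover those coefficients. Forming $X^\top X$ explicitly can be numerically unstable when predictors are highly collinear, so treat this as an illustration rather than a production routine.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # crude, scale-dependent singularity check
            raise ValueError("matrix is singular (e.g. perfectly collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_first_order_model(predictors, y):
    """Least-squares estimates [b0, b1, ..., b_{p-1}] (hypothetical helper).

    predictors: list of p - 1 lists, one per explanatory variable.
    """
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]  # intercept column first
    p = len(design[0])
    xtx = [[sum(design[i][j] * design[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: y is built exactly as 1 + 2*x1 + 3*x2 (no random error).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
coef = fit_first_order_model([x1, x2], y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```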

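  34. Multiple Linear Regression ANOVA Table
      Source       df      SS     MS                   F
      Regression   p - 1   SSR    MSR = SSR/(p - 1)    F = MSR/MSE
      Error        n - p   SSE    MSE = SSE/(n - p)
      Total        n - 1   SST
      The ANOVA F-test tests $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$ vs. $H_a$: not all $\beta_k$ ($k = 1, \ldots, p-1$) equal 0. Tests can also be performed for individual parameters (e.g., $H_0: \beta_k = 0$ vs. $H_a: \beta_k \neq 0$).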
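The overall F statistic depends only on the observed responses, the fitted values, and the number of parameters p (including the intercept). This sketch assumes the fitted values come from a least-squares fit that includes an intercept. The function name is hypothetical. As a sanity check with p = 2, it reproduces the simple-linear-regression F statistic for a made-up data set whose least-squares line is $\hat{y} = 2.2 + 0.6x$.

```python
def overall_f(y, y_hat, p):
    """Overall ANOVA F statistic, to be compared with an F(p - 1, n - p) distribution.

    Hypothetical helper name. y_hat must be fitted values from a least-squares
    fit that includes an intercept; p counts all parameters, intercept included.
    """
    n = len(y)
    if n <= p:
        raise ValueError("need more observations than parameters")
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sst - sse
    msr = ssr / (p - 1)
    mse = sse / (n - p)
    return msr / mse


# Hypothetical check: fitted values of y_hat = 2.2 + 0.6 x for x = 1..5.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(overall_f(y, y_hat, p=2), 4))  # 4.5
```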

  35. Coefficient of Multiple Determination • The coefficient of multiple determination, $R^2 = SSR/SST = 1 - SSE/SST$, is the proportion (often reported as a percent) of the variation in the response y explained by the set of explanatory variables. • The adjusted coefficient of determination, $R^2_a = 1 - \frac{n-1}{n-p}\cdot\frac{SSE}{SST}$, introduces a penalty for additional explanatory variables.
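To see the penalty at work, this sketch evaluates both formulas for hypothetical sums of squares. It then shows that adding a predictor that does not reduce SSE (p goes from 2 to 3) leaves $R^2$ unchanged but lowers $R^2_a$. The function name and numbers are illustrative assumptions.

```python
def r2_and_adjusted(sse, sst, n, p):
    """Return (R^2, adjusted R^2) with R^2_a = 1 - (n - 1)/(n - p) * SSE/SST.

    Hypothetical helper name; p counts all parameters, intercept included.
    """
    r2 = 1 - sse / sst
    r2_adj = 1 - (n - 1) / (n - p) * (sse / sst)
    return r2, r2_adj


# Hypothetical sums of squares: SSE = 2.4, SST = 6.0, n = 5 observations, p = 2 parameters.
r2, r2_adj = r2_and_adjusted(sse=2.4, sst=6.0, n=5, p=2)
print(round(r2, 4), round(r2_adj, 4))  # 0.6 0.4667

# Same SSE with one extra (useless) predictor: the penalty lowers adjusted R^2.
print(round(r2_and_adjusted(sse=2.4, sst=6.0, n=5, p=3)[1], 4))  # 0.2
```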

  36. Assumptions of Multiple Linear Regression • Observations are independent • Based on how the data are collected (plot residuals in the order in which the data were collected). • Constant variance • Check using a residual plot (plot residuals vs. $\hat{y}$, and plot residuals vs. each predictor variable). • The error terms are normally distributed. • Check by making a histogram or normal quantile plot of the residuals.

  37. Commercial Rental Rates • A real estate company would like to build a model to help clients make decisions about properties. The company has information about rental rate (Y), age (X1), operating expenses and taxes (X2), vacancy rates (X3), and total square footage (X4). The information is regarding luxury real estate in a specific location. (Kutner et al. pg. 251)

  38. JMP: Commercial Rental Rates • First, examine the data. Click Analyze, then Multivariate Methods, then Multivariate.

  39. JMP: Scatterplot Matrix • For Y, Columns enter Y, X1, X2, X3 and X4. Then click OK.

  40. JMP: Correlations and Scatterplot Matrix

  41. JMP: Fitting The Regression Model • Click Analyze and then select Fit Model.

  42. JMP: Fitting the Regression Model • Y: Y, Highlight X1, X2, X3 and X4 and click Add. Then click Run.

  43. Fitting the Model • Examining the parameter estimates, we see that X3 is not significant. • Fit a new model, this time omitting X3.

  44. Some JMP Output

  45. JMP: Checking Assumptions • Some diagnostic output is included automatically. • To check the remaining assumptions, we need the residuals: • Click the red arrow next to Y Response → Save Columns → Residuals

  46. JMP: Check Normality Assumption • Analyze → Distribution → Y, Columns: Residual Y • Click the red arrow next to Distribution Residual Y and select Normal Quantile Plot.

  47. JMP: Checking Residuals vs. Independent Variables • Analyze → Fit Y by X → Y, Columns: Residual Y; X, Factor: X1, X2, X4

  48. Other Multiple Linear Regression Issues • Outliers • Higher Order Terms • Interaction Terms • Multicollinearity • Model Selection

  49. Regression with Categorical Variables

  50. Regression with Categorical Variables • Sometimes there are categorical explanatory variables that we would like to incorporate into our model. • Suppose we would like to model the profit or loss of banks last year based on bank size and type of bank (commercial, mutual savings, or savings and loan). (Kutner et al. pg. 340)
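One standard way to bring a categorical predictor with c categories into a regression model is to code it as c - 1 indicator (0/1) variables, with one category left as the baseline. Using all c indicators together with an intercept would make the columns perfectly collinear. The slides in this excerpt stop before the bank model is specified, so the following is only an assumed sketch of how bank type could be coded. The baseline ("commercial"), the function name, and the observations are hypothetical.

```python
def indicator_columns(values, levels, baseline):
    """Code a categorical variable as len(levels) - 1 indicator (0/1) columns.

    Hypothetical helper name. Observations in the baseline category get 0 in
    every indicator column.
    """
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not one of {levels!r}")
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)!r}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


levels = ["commercial", "mutual savings", "savings and loan"]
bank_type = ["commercial", "savings and loan", "mutual savings", "commercial"]  # hypothetical
for name, column in indicator_columns(bank_type, levels, baseline="commercial").items():
    print(name, column)
# mutual savings [0, 0, 1, 0]
# savings and loan [0, 1, 0, 0]
```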
