
APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE

APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE. CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez. Perspective. Research Techniques Accessing, Examining and Saving Data Univariate Analysis – Descriptive Statistics Constructing (Manipulating) Variables Association – Bivariate Analysis


Presentation Transcript


  1. APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez

  2. Perspective • Research Techniques • Accessing, Examining and Saving Data • Univariate Analysis – Descriptive Statistics • Constructing (Manipulating) Variables • Association – Bivariate Analysis • Association – Multivariate Analysis • Comparing Group Means – Bivariate • Multivariate Analysis - Regression

  3. Lecture 7 Multivariate Analysis With Linear Regression

  4. Lectures 5 and 6 examined methods for testing relationships between 2 variables: bivariate analysis • Many projects, however, require testing the association of multiple independent variables with a dependent variable: multivariate analysis • Multivariate analysis is performed after the researchers understand the characteristics of individual variables (univariate) and the relationships between any 2 variables (bivariate)

  5. Reasons for Multivariate Analysis • Social behavior is usually associated with many factors and cannot be explained by its association with just one variable. By including more than one variable in the statistical model, the researcher can predict or explain social behavior more accurately

  6. Reasons for Multivariate Analysis • Multivariate analysis can account for the influence of spurious factors by introducing control variables

  7. Linear Regression • Used when an increase in an independent variable is associated with a constant change in the dependent variable, i.e. the relationship is linear. • The dependent variable should be numeric, and the residuals of the model should be approximately normally distributed

  8. LR: Bivariate Example • Using the States data, we will study the relationship between poverty and teen births.

  9. LR: A Bivariate example • The graph indicates that teenage births seem to increase with poverty rate. • Using Linear Regression, we will create an equation that can be used to illustrate this tendency • Load the States dataset

  10. LR: A Bivariate example

  11. LR: A Bivariate example

  12. LR: A Bivariate example

  13. LR: A Bivariate example • R² measures the usefulness of the model: • A value of 1 indicates that 100% of the variation in the dependent variable is explained by variation in the independent variable • A value of 0.455 indicates that 45.5% of the state-to-state variation in the teenage birth rate can be explained by variation in poverty rates. The remaining 54.5% is due to factors not included in the model

  14. LR: A Bivariate example • The ANOVA tests whether the model fits the data: • The F statistic of about 41 indicates that the variance explained by the regression model was about 41 times larger than the unexplained (residual) variance. • The P value lower than 0.001 indicates that a ratio this large would be very unlikely to arise by chance alone, i.e. the model fits the data
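The link between R² and the ANOVA ratio can be checked by hand. The sketch below uses the standard formula F = (R²/k) / ((1 − R²)/(n − k − 1)), with the R² of 0.455 reported in the slides and k = 1 independent variable. The sample size n = 51 (50 states plus the District of Columbia) is an assumption, since the transcript does not state it.

```python
def f_statistic(r_squared: float, n: int, k: int) -> float:
    """F ratio for a regression with k predictors fitted to n cases."""
    explained_per_df = r_squared / k
    residual_per_df = (1.0 - r_squared) / (n - k - 1)
    return explained_per_df / residual_per_df


# R^2 = 0.455 comes from the slides; n = 51 is an assumed sample size.
f_value = f_statistic(0.455, n=51, k=1)
print(round(f_value, 1))  # 40.9, consistent with "about 41"
```

With n = 50 instead, the same formula gives about 40.1, so the conclusion does not hinge on the assumed sample size.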

  15. LR: A Bivariate example • B (the slope) is the size of the difference in the dependent variable corresponding to a change of one unit in the independent variable • The value of 2.735 in this model indicates that for every one-percentage-point increase in the poverty rate, the predicted teen birth rate increases by nearly 3 births (2.735) • The significance value, reported as 0.000 (i.e. p < 0.001), indicates that there is a significant association between teen birth rate and poverty

  16. LR: A Bivariate example • The constant (intercept) is the predicted value of the dependent variable when the independent variable is zero. • In this case, the constant indicates that the model predicts about 15 teen births per 1000 teenage women even if there were no poor people in a state
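The slope, constant and R² that SPSS reports come from ordinary least squares. The following is a minimal sketch of that calculation in plain Python. The five data points are made up for illustration and are not the States dataset, and the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for a least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x  # predicted y when x is zero
    r_squared = (slope * sxy) / syy      # explained share of the variation
    return intercept, slope, r_squared


# Hypothetical values: poverty rate (%) and teen births per 1000 teenage women.
poverty = [8, 10, 12, 14, 16]
teen_births = [38, 42, 50, 52, 60]

a, b, r2 = fit_line(poverty, teen_births)
print(round(a, 2), round(b, 2), round(r2, 3))
```

For these made-up points the slope is 2.7 and the constant is 16, which happen to be close to the values in the slides but should not be read as results from the real data.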

  17. Making Predictions • The linear regression equation is: Y’ = a + bX • Y’ is the predicted value of the dependent variable • a is the constant • b is the slope • X is the value of the independent variable

  18. Making Predictions • In our case, the regression equation is: Y’ = 15.16 + 2.735X • If we wanted to predict the teenage birth rate for a poverty rate of 20%: • Y’ = 15.16 + 2.735 × 20 = 69.86 births per 1000 teenage women • Predictions should be limited to the available range of values of the independent variable (in our case between 1% and 22%)
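The prediction rule above, including the advice to stay inside the observed range, can be sketched as a small function. The coefficients and the 1% to 22% range are taken from the slides. The function name and the choice to raise an error outside the range are illustrative assumptions.

```python
INTERCEPT = 15.16   # constant (a) from the slides
SLOPE = 2.735       # slope (b) from the slides
X_MIN, X_MAX = 1.0, 22.0  # observed range of poverty rates, in percent


def predict_teen_birth_rate(poverty_rate: float) -> float:
    """Predicted teen births per 1000 teenage women for a poverty rate in %."""
    if not X_MIN <= poverty_rate <= X_MAX:
        # Refuse to extrapolate beyond the data the model was fitted on.
        raise ValueError("poverty rate outside the observed range")
    return INTERCEPT + SLOPE * poverty_rate


print(round(predict_teen_birth_rate(20), 2))  # 69.86
```

Calling the function with a poverty rate of 30 raises `ValueError` rather than returning an extrapolated value.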

  19. Graphing Bivariate Regression lines

  20. Graphing Bivariate Regression lines

  21. Graphing Bivariate Regression lines

  22. Graphing Bivariate Regression lines

  23. Graphing Bivariate Regression lines

  24. Graphing Bivariate Regression lines

  25. Multiple Linear Regression • Regression model includes more than one independent variable • We’ll look at some factors affecting teenage birth rate: • Poverty (PVS500) • Expenditures per pupil (SCS141) • Unemployment rate (EMS171) • Amount of welfare a family gets (PVS526)

  26. Multiple Linear Regression

  27. Multiple Linear Regression

  28. Multiple Linear Regression

  29. MLR: Coefficients • Looking at the significance tests for the coefficients, only 2 are significant: • States with higher poverty rates have higher teenage birth rates: 1.506 more births per 1000 teenage women for every one-percentage-point rise in the poverty rate, holding the other variables constant. • States that give more welfare aid have lower teen birth rates: a change of -0.0379 for every additional $1 given as welfare aid, holding the other variables constant.

  30. MLR: R - Squared • MLR uses the adjusted R² instead of the R² because it adjusts for the number of independent variables: adding a variable raises the adjusted R² only if that variable improves the model more than would be expected by chance • The adjusted R² in this case, 0.594, indicates that the model accounts for 59.4% of the variation in the teenage birth rate
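The usual adjustment is adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of cases and k the number of independent variables. The sketch below implements that formula and its inverse. The adjusted value of 0.594 and k = 4 come from the slides, while n = 51 is an assumption because the transcript does not give the sample size.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Shrink R^2 to account for the number of predictors k."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


def unadjusted_r_squared(adj_r_squared: float, n: int, k: int) -> float:
    """Invert the adjustment to recover the plain R^2."""
    return 1.0 - (1.0 - adj_r_squared) * (n - k - 1) / (n - 1)


# 0.594 and k = 4 are from the slides; n = 51 is an assumed sample size.
plain_r2 = unadjusted_r_squared(0.594, n=51, k=4)
print(round(plain_r2, 3))
print(round(adjusted_r_squared(plain_r2, n=51, k=4), 3))
```

Under that assumption the unadjusted R² would be roughly 0.63, a little higher than the adjusted value, which is the expected direction of the correction.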

  31. MLR: R - Squared • The ANOVA indicates that the variance explained by the variables in the model is about 19 times the unexplained (residual) variance. The P<0.001 indicates that the model is a good fit to the data.

  32. Multiple Regression Equation • The equation is: Y’ = 41.874 + 1.506X1 - 0.0009X2 + 2.515X3 - 0.037X4 • X1 : Poverty Rate in 1998 – PVS500 • X2 : Expenditures per pupil – SCS141 • X3 : Unemployment rate – EMS171 • X4 : Amount of welfare received – PVS526

  33. Graphing the Multiple Regression • The multiple regression equation is: Y’ = a + b1X1 + b2X2 + b3X3 + b4X4 • Y’ is the predicted value of the dependent variable • a is the constant • bi is the slope for variable i • Xi is the value of the independent variable i

  34. Graphing the Multiple Regression • Dependent variable is plotted against one independent variable at a time • The other variables are held constant, at any value, but usually at their mean value • We will graph the association between welfare benefits and teenage birth rates holding poverty rates, school expenditures and unemployment rates at their mean values • This requires computing TEENPRE, the predicted value of teen birth rate according to the equation

  35. Graphing the Multiple Regression • Transform • Compute • Target Variable: TEENPRE • Numeric Expression: 41.874 + (1.506*12.73) + (-0.0009*6341.98) + (2.515*4.16) + (-0.037*PVS526) • Type and Label • Label: Predicted Teenage Birth Rate • Continue • OK
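The SPSS Compute step above amounts to evaluating the regression equation with three variables fixed at their means. A minimal Python sketch of the same expression follows. The coefficients and means are copied from the numeric expression in the slide. The function name `teenpre` and the sample welfare amounts are illustrative assumptions.

```python
INTERCEPT = 41.874
B_POVERTY, MEAN_POVERTY = 1.506, 12.73   # PVS500 held at its mean
B_SCHOOL, MEAN_SCHOOL = -0.0009, 6341.98  # SCS141 held at its mean
B_UNEMP, MEAN_UNEMP = 2.515, 4.16         # EMS171 held at its mean
B_WELFARE = -0.037                        # PVS526 is allowed to vary

# Part of the prediction that is fixed once the three means are plugged in.
BASELINE = (INTERCEPT
            + B_POVERTY * MEAN_POVERTY
            + B_SCHOOL * MEAN_SCHOOL
            + B_UNEMP * MEAN_UNEMP)


def teenpre(welfare: float) -> float:
    """Predicted teen birth rate for a given welfare amount (PVS526)."""
    return BASELINE + B_WELFARE * welfare


print(round(BASELINE, 2))  # 65.8
for amount in (200, 400, 600):  # hypothetical welfare amounts in dollars
    print(amount, round(teenpre(amount), 2))
```

Because the other three terms collapse into a single constant of about 65.8, the plotted line is a straight line in PVS526 with slope -0.037.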

  36. Graphing the Multiple Regression

  37. Graphing the Multiple Regression

  38. Graphing the Multiple Regression

  39. Graphing the Multiple Regression

  40. Graphing the Multiple Regression

  41. Linear Regression Concerns • Linear Relationships • A numerical dependent variable • Normality of residuals • The residuals should follow a normal distribution with a mean of 0 • Check whether this is the case by saving and plotting the residuals when doing the MLR
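A residual is the observed value minus the value the model predicts. The sketch below computes residuals for a one-variable least-squares fit and reports their mean and a simple skewness figure as a rough numerical complement to the plots. The data are made up for illustration, and the skewness check is an assumption of this sketch rather than the procedure shown in the slides.

```python
def residuals_of_fit(x, y):
    """Fit a least-squares line and return observed minus predicted values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def skewness(values):
    """Population skewness: 0 for a perfectly symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Hypothetical values: poverty rate (%) and teen births per 1000 teenage women.
poverty = [8, 10, 12, 14, 16]
teen_births = [38, 42, 50, 52, 60]

residuals = residuals_of_fit(poverty, teen_births)
mean_residual = sum(residuals) / len(residuals)
print(round(mean_residual, 6))
print(round(skewness(residuals), 3))
```

A least-squares fit with a constant always produces residuals that average to zero, so the shape of their distribution (for example in a histogram) is the part that needs checking.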

  42. Normality of Residuals

  43. Normality of Residuals

  44. Normality of Residuals

  45. Normality of Residuals

  46. Normality of Residuals

  47. Normality of Residuals

  48. Normality of Residuals

  49. Normality of Residuals
