
Statistics Project

By: Rich Miktus, Christopher Geigel, Brandon Butch. 2004 raw data for New Jersey counties: abuse and neglect referrals of children, special education enrollment, number of child arrests, average income of families with children, child poverty, child population.


Presentation Transcript


  1. Statistics Project By: Rich Miktus, Christopher Geigel, Brandon Butch

  2. 2004 Data - Raw New Jersey Counties • Abuse and Neglect Referrals of Children • Special Education Enrollment • Number of Child Arrests • Average Income of Families with Children • Child Poverty • Child Population • Total Population • School Enrollment

  3. Variables Per Capita • Abuse • Poverty • Special Education • Income • School Enrollment • Arrests • Population Density
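The slides do not say how the raw counts were converted to per-capita variables. A minimal sketch, assuming each count is divided by the county's child population and scaled to a rate per 1,000 children; the county names, column names, and numbers below are hypothetical and are not the 2004 data.

```python
# Hypothetical raw counts for two made-up counties (not the 2004 data).
raw = {
    "County A": {"abuse_referrals": 1200, "child_arrests": 900, "child_population": 60000},
    "County B": {"abuse_referrals": 300, "child_arrests": 150, "child_population": 20000},
}


def per_capita(counts, denominator="child_population", per=1000):
    """Convert raw counts to rates per `per` children.

    Using child population as the denominator is an assumption; the slides
    only say the variables were analyzed per capita.
    """
    base = counts[denominator]
    return {
        name: per * value / base
        for name, value in counts.items()
        if name != denominator
    }


rates = {county: per_capita(counts) for county, counts in raw.items()}
```

Putting counties of very different sizes on a common scale is what makes the later correlations comparable across counties.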

  4. Introduction • One Variable Analysis • Histograms • Scatterplots • Q-Q Plots • Two Variable Analysis • Linear models • Regression analysis • Simple Models • Arrests • School Enrollment • Residual Diagnostics

  5. One Variable Analysis • Histograms & Scatterplots • Frequency of occurrences • Skew of data • Q-Q Plot • Normal distribution • Usefulness of variables • Real-life relationships • Data flaws

  6. Example Histogram: Population Density

  7. Example Q-Q Plot: School Enrollment
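The Q-Q plot itself did not survive in the transcript. As a rough illustration of what such a plot compares, the sketch below pairs sorted sample values with standard normal quantiles. The plotting positions `(i - 0.5) / n` are one common convention and an assumption here, and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical per-capita school enrollment values.
sample = [0.18, 0.21, 0.17, 0.25, 0.19, 0.22, 0.20]
points = normal_qq_points(sample)
```

If the points fall close to a straight line, the variable is roughly normally distributed; systematic curvature suggests skew.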

  8. Two Variable Analysis • Correlation Table – used to check initial predictions • Linear regression line • Residuals • How much do our explanatory variables matter?
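A correlation table of the kind mentioned on this slide can be sketched as pairwise Pearson correlations between variables. The variable names and values below are hypothetical stand-ins, not the county data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(columns):
    """Nested dict of pairwise correlations, keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical per-capita values for five counties.
data = {
    "arrests": [1.0, 2.0, 3.0, 4.0, 5.0],
    "abuse": [2.1, 3.9, 6.2, 8.1, 9.8],
    "school": [5.0, 3.0, 4.0, 6.0, 2.0],
}
table = correlation_table(data)
```

Scanning one row of the table gives a first guess at which explanatory variables are worth putting in a model for that response.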

  9. Two Variable Analysis • More refined analysis to test: • Arrests ~ abuse, special education enrollment, poverty, school • School enrollment ~ income, poverty, abuse, population density

  10. Arrests vs. Abuse • Good linear fit – strong correlation • Residuals relatively small • Large F Statistic, small P Value
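The quantities named on this slide (fit, residuals, F statistic) can be computed for a one-predictor regression as sketched below. The data are hypothetical, and the p-value is omitted because it needs the F distribution, which the Python standard library does not provide.

```python
def simple_ols(x, y):
    """Least-squares line y = intercept + slope * x, with basic fit diagnostics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    rss = sum(e * e for e in residuals)        # residual sum of squares
    tss = sum((b - mean_y) ** 2 for b in y)    # total sum of squares
    # F statistic with 1 and n - 2 degrees of freedom.
    f_stat = (tss - rss) / (rss / (n - 2))
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "r_squared": 1 - rss / tss,
        "f_stat": f_stat,
    }


# Hypothetical per-capita values, not the county data.
abuse = [1.0, 2.0, 3.0, 4.0, 5.0]
arrests = [2.1, 3.9, 6.2, 8.1, 9.8]
fit = simple_ols(abuse, arrests)
```

A large F statistic means the regression explains far more variation than the residuals leave unexplained, which is what the slide reports for arrests against abuse.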

  11. School vs. Income • Relationship is very weak • No strong, overall trend • Possible weak, positive correlation

  12. Two Variable Analysis: Conclusions • Arrests strongly correlated with abuse, moderately correlated with special education enrollment and poverty, and not correlated with school enrollment • School enrollment strongly correlated with population density, and not related to income, poverty and abuse

  13. Simple Models: School Enrollment • Possible variables • Abuse • Income • Poverty • Population density

  14. Problems with School Model • Income and Poverty Correlation: -0.911 • Variance Inflation Factors: Income: 8.489, Poverty: 9.278 • Best Regression By AIC: School~Density • Not enough applicable data: Underfitted, Flawed variables
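The numbers on this slide can be related through the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one explanatory variable on the others. The sketch below uses only the values reported on the slide. The interpretation in the comments, that the remaining predictors account for the gap, is an inference rather than something the slides state.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has the given R^2."""
    return 1.0 / (1.0 - r_squared)


def implied_r_squared(vif):
    """Invert the VIF definition to recover the R^2 behind a reported VIF."""
    return 1.0 - 1.0 / vif


# Reported on the slide: correlation between income and poverty.
r_income_poverty = -0.911

# If income and poverty were the only two predictors, both would share this VIF.
vif_two_predictor = vif_from_r_squared(r_income_poverty ** 2)

# Reported VIFs are larger, which is consistent with the other candidate
# predictors also explaining part of income and poverty (an inference).
implied_income = implied_r_squared(8.489)
implied_poverty = implied_r_squared(9.278)
```

On these figures the two-predictor VIF is about 5.9, while the reported VIFs imply that roughly 88 to 89 percent of the variation in income and in poverty is explained by the other predictors.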

  15. Simple Models: Arrests • Possible variables • Abuse • Special Education • Poverty • Population Density • School enrollment

  16. Problems with Arrests Model • Problem: High correlation and VIFs with explanatory variables (multicollinearity) • Fix: Removed Income (too similar to poverty) • Proceeded to refine the model and it worked itself out

  17. Arrests Model: Choosing a Model • The Test for best fit • AIC goodness test • Arrests~Abuse + Special Ed + Poverty + Density + School • Arrests~Abuse + Special Ed + School
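The slide does not show the AIC values behind this choice. As an illustration of how AIC trades fit against model size, the sketch below uses the Gaussian least-squares form `n * ln(RSS / n) + 2k` (constant terms dropped). The residual sums of squares are hypothetical, and `n = 21` assumes one observation per New Jersey county.

```python
from math import log


def gaussian_aic(rss, n, n_coefficients):
    """AIC for a least-squares fit, up to an additive constant shared by all
    models fitted to the same data."""
    return n * log(rss / n) + 2 * n_coefficients


n = 21  # assumption: one row per New Jersey county

# Hypothetical residual sums of squares, not values from the slides.
full = gaussian_aic(rss=10.0, n=n, n_coefficients=6)     # intercept + 5 predictors
reduced = gaussian_aic(rss=10.4, n=n, n_coefficients=4)  # intercept + 3 predictors

# Lower AIC is preferred.
preferred = "reduced" if reduced < full else "full"
```

With these made-up numbers, dropping two predictors barely raises the residual sum of squares, so the penalty term favors the smaller model, the same kind of outcome the slide describes.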

  18. Residual Diagnostics: Model Refinement • Residual plots led to possible transformation on School • To choose transform used GAM plots

  19. Residual Diagnostics: Model Refinement • Used a Cubic transform • Resulted in a higher Adj R squared value • New Model didn’t have normal residuals • Rejected the model
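For reference, adjusted R squared discounts ordinary R squared by the number of predictors, so a higher value after a transform is not explained by model size alone. A minimal sketch with hypothetical inputs (the slides do not report the actual values):

```python
def adjusted_r_squared(r_squared, n, n_predictors):
    """Adjusted R^2 for a model with an intercept and `n_predictors` predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Hypothetical: R^2 of 0.90 from 21 observations and 3 predictors.
adj = adjusted_r_squared(0.90, n=21, n_predictors=3)
```

A better adjusted R squared still does not validate the model on its own, which matches the slide: the cubic version was rejected because its residuals were not normal.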

  20. Residual Diagnostics: Model Refinement • Box Cox Plot • Lowest near 0 • No transform required

  21. Residual Diagnostics: Removing Outliers • LR Plot • One obvious non-influential outlier • Easily removed without damage to the model

  22. Conclusions • Good linear fit between arrests and its explanatory variables; not so for school enrollment • Juvenile arrests can be modeled by: Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) – 0.08(School) • Not enough appropriate data to make a model for school enrollment • Improvements • Check correlation of variables earlier • Additional data acquisition
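The fitted equation from the conclusions can be written as a small function. The coefficients are taken from the slide; the input values are hypothetical, and the units are assumed to be the per-capita scales used in the analysis, which the slides do not spell out.

```python
def predict_arrests(sped, abuse, school):
    """Predicted juvenile arrests from the model reported on the slide:
    Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) - 0.08(School)."""
    return 2.58 + 0.21 * sped + 0.95 * abuse - 0.08 * school


# Hypothetical inputs on the (assumed) per-capita scale.
predicted = predict_arrests(sped=10.0, abuse=5.0, school=20.0)

# A residual is the observed value minus the prediction.
observed = 8.0  # hypothetical
residual = observed - predicted
```

The signs show the direction of each relationship in the fitted model: arrests rise with special education enrollment and abuse referrals and fall slightly with school enrollment.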
