## Assumptions

**“Essentially, all models are wrong, but some are useful”**

Your model has to be wrong… but that’s o.k. if it’s illuminating! (George E.P. Box)

**Linear Model Assumptions**

- Absence of Collinearity
- No influential data points
- Normality of Errors
- Homoskedasticity of Errors
- Independence

**Absence of Collinearity**

Baayen (2008: 182)

**Demo**

Where does collinearity come from? …most often, correlated predictor variables

**Leverage**

Baayen (2008: 189-190)

**Leave-one-out Influence Diagnostics**

DFbeta (…and much more)

**Normality of Error**

The error (not the data!) is assumed to be normally distributed. So, the residuals should be normally distributed.

- `xmdl = lm(y ~ x)`
- `hist(residuals(xmdl))` ✔
- `qqnorm(residuals(xmdl))`, `qqline(residuals(xmdl))` ✔
- `qqnorm(residuals(xmdl))`, `qqline(residuals(xmdl))` ✗

**Homoskedasticity of Error**

The error (not the data!) is assumed to have equal variance across the predicted values. So, the residuals should have equal variance across the predicted values.

**What to do if normality/homoskedasticity is violated?**

- Either: nothing + report the violation
- Or: report the violation + transformations

**Two types of transformations**

- Linear transformations: leave the shape of the distribution intact (centering, scaling)
- Nonlinear transformations: do change the shape of the distribution

**After transformation**

Still bad… but better!!

**Assumptions**

- Normality of Errors: (Histogram of Residuals), Q-Q plot of Residuals
- Homoskedasticity of Errors: Residual Plot

**Common experimental data**

Pseudoreplication = Disregarding Dependencies

(Diagram labels: Subject; Item…, Item #1, Item…; Rep 1, Rep 2, Rep 3)

- Subject1 Item1
- Subject1 Item2
- Subject1 Item3
- …
- Subject2 Item1
- Subject2 Item2
- Subject3 Item3
- …

“pooling fallacy”: Machlis et al. (1985); “pseudoreplication”: Hurlbert (1984)

**Hierarchical data is everywhere**

- Typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011)
- Organizational data
- Classroom data

(Figure labels: Finnish, Norwegian, Swedish, English, German, Hungarian, French, Romanian, Italian, Spanish, Turkish)

**Hierarchical data is everywhere**

(Figure labels: Class 1, Class 2)

**Hierarchical data is everywhere**

Intraclass Correlation (ICC)
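
The intraclass correlation named on the last slide can be illustrated with a small sketch. This is not taken from the slides: the data are simulated, the names `n_subj`, `n_rep`, `subject`, `subj_effect` and `y` are hypothetical, and the estimator shown is the one-way ANOVA version of the ICC, which assumes a balanced design (the same number of repetitions per subject). A mixed model would be the more general route.

```r
# Hypothetical, simulated data: 20 subjects with 5 repetitions each.
set.seed(1)
n_subj <- 20
n_rep  <- 5
subject <- factor(rep(seq_len(n_subj), each = n_rep))

# Each subject gets its own offset, so repetitions from the same
# subject are NOT independent of one another.
subj_effect <- rnorm(n_subj, mean = 0, sd = 2)
y <- 10 + subj_effect[as.integer(subject)] + rnorm(n_subj * n_rep, sd = 1)

# One-way ANOVA with subject as the grouping factor.
tab <- anova(lm(y ~ subject))
msb <- tab["subject", "Mean Sq"]    # between-subject mean square
msw <- tab["Residuals", "Mean Sq"]  # within-subject mean square

# One-way ANOVA estimator of the ICC for a balanced design:
# the share of total variance that is due to differences between subjects.
icc <- (msb - msw) / (msb + (n_rep - 1) * msw)
print(icc)
```

An ICC near 0 would suggest that observations within a subject are about as different as observations from different subjects; a value well above 0 signals the kind of dependency that pseudoreplication ignores.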