Data Transformation For Normality

• An assumption of our analysis is that the data is normally distributed
• If the data is not normally distributed, then you must do a transformation to get normal data:
• Log(Y), 1/Y, SQRT(Y)

Log transformation reduces larger numbers by a greater percentage than smaller numbers.
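To make the point about percentages concrete, here is a small Python sketch (not the SAS used elsewhere in these slides). It assumes Log means base-10, which the slides do not state, and the helper name percent_reduction is made up for illustration.

```python
import math

def percent_reduction(y):
    """Percent by which log10 shrinks a value y relative to y itself.

    Assumes base-10 logs; the slides only say Log(Y).
    """
    return 100.0 * (y - math.log10(y)) / y

# Compare a small, medium, and large value.
for y in (10, 100, 1000):
    print(y, math.log10(y), round(percent_reduction(y), 1))
```

The larger the starting value, the larger the share of it that the log removes, which is why the transformation pulls in a long right tail.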

Residual Plots
• Residual value = Observed – Predicted
• For Regression Equation: Y=(2.197*X)-395.32
• Our YObs=76 and X=208. Based on our equation though: If X=208, then YPred=(2.197*208)-395.32=61.656
• Our residual for this X = 76-61.656 = 14.344
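The residual calculation can be sketched in a few lines of Python (a stand-in for the SAS workflow, not part of it). The coefficients are the ones in the regression equation on this slide; the function names are hypothetical.

```python
# Coefficients taken from the slide's regression equation Y = (2.197*X) - 395.32
SLOPE = 2.197
INTERCEPT = -395.32

def predict_weight(length_mm):
    """Predicted weight (g) for a given length (mm)."""
    return SLOPE * length_mm + INTERCEPT

def residual(observed_weight_g, length_mm):
    """Residual = Observed - Predicted."""
    return observed_weight_g - predict_weight(length_mm)

# The observation used on the slide: YObs = 76 at X = 208.
print(round(predict_weight(208), 3))
print(round(residual(76, 208), 3))
```

A positive residual means the fish is heavier than the line predicts for its length; a negative one means it is lighter.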

Residual = Wt – Pred_Wt, where Pred_Wt is based on Y=(2.197*X)-395.32

Residual Plots
[Figure: Not Transformed; axes: Weight, Length]
• If there is a pattern to a residual plot, then you should do a data transformation.

Residual = LogWt – P_LogWt, where P_LogWt is based on Y=(0.0053*X)+0.8357
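On the log scale the residual is computed the same way, just with the logged weight. The Python sketch below assumes LogWt is a base-10 log (the slides do not say which base) and reuses the observation YObs=76, X=208 from the untransformed example; the function names are hypothetical.

```python
import math

# Coefficients from the slide's log-scale equation Y = (0.0053*X) + 0.8357
LOG_SLOPE = 0.0053
LOG_INTERCEPT = 0.8357

def predict_log_weight(length_mm):
    """Predicted log10(weight) for a given length (mm)."""
    return LOG_SLOPE * length_mm + LOG_INTERCEPT

def log_residual(weight_g, length_mm):
    """Residual on the log scale: LogWt - P_LogWt (assumes base-10 logs)."""
    return math.log10(weight_g) - predict_log_weight(length_mm)

def back_transform(log_weight):
    """Convert a predicted log10(weight) back to grams."""
    return 10 ** log_weight

print(round(predict_log_weight(208), 4))
print(round(log_residual(76, 208), 4))
print(round(back_transform(predict_log_weight(208)), 1))
```

Residuals from this model are in log units, so they should be plotted and judged on that scale rather than compared directly with residuals in grams.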

Residual Plots
[Figure: Transformed; axes: Weight, Length]
• The residual plot of normally distributed data should not have an obvious pattern

[Figure: two panels, Not Transformed and Transformed; axes: Weight, Length]

More on Regression Equation: Y=mX+b
Y=(2.197*X)-395.32
• What does this equation tell us?
• The predicted value of Y for a given X
• If X=220, then Y=(2.197*220)-395.32=88.02
• What does b reflect?
• It is where the regression line crosses the Y axis; where X=0.
• Y = (2.197*0) – 395.32 = -395.32
• This says the weight of a white trout = -395.32 g when the length = 0 mm; does that make sense to you?
• Also, a white trout that weighs 0 g should be about 180 mm long (solve the regression for X when Y=0)?
• Cannot extrapolate beyond your data set!
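The two nonsensical results above (a negative weight at length 0, a weightless 180 mm fish) both come from evaluating the line outside the data. A minimal Python sketch of that idea follows; the range limits min_len_mm and max_len_mm are hypothetical parameters, because the slides do not give the actual range of lengths sampled.

```python
# Coefficients from the slide's regression equation Y = (2.197*X) - 395.32
SLOPE = 2.197
INTERCEPT = -395.32

def x_intercept(slope, intercept):
    """Length at which the line predicts a weight of 0 (solve 0 = m*X + b)."""
    return -intercept / slope

def predict_in_range(length_mm, min_len_mm, max_len_mm):
    """Predict weight, but refuse to extrapolate beyond the sampled lengths.

    min_len_mm and max_len_mm must come from your own data set; they are
    not given in the slides.
    """
    if not (min_len_mm <= length_mm <= max_len_mm):
        raise ValueError("length is outside the range of the data")
    return SLOPE * length_mm + INTERCEPT

print(INTERCEPT)                                # predicted weight at X = 0
print(round(x_intercept(SLOPE, INTERCEPT), 1))  # length where predicted weight is 0
```

Raising an error instead of returning a number is one way to enforce the "do not extrapolate" rule in code.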

What does m reflect?
• It represents how much Y changes with a change in X
• For every 1 mm increase in length, the predicted weight increases by 2.197 g (NOT by (2.197*X)-395.32).
• What if length increases by 10 mm → weight increases by 10 * 2.197 = 21.97 g
• What if length increases by 23 mm → weight increases by 23 * 2.197 = 50.531 g
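A quick Python check of the slope interpretation: the change in predicted weight depends only on the change in length, not on the starting length. The function names are hypothetical; the coefficients are the ones on the slides.

```python
# Coefficients from the slide's regression equation Y = (2.197*X) - 395.32
SLOPE = 2.197
INTERCEPT = -395.32

def predict_weight(length_mm):
    """Predicted weight (g) for a given length (mm)."""
    return SLOPE * length_mm + INTERCEPT

def weight_change(delta_length_mm):
    """Change in predicted weight (g) for a given change in length (mm)."""
    return SLOPE * delta_length_mm

# The intercept cancels when two predictions are subtracted,
# so the difference equals slope * change in length.
diff = predict_weight(230) - predict_weight(220)
print(round(diff, 3), round(weight_change(10), 3))
print(round(weight_change(23), 3))
```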

Comparing Two Regression Lines

First thing to do is to use regression to get slopes.

    proc sort; by sex;
    proc reg; by sex;
    model weight=width;
    run;

Will give us the slopes for male and female.

Comparing the Growth Rate
• m for female = 6.17430
• m for male = 7.95468
• The difference is 6.17430 – 7.95468 = -1.78038
• Significant?

Proc GLM (General Linear Models)
• A blend of ANOVA and Regression in SAS

    proc sort; by sex;
    proc reg; by sex;
    model weight=width;
    run;

Will give us the slopes for male and female.

    proc glm; class sex;
    model weight=sex width sex*width / solution;
    run;

Can use Proc GLM in place of Proc Reg. Will compare the slopes for male and female.
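To see how the sex*width term relates to the two separate slopes, here is a hedged Python sketch (not SAS). It assumes the usual PROC GLM parameterization in which the last class level is the reference with its estimate set to zero, and it assumes male sorts last; neither is confirmed by the slides. The male slope is the value reported earlier, and the interaction value is simply the female slope minus the male slope. The function name is hypothetical.

```python
def slopes_from_glm(width_coef, interaction_coefs):
    """Recover one slope per group from GLM-style estimates.

    width_coef: estimate for width (slope of the reference group).
    interaction_coefs: dict mapping group -> sex*width estimate,
        with 0.0 for the reference group (an assumed parameterization).
    """
    return {group: width_coef + adj for group, adj in interaction_coefs.items()}

# Male assumed to be the reference level; -1.78038 = 6.17430 - 7.95468.
slopes = slopes_from_glm(7.95468, {"female": -1.78038, "male": 0.0})
for group, slope in slopes.items():
    print(group, round(slope, 5))
```

Under these assumptions, testing whether the sex*width estimate differs from zero is the same as testing whether the two slopes differ.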

Output to Look For
• Because P < 0.05, we can say the two slopes are significantly different from one another and that male crabs are heavier than females of equal width.
