
Today: March 7

Presentation Transcript


  1. Today: March 7
     • Data Transformations
     • Rank Tests for Non-Normal Data
     • Solutions for Assignment 4

  2. Transformations: ANOVA and Regression
     • Common assumptions: normality, constant variance, linear relationship
     • What if these aren’t true?
     • One method: transform your data to help meet the necessary assumptions (choose a different scale of measurement)

  3. Transformations
     Common transformations: log base e (natural log), log base 10, square root, inverse
     Steps:
     • Choose your transformation
     • Re-check assumptions (residual plot)
     • Perform inference on transformed data
     Example of an inverse transformation: miles per hour to hours per mile
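The four transformations listed above can be sketched in a few lines of Python (the slides themselves use SAS; this is only an illustration). The function name `transform` and the speed of 50 miles per hour are assumptions made for the example.

```python
import math

def transform(x, kind):
    """Apply one of the common transformations to a positive value x."""
    if kind == "ln":
        return math.log(x)       # log base e
    if kind == "log10":
        return math.log10(x)     # log base 10
    if kind == "sqrt":
        return math.sqrt(x)
    if kind == "inverse":
        return 1.0 / x
    raise ValueError(f"unknown transformation: {kind}")

speed_mph = 50.0                                   # hypothetical speed
hours_per_mile = transform(speed_mph, "inverse")   # miles/hour -> hours/mile
print(hours_per_mile)
```

After transforming, the assumptions still have to be re-checked on the new scale before any inference is done.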

  4. PROC CONTENTS OUTPUT

     The CONTENTS Procedure

     Data Set Name:  TOMHS.BPSTUDY
     Member Type:    DATA
     Engine:         V8
     Created:        9:07 Saturday, February 26, 2005
     Last Modified:  9:07 Saturday, February 26, 2005
     Observations:          902
     Variables:             16
     Indexes:               0
     Observation Length:    128
     Deleted Observations:  0

     -----Alphabetic List of Variables and Attributes-----

      #    Variable    Type    Len    Pos
     ------------------------------------------
      3    AGE         Num       8     16
      6    CHOL12      Num       8     40
      2    GROUP       Num       8      8
      8    HDL12       Num       8     56
      9    PULSE12     Num       8     64
     10    PULSEBL     Num       8     72
      4    SBP12       Num       8     24
      5    SBPBL       Num       8     32
      1    SEX         Num       8      0
      7    TRIG12      Num       8     48
     11    WT12        Num       8     80
     12    WTBL        Num       8     88
     13    cholbl      Num       8     96
     14    hdlbl       Num       8    104
     16    id          Char      6    120
     15    trigbl      Num       8    112

     Triglyceride distributions are typically skewed.

  5. The UNIVARIATE Procedure
     Variable: TRIG12

         Histogram                                        #    Boxplot
     530+*                                               1       *
        .
        .*                                               1       *
        .
        .
     430+
        .*                                               1       *
        .
        .*                                               1       *
        .**                                              3       *
     330+*                                               1       0
        .**                                              3       0
        .*                                               2       0
        .*                                               2       0
        .***                                             5       0
     230+***                                             5       0
        .*******                                        13       |
        .**********                                     19       |
        .*********                                      18       |
        .**************                                 28       |
     130+***************************                    53    +-----+
        .*****************************                  58    |  +  |
        .*********************************************  89    *-----*
        .*********************************************  89    +-----+
        .*******************************                62       |
      30+*********                                      18       |
         ----+----+----+----+----+----+----+----+----+

  6. The UNIVARIATE Procedure
     Variable: TRIG12
     [Normal probability plot of TRIG12: vertical axis from 30 to 530, horizontal axis (normal quantiles) from -2 to +2]

  7. The UNIVARIATE Procedure
     Variable: TRIG12

     Moments
     N                  472           Sum Weights         472
     Mean               110.283898    Sum Observations    52054
     Std Deviation      64.9410309    Variance            4217.33749
     Skewness           2.2308045     Kurtosis            8.06124788
     Uncorrected SS     7727084       Corrected SS        1986365.96
     Coeff Variation    58.885324     Std Error Mean      2.98915323

     Basic Statistical Measures
     Location                 Variability
     Mean      110.2839       Std Deviation          64.94103
     Median     94.0000       Variance               4217
     Mode       83.0000       Range                  511.00000
                              Interquartile Range    68.5000

     Tests for Normality
     Test                  --Statistic---     -----p Value------
     Shapiro-Wilk          W     0.823599     Pr < W     <0.0001
     Kolmogorov-Smirnov    D     0.13479      Pr > D     <0.0100

  8. Taking LOG Transformation – Base 10

         X        log10 X
         10       1
         100      2
         1000     3
         10000    4

     Takes small values of X and spreads them out, and takes large values of X and brings them closer together.

     DATA temp;
       SET tomhs.bpstudy;
       logtrig12  = log10(trig12);   /* log base 10 */
       logtrig12x = log(trig12);     /* natural log (base e) */
     RUN;
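The effect described above can be checked numerically. The following Python sketch (an illustration, not part of the SAS slides) uses the same four X values as the table and compares the gaps between neighbouring values on the raw scale and on the log10 scale.

```python
import math

xs = [10, 100, 1000, 10000]
logs = [math.log10(x) for x in xs]

# On the raw scale each gap is ten times larger than the one before it.
raw_gaps = [b - a for a, b in zip(xs, xs[1:])]

# On the log10 scale all gaps have the same size, so the large values
# are pulled together relative to the small ones.
log_gaps = [b - a for a, b in zip(logs, logs[1:])]

print(raw_gaps)
print([round(g, 6) for g in log_gaps])
```

This is why a long right tail, as in the TRIG12 histogram, tends to shrink after a log transformation.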

  9. The UNIVARIATE Procedure
     Variable: logtrig12

          Histogram                                        #    Boxplot
     2.75+*                                               1       0
         .*                                               2       0
         .***                                             6       |
         .***                                             6       |
         .************                                   23       |
         .*********************                          41       |
         .******************************                 59    +-----+
     2.05+**************************************         76    |     |
         .*********************************************  89    *--+--*
         .**********************************             68    +-----+
         .**************************                     52       |
         .****************                               31       |
         .*******                                        13       |
         .*                                               2       |
     1.35+**                                              3       |
          ----+----+----+----+----+----+----+----+----+
          * may represent up to 2 counts

  10. The UNIVARIATE Procedure
      Variable: logtrig12
      [Normal probability plot of logtrig12: vertical axis from 1.35 to 2.75, horizontal axis (normal quantiles) from -2 to +2]

  11. The UNIVARIATE Procedure
      Variable: logtrig12

      Moments
      N                  472           Sum Weights         472
      Mean               1.98272777    Sum Observations    935.847505
      Std Deviation      0.22379335    Variance            0.05008346
      Skewness           0.20941868    Kurtosis            0.17927124
      Uncorrected SS     1879.12014    Corrected SS        23.5893114
      Coeff Variation    11.2871446    Std Error Mean      0.01030092

      Basic Statistical Measures
      Location                 Variability
      Mean      1.982728       Std Deviation          0.22379
      Median    1.973128       Variance               0.05008
      Mode      1.919078       Range                  1.34814
                               Interquartile Range    0.30586

      Tests for Normality
      Test                  --Statistic---     -----p Value------
      Shapiro-Wilk          W     0.996164     Pr < W      0.3132
      Kolmogorov-Smirnov    D     0.034423     Pr > D     >0.1500

  12. TOMHS Study
      • 6 treatment groups (variable GROUP)
        • Beta-blocker
        • Calcium channel blocker
        • Diuretic
        • Alpha-blocker
        • ACE inhibitor
        • Placebo
      • All treatment groups were given a lifestyle intervention to lower BP

  13. TOMHS Triglyceride Analyses
      • 3 treatment groups (variable GROUP)
        • Beta-blocker
        • Diuretic
        • Placebo
      Beta-blockers may increase triglycerides.

  14. LIBNAME tomhs 'C:\my documents\ph5415\';
      DATA temp;
        SET tomhs.bpstudy;
        logtrig12  = log10(trig12);
        logtrig12x = log(trig12);
        if group in(1,3,6);   /* select only groups 1, 3, and 6 */
      RUN;

  15. PROC GLM;
        CLASS group;
        MODEL trig12 logtrig12 logtrig12x = group;
      RUN;

      Dependent Variable: TRIG12
                                     Sum of
      Source             DF         Squares     Mean Square    F Value    Pr > F
      Model               2       31373.955       15686.978       3.76    0.0239
      Error             469     1954992.003        4168.426
      Corrected Total   471     1986365.958

      Dependent Variable: logtrig12 (Analyses Using LOG Scale, Base 10)
                                     Sum of
      Source             DF         Squares     Mean Square    F Value    Pr > F
      Model               2      0.35612380      0.17806190       3.59    0.0282
      Error             469     23.23318762      0.04953771
      Corrected Total   471     23.58931142

  16. PROC GLM;
        CLASS group;
        MODEL trig12 logtrig12 logtrig12x = group;
      RUN;

      Dependent Variable: logtrig12 (Analyses Using LOG Scale, Base 10)
                                     Sum of
      Source             DF         Squares     Mean Square    F Value    Pr > F
      Model               2      0.35612380      0.17806190       3.59    0.0282
      Error             469     23.23318762      0.04953771
      Corrected Total   471     23.58931142

      Dependent Variable: logtrig12x (Analyses Using LOG Scale, Base e)
                                     Sum of
      Source             DF         Squares     Mean Square    F Value    Pr > F
      Model               2       1.8881321       0.9440660       3.59    0.0282
      Error             469     123.1799936       0.2626439
      Corrected Total   471     125.068125
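The two log-scale tables give the same F value because ln(x) = ln(10) × log10(x): every sum of squares is multiplied by ln(10)², and the factor cancels in the F ratio. A short Python check, using only the numbers printed in the tables above (the variable and function names are illustrative):

```python
import math

# Sums of squares copied from the two log-scale ANOVA tables.
ss_log10 = {"model": 0.35612380, "error": 23.23318762}
ss_ln = {"model": 1.8881321, "error": 123.1799936}
df_model, df_error = 2, 469

def f_value(ss):
    """F = (model mean square) / (error mean square)."""
    return (ss["model"] / df_model) / (ss["error"] / df_error)

# Changing the log base rescales the response by ln(10),
# so sums of squares are rescaled by ln(10) squared.
scale = math.log(10) ** 2

f_base10 = f_value(ss_log10)
f_base_e = f_value(ss_ln)

print(round(f_base10, 2), round(f_base_e, 2))
print(round(ss_log10["model"] * scale, 5), ss_ln["model"])
```

The choice of log base therefore affects the size of the estimates but not the test of group differences.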

  17. PROC GLM;
        CLASS group;
        MODEL trig12 logtrig12 logtrig12x = group;
        MEANS group;
        ESTIMATE 'BB vs Diur' group 1 -1 0;
        ESTIMATE 'BB vs Plac' group 1 0 -1;
      RUN;

      The GLM Procedure

      Level of          ---------TRIG12---------    -------logtrig12--------    -------logtrig12x-------
      GROUP      N            Mean       Std Dev          Mean       Std Dev          Mean       Std Dev
      1        125      121.800000    73.6913791    2.02229444    0.23006400    4.65650504    0.52974193
      3        125      112.856000    72.2005165    1.98992986    0.22306638    4.58198284    0.51362933
      6        222      102.351351    53.6124266    1.95639400    0.21796954    4.50476365    0.50189340

      Note: SDs are much closer between groups on the log scale.

  18. PROC GLM;
        CLASS group;
        MODEL trig12 logtrig12 logtrig12x = group;
        MEANS group;
        ESTIMATE 'BB vs Diur' group 1 -1 0;
        ESTIMATE 'BB vs Plac' group 1 0 -1;
      RUN;

      Dependent Variable: TRIG12
                                       Standard
      Parameter         Estimate          Error    t Value    Pr > |t|
      BB vs Diur       8.9440000     8.16668985       1.10      0.2740
      BB vs Plac      19.4486486     7.21970271       2.69      0.0073

      Dependent Variable: logtrig12
                                       Standard
      Parameter         Estimate          Error    t Value    Pr > |t|
      BB vs Diur      0.03236458     0.02815321       1.15      0.2509
      BB vs Plac      0.06590045     0.02488864       2.65      0.0084

      Dependent Variable: logtrig12x
                                       Standard
      Parameter         Estimate          Error    t Value    Pr > |t|
      BB vs Diur      0.07452220     0.06482517       1.15      0.2509
      BB vs Plac      0.15174139     0.05730822       2.65      0.0084
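Each ESTIMATE line is a difference between two group means, with a standard error built from the pooled error mean square. As a rough Python check for the raw TRIG12 scale, using the group means and sizes from slide 17 and the error mean square from slide 15 (the function name `contrast` is an assumption for this sketch):

```python
import math

# Group sizes and TRIG12 means from the MEANS output (groups 1, 3, 6),
# and the error mean square from the TRIG12 ANOVA table.
n = {1: 125, 3: 125, 6: 222}
mean_trig12 = {1: 121.800000, 3: 112.856000, 6: 102.351351}
mse_trig12 = 4168.426

def contrast(a, b):
    """Return (estimate, standard error, t) for mean of group a minus group b."""
    estimate = mean_trig12[a] - mean_trig12[b]
    std_error = math.sqrt(mse_trig12 * (1 / n[a] + 1 / n[b]))
    return estimate, std_error, estimate / std_error

bb_vs_diur = contrast(1, 3)   # beta-blocker minus diuretic
bb_vs_plac = contrast(1, 6)   # beta-blocker minus placebo

for label, (est, se, t) in (("BB vs Diur", bb_vs_diur), ("BB vs Plac", bb_vs_plac)):
    print(f"{label}: estimate={est:.4f} se={se:.4f} t={t:.2f}")
```

The same arithmetic applied to the log-scale means and error mean squares gives the other two tables.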

  19. Interpretation of Differences Using Natural Log Scale (Base e)

                                       Standard
      Parameter         Estimate          Error    t Value    Pr > |t|
      BB vs Diur      0.07452220     0.06482517       1.15      0.2509
      BB vs Plac      0.15174139     0.05730822       2.65      0.0084

      0.0745 indicates that BB increases triglycerides by approximately 7.45% compared to diuretic.
      More precise estimate is 100*(exp(0.0745) - 1) = 7.7%
      0.152 indicates that BB increases triglycerides by approximately 15.2% compared to placebo.
      More precise estimate is 100*(exp(0.152) - 1) = 16.4%
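The back-transformation can be reproduced in Python from the estimates in the table above. The function name `percent_change` is illustrative; the second part of the sketch checks that the base 10 estimate from slide 18 leads to the same percentage when 10 is used as the base.

```python
import math

def percent_change(log_difference, base=math.e):
    """Convert a difference in log means to a percent difference."""
    return 100 * (base ** log_difference - 1)

# Natural-log estimates (logtrig12x).
bb_vs_diur = percent_change(0.07452220)
bb_vs_plac = percent_change(0.15174139)

# Base 10 estimate (logtrig12) for the same contrast.
bb_vs_diur_base10 = percent_change(0.03236458, base=10)

print(f"{bb_vs_diur:.1f}% {bb_vs_plac:.1f}% {bb_vs_diur_base10:.1f}%")
```

For small differences the natural-log estimate itself is close to the proportional change, which is why 0.0745 reads as roughly 7.45%; the approximation gets worse as the difference grows.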

  20. USING WILCOXON RANK TEST
      • Each observation is given a rank score from 1 to n. Analysis is done on these ranked values.

      PROC NPAR1WAY WILCOXON;
        CLASS group;
        VAR trig12;
      RUN;

      The NPAR1WAY Procedure

      Wilcoxon Scores (Rank Sums) for Variable TRIG12
      Classified by Variable GROUP

                        Sum of      Expected       Std Dev          Mean
      GROUP      N      Scores      Under H0      Under H0         Score
      -------------------------------------------------------------------
      3        125    29614.50      29562.50    1307.50251    236.916000
      6        222    49734.00      52503.00    1479.00376    224.027027
      1        125    32279.50      29562.50    1307.50251    258.236000

      Average scores were used for ties.

      Kruskal-Wallis Test
      Chi-Square           5.0323
      DF                        2
      Pr > Chi-Square      0.0808
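The Kruskal-Wallis statistic can be approximately rebuilt from the rank sums alone. The Python sketch below uses the standard formula without the correction for ties, so it comes out slightly below the 5.0323 that SAS reports; the raw data would be needed to apply the tie correction.

```python
import math

# Rank sums and group sizes from the NPAR1WAY output.
rank_sums = {3: 29614.50, 6: 49734.00, 1: 32279.50}
sizes = {3: 125, 6: 222, 1: 125}

n_total = sum(sizes.values())

# Expected rank sum under H0 for each group: n_i * (N + 1) / 2.
expected = {g: sizes[g] * (n_total + 1) / 2 for g in sizes}

# Kruskal-Wallis H, uncorrected for ties.
h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[g] ** 2 / sizes[g] for g in sizes
) - 3 * (n_total + 1)

# With 3 groups H is referred to a chi-square with 2 degrees of freedom,
# whose upper-tail probability is exp(-h / 2).
p_value = math.exp(-h / 2)

print(round(h, 2), round(p_value, 3))
```

Note that the rank-based p-value (about 0.08) is larger than the p-values from the ANOVA on either the raw or the log scale.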
