Today: Feb 28


**Today: Feb 28**

• Reading data from an existing SAS dataset
• One-way ANOVA
• Reading Le 7:5
• Reading C&S 7:A-H

**Reading SAS Datasets**

Sometimes your “raw” data is already a SAS dataset.

    LIBNAME tomhs 'c:/my documents/ph5415/';
    PROC CONTENTS DATA=tomhs.bpstudy;
    PROC PRINT DATA=tomhs.bpstudy (obs=10);
    RUN;

The LIBNAME statement tells SAS which directory (folder) the dataset is in. `DATA=tomhs.bpstudy` tells SAS to look for a SAS dataset called bpstudy in the directory referenced by tomhs.

**PROC CONTENTS OUTPUT**

The CONTENTS Procedure

| Attribute | Value |
|---|---|
| Data Set Name | TOMHS.BPSTUDY |
| Member Type | DATA |
| Engine | V8 |
| Created | 9:07 Saturday, February 26, 2005 |
| Last Modified | 9:07 Saturday, February 26, 2005 |
| Observations | 902 |
| Variables | 16 |
| Indexes | 0 |
| Observation Length | 128 |
| Deleted Observations | 0 |

Alphabetic List of Variables and Attributes

| # | Variable | Type | Len | Pos |
|---|---|---|---|---|
| 3 | AGE | Num | 8 | 16 |
| 6 | CHOL12 | Num | 8 | 40 |
| 2 | GROUP | Num | 8 | 8 |
| 8 | HDL12 | Num | 8 | 56 |
| 9 | PULSE12 | Num | 8 | 64 |
| 10 | PULSEBL | Num | 8 | 72 |
| 4 | SBP12 | Num | 8 | 24 |
| 5 | SBPBL | Num | 8 | 32 |
| 1 | SEX | Num | 8 | 0 |
| 7 | TRIG12 | Num | 8 | 48 |
| 11 | WT12 | Num | 8 | 80 |
| 12 | WTBL | Num | 8 | 88 |
| 13 | cholbl | Num | 8 | 96 |
| 14 | hdlbl | Num | 8 | 104 |
| 16 | id | Char | 6 | 120 |
| 15 | trigbl | Num | 8 | 112 |

**PROC PRINT – 10 Observations**

| Obs | SEX | GROUP | AGE | SBP12 | SBPBL | CHOL12 | TRIG12 | HDL12 | PULSE12 | PULSEBL | WT12 | WTBL | cholbl | hdlbl | trigbl | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 54 | . | 139.5 | . | . | . | . | 76 | . | 224.0 | 205 | 24 | 179 | A00001 |
| 2 | 2 | 6 | 62 | 129 | 144.0 | 241 | 65 | 66 | 80 | 72 | 124.0 | 141.0 | 260 | 75 | 67 | A00010 |
| 3 | 2 | 5 | 64 | 118 | 141.0 | 307 | 425 | 41 | 80 | 81 | 144.0 | 157.0 | 228 | 29 | 564 | A00021 |
| 4 | 1 | 5 | 47 | . | 134.0 | . | . | . | . | 80 | . | 214.0 | 194 | 66 | 49 | A00023 |
| 5 | 1 | 3 | 51 | . | 132.5 | . | . | . | . | 73 | . | 206.5 | 226 | 40 | 53 | A00056 |
| 6 | 1 | 2 | 62 | 133 | 133.0 | 196 | 72 | 44 | 72 | 76 | 211.0 | 227.5 | 207 | 47 | 126 | A00075 |
| 7 | 2 | 2 | 59 | 113 | 136.0 | 231 | 75 | 61 | 72 | 74 | 125.0 | 137.0 | 214 | 62 | 119 | A00083 |
| 8 | 1 | 3 | 63 | 127 | 137.5 | 217 | 137 | 35 | 64 | 74 | 195.0 | 211.5 | 214 | 37 | 165 | A00105 |
| 9 | 2 | 4 | 64 | 122 | 151.0 | 201 | 57 | 44 | 56 | 63 | 150.0 | 159.5 | 214 | 47 | 133 | A00133 |
| 10 | 2 | 5 | 52 | 122 | 140.0 | 209 | 105 | 57 | 60 | 81 | 168.5 | 196.5 | 215 | 55 | 105 | A00143 |

**Reading a SAS Dataset**

    DATA temp;
    SET tomhs.bpstudy;
    sbpdif = sbp12-sbpbl;
    PROC MEANS DATA=temp;
    RUN;

The SET statement reads in an observation. It replaces the INFILE and INPUT statements used when reading in text data.

The MEANS Procedure

| Variable | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
| SEX | 902 | 1.3824834 | 0.4862633 | 1.0000000 | 2.0000000 |
| GROUP | 902 | 3.7882483 | 1.7874130 | 1.0000000 | 6.0000000 |
| AGE | 902 | 54.7727273 | 6.4039396 | 44.0000000 | 69.0000000 |
| SBP12 | 848 | 124.1002358 | 15.1891840 | 87.0000000 | 187.0000000 |
| SBPBL | 902 | 140.3636364 | 12.4446043 | 113.5000000 | 190.0000000 |
| CHOL12 | 849 | 220.8386337 | 38.8624342 | 111.0000000 | 456.0000000 |
| TRIG12 | 849 | 106.9634865 | 62.5307082 | 24.0000000 | 592.0000000 |
| HDL12 | 849 | 45.4923439 | 12.1059688 | 18.0000000 | 102.0000000 |
| PULSE12 | 847 | 69.3506494 | 10.0301471 | 44.0000000 | 112.0000000 |
| PULSEBL | 901 | 73.6925638 | 8.6698610 | 48.0000000 | 109.0000000 |
| WT12 | 848 | 176.8225236 | 30.4251368 | 105.5000000 | 286.0000000 |
| WTBL | 902 | 187.3791574 | 31.0782720 | 113.0000000 | 289.2500000 |
| cholbl | 900 | 228.2511111 | 38.4169684 | 113.0000000 | 357.0000000 |
| hdlbl | 900 | 43.6122222 | 11.6124701 | 17.0000000 | 97.0000000 |
| trigbl | 900 | 131.7366667 | 76.5211232 | 17.0000000 | 815.0000000 |
| sbpdif | 848 | -16.5176887 | 14.4532685 | -75.5000000 | 30.0000000 |

**One-Way Analysis of Variance**

• Two-sample t-test: compare means of two groups
• Are the means different?
• What if we have more than two groups? Examples:
  • compare three different behavioral interventions
  • compare 5 different BP drugs

**Analysis of Variance**

Could compare all pairs of means with t-tests:

• three groups: A-B, B-C, A-C
• five groups: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E

**Analysis of Variance**

Problem: multiple comparisons!
When performing many tests, you may reject a null hypothesis by chance (Type I error). With $\alpha = 0.05$, you allow for the possibility of rejecting 1 out of 20 tests by chance. Even if all group means are equal, there is a fairly large chance that at least one pair will appear different.

**Analysis of Variance**

ANOVA simultaneously tests for a difference in k means

• Y is continuous
• k samples from k normal distributions
• each of size $n_i$, not necessarily equal
• each with a possibly different mean
• each with constant variance $\sigma^2$

**Constant variance**

ANOVA is robust to violations of constant variance (and normality).

Rule of thumb: if the largest standard deviation is less than twice the smallest standard deviation, you’re ok.

Can sometimes transform to achieve equal variance or normality.

**Analysis of Variance**

The two-sample t-test is the special case k = 2.

$H_0: \mu_1 = \mu_2 = \dots = \mu_k$

$H_a$: not all $\mu_i$ equal

For each group i:

• $n_i$ = number of observations
• $\bar{y}_i$ = sample mean
• $s_i^2$ = sample variance

$\bar{y}$ = overall mean

Sometimes referred to as a global or omnibus test.

**Two-sample T-test**

• Compared means for two groups
• This compares variation between groups with variation within groups

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

The numerator is the variation between groups; the denominator is the variation within groups.

**ANOVA F-test**

• Compared means for all groups
• This compares variation between groups with variation within groups

$$F = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 / (k-1)}{s_p^2}$$

The numerator is the variation between groups, compared to the grand mean; the denominator is the variation within groups.

**Analysis of Variance**

Variation for all observations, where $y_{ij}$ is the jth observation in group i:

$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$

• Called the “(corrected) total sum of squares” or SST
• Can be divided into two parts:
  • deviation of individual observation from its sample mean
  • deviation of sample means from overall mean

Similar to regression.

**Analysis of Variance**

$$y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$$

• $y_{ij} - \bar{y}_i$ measures variation within samples
• $\bar{y}_i - \bar{y}$ measures variation between samples

Each has a corresponding “sum of squares”:

• Sum of squares within (SSW)
• Sum of squares between (SSB)

**Analysis of Variance**

Each has a corresponding degrees of freedom (DF):

• SST has n-1 df
• SSB has k-1 df
• SSW has (n-1) - (k-1) = n-k df
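The partition into within and between parts, together with the degrees of freedom just listed, can be written as a single identity. The following is a sketch in the notation of these notes; the double-subscript $y_{ij}$ (the jth observation in group i) is an assumption, since the original slide formulas did not survive extraction.

```latex
\begin{align*}
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij}-\bar{y})^2}_{\text{SST},\ n-1\ \text{df}}
&= \underbrace{\sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2}_{\text{SSB},\ k-1\ \text{df}}
 + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i)^2}_{\text{SSW},\ n-k\ \text{df}}
\end{align*}
```

The cross-product term drops out because the deviations $y_{ij}-\bar{y}_i$ sum to zero within each group. As a check against the TOMHS output reported later in these notes (n = 848, k = 6): 13149.8402 + 163785.8945 = 176935.7347 for the sums of squares, and 5 + 842 = 847 for the degrees of freedom.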
The ratio of each sum of squares to its degrees of freedom gives the mean squares:

• MSW = SSW / (n-k) = average variation within k samples
• MSB = SSB / (k-1) = average variation between k samples

**Analysis of Variance**

MSW is an estimate of the common variance, $\sigma^2$

MSW = SSW/(n-k)

$$\text{SSW} = \sum_{i=1}^{k} (n_i - 1) s_i^2$$

• $s_i^2$ = sample variance for ith group
• $s_p^2$ = MSW = pooled variance for k groups

**Analysis of Variance**

The null hypothesis is tested by looking at the F ratio:

F = MSB/MSW, compared to an F distribution with k-1, n-k df

If variation between groups is much greater than variation within groups:

• $F \gg 1$: reject null hypothesis
• $F \approx 1$: fail to reject null hypothesis

**Analysis of Variance**

Results are often presented in an ANOVA table:

| Source | DF | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| Between | k-1 | SSB | MSB | MSB/MSW |
| Within | n-k | SSW | MSW | |
| Total | n-1 | SST | | |

SAS uses “Model” for “Between” and “Error” for “Within”.

**ANOVA in SAS; two ways**

    PROC ANOVA DATA = LIPID;
    CLASS diet;
    MODEL lipid = diet;
    RUN;

    PROC GLM DATA = LIPID;
    CLASS diet;
    MODEL lipid = diet;
    RUN;

Both test for a difference in mean lipid reduction for the two diets.

**PROC ANOVA and GLM**

• Almost exactly the same for this case
• GLM is a more general procedure

**TOMHS Study**

• 6 treatment groups (variable GROUP)
  • Beta-blocker
  • Calcium channel blocker
  • Diuretic
  • Alpha-blocker
  • ACE inhibitor
  • Placebo
• All treatment groups were given a lifestyle intervention to lower BP

**ANOVA – TOMHS Study**

    PROC GLM DATA=temp;
    CLASS group;
    MODEL sbpdif = group;
    MEANS group;
    RUN;

The CLASS statement creates 5 dummy variables for you.

OUTPUT

The GLM Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| GROUP | 6 | 1 2 3 4 5 6 |

Number of observations: 902

NOTE: Due to missing values, only 848 observations can be used in this analysis

**GLM – OUTPUT**

The GLM Procedure

Dependent Variable: sbpdif

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 5 | 13149.8402 | 2629.9680 | 13.52 | <.0001 |
| Error | 842 | 163785.8945 | 194.5201 | | |
| Corrected Total | 847 | 176935.7347 | | | |

| R-Square | Coeff Var | Root MSE | sbpdif Mean |
|---|---|---|---|
| 0.074320 | -84.43703 | 13.94705 | -16.51769 |

Notes on this output:

• The first table is the ANOVA table.
• If H0 is true then F should be near 1. Here F = 2629.97/194.52 = 13.52.
• Root MSE is the pooled (over 6 groups) standard deviation. It estimates $\sigma$.

**GLM – OUTPUT**

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| GROUP | 5 | 13149.84018 | 2629.96804 | 13.52 | <.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| GROUP | 5 | 13149.84018 | 2629.96804 | 13.52 | <.0001 |

With no covariates in the model, this portion of the output is the same as the ANOVA table, because the model includes only GROUP.

The GLM Procedure

| Level of GROUP | N | sbpdif Mean | sbpdif Std Dev |
|---|---|---|---|
| 1 | 126 | -20.0555556 | 15.3474717 |
| 2 | 121 | -17.5289256 | 11.6080607 |
| 3 | 124 | -21.8467742 | 14.4977118 |
| 4 | 129 | -16.0697674 | 14.0005223 |
| 5 | 127 | -17.6023622 | 13.1844874 |
| 6 | 221 | -10.5950226 | 14.3539675 |

**Contrasts**

    PROC GLM DATA=temp;
    CLASS group;
    MODEL sbpdif = group;
    MEANS group;
    ESTIMATE 'BB vs Placebo' group 1 0 0 0 0 -1;
    ESTIMATE 'CCB vs Placebo' group 0 1 0 0 0 -1;
    ESTIMATE 'Diur vs Placebo' group 0 0 1 0 0 -1;
    ESTIMATE 'AB vs Placebo' group 0 0 0 1 0 -1;
    ESTIMATE 'ACE vs Placebo' group 0 0 0 0 1 -1;
    RUN;

OUTPUT

The GLM Procedure

Dependent Variable: sbpdif

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| BB vs Placebo | -9.4605329 | 1.55691725 | -6.08 | <.0001 |
| CCB vs Placebo | -6.9339030 | 1.57727142 | -4.40 | <.0001 |
| Diur vs Placebo | -11.2517516 | 1.56489344 | -7.19 | <.0001 |
| AB vs Placebo | -5.4747448 | 1.54534422 | -3.54 | 0.0004 |
| ACE vs Placebo | -7.0073396 | 1.55300848 | -4.51 | <.0001 |

**Compare all Groups**

    PROC GLM DATA=temp;
    CLASS group;
    MODEL sbpdif = group;
    LSMEANS group / PDIF;
    RUN;

**GLM – OUTPUT**

The GLM Procedure

Least Squares Means

| GROUP | sbpdif LSMEAN | LSMEAN Number |
|---|---|---|
| 1 | -20.0555556 | 1 |
| 2 | -17.5289256 | 2 |
| 3 | -21.8467742 | 3 |
| 4 | -16.0697674 | 4 |
| 5 | -17.6023622 | 5 |
| 6 | -10.5950226 | 6 |

Least Squares Means for effect GROUP

Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: sbpdif

| i/j | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | | 0.1550 | 0.3103 | 0.0228 | 0.1622 | <.0001 |
| 2 | 0.1550 | | 0.0156 | 0.4087 | 0.9669 | <.0001 |
| 3 | 0.3103 | 0.0156 | | 0.0010 | 0.0161 | <.0001 |
| 4 | 0.0228 | 0.4087 | 0.0010 | | 0.3796 | 0.0004 |
| 5 | 0.1622 | 0.9669 | 0.0161 | 0.3796 | | <.0001 |
| 6 | <.0001 | <.0001 | <.0001 | 0.0004 | <.0001 | |

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

The entry in row 1, column 2 (0.1550) is the P-value for Group 1 v Group 2.
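The NOTE in the output warns that the unadjusted pairwise P-values do not protect the overall error rate across all 15 comparisons. One common remedy is to request adjusted P-values through the `ADJUST=` option of the `LSMEANS` statement. The following is a minimal sketch and not part of the original lecture: it assumes the `temp` dataset created earlier, and the choice of the Tukey and Dunnett adjustments, and of group 6 (placebo) as the control level, is made here for illustration only.

```sas
/* Sketch: multiplicity-adjusted comparisons for the TOMHS example.   */
/* Assumes the dataset temp (with sbpdif and group) created earlier.  */
PROC GLM DATA=temp;
  CLASS group;
  MODEL sbpdif = group;
  /* All pairwise comparisons, with Tukey-adjusted P-values */
  LSMEANS group / PDIFF ADJUST=TUKEY;
  /* Each active treatment vs placebo (assumed to be coded as 6), */
  /* with Dunnett-adjusted P-values                                */
  LSMEANS group / PDIFF=CONTROL('6') ADJUST=DUNNETT;
RUN;
QUIT;
```

As a rough hand check, a Bonferroni correction would compare each unadjusted P-value with 0.05/15 ≈ 0.0033. By that standard, every comparison with group 6 and the group 3 v group 4 comparison (0.0010) remain significant, while 1 v 4 (0.0228), 2 v 3 (0.0156), and 3 v 5 (0.0161) do not.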