330 likes | 355 Views
Learn to restructure datasets in SAS by collapsing multiple observations per patient to one observation per patient, with sample programs.
E N D
Lesson 9 - Topics • Restructuring datasets • LSB: 6:14 • Programs 16-18 in course notes
Some Tabulate Options PROCTABULATE S=[cellwidth=150]; CLASS sex ; VAR varlist / S=[cellwidth=275]; TABLE …/ box=‘Dietary Nutrient’; RUN;
Two Structures for Data A001 125 130 140 01/15/95 07/15/95 01/15/96 A001 125 01/15/95 A001 130 07/15/95 A001 140 01/15/96 3 obs/patient Advantages for each format
Restructure Data Read in multiple observations per patient and write out one observation per patient Read in one observation per patient and write out multiple observations per patient
Restructuring Datasets Data for a patient sometimes comes as multiple observations ptid visit date weight id1 0 01/15/98 200 id1 3 04/18/98 190 id1 6 07/13/98 196 id2 0 11/14/98 165 id2 3 02/18/99 160 id2 6 05/13/99 155 Wish to collapse data so all information for a patient is contained on 1 observation
Program 16 Collapsing multiple observations per patient to one observation per patient DATA visit; INFILE‘C:\SAS_Files\multrec.data'; INPUT ptid $ visit group weight sbp; RUN; PROCSORTDATA=visit; BY ptid visit; PROCPRINTDATA = visit ; TITLE'Display of Mult obs/patient dataset '; RUN;
Display of Mult obs/patient dataset Obs ptid visit group weight sbp 1 A00083 3 2 128 112 2 A00083 6 2 124 105 3 A00083 9 2 125 113 4 A00083 12 2 125 113 5 A00301 3 6 205 118 6 A00301 6 6 209 109 7 A00312 3 3 134 127 8 A00312 6 3 134 129 9 A00312 9 3 132 135 10 A00312 12 3 131 113
Want dataset to look like this: Obs ptid group wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12 1 A00083 2 128 124 125 125 112 105 113 113 2 A00301 6 205 209 . . 118 109 . . 3 A00312 3 134 134 132 131 127 129 135 113 4 A00354 3 149 145 142 140 124 139 124 116 5 A00400 5 190 . . . 121 . . . 6 A00504 4 184 186 176 181 112 124 118 115 7 A00608 2 . 188 189 . . 130 114 . Several ways to do this.
/******************************************************* Here is one solution. Separate datasets are created for each visit and then merged. The visit-varying variables are renamed using PROC DATASETS. ********************************************************/ DATA visit3 visit6 visit9 visit12; SET visit; if visit = 3thenoutput visit3; else if visit = 6thenoutput visit6; else if visit = 9thenoutput visit9; else if visit =12thenoutput visit12; run; These all have one row per id
* Need to rename variables to have unique names; PROCDATASETS; MODIFY visit3; RENAME weight = weight3 sbp = sbp3; MODIFY visit6; RENAME weight = weight6 sbp = sbp6; MODIFY visit9; RENAME weight = weight9 sbp = sbp9; MODIFY visit12; RENAME weight = weight12 sbp = sbp12; RUN; DATA patient; MERGE visit3 visit6 visit9 visit12; BY ptid; PROCPRINT; TITLE'Desired Dataset Using Separate Dataset Technique'; RUN; Procedure to add/change names, labels, or formats
PROCPRINTDATA=patient (OBS=7); TITLE'1-Obs per patient dataset'; RUN; 1-Obs per patient dataset Obs ptid group wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12 1 A00083 2 128 124 125 125 112 105 113 113 2 A00301 6 205 209 . . 118 109 . . 3 A00312 3 134 134 132 131 127 129 135 113 4 A00354 3 149 145 142 140 124 139 124 116 5 A00400 5 190 . . . 121 . . . 6 A00504 4 184 186 176 181 112 124 118 115 7 A00608 2 . 188 189 . . 130 114 .
A General WayUsing Data Step • Read-in 1st obs for a patient • Create new variables to hold values of weight and BP from 1st obs of patient • Repeat above 2 steps for 2nd obs of patient • When all data for a patient is complete then output variables to dataset
PROGRAM 16 DATA patient; SET visit; BY ptid; RETAIN wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12; if FIRST.ptid = 1thendo; wt3=.; wt6=.; wt9=.; wt12=.; sbp3=.; sbp6=.; sbp9=.; sbp12=.; end; if visit = 3thendo ; wt3 = weight; sbp3 = sbp; end; if visit = 6thendo ; wt6 = weight; sbp6 = sbp; end; if visit = 9thendo ; wt9 = weight; sbp9 = sbp; end; if visit = 12thendo ; wt12 = weight; sbp12 = sbp; end; if LAST.ptid = 1thenOUTPUT; KEEP ptid group sbp3 sbp6 sbp9 sbp12 wt3 wt6 wt9 wt12;
SET visit; BY ptid; Creates two special variables called FIRST.ptid = 1 when reading first obs for a patient = 0 otherwise LAST.ptid = 1 when reading last obs for a patient = 0 otherwise With these variables you can know if you are processing the first or last observation for a patient
* Tells SAS not to reset these variables to missing when going to top of datastep; RETAIN wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12; * Set variables to missing when reading new patient – clear previous patients data!; if FIRST.ptid = 1thendo; wt3=.; wt6=.; wt9=.; wt12=.; sbp3=.; sbp6=.; sbp9=.; sbp12=.; end;
* Assign variables depending on visit; if visit = 3thendo ; wt3 = weight; sbp3 = sbp; end; if visit = 6thendo ; wt6 = weight; sbp6 = sbp; end; if visit = 9thendo ; wt9 = weight; sbp9 = sbp; end; if visit = 12thendo ; wt12 = weight; sbp12 = sbp; end; *Output variables only when done with patient; if LAST.ptid = 1thenOUTPUT;
PROCPRINTDATA=patient (OBS=7); TITLE'1-Obs per patient dataset'; RUN; 1-Obs per patient dataset Obs ptid group wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12 1 A00083 2 128 124 125 125 112 105 113 113 2 A00301 6 205 209 . . 118 109 . . 3 A00312 3 134 134 132 131 127 129 135 113 4 A00354 3 149 145 142 140 124 139 124 116 5 A00400 5 190 . . . 121 . . . 6 A00504 4 184 186 176 181 112 124 118 115 7 A00608 2 . 188 189 . . 130 114 .
PROCTRANSPOSEDATA=visit out=wtdata PREFIX = wt; BY ptid; ID visit; VAR weight ; PROCPRINTDATA=wtdata; TITLE'Weights For Patient On Same Record'; Weights For Patient On Same Record Obs ptid _NAME_ wt3 wt6 wt9 wt12 1 A00083 weight 128 124 125 125 2 A00301 weight 205 209 . . 3 A00312 weight 134 134 132 131 4 A00354 weight 149 145 142 140 5 A00400 weight 190 . . . 6 A00504 weight 184 186 176 181 7 A00608 weight . 188 189 . PROC TRANSPOSE can sometimes help you restructure datasets
Going the Other Way From: A001 125 130 140 01/15/95 07/15/95 01/15/96 To: A001 125 01/15/95 A001 130 07/15/95 A001 140 01/15/96 3 obs/patient
Program 17 DATA multrec ; INFILE‘C:\SAS_Files\' OBS=8 ; INPUT @1 ptid $10. @14 rdate mmddyy10. @25 group 1. @63 date6 mmddyy10. @74 date12 mmddyy10. @115 sbpbl 3. @119 sbp6 3. @123 sbp12 3. ;
* Creating 3 observations per patient; sbp = sbpbl; visit = '00'; datev = rdate; OUTPUT; *output 1st obs for patient; sbp = sbp6; *change values visit = '06'; datev = date6; OUTPUT; *output 2nd obs with new values; sbp = sbp12; visit = '12'; datev = date12; *output 3rd obs and values OUTPUT; KEEP ptid group sbp visit datev;
PROCSORTDATA=multrec; BY ptid visit; PROCPRINTDATA=multrec; VAR ptid group sbp visit datev; RUN; Listing of Dataset With Multiple Observations Per Patient Obs ptid group sbp visit datev 1 A01088 6 154 00 02/23/88 2 A01088 6 116 06 08/15/88 3 A01088 6 117 12 02/13/89 4 B00025 6 162 00 12/05/86 5 B00025 6 122 06 06/03/87 6 B00025 6 130 12 12/03/87 7 B00644 6 151 00 05/21/87 8 B00644 6 140 06 11/24/87 9 B00644 6 . 12 . 10 B00714 4 130 00 06/12/87 11 B00714 4 132 06 01/14/88 12 B00714 4 144 12 06/16/88
PROCPRINTDATA=multrec; VAR ptid group sbp visit datev; WHERE sbp > 140 ; TITLE'Observations With SBP > 140'; RUN; Observations With SBP > 140 Obs ptid group sbp visit datev 1 A01088 6 154 00 02/23/88 4 B00025 6 162 00 12/05/86 7 B00644 6 151 00 05/21/87 12 B00714 4 144 12 06/16/88 16 B01408 2 144 00 11/12/87 19 C03615 5 143 00 11/10/87 23 D01348 2 150 06 05/23/88 24 D01348 2 142 12 11/30/88
PROCSORT; BY ptid DESCENDING sbp; DATA highbp; SET multrec; BY ptid; if FIRST.ptid; *Select the first obs for each patient - highest BP; PROCPRINTDATA=highbp; VAR ptid group sbp visit datev; TITLE'Listing of Highest SBP for Each Patient'; RUN; Listing of Highest SBP for Each Patient Obs ptid group sbp visit datev 1 A01088 6 155 00 02/23/88 2 B00025 6 162 00 12/05/86 3 B00644 6 151 00 05/21/87 4 B00714 4 144 12 06/16/88
Program 18 *Want to know the most common side effects ; LIBNAME class ‘C:\SAS_Files'; DATA se; SET class.tomhs (KEEP = ptid group clinic sex se12_1-se12_20 ); sidenum = 1; severity = se12_1; if severity > 1thenoutput; sidenum = 2; severity = se12_2; if severity > 1thenoutput;
Program 18 *Want to know the most common side effects ; LIBNAME class ‘C:\SAS_Files'; DATA se; SET class.tomhsp (KEEP = ptid group clinic sex se12_1-se12_20 ); ARRAY se(20) se12_1 - se12_20; DO sidenum = 1to20; severity = se(sidenum); if severity > 1thenOUTPUT; output only if have se; END; KEEP ptid sidenum group severity; RUN;
PROCPRINTDATA=se (OBS=20); TITLE'List of Side Effects by Patient'; RUN; List of Side Effects by Patients - First 20 Obs ptid group sidenum severity 1 A00083 2 13 2 2 A00301 6 2 2 3 A00301 6 13 2 4 A00301 6 18 2 5 A00312 3 6 2 6 A00312 3 8 2 7 A00312 3 13 2 8 A00354 3 3 2 9 A00504 4 3 2 10 A00504 4 6 2 Note: Number of obs on dataset is total number of se
PROCFORMAT; VALUE setype 1='Drowsiness' 2='Tiredness' 3='Faintness' 4='Itchy Skin' 5='Skin Rash' 6='Headaches' 7='Ringing in Ears' 8='Stuffy Nose' 9='Dry Mouth' 10='Cough' 11='Fast Heart Rate' 12='Chest Pain' 13='Joint Pain' 14='Swelling Feet' 15='Muscle Cramps' 16='Numbness' 17='Trouble Sleeping' 18='Wake Up Early' 19='Mood Changes' 20='Depressed';
PROCFREQDATA=se ; TABLES sidenum; FORMAT sidenum setype. ; TITLE'Number of Patients With Indicated Side Effect In Order on Form'; RUN; PROCFREQDATA=se ORDER = FREQ ; TABLES sidenum; FORMAT sidenum setype. ; TITLE'Number of Patients With Indicated Side Effect - In Order of Frequency'; RUN;
Number of Patients With Indicated Side Effect In Order on Form Cum Cum sidenum Frequency Percent Frequency Percent Drowsiness 11 4.42 11 4.42 Tiredness 19 7.63 30 12.05 Faintness 3 1.20 33 13.25 Itchy Skin 13 5.22 46 18.47 Skin Rash 8 3.21 54 21.69 Headaches 23 9.24 77 30.92 Ringing in Ears 10 4.02 87 34.94 Stuffy Nose 28 11.24 115 46.18 Dry Mouth 9 3.61 124 49.80 Cough 13 5.22 137 55.02 Fast Heart Rate 5 2.01 142 57.03 Chest Pain 5 2.01 147 59.04 Joint Pain 35 14.06 182 73.09 Swelling Feet 3 1.20 185 74.30 Muscle Cramps 10 4.02 195 78.31 Numbness 9 3.61 204 81.93 Trouble Sleeping 9 3.61 213 85.54 Wake Up Early 14 5.62 227 91.16 Mood Changes 8 3.21 235 94.38 Depressed 14 5.62 249 100.00
Number of Patients With Indicated Side Effect - In Order of Frequency The FREQ Procedure Cumulative Cumulative sidenum Frequency Percent Frequency Percent Joint Pain 35 14.06 35 14.06 Stuffy Nose 28 11.24 63 25.30 Headaches 23 9.24 86 34.54 Tiredness 19 7.63 105 42.17 Depressed 14 5.62 119 47.79 Wake Up Early 14 5.62 133 53.41 Cough 13 5.22 146 58.63 Itchy Skin 13 5.22 159 63.86 Drowsiness 11 4.42 170 68.27 Muscle Cramps 10 4.02 180 72.29 Ringing in Ears 10 4.02 190 76.31 Dry Mouth 9 3.61 199 79.92 Numbness 9 3.61 208 83.53 Trouble Sleeping 9 3.61 217 87.15 Mood Changes 8 3.21 225 90.36 Skin Rash 8 3.21 233 93.57 Chest Pain 5 2.01 238 95.58 Fast Heart Rate 5 2.01 243 97.59 Faintness 3 1.20 246 98.80 Swelling Feet 3 1.20 249 100.00
PROCPRINT DATA= se NOOBS ; VAR ptid group sidenum; FORMAT sidenum setype. group group.; WHERE severity = 4; TITLE'List of Patients With a Severe Side Effect'; RUN;
List of Patients With a Severe Side Effect ptid group sidenum A00608 Calcium Channel Blocker Stuffy Nose A00967 Placebo Headaches A00967 Placebo Joint Pain B00056 Placebo Chest Pain B03077 Calcium Channel Blocker Drowsiness B03077 Calcium Channel Blocker Tiredness B03077 Calcium Channel Blocker Joint Pain C00049 Calcium Channel Blocker Joint Pain C00628 ACE Inhibitor Muscle Cramps C04514 ACE Inhibitor Itchy Skin C04514 ACE Inhibitor Skin Rash D01348 Calcium Channel Blocker Trouble Sleeping D01348 Calcium Channel Blocker Wake Up Early