1 / 19

SHRUG, Feb 2013: Networking exercise

SHRUG, Feb 2013: Networking exercise. Many Ways to Solve a SAS Problem. Data sets. work.cohort. work.sesquart. What do you consider?. How easy was the code to read and understand? How short was the code? In the code samples that follow, are there new techniques that you’ll use?

earl
Download Presentation

SHRUG, Feb 2013: Networking exercise

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SHRUG, Feb 2013: Networking exercise Many Ways to Solve a SAS Problem

  2. Data sets work.cohort work.sesquart

  3. What do you consider? • How easy was the code to read and understand? • How short was the code? • In the code samples that follow, are there new techniques that you’ll use? • Is there new syntax you learned? • Does the formatting of the code help legibility? • Do you see code that could create more user-friendly output?

  4. Solution 1 Code & Results – David Moses, Ron Dewar proc rank data=COHORT out=COHORT_QUARTILE groups=4; var INCOME; ranks INCOME_QUARTILE; run; work.cohort_quartile

  5. Solution 2: Code – Art Tabachneck data _for_format; set SESQUART; fmtname="income"; type="N"; do label=1 to 4; if label eq 1 then do; start=0; end=Pctl_25-.0000000001; end; else if label eq 2 then do; start=Pctl_25; end=Pctl_50-.0000000001; end; else if label eq 3 then do; start=Pctl_50; end=Pctl_75-.0000000001; end; else do; start=Pctl_75; end=99999999999; end; output; end; run; proc format cntlin = _for_format; run ; data want; set cohort; quartile=put(income,income.); run;

  6. Solution 2 - Results work._for_format work.want

  7. Solution 3: Code – Anonymous data inc_quart; * this appends the income cutpoints to the cohort dataset; if _n_ = 1 then do; set SESQUART (keep = Pctl_25 Pctl_50 Pctl_75); end; else; set COHORT ; * assign cutpoints to an array, so income can be compard in a loop; array q(*) Pctl_25 Pctl_50 Pctl_75 ; income_quartile = .; * who knows? income may be missing; if income ^= . then do; quart = 1; do i = 1 to 3; if income > q(i) then income_quartile = i+1; end; end; drop i Pctl_25 Pctl_50 Pctl_75 ; run;

  8. Solution 3 - Results work.inc_quart

  9. Solution 4: Code – Peter Ye proc sqlnoprint; select Pctl_25,Pctl_50,Pctl_75 into:q25, :q50, :q75 from sesquart; quit; data cohort; set cohort; select; when(missing(income)) quantile = .; when(income < &q25) quantile = 25; when(income < &q50) quantile = 50; when(income < &q75) quantile = 75; otherwise quantile = 100; end; run;

  10. Solution 4 - Results work.cohort

  11. Solution 5 Code: Anonymous (x2) * create formats for income; proc format; value incfmt .=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile‘ ; run; * put income into quartiles; proc sql feedback; create table incquartiles as select a.* , put(income,incfmt.) as incquart length=35 from work.cohort as a ; quit;

  12. Solution 5 - Results work.incquartiles

  13. Solution 6: Code – Anonymous data a; set work.sesquart; x=1; *x=1 added to this record; data b; set work.cohort; x=1; *x=1 added to each record; data c; merge a b; by x; *25th, 50th & 75th percentiles added to each record; q1=(y>pctl_75) +(y>pctl_50) +(y>pctl_25) +(income>=0); *quartile calculated by method 1; q2=(y>=pctl_75)+(y>=pctl_50)+(y>=pctl_25)+(income>=0); *quartile calculated by method 2; q =(q1+q2)/2; * 0 signifies invalid (missing or negative) income *1.5 signifies income can be classified as either 1st or 2nd quartile; *2.5 signifies income can be classified as either 2nd or 3rd quartile; *3.5 signifies income can be classified as either 3rd or 4th quartile; drop pctl_25 pctl_50 pctl_75 q1 q2 x; *drop variables that are no longer required; /* Dataset c contains the original cohort data with quartile (q) added to each record. q= 0 signifies an invalid (missing or negative) income. q=1.5 signifies ties in the value of income. Client could be classified as either 1st or 2nd cohort. q=2.5 signifies ties in the value of income. Client could be classified as either 2nd or 3rd cohort. q=3.5 signifies ties in the value of income. Client could be classified as either 3rd or 4th cohort. */ RUN;

  14. Solution 6 - Results work.c NOTE: Variable y is uninitialized. NOTE: There were 1 observations read from the data set WORK.A. NOTE: There were 10 observations read from the data set WORK.B. NOTE: The data set WORK.C has 10 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.07 seconds cpu time 0.01 seconds

  15. Solution 7 – Code: Anonymous * create formats for income; proc format; value incfmt .=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile' ; run; * create a new data set with income categorized into quartiles; data quartiles_inc; set cohort; if _n_=1 then set sesquart; attrib incquart length=$35 label='Income quartile'; * categorize income based on observation; incquart=put(income,incfmt.); run;

  16. Solution 7 - Results work.quartiles_inc

  17. Summary of Techniques

More Related