1 / 46

SAS Efficiency Techniques and Methods

SAS Efficiency Techniques and Methods. By Kelley Weston Sr. Statistical Programmer Quintiles. Efficiency can be measured in many ways – e.g.: CPU time Disk space required Memory Input / Output Original Programmer time Maintenance Programmer time. Outline Copying a data set

jacob
Download Presentation

SAS Efficiency Techniques and Methods

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAS Efficiency Techniques and Methods ByKelley Weston Sr. Statistical Programmer Quintiles

  2. Efficiency can be measured in many ways – e.g.: • CPU time • Disk space required • Memory • Input / Output • Original Programmer time • Maintenance Programmer time

  3. Outline • Copying a data set • Changing attributes • Appending data sets • Procedures • Rename variables • Alternatives to logical OR constructs • Formats • Indexing • Disk Space • Views • Proc Sort and disk space • Hashing – merging data sets

  4. Avoid reading the dataset multiple times Data work.data1 work.data2 work.data3; Set lib1.master; If substr(x, 1, 1) EQ 'A'thenoutput work.data1; Elseif substr(x, 1, 1) EQ 'B'thenoutput work.data2; Elseif substr(x, 1, 1) EQ 'C'thenoutput work.data3; Run; Data work.data1 work.data2 work.data3; Set lib1.master; Type = substr(x, 1, 1); If type EQ 'A'thenoutput work.data1; Elseif type EQ 'B'thenoutput work.data2; Elseif type EQ 'C'thenoutput work.data3; Drop type; Run;

  5. Copying a dataset: /* inefficient */ data work.data; set lib1.data; run; /* efficient */ procdatasetslib = work nolist; copyin = lib1 out = work; select data; quit;

  6. Changing Attributes: /* inefficient */ /* reads & writes one observation at a time */ data work.data; set lib1.data; label age = 'Years'; format salary dollar10.; rename cars = autos; Run;

  7. Changing Attributes: /* efficient */ procdatasetslib = work nolist; copyin = lib1 out = work; select data; modify data (label = "Demographic Data"); label age = 'Years'; format salary dollar10.; rename cars = autos; change data = demograph; contentsdata = demograph; quit;

  8. Appending datasets: /* inefficient */ /* reads and writes one observation at a time */ data work.data1; set work.data1 work.data2; run;

  9. Appending datasets: /* efficient */ procdatasetsnolist; append base = work.data1 data = work.data2; quit;

  10. Give procedures what they need – but no more: /* inefficient */ Procsortdata = lib1.data Out = work.data; By var1 var2 var3; Run;

  11. Give procedures what they need – but no more: /* still inefficient, but less so */ Procsortdata = lib1.data Out = work.data (drop = var4); By var1 var2 var3; Run;

  12. Give procedures what they need – but no more: /* Efficient */ Procsortdata = lib1.data (drop = var4) Out = work.data; By var1 var2 var3; Run;

  13. Give procedures what they need – but no more: • Tools to limit datasets: • Drop / keep • Subsetting if • Where • Firstobs / Obs • Use these on the input dataset as much as possible • NOTE 1: Where has its affect after the Drop / Keep • NOTE 2: You can now use Firstobs / Obs along with Where; Where has its affect first, then Firstobs / Obs

  14. Use rename rather than reassign: • Reassign: • Creates another variable (probably needlessly) • Takes up more space in the data set • Executes needlessly each execution of the data step • Is slower • Rename: • Occurs just once, at compile time, not execution time • Might be able to avoid reading the data one observation at a time

  15. Use rename rather than reassign: /* inefficient */ Data work.data; Set lib1.data; Var2 = var1; Run;

  16. Use rename rather than reassign: /* still inefficient, but less so */ Data work.data; Set lib1.data; Rename Var1 = var2; Run;

  17. Use rename rather than reassign: /* Efficient */ Procdatasetslib = work Nolist; Copyin = lib1 Out = work; Select data; Modify data; Rename var1 = var2; quit;

  18. Avoid using logical OR constructs • Can use IN() or Select/when – much faster, easier to code and understand (consider the order of the choices listed) • data work.data1; • set lib1.data1; • length msg $20; • select (category); • when ('A') • msg = 'Category A'; • when ('B') • msg = 'Category B'; • when ('C') • msg = 'Category C'; • otherwise • msg = 'Category unknown'; • end; • run;

  19. Use formats to make coding shorter and more to the point: procformat ; value $status 'M','SEP' = 'A' 'S', 'D', 'W' = 'B'; run; data work.data; set lib1.data; where put(status, $status.) = 'A'; run; OR data work.data; set lib1.data (where = (put(status, $status.) = 'A')); Run;

  20. Consider using an index instead of sorting the dataset • When to consider using an index: • When you can subset the dataset using a where option/statement • When the dataset will be involved in a merge / join • Tip: Use “options msglevel = I” (for Info) when using indexes; gives INFO: message in log when used • /* 3 Ways of Creating an Index: */ • Data Step • Proc Datasets • Proc SQL

  21. Indices and the Data Step data work.data1 (index = (var1 = (var1) composite = (var1 var2 ))); set lib1.data1; run; /* Review index */ proccontentsdata = work.data1; run; /* Delete index – not preferred method */ data work.data1; set lib1.data1; run;

  22. Alphabetic List of Indexes and Attributes # of Unique # Index Values Variables 1 composite 50 var1 var2 2 var1 50

  23. Create index with proc datasets procdatasetslib = work nolist; Modify data1; /* do not include missing values in the index */ Indexcreatevar1 / nomiss; /* insure that index values are unique */ Indexcreate composite = (var1 var2) / unique; Quit; /* delete index */ procdatasetslib = work nolist; Modify data1; Indexdeletevar1 composite; Quit;

  24. Create index with proc sql Proc sql; /* simple index */ Createindex<index name>on<dataset>(variables(s)); Createindex prodno onwork.products(prodno); /* composite index */ /* notice the commas */ Createindex ordno onwork.orders(custno, prodno, orderno); quit; /* Delete index */ procsql; dropindex ordno, orderno fromwork.orders; dropindex prodno fromwork.products; quit;

  25. DISK SPACE CONSIDERATIONS • Consider putting long memo fields into a separate, associated dataset • Use _NULL_ as the data set name when you do not need to create a dataset (e.g., when creating macro variables) • Use the KEEP &/or DROP data set options (on input &/or output) or statements to limit the variables. • Use the WHERE data set option (on input &/or output) or statement to limit the observations.

  26. DISK SPACE CONSIDERATIONS • Use data set compression (data set options COMPRESS = YES REUSE = YES). This is primarily useful with character data, or numeric data where the numbers are small. • data work.new2 (compress = yes • reuse = yes); • set work.new; • run; • Use a data set view, rather than creating a SAS data set – essentially a “pointer” to the data. • Use SQL to merge, summarize, sort, etc. instead of a combination of DATA steps and procs.

  27. DISK SPACE CONSIDERATIONS • Store the data in the order in which it is usually required. This saves the disk space used to re-sort the data. • Create only the indexes that are needed - delete any unneeded ones. • Delete old work libraries (use Windows Explorer) • Delete old autosaved programs (use Windows Explorer) • Delete datasets in current programs as soon as possible (proc datasets)

  28. DISK SPACE CONSIDERATIONS • Use appropriate length of variables • Find length of current character values, then make length current + 15% in new dataset to allow for longer lengths • Remember that character variables get their length from their first use, if not explicitly defined (e.g., scan, repeat, symget functions have default length of 200) • For numerics, can make length as low as 3 (default of 8) – good for ages (length of 3), dates (length of 4)

  29. SAS DATA VIEWS • Can be created with either a data step or proc sql. • Does not contain any data – only instructions on how to access the data • When the original data changes, it is immediately reflected in the view.

  30. Data set View data work.class / view = class; set sashelp.class; where weight LE 125; drop height; run; title'Printing View work.class'; procprintdata = work.class heading = h n; run;

  31. Data set View /* retrieve the source of a view */ /* prints the source in the log */ dataview = work.class; describe; run;

  32. Create view in sql title'View Created with SQL'; procsql; createviewwork.subviewas select * fromsashelp.class where height GT 50 orderby name; select * fromwork.subview; quit; /* print source statements in log */ procsql; describeviewwork.subview; quit;

  33. Source stmts of SQL View 76 proc sql; 77 describe view work.subview; NOTE: SQL view WORK.SUBVIEW is defined as: select * from SASHELP.CLASS where height>50 order by name asc; 78 quit;

  34. Saving Disk Space when using Proc Sort proc sort data = sashelp.class out = work.class1 tagsort; by name; run;

  35. Tagsort: • Stores only the BY variables and the obs # in temp. files • Uses tags to retrieve the records from the input data set in sorted order • Not supported by the multi-threaded sort. • Best used when the total length of BY variables is small compared to length of entire observation. • Can increase processing time.

  36. Using Hashing to Merge Data Sets data <output data set(s)>; * Set up input and lookup datasets; * Set up hash info; * Read input dataset; * Check if key value is stored in hash; * Get value of data variable; * Write obs to the output data set(s); run;

  37. Using Hashing to Merge Data Sets data <output data set>; * Set up input and lookup datasets; if0thendo; set <input data set>; set <lookup data set>(keep = <lookup var(s)>); end; ... run;

  38. Using Hashing to Merge Data Sets data <output data set>; * Set up input and lookup datasets; * Set up hash info; retain rc 0; if _n_ EQ 1thendo; declare hash h(dataset: "<lookup data set>"); h.definekey("<key var 1>", "<key var 2>"); h.definedata("<data var 1>", "<data var 2>"); h.definedone(); * all definitions are complete; * assign missing values to vars; call missing(<key vars>, <lookup vars>); end; ... run;

  39. Using Hashing to Merge Data Sets data <output data set(s)>; * Set up input and lookup datasets; * Set up hash info; * Read input dataset; set <input data set>; ... run;

  40. Using Hashing to Merge Data Sets data <output data set(s)>; * Set up input and lookup datasets; * Set up hash info; * Read input dataset; * Check if key value is stored in hash; rc = h.check(); ... Run;

  41. Using Hashing to Merge Data Sets data <output data set(s)>; * Set up input and lookup datasets; * Set up hash info; * Read input dataset; * Check if key value is stored in hash; * Get value of data variable; if rc EQ 0thendo; rc = h.find(); end; ... run;

  42. Using Hashing to Merge Data Sets data <output data set(s)>; * Set up input and lookup datasets; * Set up hash info; * Read input dataset; * Check if key value is stored in hash; * Get value of data variable; * Write obs to the output data set(s); output <output data set>; run;

  43. Using Hashing to Merge Data Sets data <output data set>; if0thendo; set <input data set>; set <lookup data set>(keep = <lookup variable(s)>); end; retain rc 0; if _n_ EQ 1thendo; declare hash h(dataset: "<lookup data set>"); h.definekey("<key variable 1>", "<key variable 2>"); h.definedata("<data variable 1>", "<data variable 2>"); h.definedone(); * all definitions are complete; * assign missing values to vars; call missing(<key variables>, <lookup variables>); end; set <input data set>; rc = h.check(); * is key stored in hash?; if rc EQ 0thendo; rc = h.find(); * get value of data variable; end; output <output data set>; drop rc; run;

  44. Questions ?

More Related