1 / 63

Chapter 3: Reading and Processing Data

2. Chapter 3: Reading and Processing Data. 2. Chapter 3: Reading and Processing Data. Objectives. Use SAS file I/O functions to manipulate SAS data sets. Retrieve metadata. Managing SAS Data Sets. The Orion Star programmers need macros to perform the following data management tasks:

loughman
Download Presentation

Chapter 3: Reading and Processing Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 2 Chapter 3: Reading and Processing Data

  2. 2 Chapter 3: Reading and Processing Data

  3. Objectives • Use SAS file I/O functions to manipulate SAS data sets. • Retrieve metadata.

  4. Managing SAS Data Sets • The Orion Star programmers need macros to perform the following data management tasks: • Test the existence of a data set. • Determine the number of observations in a data set. • Determine the age of a data set. • Archive a data set. • Create a data set for every worksheet in an Excel workbook. • They decided to use the SAS File I/O functions and metadata to accomplish these tasks.

  5. Using Functions to Manipulate Files • SAS supports different ways to manipulate and obtain information about SAS files and other files. Many of these techniques require a DATA step or PROC step to be part of the SAS code. • Some functions, generally used in the DATA step and SCL, permit direct access to files. These functions, when used with the macro facility, enable the same direct access without introducing additional program steps. • The functions can be categorized into two groups: • SAS file I/O functions • external file functions

  6. SAS File I/O Functions • Functions to access a SAS data set: • EXIST • OPEN • CLOSE • Functions to access data set descriptor information: • DSNAME • VARNUM • ATTRC • ATTRN • Functions to access data library information: • LIBREF • PATHNAME

  7. Task 1: Determine Data Set Existence %macro printds(dset); %if %sysfunc(exist(&dset))= 0 %then %do; %put ERROR: Data set &dset does not exist.; %put ERROR- Macro will terminate now.; %return; %end; proc print data=&dset (obs=10) noobs; title "First 10 Observations from &dset"; run; %mend printds; m203d01 Use the EXIST function to test for the existence of a data set before progressing further into a macro program.

  8. Task 1: Determine Data Set Existence %printds(orion.daily_sales) NOTE: There were 10 observations read from the data set ORION.DAILY_SALES. NOTE: PROCEDURE PRINT used: real time 0.01 seconds cpu time 0.00 seconds 29 %printds(orion.daily) ERROR: Data set orion.daily does not exist. Macro will terminate now. m203d01 Partial SAS Log Partial SAS Log

  9. Task 2: Obtain Attribute Information • The Orion Star programmers found that many times a data set might exist but is empty. They want to verify that a data set is not empty before performing further processing. • The following steps provide data set attribute information: • Open the data set using the OPEN function. • Retrieve a numeric attribute using the ATTRN function. • Retrieve a character attribute using the ATTRC function. • Close the data set using the CLOSE function.

  10. Step 1: Open the SAS Data Set OPEN(data-set-name) 4 %let dsid=%sysfunc(open(orion.daily_sales)); 5 %put dsid=&dsid; dsid=1 The OPEN function opens a SAS data set and returns a unique numeric data set identifier. The data set identifier, a nonzero positive number, is used in most other SAS File I/O functions. The OPEN function returns 0 if the data set cannot be opened. General form of the OPEN function: Partial SAS Log

  11. Step 2: Use the ATTRN Function ATTRN(data-set-identifier, attribute-name) The ATTRN function returns the value of a numeric attribute of a data set. General form of the ATTRN function: Selected attribute-name values and descriptions:

  12. Step 3: Use the ATTRC Function ATTRC(data-set-identifier, attribute-name) The ATTRC function returns the value of a character attribute of a data set. General form of the ATTRC function: Selected attribute-name values and descriptions:

  13. Step 4: Close the SAS Data Set CLOSE(data-set-identifier) 6 %let dsidc=%sysfunc(close(&dsid)); 7 %put dsidc=&dsidc; dsidc=0 The CLOSE function closes a SAS data set. The CLOSE function returns 0 if the operation was successful and returns a nonzero value if it was not successful. General form of the CLOSE function: Partial SAS Log It is important to close all SAS data sets as soon as they are no longer needed by the application.

  14. Obtaining Number of Observations %macro numobs(dsn); %local dsid nobs dsidc; %let dsn=%upcase(&dsn); %let dsid=%sysfunc(open(&dsn)); %let nobs=%sysfunc(attrn(&dsid,nlobs)); %let dsidc=%sysfunc(close(&dsid)); %if &nobs=0 %then %do; %put ERROR: &dsn contains 0 Observations.; %put ERROR- PROC PRINT will not execute.; %return; %end; proc print data=&dsn (obs=10) noobs; title "First 10 Observations"; title2 "&dsn Contains &nobs Observations"; run; %mend numobs; m203d02 Use the NLOBS attribute to obtain the number of observations in a data set and assign this value to a macro variable.

  15. Obtaining Number of Observations 231 %numobs(orion.daily_sales) NOTE: There were 10 observations read from the data set ORION.DAILY_SALES. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 232 %numobs(orion.no_rows) ERROR: ORION.NO_ROWS contains 0 Observations. PROC PRINT will not execute. m203d02 Partial SAS Log

  16. Obtaining Number of Observations First 10 Observations ORION.DAILY_SALES Contains 58 Observations Total_ Product_ID Product_Name Retail_Price 220200200024 Pro Fit Gel Gt 2030 Women's Running Shoes $178.50 220200100092 Big Guy Men's Air Terra Sebec Shoes $83.00 240200100043 Bretagne Performance Tg Men's Golf Shoes L. $282.40 220100700024 Armadillo Road Dmx Women's Running Shoes $99.70 220200300157 Hardcore Men's Street Shoes Large $220.20 240200100051 Bretagne Stabilites 2000 Goretex Shoes $420.90 220200100035 Big Guy Men's Air Deschutz Viii Shoes $125.20 220200100090 Big Guy Men's Air Terra Reach Shoes $177.20 220200200018 Lulu Men's Street Shoes $132.80 240200100052 Bretagne Stabilities Tg Men's Golf Shoes $99.70 m203d02 PROC PRINT Output

  17. 3.01 Quiz %let dsn=orion.staff; %let openrc=%sysfunc(open(&dsn)); %let sorted= ; %let closerc=%sysfunc(close(&openrc)); %put Data set &dsn is sorted by &sorted..; m203a01 Open the program m203a01. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff. What is the value of &SORTED?

  18. 3.01 Quiz – Correct Answer %let dsn=orion.staff; %let openrc=%sysfunc(open(&dsn)); %let sorted=%sysfunc(attrc(&openrc,sortedby)); %let closerc=%sysfunc(close(&openrc)); %put Data set &dsn is sorted by &sorted..; m203a01 Open the program m203a01. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff. What is the value of &SORTED? Employee_ID

  19. Task 3: Determine the Age of a SAS Data Set %macro age(dsn); %local dsid crdate dsidc days; %let dsid=%sysfunc(open(&dsn)); %let crdate=%sysfunc(attrn(&dsid,crdte)); %let dsidc=%sysfunc(close(&dsid)); %let days=%sysevalf("&sysdate9"d -%sysfunc(datepart(&crdate))); %if &days > 0 %then %do; %put WARNING: &dsn is &days day(s) old. It is being recreated.; data &dsn; infile 'orders03.dat'; input Order_ID Order_Type Order_Date : date9.; format Order_Date date9.; run; %end; %else %put NOTE: &dsn is current.; %mend age; m203d03 The Orion Star programmers need a way to determine when to refresh a data set. They decided to use the CRDTE attribute to calculate the age of a data set.

  20. Task 3: Determine the Age of a SAS Data Set • %age(orion.orders03) • WARNING: orion.orders03 is 1 day(s) old. It is being recreated. • NOTE: The infile 'orders03.dat' is: • Filename=C:\workshop\orders03.dat, • RECFM=V,LRECL=256,File Size (bytes)=2496, • Last Modified=31Jan2008:18:09:56, • Create Time=16Jun2008:17:09:05 • NOTE: 104 records were read from the infile 'orders03.dat'. • The minimum record length was 22. • The maximum record length was 22. • NOTE: The data set ORION.ORDERS03 has 104 observations and 3 variables. • NOTE: DATA statement used (Total process time): • real time 0.14 seconds • cpu time 0.06 seconds • %age(orion.orders03) • NOTE: orion.orders03 is current. m203d03 Partial SAS Log

  21. Task 4: Archive a SAS Data Set RENAME(old-name, new-name) TODAY( ) Because many of Orion Star’s macro applications refresh SAS data sets, the programmers want to archive the current data set before the data set is refreshed. They decided to concatenate today’s date to the end of the data set name, using the RENAME and TODAY functions. General form of the RENAME function: General form of the TODAY function:

  22. Task 4: Archive a SAS Data Set %let newname=daily_sales_%sysfunc(today(), date9.); %let rc=%sysfunc(rename(orion.daily_sales, &newname)); proc contents data=orion._all_ nods; run; Member File # Name Type Size Last Modified 1 COUNTRY DATA 17408 01Jul08:23:11:48 COUNTRY INDEX 17408 01Jul08:23:11:48 2 CUSTOMER DATA 33792 30Jul08:22:28:42 3 CUSTOMER_DIM DATA 33792 14Dec07:09:05:44 4 CUSTOMER_TYPE DATA 17408 30Jul08:01:29:54 CUSTOMER_TYPE INDEX 9216 30Jul08:01:29:54 5 DAILY_SALES_07OCT2008 DATA 9216 21Aug08:14:18:18 6 ORDER_FACT DATA 66560 10Jul08:19:45:26 7 SALES DATA 25600 27Jul08:21:40:55 m203d04 Partial PROC CONTENTS Output

  23. Task 5: Create Data Sets from Worksheets Sales.xls Australia$ United States$ %READXLS Australia UnitedStates The Orion Star programmers need a macro to import every worksheet in a given Excel workbook.

  24. Task 5: Create Data Sets from Worksheets • The programmers will use SAS session metadata that is available via PROC SQL DICTIONARY tables or Sashelp views. • The metadata includes information on the following: • SAS files • external files • macro variables • system options, titles, and footnotes

  25. Task 5: Create Data Sets from Worksheets • The macro will incorporate these elements: • SAS/ACCESS LIBNAME statement • sashelp.vtable • an iterative %DO loop • indirect macro variable references

  26. Reading Excel Files Using the LIBNAME Statement libname xlsdata 's:\workshop\c3\sales.xls'; proc contents data=xlsdata._all_; run; m203d05 The SAS/ACCESS LIBNAME statement extends the LIBNAME statement to support assigning a library reference name (libref) to Microsoft Excel workbooks. This enables you to reference worksheets directly in a DATA step or SAS procedure. Each worksheet in the Excel workbook is treated as though it were a SAS data set.

  27. Reading Excel Files Using the LIBNAME Statement The CONTENTS Procedure Directory Libref XLSDATA Engine EXCEL Physical Name sales.xls User Admin DBMS Member Member # Name Type Type 1 Australia$ DATA TABLE 2 UnitedStates$ DATA TABLE Partial PROC CONTENTS Output

  28. Reading Excel Files Using the LIBNAME Statement data australia; set xlsdata.'Australia$'n; run; m203d05 All worksheets will be referenced with a SAS two-level name, that is, libref.data-set-name. If the worksheet name contains special characters, you must use the SAS name literal construct of "name"n.

  29. Using SAS Session Metadata data _null_; set sashelp.vtable end=last; where libname="XLSDATA"; call symputx(cats('sheet', _n_), memname); if last then call symputx('n', _n_); run; 473 %put _user_; GLOBAL SHEET1 Australia$ GLOBAL SHEET2 UnitedStates$ GLOBAL N 2 m203d05 Use sashelp.vtableto create a series of macro variables that contain the member names. Partial SAS Log

  30. 3.02 Quiz data _null_; set sashelp.vtable; where libname='ORION'; call symputx(cats('dsn', _N_), ??????????); run; %put _user_; m203a02 Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.

  31. 3.02 Quiz – Correct Answer data _null_; set sashelp.vtable end=last; where libname='ORION'; call symputx(cats('dsn', _N_), memname); run; m203a02 Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.

  32. Iterative %DO Loops (Review) %DOindex-variable=start %TO stop <%BY increment>; text %END; %macro putloop; %do i=1 %to &n; %put Sheet&i is &&sheet&i; %end; %mend putloop; m203d05 The iterative %DO statement executes a section of a macro repetitively, based on the value of an index variable. General form of the iterative %DO statement:

  33. Indirect Macro Variable References (Review) &&sheet&i reference 1st scan &sheet1 2nd scan Australia$ The indirect reference causes a second scan of the macro variable reference. Partial Symbol Table

  34. 3.03 Quiz %do i=1 %to &n; %put The values of the macro variables are __________ ; %end; How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following DO loop?

  35. 3.03 Quiz – Correct Answer do i=1 %to &n; %put The names of the macro variables are &&dsn&i; %end; How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following DO loop?

  36. Processing a Data Library %macro readxls(workbook); libname xlsdata "&workbook"; data _null_; set sashelp.vtable end=last; where libname="XLSDATA"; call symputx(cats('sheet', _n_), memname,'L'); if last then call symputx('n',_n_,'L'); run; %do i=1 %to &n; %let len=%eval(%length(&&sheet&i)-1); %let dsn=%substr(&&sheet&i,1,&len); data work.&dsn; set xlsdata."&&sheet&i"n; run; proc print data=work.&dsn; run; %end; libname xlsdata clear; %mend readxls; %readxls(sales.xls) m203d06 Use a %DO loop to generate a DATA step and a PROC PRINT step for every worksheet in an Excel workbook. ...

  37. Processing a Data Library %macro readxls(workbook); libname xlsdata "&workbook"; data _null_; set sashelp.vtable end=last; where libname="XLSDATA"; call symputx(cats('sheet', _n_), memname,'L'); if last then call symputx('n',_n_,'L'); run; %do i=1 %to &n; %let len=%eval(%length(&&sheet&i)-1); %let dsn=%substr(&&sheet&i,1,&len); data work.&dsn; set xlsdata."&&sheet&i"n; run; proc print data=work.&dsn; run; %end; libname xlsdata clear; %mend readxls; %readxls(sales.xls) m203d06 The %LENGTH function is used to return the number of characters in &&SHEET&I. ...

  38. Processing a Data Library %macro readxls(workbook); libname xlsdata "&workbook"; data _null_; set sashelp.vtable end=last; where libname="XLSDATA"; call symputx(cats('sheet', _n_), memname,'L'); if last then call symputx('n',_n_,'L'); run; %do i=1 %to &n; %let len=%eval(%length(&&sheet&i)-1); %let dsn=%substr(&&sheet&i,1,&len); data work.&dsn; set xlsdata."&&sheet&i"n; run; proc print data=work.&dsn; run; %end; libname xlsdata clear; %mend readxls; %readxls(sales.xls) The %EVAL function enables subtraction of 1 from that length to create a macro variable LEN that is the length of the spreadsheet name without the $. ...

  39. Processing a Data Library %macro readxls(workbook); libname xlsdata "&workbook"; data _null_; set sashelp.vtable end=last; where libname="XLSDATA"; call symputx(cats('sheet', _n_), memname,'L'); if last then call symputx('n',_n_,'L'); run; %do i=1 %to &n; %let len=%eval(%length(&&sheet&i)-1); %let dsn=%substr(&&sheet&i,1,&len); data work.&dsn; set xlsdata."&&sheet&i"n; run; proc print data=work.&dsn; run; %end; libname xlsdata clear; %mend readxls; %readxls(sales.xls) The %SUBSTR function creates a macro variable DSN, beginning at position 1, for the length number of characters in the macro variable LEN.

  40. Processing a Data Library %readxls(sales.xls) NOTE: There were 63 observations read from the data set XLSDATA.'Australia$'n. NOTE: The data set WORK.AUSTRALIA has 63 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds NOTE: There were 63 observations read from the data set WORK.AUSTRALIA. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: There were 102 observations read from the data set XLSDATA.'UnitedStates$'n. NOTE: The data set WORK.UNITEDSTATES has 102 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds NOTE: There were 102 observations read from the data set WORK.UNITEDSTATES. NOTE: PROCEDURE PRINT used (Total process time): real time 0.01 seconds cpu time 0.01 seconds • Partial SAS Log m203d06

  41. Exercise This exercise reinforces the concepts discussed previously.

  42. 2 Chapter 3: Reading and Processing Data

  43. Objectives • Use external file functions to examine files that are not SAS files.

  44. Processing External Files • The Orion Star programmers want to reduce redundant code when reading multiple external files into SAS data sets. The applications should be able to process the files in a given directory and subdirectory in order to accomplish the following tasks: • Process all DAT files. • Import all CSV files. • Read every worksheet in all of the Excel workbooks. • They decided to use the external file functions to accomplish these three tasks.

  45. External File Functions Functions to access a directory: • DOPEN • DNUM • DREAD • DCLOSE Functions to access an external file: • FILEEXIST and FEXIST • FILENAME • FOPEN • FCLOSE Functions to read from or write to an external file: • FREAD • FGET • FPUT and FWRITE

  46. Processing External Files The DOPEN, DNUM, and DREAD functions enable access to all the external files found in a given directory. Use these steps for processing files from a directory: %SYSFUNC is required to execute these functions within the macro facility. Use the FILENAME function to assign a fileref to the directory. Use the DOPEN function to open the directory. Use the DNUM function to identify how many members are in the directory. Use the DREAD function to extract each member name. Process the external files. Use the DCLOSE function to close the directory.

  47. Steps 1 and 2: Access a Directory %macro direxist(dir); %local fileref rc did didc; %let rc=%sysfunc(filename(fileref,&dir)); %let did=%sysfunc(dopen(&fileref)); %if &did=0 %then %do; %put ERROR: Directory does not exist; %return; %end; %put NOTE: Directory ID is &did ; %let didc=%sysfunc(dclose(&did)); %let rc=%sysfunc(filename(fileref)); %mend direxist; %dirlist(s:\workshop) m203d07 For applications to extract information about a directory and its contents, it is necessary to first open the directory using the DOPEN function. If it is successful, the function returns a directory identifier.

More Related