
SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110) Module 1

SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110) Module 1. Mark Carpenter, Professor of Statistics Department of Mathematics and Statistics Phone: 4-3620 Office: Parker 364-A E-mail: carpedm@auburn.edu Web: http://www.auburn.edu/~carpedm/stat6110. Introduction.


Presentation Transcript


  1. SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110), Module 1. Mark Carpenter, Professor of Statistics, Department of Mathematics and Statistics. Phone: 4-3620. Office: Parker 364-A. E-mail: carpedm@auburn.edu. Web: http://www.auburn.edu/~carpedm/stat6110

  2. Introduction
     • Introduction to the SAS Windows environment (log, editor, and output screens)
     • Introduction to SAS Help screens (online and within the SAS system)
     • Introduction to the SAS DATA step
     • SAS LIBNAMEs
     • SAS INFILE
     Module 1, STAT 5110/6110: SAS Programming and Applications. Mark Carpenter, Professor of Statistics

  3. Rules for SAS Statements
     • SAS statements end with a semicolon.
     • You can enter SAS statements in lowercase, uppercase, or a mixture of the two.
     • You can begin SAS statements in any column of a line and write several statements on the same line.
     • You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
     • Words in SAS statements are separated by blanks or by special characters (such as the equal sign and the minus sign in the calculation of the Loss variable in the WEIGHT_CLUB example).
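
The rules above can be seen in a short sketch. The data set name rules_demo and its variables are hypothetical and chosen only for illustration.

```sas
* Hypothetical example, names are illustrative only;
data rules_demo; x = 10; Y = 4;    /* several statements on one line, mixed case */
   difference = x
              - y;                 /* one statement continued on a second line */
run;

PROC PRINT DATA=rules_demo; RUN;   /* keywords may be uppercase or lowercase */
```

Here x and Y are separated from the operators by blanks, but `x-y` would work equally well because the minus sign itself separates the two names.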

  4. Comment Statements
     • Document the purpose of the programming statements or of the overall program.
     • Can appear anywhere in the program.
     • Are helpful reminders to the programmer and help the user run the program.
     • Syntax:
          *message;
       or
          /*message*/

  5. Comment Statements (cont)
     Example:
     /* the following lines produce summary statistics */
     or
     *the following lines produce summary statistics;

  6. Comment Statements (cont)
     Example:
     /* Author: John Smith
        Assignment: Homework 1
        Due Date: 9/21/04 */

  7. Comment Statements (cont)
     Example:
     /* ***********************
      * Author: John Smith      *
      * Assignment: Homework 1  *
      * Due Date: 9/21/04       *
      *************************/

  8. Comment Statements (cont)
     NOTE: All programs turned in for homework assignments must start with a preamble:
     /* ***********************
      * Author: John Smith      *
      * Assignment: Homework 1  *
      * Due Date: 9/21/04       *
      *************************/

  9. Comment Statements (cont)
     Example:
     * Author: John Smith;
     * Assignment: Homework 1;
     * Due Date: 9/21/04;

  10. INTRODUCTION TO THE SAS DATA STEP
     • Click “Help”, then “SAS Help and Documentation”.
     • Click the “Contents” tab.
     • Click “SAS Products”, then “Base SAS”.
     • Click “Step-by-Step Programming with Base SAS Software”.

  11. SAS BASE PROGRAMMING The DATA step is one of the basic building blocks of SAS programming. It creates the data sets that are used in a SAS program's analysis and reporting procedures. Understanding the basic structure, functioning, and components of the DATA step is fundamental to learning how to create your own SAS data sets.

  12. SAS DATA SETS AND DATA STEPS
     In this section, you will learn the following:
     • what a SAS data set is and why it is needed
     • how the DATA step works
     • what information you have to supply to SAS so that it can construct a SAS data set for you.

  13. ANATOMY OF A DATA STEP
     Creating a SAS data set from scratch using the DATALINES statement:
     DATA weight_club;
        INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
        Loss=StartWeight-EndWeight;
        DATALINES;
     1023 David Shaw         red 189 165
     1049 Amelia Serrano     yellow 145 124
     1219 Alan Nance         red 210 192
     1246 Ravi Sinha         yellow 194 177
     1078 Ashley McKnight    red 127 118
     ;
     (Id is read from columns 1-4 and Name from columns 6-24, so in an actual program the data lines must start in column 1.)

  14. ANATOMY OF A DATA STEP (callout 1)
     DATA weight_club;
     The DATA statement tells SAS to begin building a SAS data set named WEIGHT_CLUB.

  15. ANATOMY OF A DATA STEP (callout 2)
     INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
     The INPUT statement identifies the fields to be read from the input data and names the SAS variables to be created from them (Id, Name, Team, StartWeight, and EndWeight).

  16. ANATOMY OF A DATA STEP (callout 3)
     Loss=StartWeight-EndWeight;
     The third statement is an assignment statement. It calculates the weight each person lost and assigns the result to a new variable, Loss.

  17. ANATOMY OF A DATA STEP (callout 4)
     DATALINES;
     The DATALINES statement indicates that data lines follow.

  18. ANATOMY OF A DATA STEP (callout 5)
     The data lines follow the DATALINES statement. This approach to processing raw data is useful when you have only a few lines of data. (Later sections show ways to access larger amounts of data that are stored in files.)

  19. ANATOMY OF A DATA STEP (callout 6)
     The DATALINES statement marks the beginning of the input data. The single semicolon on its own line after the last data line marks the end of the input data and of the DATA step.

  20. NAMING CONVENTIONS
     Rules for Most SAS Names
     SAS names are used for SAS data set names, variable names, and other items. The following rules apply:
     • A SAS name can contain from one to 32 characters.
     • The first character must be a letter or an underscore (_).
     • Subsequent characters must be letters, numbers, or underscores.
     • Blanks cannot appear in SAS names.

  21. NAMING CONVENTIONS
     Special Rules for Variable Names
     For variable names only, SAS remembers the combination of uppercase and lowercase letters that you use when you first create the variable name. Internally, the case of letters does not matter: "CAT," "cat," and "Cat" all represent the same variable. For presentation purposes, however, SAS keeps the case of each letter as first written and uses that form when printing the variable name.
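
A minimal sketch of these naming rules, assuming default SAS naming settings. The data set name _club2004 and the values are hypothetical.

```sas
data _club2004;                     /* valid name: begins with an underscore */
   StartWeight = 189;               /* first use fixes the printed form StartWeight */
   startweight = startweight - 5;   /* same variable: case is ignored internally */
run;

proc print data=_club2004;          /* the column heading appears as StartWeight */
run;

/* Invalid names (each would cause an error if used):
   2004club      begins with a digit
   weight club   contains a blank
   weight-club   contains a character other than a letter, digit, or underscore */
```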

  22. SOME SAS BASE PROCEDURES
     OPTIONS linesize=80 pagesize=60 pageno=1 nodate;
     PROC PRINT DATA=weight_club;
        title 'Health Club Data';
     run;

  23. SOME SAS BASE PROCEDURES
     options linesize=80 pagesize=60 pageno=1 nodate;
     PROC PRINT DATA=weight_club;
        TITLE 'Health Club Data';
     RUN;

  24. SOME SAS BASE PROCEDURES
     OPTIONS linesize=80 pagesize=60 pageno=1 nodate;
     PROC TABULATE DATA=weight_club;
        CLASS team;
        VAR StartWeight EndWeight Loss;
        TABLE team, mean*(StartWeight EndWeight Loss);
        TITLE1 'Mean Starting Weight, Ending Weight,';
        TITLE2 'and Weight Loss';
     RUN;

  25. SAS module
     • Create a directory on your hard drive called c:\sasfiles
     • Save the SAS programs to your local directory:
          module1_example1.sas
          module1_example2.sas
          module1_example3.sas
          module1_example4.sas
          module1_exampl5.sas
     • Save the text files, module1_text1.txt and module1_text2.txt, and the Excel file “classroll_example.xls” to the “c:\sasfiles” directory.
     • Open SAS and go to the editor.
     • Follow the professor’s instructions on how to open and run these programs.
     • Replicate these steps at home and make sure you can open and run SAS programs before the next class meeting.

  26. SAS Data Set from an Existing SAS Data Set
     DATA weight_club;
        INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
        DATALINES;
     1023 David Shaw         red 189 165
     1049 Amelia Serrano     yellow 145 124
     1219 Alan Nance         red 210 192
     1246 Ravi Sinha         yellow 194 177
     1078 Ashley McKnight    red 127 118
     ;
     DATA weight2;        *DATA statement tells SAS to begin building a SAS data set named weight2;
        SET weight_club;  *SET statement tells SAS from which existing dataset to begin;
     RUN;                 *RUN statement tells SAS that you are at the end of this DATA step;

  27. Temporary SAS datasets and the WORK Directory
     • Both SAS datasets, “weight_club” and “weight2”, are temporary SAS datasets.
     • Temporary SAS datasets can be referenced and used only during the SAS session in which they were created.
     • Temporary SAS datasets are stored in the temporary SAS library that SAS calls “WORK”.
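
As a brief illustration, a one-level data set name is stored in the WORK library, so it can also be referenced with the two-level name WORK.name. The data set name temp_demo below is hypothetical.

```sas
data temp_demo;                   /* one-level name: stored as WORK.temp_demo */
   StartWeight = 189;
   EndWeight = 165;
run;

proc print data=work.temp_demo;   /* two-level name refers to the same data set */
run;
```

When the SAS session ends, WORK.temp_demo is deleted along with the rest of the WORK library.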

  28. Temporary SAS datasets and the WORK Directory [screenshot]

  29. Temporary SAS datasets and the WORK Directory [screenshot; label: “Double-click”]

  30. Temporary SAS datasets and the WORK Directory [screenshot; label: “List of SAS Datasets”]

  31. Permanent SAS datasets and user-defined SAS Libraries
     • The LIBNAME statement is used to define a permanent SAS library with a name of the user’s choosing.
     • The SAS library is mapped to a specific folder located on the user’s hard drive.

  32. Permanent SAS datasets and user-defined SAS Libraries
     Syntax: LIBNAME libref 'SAS-data-library';

  33. Permanent SAS datasets and user-defined SAS Libraries
     Syntax: LIBNAME libref 'SAS-data-library';
     The LIBNAME statement tells SAS that you are going to create or reference a SAS library mapped to a specific location on the hard drive.

  34. Permanent SAS datasets and user-defined SAS Libraries
     Syntax: LIBNAME libref 'SAS-data-library';
     “libref” is the user-defined library name. In place of “libref”, the user supplies a name of his or her choosing.

  35. Permanent SAS datasets and user-defined SAS Libraries
     Syntax: LIBNAME libref 'SAS-data-library';
     In quotes, the user tells SAS where the files will be kept. This is a specific folder that must already exist on the user’s hard drive.
     Example: LIBNAME stat6110 'c:\sasfiles';

  36. Permanent SAS datasets and user-defined SAS Libraries
     Syntax: LIBNAME libref 'SAS-data-library';
     In quotes, the user tells SAS where the files will be kept. This is a specific folder that must already exist on the user’s hard drive.
     Example: LIBNAME stat6110 'c:\sasfiles';
     (The folder c:\sasfiles must exist on the hard drive.)

  37. Permanent SAS datasets and user-defined SAS Libraries
     Programming Statements:
     LIBNAME stat6110 'c:\sasfiles';
     DATA stat6110.weight_club;
        SET weight2;
     RUN;
     SAS log file:
     85
     86   LIBNAME stat6110 'c:\sasfiles';
     NOTE: Libref STAT6110 was successfully assigned as follows:
           Engine:        V9
           Physical Name: c:\sasfiles
     87
     88   DATA stat6110.weight_club;
     89   SET weight2;
     90   RUN;

  38. Permanent SAS datasets and user-defined SAS Libraries
     Programming Statements:
     LIBNAME stat6110 'c:\sasfiles';
     DATA stat6110.weight_club;
        SET weight2;
     RUN;
     The LIBNAME statement creates a library called “stat6110”, which is mapped to c:\sasfiles.

  39. Permanent SAS datasets and user-defined SAS Libraries
     Programming Statements:
     LIBNAME stat6110 'c:\sasfiles';
     DATA stat6110.weight_club;
        SET weight2;
     RUN;
     The DATA step creates a permanent SAS dataset called weight_club. It is “virtually” mapped to the stat6110 library, but the actual file is located on the hard drive in 'c:\sasfiles'.
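
Because the dataset is permanent, it can be read back in a later SAS session once the library is assigned again. The following is a minimal sketch, assuming stat6110.weight_club was created as shown above and that c:\sasfiles still exists.

```sas
LIBNAME stat6110 'c:\sasfiles';           /* reassign the library in the new session */

PROC CONTENTS DATA=stat6110._ALL_ NODS;   /* list the data sets stored in the library */
RUN;

PROC PRINT DATA=stat6110.weight_club;     /* read the permanent data set back */
RUN;
```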

  40. Permanent SAS datasets and user-defined SAS Libraries [screenshot; labels: “Permanent SAS dataset”, “New SAS Library mapped to c:\sasfiles”]

  41. FREE FORMAT DATA CREATION
     If the raw data are in “rectangular” format, where columns represent variables, rows represent observations, and the values are separated by spaces, then the SAS dataset can be created (using DATALINES, INFILE, etc.) without specifying column positions.

  42. FREE FORMAT AND COMMA DELIMITED FILES
     DATA1 and DATA2 are identical.
     DATA DATA1;
        INPUT ID Age savings;
        DATALINES;
     1 25 4000
     2 33 1000
     3 32 8000
     4 26 1500
     ;
     DATA DATA2;
        INFILE datalines delimiter=',';
        INPUT ID Age savings;
        DATALINES;
     1,25,4000
     2,33,1000
     3,32,8000
     4,26,1500
     ;
     The INFILE statement is used when we need to tell SAS about special features of the data or special locations (external files).

  43. DSD versus delimiter=','
     DATA2 and DATA2b are identical.
     DATA DATA2;
        INFILE datalines delimiter=',';
        INPUT ID Age savings;
        DATALINES;
     1,25,4000
     2,33,1000
     3,32,8000
     4,26,1500
     ;
     DATA DATA2b;
        INFILE datalines DSD;
        INPUT ID Age savings;
        DATALINES;
     1,25,4000
     2,33,1000
     3,32,8000
     4,26,1500
     ;
     The DSD option sets the comma as the default delimiter. Both DSD and delimiter=',' set the comma as the delimiter for this dataset.

  44. DSD versus delimiter=','
     • DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use list input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
     DATA DATA2b;
        INFILE datalines DSD;
        INPUT ID Age savings;
        DATALINES;
     1,25,4000
     2,33,1000
     3,32,8000
     4,26,1500
     ;
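
The DATA2b data contain no quoted values or empty fields, so the behaviors described above are not visible there. The following sketch uses hypothetical data (the data set name dsd_demo and the Name variable are illustrative only) to show a comma inside a quoted value and two consecutive delimiters.

```sas
DATA dsd_demo;                    /* hypothetical data set, for illustration only */
   LENGTH Name $ 20;              /* allow names longer than the default 8 characters */
   INFILE datalines DSD;
   INPUT ID Name $ Savings;
   DATALINES;
1,"Smith, John",4000
2,,1000
3,"Bening, Alice",8000
;

PROC PRINT DATA=dsd_demo;
RUN;
```

In the first and third observations, the comma inside the quotation marks is kept as part of Name and the quotation marks are removed. In the second observation, the two consecutive commas leave Name missing (blank).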

  45. FREE FORMAT DATA CREATION: Other Delimiters
     DATA DATA3;
        INFILE datalines delimiter='8';
        INPUT first$ last$;
        DATALINES;
     John8Smith
     Bill8Johnson
     Alice8Bening
     ;
     DATA DATA4;
        INFILE datalines delimiter='*';
        INPUT ID Age savings;
        DATALINES;
     1*25*4000
     2*33*1000
     3*32*8000
     4*26*1500
     ;
     The INFILE statement is used when we need to tell SAS about special features of the data or special locations (external files).

  46. READING CHARACTER VARIABLES
     DATA DATA5;
        INPUT ID Age Gender$ Savings;
        DATALINES;
     1 25 Male 4000
     2 33 Female 1000
     3 32 Male 8000
     4 26 Male 1500
     ;
     The dollar sign, $, tells SAS that the variable to be read is a character variable.

  47. MISSOVER OPTION
     This example demonstrates how to prevent missing values from causing problems when you read the data with list input. Some data lines in this example contain fewer than five temperature values. Use the MISSOVER option so that these values are set to missing.
     weather1 and weather2 are identical.
     DATA weather1;
        INFILE datalines missover;
        INPUT temp1-temp5;
        DATALINES;
     97.9 98.1 98.3
     98.6 99.2 99.1 98.5 97.5
     96.2 97.3 98.3 97.6 96.5
     ;
     DATA weather2;
        INPUT temp1-temp5;
        DATALINES;
     97.9 98.1 98.3 . .
     98.6 99.2 99.1 98.5 97.5
     96.2 97.3 98.3 97.6 96.5
     ;

  48. MISSOVER OPTION
     MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing. In the weather1 example, temp4 and temp5 are therefore missing in the first observation, exactly as they are coded with periods in weather2.
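
For comparison, the following sketch reads the weather1 data lines without MISSOVER. The data set name weather3 is hypothetical. With the default behavior, the INPUT statement moves to the next line to find values for temp4 and temp5.

```sas
DATA weather3;                 /* hypothetical name: weather1 without MISSOVER */
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

PROC PRINT DATA=weather3;
RUN;
```

Here weather3 has only two observations. The first takes 98.6 and 99.2 from the second line as temp4 and temp5, the rest of that line is discarded, and the second observation comes from the third line. The log notes that SAS went to a new line when the INPUT statement reached past the end of a line.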

  49. Using the INFILE statement (Reading External Text Files)
     To find more information on INFILE: while in the text editor in a SAS session, go to “Help”, then click on the “Index” tab. Type the word “infile” in the keyword box, then double-click the word “INFILE” in the results section.

  50. Using the INFILE statement (Reading External Text Files)
     • Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records.
     • Usually, you use an INFILE statement to read data from an external file. When data is read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step.
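
A minimal sketch of reading an external text file, placing INFILE before INPUT as described above. The path uses the c:\sasfiles folder and the module1_text1.txt file mentioned earlier, but the file layout (three space-separated numeric columns) and the data set name savings_file are assumptions made only for illustration.

```sas
DATA savings_file;                                    /* hypothetical data set name */
   INFILE 'c:\sasfiles\module1_text1.txt' MISSOVER;   /* identifies the file to read */
   INPUT ID Age Savings;                              /* assumed layout, adjust to the actual file */
RUN;

PROC PRINT DATA=savings_file;
RUN;
```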
