1 / 39

EPIB 698E Lecture 3

EPIB 698E Lecture 3. Instructor: Raul Cruz-Cano raulcruz@umd.edu 9/18/2013. Creating Data in SAS, an overview. Creating datasets by hand entry (Viewtable window, CARDS, DATALINES statement) Reading dataset from external files (not SAS data, INFILE statement)

kcarney
Download Presentation

EPIB 698E Lecture 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EPIB 698E Lecture 3 Instructor: Raul Cruz-Cano raulcruz@umd.edu 9/18/2013

  2. Creating Data in SAS, an overview • Creating datasets by hand entry (Viewtable window, CARDS, DATALINES statement) • Reading dataset from external files (not SAS data, INFILE statement) • Using Import/Export facility (a point-and-click approach)

  3. Entering Data with Viewtable window • To open viewtable window, select “Table Editor” from the Tools menu. An empty viewtable window will appear • The letters at the tops of columns are default variable names. Right click on the letter and open the Column attributes window. You can replace variables name, type, etc

  4. Entering Data with Viewtable window • Entering you data once you have defined your columns. • To save your table, select “Save as” from the File menu, then select a library and specify the name of your table (SAS dataset) • To open an existing table, go to ToolsTable editor, the view table window will be opened. Then go the File menu, click Openchoose libraryselect Table name. To switch from browse mode (default) to edit mode, select “Edit Mode” from the Edit menu • Using Viewtable you can easily create a data table by setting the columns, you can add rows to the table. However, you can not add columns once you finish defining your columns. This is a considerable disadvantage.

  5. Reading Data Inline, CARDS Statement • You enter the actual data points inside the PROGRAM EDITOR • Example: CARDS statement data instructor; input name $ gender $ age; cards; Jane F 30 Mary F 29 Mike M 28 ; run;

  6. Examples • Reading multiple observations in each line of data by adding the @@ symbol at the end of the input statement data aaa; input x y @@; datalines; 1 12.4 1 11.3 1 1.4 2 2.1 2 19.4 1 10.0 2 13.2 1 11.6 ; run;

  7. Reading Dataset from External Files: INFILE Statement • Identifies the external file that contains the data and has options that control how the records in file are read into SAS • Must be used before the input statement because it locates the data file to be read • Syntax datadata_set_name; Infile directory_and_file_name; inputvariable_list; run;

  8. Reading raw data separated by spaces—list input • List input (also called free formatted input) can read data separated by at least one space. By default, SAS assumes data values are separated by one or more blanks. • Will read all the data in a record, no skipping unwanted values • Any missing data must by indicated with a period • Character data must be simple, no embedded spaces

  9. Reading raw data separated by spaces—list input Example: data demographics; infile ’E:\mydata.txt'; input Gender $ Age Height Weight; run; • $ sign after gender means that gender is a character variable

  10. Specify missing values with list input • We use a period to represent missing values M 50 168 155 M 50 168 155 F 23 160 101 F . 160 101 M 65 172 220 M 65 172 220 F 35 165 133 F 35 165 133 M 15 171 166 M 15 171 166

  11. Reading raw data separated by commas • Comma separated values file (csv file) use commas as data delimiters • They may or may not enclose character values in quotes. • Example: mydata.csv "M",50,68,155 "F",23,60,101 "M",65,72,220 "F",35,65,133 "M",15,71,166

  12. Reading raw data separated by commas data demographics; infile‘E:\mydata.csv'dsd; input Gender $ Age Height Weight; run; Dsd: means delimiter sensitive data. It has several functions. (1)change the default delimiter from a blank to a comma. (2)If there are two delimiters in a row, it assumes there is a missing value in between (3)If character values are placed in quotes, the quotes are stripped from the value

  13. INFILE statement, useful options DSD • It recognizes two consecutive delimiters as a missing value. Example: 20,30,,50, SAS will treat this as 20 30 50 . but with the the dsd option SAS will treat it as 20 30 . 50 . • It allows you to include the delimiter within quoted strings. Example: a comma separated file and your data included values like "George Bush, Jr.” With the dsd option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.

  14. INFILE statement (cont.) DLM=The dlm= option can be used to specify the delimiter that separates the variables in your raw data file. dlm=‘,’ indicates a comma is the delimiter (e.g., a comma separated file, .csv file). dlm='09'x indicates that tabs are used to separate your variables dlm=‘:’ indicates a colon is the delimiter

  15. INFILE statement (cont.) • We can use dsd and dlm at the same time: infile ‘file-description’ dsd dlm=‘:’ ; This combination of options performs all the actions requested by the DSD option, but overrides the default delimiter (comma) with a delimiter of your choice

  16. INFILE statement (cont.) • useful options continued… • missover: if number of variables in file does not match number of variables in input file all remaining variables are set to missing. • obs: specifies the last record to be read into the data set • firstobs: specifies the first line of data to be read into data set. Useful if there is a header row in the dataset. Data mydata2; infile ‘E:\mydata.txt' missover obs=4 firstobs=2; input Gender $ Age Height Weight; run;

  17. Filename statement • Filename statement identifies the file and associate it with a reference name then use this reference in your INFILE statement instead of the actual file name filename myfile ‘E:\mydata.csv'; data demographics; infile myfile dsd; input Gender $ Age Height Weight; run;

  18. Specifying INFILE options with the DATALINES statement data demographics; infile datalines dsd; input gender $ Age Height Weight; datalines; "M",50,68,155 "F",23,60,101 "M",65,72,220 "F",35,65,133 "M",15,71,166 ; Run;

  19. Reading data from fixed columns • Many raw data files store specific information in fixed columns • The advantage of fixed column files: (1) don’t need to worry about missing values (2) you can choose which variables to read and in what order to read them

  20. Bank data 123456789012345678901 columns 00110/ 21/1955M 1145 00211/ 18/2001F 18722 Column 1-3 : subject ID Column 4-13: Date of birth Column 14-14: gender Column 15-21: Account balance

  21. Column Input data financial; infile‘E:\bank.txt'; input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance 15-21; run;

  22. Formatted Input • Formatted input can read both character and standard numerical values as well as nonstandard numerical values, such as numbers with dollar signs and commas, and dates. • Formatted input is the most common and powerful of all input methods • SAS Formats and Informats: An informat is a specification for how raw data should be read. A format is a layout specification for how a variable should be printed or displayed.

  23. Formatted Input • The bank data: data financial; infile ‘E:\bank.txt'; input @1 Subj $3. @4 DOB mmddyy10. @14 Gender $1. @15 Balance 7.; run;

  24. Formatted Input • The @ sign the INPUT statement are called column pointers. @4 tells SAS to go to column 4. • Following variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value

  25. Formatted Input • Two of the most basic informats are w.d and $w. • The w.d informat reads standard numeric values. The w tells SAS how many columns to read; the optional d tells SAS that there is an implied decimal point in the value. For examples: data value is 123, With informat 3.0, SAS will save it as 123; With informat 3.1, SAS will save it as 12.3; • If the data value already has a decimal in it, then SAS ignores the d option. For examples: data value is 1.23, With informat 4.1, SAS will it as 1.23;

  26. Formatted Input • The $w Informat tells SAS to read w columns of character data • The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date. • SAS stores dates as numeric values equal to the number of days from January 1, 1960. Eg, if you read 01/01/1960, SAS stores a value of 0. The data 01/02/1960 is stored as a value of 1.

  27. The format statement • The format statements are built-in SAS command that allow you to display data in easily readable ways. • All SAS formats command ends either in a period or in a period follows by a number. title"Listing of FINANCIAL"; procprintdata=financial; format DOB mmddyy10. Balance dollar11.2; run; The dollar11.2 tellsSAS to put a $ sign in front of the number, and allow up to 11 columns to print the balance values, the 2 tells SAS to include two decimal places after the decimal points.

  28. Using a format/informat statement in a DATA step • It is usually more useful to place your format statement with a SAS data step. There is a permanent association of the formats and variables in the data set. • You can override any permanent format by placing a FORMAT statement in a particular procedure.

  29. A informat statement with list input • Following the key word informat, you list each variable and information you want to use to read each variable data list_example; informat Subj $3. Name $20. DOB mmddyy10. Salary dollar8. ; infile‘E:\list.csv'dsd; input Subj Name DOB Salary; format DOB date9. Salary dollar8.; run; Creates space I just declare that comma is my delimiter, hence SAS is going to read all the characters that it might encounter until a comma is found, right? Nope, wrong.

  30. data list_example; infile‘E:\list.csv 'dsd; input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.; format DOB date9. Salary dollar8.; run; there is a colon (called an informat modifier) preceding of each informat. It tells SAS to use informat supplied but to stop reading the values for this variable when a delimiter is met. Without it, SAS may read past a delimiter to satisfy the width specified in the informat. 30

  31. What are the Differences Informats used at input Usually reading external data Formats used during output cycle Write formatted value to output

  32. Selected date informats

  33. Selected date formats

  34. More examples of informat for numeric data

  35. More examples of informat for character data

  36. Import/Export Data • To Export SAS datasets • Go to the File menu and select “Export Data” • Choose the data file ( from the library Work) • Locate and select file type using the browse button • Save the data set and finish • Check the log to make sure the data set was created • This method does not require a data step, but any modification may require a data step • Convenient for Excel file • Import a SAS data set follows similar step

  37. Read data in Excel file • (1) Use Import procedure FileImport dataChoose Microsoft ExcelClick “Next”Select work sheet using browseselect “Table” you want to importClick “Next”Select Library and assign a file nameClick “Next” if you want Proc import SAS code generated, otherwise click “Finish” Check the log window to make sure the data was created successfully. • (2) Save Excel file as a csv file, then read in using the infile statement with dsd option.

  38. PROC IMPORT PROC IMPORT OUT= WORK.auto1 DATAFILE= "C:\auto.xlsx" DBMS=xlsx REPLACE; SHEET="auto"; GETNAMES=YES; RUN; • OUT option in the proc import tells SAS what the name should be for the newly-created SAS data file and where to store the data set once it is imported.  • DATAFILE option tells SAS where to find the file we want to import.  • DBMS option is used to identify the type of file being imported.  • REPLACE option will overwrite an existing file. • SHEET specifies which sheet SAS should import to. The default is for SAS to read the first sheet.  • GETNAMES is the default setting and SAS will automatically use the first row of data as variable names.  If the first row of your sheet does not contain variable names use the GETNAMES=no. 

  39. PROC EXPORT PROC EXPORT DATA= WORK.auto2 OUTFILE= "C:\auto.xlsx" DBMS=xlsx REPLACE; RUN; • OUTFILE option in the proc export tells SAS what the name should be for the newly-created Excel data file and where to store it.  • DATA option tells SAS where to find the dataset we want to export.  • DBMS option is used to identify the type of file being imported.  • REPLACE option will overwrite an existing file. • How to create nicer files (reports) will be addressed in future lectures

More Related