Data Transformation. Data cleaning. Importing Data. Reading data from external formats Libname/Infile/Input for text form data Proc Import for Excel/Access data ODBC for external database data. Importing an Excel Spreadsheet. PROC IMPORT OUT= WORK.Fall2007
PROCIMPORT OUT= WORK.Fall2007
PROC IMPORT OUT= WORK.OrderLine
PROC CONTENTS DATA= OrderLine;
PROC PRINT DATA= OrderLine (OBS= 10);
LIBNAME course "L:\DataWarehousing07f\CourseDatabase";
Note: the name associated with the libname command (“course”) must be 8 characters or less.
INPUT NAME $ WEIGHT WAIST PULSE CHINS SITUPS JUMPS;
Generating new attributes for a table. SAS creates attributes when they are referred to in a data step. The metadata depends on the context of the code.
TextOutput = PUT(variable, format)
Note: the result of a put function is always character
Note: there is also a PUT statement that writes the contents of a variable to the SAS log
Output = INPUT(CharacterInput, informat)
Note: the variable for an input function is always character
Note: use the SAS “search” tab to look for “Formats.” For a list of SAS formats look under: “Formats: Formats by Category”
The following code is handy for testing functions and formats in SAS. The _Null_ dataset name tells SAS not to create the datset in the WORK library
OutputVal= PUT(InputVal, Roman30.);
PUT InputVal OutputVal;
Length Date $10.;
Date= PUT( Datepart(OrderDate), MDDYY8.);
We are especially interested in “Character” and “Date and Time” functions
Note: use the SAS “search” tab to look for “Functions.” For a list of SAS functions look under: “Functions and CALL routines: Functions and CALL Routines by Category”
Note: it is a good idea to strip blanks before you try to parse a string.
LENGTH street cust_addr $20.;
FORMAT street cust_addr $20.;
Number= Scan(Cust_Addr,1,' ');
Street= Scan(Cust_Addr,2,' ');
Note: The LENGTH and FORMAT statements clear trailing blanks for further display.
Obs cust_addr Number street
1 481 OAK 481 OAK
2 215 PETE 215 PETE
3 48 COLLEGE 48 COLLEGE
4 914 CHERRY 914 CHERRY
5 519 WATSON 519 WATSON
6 16 ELM 16 ELM
7 108 PINE 108 PINE
Always print the before and after images here. Parsing free form text can be quite a problem. For example, apartment addresses ‘110b Elm’ and ‘110 b Elm’ will parse differently. In this case you may have to search the second word for things that look like apartments and correct the data.
Use this when you have a known position for characters.
Replaces character value contents. Use this when you know where the replacement starts.
Returns the length of a character variable
Supplier= Tranwrd(supplier, " Incorporated", "");
If Supplier= "Trinkets & Things" then supplier= "Trinkets n' Things";
More complex logic is often needed. See the course examples.
It is a good idea to produce a change log for standardized changes:
Data Products2 Changed;
* * * *
If Trim(supplier) ^= Trim(SupplierOld) then output Changed;
Proc Print Data= Changed;
Var SupplierOld Supplier;