Ch 2 getting your data into sas
Download
1 / 33

Ch. 2: Getting Your Data into SAS - PowerPoint PPT Presentation


  • 346 Views
  • Uploaded on

Ch. 2: Getting Your Data into SAS. At OSEDA we try to ease the burden of converting data by having data already in SAS data set format in the archive. But it is still important that you know how to convert your own.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Ch. 2: Getting Your Data into SAS' - bryce


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Ch 2 getting your data into sas l.jpg
Ch. 2: Getting Your Data into SAS

  • At OSEDA we try to ease the burden of converting data by having data already in SAS data set format in the archive.

  • But it is still important that you know how to convert your own.

  • If you understand input statements and proc import, you’ll be well on your way to how to use put statements and proc export.


Four basic methods l.jpg
Four Basic Methods

  • Direct data entry via Viewtable or SAS/FSP windows.

  • Reading raw data files (ascii) with input statements.

  • Converting files from other packages such as dbf, xls, wkn -- usually via proc import.

  • Accessing data in other DB formats directly via special engines that make them look like SAS data sets. (Oracle access, for example.)


Entering data directly via windows l.jpg
Entering Data Directly via Windows

  • Not used much here. This is usually done as part of an on-line transaction system, which is not something we do much with SAS.

  • It can be done, however, and can be quite sophisticated.


Reading raw data with input stmts l.jpg
Reading Raw Data with Input Stmts

  • This has been traditionally the most important way we capture our data.

  • This requires the most knowledge of SAS language to do. Sometimes the format of the data makes it quite challenging.

  • Reading .csv files has become very important and common. SAS makes it pretty easy.


Converting xls dbf etc l.jpg
Converting xls, dbf, etc.

  • DDE (Windows only) option. Requires having the other application running during conversion.

  • Proc import and the Import Data Wizard is easiest.

  • Exporting the other application’s data to csv and then importing remains an option.


Reading spss oracle etc l.jpg
Reading SPSS, Oracle, etc.

  • We now have special “engines” that can be used to make things such as Oracle tables look just like a SAS data set.

  • We have had good success accessing Oracle tables via the engine.

  • Nice part of doing it this way is that if the source data changes you do not have to reconvert. Sometimes, that’s a problem.


Reading raw data files l.jpg
Reading Raw Data Files

  • For small data collections you can imbed the data lines in your own code, preceded by a datalines; statement.

  • Most of the time you’ll be dealing with data stored in external files.

  • The infile statement is used to point SAS to the file to be read. The input statement does the actual reading.


Accessing the sample code l.jpg
Accessing the Sample Code

  • Go to URL:http://ftp.sas.com/samples/A56649

  • Cut and paste to get individual programs


Long records l.jpg
Long Records

  • An annoying “gotcha” is the default of truncating input lines at 256 bytes.

  • Easy to circumvent if you know to code lrecl=2000 (or some large value) as option on the infile statement. Sets up the buffer size - it does not mean the records have to really be that long. They just can’t be longer.


Space separated data l.jpg
Space Separated Data

  • Avoid this except for really small sets of data that you are keying in yourself.

  • Too easy for something to go wrong.

  • Character data containing blanks are a big problem.


Reading fixed format file l.jpg
Reading Fixed Format File

  • These are files where the “fields” are in the same column locations on each input line.

  • Read using a combination of column and formatted read specifications.

  • A good idea is to use formatted input for everything. Keeps it simple.


Formatted vs column input l.jpg
Formatted vs. Column Input

  • Input id $5. +1 year $4. +1 sales 7. ;

  • Input id $1-5 year $7-10 sales 12-18;

  • input id $char5. @7 year $4. @12 sales 7.;

  • input id $char5. +1 year $4. +1 sales 7; *-!-;

  • vs. free format:

    • input id $ year $ sales ;


Sas builtin formats l.jpg
SAS builtin formats

  • Used within input (and put) statements (and functions) to convert to/from external formats to SAS internal formats.

  • $ always used as 1st char of a character format or informat.

  • Most references to formats require a period at the end to distinguish what it is.

  • Input dob date7 … ; *<--what happens?-;


Rich collection of formats l.jpg
Rich Collection of Formats

  • As seen on pp. 34-35 of TLSB.

  • Most of these can be used to write data (in a put statement) as well as read data. They just reverse the conversion process.

  • Note the wide assortment of data and time related formats.


Converting raw data at oseda l.jpg
Converting Raw Data at OSEDA

  • Major purpose of creating data archive is to make the expensive and error-prone task of getting the data properly converted something that is done just once and by pro. programmers.

  • Goal is to replace infile/input with set statements as much as possible. Make record-layout type documentation obsolete.


Need to carefully review l.jpg
Need to Carefully Review

  • It is extremely easy to make a mistake when coding complex input statements.

  • Always review the resulting data set(s) very carefully before going on.

  • Use Proc Contents and Proc Print in batch, or the corresponding interactive windows in DM to carefully inspect data sets.


Sas data sets l.jpg
SAS Data Sets

  • Absolutely essential that you become very comfortable with what these are, how they are created, referenced, etc.

  • For some reason, many new users see the way SAS handles data set references as really hard.

  • But it’s really pretty simple.


New in v7 data set literals l.jpg
New in V7: Data Set Literals

  • Starting with v7 you can code:data ‘c:\temp\boone_county’;

  • Before v7 this was heresy. You always had to define the data library separately and use a 2-level name to reference a data set within that directory:libname tempsas ‘c:\temp’; data tempsas.boone_county;


One level data set names l.jpg
One Level Data Set Names

  • References to SAS data sets that have only one level (I.e. no period such as save.set1) are assumed to be stored in a special temporary SAS data library named work.

  • An important exception to this is if you define the specific user libref. If such a library is defined, SAS will store and look for all 1-level names in that data library.


Sas data libraries l.jpg
SAS Data Libraries

  • On Unix and Windows these are just collections of SAS data sets stored in directories.

  • At least it used to be that simple. With v8 we can now define SAS data libraries that include multiple directories.

  • But the latter kind are still rare around here.


Creating a permanent sas data set l.jpg
Creating a Permanent SAS Data Set

  • Typically involves something such as:libname mysas ‘s:\myname\mysas’; data mysas.data0901;

  • The directory specified in quotes in the libname statement must be allocated outside SAS (using mkdir or the equivalent.)

  • The data set is stored as a system file within that directory with name data0901. The extension will vary with the engine/SAS version/platform.


Entering data with vt window l.jpg
Entering Data with VT Window

  • I have never done this or known anyone who has.

  • One of our goals at OSEDA is to be able to have a record of how the data we use relates to the original source.

  • Keying in data like this can be tricky in that it is way too easy to make a mistake that the software cannot detect.


Reading multiple input lines l.jpg
Reading Multiple Input Lines

  • The input statement typically reads one line of data to create one observation in a SAS data set. However, ….

  • You can use the “/” to tell SAS to go to the next input line, or “#2” (or #3, etc.) to tell it to position itself at the 2nd line (relative to where it began reading.)


Load data and code for 2 12 l.jpg
Load Data and Code for 2.12

  • Go to the sample code at http://ftp.sas.com/samples/A56649

  • Search for text ‘2.12’

  • Go to Windows to create c:\MyRawData.

  • Type DM command note rawdata to open a SAS notepad window.

  • Copy the data from your browser and paste it to the notepad window.


Data code for 2 12 cont l.jpg
Data & Code for 2.12 - cont.

  • Use File-Save as to save the contents of the notepad window to c:\MyRawData\temperature.dat

  • Go back and get the sample data step code from the browser. Come back to SAS and paste the code into the editor window.

  • Submit the code.


Multiple obs per line of raw data l.jpg
Multiple Obs Per Line of Raw Data

  • Use the double trailing @ to tell SAS to stay on this line, even across cycles of the DATA step. (Default is to always flush the current record at the end of a d-s cycle.)

  • data scores; input name $ score @@;datalines;Mike 25 Samuel 36 Melanie 40 Me 2 proc print; sum score; run;


Trailing @ to read part of line l.jpg
Trailing @ to Read Part of Line

  • A single trailing @ in an input stmt says to leave the data pointer where it is.

  • Subsequent input statements in the same step will pick up where last input left off.

  • But record is released at end of data step cycle (unlike with double trailing @).

  • Allows you to read part of a record and then conditionally continue reading, or have alternative reads based on “record type”.


Infile statement options l.jpg
Infile Statement Options

  • Firstobs= and obs= commonly used to begin/end at specific locations within file.

  • Missover, truncover, and stopover (not in TLSB) specify how to handle case where SAS needs to go to a new line to complete the input read.


Reading delimited csv files l.jpg
Reading Delimited (.csv) Files

  • You almost always want to use the dsd option. Add option dlm=‘09’x for tab-delimited files.

  • A problem with reading this way is that character variables may not get right length imputed.

  • Strongly recommend “declaring” all vars before reading (prior to input statement.)


Example of reading csv file l.jpg
Example of Reading csv File

  • data class;

  • length stud_id $6 Name $24 Address $40 City $20 State $2 zip $5 gpa 5;

  • infile datalines dsd;

  • input stud_id -- gpa;

  • put / _all_;

  • datalines;

  • 001234, Joe Smith,123 S 5th St,Columbia,MO,65201,3.0

  • 003456, Mary Jones,909 Nifong, Jefferson City,MO,65103,3.5

  • run;


Running the samples l.jpg
Running the Samples

  • We have captured the sample code for the text (TLSB) and edited/extended it for local use.

  • See the code in file s:\sas8\Ch2_Samples.sas s:\sas8\Ch3_Samples.sas etc.


The copyto macro l.jpg
The %copyto Macro

  • Utility macro used to copy the sample data to our local data directory,s:\sas8\RawData

  • Uses a single positional parameter and a single keyword parameter.

  • Generates a simple SAS data step that reads data from datalines and writes a file in the RawData directory with name determined by the parameters.


Run the samples ch 2 l.jpg
Run the Samples - Ch 2

  • Use the new enhanced editor window.

  • Do a file-open and select the Ch2_Samples.sas program file.

  • Note that all the steps which invoke %copyto have already been run and the results are already out there. But it will not hurt if you rerun.

  • You can edit the copyto macro and make your own personal copies which you can then play with.


ad