HRP223 - 2008
Topic 2 – Using EG
At this point you can:
Start up a project
Use SAS as a calculator
Set some configuration options
Remember to work in WORK, rather than SASUSER
Create a library
Import a dataset into WORK or your custom library
Subset a dataset
Topic 2 – Using EG
Set up a library to hold your permanent data.
Import data into that library.
Look at what you’ve got.
Check for bad data.
Subset the data to keep the data you want.
Make a report.
Tools menu > Assign Library…
Review the code (if you want)
Check the log
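The code behind Assign Library is a libname statement. A minimal sketch of what such code might look like; the library name hrp223 and the folder path are hypothetical, so substitute your own:

```sas
* Hypothetical library name and folder, used only for illustration;
* The folder must already exist on your computer;
libname hrp223 "C:\projects\hrp223";
```

After running it, the log should include a note saying the libref was successfully assigned.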
Where is the dataset node in the flowchart?
The log is good, so the import worked. The missing node is a bug in EG: it does not draw the dataset in the flowchart if you use proc import.
Tweak the code and link the import node to the library.
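For reference, a proc import step that writes into a permanent library might look roughly like this. It is a sketch, not the code EG generated: the file path, the library hrp223 and the dataset name tours are all hypothetical, and dbms=xlsx assumes a SAS installation that can read .xlsx files (older setups may need a different dbms value).

```sas
* Hypothetical file, library and dataset names, used only for illustration;
proc import datafile="C:\projects\hrp223\tours.xlsx"
            out=hrp223.tours   /* write into the permanent library, not WORK */
            dbms=xlsx          /* assumption: .xlsx support is installed */
            replace;           /* overwrite the dataset if it already exists */
  getnames=yes;                /* use the first row as variable names */
run;
```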
Once a file is in a library, you can access it just like any other file on your computer.
Use the task list (right side of the screen), organized by task name, to look up the procedure that goes with a menu item. If you are told to use a procedure, you can find the corresponding menu item the same way.
Be glad you did not need to memorize this stuff.
Drag Tour from the left pane and drop it into the Analysis variables group.
In this source file we have a categorical “tour” variable. What are its values?
Use the Describe > One-Way Frequencies menu option to see the categories.
The procedure that does frequency counts is proc freq (pronounced freak). It is very important to learn because it does the core categorical analysis for basic epidemiological studies. The EG code is:
This could be simplified
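A stripped-down frequency count might look like the sketch below. This is not the code EG writes; the dataset name tours and the handful of rows are made up so the example runs on its own, using the tour codes that appear later in these slides.

```sas
* Made-up rows so the example is self-contained;
data tours;
  input tour $;
  datalines;
FJ12
PS27
SH43
FJ12
;
run;

* One-way frequency table for the categorical variable;
proc freq data=tours;
  tables tour;
run;
```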
You have already seen how to subset a dataset using the GUI and SQL.
What if you want to subset into 3 different data sets? You could do a lot of pointing and clicking or write a little program.
That technique is not fun if you need to split into many subgroups. If you do need many subgroups, use code instead.
All data steps begin with the data statement.
Most have a set statement saying where the data is coming from, and they should end with a run statement.
* A list of what data sets to make;
data fj12 ps27 sh43;
* based on what file? ;
set tours; * placeholder name, the source dataset is not named on this slide;
* Check the value of tour and if TRUE output;
if tour = "FJ12" then output fj12;
if tour = "PS27" then output ps27;
if tour = "SH43" then output sh43;
return; * This line is optional;
run;
Knowing the variables’ order can help you do complex things.
proc contents data=teletubbies position; run;
Deposit_date looks bad
Notice that dates in Excel are actually stored as the number of days since the start of 1900 (in Windows).
Dates in SAS are stored as the number of days since January 1, 1960.
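Because the two systems count from different starting days, a raw Excel number has to be shifted before SAS shows the right date. A minimal sketch, assuming the Windows 1900 date system and a numeric column of raw Excel values; the dataset names and the two rows are made up, and deposit_date stands in for the column above. The offset 21916 is the Excel number for January 1, 1960.

```sas
* Made-up Excel serial numbers so the example is self-contained;
data excel_dates;
  input deposit_date;
  datalines;
39448
39479
;
run;

data sas_dates;
  set excel_dates;
  * Shift from the Excel count to the SAS count;
  deposit_date = deposit_date - 21916;
  * Display the number as a date, 39448 becomes 01JAN2008;
  format deposit_date date9.;
run;

proc print data=sas_dates;
run;
```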
The format is made but is not associated with any variable.
Be sure to label the format node in the flowchart and also link it up graphically to show where it is used.
You can then use the formatted data for a categorical analysis without having to make new variables.
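In code, the two steps are separate: proc format makes the format, and a format statement attaches it to a variable. A hedged sketch; the format name bmigrp, the cutoffs, the dataset bmi_sample and its values are all invented for illustration and are not from the class data.

```sas
* Step 1: make the format. At this point it is attached to nothing;
proc format;
  value bmigrp
    low  -< 18.5 = "Underweight"
    18.5 -< 25   = "Normal"
    25   -< 30   = "Overweight"
    30   - high  = "Obese";
run;

* Made-up values so the example is self-contained;
data bmi_sample;
  input bmi;
  datalines;
17.9
22.4
24.8
27.3
31.6
;
run;

* Step 2: attach the format inside the procedure that uses it;
* The counts are by formatted group, and no new variable is made;
proc freq data=bmi_sample;
  tables bmi;
  format bmi bmigrp.;
run;
```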
Import an Excel file
Describe the data
Do a t-test vs. a population BMI of 24.8
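The one-sample t-test can also be written as a short program. A sketch under stated assumptions: the dataset bmi_sample, the variable bmi and the values are made up for illustration, and only the population value 24.8 comes from the exercise.

```sas
* Made-up values so the example is self-contained;
data bmi_sample;
  input bmi;
  datalines;
22.1
26.4
23.8
29.0
25.3
21.7
;
run;

* h0= sets the population mean to test against;
proc ttest data=bmi_sample h0=24.8;
  var bmi;
run;
```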