
EPIB 698E Lecture 6

Raul Cruz-Cano

Fall 2013

Sorting, Printing and Summarizing Your Data
  • SAS procedures (PROCs) perform a specific analysis or function and produce results or reports
  • E.g.: proc print data=new; run;
  • All procedures have required statements, and most also have optional statements
  • All procedures start with the keyword PROC, followed by the name of the procedure, such as PRINT or CONTENTS
  • Options, if there are any, follow the procedure name
  • The DATA=data_name option tells SAS which data set to use as input for the procedure. NOTE: if you omit it, SAS uses the most recently created data set, which is not necessarily the same as the most recently used data set.
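A minimal sketch of the DATA= note, assuming two small made-up data sets named NEW and OTHER (hypothetical data, not from the lecture). After both are created, a PROC PRINT with no DATA= option prints OTHER, because it was created last, even though the previous procedure read NEW.

```sas
/* Two small made-up data sets, created in this order */
data new;
  input id score;
  datalines;
1 85
2 90
;
run;

data other;
  input id grade $;
  datalines;
1 A
2 B
;
run;

/* Explicit DATA= option: prints NEW */
proc print data=new;
run;

/* No DATA= option: prints OTHER, the most recently created
   data set, even though NEW was the last data set used */
proc print;
run;
```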
BY statement
  • The BY statement is required in only one procedure, PROC SORT:

PROC SORT DATA = new;
  BY gender;
RUN;

  • In all other procedures, BY is an optional statement that tells SAS to perform the analysis separately for each level of the BY variable, instead of treating all subjects as one group:

PROC PRINT DATA = new;
  BY gender;
RUN;

  • All procedures except PROC SORT assume your data are already sorted by the variables in the BY statement
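The sketch below uses SASHELP.CLASS, a sample data set shipped with SAS (variables Name, Sex, Age, Height, Weight), so it runs without an input file. It sorts a copy by Sex and then prints one group per value of Sex; running the same PROC PRINT with BY Sex directly on SASHELP.CLASS, which is not sorted by Sex, would instead stop with a "not sorted" error in the log. The output name CLASS_SORTED is arbitrary.

```sas
/* Sort a copy of SASHELP.CLASS by Sex; OUT= leaves the
   read-only sample data set itself unchanged */
proc sort data=sashelp.class out=class_sorted;
  by Sex;
run;

/* CLASS_SORTED is sorted by Sex, so PROC PRINT can
   print a separate group for each value of Sex */
proc print data=class_sorted;
  by Sex;
run;
```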
PROC Sort
  • Syntax

PROC SORT DATA = input_data_name OUT = out_data_name;
  BY variable-1 … variable-n;

  • The variables in the BY statement are called BY variables.
  • With one BY variable, SAS sorts the data based on the values of that variable
  • With more than one variable, SAS sorts observations by the first variable, then by the second variable within the categories of the first variable, and so on
  • The DATA= option specifies the input data set. Without the DATA= option, SAS uses the most recently created data set.
  • The OUT= option names the sorted output data set. Without the OUT= option, PROC SORT replaces the input data set with the sorted version.
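With two BY variables, as in this sketch on the SASHELP.CLASS sample data set, observations are ordered by Sex and then by Age within each sex. The output name CLASS_BY_SEX_AGE is an arbitrary choice for illustration.

```sas
/* Sort by Sex, then by Age within each value of Sex.
   The sorted copy goes to CLASS_BY_SEX_AGE; SASHELP.CLASS
   itself is not changed. */
proc sort data=sashelp.class out=class_by_sex_age;
  by Sex Age;
run;

proc print data=class_by_sex_age;
  var Name Sex Age;
run;
```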
PROC Sort
  • By default, SAS sorts data in ascending order, from the lowest to the highest value or from A to Z. To reverse the order (highest to lowest, or Z to A), add the keyword DESCENDING before the variable; DESCENDING applies only to the variable that immediately follows it
  • The NODUPKEY option tells SAS to eliminate observations that have the same values for all the BY variables as an earlier observation, so only the first observation with each combination of BY values is kept
PROC Sort
  • Example: The file sealife.txt contains the average length in feet of selected whales and sharks. We want to sort the data by family and length

Name      Family  Length
beluga    whale   15
whale     shark   40
basking   shark   30
gray      whale   50
mako      shark   12
sperm     whale   60
dwarf     shark   .5
whale     shark   40
humpback  .       50
blue      whale   100
killer    whale   30
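To run the example without sealife.txt, the listing above can be read inline with DATALINES in place of INFILE; this sketch keeps the same INPUT statement and sort. In list input a lone period is read as a missing value, so humpback gets a blank Family, which sorts before "shark" and "whale". NODUPKEY drops the second "whale shark 40" line, so SEASORT should hold 10 of the 11 observations.

```sas
/* Read the sealife listing inline instead of from sealife.txt */
DATA marine;
  INPUT Name $ Family $ Length;
  DATALINES;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
RUN;

/* Family ascending (missing first), Length descending within
   Family; NODUPKEY keeps only the first shark/40 observation */
PROC SORT DATA = marine OUT = seasort NODUPKEY;
  BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = seasort;
RUN;
```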

PROC Sort

DATA marine;
  INFILE 'F:\sealife.txt';
  INPUT Name $ Family $ Length;
RUN;

* Sort the data;
PROC SORT DATA = marine OUT = seasort NODUPKEY;
  BY Family DESCENDING Length;
RUN;

Title and Footnote statement
  • TITLE and FOOTNOTE statements are global statements and are not technically part of any step.
  • You can put them anywhere in your program, but since they apply to the procedure output, it usually makes sense to put them with the procedure. A title or footnote stays in effect for all later output until you replace or cancel it.
  • Syntax

TITLE 'This is a title for this procedure';
FOOTNOTE 'This is the footnote for this procedure';

  • To cancel the current title or footnote, use the following null statements:

TITLE;
FOOTNOTE;
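A short sketch of the global behavior, using the SASHELP.CLASS sample data set; the title and footnote texts are made up. Set once, they appear on both the PROC PRINT and the PROC MEANS output, and the null statements at the end clear them for anything that runs afterwards.

```sas
/* Global statements: they apply to every later procedure
   until they are replaced or canceled */
title 'Students in SASHELP.CLASS';
footnote 'Source: SASHELP.CLASS sample data';

proc print data=sashelp.class;
run;

/* Printed with the same title and footnote */
proc means data=sashelp.class;
  var Height Weight;
run;

/* Null statements cancel the current title and footnote */
title;
footnote;
```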

Label Statement
  • The LABEL statement creates descriptive labels, up to 256 characters long, for variables
  • E.g.:

LABEL Shipdate = 'Date merchandise was shipped'
      ID = 'Identification number of subject';

  • When a LABEL statement is used in a DATA step, the labels become part of the data set; when it is used in a PROC step, the labels stay in effect only for the duration of that step

To display the labels in PROC PRINT output, add the LABEL option to the PROC PRINT statement.


DATA marine;
  INFILE 'C:\sealife.txt';
  INPUT Name $ Family $ Length;
  LABEL Family = 'Family name'
        Length = 'Subject length';
RUN;

PROC PRINT DATA = marine LABEL;
RUN;
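The sketch below contrasts the two placements on the SASHELP.CLASS sample data set; the label texts and the data set name CLASS_LABELED are made up for illustration. Labels given inside PROC PRINT affect only that listing, while labels given in the DATA step are stored with CLASS_LABELED.

```sas
/* LABEL in a PROC step: used for this listing only,
   not stored with any data set */
proc print data=sashelp.class label;
  label Height = 'Student height'
        Weight = 'Student weight';
run;

/* LABEL in a DATA step: stored with CLASS_LABELED,
   so later procedures can use the labels */
data class_labeled;
  set sashelp.class;
  label Height = 'Student height'
        Weight = 'Student weight';
run;

proc print data=class_labeled label;
run;
```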

PROC Format statement
  • The PROC FORMAT procedure allows you to create your own formats. It is useful when you work with coded data.
  • PROC FORMAT creates formats that will later be associated with variables in a FORMAT statement
  • Syntax of PROC FORMAT:

PROC FORMAT;
  VALUE name range-1 = 'formatted-text-1'
             range-2 = 'formatted-text-2'
             range-n = 'formatted-text-n';

  • Name is the name of the format you are creating; if the format is for character data, you need to use $name instead of name. In addition, the name cannot be the name of an existing format
PROC Format statement
  • Each range is the value, or set of values, of the variable that is mapped to the text given in quotation marks
  • The text can be up to 32,767 characters long, but some procedures print only the first 8 to 16 characters
  • The following are some examples of valid range specifications:

'A' = 'Asian'  (character values must be put in quotation marks)
1,3,5,7,9 = 'ODD'  (to put more than one value in a range, separate the values with commas, or use a hyphen (-) for a continuous range)
5000-high = 'high price'  (the keywords LOW and HIGH can be used in ranges to indicate the lowest and highest non-missing values of the variable)
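The range forms above can be combined in one PROC FORMAT step. In this sketch the format names $ethnic, parity and price, the extra "EVEN" and "regular price" ranges, and the two data lines are all made up for illustration. The -< operator excludes the upper endpoint, so 5000 itself falls in "high price".

```sas
/* Hypothetical formats built from the range examples */
proc format;
  value $ethnic 'A' = 'Asian';                 /* character value in quotes */
  value parity  1,3,5,7,9 = 'ODD'              /* list of values            */
                0,2,4,6,8 = 'EVEN';
  value price   low -< 5000 = 'regular price'  /* range using LOW           */
                5000 - high = 'high price';    /* range using HIGH          */
run;

/* Apply the formats to two made-up observations */
data demo;
  input code $ digit amount;
  format code $ethnic. digit parity. amount price.;
  datalines;
A 3 7500
A 4 1200
;
run;

proc print data=demo;
run;
```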

PROC Format statement
  • Here is a survey of subjects' preferred car colors. The data contain each subject's age, sex (coded as 1 for male and 2 for female), annual income, and preferred car color (Y = yellow, G = green, B = blue, W = white). Here are the data:

age  sex  income  color
19   1    14000   Y
45   1    65000   G
72   2    35000   B
31   1    44000   Y
58   2    83000   W
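If car.txt is not available, the listing above can be read inline with DATALINES. This sketch keeps the data set name and INPUT statement of the example, so the PROC FORMAT and PROC PRINT steps can be run on it unchanged.

```sas
/* Read the car survey listing inline instead of from car.txt */
DATA carsurvey;
  INPUT Age Sex Income Color $;
  DATALINES;
19 1 14000 Y
45 1 65000 G
72 2 35000 B
31 1 44000 Y
58 2 83000 W
;
RUN;
```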

DATA carsurvey;
  INFILE 'C:\car.txt';
  INPUT Age Sex Income Color $;
RUN;

PROC FORMAT;
  VALUE gender 1 = 'Male'
               2 = 'Female';
  VALUE agegroup 13 -< 20 = 'Teen'
                 20 -< 65 = 'Adult'
                 65 - HIGH = 'Senior';
  VALUE $col 'W' = 'Moon White'
             'B' = 'Sky Blue'
             'Y' = 'Sunburst Yellow'
             'G' = 'Green';
RUN;

PROC PRINT DATA = carsurvey;
  FORMAT Sex gender. Age agegroup.
         Color $col. Income DOLLAR8.;
RUN;

Subsetting in procedures with a where statement
  • The WHERE statement tells a procedure to use a subset of the data
  • It is an optional statement for any PROC step
  • Unlike subsetting in the DATA step, using a WHERE statement in a procedure does not create a new data set
  • The basic form is

WHERE condition;   (e.g., WHERE gender = 'female';)
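On the SASHELP.CLASS sample data set the two approaches look like this: the WHERE statement only filters the observations PROC PRINT reads, while the subsetting IF in the DATA step writes a new data set (named FEMALES here for illustration).

```sas
/* WHERE in a PROC step: prints only the female students;
   no new data set is created */
proc print data=sashelp.class;
  where Sex = 'F';
run;

/* Subsetting in a DATA step: creates FEMALES, a new data set
   that holds only the female students */
data females;
  set sashelp.class;
  if Sex = 'F';
run;
```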

Subsetting in procedures with a where statement
  • A data set contains information about well-known painters:

Name                   Style               Nation of origin
Mary Cassatt           Impressionism       U
Paul Cezanne           Post-impressionism  F
Edgar Degas            Impressionism       F
Paul Gauguin           Post-impressionism  F
Claude Monet           Impressionism       F
Pierre Auguste Renoir  Impressionism       F
Vincent van Gogh       Post-impressionism  N

  • Goal: we want a list of impressionist painters
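Without style.txt, the listing can be read inline. Because the example uses column input (Name in columns 1-21, style in columns 23-40, Origin in column 42), this sketch assumes the data lines keep exactly the spacing shown; the PROC PRINT step of the example can then be run as is.

```sas
/* Column input: the data lines must start in column 1 and keep
   these exact positions (Name 1-21, style 23-40, Origin 42) */
DATA style;
  INPUT Name $ 1-21 style $ 23-40 Origin $ 42;
  DATALINES;
Mary Cassatt          Impressionism      U
Paul Cezanne          Post-impressionism F
Edgar Degas           Impressionism      F
Paul Gauguin          Post-impressionism F
Claude Monet          Impressionism      F
Pierre Auguste Renoir Impressionism      F
Vincent van Gogh      Post-impressionism N
;
RUN;
```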
DATA style;
  INFILE 'C:\style.txt';
  INPUT Name $ 1-21 style $ 23-40 Origin $ 42;
RUN;

PROC PRINT DATA = style;
  WHERE style = 'Impressionism';
  TITLE 'Major Impressionist Painters';
  FOOTNOTE 'F = France N = Netherlands U = US';
RUN;

Summarizing your data with PROC MEANS
  • PROC MEANS provides simple statistics on numeric variables. Syntax: PROC MEANS options;
  • Statistics that PROC MEANS can produce, requested as options in the PROC MEANS statement:

MAX: the maximum value
MIN: the minimum value
MEAN: the mean
N: number of non-missing values
STDDEV: the standard deviation
NMISS: number of missing values
RANGE: the range of the data
SUM: the sum
MEDIAN: the median
DEFAULT: if no statistics are requested, PROC MEANS prints N, MEAN, STDDEV, MIN, and MAX
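A sketch on the SASHELP.CLASS sample data set: the first step requests a chosen set of statistics as keywords on the PROC MEANS statement, and the second omits them to get the default report.

```sas
/* Statistic keywords go on the PROC MEANS statement */
proc means data=sashelp.class n nmiss mean median stddev min max range sum;
  var Height Weight;
run;

/* No statistic keywords: the default N, MEAN, STDDEV, MIN and MAX */
proc means data=sashelp.class;
  var Height Weight;
run;
```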

Proc means
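  • Optional statements of PROC MEANS:
  • BY variable-list: performs a separate analysis for each level of the BY variables. The data need to be sorted by these variables first
  • VAR variable-list: specifies which numeric variables to analyze; without it, PROC MEANS analyzes all numeric variables that are not listed in other statements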
Proc means
  • A wholesale nursery sells garden flowers and wants to summarize its sales figures by month. The data are as follows:

ID      Date        Lily  SnapDragon  Marigold
756-01  05/04/2001  120   80          110
756-01  05/14/2001  130   90          120
834-01  05/12/2001  90    160         60
834-01  05/14/2001  80    60          70
901-02  05/18/2001  50    100         75
834-01  06/01/2001  80    60          100
756-01  06/11/2001  100   160         75
901-02  06/19/2001  60    60          60
756-01  06/25/2001  85    110         100
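To run the example without Flowers.txt, the data can be read inline. The @9 in the INPUT statement moves the pointer to column 9 before reading the date, so this sketch assumes each line has two spaces after the six-character ID, which puts the date in columns 9-18.

```sas
/* Read the flower sales inline; the date must start in column 9
   because of the @9 pointer control */
DATA sales;
  INPUT CustomerID $ @9 SaleDate MMDDYY10. Lily SnapDragon Marigold;
  Month = MONTH(SaleDate);
  DATALINES;
756-01  05/04/2001 120 80 110
756-01  05/14/2001 130 90 120
834-01  05/12/2001 90 160 60
834-01  05/14/2001 80 60 70
901-02  05/18/2001 50 100 75
834-01  06/01/2001 80 60 100
756-01  06/11/2001 100 160 75
901-02  06/19/2001 60 60 60
756-01  06/25/2001 85 110 100
;
RUN;
```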

DATA sales;
  INFILE 'C:\Flowers.txt';
  INPUT CustomerID $ @9 SaleDate MMDDYY10. Lily SnapDragon Marigold;
  Month = MONTH(SaleDate);
RUN;

PROC SORT DATA = sales;
  BY Month;
RUN;

* Calculate means by Month for flower sales;
PROC MEANS DATA = sales;
  BY Month;
  VAR Lily SnapDragon Marigold;
  TITLE 'Summary of Flower Sales by Month';
RUN;

OUTPUT statement
  • The SAS data set created by the OUTPUT statement contains the variables defined in the output statistic list, any variables in a BY or CLASS statement, and two new variables: _TYPE_ and _FREQ_
  • Without a BY or CLASS statement, the data set has just one observation (when statistics are listed in the OUTPUT statement)
  • With a BY statement, the data set has one observation for each BY group
  • A CLASS statement produces one observation for each level of each interaction (combination) of the class variables
  • The value of _TYPE_ indicates which combination of CLASS variables an observation summarizes
  • _TYPE_ = 0 identifies the grand total
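A sketch of _TYPE_ with one CLASS variable on the SASHELP.CLASS sample data set (the names HEIGHT_STATS and MeanHeight are arbitrary). Unlike BY, CLASS does not need sorted data, and the output data set holds a grand-total observation with _TYPE_ = 0 followed by one observation per value of Sex with _TYPE_ = 1; _FREQ_ gives the number of observations behind each row.

```sas
/* One CLASS variable: _TYPE_ = 0 is the grand total,
   _TYPE_ = 1 rows are the per-Sex summaries */
proc means data=sashelp.class;
  class Sex;
  var Height;
  output out=height_stats mean=MeanHeight;
run;

proc print data=height_stats;
run;
```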
OUTPUT statement of the PROC MEANS
  • We can use the OUTPUT statement to write summary statistics to a SAS data set.
  • Syntax

OUTPUT OUT = data_name output-statistic-list;

  • E.g.:

* Calculate summary statistics by Month for flower sales;
PROC MEANS DATA = sales;
  OUTPUT OUT = values;
  BY Month;
  VAR Lily SnapDragon Marigold;
  TITLE 'Summary of Flower Sales by Month';
RUN;

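  • Because this OUTPUT statement lists no statistics, the output data set values contains the default statistics (N, MIN, MAX, MEAN, and STD) as separate observations for each Month, identified by the variable _STAT_; the analysis variables keep their original names Lily, SnapDragon, and Marigold.
  • Be careful with the layout of the output data set: it might not look like the printed output of the procedure.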
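To get one observation per group with named statistics, the statistics can be listed in the OUTPUT statement as keyword(variables) = new-names. In this sketch on the SASHELP.CLASS sample data set, the data set MEANS_BY_SEX and the variable names MeanHeight, MeanWeight and MaxHeight are arbitrary choices; the NOPRINT option suppresses the printed report so that only the data set is created.

```sas
/* BY needs sorted data, so sort a copy first */
proc sort data=sashelp.class out=class_sorted;
  by Sex;
run;

/* One observation per BY group, with the requested
   statistics stored under the new variable names */
proc means data=class_sorted noprint;
  by Sex;
  var Height Weight;
  output out=means_by_sex
         mean(Height Weight) = MeanHeight MeanWeight
         max(Height) = MaxHeight;
run;

proc print data=means_by_sex;
run;
```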

data report;
  /* Keep Month so each MIN and MAX row can be matched to its month */
  set values (keep = Month Lily SnapDragon _STAT_);
  if _STAT_ = "MIN" or _STAT_ = "MAX";
run;
