Chapter 12
Download
1 / 217

Chapter 12 - PowerPoint PPT Presentation


  • 87 Views
  • Uploaded on

Chapter 12. Data Transformations. Section 12.1. Reviewing SAS Functions. Objectives. Review the syntax of SAS functions. SAS Functions. SAS provides a large library of functions for manipulating data during DATA step execution.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Chapter 12' - oscar-workman


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Chapter 12

Chapter 12

Data Transformations


Section 12 1

Section 12.1

Reviewing SAS Functions


Objectives
Objectives

  • Review the syntax of SAS functions.


Sas functions
SAS Functions

  • SAS provides a large library of functions for manipulating data during DATA step execution.

  • A SAS function is often categorized by the type of data manipulation performed:

    • truncation

    • character

    • date and time

    • mathematical

    • trigonometric

  • special

  • sample statistics

  • financial

  • random number

  • state and ZIP code


Syntax for sas functions
Syntax for SAS Functions

  • A SAS function is a routine that performs a computation or system manipulation and returns a value. Functions use arguments supplied by the user or by the operating environment.

  • General form of a SAS function:

function-name(argument-1,argument-2,…,argument-n)


Using sas functions
Using SAS Functions

  • You can use functions in executable DATA step statements anywhere that an expression can appear.

data contrib; set prog2.donate;Total=sum(Qtr1,Qtr2,Qtr3,Qtr4); if Total ge 50;run;proc print data=contrib noobs;run;


Using sas functions1
Using SAS Functions

  • Partial PROC PRINT Output

  • What if you want to sum Qtr1 through Qtr400, instead of Qtr1 through Qtr4?

ID Qtr1 Qtr2 Qtr3 Qtr4 Total

E00224 12 33 22 . 67

E00367 35 48 40 30 153

E00441 . 63 89 90 242

E00587 16 19 30 29 94

E00621 10 12 15 25 62


Sas variable lists
SAS Variable Lists

  • A SAS variable list is a shortcut method of referring to a list of variable names. SAS enables you to use the following variable lists:

    • numbered range lists

    • name range lists

    • name prefix lists

    • special SAS name lists

  • These methods can be used in many places where variable names are expected.


Sas variable lists numbered range list
SAS Variable Lists: Numbered Range List

  • Syntax: x1-xn

  • A numbered range list specifies all variables from x1 to xn inclusive (including the variables named).

  • You can begin with any number and end with any number. You must follow the rules for user-supplied variable names and the numbers must be consecutive.

proc print data=prog2.donate; var id Qtr2-Qtr4;

run;


Sas variable lists numbered range list1
SAS Variable Lists- Numbered Range List

  • What would be the result of this program if Qtr3 were not in the data set?

proc print data=prog2.donate; var id Qtr2-Qtr4;

run;


Sas variable lists numbered range list2
SAS Variable Lists- Numbered Range List

  • What would be the result of this program if Qtr3 were not in the data set?

proc print data=prog2.donate; var id Qtr2-Qtr4;

run;

Because the variable Qtr3 is not in the data set, you get an error in the log. The error message indicates that the variable does not exist.


Sas variable lists name range list

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Name Range List

  • Syntax: StartVarName--StopVarName

  • A name range list specifies all variables ordered as they are in the program data vector, from StartVarName to StopVarName, inclusive.

  • There are two hyphens.

proc print data=fakedata; var id Name--Salary;

run;


Sas variable lists name range list1

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Name Range List

  • What is the result of the following program?

  • The output contains ID followed by Name, State, and Salary.

proc print data=fakedata; var id Name--Salary;

run;


Sas variable lists name range list2

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Name Range List

  • Syntax: StartVarName-NUMERIC-StopVarName

  • StartVarName-CHARACTER-StopVarName

  • You can also use the keyword NUMERIC or CHARACTER inside the hyphens to select all the variables of that data type, inclusively.

proc print data=fakedata; var id Name-character-JobCode;

run;


Sas variable lists name range list3

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Name Range List

  • What is the result of the following program?

  • The output contains ID followed by Name, State, and Jobcode. Salary is not displayed because it is a numeric variable.

proc print data=fakedata; var id Name-character-Jobcode;

run;


Sas variable lists name prefix list

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Name Prefix List

  • Syntax: PartVarName:

  • Providing part of the variable name followed by a semicolon tells SAS that you want all the variables that start with that string.

  • The case of the string does not matter.

data fakedata2;

set fakedata; keep id S:;

run;


Sas variable lists name prefix list1

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Name Prefix List

  • What is the result of the following program?

  • The new data set contains ID followed by State and Salary.

data fakedata2;

set fakedata; keep id S:;

run;


Sas variable lists special prefix list

ID$4

Year8

Name$24

State$2

Salary8

Jobcode$3

SAS Variable Lists- Special Prefix List

  • Syntax: _ALL_

  • _NUMERIC_

  • _CHARACTER_

  • specifies either all variables, all numeric variables, or all character variables that are defined in the current DATA step.

  • The case of the keyword does not matter.


Sas variable lists1
SAS Variable Lists

  • When you use a SAS variable list in a SAS function, use the keyword OF in front of the first variable name in the list.

  • If you omit the OF keyword, the function subtracts Qtr4 from Qtr1.

data contrib; set prog2.donate; Total=sum(of Qtr1-Qtr4); if Total ge 50;run;


Section 12 2

Section 12.2

Manipulating Character Values


Objectives1
Objectives

  • Use SAS functions and operators to extract, edit, and search character values.


A mailing label application
A Mailing Label Application

  • The freqflyers data set contains information about frequent flyers.

  • Use this data set to create another data set suitable for mailing labels.


A mailing label application1
A Mailing Label Application

  • IDis a character variable. Its last digit represents the gender (1 denotes female, 2 denotes male) of the frequent flyer.

  • prog2.freqflyers

ID Name Address1 Address2

F31351 Farr,Sue 15 Harvey Rd. Macon,Bibb,GA,31298

F161 Cox,Kay B. 163 McNeil Pl. Kern,Pond,CA,93280

F212 Mason,Ron 442 Glen Ave. Miami,Dade,FL,33054

F25122 Ruth,G. H. 2491 Brady St. Munger,Bay,MI,48747


A mailing label application2
A Mailing Label Application

  • labels

  • The first task is to create a title of Mr. or Ms. based on the last digit of ID.

FullName Address1 Address2

Ms. Sue Farr 15 Harvey Rd. Macon, GA 31298

Ms. Kay B. Cox 163 McNeil Pl. Kern, CA 93280

Mr. Ron Mason 442 Glen Ave. Miami, FL 33054

Mr. G. H. Ruth 2491 Brady St. Munger, MI 48747


The substr function right side
The SUBSTR Function (Right Side)

  • The SUBSTR function is used to extract or replace characters.

  • This form of the SUBSTR function (right side of the assignment statement) extracts characters.

NewVar=SUBSTR(string,start<,length>);


The substr function examples
The SUBSTR Function - Examples

  • If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the first argument to SUBSTR.


The substr function right side1

Location

$ 18

State

$ 18

Columbus, OH 43227

OH

The SUBSTR Function (Right Side)

  • Extract two characters from Location and start at position 11.

State=substr(Location,11,2);


A mailing label application3
A Mailing Label Application

  • In what position does the last digit of ID occur?

  • In some values, the last digit is in column 6 and in others it is in column 4.

proc print data=prog2.freqflyers noobs;

var ID;run;

PROC PRINT Output

ID

F31351

F161

F212

F25122


The right function

NewID

$ 6

ID

$ 6

F161

F161

The RIGHT Function

  • The RIGHT function returns its argument right-aligned.

  • Trailing blanks are moved to the start of the value.

NewVar=RIGHT(argument);

NewID=right(ID);

continued...


The right function1
The RIGHT Function

  • An argument can be a character constant, variable, or expression

  • If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the argument.


The left function

ID

$ 6

NewID

$ 6

F161

F161

The LEFT Function

  • The LEFT function returns its argument left-aligned.

  • Trailing blanks are moved to the end of the value.

NewVar=LEFT(argument);

NewID=left(ID);


The left function1
The LEFT Function

  • An argument can be a character constant, variable or expression

  • If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the argument.


A mailing label application4
A Mailing Label Application

data labels; set prog2.freqflyers; if substr(right(ID),6)='1' then Title='Ms.'; else if substr(right(ID),6)='2' then Title='Mr.';run;proc print data=labels noobs; var ID Title;run;

The result of the RIGHT function acts as the first argument to the SUBSTR function.


A mailing label application5
A Mailing Label Application

  • PROC PRINT Output

ID Title

F31351 Ms.

F161 Ms.

F212 Mr.

F25122 Mr.


A mailing label application6

Name

Farr,Sue

Cox,Kay B.

FMName

LName

Sue

Farr

Kay B.

Cox

A Mailing Label Application

  • The next task is to separate the names of the frequent flyers into two parts.


The scan function
The SCAN Function

  • The SCAN function returns the nth word of a character value.

  • It is used to extract words from a character value when the relative order of words is known, but their starting positions are not.

NewVar=SCAN(string,n<,delimiters>);


The scan function1
The SCAN Function

  • When the SCAN function is used,

    • the length of the created variable is 200 bytes if it is not previously defined with a LENGTH statement

    • delimiters before the first word have no effect

    • any character or set of characters can serve as delimiters

    • two or more contiguous delimiters are treated as a single delimiter

    • a missing value is returned if there are fewer than n words in string

    • if n is negative, the SCAN function selects the word in the character string starting from the end of string.


The scan function2

Phrase

$ 21

Second

$ 200

The SCAN Function

  • Extract the second word of Phrase.

Second=scan(Phrase,2,' ');

software and services

and

1

2

3

...


The scan function3

Phrase

$ 21

Second

$ 200

The SCAN Function

  • Extract the second word of Phrase.

Second=scan(Phrase,2,':');

software and:services

software and:services

services

services

1

2


The scan function4
The SCAN Function

  • data scan; Text='(Thursday July 4, 1776)'; Var1=scan(Text,1); Var2=scan(Text,4); Var3=scan(Text,5); Var4=scan(Text,2,','); Var5=scan(Text,2,',)'); run;

...


The scan function5
The SCAN Function

4

1

2

3

  • data scan; Text='(Thursday July 4, 1776)'; Var1=scan(Text,1); Var2=scan(Text,4); Var3=scan(Text,5); Var4=scan(Text,2,','); Var5=scan(Text,2,',)'); run;

Var1

$ 200

Thursday

...


The scan function6
The SCAN Function

4

1

2

3

  • data scan; Text='(Thursday July 4, 1776)'; Var1=scan(Text,1); Var2=scan(Text,4); Var3=scan(Text,5); Var4=scan(Text,2,','); Var5=scan(Text,2,',)'); run;

Var1

$ 200

Var2

$ 200

Thursday

1776

...


The scan function7
The SCAN Function

4

1

2

3

  • data scan; Text='(Thursday July 4, 1776)'; Var1=scan(Text,1); Var2=scan(Text,4); Var3=scan(Text,5); Var4=scan(Text,2,','); Var5=scan(Text,2,',)'); run;

Var1

$ 200

Var2

$ 200

Var3

$ 200

Thursday

1776

missing

...


The scan function8

space

The SCAN Function

1

2

  • data scan; Text='(Thursday July 4, 1776)'; Var1=scan(Text,1); Var2=scan(Text,4); Var3=scan(Text,5); Var4=scan(Text,2,','); Var5=scan(Text,2,',)'); run;

Var1

$ 200

Var2

$ 200

Var3

$ 200

Var4

$ 200

Thursday

1776

1776)

...


The scan function9

Var1

$ 200

Var2

$ 200

Var3

$ 200

Var4

$ 200

Var5

$ 200

space

The SCAN Function

1

2

  • data scan; Text='(Thursday July 4, 1776)'; Var1=scan(Text,1); Var2=scan(Text,4); Var3=scan(Text,5); Var4=scan(Text,2,','); Var5=scan(Text,2,',)'); run;

Thursday

1776

1776)

1776


A mailing label application7
A Mailing Label Application

  • data labels; length FMName LName $ 10; set prog2.freqflyers; if substr(right(ID),6)='1' then Title='Ms.'; else if substr(right(ID),6)='2' then Title='Mr.'; FMName=scan(Name,2,','); LName=scan(Name,1,',');run;


A mailing label application8
A Mailing Label Application

proc print data=labels noobs; var ID Name Title FMName LName;run;

PROC PRINT Output

ID Name Title FMName LName

F31351 Farr,Sue Ms. Sue Farr

F161 Cox,Kay B. Ms. Kay B. Cox

F212 Mason,Ron Mr. Ron Mason

F25122 Ruth,G. H. Mr. G. H. Ruth

The next task is to join the values of Title, FMName,and LName into another variable.


Concatenation operator
Concatenation Operator

  • The concatenation operator joins character strings.

  • Depending on the characters available on your keyboard, the symbol to concatenate character values can be two exclamation points (!!), two vertical bars (||), or two broken vertical bars (¦¦).

NewVar=string1 !! string2;


Concatenation operator1

FMName

$ 10

LName $ 10

FULLName

$ 20

Sue

Farr

Sue Farr

Concatenation Operator

  • Combine FMName and LName to create FullName.

FullName=FMName !! LName;


The trim function
The TRIM Function

  • The TRIM function removes trailing blanks from its argument.

  • If the argument is blank, the TRIM function returns one blank.

  • The TRIMN function is similar but returns a null string (zero blanks) if the argument is blank.

NewVar=TRIM(argument1)!! argument2;


The trim function1
The TRIM Function

data trim; length FMName LName $ 10; FMName='Sue'; LName='Farr'; FullName1=trim(FMName); FullName2=trim(FMName) !! LName; FullName3=trim(FMName) !! ' ' !! LName;run;

FullName1

$ 10

Sue

...


The trim function2
The TRIM Function

data trim; length FMName LName $ 10; FMName='Sue'; LName='Farr'; FullName1=trim(FMName); FullName2=trim(FMName) !! LName; FullName3=trim(FMName) !! ' ' !! LName;run;

FullName1

$ 10

FullName2

$ 20

Sue

SueFarr

...


The trim function3
The TRIM Function

data trim; length FMName LName $ 10; FMName='Sue'; LName='Farr'; FullName1=trim(FMName); FullName2=trim(FMName) !! LName; FullName3=trim(FMName) !! ' ' !! LName;run;

FullName1

$ 10

FullName2$ 20

FullName3$ 21

Sue

SueFarr

Sue Farr


The trim function4
The TRIM Function

  • The TRIM function does not remove leading blanks from a character argument. Use a combination of the TRIM and LEFT functions to remove leading and trailing blanks from a character argument.

  • If FMName contained leading blanks, the following assignment statement would correctly concatenate FMName and LName into FullName.

FullName=trim(left(FMName)) !! ' ' !! LName;


A mailing label application9
A Mailing Label Application

data labels(keep=FullName Address1 Address2); length FMName LName $ 10; set prog2.freqflyers; if substr(right(ID),6)='1' then Title='Ms.'; else if substr(right(ID),6)='2' then Title='Mr.'; FMName=scan(Name,2,','); LName=scan(Name,1,','); FullName=Title !! ' ' !! trim(FMName) !! ' ' !! LName; Address2=scan(Address2,1,',') !! ', ' !! scan(Address2,3,',') !! ' ' !! scan(Address2,4,',');run;


A mailing label application10
A Mailing Label Application

proc print data=labels noobs; var FullName Address1 Address2;run;

PROC PRINT Output

FullName Address1 Address2

Ms. Sue Farr 15 Harvey Rd. Macon, GA 31298

Ms. Kay B. Cox 163 McNeil Pl. Kern, CA 93280

Mr. Ron Mason 442 Glen Ave. Miami, FL 33054

Mr. G. H. Ruth 2491 Brady St. Munger, MI 48747


The catx function
The CATX Function

  • The CATX function concatenates character strings, removes leading and trailing blanks, and inserts separators.

CATX(separator, string-1, … string-n)


A mailing label application11
A Mailing Label Application

data labels(keep=FullName Address1 Address2); length FMName LName $ 10; set prog2.freqflyers; if substr(right(ID),6)='1' then Title = 'Ms.'; else if substr(right(ID),6)='2' then Title = 'Mr.'; FMName = scan(Name,2,','); Lname = scan(Name,1,','); FullName = catx(' ',Title,FMName,LName);

Address2 = catx(' ', scan(Address2,1,',') || ',', scan(Address2,3,','),

scan(Address2,4,','));

run;


A mailing label application12
A Mailing Label Application

proc print data=labels noobs; var FullName Address1 Address2;run;

PROC PRINT Output

FullName Address1 Address2

Ms. Sue Farr 15 Harvey Rd. Macon, GA 31298

Ms. Kay B. Cox 163 McNeil Pl. Kern, CA 93280

Mr. Ron Mason 442 Glen Ave. Miami, FL 33054

Mr. G. H. Ruth 2491 Brady St. Munger, MI 48747


Exercise
Exercise

  • This exercise reinforces the concepts discussed previously.


Exercises

  • The MIT Admissions Office received a list of students with perfect SAT scores. The file must be in a format that Admissions can use. Use the People data set to create a temporary SAS data set named Separate that contains the variables First, MI, and Last to perform the following tasks:

    • Create a First and MI variable that contains each person's first name and middle initial. Do not include in the Separate data set.

    • Use the Separate data set to create a temporary data set called flname that contains the variables NewName and CityState. NewName should be a concatenation of each person's first name and last name with one space between them.

    • Create a list report to view the results.


Exercises – Solution 1

libname prog2 'your-directory';

data separate(drop=FMnames);

length FMnames First $ 30

MI $ 2

Last $ 30;

set prog2.people;

FMnames = left(scan(Name,2,','));

First = scan(FMnames,1,' ');

MI = left(scan(FMnames,2,' '));

Last = scan(Name,1,',');

run;

proc print data=separate;

var Name CityState First MI Last;

run;


Exercises – Solution 2

libname prog2 'your-directory';

data flname(keep=NewName CityState);

length FMname First MI Last $ 30;

set prog2.people;

Last = scan(Name,1,',');

FMname = left(scan(Name,2,','));

First = scan(FMname,1,' ');

MI = scan(FMname,2,' ');

NewName = trim(First) !! ' ' !! Last;

run;

proc print data=flname;

var NewName CityState;

run;


Exercises - Output

The SAS System

Obs NewName CityState

1 LINDSAY DEAN WILMINGTON, NC

2 HELEN-ASHE FLORENTINO WASHINGTON, DC

3 JAN VAN ALLSBURG SHORT HILLS, NJ

4 STANLEY LAFF SPRINGFIELD, IL

5 GEORGE RIZEN CHICAGO, IL

6 MARC MITCHELL CHICAGO, IL

7 DOROTHY MILLS JOE, MT

8 JONATHAN WEBB MORRISVILLE, NC

9 MAYNARD KEENAN SEDONA, AZ

10 PHYLLIS LACK WALTHAM, MA

11 KERRY THOMPSON WINTER PARK, FL

12 DOROTHY COX TIMONIUM, MD

13 DONALD SEPTOFF BOSTON, MA

14 JANICE PHOENIX SOMERVILLE, NJ

15 MURRAY HUNEYCUTT DIME BOX, TX

16 SHERRY ERICKSON EL PASO, TX

17 CLIVE SCHNEIDER CAPE MAY, NJ

18 KIMBERLY PUTNAM DUNWOODY, GA

19 JENNIFER PITTMAN BENNINGTON, VT

20 STACY ROLEN CODY, WY


Exercises

The MIT Admissions Office likes to review applications based on merit and remove as much identifiable material as possible. They choose to label each application folder with an applicants' initials only.

Using the Separate data set that you recently created, create a temporary data set called init that contains only the variables Name and Initials.

The value of Initials should be a concatenation of the first characters from each person's first name, middle initial, and last name with no delimiters separating the characters.


Exercises

data init(drop=FName MName LName FMNames);

length Initials $ 3 LName FMNames FName MName $ 30;

set prog2.People;

LName=scan(Name,1,',');

FMNames=scan(Name,2,',');

FName=scan(FMNames,1,' ');

MName=scan(FMNames,2,' ');

/* Put together just the first letters. */

Initials=substr(FName,1,1) !!

substr(MName,1,1) !!

substr(LName,1,1);

run;

proc print data=init;

var Name CityState Initials;

run;


Exercises

The SAS System

Obs Name CityState Initials

1 DEAN, LINDSAY A. WILMINGTON, NC LAD

2 FLORENTINO, HELEN-ASHE H. WASHINGTON, DC HHF

3 VAN ALLSBURG, JAN F. SHORT HILLS, NJ JFV

4 LAFF, STANLEY X. SPRINGFIELD, IL SXL

5 RIZEN, GEORGE Q. CHICAGO, IL GQR

6 MITCHELL, MARC J. CHICAGO, IL MJM

7 MILLS, DOROTHY E. JOE, MT DEM

8 WEBB, JONATHAN W. MORRISVILLE, NC JWW

9 KEENAN, MAYNARD J. SEDONA, AZ MJK

10 LACK, PHYLLIS M. WALTHAM, MA PML

11 THOMPSON, KERRY L. WINTER PARK, FL KLT

12 COX, DOROTHY E. TIMONIUM, MD DEC

13 SEPTOFF, DONALD E. BOSTON, MA DES

14 PHOENIX, JANICE A. SOMERVILLE, NJ JAP

15 HUNEYCUTT, MURRAY Y. DIME BOX, TX MYH

16 ERICKSON, SHERRY A. EL PASO, TX SAE

17 SCHNEIDER, CLIVE J. CAPE MAY, NJ CJS

18 PUTNAM, KIMBERLY M. DUNWOODY, GA KMP

19 PITTMAN, JENNIFER R. BENNINGTON, VT JRP

20 ROLEN, STACY D. CODY, WY SDR


A search application
A Search Application

  • The ffhistory data set contains information about the history of each frequent flyer.

  • This history information consists of

    • each membership level that the flyer attained (Bronze, Silver, or Gold)

    • the year that the flier attained each level.

  • Create a report that shows all frequent flyers who attained Silver membership status and the year each of them became Silver members.


A search application1
A Search Application

  • ffhistory

  • To determine who attained Silver membership status, search the Status variable for the value Silver.

Seat

ID Status Pref

F31351 Silver 1998,Gold 2000 AISLE

F161 Bronze 1999 WINDOW

F212 Bronze 1992,silver 1995 WINDOW

F25122 Bronze 1994,Gold 1996,Silver 1998 AISLE


The find function
The FIND Function

  • The FIND function searches for a specific substring of characters within a character string that you specify, and returns its location.

  • The FIND function returns

    • the starting position of the first occurrence of value within target, if value is found

    • 0, if value is not found.

Position = FIND(target,value<,modifiers,startpos>);


The find function1
The FIND Function

  • A modifier can be the value of I or T. I indicates that the search is case insensitive. T indicates that the search ignores trailing blanks. They can be combined. If they are omitted, the search is case sensitive and trailing blanks are taken into consideration.

  • The startpos is an integer that specifies the position at which the search should start and the direction of the search.

  • Positive values = forward (right)

  • Negative values= backward (left)

  • If omitted, the search starts at position 1 and moves right.


The find function2

24

Text

$ 34

PosN 8

This target contains a BULL'S-EYE.

24

The FIND Function

  • Determine whether Text contains the string BULL'S-EYE.

Text="This target contains a BULL'S-EYE.";

Pos=find(Text,"BULL'S-EYE");


The find function3

6

The FIND Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=find(Text,'IT'); Pos2=find(Text,' IT '); Pos3=find(Text,'it'); Pos4=find(Text,'it','I');

run;

Pos1

N 8

6

...


The find function4

8

The FIND Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=find(Text,'IT'); Pos2=find(Text,' IT '); Pos3=find(Text,'it'); Pos4=find(Text,'it','I');

run;

Pos1

N 8

Pos2

N 8

6

8

...


The find function5
The FIND Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=find(Text,'IT'); Pos2=find(Text,' IT '); Pos3=find(Text,'it'); Pos4=find(Text,'it','I');

run;

Pos1

N 8

Pos2

N 8

Pos3

N 8

6

8

0

...


The find function6

Pos4

N 8

Pos1

N 8

Pos2

N 8

Pos3

N 8

6

6

8

0

The FIND Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=find(Text,'IT'); Pos2=find(Text,' IT '); Pos3=find(Text,'it'); Pos4=find(Text,'it','I');

run;


A search application2
A Search Application

prog2.ffhistory

Seat

ID Status Pref

F31351 Silver 1998,Gold 2000 AISLE

F161 Bronze 1999 WINDOW

F212 Bronze 1992,silver 1995 WINDOW

F25122 Bronze 1994,Gold 1996,Silver 1998 AISLE

data silver; set prog2.ffhistory; if find(Status,'silver','I') > 0;run;


A search application3
A Search Application

proc print data=silver noobs;run;

PROC PRINT Output

Seat

ID Status Pref

F31351 Silver 1998,Gold 2000 AISLE

F212 Bronze 1992,silver 1995 WINDOW

F25122 Bronze 1994,Gold 1996,Silver 1998 AISLE


The index function
The INDEX Function

  • The INDEX function searches a character argument for the location of a specified character value and returns its location.

  • The INDEX function returns

    • the starting position of the first occurrence of value within target, if value is found

    • 0, if value is not found.

Position=INDEX(target,value);


The index function1
The INDEX Function

  • Target = specifies the character expression to search.

  • Value= specifies the string of characters to search for in the character expression.

  • The search for value is literal. Capitalization and blanks are considered.

  • The INDEX function differs from the FIND function :

    • does not have a modifier

    • does not have startpos functionality


The index function2

24

TEXT

$ 34

PosN 8

This target contains a BULL'S-EYE.

24

The INDEX Function

  • Determine whether Text contains the string BULL'S-EYE.

Text="This target contains a BULL'S-EYE.";

Pos=index(Text,"BULL'S-EYE");


The index function3

6

Pos1

N 8

6

The INDEX Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=index(Text,'IT'); Pos2=index(Text,' IT '); Pos3=index(Text,'it');run;

...


The index function4

8

Pos1

N 8

Pos2

N 8

6

8

The INDEX Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=index(Text,'IT'); Pos2=index(Text,' IT '); Pos3=index(Text,'it');run;

...


The index function5

Pos1

N 8

Pos2

N 8

Pos3

N 8

6

8

0

The INDEX Function

data index; Text='DELIMIT IT WITH BLANKS.'; Pos1=index(Text,'IT'); Pos2=index(Text,' IT '); Pos3=index(Text,'it');run;


The index function6
The INDEX Function

data index2; length String $ 5; String='IT'; Text='DELIMIT IT WITH BLANKS.'; Pos4=index(Text,String); Pos5=index(Text,trim(String)); Pos6=index(Text,' ' !! trim(String) !! ' ');run;

String

$ 5

IT

...


The index function7
The INDEX Function

data index2; length String $ 5; String='IT'; Text='DELIMIT IT WITH BLANKS.'; Pos4=index(Text,String); Pos5=index(Text,trim(String)); Pos6=index(Text,' ' !! trim(String) !! ' ');run;

String

$ 5

Pos4

N 8

IT

0

...


The index function8

6

Pos5

N 8

6

The INDEX Function

data index2; length String $ 5; String='IT'; Text='DELIMIT IT WITH BLANKS.'; Pos4=index(Text,String); Pos5=index(Text,trim(String)); Pos6=index(Text,' ' !! trim(String) !! ' ');run;

String

$ 5

Pos4

N 8

IT

0

...


The index function9

8

Pos5

N 8

Pos6

N 8

6

8

The INDEX Function

data index2; length String $ 5; String='IT'; Text='DELIMIT IT WITH BLANKS.'; Pos4=index(Text,String); Pos5=index(Text,trim(String)); Pos6=index(Text,' ' !! trim(String) !! ' ');run;

String

$ 5

Pos4

N 8

IT

0


A search application4
A Search Application

prog2.ffhistory

Seat

ID Status Pref

F31351 Silver 1998,Gold 2000 AISLE

F161 Bronze 1999 WINDOW

F212 Bronze 1992,silver 1995 WINDOW

F25122 Bronze 1994,Gold 1996,Silver 1998 AISLE

data silver; set prog2.ffhistory; if index(Status,'Silver') > 0;run;


A search application5
A Search Application

proc print data=silver noobs;run;

  • Why was F212 not selected?

PROC PRINT Output

Seat

ID Status Pref

F31351 Silver 1998,Gold 2000 AISLE

F25122 Bronze 1994,Gold 1996,Silver 1998 AISLE


A search application6
A Search Application

proc print data=silver noobs;run;

  • Why was F212 not selected?

  • For F212, silver was stored in lowercase. You searched for Silver.

PROC PRINT Output

Seat

ID Status Pref

F31351 Silver 1998,Gold 2000 AISLE

F25122 Bronze 1994,Gold 1996,Silver 1998 AISLE


The upcase function
The UPCASE Function

  • The UPCASE function

    • converts all letters in its argument to uppercase

    • has no effect on digits and special characters.

NewVal=UPCASE(argument);


A search application7
A Search Application

  • data silver(drop=Location); length Year $ 4; set prog2.ffhistory; Location=index(upcase(Status),'SILVER'); if Location > 0; Year=substr(Status,Location+7,4);run;proc print data=silver noobs; var ID Status Year SeatPref;run;


A search application8

Location

Location+7

A Search Application

PROC PRINT Output

  • Did the values of Status permanently change?

Year=substr(Status,Location+7,4);

Seat

ID Status Year Pref

F31351 Silver 1998,Gold 2000 1998 AISLE

F212 Bronze 1992,silver 1995 1995 WINDOW

F25122 Bronze 1994,Gold 1996,Silver 1998 1998 AISLE


The propcase function
The PROPCASE Function

  • The PROPCASE function converts all words in an argument to proper case, in which the first letter is uppercase and the remaining letters are lowercase.

  • Default delimiters for the PROPCASE function are the blank, forward slash, hyphen, open parenthesis, period, and tab characters.

NewVal=PROPCASE(argument <,delimiter(s)>);


A search application9
A Search Application

data silver(drop=Location); length Year $ 4; set prog2.ffhistory; Status=propcase(Status,' ,'); Location=find(Status,'Silver'); if Location > 0;

SeatPref=propcase(SeatPref);

Year=substr(Status,Location+7,4);run;proc print data=silver noobs; var ID Status Year SeatPref;run;


A search application10
A Search Application

PROC PRINT Output

Seat

ID Status Year Pref

F31351 Silver 1998,Gold 2000 1998 Aisle

F212 Bronze 1992,Silver 1995 1995 Window

F25122 Bronze 1994,Gold 1996,Silver 1998 1998 Aisle


The tranwrd function
The TRANWRD Function

  • The TRANWRD function replaces or removes all occurrences of a given word (or a pattern of characters) within a character string.

  • The TRANWRD function does not remove trailing blanks from target or replacement.

  • source source string that you want translated

  • target string searched for in source

  • replacement string that replaces the target

NewVal=TRANWRD(source,target,replacement);


The tranwrd function1
The TRANWRD Function

  • If the length of the new variable is not previously defined with a LENGTH statement, the default length is 200 bytes.

  • Using the TRANWRD function to replace an existing string with a longer string might cause truncation of the resulting value if a LENGTH statement is not used.


The tranwrd function2

Dessert

$ 20

Dessert

$ 20

Pumpkin pie

Apple pie

The TRANWRD Function

  • Replace the first word of Dessert.

Dessert=tranwrd(Dessert,'Pumpkin','Apple');


A search application11
A Search Application

data silver(drop=Location); length Year $ 4; set prog2.ffhistory; Status=tranwrd(Status,'silver','Silver'); Location=index(Status,'Silver'); if Location > 0; Year=substr(Status,Location+7,4);run;proc print data=silver noobs; var ID Status Year SeatPref;run;


A search application12
A Search Application

PROC PRINT Output

Seat

ID Status Year Pref

F31351 Silver 1998,Gold 2000 1998 AISLE

F212 Bronze 1992,Silver 1995 1995 WINDOW

F25122 Bronze 1994,Gold 1996,Silver 1998 1998 AISLE


The lowcase function
The LOWCASE Function

  • The LOWCASE function

    • converts all letters in its argument to lowercase

    • has no effect on digits and special characters.

NewVal=LOWCASE(argument);


The substr function left side
The SUBSTR Function (Left Side)

  • The SUBSTR function is used to extract or replace characters.

  • This form of the SUBSTR function (left side of the assignment statement) replaces characters in a character variable.

SUBSTR(string,start<,length>)=value;


The substr function left side1
The SUBSTR Function (Left Side)

  • string specifies a character variable

  • start specifies a numeric expression that is the beginning character position

  • length specifies a numeric expression that is the length of the substring to be replaced

  • The length value cannot be larger than the remaining length of string after start (including trailing blanks).

  • If you omit length, SAS uses all the characters on the right side of the assignment statement to replace the values of string up to the limit.


The substr function left side2

Location

$ 18

Location

$ 18

Columbus, OH 43227

Columbus, GA 43227

The SUBSTR Function (Left Side)

  • Replace two characters from Location starting at position 11.

substr(Location,11,2)='OH';


The lowcase function1

SeatPref

$ 6

AISLE

The LOWCASE Function

data silver; set silver;

substr(SeatPref,2)= lowcase(substr(SeatPref,2));run;

...


The lowcase function2

SeatPref

$

6

AISLE

The LOWCASE Function

data silver; set silver;

substr(SeatPref,2)= lowcase(substr(SeatPref,2));run;

isle

...


The lowcase function3

SeatPref

$

6

SeatPref

$

6

AISLE

A

The LOWCASE Function

data silver; set silver;

substr(SeatPref,2)= lowcase(substr(SeatPref,2));run;

isle

isle


A search application13
A Search Application

proc print data=silver noobs; var ID Year SeatPref;run;

PROC PRINT Output

Seat ID Year Pref F31351 1998 Aisle

F212 1995 Window

F25122 1998 Aisle


Exercise1
Exercise

  • This exercise reinforces the concepts discussed previously.


Exercises

The Pizza Company tracks feedback from customers. The person who recorded the data misspelled the word received in the data set Complaint.

Write a program to correct this data mistake. Create a new data set called Complaints2 that contains the corrected data. Create a list report to view the results.


Exercises – Solution

libname apex 'SAS-directory';

data complaints2;

set apex.complaint;

Complaint = tranwrd(Complaint, 'recieve', 'receive');

run;

proc print data=complaints2;

run;



Exercises

  • The SAS Snack Company wants to create categories for snacks. Using the Snacks data set, create a temporary SAS data set named Snack_new.

    • Create a variable called Category. The value will depend on the product and should be one of the following: Chips, Pretzels, Pretzel Sticks, Popcorn, Pork Rinds, Crackers. or Puffs. (Note: Pretzel Sticks should go to the Pretzel Category, as well as other snacks with Pretzel in the name.)

    • Create a variable called TotalSales that is the product of the quantity sold and the price. Format the value with dollar signs and two decimal places.

    • Create a frequency report of Category to make sure that all the rows are mapped correctly.

    • Create a list report to view the results.


Exercises - Solution

data snack_new;

set snacks;

length Category $ 12;

if index(lowcase(Product), 'chip') >0then Category = "Chips";

elseif index(lowcase(Product), 'pretzel') >0then Category = "Pretzels";

elseif index(lowcase(Product), 'stick') >0then Category = "Sticks";

elseif index(lowcase(Product), 'popcorn') >0then Category = "Popcorn";

elseif index(lowcase(Product), 'pork rinds') >0then

Category = "Pork Rinds";

elseif index(lowcase(Product), 'cracker') >0 or

index(lowcase(Product), 'saltine') >0then Category = "Crackers";

elseif index(lowcase(Product), 'puffs') >0then Category = "Puffs";

TotalSales = QtySold*Price;

format TotalSales dollar20.2;

run;

procfreqdata=snack_new;

tables category;

run;


Exercises - Output

The FREQ Procedure

Cumulative Cumulative

Category Frequency Percent Frequency Percent

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Chips 11242 31.43 11242 31.43

Crackers 5110 14.29 16352 45.71

Popcorn 4088 11.43 20440 57.14

Pork Rinds 3066 8.57 23506 65.71

Pretzels 3066 8.57 26572 74.29

Puffs 3066 8.57 29638 82.86

Sticks 6132 17.14 35770 100.00


Exercises

As part of the financial aid process, MIT helps students identify scholarship opportunities from external sources.

Use the People data set to create the temporary Prairie data set.

Use the appropriate function to search through values of CityState to identify only those applicants from Illinois.


Exercises

data prairie;

set prog2.people;

if index(CityState,' IL') > 0;

run;

proc print data=Prairie;

run;

The SAS System

Obs Name CityState

1 LAFF, STANLEY X. SPRINGFIELD, IL

2 RIZEN, GEORGE Q. CHICAGO, IL

3 MITCHELL, MARC J. CHICAGO, IL


Exercises

  • In order to mail the scholarship information to the proper applicants, MIT need the address information in the proper format for a letter.

    • Use the variable Name from Prairie to create a data set called mixedprairie that contains the values of Name.

    • Convert Name from uppercase to mixed case.


Exercises

data mixedprairie;

set prairie;

Name = propcase(Name);

run;

proc print data=mixedprairie;

run;


Section 12 3

Section 12.3

Manipulating Numeric Values


Objectives2
Objectives

  • Use SAS functions to truncate numeric values.

  • Use SAS functions to compute sample statistics of numeric values.


Truncation functions
Truncation Functions

  • Selected functions that truncate numeric values include

    • ROUND function

    • CEIL function

    • FLOOR function

    • INT function.


The round function
The ROUND Function

  • The ROUND function returns a value rounded to the nearest round-off unit.

  • If round-off-unit is not provided, argument is rounded to the nearest integer.

NewVar=ROUND(argument<,round-off-unit>);


The round function1

NewVar1

12

The ROUND Function

data truncate; NewVar1=round(12.12); NewVar2=round(42.65,.1); NewVar3=round(6.478,.01); NewVar4=round(96.47,10); run;

...


The round function2

NewVar1

NewVar2

12

42.7

The ROUND Function

data truncate; NewVar1=round(12.12); NewVar2=round(42.65,.1); NewVar3=round(6.478,.01); NewVar4=round(96.47,10); run;

...


The round function3

NewVar1

NewVar2

NewVar3

12

42.7

6.48

The ROUND Function

data truncate; NewVar1=round(12.12); NewVar2=round(42.65,.1); NewVar3=round(6.478,.01); NewVar4=round(96.47,10); run;

...


The round function4

NewVar1

NewVar2

NewVar3

NewVar4

12

42.7

6.48

100

The ROUND Function

data truncate; NewVar1=round(12.12); NewVar2=round(42.65,.1); NewVar3=round(6.478,.01); NewVar4=round(96.47,10); run;


The ceil function

4

3

5

The CEIL Function

  • The CEIL function returns the smallest integer greater than or equal to the argument.

NewVar=CEIL(argument);

X

4.4

x=ceil(4.4);

x=5;


The floor function

4

3

5

The FLOOR Function

  • The FLOOR function returns the greatest integer less than or equal to the argument.

NewVar=FLOOR(argument);

X

3.6

y=floor(3.6);

y=3;


The int function

4

3

5

X

3.9

The INT Function

  • The INT function returns the integer portion of the argument.

NewVar=INT(argument);

z=int(3.9);

z=3;


Truncation functions1

Var1

NewVar1

6.478

7

Truncation Functions

data truncate; Var1=6.478; NewVar1=ceil(Var1); NewVar2=floor(Var1); NewVar3=int(Var1); run;

...


Truncation functions2

Var1

NewVar1

NewVar2

6.478

7

6

Truncation Functions

data truncate; Var1=6.478; NewVar1=ceil(Var1); NewVar2=floor(Var1); NewVar3=int(Var1); run;

...


Truncation functions3

Var1

NewVar1

NewVar2

NewVar3

6.478

7

6

6

Truncation Functions

data truncate; Var1=6.478; NewVar1=ceil(Var1); NewVar2=floor(Var1); NewVar3=int(Var1); run;


Truncation functions4

Var1

NewVar1

NewVar2

NewVar3

-6.478

-6

-7

-6

Truncation Functions

Use the same functions with a negative value for the variable Var1.

For values greater than 0, the FLOOR and INT functions return the same value. For values less than 0, the CEIL and INT functions return the same value.

data truncate; Var1=-6.478; NewVar1=ceil(Var1); NewVar2=floor(Var1); NewVar3=int(Var1); run;


Functions that compute statistics
Functions That Compute Statistics

  • Selected functions that compute sample statistics based on a group of values include the following:

    • SUM function (total of values)

    • MEAN function (average of values)

    • MIN function (lowest value)

    • MAX function (highest value)

  • These functions

    • accept multiple arguments in any order

    • use the same algorithm as SAS statistical procedures

    • ignore missing values.


Functions that compute statistics1
Functions That Compute Statistics

  • The SUM function adds values together and ignores missing values.

  • The MIN function returns the smallest non-missing value.

  • The MAX function returns the largest value.

NewVar=SUM(argument-1,argument-2,…,argument-n);

MIN(argument-1, argument-2,…, argument-n);

MAX(argument-1, argument-2,…, argument-n);


The sum function

Var1

Var2

Var3

NewVar

12

.

6

18

The SUM Function

data summary; Var1=12; Var2=.; Var3=6; NewVar=sum(Var1,Var2,Var3);run;

...


The sum function1

Var1

Var2

Var3

NewVar

12

.

6

18

The SUM Function

data summary; Var1=12; Var2=.; Var3=6; NewVar=sum(Var1,Var2,Var3);run;

What would be the value of NewVar if an arithmetic operator were used instead of the SUM function?

...


The sum function2

Var1

Var2

Var3

NewVar

12

.

6

18

The SUM Function

data summary; Var1=12; Var2=.; Var3=6; NewVar=sum(Var1,Var2,Var3);run;

What would be the value of NewVar if an arithmetic operator were used instead of the SUM function?

Missing


The mean function
The MEAN Function

  • The MEAN function calculates the arithmetic mean (average) of values and ignores missing values.

NewVar=MEAN(argument-1,argument-2,…,argument-n);


The mean function1

Var1

Var2

Var3

NewVar

12

.

6

9

The MEAN Function

data summary; Var1=12; Var2=.; Var3=6; NewVar=mean(Var1,Var2,Var3);run;


Exercise2
Exercise

  • This exercise reinforces the concepts discussed previously.


Exercises

Final grades are coming.

Use the data set Grade to create a data set named Final. The Final data set should contain a variable named Overall that is the semester average grade.

Calculate Overall by averaging all the tests plus the final. The final is weighted twice as much as any of the other tests. (Count the final twice when calculating Overall.)

Round Overall to the nearest integer.


Exercises

data final;

set prog.grade;

Overall=round(mean(Test1,Test2,Test3,Final,Final));

run;

proc print data=final;

run;

Alternate Solution

data final;

set prog.grade;

Overall=round(mean(of Test1-Test3,Final,Final));

proc print data=final;

run;


Exercises

The SAS System

SSN Course Test1 Test2 Test3 Final Overall

012-40-4928 BUS450 80 70 80 80 78

012-83-3816 BUS450 90 90 60 80 80

341-44-0781 MATH400 78 87 90 91 87

423-01-7721 BUS450 80 70 75 95 83

448-23-8111 MATH400 88 91 100 95 94

723-14-8422 HIST100 88 90 91 95 92

819-32-1294 HIST100 67 80 60 70 69

831-34-2411 MATH400 72 76 82 79 78

837-33-8374 HIST100 90 99 87 96 94

877-22-7731 MATH400 87 85 80 78 82

880-90-0783 HIST400 50 70 78 80 72

920-22-0209 MATH400 79 87 81 82 82

973-34-2119 BUS450 80 75 88 90 85

877-22-7731 SCI400 80 70 80 80 78

012-40-4928 FRENCH100 80 70 80 80 78

819-32-1294 FRENCH100 67 80 60 70 69

819-32-1294 BUS450 67 80 60 70 69

723-14-8422 SCI400 79 87 81 82 82

837-33-8374 SCI400 79 87 81 82 82


Exercises

Final grades are coming.

Modify the DATA step in the previous exercise so that the value of Overall is the average of the two highest test scores and the final. (The lowest test should not be used to calculate Overall.)

As before, the final exam should be counted twice.

Round Overall to the nearest integer.


Exercises

data final(drop=OverallTotal);

set prog2.grade;

OverallTotal=sum(Test1,Test2,Test3,Final,Final)-

min(Test1,Test2,Test3);

Overall=round(OverallTotal/4);

run;

proc print data=final;

run;

Alternate Solution

data final(drop=OverallTotal);

set prog2.grade;

OverallTotal=sum(of Test1-Test3,Final,Final)-

min(of Test1-Test3);

Overall=round(OverallTotal/4);

proc print data=final;

run;


Exercises

The SAS System

Obs SSN Course Test1 Test2 Test3 Final Overall

1 012-40-4928 BUS450 80 70 80 80 80

2 012-83-3816 BUS450 90 90 60 80 85

3 341-44-0781 MATH400 78 87 90 91 90

4 423-01-7721 BUS450 80 70 75 95 86

5 448-23-8111 MATH400 88 91 100 95 95

6 723-14-8422 HIST100 88 90 91 95 93

7 819-32-1294 HIST100 67 80 60 70 72

8 831-34-2411 MATH400 72 76 82 79 79

9 837-33-8374 HIST100 90 99 87 96 95

10 877-22-7731 MATH400 87 85 80 78 82

11 880-90-0783 HIST400 50 70 78 80 77

12 920-22-0209 MATH400 79 87 81 82 83

13 973-34-2119 BUS450 80 75 88 90 87

14 877-22-7731 SCI400 80 70 80 80 80

15 012-40-4928 FRENCH100 80 70 80 80 80

16 819-32-1294 FRENCH100 67 80 60 70 72

17 819-32-1294 BUS450 67 80 60 70 72

18 723-14-8422 SCI400 79 87 81 82 83

19 837-33-8374 SCI400 79 87 81 82 83


Section 12 4

Section 12.4

Manipulating Numeric Values Based on Dates


Objectives3
Objectives

  • Review SAS functions used to create SAS date values.

  • Review SAS functions used to extract information from SAS date values.

  • Use SAS functions to determine the intervals between two SAS date values.


Creating sas date values
Creating SAS Date Values

  • You can use the MDY or TODAY functions to create SAS date values.

  • The MDY function creates a SAS date value from month, day, and year values.

  • The TODAY function returns the current date as a SAS date value.

NewDate=MDY(month,day,year);

NewDate=TODAY();


Extracting information
Extracting Information

  • You can use the MONTH, DAY, and YEAR functions to extract information from SAS date values.

  • The MONTH function creates a numeric value (1-12) that represents the month of a SAS date value.

NewMonth=MONTH(SAS-date-value);

continued...


Extracting information1
Extracting Information

  • The DAY function creates a numeric value (1-31) that represents the day of a SAS date value.

  • The YEAR function creates a four-digit numeric value that represents the year.

NewDay=DAY(SAS-date-value);

NewYear=YEAR(SAS-date-value);

continued...


Calculating an interval of years
Calculating an Interval of Years

  • The YRDIF function returns the number of years between two SAS date values.

NewVal=YRDIF(sdate,edate,basis);

continued...



The yrdif function
The YRDIF Function

  • The variable DOB represents a person's date of birth. Assume today's date is May 3, 2008, and DOB is November 8, 1972. What is this person's age?

  • The DATDIF function can be used to return the number of days between two SAS date values.

  • Only two basis values are valid for the DATDIF function:

  • 'ACT/ACT' and '30/360')

MyVal=yrdif(DOB,'3may2008'd,'act/act');


The yrdif function1

MyVal

35.4836

The YRDIF Function

  • The variable DOB represents a person's date of birth. Assume today's date is May 3, 2008, and DOB is November 8, 1972. What is this person's age?

MyVal=yrdif(DOB,'3may2008'd,'act/act');

...


The yrdif function2

MyVal

35.4836

The YRDIF Function

  • The variable DOB represents a person's date of birth. Assume today's date is May 3, 2008, and DOB is November 8, 1972. What is this person's age?

  • How can you alter this program to

    • compute each employee's age based on today's date

    • truncate all of the decimal places without rounding?

MyVal=yrdif(DOB,'3may2008'd,'act/act');

...


The yrdif function3

MyVal

35

The YRDIF Function

  • How can you alter this program to

    • compute each employee's age based on today's date

    • truncate all of the decimal places without rounding?

  • This code was run on July 13, 2008. Your values will differ.

int(yrdif(DOB, today(),'act/act'));


Exercise3
Exercise

  • This exercise reinforces the concepts discussed previously.


Exercises

  • In order to vote in most states, voters must be 18 years of age by the date of the election.

    • Use the Register data set to create two new data sets called Voters and NonVoters.

    • Use the existing BirthMonth, Day, and BirthYear to create a new variable called Birthday that stores the SAS date value for each voter's birth date.

    • Create a second new variable called Age that stores the number of years between each voter's birthday and today.

    • The values of Birthday should be displayed with the DATE9. format. The value of Age should be truncated to remove all decimals without rounding.

    • Produce two listing reports with appropriate titles.


Exercises

data voters nonvoters;

keep StudentId Name Birthday Age;

set prog.register;

Birthday=mdy(BirthMonth,Day,BirthYear);

/* The FLOOR function could be used in the following

assignment statement. */

Age=int(yrdif(Birthday,today(),'act/act'));

format Birthday date9.;

if Age >= 18 then output voters;

else output nonvoters;

run;

title 'Students who are over 18 and can vote';

proc print data=voters;

run;

title 'Students who are not over 18 and cannot vote';

proc print data=nonvoters;

run;


Exercises

Partial Listing of Voters

Students who are over 18 and can vote

Student

Obs ID Name Birthday Age

1 1155 Angel Reisman 23JUN1987 19

2 1266 Melanie Michaels 17MAR1988 18

3 2055 Faith Sadowski 06FEB1988 18

4 2561 Dorothy Gilbert 16APR1988 18

5 2584 Patrice Ray 18AUG1988 18

6 2587 Jeremiah Ashford 26SEP1987 19

7 2600 Alisha Gurman 21DEC1987 18

8 2606 Gustavo Spencer 16SEP1988 18

9 2681 Ryan Lin 03MAR1988 18

10 3213 Thomas Gladwell 09JUN1988 18

11 3250 Misty Orlowski 22OCT1987 19

12 3456 Ruby Abdul 09JUN1988 18

Your output will contain a different number of rows.


Exercises

Partial Listing of NonVoters

Students who are not over 18 and cannot vote

Student

Obs ID Name Birthday Age

1 1005 Chaz Richardson 21JUN1989 17

2 1154 Barbara Muir 04APR1990 16

3 1245 Leticia Ritter 27MAR1990 16

4 1257 Richard Calle 01SEP1989 17

5 1258 Ronnie Trimpin 03OCT1989 17

6 2001 Troy Pruska 05AUG1989 17

7 2006 Annie Ritter 25FEB1990 16

8 2046 Derrick Ikiz 05MAR1989 17

9 2334 Jesse Liu 03AUG1991 15

10 2335 Taylor Lowet 07AUG1989 17

. . .

Your output will contain a different number of rows.


Section 12 5

Section 12.5

Converting Variable Type


Objectives4
Objectives

  • Describe the automatic conversion of character data into numeric data.

  • Explicitly convert character data into numeric data.

  • Describe the automatic conversion of numeric data into character data.

  • Explicitly convert numeric data into character data.


Data conversion
Data Conversion

  • In many applications, you might need to convert one data type to another.

    • You might need to read digits in character form into a numeric value.

    • You might need to write a numeric value to a character string.


Data conversion1
Data Conversion

  • You can convert data types by using one of the following methods:

    • implicitly by enabling SAS to do it for you

    • explicitly with these functions:

      • INPUT character-to-numeric conversion

      • PUT numeric-to-character conversion


Automatic character to numeric conversion
Automatic Character-to-Numeric Conversion

  • The prog2.salary1 data set contains a character variable GrossPay. Compute a 10-percent bonus for each employee.

  • What will happen when the character values of GrossPay are used in an arithmetic expression?


Automatic character to numeric conversion1
Automatic Character-to-Numeric Conversion

  • prog2.salary1

ID GrossPay

$11 $5

201-92-2498 52000 482-87-7945 32000 330-40-7172 49000

data bonuses; set prog2.salary1; Bonus=.10*GrossPay;run;


Automatic character to numeric conversion2
Automatic Character-to-Numeric Conversion

  • Partial Log

2 data bonuses;

3 set prog2.salary1;

4 Bonus=.10*GrossPay;

5 run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).

4:14

NOTE: The data set WORK.BONUSES has 3 observations and 3 variables.


Automatic character to numeric conversion3
Automatic Character-to-Numeric Conversion

  • proc print data=bonuses noobs; run;

PROC PRINT Output

ID GrossPay Bonus

201-92-2498 52000 5200

482-87-7945 32000 3200

330-40-7172 49000 4900


Automatic character to numeric conversion4
Automatic Character-to-Numeric Conversion

  • SAS automatically converts a character value to a numeric value when the character value is used in a numeric context, such as the following:

    • assignment to a numeric variable

    • an arithmetic operation

    • logical comparison with a numeric value

    • a function that takes numeric arguments


Automatic character to numeric conversion5
Automatic Character-to-Numeric Conversion

  • The automatic conversion

    • uses the w. informat

    • produces a numeric missing value from a character value that does not conform to standard numeric notation (digits with optional decimal point and/or leading sign and/or E-notation).


Automatic character to numeric conversion6

52000

52000

-8.96

-8.96

1.243E1

12.43

1,742.64

.

Automatic Character-to-Numeric Conversion

Character

value

Numeric

value

Automatic conversion


The input function
The INPUT Function

  • The INPUT function is used primarily for converting character values to numeric values.

  • The INPUT function returns the value produced when source is read with informat.

NumVar=INPUT(source,informat);


The input function1
The INPUT Function

  • data conversion; CVar1='32000'; CVar2='32,000'; CVar3='03may2008'; CVar4='050308'; NVar1=input(CVar1,5.); NVar2=input(CVar2,comma6.); NVar3=input(CVar3,date9.); NVar4=input(CVar4,mmddyy6.);run;


The input function2
The INPUT Function

  • Partial PROC CONTENTS Output

proc contents data=conversion; run;

----Alphabetic List of Variables and Attributes----

# Variable Type Len

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1 CVar1 Char 5

2 CVar2 Char 6

3 CVar3 Char 9

4 CVar4 Char 6

5 NVar1 Num 8

6 NVar2 Num 8

7 NVar3 Num 8

8 NVar4 Num 8


The input function3
The INPUT Function

  • proc print data=conversion noobs; run;

PROC PRINT Output

CVar1 CVar2 CVar3 CVar4 NVar1

32000 32,000 03may2008 050308 32000

NVar2 NVar3 NVar4

32000 17655 17655


Explicit character to numeric conversion
Explicit Character-to-Numeric Conversion

  • The values of the variable GrossPay in the SAS data set prog2.salary2 contain commas. Attempt to use automatic conversion to compute a 10-percent bonus.

  • prog2.salary2

ID GrossPay

$11 $6

201-92-2498 52,000 482-87-7945 32,000 330-40-7172 49,000


Explicit character to numeric conversion1
Explicit Character-to-Numeric Conversion

  • data bonuses; set prog2.salary2; Bonus=.10*GrossPay;run;proc print data=bonuses;run;

PROC PRINT Output

ID GrossPay Bonus

201-92-2498 52,000 .

482-87-7945 32,000 .

330-40-7172 49,000 .


Explicit character to numeric conversion2
Explicit Character-to-Numeric Conversion

data bonuses; set prog2.salary2; Bonus=.10*input(GrossPay,comma6.);run;proc print data=bonuses;run;

PROC PRINT Output

ID GrossPay Bonus

201-92-2498 52,000 5200

482-87-7945 32,000 3200

330-40-7172 49,000 4900

c05s5d2.sas


Data conversion2
Data Conversion

  • proc contents data=bonuses; run;

Partial PROC CONTENTS Output

----Alphabetic List of Variables and Attributes----

# Variable Type Len

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

3 Bonus Num 8

2 GrossPay Char 6

1 ID Char 11

How can you convert GrossPay to a numeric variable with the same name?


Data conversion3

This assignment statement does not change GrossPay from a character variableto a numeric variable.

Data Conversion

  • You cannot convert data by assigning the converted variable value to a variable with the same name.

GrossPay=input(GrossPay,comma6.);


Data conversion4

A variable is character or numeric. After the variable type is established, it cannot be changed.

Data Conversion

  • On the left side of the assignment statement, you want GrossPay to be numeric. However, on the right side of the assignment statement, GrossPay is character.

GrossPay=input(GrossPay,comma6.);


Data conversion5
Data Conversion

  • First, use the RENAME= data set option to rename the variable that you want to convert.

  • To rename more than one variable from the same data set, separate with a space the variables to rename.

SAS-data-set(RENAME=(old-name=new-name))

data bonuses; set prog2.salary2(rename=(GrossPay= CharGross));additional SAS statementsrun;


Data conversion6
Data Conversion

  • Second, use the INPUT function in an assignment statement to create a new variable whose name is the original name of the variable you renamed previously.

data bonuses; set prog2.salary2(rename=(GrossPay= CharGross)); GrossPay=input(CharGross,comma6.); additional SAS statementsrun;


Data conversion7
Data Conversion

  • Third, use a DROP= data set option in the DATA statement to exclude the original variable from the output SAS data set.

data bonuses(drop=CharGross); set prog2.salary2(rename=(GrossPay= CharGross)); GrossPay=input(CharGross,comma6.); Bonus=.10*GrossPay;run;


Data conversion compilation

PDV

ID

$ 4

CharGross

$ 6

Data Conversion: Compilation

data bonuses(drop=CharGross); set prog2.salary2(rename=(GrossPay= CharGross)); GrossPay=input(CharGross,comma6.); Bonus=.10*GrossPay;run;

...


Data conversion compilation1

PDV

ID

$ 4

CharGross

$ 6

Data Conversion: Compilation

data bonuses(drop=CharGross); set prog2.salary2(rename=(GrossPay= CharGross)); GrossPay=input(CharGross,comma6.); Bonus=.10*GrossPay;run;

GrossPay

N 8

...


Data conversion compilation2
Data Conversion: Compilation

data bonuses(drop=CharGross); set prog2.salary2(rename=(GrossPay= CharGross)); GrossPay=input(CharGross,comma6.); Bonus=.10*GrossPay;run;

PDV

ID

$ 4

CharGross

$ 6

GrossPay

N 8

Bonus

N 8

...


Data conversion compilation3

PDV

ID

$ 4

CharGross

$ 6

GrossPay

N 8

Bonus

N 8

D

Data Conversion: Compilation

data bonuses(drop=CharGross); set prog2.salary2(rename=(GrossPay= CharGross)); GrossPay=input(CharGross,comma6.); Bonus=.10*GrossPay;run;


Converting character dates to sas dates
Converting Character Dates to SAS Dates

  • prog2.born

    How can you alter this program to compute each person's age based on today's date?

Name Date

$12 $7

Ruth, G. H. 13apr72 Delgado, Ed 25aug68 Overby, Phil 08jun71

data birth(drop=Date); set prog2.born; Birthday=input(Date,date7.);

Age=int(yrdif(Birthday,'3may2008'd, 'ACT/ACT'));run;


Converting character dates to sas dates1
Converting Character Dates to SAS Dates

  • prog2.born

    Use the TODAY() function.

Name Date

$12 $7

Ruth, G. H. 13apr72 Delgado, Ed 25aug68 Overby, Phil 08jun71

data birth(drop=Date); set prog2.born; Birthday=input(Date,date7.);

Age=int(yrdif(Birthday, today(), 'ACT/ACT'));run;


Converting character dates to sas dates2
Converting Character Dates to SAS Dates

  • proc print data=birth noobs;run;

PROC PRINT Output

Name Birthday Age

Ruth, G. H. 4486 36

Delgado, Ed 3159 39

Overby, Phil 4176 36


Automatic numeric to character conversion
Automatic Numeric-to-Character Conversion

  • The prog2.phones data set contains a numeric variable Code (area code) and a character variable Telephone (telephone number). Create a character variable that contains the area code in parentheses followed by the telephone number.


Automatic numeric to character conversion1
Automatic Numeric-to-Character Conversion

  • prog2.phones

Code Telephone

N8 $8

303 393-0956

919 770-8292 301 449-5239

data phonenumbers; set prog2.phones; Phone='(' !! Code !! ') ' !! Telephone;run;


Automatic numeric to character conversion2
Automatic Numeric-to-Character Conversion

  • Partial Log

13 data phonenumbers;

14 set prog2.phones;

15 Phone='(' !! Code !! ') ' !! Telephone;

16 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

15:17

NOTE: The data set WORK.PHONENUMBERS has 3 observations and 3 variables.


Automatic numeric to character conversion3
Automatic Numeric-to-Character Conversion

  • proc print data=phonenumbers noobs; run;

PROC PRINT Output

Code Telephone Phone

303 393-0956 ( 303) 393-0956

919 770-8292 ( 919) 770-8292

301 449-5239 ( 301) 449-5239


Automatic numeric to character conversion4
Automatic Numeric-to-Character Conversion

  • SAS automatically converts a numeric value to a character value when the numeric value is used in a character context, such as

    • assignment to a character variable

    • a concatenation operation

    • a function that accepts character arguments.


Automatic numeric to character conversion5

Numeric

value:8 bytes

Character

value:

12 bytes

303

303

9 leading

blanks

Automatic Numeric-to-Character Conversion

  • The automatic conversion

    • uses the BEST12. format

    • right-aligns the resulting character value.

Automatic conversion


Automatic numeric to character conversion6

9 leading

blanks

Automatic Numeric-to-Character Conversion

data phonenumbers; set prog2.phones; Phone='(' !! Code !! ') ' !! Telephone;run;

Phone

$23

( 303) 393-0956


The put function
The PUT Function

  • The PUT function writes values with a specific format.

  • The PUT function returns the value produced when source is written with format.

CharVar=PUT(source,format);


The put function1
The PUT Function

  • data conversion; NVar1=614; NVar2=55000; NVar3=366; CVar1=put(NVar1,3.); CVar2=put(NVar2,dollar7.); CVar3=put(NVar3,date9.); run;


The put function2
The PUT Function

The VARNUM option in the PROC CONTENTS statement prints a list of the variables by their logical position in the data set.

  • Partial PROC CONTENTS Output

proc contents data=conversion varnum;run;

-----Variables Ordered by Position-----

# Variable Type Len

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1 NVar1 Num 8

2 NVar2 Num 8

3 NVar3 Num 8

4 CVar1 Char 3

5 CVar2 Char 7

6 CVar3 Char 9


The put function3
The PUT Function

  • proc print data=conversion noobs; run;

PROC PRINT Output

NVar1 NVar2 NVar3 CVar1 CVar2 CVar3

614 55000 366 614 $55,000 01JAN1961


Explicit numeric to character conversion
Explicit Numeric-to-Character Conversion

  • data phonenumbers; set prog2.phones; Phone='(' !! put(Code,3.) !! ') ' !! Telephone;run;

Partial Log

20 data phonenumbers;

21 set prog2.phone;

22 Phone='(' !! put(Code,3.) !! ') ' !! Telephone;

23 run;

NOTE: The data set WORK.PHONENUMBERS has 3 observations and 3 variables.


Automatic numeric to character conversion7
Automatic Numeric-to-Character Conversion

  • proc print data=phonenumbers noobs; run;

PROC PRINT Output

Code Telephone Phone

303 393-0956 (303) 393-0956

919 770-8292 (919) 770-8292

301 449-5239 (301) 449-5239


Exercise4
Exercise

  • This exercise reinforces the concepts discussed previously.


Exercises

Use the data set prog.students to create a new data set named students. Create a new character variable Telephone that has this pattern xxx-xxxx, where XXXXXXX is the value of Number.

Recall the previous program and alter it to create a new numeric variable Birthday from the DOB variable. Birthday should contain the MMDDYY10. format for SAS data values.

When you are confident that both variables were converted correctly, use a DROP= or KEEP= data set option to ensure that the only variables in the students data set are SSN, Telephone, and Birthday.


Exercises

libname prog 'SAS-directory';

data students(drop=Number DOB);

set prog.students;

/* The PUT function is used to convert NUMBER from numeric to character. The resulting character value is manipulated with

the SUBSTR function to extract the first 3 characters then the last 4 characters. */

Telephone=substr(put(Number,7.),1,3) || '-' ||

substr(put(Number,7.),4);

/* The INPUT function is used to convert DOB from character to numeric. The character values are in the form ddMMMyyyy, so the DATE9. informat is used in the conversion. */

Birthday=input(DOB,date9.);

format Birthday mmddyy10.;

run;

proc print data=students;

run;


Exercises

The SAS System

SSN Telephone Birthday

012-40-4928 546-7887 12/05/1968

012-83-3816 688-8321 05/03/1965

341-44-0781 941-8123 11/23/1972

423-01-7721 783-9191 06/28/1967

448-23-8111 942-8122 11/30/1960

723-14-8422 828-0911 02/12/1964

819-32-1294 387-8181 09/01/1968

831-34-2411 967-7810 12/24/1972

837-33-8374 992-7615 10/06/1971

877-22-7731 233-7449 07/08/1969


Exercises

  • Use the data set agents to create a data set called work.agents2. Create a variable called TrueLocation that puts the City and State together, separated by a comma. (Desired output is shown on the next slide.)

  • Things to think about:

    • Do you need to do this for all observations?

    • If your output looks unexpected, try using the $QUOTE. format. (You do not need to specify a width.)


Exercises

Work.Agents Output

TrueLocation CityCountry State

Auckland, New Zealand Auckland, New Zealand

Amsterdam, Netherlands Amsterdam, Netherlands

Anchorage, Alaska Anchorage, USA Alaska

Canberra, Australia Canberra, Australia Australian Capital

Athens (Athinai), Greece Athens (Athinai), Greece

Birmingham, Alabama Birmingham, USA Alabama

Bangkok, Thailand Bangkok, Thailand

Nashville, Tennessee Nashville, USA Tennessee

Boston, Massachusetts Boston, USA Massachusetts

Kansas City, Missouri Kansas City, USA Missouri


Exercises

libname prog2 'SAS-directory';

data work.agents2 (drop=country);

set prog2.agents;

length Country $ 20 TrueLocation $ 40;

Country=left(scan(CityCountry,2,','));

if Country='USA' then /* Note the space before USA */

TrueLocation = scan(CityCountry,1,',')

!! ', ' !! State;

else /* not USA */

TrueLocation = CityCountry;

run;

proc print data=work.agents2 noobs;

var TrueLocation CityCountry State;

run;


ad