chapter 14 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Chapter 14 PowerPoint Presentation
Download Presentation
Chapter 14

Loading in 2 Seconds...

play fullscreen
1 / 243

Chapter 14 - PowerPoint PPT Presentation


  • 363 Views
  • Uploaded on

Chapter 14. Combining SAS ® Data Sets. Section 14.1. Concatenating SAS Data Sets. Objectives. Define concatenation. Use the SET statement in a DATA step to concatenate two or more SAS data sets. Use the RENAME= data set option to change the names of variables.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Chapter 14' - hailey


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
chapter 14

Chapter 14

Combining SAS® Data Sets

section 14 1

Section 14.1

Concatenating SAS

Data Sets

objectives
Objectives
  • Define concatenation.
  • Use the SET statement in a DATA step to concatenate two or more SAS data sets.
  • Use the RENAME= data set option to change the names of variables.
  • Use the SET and BY statements in a DATA step to interleave two or more SAS data sets.
combining sas data sets

Data Set 1

Data Set 2

Data Set 3

Combining SAS Data Sets
  • In this chapter, you will look at combining data sets in two ways.

Joining multiple data sets side by side (merging)

Stacking multiple data sets

(concatenation)

Data Set 1

Data Set 2

Data Set 3

Data

Set 3

Data Set 3

Rows might or might not have matches in each data set.

combining sas data sets1

Data Set 1

Data Set 2

Data Set 3

Combining SAS Data Sets
  • In this chapter, you will look at combining data sets in two ways, concatenation and merging.

Stacking multiple data sets

(concatenation)

Data

Set 3

concatenating sas data sets
Concatenating SAS Data Sets
  • Use the SET statement in a DATA step to concatenate SAS data sets.
  • General form of a DATA step concatenation:
  • Data set names are separated by a space in the SET statement.

DATA SAS-data-set; SET SAS-data-set1 SAS-data-set2 . . .;

<other SAS statements>RUN;

concatenating sas data sets1

jan

feb

mar

Concatenating SAS Data Sets
  • You can read any number of SAS data sets with a single SET statement.

SAS data sets

work.qtr1

jan

data work.qtr1;

set work.janwork.febwork.mar;

run;

feb

mar

...

business task
Business Task
  • Two SAS data sets, na1 and na2, contain data for newly hired navigators.
  • Concatenate the data sets into a new data set named newhires.
  • The data sets contain the same variables. You will see what happens when there are different variables.

na1

na2

concatenating sas data sets compilation

na1

na2

PDV

data newhires;

set na1 na2;

run;

Concatenating SAS Data Sets: Compilation

...

concatenating sas data sets compilation1

na1

na2

PDV

data newhires;

set na1 na2;

run;

No additional

variables

Concatenating SAS Data Sets: Compilation

...

concatenating sas data sets compilation2
Concatenating SAS Data Sets: Compilation

na1

na2

PDV

data newhires;

set na1 na2;

run;

newhires

...

concatenating sas data sets execution
Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

newhires

Initialize PDV

to missing

...

concatenating sas data sets execution1
Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

setna1 na2;

run;

newhires

...

concatenating sas data sets execution2
Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

newhires

concatenating sas data sets execution3

Automatic return

Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

Automatic output

newhires

...

concatenating sas data sets execution4

Automatic return

Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

setna1 na2;

run;

Automatic output

newhires

...

concatenating sas data sets execution5

Automatic return

Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

setna1 na2;

run;

Automatic output

newhires

...

concatenating sas data sets execution6
Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

newhires

End of

File

...

concatenating sas data sets execution7
Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

Initialize PDV to missing

newhires

When SAS moves to a new data set, the values in the PDV are reinitialized to missing.

...

concatenating sas data sets execution8

Automatic output

Automatic return

Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

newhires

...

concatenating sas data sets execution9

Automatic output

Automatic return

Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

newhires

...

concatenating sas data sets execution10
Concatenating SAS Data Sets: Execution

na1

na2

PDV

data newhires;

set na1 na2;

run;

End of File

newhires

...

concatenating sas data sets execution11
Concatenating SAS Data Sets: Execution

When SAS reaches the end of the file on the last data set, DATA step execution ends.

What if the data sets contain different variables?

newhires

business task1
Business Task
  • Two SAS data sets, fa1 and fa3, contain data for newly hired flight attendants.
  • Concatenate the data sets into a new data set named newfa.
  • The data sets contain similar data, but each has one unique variable (JobCode versus Country).

fa1

fa3

concatenating sas data sets compilation3
Concatenating SAS Data Sets: Compilation

fa3

fa1

PDV

data newfa;

set fa1 fa3;

run;

...

concatenating sas data sets compilation4
Concatenating SAS Data Sets: Compilation

fa3

fa1

PDV

data newfa;

set fa1 fa3;

run;

Add additional

variable.

...

concatenating sas data sets compilation5
Concatenating SAS Data Sets: Compilation

fa3

fa1

PDV

data newfa;

set fa1 fa3;

run;

newfa

...

concatenating sas data sets execution12
Concatenating SAS Data Sets: Execution

fa3

fa1

PDV

data newfa;

set fa1 fa3;

run;

newfa

...

concatenating sas data sets execution13

Remember

Concatenating SAS Data Sets: Execution
  • When SAS moves from one data set to another, the values in the PDV are reinitialized to missing.
concatenating sas data sets execution14
Concatenating SAS Data Sets: Execution

If SAS did not re-initialize the PDV when it moved from one data set to another, then JobCode would contain previousdata for the rows from fa2.

fa2

fa1

newfa

...

business task2
Business Task
  • Two SAS data sets, fa1 and fa2, contain data for newly hired flight attendants.
  • Concatenate the data sets into a new data set named newfa.

fa1

fa2

The data sets contain similar data, but the column names are not the same.

concatenating sas data sets execution15
Concatenating SAS Data Sets: Execution

fa2

fa1

JobCode and JCode represent the same data and should be in one column.

data newfa;

set fa1 fa2;

run;

newfa

...

the rename data set option
The RENAME= Data Set Option
  • You can use a RENAME= data set option to change the name of a variable.
  • General form of the RENAME= data set option:

SAS-data-set(RENAME=(old-name-1=new-name-1

old-name-2=new-name-2.

.

.

old-name-n=new-name-n))

the rename data set option1
The RENAME= Data Set Option

fa1

fa2

There are two sets of parentheses that are required. The outer set is for the data set options, such as RENAME= and KEEP= for fa2. They appear inside the outer set of parentheses.

data newfa;

set fa1fa2(rename=(JCode=JobCode));

run;

the rename data set option2
The RENAME= Data Set Option

fa1

fa2

The inner parentheses surround all the variables that you want to rename. Separate each variable to rename with a space. For example, rename Gender to Sex in fa2.

data newfa;

set fa1fa2(rename=(JCode=JobCode Gender=Sex));

run;

the rename data set option3
The RENAME= Data Set Option

fa1

fa2

The equals sign is part of the RENAME= option.

data newfa;

set fa1fa2(rename=(JCode=JobCode));

run;

New Name

Old Name

the rename data set option4
The RENAME= Data Set Option

fa1

fa2

data newfa;

set fa1 fa2(rename=(JCode=JobCode));

run;

PDV

...

the rename data set option5
The RENAME= Data Set Option

fa1

fa2

data newfa;

set fa1 fa2(rename=(JCode=JobCode));

run;

PDV

SAS now knows JCode to be JobCode, which was already created in the PDV.

Reference JobCode, not JCode, in your DATA step.

...

the rename data set option6
The RENAME= Data Set Option

fa1

fa2

data newfa;

set fa1 fa2(rename=(JCode=JobCode));

run;

newfa

What if you want JCode instead of JobCode?

How would you modify the program?

...

the rename data set option7
The RENAME= Data Set Option

fa1

fa2

data newfa;

set fa1(rename=(JobCode=JCode)) fa2;

run;

newfa

The RENAME= option is tangent to the data set that you want to change.

...

exercise
Exercise
  • This exercise reinforces the concepts discussed previously.
slide42

Exercises

Principal Skinner has seven data sets containing student elective choices, one for each guidance counselor. She wants one data set, StudentElectives, that contains each row from all seven data sets.

The data sets to read from are CraigStudents, LopezStudents, ShahStudents, LiStudents, RobbinsStudents, PerryStudents, and GreeneStudents.

The data set needs to be in sorted order by student ID number and should contain StudentID,Name,and Choice1 columns.

Produce a listing report with an appropriate title. Produce a frequency report analyzing Choice1 with an appropriate title. (Suppress page numbers and the date and time, as well as cumulative statistics.)

Save your program as skinner.sas to be used later.

slide43

Exercises

data StudentElectives;

set prog2.CraigStudents prog2.LopezStudents

prog2.ShahStudents prog2.LiStudents

prog2.RobbinsStudents prog2.PerryStudents

prog2.GreeneStudents;

drop Choice2 Choice3;

*Keep StudentID Name Choice1;

run;

proc sort data= StudentElectives;

by StudentID;

run;

options nodate nonumber;

title 'Student First Choice Electives';

proc print data= StudentElectives;

run;

title 'Frequency Counts for First Choice Electives';

proc freq data= StudentElectives;

tables Choice1 /nocum;

run;

slide44

Exercises

Partial PROC PRINT Output

Student First Choice Electives

Student

Obs ID Name Choice1

1 1005 Chaz Richardson SAT Verbal/Math Prep

2 1154 Barbara Muir Conversations in Diversity

3 1155 Angel Reisman Advanced Shop

4 1245 Leticia Ritter African American Literature

5 1257 Richard Calle Astronomy

6 1258 Ronnie Trimpin Advanced Functions and Modeling

.

.

.

108 9940 Moises Curbelo Astronomy

109 9942 Brooke Sears Creative Writing

110 9961 Darnell Lowrance Marine Ecology

111 9962 Jamie Rowan Advanced Shop

112 9995 Kelvin Garcia Visual Arts

113 9997 Tiffany Pierre Visual Arts

114 9999 Randy Leifer Sculpture and Ceramics

slide45

Exercises

PROC FREQ Output

Frequency Counts for First Choice Electives

The FREQ Procedure

Choice1 Frequency Percent

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Advanced Functions and Modeling 5 4.39

Advanced Shop 7 6.14

African American Literature 2 1.75

Anatomy and Physiology 2 1.75

Astronomy 9 7.89

Band 7 6.14

Conversations in Diversity 6 5.26

Creative Writing 3 2.63

Environmental Science 7 6.14

Home Economics 2 1.75

Intro to Communications and Mass Media 6 5.26

Java Programming 3 2.63

Law and Justice 3 2.63

Marine Ecology 6 5.26

Newspaper 6 5.26

SAS Programming 4 3.51

SAT Verbal/Math Prep 9 7.89

Sculpture and Ceramics 3 2.63

Textile Art 3 2.63

Theatre Workshop 7 6.14

Trends in Young Adult Literature 6 5.26

Visual Arts 3 2.63

Yearbook 5 4.39

slide46

Exercises

Open the program Lengths.sas, which creates two data sets for newly hired navigators, NA1 and NA2. The data sets are concatenated to create a data set named newhires.

The two data sets have the same variables, but the variable Name has a different length in each data set.

Answer the questions on the following slide as they relate to the newhires data set.

slide47

Exercises

  • Will the DATA step run successfully or will there be an error?
  • If it runs successfully:
    • What do you think the length of Name will be in the new data set? Why?
    • Submit the program to verify your answer.
    • Modify the last DATA step to put NA2beforeNA1 in the SET statement.
    • What do you think the length of Name will be in the new data set? Why?
    • Submit the program to verify your answer.
slide48

Exercises – Different Lengths

  • Did the DATA step run successfully or was there an error?
    • It ran successfully.
  • What is the length of Name in the new data set? Why?
    • The length of Name was 4, the length from the first data set in the SET statement (NA1).
  • After DATA step modification:
    • The length of Name is 6, the length from the first data set in the SET statement (NA2).
  • Why?
    • Because at compile time, SAS creates a spot in the PDV and assigns variable attributes such as name, type, and length in the order that it sees them. SAS sees the data set NA1 or NA2 in the SET statement first, so Name is assigned the length from that data set.
slide49

Exercises

Open the program Formats.sas, which creates two data sets for newly hired navigators, NA1 and NA2. The data sets are concatenated to create a data set named newhires.

The two data sets have the same variables, but the variable Name has a different format in each data set.

Answer the questions on the following slide as they relate to the newhires data set.

slide50

Exercises

  • Will the DATA step run successfully or will there be an error?
  • If it successfully runs:
    • What do you think the format of Salary will be in the new data set? Why?
    • Submit the program to verify your answer.
    • Modify the last DATA step to put NA2beforeNA1 in the SET statement.
    • What do you think the format of Salary will be in the new data set? Why?
    • Submit the program to verify your answer.
slide51

Exercises – Different Formats

  • Did the DATA step run successfully or was there an error?
    • It ran successfully.
  • What is the format of Salary in the new data set? Why?
    • The format of Salary was DOLLAR10., the format from the first data set in the SET statement (NA1).
  • After DATA step modification:
    • The format of Salary is DOLLAR12.2, the format from the first data set in the SET statement (NA2).
  • Why?
    • Because at compile time, SAS creates a spot in the PDV and assigns variable attributes such as name, type, and length in the order that it sees them. SAS sees the data set NA1 or NA2 in the SET statement first, so Salary is assigned the format from that data set.
slide52

Exercises

Open the program Labels.sas, which creates two data sets for newly hired navigators, NA1 and NA2. The data sets are concatenated to create a data set named newhires.

The two data sets have the same variables, but the variable Name has a different label in each data set.

Answer the questions on the following slide as they relate to the newhires data set.

slide53

Exercises

  • Will the DATA step run successfully or will there be an error?
  • If it successfully runs:
    • What do you think the label of Salary will be in the new data set? Why?
    • Submit the program to verify your answer.
    • Modify the last DATA step to put NA2beforeNA1 in the SET statement.
    • What do you think the label of Salary will be in the new data set? Why?
    • Submit the program to verify your answer.
slide54

Exercises – Different Labels

  • Did the DATA step run successfully or was there an error?
    • It ran successfully.
  • What is the label for Salary in the new data set? Why?
    • The label of Salary is Annual Salary, the label from the first data set in the SET statement (NA1).
  • After DATA step modification:
    • The label of Salary is Yearly Salary, the label from the first data set in the SET statement (NA2).
  • Why?
    • Because at compile time, SAS creates a spot in the PDV and assigns variable attributes such as name, type, and length in the order that it sees them. SAS sees the data set NA1 or NA2 in the SET statement first, so Salary is assigned the label from that data set.
slide55

Exercises

Open the program types.sas, which creates two data sets for newly hired navigators, NA1 and NA2. The data sets are concatenated to create a data set named newhires.

The two data sets have the same variables, but the variable Name has a different data type in each data set.

Answer the questions on the following slide as they relate to the newhires data set.

slide56

Exercises

  • Will the DATA step run successfully or will there be an error?
  • If it successfully runs:
    • What do you think the type of Salary will be in the new data set? Why?
    • Submit the program to verify your answer.
    • Modify the last DATA step to put NA2beforeNA1 in the SET statement.
    • What do you think the type of Salary will be in the new data set? Why?
    • Submit the program to verify your answer.
slide57

Exercises – Different Types

  • Did the DATA step run successfully or was there an error?
    • It stopped due to an error.
  • What is the type for Salary in the new data set? Why?
    • Character is the first type of Salary that SAS sees. However, there are no rows in the data set because of an error at compile time.
  • Why did a difference in type cause an error?
    • At compile time, SAS creates a spot in the PDV and assigns variable attributes such as name, type, and length in the order that it sees them.
    • SAS cannot resolve the type for Salary because the data types require different processing. Variables must be the same data type.
concatenating data
Concatenating Data
  • If both data sets are sorted before you combine them, they might not be sorted after the concatenation.
  • You can sort the resulting data set, but it is more efficient to preserve the sorted order as you combine them.
interleaving sas data sets
Interleaving SAS Data Sets
  • Use the SET statement with a BY statement in a DATA step to interleave SAS data sets.
  • General form of a DATA step interleave:

DATA SAS-data-set;SET SAS-data-set1 SAS-data-set2 . . . ;

BYBY-variable;

<other SAS statements>RUN;

interleaving sas data sets1
Interleaving SAS Data Sets
  • Interleaving SAS data sets concatenates SAS data sets so that the observations in the resulting data set are in order.

ia.miamiemp

The original data sets must be in order.

work.allemp

ia.parisemp

data work.allemp;

set ia.miamiemp

ia.parisemp

ia.romeemp;

by ID;

run;

ia.romeemp

...

interleaving sas data sets2
Interleaving SAS Data Sets
  • Interleave the fa1 and fa2 data sets by Name.

fa1

fa2

The data sets must be sorted first.

proc sort data=fa1;

by name;

run;

proc sort data=fa2;

by name;

run;

...

interleaving sas data sets3
Interleaving SAS Data Sets

fa1

fa2

data newfa;

set fa1 fa2(rename=(JCode=JobCode));

by Name;

run;

Which value comes first?

newfa

GRANT

PDV

...

interleaving sas data sets4

Remember

Interleaving SAS Data Sets

fa1

fa2

data newfa;

set fa1 fa2(rename=(JCode=JobCode));

by Name;

run;

Which value comes first?

newfa

JONES

PDV

When SAS moves to a different data set, it initializes the PDV to missing before loading the JONES observation.

...

interleaving sas data sets5
Interleaving SAS Data Sets

fa1

fa2

data newfa;

set fa1 fa2(rename=(JCode=JobCode));

by Name;

run;

newfa

...

exercise1
Exercise
  • This exercise reinforces the concepts discussed previously.
slide66

Exercises

Principal Skinner is a very efficient person. She wants you to modify your program so that the rows are concatenated using the interleaving technique.

Open the program named skinner.sas that you saved earlier. Modify the program so that the StudentElectives data set is created by preserving sorted order by StudentID.

slide67

Exercises

proc sort data=prog2.CraigStudents out=CraigStudents;

by StudentID;

run;

proc sort data=prog2.LopezStudents out=LopezStudents;

by StudentID;

run;

proc sort data=prog2.ShahStudents out=ShahStudents;

by StudentID;

run;

proc sort data=prog2.LiStudents out=LiStudents;

by StudentID;

run;

proc sort data=prog2.RobbinsStudents out=RobbinsStudents;

by StudentID;

run;

proc sort data=prog2.PerryStudents out=PerryStudents;

by StudentID;

run;

proc sort data=prog2.GreeneStudents out=GreeneStudents;

by StudentID;

run;

The data sets must be sorted.

continued...

slide68

Exercises

 Changes are in red and the PROC SORT after the DATA step was removed.

data StudentElectives;

set CraigStudents LopezStudents ShahStudents

LiStudents RobbinsStudents PerryStudents

GreeneStudents;

by StudentID;

drop Choice2 Choice3;

* Keep StudentID Name Choice1;

run;

options nodate nonumber;

title 'Student First Choice Electives';

proc print data= StudentElectives;

run;

title 'Frequency Counts for First Choice Electives';

proc freq data= StudentElectives;

tables Choice1 /nocum;

run;

slide69

Exercises

Partial PROC PRINT Output

Student First Choice Electives

Student

Obs ID Name Choice1

1 1005 Chaz Richardson SAT Verbal/Math Prep

2 1154 Barbara Muir Conversations in Diversity

3 1155 Angel Reisman Advanced Shop

4 1245 Leticia Ritter African American Literature

5 1257 Richard Calle Astronomy

6 1258 Ronnie Trimpin Advanced Functions and Modeling

.

.

.

108 9940 Moises Curbelo Astronomy

109 9942 Brooke Sears Creative Writing

110 9961 Darnell Lowrance Marine Ecology

111 9962 Jamie Rowan Advanced Shop

112 9995 Kelvin Garcia Visual Arts

113 9997 Tiffany Pierre Visual Arts

114 9999 Randy Leifer Sculpture and Ceramics

exercise section 14 1
Exercise – Section 14.1
  • This exercise reinforces the concepts discussed previously.
section 14 2

Section 14.2

Merging SAS Data Sets

objectives1
Objectives
  • Prepare data for merging using the SORT procedure and data set options.
  • Merge SAS data sets on a single common variable.
combining sas data sets2
Combining SAS Data Sets
  • Look at combining data sets horizontally, or merging data.

Joining multiple data sets side by side (merging)

Start with data sets where all rows have matches and when there are no matches.

Data Set 1

Data Set 2

Data Set 3

Data Set 3

Rows might or might not have matches in each data set.

merging data sets

DATA Step

Merging Data Sets
  • A merge combines two or more existing data sets by joining observations side-by-side.
match merge

DATA Step

Match-Merge
  • The most common type of merge is a match-merge, which uses a common variable to join observations.
merging sas data sets

Remember

Merging SAS Data Sets
  • Use the MERGE statement in a DATA step to join corresponding observations from two or more SAS data sets.
  • General form of a DATA step match-merge:

DATA SAS-data-set;MERGESAS-data-sets;BY BY-variable(s); <other SAS statements>RUN;

In order to use a BY statement in the DATA step, the data must be sorted by that variable.

merging sas data sets1
Merging SAS Data Sets
  • You can merge data sets without a BY statement. Using the MERGE statement without a BY statement results in a blind merge.
  • SAS matches the first row from data set 1 to the first row in data set 2, the second row from data set 1 to the second row in data set 2, and so on.

DATA SAS-data-set;MERGESAS-data-sets; <other SAS statements>RUN;

merging sas data sets2
Merging SAS Data Sets
  • You can read any number of SAS data sets with a single MERGE statement.

SAS data sets

costs

sales

goals

taxes

data compare;

merge costssalesgoals taxes;

by Month;

run;

compare

costs

sales

goals

taxes

business task3
Business Task
  • International Airlines is comparing monthly sales performance to monthly sales goals.
  • The sales and goals data are stored in separate SAS data sets.
business task4
Business Task

To calculate the difference between revenues and goals, the performance and goals data sets must be merged.

ia.performance

ia.goals

Match-merge the data sets by Month and compute the difference between the variable values for Sales andGoal.

ia.compare

merging sas data compilation

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Compilation

ia.performance

ia.goals

PDV

...

merging sas data compilation1

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Compilation

ia.performance

ia.goals

PDV

...

merging sas data compilation2
Merging SAS Data: Compilation

ia.performance

ia.goals

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Month already exists in the PDV, so SAS does not re-create it.

PDV

...

merging sas data compilation3
Merging SAS Data: Compilation

ia.performance

ia.goals

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

PDV

...

merging sas data compilation4
Merging SAS Data: Compilation

ia.performance

ia.goals

data ia.compare; merge ia.performance ia.goals; by Month;Difference=Sales-Goal;run;

PDV

...

merging sas data compilation5
Merging SAS Data: Compilation

ia.performance

ia.goals

data ia.compare; merge ia.performanceia.goals; by Month; Difference=Sales-Goal;run;

PDV

Descriptor

portion

created

ia.

compare

merging sas data execution
Merging SAS Data: Execution

ia.

goals

ia.

performance

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

PDV

Initialize PDV to missing

merging sas data execution1

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

Do the BY variables match?

YES

PDV

...

merging sas data execution2

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

performance

ia.

goals

Read 1 obs from each matching data set.

Month=1

PDV

...

merging sas data execution3

data ia.compare; merge ia.performance ia.goals; by Month;Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

performance

ia.

goals

PDV

...

merging sas data execution4

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Automatic return

Automatic output

ia.

compare

Merging SAS Data: Execution

ia.

goals

ia.

performance

PDV

...

merging sas data sets execution
Merging SAS Data Sets: Execution
  • SAS reinitializes only the newvariables in the PDV, not the variables that were read from the data sets listed in the MERGE statement.
merging sas data execution5
Merging SAS Data: Execution

ia.

goals

ia.

performance

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

PDV

Reinitialize new variables to missing

merging sas data execution6

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

performance

ia.

goals

Do the BY variables match?

YES

PDV

...

merging sas data execution7

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

performance

ia.

goals

Did the BY variable change?

YES

PDV

Reinitialize rest of PDV to missing.

...

merging sas data sets execution1
Merging SAS Data Sets: Execution
  • SAS reinitializes only the new variables in the PDV, not the variables that were read from the data sets listed in the MERGE statement.
  • If the BY variable does not match what is in the PDV, the entire PDV is initialized.
merging sas data execution8

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

Read 1 obs from each matching data set.

Month=2

PDV

...

merging sas data execution9

data ia.compare; merge ia.performance ia.goals; by Month;Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

PDV

merging sas data execution10

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Automatic return

PDV

Automatic output

Merging SAS Data: Execution

ia.

goals

ia.

performance

ia.

compare

...

merging sas data execution11
Merging SAS Data: Execution

ia.

goals

ia.

performance

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

PDV

Reinitialize new variables to missing.

merging sas data execution12

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

Does the BY variable match?

YES

PDV

...

merging sas data execution13

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

Did the BY variable change?

YES

PDV

Reinitialize rest of PDV to missing.

...

merging sas data execution14

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

Read 1 obs from each matching data set.

Month=3

PDV

...

merging sas data execution15

data ia.compare; merge ia.performance ia.goals; by Month;Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

PDV

...

merging sas data execution16

data ia.compare; merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Automatic return

Automatic output

Merging SAS Data: Execution

ia.

goals

ia.

performance

PDV

ia.

compare

...

merging sas data execution17

data ia.compare;merge ia.performance ia.goals; by Month; Difference=Sales-Goal;run;

Merging SAS Data: Execution

ia.

goals

ia.

performance

End of

File

End of

File

PDV

ia.

compare

exercise2
Exercise
  • This exercise reinforces the concepts discussed previously.
slide108

Exercises

  • Mr. Shah stores his data in four different data sets:
    • ShahMaster - contains student IDs and names
    • Shah1 - contains student IDs and 1st Elective Choice
    • Shah2 - contains student IDs and 2nd Elective Choice
    • Shah3- contains student IDs and 3rd Elective Choice
  • He decided that this is not the best way to store the data.
  • Create a data set called myStudents that matches all of Mr. Shah’s students and electives.
  • Create a listing report of all of Mr. Shah’s students with no page numbers and no date/time.
slide109

Exercises

/* Data sets are all sorted by Student ID. You can add a PROC SORT step for each data set.*/

data mystudents;

merge prog2.ShahMaster prog2.Shah1

prog2.Shah2 prog2.Shah3;

by StudentID;

run;

options nodate nonumber ls=90;

proc print data=mystudents;

run;

slide110

Exercises - Partial Output

Student

Obs ID Name Choice1

1 2001 Troy Pruska Advanced Shop

2 2335 Taylor Lowet Band

3 2584 Patrice Ray Java Programming

4 3259 Marie Hibara Theatre Workshop

5 3489 Evelyn Hehmann Intro to Communications and Mass Media

6 4433 Cole Billy Yearbook

Obs Choice2 Choice3

1 Anatomy and Physiology Law and Justice

2 SAS Programming Band

3 African American Literature Law and Justice

4 SAT Verbal/Math Prep Theatre Workshop

5 Intro to Communications and Mass Media Java Programming

6 Creative Writing SAS Programming

7 Astronomy Band

match merging
Match-Merging
  • When you match-merge two or more data sets, it is common to have the following:
    • repeated BY values
    • non-matches
data step merge

A

ID

1

3

4

Age

31

42

21

Name

SAM

SUE

BOB

ID

1

2

4

DATA Step Merge

B

data new;

merge a b;

by ID;

run;

NEW

ID

1

2

3

4

Name

SAM

SUE

BOB

Age

31

42

21

business task5
Business Task
  • What if the data sets have rows that do not match?
  • Merge two data sets to acquire the names of the German crew who are scheduled to fly next week.

ia.gercrew

ia.gersched

business task6
Business Task

ia.gercrew

ia.gersched

To match-merge the data sets by EmpID, the data sets must be ordered by EmpID.

proc sort data=ia.gersched

out=work.gersched;

by EmpID;

run;

merging sas data execution18
Merging SAS Data: Execution

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

Initialize PDV to missing

...

merging sas data execution19
Merging SAS Data: Execution

ia. gercrew

work. gersched

Read 1 obs from each matching data set for first BY group.

First BY Group:EmpID=E00632

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

...

merging sas data execution20

Automatic return

Automatic output

work.

nextweek

Merging SAS Data: Execution

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

...

merging sas data execution21
Merging SAS Data: Execution

ia. gercrew

work. gersched

Does the BY variable match?

NO

Which one comes FIRST?

...

merging sas data execution22

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

Merging SAS Data: Execution

ia. gercrew

work. gersched

Did the BY variable change?

YES

PDV

Reinitialize PDV to missing.

...

merging sas data execution23
Merging SAS Data: Execution

ia. gercrew

work. gersched

Read 1 obs from each matching data set for next BY group.

Next BY Group:EmpID=E01483

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

...

merging sas data execution24

Automatic return

Automatic output

Merging SAS Data: Execution

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

work.

nextweek

...

merging sas data sets execution2

Remember

Merging SAS Data Sets: Execution
  • SAS reinitializes only the new variables in the PDV, not the variables that were read from the data sets listed in the MERGE statement.
  • If the BY variable does not match what is in the PDV, the entire PDV is initialized.
merging sas data execution25
Merging SAS Data: Execution

ia. gercrew

work. gersched

PDV

work.

nextweek

If SAS did not reinitialize the PDV when the BY variables did not match, then observation 2 would have the value from the previous FlightNum (5250).

merging sas data execution26

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

Merging SAS Data: Execution

ia. gercrew

work. gersched

Does the BY variable match?

YES

PDV

...

merging sas data execution27

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

Merging SAS Data: Execution

ia. gercrew

work. gersched

Does the BY variable match what is in the PDV?

NO

PDV

Reinitialize PDV to missing.

...

merging sas data execution28
Merging SAS Data: Execution

ia. gercrew

work. gersched

Read 1 obs from each matching data set for next BY group.

Next BY Group:EmpID=E01996

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

...

merging sas data execution29

Automatic return

Automatic output

Merging SAS Data: Execution

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

work.

nextweek

...

merging sas data execution30

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

Merging SAS Data: Execution

ia. gercrew

work. gersched

Do the BY variable match?

Yes

PDV

...

merging sas data execution31

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

Merging SAS Data: Execution

ia. gercrew

work. gersched

Did the BY variable change?

Yes

PDV

Reinitialize PDV to missing.

...

merging sas data execution32
Merging SAS Data: Execution

ia. gercrew

work. gersched

Read 1 obs from each matching data set for next BY group.

Next BY Group:EmpID=E04064

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

...

merging sas data execution33

Automatic return

Automatic output

Merging SAS Data: Execution

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

PDV

work.

nextweek

...

merging sas data execution34

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

run;

Merging SAS Data: Execution

ia. gercrew

work. gersched

End of File

End of File

PDV

work.

nextweek

eliminating nonmatches
Eliminating Nonmatches
  • You want only the people who will to fly next week.
  • One person is not scheduled for next week.
  • Use IF-THEN logic to delete the row.

ia.gercrew

work.gersched

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

if FlightNum = ' ' then delete;

run;

eliminating nonmatches1
Eliminating Nonmatches

What if the row is in work.gersched (meaning they are scheduled to fly) but the flight number is not assigned?

ia.gercrew

work.gersched

The IF statement deletes a row that you might not want to remove.

data work.nextweek;

merge ia.gercrew work.gersched;

by EmpID;

if FlightNum = ' ' then delete;

run;

identifying data set contributors
Identifying Data Set Contributors
  • When you read multiple SAS data sets in one DATA step, you can use the IN= data set option to detect which data set contributed to an observation.
  • General form of the IN= data set option:
  • where variable is any valid SAS variable name.
  • The variable name can be any unique, valid variable name. The programmer must supply the variable name. SAS supplies the value.

SAS-data-set(IN=variable)

the in data set option
The IN= Data Set Option
  • variable is a temporary numeric variable with one of the following values:
  • 0 to indicate false. The data set did not contribute to the current observation.
  • 1 to indicate true. The data set did contribute to the current observation.

Variables created with IN= are automatically dropped from the output data set.

the in data set option1

D

D

The IN= Data Set Option

ia. gercrew

work. gersched

data work.combine;

merge ia.gercrew(in=InCrew)

work.gersched(in=InSched);

by EmpID;

run;

PDV

...

the in data set option2

D

D

The IN= Data Set Option

ia. gercrew

work. gersched

data work.combine;

merge ia.gercrew(in=InCrew)

work.gersched(in=InSched);

by EmpID;

run;

PDV

...

the in data set option3

D

D

The IN= Data Set Option

ia. gercrew

work. gersched

data work.combine;

merge ia.gercrew(in=InCrew)

work.gersched(in=InSched);

by EmpID;

run;

PDV

...

the in data set option4

D

D

The IN= Data Set Option

ia. gercrew

work. gersched

data work.combine;

merge ia.gercrew(in=InCrew)

work.gersched(in=InSched);

by EmpID;

run;

PDV

...

eliminating nonmatches2

D

Eliminating Nonmatches

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew

work.gersched(in=InSched);

by EmpID;

if InSched=1;

run;

True

PDV

work.

nextweek

...

eliminating nonmatches3

D

Eliminating Nonmatches

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew

work.gersched(in=inSched);

by EmpID;

if InSched=1;

run;

False

PDV

work.

nextweek

...

eliminating nonmatches4

D

Eliminating Nonmatches

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew

work.gersched(in=InSched);

by EmpID;

if InSched=1;

run;

True

PDV

work.

nextweek

...

eliminating nonmatches5

D

Eliminating Nonmatches

ia. gercrew

work. gersched

data work.nextweek;

merge ia.gercrew

work.gersched(in=InSched);

by EmpID;

if InSched=1;

run;

True

PDV

work.

nextweek

exercise3
Exercise
  • This exercise reinforces the concepts discussed previously.
slide146

Exercises

Create a data set called students_left that contains the students who are not advised by Mrs. Perry or Mr. Greene.

The other_GCs data set contains student IDs of students advised by Mrs. Perry and Mr. Greene. The data set prog2.schools contains all of the students.

The students_left data set should only contain student IDs and names. It should contain 84 observations.

Create a listing report with an appropriate title.

slide147

Exercises

proc sort data=prog2.other_GCs out=other_GCs;

by StudentID;

run;

data students_left;

merge prog2.schools(keep=studentID name)

other_GCs(in=A);

by StudentID;

if not A;

*if A=0; /* Another way to code it. */

run;

title 'Students not assigned to Perry and Greene';

proc print data=students_left;

run;

slide148

Exercises- Partial Output

Students not assigned to Perry and Greene

Student

Obs ID Name

1 1005 Chaz Richardson

2 1154 Barbara Muir

3 1155 Angel Reisman

4 1245 Leticia Ritter

5 1257 Richard Calle

6 1258 Ronnie Trimpin

.

.

.

81 9940 Moises Curbelo

82 9962 Jamie Rowan

83 9997 Tiffany Pierre

84 9999 Randy Leifer

exercise section 14 2
Exercise – Section 14.2
  • This exercise reinforces the concepts discussed previously.
section 14 3

Section 14.3

More Match-Merging Two or More SAS Data Sets

objectives2
Objectives
  • Perform a match-merge.
  • Perform explicit output for matching and non-matching observations.
match merging1
Match-Merging
  • The data set transact contains an account number and information on transactions for a week. The data set branches contain an account number and the branch location for that account.

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

desired output
Desired Output
  • The bank manager wants to see reports based on three data sets.
  • Goal: Create a data set named newtrans that shows this week's transactions.

Act

Num Trans Amnt Branch

56891 D 126.32 N. Lincoln

56891 C 560 N. Lincoln

58876 D 14.56 W. Argyle

59987 C 371.69 E. Wacker

desired output1
Desired Output
  • Goal: Create a data set named noactiv that shows accounts with no transactions this week.

Act

Num Branch

56900 S. Cicero

59900 N. Damen

desired output2
Desired Output
  • Goal: Create a data set named noacct that shows transactions with no matching account number.

Act

Num Trans Amnt

57900 C 235

using the in data set option
Using the IN= Data Set Option

data newtrans

noactiv (drop=Trans Amnt)

noacct (drop=Branch);

merge prog2.transact(in=InTrans)

prog2.branches(in=InBanks);

by ActNum;

additional SAS statements

run;

if the observation is a match
If the Observation Is a Match
  • Both transactand branchescontributed to the observation.
  • InTrans=1 and InBanks=1

prog2.transact

prog2.branches

Current Observation

...

if the observation is not a match
If the Observation Is Not a Match
  • branches contributed to the observation. transact did not. (The account had no transactions this week.)
  • InTrans=0 and InBanks=1

prog2.transact

prog2.branches

Current Observation

...

if the observation is not a match1
If the Observation Is Not a Match
  • transact contributed to the observation. branches did not. (A transaction occurred, but the account number was invalid.)
  • InTrans=1 and InBanks=0

prog2.transact

prog2.branches

Current Observation

...

using in to identify matches and non matches
Using IN= to Identify Matches and Non-Matches

data newtrans

noactiv (drop=Trans Amnt)

noacct (drop=Branch);

merge prog2.transact(in=InTrans)

prog2.branches(in=InBanks);

by ActNum;

if InTrans=1 and InBanks=1

then output newtrans;

else if InBanks=1 and InTrans=0

then output noactiv;

else if InBanks=0 and InTrans=1

then output noacct;

run;

using in to identify matches and non matches1
Using IN= to Identify Matches andNon-Matches
  • Alternate Solution

data newtrans

noactiv (drop=Trans Amnt)

noacct (drop=Branch);

merge prog2.transact(in=InTrans)

prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks

then output newtrans;

else if InBanks and not InTrans

then output noactiv;

else if intrans and not InBanks

then output noacct;

run;

slide162

Compile

BRANCH

INBANKS

INTRANS

ACTNUM

TRANS

AMNT

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

...

slide163

Execute

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

AMNT

INTRANS

INBANKS

0

0

.

...

slide164

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

Match

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

D

126.32

56891

...

slide165

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

D

126.32

56891

...

slide166

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

slide167

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

BRANCH

0

1

N. Lincoln

.

1

D

126.32

56891

...

slide168

BRANCH

ACTNUM

TRANS

Write out observation to newtrans.

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

True

Explicit Output

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

slide169

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Automatic Return

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

slide170

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

No Match

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

slide171

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

Which one comes first?

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

slide172

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

Did the BY variable

change?

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

slide173

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

NO.

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Do not reinitialize PDV.

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

126.32

1

56891

...

merging sas data sets execution3

Remember

Merging SAS Data Sets: Execution
  • SAS reinitializes only the new variables in the PDV, not the variables that were read from the data sets listed in the MERGE statement.
  • If the BY variable does notmatch what is in the PDV, the entire PDV is initialized.
  • If the BY variables do match, SAS will not reinitialize the PDV, enabling the current row to match to more than one row.
slide175

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

560

C

126.32

1

56891

...

slide176

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

560

C

126.32

1

56891

...

slide177

BRANCH

ACTNUM

TRANS

Write out observation to newtrans.

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

True

Explicit Output

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

560

C

126.32

1

56891

...

slide178

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Automatic Return

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

D

560

C

126.32

1

56891

...

slide179

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

No Match

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Which one comes first?

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

C

560

1

56891

...

slide180

BRANCH

ACTNUM

TRANS

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

No Match

Did the BY variable

change?

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

AMNT

0

0

.

1

N. Lincoln

C

560

1

56891

...

slide181

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

Yes

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Reinitialize PDV.

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

1

1

N. Lincoln

56891

C

560

.

0

0

...

slide182

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

Read from data set with lower BY value.

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

0

0

.

56900

1

S. Cicero

...

slide183

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

0

0

.

56900

1

S. Cicero

...

slide184

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

False

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

0

0

.

1

56900

S. Cicero

...

slide185

Write out observation to noactiv.

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

True

Explicit Output

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

0

0

.

1

56900

S. Cicero

...

slide186

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Automatic Return

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

0

0

.

1

56900

S. Cicero

...

slide187

Act

Num Trans Amnt

56891 D 126.32

56891 C 560

57900 C 235

58876 D 14.56

59987 C 371.69

Act

Num Branch

56891 N. Lincoln

56900 S. Cicero

58876 W. Argyle

59900 N. Damen

59987 E. Wacker

data newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch);

merge prog2.transact(in=InTrans) prog2.branches(in=InBanks);

by ActNum;

if InTrans and InBanks then output newtrans;

else if InBanks and not InTrans then output noactiv;

else if InTrans and not InBanks then output noacct;

run;

Continue processinguntil end of both data sets.

INTRANS

INBANKS

ACTNUM

TRANS

AMNT

BRANCH

0

0

.

1

56900

S. Cicero

...

viewing only the matches
Viewing Only the Matches

proc print data=newtrans noobs;

run;

PROC PRINT Output

Act

Num Trans Amnt Branch

56891 D 126.32 N. Lincoln

56891 C 560 N. Lincoln

58876 D 14.56 W. Argyle

59987 C 371.69 E. Wacker

non matches from branches
Non-Matches from branches

proc print data=noactiv noobs;

run;

PROC PRINT Output

Act

Num Branch

56900 S. Cicero

59900 N. Damen

non matches from transact
Non-Matches from transact

proc print data=noacct noobs;

run;

PROC PRINT Output

Act

Num Trans Amnt

57900 C 235

exercise section 14 3
Exercise – Section 14.3
  • This exercise reinforces the concepts discussed previously.
section 14 4

Section 14.4

Many-to-Many Merges

objectives3
Objectives
  • Define types of DATA step merges.
  • Illustrate how the DATA step handles different types of merges.
other merges
Other Merges
  • In addition to one-to-one merges, the DATA step merge performs other types of data combinations:
merging
Merging

What is the result if the two data sets are merged by the variable Name?

work.one

work.two

merging1
Merging

By default, the DATA step merge will not match all the rows for Ben in data set one to all the rows for Ben in data set two.

work.one

work.two

...

cartesian product
Cartesian Product
  • All possible row combinations on the matching BY value are known as a modified Cartesian product.
  • This is not the default action of a SAS DATA step merge.
  • PROC SQL returns the modified Cartesian product.
slide198

Modified Cartesian PROC SQL

DefaultDATA Step Merge Results

the sas merge process review

Remember

The SAS Merge Process (Review)
  • How SAS processes when merging data:
    • Do the BY variable values match each other? If not, which one comes first?
    • Does that value match what is in the PDV?

If not, reinitialize the PDV.

    • Load the observation (or observations) into the PDV.
merging data
Merging Data

DATA step merge processing:

work.one

work.two

...

merging data1
Merging Data
  • Resulting Data Set
exercise4
Exercise
  • This exercise reinforces the concepts discussed previously.
slide203

Exercises

Create a data set called Holidays by merging the Symbol and Countrydata sets by the variable Holiday.

Save your code as holiday.sas to be used in the next section.

slide204

Exercises

Output

Obs Holiday Symbol Country

1 Christmas Christmas Tree Germany

2 Christmas Nativity Canada

3 Christmas gifts United States

4 Christmas gifts England

5 Christmas gifts Mexico

6 Mothers Day flowers United States

7 Mothers Day family Mexico

8 Mothers Day family Canada

9 Mothers Day family Holland

10 Mothers Day family Australia

11 New Years Day Confetti United States

12 New Years Day Clock Canada

13 New Years Day Fireworks Mexico

14 New Years Day Fireworks Australia

15 Thanksgiving Cornucopia Canada

16 Thanksgiving Pumpkin Pie Unites States

slide205

Exercises

proc sort data=prog2.symbol out=symbol;

by Holiday;

run;

proc sort data=prog2.country out=country;

by Holiday;

run;

data holiday;

merge symbol country;

by Holiday;

run;

proc print data=holiday;

run;

section 14 5

Section 14.5

Simple Joins Using the SQL Procedure

objectives4
Objectives
  • Perform an inner join using the SQL procedure.
joining data using proc sql
Joining Data Using PROC SQL
  • PROC SQL produces the modified Cartesian product that you saw in the previous section.

work.one

work.two

joining data using proc sql1
Joining Data Using PROC SQL
  • Recall the desired output:
the sql procedure
The SQL Procedure
  • The SQL procedure enables you to write ANSI standard SQL code in SAS to process SAS tables.
  • SQL is used in many relational databases, such as Oracle and Teradata.
  • ANSI, the American National Standards Institute
proc sql versus the data step benefits
PROC SQL versus the DATA Step: Benefits
  • The SQL procedure enables you to do the following:
    • join tables and produce a report in one step without creating a SAS data set
    • join tables without using an explicit PROC SORT
    • use complex matching criteria (not only an equijoin)
  • By default, PROC SQL returns a report, not a SAS data set. However, you can create a data set with PROC SQL.
proc sql versus data step costs
PROC SQL versus DATA Step: Costs
  • In general, the SQL procedure requires more CPU time and memory than a DATA step merge.
the sql procedure syntax overview
The SQL Procedure: Syntax Overview
  • The PROC SQL statement signals the start of the SQL procedure.

PROC SQL;

the sql procedure syntax overview1
The SQL Procedure: Syntax Overview
  • The QUIT statement ends the PROC SQL step.
  • PROC SQL supports run-group processing. The RUN statement has no effect on PROC SQL. You must use the QUIT statement to end the SQL step.

QUIT;

the sql procedure syntax overview2
The SQL Procedure: Syntax Overview
  • Statements within the SQL step (also called queries) are made of smaller building blocks called clauses.
  • The following clauses are discussed in this section:
    • SELECT
    • FROM
    • WHERE
  • There is one semicolon at the end of each complete query or SELECT statement. There are no semicolons after each clause.
proc sql
PROC SQL
  • The most basic SQL query includes the SELECT and FROM clauses.
  • The SELECT clause specifies the columns to display and the FROM clause specifies the table, or SAS data set, to read.
the select clause
The SELECT Clause
  • The SELECT clause identifies the columns to include in the query result or table.
  • Columns listed in the SELECT clause are separated by commas. There is no comma following the last variable in the list.
  • To select all columns, use an asterisk in place of the column names.

SELECTvar-1, var-2 ...

SELECT*

the from clause
The FROM Clause
  • The FROM clause identifies the SAS table(s) from which to read.

FROMSAS-data-set …

producing a report
Producing a Report
  • The following produce the same output.

proc print data=work.one;

var Name Number;

run;

proc sql;

select Name, Number

from work.one

;

quit;

using proc sql to join tables
Using PROC SQL to Join Tables
  • To join two or more SAS tables, list them in the FROM clause separated by commas.
  • General form of an SQL join:

PROC SQL;

SELECTvar-1, var-2…

FROMSAS-data-set-1,SAS-data-set-2...

;

joining two or more data sets
Joining Two or More Data Sets
  • Join work.one and work.two and create the modified Cartesian product.

proc sql;

select *

from work.one, work.two

;

quit;

sql joins without a where clause
SQL Joins without a WHERE Clause
  • An SQL join specified without a WHERE clause results in a Cartesian product. All possible row combinations are output.
  • In the example, the first row (Ben) would be matched to all the rows in the second data set.
  • You only want the rows where the Names match, or the modified Cartesian product.
sql join without a where clause
SQL Join without a WHERE Clause
  • Partial Output of Cartesian Product

Name Number Name Color

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Ben 7 Ben Blue

Ben 7 Ben Red

Ben 7 Becky Yellow

Ben 7 Becky Red

Ben 7 Eli Purple

Ben 7 Eli Green

Ben 7 Jim Brown

Ben 7 Jim Green

Ben 7 Jim Pink

Ben 17 Ben Blue

Ben 17 Ben Red

Ben 17 Becky Yellow

Ben 17 Becky Red

Ben 17 Eli Purple

Ben 17 Eli Green

sql join without a where clause1
SQL Join without a WHERE Clause
  • Name from both tables is selected. How can you select one Name in the output?
  • To specify a column from a table, use the table name as the table qualifier. To display Name from table one, specify table-name.Name.

proc sql;

select one.Name, Number, Color

from work.one,

work.two

;

quit;

sql join without a where clause2
SQL Join without a WHERE Clause
  • If the column name appears in both data sets, you must use the table qualifier with the column name.
  • Because Number and Color are unique columns (they appear in only one of the data sets), you do not need to use the table qualifier.

proc sql;

select one.Name, Number, Color

from work.one, work.two

;

quit;

the where clause
The WHERE Clause
  • In a join, the WHERE clause specifies the join criteria.
  • The WHERE expression is any valid SAS condition.

WHEREexpression

joining on a common variable
Joining on a Common Variable
  • The inner join returns only the rows with matching values of Name.
  • Name from table one will be matched with Name from table two.
identifying variables with the same names

Remember

Identifying Variables with the Same Names

proc sql;

select one.Name, Number, Color

from work.one, work.two

where one.Name = Two.Name

;

quit;

You do not need to use the table name as a prefix (table qualifier) if the column name appears in only one table.

assigning an alias for a sas table
Assigning an Alias for a SAS Table
  • You can also specify an alias (or nickname) for a SAS table. The alias replaces the table name as the column prefix.
  • An alias can be any valid SAS name.

FROMSAS-data-set-1 <AS> alias-1,

SAS-data-set-2 <AS> alias-2 ...

assigning an alias for a sas table1
Assigning an Alias for a SAS Table

proc sql;

select O.Name, Number, Color

from work.one as O,

work.two as T

where O.Name = T.Name

;

quit;

exercise5
Exercise
  • This exercise reinforces the concepts discussed previously.
slide232

Exercises

The table transact contains an account number and information on transactions for a week.

The table branches contains an account number and the branch location for that account.

The bank manager wants to see only the accounts that have valid transactions (only rows with matching values of ActNum).

slide233

Exercises

Solution

proc sql;

select T.ActNum, Trans,

Amnt, Branch

from prog2.transact as T,

prog2.branches as B

where T.ActNum = B.ActNum

;

quit;

slide234

Exercises

Output

ActNum Trans Amnt Branch

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

56891 D 126.32 N. Lincoln

56891 C 560 N. Lincoln

58876 D 14.56 W. Argyle

59987 C 371.69 E. Wacker

creating a table in proc sql
Creating a Table in PROC SQL
  • To create a table instead of a report, use the CREATE TABLE clause before the SELECT clause.
  • SAS will not send output to the OUTPUT window because of the CREATE TABLE clause.
  • You want to use another SQL query or PROC PRINT to display the table.

CREATE TABLE table-name AS

creating a table in proc sql1
Creating a Table in PROC SQL

proc sql;

create table work.three as

select O.Name, Number, Color

from work.one as O,

work.two as T

where O.Name=T.Name

;

quit;

/* View the created table */

proc print data=three;

run;

creating a table in proc sql2
Creating a Table in PROC SQL

proc sql;

create table work.three as

select O.Name, Number, Color

from work.one as O,

work.two as T

where O.Name = T.Name

;

/* View the created table */

select * from work.three;

quit;

exercise6
Exercise
  • This exercise reinforces the concepts discussed previously.
slide239

Exercises

Modify the code from the previous exercise to create a table called work.transactions. Produce a listing report of the data.

proc sql;

select T.ActNum, Trans,

Amnt, Branch

from prog2.transact as T,

prog2.branches as B

where T.ActNum=B.ActNum

;

quit;

slide240

Exercises

Solution (Changes in blue):

proc sql;

create table work.transactions as

select T.ActNum, Trans,

Amnt, Branch

from prog2.transact as T,

prog2.branches as B

where T.ActNum=B.ActNum

;

select * from transactions;

quit;

slide241

Exercises

Open the holiday.sas program that you created in the previous section.

Merge the two files using the DATA step method and then using the SQL statement method.

Compare the output between the two methods.

slide242

Exercises

libname prog2 'SAS-data-library';

proc sql;

create table holiday as

select s.holiday, symbol, country

from prog2.symbol as S,

prog2.country as C

where s.holiday = c.holiday;

select * from holiday;

quit;

slide243

Exercises – Partial Output

Holiday Symbol Country

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Thanksgiving Cornucopia Canada

Thanksgiving Pumpkin Pie Canada

Thanksgiving Cornucopia Unites States

Thanksgiving Pumpkin Pie Unites States

Christmas Christmas Tree Germany

Christmas Nativity Germany

Christmas gifts Germany

Christmas Christmas Tree Canada

Christmas Nativity Canada

Christmas gifts Canada

Christmas Christmas Tree United States

Christmas Nativity United States

Christmas gifts United States

Christmas Christmas Tree England

Christmas Nativity England

Christmas gifts England

Christmas Christmas Tree Mexico

Christmas Nativity Mexico

Christmas gifts Mexico