Chapter 14

Chapter 14 Combining SAS® Data Sets

Section 14.1 Concatenating SAS Data Sets

Objectives • Define concatenation. • Use the SET statement in a DATA step to concatenate two or more SAS data sets. • Use the RENAME= data set option to change the names of variables. • Use the SET and BY statements in a DATA step to interleave two or more SAS data sets.

Data Set 1 Data Set 2 Data Set 3 Combining SAS Data Sets • In this chapter, you will look at combining data sets in two ways. Joining multiple data sets side by side (merging) Stacking multiple data sets (concatenation) Data Set 1 Data Set 2 Data Set 3 Data Set 3 Data Set 3 Rows might or might not have matches in each data set.

Data Set 1 Data Set 2 Data Set 3 Combining SAS Data Sets • In this chapter, you will look at combining data sets in two ways, concatenation and merging. Stacking multiple data sets (concatenation) Data Set 3

Concatenating SAS Data Sets • Use the SET statement in a DATA step to concatenate SAS data sets. • General form of a DATA step concatenation: • Data set names are separated by a space in the SET statement. DATA SAS-data-set; SET SAS-data-set1 SAS-data-set2 . . .; <other SAS statements>RUN;

jan feb mar Concatenating SAS Data Sets • You can read any number of SAS data sets with a single SET statement. SAS data sets work.qtr1 jan data work.qtr1; set work.janwork.febwork.mar; run; feb mar ...

Business Task • Two SAS data sets, na1 and na2, contain data for newly hired navigators. • Concatenate the data sets into a new data set named newhires. • The data sets contain the same variables. You will see what happens when there are different variables. na1 na2

na1 na2 PDV data newhires; set na1 na2; run; Concatenating SAS Data Sets: Compilation ...

na1 na2 PDV data newhires; set na1 na2; run; No additional variables Concatenating SAS Data Sets: Compilation ...

Concatenating SAS Data Sets: Compilation na1 na2 PDV data newhires; set na1 na2; run; newhires ...

Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; newhires Initialize PDV to missing ...

Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; setna1 na2; run; newhires ...

Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; newhires

Automatic return Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; Automatic output newhires ...

Automatic return Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; setna1 na2; run; Automatic output newhires ...

Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; newhires End of File ...

Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; Initialize PDV to missing newhires When SAS moves to a new data set, the values in the PDV are reinitialized to missing. ...

Automatic output Automatic return Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; newhires ...

Concatenating SAS Data Sets: Execution na1 na2 PDV data newhires; set na1 na2; run; End of File newhires ...

Concatenating SAS Data Sets: Execution When SAS reaches the end of the file on the last data set, DATA step execution ends. What if the data sets contain different variables? newhires

Business Task • Two SAS data sets, fa1 and fa3, contain data for newly hired flight attendants. • Concatenate the data sets into a new data set named newfa. • The data sets contain similar data, but each has one unique variable (JobCode versus Country). fa1 fa3

Concatenating SAS Data Sets: Compilation fa3 fa1 PDV data newfa; set fa1 fa3; run; ...

Concatenating SAS Data Sets: Compilation fa3 fa1 PDV data newfa; set fa1 fa3; run; Add additional variable. ...

Concatenating SAS Data Sets: Compilation fa3 fa1 PDV data newfa; set fa1 fa3; run; newfa ...

Concatenating SAS Data Sets: Execution fa3 fa1 PDV data newfa; set fa1 fa3; run; newfa ...

Remember Concatenating SAS Data Sets: Execution • When SAS moves from one data set to another, the values in the PDV are reinitialized to missing.

Concatenating SAS Data Sets: Execution If SAS did not re-initialize the PDV when it moved from one data set to another, then JobCode would contain previousdata for the rows from fa2. fa2 fa1 newfa ...

Business Task • Two SAS data sets, fa1 and fa2, contain data for newly hired flight attendants. • Concatenate the data sets into a new data set named newfa. fa1 fa2 The data sets contain similar data, but the column names are not the same.

Concatenating SAS Data Sets: Execution fa2 fa1 JobCode and JCode represent the same data and should be in one column. data newfa; set fa1 fa2; run; newfa ...

The RENAME= Data Set Option • You can use a RENAME= data set option to change the name of a variable. • General form of the RENAME= data set option: SAS-data-set(RENAME=(old-name-1=new-name-1 old-name-2=new-name-2. . . old-name-n=new-name-n))

The RENAME= Data Set Option fa1 fa2 There are two sets of parentheses that are required. The outer set is for the data set options, such as RENAME= and KEEP= for fa2. They appear inside the outer set of parentheses. data newfa; set fa1fa2(rename=(JCode=JobCode)); run;

The RENAME= Data Set Option fa1 fa2 The inner parentheses surround all the variables that you want to rename. Separate each variable to rename with a space. For example, rename Gender to Sex in fa2. data newfa; set fa1fa2(rename=(JCode=JobCode Gender=Sex)); run;

The RENAME= Data Set Option fa1 fa2 The equals sign is part of the RENAME= option. data newfa; set fa1fa2(rename=(JCode=JobCode)); run; New Name Old Name

The RENAME= Data Set Option fa1 fa2 data newfa; set fa1 fa2(rename=(JCode=JobCode)); run; PDV ...

The RENAME= Data Set Option fa1 fa2 data newfa; set fa1 fa2(rename=(JCode=JobCode)); run; PDV SAS now knows JCode to be JobCode, which was already created in the PDV. Reference JobCode, not JCode, in your DATA step. ...

The RENAME= Data Set Option fa1 fa2 data newfa; set fa1 fa2(rename=(JCode=JobCode)); run; newfa What if you want JCode instead of JobCode? How would you modify the program? ...

The RENAME= Data Set Option fa1 fa2 data newfa; set fa1(rename=(JobCode=JCode)) fa2; run; newfa The RENAME= option is tangent to the data set that you want to change. ...

Exercise • This exercise reinforces the concepts discussed previously.

Exercises Principal Skinner has seven data sets containing student elective choices, one for each guidance counselor. She wants one data set, StudentElectives, that contains each row from all seven data sets. The data sets to read from are CraigStudents, LopezStudents, ShahStudents, LiStudents, RobbinsStudents, PerryStudents, and GreeneStudents. The data set needs to be in sorted order by student ID number and should contain StudentID,Name,and Choice1 columns. Produce a listing report with an appropriate title. Produce a frequency report analyzing Choice1 with an appropriate title. (Suppress page numbers and the date and time, as well as cumulative statistics.) Save your program as skinner.sas to be used later.

Exercises data StudentElectives; set prog2.CraigStudents prog2.LopezStudents prog2.ShahStudents prog2.LiStudents prog2.RobbinsStudents prog2.PerryStudents prog2.GreeneStudents; drop Choice2 Choice3; *Keep StudentID Name Choice1; run; proc sort data= StudentElectives; by StudentID; run; options nodate nonumber; title 'Student First Choice Electives'; proc print data= StudentElectives; run; title 'Frequency Counts for First Choice Electives'; proc freq data= StudentElectives; tables Choice1 /nocum; run;

Exercises Partial PROC PRINT Output Student First Choice Electives Student Obs ID Name Choice1 1 1005 Chaz Richardson SAT Verbal/Math Prep 2 1154 Barbara Muir Conversations in Diversity 3 1155 Angel Reisman Advanced Shop 4 1245 Leticia Ritter African American Literature 5 1257 Richard Calle Astronomy 6 1258 Ronnie Trimpin Advanced Functions and Modeling . . . 108 9940 Moises Curbelo Astronomy 109 9942 Brooke Sears Creative Writing 110 9961 Darnell Lowrance Marine Ecology 111 9962 Jamie Rowan Advanced Shop 112 9995 Kelvin Garcia Visual Arts 113 9997 Tiffany Pierre Visual Arts 114 9999 Randy Leifer Sculpture and Ceramics

Exercises PROC FREQ Output Frequency Counts for First Choice Electives The FREQ Procedure Choice1 Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Advanced Functions and Modeling 5 4.39 Advanced Shop 7 6.14 African American Literature 2 1.75 Anatomy and Physiology 2 1.75 Astronomy 9 7.89 Band 7 6.14 Conversations in Diversity 6 5.26 Creative Writing 3 2.63 Environmental Science 7 6.14 Home Economics 2 1.75 Intro to Communications and Mass Media 6 5.26 Java Programming 3 2.63 Law and Justice 3 2.63 Marine Ecology 6 5.26 Newspaper 6 5.26 SAS Programming 4 3.51 SAT Verbal/Math Prep 9 7.89 Sculpture and Ceramics 3 2.63 Textile Art 3 2.63 Theatre Workshop 7 6.14 Trends in Young Adult Literature 6 5.26 Visual Arts 3 2.63 Yearbook 5 4.39

Exercises Open the program Lengths.sas, which creates two data sets for newly hired navigators, NA1 and NA2. The data sets are concatenated to create a data set named newhires. The two data sets have the same variables, but the variable Name has a different length in each data set. Answer the questions on the following slide as they relate to the newhires data set.

Exercises • Will the DATA step run successfully or will there be an error? • If it runs successfully: • What do you think the length of Name will be in the new data set? Why? • Submit the program to verify your answer. • Modify the last DATA step to put NA2beforeNA1 in the SET statement. • What do you think the length of Name will be in the new data set? Why? • Submit the program to verify your answer.

Exercises – Different Lengths • Did the DATA step run successfully or was there an error? • It ran successfully. • What is the length of Name in the new data set? Why? • The length of Name was 4, the length from the first data set in the SET statement (NA1). • After DATA step modification: • The length of Name is 6, the length from the first data set in the SET statement (NA2). • Why? • Because at compile time, SAS creates a spot in the PDV and assigns variable attributes such as name, type, and length in the order that it sees them. SAS sees the data set NA1 or NA2 in the SET statement first, so Name is assigned the length from that data set.

Exercises Open the program Formats.sas, which creates two data sets for newly hired navigators, NA1 and NA2. The data sets are concatenated to create a data set named newhires. The two data sets have the same variables, but the variable Name has a different format in each data set. Answer the questions on the following slide as they relate to the newhires data set.

Exercises • Will the DATA step run successfully or will there be an error? • If it successfully runs: • What do you think the format of Salary will be in the new data set? Why? • Submit the program to verify your answer. • Modify the last DATA step to put NA2beforeNA1 in the SET statement. • What do you think the format of Salary will be in the new data set? Why? • Submit the program to verify your answer.

Chapter 14

Chapter 14

Presentation Transcript

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14.

Chapter 14

Chapter 14

CHAPTER 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14

Chapter 14