Multiple Uses for a Simple SQL Procedure

Multiple Uses for a Simple SQL Procedure Rebecca Larsen University of South Florida

Introduction • The SQL Procedure…Equivalent or advantageous to other SAS data steps and procedures? • Handy SQL Procedure: CREATE TABLE… INSERT INTO…SELECT • Several brief examples of using the above procedure will be presented SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or Trademarks of their respective companies.

Overview of Topics • Inserting Rows from Multiple Tables into One Table • Inserting Rows from Multiple Queries into One Table • Changing the Order of Variables in a SAS Data Set • Changing Variable Attributes and/or Names in a SAS Data Set

1. Inserting Rows from Multiple Tables into One Table • Why would you want to insert rows from multiple tables into one table? • Information related to one customer/patient/client is stored in multiple data sets and linked by a common identifier • Information related to one issue/problem you would like to study is stored in multiple data sets and needs to be combined

Health Care Claims Example • In this first example, there are multiple source files with different record layouts • HomeHealth_clm – record layout 1 • Hospice_clm – record layout 1 • SkilledNursing_clm – record layout 1 • Inpatient_clm – record layout 1 • Outpatient_clm – record layout 2 • PartB_clm – record layout 2 • All files contain information needed to complete your research study • You need to combine information from all source data sets into one “master” analysis data set

Record Layout 1: person_id health_status_cd service_category_cd race_cd procedure_amt* procedure_dt* Record Layout 2: person_id health_status_cd service_category_cd race_cd service_amt* service_dt* Record Layouts of Health Care Claims Example Data Sets *In this example, source variables have different names in the different record layouts, although they contain analogous information and should be analyzed as one variable in the “master” data set

Create a new, blank table using the following code: PROC SQL; CREATE TABLE allclaims ( person_id char(11), health_status_cd char(2), service_category_cd char(2), race_cd char(1), serv_amt_cu num, service_dt num ) ; QUIT;

Use a macro to insert values from multiple source files into one data set /** 1. Define macro **/ %MACRO insert(amount, date, file); PROC SQL; INSERT INTO allclaims SELECT person_id, status_cd, service_category_cd, race_cd, &amount., &date. FROM &file.; QUIT; %MEND insert;

(continued) /** 2. Call macro **/ %insert(service_amt, service_dt, HomeHealth_clm); %insert(service_amt, service_dt, Hospice_clm); %insert(service_amt, service_dt, Inpatient_clm); %insert(service_amt, service_dt, SkilledNursing_clm); %insert(procedure_amt, procedure_dt, Outpatient_clm); %insert(procedure_amt, procedure_dt, PartB_clm);

Notice… • …that different source variables for the original files are inserted into a single variable in the new file • Both variables must be of the same type in the original file as being created in the new file (i.e. numeric into numeric) • service_amt (num) vs. procedure_amt (num) • service_dt (num) vs. procedure_dt (num) • One master data set was created for the purpose of completing analysis on information related to health care claims data from multiple source files

2. Inserting Rows from Multiple Queries into One Table • Why would you want to insert rows from multiple queries into one table? • You want to create summary records from your master data set based on different “where clause” conditions in separate queries and insert them into one summary data set

Health Care Claims Example, Part 2 • Continuing with the health care claims example provided above, a summary table with one record per person is desired to be used in statistical analyses, i.e. ANOVA, PROC LOGISTIC, etc. • You want the summary data set to be created with the following characteristics: • One record per person, • “Race_cd” (5 class levels) to be categorized to create the new variable, “minority_cd” (2 class levels),

(continued) • Summary variable “mean_cu” to be created as the mean amount paid per person, • Records with missing service amount to be excluded, • Persons with more than one value for minority_cd were excluded (to eliminate persons with indistinguishable minority status) • For statistical tests with a CLASS variable such as minority status, you want to make sure the classes are mutually exclusive, or you will compromise the validity of your results

First, create the blank summary table using the following code: PROC SQL; CREATE TABLE minority ( person_id char(11), health_status_cd char(2), service_category_cd char(2), minority_cd char(4), mean_cu num ) ;

Next, Insert values from multiple queries into one summary data set (Query 1) /** Query 1 **/ INSERT INTO minority SELECT distinct person_id, health_status_cd, service_category_cd, case when race_cd = ‘1’ then ‘1’ else ‘0’ end as minority_cd, mean(serv_amt_cu) as mean_cu FROM allclaims WHERE serv_amt_cu is not missing and service_category_cd = ‘mh’ and count(distinct minority_cd) = 1 GROUP BY person_id;

Continue inserting values into the summary data set (Query 2) /** Query 2 **/ INSERT INTO minority SELECT distinct person_id, health_status_cd, service_category_cd, case when race_cd = ‘1’ then ‘1’ else ‘0’ end as minority_cd, mean(serv_amt_cu) as mean_cu FROM allclaims WHERE serv_amt_cu is not missing and service_category_cd = ‘ph’ and count(distinct minority_cd) = 1 GROUP BY person_id; QUIT;

Cool Code… • The count function in this WHERE clause is an example of PROC SQL code that can accomplish in one step, what would take more than one step and several more lines of regular SAS code where serv_amt_cu is not missing and service_category_cd = ‘ph’ and count(distinct minority_cd) = 1 group by person_id • The case expression can be used to create a new variable that is a “re-categorization” of the values of another variable case when race_cd = ‘1’then ‘1’ else ‘0’ end as minority_cd

3. Changing the Order of Variables in a SAS Data Set • Why would you want to change the order of the variables in your data set? • Perhaps your data step manipulations have altered the order of the variables from the original, and you need to export the data in the same format you received it • Or you wish to arrange certain variable types together for coding processes to be run on sequential variables in your data set

Example of Changing the Order of Variables in Your Data Set • In the following example, oldfile (Table 1) has variables in a certain order that you would like to change, and newfile (Table 2) has variables in the order to which you would like to change

…After changing Before changing

/** Step 1 **/ PROC SQL; CREATE TABLE newfile ( pin char(11), sex_cd char(1), race_cd char(1), DOB num, age num, diagnosis_cd char(5), procedure_cd char(5), enrollment_dt num, termination_dt num, death_dt num ); /** Step 2 **/ INSERT INTO newfile SELECT pin, sex_cd, race_cd, DOB, age, diagnosis_cd, procedure_cd, enrollment_dt, termination_dt, death_dt FROM oldfile; QUIT; The following code was used to change the variable order:

Remember… • …the order in which you specify the variables in the INSERT INTO…SELECT statement must be the exact order of the variables that are created in the blank data set, NOT the order of the data set from which they came.

4. Changing Variable Attributes and/or Names in a SAS Data Set • The names of the variables do not have to be identical in the newly created table as the table from which you are selecting the records • Other attributes (i.e. formats, informats, labels, character length, etc.) can be easily added or changed • For example, we will create a new file called, “newfile2”, with different variable names and attributes from the data set in the previous example called, “newfile”

/** Step 1 **/ PROC SQL; CREATE TABLE newfile2 ( pin_id char(11), sex_cd char(1), race_cd char(1), DOB_dt num format=date9., age_iv num, diagnosis_cd char(5), procedure_cd char(5), enrollment_dt num format=date9., termination_dt num format=date9., death_dt num format=date9. ); /** Step 2 **/ INSERT INTO newfile2 SELECT pin, sex_cd, race_cd, DOB, age, diagnosis_cd, procedure_cd, enrollment_dt, termination_dt, death_dt FROM newfile; QUIT; Use the following code to change the name of variables and add a date formats in your data set:

Notice… • If all you want to do is change names or attributes of variables in a data set, then PROC DATASETS is a much more efficient tool; however, • If you have other steps that need to be accomplished by writing a new file, then the ability to change names and attributes is available in PROC SQL

Conclusion • There are many more applications for the CREATE TABLE…INSERT INTO…SELECT statements in PROC SQL than the few examples shown • Many times it is just another way to do the same thing • Other times it may cut down the number of sorts, data steps & procedures or lines of code required • Try it out for yourself…you may find something you will wish you had always known!

My preferred SQL coding resource is the following user’s guide: • SAS Institute Inc. (2000), SAS® SQL Procedure User’s Guide, Version 8, Cary, NC: SAS Institute Inc.

Speaker: Location of Company: Telephone: Fax: Email: Rebecca Larsen, Assistant in Research 13301 Bruce B. Downs Boulevard MHC2617 University of South Florida Tampa, Florida 33612-3807 (813) 974-7206 (813) 974-6411 rlarsen@fmhi.usf.edu About the Speaker To view the presentation slides online, go to http://psrdc.fmhi.usf.edu/SQL_Savannah_2002.ppt

Multiple Uses for a Simple SQL Procedure