
Nearest neighbor matching


USING THE GREEDY MATCH MACRO

Note: Much of the code was originally written by Lori Parsons

http://www2.sas.com/proceedings/sugi26/p214-26.pdf

This code has been written with simplicity as a primary concern.

If you do not have a large number of controls, you may want to modify it

/* Define the library for formats */

LIBNAME saslib "G:\oldpeople\sasdata\" ;

OPTIONS NOFMTERR FMTSEARCH = (saslib) ;

/* Define the library for study data */

LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;

Include the Macro

%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearestmacro.sas' ;

%propen(libname, dsname, idvariable, dependent, propensity)

LIBNAME = libref of the directory that holds the data sets

DSNAME = dataset with study data

IDVARIABLE = subject ID variable

DEPENDENT = dependent variable

PROPENSITY = propensity score produced in logistic regression

FOR EXAMPLE

%propen(study,allpropen,id,athome,prob);

Remember, we already have the study.allpropen dataset with the propensity score (prob) from the PROC LOGISTIC we just did
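The macro body refers to its arguments as &lib, &dsn, &id, &depend and &prob. As a rough sketch of how the positional arguments in the call map onto those names, the stand-alone macro below (show_args is a made-up name, and the parameter list is inferred from the references in the code rather than copied from the macro file) simply writes the resolved values to the log:

```sas
/* Hypothetical helper: shows how the five arguments resolve. */
/* Parameter names are assumed from the &lib, &dsn, &id,      */
/* &depend and &prob references in the slides.                */
%macro show_args(lib, dsn, id, depend, prob) ;
  %put Input data set = &lib..&dsn ;   /* two dots: the first ends the name &lib, the second is the literal dot */
  %put ID variable    = &id ;
  %put Dependent      = &depend ;
  %put Propensity     = &prob ;
%mend show_args ;

%show_args(study, allpropen, id, athome, prob)
```

With these arguments the first line of log output names study.allpropen, which is the data set read by the SET statement in the next step.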

A Challenge

Data in5 ;

set &lib..&dsn;

Creates a temporary data set

%Do countr = 1 %to 5;

%let digits = %eval(6 - &countr);

%let roundto = %eval(10**&digits);

%let roundto = %sysevalf(1/&roundto);

%let nextin = %eval(&digits - 1);

%Do countr = 1 %to 5; /* Starts %DO loop */

Use the %EVAL function to do integer arithmetic:

%let digits = %eval(6 - &countr);

Use the %SYSEVALF function to do non-integer arithmetic:

%let roundto = %sysevalf(1/&roundto);
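As a rough illustration of the difference between the two functions, the sketch below runs the same loop arithmetic on its own and writes the values to the log. The macro name show_rounding is made up for this example and is not part of the matching macro.

```sas
/* Hypothetical helper, not part of the matching macro:      */
/* prints the values digits and roundto take on each pass.   */
%macro show_rounding ;
  %do countr = 1 %to 5 ;
    %let digits  = %eval(6 - &countr) ;     /* integer arithmetic: 5, 4, 3, 2, 1 */
    %let roundto = %eval(10**&digits) ;     /* integer power of ten */
    %let roundto = %sysevalf(1/&roundto) ;  /* floating point; %eval would truncate this to 0 */
    %put Pass &countr: digits=&digits roundto=&roundto ;
  %end ;
%mend show_rounding ;

%show_rounding
```

The division has to go through %SYSEVALF because %EVAL works only with integers, so 1 divided by a power of ten would come back as 0.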

/* Output control to one data set, intervention to another */

/* Create random number to sort within group */

DATA yes1 (KEEP = &prob id_y depend_y randnum)

no1 (KEEP = &prob id_n depend_n randnum) ;

SET in&digits;

We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.

We only keep four variables: the propensity score, the renamed ID, the renamed dependent variable and the random number.

randnum = RANUNI(0);

&prob = ROUND(&prob,&roundto);

Create a random number and round the propensity score to a set number of digits

IF &depend = 1 THEN DO ;

id_y = &id ;

depend_y = &depend ;

OUTPUT yes1 ;

END ;

We need to rename the dependent & id variables or they’ll get overwritten when the two data sets are merged by propensity score

ELSE IF &depend = 0 THEN DO ;

id_n = &id ;

depend_n = &depend ;

OUTPUT no1 ;

END ;

Notice the data sets were named no1 and yes1. It becomes evident why shortly.
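To see what the ROUND call in this step does to a score at each precision, here is a small stand-alone sketch. The score value is invented for illustration, and the DO loop stands in for the five passes of the macro loop.

```sas
data _null_ ;
  prob = 0.4567891 ;                   /* made-up propensity score */
  do digits = 5 to 1 by -1 ;
    roundto = 10 ** (-digits) ;        /* rounding unit for this pass */
    rounded = round(prob, roundto) ;   /* same ROUND call the macro uses */
    put digits= roundto= rounded= ;    /* write the three values to the log */
  end ;
run ;
```

Two subjects count as a match on a given pass when their rounded scores are equal, so each later pass accepts coarser agreement than the one before.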

/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */

%DO i = 1 %TO 20 ; /* Start of inner loop; not shown on the original slide, implied by &i and the %END below */

%let j = %eval(&i + 1) ;

proc sort data = yes&i ;

by &prob randnum ;

data yes&i yes&j ;

set yes&i ;

by &prob ;

if first.&prob then output yes&i ;

else output yes&j ;

NOTE: Matching without replacement

proc sort data = no&i ;

by &prob randnum ;

data no&i no&j ;

set no&i ;

by &prob ;

if first.&prob then output no&i ;

else output no&j ;

The randnum ensures that records with matching scores are pulled at random
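A toy run of one pass for the intervention group may make the sort-and-split step easier to follow. The records below are invented, and the plain variable name prob stands in for the macro variable &prob.

```sas
/* Made-up intervention records: two share a rounded score of 0.25 */
data yes1 ;
  input id_y prob ;
  randnum = ranuni(0) ;              /* random tie-breaker within a score */
  datalines ;
1 0.25
2 0.25
3 0.40
;
run ;

proc sort data = yes1 ;
  by prob randnum ;                  /* random order within each score */
run ;

/* First record per score stays in yes1; the rest move on to yes2 */
data yes1 yes2 ;
  set yes1 ;
  by prob ;
  if first.prob then output yes1 ;
  else output yes2 ;
run ;
```

After this step yes1 holds one record per distinct score and yes2 holds the leftover record with score 0.25, which gets its own chance to match on the next pass of the inner loop.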

DATA match&i ;

MERGE yes&i (in = ina) no&i (in = inb) ;

BY &prob ;

IF ina AND inb ;

run ;

%END ;
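The merge inside the loop can be tried on its own with a couple of invented records. In this sketch each group already has at most one record per score, as the split step leaves it, and prob again stands in for &prob.

```sas
/* Made-up data: one record per score in each group */
data yes1 ;
  input id_y prob ;
  datalines ;
1 0.25
3 0.40
;
run ;

data no1 ;
  input id_n prob ;
  datalines ;
7 0.25
8 0.31
;
run ;

/* Keep only scores that appear in both groups */
data match1 ;
  merge yes1 (in = ina) no1 (in = inb) ;
  by prob ;
  if ina and inb ;
run ;

proc print data = match1 ;
run ;
```

Only the score 0.25 appears in both inputs, so match1 contains a single row carrying both id_y and id_n. This is also where the renaming pays off: had both groups kept the same ID variable name, one value would have replaced the other in the merged row.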

DATA allmatches ;

SET

%DO k = 1 %TO 20 ;

match&k

%END ;

; /* this semicolon ends the SET statement */

Concatenate all data sets with matches (N=20)
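Putting a %DO loop inside a SET statement is a way of generating the list of data set names. A minimal stand-alone sketch follows; the macro name stack, its parameter n, and the two tiny input data sets are all made up for the example.

```sas
options mprint ;    /* show the generated DATA step in the log */

/* Hypothetical stand-alone version of the stacking step */
%macro stack(n) ;
  data allmatches ;
    set
    %do k = 1 %to &n ;
      match&k
    %end ;
    ;               /* this semicolon closes the SET statement */
  run ;
%mend stack ;

/* Two tiny made-up inputs so the sketch runs on its own */
data match1 ; id_y = 1 ; id_n = 7 ; run ;
data match2 ; id_y = 2 ; id_n = 9 ; run ;

%stack(2)
```

With MPRINT on, the log shows the statement the loop produced, a SET that lists match1 and match2. The semicolon after %END belongs to the macro statement, so the SET statement needs one of its own.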

DATA

allyes (RENAME = (id_y = &id depend_y = &depend))

allno (RENAME = (id_n = &id depend_n = &depend));

SET allmatches ;

DATA matchfile ;

SET allyes allno ;

And sort it …

proc sort data = matchfile ;

by &id &depend ;

/* Creates a data set of all subjects with n-digit match */

/* Creates a second data set of subjects with no match */

data matches&digits in&nextin ;

merge in&digits (in = ina) matchfile (in = inb) ;

by &id &depend ;

if ina and inb then output matches&digits ;

else output in&nextin ;

JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH

End loop. Now match to 4 decimal places, etc

data &lib..finalset ;

set

%do m = 1 %to 5 ;

matches&m

%end ;

; /* this semicolon ends the SET statement */

Title "Distribution of Dependent Variable in &lib..finalset " ;

proc freq data = &lib..finalset ;

tables &depend ;

run;

%mend propen ;

run;
