Nearest neighbor matching

USING THE GREEDY MATCH MACRO

Note: Much of the code was originally written by Lori Parsons:

http://www2.sas.com/proceedings/sugi26/p214-26.pdf

This code has been written with simplicity as a primary concern.

If you do not have a large number of controls, you may want to modify it.



/* Define the library for formats */

LIBNAME saslib "G:\oldpeople\sasdata\" ;

OPTIONS NOFMTERR FMTSEARCH = (saslib) ;



/* Define the library for study data */

LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;


Include the macro

%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearestmacro.sas' ;



%propen(libname, dsname, idvariable, dependent, propensity)

LIBNAME = libref of the library (directory) holding the data sets

DSNAME = dataset with study data

IDVARIABLE = subject ID variable

DEPENDENT = dependent variable

PROPENSITY = propensity score produced in logistic regression


For example:

%propen(study,allpropen,id,athome,prob);

Remember, we already have the study.allpropen data set with the propensity score (prob) from the PROC LOGISTIC we just did.
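As a reminder of where prob comes from, here is a minimal sketch of a PROC LOGISTIC step that would write a predicted probability to study.allpropen. The input data set name (study.alldata) and the predictors (age, female, income) are hypothetical placeholders, not the variables used in the original analysis.

```sas
/* Sketch only: study.alldata, age, female and income are assumed names */
PROC LOGISTIC DATA = study.alldata ;
  /* Model the probability that athome = 1 */
  MODEL athome (EVENT = '1') = age female income ;
  /* Save each subject's predicted probability as prob */
  OUTPUT OUT = study.allpropen PREDICTED = prob ;
RUN ;
```

The OUTPUT data set keeps the original variables, so id and athome travel along with prob into the macro call.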


Explaining the macro

A challenge


%macro propen(lib,dsn,id,depend,prob);

DATA in5 ;

SET &lib..&dsn ;

This creates a temporary data set, in5, from the study data.


Propensity scores rounded to 5, then 4, 3, 2 and 1 decimals

%Do countr = 1 %to 5;

%let digits = %eval(6 - &countr);

%let roundto = %eval(10**&digits);

%let roundto = %sysevalf(1/&roundto);

%let nextin = %eval(&digits - 1);


MACRO NOTES

%Do countr = 1 %to 5; /* Starts %DO loop */

Use the %EVAL function to do integer arithmetic

%let digits = %eval(6 - &countr);

Use the %SYSEVALF function to do non-integer (floating-point) arithmetic

%let roundto = %sysevalf(1/&roundto);
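To see what the loop arithmetic produces on each pass, the statements can be run on their own with a %PUT. This is a small sketch, not part of the matching macro; the macro name show_rounding is made up for illustration.

```sas
/* Sketch: show_rounding is a hypothetical name used only for this demo */
%macro show_rounding ;
  %do countr = 1 %to 5 ;
    /* Integer arithmetic: digits runs 5, 4, 3, 2, 1 */
    %let digits  = %eval(6 - &countr) ;
    /* Floating-point arithmetic: rounding unit is 1 / 10**digits */
    %let roundto = %sysevalf(1 / (10 ** &digits)) ;
    /* Write the values for this pass to the log */
    %put countr=&countr digits=&digits roundto=&roundto ;
  %end ;
%mend show_rounding ;

%show_rounding
```

The log should show the rounding unit growing by a factor of ten on each pass, which is what loosens the match from 5 decimal places down to 1.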


/* Output control to one data set, intervention to another */

/* Create random number to sort within group */


Create 2 data sets

DATA yes1 (KEEP = &prob id_y depend_y randnum)

no1 (KEEP = &prob id_n depend_n randnum) ;

SET in&digits ;

We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.

We only keep four variables: the propensity score, the ID, the dependent variable and the random number.


Assignment statements

randnum = RANUNI(0);

&prob = ROUND(&prob,&roundto);

Create a random number and round the propensity score to a set number of digits.
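The second argument of ROUND is the rounding unit, so passing 0.00001, 0.001 or 0.1 rounds to 5, 3 or 1 decimal places. A quick sketch with a made-up score of 0.123456 (the variable names r5, r3 and r1 are illustrative only):

```sas
/* Sketch: 0.123456 is an invented propensity score */
DATA _NULL_ ;
  prob = 0.123456 ;
  r5 = ROUND(prob, 0.00001) ;  /* 5 decimals: 0.12346 */
  r3 = ROUND(prob, 0.001) ;    /* 3 decimals: 0.123   */
  r1 = ROUND(prob, 0.1) ;      /* 1 decimal:  0.1     */
  PUT r5= r3= r1= ;            /* write the results to the log */
RUN ;
```

Two subjects whose scores differ only beyond the rounding unit end up with the same rounded value, which is what lets the later MERGE treat them as a match.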


Output to Case Data set …

IF &depend = 1 THEN DO ;

id_y = &id ;

depend_y = &depend ;

OUTPUT yes1 ;

END ;

We need to copy the dependent and ID variables to new names (id_y, depend_y), or they will get overwritten when cases and controls are merged.


… Or output control data set

ELSE IF &depend = 0 THEN DO ;

id_n = &id ;

depend_n = &depend ;

OUTPUT no1 ;

END ;

Notice the data sets were named no1 and yes1.

The reason becomes evident shortly.



/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */


%Do i = 1 %to 20;

%let j = %eval(&i + 1);

proc sort data = yes&i ;

by &prob randnum ;

data yes&i yes&j ;

set yes&i ;

by &prob ;

if first.&prob then output yes&i ;

else output yes&j ;

NOTE: Matching without replacement


Same thing for controls

proc sort data = no&i ;

by &prob randnum ;

data no&i no&j ;

set no&i ;

by &prob ;

if first.&prob then output no&i ;

else output no&j ;

The randnum ensures that, among subjects with the same score, the one matched is pulled at random.
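The sort-then-split step can be tried on a toy data set to see how FIRST. processing peels off one record per score. This is a sketch with invented data; the data set names toy, firstpick and leftover are hypothetical and stand in for no&i before the split, no&i after the split, and no&j.

```sas
/* Sketch: three invented subjects, two of them sharing a score of 0.3 */
DATA toy ;
  INPUT id prob ;
  randnum = RANUNI(0) ;   /* random tie-breaker within a score */
  DATALINES ;
1 0.3
2 0.3
3 0.5
;
RUN ;

/* Order by score, then randomly within each score */
PROC SORT DATA = toy ;
  BY prob randnum ;
RUN ;

/* First record per score goes to firstpick, the rest wait in leftover */
DATA firstpick leftover ;
  SET toy ;
  BY prob ;
  IF FIRST.prob THEN OUTPUT firstpick ;
  ELSE OUTPUT leftover ;
RUN ;
```

Here firstpick receives two records (one per distinct score) and leftover receives one. Which of subjects 1 and 2 lands in firstpick depends on the random number, and the leftover record gets its chance on the next pass of the loop.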


Merge matches, end loop

DATA match&i ;

MERGE yes&i (IN = ina) no&i (IN = inb) ;

BY &prob ;

IF ina AND inb ;

run;

%END ;



/* Adds all matches into a single data set */

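DATA allmatches ;

SET

%DO k = 1 %TO 20;

match&k

%END ;

; /* The semicolon after %END belongs to the macro statement, so this one ends the SET statement */

Concatenate all data sets with matches (N=20)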


Create two data sets with IDs

DATA allyes (RENAME = (id_y = &id depend_y = &depend))

allno (RENAME = (id_n = &id depend_n = &depend)) ;

SET allmatches ;


Create one file of all matched IDs

DATA matchfile ;

SET allyes allno ;

And sort it …

proc sort data = matchfile ;

by &id &depend ;


proc sort data = in&digits ;

by &id &depend ;


/* Creates a data set of all subjects with n-digit match */

/* Creates a second data set of subjects with no match */

data matches&digits in&nextin ;

merge in&digits (in = ina) matchfile (in = inb) ;

by &id &depend ;

if ina and inb then output matches&digits ;

else output in&nextin ;


Just a good habit to check as the loop runs through:

Title "Matches &roundto " ;

proc freq data = matches&digits ;

tables &depend ;

run ;

%end ;

End loop. Now match to 4 decimal places, etc.


/* Adds 1- to 5-digit matches into a single data set */

data &lib..finalset ;

set

%do m = 1 %to 5;

matches&m

%end ;

; /* Ends the SET statement; the semicolon after %end is consumed by the macro statement */


One final check & done!

Title "Distribution of Dependent Variable in &lib..finalset " ;

proc freq data = &lib..finalset ;

tables &depend ;

run;

%mend propen;

run;


Did it work?

** p < .01, **** p < .0001


Model Comparison


Odds ratio


How near?

