This presentation explores various methods for merging two datasets in SAS using the IN= data set option. It highlights the importance of having a unique identifier, discusses sorting requirements, and examines typical problems encountered during merges. Key takeaways include the implications of different merge conditions, such as keeping observations from either or both datasets, as well as common pitfalls like format discrepancies and variable name conflicts. Useful references and examples are provided to enhance understanding.
Merging in SAS
• These slides show alternative ways to merge two datasets using the IN= data set option (see the SAS OnlineDoc > "Base SAS" > "SAS Language Reference: Dictionary" > "Data Set Options" > "IN=").
• In the slides, the red data goes into the merged data set; the greyed-out observations are left out.
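The alternatives can be sketched as SAS DATA steps. This is a minimal example, assuming two small hypothetical data sets ONE and TWO that share the identifier ID (the names and values are illustrative only). Each IN= variable (A and B here) is 1 when that data set contributes to the current observation, and the subsetting IF decides which observations are written out.

```sas
/* Hypothetical example data: both sets share the identifier ID */
data one;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data two;
  input id y;
  datalines;
2 200
3 300
4 400
;
run;

/* Both inputs are already sorted by ID, as MERGE with BY requires */

data both;          /* only IDs found in BOTH data sets */
  merge one(in=a) two(in=b);
  by id;
  if a and b;
run;

data keep_one;      /* every ID from ONE, with TWO's data where it exists */
  merge one(in=a) two(in=b);
  by id;
  if a;
run;

data keep_two;      /* every ID from TWO, with ONE's data where it exists */
  merge one(in=a) two(in=b);
  by id;
  if b;
run;

data one_not_two;   /* IDs that appear in ONE but not in TWO */
  merge one(in=a) two(in=b);
  by id;
  if a and not b;
run;

data everything;    /* IDs from either data set: no IF statement needed */
  merge one two;
  by id;
run;
```

A and B are temporary, so they are not written to the output data sets. With the sample data, BOTH keeps IDs 2 and 3, KEEP_ONE keeps 1 to 3, KEEP_TWO keeps 2 to 4, ONE_NOT_TWO keeps only ID 1, and EVERYTHING keeps 1 to 4.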
Notes
• The examples assume there is a unique identifier. This can be either one variable (e.g., CRSP's PERMNO or Compustat's GVKEY) or more than one variable (e.g., PERMNO and DATE for a panel dataset).
• Assumption: both data sets are sorted by the unique identifier(s).
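Both assumptions can be checked before merging. This is a minimal sketch, assuming a hypothetical panel data set PANEL keyed by PERMNO and DATE (the values are made up, and one key is duplicated on purpose). After sorting, the second step lists every observation whose key is not unique, using the FIRST. and LAST. variables that a BY statement creates.

```sas
/* Hypothetical panel data; PERMNO 10001 on 2020-01-31 appears twice on purpose */
data panel;
  input permno date :yymmdd10. ret;
  format date yymmdd10.;
  datalines;
10001 2020-02-28 0.02
10001 2020-01-31 0.01
10002 2020-01-31 -0.03
10001 2020-01-31 0.01
;
run;

/* Sort by the full identifier before any BY-group processing or MERGE */
proc sort data=panel;
  by permno date;
run;

/* An observation with a unique key is both the first and the last of its
   PERMNO-DATE group; anything else shares its key with another observation */
data dup_keys;
  set panel;
  by permno date;
  if not (first.date and last.date);
run;
```

Here DUP_KEYS holds both copies of the duplicated key; an empty DUP_KEYS means the identifier really is unique. PROC SORT's NODUPKEY option would instead delete the duplicates, keeping only the first observation for each key, which hides the problem rather than reporting it.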
Typical problems
• If both datasets were complete (i.e., they contain exactly the same observed units), the IF statements would be unnecessary: "if a and b" would be equivalent to leaving the statement out altogether.
• If you do not have a BY statement (no identifier: you somehow know that each row of one dataset corresponds to the same row in the other dataset), the datasets are simply "glued" side by side.
• Common mishaps: if the BY variables have different formats or lengths across datasets, SAS will still merge the datasets but will write a WARNING to the log. Another common mishap is having variables with the same name (other than the ID) in both datasets: one of them will be overwritten, because the value from the data set listed later in the MERGE statement replaces the earlier one.
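One way to avoid the same-name overwrite is the RENAME= data set option. This is a minimal sketch, assuming hypothetical data sets CRSP and COMP that both carry a non-ID variable PRICE (the names and values are illustrative only); renaming on input keeps both values side by side.

```sas
/* Hypothetical inputs: both carry a non-ID variable called PRICE */
data crsp;
  input id price;
  datalines;
1 10
2 20
;
run;

data comp;
  input id price;
  datalines;
1 11
2 22
;
run;

/* Rename PRICE as each data set is read, so neither value is overwritten */
data merged;
  merge crsp(in=a rename=(price=price_crsp))
        comp(in=b rename=(price=price_comp));
  by id;
  if a and b;
run;
```

Without the RENAME= options, MERGED would contain a single PRICE column holding the COMP values (11 and 22), since COMP is listed last in the MERGE statement.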
References
Good references are:
• http://ftp.sas.com/techsup/download/technote/ts644.html
• A manual called "Combining and Modifying SAS Data Sets: Examples", which is in the RC library. It has a lot of examples. Unfortunately, it is not available in an online version (only the code is available online), but the explanations are very good.