1 / 10

Merging in SAS

Merging in SAS. These slides show alternatives regarding the merge of two datasets using the IN data set option (check in the SAS onlinedoc > “BASE SAS”, “SAS Language Reference: Dictionary” > “Data step options” > “IN=“

evan
Download Presentation

Merging in SAS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Merging in SAS • These slides show alternatives regarding the merge of two datasets using the IN data set option (check in the SAS onlinedoc > “BASE SAS”, “SAS Language Reference: Dictionary” > “Data step options” > “IN=“ • In the slides, the red data goes into the merged data set. The greyed out observations are left out.

  2. The perfect merge

  3. Not so perfect (if a or b;)

  4. If a=b; (both datasets contribute)

  5. If a; (must be in dataset A)

  6. If b; (must be in dataset B)

  7. Notes • The examples assume there is a unique identifier. This can be either one variable (ex, CRSP's PERMNO or Compustat's GVKEY) or more than one variable (for example, PERMNO and DATE for a panel dataset). • Assumption: Both data sets are sorted by the unique identifier(s).

  8. Sample code

  9. Typical problems • If both datasets were complete (they both have the same observed units, then the IF statements would be unnecessary; "if a and b" would be equivalent to leaving the statement out altogether) • If you do not have a BY statement (no identifier -- you somehow know that each row of one datasets corresponds to the same one row in the other dataset), the datasets are just "glued" side-by-side. • Common mishaps: the by variables have different formats across datasets, SAS will merge the datasets, but will put a WARNING in the log. Another common mishap is to have variables with the same name (that are not the ID) -- one of the will be overwritten.

  10. References Good references are • http://ftp.sas.com/techsup/download/technote/ts644.html • and a manual called "Combining and modifying SAS data sets: examples", which is in the RC library. It has a lot of example. Unfortunately, it does not exist in an online version (only the code is available, but the explanations are very good).

More Related