1 / 24

Chapter 21 Creating Multiple Observations from a Single Record

Chapter 21 Creating Multiple Observations from a Single Record. Objectives Create multiple observations from a single record containing repeating blocks of data Create multiple observations from a single record containing one ID filed followed by the same number of repeating fields.

veda-giles
Download Presentation

Chapter 21 Creating Multiple Observations from a Single Record

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 21Creating Multiple Observations from a Single Record • Objectives • Create multiple observations from a single record containing repeating blocks of data • Create multiple observations from a single record containing one ID filed followed by the same number of repeating fields. • Create multiple observations from a single records containing one ID field followed by a varying number of repeating fields

  2. Three Situations Involving Multiple Observations from a Single record Situation 1: Repeating blocks of data that represent separate observations: Each record in the following data consists of three individuals’ test scores: TOM 80 JOHN 60 TERRY 90 KEN 85 STAN 78 SCOTT 86 There are six individuals in this data. Three observations are in one record.

  3. Situation 2: an ID field followed by an equal number of repeated fields that represent separate observations The following data set consists of individuals’ top three hobbies. An ID followed by three hobbies: 01 WLAKING RUNNING SWINNING 02 GOLFING TENNIS BASEBALL 03 SWINNING TENNIS BASKETBALL

  4. Situation 3: An ID field followed by a varying number of repeating fields that represent separate observation The following is the transactional data from a grocery store, which records individual’s shopping list: 001 PORK CHEESE BEER VEGETABLE 002 Cake BEER WINE DOGFOOD COOKIE 003 BEER CHEESE WINE 004 CANDY CHOCHLATE

  5. How does SAS read multiple observations from one single record? SAS introduces two line-holding specifiers: • The trailing sign, @ : This sign holds the input record for the execution of the next INPUT statement. • The double trailing sign, @@: This sign holds the input record for the execution of the next record statement, even across iteration of the data step. • NOTE: Both trailing @ and double trailing @@ must be the last item in the INPUT statement.

  6. Trailing @ Versus Double Trailing @

  7. Situation 1: Reading Repeating Blocks of Data A raw data file contains each employee’s identification number and this year’s contribution to his or her retirement plan. Each record contains information for multiple employees. E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980

  8. Desired Output The output SAS data set should have one observation per employee. EmpID Contrib E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980

  9. Processing: What Is Required? E00973 1400 E09872 2003 E73150 2400 Read for Obs. 1 Read for Obs. 2 Read for Obs. 3 ProcessOtherStatements ProcessOtherStatements ProcessOtherStatements Output Output Output ...

  10. Use the Double Trailing @@to read repeating blocks of data • The double trailing @@ holds the raw data record across iterations of the DATA step until the line pointer moves past the end of the line. INPUTvar1 var2 var3 … @@;

  11. The Double Trailing @@ data work.retire; length EmpID $ 6; infile 'raw-data-file'; input EmpID $ Contrib @@; run; Hold until endof record. ...

  12. Creating Multiple Observations Per Record Partial Log NOTE: 2 records were read from the infile 'retire.dat'. The minimum record length was 35. The maximum record length was 36. NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.RETIRE has 5 observations and 2 variables. The "SAS went to a new line" message is expected because the @@ option indicates that SAS should read until the end of each record.

  13. Creating Multiple Observations Per Record proc print data=retire noobs; run; PROC PRINT Output EmpID Contrib E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980

  14. Exercise Open c21_1 program. Run each program and observe the results. Make sure you learn how to use trailing @@.

  15. Situation 2: ID followed by the same # of repeating fields The following data consists of employee’s quarterly sales. Each record consists of Employ ID followed each of the four quarter sales. A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 4,886.32 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 6,011.61 7,561.48 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 The goal is to create a data set with the following variables: ID Quarter Sales A05 1 2304.53 A05 2 3012.55 A05 3 2567.12 A05 4 3835.55 - - - - - - - - - - - - - -

  16. Use Trailing @ to read records with ID followed by same # of repeating fields Data EXPENSE1; input ID $ @; INPUT EXPENSE : COMMA. @; OUTPUT; INPUT EXPENSE : COMMA. @; OUTPUT; INPUT EXPENSE : COMMA. @; OUTPUT; INPUT EXPENSE : COMMA. @; OUTPUT; datalines; A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 4,886.32 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 6,011.61 7,561.48 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 ; PROC PRINT; title 'Use @ - read ID, then, read mulptiple expenses for EXPENSE variable'; RUN;

  17. Using DO Loop and Trailing @ together data expense2; input id $ @; do quarter = 1 to 4; input expense: comma. @ ; output; end; datalines; A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 4,886.32 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 6,011.61 7,561.48 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 ; proc print; title 'Use @ - read ID, then, DO loop to read muptiple expenses for EXPENSE variable'; run;

  18. Exercise Open the program c21_2 Run each program and learn how to use one single trailing @ .

  19. Situation 3: Reading data with ID followed by varying # of repeating fields The following data consists of employee’s quarterly sales. Each record consists of Employ ID followed each of the four quarter sales. A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 The goal is to create a data set with the following variables: ID Quarter Sales A05 1 2304.53 A05 2 3012.55 A05 3 2567.12 A05 4 3835.55 - - - - - - - - - - - - - - NOTE: There are some quarterly sales missing at the 3rd and 4th quarter. As a result, the # of repeating fields varies.

  20. Reading records with ID followed by varying # of repeating fields To read this type of data, we need to use trailing @ to hold the input statement to process the data step, then read next variables until @, then, process these variables as a record until the end of the data step, then, read the next variables and so on. If the # of repeating fields are not same, then, one can consider there are missing data at the end of the record, and apply the MISSOVER option in the INFILE statement to handle the varying # of fields. NOTE: If the record length is not fixed, then, PAD option will be needed to fix the record length problem.

  21. data expense3; infiledatalinesmissover pad; input id $ EXPENSE : COMMA. @; QUARTER = 0; do until (EXPENSE eq. ); QUARTER+1; output; input expense: comma. @ ; end; datalines; A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 ; proc print; title 'Use @ - read ID, then, use DO WHILE to read multiple expenses for EXPENSE variable'; run; NOTE: This program uses MISSOVER option to handle the missing at the end of a record and use PAD option to take care of the variable record length problem.

  22. To read the record with ID followed by varying # of fields with missing in the middle or beginning • It is possible that there are missing data in the beginning, middle or end of a record. • To handle this situation, in addition to using MISSOVER and PAD, one may use DSD • It is possible that the data is recorded in free format, and list input will be needed. In this situation, it usually also requires to specify a delimiters using DLM = ‘delimiters’ option

  23. This program uses MISSOVER to handle the missing at the end for varying # of repeating fields, Use DSD to handle the missing in the middle. data expense4; infiledatalinesdlm = '/' missoverdsd; input id $ EXPENSE : COMMA. @; quarter=0; do until(EXPENSE eq. ); QUARTER+1; output; input expense: comma. @ ; end; datalines; A05/2,304.53 /3,012.55 /2,567.12/ 3,835.55 A06 // 4,132.75/ 3,655.21/ 4,886.32 A07/ 1,965.34 /2,540.67/ 2,103.65/ 3,023.54 A08/ 5,341.55/ 5,021.40/ 6,011.61 /7,561.48 A09 /3,455.91/ 3,122.43 /3,664.13/ A10/ 4,678.43/ 5,217.90/ 4,633.85/ 5,725.35 ; proc print; title 'Use @ - read ID, the, use DO UNTIL to read multiple expenses for EXPENSE variable'; run;

  24. Exercise Open program c21_mulObs_SingRec(ID+VaryNumRep) Run each program and learn how to read multiple observations from single record with the same number of repeats using Trailing @ for situation when # of repeats are different.

More Related