1 / 96

Chapter 10

Chapter 10. Summarizing Data. Section 10.1. Creating an Accumulating Total Variable. Objectives. Understand how the SAS System initializes the value of a variable in the PDV. Prevent reinitialization of a variable in the PDV. Create an accumulating variable. SAS Vocabulary. RETAIN SUM.

Download Presentation

Chapter 10

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 10 Summarizing Data

  2. Section 10.1 Creating an Accumulating Total Variable

  3. Objectives • Understand how the SAS System initializes the value of a variable in the PDV. • Prevent reinitialization of a variable in the PDV. • Create an accumulating variable.

  4. SAS Vocabulary • RETAIN • SUM

  5. Creating an Accumulating Variable • The SAS data set prog2.daysalescontains daily sales data for a retail store. There is one observation for each day in April showing the date (SaleDate) and the total receipts for that day (SaleAmt). SaleDate SaleAmt 01APR2001 498.49 02APR2001 946.50 03APR2001 994.97 04APR2001 564.59 05APR2001 783.01 06APR2001 228.82 07APR2001 930.57 08APR2001 211.47 09APR2001 156.23 10APR2001 117.69 11APR2001 374.73 12APR2001 252.73

  6. Sale SaleDate Amt Mth2Dte 01APR2001 498.49 498.49 02APR2001 946.50 1444.99 03APR2001 994.97 2439.96 04APR2001 564.59 3004.55 05APR2001 783.01 3787.56 + Creating an Accumulating Variable • The store manager wants to see a running total of sales for the month as of each day. • Partial Output

  7. Creating Mth2Dte • By default, variables created with an assignment statement are initialized to missing at the top of the DATA step. • An accumulating variable must retain its value from one observation to the next. Mth2Dte=Mth2Dte+SaleAmt;

  8. The RETAIN Statement • General form of the RETAIN statement: • The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. • Previous values of retained variables are available for processing across iterations of the DATA step. RETAINvariable-name <initial-value> …;

  9. The RETAIN Statement • The RETAIN statement has the following characteristics: • retains the value of the variable in the PDV across iterations of the DATA step • initializes the retained variable to missing before the first execution of the DATA step if an initial value is not specified • is a compile-time-only statement • has no effect on variables that are read with SET, MERGE, or UPDATE statements; values read from SAS data sets are automatically retained. • a variable referenced in the RETAIN statement appears in the output SAS data set only if it is given an initial value or referenced elsewhere in the DATA step

  10. Retain Mth2Dte and Set an Initial Value • If you do not supply an initial value, all the values of Mth2Dte will be missing. retain Mth2Dte 0;

  11. Creating an Accumulating Variable data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run;

  12. Compile R SALEDATE SALEAMT MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 ...

  13. Execute R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 SALEDATE SALEAMT . . 0

  14. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 SALEDATE SALEAMT . . 15066 498.49 0 ...

  15. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 0 + 498.49 SALEDATE SALEAMT . . 0 15066 498.49 498.49 ...

  16. R SALEDATE SALEAMT MTH2DTE Write out observation to data set mnthtot. data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Automatic Output 15066 498.49 498.49 ...

  17. R SALEDATE SALEAMT MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Automatic Return 15066 498.49 498.49 ...

  18. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 SALEDATE SALEAMT 15066 498.49 498.49 ...

  19. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 SALEDATE SALEAMT 15066 498.49 15067 946.50 498.49 ...

  20. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 498.49 + 946.50 SALEDATE SALEAMT 15066 498.49 15067 946.50 1444.99 498.49 ...

  21. R SALEDATE SALEAMT MTH2DTE Write out observation to data set mnthtot. data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Automatic Output 15067 946.50 1444.99 ...

  22. R SALEDATE SALEAMT MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Automatic Return 15067 946.50 1444.99 ...

  23. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 SALEDATE SALEAMT 15067 946.50 1444.99 ...

  24. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 SALEDATE SALEAMT 15067 946.50 15068 994.97 1444.99 ...

  25. R MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 1444.99 + 994.97 SALEDATE SALEAMT 15067 946.50 15068 994.97 2439.96 1444.99 ...

  26. R SALEDATE SALEAMT MTH2DTE Write out observation to data set mnthtot. data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Automatic Output 15068 994.97 2439.96 ...

  27. R SALEDATE SALEAMT MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Automatic Return 15068 994.97 2439.96 ...

  28. R SALEDATE SALEAMT MTH2DTE data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt; run; SaleDate SaleAmt 15066 498.49 15067 946.50 15068 994.97 15069 564.59 15070 783.01 Continue processinguntil end of SAS data set 15068 994.97 2439.96

  29. Creating an Accumulating Variable proc print data=mnthtot noobs; format SaleDate date9.; run; Partial PROC PRINT Output Sale SaleDate Amt Mth2Dte 01APR2001 498.49 498.49 02APR2001 946.50 1444.99 03APR2001 994.97 2439.96 04APR2001 564.59 3004.55 05APR2001 783.01 3787.56

  30. Exercise • This exercise reinforces the concepts discussed previously.

  31. Exercises Using the TotalPurchases data set, create a data set called Budget and a listing report: Tot Budgeted ItemNo ItemName Needed Cost TotalSpent 636589 Mugs 12 oz 128 $826.88 $826.88 167643 Mugs 20 oz 24 $172.80 $999.68 891674 Wine Glass Fluted 100 $350.00 $1,349.68 197646 Wine Glass Clear 100 $350.00 $1,699.68 324346 Cutting Board Maple 15 $155.25 $1,854.93 354664 Carving Knife 10 inch 30 $1,111.80 $2,966.73 994643 Paring Knife 30 $599.25 $3,565.98 764646 Drink Carafe 48 $1,233.00 $4,798.98 546546 Fluted Pitchers 36 $1,007.71 $5,806.69

  32. Exercises A Solution: data budget; set totalpurchases; retain TotalSpent 0; keep ItemNo ItemName TotNeeded BudgetedCost TotalSpent; TotalSpent =TotalSpent + BudgetedCost; format TotalSpent dollar12.2; run; procprintdata=budget noobs; run;

  33. Accumulating Totals: Missing Values • What if any of the values for SaleAmtare missing? data mnthtot; set prog2.daysales; retain Mth2Dte 0; Mth2dte=Mth2Dte+SaleAmt; run;

  34. Accumulating Totals: Missing Values Sale SaleDate Amt Mth2Dte 01APR2001 498.49 498.49 02APR2001 . . 03APR2001 994.97 . 04APR2001 564.59 . 05APR2001 783.01 . Subsequent values of Mth2Dte are missing. Missing value

  35. Accumulating Totals: Missing Values • The result of any mathematical operation involving a missing value results in a missing value. • With the RETAIN statement, one missing value for SaleAmt causes all subsequent values of Mth2Dte to be missing.

  36. Accumulating Totals: Missing Values • One method to solve this problem is to use the SUM function in the assignment statement. • Another method is to use a SUMstatement. Mth2Dte=sum(Mth2Dte,SaleAmt);

  37. The SUM Statement • When you create an accumulating variable, an alternative to the RETAIN statement is the SUMstatement. • General form of the SUM statement: • Note: There is no equals sign in the SUM statement. variable+ expression;

  38. The SUM Statement • The SUM statement • creates the variable on the left side of the plus sign if it does not already exist • initializes the variable to zero before the first iteration of the DATA step • automatically retains the variable (you don’t need a RETAIN statement) • adds the value of the expression to the variable at execution • ignores missing values.

  39. Accumulating Totals: Missing Values data mnthtot2; set prog2.daysales2; Mth2Dte+SaleAmt; run; The RETAIN statement is not necessary in this example. The variable to be retained must be placed before the plus sign.

  40. Accumulating Totals: Missing Values proc print data=mnthtot2 noobs; format SaleDate date9.; run; Partial PROC PRINT Output SaleDate SaleAmt Mth2Dte 01APR2001 498.49 498.49 02APR2001 . 498.49 03APR2001 994.97 1493.46 04APR2001 564.59 2058.05 05APR2001 783.01 2841.06

  41. Retaining Subtraction • To subtract values, use a negative sign in front of the variable you want to subtract. data mnthtot2; set prog2.daysales2; Mth2Dte+(-SaleAmt); run;

  42. Retaining Subtraction • To perform calculations other than addition and subtraction, use a RETAIN statement and an assignment statement. data mnthtot2; retain Discount 0; set prog2.daysales; Discount=SaleAmt*.25; run;

  43. RETAIN versus SUM • When do you use the use RETAIN and when do you use SUM? • Use a SUM statement if • there are missing values in the data • you are using addition or subtraction. • Use the RETAIN statement with an assignment statement • if there are missing values in the data, use a function • you are using other calculations such as multiplication.

  44. Exercise • This exercise reinforces the concepts discussed previously.

  45. Exercises Using the TotalPurchases data set, create a data set called Purchased and a listing report: Tot Budgeted NumItems ItemNo ItemName Purchased Cost TotalSpent Purchased 636589 Mugs 12 oz 64 $826.88 $826.88 64 167643 Mugs 20 oz 12 $172.80 $999.68 76 891674 Wine Glass Fluted 85 $350.00 $1,349.68 161 197646 Wine Glass Clear . $350.00 $1,699.68 161 324346 Cutting Board Maple 15 $155.25 $1,854.93 176 354664 Carving Knife 10 inch 15 $1,111.80 $2,966.73 191 994643 Paring Knife 4 $599.25 $3,565.98 195 764646 Drink Carafe 48 $1,233.00 $4,798.98 243 546546 Fluted Pitchers 40 $1,007.71 $5,806.69 283 Note that NumItemsPurchased is the sum of TotPurchased.

  46. Exercises A Solution: data purchased; set totalpurchases; keep ItemNo ItemName TotPurchased BudgetedCost TotalSpent NumItemsPurchased; TotalSpent + BudgetedCost; NumItemsPurchased + TotPurchased; format TotalSpent dollar12.2; run; procprintdata=purchased noobs; run;

  47. Exercise – Section 10.1 • This exercise reinforces the concepts discussed previously.

  48. Section 10.2 Accumulating Totals for a Group of Data

  49. Objectives • Define First. and Last. processing. • Calculate an accumulating total for groups of data. • Use a subsetting IF statement to output selected observations.

  50. SAS Vocabulary • PROC SORT • LAST.BYVariable • FIRST.BYVariable

More Related