
SAS: ARRAY PROCESSING

SAS: ARRAY PROCESSING. Jordan Elm. INTRODUCTION. Most mathematical and computer languages have some notation for repetitive data, e.g., a matrix, a vector, a dimension, or a table. In a SAS data step, this structure is called an array: a group of variables defined in a data step.


Presentation Transcript


  1. SAS: ARRAY PROCESSING Jordan Elm

  2. INTRODUCTION
     • Most mathematical and computer languages have some notation for repetitive data, e.g., a matrix, a vector, a dimension, or a table.
     • In a SAS data step, this structure is called an array: a group of variables defined in a data step.
     • Array elements don’t need to be contiguous, the same length, or even related at all.
     • All elements of an array must be of the same type: all character or all numeric.

  3. Why do we need SAS arrays?
     • Use arrays to help read and analyze repetitive data with a minimum of coding.
     • An array and a loop can make the program smaller.
     • Examples of use:
         - Recoding variables (e.g., missing values coded as -999)
         - Applying the same computation to many variables simultaneously (e.g., Fahrenheit to Celsius)
         - Computing new variables (e.g., continuous to binary)
         - Reshaping data (wide to long / long to wide)

  4. Example: Applying the same computation to many variables simultaneously
     • For each record (row) there are 24 variables (temp1-temp24) holding the temperature for each hour of the day.
     • The temperatures are in Fahrenheit and need to be converted to Celsius.
       data;
         input ...;
         celsius_temp1 = 5/9 * (temp1 - 32);
         celsius_temp2 = 5/9 * (temp2 - 32);
         ...
         celsius_temp24 = 5/9 * (temp24 - 32);
       run;

  5. Define arrays
     • Define arrays and use a loop:
       data;
         input ...;
         array temperature_array {24} temp1-temp24;
         array celsius_array {24} celsius_temp1-celsius_temp24;
         do i = 1 to 24;
           celsius_array{i} = 5/9 * (temperature_array{i} - 32);
         end;
       run;
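The array-and-loop pattern above can be tried end to end on a small scale. The following is a minimal sketch, assuming a hypothetical data set named temps with only three hourly readings (temp1-temp3) read from inline data; the names and values are illustrative and not part of the original slides.

```sas
/* Hypothetical example: three hourly readings instead of 24 */
data temps;
    input temp1-temp3;                         /* Fahrenheit readings */
    array temperature_array {3} temp1-temp3;
    array celsius_array {3} celsius_temp1-celsius_temp3;
    do i = 1 to dim(temperature_array);
        /* SAS has no implicit multiplication, so the * is required */
        celsius_array{i} = 5/9 * (temperature_array{i} - 32);
    end;
    drop i;                                    /* the loop index is not needed in the output */
    datalines;
32 212 50
;
run;

proc print data=temps;
run;
```

Dropping the index variable i keeps it out of the output data set; otherwise it is written along with the temperatures.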

  6. Recoding Variables
     • Missing values are coded as -999; recode them to the SAS missing value (.):
       data ...;
         set ...;
         array inc[12] faminc1-faminc12;
         do i = 1 to 12;
           if inc[i] = -999 then inc[i] = .;
         end;

  7. Of Note
     • While TEMP1 is equivalent to the first element, TEMP2 to the second, etc., the variables do not need to be named consecutively.
     • The array would work just as well with non-consecutive variable names:
       array sample_array {5} x a i r d;
     • In this example, the variable x is equivalent to the first element, a to the second, etc.

  8. BASIC ARRAY CONCEPTS
     • SAS arrays are another way to temporarily group and refer to SAS variables.
     • A SAS array is not a new data structure, the array name is not a variable, and arrays do not define additional variables.
     • Rather, a SAS array provides a different name to reference a group of variables.

  9. BASIC ARRAY CONCEPTS
     • The ARRAY statement defines variables to be processed as a group.
     • The variables referenced by the array are called elements.
     • Once an array is defined, the array name and an index reference the elements of the array.
     • Since similar processing is generally completed on the array elements, references to the array are usually found within DO groups.

  10. ARRAY STATEMENT

  11. ARRAY STATEMENT
     • The statement used to define an array is the ARRAY statement:
       array array-name {n} <$> <length> array-elements <(initial-values)>;
     • array-name: any valid SAS name
     • n: number of elements within the array
     • $: indicates that the elements within the array are character type variables
     • length: a common length for the array elements
     • array-elements: list of SAS variables to be part of the array
     • initial-values: provides the initial values for each of the array elements

  12. BASIC CONCEPTS (cont)
     • The ARRAY statement is a compile-time statement within the data step.
     • Array references cannot be used in compile-time statements such as DROP or KEEP.
     • An array must be defined within the data step before it is referenced, or an error will occur.
     • Defining an array within one data step and referencing it within another data step will cause errors.
     • An array must be defined within every data step where it will be referenced.

  13. Array Statements: Special Variables
     • When all numeric or all character variables in the data set are to be elements of the array, there is no need to list the individual variables.
     • _NUMERIC_: all the numeric variables will be used as elements
     • _CHARACTER_: all the character variables will be used as elements
     • _ALL_: all variables in the data set will be used as elements, provided the variables are all the same type
       array sample_array {5} _ALL_;

  14. Number of Elements
     • n is the array subscript in the array definition; it gives the number of elements within the array.
     • In the ARRAY statement, the subscript is a numeric constant, a range of bounds (lower:upper), or an asterisk (*).
     • When an element is referenced, the subscript may also be a variable whose value is a number, or a numeric SAS expression.
     • The subscript must be enclosed within braces {}, square brackets [], or parentheses ().
       array sample_array {5} _ALL_;

  15. Number of Elements
     • When the asterisk is used, it is not necessary to know how many elements are contained within the array. SAS will count the number of elements for you.
     • A typical use of the asterisk is when one of the special variables defines the elements:
       array allnums {*} _numeric_;
     • When it is necessary to know how many elements are in the array, the DIM function returns the count of elements:
       do i = 1 to dim(allnums);
         allnums{i} = round(allnums{i}, .1);
       end;

  16. ARRAY REFERENCES
     • When an array is defined with the ARRAY statement, SAS creates an array reference of the following form:
       array-name{n}
     • The value of n is the element’s position within the array.
     • For example, in the temperature array the temperature for 1:00 PM is in the variable TEMP13.
     • Therefore, the array reference will be:
       temperature_array{13}

  17. ARRAY REFERENCES
     • The variable name and the array reference are interchangeable.
       Variable Name    Array Reference
       temp1            temperature_array{1}
       temp2            temperature_array{2}
       temp3            temperature_array{3}
     • An array reference may be used within the data step in almost any place other SAS variables may be used, including as an argument to many SAS functions.

  18. USING ARRAY INDEXES
     • The array index is the range of array elements.
     • In SAS, subscripts are 1-based by default, whereas arrays in other languages may be 0-based.
     • When only the number of elements is given as the subscript, that number is the upper bound and the lower bound defaults to 1.
     • There may be scenarios where we want the index to begin at a lower bound other than 1.
     • Say we only want the daytime temperatures, temperatures 6 through 18:
       array temperature_array {6:18} temp6-temp18;
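With bounds other than 1, a loop can take its limits from the array itself instead of hard-coding them. The following is a minimal sketch using the LBOUND and HBOUND functions; the input data set hourly, the output data set daytime, and the celsius_temp6-celsius_temp18 variables are hypothetical names used for illustration only.

```sas
data daytime;
    set hourly;   /* hypothetical input data set containing temp6-temp18 */
    array temperature_array {6:18} temp6-temp18;
    array celsius_array {6:18} celsius_temp6-celsius_temp18;
    /* LBOUND returns 6 and HBOUND returns 18 for these arrays */
    do hour = lbound(temperature_array) to hbound(temperature_array);
        celsius_array{hour} = 5/9 * (temperature_array{hour} - 32);
    end;
    drop hour;
run;
```

Because the subscript now matches the hour, temperature_array{13} still refers to TEMP13, the 1:00 PM reading.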

  19. Identify patterns across variables using arrays
     • The objective is to count the number of missing values in each row.
     • Create a new variable named nmiss, which will hold the number of missing values across the variables faminc1-faminc12:
       data mspatterns;
         set recode_missing;
         array inc(12) faminc1-faminc12; /* existing vars */
         nmiss = 0;
         do i = 1 to 12;
           if inc(i) = . then nmiss = nmiss + 1;
         end;
       run;
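The same count can also be obtained without an explicit loop, since the built-in NMISS function accepts an array through the "of" syntax. A minimal sketch, assuming the same recode_missing data set and faminc1-faminc12 variables; the output data set name mspatterns_fn and the variable name n_missing are hypothetical.

```sas
data mspatterns_fn;
    set recode_missing;
    array inc {12} faminc1-faminc12;   /* existing vars */
    /* NMISS counts missing numeric arguments; "of inc{*}" passes every element */
    n_missing = nmiss(of inc{*});
run;
```

The loop version remains useful when the condition is more specific than "missing", for example counting values below a threshold.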

  20. Reshaping DATA from Wide to Long
       data wide;
         input famid faminc96 faminc97 faminc98;
         cards;
       1 40000 40500 41000
       2 45000 45400 45800
       3 75000 76000 77000
       ;
       run;

  21. Reshaping wide to long: creating only one variable using arrays
       data long;
         set wide;
         array Afaminc(96:98) faminc96-faminc98;
         do year = 96 to 98;
           faminc = Afaminc[year];
           output;
         end;
         drop faminc96-faminc98;
       run;

  22. LONG DATA
       Obs  famid  year  faminc
         1      1    96   40000
         2      1    97   40500
         3      1    98   41000
         4      2    96   45000
         5      2    97   45400
         6      2    98   45800
         7      3    96   75000
         8      3    97   76000
         9      3    98   77000
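The reverse direction, long to wide, can be handled with the same array plus BY-group processing. The following is a sketch under the assumption that the long data set shown above is already sorted by famid; the output data set name wide_again is hypothetical.

```sas
data wide_again;
    set long;
    by famid;                                   /* long must be sorted by famid */
    array Afaminc {96:98} faminc96-faminc98;
    retain faminc96-faminc98;                   /* keep values across the rows of one family */
    if first.famid then do;
        do i = 96 to 98;
            Afaminc{i} = .;                     /* reset at the start of each family */
        end;
    end;
    Afaminc{year} = faminc;                     /* year (96-98) selects the target variable */
    if last.famid then output;                  /* write one row per family */
    drop year faminc i;
run;
```

RETAIN is what lets the values collected from earlier rows survive until the last row of the family is written.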

  23. TEMPORARY ARRAYS
     • A temporary array is an array that only exists for the duration of the data step where it is defined.
     • Useful for storing constant values (for calculations).
     • There are no corresponding variables to identify the array elements.
     • The elements are defined by the keyword _TEMPORARY_.

  24. TEMPORARY ARRAYS
       array rate {6} _temporary_ (0.05 0.08 0.12 0.20 0.27 0.35);
     • The asterisk subscript cannot be used when defining a temporary array; explicit array bounds must be specified.

  25. TEMPORARY ARRAY
     • For example: when a customer is delinquent in payment of their account balance, a penalty is applied. The amount of the penalty depends upon the number of months that the account is delinquent.
     • Without array processing:
       if month_delinquent eq 1 then balance = balance + (balance * 0.05);
       else if month_delinquent eq 2 then balance = balance + (balance * 0.08);
       else if month_delinquent eq 3 then balance = balance + (balance * 0.12);
       else if month_delinquent eq 4 then balance = balance + (balance * 0.20);
       else if month_delinquent eq 5 then balance = balance + (balance * 0.27);
       else if month_delinquent eq 6 then balance = balance + (balance * 0.35);

  26. TEMPORARY ARRAY
     • With a temporary array, the code is simpler and performance improves:
       data ...;
         set ...;
         array rate {6} _temporary_ (0.05 0.08 0.12 0.20 0.27 0.35);
         if month_delinquent ge 1 and month_delinquent le 6 then
           balance = balance + (balance * rate{month_delinquent});

  27. TEMPORARY ARRAY
     • Setting initial values is not required on the ARRAY statement. The values within a temporary array may be set in another manner within the data step:
       array rateb {6} _temporary_;
       do i = 1 to 6;
         rateb{i} = i * 0.5;
       end;

  28. Implicit Arrays
     • An implicit array is defined without a subscript and processed with DO OVER:
       array clin q1b q2b q3b q4b q5b q6b q7b q8b q9b q10b q11b q12b q13b;
       do over clin;
         if clin in (2, .) then clin = 0;
         else if clin > 2 then clin = 1;
       end;
     • A complete data step:
       data baselinedemo;
         set baselinedemo;
         array zero male Indian Asian black white more hisp rhand;
         do over zero;
           if zero = . then zero = 0;
         end;
       run;
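For comparison, the second implicit-array example can be rewritten with an explicit array and an indexed loop, which is the form used in the rest of these slides. This is a sketch of one equivalent formulation, not the author's code; the index variable i is introduced here for illustration.

```sas
data baselinedemo;
    set baselinedemo;
    array zero {*} male Indian Asian black white more hisp rhand;
    do i = 1 to dim(zero);          /* DIM supplies the element count */
        if zero{i} = . then zero{i} = 0;
    end;
    drop i;
run;
```

The explicit form makes the element position available inside the loop, which DO OVER does not expose directly.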

  29. WHEN TO USE ARRAYS
     • It makes sense to use arrays when there are repetitive VARIABLES that are related WITHIN A SINGLE DATA STEP and the programmer needs to iterate through most of them.
     • The combination of arrays and DO loops in the data step lends power to programming.
     • The variables in the array do not need to be related or contiguous.

  30. COMMON ERRORS AND MISUNDERSTANDINGS
     • INVALID INDEX RANGE
     • FUNCTION NAME AS AN ARRAY NAME
     • ARRAY REFERENCED IN MULTIPLE DATA STEPS, BUT DEFINED IN ONLY ONE

  31. LIMITATIONS OF ARRAY STATEMENTS
     • Arrays can only be used within a DATA step (not a PROC). If you want to perform the same action across several DATA steps or PROCs, a macro approach may be easier.
     • SAS array references cannot be used:
         - as input to a macro parameter
         - in a FORMAT, LABEL, DROP, KEEP, LENGTH or OUTPUT statement
     • SAS arrays refer to variables or constants (not data sets or the value of a variable).

  32. USE MACROS TO CREATE AN “ARRAY”

  33. Using MACROS To Define ARRAYS Within a Single Datastep
     • Array version:
       data;
         input ...;
         array temperature_array {24} temp1-temp24;
         array celsius_array {24} celsius_temp1-celsius_temp24;
         do i = 1 to 24;
           celsius_array{i} = 5/9 * (temperature_array{i} - 32);
         end;
       run;
     • Macro version:
       %macro w;
         data;
           input ...;
           %do i = 1 %to 24;
             celsius&i = 5/9 * (temp&i - 32);
           %end;
         run;
       %mend;
       %w;

  34. CREATE A MACRO-DEFINED "ARRAY"
     • Using:
       %LET x = array elements (variables)
       %DO …
       %LET y = %SCAN
     • More flexible:
         - Switching from DATA step to PROC
         - Using previously defined macros

  35. %LET and %SCAN
     • %LET creates a macro variable and assigns it a value.
     • %LET can be used inside or outside of a macro program. The value can be a string of words.
       Syntax: %LET macro-variable = <value>;
     • %SCAN searches for a word that is specified by its position in a string.

  36. MACRO Arrays
     • Programs are reusable and easier to understand.
     • %byid is a macro I wrote and use a lot.
       %let x = q1b q2b q3b q4b q5b q6b q7b q8b q9b q10b q11b q12b;
       %do i = 1 %to 12;
         %let y = %scan(&x, &i);
         %byid(&y)
       %end;
     • Don’t forget to embed this code within %macro and %mend, because an iterative %DO loop is not allowed in open code.
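The loop above can be wrapped into a complete macro. Since the definition of %byid is not shown, the sketch below substitutes a PROC MEANS step purely for illustration. The macro name run_each, the data set name clinic, and the shortened variable list are all hypothetical, and the word count is taken from the list with COUNTW rather than hard-coded.

```sas
/* clinic is a hypothetical data set containing q1b, q2b and q3b */
%macro run_each(varlist);
    %local i var;
    /* COUNTW returns the number of words in the list */
    %do i = 1 %to %sysfunc(countw(&varlist));
        %let var = %scan(&varlist, &i);
        /* one PROC step is generated per variable in the list */
        proc means data=clinic n mean;
            var &var;
        run;
    %end;
%mend run_each;

%run_each(q1b q2b q3b)
```

Passing the list as a parameter means the same macro can be reused with a different set of variables without editing the loop bounds.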

  37. ONE DIMENSION ARRAYS

  38. MULTI-DIMENSION ARRAYS
     • Multi-dimensional arrays may be created in two or more dimensions.
     • Conceptually, a two-dimensional array is a table with rows and columns (although to SAS, it is still a group of variables).
     • Within the Program Data Vector the variable structure may be visualized as:

  39. MULTI-DIMENSION ARRAYS
     • The ARRAY statement to define this two-dimensional array will be:
       array sale_array {3, 12} sales1-sales12 exp1-exp12 comm1-comm12;
     • The subscript gives the number of rows (1st dimension) and the number of columns (2nd dimension).
     • A reference must give the element number for both dimensions.
     • The reference to the sixth element of the expense group in the sales array is sale_array{2,6}, which refers to EXP6.
     • Three or more dimensions can be defined as well.
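A two-dimensional array is typically processed with nested DO loops, one per dimension. The following is a minimal sketch that totals each of the three groups; the input data set monthly, the output data set totals, and the variables total_sales, total_exp and total_comm are hypothetical names used for illustration.

```sas
data totals;
    set monthly;   /* hypothetical data set with sales1-sales12, exp1-exp12, comm1-comm12 */
    array sale_array {3, 12} sales1-sales12 exp1-exp12 comm1-comm12;
    array group_total {3} total_sales total_exp total_comm;
    do row = 1 to dim(sale_array);            /* DIM returns the size of the 1st dimension */
        group_total{row} = 0;
        do col = 1 to dim2(sale_array);       /* DIM2 returns the size of the 2nd dimension */
            /* SUM ignores missing values, unlike the + operator */
            group_total{row} = sum(group_total{row}, sale_array{row, col});
        end;
    end;
    drop row col;
run;
```

SAS fills the array row by row, so row 1 holds sales1-sales12, row 2 holds exp1-exp12, and row 3 holds comm1-comm12, matching the reference sale_array{2,6} for EXP6.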

  40. References
     • Steve First and Teresa Schudrowitz (Systems Seminar Consultants, Inc., Madison, WI). Arrays Made Easy: An Introduction to Arrays and Array Processing. Paper 242-30, SUGI 30. http://www2.sas.com/proceedings/sugi30/242-30.pdf
     • Introduction to SAS. UCLA: Academic Technology Services, Statistical Consulting Group. http://www.ats.ucla.edu/stat/sas/notes2/
     • http://www.ats.ucla.edu/stat/sas/seminars/SAS_arrays/default_new.htm
