1 / 24

Time Series Data Processes

Time Series Data Processes. by Tai Yu April 15, 2013. WHAT ARE YOU DOING???. 61 %macro get_data(mn_yr0 ,mn_yr1 ,mn_yr2 ,mn_yr3 ,mn_yr4 ,mn_yr5, mn_yr6); 62 63 data asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4.

danyl
Download Presentation

Time Series Data Processes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Time Series Data Processes by Tai Yu April 15, 2013

  2. WHAT ARE YOU DOING??? • 61%macro get_data(mn_yr0 ,mn_yr1 ,mn_yr2 ,mn_yr3 ,mn_yr4 ,mn_yr5, mn_yr6); • 62 • 63data asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. • asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1.; • 64 set wfpf_delinq_data; • 65 • 66 if daysdelq<1 THEN DPD=0; • else if daysdelq>180 THEN DPD=180; else DPD=daysdelq; • 67 • 68 if "28&mn_yr1.:00:00:00"dt<=ASOF_DT<"01&mn_yr0.:00:00:00"dt • then do;dpd_1=dpd; output asof_&mn_yr1.; end; • 69 else if "28&mn_yr2.:00:00:00"dt<=ASOF_DT<"01&mn_yr1.:00:00:00"dt • then do; dpd_2=dpd; output asof_&mn_yr2.; end; • 70 else if "28&mn_yr3.:00:00:00"dt<=ASOF_DT<"01&mn_yr2.:00:00:00"dt • then do; dpd_3=dpd; outputasof_&mn_yr3.; end; • 71 else if "28&mn_yr4.:00:00:00"dt<=ASOF_DT<"01&mn_yr3.:00:00:00"dt • then do; dpd_4=dpd; output asof_&mn_yr4.; end; • 72 else if "28&mn_yr5.:00:00:00"dt<=ASOF_DT<"01&mn_yr4:00:00:00"dt • then do; dpd_5=dpd; output asof_&mn_yr5.; end; • 73 else if "28&mn_yr6.:00:00:00"dt<=ASOF_DT<"01&mn_yr5.:00:00:00"dt • then do; dpd_6=dpd; output asof_&mn_yr6.; end; • 74 • 75run;

  3. WHAT ARE YOU DOING??? • 84data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_6; run; • 85data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_6; run; • 86data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_6; run; • 87data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_6; run; • 88data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6 ; run; • 89data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5 ; run; • 90 • 91data Cohort_12mn_dpd_&mn_yr1.; • 92 merge _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. • _asof_&mn_yr2. _asof_&mn_yr1. (in=a); • 93 by acct_id; • 94 if a; • 95 run; • 96 • 97 %mend; • 98 • 99 get_data(Sep2006, • Aug2006,Jul2006,Jun2006,May2006,Apr2006, Mar2006,);

  4. WHAT ARE YOU DOING??? • 61%macro get_data(mn_yr0 ,mn_yr1 ,mn_yr2 ,mn_yr3 ,mn_yr4 ,mn_yr5 , mn_yr6 , • 62 mn_yr6, mn_yr7 ,mn_yr8 ,mn_yr9 ,mn_yr10,mn_yr11 , mn_yr12); • 63 data asof_&mn_yr12. asof_&mn_yr11. asof_&mn_yr10. asof_&mn_yr9. asof_&mn_yr8. asof_&mn_yr7. • 64 asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1. ; • 65 set wfpf_delinq_data; • 66 • 67 if daysdelq<1 THEN DPD=0; else if daysdelq>180 THEN DPD=180; else DPD=daysdelq; • 68 • 69 if "28&mn_yr1.:00:00:00"dt<=ASOF_DT<"01& asof_&mn_yr0. :00:00:00"dt • then do;dpd_1=dpd; output asof_&mn_yr1.; end; • 70 else if "28&mn_yr2.:00:00:00"dt<=ASOF_DT<"01&mn_yr1.:00:00:00"dt • then do; dpd_2=dpd; output asof_&mn_yr2.; end; • 71 else if "28&mn_yr3.:00:00:00"dt<=ASOF_DT<"01&mn_yr2.:00:00:00"dt • then do; dpd_3=dpd; outputasof_&mn_yr3.; end; • 72 else if "28&mn_yr4.:00:00:00"dt<=ASOF_DT<"01&mn_yr3.:00:00:00"dt • then do; dpd_4=dpd; output asof_&mn_yr4.; end; • 73 else if "28&mn_yr5.:00:00:00"dt<=ASOF_DT<"01&mn_yr4:00:00:00"dt • then do; dpd_5=dpd; output asof_&mn_yr5.; end; • 74 else if "28&mn_yr6.:00:00:00"dt<=ASOF_DT<"01&mn_yr5.:00:00:00"dt • then do; dpd_6=dpd; output asof_&mn_yr6.; end; • 75 else if "28&mn_yr7.:00:00:00"dt<=ASOF_DT<"01&mn_yr6.:00:00:00"dt • then do; dpd_7=dpd; output asof_&mn_yr7.; end; • 76 else if "28&mn_yr8.:00:00:00"dt<=ASOF_DT<"01&mn_yr7.:00:00:00"dt • then do; dpd_8=dpd; output asof_&mn_yr8.; end; • 77 else if "28&mn_yr9.:00:00:00"dt<=ASOF_DT<"01&mn_yr8.:00:00:00"dt • then do; dpd_9=dpd; output asof_&mn_yr9.; end; • 78 else if "28&mn_yr10.:00:00:00"dt<=ASOF_DT<"01&mn_yr9.:00:00:00"dt • then do; dpd_10=dpd; output asof_&mn_yr10.; end; • 79 else if "28&mn_yr11.:00:00:00"dt<=ASOF_DT<"01&mn_yr10.:00:00:00"dt • then do;dpd_11=dpd; output asof_&mn_yr11.;end; • 80 else if "28&mn_yr12.:00:00:00"dt<=ASOF_DT<"01&mn_yr11.:00:00:00"dt • then do;dpd_12=dpd; output asof_&mn_yr12.; end; • 81run;

  5. WHAT ARE YOU DOING??? • 82 • 83data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_12; run; • 84data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_12; run; • 85data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_12; run; • 86data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_12; run; • 87data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6--dpd_12; run; • 88 data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5 dpd_7--dpd_12; run; • 89data _asof_&mn_yr7.; set asof_&mn_yr7.; drop dpd_1--dpd_6 dpd_8--dpd_12; run; • 90data _asof_&mn_yr8.; set asof_&mn_yr8.; drop dpd_1--dpd_7 dpd_9--dpd_12; run; • 91data _asof_&mn_yr9.; set asof_&mn_yr9.; drop dpd_1--dpd_8 dpd_10--dpd_12; run; • 92data _asof_&mn_yr10.; set asof_&mn_yr10.; drop dpd_1--dpd_9 dpd_11--dpd_12; run; • 93data _asof_&mn_yr11.; set asof_&mn_yr11.; drop dpd_1--dpd_10 dpd_12; run; • 94data _asof_&mn_yr12.; set asof_&mn_yr12.; drop dpd_1--dpd_11 ; run; • 95 • 96data Cohort_12mn_dpd_&mn_yr1.; • 97merge _asof_&mn_yr12. _asof_&mn_yr11. _asof_&mn_yr10. _asof_&mn_yr9. _asof_&mn_yr8. • 98 _asof_&mn_yr7. _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. _asof_&mn_yr2. • 99 _asof_&mn_yr1.(in=a); • 100by acct_id; • 101if a; • 102 run; • 103 • 104 %mend; • 105 • 106 get_data(Sep2006, • Aug2006,Jul2006,Jun2006,May2006,Apr2006, Mar2006, • Feb2006,Jan2006,Dec2005,Nov2005,Oct2005,Sep2005);

  6. What is Time Series Data? • Definition of Time Series: • A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. (by Australian Bureau of Statistics) • An ordered sequence of values of a variable at equally spaced time intervals. (by Engineering Statistics Handbook)

  7. What is Time Series Data? • For example, the monthly delinquent status of a customer over 12-month period

  8. Stages of Time Series Analysis • Analyze data to obtain an understanding of the underlying drivers that produced the collected data. 2. Develop model(s) to forecast possible outcomes through the collected data. 3. Compare monitoring results with predicted outcomes to make appropriate control process modification(s).

  9. Applications of Time Series Data Stock Market Inventory Sales Workload

  10. A Typical Time Series Data Process Transpose 12 monthly delinquent status observations of account to a single account observation with 12-month delinquent status

  11. Time Series Data by SAS Procedure • 001 %macro DLQ_12_Month(perf_obs_date); • 002 • 003 data perf_12_month; • 004 set acct_basel_dpd; • 005 • 006 format perf_obs_dt date9.; • 007 • 008 perf_obs_dt = intnx("MONTH","&perf_obs_date."d,0,'END'); • 009 • 010 dlq_status_month = intck("MONTH", datepart(asof_dt)), perf_obs_dt ) + 1; • 011 • 012 if 1 <= dlq_status_month <= 12; • 013 run; • 014 • 015 proc sort data = perf_12_month; • 016 by acct_id perf_obs_dt dlq_status_month; • 017 run; • 018

  12. Time Series Data by SAS Procedure After executing line 003 to line 018, the data set is now with two additional variables: Performance Observed Date (perf_obs_dt) Delinquent Status Month (dlq_status_month)

  13. Time Series Data by SAS Procedure • 018 • 019 proc transpose data = perf_12_month • 020 out = Cohort_12mn_dpd_&perf_obs_date • 021 (drop = _name_ where=(dpd1 ^=.)) • 022 prefix = dpd • 023 ; • 024 by acct_id perf_obs_dt; • 025 • 026 id dlq_status_month; • 027 • 028 var basel_dpd; • 029 run; • 030 • 031 %mend; • 032 • 033 %DLQ_12_Month(01AUG2006); PROC TRANSPOSE: • Transposes the variable basel_dpd by acct_id perf_obs_dt • Creates new variables dpd_1 to dpd_12 by PREFIX and ID options

  14. Time Series Data by SAS Procedure After executing line 019 to line 029, the new data set is a single account observation with 12-month delinquent status

  15. Time Series Data by SAS Data Step • 019 data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id perf_obs_dt dpd1 - dpd12); • 020 set perf_12_month; • 021 by acct_id perf_obs_dt dlq_status_month; • 022 array dpd[12] dpd1 - dpd12; • 023 retain dpd1 - dpd12 j 0; • 024 • 025 if first.acct_id and first.perf_obs_dt • 026 then do; • 027 do i = 1 to 12; • 028 dpd[i] = 0; • 029 end; • 030 j = 0; • 031 end; • 032 • 033 j = j + 1; • 034 • 035 dpd[j] = basel_dpd; • 036 • 037 if last.acct_id and last.perf_obs_dt ; • 038 run; • 039 %mend; • 040 • 041 %DLQ_12_Month(01AUG2006);

  16. Time Series Data by SAS Data Step SAS Data Step: • Declares ARRAY DPD to create new variable DPD_1 to DPD_12 • Declares RETAIN to pass the values of variable DPD_1 to DPD_12 from one observation to the next observation • Initiates the values of variable DPD_1 to DPD_12 to 0s when the first account id and the first observation date are detected and neutralizes index J • Assigns the values of variable DPD_1 to DPD_12 by index J • Outputs the values of variable DPD_1 to DPD_12 to new data set only when the last account id and the last observation date is detected

  17. Time Series Data by SAS Function • 019 data Cohort_12mn_dpd_&perf_obs_date • 020 (keep = acct_id perf_obs_dt dpd1 - dpd12); • 021 set perf_12_month; • 022 by acct_id perf_obs_dt dlq_status_month; • 023 array dpd[12] dpd1 - dpd12;; • 024 • 025 %do i = 1 %to 12; • 026 %let j = %eval(12 - &i); • 027 dpd[&i] = lag&j(basel_dpd); • 028 %end; • 029 • 030 if dlq_status_month = 12; • 031 run; • 032 • 033 %mend; • 034 • 035 %DLQ_12_Month(01AUG2006);

  18. Time Series Data by SAS Function • SAS LAG Function: • Stores a value in a queue and returns a value stored previously in that queue. • Each occurrence of a LAGn function in a program generates its own queue of values. • When an occurrence of LAGn is executed, the value at the top of its queue is removed and returned, the remaining values are shifted upwards, and the new value of the argument is placed at the bottom of the queue. observation of the prior execution. LAG11 LAG10 LAG9 LAG8 LAG7 LAG6 LAG5 LAG4 LAG3 LAG2 LAG1 LAG0

  19. Time Series Data by SAS Function • 019 data Cohort_12mn_dpd_&perf_obs_date • 020 (keep = acct_id perf_obs_dt dpd1 - dpd12); • 021 set perf_12_month; • 022 by acct_id perf_obs_dt dlq_status_month; • 023 array dpd[12] dpd1 - dpd12;; • 024 • 025 if dlq_status_month = 12; • 026 • 027 %do i = 1 %to 12; • 028 %let j = %eval(12 - &i); • 029 dpd[&i] = lag&j(basel_dpd); • 030 %end; • 031 • 032 * if dlq_status_month = 12; • 033 run; • 034 • 035 %mend; • 036 • 037 %DLQ_12_Month(01AUG2006);

  20. Time Series Data by SAS Function SAS LAG Function: • The queue for each occurrence of LAGn is initialized with n missing values. • Missing values are returned for the first n executions of each occurrence of LAGn, after which the lagged values of the argument begin to appear. • Stores values at the bottom of the queue and returns values from the top of the queue occurs only when the function is executed. An occurrence of the LAGn function that is executed conditionally will store and return values only from the observations for which the condition is satisfied.

  21. Time Series Data by SAS Function Special Case: When not all time series is populated. The sub-setting IF statement (if dlq_status_month = 12;) will never be satisfied. SAS returns no observation to the output dataset.

  22. Time Series Data by SAS Function • 019 proc sort data = perf_12_month; • 020 by acct_id load_dt descending dlq_status_month; • 021 run; • 022 • 023 data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id load_dt dpd1 - dpd12); • 024 set perf_12_month; • 025 by acct_id load_dt descending dlq_status_month; • 026 • 027 array dpd[12] dpd1 - dpd12;; • 028 • 029 %do i = 1 %to 12; • 030 %let j = %eval(&i - 1); • 031 dpd[&i] = lag&j(basel_dpd); • 032 %end; • 033 • 034 if dlq_status_month = 1; • 035 run; • 036 %mend; • 037 • 038 %DLQ_12_Month(01AUG2006);

  23. Time Series Data by SAS Function Special Case: When not all time series is populated. By • Sorting the variable dlq_status_month in descending order • Conditioning the sub-setting IF statement is “True” when the beginning of the time series date period (if dlq_status_month = 1) is reached.

  24. Weakness and Strength of Each Approach Approach StrengthWeakness =================================================== PROC TRANSPOSE DATA STEP LAG FUNCTION Easy Coding Limited Variable Flexible Manipulating Initialization Self Explanatory Conditional Execution

More Related