Generating Correlated Random Variables
Kriss Harris, Senior Statistician
Kriss.5.Harris@gsk.com
Why?
• I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables.
Previous Method
• Use Excel to fill down one column, then generate another column that was fairly correlated with it.
Generating Correlated Random Variables using the SAS Data Step
data bivariate_final;
  mean1 = 0;    * mean for y1;
  mean2 = 10;   * mean for y2;
  sig1 = 2;     * SD for y1;
  sig2 = 5;     * SD for y2;
  rho = 0.90;   * correlation between y1 and y2;
  do i = 1 to 100;
    r1 = rannor(1245);   * standard normal draw;
    r2 = rannor(2923);   * second standard normal draw;
    y1 = mean1 + sig1*r1;
    * y2 reuses r1 (weight rho*sig2) and takes its remaining spread from r2;
    y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2;
    output;
  end;
run;
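The short derivation below may help show why the data step formula gives the requested correlation. It is a sketch that assumes r1 and r2 are independent standard normal draws; the symbols \mu, \sigma and \rho stand for mean, sig and rho in the code, and \sqrt{\sigma_2^2 - \sigma_2^2\rho^2} is written as \sigma_2\sqrt{1-\rho^2}.

```latex
\begin{aligned}
y_1 &= \mu_1 + \sigma_1 r_1, \qquad
y_2 = \mu_2 + \rho\,\sigma_2\, r_1 + \sigma_2\sqrt{1-\rho^2}\; r_2 \\
\operatorname{Var}(y_1) &= \sigma_1^2 \\
\operatorname{Var}(y_2) &= \rho^2\sigma_2^2 + \sigma_2^2\,(1-\rho^2) = \sigma_2^2 \\
\operatorname{Cov}(y_1, y_2) &= \sigma_1 \cdot \rho\,\sigma_2 \cdot \operatorname{Var}(r_1) = \rho\,\sigma_1\sigma_2 \\
\operatorname{Corr}(y_1, y_2) &= \frac{\rho\,\sigma_1\sigma_2}{\sigma_1\,\sigma_2} = \rho
\end{aligned}
```

With sig1 = 2, sig2 = 5 and rho = 0.90 the covariance is 0.90 × 2 × 5 = 9. The sample correlation of 100 simulated rows will be close to 0.90, but not exactly equal to it.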
Generating Correlated Random Variables using Proc IML
• To generate more than two correlated random variables, it is easier to use the Cholesky decomposition method in Proc IML.
• IML = Interactive Matrix Language
Generating Correlated Random Variables using Proc IML
proc iml;
  /* USE is similar to SET: read in the simulated data and the means */
  use bivariate_final;
  read all var {r1} into x3;
  read all var {r2} into x4;
  read all var {mean1} into mean1;
  read all var {mean2} into mean2;
  /* Variance-covariance matrix C: variances 4 and 25, covariance rho*sig1*sig2 = 9 */
  x = {4 9,
       9 25};
  /* Labels assumed here; rows and cols are not defined on the original slide */
  rows = {"y3" "y4"};
  cols = {"y3" "y4"};
  mattrib x rowname=(rows[1:2]) colname=(cols[1:2]);
  /* Applying the Cholesky decomposition: U is upper triangular */
  Cholesky_decomp = root(x);   /* U */
  /* Concatenating the variables */
  matrix_con = x3 || x4;
  mean = mean1 || mean2;
  /* Correlated variables */
  final_simulated = mean + matrix_con * Cholesky_decomp;   /* RC */
  /* Outputting the variables */
  varnames = {y3 y4};
  create Cholesky_correlation from final_simulated [colname=varnames];
  append from final_simulated;
quit;
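As a worked check on the Cholesky step, the factor for this variance-covariance matrix can be written out by hand. The sketch assumes that root returns the upper triangular matrix U with U^T U = C, which is how the /* U */ comment in the code is read here.

```latex
C = \begin{pmatrix} 4 & 9 \\ 9 & 25 \end{pmatrix} = U^{\top} U,
\qquad
U = \begin{pmatrix} 2 & 4.5 \\ 0 & \sqrt{4.75} \end{pmatrix}
\approx \begin{pmatrix} 2 & 4.5 \\ 0 & 2.179 \end{pmatrix}

\begin{pmatrix} r_1 & r_2 \end{pmatrix} U
= \begin{pmatrix} 2\,r_1 & 4.5\,r_1 + \sqrt{4.75}\;r_2 \end{pmatrix}
```

These are the same weights as in the data step: sig1 = 2, rho*sig2 = 4.5 and sqrt(sig2**2 - sig2**2*rho**2) = sqrt(4.75). After the means are added, y3 and y4 should therefore reproduce y1 and y2.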
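Appendix
• Correlation coefficient: $\rho = \dfrac{\operatorname{Cov}(y_1, y_2)}{\sigma_1 \sigma_2}$. With sig1 = 2, sig2 = 5 and rho = 0.90, the covariance is 0.90 × 2 × 5 = 9, which is the off-diagonal entry of the variance-covariance matrix.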
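R Code - Generating Correlated Random Variables
mean1 <- 0     # mean for y1
mean2 <- 10    # mean for y2
sig1 <- 2      # SD for y1
sig2 <- 5      # SD for y2
rho <- 0.9     # correlation between y1 and y2
r1 <- rnorm(100, 0, 1)   # independent standard normal draws
r2 <- rnorm(100, 0, 1)
y1 <- mean1 + sig1*r1
y2 <- mean2 + rho*sig2*r1 + sqrt(sig2^2 - sig2^2*rho^2)*r2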
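One way to check the R code is to compare the sample correlation and standard deviations with the requested values. The sketch below is self-contained and repeats the formulas from the slide. The seed and the larger sample size n are assumptions, added only so the estimates are stable and reproducible; the slides use 100 draws and no seed.

```r
# Minimal check of the two-variable formulas (seed and n are assumptions)
set.seed(1245)            # fixed seed, chosen only for reproducibility
n <- 100000               # hypothetical sample size; the slides use 100

mean1 <- 0; mean2 <- 10   # means for y1 and y2
sig1 <- 2;  sig2 <- 5     # SDs for y1 and y2
rho <- 0.9                # requested correlation

r1 <- rnorm(n)            # independent standard normal draws
r2 <- rnorm(n)
y1 <- mean1 + sig1 * r1
y2 <- mean2 + rho * sig2 * r1 + sqrt(sig2^2 - sig2^2 * rho^2) * r2

sample_cor <- cor(y1, y2)          # should be close to rho
sample_sd  <- c(sd(y1), sd(y2))    # should be close to sig1 and sig2
print(sample_cor)
print(sample_sd)
```

With only 100 draws the sample correlation will vary more from run to run, so a value somewhat above or below 0.9 is expected there.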
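R Code - Generating Correlated Random Variables using Matrices
# Uses the previous values of r1, r2, mean1 and mean2
C <- matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2)   # variance-covariance matrix
cholc <- chol(C)                                  # upper triangular U with t(U) %*% U = C
R <- matrix(c(r1, r2), nrow = 100, ncol = 2, byrow = FALSE)
mean <- matrix(c(mean1, mean2), nrow = 100, ncol = 2, byrow = TRUE)
RC <- mean + R %*% cholc                          # columns of RC are the correlated variables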
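The slides note that the Cholesky method is the easier route for more than two variables but only show the two-variable case. The sketch below extends the matrix code to three variables. The means, standard deviations, correlations, seed and sample size are hypothetical values chosen for illustration and are not taken from the slides.

```r
# Sketch: three correlated normal variables via the Cholesky factor.
# All numeric values below are hypothetical illustration values.
set.seed(2923)                       # assumed seed, for reproducibility only
n <- 100000                          # hypothetical sample size
mu <- c(0, 10, 5)                    # hypothetical means
sd_vec <- c(2, 5, 3)                 # hypothetical standard deviations
corr <- matrix(c(1.0, 0.9, 0.5,
                 0.9, 1.0, 0.6,
                 0.5, 0.6, 1.0), nrow = 3, byrow = TRUE)

# Covariance matrix: C[i, j] = corr[i, j] * sd_i * sd_j
C <- diag(sd_vec) %*% corr %*% diag(sd_vec)

U <- chol(C)                         # upper triangular, t(U) %*% U equals C
R <- matrix(rnorm(n * 3), nrow = n, ncol = 3)   # independent standard normals
Y <- sweep(R %*% U, 2, mu, "+")      # add each mean to its column

sample_corr <- cor(Y)                # should be close to corr
print(round(sample_corr, 2))
```

The only parts that change with the number of variables are the mean vector and the covariance matrix, which must be symmetric and positive definite for chol to succeed.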