An Introduction and Application of IML to Weighted ChiSquare Statistics

An Introduction and Application of IML toWeighted ChiSquare Statistics Lisa Price, Bruce Johnston Junming Yang, Dan DiPrimeo BASAS April 14, 2008

What is IML? • Interactive • Matrix • Language

What is IML? • A powerful, flexible programming language in a dynamic, interactive environment. • The fundamental object is a data matrix. • Use IML interactively (I=interactive!) to see results immediately, or store in a module. • Powerful: built-in operators to perform matrix operations. • Data management commands

What is IML?

Why IML? • Can do graphs and analyses that other SAS Modules don’t readily do – in programming statements that are easily translated from mathematics and statistics statements. • A powerful graphics package for scientific exploration • In SAS 9.2, a mechanism for submitting R statements in IML Workshop • Write a program in a language nobody else knows. Great job security.

Goal • To get you acquainted with IML via • a VERY brief intro • A real life example

Define Matrix A proc iml; reset print; /* send to the .lst file */ A = {135, 441, 226 }; /* Define A to be a 3 x 3 matrix */ A 3 rows 3 cols (numeric) 1 3 5 4 4 1 2 2 6

Define Matrix B B = {134, 352, 421 }; /* Define B to be a 3 x3 positive definite symmetric matrix */ B 3 rows 3 cols (numeric) 1 3 4 3 5 2 4 2 1

Define Matrix C C = {211, 346}; /* C is a 2 x 3 matrix */ C 2 rows 3 cols (numeric) 2 1 1 3 4 6

Define Matrix D D = {22, 45, 51}; /* D is a 3 x 2 matrix */ D 3 rows 2 cols (numeric) 2 2 4 5 5 1

Define Matrix X X = {110, 110, 110, 101, 101, 101}; /* X is a design matrix for ANOVA, 6 obs, 1 independent X, 2 levels of X */ X 6 rows 3 cols (numeric) 1 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1

Compute the Inverse Of A inverseA = inv(A); /* compute the inverse of A*/ INVERSEA 3 rows 3 cols (numeric) -0.5 0.1818182 0.3863636 0.5 0.0909091 -0.431818 0 -0.090909 0.1818182

Compute the Transpose Of C transposeC = t(C); /* or, C` */ /* compute the transpose of C */ TRANSPOSEC 3 rows 2 cols (numeric) 2 3 1 4 1 6

Binary Operation: A+B AplusB = A + B; /* Add 2 matrices, of same size */ APLUSB 3 rows 3 cols (numeric) 2 6 9 7 9 3 6 4 7

Binary Operation: A*B(Matrix Multiplicaton) AtimesB = A * B; /* Matrix multiplication */ ATIMESB 3 rows 3 cols (numeric) 30 28 15 20 34 25 32 28 18

Question: • I would like to multiply two matrices, term by term, rather than matrix multiplication. That is, • What will be the operator? • Answer: # • Compare with *, which we already saw

Reduction Operators Asumrows = A[,+]; /* Reduction operator, sum the rows */ ASUMROWS 3 rows 1 col (numeric) 9 9 10

Reduction Operators Asumcols = A[+,]; /* Reduction operator, sum the columns */ ASUMCOLS 1 row 3 cols (numeric) 7 9 12

Question: How to get column or row products? • prodC = C[op,] • What is op? • Answer: #

Question • Suppose we wanted to sum all the terms of the matrix A. How to do this? • Answer: sumA = sum(A); • Answer: sumA = A[+,][,+]; • Answer: sumA = A[,+][+,]; • Answer: sumA = A[+,+];

Catenating Matrices AnextB = A||B; /* Put 2 matrices side by side */ ANEXTB 3 rows 6 cols (numeric) 1 3 5 1 3 4 4 4 1 3 5 2 2 2 6 4 2 1

Catenating Matrices AtopB = A//B /* Put 2 matrices on top of each other */ ATOPB 6 rows 3 cols (numeric) 1 3 5 4 4 1 2 2 6 1 3 4 3 5 2 4 2 1

Diagonal Operators diagA = diag(A); /* Change non-diagonal elements of A to 0, keep diagonals as they are */ DIAGA 3 rows 3 cols (numeric) 1 0 0 0 4 0 0 0 6

Diagonal Operators vdiagA = vecdiag(A); /* Take the diagonal elements of A, put into a column matrix */ VDIAGA 3 rows 1 col (numeric) 1 4 6

Functions logA = log(A); /* Log of each term */ LOGA 3 rows 3 cols (numeric) 0 1.0986123 1.6094379 1.3862944 1.3862944 0 0.6931472 0.6931472 1.7917595

One more function ssgvdiagA = ssq(vdiagA); /* Square each element of vdiagA, sum them */ SSGVDIAGA 1 row 1 col (numeric) 53

Programming Statements • In addition to the functions and operators that we talked about earlier, we have programming statements: • If/Then • Do • Jumping (GOTO, LINK Statements) • Modules

Modules • Modules are similar to subroutines, or functions, that can be called anywhere in a program, and reused later.

Programming Statements, Modules start TransMatrix(D); Dcol = ncol(D); /* Number of columns in D */ Drow = nrow(D); /* Number of rows in D */ Dtranspose_temp = shape(.,Dcol, Drow); do i = 1 to Dcol; do j = 1 to Drow; Dtranspose_temp[i,j] = D[j,i]; /* transpose matrix D */ end; end; return (Dtranspose_temp); finish TransMatrix; Dtranspose = TransMatrix(X); print Dtranspose;

Programming Statements, Modules The previous slide illustrates • Programming Statements (In the form of Do Loop) • Modules (creating groups of statements that can be invoked anywhere in the program, i.e., a subroutine, and creating a separate environment local to the module).

Bring a dataset into IML Convert to a Matrix prociml; use Dset1; /* dataset to read data from*/

Bring a dataset into IML Convert to a Matrix read all var{x} into xobs; /* X variable into X matrix */ XOBS 10 rows 1 col (numeric) 550 200 280 340 410 160 380 510 510 475

A real-life example • Consider the following table, with x the number of responders, n the number of patients, and p the proportion (x/n):

A real life example • The problem is to test the equality of the proportions responding between the two treatment groups, controlling for stratum, and to generate point estimates and confidence intervals. • PROC FREQ, CMH provides pvalues for hypothesis testing, but not point estimates, CIs. • Test desired can be found in Mehrotra and Railkar, Statistics in Medicine, 2000. • How to proceed?

We need to define two matrices of the same size, X, N, where X is the number of patients who respond to treatment in group J, j = 1 (1st column) or j = 2 (2nd column), in stratum I. N is the number of patients in treatment in group J, j = 1 (1st column) or j = 2 (2nd column), in stratum I. We can read in the data or enter it; we show the matrices next page, but leave off the coding. Defining the X, N matrices

Defining the X, N matrices

We need to calculate 3 terms on the next page: The row sums of the matrix N (will give a K x 1 matrix) (sum within each stratum) Another matrix, phat, which is term by term each element of x divided by corresponding term of n. The percent of patients responding across each stratum Compute row sums, point estimates

ndot= n[,+]; phat= x/n; pbar= (n#phat)[,+]/ndot; Compute row sums, point estimates

Compute Weights • On the next page, we will compute the weights associated with each stratum. • These are the harmonic means, and each row contributes a weight proportional to the product of the terms in each column.

Compute Weights • wnum = n[,#]/ndot; • wden = wnum[+,]; • w = wnum/wden;

On the next page, we need to compute the proportion of patients in each treatment group (each column), responding to treatment; controlling for stratum (the weight calculated previously). Also, we compute the estimated variance of each estimate of the proportion Weighted Proportion of Responders (Each Group) and Variance

pwt = (w#phat)[+,]; varpwt = (w#w#phat#(1-phat)/n)[+,]; Weighted Proportion of Responders (Each Group) and Variance

Compute the weighted difference of proportions between two groups, and the estimated variance • We will compute on the next page the difference in proportions between the two treatment groups (column 2 minus column 1). • To do this, for each row, we calculate the unweighted difference between column 2 and column 1. Then, we multiply each difference by the weight associated with each row. • Then we sum these terms, and get the point estimate

Compute the weighted difference of proportions between two groups, and the estimated variance • deltahat = phat[,2] – phat[,1]; • deltawt = (w # deltahat) [+,];

To compute confidence intervals, we will need the estimated variance of the point estimates. The variance of this point estimate is provided on the next page; and confidence intervals following that page. Compute the weighted difference of proportions between two groups, and the estimated variance

vardeltai = (phat#(1-phat) / n ) [,+]; vardeltaw = (w # w # vardeltai ) [+,]; Compute the weighted difference of proportions between two groups, and the estimated variance

pwtcil = pwt – 1.96*sqrt(varpwt); pwtciu = pwt +1.96*sqrt(varpwt); delwcil = deltawt–1.96*sqrt(vardeltaw); delwciu = deltawt+ 1.96*sqrt(vardeltaw); Confidence intervals for proportion of each treatment group, and for the difference in proportions

The chi-square statistic on the next page is identical to that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n. That is, the numerator has a term of (ndot – 1) rather than ndot. p-values for testing the hypothesis

chictmp = ((n[,#] # deltahat) / ndot ) [+,]; chicnum = chictmp ** 2; chicden = (n[,#] # pbar # (1-pbar) / ndot ) [+,]; chic2 = chicnum/chicden; p-values for testing the hypothesis

X 3 rows 2 cols (numeric) 140 180 50 90 60 60 N 3 rows 2 cols (numeric) 450 440 170 200 200 180 proc iml; reset print; x = {140 180, 50 90, 60 60}; n = {450 440, 170 200, 200 180};

An Introduction and Application of IML to Weighted ChiSquare Statistics

An Introduction and Application of IML to Weighted ChiSquare Statistics

Presentation Transcript

Statistics: An Introduction

An Application of Statistics to the Three Stooges Films

Statistics: an introduction

Statistics 300: Introduction to Probability and Statistics

Official Statistics – an introduction and overview

An Introduction to Statistics and Research Design

An Introduction to Use, Application, Technology and Coding

An Application of Statistics to the Three Stooges Films…

Introduction to Diffusion-weighted Imaging

Statistics An Introduction

Statistics And Application

An Introduction to producing Official Statistics

Statistics: an introduction

An Introduction to Statistics

An Introduction to Statistics

An Introduction to Business Statistics

An Introduction to the Statistics of Uncertainty

An Introduction to Statistics

Introduction to Diffusion-weighted Imaging

Statistics 300: Introduction to Probability and Statistics

An Introduction to Statistics

Statistics 300: Introduction to Probability and Statistics