Loading in 5 sec....

An Introduction and Application of IML to Weighted ChiSquare StatisticsPowerPoint Presentation

An Introduction and Application of IML to Weighted ChiSquare Statistics

- 97 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' An Introduction and Application of IML to Weighted ChiSquare Statistics' - germaine-jordon

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### An Introduction and Application of IML toWeighted ChiSquare Statistics

Lisa Price, Bruce Johnston

Junming Yang, Dan DiPrimeo

BASAS April 14, 2008

What is IML?

- Interactive
- Matrix
- Language

What is IML?

- A powerful, flexible programming language in a dynamic, interactive environment.
- The fundamental object is a data matrix.
- Use IML interactively (I=interactive!) to see results immediately, or store in a module.
- Powerful: built-in operators to perform matrix operations.
- Data management commands

Why IML?

- Can do graphs and analyses that other SAS Modules don’t readily do – in programming statements that are easily translated from mathematics and statistics statements.
- A powerful graphics package for scientific exploration
- In SAS 9.2, a mechanism for submitting R statements in IML Workshop
- Write a program in a language nobody else knows. Great job security.

Goal

- To get you acquainted with IML via
- a VERY brief intro
- A real life example

Define Matrix A

proc iml;

reset print; /* send to the .lst file */

A = {135, 441, 226 };

/* Define A to be a 3 x 3 matrix */

A 3 rows 3 cols (numeric)

1 3 5

4 4 1

2 2 6

Define Matrix B

B = {134, 352, 421 };

/* Define B to be a 3 x3 positive definite symmetric matrix */

B 3 rows 3 cols (numeric)

1 3 4

3 5 2

4 2 1

Define Matrix X

X = {110, 110, 110, 101, 101, 101};

/* X is a design matrix for ANOVA, 6 obs, 1 independent X, 2 levels of X */

X 6 rows 3 cols (numeric)

1 1 0

1 1 0

1 1 0

1 0 1

1 0 1

1 0 1

Compute the Inverse Of A

inverseA = inv(A);

/* compute the inverse of A*/

INVERSEA 3 rows 3 cols (numeric)

-0.5 0.1818182 0.3863636

0.5 0.0909091 -0.431818

0 -0.090909 0.1818182

Compute the Transpose Of C

transposeC = t(C); /* or, C` */

/* compute the transpose of C */

TRANSPOSEC 3 rows 2 cols (numeric)

2 3

1 4

1 6

Binary Operation: A+B

AplusB = A + B;

/* Add 2 matrices, of same size */

APLUSB 3 rows 3 cols (numeric)

2 6 9

7 9 3

6 4 7

Binary Operation: A*B(Matrix Multiplicaton)

AtimesB = A * B;

/* Matrix multiplication */

ATIMESB 3 rows 3 cols (numeric)

30 28 15

20 34 25

32 28 18

Question:

- I would like to multiply two matrices, term by term, rather than matrix multiplication. That is,
- What will be the operator?
- Answer: #
- Compare with *, which we already saw

Reduction Operators

Asumrows = A[,+];

/* Reduction operator, sum the rows */

ASUMROWS 3 rows 1 col (numeric)

9

9

10

Reduction Operators

Asumcols = A[+,];

/* Reduction operator, sum the columns */

ASUMCOLS 1 row 3 cols (numeric)

7 9 12

Question: How to get column or row products?

- prodC = C[op,]
- What is op?
- Answer: #

Question

- Suppose we wanted to sum all the terms of the matrix A. How to do this?
- Answer: sumA = sum(A);
- Answer: sumA = A[+,][,+];
- Answer: sumA = A[,+][+,];
- Answer: sumA = A[+,+];

Catenating Matrices

AnextB = A||B;

/* Put 2 matrices side by side */

ANEXTB 3 rows 6 cols (numeric)

1 3 5 1 3 4

4 4 1 3 5 2

2 2 6 4 2 1

Catenating Matrices

AtopB = A//B

/* Put 2 matrices on top of each other */

ATOPB 6 rows 3 cols (numeric)

1 3 5

4 4 1

2 2 6

1 3 4

3 5 2

4 2 1

Diagonal Operators

diagA = diag(A);

/* Change non-diagonal elements of A to 0, keep diagonals as they are */

DIAGA 3 rows 3 cols (numeric)

1 0 0

0 4 0

0 0 6

Diagonal Operators

vdiagA = vecdiag(A);

/* Take the diagonal elements of A, put into a column matrix */

VDIAGA 3 rows 1 col (numeric)

1

4

6

Functions

logA = log(A);

/* Log of each term */

LOGA 3 rows 3 cols (numeric)

0 1.0986123 1.6094379

1.3862944 1.3862944 0

0.6931472 0.6931472 1.7917595

One more function

ssgvdiagA = ssq(vdiagA);

/* Square each element of vdiagA, sum them */

SSGVDIAGA 1 row 1 col (numeric)

53

Programming Statements

- In addition to the functions and operators that we talked about earlier, we have programming statements:
- If/Then
- Do
- Jumping (GOTO, LINK Statements)
- Modules

Modules

- Modules are similar to subroutines, or functions, that can be called anywhere in a program, and reused later.

Programming Statements, Modules

start TransMatrix(D);

Dcol = ncol(D); /* Number of columns in D */

Drow = nrow(D); /* Number of rows in D */

Dtranspose_temp = shape(.,Dcol, Drow);

do i = 1 to Dcol;

do j = 1 to Drow;

Dtranspose_temp[i,j] = D[j,i]; /* transpose matrix D */

end;

end;

return (Dtranspose_temp);

finish TransMatrix;

Dtranspose = TransMatrix(X);

print Dtranspose;

Programming Statements, Modules

The previous slide illustrates

- Programming Statements (In the form of Do Loop)
- Modules (creating groups of statements that can be invoked anywhere in the program, i.e., a subroutine, and creating a separate environment local to the module).

Bring a dataset into IML Convert to a Matrix

read all var{x} into xobs;

/* X variable into X matrix */

XOBS 10 rows 1 col (numeric)

550

200

280

340

410

160

380

510

510

475

A real-life example

- Consider the following table, with x the number of responders, n the number of patients, and p the proportion (x/n):

A real life example

- The problem is to test the equality of the proportions responding between the two treatment groups, controlling for stratum, and to generate point estimates and confidence intervals.
- PROC FREQ, CMH provides pvalues for hypothesis testing, but not point estimates, CIs.
- Test desired can be found in Mehrotra and Railkar, Statistics in Medicine, 2000.
- How to proceed?

We need to define two matrices of the same size, X, N, where

X is the number of patients who respond to treatment in group J, j = 1 (1st column) or j = 2 (2nd column), in stratum I.

N is the number of patients in treatment in group J, j = 1 (1st column) or j = 2 (2nd column), in stratum I.

We can read in the data or enter it; we show the matrices next page, but leave off the coding.

Defining the X, N matricesWe need to calculate 3 terms on the next page:

The row sums of the matrix N (will give a K x 1 matrix) (sum within each stratum)

Another matrix, phat, which is term by term each element of x divided by corresponding term of n.

The percent of patients responding across each stratum

Compute row sums, point estimatesCompute Weights

- On the next page, we will compute the weights associated with each stratum.
- These are the harmonic means, and each row contributes a weight proportional to the product of the terms in each column.

Compute Weights

- wnum = n[,#]/ndot;
- wden = wnum[+,];
- w = wnum/wden;

On the next page, we need to compute the proportion of patients in each treatment group (each column), responding to treatment; controlling for stratum (the weight calculated previously).

Also, we compute the estimated variance of each estimate of the proportion

Weighted Proportion of Responders (Each Group) and Variancepwt = (w#phat)[+,]; patients in each treatment group (each column), responding to treatment; controlling for stratum (the weight calculated previously).

varpwt = (w#w#phat#(1-phat)/n)[+,];

Weighted Proportion of Responders (Each Group) and VarianceCompute the weighted difference of proportions between two groups, and the estimated variance

- We will compute on the next page the difference in proportions between the two treatment groups (column 2 minus column 1).
- To do this, for each row, we calculate the unweighted difference between column 2 and column 1. Then, we multiply each difference by the weight associated with each row.
- Then we sum these terms, and get the point estimate

Compute the weighted difference of proportions between two groups, and the estimated variance

- deltahat = phat[,2] – phat[,1];
- deltawt = (w # deltahat) [+,];

To compute confidence intervals, we will need the estimated variance of the point estimates.

The variance of this point estimate is provided on the next page; and confidence intervals following that page.

Compute the weighted difference of proportions between two groups, and the estimated variancevardeltai = variance of the point estimates.

(phat#(1-phat) / n ) [,+];

vardeltaw =

(w # w # vardeltai ) [+,];

Compute the weighted difference of proportions between two groups, and the estimated variancepwtcil = variance of the point estimates.

pwt – 1.96*sqrt(varpwt);

pwtciu =

pwt +1.96*sqrt(varpwt);

delwcil =

deltawt–1.96*sqrt(vardeltaw);

delwciu =

deltawt+ 1.96*sqrt(vardeltaw);

Confidence intervals for proportion of each treatment group, and for the difference in proportionsThe chi-square statistic on the next page is identical to that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

That is, the numerator has a term of (ndot – 1) rather than ndot.

p-values for testing the hypothesischictmp = ((n[,#] # deltahat) / ndot ) [+,]; that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

chicnum = chictmp ** 2;

chicden = (n[,#] # pbar # (1-pbar) / ndot ) [+,];

chic2 = chicnum/chicden;

p-values for testing the hypothesisX 3 rows 2 cols (numeric) that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

140 180

50 90

60 60

N 3 rows 2 cols (numeric)

450 440

170 200

200 180

proc iml;

reset print;

x = {140 180, 50 90, 60 60};

n = {450 440, 170 200, 200 180};

/* ndot: sum the rows of the n matrix */ that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

ndot = n[,+];

NDOT 3 rows 1 col (numeric)

890

370

380

/* phat: point estimates, proportion of responders each cell */

phat = x/n;

PHAT 3 rows 2 cols (numeric)

0.3111111 0.4090909

0.2941176 0.45

0.3 0.3333333

/* pbar: proportion of responders in each row, weighted by size */

pbar = (n#phat)[,+]/ndot;

/* difference in proportions from column 2 (group 1) to column 1 (group 0) */

delta = phat[,2] - phat[,1];

PBAR 3 rows 1 col (numeric)

0.3595506

0.3783784

0.3157895

DELTA 3 rows 1 col (numeric)

0.0979798

0.1558824

0.0333333

/* Compute weights. */ size */

/* Wnum is, for each row, the product of the columns by the sum of cols */

/* Wden is the sum of the column vector */

/* w is normalized, which will sum to 1, each row of wnum divided by wden */

wnum = n[,#]/ndot;

wden = wnum[+,];

w = wnum/wden;

WNUM 3 rows 1 col (numeric)

222.47191

91.891892

94.736842

WDEN 1 row 1 col (numeric)

409.10064

W 3 rows 1 col (numeric)

0.5438073

0.2246193

0.2315734

/* PWT is weight (a scalar) * phat. then we sum over the rows, to get column proportions */

/* VARPWT is the corresponding variance of pwt */

pwt = (w#phat)[+,];

varpwt = (w#w#phat#(1-phat)/n)[+,];

PWT 1 row 2 cols (numeric)

0.304721 0.4007364

VARPWT 1 row 2 cols (numeric)

0.0002588 0.0002911

/* delta is a 3 x 1, the differences in each row. multiply by w, a 3 x 1. deltawt is the sum */

/* vardelta is the variance of the difference of each row */

/* vardelw is the variance of the weighted difference */

deltawt = (w#delta)[+,];

vardelta= (phat#(1-phat)/n)[,+];

vardelw = (w#w#vardelta)[+,];

DELTAWT 1 row 1 col (numeric)

0.0960154

VARDELTA 3 rows 1 col (numeric)

0.0010257

0.0024587

0.0022846

VARDELW 1 row 1 col (numeric)

0.0005499

/* Lower Confidence Limits for each of the column proportions */

pwtctil = pwt - 1.96*sqrt(varpwt);

/* Upper Confidence Limits for each of the column proportions */

pwtctiu = pwt + 1.96*sqrt(varpwt);

PWTCTIL 1 row 2 cols (numeric)

0.2731918 0.3672948

PWTCTIU 1 row 2 cols (numeric)

0.3362502 0.4341781

/* Lower, Upper Limits for the DeltaWT, the difference between the two groups */

delwcil = deltawt - 1.96*sqrt(vardelw);

delwciu = deltawt + 1.96*sqrt(vardelw);

DELWCIL 1 row 1 col (numeric)

0.0500542

DELWCIU 1 row 1 col (numeric)

0.1419766

/* ChiSquare Computations */ between the two groups */

chictmp = (n[,#]#delta/ndot)[+,];

chicnum = chictmp**2;

chicden = (n[,#]#pbar#(1-pbar)/ndot)[+,];

chic2 = chicnum/chicden;

pchic2 = 1-probchi(chic2,1);

CHICTMP 1 row 1 col (numeric)

39.279972

CHICNUM 1 row 1 col (numeric)

1542.9162

CHICDEN 1 row 1 col (numeric)

93.312668

CHIC2 1 row 1 col (numeric)

16.534906

PCHIC2 1 row 1 col (numeric)

0.0000478

Other Uses between the two groups */

- Graphics.
- IML Studio can submit FORTRAN, C++, and R statements
- Formerly Stat Studio
- Which was formerly IML Workshop

- Formerly Stat Studio

For more reference between the two groups */

- See the SAS/IML Users Guide, or Online Documentation
- Mehrotra, D., and Railkar, R. Minimum risk weights for comparing treatments in stratified binomial trials. Statistics in Medicine 2000; 19: 811-825.
- Graybill, F. Theory and Application of the General Linear Model. Wadsworth & Brooks: Pacific Grove, 1976

Summary between the two groups */

- IML is a powerful tool for data analysis, statistics, and graphics.
- Consider instead of datastep programming if the opportunity presents itself.
- Not always the best alternative, but helps inform the decision making.

Questions? between the two groups */

Download Presentation

Connecting to Server..