An introduction and application of iml to weighted chisquare statistics
Download
1 / 62

An Introduction and Application of IML to Weighted ChiSquare Statistics - PowerPoint PPT Presentation


  • 97 Views
  • Uploaded on

An Introduction and Application of IML to Weighted ChiSquare Statistics. Lisa Price, Bruce Johnston Junming Yang, Dan DiPrimeo BASAS April 14, 2008. What is IML?. Interactive Matrix Language. What is IML?. A powerful, flexible programming language in a dynamic, interactive environment.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' An Introduction and Application of IML to Weighted ChiSquare Statistics' - germaine-jordon


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
An introduction and application of iml to weighted chisquare statistics

An Introduction and Application of IML toWeighted ChiSquare Statistics

Lisa Price, Bruce Johnston

Junming Yang, Dan DiPrimeo

BASAS April 14, 2008


What is iml
What is IML?

  • Interactive

  • Matrix

  • Language


What is iml1
What is IML?

  • A powerful, flexible programming language in a dynamic, interactive environment.

  • The fundamental object is a data matrix.

  • Use IML interactively (I=interactive!) to see results immediately, or store in a module.

  • Powerful: built-in operators to perform matrix operations.

  • Data management commands



Why iml
Why IML?

  • Can do graphs and analyses that other SAS Modules don’t readily do – in programming statements that are easily translated from mathematics and statistics statements.

  • A powerful graphics package for scientific exploration

  • In SAS 9.2, a mechanism for submitting R statements in IML Workshop

  • Write a program in a language nobody else knows. Great job security.


Goal

  • To get you acquainted with IML via

    • a VERY brief intro

    • A real life example


Define matrix a
Define Matrix A

proc iml;

reset print; /* send to the .lst file */

A = {135, 441, 226 };

/* Define A to be a 3 x 3 matrix */

A 3 rows 3 cols (numeric)

1 3 5

4 4 1

2 2 6


Define matrix b
Define Matrix B

B = {134, 352, 421 };

/* Define B to be a 3 x3 positive definite symmetric matrix */

B 3 rows 3 cols (numeric)

1 3 4

3 5 2

4 2 1


Define matrix c
Define Matrix C

C = {211, 346};

/* C is a 2 x 3 matrix */

C 2 rows 3 cols (numeric)

2 1 1

3 4 6


Define matrix d
Define Matrix D

D = {22, 45, 51};

/* D is a 3 x 2 matrix */

D 3 rows 2 cols (numeric)

2 2

4 5

5 1


Define matrix x
Define Matrix X

X = {110, 110, 110, 101, 101, 101};

/* X is a design matrix for ANOVA, 6 obs, 1 independent X, 2 levels of X */

X 6 rows 3 cols (numeric)

1 1 0

1 1 0

1 1 0

1 0 1

1 0 1

1 0 1


Compute the inverse of a
Compute the Inverse Of A

inverseA = inv(A);

/* compute the inverse of A*/

INVERSEA 3 rows 3 cols (numeric)

-0.5 0.1818182 0.3863636

0.5 0.0909091 -0.431818

0 -0.090909 0.1818182


Compute the transpose of c
Compute the Transpose Of C

transposeC = t(C); /* or, C` */

/* compute the transpose of C */

TRANSPOSEC 3 rows 2 cols (numeric)

2 3

1 4

1 6


Binary operation a b
Binary Operation: A+B

AplusB = A + B;

/* Add 2 matrices, of same size */

APLUSB 3 rows 3 cols (numeric)

2 6 9

7 9 3

6 4 7


Binary operation a b matrix multiplicaton
Binary Operation: A*B(Matrix Multiplicaton)

AtimesB = A * B;

/* Matrix multiplication */

ATIMESB 3 rows 3 cols (numeric)

30 28 15

20 34 25

32 28 18


Question
Question:

  • I would like to multiply two matrices, term by term, rather than matrix multiplication. That is,

  • What will be the operator?

  • Answer: #

  • Compare with *, which we already saw


Reduction operators
Reduction Operators

Asumrows = A[,+];

/* Reduction operator, sum the rows */

ASUMROWS 3 rows 1 col (numeric)

9

9

10


Reduction operators1
Reduction Operators

Asumcols = A[+,];

/* Reduction operator, sum the columns */

ASUMCOLS 1 row 3 cols (numeric)

7 9 12


Question how to get column or row products
Question: How to get column or row products?

  • prodC = C[op,]

  • What is op?

  • Answer: #


Question1
Question

  • Suppose we wanted to sum all the terms of the matrix A. How to do this?

  • Answer: sumA = sum(A);

  • Answer: sumA = A[+,][,+];

  • Answer: sumA = A[,+][+,];

  • Answer: sumA = A[+,+];


Catenating matrices
Catenating Matrices

AnextB = A||B;

/* Put 2 matrices side by side */

ANEXTB 3 rows 6 cols (numeric)

1 3 5 1 3 4

4 4 1 3 5 2

2 2 6 4 2 1


Catenating matrices1
Catenating Matrices

AtopB = A//B

/* Put 2 matrices on top of each other */

ATOPB 6 rows 3 cols (numeric)

1 3 5

4 4 1

2 2 6

1 3 4

3 5 2

4 2 1


Diagonal operators
Diagonal Operators

diagA = diag(A);

/* Change non-diagonal elements of A to 0, keep diagonals as they are */

DIAGA 3 rows 3 cols (numeric)

1 0 0

0 4 0

0 0 6


Diagonal operators1
Diagonal Operators

vdiagA = vecdiag(A);

/* Take the diagonal elements of A, put into a column matrix */

VDIAGA 3 rows 1 col (numeric)

1

4

6


Functions
Functions

logA = log(A);

/* Log of each term */

LOGA 3 rows 3 cols (numeric)

0 1.0986123 1.6094379

1.3862944 1.3862944 0

0.6931472 0.6931472 1.7917595


One more function
One more function

ssgvdiagA = ssq(vdiagA);

/* Square each element of vdiagA, sum them */

SSGVDIAGA 1 row 1 col (numeric)

53


Programming statements
Programming Statements

  • In addition to the functions and operators that we talked about earlier, we have programming statements:

  • If/Then

  • Do

  • Jumping (GOTO, LINK Statements)

  • Modules


Modules
Modules

  • Modules are similar to subroutines, or functions, that can be called anywhere in a program, and reused later.


Programming statements modules
Programming Statements, Modules

start TransMatrix(D);

Dcol = ncol(D); /* Number of columns in D */

Drow = nrow(D); /* Number of rows in D */

Dtranspose_temp = shape(.,Dcol, Drow);

do i = 1 to Dcol;

do j = 1 to Drow;

Dtranspose_temp[i,j] = D[j,i]; /* transpose matrix D */

end;

end;

return (Dtranspose_temp);

finish TransMatrix;

Dtranspose = TransMatrix(X);

print Dtranspose;


Programming statements modules1
Programming Statements, Modules

The previous slide illustrates

  • Programming Statements (In the form of Do Loop)

  • Modules (creating groups of statements that can be invoked anywhere in the program, i.e., a subroutine, and creating a separate environment local to the module).


Bring a dataset into iml convert to a matrix
Bring a dataset into IML Convert to a Matrix

prociml;

use Dset1; /* dataset to read data from*/


Bring a dataset into iml convert to a matrix1
Bring a dataset into IML Convert to a Matrix

read all var{x} into xobs;

/* X variable into X matrix */

XOBS 10 rows 1 col (numeric)

550

200

280

340

410

160

380

510

510

475


A real life example
A real-life example

  • Consider the following table, with x the number of responders, n the number of patients, and p the proportion (x/n):


A real life example1
A real life example

  • The problem is to test the equality of the proportions responding between the two treatment groups, controlling for stratum, and to generate point estimates and confidence intervals.

  • PROC FREQ, CMH provides pvalues for hypothesis testing, but not point estimates, CIs.

  • Test desired can be found in Mehrotra and Railkar, Statistics in Medicine, 2000.

  • How to proceed?


Defining the x n matrices

We need to define two matrices of the same size, X, N, where

X is the number of patients who respond to treatment in group J, j = 1 (1st column) or j = 2 (2nd column), in stratum I.

N is the number of patients in treatment in group J, j = 1 (1st column) or j = 2 (2nd column), in stratum I.

We can read in the data or enter it; we show the matrices next page, but leave off the coding.

Defining the X, N matrices



Compute row sums point estimates

We need to calculate 3 terms on the next page:

The row sums of the matrix N (will give a K x 1 matrix) (sum within each stratum)

Another matrix, phat, which is term by term each element of x divided by corresponding term of n.

The percent of patients responding across each stratum

Compute row sums, point estimates


Compute row sums point estimates1

ndot= n[,+];

phat= x/n;

pbar= (n#phat)[,+]/ndot;

Compute row sums, point estimates


Compute weights
Compute Weights

  • On the next page, we will compute the weights associated with each stratum.

  • These are the harmonic means, and each row contributes a weight proportional to the product of the terms in each column.


Compute weights1
Compute Weights

  • wnum = n[,#]/ndot;

  • wden = wnum[+,];

  • w = wnum/wden;


Weighted proportion of responders each group and variance

On the next page, we need to compute the proportion of patients in each treatment group (each column), responding to treatment; controlling for stratum (the weight calculated previously).

Also, we compute the estimated variance of each estimate of the proportion

Weighted Proportion of Responders (Each Group) and Variance


Weighted proportion of responders each group and variance1

pwt = (w#phat)[+,]; patients in each treatment group (each column), responding to treatment; controlling for stratum (the weight calculated previously).

varpwt = (w#w#phat#(1-phat)/n)[+,];

Weighted Proportion of Responders (Each Group) and Variance


Compute the weighted difference of proportions between two groups and the estimated variance
Compute the weighted difference of proportions between two groups, and the estimated variance

  • We will compute on the next page the difference in proportions between the two treatment groups (column 2 minus column 1).

  • To do this, for each row, we calculate the unweighted difference between column 2 and column 1. Then, we multiply each difference by the weight associated with each row.

  • Then we sum these terms, and get the point estimate


Compute the weighted difference of proportions between two groups and the estimated variance1
Compute the weighted difference of proportions between two groups, and the estimated variance

  • deltahat = phat[,2] – phat[,1];

  • deltawt = (w # deltahat) [+,];


Compute the weighted difference of proportions between two groups and the estimated variance2

To compute confidence intervals, we will need the estimated variance of the point estimates.

The variance of this point estimate is provided on the next page; and confidence intervals following that page.

Compute the weighted difference of proportions between two groups, and the estimated variance


Compute the weighted difference of proportions between two groups and the estimated variance3

vardeltai = variance of the point estimates.

(phat#(1-phat) / n ) [,+];

vardeltaw =

(w # w # vardeltai ) [+,];

Compute the weighted difference of proportions between two groups, and the estimated variance


Confidence intervals for proportion of each treatment group and for the difference in proportions

pwtcil = variance of the point estimates.

pwt – 1.96*sqrt(varpwt);

pwtciu =

pwt +1.96*sqrt(varpwt);

delwcil =

deltawt–1.96*sqrt(vardeltaw);

delwciu =

deltawt+ 1.96*sqrt(vardeltaw);

Confidence intervals for proportion of each treatment group, and for the difference in proportions


P values for testing the hypothesis

The chi-square statistic on the next page is identical to that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

That is, the numerator has a term of (ndot – 1) rather than ndot.

p-values for testing the hypothesis


P values for testing the hypothesis1

chictmp = ((n[,#] # deltahat) / ndot ) [+,]; that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

chicnum = chictmp ** 2;

chicden = (n[,#] # pbar # (1-pbar) / ndot ) [+,];

chic2 = chicnum/chicden;

p-values for testing the hypothesis


X 3 rows 2 cols (numeric) that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

140 180

50 90

60 60

N 3 rows 2 cols (numeric)

450 440

170 200

200 180

proc iml;

reset print;

x = {140 180, 50 90, 60 60};

n = {450 440, 170 200, 200 180};


/* ndot: sum the rows of the n matrix */ that generated by the CMH statistic in PROC FREQ, except for a factor of (n-1)/n.

ndot = n[,+];

NDOT 3 rows 1 col (numeric)

890

370

380

/* phat: point estimates, proportion of responders each cell */

phat = x/n;

PHAT 3 rows 2 cols (numeric)

0.3111111 0.4090909

0.2941176 0.45

0.3 0.3333333


/* pbar: proportion of responders in each row, weighted by size */

pbar = (n#phat)[,+]/ndot;

/* difference in proportions from column 2 (group 1) to column 1 (group 0) */

delta = phat[,2] - phat[,1];

PBAR 3 rows 1 col (numeric)

0.3595506

0.3783784

0.3157895

DELTA 3 rows 1 col (numeric)

0.0979798

0.1558824

0.0333333


/* Compute weights. */ size */

/* Wnum is, for each row, the product of the columns by the sum of cols */

/* Wden is the sum of the column vector */

/* w is normalized, which will sum to 1, each row of wnum divided by wden */

wnum = n[,#]/ndot;

wden = wnum[+,];

w = wnum/wden;

WNUM 3 rows 1 col (numeric)

222.47191

91.891892

94.736842

WDEN 1 row 1 col (numeric)

409.10064

W 3 rows 1 col (numeric)

0.5438073

0.2246193

0.2315734


/* PWT is weight (a scalar) * phat. then we sum over the rows, to get column proportions */

/* VARPWT is the corresponding variance of pwt */

pwt = (w#phat)[+,];

varpwt = (w#w#phat#(1-phat)/n)[+,];

PWT 1 row 2 cols (numeric)

0.304721 0.4007364

VARPWT 1 row 2 cols (numeric)

0.0002588 0.0002911


/* delta is a 3 x 1, the differences in each row. multiply by w, a 3 x 1. deltawt is the sum */

/* vardelta is the variance of the difference of each row */

/* vardelw is the variance of the weighted difference */

deltawt = (w#delta)[+,];

vardelta= (phat#(1-phat)/n)[,+];

vardelw = (w#w#vardelta)[+,];

DELTAWT 1 row 1 col (numeric)

0.0960154

VARDELTA 3 rows 1 col (numeric)

0.0010257

0.0024587

0.0022846

VARDELW 1 row 1 col (numeric)

0.0005499


/* Lower Confidence Limits for each of the column proportions */

pwtctil = pwt - 1.96*sqrt(varpwt);

/* Upper Confidence Limits for each of the column proportions */

pwtctiu = pwt + 1.96*sqrt(varpwt);

PWTCTIL 1 row 2 cols (numeric)

0.2731918 0.3672948

PWTCTIU 1 row 2 cols (numeric)

0.3362502 0.4341781


/* Lower, Upper Limits for the DeltaWT, the difference between the two groups */

delwcil = deltawt - 1.96*sqrt(vardelw);

delwciu = deltawt + 1.96*sqrt(vardelw);

DELWCIL 1 row 1 col (numeric)

0.0500542

DELWCIU 1 row 1 col (numeric)

0.1419766


/* ChiSquare Computations */ between the two groups */

chictmp = (n[,#]#delta/ndot)[+,];

chicnum = chictmp**2;

chicden = (n[,#]#pbar#(1-pbar)/ndot)[+,];

chic2 = chicnum/chicden;

pchic2 = 1-probchi(chic2,1);

CHICTMP 1 row 1 col (numeric)

39.279972

CHICNUM 1 row 1 col (numeric)

1542.9162

CHICDEN 1 row 1 col (numeric)

93.312668

CHIC2 1 row 1 col (numeric)

16.534906

PCHIC2 1 row 1 col (numeric)

0.0000478


Other uses
Other Uses between the two groups */

  • Graphics.

  • IML Studio can submit FORTRAN, C++, and R statements

    • Formerly Stat Studio

      • Which was formerly IML Workshop


For more reference
For more reference between the two groups */

  • See the SAS/IML Users Guide, or Online Documentation

  • Mehrotra, D., and Railkar, R. Minimum risk weights for comparing treatments in stratified binomial trials. Statistics in Medicine 2000; 19: 811-825.

  • Graybill, F. Theory and Application of the General Linear Model. Wadsworth & Brooks: Pacific Grove, 1976


Summary
Summary between the two groups */

  • IML is a powerful tool for data analysis, statistics, and graphics.

  • Consider instead of datastep programming if the opportunity presents itself.

    • Not always the best alternative, but helps inform the decision making.


Questions
Questions? between the two groups */


ad