1 / 44

Multivariate Data and Matrix Algebra Review

Multivariate Data and Matrix Algebra Review. BMTRY 726 Spring 2012. What is ‘Multivariate’ Data?. Data in which each sampling unit contributes to more than one outcome. For example…. Goals of Multivariate Analysis. Data reduction and structural simplification

avak
Download Presentation

Multivariate Data and Matrix Algebra Review

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

  2. What is ‘Multivariate’ Data? • Data in which each sampling unit contributes to more than one outcome. For example….

  3. Goals of Multivariate Analysis • Data reduction and structural simplification • Say we collect 16 biological markers to examine patient response to chemotherapy. • Ideally we might like to summarize patient response as some simple combination of the markers. • How can variation in p=16 markers be summarized?

  4. Goals of Multivariate Analysis • Sorting and grouping data • Participants are enrolled in a smoking cessation program for several years • Information about the background of each subject and smoking behavior at multiple visits • Some patients quit while others do not • Can we use the background and smoking behavior information to classify those that quit and those that do not in order to screen future participants?

  5. Goals of Multivariate Analysis • Investigating dependence among variables • Subjects take a standardized test with different categories of questions • Sentence completion • Number sequences • Orientation of patterns • Arithmetic (etc.) • Can correlation among scores be attributed to variation in one or more unobserved factors? • Intelligence • Mathematical ability

  6. Goals of Multivariate Analysis • Prediction based on relationship between variables • We conduct a microarray experiment to compare tumor and healthy tissue • We want to develop a reliable classification tool based on the gene expression information from our experiment

  7. Goals of Multivariate Analysis • Hypothesis testing • Participants in a diabetes study are placed into one of three treatment groups • Fasting blood glucose is evaluated at 0, 3, 6, 9, 12, and 15 months • We want to test the hypothesis that treatment groups are different.

  8. Multivariate Data Properties • What property/ies of multivariate data make commonly used statistical approached inappropriate?

  9. Notation & Data Organization • Consider an example where we have 15 tumor markers collected on 30 tissue samples • The 15 markers are variables and our samples represent the subjects in the data. • These data can most easily be expressed as an 15 by 30 array

  10. Notation & Data Organization • More generally, let j = 1,2,…,prepresent a set of variables collected in a study • And let i = 1,2,…,nrepresent the samples

  11. Random Vectors pdenotes the number of outcomes for subject i i = 1,2,…,n is the number subjects • Each experimental unit has multiple outcome measures thus we can arrange the ith subject’s j = 1,2,…, poutcomes as a vector. • is a random variable as are it’s individual elements

  12. Descriptive Statistics • We can calculate familiar descriptive statistics for this array • Mean • Variance • Covariance (Correlation)

  13. Arranged as Arrays • Means • Covariance

  14. Distance • Many multivariate statistics are based on the idea of distance • For example, if we are comparing two groups we might look at the difference in their means • Euclidean distance

  15. Distance • But why is Euclidean distance inappropriate in statistics? • This leads us to the idea of statistical distance • Consider a case where we have two measures

  16. Distance • Consider a case where we have two measures

  17. Distance • Consider a case where we have two measures

  18. Distance • Our expression of statistical distance can be generalized to p variables to any fixed set of points

  19. Basic Matrix Operations • Can I add A2x3 and B3x3? • What is the product of matrix A and scalar c? • When can I multiply the two matrices A and B?

  20. Matrix Transposes • The transpose of an n x m matrix A, denoted as A’, is an m x n matrix whose ijth element is the jith element of A • Properties of a transpose:

  21. Types of Matrices • Square matrix: • Idempotent: • Symmetric: • A square matrix is diagonal :

  22. More Definitions • An nxn matrix A is nonsingular if there exists an matrix Bnx n such that • B is the multiplicative inverse of A and can be written as • A square matrix with no multiplicative inverse is said to be…. • We can calculate the inverse of a matrix assuming one exists but it is tedious (let the computer do it).

  23. Matrix Determinant • The determinant of a square matrix A is a scalar given by • What is the determinant of

  24. Matrix Determinant • The determinant of a square matrix A is a scalar given by • What is the determinant of

  25. Matrix Determinant • What about the determinant of the 3x3 matrix?

  26. Matrix Determinant • Using this result what is the determinant of

  27. Orthogonal an Orthonormal vectors • A collection of m-dimensional vectors, x1, x2,…, xp are orthogonal if… • The collection of vectors is said to be orthonormal if what 2 conditions are met?

  28. Linear Dependence • The p of m-dimensional vectors, , are linearly dependent if there is a set of constants, c1,c2,…,cp not all zero for which

  29. Linear Dependence The p of m-dimensional vectors, , are linearly dependent if there is a set of constants, c1,c2,…,cp not all zero for which Conversely, if no such set of non-zero constants exists, the vectors are linearly independent.

  30. Rank of a Matrix Row rank is the number of rows Column rank is the number of cols Find the column rank of How are row and column rank related? What does rank tell us about linear dependence of the vectors that make up the matrix?

  31. Orthogonal Matrices A square matrix Anxn is said to be orthogonal if its columns form an orthonormal set. This can be easily be determined by showing that

  32. Eigenvalues and Eigenvectors The eigenvalues of an Anxn matrix are the solutions to for a set of eigenvectors, . We typically normalize so that

  33. Quadratic Forms Given a symmetric matrix Anxn and an n-dimensional vector x, we can write the quadratic form as . For example, find the quadratic form where

  34. Trace Let A be an nxn matrix, the trace of A is given by Properties of the trace:

  35. Positive Definite Matrices A symmetric matrix A is said to be positive definite if this implies

  36. Positive Definite Matrices A real symmetric matrix is:

  37. Back to Random Vectors • Define Y as a random vector • Then the population mean vector is:

  38. Random Vectors Cont’d So Yiis a random variable whose mean and variance can be expressed by:

  39. Covariance of Random Vectors • We then define the covariance between the ith and jth trait in Y as • Yielding the covariance matrix

  40. Correlation Matrix of Y The correlation matrix for Y is

  41. Properties of a Covariance Matrix is symmetric (i.e. sij = sji for all i,j) is positive semi-definite for any vector of constants

  42. Linear Combinations • Consider linear combinations of the elements of Y • If Yhas mean m and covariance S, then

  43. Linear Combinations Cont’d If S is not positive definite then for at least one

More Related