Biologically Inspired Intelligent Systems

Biologically Inspired Intelligent Systems Lecture 7

This Week’s Lectures • PCA • Continuation of visual system modeling • Quiz (postponed from last Thursday) • Start on Chapter 6 (TT) and Chapter 4 (O’Reilly) • This Thursday’s quiz will be on PCA • NOTE: for assignments (not general email) • Use the following email address: CognitiveSystems@gmail.com

Principal Component Analysis

Topics covered • Standard Deviation • Variance • Covariance • Eigenvectors • Eigenvalues • PCA • Application of PCA - Eigenfaces

Standard Deviation
• Statistics – analyzing data sets in terms of the relationships between the individual points
• Standard Deviation is a measure of the spread of the data, e.g. [0 8 12 20] vs. [8 9 11 12]
• Calculation: roughly the average distance from the mean of the data set to a point
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} $$
• Denominator of n-1 for a sample and n for the entire population

Standard Deviation • For example [0 8 12 20] has s ≈ 8.33, [8 9 11 12] has s ≈ 1.83, and [10 10 10 10] has s = 0
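The calculation can be checked with a few lines of Python. This is a minimal sketch, assuming the sample (n - 1) form of the formula; the function name `sample_sd` is made up for illustration, and `statistics.stdev` from the standard library is used only as a cross-check. It should agree with the values on this slide to within rounding.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation, using the n - 1 denominator."""
    mean = sum(values) / len(values)
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


for data in ([0, 8, 12, 20], [8, 9, 11, 12], [10, 10, 10, 10]):
    # statistics.stdev also divides by n - 1, so it works as a cross-check.
    print(data, round(sample_sd(data), 2), round(statistics.stdev(data), 2))
```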

Variance • Another measure of the spread of the data in a data set • Calculation: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ • Why have both variance and SD to measure the spread of data? Variance is claimed to be the original statistical measure of spread. However, its unit is a square, e.g. cm², which is awkward for expressing heights or other measures. Hence SD, the square root of variance, was born.

Covariance • Variance – a measure of the deviation from the mean for points in one dimension, e.g. heights • Covariance is a measure of how much two dimensions vary from their means with respect to each other. • Covariance is measured between 2 dimensions to see if there is a relationship between them, e.g. number of hours studied & marks obtained. • The covariance between one dimension and itself is the variance

Covariance
$$ \operatorname{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1} $$
$$ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} $$
• So, if you had a 3-dimensional data set (x,y,z), then you could measure the covariance between the x and y dimensions, the y and z dimensions, and the x and z dimensions. Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z dimensions respectively.
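As a rough illustration of the two formulas, the sketch below computes the pairwise covariances of a 3-dimensional data set and checks that the covariance of a dimension with itself equals its variance. The data values and the helper name `covariance` are invented for the example.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical 3-dimensional data set, one list per dimension.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 8.0]
z = [9.0, 7.0, 6.0, 1.0]

print("cov(x, y) =", covariance(x, y))
print("cov(y, z) =", covariance(y, z))
print("cov(x, z) =", covariance(x, z))
# The covariance of a dimension with itself is its variance.
print("cov(x, x) =", covariance(x, x), "variance(x) =", statistics.variance(x))
```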

Covariance • What is the interpretation of covariance calculations? e.g. a 2-dimensional data set with x: number of hours studied for a subject, y: marks obtained in that subject. The covariance value is, say, 104.53. What does this value mean?

Covariance • The exact value is not as important as its sign. • A positive value of covariance indicates both dimensions increase or decrease together, e.g. as the number of hours studied increases, the marks in that subject increase. • A negative value indicates that while one increases the other decreases, or vice-versa, e.g. active social life at RIT vs performance in CS dept. • If covariance is zero, the two dimensions are uncorrelated: there is no linear relationship between them (independent dimensions always have zero covariance, but zero covariance alone does not prove independence), e.g. heights of students vs the marks obtained in a subject
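The sign rule can be seen on a small made-up example. In this sketch the lists `rising`, `falling` and `squared` are hypothetical: the first grows with x, the second shrinks as x grows, and the third is completely determined by x yet has zero covariance with it, which shows that a zero covariance only rules out a linear relationship.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


x = [-2, -1, 0, 1, 2]
rising = [2 * v + 1 for v in x]   # increases together with x
falling = [-3 * v for v in x]     # decreases as x increases
squared = [v * v for v in x]      # depends on x, but not linearly

print("rising :", covariance(x, rising))    # positive
print("falling:", covariance(x, falling))   # negative
print("squared:", covariance(x, squared))   # zero
```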

Covariance • Why bother with calculating covariance when we could just plot the 2 values to see their relationship? Covariance calculations are used to find relationships between dimensions in high dimensional data sets (usually greater than 3) where visualization is difficult.

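Covariance Matrix
• Representing covariance between dimensions as a matrix, e.g. for 3 dimensions:
$$ C = \begin{bmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{bmatrix} $$
• The diagonal holds the variances of x, y and z
• cov(x,y) = cov(y,x), hence the matrix is symmetrical about the diagonal
• n-dimensional data will result in an n x n covariance matrix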
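A covariance matrix can be assembled by applying the covariance formula to every pair of dimensions. The sketch below is one possible way to do it in plain Python; the data values and the names `covariance` and `covariance_matrix` are assumptions made for the example.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def covariance_matrix(dimensions):
    """n x n matrix of pairwise covariances for a list of n dimensions."""
    return [[covariance(a, b) for b in dimensions] for a in dimensions]


# Hypothetical 3-dimensional data set: one list per dimension (x, y, z).
dimensions = [
    [1.0, 2.0, 4.0, 7.0],
    [2.0, 3.0, 3.5, 8.0],
    [9.0, 7.0, 6.0, 1.0],
]

C = covariance_matrix(dimensions)
for row in C:
    print(["%8.3f" % value for value in row])
```

The diagonal entries are the variances, and the matrix equals its own transpose.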

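Transformation matrices
• Consider:
$$ \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix} \times \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 12 \\ 8 \end{bmatrix} = 4 \times \begin{bmatrix} 3 \\ 2 \end{bmatrix} $$
• The square transformation matrix transforms (3,2) from its original location. Now if we were to take a multiple of (3,2):
$$ 2 \times \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \end{bmatrix} $$
$$ \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix} \times \begin{bmatrix} 6 \\ 4 \end{bmatrix} = \begin{bmatrix} 24 \\ 16 \end{bmatrix} = 4 \times \begin{bmatrix} 6 \\ 4 \end{bmatrix} $$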

Transformation matrices • Scale vector (3,2) by a value 2 to get (6,4) • Multiply by the square transformation matrix • We see the result is still 4 times the input vector. WHY? A vector consists of both length and direction. Scaling a vector only changes its length and not its direction. This is an important observation about transformation matrices, leading to the notion of eigenvectors and eigenvalues. Irrespective of how much we scale (3,2) by, the result is always 4 times the scaled vector.
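The observation is easy to try out numerically. The following sketch multiplies the same transformation matrix by several multiples of (3,2); the helper `mat_vec` and the particular scale factors are chosen only for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = [[2, 3],
     [2, 1]]

for scale in (1, 2, 5, -3):
    v = [3 * scale, 2 * scale]
    result = mat_vec(A, v)
    # Whatever the scale, the transformed vector is 4 times the input vector.
    print(v, "->", result, result == [4 * x for x in v])
```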

Eigenvalue problem
• The eigenvalue problem is any problem having the following form: $A \cdot v = \lambda \cdot v$
  A: n x n matrix
  v: n x 1 non-zero vector
  λ: scalar
• Any value of λ for which this equation has a solution is called an eigenvalue of A, and a vector v which corresponds to this value is called an eigenvector of A.

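Eigenvalue problem
$$ \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix} \times \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 12 \\ 8 \end{bmatrix} = 4 \times \begin{bmatrix} 3 \\ 2 \end{bmatrix} $$
$$ A \cdot v = \lambda \cdot v $$
Therefore, (3,2) is an eigenvector of the square matrix A and 4 is an eigenvalue of A. Given matrix A, how can we calculate the eigenvectors and eigenvalues of A?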

Calculating eigenvectors & eigenvalues
Given $A \cdot v = \lambda \cdot v$:
$$ A \cdot v - \lambda \cdot I \cdot v = 0 $$
$$ (A - \lambda \cdot I) \cdot v = 0 $$
Finding the roots of $|A - \lambda \cdot I| = 0$ will give the eigenvalues, and for each of these eigenvalues there will be an eigenvector. Example …

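Calculating eigenvectors & eigenvalues
• If $A = \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix}$, then
$$ |A - \lambda \cdot I| = \left| \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} \right| = \begin{vmatrix} -\lambda & 1 \\ -2 & -3 - \lambda \end{vmatrix} = \lambda^2 + 3\lambda + 2 = 0 $$
This gives us 2 eigenvalues: λ1 = -1 and λ2 = -2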

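Calculating eigenvectors & eigenvalues
• For λ1 the eigenvector is found from $(A - \lambda_1 \cdot I) \cdot v_1 = 0$:
$$ \begin{bmatrix} 1 & 1 \\ -2 & -2 \end{bmatrix} \begin{bmatrix} v_{1:1} \\ v_{1:2} \end{bmatrix} = 0 $$
$$ -2 \, v_{1:1} - 2 \, v_{1:2} = 0 \quad\Rightarrow\quad v_{1:1} = -v_{1:2} $$
Therefore the first eigenvector is any column vector in which the two elements have equal magnitude and opposite sign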

Calculating eigenvectors & eigenvalues
• Therefore eigenvector v1 is
$$ v_1 = k_1 \begin{bmatrix} +1 \\ -1 \end{bmatrix} $$
where k1 is some constant. Similarly we find eigenvector v2:
$$ v_2 = k_2 \begin{bmatrix} +1 \\ -2 \end{bmatrix} $$
And the eigenvalues are λ1 = -1 and λ2 = -2
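For a 2 x 2 matrix the characteristic equation is a quadratic, $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$, so the worked example can be verified directly. The sketch below assumes real eigenvalues and uses made-up helper names (`eigenvalues_2x2`, `mat_vec`); it is not a general eigenvalue solver.

```python
import math


def eigenvalues_2x2(m):
    """Real eigenvalues of a 2 x 2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = [[0, 1],
     [-2, -3]]

lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)

# Check A . v = lambda . v for the eigenvectors with k1 = k2 = 1.
print(mat_vec(A, [1, -1]), [lam1 * 1, lam1 * -1])
print(mat_vec(A, [1, -2]), [lam2 * 1, lam2 * -2])
```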

Properties of eigenvectors and eigenvalues • Note that irrespective of how much we scale (3,2) by, the transformed vector is always 4 times the scaled vector. • Eigenvectors can only be found for square matrices, and not every square matrix has real eigenvectors. • An n x n matrix that does have eigenvectors has at most n linearly independent ones; a symmetric matrix, such as a covariance matrix, always has n.

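Properties of eigenvectors and eigenvalues
• Eigenvectors of a symmetric matrix (such as a covariance matrix) that belong to different eigenvalues are perpendicular to each other, no matter how many dimensions we have; this does not hold for square matrices in general.
• In practice eigenvectors are normalized to have unit length. Since the length of an eigenvector does not affect our calculations, we prefer to keep them standard by scaling them to have a length of 1, e.g. for eigenvector (3,2):
$$ \sqrt{3^2 + 2^2} = \sqrt{13}, \qquad \begin{bmatrix} 3 \\ 2 \end{bmatrix} \div \sqrt{13} = \begin{bmatrix} 3/\sqrt{13} \\ 2/\sqrt{13} \end{bmatrix} $$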
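Normalizing to unit length takes only a couple of lines. This is a small sketch of the (3,2) example; the function name `normalize` is an assumption for illustration.

```python
import math


def normalize(vector):
    """Scale a vector so that its Euclidean length becomes 1."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


unit = normalize([3, 2])
print(unit)                                   # (3/sqrt(13), 2/sqrt(13))
print(math.sqrt(sum(x * x for x in unit)))    # length of the result
```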

Matlab
    >> A = [0 1; 2 3]
    A =
         0     1
         2     3
    >> [v,d] = eig(A)
    v =
       -0.8719   -0.2703
        0.4896   -0.9628
    d =
       -0.5616         0
             0    3.5616
    >> help eig
    [V,D] = EIG(X) produces a diagonal matrix D of eigenvalues and a full matrix V whose columns are the corresponding eigenvectors so that X*V = V*D.

PCA • Principal components analysis (PCA) is a technique that can be used to simplify a data set • It is a linear transformation that chooses a new coordinate system for the data set such that the greatest variance by any projection of the data set comes to lie on the first axis (then called the first principal component), the second greatest variance on the second axis, and so on. • PCA can be used for reducing dimensionality by eliminating the later principal components.

PCA • By finding the eigenvalues and eigenvectors of the covariance matrix, we find that the eigenvector with the largest eigenvalue points along the direction in which the data set varies the most. • This is the first principal component. • PCA is a useful statistical technique that has found application in: • fields such as face recognition and image compression • finding patterns in data of high dimension.

PCA process – STEP 1 • Subtract the mean from each of the data dimensions. All the x values have the mean of x ($\bar{x}$) subtracted, and all the y values have the mean of y ($\bar{y}$) subtracted from them. This produces a data set whose mean is zero. Subtracting the mean makes variance and covariance calculation easier by simplifying their equations. The variance and covariance values are not affected by the mean value.

PCA process – STEP 1
Source: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

      DATA            ZERO MEAN DATA
    x      y            x       y
   2.5    2.4          .69     .49
   0.5    0.7        -1.31   -1.21
   2.2    2.9          .39     .99
   1.9    2.2          .09     .29
   3.1    3.0         1.29    1.09
   2.3    2.7          .49     .79
   2      1.6          .19    -.31
   1      1.1         -.81    -.81
   1.5    1.6         -.31    -.31
   1.1    0.9         -.71   -1.01
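Step 1 can be reproduced from the DATA columns with a short sketch. The helper name `center` is made up for this example; the numbers are the ones on the slide.

```python
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]


def center(values):
    """Subtract the mean of a dimension from each of its values."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


x_centered = center(x)
y_centered = center(y)

for cx, cy in zip(x_centered, y_centered):
    print("%6.2f %6.2f" % (cx, cy))
```

After centering, each dimension sums to zero up to floating-point rounding.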

PCA process – STEP 1 [Figure from the PCA tutorial: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf]

PCA process – STEP 2
• Calculate the covariance matrix
$$ \text{cov} = \begin{bmatrix} .616555556 & .615444444 \\ .615444444 & .716555556 \end{bmatrix} $$
• Since the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables increase together.
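The covariance matrix of the example data can be recomputed as follows. This is a sketch using the sample (n - 1) covariance, with a hypothetical helper `covariance`; the x and y lists are the original data from Step 1.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

cov = [[covariance(x, x), covariance(x, y)],
       [covariance(y, x), covariance(y, y)]]

for row in cov:
    print("%.9f  %.9f" % tuple(row))
```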

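PCA process – STEP 3
• Calculate the eigenvectors and eigenvalues of the covariance matrix
$$ \text{eigenvalues} = \begin{bmatrix} .0490833989 \\ 1.28402771 \end{bmatrix} \qquad \text{eigenvectors} = \begin{bmatrix} -.735178656 & -.677873399 \\ .677873399 & -.735178656 \end{bmatrix} $$
• Each column of the eigenvector matrix is the eigenvector for the eigenvalue in the same position.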
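For a symmetric 2 x 2 matrix the eigenvalues and eigenvectors have a closed form, which is enough to check the numbers on this slide. The sketch below assumes a non-zero off-diagonal entry, and `symmetric_eig_2x2` is an invented name; a real application would normally call a linear algebra library such as MATLAB's `eig`. Eigenvectors are only defined up to sign, so the signs may differ from the slide.

```python
import math


def symmetric_eig_2x2(m):
    """Eigenpairs of [[a, b], [b, d]] with b != 0, largest eigenvalue first."""
    (a, b), (_, d) = m
    centre = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (centre + radius, centre - radius):
        vx, vy = b, lam - a            # solves (a - lam) * vx + b * vy = 0
        length = math.hypot(vx, vy)
        pairs.append((lam, (vx / length, vy / length)))
    return pairs


cov = [[0.616555556, 0.615444444],
       [0.615444444, 0.716555556]]

(lam1, v1), (lam2, v2) = symmetric_eig_2x2(cov)
print(lam1, v1)
print(lam2, v2)
# The two eigenvectors of this symmetric matrix are perpendicular.
print("dot product:", v1[0] * v2[0] + v1[1] * v2[1])
```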

PCA process – STEP 3 [Figure from the PCA tutorial: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf] • Eigenvectors are plotted as diagonal dotted lines on the plot. • Note they are perpendicular to each other. • Note one of the eigenvectors goes through the middle of the points, like drawing a line of best fit. • The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line, but are off to the side of the main line by some amount.

PCA process – STEP 4 • Reduce dimensionality and form feature vector. The eigenvector with the highest eigenvalue is the principal component of the data set. In our example, the eigenvector with the largest eigenvalue was the one that pointed down the middle of the data. Once eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives you the components in order of significance.

PCA process – STEP 4 Now, if you like, you can decide to ignore the components of lesser significance. You do lose some information, but if the eigenvalues are small, you don’t lose much. • n dimensions in your data • calculate n eigenvectors and eigenvalues • choose only the first p eigenvectors • final data set has only p dimensions.
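One common way to pick p, shown here only as a sketch, is to keep the smallest number of components whose eigenvalues add up to a chosen fraction of the total. The function name `components_needed` and the 0.95 and 0.99 thresholds are assumptions for illustration, not part of the lecture; the eigenvalues are those from Step 3.

```python
def components_needed(eigenvalues, threshold):
    """Smallest p whose largest p eigenvalues reach `threshold` of the total."""
    ranked = sorted(eigenvalues, reverse=True)   # order by significance
    total = sum(ranked)
    cumulative = 0.0
    for p, value in enumerate(ranked, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return p
    return len(ranked)


eigenvalues = [0.0490833989, 1.28402771]

largest_share = max(eigenvalues) / sum(eigenvalues)
print("share of variance in the first component: %.4f" % largest_share)
print("p for a 0.95 threshold:", components_needed(eigenvalues, 0.95))
print("p for a 0.99 threshold:", components_needed(eigenvalues, 0.99))
```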

PCA process – STEP 4
• Feature Vector
  FeatureVector = (eig1 eig2 eig3 … eign)
We can either form a feature vector with both of the eigenvectors:
$$ \begin{bmatrix} -.677873399 & -.735178656 \\ -.735178656 & .677873399 \end{bmatrix} $$
or we can choose to leave out the smaller, less significant component and only have a single column:
$$ \begin{bmatrix} -.677873399 \\ -.735178656 \end{bmatrix} $$

PCA process – STEP 5
• Deriving the new data
  FinalData = RowFeatureVector x RowZeroMeanData
• RowFeatureVector is the matrix with the eigenvectors in the columns transposed, so that the eigenvectors are now in the rows, with the most significant eigenvector at the top
• RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in each column, with each row holding a separate dimension.

PCA process – STEP 5 • FinalData is the final data set, with data items in columns, and dimensions along rows. What will this give us? It will give us the original data solely in terms of the vectors we chose. We have changed our data from being expressed in terms of the axes x and y to being expressed in terms of our 2 eigenvectors.

PCA process – STEP 5
FinalData transpose: dimensions along columns

        x               y
   -.827970186     -.175115307
   1.77758033       .142857227
   -.992197494      .384374989
   -.274210416      .130417207
  -1.67580142      -.209498461
   -.912949103      .175282444
    .0991094375    -.349824698
   1.14457216       .0464172582
    .438046137      .0177646297
   1.22382056      -.162675287
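The projection FinalData = RowFeatureVector x RowZeroMeanData is an ordinary matrix product, sketched below with a hand-written `mat_mul` helper (a made-up name). The inputs are the eigenvectors and zero-mean data from the earlier steps.

```python
def mat_mul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


# Eigenvectors as rows, most significant (largest eigenvalue) first.
row_feature_vector = [[-0.677873399, -0.735178656],
                      [-0.735178656, 0.677873399]]

# Zero-mean data with one dimension per row and one data item per column.
row_zero_mean_data = [
    [0.69, -1.31, 0.39, 0.09, 1.29, 0.49, 0.19, -0.81, -0.31, -0.71],
    [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
]

final_data = mat_mul(row_feature_vector, row_zero_mean_data)

# Print the transpose so that each data item is on its own line.
for first, second in zip(*final_data):
    print("%12.9f %12.9f" % (first, second))
```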

PCA process – STEP 5 [Figure from the PCA tutorial: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf]

Reconstruction of original Data • If we reduced the dimensionality, obviously, when reconstructing the data we would lose those dimensions we chose to discard. In our example let us assume that we considered only the x dimension…

Reconstruction of original Data
Source: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

        x
   -.827970186
   1.77758033
   -.992197494
   -.274210416
  -1.67580142
   -.912949103
    .0991094375
   1.14457216
    .438046137
   1.22382056
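Because the eigenvectors have unit length and are perpendicular, the projection can be undone by multiplying each retained value by its eigenvector and adding the mean back. The sketch below does this for the single retained component; the means 1.81 and 1.91 are computed from the original x and y data, and the variable names are chosen for illustration. The reconstructed points all lie on the line through the mean along the principal eigenvector, so the variation along the discarded component is lost.

```python
principal_axis = (-0.677873399, -0.735178656)   # eigenvector with the largest eigenvalue
means = (1.81, 1.91)                            # means of the original x and y data

# Retained values along the first principal component (the x column above).
scores = [-0.827970186, 1.77758033, -0.992197494, -0.274210416, -1.67580142,
          -0.912949103, 0.0991094375, 1.14457216, 0.438046137, 1.22382056]

reconstructed = [
    tuple(score * axis + mean for axis, mean in zip(principal_axis, means))
    for score in scores
]

for rx, ry in reconstructed:
    print("%.4f %.4f" % (rx, ry))
```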

MATLAB DEMO

PCA applications -Eigenfaces • Eigenfaces are the eigenvectors of the covariance matrix of the probability distribution of the vector space of human faces • Eigenfaces are the ‘standardized face ingredients’ derived from the statistical analysis of many pictures of human faces • A human face may be considered to be a combination of these standard faces

PCA applications -Eigenfaces • To generate a set of eigenfaces: • A large set of digitized images of human faces is taken under the same lighting conditions. • The images are normalized to line up the eyes and mouths. • The eigenvectors of the covariance matrix of the statistical distribution of face image vectors are then extracted. • These eigenvectors are called eigenfaces.

PCA applications -Eigenfaces • the principal eigenface looks like a bland androgynous average human face http://en.wikipedia.org/wiki/Image:Eigenfaces.png

Eigenfaces – Face Recognition • When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face. • Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces • Hence eigenfaces provide a means of applying data compression to faces for identification purposes.

Ventral Visual Pathway EOR Pathway From Draper, Baek, Boody - 2002 Expert Object Recognition in Video Matt McEuen

EOR • Principal Component Analysis (PCA) • Based on covariance • Visual memory reconstruction • Images of cats and dogs are aligned so that the eyes are in the same position in every image

EOR [Figure: 64 x 64 image; 5-element vector]

References • PCA tutorial: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf • Wikipedia: http://en.wikipedia.org/wiki/Principal_component_analysis • Website for code: http://www.eng.man.ac.uk/mech/merg/Research/datafusion.org.uk/pca.html
