
Brief Introduction to PCA & SVD






  1. Brief Introduction to PCA & SVD 2008.09.09 Byung-Hyun Ha bhha@pusan.ac.kr

  2. Contents • Principal component analysis (PCA) • Singular value decomposition (SVD)

  3. Principal Component Analysis • Steps (each step is sketched in MATLAB on the slides that follow) • Get some data • Subtract the mean • Calculate the covariance matrix • Calculate the eigenvectors and eigenvalues of the covariance matrix • Choose components and form a feature vector • Derive the new data set • Get the old data back

  4. Principal Component Analysis • Get some data
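  The slide's data table was an image and did not survive extraction. As a stand-in, here is a small hypothetical 2-D data set in MATLAB (rows are observations, columns are variables) that the following steps can operate on:

      % Hypothetical example data, NOT the data from the original slide
      X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0;
           2.3 2.7; 2.0 1.6; 1.0 1.1; 1.5 1.6; 1.1 0.9];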

  5. Principal Component Analysis • Subtract the mean
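  A minimal MATLAB sketch of the centering step, continuing with the hypothetical X above (the implicit expansion in X - Xmean requires MATLAB R2016b or later):

      Xmean = mean(X);           % mean of each variable (column means)
      DataAdjust = X - Xmean;    % subtract the mean from every observation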

  6. Principal Component Analysis • Calculate the covariance matrix • Calculate the eigenvectors and eigenvalues of the covariance matrix
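  In MATLAB these two steps might look as follows; note that eig returns the eigenvalues in no particular order, so they are sorted in decreasing order here (a sketch, not the author's original code):

      C = cov(DataAdjust);                        % sample covariance matrix
      [W, D] = eig(C);                            % columns of W: eigenvectors; diag(D): eigenvalues
      [evals, order] = sort(diag(D), 'descend');  % largest eigenvalue (variance) first
      W = W(:, order);                            % reorder the eigenvectors to match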

  7. Principal Component Analysis • Choosing components and forming the feature vector (1) • Deriving the new data set (1) • FinalData1 = DataAdjust × FeatureVector1
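  A sketch of the projection when only one component is retained, continuing the names above:

      FeatureVector1 = W(:, 1);                    % feature vector = first eigenvector only
      FinalData1 = DataAdjust * FeatureVector1;    % new data set: n-by-1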

  8. Principal Component Analysis • Choosing components and forming the feature vector (2) • Deriving the new data set (2) • FinalData2 = DataAdjust × FeatureVector2 • Getting the old data back • DataAdjust' = FinalData2 × FeatureVector2T
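  The same with two components, plus the reconstruction; because the eigenvectors are orthonormal, multiplying by the transpose undoes the projection (exactly when all components are kept, approximately otherwise):

      FeatureVector2 = W(:, 1:2);                      % keep the first two eigenvectors
      FinalData2 = DataAdjust * FeatureVector2;        % new data set: n-by-2
      DataAdjustRec = FinalData2 * FeatureVector2';    % back in the original coordinates
      Xrec = DataAdjustRec + Xmean;                    % add the mean back: the old data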

  9. Principal Component Analysis • In theoretical terms: • The most important component w1 is obtained by maximizing the variance of the data projected onto it (see the formulation below) • Here, x is the random variable (vector) representing a data point • After the first k−1 components have been removed, the most important remaining component is wk • Finding all components that satisfy these conditions is an optimization problem, and it can be solved by computing the eigendecomposition of the covariance matrix C • C = XXT = WΛWT • Here, XT is the matrix whose rows are the data points, W is the eigenvector matrix, and Λ is a diagonal matrix
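  The formula images on this slide did not survive extraction; in the standard textbook formulation, the quantities referred to above are (assuming x has zero mean, i.e. the mean has already been subtracted):

      w_1 = \arg\max_{\|w\|=1} \mathrm{E}\{ (w^{\mathsf T} x)^2 \}

      \hat{x}_{k-1} = x - \sum_{i=1}^{k-1} w_i w_i^{\mathsf T} x,
      \qquad
      w_k = \arg\max_{\|w\|=1} \mathrm{E}\{ (w^{\mathsf T} \hat{x}_{k-1})^2 \}

      C = X X^{\mathsf T} = W \Lambda W^{\mathsf T}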

  10. Singular Value Decomposition • Singular value decomposition • Any m by n matrix A can be factored into A = Q1ΣQ2T = (orthogonal)(diagonal)(orthogonal). • The columns of Q1 (m by m) are eigenvectors of AAT, and the columns of Q2 (n by n) are eigenvectors of ATA. The r singular values on the diagonal of Σ (m by n) are the square roots of the nonzero eigenvalues of both AAT and ATA. • Example (by MATLAB), shown in economy-size form:

      [2 5 8 7]   [-0.54 -0.38  0.62 -0.34]   [21.72  0     0     0   ]   [-0.20 -0.47 -0.56 -0.66]
      [3 5 7 6]   [-0.50 -0.42 -0.23  0.42]   [ 0     3.81  0     0   ]   [-0.33  0.24 -0.73  0.55]
      [1 6 4 9] = [-0.51  0.82  0.15  0.00] x [ 0     0     1.43  0   ] x [-0.92  0.06  0.37 -0.08]
      [2 2 3 4]   [-0.26 -0.05 -0.65 -0.72]   [ 0     0     0     0.91]   [ 0.05  0.85 -0.13 -0.51]
      [2 4 4 5]   [-0.36  0.03 -0.36  0.44]
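  The factorization can be checked numerically; a minimal MATLAB sketch (svd with the 'econ' option returns the economy-size factors shown on the slide):

      A = [2 5 8 7; 3 5 7 6; 1 6 4 9; 2 2 3 4; 2 4 4 5];
      [U, S, V] = svd(A, 'econ');    % U: 5x4, S: 4x4 diagonal, V: 4x4
      norm(A - U*S*V')               % ~1e-14, i.e. exact up to rounding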

  11. Singular Value Decomposition • Applications • Image compression • Zeroing the smallest singular values of Σ = diag(21.72, 3.81, 1.43, 0.91) and multiplying the factors back together yields lower-rank approximations of A:

      All four singular values kept (rank 4) — A is reproduced exactly:
      [2.0 5.0 8.0 7.0]
      [3.0 5.0 7.0 6.0]
      [1.0 6.0 4.0 9.0]
      [2.0 2.0 3.0 4.0]
      [2.0 4.0 4.0 5.0]

      0.91 zeroed (rank 3):
      [2.0 5.3 8.0 6.8]
      [3.0 4.7 7.0 6.2]
      [1.0 6.0 4.0 9.0]
      [2.0 2.6 2.9 3.7]
      [2.0 3.7 4.1 5.2]

      1.43 and 0.91 zeroed (rank 2):
      [2.8 5.2 7.6 6.9]
      [2.7 4.7 7.2 6.2]
      [1.2 6.0 4.0 9.0]
      [1.2 2.7 3.3 3.6]
      [1.5 3.7 4.2 5.2]

      Only 21.72 kept (rank 1):
      [2.4 5.6 6.6 7.7]
      [2.2 5.1 6.0 7.1]
      [2.2 5.3 6.2 7.3]
      [1.1 2.7 3.1 3.7]
      [1.6 3.7 4.3 5.1]
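  A minimal MATLAB sketch of the same truncation, using the matrix A from the previous slide; by the Eckart-Young theorem each truncation is the best approximation of its rank, which is what makes this useful for image compression:

      A = [2 5 8 7; 3 5 7 6; 1 6 4 9; 2 2 3 4; 2 4 4 5];
      [U, S, V] = svd(A, 'econ');                       % economy-size SVD
      for k = 4:-1:1
          Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';    % keep the k largest singular values
          fprintf('rank %d approximation, error = %.2f\n', k, norm(A - Ak));
      end

  For a real image, storing the k leading columns of U and V plus k singular values takes k(m + n + 1) numbers instead of mn.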
