
Reproducing Kernel Hilbert Space (RKHS), Regularization Theory, and Kernel Methods


Presentation Transcript


  1. Reproducing Kernel Hilbert Space (RKHS), Regularization Theory, and Kernel Methods
  Shaohua (Kevin) Zhou
  Center for Automation Research, Department of Electrical and Computer Engineering, University of Maryland, College Park
  ENEE698A Graduate Seminar

  2. Overview
  • Reproducing Kernel Hilbert Space (RKHS)
    • From $R^N$ to RKHS
  • Regularization Theory with RKHS
    • Regularization Network (RN)
    • Support Vector Regression (SVR)
    • Support Vector Classification (SVC)
  • Kernel Methods
    • Kernel Principal Component Analysis (KPCA)
    • More examples

  3. Vector Space $R^N$
  • Positive definite matrix $S = [s_i(j)]$, i.e. $S = [s_1, s_2, \ldots, s_N]$ with $s_i(j)$ the $j$-th entry of column $s_i$
  • Eigensystem: $S = \sum_{n=1}^{N} \lambda_n \phi_n \phi_n^T$
  • Inner product: $\langle f, g \rangle = f^T S^{-1} g = \sum_n \lambda_n^{-1} f^T \phi_n \phi_n^T g = \sum_n \lambda_n^{-1} (f, \phi_n)(g, \phi_n)$, where $(u, v) = u^T v$ is the regular inner product
  • Two properties (checked numerically below):
    • $\langle s_i, s_j \rangle = s_i^T S^{-1} s_j = s_i^T e_j = s_i(j)$
    • $\langle s_i, f \rangle = s_i^T S^{-1} f = e_i^T f = f(i)$, with $f = [f(1), f(2), \ldots, f(N)]^T$
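
To make the two properties concrete, here is a minimal NumPy check; the matrix $S$, the vector $f$, and the indices are arbitrary illustrative choices, not from the slides.

```python
# Minimal numerical check of the two properties of the S^{-1}-weighted inner product.
import numpy as np

rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
S = A @ A.T + N * np.eye(N)          # an arbitrary positive definite matrix
S_inv = np.linalg.inv(S)

f = rng.standard_normal(N)           # an arbitrary vector f = [f(1), ..., f(N)]^T
i, j = 1, 3

# <s_i, s_j> = s_i^T S^{-1} s_j should equal s_i(j) = S[j, i] (= S[i, j] by symmetry)
print(S[:, i] @ S_inv @ S[:, j], S[j, i])

# <s_i, f> = s_i^T S^{-1} f should equal f(i)
print(S[:, i] @ S_inv @ f, f[i])
```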

  4. Reproducing Kernel Hilbert Space (RKHS)
  • Positive definite kernel function $k_x(\cdot) = k(x, \cdot)$
  • Mercer's theorem, eigensystem: $k(x, y) = \sum_{n=1}^{\infty} \lambda_n \phi_n(x) \phi_n(y)$ with $\sum_{n=1}^{\infty} \lambda_n^2 < \infty$
  • Inner product: $\langle f, g \rangle_H = \sum_n \lambda_n^{-1} (f, \phi_n)(g, \phi_n)$, where $(u, v) = \int u(y) v(y)\, dy$ is the regular inner product
  • Two properties:
    • $\langle k_x, k_y \rangle_H = k(x, y)$
    • $\langle k_x, f \rangle_H = f(x)$ (the reproducing property)

  5. More on RKHS
  • Let $f(y)$ be an element of the RKHS: $f(y) = \sum_{n=1}^{\infty} a_n \phi_n(y)$
  • Then $(f, \phi_n) = a_n$ and $\langle f, f \rangle_H = \sum_{n=1}^{\infty} \lambda_n^{-1} a_n^2$
  • One particular function: $f(y) = \sum_{i=1}^{n} c_i k(y, x_i)$. Is $f(y)$ in the RKHS?
  • Yes: $\langle f, f \rangle_H = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j k(x_i, x_j) = c^T K c < \infty$, with $c = [c_1, c_2, \ldots, c_n]^T$ and $K = [k(x_i, x_j)]$ the Gram matrix

  6. More on RKHS
  • Nonlinear mapping $\Phi: R^N \to R^{\infty}$, $\Phi(x) = [\lambda_1^{1/2} \phi_1(x), \ldots, \lambda_n^{1/2} \phi_n(x), \ldots]^T$
  • Regular inner product in the feature space $R^{\infty}$ (finite-dimensional example below):
    $(\Phi(x), \Phi(y)) = \Phi(x)^T \Phi(y) = \sum_{n=1}^{\infty} \lambda_n^{1/2} \phi_n(x)\, \lambda_n^{1/2} \phi_n(y) = k(x, y) = \langle k_x, k_y \rangle_H$
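
A finite-dimensional illustration of the same identity, assuming the polynomial kernel $k(x, y) = (x^T y)^2$ on $R^2$ (our choice, not from the slides), whose explicit feature map is only three-dimensional.

```python
# For k(x, y) = (x^T y)^2 on R^2, an explicit feature map Phi exists and
# Phi(x)^T Phi(y) reproduces k(x, y) exactly; the RKHS feature map above
# is in general infinite-dimensional.
import numpy as np

def phi(x):
    # explicit feature map for k(x, y) = (x^T y)^2 on R^2
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def k(x, y):
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(phi(x) @ phi(y), k(x, y))   # both equal (x^T y)^2 = 2.25
```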

  7. Kernel Choices
  • Gaussian (RBF) kernel: $k(x, y) = \exp(-\sigma^{-2} \|x - y\|^2)$
  • Polynomial kernel: $k(x, y) = ((x, y) + d)^p$
  • Construction rules (sketched below):
    • Covariance function of a Gaussian process
    • $k(x, y) = \int g(x, z) g(z, y)\, dz$
    • $k(x, y) = c$, $c > 0$
    • $k(x, y) = k_1(x, y) + k_2(x, y)$
    • $k(x, y) = k_1(x, y) \cdot k_2(x, y)$
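
A minimal sketch of the two named kernels and of the sum/product construction rules; the parameter names `sigma`, `d`, `p` follow the slide, and the default values are arbitrary.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def polynomial_kernel(x, y, d=1.0, p=2):
    # k(x, y) = ((x, y) + d)^p
    return (x @ y + d) ** p

# Closure rules: sums and products of kernels are again kernels.
def sum_kernel(x, y):
    return gaussian_kernel(x, y) + polynomial_kernel(x, y)

def product_kernel(x, y):
    return gaussian_kernel(x, y) * polynomial_kernel(x, y)
```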

  8. Regularization Theory
  • Regularization task: $\min_{f \in H} J(f) = \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda \langle f, f \rangle_H$, where $L$ is the loss function and $\langle f, f \rangle_H$ is a stabilizer
  • Optimal solution: $f(x) = \sum_{i=1}^{n} c_i k(x, x_i) = [k(x, x_1), \ldots, k(x, x_n)]\, c$
  • $\{h_i(x) = k(x, x_i);\ i = 1, \ldots, n\}$ are basis functions
  • The optimal coefficients $\{c_i;\ i = 1, \ldots, n\}$ depend on the loss function $L$ and on $\lambda$

  9. Regularization Network (RN)
  • RN assumes a quadratic loss function: $\min_{f \in H} J(f) = \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \langle f, f \rangle_H$
  • Find $\{c_i\}$ (sketched below):
    • $[f(x_1), f(x_2), \ldots, f(x_n)]^T = Kc$
    • $J(f) = (y - Kc)^T (y - Kc) + \lambda c^T K c$
    • $c = (K + \lambda I)^{-1} y$
  • Practical considerations:
    • An intercept term: $f(x) = \sum_{i=1}^{n} c_i k(x, x_i) + b$
    • Too many coefficients $\to$ support vector regression (SVR)
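
A small sketch of the regularization network solution $c = (K + \lambda I)^{-1} y$ on toy one-dimensional data with the Gaussian kernel; the data, `sigma`, and `lam` are illustrative choices.

```python
import numpy as np

def gram(X, Y, sigma=1.0):
    # Gaussian Gram matrix between the rows of X and the rows of Y
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / sigma**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 0.1
K = gram(X, X)
c = np.linalg.solve(K + lam * np.eye(len(X)), y)   # c = (K + lambda I)^{-1} y

X_test = np.linspace(-3, 3, 5)[:, None]
f_test = gram(X_test, X) @ c                        # f(x) = sum_i c_i k(x, x_i)
print(f_test)
```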

  10. Support Vector Regression (SVR)
  • SVR assumes an $\epsilon$-insensitive loss function: $\min_{f \in H} J(f) = \sum_{i=1}^{n} |y_i - f(x_i)|_\epsilon + \lambda \langle f, f \rangle_H$, with $|x|_\epsilon = \max(0, |x| - \epsilon)$
  • Primal problem: $\min J(f, \mu, \nu) = \sum_{i=1}^{n} (\mu_i + \nu_i) + \lambda \langle f, f \rangle_H$
    subject to (1) $f(x_i) - y_i \le \epsilon + \mu_i$; (2) $y_i - f(x_i) \le \epsilon + \nu_i$; (3) $\mu_i \ge 0$; (4) $\nu_i \ge 0$
  • Quadratic programming (QP) $\to$ dual problem (usage sketch below)
  • $x_i$ is called a support vector (SV) if its Lagrange multiplier is nonzero
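
As a hedged usage sketch, scikit-learn's `SVR` solves this $\epsilon$-insensitive problem through its dual QP; note that scikit-learn parameterizes the trade-off with `C`, which roughly plays the role of $1/\lambda$, and the toy data below are ours.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

# Samples with nonzero Lagrange multipliers are the support vectors.
print("number of support vectors:", len(model.support_))
print(model.predict(np.array([[0.0], [1.0]])))
```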

  11. Support Vector Classification (SVC)
  • SVC assumes a soft-margin loss function: $\min_{f \in H} J(f) = \sum_{i=1}^{n} |1 - y_i f(x_i)|_+ + \lambda \langle f, f \rangle_H$, with $|x|_+ = \max(0, x)$
  • Determine the label of $x$ as $\mathrm{sgn}(\sum_i c_i y_i k(x, x_i) + b)$
  • Primal problem: $\min J(f, \mu) = \sum_{i=1}^{n} \mu_i + \lambda \langle f, f \rangle_H$
    subject to (1) $1 - y_i f(x_i) \le \mu_i$; (2) $\mu_i \ge 0$
  • Quadratic programming (QP) $\to$ dual problem (usage sketch below)
  • $x_i$ is called a support vector (SV) if its Lagrange multiplier is nonzero
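
A parallel usage sketch with scikit-learn's `SVC` for the soft-margin problem; the two-Gaussian toy data are ours, and `C` again roughly plays the role of $1/\lambda$.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (30, 2)), rng.normal(+1, 0.5, (30, 2))])
y = np.array([-1] * 30 + [+1] * 30)

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print("number of support vectors:", len(clf.support_))
print(clf.decision_function(X[:3]))   # sum_i c_i y_i k(x, x_i) + b
print(clf.predict(X[:3]))             # sign of the decision function
```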

  12. Kernel Methods
  • General strategy of kernel methods:
    • A nonlinear mapping $\Phi: R^N \to R^{\infty}$ embedded in the kernel function
    • Linear learning methods employing geometry / linear algebra
    • Kernel trick: cast all computations as dot products

  13. Gram Matrix
  • Gram matrix (dot product matrix, kernel matrix):
    • The covariance matrix of the corresponding Gaussian process on any finite sample
    • Combines the information of the data and the kernel
    • Contains all the information needed by the kernel learning machine
  • $K = [k(x_i, x_j)] = [\Phi(x_i)^T \Phi(x_j)] = \Phi^T \Phi$, where $\Phi = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)]$ (sketch below)
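
A short sketch of building the Gram matrix $K = [k(x_i, x_j)]$ for a sample with the Gaussian kernel, then checking symmetry and positive semi-definiteness; the data and kernel width are arbitrary.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    # pairwise Gaussian kernel evaluations over the rows of X
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-d2 / sigma**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))                  # n = 6 samples in R^3
K = gaussian_gram(X)

print(K.shape)                                   # (6, 6)
print(np.allclose(K, K.T))                       # symmetric
print(np.all(np.linalg.eigvalsh(K) > -1e-10))    # positive semi-definite
```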

  14. Geometry in the RKHS
  • Distance in the RKHS:
    $(\Phi(x) - \Phi(y))^T (\Phi(x) - \Phi(y)) = \Phi(x)^T \Phi(x) + \Phi(y)^T \Phi(y) - 2\, \Phi(x)^T \Phi(y) = k(x, x) + k(y, y) - 2\, k(x, y)$
  • Distance to the center $\Phi_0 = \sum_{i=1}^{n} \Phi(x_i)/n = \Phi \mathbf{1}/n$ (sketched below):
    $(\Phi(x) - \Phi_0)^T (\Phi(x) - \Phi_0) = \Phi(x)^T \Phi(x) + \Phi_0^T \Phi_0 - 2\, \Phi(x)^T \Phi_0 = k(x, x) + \mathbf{1}^T \Phi^T \Phi \mathbf{1}/n^2 - 2\, \Phi(x)^T \Phi \mathbf{1}/n = k(x, x) + \mathbf{1}^T K \mathbf{1}/n^2 - 2\, g_\Phi(x)^T \mathbf{1}/n$
  • where $g_\Phi(x) = \Phi^T \Phi(x) = [k(x, x_1), \ldots, k(x, x_n)]^T$
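
A small sketch evaluating the distance-to-center formula purely from kernel values; the Gaussian kernel, the data, and the query point are illustrative.

```python
import numpy as np

def gaussian_k(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y)**2) / sigma**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
n = len(X)
K = np.array([[gaussian_k(a, b) for b in X] for a in X])
one = np.ones(n)

x = rng.standard_normal(2)
g = np.array([gaussian_k(x, xi) for xi in X])   # g_Phi(x) = [k(x, x_1), ..., k(x, x_n)]^T

# ||Phi(x) - Phi_0||^2 = k(x, x) + 1^T K 1 / n^2 - 2 g^T 1 / n
dist2 = gaussian_k(x, x) + one @ K @ one / n**2 - 2 * g @ one / n
print(dist2)
```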

  15. Geometry in the RKHS
  • Centered distance in the RKHS:
    $(\Phi(x) - \Phi_0)^T (\Phi(y) - \Phi_0) = \Phi(x)^T \Phi(y) + \Phi_0^T \Phi_0 - \Phi(x)^T \Phi_0 - \Phi(y)^T \Phi_0 = k(x, y) + \mathbf{1}^T K \mathbf{1}/n^2 - g_\Phi(x)^T \mathbf{1}/n - g_\Phi(y)^T \mathbf{1}/n$
  • Centered Gram matrix (sketched below):
    $\hat{K} = [\Phi(x_1) - \Phi_0, \ldots, \Phi(x_n) - \Phi_0]^T [\Phi(x_1) - \Phi_0, \ldots, \Phi(x_n) - \Phi_0] = [\Phi - \Phi \mathbf{1}\mathbf{1}^T/n]^T [\Phi - \Phi \mathbf{1}\mathbf{1}^T/n] = [\Phi Q]^T [\Phi Q] = Q^T \Phi^T \Phi Q = Q^T K Q$, where $Q = I_n - \mathbf{1}\mathbf{1}^T/n$
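
Centering the Gram matrix without ever forming $\Phi$: a sketch of $\hat{K} = Q^T K Q$ with $Q = I_n - \mathbf{1}\mathbf{1}^T/n$, checked by the fact that the rows and columns of $\hat{K}$ sum to zero. The data and kernel width are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
K = np.exp(-d2)                        # Gaussian Gram matrix, sigma = 1

n = len(X)
Q = np.eye(n) - np.ones((n, n)) / n    # centering matrix
K_hat = Q.T @ K @ Q                    # Q is symmetric, so this equals Q @ K @ Q

# Rows and columns of the centered Gram matrix sum to zero.
print(np.allclose(K_hat.sum(0), 0), np.allclose(K_hat.sum(1), 0))
```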

  16. Kernel Principal Component Analysis (KPCA)
  • Kernel PCA:
    • Mean: $\Phi_0 = \sum_{i=1}^{n} \Phi(x_i)/n = \Phi \mathbf{1}/n$
    • Covariance matrix: $C = n^{-1} [\Phi(x_1) - \Phi_0, \ldots, \Phi(x_n) - \Phi_0][\Phi(x_1) - \Phi_0, \ldots, \Phi(x_n) - \Phi_0]^T = n^{-1} [\Phi Q][\Phi Q]^T = n^{-1} \Psi \Psi^T$, with $\Psi = \Phi Q$
  • Eigensystem of $C$ via the 'reciprocal' matrix: $\Psi^T \Psi u = \hat{K} u = \lambda u$
    • $n^{-1} \Psi \Psi^T \Psi u = n^{-1} \lambda \Psi u$, so $C v = n^{-1} \lambda v$ with $v = \Psi u$
    • Normalization: $v^T v = u^T \hat{K} u = \lambda u^T u = \lambda$, so $\tilde{v} = \Psi u \lambda^{-1/2}$

  17. Kernel Principal Component Analysis (KPCA)
  • Eigen-projection (see the sketch below):
    $(\Phi(x) - \Phi_0)^T \tilde{v} = (\Phi(x) - \Phi_0)^T \Phi Q u \lambda^{-1/2} = \Phi(x)^T \Phi Q u \lambda^{-1/2} - \mathbf{1}^T \Phi^T \Phi Q u \lambda^{-1/2}/n = g_\Phi(x)^T Q u \lambda^{-1/2} - \mathbf{1}^T K Q u \lambda^{-1/2}/n$
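
A compact KPCA sketch following slides 16 and 17: eigendecompose the centered Gram matrix, rescale the eigenvectors by $\lambda^{-1/2}$, and project a new point using kernel evaluations only. The data, kernel width, and the choice of two components are illustrative.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / sigma**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
n = len(X)

K = gaussian_gram(X, X)
Q = np.eye(n) - np.ones((n, n)) / n
K_hat = Q @ K @ Q                          # centered Gram matrix

lams, U = np.linalg.eigh(K_hat)            # K_hat u = lambda u
order = np.argsort(lams)[::-1][:2]         # keep the two leading components
lams, U = lams[order], U[:, order]

# Eigen-projection of a new point x (slide 17):
# (Phi(x) - Phi_0)^T v~ = g(x)^T Q u / sqrt(lambda) - 1^T K Q u / (n sqrt(lambda))
x = rng.standard_normal((1, 2))
g = gaussian_gram(x, X)                    # [k(x, x_1), ..., k(x, x_n)]
one = np.ones(n)
proj = (g @ Q @ U) / np.sqrt(lams) - (one @ K @ Q @ U) / (n * np.sqrt(lams))
print(proj)                                # KPCA features of x
```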

  18. Kernel Principal Component Analysis (KPCA)
  [Figure: contour plots of kernel PCA features]

  19. More Examples of Kernel Methods
  • Examples:
    • Kernel Fisher Discriminant Analysis (KFDA)
    • Kernel K-Means Clustering
    • Spectral Clustering and Graph Cutting
    • Kernel …
    • Kernel Independent Component Analysis (KICA)?

  20. Summary of Kernel Methods
  • Pros and cons:
    • Nonlinear embedding (pro)
    • Linear algorithm (pro)
    • Large storage requirement (con)
    • Computational inefficiency (con)
  • Important issues:
    • Kernel selection and design
