PLS: PARTIAL-LEAST SQUARES
Slide 1

PLS: PARTIAL-LEAST SQUARES

  • PLS:
    - Partial-Least Squares
    - Projection to Latent Structures
    - Please listen to Svante Wold
  • Error Metrics
  • Cross-Validation:
    - LOO (leave-one-out)
    - n-fold X-Validation
    - Bootstrap X-Validation
  • Examples:
    - 19 Amino-Acid QSAR
    - Cherkassky's nonlinear function
    - y = sin|x| / |x|
  • Comparison with SVMs



Slide 3

IMPORTANT EQUATIONS FOR PLS

• t’s are scores or latent variables

• p’s are loadings

• w1 eigenvector of XTYYTX

• t1 eigenvector of XXTYYT

• w’s and t’s of deflations:

• w’s are orthonormal

• t’s are orthogonal

• p’s not orthogonal

• p’s orthogonal to earlier w’s

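These relations are easy to check numerically. Below is a minimal numpy sketch (an illustration, not part of the original slides): with a single response $y$, the matrix $X^T Y Y^T X$ is rank one, so its leading eigenvector is just $X^T y$ normalized, which is exactly the first NIPALS weight vector $w_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))      # 30 samples, 5 descriptors
y = rng.standard_normal(30)           # single response

# Leading eigenvector of X^T Y Y^T X (symmetric, so eigh applies).
M = np.outer(X.T @ y, X.T @ y)        # = X^T y y^T X for a single y
w1_eig = np.linalg.eigh(M)[1][:, -1]  # eigenvector of largest eigenvalue

# First NIPALS weight vector: X^T y normalized to unit length.
w1_nipals = X.T @ y
w1_nipals /= np.linalg.norm(w1_nipals)

# Identical up to an arbitrary sign flip.
assert np.allclose(np.abs(w1_eig), np.abs(w1_nipals))
```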



Slide 5

NIPALS ALGORITHM FOR PLS (with just one response variable y)

  • Start for a PLS component: $w = X^T y \,/\, \|X^T y\|$
  • Calculate the score $t$: $t = X w$
  • Calculate the inner coefficient $c$: $c = y^T t \,/\, (t^T t)$
  • Calculate the loading $p$: $p = X^T t \,/\, (t^T t)$
  • Store $t$ in $T$, store $p$ in $P$, store $w$ in $W$
  • Deflate the data matrix and the response variable: $X \leftarrow X - t\,p^T$, $y \leftarrow y - c\,t$

Repeat the steps above for each of the $h$ latent variables.

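A compact Python sketch of this loop (an illustration of the steps listed above, not the original course code):

```python
import numpy as np

def nipals_pls1(X, y, h):
    """NIPALS PLS with one response y: returns scores T, loadings P,
    weights W, and inner coefficients c, one column/entry per component."""
    X, y = X.astype(float).copy(), y.astype(float).copy()
    T, P, W, C = [], [], [], []
    for _ in range(h):
        w = X.T @ y
        w /= np.linalg.norm(w)         # start: w = X^T y / ||X^T y||
        t = X @ w                      # score t = X w
        tt = t @ t
        c = (y @ t) / tt               # c = y^T t / (t^T t)
        p = (X.T @ t) / tt             # loading p = X^T t / (t^T t)
        T.append(t); P.append(p); W.append(w); C.append(c)
        X -= np.outer(t, p)            # deflate the data matrix
        y -= c * t                     # deflate the response
    return (np.column_stack(T), np.column_stack(P),
            np.column_stack(W), np.asarray(C))
```

Predictions for new (centered) data then follow from the standard PLS identity $b = W (P^T W)^{-1} c$, giving $\hat{y} = X_{\text{new}}\, b$.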


Slide 6

The geometric representation of PLSR. The X-matrix can be represented as N points in the K-dimensional space where each column of X (x_k) defines one coordinate axis. The PLSR model defines an A-dimensional hyperplane, which, in turn, is defined by one line, one direction, per component. The direction coefficients of these lines are p_ak. The coordinates of each object, i, when its data (row i in X) are projected down on this plane, are t_ia. These positions are related to the values of Y.
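In code this projection is one line; a short sketch assuming the hypothetical nipals_pls1 helper above, where the score matrix holds the hyperplane coordinates and the same mapping projects new rows of X:

```python
import numpy as np

# Coordinates (scores) of data rows on the A-dimensional hyperplane.
# On the training data this reproduces T from nipals_pls1.
def pls_scores(X_new, W, P):
    return X_new @ W @ np.linalg.inv(P.T @ W)
```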


Slide 7

QSAR DATA SET EXAMPLE: 19 Amino Acids

From Svante Wold, Michael Sjöström, Lennart Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, Vol. 58, pp. 109-130 (2001)



Slide 8

INXIGHT VISUALIZATION PLOT



Slide 10

QSAR.BAT: SCRIPT FOR BOOTSTRAP VALIDATION FOR AAs
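The .bat script itself is not reproduced in this transcript. A rough Python equivalent of the bootstrap X-validation idea (resample the training rows with replacement, score on the out-of-bag rows; fit and predict are placeholders for any PLS fit, e.g. nipals_pls1 above) might look like:

```python
import numpy as np

def bootstrap_validate(X, y, fit, predict, n_boot=100, seed=0):
    """Average out-of-bag RMSE over n_boot bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, errors = len(y), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # draw with replacement
        oob = np.setdiff1d(np.arange(n), idx)  # rows left out of the draw
        if oob.size == 0:
            continue
        model = fit(X[idx], y[idx])
        resid = y[oob] - predict(model, X[oob])
        errors.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(errors))
```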


Slide 11

1 latent variable


Slide 12

2 latent variables


Slide 13

3 latent variables


Slide 14

1 latent variable

No aromatic AAs


Slide 16

KERNEL PLS HIGHLIGHTS

  • Invented by Rosipal and Trejo (Journal of Machine Learning Research, December 2001)
  • They first recast linear PLS in terms of eigenvectors of $X X^T$
  • They also made the NIPALS PLS formulation resemble PCA more closely
  • Now a nonlinear correlation (kernel) matrix $K(X X^T)$ is used rather than $X X^T$ itself
  • The kernel matrix contains nonlinear similarities between data points rather than their linear inner products
  • An example is the Gaussian kernel similarity measure: $K_{ij} = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$

Kernel PLS:

• the trick is a different normalization
• now the $t$'s rather than the $w$'s are normalized
• $t_1$ is the leading eigenvector of $K(XX^T)\, Y Y^T$
• subsequent $w$'s and $t$'s come from deflations of $XX^T$

Linear PLS:

• $w_1$ is the leading eigenvector of $X^T Y Y^T X$
• $t_1$ is the leading eigenvector of $X X^T Y Y^T$
• subsequent $w$'s and $t$'s come from deflations
• the $w$'s are orthonormal
• the $t$'s are orthogonal
• the $p$'s are not orthogonal
• the $p$'s are orthogonal to earlier $w$'s


Slide 18

1 latent variable

Gaussian Kernel PLS (sigma = 1.3)

With aromatic AAs


Slide 22

CHERKASSKY'S NONLINEAR BENCHMARK DATA

• Generate 500 datapoints (400 training; 100 testing) for:

Cherkas.bat


Slide 23

Bootstrap Validation Kernel PLS

8 latent variables

Gaussian kernel with sigma = 1


Slide 25

True test set for Kernel PLS

8 latent variables

Gaussian kernel with sigma = 1


Slide 26

Y = sin|x| / |x|

• Generate datapoints (100 training; 500 testing) for:
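A possible data-generation sketch for this example (the sampling interval is an assumption, since it is not stated on the slide; the split follows the slide):

```python
import numpy as np

def target(x):
    """y = sin|x| / |x| (equivalent to sin(x)/x away from x = 0)."""
    return np.sin(np.abs(x)) / np.abs(x)

rng = np.random.default_rng(0)
x_train = rng.uniform(-10.0, 10.0, size=100)   # 100 training points
x_test = rng.uniform(-10.0, 10.0, size=500)    # 500 test points
y_train, y_test = target(x_train), target(x_test)
```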


Slide 27

Comparison of Kernel-PLS with PLS

4 latent variables

sigma = 0.08

[Plot panels: PLS vs. Kernel-PLS]