
RENSSELAER - PowerPoint PPT Presentation






Presentation Transcript

• PLS: - Partial-Least Squares

• - Projection to Latent Structures

• - Please listen to Svante Wold

• Error Metrics

• Cross-Validation

• - LOO

• - n-fold X-Validation

• - Bootstrap X-Validation

• Examples:

• - 19 Amino-Acid QSAR

• - Cherkassky’s nonlinear function

• - y = sin|x|/|x|

• Comparison with SVMs

RENSSELAER

• t’s are scores or latent variables

• w₁ is the leading eigenvector of XᵀYYᵀX

• t₁ is the leading eigenvector of XXᵀYYᵀ

• later w’s and t’s come from deflations:

• w’s are orthonormal

• t’s are orthogonal

• p’s are not orthogonal

• p’s are orthogonal to earlier w’s
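The eigenvector claim above can be checked numerically. For a single response vector y, XᵀYYᵀX collapses to (Xᵀy)(Xᵀy)ᵀ, a rank-one matrix whose leading eigenvector is Xᵀy normalized, i.e. exactly the first NIPALS weight w₁. A minimal pure-Python sketch (the small X and y here are made-up illustration data):

```python
def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

# Made-up centered data: 4 samples, 2 predictors, 1 response.
X = [[-1.0, 2.0], [0.0, -1.0], [1.0, 0.0], [0.0, -1.0]]
y = [1.0, -1.0, 0.0, 0.0]

# First NIPALS weight: w1 = X^T y, normalized.
w1 = normalize(mat_vec(transpose(X), y))

# Power iteration on M = (X^T y)(X^T y)^T  (= X^T Y Y^T X for one response).
xty = mat_vec(transpose(X), y)
M = [[a * b for b in xty] for a in xty]
v = normalize([1.0] * len(xty))
for _ in range(50):
    v = normalize(mat_vec(M, v))

# The dominant eigenvector matches w1 up to sign.
aligned = abs(sum(a * b for a, b in zip(v, w1)))
print(round(aligned, 6))  # 1.0
```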


Do for h latent variables:

• Start for a PLS component (initialize u, e.g. as a column of Y)

• Calculate the score t

• Calculate c’

• Store t in T, store p in P, store w in W

• Deflate the data matrix and the response variable
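For a single response y the loop above needs no inner iteration: each pass computes w, t, c’ and p, stores them, and deflates X and y. A pure-Python sketch of this single-response (PLS1) case, assuming X and y are already mean-centered (the tiny data at the bottom is made up for illustration):

```python
def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def pls1_nipals(X, y, h):
    """Extract h PLS components from centered X (n x k) and response y (n)."""
    X = [row[:] for row in X]                          # work on copies
    y = y[:]
    W, T, P, C = [], [], [], []
    for _ in range(h):
        w = mat_vec(transpose(X), y)                   # weight direction X^T y
        nw = sum(wi * wi for wi in w) ** 0.5
        w = [wi / nw for wi in w]                      # normalize w
        t = mat_vec(X, w)                              # score t = X w
        tt = sum(ti * ti for ti in t)
        c = sum(ti * yi for ti, yi in zip(t, y)) / tt  # y-loading c'
        p = [pi / tt for pi in mat_vec(transpose(X), t)]  # x-loading p
        # Deflate the data matrix and the response:
        X = [[xij - ti * pj for xij, pj in zip(row, p)]
             for row, ti in zip(X, t)]
        y = [yi - c * ti for yi, ti in zip(y, t)]
        W.append(w); T.append(t); P.append(p); C.append(c)
    return W, T, P, C

# Tiny made-up centered example: extracted scores are orthogonal.
X = [[-1.0, 2.0, 0.5], [0.0, -1.0, -0.5], [1.0, 0.0, 1.0], [0.0, -1.0, -1.0]]
y = [1.0, -1.0, 0.0, 0.0]
W, T, P, C = pls1_nipals(X, y, 2)
dot12 = sum(a * b for a, b in zip(T[0], T[1]))
print(abs(dot12) < 1e-9)  # True: scores from successive deflations are orthogonal
```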


The geometric representation of PLSR. The X-matrix can be represented as N points in the K-dimensional space where each column of X (x_k) defines one coordinate axis. The PLSR model defines an A-dimensional hyperplane, which in turn is defined by one line, one direction, per component. The direction coefficients of these lines are p_ak. The coordinates of each object, i, when its data (row i in X) are projected down on this plane, are t_ia. These positions are related to the values of Y.

QSAR DATA SET EXAMPLE: 19 Amino Acids

From Svante Wold, Michael Sjöström, Lennart Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, Vol. 58, pp. 109-130 (2001)


INXIGHT VISUALIZATION PLOT


QSAR.BAT: SCRIPT FOR BOOTSTRAP VALIDATION FOR AAs

1 latent variable

2 latent variables

3 latent variables

1 latent variable

No aromatic AAs

KERNEL PLS HIGHLIGHTS

• Invented by Rosipal and Trejo (Journal of Machine Learning Research, December 2001)

• They first altered the linear PLS by dealing with eigenvectors of XXᵀ

• They also made the NIPALS PLS formulation resemble PCA more

• Now a nonlinear correlation matrix K(XXᵀ) is used rather than XXᵀ

• The nonlinear correlation matrix contains nonlinear similarities of datapoints rather than the linear inner products of XXᵀ

• An example is the Gaussian kernel similarity measure:
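As a sketch of that similarity measure: the Gaussian (RBF) kernel replaces each inner product with exp(−‖xᵢ − xⱼ‖² / (2σ²)), so identical points score 1 and distant points score near 0. The 2σ² denominator is one common convention; some formulations divide by σ directly, and the slides do not say which is used here.

```python
import math

def gaussian_kernel_matrix(X, sigma):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K

# Made-up 1-D points: similarity falls off with distance.
K = gaussian_kernel_matrix([[0.0], [1.0], [5.0]], sigma=1.3)
print(round(K[0][0], 3), round(K[0][1], 3), round(K[0][2], 3))  # 1.0 0.744 0.001
```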

Kernel PLS:

• the trick is a different normalization

• now t’s rather than w’s are normalized

• t₁ eigenvector of K(XXᵀ)YYᵀ

• w’s and t’s from deflations of XXᵀ

Linear PLS:

• w₁ eigenvector of XᵀYYᵀX

• t₁ eigenvector of XXᵀYYᵀ

• w’s and t’s from deflations:

• w’s are orthonormal

• t’s are orthogonal

• p’s not orthogonal

• p’s orthogonal to earlier w’s
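The different normalization can be sketched for a single response: each score t is taken from K applied to the (deflated) response and then normalized, and K itself is deflated as (I − ttᵀ)K(I − ttᵀ), which keeps successive t’s orthogonal. A pure-Python sketch under those assumptions (single y, symmetric K; a simplified reading of Rosipal and Trejo’s algorithm, not their full multi-response version — the small K and y are made up):

```python
def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def kernel_pls1(K, y, h):
    """Kernel NIPALS sketch for one response: normalize scores t, not weights w."""
    n = len(y)
    K = [row[:] for row in K]                     # work on copies
    y = y[:]
    T = []
    for _ in range(h):
        t = mat_vec(K, y)                         # t ~ K u with u = y
        nt = sum(ti * ti for ti in t) ** 0.5
        t = [ti / nt for ti in t]                 # normalize t (not w)
        T.append(t)
        # Deflate: K <- (I - t t^T) K (I - t t^T), y <- y - t (t^T y)
        Kt = mat_vec(K, t)                        # K t (K symmetric)
        tKt = sum(a * b for a, b in zip(t, Kt))
        K = [[K[i][j] - t[i] * Kt[j] - Kt[i] * t[j] + t[i] * tKt * t[j]
              for j in range(n)] for i in range(n)]
        ty = sum(a * b for a, b in zip(t, y))
        y = [yi - t[i] * ty for i, yi in enumerate(y)]
    return T

# Tiny made-up symmetric kernel matrix and response.
K = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]]
y = [1.0, -1.0, 0.0]
T = kernel_pls1(K, y, 2)
dot = sum(a * b for a, b in zip(T[0], T[1]))
print(abs(dot) < 1e-9)  # True: successive scores are orthogonal
```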

1 latent variable

Gaussian Kernel PLS (sigma = 1.3)

With aromatic AAs

CHERKASSKY’S NONLINEAR BENCHMARK DATA

• Generate 500 datapoints (400 training; 100 testing) for:

Cherkas.bat

Bootstrap Validation Kernel PLS

8 latent variables

Gaussian kernel with sigma = 1

True test set for Kernel PLS

8 latent variables

Gaussian kernel with sigma = 1

Y = sin|x| / |x|

• Generate 500 datapoints (100 training; 500 testing) for:
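Generating such data for y = sin|x|/|x| can be sketched as below. The input range [−10, 10] and the uniform sampling are assumptions; the slides do not state them. At x = 0 the function is given its limit value 1.

```python
import math
import random

def sinc_abs(x):
    """y = sin|x| / |x|, taking the limit value 1 at x = 0."""
    ax = abs(x)
    return 1.0 if ax == 0.0 else math.sin(ax) / ax

def make_dataset(n, lo=-10.0, hi=10.0, seed=0):
    """Sample n inputs uniformly on [lo, hi] (assumed range) with targets."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return xs, [sinc_abs(x) for x in xs]

train_x, train_y = make_dataset(100)            # 100 training points
test_x, test_y = make_dataset(500, seed=1)      # 500 test points
print(len(train_x), len(test_x), round(sinc_abs(0.0), 3))  # 100 500 1.0
```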

Comparison of Kernel-PLS with PLS

4 latent variables

sigma = 0.08

PLS

Kernel-PLS