Test Set Validation Revisited – Good Validation Practice in QSAR Knut Baumann

Department of Pharmacy, University of Würzburg, Germany

Quantitative Structure-Activity Relationships

- Build mathematical model: Activity = f(Structural Properties)
- Use it to predict activity of novel compounds

Model

Validation

Ultimate Goal of QSAR

- Predictivity
- Prerequisites:
- Valid biological and structural data
- Stable mathematical model
- Exclusion of chance correlation and overfitting

Outline

- Conditions for good external predictivity
- Practice of external validation

Levels of Model Validity

- Data fit
- Internal predictivity → internal validation
- External predictivity → external validation

Definition: Data Fit

The same data are used to build and to assess the model

Resubstitution Error

[Figure: GRID-PLS; fitted vs. observed activity (both axes 3 to 10); R2 = 0.94]

R2: squared multiple correlation coefficient

Data: HEPT; n = 53
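For reference, the squared multiple correlation coefficient is commonly written as below. The slides do not spell the formula out, so the notation is an assumption: y_i are the observed activities, \hat{y}_i the fitted values from the model built on the same n compounds, and \bar{y} the mean observed activity.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Because the fitted values come from a model that has already seen every y_i, this quantity measures data fit only.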

Definition: Internal Predictivity

A measure of predictivity (cross-validation, validation set prediction) that is used for model selection

[Figure: GRID-PLS; R2 (fit) and R2CV-1 (cross-validation) vs. number of PLS factors (0 to 10); max. R2CV-1 marked]

R2CV-1: leave-one-out cross-validated squared correlation coefficient (Q2)

Data: HEPT; n = 53
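A minimal sketch of how a leave-one-out cross-validated Q2 differs from the plain fit R2. It uses a one-descriptor least-squares line instead of GRID-PLS, hypothetical toy data, and the convention Q2 = 1 - PRESS / SS_total about the overall mean; all three are assumptions made for illustration, not the setup behind the slides.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2(observed, predicted):
    """1 - (residual sum of squares) / (total sum of squares about the observed mean)."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def loo_predictions(x, y):
    """Predict each y[i] from a model fitted without compound i."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds


# Hypothetical toy data: one descriptor, roughly linear activity
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.3, 3.8, 5.2, 5.9]

a, b = fit_line(x, y)
r2_fit = r2(y, [a + b * xi for xi in x])   # same data builds and assesses the model
q2_loo = r2(y, loo_predictions(x, y))      # each compound predicted while left out

print(f"R2 (fit) = {r2_fit:.3f}, Q2 (LOO) = {q2_loo:.3f}")
```

For a least-squares model each leave-one-out residual is at least as large in magnitude as the corresponding fit residual, so Q2 cannot exceed R2 under this convention.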

Definition: External Predictivity

A measure of predictivity (cross-validation, test set prediction) for a set of data that did not influence model selection

The activity values of the test set are concealed and not known to the user during model selection

Example: External Predictivity

[Figure: GRID-PLS; R2 (fit), R2CV-1 (cross-validation) and R2Test (test set prediction) vs. number of PLS factors (0 to 10); max. R2CV-1 and max. R2Test marked]

Data: HEPT; n = 53, nTest = 27

Importance of Selection Criterion

[Figure: R2 (fit), R2CV-1 (cross-validation) and R2Test (test set prediction) vs. number of PLS factors (0 to 35); max. R2 marked]

Good external predictivity depends on the quality of the measure of predictivity used for model selection!

Data: HEPT; n = 53, nTest = 27

Usefulness of Internal Predictivity

Do internal measures of predictivity provide useful information?

It depends …

Case 1: No Model Selection

Multiple Linear Regression: R2CV-1 ≈ R2Test

[Formulas: MSEP for cross-validation (CV) and for test set prediction (Test)]

MSEP: Mean squared error of prediction

Case 2: Little Model Selection

[Figure: GRID-PLS; R2CV-1 (cross-validation) and R2Test (test set prediction) vs. number of PLS factors (0 to 10)]

Stable mathematical modelling technique & few models are compared

Internal ≈ External

Case 3: Extensive Model Selection

Here: Variable Subset Selection

[Figure: R2CV-1 (internal) and R2Test (external) vs. number of models evaluated (0 to 45000); max. R2CV-1 marked]

Extensive model selection → (danger of) overfitting

→ internal measures of predictivity are of limited usefulness

Data: Steroids; n = 21, nTest = 9
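The effect can be illustrated with a small simulation, sketched here under loud assumptions: activities and descriptors are pure random noise, the "model selection" is simply picking the single descriptor with the highest leave-one-out Q2 (a stand-in for variable subset selection, not the procedure used in the talk), and the sizes 21 and 9 only mirror the steroid split. The descriptor count of 200 is arbitrary.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2(observed, predicted):
    """1 - (residual sum of squares) / (total sum of squares about the observed mean)."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out cross-validated Q2 for a one-descriptor linear model."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return r2(y, preds)


random.seed(0)
n_train, n_test, n_descriptors = 21, 9, 200

# Pure noise: no descriptor carries any real information about the activity
y_train = [random.gauss(0, 1) for _ in range(n_train)]
y_test = [random.gauss(0, 1) for _ in range(n_test)]
X_train = [[random.gauss(0, 1) for _ in range(n_train)] for _ in range(n_descriptors)]
X_test = [[random.gauss(0, 1) for _ in range(n_test)] for _ in range(n_descriptors)]

# Internal measure is evaluated for every candidate and then used to pick one
q2_values = [q2_loo(x, y_train) for x in X_train]
best = max(range(n_descriptors), key=lambda j: q2_values[j])
best_q2 = q2_values[best]

# External measure: the test activities are touched only after the choice is made
a, b = fit_line(X_train[best], y_train)
r2_test = r2(y_test, [a + b * xi for xi in X_test[best]])

print(f"mean Q2 over all candidates: {mean(q2_values):.2f}")
print(f"Q2 of selected descriptor:   {best_q2:.2f}")
print(f"R2Test of selected model:    {r2_test:.2f}")
```

The selected Q2 is the maximum over many noisy estimates, so it is biased upward relative to a typical candidate even though nothing here is predictive; the more candidates are compared, the larger that optimism tends to become.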

Outline

- Conditions for good external predictivity
- Practice of external validation

Meaningful External Validation

- The two problems of external validation:
- Data splitting
- Variability

Problem 1: Data Splitting

[Diagram: structure descriptors and activity values are partitioned into a training set and a test set]

- Techniques for splitting
- Experimental design using descriptors: biased (ref. 1)
- Random partition: variability

Use multiple random splits into training and test sets

1) E. Roecker, Technometrics 1991, 33, 459-468.

Problem 2: Variability

nTest = 5: rel sdv(RMSEP) = 32%

nTest = 10: rel sdv(RMSEP) = 22%

nTest = 50: rel sdv(RMSEP) = 10%

RMSEP: Root mean squared error of prediction
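These percentages are consistent with the large-sample approximation for the relative standard deviation of a standard-deviation-type estimate under normally distributed errors, 1 / sqrt(2 * nTest). That the slide was computed with exactly this formula is an assumption; the sketch below only shows that it reproduces the three quoted values. The function name is hypothetical.

```python
import math


def rel_sd_rmsep(n_test):
    """Approximate relative standard deviation of RMSEP: 1 / sqrt(2 * n_test).

    Assumes independent, normally distributed prediction errors.
    """
    return 1.0 / math.sqrt(2.0 * n_test)


for n_test in (5, 10, 50):
    print(f"nTest = {n_test:2d}: rel sdv(RMSEP) ~ {100 * rel_sd_rmsep(n_test):.0f}%")
```

Under this approximation, halving the uncertainty requires four times as many test compounds.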

Problem 2: Variability

Example: Steroid data set, nTest = 9

RMSEP = 0.53, R2Test = 0.73

RMSEP ± 2 sdv(RMSEP) = 0.53 ± 0.25

R2Test = [0.40, 0.92]
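The steroid numbers can be approximately retraced as follows. This is a sketch under assumptions: the standard deviation of RMSEP is taken as RMSEP / sqrt(2 * nTest), and R2Test is taken as 1 - RMSEP^2 / (mean squared deviation of the test activities), with that deviation back-calculated from the quoted RMSEP and R2Test. Neither convention is stated in the slides.

```python
import math

# Values quoted for the steroid test set
n_test, rmsep, r2_test = 9, 0.53, 0.73

# Two standard deviations of RMSEP (assumed: sdv = RMSEP / sqrt(2 * n_test))
half_width = 2 * rmsep / math.sqrt(2 * n_test)

# Mean squared deviation of the test activities implied by the quoted pair
var_y = rmsep ** 2 / (1 - r2_test)


def r2_from_rmsep(value):
    """Convert an RMSEP value to R2Test using the implied activity variance."""
    return 1 - value ** 2 / var_y


r2_low = r2_from_rmsep(rmsep + half_width)    # worst case: largest error
r2_high = r2_from_rmsep(rmsep - half_width)   # best case: smallest error

print(f"RMSEP +/- 2 sdv = {rmsep:.2f} +/- {half_width:.2f}")
print(f"R2Test range    = [{r2_low:.2f}, {r2_high:.2f}]")
```

The half-width comes out at about 0.25 and the range at roughly 0.41 to 0.92, close to the quoted interval; the small gap at the lower end is presumably due to rounding of the inputs.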

Problem 2: Variability

Unless the test data set is huge (nTest ≥ 100):

Use multiple random splits into training and test sets
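A minimal sketch of repeated random splitting, assuming a one-descriptor least-squares model and hypothetical synthetic data; the sizes 29 and 15 and the 100 repetitions mirror the W84 illustration, but nothing else here comes from the talk. The helper name `random_splits` is made up for this example.

```python
import random
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmsep(observed, predicted):
    """Root mean squared error of prediction."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5


def random_splits(n_total, n_test, n_splits, seed=0):
    """Yield (train_indices, test_indices) for repeated random partitions."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        shuffled = list(range(n_total))
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: 44 compounds, linear trend plus noise
data_rng = random.Random(1)
x = [i / 4 for i in range(44)]
y = [0.8 * xi + data_rng.gauss(0, 0.5) for xi in x]

scores = []
for train, test in random_splits(len(x), n_test=15, n_splits=100):
    # The model is built on the training part only
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    scores.append(rmsep([y[i] for i in test], [a + b * x[i] for i in test]))

print(f"RMSEP over {len(scores)} splits: mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")
print(f"range = [{min(scores):.2f}, {max(scores):.2f}]")
```

Reporting the mean together with the spread across splits shows how much a single test-set figure could have differed by chance.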

Variability Illustrated I

[Figure: R2Test vs. R2CV-1 (both axes 0.3 to 1.0) for GRID-PLS; 100 random splits into n = 29 and nTest = 15; mean marked]

Data: W84

Variability Illustrated II

Influence of extensive model selection

[Figure: R2Test vs. R2CV (both axes -1.0 to 1.0) for variable selection and GRID-PLS; 100 random splits into n = 29 and nTest = 15; means marked]

Extensive model selection causes instability

Data: W84

Financial Support

German Research Foundation: SFB 630 – TP C5

Conclusion

- Internal predictivity must reliably characterize model performance
- Avoid extensive model selection if possible
- Do not use the activity values of the test set until the final model is selected (model selection: variation of any operational parameter)

- Use multiple splits into test and training set unless test set is huge

Kubinyi Paradox Explained

Data: Log P

Definition: Data Fit

The same data are used to build and to assess the model

Resubstitution Error

[Figure: GRID-PLS; fitted vs. observed (both axes 4 to 8); R2 = 0.99]

Usefulness: strongly biased

Internal Predictivity

[Figure: GRID-PLS; predicted vs. observed (both axes 4 to 8); fit: R2 = 0.99, cross-validation: R2CV-1 = 0.62]

Does internal predictivity provide useful information?

It depends!

Definition: Internal Predictivity

A measure of predictivity (cross-validation, test set prediction) that was used for model selection

[Figure: GRID-PLS; R2 (fit) and R2CV-1 (cross-validation) vs. number of PLS factors (0 to 10)]

Usefulness: it depends …

Variability Illustrated

[Figure: R2Test vs. R2CV-1 (both axes 0.3 to 1.0)]

Conclusion

- Internal figures of merit in variable selection (VS) are largely inflated and cannot, in general, be trusted
- The resulting models are far more complex than anticipated
- VS is prone to chance correlation, in particular with LOO-CV and similar statistics as objective function

- Rigorous validation is mandatory
"Trau, schau, wem!" ("Try before you trust")

- Similar in spirit to:
- "The Importance of Being Earnest", Tropsha et al.

For a PDF-reprint of the slides email to: [email protected]