Test Set Validation Revisited – Good Validation Practice in QSAR
Knut Baumann

Department of Pharmacy, University of Würzburg, Germany



Quantitative Structure-Activity Relationships

  • Build mathematical model: Activity = f(Structural Properties)

  • Use it to predict activity of novel compounds



Ultimate Goal of QSAR

  • Predictivity

  • Prerequisites:

    • Valid biological and structural data

    • Stable mathematical model

    • Exclusion of chance correlation and overfitting



Outline

  • Conditions for good external predictivity

  • Practice of external validation



Levels of Model Validity

  • Data fit

  • Internal predictivity → internal validation

  • External predictivity → external validation


Definition: Data Fit

The same data are used to build and to assess the model

→ Resubstitution error

[Figure: fitted vs. observed activity (both axes 3 to 10), GRID-PLS, R² = 0.94. Data: HEPT; n = 53]

R²: squared multiple correlation coefficient
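For reference, the squared multiple correlation coefficient is commonly written as follows. The slide does not show the formula, so the symbols are assumptions of this note: $y_i$ is the observed activity, $\hat{y}_i$ the fitted activity, and $\bar{y}$ the mean observed activity of the $n$ compounds.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```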


Definition: Internal Predictivity

A measure of predictivity (cross-validation, validation set prediction) that is used for model selection

[Figure: R² (fit) and R²_CV-1 (cross-validation) vs. number of PLS factors (0 to 10), GRID-PLS; the maximum of R²_CV-1 is marked. Data: HEPT; n = 53]

R²_CV-1: leave-one-out cross-validated squared correlation coefficient (Q²)
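The difference between data fit and leave-one-out cross-validation can be sketched in a few lines. This is a minimal illustration, not the GRID-PLS procedure of the talk: it assumes a single descriptor fitted by ordinary least squares, uses hypothetical toy data, and computes Q² as 1 − PRESS/TSS with TSS taken about the overall mean (one of several conventions in use).

```python
def mean(v):
    return sum(v) / len(v)


def fit_line(x, y):
    """Least-squares line y = a + b*x for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_fit(x, y):
    """Data fit: the same objects are used to build and to assess the model."""
    a, b = fit_line(x, y)
    my = mean(y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return 1.0 - rss / tss


def q2_loo(x, y):
    """Leave-one-out Q2: each object is predicted by a model built without it."""
    my = mean(y)
    press = 0.0
    for i in range(len(y)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    tss = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / tss


# Hypothetical toy data (not the HEPT set)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
print(f"R2 (fit) = {r2_fit(x, y):.3f}, Q2 (LOO) = {q2_loo(x, y):.3f}")
```

For least-squares models the leave-one-out residual of an object is never smaller in magnitude than its fitted residual, so Q² computed this way cannot exceed R².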



Definition: External Predictivity

A measure of predictivity (cross-validation, test set prediction) for a set of data that did not influence model selection

The activity values of the test set are concealed and not known to the user during model selection


Example: External Predictivity

[Figure: R² (fit), R²_CV-1 (cross-validation), and R²_Test (test set prediction) vs. number of PLS factors (0 to 10), GRID-PLS; the maxima of R²_CV-1 and R²_Test are marked. Data: HEPT; n = 53, nTest = 27]


Importance of Selection Criterion

[Figure: R² (fit), R²_CV-1 (cross-validation), and R²_Test (test set prediction) vs. number of PLS factors (0 to 35); the maximum of R² is marked. Data: HEPT; n = 53, nTest = 27]

Good external predictivity depends on the quality of the measure of predictivity used for model selection!



Usefulness of Internal Predictivity

Do internal measures of predictivity provide useful information?

It depends …


Case 1: No Model Selection

Multiple Linear Regression: R²_CV-1 ≈ R²_Test

[Equations: MSEP for cross-validation (CV) and for test set prediction (Test)]

MSEP: Mean squared error of prediction


Case 2: Little Model Selection

[Figure: R²_CV-1 (cross-validation) and R²_Test (test set prediction) vs. number of PLS factors (0 to 10), GRID-PLS]

Stable mathematical modelling technique & few models are compared: Internal ≈ External


Case 3: Extensive Model Selection

Here: Variable Subset Selection

[Figure: R²_CV-1 (internal) and R²_Test (external) vs. number of models evaluated (0 to 45,000); the maximum of R²_CV-1 is marked. Data: Steroids; n = 21, nTest = 9]

Extensive model selection → (danger of) overfitting → internal measures of predictivity are of limited usefulness
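How extensive model selection inflates an internal measure can be illustrated with a small simulation. The sketch below is not the variable subset selection used in the talk: it assumes pure-noise descriptors and pure-noise activities, picks the single descriptor with the highest leave-one-out Q² on the training objects, and then predicts a held-out test set. Only the set sizes (21 and 9) are borrowed from the slide; the number of candidate descriptors, the seed, and all function names are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def fit_line(x, y):
    """Least-squares line y = a + b*x for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_pred(y_obs, y_hat):
    """1 - squared prediction error / sum of squares about the mean of y_obs."""
    my = mean(y_obs)
    sse = sum((o - p) ** 2 for o, p in zip(y_obs, y_hat))
    return 1.0 - sse / sum((o - my) ** 2 for o in y_obs)


def q2_loo(x, y):
    """Leave-one-out cross-validated R2 for one descriptor."""
    y_hat = []
    for i in range(len(y)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        y_hat.append(a + b * x[i])
    return r2_pred(y, y_hat)


rng = random.Random(0)
n_train, n_test, n_desc = 21, 9, 500  # n_desc is an arbitrary choice

# Activities and descriptors are independent noise: there is nothing to learn.
y = [rng.gauss(0.0, 1.0) for _ in range(n_train + n_test)]
X = [[rng.gauss(0.0, 1.0) for _ in range(n_train + n_test)] for _ in range(n_desc)]
y_tr, y_te = y[:n_train], y[n_train:]
X_tr = [col[:n_train] for col in X]

# Model selection: keep the descriptor with the best internal Q2.
q2_all = [q2_loo(col, y_tr) for col in X_tr]
best = max(range(n_desc), key=q2_all.__getitem__)
q2_internal = q2_all[best]

# External check: the test activities were never seen during selection.
a, b = fit_line(X_tr[best], y_tr)
y_hat_te = [a + b * v for v in X[best][n_train:]]
r2_external = r2_pred(y_te, y_hat_te)

print(f"internal Q2 of selected model: {q2_internal:.2f}")
print(f"external R2 on test set:       {r2_external:.2f}")
```

Because the descriptors carry no information, any clearly positive internal Q² is produced by the selection step alone, and the held-out objects are expected to expose it.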



Outline

  • Conditions for good external predictivity

  • Practice of external validation



Meaningful External Validation

  • The two problems of external validation:

    • Data splitting

    • Variability


Problem 1: Data Splitting

[Figure: data table of structure descriptors and activity values, partitioned into a training set and a test set]

  • Techniques for splitting

    • Experimental design using descriptors → biased¹

    • Random partition → variability

→ Use multiple random splits into training and test sets

1) E. Roecker, Technometrics 1991, 33, 459-468.


Problem 2: Variability

nTest = 5: rel sdv(RMSEP) = 32%

nTest = 10: rel sdv(RMSEP) = 22%

nTest = 50: rel sdv(RMSEP) = 10%

RMSEP: Root mean squared error of prediction
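The three percentages agree with a standard large-sample approximation, given here as an assumption because the slide shows no formula: if the prediction errors are independent and normally distributed with zero mean, the relative standard deviation of RMSEP depends only on the test set size $n_{\mathrm{Test}}$.

```latex
\frac{\operatorname{sd}(\mathrm{RMSEP})}{\mathrm{RMSEP}} \approx \frac{1}{\sqrt{2\,n_{\mathrm{Test}}}}
\qquad
n_{\mathrm{Test}} = 5:\ \tfrac{1}{\sqrt{10}} \approx 0.32,\quad
n_{\mathrm{Test}} = 10:\ \tfrac{1}{\sqrt{20}} \approx 0.22,\quad
n_{\mathrm{Test}} = 50:\ \tfrac{1}{\sqrt{100}} = 0.10
```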


Problem 2: Variability

Example: Steroid data set, nTest = 9

RMSEP = 0.53 → R²_Test = 0.73

RMSEP ± 2 × sdv(RMSEP) = 0.53 ± 0.25

→ R²_Test = [0.40, 0.92]
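The interval can be retraced numerically. The sketch below assumes that the relative standard deviation of RMSEP is approximately 1/sqrt(2·nTest), which holds for independent, normally distributed errors and is not stated on the slide, and that R²_Test = 1 − RMSEP²/var(y), with var(y) back-calculated from the reported pair of values. Under these assumptions it reproduces the half-width of 0.25 and the upper limit of 0.92, and gives roughly 0.42 for the lower limit, close to the 0.40 shown; the slide may have used unrounded inputs or a slightly different calculation.

```python
import math


def rel_sd_rmsep(n_test):
    """Approximate relative sd of RMSEP for i.i.d. normal errors (assumption)."""
    return 1.0 / math.sqrt(2.0 * n_test)


def r2_from_rmsep(rmsep, var_y):
    """R2Test implied by an RMSEP for a given variance of the test activities."""
    return 1.0 - rmsep ** 2 / var_y


# Values reported on the slide
rmsep, r2_test, n_test = 0.53, 0.73, 9

# Variance of the test activities implied by the reported RMSEP and R2Test
var_y = rmsep ** 2 / (1.0 - r2_test)

# Two-standard-deviation band around RMSEP
half_width = 2.0 * rel_sd_rmsep(n_test) * rmsep
rmsep_lo, rmsep_hi = rmsep - half_width, rmsep + half_width

# A larger RMSEP means a smaller R2Test, so the limits swap
r2_lo = r2_from_rmsep(rmsep_hi, var_y)
r2_hi = r2_from_rmsep(rmsep_lo, var_y)

print(f"RMSEP = {rmsep:.2f} +/- {half_width:.2f}")
print(f"R2Test in [{r2_lo:.2f}, {r2_hi:.2f}]")
```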


Problem 2: Variability

Unless the test data set is huge (nTest ≥ 100):

→ Use multiple random splits into training and test sets
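Repeated random splitting can be sketched as follows. This is an illustration under stated assumptions, not the protocol of the talk: the model is a one-descriptor least-squares line, the data are simulated, and the set sizes (29 training, 15 test objects) and the 100 repetitions are example values. The function name `repeated_splits` is hypothetical. In a real study every model selection step would have to be repeated inside each split, using the training objects only.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def fit_line(x, y):
    """Least-squares line y = a + b*x for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def repeated_splits(x, y, n_test, n_splits=100, seed=0):
    """Return one test-set RMSEP per random training/test partition."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rmseps = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        # The model is built on the training objects only.
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err = [(y[i] - (a + b * x[i])) ** 2 for i in test]
        rmseps.append(math.sqrt(sum(sq_err) / n_test))
    return rmseps


# Simulated data: 44 objects, linear trend plus noise (hypothetical)
data_rng = random.Random(1)
x = [i / 4 for i in range(44)]
y = [0.8 * xi + data_rng.gauss(0.0, 0.5) for xi in x]

rmseps = repeated_splits(x, y, n_test=15)
avg = mean(rmseps)
sd = math.sqrt(sum((v - avg) ** 2 for v in rmseps) / (len(rmseps) - 1))
print(f"RMSEP over {len(rmseps)} splits: mean = {avg:.2f}, sd = {sd:.2f}, "
      f"range = [{min(rmseps):.2f}, {max(rmseps):.2f}]")
```

Reporting the mean together with the spread makes the dependence of a single test set result on the particular partition visible.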


Variability Illustrated I

[Figure: R²_Test vs. R²_CV-1 (both axes 0.3 to 1.0) for 100 random splits into n = 29 and nTest = 15, GRID-PLS; the mean is marked. Data: W84]


Variability Illustrated II

Influence of extensive model selection

[Figure: R²_Test vs. R²_CV (both axes -1.0 to 1.0) for 100 random splits into n = 29 and nTest = 15, comparing variable selection with GRID-PLS; the mean of each method is marked. Data: W84]

→ Extensive model selection causes instability



Financial Support

German Research Foundation: SFB 630 – TP C5

Conclusion

  • Internal predictivity must reliably characterize model performance

  • Avoid extensive model selection if possible

  • Do not use the activity values of the test set until the final model is selected

    • Model selection: variation of any operational parameter

  • Use multiple splits into training and test sets unless the test set is huge


Kubinyi Paradox Explained

Data: Log P


Definition: Data Fit

The same data are used to build and to assess the model

→ Resubstitution error

[Figure: fitted vs. observed values (both axes 4 to 8), GRID-PLS, R² = 0.99]

Usefulness: strongly biased


Internal Predictivity

[Figure: predicted vs. observed values (both axes 4 to 8) for fit and cross-validation, GRID-PLS; R² = 0.99, R²_CV-1 = 0.62]

Does internal predictivity provide useful information?

→ It depends!


Definition: Internal Predictivity

A measure of predictivity (cross-validation, test set prediction) that was used for model selection

[Figure: R² (fit) and R²_CV-1 (cross-validation) vs. number of PLS factors (0 to 10), GRID-PLS]

Usefulness: it depends …


Variability Illustrated

[Figure: R²_Test vs. R²_CV-1 (both axes 0.3 to 1.0); legend entries "data 26", "data 27", "data 28"]



Conclusion

  • Internal figures of merit in variable selection (VS) are largely inflated and cannot, in general, be trusted

  • The resulting models are far more complex than anticipated

  • VS is prone to chance correlation, in particular with LOO-CV and similar statistics as objective function

  • Rigorous validation is mandatory

    „Trau, Schau, Wem!“ – “Try before you trust”

  • Similar in spirit to:

    • “The Importance of Being Earnest”, Tropsha et al.
