# Test Set Validation Revisited – Good Validation Practice in QSAR

Knut Baumann

Department of Pharmacy, University of Würzburg, Germany

Quantitative Structure-Activity Relationships

• Build mathematical model: Activity = f(Structural Properties)
• Use it to predict activity of novel compounds

Model Validation

Ultimate Goal of QSAR

• Predictivity
• Prerequisites:
  • Valid biological and structural data
  • Stable mathematical model
  • Exclusion of chance correlation and overfitting

Outline

• Conditions for good external predictivity
• Practice of external validation

Levels of Model Validity

• Data fit
• Internal predictivity → internal validation
• External predictivity → external validation

Definition: Data Fit

[Figure: GRID-PLS, fitted versus observed activity values; R² = 0.94]

The same data are used to build and to assess the model

→ Resubstitution error

R²: squared multiple correlation coefficient

Data: HEPT; n = 53

Definition: Internal Predictivity

[Figure: GRID-PLS, R² (fit) and R²_CV-1 (cross-validation) versus number of PLS factors (0 to 10); the maximum of R²_CV-1 is marked]

A measure of predictivity (cross-validation, validation set prediction) that is used for model selection

R²_CV-1: leave-one-out cross-validated squared correlation coefficient (Q²)

Data: HEPT; n = 53
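The difference between data fit and internal predictivity can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a one-descriptor least-squares line on simulated data rather than GRID-PLS on the HEPT set, and the helper names (`fit_line`, `r2_fit`, `q2_loo`) are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r2_fit(x, y):
    """Resubstitution R^2: the same data build and assess the model."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst

def q2_loo(x, y):
    """Leave-one-out cross-validated R^2 (Q^2) = 1 - PRESS / SS_total."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / sst

# Simulated data (an assumption, not the HEPT set): one descriptor plus noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

r2 = r2_fit(x, y)
q2 = q2_loo(x, y)
print(f"R2 (fit) = {r2:.3f}, Q2 (LOO-CV) = {q2:.3f}")
```

For a least-squares model the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so Q² cannot exceed the resubstitution R².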

Definition: External Predictivity

A measure of predictivity (cross-validation, test set prediction) for a set of data that did not influence model selection

The activity values of the test set are concealed and not known to the user during model selection

Example: External Predictivity

[Figure: GRID-PLS, R² (fit), R²_CV-1 (cross-validation), and R²_Test (test set prediction) versus number of PLS factors (0 to 10); the maxima of R²_CV-1 and R²_Test are marked]

Data: HEPT; n = 53, nTest = 27

Importance of Selection Criterion

[Figure: R² (fit), R²_CV-1 (cross-validation), and R²_Test (test set prediction) versus number of PLS factors (0 to 35); the maximum of R² is marked]

Good external predictivity depends on the quality of the measure of predictivity used for model selection!

Data: HEPT; n = 53, nTest = 27

Usefulness of Internal Predictivity

Do internal measures of predictivity provide useful information?

It depends …

Case 1: No Model Selection

[Equations: MSEP for cross-validation (CV) and for test set prediction (Test)]

Multiple Linear Regression: R²_CV-1 ≈ R²_Test

MSEP: Mean squared error of prediction
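For reference, the usual textbook definitions of these quantities are sketched below. The notation is an assumption and may differ from the slide's: $\hat{y}_{(i)}$ denotes the prediction for compound $i$ from a model built without it, and $\bar{y}$ is the mean activity of the respective set.

```latex
\mathrm{MSEP}_{\mathrm{CV}}
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2,
\qquad
\mathrm{MSEP}_{\mathrm{Test}}
  = \frac{1}{n_{\mathrm{Test}}}\sum_{j=1}^{n_{\mathrm{Test}}}\bigl(y_j - \hat{y}_j\bigr)^2

R^2_{\mathrm{CV-1}}
  = 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2}
             {\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2},
\qquad
R^2_{\mathrm{Test}}
  = 1 - \frac{\sum_{j=1}^{n_{\mathrm{Test}}}\bigl(y_j - \hat{y}_j\bigr)^2}
             {\sum_{j=1}^{n_{\mathrm{Test}}}\bigl(y_j - \bar{y}\bigr)^2}
```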

Case 2: Little Model Selection

[Figure: GRID-PLS, R²_CV-1 (cross-validation) and R²_Test (test set prediction) versus number of PLS factors (0 to 10)]

Stable mathematical modelling technique & few models are compared

→ Internal ≈ External

Case 3: Extensive Model Selection

Here: Variable Subset Selection

[Figure: R²_CV-1 (internal) versus number of models evaluated (0 to 45000)]

[Figure: R²_CV-1 (internal) and R²_Test (external) versus number of models evaluated (0 to 45000); the maximum of R²_CV-1 is marked]

Extensive model selection → (danger of) overfitting → internal measures of predictivity are of limited usefulness

Data: Steroids; n = 21, nTest = 9
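The effect of extensive model selection can be reproduced on pure noise. The sketch below is an illustration under stated assumptions, not the variable subset selection used in the talk: it simulates random activities and 200 random descriptors, picks the single descriptor with the highest leave-one-out Q² on 21 training compounds, and then scores that model on 9 untouched test compounds. The set sizes mirror the steroid example; the function names and the number of descriptors are hypothetical.

```python
import random

def line_coeffs(x, y):
    """Least-squares line y = a + b*x; returns (a, b, mean_x, sxx)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b, mx, sxx

def loo_q2(x, y):
    """LOO Q^2 via the exact OLS identity e_loo = e / (1 - h),
    with leverage h = 1/n + (x_i - mean_x)^2 / Sxx."""
    n = len(x)
    a, b, mx, sxx = line_coeffs(x, y)
    my = sum(y) / n
    press = 0.0
    for xi, yi in zip(x, y):
        h = 1.0 / n + (xi - mx) ** 2 / sxx
        press += ((yi - a - b * xi) / (1.0 - h)) ** 2
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / sst

def r2_external(x_tr, y_tr, x_te, y_te):
    """R^2 on held-out compounds for the line fitted to the training set."""
    a, b, _, _ = line_coeffs(x_tr, y_tr)
    m_te = sum(y_te) / len(y_te)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x_te, y_te))
    sst = sum((yi - m_te) ** 2 for yi in y_te)
    return 1.0 - sse / sst

def one_run(rng, n_train=21, n_test=9, n_desc=200):
    n = n_train + n_test
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # activities are pure noise
    descs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_desc)]
    y_tr, y_te = y[:n_train], y[n_train:]
    # Model selection sees the training compounds only.
    best = max(descs, key=lambda d: loo_q2(d[:n_train], y_tr))
    q2 = loo_q2(best[:n_train], y_tr)
    r2t = r2_external(best[:n_train], y_tr, best[n_train:], y_te)
    return q2, r2t

rng = random.Random(0)
runs = [one_run(rng) for _ in range(20)]
mean_q2 = sum(q for q, _ in runs) / len(runs)
mean_r2_test = sum(r for _, r in runs) / len(runs)
print(f"mean Q2 of selected model:      {mean_q2:.2f}")
print(f"mean R2_Test of selected model: {mean_r2_test:.2f}")
```

Because the activities are random, any positive Q² here is a chance correlation created by the selection step, and the test set prediction exposes it.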

Outline

• Conditions for good external predictivity
• Practice of external validation

Meaningful External Validation

• The two problems of external validation:
  • Data splitting
  • Variability

Problem 1: Data Splitting

[Diagram: structure descriptors and activity values, split into a training set and a test set]

• Techniques for splitting
  • Experimental design using descriptors → biased¹
  • Random partition → variability

→ Use multiple random splits into training and test sets

1) E. Roecker, Technometrics 1991, 33, 459-468.

Problem 2: Variability

nTest = 5: rel. sdv(RMSEP) = 32%

nTest = 10: rel. sdv(RMSEP) = 22%

nTest = 50: rel. sdv(RMSEP) = 10%

RMSEP: Root mean squared error of prediction
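The three percentages are consistent with the large-sample approximation rel. sdv(RMSEP) ≈ 1/sqrt(2·nTest), which holds when prediction errors are independent and normally distributed. Whether the slide used exactly this formula is an assumption; the sketch below only shows that it reproduces the quoted values. The function name is hypothetical.

```python
import math

def rel_sd_rmsep(n_test):
    """Approximate relative standard deviation of RMSEP for n_test
    independent, normally distributed prediction errors."""
    return 1.0 / math.sqrt(2.0 * n_test)

for n_test in (5, 10, 50):
    print(f"nTest = {n_test:2d}: rel. sdv(RMSEP) = {100 * rel_sd_rmsep(n_test):.0f}%")
```

Halving the relative uncertainty requires four times as many test compounds, which is why small test sets give such unstable error estimates.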

Example: Steroid data set, nTest = 9

RMSEP = 0.53, R²_Test = 0.73

RMSEP ± 2 · sdv(RMSEP) = 0.53 ± 0.25

→ R²_Test ∈ [0.40, 0.92]
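The steroid numbers can be retraced step by step. The sketch below assumes the approximation sdv(RMSEP) ≈ RMSEP/sqrt(2·nTest) and the relation R²_Test = 1 − RMSEP²/var(y), with var(y) recovered from the reported pair (RMSEP = 0.53, R²_Test = 0.73). Both are assumptions about how the slide's interval was derived; the variable names are illustrative.

```python
import math

rmsep, r2_test, n_test = 0.53, 0.73, 9

# Two standard deviations of RMSEP under the normal-error approximation.
half_width = 2.0 * rmsep / math.sqrt(2.0 * n_test)
rmsep_low, rmsep_high = rmsep - half_width, rmsep + half_width

# R2_Test = 1 - RMSEP^2 / var_y, so the reported pair fixes var_y.
var_y = rmsep ** 2 / (1.0 - r2_test)

# A larger RMSEP gives a smaller R2_Test, hence the swapped bounds.
r2_low = 1.0 - rmsep_high ** 2 / var_y
r2_high = 1.0 - rmsep_low ** 2 / var_y

print(f"RMSEP = {rmsep:.2f} +/- {half_width:.2f}")
print(f"R2_Test range: [{r2_low:.2f}, {r2_high:.2f}]")
```

With the rounded inputs this gives a half-width of about 0.25 and a range of roughly 0.42 to 0.92, close to the interval on the slide; the small difference in the lower bound is presumably due to rounding of the reported values.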

Unless the test data set is huge (nTest ≥ 100):

→ Use multiple random splits into training and test sets
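Repeated random splitting can be sketched as follows. This is a minimal illustration under stated assumptions: simulated one-descriptor data and a least-squares line stand in for a real QSAR model, and the sizes (29 training, 15 test compounds, 100 splits) are chosen to echo the W84 example. The helper names are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def r2_external(a, b, x, y):
    """R^2 of the fixed line (a, b) on held-out compounds."""
    my = sum(y) / len(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst

def repeated_splits(x, y, n_test, n_splits, rng):
    """Return one R2_Test value per random training/test partition."""
    idx = list(range(len(x)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        # The model is built without looking at the test activities.
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(r2_external(a, b, [x[i] for i in test], [y[i] for i in test]))
    return scores

# Simulated data (an assumption): 44 compounds, one descriptor plus noise.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(44)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

scores = repeated_splits(x, y, n_test=15, n_splits=100, rng=rng)
print(f"R2_Test over 100 splits: mean = {statistics.mean(scores):.2f}, "
      f"sd = {statistics.stdev(scores):.2f}, "
      f"min = {min(scores):.2f}, max = {max(scores):.2f}")
```

Reporting the mean together with the spread shows how much a single split could have over- or understated the external predictivity.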

Variability Illustrated I

[Figure: GRID-PLS, R²_Test versus R²_CV-1 for 100 random splits into n = 29 and nTest = 15; the mean is marked]

Data: W84

Variability Illustrated II

Influence of extensive model selection

[Figure: R²_Test versus R²_CV for variable selection and for GRID-PLS, each with its mean marked; 100 random splits into n = 29 and nTest = 15]

→ Extensive model selection causes instability

Data: W84

Financial Support

German Research Foundation: SFB 630 – TP C5

Conclusion

• Internal predictivity must reliably characterize model performance
• Avoid extensive model selection if possible
• Do not use the activity values of the test set until the final model is selected
  • Model selection: variation of any operational parameter
• Use multiple splits into test and training set unless test set is huge

knut.baumann@chemometrix.de

Definition: Data Fit

[Figure: GRID-PLS, fitted versus observed values; R² = 0.99]

The same data are used to build and to assess the model

→ Resubstitution error

Usefulness: strongly biased

Internal Predictivity

[Figure: GRID-PLS, predicted versus observed values for fit (R² = 0.99) and cross-validation (R²_CV-1 = 0.62)]

Does internal predictivity provide useful information?

→ It depends!

Definition: Internal Predictivity

[Figure: GRID-PLS, R² (fit) and R²_CV-1 (cross-validation) versus number of PLS factors (0 to 10)]

A measure of predictivity (cross-validation, test set prediction) that was used for model selection

Usefulness: it depends …

Variability Illustrated

[Figure: R²_Test versus R²_CV-1]

Conclusion

• Internal figures of merit in variable selection (VS) are largely inflated and generally cannot be trusted
• The resulting models are far more complex than anticipated
• VS is prone to chance correlation, in particular with LOO-CV and similar statistics as the objective function
• Rigorous validation is mandatory

„Trau, Schau, Wem!“ – “Try before you trust”

• similar in spirit to:
• „The importance of being earnest“, Tropsha et al.

For a PDF reprint of the slides, email knut.baumann@chemometrix.de