Statistical Considerations in the Evaluation of Digital Pathology Devices

Hematology and Pathology Devices Panel Meeting

October 22-23, 2009

Shanti Gomatam, Ph.D.

Mathematical Statistician

FDA/CDRH/OSB/DBS

Q 0

Outline
  • Intended Use
  • Clinical Study Design Issues
  • Study Design Examples
  • Assessing Results
  • Precision Studies
Intended Use

The intended use under discussion is primary diagnosis of surgical pathology microscope slides in lieu of optical microscopy (OM).

Broad application -- not organ or disease specific.

The Intended Use Population (IUP) is the population of subjects on whom the device is intended to be used.

Supporting Evidence

Sponsors would be required to provide evidence supporting the safety and effectiveness of whole slide imaging (WSI) under its intended use.

  • Clinical studies assess how well WSI performs with respect to OM under clinical use.
  • Precision studies characterize imprecision (variability) in WSI results.
Supporting Evidence: Flowchart

[Flowchart: supporting evidence steps, including “Analyze Results” and “Establish Performance”]

Bias and Variance

[Figure: target diagrams illustrating low bias/high variance, large bias/low variance, and low bias/low variance]

Bias and Variance
  • Bias is about hitting the right target.
  • Variance or imprecision is about how close together your repeated attempts are.
  • Right data (right study design) helps reduce bias; more data does not help.
  • More data can help reduce uncertainty (imprecision).
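To make the distinction concrete, the following is a minimal simulation sketch (not part of the original presentation; the true value, bias, and sample sizes are arbitrary): collecting more data shrinks the spread of an estimator, but a biased estimator stays centered on the wrong value.

    import numpy as np

    rng = np.random.default_rng(0)
    TRUE_VALUE = 10.0   # hypothetical quantity being estimated
    N_STUDIES = 2000    # number of simulated repeat studies

    def simulate_estimates(sample_size, systematic_bias):
        """Mean estimates from repeated studies of a given size and systematic bias."""
        data = rng.normal(TRUE_VALUE + systematic_bias, 2.0, size=(N_STUDIES, sample_size))
        return data.mean(axis=1)

    for n in (10, 1000):
        unbiased = simulate_estimates(n, systematic_bias=0.0)
        biased = simulate_estimates(n, systematic_bias=1.5)  # e.g., a flawed study design
        print(f"n={n:4d}  unbiased mean={unbiased.mean():5.2f} (sd {unbiased.std():.3f})"
              f"  biased mean={biased.mean():5.2f} (sd {biased.std():.3f})")

    # More data reduces the spread (imprecision) of both estimators,
    # but the biased one remains offset from the true value of 10.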
Clinical Study Design Issues

Factors to consider:

  • Diagnostic reference standard
  • Time of specimen collection
  • Comparing modalities
  • Paired design
  • Reader design
  • Sample Selection
Clinical Study Design

Diagnostic Reference Standard (Reference Diagnosis)
  • Diagnostic accuracy is based on determining “truth” via a diagnostic reference standard (see FDA Diagnostic Guidance1).
  • The diagnostic reference standard allows accuracy to be determined (e.g., TP, FP, TN, FN).
  • The diagnostic reference standard should not be based on the device being evaluated for accuracy.
  • When the diagnostic reference is based on the control device (OM), the comparison can be biased.

1 FDA Guidance document: Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests.


Q 3.2

Clinical Study Design

Time of Specimen Collection

Prospective Studies

Prospective studies are those in which specimens (cases/slides) are prospectively collected and assessed by each modality (WSI or OM).

  • Prospective planning required.
  • Common protocol used across specimens.
  • Prospective studies less likely to be biased.
  • Study duration is potentially longer.
  • Final collection of study specimens may not contain all specimens of interest.

Q 3.2

Clinical Study Design

Time of Specimen Collection

Retrospective Studies

Retrospective studies are based on specimens that were previously collected from the patient.

  • Easier to enrich.
  • Potential for bias: selection criteria; hidden missing sample/data issues; variation in pre-analytical processes.
  • Potential lack of clinical, demographic, and other information for specimens (case/slide).

Clinical Study Design

Comparing Modalities
  • Best to compare WSI to OM (“control”) on same samples.
  • Avoid potential bias due to change in clinical practice, change in other time- or location-dependent factors.
  • Difficult to evaluate WSI without comparison to control device OM.

Clinical Study Design

Paired designs

When each specimen (case/slide) is tested with both WSI and OM the study design is paired.

  • Paired designs have good statistical properties: each specimen serves as its own control, which reduces variability in the comparison.

Design considerations:

  • Memory of the first reading can affect the next reading (insufficient washout).
  • Order of WSI and OM readings should be randomized.
Clinical Study Design

Paired Designs

[Diagram: each specimen is read under both modalities, either OM followed by WSI or WSI followed by OM, with a washout period separating the two reads (time axis)]
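Pre-specifying the randomized read order is straightforward; the sketch below is purely illustrative (specimen count and washout length are assumptions, not values from the presentation).

    import numpy as np

    rng = np.random.default_rng(2009)
    n_specimens = 20        # assumed number of study specimens
    washout_weeks = 2       # assumed washout period between the two reads

    # Balanced allocation of the two read orders, shuffled so the order is unpredictable.
    orders = rng.permutation(["OM then WSI", "WSI then OM"] * (n_specimens // 2))

    for i, order in enumerate(orders, start=1):
        print(f"Specimen S{i:03d}: {order}, second read after a {washout_weeks}-week washout")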


Clinical Study Design

Reader design
  • Pathologists are “readers” for this indication.
  • The reader effect makes a difference to the results obtained.
  • Reader designs:
    • every reader reads every specimen under every modality;
    • each reader reads a different subset of specimens under a single modality.
  • The first design is the most efficient.

Q 3.2

Clinical Study Design

Sample Selection
  • Non-representative samples may lead to conclusions that are not generalizable to the IUP (bias may be high and variance estimates may be incorrect).
  • Random selection from the IUP is the preferred statistical choice. Consecutive (sequential) selection from the IUP may be reasonable (under suitable conditions).
  • Enrichment may be necessary to have rare conditions represented in sufficient numbers.

Q 3.2

Clinical Study Design

Sample Selection
  • Adequate representation of non-disease and benign disease cases needed.
  • Factors to be considered while picking sample:
    • Organ/disease for which specimens are collected;
    • Type of specimen (needle biopsy, resection etc.);
    • Potential spectrum effect (level of difficulty -- case-mix);
    • Clinical center/site from which samples are obtained.
  • Ideally, the statistical mechanism for drawing specimens does not introduce bias; pre-specification is preferred.
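The sketch below contrasts pre-specified random selection, consecutive selection, and enrichment of a rare category; the accession log, category labels, and proportions are hypothetical.

    import numpy as np

    rng = np.random.default_rng(7)

    # Hypothetical accession log: 500 cases with a known diagnostic category.
    categories = rng.choice(["benign", "common_ca", "rare_ca"], size=500, p=[0.60, 0.35, 0.05])
    cases = [(f"C{i:04d}", dx) for i, dx in enumerate(categories, start=1)]

    def n_rare(sample):
        return sum(1 for _, dx in sample if dx == "rare_ca")

    random_sample = [cases[i] for i in rng.choice(len(cases), size=60, replace=False)]
    consecutive_sample = cases[:60]    # first 60 accessions
    # Enrichment: a consecutive block plus every available rare case (duplicates removed).
    enriched_sample = list(dict.fromkeys(cases[:40] + [c for c in cases if c[1] == "rare_ca"]))

    for name, sample in [("random", random_sample), ("consecutive", consecutive_sample),
                         ("enriched", enriched_sample)]:
        print(f"{name:11s}: {len(sample)} cases, {n_rare(sample)} rare")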

Study Design Examples

Common Elements of all Design Examples
  • Specimens picked from regular clinical practice at multiple sites.
  • Paired design; specimen order and read order are randomized.
  • Diagnostic reference standard available for statistical analysis.
  • Readers read de-identified specimens.
  • Results from specimens are compared on diagnoses.

Study Design Examples

Study I: Prospective Clinical Study
  • Prospective study using consecutive clinical specimens.
  • R pathologists at each site read all specimens at that site with WSI and OM, with appropriate washout.

Study Design Examples

Study II: Retrospective Enriched Clinical Study
  • Prospectively planned retrospective study using enriched clinical specimens randomly picked from those available.
  • R pathologists at each site read all specimens at all sites with WSI and OM.
  • A non-study pathologist reads specimens to implement the enrichment; study pathologists are blinded to the enrichment read.

Study Design Examples

Study III: Retrospective Clinical Study
  • Prospectively planned retrospective study using consecutive clinical specimens.
  • R pathologists at each site read all specimens at all sites with WSI and OM.

Study Design Examples

Study I: Prospective Clinical Study

Pros:

  • Representative of intended use
  • Ensures planning (prospective)
  • Common protocol (prospective)
  • Reduction in bias (prospective)

Cons:

  • Reader design not as efficient
  • Potential implementation challenges (prospective)
  • May take longer (non-enriched, prospective)
  • Reader behavior could be affected (multiple reads)

Study Design Examples

Study II: Retrospective Enriched Clinical Study

Pros:

  • Easier to implement (retrospective)
  • Potentially smaller sample size (enrichment)
  • Ensures some planning (prospectively planned)
  • Reader design efficient (all cases read with both modalities)

Cons:

  • Lack of common protocol (retrospective)
  • Potential bias (retrospective)
  • Reader behavior could be affected (enrichment + multiple reads)

Study Design Examples

Study III: Retrospective Clinical Study

Pros:

  • Ensures some planning
  • Potentially shorter duration (retrospective)
  • Potentially larger sample size (non-enriched)
  • Reader design efficient

Cons:

  • Lack of common protocol (retrospective)
  • Potential bias (retrospective)
  • Reader behavior could be affected (multiple reads)
Assessing Results
  • Attributes/measurements to be evaluated
  • Hypotheses on Attributes
  • Study success criterion
  • Study sizing

Assessing Results

Examples

Two organ systems will be used as examples in the following slides.

  • Breast: CAP Breast IC protocol checklist
  • Lung: CAP Lung IC Biopsy protocol checklist
Assessing Results

CAP Protocol for Breast IC: Macroscopic Elements
  • Specimen Type
  • Lymph Node Sampling
  • Specimen Size
  • Laterality
  • Tumor Site
Assessing Results

CAP Protocol for Breast IC (cont.): Microscopic Elements
  • Size of invasive component
  • Histologic Type (check all that apply):
    • ___ Noninvasive carcinoma (NOS)
    • ___ Ductal carcinoma in situ
    • ___ Lobular carcinoma in situ
    • ___ Other(s) (specify): ____________________________
    • ___ Carcinoma, type cannot be determined
Assessing Results

CAP Protocol for Breast IC (cont.): Microscopic Elements
  • Histologic Grade:
    • Nottingham Histologic Score (tubule formation; nuclear pleomorphism; mitotic count)

OR

    • Other Grading System + Mitotic count
  • Pathologic Staging
  • Margins
  • Venous/Lymphatic Invasion
  • Microcalcifications
  • Additional Pathologic Findings
Assessing Results

CAP Protocol for Lung IC Biopsy: Microscopic Elements
  • Histologic Type:
    • ___ Carcinoma, non-small cell type
    • ___ Small cell carcinoma
    • ___ Squamous cell carcinoma
    • ___ Other(s) (specify): ____________________________
    • ___ Carcinoma, type cannot be determined
Assessing Results

CAP Protocol for Lung IC Biopsy: Microscopic Elements
  • Histologic Grade:
    • ___ Not applicable
    • ___ GX: Cannot be assessed
    • ___ G1: Well differentiated
    • ___ G2: Moderately differentiated
    • ___ G3: Poorly differentiated
    • ___ G4: Undifferentiated
    • ___ Other (specify): ______
  • Visceral Pleura Invasion
  • Venous Invasion
  • Lymphatic Invasion
  • Additional Pathologic Findings

Assessing Results

Measurements
  • Measurements vary by tissue-type.
  • Measurements vary by pathological findings.
  • Many potential measurements exist per specimen.
  • What results/findings should one use to assess device performance?

Assessing Results

Selecting Measurements
  • Should one assess on the basis of the case (multiple slides) or a single whole slide?
  • Should microscopic and/or macroscopic findings be assessed?
  • The pathology report has multiple “lines” of results, each potentially containing information on type, grade, size, etc. On how many “lines” is it sufficient to assess agreement?

Assessing Results

Selecting Measurements
  • What fields within each “line” should be compared?
    • Histologic type
    • Histologic grade
    • Histologic determination of size (for case) using multiple slides
  • Results are tissue-type/disease dependent.

Assessing Results

Potential Measurements for Performance Comparison
  • Disease/non-disease status
  • Primary diagnosis only (main diagnosis for specimen); Some diagnoses from pathological evaluation; All diagnoses from pathological evaluation
  • Any of the above will have multiple measurements of different kinds: Type is nominal, grade is ordinal, size is interval, …

Q 3.5

Assessing Results

“Primary” and “Secondary” Measurements
  • Agreement on which measurements is key for regulatory decisions? (“Primary” measurements)
  • What additional comparisons are useful to report? (“Secondary” measurements)
Assessing Results

Assessing Accuracy: Scales

Accuracy and comparative performance can be assessed at various levels and for different outcomes:

  • On a binary scale (e.g., disease/non-disease)
  • On a nominal scale (e.g., histologic type)
  • On an ordinal scale (e.g., histologic grade)
  • On a continuous scale (e.g., tumor size or probability of being diseased)

Assessing Results

More on Assessing Accuracy
  • Sensitivity and specificity can be used for assessments on the binary scale.
  • Agreement on an ordinal scale can be evaluated using sensitivities/specificities conditional on category, and using ROC-based methods.
  • Many methods exist for assessing agreement on a continuous scale.
  • Nonparametric methods exist for assessing diagnostic accuracy on all scales.*

* Obuchowski (2005), Acad. Radiol.
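As a minimal illustration of the binary-scale case (the counts are made up, and the Wilson interval is just one common choice of confidence interval, not one prescribed here), sensitivity and specificity can be computed against the diagnostic reference standard as follows.

    import math

    def wilson_ci(successes, n, z=1.96):
        """Approximate 95% Wilson score interval for a proportion."""
        p = successes / n
        denom = 1 + z**2 / n
        center = (p + z**2 / (2 * n)) / denom
        half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return center - half_width, center + half_width

    # Hypothetical 2x2 counts of WSI calls against the diagnostic reference standard.
    tp, fn, fp, tn = 90, 10, 15, 185

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"Sensitivity = {sensitivity:.3f}, 95% CI = {wilson_ci(tp, tp + fn)}")
    print(f"Specificity = {specificity:.3f}, 95% CI = {wilson_ci(tn, tn + fp)}")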


Assessing Results

Assessing Nominal Accuracy
  • Histologic type is an important attribute for performance assessment.
  • KxK* table for nominal types
    • WSI with Reference

and

    • OM with Reference
  • Example using Breast IC histologic types

* K is the number of types of responses

Assessing Results

Assessing Nominal Accuracy: K by K Tables

[Table: example K x K cross-tabulation of breast IC histologic type calls]

NIC: Non-invasive carcinoma;

DCIS: ductal carcinoma in situ;

C,ND: Carcinoma, not determined.


Assessing Results

Assessing Nominal Accuracy
  • Can use percent “correct” calls for each of the K types.
  • If K is large, a large N is needed to power the estimates.
  • Can also reduce K by combining categories into subgroups.
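The per-category “correct” call rates above can be tallied directly from a K x K table; the counts below are hypothetical and use the breast IC histologic type categories from the CAP checklist shown earlier.

    import numpy as np

    # Hypothetical K x K table: rows = reference type, columns = WSI call.
    types = ["NIC (NOS)", "DCIS", "LCIS", "Other", "C,ND"]
    table = np.array([
        [40,  3,  1,  0, 1],
        [ 4, 55,  2,  1, 2],
        [ 1,  2, 30,  0, 1],
        [ 0,  1,  0, 12, 1],
        [ 1,  2,  1,  0, 9],
    ])

    for k, name in enumerate(types):
        row_total = table[k].sum()
        print(f"{name:10s} percent correct = {table[k, k] / row_total:.1%} (n = {row_total})")

    print(f"Overall agreement = {np.trace(table) / table.sum():.1%}")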

Assessing Results

Assessing Nominal Accuracy
  • If ordinal subgroups are possible, ordinal analyses can be used.
  • Differences between categories may also be definable in terms of clinical importance; this could reduce table size and create ordinal categories.
  • However, the loss of information should be considered when combining categories.

Assessing Results

Problems with Kappa and Overall Agreement
  • Not good as primary descriptive measures.
  • Summarize the KxK table by a single number, a severe reduction in information.
  • Depends on prevalence. Agreement between WSI and OM can change by changing proportion of diseased and non-diseased subjects (column totals).
  • “Not good for tests whose results are functions of reader variability.”*

* Obuchowski (2001), Stat. Med.


Assessing Results

Problems with Kappa and Overall Agreement

[Two example 2x2 tables compared]

                     Table 1    Table 2
Kappa                0.36       0.29
Overall agreement    91.6%      89%
Sensitivity          40%        40%
Specificity          94.4%      94.4%
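The prevalence dependence is easy to reproduce with hypothetical counts (these tables are not the ones behind the figures quoted above): with sensitivity and specificity held fixed, changing the proportion of diseased specimens changes both kappa and overall agreement.

    import numpy as np

    def kappa_and_agreement(table):
        """Cohen's kappa and overall agreement for a 2x2 table [[TP, FN], [FP, TN]]."""
        t = np.asarray(table, dtype=float)
        n = t.sum()
        p_obs = np.trace(t) / n                                   # observed agreement
        p_exp = (t.sum(axis=0) * t.sum(axis=1)).sum() / n**2      # chance-expected agreement
        return (p_obs - p_exp) / (1 - p_exp), p_obs

    # Sensitivity 40% and specificity ~94.4% in both tables; only prevalence differs.
    low_prevalence  = [[20, 30], [28, 472]]    #  50 diseased out of 550
    high_prevalence = [[60, 90], [25, 425]]    # 150 diseased out of 600

    for name, tbl in [("low prevalence", low_prevalence), ("high prevalence", high_prevalence)]:
        kappa, agreement = kappa_and_agreement(tbl)
        print(f"{name}: kappa = {kappa:.2f}, overall agreement = {agreement:.1%}")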


Assessing Results

Hypotheses and Study Success Criterion
  • Regulatory decisions on WSI based on WSI performance in comparison to OM.
  • What hypotheses are appropriate on “primary” and “secondary” measurements? Superiority? Non-inferiority?
  • What are acceptable definitions of study success criteria?

Assessing Results

Study Sizing
  • Study success criterion must be met.
  • The study is typically sized to provide adequate power for the hypotheses underlying the study success criterion.
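As one hypothetical illustration of sizing (the endpoint, agreement rate, margin, and error rates below are assumptions, not recommendations from the presentation), a normal-approximation formula for a one-sided test of an agreement proportion against a fixed performance goal gives a rough specimen count; real sizing would also have to account for readers, sites, and the paired structure.

    import math
    from statistics import NormalDist

    def n_for_agreement(p0, p1, alpha=0.025, power=0.90):
        """Approximate n for a one-sided test of H0: p <= p0 vs H1: p > p0,
        powered at an assumed true agreement p1 (normal approximation)."""
        z_a = NormalDist().inv_cdf(1 - alpha)
        z_b = NormalDist().inv_cdf(power)
        numerator = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
        return math.ceil((numerator / (p1 - p0)) ** 2)

    # Hypothetical success criterion: agreement must exceed 85%; assumed true agreement 92%.
    print(n_for_agreement(p0=0.85, p1=0.92), "specimens (ignoring reader and site clustering)")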
Precision Studies: Definition
  • CLSI* definition of precision: “measure of closeness of agreement between independent test/measurement results obtained under stipulated conditions.”
  • Studies to assess variability in WSI measurements when changes are made to important factors (sources of variability).
  • Repeatability and Reproducibility are considered extreme measures of precision.

* CLSI: Clinical and Laboratory Standards Institute

Precision Studies: Definition
  • Repeatability: Imprecision of measurements made under the same conditions (same pathologist, scanner, …).
  • Reproducibility: Imprecision of measurements made when conditions are varied to “largest” extent (different pathologists, scanners, laboratories, …).
  • Multiple studies can be used (varying or fixing various factors) to cover the range of precision measurements.
Q4

Precision Studies: Issues
  • What factors should be included in a precision study?
  • What specimens should be used in precision studies for WSI?
    • Representation of all tissue-types needed?
    • Representation of all potential specimens (e.g., needle biopsy, resections …) needed?
Precision Studies: Issues

Issues common to clinical study:

  • Sample selection
  • Measurements to be assessed
  • Study Sizing
Precision Studies: Non-continuous Measurements

Methods exist for precision assessment of continuous measurements, but there is no uniform agreement on methods for non-continuous (ordinal, qualitative) measurements.

Precision Study: Example
  • A precision study to characterize imprecision of histologic grade measurements by WSI across scanners.
  • 3 scanners, 2 pathologists, single site
  • Each pathologist does 60 reads (3 scans of, e.g., 20 slides) with washout.
  • Order of scans and order of de-identified slides is randomized.
  • Positive and negative “correct” call rates across scanners can be used to characterize imprecision.
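A sketch of how such reads might be tabulated (the slides, readers, and grade calls below are hypothetical): collect each pathologist's grade call per slide on each scanner and summarize across-scanner agreement.

    from collections import Counter

    # Hypothetical reads: (slide, pathologist) -> histologic grade called on each of 3 scanners.
    reads = {
        ("slide01", "pathologist_A"): ["G2", "G2", "G2"],
        ("slide01", "pathologist_B"): ["G2", "G3", "G2"],
        ("slide02", "pathologist_A"): ["G1", "G1", "G1"],
        ("slide02", "pathologist_B"): ["G1", "G1", "G2"],
    }

    all_agree = sum(len(set(calls)) == 1 for calls in reads.values())
    print(f"Complete across-scanner agreement: {all_agree}/{len(reads)} slide-reader pairs")

    # The modal call per slide-reader pair could serve as the comparator for
    # positive/negative "correct" call rates per scanner.
    for (slide, reader), calls in reads.items():
        modal_call, count = Counter(calls).most_common(1)[0]
        print(f"{slide}/{reader}: modal call {modal_call} in {count}/3 scans")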
Summary
  • Intended use drives studies needed for approval.
  • Study design is critical.
  • Differences in measurements for different tissue types and specimen collection procedures complicate assessment.
  • A collection of studies would probably be needed to assess different aspects of the device.
  • Comparative performance on all critical measurements should be evaluated.
  • Adequate sizing is important.