A Formal Representation for Numerical Data Presented in Published Clinical Trial Reports. Maurine Tong BS, William Hsu PhD, Ricky K Taira PhD. Medical Imaging Informatics Group, University of California, Los Angeles.
Problem: Querying Free Text CTRs [Diagram labels: Clinical Trial Reports (CTRs); Representation; Query Processor; Informatics Applications: Patient Recruitment, Internal/External Validity Testing, Disease Modeling]
Why Focus on Numerical Info • Predictive disease modeling • Ex: Bayesian Belief Networks • Key to identifying trial quality • Hypothesis testing context and measures • Key to synthesizing evidence • What is the context for reported probabilities? • P(effect | cause, context) [Diagram labels: Patient Recruitment, Internal Validity, Disease Modeling]
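The role of context in P(effect | cause, context) can be made explicit with Bayes' rule, written here with every term conditioned on the same context. This is the standard identity and is not taken from the slides; it is included only to show that a probability reported in a trial cannot be reused without the context it was conditioned on.

```latex
P(\text{effect} \mid \text{cause}, \text{context})
  = \frac{P(\text{cause} \mid \text{effect}, \text{context}) \; P(\text{effect} \mid \text{context})}
         {P(\text{cause} \mid \text{context})}
```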
Background and Prior Work • Ontologies for Experiments and Clinical Trials • Ontology of Clinical Research (OCRe), Sim et al. • Ontology of Scientific Experiments (EXPO), Soldatova et al. • Standardizing and sharing clinical trial data • BRIDG, CDISC, SNOMED CT • Representing individual sections of a clinical trial report • Eligibility criteria: EliXR, Weng et al. • Scientific claims: Blake et al. • These systems primarily help to improve patient recruitment. Our focus is on modeling numerical information for quality assessment and disease modeling.
Methods: Requirements Analysis • What are the queries to be supported by the representation? Study Quality Disease Modeling
Methods: Requirements Analysis Study Quality • Study quality queries • What is the p-value (population parameter) associated with the hypothesis? • What is the statistical test used to calculate the p-value? • What is the power of the sample size tested? • … Consulted textbooks and experts • James Sayre, PhD, Biostatistician
Methods: Requirements Analysis Disease Modeling • Disease modeling queries • What are the prior probabilities? • Can we estimate posterior probabilities from p-values or other reported information? • … Consulted experts, textbooks and literature • Thomas Belin, PhD • Biostatistician
Methods: Initial Design • Conceptual model of representation • Domain: Metastatic Melanoma Flaherty KT. et al. N Engl J Med. 2010 Aug 26;363(9):809-19
[Figure: conceptual model of the representation, built up over five slides. Column headers: Pop. Stats, Sample Pop., Intervention, Baseline Measurements, … Highlighted components: (A) Process Model, (B) Global Variable List, (C) Variable Characterization, (D) Statistical Hypothesis Testing]
Example 1: Capturing context • Demonstration of how the representation captures context for the observations of an intervention group. • Query • Domain: Lung Cancer • In Johnson et al., what is the context (e.g., intervention, population characteristics, measurement methodology) associated with progression free survival (PFS) in the high dose group (HDG)? Johnson DH. et al. J Clin Oncol. 2004 Jun 1;22(11):2184-91.
Steps to Capture Context • 1. Find the node in the process model • 2. Find the corresponding column • 3. Find the variable of interest • 4. Backtrack through the process model to obtain context for the observations, and get the data associated with each backtracked node • 5. Construct a logical representation of the context • 6. Repeat steps 4-5 until the start node is reached
Step 1: Find the node in process model This node represents the progression free survival time point for high dose group.
Step 2: Find corresponding column This column represents the numerical data and data elements associated with this node
Step 4: Backtrack & Obtain Data Obtain context by looking at linked nodes in process model
Step 5: Construct logical context • Cell name: Bevacizumab • Cell Location #: 474 • Drug: Bevacizumab • Dose: 15 mg/kg • How was it administered: • Vehicle: Intravenous infusion • Duration: Over 90 minutes • Cycle: 3 weeks • Maximum dose: 18 doses • Exception: Well tolerated • Resulting Action: New duration • Duration: 30-60 minutes • Data modeling is straightforward from the semantics of the process model links and nodes
Step 6: Repeat steps 4-5 until start • Continue backtracking through process model • Aggregate associated data • Repeat until first node • Context for Adverse Event (Node #740): • Name of n847
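The backtracking procedure in the steps above can be sketched as a breadth-first walk over the predecessor links of the process model. This is a minimal sketch, not the authors' implementation: the `Node` class, the `capture_context` function, and the links between nodes are all assumptions made for illustration. Only the node numbers, labels, and the bevacizumab dose and cycle come from the slides.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical process model (structure is illustrative)."""
    node_id: int
    label: str
    data: dict = field(default_factory=dict)          # data elements from the node's column
    predecessors: list = field(default_factory=list)  # ids of upstream nodes


def capture_context(model, target_id):
    """Walk backwards from target_id until no predecessors remain.

    Returns (node_id, label, data) tuples in breadth-first order,
    so nodes closest to the target come first.
    """
    context = []
    seen = {target_id}
    queue = deque(model[target_id].predecessors)
    while queue:
        node_id = queue.popleft()
        if node_id in seen:
            continue  # a node reachable by two paths is reported once
        seen.add(node_id)
        node = model[node_id]
        context.append((node.node_id, node.label, node.data))
        queue.extend(node.predecessors)
    return context


# Node ids and labels follow the slides; the linear chain of links is assumed.
model = {
    847: Node(847, "Stage 3 Recurrent NSCLC"),
    628: Node(628, "No Prior Chemotherapy", predecessors=[847]),
    474: Node(474, "Bevacizumab",
              {"Dose": "15 mg/kg", "Cycle": "3 weeks"}, predecessors=[628]),
    740: Node(740, "Outcome observation", predecessors=[474]),
}

context = capture_context(model, 740)
for node_id, label, data in context:
    print(node_id, label, data)
```

A breadth-first order is used so that the aggregated context lists the most immediate context (the intervention) before more distant context (eligibility criteria); a depth-first walk would collect the same nodes in a different order.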
Example 1: Capturing context • Demonstration of how the representation captures context for the observations of an intervention group. • Query • What is the context (e.g., intervention, population characteristics, measurement methodology) associated with progression free survival (PFS) in the high dose group?
Example 1: Capturing context • Data: • AssociatedContext: Context for Adverse Event (Node #740): • 1) INTERVENTION: • Bevacizumab (Node #474) • 2) POPULATION CHARACTERISTICS: • High Dose Bev (Arm #3) • Eligibility Criteria: • Stage 3 Recurrent NSCLC (Node #847) • No Prior Chemotherapy (Node #628) • Other criteria (Node #748) • Baseline characteristics of the patient (Node #222) • 3) METHODS: • Progression Free Survival
Example 2: Comparisons • Comparison of outcomes in the intervention vs. control arms • Query • Compare PFS for intervention and control arm • Context from two nodes can be placed on the same chart
Example 3: Analyses • How was the p-value calculated? • Visualization includes: • Data • Test Statistics • P-value • Statement
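The four elements listed above (data, test statistic, p-value, statement) can be held together in one record so that the p-value never travels without the data and test that produced it. The sketch below is an assumption throughout: the slides do not say which test the trials used, so a two-proportion z-test stands in as an example, and the counts are made up and do not come from any cited trial.

```python
import math
from dataclasses import dataclass


@dataclass
class HypothesisTest:
    """Hypothetical record linking a p-value to its supporting elements."""
    data: dict
    test_name: str
    statistic: float
    p_value: float
    statement: str


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration only (responders / patients per arm).
data = {"intervention": (30, 100), "control": (15, 100)}
z, p = two_proportion_z_test(*data["intervention"], *data["control"])

record = HypothesisTest(
    data=data,
    test_name="two-proportion z-test (assumed)",
    statistic=z,
    p_value=p,
    statement="Response rate differs between arms" if p < 0.05
    else "No significant difference in response rate",
)
print(record)
```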
Pilot Evaluation • Can representation answer user queries from requirements analysis? • Preliminary evaluation questions • Characteristics of the trial • Quality of the trial • Significance of the science
Evaluation: Objectives • Objective 1 • Utility of the representation to accurately identify numerical data to support key contributions made by a clinical trial report • Objective 2 • Intuitiveness of the representation through reproducibility of the visualization by different users
Evaluation: Study Design • Study design • 2-arm study • Status quo group using paper copy • Intervention group using proposed representation • Participants (n=6) • Graduate students in biology, biostatistics, informatics, or engineering • Statistical methods • Student’s paired t-test • Gold standard • Established by graduate student supervised by domain expert • 4 clinical trial papers in NSCLC • J Clin Oncol. 2004 Jun 1;22(11):2184-91. • J Clin Oncol. 2008 May 20;26(15):2442-9. • Lancet Oncol. 2012 Jan;13(1):33-42. • J Clin Oncol. 2011 Nov 1;29(31):4113-20.
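For readers unfamiliar with the statistical method named above, a paired t-test reduces to a one-sample test on per-participant differences. The sketch below assumes each participant is scored under both conditions, which the slide does not state, and uses made-up scores for six participants. The function name and the scores are hypothetical; only the test itself and n=6 come from the slide.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t-test.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the paired differences.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Made-up numbers of correctly answered questions, one pair per participant.
paper_scores = [3, 2, 4, 3, 2, 4]
representation_scores = [4, 4, 5, 3, 4, 5]

t_stat, df = paired_t_statistic(paper_scores, representation_scores)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; that lookup is left out here to keep the sketch within the standard library.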
Evaluation: Questions • What is the purpose of this trial? • What is the sample size for each experimental arm? • How was the primary outcome assessed? • How many patients experienced positive outcomes in this trial? • How was the data analyzed?
Evaluation: Results • Users of the representation were able to accurately identify numerical data that support key contributions, as compared with the status quo • User visualizations were reproducible • 68.1% ± 6.45% of the gold standard was reproduced by users
Discussion • Our work supports queries related to study quality and disease modeling • We developed a representation to associate appropriate context with numerical data within clinical trial reports • The pilot evaluation shows that the utility of the representation is promising • To extend this work: • Instantiate using automatic methods and capture numerical data using NLP methods • Develop an interface to support frequently asked queries for specific clinical trial reports • Test in journal club setting
Conclusion • We are establishing a systematic method for extracting information from clinical trial reports into a machine-understandable form • The overarching objective is to have a computer reason on this representation to facilitate clinical decision making
Acknowledgements • James Sayre, PhD, Biostatistician • Domain experts • Research participants • NLM Training Grant • NLM R01-LM009961