data collection and management for clinical research n.
Skip this Video
Loading SlideShow in 5 Seconds..
Data Collection and Management for Clinical Research PowerPoint Presentation
Download Presentation
Data Collection and Management for Clinical Research

Loading in 2 Seconds...

play fullscreen
1 / 73

Data Collection and Management for Clinical Research - PowerPoint PPT Presentation

  • Uploaded on

Data Collection and Management for Clinical Research. Michael A. Kohn, MD, MPP 25 August 2008. Clinical Research*. Choose the study design, and define the study population, predictor variables, and outcome variables; measure these variables and anticipate problems with measurement;

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

Data Collection and Management for Clinical Research

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data collection and management for clinical research

Data Collection and Management for Clinical Research

Michael A. Kohn, MD, MPP

25 August 2008

clinical research
Clinical Research*
  • Choose the study design, and define the study population, predictor variables, and outcome variables;
  • measure these variables and anticipate problems with measurement;
  • analyze the results

Today, we discuss the “nitty gritty” of collecting, storing, updating, and monitoring the study measurements.

*Private companies that make data management systems for clinical research understand “clinical research” to include only RCTs preparatory to FDA drug or device approval, not observational studies.

  • Paper data collection forms and data tables (rows and columns)
  • “Normalization” of tables
  • Online data collection forms
  • Data entry options for coded responses
  • Questionnaires
  • Data management protocol
  • Queries (?)
one sentence description of study
One-Sentence Description of Study

The [cute acronym] study is a [design] study of the association between [predictor] and [outcome] in [study population].


The PHTSE (Pre-Hospital Treatment of Status Epilepticus) Study was a randomized blinded trial of lorazepam, diazepam, or placebo in the treatment of pre-hospital status epilepticus. The primary endpoint was termination of convulsions by hospital arrival. To enroll patients, paramedics contacted base hospital physicians by radio.

Lowenstein DH, et al. Control Clin Trials. 2001 Jun;22(3):290-309.

Alldredge BK, et al. N Engl J Med. 2001 Aug 30;345(9):631-7.


The paper data collection forms were completed by busy base hospital physicians who were called from the emergency department to a radio room. What are the advantages and disadvantages of using an on-screen computer form instead of a paper form? If you designed the study, which would you use?

advantages of an on screen form
Advantages of an on-screen form
  • No need for transcription from paper forms into the computer data tables
  • Immediate feedback on invalid entries
  • Programmed skip logic (if seizure stopped during prehospital course, computer form prompts for time seizure stopped, otherwise this field is disabled and skipped)
  • Can be made available via a web-browser at multiple sites simultaneously
disadvantages of an on screen form
Disadvantages of an on-screen form
  • Hardware requirement -- a computer workstation
  • Some user training required
advantages of a paper form
Advantages of a paper form
  • Ease and speed of use
  • Portability
  • Ability to enter unanticipated information or unstructured data (notes in the margin, responses that were not otherwise considered, etc.)
  • Hardware requirement – a pen
  • User training received by all data entry personnel in elementary school
disadvantages of a paper form
Disadvantages of a paper form
  • Requires subsequent transcription into the computer database
  • No interactive feedback or automated skip logic
  • Data viewing and entry limited to one person in one place
which would you choose
Which would you choose?

Although data entry via on-screen data collection forms has many advantages and we recommend it for most research studies, in this study it is impractical. The simplest, fastest, and most user-friendly way to capture data on a non-volatile medium is still to use a pen and paper.

paper data collection forms
Paper Data Collection Forms
  • Paper forms filled out by an examiner or interviewer
  • Paper forms filled out by the patient or subject

How do you get the data off of the paper forms and into the data table(s)?

Study data are stored in one or more data tables in which –

data tables
Data Tables

Study data are stored in one or more data tables in which –

Rows = Records = Entities

Columns = Fields = Attributes

rows records entities
Rows = Records = Entities

Columns = Fields = Attributes


Jaundice and Infant Feeding Study

Cohort study to determine the association between neonatal jaundice or feeding disorders and 5-year neuropsychological scores in children born at Northern California Kaiser Hospitals.

Newman TB, et al. N Engl J Med. 2006 May 4;354(18):1889-900.


Infant Jaundice Study

(Our fictional version of JIFee)


Design-Double Cohort study.

Setting-Single, urban medical center

Subjects-Infants with neonatal jaundice and randomly selected non-jaundiced infants

Predictor Variable-Presence or absence of jaundice

Outcome Variable- IQ score (ranging from 55 to 145) at age 5 years

Analysis- ?

where is the predictor field
Where is the Predictor Field?
  • What is the predictor in the Infant Jaundice Study?
  • Where is it on the data collection form?
demonstration creating a data table
Demonstration: Creating a Data Table

Label columns and enter rows of data in datasheet view

demonstration data dictionary
Demonstration: Data Dictionary
  • Table design view:
  • field (=column) names,
  • data types,
  • definitions,
  • validation rules
  • (More on data types, free-text vs. coded responses, later)
demonstration data validation
Demonstration: Data Validation

Disallowed values

Duplicate primary keys

demonstration same table in excel stata
Demonstration: Same Table in Excel, Stata
  • Excel
  • Stata
  • Etc

Rows = Records = Entities

Columns = Fields = Attributes

Access and Stata have a special row at the top for column headings (=field names); Excel just uses the first row.


Table of Study Subjects

Table of Study Subjects

Row = Individual Infant

Columns = ID#, Name, DOB, Sex, Jaundice

If some infants have more than one exam, what do you do?

  • Find highest IQ Score
  • Find all exams done in April
common error
Common Error
  • If you find yourself creating multiple columns for the same measurement, e.g., Date1, Score1, Date2, Score2, Date3, Score3, …
  • Or if your table is more than about 30 columns wide,
    • It is time to restructure your table.

Undesirable table with participant-specific data duplicated for each exam. (Note problem with Helen’s DOB.)


  • Find highest IQ Score
  • Find all exams in a particular month
  • What happened to Alejandro, Ryan, Zachary, and Jackson?


If some infants have multiple exams,

“normalize” the records into two tables, one for subjects and one for examinations.


Data normalized into two tables: one (“Baby”) with rows comprising subject-specific information; the other (“Exam”) with rows comprising exam-specific information. Note that Helen can only have one birth date. Subjects with no exams, e.g. Alejandro, still appear in the database. “SubjectID” functions as the primary key in the “Baby” table and as the foreign key in the “Exam” table.


Figure 7. Relationships diagram showing the one-to-many relationship between the table of subjects (“Baby”) and the table of measurements (“Exam”).

for more on normalization and the relational database model
For more on normalization and the “Relational Database Model,”

Relational Database Management Systems for Clinical Research, UCSF EPI 218 Winter 2008 (1 unit)

start with data tables or data collection forms
Start with Data Tables or Data Collection Forms?

It doesn’t matter as long as the process is iterative.

Can start with the tables and then develop the forms, test the forms, find problems, and update the tables.

Can start with a word-processed form, create the tables, test, and update.

before seeking help with data management
Before seeking help with data management

Search the internet and ask other researchers for already developed data collection forms.

Draft your data collection form.

Test your data collection form with dummy subjects and, even better, with real (de-identified) study subjects.

Enter your test data into a data table with rows corresponding to subjects and columns corresponding to data elements. (Use Excel, Access, Stata, or even Word.)

Create or at least think about a data dictionary.

Decide who will collect the data, and when/how the data will be collected.

on screen data collection forms
On-Screen Data Collection Forms
  • Will demonstrate using the “race” field from the Infant Jaundice Study
  • Free text versus coded response
  • Single response (mutually exclusive choices) versus “all that apply”
free text vs coded responses
Free Text vs. Coded Responses

Same as “Open-Ended” vs. “Closed-Ended” Questions

Free text responses useful in developing coded response options.

mutually exclusive collectively exhaustive response options
Mutually Exclusive, Collectively Exhaustive Response Options
  • One field (=column)
  • Can always make responses exhaustive by including an “Other” response
  • Drop down list (combo box) vs. pick list (field list) vs. option group
drop down list combo box
Drop-down List (Combo Box)
  • Saves screen real estate
  • Doesn’t work on paper forms

(Master form)

pick list field list
Pick List (Field List)
  • Uses up screen real estate
  • Useful on paper forms

(MasterRaceAsFieldList form)

option group
Option Group
  • Radio buttons (by convention)
  • Uses up screen real estate

(MasterRaceAsOptionGroup form)

all that apply
“All that apply”

Multiple fields (= columns)

Use check boxes (by convention)

(MasterRaceAsAllThatApply form)

from paper data forms to data table s
From Paper Data Forms to Data Table(s)*
  • Transcription directly into the table(s)
  • Transcription via an online (screen) form
  • Scanning using OMR software

*Best option: Don’t use paper data collection forms at all (more on this later).

  • nhamcs100-ED_2007.pdf
on screen vs paper forms
On-screen vs. paper forms

Minimize the extent to which study measurements are recorded on paper forms.

Enter data directly into the computer database or move data from paper forms into the computer database as close to the data collection time as possible.

When you define a variable in a computer database, you specify both its format and its domain or range of allowed values. Using these format and domain specifications, computer data entry forms give immediate feedback about improper formats and values that are out of range. The best time to receive this feedback is when the study subject is still on site.

You can always print out a paper copy of the screen form or a report of the exam/interview results once the data are collected.

Examples: ATM Machine’s printed transaction record, Gas Station’s printed receipt (but neither of these is going to be accessed or updated by a research assistant – so the printout may be less important).


Special type of data collection form

  • Patient history (interview), not physical exam or diagnostic test results*
  • Often put a premium on branching logic.

*Requires a subject who can talk. Can’t administer a questionnaire to infants or to intubated/comatose patients.

*See DCR 3rd Ed. Chapter 15

clinical research studies consistently under plan and under budget for data management
Clinical research studies consistently under-plan and under-budget for data management.
  • One FTE research assistant, including benefits, will cost your study ~$100k/year; you should be willing to spend one tenth as much for consulting on your database system.
  • Don’t leave data checking, validation, and cleaning for the biostatistician

a) it will be too late;

b) biostatisticians do not view data cleaning as part of their job.

data management protocol
Data Management Protocol
  • General description of database (including relationships diagram)
  • Data collection and entry
  • Error checking and data validation
  • Analysis (e.g., export to Stata)
  • Security/confidentiality
  • Back up
general description of database
General Description of Database
  • DBMS, e.g. MS Access XP
  • # of dynamic tables
  • # of static “lookup” tables
  • # of forms
  • # of reports

An appendix should include the relationships diagram, the table names and descriptions, and the field names and descriptions (data dictionary).

data collection and entry
Data Collection and Entry
  • Import baseline data from existing systems
  • Import lab results, scan results (e.g. DEXA), holter monitor data, and other digital data.
  • For each form, who will collect the data?
  • Collect onto paper forms and then transcribe? Enter directly using screen forms? Scannable forms?
error checking and validation
Error Checking and Validation
  • Database automatically checks data against the range of allowed values.
  • Periodic outlier detection. (Outliers still within the range of allowed values.)
  • Calculation checks
  • Double data entry?
  • How will you get the data out of the database?
security confidentiality
  • Keep identifying data (name, SSN, MRN) in a separate table.
  • Link rest of DB to this table via a Subject ID that has no meaning external to the DB.
  • Restrict access to identifying data.
  • Password protect at both OS and application levels.
  • Audit entries and updates.
back ups
Back ups
  • Ask your system person to restore a file periodically. This tests both the back-up and restore systems.

Write a one-page data management section for your research study protocol.

select queries
Select Queries

Select queries (aka “Views”) organize, sort, filter, and display data.

Queries use Standard Query Language (SQL), but you don’t have to learn it, because of graphical query design tools.

A query can join data from two or more tables, display only selected fields, and filter for records that meet certain criteria.


Age in months and BMI at exam of subjects who were examined in January and February of 2010.

select queries produce table like results
Select Queries Produce “Table-Like” Results

Note that the result of a select query that joins two tables, displays only certain fields, selects rows based on special criteria, and calculates age and BMI still looks like a table in datasheet view.

But, remember that it is merely a “view” of data from the underlying tables.

action queries change data
“Action Queries” Change Data
  • Update Query -- changes the values of specific fields in existing records
  • Append Query -- adds new records (rows) to a table
  • Delete Query -- deletes records from a table