
Planning and Budgeting for Data Management in a Clinical Research Study

Michael A. Kohn, MD, MPP, 1 February 2005



  1. Planning and Budgeting for Data Management in a Clinical Research Study Michael A. Kohn, MD, MPP 1 February 2005

  2. Outline • Assignment 3 Review • Guidelines for Research Databases • Loose Ends: BLOBs • Planning and Budgeting for Data Management in a Research Project

  3. Assignments (cont’d) Lab 3: Exporting and Analyzing Data 1/25,26/2005 Option 1 (Epi/Biostats Students): Determine if neonatal jaundice was associated with the 5-year neuropsychiatric scores and create a table, figure, or paragraph appropriate for the “Results” section of a manuscript summarizing the association. Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744) Send assignment to ucsfdbclass@yahoo.com by 1/31/2005.

  4. Assignments (cont’d) Lab 3: Exporting and Analyzing Data 1/25,26/2005 Option 2 (CRC Students): Answer a research question of your own by querying an existing database. Display your results in a paragraph, table, or figure appropriate for presentation to others in your field. Send assignment to ucsfdbclass@yahoo.com by 1/31/2005.

  5. Answer Of the infants with neonatal jaundice, 149 had neuropsychiatric exams at age 5, and of the infants without neonatal jaundice, 248 had neuropsychiatric exams. The mean (±SD) neuropsychiatric score was significantly higher in the jaundice group, 111.5 ± 21.1, than in the no-jaundice group, 101.4 ± 20.5 (P<0.0001).

  6. Would you submit this for publication?

     ------------------------------------------------------------------------------
        Group |     Obs        Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
           No |     248    101.3925    1.303441    20.52661     98.8252    103.9597
          Yes |     149    111.5358    1.732576    21.14879     108.112    114.9596
     ---------+--------------------------------------------------------------------
     combined |     397    105.1994     1.06956    21.31083    103.0967    107.3021
     ---------+--------------------------------------------------------------------
         diff |           -10.14332    2.152007               -14.37414   -5.912502
     ------------------------------------------------------------------------------
     Degrees of freedom: 395

                     Ho: mean(No) - mean(Yes) = diff = 0

          Ha: diff < 0             Ha: diff ~= 0              Ha: diff > 0
           t =  -4.7134             t =  -4.7134               t =  -4.7134
       P < t =   0.0000       P > |t| =   0.0000           P > t =   1.0000
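The diff, its standard error, and the t statistic in this output can be reproduced from the group summary rows alone, which is a useful check when a manuscript reports only means, SDs, and sample sizes. A minimal Python sketch, assuming the equal-variance (pooled) two-sample t test, which is consistent with the 395 degrees of freedom shown; the function name is hypothetical:

```python
import math

def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from group summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se_diff, diff / se_diff, df

# Summary rows from the output above (group 1 = No jaundice, group 2 = Yes)
diff, se, t, df = pooled_t_from_summary(248, 101.3925, 20.52661,
                                        149, 111.5358, 21.14879)
print(f"diff={diff:.4f} SE={se:.4f} t={t:.4f} df={df}")
```

Because the sign follows mean(No) - mean(Yes), a negative difference here means the jaundiced group scored higher.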

  7. Essential Elements • Sample size (149 jaundiced, 248 non-jaundiced) • Indication of effect size (report both means, or the difference between them) • Comment on direction of effect (Jaundiced group did better!) • Indication of variability (Sample SDs, SEs of means, CIs of means, or CI of difference between means.) • Report of within-subject SD as a measure of reliability. • Comment on reliability

  8. Browner on Figures Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women, or that diabetes is more common in Hispanics than in whites or blacks, [or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies,] is not worth the ink required to print it. Use text instead. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90

  9. Cutoff at 50? Caption should be below figure. What are the error bars? Misspelling: “Neuopsychiatric”

  10. Cutoff at 60? 95% CI, standard error of the mean, or sample SD? Caption should be below figure.

  11. Browner on 3-D Figures Three dimensional graphs usually are not helpful. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 97 Also, note that the 3-D is only a visual effect: the data are two-dimensional (score by jaundice).

  12. Takes the prize for ugliest figure.

  13. Caption not sufficiently explanatory.

  14. Methods or Results? We assessed inter-rater reliability of the neuropsychiatric test scores by having different examiners re-test 198 of the children. The within-subject standard deviation was 4.0, so the “repeatability” was 11.0, meaning that two examiners of the same subject would score within 11 points of each other 95 percent of the time. (Bland and Altman, BMJ 1996; 313:744)
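The repeatability figure follows Bland and Altman's definition: with two readings per subject, about 95% of the differences between readings are expected to lie within 1.96 × √2 × s_w (roughly 2.77 s_w). A minimal Python sketch of both calculations, assuming exactly two readings per re-tested child; the function names and the three score pairs are hypothetical illustration data, not the study's:

```python
import math

def within_subject_sd(pairs):
    """Within-subject SD from two readings per subject (e.g., two examiners).

    For one subject, the variance of two readings is d**2 / 2, where d is
    their difference; s_w is the square root of the mean of these variances.
    """
    variances = [(a - b) ** 2 / 2 for a, b in pairs]
    return math.sqrt(sum(variances) / len(variances))

def repeatability(sw):
    # Two readings on the same subject should differ by less than this
    # amount about 95% of the time: 1.96 * sqrt(2) * s_w
    return 1.96 * math.sqrt(2) * sw

# Hypothetical (first examiner, second examiner) scores for three children
pairs = [(100, 104), (90, 88), (110, 110)]
sw = within_subject_sd(pairs)
print(round(sw, 2), round(repeatability(sw), 1))
```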

  15. What Have You Learned? • The meaning and importance of the terms “normalization”, “primary key”, and “foreign key”. • The difference between a flat-file database and a normalized, multi-table relational database. • A little bit of Microsoft Access 2000 • Querying data • Exporting data for analysis in a statistical package

  16. Guidelines for Data Management in Clinical Research 1. Establish the database tables, their rows and columns, and their relationships correctly at the outset.   A poorly organized database makes data maintenance and retrieval nearly impossible. Make sure the data are normalized. The data structures should never require duplicate data entry or redundant storage. Sometimes it helps to start with the data collection forms, but remember, you do NOT need one table per data collection form. In the labs you learned that one form can combine data from several tables. And data from one table can appear on several forms.
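To make "normalized" concrete, here is a minimal sketch in Python's built-in sqlite3 module, used as a stand-in for a desktop DBMS such as Access. The Baby and Exam tables and their columns are hypothetical, loosely modeled on the infant jaundice example. Each baby's data are stored once, each exam row points back to its baby through a foreign key, and the two are recombined by a query rather than by duplicate entry:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Baby (
    SubjectID INTEGER PRIMARY KEY,
    BirthDate TEXT NOT NULL,
    Jaundice  INTEGER NOT NULL CHECK (Jaundice IN (0, 1))
);
CREATE TABLE Exam (
    ExamID    INTEGER PRIMARY KEY,
    SubjectID INTEGER NOT NULL REFERENCES Baby(SubjectID),
    ExamDate  TEXT NOT NULL,
    Score     REAL
);
""")
conn.execute("INSERT INTO Baby VALUES (1, '1999-03-02', 1)")
# Two exams for one baby: birth data are not re-entered for each exam
conn.executemany(
    "INSERT INTO Exam (SubjectID, ExamDate, Score) VALUES (?, ?, ?)",
    [(1, "2004-03-10", 108.0), (1, "2004-04-14", 112.0)])

# A query (not a second copy of the data) combines the tables for a form or report
rows = conn.execute("""
    SELECT b.SubjectID, b.Jaundice, e.ExamDate, e.Score
    FROM Baby b JOIN Exam e ON e.SubjectID = b.SubjectID
    ORDER BY e.ExamDate
""").fetchall()
print(rows)

# The foreign key blocks an exam for a baby who does not exist
try:
    conn.execute("INSERT INTO Exam (SubjectID, ExamDate, Score) "
                 "VALUES (99, '2004-05-01', 95.0)")
except sqlite3.IntegrityError as err:
    print("Rejected orphan exam:", err)
```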

  17. Start with Data Tables or Data Collection Forms? It doesn’t matter as long as the process is iterative. Can start with the tables and then develop the forms, test the forms, find problems, and update the tables. Can start with a word-processed form, create the tables, test, and update.

  18. Guidelines for Data Management in Clinical Research 2. Establish and follow naming conventions for columns and tables. Short field names without spaces or underscores are convenient for programming, querying, and other manipulations. Instead of spaces or underscores, use “IntraCaps” (upper case letters within the variable name) to distinguish words, e.g. “SubjectID”, “FName”, or “ExamDate”. Table names should be singular, e.g. “Baby” instead of “Babies”, “Exam” instead of “Exams”.
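A naming convention is easiest to follow when it can be checked mechanically. A minimal Python sketch, assuming "IntraCaps" means an upper-case first letter followed only by letters and digits; the plural test for table names is a crude hypothetical heuristic (names ending in "s"), so a word like "Status" would need a manual exception:

```python
import re

# IntraCaps: upper-case first letter, then letters/digits only;
# no spaces or underscores (e.g., SubjectID, FName, ExamDate)
INTRACAPS = re.compile(r"[A-Z][A-Za-z0-9]*")

def check_name(name):
    """Return a list of convention problems for a column or table name."""
    problems = []
    if not INTRACAPS.fullmatch(name):
        problems.append("not IntraCaps (space, underscore, or lower-case start)")
    return problems

def check_table_name(name):
    problems = check_name(name)
    # Heuristic only: flags names such as "Exams" or "Babies"
    if name.endswith("s"):
        problems.append("looks plural; use the singular form")
    return problems

for col in ["SubjectID", "FName", "Exam Date", "exam_date"]:
    print(col, check_name(col))
print("Babies", check_table_name("Babies"))
```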

  19. Guidelines for Data Management in Clinical Research 3. Obtain baseline demographic and clinical information about members of the study population from existing computer databases. Avoid re-entering data which are already available (in digital formats) from other sources. In the JIFee Study, the patient demographic data and contact information are obtained from the hospital database. Computer systems can almost always produce text-delimited or fixed-column-width character files that the database management system can import.
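Both export formats mentioned above are simple to read. A minimal Python sketch using only the standard library, assuming a hypothetical hospital extract with MRN, LastName, and BirthDate fields; the fixed-width character positions are also hypothetical, since in practice they come from the record layout supplied with the source system:

```python
import csv
import io

# Hypothetical tab-delimited extract
tab_text = ("MRN\tLastName\tBirthDate\n"
            "123456\tSmith\t1999-03-02\n"
            "654321\tLee\t1999-04-11\n")
tab_rows = list(csv.DictReader(io.StringIO(tab_text), delimiter="\t"))

# Hypothetical fixed-width layout: (field name, start, end) character positions
LAYOUT = [("MRN", 0, 6), ("LastName", 6, 16), ("BirthDate", 16, 26)]
fixed_text = ("123456Smith     1999-03-02\n"
              "654321Lee       1999-04-11\n")
fixed_rows = [
    {name: line[start:end].strip() for name, start, end in LAYOUT}
    for line in fixed_text.splitlines()
]

print(tab_rows == fixed_rows)  # both formats yield the same records
```

Once parsed this way, the records can be inserted into the study tables instead of being retyped.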

  20. Guidelines for Data Management in Clinical Research 4. Minimize the extent to which study measurements are recorded on paper forms. Enter data directly into the computer database or move data from paper forms into the computer database as close to the data collection time as possible. When you define a variable in a computer database, you specify both its format and its domain or range of allowed values. Using these format and domain specifications, computer data entry forms give immediate feedback about improper formats and values that are out of range. The best time to receive this feedback is when the study subject is still on site.
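The format-plus-domain idea can be sketched outside any particular DBMS. In this minimal Python sketch, each field gets a type (its format) and a minimum and maximum (its domain), and a value is checked the moment it is entered. The field names and limits are hypothetical; a DBMS such as Access applies similar rules through field data types and validation rules:

```python
from datetime import date

# Hypothetical field definitions: a type (format) plus an allowed range (domain)
FIELDS = {
    "ExamDate": {"type": date, "min": date(2000, 1, 1), "max": date(2010, 12, 31)},
    "Score":    {"type": int,  "min": 40, "max": 160},
}

def validate(field, value):
    """Return an error message, or None if the value is acceptable."""
    spec = FIELDS[field]
    if not isinstance(value, spec["type"]):
        return f"{field}: expected {spec['type'].__name__}"
    if not spec["min"] <= value <= spec["max"]:
        return f"{field}: {value} outside {spec['min']}..{spec['max']}"
    return None

print(validate("Score", 111))   # None: accepted
print(validate("Score", 1110))  # out of range, e.g., an extra keystroke
print(validate("ExamDate", "2004-03-10"))  # wrong format: text, not a date
```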

  21. On-screen vs. paper forms You can always print out a paper copy of the screen form or a report of the exam/interview results once the data are collected. Examples: an ATM’s printed transaction record, a gas station’s printed receipt

  22. Guidelines for Data Management in Clinical Research 5. Follow standard data entry conventions. Several conventions for data entry and display have developed over time. Although most users of screen forms are not aware of these conventions, they have come to expect them subconsciously. For example, a series of mutually exclusive, collectively exhaustive choices is usually displayed as an “option group” consisting of several different “radio buttons”, whereas choices which are not mutually exclusive are displayed as check boxes. N.B. An “option group” of mutually exclusive choices is a single column or field. A group of N check boxes represents N yes/no fields.

  23. Use check boxes when options are not mutually exclusive. (5 fields) Use radio buttons when options are mutually exclusive. (1 field) Computer chart abstraction form showing two common data entry conventions.
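The field-count rule on the previous slides (one field per option group, one yes/no field per check box) shows up directly in the table design. A minimal sketch in Python's sqlite3, with hypothetical field names for a chart abstraction table: DeliveryMode is a single field holding one of several mutually exclusive values, while each treatment that can co-occur gets its own 0/1 field:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DeliveryMode: option group (radio buttons), ONE field, one allowed value per row.
# Phototherapy, ExchangeTransfusion, IVFluids: check boxes, one yes/no field EACH.
conn.execute("""
CREATE TABLE ChartAbstract (
    SubjectID INTEGER PRIMARY KEY,
    DeliveryMode TEXT CHECK (DeliveryMode IN ('Vaginal', 'Forceps', 'Vacuum', 'Cesarean')),
    Phototherapy INTEGER NOT NULL DEFAULT 0 CHECK (Phototherapy IN (0, 1)),
    ExchangeTransfusion INTEGER NOT NULL DEFAULT 0 CHECK (ExchangeTransfusion IN (0, 1)),
    IVFluids INTEGER NOT NULL DEFAULT 0 CHECK (IVFluids IN (0, 1))
)""")

# Several check boxes may be ticked at once
conn.execute("INSERT INTO ChartAbstract VALUES (1, 'Cesarean', 1, 0, 1)")

# But an option group holds exactly one choice
try:
    conn.execute("INSERT INTO ChartAbstract VALUES (2, 'Cesarean,Vacuum', 0, 0, 0)")
except sqlite3.IntegrityError:
    print("Rejected: an option group holds exactly one choice")
```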

  24. Guidelines for Data Management in Clinical Research 6. Back up the database regularly and check the adequacy of the backup procedure by periodically restoring a file from the backup medium.
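A backup is only proven good once it has been restored and opened. A minimal sketch of that check for a single-file database, using SQLite via Python's sqlite3 as a stand-in for an Access file; the file names are hypothetical, and in real use the copy would go to separate backup media rather than the same folder:

```python
import os
import sqlite3
import tempfile

def backup_and_verify(src_path, backup_path):
    """Back up a SQLite file, then reopen the copy as a restore test."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(backup_path)
    with dst:
        src.backup(dst)  # Connection.backup: online copy of the whole database
    dst.close()
    src.close()

    # Restore test: open the copy on its own and check that it is intact
    restored = sqlite3.connect(backup_path)
    ok = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    tables = [r[0] for r in restored.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Row counts can be compared against the live database
    counts = {t: restored.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
              for t in tables}
    restored.close()
    return ok, counts

with tempfile.TemporaryDirectory() as folder:
    live = os.path.join(folder, "study.db")
    conn = sqlite3.connect(live)
    conn.execute("CREATE TABLE Exam (ExamID INTEGER PRIMARY KEY, Score REAL)")
    conn.execute("INSERT INTO Exam (Score) VALUES (108.0)")
    conn.commit()
    conn.close()
    ok, counts = backup_and_verify(live, os.path.join(folder, "study_backup.db"))
    print(ok, counts)  # True {'Exam': 1}
```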

  25. Demonstration (BLOB) Field types are not limited to numbers, text, and dates. You can put an “object”, such as a Word document or a photo, in a field. Examples: memo fields in the Infant Jaundice Database; Word document fields on the “Class” form of the ATCR Student Database; photograph fields in the ATCR Student Database.
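A BLOB (binary large object) field stores raw bytes, so a photo or document goes in and comes out unchanged. A minimal sketch using a SQLite BLOB column through Python's sqlite3; Access offers its own field type for this, and the Student table and placeholder bytes below are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, Photo BLOB)")

# Placeholder bytes; in practice: photo_bytes = open("photo.jpg", "rb").read()
photo_bytes = b"\xff\xd8\xff\xe0 placeholder image data"
conn.execute("INSERT INTO Student (Name, Photo) VALUES (?, ?)", ("A. Student", photo_bytes))

# Python bytes are stored as a BLOB and returned as bytes
stored = conn.execute("SELECT Photo FROM Student WHERE StudentID = 1").fetchone()[0]
print(type(stored).__name__, stored == photo_bytes)  # bytes True
```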

  26. Four Types of Research Database • Combination of paper files, Excel spreadsheets, and direct keyboard entry into the statistical analysis package. • Desktop multi-table relational database.* • Client-Server or “Enterprise” multi-table relational database. • Internet database server. * Best fit for most people in this class

  27. Desktop DBMS The processing of records is done by the desktop. The server simply stores files (file server). Examples: Microsoft Access, Claris FileMaker Pro, Paradox, Microsoft Visual FoxPro, DataEase

  28. Client-Server DBMS The processing of records is done by the server. The desktop manages the screen, but passes queries on to the server. (Just to confuse things, MS Access can be a client for SQL Server and other enterprise systems. The ultimate in “thin” clients is a browser (Internet Explorer). In this case, the server is an intranet or internet database server.) Examples: Microsoft SQL Server, Oracle, Informix, Sybase

  29. [Figure: File Server vs. Client Server. With a file server, the workstation (client machine) does all the “thinking”. With SQL Server, the server thinks too!]

  30. Advice on Building a Desktop Multi-Table Relational Database for your Study • Build it yourself using what you learned in this class, with occasional help from a database expert. • Budget $500-$1000 per month out of your grant for database consulting during the design phase. • Take advantage of your departmental resources. • Take advantage of campus resources. • Don’t confuse database development with network administration and systems management.

  31. Costs There is no adequate, on-campus resource for database design consulting. (If there were, it would cost $100/hour just like biostatistical consulting.) Independent database consultants also cost $100+ per hour.
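At the quoted rate of about $100 per hour, the monthly consulting budget translates directly into hours of help. A minimal Python sketch of that arithmetic; the 6-month design phase is a hypothetical example, not a figure from the course:

```python
HOURLY_RATE = 100  # dollars per hour, the consulting rate quoted above

def consulting_hours(monthly_budget, rate=HOURLY_RATE):
    """Hours of consulting a monthly budget buys."""
    return monthly_budget / rate

def design_phase_cost(monthly_budget, months):
    """Total consulting spend over the design phase."""
    return monthly_budget * months

for budget in (500, 1000):
    print(f"${budget}/month buys {consulting_hours(budget):.0f} hours/month")

# Hypothetical 6-month design phase at the top of the suggested range
print(design_phase_cost(1000, 6))  # 6000
```

So the suggested $500-$1000 per month buys roughly 5 to 10 hours of expert time each month.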

  32. Costs The JIFee Study developed a comprehensive database for study administrative data as well as results. They have a full-time project coordinator and spent about $10,000 on database consulting. Total cost of the JIFee Database in time and money was at least $25,000.

  33. Departmental Resources • Your department should provide you with a networked desktop computer, as well as network support, server access, and database hosting. However, the departmental computer person will NOT be able to help you with database design or development. • System administrators do not and cannot build database management systems.

  34. Campus Resources • GCRC/PCRC Informatics Lab (Requires an approved CRC protocol and approval from CRC Director) • Independent consultants. • Other campus resources? Library? PSG?

  35. Investigators Using Data Management Skills from this Course

  36. Petra Liljestrand (Study Coordinator, JIFee) • Jessica Zegre (Study Coordinator, Immediate AIM) • Jon Zaroff (PI, CRASH) • Jim Quinn (PI, ED Syncope and Dog Bite Studies) • Candice Wong (PI, Chinese Smoking Cessation Study) • Rebecca Sudore (Co-Investigator, Advance Directives Study) • Cari DeLoa (Study Coordinator, MS Genetics) • Mark Pletcher (PI, Alcohol Withdrawal Study) • Matthew Riley (PI, Pediatric NAFLD Study) • Grace Yoon (PI, ATS Study) • Cathy Lomen-Hoerth (PI, ALS Studies) • Jay Garg (Co-Investigator, ESRD and Hypertension) • Irina Garadetskaya (Study Coordinator, CRIC) • Serge Lindner (PI, Advance Directives)

  37. Anil Sapru (PI, Pediatric DKA) • Emily Von Scheven (PI, Pediatric Rheumatology Studies) • Roberta Keller (PI, Neonatology Studies) • Carolyn Hoppe (Pediatric Hematology) • Mark Eisner (PI, Kaiser COPD Study) • Heidi Flori (PICU Studies, CHO) • Matthew Reeves (OB/Gyn) • Lee Zane (PI, Stress and Acne Study) • Yvonne Wu (Pediatric Neurology)

  38. Data Management Protocol • General description of database • Data collection and entry • Error checking and data validation • Analysis (e.g., export to Stata) • Security/confidentiality • Back up

  39. General Description of Database • DBMS, e.g. MS Access XP • # of dynamic tables • # of static “lookup” tables • # of forms • # of reports An appendix should include the relationships diagram, the table names and descriptions, and the field names and descriptions (data dictionary).
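Much of the data dictionary appendix can be generated from the database itself. A minimal sketch for SQLite via Python's sqlite3, using `PRAGMA table_info` to list each table's columns, declared types, NOT NULL constraints, and primary keys; the two tables are hypothetical, and since SQLite stores no column descriptions, those still have to be written by hand:

```python
import sqlite3

def data_dictionary(conn):
    """List (table, column, declared type, declared NOT NULL, primary key)."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name").fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row: cid, name, type, notnull, dflt_value, pk
        for _cid, col, coltype, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            entries.append((table, col, coltype, bool(notnull), bool(pk)))
    return entries

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Baby (SubjectID INTEGER PRIMARY KEY, BirthDate TEXT NOT NULL);
CREATE TABLE Exam (ExamID INTEGER PRIMARY KEY, SubjectID INTEGER NOT NULL, Score REAL);
""")
dictionary = data_dictionary(conn)
for entry in dictionary:
    print(entry)
```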

  40. Data Collection and Entry • Import baseline data from existing systems • Import lab results, scan results (e.g. DEXA), Holter monitor data, and other digital data. • For each form, who will collect the data? • Collect onto paper forms and then transcribe? Enter directly using screen forms? Scannable forms?

  41. Error Checking and Validation • Database automatically checks data against the range of allowed values. • Periodic outlier detection. (Outliers still within the range of allowed values.) • Calculation checks • Is double data entry really needed?
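Periodic outlier detection catches values that pass the range check but still look suspicious. A minimal Python sketch, swapping in the modified z-score (median and median absolute deviation, with the conventional 3.5 cutoff) rather than a mean/SD z-score, because in a small batch a single wild value inflates the SD enough to hide itself. The scores and the 40-160 allowed range are hypothetical:

```python
import statistics

def flag_outliers(values, cutoff=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the cutoff.

    Modified z = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median: flag anything that differs
        return [(i, v) for i, v in enumerate(values) if v != med]
    return [(i, v) for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > cutoff]

# Hypothetical scores, all inside an allowed range of 40-160
scores = [98, 101, 103, 99, 100, 102, 97, 101, 100, 99, 155]
print(flag_outliers(scores))  # [(10, 155)]
```

A flagged value is not necessarily wrong; it is a prompt to check the source document or re-contact the examiner.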

  42. Analysis • How will you get the data out of the database?
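One common answer is a query that joins the tables into one flat, analysis-ready file. A minimal sketch with Python's sqlite3 and csv modules; Access has its own export tools, and the tables, columns, and file name here are hypothetical. Current Stata versions can read the result with `import delimited` (older versions used `insheet`):

```python
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Baby (SubjectID INTEGER PRIMARY KEY, Jaundice INTEGER);
CREATE TABLE Exam (ExamID INTEGER PRIMARY KEY, SubjectID INTEGER, Score REAL);
INSERT INTO Baby VALUES (1, 1), (2, 0);
INSERT INTO Exam (SubjectID, Score) VALUES (1, 112.0), (2, 98.0);
""")

# One row per observation, one column per variable: the flat shape Stata expects
cur = conn.execute("""
    SELECT b.SubjectID, b.Jaundice, e.Score
    FROM Baby b JOIN Exam e ON e.SubjectID = b.SubjectID
    ORDER BY b.SubjectID
""")
with open("analysis.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([d[0] for d in cur.description])  # column names as header
    writer.writerows(cur)
```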

  43. Security/Confidentiality • Keep identifying data (name, SSN, MRN) in a separate table. • Link rest of DB to this table via a Subject ID that has no meaning external to the DB. • Restrict access to identifying data. • Password protect at both OS and application levels. • Audit entries and updates.
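The first two points above can be built into the schema. A minimal sketch with Python's sqlite3: identifying data live only in a hypothetical SubjectIdentity table, and research tables carry a randomly generated Subject ID that reveals nothing about name, MRN, or enrollment order. SQLite itself has no per-table permissions, so in practice the access restriction would come from the DBMS (as in a client-server system) or from keeping the identity table in a separate, more tightly protected file:

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Identifying data live only here; restrict access to this table
CREATE TABLE SubjectIdentity (
    SubjectID TEXT PRIMARY KEY,
    Name TEXT,
    MRN  TEXT
);
-- Research data carry only the meaningless SubjectID
CREATE TABLE Exam (
    ExamID    INTEGER PRIMARY KEY,
    SubjectID TEXT NOT NULL REFERENCES SubjectIdentity(SubjectID),
    Score     REAL
);
""")

def enroll(name, mrn):
    """Register a subject under a random 8-character ID and return the ID."""
    while True:
        subject_id = secrets.token_hex(4).upper()
        try:
            conn.execute("INSERT INTO SubjectIdentity VALUES (?, ?, ?)",
                         (subject_id, name, mrn))
            return subject_id
        except sqlite3.IntegrityError:
            continue  # rare collision with an existing ID: draw again

sid = enroll("Jane Doe", "000123")  # hypothetical subject
conn.execute("INSERT INTO Exam (SubjectID, Score) VALUES (?, ?)", (sid, 108.0))
print(sid)  # random each run
```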

  44. Back ups • Ask your system person to restore a file periodically. This tests both the back-up and restore systems.

  45. Assignment 4 Data Management Protocol Write a one-page data management section for your research study protocol or a one-page description of your current research study database. At the beginning of your assignment, for the readers, briefly describe your study, including design, predictors, outcomes, target population, and sample size. (1 or 2 sentences) Include with your assignment a relationships diagram showing the structure of your study database. Send assignment to ucsfdbclass@yahoo.com by 2/14/2005.
