Review of Assignment 3, Loose Ends, Web-based Data Collection

Presentation Transcript

  1. Review of Assignment 3, Loose Ends, Web-based Data Collection Michael A. Kohn, MD, MPP 3 February 2009

  2. Outline • Assignment 3 Review • Loose Ends: Yes/No Fields, BLOBs, Field Names, Front Ends, On-Screen Data Entry Conventions • Web-based Data Entry • Assignment 4

  3. Housekeeping • Database demos with advice for Assignment 4: Tuesday 2/10 • Carolyn Calfee • Janet Turan • Mary Farrant • Assignment 4 is due 2/16 • Please try to return the Learn MS Access 2000 CD

  4. Assignment 3 Lab 3: Exporting and Analyzing Data 1/27/2009 Determine if neonatal jaundice was associated with the 5-year IQ scores and create a table, figure, or paragraph appropriate for the “Results” section of a manuscript summarizing the association. Extra Credit: Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744)

  5. Answer Of the infants with neonatal jaundice, 149 had IQ tests at age 5, and of the infants without neonatal jaundice, 248 had IQ tests. The mean (±SD) IQ score was significantly higher in the jaundice group, 111.5 ± 21.1, than in the no-jaundice group, 101.4 ± 20.5: difference 10.1 (95% CI 5.9 – 14.4).

  6. Newman T et al. N Engl J Med 2006;354:1889-1900

  7. Would you submit this for publication?

     ------------------------------------------------------------------------------
        Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
           No |     248    101.3925    1.303441    20.52661     98.8252    103.9597
          Yes |     149    111.5358    1.732576    21.14879     108.112    114.9596
     ---------+--------------------------------------------------------------------
     combined |     397    105.1994     1.06956    21.31083    103.0967    107.3021
     ---------+--------------------------------------------------------------------
         diff |           -10.14332    2.152007               -14.37414   -5.912502
     ------------------------------------------------------------------------------
     Degrees of freedom: 395

                       Ho: mean(No) - mean(Yes) = diff = 0

          Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
            t =  -4.7134               t =  -4.7134               t =  -4.7134
        P < t =   0.0000         P > |t| =   0.0000           P > t =   1.0000

  8. Essential Elements • Sample size (149 jaundiced, 248 non-jaundiced) • Indication of effect size (report both means, or the difference between them) • Get direction of effect right (Jaundiced group did better!) • Indication of variability (Sample SDs, SEs of means, CIs of means, or CI of difference between means.)
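
The essential elements can be recomputed from the summary statistics alone. The following is a minimal Python sketch, not the method used in the lab (which was Stata's t test). The function name `compare_means` is hypothetical, and the confidence interval uses a normal approximation to the t distribution, which is reasonable here only because there are 395 degrees of freedom.

```python
import math
from statistics import NormalDist


def compare_means(n1, mean1, sd1, n2, mean2, sd2, level=0.95):
    """Pooled-variance comparison of two group means from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    # Assumption: normal quantile (about 1.96) stands in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "diff": diff,
        "se": se,
        "t": diff / se,
        "df": df,
        "ci": (diff - z * se, diff + z * se),
    }


# Jaundice group first, so a positive difference means the jaundiced group scored higher.
result = compare_means(149, 111.5358, 21.14879, 248, 101.3925, 20.52661)
print(f"difference {result['diff']:.1f}, "
      f"95% CI {result['ci'][0]:.1f} to {result['ci'][1]:.1f}, "
      f"t = {result['t']:.2f} on {result['df']} df")
```

Putting the jaundice group first keeps the sign of the difference aligned with the sentence you would write in the Results section.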

  9. Browner on Figures Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women, or that diabetes is more common in Hispanics than in whites or blacks, [or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies,] is not worth the ink required to print it. Use text instead. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90

  10. Cutoff at 50? Caption should be below figure. What are the error bars? “Neuopsychiatric”

  11. Cutoff at 60? Caption should be below figure.

  12. Browner on 3-D Figures Three dimensional graphs usually are not helpful. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 97 Also, note that the 3-D is only an effect. The data are two dimensional (score by jaundice).

  13. Takes the prize for ugliest figure.

  14. Caption not sufficiently explanatory. Sample size?

  15. Figure 1: In 149 infants with neonatal jaundice, the average IQ scores were higher compared to the 248 non-jaundiced infants when evaluated at age 5 (p<0.0001).

  16. Box Plot • Median Line • Box extends from 25th to 75th percentile • Whiskers extend to upper and lower adjacent values • Adjacent values = the most extreme observed values within 1.5 × IQR (interquartile range) of the 75th and 25th percentiles • Values outside the adjacent values are graphed individually • Would be nice if area (or at least width) of box were proportional to sample size (N). In some box plots the width of the box is proportional to log N, but not in Stata.
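
The pieces of a box plot can be computed by hand. This is a rough Python sketch with a hypothetical function name (`box_plot_stats`) and made-up scores. It takes the adjacent values to be the most extreme observations that still lie within 1.5 × IQR of the quartiles, and it uses Python's "inclusive" quartile rule, so the quartiles may differ slightly from what Stata reports.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return the numbers a box plot draws: quartiles, adjacent values, outliers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the last observed value inside each fence.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_adjacent": inside[0],
        "upper_adjacent": inside[-1],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Made-up data for illustration only.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```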

  17. Extra Credit • Report within-subject SD (4.0) as a measure of reliability. • Calculate repeatability (11.0) • Bland-Altman plot with mean difference and 95% limits of agreement* *Nobody did this.

  18. Methods or Results? We assessed inter-rater reliability of the IQ test by having different examiners re-test 198 of the children. The within-subject standard deviation was 4.0, so the “repeatability” was 11.0, meaning that two examiners of the same subject would score within 11 points of each other 95 percent of the time. (Bland and Altman, BMJ 1996; 313:744)
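
The link between the within-subject SD and the repeatability follows the Bland and Altman reference: repeatability is 1.96 × √2 × the within-subject SD, or about 2.77 times it. A short Python sketch, with hypothetical function names and made-up paired scores, assuming exactly two measurements per child:

```python
import math


def within_subject_sd(pairs):
    """Within-subject SD when each subject has exactly two measurements."""
    n = len(pairs)
    # With two measurements per subject, the within-subject variance
    # is the sum of squared differences divided by 2n.
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))


def repeatability(s_w):
    """Difference that two measurements on the same subject stay within 95% of the time."""
    return 1.96 * math.sqrt(2) * s_w


# Made-up pairs of scores (examiner 1, examiner 2), for illustration only.
example_pairs = [(100, 104), (110, 106)]
print(round(within_subject_sd(example_pairs), 2))

# Within-subject SD from the slide.
print(round(repeatability(4.0), 1))
```

With a within-subject SD of exactly 4.0 this gives about 11.1; the slide's 11.0 was presumably computed from the unrounded SD.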

  19. N = 142 (children examined by both Satcher and Richmond) Mean Difference = 0.49 (95% CI -0.41 – 1.38) 95% Limits of Agreement: -10.272 – 11.244
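
The numbers on a Bland-Altman plot come from the paired differences: the mean difference, its confidence interval, and the 95% limits of agreement (mean difference ± 1.96 × SD of the differences). A minimal Python sketch follows; the function name `bland_altman` and the four pairs of scores are made up, not the Satcher and Richmond data.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Mean difference, its 95% CI, and 95% limits of agreement for paired scores."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the differences
    half_width = 1.96 * sd_diff / math.sqrt(n)
    return {
        "n": n,
        "mean_diff": mean_diff,
        "ci": (mean_diff - half_width, mean_diff + half_width),
        "limits": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }


# Made-up scores from two examiners on the same four children.
examiner_a = [100, 105, 98, 110]
examiner_b = [98, 107, 97, 105]
agreement = bland_altman(examiner_a, examiner_b)
print(agreement)
```

The limits of agreement describe individual differences, so they are much wider than the confidence interval for the mean difference.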

  20. Outline • DONE Assignment 3 Review • Loose Ends: Yes/No Fields, BLOBs, Field Names, Front Ends, On-Screen Data Entry Conventions • Web-based Data Entry • Assignment 4

  21. Loose Ends • Yes/No Fields • BLOBs • Field Names • “Front End” vs. “Back End” • On-Screen Data Entry Conventions

  22. Yes/No fields • Binary fields are not very useful, because you can’t distinguish “No” from blank (not valued). • I create a combo box like we used for Race in Lab 1 with 0 for “No” and 1 for “Yes”. This allows blank. Demonstrate with “Subject” table/form, Latino and Jaundice fields.
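
The same idea can be shown outside Access. This is an illustrative sketch using Python's built-in `sqlite3` module, not the Access implementation from the labs: the field is stored as a number that may be 0, 1, or blank (NULL), so "No" and "not yet answered" stay distinct. The helper name `count_answers` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Subject (
        SubjectID INTEGER PRIMARY KEY,
        -- 0 = No, 1 = Yes; NULL is allowed and means "not valued"
        Jaundice  INTEGER CHECK (Jaundice IN (0, 1))
    )
    """
)
conn.executemany(
    "INSERT INTO Subject (SubjectID, Jaundice) VALUES (?, ?)",
    [(1, 1), (2, 0), (3, None)],
)


def count_answers(connection):
    """Count Yes, No, and blank answers separately."""
    labels = {0: "No", 1: "Yes", None: "Blank"}
    rows = connection.execute(
        "SELECT Jaundice, COUNT(*) FROM Subject GROUP BY Jaundice"
    )
    return {labels[value]: n for value, n in rows}


print(count_answers(conn))
```

A true binary field would have forced subject 3 into "No", which is a different statement from "we don't know yet".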

  23. Demonstration (BLOB) Field types are not limited to numbers, text, dates. You can put an “object”, such as a Word document or a photo, in a field. • Memo fields in the Infant Jaundice Database • Word Document Fields on the “Class” form of the ATCR Student Database • Photograph fields in the ATCR Student Database

  24. Field Names Establish and follow naming conventions for columns and tables. Short field names without spaces or underscores are convenient for programming, querying, and other manipulations. Instead of spaces or underscores, use “IntraCaps” (upper case letters within the variable name) to distinguish words, e.g. “SubjectID”, “FName”, or “ExamDate”. Table names should be singular, e.g. “Subject” instead of “Subjects”, “Exam” instead of “Exams”.
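
A naming convention is easier to follow if it can be checked automatically. The sketch below is one possible check, written in Python with a hypothetical function name; it only tests for "IntraCaps" (letters and digits, each word starting with a capital, no spaces or underscores) and does not try to judge length or whether a table name is singular.

```python
import re

# One or more "words", each starting with an upper-case letter.
INTRACAPS = re.compile(r"(?:[A-Z][a-z0-9]*)+")


def follows_convention(name):
    """True if the name uses IntraCaps with no spaces or underscores."""
    return INTRACAPS.fullmatch(name) is not None


for name in ["SubjectID", "FName", "ExamDate", "exam_date", "Exam Date"]:
    print(name, follows_convention(name))
```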

  25. “Front End” vs. “Back End” “Back End” – Tables and Data “Front End” – Forms and reports for entering and viewing the data. The Access database that you have been using combines “back end” (tables and relationships) with “front end” (forms and reports).* *Even if both are in Access, you usually want to split the front end from the back end. QuesGen uses MySQL for the back end.

  26. Start with Data Tables or Data Collection Forms? It doesn’t matter as long as the process is iterative. Can start with the tables and then develop the forms, test the forms, find problems, and update the tables. Can start with a word-processed form, create the tables, test, and update.* *This seems to work better for most investigators

  27. Sometimes it helps to start with the data collection forms, but remember, you do NOT need one table per data collection form. In the labs you learned that one form can combine data from several tables. And data from one table can appear on several forms.

  28. Before seeking help with data management Search the internet and ask other researchers for already developed data collection forms. Draft your data collection form. Test your data collection form with dummy subjects and, even better, with real (de-identified) study subjects. Enter your test data into a data table with rows corresponding to subjects and columns corresponding to data elements. (Use Excel, Access, Stata, or even Word.) Create or at least think about a data dictionary. Decide who will collect the data, and when/how the data will be collected.

  29. Common Sequence • Develop data collection forms in Word • Create Excel spreadsheets to store the data (one column per field/attribute, one row per record/entity) • Move from Excel to Access because of need for one or more of: • data entry forms (front end), • multiple related tables, • queries using the Access query design tool • Move from Access to QuesGen because of need for web-based data entry, hosting, auditing, richer user administration and security, but continue to use Access for querying of data extracts to filter, sort, format, and generate derived fields. • Export to Stata for analysis.

  30. On-Screen Data Collection Forms • Will demonstrate using the “race” field from the Infant Jaundice Study • Free text versus coded response • Single response (mutually exclusive choices) versus “all that apply”

  31. Free Text vs. Coded Responses Same as “Open-Ended” vs. “Closed-Ended” Questions Free text responses useful in developing coded response options.

  32. Mutually Exclusive, Collectively Exhaustive Response Options • One field (=column) • Can always make responses exhaustive by including an “Other” response • Drop down list (combo box) vs. pick list (field list) vs. option group

  33. Drop-down List (Combo Box) • Saves screen real estate • Doesn’t work on paper forms (Master form)

  34. Combo Box

  35. Combo Box

  36. Pick List (Field List) • Uses up screen real estate • Useful on paper forms (MasterRaceAsFieldList form)

  37. Field List

  38. Option Group • Radio buttons (by convention) • Uses up screen real estate (MasterRaceAsOptionGroup form)

  39. Option Group

  40. Mutually Exclusive = One Field

  41. “All that apply” • Multiple fields (= columns) • Use check boxes (by convention) (MasterRaceAsAllThatApply form)

  42. All That Apply

  43. “All that Apply” = Multiple Fields
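
The difference between the two designs shows up in how a row is stored. In this Python sketch the option list, the column names, and both function names are hypothetical (they are not the actual categories or fields from the Infant Jaundice Study): a mutually exclusive question fills one column, while "all that apply" needs one 0/1 column per option.

```python
# Hypothetical response options, for illustration only.
RACE_OPTIONS = ["White", "Black", "Asian", "Other"]


def single_choice_row(subject_id, choice):
    """Mutually exclusive responses: one field holds the single answer (or blank)."""
    if choice is not None and choice not in RACE_OPTIONS:
        raise ValueError(f"unknown option: {choice}")
    return {"SubjectID": subject_id, "Race": choice}


def all_that_apply_row(subject_id, choices):
    """'All that apply': one 0/1 field per option, like one check box per option."""
    unknown = set(choices) - set(RACE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    row = {"SubjectID": subject_id}
    for option in RACE_OPTIONS:
        row["Race" + option] = 1 if option in choices else 0
    return row


print(single_choice_row(7, "Asian"))
print(all_that_apply_row(7, ["White", "Asian"]))
```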

  44. From Paper Data Forms to Data Table(s)* • Transcription directly into the table(s) • Transcription via an online (screen) form • Scanning using OMR software *Best option: Don’t use paper data collection forms at all.

  45. On-Screen* vs. Paper Forms Enter data directly into the computer database or move data from paper forms into the computer database as close to the data collection time as possible. When you define a variable in a computer database, you specify both its format and its domain or range of allowed values. Using these format and domain specifications, computer data entry forms give immediate feedback about improper formats and values that are out of range. The best time to receive this feedback is when the study subject is still on site. *Using on-screen forms is sometimes called EDC for Electronic Data Capture
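
Format and domain checks of this kind can be written down as a small data dictionary. The following Python sketch is illustrative only: the field names, types, and allowed ranges are assumptions, and `validate` is a hypothetical helper, not how Access or QuesGen implements validation.

```python
from datetime import date

# Hypothetical data dictionary: format (type) plus domain (range or allowed codes).
DATA_DICTIONARY = {
    "SubjectID": {"type": int, "min": 1, "max": 99999},
    "ExamDate": {"type": date, "min": date(2000, 1, 1), "max": date(2009, 12, 31)},
    "IQScore": {"type": int, "min": 40, "max": 160},
    "Jaundice": {"type": int, "allowed": {0, 1}},
}


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, value in record.items():
        rule = DATA_DICTIONARY.get(field)
        if rule is None:
            errors.append(f"{field}: not in data dictionary")
        elif value is None:
            continue  # blank is allowed; it means "not yet valued"
        elif not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value} is not an allowed code")
        elif "min" in rule and not rule["min"] <= value <= rule["max"]:
            errors.append(f"{field}: {value} is outside {rule['min']} to {rule['max']}")
    return errors


print(validate({"SubjectID": 12, "IQScore": 250, "Jaundice": 1}))
```

Running a check like this while the subject is still on site is what makes the feedback useful: an out-of-range score can be corrected on the spot.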

  46. On-screen vs. paper forms You can always print out a paper copy of the screen form or a report of the exam/interview results once the data are collected. Examples: an ATM’s printed transaction record, a gas station’s printed receipt