Data Management Systems in Epidemiology Seminar 1 April 25, 2012

avani

Presentation Transcript


  1. Data Management Systems in Epidemiology Seminar 1 April 25, 2012

  2. Data management is important for: • Documenting observations • Reproducibility • Organizing thoughts • Publication • Presentations • Applying for funding • Communicating with a mentor • Meeting the requirements of funding agencies such as NIH • Analyzing data differently in the future

  3. Steps in creating a database: • Defining your variables • Specifying the format and range of values for each variable • Creating a data dictionary • Planning data entry procedures • Testing data entry procedures • Entering the data • Creating a dataset for analysis • Backing up and archiving the dataset

  4. Defining New Variables • Assign a name to each variable to identify it in the database and during the analysis. • A variable's name should: - Clearly identify the survey question or the type of information collected - Be understandable, consistent, and short • Use lower case for all variable names; this eliminates errors when software is case sensitive. • Note: Some software limits the length of variable names (older versions of SPSS, for example, allowed only 8 characters).
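
The naming rules above can be checked automatically before a database is built. The following is a minimal sketch, not part of the original slides; the function name `is_valid_variable_name` and the default limit of 8 characters are assumptions chosen for illustration.

```python
import re

def is_valid_variable_name(name, max_len=8):
    """Return True if name is lowercase, starts with a letter, and is short.

    max_len is a hypothetical limit; set it to whatever your software allows.
    """
    pattern = r"[a-z][a-z0-9_]*"  # lowercase letters, digits, underscores
    return len(name) <= max_len and re.fullmatch(pattern, name) is not None

# Usage: screen a list of proposed names and report the ones to rename.
proposed = ["asthma", "Asthma", "ever_diagnosed_asthma", "age"]
rejected = [name for name in proposed if not is_valid_variable_name(name)]
```

Here `rejected` collects the names that use upper case or exceed the length limit.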

  5. Example of Data Form

  6. Constructing a Database • A database transforms data into information. • Before creating a database, you must identify and clarify its purpose: • Who will use it? • What are their needs? • What will the data be used for? • What do I want to say with this data?

  7. Major types of databases utilized in public health: • Flat-file database (spreadsheet) • Relational database

  8. Relational Database • A relational database stores data in tables. • Each table consists of records (rows) and fields (columns). • Tables can be linked to one another through related fields. • Examples of software for creating relational databases: • MS Access • Oracle • MySQL • Epi Info

  9. Flat File Database • A flat file database is a single file with rows and columns, with no relationships between records. • Choosing between a flat file and a relational database depends on the information you are collecting. • If the information includes complex relationships, you should use a relational database. This will reduce data entry time, errors, and redundancy.
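
As an illustration of linked tables, the sketch below uses Python's built-in `sqlite3` module (not one of the packages named in the slides). The table and column names (`participant`, `visit`, `birth_year`, and so on) are hypothetical. Participant details are stored once and linked to repeated visits, instead of being retyped on every row as a flat file would require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# One row per participant: details are entered once.
conn.execute("""
    CREATE TABLE participant (
        participant_id INTEGER PRIMARY KEY,
        sex            TEXT NOT NULL,
        birth_year     INTEGER
    )""")

# One row per visit, linked back to the participant by participant_id.
conn.execute("""
    CREATE TABLE visit (
        visit_id       INTEGER PRIMARY KEY,
        participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
        visit_date     TEXT NOT NULL,
        asthma         INTEGER
    )""")

conn.execute("INSERT INTO participant VALUES (1, 'female', 1970)")
conn.executemany(
    "INSERT INTO visit (participant_id, visit_date, asthma) VALUES (?, ?, ?)",
    [(1, "2012-01-10", 2), (1, "2012-04-25", 1)],
)

# Joining the tables rebuilds the flat view only when it is needed.
rows = conn.execute("""
    SELECT p.participant_id, p.sex, v.visit_date, v.asthma
    FROM participant AS p
    JOIN visit AS v ON v.participant_id = p.participant_id
    ORDER BY v.visit_date""").fetchall()
```

Each tuple in `rows` combines one visit with the matching participant's details.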

  10. Data Dictionary • A data dictionary identifies the meaning of the collected data. • A data dictionary should include: • Variable type (nominal, string, text) • Variable format ("Yes", "No", "Missing") • Acceptable values (e.g., a coded response can only be 0, 1, or 9999)
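
A data dictionary can also be kept in machine-readable form so that the same definitions drive both documentation and checking. This is a minimal sketch; the structure, the `describe` helper, and the age range of 0 to 120 are illustrative assumptions, not part of the original slides.

```python
# Hypothetical data dictionary: one entry per variable.
DATA_DICTIONARY = {
    "asthma": {
        "label": "Have you ever been diagnosed with asthma?",
        "type": "nominal",
        "codes": {1: "Yes", 2: "No", 9: "Don't Know"},
    },
    "age": {
        "label": "Age in years",
        "type": "numeric",
        "range": (0, 120),  # assumed plausible range, adjust per study
    },
}

def describe(variable, value):
    """Translate a stored value into its meaning, or flag it as invalid."""
    entry = DATA_DICTIONARY[variable]
    if "codes" in entry:
        return entry["codes"].get(value, "INVALID")
    low, high = entry["range"]
    return str(value) if low <= value <= high else "INVALID"
```

For example, `describe("asthma", 1)` looks up the label for code 1, while a value outside the listed codes or range is reported as `"INVALID"`.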

  11. Coding • Coding = translation and summarization • Most statistical analyses require that nonnumeric responses be coded as numeric values. • Coding example: • "Have you ever been diagnosed with asthma?" • 1="Yes" • 2="No" • 9="Don't Know"
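
The asthma coding scheme above can be applied in code as a simple lookup. This is a sketch under assumptions: the function name `code_response` and the choice to raise an error on unrecognized answers are illustrative.

```python
# Codes taken from the slide: 1 = Yes, 2 = No, 9 = Don't Know.
ASTHMA_CODES = {"yes": 1, "no": 2, "don't know": 9}

def code_response(text):
    """Convert a written response into its numeric code."""
    key = text.strip().lower()  # tolerate stray spaces and capitalization
    if key not in ASTHMA_CODES:
        # Refuse to guess: unrecognized answers need a human decision.
        raise ValueError(f"Unrecognized response: {text!r}")
    return ASTHMA_CODES[key]

coded = [code_response(answer) for answer in ["Yes", " no ", "Don't Know"]]
```

Raising an error, instead of silently assigning a code, keeps unexpected answers visible so they can be resolved and documented.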

  12. Coding Open-Ended Questions • Coding responses to open-ended questions is complicated. • Examples: • "What hobbies or other interests do you have?" • "What has been important about your adult life?"

  13. Coding Missing Data • Make sure that the value assigned to missing data is not a possible value for that variable. • Example: If missing data for age are coded as "99", missing values will be analyzed as age = 99 years, which is incorrect. • Get familiar with the standard missing value code of the software that you will use. • Code "Don't Know" differently from missing data.
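
The effect of a poorly chosen missing value code can be seen with a small calculation. In this sketch the ages are made-up illustration data, and the code `-9` and the helper names are assumptions; the point is only that an impossible value can be recognized and excluded, while 99 cannot.

```python
from statistics import mean

MISSING = -9  # hypothetical missing code: impossible as a real age

def to_missing(values, missing_code=MISSING):
    """Replace the missing code with None so it cannot enter calculations."""
    return [None if v == missing_code else v for v in values]

def mean_observed(values):
    """Average only the values that were actually observed."""
    return mean(v for v in values if v is not None)

# Two participants have unknown age in both versions of the data.
ages_coded_99 = [34, 51, 99, 42, 99]
ages_coded_neg9 = [34, 51, -9, 42, -9]

biased = mean(ages_coded_99)  # the two 99s are treated as real ages
unbiased = mean_observed(to_missing(ages_coded_neg9))  # only 34, 51, 42
```

With 99 as the code, the two missing ages pull the average up to 65; with an impossible code they are excluded and the average reflects only the three observed ages.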

  14. Validating Data • Check for illogical answers. For example, participants recorded as "female" should not report having had a prostatectomy. • Most data management systems can run edit checks to validate your data while it is being entered (set this up). • Most systems let us control the range of acceptable values that can be entered into a field (set this up).
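
An edit check for illogical answers can be sketched as a function that inspects each record. The field names `sex` and `prostatectomy`, and the convention that 1 means "Yes", are assumptions made for this example.

```python
def find_illogical(record):
    """Return a list of logical inconsistencies found in one record."""
    problems = []
    # Assumed coding: prostatectomy == 1 means "Yes".
    if record.get("sex") == "female" and record.get("prostatectomy") == 1:
        problems.append("prostatectomy reported for a female participant")
    return problems

records = [
    {"participant_id": 1, "sex": "female", "prostatectomy": 2},
    {"participant_id": 2, "sex": "female", "prostatectomy": 1},
]
# Keep only the records that need to be checked against the original form.
flagged = {r["participant_id"]: find_illogical(r)
           for r in records if find_illogical(r)}
```

Further rules can be added to the same function as they are identified, so that every record passes through one set of checks.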

  15. Controlling Data Range • The database can be set up to permit only the values 1, 2, and 9 to be entered into the field. • For example: "Have you ever been diagnosed with chronic bronchitis?" • 1="Yes" • 2="No" • 9="Don't Know"
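
One way to enforce such a restriction is a `CHECK` constraint in the database itself. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whichever system is actually used; the table name `survey` and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE survey (
        participant_id INTEGER PRIMARY KEY,
        -- Only 1 (Yes), 2 (No), and 9 (Don't Know) are accepted.
        bronchitis INTEGER NOT NULL CHECK (bronchitis IN (1, 2, 9))
    )""")

def try_insert(participant_id, bronchitis):
    """Attempt to store a response; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO survey VALUES (?, ?)",
                         (participant_id, bronchitis))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(1, 1)   # valid code
rejected = try_insert(2, 3)   # 3 is not an allowed code
```

Because the rule lives in the table definition, it applies no matter which data entry screen or script writes to the field.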

  16. Data Entry • Goal: enter the data into the database efficiently and accurately. • Reduce data entry time by setting up proper tab orders and hot-key shortcuts. • Data entry alternatives: • One database • Multiple databases

  17. Advantages of One Database • While data is being entered, we can instantly run summary statistics and interim analyses. • We can easily check for duplicate data entry. • Handy if data entry staff are in multiple locations and everyone has access to the same database.

  18. Advantages of Multiple Databases • If one database gets corrupted, we only have to re-enter the records from that database. • Useful if the data entry staff have different levels of computer and data entry skills. • We will not lose all of our data (just some). • Data can be merged for the analysis of the project. • Note: Monitor who is entering which records into the database!
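
Merging several data entry databases while keeping track of who entered each record might look like the sketch below. It assumes each database is available as a list of records keyed by a `participant_id` field; that layout and the function name `merge_databases` are illustrative assumptions.

```python
def merge_databases(databases):
    """Merge per-staff record lists into one dataset.

    databases maps a staff member's name to the records they entered.
    Returns the merged records plus a list of duplicated participant IDs.
    """
    merged = {}
    duplicates = []
    for entered_by, records in databases.items():
        for record in records:
            pid = record["participant_id"]
            if pid in merged:
                # Same participant entered twice: report it, keep the first.
                duplicates.append((pid, merged[pid]["entered_by"], entered_by))
                continue
            merged[pid] = {**record, "entered_by": entered_by}
    return list(merged.values()), duplicates

databases = {
    "alice": [{"participant_id": 1, "asthma": 1}],
    "bob": [{"participant_id": 1, "asthma": 2},
            {"participant_id": 2, "asthma": 2}],
}
merged, duplicates = merge_databases(databases)
```

Tagging every record with `entered_by` makes it possible to trace a problem back to one source database, and the `duplicates` list shows which forms were entered more than once.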

  19. Train data entry staff on: • How to enter the data • How to navigate through the database • What should be entered and how • Provide your staff with protocol guidelines and hard copies of a blank survey and data form. • Back up the database frequently. • Document the process and make a note of everything that happens.

  20. Documentation of Data Entry Procedure • Keep a notebook and document everything that happens: • Choices that are made about what will be entered and how • Changes in the staff involved in data collection and data entry, and when these changes happened • Problems with data entry and their solutions • Arrange all data collection forms for easy retrieval.

  21. Identifying and correcting errors in the database • Double data entry is one method to reduce data entry errors: • Data are entered twice. • The two entries are compared for each variable, and a list of values that do not match is created. • Data with errors are checked against the original data. • Another method is to recheck or re-enter a random fraction of the data.
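
Both methods above can be sketched in a few lines. The example assumes each round of data entry is held as a dictionary keyed by participant ID; the function names, the 10% default fraction, and the sample data are illustrative assumptions.

```python
import random

def compare_entries(first, second):
    """List every (participant, variable) whose two entries disagree."""
    mismatches = []
    for pid in sorted(set(first) | set(second)):
        rec1 = first.get(pid, {})
        rec2 = second.get(pid, {})
        for var in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(var), rec2.get(var)
            if v1 != v2:
                mismatches.append((pid, var, v1, v2))
    return mismatches

def pick_recheck_sample(participant_ids, fraction=0.1, seed=None):
    """Choose a random fraction of records to recheck against the forms."""
    ids = list(participant_ids)
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return sorted(random.Random(seed).sample(ids, k))

first = {1: {"age": 34, "asthma": 1}, 2: {"age": 51, "asthma": 2}}
second = {1: {"age": 34, "asthma": 1}, 2: {"age": 15, "asthma": 2}}
to_check = compare_entries(first, second)
```

Here `to_check` holds the single disagreement (participant 2, `age`, 51 versus 15), which would then be resolved by looking at the original form.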

  22. Testing Data Entry Procedures • It is crucial to test your data entry methods every time. • How? Check your: - Coding - Data validation - Checking procedures - Data dictionary (revise it as needed) • Finally, take six to ten surveys or forms and enter them into your newly created database.

  23. Backing Up and Archiving • Back up the data regularly (e.g., at the end of each day). • Archive the dataset with its documentation and any important files for interpreting the data.
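
A regular backup can be as simple as copying the database file to a dated name. This is a minimal sketch assuming the database is a single file on disk; the function name `back_up` and the naming pattern are assumptions, and a real project should also keep copies in a separate location.

```python
import shutil
from datetime import datetime
from pathlib import Path

def back_up(database_path, backup_dir):
    """Copy the database file into backup_dir under a timestamped name."""
    source = Path(database_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = target_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Calling `back_up("study.db", "backups")` at the end of each day, with hypothetical file and folder names, would leave one dated copy per run, so an earlier version can be restored if the working file is damaged.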

  24. Questions
