Data Management
Using EpiData and SPSS

References

Public domain (pdf) book on data management:

Bennett, et al. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project.

EpiData Association Website:

Importing raw data into SPSS:

Data Management

  • Planning data needs

  • Data collection

  • Data entry and control

  • Validation and checking

  • Data cleaning and variable transformation

  • Data backup and storage

  • System documentation

  • Other

Types of Database Management Systems (DBMSs)

  • Spreadsheets (e.g., Excel, SPSS Data Editor)

    • Prone to error, data corruption, & mismanagement

    • Lack data controls, limited programmability

    • Suitable only for small and didactic projects

    • Also useful for last-step data cleaning

  • Commercial DBMS programs (e.g., Oracle, Access)

    • Limited data control, good programmability

    • Slow & expensive

    • Powerful and widely available

  • Public domain programs (e.g., EpiData, Epi Info)

    • Controlled data entry, good programmability

    • Suitable for research and field use

We will use two platforms:

  • EpiData

    • controlled data entry

    • data documentation

    • export (“write”) data

  • SPSS

    • import (“read”) data

    • analysis

    • reporting

What is EpiData?

  • EpiData is a small (about 1.2 MB) computer program for simple or programmed data entry and data documentation

  • It is highly reliable

  • It runs on Windows computers

    • Runs on Mac and Linux only with emulator software

  • Interface

    • pull-down menus

    • work process toolbar

History of EpiInfo & EpiData

  • 1976–1995: EpiInfo (DOS program) created by the CDC (in the wake of the swine flu epidemic)

    • Small, fast, reliable, 100,000+ users worldwide

  • 1995–2000: DOS dies a slow, painful death

  • 2000: CDC releases EpiInfo2000

    • Based on Microsoft Jet (Access) data engine

    • Large, slow, unreliable (resembled EpiInfo in name only)

  • 2001: Loyal EpiInfo user group decides it needs a real “EpiInfo for Windows”

    • Creates open source public domain program

    • Calls program “EpiData”

Goal: Create & Maintain Error-Free Datasets

  • Two types of data errors

    • Measurement error (i.e., information bias), discussed over the last couple of weeks

    • Processing errors: errors that occur during data handling, discussed this week

  • Examples of data processing errors

    • Transpositions (91 instead of 19)

    • Copying errors (O instead of 0)

    • Additional processing errors described on p. 18.2
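The two example errors above can be screened for programmatically. The sketch below is illustrative only (the function names are hypothetical, not part of EpiData): one check flags a 2-digit transposition against an expected value, the other flags a letter O that crept into a supposedly numeric field.

```python
def is_transposition(entered: str, expected: str) -> bool:
    """True if two 2-digit values differ only by swapped digits (91 vs 19)."""
    return (len(entered) == len(expected) == 2
            and entered != expected
            and entered == expected[::-1])

def has_letter_o(value: str) -> bool:
    """Flag a supposedly numeric value containing the letter O (copying error)."""
    return any(ch in "Oo" for ch in value)
```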

Avoiding Data Processing Errors

  • Manual checks (e.g., handwriting legibility)

  • Range and consistency checks* (e.g., do not allow hysterectomy dates for men)

  • Double entry and validation*

    • Operator 1 enters data

    • Operator 2 enters data in separate file

    • Check files for inconsistencies

  • Screening during analysis (e.g., look for outliers)

* covered in lab
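The double-entry comparison above can be sketched in Python. This is an illustrative sketch of the idea, not EpiData's actual validation routine; `validate_double_entry` and the record layout (dicts keyed by record ID) are hypothetical.

```python
def validate_double_entry(entry1, entry2):
    """Compare two independently keyed-in datasets (dicts of record ID ->
    {field: value}) and return every disagreement as
    (record_id, field, value_in_file1, value_in_file2)."""
    discrepancies = []
    for rec_id in sorted(set(entry1) | set(entry2)):
        r1, r2 = entry1.get(rec_id), entry2.get(rec_id)
        if r1 is None or r2 is None:
            # A record keyed by only one operator is itself an inconsistency
            discrepancies.append((rec_id, "<missing record>", r1, r2))
            continue
        for field in sorted(set(r1) | set(r2)):
            if r1.get(field) != r2.get(field):
                discrepancies.append((rec_id, field, r1.get(field), r2.get(field)))
    return discrepancies
```

Every tuple in the returned list is a keystroke to resolve against the original paper form.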

Controlled Data Entry

  • Criteria for accepting & rejecting data

  • Types of data controls

    • Range checks (e.g., restrict AGE to reasonable range)

    • Value labels (e.g., SEX: 1 = male, 2 = female)

    • Jumps (e.g., if “male,” jump to Q8)

    • Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)

    • Must-enter fields (responses that cannot be left blank)

    • etc.
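The data controls listed above can be sketched as a record-level validator. In EpiData these rules live in a CHK file; the Python below is only an illustrative sketch of the same logic, and `check_record` is a hypothetical name.

```python
def check_record(rec):
    """Apply simple entry controls to one record (dict of field -> string).
    Returns a list of error messages; an empty list means the record passes."""
    errors = []
    # Must-enter: these fields may not be left blank
    for field in ("age", "sex"):
        if rec.get(field) in (None, ""):
            errors.append(f"{field}: required")
    # Range check: restrict AGE to a reasonable range
    age = rec.get("age")
    if age not in (None, "") and (not age.isdigit() or not 0 <= int(age) <= 120):
        errors.append("age: out of range 0-120")
    # Legal values: SEX must be a coded value (1 = male, 2 = female)
    if rec.get("sex") not in (None, "", "1", "2"):
        errors.append("sex: must be 1 (male) or 2 (female)")
    # Consistency check: if sex = male, do not allow hysterectomy = yes
    if rec.get("sex") == "1" and rec.get("hysterectomy") == "1":
        errors.append("hysterectomy inconsistent with sex = male")
    return errors
```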

Data Processing Steps

  • File naming conventions

  • Variable types and names

  • QES (questionnaire) development

  • Convert .QES file to .REC (record) file

  • Add .CHK file

  • Enter data in REC file

  • Validate data (double entry procedure)

  • Document data (codebook)

  • Export data to SPSS

  • Import data into SPSS

File Naming and File Management

  • c:\path\filename.ext

  • A web address is a good analogy for a full filename (location, path, name, extension)

  • Some systems are case sensitive (Unix)

    • Others are not (Windows)

  • Always be aware of

    • Physical location (local, removable, network)

    • Path (folders and subfolders)

    • Filename (proper)

    • Extension

  • Demo of Windows Explorer: right-click the Start button > Explore
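The four parts of `c:\path\filename.ext` can be pulled apart with Python's standard library; a small sketch (the `describe_file` name is hypothetical):

```python
from pathlib import PureWindowsPath

def describe_file(full_name: str):
    """Break a Windows-style full filename into drive, path, name, extension."""
    p = PureWindowsPath(full_name)
    return {
        "drive": p.drive,        # physical location, e.g. 'c:'
        "path": str(p.parent),   # folders and subfolders
        "filename": p.stem,      # filename proper
        "extension": p.suffix,   # extension, including the dot
    }
```

Note that Windows treats `DEMO.REC` and `demo.rec` as the same file, while Unix does not, so it is safest to keep filenames consistently lowercase.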

EpiData Variable Names

  • Variable names are based on the text that occurs before the variable-type indicator code

  • EpiData's variable-naming defaults vary depending on the installation

  • Create variable names exactly as specified

    To be safe, denote variable names in {curly brackets}

  • For example, to create a two-digit numeric variable called age, use the question:

What is your {age}? ##
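The question line above can be parsed the way EpiData reads a QES file: the `{bracketed}` text becomes the variable name, and the run of `#` markers sets the field width. A simplified illustrative sketch (real QES files support more field types, e.g. text and date fields; `parse_qes_line` is a hypothetical name):

```python
import re

def parse_qes_line(line: str):
    """Extract the {bracketed} variable name and the numeric field width
    (count of '#' markers) from one line of a QES questionnaire."""
    name = re.search(r"\{(\w+)\}", line)
    width = len(re.findall(r"#", line))
    return (name.group(1).lower() if name else None, width)
```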

Demo / Work Along

  • Create QES file [demo.qes]

  • Convert QES to REC [demo.rec]

  • Create CHK file [demo.chk]

  • Create double entry file [demo2.rec]

  • Enter data

  • Validate data


We will stop here and pick up the second part of the lecture next week

“Stay tuned”

Codebooks

  • Contain info that helps users decipher data file content and structure

  • Includes:

    • Filename(s)

    • File location(s)

    • Variable names

    • Coding schemes

    • Units

    • Anything else you think might be useful

File Structure Codebook

Full codebook contains descriptive statistics (demo)

Full Codebook

Notice descriptive statistics
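EpiData generates codebooks like these from the REC and CHK files. To illustrate what a "full" codebook adds over a file-structure listing, here is a minimal sketch that tallies non-missing counts for every variable plus min/max for numeric ones (`make_codebook` and the record layout are hypothetical, not EpiData's output format):

```python
def make_codebook(records, numeric_fields=()):
    """Build a minimal codebook: per variable, the count of non-missing
    values, plus min/max descriptive statistics for numeric fields."""
    fields = sorted({f for r in records for f in r})
    book = {}
    for f in fields:
        values = [r[f] for r in records if r.get(f) not in (None, "")]
        entry = {"n": len(values)}
        if f in numeric_fields and values:
            nums = [float(v) for v in values]
            entry["min"], entry["max"] = min(nums), max(nums)
        book[f] = entry
    return book
```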

Conversion of Data File

  • Requires a common intermediate file format

  • Examples of common intermediate files

    • .TXT = plain text

    • .DBF = dBase program

    • .XLS = Excel

  • Steps

    • Export .REC file → .TXT file

    • Import .TXT file into SPSS

    • Save permanent SAV file
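The export/import step hinges on one detail: a raw text file carries no variable names, so the names must come from the codebook. A small illustrative sketch of that attach-names-on-import step, assuming a tab-delimited export (EpiData can also export fixed-column text; `import_txt` is a hypothetical name):

```python
import csv
import io

def import_txt(raw_text, names):
    """Read tab-delimited raw text (no variable names in the file itself)
    and attach names taken from the codebook, one dict per record."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    return [dict(zip(names, row)) for row in reader]
```

SPSS does the same thing when a DATA LIST command supplies names and columns for a raw file.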

Plain (“raw”) TXT data

  • plain ASCII data format

  • no column demarcations

  • no variable names

  • no labels

TXT file with codebook



SPSS Data Export / Import

[Figure: data flow between EpiData (.REC) and SPSS via a raw data (.TXT) file; image not reproduced]
Top of tox-samp.sps

Lines beginning with * are comments (ignored by the command interpreter). The next set of commands shows the file location and structure via SPSS command syntax.

Bottom part of tox-samp.sps file

Labels being imported into SPSS. Delete the * if you want this command to run.

Ethics of Data Keeping

  • Confidentiality (sanitized files, free of identifiers)

  • Beneficence

  • Equipoise

  • Informed consent (To what extent?)

  • Oversight (IRB)