
### Statistical Software Packages:

How do I get this into that?

Gillian Byrne

Memorial University of Newfoundland

The Basics

- Data is often available in flat ASCII text files

Data Definition Files

- Statistical software programs need to know what to do with the data.
- Data definition files “explain” the text file to the software program.
- For example, a data definition file can arrange the raw numbers into cases and variables, provide variable labels, define missing values, and more.
- Data definition files differ between software packages
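To make the idea concrete, here is a minimal Python sketch of what a data definition does to a flat text file. The column layout, variable names, labels, and sample rows are all made up for illustration; a real layout would come from the documentation that accompanies the data.

```python
# Hypothetical flat file: one case per line, fixed-width columns.
RAW_DATA = """\
001134
002299
003127
"""

# A toy "data definition": columns are 1-based and inclusive.
# All names, labels, and codes here are illustrative assumptions.
DEFINITION = {
    "id": {"cols": (1, 3), "label": "Respondent ID"},
    "sex": {"cols": (4, 4), "label": "Sex of respondent",
            "values": {1: "Male", 2: "Female"}},
    "age": {"cols": (5, 6), "label": "Age in years", "missing": {99}},
}


def read_cases(raw_text, definition):
    """Slice each line into named variables and blank out missing codes."""
    cases = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        case = {}
        for name, spec in definition.items():
            start, end = spec["cols"]
            value = int(line[start - 1:end])
            if value in spec.get("missing", set()):
                value = None  # declared missing value
            case[name] = value
        cases.append(case)
    return cases


cases = read_cases(RAW_DATA, DEFINITION)
for case in cases:
    sex_label = DEFINITION["sex"]["values"].get(case["sex"], "?")
    print(case["id"], sex_label, case["age"])
```

Without the definition, `001134` is just a string of digits; with it, the same line becomes a case with an ID, a sex code, and an age.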

SPSS Syntax File

- Location of the data
- Variables in the data file
- Variable labels (as seen in the SPSS Variable View)
- Value labels assign descriptions to the values of variables
- Missing values for each variable

Data Definition Files and the Codebook

- Where do the data definition files derive from?
- …the Codebook!
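As a rough illustration of how a codebook maps onto a data definition file, the Python sketch below turns a small, hypothetical codebook into SPSS-style commands covering the parts of a syntax file named earlier (data location, variables, variable labels, value labels, missing values). The codebook entries and the file name `survey.txt` are assumptions, and the generated text should be checked against the SPSS documentation before use.

```python
# Hypothetical codebook entries; columns are 1-based and inclusive.
CODEBOOK = [
    {"name": "id", "cols": (1, 3), "label": "Respondent ID"},
    {"name": "sex", "cols": (4, 4), "label": "Sex of respondent",
     "values": {1: "Male", 2: "Female"}},
    {"name": "age", "cols": (5, 6), "label": "Age in years",
     "missing": [99]},
]


def column_spec(cols):
    """Format a column range the way DATA LIST expects (e.g. 4 or 1-3)."""
    start, end = cols
    return str(start) if start == end else f"{start}-{end}"


def build_spss_syntax(data_path, codebook):
    """Build one SPSS command per line, each ending with a period."""
    commands = []
    variables = " ".join(
        f"{v['name']} {column_spec(v['cols'])}" for v in codebook
    )
    commands.append(f"DATA LIST FILE='{data_path}' / {variables}.")
    labels = " / ".join(f"{v['name']} '{v['label']}'" for v in codebook)
    commands.append(f"VARIABLE LABELS {labels}.")
    for v in codebook:
        if "values" in v:
            pairs = " ".join(
                f"{code} '{text}'" for code, text in v["values"].items()
            )
            commands.append(f"VALUE LABELS {v['name']} {pairs}.")
        if "missing" in v:
            codes = ", ".join(str(code) for code in v["missing"])
            commands.append(f"MISSING VALUES {v['name']} ({codes}).")
    return "\n".join(commands)


syntax = build_spss_syntax("survey.txt", CODEBOOK)
print(syntax)
```

The point of the sketch is the direction of travel: every line of the syntax is a restatement of something the codebook already says.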

Other Statistical Software Packages

SAS

- Geared towards power users: one of the most powerful statistical packages, but also has the steepest learning curve
- Relies more on programming than on a point-and-click interface

Stata

- Combination of command language and point-and-click interface
- Used by economics departments and other social science disciplines
- Known for its strong graphing capabilities

Shazam

- Canadian product
- Used widely in economics/econometrics
- Not as powerful as other statistical programs
- Runs on DOS, Windows, Mac, Unix platforms

MS Excel

- Not a dependable statistical package, but…
- Widely available
- Easy to understand & use

Tips for Successful Interoperability

- Data definition files
  - By far the easiest way to format raw data
  - SPSS, SAS, and Stata data definition files (with commenting!) are available in IDLS
  - Troubleshooting tips:
    - Ensure you correctly identify the file path to the data
    - Make sure that commands don’t include breaks (carriage returns)
    - Check that the correct symbol is used to end each command (in SPSS it’s a period, in SAS a semicolon; in Stata a semicolon applies only when the file sets `#delimit ;`, otherwise each line is a command)
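The troubleshooting tips can be turned into a quick pre-flight check. The Python sketch below is only an illustration: it assumes one command per line, looks for paths written as `FILE='...'` (the SPSS style), and covers only the SPSS and SAS terminators. The function name and the sample text are hypothetical.

```python
import os
import re

# Command terminators for the two packages this sketch covers.
TERMINATORS = {"spss": ".", "sas": ";"}


def check_definition_file(text, package, base_dir="."):
    """Return a list of human-readable problems found in the syntax text."""
    problems = []
    # Tip 1: every quoted FILE='...' path should point at an existing file.
    for path in re.findall(r"FILE\s*=\s*'([^']+)'", text, flags=re.IGNORECASE):
        if not os.path.exists(os.path.join(base_dir, path)):
            problems.append(f"data file not found: {path}")
    # Tips 2 and 3: with one command per line, every non-blank line
    # should end with the package's terminator.
    terminator = TERMINATORS[package]
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() and not line.rstrip().endswith(terminator):
            problems.append(f"line {number} does not end with '{terminator}'")
    return problems


sample = "DATA LIST FILE='no_such_file_xyz.txt' / id 1-3 sex 4\nEXECUTE."
for problem in check_definition_file(sample, "spss"):
    print(problem)
```

A check like this will not catch every mistake, but it flags the two most common ones (a wrong path and a missing terminator) before the file ever reaches the statistical package.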

- Comma-Separated Values (csv) files:
  - Text files (with the extension .csv) with commas separating the data
  - Often csv files imported into statistical software will require tweaking (variable labels, layout, etc.)
  - csv files can be imported by most programs:
    - SPSS, SAS, Stata, Excel
  - csv files are available in ESTAT and CANSIM II through CHASS
  - b2020 files can also be converted to csv for use in another program
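As an illustration of the kind of tweaking a csv file usually needs after import, the Python sketch below reads a small csv, renames the column headers to short variable names, and converts numeric text to numbers. The sample rows and the rename mapping are made up for the example and do not come from any real table.

```python
import csv
import io

# Made-up sample data standing in for a downloaded .csv file.
RAW_CSV = """\
Geography,Year,Population
Region A,2001,1000
Region B,2001,2500
"""

# Hypothetical mapping from long column headers to short variable names.
RENAME = {"Geography": "geo", "Year": "year", "Population": "pop"}


def load_csv(text, rename):
    """Read csv text, rename the columns, and convert digit strings to int."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        row = {}
        for header, value in record.items():
            name = rename.get(header, header)
            row[name] = int(value) if value.isdigit() else value
        rows.append(row)
    return rows


rows = load_csv(RAW_CSV, RENAME)
for row in rows:
    print(row)
```

Statistical packages do the reading step for you; the renaming and type conversion are the part that typically still has to be done by hand or by script.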

File Input Chart

Adapted from: http://www.chass.utoronto.ca/datalib/caq/format.htm

Conversion Software

- Conversion software allows you to seamlessly transport data from one statistical program to another
- STAT/Transfer
  - Supports over 30 software programs, including SAS, SPSS and Stata
  - Approx. $150 USD for single user license
- DBMS/Copy
  - Supports over 80 software programs, including databases and spreadsheets
  - Approx. $500 USD for single user

Roundup

- There is a proliferation of statistical software packages, all of them with different strengths and weaknesses
- Concentrate on getting the data into the software – often users can take it from there
- CANSIM II at CHASS, ESTAT, IDLS, and the DLI website all offer different file type options – it can be worthwhile checking different sources to find the file type you’re looking for

