
Migrating from SPSS to SIR

Migrating from SPSS to SIR. Return from Anarchy. Jon Johnson 11 May 2005. Introduction. CLS runs 3 / 4 British Birth Cohort Studies Multi-disciplinary study of the life-course of three generations born in 1958,1970 and 2000 Data collected in various ways, paper, CAPI, administrative data

Presentation Transcript


  1. Migrating from SPSS to SIR: Return from Anarchy • Jon Johnson • 11 May 2005

  2. Introduction
     • CLS runs 3 of the 4 British Birth Cohort Studies
     • Multi-disciplinary study of the life-course of three generations born in 1958, 1970 and 2000
     • Data collected in various ways: paper, CAPI, administrative data
     • Complex data: 100,000 variables, 18,000 participants per study

  3. History
     • Punch cards, different data centres, SIR, SPSS
     • The data has been through the range of data storage fashions
     • Social science versus medical data access models
     • Goal of increased accessibility and understanding of relationships within data
     • Development of social science meta-data standards

  4. Current Data Collection
     • Data collection methods such as CAPI have both a positive and a negative side
     • Data is pre-punched
     • Data is pre-checked
     • Data is less understandable
     • Data is more complicated
     • Recent data supplied for one sweep contained more than 100,000 variables

  5. Taming data
     • Datasets are routinely supplied in SPSS format
     • SPSS is not an ideal environment to manage such data
     • SIR is an ideal environment to manage this data

  6. Data Migration with minimum information loss
     • SPSS Data List
        • Rarely used, high level of manual intervention
     • Visual Basic (a.k.a. SaxBasic)
        • Platform dependent
        • Limited functionality, multi-step process
     • ODBC
        • Flaky at best
     • Reverse engineer SPSS file
        • SPSS Portable format: a stable, if poorly documented, format

  7. Implementation
     • PQL, Perl or Python?
     • Stable across operating systems
     • Good text manipulation
     • Good XML support
     • Case-based databases

  8. How it works
     • Parses the SPSS file
     • Grabs variable names, value labels, data values, etc.
     • Looks up a configuration file for BDI settings
     • Checks whether it is also setting up the database or just adding a new record
     • Does some conversions: time, date, scaled variables
     • Analyses the data to find the range of values
     • Writes out a warning if a variable has more than 3 missing values or a range of missing values
     • Writes out the schema
     • Usage: python spss_parser.py -f <input filename> -s <sir config file> -d <ddi config file>
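
A few of the steps on this slide can be illustrated with a minimal Python sketch. It is not the original spss_parser.py, whose source is not shown here. The function names, argument destinations and warning wording are hypothetical; only the -f, -s and -d flags and the "more than 3 missing values or a range" rule come from the slide. The date conversion relies on SPSS storing dates as seconds since midnight on 14 October 1582.

```python
import argparse
from datetime import datetime, timedelta

# SPSS stores dates and date-times as seconds since the start of the
# Gregorian calendar (midnight, 14 October 1582).
SPSS_EPOCH = datetime(1582, 10, 14)

# Threshold taken from the slide: warn when a variable declares more
# than three discrete missing values.
MAX_MISSING_VALUES = 3


def build_arg_parser():
    """Command line mirroring the flags on the slide (dest names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="spss_parser.py")
    parser.add_argument("-f", dest="input_file", required=True,
                        help="input SPSS file")
    parser.add_argument("-s", dest="sir_config", required=True,
                        help="SIR configuration file")
    parser.add_argument("-d", dest="ddi_config", required=True,
                        help="DDI configuration file")
    return parser


def spss_seconds_to_datetime(seconds):
    """Convert an SPSS date or date-time value to a Python datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def missing_value_warnings(name, discrete=(), value_range=None):
    """Return warning strings for one variable's missing-value declaration.

    discrete    -- sequence of discrete missing values
    value_range -- (low, high) tuple if a range of missing values is declared
    """
    warnings = []
    if len(discrete) > MAX_MISSING_VALUES:
        warnings.append(
            f"{name}: {len(discrete)} missing values declared "
            f"(more than {MAX_MISSING_VALUES})"
        )
    if value_range is not None:
        low, high = value_range
        warnings.append(
            f"{name}: missing values declared as a range ({low} to {high})"
        )
    return warnings
```

In a script, build_arg_parser().parse_args() would read the three file names from the command line, and the warnings returned for each variable could be printed alongside the generated schema so they can be reviewed before the data is loaded.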

  9. Use
     • Once in SIR, the data can be restructured
     • Can be extended to datasets held in other statistical packages such as Stata or SAS, by converting them with StatTransfer to SPSS portable format and proceeding from there
     • Also creates XML to add to a data store (now superseded)
