When Setup Files Go Bad….  Debugging your SAS, SPSS, and STATA code so it works



Felicia B. LeClere, Ph.D.

Director, Data Sharing for Demographic Research

Overview of webinar
  • Broaden the scope a bit…
      • No setup files: this is where we learn to debug
      • Setup files: things that might not work
      • When the double click doesn’t work…
Things we will be looking for
  • What it looks like when it runs…
  • When things don’t work…
  • How to diagnose what’s wrong…
It’s just numbers… what do I do?
  • Many of our historical files require you to create syntax on your own… that means learning to read in ASCII data
  • You know you are in trouble when the download page looks like this…
What to do…
  • Find the documentation and look for the following language:
        • Column locations, field length, or variable position
  • These describe where your variables are in the ASCII data file and determine how you will read them in.
What you will see…

[Annotated screenshot; callouts: How to read the data · The data file location · The data file]

What you need to do

This is from the codebook, where it is called the tape position index.

[Annotated codebook excerpt; callouts: Variable location · Variable]

And you know the drill…
  • Identify the method for reading ASCII data in your favorite stat package
  • Use a fixed-format infile to read the fields
  • And build…
How do you know when it’s gone wrong?
  • The software says the file doesn’t exist or can’t be read, or gives some other message
  • It doesn’t read a variable or doesn’t recognize a variable name
  • Frequency counts really don’t match what is in the documentation
  • The valid values for a variable don’t match what’s in the codebook
  • The number of cases doesn’t match the number given in the codebook
This looks right

libname in "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002";
data new;
  infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt";
  input
    pid 1-5 exam 16 lang 17;   /* column input: variable name, then its column position(s) */
proc freq;
  tables lang;
run;

NOTE: The infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" is:
      File Name=D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt,
      RECFM=V,LRECL=256

NOTE: 11653 records were read from the infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt".
      The minimum record length was 256.
      The maximum record length was 256.
      One or more lines were truncated.
NOTE: The data set WORK.NEW has 11653 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           1.75 seconds
      cpu time            0.15 seconds

6    proc freq;
7    tables lang;
8    run;

NOTE: There were 11653 observations read from the data set WORK.NEW.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.96 seconds
      cpu time            0.00 seconds

The SAS System                 10:27 Tuesday, April 28, 2009   1

The FREQ Procedure

                                 Cumulative    Cumulative
lang    Frequency     Percent     Frequency      Percent
---------------------------------------------------------
   1        5986       51.53          5986        51.53
   2        5631       48.47         11617       100.00

Frequency Missing = 36

Looks good!!

Not so right…

libname in "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002";
data new;
  infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt";
  input
    pid 1-5 exam 16 lang 17 Bite 407;   /* Bite sits at column 407 */
proc freq;
  tables bite;
run;

NOTE: The infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" is:
      File Name=D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt,
      RECFM=V,LRECL=256

NOTE: LOST CARD.
pid=16785 exam=1 lang=1 Bite=. _ERROR_=1 _N_=5827
NOTE: 11653 records were read from the infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt".
      The minimum record length was 256.
      The maximum record length was 256.
      One or more lines were truncated.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.NEW has 5826 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.34 seconds
      cpu time            0.21 seconds

14   proc freq;
15   tables bite;
16   run;

NOTE: There were 5826 observations read from the data set WORK.NEW.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Allowable values don’t match

Frequencies don’t match

Why?
  • The default record length (LRECL) SAS used here is 256. The log was telling us that: "RECFM=V,LRECL=256" and "One or more lines were truncated."
  • Once we got past field position 256… we got lost. Language was at position 17… and Bite at 407.
  • Solution: reset lrecl in the infile statement (lrecl=815)
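
The fix can be sketched as follows, reusing the program from the "Not so right" slide. The value 815 is the one given on the slide; for any other file the record length would have to come from that study's codebook. This is a sketch of the change, not a tested setup file.

```sas
data new;
  /* lrecl=815 is the record length named on the slide */
  infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" lrecl=815;
  input
    pid 1-5 exam 16 lang 17 Bite 407;
run;

proc freq data=new;
  tables bite;
run;
```

If the longer record length takes effect, the log should show LRECL=815, the LOST CARD note should be gone, and the observation count should match the 11653 records read.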
Other reasons things go bad
  • Multiple lines per record: a product of the days when data were on cards and the record length was fixed at 80
  • You read a string as a numeric or vice versa
  • Data errors or non-standard characters (files converted from mainframes or other formats)
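
The first two causes have standard remedies in a SAS INPUT statement. The sketch below is illustrative only: the file name, variable names, and column positions are made up, and it assumes every case occupies exactly two 80-column lines.

```sas
data cards;
  /* hypothetical file with two 80-column lines ("cards") per case */
  infile "survey-data.txt" lrecl=80;
  input
    #1 caseid 1-5          /* #1: first line of the case */
       state $ 6-7         /* $ reads the field as character, not numeric */
    #2 income 11-16;       /* #2: second line of the same case */
run;
```

`#1` and `#2` are line pointers that move INPUT to the first and second line of each case, and `$` tells SAS to treat a field as a string. A numeric variable pointed at a field containing letters produces "Invalid data" notes in the log and missing values.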
You have a syntax file …
  • You find a file and you download it for your favorite flavor of software
  • You decide to keep all the variables
  • You know where the data went (i.e. where you downloaded it to)
Initial steps to test
  • Get rid of formatting
  • Add a frequency check for variables at the beginning and end
  • Simplify if you can (do you really need all those variables…)
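
A frequency check at both ends of the record can be as small as the sketch below. It assumes the data step from the setup file has just run, and it borrows GENDER (near the start of the record) and POLPARTY from the PROC FREQ that appears in the logs for this example. Whether POLPARTY really sits near the end of the record is an assumption to check against the codebook.

```sas
/* Tabulate one variable from the start of the record and one from later on. */
/* If the record length or column positions are off, the later variable      */
/* tends to be the first to show values that are not in the codebook.        */
proc freq;                            /* no data= option: uses the most recently created data set */
  tables gender polparty / missing;   /* missing: count missing values as a category */
run;
```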
libname in "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002";
DATA;
INFILE "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\22627-0001-Data.txt" LRECL=2983;
INPUT
  CASEID 1-8 GENDER 9 AGE 10-11
  ETHNONAT 12-13 ETHNOS10 14-15 PANETH4 16-17
  GENERAT3 18-20 .1 GENERAT4 21-23 .1 AGEARRV 24-25
  ABUELOFB 26-27 QUOGRPS 28-31 INTLANG 32-35
  SAMPLE 36-39 QS2AM 40-43 QS2AF 44-47
  QS5A 48-51 QS5B 52-55 QS6A 56-59
  QS6B 60-63 QS7 64-67 QS8 68-71
What happened?

ERROR: Physical file does not exist, D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\22627-0001-Data.txt.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DATA1 may be incomplete. When this step was stopped there were 0 observations and 657 variables.
NOTE: DATA statement used (Total process time):
      real time           0.29 seconds
      cpu time            0.29 seconds

1534 Proc freq;
1535 Tables gender polparty;
1536
1537 RUN ;

NOTE: No observations in data set WORK.DATA1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

With INFILE pointed at the folder that actually holds the file:

NOTE: The infile "D:\fleclere\Desktop\misc documents\10294358\ICPSR_22627\DS0001\22627-0001-Data.txt" is:
      File Name=D:\fleclere\Desktop\misc documents\10294358\ICPSR_22627\DS0001\22627-0001-Data.txt,
      RECFM=V,LRECL=2983

NOTE: 4655 records were read from the infile "D:\fleclere\Desktop\misc documents\10294358\ICPSR_22627\DS0001\22627-0001-Data.txt".
      The minimum record length was 2983.
      The maximum record length was 2983.
NOTE: The data set WORK.DATA3 has 4655 observations and 657 variables.
NOTE: DATA statement used (Total process time):
      real time           2.51 seconds
      cpu time            0.73 seconds

4576 Proc freq;
4577 Tables gender polparty;
4578
4579 RUN ;

NOTE: There were 4655 observations read from the data set WORK.DATA3.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

What else should I check?

This is from the original survey documentation, before ICPSR standardization. Always validate the data you produce against the documentation from the original data set to be sure. The syntax and the ICPSR codebook have the same origins, so an error in one may be reproduced in the other. Check total case counts and frequencies.

If the frequencies or case counts don’t match
  • Check the lrecl against the documentation
  • Check the field lengths: the codebook should give each variable’s location and field length
  • Punctuation counts… SAS likes its semicolons, SPSS its spaces and periods, and STATA is fussy about what goes before and after a comma
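
One way to check the record length before trusting any column positions is to let SAS read the raw file without creating variables. The sketch below uses the data file from the earlier slides; the generous lrecl value is an assumption chosen only so that no line gets truncated.

```sas
/* Read every record, but create no data set and no variables. */
data _null_;
  infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" lrecl=32767;
  input;   /* null INPUT: load each record without assigning anything */
run;
```

The NOTE lines for the infile then report the number of records and the minimum and maximum record length, which can be compared with the record length and case count in the codebook.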
If a variable looks weird
  • Print observations…

Everything checks out, but there are weird fields or non-numeric items in a frequency display. Print a record or two.
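
A minimal sketch of printing a record or two, assuming the data set NEW and the data file from the earlier slides, read with lrecl=815:

```sas
/* The first two observations as SAS stored them */
proc print data=new (obs=2);
run;

/* The first two raw records, copied to the log */
data _null_;
  infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" lrecl=815 obs=2;
  input;
  list;   /* LIST writes the current input record to the log under a column ruler */
run;
```

The column ruler in the log makes it easier to compare what actually sits in a given column with what the codebook says should be there.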

I pointed, I clicked, and …
  • Things to ask yourself
      • Is it a version issue? (SAS in particular has problems reading files created in a different version)
      • Do you have the software? (If not, the icon will not look right)

Why you should always run the ASCII syntax instead
  • It allows you to customize the file
  • It forces you to know where the data are
  • It forces you to read the log files, even if all you are doing is watching them go by
  • You have to open the software yourself, so the syntax runs in the version you have and creates the file the way you need it
  • It prevents you from being complacent
Steps to prevent bugs
  • Simplify the syntax…take out all the extraneous stuff
  • Pick fewer variables
  • Always add frequency counts
  • Always check case counts
Steps to prevent bugs
  • Know where you put the data
  • Read the documentation first
  • Save log files as well as program files
  • Verify, verify, verify
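
Saving the log can be built into the program itself. The sketch below uses PROC PRINTTO; the log file path is hypothetical and should point wherever you keep the program file.

```sas
/* Route the log to a file; NEW overwrites any earlier copy. */
/* The path below is a made-up example.                      */
proc printto log="C:\myproject\read_08535.log" new;
run;

/* ... the data step and PROC FREQ checks go here ... */

/* Send the log back to its default destination. */
proc printto;
run;
```

Keeping the log next to the program means the record counts and any LOST CARD or truncation notes are still available when you come back to the file later.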
If you find errors in ICPSR syntax
  • Please help us … send a corrected file and a description of the error to:

[email protected]

We do updates all the time

If you need help with the basics
  • Our help site for reading in data

Using Data

Great help in building and debugging statistical software programs

UCLA
