
When Setup Files Go Bad… Debugging your SAS, SPSS, and STATA code so it works

Felicia B. LeClere, Ph.D.

Director, Data Sharing for Demographic Research


Overview of webinar

  • Broaden the scope a bit…

    • No setup files: this is where we learn to debug

    • Setup files: things that might not work

    • When the double click doesn’t work…


Things we will be looking for

  • What it looks like when it runs…

  • When things don’t work…

  • How to diagnose what’s wrong…


It’s just numbers… what do I do?

  • Many of our historical files require you to create syntax on your own… that means learning to read in ASCII data

  • You know you are in trouble when the download page looks like this…


Instead of this…


What to do…

  • Find the documentation and look for the following language

    • Column locations, field length, or variable position

  • These describe where your variables are in the ASCII data file and mark how you will read them in….


    What you will see…

    How to read the data

    The data file location

    The data file


    What you need to do

    This is from the codebook…called tape position index

    Variable location

    Variable


    Variable descriptions


    And you know the drill…

    • Identify the method for reading ASCII data in your favorite stat package

    • Use a fixed-format infile to read the fields

    • And build…
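
Whatever the package, a fixed-format read comes down to slicing each record at the column positions the codebook gives. The following is a minimal, package-neutral sketch in Python and is not part of the original webinar. The layout borrows the pid 1-5, exam 16, lang 17 positions from the SAS example in this presentation; the function name and the sample record are made up for illustration.

```python
# Hypothetical layout: variable name -> (start, end), 1-based inclusive
# columns, the way a codebook lists them.
LAYOUT = {"pid": (1, 5), "exam": (16, 16), "lang": (17, 17)}


def parse_record(line, layout):
    """Slice one fixed-width record into a dict of integer values."""
    row = {}
    for name, (start, end) in layout.items():
        # Codebook columns are 1-based and inclusive; Python slices are
        # 0-based and exclude the end index.
        text = line[start - 1:end].strip()
        # A blank field, or one past the end of the record, becomes None.
        row[name] = int(text) if text else None
    return row


# Made-up record: pid in columns 1-5, exam in column 16, lang in column 17.
sample = "16785" + " " * 10 + "12"
print(parse_record(sample, LAYOUT))
```

Note that a field lying beyond the end of the record quietly comes back as missing here, which is the same kind of silent failure the checks on the following slides are meant to catch.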


    How do you know when it’s gone wrong?

    • The package says the file doesn’t exist or can’t be read, or gives some other message

    • It doesn’t read a variable or doesn’t recognize a variable name

    • Frequency counts don’t match what is in the documentation

    • The valid values for a variable don’t match what’s in the codebook

    • The number of cases doesn’t match the number given in the codebook
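
The last three checks can be automated once a variable has been read in. This is a minimal Python sketch, not part of the original webinar: `compare_with_codebook` is a hypothetical helper, and the frequencies are made-up stand-ins for what a codebook would document.

```python
from collections import Counter


def compare_with_codebook(values, documented_freq):
    """Compare one variable's values with the codebook's frequency table.

    values: the values as read in (None stands for missing)
    documented_freq: dict mapping each documented code to its frequency
    Returns a list of mismatch descriptions; an empty list means agreement.
    """
    problems = []
    expected_cases = sum(documented_freq.values())
    if len(values) != expected_cases:
        problems.append(f"read {len(values)} cases, codebook says {expected_cases}")
    observed = Counter(values)
    # A code missing from the codebook is reported with a documented count of 0.
    for code in sorted(set(observed) | set(documented_freq), key=str):
        seen, documented = observed.get(code, 0), documented_freq.get(code, 0)
        if seen != documented:
            problems.append(f"code {code!r}: read {seen}, codebook says {documented}")
    return problems


# Made-up codebook table: codes 1 and 2 twice each, one missing value.
documented = {1: 2, 2: 2, None: 1}
print(compare_with_codebook([1, 2, 2, None, 9], documented))
```

An empty list is the goal; anything else names the code to look up in the documentation.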


    This looks right

    libname in "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002";
    data new;
    infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt";
    input
      pid 1-5 exam 16 lang 17;
    proc freq;
    tables lang;
    run;

    NOTE: The infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" is:
          File Name=D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt,
          RECFM=V,LRECL=256

    NOTE: 11653 records were read from the infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt".
          The minimum record length was 256.
          The maximum record length was 256.
          One or more lines were truncated.
    NOTE: The data set WORK.NEW has 11653 observations and 3 variables.
    NOTE: DATA statement used (Total process time):
          real time 1.75 seconds
          cpu time 0.15 seconds

    6 proc freq;
    7 tables lang;
    8 run;

    NOTE: There were 11653 observations read from the data set WORK.NEW.
    NOTE: PROCEDURE FREQ used (Total process time):
          real time 0.96 seconds
          cpu time 0.00 seconds


    The SAS System                    10:27 Tuesday, April 28, 2009   1

    The FREQ Procedure

                                        Cumulative    Cumulative
    lang    Frequency     Percent        Frequency       Percent
    -------------------------------------------------------------
       1        5986       51.53             5986         51.53
       2        5631       48.47            11617        100.00

                      Frequency Missing = 36

    Looks good!!


    Not so right…

    libname in "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002";
    data new;
    infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt";
    input
      pid 1-5 exam 16 lang 17 Bite 407;
    proc freq;
    tables bite;
    run;

    NOTE: The infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt" is:
          File Name=D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt,
          RECFM=V,LRECL=256

    NOTE: LOST CARD.
    pid=16785 exam=1 lang=1 Bite=. _ERROR_=1 _N_=5827
    NOTE: 11653 records were read from the infile "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\08535-0002-data.txt".
          The minimum record length was 256.
          The maximum record length was 256.
          One or more lines were truncated.
    NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
    NOTE: The data set WORK.NEW has 5826 observations and 4 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.34 seconds
          cpu time 0.21 seconds

    14 proc freq;
    15 tables bite;
    16 run;

    NOTE: There were 5826 observations read from the data set WORK.NEW.
    NOTE: PROCEDURE FREQ used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds


    Allowable values don’t match

    Frequencies don’t match


    Why?

    • The default record length (LRECL) in this SAS session is 256; the log was telling us that with "LRECL=256" and "One or more lines were truncated."

    • Once we got past the field position of 256… we got lost. Language was at position 17… and Bite at 407.

    • Solution: reset lrecl in the infile statement (lrecl=815)
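
To see why only Bite went wrong, compare each field's last column with the record length actually read. The following is a small package-neutral sketch in Python, not part of the original webinar. The column positions and the two record lengths are the ones quoted on these slides; the function name is made up for illustration.

```python
# Column positions from the slides: name -> (start, end), 1-based inclusive.
LAYOUT = {"pid": (1, 5), "exam": (16, 16), "lang": (17, 17), "bite": (407, 407)}


def fields_past_lrecl(layout, lrecl):
    """Return the names of fields whose last column lies beyond lrecl."""
    return [name for name, (start, end) in layout.items() if end > lrecl]


print(fields_past_lrecl(LAYOUT, 256))  # ['bite']
print(fields_past_lrecl(LAYOUT, 815))  # []
```

Any name returned for the record length in use is a field that cannot be read correctly until lrecl is raised.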


    Other reasons things go bad

    • Multiple lines per record: a holdover from the days when data were on cards and the record length was fixed at 80

    • You read a string as a numeric or vice versa

    • Data errors or non-standard characters (files converted from mainframes or other formats)
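
For the second and third bullets, a quick scan of one field's raw text shows which records hold something other than digits. This is a hedged Python sketch, not from the webinar: it assumes an unsigned integer field, and the function name, column positions, and sample records are invented for illustration.

```python
def flag_bad_fields(lines, start, end):
    """Return (record number, raw text) for fields that are not all digits.

    start and end are 1-based inclusive columns, as in a codebook.
    Blank fields are skipped because they are ordinary missing values.
    """
    bad = []
    for recno, line in enumerate(lines, start=1):
        raw = line[start - 1:end]
        text = raw.strip()
        # isascii() guards against non-ASCII characters that isdigit() accepts.
        if text and not (text.isascii() and text.isdigit()):
            bad.append((recno, raw))
    return bad


# Made-up records with a five-column numeric field in columns 1-5.
records = ["00012", "0001A", "     ", "00#45"]
print(flag_bad_fields(records, 1, 5))
```

Signed or decimal fields would need a looser test than `isdigit()`, so treat the flagged records as candidates to inspect, not as confirmed errors.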


    You have a syntax file…

    • You find a file and you download it for your favorite flavor of software

    • You decide to keep all the variables

    • You know where the data went (i.e. where you downloaded it to)


    Initial steps to test

    • Get rid of formatting

    • Add a frequency check for variables at the beginning and end

    • Simplify if you can (do you really need all those variables…)


    libname in "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002";
    DATA;
    INFILE "D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\22627-0001-Data.txt" LRECL=2983;
    INPUT
      CASEID 1-8          GENDER 9            AGE 10-11
      ETHNONAT 12-13      ETHNOS10 14-15      PANETH4 16-17
      GENERAT3 18-20 .1   GENERAT4 21-23 .1   AGEARRV 24-25
      ABUELOFB 26-27      QUOGRPS 28-31       INTLANG 32-35
      SAMPLE 36-39        QS2AM 40-43         QS2AF 44-47
      QS5A 48-51          QS5B 52-55          QS6A 56-59
      QS6B 60-63          QS7 64-67           QS8 68-71

    What happened?

    ERROR: Physical file does not exist, D:\fleclere\Desktop\misc documents\10294302\ICPSR_08535\DS0002\22627-0001-Data.txt.
    NOTE: The SAS System stopped processing this step because of errors.
    WARNING: The data set WORK.DATA1 may be incomplete. When this step was stopped there were 0 observations and 657 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.29 seconds
          cpu time 0.29 seconds

    1534 Proc freq;
    1535 Tables gender polparty;
    1536
    1537 RUN ;

    NOTE: No observations in data set WORK.DATA1.
    NOTE: PROCEDURE FREQ used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds


    NOTE: The infile "D:\fleclere\Desktop\misc documents\10294358\ICPSR_22627\DS0001\22627-0001-Data.txt" is:
          File Name=D:\fleclere\Desktop\misc documents\10294358\ICPSR_22627\DS0001\22627-0001-Data.txt,
          RECFM=V,LRECL=2983

    NOTE: 4655 records were read from the infile "D:\fleclere\Desktop\misc documents\10294358\ICPSR_22627\DS0001\22627-0001-Data.txt".
          The minimum record length was 2983.
          The maximum record length was 2983.
    NOTE: The data set WORK.DATA3 has 4655 observations and 657 variables.
    NOTE: DATA statement used (Total process time):
          real time 2.51 seconds
          cpu time 0.73 seconds

    4576 Proc freq;
    4577 Tables gender polparty;
    4578
    4579 RUN ;

    NOTE: There were 4655 observations read from the data set WORK.DATA3.
    NOTE: PROCEDURE FREQ used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds


    This is from our codebook


    What else should I check?

    This is from the original survey documentation, before ICPSR standardization. Always validate the data you produce against the documentation from the original data set to be sure. The syntax and the ICPSR codebook have the same origins, so an error in one may be reproduced in the other. Check total case counts and frequencies.


    If the frequencies or case counts don’t match

    • Check the lrecl against the documentation

    • Check the field lengths: the codebook should give each variable’s location and field length

    • Punctuation counts… SAS likes its semicolons, SPSS its spaces and periods, and STATA is fussy about what goes before and after a comma
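
One way to check the lrecl before involving any stat package is to measure the records directly. This is a rough Python sketch, not from the webinar: `record_length_report` is a hypothetical helper, and the sample lines are made up, with 815 borrowed from the lrecl used earlier in this presentation.

```python
def record_length_report(lines, documented_lrecl):
    """Summarize record lengths and compare them with the documented lrecl."""
    # Line terminators are not part of the record, so strip them first.
    lengths = [len(line.rstrip("\r\n")) for line in lines]
    if not lengths:
        raise ValueError("no records to measure")
    shortest, longest = min(lengths), max(lengths)
    return {
        "records": len(lengths),
        "min": shortest,
        "max": longest,
        # True only when every record has exactly the documented length.
        "matches_documentation": shortest == longest == documented_lrecl,
    }


# Made-up file contents: three records of 815 characters each.
lines = ["x" * 815 + "\n" for _ in range(3)]
print(record_length_report(lines, 815))
```

In practice `lines` would come from opening the downloaded data file. The record count doubles as a case-count check when there is one line per case, and shorter records may only mean that trailing blanks were trimmed.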


    If a variable looks weird

    • Print observations…

      Everything checks out, but there are weird fields or non-numeric items in a frequency display. Print a record or two.


    I pointed, I clicked, and…

    • Things to ask yourself

      Is it a version issue?

      (SAS in particular has problems reading files created in different versions)

      Do you have the software?

      (the icon will not look right)


    Why should you always run the ASCII syntax instead?

    • It allows you to customize the file

    • It forces you to know where the data are

    • It forces you to read the log files, even if all you are doing is watching them go by

    • You have to open your own copy of the software, so the syntax runs in the version you have and creates the file the way you need it

    • It prevents you from being complacent


    Steps to prevent bugs

    • Simplify the syntax…take out all the extraneous stuff

    • Pick fewer variables

    • Always add frequency counts

    • Always check case counts


    Steps to prevent bugs

    • Know where you put the data

    • Read the documentation first

    • Save log files as well as program files

    • Verify, verify, verify


    If you find errors in ICPSR syntax

    • Please help us … send a corrected file and a description of the error to:

      [email protected]

      We do updates all the time


    If you need help with the basics

    • Our help site for reading in data

      Using Data

      Great help in building and debugging statistical software programs

      UCLA

