14b accessing data files in sas n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
14b. Accessing Data Files in SAS ® PowerPoint Presentation
Download Presentation
14b. Accessing Data Files in SAS ®

Loading in 2 Seconds...

play fullscreen
1 / 35

14b. Accessing Data Files in SAS ® - PowerPoint PPT Presentation


  • 111 Views
  • Uploaded on

14b. Accessing Data Files in SAS ®. Prerequisites. Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training Modules 2. NLTS2 Study Overview 3. NLTS2 Study Design and Sampling NLTS2 Data Sources, either 4. Parent and Youth Surveys or

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '14b. Accessing Data Files in SAS ®' - fabian


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
prerequisites
Prerequisites
  • Recommended modules to complete before viewing this module
    • 1. Introduction to the NLTS2 Training Modules
    • 2. NLTS2 Study Overview
    • 3. NLTS2 Study Design and Sampling
    • NLTS2 Data Sources, either
      • 4. Parent and Youth Surveys or
      • 5. School Surveys, Student Assessments, and Transcripts
    • NLTS2 Documentation
      • 10. Overview
      • 11. Data Dictionaries
      • 12. Quick References
overview
Overview
  • Purpose
  • Opening and viewing data files
  • Limiting variables
  • Subsetting cases
  • Joining/combining data files
  • Summary
  • Closing
  • Important information
nlts2 restricted use data
NLTS2 restricted-use data

NLTS2 data are restricted.

Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data.

Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.

purpose
Purpose
  • Learn to
    • Open a data file
    • See what is in a file (i.e., contents of the file)
    • “Size” a data file for a perfect fit
      • Reduce the number of variables
      • Reduce the number of cases (i.e., subset the data)
    • Combine information from multiple sources
      • Bring in data from another source or another wave
      • Join or combine files
    • Create a new file
open and view data files
Open and view data files
  • SAS® and SPSS ® data are in separate folders.
  • SAS data files have a “.sas7bdat” extension.
  • Associated value formats are stored in a SAS library, “Formats.sas7bcat.”
  • SAS programming code is available for re-creating the user-defined formats in the SAS format library.
    • Hyperlinked from the table of contents on the database and documentation disk.
open and view data files1
Open and view data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Files are either read from or written to.
  • Files have a name and a location where they are stored.
    • SAS needs to know the name of the file and where to find it.
  • SAS uses a LIBNAME statement to identify the path.
    • The path describes the nesting of folders.
    • C:\myprojects\NLTS2\Data is a path or location.
      • i.e., the file is located on Drive “C”, in the folder “Data,” which is nested inside the “myprojects” and “NLTS2” folders.
  • An example LIBNAME statement for a path would be

LIBNAME sasdb 'C:\myprojects\NLTS2\Data' ;

open and view data files2
Open and view data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • An easy way to view the contents of the file is PROC CONTENTS.
  • Specify LIBNAME statements for the data files and format library.

LIBNAME [ddname] '[path]' ;

LIBNAME library '[path]' ;

  • Syntax

PROC CONTENTS DATA = [ddname].[file name] ;

RUN ;

open and view data files example
Viewing files

Use the Wave 1 teacher file

Run a PROC CONTENTS.

Open and View Data Files: Example

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

open and view data files example1
Open and view data files: Example

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

viewing files
Viewing files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

limiting variables
Limiting variables

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • How to reduce the number of variables in the file
    • Large files with many cases and many variables are unwieldy; simplify.
      • Fewer variables to search through.
      • Fewer cases to process.
    • Create data files that are limited to just those variables needed for analysis.
    • You have the choice of drop or keep.
      • Which one is best? The one that requires less typing!
      • If you are dropping more variables than keeping, use “keep” and vice versa.
limiting variables1
Limiting variables

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Note: When making changes to your data
    • Use work files for temporary changes.
      • Work files are files that are in existence only for the duration of the program or SAS interactive session.
      • Work files are temporary files unless they are saved.
      • Work files have a one-level name—no ddname needed
    • To save modified data permanently, create a new permanent data file.
      • Usually it is best to create a new file rather than to modify the source file.
limiting variables2
Limiting variables

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Syntax

DATA Par_w1_lmt_vars (DROP= np1E2a np1GroupMember);

SET sasdb.n2w1parent(KEEP= ID w1_DisHdr2001 w1_GendHdr2001w1_IncomeHdr2001 w1_AgeHdr2001 np1Weight np1HealthProb np1GroupMember np1ProblemCount np1E2a np1B2a) ;

run ;

  • A “KEEP” or “DROP” option on a “SET” or “MERGE” statement controls which variables come into the data set.
  • A “KEEP” or “DROP” option on a “DATA” statement controls which variables are saved on the output data set.
  • Notice that SAS statements end in a semicolon.
limiting variables example
Limiting Variables: Example
  • w2_dis12
  • w2_gend2
  • na_grade4
  • w2_incm3
  • wt_na

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Limiting variables
    • Create a file with fewer variables.
    • Create a new file called “PrScores” from n2w2dirassess.
    • Keep only the following variables:
      • ID
      • ndacalc_pr
      • ndaPC_PR
      • ndasyn_pr
      • NDaF1_friend
      • na_age4
    • Save the new file
    • Review the new file with a PROC CONTENTS
limiting variables example1
Limiting variables: Example

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

subsetting cases
Subsetting cases

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • How to reduce the number of cases in a data step or in a procedure
  • Often analysis is done on a subset; for example:
      • Select only youth with visual impairment.
      • Select only youth who are out of secondary school.
      • Exclude younger students.
subsetting cases1
Subsetting cases

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Example: Limit Wave 4 parent/youth interview data to those who are 21 or older, excluding youth who are 19 or 20 (W4_Age2007 = 19 or 20).
  • Syntax to limit cases in a file

DATA AgeGT20 ;

SET sasdb.n2w4paryouth (WHERE=(W4_Age2007>20));

run;

or

DATA AgeGT20 ;

SET sasdb.n2w4paryouth ;

IF W4_Age2007>20;

run ;

subsetting cases2
Subsetting cases

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Syntax to limit cases in a procedure
    • No change to file with this syntax

PROC FREQ data=sasdb.n2w4paryouth ;

WHERE W4_Age2007>20 ;

TABLES W4_Age2007 ;

run ;

subsetting cases example
Subsetting cases: Example

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Subsetting cases
    • Create a small data set with a subset of cases.
    • Use “PrScores” created in previous example.
    • Limit cases to those classified with hearing. impairment only, i.e., those with a value of “5” for “w2_dis12”.
    • Look at notes in the log window.
      • Are there any clues that the file has changed?
joining combining data files
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • How to bring in data from another file
  • Purpose
    • Learn to combine or join files
      • Bring in data from another source
      • Bring in data from another wave
    • Learn what to watch for
      • Number of cases in the combined file
      • How cases are joined
        • Key variable, i.e., which variable to match on
        • Keyed file, i.e., which cases to keep
joining combining data files1
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Why do this?
    • Often it is necessary to combine information from different files to perform comparative analyses, create new variables, or measure differences over time.
  • For example, you may want to
    • Create composite variables from multiple sources.
    • Look at similar items at different points in time.
joining combining data files2
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Example of composite variables from multiple sources.
    • Create a variable for “if parent attended a parent/teacher conference” using Wave 2 teacher survey item nts2C8, and fill in with parent interview item np2E1a_d if teacher data are missing.
  • Example of items at different points in time.
    • Create a variable to look at the pattern of employment between waves 2 and 3: employed both waves, either wave, or neither wave.
      • Set to “employed both waves” if np2HasPdJob (W2) and np3HasJob (W3) are “yes.”
      • Else set to “employed in either wave” if np2HasPdJob or np3HasJob are “yes.”
      • Else set to “not employed” if np2HasPdJob and np3HasJob are “no.”
joining combining data files3
Joining/combining data files

There will be missing records across files and missing items within files.

If data look like this, ask which file is the main filebeing analyzed.

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

joining combining data files4
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Data in all files must be sorted by the key variable.
    • The key variable matches files case by case.
    • Key variable is “ID.”
  • Files on CD should be sorted by key variable, but as you work with files they may become unsorted.
  • Syntax to sort data

PROC SORT data=sasdb.PrScores ;

by ID ;

run ;

  • Note for those who do not want to destroy data
    • If you use a KEEPor DROPstatement, do a temporary sort, or otherwise change the data, save to a new file.
joining combining data files5
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Syntax to join files

DATA Emplmt ;

MERGE sasdb.n2w1parent (KEEP=id np1i_3a_7)

sasdb.n2w2paryouth (KEEP=id np2HasPdJob)

sasdb.n2w3paryouth (KEEP=id np3HasJob)

sasdb.n2w4paryouth (KEEP=id np4HasJob) ;

BY ID ;

run ;

joining combining data files6
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

SAS log

joining combining data files7
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Syntax to join files keeping only those who have Wave 4 data

DATA Emplmt ;

MERGE sasdb.n2w1parent (KEEP=id np1i_3a_7)

sasdb.n2w2paryouth (KEEP=id np2HasPdJob)

sasdb.n2w3paryouth (KEEP=id np3HasJob)

sasdb.n2w4paryouth

(KEEP=id np4HasJob in=inW4) ;

BY ID ;

IF inW4 ;

run ;

joining combining data files8
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

SAS log

joining combining data files9
Joining/combining data files

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Suggestion: Name files in a meaningful way, such as
    • By date: AnFile_29July
    • By type of Analysis: PI_CrossWave
    • By source: PI_W123
    • By sequence: File_5
  • Be sure to use a two-level data set name (using a ddname) to save the file; otherwise, it is a work or temporary file.

DATA Emplmt ; /* temporary work file */

DATA sasdb.Emplmt; /* saved file */

joining combining data files example
Joining/combining data files:Example

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

  • Joining/combining data
    • Combine data from another file with an existing file.
    • Sort PrScores by ID.
    • Bring in np2HasPdJob from n2w2paryouth.
    • Bring in np3HasJob from n2w3paryouth.
    • Save the file as PrScoresEmp.
joining combining data files example1
Joining/combining data files:Example

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

summary
Summary
  • Congratulations! You have learned to
    • Open and view a file
    • Create a new file
    • Reduce the size of files by specifying the
      • Variables needed
      • Cases needed
    • Join files using a key variable
    • Save files with a new name
closing
Closing
  • Topics discussed in this module
    • Purpose
    • Opening and viewing data files
    • Limiting variables
    • Subsetting cases
    • Joining/combining data files
  • Next module
    • 15b. Accessing Data: Frequencies in SAS
important information
Important information
  • NLTS2 website contains reports, data tables, and other project-related information http://nlts2.org/
  • Information about obtaining the NLTS2 database and documentation can be found on the NCES website http://nces.ed.gov/statprog/rudman/
  • General information about restricted data licenses can be found on the NCES websitehttp://nces.ed.gov/statprog/instruct.asp
  • E-mail address: nlts2@sri.com