Stata: Getting Started and Being Productive with VA Data

Give me six hours to chop down a tree and I will spend the first four sharpening the axe.

--Abraham Lincoln

Todd Wagner

June 2007

Outline
  • Getting data into Stata
  • Editing in Stata
  • How does Stata handle data
  • Stata notation and help
  • Using Stata and Basic Stata commands
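Transferring Data
  • Stat/Transfer and DBMS/Copy both work
  • By default, Stat/Transfer tries to optimize the Stata dataset by storing each variable in the smallest type it judges sufficient
    • If transferring data with SCRSSN, FORCE Stat/Transfer to transfer SCRSSN as double precision; a single-precision float is accurate to only about 7 digits, so a 9-digit identifier can be silently rounded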
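The precision problem is easy to reproduce. The following is a minimal sketch using a made-up 9-digit number (not a real SCRSSN); the variable names id_double and id_float are illustrative only.

```stata
* Made-up 9-digit identifier, stored two ways
clear
set obs 1
generate double id_double = 123456789
generate float  id_float  = 123456789
format id_double id_float %12.0f
list

* The double keeps every digit; the float is rounded to 123456792
assert id_double == 123456789
assert id_float  != 123456789
```

Once an identifier has been rounded this way, different people can end up sharing the same stored value, and merges on that variable can no longer be trusted.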
Editing in Stata
  • Any ASCII text editor will work
  • Stata has a built-in text editor, but it is limited
  • I recommend using another text editor

http://fmwww.bc.edu/repec/bocode/t/textEditors.html

Handling Data
  • SAS processes one record at a time
  • Stata processes all the records at the same time
    • Loops are commonly used in SAS
    • Loops are very rarely used in Stata
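As a rough illustration of why loops are rarely needed, each command below acts on every observation in one pass. The sketch uses the auto example dataset that ships with Stata; the new variable names are made up for the example.

```stata
sysuse auto, clear                    // example dataset shipped with Stata

* One command computes the new variable for every observation
generate price_k = price / 1000

* Conditional logic is applied across all observations as well
generate byte heavy = (weight > 3000)

summarize price_k heavy
```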
Loading Data into Memory
  • Stata reads the data into memory
    • set mem 100m (before you load the data)
  • You must have enough memory for your dataset
  • With large datasets:
    • drop unnecessary variables
    • Use the compress command (but don’t compress SCRSSN)
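One way to follow this advice is sketched below with a small made-up dataset. The variable names other than scrssn are hypothetical, and the identifiers are invented. Note that set mem belongs to older releases such as the Stata 9 used in these slides; from Stata 12 onward memory is managed automatically and the setting is ignored.

```stata
* set mem 100m                        // older Stata only; must come before loading data

clear
set obs 3
generate double scrssn = 100000000 + _n   // invented identifiers
generate double age    = 50 + _n
generate double unused = 0

drop unused                           // drop variables you do not need

* Compress everything except scrssn
ds scrssn, not                        // leaves the other variable names in r(varlist)
compress `r(varlist)'

describe                              // age is now a byte; scrssn is still a double
```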
Stata Abbreviations
  • Many Stata commands can be abbreviated, often to their first three letters
    • regress income education female

could be written

    • reg income education female
  • Variable names can also be abbreviated, as long as the abbreviation is unique
    • reg inc educ fem
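The same idea can be tried on the auto example dataset that ships with Stata. This is only a sketch; the three regress lines below are intended to be equivalent.

```stata
sysuse auto, clear

regress price mpg weight      // full command and variable names
reg price mpg weight          // abbreviated command
reg pri mpg wei               // abbreviated variable names (each must be unique)

su price                      // summarize can be shortened to su
```

Abbreviated variable names are convenient interactively but can make do-files harder to read; set varabbrev off disables them.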
Stata Help
  • Stata’s built-in help is great
    • help <command>
  • Stata manuals are great because they review theory
Stata and the Web
  • Stata is “web aware”
  • Check for updates periodically
    • update all
  • You can search for user-written programs
    • findit output
    • findit outreg (click to install)
Stata in Windows
  • Page up scrolls through the previous commands
  • There is a graphical user interface (menus) if you forget a command
  • We have Stata on rocky and tasha, but there it has no graphical capabilities, no menus, and some shortcuts are lost
Using Stata
  • Create batch files, called do-files (extension .do)
  • I work interactively
    • Run Stata and create do file as I go
    • I can then use the do file as needed
  • Debugging code and exploratory data analysis are very fast in Stata
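A do-file is just a text file of commands. The sketch below shows one possible layout; the file name example.do and the log name are hypothetical, and the auto dataset stands in for real data.

```stata
* example.do (hypothetical file name)
capture log close                     // close any log left open by an earlier run
log using example, replace text       // keep a plain-text record of the output

sysuse auto, clear
summarize price mpg
regress price mpg weight

log close
```

Typing do example in the Command window runs the whole file.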
Sysdir, ls and cd
  • Stata recognizes some Unix commands, such as ls and cd
  • sysdir lists Stata’s system directories

sysdir
   STATA:  C:\Program Files\Stata9\
 UPDATES:  C:\Program Files\Stata9\ado\updates\
    BASE:  C:\Program Files\Stata9\ado\base\
    SITE:  C:\Program Files\Stata9\ado\site\
    PLUS:  c:\ado\stbplus\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\

Delimiters
  • SAS recognizes “;” as a delimiter
  • Stata recognizes the carriage return
    • Always add a carriage return after your last command
  • You can change delimiters to ;

#delimit ;
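A short sketch of how the delimiter switch looks inside a do-file follows; #delimit works only in do-files, not when typed interactively. The auto dataset is used purely for illustration.

```stata
* Save these lines in a do-file and run it
#delimit ;

sysuse auto, clear ;

regress price
    mpg weight foreign ;          // one command spread over two lines

#delimit cr

summarize price                   // carriage return ends commands again
```

For an occasional long command, ending a line with /// continues it on the next line without changing the delimiter.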

Missing Data
  • Stata and SAS both use “.” as missing
  • Stata implicitly values a missing as a very large number, so a condition such as x>100 is also true when x is missing
  • SAS implicitly values a missing as a very small number
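The practical consequence in Stata can be checked with a tiny made-up dataset. The variable x and its values are invented for the example.

```stata
clear
input x
1
200
.
end

* The missing value counts as "greater than 100"
count if x > 100
assert r(N) == 2

* Excluding missing values explicitly gives the intended answer
count if x > 100 & !missing(x)
assert r(N) == 1
```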
Generating and Recoding Variables
  • In SAS you type

quality=0;
if VA=1 then quality=1;

  • In Stata you type

gen quality=0
recode quality 0=1 if VA==1

or, in place of the recode line,

replace quality=1 if VA==1

Boolean Logic
  • Stata is picky about Boolean logic

gen y=x if a==b (equality tests must use ==, not =)
gen y=x if a>b & b>10 (must use &)
gen y=x if a<=b (< or > must come before =)

Creating Dummy Variables
  • Goal: create a dummy variable for each DRG

gen drgnum1=drg==1

or

tab drg, gen(drgnum)

  • The second command automatically creates one dummy variable for each value of drg (drgnum1, drgnum2, and so on, numbered in order of the values)
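The two approaches differ in how they treat missing values, which the following sketch shows on invented data. The variable name is_drg1 is made up for the example.

```stata
clear
input drg
1
2
2
.
end

* Logical expression: 1 if drg is 1, otherwise 0 (including when drg is missing)
generate byte is_drg1 = (drg == 1)

* tabulate creates drgnum1 and drgnum2, and leaves them missing where drg is missing
tabulate drg, generate(drgnum)

list
```

Adding if !missing(drg) to the generate line makes the first approach leave missing values as missing too.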
Drop
  • drop <varnames> (drops variables)
  • drop if X==1 (drops observations where X equals 1)
egen Commands
  • You want to generate total costs for a medical center
  • In SAS this is done by proc summary
  • In Stata, you can type

collapse (sum) cost, by(sta3n)

or

sort sta3n
by sta3n: egen sumcost=total(cost)

  • collapse replaces the data in memory with one record per sta3n; egen keeps every record and adds the total as a new variable
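Both routes can be compared on a small invented dataset. The station numbers and costs below are made up; bysort combines the sort and by steps.

```stata
clear
input sta3n cost
402 100
402 250
523  75
end

* egen keeps all three records and attaches the station total to each
bysort sta3n: egen sumcost = total(cost)
list                                  // sumcost is 350, 350, 75

* collapse leaves one record per station
collapse (sum) cost, by(sta3n)
list                                  // 402 -> 350, 523 -> 75
```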

ICD-9 Codes
  • Stata has capabilities to handle ICD-9 diagnosis and procedure codes
  • You can
    • check to see if codes are valid
    • generate identifiers based on codes or ranges of codes
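These capabilities are provided by the icd9 command (icd9p for procedure codes). The sketch below assumes a string variable named dx1 holding a few diagnosis codes; the variable names and the choice of the 250 range are illustrative only.

```stata
clear
input str6 dx1
"250.00"
"401.9"
"428.0"
end

* Report whether dx1 contains validly formatted ICD-9 diagnosis codes
icd9 check dx1

* Flag records whose code falls in the 250 range
icd9 generate diabetes = dx1, range(250*)

list
```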
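Dates
  • Date handling is similar to SAS: both store a date as the number of days since January 1, 1960, and both provide functions such as mdy()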
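A few lines at the Stata prompt show how dates are stored and displayed. The dates themselves are arbitrary examples.

```stata
display mdy(1, 1, 1960)                      // 0: dates count days from January 1, 1960
display %td mdy(6, 15, 2007)                 // 15jun2007
display mdy(6, 15, 2007) - mdy(6, 1, 2007)   // 14 days between the two dates
```

Because dates are plain numbers, a length of stay is a subtraction, and a %td format controls how a date variable is shown.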
Combining Data
  • Merge
    • this automatically creates a variable called _merge
    • _merge==1 obs. from master data
    • _merge==2 obs. from only one using dataset
    • _merge==3 obs. from at least two datasets, master or using

merge scrssn admitday disday using data_y

(with this syntax, both datasets must already be sorted by scrssn admitday disday)

  • Append (stacking data)
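The meaning of _merge is easiest to see on toy data. The sketch below uses the merge 1:1 syntax introduced in Stata 11 rather than the older form shown on the slide. The variable names and values are invented, and the lines should be run together as a do-file because they rely on a temporary file.

```stata
* Using dataset: ids 1 and 2
clear
input id y
1 10
2 20
end
tempfile using_data
save `using_data'

* Master dataset: ids 2 and 3
clear
input id x
2 5
3 6
end

merge 1:1 id using `using_data'
list, sepby(_merge)
* id 3 is only in the master (_merge==1), id 1 only in the using data (_merge==2),
* and id 2 is in both (_merge==3)
```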
Explicit Subscripting
  • Identify the most recent encounter in an encounter database

gsort id -date (ascending sort by ID and reverse by date)
by id : gen n=_n (record counter from 1 to N per person)
by id : gen N=_N (total number of records per person)
gen select=n==1 (flags each person's most recent record)

Set, Clear and More
  • Set: sets system parameters
    • Need to set memory size to open a database

set mem 100m

  • Clear erases data from memory
  • When output is longer than one screen, Stata pauses and asks you to continue; set more off turns this off
Summarizing Data
  • sum <varlist>, d provides more details on each variable
  • tabstat provides summary info, including totals

. sum gender age educ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      gender |      4085    1.496206    .5000468          1          2
         age |      4085     64.5601    9.451724         50         94
        educ |      4085    4.398286    1.662883          1          9

Tabulating Data

. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,058       50.38       50.38
          2 |      2,027       49.62      100.00
------------+-----------------------------------
      Total |      4,085      100.00

. table gender

----------------------
   gender |      Freq.
----------+-----------
        1 |      2,058
        2 |      2,027
----------------------

Tabulating Data

. tab gender age
too many values
r(134);

(age has too many distinct values to be the column variable; put it in the rows instead)

. tab age gender

           |        gender
       age |         1          2 |     Total
-----------+----------------------+----------
        50 |        49         69 |       118
        51 |        72         71 |       143
        94 |         1          0 |         1
-----------+----------------------+----------
     Total |     2,058      2,027 |     4,085

Tabstat

. tabstat age, by(gender)

  gender |      mean
---------+----------
       1 |  64.77454
       2 |  64.34238
---------+----------
   Total |   64.5601
--------------------

. table gender, c(mean age)

-----------------------
   gender |  mean(age)
----------+------------
        1 |   64.77454
        2 |   64.34238
-----------------------
Graphing
  • Diagnostic graphics
  • Presenting results

Basic Analytical Functions
  • OLS (reg)
  • Logistic, probit, count data (e.g., CLAD)
  • Multinomials
  • GLM/HLM
  • Duration models
  • Semi and non-parametric models
Output

Linear regression                                      Number of obs =    1306
                                                       F( 21,  1284) =   10.88
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1398
                                                       Root MSE      =  90.367

------------------------------------------------------------------------------
             |               Robust
         wtp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ethn1 |   1.990048   8.742036     0.23   0.820    -15.16019    19.14029
       ethn2 |  -25.74654   11.69993    -2.20   0.028    -48.69961   -2.793467
       ethn3 |  -35.59552   11.98309    -2.97   0.003     -59.1041   -12.08694
       ethn4 |  -3.244168   11.16836    -0.29   0.771    -25.15441    18.66607
     english |  -11.44402   9.699576    -1.18   0.238    -30.47277    7.584741
      lifeus |   37.34419   13.86037     2.69   0.007     10.15274    64.53564
     age1999 |  -.6272524   .3097408    -2.03   0.043    -1.234906   -.0195987
      income |   .8068256   .1714309     4.71   0.000     .4705102    1.143141
      incmis |   14.07434   9.404149     1.50   0.135    -4.374848    32.52352
       _cons |   111.3607   24.13083     4.61   0.000     64.02051    158.7009
------------------------------------------------------------------------------

Outreg
  • Writes regression results to a delimited file
  • Delimited file can be read into Excel
  • Very flexible
  • Creates publishable tables