Advanced Stata Workshop

FHSS Research Support Center

Presentation Layout
  • Visualization and Graphing
  • Macros and Looping
  • Panel and Survey Data
  • Postestimation
Intro To Graphing In Stata

“graph” is often optional. So is “twoway” in this case.

Note: Nearly all graphing commands start with “graph”, and “twoway” is a large family of graphs.
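The slide's own commands are not in this transcript. The sketch below assumes Stata's bundled auto dataset rather than the workshop's data. All three commands draw the same scatter plot, which shows that both "graph" and "twoway" can be dropped for a scatter.

```stata
* Load the example dataset shipped with Stata (stand-in for the workshop data)
sysuse auto, clear

* All three lines draw the same scatter plot of mpg against weight
graph twoway scatter mpg weight
twoway scatter mpg weight
scatter mpg weight
```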

Creating Multiple Graphs with “by():”

Note that the value labels are displayed above the graphs, and the variable label is displayed in the bottom right-hand corner.
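Here is a minimal sketch of by(), again assuming the bundled auto dataset. Its foreign variable carries value labels (Domestic, Foreign), so each panel is titled with a label rather than with 0 or 1.

```stata
sysuse auto, clear
* One panel per value of foreign; panels are titled with the value labels
scatter mpg weight, by(foreign)
```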

Overlaying “twoway” graphs

The || tells Stata to put the second graph on top of the first one, so order matters! You don’t need to type “twoway” twice; it applies to both.

This is another way of writing the same command; it doesn’t matter which one you use.
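The two overlay notations can be compared side by side with a simple example, assuming the auto dataset and a linear fit line (neither is taken from the slide). Both commands draw the fitted line on top of the scatter.

```stata
sysuse auto, clear
* "||" separator: the fitted line is drawn on top of the scatter
twoway scatter mpg weight || lfit mpg weight
* Parenthesized form: the same graph
twoway (scatter mpg weight) (lfit mpg weight)
```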

"by()" statements with overlaid graphs

“qfitci” is a type of graph that plots the prediction line from a quadratic regression and adds a confidence interval. The “stdf” option specifies that the confidence interval be based on the standard error of the forecast rather than on the default standard error of the prediction (stdp).

stdf is an option of qfitci.

by(foreign) is an option of twoway.
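The command this slide describes can be sketched as follows, assuming the auto dataset. The stdf option sits with qfitci, while by(foreign), a twoway option, is written after the last plot and applies to the whole graph.

```stata
sysuse auto, clear
* stdf belongs to qfitci; by(foreign) applies to the whole twoway graph
twoway qfitci mpg weight, stdf || scatter mpg weight, by(foreign)
```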

"by()" statements with overlaid graphs

Another way of writing the previous command is:


This way is easier to read.

This way is easier to type.
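Here is a sketch of the parenthesized form of the qfitci overlay, again assuming the auto dataset. Each plot sits in its own parentheses, and the graph-wide by() option goes after the final comma.

```stata
sysuse auto, clear
* Each plot in its own parentheses; twoway options such as by() follow the last comma
twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)
```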

Graphs with Many Options and Overlays

You can make impressive graphs from code alone if you overlay graphs and specify options such as multiple axes, notes, titles and subtitles, axis titles and labels, and legends.

Code for Previous Graph

This may look scary, but it is actually fairly straightforward. See the accompanying do-file for an explanation of each component.
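The workshop's own code for this graph lives in its do-file and is not reproduced here. As a hypothetical stand-in using the auto dataset, the sketch below overlays two plots on separate y axes and adds titles, axis titles and labels, a note, and a legend. The /// continuation lines work only inside a do-file.

```stata
sysuse auto, clear
* Two overlaid scatters, each on its own y axis, plus titles, labels, note, legend
twoway (scatter mpg weight, yaxis(1))                  ///
       (scatter price weight, yaxis(2) msymbol(Th)),   ///
       title("Mileage and Price by Weight")            ///
       subtitle("1978 automobile data")                ///
       ytitle("Miles per gallon", axis(1))             ///
       ytitle("Price (USD)", axis(2))                  ///
       xtitle("Weight (lbs.)")                         ///
       xlabel(2000(1000)5000)                          ///
       note("Source: sysuse auto")                     ///
       legend(order(1 "Mileage" 2 "Price"))
```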

Using the Graph Editor

It is often easier to make changes in the graph editor than to specify all the options in code.

Let’s make graph 1 into graph 2 by using the graph editor tools.

Recording Edits in the Graph Editor

Before you start making changes, click the record button. After you are done, click it again, and save your changes as a recording so you can “play” them back later. We will save this recording as advanced_workshop_1.

Play Your Graph Recording

You can create a graph, open the graph editor, click the green play button, and then play back your recorded edits.

Or, you can play your edits right from the code:
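A minimal sketch of playing a recording from code follows. It assumes that the recording advanced_workshop_1 from the previous slide has been saved in your personal grec folder, and it uses the auto dataset as a stand-in. The play() option applies the recorded edits as the graph is drawn.

```stata
sysuse auto, clear
* Apply the edits recorded as advanced_workshop_1 while drawing the graph
* (assumes advanced_workshop_1.grec is in your personal grec folder)
scatter mpg weight, play(advanced_workshop_1)
```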

You can run your recorded edits on a graph of a different type, though in this case not all of your edits will make sense:
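Under the same assumptions, the recording can also be applied to a different type of graph. This sketch draws a histogram first and then applies the recording to the graph in memory with graph play. Edits that do not fit a histogram will have little or no sensible effect.

```stata
sysuse auto, clear
histogram mpg
* Apply the recorded edits to the graph currently in memory
graph play advanced_workshop_1
```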

You can also run all of your recorded edits on a different graph, and just change the title:

Storing and Moving Your Recordings

Graph recordings are stored as .grec files in your “personal” folder, under the “grec” folder. Type “personal” to see where this is; normally it is C:\ado\personal. So by default Stata should store your .grec files in C:\ado\personal\grec.

Unfortunately, if you are not faculty, you are probably running Stata on lab computers, and when those computers are re-imaged, the files in your grec folder are lost. To avoid this, store your recordings on a flash drive by clicking the Browse button when you save a recording.

The catch is that when you click the play button in the graph editor, your recording will not appear in the list, because it is not stored where Stata looks for recordings. Never fear: just click Browse and navigate to your .grec file. If you want your recording to be available directly from code, as in play(advanced_workshop_1), you will need to move it (at least temporarily) to the “grec” folder, or write the directory location in the code: play(E:\flashdrive\Graph Recordings\advanced_workshop_1)

Using Schemes in Graphing

Recordings are great if you are going to be making the same kind of graph a lot. But a recording for a scatter plot will hardly affect a histogram at all, and might even make it look terrible. If you want to change the look of all graphs that you make, you may want to make a scheme. Schemes are text files which tell Stata how to draw graphs.
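The sketch below shows how schemes are applied, assuming the auto dataset and two schemes that ship with Stata (s1mono and s1color). A scheme can be set for a single graph with the scheme() option or made the session default with set scheme.

```stata
sysuse auto, clear
* List the schemes available on this machine
graph query, schemes
* Use a scheme for one graph only
scatter mpg weight, scheme(s1mono)
* Make a scheme the default for the rest of the session
set scheme s1color
scatter mpg weight
```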

More on Schemes

Schemes are very powerful, because they let you implement a certain look without specifying a long series of options in every graph or running every graph through the graph editor. However, creating schemes is fairly time-consuming.

For more on creating your own schemes, see:


Manipulating Graphs: Memory vs. Disk
  • When you draw a graph, it is stored in memory, under the name Graph.
  • If you draw another graph, it replaces the previous one in memory, and is now called Graph.
  • If you want to have multiple graphs up at the same time, you can use the name option.
  • graph save moves your graph from memory to disk, saving it as a .gph file.
  • graph dir lists all graphs in memory and on disk (in the current directory).
  • graph drop drops a graph from memory. Graphs store a copy of the data they plot, so if the dataset is large, they can take up quite a bit of memory.
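The commands in the list above can be strung together as follows, assuming the auto dataset and the hypothetical graph names g1 and g2. The name() option keeps both graphs open, graph save writes one to disk, and graph drop frees the other.

```stata
sysuse auto, clear
* name() keeps both graphs open; replace overwrites an existing graph of that name
scatter mpg weight, name(g1, replace)
histogram mpg, name(g2, replace)
* List the graphs held in memory
graph dir, memory
* Save g1 from memory to disk as mpg_weight.gph
graph save g1 mpg_weight.gph, replace
* Free the memory used by g2
graph drop g2
* List graphs in memory and .gph files in the current directory
graph dir
```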
Manipulating Graphs: Demo

Graph manipulation commands are quite useful for exploratory analysis.

See do file for code.

More Example Graphs

Note: Annotated code is in the do file for all of these

Histogram, with overlaid normal distribution
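A one-line sketch follows, assuming the auto dataset rather than the workshop's data. The normal option overlays a normal density with the same mean and standard deviation as the variable.

```stata
sysuse auto, clear
* Histogram of mpg with a normal density overlaid
histogram mpg, normal
```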

More Example Graphs

Use graph bar to make bar graphs
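Here is a minimal graph bar sketch, assuming the auto dataset. It plots mean mileage for each category of foreign.

```stata
sysuse auto, clear
* Mean mpg for domestic and foreign cars
graph bar (mean) mpg, over(foreign)
```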

More Example Graphs

Use graph combine to combine 3 graphs into one:
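The sketch below assumes the auto dataset and three hypothetically named graphs (a, b, c). Each graph is drawn under a name, and graph combine then places the named graphs in one image.

```stata
sysuse auto, clear
scatter mpg weight, name(a, replace)
histogram price, name(b, replace)
graph box mpg, over(foreign) name(c, replace)
* Place the three named graphs in a single image
graph combine a b c
```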

More Example Graphs

graph matrix is a great alternative to a correlation matrix for investigating relationships between variables.
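A short sketch, assuming the auto dataset, draws every pairwise scatter among four variables. The half option keeps only the lower triangle.

```stata
sysuse auto, clear
* Scatterplot matrix; half draws only the lower triangle
graph matrix price mpg weight length, half
```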

More Example Graphs

Get data labels (called marker labels in Stata) from the values of another variable
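The sketch below assumes the auto dataset, in which the string variable make holds each car's name. The mlabel() option uses that variable's values as marker labels. The if qualifier simply keeps the plot readable.

```stata
sysuse auto, clear
* Label each point with the car's make (foreign cars only, to limit clutter)
scatter mpg weight if foreign == 1, mlabel(make)
```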

More Example Graphs

xtline, run on a panel dataset, can overlay one line for each value of the panel variable.
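The sketch below assumes StataCorp's nlswork example dataset, which requires internet access for webuse, and restricts the plot to five people so the lines stay distinguishable. The overlay option draws all panels in a single plot.

```stata
* nlswork is a StataCorp example panel dataset (webuse needs internet access)
webuse nlswork, clear
xtset idcode year
* One line per person, overlaid in a single plot
xtline ln_wage if idcode <= 5, overlay
```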

  • Macros come in two general types:
    • Globals
      • Exist until Stata is closed
    • Locals
      • Exist until the end of the do-file (or program) that created them
  • Other types of macros exist, but are rarely used
global vs. local

Creating the global

Creating the local

- References to locals have to be enclosed in single quotes: a left quote (`) before the name and a right quote (') after it

- References to globals have to begin with a $

End of the do file

The local no longer exists

Conversely, the global still exists
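The annotations above describe the creation and use of a global and a local; the sketch below plays the same roles. It assumes the auto dataset and hypothetical macro names, and it should be run as a do-file so that the local's scope ends with the file.

```stata
* A global persists until Stata closes; reference it with a $ prefix
global dataset "auto"
* A local exists only in this do-file; reference it as `name'
local myvars "mpg weight price"

sysuse $dataset, clear
summarize `myvars'

display "Dataset: $dataset"
display "Variables: `myvars'"
```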

When do we need “for” loops?
  • If a Stata program involves repetitive actions on a group of variables, files, or other items
  • Examples
    • Creating new variables
    • Recoding missing values on a list of variables
    • Merging multiple datasets
    • Labeling variables
Determining what macros already exist

The local we created

General macros automatically created by Stata

The global we created
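The output described above comes from macro list; the sketch below reproduces it with hypothetical macro names. The listing includes your globals, any macros Stata created automatically, and your locals, which appear with a leading underscore.

```stata
global myglobal "hello"
local mylocal "world"
* List all macros currently defined, with their contents
macro list
```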

  • Syntax of the foreach command

        foreach lname {in|of varlist} list {
            commands referring to `lname'
        }

  • The open brace must appear on the same line as the foreach;
  • Nothing may follow the open brace except, of course, comments; the first command to be executed must appear on a new line;
  • The close brace must appear on a line by itself
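A loop that follows these brace rules and covers two of the earlier use cases (creating new variables and labeling them) is sketched below. It assumes the auto dataset and hypothetical ln_ variable names.

```stata
sysuse auto, clear
* For each listed variable, create its natural log and label it
foreach v of varlist price mpg weight {
    generate ln_`v' = ln(`v')
    label variable ln_`v' "Log of `v'"
}
```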

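Differences between the -in- and -of varlist- forms of the -foreach- command

    • foreach i in variable1-variable5 {
          Stata commands
      }
        • The list is not expanded, so the loop runs only once, with `i' holding the single string “variable1-variable5”
    • foreach i of varlist variable1-variable5 {
          Stata commands
      }
        • The varlist is expanded, so the loop runs five times, once for each variable from variable1 through variable5 (assuming they are stored in that order in the dataset)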
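The difference can be checked directly. This sketch builds a small hypothetical dataset with variable1 through variable5, then counts how many times each form of the loop runs.

```stata
clear
set obs 3
* Create variable1 ... variable5 in order
forvalues j = 1/5 {
    generate variable`j' = `j'
}

local n_in = 0
foreach i in variable1-variable5 {
    display "`i'"
    local ++n_in
}

local n_of = 0
foreach i of varlist variable1-variable5 {
    display "`i'"
    local ++n_of
}

* n_in is 1 (one literal string); n_of is 5 (one pass per variable)
display "in: `n_in'  of varlist: `n_of'"
```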
Running Parallel lists with macros

Create a local called “1”

Create local called “2”

Create macro 3 = # of words in macro 1

Extracting word `i' from local “1”

Extracting word `i' from local “2”

Using the new locals in a display command with other text
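The steps annotated above can be sketched as follows. Here the locals the slide calls “1”, “2”, and “3” are renamed list1, list2, and n for readability, and the list contents are hypothetical. The loop pairs word i of one list with word i of the other.

```stata
* Two parallel lists (the slide's locals "1" and "2")
local list1 "cat dog cow"
local list2 "meow bark moo"
* Number of words in the first list (the slide's local "3")
local n : word count `list1'
forvalues i = 1/`n' {
    * Extract word i from each list
    local a : word `i' of `list1'
    local b : word `i' of `list2'
    display "The `a' says `b'"
}
```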


Creating a program in Stata

Program name

Command name

First command to be run when the program is implemented

Second command to be run when the program is implemented

Telling Stata that there are no more commands to be used as part of the program
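A minimal program matching the annotated parts (name, two commands, end) is sketched below; the name hello and its messages are hypothetical. Dropping any earlier definition first lets the do-file be rerun without an error.

```stata
* Drop any earlier version so the definition can be rerun
capture program drop hello
* "hello" becomes the new command's name
program define hello
    display "First command"
    display "Second command"
end

* Run the new command
hello
```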

Simple vs. Complex Sample
  • Many statistical techniques assume a simple random sample
  • Simple random sample: each element of the population has an equal probability of being sampled.
Complex Survey
  • Sampling weights
    • inverse of the probability of being sampled
    • represent the number of elements in the population that each sampled element stands for
  • Clustering
    • groups sampled together
    • primary sampling units (PSU) -- first level clusters
  • Stratification
    • groups of clusters: strata
    • strata sampled separately
  • States, Counties, Schools, Students

sample states in different regions

sample counties within each state

sample schools within each county

sample students from schools

  • svyset psu? [pweight=?], strata(?) fpc(?) || ssu?, fpc(?)

psu = primary sampling unit

ssu = second-stage sampling unit

pweight = probability (sampling) weight

fpc = finite population correction (the total number of sampling units in the stratum from which the units at that stage were drawn)

|| = next stage

SVYSET Examples
  • use
  • svyset county [pw=sampwgt], strata(state) fpc(ncounties) || school, fpc(nschools)
  • save highschool
  • use highschool
  • svyset
SVY Prefix Examples
  • svy: proportion race sex
  • svy: tab race sex, ci
  • svy: tab race sex, count ci
  • svy, subpop(if sex==1): mean weight height
  • svy, subpop(if sex==2): mean weight height, over(race)
  • svy: reg weight sex

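Note: subpop() is preferred over an “if” qualifier, because with subpop() Stata still uses all cases, and the full sampling design, when estimating standard errors; an “if” qualifier drops the other observations before estimation.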
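The sketch below sets the two approaches side by side. It assumes StataCorp's highschool example dataset (fetched with webuse, which needs internet access) and reuses the svyset design from the examples above. Only the first form keeps every case in the variance calculation.

```stata
* highschool is a StataCorp example dataset (webuse needs internet access)
webuse highschool, clear
svyset county [pw=sampwgt], strata(state) fpc(ncounties) || school, fpc(nschools)

* Preferred: the full design is used when estimating the standard errors
svy, subpop(if sex == 1): mean weight height

* Not recommended: observations outside the subpopulation are dropped first
svy: mean weight height if sex == 1
```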

Take-home Message
  • Ask what sampling design produced your data before running any analysis.
  • If complex survey data, consider svyset or multilevel modeling.
xtset: Declare Panel Data
  • xtset panelvar: specify the unit observed repeatedly
  • xtset panelvar timevar [, tsoptions]: also specify the time variable
  • xtset: display the current xtset settings
  • xtset, clear: clear the xtset settings


Statistics > Longitudinal/panel data > Setup and utilities > Declare dataset to be panel data

Time-Unit Options
  • [unitoptions] specify units of time

clocktime, daily, weekly, monthly, quarterly, halfyearly, yearly…

  • [deltaoption] specify duration between observations

delta(#), e.g., delta(2)

delta((exp)), e.g., delta((7*24))

delta(# units), e.g., delta(10 min) or delta(7 days)
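The delta() option can be illustrated with a small hypothetical panel: one unit observed at times 2, 4, ..., 20. Declaring delta(2) tells Stata that consecutive observations are two time units apart.

```stata
clear
set obs 10
generate id = 1
* Observations recorded every 2 time units: 2, 4, ..., 20
generate t = 2 * _n
xtset id t, delta(2)
```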

xtdescribe: Pattern of xt Data
  • xtdescribe [if] [in] [, options]


patterns(#), e.g., p(10): display at most 10 participation patterns

width(#), e.g., w(80): display the patterns in a width of 80 columns


Statistics > Longitudinal/panel data > Setup and utilities > Describe pattern of xt data

  • use
  • xtset
  • browse
  • xtdes, p(20)
  • xtsum hours
  • xttab race
  • xtreg ln_w grade age ttl_exp tenure south, mle
Generating variables with fitted values

  • After a regression, use the “predict newvar” syntax to create a new variable that contains the fitted values for each observation.
  • If the model is fitted only for a limited sample, add the qualifier “if e(sample)” to get predicted values for that sample only.
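The sketch below assumes the auto dataset and a model fitted on domestic cars only. Without a qualifier, predict fills in every observation with nonmissing regressors; with if e(sample), it fills in only the estimation sample.

```stata
sysuse auto, clear
* Fit the model on domestic cars only
regress price mpg weight if foreign == 0
* Fitted values for every observation with nonmissing regressors
predict yhat_all
* Fitted values only for the estimation sample
predict yhat_est if e(sample)
```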
Generating variables with residuals
  • After a regression, use the “predict newvar, r” syntax to create a new variable that contains the residuals for each observation.
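Here is a minimal sketch, assuming the auto dataset; the residuals option is spelled out in full. Each residual equals the observed price minus the fitted price.

```stata
sysuse auto, clear
regress price mpg weight
predict yhat
* Residuals: observed price minus fitted price
predict ehat, residuals
```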
Reformat and write regression tables to document files
  • The user-written outreg command can be used to reformat regression tables and write them to document files
  • Example
  • outreg has many options that let us customize the look of the output table.
  • margins can be useful for understanding regression results
  • Example:
  • In the above regression, the coefficient on weight is misleading on its own, because an increase in weight changes both weight and weight squared. So the total effect depends on the starting value of weight.
  • The following command sets the variables to their means and finds the derivative of expected price with respect to weight at that point.
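A sketch of this idea follows, assuming the auto dataset and a model with only weight and its square. Entering the quadratic with factor-variable notation (c.weight##c.weight) lets margins know that both terms change together, so dydx() returns the total derivative evaluated at the mean.

```stata
sysuse auto, clear
* c.weight##c.weight enters weight and weight squared as linked terms
regress price c.weight##c.weight
* Derivative of expected price with respect to weight, at the covariate means
margins, dydx(weight) atmeans
```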

  • The results from margins can often be hard to read, as in the following example.
  • The marginsplot command can be used to visualize the results and understand them better.
  • Example
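Here is a sketch of margins followed by marginsplot, assuming the auto dataset and the same quadratic model; the grid of weights is a hypothetical choice. Predicted prices are computed at seven weights and then plotted with confidence intervals.

```stata
sysuse auto, clear
regress price c.weight##c.weight
* Predicted price at weights of 2,000 to 5,000 lbs. in steps of 500
margins, at(weight = (2000(500)5000))
* Plot the predictions with their confidence intervals
marginsplot
```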
Using saved results
  • Stata stores the results of a command in various forms (scalars, strings, matrices, etc.). Such results are called returned results.
  • Returned results can be used in other computations in Stata.
  • Type ‘return list’ after running a command to see its returned results.
  • Example:
  • Returned results can be used in expressions to perform computations
  • Example: gen range = r(max) - r(min)
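The sketch below assumes the auto dataset. The r() results left by summarize are listed and then used both in a display expression and to create a variable.

```stata
sysuse auto, clear
summarize price
* Show everything summarize stored in r()
return list
* Use the stored results in expressions
display "Range of price: " r(max) - r(min)
generate range = r(max) - r(min)
```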
Using saved results contd…
  • Results are stored mainly as r-class or e-class results, depending on the command used
  • To see r-class results, type return list; to see e-class results, type ereturn list
  • Matrices in returned results can be used like regular matrices.
  • Example:
  • More advanced matrix computations can be done in Mata, a matrix language built into Stata.
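A sketch of working with e-class matrices follows, assuming the auto dataset. It copies e(b) and e(V) into ordinary matrices, pulls a standard error off the diagonal of V, and reads the same coefficient vector from Mata with st_matrix().

```stata
sysuse auto, clear
regress price mpg weight
ereturn list
* Copy the coefficient vector and variance matrix into ordinary matrices
matrix b = e(b)
matrix V = e(V)
matrix list b
* Standard error of the mpg coefficient from the diagonal of V
display sqrt(V[1,1])
* The same coefficient vector, viewed from Mata
mata: st_matrix("e(b)")
```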
Post estimation statistics
  • estat ic
  • Available only after commands that report the log likelihood
  • When comparing models fit to the same data, the one with the smaller AIC and BIC values is preferred
  • estat vce

- displays the variance-covariance matrix of the estimates
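Here is a sketch comparing two nested models, assuming the auto dataset and two hypothetical specifications. estat ic is run after each fit so that AIC and BIC can be compared, and estat vce shows the covariance matrix for the larger model.

```stata
sysuse auto, clear
regress price mpg
estat ic
regress price mpg weight
* Variance-covariance matrix of the coefficient estimates
estat vce
* Information criteria for the second model; compare with the first
estat ic
```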

  • Results can be stored in a Stata dataset using the ‘postfile’ command
  • This is useful when we have to run many regressions, for example in Monte Carlo simulations.
  • Let's consider an example from the Stata manual:

Suppose we want the means and variances from 10,000 randomly constructed 100-observation samples of data, stored in results.dta.

We could do that as follows (refer to the do file).
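The sketch below follows the structure of that task but is not taken from the workshop's do-file. It assumes uniform random data and a fixed seed. Each pass draws 100 observations and posts their mean and variance as one row of results.dta.

```stata
* Fix the random-number seed so the results can be reproduced
set seed 12345
tempname memhold
* Declare the results file: one observation per sample, variables mean and var
postfile `memhold' mean var using results, replace
forvalues i = 1/10000 {
    quietly drop _all
    quietly set obs 100
    quietly generate x = runiform()
    quietly summarize x
    * Write this sample's mean and variance as one observation
    post `memhold' (r(mean)) (r(Var))
}
postclose `memhold'
* Load the 10,000 stored results
use results, clear
summarize
```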