### Teaching with Stata

Peter A. Lachenbruch

&

Alan C. Acock

Oregon State University

peter.lachenbruch@oregonstate.edu

alan.acock@oregonstate.edu

First Course Requirement—Data Entry

- A first course should let students do the things I want them to be able to do:
- Enter and edit data: the data must come from a “want to know” topic
- Students can do a small survey to get data on topics of interest to them.
- Voter poll
- Attitudes toward diversity issues on campus
- Beliefs about regulating the internet
- Learn how to create a codebook; use codebook and codebook, compact
- Where possible use “real” data
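
A minimal sketch of the two codebook commands. It uses the auto dataset that ships with Stata as a stand-in for student survey data; that dataset choice is an assumption made for illustration.

```stata
* Load an example dataset shipped with Stata (stand-in for survey data)
sysuse auto, clear

* Full codebook entries for two selected variables
codebook price foreign

* Compact form: one summary line per variable in the dataset
codebook, compact
```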

First Course Requirement—Data Management

- Balance statistical content with proper data management content—hard decision
- Storing original dataset and creating a working dataset
- Keeping a record of every data modification they make using do-file
- Menu system is an aid
- Do-files are the requirement
- Missing values--distinguish types
- Variable names, labels, and value labels
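
The workflow above might look like the following do-file sketch. The file name auto_working, the variable heavy, the label names, and the meaning given to .a are all hypothetical; the auto dataset stands in for the original data.

```stata
* Hypothetical do-file: every modification is recorded here, not done by hand
sysuse auto, clear                 // stand-in for the original dataset
save auto_working, replace         // working copy; the original stays untouched

* Variable name, variable label, and value label
generate byte heavy = weight > 3000 if !missing(weight)
label variable heavy "Weight above 3,000 lbs"
label define yesno 0 "No" 1 "Yes"
label values heavy yesno

* Extended missing values (.a, .b, ...) distinguish types of missingness
* Here .a is an illustrative code for "refused"
replace rep78 = .a if missing(rep78)
label define repmiss .a "Refused"
label values rep78 repmiss

save auto_working, replace
```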

First Course Requirements—Data Management

- Transformations – log, sqrt, exp
- Logical editing – beware of logical transformations when missing values are present (gen y = x < 10 leads to “.” transforming to 0)
- Appending
- Append student generated datasets
- Merging
- Merging two waves of data
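
A small sketch of appending and of the missing-value trap in logical transformations. The data values and the variable names y_bad and y_ok are made up for illustration.

```stata
* A classmate's (hypothetical) dataset, saved to a temporary file
clear
input x
7
end
tempfile student2
save `student2'

* Own dataset, including one missing value
clear
input x
5
15
.
end

* Append the classmate's observations below our own
append using `student2'

* Stata treats missing as larger than any number, so x < 10 is false (0)
generate y_bad = x < 10
* Guarding with !missing() keeps missing values missing
generate y_ok  = x < 10 if !missing(x)
list
```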

First Course Requirements—Data Management

- Constructing Measures
- When to use egen newvar = rowtotal(var1 var2 var3)
- When to use egen newvar = rowmean(var1 var2 var3)
- When to use misschk command, what it does
- Suppose the variable category is 0 or 1
- If there are missing values in category, there is a difference between
- gen y = 1 if category
- gen y = 1 if (category==1)
- gen y = 1 if (category>0)
- The first and third will give scores of 1 for missing values, because Stata treats a missing value as nonzero and as larger than any number. The second leaves y missing for those observations. BEWARE
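
The following sketch runs the three forms side by side on a tiny made-up dataset, and contrasts rowtotal() with rowmean(). The variable names y1, y2, y3, sumscore, and meanscore are illustrative.

```stata
clear
input category var1 var2 var3
0 1 2 3
1 4 . 6
. . . .
end

generate y1 = 1 if category        // missing counts as true: y1 = 1
generate y2 = 1 if (category==1)   // missing is not 1: y2 stays missing
generate y3 = 1 if (category>0)    // missing is "larger" than 0: y3 = 1

* rowtotal() treats missing as 0; rowmean() averages the nonmissing values
egen sumscore  = rowtotal(var1 var2 var3)
egen meanscore = rowmean(var1 var2 var3)
list
```

In the last row, where every item is missing, rowtotal() returns 0 while rowmean() returns missing, which is one reason to check the missing-data pattern before building a scale.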

First Course Requirements—Data Management

- edit command, input, insheet (csv files), infile
- gen newvar = ln(oldvar)
- Rarely use replace oldvar = sqrt(oldvar) – only when correcting an error – don’t replace data
- merge ptid assessment using file, update (need for data to be sorted)
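
A sketch of a merge with the update option, written in the merge 1:1 syntax of Stata 11 and later (the slide uses the older form, which required sorted data). The patient data are made up; only the variable names ptid and assessment come from the slide.

```stata
* Using dataset: a value that fills a gap in the master file
clear
input ptid assessment score
1 1 12
end
tempfile updates
save `updates'

* Master dataset with one missing score
clear
input ptid assessment score
1 1 .
2 1 14
end

* update replaces missing master values with nonmissing using values
merge 1:1 ptid assessment using `updates', update
tabulate _merge
list

* Transform into a new variable instead of overwriting the original
generate lnscore = ln(score)
```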

First Course Requirement (2)

- Data presentation, numerical summary measures – summarize, detail; list; browse; edit; describe; codebook; codebook, compact
- Graphic presentation--bar chart, histogram, box plot seem minimum
- Probability computations – binomial, binomialtail, chi2, chi2tail, F, Ftail, normal – use of the inverse functions for these.

Examples

- summarize sp,detail; list sp; describe s*; codebook s*
- display binomial(10,3,0.1) for the cumulative probability, or display Binomial(10,3,.1) for the reverse cumulative. Note that disp 1-binomial(10,2,.1) gives the same result as the reverse cumulative (as does binomialtail(10,3,.1))
- display normal(1.2)
- gen y = invnormal(uniform())*5+20
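
The probability functions can be tried directly with display. This sketch uses runiform(), the current name of the uniform() generator; the seed and the sample size of 100 are arbitrary choices.

```stata
display binomial(10, 3, 0.1)          // P(X <= 3)
display binomialtail(10, 3, 0.1)      // P(X >= 3)
display 1 - binomial(10, 2, 0.1)      // the same upper tail, computed by hand
display normal(1.2)                   // P(Z <= 1.2)
display invnormal(0.975)              // inverse function: 97.5th percentile of Z

* Simulate 100 draws from a normal with mean 20 and sd 5
clear
set obs 100
set seed 12345
generate y = invnormal(runiform())*5 + 20
summarize y
```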

First Course Requirement (3)

- Confidence intervals
- Binomial – ci: ci variable, binomial
- Normal – ci: ci variable
- Poisson – ci: ci variable, poisson
- Percentiles –
- summarize,d
- centile price, c(10(10)90)

Examples

- cii 20 4
- cii 20 4, agresti
- Sometimes we want to use the Agresti formulation. The exact is usually preferable
- ci varname, level(99)
- summarize weakness, detail
- Can use su weakn,d (i.e. abbreviate commands, options and variables)
- centile weakness,c(20,40,60,80)
- Or centile weakness,c(20(20)80)
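
The commands above use the syntax current when the slides were written. As a sketch, assuming Stata 14.1 or later, where ci and cii take a subcommand, the same analyses might look like this on the auto dataset (the dataset is an illustrative stand-in):

```stata
sysuse auto, clear

* Immediate form: 20 trials, 4 successes
cii proportions 20 4              // exact binomial interval (the default)
cii proportions 20 4, agresti     // Agresti-Coull interval

* Intervals computed from variables
ci means price, level(99)         // normal-based, 99% level
ci proportions foreign            // binomial, for a 0/1 variable
ci means rep78, poisson           // Poisson, treating rep78 as a count

* Percentiles
summarize price, detail
centile price, centile(20(20)80)
```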

First Course Requirements (4)

- Hypothesis Testing:
- Normal r.v.s
- One sample (including paired data) – ttest
- Two sample - ttest
- K samples – ANOVA
- Binomial variables
- One sample – proportion
- Two samples – tabulate, chi2

Examples

- ttest sp = 120 [one-sample]
- ttest spmen = spfem [paired]
- ttest spmen = spfem, unpaired unequal welch
- ttest sp, by(sex) [unequal welch etc.]
- Also immediate form – see help
- anova sp agegrp

Examples

- bitest success = 0.8 [one-sample binomial]
- tabulate success group, chi2 row col
- prtest success, by(group) [two-sample binomial]
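
A runnable sketch of these tests on the auto dataset. The variable highmpg and the null values 20 and 0.3 are made up for illustration.

```stata
sysuse auto, clear

* Normal outcomes
ttest mpg == 20                        // one-sample
ttest mpg, by(foreign)                 // two-sample, equal variances
ttest mpg, by(foreign) unequal welch   // two-sample, Welch's approximation
anova mpg rep78                        // K samples

* Binomial outcomes
generate byte highmpg = mpg > 25 if !missing(mpg)
bitest highmpg == 0.3                  // one-sample binomial
tabulate highmpg foreign, chi2 row col // two samples, chi-squared test
prtest highmpg, by(foreign)            // two-sample test of proportions
```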

First Course Requirements (5)

- Hypothesis Testing (cont.)
- Power considerations – sampsi (or spreadsheet – nice exercise for some good ones)
- Nonparametric methods – signtest, signrank, ranksum
- Contingency tables – tabulate, epitab

Examples

- sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)
- ranksum sp, by(survive)
- signrank before = after
- When should we supplement Stata with other software, such as G*Power 3 (which is free and more flexible than sampsi) or PASS or nQuery Advisor?
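
As a sketch, assuming Stata 13 or later, the power command can take the place of sampsi for the sample-size calculation above. The nonparametric tests are shown on the bpwide and auto example datasets shipped with Stata, which are stand-ins for course data.

```stata
* Sample size for two means, 80% power, twice as many in group 2
power twomeans 132.86 127.44, sd1(15.34) sd2(18.23) power(0.8) nratio(2)

* Two independent samples
sysuse auto, clear
ranksum mpg, by(foreign)

* Paired data
sysuse bpwide, clear
signtest bp_before = bp_after
signrank bp_before = bp_after
```

The sample size from power twomeans may differ slightly from the sampsi result because the two commands do not use the same approximation.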

First Course Requirements (6)

- Simple linear regression – regress, rvfplot, other diagnostics
- Correlation – corr, spearman, ktau – I tend not to use corr because of the sensitivity to the normality assumption for tests and confidence intervals
- Only pwcorr, not corr, provides a test of significance

Examples

- regress mpg weight
- rvfplot
- Stata’s “type a little, get a little” very different from other packages
- correlate mpg weight or pwcorr mpg weight (pwcorr is especially useful when you have more than 2 variables; you can specify sig and obs, but note that these options only work with pwcorr)
- spearman mpg weight – would be nice to have Stata produce a Spearman correlation matrix
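
A short sketch of these commands on the auto dataset. It assumes a recent Stata release, in which spearman accepts more than two variables and displays a matrix of rank correlations.

```stata
sysuse auto, clear

* Simple linear regression and a residual-versus-fitted plot
regress mpg weight
rvfplot, yline(0)

* Pairwise correlations with p-values and the number of observations
pwcorr mpg weight length, sig obs

* Rank-based alternatives
spearman mpg weight length, stats(rho p)
ktau mpg weight
```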

Examples

- It’s easy to use permutation tests

    . permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
    (running ttest on estimation sample)

    Monte Carlo permutation results                 Number of obs     =         97

          command:  ttest ald7, by(anyhcq)
                t:  r(t)
      permute var:  anyhcq

    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               t |   1.648305      13     100  0.1300  0.0336   .071073   .2120407
    ------------------------------------------------------------------------------
    Note: confidence interval is with respect to p=c/n.
    Note: c = #{|T| >= |T(obs)|}

- One can do similar things with the bootstrap
- These are easy to use and intuitive for students
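
A sketch of both resampling approaches on the auto dataset, which stands in for the study data shown above. The number of replications and the seed are arbitrary choices.

```stata
sysuse auto, clear

* Permutation test: shuffle group membership and recompute the t statistic
permute foreign t=r(t), reps(100) seed(12345): ttest mpg, by(foreign)

* Bootstrap: resample observations and recompute the difference in means
bootstrap diff=(r(mu_1)-r(mu_2)), reps(200) seed(12345): ttest mpg, by(foreign)
```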

Use of Stata in the Classroom

- Use Stata sparingly
- It’s not easy to follow commands typed or used from menus – students will get confused
- Have handouts of what you do – make spacing large enough that students can annotate – even if only to write nasty things about the instructor
- Balancing coverage of Stata, e.g. data management with coverage of Statistics is a constant issue
- Remember – it’s a course in statistics, not in Stata

Data Sets

- Place data sets on a LAN or common drive or available for copying to flash drive or CD
- Use real data
- Not too many variables
- May have missing values – but should not affect main analyses – unless you want to demonstrate the problems with missing values

In the Classroom

- Using CD rather than flash drive is better(?)
- Many desktops have USB port located inconveniently (darn you Dell!)
- Sometimes newer PCs have USB port on monitor, and laptops usually have an easy slot for the flash drive
- Light level in the room should allow students to read easily
- Days of dim projectors are over

In the Classroom (2)

- Enlarge the Stata font by using right mouse button
- I have found that 14 point is pretty good
- Be careful about wraparound of output – if needed, reduce point size temporarily
- Don’t ever use red on blue font
- See what I mean? It’s more difficult to read
- Show how to move and fix windows

In the Classroom (3)

- Optimizing visibility with projector
- Use rich color background
- Edit > Preferences > General Preferences. The blue background option is good, but it relies on red for errors and green for standard text, and it doesn’t bold fonts.
- Custom may be better because you can make fonts bold and pick colors that do not disadvantage students who are colorblind.

Virtual Lab

- A server supporting 30 simultaneous sessions of Stata is remarkably inexpensive.
- A department can require students to have laptops or provide a cart with enough laptops
- Because laptops are really “dumb” terminals with server, the laptops can be cheap and not updated very often
- Any room becomes a lab
- Students should have 24/7 access to the server

Handouts and Data Sets

- Have handouts of your lecture notes
- Have handouts of your data analysis demonstrations
- Include commands as well as output!
- Data sets
- Online, on a LAN, on CD, or on floppy disk. Many laptops no longer have floppy drives, and flash drives are inexpensive
- Include
- Student generated datasets
- Datasets with large Ns and relatively few variables

Emphasis in Course

- Lectures devoted to statistics
- Labs devoted to learning Stata, working on homework, and discussion
- Proper printing of output
- Don’t split output between two pages if possible (at least, find a good break point)
- Always use a monospaced font (such as Courier New)

Some Final Issues

- Multiple testing can distort inference (e.g. running 100 tests at the 5% level will almost certainly produce some significant results even when every null hypothesis is true, and they may be meaningless) – Worry about this
- Controlling the digits in the output. Use outreg, estout, esttab
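
The packages named above are user-written and must be installed separately. As a sketch of built-in alternatives (a swapped-in technique, not those packages), assuming Stata 11.1 or later for the cformat() and pformat() options:

```stata
sysuse auto, clear

* Control the digits shown for coefficients and p-values
regress mpg weight length, cformat(%9.2f) pformat(%5.3f)

* Format a single displayed number
display %6.3f normal(1.2)

* One simple guard against multiple testing: Bonferroni-adjusted p-values
pwcorr mpg weight length price, sig bonferroni
```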

The End

