Getting Started with Stata

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Getting Started with Stata

2/11/2010

Tom Tomberlin

Nealia Khan

Learning Technologies Center

Harvard Graduate School of Education

- Overview of Stata
- Getting Started
- ‘Do’ files
- Basic data cleaning
- Basic data management
- Beginning analysis
- Special topics (time permitting)

Why use Stata?

- Availability
- Can self-program, or use menus
- Cutting-edge statistical methods (including user-defined functions)
- Publication-quality graphics

- A word about programming in and using Stata
- Stata is case sensitive, so Myvar is different from myvar
- All commands in Stata are lower-case
- “and” = &, “or” = |, “not” = !
- Assignment is “=”, testing for equality is “==”
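
A minimal sketch of these rules in action. The three-row dataset and the variable names (myvar, score, flag) are made up for illustration and are not part of the workshop data.

```stata
* Made-up data, for illustration only
clear
input myvar score
1 900
2 950
3 1000
end

* "=" assigns a value; "==" tests whether two values are equal
gen flag = (myvar == 2)

* & is "and", | is "or", ! is "not"
count if score > 900 & myvar != 1
count if score < 950 | myvar == 3
count if !(score > 950)

* Names are case sensitive: Myvar is a separate variable from myvar
gen Myvar = myvar * 10
describe
```

Typing `Count` or `GEN` instead of `count` or `gen` would fail, because command names are lower-case.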

- Opening Stata
- Opening Data:
- Stata-formatted data: “use” command
- Comma-separated values (CSV): “insheet using”
- Tab-delimited data: “insheet using”
- Flat files: create a dictionary
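
The commands above can be sketched as a small round trip. This is only an illustration: the dataset and the file names demo_sat.dta and demo_sat.csv are hypothetical, and the files are written to the current working directory. Note that from Stata 13 onward `insheet` is superseded by `import delimited`, although `insheet` still runs.

```stata
* Build a tiny made-up dataset (values are illustrative, not the workshop data)
clear
input str2 state csat expense
"AA" 900 5000
"BB" 1000 4500
"CC" 950 6000
end

* Save it in Stata format and as comma-separated text in the working directory
save "demo_sat.dta", replace
outsheet using "demo_sat.csv", comma replace

* Stata-formatted data: use
use "demo_sat.dta", clear

* Comma-separated data: insheet using, with the comma option
insheet using "demo_sat.csv", comma clear

* Tab-delimited data would use the tab option instead:
* insheet using "demo_sat.txt", tab clear
```

The `clear` option discards whatever data are already in memory before the new file is loaded.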

- Exercise 1:
- Open Stata
- Using the insheet command, open the comma-separated data file located at F:\workshops\SATdata.csv
- (HINT: all Stata commands must be written in lower case. Don’t forget to put pathnames in quotes!)

- Look at your data – did our data import correctly?
- How are our data measured?
- What kinds of variables do we have?

- How would we describe the distribution of our data?
- Graphs: histograms, scatterplots
- Charts/Tables: frequency tables, cross-tabs

- There are several ways to look at our data in Stata
- Editor
- Browser
- Stata commands: codebook, describe (abbreviated des)
- Tables of frequency and distribution
- Graphs of distribution

- Let’s look at how the variable ‘csat’ is distributed
- hist csat
- tab csat
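
As a rough sketch, the inspection commands might be run in sequence like this. The eight observations are invented so the example runs without the workshop file; only the variable names csat and expense come from the slides.

```stata
* Made-up stand-in for the SAT data
clear
input csat expense
900 5000
1000 4500
950 6000
880 5500
1020 4200
910 5800
990 4700
870 6100
end

* Variable names, storage types, and labels
describe

* Range, missing values, and summary of one variable
codebook csat

* Frequency table, then a histogram, of csat
tabulate csat
histogram csat
```

With a continuous variable such as csat, the frequency table has nearly one row per observation, which is why the histogram is usually the more readable view.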

What are do-files?

A ‘do’ file is essentially a script: a list of all the commands that you wish to run and the settings that you would like to apply.

- Why use them?
- Replication
- Collaboration
- Audit trail
- Help

- How to create and run one

- Creating and running a do-file

- EXERCISE 2: Create a simple do-file from the commands that you have already entered.
(HINT: you must clear the data in memory before opening a new dataset.)
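
A do-file might look like the sketch below. The file name intro.do and the inline data are hypothetical; in the workshop, the data would instead come from the insheet command used in Exercise 1.

```stata
* intro.do : a hypothetical do-file. Run it by typing:  do intro.do

* Settings: clear data in memory and stop Stata pausing at each screenful
clear
set more off

* Load data (made-up values standing in for the workshop file)
input csat expense
900 5000
1000 4500
950 6000
end

* Commands to replay every time the file is run
describe
summarize csat
tabulate csat
```

Because every step is written down, rerunning the file reproduces the same results, which is the replication and audit-trail benefit listed above.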

- Labeling
- To label a variable: label var varname "label text"
- To label values:
- label define labelname 1 "high" 0 "low"
- label val varname labelname

- Renaming
- ren varname1 varname2

- Recoding
- recode varname oldvalue=newvalue

- Generating a new variable
- gen newvarname=somevalue

- Replacing values of an already generated variable
- replace newvarname=somevalue

- Subsetting
- keep
- drop
- if
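
The cleaning and management commands above can be combined as in this sketch. All variable names (v1, grp, tmp, score, passed), the label name passlbl, and the cutoff of 10 are assumptions chosen for illustration.

```stata
* Made-up data with one missing score
clear
input id v1 grp tmp
1 12 1 0
2 7 2 0
3 . 1 0
4 15 3 0
end

* Label, then rename, a variable
label var v1 "Raw test score"
ren v1 score

* Recode group 3 into group 2
recode grp (3=2)

* Generate a new variable, then replace its values conditionally.
* Missing values count as larger than any number, so exclude them explicitly.
gen passed = 0
replace passed = 1 if score >= 10 & score < .
replace passed = . if score == .

* Attach value labels
label define passlbl 0 "fail" 1 "pass"
label val passed passlbl

* Subsetting: drop a variable, keep only observations with a score
drop tmp
keep if score < .
tabulate passed
```
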
- Merging
- merge
- must sort both files by the linkage variable!
- ex: merge linkage_var using "F:\workshops\newfile"
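
A hedged sketch of a one-to-one merge. It uses the `merge 1:1` syntax available from Stata 11 onward, which does not require pre-sorting; the form shown on the slide is the older syntax, which does. The file name demo_expense.dta and the linkage variable id are hypothetical.

```stata
* Second file: made-up spending data, saved to the working directory
clear
input id expense
1 5000
2 4500
3 6000
end
save "demo_expense.dta", replace

* First (master) file: made-up scores
clear
input id csat
1 900
2 1000
3 950
end

* Link the two files on id; sorting is harmless but not required here
sort id
merge 1:1 id using "demo_expense.dta"

* _merge records where each observation came from (3 = matched in both files)
tabulate _merge
```

Checking `_merge` after every merge is a quick way to spot observations that failed to match.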

- EXERCISE 3:
- Generate a dichotomous variable called hi_score from the csat variable, where 1 indicates a score greater than 922 and 0 indicates a score less than or equal to 922.
- Label its values as 0=low and 1=high.

- Univariate analysis
- summarize
- histogram
- table

- Bivariate analysis
- tabulate
- pwcorr
- ttest
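
The commands listed above might be used as follows. The data, the region variable, the indicator named high, and its cutoff of 940 are all invented for this sketch.

```stata
* Made-up data, for illustration only
clear
input csat expense region
900 5000 1
1000 4500 1
950 6000 2
880 5500 2
1020 4200 1
910 5800 2
990 4700 1
870 6100 2
end
gen high = (csat > 940)

* Univariate: summary statistics and a histogram
summarize csat, detail
histogram csat

* Bivariate: two-way table of two categorical variables
tabulate region high

* Bivariate: pairwise correlation with its significance level
pwcorr csat expense, sig

* Bivariate: compare mean expense between the two groups
ttest expense, by(high)
```
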

- EXERCISE 4:
- Generate a histogram of the expense variable
- Generate a two-way table to see whether the distribution of expense differs across the values of your newly created hi_score variable
- If you have time, see if there is a significant correlation between scores on SATs and the average amount of money that each state spends on education.

- Multivariate models
- Linear regression: regress depvar indepvar1 indepvar2 … indepvarN
- Logistic regression: logit depvar indepvar1 indepvar2 … indepvarN
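
A sketch of both models on simulated data. Everything here is an assumption for illustration: the variable names (x1, x2, y, d), the seed, and the coefficients used to generate y.

```stata
* Simulate 200 observations with a known linear relationship
clear
set seed 12345
set obs 200
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 - x2 + rnormal()

* A 0/1 outcome derived from y, for the logistic model
gen d = (y > 1)

* Look at the relationship before modelling it
twoway scatter y x1

* Continuous outcome: linear regression
regress y x1 x2

* Dichotomous outcome: logistic regression
logit d x1 x2
```

The general pattern is the same for both commands: the dependent variable comes first, followed by the independent variables.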

- Exercise 5:
- Generate two scatterplots: one to look at the relationship between expense and csat, one to look at expense and hi_score.
- Depending on your assessment of the relationship (linear or not), run the appropriate regression to test for the relative effect of expense on either csat or hi_score.

Questions?

Gutman Library, room 323a&b

http://www.isites.harvard.edu/icb/icb.do?keyword=ltc