
Labor Economics Exercise session #1: Artificial Data Generation

Labor Economics Exercise session #1: Artificial Data Generation. TA: Natalia Shestakova, October 2007. Overview: generating random variables; graphing; throwing seeds; generating random dummy variables from sample; drawing from multivariate distributions.


Presentation Transcript


  1. Labor Economics, Exercise session #1: Artificial Data Generation. TA: Natalia Shestakova, October 2007

  2. Overview • Generating random variables • Graphing • Throwing seeds • Generating random dummy variables from sample • Drawing from multivariate distributions • Loops and distribution of estimated coefficients

  3. Generating random variables-1
     Random-number functions:
     • uniform() returns uniformly distributed pseudorandom numbers on the interval [0,1). uniform() takes no arguments, but the parentheses must be typed.
     • invnormal(uniform()) returns normally distributed random numbers with mean 0 and standard deviation 1. invnorm(), used in the examples that follow, is the older name of the same function.
     Reminder:
     • Uniform distribution: in the discrete case, all values of a finite set of possible values are equally probable; in the continuous case, all intervals of the same length within the support are equally probable.
     • Normal distribution: a family of continuous probability distributions. Each member of the family is defined by two parameters, location and scale: the mean ("average") and the standard deviation ("variability"), respectively.

  4. Generating random variables-2
     Examples:
     500 draws from the uniform distribution on [0,1):
         set obs 500
         gen x1 = uniform()
     500 draws from the standard normal distribution (mean 0, variance 1):
         gen x2 = invnorm(uniform())
     500 draws from the normal distribution with mean 1 and standard deviation 4, i.e. N(1,16) in mean-variance notation (the multiplier is the standard deviation, not the variance):
         gen x3 = 1 + 4*invnorm(uniform())
     500 draws from the uniform distribution between 3 and 12:
         gen x4 = 3 + 9*uniform()
     500 observations of a variable that is a linear combination of other variables:
         gen z = 4 - 3*x4 + 8*x2
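     A quick way to check such draws is to compare sample moments with the theoretical ones. The sketch below is only illustrative: it assumes a Stata release that provides runiform() and rnormal(), the current counterparts of uniform() and invnorm(uniform()), and the seed value is an arbitrary choice.

```stata
clear
set seed 2007                      /* arbitrary seed, only for reproducibility */
set obs 500
gen x1 = runiform()                /* uniform on [0,1) */
gen x2 = rnormal()                 /* standard normal */
gen x3 = 1 + 4*rnormal()           /* mean 1, standard deviation 4 */
gen x4 = 3 + 9*runiform()          /* uniform between 3 and 12 */
gen z  = 4 - 3*x4 + 8*x2           /* linear combination of x4 and x2 */
summarize x1 x2 x3 x4 z            /* compare sample means and std. dev. with theory */
```

     In the summarize output, x3 should show a mean near 1 and a standard deviation near 4, and x4 a mean near 7.5 with all values between 3 and 12.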

  5. Graphing
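     The graphs from this slide are not preserved in the transcript. As an assumption about what could be plotted, the sketch below draws a histogram and a scatter plot of variables like those generated earlier; runiform() and rnormal() are the current counterparts of uniform() and invnorm(uniform()), and the seed is arbitrary.

```stata
clear
set seed 2007                      /* arbitrary seed, only for reproducibility */
set obs 500
gen x2 = rnormal()                 /* standard normal draws */
gen x4 = 3 + 9*runiform()          /* uniform between 3 and 12 */
gen z  = 4 - 3*x4 + 8*x2           /* linear combination */
histogram x2, normal               /* histogram with a normal density overlaid */
scatter z x4                       /* z against x4 */
```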

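  6. Throwing seeds
     => Setting the seed allows you to generate the same sample again at any time:
         set obs 500
         set seed 2
         gen z1 = invnorm(uniform())
         set seed 2
         gen z2 = invnorm(uniform())
         set seed 19840607
         gen z3 = invnorm(uniform())
         dotplot z1 z2 z3
     z1 and z2 are drawn with the same seed and are therefore identical; z3 is drawn with a different seed and differs from them.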
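     The effect of the seed can also be checked without a plot. A minimal sketch, assuming a Stata release that provides rnormal() (used here in place of invnorm(uniform())):

```stata
clear
set obs 500
set seed 2
gen z1 = rnormal()
set seed 2
gen z2 = rnormal()                 /* same seed as z1 */
set seed 19840607
gen z3 = rnormal()                 /* different seed */
assert z1 == z2                    /* same seed reproduces the same draws */
count if z1 != z3                  /* number of observations where z3 differs */
```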

  7. Generating random dummy variables from sample
     Task: generate a variable that indicates whether an individual smokes (smoke=1) or does not smoke (smoke=0).
     (a) for period 1, assume that (s)he smokes with probability 30%;
     (b) for each of the following 30 periods, there is a 65% chance that a smoker keeps smoking and a 5% chance that a non-smoker starts smoking.
     Solution:
     • Note that a variable uniformly distributed on [0,1) is less than 0.3 with probability 30%. Then:
         gen smoke = uniform()<.3
     • First, give every individual an ID (pid) and create observations for 30 years (they will be identical copies). Then, with the data sorted by pid, update smoke year by year for every ID, so that the probability of smoking is .65 after a smoking year and .05 otherwise:
         by pid: replace smoke=uniform()<(.05+.6*smoke[_n-1]) if _n>1
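     The steps above can be put together as follows. This is a sketch under stated assumptions: the number of individuals (100), the variable name year, and the seed are hypothetical, and runiform() is used in place of uniform().

```stata
clear
set seed 2007                      /* arbitrary seed, only for reproducibility */
set obs 100                        /* hypothetical number of individuals */
gen pid = _n                       /* individual ID */
gen smoke = runiform() < .3        /* first year: smokes with probability 30% */
expand 30                          /* 30 identical copies per individual */
bysort pid: gen year = _n          /* hypothetical year index 1..30 within pid */
/* replace runs observation by observation, so smoke[_n-1] is already updated */
bysort pid (year): replace smoke = runiform() < (.05 + .6*smoke[_n-1]) if _n > 1
tabstat smoke, by(year)            /* share of smokers in each year */
```

     Since the probability of smoking next year is .05 + .6 times the current share, the share of smokers should drift from about 30% toward .05/(1-.6) = 12.5% over the years.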

  8. Drawing from multivariate distributions
     Task: generate a number of variables that are correlated with each other (have a multivariate distribution)
     Solution:
     (a) drawnorm: draws a sample from a multivariate normal distribution with desired means and covariance matrix
         drawnorm x y, n(1000) means(m) corr(C)
     (b) corr2data: creates an artificial dataset with a specified correlation structure (the data are not a sample from an underlying population with the specified summary statistics; they reproduce those statistics exactly)
         corr2data x y, n(1000) means(m) corr(C)
     Note: the matrices m and C can be specified using the matrix command (abbreviated mat)
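     A minimal sketch of both commands, with the matrices defined first. The means (1 and 2), the correlation (.5), the variable names a and b, and the seed are hypothetical values chosen for illustration.

```stata
clear
set seed 2007                      /* arbitrary seed, only for reproducibility */
matrix m = (1, 2)                  /* hypothetical means of x and y */
matrix C = (1, .5 \ .5, 1)         /* hypothetical correlation matrix */
drawnorm x y, n(1000) means(m) corr(C)
correlate x y                      /* sample correlation: close to .5, not exact */
clear
corr2data a b, n(1000) means(m) corr(C)
correlate a b                      /* reproduces the specified correlation */
```

     Comparing the two correlate outputs illustrates the difference: drawnorm gives a random sample, so its correlation varies around .5, while corr2data matches it up to storage precision.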

  9. Loops and distribution of estimated coefficients
     Why use loops?
     -> a coefficient estimated from one randomly drawn sample is unlikely to coincide with the true one
     -> drawing many samples, estimating the coefficient of interest in each, and averaging these estimates brings the result closer to the true value
     How to use loops?
         set more off                          /* so we do not have to hit enter every time the regression runs */
         set obs 500                           /* one observation per repetition */
         gen b1 = .                            /* b1 will hold the estimated coefficients */
         local i = 1                           /* i is the counter in the following loop */
         while `i' <= 500 {                    /* start a loop of 500 repetitions */
             keep b1                           /* drop all other variables so they can be randomly generated again */
             /* generate random variables */
             /* regression */
             scalar d = _b[x1]                 /* store the estimated coefficient on x1 in a scalar */
             replace b1 = scalar(d) if _n == `i'   /* put the coefficient from the ith regression into the ith observation of b1 */
             local i = `i' + 1                 /* add 1 to the counter */
         }                                     /* end of the loop */
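     A complete, runnable sketch of such a loop. The data-generating process y = 1 + 2*x1 + e, the variable names e and y, and the seed are hypothetical choices made only for illustration; rnormal() is used in place of invnorm(uniform()).

```stata
clear
set more off
set seed 2007                          /* arbitrary seed, only for reproducibility */
set obs 500                            /* sample size and number of repetitions */
gen b1 = .                             /* holds one estimate per repetition */
local i = 1
while `i' <= 500 {
    keep b1                            /* drop the previous draws, keep the estimates */
    gen x1 = rnormal()                 /* regressor */
    gen e  = rnormal()                 /* error term */
    gen y  = 1 + 2*x1 + e              /* hypothetical model with true slope 2 */
    quietly regress y x1
    quietly replace b1 = _b[x1] if _n == `i'   /* store the ith estimate */
    local i = `i' + 1
}
summarize b1                           /* mean of the 500 estimates */
```

     The mean of b1 should be close to the true slope of 2, and histogram b1 would show how the individual estimates are spread around it.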

  10. Any questions???
