Statistical Methods for the Social Sciences
Rahul Mukherjee
REVIEW SESSION 01
TA: Marcio Cruz • email@example.com
Office hours • Wednesdays 09:00-11:00 • Rigot 10
Basic data management
Where do I find interesting data? GOOGLE!!! ;-)
• Some interesting links for MDEV and MIA students:
• MACRO (aggregate variables/different countries)
  • World Development Indicators (World Bank): http://data.worldbank.org/data-catalog/world-development-indicators
  • World Economic Outlook Database (IMF): http://www.imf.org/external/pubs/ft/weo/2011/02/weodata/index.aspx
  • Statistics Database (WTO): http://stat.wto.org/Home/WSDBHome.aspx
Basic data management
• By region:
  • US Economy, Federal Reserve Economic Data: http://research.stlouisfed.org/fred2/
  • European Union Economy, ECB statistics: http://sdw.ecb.europa.eu/
  • China, National Bureau of Statistics of China: http://www.stats.gov.cn/english/statisticaldata/
  • Mexico, Banco de México: http://www.banxico.org.mx/estadisticas/index.html
Basic data management
• MACRO DATA: You can find macro datasets for most countries on the webpages of their central banks and national statistics bureaus.
  • Central banks: http://www.bis.org/cbanks.htm
  • Official national bureaus of statistics
Basic data management
• MICRO (household surveys, firm-level data, etc.)
• Official national bureaus of statistics
  • http://www.census.gov/acs/www/data_documentation/public_use_microdata_sample/
  • http://epp.eurostat.ec.europa.eu/portal/page/portal/microdata/introduction
  • http://www.esds.ac.uk/international/access/micro.asp
• International organizations
  • http://microdata.worldbank.org/index.php/home
• Some blogs provide good links:
  • https://sites.google.com/site/medevecon/development-economics/devecondata/micro
  • http://openmicrodata.wordpress.com/
• Faculty webpages
  • http://dvn.iq.harvard.edu/dvn/dv/JAngrist
Basic Excel
2. How should I download this data? Let us start with an example using MACRO data (from WDI).
• .csv, .txt or .xls? What is the difference?
• How to manage this data in Excel?
• How to sort this data?
• How to do basic math operations in Excel?
• How to get basic descriptive statistics in Excel?
• How to generate a graph?
Statistical packages
Why should I manage data using a statistical package? It gives you more flexibility, and you can keep a record of what you did in your research!
Some examples of statistical packages (http://en.wikipedia.org/wiki/List_of_statistical_packages):
• SPSS: comprehensive statistics package
• EViews: for econometric analysis
• Stata: comprehensive statistics package
• SAS: comprehensive statistical package
• MATLAB: programming language with statistical features
• R: a free implementation of the S language
• S-PLUS: general statistics package
Basic STATA
3. Where can I find resources and tips for learning STATA? GOOGLE!!! ;-)
• Stata webpage, university webpages, etc.
• Resources for learning Stata: http://www.stata.com/links/resources1.html
• Stata Starter Kit, Learning Modules: http://www.ats.ucla.edu/stat/stata/sk/modules_sk.htm
• Getting Started in Data Analysis: http://dss.princeton.edu/training/
Basic STATA
This link provides some exercises from the course's textbook, Statistical Methods for the Social Sciences, 3rd edition, by Alan Agresti & Barbara Finlay: http://www.ats.ucla.edu/stat/examples/smss/default.htm
Textbook examples for Introduction to the Practice of Statistics by David Moore and George McCabe: http://www.ats.ucla.edu/stat/examples/mm/default.htm
• How to start in STATA?
• .do, .dta, .log files?
• USE .do FILES!!! Why? You can keep a record of everything you have done!
• If you need to manage data: use a .do file!
.do FILE
• How to use a .do file?
• Open STATA
• Open a new .do file editor
• Set memory (this can improve the performance of STATA), but it depends on the capacity of your computer. So, if it does not work, you should request less memory. (You don’t need to use this command.) ex: set memory 1200m
• Define the directory where you will work: cd "C:\Users\My Documents… "
See example: "rs01_example01.do"
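A minimal sketch of how such a .do file might begin. The file name rs01_sketch.do and the folder C:/stata_work are hypothetical placeholders, not the course's own example file. set memory is left out on purpose: Stata 12 and later manage memory automatically.

```stata
* rs01_sketch.do (hypothetical name, not the course's rs01_example01.do)
clear all              // start from an empty session
set more off           // do not pause the output at every screen
capture log close      // close any log left open by an earlier run

* Hypothetical working directory: replace it with your own folder
cd "C:/stata_work"

* Keep a record of every command and its output in a .log file
log using "rs01_sketch.log", replace text

* ... your data management commands go here ...

log close
```

Running the whole file again from the .do file editor reproduces every step, which is the reason for using .do files in the first place.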
Importing data to STATA
4. How to import data from Excel to STATA?
Importing data from Excel (source: http://www.stata.com/support/faqs/data/newexcel.html)
1. A rule to remember
Stata expects one matrix or table of data from one sheet, with at most one line of text at the start defining the contents of the columns.
2. How to get information from Excel into Stata
• Start Excel.
• Enter data in rows and columns or read in a previously saved file.
• Highlight the data of interest, and then select Edit and click Copy.
• Start Stata and open the Data Editor (type edit at the Stata dot prompt).
• Paste data into the editor by selecting Edit and clicking Paste.
You can do this (2), but it is better to avoid it! Why???
INSHEET COMMAND
THE BEST WAY TO IMPORT DATA FROM EXCEL!!!
3.1 insheet command
• Launch Excel and read in your Excel file.
• Save as a text file (tab delimited or comma delimited) by selecting File and clicking Save As. If the original filename is filename.xls, then save the file under the name filename.txt or filename.csv. (Use the Save as type list; specifying an extension such as .txt is not sufficient to produce a text file.)
• Quit Excel if you wish.
• Launch Stata if it is not already running. (If Stata is already running, then either save or clear your current data.)
• In Stata, type insheet using filename.ext, where filename.ext is the name of the file that you just saved in Excel. Give the complete filename, including the extension.
• In Stata, type compress.
• Save the data as a Stata dataset using the save command.
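The Stata part of these steps can be sketched in a .do file as follows, assuming the file was saved from Excel as a comma-delimited filename.csv in the current working directory (filename is the slide's placeholder, not a real file):

```stata
* filename.csv is the placeholder name used on the slide
insheet using "filename.csv", comma clear   // comma: comma-delimited file; clear: drop data in memory
describe                                    // check which variables came in as numeric or string
compress                                    // store each variable in the smallest type that fits
save "filename.dta", replace                // replace overwrites an existing filename.dta
```

For a tab-delimited .txt file, the tab option takes the place of comma. Newer versions of Stata also offer import delimited and import excel, which may be preferable if your version has them.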
Importing data to STATA: Common problems
5.1 Nonnumeric characters
• One cell containing a nonnumeric character, such as a letter, within a column of data is enough for Stata to make that variable a string variable.
5.2 Spaces
• What appear to be purely numeric data in Excel are often treated by Stata as string variables because they include spaces.
5.3 Cell formats
• Much formatting within Excel interferes with Stata's ability to interpret the data reasonably. Just before saving the data as a text file, make sure that all formatting is turned off, at least temporarily. You can do this by highlighting the entire spreadsheet, selecting Format, and then Cells, and clicking General.
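Problems 5.1 and 5.2 can also be repaired after the import. The sketch below types in a small toy dataset (the variable names income_raw and income are made up for illustration) and converts it with destring:

```stata
* Toy data typed in for illustration: spaces and one nonnumeric cell
clear
input str8 income_raw
"1 200"
"3 450"
"n/a"
"980"
end

describe income_raw     // the storage type is str8, so Stata treats it as a string

* ignore(" ") strips the embedded spaces before converting;
* force turns any cell that is still nonnumeric ("n/a") into missing
destring income_raw, generate(income) ignore(" ") force
list
```

Using generate() rather than replace keeps the original string variable, so you can check what was turned into missing.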
Importing data to STATA: Common problems
5.4 Variable names
• Stata limits variable names to 32 characters and does not allow within such names any characters that it uses as operators or delimiters. Also, variable names should start with a letter.
5.5 Missing rows and columns
• Completely empty rows in a spreadsheet are ignored by Stata, but completely empty columns are not. A completely empty column gets read in as a variable with missing values for every observation.
5.6 Leading zeros
• With integer-like codes, such as ICD-9 codes or U.S. Social Security numbers, that do not contain a dash, leading zeros will get dropped when pasted into Stata from Excel. One solution is to flag within the first line that the variable is string: add a nonnumeric character in Excel on that line, and then remove it in Stata.
5.7 Filename and folder
• Confirm the filename and location of the file you are trying to read. Use Explorer or its equivalent to check.
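For 5.6, an alternative to the Excel trick is to rebuild the code inside Stata. This is only a sketch: it assumes the codes already arrived as numbers and that every code has a known fixed width (9 digits here); id_num and id_code are made-up names.

```stata
* Toy data: codes that lost their leading zeros during the import
clear
input long id_num
1234567
45678901
end

* Rebuild a 9-character string code, padding with zeros on the left
gen str9 id_code = string(id_num, "%09.0f")
list
```

This only works when the original width is known; if codes can have different lengths, the lost zeros cannot be recovered this way.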
STATA - data types
• Numeric variables
• String variables
• What is a ‘STRING’ variable? How to deal with them?
Some basic commands
• Summary: sum
• Conditions: if, &, |
• Sort variables: sort
• Order variables: order
• Generate variables: gen var
• Drop variables (columns): drop
• Drop rows: drop in
• Concatenate variables: concat() (used with egen)
• Destring variables: destring var, replace
• Generate numerical variables from string variables: tab var, gen(newvar)
• Basic math operations: /; *; -; + or rsum(var1 var2 … varn) (used with egen)
• Replace: replace var
• Collapse: collapse (sum) var, by(var); see help collapse
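A short sketch that runs through most of these commands on auto, an example dataset that ships with Stata. The new variable names (price_per_lb, make_origin, size_sum) are made up for illustration, and rowtotal() is used as the current name of the egen row-sum function listed above as rsum().

```stata
sysuse auto, clear                            // example dataset shipped with Stata

sum price mpg                                 // summary statistics
sum price if foreign == 1 & mpg > 25          // conditions with if and &
sort price                                    // sort observations by price
order make foreign price                      // move these variables to the front
gen price_per_lb = price / weight             // generate a new variable
replace price_per_lb = . if weight < 2000     // replace values under a condition
drop trunk turn                               // drop variables (columns)
drop in 1/5                                   // drop the first five observations (rows)
egen make_origin = concat(make foreign), punct(" - ")   // concatenate variables
egen size_sum = rowtotal(length headroom)     // row sum of two variables
collapse (sum) price, by(foreign)             // one row per group with total price
list
```

Note that collapse replaces the data in memory with the aggregated table, so save your dataset first if you still need the original observations.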
Linking with class notes…
How to generate a quantitative variable from a categorical variable?
For example: favorite music type (rock, jazz, folk, classical)
Command in STATA: tab var, gen(name of the new variables; for example: music)
tab var, gen(music)
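A minimal sketch of this command on toy data. The five answers below are typed in for illustration and music_type is a made-up variable name:

```stata
* Toy data: favorite music type of five hypothetical respondents
clear
input str10 music_type
"rock"
"jazz"
"folk"
"classical"
"rock"
end

tab music_type, gen(music)             // creates music1 to music4, one 0/1 indicator per category
describe music*                        // the variable labels show which category each indicator stands for
encode music_type, gen(music_code)     // alternative: one labeled numeric variable
```

tab with gen() gives one indicator (dummy) variable per category, while encode gives a single numeric variable with value labels; which one you need depends on the analysis.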
EXERCISE
The slide on page 30 of the first class notes is the following: www.stat.ufl.edu/~aa/social/data.html
EXERCISE
• Access this webpage (www.stat.ufl.edu/~aa/social/data.html) and follow this procedure:
• Download the data in Excel;
• Plot a graph showing the age of students (on the x-axis) and the time they spend on TV (on the y-axis);
• Plot a pie graph showing the number of males and females;
• Save this data as .csv;
• Transfer this data to STATA;
• Identify which variables are numerical and which ones are string;
• Plot a graph showing the age of students (on the x-axis) and the time they spend on TV (on the y-axis);
• Plot a pie graph showing the number of males and females;
• How many of these students are:
  • D = Democrat, R = Republican, I = Independent?
• Generate a variable called average_gpa that is:
  • average_gpa = (high school GPA (on a four-point scale) + college GPA)/2
I have a problem in STATA…
• If you have any doubt about how to use a specific procedure in STATA, how should you deal with it?
• 1. Google!!! ;-) …. If this doesn’t work:
• 2. Google!!! Try again, maybe you haven’t searched properly… but, if this doesn’t work:
• 3. Google!!! Try once more, just in case.
• 4. Command HELP in STATA.
• 5. Send your questions to statalist: http://www.stata.com/statalist/
• 6. Talk to your TA
• 7. Talk to your Professor
• You can talk to your TA whenever you want, but try at least the first 4 steps. This will be important for developing your skills with Stata! ;-)
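For step 4, two ways of asking Stata itself, sketched with destring and "leading zeros" as arbitrary examples:

```stata
help destring          // open the help file of a command you already know by name
search leading zeros   // keyword search when you do not know the command name
```

help needs the exact command name, while search looks for keywords, so it is the better starting point when you only know what you want to do.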