
Computing for Research I Spring 2014

Computing for Research I Spring 2014. Stata Graphics February 24. Primary Instructor: Elizabeth Garrett-Mayer. Basic syntax for commands. prefix: command varlist, options Examples: regress y x, level(90); by race: sum y x, detail; ttest y, by(x) unequal.


Presentation Transcript


  1. Computing for Research I, Spring 2014 • Stata Graphics • February 24 • Primary Instructor: Elizabeth Garrett-Mayer

  2. Basic syntax for commands • prefix: command varlist, options • Examples: • regress y x, level(90) • by race: sum y x, detail • ttest y, by(x) unequal
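The pattern above (prefix: command varlist, options) can be tried on any dataset. The lines below are a minimal sketch that assumes Stata's bundled auto dataset rather than the course data, so the variable names (mpg, weight, foreign) are stand-ins for y, x, and race.

```stata
* Assumption: the shipped auto dataset stands in for the course data
sysuse auto, clear

* command varlist, options
regress mpg weight, level(90)

* prefix: command varlist, options
* (bysort sorts by the grouping variable before running the command per group)
bysort foreign: summarize mpg weight, detail

* two-sample t test allowing unequal variances
ttest mpg, by(foreign) unequal
```

Here bysort is used in place of a plain by prefix so the example does not depend on the data already being sorted.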

  3. Stata Graphics • Maybe we can just end class now! • Check out these links: • http://www.ats.ucla.edu/stat/stata/library/GraphExamples/default.htm • http://www.ats.ucla.edu/stat/stata/topics/graphics.htm • http://data.princeton.edu/stata/graphics.html • http://www.stata.com/capabilities/graphics.html

  4. Basic univariate displays • Boxplots • Histograms • Density plots
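As a rough illustration of the three displays listed above, the sketch below again assumes the bundled auto dataset; with the ceramide data one would substitute c18 for mpg.

```stata
* Assumption: auto dataset and the variable mpg are placeholders
sysuse auto, clear

* boxplot of one variable
graph box mpg

* histogram with counts on the y axis instead of densities
histogram mpg, frequency

* kernel density estimate on its own
kdensity mpg

* histogram with a kernel density curve overlaid
histogram mpg, kdensity
```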

  5. Ceramide Data • Let’s look at the ceramide markers • What are their distributions? • Are there outliers? • Should we consider taking logs, or using % change? • Source: Results of a phase II trial of gemcitabine plus doxorubicin in patients with recurrent head and neck cancers: serum C₁₈-ceramide as a novel biomarker for monitoring response. Saddoughi SA, Garrett-Mayer E, Chaudhary U, O'Brien PE, Afrin LB, Day TA, Gillespie MB, Sharma AK, Wilhoit CS, Bostick R, Senkal CE, Hannun YA, Bielawski J, Simon GR, Shirai K, Ogretmen B. Clin Cancer Res. 2011 Sep 15;17(18):6097-105. Epub 2011 Jul 26.

  6. Histogram • hist c18

  7. Let’s make it prettier
     * prettier histograms
     hist c18, freq xaxis(1 2) ylabel(0(2)24) xlabel(20 "Twenty" 40 "Forty")
     hist c18, title("Histogram of C18 Ceramide") subtitle("PI: K. Shirai")
     hist c18, ytitle("number of patients") freq yline(0(10)20)
     hist c18, xaxis(1 2) xlabel(19.6 "mean" 11.9 "median", axis(2) grid)
     • Finding help on these can sometimes be tricky! e.g. help axis_choice_options

  8. Boxplots • graph box c18

  9. Boxplots
     graph box c18, by(cycle)
     graph box c18, over(cycle)
     tab cycle
     graph box c18 if cycle<7, over(cycle)
     sort patient cycle
     merge m:1 patient using "Ptdata.GemDox.dta"
     graph box c18 if cycle<7, over(cycle) over(gender)
     graph hbox c18, over(initial) capsize(5)

  10. graph hbox c18, over(initial) capsize(5)
      graph hbox c18, over(initial) medtype(marker) medmarker(msymbol(+) msize(large))
      graph hbox c18, over(initial) ytitle("C18")

  11. Labels • Sometimes xlabels cannot be applied (e.g. boxplots) • need to label your values • Example: cycle for boxplots • label define cycle 1 "cycle 1" 3 "cycle 3" 5 "cycle 5" 7 "cycle 7" • label values cycle cycle • graph box c18 if cycle<7, over(cycle) • (Hint: use this on the homework!)
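The same define-then-attach pattern works for any categorical variable. The sketch below assumes the bundled auto dataset; the label name repairlbl and the label text are made up for illustration.

```stata
* Assumption: auto dataset; repairlbl and its label text are hypothetical
sysuse auto, clear

* step 1: define a named set of value labels
label define repairlbl 1 "poor" 2 "fair" 3 "average" 4 "good" 5 "excellent"

* step 2: attach that label set to the variable
label values rep78 repairlbl

* the boxplot now shows the label text for each group
graph box mpg, over(rep78)
```

The label set and the variable do not need to share a name; the slide reuses "cycle" for both, which is allowed but can be confusing at first.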

  12. Dotplot • Excellent way to show data across groups when you have a relatively small dataset • dotplot y, over(group)
      dotplot c18, over(cycle)
      dotplot c18, over(gender)
      dotplot c18, over(gender) nogroup
      dotplot c18, over(gender) nogroup jitter(3)
      dotplot c18, over(gender) nogroup median center

  13. Dotplot, by gender

  14. Scatterplots • Two way graph • Syntax: • graph twoway scatter y1 y2 x (the last variable listed goes on the x axis; all earlier variables are plotted against it) • graph twoway scatter y x • Example: • graph twoway scatter c18 totalceramide
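To see how twoway scatter reads its variable list, one can compare a two-variable and a three-variable call. This is a small sketch assuming the bundled auto dataset; the variable choices are arbitrary.

```stata
* Assumption: auto dataset; mpg, trunk, and weight are arbitrary choices
sysuse auto, clear

* one y variable: mpg on the y axis, weight on the x axis
graph twoway scatter mpg weight

* two y variables: mpg and trunk are both plotted against weight,
* because the last variable in the list is always the x variable
graph twoway scatter mpg trunk weight
```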

  15. Regression example • Scatterplot • Residual plots • Leverage • Fitted line with raw data

  16. Code
      * generate new variable
      gen logc18=log(c18)
      scatter logc18 totalcer
      scatter logc18 totalcer, mlabel(gender)
      help marker_label_options
      scatter logc18 totalcer, mlabel(gender) mlabp(0)
      scatter logc18 totalcer, mlabel(gender) mlabp(0) s(i)
      scatter logc18 totalcer, s(Oh)
      * redo regression
      regress logc18 totalcer
      rvfplot, yline(0)
      lvr2plot
      predict logfit
      * make plot of fitted model and raw data
      scatter logfit logc18 totalcer
      scatter logfit logc18 totalcer, s(i o) c(l .)
      graph twoway scatter logfit totalcer, s(i) c(l) || scatter logc18 totalcer, s(o) c(.)
      graph twoway scatter c18 totalcer
      regress c18 totalcer
      * residual plot
      * (residual vs. fitted)
      rvfplot
      * the long way
      * 1. generate a new variable from the regression, residuals
      predict resid, res
      * 2. generate a new variable from the regression, fitted values
      predict fit
      scatter resid fit, yline(0)
      * leverage vs. residual plot
      lvr2plot
      * take transform of C18?
      gladder c18
      boxcox c18

  17. The next graph to create

  18. Fancier way to put regression lines
      * data is described at http://data.princeton.edu/wws509/datasets/
      infile str14 country setting effort change ///
          using http://data.princeton.edu/wws509/datasets/effort.raw
      graph twoway scatter change setting
      graph twoway (scatter change setting) (lfit change setting)
      graph twoway (scatter change setting) (qfit change setting)
      graph twoway (scatter change setting) (lfitci change setting)
      • /// is the “continuation” comment: the command continues on the next line
      • scatter makes a scatterplot of the two variables
      • lfit plots the regression line of y on x
      • qfit plots a fitted quadratic model of y on x
      • lfitci plots the line AND a confidence interval!
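If the Princeton file is not reachable, the same overlays can be tried offline. The sketch below assumes the bundled auto dataset, with mpg and weight standing in for change and setting.

```stata
* Assumption: auto dataset replaces the effort data from the slide
sysuse auto, clear

* scatterplot with a fitted straight line
graph twoway (scatter mpg weight) (lfit mpg weight)

* scatterplot with a fitted quadratic
graph twoway (scatter mpg weight) (qfit mpg weight)

* fitted line with a confidence band; the band is drawn first
* so that the shaded area does not cover the points
graph twoway (lfitci mpg weight) ///
    (scatter mpg weight), ///
    ytitle("Miles per gallon")
```

Each parenthesized piece is its own plot layered onto the same axes, and layers are drawn in the order listed.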

  19. Fancier way to put regression lines • Plot using qfit • Plot using lfitci

  20. graph twoway (lfitci change setting) (scatter change setting, mlabel(country) ) • One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua). • We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option. • We create a variable to hold the position set by default to 3 o'clock and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock).

  21. gen pos=3
      replace pos = 11 if country == "TrinidadTobago"
      replace pos = 9 if country == "CostaRica"
      replace pos = 2 if country == "Panama" | country == "Nicaragua"
      graph twoway (lfitci change setting) ///
          (scatter change setting, mlabel(country) mlabv(pos))
      * see marker_label_options in help

  22. Legends
      graph twoway (lfitci change setting) ///
          (scatter change setting, mlabel(country) mlabv(pos)) ///
          , title("Fertility Decline by Social Setting") ///
          ytitle("Fertility Decline") ///
          legend(pos(5) order(2 "linear fit" 1 "95% CI"))
      graph twoway (lfitci change setting) ///
          (scatter change setting, mlabel(country) mlabv(pos)) ///
          , title("Fertility Decline by Social Setting") ///
          ytitle("Fertility Decline") ///
          legend(off)
      * see help title_options for pos and ring in legend

  23. Spaghetti plots • Command available from UCLA: spagplot
      * spaghetti plots
      clear
      insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv"
      findit spagplot
      spagplot c18 cycle, id(patient)
      spagplot c18 cycle, id(patient) nofit
      * remove patients who only have cycle=1
      sort patient cycle
      by patient: gen visit=_n
      egen maxvis=max(visit), by(patient)
      spagplot c18 cycle if maxvis>1, id(patient) nofit
      * or, use c(L)
      graph twoway scatter c18 cycle if maxvis>1, c(L)
      help connectstyle

  24. Other neat stuff • graph matrix • saving graphs: click and save as desired format • saving and combining (see Princeton site, section 3.3) • http://data.princeton.edu/stata/graphics.html • See GraphExamples on UCLA site: • http://www.ats.ucla.edu/stat/stata/library/GraphExamples/
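Saving and combining can also be done from a do-file rather than by clicking. The following is a minimal sketch assuming the bundled auto dataset; the graph names h1 and b1 and the output file name are hypothetical.

```stata
* Assumption: auto dataset; h1, b1, and the file name are placeholders
sysuse auto, clear

* scatterplot matrix (lower triangle only)
graph matrix mpg weight length, half

* keep two graphs in memory under names
histogram mpg, frequency name(h1, replace)
graph box mpg, over(foreign) name(b1, replace)

* place the two named graphs side by side in one figure
graph combine h1 b1, title("MPG summaries")

* write the current (combined) graph to disk
graph export "mpg_summaries.png", replace
```

Naming graphs with name() keeps them in memory only; graph export is what writes a file, and the format is taken from the file extension.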
