Biostat 201: Winter 2011


Lab Session 1

Week 1 and Week 2

Introduction
  • Wendy Shih, wendyshi@ucla.edu
  • Office Hours:
    • Tues 2-3pm or by appointment
    • A1-228 or Biostat Consulting Room (two doors to the left of the Lab)
Access to SAS/STATA
  • In the lab: login=sph, password=hello
  • one year SAS student license
    • Check with your department
    • www.softwarecentral.ucla.edu
  • Computers/laptops at UCLA library
  • TLC lab at Biomed library
  • STATA Only
    • shortcut.clicc.ucla.edu
Typical lab session
  • 4 assignments total
  • Brief (very brief!) overview of the assignment
  • Introduce statistical tools/methods that may be helpful with accompanying SAS/STATA code fragments
  • Further discussion (time permitting)
  • Go analyze!
Some additional notes
  • Both SAS and STATA code will be introduced, but you only need to know how to use one (so use whichever is most familiar to you)
  • Code will not be given to you in electronic format
  • Might want to bring a USB drive or have a way to save your documents
  • No raw outputs from SAS or STATA. All submitted results must be formatted.
Please Do NOT Paste Raw Outputs

. tabstat dage, by(grad) stat(n mean semean min max)

Summary for variables: dage
     by categories of: grad (Center Grade)

    grad |         N      mean  se(mean)       min       max
---------+--------------------------------------------------
excellen |        36  29.13889  1.993702        18        68
    good |        36  30.27778  1.581446        18        60
    fair |        36  37.13889  1.792911        18        55
    poor |        36  37.97222  1.853134        19        69
---------+--------------------------------------------------
   Total |       144  33.63194  .9552307        18        69
------------------------------------------------------------

The MEANS Procedure

Analysis Variable : dage

           N       N
grad     Obs    Miss          Mean     Std Error       Minimum       Maximum
-----------------------------------------------------------------------------
   1      36       0    29.1388889     1.9937015    18.0000000    68.0000000
   2      36       0    30.2777778     1.5814455    18.0000000    60.0000000
   3      36       0    37.1388889     1.7929105    18.0000000    55.0000000
   4      36       0    37.9722222     1.8531338    19.0000000    69.0000000
-----------------------------------------------------------------------------

Formatted Results

Table 1: Summary Statistics for Donor Age (Years) by Center Grades
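
One possible route from raw output to a formatted table in STATA is sketched below. It is only an illustration: the data are a hypothetical toy example (not the kidney dataset), and the new variable names n, mean_age, se_age, min_age and max_age are assumptions made for this sketch. Because of the /// line continuation, it should be run from a do-file rather than typed interactively.

```stata
* Hypothetical toy data: donor age (dage) and center grade (grad)
clear
input dage grad
25 1
31 1
40 1
36 2
44 2
52 2
end

* Same statistics as the raw output, but displayed with one decimal place
tabstat dage, by(grad) stat(n mean semean min max) format(%9.1f)

* Collapse to one row per grade so the numbers can be copied into a Word table
collapse (count) n=dage (mean) mean_age=dage (semean) se_age=dage ///
    (min) min_age=dage (max) max_age=dage, by(grad)
format mean_age se_age %9.1f
list, noobs clean
```

The collapsed dataset holds one row per center grade, which is usually easier to paste into a report table than the console output.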

The assignments
  • All four assignments are reports, not problem sets
    • Introduction
    • Methods
    • Results
  • Can be submitted via e-mail as a Microsoft Word file
  • E-mail: wendyshi@ucla.edu
    • Subject: Biostat 201 W10 hw# Last First
    • Filename: Biostat 201 W10 hw# Last First
    • ex: Biostat 201 hw1 Shih Wendy
Assignment grades
  • Graded on a 0.0 – 4.0 scale
    • 0.0 to 1.9: major errors / misunderstandings
    • 2.0 to 2.5: a few major or multiple minor errors
    • 2.6 to 3.0: a few minor errors
    • 3.1 to 3.5: good/excellent job
    • 3.6 to 4.0: very impressive!
Assignment expectations
  • Brief
    • 2.5-3.5 pages (with tables and figures), 12pt, double-spaced is often sufficient
  • Complete
    • Requested analyses were performed and properly interpreted
  • Logical
    • Has an easy-to-follow flow
    • Easy to see how the analyses guided each step of the investigation
    • No ambiguity on what you were thinking
Common pitfalls
  • Lack of explanation
    • Why are you doing what you are doing?
    • Example:
      • We run a multivariate linear regression. (why?)
      • We run a multivariate linear regression to evaluate the association between crime rate and depression while adjusting for socioeconomic factors. (ah, that’s better!)
Common pitfalls
  • Lack of interpretation
    • On what basis are you making your claims?
    • Example:
      • There is a significant difference between the SAT scores of UCLA and USC freshmen. (what makes you say this?)
      • The two-sample t-test result indicates that the SAT scores of UCLA and USC freshmen are statistically different (p=0.0032), with UCLA students having an average SAT score that is 220 points greater than USC students. (note: method used, measure used, statistical significance, magnitude, direction)
Common pitfalls
  • Lack of follow-up
    • How exactly did your findings guide you in your investigation?
    • Example:
      • A scatterplot of SAT score vs. GPA suggests a positive linear relationship among males, but a negative linear relationship among females. (How does this finding influence your analysis?)
      • A scatterplot of SAT score vs. GPA suggests a positive linear relationship among males, but a negative linear relationship among females. Therefore, the association between SAT score and GPA was evaluated separately for males and females.
Questions to ask yourself
  • What are you investigating?
  • What analytical method are you using to investigate it?
  • What do the results of that analysis tell you?
  • How do those results guide your subsequent analyses, or what conclusions do you draw from them?
SAS/STATA code key
  • I will use the following convention in these slides:
    • statements: bold
    • keywords: italics
    • options: underlined
    • Variables, or something you specify yourself: courier font
What do we need to do?
  • Import data
  • Summary statistics and plots
  • Choose and specify a model
  • Investigate if the model is appropriate
  • Predicted mean differences for covariate profiles
  • Conduct and interpret the model results
SAS: Importing data
  • http://www.ats.ucla.edu/stat/sas/faq/rwxls8.htm
  • http://www.ats.ucla.edu/stat/sas/faq/read_delim.htm
  • Can use the import wizard: File → Import Data…
  • proc import out=dataset
        datafile="directory_of_excel_file"
        dbms=excel replace;
        sheet="sheet_name";
    run;
SAS: Importing data
  • Example with a CSV file (the sheet statement applies only to Excel files, so it is omitted here):
  • proc import out=hdldata
        datafile="C:\SAS\data\hdltable.csv"
        dbms=csv replace;
    run;
STATA: Importing data
  • http://www.ats.ucla.edu/stat/stata/faq/readcommatab.htm
  • cd "directory_of_csv_file"
  • insheet using file_name
Example: Kidney Data

SAS

    proc import
        datafile="G:\TA - Biostat 201 Winter 2011\KIDNEY.csv"
        out=kidney
        dbms=csv
        replace;
    run;

STATA

    cd "G:\TA - Biostat 201 Winter 2011"
    insheet using "KIDNEY.csv"
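
In STATA 13 and later, insheet has been superseded by import delimited (insheet still runs). The sketch below shows the newer command on a hypothetical toy file; the file name toy_kidney.csv and the data values are assumptions made for illustration, not part of the lab materials.

```stata
* Build a tiny hypothetical dataset and save it as a CSV in the working directory
clear
input dage cith cens
30 12.5 0
45 20.0 1
52 18.3 0
end
export delimited using "toy_kidney.csv", replace

* Read the CSV back in; clear drops whatever data are currently in memory
import delimited using "toy_kidney.csv", clear
describe
list
```

For the lab data, the same call would simply point at KIDNEY.csv after the cd command shown above.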

SAS: Summary statistics
  • proc means data=dataset [options];
        var var1 var2 var3;
    run;
  • proc means data=dataset [options];
        class grpvar;
        var var1 var2 var3;
    run;
  • proc univariate data=dataset;
        var var1 var2 var3;
    run;
SAS: Summary statistics
  • proc means data=kidney nmiss mean stderr min max;
        var dage cith;
    run;
  • proc means data=kidney nmiss mean stderr min max;
        class grad;
        var dage cith;
    run;
  • proc univariate data=kidney;
        var dage cith;
    run;
  • proc univariate data=kidney;
        class grad;
        var dage cith;
    run;

STATA: Summary statistics
  • summarize var1 var2
  • bysort grpvar: summarize var1 var2
  • summarize var1 var2, detail
  • sum dage cith
  • sum dage cith, detail
  • bysort grad: sum dage cith, detail
SAS: Bivariate statistics (continuous variables)
  • proc ttest data=dataset;
        class grpvar;
        var var1 var2 var3;
    run;
  • proc npar1way data=dataset;
        class grpvar;
        var var1 var2 var3;
    run;
SAS: Bivariate statistics (continuous variables)
  • proc ttest data=kidney;
        class cens;
        var cith;
    run;
  • proc npar1way data=kidney;
        class cens;
        var cith;
    run;

STATA: Bivariate statistics (continuous variables)
  • ttest var1, by(grpvar)
  • kwallis var1, by(grpvar)
  • ttest cith, by(cens)
  • ttest cith, by(cens) unequal
  • kwallis cith, by(cens)
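
A report needs the method, magnitude, direction and p-value rather than the raw test output. The sketch below shows one way to pull those pieces out of ttest's stored results. The data are a hypothetical toy example, and the scalar names m0, m1, diff and pval are assumptions introduced only for this illustration.

```stata
* Hypothetical toy data: outcome cith in two groups defined by cens
clear
input cith cens
10 0
12 0
11 0
14 0
18 1
20 1
17 1
21 1
end

* Compare the two group variances before settling on the equal-variance test
sdtest cith, by(cens)

* Unequal-variance t-test; save the stored results before running anything else
ttest cith, by(cens) unequal
scalar m0   = r(mu_1)
scalar m1   = r(mu_2)
scalar diff = m0 - m1
scalar pval = r(p)

display "Mean, cens=0:  " %6.2f m0
display "Mean, cens=1:  " %6.2f m1
display "Difference:    " %6.2f diff
display "Two-sided p:   " %6.4f pval
```

The displayed numbers can then go straight into a sentence that names the test, the direction and size of the difference, and the p-value.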
SAS: Plots
  • proc gplot data=dataset;
        plot yvar * xvar = grpvar;
    run; quit;
  • proc gplot data=kidney;
        plot dage*cith=cens;
    run; quit;

STATA: Plots
  • twoway (scatter yvar xvar if grpvar==value, mcolor(color))
  • twoway (scatter dage cith if cens==0, ms(o) mcolor(red)) (lfit dage cith if cens==0, clcolor(red)) (scatter dage cith if cens==1, ms(o) mcolor(blue)) (lfit dage cith if cens==1, clcolor(blue)), legend(off)
Choose a model
  • Right now, we assume that this assignment is driving toward a linear regression model. Just know that this may not always be appropriate in real-world situations.
SAS: Linear model
  • proc reg data=dataset;
        model yvar = x1 x2 x3;
    run; quit;
  • proc reg data=kidney;
        model cith = cens dage;
    run; quit;

STATA: Linear model
  • regress yvar x1 x2 x3
  • regress cith cens dage
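
The task list includes checking whether the model is appropriate. A minimal sketch of some common residual checks in STATA follows; this is one possible set of diagnostics, not necessarily what the assignment requires. The data are hypothetical, and yhat and res are variable names chosen for this example.

```stata
* Hypothetical toy data with the same variable names as the kidney example
clear
input cith cens dage
12 0 25
15 0 32
14 1 40
19 1 47
21 0 55
24 1 61
18 0 38
22 1 50
end

regress cith cens dage

* Residuals vs. fitted values: look for curvature or a funnel shape
rvfplot, yline(0)

* Save fitted values and residuals for further checks
predict yhat, xb
predict res, residuals

* Normal quantile plot of the residuals
qnorm res
```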
SAS: Stratified model
  • proc sort data=dataset;
        by grpvar;
    run;
    proc reg data=dataset;
        model yvar = x1 x2 x3;
        by grpvar;
    run; quit;

You must SORT by the grouping variable before you run the stratified model.

SAS: Stratified model
  • proc sort data=kidney;
        by cens;
    run;
  • proc reg data=kidney;
        model cith = dage;
        by cens;
    run; quit;

STATA: Stratified model
  • bysort grpvar: regress yvar x1 x2
  • bysort cens: regress cith dage
SAS: Dummy encoded model
  • proc reg data=dataset;
        model yvar = x1 x2 x3 z1 z2;
    run; quit;
  • Note: “z” represents dummy-encoded variables
  • proc reg data=kidney;
        model cith = dage cens excel good fair;
    run; quit;
  • Here excel, good, and fair are the newly created dummy variables.

STATA: Dummy encoded model
  • regress yvar x1 x2 z1 z2
  • Note: “z” represents dummy-encoded variables
  • regress cith cens dage excel good fair
  • Here excel, good, and fair are the newly created dummy variables.
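
The slides use the dummy variables without showing how they are built. Below is a sketch of two ways to do it in STATA. It assumes grad is coded 1=excellent, 2=good, 3=fair, 4=poor, with "poor" as the reference category; that coding and the toy data are assumptions for illustration. The ib4.grad factor-variable notation requires STATA 11 or later.

```stata
* Hypothetical toy data; grad assumed coded 1=excellent, 2=good, 3=fair, 4=poor
clear
input cith dage cens grad
12 25 0 1
15 32 1 1
14 40 0 1
19 47 1 2
21 55 0 2
24 61 1 2
18 38 0 3
22 50 1 3
16 29 0 3
20 44 1 4
23 58 0 4
17 35 1 4
end

* Option 1: build the indicators by hand; grad==4 (poor) is the reference level
generate excel = (grad == 1) if !missing(grad)
generate good  = (grad == 2) if !missing(grad)
generate fair  = (grad == 3) if !missing(grad)
regress cith cens dage excel good fair

* Option 2: let STATA create the indicators, with level 4 as the base category
regress cith cens dage ib4.grad
```

Both regressions fit the same model, so either form is enough for the report; the hand-built indicators make the reference category explicit.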

SAS: Interaction model
  • data dataset;
        set dataset;
        intnvar = x1 * x2;
    run;
    proc reg data=dataset;
        model yvar = x1 x2 intnvar;
    run; quit;
SAS: Interaction model
  • data kidney;
        set kidney;
        d_c = dage*cens;
    run;
  • proc reg data=kidney;
        model cith = dage cens d_c;
    run; quit;

STATA: Interaction model
  • gen intnvar = x1 * x2
    regress yvar x1 x2 intnvar
  • gen d_c=dage*cens
    regress cith dage cens d_c
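
As an alternative to generating the product term by hand, STATA 11 and later accept factor-variable notation. The sketch below fits the model both ways on hypothetical toy data, so the two coefficient tables can be compared. It is an illustration, not the method used on the slides.

```stata
* Hypothetical toy data with the same variable names as the kidney example
clear
input cith cens dage
12 0 25
15 0 32
14 1 40
19 1 47
21 0 55
24 1 61
18 0 38
22 1 50
end

* Manual interaction term, as on the slide
generate d_c = dage*cens
regress cith dage cens d_c

* Factor-variable notation: ## adds both main effects and their interaction
regress cith c.dage##i.cens
```

The c. prefix marks dage as continuous and i. marks cens as categorical, so no new variable has to be created in the dataset.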

Predicted mean differences
  • Question: Observation 1 has “this” particular profile, and observation 2 has “that” particular profile. Is there a difference in their predicted mean response/outcome?
  • Example:
    • Obs1: 56 years old and censored
    • Obs2: 61 years old and censored
Predicted mean differences
  • Strategy
    • Add observations with the specified covariate profiles with the outcome missing
    • Run the linear regression model and request the predicted outcome with standard error of the prediction
    • Look at the results
SAS: Predicted mean differences
  • Add observations
  • data profiles;
        input dage cens;
        cards;
    56 0
    61 0
    ;
    run;

    data kidney;
        set kidney profiles;
    run;

SAS: Predicted mean differences
  • Analyze and request standard error of the prediction
  • proc reg data=kidney;
        model cith = dage cens;
        output out=kidney_new p=ypred stdp=yprese;
    run; quit;
  • Now if you open the “kidney_new” dataset, you can scroll down and view the predicted values and the standard error of the prediction
STATA: Predicted mean differences
  • Add observations
    • It’s probably easiest to do this using the data editor
    • Suppose our dataset has 144 observations:
  • set obs 146
    replace dage=56 in 145
    replace cens=0 in 145
    replace dage=61 in 146
    replace cens=0 in 146

STATA: Predicted mean differences
  • Analyze and request the standard error of the prediction
    • regress cith cens dage
    • predict ypred
    • predict yprese, stdp
  • Now if you open the data browser, you can scroll down and view the predicted values and the standard error of the prediction
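
The two predictions come from the same fitted coefficients, so their standard errors cannot simply be combined to test the difference. One alternative, sketched here with STATA's lincom on hypothetical toy data, estimates the difference directly. This is a swapped-in technique, not the approach on the slides. Both example profiles share cens=0 and differ by 5 years in dage, so the difference in predicted means reduces to 5 times the dage coefficient.

```stata
* Hypothetical toy data with the same variable names as the kidney example
clear
input cith cens dage
12 0 25
15 0 32
14 1 40
19 1 47
21 0 55
24 1 61
18 0 38
22 1 50
end

regress cith cens dage

* Profile 2 (dage=61, cens=0) minus profile 1 (dage=56, cens=0):
* the intercept and cens terms cancel, leaving 5 times the dage coefficient
lincom 5*dage
```

lincom reports the estimated difference with its standard error, test statistic, p-value and confidence interval, which covers the pieces a results paragraph needs.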