## Biostat 201: Winter 2011


Introduction

- Wendy Shih (wendyshi@ucla.edu)
- Office Hours:
- Tues 2-3pm or by appointment
- A1-228 or Biostat Consulting Room (two doors to the left of the Lab)

Access to SAS/STATA

- In the lab: login=sph, password=hello
- One-year SAS student license
- Check with your department
- www.softwarecentral.ucla.edu
- Computers/laptops at UCLA library
- TLC lab at Biomed library
- STATA Only
- shortcut.clicc.ucla.edu

Typical lab session

- 4 assignments total
- Brief (very brief!) overview of the assignment
- Introduce statistical tools/methods that may be helpful with accompanying SAS/STATA code fragments
- Further discussion (time permitting)
- Go analyze!

Some additional notes

- Both SAS and STATA code will be introduced, but you only need to know how to use one (so use whichever is more familiar to you)
- Code will not be given to you in electronic format
- Might want to bring a USB drive or have a way to save your documents
- No raw outputs from SAS or STATA. All submitted results must be formatted.

Please Do NOT Paste Raw Outputs

    . tabstat dage, by(grad) stat(n mean semean min max)

    Summary for variables: dage
         by categories of: grad (Center Grade)

        grad |         N      mean  se(mean)       min       max
    ---------+--------------------------------------------------
    excellen |        36  29.13889  1.993702        18        68
        good |        36  30.27778  1.581446        18        60
        fair |        36  37.13889  1.792911        18        55
        poor |        36  37.97222  1.853134        19        69
    ---------+--------------------------------------------------
       Total |       144  33.63194  .9552307        18        69
    ------------------------------------------------------------

    The MEANS Procedure

    Analysis Variable : dage

             N      N
    grad   Obs   Miss          Mean     Std Error       Minimum       Maximum
    -------------------------------------------------------------------------
       1    36      0    29.1388889     1.9937015    18.0000000    68.0000000
       2    36      0    30.2777778     1.5814455    18.0000000    60.0000000
       3    36      0    37.1388889     1.7929105    18.0000000    55.0000000
       4    36      0    37.9722222     1.8531338    19.0000000    69.0000000
    -------------------------------------------------------------------------

Formatted Results

Table 1: Summary Statistics for Donor Age (Years) by Center Grades

The assignments

- All four assignments are reports, not problem sets
- Introduction
- Methods
- Results
- Can be submitted via e-mail as a Microsoft Word file
- E-mail: wendyshi@ucla.edu
- Subject: Biostat 201 W10 hw# Last First
- Filename: Biostat 201 W10 hw# Last First
- ex: Biostat 201 hw1 Shih Wendy

Assignment grades

- Graded on a 0.0 – 4.0 scale
- 0.0 to 1.9: major errors / misunderstandings
- 2.0 to 2.5: a few major or multiple minor errors
- 2.6 to 3.0: a few minor errors
- 3.1 to 3.5: good/excellent job
- 3.6 to 4.0: very impressive!

Assignment expectations

- Brief
- 2.5-3.5 pages (with tables and figures), 12pt, double-spaced is often sufficient
- Complete
- Requested analyses were performed and properly interpreted
- Logical
- Has an easy-to-follow flow
- Easy to see how the analyses guided each step of the investigation
- No ambiguity on what you were thinking

Common pitfalls

- Lack of explanation
- Why are you doing what you are doing?
- Example:
- We run a multivariate linear regression. (why?)
- We run a multivariate linear regression to evaluate the association between crime rate and depression while adjusting for socioeconomic factors. (ah, that’s better!)

Common pitfalls

- Lack of interpretation
- On what basis are you making your claims?
- Example:
- There is a significant difference between the IQs of UCLA and USC students. (what makes you say this?)
- The two-sample t-test result indicates that the SAT scores of UCLA and USC freshmen are statistically different (p=0.0032), with UCLA students having an average SAT score that is 220 points greater than that of USC students. (note: method used, measure used, statistical significance, magnitude, direction)

Common pitfalls

- Lack of follow-up
- How exactly did your findings guide you in your investigation?
- Example:
- A scatterplot of SAT score vs. GPA suggests a positive linear relationship among males, but a negative linear relationship among females. (How does this finding influence your analysis?)
- A scatterplot of SAT score vs. GPA suggests a positive linear relationship among males, but a negative linear relationship among females. Therefore, the association between SAT score and GPA was evaluated separately for males and females.

Questions to ask yourself

- What are you investigating?
- What analytical method are you using to investigate it?
- What do the results of that analysis tell you?
- How do those results guide your subsequent analyses, or what conclusions do you draw from them?

SAS/STATA code key

- I will use the following convention in these slides:
- statements: bold
- keywords: italics
- options: underlined
- Variables, or something you specify yourself: courier font

What do we need to do?

- Import data
- Summary statistics and plots
- Choose and specify a model
- Investigate if the model is appropriate
- Predicted mean differences for covariate profiles
- Conduct and interpret the model results

SAS: Importing data

- http://www.ats.ucla.edu/stat/sas/faq/rwxls8.htm
- http://www.ats.ucla.edu/stat/sas/faq/read_delim.htm
- Can use the import wizard: File > Import Data…
- proc import out=dataset datafile="directory_of_excel_file" dbms=excel replace; sheet="sheet_name"; run;

SAS: Importing data

- Example (CSV file, so no sheet statement is needed):
- proc import out=hdldata datafile="C:\SAS\data\hdltable.csv" dbms=csv replace; run;

STATA: Importing data

- http://www.ats.ucla.edu/stat/stata/faq/readcommatab.htm
- cd "directory_of_csv_file"
- insheet using file_name

Example: Kidney Data

SAS

    proc import
        datafile="G:\TA - Biostat 201 Winter 2011\KIDNEY.csv"
        out=kidney
        dbms=csv
        replace;
    run;

STATA

cd "G:\TA - Biostat 201 Winter 2011"

insheet using "KIDNEY.csv"
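In Stata 13 and later, `insheet` has been superseded by `import delimited`. A minimal sketch of the same import with the newer command, assuming the same folder and file name as above (the `describe` and `summarize` checks are additions, not part of the original slide):

```stata
* Sketch only: assumes Stata 13 or later
cd "G:\TA - Biostat 201 Winter 2011"
import delimited using "KIDNEY.csv", clear

* Check variable names, storage types, and obvious missing values
describe
summarize
```

Note that `import delimited` converts variable names to lowercase by default, which matches the lowercase names (`dage`, `cith`, `cens`, `grad`) used in these slides.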

SAS: Summary statistics

- proc means data=dataset [options]; var var1 var2 var3; run;
- proc means data=dataset [options]; class grpvar; var var1 var2 var3; run;
- proc univariate data=dataset; var var1 var2 var3; run;

SAS: Summary statistics

    proc means data=kidney nmiss mean stderr min max;
        var dage cith;
    run;

    proc means data=kidney nmiss mean stderr min max;
        class grad;
        var dage cith;
    run;

    proc univariate data=kidney;
        var dage cith;
    run;

    proc univariate data=kidney;
        class grad;
        var dage cith;
    run;

STATA: Summary statistics

- summarize var1 var2
- bysort grpvar: summarize var1 var2
- summarize var1 var2, detail
- sum dage cith
- sum dage cith, detail
- bysort grad: sum dage cith, detail
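For by-group summaries that will later be typed into a formatted table, `tabstat` can be more compact than `bysort: summarize`. The sketch below reuses the `tabstat` call shown in the raw-output slide and adds the `format()` option so the display is already rounded; the choice of one decimal place is only an illustration.

```stata
* One row per center grade; format() rounds the displayed values
tabstat dage, by(grad) statistics(n mean semean min max) format(%9.1f)

* Several variables at once, with the statistics laid out as columns
tabstat dage cith, by(grad) statistics(n mean semean) columns(statistics) format(%9.1f)
```

The result is still raw output, so it would need to be transcribed into a properly labeled table before submission.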

SAS: Bivariate statistics (continuous variables)

- proc ttest data=dataset; class grpvar; var var1 var2 var3; run;
- proc npar1way data=dataset; class grpvar; var var1 var2 var3; run;

SAS: Bivariate statistics (continuous variables)

    proc ttest data=kidney;
        class cens;
        var cith;
    run;

    proc npar1way data=kidney;
        class cens;
        var cith;
    run;

STATA: Bivariate statistics (continuous variables)

- ttest var1, by(grpvar)
- kwallis var1, by(grpvar)
- ttest cith, by(cens)
- ttest cith, by(cens) unequal
- kwallis cith, by(cens)
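The list above shows both the equal-variance and the `unequal` form of the t-test. One informal way to compare the group variances, and a rank-based alternative for the two-group case, might look like the sketch below. It assumes `cens` takes exactly two values; neither command appears in the original slides.

```stata
* Compare the variances of cith in the two cens groups
sdtest cith, by(cens)

* Two-group rank-based alternative (Wilcoxon rank-sum / Mann-Whitney)
ranksum cith, by(cens)
```

If the variances look clearly different, the `unequal` option of `ttest` is the safer choice.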

SAS: Plots

- proc gplot data=dataset; plot yvar * xvar = grpvar; run; quit;

    proc gplot data=kidney;
        plot dage*cith=cens;
    run; quit;

STATA: Plots

- twoway (scatter yvar xvar if grpvar==value, mcolor(color))
- twoway (scatter dage cith if cens==0, ms(o) mcolor(red)) (lfit dage cith if cens==0, clcolor(red)) (scatter dage cith if cens==1, ms(o) mcolor(blue)) (lfit dage cith if cens==1, clcolor(blue)), legend(off)
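Beyond the grouped scatterplot, single-variable plots can help with the "summary statistics and plots" step. A small sketch using the kidney variables from these slides (the choice of which variables to plot is only an illustration):

```stata
* Distribution of cith within each cens group
graph box cith, over(cens)

* Histogram of donor age, one panel per center grade
histogram dage, by(grad)
```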

Choose a model

- Right now, we assume that this assignment is driving toward a linear regression model. Just know that this may not always be appropriate in real-world situations.

SAS: Linear model

- proc reg data=dataset; model yvar = x1 x2 x3; run; quit;

    proc reg data=kidney;
        model cith = cens dage;
    run; quit;

STATA: Linear model

- regress yvar x1 x2 x3
- regress cith cens dage
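For the "investigate if the model is appropriate" step, a common starting point is to look at the residuals after fitting. The following is a minimal sketch, not a complete diagnostic workflow; the variable name `resid` is hypothetical and must not already exist in the dataset.

```stata
regress cith cens dage

* Residuals vs. fitted values: look for curvature or a funnel shape
rvfplot, yline(0)

* Normal quantile plot of the residuals
predict resid, residuals
qnorm resid
```

A curved pattern in the first plot suggests the linear form is inadequate, while a funnel shape suggests non-constant variance.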

SAS: Stratified model

- proc sort data=dataset; by grpvar; run; proc reg data=dataset; model yvar = x1 x2 x3; by grpvar; run; quit;

You must SORT by the grouping variable before you run the stratified model.

SAS: Stratified model

    proc sort data=kidney;
        by cens;
    run;

    proc reg data=kidney;
        model cith = dage;
        by cens;
    run; quit;

STATA: Stratified model

- bysort grpvar: regress yvar x1 x2
- bysort cens: regress cith dage

SAS: Dummy encoded model

- proc reg data=dataset; model yvar = x1 x2 x3 z1 z2; run; quit;
- Note: “z” represents dummy-encoded variables

    proc reg data=kidney;
        model cith = dage cens excel good fair;
    run; quit;

Here excel, good, and fair are newly created dummy variables.

STATA: Dummy encoded model

- regress yvar x1 x2 z1 z2
- Note: “z” represents dummy-encoded variables
- regress cith cens dage excel good fair

Here excel, good, and fair are newly created dummy variables.
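The slides use the dummy variables `excel`, `good`, and `fair` without showing how they are created. One possible way to build them is sketched below. It assumes `grad` is stored as a numeric variable coded 1 = excellent, 2 = good, 3 = fair, 4 = poor, which is consistent with the summary output shown earlier but is not stated explicitly in the slides.

```stata
* Assumes grad is numeric: 1 = excellent, 2 = good, 3 = fair, 4 = poor
generate excel = (grad == 1) if !missing(grad)
generate good  = (grad == 2) if !missing(grad)
generate fair  = (grad == 3) if !missing(grad)

regress cith cens dage excel good fair

* Equivalent factor-variable form, with poor (4) as the reference level
regress cith cens dage ib4.grad
```

Leaving out the dummy for "poor" makes it the reference category, so each dummy coefficient is the adjusted mean difference from "poor".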

SAS: Interaction model

- data dataset; set dataset; intnvar = x1 * x2; run; proc reg data=dataset; model yvar = x1 x2 intnvar; run; quit;

SAS: Interaction model

    data kidney;
        set kidney;
        d_c = dage*cens;
    run;

    proc reg data=kidney;
        model cith = dage cens d_c;
    run; quit;

STATA: Interaction model

- gen intnvar = x1 * x2
- regress yvar x1 x2 intnvar
- gen d_c=dage*cens
- regress cith dage cens d_c
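With the interaction term in the model, the slope of `dage` differs by `cens` group. The sketch below shows one way to estimate the slope in the `cens == 1` group, along with the equivalent factor-variable syntax. It assumes `cens` is coded 0/1 and that `d_c` has not been created yet.

```stata
* Assumes cens is coded 0/1 and d_c does not exist yet
gen d_c = dage*cens
regress cith dage cens d_c

* Slope of dage when cens == 0 is the dage coefficient;
* when cens == 1 it is the sum of the dage and d_c coefficients
lincom dage + d_c

* Same model using factor-variable notation (Stata 11 or later)
regress cith c.dage##i.cens
```

The t-test on the `d_c` coefficient is the test of whether the two slopes differ.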

Predicted mean differences

- Question: Observation 1 has “this” particular profile, and observation 2 has “that” particular profile. Is there a difference in their predicted mean response/outcome?
- Example:
- Obs1: 56 years old and censored
- Obs2: 61 years old and censored

Predicted mean differences

- Strategy
- Add observations with the specified covariate profiles with the outcome missing
- Run the linear regression model and request the predicted outcome with standard error of the prediction
- Look at the results

SAS: Predicted mean differences

- Add observations
    data profiles;
        input dage cens;
        cards;
    56 0
    61 0
    ;
    run;

    data kidney;
        set kidney profiles;
    run;

SAS: Predicted mean differences

- Analyze and request standard error of the prediction
    proc reg data=kidney;
        model cith = dage cens;
        output out=kidney_new p=ypred stdp=yprese;
    run; quit;

- Now if you open the “kidney_new” dataset, you can scroll down and view the predicted values and the standard error of the prediction

STATA: Predicted mean differences

- Add observations
- It’s probably easiest to do this using the data editor
- Suppose our dataset has 144 observations:

    set obs 146
    replace dage=56 in 145
    replace cens=0 in 145
    replace dage=61 in 146
    replace cens=0 in 146

STATA: Predicted mean differences

- Analyze and request the standard error of the prediction
- regress cith cens dage
- predict ypred
- predict yprese, stdp
- Now if you open the data browser, you can scroll down and view the predicted values and the standard error of the prediction
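The two predictions come from the same fitted model, so their standard errors cannot simply be combined to judge the difference. A possible way to estimate the difference directly is `lincom`, sketched below for the example profiles (ages 56 and 61, both with `cens = 0`). This is an alternative to the slide's approach, not a replacement for it, and `margins` requires Stata 11 or later.

```stata
* Model from the slide; rows with missing cith are not used in the fit
regress cith cens dage

* Difference in predicted mean: (dage = 61) minus (dage = 56), both cens = 0.
* The cens term cancels, leaving 5 times the dage coefficient
lincom 5*dage

* Predicted means and their standard errors at the two profiles
margins, at(dage=(56 61) cens=0)
```

`lincom` reports the estimated difference with its standard error, t statistic, and confidence interval.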
