Statistics for Medical Researchers


Hongshik Ahn

Professor

Department of Applied Math and Statistics

Stony Brook University

Biostatistician, Stony Brook GCRC

Contents

Experimental Design

Descriptive Statistics and Distributions

Comparison of Means

Comparison of Proportions

Power Analysis/Sample Size Calculation

Correlation and Regression

1. Experimental Design

Treatment: something that researchers administer to experimental units

Factor: controlled independent variable whose levels are set by the experimenter

Experimental design

Control

Treatment

Placebo effect

Blind

single blind, double blind, triple blind

1. Experimental Design

Completely randomized design

Randomized block design: if there are specific differences among groups of subjects

Permuted block randomization: used for small studies to maintain reasonably good balance among groups

Stratified block randomization: matching

1. Experimental Design

The computer-generated sequence:

4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0, …

Two groups (criterion: even-odd):

AABABAAABAABAAA…

Three groups (criterion: {1,2,3}~A, {4,5,6}~B, {7,8,9}~C; ignore 0’s):

BCAACABBABAABA…

Two groups with a different randomization ratio (e.g., 2:3; criterion: {0,1,2,3}~A, {4,5,6,7,8,9}~B):

BBAABABBABAABAA…
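The three allocation rules above can be checked with a short sketch. The function name `allocate` and the dictionary names are illustrative choices, not part of the original slides; the digit sequence and criteria are the ones quoted above.

```python
def allocate(digits, mapping):
    """Map each random digit to a group label; digits not in the mapping are skipped."""
    return "".join(mapping[d] for d in digits if d in mapping)

# The computer-generated sequence quoted on the slide.
digits = [4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0]

# Two groups, even-odd criterion: even digits -> A, odd digits -> B.
even_odd = {d: ("A" if d % 2 == 0 else "B") for d in range(10)}

# Three groups: {1,2,3} -> A, {4,5,6} -> B, {7,8,9} -> C; 0 is ignored.
three_groups = {d: "ABC"[(d - 1) // 3] for d in range(1, 10)}

# Two groups with a 2:3 ratio: {0,1,2,3} -> A, {4,...,9} -> B.
ratio_2_3 = {d: ("A" if d <= 3 else "B") for d in range(10)}

two = allocate(digits, even_odd)
three = allocate(digits, three_groups)
unequal = allocate(digits, ratio_2_3)
print(two, three, unequal)
```

Each printed string reproduces the corresponding allocation sequence listed on the slide.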

1. Experimental Design

With a block size of 4 for two groups (A, B), there are 6 possible permutations, and they can be coded as:

1=AABB, 2=ABAB, 3=ABBA, 4=BAAB, 5=BABA, 6=BBAA

Each number in the random number sequence in turn selects the next block, determining the next four participant allocations (ignoring the numbers 0, 7, 8 and 9).

E.g., the sequence 67126814… will produce BBAA AABB ABAB BBAA AABB BAAB.

In practice, a block size of four is too small, since researchers may crack the code and risk selection bias.

Mixing block sizes of 4 and 6 is better, with the block size kept unknown to the investigator.
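A minimal sketch of the permuted-block scheme described above, using the fixed block size of 4 from the example (the mixed 4-and-6 variant is not shown). The names `BLOCKS` and `permuted_blocks` are illustrative assumptions.

```python
from itertools import permutations

# The 6 distinct orderings of two A's and two B's, in alphabetical order,
# which matches the coding 1=AABB, 2=ABAB, ..., 6=BBAA used on the slide.
orderings = sorted(set(permutations("AABB")))
BLOCKS = {code: "".join(block) for code, block in enumerate(orderings, start=1)}

def permuted_blocks(digits):
    """Turn a random digit sequence into blocks, ignoring digits 0, 7, 8 and 9."""
    return [BLOCKS[d] for d in digits if d in BLOCKS]

sequence = [6, 7, 1, 2, 6, 8, 1, 4]  # the example sequence 67126814
blocks = permuted_blocks(sequence)
print(" ".join(blocks))  # BBAA AABB ABAB BBAA AABB BAAB
```

Every block contains two A's and two B's, so the groups are balanced after each complete block.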

1. Experimental Design

Random sampling

Systematic sampling

Convenience sampling

Stratified sampling

1. Experimental Design

Random sampling: selection so that each individual member has an equal chance of being selected

Systematic sampling: select some starting point and then select every kth element in the population
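The two sampling schemes can be sketched as follows. The patient ID list, the sample size of 10, and the function name `systematic_sample` are hypothetical, chosen only for illustration.

```python
import random

def systematic_sample(population, k, start=None, rng=random):
    """Pick a starting index in [0, k) and then take every k-th element."""
    if start is None:
        start = rng.randrange(k)
    return population[start::k]

patients = list(range(1, 101))  # hypothetical patient IDs 1..100

# Simple random sampling: every member has the same chance of selection.
rng = random.Random(2024)  # seeded only to make the sketch repeatable
simple = rng.sample(patients, 10)

# Systematic sampling: start at the 4th patient, then every 10th.
systematic = systematic_sample(patients, k=10, start=3)
print(simple)
print(systematic)
```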

2. Descriptive Statistics & Distributions

Parameter: population quantity

Statistic: summary of the sample

Inference for parameters: use sample

Central Tendency

Mean (average)

Median (middle value)

Variability

Variance: measure of variation

Standard deviation (sd): square root of variance

Standard error (se): sd of the estimate

Median, quartiles, min., max, range, boxplot

Proportion
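The summary measures listed above can be computed with Python's standard `statistics` module. The data values below are hypothetical and serve only to show the calls; the standard error shown is that of the sample mean.

```python
import math
import statistics

weights = [6.1, 7.4, 5.9, 8.0, 6.6, 7.2, 6.8, 7.9]  # hypothetical sample

mean = statistics.mean(weights)
median = statistics.median(weights)
variance = statistics.variance(weights)      # sample variance (divides by n - 1)
sd = statistics.stdev(weights)               # square root of the variance
se = sd / math.sqrt(len(weights))            # standard error of the sample mean
q1, q2, q3 = statistics.quantiles(weights, n=4)  # quartiles
data_range = max(weights) - min(weights)

print(f"mean={mean:.3f} median={median:.3f} sd={sd:.3f} se={se:.3f}")
print(f"Q1={q1:.3f} Q3={q3:.3f} range={data_range:.1f}")
```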

3. Inference for Means

Two independent groups: Control and treatment

Continuous variables

Assumption: populations are normally distributed

Checking normality

Histogram

Normal probability plot (Q-Q plot): do the points follow a straight line?

Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test

If the normality assumption is violated

T-test is not appropriate.

Possible transformation

Use non-parametric alternative: Mann-Whitney U-test (Wilcoxon rank-sum test)

3. Inference for Means

A clinical trial on the effectiveness of drug A in preventing premature birth

30 pregnant women are randomly assigned to control and treatment groups of size 15 each

Primary endpoint: weight of the babies at birth

| | Treatment | Control |
|---|---|---|
| n | 15 | 15 |
| mean | 7.08 | 6.26 |
| sd | 0.90 | 0.96 |

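3. Inference for Means

Hypothesis: The group means are different

Null hypothesis ($H_0$): $\mu_1 = \mu_2$

Alternative hypothesis ($H_1$): $\mu_1 \neq \mu_2$

Significance level: $\alpha = 0.05$

Assumption: Equal variance

Degrees of freedom (df): $n_1 + n_2 - 2$

Calculate the T-value (test statistic): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance

P-value: the probability, assuming $H_0$ is true, of a test statistic at least as extreme as the one observed; $\alpha$ is the Type I error rate (false positive rate)

Reject $H_0$ if p-value < $\alpha$

Do not reject $H_0$ if p-value ≥ $\alpha$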

3. Inference for Means

Previous example: test at $\alpha = 0.05$

P-value: 0.026 < 0.05

Reject the null hypothesis that there is no drug effect.
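As a rough check, the equal-variance t statistic can be recomputed from the rounded summary statistics of the example (n = 15 per group, means 7.08 and 6.26, sd 0.90 and 0.96). The function name `pooled_t` and the tabulated critical value are assumptions of this sketch; because the inputs are rounded, the result need not match the slide's figures exactly.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic assuming equal variances, plus its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

t_value, df = pooled_t(7.08, 0.90, 15, 6.26, 0.96, 15)

# Two-sided 5% critical value of t with 28 df, taken from a t table (assumed lookup).
T_CRIT_28 = 2.048
reject = abs(t_value) > T_CRIT_28
print(f"t = {t_value:.2f}, df = {df}, reject H0: {reject}")
```

The statistic exceeds the critical value, which agrees with the decision to reject the null hypothesis.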

3. Inference for Means

Confidence interval (CI):

An interval of values used to estimate the true value of a population parameter.

The confidence level $1-\alpha$ is the proportion of times that the CI actually contains the population parameter, assuming that the estimation process is repeated a large number of times.

Common choices: 90% CI ($\alpha$ = 10%), 95% CI ($\alpha$ = 5%), 99% CI ($\alpha$ = 1%)

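3. Inference for Means

CI for a comparison of two means:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2}\; s_p \sqrt{1/n_1 + 1/n_2}$

where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance

A 95% CI for the previous example: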
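The interval itself did not survive in this transcript. A sketch of the calculation from the rounded summary statistics of the example is given below; it assumes the equal-variance (pooled) form and a critical value read from a t table, so the result may differ slightly from the slide's original figures.

```python
import math

# Summary statistics quoted in the example (treatment vs. control).
n1, mean1, sd1 = 15, 7.08, 0.90
n2, mean2, sd2 = 15, 6.26, 0.96

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

T_CRIT = 2.048  # upper 2.5% point of t with 28 df, from a t table (assumed lookup)
diff = mean1 - mean2
lower, upper = diff - T_CRIT * se, diff + T_CRIT * se
print(f"95% CI for the difference in means: ({lower:.2f}, {upper:.2f})")
```

The interval excludes 0, which is consistent with rejecting the null hypothesis at the 5% level.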

SAS programming for Two-Sample T-test

Data steps:

Click ‘File’ → Click ‘Import Data’ → Select a data source

Click ‘Browse’ and find the path of the data file

Click ‘Next’ → Fill the blank of ‘Member’ with the name of the SAS data set

Click ‘Finish’

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’ → Click ‘File’

Click ‘Open By SAS Name’ → Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘Hypothesis Tests’

Click ‘Two-Sample T-test for Means’

Select the independent variable as ‘Group’ and the dependent variable as ‘Dependent’ → Choose the hypothesis of interest and Click ‘OK’

3. Inference for Means

Click ‘File’ to import data and create the SAS data set.

Click ‘Solutions’ to create a project to run the statistical test.

Click ‘File’ to open the SAS data set.

Click ‘Statistics’ to select the statistical procedure.

Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)

Nonparametric alternative to two-sample t-test

The populations don’t need to be normal

$H_0$: The two samples come from populations with equal medians

$H_1$: The two samples come from populations with different medians

3. Inference for Means

Mann-Whitney U-Test Procedure

Temporarily combine the two samples into one big sample, then replace each sample value with its rank

Find the sum of the ranks for either one of the two samples

Calculate the value of the z test statistic
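The three steps above can be sketched as follows. The z formula used here is the usual large-sample normal approximation for the rank sum, with mean $n_1(n_1+n_2+1)/2$ and standard deviation $\sqrt{n_1 n_2 (n_1+n_2+1)/12}$; it is an assumption of this sketch, since the slide's own formula is not preserved, and it applies no correction for ties. The data values are hypothetical, and the approximation is normally reserved for larger samples than these.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """Rank sum of sample1, its z statistic, and a two-sided normal p-value."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # combine, then rank
    r1 = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, z, p_value

group_a = [23.8, 23.2, 24.6, 26.2, 23.5, 24.5]  # hypothetical measurements
group_b = [19.6, 23.8, 19.6, 29.1, 25.2, 21.4]  # hypothetical measurements
r1, z, p_value = rank_sum_z(group_a, group_b)
print(f"R1 = {r1}, z = {z:.3f}, p = {p_value:.3f}")
```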

3. Inference for Means

Mann-Whitney U-Test, Example

Numbers in parentheses are their ranks beginning with a rank of 1 assigned to the lowest value of 17.7.

R1 and R2: sum of ranks

3. Inference for Means

Hypothesis: The group medians are different

$H_0$: Men and women have the same median BMI

$H_1$: Men and women have different median BMIs

p-value = 0.33, thus we do not reject $H_0$ at $\alpha = 0.05$.

There is no significant difference in BMI between men and women.

3. Inference for Means

SAS Programming for Mann-Whitney U-Test Procedure

Data steps:

The same as slide 21.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘ANOVA’

Click ‘Nonparametric One-Way ANOVA’

Select the ‘Dependent’ and ‘Independent’ variables respectively, choose the test of interest, and Click ‘OK’

3. Inference for Means

Click ‘File’ to open the SAS data set.

Click ‘Statistics’ to select the statistical procedure.

Select the dependent and independent variables:

Paired t-test

Mean difference of matched pairs

Test for changes (e.g., before & after)

The measures in each pair are correlated.

Assumption: population is normally distributed

Take the difference in each pair and perform one-sample t-test.

Check normality

If the normality assumption is violated

T-test is not appropriate.

Use non-parametric alternative: Wilcoxon signed rank test

3. Inference for Means

Notation for paired t-test

$d$ = individual difference between the two values of a single matched pair

$\mu_d$ = mean value of the differences $d$ for the population of paired data

$\bar{d}$ = mean value of the differences $d$ for the paired sample data

$s_d$ = standard deviation of the differences $d$ for the paired sample data

$n$ = number of pairs

3. Inference for Means

Hypothesis: The group means are different

$H_0$: $\mu_d = 0$ vs. $H_1$: $\mu_d \neq 0$

Significance level: $\alpha = 0.05$

Degrees of freedom (df): $n - 1$

Test statistic: $t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$

P-value: 0.009, thus reject $H_0$ at $\alpha = 0.05$

The data support the claim that oral contraceptives affect the systolic bp.

3. Inference for Means

Confidence interval for matched pairs

$100(1-\alpha)\%$ CI: $\bar{d} \pm t_{\alpha/2,\, n-1}\, s_d / \sqrt{n}$

95% CI for the mean difference of the systolic bp:

(1.53, 8.07)
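The paired analysis can be sketched as below. The blood pressure readings are illustrative values that are not taken from this transcript; they were chosen so that the resulting interval agrees with the (1.53, 8.07) reported above. The variable names and the tabulated critical value are likewise assumptions.

```python
import math
import statistics

# Illustrative systolic bp readings for 10 matched pairs (hypothetical values).
without_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
with_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# Take the difference in each pair, then run a one-sample analysis on d.
d = [b - a for a, b in zip(without_oc, with_oc)]
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)
se = s_d / math.sqrt(n)

t_value = d_bar / se  # test statistic with n - 1 = 9 degrees of freedom
T_CRIT = 2.262        # upper 2.5% point of t with 9 df, from a t table (assumed lookup)
lower, upper = d_bar - T_CRIT * se, d_bar + T_CRIT * se
print(f"mean difference = {d_bar:.2f}, t = {t_value:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```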

3. Inference for Means

SAS Programming for Paired T-test

Data steps:

The same as slide 21.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘Hypothesis Tests’

Click ‘Two-Sample Paired T-test for Means’

Select the ‘Group1’ and ‘Group2’ variables respectively

Click ‘OK’

(Note: You can also calculate the difference and use it as the dependent variable to run the one-sample t-test.)

3. Inference for Means

Click ‘File’ to open the SAS data set.

Click ‘Statistics’ to select the statistical procedure.

Put the two group variables into ‘Group 1’ and ‘Group 2’

Comparison of more than two means:

ANOVA (Analysis of Variance)

One-way ANOVA: One factor, e.g., control, drug 1, drug 2

Two-way ANOVA: Two factors, e.g., drugs, age groups

Repeated measures: if there are repeated measurements within subjects, such as at several time points

3. Inference for Means

Example: Pulmonary disease

Endpoint: Mid-expiratory flow (FEF) in L/s

6 groups: nonsmokers (NS), passive smokers (PS), noninhaling smokers (NI), light smokers (LS), moderate smokers (MS) and heavy smokers (HS)

3. Inference for Means

Example: Pulmonary disease

$H_0$: group means are the same

$H_1$: not all the group means are the same

P-value < 0.001

There is a significant difference in the mean FEF among the groups.

Comparison of specific groups: linear contrast

Multiple comparison: Bonferroni adjustment ($\alpha/n$, where $n$ is the number of comparisons)
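A small sketch of the Bonferroni adjustment for the six smoking groups, assuming that all pairwise comparisons are of interest (the slide does not say which comparisons were made). The p-values passed to `significant` are hypothetical.

```python
from math import comb

groups = ["NS", "PS", "NI", "LS", "MS", "HS"]
alpha = 0.05

# Assumption: every pair of groups is compared, giving C(6, 2) comparisons.
n_comparisons = comb(len(groups), 2)
adjusted_alpha = alpha / n_comparisons

def significant(p_values):
    """Flag each pairwise p-value against the Bonferroni-adjusted level."""
    return [p < adjusted_alpha for p in p_values]

print(n_comparisons, round(adjusted_alpha, 5))
print(significant([0.001, 0.004, 0.03]))  # hypothetical pairwise p-values
```

With 15 comparisons, each individual test is judged at a level of about 0.0033 rather than 0.05.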

3. Inference for Means

SAS Programming for One-Way ANOVA

Data steps:

The same as slide 21.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘ANOVA’

Click ‘One-Way ANOVA’

Select the ‘Independent’ and ‘Dependent’ variables respectively

Click ‘OK’

3. Inference for Means

Click ‘File’ to open the SAS data set.

Click ‘Solutions’ to select the statistical procedure.

Select the dependent and independent variables:

4. Inference for Proportions

Chi-square test

Testing difference of two proportions

n: sample size, p: success rate

Requirement: $np \geq 5$ & $n(1-p) \geq 5$

H0: p1 = p2

H1: p1 ≠ p2 (for two-sided test)

If the requirement is not satisfied, use Fisher’s exact test.
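A sketch of the chi-square test for two proportions, written without a continuity correction. The counts are hypothetical, the function name `chi_square_2x2` is illustrative, and the check that every expected cell count is at least 5 is a common rule of thumb assumed here rather than a statement of the slide's exact requirement. With 1 degree of freedom the p-value can be obtained from the standard normal distribution.

```python
import math
from statistics import NormalDist

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic, p-value (1 df), and an expected-count check for two proportions."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, total - (x1 + x2)]
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # A chi-square variable with 1 df is the square of a standard normal variable.
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    counts_ok = min(min(row) for row in expected) >= 5
    return chi2, p_value, counts_ok

# Hypothetical data: 30 of 100 successes in group 1, 15 of 100 in group 2.
chi2, p_value, counts_ok = chi_square_2x2(30, 100, 15, 100)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}, expected counts ok: {counts_ok}")
```

If `counts_ok` is false, the approximation is unreliable and Fisher's exact test is the usual fallback.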

5. Power/Sample Size Calculation

Decide significance level (e.g., 0.05)

Decide desired power (e.g., 80%)

One-sided or two-sided test

Comparison of means: two-sample t-test

Need to know sample means in each group

Need to know sample sd’s in each group

Calculation: use software (Nquery, power, etc)

Comparison of proportions: Chi-square test

Need to know sample proportions in each group

Continuity correction

Small sample size: Fisher’s exact test

Calculation: use software
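For a rough idea of what such software does, the sample size per group for comparing two means can be approximated with the normal-approximation formula $n = 2\sigma^2 (z_{1-\alpha/2} + z_{\text{power}})^2 / \delta^2$. This is a sketch under that assumption; dedicated packages use the t distribution and typically report a slightly larger n. The planning values (difference 0.8, sd 1.0) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (sd ** 2) * (z_alpha + z_power) ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole subjects

# Hypothetical planning values: detect a mean difference of 0.8 with sd 1.0.
print(n_per_group(delta=0.8, sd=1.0))
print(n_per_group(delta=0.8, sd=1.0, power=0.90))
```

Raising the desired power or shrinking the detectable difference increases the required sample size.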

6. Correlation and Regression

Correlation

Pearson correlation for continuous variables

Spearman correlation for ranked variables

Chi-square test for categorical variables

Pearson correlation

Correlation coefficient (r): $-1 \le r \le 1$

Test for coefficient: t-test

For the same value of the correlation coefficient, a larger sample gives a more significant result.

Thus it is not meaningful to judge significance by the magnitude of the correlation coefficient alone.

Judge the significance of the correlation by p-value
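The effect of sample size can be seen from the t statistic for testing a zero correlation, $t = r\sqrt{(n-2)/(1-r^2)}$ with $n-2$ degrees of freedom. The value r = 0.3 and the sample sizes below are hypothetical, and the function name is illustrative.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Same correlation coefficient, increasing sample size.
for n in (10, 30, 100):
    print(n, round(correlation_t(0.3, n), 2))
```

The same r = 0.3 gives a t statistic below 1 with 10 observations but above 3 with 100.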

6. Correlation and Regression

Regression

Objective

Find out whether a significant linear relationship exists between the response and independent variables

Use it to predict a future value

Notation

X: independent (predictor) variable

Y: dependent (response) variable

Multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$

where $\varepsilon$ is the random error

Checking the model (assumption)

Normality: q-q plot, histogram, Shapiro-Wilk test

Equal variance: the plot of predicted y vs. error (residual) should show a band shape of roughly constant width

Linear relationship: predicted y vs. each x

6. Correlation and Regression

In the fitted regression equation, the coefficient of weight ($x_1$) is 1.08 and the coefficient of age ($x_2$) is 0.425.

The mean blood pressure increases by 1.08 if weight ($x_1$) increases by one pound and age ($x_2$) remains fixed. Similarly, a 1-year increase in age with the weight held fixed will increase the mean blood pressure by 0.425.

$s = 2.509$, $R^2 = 95.8\%$

Error sd is estimated as 2.509 with df=13-3=10

95.8% of the variation in y can be explained by the regression.

6. Correlation and Regression

SAS Programming for Linear Regression

Data steps:

The same as slide 21.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘Regression’ → Click ‘Linear’

Select the ‘Dependent’ (Response) variable and the ‘Explanatory’ (Predictor) variable respectively

Click ‘OK’

6. Correlation and Regression

Click ‘File’ to open the SAS data set.

Click ‘Solutions’ to select the statistical procedure.

Select the dependent and explanatory variables:

6. Correlation and Regression

Other regression models

Polynomial regression

Transformation

Logistic regression
