Statistics for Medical Researchers

Hongshik Ahn

Professor

Department of Applied Math and Statistics

Stony Brook University

Biostatistician, Stony Brook GCRC


Contents

Experimental Design

Descriptive Statistics and Distributions

Comparison of Means

Comparison of Proportions

Power Analysis/Sample Size Calculation

Correlation and Regression

1. Experimental Design

Experiment

Treatment: something that researchers administer to experimental units

Factor: controlled independent variable whose levels are set by the experimenter

Experimental design

Control

Treatment

Placebo effect

Blind

single blind, double blind, triple blind


Randomization

Completely randomized design

Randomized block design: if there are specific differences among groups of subjects

Permuted block randomization: used for small studies to maintain reasonably good balance among groups

Stratified block randomization: matching


Completely randomized design

The computer generated sequence:

4,8,3,2,7,2,6,6,3,4,2,1,6,2,0,…….

Two Groups (criterion: even-odd):

AABABAAABAABAAA……

Three Groups:

(criterion:{1,2,3}~A, {4,5,6}~B, {7,8,9}~C; ignore 0’s)

BCAACABBABAABA……

Two Groups with a different randomization ratio (e.g., 2:3):

(criterion:{0,1,2,3}~A, {4,5,6,7,8,9}~B)

BBAABABBABAABAA……..
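The three allocation rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `assign_groups` is hypothetical, and the digit sequence is the one printed on the slide (in a real trial it would come from a random number generator).

```python
def assign_groups(digits, criterion):
    """Map each random digit to a group label; digits mapped to None are skipped."""
    labels = (criterion(d) for d in digits)
    return "".join(g for g in labels if g is not None)

# Digit sequence shown on the slide
digits = [4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0]

# Two groups, 1:1 ratio: even digits -> A, odd digits -> B
two_groups = assign_groups(digits, lambda d: "A" if d % 2 == 0 else "B")

# Three groups: 1-3 -> A, 4-6 -> B, 7-9 -> C, and 0 is ignored
def three_way(d):
    return None if d == 0 else "ABC"[(d - 1) // 3]

three_groups = assign_groups(digits, three_way)

# Two groups, 2:3 ratio: 0-3 -> A (4 digits), 4-9 -> B (6 digits)
unequal = assign_groups(digits, lambda d: "A" if d <= 3 else "B")

print(two_groups, three_groups, unequal)
```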


Permuted block randomization

With a block size of 4 for two groups (A, B), there are 6 possible permutations, and they can be coded as:

1=AABB, 2=ABAB, 3=ABBA, 4=BAAB, 5=BABA, 6=BBAA

Each number in the random number sequence in turn selects the next block, determining the next four participant allocations (ignoring numbers 0, 7, 8 and 9).

e.g., the sequence 67126814… will produce BBAA AABB ABAB BBAA AABB BAAB.

In practice, a block size of four is too small, since researchers may crack the code and risk selection bias.

Mixing block sizes of 4 and 6 is better, with the size kept unknown to the investigator.
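As a rough sketch of the block-selection rule above, the following Python snippet builds the six blocks and decodes the example digit sequence. The names `BLOCKS` and `permuted_blocks` are hypothetical, and the fixed block size of 4 is kept only to mirror the slide.

```python
from itertools import permutations

# The six distinct orderings of two A's and two B's, numbered 1..6 alphabetically
BLOCKS = dict(enumerate(sorted({"".join(p) for p in permutations("AABB")}), start=1))

def permuted_blocks(digits):
    """Each digit 1-6 selects a block; digits 0, 7, 8 and 9 are ignored."""
    return [BLOCKS[d] for d in digits if d in BLOCKS]

# Example sequence from the slide: 67126814
blocks = permuted_blocks([6, 7, 1, 2, 6, 8, 1, 4])
allocation = " ".join(blocks)
print(allocation)
```

Every block contains two A's and two B's, so the groups are balanced after each complete block.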


Methods of Sampling

Random sampling

Systematic sampling

Convenience sampling

Stratified sampling


Random Sampling

Selection so that each individual member has an equal chance of being selected

Systematic Sampling

Select some starting point and then select every k-th element in the population


Convenience Sampling

Use results that are easy to get


Stratified Sampling

Draw a sample from each stratum
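The random, systematic and stratified schemes described above might be sketched as follows. The population of subject IDs, the strata names and the function names are all hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of selection (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, k, start):
    """Begin at index `start`, then take every k-th element."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

rng = random.Random(2024)                 # fixed seed so the sketch is reproducible
population = list(range(1, 101))          # hypothetical subject IDs 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, k=10, start=3)
strata = {"under_50": population[:60], "50_and_over": population[60:]}  # hypothetical strata
stratified = stratified_sample(strata, 5, rng)
```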

2. Descriptive Statistics & Distributions

Parameter: population quantity

Statistic: summary of the sample

Inference for parameters: use sample

Central Tendency

Mean (average)

Median (middle value)

Variability

Variance: measure of variation

Standard deviation (sd): square root of variance

Standard error (se): sd of the estimate

Median, quartiles, min., max, range, boxplot

Proportion
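A minimal sketch of these summary statistics using Python's standard `statistics` module. The measurements are hypothetical; the standard error shown is that of the sample mean, sd/√n.

```python
import math
import statistics

values = [6.1, 7.4, 5.8, 6.9, 7.2, 6.5, 8.0, 6.3]   # hypothetical measurements

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)    # sample variance (divides by n - 1)
sd = statistics.stdev(values)             # square root of the sample variance
se = sd / math.sqrt(n)                    # standard error of the sample mean
q1, q2, q3 = statistics.quantiles(values, n=4)   # quartiles
value_range = max(values) - min(values)
```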


Normal distribution


Standard normal distribution:

Mean 0, variance 1


Z-test for means

T-test for means if sd is unknown

3. Inference for Means

Two-sample t-test

Two independent groups: Control and treatment

Continuous variables

Assumption: populations are normally distributed

Checking normality

Histogram

Normal probability plot (Q-Q plot): do the points fall on a straight line?

Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test

If the normality assumption is violated

T-test is not appropriate.

Possible transformation

Use non-parametric alternative: Mann-Whitney U-test (Wilcoxon rank-sum test)


A clinical trial on effectiveness of drug A in preventing premature birth

30 pregnant women are randomly assigned to control and treatment groups of size 15 each

Primary endpoint: weight of the babies at birth

        Treatment   Control
n          15          15
mean     7.08        6.26
sd       0.90        0.96


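Hypothesis: The group means are different

Null hypothesis (Ho): μ1 = μ2

Alternative hypothesis (H1): μ1 ≠ μ2

Significance level: α = 0.05

Assumption: Equal variance

Degrees of freedom (df): n1 + n2 − 2

Calculate the T-value (test statistic): $T = (\bar{x}_1 - \bar{x}_2) / (s_p \sqrt{1/n_1 + 1/n_2})$, where $s_p$ is the pooled sd

P-value: the probability, when Ho is true, of a test statistic at least as extreme as the one observed. The significance level α is the Type I error rate (false positive rate).

Reject Ho if p-value < α

Do not reject Ho if p-value > α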
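The test statistic can be computed directly from the summary table of the drug A example. The sketch below assumes equal variances (pooled sd). The function name `pooled_t` is hypothetical, and the critical value 2.048 is the two-sided 5% point of the t distribution with 28 df taken from a standard t-table. Because the inputs are rounded summary statistics, the result may differ slightly from a calculation on the raw data.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Summary statistics from the slide: treatment vs. control
t_value, df = pooled_t(7.08, 0.90, 15, 6.26, 0.96, 15)

T_CRIT = 2.048   # assumption: t-table value for alpha = 0.05 (two-sided), 28 df
reject_h0 = abs(t_value) > T_CRIT
print(round(t_value, 2), df, reject_h0)
```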


Previous example: Test at α = 0.05

P-value: 0.026 < 0.05

Reject the null hypothesis that there is no drug effect.


Confidence interval (CI):

An interval of values used to estimate the true value of a population parameter.

The confidence level 1 − α is the proportion of times that the CI actually contains the population parameter, assuming that the estimation process is repeated a large number of times.

Common choices: 90% CI (α = 10%), 95% CI (α = 5%), 99% CI (α = 1%)


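CI for a comparison of two means:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1+n_2-2} \; s_p \sqrt{1/n_1 + 1/n_2}$

where

$s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$

A 95% CI for the previous example uses $n_1 = n_2 = 15$ and $t_{0.025,\, 28} = 2.048$.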
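As an illustration, the pooled-variance interval can be evaluated from the rounded summary statistics of the drug A example. The function name `ci_diff_means` is hypothetical, and the critical value is passed in by hand (2.048 for 28 df, from a t-table) rather than computed.

```python
import math

def ci_diff_means(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Equal-variance confidence interval for mu1 - mu2."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)   # pooled sd
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Treatment vs. control summary statistics; t_crit is an assumed table value
lower, upper = ci_diff_means(7.08, 0.90, 15, 6.26, 0.96, 15, t_crit=2.048)
print(round(lower, 2), round(upper, 2))
```

With these rounded inputs the interval excludes 0, which agrees with rejecting Ho at the 5% level.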



SAS programming for Two-Sample T-test

Data steps:

Click ‘File’ → Click ‘Import Data’ → Select a data source

Click ‘Browse’ and find the path of the data file

Click ‘Next’ → Fill the blank of ‘Member’ with the name of the SAS data set

Click ‘Finish’

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’ → Click ‘File’

Click ‘Open By SAS Name’ → Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘Hypothesis Tests’

Click ‘Two-Sample T-test for Means’

Select the independent variable as ‘Group’ and the dependent variable as ‘Dependent’ → Choose the hypothesis of interest and Click ‘OK’


Click ‘File’ to import data and create the SAS data set.

Click ‘Solutions’ to create a project to run the statistical test.

Click ‘File’ to open the SAS data set.

Click ‘Statistics’ to select the statistical procedure.



Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)

Nonparametric alternative to two-sample t-test

The populations don’t need to be normal

H0: The two samples come from populations with equal medians

H1: The two samples come from populations with different medians


Mann-Whitney U-Test Procedure

Temporarily combine the two samples into one big sample, then replace each sample value with its rank

Find the sum of the ranks for either one of the two samples

Calculate the value of the z test statistic
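The three steps above can be sketched as follows. This assumes the usual large-sample normal approximation, z = (R1 − μR)/σR with μR = n1(n1 + n2 + 1)/2 and σR = √(n1 n2 (n1 + n2 + 1)/12), without a tie correction. The slide does not show its formula, so treat this as one common version. Function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """Rank sum of sample1 in the combined sample and its z statistic."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))   # step 1: combine and rank
    r1 = sum(ranks[:n1])                                   # step 2: rank sum of sample 1
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, (r1 - mu) / sigma                           # step 3: z statistic
```

The normal approximation is generally considered reasonable only when both samples are not too small; exact tables or software are preferable otherwise.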


Mann-Whitney U-Test, Example

Numbers in parentheses are their ranks beginning with a rank of 1 assigned to the lowest value of 17.7.

R1 and R2: sum of ranks


Hypothesis: The group means are different

Ho: Men and women have the same median BMI

H1: Men and women have different median BMIs

p-value = 0.33, thus we do not reject H0 at α = 0.05.

There is no significant difference in BMI between men and women.


SAS Programming for Mann-Whitney U-Test Procedure

Data steps:

The same as for the two-sample t-test.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘ANOVA’

Click ‘Nonparametric One-Way ANOVA’

Select the ‘Dependent’ and ‘Independent’ variables respectively and choose the test of interest → Click ‘OK’


Click ‘File’ to open the SAS data set.

Click ‘Statistics’ to select the statistical procedure.

Select the dependent and independent variables:



Paired t-test

Mean difference of matched pairs

Test for changes (e.g., before & after)

The measures in each pair are correlated.

Assumption: population is normally distributed

Take the difference in each pair and perform one-sample t-test.

Check normality

If the normality assumption is violated

T-test is not appropriate.

Use non-parametric alternative: Wilcoxon signed rank test


Notation for paired t-test

d = individual difference between the two values of a single matched pair

µd = mean value of the differences d for the population of paired data

$\bar{d}$ = mean value of the differences d for the paired sample data

$s_d$ = standard deviation of the differences d for the paired sample data

n = number of pairs


Example: Systolic Blood Pressure

OC: Oral contraceptive


Hypothesis: The group means are different

Ho: µd = 0 vs. H1: µd ≠ 0

Significance level: α = 0.05

Degrees of freedom (df): n − 1

Test statistic: $t = \bar{d} / (s_d / \sqrt{n})$

P-value: 0.009, thus reject Ho at α = 0.05

The data support the claim that oral contraceptives affect the systolic bp.
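A paired t-test reduces to a one-sample t-test on the within-pair differences. The sketch below uses hypothetical before/after readings, not the oral contraceptive data from the slide (which is not reproduced in this transcript), and the function name `paired_t` is illustrative.

```python
import math
import statistics

def paired_t(before, after):
    """Mean difference, sd of differences, t statistic and df for matched pairs."""
    d = [a - b for b, a in zip(before, after)]   # difference within each pair
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

# Hypothetical systolic blood pressure readings for five subjects
before = [120, 118, 130, 125, 110]
after = [124, 121, 129, 131, 115]

d_bar, s_d, t_value, df = paired_t(before, after)
print(d_bar, round(s_d, 3), round(t_value, 3), df)
```

The t statistic is then compared with the t distribution on n − 1 degrees of freedom to obtain the p-value.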


Confidence interval for matched pairs

100(1 − α)% CI: $\bar{d} \pm t_{\alpha/2,\, n-1} \; s_d / \sqrt{n}$

95% CI for the mean difference of the systolic bp:

(1.53, 8.07)


SAS Programming for Paired T-test

Data steps:

The same as for the two-sample t-test.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘Hypothesis Tests’

Click ‘Two-Sample Paired T-test for Means’

Select the ‘Group1’ and ‘Group2’ variables respectively

Click ‘OK’

(Note: You can also calculate the difference and use it as the dependent variable to run the one-sample t-test.)


Click ‘File’ to open the SAS data set.

Click ‘Statistics’ to select the statistical procedure.

Put the two group variables into ‘Group 1’ and ‘Group 2’



Comparison of more than two means:

ANOVA (Analysis of Variance)

One-way ANOVA: One factor, e.g., control, drug 1, drug 2

Two-way ANOVA: Two factors, e.g., drugs, age groups

Repeated measures: If there are repeated measurements within subjects, such as at multiple time points


Example: Pulmonary disease

Endpoint: Mid-expiratory flow (FEF) in L/s

6 groups: nonsmokers (NS), passive smokers (PS), noninhaling smokers (NI), light smokers (LS), moderate smokers (MS) and heavy smokers (HS)


Ho: group means are the same

H1: not all the group means are the same

P-value < 0.001

There is a significant difference in the mean FEF among the groups.

Comparison of specific groups: linear contrast

Multiple comparison: Bonferroni adjustment (α/n, where n is the number of comparisons)
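To make the ANOVA calculation concrete, here is a small sketch of the one-way F statistic and the Bonferroni-adjusted significance level. The data are hypothetical (the FEF measurements are not in this transcript), and the function names are illustrative.

```python
import math
import statistics

def one_way_anova_f(groups):
    """F statistic with (between, within) degrees of freedom for one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha / n_comparisons

f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # hypothetical data

# With 6 groups there are C(6, 2) = 15 pairwise comparisons
adjusted = bonferroni_alpha(0.05, math.comb(6, 2))
```

For the six smoking groups, each pairwise comparison would then be tested at roughly 0.0033 rather than 0.05.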


SAS Programming for One-Way ANOVA

Data steps:

The same as for the two-sample t-test.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘ANOVA’

Click ‘One-Way ANOVA’

Select the ‘Independent’ and ‘Dependent’ variables respectively

Click ‘OK’


Click ‘File’ to open the SAS data set.

Click ‘Solutions’ to select the statistical procedure.

Select the dependent and independent variables:


4. Inference for Proportions

Chi-square test

Testing difference of two proportions

n: #successes, p: success rate

Requirement: &

H0: p1 = p2

H1: p1 p2 (for two-sided test)

If the requirement is not satisfied, use Fisher’s exact test.
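A sketch of the chi-square test for two proportions on a 2×2 table, without continuity correction. Here `x1`, `x2` denote the numbers of successes and `n1`, `n2` the group sizes (my naming, not necessarily the slide's). The expected counts are returned so they can be checked against a rule of thumb such as "all expected counts at least 5", which is a common convention and may not be the exact requirement on the original slide.

```python
import math

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic, p-value (1 df) and expected counts."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, total - (x1 + x2)]
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    # For 1 df, chi2 is the square of a standard normal, so the
    # upper-tail probability equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected

# Hypothetical counts: 30/50 successes vs. 20/50 successes
chi2, p_value, expected = chi_square_2x2(30, 50, 20, 50)
small_cells = any(e < 5 for row in expected for e in row)   # if True, consider Fisher's exact test
```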

5. Power/Sample Size Calculation

Decide significance level (e.g., 0.05)

Decide desired power (e.g., 80%)

One-sided or two-sided test

Comparison of means: two-sample t-test

Need to know sample means in each group

Need to know sample sd’s in each group

Calculation: use software (nQuery, power, etc.)

Comparison of proportions: Chi-square test

Need to know sample proportions in each group

Continuity correction

Small sample size: Fisher’s exact test

Calculation: use software
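For the comparison of two means, a rough sample size can be obtained from the normal approximation n = 2σ²(z₁₋α/₂ + z_power)²/Δ² per group, where Δ is the anticipated difference in means and σ the common sd. This is only an approximation to what dedicated software reports (t-based calculations give slightly larger n), and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group sample size for a two-sample comparison of means."""
    z = NormalDist()   # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)

# Detecting a difference of one sd with 80% power at alpha = 0.05
n_two_sided = n_per_group(delta=1.0, sd=1.0)
n_one_sided = n_per_group(delta=1.0, sd=1.0, two_sided=False)
print(n_two_sided, n_one_sided)
```

Smaller anticipated differences, larger variability, or higher desired power all increase the required sample size.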

6. Correlation and Regression

Correlation

Pearson correlation for continuous variables

Spearman correlation for ranked variables

Chi-square test for categorical variables

Pearson correlation

Correlation coefficient (r): −1 ≤ r ≤ 1

Test for coefficient: t-test

Larger sample → more significant for the same value of the correlation coefficient

Thus it is not meaningful to judge by the magnitude of the correlation coefficient.

Judge the significance of the correlation by p-value
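The dependence of significance on sample size can be seen from the usual test statistic for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 df. The sketch below uses hypothetical values of r and n; the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r**2))

# The same r = 0.3 gives a much larger t statistic when n is larger
t_small = correlation_t(0.3, 20)
t_large = correlation_t(0.3, 200)
print(round(t_small, 2), round(t_large, 2))
```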


Regression

Objective

Find out whether a significant linear relationship exists between the response and independent variables

Use it to predict a future value

Notation

X: independent (predictor) variable

Y: dependent (response) variable

Multiple linear regression model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

where $\varepsilon$ is the random error

Checking the model (assumption)

Normality: q-q plot, histogram, Shapiro-Wilk test

Equal variance: the plot of predicted y vs. error (residual) forms a band of roughly constant width

Linear relationship: predicted y vs. each x
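As a minimal illustration of fitting the multiple linear regression model, the sketch below solves the least-squares normal equations with plain Python. The data are hypothetical and constructed so that y = 1 + 2·x1 + 3·x2 exactly. In practice a statistical package would be used, and the function names here are illustrative.

```python
def fit_linear_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    p = len(X[0])
    # Augmented system (X'X | X'y)
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

def r_squared(rows, y, coef):
    """Proportion of the variation in y explained by the fitted model."""
    fitted = [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]   # hypothetical (x1, x2) pairs
y = [9, 8, 19, 18, 29]                            # y = 1 + 2*x1 + 3*x2, no noise
coef = fit_linear_regression(rows, y)
r2 = r_squared(rows, y, coef)
```

The residuals (observed minus fitted y) from such a fit are what the normality and equal-variance checks are applied to.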


The regression equation is

The mean blood pressure increases by 1.08 if weight (x1) increases by one pound and age (x2) remains fixed. Similarly, a 1-year increase in age with the weight held fixed will increase the mean blood pressure by 0.425.

s=2.509 R2=95.8%

Error sd σ is estimated as 2.509 with df = 13 − 3 = 10

95.8% of the variation in y can be explained by the regression.


SAS Programming for Linear Regression

Data steps:

The same as for the two-sample t-test.

Procedure steps:

Click ‘Solutions’ → Click ‘Analysis’ → Click ‘Analyst’

Click ‘File’ → Click ‘Open By SAS Name’

Select the SAS data set and Click ‘OK’

Click ‘Statistics’ → Click ‘Regression’ → Click ‘Linear’

Select the ‘Dependent’ (Response) variable and the ‘Explanatory’ (Predictor) variable respectively

Click ‘OK’


Click ‘File’ to open the SAS data set.

Click ‘Solutions’ to select the statistical procedure.

Select the dependent and explanatory variables:



Other regression models

Polynomial regression

Transformation

Logistic regression
