Mathematical description of the relationship between two variables using regression

Lecture 13

BIOL2608 Biometrics


Regression Analysis

  • A simple mathematical expression to provide an estimate of one variable from another

  • It is possible to predict the likely outcome of events given sufficient quantitative knowledge of the processes involved


Regression models

(Figure: scatter of the dependent variable y against the independent variable x)

  • Model 1

    • “Controlled” parameter (independent variable) vs. measured parameter (dependent variable)

    • The independent variable (on the x-axis) must be measured with a high degree of accuracy and is not subject to random variation

    • Other influencing factors must be kept constant

    • The dependent variable (on the y-axis) may vary randomly, and its ‘error’ should follow a normal distribution


Normally distributed populations of y values

(Figure: normal curves of y values at each x along the regression line)

  • The population of y values at each x is normally distributed, and

  • The variances of the different populations of y values corresponding to different individual x values are similar


Regression models

(Figure: scatter of x1 against x2)

  • Model 2

    • Both parameters are measured (denoted x1 & x2 rather than x & y) and neither can be controlled

    • Both are subject to random variation & are called random-effects factors

    • Common in field studies, where conditions are difficult to control

    • Correlation, rather than regression, is appropriate here; it requires a bivariate normal distribution

    • e.g. measurements of human arm and leg lengths


Example for model 1

  • Study the rate of disappearance of a pesticide in a seawater sample

    • Time (independent) vs. concentration (dependent)

    • Other factors, such as pH and salinity, must be kept constant

  • Study the growth rate of fish at different fixed water temperatures

    • Temp (independent) vs. growth rate (dependent)

    • Other factors, such as diet and feeding frequency, must be kept constant


(Figure: a fitted line through the data points (x, y); the line rises by c over a run of d and crosses the y-axis at a)

Model: y = a + bx

Slope: coefficient b = c/d (the rise c over the run d)

Intercept: coefficient a


(Figure: regression lines with positive slope (b > 0), zero slope (b = 0), and negative slope (b < 0))


Wing lengths of 13 sparrows of various ages


The concept of least squares

The sum of the squared deviations, Σdi², measures how far the points fall from the regression line

The best-fit line is the one that minimizes this sum of squared deviations (Σdi²)

(Figure: vertical deviations d of the points from the fitted line)


Calculation for a regression

  • y = a + bx

  • b = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n]

  • a = Ȳ − bX̄

    b = [514.8 − (130)(44.4)/13] / [1562 − (130)²/13]

    b = 0.270 cm/day

    a = 3.415 − (0.270)(10) = 0.715 cm

    The simple linear regression equation is

    Ŷ = 0.715 + 0.270X
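A minimal Python sketch (not part of the original slides) that reproduces the slope and intercept from the summary sums given above:

```python
# Slope and intercept for the sparrow data, from the slide's summary sums.
n = 13
sum_x, sum_y = 130.0, 44.4
sum_xy, sum_x2 = 514.8, 1562.0

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # 0.270 cm/day
a = sum_y / n - b * sum_x / n                                 # 0.713 cm
# (The slide's 0.715 comes from rounding b to 0.270 before computing a.)
print(f"Y-hat = {a:.3f} + {b:.3f} X")
```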




Residuals

Residual = y − ŷ = y − (a + bx)

(Figure: residual plots showing positive (+), zero (0), and negative (−) residuals above, on, and below the fitted line)


Testing the significance of a regression

Source of variation            Sum of squares (SS)                              DF      Mean square (MS)
Total: Σ(Yi − Ȳ)²              ΣYi² − (ΣYi)²/n                                  n − 1
Linear regression: Σ(Ŷi − Ȳ)²  [ΣXiYi − (ΣXi)(ΣYi)/n]² / [ΣXi² − (ΣXi)²/n]     1       regression SS / regression DF
Residual: Σ(Yi − Ŷi)²          total SS − regression SS                         n − 2   residual SS / residual DF

Compute F = regression MS / residual MS and compare it with the critical Fα(1), 1, (n−2)

Coefficient of determination: r² = regression SS / total SS

r² indicates the proportion (or %) of the total variation in Y that is explained or accounted for by the fitted regression:

e.g. r² = 0.81, i.e. the regression can explain 81% of the total variation


ANOVA testing of H0: β = 0

Total SS = 171.3 − (44.4)²/13 = 19.66

Regression SS = [514.8 − (130)(44.4)/13]² / [1562 − (130)²/13] = 19.13

Residual SS = 19.66 − 19.13 = 0.53

Total DF = n − 1 = 12

Source of variation   SS      DF   MS      F       Critical F   P
Total                 19.66   12
Linear regression     19.13   1    19.13   401.1   4.84         <0.001
Residual              0.53    11   0.048 (= (SY·X)²)

Thus, reject H0. r² = 19.13/19.66 = 0.97
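A hedged Python sketch of this ANOVA, again working only from the slide's summary sums (ΣY² = 171.3 is given above); scipy supplies the p-value:

```python
from scipy import stats

n = 13
total_SS = 171.3 - 44.4 ** 2 / n                                  # 19.66
regr_SS = (514.8 - 130 * 44.4 / n) ** 2 / (1562 - 130 ** 2 / n)   # 19.13
resid_SS = total_SS - regr_SS                                     # 0.53
F = regr_SS / (resid_SS / (n - 2))   # about 398; the slide's 401.1 reflects rounding
p = stats.f.sf(F, 1, n - 2)          # p < 0.001, so reject H0
print(F, p)
```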


How to report the results in texts?

Source of variation   SS      DF   MS      F       Critical F   P
Total                 19.66   12
Linear regression     19.13   1    19.13   401.1   4.84         <0.001
Residual              0.53    11   0.048

Thus, reject H0. r² = 19.13/19.66 = 0.97

The wing length of the sparrow significantly increases with increasing age (F1, 11 = 401.1, r2 = 0.97, p < 0.001). 97% of the total variance in the wing length data can be explained by age.


Confidence intervals of the regression coefficients

  • Assuming that the slope b of a regression is normally distributed, it is possible to fit confidence intervals (CI) to the slope:

  • 95% CI = b ± tα(2), (n−2) · Sb

    where Sb = √[(SY·X)² / (ΣXi² − (ΣXi)²/n)]

  • CI for a predicted value: Ŷi ± tα(2), (n−2) · SŶi

    where SŶi = √{(SY·X)² [1/n + (Xi − X̄)² / (ΣXi² − (ΣXi)²/n)]}
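A minimal sketch of the slope CI, assuming the sparrow example's residual mean square (0.048) and Σx² = 262 from the earlier slides:

```python
import math
from scipy import stats

n = 13
b = 0.270
s2_yx = 0.048                       # residual mean square, (S_Y.X)^2
Sx2 = 262.0                         # sum(Xi^2) - (sum Xi)^2 / n
s_b = math.sqrt(s2_yx / Sx2)        # about 0.0135
t_crit = stats.t.ppf(0.975, n - 2)  # 2.201 = t_{0.05(2), 11}
print(f"95% CI for b: {b:.3f} ± {t_crit * s_b:.3f} cm/day")
```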


Standard errors of predicted values of Y (following the same example)

Ŷ = 0.715 + 0.270X, with X̄ = 10

If x = 13 days, then ŷ = 4.225 cm

  • SŶi = √{(SY·X)² [1/n + (Xi − X̄)² / (ΣXi² − (ΣXi)²/n)]}

    = √{(0.0477)[1/13 + (13 − 10)²/(1562 − 130²/13)]}

    = 0.073 cm

  • 95% CI for Ŷi = Ŷi ± t0.05(2), 11 · SŶi

    = 4.225 ± (2.201)(0.073)

    = 4.225 ± 0.161 cm
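The same prediction interval in Python (numbers taken from this slide; variable names are mine):

```python
import math

n, mean_x, Sx2 = 13, 10.0, 262.0
s2_yx = 0.0477                      # residual mean square
t_crit = 2.201                      # t_{0.05(2), 11}
x = 13.0
y_hat = 0.715 + 0.270 * x           # 4.225 cm
se = math.sqrt(s2_yx * (1 / n + (x - mean_x) ** 2 / Sx2))  # about 0.073 cm
print(f"{y_hat:.3f} ± {t_crit * se:.3f} cm")               # 4.225 ± 0.161 cm
```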


Inverse prediction (following the same example)

Ŷ = 0.715 + 0.270X and Ȳ = 3.415; if Yi = 4.5, then

X̂ = (Yi − 0.715)/0.270 = 14.019 days

To compute the 95% CI:

t0.05(2), 11 = 2.201

K = b² − t²Sb² = 0.270² − (2.201)²[(SY·X)² / (ΣXi² − (ΣXi)²/n)]

= 0.270² − (2.201)²[(0.0477)/(262)] = 0.0720

95% CI:

= X̄ + b(Yi − Ȳ)/K ± (t/K)√{(SY·X)²[(Yi − Ȳ)²/(ΣXi² − (ΣXi)²/n) + K(1 + 1/n)]}

= 14.069 ± (2.201/0.072)√{(0.0477)[(4.5 − 3.415)²/262 + 0.072(1 + 1/13)]}

= 14.069 ± 1.912 days

Lower limit = 12.157 days; upper limit = 15.981 days

Inverse prediction is important for toxicity tests
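A sketch of the inverse prediction and its CI, using the slide's values:

```python
import math

a, b = 0.715, 0.270
n, xbar, ybar, Sx2 = 13, 10.0, 3.415, 262.0
s2_yx, t = 0.0477, 2.201
y_i = 4.5

x_hat = (y_i - a) / b                          # 14.019 days
K = b ** 2 - t ** 2 * (s2_yx / Sx2)            # 0.0720
centre = xbar + b * (y_i - ybar) / K           # 14.069
half = (t / K) * math.sqrt(s2_yx * ((y_i - ybar) ** 2 / Sx2 + K * (1 + 1 / n)))
print(f"{centre:.3f} ± {half:.3f} days")       # 14.069 ± 1.912 days
```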


Regression with replication

See pp. 345–357, Example 17.8 (Zar, 1999)


Example 17.8

Total SS (DF = 20 − 1)

= 383346 − (2744)²/20 = 6869.2

Regression SS (DF = 1)

= [149240 − (1050)(2744)/20]² / [59100 − (1050)²/20] = (5180)²/3975 = 6750.29

Among-groups SS (DF = k − 1 = 5 − 1 = 4)

= 383228.73 − (2744)²/20 = 6751.93

Within-groups SS (DF = total − among-groups = 19 − 4 = 15)

= 6869.2 − 6751.93 = 117.27

Deviations-from-linearity SS (DF = among-groups − regression = 4 − 1 = 3)

= 6751.93 − 6750.29 = 1.64

b = [149240 − (1050)(2744)/20] / [59100 − (1050)²/20] = 5180/3975 = 1.303 mm Hg/yr

X̄ = 52.5; Ȳ = 137.2

a = 137.2 − (1.303)(52.5) = 68.79 mm Hg

Ŷ = 68.79 + 1.303X


H0: The population regression is linear.

Source of variation           SS        DF   MS     F      P
Total                         6869.2    19
Among groups                  6751.93   4
  Linear regression           6750.29   1
  Deviations from linearity   1.64      3    0.55   0.07   >0.25
Within groups                 117.27    15   7.82

Thus, accept H0.

H0: β = 0

Source of variation   SS        DF   MS        F        P
Total                 6869.2    19
Linear regression     6750.29   1    6750.29   1021.2   <0.001
Residual (pooled)     118.91    18   6.61

Thus, reject H0. r² = 6750.29/6869.2 = 0.98
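Both F tests can be reproduced from the SS and DF values above; a minimal Python sketch (scipy for the p-values):

```python
from scipy import stats

# SS and DF values from the slides above.
total_SS, total_DF = 6869.2, 19
among_SS, among_DF = 6751.93, 4
regr_SS, regr_DF = 6750.29, 1

within_SS, within_DF = total_SS - among_SS, total_DF - among_DF   # 117.27, 15
dev_SS, dev_DF = among_SS - regr_SS, among_DF - regr_DF           # 1.64, 3

# Test 1 -- H0: the population regression is linear
F_lin = (dev_SS / dev_DF) / (within_SS / within_DF)               # about 0.07
p_lin = stats.f.sf(F_lin, dev_DF, within_DF)                      # p > 0.25, accept H0

# Test 2 -- H0: beta = 0 (deviations pooled with within-groups)
resid_SS, resid_DF = dev_SS + within_SS, dev_DF + within_DF       # 118.91, 18
F_reg = (regr_SS / regr_DF) / (resid_SS / resid_DF)               # about 1021
p_reg = stats.f.sf(F_reg, regr_DF, resid_DF)                      # p < 0.001, reject H0
```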


Model 2 regression

  • The regression coefficient b′ (b prime) is obtained as Sx1/Sx2, the ratio of the two standard deviations

  • e.g. field observations of PCB (polychlorinated biphenyl) concentration per gram of fish, compared with the kidney biomass


b′ = −(1.2074/2.2033) = −0.548

(the negative sign is assigned by inspection of the scatter-graph)

a′ = X̄1 − b′X̄2

= 9.739 − (−0.548)(5.835) = 12.94

X̂1 = 12.94 − 0.548X2
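A hedged sketch of a Model 2 (geometric mean / reduced major axis) fit; the function is illustrative, and the final two lines reproduce the slide's summary-value calculation:

```python
import numpy as np

def model2_fit(x1, x2):
    """Model 2 coefficients: b' = s_x1 / s_x2, with the sign taken from the
    direction of the association (the slide assigns it by inspecting the
    scatter-graph)."""
    b = np.std(x1, ddof=1) / np.std(x2, ddof=1)
    if np.cov(x1, x2)[0, 1] < 0:
        b = -b
    a = np.mean(x1) - b * np.mean(x2)
    return a, b

# The slide's summary-value calculation:
b = -(1.2074 / 2.2033)        # -0.548 (negative by inspection of the scatter)
a = 9.739 - b * 5.835         # 12.94
```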


Key notes

  • There are two regression models

  • Model 1 regression is used when one of the variables is fixed, so that it is measured with negligible error

  • In model 1, the fixed variable is called the independent variable x and the dependent variable is designated y.

  • The regression equation is written y = a + bx where a and b are the regression coefficients

  • Model 2 regression is used where neither variable is fixed and both are measured with error.


Comparing simple linear regression equations & multiple regression analysis


Comparing Two Slopes

  • Use of Student's t

    t = (b1 − b2) / Sb1−b2

    Sb1−b2 = √[(SY·X)²p/(Σx²)1 + (SY·X)²p/(Σx²)2]

(Figure: three scatter-plots of Y against X with regression lines of different slopes)


Comparing Two Slopes

  • (SY·X)²p = [(residual SS)1 + (residual SS)2] / [(residual DF)1 + (residual DF)2]

  • Critical t: DF = n1 + n2 − 4

  • Test H0: equal slopes

(Figure: three scatter-plots of Y against X with regression lines of different slopes)


Comparing Two Slopes - Example

(Figure: two regression lines, b = 2.97 and b = 2.17)

Σx² = ΣXi² − (ΣXi)²/n

Σxy = ΣXiYi − (ΣXi)(ΣYi)/n

Σy² = ΣYi² − (ΣYi)²/n

  • For sample 1: temperature (°C)        For sample 2: volumes (ml)

  • Σx² = 1470.8712                       Σx² = 2272.4750

  • Σxy = 4363.1627                       Σxy = 4928.8100

  • Σy² = 13299.5296                      Σy² = 10964.0947

    n = 26                               n = 30

    b = Σxy / Σx²

    b = 4363.1627/1470.8712 = 2.97       b = 4928.8100/2272.4750 = 2.17

    Residual SS = RSS = Σy² − (Σxy)²/Σx²

    = 13299.5296 − (4363.1627)²/1470.8712   = 10964.0947 − (4928.81)²/2272.475

    = 356.7317                           = 273.9142

    Residual DF = RDF = n − 2:  26 − 2 = 24 and 30 − 2 = 28

    (SY·X)²p = (RSS1 + RSS2)/(RDF1 + RDF2)

    = (356.7317 + 273.9142)/(24 + 28) = 12.1278


(SY·X)²p = (RSS1 + RSS2)/(RDF1 + RDF2)

= (356.7317 + 273.9142)/(24 + 28) = 12.1278

Sb1−b2 = √[(SY·X)²p/(Σx²)1 + (SY·X)²p/(Σx²)2]

= √[(12.1278)/(1470.8712) + (12.1278)/(2272.4750)]

= 0.1165

t = (2.97 − 2.17) / 0.1165 = 6.867 (p < 0.001)

DF = 24 + 28 = 52; critical t0.05(2), 52 = 2.007

Reject H0.

Therefore, there is a significant difference between the two slopes (t = 6.867, DF = 52, p < 0.001)

(Figure: the two regression lines, b = 2.97 and b = 2.17)
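A minimal Python sketch of the two-slope t test from the summary statistics above (variable names are mine; scipy gives the p-value):

```python
import math
from scipy import stats

# Summary statistics from the example above.
b1, Sx2_1, RSS1, n1 = 2.97, 1470.8712, 356.7317, 26
b2, Sx2_2, RSS2, n2 = 2.17, 2272.4750, 273.9142, 30

s2_p = (RSS1 + RSS2) / ((n1 - 2) + (n2 - 2))      # pooled residual MS, 12.1278
se = math.sqrt(s2_p / Sx2_1 + s2_p / Sx2_2)       # 0.1165
t = (b1 - b2) / se                                # about 6.87
df = n1 + n2 - 4                                  # 52
p = 2 * stats.t.sf(abs(t), df)                    # p < 0.001, reject H0
```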


Testing for a difference between points on two nonparallel regression lines

(Figure: the two regression lines, b = 2.97 and b = 2.17, compared at X = 12)

We are testing whether the volumes (Y) differ between the two groups at X = 12. H0: same value.

Further, we need to know:

a1 = 10.57 and a2 = 24.91;

X̄1 = 22.93 and X̄2 = 18.95

Then, from Ŷ = a + bX:

Estimated Ŷ1 = 46.21; Ŷ2 = 50.95

SŶ1−Ŷ2 = √{(SY·X)²p [1/n1 + 1/n2 + (X − X̄1)²/(Σx²)1 + (X − X̄2)²/(Σx²)2]}

= √{(12.1278)[(1/26) + (1/30) + (12 − 22.93)²/(1470.8712) + (12 − 18.95)²/(2272.4750)]}

= 1.45 ml

t = (46.21 − 50.95) / 1.45 = −3.269 (0.001 < p < 0.002)

DF = 26 + 30 − 4 = 52; critical t0.05(2), 52 = 2.007

Reject H0.
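The same comparison at X = 12, sketched in Python with the slide's values:

```python
import math

# Comparing the predicted Y values of the two nonparallel lines at X = 12.
s2_p = 12.1278
n1, n2 = 26, 30
Sx2_1, Sx2_2 = 1470.8712, 2272.4750
xbar1, xbar2 = 22.93, 18.95
x = 12.0

y1 = 10.57 + 2.97 * x      # 46.21
y2 = 24.91 + 2.17 * x      # 50.95
se = math.sqrt(s2_p * (1 / n1 + 1 / n2
                       + (x - xbar1) ** 2 / Sx2_1
                       + (x - xbar2) ** 2 / Sx2_2))   # about 1.45 ml
t = (y1 - y2) / se         # about -3.27, DF = n1 + n2 - 4 = 52
```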


Comparing Two Elevations

  • If there is no significant difference between the slopes, you then compare the two elevations

(Figure: two parallel regression lines with different elevations)

For the common regression:

Sum of squares of X: Ac = (Σx²)1 + (Σx²)2

Sum of cross-products: Bc = (Σxy)1 + (Σxy)2

Sum of squares of Y: Cc = (Σy²)1 + (Σy²)2

Residual SSc = Cc − Bc²/Ac

Residual DFc = n1 + n2 − 3

Residual MS = SSc/DFc


Comparing Two Elevations

For the common regression:

Sum of squares of X: Ac = (Σx²)1 + (Σx²)2

Sum of cross-products: Bc = (Σxy)1 + (Σxy)2

Sum of squares of Y: Cc = (Σy²)1 + (Σy²)2

Residual SSc = Cc − Bc²/Ac

Residual DFc = n1 + n2 − 3

Residual MS = (SY·X)²c = SSc/DFc

bc = Bc/Ac

t = [(Ȳ1 − Ȳ2) − bc(X̄1 − X̄2)] / √{(SY·X)²c [1/n1 + 1/n2 + (X̄1 − X̄2)²/Ac]}

(Figure: two parallel regression lines with different elevations)


Comparing slopes and elevations: An example

For sample 1:                 For sample 2:

Σx² = 1012.1923               Σx² = 1659.4333

Σxy = 1585.3385               Σxy = 2475.4333

Σy² = 2618.3077               Σy² = 3848.9333

n = 13                        n = 15

X̄ = 54.65                     X̄ = 56.93

Ȳ = 170.23                    Ȳ = 162.93

b = Σxy / Σx²

b = 1.57                      b = 1.49

RSS = Σy² − (Σxy)²/Σx²

= 136.2230                    = 156.2449

RDF = n − 2 = 11              = 13

(SY·X)²p = (RSS1 + RSS2)/(RDF1 + RDF2) = 12.1862

Test H0: equal slopes

DF for the t test = 11 + 13 = 24

Sb1−b2 = 0.1392

t = (1.57 − 1.49)/0.1392 = 0.575 < critical t0.05(2), 24 = 2.064, p > 0.50

Accept H0

(Figure: the two regression lines)


Comparing slopes and elevations: An example

For sample 1:            For sample 2:

Σx² = 1012.1923          Σx² = 1659.4333      Ac = 2671.6256

Σxy = 1585.3385          Σxy = 2475.4333      Bc = 4060.7718

Σy² = 2618.3077          Σy² = 3848.9333      Cc = 6467.2410

n = 13                   n = 15

X̄ = 54.65                X̄ = 56.93

Ȳ = 170.23               Ȳ = 162.93

b = 1.57                 b = 1.49

bc = 4060.7718/2671.6256 = 1.520

SSc = 6467.2410 − (4060.7718)²/2671.6256 = 295.0185

DFc = 13 + 15 − 3 = 25

Residual MS = (SY·X)²c = SSc/DFc = 11.8007

Test H0: equal elevations

t = [(170.23 − 162.93) − 1.520(54.65 − 56.93)] / √{11.8007[1/13 + 1/15 + (54.65 − 56.93)²/2671.6256]} = 8.218

> critical t0.05(2), 25 = 2.060, p < 0.001

Reject H0

(Figure: the two regression lines)
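A sketch of the elevation test using the pooled quantities above:

```python
import math

# Pooled quantities from the slide above.
Ac = 1012.1923 + 1659.4333     # 2671.6256
Bc = 1585.3385 + 2475.4333     # 4060.7718
Cc = 2618.3077 + 3848.9333     # 6467.2410
n1, n2 = 13, 15

bc = Bc / Ac                                # 1.520
SSc = Cc - Bc ** 2 / Ac                     # 295.02
s2_c = SSc / (n1 + n2 - 3)                  # 11.80
num = (170.23 - 162.93) - bc * (54.65 - 56.93)
den = math.sqrt(s2_c * (1 / n1 + 1 / n2 + (54.65 - 56.93) ** 2 / Ac))
t = num / den                               # about 8.2, DF = 25; reject H0
```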


Comparing more than two slopes

  • See Section 18.4 (Zar, 1999) for details

  • You can also perform an Analysis of Covariance (ANCOVA) using SPSS


ANCOVA - example

  • The covariate MUST be continuous, on a ratio or interval scale (e.g. Temp)

  • The dependent variable(s) is/are dependent on the covariate (increasing or decreasing with it)

(Figure: regression lines for groups A and B)


ANCOVA - example

  • Plot the figure

  • Test H0: equal slopes

  • Then H0: equal elevations

(Figure: regression lines for groups A and B)


ANCOVA in SPSS

  • Heart beat as the dependent variable

  • Temp as the covariate

  • Species as a fixed factor

  • Test H0: equal slopes, using a model with an interaction term: Species × Temp

  • A significant interaction indicates different slopes

  • If the interaction is not significant, remove this term from the model: a significant Species effect then indicates different elevations (see the sketch below)

(Figure: regression lines for species A and B)
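Outside SPSS, the same two-step procedure can be sketched with Python's statsmodels; the data file and column names here are hypothetical placeholders:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# "heartbeat.csv" and the column names are hypothetical.
df = pd.read_csv("heartbeat.csv")   # columns: heartbeat, temp, species

# Step 1 -- H0: equal slopes, tested via the Species x Temp interaction
full = smf.ols("heartbeat ~ C(species) * temp", data=df).fit()
print(sm.stats.anova_lm(full))      # a significant interaction means different slopes

# Step 2 -- if the interaction is not significant, drop it and test elevations
reduced = smf.ols("heartbeat ~ C(species) + temp", data=df).fit()
print(sm.stats.anova_lm(reduced))   # a significant species effect means different elevations
```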


Double log transformation

(Figure: a curved plot of Y against X becomes linear when log Y is plotted against log X)

If Y = a + bx does not fit because the relationship is curvilinear, plot log Y against log X; the relationship is then linear:

log Y = k + m log X

log Y = log(10^k) + log X^m

log Y = log[(10^k)X^m]

Y = (10^k)X^m

Y = CX^m, where C = 10^k


Double log transformation

(Figure: weight vs. length is curved, but becomes linear on the log-log plot)

For the length-weight relationship: ln W = a + b ln L (natural logs can also be used)

W = (e^a)L^b

Similarly, for a physiological response (e.g. E = ammonia excretion rate) vs. body weight:

E = (e^a)W^(−b)
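A minimal sketch of fitting the length-weight power curve by regressing ln W on ln L; the length and weight values are hypothetical illustration data:

```python
import numpy as np

# Hypothetical length (cm) and weight (g) values for illustration only.
length = np.array([10.0, 14.0, 18.0, 25.0, 31.0])
weight = np.array([12.0, 33.0, 70.0, 190.0, 360.0])

# Fit ln W = a + b ln L; np.polyfit returns the slope first for degree 1.
b, a = np.polyfit(np.log(length), np.log(weight), 1)
print(f"W = {np.exp(a):.4f} * L^{b:.3f}")
```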


Multiple regression analysis

  • Y = a + bX

    Models:

  • Y = a + b1X1 + b2X2

  • Y = a + b1X1 + b2X2 + b3X3

  • Y = a + b1X1 + b2X2 + b3X3 + b4X4

  • Y = a + ΣbiXi

  • As in simple linear regression, there is only one effect or dependent variable (Y). However, there are several cause or independent variables, chosen by the experimenter.


Multiple regression analysis

  • The same assumptions as for linear regression apply, so each of the cause (independent) variables must be measured without error.

  • AND each of these cause variables must be independent of the others

  • If they are not independent, partial correlation should be used.


Multiple regression analysis

  • Works in exactly the same way as linear regression, only the best-fit line is made up of a separate slope for each of the cause variables;

  • There is a single intercept, which is the value of the effect variable when all cause variables are zero


Multiple regression analysis

  • A multiple regression with just two cause variables can be visualized as a plane in a 3D diagram

  • With any more cause variables, there is no way to display the relationships graphically

  • Can be done easily with SPSS; stepwise multiple regression analysis will help you determine the most important cause factor(s) (see the sketch below)
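A minimal least-squares sketch with two cause variables; the data are simulated purely for illustration:

```python
import numpy as np

# Simulated illustration data: Y depends on two cause variables X1 and X2.
rng = np.random.default_rng(1)
X1 = rng.uniform(0, 10, 30)
X2 = rng.uniform(0, 5, 30)
Y = 2.0 + 0.8 * X1 - 1.5 * X2 + rng.normal(0, 0.5, 30)

A = np.column_stack([np.ones_like(X1), X1, X2])   # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)      # least-squares fit
a, b1, b2 = coef
print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```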

