1 / 27

# Computational Issues on Statistical Genetics - PowerPoint PPT Presentation

Computational Issues on Statistical Genetics. Research Question s Review the Literature. Test the power and robustness by c omputer simulation. Develop Method s. Database construction (Excel, Access) Translate d ata to analyzable form Preliminary results (figures, tables).

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Computational Issues on Statistical Genetics' - tobias

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Research Questions

Review the Literature

Test the power and robustness by computer simulation

Develop Methods

Database construction (Excel, Access)

Translate data to analyzable form

Preliminary results (figures, tables)

Data Collection

Analyze Data

Program languages

Efficient, feasible

Write Reports/Papers

• Graphics

• Excel graphics

• Programmable graphics

• Fortran, C, C++

• Matrix language: MATLAB, S-Plus, R, SAS IML

• Symbolic Calculation: Mathematika,Maple,Matlab

• Interface Programming: dotnet, C#, Visual Basic

• SAS, SPSS, BMDP

• Database: Access, Excel, SQL, SAS, Oracle

• MACRO

• Excel, Access, PowerPoint, Word

• Editor: WinEdt

• SAS Macro

Two Point Analysis in F2Fully Informative Markers (codominant)

BB Bb bb

AA Obs n22 n21 n20

Freq ¼(1-r)2½r(1-r)¼r2

Recom. 0 1 2

Aa Obs n12 n11 n10

Freq ½r(1-r) ½(1-r)2+½r2½r(1-r)

Recom. 1 2r2/[(1-r)2+r2] 1

aa Obs n02 n01 n00

Freq ¼r2½r(1-r) ¼(1-r)2

Recom. 2 1 0

• Given r(0), For t=0,1, 2,…

• Do While abs[r(t+1)-r(t)]>1.e-8

• E-step: Calculate (t) = r(t)2/[(1-r(t))2+r(t)2] (expected the number of recombination events for the double heterozygote AaBb)

• M-step: r(t+1)= 1/(2n)[2(n20+n02)+(n21+n12+n10+n01)+2(t)n11]

Two Point Analysis in F2Fully Informative Markers (codominant)

Input:

Result:

BB

Bb

bb

AA

Aa

aa

n

r0

(t) = r(t)2/[(1-r(t))2+r(t)2]

r(t+1)= 1/(2n)[2(n20+n02)+(n21+n12+n10+n01)+2(t)n11]

Two Point Analysis in F2Fully Informative Markers (codominant)

Matlab program to estimate recombinant r

function r=rEstF2(n22,n21,n20,n12,n11,n10,n02,n01,n00)

n=n22+n21+n20+n12+n11+n10+n02+n01+n00;

r=0.2; r1=-1;

while (abs(r1-r)>1.e-8)

r1=r;

%E-step

phi=r^2/((1-r)^2+r^2);

%M step

r=1/(2*n)*(2*(n20+n02)+(n21+n12+n10+n01)+2*phi*n11);

end

Two alternative hypotheses

H0: r = 0.5 vs. H1: r  0.5

Likelihood value under H1

L1(r|nij) = n!/(n22!...n00!)

[¼(1-r)2]n22+n00[¼r2]n20+n02[½r(1-r)]n21+n12+n10+n01[½(1-r)2+½r2]n11

Likelihood value under H0

L0(r=0.5|nij) = n!/(n22!...n00!)

[¼(1-0.5)2]n22+n00[¼0.52]n20+n02[½0.5(1-0.5)]n21+n12+n10+n01[½(1-0.5)2+½0.52]n11

LOD = log10[L1(r|nij)/L0(r=0.5|nij)]

= {(n22+n00)2[log10(1-r)-log10(1-0.5)+…} = 6.08 > critical LOD=3

Two Point Analysis in F2Fully Informative Markers (codominant)

Matlab program to calculate log likelihood test score (LOD)

function LOD=calcLOD_F2(r,n22,n21,n20,n12,n11,n10,n02,n01,n00)

%%log likelihood under H1

LOD=(n22+n00)*log10((1-r)^2/4)...

+(n20+n02)*log10(r^2/4)...

+(n21+n12+n10+n01)*log10(r*(1-r)/2)...

+n11*log10((1-r)^2/2+r^2/2);

%%log likelihood under H0

r=0.5;

LOD0=(n22+n00)*log10((1-r)^2/4)...

+(n20+n02)*log10(r^2/4)...

+(n21+n12+n10+n01)*log10(r*(1-r)/2)...

+n11*log10((1-r)^2/2+r^2/2);

LOD=LOD-LOD0;

Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

BB Bb bb

AA Obs n22 n21 n20

Freq ¼(1-r)2 ½r(1-r)¼r2

Recom. 0 1 2

Aa Obs n12 n11 n10

Freq ½r(1-r) ½(1-r)2+½r2½r(1-r)

Recom. 1 2r2/[(1-r)2+r2] 1

aa Obs n02 n01 n00

Freq ¼r2½r(1-r) ¼(1-r)2

Recom. 2 1 0

Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

B_ bb

AA Obs n2_ =n22+n21 n20

Freq ¼(1-r)2+½r(1-r) ¼r2

Recom. C1= ½r(1-r)/[¼(1-r)2+½r(1-r)] 2

Aa Obs n1_ =n12+n11 n10

Freq ½r(1-r)+½(1-r)2+½r2½r(1-r)

Recom. C2=[½r(1-r) +r2]/[½r(1-r)+½(1-r)2+½r2] 1

aa Obs n0_ =n02+n01 n00

Freq ¼r2+½r(1-r) ¼(1-r)2

Recom. C3=[2* ¼r2+½r(1-r)]/[¼r2+½r(1-r)] 0

Estimate of r=(c1* n2_ +c2* n1_ +c3* n0_+2* n20 + n00)/(2n)

Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

E-Step

C1= ½r(1-r)/[¼(1-r)2+½r(1-r)]

C2=[½r(1-r) +r2]/[½r(1-r)+½(1-r)2+½r2]

C3=[2* ¼r2+½r(1-r)]/[¼r2+½r(1-r)]

M-Step

r=(c1* n2_ +c2* n1_ +c3* n0_+2* n20 + n00)/(2n)

Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

Input:

Result:

B_

bb

AA

Aa

aa

n

r0

Two Point Analysis in F2Partial Informative Markers (co dominant X dominant)

Matlab program to estimate recombinant r

function r=rEstF2CoXdomin(n2_,n1_,n0_,n20,n10,n00)

n=n2_+n1_+n0_+n20+n10+n00;

r=0.2;r1=-1;

while(abs(r1-r)>1.e-8)

r1=r;

%E-step

c1= 1/2*r*(1-r)/[1/4*(1-r)^2+ 1/2*r*(1-r)];

c2=[1/2*r*(1-r)+r^2]/[1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2];

c3=[2*1/4*r^2+1/2*r*(1-r)]/[1/4*r^2+1/2*r*(1-r)];

%M-step

r=(c1*n2_+c2* n1_ +c3* n0_+2* n20 + n00)/(2*n);

end

Two Point Analysis in F2Partial Informative Markers (co dominant X dominant)

Matlab program to calculate log likelihood test score (LOD)

function LOD=calcLOD_F2CoXdomin(r, n2_,n1_,n0_,n20,n10,n00)

%%log likelihood under H1

LOD=log([1/4*(1-r)^2+ 1/2*r*(1-r)])*n2_ ...

+log([1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2])*n1_ ...

+log([1/4*r^2+1/2*r*(1-r)])*n0_ ...

+log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;

%%log likelihood under H0

r=0.5;

LOD0=log([1/4*(1-r)^2+ 1/2*r*(1-r)])*n2_ ...

+log([1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2])*n1_ ...

+log([1/4*r^2+1/2*r*(1-r)])*n0_ ...

+log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;

LOD=LOD-LOD0;

LOD=LOD/log(10);

Two Point Analysis in F2Partial Informative Markers (dominant)

BB Bb bb

AA Obs n22 n21 n20

Freq ¼(1-r)2½r(1-r)¼r2

Recom. 0 1 2

Aa Obs n12 n11 n10

Freq ½r(1-r) ½(1-r)2+½r2½r(1-r)

Recom. 1 2r2/[(1-r)2+r2] 1

aa Obs n02 n01 n00

Freq ¼r2½r(1-r) ¼(1-r)2

Recom. 2 1 0

Two Point Analysis in F2Partial Informative Markers (dominant)

B_ bb

A_ Obs n1 =n22+n21 +n12 +n11 n2=n20 +n10

Freq ¼(1-r)2 +r(1-r)+½(1-r)2+½r2¼r2

Recom. c1 c2

aa Obs n3=n02+n01 n4=n00

Freq ¼r2 +½r(1-r) ¼(1-r)2

Recom. C2= (2(¼r2 )+½r(1-r))0

/(¼r2 +½r(1-r))

where C1=[r2+r(1-r)]/[ ¼(1-r)2 +r(1-r)+½(1-r)2+½r2],expected

number of recombinant gametes

Estimate of r=(c1* n1 +c2* n2 +c2* n3)/(2n)

Two Point Analysis in F2Fully Informative Markers (codominant)

Input:

Result:

B_

bb

A_

aa

n

r0

C1=[r2+r(1-r)]/[ ¼(1-r)2 +r(1-r)+½(1-r)2+½r2],

C2= (2(¼r2 )+½r(1-r)) /(¼r2 +½r(1-r))

Estimate of r=(c1* n1 +c2* n2 +c2* n3)/(2n)

Two Point Analysis in F2Partial Informative Markers (dominant)

Matlab program to estimate recombinant r

function r=rEstF2Partial(n1,n2,n3,n4)

n=n1+n2+n3+n4;

r=0.2;r1=-1;

while (abs(r1-r)>1.e-8)

r1=r;

%E-step

c1=(r^2+r*(1-r))/((1-r)^2/4+r*(1-r)+(1-r)^2/2+r^2/2);

c2=(r^2/2+r*(1-r)/2)/(r^2/4+r*(1-r)/2);

%M-step

r=1/(2*n)*(c1*n1+c2*n2+c2*n3);

end

Log-likelihood ratio test statisticPartial Informative Markers (dominant)

Two alternative hypotheses

H0: r = 0.5 vs. H1: r  0.5

Likelihood value under H1

L1(r|nij) = n!/(n1!...n4!)

[3/4(1-r)2 +r(1-r)+½r2]n1[¼r2 +½r(1-r)]n2+n3[¼(1-r)2]n4

Likelihood value under H0

L0(r=0.5|nij) = n!/(n1!...n4!)

[3/4(1-.5)2 +.5(1-.5)+½.52]n1[¼.52 +½.5(1-.5)]n2+n3[¼(1-.5)2]n4

LOD = log10[L1(r|nij)/L0(r=0.5|nij)]

= 3.17 > critical LOD=3

Two Point Analysis in F2Partial Informative Markers (dominant)

Matlab program to calculate log likelihood test score (LOD)

function LOD=calcLOD_F2Partial(r,n1,n2,n3,n4)

%%log likelihood under H1

LOD=(n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...

+(n2+n3)*log10(r^2/4+r*(1-r)/2)...

+(n4)*log10((1-r)^2/4);

%%log likelihood under H0

r=0.5;

LOD0=(n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...

+(n2+n3)*log10(r^2/4+r*(1-r)/2)...

+(n4)*log10((1-r)^2/4);

LOD=LOD-LOD0;

chrom2

chrom3

chrom4

RG218

RG437

RG104

RG472

13.0

7.7

8.1

RZ262

RG348

RG544

8.6

19.2

5.3

13.2

RG190

RG246

RZ329

12.6

RG171

6.9

16.1

22.2

RZ892

RG908

9.8

13.7

K5

RG157

RG91

4.8

2.8

RG100

3.2

U10

RG449

4.7

27.4

RG532

17.5

RG191

16.1

RZ318

RG788

15.3

6.3

RZ678

8.4

W1

RZ565

15.5

RG173

16.8

29.3

Pall

41.6

RZ675

Amy1B

15.0

21.4

RZ58

RZ574

3.8

RZ276

10.2

RG163

CDO686

37.1

3.3

RG146

8.8

RZ284

28.2

Amy1A/C

RZ590

34.3

15.6

12.8

2.7

RG345

RZ394

RG214

2.5

RG95

8.4

18.5

12.2

RG381

pRD10A

RG143

23.5

RG654

5.1

2.5

5.9

RZ19

RZ403

10.0

RG620

8.2

RG256

RG690

5.0

5.4

RG179

28.6

13.2

RZ730

13.1

RZ213

CDO337

1.9

RZ123

RZ337A

33.1

22.5

RZ801

RZ448

RG520

15.0

2.6

RG810

RZ519

RG331

32.1

9.2

Pgi -1

7.1

CDO87

9.2

RG910

17.9

RG418A

Summarized the data as

Marker RG472 denoted by A, RG246 by B, K5 by C

Multilocus likelihood – determination of a most likely gene order

• Consider three markers A, B, C, with no particular order assumed.

• A triply heterozygous F1 ABC/abc backcrossed to a pure parent abc/abc

Genotype ABC or abc ABc or abC Abc or aBC AbC or aBc

Obs. n00 =69 n01=12 n10=16 n11=3

Frequency under

Order A-B-C (1-rAB)(1- rBC) (1-rAB) rBC rAB(1- rBC) rAB rBC

Order A-C-B (1-rAC)(1- rBC) rAC rBC rAC(1-rBC)(1-rAC)rBC

Order B-A-C (1-rAB)(1- rAC) (1-rAB) rAC rABrAC rAB(1-rAC)

rAB = the recombination fraction between A and B= (n10 + n11)/n=0.19

rBC = the recombination fraction between B and C=(n01 + n11)/n=0.15

rAC = the recombination fraction between A and C=(n01 + n10)/n=0.28

LABC (1-rAB)n00+n01 (1-rBC)n00+n10 (rAB)n10+n11 (rBC)n01+n11

LACB (1-rAC)n00+n11 (1-rBC)n00+n10 (rAC)n01+n10 (rBC)n01+n11

LBAC (1-rAB)n00+n01 (1-rAC)n00+n11 (rAB)n10+n11 (rAC)n01+n10

Log(LABC) = -90.8932

Loo(LACB) = -101.5662

Log(LBAC) = -107.9176

According to the maximum likelihood principle, the linkage order that gives the maximum likelihood for a data set is the best linkage order supported by the data.

A B C

20cM 15cM

Genotype ABC or abc ABc or abC Abc or aBC AbC or aBc

Obs. n00 =69 n01=12 n10=16 n11=3

Result:

rAB = =0.19

rBC = =0.15

rAC = =0.28

dAB=1/4*ln[(1+2 rAB)/(1-2 rAB)]=20

dBC=1/4*ln[(1+2 rBC)/(1-2 rBC)]=15

Log(LABC) = -90.8932

Loo(LACB) = -101.5662

Log(LBAC) = -107.9176