Computational issues on statistical genetics
Download
1 / 27

Computational Issues on Statistical Genetics - PowerPoint PPT Presentation


  • 97 Views
  • Uploaded on

Computational Issues on Statistical Genetics. Research Question s Review the Literature. Test the power and robustness by c omputer simulation. Develop Method s. Database construction (Excel, Access) Translate d ata to analyzable form Preliminary results (figures, tables).

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Computational Issues on Statistical Genetics' - tobias


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Computational issues on statistical genetics
Computational Issues on Statistical Genetics

Research Questions

Review the Literature

Test the power and robustness by computer simulation

Develop Methods

Database construction (Excel, Access)

Translate data to analyzable form

Preliminary results (figures, tables)

Data Collection

Analyze Data

Program languages

Efficient, feasible

Write Reports/Papers

  • Graphics

    • Excel graphics

    • Programmable graphics


Program languages
Program Languages

  • Fortran, C, C++

  • Matrix language: MATLAB, S-Plus, R, SAS IML

  • Symbolic Calculation: Mathematika,Maple,Matlab

  • Interface Programming: dotnet, C#, Visual Basic

  • SAS, SPSS, BMDP

  • Database: Access, Excel, SQL, SAS, Oracle

  • MACRO

    • Excel, Access, PowerPoint, Word

    • Editor: WinEdt

    • SAS Macro


Two point analysis in f2 fully informative markers codominant
Two Point Analysis in F2Fully Informative Markers (codominant)

BB Bb bb

AA Obs n22 n21 n20

Freq ¼(1-r)2½r(1-r)¼r2

Recom. 0 1 2

Aa Obs n12 n11 n10

Freq ½r(1-r) ½(1-r)2+½r2½r(1-r)

Recom. 1 2r2/[(1-r)2+r2] 1

aa Obs n02 n01 n00

Freq ¼r2½r(1-r) ¼(1-r)2

Recom. 2 1 0


Computational issues on statistical genetics

EM algorithm to estimate the recombination fraction r:

  • Given r(0), For t=0,1, 2,…

  • Do While abs[r(t+1)-r(t)]>1.e-8

    • E-step: Calculate (t) = r(t)2/[(1-r(t))2+r(t)2] (expected the number of recombination events for the double heterozygote AaBb)

    • M-step: r(t+1)= 1/(2n)[2(n20+n02)+(n21+n12+n10+n01)+2(t)n11]


Two point analysis in f2 fully informative markers codominant1
Two Point Analysis in F2Fully Informative Markers (codominant)

Input:

Result:

BB

Bb

bb

AA

Aa

aa

n

r0

(t) = r(t)2/[(1-r(t))2+r(t)2]

r(t+1)= 1/(2n)[2(n20+n02)+(n21+n12+n10+n01)+2(t)n11]


Two point analysis in f2 fully informative markers codominant2
Two Point Analysis in F2Fully Informative Markers (codominant)

Matlab program to estimate recombinant r

function r=rEstF2(n22,n21,n20,n12,n11,n10,n02,n01,n00)

n=n22+n21+n20+n12+n11+n10+n02+n01+n00;

r=0.2; r1=-1;

while (abs(r1-r)>1.e-8)

r1=r;

%E-step

phi=r^2/((1-r)^2+r^2);

%M step

r=1/(2*n)*(2*(n20+n02)+(n21+n12+n10+n01)+2*phi*n11);

end


Log likelihood ratio test statistic
Log-likelihood ratio test statistic

Two alternative hypotheses

H0: r = 0.5 vs. H1: r  0.5

Likelihood value under H1

L1(r|nij) = n!/(n22!...n00!)

[¼(1-r)2]n22+n00[¼r2]n20+n02[½r(1-r)]n21+n12+n10+n01[½(1-r)2+½r2]n11

Likelihood value under H0

L0(r=0.5|nij) = n!/(n22!...n00!)

[¼(1-0.5)2]n22+n00[¼0.52]n20+n02[½0.5(1-0.5)]n21+n12+n10+n01[½(1-0.5)2+½0.52]n11

LOD = log10[L1(r|nij)/L0(r=0.5|nij)]

= {(n22+n00)2[log10(1-r)-log10(1-0.5)+…} = 6.08 > critical LOD=3


Two point analysis in f2 fully informative markers codominant3
Two Point Analysis in F2Fully Informative Markers (codominant)

Matlab program to calculate log likelihood test score (LOD)

function LOD=calcLOD_F2(r,n22,n21,n20,n12,n11,n10,n02,n01,n00)

%%log likelihood under H1

LOD=(n22+n00)*log10((1-r)^2/4)...

+(n20+n02)*log10(r^2/4)...

+(n21+n12+n10+n01)*log10(r*(1-r)/2)...

+n11*log10((1-r)^2/2+r^2/2);

%%log likelihood under H0

r=0.5;

LOD0=(n22+n00)*log10((1-r)^2/4)...

+(n20+n02)*log10(r^2/4)...

+(n21+n12+n10+n01)*log10(r*(1-r)/2)...

+n11*log10((1-r)^2/2+r^2/2);

LOD=LOD-LOD0;


Two point analysis in f2 partial informative markers codominant x dominant
Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

BB Bb bb

AA Obs n22 n21 n20

Freq ¼(1-r)2 ½r(1-r)¼r2

Recom. 0 1 2

Aa Obs n12 n11 n10

Freq ½r(1-r) ½(1-r)2+½r2½r(1-r)

Recom. 1 2r2/[(1-r)2+r2] 1

aa Obs n02 n01 n00

Freq ¼r2½r(1-r) ¼(1-r)2

Recom. 2 1 0


Two point analysis in f2 partial informative markers codominant x dominant1
Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

B_ bb

AA Obs n2_ =n22+n21 n20

Freq ¼(1-r)2+½r(1-r) ¼r2

Recom. C1= ½r(1-r)/[¼(1-r)2+½r(1-r)] 2

Aa Obs n1_ =n12+n11 n10

Freq ½r(1-r)+½(1-r)2+½r2½r(1-r)

Recom. C2=[½r(1-r) +r2]/[½r(1-r)+½(1-r)2+½r2] 1

aa Obs n0_ =n02+n01 n00

Freq ¼r2+½r(1-r) ¼(1-r)2

Recom. C3=[2* ¼r2+½r(1-r)]/[¼r2+½r(1-r)] 0

Estimate of r=(c1* n2_ +c2* n1_ +c3* n0_+2* n20 + n00)/(2n)


Two point analysis in f2 partial informative markers codominant x dominant2
Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

E-Step

C1= ½r(1-r)/[¼(1-r)2+½r(1-r)]

C2=[½r(1-r) +r2]/[½r(1-r)+½(1-r)2+½r2]

C3=[2* ¼r2+½r(1-r)]/[¼r2+½r(1-r)]

M-Step

r=(c1* n2_ +c2* n1_ +c3* n0_+2* n20 + n00)/(2n)


Two point analysis in f2 partial informative markers codominant x dominant3
Two Point Analysis in F2Partial Informative Markers (codominant X dominant)

Input:

Result:

B_

bb

AA

Aa

aa

n

r0


Two point analysis in f2 partial informative markers co dominant x dominant
Two Point Analysis in F2Partial Informative Markers (co dominant X dominant)

Matlab program to estimate recombinant r

function r=rEstF2CoXdomin(n2_,n1_,n0_,n20,n10,n00)

n=n2_+n1_+n0_+n20+n10+n00;

r=0.2;r1=-1;

while(abs(r1-r)>1.e-8)

r1=r;

%E-step

c1= 1/2*r*(1-r)/[1/4*(1-r)^2+ 1/2*r*(1-r)];

c2=[1/2*r*(1-r)+r^2]/[1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2];

c3=[2*1/4*r^2+1/2*r*(1-r)]/[1/4*r^2+1/2*r*(1-r)];

%M-step

r=(c1*n2_+c2* n1_ +c3* n0_+2* n20 + n00)/(2*n);

end


Two point analysis in f2 partial informative markers co dominant x dominant1
Two Point Analysis in F2Partial Informative Markers (co dominant X dominant)

Matlab program to calculate log likelihood test score (LOD)

function LOD=calcLOD_F2CoXdomin(r, n2_,n1_,n0_,n20,n10,n00)

%%log likelihood under H1

LOD=log([1/4*(1-r)^2+ 1/2*r*(1-r)])*n2_ ...

+log([1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2])*n1_ ...

+log([1/4*r^2+1/2*r*(1-r)])*n0_ ...

+log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;

%%log likelihood under H0

r=0.5;

LOD0=log([1/4*(1-r)^2+ 1/2*r*(1-r)])*n2_ ...

+log([1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2])*n1_ ...

+log([1/4*r^2+1/2*r*(1-r)])*n0_ ...

+log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;

LOD=LOD-LOD0;

LOD=LOD/log(10);


Two point analysis in f2 partial informative markers dominant
Two Point Analysis in F2Partial Informative Markers (dominant)

BB Bb bb

AA Obs n22 n21 n20

Freq ¼(1-r)2½r(1-r)¼r2

Recom. 0 1 2

Aa Obs n12 n11 n10

Freq ½r(1-r) ½(1-r)2+½r2½r(1-r)

Recom. 1 2r2/[(1-r)2+r2] 1

aa Obs n02 n01 n00

Freq ¼r2½r(1-r) ¼(1-r)2

Recom. 2 1 0


Two point analysis in f2 partial informative markers dominant1
Two Point Analysis in F2Partial Informative Markers (dominant)

B_ bb

A_ Obs n1 =n22+n21 +n12 +n11 n2=n20 +n10

Freq ¼(1-r)2 +r(1-r)+½(1-r)2+½r2¼r2

Recom. c1 c2

aa Obs n3=n02+n01 n4=n00

Freq ¼r2 +½r(1-r) ¼(1-r)2

Recom. C2= (2(¼r2 )+½r(1-r))0

/(¼r2 +½r(1-r))

where C1=[r2+r(1-r)]/[ ¼(1-r)2 +r(1-r)+½(1-r)2+½r2],expected

number of recombinant gametes

Estimate of r=(c1* n1 +c2* n2 +c2* n3)/(2n)


Two point analysis in f2 fully informative markers codominant4
Two Point Analysis in F2Fully Informative Markers (codominant)

Input:

Result:

B_

bb

A_

aa

n

r0

C1=[r2+r(1-r)]/[ ¼(1-r)2 +r(1-r)+½(1-r)2+½r2],

C2= (2(¼r2 )+½r(1-r)) /(¼r2 +½r(1-r))

Estimate of r=(c1* n1 +c2* n2 +c2* n3)/(2n)


Two point analysis in f2 partial informative markers dominant2
Two Point Analysis in F2Partial Informative Markers (dominant)

Matlab program to estimate recombinant r

function r=rEstF2Partial(n1,n2,n3,n4)

n=n1+n2+n3+n4;

r=0.2;r1=-1;

while (abs(r1-r)>1.e-8)

r1=r;

%E-step

c1=(r^2+r*(1-r))/((1-r)^2/4+r*(1-r)+(1-r)^2/2+r^2/2);

c2=(r^2/2+r*(1-r)/2)/(r^2/4+r*(1-r)/2);

%M-step

r=1/(2*n)*(c1*n1+c2*n2+c2*n3);

end


Log likelihood ratio test statistic partial informative markers dominant
Log-likelihood ratio test statisticPartial Informative Markers (dominant)

Two alternative hypotheses

H0: r = 0.5 vs. H1: r  0.5

Likelihood value under H1

L1(r|nij) = n!/(n1!...n4!)

[3/4(1-r)2 +r(1-r)+½r2]n1[¼r2 +½r(1-r)]n2+n3[¼(1-r)2]n4

Likelihood value under H0

L0(r=0.5|nij) = n!/(n1!...n4!)

[3/4(1-.5)2 +.5(1-.5)+½.52]n1[¼.52 +½.5(1-.5)]n2+n3[¼(1-.5)2]n4

LOD = log10[L1(r|nij)/L0(r=0.5|nij)]

= 3.17 > critical LOD=3


Two point analysis in f2 partial informative markers dominant3
Two Point Analysis in F2Partial Informative Markers (dominant)

Matlab program to calculate log likelihood test score (LOD)

function LOD=calcLOD_F2Partial(r,n1,n2,n3,n4)

%%log likelihood under H1

LOD=(n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...

+(n2+n3)*log10(r^2/4+r*(1-r)/2)...

+(n4)*log10((1-r)^2/4);

%%log likelihood under H0

r=0.5;

LOD0=(n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...

+(n2+n3)*log10(r^2/4+r*(1-r)/2)...

+(n4)*log10((1-r)^2/4);

LOD=LOD-LOD0;



Computational issues on statistical genetics

chrom1

chrom2

chrom3

chrom4

RG218

RG437

RG104

RG472

13.0

7.7

8.1

RZ262

RG348

RG544

8.6

19.2

5.3

13.2

RG190

RG246

RZ329

12.6

RG171

6.9

16.1

22.2

RZ892

RG908

9.8

13.7

K5

RG157

RG91

4.8

2.8

RG100

3.2

U10

RG449

4.7

27.4

RG532

17.5

RG191

16.1

RZ318

RG788

15.3

6.3

RZ678

8.4

W1

RZ565

15.5

RG173

16.8

29.3

Pall

41.6

RZ675

Amy1B

15.0

21.4

RZ58

RZ574

3.8

RZ276

10.2

RG163

CDO686

37.1

3.3

RG146

8.8

RZ284

28.2

Amy1A/C

RZ590

34.3

15.6

12.8

2.7

RG345

RZ394

RG214

2.5

RG95

8.4

18.5

12.2

RG381

pRD10A

RG143

23.5

RG654

5.1

2.5

5.9

RZ19

RZ403

10.0

RG620

8.2

RG256

RG690

5.0

5.4

RG179

28.6

13.2

RZ730

13.1

RZ213

CDO337

1.9

RZ123

RZ337A

33.1

22.5

RZ801

RZ448

RG520

15.0

2.6

RG810

RZ519

RG331

32.1

9.2

Pgi -1

7.1

CDO87

9.2

RG910

17.9

RG418A


Three point analysis in backcross
Three Point Analysis in Backcross

Summarized the data as


Rice data
Rice Data

Marker RG472 denoted by A, RG246 by B, K5 by C


Multilocus likelihood determination of a most likely gene order
Multilocus likelihood – determination of a most likely gene order

  • Consider three markers A, B, C, with no particular order assumed.

  • A triply heterozygous F1 ABC/abc backcrossed to a pure parent abc/abc

    Genotype ABC or abc ABc or abC Abc or aBC AbC or aBc

    Obs. n00 =69 n01=12 n10=16 n11=3

    Frequency under

    Order A-B-C (1-rAB)(1- rBC) (1-rAB) rBC rAB(1- rBC) rAB rBC

    Order A-C-B (1-rAC)(1- rBC) rAC rBC rAC(1-rBC)(1-rAC)rBC

    Order B-A-C (1-rAB)(1- rAC) (1-rAB) rAC rABrAC rAB(1-rAC)

    rAB = the recombination fraction between A and B= (n10 + n11)/n=0.19

    rBC = the recombination fraction between B and C=(n01 + n11)/n=0.15

    rAC = the recombination fraction between A and C=(n01 + n10)/n=0.28


Computational issues on statistical genetics

What order is the mostly likely?

LABC (1-rAB)n00+n01 (1-rBC)n00+n10 (rAB)n10+n11 (rBC)n01+n11

LACB (1-rAC)n00+n11 (1-rBC)n00+n10 (rAC)n01+n10 (rBC)n01+n11

LBAC (1-rAB)n00+n01 (1-rAC)n00+n11 (rAB)n10+n11 (rAC)n01+n10

Log(LABC) = -90.8932

Loo(LACB) = -101.5662

Log(LBAC) = -107.9176

According to the maximum likelihood principle, the linkage order that gives the maximum likelihood for a data set is the best linkage order supported by the data.

the best linkage order

A B C

20cM 15cM


Computational issues on statistical genetics

DATA

Genotype ABC or abc ABc or abC Abc or aBC AbC or aBc

Obs. n00 =69 n01=12 n10=16 n11=3

Result:

rAB = =0.19

rBC = =0.15

rAC = =0.28

dAB=1/4*ln[(1+2 rAB)/(1-2 rAB)]=20

dBC=1/4*ln[(1+2 rBC)/(1-2 rBC)]=15

Log(LABC) = -90.8932

Loo(LACB) = -101.5662

Log(LBAC) = -107.9176

the best linkage order

A B C

20cM 15cM