
A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection



Wei-Bung Wang and Tao Jiang



Outline

  • Introduction

  • Problem

  • Related Work

  • Our Approach

  • Result


Introduction

Single Nucleotide Polymorphism

  • Single Nucleotide Polymorphism (SNP)

    • A genetic variation

  • Example: the sequence C T T A G C T T occurs with frequency 94% and C T T A G T T T with frequency 6%; the position where they differ (C vs. T) is a SNP.

Modified from a slide by Yao-Ting Huang, National Taiwan University, Department of Computer Science and Information Engineering


SNPs

  • SNPs are usually bi-allelic

    • Major allele

    • Minor allele

  • Minor allele frequency > 1% (or 5%)

  • Tri-allelic: very rare
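
The frequency threshold above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names and the sample of 100 alleles are assumptions chosen to match the 94% / 6% example.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for the alleles observed at one site."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele (0.0 if only one allele is seen)."""
    freqs = sorted(allele_frequencies(alleles).values(), reverse=True)
    return freqs[1] if len(freqs) > 1 else 0.0


def is_common_snp(alleles, threshold=0.01):
    """Apply the 'minor allele frequency > 1% (or 5%)' rule of thumb."""
    return minor_allele_frequency(alleles) > threshold


# Hypothetical sample: 94 copies of the major allele C, 6 of the minor allele T.
site = ["C"] * 94 + ["T"] * 6
print(allele_frequencies(site))
print(is_common_snp(site), is_common_snp(site, threshold=0.05))
```

With this sample the site passes both the 1% and the 5% threshold, since its minor allele frequency is 6%.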


Haplotype

  • Haplotype 1: -A C T T T G C T C- gives CTC

  • Haplotype 2: -A C T T A G C T T- gives CAT

  • Haplotype 3: -A A T T T G C T C- gives ATC

  • SNP1, SNP2, and SNP3 are the three positions where the sequences differ; each haplotype is the combination of alleles at these SNPs.

Modified from a slide by Yao-Ting Huang, National Taiwan University, Department of Computer Science and Information Engineering
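
As a small illustration of the haplotype slide, the sketch below takes the three sequences shown there, finds the positions where they disagree, and reads off the haplotypes. The function names are illustrative, and the sequences are assumed to be already aligned.

```python
def snp_positions(sequences):
    """0-based indices at which the aligned sequences do not all agree."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


def haplotypes(sequences):
    """Alleles of each sequence restricted to the SNP positions."""
    positions = snp_positions(sequences)
    return ["".join(seq[i] for i in positions) for seq in sequences]


# The three sequences from the slide, without the flanking dashes and spaces.
sequences = ["ACTTTGCTC", "ACTTAGCTT", "AATTTGCTC"]
print(snp_positions(sequences))  # positions of SNP1, SNP2, SNP3
print(haplotypes(sequences))     # one haplotype string per sequence
```

The output reproduces the haplotypes CTC, CAT, and ATC from the slide.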


Tag SNP

  • What is a tag SNP?

  • The following examples use slides by Yao-Ting Huang and Kun-Mao Chao.


Examples of Tag SNPs

  • Suppose we wish to distinguish an unknown haplotype sample.

  • We can genotype all SNPs to identify the haplotype sample.

[Figure: haplotype patterns P1-P4 over SNP loci S1-S12, with major and minor alleles marked, next to an unknown haplotype sample]

Examples of Tag SNPs

  • In fact, it is not necessary to genotype all SNPs.

  • SNPs S3, S4, and S5 can form a set of tag SNPs.

[Figure: haplotype patterns P1-P4 over SNP loci S1-S12, with the rows S3, S4, and S5 extracted]

Examples of Wrong Tag SNPs

  • SNPs S1, S2, and S3 cannot form a set of tag SNPs because P1 and P4 would be ambiguous.

[Figure: haplotype patterns P1-P4 over SNP loci S1-S12, with the rows S1, S2, and S3 extracted]

Examples of Tag SNPs

  • SNPs S1 and S12 can form a set of tag SNPs.

  • This set of SNPs is the minimum solution in this example.

[Figure: haplotype patterns P1-P4 over SNP loci S1-S12, with the rows S1 and S12 extracted]

Problem

  • Tag SNP selection

    • How to select representatives?

    • Many different ways


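Flowchart: What We Do

  • Start from a group of individuals for whom all SNPs are known.

  • Select a set of SNPs (tag SNPs), together with the relationships between the tag SNPs and the other SNPs.

  • For a new sample, assay only the tag SNPs (save money here), giving the haplotype over the tag SNPs.

  • Use the relationships to recover the haplotype over all SNPs.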


Problem

  • Perfect world

    • Minimum set of tag SNPs

    • Saves the most money

    • NP-hard

  • Real life

    • Relatively small set

    • Sufficient accuracy/confidence


A very frequently used method is Linkage Disequilibrium (LD).


Related Work

Linkage Disequilibrium (LD)

  • Non-random association of alleles at two or more loci

  • Correlation coefficient: an estimate of dependency

  • LD = correlation coefficient = r^2


Linkage Disequilibrium

For two loci with alleles A/a and B/b, where P_X is the frequency of allele (or haplotype) X:

$$ r^2 = \frac{(P_{AB} - P_A P_B)^2}{P_A P_a P_B P_b}, \qquad r^2 \in [0, 1] $$

  • r^2 = 1: perfect correlation

  • r^2 = 0.9 (or 0.95, etc.): strong correlation

  • r^2 = 0: no correlation
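
A minimal sketch of the pairwise computation, using the standard definition r^2 = (P_AB - P_A P_B)^2 / (P_A P_a P_B P_b) on phased haplotypes. The 0/1 encoding and the function name are assumptions made for illustration; returning 0.0 for a monomorphic locus is a convention chosen here, since r^2 is undefined in that case.

```python
def r_squared(x, y):
    """Pairwise r^2 between two bi-allelic loci.

    x and y hold one entry per haplotype: 1 for allele A (or B), 0 for a (or b).
    """
    n = len(x)
    p_a = sum(x) / n                                   # P_A
    p_b = sum(y) / n                                   # P_B
    p_ab = sum(1 for a, b in zip(x, y) if a and b) / n  # P_AB
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)          # P_A P_a P_B P_b
    if denom == 0:
        return 0.0  # assumed convention: a monomorphic locus carries no signal
    return (p_ab - p_a * p_b) ** 2 / denom


print(r_squared([1, 1, 0, 0], [1, 1, 0, 0]))  # identical loci
print(r_squared([1, 1, 0, 0], [1, 0, 1, 0]))  # independent loci
print(r_squared([1, 1, 1, 0], [1, 1, 0, 0]))  # partial correlation
```

Identical loci give r^2 = 1 and independent loci give r^2 = 0, matching the two extremes listed above.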


An Example

[Figure: example with alleles A, T, and C]


Minimum Dominating Set Problem

[Figure: graph with one node per SNP; an edge joins two SNPs that are highly correlated]
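
With single-marker LD, tag SNP selection can be viewed as picking a dominating set in the graph of highly correlated SNPs: every SNP must be chosen or adjacent to a chosen one. Finding a minimum dominating set is NP-hard in general, so the sketch below uses the common greedy heuristic. It is an illustration only, not the method of any program named in these slides, and the small graph is made up.

```python
def greedy_dominating_set(neighbors):
    """neighbors maps each SNP to the set of SNPs highly correlated with it."""
    uncovered = set(neighbors)
    chosen = []
    while uncovered:
        # Take the SNP whose closed neighborhood covers the most uncovered SNPs.
        best = max(
            sorted(neighbors),
            key=lambda s: len(({s} | neighbors[s]) & uncovered),
        )
        chosen.append(best)
        uncovered -= {best} | neighbors[best]
    return chosen


# Hypothetical LD graph (symmetric adjacency).
graph = {
    "s1": {"s2", "s3"},
    "s2": {"s1"},
    "s3": {"s1"},
    "s4": {"s5"},
    "s5": {"s4"},
    "s6": set(),
}
print(greedy_dominating_set(graph))
```

An isolated SNP such as `s6` must always be chosen, because no other SNP can represent it.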


Examples of Tag SNPs

  • Suppose we wish to distinguish an unknown haplotype sample.

[Figure: haplotype patterns P1-P4 over SNP loci S1-S12, with major and minor alleles marked, next to an unknown haplotype sample]


Examples of Tag SNPs

  • SNPs S1 and S12 can form a set of tag SNPs.

  • This set of SNPs is the minimum solution in this example.

  • SNPs can work together and help each other

[Figure: haplotype patterns P1-P4 over SNP loci S1-S12, with the rows S1 and S12 extracted]

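Our Approach

  • We introduce new alleles: AC and not-AC

  • Prediction rule: AC gives G, else T

  • Only one mistake

[Table: joint frequencies of the composite alleles (AC, not-AC) and the target alleles (T, G); row T: 0.7, 0, total 0.7; row G: 0.1, 0.2, total 0.3; column totals 0.8, 0.2]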


Our Approach

  • (snp1, snp2) vs. snp3

  • (snp1, snp2) vs. snp4

[Figure: allele combinations of (snp1, snp2) grouped for each target SNP]


In the Right Order

[Figure: flowchart from a group of individuals (all SNPs are known) to a set of tag SNPs and the relationships between tag SNPs and other SNPs; the relationships are generated first, the tag SNPs are selected second]


Our Approach

  • Generate relationships …………

    • If SNPs 1, 4, and 10 are tag SNPs, predict SNP 17 with patterns …; accuracy / LD: 0.97

    • If SNPs 5, 8, and 13 are tag SNPs, predict SNP 11 with patterns …; accuracy / LD: 0.62


How to Predict / Determine the Alleles?

  • LD: (tag) SNP 1, 2, 3 vs. SNP 4

  • Alleles A/a, B/b, C/c, D/d

  • The eight allele combinations of SNPs 1-3 (abc, abC, Abc, aBc, AbC, aBC, ABC, ABc) are split into a major bucket and a minor bucket

  • SNP[123] becomes bi-allelic

[Equations: inequalities on haplotype frequencies P(·) involving D and d that decide the bucket of each combination]
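
The bucket idea can be sketched as follows: group the allele combinations of the tag SNPs into two buckets so that the composite marker is bi-allelic, then compute r^2 between the composite marker and the target SNP as usual. The bucketing rule used here (a combination goes to the minor bucket when the target's minor allele is over-represented among haplotypes carrying it) is an assumption made for illustration and is not claimed to be the rule from the slides. The data and all names are hypothetical.

```python
from collections import Counter


def r_squared(x, y):
    """Pairwise r^2 between two 0/1-encoded loci (0.0 if either is monomorphic)."""
    n = len(x)
    p_a, p_b = sum(x) / n, sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom


def bucket_combinations(tag_haps, target):
    """Split tag-SNP allele combinations into a minor and a major bucket.

    Assumed rule: a combination joins the minor bucket if the share of
    target-minor haplotypes among its carriers exceeds the overall share.
    """
    p_minor = sum(target) / len(target)
    seen = Counter(tag_haps)
    with_minor = Counter(h for h, t in zip(tag_haps, target) if t == 1)
    minor_bucket = {h for h in seen if with_minor[h] / seen[h] > p_minor}
    composite = [1 if h in minor_bucket else 0 for h in tag_haps]
    return composite, minor_bucket


# Hypothetical haplotypes over (snp1, snp2) and a target SNP (1 = G, 0 = T).
tag_haps = [("A", "C")] * 2 + [("A", "T")] * 4 + [("G", "C")] * 4
target = [1, 1] + [0, 0, 0, 1] + [0, 0, 0, 0]

composite, minor_bucket = bucket_combinations(tag_haps, target)
snp1 = [1 if h[0] == "A" else 0 for h in tag_haps]
snp2 = [1 if h[1] == "C" else 0 for h in tag_haps]
print(minor_bucket)
print(r_squared(composite, target), r_squared(snp1, target), r_squared(snp2, target))
```

On this toy data the composite marker (AC vs. everything else) correlates with the target more strongly than either SNP alone, which is the effect the multi-marker model is meant to capture.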


Similar Work

  • Ke Hao did similar work

    • The same LD model

    • Different way to determine alleles for composite SNPs

      • Less flexibility

      • A special case of our model

    • Related paper: “Genome-wide selection of tag SNPs using multiple-marker correlation,” Bioinformatics, 2007


Sketch

  • Compute the r^2 value for all possible combinations

  • Find a small subset of SNPs according to LD


Sketch

  • Find a small subset of SNPs according to LD

[Figure: SNPs marked as tag SNPs, covered, or partially covered; tag SNPs are also covered by themselves]


Sketch

  • Simple greedy algorithm (Ke Hao)

    • Cover more SNPs in each iteration

  • Modified greedy algorithm (my work)

    • A SNP that can’t be covered by others

      • High priority

    • A SNP that is not picked but covered

      • OK

    • Break ties by partial cover
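
The bullets above can be sketched as a greedy loop over multi-marker relationships. This is an illustrative reading of the slide, not the MMTagger implementation: the representation of a relationship as (set of predictor SNPs, target SNP), the scoring tuple, and the toy input are all assumptions.

```python
def covered_by(picked, relationships):
    """SNPs covered by `picked`: the picked SNPs themselves, plus every target
    whose predictors have all been picked."""
    covered = set(picked)
    for predictors, target in relationships:
        if predictors <= picked:
            covered.add(target)
    return covered


def greedy_tag_snps(snps, relationships):
    """Greedy selection with priority for SNPs nothing else can cover."""
    all_snps = set(snps)
    targets = {t for _, t in relationships}
    picked, covered = set(), set()
    while covered != all_snps:
        uncovered = all_snps - covered
        forced = sorted(uncovered - targets)  # cannot be covered by others
        if forced:
            choice = forced[0]
        else:
            def score(s):
                gain = len(covered_by(picked | {s}, relationships) - covered)
                # Tie-break: relationships s takes part in with uncovered targets.
                partial = sum(1 for p, t in relationships if s in p and t in uncovered)
                return (gain, partial)
            choice = max(sorted(all_snps - picked), key=score)
        picked.add(choice)
        covered = covered_by(picked, relationships)
    return picked


# Hypothetical relationships that passed the r^2 threshold.
relationships = [
    (frozenset({"s1", "s2"}), "s3"),
    (frozenset({"s1", "s2"}), "s4"),
    (frozenset({"s1"}), "s2"),
]
print(sorted(greedy_tag_snps(["s1", "s2", "s3", "s4", "s5"], relationships)))
```

Note that a SNP that is covered but not picked cannot serve as a predictor here, because only picked SNPs are actually assayed; in the toy input `s2` ends up picked even though `s1` already covers it.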


Supersede

[Figure: illustration of superseding; callout: "No longer contributes"]


My Program: MMTagger

Algorithm 1: MMTagger (steps recoverable from the slide)

  • While there are uncovered SNPs:

    • If there is a SNP with no incoming edges, select it.

    • Otherwise, select the SNP that covers the most uncovered SNPs.

    • Put the selected SNP into the tag SNP set.

    • Put each SNP that is now covered into the covered set, and remove it and its corresponding edges.

Slide callouts: "Pick a SNP", "Data structure"


Complexity

  • Computing r^2 values

    • O(n^{k+1}) for k-marker

  • Picking tag SNPs

    • Ω(T), where T is the number of relationships

    • O(T log T) time algorithm
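
The O(n^{k+1}) bound can be made concrete by counting candidates. The sketch below assumes that a k-marker relationship consists of k tag SNPs predicting one further SNP, which gives C(n, k) · (n − k) candidates, a quantity bounded by n^{k+1}. This reading of "k-marker" and the function names are assumptions.

```python
from itertools import combinations
from math import comb


def count_relationships(n, k):
    """Number of (k tag SNPs, 1 target SNP) candidates among n SNPs."""
    return comb(n, k) * (n - k)


def enumerate_relationships(snps, k):
    """Yield every candidate (predictors, target) pair explicitly."""
    for predictors in combinations(snps, k):
        for target in snps:
            if target not in predictors:
                yield predictors, target


for k in (1, 2, 3):
    print(k, count_relationships(1000, k))
```

For n = 1000 the count grows by roughly a factor of n/k with each additional marker, which is consistent with the conclusion that four or more markers are too slow.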


Result

  • Our program: MMTagger

  • Vs. Single-marker approach (LRTag)

    • A state-of-the-art program

    • Single-marker

  • Vs. Hao’s program (MultiTag)

    • Multi-marker


Vs. Single-Marker Approach

[Figure: comparison results]

MMTagger Vs. MultiTag

[Figure: comparison results]


Conclusion

  • We provide a new multi-marker model

    • Size of tag SNP set

      • 2- vs. 1-marker: apparently better

      • 3- vs. 2-marker: slightly better

      • 4-marker or more: slow, unacceptable

  • Performance

    • Our program outperforms the only other program with a similar model


Thank you!