Optimization problems for polymorphisms of single nucleotides


Polymorphisms

A polymorphism is a feature

- common to everybody

- not identical in everybody

- with just a few possible variants (alleles)

E.g., think of eye color

Or blood type, for a feature not visible from the outside



At the DNA level, a polymorphism is a sequence of nucleotides that varies within a population.

The shortest possible such sequence has only 1 nucleotide, hence the name

Single Nucleotide Polymorphism (SNP)

atcggcttagttagggcacaggacgtac

atcggcttagttagggcacaggacggac

atcggattagttagggcacaggacggac

atcggattagttagggcacaggacgtac

atcggattagttagggcacaggacgtac

atcggattagttagggcacaggacgtac

atcggcttagttagggcacaggacgtac

atcggattagttagggcacaggacggac

atcggattagttagggcacaggacggac

atcggcttagttagggcacaggacggac

atcggattagttagggcacaggacggac

atcggattagttagggcacaggacggac

atcggattagttagggcacaggacggac

atcggcttagttagggcacaggacggac
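
The SNP sites in an alignment like the one above are simply the columns in which the chromosomes do not all agree. A minimal sketch, assuming the sequences are already aligned and of equal length; the function names `snp_sites` and `read_sites` are illustrative and not taken from the slides, and only the first four chromosomes are used:

```python
def snp_sites(seqs):
    """Return the 0-based positions where the aligned sequences differ."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def read_sites(seqs):
    """Keep only the letters found at the SNP sites of each sequence."""
    sites = snp_sites(seqs)
    return ["".join(s[i] for i in sites) for s in seqs]


# First four chromosomes of the example above (an illustrative subset).
chromosomes = [
    "atcggcttagttagggcacaggacgtac",
    "atcggcttagttagggcacaggacggac",
    "atcggattagttagggcacaggacggac",
    "atcggattagttagggcacaggacgtac",
]

print(snp_sites(chromosomes))   # [5, 25]
print(read_sites(chromosomes))  # ['ct', 'cg', 'ag', 'at']
```

Everything outside the two varying columns is identical in every chromosome, so it carries no information about the variation and can be dropped.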


- SNPs are the predominant form of human variation

- On average, one every 1,000 bases

- Used for drug design, the study of diseases, forensics, evolutionary studies...


- Multimillion-dollar SNP Consortium project

- 1st step: build maps of several thousand SNPs

- Goal: associate SNPs (or groups of SNPs) with genetic diseases

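HOMOZYGOUS: same allele on both chromosomes

HETEROZYGOUS: different alleles

HAPLOTYPE: chromosome content at SNP sites

GENOTYPE: “union” of 2 haplotypes

In the genotypes below, O marks a homozygous site (followed by its allele) and E a heterozygous site.

Individual   Haplotypes   Genotype
1            ct  cg       Oc E
2            ag  at       Oa E
3            at  at       Oa Ot
4            ct  ag       E  E
5            ag  cg       E  Og
6            ag  ag       Oa Og
7            ag  cg       E  Og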


CHANGE OF SYMBOLS: each SNP takes only two values in a population (a biological fact).

Call them 1 and O. Also, use * to denote the fact that a site is heterozygous.

HAPLOTYPE: string over 1,O

GENOTYPE: string over 1,O,*

Individual   Haplotypes   Genotype
1            O1  OO       O*
2            1O  11       1*
3            11  11       11
4            O1  1O       **
5            1O  OO       *O
6            1O  1O       1O
7            1O  OO       *O
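
The "union" of two haplotypes can be sketched in a few lines: a site keeps its value where the two haplotypes agree and becomes * where they differ. This is a minimal illustration under the 1/O/* encoding of the slide; the function name `genotype` is an assumption, not something the slides define.

```python
def genotype(h1, h2):
    """Combine two haplotypes over {1, O} into a genotype over {1, O, *}."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP sites")
    return "".join(a if a == b else "*" for a, b in zip(h1, h2))


print(genotype("O1", "OO"))  # O*
print(genotype("O1", "1O"))  # **
print(genotype("1O", "1O"))  # 1O
```

The map is not invertible: the genotype ** is produced both by (O1, 1O) and by (OO, 11), which is the ambiguity behind the population version of the problem.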


THE HAPLOTYPING PROBLEM

Single Individual: given genomic data of one individual, determine 2 haplotypes (one per chromosome)

Population: given genomic data of k individuals, determine (at most) 2k haplotypes (one per chromosome per individual)

For the individual problem, the input is erroneous haplotype data, from sequencing

For the population problem, the input is ambiguous genotype data, from screening

The objective is led by Occam’s razor: find a minimum explanation of the observed data under the given hypothesis (a.k.a. the parsimony principle)


Theory and Results

Single individual

- Polynomial algorithms for gapless haplotyping (L, Bafna, Istrail, Lippert, Schwartz 01 & Bafna, L, Istrail, Rizzi 02)

- Polynomial algorithms for bounded-length gapped haplotyping (BLIR 02)

- NP-hardness for general gapped haplotyping (LBILS 01)

Population

- APX-hardness (Gusfield 00)

- Reduction to graph-theoretic model and I.P. approach (Gusfield 01)

- New formulations and disease detection (L, Ravi, Rizzi 02)

- Exact algorithms for min-size solution (L, Serafini 2011)

- Heuristics (Tininini, L, Bertolazzi 2010)


The Single-Individual Haplotyping Problem


Shotgun Assembly of a Chromosome

[Weber and Myers, 1997]

ACTGAGCCTAGAGATTTCTAGGCGTATCTATCTTACACTGCATCGATCGATCGATCGA

fragmentation

ACTGA GATTT GCCTAG CTATCTT

ATAGATA GAGATTTC TAGAAATC TGAGCCTAG

TAGAGATTTC TCCTAAAGAT CGCATAGATA

sequencing

TGAGCCTAG GATTT GCCTAG CTATCTT

ATAGATA GAGATTTCTAGAAATC ACTGA

TAGAGATTTC TCCTAAAGAT CGCATAGATA

assembly

ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT



MAIN ERROR SOURCES

- Sequencing errors:

ACTGCCTGGCCAATGGAACGGACAAG

CTGGCCAAT

CATTGGAAC

AATGGAACGGA

- Contaminants


Given errors, the data may be inconsistent with exactly 2 haplotypes

Hence, the assembler is unable to build 2 chromosomes

PROBLEM: Find and remove the errors so that the data becomes consistent with exactly 2 haplotypes


The data: a SNP matrix

ACTGAAAGCGA ACTAGAGACAGCATG

ACTGATAGC GTAGAGTCA

ACTG TCGACTAGA CATG

ACTGA CGATCCATCG TCAGC

ACTGAAA ATCGATC AGCATG

[Figure: the same fragments reduced to their values (1 or O) at the SNP sites, giving one row of the SNP matrix per fragment]


Snips 1,..,n (columns), Fragments 1,..,m (rows)

     1 2 3 4 5 6 7 8 9

1    - - - O 1 1 O O -
2    - O - O 1 - - - 1
3    1 1 O 1 1 - - - -
4    O O 1 - - - - O -
5    - - - - - - - 1 O
6    - - - - O O O 1 -


Fragment conflict: two fragments that can’t be on the same haplotype (in the matrix above, e.g., fragments 1 and 3, which disagree at SNP 4)

Fragment Conflict Graph GF(M): one node per fragment, an edge between each pair of conflicting fragments

[Figure: GF(M) drawn on nodes 1 to 6]

We have 2 haplotypes iff GF is BIPARTITE
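
The conflict test and the bipartiteness check can be sketched directly on the example matrix. This is an illustrative sketch, not the presenter's code: rows are written as strings over 1, O and - (gap), and the names `conflict`, `conflict_graph` and `is_bipartite` are assumptions.

```python
from itertools import combinations


def conflict(f, g):
    """Two fragments conflict if they disagree at a SNP that both cover."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def conflict_graph(rows):
    """Edges (1-based fragment indices) of the fragment conflict graph."""
    return [(i + 1, j + 1)
            for (i, f), (j, g) in combinations(enumerate(rows), 2)
            if conflict(f, g)]


def is_bipartite(n, edges):
    """Try to 2-colour nodes 1..n; an odd cycle makes this fail."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    stack.append(v)
                elif colour[v] == colour[u]:
                    return False
    return True


M = [
    "---O11OO-",
    "-O-O1---1",
    "11O11----",
    "OO1----O-",
    "-------1O",
    "----OOO1-",
]
edges = conflict_graph(M)
print(edges)
print(is_bipartite(len(M), edges))  # False
```

Fragments 1, 3 and 6 conflict pairwise, so the graph contains a triangle and the six fragments cannot be split into two consistent haplotypes.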


PROBLEM (Fragment Removal): remove the fewest fragments (rows) to make GF bipartite

In the example, removing fragment 6 leaves a bipartite graph, and the remaining fragments merge into 2 haplotypes:

     1 2 3 4 5 6 7 8 9

1    - - - O 1 1 O O -
2    - O - O 1 - - - 1
4    O O 1 - - - - O -
     O O 1 O 1 1 O O 1    (haplotype 1)

3    1 1 O 1 1 - - - -
5    - - - - - - - 1 O
     1 1 O 1 1 - - 1 O    (haplotype 2)
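
For a matrix this small, minimum fragment removal can be checked by brute force. The sketch below tries removal sets of increasing size and is exponential in the number of fragments, so it is only an illustration of the problem statement, not one of the algorithms discussed in the talk; all function names are assumptions.

```python
from itertools import combinations


def conflict(f, g):
    """Fragments conflict if they disagree at a SNP that both cover."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def two_colour(rows):
    """Map row index -> 0/1 so that conflicting rows differ, or None."""
    colour = {}
    for start in range(len(rows)):
        if start in colour:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(len(rows)):
                if v == u or not conflict(rows[u], rows[v]):
                    continue
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    stack.append(v)
                elif colour[v] == colour[u]:
                    return None  # odd cycle: not bipartite
    return colour


def min_removals(rows):
    """All smallest sets of fragments (1-based) whose removal works."""
    for k in range(len(rows) + 1):
        found = [[i + 1 for i in removed]
                 for removed in combinations(range(len(rows)), k)
                 if two_colour([r for i, r in enumerate(rows)
                                if i not in removed]) is not None]
        if found:
            return found


def haplotypes_after(rows, removed):
    """Merge each side of the bipartition column by column."""
    kept = [r for i, r in enumerate(rows, start=1) if i not in removed]
    colour = two_colour(kept)
    sides = [[r for i, r in enumerate(kept) if colour[i] == c] for c in (0, 1)]
    return ["".join(next((x for x in col if x != "-"), "-") for col in zip(*side))
            for side in sides]


M = ["---O11OO-", "-O-O1---1", "11O11----",
     "OO1----O-", "-------1O", "----OOO1-"]
print(min_removals(M))           # [[3], [6]]
print(haplotypes_after(M, [6]))  # ['OO1O11OO1', '11O11--1O']
```

On this example the optimum is not unique: removing fragment 3 instead of fragment 6 also leaves a bipartite graph.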


Removing the fewest fragments is equivalent to finding a maximum induced bipartite subgraph

NP-complete [Yannakakis, 1978a, 1978b; Lewis, 1978]

O(|V| (log log |V| / log |V|)^2)-approximable [Halldórsson, 1999]

not O(|V|^ε)-approximable for some ε > 0 [Lund and Yannakakis, 1993]

Are there cases of M for which GF(M) is easier?

YES: the gapless M

---O11OO1O1O1OO1--- gapless

---O11OO---O1OO1--- 1 gap

---O11--1O----O1--- 2 gaps


Why gaps?

Sequencing errors (don’t call with low confidence)

---OO11?11--- ===> ---OO11-11---

Celera’s mate pairs

attcgttgtagtggtagcctaaatgtcggtagaccttga

attcgttgtagtggtagcctaaatgtcggtagaccttga


THEOREM

For a gapless M, the Min Fragment Removal Problem is polynomial

NOTE: M does not need to be gapless. It is enough that its columns can be sorted so that it becomes gapless (Consecutive Ones Property, Booth and Lueker, 1976)


An O(nm + n^3) D.P. algo

LFT(i), RGT(i): leftmost and rightmost non-gap position of row i

1    - O O 1 1 O O - -
2    - - 1 O 1 1 O - -
3    - - - 1 1 O - - -
4    - - - - O O 1 O -
5    - - - - - 1 O 1 O

Sort the rows according to LFT

D(i; h,k) := min cost to solve up to row i, with k, h not removed and put in different haplotypes, and maximizing RGT(k), RGT(h)

D(i; h,k) = D(i-1; h,k)       if i, k compatible and RGT(i) <= RGT(k), or i, h compatible and RGT(i) <= RGT(h)

D(i; h,k) = 1 + D(i-1; h,k)   otherwise

OPT is min over h,k of D(n; h,k) and can be found in time O(nm + n^3)
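
The preprocessing used by the D.P. (computing LFT and RGT, checking that a row is gapless, sorting by LFT) can be sketched as follows. This covers only the preprocessing, not the recurrence itself; positions are 0-based here, every row is assumed to have at least one non-gap entry, and the function names are illustrative.

```python
def lft(row):
    """0-based index of the first non-gap entry of a row."""
    return next(i for i, c in enumerate(row) if c != "-")


def rgt(row):
    """0-based index of the last non-gap entry of a row."""
    return len(row) - 1 - lft(row[::-1])


def is_gapless(row):
    """True if there is no '-' between the first and last non-gap entries."""
    return "-" not in row[lft(row):rgt(row) + 1]


M = ["-OO11OO--", "--1O11O--", "---11O---", "----OO1O-", "-----1O1O"]

print([lft(r) for r in M])          # [1, 2, 3, 4, 5]
print([rgt(r) for r in M])          # [6, 6, 5, 7, 8]
print(all(is_gapless(r) for r in M))  # True
print(sorted(M, key=lft) == M)      # True: rows are already sorted by LFT
```

With the rows sorted by LFT, a new row i can only overlap earlier rows on their right ends, which is why the recurrence only needs to remember the two kept rows h and k with the largest RGT.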


WITH GAPS…..

Th: NP-hard if 2 gaps per fragment

proof: (simple) use the fact that for every G there is M s.t. G = GF(M), and reduce from Max Bipartite Induced Subgraph on 3-regular graphs (in each row, max 3 non-gap entries, hence max 2 gaps)

Th: NP-hard even if 1 gap per fragment

proof: technical. Reduction from MAX2SAT

But gaps must be long for the problem to be difficult.

We have an O(2^(2L) mn + 2^(3L) n^3) D.P. for MFR on a matrix with total gaps length L



What for MFR with gaps? Why not ILP...

[Figure: example graph annotated with fractional values such as 1/2, 1/3, 1/4 and 5/12]

Randomized rounding heuristic: round and repeat. Worked well at Celera


Fragment removal is good for getting rid of contaminants.

However, we may want to keep all fragments and correct the errors otherwise.

A dual point of view is to disregard some SNPs and keep the largest subset sufficient to reconstruct the haplotypes. All fragments get assigned to one of the two haplotypes.

We describe the min SNP removal problem: remove the fewest columns from M so that the fragment graph becomes bipartite.


SNP conflicts

1 2 3 4 5 6 7 8 9

- - - O 1 1 O O -
- O 1 O 1 - - - 1
1 1 O 1 1 - - - -
O O 1 - - - O O -
- - - - - - 1 1 O
- - - - O O O 1 -

[Figure: pairs of SNP columns are highlighted one at a time and marked either OK or CONFLICT!]

SNP conflict graph GS(M)

1 node for each SNP (column)

edge between conflicting SNPs

[Figure: GS(M) drawn on nodes 1 to 9]
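
The slides show SNP conflicts only through highlighted examples. The sketch below assumes the usual definition from the single-individual haplotyping literature: two columns conflict if some pair of rows, both defined on both columns, agrees on one column and disagrees on the other. That definition and the function names are assumptions, not text from the slides.

```python
from itertools import combinations


def snp_conflict(col_i, col_j):
    """Assumed definition: two rows, defined on both columns, that are
    equal on one column and different on the other."""
    rows = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    return any((a1 == a2) != (b1 == b2)
               for (a1, b1), (a2, b2) in combinations(rows, 2))


def snp_conflict_graph(matrix):
    """Edges (1-based column indices) of the SNP conflict graph."""
    cols = list(zip(*matrix))
    return [(i + 1, j + 1)
            for (i, ci), (j, cj) in combinations(enumerate(cols), 2)
            if snp_conflict(ci, cj)]


M = [
    "---O11OO-",
    "-O1O1---1",
    "11O11----",
    "OO1---OO-",
    "------11O",
    "----OOO1-",
]
print(snp_conflict_graph(M))
# [(2, 5), (3, 5), (4, 5), (5, 7), (6, 7), (7, 8)]
```

Under this definition, every edge of the example touches SNP 5 or SNP 7, so removing those two columns leaves an independent set.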


THEOREM 1

For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set

THEOREM 2

For a gapless M, GS(M) is a perfect graph

COROLLARY

For a gapless M, the min SNP removal problem is polynomial


THEOREM 1

For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set

PROOF (sketch): by minimal counterexample

--OO11OO---------
----OO1OO1O11O---
--------11O1O111-
----11OO1O11O----
-------1OOO1-----
------11111O-----
--11O11O1OO------

Assume M is gapless and GS(M) is an independent set, but GF(M) is not bipartite.

Take an odd cycle in GF


There is a generic structure of a horizontal-vertical cycle, with the conflicts lying on “vertical lines”

--O?1???---------
----O????????O---
--------??O??1??-
----??????1??----
-------???O?-----
------????1?-----
--1???????O------

There cannot be only one vertical line in an odd cycle

We merge the rightmost line and the next one to reduce the number of lines by 1. The entry of row 2 on the next-to-rightmost line must be 1 (otherwise the two rightmost lines would be conflicting SNPs):

--O?1???---------
----O?????1??O---
--------??O??1??-
----??????1??----
-------???O?-----
------????1?-----
--1???????O------

Merge the rightmost lines

--O?1???---------
----O?????1------
--------??O------
----??????1------
-------???O------
------????1------
--1???????O------

Still a counterexample!

Hence, there cannot be a minimal (in number of vertical lines) counterexample


Note: Theorem not true if there are gaps

M:

     1 2 3

1    O - O
2    - O 1
3    1 1 -

GF(M): a triangle on fragments 1, 2, 3 (not bipartite)

GS(M): SNPs 1, 2, 3 with no edges (an independent set)
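
The counterexample can be checked mechanically. As before, this is a sketch with illustrative names, and the SNP-conflict test uses the assumed definition (two rows, defined on both columns, that agree on one column and disagree on the other).

```python
from itertools import combinations


def fragment_conflict(f, g):
    """Fragments conflict if they disagree at a SNP that both cover."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def snp_conflict(col_i, col_j):
    """Assumed definition of a conflict between two SNP columns."""
    rows = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    return any((a1 == a2) != (b1 == b2)
               for (a1, b1), (a2, b2) in combinations(rows, 2))


def edges(items, test):
    """1-based index pairs of items for which the conflict test holds."""
    return [(i + 1, j + 1)
            for (i, x), (j, y) in combinations(enumerate(items), 2)
            if test(x, y)]


M = ["O-O", "-O1", "11-"]

gf = edges(M, fragment_conflict)         # rows of M
gs = edges(list(zip(*M)), snp_conflict)  # columns of M

print(gf)  # [(1, 2), (1, 3), (2, 3)]
print(gs)  # []
```

Each pair of columns shares only one row, so no SNP conflict can arise, while the three fragments still form an odd cycle.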


THEOREM 2

For a gapless M, GS(M) is a perfect graph

PROOF: GS(M) is the complement of a comparability graph A

Comparability graphs are perfect, and the complement of a perfect graph is perfect

Comparability graphs: undirected graphs that can be oriented to become a partial order


LEMMA: If i < j < k and (i,k) is a SNP conflict, then either (i,j) or (j,k) is also a SNP conflict

Two rows, with columns i < j < k (the entries in column j are marked ?):

- 1 O O ? 1 O 1 -
- O 1 O ? 1 1 1 -

Entries in column j equal (e.g. O, O): j conflicts with i

Entries in column j different (e.g. O, 1): j conflicts with k

I.e., if (i,j) is not a conflict and (j,k) is not a conflict, then (i,k) is not a conflict either

So the graph A with an edge (u,v) whenever u < v and u is not in conflict with v is a comparability graph, and GS is the complement of A

NOTE: independent set on perfect graphs is in P (Lovász, Schrijver, Grötschel, 84)


Hence gapless MSR is polynomial (max stable set on a perfect graph).

There are better D.P. algorithms, O(mn + m^2)

What if there are gaps?

THEOREM: The min SNP removal problem is NP-hard if there can be gaps (reduction from MAXCUT)

Again, gaps must be long for the problem to be difficult.

We have an O(mn^(2L+1) + n^(2L+2)) D.P. for MSR on a matrix with total gaps length L

