# An Iterative Relaxation Technique for the NMR Backbone Assignment Problem

Wen-Lian Hsu, Institute of Information Science, Academia Sinica. The method models backbone assignment as a constraint satisfaction problem and solves it using natural language parsing techniques.

## An Iterative Relaxation Technique for the NMR Backbone Assignment Problem

Wen-Lian Hsu

Institute of Information Science

### Characteristics of Our Method

• Model this as a constraint satisfaction problem

• Solve it using natural language parsing techniques

• Both top-down and bottom-up

• An iterative approach

• Create spin systems based on noisy data.

• Link spin systems by using maximum independent set finding techniques.

### Outline

• Introduction

• Method

• Experiment Results

• Conclusion

### Blind Man’s Elephant

• We cannot directly “see” the positions of these atoms (the structure)

• But we can measure a set of parameters (with constraints) on these atoms

• Which can help us infer their coordinates

Each experiment can only determine a subset of parameters (with noise).

To combine the parameters of different experiments, we need to stitch them together.

### The Flow of NMR Experiments

Get protein samples → Collect NMR spectra → Resonance assignment → Structure constraints → Calculation and simulation (energy minimization, fitness of structure constraints)

### Chemical Shift Assignment

Find out Chemical Shift for Each Atom

• Backbone atoms: Ca, Cb, C’, N, NH

• Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA

• Side chain: all others (especially CHs)

• TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY

[Figure: one amino acid with its atoms (N, H, Ca, Cb, Cg, Cd, CO) and chemical shift ranges: 18-23, 55-60, 17-23, 30-35, 16-20, 31-34, 19-24.]

### Some Relevant Parameters

[Figure: protein backbone -N-C-C-N-C-C-N-C-C-N-C-C- with side-chain groups (CH3, H-C-H, O, H), a ppm axis, and the HSQC spectrum.]

### Three important experiments

• Backbone: Ca, Cb, C’, N, NH

• HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA

• sequential assignment

• chemical shifts of Ca, Cb, NH

### Our NMR spectra

[Figure: HSQC, CBCA(CO)NH, and CBCANH spectra.]

• HSQC

• CBCA(CO)NH (2 peaks)

• HNCACB (4 peaks)

### HSQC Spectra

• HSQC peaks (1 chemical shift for one amino acid)

### CBCA(CO)NH Spectra

• CBCA(CO)NH peaks (2 chemical shifts for one amino acid)

### CBCANH Spectra

• CBCANH peaks (4 chemical shifts for one amino acid)

• Ca (+), Cb (-)

[Figure: spectrum with H and N axes.]

• HSQC

• HNCACB: 4 peaks

• CBCA(CO)NH: 2 peaks

### Backbone Assignment

• Goal

• Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.

• General approaches

• Generate spin systems

• A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).

### Ambiguities

• All 4 point experiments are mixed together

• All 2 point experiments are mixed together

• Each spin system can be mapped to several amino acids in the protein sequence

• False positives, false negatives

[Figure: a legal matching compared with an illegal matching under constraints.]

### Previous Approaches

• Constrained bipartite matching problem

• The spin system might be ambiguous

• Cannot deal with ambiguous links

### Natural Language Processing: Signal or Noise?

• Speech recognition: homophone selection

• An error-tolerant algorithm

• Phrase and sentence combination

• Hierarchical analysis

### Perfect Group

• Each spin group contains 6 points, in which

• 4 points are from the first experiment (CBCANH)

• 2 points are from the second experiment (CBCA(CO)NH)
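The grouping rule above can be sketched in a few lines. This is only an illustration: the `Peak` layout (N, HN, C, intensity), the tolerances `tol_n` and `tol_h`, and the function names are assumptions and do not come from the slides.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    n: float          # 15N chemical shift in ppm
    h: float          # amide 1H chemical shift in ppm
    c: float          # 13C chemical shift in ppm
    intensity: float  # peak intensity


def peaks_near(root: Tuple[float, float], peaks: List[Peak],
               tol_n: float = 0.3, tol_h: float = 0.03) -> List[Peak]:
    """Return the peaks whose (N, H) coordinates match an HSQC root.

    The tolerances are hypothetical defaults, not values from the slides.
    """
    n0, h0 = root
    return [p for p in peaks
            if abs(p.n - n0) <= tol_n and abs(p.h - h0) <= tol_h]


def is_perfect_group(root: Tuple[float, float],
                     cbcanh: List[Peak], cbcaconh: List[Peak]) -> bool:
    """A perfect group has 4 CBCANH points and 2 CBCA(CO)NH points."""
    return (len(peaks_near(root, cbcanh)) == 4
            and len(peaks_near(root, cbcaconh)) == 2)
```

With noisy data several peaks may fall inside the tolerance window of one root, so a real implementation would have to enumerate alternative groups rather than return a single yes/no answer.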

[Figure: atoms (H, N, Ca, Cb, C, O) of consecutive residues along the backbone.]

[Figure: a perfect group. CBCA(CO)NH contributes the Ca and Cb peaks of residue i-1; CBCANH contributes the Ca and Cb peaks of both residue i-1 and residue i.]

### False Positives and False Negatives

• False positives

• Noise with high intensity

• Produce fake spin systems

• False negatives

• Peaks with low intensity

• Missing peaks

• In real wet-lab data, nearly 50% of the peaks are noise (false positives).

[Figure: H-N plane showing a perfect group, a false negative, and a false positive.]

### Main Idea

• Deal with false negatives in the spin system generation procedure.

• Eliminate false positives in the spin system linking procedure.

• Perform the spin system generation and linking procedures iteratively.

### Spin System Group Generation

• Three types of spin system groups are generated based on the quality of the CBCANH data:

• Perfect

• Weak false negative

• Severe false negative
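One way to read the three types is by how many of the four expected CBCANH peaks are missing, since each type differs in the number of pseudo peaks that must be added (none, one, or two). The sketch below encodes that reading; the function name and the `"unusable"` label for everything else are assumptions.

```python
def classify_group(cbcanh_peaks_found: int) -> str:
    """Classify a spin system group by the number of missing CBCANH peaks.

    Assumes 4 CBCANH peaks are expected and that every missing peak
    is replaced by exactly one pseudo peak.
    """
    missing = 4 - cbcanh_peaks_found
    labels = {
        0: "perfect",                # no pseudo peak needed
        1: "weak false negative",    # one pseudo peak added
        2: "severe false negative",  # two pseudo peaks added
    }
    # Hypothetical fallback: too few (or too many) peaks to form a group.
    return labels.get(missing, "unusable")
```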

### Perfect Spin Systems

• A spin system is determined without any added pseudo peak.

[Figure: CBCA(CO)NH shows the two i-1 peaks; CBCANH shows all four peaks (two Ca, two Cb).]

### Weak False Negative Spin System Group

• A spin system is determined with an added pseudo peak.

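[Figure: CBCA(CO)NH shows the two i-1 peaks; CBCANH shows three of the four peaks (Ca, Cb, Cb), and the fourth (Ca) is labeled with the peak row 115.481 9.604 60.044 1.30407e+008.]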

### Severe False Negative Spin System Group

• A spin system is determined with two added pseudo peaks.

[Figure: CBCA(CO)NH and CBCANH peaks (Ca, Cb) for a severe false negative group.]

Peak rows shown:

119.857 8.435 28.166 3.36293e+007

119.857 8.435 59.419 1.56434e+008

Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419.

### A note on spin system generation

• To generate *ALL* possible spin systems, a peak can be included in more than one spin system.

• False positives are eliminated in the spin system linking procedure.

• False negatives are treated by adding pseudo peaks.

• A rule-based mechanism is used to filter out incompatible spin systems (false positives).

• Adopt a maximum weight independent set algorithm

• Goal

• Link spin systems into segments that are as long as possible.

• Constraints

• Each spin system is uniquely assigned to a position of the target protein sequence.

• Two spin systems are linked only if the differences between the intra-residue chemical shifts of one and the inter-residue chemical shifts of the other are below the predefined thresholds.
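The linking constraint can be illustrated with a small check. In this sketch a spin system is assumed to carry four carbon shifts, Ca(i-1), Cb(i-1), Ca(i), Cb(i), with 0 standing for an absent Cb; the thresholds `tol_ca` and `tol_cb` are hypothetical values, since the slides do not state them.

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    ca_prev: float  # Ca shift of residue i-1 (inter-residue)
    cb_prev: float  # Cb shift of residue i-1 (inter-residue), 0 if absent
    ca: float       # Ca shift of residue i (intra-residue)
    cb: float       # Cb shift of residue i (intra-residue), 0 if absent


def can_link(a: SpinSystem, b: SpinSystem,
             tol_ca: float = 0.2, tol_cb: float = 0.4) -> bool:
    """True if b may directly follow a along the sequence.

    The intra-residue shifts of a must agree with the inter-residue
    shifts of b within the (assumed) thresholds.
    """
    return (abs(a.ca - b.ca_prev) <= tol_ca
            and abs(a.cb - b.cb_prev) <= tol_cb)
```

For example, `can_link(SpinSystem(55.266, 38.675, 44.555, 0.0), SpinSystem(44.417, 0.0, 55.043, 30.04))` holds under these thresholds, while the reverse order does not.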

### A Peculiar Parking Lot (valet parking)

Information you have: the make of your car and the car parked in front of you (approximately). Together with others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).

### Backbone Assignment

DGRIGEIKGRKTLATPAVRRLAMENNIKLS

### Spin System Positioning

• We assign spin system groups to a protein sequence according to their codes.

Codes for the first residues of the sequence:

D 50

G 10

R 40

I 50|51

Spin systems and their codes:

55.266 38.675 44.555 0 => 50 10

44.417 0 55.043 30.04 => 10 40

44.417 0 30.665 28.72 => 10 40

55.356 29.782 60.044 37.541 => 40 50

[Figure: the spin systems are placed on D, G, R, I to form Segments 1, 2, and 3. Over Step 1, Step 2, ..., Step n-1, Step n, spin systems 1, 2, ..., 56, ... are placed along DGRI….FKJJREKL, producing further segments (e.g. Segments 31, 78, 79, 99).]
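A minimal sketch of code-based positioning, assuming each spin system carries a pair of codes (previous residue, current residue) and each residue type maps to a set of admissible codes. Only the four entries shown on the slide (D, G, R, I) are used; the function name and the treatment of residues without an entry (they never match) are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Codes taken from the slide; "I 50|51" is read as "either 50 or 51".
CODES: Dict[str, Set[int]] = {
    "D": {50},
    "G": {10},
    "R": {40},
    "I": {50, 51},
}


def candidate_positions(sequence: str, code_pair: Tuple[int, int]) -> List[int]:
    """Return 0-based indices i where the spin system could sit.

    The previous-residue code must fit sequence[i-1] and the
    current-residue code must fit sequence[i].
    """
    prev_code, cur_code = code_pair
    hits = []
    for i in range(1, len(sequence)):
        prev_ok = prev_code in CODES.get(sequence[i - 1], set())
        cur_ok = cur_code in CODES.get(sequence[i], set())
        if prev_ok and cur_ok:
            hits.append(i)
    return hits
```

On the prefix `"DGRIGEIKGR"`, the code pair `(50, 10)` fits both the D-G and the I-G junctions, which is exactly the kind of ambiguity the linking step has to resolve.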

### Conflict Segments

DGRIGEIKGRKTLATPAVRRLAMENNIKLS

[Figure: Segments 71, 78, 79, 97, 98, and 99 placed along the sequence.]

• Two kinds of conflict segments

• Overlap (e.g. segment 71, segment 99)

• Use the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)

### A Graph Model for Spin System Linking

• G(V,E)

• V: a set of nodes (segments).

• E: (u, v), where u, v ∈ V and u and v are in conflict.

• Goal

• Assign as many non-conflict segments as possible => find the maximum independent set of G.
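The graph model can be sketched as follows. A segment is assumed to be described by a start position on the sequence and an ordered tuple of spin system identifiers; two segments conflict when their position ranges overlap or when they share a spin system. The `Segment` type and the function names are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Set, Tuple


class Segment(NamedTuple):
    name: str
    start: int                     # 0-based index of the first residue covered
    spin_systems: Tuple[str, ...]  # one spin system per covered residue


def conflicts(a: Segment, b: Segment) -> bool:
    """Two segments conflict if they overlap or reuse a spin system."""
    a_end = a.start + len(a.spin_systems)
    b_end = b.start + len(b.spin_systems)
    overlap = a.start < b_end and b.start < a_end
    shared = bool(set(a.spin_systems) & set(b.spin_systems))
    return overlap or shared


def build_conflict_graph(segments: List[Segment]) -> Dict[str, Set[str]]:
    """Return G as an adjacency map: segment name -> conflicting names."""
    graph: Dict[str, Set[str]] = {s.name: set() for s in segments}
    for a, b in combinations(segments, 2):
        if conflicts(a, b):
            graph[a.name].add(b.name)
            graph[b.name].add(a.name)
    return graph
```

Any set of segments with no edge between them in this graph can be placed on the sequence together, which is why the assignment reduces to an independent set problem.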

[Figure: conflict graph on Seg1, Seg2, Seg3, and Seg4, with edges labeled SP13, SP15, and Overlap.]

### An Example of G

Segment1: SP12->SP13->SP14

Segment2: SP9->SP13->SP20->SP4

Segment3: SP8->SP15->SP21

Segment4: SP7->SP1->SP15->SP3

• Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE

### Segment weight

• The longer a segment is, the higher its weight.

• The less frequent a segment is, the higher its weight.

### Find Maximum Weight Independent Set of G

• Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
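The cited approximation algorithm is not reproduced here. As a stand-in, the sketch below uses a plain greedy heuristic for maximum weight independent set: repeatedly take the vertex with the best weight-to-conflict ratio and discard its neighbors. The function name, the scoring rule, and the use of segment length as weight are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_mwis(graph: Dict[str, Set[str]], weight: Dict[str, float]) -> List[str]:
    """Greedy heuristic for a maximum weight independent set.

    graph maps each vertex to the set of vertices it conflicts with.
    This is NOT the Boppana-Halldorsson algorithm, only a simple stand-in.
    """
    remaining = set(graph)
    chosen: List[str] = []
    while remaining:
        # Prefer heavy vertices that knock out few remaining candidates.
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda v: weight[v] / (1 + len(graph[v] & remaining)))
        chosen.append(best)
        remaining -= graph[best] | {best}
    return chosen
```

On a graph where Seg1 conflicts with Seg2 and Seg3 conflicts with Seg4, with weights equal to the segment lengths (3, 4, 3, 4), the heuristic keeps Seg2 and Seg4.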

### An Iterative Approach

• We perform spin system generation and linking iteratively.

• Three stages.

### First Stage

• Generate perfect spin systems;

• Perform spin system concatenation on the newly generated perfect spin systems to generate segments;

• Retain segments that contain at least 3 spin systems;

• Perform MaxIndSet on the segments;

• Drop spin systems (and related peaks) that are used in the resulting segments.

### Second Stage

• Generate weak false negative spin systems.

• Perform segment extension on the resulting segments of the first iteration (using unused perfect and newly generated weak false negative spin systems);

• Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;

• Retain segments that contain at least 3 spin systems;

• Perform MaxIndSet on the segments;

• Drop spin systems (and related peaks) that are used in the resulting segments.

### Third Stage

• Generate severe false negative spin systems.

• Perform segment extension on the resulting segments of the second iteration (using unused perfect and weak false negative spin systems, as well as newly generated severe false negative spin systems);

• Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;

• Retain segments that contain at least 3 spin systems;

• Perform MaxIndSet on the segments.
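The three stages share one loop shape: add new spin systems to the pool, build segments, keep those with at least 3 spin systems, select a conflict-free subset, and remove what was used. The sketch below captures only that skeleton and leaves out segment extension; `run_stages` and the toy helpers `consecutive_runs` and `keep_all` are hypothetical stand-ins for the real generation, concatenation, and MaxIndSet procedures.

```python
from typing import Callable, List, Sequence, Tuple

Seg = Tuple[int, ...]  # a segment as a tuple of spin system ids


def run_stages(generators: Sequence[Callable[[], List[int]]],
               build_segments: Callable[[List[int]], List[Seg]],
               max_ind_set: Callable[[List[Seg]], List[Seg]],
               min_len: int = 3) -> Tuple[List[Seg], List[int]]:
    """Run one generation/linking round per generator (simplified)."""
    pool: List[int] = []      # spin systems not yet used in a segment
    accepted: List[Seg] = []  # segments accepted so far
    for generate in generators:
        pool.extend(generate())
        candidates = [s for s in build_segments(pool) if len(s) >= min_len]
        chosen = max_ind_set(candidates)
        accepted.extend(chosen)
        used = {ss for seg in chosen for ss in seg}
        pool = [ss for ss in pool if ss not in used]
    return accepted, pool


def consecutive_runs(pool: List[int]) -> List[Seg]:
    """Toy concatenation: ids that differ by 1 are treated as linkable."""
    runs: List[Seg] = []
    current: List[int] = []
    for ss in sorted(pool):
        if current and ss != current[-1] + 1:
            runs.append(tuple(current))
            current = []
        current.append(ss)
    if current:
        runs.append(tuple(current))
    return runs


def keep_all(segments: List[Seg]) -> List[Seg]:
    """Toy MaxIndSet: the toy runs never conflict, so keep everything."""
    return list(segments)
```

In the toy setup, spin systems 7 and 8 are too short to form a segment in the second round and are only accepted once the third round adds 9, which mirrors how unused spin systems are carried forward between stages.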

[Figure: iteration example with new spin systems and updated segments (97', 99').]

### Segment Extension

DGRGEKGRKTLATPAVRRLAMENNIKLS

[Figure: segments (e.g. 97, 99) and spin systems placed along the sequence; MaxIndSet selects among them.]

### Experimental Results

• Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:

• Average precision: 87.5%

• Average recall: 73.1%

• Perfect data from BMRB: 99.1%
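The slides do not define precision and recall. Assuming the usual definitions (precision is the fraction of assigned positions that are correct, recall is the fraction of all reference positions that are correctly assigned), they could be computed as in this sketch; the function name and the dictionary layout are illustrative.

```python
from typing import Dict, Optional, Tuple


def precision_recall(predicted: Dict[int, Optional[str]],
                     truth: Dict[int, str]) -> Tuple[float, float]:
    """Compare a predicted assignment with a reference assignment.

    Both map sequence positions to spin system ids; None in `predicted`
    means the position was left unassigned.
    """
    assigned = {pos: s for pos, s in predicted.items() if s is not None}
    correct = sum(1 for pos, s in assigned.items() if truth.get(pos) == s)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall
```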

### Real Wet-Lab Datasets

• The two datasets were obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica, Taiwan.

### Conclusion

• We model the backbone assignment problem as a constraint satisfaction problem

• This problem is solved using a natural language parsing technique (both bottom-up and top-down approaches)

• The same approach seems to work for a large class of noise reduction problems that are discrete in nature

### A genetic algorithm for NMR backbone resonance assignment (I)

• Randomly generate a population of chromosomes

• Each chromosome represents a possible backbone resonance assignment

• Fitness function

• Evaluate the fitness of each chromosome according to the connectivity between adjacent amino acids

### A genetic algorithm for NMR backbone resonance assignment (II)

• Crossover operation

• An offspring inherits different connected blocks from parents

• Mutation operation

• Make a new connected block from any position to increase the population diversity

### Generation of a random chromosome

• Step1. Randomly select a position x

• Step2. Randomly select a SSGroup i from CL(x)

• Step3. Extend connected fragments from i to both sides by using adjacency lists until no more extension can be found.

• Step4. Repeat Step1~Step3 until all positions are assigned.
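Steps 1 to 4 can be sketched as below. Here `CL` maps each position to its candidate SSGroups, and `next_of` / `prev_of` are the adjacency lists for extending to the right and to the left. Two points are assumptions rather than statements from the slides: each SSGroup is used at most once per chromosome, and a position with no free candidate is left as `None`.

```python
import random
from typing import Dict, List, Optional


def random_chromosome(n_positions: int,
                      CL: Dict[int, List[str]],
                      next_of: Dict[str, List[str]],
                      prev_of: Dict[str, List[str]],
                      rng: random.Random) -> List[Optional[str]]:
    chrom: List[Optional[str]] = [None] * n_positions
    unassigned = set(range(n_positions))
    used = set()

    def pick(candidates: List[str]) -> Optional[str]:
        free = [g for g in candidates if g not in used]
        return rng.choice(free) if free else None

    def place(pos: int, group: str) -> None:
        chrom[pos] = group
        used.add(group)
        unassigned.discard(pos)

    def extend(pos: int, cur: str, step: int,
               neighbours: Dict[str, List[str]]) -> None:
        # Step 3: grow the connected fragment in one direction.
        pos += step
        while pos in unassigned:
            allowed = [g for g in neighbours.get(cur, [])
                       if g in CL.get(pos, [])]
            nxt = pick(allowed)
            if nxt is None:
                break
            place(pos, nxt)
            pos, cur = pos + step, nxt

    while unassigned:                        # Step 4
        x = rng.choice(sorted(unassigned))   # Step 1
        group = pick(CL.get(x, []))          # Step 2
        if group is None:
            unassigned.discard(x)            # no candidate: leave empty
            continue
        place(x, group)
        extend(x, group, +1, next_of)
        extend(x, group, -1, prev_of)
    return chrom
```

Passing a seeded `random.Random` keeps runs reproducible, which is convenient when comparing populations.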

### Fitness Evaluation

Building blocks: connected fragments

Fitness(ch) = the number of connected pairs, together with their chemical shift differences.

Two principles:

1. The more connected pairs a chromosome has, the higher its score.

2. The smaller the chemical shift differences, the higher its score.
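The slides give the two principles but not the exact formula. One hypothetical scoring rule that satisfies both is to let every connected adjacent pair contribute 1 / (1 + d), where d is its chemical shift difference; the function name and the `shift_diff` callback are assumptions.

```python
from typing import Callable, List, Optional


def fitness(chromosome: List[Optional[str]],
            shift_diff: Callable[[str, str], Optional[float]]) -> float:
    """Score a chromosome by its connected adjacent pairs.

    shift_diff(left, right) returns the chemical shift difference of a
    connected pair, or None when the two groups are not connected.
    The 1 / (1 + d) contribution is an illustrative choice only.
    """
    score = 0.0
    for left, right in zip(chromosome, chromosome[1:]):
        if left is None or right is None:
            continue
        d = shift_diff(left, right)
        if d is not None:
            score += 1.0 / (1.0 + d)
    return score
```

More connected pairs add more terms, and smaller differences make each term larger, so both principles push the score in the stated direction.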

[Figure: crossover. A cutting site divides the parents, and the offspring combines their connected blocks.]

### Mutation operation

• Once a position mutates, the following positions also mutate to produce a connected fragment.

[Figure: mutation point.]

### Experiment Results

• The accuracy on two real datasets

• SBD: 95.1% (FP: 67%)

• LBD: 100% (FP: 48%)

• The average accuracy on perfect BMRB datasets (902 proteins)