An Iterative Relaxation Technique for the NMR Backbone Assignment Problem

Wen-Lian Hsu

Institute of Information Science

Academia Sinica


Characteristics of Our Method

  • Model backbone assignment as a constraint satisfaction problem

  • Solve it using natural language parsing techniques

    • Both top-down and bottom-up

  • An iterative approach

    • Create spin systems from noisy data.

    • Link spin systems using maximum independent set techniques.


Outline

  • Introduction

  • Method

  • Experimental Results

  • Conclusion


Blind Man’s Elephant

  • We cannot directly “see” the positions of these atoms (the structure)

  • But we can measure a set of parameters (with constraints) on these atoms

    • These parameters help us infer their coordinates

Each experiment can determine only a subset of the parameters (with noise).

To combine the parameters from different experiments, we need to stitch them together.


The Flow of NMR Experiments

Flowchart: Get protein samples → Collect NMR spectra → Resonance assignment → Structure constraints → Calculation and simulation (energy minimization; fitness of structure constraints)


Chemical Shift Assignment

Find the chemical shift of each atom

  • Backbone atoms: Ca, Cb, C’, N, NH

    • Experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA

  • Side chain: all other atoms (especially CHs)

    • Experiments: TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY

Figure: one amino acid, with backbone N, H, Ca, and CO, and side-chain Cb (H2), Cg (H2), and Cd (H3)


Some Relevant Parameters

Figure: a backbone fragment (-N-C-C-N-C-C-N-C-C-N-C-C-) with its H and O atoms and side-chain CH3 and H-C-H groups, annotated with chemical-shift ranges in ppm: 18-23, 55-60, 17-23, 30-35, 16-20, 31-34, 19-24


Three Important Experiments

Figure: HSQC spectrum

  • Backbone: Ca, Cb, C’, N, NH

  • HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA

  • Sequential assignment

  • Chemical shifts of Ca, Cb, NH


Our NMR spectra

Figure: CBCA(CO)NH and CBCANH spectra

  • HSQC

  • CBCA(CO)NH (2 peaks)

  • HNCACB (4 peaks)


HSQC Spectra

Figure: HSQC spectrum

  • HSQC peaks (one peak, i.e. one pair of N and NH chemical shifts, per amino acid)


CBCA(CO)NH Spectra

  • CBCA(CO)NH peaks (2 chemical shifts per amino acid: Ca and Cb of the preceding residue i-1)


CBCANH Spectra

  • CBCANH peaks (4 chemical shifts per amino acid: Ca and Cb of residues i and i-1)

    • Ca (+), Cb (-)


A Dataset Example

Figure: peaks plotted on the H and N chemical-shift axes

  • HSQC

  • HNCACB (4 peaks per amino acid)

  • CBCA(CO)NH (2 peaks per amino acid)


Backbone Assignment

  • Goal

    • Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.

  • General approaches

    • Generate spin systems

      • A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).

    • Link spin systems


Ambiguities

  • The peaks of all amino acids from the 4-peak experiment are mixed together

  • The peaks of all amino acids from the 2-peak experiment are mixed together

  • Each spin system can be mapped to several amino acids in the protein sequence

  • False positives, false negatives


Previous Approaches

Figure: a legal matching versus a matching that is illegal under the constraints

  • Constrained bipartite matching problem

    • Spin systems may be ambiguous

    • Cannot deal with ambiguous links


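Natural Language Processing: Signal or Noise?

  • Speech recognition: homophone selection

Example sentence: 台北市一位小孩走失了 ("A child in Taipei City went missing")

Candidate words: 台北市 (Taipei City), 台北 (Taipei), 小孩 (child), 走失 (to go missing), 一位 (one, a measure word for people) / 一味 (blindly) / 移位 (to shift), 適宜 (suitable) / 事宜 (matters)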


An Error-Tolerant Algorithm


Phrase, Sentence Combination


Hierarchical Analysis

Sentence-meaning templates (句意模版)

Sentence-pattern templates (句型模版)

Phrase templates (片語模版)

Word templates (字詞模版)


Perfect Group

  • Each spin system group contains 6 peaks, of which

    • 4 peaks come from the first experiment (CBCANH)

    • 2 peaks come from the second experiment (CBCA(CO)NH)

Figure: two consecutive amino acids with their N, H, Ca, Cb, C, and O atoms labeled




A Perfect Spin System Group

Figure: the two CBCA(CO)NH peaks (labeled i-1) aligned with the four CBCANH peaks (Ca, Ca, Cb, Cb)


False Positives and False Negatives

  • False positives

    • Noise with high intensity

    • Produce fake spin systems

  • False negatives

    • Peaks with low intensity

    • Missing peaks

  • In real wet-lab data, nearly 50% of the peaks are noise (false positives).


Spin System Group

Figure: perfect, false negative, and false positive spin system groups plotted on the H and N axes


Outline

  • Introduction

  • Method

  • Experimental Results

  • Conclusion


Main Idea

  • Handle false negatives during spin system generation.

  • Eliminate false positives during spin system linking.

  • Perform spin system generation and linking iteratively.


Spin System Group Generation

  • Three types of spin system groups are generated, depending on the quality of the CBCANH data:

    • Perfect

    • Weak false negative

    • Severe false negative


Perfect Spin Systems

  • A spin system is determined without any added pseudo peak.

Figure: the two CBCA(CO)NH peaks (labeled i-1) aligned with the four CBCANH peaks (Ca, Ca, Cb, Cb)


Weak False Negative Spin System Group

  • A spin system is determined with one added pseudo peak.

Figure: the CBCA(CO)NH peaks (labeled i-1) aligned with the CBCANH peaks (Ca, Cb, Cb, Ca). Peak shown (N, H, C in ppm; intensity): 115.481 9.604 60.044 1.30407e+008


Severe False Negative Spin System Group

  • A spin system is determined with two added pseudo peaks.

Figure: the CBCA(CO)NH peaks (labeled i-1) aligned with the CBCANH peaks (Ca, Cb, Cb, Ca). Peaks shown (N, H, C in ppm; intensity):

119.857 8.435 28.166 3.36293e+007

119.857 8.435 59.419 1.56434e+008

Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419.
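A minimal sketch of how the three group types could be told apart, assuming a pseudo peak is added to the CBCANH list whenever one of the two CBCA(CO)NH peaks (Ca and Cb of residue i-1) has no CBCANH partner at the same (N, NH) position. The `Peak` class, the `build_group` function, and the ppm tolerances are hypothetical illustrations, not the presentation's actual rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peak:
    n: float  # amide 15N shift (ppm)
    h: float  # amide 1H (NH) shift (ppm)
    c: float  # 13C shift (ppm), Ca or Cb

# Hypothetical matching tolerances (ppm); not taken from the presentation.
TOL_N, TOL_H, TOL_C = 0.3, 0.03, 0.2

def matches(p: Peak, q: Peak) -> bool:
    """True if two peaks agree in N, NH and C within the tolerances."""
    return (abs(p.n - q.n) <= TOL_N and abs(p.h - q.h) <= TOL_H
            and abs(p.c - q.c) <= TOL_C)

def build_group(cbcaconh: list[Peak], cbcanh: list[Peak]) -> tuple[str, list[Peak]]:
    """Classify one spin system group by how many pseudo peaks it needs.

    `cbcaconh` holds the two CBCA(CO)NH peaks (Ca and Cb of residue i-1).
    Each one without a CBCANH partner is copied into the CBCANH list as a
    pseudo peak. Returns (group type, CBCANH peaks including pseudo peaks).
    """
    if len(cbcaconh) != 2:
        raise ValueError("expected exactly two CBCA(CO)NH peaks")
    pseudo = [p for p in cbcaconh if not any(matches(p, q) for q in cbcanh)]
    kind = ("perfect", "weak false negative", "severe false negative")[len(pseudo)]
    return kind, list(cbcanh) + pseudo

co = [Peak(119.857, 8.435, 59.419), Peak(119.857, 8.435, 28.166)]
own = [Peak(119.857, 8.435, 56.1), Peak(119.857, 8.435, 30.2)]  # Ca, Cb of residue i
print(build_group(co, own)[0])  # severe false negative
```

Here `own` stands for the CBCANH peaks of residue i itself; both i-1 peaks are missing, so two pseudo peaks are added.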


A Note on Spin System Generation

  • To generate *ALL* possible spin systems, a peak may be included in more than one spin system.

    • False positives are eliminated in the spin system linking procedure.

    • False negatives are handled by adding pseudo peaks.

  • A rule-based mechanism filters out incompatible spin systems (false positives).

    • It adopts a maximum weight independent set algorithm.


Spin System Linking

  • Goal

    • Link spin systems into segments that are as long as possible.

  • Constraints

    • Each spin system is assigned to a unique position in the target protein sequence.

    • Two spin systems are linked only if the differences between their intra- and inter-residue chemical shifts are below predefined thresholds.
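A minimal sketch of the linking constraint, assuming each spin system stores its intra-residue shifts (Ca and Cb of residue i) and its inter-residue shifts (Ca and Cb of residue i-1). The `SpinSystem` fields and the threshold values are hypothetical; the slides only specify "predefined thresholds".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds (ppm); the slides only say "predefined thresholds".
CA_THRESHOLD = 0.2
CB_THRESHOLD = 0.4

@dataclass(frozen=True)
class SpinSystem:
    name: str
    ca: float                 # intra-residue Ca (residue i)
    cb: Optional[float]       # intra-residue Cb (None for glycine)
    ca_prev: float            # inter-residue Ca (residue i-1)
    cb_prev: Optional[float]  # inter-residue Cb (residue i-1)

def can_link(left: SpinSystem, right: SpinSystem) -> bool:
    """True if `right` may directly follow `left` in the protein sequence."""
    if abs(left.ca - right.ca_prev) >= CA_THRESHOLD:
        return False
    if left.cb is None or right.cb_prev is None:
        # A missing Cb (e.g. glycine) must be missing on both sides.
        return left.cb is None and right.cb_prev is None
    return abs(left.cb - right.cb_prev) < CB_THRESHOLD

g = SpinSystem("G", 44.417, None, 55.043, 30.04)
r = SpinSystem("R", 55.3, 30.1, 44.42, None)
print(can_link(g, r))  # True
```

The intra-residue shifts of the left spin system are compared with the inter-residue shifts of the right one, which is what lets the CBCA(CO)NH data bridge adjacent residues.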


A Peculiar Parking Lot (valet parking)

Information you have: the make of your car and, approximately, the car parked in front of it. Together with the others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).


Backbone Assignment

DGRIGEIKGRKTLATPAVRRLAMENNIKLS


Spin System Positioning

  • We assign spin system groups to positions in the protein sequence according to their codes.

Residue codes: D 50, G 10, R 40, I 50|51

Spin systems and their code pairs:

55.26638.67544.5550 => 50 10

44.417055.04330.04 => 10 40

44.417030.66528.72 => 10 40

5535629.78260.04437.541 => 40 50
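A minimal sketch of positioning by codes, assuming each spin system's code pair reads (code of residue i-1, code of residue i), which fits the D-G-R-I example above. Only the codes listed on the slide are included, so residues without a listed code never match. `CODES` and `candidate_positions` are hypothetical names.

```python
# Residue codes from the slide; I may take code 50 or 51.
# Codes of other residue types are not given, so they are omitted here.
CODES: dict[str, set[int]] = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}

def candidate_positions(seq: str, prev_code: int, code: int) -> list[int]:
    """0-based positions j whose pair (seq[j-1], seq[j]) fits (prev_code, code)."""
    return [j for j in range(1, len(seq))
            if prev_code in CODES.get(seq[j - 1], set())
            and code in CODES.get(seq[j], set())]

seq = "DGRIGEIKGRKTLATPAVRRLAMENNIKLS"
print(candidate_positions(seq, 50, 10))  # [1, 4]
print(candidate_positions(seq, 40, 50))  # [3]
```

The pair 50 10 fits both D-G and I-G, so a single spin system can have several candidate positions; the linking step has to resolve that ambiguity.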


Link Spin System Groups

Figure: spin systems placed under residues D, G, R, I and linked into Segments 1, 2, and 3


Iterative Concatenation

Figure: spin systems (1, 2, ..., 56, ...) matched to the sequence DGRI….FKJJREKL are concatenated step by step (Step 1, Step 2, ..., Step n-1, Step n) into longer segments (Segment 1, Segment 2, Segment 31, ..., Segment 78, Segment 79, ..., Segment 99)
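A minimal sketch of concatenation as repeated extension, assuming the linking test has already produced an adjacency map `succ` from each spin system to the spin systems that may follow it. It enumerates every maximal chain, which is exponential in the worst case; it is meant to show the step-by-step growth, not the presentation's actual procedure.

```python
def concatenate(succ: dict[int, list[int]]) -> list[tuple[int, ...]]:
    """Grow segments step by step from single spin systems.

    In each step every segment is extended by each linkable successor that it
    does not already contain; segments that cannot grow are finished. Segments
    that are only the tail of a longer finished segment are dropped.
    """
    frontier = [(s,) for s in succ]
    finished = []
    while frontier:
        grown = []
        for seg in frontier:
            nexts = [t for t in succ.get(seg[-1], []) if t not in seg]
            if not nexts:
                finished.append(seg)
            grown.extend(seg + (t,) for t in nexts)
        frontier = grown
    return [s for s in finished
            if not any(len(o) > len(s) and o[-len(s):] == s for o in finished)]

links = {1: [2], 2: [3, 4], 3: [], 4: []}
print(sorted(concatenate(links)))  # [(1, 2, 3), (1, 2, 4)]
```

Spin system 2 has two possible successors, so two competing segments come out; such overlapping candidates are exactly what the conflict graph later has to arbitrate.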


Conflict Segments

DGRIGEIKGRKTLATPAVRRLAMENNIKLS

Figure: Segments 71, 78, 79, 97, 98, and 99 placed along the sequence

  • Two kinds of conflicting segments

    • Overlap: the segments cover overlapping sequence positions (e.g. segments 71 and 99)

    • Shared spin system: the segments use the same spin system (e.g. segments 78 and 79 both contain spin system 1)


A Graph Model for Spin System Linking

  • G(V, E)

    • V: a set of nodes (segments).

    • E: edges (u, v) with u, v ∈ V such that u and v conflict.

  • Goal

    • Assign as many non-conflicting segments as possible, i.e. find a maximum independent set of G.
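A minimal sketch of building G, using the four segments from the example of G as input. Each segment is a pair (start position, spin systems); the start positions are invented for illustration, since the slides do not give them, and the function names are hypothetical.

```python
from itertools import combinations

Segment = tuple[int, tuple[str, ...]]  # (start position, spin systems in order)

def conflict(a: Segment, b: Segment) -> bool:
    """Two segments conflict if their positions overlap or they share a spin system."""
    (sa, spins_a), (sb, spins_b) = a, b
    overlap = sa < sb + len(spins_b) and sb < sa + len(spins_a)
    return overlap or bool(set(spins_a) & set(spins_b))

def conflict_graph(segments: dict[str, Segment]) -> set[frozenset[str]]:
    """Edge set E of G(V, E); the vertices V are the segment names."""
    return {frozenset((u, v))
            for (u, a), (v, b) in combinations(segments.items(), 2)
            if conflict(a, b)}

segments = {  # start positions are hypothetical
    "Seg1": (0, ("SP12", "SP13", "SP14")),
    "Seg2": (10, ("SP9", "SP13", "SP20", "SP4")),
    "Seg3": (2, ("SP8", "SP15", "SP21")),
    "Seg4": (20, ("SP7", "SP1", "SP15", "SP3")),
}
print(sorted(sorted(e) for e in conflict_graph(segments)))
# [['Seg1', 'Seg2'], ['Seg1', 'Seg3'], ['Seg3', 'Seg4']]
```

Seg1/Seg2 and Seg3/Seg4 conflict through the shared spin systems SP13 and SP15; Seg1/Seg3 conflict only because of the invented overlapping start positions.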


An Example of G

Segment1: SP12->SP13->SP14

Segment2: SP9->SP13->SP20->SP4

Segment3: SP8->SP15->SP21

Segment4: SP7->SP1->SP15->SP3

  • Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE

Figure: conflict graph on Seg1 to Seg4, with an edge labeled SP13 (Seg1 and Seg2 share SP13), an edge labeled SP15 (Seg3 and Seg4 share SP15), and two edges labeled Overlap


Segment weight

  • The longer a segment is, the higher its weight.

  • The less frequent a segment is, the higher its weight.
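The slides state only that longer and less frequent segments weigh more. One hypothetical weight with both properties is length divided by frequency, where frequency is assumed here to mean the number of places in the sequence at which the segment fits:

```python
def segment_weight(num_spin_systems: int, frequency: int) -> float:
    """Hypothetical weight: grows with segment length and shrinks with frequency.

    `frequency` is assumed to be the number of places in the protein sequence
    where the segment can be placed.
    """
    if num_spin_systems < 1 or frequency < 1:
        raise ValueError("length and frequency must be positive")
    return num_spin_systems / frequency

print(segment_weight(4, 1), segment_weight(4, 2))  # 4.0 2.0
```

Any function that increases with length and decreases with frequency would satisfy the two rules equally well; the ratio is just the simplest choice.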


Find Maximum Weight Independent Set of G

  • Boppana, R. and M. M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
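As a simpler stand-in for the cited approximation algorithm, the sketch below uses a different, well-known greedy heuristic for maximum weight independent set: repeatedly take the vertex with the largest weight / (degree + 1) and delete it together with its neighbours. It is not the Boppana and Halldórsson algorithm. The graph is the conflict graph of the example of G with hypothetical start positions, and the weights are hypothetical.

```python
def greedy_mwis(weights: dict[str, float], edges: set[frozenset[str]]) -> set[str]:
    """Greedy heuristic for a maximum weight independent set.

    Repeatedly picks the remaining vertex with the largest
    weight / (remaining degree + 1), then removes it and its neighbours.
    The result is always independent but not guaranteed to be optimal.
    """
    nbrs: dict[str, set[str]] = {v: set() for v in weights}
    for edge in edges:
        u, v = tuple(edge)
        nbrs[u].add(v)
        nbrs[v].add(u)
    remaining = set(weights)
    chosen: set[str] = set()
    while remaining:
        best = max(remaining,
                   key=lambda v: (weights[v] / (len(nbrs[v] & remaining) + 1), v))
        chosen.add(best)
        remaining -= nbrs[best] | {best}
    return chosen

edges = {frozenset({"Seg1", "Seg2"}), frozenset({"Seg1", "Seg3"}),
         frozenset({"Seg3", "Seg4"})}
weights = {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4}  # hypothetical weights
print(sorted(greedy_mwis(weights, edges)))  # ['Seg2', 'Seg4']
```

Ties on the ratio are broken by vertex name so the output is deterministic.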


An Iterative Approach

  • We perform spin system generation and linking iteratively.

  • Three stages.


First Stage

  • Generate perfect spin systems;

    • Perform spin system concatenation on the newly generated perfect spin systems to generate segments;

    • Retain segments that contain at least 3 spin systems;

    • Perform MaxIndSet on the segments;

    • Drop spin systems (and related peaks) that are used in the resulting segments.
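The tail of each stage (keep segments with at least 3 spin systems, run MaxIndSet, drop the spin systems that were used) can be sketched as below. To keep the toy example self-contained, conflicts are reduced to shared spin systems, a segment's weight is its length, and MaxIndSet is solved by brute force. All three are simplifying assumptions, and `finish_stage` is a hypothetical name.

```python
from itertools import combinations

def exact_mwis(nodes, weight, conflicts):
    """Brute-force maximum weight independent set (tiny inputs only)."""
    best, best_w = (), 0
    for r in range(1, len(nodes) + 1):
        for combo in combinations(nodes, r):
            if all(not conflicts(a, b) for a, b in combinations(combo, 2)):
                w = sum(weight(v) for v in combo)
                if w > best_w:
                    best, best_w = combo, w
    return set(best)

def finish_stage(segments, unused, min_len=3):
    """Keep segments with >= min_len spin systems, select a non-conflicting
    subset (MaxIndSet), and drop the spin systems they use from `unused`."""
    kept = [tuple(s) for s in segments if len(s) >= min_len]
    chosen = exact_mwis(kept, weight=len,
                        conflicts=lambda a, b: bool(set(a) & set(b)))
    used = {sp for seg in chosen for sp in seg}
    return chosen, unused - used

segments = [("SP12", "SP13", "SP14"), ("SP9", "SP13", "SP20", "SP4"),
            ("SP8", "SP15", "SP21"), ("SP1", "SP2")]
unused = {sp for seg in segments for sp in seg}
chosen, unused = finish_stage(segments, unused)
print(sorted(chosen))  # [('SP8', 'SP15', 'SP21'), ('SP9', 'SP13', 'SP20', 'SP4')]
print(sorted(unused))  # ['SP1', 'SP12', 'SP14', 'SP2']
```

The spin systems left in `unused` are the ones the next stage can reuse, together with its newly generated weak or severe false negative spin systems.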


Second Stage

  • Generate weak false negative spin systems.

    • Perform segment extension on the resulting segments of the first stage (using the unused perfect and the newly generated weak false negative spin systems);

    • Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;

    • Retain segments that contain at least 3 spin systems;

    • Perform MaxIndSet on the segments;

    • Drop spin systems (and related peaks) that are used in the resulting segments.


Third Stage

  • Generate severe false negative spin systems.

    • Perform segment extension on the resulting segments of the second stage (using the unused perfect and weak false negative spin systems, as well as the newly generated severe false negative ones);

    • Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;

    • Retain segments that contain at least 3 spin systems;

    • Perform MaxIndSet on the segments.


Segment Extension

Figure: segment 109 on the sequence ….FKJJREKL…. is extended with new spin systems (1, 2, ..., 45) into a new segment 109


Segment Extension

Figure: segments 71, 77, 78, 97, and 99 placed on the sequence DGRGEKGRKTLATPAVRRLAMENNIKLS, the extended segments 97' and 99', and the MaxIndSet selection among them


Outline

  • Introduction

  • Method

  • Experimental Results

  • Conclusion


Experimental Results

  • Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:

    • Average precision: 87.5%

    • Average recall: 73.1%

  • Perfect data from BMRB: 99.1%


Real Wet-Lab Datasets

  • The two datasets were obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica, Taiwan.


Experimental Results on Real Data


Outline

  • Introduction

  • Method

  • Experimental Results

  • Conclusion


Conclusion

  • We model the backbone assignment problem as a constraint satisfaction problem

  • This problem is solved using a natural language parsing technique (combining bottom-up and top-down approaches)

  • The same approach seems to work for a large class of noise reduction problems that are discrete in nature


A genetic algorithm for NMR backbone resonance assignment (I)

  • Randomly generate a population of chromosomes

    • Each chromosome represents a possible backbone resonance assignment

  • Fitness function

    • Evaluate the fitness of each chromosome according to the connectivity between adjacent amino acids


A genetic algorithm for NMR backbone resonance assignment (II)

  • Crossover operation

    • An offspring inherits different connected blocks from parents

  • Mutation operation

    • Create a new connected block from any position to increase population diversity


Generation of a random chromosome

  • Step 1. Randomly select a position x

  • Step 2. Randomly select a spin system group (SSGroup) i from CL(x)

  • Step 3. Extend connected fragments from i to both sides using the adjacency lists until no further extension can be found.

  • Step 4. Repeat Steps 1 to 3 until all positions are assigned.
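A minimal sketch of Steps 1 to 4, assuming `cl[x]` is the candidate list CL(x) of spin system groups for position x, `succ`/`pred` are the adjacency lists of groups that may follow or precede a group, each group is used at most once per chromosome, and a position with no usable candidate is left empty (`None`). All names are hypothetical.

```python
import random

def random_chromosome(n, cl, succ, pred, rng=random):
    """Build one chromosome: one spin system group (or None) per position."""
    chrom = [None] * n
    decided = set()  # positions already handled
    used = set()     # groups already placed
    while len(decided) < n:  # Step 4: repeat until every position is decided
        # Step 1: randomly select an undecided position x.
        x = rng.choice([p for p in range(n) if p not in decided])
        decided.add(x)
        options = [g for g in cl.get(x, []) if g not in used]
        if not options:
            continue  # no usable candidate: leave position x empty
        # Step 2: randomly select a group i from CL(x).
        i = rng.choice(options)
        chrom[x] = i
        used.add(i)
        # Step 3: extend to the right (succ), then to the left (pred).
        for step, adj in ((1, succ), (-1, pred)):
            pos, cur = x + step, i
            while 0 <= pos < n and pos not in decided:
                nxt = [g for g in adj.get(cur, [])
                       if g not in used and g in cl.get(pos, [])]
                if not nxt:
                    break
                cur = rng.choice(nxt)
                chrom[pos] = cur
                used.add(cur)
                decided.add(pos)
                pos += step
    return chrom

cl = {0: ["A"], 1: ["B"], 2: ["C"], 3: ["D"]}
succ = {"A": ["B"], "B": ["C"], "C": ["D"]}
pred = {"B": ["A"], "C": ["B"], "D": ["C"]}
print(random_chromosome(4, cl, succ, pred, random.Random(0)))  # ['A', 'B', 'C', 'D']
```

Because every position has a single candidate in this toy input, any starting position grows into the same full chain.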


Fitness Evaluation

Building blocks: connected fragments

Fitness(ch) = the number of connected pairs in chromosome ch, together with their chemical shift differences.

Two principles:

1. The more connected pairs a chromosome has, the higher its score.

2. The smaller its chemical shift differences, the higher its score.
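One hypothetical way to honour both principles is to score a chromosome by the pair (number of connected pairs, minus the total shift difference over those pairs). Python compares tuples lexicographically, so more connected pairs always win and smaller differences break ties. The thresholds and the shift layout are assumptions, not taken from the slides.

```python
from typing import Optional

# Hypothetical linking thresholds (ppm).
CA_T, CB_T = 0.2, 0.4

# shifts[group] = (ca, cb, ca_prev, cb_prev); Cb values may be None (glycine).
Shifts = dict[str, tuple[float, Optional[float], float, Optional[float]]]

def fitness(chrom: list[Optional[str]], shifts: Shifts) -> tuple[int, float]:
    """Score = (number of connected adjacent pairs, -total shift difference)."""
    pairs, total_diff = 0, 0.0
    for left, right in zip(chrom, chrom[1:]):
        if left is None or right is None:
            continue
        ca, cb, _, _ = shifts[left]
        _, _, ca_prev, cb_prev = shifts[right]
        d_ca = abs(ca - ca_prev)
        # Cb is compared only when both values exist.
        d_cb = abs(cb - cb_prev) if cb is not None and cb_prev is not None else 0.0
        if d_ca < CA_T and d_cb < CB_T:
            pairs += 1
            total_diff += d_ca + d_cb
    return pairs, -total_diff

shifts = {"A": (50.0, 30.0, 0.0, None),
          "B": (55.0, 35.0, 50.05, 30.1),
          "C": (60.0, 40.0, 55.1, 35.0)}
print(fitness(["A", "B", "C"], shifts)[0])  # 2
print(fitness(["A", "C", "B"], shifts)[0])  # 0
```

A weighted sum of the two terms would also work, but it needs a trade-off constant; the tuple ordering avoids choosing one.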


Crossover Operation

Figure: two parents are cut at a cutting site and recombined into an offspring
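A minimal sketch of a one-point crossover at a cutting site, assuming chromosomes are equal-length lists of group names (or `None`) and that a group may appear only once; groups already inherited from the first parent are dropped from the second parent's part. The function is a hypothetical illustration.

```python
import random

def crossover(p1, p2, cut=None, rng=random):
    """One-point crossover at a cutting site.

    The offspring takes p1 before the cut and p2 from the cut on; groups
    already inherited from p1 are replaced by None so none appears twice.
    """
    if len(p1) != len(p2) or len(p1) < 2:
        raise ValueError("parents must have the same length >= 2")
    if cut is None:
        cut = rng.randrange(1, len(p1))
    left = list(p1[:cut])
    seen = {g for g in left if g is not None}
    return left + [g if g not in seen else None for g in p2[cut:]]

print(crossover(["A", "B", None, None], ["X", "Y", "C", "D"], cut=2))
# ['A', 'B', 'C', 'D']
```

In the example the offspring inherits the connected block A-B from one parent and C-D from the other.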


Mutation operation

  • Once a position mutates, the following positions also mutate so that a connected fragment is produced.

Figure: mutation point
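A minimal sketch of this mutation, under the same hypothetical chromosome layout: position x gets a new group from CL(x), then the following positions are re-picked along the adjacency list for as long as a linkable candidate exists. Copies of the new groups elsewhere in the chromosome are cleared, assuming each group is used once. `mutate`, `cl`, and `succ` are hypothetical names.

```python
import random

def mutate(chrom, x, cl, succ, rng=random):
    """Mutate position x and the following positions into a connected fragment.

    cl[pos]: candidate groups for a position; succ: groups that may follow a group.
    """
    new = list(chrom)
    options = [g for g in cl.get(x, []) if g != chrom[x]]
    if not options:
        return new  # nothing to mutate to
    cur = rng.choice(options)
    new[x] = cur
    pos = x + 1
    # Following positions also mutate while a linkable candidate exists.
    while pos < len(new):
        nxt = [g for g in succ.get(cur, [])
               if g in cl.get(pos, []) and g not in new[x:pos]]
        if not nxt:
            break
        cur = rng.choice(nxt)
        new[pos] = cur
        pos += 1
    # Clear copies of the new fragment's groups elsewhere (each group used once).
    fragment = set(new[x:pos])
    for p in [*range(x), *range(pos, len(new))]:
        if new[p] in fragment:
            new[p] = None
    return new

cl = {0: ["A"], 1: ["B", "X"], 2: ["C", "Y"], 3: ["D"]}
succ = {"X": ["Y"], "Y": ["D"]}
print(mutate(["A", "B", "C", "D"], 1, cl, succ))  # ['A', 'X', 'Y', 'D']
```

The mutation at position 1 propagates through positions 2 and 3, producing the connected fragment X-Y-D.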


Experimental Results

  • Accuracy on two real datasets

    • SBD: 95.1% (FP: 67%)

    • LBD: 100% (FP: 48%)

  • The average accuracy on perfect BMRB datasets (902 proteins)

