Graph algorithm in NMR backbone assignment

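Jia-Ming Chang 0509 Graph Algorithms and Their Applications to Bioinformatics. Graph algorithm in NMR backbone assignment. Determine Protein Structure. X-ray: the wavelength is about 1 Å, close to the distance between atoms; it is used to study the behavior of molecules in the crystalline state and to determine crystal structures, including protein structures. X-ray and structural biology: X-ray diffraction is used to determine the spatial position of every group and atom in a highly purified, crystallized protein.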


Presentation Transcript


  1. Jia-Ming Chang 0509 Graph Algorithms and Their Applications to Bioinformatics Graph algorithm in NMR backbone assignment

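  2. Determine Protein Structure • X-ray • The wavelength is about 1 Å, close to the distance between atoms. • Studies the behavior of molecules in the crystalline state. • Determines crystal structures, including protein structures. • X-ray and structural biology: X-ray diffraction is used to determine the spatial position of every group and atom in a highly purified, crystallized protein. • Nuclear magnetic resonance (NMR) • NMR involves absorption by atomic nuclei. Certain nuclei have spin and a magnetic moment, so when they are exposed to a strong magnetic field they absorb electromagnetic radiation; this results from the energy-level splitting induced by the field. Scientists also found that the molecular environment affects how nuclei in a magnetic field absorb radio waves, and this property is used to analyze molecular structure. (Figure: AVANCE 800 AV, IBMS, Sinica)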

  3. NMR – Nuclear Spin (1/5)

  4. NMR – Nuclear Spin (2/5)

  5. NMR - Magnetic Field (3/5)

  6. NMR – Resonance (4/5)

  7. NMR – Chemical Shift (5/5)

  8. Chemical Shift Assignment (1/2) • Find out the chemical shift for each atom • Backbone: Ca, Cb, C’, N, NH • HSQC, CBCANH, CBCA(CO)NH (Figure: one amino acid, with atoms labeled N, H, Ca, CO, Cb H2, Cg H2, Cd H3)

  9. Chemical Shift Assignment (2/2) (Figure: backbone chain -N-C-C-N-C-C-N-C-C-N-C-C- with CH3 and H-C-H side-chain groups, annotated with chemical shifts in ppm: 18-23, 55-60, 17-23, 30-35, 16-20, 31-34, 19-24)

  10. HSQC Spectra • HSQC peaks (1 chemical shift for an amino acid)

  11. CBCA(CO)NH Spectra CBCA(CO)NH peaks (2 chemical shifts for one amino acid)

  12. CBCANH Spectra • CBCANH peaks (4 chemical shifts for one amino acid) • Peak signs: Ca (+), Cb (-)

  13. A Dataset Example • HSQC • HNCACB • CBCA(CO)NH (Figure: spectra plotted on H and N axes)

  14. A Perfect Spin System Group (Figure labels: CBCA(CO)NH peaks at i-1; CBCANH peaks for Ca and Cb)

  15. Coding • Translate the target protein sequence and spin systems into coding sequences based on the following table. Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376.

  16. Backbone Assignment • Goal • Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone. • General approaches • Generate spin systems • A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb). • Link spin systems

  17. Backbone Assignment DGRIGEIKGRKTLATPAVRRLAMENNIKLS

  18. Blind Men’s Elephant • We cannot directly “see” the positions of these atoms (the 3D structure). • But we can measure a set of parameters (with constraints) on these atoms, which can help us infer their coordinates. • Each experiment can only determine a subset of the parameters (with noise). • To combine the parameters of different experiments, we need to stitch them together.

  19. A Peculiar Parking Lot (valet parking) • Information you have: the make of your car and, approximately, of the car parked in front of you. • Together with others, try to identify as many cars as possible (maximizing the overall satisfaction).

  20. Ambiguities • All 4-point experiments are mixed together • All 2-point experiments are mixed together • Each spin system can be mapped to several amino acids in the protein sequence • False positives, false negatives

  21. Multiple Candidates • One spin system may be assigned to many places in a protein sequence. • Protein sequence: AKFERQHMDSSTSRNLTKDR (Figure: a spin system (SS) and its possible places marked along the sequence)

  22. False Positives and False Negatives • False positives • Noise with high intensity • Produce fake spin systems • False negatives • Peaks with low intensity • Missing peaks • In real wet-lab data, nearly 50% of the data is noise (false positives).

  23. False Positive & False Negative • HSQC • HNCACB • CBCA(CO)NH (Figure: perfect, false negative, and false positive cases plotted on H and N axes)

  24. Ambiguous Spin System Two possible spin systems

  25. Spin System Group • Nearest-neighbor grouping (TATAPRO, RIBRA, GASA) • HSQC • HNCACB • CBCA(CO)NH (Figure: spectra plotted on H and N axes)
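
The nearest-neighbor grouping named on this slide can be sketched roughly as follows. This is a minimal illustration, not the procedure used by TATAPRO, RIBRA, or GASA: the `SpinSystem` class, the `group_peaks` function, and the tolerances `tol_h` and `tol_n` are all assumptions introduced here. Each 3D peak (H, N, C) is attached to the closest HSQC peak (H, N) within the tolerances, and peaks with no nearby HSQC root are left unassigned.

```python
from dataclasses import dataclass, field

@dataclass
class SpinSystem:
    h: float  # amide proton shift (ppm) of the HSQC peak
    n: float  # amide nitrogen shift (ppm) of the HSQC peak
    carbons: list = field(default_factory=list)  # carbon shifts grouped here

def group_peaks(hsqc, peaks_3d, tol_h=0.03, tol_n=0.3):
    """Attach each 3D peak (h, n, c) to the nearest HSQC peak within tolerance.

    tol_h and tol_n are hypothetical tolerances in ppm.
    """
    systems = [SpinSystem(h, n) for h, n in hsqc]
    for h, n, c in peaks_3d:
        best, best_d = None, None
        for s in systems:
            dh, dn = abs(s.h - h), abs(s.n - n)
            if dh > tol_h or dn > tol_n:
                continue  # outside the tolerance box of this HSQC peak
            d = (dh / tol_h) ** 2 + (dn / tol_n) ** 2  # scaled squared distance
            if best_d is None or d < best_d:
                best, best_d = s, d
        if best is not None:
            best.carbons.append(c)
    return systems

# Illustrative (made-up) peaks: two HSQC roots and four 3D peaks.
hsqc = [(8.10, 120.0), (7.50, 115.0)]
peaks_3d = [(8.11, 120.1, 55.2), (8.09, 119.9, 38.6),
            (7.51, 115.0, 44.5), (9.00, 130.0, 60.0)]
systems = group_peaks(hsqc, peaks_3d)
```

The last 3D peak has no HSQC peak within tolerance, so it is dropped, which is one simple way a noise peak can be filtered out at this stage.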

  26. Spin System Linking • Goal • Link spin systems into segments that are as long as possible. • Constraints • Each spin system is uniquely assigned to a position of the target protein sequence. • Two spin systems are linked only if the differences between their intra-residue and inter-residue chemical shifts are less than predefined thresholds.

  27. Previous Approaches • Constrained bipartite matching problem* • Can’t deal with ambiguous links (Figure: a legal matching and an illegal matching under constraints) *Xu Y, Xu D, Kim D, Olman V, Razumovskaya J, Jiang T. Automated assignment of backbone NMR peaks using constrained bipartite matching. Computing in Science & Engineering 2002;4(1):50-62.

  28. Natural Language Processing: Noise or Ambiguity? • Speech recognition: homophone selection • Target sentence: 台北市一位小孩走失了 ("A child has gone missing in Taipei City") • Competing homophone candidates: 台北市 (Taipei City), 小孩 (child), 台北 (Taipei), 適宜 (suitable), 走失 (go missing), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (shift position)
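
The threshold constraint above can be written as a small predicate. The sketch below assumes a spin system stores the Ca and Cb shifts of the previous residue (inter-residue) and of its own residue (intra-residue); the `Spin` type, the `can_link` function, and the 0.5 ppm thresholds are hypothetical, and the shift values are illustrative.

```python
from typing import NamedTuple

class Spin(NamedTuple):
    ca_prev: float  # Ca shift of residue i-1 (inter-residue)
    cb_prev: float  # Cb shift of residue i-1 (0.0 when absent, e.g. glycine)
    ca: float       # Ca shift of residue i (intra-residue)
    cb: float       # Cb shift of residue i

def can_link(a, b, tol_ca=0.5, tol_cb=0.5):
    """True if spin system `a` may directly precede `b` on the sequence.

    The intra-residue shifts of `a` must agree with the inter-residue
    shifts of `b` within the (assumed) thresholds tol_ca and tol_cb.
    """
    return abs(a.ca - b.ca_prev) < tol_ca and abs(a.cb - b.cb_prev) < tol_cb

s1 = Spin(55.266, 38.675, 44.555, 0.0)
s2 = Spin(44.417, 0.0, 55.043, 30.04)
s3 = Spin(55.356, 29.782, 60.044, 37.541)

chain_ok = can_link(s1, s2) and can_link(s2, s3)  # s1 -> s2 -> s3
reverse_ok = can_link(s2, s1)                     # Cb shifts disagree
```

Because the test is a tolerance check rather than an exact match, several spin systems may qualify as the successor of the same one, which is the source of the ambiguous links discussed in the deck.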

  29. An Error-Tolerant Algorithm

  30. Phrase, Sentence Combination

  31. Spin System Positioning • We assign spin system groups to a protein sequence according to their codes. • Residue codes: D 50, G 10, R 40, I 50|51 • Spin systems and their codes: • 55.266 38.675 44.555 0 => 50 10 • 44.417 0 55.043 30.04 => 10 40 • 44.417 0 30.665 28.72 => 10 40 • 55.356 29.782 60.044 37.541 => 40 50

  32. Link Spin System groups • Sequence: D G R I • Segment 1, Segment 2, Segment 3 (Figure: the spin systems 55.266 38.675 44.555 0; 44.417 0 55.043 30.04; 44.417 0 30.665 28.72; and 55.356 29.782 60.044 37.541 linked into segments along D G R I)

  33. Iterative Concatenation • Sequence: DGRI….FKJJREKL • Spin systems: 1, 2, …, 56, … (Figure: Step 1, Step 2, …, Step n-1, Step n; spin systems are concatenated step by step into segments such as Segment 1, Segment 2, Segment 31, …, Segment 78, Segment 79, Segment 99)

  34. Conflict Segments • Sequence: DGRIGEIKGRKTLATPAVRRLAMENNIKLS (Figure: Segments 71, 78, 79, 97, 98, and 99 placed along the sequence) • Two kinds of conflict segments • Overlap (e.g. segment 71, segment 99) • Use the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
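
As a rough illustration of positioning by code, the sketch below looks up every place in a sequence where a coded spin system could sit. Only the four residue codes visible on this slide (D 50, G 10, R 40, I 50|51) are used; the full coding table is not reproduced in the transcript, so `RESIDUE_CODES` is deliberately partial, and the function name `candidate_positions` is an assumption.

```python
# Partial code table: only the residues shown on the slide.
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}

def candidate_positions(sequence, prev_code, cur_code, table=RESIDUE_CODES):
    """Return 0-based indices i where residue i-1 admits prev_code
    and residue i admits cur_code. Residues missing from the table
    never match."""
    hits = []
    for i in range(1, len(sequence)):
        prev_ok = prev_code in table.get(sequence[i - 1], set())
        cur_ok = cur_code in table.get(sequence[i], set())
        if prev_ok and cur_ok:
            hits.append(i)
    return hits

seq = "DGRIG"  # first five residues of the example sequence
pos_50_10 = candidate_positions(seq, 50, 10)
pos_10_40 = candidate_positions(seq, 10, 40)
pos_40_50 = candidate_positions(seq, 40, 50)
```

Even on this short prefix the code pair (50, 10) fits two places, after D and after I, so a single spin system can have multiple candidate positions.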
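
The two kinds of conflict can be checked with two small predicates. In this sketch a segment is assumed to be a start position on the sequence plus one spin system id per residue; the `Segment` type and the start positions in the example are hypothetical, chosen only to exercise both cases.

```python
from typing import NamedTuple, Tuple

class Segment(NamedTuple):
    start: int               # 0-based position of the first residue covered
    spins: Tuple[int, ...]   # spin system ids, one per covered residue

def overlaps(a, b):
    """True if the two segments cover at least one common sequence position."""
    return a.start < b.start + len(b.spins) and b.start < a.start + len(a.spins)

def shares_spin(a, b):
    """True if the two segments use the same spin system."""
    return bool(set(a.spins) & set(b.spins))

def in_conflict(a, b):
    return overlaps(a, b) or shares_spin(a, b)

seg_a = Segment(0, (12, 13, 14))
seg_b = Segment(10, (9, 13, 20, 4))   # shares spin system 13 with seg_a
seg_c = Segment(2, (8, 15, 21))       # overlaps seg_a at position 2
seg_d = Segment(20, (7, 1, 3))        # far away, no shared spin system
```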

  35. Independent Set • A subset S of vertices such that no two vertices in S are connected. (Source: www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt)

  36. Independent Set • A subset S of vertices such that no two vertices in S are connected. (Source: www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt)

  37. A Graph Model for Spin System Linking • G(V, E) • V: a set of nodes (segments). • E: {(u, v) | u, v ∈ V, u and v are in conflict}. • Goal: assign as many non-conflicting segments as possible => find the maximum independent set of G.

  38. An Example of G • Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE • Segment1: SP12->SP13->SP14 • Segment2: SP9->SP13->SP20->SP4 • Segment3: SP8->SP15->SP21 • Segment4: SP7->SP1->SP15->SP3 (Figure: conflict graph on Seg1, Seg2, Seg3, and Seg4, with edges labeled Overlap, SP13, and SP15)

  39. Segment weight • The longer a segment is, the higher its weight. • The less frequent a segment is, the higher its weight.
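
To make the graph model concrete, the sketch below builds G from a list of segments and a conflict predicate, then finds a maximum independent set by exhaustive search. Brute force is used only because it is easy to verify on a handful of nodes; it is not the method of the deck. The function names are assumptions, and the example uses only shared-spin-system conflicts because segment positions are not recoverable from the transcript.

```python
from itertools import combinations

def conflict_graph(segments, in_conflict):
    """Adjacency sets: node i is joined to node j when the segments conflict."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if in_conflict(segments[i], segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def max_independent_set(adj):
    """Exact search from the largest subset size down (exponential time)."""
    nodes = sorted(adj)
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return set(subset)
    return set()

# Four segments given as sets of spin system ids (SP numbers).
segments = [{12, 13, 14}, {9, 13, 20, 4}, {8, 15, 21}, {7, 1, 15, 3}]
adj = conflict_graph(segments, lambda a, b: bool(a & b))
chosen = max_independent_set(adj)
```

Here nodes 0 and 1 conflict through SP13 and nodes 2 and 3 through SP15, so at most one segment from each pair can be kept.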

  40. Find Maximum Weight Independent Set of G (1/2) Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2). (Figure labels: V, N(v), Head_N(v))

  41. Find Maximum Weight Independent Set of G (2/2) Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2). (Figure labels: V, I1, I2)
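
The figure labels on these two slides (a pivot v, its neighborhood N(v), and two candidate sets I1 and I2) match the shape of the recursive Ramsey procedure in the cited paper, which splits the graph around a pivot into neighbors and non-neighbors. The sketch below follows that recursion; comparing candidates by total weight instead of by size is an assumption made here to fit the "maximum weight" heading, and the weights (segment lengths) are hypothetical. It is a heuristic and may not return the optimum.

```python
def ramsey(nodes, adj, weight):
    """Return (clique, independent_set) found by pivot-based recursion.

    nodes: set of vertices still under consideration
    adj: dict vertex -> set of neighbors
    weight: dict vertex -> number (assumed weighting, e.g. segment length)
    """
    if not nodes:
        return set(), set()

    def total(s):
        return sum(weight[u] for u in s)

    v = min(nodes)              # deterministic pivot choice
    rest = nodes - {v}
    nbrs = rest & adj[v]        # N(v) within the current subgraph
    non_nbrs = rest - adj[v]    # vertices not adjacent to v
    c1, i1 = ramsey(nbrs, adj, weight)
    c2, i2 = ramsey(non_nbrs, adj, weight)
    clique = max(c1 | {v}, c2, key=total)   # v extends any clique in N(v)
    indep = max(i1, i2 | {v}, key=total)    # v extends any independent set outside N(v)
    return clique, indep

# Conflict graph with edges 1-2 and 3-4; weights are segment lengths (assumed).
adj = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
weight = {1: 3, 2: 4, 3: 3, 4: 4}
clique, indep = ramsey(set(adj), adj, weight)
```

The returned set is always independent by construction, but its weight depends on the pivot order, which is why the paper treats the procedure as an approximation.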

  42. An Iterative Approach • We perform spin system generation and linking iteratively. • Three stages. • Perfect spin systems • Weak false negative spin systems • Severe false negative spin systems

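  43. Segment Extension • Sequence: DGRGEKGRKTLATPAVRRLAMENNIKLS (Figure: segments 97, 78, 77, 99, and 71 and the extended segments 97‘ and 99‘ placed along the sequence, selected by MaxIndSet; remaining labels: 97 23 99 24 26 45 28 27 31 28 29 32 33)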