
Universal DNA Tag Systems: A combinational design scheme

Yao-lin Chang, Chi-hung Tsai, Han-yu Chuang, Yu-cheng Huang, Bo-j Chen


Presentation Transcript


  1. Universal DNA Tag Systems: A combinational design scheme. Yao-lin Chang, Chi-hung Tsai, Han-yu Chuang, Yu-cheng Huang, Bo-j Chen

  2. Motivation from biological needs by Yao-lin Chang

  3. cDNA microarray

  4. DNA Tag/AntiTag System (TAT) • [Figure: a reporter made of a target-specific part joined to a tag; the tag hybridizes to its antitag.] • Advantages • These universal components can be mass-produced, reducing manufacturing costs.

  5. Example: Genotyping • Sequence: ATTGGCTATTGCCCATCGGGAA • Given: the positions of the SNPs (shown in red on the slide) • Goal: determine the variants present in a given sample (e.g. GTG).

  6. Step 1 • [Figure: three tagged primers annealed to the template ATTGGCTATTGCCCATCGGGAA, each ending just next to one SNP position: CGATAACGGGTAGCCCTT + TAG1, ACGGGTAGCCCTT + TAG2, CTT + TAG3.]

  7. Step 2 • [Figure: the nucleotides A, C, G and T are supplied to the primer-template complexes.]

  8. Step 3 • [Figure: each primer is extended by the single base complementary to its SNP: C + CGATAACGGGTAGCCCTT + TAG1, A + ACGGGTAGCCCTT + TAG2, C + CTT + TAG3.]

  9. Step 4 • [Figure: the extended primers hybridize through TAG1, TAG2 and TAG3 to ANTITAG1, ANTITAG2 and ANTITAG3, respectively.]
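
As a sketch of what the four steps compute, the following Python replays the slide example as single-base primer extension. It assumes, beyond what the slides state, that each primer is drawn aligned under the template (plain complement, not reverse complement) and that extension adds exactly one base on the primer's left end, opposite the SNP; `single_base_extension` is a hypothetical helper name.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement, without reversing (slide layout)."""
    return "".join(COMPLEMENT[b] for b in seq)


def single_base_extension(template: str, primer: str) -> tuple[int, str, str]:
    """Return (snp_index, variant, added_base) for one tagged primer.

    Assumption: the primer is drawn aligned under the template, so its
    complement occurs verbatim in the template, and extension adds one base
    on the primer's left, opposite the template position just before the match.
    """
    site = template.find(complement(primer))
    if site <= 0:
        raise ValueError("primer does not anneal with a free SNP position on its left")
    snp = site - 1
    variant = template[snp]
    return snp, variant, COMPLEMENT[variant]


template = "ATTGGCTATTGCCCATCGGGAA"
primers = {"TAG1": "CGATAACGGGTAGCCCTT", "TAG2": "ACGGGTAGCCCTT", "TAG3": "CTT"}
results = {tag: single_base_extension(template, p) for tag, p in primers.items()}
for tag, (snp, variant, added) in results.items():
    print(tag, snp, variant, added)
print("".join(variant for _, variant, _ in results.values()))  # GTG
```

Under these assumptions the added bases are C, A and C, and the template bases opposite them spell the variants GTG from the example.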

  10. Design problem • It is desirable to have as many tags as possible. • If too many tags are used, cross-hybridization may happen. • [Figure: cross-hybridization example involving Tag1 and AntiTag1.]

  11. Previous Work • Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Research, 1997. • Methods for sorting polynucleotides using oligonucleotide tags. US Patent, 1997. • Universal DNA microarray method for multiplex detection of low abundance point mutations. J. Mol. Biol., 1999.

  12. A simple thermodynamic model of DNA duplex formation

  13. Thermodynamic model(1) • DNA duplexes are held together by weak hydrogen bonds between Watson-Crick complementary nucleotides. • The energy required to melt a DNA duplex depends on • strand length • C-G content (C-G pairs vs. A-T pairs)

  14. Thermodynamic model(2) • Melting temperature: at the melting temperature t = tm(U, V), half of the strands U and V are in single-stranded form and half occur in duplexes. • TAT System

  15. [Figure: TAG1 strands hybridized to their matching antitag (Anti-1) and to a non-matching antitag (Anti-2).]

  16. Thermodynamic model(3) • Melting temperature estimation: the 2-4 rule (commonly used for short oligonucleotides) estimates tm as approximately twice the number of A-T pairs plus four times the number of C-G pairs (in °C).

  17. Combinatorial Tag Design Problem • Given c and h, we call a set T of tags a valid c-h code if • each tag has a weight of h or more • any substring of weight c or more occurs at most once. The weight of a tag $s = a_1 a_2 \dots a_k$ is $w(s) = \sum_{i=1}^{k} w(a_i)$, where w(A)=w(T)=1 and w(C)=w(G)=2.
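
The 2-4 rule ties directly to this weight function: each A or T adds 2 degrees and each C or G adds 4, so the 2-4 estimate for a duplex of s is exactly 2·w(s). The minimal sketch below checks this on two tags from the valid 4-10 code; `tm_2_4` is a hypothetical helper, and the rule itself is only a rough estimate for short oligonucleotides.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s: str) -> int:
    """w(s) as defined on the slides: A/T count 1, C/G count 2."""
    return sum(WEIGHT[ch] for ch in s)


def tm_2_4(s: str) -> int:
    """2-4 (Wallace) rule: 2 degrees per A-T pair plus 4 degrees per C-G pair."""
    at_pairs = s.count("A") + s.count("T")
    cg_pairs = s.count("C") + s.count("G")
    return 2 * at_pairs + 4 * cg_pairs


for tag in ["GACCAAT", "CATTATCA"]:
    # Both columns agree: the 2-4 estimate equals twice the weight.
    print(tag, tm_2_4(tag), 2 * weight(tag))
```

Because the two quantities differ only by a factor of 2, thresholds on melting temperature can be stated as thresholds on weight, which is how c and h are used below.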

  18. Combinatorial Tag Design Problem(cont.) • A valid c-h code corresponds to a solution of the TAT design problem. • Our goal is to find a maximum valid c-h code (i.e., one that contains as many tags as possible) for given c and h.

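  19. [Figure: TAG1, TAG3, Anti-2 and Anti-3 on a scale marked c-1, c (weight of a substring) and h (minimum tag weight).] Between these levels, no binding through any shared substring can persist, while the temperature is still too low to break the binding of any tag to its antitag.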

  20. Tag-AntiTag System design problem by Chi-hung Tsai

  21. TAT system design problem • To construct a TAT system with a maximum number of tag-antitag pairs such that the following properties are satisfied: • For each tag-antitag pair (U, Ū), the melting temperature tm(U, Ū) ≥ H. • For any two distinct tags U and V and any common substring x of U and V, tm(x, x̄) < C.

  22. Definition • The weight of a string $s = a_1 a_2 \dots a_k$ is $w(s) = \sum_{i=1}^{k} w(a_i)$, where w(A)=w(T)=1, w(C)=w(G)=2 • Given c, h, we call a set of strings a valid c-h code if the following two conditions are satisfied: • Condition 1: Each tag has a weight of h or more. • Condition 2: Any substring of weight c or more occurs at most once.

  23. A valid 4-10 code with 12 tags GACCAAT CAGCTAT GTCGATA CTGGTTA CATTATCA GAAATTCT CTTAATGA GTATTTGT ATATAGTG TAAAACTC AATAAGAG TTTTACAC
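
A brute-force checker makes the two conditions concrete. `is_valid_code` is a hypothetical helper written for illustration (quadratic in tag length, which is fine for short tags), not the authors' code; it accepts the 12-tag example and rejects it once a tag is duplicated.

```python
from collections import Counter

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s: str) -> int:
    return sum(WEIGHT[ch] for ch in s)


def is_valid_code(tags: list[str], c: int, h: int) -> bool:
    """Check Conditions 1 and 2 of a valid c-h code by brute force."""
    # Condition 1: every tag has weight h or more.
    if any(weight(t) < h for t in tags):
        return False
    # Condition 2: every substring of weight >= c occurs at most once overall,
    # counting repeats inside one tag as well as across different tags.
    seen: Counter[str] = Counter()
    for t in tags:
        for i in range(len(t)):
            acc = 0
            for j in range(i, len(t)):
                acc += WEIGHT[t[j]]
                if acc >= c:
                    seen[t[i:j + 1]] += 1
    return all(count <= 1 for count in seen.values())


CODE_4_10 = ("GACCAAT CAGCTAT GTCGATA CTGGTTA CATTATCA GAAATTCT "
             "CTTAATGA GTATTTGT ATATAGTG TAAAACTC AATAAGAG TTTTACAC").split()
print(is_valid_code(CODE_4_10, c=4, h=10))                # True
print(is_valid_code(CODE_4_10 + ["GACCAAT"], c=4, h=10))  # False: repeated tag
```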

  24. Definition: c-token • Definition: we call a string t a c-token if w(t) >= c, but t does not properly contain a suffix of weight >= c.

  25. Definition • The weight of a string $s = a_1 a_2 \dots a_k$ is $w(s) = \sum_{i=1}^{k} w(a_i)$, where w(A)=w(T)=1, w(C)=w(G)=2 • Given c, h, we call a set of strings a valid c-h code if the following two conditions are satisfied: • Condition 1: Each tag has a weight of h or more. • Condition 2’: Any c-token occurs at most once.

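  26. Tail weight • Tag T: GACCAAT (c = 4) • Each token with its tail and the tail's weight: GAC (tail C, weight 2); CC (tail C, weight 2); CCA (tail A, weight 1); CAA (tail A, weight 1); CAAT (tail T, weight 1) • Tail weight of T: 2+2+1+1+1 = 7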
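
The c-token and tail-weight definitions can be computed directly: for each position j, the shortest suffix ending at j with weight ≥ c is exactly the c-token ending there, because all of its proper suffixes weigh less than c. The sketch below, with hypothetical helper names, reproduces the tokens and the tail weight 7 of GACCAAT for c = 4.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def c_tokens(tag: str, c: int) -> list[str]:
    """For each end position j, the shortest suffix tag[i..j] with weight >= c.

    Positions in the initial light prefix (weight < c) terminate no token.
    """
    out = []
    for j in range(len(tag)):
        acc = 0
        for i in range(j, -1, -1):
            acc += WEIGHT[tag[i]]
            if acc >= c:
                out.append(tag[i:j + 1])
                break
    return out


def tail_weight(tag: str, c: int) -> int:
    """Sum of the weights of the characters that terminate a c-token."""
    return sum(WEIGHT[tok[-1]] for tok in c_tokens(tag, c))


print(c_tokens("GACCAAT", 4))     # ['GAC', 'CC', 'CCA', 'CAA', 'CAAT']
print(tail_weight("GACCAAT", 4))  # 7
```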

  27. Lemma 1: Tag tail weight • In the example, all characters of T except the first two terminate a token and thus contribute their weight to the tail weight of T. • In general, the maximal prefix that does not contain a suffix of weight ≥ c has a total weight of at most c-1, and every later character terminates a token; so a tag of weight ≥ h has a tail weight of at least h-(c-1). • Lemma 1: Any tag in a valid c-h code has a tail weight of at least h-c+1.

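  28. Definition • <n> denotes the set of strings of weight n ∈ N • $G_n$ denotes the number of such strings • $G_1 = 2$: A, T • $G_2 = 6$: AA, AT, TA, TT, C, G • $G_n = 2G_{n-1} + 2G_{n-2}$ for n ≥ 3, since a string of weight n ends either in a weak character (2 choices, preceded by a string of weight n-1) or in a strong character (2 choices, preceded by a string of weight n-2)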
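
The recurrence can be cross-checked against brute-force enumeration. The sketch seeds it with $G_0 = 1$ (the empty string), an assumption that makes $G_2 = 2 \cdot 2 + 2 \cdot 1 = 6$ agree with the slide; the function names are illustrative.

```python
from itertools import product

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def G(n: int) -> int:
    """G_n via the recurrence, seeded with G_0 = 1 (empty string) and G_1 = 2."""
    if n < 0:
        return 0
    g = [1, 2]
    for _ in range(2, n + 1):
        g.append(2 * g[-1] + 2 * g[-2])
    return g[n]


def G_brute(n: int) -> int:
    """Count strings over {A, C, G, T} whose weight is exactly n."""
    return sum(
        1
        for length in range(n + 1)  # a weight-n string has at most n characters
        for s in product("ACGT", repeat=length)
        if sum(WEIGHT[ch] for ch in s) == n
    )


print([G(n) for n in range(1, 8)])  # [2, 6, 16, 44, 120, 328, 896]
```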

  29. Token classes • Weight 1 (A or T) => W (weak) • Weight 2 (C or G) => S (strong) • We partition tokens into four classes according to two properties: • the token is terminated by either a strong or a weak character • the token has a weight of either c or c+1

  30. Number of tokens and tail weight in a valid c-h code • Token class: max. occurrences in a valid code; max. tail weight • <c-2>S: $2G_{c-2}$; $4G_{c-2}$ • S<c-3>S: $4G_{c-3}$; $8G_{c-3}$ • <c-1>W: $2G_{c-1}$; $2G_{c-1}$ • S<c-2>W: $2G_{c-2}$; $2G_{c-2}$
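
As a consistency check on the table, the sketch below tallies the c-tokens of the 12-tag 4-10 code by class, classifying each token by its final character and its weight (c or c+1). For c = 4 the tally should coincide with the maximum-occurrence column (12, 8, 32, 12), and the total tail weight should be 84, i.e. 12 tags × 7. The helper names are hypothetical and repeat earlier definitions so the block runs on its own.

```python
from collections import Counter

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def G(n: int) -> int:
    """Number of strings of weight n (G_0 = 1, G_1 = 2, G_n = 2G_{n-1} + 2G_{n-2})."""
    if n < 0:
        return 0
    g = [1, 2]
    for _ in range(2, n + 1):
        g.append(2 * g[-1] + 2 * g[-2])
    return g[n]


def table_maxima(c: int) -> dict[str, int]:
    """'Max. occurrences in a valid code' column of the token-class table."""
    return {"<c-2>S": 2 * G(c - 2), "S<c-3>S": 4 * G(c - 3),
            "<c-1>W": 2 * G(c - 1), "S<c-2>W": 2 * G(c - 2)}


def token_class(token: str, c: int) -> str:
    """Classify a c-token by its last character (S or W) and weight (c or c+1)."""
    w = sum(WEIGHT[ch] for ch in token)
    if token[-1] in "CG":  # strong terminator
        return "<c-2>S" if w == c else "S<c-3>S"
    return "<c-1>W" if w == c else "S<c-2>W"


def c_tokens(tag: str, c: int) -> list[str]:
    """Shortest suffix of weight >= c ending at each position (the c-tokens)."""
    out = []
    for j in range(len(tag)):
        acc = 0
        for i in range(j, -1, -1):
            acc += WEIGHT[tag[i]]
            if acc >= c:
                out.append(tag[i:j + 1])
                break
    return out


CODE_4_10 = ("GACCAAT CAGCTAT GTCGATA CTGGTTA CATTATCA GAAATTCT "
             "CTTAATGA GTATTTGT ATATAGTG TAAAACTC AATAAGAG TTTTACAC").split()
tally = Counter(token_class(t, 4) for tag in CODE_4_10 for t in c_tokens(tag, 4))
# Tail weight: 2 per strong terminator, 1 per weak terminator.
tail = sum(n * (2 if cls.endswith("S") else 1) for cls, n in tally.items())
print(dict(tally) == table_maxima(4), tail)  # True 84
```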

  31. Lemma 2: Maximum Tail Weight • The total tail weight of all tags contained in a valid c-h code is at most $2G_{c-1} + 6G_{c-2} + 8G_{c-3}$, the sum of the maximum tail weights of the four token classes.

  32. Theorem 1: Upper Bound • Combining Lemma 1 and Lemma 2 yields the following upper bound: • Lemma 1: Any tag in a valid c-h code has a tail weight of at least h-c+1. • Lemma 2: The total tail weight of all tags contained in a valid c-h code is at most $2G_{c-1} + 6G_{c-2} + 8G_{c-3}$. • Hence any valid c-h code contains at most $(2G_{c-1} + 6G_{c-2} + 8G_{c-3})/(h-c+1)$ tags.
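
Theorem 1 is straightforward to evaluate. This sketch takes the floor, since the number of tags is an integer; with c = 12 and h = 30 it returns the 13840 quoted in the construction overview, and with c = 4 and h = 10 it returns 12, the size of the 4-10 example. The function names are illustrative.

```python
def G(n: int) -> int:
    """Number of strings of weight n (G_0 = 1, G_1 = 2, G_n = 2G_{n-1} + 2G_{n-2})."""
    if n < 0:
        return 0
    g = [1, 2]
    for _ in range(2, n + 1):
        g.append(2 * g[-1] + 2 * g[-2])
    return g[n]


def max_tail_weight(c: int) -> int:
    """Lemma 2: upper bound on the total tail weight of a valid c-h code."""
    return 2 * G(c - 1) + 6 * G(c - 2) + 8 * G(c - 3)


def upper_bound(c: int, h: int) -> int:
    """Theorem 1: each tag needs tail weight >= h - c + 1 (Lemma 1)."""
    return max_tail_weight(c) // (h - c + 1)


print(upper_bound(4, 10))    # 12
print(upper_bound(12, 30))   # 13840
# Share of the bound reached by the 12119 tags reported for c = 12, h = 30.
print(round(100 * 12119 / upper_bound(12, 30), 1))  # 87.6
```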

  33. Our Construction Using Circular Strings by Han-yu Chuang

  34. Construction overview • A method of constructing a nearly optimal c-h code for arbitrary values of c and h • The construction lower bound : The upper bound stated in Theorem 1:

  35. Construction overview(cont’d) • Comparing the two bounds above, this method achieves at least a factor of approximately 0.89(h-c+1)/(h-c+3) relative to the upper bound. • For example, when c=12 and h=30, this construction yields 12119 tags, which corresponds to 87.6% of the upper bound of 13840 obtained from Theorem 1.

  36. Construction overview(cont’d) • Two stages: • Construct a set of circular strings in which each token occurs at most once. • Extract tags as substrings from the circular strings.

  37. Construction Stage 2 • In stage 2, the algorithm must ensure that: • to satisfy Condition 1 (each tag has a weight of h or more), each extracted substring has a weight of h or more; • to satisfy Condition 2’ (any c-token occurs at most once), the overlap between any two tags has a weight of at most c-1.

  38. Construction Stage 2(cont’d) • In stage 2, a straightforward greedy algorithm starts at some position and iterates the following two operations: • Collect characters until their cumulative weight reaches or exceeds h, forming one tag. • Track back over as many characters as possible without collecting a weight of c or more; the next tag starts there.

  39. Construction Stage 2(cont’d) • Stopping criterion: when an overlap of weight ≥ c with the first extracted tag occurs, the algorithm stops and the last retrieved tag is discarded. • Given the best start position, this algorithm produces the largest number of tags that are substrings of a given circular string and can be included in a valid code.
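
The Stage 2 greedy loop can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: `extract_tags` is a hypothetical name, positions are unrolled integers so that wrap-around is explicit, and the sketch assumes h ≥ 2c-1 and w(circ) ≥ h so that the stopping criterion is always reached. The toy circular string is arbitrary and repeats tokens, so its output only illustrates the mechanics; the result is a valid code only when the input comes from Stage 1.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s: str) -> int:
    return sum(WEIGHT[ch] for ch in s)


def extract_tags(circ: str, c: int, h: int, start: int = 0) -> list[str]:
    """Greedy Stage 2 sketch on a circular string.

    Position i holds circ[i % n]; tags are half-open intervals [p, e).
    Assumes h >= 2c - 1 and w(circ) >= h so that the loop terminates.
    """
    if h < 2 * c - 1 or weight(circ) < h:
        raise ValueError("sketch assumes h >= 2c - 1 and w(circ) >= h")
    n = len(circ)

    def at(i: int) -> str:
        return circ[i % n]

    tags: list[tuple[int, int]] = []
    pos = start
    while True:
        # Operation 1: collect characters until the weight reaches h.
        end, acc = pos, 0
        while acc < h:
            acc += WEIGHT[at(end)]
            end += 1
        if tags:
            # Stopping criterion: overlap with the first tag's copy one turn later.
            s0, e0 = tags[0]
            lo, hi = max(pos, s0 + n), min(end, e0 + n)
            if sum(WEIGHT[at(i)] for i in range(lo, hi)) >= c:
                break  # discard this last tag
        tags.append((pos, end))
        # Operation 2: track back while the overlap weight stays <= c - 1.
        back, acc = end, 0
        while acc + WEIGHT[at(back - 1)] <= c - 1:
            back -= 1
            acc += WEIGHT[at(back)]
        pos = back
    return ["".join(at(i) for i in range(p, e)) for p, e in tags]


circ = "ACGTTGCA"
print(extract_tags(circ, c=2, h=3))  # ['AC', 'GT', 'TTG', 'CA']
# Try every start position and keep the largest tag set.
best = max((extract_tags(circ, 2, 3, s) for s in range(len(circ))), key=len)
print(len(best))
```

Trying every start position, as in the last lines, corresponds to the "best start position" in the optimality claim above.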

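  40. Construction overview(cont’d) • Illustration: [Figure: consecutive tags of weight h or h+1 overlapping by weight c-1 or c-2, so that each new tag advances by at most (h+1)-(c-2) along the circular string; the figure also carries the label "c or c+1".]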

  41. Construction overview(cont’d) • Each circular string leads to at least tags. • Definition 3 (Circular String Problem): Given the parameters c > 0 and h > 0, construct a set C of circular strings that contains any substring of weight ≥ c at most once, and maximize

  42. Construction Stage 1 • Meta-String μ and Bit-String β • Each character a ∈ Σ is identified with a pair (μ, β), where μ ∈ {W, S} and β ∈ {0, 1}.

  43. Meta-String μ and Bit-String β • Each string s is identified by the pair of its meta-string μ(s) and its bit-string β(s); we call s an instance of the meta-string μ(s).

  44. Meta-String μ and Bit-String β(cont’d) • Each circular string in our construction will be an instance of a long circular meta-string that arises from repeating a shorter meta-string.

  45. De Bruijn sequence • To avoid generating several identical tokens from repetitions of a meta-string μ, this construction ensures that each instance of μ is paired with a different pattern in the bit-string. • For k ∈ N, a binary De Bruijn sequence of order k is a cyclic binary sequence of length $2^k$ in which each possible substring of length k occurs exactly once. We denote it by $D_k$. • Reading $D_k$ once, starting from a specific offset i relative to a fixed origin position, we obtain a linearization, which we denote by $D_k^i$.
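
A binary De Bruijn sequence can be generated with the standard Lyndon-word (Fredricksen-Kessler-Maiorana) construction; this is a common textbook choice, not necessarily the one used by the authors. The sketch also builds the linearization $D_k^i$ by rotating $D_k$ to start at offset i, taking position 0 of the generated string as the fixed origin (an assumption). Function names are illustrative.

```python
def de_bruijn(k: int) -> str:
    """Binary De Bruijn sequence D_k of order k (length 2**k), Lyndon-word method."""
    a = [0] * (k + 1)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for bit in range(a[t - p] + 1, 2):
                a[t] = bit
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, seq))


def linearization(k: int, i: int) -> str:
    """D_k^i: read D_k once, starting at offset i from position 0."""
    d = de_bruijn(k)
    return d[i:] + d[:i]


def windows_unique(d: str, k: int) -> bool:
    """True if every length-k binary word occurs exactly once cyclically in d."""
    cyc = d + d[:k - 1]
    wins = [cyc[j:j + k] for j in range(len(d))]
    return len(set(wins)) == len(wins) == 2 ** k


print(de_bruijn(3))         # 00010111
print(linearization(3, 2))  # 01011100
```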

  46. Circular String Construction • Definition: if s is a string, we denote k repetitions of s by $s^k$. • Each cycle is based on a meta-string μ of weight c.

  47. General case • Let α be the shortest period of μW, i.e., $\mu W = \alpha^p$. • Set $k = k(\mu) = \gcd(|\alpha|, 2^{|\mu|})$.

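  48. General case(cont’d) • Meta-cycle $MC(\mu) := \alpha^{2^{|\mu|}/k}$ • Bit-cycle $BC_i(\mu) := (D^i_{|\mu|})^{|\alpha|/k}$, $i = 0, \dots, k-1$ • $MC(\mu)$ and $BC_i(\mu)$ have the same length, $|\alpha| \cdot 2^{|\mu|}/k = \operatorname{lcm}(|\alpha|, 2^{|\mu|})$, so they can be paired position by position.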

  49. Circular String Construction(cont’d) • For every meta-string μ with w(μ) = c, our code contains the k cycles $C_i(\mu) = (MC(\mu), BC_i(\mu))$, $i = 0, \dots, k-1$. • The set of cycles we construct is
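
Putting the general case together, the sketch below builds the k cycles $C_i(\mu)$ for a meta-string μ given as a string over {W, S}. The mapping of (meta, bit) pairs to nucleotides in `CHAR` is a hypothetical choice (the slides do not fix it), the De Bruijn generator is the standard Lyndon-word construction, and each cycle is returned as one linear period of the circular string.

```python
from math import gcd

# Hypothetical encoding of (meta, bit) pairs as nucleotides; not given on the slides.
CHAR = {("W", 0): "A", ("W", 1): "T", ("S", 0): "C", ("S", 1): "G"}


def de_bruijn(k: int) -> str:
    """Binary De Bruijn sequence D_k (Lyndon-word construction)."""
    a = [0] * (k + 1)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for bit in range(a[t - p] + 1, 2):
                a[t] = bit
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, seq))


def shortest_period(s: str) -> str:
    """Shortest alpha with s == alpha * p for some integer p >= 1."""
    n = len(s)
    for p in range(1, n + 1):
        if n % p == 0 and s[:p] * (n // p) == s:
            return s[:p]
    return s  # only reached for the empty string


def cycles(mu: str) -> list[str]:
    """The k circular strings C_i(mu), each returned as one linear period."""
    m = len(mu)
    alpha = shortest_period(mu + "W")
    k = gcd(len(alpha), 2 ** m)
    d = de_bruijn(m)
    meta = alpha * (2 ** m // k)                     # MC(mu)
    out = []
    for i in range(k):
        bits = (d[i:] + d[:i]) * (len(alpha) // k)   # BC_i(mu)
        assert len(bits) == len(meta)                # both equal lcm(|alpha|, 2**m)
        out.append("".join(CHAR[(x, int(b))] for x, b in zip(meta, bits)))
    return out


print(cycles("SS"))   # ['CCTGCAGGACGT']
print(cycles("SWS"))  # ['CACTCTGT', 'CAGAGTGA']
```

For μ = SS, μW = SSW is primitive, so k = gcd(3, 4) = 1 and there is a single cycle of length 12; for μ = SWS (weight 5), μW = (SW)^2 gives k = gcd(2, 8) = 2 cycles of length 8.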

  50. Special case(example) • μW cannot be written as a concatenation of two or more identical substrings (so α = μW and |α| = |μ|+1). • $\gcd(|\mu|+1, 2^{|\mu|}) = 1$ • For meta-strings μ that satisfy the conditions above,
