Slides on the weight of indels in genomic distances: hybrid rearrangement models, how indels of multiple markers can disrupt the triangle inequality, the DCJ-indel distance, and a framework for assigning weights to indels so that the triangle inequality holds.
On the weight of indels in genomic distances
Marília D. V. Braga, Raphael Machado, Leonardo C. Ribeiro and Jens Stoye (Inmetro, Brazil / Bielefeld University, Germany)
RECOMB-CG 2011
Guidance
Background:
• Hybrid models for genome rearrangements
• Triangle inequality disruption
Results:
• General framework to establish the triangle inequality
• Tight bounds for DCJ-indel (and DCJ-substitution) distance
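Definitions
• Terms illustrated: genome, chromosome, marker, telomere
• Each marker has two extremities, a tail (t) and a head (h); for example, marker b has extremities bt and bh
[Figure: two example genomes A and B, drawn as sequences of markers and as the corresponding lists of marker extremities]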
Genomic distance
Some models:
Classical genomic distances (organizational operations, such as inversion and translocation)
• Hannenhalli & Pevzner 1995 (inv.+transloc.)
• Yancopoulos et al. 2005 (DCJ)
• Bergeron et al. 2006 (DCJ)
Distances with indels (organizational operations plus indel operations, such as insertion)
• El Mabrouk 2001 (inversion-indel distance)
• Yancopoulos et al. 2008 ("ghost-DCJ" distance)
• Braga et al. 2010 (DCJ-indel distance)
• Indels in these models are applied to blocks of markers
[Figure: small examples of an inversion, a translocation and an insertion]
Triangle Inequality
When indel operations of multiple markers are allowed, the triangle inequality may be disrupted [Yancopoulos et al. 2008].
Example:
• A = a b c d e, B = a c d b e, C = a e
• dist(A, B) = 3 inversions
• dist(A, C) = 1 indel and dist(C, B) = 1 indel
• The triangle inequality dist(A, B) ≤ dist(A, C) + dist(C, B) would require 3 ≤ 1 + 1, so it is violated
Is there a distance definition that does not disrupt the triangle inequality?
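The counterexample can be checked by brute force. The sketch below is an illustration, not code from the talk. It assumes signed markers that all start in the same orientation (with unsigned markers, two reversals would already suffice), and the helper names `signed_inversion_distance` and `differs_by_one_block` are made up for this example.

```python
from collections import deque


def signed_inversion_distance(source, target):
    """Minimum number of signed inversions turning source into target (BFS)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        for i in range(n):
            for j in range(i, n):
                # Reverse the segment perm[i..j] and flip the sign of each marker.
                flipped = tuple(-x for x in reversed(perm[i:j + 1]))
                nxt = perm[:i] + flipped + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("target is not a signed rearrangement of source")


def differs_by_one_block(longer, shorter):
    """True if deleting one contiguous block from longer yields shorter."""
    size = len(longer) - len(shorter)
    if size <= 0:
        return False
    return any(
        longer[:i] + longer[i + size:] == shorter
        for i in range(len(shorter) + 1)
    )


# Markers a..e encoded as 1..5, all in forward orientation (assumption).
A = (1, 2, 3, 4, 5)   # a b c d e
B = (1, 3, 4, 2, 5)   # a c d b e
C = (1, 5)            # a e

d_ab = signed_inversion_distance(A, B)
d_ac = 1 if differs_by_one_block(A, C) else None   # one deletion of "b c d"
d_cb = 1 if differs_by_one_block(B, C) else None   # one insertion of "c d b"

print(d_ab, d_ac, d_cb)        # 3 1 1
print(d_ab <= d_ac + d_cb)     # False: the triangle inequality is violated
```

The search space is tiny here (5 signed markers), so exhaustive breadth-first search is enough; it would not scale to realistic genomes.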
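Double cut and join with indels
The adjacency graph AG(A, B):
[Figure: adjacency graph AG(A, B), with the vertices of genome A in one row and those of genome B in the other; one path P is highlighted, with labels L1 to L4 grouped into two A-runs and one B-run]
Sorting A into B
• Only common markers:
  • Minimum number of DCJs: dDCJ(A, B) = nAB − (# cycles + # AB-paths / 2) [Bergeron et al. 2006]
• Including unique markers:
  • DCJ + indel operations
  • Λ(P) = # of runs in a component P of AG(A, B)
  • λ(P) = ⌈(Λ(P) + 1) / 2⌉ for Λ(P) ≥ 1, and λ(P) = 0 otherwise
  • dDCJ-id(A, B) ≤ dDCJ(A, B) + Σ λ(P), summed over the components P; the sum is the term related to the number of markers added or removed [WABI 2010]
  • In the example path, with two A-runs and one B-run, λ(P) = 2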
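The two formulas on this slide are simple enough to evaluate directly. The following sketch only does the arithmetic: it does not build the adjacency graph, and the cycle, path and run counts passed in are hypothetical inputs. The function names are illustrative, and the indel potential uses the ceiling form ⌈(Λ + 1) / 2⌉ from the WABI 2010 paper.

```python
def dcj_distance(n_common, cycles, ab_paths):
    """dDCJ = nAB - (#cycles + #AB-paths / 2), after Bergeron et al. 2006."""
    if ab_paths % 2 != 0:
        # The number of AB-paths in an adjacency graph is assumed to be even.
        raise ValueError("expected an even number of AB-paths")
    return n_common - (cycles + ab_paths // 2)


def indel_potential(runs):
    """lambda(P) = ceil((Lambda(P) + 1) / 2) if Lambda(P) >= 1, else 0."""
    if runs < 0:
        raise ValueError("number of runs cannot be negative")
    if runs == 0:
        return 0
    return (runs + 2) // 2  # integer form of ceil((runs + 1) / 2)


def dcj_indel_upper_bound(n_common, cycles, ab_paths, runs_per_component):
    """Upper bound dDCJ + sum of indel potentials over all components."""
    potentials = sum(indel_potential(r) for r in runs_per_component)
    return dcj_distance(n_common, cycles, ab_paths) + potentials


# A component with two A-runs and one B-run has 3 runs.
print(indel_potential(3))  # 2

# Hypothetical graph: 4 common markers, 1 cycle, 2 AB-paths,
# one component with 3 runs and one component without runs.
print(dcj_indel_upper_bound(4, 1, 2, [3, 0]))  # 4
```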
A posteriori correction
Fixing the triangle inequality, prior work [JCB 2011]: applying an a posteriori correction, the triangle inequality holds for the function
mid(A, B) = dDCJ-id(A, B) + k u(A, B)
for any constant k ≥ 3/2, where u(A, B) = # unique markers in A and B.
To improve the lower bound on k, we study the worst case for the inequality disruption.
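As a rough illustration of how the correction acts, the sketch below applies mid to the earlier example A = a b c d e, B = a c d b e, C = a e. It is not from the talk: the uncorrected values 3, 1 and 1 are the ones quoted on the triangle inequality slide and are used here as stand-ins for dDCJ-id, and u(A, B) is computed as the size of the symmetric difference of the marker sets, which is an assumed reading of "# unique markers in A and B".

```python
from fractions import Fraction


def unique_markers(genome_a, genome_b):
    """u(A, B): markers present in exactly one of the two genomes (assumed reading)."""
    return len(set(genome_a) ^ set(genome_b))


def corrected_distance(distance, genome_a, genome_b, k):
    """mid(A, B) = d(A, B) + k * u(A, B)."""
    return distance + k * unique_markers(genome_a, genome_b)


A = "abcde"
B = "acdbe"
C = "ae"
k = Fraction(3, 2)  # smallest constant covered by the prior result

# Uncorrected values taken from the example slide (stand-ins, see lead-in).
m_ab = corrected_distance(3, A, B, k)
m_ac = corrected_distance(1, A, C, k)
m_cb = corrected_distance(1, C, B, k)

print(m_ab, m_ac, m_cb)          # 3 11/2 11/2
print(m_ab <= m_ac + m_cb)       # True: the corrected values respect the inequality
```

Exact fractions are used so that k = 3/2 does not introduce floating-point rounding.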
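Evaluation of k
Worst case (suppose unichromosomal genomes) and general case:
• A and B at maximum distance: dDCJ-id(A, B) = diameter
• C at minimum distance from both A and B: dDCJ-id(A, C) = 1 and dDCJ-id(C, B) = 1
• C = { }
[Figure: triangles over the genomes A, B and C illustrating the worst case and the general case]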
Finding the diameter/lowest k
1. The DCJ-indel distance of each component P of AG(A, B) is at most its number of vertices |P|:
dDCJ-id(P) = dDCJ(P) + λ(P) ≤ |P|
dDCJ-id(A, B) ≤ Σ dDCJ-id(P) ≤ Σ |P| = |AG(A, B)|
2. For unichromosomal genomes, the number of vertices in the adjacency graph AG(A, B) is 2nAB + 2: each genome contributes nAB + 1 vertices (number of common markers + 1). So, we have:
dDCJ-id(A, B) ≤ |AG(A, B)| = 2nAB + 2
3. The corrected distance mid satisfies the triangle inequality if k ≥ 1. In the worst case, C = { }, dDCJ-id(A, C) = dDCJ-id(B, C) = 1 and dDCJ-id(A, B) = 2nAB + 2; writing nA and nB for the numbers of markers unique to A and to B:
dDCJ-id(A, C) + k u(A, C) + dDCJ-id(B, C) + k u(B, C) ≥ dDCJ-id(A, B) + k u(A, B)
1 + 1 + k (2nAB + nA + nB) ≥ 2nAB + 2 + k (nA + nB)
2k nAB ≥ 2nAB, that is, k ≥ 1
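The last step of the derivation can be sanity-checked numerically. This sketch, which is only an illustration, evaluates both sides of the worst-case inequality 1 + 1 + k (2nAB + nA + nB) ≥ 2nAB + 2 + k (nA + nB) over a small grid of marker counts; the grid bounds and the function name are arbitrary choices.

```python
from fractions import Fraction
from itertools import product


def worst_case_holds(n_ab, n_a, n_b, k):
    """Check the worst-case triangle inequality for the corrected distance."""
    lhs = 1 + 1 + k * (2 * n_ab + n_a + n_b)   # mid(A, C) + mid(B, C), with C = { }
    rhs = 2 * n_ab + 2 + k * (n_a + n_b)       # mid(A, B) at the diameter
    return lhs >= rhs


grid = list(product(range(0, 6), repeat=3))    # (nAB, nA, nB) with values 0..5

# k = 1 satisfies the inequality everywhere on the grid.
print(all(worst_case_holds(n_ab, n_a, n_b, Fraction(1)) for n_ab, n_a, n_b in grid))

# Any k < 1 fails as soon as there is at least one common marker.
k_small = Fraction(9, 10)
failures = [(n_ab, n_a, n_b) for n_ab, n_a, n_b in grid
            if not worst_case_holds(n_ab, n_a, n_b, k_small)]
print(all(n_ab >= 1 for n_ab, _, _ in failures), len(failures) > 0)
```

The terms in nA and nB cancel on both sides, which is why only nAB decides whether the inequality holds.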
Framework to assign weights to Indels
Let w(ρ) be the weight of an operation ρ.
• For any organizational operation:
  • w(ρ) = 1
• For indels:
  • w(ρ) = p + k m(ρ), where m(ρ) is the number of markers inserted or deleted by ρ.
Example: an insertion of the block w s next to markers a b has m(ρ) = 2.
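The weighting scheme translates into a small function. The sketch below is one possible encoding, not the authors' implementation: the `Operation` class and the `kind` labels "organizational" and "indel" are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    kind: str          # "organizational" (e.g. DCJ, inversion) or "indel"
    markers: int = 0   # m(rho): markers inserted or deleted; 0 for organizational


def weight(op, p, k):
    """w(rho) = 1 for organizational operations, p + k * m(rho) for indels."""
    if op.kind == "organizational":
        return 1
    if op.kind == "indel":
        return p + k * op.markers
    raise ValueError(f"unknown operation kind: {op.kind!r}")


inversion = Operation("organizational")
insert_w_s = Operation("indel", markers=2)   # inserting the block "w s"

print(weight(inversion, p=1, k=1))    # 1
print(weight(insert_w_s, p=1, k=1))   # 3
print(weight(insert_w_s, p=1, k=0))   # 1: with k = 0 an indel costs the same as any operation
```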
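Distance on Hybrid Model
Assuming p = 1, the total weight of a sorting sequence splits into two terms:
• Number of operations, whose minimum is dDCJ-indel
• k (m(ρ2) + m(ρ3) + . . . + m(ρn)) = k u(A, B), where the m(ρi) are the sizes of the indels in the sequence
Therefore:
dHp,k(A, B) = dHp,0(A, B) + k u(A, B)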
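The split into "number of operations" plus "k times markers moved" can be seen on a concrete sequence. In this sketch a sorting sequence is represented only by its m(ρ) values, with `None` standing for an organizational operation; the sequence itself is hypothetical, and the code checks the decomposition for one sequence rather than computing a minimum distance.

```python
def sequence_weight(sizes, k, p=1):
    """Total weight of a sequence; sizes[i] is m(rho_i), or None if rho_i is organizational."""
    total = 0
    for m in sizes:
        total += 1 if m is None else p + k * m
    return total


# Hypothetical sequence: one DCJ, then indels of 2 and 3 markers, then another DCJ.
sequence = [None, 2, 3, None]
markers_moved = sum(m for m in sequence if m is not None)   # plays the role of u(A, B)

for k in (0, 1, 2):
    lhs = sequence_weight(sequence, k)
    rhs = len(sequence) + k * markers_moved   # number of operations + k * u
    print(k, lhs, rhs, lhs == rhs)
```

With p = 1 every operation contributes exactly 1 to the first term, which is why the k = 0 weight equals the plain operation count.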
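More plausible distances?
Example: A = a b c d e, B = a c d b e, C = a e, with 3 inversions between A and B and 1 indel between C and each of A and B.
• "ghost-DCJ" model: dist(A, B) = 3, dist(A, C) = 2, dist(C, B) = 2
• DCJ-indel model (k=1): dist(A, B) = 3, dist(A, C) = 4, dist(C, B) = 4
[Figure: the triangle over A, B and C drawn once per model, with these values on its edges]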
Conclusion
• DCJ-indel distance is a metric for k ≥ 1
• A posteriori distance correction is equivalent to the hybrid model
• Similar results for DCJ-substitution distance (see talk by Marília Braga, Sunday)
• Open:
  • p ≠ 1
  • Other weight functions
  • Inversion-indel distance