Inferring Phylogeny using Permutation Patterns on Genomic Data

Inferring Phylogeny using Permutation Patterns on Genomic Data 1Md Enamul Karim 2Laxmi Parida 1Arun Lakhotia 1University of Louisiana at Lafayette 2IBM T. J. Watson Research Center

Phylogeny Reconstruction of the evolutionary relationship of a collection of organisms, usually in the form of a tree.

Phylogenetic data • Behavioral, morphological, metabolic, etc. • Molecular data: sequence data, gene-order data etc. gene-order data

Why gene order data? • Low error rate. • Rare evolutionary events unlikely to cause “silent" changes; can help inferring millions of years.

1 2 3 9 -8 –7 –6 –5 –4 10 1 2 3 9 4 5 6 7 8 10 1 2 3 –8 –7 –6 –5 -4 9 10 Genomes rearrangements 1 2 3 4 5 6 7 8 9 10 • Transposition • InvertedTransposition • Inversion

Breakpoint distance • Breakpoints are number of adjacencies present in one genome, but not in the other. • For some datasets, a close-to-linear relationship between the breakpoints and evolutionary eventsmay exist. • Can be used for building phylogeny (Blanchette et al.). 1 2 3 4 5 6 7 8 9 10 1 –3 –2 4 5 9 6 7 8 10

Limitations of breakpoint • The number of breakpoints created by a certain number of inversions may vary. • Also, transpositions generally create more breakpoints than inversions. • Computing the breakpoint phylogeny is NP-hard.

MPBE (Maximum Parsimony on Binary Encoding) • A heuristic for the breakpoint phylogeny (Cosner et al.). • All ordered pairs of signed genes appearing consecutively are coded as binary features. • Exponential time complexity, however, much faster than BPAnalysis.

Limitations • May fail to find feasible solutions to the breakpoint phylogeny problem.

Observation: The closer is the evolution history, the more permutations (of different granularity) are in common 1 2 3 4 5 6 7 8 9 10 1 2 3 –8 –7 –6 –5 –4 9 10 1 8 –3 –2 –7 –6 –5 –4 9 10

Maximal pi-pattern (Eres et al.) • Matches permutations at different granularity. • Polynomial time complexity.

pi-pattern • Example : For S = and k=2 Pattern with minimum k permutations abc acb acb cab cab All pi-patterns are: ac, bc, abc, abcc

Cover P1 covers P2=> • Every P1 has a P2 • Every P2 is within a P1 • Example • In S = acbcab abc covers ac

Maximal pi-pattern • pi-pattern which is not covered • Example • In S = acbcab pi-patterns: ac, bc, abc, abcc not covered by abcc Maximal pi-patterns: abc, abcc

Results

Phylogeny for simulated evolution on synthetic data

12 genera of Campanulaceaeand the outgroup tobacco

Tree1: MPBE tree

Tree2: Neighbor joining tree (using few different distances)

Tree3: Neighbor joining tree using permutation patterns • 167 Maximal pi-patterns(from 10769pi-patterns) used as binary feature • XOR Distance measure • Distance/Similarity matrix is created to find neighbor joining tree

Tree3 vs Tree2

Conclusion • Permutation patterns may preserve more evolutionary information. • Evolutionary events could be counted within permuted segments to develop a hybrid scheme. • Current approaches remain unable to handle unequal gene content, which could be solved using maximal pi-patterns.

Inferring Phylogeny using Permutation Patterns on Genomic Data

Inferring Phylogeny using Permutation Patterns on Genomic Data

Presentation Transcript

Using LEHD data to examine turnover patterns

Permutation

Genomic Data Privacy Protection Using Compressive Sensing

Permutation

Linked Data Patterns using the CRM

How does phylogeny influence ecological patterns?

Visualization of genomic data

Visualization of genomic data

On permutation boxed mesh patterns

Representing Data using Static and Moving Patterns

Phylogeny Reconstruction from Experimental Data

PERMUTATION

Inferring Genomic Sequences

Predicting Genetic Merit Using Genomic Data

Inferring structure from data

Delivering flexible applications using data-management patterns

MicroRNA Disease Predictions Based On Genomic Data

PERMUTATION CIRCUITS

Representing Data using Static and Moving Patterns

Inferring Transcriptional Regulation Using Transctiptomics

An Algorithm For Exploring Patterns In Clinical Genomic Data