
M. Vlachos, B. Taneri, E. Keogh, P.S. Yu

M. Vlachos, B. Taneri, E. Keogh, P.S. Yu. IBM Research, NY; Scripps Genome Center, San Diego; University of California, Riverside. How can we visualize DNA data?


Presentation Transcript


  1. M. Vlachos, B. Taneri, E. Keogh, P.S. Yu. IBM Research, NY; Scripps Genome Center, San Diego; University of California, Riverside

  2. how can we visualize DNA data?
     GTTAATGTAGCTTAAATATTTATAAAGCAAAACACTGAAAATGTTTAGATGGGTTTAATTAACCCCATTGACATTAAAGGTTTGGTCCCAGCCTTTCTATTAGTTCTAAACAGACTTACACATGCGAGCATCTACATCCCAGTGAGAACGCCCTCTAAATCATCAAGGATCAAAAGGAGCGGGTATCAAGCACACTAACACTAGTAGCTCACAACGCCTCGCTTAGCCACACCCCCACGGGACACAGCAGTGATAAAAATTAAGCCATGAACGAAAGTTTGACTAAGTCATGTTTACAAGGGTTGGTAAACTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAATTAATAGAAACACGGCGTAAAGAGTGTTAAGGAGTCACGTAAAATAAAGTCAAGCCTTAATTAAGCTGTAAAAAGCCCTAATTAAAACTAAGCCAAACTACGAAAGTGACTTTAATATAATCTGATTACACGACAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCATAAACTCTAATAGTCACAAAACAAGACTACTCGCCAGAGTACTACTAGCAATAGCCTAAAACTCAAAGGACTTGGCGGTGCTTCATACCCCCCTAGAGGAGCCTGTTCTATAAACGATAAACCCCGATCAACCTCACCAACCCTTGCTACTCCAGTCTATATACCGCCATCTT………….
     I think it's time I bought that new pair of reading glasses…
     CAGCAAACCCTAAAAGGGAACGAAAGTAAGCATAACCATCCTACATAAAAACGTTAGGTCAAGGTGTAACCTATGGGTTGGGAAGAAATGGGCTACATTTTCTATATTAAGAACATTCCTTATACTCACACGAAAGTTTTTATGAAACTTAAAAACCAAAGGAGGATTTAGTAGTAAATCAAGAGCAGAGTGCTTGATTGAACAAGGCCATGGAGCACGCACACACCGCCCGTCACCCTCCTCAAGTACCCTAGCAAAGCCCCAGTTCGTTAACTCACGCCAAGCAATCATACGAGAGGAGACAAGTCGTAACAAGGTAAGCATACCGGAAGGTGTGCTTGGATGAATCAAGATATAGCTTAAACAAAGCATCTAGTTTACACCTAGAAGATTCCACACCCTGTGTATATCTTGAACCAATTCTAGCCCACACCCTCCCCACTTCTACTACTACAAACCAATCAAATAAAACATTCACCATACATTTTAAAGTATAGGAGATAGAAATTTAATTACCAGTGGCGCTATAGAGATAGTACCGTAAGGGAAAGATGAAAGAAAACCTAAAAGTAGTAAAAAGCAAAGCTTACCCCTTGTACCTTTTGCATAATGACTTAACTAGTAATAACTTAGCAAAGAGACCTTAAGTTAAATTACCCGAAACCAGACGAGCTACTTATGAGCAGTATTTAGAACGAAC…………...
     • DNA sequences are thousands or millions of base pairs long
     • Humans cannot easily compare or visualize text
     • We understand and visualize shapes better
     • Can we find a way to visually represent large amounts of DNA sequence data?
     • How can we represent the relationships between DNA sequences in an accessible manner?

  3. dendrogram visualization
     • Dendrograms present a hierarchy of affinity/similarity
     • They still do not provide a solution for representing the DNA itself
     • Dendrograms cannot capture pairwise relationships; these are lost during the grouping
     • Other techniques: HyperTree, PattVision
     Sequences shown on the slide:
     AATTGATAAAAATTAAGCCATGAACGAAAGTTTGACTAAGTCATGTTTACAAGGGTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAATAGAAACACGGCGTAAAGATAAGGAGTCACGTAAAATAAAGTCAAGCCTTAATTAAGCTGTAAAAAGCCCTAATTAAAACTAAGCCAAACTACGAAAGTGACTTTAATATAATCTGATTACATTGGTAAAC
     GATAAAAATTAAGCCATGAACGAAAGTTTGACTAAGTCATGTTTACAAGGGTTGGTAAACTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAATTAATAGAAACACGGCGTAAAGAGTGTTAAGGAGTCACGTAAAATAAAGTCAAGCCTTAATTAAGCTGTAAAAAGCCCTAATTAAAACTAAGCCAAACTACGAAAGTGACTTTAATATAATCTGATTACA
     GATAAAAATTAAGCCATGAACGAAAGTTTGACTAAGTCATGTTTACAAGGGTTGGTAAACTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAATTAATAGAAACACGGCGTAAAGATAAGGAGTCACGTAAAATAAAGTCAAGCCTTAATTAAGCTGTAAAAAGCCCTAATTAAAACTAAGCCAAACTACGAAAGTGACTTTAATATAATCTGATTACA

  4. what we propose …
     • Transform sequences into 2-dimensional trajectories
     • Compute elastic matching between DNA trajectories
     • Plot their relationships on the 2D plane using a spanning-tree mapping
     Trajectories are easier to visualize and compare than strings.
     [Figure panels: DNA string (…GTACTTAGCGATTTAAATTC…); Trajectory (and possible simplification); Spanning Tree Visualization + Relative Distance towards pivot point. Node labels: B, A, D, W, F, R2, R, H, L, I]

  5. converting DNA to trajectories
     • This process converts a long string of nucleotides into a 2D trajectory that can be easily visualized
     • Similar DNA sequences will result in similar trajectories
     • Resulting trajectories can be downsampled or compressed for easier plotting
     • Given an initial point in 2D space (e.g. [0,0]), start moving up/down/left/right based on the DNA letter you encounter (A, G, T, C)
     • Example: Trajectory(i) = Trajectory(i-1) + V, where V is the step for the i-th letter of a string such as …GAATTC…
     [Figure: cow DNA and its DNA trajectory]
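The update rule on this slide can be sketched in a few lines of Python. The transcript does not record which letter maps to which direction, so the `STEPS` table below is an assumption made only for illustration, as are the function names and the choice to skip unknown symbols such as N. The `downsample` helper shows one simple way to thin a trajectory before plotting; the slides do not say which simplification the authors use.

```python
# Assumed letter-to-direction mapping (not taken from the slides).
STEPS = {"A": (0, 1), "C": (0, -1), "G": (1, 0), "T": (-1, 0)}


def dna_to_trajectory(sequence, start=(0, 0)):
    """Apply Trajectory(i) = Trajectory(i-1) + V for every letter."""
    x, y = start
    trajectory = [(x, y)]
    for letter in sequence.upper():
        step = STEPS.get(letter)
        if step is None:
            # Assumption: symbols other than A/C/G/T are skipped.
            continue
        x, y = x + step[0], y + step[1]
        trajectory.append((x, y))
    return trajectory


def downsample(trajectory, every):
    """Keep every `every`-th point, always retaining the final point."""
    kept = trajectory[::every]
    if kept[-1] != trajectory[-1]:
        kept.append(trajectory[-1])
    return kept


if __name__ == "__main__":
    path = dna_to_trajectory("GAATTC")
    print(path)
    print(downsample(path, 4))
```

Under this assumed mapping, two sequences that differ by a few letters produce paths that share most of their shape, which is the property the slide relies on.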

  6. example
     • Species with similar DNA content will also have very similar DNA trajectories
     • The elastic matching offered by the warping function can find flexible similarities
     • Dynamic Time Warping primer: use dynamic programming to solve the matching problem
     [Figure panels: Human vs Chimpanzee; Human vs Bear]
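As a minimal sketch of the dynamic-programming idea named on this slide, the function below computes a textbook Dynamic Time Warping distance between two 2D trajectories. It assumes the Euclidean distance between points as the local cost and applies no warping-window constraint; the slides specify neither, so both are illustrative choices, and `dtw_distance` is a hypothetical name.

```python
import math


def dtw_distance(p, q):
    """Unconstrained DTW between two lists of (x, y) points."""
    n, m = len(p), len(q)
    # cost[i][j] = best cumulative cost of aligning p[:i] with q[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # p advances, q repeats a point
                cost[i][j - 1],      # q advances, p repeats a point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    a = [(0, 0), (1, 0), (2, 0)]
    b = [(0, 0), (1, 0), (1, 0), (2, 0)]  # same shape, one point repeated
    print(dtw_distance(a, b))
```

Because one point of a trajectory may be matched to several points of the other, the repeated point in `b` costs nothing, which is the "elastic" behaviour the slide refers to. The table has size len(p) × len(q), so long trajectories are usually downsampled first.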

  7. is this representation meaningful?
     • The dendrogram built from the pairwise distances between the trajectories is correct
     • It accurately captures the predominant views about the affinity of species
     [Figure: dendrogram over Human, Chimpanzee, Pygmy Chimpanzee, Blue Whale, Finback Whale, Hippopotamus, Indian Elephant, African Elephant, Polar Bear, American Bear, Dog; clade labels: Eutheria, Cetartiodactyla, Carnivora, Balaenoptera, Ursus, Hominidae, Proboscidea, Panines]

  8. Advantages of the Mapping:
     • Important distances are exactly preserved
     • Local and global structure is preserved
     • Preservation of distances against the pivot point allows for the very powerful spanning-tree visualization
     • The distances between any three points can be perfectly retained in 2D space
     • For each additional point we can retain the distance to its nearest-neighbor (NN) point plus the distance to one pivot point
     • Out of N² distances we can preserve a total of 3 + 2(N-3) distances
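The count on this slide follows from the construction: the first three points form a triangle (3 exact distances), and every further point is pinned by exactly two distances, one to its nearest neighbor and one to the pivot. Geometrically, pinning a point by two distances means intersecting two circles. The sketch below illustrates that step for the case where the circles do intersect; the function names are hypothetical, and choosing between the two candidate positions is left open because the slides do not describe it.

```python
import math


def place_by_two_distances(a, b, dist_a, dist_b):
    """Return both 2D points at dist_a from a and dist_b from b.

    Raises ValueError when the two reference circles do not intersect.
    """
    d = math.dist(a, b)
    if d == 0 or d > dist_a + dist_b or d < abs(dist_a - dist_b):
        raise ValueError("reference circles do not intersect")
    # Offset from a, along the line a -> b, to the chord through both solutions.
    along = (dist_a**2 - dist_b**2 + d**2) / (2 * d)
    # Half-length of that chord (clamped to absorb rounding error).
    half_chord = math.sqrt(max(dist_a**2 - along**2, 0.0))
    ux, uy = (b[0] - a[0]) / d, (b[1] - a[1]) / d  # unit vector a -> b
    fx, fy = a[0] + along * ux, a[1] + along * uy
    return (
        (fx - half_chord * uy, fy + half_chord * ux),
        (fx + half_chord * uy, fy - half_chord * ux),
    )


def preserved_distance_count(n):
    """3 distances for the first triangle, 2 per additional point (n >= 3)."""
    return 3 + 2 * (n - 3)


if __name__ == "__main__":
    print(place_by_two_distances((0, 0), (6, 0), 5, 5))
    print(preserved_distance_count(11))
```

For 11 points this keeps 19 distances exactly, out of the 55 distinct pairs among them.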

  9. visualization: humans & 'relatives'
     [Figure panels: Using warping distance; Using Euclidean distance]
     • Gibbon is erroneously placed closer to human compared to the orangutan.
     • Pivot point is the human
     • Species that diverged closer in time are also placed closer in the 2D space.

  10. visualization: relationship between mammals
      • Hippopotamus is indeed closer to whale than to any other species (Cetartiodactyla). B. M. Ursing and U. Arnason. Analyses of mitochondrial genomes strongly support a hippopotamus-whale clade. In Proc. of the Royal Society of London, Series B, vol. 265: 2251-2255, 1998.
      • Human is closer to pygmy chimpanzee than to regular chimpanzee. C. Lockwood, W. Kimbel, and J. Lynch. Morphometrics and hominoid phylogeny: Support for a chimpanzee-human clade and differentiation among great ape sub-species. In Proc. Natl. Acad. Sci. USA, 101(13), 4356-4360, 2004.

  11. spanning tree for non-metric distances
      • When dealing with a non-metric distance (like DTW), the circles in the spanning-tree visualization method may not intersect.
      • In that case we need to find the point in 2D space that is closest to the centers of the two circles.
      • The reference circles can either enclose each other or lie one outside the other.
      [Figure panels: Enclosed Reference Circles; Disjoint Reference Circles. C is the point closest to the centers A1 and A2 of the two reference circles.]
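The slide does not spell out how the point C is computed. The sketch below shows one plausible reading, stated here as an assumption: C is placed on the line through A1 and A2, midway across the narrowest gap between the two circumferences. It covers both the disjoint and the enclosed case; `fallback_point` is a hypothetical name, and for concentric circles the direction is chosen arbitrarily.

```python
import math


def fallback_point(a1, r1, a2, r2):
    """Pick a point for two non-intersecting reference circles.

    Assumption (not from the slides): use the midpoint of the narrowest
    gap between the circumferences, on the line through the centers.
    """
    d = math.dist(a1, a2)
    if d == 0:
        ux, uy = 1.0, 0.0  # concentric circles: direction is arbitrary
    else:
        ux, uy = (a2[0] - a1[0]) / d, (a2[1] - a1[1]) / d
    # t is a signed offset from a1 along the unit vector a1 -> a2.
    if d > r1 + r2:
        # Disjoint: gap runs from r1 to d - r2.
        t = (r1 + (d - r2)) / 2
    elif d < abs(r1 - r2):
        if r1 > r2:
            # Circle 2 inside circle 1: gap runs from d + r2 to r1.
            t = ((d + r2) + r1) / 2
        else:
            # Circle 1 inside circle 2: gap runs from d - r2 to -r1.
            t = ((d - r2) - r1) / 2
    else:
        raise ValueError("circles intersect; use an intersection point")
    return (a1[0] + t * ux, a1[1] + t * uy)


if __name__ == "__main__":
    print(fallback_point((0, 0), 1, (10, 0), 3))  # disjoint circles
    print(fallback_point((0, 0), 10, (2, 0), 3))  # enclosed circles
```

With this choice the placed point violates both target distances by the same amount, which spreads the unavoidable error evenly between the nearest neighbor and the pivot.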

  12. extensions (search and indexing)
      • Can we utilize advanced compression and indexing schemes for the DNA trajectories in order to do fast prefiltering between millions of DNA sequences?
      • Project all sequences into a new space, and search this space instead (e.g. project a trajectory from 100-D space to 2-D space)
      • Organize the low-dimensional points into a hierarchical 'index' structure.
      [Figure: points A, B, C and a query, plotted against Feature 1 and Feature 2]

  13. extensions (medical screening)
      • DNA trajectories can be an interesting approach to clinical screening and diagnostics
      • E.g. does this tissue/cell look more like a cancerous one or not?
      • By evaluating the distance between the DNA trajectories, can we evaluate the cancer stage of a tissue?
      • Perform clustering
      • Discover classification rules, e.g. through Nearest Neighbors
      [Figure: Cancer Tissues and Normal Tissues, with an unknown sample X: is this tissue cancerous or not, and at what stage?]
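The nearest-neighbor idea mentioned on this slide can be sketched as follows. Everything here is illustrative: the function names are hypothetical, the trajectories are made-up toy values rather than real tissue data, and `lockstep_distance` is only a simple stand-in for whatever trajectory distance (for example a warping distance) would be used in practice.

```python
import math


def nearest_neighbor_label(query, labeled_examples, distance):
    """Return (label, distance) of the labeled trajectory closest to query."""
    best_label, best_dist = None, math.inf
    for example, label in labeled_examples:
        d = distance(query, example)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


def lockstep_distance(p, q):
    """Stand-in distance: sum of point-to-point Euclidean distances."""
    return sum(math.dist(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Toy, made-up trajectories used only to exercise the function.
    examples = [
        ([(0, 0), (1, 1)], "normal"),
        ([(0, 0), (5, 5)], "cancerous"),
    ]
    print(nearest_neighbor_label([(0, 0), (1, 2)], examples, lockstep_distance))
```

Passing the distance in as a function keeps the classifier independent of how trajectories are compared, so a different distance can be swapped in without changing the rest.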
