Merlin - Multipoint Engine for Rapid Likelihood Inference

Merlin is a multipoint engine for rapid likelihood inference that improves on Genehunter with better pedigree analysis, efficient computation, and lower memory use. This presentation covers what is wrong with Genehunter, the advantages of Merlin, its interface, and its algorithms.

Presentation Transcript


  1. Merlin - Multipoint Engine for Rapid Likelihood Inference (Ido Feldman)

  2. Agenda • What’s wrong with Genehunter? • What is Merlin? • Merlin from the user’s viewpoint • Merlin: pros and cons • Algorithms and ideas

  3. What’s wrong with Genehunter? • The running time and memory are exponential in the size of the pedigree. • Intractable for pedigrees larger than 22-23. • Ideal solution: an algorithm that is polynomial in pedigree size and number of markers. • No such luck so far, so we try to improve the constants instead.
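
To see how fast this grows, the sketch below counts the inheritance vectors for a pedigree with n non-founders (2n meioses, one bit each) and the memory needed to hold one likelihood per vector. It assumes 8-byte floating-point values and one dense vector per marker; these are illustrative assumptions, not Genehunter's exact storage layout.

```python
# Sketch: growth of the inheritance-vector state space.
# Assumptions (illustrative): each of the n non-founders contributes 2 meioses,
# each meiosis is one bit, and one 8-byte float is stored per vector.

BYTES_PER_LIKELIHOOD = 8  # assumed double-precision storage


def num_inheritance_vectors(non_founders: int) -> int:
    """Number of distinct inheritance vectors: 2 ** (2 * n)."""
    return 2 ** (2 * non_founders)


def dense_memory_bytes(non_founders: int, markers: int) -> int:
    """Memory for one dense likelihood vector per marker."""
    return num_inheritance_vectors(non_founders) * BYTES_PER_LIKELIHOOD * markers


if __name__ == "__main__":
    for n in (5, 8, 10, 11, 12):
        vectors = num_inheritance_vectors(n)
        mib = dense_memory_bytes(n, markers=1) / 2**20
        print(f"n={n:2d}  vectors={vectors:>12,}  memory per marker={mib:10.2f} MiB")
```

Each additional non-founder multiplies both counts by 4; under these assumptions, going from n = 11 to n = 12 raises the per-marker memory from 32 MiB to 128 MiB.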

  4. Merlin is an improved Genehunter • Pedigree error detection • Pedigree simplification • Smart inheritance vectors • Efficient and approximate computations

  5. Merlin – Getting Started • Input consists of 3 files: • Pedigree file • Data file (described below) • Map file (marker positions)

  6. Merlin Input – Pedigree Example
     Family Person Parent1 Parent2 Sex Diabetes Glucose HLA-DR HLA-DQ
     1      1      0       0       1   1        x       3 3    4 3
     1      2      0       0       2   1        3.000   4 4    1 1
     1      3      0       0       1   1        8.000   1 2    x x
     1      4      1       2       2   1        3.500   4 3    1 4
     1      5      3       4       2   2        1.234   1 3    3 4
     1      6      3       4       1   2        4.321   2 4    1 1
     1      7      0       0       1   1        5.500   1 2    4 2
     1      8      7       4       2   1        6.231   1 4    4 1
     2      1      0       0       1   1        6.000   4 3    1 4
     2      2      0       0       2   2        7.000   3 4    5 3
     2      3      0       0       1   1        7.700   1 2    2 4
     2      4      1       2       2   1        4.000   4 3    1 5
     2      5      0       0       2   1        3.600   3 5    2 4
     2      6      3       4       1   2        1.234   1 3    1 4
     2      7      3       4       1   2        3.321   2 4    2 5
     2      8      3       4       2   1        5.175   1 4    1 5
     2      9      5       6       2   2        0.512   3 3    4 4
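
As a small illustration of reading this table, the sketch below takes the Family, Person, Parent1 and Parent2 columns, treats anyone with parents 0 0 as a founder, and counts the meioses (two per non-founder) in each family, which is the number of bits in that family's inheritance vector. The helper is hypothetical, not part of Merlin, and it ignores founder symmetries.

```python
# Hypothetical helper (not part of Merlin): count meioses per family from the
# first four pedigree columns. Parent codes of 0 mark a founder.
from collections import defaultdict

# (Family, Person, Parent1, Parent2) copied from the pedigree example above.
PEDIGREE = [
    (1, 1, 0, 0), (1, 2, 0, 0), (1, 3, 0, 0), (1, 4, 1, 2),
    (1, 5, 3, 4), (1, 6, 3, 4), (1, 7, 0, 0), (1, 8, 7, 4),
    (2, 1, 0, 0), (2, 2, 0, 0), (2, 3, 0, 0), (2, 4, 1, 2),
    (2, 5, 0, 0), (2, 6, 3, 4), (2, 7, 3, 4), (2, 8, 3, 4),
    (2, 9, 5, 6),
]


def meioses_per_family(rows):
    """Return {family: 2 * number_of_non_founders}."""
    counts = defaultdict(int)
    for family, _person, parent1, parent2 in rows:
        if parent1 != 0 and parent2 != 0:  # non-founder: both parents listed
            counts[family] += 2            # one meiosis from each parent
    return dict(counts)


print(meioses_per_family(PEDIGREE))  # {1: 8, 2: 10}
```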

  7. Merlin Input – Data file • Links each pedigree column to a type and a name: • A diabetes (A = affection status: affected/unaffected) • T glucose (T = quantitative trait) • M HLA-DR (M = marker) • M HLA-DQ • Twin status can also be encoded.
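
The data file tells Merlin how to read each pedigree row: after the five fixed columns (family, person, two parents, sex), an A or T entry consumes one column and an M entry consumes two (one per allele). The sketch below applies that rule to one row of the example. The function names and the returned dictionary layout are hypothetical, and other record types, such as twin status, are not handled.

```python
# Hypothetical sketch: interpret one pedigree row using data-file records.
# Handles only A (affection), T (quantitative trait) and M (marker) entries.

DATA_FILE = """\
A diabetes
T glucose
M HLA-DR
M HLA-DQ
"""

FIXED_COLUMNS = ("family", "person", "parent1", "parent2", "sex")
WIDTH = {"A": 1, "T": 1, "M": 2}  # markers take two columns (one per allele)


def parse_data_file(text):
    """Return a list of (type, name) pairs, one per non-empty line."""
    records = []
    for line in text.splitlines():
        if line.strip():
            kind, name = line.split(maxsplit=1)
            records.append((kind, name))
    return records


def parse_pedigree_row(row, records):
    """Map column names to raw string values; 'x' (missing) is kept as-is."""
    tokens = row.split()
    result = dict(zip(FIXED_COLUMNS, tokens[:5]))
    pos = 5
    for kind, name in records:
        width = WIDTH[kind]
        values = tokens[pos:pos + width]
        result[name] = values[0] if width == 1 else tuple(values)
        pos += width
    if pos != len(tokens):
        raise ValueError(f"expected {pos} columns, found {len(tokens)}")
    return result


row = "1 3 0 0 1 1 8.000 1 2 x x"
print(parse_pedigree_row(row, parse_data_file(DATA_FILE)))
```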

  8. Advantages and Disadvantages • Pros: • Merlin is very fast. • Multipoint IBD calculations are exact. • Cons: • Cannot handle very large pedigrees (though it still handles larger ones than Genehunter).

  9. Merlin is fast! – Example pedigrees

  10. Merlin is fast! – Results

  11. Merlin needs less memory • Thanks to compact storage of inheritance vectors, Merlin uses less memory than Genehunter. • Example pedigree: Genehunter: 1024 MB • Exact Merlin: 100 MB • Approximate Merlin: 4-54 MB • Automatic swapping to disk.

  12. Pedigree Simplifications • Tear families apart…

  13. More Pedigree Simplifications • Remove unneeded people…
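
One simple trimming rule, assumed here for illustration rather than taken from Merlin's source, is to repeatedly drop individuals who have no genotype or phenotype data and no remaining children, since they carry no information. The sketch applies that rule until nothing more can be removed; the function name and the example family are hypothetical.

```python
# Hypothetical trimming rule (illustration only, not Merlin's code):
# repeatedly remove individuals with no genotype or phenotype data who have
# no remaining children. Summing over such a leaf's genotypes contributes a
# factor of 1, so removing it leaves the likelihood unchanged.


def trim_uninformative_leaves(parents, informative):
    """
    parents: {person: (father, mother)}, with (0, 0) for founders.
    informative: set of people who have genotype or phenotype data.
    Returns the set of people kept after trimming.
    """
    remaining = set(parents)
    changed = True
    while changed:
        changed = False
        # Everyone who is still a parent of someone in the pedigree.
        has_children = {
            p for person in remaining for p in parents[person] if p != 0
        }
        for person in sorted(remaining):
            if person not in informative and person not in has_children:
                remaining.discard(person)  # safe: iterating over a sorted copy
                changed = True
    return remaining


# Hypothetical family: 5 is an untyped child of 3 and 4; 6 is untyped too,
# but stays because its child 7 (with spouse 8) is typed.
parents = {1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (0, 0),
           5: (3, 4), 6: (3, 4), 7: (6, 8), 8: (0, 0)}
print(sorted(trim_uninformative_leaves(parents, informative={3, 4, 7})))
# [1, 2, 3, 4, 6, 7, 8]
```

If 7 were untyped as well, removing 7 would turn 6 and 8 into removable leaves on the next pass, which is why the rule loops until nothing changes.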

  14. Input Error Correction • Discovers impossible genotypes. • Reports unlikely recombinations. • Mistakes are ambiguous, so Merlin reports the most likely mistake. (Figure: a family with genotypes C/D, A/B, A/C and A/B.)
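
The simplest kind of impossible genotype is a Mendelian inconsistency within a parent-child trio. The sketch below is a minimal check assumed for illustration (Merlin's own error detection goes further, for example by reporting unlikely recombinations): it tests whether a child's genotype can be formed from one allele of each parent. With parents C/D and A/B, a child typed A/C is consistent, while a child typed A/B is not.

```python
# Minimal trio check (illustrative): can the child's unordered genotype be
# obtained by taking one allele from each parent?


def mendelian_consistent(father, mother, child):
    """Each genotype is a pair of allele labels, e.g. ("C", "D")."""
    a, b = child
    return any(
        (a == f and b == m) or (b == f and a == m)
        for f in father
        for m in mother
    )


print(mendelian_consistent(("C", "D"), ("A", "B"), ("A", "C")))  # True
print(mendelian_consistent(("C", "D"), ("A", "B"), ("A", "B")))  # False
```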

  15. From IV to packed tree • Inheritance vector (IV): one bit per meiosis, indexed 0 … n. • Packed tree: a binary tree whose first level branches on the 1st meiosis (0 or 1), whose second level branches on the 2nd meiosis, and so on; the leaves hold the likelihoods.
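
A packed tree can be pictured as a complete binary tree in which level i branches on bit i of the inheritance vector and each leaf holds the likelihood of one vector. The sketch below builds such a tree from a flat list of likelihoods indexed by the IV read as a binary number (1st meiosis as the most significant bit). This indexing and the nested-tuple representation are assumptions made for illustration, not Merlin's data structure.

```python
# Illustrative packed tree (nested tuples, not Merlin's data structure).
# Assumed layout: likelihoods[k] belongs to the IV whose bits, read from the
# 1st meiosis (most significant) to the last, spell k in binary.


def build_packed_tree(likelihoods):
    """Return nested (bit0_child, bit1_child) tuples with float leaves."""
    n = len(likelihoods)
    if n == 0 or n & (n - 1):
        raise ValueError("number of likelihoods must be a power of two")
    if n == 1:
        return likelihoods[0]
    half = n // 2
    # First half: current meiosis bit = 0; second half: bit = 1.
    return (build_packed_tree(likelihoods[:half]),
            build_packed_tree(likelihoods[half:]))


def lookup(tree, iv):
    """Follow the IV bits (0/1, 1st meiosis first) from the root to a leaf."""
    node = tree
    for bit in iv:
        node = node[bit]
    return node


tree = build_packed_tree([0.1, 0.2, 0.3, 0.4])  # 2 meioses -> 4 leaves
print(lookup(tree, (1, 0)))  # IV "10" -> index 2 -> 0.3
```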

  16. From packed trees to sparse trees • Idea: prune unneeded subtrees. • Subtrees with zero likelihood • Symmetric nodes: seeing one is like seeing the rest • Pruning at level i removes a subtree of size $O(2^{n-i})$. • IV order is important: pruning near the root saves the most.

  17. Case 1: Zero Likelihood • Example genotypes: a/A, A/A, A/A • Any IV with IV[0] = 0 has zero likelihood!

  18. Case 2: Symmetric Nodes • Example genotypes: a/A, A/A, A/A • A vector with IV[1] = 0 has a twin vector with IV[1] = 1 and the same likelihood.

  19. From Packed Tree to Sparse Trees (Figure: a packed tree whose eight leaves hold likelihoods L1, L2, L1, L2, L1, L2, L1, L2, beside the equivalent sparse tree. Legend: node with zero likelihood; node identical to its sibling; likelihood for this branch.)
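
The two pruning cases can be sketched on nested-tuple trees: a subtree whose leaves are all zero collapses to a single zero marker (Case 1), and a node whose two children are identical keeps just one copy, flagged as symmetric (Case 2). This is a simplified illustration with hypothetical names, not Merlin's implementation; for clarity it starts from a complete packed tree.

```python
# Simplified sparse-tree sketch (hypothetical names, not Merlin's code).
# Trees are nested (bit0_child, bit1_child) tuples with float leaves.
ZERO = "ZERO"  # stands for a whole subtree with zero likelihood


def build_packed_tree(likelihoods):
    """Complete binary tree; assumes len(likelihoods) is a power of two."""
    if len(likelihoods) == 1:
        return likelihoods[0]
    half = len(likelihoods) // 2
    return (build_packed_tree(likelihoods[:half]),
            build_packed_tree(likelihoods[half:]))


def sparsify(node):
    """Collapse zero-likelihood subtrees and identical sibling pairs."""
    if not isinstance(node, tuple):          # leaf
        return ZERO if node == 0 else node
    left, right = sparsify(node[0]), sparsify(node[1])
    if left == ZERO and right == ZERO:
        return ZERO                          # Case 1: impossible branch
    if left == right:
        return ("SYM", left)                 # Case 2: keep one copy
    return (left, right)


def stored_leaves(node):
    """Leaves actually stored (each ZERO marker counts as one)."""
    if isinstance(node, tuple):
        if node[0] == "SYM":
            return stored_leaves(node[1])
        return stored_leaves(node[0]) + stored_leaves(node[1])
    return 1


# 3 meioses, mirroring the slides: IV[0] = 0 is impossible, and IV[1] does
# not matter, so the leaves read 0, 0, 0, 0, L1, L2, L1, L2.
L1, L2 = 0.2, 0.5
sparse = sparsify(build_packed_tree([0, 0, 0, 0, L1, L2, L1, L2]))
print(sparse)                 # ('ZERO', ('SYM', (0.2, 0.5)))
print(stored_leaves(sparse))  # 3 instead of 8
```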

  20. Reminder: The forward algorithm • Hidden states H1, H2, …, Hi (inheritance vectors) emit observations X1, X2, …, Xi (marker data). • Every entry is computed as: step i: $P(x_1,\dots,x_i,h_i) = \sum_{h_{i-1}} P(x_1,\dots,x_{i-1},h_{i-1})\,P(h_i \mid h_{i-1})\,P(x_i \mid h_i)$ • Note that in step i of the forward algorithm, we multiply a $2^{2n} \times 2^{2n}$ transition matrix by a vector of size $2^{2n}$.
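
The recursion above can be written as one vector-matrix product per marker. The sketch below is a generic HMM forward pass in NumPy; the argument names and the toy two-state example are hypothetical, and in the linkage setting the states would be the 2^(2n) inheritance vectors.

```python
import numpy as np


def forward(initial, transitions, emissions):
    """
    Generic HMM forward algorithm.
    initial:     shape (S,), P(h_1)
    transitions: list of (S, S) matrices, T[a, b] = P(h_{i+1} = b | h_i = a)
    emissions:   shape (M, S), emissions[i, s] = P(x_i | h_i = s)
    Returns alpha with alpha[i, s] = P(x_1..x_i, h_i = s).
    """
    num_markers, num_states = emissions.shape
    alpha = np.zeros((num_markers, num_states))
    alpha[0] = initial * emissions[0]
    for i in range(1, num_markers):
        # Sum over h_{i-1}: a dense vector-matrix product, O(S^2) per step.
        alpha[i] = (alpha[i - 1] @ transitions[i - 1]) * emissions[i]
    return alpha


# Toy example with 2 hidden states and 3 markers (made-up numbers).
initial = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.1, 0.9]])
E = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
alpha = forward(initial, [T, T], E)
print(alpha[-1].sum())  # total likelihood P(x_1, x_2, x_3)
```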

  21. Transition matrix is a bottleneck • Matrix-vector multiplication: $\Theta(N^2)$ • In our case, $N = 2^{2n}$. • If the matrix were sparse ($k \ll N^2$ nonzero entries), multiplication would take only $O(N + k)$ time. • Trivial implementation: list of lists
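
The "list of lists" idea can be sketched as follows: each row stores only its nonzero (column, value) pairs, so a matrix-vector product touches each stored entry once and costs O(N + k) instead of Θ(N²). The representation and function names here are illustrative assumptions.

```python
# Sparse matrix as a list of lists: rows[r] holds (column, value) pairs
# for the nonzero entries of row r only.


def sparse_matvec(rows, vector):
    """Compute y = A @ x, where A is given in list-of-lists form."""
    result = [0.0] * len(rows)
    for r, entries in enumerate(rows):
        total = 0.0
        for c, value in entries:
            total += value * vector[c]
        result[r] = total
    return result


def from_dense(matrix):
    """Convert a dense list-of-rows matrix to list-of-lists sparse form."""
    return [[(c, v) for c, v in enumerate(row) if v != 0] for row in matrix]


dense = [
    [1, 0, 0, 2],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [4, 0, 5, 0],
]
rows = from_dense(dense)
print(sparse_matvec(rows, [1.0, 1.0, 1.0, 1.0]))  # [3.0, 3.0, 0.0, 9.0]
```

In the forward recursion the vector multiplies the matrix from the left, so one would store the transposed matrix, with each row indexed by the destination state.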

  22. Multipoint analysis in dense maps • Idea: for closely spaced markers, the chance of several recombinations between consecutive markers is negligible. • Used for approximate solutions. • Allowing fewer than 3 recombinants gives an almost exact solution, • but runs 3 times faster and uses half the memory.
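
Under the usual no-interference model, moving from inheritance vector h to h' between two markers with recombination fraction θ has probability θ^r (1 - θ)^(m - r), where m is the number of meioses and r the number of bits that differ. Allowing fewer than k recombinants (assumed here to apply to each interval between adjacent markers) keeps only the h' within Hamming distance k - 1, so each row of the transition matrix has far fewer nonzero entries. The sketch below counts them and measures how much probability is dropped; m, θ and k are illustrative parameters, not values from the slides.

```python
from math import comb

# Transition between two IVs differing in r of m meiosis bits, assuming
# independent meioses with recombination fraction theta (no interference).


def transition_prob(r, m, theta):
    return theta**r * (1 - theta) ** (m - r)


def nonzeros_per_row(m, k):
    """IVs reachable with fewer than k recombinants: sum_{r<k} C(m, r)."""
    return sum(comb(m, r) for r in range(k))


def dropped_mass(m, k, theta):
    """Probability of k or more recombinants, i.e. what the approximation ignores."""
    kept = sum(comb(m, r) * transition_prob(r, m, theta) for r in range(k))
    return 1.0 - kept


m, k, theta = 20, 3, 0.001  # e.g. 10 non-founders, closely spaced markers
print(nonzeros_per_row(m, k), "of", 2**m, "entries per row")  # 211 of 1048576 entries per row
print(f"ignored probability: {dropped_mass(m, k, theta):.2e}")
```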

  23. Summary • Detects data errors and unlikely data. • Simplifies pedigrees. • Uses sparse trees to exploit symmetries and impossible data. • Uses sparse matrices to speed up matrix-vector multiplication. • Open source.

  24. More info • http://bioinformatics.well.ox.ac.uk/Merlin • Abecasis GR, Cherny SS, Cookson WO, Cardon LR. "Merlin - rapid analysis of dense gene maps using sparse gene flow trees." Nat Genet. 2002 Jan;30(1):97-101.

  25. Intermission…