
Dan Geiger Computer Science Department, Technion




  1. PEDTOOL: Gene hunting based on high-throughput computing. Dan Geiger, Computer Science Department, Technion

  2. Searching for genes that predispose to or cause diseases
     Why search?
     1. Prenatal testing for high-risk populations
     2. Risk assessment and adapting lifestyle to risk factors
     3. Finding the mutated proteins and developing drugs
     4. Understanding basic biological processes
     How can one search?
     1. Finding families in which a disease is passed from generation to generation
     2. Taking a simple blood sample from a number of affected and healthy individuals
     3. Laboratory analysis of the DNA across all chromosomes
     4. Analysis by algorithmic means. I will highlight three computational problems.

  3. Usage of our system in Israeli Hospitals
     • Rabin Hospital, by Motti Shochat’s group
       • New locus for mental retardation (2003)
       • Infantile bilateral striatal necrosis (2004)
     • Soroka Hospital, by Ohad Birk’s group
       • Lethal congenital contractural syndrome (2004)
       • Congenital cataract (2005)
     • Rambam Hospital, by Eli Shprecher’s group
       • Congenital recessive ichthyosis (2005)
       • CEDNIK syndrome (2005)
     • Galil Ma’aravi Hospital, by Tzipi Falik’s group
       • Familial Onychodysplasia and dysplasia
       • Familial juvenile hypertrophy (2005)

  4. Steps in Gene Hunting
     • Linkage analysis (10^6~10^7 bp)
     • Identify genes (10^4~10^5 bp)
     • Resequencing (100 bp)

  5. Recombination During Meiosis (figure labels: male or female; recombinant gametes)

  6. Family Pedigree

  7. Familial Onychodysplasia and dysplasia of distal phalanges (ODP) III-15 IV-10 IV-7

  8. Familial juvenile hypertrophy of the breast (JHB) IV-3

  9. Marker Information Added (genetic markers). Figure: chromosome pair with markers M1 and M2.

  10. Maximum Likelihood Evaluation: Two-Point Analysis (Task 1)
      [Figure: chromosome map labeled M1, M2, D1, D2, M3, M4, θ; genotype labels for III-15 and III-16: 151,159 and 151,155; 202,209 and 202,202; 139,141 and 139,146; 1,2 and 3,3; a, h]
      The first computational problem: find a value of θ that maximizes Pr(data | θ, Mode-Of-Inheritance). Here "data" means one marker's data at a time.
      LOD score (to quantify how confident we are): Z(θ) = log10[Pr(data | θ) / Pr(data | θ = ½)].
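The LOD score above can be illustrated with a minimal sketch. It assumes the simplest textbook setting, which the slides do not state: every meiosis is fully informative and phase-known, so Pr(data | θ) is proportional to θ^r (1 − θ)^(n − r) for r recombinants out of n meioses. The function names `lod_score` and `max_lod` and the grid search are illustrative choices, not part of Pedtool, which evaluates likelihoods on whole pedigrees.

```python
import math


def lod_score(theta: float, recombinants: int, meioses: int) -> float:
    """Z(theta) = log10[Pr(data | theta) / Pr(data | theta = 1/2)].

    Assumes phase-known, fully informative meioses, so the likelihood is
    theta**r * (1 - theta)**(n - r) up to a constant that cancels in the ratio.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants

    def log10_likelihood(t: float) -> float:
        total = 0.0
        if recombinants:
            if t == 0.0:
                return -math.inf  # a recombinant is impossible when theta = 0
            total += recombinants * math.log10(t)
        if non_recombinants:
            total += non_recombinants * math.log10(1.0 - t)
        return total

    return log10_likelihood(theta) - log10_likelihood(0.5)


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in [0, 0.5] for the value maximizing Z(theta)."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(t, recombinants, meioses))
    return best, lod_score(best, recombinants, meioses)


theta_hat, z_max = max_lod(recombinants=1, meioses=10)
print(f"theta = {theta_hat:.3f}, LOD = {z_max:.3f}")
```

By construction Z(½) = 0, so a positive score means the data favor linkage at that θ over free recombination.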

  11. Results of Two-Point Analysis

  12. Results of Two-Point Analysis

  13. Results of Two-Point Analysis

  14. Maximum Likelihood Evaluation Approach (Task 2)
      Most probable haplotype configuration of some or all persons: which alleles came from the mother and which from the father?
      The second computational problem: argmax Pr(h_1, h_2, …, h_{2n-1}, h_{2n} | data, θ, MOI)
      For each person, there are 2^k possible haplotypes, where k is the number of markers considered.
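To make the 2^k count concrete, the following sketch enumerates every way of assigning the two alleles at each marker to a paternal and a maternal haplotype. The function name `phase_configurations` is hypothetical, and the genotype values are used purely as illustration (they echo the marker labels shown in the pedigree figures). Finding the most probable configuration additionally requires the likelihood model, which is not shown here.

```python
from itertools import product


def phase_configurations(genotypes):
    """List all (paternal, maternal) haplotype pairs for unordered genotypes.

    genotypes: one (allele, allele) pair per marker. For each marker either
    allele may have come from the father, giving 2**k assignments in total.
    """
    configs = []
    for flips in product((False, True), repeat=len(genotypes)):
        paternal = tuple(b if flip else a for (a, b), flip in zip(genotypes, flips))
        maternal = tuple(a if flip else b for (a, b), flip in zip(genotypes, flips))
        configs.append((paternal, maternal))
    return configs


# Illustrative genotypes at k = 4 markers, all heterozygous.
genotypes = [(151, 159), (202, 209), (139, 141), (1, 2)]
configs = phase_configurations(genotypes)
print(len(configs))

# Homozygous markers make some assignments coincide, so 2**k is an upper bound
# on the number of distinct configurations.
partly_homozygous = [(151, 155), (202, 202), (139, 146), (3, 3)]
distinct = set(phase_configurations(partly_homozygous))
print(len(distinct))
```

Because the count doubles with every marker and the argmax ranges over all persons jointly, exhaustive enumeration is only feasible for very small inputs.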

  15. Results of Haplotyping Analysis (Affected persons)

  16. Results of Haplotyping Analysis (Healthy persons)

  17. Maximum Likelihood Evaluation: Multipoint Analysis (Task 3)
      [Figure: chromosome map labeled M1, M2, D1, M3, M4, θ; genotype labels for III-15 and III-16: 151,159 and 151,155; 202,209 and 202,202; a, h; 139,141 and 139,146; 1,2 and 3,3]
      The third computational problem: find a value of θ that maximizes Pr(data | θ, MOI). Here "data" means considering several markers at once.

  18. 23 Results of Multipoint Analysis

  19. The Computational Task
      • Computing P(data | θ) for a specific value of θ.
      This problem is equivalent to finding the best order for sum-product operations for high dimensional matrices.
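Why the order of sum-product operations matters can be seen on a toy example. The sketch below tracks only the scopes (sets of variables) of the tables and reports the largest intermediate table produced when variables are summed out in a given order. The function `largest_table`, the star-shaped example, and the binary domain size are assumptions made for illustration; they are not taken from the slides.

```python
def largest_table(factors, order, domain_size=2):
    """Size of the largest intermediate table when summing out `order`.

    factors: iterable of variable-name collections, one scope per table.
    Summing out a variable multiplies all tables that mention it, producing
    one table over the union of their scopes, then drops the variable.
    """
    scopes = [frozenset(f) for f in factors]
    largest = 0
    for var in order:
        touching = [s for s in scopes if var in s]
        if not touching:
            continue
        merged = frozenset().union(*touching)
        largest = max(largest, domain_size ** len(merged))
        scopes = [s for s in scopes if var not in s]
        remaining = merged - {var}
        if remaining:
            scopes.append(remaining)
    return largest


# Hypothetical star-shaped model: a center variable c shares a table with each leaf.
star = [{"c", "a1"}, {"c", "a2"}, {"c", "a3"}, {"c", "a4"}]
center_first = largest_table(star, ["c", "a1", "a2", "a3", "a4"])
leaves_first = largest_table(star, ["a1", "a2", "a3", "a4", "c"])
print(center_first, leaves_first)
```

Both orders compute the same sum, but the first builds a table over all five variables while the second never exceeds two, and this gap grows exponentially with the number of leaves.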

  20. Stochastic Greedy Ordering Algorithm(s)
      • Iteration i:
        • The three indices yielding minimal table size are found.
        • A coin (biased according to the resulting table size) is flipped to choose between them.
      • The algorithm is repeated many times, stopping early if a low-cost elimination sequence is found.
      Repeat these steps with several cost functions.
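The steps above can be sketched as follows. Several details are assumptions, since the slide does not specify them: the coin is biased with probability proportional to 1 / (table size), the cost of a sequence is the sum of all intermediate table sizes, and a fixed number of restarts replaces the early-stopping rule. The names `stochastic_greedy_order` and `best_order` are hypothetical and do not reflect Pedtool's actual code.

```python
import random


def stochastic_greedy_order(factors, rng, domain_size=2, candidates=3):
    """One randomized pass: returns (elimination order, total table cost)."""
    scopes = [frozenset(f) for f in factors]
    variables = set().union(*scopes)
    order, cost = [], 0
    while variables:
        # Table size that eliminating each remaining variable would create.
        sized = []
        for var in variables:
            merged = frozenset({var}).union(*[s for s in scopes if var in s])
            sized.append((domain_size ** len(merged), str(var), var, merged))
        sized.sort(key=lambda item: item[:2])
        top = sized[:candidates]
        # Biased coin: smaller resulting tables are more likely to be chosen.
        weights = [1.0 / size for size, _, _, _ in top]
        size, _, var, merged = rng.choices(top, weights=weights, k=1)[0]
        order.append(var)
        cost += size
        variables.remove(var)
        scopes = [s for s in scopes if var not in s]
        if merged - {var}:
            scopes.append(merged - {var})
    return order, cost


def best_order(factors, restarts=50, seed=0):
    """Repeat the randomized pass and keep the cheapest sequence found."""
    rng = random.Random(seed)
    runs = (stochastic_greedy_order(factors, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])


star = [{"c", "a1"}, {"c", "a2"}, {"c", "a3"}, {"c", "a4"}]
order, cost = best_order(star)
print(order, cost)
```

Swapping the `cost += size` line for another measure, such as the maximum table size, is one way to rerun the search under a different cost function.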

  21. When intermediate tables become too large for a given RAM, computation virtually halts. But we can fix the value of the index m, namely, condition on m’s value, and do each part as a separate job.

  22. The Pedtool System
      • Divides the computation of a single likelihood across hundreds of computers.
      • Uses the Condor research pool at UW-Madison.
      • Simple user interface, usable by novices.
      • Able to compute a highly inbred pedigree with 250 individuals sent by NIH.
      Faster by 1-5 orders of magnitude than other linkage programs.

  23. Running time improvements (bioinfo.cs.technion.ac.il/pedtool)

  24. The Main Goals of Future Research
      • Efficiency
      • Simplicity
      • Availability online to all Israeli researchers
      • More functionalities
      bioinfo.cs.technion.ac.il/pedtool

  25. Acknowledgements
      Students:
      • Ma’ayan Fishelson, Ph.D (Graduated 2004)
      • Dmitry Rusakov, Ph.D (Graduated 2004)
      • Anna Tzemach, M.Sc
      • Nickolay Dovgolevsky, B.Sc (Graduated 2004)
      • Mark Silberstein, M.Sc
      • Julia Stolin
      • Edward Vitkin
      Collaborators from medical genetics:
      • Motti Shochat and Tami Shochat (Rabin)
      • Ohad Birk and Rivka Ophir (Soroka)
      • Tzipi Falik and Morad Khayat (Galil Ma’aravi)
      Collaborators from distributed systems:
      • Assaf Schuster
      Pedtool is to be hosted by DSL at the CS/Technion and supported by IBM, ISF, and the Israeli Science Ministry.
