
Reconstructing the Evolutionary History of Complex Human Gene Clusters

Y. Zhang, G. Song, T. Vinar, E. D. Green, A. Siepel, W. Miller. RECOMB 2008. Speaker: Fabio Vandin.


Presentation Transcript


  1. Reconstructing the Evolutionary History of Complex Human Gene Clusters Y. Zhang, G. Song, T. Vinar, E. D. Green, A. Siepel, W. Miller RECOMB 2008 Speaker: Fabio Vandin

  2. Outline • Motivation • Problem definition • Simple algorithm for simple model • SIS algorithm for complex model • Results on human genome • Evaluation of SIS algorithm

  3. Gene cluster • Group of related genes • Probably formed by duplications followed by functional diversification • Duplications, together with deletions, can cause human genetic diseases

  4. “History” of a gene cluster? • Figure: percentage of similarity in the self-alignment of the human genome (80%, 85%, 93%, 89%, 98%)

  5. Dot-plots of self-alignments • Human UGT2 cluster

  6. Data Preparation • Atomic segments: • Self-alignment (forward and reverse-complement): pairs A ≈ B • Transitive closure property: A ≈ B, B ≈ C ⇒ A ≈ C • Maximize each alignment • Collapsible segments:
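
The transitive closure property on slide 6 means that aligned segments fall into equivalence classes. A minimal sketch of how such classes could be computed with a union-find structure follows; the function name `equivalence_classes` and the toy segment labels are assumptions for illustration, not part of the original slides or the authors' pipeline.

```python
def equivalence_classes(segments, aligned_pairs):
    """Group segments into classes under the transitive closure of the relation A ≈ B."""
    parent = {s: s for s in segments}

    def find(x):
        # Walk up to the class representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in aligned_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two classes

    classes = {}
    for s in segments:
        classes.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical input: A ≈ B and B ≈ C are observed, so A ≈ C follows.
classes = equivalence_classes(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
print(classes)
```

Here A, B and C end up in one class even though the pair (A, C) was never given explicitly, while D stays alone.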

  7. Computational Problem • Input: sequence of atomic (non-collapsible) segments • Output: (most probable) sequence of events (or the number of events) such that if we unwind these events in the input sequence, we obtain a sequence containing only a single atomic segment

  8. Simple Model • Assumptions: • Only duplications (possibly with reversal and tandem duplications) • No duplication inside the originating region • The most important:

  9. Simple Model (2) • Given assumptions 1 and 2, each event is of “identical” (reversed) regions of consecutive atomic segments ( copied to ) • Problem statement

  10. Simple Model (3) • Candidate definition • Is this definition reasonable?

  11. Simple Model (4) • Algorithm • Analysis

  12. Simple Model: limitations • Assumptions: • “large scale deletions are likely to occur” • “atomic boundary reuses violating assumption are not uncommon” • Same number of events, but multiple ways of reconstructing the history • NB: not solved with SIS, but you can assign probabilities to the histories

  13. Stochastic Model • Event: • Duplication (possibly with reversal) • Deletion (with restrictions) • History: • Target distribution of histories: • is the number of reused atomic boundaries and

  14. Algorithm for Stochastic Model • Sample histories from the target distribution and compute the mean value (of the function we are interested in): • sample from • given , estimate • e.g., gives the number of events • Problem: how to sample from the target distribution?

  15. Sequential Importance Sampling (SIS) • Goal: compute • Target distribution: • Trial distribution: • Sample’s weight: • Output:
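
The formulas on slide 15 did not survive in this transcript, so the following is only a generic sketch of sequential importance sampling, not the authors' implementation. It assumes a toy "history" made of three binary steps, a trial distribution that picks each step uniformly, and an unnormalized target that factorizes over steps; the names `step_factor`, `sample_weighted_history` and `sis_estimate` are hypothetical. The weight of each sample is built step by step as the product of target/trial ratios, and the output is the weighted mean of the function of interest.

```python
import random


def step_factor(x):
    # Hypothetical unnormalized target factor for one step (choice 1 is 3x likelier).
    return 3.0 if x == 1 else 1.0


def sample_weighted_history(n_steps, rng):
    """Build one history step by step under the trial distribution, tracking its weight."""
    history, weight = [], 1.0
    for _ in range(n_steps):
        x = rng.randrange(2)              # trial distribution: uniform over {0, 1}
        weight *= step_factor(x) / 0.5    # multiply in target / trial for this step
        history.append(x)
    return history, weight


def sis_estimate(f, n_samples, n_steps, rng):
    """Self-normalized estimate of the target mean of f over histories."""
    num = den = 0.0
    for _ in range(n_samples):
        h, w = sample_weighted_history(n_steps, rng)
        num += w * f(h)
        den += w
    return num / den


rng = random.Random(0)
# f counts the steps equal to 1, a stand-in for "number of events" in a history.
estimate = sis_estimate(f=sum, n_samples=20000, n_steps=3, rng=rng)
print(round(estimate, 2))
```

In this toy setting the exact target mean is 3 × 3/4 = 2.25, so the printed estimate should land close to that value. Dividing by the sum of weights means the target only needs to be known up to a normalizing constant.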

  16. SIS for gene cluster history • Duplication

  17. SIS for gene cluster history (2) • Deletion • only without atomic boundary reuse • only if “the atomic segment pair flanking a deletion site appears elsewhere” in the sequence

  18. Application to Human Gene Clusters • human genome assembly hg18 self-alignment: 457 duplicated regions • alignments: at least 500 bp, ≥ 70% identity • segments separated by no more than 500 Kbp • only long and non-trivial regions considered • 165 biomedically interesting clusters (~111 Mbp) • 5 divergence thresholds: GA, OWM, NWM, LG, DOG • 825 combinations of gene cluster and divergence threshold

  19. Application to Human Gene Clusters (2)

  20. Application to Human Gene Clusters (3)

  21. Application to Human Gene Clusters (4) • “… help prioritize the selection of notably interesting gene clusters for more detailed comparative genomics studies” • “… to compare cluster dynamics in certain lineages to observed phenotypic differences among primates” • “… such sequence data should reveal differences among primate species of possible relevance for selecting species for further biomedical studies”

  22. Evaluation of the SIS Algorithm • Estimate parameters from the human genome for the events (e.g., duplications): • 39% of duplications “reversed”, deletions = 2% of duplications • Starting from 500 Kbp sequence, generate 10 genome clusters for N = 10, 20, …, 100 events
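
The simulation setup on slide 22 can be illustrated with a rough sketch. Everything beyond the two quoted parameters is an assumption made for illustration: the starting sequence is discretized into `n_blocks` numbered blocks instead of 500 Kbp of DNA, regions are chosen uniformly with a cap of `max_len` blocks, copies are inserted outside their source region, and "deletions = 2% of duplications" is read as a 0.02 : 1 ratio of event counts. The slides do not give the authors' actual sampling distributions.

```python
import random

P_REVERSED = 0.39      # fraction of duplications that are reversed (from the slide)
DELETION_RATIO = 0.02  # deletions per duplication (from the slide, as interpreted here)


def simulate_cluster(n_events, n_blocks=50, max_len=10, rng=None):
    """Apply n_events random duplications/deletions to a sequence of signed blocks."""
    rng = rng or random.Random()
    seq = list(range(1, n_blocks + 1))  # sign encodes orientation
    p_deletion = DELETION_RATIO / (1.0 + DELETION_RATIO)
    for _ in range(n_events):
        i = rng.randrange(len(seq))
        j = rng.randrange(i + 1, min(len(seq), i + max_len) + 1)  # region seq[i:j]
        if rng.random() < p_deletion and j - i < len(seq):
            del seq[i:j]  # deletion; never removes the whole sequence
        else:
            copy = seq[i:j]
            if rng.random() < P_REVERSED:
                copy = [-b for b in reversed(copy)]  # reverse order and orientation
            # Insert the copy at a position outside the source region.
            positions = list(range(0, i + 1)) + list(range(j, len(seq) + 1))
            pos = rng.choice(positions)
            seq[pos:pos] = copy
    return seq


rng = random.Random(42)
# 10 simulated clusters for each N = 10, 20, ..., 100 events.
clusters = {n: [simulate_cluster(n, rng=rng) for _ in range(10)]
            for n in range(10, 101, 10)}
print({n: len(runs) for n, runs in clusters.items()})
```

Because every simulated sequence comes with a known number of events, a reconstruction method can be scored by comparing its estimated event count with the true N.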

  23. Evaluation of the SIS Algorithm (2)

  24. Discussion • Main Limitations: • details of data preparation • choice of the parameters/distributions • evaluation of the SIS algorithm • Future Directions: • include other types of events • understand the stochastic model • how to evaluate a model? • how to evaluate an algorithm?
