
Impact of functional information on understanding (genetic) variation ENCODE thread 12 + a bit more

Impact of functional information on understanding (genetic) variation: ENCODE thread 12 + a bit more. ENCODE Journal Club. David Vandenbergh, Marta Byrska-Bishop. Jan 8, 2013. Titles of the 3 papers: Personal and population genomics of human regulatory variation. Vernot B et al.


Presentation Transcript


  1. Impact of functional information on understanding (genetic) variation: ENCODE thread 12 + a bit more. ENCODE Journal Club. David Vandenbergh, Marta Byrska-Bishop. Jan 8, 2013

  2. Titles of the 3 papers: (1) Personal and population genomics of human regulatory variation. Vernot B et al. (2) Analysis of variation at transcription factor binding sites in Drosophila and humans. Spivakov M et al. (3) Systematic localization of common disease-associated variation in regulatory DNA. Maurano M et al.

  3. Personal and population genomics of human regulatory variation Vernot B et al.

  4. Preliminary points. About 1.5% of the genome is protein-coding DNA, but up to ~15% is estimated to be functionally constrained, so there is a lot of functional DNA in non-coding regions. Genetic variation in such regions likely makes a significant contribution to phenotypic variation and disease susceptibility among individuals.

  5. Aim of the paper. Combine genome-wide DHSs from 138 cell and tissue types (3 million DHSs, 8.4 million DNase I footprints) with whole-genome sequences of 53 geographically diverse individuals to better understand the patterns of regulatory variation in humans.

  6. Genomic pattern of variation in DNA sequence motifs. Figure 3 panels: nucleotide diversity (π) for 732 known motifs; nucleotide diversity normalized by mutation rate. Mutation rate potentially makes a large contribution to the observed level of nucleotide diversity.
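As a rough illustration of the quantity plotted on this slide, nucleotide diversity (π) is the average number of pairwise differences per site among a set of aligned sequences. The sketch below computes π for a toy alignment and then divides it by a relative mutation rate. The function names, the toy sequences, and the choice of a simple ratio for the normalization are assumptions made for illustration, not the procedure used by Vernot et al.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site (pi) across aligned sequences."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to the same length")
    # Count mismatching positions for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


def normalized_diversity(pi, relative_mutation_rate):
    """Divide pi by a relative mutation rate (assumed normalization)."""
    return pi / relative_mutation_rate


# Hypothetical toy alignment: four haplotypes over an 8 bp motif instance.
toy_haplotypes = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(toy_haplotypes)
pi_normalized = normalized_diversity(pi, relative_mutation_rate=2.0)
```

With a normalization of this kind, a motif that looks highly diverse only because it sits in a mutation-prone context (for example, one rich in CpG sites) is scaled down accordingly.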

  7. Functional constraint between cell types is heterogeneous: there are big differences in nucleotide diversity between cell lines (92 cell types with high-quality DNase I data). Figure labels: 28.2%; neutrally evolving DNA; 8%; 6.4%; stronger functional constraint; 31.1%. The core set of DHSs is subject to stronger purifying selection.

  8. Ectopic activation of non-canonical cis-regulatory sequences contributes to the aberrant transcriptional changes in many cancers. Figure: distributions of singleton peaks (the proportion of DNase I peaks present in only 1 cell type) when randomly sampling 29 or 5 cell types.
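The resampling behind a figure like this can be sketched as follows: draw a random subset of cell types, pool their peaks, and record the fraction of peaks seen in exactly one of the sampled cell types. The data structure (a dict mapping cell type to a set of peak IDs), the function names, and the toy data are hypothetical; the paper's actual peak-merging rules are not reproduced here.

```python
import random


def singleton_fraction(peaks_by_cell_type, sample_size, rng):
    """Fraction of pooled peaks present in exactly one sampled cell type."""
    sampled = rng.sample(sorted(peaks_by_cell_type), sample_size)
    counts = {}
    for cell in sampled:
        for peak in peaks_by_cell_type[cell]:
            counts[peak] = counts.get(peak, 0) + 1
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def singleton_distribution(peaks_by_cell_type, sample_size, n_draws, seed=0):
    """Repeat the random draw n_draws times to build a distribution."""
    rng = random.Random(seed)
    return [
        singleton_fraction(peaks_by_cell_type, sample_size, rng)
        for _ in range(n_draws)
    ]


# Hypothetical toy data: peak IDs observed in four cell types.
toy_peaks = {
    "A": {"p1", "p2"},
    "B": {"p1", "p3"},
    "C": {"p1", "p4"},
    "D": {"p1", "p2"},
}
all_cells = singleton_distribution(toy_peaks, sample_size=4, n_draws=3)
two_cells = singleton_distribution(toy_peaks, sample_size=2, n_draws=50)
```

Comparing the distribution for a cancer-derived cell type against random draws of the same size is one way to ask whether it carries an unusual excess of singleton peaks.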

  9. Analysis of variation at transcription factor binding sites in Drosophila and humans Spivakov M et al.

  10. Preliminary points • Gene expression is regulated by TFs that are recruited to cis-regulatory modules (CRMs) in DNA. • The regulatory code at CRMs has an ambiguous relationship between sequence and function. • Some regulatory variants have been associated with changes in TF binding, gene expression, and disease phenotype, while others don’t cause any change in function. • Studying TFBS variability in the context of the same species may lead to insights into cis-regulatory logic.

  11. 3 approaches to investigate TFBS functional constraints based on variation data: (1) analyzing TFBSs position-by-position; (2) using a genetic load model as a metric of TFBS variation, which allows per-instance TFBS functional constraints to be investigated; (3) taking advantage of per-individual binding maps for human CTCF to demonstrate the buffering of genetic variation at TFBSs.

  12. Approach 1: position-by-position

  13. Variation at functional binding sites is reduced compared to reshuffled motif matches and flanking regions. Figure: individual variation of the binding sites for 15 Drosophila and 36 human TFs. The significance of the effect was similar even though the SNP frequency differed approximately 11-fold between Drosophila and humans.

  14. There is a significant anti-correlation between variation frequency at motif positions and their information content. Figure: individual variation of the binding sites for 15 Drosophila and 36 human TFs.
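To make the anti-correlation concrete, the sketch below computes the information content of each motif position (2 bits minus the Shannon entropy of the base frequencies) and correlates it with a per-position variation frequency. The toy motif, the toy variation frequencies, and the use of a Pearson coefficient are all assumptions for illustration; the statistic used in the paper may differ.

```python
import math


def information_content(column):
    """Bits of information at one DNA motif position: 2 - Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return 2.0 - entropy


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical motif: base frequencies (A, C, G, T) at four positions.
toy_motif = [
    [1.0, 0.0, 0.0, 0.0],      # fully specified position
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.7, 0.1, 0.1, 0.1],
]
# Hypothetical fraction of motif instances carrying a SNP at each position.
toy_variation = [0.01, 0.03, 0.06, 0.02]

ic = [information_content(col) for col in toy_motif]
r = pearson(ic, toy_variation)
```

In this toy case the most informative positions carry the fewest variants, so `r` comes out negative, which is the pattern the slide describes.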

  15. Approach 2: per instance. Binding sites that tolerate a higher load are less functionally constrained. Mutational load: a population genetics metric that combines the frequency of a mutation with the predicted phenotypic consequence (reduction of PWM score) that it causes.
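One way to picture a load metric of this kind is the classical genetic load form, (best - mean) / best, with the PWM score standing in for fitness. The sketch below is an assumption-laden illustration only: the PWM values, the allele frequencies, the function names, and the choice of the best-scoring observed allele as the reference are all hypothetical and may not match the exact definition in Spivakov et al.

```python
def pwm_score(site, pwm):
    """Sum the per-position PWM weights for the bases in a site."""
    return sum(pwm[i][base] for i, base in enumerate(site))


def site_load(alleles, pwm):
    """Load of one binding-site instance.

    alleles: list of (sequence, population_frequency), frequencies sum to 1.
    Assumes the best observed score is positive.
    """
    best = max(pwm_score(seq, pwm) for seq, _ in alleles)
    mean = sum(freq * pwm_score(seq, pwm) for seq, freq in alleles)
    return (best - mean) / best


# Hypothetical 3-position PWM with positive weights.
toy_pwm = [
    {"A": 2.0, "C": 0.5, "G": 0.5, "T": 0.5},
    {"A": 0.5, "C": 2.0, "G": 0.5, "T": 0.5},
    {"A": 0.5, "C": 0.5, "G": 2.0, "T": 0.5},
]
# Hypothetical site: the consensus allele at 90%, a weaker variant at 10%.
polymorphic_load = site_load([("ACG", 0.9), ("ACT", 0.1)], toy_pwm)
monomorphic_load = site_load([("ACG", 1.0)], toy_pwm)
```

A site with no variation has zero load, and the load grows with both the frequency of a variant and the size of the PWM score drop it causes.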

  16. TFBSs proximal to TSSs are more constrained than those in distal regulatory regions in humans. Figure label: “exception”. The role of CTCF in establishing chromatin domains is especially important in proximity of gene promoters.

  17. Approach 3: per individual

  18. Mutations can be buffered to maintain the levels of binding signal, especially at highly conserved sites. Conservation consistently weakens the relationship between PWM score and binding intensity. CTCF binding to evolutionarily conserved sites may have a reduced dependence on sequence.

  19. Conclusions. TFBSs are functionally constrained, but mutations in them can be tolerated, providing evidence for possible “buffering” effects.

  20. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Maurano et al. (Stamatoyannopoulos lab), Science 337, 1190 (2012)

  21. Take-home lessons from Maurano: (1) More than 75% of genetic variants associated with disease are concentrated in regulatory DNA marked by DNase I hypersensitive sites (DHSs), either within the site or within a region in complete linkage disequilibrium (LD). (2) There is tissue-selective enrichment of more weakly disease-associated variants within DHSs. (3) The authors suggest that there is “pervasive involvement of regulatory DNA variation in common human disease.”

  22. General points from the paper. (1) A large number of SNPs and genetic associations were assessed: 5,134 unique SNPs were examined in 5,654 significant genetic associations, covering 207 diseases and 447 quantitative traits. (2) The higher the quality of a SNP’s genetic association, the more likely the SNP was to be located in a DHS. Low: 2,436 unreplicated SNPs; medium: 2,374 “internally replicated” SNPs (confirmed in a second population in the initial publication); high: 324 “externally replicated” SNPs (confirmed in an independent study). (3) GWAS variants localize to cell- and developmental stage-selective regulatory DNA; tissue-specific regulation might be important in many genetic associations.

  23. SNPs that are associated with a trait are found in DHSs in cells related to that trait

  24. More background conclusions. Early developmental stages were over-represented in the tissue specificity: of the 2,900 SNPs in DHSs, 88% were in sites present in fetal tissue. Looked at from the other direction, 58% of DHSs that have a genetic association are present in fetal tissue and persist into adult stages, and 30% are present only in fetal stages. The adult-specific DHSs were in a limited set of tissues, but probably only because of the limited number that can be examined easily. Many common disorders have been linked with early gestational exposures or environmental insults. Disorders/traits enriched in the fetal DHSs: menarche, cardiovascular disease, and body mass index. Disorders/traits depleted in the fetal DHSs: aging-related diseases, cancer, and inflammatory disorders with presumed (postnatal) environmental triggers.

  25. Traits showing differential representation of SNPs in fetal DHSs

  26. Long-distance relationships can work in genomics. In the figure, an asterisk (*) indicates that the highest-correlated gene is not the nearest gene. 40% of correlated DHS-gene pairs span >250 kb (Fig. 2B), and 80% represent pairings with distant promoters versus those of the nearest gene. These interactions typically extend beyond the range of LD (mean r² = 0.06).
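For readers less familiar with the LD statistic quoted here, r² between two biallelic SNPs can be computed from phased haplotypes as D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA·pB. The following minimal sketch uses hypothetical toy haplotypes and a hypothetical function name; it is not taken from the paper.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical toy data.
complete_ld = ld_r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])
no_ld = ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

An r² of 1 corresponds to complete LD, where one SNP perfectly predicts the other, while a mean r² of 0.06 indicates that the DHS and the distant promoter are, on average, nearly independent in terms of LD.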

  27. A GWAS variant associated with platelet count is connected with the JAK2 gene (myeloproliferative disorders) 222 kb away. Fig 2A

  28. 93.2% (2874) of GWAS SNPs in DHSs overlap a transcription factor recognition sequence. Example: allele-specific DNase hypersensitivity in a CEBP A binding site. Fig 2C

  29. Common variants associated with specific diseases or trait classes were systematically enriched in the recognition sequences of transcription factors governing physiological processes

  30. Disease-associated variants cluster within transcriptional regulatory pathways. Fig 3A

  31. IRF9, associated with type I interferon induction and JAK/STAT signaling, is associated with TFs related to many autoimmune disorders. Fig 3B

  32. The TF network associated with autoimmune diseases is significantly represented by GWAS variants in their binding sites

  33. De novo Identification of pathogenic cell types

  34. Conclusions. When variants associated with disease are localized to a regulatory region, specific hypotheses can be generated as to the cause of the association. Modulation of transcription factor networks provides a model in which many small genetic effects accumulate on a pathway and together are of consequence to the disease or trait. The authors suggest that focusing on TFBSs identified in cell types related to the trait (e.g. CD3+ cells and MS) might be a way to target searches for causal variants.
