
ChIP-seq and its applications in GRN construction

Jin Chen, 2012 Fall, CSE891-001.



  1. ChIP-seq and its applications in GRN construction Jin Chen 2012 Fall CSE891-001

  2. Layout • Genome-scale evidence from microarray measurements may be used to identify regulatory interactions between TFs and their targets • Hu et al. used a genetic approach to identify targets of transcription factors in yeast and reconstruct a functional regulatory network • Reimand et al. re-analyzed Hu et al.’s data using improved statistical techniques

  3. Hu et al.’s work • Grew each of 263 transcription factor knockout strains and compared the mRNA expression of each strain with that of a wild-type strain using microarrays • Defined the unrefined transcription factor target network as the cumulative set of significantly differentially expressed genes in each deletion strain • There was overlap between transcription factor targets identified in the unrefined network and targets identified by ChIP-chip

  4. 2-level Refinement • First level of network refinement • If TF A activated both TF B and gene M, B also activated gene M, and the confidence of A regulating M was lower than that of B regulating M, then the regulation of M by A was presumed to be indirect and was removed • Additional refinement step • Similar to the previous step, except that the removed indirect edge bridged a three-step series of direct interactions at the preceding level, resulting in a level 3 refined network • Note that the logical consistency of regulatory edges was maintained at all times
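
The first-level rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `edges` dictionary, the `refine_once` name, and the toy confidence scores (higher means more confident) are hypothetical, and the sketch ignores the activation/repression sign that the logical-consistency check would need.

```python
def refine_once(edges):
    """Drop A->M when A->B and B->M exist and conf(A->M) < conf(B->M).

    `edges` maps (regulator, target) -> confidence; higher = more confident
    (an assumption made for this sketch).
    """
    to_remove = set()
    for (a, m), conf_am in edges.items():
        for (a2, b) in edges:
            # Look for an intermediate TF b that is regulated by the same a.
            if a2 != a or b == m or b == a:
                continue
            conf_bm = edges.get((b, m))
            if conf_bm is not None and conf_am < conf_bm:
                to_remove.add((a, m))  # A->M is presumed indirect
    return {edge: conf for edge, conf in edges.items() if edge not in to_remove}


# Hypothetical toy network.
edges = {
    ("A", "B"): 0.9,
    ("A", "M"): 0.4,
    ("B", "M"): 0.8,
    ("A", "N"): 0.7,
}
refined = refine_once(edges)
print(sorted(refined))
```

Here the edge A->M is removed because the path A->B->M explains it with higher confidence, while A->N has no such bridge and is kept.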

  5. Hu et al.’s work • When the transcription factor bound to a promoter was deleted, the expression of the downstream gene was much more likely to be affected than the background • Expression from promoters that were detectably occupied by a single TF was even more likely to be affected by deletion of that potentially major or sole TF • Thus, there was significant overlap between binding targets defined by ChIP-chip and functional targets defined by TF deletion

  6. Hu et al.’s work – problems • However, Hu et al.’s study used relatively dated and insensitive approaches for microarray data processing • As a result, the published P-values and target-gene rankings are likely to be unreliable • P-values were not corrected for multiple testing • Background and print-tip correction were missing from normalization • Reimand et al. re-analyzed the same dataset with state-of-the-art software and obtained a much larger network Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

  7. False Discovery Rate • False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. The q-value is defined as the FDR analogue of the p-value • FDR is the expected proportion of false positives among all hypotheses called significant • For example, if 1000 observations were called significantly different at an FDR of 0.1, then about 100 of them would be expected to be false positives • FDR is determined from the observed p-value distribution, and hence adapts to the number of tests
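
As a concrete illustration, the sketch below computes adjusted p-values with the Benjamini-Hochberg step-up procedure. The slide does not name a specific FDR procedure, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the toy p-values are all assumptions of this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from eight tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
qvals = benjamini_hochberg(pvals)
significant = [p for p, q in zip(pvals, qvals) if q <= 0.05]
print(significant)
```

Note that five of the raw p-values are below 0.05, but only two tests survive the correction at an FDR cut-off of 0.05.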

  8. Redo the Preprocessing • Microarrays were normalized using the VSN package, including print-tip and background correction • Differential expression was calculated using a moderated t-test (eBayes) as implemented in the limma Bioconductor package • An FDR cut-off of 0.05 was used to detect significant differential gene expression Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

  9. Re-analyze TF binding data • DNA–protein interactions derived from ChIP-chip experiments were obtained, and those with a P-value < 0.001 were considered • A set of ‘trusted’ position weight matrices (PWMs) for 72 regulatory factors was derived by running the PROCSE and PhyloGibbs algorithms on a set of experimentally derived TF binding sites from SCPD • These PWMs were then used to scan multiple alignments of each intergenic region in yeast with the orthologous regions of four other yeast species Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777
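
To make the PWM scanning step more concrete, here is a minimal sketch that scores every window of a single DNA sequence against a log-odds PWM. It is a simplification: the slide describes scanning multiple alignments across species, which is not attempted here. The count matrix, uniform background, pseudocount, and function names are all hypothetical.

```python
import math

# Hypothetical 4-position count matrix (one dict per position), not a real motif.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform
PSEUDOCOUNT = 0.5  # avoids log(0) for unseen bases


def log_odds_pwm(counts, background, pseudocount):
    """Convert per-position base counts into log2 odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of len(pwm) bases; returns (start, window, score)."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[pos][base] for pos, base in enumerate(window))
        hits.append((start, window, score))
    return hits


pwm = log_odds_pwm(COUNTS, BACKGROUND, PSEUDOCOUNT)
hits = scan("TTACGTAA", pwm)
best = max(hits, key=lambda hit: hit[2])
print(best[0], best[1])
```

In practice a score threshold decides which windows count as predicted binding sites; the threshold is left out here because the slide does not state one.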

  10. Re-analyze knockout expression and ChIP binding data • Overlap between TF-binding and TF knockout data • Collected binding sites for 142 TFs, comprising 5,188 ChIP-chip interactions and 17,091 motif predictions • Calculated the intersection between the list of differentially expressed genes from each TF knockout and the targets identified by ChIP-chip or binding-site predictions • 2,230 regulatory relations Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777
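
The intersection step can be sketched with plain set operations. The TF and gene names below are placeholders and the function name `supported_regulations` is hypothetical; the sketch only shows the logic of keeping a TF-gene pair when the gene is both differentially expressed in the knockout and bound (by ChIP-chip or motif prediction).

```python
# Hypothetical toy data; TF and gene names are placeholders.
knockout_de = {
    "TF1": {"g1", "g2", "g3", "g4"},
    "TF2": {"g2", "g5"},
}
chip_targets = {
    "TF1": {"g1", "g7"},
    "TF2": {"g5"},
}
motif_targets = {
    "TF1": {"g3"},
    "TF2": {"g8"},
}


def supported_regulations(knockout_de, chip_targets, motif_targets):
    """Return (TF, gene) pairs with both expression and binding evidence."""
    relations = set()
    for tf, de_genes in knockout_de.items():
        # Binding evidence from either ChIP-chip or motif prediction.
        bound = chip_targets.get(tf, set()) | motif_targets.get(tf, set())
        for gene in de_genes & bound:
            relations.add((tf, gene))
    return relations


relations = supported_regulations(knockout_de, chip_targets, motif_targets)
print(sorted(relations))
```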

  11. Re-analyze knockout expression and ChIP binding data • Checked the expression levels of the TFs • Intuitively, one expects the TF under consideration to have lower expression in the mutant strain than in the wild-type strain • The re-analysis confirms this for 155 TFs • 78 TFs display a negative fold change at statistically non-significant levels • 36 TFs are lethal Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

  12. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

  13. Re-analyze knockout expression and ChIP binding data • Examine functional annotations of differentially expressed genes • As most TFs are considered to regulate distinct cellular processes, their target genes should be associated with a coherent set of molecular and biological functions • Used g:Profiler to identify GO, KEGG and Reactome pathway annotations • Across all TF knockouts, this analysis has a higher score than the original analysis Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

  14. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

  15. SUMMARY - exploring biological networks

  16. Topology Approaches • What comes next after constructing biological networks? • First of all, simple approaches • Degree, betweenness, clustering coefficient, topological coefficient, shortest path • Shared neighbors, neighborhood connectivity, closeness centrality

  17. Clustering Coefficient • The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together • Clustering coefficient (local version): do my neighbors connect with each other? • Evidence suggests that in most real-world networks, nodes tend to create tightly knit groups characterized by a relatively high density of ties • $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $k_i$ is the number of neighbors of node $i$ and $e_i$ is the number of connected pairs between all neighbors of node $i$ Luciano da F. Costa, Francisco A. Rodrigues, Alexandre S. Cristino. Complex networks: the key to systems biology. Genet. Mol. Biol. vol.31 no.3. 2008; http://med.bioinf.mpi-inf.mpg.de/netanalyzer
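
A minimal sketch of the local clustering coefficient for an undirected graph, using k_i and e_i as defined on the slide and assuming the standard form C_i = 2 e_i / (k_i (k_i - 1)). The adjacency dictionary and the convention of returning 0 for nodes with fewer than two neighbors are assumptions of this example.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: undefined cases count as 0
    # e = number of connected pairs among the neighbors
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2.0 * e / (k * (k - 1))


# Hypothetical undirected toy graph: triangle a-b-c plus a pendant node d.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficients = {node: local_clustering(adjacency, node) for node in adjacency}
print(coefficients)
```

Node a has three neighbors of which only one pair (b, c) is connected, so its coefficient is 1/3, while b and c sit in a closed triangle and score 1.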

  18. Average Clustering Coefficient Distribution • Define the function C(k) as the average clustering coefficient of all nodes with k links • For many real networks, $C(k) \sim k^{-1}$ • Nodes with only a few links have a high C(k) and belong to highly interconnected small modules • By contrast, the highly connected hubs have a low C(k), with their role being to link different, and otherwise not communicating, modules

  19. Closeness centrality • Closeness centrality is a measure of how many steps are required to reach every other node from a given node • Closeness centrality: how long will it take information to spread from a given node to the other reachable nodes in the network? where d_G(i, t) is the length of the shortest path from i to t, and V is the set of nodes in G Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994
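
The sketch below computes closeness centrality with breadth-first search on an unweighted graph. Several definitions of closeness exist; this example assumes the reciprocal of the total shortest-path distance to all reachable nodes (the form usually attributed to Freeman, 1978), which may differ from the variant intended on the slide. The graph and function names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adjacency, node):
    """Reciprocal of the summed distances to all reachable nodes (assumed form)."""
    dist = shortest_path_lengths(adjacency, node)
    total = sum(d for t, d in dist.items() if t != node)
    return 1.0 / total if total > 0 else 0.0


# Hypothetical path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = {node: closeness(path, node) for node in path}
print(scores)
```

The middle node b reaches both others in one step and therefore gets the highest score.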

  20. Distribution of closeness centrality • Closeness centrality is successful in distinguishing the important members of the community • Its distribution resembles a normal curve, while the other centrality measures have a long-tailed distribution similar to a power law

  21. Limitations of simple approaches • Each node/edge is studied individually; enrichment analysis cannot be applied • Topology study only; difficult to integrate other knowledge • Nodes with high scores ≠ key genes/proteins • Alternative: study a group of genes simultaneously

  22. Advanced approaches • Dense subgraph detection • Network motif detection • Graph clustering • Graph classification • etc.

  23. Dense subgraph detection Software available at http://zhoulab.usc.edu/CODENSE/

  24. Dense subgraph detection • A subgraph is considered coherent and dense if and only if every edge is well supported and its corresponding second-order graph is dense (CODENSE)

  25. Network Motif Detection

  26. Network Motif Detection

  27. Perform graph join operation to find repeated size-k graphs • Join each tree with its cousins to produce frequent motif candidates C_k [Figure: example join of trees t4_1 and t4_2]

  28. Graph Clustering • Graph clustering is an organization process whose goal is to put similar nodes together; the result is a partition of the network into a set of communities • The MCL algorithm is a fast and scalable unsupervised clustering algorithm for graphs, based on simulation of stochastic flow in graphs, available at http://www.micans.org/mcl Van Dongen, S. (2000) Graph Clustering by Flow Simulation. PhD Thesis, University of Utrecht, The Netherlands
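
To give a feel for how flow simulation produces clusters, here is a small pure-Python sketch of the two alternating MCL steps: expansion (squaring the column-stochastic matrix) and inflation (element-wise powering followed by column re-normalization). It is a teaching sketch, not the MCL software linked above; the inflation value, iteration count, self-loop weight, and the 1e-6 threshold used to read out clusters are assumptions.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = matrix[i][j] / col_sum if col_sum else 0.0
    return out


def expand(matrix):
    """Expansion step: matrix squared (flow spreads along two-step walks)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation step: element-wise power, then re-normalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(adjacency, inflation=2.0, iterations=100):
    n = len(adjacency)
    # Add self-loops (weight 1, an assumed choice) before normalizing.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each non-empty row is an attractor; its non-zero columns form a cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1.0

clusters = mcl(adjacency)
print(sorted(sorted(c) for c in clusters))
```

Higher inflation values tend to give finer clusters and lower values coarser ones, which is the main parameter one would tune in practice.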

  29. Graph Clustering [Figure: a graph and its graph clusters]
