Report due: March 30 , electronically submit, pdf format. Requirements:


Presentation Transcript


  1. Report due: March 30; submit electronically in PDF format. Requirements: 8 pages, 1″ margins, 1.5 line spacing, not including figures/tables. Figures/tables must be attached at the end of the document. Include (but do not limit yourself to) the following components. For a research paper: background and significance of the work; what is the technical improvement of the work over previous works? What could have been done better? If you were the authors, what would be your next step to extend this work? For a review: summarize the main points; give some details on preferred methods.

  2. Networks in Bioinformatics. General characteristics; Directed acyclic graph and Gene Ontology; Defining distances on DAGs; Network and expression data; Testing on an existing network; Reverse engineering of networks.

  3. Network / Graph. A network is a set of vertices connected by edges. Undirected edges → “undirected network”; directed edges → “directed network”. Vertex-level characteristic: the number of connections to a vertex is its “degree” $k$. Incoming edges → “in-degree” $k_i$; outgoing edges → “out-degree” $k_o$; $k = k_i + k_o$. Evolution of networks. S.N. Dorogovtsev, J.F.F. Mendes.
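
The degree definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the edge list and the helper name `degrees` are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical directed edge list (source, target); not from the slides.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]

def degrees(edge_list):
    """Return (in_degree, out_degree, total_degree) counters for a directed edge list."""
    k_in, k_out = Counter(), Counter()
    for source, target in edge_list:
        k_out[source] += 1  # one more outgoing edge for the source
        k_in[target] += 1   # one more incoming edge for the target
    total = k_in + k_out    # k = k_i + k_o for every vertex
    return k_in, k_out, total

k_in, k_out, k = degrees(edges)
for vertex in sorted(k):
    print(vertex, "in:", k_in[vertex], "out:", k_out[vertex], "total:", k[vertex])
```

Every edge adds one to an out-degree and one to an in-degree, so the total degrees always sum to twice the number of edges.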

  4. Network. Network-level characteristics: number of vertices, $N$; number of edges, $L$; number of loops, $I$. For a connected undirected network, $I = L - N + 1$. Degree distribution: the distribution of vertex degrees.
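
As a rough check of the loop count, the sketch below computes the number of independent loops of an undirected graph. It assumes the standard graph-theory generalization $L - N + C$, where $C$ is the number of connected components (this reduces to the slide's $I = L - N + 1$ when the network is connected). The function name `count_independent_loops` and the example graph are hypothetical.

```python
def count_independent_loops(vertices, edges):
    """Number of independent loops, L - N + C, of an undirected graph."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge two components
            components -= 1
    return len(edges) - len(parent) + components

# Hypothetical example: a square 1-2-3-4 plus one diagonal (N = 4, L = 5).
square_with_diagonal = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
loops = count_independent_loops([1, 2, 3, 4], square_with_diagonal)
print(loops)
```

A tree has no loops, so the same function returns 0 for any tree-shaped edge list.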

  5. Network. Distribution of shortest paths: $\ell_{\mu\nu}$ is the length of the shortest path between nodes $\mu$ and $\nu$; its mean value is called the “diameter” of the network. Clustering coefficient: for each vertex, the fraction of existing connections between nearest neighbors of the vertex, $C(\mu) \equiv y(\mu) / [z(\mu)(z(\mu) - 1)/2]$, where $z(\mu)$ is the number of neighboring vertices and $y(\mu)$ is the number of edges between the neighboring vertices. The clustering coefficient $C$ of the network is the mean of $C(\mu)$.
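
A small Python sketch of these two quantities, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary. The example graph, the function names, and the convention that a vertex with fewer than two neighbors gets $C(\mu) = 0$ are assumptions made here for illustration.

```python
from collections import deque

# Hypothetical undirected graph: a triangle a-b-c with a pendant vertex d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def shortest_path_lengths(adj, start):
    """Breadth-first search: shortest path length from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def mean_shortest_path(adj):
    """Mean shortest path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for u in adj:
        for v, d in shortest_path_lengths(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs

def local_clustering(adj, node):
    """C(mu) = y / [z (z - 1) / 2]; returns 0.0 when z < 2 (assumed convention)."""
    neighbors = adj[node]
    z = len(neighbors)
    if z < 2:
        return 0.0
    y = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return y / (z * (z - 1) / 2)

def clustering_coefficient(adj):
    """Network-level C: the mean of C(mu) over all vertices."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

print(mean_shortest_path(graph), clustering_coefficient(graph))
```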

  6. Scale-free Network. Scale-free network: the degree distribution follows a power law, $P(k) \sim k^{-\gamma}$. Few nodes are of high degree, while most nodes are of low degree. Contrast: random edge generation yields a Poisson degree distribution.

  7. Scale-free Network Quote from the figure legend: Both networks contain 130 nodes and 215 links. Red, the five nodes with the highest number of links; green, their first neighbours. Nature 406(6794):378.

  8. Scale-free Network. Why does a power-law degree distribution make intuitive sense? Some nodes serve as “hubs”. This makes sense for the WWW and for biological networks, where controllers such as transcription factors are well known. One way to generate a network with such a distribution is the “rich get richer” model of Barabási and Albert (1999): (1) initiate a network with degree ≥ 1 for each node; (2) add a new node to the network, linking it to existing node $i$ with probability $\Pi(k_i) = k_i / \sum_j k_j$, where $k_i$ is the degree of node $i$.
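
The “rich get richer” growth rule can be sketched as follows. This is an illustrative simplification, not the authors' code: it assumes each new node attaches to `m` distinct existing nodes, starts from a small complete graph, and draws targets with probability proportional to degree (redrawing duplicates, which slightly perturbs the exact probabilities). The function name and parameters are hypothetical.

```python
import random

def barabasi_albert(n_nodes, m=2, seed=0):
    """Grow a network by preferential attachment; returns a list of edges."""
    rng = random.Random(seed)
    # Seed network: complete graph on m + 1 nodes, so every node has degree >= 1.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects node i with probability k_i / sum_j k_j.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # duplicates are simply redrawn
        for target in sorted(targets):
            edges.append((new, target))
            stubs.extend((new, target))
    return edges

ba_edges = barabasi_albert(200, m=2, seed=1)
ba_degree = {}
for u, v in ba_edges:
    ba_degree[u] = ba_degree.get(u, 0) + 1
    ba_degree[v] = ba_degree.get(v, 0) + 1
print("highest degrees:", sorted(ba_degree.values(), reverse=True)[:5])
```

The stub list avoids recomputing the normalizing sum at every step; early nodes accumulate stubs and therefore tend to become hubs.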

  9. Scale-free Network. These networks exhibit “high tolerance to random perturbations but are sensitive to targeted attack on the highly connected nodes”. Why are they called “scale-free”? The property of the network is independent of the number of nodes. This line of work largely started from the WWW network. A large number of real-world networks, including biological networks, have been found to have power-law degree distributions. However, questions have arisen: sharing a power law does not mean sharing the same architecture.

  10. Scale-free Network. The protein-protein interaction network is a scale-free network. S. Wuchty, E. Ravasz and A.-L. Barabási: The Architecture of Biological Networks.

  11. Directed Acyclic Graph (DAG). A directed graph with no directed loops, i.e. starting from any node, there is no route back to the same node. The structure leads to a partial ordering of the nodes: if an edge $i \to j$ exists, node $i$ is at a higher level than node $j$.
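
One way to make the partial ordering concrete is a topological sort. The sketch below uses Kahn's algorithm (a standard technique, named here as an illustration rather than something the slides prescribe); the function name and the toy graph are hypothetical.

```python
from collections import deque

def topological_order(nodes, edges):
    """Order nodes so that every edge i -> j has i before j.

    Raises ValueError if the graph contains a directed loop (i.e. is not a DAG).
    """
    children = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1
    # Start from the top-level nodes, which have no incoming edges.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    if len(order) != len(in_degree):
        raise ValueError("graph contains a directed loop")
    return order

# Hypothetical DAG: "c" has two parents, which a tree would not allow.
dag_nodes = ["root", "a", "b", "c"]
dag_edges = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]
dag_order = topological_order(dag_nodes, dag_edges)
print(dag_order)
```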

  12. The Gene-Ontology knowledge-base. Organizes knowledge about genes in a directed acyclic graph. The lower the level, the more detailed the knowledge. Each gene is annotated to terms, reflecting current knowledge about it.

  13. The Gene-Ontology knowledge-base. Similar thinking has been used on the tree of life and in other areas. Mol. BioSyst., 2014, 10, 86-92.

  14. The Gene-Ontology knowledge-base Here’s how people’s knowledge about the gene ACE2 is summarized using the database. Based on these papers:

  15. Gene ontology and high-throughput data. Gene ontology was necessitated by high-throughput data: when thousands of genes are measured simultaneously, the results must be combined with existing knowledge in a computationally efficient way.

  16. Gene ontology and high-throughput data. Two general types of considerations: • Does a GO term have a first-order association with the clinical outcome? • Does the GO term change its interactions with other functional units in response to the clinical factor?

  17. Gene ontology and high-throughput data. How to deal with dependency between (neighboring) GO terms? General strategies: (1) treat all GO terms as independent units, test for significant changes one by one, and let biologists remove the redundant information; (2) use the GO structure to remove redundant terms, and test only a small informative subset of all GO terms; (3) test for independence conditioned on the results of descendant nodes.

  18. Gene ontology and high-throughput data. Given a GO term, how to determine whether it is up- or down-regulated in association with disease is an active research area. We list a few examples here. Difficulty: each GO term contains a number of genes, and these genes operate in a network fashion in the cell. Competition and feedback loops are common. The genes in one GO term do not all change in one direction: in association with a disease, some are up-regulated, some are suppressed, and some do not change.

  19. Gene ontology and high-throughput data GO term: positive regulation of I-kappaB kinase/NF-kappaB cascade Disease: Oral cancer metastasis

  20. Gene ontology and high-throughput data. Cutoff-based methods. General idea: test significance gene by gene; select a threshold level and divide all genes into two groups, differentially expressed and non-differentially expressed. For each GO term, test the hypothesis that the differentially expressed genes are drawn from the pool of all genes independently of the GO term. Tests used: hypergeometric, binomial, chi-square, … The arbitrary threshold has a substantial impact on the results.
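
A minimal sketch of the hypergeometric version of this test, assuming the usual one-sided over-representation setup: among `n_total` genes, `n_de` pass the cutoff, the GO term contains `n_term` genes, and `n_overlap` of those are differentially expressed. The function name, argument names, and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(n_total, n_de, n_term, n_overlap):
    """P(X >= n_overlap) when the n_term genes of a GO term are drawn without
    replacement from n_total genes, n_de of which are differentially expressed."""
    upper = min(n_de, n_term)
    favorable = sum(
        comb(n_de, x) * comb(n_total - n_de, n_term - x)
        for x in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_total, n_term)

# Hypothetical numbers: 10000 genes, 500 pass the cutoff,
# the GO term holds 40 genes and 8 of them pass the cutoff.
p_value = hypergeom_upper_tail(10000, 500, 40, 8)
print(p_value)
```

Moving the cutoff changes `n_de` and `n_overlap` together, which is one way to see why the threshold choice affects the result.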

  21. Gene ontology and high-throughput data. Cutoff-free methods try to avoid the use of an arbitrary threshold. They usually use permutation tests to assess significance, which ensures that the correlation structure between the genes is preserved. With a group of genes to analyze, the hypothesis becomes complicated: different methods may use different assumptions and test different hypotheses.
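
The reason permutation preserves the correlation structure is that sample labels are shuffled while each sample's full expression profile stays intact. The sketch below illustrates this under stated assumptions: a two-group design, a made-up set-level statistic (mean absolute difference in group means), and toy data. None of these choices come from the slides.

```python
import random

def set_statistic(expr, labels):
    """Mean absolute difference in group means across the genes of one set.

    expr: list of genes, each a list of expression values (one per sample).
    labels: list of 0/1 group labels, one per sample.
    """
    total = 0.0
    for gene_values in expr:
        group1 = [v for v, lab in zip(gene_values, labels) if lab == 1]
        group0 = [v for v, lab in zip(gene_values, labels) if lab == 0]
        total += abs(sum(group1) / len(group1) - sum(group0) / len(group0))
    return total / len(expr)

def permutation_p_value(expr, labels, n_perm=2000, seed=0):
    """Permutation p-value for the set statistic, shuffling sample labels only."""
    rng = random.Random(seed)
    observed = set_statistic(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # samples move as whole columns; genes stay linked
        if set_statistic(expr, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Hypothetical gene set: 2 genes measured in 4 cases (1) and 4 controls (0).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_expr = [
    [5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1],
    [3.9, 4.2, 4.0, 4.1, 2.0, 1.9, 2.1, 2.2],
]
toy_p = permutation_p_value(toy_expr, toy_labels)
print(toy_p)
```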

  22. Gene ontology and high-throughput data. Comparing the distribution of p-values (or correlations, or other statistics) from one GO term to the overall distribution: • Kolmogorov–Smirnov goodness-of-fit test statistic for comparing two distributions • Anderson–Darling test statistic for testing for a uniform distribution • Wilcoxon rank-sum test statistic. Journal of Computational Biology. 13:798.
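
As an illustration of the first option, here is a plain-Python sketch of the two-sample Kolmogorov–Smirnov statistic (the largest gap between two empirical distribution functions). It computes only the statistic, not its significance, and the function name and example values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical p-values: genes in one GO term versus all remaining genes.
term_p = [0.001, 0.004, 0.02, 0.03, 0.2]
other_p = [0.05, 0.1, 0.3, 0.45, 0.5, 0.62, 0.7, 0.81, 0.9, 0.97]
print(ks_statistic(term_p, other_p))
```

In a cutoff-free analysis, the significance of this statistic would typically be assessed by permutation rather than read from a table.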

  23. GSEA. PNAS vol. 102 no. 43 15545-15550

  24. GSEA. PNAS vol. 102 no. 43 15545-15550

  25. GSDCA. Single gene sets; gene set pairs.

  26. GSDCA.

  27. Testing on the network. Goal: utilize an existing network to aid biomarker selection (“network marker”), disease mechanism finding, and predictive model building. Data: (1) a network between biological units (signal transduction network, genetic interaction network, protein-protein interaction network, TF regulatory network, …); (2) expression data.

  28. Testing on the network. • Example: local over-representation • Pre-select significant genes • Search all ego-networks of a predefined radius for those in which significant genes are over-represented • Equivalent to the over-representation analysis in gene set analysis. Ann. Appl. Stat. (Epub ahead of print)
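
A rough sketch of the ego-network idea, not the cited method itself: collect all nodes within a given radius of a center by breadth-first search, then score the ego-network with a one-sided hypergeometric test, as in gene set over-representation analysis. The function names, the toy path graph, and the choice of the hypergeometric test are assumptions for illustration.

```python
from collections import deque
from math import comb

def ego_network(adj, center, radius):
    """All nodes within `radius` steps of `center` (the center included)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the predefined radius
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)

def ego_enrichment_p(adj, center, radius, significant):
    """Hypergeometric upper-tail p-value for significant genes in an ego-network.

    `significant` must be a set of nodes that all appear in `adj`.
    """
    ego = ego_network(adj, center, radius)
    n_total, n_sig = len(adj), len(significant)
    n_ego, overlap = len(ego), len(ego & significant)
    tail = sum(
        comb(n_sig, x) * comb(n_total - n_sig, n_ego - x)
        for x in range(overlap, min(n_sig, n_ego) + 1)
    )
    return ego, tail / comb(n_total, n_ego)

# Hypothetical network: a path 1-2-3-4-5-6 with genes 2, 3, 4 pre-selected.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
ego, ego_p = ego_enrichment_p(path, center=3, radius=1, significant={2, 3, 4})
print(sorted(ego), ego_p)
```

Scanning every node as a center yields one p-value per ego-network; these overlap heavily, so some multiple-testing adjustment would be needed in practice.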

  29. Testing on the network. An example of a machine-learning approach. Mol Syst Biol. 2007; 3: 140.

  30. Testing on the network. Network markers: Diamond – univariate significant. Mol Syst Biol. 2007; 3: 140.

  31. Testing on the network • Example: A Bayesian framework • Univariate test of all genes • Transform p-values to normal quantiles • Assume a gene is either “1” (disease related) or “0” (unrelated) • Use a network-based mixture model – neighboring genes are more likely to share status Ann. Appl. Stat. (Epub ahead of print)
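
The transformation step can be sketched with the standard library. The direction of the mapping (small p-values map to large positive z-scores) and the clipping constant `eps` are assumptions made here; the slide only says that p-values are transformed to normal quantiles. The mixture model itself is not shown.

```python
from statistics import NormalDist

def p_to_z(p_values, eps=1e-12):
    """Map p-values to standard normal quantiles, z = Phi^{-1}(1 - p)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = []
    for p in p_values:
        p = min(max(p, eps), 1 - eps)  # inv_cdf is undefined at exactly 0 or 1
        z_scores.append(std_normal.inv_cdf(1 - p))
    return z_scores

# Hypothetical univariate p-values for four genes.
z = p_to_z([0.5, 0.025, 0.001, 0.9])
print([round(value, 3) for value in z])
```

Under the null hypothesis the z-scores are standard normal, which is what makes a two-component (“0” versus “1”) mixture model convenient.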

  32. Reverse engineering of networks from microarray data. Goal: infer the structure of the genetic regulation network from microarray data. Key assumption: the mRNA level measured by microarray truly reflects the activity of the regulator. Unfortunately, this is true for only ~20% of the regulators, so methods incorporating more data/knowledge have been developed.

  33. Reverse engineering of networks from microarray data Margolin & Califano, Ann N Y Acad Sci. 2007,1115:51. Hesselberth et al. Genome Biology. 2006,7:R30.

  34. Reverse engineering of networks from microarray data. Expression data alone: correlation; partial correlation (Gaussian graphical models); mutual information; Bayesian network. Expression data + other information: known transcription factor targets; ChIP-chip and ChIP-seq; known interactions/pathways; …
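
To illustrate the difference between the first two approaches, the sketch below compares the Pearson correlation of two genes with their partial correlation given a third. It uses the standard first-order partial correlation formula and constructed toy data in which two targets `x` and `y` are correlated only through a shared regulator `z`; all names and numbers are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: regulator z drives both targets; the added noise terms are
# constructed to be uncorrelated with z and with each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
noise_y = [0.5, 0.5, -0.5, -0.5, -0.5, -0.5, 0.5, 0.5]
x = [a + e for a, e in zip(z, noise_x)]
y = [a + e for a, e in zip(z, noise_y)]

print("correlation:", pearson(x, y))
print("partial correlation given z:", partial_correlation(x, y, z))
```

The plain correlation would suggest a direct x–y edge, while the partial correlation is essentially zero, which is the motivation for Gaussian graphical models.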

  35. Reverse engineering of networks from microarray data Differentiating mechanisms of co-regulation based on expression data alone is a daunting task. Margolin & Califano, Ann N Y Acad Sci. 2007,1115:51.

  36. Reverse engineering of networks from microarray data
