
CS 6293 Advanced Topics: Translational Bioinformatics


Presentation Transcript


  1. CS 6293 Advanced Topics: Translational Bioinformatics • Biological networks: Theory and applications

  2. Lecture outline • Basic terminology and concepts in networks • Some interesting results between network properties and biological functions • Network clustering / community discovery • Applications of network clustering methods

  3. Network • A network refers to a graph • A useful concept for analyzing the interactions among the components of a system

  4. Biological networks • An abstraction of the complex relationships among molecules in the cell • Many types: • Protein-protein interaction networks • Protein-DNA (RNA) interaction networks • Genetic interaction networks • Metabolic networks • Signal transduction networks • (Real) neural networks • Many others • In some networks, edges have a precise meaning; in others, the meaning of an edge is obscure

  5. Protein-protein interaction networks • Yeast PPI network • Nodes – proteins • Edges – interactions • The color of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).

  6. Obtaining biological networks • Direct experimental methods • Protein-protein interactions • Yeast two-hybrid • Tandem affinity purification • Co-immunoprecipitation • Protein-DNA interactions • Chromatin immunoprecipitation followed by microarray or sequencing (ChIP-chip, ChIP-seq) • High levels of noise (false positives and false negatives) • Computational prediction methods • Often cannot differentiate direct from indirect interactions

  7. Why networks? • Studying genes/proteins at the network level allows us to: • Assess the role of individual genes/proteins in the overall pathway • Evaluate redundancy of network components • Identify candidate genes involved in genetic diseases • Set up a framework for mathematical models • For complex systems, the actual output may not be predictable from individual components alone: the whole is greater than the sum of its parts

  8. Graphs • A graph G = (V, E) • V = set of vertices • E = set of edges ⊆ V × V • Thus |E| = O(|V|^2) • Example (figure: four-vertex graph): Vertices: {1, 2, 3, 4}; Edges: {(1, 2), (2, 3), (1, 3), (4, 3)}

  9. Graph Variations (1) • Directed / undirected: • In an undirected graph: • Edge (u, v) ∈ E implies edge (v, u) ∈ E • E.g., road networks between cities • In a directed graph: • Edge (u, v) means u → v, which does not imply v → u • E.g., street networks downtown (one-way streets) • Degree of vertex v: • The number of edges incident to v • For a directed graph, there are in-degree and out-degree

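  10. Figure: the example graph drawn as directed and as undirected. In the directed version, vertex 3 has in-degree = 3 and out-degree = 0; in the undirected version, vertex 3 has degree = 3.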
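A minimal Python sketch of the degree definitions. It assumes the example edge list {(1, 2), (2, 3), (1, 3), (4, 3)} is read once as directed (u → v) and once as undirected. The variable names are illustrative only.

```python
from collections import Counter

# Edge list of the four-vertex example graph (illustrative names).
edges = [(1, 2), (2, 3), (1, 3), (4, 3)]
vertices = [1, 2, 3, 4]

# Directed reading: each pair (u, v) is an edge u -> v.
out_degree = Counter(u for u, _ in edges)
in_degree = Counter(v for _, v in edges)

# Undirected reading: each edge adds 1 to the degree of both endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

for x in vertices:
    # Counter returns 0 for vertices that never appear on that side.
    print(f"vertex {x}: in={in_degree[x]} out={out_degree[x]} degree={degree[x]}")
```

For vertex 3 this reports in-degree 3, out-degree 0 and undirected degree 3, which matches the figure.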

  11. Graph Variations (2) • Weighted / unweighted: • In a weighted graph, each edge or vertex has an associated weight (numerical value) • E.g., a road map: edges might be weighted with distance • Figure: a weighted version of the example graph (edge weights 0.3, 1.2, 0.4, 1.9) beside its unweighted version

  12. Graph Variations (3) • Connected / disconnected: • A connected graph has a path from every vertex to every other (for a directed graph, ignoring edge directions) • A directed graph is strongly connected if there is a directed path between any two vertices • Figure: the directed example graph, which is connected but not strongly connected
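Strong connectivity can be checked by running a breadth-first search from every vertex and confirming that each search reaches all vertices. The sketch below uses this simple quadratic-time approach. Linear-time algorithms such as Tarjan's also exist. The helper names are hypothetical, and the adjacency list is the directed example graph.

```python
from collections import deque

def reachable(adj, start):
    """Set of vertices reachable from start by following edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(adj, vertices):
    # Every vertex must reach every other vertex along directed edges.
    return all(reachable(adj, v) == set(vertices) for v in vertices)

def is_weakly_connected(adj, vertices):
    # Ignore edge directions, then check that one vertex reaches all others.
    undirected = {v: set() for v in vertices}
    for u, nbrs in adj.items():
        for v in nbrs:
            undirected[u].add(v)
            undirected[v].add(u)
    return reachable(undirected, vertices[0]) == set(vertices)

vertices = [1, 2, 3, 4]
adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}
print(is_weakly_connected(adj, vertices))    # True
print(is_strongly_connected(adj, vertices))  # False: vertex 3 reaches nothing
```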

  13. Graph Variations (4) • Dense / sparse: • Graphs are sparse when the number of edges is linear in the number of vertices • |E| = O(|V|) • Graphs are dense when the number of edges is quadratic in the number of vertices • |E| = Θ(|V|^2) • Most graphs of interest are sparse • If you know whether you are dealing with a dense or a sparse graph, different data structures may make sense

  14. Representing Graphs • Assume V = {1, 2, …, n} • An adjacency matrix represents the graph as an n × n matrix A: • A[i, j] = 1 if edge (i, j) ∈ E, and 0 if (i, j) ∉ E • For a weighted graph: • A[i, j] = w_ij if edge (i, j) ∈ E, and 0 if (i, j) ∉ E • For an undirected graph: • The matrix is symmetric: A[i, j] = A[j, i]
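Here is a small sketch of building the adjacency matrix from an edge list. It assumes vertices are numbered 1..n, so A[i, j] is stored at index [i-1][j-1], and that weighted edges are given as (i, j, w) triples. `adjacency_matrix` is a hypothetical helper name.

```python
def adjacency_matrix(n, edges, directed=False):
    """n x n adjacency matrix for vertices 1..n.

    edges holds (i, j) pairs or (i, j, w) triples; a missing weight means 1.
    """
    A = [[0] * n for _ in range(n)]
    for e in edges:
        i, j = e[0], e[1]
        w = e[2] if len(e) == 3 else 1
        A[i - 1][j - 1] = w
        if not directed:
            A[j - 1][i - 1] = w  # undirected graphs give a symmetric matrix
    return A

unweighted = adjacency_matrix(4, [(1, 2), (1, 3), (2, 3), (3, 4)])
weighted = adjacency_matrix(4, [(1, 2, 5), (1, 3, 6), (2, 3, 9), (3, 4, 4)])
for row in weighted:
    print(row)
```

Checking whether an edge exists is a single index lookup, A[i-1][j-1], which is the Θ(1) edge test. The n × n list is allocated regardless of how many edges there are.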

  15. Graphs: Adjacency Matrix • Example: (figure: the four-vertex example graph)

  16. Graphs: Adjacency Matrix • Example: (figure: the four-vertex example graph) • How much storage does the adjacency matrix require? A: O(|V|^2)

  17. Graphs: Adjacency Matrix • Example (undirected graph):

        A | 1 2 3 4
        --+--------
        1 | 0 1 1 0
        2 | 1 0 1 0
        3 | 1 1 0 1
        4 | 0 0 1 0

  18. Graphs: Adjacency Matrix • Example (weighted graph; edge weights (1,2) = 5, (1,3) = 6, (2,3) = 9, (3,4) = 4):

        A | 1 2 3 4
        --+--------
        1 | 0 5 6 0
        2 | 5 0 9 0
        3 | 6 9 0 4
        4 | 0 0 4 0

  19. Graphs: Adjacency Matrix • Time to answer whether there is an edge between vertices u and v: Θ(1) • Memory required: Θ(n^2) regardless of |E| • Usually too much storage for large graphs • But can be very efficient for small graphs • Most large interesting graphs are sparse • E.g., road networks (each junction connects only a few roads) • For this reason, the adjacency list is often a more appropriate representation

  20. Graphs: Adjacency List • Adjacency list: for each vertex v ∈ V, store a list of vertices adjacent to v • Example (directed example graph): • Adj[1] = {2, 3} • Adj[2] = {3} • Adj[3] = {} • Adj[4] = {3} • Variation: can also keep a list of edges coming into each vertex

  21. Graph representations • Adjacency list (directed graph): 1 → 2, 3; 2 → 3; 3 → (empty); 4 → 3 • How much storage does the adjacency list require? A: O(|V| + |E|)
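The directed adjacency list can be built directly from the edge list. This sketch uses a dict of Python lists and a hypothetical `has_edge` helper. Its membership scan shows why testing for an edge costs time proportional to the length of u's list.

```python
def adjacency_list(vertices, edges, directed=True):
    """Map each vertex to the list of its neighbors (out-neighbors if directed)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

def has_edge(adj, u, v):
    # Linear scan of adj[u]: cost grows with the length of u's list.
    return v in adj[u]

adj = adjacency_list([1, 2, 3, 4], [(1, 2), (1, 3), (2, 3), (4, 3)])
print(adj)  # {1: [2, 3], 2: [3], 3: [], 4: [3]}
print(has_edge(adj, 1, 3), has_edge(adj, 3, 1))  # True False
```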

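  22. Graph representations • Undirected graph (figure: the four-vertex example graph) • Adjacency matrix:

        A | 1 2 3 4
        --+--------
        1 | 0 1 1 0
        2 | 1 0 1 0
        3 | 1 1 0 1
        4 | 0 0 1 0

    • Adjacency list: 1 → 2, 3; 2 → 1, 3; 3 → 1, 2, 4; 4 → 3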

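  23. Graph representations • Weighted graph (edge weights (1,2) = 5, (1,3) = 6, (2,3) = 9, (3,4) = 4) • Adjacency matrix:

        A | 1 2 3 4
        --+--------
        1 | 0 5 6 0
        2 | 5 0 9 0
        3 | 6 9 0 4
        4 | 0 0 4 0

    • Adjacency list of (neighbor, weight) pairs: 1 → (2, 5), (3, 6); 2 → (1, 5), (3, 9); 3 → (1, 6), (2, 9), (4, 4); 4 → (3, 4)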

  24. Graphs: Adjacency List • How much storage is required? • For directed graphs: • |adj[v]| = out-degree(v) • Total # of items in adjacency lists is Σ_v out-degree(v) = |E| • For undirected graphs: • |adj[v]| = degree(v) • Total # of items in adjacency lists is Σ_v degree(v) = 2|E| • So adjacency lists take Θ(|V| + |E|) storage • Time needed to test whether edge (u, v) ∈ E is O(degree(u)), which is O(n) in the worst case
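Here is a quick check of the counting argument. It assumes the undirected example edges {1-2, 1-3, 2-3, 3-4} are stored twice: once as a directed list with each edge kept once, and once as an undirected list with each edge kept in both endpoints' lists.

```python
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
vertices = [1, 2, 3, 4]

directed = {v: [] for v in vertices}
undirected = {v: [] for v in vertices}
for u, v in edges:
    directed[u].append(v)      # stored once: u -> v
    undirected[u].append(v)    # stored twice: once in each endpoint's list
    undirected[v].append(u)

directed_items = sum(len(nbrs) for nbrs in directed.values())
undirected_items = sum(len(nbrs) for nbrs in undirected.values())
print(directed_items, undirected_items)  # 4 8
```

Adding the |V| list headers to these item counts gives the Θ(|V| + |E|) total.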

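  25. Tradeoffs between the two representations (|V| = n, |E| = m) • Adjacency matrix: Θ(n^2) storage, Θ(1) edge test • Adjacency list: Θ(n + m) storage, O(degree(u)) edge test • Both representations are very useful and have different properties, although adjacency lists are probably better for most problems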

  26. Structural properties of networks • Degree distribution • Average shortest path length • Clustering coefficient • Community structure • Degree correlation • Motivation to study structural properties: • Structure determines function • Functionally relevant structural properties may be shared by different types of real networks (biological or non-biological)

  27. Degree distribution P(k) • The probability that a randomly selected node has exactly (or approximately) k links • P(k) = N(k) / N, where N(k) is the number of nodes with k links (k = 1, 2, …) and N is the total number of nodes
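This sketch computes P(k) = N(k)/N from a mapping of nodes to degrees. The degrees are those of the undirected four-vertex example graph, and `degree_distribution` is a hypothetical name.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: P(k)} with P(k) = N(k) / N."""
    n = len(degrees)
    counts = Counter(degrees.values())  # N(k): number of nodes with degree k
    return {k: counts[k] / n for k in sorted(counts)}

# Degrees of the undirected four-vertex example graph.
degrees = {1: 2, 2: 2, 3: 3, 4: 1}
print(degree_distribution(degrees))  # {1: 0.25, 2: 0.5, 3: 0.25}
```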

  28. Erdős–Rényi model • Each pair of nodes has a probability p of forming an edge • Most nodes have about the same number of connections • The degree distribution is binomial, approximately Poisson for large networks
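Here is a minimal simulation of the G(n, p) form of the Erdős–Rényi model, using only the standard library. The values n = 1000 and p = 0.004 are arbitrary choices for illustration. The mean degree should come out close to p(n - 1), and no node should have a degree far above it, unlike the hubs of a scale-free network.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """G(n, p): keep each of the n(n-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 1000, 0.004
edges = erdos_renyi(n, p, seed=1)

degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

mean_degree = sum(degree) / n
# Observed mean degree vs. the expected value p(n - 1), and the largest degree.
print(mean_degree, p * (n - 1), max(degree))
```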

  29. Real networks: scale-free • Heavy-tailed distribution • Power-law distribution • P(k) ∝ k^(-γ)
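To see numerically what "heavy tail" means, this sketch compares P(k ≥ 50) under two distributions: a Poisson degree distribution with mean 3, and a discrete power law with γ = 2.5 truncated at k = 1000. All three parameter values are arbitrary assumptions chosen for illustration. The Poisson terms are computed in log space to avoid overflow.

```python
import math

def poisson_tail(k_min, mean, k_max):
    # Sum of Poisson pmf terms, each computed as exp(log pmf) to avoid overflow.
    return sum(math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
               for k in range(k_min, k_max + 1))

def power_law_tail(k_min, gamma, k_max):
    """P(k >= k_min) for P(k) = k^(-gamma) / Z on k = 1..k_max."""
    weights = [k ** -gamma for k in range(1, k_max + 1)]  # index 0 is k = 1
    return sum(weights[k_min - 1:]) / sum(weights)

tail_power = power_law_tail(50, 2.5, 1000)
tail_poisson = poisson_tail(50, 3.0, 1000)
print(f"P(k >= 50): power law {tail_power:.2e}, Poisson {tail_poisson:.2e}")
```

Under the power law, roughly one node in a thousand has 50 or more links. Under the Poisson distribution, such nodes are essentially impossible.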

  30. Comparing random and scale-free distributions • In the random network, the five nodes with the most links (in red) are connected to only 27% of all nodes (green). In the scale-free network, the five most connected nodes (red) are connected to 60% of all nodes (green) (source: Nature)

  31. Robust yet fragile nature of networks

  32. Shortest and mean path length • Distance in networks is measured by path length • As there are many alternative paths between two nodes, the shortest path between them plays a special role • In directed networks: • The distance from A to B often differs from the distance from B to A • Often there is no directed path between two nodes • The average path length over all pairs of nodes offers a measure of a network's overall navigability • Most pairs of vertices in a biological network seem to be connected by a short path: the small-world property
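In an unweighted network, shortest path lengths can be computed with breadth-first search. This sketch stores adjacency lists in a dict. When averaging, it simply skips ordered pairs with no path; that is one of several possible conventions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over ordered pairs (A, B) with B reachable from A."""
    total = pairs = 0
    for a in adj:
        for b, d in bfs_distances(adj, a).items():
            if b != a:
                total += d
                pairs += 1
    return total / pairs

directed = {1: [2, 3], 2: [3], 3: [], 4: [3]}
undirected = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_distances(directed, 1))  # {1: 0, 2: 1, 3: 1}
print(bfs_distances(directed, 3))  # {3: 0}: no directed path from 3 back to 1
print(average_path_length(undirected))
```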

  33. Clustering coefficient • Your clustering coefficient: the probability that two of your friends are also friends with each other • You have m friends • Among your m friends, n pairs are friends • The maximum possible number of such pairs is m(m - 1)/2 • C = 2n / (m^2 - m) • Clustering coefficient of a network: the average clustering coefficient of all individuals

  34. Clustering coefficient • The ith node has k_i neighbors • E_i is the actual number of links among these k_i neighbors • The maximal number of links among k_i neighbors is k_i(k_i - 1)/2 • C_i = 2E_i / (k_i(k_i - 1)) (C_i = 2/9 in the slide's example figure) • The probability that two of your friends are also friends • Clustering coefficient of a network: the average clustering coefficient of all nodes
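This is a direct transcription of C_i = 2E_i / (k_i(k_i - 1)) into Python, with neighbor sets stored in a dict. Here, nodes with fewer than two neighbors are assigned C_i = 0. That is a chosen convention; some tools exclude such nodes from the network average instead.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)); taken as 0 when k_i < 2 (a convention)."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # E_i: number of links among the neighbors of i (each pair checked once)
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

def average_clustering(adj):
    """Network clustering coefficient: mean of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
for i in sorted(adj):
    print(i, local_clustering(adj, i))
print(average_clustering(adj))
```

In the example, node 3 has three neighbors but only one link among them (1-2), so C_3 = 2·1 / (3·2) = 1/3.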

  35. Degree correlation • Do rich people tend to hang together with rich people (rich-club)? • Or do they tend to interact with less wealthy people? • Do high degree nodes tend to connect to low degree nodes or high degree ones?
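One common way to quantify degree correlation is the Pearson correlation between the degrees at the two ends of each edge, often called degree assortativity. The sketch below takes an undirected graph given as neighbor sets and counts each edge in both directions. A positive value means high-degree nodes tend to link to other high-degree nodes; a negative value means hubs tend to link to low-degree nodes. The value is undefined when every node has the same degree.

```python
def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge.

    Each undirected edge is counted in both directions, so the measure is symmetric.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(deg[u])
            ys.append(deg[v])
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    vx = sum((x - mx) ** 2 for x in xs) / m
    vy = sum((y - my) ** 2 for y in ys) / m
    return cov / (vx * vy) ** 0.5  # fails if all degrees are equal (zero variance)

# A star: one hub linked only to low-degree leaves, so the correlation is -1.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_assortativity(star))  # -1.0
```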

  36. Some interesting findings from biological networks • Jeong et al., Lethality and centrality in protein networks. Nature 411, 41-42 (3 May 2001) • Roger Guimerà and Luís A. Nunes Amaral, Functional cartography of complex metabolic networks. Nature 433, 895-900 (24 February 2005) • Han et al., Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature 430, 88-93 (1 July 2004)

  37. Connectivity vs essentiality (figure: % of essential proteins vs number of connections; Jeong et al., Nature 2001)

  38. Community role vs essentiality • The effect of a perturbation cannot depend on a node's degree only! • Many hub genes are not essential • Some non-hub genes are essential • Maybe a gene's role in its community is also important • Local leader? Global leader? Ambassador? • Guimerà and Amaral, Nature 433, 2005

  39. Community structure

  40. Role 1, 2, 3: non-hubs with increasing participation indices • Role 5, 6: hubs with increasing participation indices
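In Guimerà and Amaral's framework, the participation index measures how evenly a node's links are spread across modules: P_i = 1 - Σ_s (k_is / k_i)^2, where k_is is the number of links from node i into module s. Hubs are identified by a within-module degree z-score. The sketch below assumes the module assignment is already known and given as a dict. It does not reproduce the paper's role thresholds, and the example graph is made up.

```python
from statistics import mean, pstdev

def participation_coefficient(adj, module, i):
    """P_i = 1 - sum_s (k_is / k_i)^2, with k_is = links from i into module s."""
    k_i = len(adj[i])
    if k_i == 0:
        return 0.0
    counts = {}
    for j in adj[i]:
        counts[module[j]] = counts.get(module[j], 0) + 1
    return 1.0 - sum((c / k_i) ** 2 for c in counts.values())

def within_module_z(adj, module, i):
    """z-score of i's within-module degree relative to the members of its module."""
    def k_in(v):
        return sum(1 for j in adj[v] if module[j] == module[v])
    values = [k_in(v) for v in adj if module[v] == module[i]]
    sd = pstdev(values)
    return 0.0 if sd == 0 else (k_in(i) - mean(values)) / sd

# Hypothetical graph: node 0 leads module A and also links into module B.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4}}
module = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
print(participation_coefficient(adj, module, 0))  # 0.375
print(within_module_z(adj, module, 0))
```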

  41. Dynamically organized modularity in the yeast PPI network • Protein interaction networks, as measured, are static • But two proteins cannot interact if one of them is not expressed • We should also look at gene expression levels • Han et al., Nature 430, 2004

  42. Obtaining Data

  43. Distinguish party hubs from date hubs • Red curve – hubs • Cyan curve – nonhubs • Black curve – randomized • Partners of date hubs are significantly more diverse in spatial distribution than partners of party hubs
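Here is a sketch of one way to separate party hubs from date hubs, assuming an expression profile is available for each protein. For each hub, compute the average Pearson correlation (PCC) between its expression and that of each interaction partner. Party hubs, which interact with their partners at the same time, are expected to have a high average PCC; date hubs are expected to have a low one. The cutoff of 0.5 and the toy profiles are hypothetical and are not taken from Han et al.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_pcc(hub, partners, expression):
    """Mean expression correlation between a hub and each of its partners."""
    return sum(pearson(expression[hub], expression[p]) for p in partners) / len(partners)

def classify_hub(hub, partners, expression, cutoff=0.5):
    # cutoff is a hypothetical placeholder, not a published threshold
    return "party" if average_pcc(hub, partners, expression) >= cutoff else "date"

expression = {  # made-up profiles across four conditions
    "H": [1.0, 2.0, 3.0, 4.0],
    "A": [1.1, 2.1, 2.9, 4.2],   # co-expressed with H
    "B": [2.0, 4.1, 6.0, 7.9],   # co-expressed with H
    "C": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with H
    "D": [1.0, 4.0, 1.0, 4.0],   # oscillating, weakly related to H
}
print(classify_hub("H", ["A", "B"], expression))  # party
print(classify_hub("H", ["C", "D"], expression))  # date
```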

  44. Effect of removal of nodes on average geodesic distance • Panels: original network; after removal of date hubs; after removal of party hubs • Green – nonhub nodes • Brown – hubs • Red – date hubs • Blue – party hubs • The ‘breakdown point’ is the threshold after which the main component of the network starts disintegrating.
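The removal experiment can be sketched in three steps: delete a set of nodes, find the largest remaining component, and average the shortest-path lengths within it. The toy graph is two triangles joined only through node 0. Both the graph and the helper names are hypothetical.

```python
from collections import deque

def bfs(adj, source, removed):
    """BFS distances from source, never entering nodes in `removed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def main_component_stats(adj, removed=frozenset()):
    """(size, mean geodesic distance) of the largest component after deleting `removed`."""
    seen, best = set(), []
    for v in adj:
        if v not in removed and v not in seen:
            comp = list(bfs(adj, v, removed))
            seen.update(comp)
            if len(comp) > len(best):
                best = comp
    total = pairs = 0
    for a in best:
        for b, d in bfs(adj, a, removed).items():
            if b != a:
                total += d
                pairs += 1
    return len(best), (total / pairs if pairs else 0.0)

adj = {0: [3, 4], 1: [2, 3], 2: [1, 3], 3: [1, 2, 0],
       4: [0, 5, 6], 5: [4, 6], 6: [4, 5]}
print(main_component_stats(adj))               # whole network: 7 nodes
print(main_component_stats(adj, removed={0}))  # (3, 1.0): the network splits
```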

  45. Dynamically organized modularity • Red circles – date hubs • Blue squares – modules

  46. Han-Yu Chuang, Eunjung Lee, Yu-Tseung Liu, Doheon Lee, Trey Ideker, Network-based classification of breast cancer metastasis, Mol Syst Biol. 2007; 3: 140.

  47. Challenge: Predict Metastasis • If metastasis is likely => aggressive adjuvant therapy • How to decide the likelihood? • Traditional predictive factors are not accurate enough

  48. Recently: Gene Marker Sets • Examine genome-wide expression profiles • Score individual genes for how well they discriminate between different classes of disease • Establish a gene expression signature • Problem: # genes >> # patients
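Here is a sketch of scoring individual genes. It uses Welch's t-statistic as the discrimination score, since the slide does not specify one, and made-up expression values for two hypothetical genes. When there are far more genes than patients, many genes can score highly by chance, which is the problem the slide raises.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t: difference in means divided by the unpooled standard error."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def rank_genes(expr, labels):
    """Rank genes by |t| between metastatic (1) and non-metastatic (0) samples."""
    scores = {}
    for gene, values in expr.items():
        a = [x for x, y in zip(values, labels) if y == 1]
        b = [x for x, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)

labels = [1, 1, 1, 0, 0, 0]  # hypothetical patient classes
expr = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # clearly separates the classes
    "g2": [2.0, 2.5, 1.5, 2.1, 1.9, 2.4],  # little difference between classes
}
print(rank_genes(expr, labels))  # ['g1', 'g2']
```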
