
Network Analysis and Visualization

Network Analysis and Visualization. Qualifying Paper Presentation, March 9th, 2005. Ketan Mane, School of Library and Information Science, Indiana University, Bloomington. Contents: Motivation, Network Analysis, Network Visualizations, Network Augmentation and Interaction


Presentation Transcript


  1. Network Analysis and Visualization. Qualifying Paper Presentation, March 9th, 2005. Ketan Mane, School of Library and Information Science, Indiana University, Bloomington.

  2. Contents • Motivation • Network Analysis • Network Visualizations • Network Augmentation and Interaction • Software Tools & Libraries • Challenges and Opportunities • SRS Browser • Graph Matching • Conclusions Ketan Mane: Qualifying Presentation

  3. Motivation. Networks are explored in different domains. Sociology (author-citation, movie-actor, friendship, company business relations). Technology (Internet, WWW, power grid). [Figures: movie-actor network (Top Gun, Witness, A Few Good Men, Star Wars); Internet; WWW]

  4. Motivation. Biology (protein interaction, metabolic pathways). Transportation (road, rail, airway). Ecology (food web). [Figures: Yeast PIN; Metabolic Pathways]

  5. Motivation. Many networks are huge (# of entities & links). Examples from different domains: PNAS Journal Dataset (1982 – 2001) = 47,043 papers (Source: PNAS); Melanoma Literature Dataset (1960 – Feb 2004) = 53,804 papers (Source: Medline, Keyword: Melanoma).

  6. Motivation. Many networks are huge (# of entities & links). Examples from different domains: PNAS Journal Dataset (1982 – 2001) = 47,043 papers; Melanoma Literature Dataset (1960 – Feb 2004) = 53,804 papers (Source: Medline, Keyword: Melanoma).

  7. Motivation. Many networks are huge (# of entities). Examples from different domains: PNAS Journal Dataset (1982 – 2001) = 47,043 papers (Source: PNAS); Melanoma Literature Dataset (1960 – Feb 2004) = 53,804 papers (Source: Medline, Keyword: Melanoma). Problem Statement: How to make sense of very large-scale (> 10^3 entities) datasets?

  8. Networks. Network Representation: a set of entities and the inter-relations between those entities. Networks are represented as graphs G(N, E), with nodes N and edges E. Network Analysis: builds on and extends graph theory. Social network analysis and recent network studies by physicists have developed measures to characterize and quantify networks. Network Visualization: aims to exploit human visual perception and cognition by generating aesthetic visualizations that ease data interpretation. [Figure labels: Network, Node, Edge]

  9. Networks • Goal of this review • Discuss existing research in network analysis and visualization from domains as diverse as social science, scientometrics/bibliometrics, physics, biology, Web and Internet research. • Develop a global classification scheme to compare and contrast existing approaches. • Identify good combinations of network analysis and visualization techniques that improve the communication of large and complex network structures and dynamics. • Exemplary test of these combinations. • Exemplary combinations of network analysis and visualization • Discovery and visualization of major nodes. • Identification and visual depiction of the ‘backbone’ of a network. • Analysis and visualization of sub-networks and their interrelations. • Automatic mapping of structurally equivalent nodes.

  10. Network Analysis. Network analysis is applied to: identify local and global network structure (network dynamics is not covered here); understand the advantages and disadvantages of certain network configurations. Graph Theory: an area of mathematics that studies properties of graphs (e.g. path, loop, etc.). First cited reference of graph usage → Euler’s (1736) solution to the Königsberg bridge problem. [Figure labels: path, loop]

  11. Current Techniques for Network Analysis. Basic Network Properties (node relations and connectivity). Average Degree: total count of network edges over all network nodes; indicates local node connectivity information. Average Path Length: average of the shortest path over all node pairs in a network; indicates global node reachability information (higher values → nodes are less reachable).
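The two basic properties on slide 11 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation: the function names are made up, the graph is assumed to be undirected, unweighted and connected, and average degree is computed as 2E/N (each edge counted once at both endpoints), which is one common reading of "edges over nodes".

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    """Mean number of links per node (equals 2E/N for an undirected graph)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical example: a four-node chain a - b - c - d.
chain = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
print(average_degree(chain))
print(average_path_length(chain))
```

Running one breadth-first search per node costs O(N·(N+E)), which is workable for thousands of nodes but becomes slow for much larger networks.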

  12. Current Techniques for Network Analysis. Basic Network Properties. Degree Distribution: count of N nodes with k links; indicates network topology. Clustering Coefficient: evaluates a network in terms of the neighbors of a node also being neighbors of each other; indicates network clustering information. [Figure: degree distribution on log–log scale]
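A minimal Python sketch of the two measures on slide 12 follows. The adjacency map and function names are hypothetical, and the clustering coefficient uses the common local definition (fraction of a node's neighbor pairs that are themselves linked), averaged over all nodes; the slide itself does not fix a formula.

```python
from collections import Counter
from itertools import combinations

def degree_distribution(adj):
    """Map each degree k to the number of nodes that have exactly k links."""
    return Counter(len(nbrs) for nbrs in adj.values())

def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * linked / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(degree_distribution(graph))
print(average_clustering(graph))
```

Plotting the degree counts on log–log axes, as the slide's figure does, is the usual way to check for a heavy-tailed (hub-dominated) topology.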

  13. Current Techniques for Network Analysis. Other measures to identify network size, clusters, and complexity are reviewed in the qualifying paper. In addition, there exist measures that fall into one of five categories: Topological Indicators, Functional Indicators, Complexity Indicators, Sub-group Indicators, Robustness Indicators.

  14. Current Techniques for Network Analysis. Topological Indicators. Diameter: largest of all shortest paths between any pair of network nodes; indicates network size. Degree Distribution: count of N nodes with k links; indicates network topology (hubs). Power Law Exponent: small values → hubs are important; 2 < γ < 3 → hubs are in contact with a small number of nodes; γ > 3 → hubs are not important; indicates relevance of hubs. Measures for network sub-group connectivity. Alpha Index: ratio of cycles in a network to its total node count; indicates network connectivity (higher value → connected network). Gamma Index: edge to node ratio; indicates network connectivity, density, or network progression.
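As a small illustration of the diameter definition on slide 14, the sketch below takes the largest breadth-first-search distance found from any start node. The names are illustrative, and the graph is assumed to be undirected, unweighted and connected (for a disconnected graph it reports the largest diameter among reachable parts).

```python
from collections import deque

def eccentricity(adj, source):
    """Largest shortest-path distance from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Largest of all shortest paths between any pair of nodes."""
    return max(eccentricity(adj, s) for s in adj)

# Hypothetical example: the chain a - b - c - d has diameter 3.
chain_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(diameter(chain_graph))
```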

  15. Current Techniques for Network Analysis. Functional Indicators. Structural Equivalence: set of nodes with common neighbors; indicates network nodes with the same function (sharing the same connectivity pattern with their immediate neighbors makes the nodes substitutable). Betweenness Centrality: nodes/edges responsible for the flow of information between all network nodes; indicates important network nodes (removal of these nodes/edges can cause network disintegration).
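Both functional indicators on slide 15 can be sketched compactly. The code below is an assumption-laden illustration rather than the presenter's method: structural equivalence is taken in its strictest form (identical neighbor sets), and node betweenness is computed with Brandes' accumulation scheme for undirected, unweighted graphs. All names are hypothetical.

```python
from collections import deque

def structurally_equivalent_groups(adj):
    """Group nodes whose neighbor sets are identical (strict equivalence)."""
    groups = {}
    for node, nbrs in adj.items():
        groups.setdefault(frozenset(nbrs), set()).add(node)
    return [members for members in groups.values() if len(members) > 1]

def betweenness_centrality(adj):
    """Count, per node, the shortest paths between other node pairs passing through it."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                          # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}        # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)     # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: value / 2 for v, value in bc.items()}

# Hypothetical example: a star with hub "h" and three leaves.
star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
print(structurally_equivalent_groups(star))
print(betweenness_centrality(star))
```

In the star, the three leaves share the same neighbor set and are interchangeable, while the hub lies on every leaf-to-leaf shortest path, so removing it disconnects the network.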

  16. Current Techniques for Network Analysis. Complexity Indicators. Beta Index: ratio of network edges over network nodes (its formula differs from that of the gamma index); indicates network structural complexity (tree/simple network → β <= 1 | complex networks → high β values). Number of Cycles: the number of independent cycles in a network; indicates network structural complexity (tree/simple network → no cycles | complex networks → cycles present).
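The two complexity indicators on slide 16 reduce to simple counts. In this hedged sketch the beta index is E/N as stated on the slide, and the number of independent cycles is taken to be the cyclomatic number E − N + C (C = number of connected components), which is the standard graph-theoretic count; the function names and example graphs are made up.

```python
def count_edges(adj):
    """Number of undirected edges in an adjacency map {node: set(neighbors)}."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def count_components(adj):
    """Number of connected components, found by repeated depth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components

def beta_index(adj):
    """Edges divided by nodes."""
    return count_edges(adj) / len(adj)

def independent_cycles(adj):
    """Cyclomatic number E - N + C; zero for trees and forests."""
    return count_edges(adj) - len(adj) + count_components(adj)

# Hypothetical examples: a tree (chain) and a triangle with a pendant node.
tree = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cyclic = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(beta_index(tree), independent_cycles(tree))
print(beta_index(cyclic), independent_cycles(cyclic))
```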

  17. Current Techniques for Network Analysis. Sub-group Indicators. Cliques: set of nodes strongly connected to each other as compared to other network nodes; indicates network sub-groups. Structural Holes: gap or missing connection between two network groups; indicates network communication between sub-groups (weaker connection between groups → exchange of non-redundant information).

  18. Current Techniques for Network Analysis. Robustness Indicators. Redundancy: functionally similar set of nodes; indicates network resilience. Connectivity Robustness: capacity to remain connected and functional in the event of failure; indicates network robustness.

  19. Other Techniques for Network Analysis. Graph Matching Methods: support network comparison based on structural network features and, if available, additional node attributes. Two structural graph matching algorithms: Similarity Flooding. Assumption: similar nodes share similar neighboring nodes. Initial mapping based on node labels + iterative propagation of mapping scores. Result is a one-to-many match; similarity scores range from 0 to 1. ABSURDIST (Aligning Between Systems Using Relations Derived Inside Systems). Treats each node as a concept. Relations are established based on internal similarity of node labels and external semantics (relative position of a node as compared to other nodes). A one-to-one match between network nodes is obtained.
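The propagation idea behind Similarity Flooding on slide 19 can be illustrated with a heavily simplified sketch. This is not the published algorithm: it ignores edge labels and propagation coefficients, uses exact label equality as the initial mapping, and simply adds the scores of neighboring node pairs before normalizing by the maximum. All names and the example graphs are hypothetical.

```python
def similarity_flooding_sketch(adj_a, labels_a, adj_b, labels_b, iterations=20):
    """Return {(node_a, node_b): score in [0, 1]} after iterative score propagation."""
    # Initial mapping: 1.0 where node labels match exactly, otherwise 0.0.
    initial = {
        (a, b): 1.0 if labels_a[a] == labels_b[b] else 0.0
        for a in adj_a
        for b in adj_b
    }
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for (a, b), base in initial.items():
            # A pair receives the scores of all pairs formed by its neighbors.
            flow = sum(scores[(na, nb)] for na in adj_a[a] for nb in adj_b[b])
            updated[(a, b)] = base + flow
        top = max(updated.values())
        # Normalize so that the best pair scores 1.0.
        scores = {
            pair: (value / top if top > 0 else 0.0)
            for pair, value in updated.items()
        }
    return scores

# Hypothetical example: two three-node chains with identical labels.
adj_a = {"x1": {"x2"}, "x2": {"x1", "x3"}, "x3": {"x2"}}
adj_b = {"y1": {"y2"}, "y2": {"y1", "y3"}, "y3": {"y2"}}
labels_a = {"x1": "a", "x2": "b", "x3": "c"}
labels_b = {"y1": "a", "y2": "b", "y3": "c"}

scores = similarity_flooding_sketch(adj_a, labels_a, adj_b, labels_b)
for a in adj_a:
    best = max(adj_b, key=lambda b: scores[(a, b)])
    print(a, "->", best)
```

Because every pair keeps a score, the result is a one-to-many mapping; a one-to-one alignment would need an extra selection step on top of the score table.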

  20. Current Techniques for Network Analysis. Clustering Methods: facilitate detection of sub-groups. There are many clustering algorithms; two basic algorithms are shown here. K-means clustering (non-hierarchical approach): k → desired number of clusters; requires a similarity/distance measure and a means to compute centroid values; iterates to assign nodes to the cluster with the nearest centroid value. Ward's clustering (agglomerative hierarchical approach): starts with single-node clusters → singletons; successively combines the clusters with the least value of Euclidean Sum of Squares (ESS); finally a single cluster with all nodes is obtained.
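The k-means loop described on slide 20 can be sketched as follows. This is a minimal illustration under stated assumptions: each node is assumed to be already represented as a numeric feature vector, squared Euclidean distance is the distance measure, and the first k points seed the centroids so the run is deterministic. The function names and data are invented for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Assign each point to one of k clusters; returns (assignment, centroids)."""
    centroids = [tuple(points[i]) for i in range(k)]  # assumption: first k points as seeds
    assignment = None
    for _ in range(iterations):
        # Step 1: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical 2D feature vectors forming two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

Real uses typically pick random or k-means++ seeds and repeat the run several times, since the result depends on the initial centroids.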

  21. Network Topologies. Network Types: Random Networks, Hierarchical Networks, Scale-free Networks, Small-World. [Figure labels: Regular, Random]

  22. Network Visualization. A powerful approach to communicate and explore network data; network structure can be easily perceived. Graph layout algorithms are used to depict relations in a dataset. [Figure: hand-drawn friendship network among 4th graders, Moreno 1934; ○ Girls, Δ Boys]

  23. Current Techniques for Network Visualization. Tree Layout Methods: Radial Layout Algorithm (nodes arranged on concentric circles); Reingold and Tilford Algorithm; Circular Layout Algorithm (for no interconnected sub-groups).

  24. Current Techniques for Network Visualization. Force-Directed Layout Methods: consider the network as a physical system with nodes connected by spring edges (Eades, 1984). Force variations available: gravity, magnetic and electric. 2D / 3D versions available. Additional information available at: http://iv.slis.indiana.edu/sw/spring.html
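The spring metaphor on slide 24 can be made concrete with a small 2D sketch. It follows the general Fruchterman-Reingold recipe (pairwise repulsion k²/d, attraction d²/k along edges, a cooling step limit) rather than Eades' original formulation, and the parameter values, starting positions on a circle and function name are all assumptions chosen for illustration.

```python
import math

def spring_layout(nodes, edges, iterations=100, k=1.0):
    """Return {node: (x, y)} after a fixed number of force-directed iterations."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    temperature = 0.1  # maximum movement per iteration, reduced over time
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attractive (spring) force along each edge.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temperature *= 0.95
    return {v: (p[0], p[1]) for v, p in pos.items()}

# Hypothetical example: a three-node chain a - b - c.
layout = spring_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```

The all-pairs repulsion step is quadratic in the number of nodes, which is one reason such layouts converge slowly on large datasets.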

  25. Current Techniques for Network Visualization. Planarization Layout Methods: prevent overlapping edges. Orthogonal layout (nodes and edges in grid setting). Visual Representation (horizontal nodes and vertical edges). Quasi-Orthogonal layout (non-orthogonal edges, no grid setting).

  26. Current Techniques for Network Visualization • In general, all layout algorithms • Produce meaningful and interpretable layouts for smaller datasets but don’t scale to large datasets. • Large dataset → convergence time too high • Fail to indicate major properties of a network, e.g. weak links, hubs, backbones

  27. Current Techniques in Network Augmentation & Interaction. Information visualization techniques → used to expose a network and its data dimensions. Problem: large networks cause node occlusion. Useful technique: producing a 3D layout → additional dimension to position the nodes. Problem: 3D layout causes user disorientation. Useful technique: the Euclidean plane approach can be adopted in a 3D visual layout. [Figure: hyperbolic layout in 3D-Viewer]

  28. Current Techniques for Network Visualization. Problem: display of multiple data dimensions. Useful technique: visual encoding of the strength and type of nodes and edges. Nodes → color, shape, size | Edges → color, edge-width, texture

  29. Current Techniques for Network Visualization. Other techniques. Zooming and Panning: zoom → magnify a dense cluster | pan → cover a larger span of the network; loss of (focus + context) → rectified by Fish Eye view. Details on Demand: to reduce information overload → information seeking mantra (Overview, Zoom + Filter and Details-on-Demand). Applying Threshold: data filtering technique to prune the data for interesting information. Animation: applicable to datasets with time stamps. 3D Layout: supports dataset view from multiple angles; visualizing a dataset as a terrain (VxInsight). Other Mesh Reduction Methods: alternate processing technique for non-selected nodes; collapse node clusters to super-nodes; schematic view approach → major edges are highlighted.
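The "Applying Threshold" technique on slide 29 amounts to a simple filter over weighted edges. The sketch below is illustrative only: the edge format (u, v, weight), the function name and the sample co-occurrence weights are assumptions, not data from the presentation.

```python
def apply_threshold(weighted_edges, threshold):
    """Keep edges whose weight is at least the threshold, plus the nodes they touch."""
    kept_edges = [(u, v, w) for u, v, w in weighted_edges if w >= threshold]
    kept_nodes = {node for u, v, _ in kept_edges for node in (u, v)}
    return kept_edges, kept_nodes

# Hypothetical weighted network, e.g. term co-occurrence counts.
edges = [
    ("gene", "protein", 12),
    ("gene", "disease", 3),
    ("protein", "pathway", 8),
    ("disease", "drug", 1),
]
strong_edges, strong_nodes = apply_threshold(edges, threshold=5)
print(strong_edges)
print(sorted(strong_nodes))
```

Nodes that lose all their edges drop out of the view, which reduces clutter but also hides weak links, so the threshold is usually exposed as an interactive control.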

  30. Current Software Systems and Libraries. Network Analysis Software: focused on computing different network properties; visualization support for interactivity. Pajek, UCINET, NetMiner, GRADAP, NetVis, SoNIA, SNA Package (R), Visione, … Network Visualization Software: provides different layout algorithms. Common algorithms: Fruchterman-Reingold, Kamada-Kawai, circular layout. 2D/3D visualization support available. Netdraw, Krackplot, Tulip, Graphviz (AT&T), Walrus (CAIDA), Biolayout (EMBL), … Libraries for Network Analysis and Visualization: provide frameworks in support of analysis and visualization; support fast prototyping of applications. JUNG, Prefuse, yWorks, Graph Visualization Framework (GVF), LEDA, … Packages and libraries provide easy access to existing algorithms, but they rarely facilitate algorithm integration or coupling.

  31. Challenges and Opportunities. How to effectively combine network analysis and visualization? Existing design principles: visual grammars guide usage of map elements; Gestalt principles support pattern detection. (Image source: Colin Ware)

  32. Challenges and Opportunities. Highlight prominent network features, e.g. major nodes, clusters, important terms. Clear segregation of different regions. [Figure: map of the Interoute i-21 network spanning much of Europe]

  33. Challenges and Opportunities • Layering Approach • To reduce information density • Visualize data at multiple levels (Image source: http://www.infovis.net/images/Image124.jpg)

  34. 1st Pilot Study. Two studies were conducted to test and evaluate proposed combinations of network analysis and visualization. • I. SRS Browser (Visual Interface to the Sequence Retrieval System) • Sequence Retrieval System • Software package distributed by LION Bioscience Inc. • Supports keyword-based search across 400 heterogeneous biological databases • Generates a rank-ordered list of matching results

  35. 1st Pilot Study. I. SRS Browser (Visual Interface to the Sequence Retrieval System). Tabulated format: does not communicate relations among retrieved entities.

  36. 1st Pilot Study • SRS Browser • Identifies and visualizes relations among retrieved entities → global overview • Supports interactivity to make sense of the rich association networks of different entities in support of knowledge discovery and management. • Interactive filtering of the network to display a sub-network

  37. 1st Pilot Study. Features of SRS Browser.

  38. 1st Pilot Study Results. Accomplishments: applicable to association studies in multiple domains; associations help to derive hypotheses; provides a global picture of the dataset and supports interactivity. Challenges: Data Integration/Federation → multiple databases, format issues, domain knowledge required. Association discovery → multiple databases increase complexity. Scalability → account for huge dataset display, threshold application, layout algorithms. Tools Integration → account for input formats, parameter access methodologies.

  39. 2nd Pilot Study. II. Graph Matching. To identify common properties, shared functionality, etc. Compare simple string match + Similarity Flooding + ABSURDIST. [Figure panels: Simple String Match, Similarity Flooding, ABSURDIST]


  41. 2nd Pilot Study. II. Graph Matching. Dataset I: same structure and node labels. Dataset II: same structure and a single node label in common. Structural matching algorithms are suitable for non-isomorphic data structures. [Figure panels: Similarity Flooding, Simple String Match, ABSURDIST]

  42. 2nd Pilot Study. Opportunities: identifies overlap and dissimilarity among networks; applicable to multiple domains: Sociology (friendship networks), Biology (align phylogenies), Commercial (align database schema). Challenges: Determinism → stochastic approach, cannot guarantee the same results. Algorithm performance → isomorphic structure affects performance, algorithm scalability. Visualization → complexity of showing relations (many-to-many match, one-to-one match).

  43. Pilot Results: Graph Matching. Opportunities: identifies overlap and dissimilarity among networks; applicable to multiple domains: Sociology (friendship networks), Biology (align phylogenies), Commercial (align database schema). Challenges: Reproducing results → stochastic approach, cannot guarantee the same results. Algorithm performance → isomorphic structure affects performance, algorithm scalability. Visualization → complexity of showing relations (one-to-many match, one-to-one match).

  44. Summary • Network Analysis → quantitative measures for structural analysis of networks • Topological Indicators • Functional Indicators • Complexity Indicators • Sub-group Indicators • Robustness Indicators • Network Visualization → utilizes visual perception to make sense of complex networks. Diverse layout algorithms have been discussed • Tree layout algorithms • Force-directed layout algorithms • Planar layout algorithms • Network Augmentation and Interaction Techniques → support data exploration • Software Tools/Libraries → easy access to algorithms • Combination of NA and NV → helps communicate the structure and attributes of large-scale networks • Two Pilot Studies → demonstrate the opportunities and challenges of coupling network analysis and network visualization.

  45. Acknowledgements. For useful feedback on earlier versions of this review, I would like to thank Dr. Katy Börner, Dr. Sun Kim, Dr. Javed Mostafa, Shashikant Penumarthy, Elijah Wright, Peter Hook and Dr. Kevin Boyack. This work is supported by National Science Foundation grant DUE-0333623 and National Science Foundation CAREER Grant IIS-0238261. Also, for sharing a chapter on Information Visualization in ARIST 2004, I would like to thank Dr. Blaise Cronin.

  46. Thank You
