
A Brief Overview on Some Recent Study of Graph Data

Yunkai Liu, Ph.D., Gannon University


Presentation Transcript


  1. A Brief Overview on Some Recent Study of Graph Data Yunkai Liu, Ph. D., Gannon University

  2. Outline • Graph Database vs. Traditional Database • Data structure • Some frequently-used measurements • Overview of Graph Databases • Graph Data on Social Networks • Case study • Graph Data on Biology • Case study • Graph Data in other areas

  3. What is special about graph data in applications • Basic data structure • G = (N, E), where N is the set of nodes and E the set of edges • Edges are sometimes also called links • Some differences / limitations • Graphs are usually directed • Nodes carry a large number of attribute categories • Edges carry a limited number of attribute categories • Adjacency matrices are rarely used; hash tables and indices are widely used • Example – the social network (SN) between us
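
The storage choices on this slide can be sketched as follows. This is a minimal illustration, not the layout of any particular graph database: the class name `DirectedGraph`, its methods, and the sample people and attributes are all hypothetical.

```python
from collections import defaultdict


class DirectedGraph:
    """Directed graph G = (N, E) kept in hash tables rather than an adjacency matrix."""

    def __init__(self):
        self.nodes = {}                     # node id -> dict of node attributes
        self.out_edges = defaultdict(dict)  # source -> {target: edge attributes}
        self.in_edges = defaultdict(set)    # target -> set of sources (reverse index)

    def add_node(self, node_id, **attrs):
        # Nodes may carry many attribute categories.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        # Edges usually carry only a few attributes.
        self.add_node(source)
        self.add_node(target)
        self.out_edges[source][target] = attrs
        self.in_edges[target].add(source)

    def successors(self, node_id):
        return set(self.out_edges.get(node_id, {}))

    def predecessors(self, node_id):
        return set(self.in_edges.get(node_id, set()))


# Hypothetical sample data.
g = DirectedGraph()
g.add_node("alice", age=34, city="Erie", role="faculty")
g.add_node("bob", age=22, city="Erie", role="student")
g.add_edge("alice", "bob", kind="advises")
```

Memory grows with the number of nodes and edges rather than with n², which is why this kind of layout suits large, sparse networks.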

  4. Some frequently-addressed graph properties • Homophily is the tendency to relate to people with similar characteristics (status, beliefs, etc.) • It leads to the formation of homogeneous groups (clusters) where forming relations is easier • Extreme homogenization can act counter to innovation and idea generation (heterophily is thus desirable in some contexts) • Homophilous ties can be strong or weak
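
One rough way to put a number on homophily is the share of ties that join nodes with the same value of some attribute. The sketch below assumes that proxy; the function name and the sample data are hypothetical, and this is not a measure defined in the slides.

```python
def same_attribute_edge_fraction(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if attribute[u] == attribute[v])
    return same / len(edges)


# Hypothetical data: group membership per node, and four ties.
group = {"a": "x", "b": "x", "c": "y", "d": "y"}
ties = [("a", "b"), ("c", "d"), ("b", "c"), ("a", "d")]
fraction = same_attribute_edge_fraction(ties, group)  # 2 of the 4 ties stay within a group
```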

  5. Some frequently-addressed graph properties • Transitivity is a property of ties: if there is a tie between A and B and one between B and C, then in a transitive network A and C will also be connected • Strong ties are more often transitive than weak ties; transitivity is therefore evidence for the existence of strong ties (but not a necessary or sufficient condition) • Transitivity and homophily together lead to the formation of cliques (fully connected clusters) • How should a reasonable degree of transitivity be chosen in graph models?
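
A common way to quantify transitivity is the share of A-B-C paths for which the closing tie A-C also exists. The sketch below assumes an undirected graph stored as a dict of neighbor sets; the function name and the example graph are illustrative.

```python
from itertools import combinations


def transitivity(adj):
    """Closed triples divided by all triples (paths a-b-c centered on b)."""
    triples = closed = 0
    for b, neighbors in adj.items():
        for a, c in combinations(sorted(neighbors), 2):
            triples += 1
            if c in adj[a]:  # the tie a-c closes the triple
                closed += 1
    return closed / triples if triples else 0.0


# Triangle A-B-C with one extra node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
score = transitivity(adj)  # 3 closed triples out of 5
```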

  6. Some frequently-addressed graph properties • Bridges are nodes and edges that connect across groups • Facilitate inter-group communication, increase social cohesion, and help spur innovation • They are usually weak ties, but not every weak tie is a bridge
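
Bridging edges can be detected in the strict graph-theory sense: an edge is a bridge if removing it disconnects its two endpoints. That is narrower than the social meaning on the slide, and the brute-force check below (one traversal per edge) is only a sketch for small undirected graphs; the function names are hypothetical.

```python
from collections import deque


def reachable(adj, start, skip_edge=None):
    """Nodes reachable from start, optionally ignoring one undirected edge."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if skip_edge in ((u, v), (v, u)):
                continue
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bridge_edges(adj):
    """Edges whose removal disconnects their endpoints."""
    bridges = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v and v not in reachable(adj, u, skip_edge=(u, v)):
                bridges.append((u, v))
    return bridges


# Two triangles (a, b, c) and (d, e, f) joined by the single tie c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
bridges = bridge_edges(adj)
```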

  7. Some frequently-addressed graph properties – Degree centrality • A node’s (in-) or (out-)degree is the number of links that lead into or out of the node • In an undirected graph they are of course identical • Often used as a measure of a node’s degree of connectedness and hence also influence and/or popularity • Useful in assessing which nodes are central with respect to spreading information and influencing others in their immediate ‘neighborhood’
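
In- and out-degree can be counted directly from a list of directed edges. A minimal sketch, with a hypothetical function name and sample edges:

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) counters for a list of (source, target) edges."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    return in_degree, out_degree


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
in_degree, out_degree = degrees(edges)
```

A `Counter` returns 0 for nodes that never appear on the relevant side of an edge, so no special case is needed for nodes without incoming or outgoing links.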

  8. Some frequently-addressed graph properties – Paths • A path between two nodes is any sequence of non-repeating nodes that connects the two nodes • The shortest path between two nodes is the path that connects the two nodes with the fewest edges (its length is also called the distance between the nodes) • All shortest paths • K-th shortest path
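
For unweighted graphs, a shortest path can be found with breadth-first search. The sketch below assumes the graph is a dict mapping each node to a list of neighbors; the function name and example graph are illustrative.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list of nodes, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk back through the parent links
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
path = shortest_path(adj, "a", "e")
distance = len(path) - 1  # number of edges on the path
```

Here two shortest paths exist (through b or through c); the search returns whichever it discovers first, which depends on neighbor order.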

  9. Some frequently-addressed graph properties – Betweenness centrality • The number of shortest paths that pass through a node divided by all shortest paths in the network • Sometimes normalized such that the highest value is 1 • Shows which nodes are more likely to be in communication paths between other nodes • Also useful in determining points where the network would break apart.
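
The sketch below computes betweenness in its common pairwise form: for each pair of other nodes, the fraction of their shortest paths that pass through the node, summed over all pairs. It follows Brandes' accumulation scheme for an unweighted, undirected graph and leaves the scores unnormalized, so it may differ from the slide's wording by a constant factor. The function name is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s to v
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        order = []                    # nodes in order of discovery
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies, farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}


# Path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness(adj)
```

In this path graph the two inner nodes each lie on the shortest paths of two pairs, while the end nodes lie on none.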

  10. Some frequently-addressed graph properties – Closeness centrality • The mean length of all shortest paths from a node to all other nodes in the network (i.e. how many hops on average it takes to reach every other node) • It is a measure of reach, i.e. how long it will take to reach other nodes from a given starting node • Useful in cases where speed of information dissemination is the main concern • Lower values are better when higher speed is desirable
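
Following the slide's definition (mean shortest-path length, where lower is better), closeness can be sketched with one breadth-first search per node. The code assumes an unweighted graph and averages only over nodes that are actually reachable; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_distance(adj, source):
    """Average number of hops from source to the other reachable nodes."""
    dist = bfs_distances(adj, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else 0.0


# Path graph a - b - c - d: the inner nodes reach everyone faster.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
end_node = mean_distance(adj, "a")    # (1 + 2 + 3) / 3
inner_node = mean_distance(adj, "b")  # (1 + 1 + 2) / 3
```

Many libraries report the reciprocal of this value so that higher means more central; the version here keeps the slide's convention.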

  11. Some frequently-addressed graph properties – Eigenvector centrality • A node’s eigenvector centrality is proportional to the sum of the eigenvector centralities of all nodes directly connected to it • In other words, a node with a high eigenvector centrality is connected to other nodes with high eigenvector centrality • This is similar to how Google ranks web pages: links from highly linked-to pages count more • Useful in determining who is connected to the most connected nodes
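
Eigenvector centrality can be approximated by power iteration: repeatedly replace each node's score with the sum of its neighbors' scores and rescale. In the sketch below each node also keeps its own previous score, which amounts to iterating with A + I instead of A. That shift is a choice made here so the iteration also settles on bipartite graphs; it does not change the eigenvectors. The function name and iteration count are assumptions, and the graph is assumed to be connected and undirected.

```python
def eigenvector_centrality(adj, iterations=100):
    """Approximate eigenvector centrality, scaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Own score plus neighbors' scores: one multiplication by (A + I).
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: value / top for v, value in nxt.items()}
    return x


# Star graph: a hub linked to three leaves.
adj = {"hub": ["p", "q", "r"], "p": ["hub"], "q": ["hub"], "r": ["hub"]}
scores = eigenvector_centrality(adj)
```

For this star the exact leading eigenvector gives each leaf a score of 1/√3 relative to the hub, which the iteration approaches quickly.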

  12. Other measurements • Reciprocity (degree of) • The ratio of the number of relations which are reciprocated (i.e. there is an edge in both directions) over the total number of relations in the network • A useful indicator of the degree of mutuality and reciprocal exchange in a network, which relates to social cohesion • Only makes sense in directed graphs
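
Reciprocity as defined on this slide can be sketched in a few lines. The code counts directed edges whose reverse edge is also present and assumes there are no self-loops; the function name and sample edges are hypothetical.

```python
def reciprocity(edges):
    """Share of directed edges (u, v) for which (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
value = reciprocity(edges)  # a->b and b->a are reciprocated: 2 of 4 edges
```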

  13. Other measurements • Density • A network’s density is the ratio of the number of edges in the network over the total number of possible edges between all pairs of nodes (which is n(n-1)/2, where n is the number of vertices, for an undirected graph) • It is a common measure of how well connected a network is (in other words, how closely knit it is) – a perfectly connected network is called a clique and has density=1 • For the same number of edges, a directed graph has half the density of its undirected equivalent, because there are twice as many possible edges, i.e. n(n-1) • Density is useful in comparing networks against each other, or in doing the same for different regions within a single network
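
The density formula translates directly into code. A minimal sketch, with hypothetical function and parameter names:

```python
def density(num_nodes, num_edges, directed=False):
    """Edges present divided by edges possible: n(n-1)/2 undirected, n(n-1) directed."""
    if num_nodes < 2:
        return 0.0
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2
    return num_edges / possible


clique = density(4, 6)                      # all 6 possible undirected edges present
half_full = density(4, 3)                   # 3 of 6 possible edges
as_directed = density(4, 6, directed=True)  # same 6 edges, but 12 are possible
```

The last two lines show the factor of two from the slide: the same edge count gives half the density once direction is taken into account.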

  14. Other measurements • Clustering • A node’s clustering coefficient is the density of the links among its neighbors (i.e. the fraction of pairs of nodes directly connected to it that are also connected to each other) • The clustering coefficient for an entire network is the average of all coefficients for its nodes • Clustering is indicative of the presence of different (sub-)communities in a network
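
The sketch below uses the standard local definition of the clustering coefficient: the fraction of pairs of a node's neighbors that are themselves linked, with the node itself left out of the count. It assumes an undirected graph stored as a dict of neighbor sets, and assigns 0 to nodes with fewer than two neighbors; the function names are illustrative.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are directly connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Triangle A-B-C with one extra node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
coefficient_c = local_clustering(adj, "C")  # only A-B is linked among the 3 neighbor pairs
network_average = average_clustering(adj)   # (1 + 1 + 1/3 + 0) / 4
```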

  15. Other measurements • Average and longest distance • The longest shortest path (distance) between any two nodes in a network is called the network’s diameter • It also indicates how long it will take at most to reach any node in the network (sparser networks will generally have greater diameters) • The average length of all shortest paths in a network is also interesting because it indicates how far apart any two nodes will be on average (average distance)
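
Both quantities can be obtained by running a breadth-first search from every node. The sketch assumes a connected, unweighted graph with at least two nodes; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_distance(adj):
    """Return (longest shortest path, mean shortest path) over all node pairs."""
    lengths = []
    for s in adj:
        dist = bfs_distances(adj, s)
        lengths.extend(d for node, d in dist.items() if node != s)
    # Every pair appears twice (once from each end), which leaves the mean unchanged.
    return max(lengths), sum(lengths) / len(lengths)


# Path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diameter, average = diameter_and_average_distance(adj)
```

This costs one full traversal per node, which is fine for small examples but slow on networks with millions of nodes.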

  16. What is a Graph Database (GD) • Graph databases date back to the 1970s • They have been growing fast recently thanks to advances in computing technology • Some GDs claim that they can represent millions of nodes and billions of edges • GDs are part of the NoSQL database family

  17. Social Network Analysis (SNA) • News • In early 2013, Facebook announced its new “Graph Search” feature • Major questions • Networks: How to represent various social networks • Tie Strength: How to identify strong/weak ties in the network • Key Players: How to identify key/central nodes in a network • Cohesion: How to characterize a network’s structure • Major applications • Social study • National security • Micro-advertising • …

  18. Some of my projects • Meth-Hunter • Graph Data Management system • Graph Data warehouse protocol

  19. NodeXL - emails

  20. NodeXL - Facebook

  21. Graph Data in Biology • Multiple classes of bionetwork models exist, such as metabolic, protein-gene, or protein-protein interactions • In metabolic networks, nodes are metabolites and edges are enzymes that facilitate a specific reaction within the body or in nature. • Protein-gene interactions involve understanding and mapping gene expression. • As with metabolic and gene-expression networks, protein-protein interaction networks are graphs, here with proteins as nodes

  22. Graph Data in Biology • The structure of a bionetwork is important for understanding nature • The analysis is similar to SNA • Clique finding is important and may be related to tumors.

  23. One case study – bionetwork alignment • Two previous models include Graemlin (General and robust alignment of multiple large interaction networks) and PHUNKEE (Pairing subgrapHs Using NetworK Environment Equivalence) • Whereas Graemlin considers the entire network spectrum, the PHUNKEE algorithm considers only the most conserved portions between two graphs

  24. One case study – bionetwork alignment • Graemlin is advantageous in that it can align multiple networks quickly; however, all nodes and edges are considered whether or not they are similar to each other. • In contrast, PHUNKEE considers only the most conserved portions of two graphs, taking into account that insertions and deletions may occur over time. However, the algorithm performs slowly, working in a step-by-step manner.

  25. One case study – bionetwork alignment • We realized that one method is not enough to determine the relationship between two graphs, because of various factors in the data. Thus, we created a comprehensive package for pairwise graph comparison. • The package includes two interfaces: one for global alignment and the other for local alignment. • The transitivity property is also considered, in case of missing nodes or missing edges.

  26. The bionetworks of four species in our experiment.

  27. The comparisons between three species and Homo sapiens.

  28. A Cladogram for Rattus norvegicus, Mus musculus and Saccharomyces cerevisiae

  29. Some Weird Parts • The normalization of the data is a big challenge. It is easy to reach a wrong conclusion, for example that yeast is closer to humans than mice are. • This is just one example of graph mining in bioinformatics

  30. Other areas of Graph Data • GIS • Financial / business • Public spending • Gaming • Some challenges of GD in CS • Cloud apps and cloud computing • Visualization • Integrating with other databases
