Applying gSPAN on Social Network Graphs

Applying gSPAN on Social Network Graphs, by Abhik Ray (WSU ID: 11199134). Graph mining extends traditional data mining techniques to graph data, focusing on extracting patterns from the relationships between entities rather than from the entities themselves.

Presentation Transcript


  1. Applying gSPAN on Social Network Graphs - Abhik Ray, WSU ID: 11199134

  2. Graph Mining
     • An extension of traditional data mining techniques to graph data.
     • Focuses on extracting patterns from the relationships between entities rather than from the entities themselves.
     • Examples of graph data include social networks (e.g., Facebook), chemical compounds, and biological networks.

  3. Frequent Subgraph Discovery
     • An extension of frequent pattern discovery.
     • An unsupervised data mining approach.
     • Given a set of graphs GD, find all subgraphs whose frequency in GD (the number or fraction of graphs that contain them) is at least a given threshold.
     • Two main paradigms:
        • Candidate generation approach
        • Pattern growth approach (gSpan follows this paradigm)
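The frequency test in the third bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming small undirected, unlabeled graphs stored as sets of `frozenset` edges; the brute-force `contains` check stands in for a real subgraph-isomorphism routine, and all function names are hypothetical.

```python
from itertools import permutations

# Assumption: graphs are undirected and unlabeled, stored as sets of
# frozenset({u, v}) edges.

def vertices(edges):
    """All vertices that appear in an edge set."""
    return {v for e in edges for v in e}

def contains(graph, pattern):
    """Brute-force subgraph-isomorphism test (hypothetical helper).

    Tries every injective mapping of pattern vertices onto graph vertices,
    so it is exponential and only suitable for tiny examples.
    """
    p_nodes = sorted(vertices(pattern))
    g_nodes = sorted(vertices(graph))
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if all(frozenset(mapping[v] for v in e) in graph for e in pattern):
            return True
    return False

def support(dataset, pattern):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern) for g in dataset)

def is_frequent(dataset, pattern, min_freq):
    """True if the pattern occurs in at least a `min_freq` fraction of the graphs."""
    return support(dataset, pattern) >= min_freq * len(dataset)

def edge_set(*pairs):
    """Build an edge set from (u, v) pairs."""
    return {frozenset(p) for p in pairs}

GD = [edge_set((0, 1), (1, 2), (0, 2)),   # triangle
      edge_set((0, 1), (1, 2)),           # path
      edge_set((5, 6), (6, 7), (7, 8))]   # longer path
path = edge_set((0, 1), (1, 2))
triangle = edge_set((0, 1), (1, 2), (0, 2))
print(support(GD, path), support(GD, triangle))  # 3 1
print(is_frequent(GD, triangle, 0.5))            # False
```

The exhaustive mapping keeps the definition of frequency easy to read; practical miners avoid it by reusing embeddings found while growing patterns.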

  4. gSpan Terminology
     • gSpan: graph-based Substructure pattern mining.
     • Backward extension: an edge is added between two nodes already present in the subgraph being considered.
     • Forward extension: an edge is added between a node already in the subgraph and a node not yet in it.
     • General graph pattern growth takes each discovered subgraph ‘g’ and extends it recursively until all frequent subgraphs that contain ‘g’ have been discovered.
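A small sketch of the two extension types on a single host graph, assuming the same edge-set representation (undirected `frozenset` edges). The hypothetical `grow` helper applies both extensions recursively and drops duplicates with a `seen` set; gSpan itself avoids duplicates with minimum DFS codes rather than a lookup table, and this sketch does not count frequency across a dataset.

```python
# Assumption: undirected host graph stored as a set of frozenset({u, v}) edges;
# the subgraph is a subset of those edges (an embedding in the host).

def extensions(host_edges, sub_edges):
    """Split the one-edge extensions of `sub_edges` into backward and forward ones."""
    sub_nodes = {v for e in sub_edges for v in e}
    backward, forward = set(), set()
    for e in host_edges - sub_edges:
        inside = e & sub_nodes          # endpoints already in the subgraph
        if len(inside) == 2:
            backward.add(e)             # both endpoints present: backward extension
        elif len(inside) == 1:
            forward.add(e)              # reaches one new vertex: forward extension
    return backward, forward

def grow(host_edges, sub_edges, seen):
    """Recursively apply every extension, collecting each connected edge set once.

    The `seen` set is a simple duplicate filter used only for this sketch.
    """
    key = frozenset(sub_edges)
    if key in seen:
        return
    seen.add(key)
    backward, forward = extensions(host_edges, sub_edges)
    for e in backward | forward:
        grow(host_edges, sub_edges | {e}, seen)

triangle = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
seed = {frozenset((0, 1))}
backward, forward = extensions(triangle, seed)
print(len(backward), len(forward))  # 0 2
seen = set()
grow(triangle, seed, seen)
print(len(seen))                    # 4
```

Starting from the edge (0, 1) of a triangle, the first step can only grow forward; once all three vertices are present, the remaining edge becomes a backward extension.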

  5. gSpan
     • Performs a depth-first search (DFS) over a frequent subgraph, starting from a seed vertex.
     • Imposes a linear order on the visited vertices by assigning subscripts in discovery order.
     • The starting vertex becomes v0 and the last discovered vertex becomes vn (the rightmost vertex). The path in the DFS tree from v0 to vn is the rightmost path.
     • A new edge is then added either between the rightmost vertex and another vertex on the rightmost path (backward extension), or between a vertex on the rightmost path and a newly created vertex (forward extension).
     • Since one graph has many DFS trees, duplicate generation is avoided by converting each DFS tree into a DFS code, taking the minimum code as the canonical form, and extending only subgraphs whose code is minimal.
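The rightmost path and the candidate rightmost extensions can be computed directly from a DFS code. The sketch below assumes unlabeled graphs, so each code entry is just a vertex pair (i, j), with i < j for a forward edge and i > j for a backward edge; real gSpan codes also carry vertex and edge labels, and the function names are hypothetical.

```python
# Assumption: unlabeled DFS code, i.e. a list of (i, j) vertex pairs where
# i < j is a forward (tree) edge and i > j is a backward edge.

def rightmost_path(code):
    """Vertices on the rightmost path, listed from the rightmost vertex back to v0."""
    parent = {j: i for i, j in code if i < j}   # each forward edge discovers j
    node = max(parent)                          # last discovered = rightmost vertex
    path = [node]
    while node in parent:                       # follow tree edges up to v0
        node = parent[node]
        path.append(node)
    return path

def rightmost_extensions(code):
    """Candidate (i, j) entries that a rightmost extension may append to `code`."""
    path = rightmost_path(code)
    rm = path[0]
    present = {frozenset(e) for e in code}
    # Backward: rightmost vertex -> an earlier vertex on the rightmost path,
    # skipping edges already in the code (e.g. the tree edge to its parent).
    backward = [(rm, v) for v in reversed(path[1:])
                if frozenset((rm, v)) not in present]
    # Forward: any vertex on the rightmost path -> a new vertex rm + 1.
    forward = [(v, rm + 1) for v in path]
    return backward + forward

path_code = [(0, 1), (1, 2)]              # v0 - v1 - v2
star_code = [(0, 1), (0, 2)]              # v0 with two children
print(rightmost_path(path_code))          # [2, 1, 0]
print(rightmost_extensions(path_code))    # [(2, 0), (2, 3), (1, 3), (0, 3)]
print(rightmost_extensions(star_code))    # [(2, 3), (0, 3)]
```

In the star example, v1 is not on the rightmost path, so no edge may be attached to it; restricting growth this way is what keeps the number of candidate codes manageable before the minimum-code check.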

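  6. Experiments
     • Dataset: Wiki Vote, the Wikipedia Request for Adminship who-votes-on-whom network.
     • 100 random edge samples were drawn, each containing a 0.005 fraction of the edges.
     • The vertices in each sample were sorted and renumbered sequentially.
     • The samples were converted to the gSpan input format.
     • gSpan was run on the resulting dataset with a 10% minimum frequency threshold.
     • The top four subgraphs were selected by score, where $\text{score}_d = \text{order}_d \times \text{frequency}_d$ for a discovered subgraph d, and the order of d is its number of vertices.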
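The preprocessing and ranking steps above might look roughly like the following sketch. Several details are assumptions, not taken from the slides: the 0.005 sample is read as 0.5% of the edges drawn uniformly without replacement, edge direction is dropped because gSpan works on undirected graphs, every vertex and edge gets the dummy label 0, and the output uses the `t # id` / `v id label` / `e u v label` text format read by many gSpan implementations. The toy edge list and all function names are hypothetical.

```python
import random

# Assumptions: "0.005 random edge sample" = 0.5% of edges drawn uniformly
# without replacement; direction is dropped (gSpan takes undirected graphs);
# all vertices and edges get the dummy label 0. Names are hypothetical.

def sample_edges(edges, fraction, rng):
    """Uniformly draw round(fraction * |E|) edges (at least one) without replacement."""
    k = min(len(edges), max(1, round(fraction * len(edges))))
    return rng.sample(list(edges), k)

def renumber(sample):
    """Sort the sampled vertices, map them to 0..n-1, and merge duplicate/reversed edges."""
    nodes = sorted({v for e in sample for v in e})
    index = {v: i for i, v in enumerate(nodes)}
    edges = sorted({tuple(sorted((index[u], index[v]))) for u, v in sample})
    return edges, len(nodes)

def to_gspan(samples):
    """Serialize samples in the t/v/e text format assumed above."""
    lines = []
    for gid, sample in enumerate(samples):
        edges, n = renumber(sample)
        lines.append(f"t # {gid}")
        lines.extend(f"v {i} 0" for i in range(n))
        lines.extend(f"e {u} {v} 0" for u, v in edges)
    return "\n".join(lines) + "\n"

def top_k_by_score(patterns, k=4):
    """Rank (pattern_id, order, frequency) tuples by score = order * frequency."""
    return sorted(patterns, key=lambda p: p[1] * p[2], reverse=True)[:k]

# Toy stand-in for the Wiki Vote edge list.
rng = random.Random(42)
toy_edges = [(u, u + 1) for u in range(1000)]
samples = [sample_edges(toy_edges, 0.005, rng) for _ in range(100)]
print(len(samples[0]))                    # 5
print(to_gspan([[(30, 10), (20, 10)]]), end="")
ranked = top_k_by_score([("a", 3, 50), ("b", 5, 20), ("c", 2, 90),
                         ("d", 4, 10), ("e", 6, 5)])
print([p[0] for p in ranked])             # ['c', 'a', 'b', 'd']
```

Multiplying order by frequency favours patterns that are both large and common; a pure frequency ranking would almost always be topped by single edges.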

  7. Results
     [Figure: graph drawing with vertices numbered 0 to 10]
     • Structures characteristic of social networks, such as triangle-closing edges, were not found among the discovered subgraphs.
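One way to check discovered patterns for the triangle-closing edges mentioned on the Results slide is to count triangles and to test whether a candidate edge would close one. A minimal sketch, assuming undirected edge lists without self-loops; the helper names are hypothetical.

```python
from collections import defaultdict

# Assumption: undirected edge lists of (u, v) pairs with no self-loops.

def adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def count_triangles(edges):
    """Count triangles; each one is seen once from each of its three edges."""
    adj = adjacency(edges)
    unique = {tuple(sorted(e)) for e in edges}
    return sum(len(adj[u] & adj[v]) for u, v in unique) // 3

def closes_triangle(edges, u, v):
    """True if adding edge (u, v) would close a triangle (u, v share a neighbour)."""
    adj = adjacency(edges)
    return bool(adj[u] & adj[v])

open_triad = [(0, 1), (1, 2)]
print(count_triangles(open_triad))             # 0
print(closes_triangle(open_triad, 0, 2))       # True
print(count_triangles(open_triad + [(0, 2)]))  # 1
```

Applied to every pattern gSpan reports, a zero triangle count for all of them would match the observation above.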

  8. Improvements
     • After each sample is taken, remove its edges from the main graph so that later samples cannot reuse them.
     • Extend gSpan to handle directed edges.
     • Use more sophisticated sampling techniques, such as Forest Fire sampling.
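A simplified Forest Fire sampler, loosely following the Forest Fire idea: start a fire at a random vertex, burn a geometrically distributed number of its unburned neighbours (mean p/(1 - p)), keep spreading from each burned vertex, and reignite at a fresh random vertex if the fire dies out. This sketch ignores edge direction and backward burning, returns the subgraph induced by the burned vertices, and uses an arbitrary default `p_forward=0.7`; the function names are hypothetical. The last lines also apply the first improvement by removing the sampled edges before the next draw.

```python
import random
from collections import defaultdict, deque

def geometric(p, rng):
    """Count successes before the first failure; mean p / (1 - p) for 0 <= p < 1."""
    x = 0
    while rng.random() < p:
        x += 1
    return x

def forest_fire_sample(edges, target_nodes, p_forward=0.7, rng=None):
    """Simplified, undirected Forest Fire sampling (hypothetical helper).

    Returns the burned vertices and the edges induced among them.
    """
    rng = rng or random.Random()
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    target = min(target_nodes, len(adj))
    burned = set()
    while len(burned) < target:
        # (Re)ignite at a random unburned vertex; also covers fires that die out.
        seed = rng.choice(sorted(v for v in adj if v not in burned))
        burned.add(seed)
        queue = deque([seed])
        while queue and len(burned) < target:
            node = queue.popleft()
            unburned = sorted(w for w in adj[node] if w not in burned)
            k = min(len(unburned), geometric(p_forward, rng))
            for w in rng.sample(unburned, k):   # burn k random neighbours
                if len(burned) >= target:
                    break
                burned.add(w)
                queue.append(w)
    induced = [(u, v) for u, v in edges if u in burned and v in burned]
    return burned, induced

ring = [(i, (i + 1) % 200) for i in range(200)]
nodes, sub = forest_fire_sample(ring, 20, rng=random.Random(7))
print(len(nodes))                              # 20
used = set(sub)
remaining = [e for e in ring if e not in used]  # drop sampled edges before the next draw
```

Unlike independent uniform edge sampling, the burned region stays locally connected, which makes it more likely that neighbourhood structures such as closed triads survive in the sample.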

  9. Thank you
