A Multi-Level Parallel Implementation of a Program for Finding Frequent Patterns in a Large Sparse Graph

Steve Reinhardt, Interactive Supercomputing sreinhardt@interactivesupercomputing.com

George Karypis, Dept. of Computer Science, University of Minnesota

Outline
  • Problem definition
  • Prior work
  • Problem and Approach
  • Results
  • Issues and Conclusions
Graph Datasets
  • Flexible and powerful representation
    • Evidence extraction and link discovery (EELD)
    • Social Networks/Web graphs
    • Chemical compounds
    • Protein structures
    • Biological Pathways
    • Object recognition and retrieval
    • Multi-relational datasets
Finding Patterns in Graphs: Many Dimensions

M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. In SIAM International Conference on Data Mining (SDM-04), 2004. http://citeseer.ist.psu.edu/article/kuramochi04finding.html
  • Structure of the graph dataset
    • many small graphs
      • graph transaction setting
    • one large graph
      • single-graph setting
  • Type of patterns
    • connected subgraphs
    • induced subgraphs
  • Nature of the algorithm
    • Finds all patterns that satisfy the minimum support requirement
      • Complete
    • Finds some of the patterns
      • Incomplete
  • Nature of the pattern’s occurrence
    • The pattern occurs exactly in the input graph
      • Exact algorithms
    • There is a sufficiently similar embedding of the pattern in the graph
      • Inexact algorithms
  • MIS calculation for frequency
    • exact
    • approximate
    • upper bound
  • Algorithm
    • vertical (depth-first)
    • horizontal (breadth-first)
Single Graph Setting

[Figure: an input graph with two example patterns; a size-6 pattern occurs with frequency 1, a size-7 pattern with frequency 6.]

  • Find all frequent subgraphs from a single sparse graph.
  • Choice of frequency definition
vSIGRAM: Vertical Solution
  • Candidate generation by extension
    • Add one more edge to a current embedding.
    • Solve MIS on embeddings in the same equivalence class.
    • No downward-closure-based pruning
  • Two important components
    • Frequency-based pruning of extensions
    • Treefication based on canonical labeling
vSIGRAM: Connection Table
  • Frequency-based pruning.
  • Trying every possible extension is expensive and inefficient.
    • A particular extension might have been tested before.
  • Categorize extensions into equivalence classes (in terms of isomorphism), and record whether each class is frequent.
  • If a class becomes infrequent, never try it in later exploration (see the sketch below).
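A minimal sketch of such a connection table in C, assuming one entry per extension equivalence class keyed by a canonical label string (all names here are illustrative assumptions, not taken from the SIGRAM source):

#include <string.h>

typedef struct {
    char *canon_label;   /* canonical label of the extension class */
    int   frequent;      /* 1 = known frequent, 0 = known infrequent */
} ConnEntry;

/* Before testing an extension, look up its class; if the class is
   already known to be infrequent, skip the expensive frequency test. */
int class_is_pruned(const ConnEntry *table, int n, const char *label)
{
    for (int i = 0; i < n; i++)
        if (strcmp(table[i].canon_label, label) == 0)
            return !table[i].frequent;
    return 0;   /* unseen class: must still be tested */
}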
Parallelization
  • Two clear sources of parallelism in the algorithm
    • Amount of parallelism from each source not known in advance
  • The code is typical C code
    • structs, pointers, frequent mallocs/frees of small areas, etc.
    • nothing like the “Fortran”-like (dense linear algebra) examples shown for many parallel programming methods
  • Parallel structures need to accommodate dynamic parallelism
    • Dynamic specification of parallel work
    • Dynamic allocation of processors to work
  • Chose OpenMP taskq/task constructs
    • Proposed extensions to OpenMP standard
    • Support parallel work defined in multiple places in a program but placed on a single conceptual queue and executed from there
    • ~20 lines of code changes in ~15,000 line program
  • Electric Fence was very useful in finding coding errors
Algorithmic Parallelism

vSiGraM(G, MIS_type, f)
 1. F ← ∅
 2. F1 ← all frequent size-1 subgraphs in G
 3. for each F1 in F1 do
 4.     M(F1) ← all embeddings of F1
 5. for each F1 in F1 do                      // high-level parallelism
 6.     F ← F ∪ vSiGraM-Extend(F1, G, f)
    return F

vSiGraM-Extend(Fk, G, f)
 1. F ← ∅
 2. for each embedding m in M(Fk) do          // low-level parallelism
 3.     Ck+1 ← Ck+1 ∪ { all (k+1)-subgraphs of G containing m }
 4. for each Ck+1 in Ck+1 do
 5.     if Fk is not the generating parent of Ck+1 then
 6.         continue
 7.     compute Ck+1.freq from M(Ck+1)
 8.     if Ck+1.freq < f then
 9.         continue
10.     F ← F ∪ vSiGraM-Extend(Ck+1, G, f)
11. return F

Simple Taskq/Task Example

int fib(int n);

int main()
{
    int val;
    #pragma intel omp taskq
    val = fib(12345);
    return 0;
}

int fib(int n)
{
    int i, partret[2];
    if (n > 2) {
        #pragma intel omp task
        for (i = n - 2; i < n; i++) {
            partret[n-2-i] = fib(i);   /* fib(n-2) and fib(n-1) */
        }
        return (partret[0] + partret[1]);
    } else {
        return 1;
    }
}
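The intel taskq/task constructs above were a proposed extension; standard OpenMP (3.0 and later) adopted task and taskwait. For comparison, a minimal sketch of the same recursion in standard OpenMP (an illustrative equivalent, not the code used in this work):

#include <stdio.h>
#include <omp.h>

int fib(int n)
{
    int x, y;
    if (n <= 2) return 1;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait      /* wait for both child tasks */
    return x + y;
}

int main(void)
{
    int val;
    #pragma omp parallel
    #pragma omp single        /* one thread seeds the task tree */
    val = fib(20);
    printf("%d\n", val);
    return 0;
}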

High-Level Parallelism with taskq/task

// At the bottom of expand_subgraph, after all child
// subgraphs have been identified, start them all.
#pragma intel omp taskq
for (ii = 0; ii < sg_set_size(child); ii++) {
    #pragma intel omp task captureprivate(ii)
    {
        SubGraph *csg = sg_set_at(child, ii);
        expand_subgraph(csg, csg->ct, lg, ls, o);
    } // end-task
}

Low-Level Parallelism with taskq/task

#pragma omp parallel shared(nt, priv_es)
{
    #pragma omp master
    {
        nt = omp_get_num_threads();   // #threads in parallel region
        priv_es = (ExtensionSet **)kmp_calloc(nt, sizeof(ExtensionSet *));
    }
    #pragma omp barrier
    #pragma intel omp taskq
    {
        for (i = 0; i < sg_vmap_size(sg); i++) {
            #pragma intel omp task captureprivate(i)
            {
                int th = omp_get_thread_num();
                if (priv_es[th] == NULL) {
                    priv_es[th] = exset_init(128);
                }
                expand_map(sg, ct, ams, i, priv_es[th], lg);
            }
        }
    }
} // end parallel section; next loop is serial reduction

for (i = 0; i < nt; i++) {
    if (priv_es[i] != NULL) {
        exset_merge(priv_es[i], es);
    }
}
kmp_free(priv_es);

Implementation due to Grant Haab and colleagues from the Intel OpenMP library group

Experimental Results
  • SGI Altix™, 32 Itanium2™ sockets (64 cores), 1.6 GHz
  • 64 GBytes (though not memory limited)
  • Linux
  • No special dplace/cpuset configuration
  • Minimum frequencies chosen to illuminate scaling behavior, not provide maximum performance
Performance of High-level Parallelism
  • When there is a sufficient quantity of work (i.e., the frequency threshold is low enough):
    • Good speed-ups to 16P
    • Reasonable speed-ups to 30P
    • Little or no benefit above 30P
    • No insight into performance plateau
Poor Performance of Low-level Parallelism
  • Several possible effects ruled out
    • Granularity of data allocation
    • Barrier before master-only reduction
  • Source: highly variable times for register_extension
    • ~100X slower in parallel than serial, …
    • but which instances were slow varied from execution to execution
    • Apparently due to highly variable run-times for malloc
    • Not understood
Issues and Conclusions
  • OpenMP taskq/task were straightforward to use in this program and implemented the desired model
  • Performance was good to a medium range of processor counts (best 26X on 30P)
  • Difficult to gain insight into lack of performance
    • High-level parallelism at 30P and above
    • Low-level parallelism
Aviation Dataset
  • Generally, vSIGRAM is 2-5 times faster than hSIGRAM (with exact and upper bound MIS)
  • Largest pattern contained 13 edges.
Citation Dataset
  • But hSIGRAM can be more efficient, especially with upper bound MIS (ub).
  • Largest pattern contained 16 edges.
VLSI Dataset
  • Exact MIS never finished.
  • Longest pattern contained 5 edges (constraint).
Comparison with SUBDUE
  • Similar results with SEuS
Summary
  • With approximate and exact MIS, vSIGRAM is 2-5 times faster than hSIGRAM.
  • With upper bound MIS, however, hSIGRAM can prune a larger number of infrequent patterns.
    • The downward closure property plays a key role.
  • For some datasets, using exact MIS for frequency counting is just intractable.
  • Compared to SUBDUE, SIGRAM finds more and longer patterns in a shorter amount of runtime.
Thank You!
  • A slightly longer version of this paper is also available as a technical report.
  • SIGRAM executables will be available for download soon from http://www.cs.umn.edu/~karypis/pafi/
Complete Frequent Subgraph Mining—Existing Work So Far
  • Input: A set of graphs (transactions) + support threshold
  • Goal: Find all frequently occurring subgraphs in the input dataset.
    • AGM (Inokuchi et al., 2000), vertex-based, may not be connected.
    • FSG (Kuramochi et al., 2001), edge-based, only connected subgraphs
    • AcGM (Inokuchi et al., 2002), gSpan (Yan & Han, 2002), FFSM (Huan et al., 2003), etc. follow FSG’s problem definition.
  • Frequency of each subgraph = the number of supporting transactions.
    • Does not matter how many embeddings are in each transaction.
What Is a Reasonable Frequency Definition?
  • Two reasonable choices:
    • The frequency is determined by the total number of embeddings.
      • Not downward closed.
      • Too many patterns.
      • Artificially high frequency of certain patterns.
    • The frequency is determined by the number of edge-disjoint embeddings (Vanetik et al., ICDM 2002).
      • Downward closed.
      • Since each occurrence uses a different set of edges, occurrence frequencies are bounded.
      • Solved by finding the maximum independent set (MIS) of the embedding overlap graph (see the sketch below).
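A minimal sketch of the overlap test behind that construction, assuming each embedding is stored as a list of input-graph edge ids (representation and names are illustrative assumptions, not from the SIGRAM source):

/* Two embeddings overlap iff they share at least one input-graph edge;
   connecting overlapping embeddings yields the overlap graph, and MIS
   on that graph gives the edge-disjoint frequency. */
int embeddings_overlap(const int *e1, int n1, const int *e2, int n2)
{
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            if (e1[i] == e2[j])
                return 1;   /* shared edge id -> overlap */
    return 0;
}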
Embedding Overlap and MIS

  • Edge-disjoint embeddings: { E1, E2, E3 } and { E1, E2, E4 }
  • Create an overlap graph and solve MIS
    • Vertex ↔ Embedding
    • Edge ↔ Overlap

[Figure: overlap graph on four embeddings E1–E4.]

OK. Definition is Fine, but …
  • MIS-based frequency seems reasonable.
  • Next question: How to develop mining algorithms for the single graph setting.
How to Handle Single Graph Setting?
  • Issue 1: Frequency counting
    • Exact MIS is often intractable.
  • Issue 2: Choice of search scheme
    • Horizontal (breadth-first)
    • Vertical (depth-first)
Issue 1: MIS-Based Frequency
  • We also considered approximate (greedy) and upper bound MIS.
    • Approximate MIS may underestimate the frequency.
    • Upper bound MIS may overestimate the frequency.
  • MIS is NP-complete and hard to approximate.
    • In practice, a simple greedy scheme works quite well (a minimal sketch follows this list).
      • Halldórsson and Radhakrishnan. Greed is good, 1997.
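A sketch of such a greedy minimum-degree heuristic in C, on an adjacency-matrix overlap graph (the representation and names are illustrative assumptions, not the SIGRAM implementation):

#include <string.h>

#define MAXV 64

/* adj[i][j] != 0 iff embeddings i and j overlap.  Repeatedly take a
   minimum-degree vertex into the independent set, then delete it and
   its neighbours; returns a lower bound on the exact MIS size. */
int greedy_mis(int n, const char adj[MAXV][MAXV], int *in_set)
{
    char alive[MAXV];
    int count = 0;
    memset(alive, 1, n);
    memset(in_set, 0, n * sizeof(int));
    for (;;) {
        int best = -1, best_deg = n + 1;
        for (int i = 0; i < n; i++) {
            if (!alive[i]) continue;
            int deg = 0;
            for (int j = 0; j < n; j++)
                if (alive[j] && adj[i][j]) deg++;
            if (deg < best_deg) { best_deg = deg; best = i; }
        }
        if (best < 0) break;            /* no vertices left */
        in_set[best] = 1; count++;
        alive[best] = 0;
        for (int j = 0; j < n; j++)     /* remove its neighbours */
            if (adj[best][j]) alive[j] = 0;
    }
    return count;
}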
Issue 2: Search Scheme
  • Frequent subgraph mining = exploration of the lattice of subgraphs
  • Horizontal
    • Level-wise
    • Candidate generation and pruning
      • Joining
      • Downward closure property
    • Frequency counting
  • Vertical
    • Traverse the lattice as if it were a tree.
hSIGRAM: Horizontal Method
  • Natural extension of FSG to the single graph setting.
  • Candidate generation and pruning.
    • Downward closure property ⇒ tighter pruning than the vertical method
  • Two-phase frequency counting
    • All embeddings by subgraph isomorphism
      • Anchor edge list intersection, instead of TID list intersection.
      • Localize subgraph isomorphism
    • MIS for the embeddings
      • Approximate and upper bound MIS give subset and superset respectively.
TID List Recap

[Figure: lattice of subgraphs over transactions T1, T2, T3; a size-(k+1) candidate and its three size-k subgraphs, whose TID lists are shown below.]

  • TID lists of the size-k subgraphs: { T1, T3 }, { T1, T2, T3 }, { T1, T2, T3 }
  • TID(size-(k+1) candidate) ⊆ { T1, T3 } ∩ { T1, T2, T3 } ∩ { T1, T2, T3 } = { T1, T3 }
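Assuming TID lists are kept as sorted integer arrays, the intersection above is a linear merge; a minimal C sketch (illustrative, not the actual hSIGRAM/FSG code):

/* Intersect two sorted TID lists into out; returns its length.
   The candidate's TID list is a subset of this intersection, so the
   result bounds (and prunes) frequency counting. */
int tid_intersect(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j])      i++;
        else if (a[i] > b[j]) j++;
        else { out[n++] = a[i]; i++; j++; }
    }
    return n;
}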

Anchor Edges

[Figure: lattice of subgraphs, sizes k and k+1.]

  • The embeddings of a candidate's subgraphs must appear close together in the input graph.
  • Keep one anchor edge for each embedding.
    • Storing complete embeddings requires too much memory.
    • Localize subgraph isomorphism.
Treefication

  • Each node in the search space is a subgraph.
  • Based on the subgraph/supergraph relation
  • Avoid visiting the same node in the lattice more than once.

[Figure: a lattice of subgraphs (sizes k-1, k, k+1) and its treefied version.]