Steve Reinhardt, Interactive Supercomputing sreinhardt@interactivesupercomputing


### A Multi-Level Parallel Implementation of a Program for Finding Frequent Patterns in a Large Sparse Graph

Steve Reinhardt, Interactive Supercomputing sreinhardt@interactivesupercomputing.com

George Karypis, Dept. of Computer Science, University of Minnesota

Outline

- Problem definition
- Prior work
- Problem and Approach
- Results
- Issues and Conclusions

Graph Datasets

- Flexible and powerful representation
- Evidence extraction and link discovery (EELD)
- Social Networks/Web graphs
- Chemical compounds
- Protein structures
- Biological Pathways
- Object recognition and retrieval
- Multi-relational datasets

M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. In SIAM International Conference on Data Mining (SDM-04), 2004.

http://citeseer.ist.psu.edu/article/kuramochi04finding.html

Finding Patterns in Graphs: Many Dimensions

- Structure of the graph dataset
  - many small graphs
    - graph transaction setting
  - one large graph
    - single-graph setting
- Type of patterns
  - connected subgraphs
  - induced subgraphs
- Nature of the algorithm
  - Finds all patterns that satisfy the minimum support requirement
    - Complete
  - Finds some of the patterns
    - Incomplete
- Nature of the pattern’s occurrence
  - The pattern occurs exactly in the input graph
    - Exact algorithms
  - There is a sufficiently similar embedding of the pattern in the graph
    - Inexact algorithms

- MIS calculation for frequency
  - exact
  - approximate
  - upper bound
- Algorithm
  - vertical (depth-first)
  - horizontal (breadth-first)

[Figure: an input graph and example patterns (labels: Size 7; Frequency = 1; Frequency = 6)]

Single Graph Setting

- Find all frequent subgraphs from a single sparse graph.
- Choice of frequency definition

vSIGRAM: Vertical Solution

- Candidate generation by extension
- Add one more edge to a current embedding.
- Solve MIS on embeddings in the same equivalence class.
- No downward-closure-based pruning
- Two important components
- Frequency-based pruning of extensions
- Treefication based on canonical labeling
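The extension step can be illustrated with a minimal C sketch. It assumes the input graph is stored as a plain edge list and an embedding as an array of edge ids (a hypothetical representation, not SIGRAM's own), ignores vertex and edge labels, and lists every edge that is not yet used and shares an endpoint with the embedding, so each extended embedding stays connected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge-list representation of the input graph G. */
typedef struct { int u, v; } Edge;

/* True if edge id e is already used by the embedding emb[0..k). */
static bool in_embedding(const int *emb, size_t k, int e)
{
    for (size_t i = 0; i < k; i++)
        if (emb[i] == e)
            return true;
    return false;
}

/* True if edge e shares an endpoint with some edge of the embedding. */
static bool touches_embedding(const Edge *g, const int *emb, size_t k, int e)
{
    for (size_t i = 0; i < k; i++) {
        const Edge *a = &g[emb[i]];
        if (a->u == g[e].u || a->u == g[e].v ||
            a->v == g[e].u || a->v == g[e].v)
            return true;
    }
    return false;
}

/* Writes the ids of all one-edge extensions of the embedding (emb, k)
 * into out[] (room for m entries) and returns how many were found. */
size_t one_edge_extensions(const Edge *g, size_t m,
                           const int *emb, size_t k, int *out)
{
    size_t n = 0;
    assert(k > 0);
    for (size_t e = 0; e < m; e++) {
        if (!in_embedding(emb, k, (int)e) &&
            touches_embedding(g, emb, k, (int)e))
            out[n++] = (int)e;
    }
    return n;
}
```

Grouping the resulting extensions into isomorphism classes and solving MIS per class, as described above, would sit on top of this enumeration.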

vSIGRAM: Connection Table

- Frequency-based pruning.
  - Trying every possible extension is expensive and inefficient.
  - A particular extension might have been tested before.
  - Categorize extensions into equivalence classes (in terms of isomorphism), and record whether each class is frequent or not.
  - If a class becomes infrequent, never try it in later exploration.
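A connection table along these lines can be sketched as a small lookup array. This is an illustration under assumptions: an extension class is keyed only by (label of the existing vertex, edge label, label of the new vertex) over small label alphabets, and the names `ConnTable`, `ct_should_try`, and `ct_record` are hypothetical. Once a class is marked infrequent, the mark is never cleared, matching the rule above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical label alphabet sizes; large alphabets would need a hash table. */
#define NVLABEL 8
#define NELABEL 4

typedef enum { CT_UNKNOWN = 0, CT_FREQUENT, CT_INFREQUENT } CtStatus;

/* One status slot per extension class:
 * (existing-vertex label, edge label, new-vertex label). */
typedef struct {
    unsigned char status[NVLABEL][NELABEL][NVLABEL];
} ConnTable;

static int valid_key(int vl, int el, int wl)
{
    return vl >= 0 && vl < NVLABEL && el >= 0 && el < NELABEL &&
           wl >= 0 && wl < NVLABEL;
}

void ct_init(ConnTable *ct)
{
    memset(ct->status, CT_UNKNOWN, sizeof ct->status);
}

/* An extension is worth testing unless its class is known to be infrequent. */
int ct_should_try(const ConnTable *ct, int vl, int el, int wl)
{
    assert(valid_key(vl, el, wl));
    return ct->status[vl][el][wl] != CT_INFREQUENT;
}

/* Record the outcome of a frequency test for an extension class. */
void ct_record(ConnTable *ct, int vl, int el, int wl, int is_frequent)
{
    assert(valid_key(vl, el, wl));
    if (ct->status[vl][el][wl] == CT_INFREQUENT)
        return;  /* once infrequent, a class is never tried again */
    ct->status[vl][el][wl] = is_frequent ? CT_FREQUENT : CT_INFREQUENT;
}
```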

Parallelization

- Two clear sources of parallelism in the algorithm
  - Amount of parallelism from each source not known in advance
- The code is typical C code
  - structs, pointers, frequent mallocs/frees of small areas, etc.
  - nothing like the “Fortran”-like (dense linear algebra) examples shown for many parallel programming methods
- Parallel structures need to accommodate dynamic parallelism
  - Dynamic specification of parallel work
  - Dynamic allocation of processors to work
- Chose OpenMP taskq/task constructs
  - Proposed extensions to the OpenMP standard
  - Support parallel work being defined in multiple places in a program, while being placed on a single conceptual queue and executed accordingly
  - ~20 lines of code changes in a ~15,000-line program
- Electric Fence was very useful in finding coding errors

Algorithmic Parallelism

    vSiGraM(G, MIS_type, f)
    1.  𝓕 ← ∅
    2.  𝓕1 ← all frequent size-1 subgraphs in G
    3.  for each F1 in 𝓕1 do
    4.      M(F1) ← all embeddings of F1
    5.  for each F1 in 𝓕1 do                      // high-level parallelism
    6.      𝓕 ← 𝓕 ∪ vSiGraM-Extend(F1, G, f)
    7.  return 𝓕

    vSiGraM-Extend(Fk, G, f)
    1.  𝓕 ← ∅
    2.  for each embedding m in M(Fk) do          // low-level parallelism
    3.      𝓒k+1 ← 𝓒k+1 ∪ {all (k+1)-subgraphs of G containing m}
    4.  for each Ck+1 in 𝓒k+1 do
    5.      if Fk is not the generating parent of Ck+1 then
    6.          continue
    7.      compute Ck+1.freq from M(Ck+1)
    8.      if Ck+1.freq < f then
    9.          continue
    10.     𝓕 ← 𝓕 ∪ vSiGraM-Extend(Ck+1, G, f)
    11. return 𝓕

Simple Taskq/Task Example

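    #include <stdio.h>

    int fib(int n)
    {
        int partret[2];
        int i;

        if (n > 2) {
            /* The taskq block waits for all of its enqueued tasks. */
            #pragma intel omp taskq
            {
                for (i = n - 2; i < n; i++) {
                    /* One task per recursive call; i is copied when the task is enqueued. */
                    #pragma intel omp task captureprivate(i)
                    partret[i - (n - 2)] = fib(i);
                }
            }
            return partret[0] + partret[1];
        } else {
            return 1;
        }
    }

    int main(void)
    {
        int val;

        /* Create the thread team and the root task queue. */
        #pragma intel omp parallel taskq
        {
            val = fib(30);   /* naive recursion: work grows exponentially with n */
        }
        printf("%d\n", val);
        return 0;
    }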
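OpenMP 3.0 later standardized a related tasking model through the `task` and `taskwait` constructs. The sketch below is a hedged translation of the same recursion into that standard form, not the code used in this work: `firstprivate(i, n)` stands in for `captureprivate(i)`, and `taskwait` plays the role of the wait at the end of a taskq block. The names `fib_task` and `fib_parallel` are hypothetical. Compiled without OpenMP, the pragmas are ignored and the code runs serially.

```c
#include <assert.h>

/* Same recursion as the slide: fib(n) = 1 for n <= 2. */
long fib_task(int n)
{
    long partret[2];
    if (n <= 2)
        return 1;
    /* One task per recursive call; each task writes its own slot of the
     * shared array, and taskwait joins both before the sum. */
    for (int i = n - 2; i < n; i++) {
        #pragma omp task shared(partret) firstprivate(i, n)
        partret[i - (n - 2)] = fib_task(i);
    }
    #pragma omp taskwait
    return partret[0] + partret[1];
}

/* Create a thread team and let a single thread start the recursion;
 * the other threads pick up the tasks it generates. */
long fib_parallel(int n)
{
    long val = 0;
    assert(n >= 0);
    #pragma omp parallel
    #pragma omp single
    val = fib_task(n);
    return val;
}
```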

High-Level Parallelism with taskq/task

    // At the bottom of expand_subgraph, after all child
    // subgraphs have been identified, start them all.
    #pragma intel omp taskq
    for (ii = 0; ii < sg_set_size(child); ii++) {
        #pragma intel omp task captureprivate(ii)
        {
            SubGraph *csg = sg_set_at(child, ii);
            expand_subgraph(csg, csg->ct, lg, ls, o);
        } // end-task
    }
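The same start-one-task-per-child pattern can be sketched with standard OpenMP tasks on a toy tree. `TreeNode`, `expand`, and `count_nodes` are hypothetical stand-ins for the child subgraph set and `expand_subgraph`; the shared counter only demonstrates that every node is expanded exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pattern and its child patterns. */
typedef struct TreeNode {
    int nchild;
    struct TreeNode **child;
} TreeNode;

/* Stand-in for expand_subgraph: count this node, then start one task per child. */
static void expand(const TreeNode *node, long *visited)
{
    #pragma omp atomic
    (*visited)++;

    for (int ii = 0; ii < node->nchild; ii++) {
        /* firstprivate(ii) plays the role of captureprivate(ii):
         * each task sees the value of ii at the moment it was created. */
        #pragma omp task firstprivate(ii)
        {
            const TreeNode *c = node->child[ii];
            expand(c, visited);
        }
    }
    #pragma omp taskwait   /* like the end of a taskq block */
}

/* Expands the whole tree in parallel and returns the number of nodes visited. */
long count_nodes(const TreeNode *root)
{
    long visited = 0;
    assert(root != NULL);
    #pragma omp parallel
    #pragma omp single
    expand(root, &visited);
    return visited;
}
```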

Low-Level Parallelism with taskq/task

    #pragma omp parallel shared(nt, priv_es)
    {
        #pragma omp master
        {
            nt = omp_get_num_threads();  // #threads in parallel region
            priv_es = (ExtensionSet **)kmp_calloc(nt, sizeof(ExtensionSet *));
        }
        #pragma omp barrier

        #pragma intel omp taskq
        {
            for (i = 0; i < sg_vmap_size(sg); i++) {
                #pragma intel omp task captureprivate(i)
                {
                    int th = omp_get_thread_num();
                    if (priv_es[th] == NULL) {
                        priv_es[th] = exset_init(128);
                    }
                    expand_map(sg, ct, ams, i, priv_es[th], lg);
                }
            }
        }
    } // end parallel section; next loop is serial reduction

    for (i = 0; i < nt; i++) {
        if (priv_es[i] != NULL) {
            exset_merge(priv_es[i], es);
        }
    }

    kmp_free(priv_es);
    } // end of enclosing function

Implementation due to Grant Haab and colleagues from the Intel OpenMP library group
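The pattern of lazily created per-thread private sets followed by a serial merge can be shown in isolation with a label histogram. This sketch assumes standard OpenMP only: a worksharing `for` loop replaces the Intel taskq/task pair, `single` (with its implicit barrier) replaces `master` plus `barrier`, and plain `calloc`/`free` replace `kmp_calloc`/`kmp_free`. `count_labels` and `NLABEL` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define NLABEL 4  /* hypothetical number of distinct labels */

/* Counts how often each label occurs in items[0..n). */
void count_labels(const int *items, int n, long total[NLABEL])
{
    int nt = 1;
    long **priv = NULL;

    #pragma omp parallel shared(nt, priv)
    {
        /* One thread sizes and allocates the per-thread slots; the
         * implicit barrier at the end of single publishes them. */
        #pragma omp single
        {
            nt = omp_get_num_threads();
            priv = calloc((size_t)nt, sizeof *priv);
            assert(priv != NULL);
        }

        #pragma omp for
        for (int i = 0; i < n; i++) {
            int th = omp_get_thread_num();
            assert(items[i] >= 0 && items[i] < NLABEL);
            if (priv[th] == NULL) {      /* lazy init, like exset_init */
                priv[th] = calloc(NLABEL, sizeof **priv);
                assert(priv[th] != NULL);
            }
            priv[th][items[i]]++;
        }
    }

    /* Serial reduction after the parallel region, like exset_merge. */
    for (int l = 0; l < NLABEL; l++)
        total[l] = 0;
    for (int t = 0; t < nt; t++) {
        if (priv[t] != NULL) {
            for (int l = 0; l < NLABEL; l++)
                total[l] += priv[t][l];
            free(priv[t]);
        }
    }
    free(priv);
}
```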

Experimental Results

- SGI Altix™ 32 Itanium2™ sockets (64 cores), 1.6GHz
- 64 GBytes (though not memory limited)
- Linux
- No special dplace/cpuset configuration
- Minimum frequencies chosen to illuminate scaling behavior, not provide maximum performance

Performance of High-level Parallelism

- When there is a sufficient quantity of work (i.e., the frequency threshold is low enough):
  - Good speed-ups to 16P
  - Reasonable speed-ups to 30P
  - Little or no benefit above 30P
- No insight into the performance plateau

Poor Performance of Low-level Parallelism

- Several possible effects ruled out
  - Granularity of data allocation
  - Barrier before master-only reduction
- Source: highly variable times for register_extension
  - ~100X slower in parallel than serial, …
  - but different instances from execution to execution
- Apparently due to highly variable run-times for malloc
  - Not understood

Issues and Conclusions

- OpenMP taskq/task were straightforward to use in this program and implemented the desired model
- Performance was good to a medium range of processor counts (best 26X on 30P)
- Difficult to gain insight into the lack of performance
  - High-level parallelism at 30P and above
  - Low-level parallelism
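For reference, the usual definitions of speedup and parallel efficiency, applied to the best reported result; this assumes the baseline $T_1$ is the single-processor run time, which the slides do not state explicitly:

$$
S_P = \frac{T_1}{T_P}, \qquad E_P = \frac{S_P}{P}, \qquad E_{30} = \frac{26}{30} \approx 0.87
$$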

Aviation Dataset

- Generally, vSIGRAM is 2-5 times faster than hSIGRAM (with exact and upper bound MIS)
- Largest pattern contained 13 edges.

Citation Dataset

- However, hSIGRAM can be more efficient, especially with upper bound MIS (ub).
- Largest pattern contained 16 edges.

VLSI Dataset

- Exact MIS never finished.
- Longest pattern contained 5 edges (constraint).

Comparison with SUBDUE

- Similar results with SEuS

Summary

- With approximate and exact MIS, vSIGRAM is 2-5 times faster than hSIGRAM.
- With upper bound MIS, however, hSIGRAM can prune a larger number of infrequent patterns.
  - The downward closure property accounts for this.
- For some datasets, using exact MIS for frequency counting is simply intractable.
- Compared to SUBDUE, SIGRAM finds more and longer patterns in less runtime.

Thank You!

- A slightly longer version of this paper is also available as a technical report.
- SIGRAM executables will be available for download soon from http://www.cs.umn.edu/~karypis/pafi/

Complete Frequent Subgraph Mining: Existing Work So Far

- Input: A set of graphs (transactions) + support threshold
- Goal: Find all frequently occurring subgraphs in the input dataset.
- AGM (Inokuchi et al., 2000), vertex-based, may not be connected.
- FSG (Kuramochi et al., 2001), edge-based, only connected subgraphs
- AcGM (Inokuchi et al., 2002), gSpan (Yan & Han, 2002), FFSM (Huan et al., 2003), etc. follow FSG’s problem definition.
- Frequency of each subgraph
  - The number of supporting transactions.
  - It does not matter how many embeddings are in each transaction.

What is a reasonable frequency definition?

- Two reasonable choices:
  - The frequency is determined by the total number of embeddings.
    - Not downward closed.
    - Too many patterns.
    - Artificially high frequency of certain patterns.
  - The frequency is determined by the number of edge-disjoint embeddings (Vanetik et al., ICDM 2002).
    - Downward closed.
    - Since each occurrence uses a different set of edges, occurrence frequencies are bounded.
    - Solved by finding the maximum independent set (MIS) of the embedding overlap graph.
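The edge-disjoint frequency can be sketched by building the overlap graph and computing its MIS exactly. This assumes each embedding is given as a 64-bit mask of the input-graph edge ids it uses, and it finds the MIS by exhaustive search over subsets, which is exponential and only workable for a handful of embeddings. `build_overlap` and `mis_frequency_exact` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the overlap graph as adjacency bitmasks: bit j of adj[i] is set
 * when embeddings i and j share at least one edge. */
static void build_overlap(const uint64_t *emb, int n, uint32_t *adj)
{
    for (int i = 0; i < n; i++) {
        adj[i] = 0;
        for (int j = 0; j < n; j++)
            if (i != j && (emb[i] & emb[j]) != 0)
                adj[i] |= (uint32_t)1 << j;
    }
}

/* Exact MIS size of the overlap graph, i.e. the largest number of
 * pairwise edge-disjoint embeddings, by trying all 2^n subsets. */
int mis_frequency_exact(const uint64_t *emb, int n)
{
    uint32_t adj[20];
    int best = 0;
    assert(n >= 0 && n <= 20);
    build_overlap(emb, n, adj);
    for (uint32_t s = 0; s < ((uint32_t)1 << n); s++) {
        int ok = 1, size = 0;
        for (int i = 0; i < n && ok; i++) {
            if (s & ((uint32_t)1 << i)) {
                size++;
                if (adj[i] & s)   /* i overlaps another chosen embedding */
                    ok = 0;
            }
        }
        if (ok && size > best)
            best = size;
    }
    return best;
}
```

For example, with four hypothetical embeddings where only the third and fourth share an edge, the total embedding count is 4 but the edge-disjoint frequency is 3.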

Edge-disjoint embeddings

- Create an overlap graph and solve MIS

[Figure: vertex embeddings E1 to E4 of a pattern, their edge overlap, and the embedding overlap graph with MIS solutions { E1, E2, E3 } and { E1, E2, E4 }]

OK. Definition is Fine, but …

- MIS-based frequency seems reasonable.
- Next question: How to develop mining algorithms for the single graph setting.

How to Handle Single Graph Setting?

- Issue 1: Frequency counting
- Exact MIS is often intractable.
- Issue 2: Choice of search scheme
- Horizontal (breadth-first)
- Vertical (depth-first)

Issue 1: MIS-Based Frequency

- We also considered approximate (greedy) and upper bound MIS.
  - Approximate MIS may underestimate the frequency.
  - Upper bound MIS may overestimate the frequency.
- MIS is NP-complete and hard to approximate.
  - In practice, a simple greedy scheme works well.
  - Halldórsson and Radhakrishnan. Greed is good, 1997.
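A greedy approximation can be sketched as follows. We assume the minimum-degree rule (repeatedly keep the embedding that overlaps the fewest remaining ones, then discard everything it overlaps); the slides do not specify which greedy variant SIGRAM uses, and `mis_frequency_greedy` is a hypothetical name. Because the result is always an independent set, it can only underestimate the exact MIS, consistent with the note above.

```c
#include <assert.h>
#include <stdint.h>

/* Counts set bits without compiler builtins. */
static int popcount32(uint32_t x)
{
    int c = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Greedy MIS on an overlap graph with n <= 32 vertices given as adjacency
 * bitmasks (bit j of adj[i] set when embeddings i and j overlap; no self bits). */
int mis_frequency_greedy(const uint32_t *adj, int n)
{
    assert(n >= 0 && n <= 32);
    uint32_t alive = (n == 32) ? UINT32_MAX : (((uint32_t)1 << n) - 1);
    int size = 0;

    while (alive != 0) {
        int pick = -1, best_deg = 33;
        for (int i = 0; i < n; i++) {
            if (alive & ((uint32_t)1 << i)) {
                int d = popcount32(adj[i] & alive);
                if (d < best_deg) {
                    best_deg = d;
                    pick = i;
                }
            }
        }
        /* Keep pick; drop it and every embedding that overlaps it. */
        alive &= ~(adj[pick] | ((uint32_t)1 << pick));
        size++;
    }
    return size;
}
```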

Issue 2: Search Scheme

- Frequent subgraph mining: exploration in the lattice of subgraphs
- Horizontal
  - Level-wise
  - Candidate generation and pruning
    - Joining
    - Downward closure property
  - Frequency counting
- Vertical
  - Traverse the lattice as if it were a tree.

hSIGRAM: Horizontal Method

- Natural extension of FSG to the single graph setting.
- Candidate generation and pruning.
  - Downward closure property
  - Tighter pruning than the vertical method
- Two-phase frequency counting
  - All embeddings by subgraph isomorphism
    - Anchor edge list intersection, instead of TID list intersection.
    - Localize subgraph isomorphism
  - MIS for the embeddings
    - Approximate and upper bound MIS give a subset and a superset, respectively.

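TID List Recap

[Figure: transactions T1, T2, T3 with a size k + 1 candidate and its size-k subgraphs]

- TID lists of the size-k subgraphs: { T1, T3 }, { T1, T2, T3 }, { T1, T2, T3 }
- TID(size k + 1 candidate) ⊆ TID(subgraph 1) ∩ TID(subgraph 2) ∩ TID(subgraph 3) = { T1, T3 }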
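TID list intersection is a linear merge of sorted lists. A minimal sketch, with transaction ids stored as sorted `int` arrays and a hypothetical `tid_intersect` helper: the size of the intersection bounds the candidate's support from above, so a candidate can be pruned when it falls below the threshold, and subgraph isomorphism only needs to be checked in the remaining transactions.

```c
#include <assert.h>
#include <stddef.h>

/* Intersects two sorted transaction-id lists a[0..na) and b[0..nb)
 * into out[], returning the number of common ids. Runs in O(na + nb). */
size_t tid_intersect(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {            /* id present in both lists */
            out[n++] = a[i];
            i++;
            j++;
        }
    }
    return n;
}
```

Intersecting { T1, T3 } with { T1, T2, T3 } and then again with { T1, T2, T3 } leaves { T1, T3 }, as in the recap.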

Anchor Edges

- Each subgraph must appear close enough together.
- Keep one edge (the anchor edge) for each embedding.
  - Complete embeddings require too much memory.
- Localize subgraph isomorphism.

Treefication

- Each node in the search space is a subgraph.
- Based on the subgraph/supergraph relation
- Avoid visiting the same node in the lattice more than once.

[Figure: the lattice of subgraphs (sizes k - 1, k, and k + 1) and the corresponding treefied lattice]
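The effect of treefication can be illustrated on the simplest lattice, the lattice of subsets. This is a simplified analogue, not SIGRAM's canonical-labeling scheme: every non-empty subset gets exactly one parent (itself without its largest element), so a depth-first walk extends a node only with larger elements and reaches each of the 2^n subsets exactly once. `walk` and `treefied_walk` are hypothetical names.

```c
#include <assert.h>

/* Visits mask, then extends it only with elements >= next, so the only
 * path to any subset adds its elements in increasing order. */
static void walk(unsigned mask, int next, int n, int *visits)
{
    visits[mask]++;
    for (int e = next; e < n; e++)
        walk(mask | (1u << e), e + 1, n, visits);
}

/* Walks the treefied subset lattice of {0, ..., n-1} from the empty set.
 * visits must have room for 2^n counters; returns the total visit count. */
int treefied_walk(int n, int *visits)
{
    int total = 0;
    assert(n >= 0 && n <= 16);
    for (unsigned m = 0; m < (1u << n); m++)
        visits[m] = 0;
    walk(0u, 0, n, visits);
    for (unsigned m = 0; m < (1u << n); m++)
        total += visits[m];
    return total;
}
```

Without the "only larger elements" rule, a subset of size k would be reached once per ordering of its elements, which is exactly the repeated work that treefication avoids.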
