On the use of Graph Search Techniques for the Analysis of Extreme-scale Combustion Simulation Data

Peer-Timo Bremer, Jacqueline Chen, Hemanth Kolla, Janine Bennett, William McLendon III, Guarav Bansal.


Presentation Transcript


  1. On the use of Graph Search Techniques for the Analysis of Extreme-scale Combustion Simulation Data
     Peer-Timo Bremer (3), Jacqueline Chen (1), Hemanth Kolla (1), Janine Bennett (1), William McLendon III (1), Guarav Bansal (2)
     (1) Sandia National Laboratories, (2) Intel, (3) Lawrence Livermore National Laboratory
     Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
     Approved for Unlimited Unclassified Release, SAND # 2012-9242 C

  2. HPC resources generate large, complex, multivariate data sets
     Recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories
     Details: Lifted Ethylene Jet
     • 1.3 billion grid points
     • 22 chemical species, vector, & particle data
     • 7.5 million CPU hours on 30,000 processors
     • 112,500 time steps (data stored every 375th)
     • 240 TB of raw field data + 50 TB particle data
     Efficiently characterizing & tracking intermittent features defined by multiple variables poses significant research challenges!
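To make the scale of these figures concrete, the short calculation below derives the number of stored snapshots and the approximate field-data volume per snapshot from the numbers quoted on the slide. It assumes decimal terabytes (1 TB = 10^12 bytes) and that the 240 TB is spread evenly over the stored snapshots; neither assumption is stated in the slides.

```python
# Figures quoted on the slide for the lifted ethylene jet run.
total_time_steps = 112_500
store_every = 375          # data stored every 375th step
raw_field_tb = 240
grid_points = 1.3e9

# Number of snapshots actually written to disk.
stored_snapshots = total_time_steps // store_every

# Assumption: field data is spread evenly over the stored snapshots.
field_tb_per_snapshot = raw_field_tb / stored_snapshots

# Assumption: decimal terabytes (1 TB = 1e12 bytes).
bytes_per_grid_point = field_tb_per_snapshot * 1e12 / grid_points

print("stored snapshots:", stored_snapshots)
print("field data per snapshot (TB):", field_tb_per_snapshot)
print("bytes per grid point per snapshot:", round(bytes_per_grid_point))
```

Under these assumptions each stored snapshot carries several hundred bytes per grid point, which illustrates why scanning the raw fields for every query is costly.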

  3. Our contribution: a framework for characterizing complex events in large-scale multivariate data
     • Introduce attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
       • Defined by multiple variables
       • Spanning an arbitrary number of time steps
       • Representation achieves drastic data reductions
     • Provide a mechanism for querying ARGs
       • Identify events conditioned on a variety of metrics
     • Demonstrate results on large-scale combustion simulation data

  4. Related work
     • Topology: Segment domain into features according to function behavior
       • Level-set behavior: Reeb graph, contour tree, and variants [Carr et al. 2003, Pascucci et al. 2007, Mascarenhas et al. 2006, van Kreveld et al. 2004]
       • Gradient behavior: Morse and Morse-Smale complex [Edelsbrunner 2003, Gyulassy et al. 2007, 2008, Gunther et al. 2011]
     • Multivariate feature analysis: Many correlation-based feature definitions [Gosink et al. 2007, Chen et al. 2011, Jaenicke et al. 2007, Sauber et al. 2006, Schneider et al. 2008, Bennett et al. 2011]
     • Feature tracking graphs: Capture spatial-temporal relationships [Edelsbrunner et al. 2004, Bremer et al. 2010, Muelder et al. 2009, Widanagamaachchi et al. 2012]
     • Graph search algorithms: Identify patterns in large-scale graphs [Barret et al. 2007, Berry et al. 2007, Gregor et al. 2005, Siek et al. 2002] (MTGL)

  5. What is an attributed relational graph (ARG)?
     • ARG nodes correspond to spatial features
     • Each ARG node encodes
       • Feature type
       • Time step
       • Optional per-feature statistics
     • ARG edges encode relationships between features
       • Spatial overlap metric
       • Supports feature tracking over time
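One way to picture the structure described on this slide is as a small container of attributed nodes and overlap-labelled edges. The sketch below is illustrative only: the class names `ArgNode` and `Arg`, the method names, and the feature-type strings are hypothetical and are not taken from the authors' software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ArgNode:
    feature_type: str   # which feature family the node belongs to
    time_step: int      # time step at which the feature exists
    # Optional per-feature statistics (e.g. size, mean); empty if unused.
    stats: Dict[str, float] = field(default_factory=dict)


@dataclass
class Arg:
    nodes: Dict[int, ArgNode] = field(default_factory=dict)
    # Directed edge (u, v) -> spatial overlap between the two features.
    edges: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def add_node(self, node_id: int, node: ArgNode) -> None:
        self.nodes[node_id] = node

    def add_edge(self, u: int, v: int, overlap: int) -> None:
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges[(u, v)] = overlap

    def successors(self, u: int) -> List[int]:
        return [b for (a, b) in self.edges if a == u]


# Hypothetical example: one feature tracked to the next time step and
# overlapping a feature of a different type at the same time step.
arg = Arg()
arg.add_node(0, ArgNode("reaction_rate_OH", 1, {"size": 120.0}))
arg.add_node(1, ArgNode("reaction_rate_OH", 2))
arg.add_node(2, ArgNode("diffusion_OH", 1))
arg.add_edge(0, 1, 25)
arg.add_edge(0, 2, 11)

# Successors at a later time step correspond to tracking over time.
tracked = [v for v in arg.successors(0)
           if arg.nodes[v].time_step > arg.nodes[0].time_step]
print(tracked)
```

Storing only type, time step, optional statistics, and overlap values is what keeps such a graph small relative to the field data it summarizes.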

  6. ARG Nodes: Segment domain into relevant features
     • Many options for segmenting the domain into features
     • Often features of interest are defined by a threshold around minima or maxima of a particular variable
     [Figure: curve f plotted on x-y axes]

  7-12. ARG Nodes: Merge trees encode features of interest defined by a single variable for a range of thresholds
     Tree encodes behavior as a sweep of function values is performed from maximum to minimum of the range of interest
     [Figure: curve f plotted on x-y axes]

  13. ARG Nodes: Refine the tree to increase granularity of possible segmentations
      [Figure: curve f plotted on x-y axes]

  14. ARG Nodes: Features are defined as all sub-trees above a user-specified threshold
      [Figures: two plots of f on x-y axes]
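The sweep from maximum to minimum and the threshold-based feature definition can be illustrated on a 1D sampled function. The sketch below is a simplified stand-in, not the authors' merge tree implementation: it uses a union-find sweep to return the connected regions at or above a threshold, each labelled by the index of its highest sample. The function name `superlevel_features` and the sample data are hypothetical.

```python
def superlevel_features(values, threshold):
    """Return {index_of_maximum: sorted member indices} for each connected
    region of a 1D grid where values[i] >= threshold."""
    parent = {}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit samples from the highest value downward (the sweep).
    order = sorted(range(len(values)), key=lambda i: -values[i])
    for i in order:
        if values[i] < threshold:
            break  # everything below the threshold is outside all features
        parent[i] = i  # a new region is born at this sample
        for j in (i - 1, i + 1):
            if j in parent:  # neighbor already swept: regions merge
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the root at the region with the higher maximum.
                    if values[ri] < values[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri

    features = {}
    for i in parent:
        features.setdefault(find(i), []).append(i)
    return {root: sorted(members) for root, members in features.items()}


samples = [0, 3, 1, 4, 2, 0, 5, 1]
high = superlevel_features(samples, threshold=2)
low = superlevel_features(samples, threshold=1)
print(high)
print(low)
```

Lowering the threshold from 2 to 1 merges the regions around indices 1 and 3 into a single feature, which is the kind of event a merge tree records as two branches joining.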

  15. ARG Edges: An overlap-based metric is used to encode feature behavior over time
      [Figure: f over x at time steps t = 1 through t = 4]

  16. ARG Edges: The same metric is used to encode relationships between different types of features
      [Figure: time steps t = 1 through t = 4]

  17. ARG Edges: Relationships can span multiple time steps
      [Figure: time steps t = 1 through t = 4]

  18. ARG Edges: Edge labels indicate degree of overlap between associated features
      [Figure: example edge labels 25 and 11]
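A simple reading of an overlap-based edge label is the number of grid points that two features share. The sketch below computes such counts from two label arrays defined over the same grid, for example the same feature type at consecutive time steps or two different feature types at one time step. The function name `overlap_edges`, the use of 0 as a background label, and the sample data are assumptions for illustration; the slides do not specify the exact metric.

```python
from collections import Counter


def overlap_edges(labels_a, labels_b, background=0):
    """Count grid points shared by each (feature in a, feature in b) pair.

    labels_a and labels_b assign a feature id to every grid point;
    `background` marks points that belong to no feature.
    """
    counts = Counter()
    for a, b in zip(labels_a, labels_b):
        if a != background and b != background:
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical segmentations of an 8-point grid at two time steps.
labels_t1 = [0, 1, 1, 1, 0, 2, 2, 0]
labels_t2 = [0, 0, 3, 3, 3, 3, 0, 0]

edges = overlap_edges(labels_t1, labels_t2)
print(edges)
```

Here feature 3 at the later time step overlaps both feature 1 and feature 2 from the earlier one, so the two resulting edges together describe a merge.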

  19. Once the ARG is constructed, we can search for patterns of interest
      [Figure panels: co-occurrence; multi-way co-occurrence; time-lag features]

  20. Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
      • MTGL: Multi-Threaded Graph Library
        • Open source software
        • https://software.sandia.gov/trac/mtgl
      • Given ARG and template
        • Filter: Remove all edges in ARG that cannot belong to template
        • Match: Find all possible template matches in filtered ARG
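The filter-then-match idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not MTGL's API or its actual heuristic: the template is represented as a walk of steps `(source_type, target_type, time_lag)`, edges are treated as directed, and the names `filter_edges`, `match_walks`, and the sample graph are hypothetical.

```python
def filter_edges(nodes, edges, template):
    """Phase 1: drop every edge whose endpoint types and time lag do not
    appear as a step in the template. Returns an adjacency list."""
    allowed = set(template)
    adjacency = {}
    for (u, v) in edges:
        key = (nodes[u][0], nodes[v][0], nodes[v][1] - nodes[u][1])
        if key in allowed:
            adjacency.setdefault(u, []).append(v)
    return adjacency


def match_walks(nodes, adjacency, template):
    """Phase 2: enumerate node sequences in the filtered graph that
    realise the template walk step by step."""
    matches = []

    def step_ok(u, v, step):
        src_type, dst_type, lag = step
        return (nodes[u][0] == src_type and nodes[v][0] == dst_type
                and nodes[v][1] - nodes[u][1] == lag)

    def extend(path, k):
        if k == len(template):
            matches.append(tuple(path))
            return
        for v in adjacency.get(path[-1], []):
            if step_ok(path[-1], v, template[k]):
                extend(path + [v], k + 1)

    for start in adjacency:
        extend([start], 0)
    return matches


# Hypothetical ARG: node id -> (feature type, time step); edge -> overlap.
nodes = {0: ("rr", 1), 1: ("diff", 1), 2: ("rr", 2), 3: ("diff", 2), 4: ("rr", 3)}
edges = {(0, 1): 25, (0, 2): 11, (2, 3): 7, (1, 3): 4, (2, 4): 9}

# Template: a reaction-rate feature persists for one time step, then
# co-occurs with a diffusion feature.
template = [("rr", "rr", 1), ("rr", "diff", 0)]

adjacency = filter_edges(nodes, edges, template)
matches = match_walks(nodes, adjacency, template)
print(adjacency)
print(matches)
```

Filtering first shrinks the graph the matcher has to explore; in this example the edge between the two diffusion features is discarded before any walk is attempted.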

  21-30. Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
      [Figure: ARG, template pattern, and template walk]

  31. Case study: identification of deflagration fronts in HCCI combustion data
      • Turbulent auto-ignitive mixture of Di-Methyl Ether under homogeneous charge compression ignition (HCCI) conditions
      • Deflagration fronts: spatially collocated extrema of chemical reaction rates and diffusive fluxes
      [Figure panels: Reaction rate of OH; Diffusion of OH]

  32. Case study: ARG representation encodes complex relationships very compactly
      • Raw output data size: 78.2 GB (grid size = 560 x 560 x 560)
        • 703 MB/variable * 6 variables for 19 time steps
      • Meta-data: computed in parallel on ORNL’s Lens system
        • 3 feature families:
          • Each encoding size, minimum, maximum, mean, and variance of 6 different variables
        • Data-dependent costs O(minutes) per time step
        • Structure geometries only needed for ARG construction (not queries)
      • Size of ARG: 504 KB
      • Under 1 GB required for fully flexible exploration and search on commodity hardware
        • O(seconds) for searches
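The sizes quoted on this slide can be cross-checked with a short calculation. The sketch below assumes binary prefixes (1 GB = 1024 MB, 1 MB = 1024 KB), which is the convention under which the per-variable figure reproduces the quoted total; the slides do not state which convention was used.

```python
# Figures quoted on the slide.
mb_per_variable = 703
variables = 6
time_steps = 19
arg_size_kb = 504
grid_points = 560 ** 3

# Total raw output in MB, then converted to GB.
raw_mb = mb_per_variable * variables * time_steps
raw_gb = raw_mb / 1024   # assumption: 1 GB = 1024 MB

# Ratio of raw field data to the ARG itself.
reduction_factor = raw_mb * 1024 / arg_size_kb   # assumption: 1 MB = 1024 KB

print("grid points:", grid_points)
print("raw output (GB):", round(raw_gb, 1))
print("raw data / ARG size:", round(reduction_factor))
```

Under these assumptions the ARG is roughly five orders of magnitude smaller than the raw fields it summarizes, consistent with the claim that searches fit comfortably on commodity hardware.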

  33. Case study: Searching the ARG
      [Figure: A subset of the full ARG (full size is 6563 nodes and 8903 edges)]
      [Figure: A subset of the deflagration fronts identified]

  34. Conclusion & future work
      • Introduced attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
      • Provided a mechanism for querying ARGs
      • Demonstrated results on large-scale combustion simulation data
      • Some domain knowledge required to construct ARG
        • Which variables define features of interest
        • Range of potential time-lags between features
      • Opportunities for future work
        • GUI tool for specifying search template patterns
        • Leveraging per-feature statistics in queries
        • Linked views of ARG, search results, domain visualization
        • Dynamic ARGs
          • Don’t require feature thresholds to be specified in advance
          • Instead these are runtime parameters to be explored

  35. Questions? Janine Bennett, jcbenne@sandia.gov
