On the use of Graph Search Techniques for the Analysis of Extreme-scale Combustion Simulation Data

Presentation Transcript

  1. On the use of Graph Search Techniques for the Analysis of Extreme-scale Combustion Simulation Data
     Peer-Timo Bremer (3), Jacqueline Chen (1), Hemanth Kolla (1), Janine Bennett (1), William McLendon III (1), Gaurav Bansal (2)
     (1) Sandia National Laboratories, (2) Intel, (3) Lawrence Livermore National Laboratory
     Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. Approved for Unlimited Unclassified Release, SAND # 2012-9242 C

  2. HPC resources generate large, complex, multivariate data sets
     Recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories
     Details: Lifted Ethylene Jet
     • 1.3 billion grid points
     • 22 chemical species, vector, & particle data
     • 7.5 million CPU hours on 30,000 processors
     • 112,500 time steps (data stored every 375th)
     • 240 TB of raw field data + 50 TB particle data
     Efficiently characterizing & tracking intermittent features defined by multiple variables poses significant research challenges!
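
A quick back-of-the-envelope check of the figures on this slide, as a sketch only. The variable names are illustrative, the TB values are used exactly as quoted, and the wall-clock estimate assumes all 30,000 processors were busy for the whole run, which the slide does not state.

```python
# Figures quoted on the slide for the lifted ethylene jet run.
total_time_steps = 112_500
store_every = 375            # data stored every 375th time step
raw_field_tb = 240           # TB of raw field data
particle_tb = 50             # TB of particle data
cpu_hours = 7_500_000
processors = 30_000

stored_snapshots = total_time_steps // store_every
field_tb_per_snapshot = raw_field_tb / stored_snapshots
total_tb = raw_field_tb + particle_tb
# Assumption: every processor is in use for the full duration of the run.
wall_clock_hours = cpu_hours / processors

print(stored_snapshots)        # 300
print(field_tb_per_snapshot)   # 0.8
print(total_tb)                # 290
print(wall_clock_hours)        # 250.0
```

Under these assumptions, each stored snapshot carries roughly 0.8 TB of field data, which is the scale any per-time-step analysis has to cope with.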

  3. Our contribution: a framework for characterizing complex events in large-scale multivariate data
     • Introduce attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
       • Defined by multiple variables
       • Spanning an arbitrary number of time steps
       • Representation achieves drastic data reductions
     • Provide a mechanism for querying ARGs
       • Identify events conditioned on a variety of metrics
     • Demonstrate results on large-scale combustion simulation data

  4. Related work
     • Topology: Segment domain into features according to function behavior
       • Level-set behavior: Reeb graph, contour tree, and variants [Carr et al. 2003, Pascucci et al. 2007, Mascarenhas et al. 2006, van Kreveld et al. 2004]
       • Gradient behavior: Morse and Morse-Smale Complex [Edelsbrunner 2003, Gyulassy et al. 2007, 2008, Gunther et al. 2011]
     • Multivariate feature analysis: Many correlation-based feature definitions [Gosink et al. 2007, Chen et al. 2011, Jaenicke et al. 2007, Sauber et al. 2006, Schneider et al. 2008, Bennett et al. 2011]
     • Feature tracking graphs: Capture spatial-temporal relationships [Edelsbrunner et al. 2004, Bremer et al. 2010, Muelder et al. 2009, Widanagamaachchi et al. 2012]
     • Graph search algorithms: Identify patterns in large-scale graphs [Barret et al. 2007, Berry et al. 2007, Gregor et al. 2005, Siek et al. 2002] MTGL

  5. What is an attributed relational graph (ARG)?
     • ARG nodes correspond to spatial features
     • Each ARG node encodes
       • Feature type
       • Time step
       • Optional per-feature statistics
     • ARG edges encode relationships between features
       • Spatial overlap metric
       • Supports feature tracking over time
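
The node and edge contents listed on this slide can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class names `ArgNode` and `Arg`, the field names, and the example feature types "A" and "B" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArgNode:
    """One spatial feature (hypothetical layout)."""
    node_id: int
    feature_type: str      # which kind of feature this node represents
    time_step: int         # time step at which the feature exists
    stats: tuple = ()      # optional per-feature statistics, e.g. (("mean", 0.3),)


@dataclass
class Arg:
    """Attributed relational graph: nodes plus overlap-labelled edges."""
    nodes: dict = field(default_factory=dict)   # node_id -> ArgNode
    edges: dict = field(default_factory=dict)   # (id_a, id_b) -> overlap value

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, a, b, overlap):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must already be nodes of the ARG")
        self.edges[(a, b)] = overlap

    def neighbors(self, a):
        """Return (node_id, overlap) pairs reachable from node a."""
        return [(b, w) for (x, b), w in self.edges.items() if x == a]


# Usage: two features of different types at consecutive time steps.
arg = Arg()
arg.add_node(ArgNode(0, "A", time_step=1))
arg.add_node(ArgNode(1, "B", time_step=2, stats=(("mean", 0.3),)))
arg.add_edge(0, 1, overlap=25)
print(arg.neighbors(0))   # [(1, 25)]
```

Because only feature identifiers, attributes, and overlap values are stored, a structure like this stays small regardless of the size of the underlying grid.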

  6. ARG Nodes: Segment domain into relevant features
     • Many options for segmenting the domain into features
     • Often features of interest are defined by a threshold around minima or maxima of a particular variable
     [Figure: plot of function f; axes x and y]

  7. ARG Nodes: Merge trees encode features of interest defined by a single variable for a range of thresholds
     The tree encodes how features behave as the function value is swept from the maximum to the minimum of the range of interest
     [Figure: plot of function f; axes x and y]
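
The sweep described on this slide can be illustrated in one dimension. The sketch below is an assumption-laden toy, not the presenters' code: it processes samples from highest to lowest value and uses a union-find structure to record where a new component is born (a local maximum) and where two components merge. The function name `merge_tree_1d` and its return format are hypothetical.

```python
def merge_tree_1d(values):
    """Sweep a sampled 1D function from its maximum to its minimum.

    Returns (births, merges):
      births - indices where a new component appears (local maxima)
      merges - (index, surviving_root, absorbed_root) for each merge event
    Each component is represented by the index of its highest sample.
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])
    parent = {}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    births, merges = [], []
    for i in order:
        parent[i] = i
        # Components already swept that touch sample i.
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            births.append(i)
            continue
        ranked = sorted(roots, key=lambda r: -values[r])
        keep = ranked[0]              # component with the highest maximum survives
        parent[i] = keep
        for r in ranked[1:]:
            parent[r] = keep
            merges.append((i, keep, r))
    return births, merges


births, merges = merge_tree_1d([1, 3, 2, 5, 0, 4])
print(births)   # [3, 5, 1]
print(merges)   # [(2, 3, 1), (4, 3, 5)]
```

In this toy input, the maxima at indices 3, 5, and 1 each start a branch, and the branches join at the saddle samples 2 and 4. The births are the leaves of the tree and the merges are its interior nodes.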

  13. ARG Nodes: Refine the tree to increase granularity of possible segmentations
      [Figure: plot of function f; axes x and y]

  14. ARG Nodes: Features are defined as all sub-trees above a user-specified threshold
      [Figures: two plots of function f; axes x and y]
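
For a merge tree built by sweeping from high to low values, cutting the tree at a threshold leaves one sub-tree per connected region where the function stays at or above that threshold. The 1D sketch below labels those regions directly, without building the tree. The function name `features_above` is hypothetical, and treating "above" as "greater than or equal to" is an assumption.

```python
def features_above(values, threshold):
    """Label connected runs of samples with value >= threshold.

    Returns a list of labels, one per sample; -1 marks samples that
    belong to no feature.
    """
    labels = [-1] * len(values)
    next_label = 0
    for i, v in enumerate(values):
        if v < threshold:
            continue
        if i > 0 and labels[i - 1] != -1:
            labels[i] = labels[i - 1]      # continue the current feature
        else:
            labels[i] = next_label         # start a new feature
            next_label += 1
    return labels


f = [1, 3, 2, 5, 0, 4]
print(features_above(f, 2.5))   # [-1, 0, -1, 1, -1, 2]
print(features_above(f, 1.5))   # [-1, 0, 0, 0, -1, 1]
```

Lowering the threshold from 2.5 to 1.5 fuses the first two features into one. A merge tree records this for every threshold in the range at once.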

  15. ARG Edges: An overlap-based metric is used to encode feature behavior over time
      [Figure: plots of function f at time steps t = 1, 2, 3, 4; axes x and y]

  16. ARG Edges: The same metric is used to encode relationships between different types of features
      [Figure: features at time steps t = 1, 2, 3, 4]

  17. ARG Edges: Relationships can span multiple time steps
      [Figure: features at time steps t = 1, 2, 3, 4]

  18. ARG Edges: Edge labels indicate degree of overlap between associated features
      [Figure: example edges labelled 25 and 11]
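
One plausible reading of the overlap labels is a count of grid points shared by two features. The sketch below computes such counts from two segmentations of the same grid; that reading is an assumption, and `overlap_edges` and the sample label arrays are hypothetical. The two inputs can be the same feature type at different time steps (tracking) or different feature types (co-location).

```python
from collections import Counter


def overlap_edges(labels_a, labels_b, background=-1):
    """Count grid points shared by each pair of features.

    labels_a, labels_b: per-grid-point feature labels on the same grid,
    with `background` marking points outside any feature.
    Returns {(label_in_a, label_in_b): shared_point_count}.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("segmentations must cover the same grid")
    counts = Counter()
    for a, b in zip(labels_a, labels_b):
        if a != background and b != background:
            counts[(a, b)] += 1
    return dict(counts)


labels_t1 = [-1, 0, 0, 0, -1, 1, 1]
labels_t2 = [-1, -1, 0, 0, 0, 0, -1]
print(overlap_edges(labels_t1, labels_t2))   # {(0, 0): 2, (1, 0): 1}
```

Each non-empty pair would become one ARG edge labelled with its count. Here both features at the first time step overlap the single feature at the second, which is the signature of a merge.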

  19. Once the ARG is constructed, we can search for patterns of interest
      • co-occurrence
      • multi-way co-occurrence
      • time-lag features

  20. Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
      • MTGL: Multi-Threaded Graph Library
        • Open source software
        • https://software.sandia.gov/trac/mtgl
      • Given ARG and template
        • Filter: Remove all edges in ARG that cannot belong to template
        • Match: Find all possible template matches in filtered ARG
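
The two phases can be illustrated with a toy version in plain Python. This is not MTGL's algorithm or API: the template is simplified to a walk given as a list of feature types, edges are treated as undirected, and the names `filter_edges` and `match_walks` and the types "A", "B", "C" are hypothetical.

```python
def filter_edges(node_type, edges, template):
    """Filter phase: keep only edges whose endpoint types appear as
    consecutive types somewhere in the template walk."""
    allowed = set()
    for a, b in zip(template, template[1:]):
        allowed.add((a, b))
        allowed.add((b, a))            # edges are treated as undirected
    return [(u, v) for u, v in edges
            if (node_type[u], node_type[v]) in allowed]


def match_walks(node_type, edges, template):
    """Match phase: enumerate all walks whose node types spell the template."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    matches = []

    def extend(walk):
        if len(walk) == len(template):
            matches.append(tuple(walk))
            return
        wanted = template[len(walk)]
        for nxt in sorted(adj.get(walk[-1], ())):
            if node_type[nxt] == wanted:
                extend(walk + [nxt])

    for start in sorted(node_type):
        if node_type[start] == template[0]:
            extend([start])
    return matches


node_type = {0: "A", 1: "B", 2: "A", 3: "C"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
template = ["A", "B"]                  # co-occurrence of an A feature and a B feature

kept = filter_edges(node_type, edges, template)
print(kept)                                      # [(0, 1), (1, 2)]
print(match_walks(node_type, kept, template))    # [(0, 1), (2, 1)]
```

Filtering first shrinks the graph that the more expensive matching step has to explore. In this toy case, both edges touching the "C" node are discarded before any walk is attempted.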

  21. Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
      [Figure: ARG, template pattern, and template walk]

  31. Case study: identification of deflagration fronts in HCCI combustion data
      • Turbulent auto-ignitive mixture of dimethyl ether under homogeneous charge compression ignition (HCCI) conditions
      • Deflagration fronts: spatially collocated extrema of chemical reaction rates and diffusive fluxes
      [Figures: Reaction rate of OH; Diffusion of OH]

  32. Case study: ARG representation encodes complex relationships very compactly
      • Raw output data size: 78.2 GB (grid size = 560 x 560 x 560)
        • 703 MB/variable * 6 variables for 19 time steps
      • Meta-data: computed in parallel on ORNL’s Lens system
      • 3 feature families:
        • Each encoding size, minimum, maximum, mean, and variance of 6 different variables
      • Data-dependent costs O(minutes) per time step
      • Structure geometries only needed for ARG construction (not queries)
      • Size of ARG: 504 KB
      • Under 1 GB required for fully flexible exploration and search on commodity hardware
      • O(seconds) for searches
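
The sizes on this slide can be cross-checked with a few lines of arithmetic. This is a sketch under two assumptions not stated in the presentation: each value is stored as a 4-byte float, and the 78.2 GB total was obtained by dividing megabytes by 1024.

```python
grid_points = 560 ** 3
bytes_per_value = 4          # assumption: single-precision storage
variables = 6
time_steps = 19

bytes_per_variable = grid_points * bytes_per_value
total_bytes = bytes_per_variable * variables * time_steps

print(grid_points)                             # 175616000
print(round(bytes_per_variable / 1e6, 1))      # 702.5  (MB per variable)
print(round(total_bytes / 1e9, 1))             # 80.1   (decimal GB)
print(round(total_bytes / 2**30, 1))           # 74.6   (binary GiB)

# Reproducing the slide's own figures: 703 MB * 6 * 19, divided by 1024.
slide_total_gb = 703 * variables * time_steps / 1024
print(round(slide_total_gb, 2))                # 78.26

# Reduction from raw data to the 504 KB ARG, using 1 GB = 1024**2 KB.
reduction = 78.2 * 1024**2 / 504
print(round(reduction))
```

Under these assumptions, the per-variable size agrees with the quoted 703 MB and the total is close to the quoted 78.2 GB. The ARG comes out roughly five orders of magnitude smaller than the raw data.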

  33. Case study: Searching the ARG
      [Figure: A subset of the full ARG (full size is 6563 nodes and 8903 edges)]
      [Figure: A subset of the deflagration fronts identified]

  34. Conclusion & future work
      • Introduced attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
      • Provided a mechanism for querying ARGs
      • Demonstrated results on large-scale combustion simulation data
      • Some domain knowledge required to construct ARG
        • Which variables define features of interest
        • Range of potential time-lags between features
      • Opportunities for future work
        • GUI tool for specifying search template patterns
        • Leveraging per-feature statistics in queries
        • Linked views of ARG, search results, domain visualization
        • Dynamic ARGs
          • Don’t require feature thresholds to be specified in advance
          • Instead these are runtime parameters to be explored

  35. Questions?
      Janine Bennett, jcbenne@sandia.gov