
Modeling Data Product Generation

These slides argue that the value of an EOFS (Environmental Observation and Forecasting System) is the number of data products it provides, and they explore how modeling data products can raise that number. Topics include data product definitions, quality analysis and translation, product generation and scheduling, performance optimization, and remote computation.



Presentation Transcript


  1. Modeling Data Product Generation
     Bill Howe, Dave Maier

  2. Data Product Management
     Thesis: The value of an EOFS is the number of products it provides
     Limits on the number of products:
     • Amount of oversight for current products
     • Time to create a new product
     • Resources required to generate products

  3. Modeling Data Products
     Data Product Definitions (DPDs) or “recipes”
     • initially for documentation
     • “blueprint” for manual construction

  4. Beyond Documentation
     • Quality Analysis and Translation
       • calculate quality metrics from DPDs (e.g., resolution)
       • translate DPDs into an executable network of Infopipes (meeting a quality standard)

  5. Product Generation and Documentation
     • management and scheduling of the product suite based on input availability, resources, and dissemination requirements
     • job shop → assembly line
     • adaptive eventually; priorities, feedback to sensors and models
     • Performance Optimization
       • algebraic optimization
       • common subresults & shared scans on groups of products

  6. Remote Computation
     • “product kit”: final product built at consumer site
     • remote “product factory”

  7. Exercise: Fill in the Acronym CORMORANT
     • COlumbia
     • River
     • Modeling,
     • Observation,
     • Retrieval?? &
     • Archive…

  8. Roadmap
     • Vision
     • Status
       • Past
         • Graphical Diagram
         • Process Modeling
         • Type System
       • Current
         • Abstract Grids
         • Grid Functions

  9. Graphical System Description
     • Studied relevant files and codes to model:
       • Producers and consumers
       • Control flow
       • Data flow
     • Benefits:
       • understanding within the project
       • communication outside the project
     • Drawbacks:
       • only a ‘snapshot’
       • very literal
       • no scheduling help...

  10. Brittle Scheduling
      • Contentious codes cause crashes
      • Annotate the diagram with cron job information?
      • But it would be nice to capture real executions of all system components for careful study

  11. Instrumenting CORIE
      • Model the executions of codes using a relational database
      • Monitor CORIE activity using SGI’s FAM technology
      • Try to identify bottlenecks, problem spots, and resource consumption properties
      • Status: we’re poised to perform further testing; some security concerns have been raised

  12. More than just processes...
      • The model is too close a fit
      • Let’s start at a higher level...

  13. A Candidate Type System
      • Relevant types:
        • TimeSeries (TS)
        • ElementField (EF) / NodeField (NF)
        • DepthField (DF)
      • Ex:
        salt.63 = TS (EF (DF Salinity))
        fort.21 = EF Depth
        findmax63 = TS (EF (DF a)) → TS (EF a)
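
The signatures on slide 13 can be checked mechanically. Below is a minimal sketch, assuming a hypothetical encoding in which each type is a constructor name wrapped around an inner type, and lowercase names such as `a` are type variables. The helpers `match`, `substitute`, and `apply_code` are illustrative names, not part of CORIE.

```python
from dataclasses import dataclass

# Hypothetical encoding of the slide's types: a constructor name wrapped
# around an inner type. Base types and type variables are plain strings.
@dataclass(frozen=True)
class T:
    ctor: str       # "TS", "EF", "NF", or "DF"
    inner: object   # another T, a base type name, or a type variable

def TS(x): return T("TS", x)
def EF(x): return T("EF", x)
def DF(x): return T("DF", x)

def match(pattern, actual, env):
    """Match `actual` against `pattern`, binding lowercase names as type variables."""
    if isinstance(pattern, str):
        if pattern.islower():                    # type variable such as "a"
            return env.setdefault(pattern, actual) == actual
        return pattern == actual                 # base type such as "Salinity"
    return (isinstance(actual, T) and pattern.ctor == actual.ctor
            and match(pattern.inner, actual.inner, env))

def substitute(t, env):
    """Replace bound type variables in `t` with their bindings."""
    if isinstance(t, str):
        return env.get(t, t)
    return T(t.ctor, substitute(t.inner, env))

def apply_code(signature, arg):
    """Type of the output when a code with `signature` reads a file of type `arg`."""
    src, dst = signature
    env = {}
    if not match(src, arg, env):
        raise TypeError(f"expected {src}, got {arg}")
    return substitute(dst, env)

salt_63 = TS(EF(DF("Salinity")))               # salt.63
fort_21 = EF("Depth")                          # fort.21
findmax63 = (TS(EF(DF("a"))), TS(EF("a")))     # TS (EF (DF a)) → TS (EF a)

print(apply_code(findmax63, salt_63))
```

Applying `findmax63` to `salt_63` binds `a` to `Salinity` and yields `TS (EF Salinity)`; applying it to `fort_21` raises a `TypeError` because the outer constructor is `EF` rather than `TS`.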

  14. Abstract Data Product Recipes
      • But consider compute_plumevol:
        [Diagram labels: Grid Vol, Elev Vol, +, select(sal<30), subgrid(Ocean), sum(grid), plumevol]
      • This informal recipe seems appropriate regardless of the specifics of our data representation
      • This information should be captured somewhere!
      • Currently it’s obfuscated by C codes, and
      • tightly coupled with the TS (EF (DF a)) structure
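
To illustrate how the recipe on slide 14 can be stated independently of the file layout, here is a minimal sketch that assumes each grid function is a plain dictionary from cell id to value. The cell ids, the numbers, and the helper names `select`, `subgrid`, and `total` are made up for illustration; the operator order follows the diagram labels, and the way the slide combines Grid Vol and Elev Vol is not modeled.

```python
# Hypothetical toy data: each grid function is a dict from cell id to value.
vol = {"c1": 10.0, "c2": 20.0, "c3": 30.0, "c4": 40.0}   # cell volumes
sal = {"c1": 5.0, "c2": 28.0, "c3": 33.0, "c4": 12.0}    # salinity in psu
ocean = {"c2", "c3", "c4"}                               # cells in the Ocean region

def select(gf, by, predicate):
    """Keep the cells of `gf` whose value in grid function `by` satisfies `predicate`."""
    return {c: v for c, v in gf.items() if predicate(by[c])}

def subgrid(gf, region):
    """Restrict `gf` to the cells in `region`."""
    return {c: v for c, v in gf.items() if c in region}

def total(gf):
    """Sum the values of `gf` over its grid."""
    return sum(gf.values())

def plumevol(vol, sal, region):
    # select(sal<30), then subgrid(Ocean), then sum(grid)
    return total(subgrid(select(vol, sal, lambda s: s < 30), region))

print(plumevol(vol, sal, ocean))
```

Nothing in `plumevol` mentions time series, element fields, or depth fields, which is the decoupling the slide asks for.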

  15. Topological Grid
      • A more general grid G_d is a collection of k-cells of dimension k, k in {0..d}
      • A grid function GF is a mapping from a k-cell to a value of type T
        GF : k-cell → T
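
The two definitions on slide 15 translate almost directly into data structures. The sketch below is one possible reading, assuming cells are identified by a dimension and a name; the classes `Cell`, `Grid`, and `GridFunction` are hypothetical and ignore incidence relationships between cells.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    k: int      # dimension of the cell: 0 = node, 1 = edge, 2 = face, ...
    name: str

class Grid:
    """A grid of dimension d: a collection of k-cells with k in {0..d}."""
    def __init__(self, d, cells):
        cells = list(cells)
        if any(not 0 <= c.k <= d for c in cells):
            raise ValueError("cell dimension outside 0..d")
        self.d = d
        self.cells = frozenset(cells)

    def k_cells(self, k):
        return {c for c in self.cells if c.k == k}

class GridFunction:
    """A mapping from the k-cells of a grid to values of some type T."""
    def __init__(self, grid, k, values):
        if set(values) != grid.k_cells(k):
            raise ValueError("values must cover exactly the k-cells of the grid")
        self.grid, self.k, self.values = grid, k, dict(values)

    def __call__(self, cell):
        return self.values[cell]

# A 1-d grid: two nodes joined by one edge, with salinity defined on the nodes.
n1, n2, e1 = Cell(0, "n1"), Cell(0, "n2"), Cell(1, "e1")
g = Grid(1, [n1, n2, e1])
salt = GridFunction(g, 0, {n1: 23.4, n2: 30.1})
print(salt(n1))
```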

  16. Imagine a big 4D grid representing our current best data
      [Figure labels, in extracted order: experimental ELCIRC vers forecast hindcast hindcast missing]
      Grid Functions (GF) map grid locations to values (e.g., 15 °C, 23.4 psu)

  17. Grid Functions
      We can derive new grid functions from our original set
      [Diagram labels: GF Salt, GF Velocity, GF Temp, GF Elev, GF Neighbors, GF Velo N’hood, GF Magnitude, GF Vorticity]

  18. Benefits
      • Say we have recipes that involve
        • a grid,
        • some grid functions, and
        • some operators
      • So what? Well,
        • We can reason about data product outputs
        • We can optimize recipe execution

  19. Reasoning about Types
      GF Velocity → applytoall(vort) → GF Vorticity
      GF Salt → applytoall(vort) → GF ???
      High-level recipes can detect this kind of error before wasting compute resources
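
The check on slide 19 can run before any data is read. A minimal sketch, assuming a hypothetical table `SIGNATURES` that records, for each operator, the value type it requires and the value type it produces:

```python
# Hypothetical signatures: operator name -> (required input value type, output value type).
SIGNATURES = {"vort": ("Velocity", "Vorticity")}

def check_applytoall(op, gf_value_type):
    """Return the output value type of applytoall(op), or raise before any data is read."""
    expected, produced = SIGNATURES[op]
    if gf_value_type != expected:
        raise TypeError(f"applytoall({op}) expects GF {expected}, got GF {gf_value_type}")
    return produced

print("GF", check_applytoall("vort", "Velocity"))
try:
    check_applytoall("vort", "Salt")
except TypeError as err:
    print("rejected:", err)
```

The second call fails immediately, which is the point of the slide: the recipe is rejected at planning time rather than after a long run.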

  20. Reasoning about Schema
      GF1 → subgrid(Ocean) → GF2
      type(GF1) = type(GF2), but schema(GF1) ≠ schema(GF2), since GF2 is defined over a smaller grid than GF1
      [Figure labels: an invalid transect, a valid transect]
      • By tracking schema information through complex recipes we can:
        • check for errors
        • estimate resource requirements (big schemas require big buffers)
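
A minimal sketch of schema tracking as described on slide 20, assuming the schema of a grid function is simply the set of cells it is defined over. The 8-byte value size and the rule that a transect is valid only when every cell it visits lies in the schema are assumptions made for illustration.

```python
def schema(gf):
    """The set of cells a grid function is defined over."""
    return frozenset(gf)

def subgrid(gf, region):
    """Restrict `gf` to the cells in `region`; the schema shrinks, the value type does not."""
    return {c: v for c, v in gf.items() if c in region}

def buffer_bytes(gf, bytes_per_value=8):
    """Rough buffer estimate: one fixed-size value per cell in the schema."""
    return len(schema(gf)) * bytes_per_value

def transect_is_valid(gf, cells):
    """Assumed rule: a transect is valid only if it stays inside the schema."""
    return set(cells) <= schema(gf)

gf1 = {"c1": 5.0, "c2": 28.0, "c3": 33.0, "c4": 12.0}
gf2 = subgrid(gf1, {"c2", "c3"})

print(schema(gf1) == schema(gf2))                 # schemas differ
print(buffer_bytes(gf1), buffer_bytes(gf2))       # smaller schema, smaller buffer
print(transect_is_valid(gf2, ["c2", "c3"]), transect_is_valid(gf2, ["c1", "c2"]))
```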

  21. Reasoning about Quality
      • Say we have operators coarsen and refine, which lower resolution via grouping and raise resolution via interpolation, respectively
      GF1 → coarsen → refine → GF2
      type(GF1) = type(GF2), schema(GF1) = schema(GF2), but qual(GF1) ≠ qual(GF2)
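
The effect on slide 21 shows up even on a one-dimensional toy grid. The sketch below assumes `coarsen` averages adjacent pairs and `refine` interpolates linearly back to the original positions, clamping at the ends; both are illustrative choices, not the operators used in CORIE.

```python
def coarsen(values):
    """Lower resolution by grouping adjacent pairs and averaging (even length assumed)."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]

def refine(coarse):
    """Raise resolution by linear interpolation back onto twice as many cells.

    Coarse cell j is taken to sit midway between fine cells 2j and 2j+1;
    positions outside the coarse range are clamped to the nearest coarse value.
    """
    n = len(coarse)
    fine = []
    for i in range(2 * n):
        x = (i - 0.5) / 2                  # fine cell i in coarse index units
        x = min(max(x, 0.0), n - 1.0)      # clamp at both ends
        lo = int(x)
        hi = min(lo + 1, n - 1)
        t = x - lo
        fine.append((1 - t) * coarse[lo] + t * coarse[hi])
    return fine

gf1 = [10.0, 20.0, 40.0, 30.0]
gf2 = refine(coarsen(gf1))
print(gf2)
```

`gf2` has the same length as `gf1`, so type and schema agree, but the values have been smoothed: the detail lost in `coarsen` is not recovered by `refine`.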

  22. Optimize via Algebraic Manipulations
      Different sequences of operators can give equivalent results
      Recipe 1: ... GF Elev, ... GF Area → computevol → subgrid(Ocean) → GF Vol ...
      Recipe 2: ... GF Elev → subgrid(Ocean), ... GF Area → subgrid(Ocean) → computevol → GF Vol ...
      These are equivalent, but the second avoids computing volume over the entire grid
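
The equivalence on slide 22 can be demonstrated on toy data. This sketch assumes, purely for illustration, that `computevol` multiplies elevation by area cell by cell; the `counter` argument is a hypothetical device for tallying how many cells each plan processes.

```python
def subgrid(gf, region):
    """Restrict `gf` to the cells in `region`."""
    return {c: v for c, v in gf.items() if c in region}

def computevol(elev, area, counter):
    """Hypothetical per-cell volume: elevation times area. `counter[0]` tallies cells processed."""
    counter[0] += len(elev)
    return {c: elev[c] * area[c] for c in elev}

elev = {"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": 4.0}
area = {"c1": 10.0, "c2": 10.0, "c3": 20.0, "c4": 20.0}
ocean = {"c3", "c4"}

work_1, work_2 = [0], [0]
vol_1 = subgrid(computevol(elev, area, work_1), ocean)                  # compute, then restrict
vol_2 = computevol(subgrid(elev, ocean), subgrid(area, ocean), work_2)  # restrict, then compute

print(vol_1 == vol_2, work_1[0], work_2[0])
```

Both plans produce the same grid function, but the second touches only the Ocean cells. The rewrite is valid here because `computevol` works cell by cell; an operator that reads neighboring cells would need more care.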

  23. Optimize via Choice of Implementation
      GF Salt → select(s < 30) → ?
      Candidate result representations shown in the figure:
      • GF Bool: F T T T
      • GF (Maybe Salt): - 23 22 24
      • {KCell}: {c1, c2, c3}
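
The three candidate representations on slide 23 can be written side by side. In this sketch the cell ids `c0` to `c3` and the first salinity value (31) are made up so that the outputs line up with the figure; `None` stands in for the missing value in `GF (Maybe Salt)`.

```python
salt = {"c0": 31, "c1": 23, "c2": 22, "c3": 24}   # c0's value is an assumption

def select_as_bool(gf, pred):
    """GF Bool: same grid, each cell flagged True or False."""
    return {c: pred(v) for c, v in gf.items()}

def select_as_maybe(gf, pred):
    """GF (Maybe Salt): same grid, rejected cells carry no value."""
    return {c: (v if pred(v) else None) for c, v in gf.items()}

def select_as_cells(gf, pred):
    """{KCell}: only the set of cells that passed."""
    return {c for c, v in gf.items() if pred(v)}

below_30 = lambda s: s < 30
print(select_as_bool(salt, below_30))
print(select_as_maybe(salt, below_30))
print(sorted(select_as_cells(salt, below_30)))
```

Which form is cheapest depends on what consumes the result: a mask suits a later per-cell operator, the cell set suits a later `subgrid`, and the Maybe form keeps values and positions together.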

  24. Optimize via Shared Intermediate Results
      A node’s neighbors don’t often change, so we can avoid re-computing this result
      [Diagram labels: GF Velocity, GF Neighbors, GF Velo N’hood, GF Vorticity, GF Salinity, GF Salt N’hood, GF Salt Gradient]
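
One simple way to share the neighbor computation from slide 24 is memoization. The sketch below uses Python's `functools.lru_cache` as a stand-in for whatever caching a real system would use; the grid encoding as a tuple of edges and the helper `neighborhood` are assumptions made for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def neighbors(edges):
    """Neighbor table for a grid given as a hashable tuple of (node, node) edges.

    The returned dict is shared between callers, so it should be treated as read-only.
    """
    table = {}
    for a, b in edges:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table

def neighborhood(gf, edges):
    """Pair each node's value with its neighbors' values (an N'hood grid function)."""
    nbrs = neighbors(edges)
    return {n: (gf[n], [gf[m] for m in sorted(nbrs[n])]) for n in gf}

edges = (("n1", "n2"), ("n2", "n3"))
velocity = {"n1": 0.5, "n2": 0.7, "n3": 0.2}
salinity = {"n1": 23.0, "n2": 22.0, "n3": 24.0}

velo_nhood = neighborhood(velocity, edges)   # computes the neighbor table
salt_nhood = neighborhood(salinity, edges)   # reuses it
print(neighbors.cache_info().misses)
```

The neighbor table is built once and then feeds both the velocity and the salinity neighborhoods, mirroring the shared GF Neighbors node in the diagram.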

  25. Other niceties...
      • We don’t have to re-implement everything to realize benefits
      • But eventually we’ll want to wag the dog!
      • A collection of recipes can help...
        • communicate the product catalog
        • provide provenance
        • derive new recipes from parts of old ones
        • support product lines

  26. Summary
      • Modeling the current CORIE
        • Graphical System Description
        • pmon
      • Modeling the future CORIE
        • Grid Functions
        • Recipes
        • Reasoning
        • Optimization

  27. Milestones
      • RPE this spring
      • Specify existing data products using the model
      • Perform checks on existing production plans
        • Type
        • Schema / Resources
        • Quality

  28. Modeling Data Product Generation

  29. A Thorough Experiment Management Schema

  30. A Good Start...
      • task definition
      • task instance (with parameters)
      • task execution
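
The three entities on slide 30 map naturally onto three relational tables. The following is a minimal sketch using an in-memory SQLite database; every column name, the sample task, its parameter string, and the timestamps are invented for illustration and are not taken from the CORIE schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_definition (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE task_instance (
    id            INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES task_definition(id),
    parameters    TEXT NOT NULL
);
CREATE TABLE task_execution (
    id          INTEGER PRIMARY KEY,
    instance_id INTEGER NOT NULL REFERENCES task_instance(id),
    started_at  TEXT NOT NULL,
    exit_status INTEGER
);
""")

# Made-up sample rows: one definition, one parameterized instance, two executions.
conn.execute("INSERT INTO task_definition (id, name) VALUES (1, 'compute_plumevol')")
conn.execute("INSERT INTO task_instance (id, definition_id, parameters) VALUES (1, 1, 'sal<30')")
conn.executemany(
    "INSERT INTO task_execution (instance_id, started_at, exit_status) VALUES (?, ?, ?)",
    [(1, "2003-01-01T00:00:00", 0), (1, "2003-01-02T00:00:00", 1)],
)

# Runs and failures per task definition.
rows = conn.execute("""
    SELECT d.name, COUNT(*) AS runs, SUM(e.exit_status != 0) AS failures
    FROM task_execution e
    JOIN task_instance i ON i.id = e.instance_id
    JOIN task_definition d ON d.id = i.definition_id
    GROUP BY d.name
""").fetchall()
print(rows)
```

Separating the definition from its parameterized instances and from individual executions lets one query answer questions such as which codes fail most often.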

  31. pmon Architecture
      [Diagram components:]
      • pmon (Process Monitor)
      • Database
      • Web Server
      • fam (File Alteration Monitor): imon, dnotify, or polling, depending on kernel patch
      • Filesystem
      • pacct (stopped process stats)
      • /proc (running process info)
      • acct (process accounting)
      • Process to Monitor
      • Linux Kernel
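
Slide 31 lists polling as one of the mechanisms behind file monitoring. The sketch below illustrates only that fallback, using the Python standard library rather than the fam API; the functions `snapshot` and `diff` and the file name are hypothetical.

```python
import os
import tempfile

def snapshot(directory):
    """Map each regular file in `directory` to its modification time in nanoseconds."""
    return {e.name: e.stat().st_mtime_ns for e in os.scandir(directory) if e.is_file()}

def diff(before, after):
    """Return (created, deleted, modified) file names between two snapshots."""
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    modified = sorted(n for n in before.keys() & after.keys() if before[n] != after[n])
    return created, deleted, modified

with tempfile.TemporaryDirectory() as watched:
    before = snapshot(watched)
    with open(os.path.join(watched, "1_salt.63"), "w") as f:   # made-up file name
        f.write("placeholder")
    after = snapshot(watched)
    created, deleted, modified = diff(before, after)

print(created, deleted, modified)
```

A monitor built this way would take snapshots on a timer and record each difference in the database; kernel-level notification, where available, avoids the repeated directory scans.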
