SDM workshop Strawman report History and Progress and Goal

mahon
Presentation Transcript


  1. SDM workshop Strawman report History and Progress and Goal

  2. History
     • Original plan
       • Identify Scientific Applications Data Management needs
         • Focus on different application types: simulations, experiments/observations
       • Identify Data Management technologies
       • Identify other relevant Computer Science technologies
       • Identify Gaps, Cost, Priorities
     • In the extended EOC we came up with a draft report
       • Based on extensive discussions of application needs
       • Identified the scientific investigation process (workflow)
       • Identified technologies needed
       • Assigned writing to individuals

  3. Section 2: Application sciences motivation and needs
     • Astrophysics
     • Biology
     • Climate Modeling
     • Combustion
     • Fusion Energy Science
     • High Energy and Nuclear Physics
     • Nanotechnology

  4. Section 3: The scientific investigation process
     • Distributed Scientific Workflows
     • Scientific Data Management Phases
       • Data Generation
       • Data Analysis
       • Data Visualization
     • Foundation of scientific data management technology
       • Workflow, dataflow, data transformation
       • Storage, data movement, grid, networks
       • Metadata management and cataloging
       • Efficient access and query, data integration
       • Integrated analysis environment, visualization
     • Requirements of supportive technologies
       • Networking
       • Visualization

  5. Scientific Workflow Cycle
     [Diagram labels: Data Generation, Data Analysis, Data Visualization, Scientific Data Management; three connections labeled "workflow"]

  6. Section 4: Data Management Technologies and Gap Analysis
     1) Workflow, dataflow, data transformation
       • Workflow specification
       • Workflow execution in distributed systems
       • Monitoring of long-running workflows
       • Adapting components to the framework
     Workflow layers
       • Control-flow layer
       • Application and Software Tools layer
       • I/O System layer
       • Storage and Network Resource layer

  7. Astrophysical Simulation Workflow Cycle
     Application Layer
       • Start New Simulation?
       • Run Simulation batch job on capability system
       • Continue Simulation?
       • Simulation generates checkpoint files
       • Archive checkpoint files to HPSS
       • Migrate subset of checkpoint files to local cluster
       • Vis & Analysis on local Beowulf cluster
     Parallel I/O Layer
       • Parallel HDF5
     Storage Layer
       • PVFS or Lustre
       • HPSS
       • GPFS
       • MSS, Disks, & OS
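
The application-layer cycle on this slide can be sketched as a small loop. This is only a toy model under stated assumptions: the class name `SimulationCycle`, its parameters (`max_runs`, `files_per_run`, `migrate_every`), and the file names are all hypothetical, and plain Python lists stand in for HPSS and the local cluster. No real batch system, HDF5, or HPSS calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationCycle:
    """Toy model of the slide's workflow cycle; all names here are hypothetical."""
    max_runs: int = 3        # assumed stopping rule for "Continue Simulation?"
    files_per_run: int = 4   # assumed number of checkpoint files per batch job
    migrate_every: int = 2   # assumed rule: migrate every 2nd checkpoint file
    archive: list = field(default_factory=list)        # stands in for HPSS
    local_cluster: list = field(default_factory=list)  # stands in for the local cluster
    log: list = field(default_factory=list)

    def run_batch_job(self, run: int) -> list:
        """Pretend to run one batch job and return its checkpoint file names."""
        self.log.append(f"run {run}")
        return [f"run{run}_chk{i}.h5" for i in range(self.files_per_run)]

    def execute(self) -> list:
        run = 0
        while run < self.max_runs:                        # "Continue Simulation?"
            checkpoints = self.run_batch_job(run)         # simulation generates checkpoints
            self.archive.extend(checkpoints)              # archive all checkpoint files
            subset = checkpoints[::self.migrate_every]    # pick a subset to migrate
            self.local_cluster.extend(subset)             # migrate subset to local cluster
            self.log.append(f"analyze {len(subset)} files")  # vis & analysis step
            run += 1
        return self.log


if __name__ == "__main__":
    cycle = SimulationCycle(max_runs=2)
    for entry in cycle.execute():
        print(entry)
```

Every checkpoint file is archived, while only a subset is copied for visualization and analysis, which mirrors the split between the archive and the local cluster on the slide.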

  8. Section 4: Data Management Technologies and Gap Analysis
     2) Storage, data movement, grid, networks
       • Dynamic data storage and caching
       • Robust terabyte-scale data movers
       • Dataflow automation between components
       • Multi-resolution data movement
     3) Metadata management and cataloging
       • Unified data models and APIs
       • Annotation, ontologies and provenance
       • Metadata requirements for workflows

  9. Section 4: Data Management Technologies and Gap Analysis
     4) Efficient access and query, data integration
       • Parallel and random I/O
       • Large-scale feature-based indexing
       • Query processing over files
       • Data integration
     5) Integrated analysis environment, visualization
       • A single environment for packaged tools and user software
       • A single environment for a variety of tools: statistical software, cluster analysis, …
       • Coupling with visualization tools
       • Work with parallel I/O

  10. Section 5: Prioritization, Cost, and Management
     • Prioritization process
       • Reasons based on current barriers and needs
       • Reasons based on long term projections
     • Practical budgeting considerations
       • Research and development
       • Hardening and packaging
       • Deployment and maintenance
     • Recommendations and program planning
       • Prioritization
       • Cost
       • Management Structure

  11. Gap & Cost Matrix
     Columns: Research and Development; Hardening and Packaging; Deployment and maintenance
     Rows:
       • Workflow, dataflow, data transformation
       • Storage, data movement, grid, networks
       • Metadata management and cataloging
       • Efficient access and query, data integration
       • Integrated analysis environment, visualization

  12. Discussion items
     Deployment and maintenance; Research and Development; Hardening and Packaging
     • Control flow tier
       • Granularity of tasks, sub-workflows
       • Task invocation mechanisms: Web Services, CORBA, wrappers, callbacks
       • Human tasks: notifications and alerts, steering
       • Dataflow streaming granularity
     • Work tier
       • Workflow engine for scientific applications
       • Dataflow management
       • Effect of dataflow on the control flow
       • Failure detection and recovery
       • Performance and bottleneck issues

  13. The End