SDM workshop Strawman report: History and Progress and Goal

History
  • Original plan
    • Identify Scientific Applications Data Management needs
    • Focus on different application types: simulations, experiments/observations
    • Identify Data Management technologies
    • Identify other relevant Computer Science technologies
    • Identify Gaps, Cost, Priorities
  • In the Extended EOC we came up with a draft report
    • Based on extensive discussions of application needs
    • Identified the scientific investigation process (workflow)
    • Identified technologies needed
    • Assigned writing to individuals
Section 2: Application sciences motivation and needs
  • Astrophysics
  • Biology
  • Climate Modeling
  • Combustion
  • Fusion Energy Science
  • High Energy and Nuclear Physics
  • Nanotechnology
Section 3: The scientific investigation process
  • Distributed Scientific Workflows
  • Scientific Data Management Phases
    • Data Generation
    • Data Analysis
    • Data Visualization
  • Foundation of scientific data management technology
    • Workflow, dataflow, data transformation
    • Storage, data movement, grid, networks
    • Metadata management and cataloging
    • Efficient access and query, data integration
    • Integrated analysis environment, visualization
  • Requirements of supportive technologies
    • Networking
    • Visualization
Scientific Workflow Cycle
  • Diagram labels
    • Scientific Data Management
    • Data Generation
    • Data Analysis
    • Data Visualization
    • workflow (label shown three times)

Section 4: Data Management Technologies and Gap Analysis

1) Workflow, dataflow, data transformation

  • Workflow specification
  • Workflow execution in distributed systems
  • Monitoring of long-running workflows
  • Adapting components to the framework
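
As a rough illustration of workflow specification and execution, a workflow can be written down as a dependency graph and run in an order that respects it. The sketch below is only an assumption about what such a specification might look like: the task names and the `SPEC`, `execution_order`, and `run_workflow` names are hypothetical and do not come from the report. It uses Python's standard `graphlib` module, and the log it returns is a very simple stand-in for monitoring.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow specification: each task maps to the set of
# tasks that must finish before it can start.
SPEC = {
    "generate": set(),
    "transform": {"generate"},
    "analyze": {"transform"},
    "visualize": {"transform"},
}


def execution_order(spec):
    """Return one valid execution order for the specified workflow."""
    return list(TopologicalSorter(spec).static_order())


def run_workflow(spec, tasks):
    """Run each task callable in dependency order and keep a status log."""
    log = []
    for name in execution_order(spec):
        tasks[name]()                 # invoke the component for this step
        log.append((name, "done"))    # minimal monitoring record
    return log
```

A real distributed workflow engine would dispatch independent tasks (here `analyze` and `visualize`) in parallel; this sketch runs them sequentially for clarity.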

Workflow layers

  • Control-flow layer
  • Application and Software Tools layer
  • I/O System layer
  • Storage and Network Resource layer
Astrophysical Simulation Workflow Cycle
  • Application Layer
    • Start New Simulation?
    • Run Simulation (batch job on capability system)
    • Continue Simulation?
    • Simulation generates checkpoint files
    • Archive checkpoint files to HPSS
    • Migrate subset of checkpoint files to local cluster
    • Vis & Analysis on local Beowulf cluster
  • Parallel I/O Layer
    • Parallel HDF5
  • Storage Layer
    • PVFS or LUSTRE
    • HPSS
    • GPFS
    • MSS, Disks, & OS
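
The application-layer steps of the astrophysical simulation cycle (run, generate checkpoint files, archive them, migrate a subset, analyze) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the workflow the slide authors implemented: the file names, the `keep_every` subset rule, and every function and class name are hypothetical, and plain Python lists stand in for HPSS and the local cluster.

```python
from dataclasses import dataclass, field


@dataclass
class CycleState:
    """Hypothetical bookkeeping for the workflow cycle."""
    archived: list = field(default_factory=list)   # stands in for HPSS
    migrated: list = field(default_factory=list)   # stands in for the local cluster
    analyzed: list = field(default_factory=list)   # files seen by vis & analysis


def run_simulation(segment, n_checkpoints=4):
    """Stand-in for a batch job: return the checkpoint file names it 'wrote'."""
    return [f"run{segment:02d}_chk{i:03d}.h5" for i in range(n_checkpoints)]


def workflow_cycle(n_segments, keep_every=2):
    """Walk the cycle: run, archive all checkpoints, migrate a subset, analyze."""
    state = CycleState()
    segment = 0
    while segment < n_segments:              # the "Continue Simulation?" decision
        checkpoints = run_simulation(segment)
        state.archived.extend(checkpoints)   # archive every checkpoint file
        subset = checkpoints[::keep_every]   # assumed rule: keep every k-th file
        state.migrated.extend(subset)        # migrate only the subset
        state.analyzed.extend(subset)        # vis & analysis on migrated files
        segment += 1
    return state
```

For example, `workflow_cycle(2)` archives eight checkpoint names and migrates four of them for analysis.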

Section 4: Data Management Technologies and Gap Analysis

2) Storage, data movement, grid, networks

  • Dynamic data storage and caching
  • Robust terabyte-scale data movers
  • Dataflow automation between components
  • Multi-resolution data movement

3) Metadata management and cataloging

  • Unified data models and APIs
  • Annotation, ontologies and provenance
  • Metadata requirements for workflows
Section 4: Data Management Technologies and Gap Analysis

4) Efficient access and query, data integration

  • Parallel and random I/O
  • Large-scale feature-based indexing
  • Query processing over files
  • Data integration

5) Integrated analysis environment, visualization

  • A single environment for packaged tools and user software
  • A single environment for a variety of tools: statistical software, cluster analysis, …
  • Coupling with visualization tools
  • Work with parallel I/O
Section 5: Prioritization, Cost, and Management
  • Prioritization process
    • Reasons based on current barriers and needs
    • Reasons based on long term projections
  • Practical budgeting considerations
    • Research and development
    • Hardening and packaging
    • Deployment and maintenance
  • Recommendations and program planning
    • Prioritization
    • Cost
  • Management Structure
Gap & Cost Matrix
  • Columns (cost categories)
    • Research and Development
    • Hardening and Packaging
    • Deployment and maintenance
  • Rows (technology areas)
    • Workflow, dataflow, data transformation
    • Storage, data movement, grid, networks
    • Metadata management and cataloging
    • Efficient access and query, data integration
    • Integrated analysis environment, visualization
Discussion items

Cost categories shown: Research and Development; Hardening and Packaging; Deployment and maintenance

  • Control flow tier
    • Granularity of tasks, sub-workflows
    • Task invocation mechanisms: Web Services, CORBA, wrappers, callbacks
    • Human tasks: Notifications and alerts, steering
    • Dataflow streaming granularity
  • Work Tier
    • Workflow engine for scientific applications
    • Dataflow management
    • Effect of dataflow on the control flow
    • Failure detection and recovery
    • Performance and bottleneck issues
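
One simple reading of the failure detection and recovery item is to detect a failed task through an exception and recover by retrying it a bounded number of times. The sketch below assumes that reading; the `run_with_recovery` name, the choice of `RuntimeError` as the failure signal, and the retry limit are hypothetical, and a production workflow engine would likely also restart from checkpoints or reroute work.

```python
def run_with_recovery(task, max_attempts=3):
    """Run a task callable, retrying on failure up to max_attempts times.

    Returns a (result, attempts_used) pair, or raises RuntimeError once
    every attempt has failed.
    """
    attempts = 0
    last_error = None
    while attempts < max_attempts:
        attempts += 1
        try:
            return task(), attempts        # success: report how many tries it took
        except RuntimeError as exc:        # failure detected
            last_error = exc               # remember it and try again
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

Returning the attempt count alongside the result gives a monitoring layer a cheap signal about which tasks are unreliable.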