Beth Plale, Director, Center for Data and Search Informatics, School of Informatics

Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Issues in Data Management, Metadata, and Search. Beth Plale, Director, Center for Data and Search Informatics, School of Informatics, Indiana University, US.

Presentation Transcript


  1. Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Issues in Data Management, Metadata, and Search. Beth Plale, Director, Center for Data and Search Informatics, School of Informatics, Indiana University, US

  2. Introduction
     • Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves.
     • In this talk we describe key data management aspects of the project, specifically those being carried out in the Center for Data and Search Informatics at Indiana University.

  3. Infrastructure is portal based - that is, all services are available through a web server

  4. e-Science Gateway Architecture [1]
     • User's Grid Desktop; Grid Portal Server
     • Gateway Services: Proxy Certificate Server (Vault), Application Deployment, Workflow engine, Resource Registry, Community & User Metadata Catalog, Events & Messaging, Resource Broker
     • Core Grid Services: Security Services, Information Services, Self Management, Resource Management, Execution Management, Data Services
     • Resource Virtualization (OGSA)
     • Compute Resources, Data Resources, Instruments & Sensors
     [1] Service Oriented Architectures for Science Gateways on Grid Systems, Gannon, D., et al.; ICSOC, 2005

  5. Typical weather forecast runs as workflow
     • Stages: Pre-Processing, Assimilation, Forecast, Visualization
     • Input data: terrain data files; ETA, RUC, GFS data; surface data files; radar data (level II); radar data (level III); surface, upper air mesonet & wind profiler data; satellite data
     • Workflow components: arpstrn, Ext2arps-ibc, Ext2arps-lbc, arpssfc, 88d2arps, nids2arps, mci2arps, ADAS assimilation, arps2wrf, WRF, wrf2arps, arpsplot, IDV viz
     • ~400 data products consumed and produced (transformed) during the workflow lifecycle

  6. To set up a workflow experiment, we select a workflow (not shown), then set model parameters here

  7. Supported community data collections

  8. Data Integration
     • Local view: crosswalk point of presence supports crawling and publishes a difference list as LEAD Metadata Schema (LMS) documents
     • Globally integrated view: Data Catalog Service
        • Crawler crawls catalogs
        • Builds index of results
        • Web service API
        • Boolean search query with spatial/temporal support
        • Returns list of results as LEAD Metadata Schema documents
        • Index: XMLDB native XML database and Lucene for index
     • Data sources reached through crosswalks
        • CASA radar collection, months (ftp)
        • Unidata IDD distribution, latest 3 days (XML web server)
        • Level II and III radar, latest 3 days (XML web server)
        • ETA, NCEP, NAM, METAR, etc. (XML web server)
     • Sites labeled on the diagram: Oklahoma, Indiana, Colorado
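
The Boolean search with spatial/temporal support mentioned on slide 8 could look roughly like the following. This is a minimal sketch, not the Data Catalog Service API: the `Record` fields, the `search` function, its parameters, and the sample records are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    """Hypothetical stand-in for one indexed metadata document."""
    name: str
    keywords: frozenset
    lat: float
    lon: float
    time: datetime


def search(records, all_of=(), any_of=(), none_of=(),
           bbox=None, start=None, end=None):
    """Boolean keyword filter combined with optional spatial and temporal bounds.

    bbox is (south, west, north, east) in degrees; start/end are datetimes.
    """
    hits = []
    for r in records:
        if not set(all_of) <= r.keywords:              # AND terms
            continue
        if any_of and not (set(any_of) & r.keywords):  # OR terms
            continue
        if set(none_of) & r.keywords:                  # NOT terms
            continue
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= r.lat <= north and west <= r.lon <= east):
                continue
        if start is not None and r.time < start:
            continue
        if end is not None and r.time > end:
            continue
        hits.append(r.name)
    return hits


# Illustrative records only; names, coordinates, and dates are made up.
CATALOG = [
    Record("radar-ok", frozenset({"radar", "level2"}), 35.2, -97.4, datetime(2007, 3, 1)),
    Record("metar-in", frozenset({"metar", "surface"}), 39.2, -86.5, datetime(2007, 3, 2)),
    Record("radar-co", frozenset({"radar", "level3"}), 40.0, -105.3, datetime(2007, 3, 3)),
]

radar_in_box = search(CATALOG, all_of=["radar"], bbox=(33.0, -103.0, 37.0, -94.0))
recent = search(CATALOG, any_of=["metar", "level3"], start=datetime(2007, 3, 2))
```

A production catalog would push these predicates into the index (the slide names Lucene and a native XML database) instead of scanning records in memory.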

  9. LEAD Personal Workspace
     • Cyberinfrastructure (CI) extends the user's desktop to incorporate a vast data analysis space.
     • As users go about doing scientific experiments, the CI manages back-end storage and compute resources.
     • The portal provides ways to explore, search, and discover this data.
     • Metadata about experiments is largely automatically generated and highly searchable.
     • Metadata describes each data object (the file) in application-rich terms and provides the URI of a data service that can resolve an abstract unique identifier to a real, on-line data “file”.

  10. Searching for experiments using model configuration parameters: 2 attributes selected

  11. Searching for experiments based on model parameters: 4 returned experiments; one displayed

  12. How forecast model configuration parameters are stored in the personal catalog: the forecast model configuration file is handed off to a plugin that shreds the XML document into queryable attributes associated with the experiment
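
A minimal sketch of what "shredding" an XML configuration document into queryable attributes might look like. The element names in `CONFIG_XML`, the experiment identifier, and the `shred` and `find_experiments` helpers are hypothetical; the actual plugin and catalog schema are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; element names are illustrative only.
CONFIG_XML = """
<modelConfig model="WRF">
  <grid>
    <dx>1000</dx>
    <nx>203</nx>
  </grid>
  <physics>
    <mp_physics>3</mp_physics>
  </physics>
</modelConfig>
"""


def shred(xml_text, experiment_id):
    """Flatten an XML document into (experiment, path, value) rows."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # XML attributes become their own queryable rows.
        for name, value in elem.attrib.items():
            rows.append((experiment_id, f"{here}/@{name}", value))
        text = (elem.text or "").strip()
        # Only leaf elements with text carry a parameter value.
        if text and len(elem) == 0:
            rows.append((experiment_id, here, text))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return rows


def find_experiments(rows, path, value):
    """Return experiments whose shredded attributes match path == value."""
    return sorted({exp for exp, p, v in rows if p == path and v == value})


rows = shred(CONFIG_XML, "exp-001")
matches = find_experiments(rows, "/modelConfig/grid/dx", "1000")
```

Storing one row per parameter is what makes searches such as "experiments where dx is 1000" possible without reparsing each configuration file.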

  13. What & Why of Provenance
      • Diagram: Application A consumes Data.In.1, Data.In.2, and Config.A, and produces Data.Out.1
      • Derivation history of a data product
         • What (when, where) application created the data
         • Its parameters & configuration
         • Other input data used by application
      • Workflow is composed from building blocks like these, so provenance for data used in a workflow gives a workflow trace
      • Data Provenance::Data.Out.1
         • Process: Application_A
         • Timestamp: 2006-06-23T12:45:23
         • Host: tyr20.cs.indiana.edu …
         • Input: Data.In.1, Data.In.2
         • Config: Config.A
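
The provenance record for Data.Out.1 on slide 13 can be modeled as a small data structure, and chaining such records yields a workflow trace. The sketch below is illustrative only: the `ProvenanceRecord` class and `trace` function are assumptions, and the second record (Data.Out.2, Application_B, Config.B) is invented to show how a chain forms.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    product: str
    process: str
    timestamp: str
    host: str
    inputs: list = field(default_factory=list)
    config: str = ""


records = {
    # Values taken from the slide.
    "Data.Out.1": ProvenanceRecord(
        product="Data.Out.1",
        process="Application_A",
        timestamp="2006-06-23T12:45:23",
        host="tyr20.cs.indiana.edu",
        inputs=["Data.In.1", "Data.In.2"],
        config="Config.A",
    ),
    # Hypothetical downstream step, added only to illustrate chaining.
    "Data.Out.2": ProvenanceRecord(
        product="Data.Out.2",
        process="Application_B",
        timestamp="2006-06-23T13:02:10",
        host="tyr20.cs.indiana.edu",
        inputs=["Data.Out.1"],
        config="Config.B",
    ),
}


def trace(product, records):
    """Return the processes that contributed to a product, upstream first."""
    rec = records.get(product)
    if rec is None:
        return []  # raw input with no recorded provenance
    steps = []
    for parent in rec.inputs:
        for step in trace(parent, records):
            if step not in steps:
                steps.append(step)
    steps.append(rec.process)
    return steps


workflow_trace = trace("Data.Out.2", records)
```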

  14. The What & Why of Provenance
      • Trace Workflow Execution
         • What services were used during workflow execution?
         • Were all steps of the execution successful?
      • Audit Trail
         • What resources were used during workflow execution?
      • Data Quality & Reuse
         • What applications were used to derive data products?
         • Which workflows use a certain data product?
      • Attribution
         • Who performed the experiment?
         • Who owns the workflow & data products?
      • Discovery
         • Locate data generated by a workflow
         • Locate workflows containing App-X that succeeded

  15. Collection Framework
      • Workflow Engine orchestrates a Workflow Instance (Service 1, Service 2, …, Service 9, Service 10); 10 data products consumed (10C) and produced (10P) by each service
      • Provenance activities are published as notifications
         • Workflow-Started & -Finished activities
         • Application-Started & -Finished, Data-Produced & -Consumed activities
      • Message bus: WS-Messenger Notification Broker (WS-Eventing)
      • Karma Provenance Service: Provenance Listener (subscribes and listens to activity notifications), Activity DB, Provenance Query API (service API)
      • Provenance Browser Client queries for workflow, process, & data provenance
      A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., ICWS Conference, 2006

  16. Generating Karma Provenance Activities
      • Instrument applications to publish provenance
      • Simple Java library available to
         • Create provenance activities
         • Publish activities as messages
      • Jython “wrapper” scripts use the library to publish provenance & invoke the application
      • Generic Factory toolkit easily converts applications to web services
         • Built-in provenance instrumentation

  17. Sample Sequence of Activities
      appStarted(App1)
      info(‘App1 starting’)
      fileReceiveStarted(File1)
      -- do gridftp get to stage input file File1 --
      fileReceiveFinished(File1)
      fileConsumed(File1)
      computationStarted(Code1)
      -- call Fortran code Code1 to process input files --
      computationFinished(Code1)
      fileProduced(File2)
      fileSendStarted(File2)
      -- do gridftp put to save output file File2 --
      fileSendFinished(File2)
      publishURL(File2)
      appFinishedSuccess(App1, File2) | appFinishedFailed(App1, ERR)
      flush()
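
The wrapper pattern behind the sequence on slide 17 can be sketched as follows. This is not the Karma library: `ActivityLog` and `run_wrapped` are hypothetical stand-ins that only reuse the activity names from the slide, the file staging steps are omitted, and the computation is a plain Python callable rather than a Fortran code.

```python
class ActivityLog:
    """Stand-in notifier: buffers activities and publishes them on flush()."""

    def __init__(self):
        self.buffer = []
        self.published = []

    def emit(self, activity, *args):
        self.buffer.append((activity, args))

    def flush(self):
        self.published.extend(self.buffer)
        self.buffer.clear()


def run_wrapped(log, app, code, in_file, out_file, compute):
    """Invoke compute() and bracket it with provenance activities."""
    log.emit("appStarted", app)
    try:
        log.emit("fileConsumed", in_file)
        log.emit("computationStarted", code)
        compute()
        log.emit("computationFinished", code)
        log.emit("fileProduced", out_file)
        log.emit("appFinishedSuccess", app, out_file)
        return True
    except Exception as err:
        # Any failure in the wrapped computation ends the sequence here.
        log.emit("appFinishedFailed", app, str(err))
        return False
    finally:
        # Always flush so the final activity is published either way.
        log.flush()


log = ActivityLog()
succeeded = run_wrapped(log, "App1", "Code1", "File1", "File2", lambda: None)
```

Flushing in a `finally` clause mirrors the trailing `flush()` on the slide: the success or failure activity is published even when the wrapped code raises.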

  18. Performance perturbation

  19. Scalability Study [4]
      [4] Performance Evaluation of the Karma Provenance Framework for Scientific Workflows, Simmhan, Y., et al.; IPAW Workshop, 2006

  20. Resource monitoring as two planes of control

  21. Resource adaptation illustrated (1): Resource Changes
      • Components: Portal, Workflow Configuration Service, Resource Management Services, LEAD BPEL Workflow Engine, App. Factory, Application Service (per task), Event Broker, Sensor, Actuator
      • myLEAD subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS
      • Normal flow: DAG and workflow; create services; launch services; run workflow one step at a time; run job; job notification; workflow and file status
      • Adaptation: resource has failed, need to reschedule remaining parts of workflow; stop the earlier workflow; replan the workflow
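
The "reschedule remaining parts of workflow" step on slide 21 can be illustrated with a small replanning sketch. It assumes the workflow is a DAG of tasks with known prerequisites; the task names, resource names, and the round-robin assignment in `replan` are hypothetical and do not represent the LEAD scheduling policy. It uses `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

# task -> set of prerequisite tasks (hypothetical names based on workflow stages)
DAG = {
    "preprocess": set(),
    "assimilate": {"preprocess"},
    "forecast": {"assimilate"},
    "visualize": {"forecast"},
}


def replan(dag, finished, resources, failed):
    """Reassign unfinished tasks, in dependency order, to surviving resources."""
    usable = [r for r in resources if r != failed]
    if not usable:
        raise RuntimeError("no usable resource left")
    # Drop finished tasks and any dependencies they already satisfy.
    remaining = {
        task: deps - finished
        for task, deps in dag.items()
        if task not in finished
    }
    order = list(TopologicalSorter(remaining).static_order())
    # Simple round-robin placement; a real planner would weigh load and deadlines.
    return [(task, usable[i % len(usable)]) for i, task in enumerate(order)]


plan = replan(DAG, finished={"preprocess"},
              resources=["siteA", "siteB"], failed="siteA")
```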

  22. Resource adaptation illustrated (2): Implement Adverse Weather Policy
      • Components: Portal, Workflow Configuration Service, Resource Management Services, LEAD BPEL Workflow Engine, App. Factory, Application Service (per task), Event Broker, Sensor, Actuator
      • myLEAD subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS
      • Normal flow: DAG and workflow; create services; launch services; run workflow one step at a time; run job; job notification; workflow and file status
      • Adaptation: weather change; plan resources for sub-components; implement strict deadline scheduling; change priorities for users, e.g. Lavanya’s workflow gets lower priority

  23. Resource adaptation illustrated (3): Services
      • Components: Portal, Workflow Configuration Service, Resource Management Services, LEAD BPEL Workflow Engine, App. Factory, Application Service (per task), Event Broker, Sensor, Actuator
      • myLEAD subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS
      • Normal flow: DAG and workflow; create services; launch services; run workflow one step at a time; run job; job notification
      • Adaptation: “Service Overloaded” and “Replicate Service” messages

  24. Recent LEAD Highlight: Spring 2007 Weather Challenge
      • Forecast contest, February - March 2007
      • Students ran …..
      Statistics from the Challenge
      • Approximately 50 participants
      • 6696 jobs submitted to TeraGrid (52925 TG SUs)
      • Generated about 2.6 TB of data, which is archived at Indiana University and available through each participating user’s personal workspace catalog
      • Computational models run on TeraGrid resources. Portal and persistent back-end services run at Indiana University. Data storage resources (45 TB) for user-generated data products provided by Indiana University.

  25. Future Work
      • Optimizations and refinements: file movement, revisit metadata schema, improve crosswalks with an eye to reduced maintenance
      • Personal predictor: packaging the LEAD framework into a single 8-16 core multicore machine for individual purchase

  26. Thanks to the whole LEAD team, and the National Science Foundation for their support. For more information, feel free to contact me at plale@indiana.edu or go to http://www.leadportal.org
