Cyberinfrastructure for Coastal Forecasting and Change Analysis

Presentation Transcript


  1. Cyberinfrastructure for Coastal Forecasting and Change Analysis Gagan Agrawal Hakan Ferhatosmanoglu Xutong Niu Ron Li Keith Bedford The Ohio State University

  2. Context
     • New Award from the Office of Cyberinfrastructure (OCI)
     • Under the Cyberinfrastructure for Environmental Observatories Program
     • September 2006 – August 2009, total amount $1,400,000
     • Involves 2 Computer Scientists and 2 Environmental Scientists
         • G. Agrawal (PI): Grid Middleware
         • H. Ferhatosmanoglu: Databases
         • K. Bedford: Great Lakes Nowcasting/Forecasting
         • R. Li: Coastal Erosion Analysis

  3. Coastal Forecasting and Change Detection (Lake Erie)

  4. Project Premise
     • Limitations of Current Environmental Observation Systems
         • Tightly coupled systems
         • No reuse of algorithms
         • Very hard to experiment with new algorithms
         • Closely tied to existing resources
     • Our claim
         • Emerging trends towards web services and grid services can help

  5. Challenges
     • Existing grid middleware systems have not considered
         • Processing of streaming data
         • Data integration issues
     • The applications involved need techniques for multi-modal data fusion, query planning, and data mining
     • These techniques need to be implemented as grid or web services

  6. Proposed Infrastructure and Collaboration

  7. Application Details: Great Lakes Nowcasting/Forecasting
     • GLOS: Great Lakes Observing System
         • Co-designer/project manager: K. Bedford, a co-PI on this project
         • Collaboration with NOAA
     • Limitation: hard-wired
         • Cannot incorporate new streams or algorithms
     • Plan: create an implementation using our middleware for streaming data

  8. Application Details: Coastal Erosion Prediction and Analysis
     • Focus: Erosion along Lake Erie Shore
         • Serious problem
         • Substantial Economic Losses
     • Prediction requires data from
         • Variety of Satellites
         • In-situ sensors
         • Historical Records
     • Challenges
         • Analyzing distributed data
         • Data Integration/Fusion

  9. Middleware Developed at Ohio State
     • Automatic Data Virtualization Framework
         • Enabling processing and integration of data in low-level formats
     • GATES (Grid-based AdapTive Execution on Streams)
         • Processing of distributed data streams
     • FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid)
         • Supporting scalable data analysis on remote data

  10. Automatic Data Virtualization: Motivation
     • Access mechanisms for remote repositories
         • Complex low-level formats make accessing and processing of data difficult
         • Main desired functionality: the ability to select, download, and process a subset of data
     • Sensor data
         • Again, low-level data
         • Need to convert formats
     • Need a flexible architecture

  11. Data Virtualization
     [Figure: a dataset, its Data Virtualization (an abstract view of data), and the Data Service built on it]
     • As defined by the Global Grid Forum’s DAIS working group:
         • A Data Virtualization describes an abstract view of data.
         • A Data Service implements the mechanism to access and process data through the Data Virtualization.

  12. Our Approach: Automatic Data Virtualization
     • Automatically create data services
         • A new application of compiler technology
     • A metadata descriptor describes the layout of data on a repository
     • An abstract view is exposed to the users
     • Two implementations:
         • Relational/SQL-based
         • XML/XQuery-based
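A minimal sketch of the descriptor-driven idea, assuming a flat binary file of fixed-size records. The descriptor format, the field names (station, time, water_level), and make_data_service are hypothetical. The actual framework uses compiler techniques to generate services and exposes SQL or XQuery; this sketch simply builds a closure at runtime and takes a Python predicate in place of a query.

```python
import struct
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical metadata descriptor: (field name, struct format code) pairs
# describing one fixed-size, little-endian record in a flat binary file.
DESCRIPTOR: List[Tuple[str, str]] = [("station", "i"), ("time", "d"), ("water_level", "f")]

Row = Dict[str, object]

def make_data_service(descriptor: List[Tuple[str, str]]):
    """Turn a layout descriptor into a select() function over raw bytes."""
    names = [name for name, _ in descriptor]
    layout = struct.Struct("<" + "".join(code for _, code in descriptor))

    def select(raw: bytes,
               where: Callable[[Row], bool] = lambda row: True,
               columns: Optional[List[str]] = None) -> Iterator[Row]:
        cols = columns or names
        # Decode each record into the abstract (named-column) view, then
        # filter rows and project columns, much like SELECT ... WHERE.
        for values in layout.iter_unpack(raw):
            row = dict(zip(names, values))
            if where(row):
                yield {c: row[c] for c in cols}

    return select

# Usage: three packed records, then select station 1's time and water level.
record = struct.Struct("<idf")
raw = record.pack(1, 0.0, 1.5) + record.pack(2, 60.0, 2.25) + record.pack(1, 120.0, 1.75)
select = make_data_service(DESCRIPTOR)
rows = list(select(raw, where=lambda r: r["station"] == 1, columns=["time", "water_level"]))
```

The user only sees named columns; changing the on-disk layout means changing the descriptor, not the query code.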

  13. Streaming Data Model
     • Continuous data arrival and processing
     • Emerging model for data processing
         • Sources that produce data continuously: sensors, long-running simulations
         • Critical in Environmental Observatories
     • Active topic in many computer science communities
         • Databases
         • Data Mining
         • Networking …
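To make the streaming model concrete, the sketch below consumes readings one at a time as they arrive and emits an updated result after each one, without holding the whole stream in memory. The sliding-window mean, sliding_mean, and the sensor() stand-in are illustrative choices, not components of the systems described here.

```python
from collections import deque
from typing import Iterable, Iterator

def sliding_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings after each arrival."""
    buf: deque = deque(maxlen=window)
    total = 0.0
    for x in stream:
        if len(buf) == window:
            total -= buf[0]  # this reading is about to be evicted by append()
        buf.append(x)
        total += x
        yield total / len(buf)

def sensor():
    """Hypothetical stand-in for a continuous source; a real one would not end."""
    yield from [1.0, 2.0, 3.0, 4.0]

means = list(sliding_mean(sensor(), window=2))  # [1.0, 1.5, 2.5, 3.5]
```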

  14. Need for a Grid-Based Stream Processing Middleware
     • Application developers interested in data stream processing
         • Would like to have grid standards and interfaces, and the adaptation function, abstracted away
         • Would like to focus on algorithms only
     • GATES is a middleware for grid-based, self-adapting data stream processing
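A minimal sketch of this division of labor, assuming a simple stage model: the application supplies only per-item stage functions, and a stand-in runner (run_pipeline, hypothetical and not the GATES API) chains them. In GATES the middleware would also place stages on grid resources and handle communication between them, which this sketch omits.

```python
from typing import Callable, Iterable, Iterator, List

# A stage maps one input item to zero or more output items.
Stage = Callable[[float], Iterable[float]]

def _apply(stage: Stage, stream: Iterable[float]) -> Iterator[float]:
    for item in stream:
        yield from stage(item)

def run_pipeline(source: Iterable[float], stages: List[Stage]) -> Iterator[float]:
    """Hypothetical stand-in for the middleware: chains stages lazily so that
    application code never touches transport or grid interfaces."""
    stream: Iterable[float] = source
    for stage in stages:
        stream = _apply(stage, stream)
    return iter(stream)

# Application code: only the per-item algorithms.
def drop_invalid(level: float) -> List[float]:
    return [level] if level >= 0 else []   # discard negative readings

def to_cm(level: float) -> List[float]:
    return [level * 100]                   # metres -> centimetres

out = list(run_pipeline([0.5, -1.0, 1.25], [drop_invalid, to_cm]))  # [50.0, 125.0]
```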

  15. Adaptation for Real-time Processing
     • Analysis on streaming data is approximate
     • The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters)
         • Sampling rate
         • Size of summary structure
     • Application developers can expose these parameters and a range of values for each
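One way an application might expose an adaptation parameter together with its permitted range, and how a middleware-side policy could move it within that range. The class, the load thresholds (0.8 and 0.2), and the scaling factors are illustrative assumptions, not the policy GATES actually uses.

```python
from dataclasses import dataclass

@dataclass
class AdaptationParameter:
    """A knob exposed by the application, with the range it may take."""
    name: str
    low: float
    high: float
    value: float

    def scale(self, factor: float) -> None:
        # Move the value, but never outside the declared range.
        self.value = min(self.high, max(self.low, self.value * factor))

def adapt(param: AdaptationParameter, queue_len: int, capacity: int) -> None:
    """Illustrative policy: trade accuracy for speed when the input queue is
    filling up, and recover accuracy when there is spare capacity."""
    load = queue_len / capacity
    if load > 0.8:
        param.scale(0.5)    # falling behind: halve the sampling rate
    elif load < 0.2:
        param.scale(1.25)   # keeping up easily: sample more

rate = AdaptationParameter("sampling_rate", low=0.1, high=1.0, value=1.0)
adapt(rate, queue_len=90, capacity=100)   # 1.0 -> 0.5
adapt(rate, queue_len=95, capacity=100)   # 0.5 -> 0.25
adapt(rate, queue_len=5, capacity=100)    # 0.25 -> 0.3125
```

The developer declares only the range; where the value sits inside it at any moment is the middleware's decision.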

  16. FREERIDE-G: Supporting Distributed Data-Intensive Science
     [Figure: a user, a data repository cluster, and a compute cluster]

  17. Challenges for Application Development
     • Analysis of large amounts of disk-resident data
     • Incorporating parallel processing into analysis
     • Processing needs to be independent of other elements and easy to specify
     • Coordination of storage, network, and computing resources is required
     • Transparency of data retrieval, staging, and caching is desired

  18. FREERIDE-G Goals
     • Support High-End Processing
         • Enable efficient processing of large-scale data mining computations
     • Ease Use of Parallel Configurations
         • Support shared and distributed memory parallelization starting from a common high-level interface
     • Hide Details of Data Movement and Caching
         • Data staging and caching (when feasible/appropriate) need to be transparent to the application developer
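A sketch of how one high-level interface can serve different parallel configurations, assuming a generalized-reduction style (an assumption here; the slides do not name FREERIDE-G's interface). The developer writes a local reduction over one chunk and a combine step, and the runtime decides where chunks are processed. A thread pool stands in for that runtime; the function names and the per-segment mean example are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Reduction object: shoreline segment -> (sum of measurements, count).
Reduction = Dict[str, Tuple[float, int]]

def local_reduce(chunk: List[Tuple[str, float]]) -> Reduction:
    """Application code: fold one chunk of records into a private reduction object."""
    acc: Reduction = {}
    for segment, value in chunk:
        s, n = acc.get(segment, (0.0, 0))
        acc[segment] = (s + value, n + 1)
    return acc

def combine(a: Reduction, b: Reduction) -> Reduction:
    """Application code: merge two reduction objects; must be associative and commutative."""
    out = dict(a)
    for segment, (s, n) in b.items():
        s0, n0 = out.get(segment, (0.0, 0))
        out[segment] = (s0 + s, n0 + n)
    return out

def run(chunks: List[List[Tuple[str, float]]], workers: int = 4) -> Reduction:
    """Stand-in runtime: threads here, but the two functions above do not
    depend on whether chunks are reduced by threads or by remote nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_reduce, chunks))
    result: Reduction = {}
    for part in partials:
        result = combine(result, part)
    return result

chunks = [[("east", 1.0), ("west", 2.0)], [("east", 3.0)], [("west", 6.0), ("east", 5.0)]]
totals = run(chunks)
means = {seg: s / n for seg, (s, n) in totals.items()}  # {"east": 3.0, "west": 4.0}
```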

  19. Data Analysis Services
     • Multi-model Multi-Sensor Data Integration
         • Built on our Data Virtualization Framework
     • Query Planning Service
     • Feature Extraction: Integration with Grid Metadata Catalogs
     • Remote Mining of Spatio-Temporal Data
         • Built using FREERIDE-G
     • Mining Algorithms for Data Streams
         • Built using GATES
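As a small illustration of multi-sensor integration, the sketch pairs each satellite observation with the in-situ reading closest in time and discards pairs that are too far apart. Nearest-timestamp matching with a fixed tolerance is an illustrative choice, not the project's fusion method, and the sample values are invented for the example.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

def nearest(times: List[float], t: float) -> Optional[int]:
    """Index of the entry in sorted `times` closest to t (earlier one on ties)."""
    if not times:
        return None
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return i - 1
    return i if times[i] - t < t - times[i - 1] else i - 1

def fuse(satellite: List[Tuple[float, str]],
         insitu: List[Tuple[float, float]],
         tolerance: float) -> List[Tuple[float, str, float]]:
    """Pair each satellite observation with the nearest in-situ reading in time,
    dropping pairs that are more than `tolerance` apart."""
    insitu = sorted(insitu)
    times = [t for t, _ in insitu]
    fused = []
    for t, scene in satellite:
        j = nearest(times, t)
        if j is not None and abs(times[j] - t) <= tolerance:
            fused.append((t, scene, insitu[j][1]))
    return fused

# Invented sample data: (time in minutes, scene id) and (time, water level in m).
satellite = [(100.0, "scene_1"), (500.0, "scene_2")]
gauge = [(130.0, 1.4), (90.0, 1.2), (300.0, 1.1)]
pairs = fuse(satellite, gauge, tolerance=20.0)  # [(100.0, "scene_1", 1.2)]
```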

  20. Recap

  21. Looking For
     • Feedback on our approach
     • Synergy with other efforts
     • Lessons learnt by others