
Cyberinfrastructure Challenges for Environmental Observatories



Presentation Transcript


  1. Cyberinfrastructure Challenges for Environmental Observatories Barbara Minsker Director, Environmental Engineering, Science, & Hydrology Group, National Center for Supercomputing Applications; Professor, Dept of Civil & Environ. Engineering; University of Illinois, Urbana, IL, USA January 9, 2007 National Center for Supercomputing Applications

  2. Background • The NSF Office of Cyberinfrastructure is funding NCSA and SDSC to: • Work with leading-edge communities to develop cyberinfrastructure to support science and engineering • Incorporate successful prototypes into a persistent cyberinfrastructure • NCSA runs the CLEANER Project Office, which is leading planning for the WATERS Network, one of three proposed NSF environmental observatories • Co-Directors: Barbara Minsker, Jerald Schnoor (U. of Iowa), Chuck Haas (Drexel U.) • To support WATERS planning, NCSA’s Environmental CyberInfrastructure Demonstrator (ECID) project is creating a prototype CI • Driven by requirements gathering and close community collaborations

  3. WATERS Network: WATer and Environmental Research Systems Network • Joint collaboration between the CLEANER Project Office and CUAHSI, Inc., sponsored by the ENG & GEO Directorates at the National Science Foundation (NSF) • CLEANER = Collaborative Large Scale Engineering Analysis Network for Environmental Research • CUAHSI = Consortium of Universities for the Advancement of Hydrologic Science • Planning is underway to build a nationwide environmental observatory network using NSF’s Major Research Equipment and Facilities Construction (MREFC) funding • Target construction date: 2011 • Target operation date: 2015

  4. WATERS DRAFT VISION The WATERS Network will transform our understanding of the Earth’s water and related biogeochemical cycles across multiple spatial and temporal scales to enable forecasting and management of critical water processes affected by human activities.

  5. WATERS DRAFT GRAND CHALLENGES • To detect the interactions of human activities and natural perturbations with the quantity, distribution and quality of water in real time. • To predict the patterns and variability of processes affecting the quantity and quality of water at scales from local to continental. • To achieve optimal management of water resources through the use of institutional and economic instruments.

  6. Network Design Principles: • Enable multi-scale, dynamic predictive modeling for water, sediment, and water quality (flux, flow paths, rates), including: • Near-real-time assimilation of data • Feedback for observatory design • Point- to national-scale prediction • Network provides data sets and framework to test: • Sufficiency of the data • Alternative model conceptualizations • Master Design Variables: • Scale • Climate (arid vs. humid) • Coastal vs. inland • Land use, land cover, population density • Energy and materials/industry • Land form and geology • [Figure: Nested (where appropriate) observatories over a range of scales: point; plot (100 m²); subcatchment (2 km²); catchment (10 km², single land use); watershed (100–10,000 km², mixed use); basin (10,000–100,000 km²); continental. Labels: Environmental Field Facilities (EFFs); Observatory Scale.]

  7. CI Requirements Gathering • Interviews at conferences and meetings (Tom Finholt and staff, U. of Michigan) • Usability studies (NCSA, Wentling group) • Community survey (Finholt group) • AEESP and CUAHSI were surveyed in 2006 as proxies for the environmental engineering and hydrology communities • 313 responses out of 600 surveys mailed (52.2% response rate) • Key findings are driving ECID cyberenvironment development

  8. What is the single most important obstacle to using data from different sources? • [Chart of responses, N=278: Nonstandard/inconsistent units/formats; Metadata problems; Other obstacles] • 55% are concerned about insufficient credit for shared data

  9. What three software packages do you use most frequently in your work? • [Chart] Responses grouped under "Other" included: • MS Word • MS PowerPoint • Statistics applications (e.g., Stata, R, S-Plus) • SigmaPlot • PHREEQC • MathCAD • FORTRAN compiler • Mathematica • GRASS GIS • Groundwater models • Modflow • The majority of respondents are not using high-end computational tools.

  10. Factors influencing technology adoption • Ease of use, good support, and new capabilities are essential.

  11. What are the three most compelling factors that would lead you to collaborate with another person in your field? • The community seeks collaborations to gain different expertise.

  12. WATERS CI Challenges • Clearly, the first requirement for observatory CI is that the community must gain access to observatory data • However, simply delivering the data through a Web portal is not going to allow the observatories to reach their full potential and meet the community’s requirements

  13. WATERS CI Challenges, Cont’d. • Understanding data quality and getting credit for data sharing requires an integrated provenance system to track what has been done with the data • Enabling users who do not have strong computational skills to work with the flood of environmental data requires: • Easy-to-use tools for manipulating large data sets, analyzing them, and assimilating them into models • Workflow integrators that allow users to integrate their tools and models with real-time streaming environmental data • The vast community of observatory users & the resources they generate create a need for knowledge networking tools to help them find collaborators, data, workflows, publications, etc. • To address these requirements, cyberenvironments are needed
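The provenance requirement on this slide can be illustrated with a small lineage store. This is a minimal sketch under stated assumptions, not ECID's or Tupelo's actual design: `ProvRecord`, `ProvenanceLog`, and the example agents are hypothetical names, and each data product is identified by a truncated SHA-256 hash of its content plus its inputs. Walking a product's lineage back to the raw data answers both "what has been done with the data" and "whose work deserves credit".

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvRecord:
    """One derivation step: which activity, run by whom, produced this item from which inputs."""
    item_id: str
    activity: str
    agent: str
    inputs: tuple[str, ...] = ()


class ProvenanceLog:
    """Hypothetical in-memory provenance store (not ECID's or Tupelo's real API)."""

    def __init__(self) -> None:
        self._records: dict[str, ProvRecord] = {}

    def record(self, payload: bytes, activity: str, agent: str,
               inputs: tuple[str, ...] = ()) -> str:
        # Content-based identifier: hash the product together with its parents' ids.
        h = hashlib.sha256(payload)
        for parent in inputs:
            h.update(parent.encode())
        item_id = h.hexdigest()[:12]
        self._records[item_id] = ProvRecord(item_id, activity, agent, inputs)
        return item_id

    def lineage(self, item_id: str) -> list[ProvRecord]:
        """Every record this item depends on, the item itself first (depth-first walk)."""
        seen: set[str] = set()
        out: list[ProvRecord] = []
        stack = [item_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            rec = self._records[current]
            out.append(rec)
            stack.extend(rec.inputs)
        return out

    def contributors(self, item_id: str) -> set[str]:
        """Everyone whose work fed into this item, i.e., who could receive credit."""
        return {rec.agent for rec in self.lineage(item_id)}


if __name__ == "__main__":
    log = ProvenanceLog()
    raw = log.record(b"DO readings", "sensor upload", "agency A")
    clean = log.record(b"QC'd DO", "anomaly removal", "analyst B", (raw,))
    forecast = log.record(b"forecast", "hypoxia model run", "modeler C", (clean,))
    print(sorted(log.contributors(forecast)))  # ['agency A', 'analyst B', 'modeler C']
```

Because the log records inputs rather than copies of data, the same walk supports QA/QC (tracing a suspect result to its source) and automated credit assignment.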

  14. Environmental CI Architecture: Research Services • [Diagram labels: Research Process (Create Hypothesis; Obtain Data; Analyze Data &/or Assimilate into Model(s); Link &/or Run Analyses &/or Model(s); Discuss Results; Publish); Integrated CI (Data Services; Workflows & Model Services; Knowledge Services; Meta-Workflows; Collaboration Services; Digital Library); Supporting Technology; ECID Project Focus: Cyberenvironments; HIS Project Focus]

  15. Cyberenvironments • Couple traditional desktop computing environments with the resources and capabilities of a national cyberinfrastructure • Provide unprecedented ability to access, integrate, automate, and manage complex, collaborative projects across disciplinary and geographical boundaries • ECID is demonstrating how cyberenvironments can: • Support observatory sensor and event management, workflow and scientific analyses, and knowledge networking, including provenance information to track data from creation to publication • Provide collaborative environments where scientists, educators, and practitioners can acquire, share, and discuss data and information • The cyberenvironments are designed with a flexible, service-oriented architecture, so that components can easily be substituted

  16. ECID Cyberenvironment Components • CyberCollaboratory: Collaborative Portal • CI-KNOW: Network Browser/Recommender • CyberIntegrator: Exploratory Workflow Integration • CUAHSI HIS Data Services • Tupelo Metadata Services • Single Sign-On (SSO) Security (coming) • Community Event Management/Processing

  17. CyberIntegrator • Studying complex environmental systems requires: • Coupling analyses and models • Real-time, automated updating of analyses and modeling with diverse tools • CyberIntegrator is a prototype workflow executor technology to support exploratory modeling and analysis of complex systems. It integrates the following tools to date: • Excel • IM2Learn image processing and mining tools, including ArcGIS image loading • D2K data mining • Java codes, including event management tools • Matlab & Fortran codes to be added soon. Additional tools will be included based on the high-priority needs of beta users.
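To make the executor idea concrete, here is a hypothetical sketch of chaining tool executors, assuming each executor is a Python callable that takes the previous step's output plus a parameter dictionary. `WorkflowRunner`, `register`, and the tool names are illustrative inventions, not CyberIntegrator's API; in the real system an executor would drive Excel, IM2Learn, D2K, or Java code rather than a lambda.

```python
from typing import Any, Callable

# An executor wraps one external tool; here it is simply a Python callable that
# receives the previous step's output and a parameter dictionary.
Executor = Callable[[Any, dict], Any]


class WorkflowRunner:
    """Hypothetical sketch of chaining tool executors (not CyberIntegrator's API)."""

    def __init__(self) -> None:
        self._executors: dict[str, Executor] = {}

    def register(self, tool: str, executor: Executor) -> None:
        """Bind a tool name to the callable that runs it; re-registering swaps tools."""
        self._executors[tool] = executor

    def run(self, steps: list[tuple[str, dict]], data: Any = None) -> Any:
        """Run the steps in order, piping each step's output into the next."""
        for tool, params in steps:
            if tool not in self._executors:
                raise KeyError(f"no executor registered for tool {tool!r}")
            data = self._executors[tool](data, params)
        return data


if __name__ == "__main__":
    runner = WorkflowRunner()
    # Stand-in executors; real ones would call out to Excel, IM2Learn, etc.
    runner.register("load", lambda _, p: p["values"])
    runner.register("scale", lambda xs, p: [x * p["factor"] for x in xs])
    runner.register("threshold", lambda xs, p: [x for x in xs if x > p["limit"]])
    result = runner.run([
        ("load", {"values": [1, 5, 10]}),
        ("scale", {"factor": 2}),
        ("threshold", {"limit": 5}),
    ])
    print(result)  # [10, 20]
```

Because steps name a tool rather than a specific implementation, swapping one tool for another only requires registering a different executor under the same name.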

  18. CyberIntegrator Architecture • [Architecture diagram] • Example of CyberIntegrator use: Carrie Gibson created a fecal coliform prediction model in ArcGIS using ModelBuilder that predicts annual average concentrations. Ernest To rewrote the model as an Excel macro that performs Monte Carlo simulation to predict median and 90th percentile values. CyberIntegrator’s goal is to reduce the manual labor of linking these tools, visualizing the results, and updating them in real time.
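The Monte Carlo step in this example can be sketched as follows. The slide does not give the model's equations, so this assumes a hypothetical stand-in in which concentration equals a pollutant load divided by streamflow, with both inputs drawn from lognormal distributions (parameters on the log scale). Only the overall pattern (sample the inputs, run the model, summarize the outputs by their median and 90th percentile) reflects the example; the function names are illustrative.

```python
import random
import statistics


def hypothetical_concentration(flow: float, load: float) -> float:
    """Stand-in model (an assumption): a pollutant load diluted by streamflow."""
    return load / flow


def monte_carlo_percentiles(n: int, flow_mu: float, flow_sigma: float,
                            load_mu: float, load_sigma: float,
                            seed: int = 0) -> tuple[float, float]:
    """Return (median, 90th percentile) of simulated concentrations.

    Flow and load are drawn from lognormal distributions whose mu and sigma are
    the mean and standard deviation of the underlying normal (log-scale) values.
    """
    rng = random.Random(seed)  # seeded so repeated runs give identical results
    results = []
    for _ in range(n):
        flow = rng.lognormvariate(flow_mu, flow_sigma)
        load = rng.lognormvariate(load_mu, load_sigma)
        results.append(hypothetical_concentration(flow, load))
    median = statistics.median(results)
    # quantiles(n=10) returns the 9 decile cut points; the last is the 90th percentile.
    p90 = statistics.quantiles(results, n=10)[8]
    return median, p90


if __name__ == "__main__":
    median, p90 = monte_carlo_percentiles(20_000, 1.0, 0.5, 3.0, 0.5)
    print(f"median={median:.2f}, p90={p90:.2f}")
```

For these parameters the log of the concentration is normal with mean 3 − 1 = 2 and standard deviation √0.5, so the simulated median should land near e² ≈ 7.39; increasing `n` narrows the sampling error of both statistics.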

  19. Real-Time Simulation of Copano Bay TMDL with CyberIntegrator • [Diagram: CyberIntegrator drives an Excel Executor and an Im2Learn Executor through four steps: (1) Streamflows to Distributions (Excel); (2) Fecal Coliform Concentrations Model (Excel); (3) Load Shapefiles (Im2Learn); (4) Geo-reference and Visualize Results (Im2Learn). Inputs: USGS daily streamflows (web services) and shapefiles for Copano Bay.]
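The "Streamflows to Distributions" step turns daily streamflows into a probability distribution. One common way to do that, assumed here rather than taken from the slide, is to fit a lognormal distribution from the mean and standard deviation of the log-transformed flows. The function names are hypothetical, and the flows are passed in as a list rather than retrieved from the USGS web services.

```python
import math
import statistics


def fit_lognormal(daily_flows: list[float]) -> tuple[float, float]:
    """Fit a lognormal distribution to daily streamflows.

    Returns (mu, sigma): the mean and sample standard deviation of the natural
    log of the flows. Requires at least two strictly positive values; zero-flow
    days would need separate treatment.
    """
    if len(daily_flows) < 2:
        raise ValueError("need at least two flows to estimate sigma")
    if any(q <= 0 for q in daily_flows):
        raise ValueError("a lognormal fit requires strictly positive flows")
    logs = [math.log(q) for q in daily_flows]
    return statistics.fmean(logs), statistics.stdev(logs)


def median_flow(mu: float) -> float:
    """The median of a lognormal distribution is exp(mu)."""
    return math.exp(mu)


if __name__ == "__main__":
    mu, sigma = fit_lognormal([10.0, 100.0, 1000.0])
    print(round(median_flow(mu), 6), round(sigma, 6))  # 100.0 2.302585
```

The fitted (mu, sigma) pair is exactly what a downstream sampling step needs to draw synthetic flows.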

  20. Sensor Anomaly Detection Scenario • The user subscribes to anomaly detector workflows, which listen for sensor data events and create an event when an anomaly is discovered • The Dashboard alerts the user to detected anomalies, along with other events (logged-in users, new documents, etc.) • The CCBay sensor map on the CC Bay Sensor Monitor Page shows nearby related sensors so the user can check the data • The anomaly detector turns out to be faulty, so CI-KNOW recommends an alternate anomaly detector from the Chesapeake Bay observatory • CyberIntegrator loads the recommended workflow, and the user adjusts its parameters to the CCBay sensor • [Diagram components: Dashboard; Event Manager; Anomaly Detectors 1 and 2 (anomalies); CCBay Sensor Map (sensor data); CI-KNOW Network; CyberIntegrator, which shares the workflow to the server]
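The publish/subscribe pattern behind this scenario can be sketched in a few lines. The in-process `EventManager` below is a hypothetical stand-in for a message broker, the topic names are invented, and the detector checks a fixed valid range instead of the statistical models the project uses; the point is only how detectors, the event manager, and the dashboard stay decoupled through topics.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventManager:
    """Minimal in-process topic-based publish/subscribe bus (a stand-in for a broker)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Copy the list so a handler may subscribe others without disturbing iteration.
        for handler in list(self._subscribers[topic]):
            handler(event)


def make_range_detector(bus: EventManager, low: float, high: float) -> Handler:
    """Hypothetical anomaly detector: republishes readings outside [low, high]."""
    def on_reading(reading: dict) -> None:
        if not low <= reading["value"] <= high:
            bus.publish("anomalies", {**reading, "reason": "out of range"})
    return on_reading


if __name__ == "__main__":
    bus = EventManager()
    alerts: list[dict] = []
    bus.subscribe("sensor-data", make_range_detector(bus, 0.0, 15.0))
    bus.subscribe("anomalies", alerts.append)  # plays the role of the dashboard
    for value in [7.2, 6.9, 42.0]:
        bus.publish("sensor-data", {"sensor": "DO-1", "value": value})
    print(alerts)  # [{'sensor': 'DO-1', 'value': 42.0, 'reason': 'out of range'}]
```

Replacing a faulty detector amounts to subscribing a different handler to the same data topic; the dashboard is unaffected because it only listens to the anomaly topic.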

  21. Cyberenvironment Technologies • [Architecture diagram labels: CyberDashboard Desktop Application; JMS Broker (ActiveMQ 4.0.1) carrying raw data, data and anomaly subscriptions, and anomaly publications over JMS; Sensor Page reference (CyberCollaboratory URL); CyberIntegrator Workflow Service and Workflow Publication/Retrieval Web Services (SOAP), with workflow references (CyberIntegrator workflow URLs); CI-KNOW Recommender Network Web Service (SOAP); ECID Managed Data/Metadata in Tupelo and an RDBMS: provenance, user subscriptions, workflow templates, semantic content, event topics, metadata, data, anomalies]

  22. ECID & Corpus Christi Bay (CCBay) WATERS Observatory Testbed • CCBay WATERS Observatory Testbed is one of 10 observatory testbeds recently funded by NSF • Collaboration of environmental engineering, hydrology, biology, and information technology researchers • Goal of the testbed: • Integrate ECID and HIS technology to create end-to-end environmental information system • Use the technology to study hypoxia in CCBay • Use real-time data streams from diverse monitoring systems to predict hypoxia one day ahead • Mobilize manual sampling crews when conditions are right

  23. Sensors in Corpus Christi Bay • [Map of sensor stations and hypoxic regions] • National datasets (National HIS): USGS gages, NCDC station • Regional datasets (Workgroup HIS): TCOON stations, TCEQ stations, SERF stations, and stations of Dr. Paul Montagna

  24. CCBay Environmental Information System • [Diagram: CCBay sensors feed event-triggered workflow execution (Anomaly Detector, Hypoxia Predictor) and storage for later research; a Dashboard alert initiates event-driven research, such as forecasting with CyberIntegrator and contacting collaborators through CyberCollaboratory]

  25. CCBay Near-Real-Time Hypoxia Prediction • [Diagram labels: Sensor net; Data Archive; Anomaly Detection; Replace or Remove Errors; Update Boundary Condition Models; Hydrodynamic Model; Visualize Hydrodynamics; Water Quality Model; Hypoxia Machine Learning Models; Hypoxia Model Integrator; Visualize Hypoxia Risk. Implementation labels: D2K workflows; IM2Learn workflows; Fortran numerical models; C++ code]

  26. CCBay CI Challenges • Automating QA/QC in a real-time network • David Hill is creating sensor anomaly detectors using statistical models (autoregressive models using naïve, clustering, perceptron, and artificial neural network approaches; and multi-sensor models using dynamic Bayesian networks) • While statistical models can identify anomalies, it is sometimes difficult to differentiate sensor errors from unusual environmental phenomena • Getting access to the data, which are collected by different groups and stored in multiple formats in different locations • The project is defining a common data dictionary and units and will build Web services to translate the data into them
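The autoregressive approach can be sketched with a one-step-ahead AR(1) predictor, a slight generalization of the naïve predictor (which simply forecasts the previous value). This is an illustrative assumption, not David Hill's exact formulation: the model is fit by ordinary least squares on a clean training window, and a reading is flagged when it falls outside the prediction plus or minus `k` residual standard deviations. The function names and the default `k = 3` are hypothetical.

```python
import statistics


def fit_ar1(train: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of x[t] = a + b * x[t-1]; returns (a, b, residual std)."""
    prev, curr = train[:-1], train[1:]
    mean_p, mean_c = statistics.fmean(prev), statistics.fmean(curr)
    sxx = sum((p - mean_p) ** 2 for p in prev)
    sxy = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    b = sxy / sxx
    a = mean_c - b * mean_p
    residuals = [c - (a + b * p) for p, c in zip(prev, curr)]
    return a, b, statistics.stdev(residuals)


def flag_anomalies(series: list[float], a: float, b: float, sd: float,
                   k: float = 3.0) -> list[int]:
    """Indices whose value falls outside the one-step-ahead band a + b*x[t-1] +/- k*sd."""
    flagged = []
    last = series[0]
    for t in range(1, len(series)):
        predicted = a + b * last
        if abs(series[t] - predicted) > k * sd:
            flagged.append(t)
            last = predicted  # don't let a flagged value drive the next prediction
        else:
            last = series[t]
    return flagged


if __name__ == "__main__":
    train = [8.0, 8.2, 8.1, 8.3, 8.2, 8.4, 8.3, 8.5, 8.4, 8.6]
    a, b, sd = fit_ar1(train)
    print(flag_anomalies([8.5, 8.6, 8.5, 20.0, 8.6, 8.5], a, b, sd))  # [3]
```

Replacing a flagged value with its prediction keeps a single spike from also corrupting the next forecast. It does not resolve the harder problem noted above of telling sensor errors apart from genuine but unusual conditions.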

  27. CCBay CI Challenges, Cont’d. • Integrating data into diverse models • Calibration uses historical data and is typically done by hand • Near-real-time updating needs automated approaches • Models are complex, and derivative-based calibration approaches would be difficult to implement • Model integration • Grids differ from one type of model to another; the project is defining a common coarse grid, with finer grids overlaid where needed • Data transformers must be built between models
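A data transformer between a fine model grid and a common coarse grid can be as simple as block averaging. The sketch below assumes aligned rectangular grids whose fine dimensions are exact multiples of the refinement factor; `coarsen` and `refine` are hypothetical names, and a real coupling between hydrodynamic and water quality models would also need to handle irregular grids, land masks, and interpolation.

```python
def coarsen(fine: list[list[float]], factor: int) -> list[list[float]]:
    """Average non-overlapping factor x factor blocks of a fine grid onto a coarse grid."""
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("fine grid dimensions must be multiples of factor")
    coarse = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [fine[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def refine(coarse: list[list[float]], factor: int) -> list[list[float]]:
    """Opposite direction: copy each coarse cell onto its factor x factor fine cells."""
    return [[value for value in row for _ in range(factor)]
            for row in coarse for _ in range(factor)]


if __name__ == "__main__":
    fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(coarsen(fine, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

Averaging suits intensive quantities such as concentrations; extensive quantities such as fluxes would be summed rather than averaged when coarsening.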

  28. Conclusions • Creating CI for environmental data is challenging, but the benefits in enabling larger-scale, near-real-time research will be enormous • The ECID Cyberenvironment demonstrates the benefits of end-to-end integration of cyberinfrastructure and desktop tools, including: • HIS-type data services • Workflow • Event management • Provenance and knowledge management, and • Collaboration for supporting environmental researchers, educators, and outreach partners • This creates a powerful system for linking observatory operations with flexible, investigator-driven research in a community framework (i.e., the national network). • Workflow and knowledge management support testing hypotheses across observatories • Provenance supports QA/QC and rewards for community contributions in an automated fashion.

  29. Acknowledgments • Contributors: • NCSA ECID team (Peter Bajcsy, Noshir Contractor, Steve Downey, Joe Futrelle, Hank Green, Rob Kooper, Yong Liu, Luigi Marini, Jim Myers, Mary Pietrowicz, Tim Wentling, York Yao, Inna Zharnitsky) • Corpus Christi Bay Testbed team (PIs: Jim Bonner, Ben Hodges, David Maidment, Barbara Minsker, Paul Montagna) • Funding sources: • NSF grants BES-0414259, BES-0533513, and SCI-0525308 • Office of Naval Research grant N00014-04-1-0437
