Cyberinfrastructure Challenges for Environmental Observatories

Barbara Minsker

Director, Environmental Engineering, Science, & Hydrology Group, National Center for Supercomputing Applications;

Professor, Dept of Civil & Environ. Engineering;

University of Illinois, Urbana, IL, USA

January 9, 2007

National Center for Supercomputing Applications

Background
  • NSF Office of Cyberinfrastructure is funding NCSA and SDSC to:
    • Work with leading edge communities to develop cyberinfrastructure to support science and engineering
    • Incorporate successful prototypes into a persistent cyberinfrastructure
  • NCSA runs the CLEANER Project Office, which is leading planning for the WATERS Network, one of three proposed NSF environmental observatories
    • Co-Directors: Barbara Minsker, Jerald Schnoor (U of Iowa), Chuck Haas (Drexel U)
  • To support WATERS planning, NCSA’s Environmental CyberInfrastructure Demonstrator (ECID) project is creating a prototype CI
    • Driven by requirements gathering and close community collaborations

WATERS Network: WATer and Environmental Research Systems Network
  • Joint collaboration between the CLEANER Project Office and CUAHSI, Inc., sponsored by the ENG & GEO Directorates at the National Science Foundation (NSF)
    • CLEANER = Collaborative Large Scale Engineering Analysis Network for Environmental Research
    • CUAHSI = Consortium of Universities for the Advancement of Hydrologic Science
  • Planning underway to build a nationwide environmental observatory network using NSF’s Major Research Equipment and Facilities Construction (MREFC) funding
    • Target construction date: 2011
    • Target operation date: 2015

WATERS DRAFT VISION

The WATERS Network will transform our understanding of the Earth’s water and related biogeochemical cycles across multiple spatial and temporal scales to enable forecasting and management of critical water processes affected by human activities.


WATERS DRAFT GRAND CHALLENGES

  • To detect the interactions of human activities and natural perturbations with the quantity, distribution and quality of water in real time.
  • To predict the patterns and variability of processes affecting the quantity and quality of water at scales from local to continental.
  • To achieve optimal management of water resources through the use of institutional and economic instruments.

Network Design Principles:

  • Enable multi-scale, dynamic predictive modeling for water, sediment, and water quality (flux, flow paths, rates), including:
    • Near-real-time assimilation of data
    • Feedback for observatory design
    • Point- to national-scale prediction
  • Network provides data sets and framework to test:
    • Sufficiency of the data
    • Alternative model conceptualizations
  • Master Design Variables:
    • Scale
    • Climate (arid vs humid)
    • Coastal vs inland
    • Land use, land cover, population density
    • Energy and materials/industry
    • Land form and geology

Nested (where appropriate) Observatories over Range of Scales:

  • Point
  • Plot (100 m²)
  • Subcatchment (2 km²)
  • Catchment (10 km²) – single land use
  • Watershed (100–10,000 km²) – mixed use
  • Basin (10,000–100,000 km²)
  • Continental

Figure labels: Environmental Field Facilities (EFFs); Observatory Scale

CI Requirements Gathering
  • Interviews at conferences and meetings (Tom Finholt and staff, U. of Michigan)
  • Usability studies (NCSA, Wentling group)
  • Community survey (Finholt group)
    • AEESP and CUAHSI surveyed in 2006 as proxies for environmental engineering and hydrology communities
    • 313 responses out of 600 surveys mailed (52.2% response rate)
    • Key findings are driving ECID cyberenvironment development


What is the single most important obstacle to using data from different sources?

Chart categories: Nonstandard/inconsistent units/formats; Metadata problems; Other obstacles

  • 55% concerned about insufficient credit for shared data
  • N=278

What three software packages do you use most frequently in your work?
  • *Other:
    • MS Word
    • MS PowerPoint
    • Statistics applications (e.g., Stata, R, S-Plus)
    • SigmaPlot
    • PHREEQC
    • MathCAD
    • FORTRAN compiler
    • Mathematica
    • GRASS GIS
    • Groundwater models
    • MODFLOW

Majority are not using high-end computational tools.

Factors influencing technology adoption

Ease of use, good support, and new capabilities are essential.

What are the three most compelling factors that would lead you to collaborate with another person in your field?

Community seeks collaborations to gain different expertise.

WATERS CI Challenges
  • Clearly, the first requirement for observatory CI is that the community must gain access to observatory data
  • However, simply delivering the data through a Web portal is not going to allow the observatories to reach their full potential and meet the community’s requirements

WATERS CI Challenges, Cont’d.
  • Understanding data quality and getting credit for data sharing require an integrated provenance system to track what has been done with the data
  • Enabling users who do not have strong computational skills to work with the flood of environmental data requires:
    • Easy-to-use tools for manipulating large data sets, analyzing them, and assimilating them into models
    • Workflow integrators that allow users to integrate their tools and models with real-time streaming environmental data
  • The vast community of observatory users & the resources they generate create a need for knowledge networking tools to help them find collaborators, data, workflows, publications, etc.
  • To address these requirements, cyberenvironments are needed
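
The provenance requirement in the first bullet above can be illustrated with a minimal sketch. This is not the ECID or Tupelo implementation: the `ProvenanceLog` class, its fields, and the file and user names are all hypothetical, chosen only to show how recording each processing step lets a user trace a derived product back to its sources and identify who should receive credit.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory provenance store (illustration only)."""
    # Maps each derived artifact to (activity, agent, list of input artifacts).
    records: dict = field(default_factory=dict)

    def record(self, output, activity, agent, inputs):
        """Note that `agent` produced `output` from `inputs` via `activity`."""
        self.records[output] = (activity, agent, list(inputs))

    def lineage(self, artifact):
        """Return all upstream artifacts, nearest first, without duplicates."""
        seen, queue = [], [artifact]
        while queue:
            current = queue.pop(0)
            _, _, inputs = self.records.get(current, (None, None, []))
            for parent in inputs:
                if parent not in seen:
                    seen.append(parent)
                    queue.append(parent)
        return seen

    def contributors(self, artifact):
        """Agents who produced the artifact or any recorded upstream artifact."""
        agents = set()
        for item in [artifact] + self.lineage(artifact):
            if item in self.records:
                agents.add(self.records[item][1])
        return agents


log = ProvenanceLog()
log.record("cleaned_do.csv", "remove sensor spikes", "analyst_a", ["raw_do.csv"])
log.record("hypoxia_forecast.csv", "run forecast model", "modeler_b",
           ["cleaned_do.csv", "wind.csv"])
print(log.lineage("hypoxia_forecast.csv"))
print(sorted(log.contributors("hypoxia_forecast.csv")))
```

A production system would also need persistent storage, timestamps, and stable identifiers. The sketch only shows the core idea of linking each output to its inputs and its producer.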

Environmental CI Architecture: Research Services

Figure (architecture diagram) labels: Integrated CI; Supporting Technology; ECID Project Focus: Cyberenvironments; HIS Project Focus

Services shown: Data Services; Workflows & Model Services; Knowledge Services; Meta-Workflows; Collaboration Services; Digital Library

Research Process steps shown: Create Hypothesis; Obtain Data; Analyze Data &/or Assimilate into Model(s); Link &/or Run Analyses &/or Model(s); Discuss Results; Publish

Cyberenvironments
  • Couple traditional desktop computing environments with the resources and capabilities of a national cyberinfrastructure
  • Provide unprecedented ability to access, integrate, automate, and manage complex, collaborative projects across disciplinary and geographical boundaries.
  • ECID is demonstrating how cyberenvironments can:
    • Support observatory sensor and event management, workflow and scientific analyses, and knowledge networking, including provenance information to track data from creation to publication.
    • Provide collaborative environments where scientists, educators, and practitioners can acquire, share, and discuss data and information.
  • The cyberenvironments are designed with a flexible, service-oriented architecture, so that different components can be substituted with ease
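
The point about substitutable components can be sketched as follows. This is only an illustration of the general service-oriented idea, not ECID code: the `MetadataService` interface, both implementations, and the `describe_sensor` client are hypothetical names invented for the example.

```python
from typing import Protocol


class MetadataService(Protocol):
    """Interface that any interchangeable metadata component must satisfy."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str: ...


class InMemoryMetadata:
    """One hypothetical implementation, backed by a dict."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]


class AuditedMetadata:
    """A drop-in alternative that also records which keys were written."""

    def __init__(self):
        self._items = {}
        self.written = []

    def put(self, key, value):
        self.written.append(key)
        self._items[key] = value

    def get(self, key):
        return self._items[key]


def describe_sensor(service: MetadataService, sensor_id: str, units: str) -> str:
    """Client code depends only on the interface, not on a concrete class."""
    service.put(sensor_id + "/units", units)
    return service.get(sensor_id + "/units")


# Either backend can be substituted without changing the client.
for backend in (InMemoryMetadata(), AuditedMetadata()):
    print(type(backend).__name__, describe_sensor(backend, "ccbay-01", "mg/L"))
```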


ECID CyberEnvironment Components

  • CyberCollaboratory: Collaborative Portal
  • CI-KNOW: Network Browser/Recommender
  • CyberIntegrator: Exploratory Workflow Integration
  • CUAHSI HIS Data Services
  • Tupelo Metadata Services
  • Community Event Management/Processing
  • Single Sign-On (SSO) Security (coming)

CyberIntegrator
  • Studying complex environmental systems requires:
    • Coupling analyses and models
    • Real-time, automated updating of analyses and modeling with diverse tools
  • CyberIntegrator is a prototype workflow executor technology to support exploratory modeling and analysis of complex systems. Integrates the following tools to date:
    • Excel
    • Im2Learn image processing and mining tools, including ArcGIS image loading
    • D2K data mining
    • Java codes, including event management tools
  • Matlab & Fortran codes to be added soon. Additional tools will be included based on high priority needs of beta users.


CyberIntegrator Architecture

Example of CyberIntegrator Use:

Carrie Gibson created a fecal coliform prediction model in ArcGIS using ModelBuilder that predicts annual average concentrations.

Ernest To rewrote the model as a macro in Excel to perform Monte Carlo simulation to predict median and 90th percentile values.

CyberIntegrator’s goal: Reduce manual labor in linking these tools, visualizing the results, and updating in real time.
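
The Monte Carlo step mentioned above can be sketched generically. This is not the fecal coliform model described in the slide, whose equations are not given here. It assumes, purely for illustration, a fixed daily load diluted by a lognormally distributed streamflow, and every parameter value and function name is hypothetical.

```python
import math
import random
import statistics


def simulate_concentrations(n_draws, load_per_day, flow_mu, flow_sigma, seed=0):
    """Draw lognormal streamflows and convert a fixed load to concentrations.

    All parameter values are hypothetical; units are left abstract.
    """
    rng = random.Random(seed)
    concentrations = []
    for _ in range(n_draws):
        flow = rng.lognormvariate(flow_mu, flow_sigma)  # always positive
        concentrations.append(load_per_day / flow)
    return concentrations


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


draws = simulate_concentrations(10_000, load_per_day=1000.0,
                                flow_mu=2.0, flow_sigma=0.5)
median = statistics.median(draws)
p90 = percentile(draws, 90)
print(f"median={median:.1f}, 90th percentile={p90:.1f}")
```

Reporting the median and 90th percentile instead of a single annual average conveys how variable the predicted concentration is, which is the motivation the slide gives for the Monte Carlo rewrite.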

Real-Time Simulation of Copano Bay TMDL with CyberIntegrator

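Figure (workflow diagram): CyberIntegrator with an Excel Executor and an Im2Learn Executor. Arrows are labeled "call" and "data".

Steps numbered 1 to 4 in the figure (order as extracted):
  • Streamflows to Distributions (Excel)
  • Fecal Coliform Concentrations Model (Excel)
  • Load Shapefiles (Im2Learn)
  • Geo-reference and Visualize Results (Im2Learn)

Data inputs: USGS Daily Streamflows (web services); Shapefiles for Copano Bay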

Sensor Anomaly Detection Scenario

Figure (scenario diagram) components: Dashboard; Event Manager; Anomaly Detector 1; Anomaly Detector 2; CCBay Sensor Map; CC Bay Sensor Monitor Page; CI-KNOW Network; CyberIntegrator

Flows shown: Sensor data; Anomalies

Annotations:
  • Listens for data events & creates event when anomaly discovered.
  • User subscribes to anomaly detector workflows
  • Alerts user to anomaly detection, along with other events (logged-in users, new documents, etc.)
  • Shares workflow to server
  • Sensor map shows nearby related sensors so user can check data. Anomaly detector is faulty. CI-KNOW recommends alternate anomaly detector from Chesapeake Bay observatory.
  • CyberIntegrator loads recommended workflow. User adjusts parameters to CCBay Sensor.
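
The subscribe, detect, and alert flow in this scenario can be sketched with a tiny in-memory publish/subscribe mechanism. It is only a stand-in for a real message broker and does not use any actual broker API. The `EventManager` class, topic names, sensor ID, and range limits are hypothetical.

```python
from collections import defaultdict


class EventManager:
    """Hypothetical in-memory stand-in for a topic-based message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to receive every event published on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver `event` to each subscriber of `topic`, in subscription order."""
        for callback in list(self._subscribers[topic]):
            callback(event)


def make_range_detector(manager, low, high):
    """Return a data-event handler that publishes an anomaly for out-of-range readings."""
    def on_data(event):
        if not (low <= event["value"] <= high):
            manager.publish("anomalies", {"sensor": event["sensor"],
                                          "value": event["value"]})
    return on_data


manager = EventManager()
alerts = []  # stands in for the user's dashboard

# The detector listens for data events; the dashboard listens for anomalies.
manager.subscribe("sensor-data", make_range_detector(manager, low=0.0, high=15.0))
manager.subscribe("anomalies", alerts.append)

manager.publish("sensor-data", {"sensor": "ccbay-01", "value": 6.2})
manager.publish("sensor-data", {"sensor": "ccbay-01", "value": 42.0})
print(alerts)
```

Because detectors and dashboards only share topic names, a faulty detector can be replaced by subscribing a different one, which mirrors the substitution step described in the scenario.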


Cyberenvironment Technologies

Figure (technology diagram) labels:
  • Components: CyberDashboard (Desktop Application); JMS Broker (ActiveMQ 4.0.1); CyberCollaboratory; Workflow Service; CyberIntegrator; CI-KNOW; Tupelo; RDBMS
  • Connection types: JMS; SOAP; URL; Web Service
  • Message flows: Raw Data; Anomaly Subscription; Data and Anomaly Subscriptions; Anomaly Publication; Data Subscriptions
  • References and services: Sensor Page Reference; Workflow Reference; CyberIntegrator Workflow; Recommender Network; Workflow Publication/Retrieval Web Services
  • ECID Managed Data/Metadata: Provenance; User Subscriptions; Workflow Templates; Semantic Content; Event Topics
  • Stores: Metadata; Data; Anomalies

ECID & Corpus Christi Bay (CCBay) WATERS Observatory Testbed
  • CCBay WATERS Observatory Testbed is one of 10 observatory testbeds recently funded by NSF
    • Collaboration of environmental engineering, hydrology, biology, and information technology researchers
  • Goal of the testbed:
    • Integrate ECID and HIS technology to create end-to-end environmental information system
    • Use the technology to study hypoxia in CCBay
      • Use real-time data streams from diverse monitoring systems to predict hypoxia one day ahead
      • Mobilize manual sampling crews when conditions are right


Sensors in Corpus Christi Bay

Figure (map) legend:
  • Dataset groups: National Datasets (National HIS); Regional Datasets (Workgroup HIS)
  • Data sources: USGS; NCDC; TCOON; Dr. Paul Montagna; TCEQ; SERF
  • Map layers: NCDC station; TCOON stations; TCEQ stations; Montagna stations; USGS gages; SERF stations; Hypoxic Regions

CCBay Environmental Information System

Figure (flow diagram) labels: CCBay Sensors; Event-Triggered Workflow Execution; Anomaly Detector; Hypoxia Predictor; Dashboard Alert; Event-driven Research; Storage for Later Research; CyberIntegrator: Forecast; CyberCollaboratory: Contact Collaborators


CCBay Near-Real-Time Hypoxia Prediction

Figure (workflow diagram) labels:
  • Data sources: Sensor net; Data Archive
  • Steps: Anomaly Detection; Replace or Remove Errors; Update Boundary Condition Models; Hydrodynamic Model; Water Quality Model; Hypoxia Machine Learning Models; Hypoxia Model Integrator
  • Visualizations: Visualize Hydrodynamics; Visualize Hypoxia Risk
  • Technologies noted: D2K workflows; Im2Learn workflows; Fortran numerical models; C++ code

CCBay CI Challenges
  • Automating QA/QC in a real-time network
    • David Hill is creating sensor anomaly detectors using statistical models (autoregressive models using naïve, clustering, perceptron, and artificial neural network approaches; and multi-sensor models using dynamic Bayesian networks)
    • While statistical models can identify anomalies, it is sometimes difficult to differentiate sensor errors from unusual environmental phenomena
  • Getting access to the data, which are collected by different groups and stored in multiple formats in different locations
    • The project is defining a common data dictionary and units and will build Web services to translate data into that common form
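
Of the detector types listed above, the naïve autoregressive approach is the simplest to sketch: predict each reading as the previous accepted one and flag readings whose prediction error is far larger than recent errors. The code below is an illustrative guess at that idea, not David Hill's detectors. The function name, the `window` and `n_sigmas` parameters, and the sample readings are hypothetical.

```python
import statistics


def naive_ar_anomalies(readings, window=10, n_sigmas=4.0):
    """Flag indices whose one-step-ahead error is unusually large.

    The naive predictor forecasts each reading as the previous accepted one.
    `window` and `n_sigmas` are hypothetical tuning parameters.
    """
    anomalies = []
    errors = []  # absolute prediction errors from accepted readings
    last_good = readings[0]
    for i in range(1, len(readings)):
        error = abs(readings[i] - last_good)
        if len(errors) >= window:
            recent = errors[-window:]
            threshold = statistics.mean(recent) + n_sigmas * statistics.pstdev(recent)
            if error > threshold:
                anomalies.append(i)
                continue  # do not let the suspect value update the predictor
        errors.append(error)
        last_good = readings[i]
    return anomalies


# Hypothetical dissolved-oxygen-like series with one spike at index 12.
readings = [5.0, 5.1, 5.0, 5.2, 5.1, 5.0, 5.1, 5.2, 5.1, 5.0,
            5.1, 5.2, 12.0, 5.1, 5.0]
flagged = naive_ar_anomalies(readings)
print(flagged)
```

A detector like this only reports that a reading is statistically unusual. As the slide notes, deciding whether it reflects a sensor error or a real environmental event still requires additional information, such as nearby sensors.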

CCBay CI Challenges, Contd.
  • Integrating data into diverse models
    • Calibration uses historical data, typically done by hand
    • Near-real-time updating needs automated approaches
    • Models are complex and derivative-based calibration approaches would be difficult to implement
  • Model integration
    • Grids change from one type of model to another – defining a common coarse grid, with finer grids overlaid where needed
    • Data transformers must be built between models

Conclusions
  • Creating CI for environmental data is challenging but the benefits in enabling larger-scale, near-real-time research will be enormous
  • The ECID Cyberenvironment demonstrates the benefits of end-to-end integration of cyberinfrastructure and desktop tools, including:
    • HIS-type data services
    • Workflow
    • Event management
    • Provenance and knowledge management, and
    • Collaboration for supporting environmental researchers, educators, and outreach partners
  • This creates a powerful system for linking observatory operations with flexible, investigator-driven research in a community framework (i.e., the national network).
    • Workflow and knowledge management support testing hypotheses across observatories
    • Provenance supports QA/QC and rewards for community contributions in an automated fashion.

Acknowledgments
  • Contributors:
    • NCSA ECID team (Peter Bajcsy, Noshir Contractor, Steve Downey, Joe Futrelle, Hank Green, Rob Kooper, Yong Liu, Luigi Marini, Jim Myers, Mary Pietrowicz, Tim Wentling, York Yao, Inna Zharnitsky)
    • Corpus Christi Bay Testbed team (PIs: Jim Bonner, Ben Hodges, David Maidment, Barbara Minsker, Paul Montagna)
  • Funding sources:
    • NSF grants BES-0414259, BES-0533513, and SCI-0525308
    • Office of Naval Research grant N00014-04-1-0437
