Integrating CUAHSI HIS Cyberinfrastructure with Open Source DataTurbine Streaming Data Middleware
Thomas Whitenack, Sameer Tilak, Ilya Zaslavsky, Tony Fountain
San Diego Supercomputer Center, UCSD, San Diego, CA

Abstract
The CUAHSI HIS (Hydrologic Information System) project has focused on consistent management of observations data available from government agencies as well as data published by academic investigators. Management of real-time data is an important component of the CUAHSI HIS project. The Open Source DataTurbine Initiative is an NSF-supported effort that focuses on providing open source streaming data middleware to multiple environmental observation projects. The Open Source DataTurbine team has been collaborating with the CUAHSI HIS team on applications that demonstrate the utility of DataTurbine for managing streaming data from hydrologic stations. In this project, DataTurbine is used to acquire data from a range of sensors connected to a Campbell Datalogger and stream them into a database system. The schema of this database follows the CUAHSI Observations Data Model (ODM). A Java application (a DataTurbine sink program) was developed to retrieve real-time data from the DataTurbine server and populate the ODM's DataValues table. The streaming data were configured for access via the CUAHSI HIS GetValues service, making them available for querying from many CUAHSI HIS client applications. The DataTurbine middleware is capable of connecting with multiple types of sensors from different vendors and exposing streaming data in a uniform way. The ability to efficiently explore and integrate data streams from different distributed real-time and archival sources, to create a comprehensive dynamic portrait of the state of the environment in a given area, is an important component of the vision for environmental observatories.

What is DataTurbine
• Solution for accessing both streaming and static data, from different vendor systems, via a common interface
• Released under the Apache 2.0 open source license
• Provides high-performance data streaming: 10+ MB/sec, 1000 frames/sec on PCs
• Supported by NASA SBIR; 15 years in development
• NSF invested in supporting open-source development of DataTurbine: SDCI project, 2007-09
• Additional support from the Moore Foundation and from multiple observatory projects
• One of just a handful of comprehensive solutions for managing streaming data

Figure panels: Integration of Heterogeneous Devices; Live data and video from Santa Margarita Ecological Reserve (NEON); DataTurbine Capabilities

Integrating DataTurbine in HIS

DataTurbine server system requirements
• A computer running Linux, Windows, Unix, OS X or similar, with a working JVM version 1.1 or later. Different brands of JVM should be fine (e.g. Sun, IBM, JRockit).
• Enough memory to hold the data you want, and enough disk to contain the archive you want.
• A network connection that is fast enough and reliable enough for your needs.
• Apple Mac minis have been tested as minimal servers. With 2 GB of memory, they are fast and cheap. However, since all that matters is the JVM, you can use whatever you prefer.
• In general, more memory is good. A 32-bit JVM can use up to 3.5 GB, and with a 64-bit JVM you can address as much as you can afford.
• If you have extreme needs, consider a 64-bit Sun box. Good results have been obtained with the Niagara-architecture T2000.
• Note: 3 DataTurbine servers are available for public use, each with 4 GB RAM and shared 7.5 TB archive space on RAID5; see www.dataturbine.org

About CUAHSI HIS
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities in the US and abroad. As part of its mission, CUAHSI supports the development of cyberinfrastructure for the hydrologic sciences.
The CUAHSI HIS (Hydrologic Information System) project is a multi-year, multi-institution effort focused on consistent management of observations data available from several federal agencies (USGS, EPA, USDA, NOAA, etc.) as well as data published by individual investigators. CUAHSI HIS develops a service-oriented architecture for hydrologic research and education, to enable publication, discovery, retrieval, analysis and integration of hydrologic data. The project team has defined a common information model for organizing hydrologic observation data, designed a common exchange protocol (Water Markup Language) and developed a collection of SOAP web services (WaterOneFlow services) that provide uniform access to different federal, state and local hydrologic data repositories. This system is now implemented as a collection of Hydrologic Information Servers deployed at NSF-supported Hydrologic Observatory test beds.

Existing Deployments
Figure: Live video from Kenting coral reef, off of SE Taiwan (CREON project)

Step 1: Download and install a DataTurbine server
• Download and double-click the jar file to install, or use a public DataTurbine server

Step 2: Connect DataTurbine with sensors
• Configure the sensors
• Following the RBNB Simple API (SAPI), create a “source” program to insert stream data into the DataTurbine server:
  - SAPI documentation and example source programs are included in the download
  - For many sensors and vendors, source programs already exist (e.g. for LoggerNet)

Step 3: Make sure the DataTurbine server receives the streams
• Install Real-time Data Viewer (RDV), a common DataTurbine client
• Write a small JNLP (Java Network Launch Protocol) file defining command line parameters and JVM options for launching RDV.
  Examples: http://it.nees.org/software/rdv/RDV.jnlp, http://geo.sdsc.edu/jnlp/RDV.jnlp
• Launch RDV and verify the data streams

Figure: NEES data with a user-authored wireframe data viewer added to RDV

Step 4: Configure RBNB output to CUAHSI ODM
• Set up an ODM database instance and populate it with metadata, using any of the data loading tools developed in HIS (ODM Data Loader, SDL, SSIS scripts)
• Configure the Java program ‘stream2db’ (an RBNB DataTurbine sink program) to automatically insert data values into the ODM database when new data arrive at the RBNB server, mapping sensor channels to table and column names in ODM
• Open the ODM instance in SQL Server Management Studio and verify that the DataValues table has been populated with values from the sensors
• Configure ODM web services over the ODM instance and register them in Central HIS or in a regional HIS Server (and visualize in DASH, Data Access System for Hydrology)

DataTurbine in Environmental Observing Projects
• NEON – Ecology: http://neoninc.org
• GLEON – Hydroecology: http://gleon.org/
• CREON – Coral reefs: http://www.coralreefeon.org/
• MoveBank – Animal tracking: http://www.princeton.edu/~wikelski/research/index.htm
• Bridges and Civil Infrastructure – Engineering: http://healthmonitoring.ucsd.edu/
• NEES – Earthquake Engineering: http://it.nees.org/
• PRAGMA – Pacific Rim Applications and Grid Middleware Assembly: http://pragma-grid.net

Figure panels: NASA data, integration with Google Earth and DataTurbine; CUAHSI HIS Service Oriented Architecture: General Outline

Conclusion
The Open Source DataTurbine can be seamlessly integrated into the CUAHSI HIS cyberinfrastructure, providing an efficient, scalable and fault-tolerant solution for streaming observations data to HIS components. Further work will include streaming large volumes of observations data from real-time stations maintained by government agencies, managing multimedia streams, and integrating the Real-time Data Viewer with CUAHSI HIS online clients such as Hydroseek and DASH.
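The channel-to-ODM mapping described in Step 4 can be illustrated with a short sketch. This is not the stream2db program itself (which is written in Java and whose code is not shown here); it is a minimal Python illustration under stated assumptions. The channel names, site and variable identifiers, and the UTC offset are hypothetical, the table is a simplified subset of the ODM DataValues columns, timestamps are assumed to arrive as seconds since the Unix epoch, and an in-memory SQLite database stands in for the ODM database instance.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical channel names and ODM identifiers; a real deployment would
# take these from its own datalogger configuration and ODM metadata tables.
CHANNEL_MAP = {
    "CR1000/AirTemp_C": {"site_id": 1, "variable_id": 10},
    "CR1000/WaterLevel_m": {"site_id": 1, "variable_id": 11},
}

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def create_datavalues_table(conn):
    """Create a simplified subset of the ODM DataValues table (assumption)."""
    conn.execute(
        """
        CREATE TABLE DataValues (
            ValueID       INTEGER PRIMARY KEY AUTOINCREMENT,
            DataValue     REAL    NOT NULL,
            LocalDateTime TEXT    NOT NULL,
            UTCOffset     REAL    NOT NULL,
            DateTimeUTC   TEXT    NOT NULL,
            SiteID        INTEGER NOT NULL,
            VariableID    INTEGER NOT NULL
        )
        """
    )


def insert_sample(conn, channel, epoch_seconds, value, utc_offset_hours=-8.0):
    """Insert one sample; return False if the channel has no ODM mapping."""
    ids = CHANNEL_MAP.get(channel)
    if ids is None:
        return False
    # Timestamps are assumed to be seconds since the Unix epoch.
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    local = utc + timedelta(hours=utc_offset_hours)
    conn.execute(
        "INSERT INTO DataValues "
        "(DataValue, LocalDateTime, UTCOffset, DateTimeUTC, SiteID, VariableID) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (value, local.strftime(TIME_FORMAT), utc_offset_hours,
         utc.strftime(TIME_FORMAT), ids["site_id"], ids["variable_id"]),
    )
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the ODM instance
    create_datavalues_table(conn)
    insert_sample(conn, "CR1000/AirTemp_C", 1230768000.0, 12.5)
    print(conn.execute("SELECT * FROM DataValues").fetchall())
```

Keeping the mapping in a lookup table rather than in code means a new sensor channel can be added without touching the insert logic, and samples from unmapped channels are skipped instead of being written with guessed identifiers.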
Links
• CUAHSI HIS: http://www.cuahsi.org/his/
• HIS Wiki @ SDSC: http://river.sdsc.edu/wiki
• Open Source DataTurbine: http://www.dataturbine.org
• Getting RBNB DataTurbine Software: http://code.google.com/p/dataturbine
• Real-time Data Viewer (RDV): http://code.google.com/p/rdv/

Figure: RBNB data displayed in the CUAHSI DASH application (simulated air temperature stream)
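Once the ODM web services are configured, client applications retrieve the streamed values through the GetValues service. The sketch below only builds the SOAP request body and sends nothing over the network. The namespace and parameter names (location, variable, startDate, endDate, authToken) reflect the WaterOneFlow 1.0 interface as we understand it and should be checked against the WSDL of the actual service; the network, site and variable codes are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed WaterOneFlow 1.0 namespace; verify against the service WSDL.
WOF_NS = "http://www.cuahsi.org/his/1.0/ws/"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("wof", WOF_NS)


def build_getvalues_request(location, variable, start_date, end_date,
                            auth_token=""):
    """Return a SOAP 1.1 envelope (as a string) for a GetValues call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{WOF_NS}}}GetValues")
    # Parameter order follows the assumed GetValues signature.
    for name, text in (
        ("location", location),
        ("variable", variable),
        ("startDate", start_date),
        ("endDate", end_date),
        ("authToken", auth_token),
    ):
        ET.SubElement(call, f"{{{WOF_NS}}}{name}").text = text
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical network:site and vocabulary:variable codes.
    print(build_getvalues_request(
        "ExampleNet:SITE01", "ExampleVocab:AirTemp",
        "2009-01-01", "2009-01-02"))
```

The resulting envelope would be sent in an HTTP POST to the service endpoint, and the response would carry the time series encoded in Water Markup Language.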