
Data Provenance in Remote Environmental Monitoring

Dr. Christian Skalka, University of Vermont, USA


Presentation Transcript


  1. Data Provenance in Remote Environmental Monitoring Dr. Christian Skalka, University of Vermont, USA

  2. Data Provenance in Remote Environmental Monitoring (REM)
     REM = automated collection of data from the natural environment in remote settings. Central points:
     • Data provenance is fundamental to REM.
       • Data source, times, ownership are intrinsic.
     • REM hardware and software architectures pose unique challenges for establishing provenance.
       • Heterogeneous, distributed, low-power systems.

  3. Outline
     Two REM case studies and problem statements:
     • Snowpack monitoring (SnowMAN)
       • The SnowMAN project summary.
       • Microcosmic provenance issues, challenges.
       • SnowMAN provenance “coping mechanisms”.
     • Sagehen Creek Field Station network
       • Overview of project setting.
       • Macrocosmic provenance issues, challenges.
       • Possible approaches to central challenges.

  4. How Much Snow is Out There?
     • Snow/Water Equivalent (SWE): measurement of water content in snowpack
       • Not the same as snow height.
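Slide 4's point, that SWE is not snow height, follows from the standard relation SWE = h_s · ρ_s / ρ_w: snow depth times bulk snow density, divided by the density of liquid water (about 1000 kg/m³). A minimal Python sketch; the function name and the example densities are illustrative assumptions, not SnowMAN code.

```python
WATER_DENSITY_KG_M3 = 1000.0  # approximate density of liquid water


def snow_water_equivalent_mm(snow_depth_mm: float, snow_density_kg_m3: float) -> float:
    """Depth of water (mm) from melting a snow column of the given depth and bulk density."""
    if snow_depth_mm < 0 or snow_density_kg_m3 < 0:
        raise ValueError("depth and density must be non-negative")
    return snow_depth_mm * snow_density_kg_m3 / WATER_DENSITY_KG_M3


# Illustrative densities (assumed): light fresh snow vs. dense settled snow, both 1 m deep.
fresh = snow_water_equivalent_mm(1000.0, 100.0)
settled = snow_water_equivalent_mm(1000.0, 400.0)
print(fresh, settled)  # 100.0 400.0
```

Equal heights, a fourfold difference in water content: this is why height alone cannot stand in for SWE.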

  5. How Much Snow is Out There?
     • Regional snowpack profiles are critically important to natural resource planning and public safety.
     • Real-world measurement is complicated by terrain, forest canopies, wind, exposure.
     • Accurate real-time SWE measurement is a “holy grail” of REM.

  6. The UVM SnowMAN Project
     • A new approach to SWE measurement
     • Use modern computer technology for data acquisition and retrieval
     • A multi-modal approach to SWE approximation
     • Lightweight, low cost, robust, adaptable
     • Improved spatial and temporal resolution

  7. Multimodal Sensor Fusion
     • Algorithms on sensing nodes combine multiple sensing technologies of variable power cost:
       • Snow height via ultrasound (cheap)
       • Snow density via microwave absorption (moderate)
       • Snow density via gamma ray attenuation (expensive)
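Slide 7 does not say how the fusion algorithms choose among modalities. One plausible policy, sketched here purely as an assumption, samples the cheap ultrasound height every cycle, escalates to microwave density only when the height has moved past a threshold, and uses gamma attenuation only periodically. The cost figures, thresholds, and the names `plan_readings` and `FusionState` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical relative energy costs; the slide gives only the ordering cheap < moderate < expensive.
COST = {"ultrasound_height": 1, "microwave_density": 10, "gamma_density": 100}


@dataclass
class FusionState:
    last_density_height_mm: Optional[float] = None  # snow height at the last density reading
    cycles_since_gamma: int = 0


def plan_readings(state: FusionState, height_mm: float,
                  delta_mm: float = 50.0, gamma_every: int = 24) -> List[str]:
    """Choose which modalities to sample this cycle (hypothetical escalation policy)."""
    plan = ["ultrasound_height"]  # cheapest modality: sampled every cycle
    if (state.last_density_height_mm is None
            or abs(height_mm - state.last_density_height_mm) >= delta_mm):
        # Height moved enough that the last density estimate is suspect: pay for microwave.
        plan.append("microwave_density")
        state.last_density_height_mm = height_mm
    state.cycles_since_gamma += 1
    if state.cycles_since_gamma >= gamma_every:
        # Most expensive modality: used only periodically, e.g. as a calibration reference.
        plan.append("gamma_density")
        state.cycles_since_gamma = 0
    return plan


def plan_cost(plan: List[str]) -> int:
    """Total relative cost of one sampling cycle."""
    return sum(COST[m] for m in plan)


state = FusionState()
print(plan_readings(state, 800.0))  # ['ultrasound_height', 'microwave_density']
print(plan_readings(state, 820.0))  # ['ultrasound_height']
```

Logging the returned plan alongside each reading is one way to capture the history/cause of a sampling event that slide 9 says is needed to refine the fusion algorithm.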

  8. SnowMAN System Architecture
     • Multiple data gathering-and-processing nodes connected via a Wireless Sensor Network (WSN)
     • Arduino-based on-site gateway provides datalogging via SD card and data processing
     • Remote data retrieval via TCP/IP over cell modem

  9. Provenance Issues in SnowMAN
     • Data reported by sensors are meaningless without provenance information:
       • Time of sampling event
       • Location of sample
       • Type and ADC conversion formula of sensor
     • Refinement of multimodal fusion algorithm requires history/cause of sampling event.
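The provenance listed on slide 9 can be bundled with each raw reading so that conversion to physical units remains possible downstream. A minimal sketch; the field names and the linear gain/offset ADC formula are assumptions (real sensors may need nonlinear conversions), and the location is a placeholder.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SensorType:
    name: str
    unit: str
    gain: float    # hypothetical linear ADC conversion: value = gain * counts + offset
    offset: float


@dataclass(frozen=True)
class Reading:
    raw_counts: int                # what the ADC actually reports
    timestamp_s: int               # time of the sampling event (seconds since the epoch)
    location: Tuple[float, float]  # where the sample was taken, e.g. (lat, lon)
    sensor: SensorType             # type and conversion formula of the sensor
    cause: str = "scheduled"       # why the sample was taken, for refining fusion algorithms

    def value(self) -> float:
        """Physical value; computable only because the conversion travels with the reading."""
        return self.sensor.gain * self.raw_counts + self.sensor.offset


ultrasound = SensorType("ultrasound_height", "mm", gain=2.0, offset=-100.0)
r = Reading(raw_counts=550, timestamp_s=1_200_000_000,
            location=(44.0, -72.0), sensor=ultrasound, cause="height_changed")
print(r.value(), r.sensor.unit)  # 1000.0 mm
```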

  10. Provenance Challenges in SnowMAN
     • Low-bandwidth requirements in WSNs
       • Messages must be small, infrequent.
     • Volatility of low-cost devices
       • WSN node failures require data reliability solutions
     • Heterogeneous network architecture
       • Data formats must be converted in network communications
     • Time synchronization
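Slides 10 and 11 call for messages that stay small yet still carry source and time. A sketch of a fixed binary layout using Python's `struct` module; the field choice and widths are assumptions. At 9 bytes the whole packet fits comfortably within IEEE 802.15.4's 127-byte maximum frame.

```python
import struct
from typing import NamedTuple

# Hypothetical layout, little-endian, no padding:
#   node_id u16 | timestamp u32 | sensor_type u8 | raw_counts u16
PACKET_FORMAT = "<HIBH"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 9 bytes


class Packet(NamedTuple):
    node_id: int       # data source
    timestamp_s: int   # time of sampling event
    sensor_type: int   # index into a table of sensor types/conversion formulas
    raw_counts: int    # unconverted ADC value


def encode(p: Packet) -> bytes:
    """Serialize a packet for the radio; raises struct.error if a field does not fit."""
    return struct.pack(PACKET_FORMAT, *p)


def decode(buf: bytes) -> Packet:
    """Parse a received frame back into a packet."""
    if len(buf) != PACKET_SIZE:
        raise ValueError(f"expected {PACKET_SIZE} bytes, got {len(buf)}")
    return Packet(*struct.unpack(PACKET_FORMAT, buf))


p = Packet(node_id=3, timestamp_s=1_200_000_000, sensor_type=1, raw_counts=550)
wire = encode(p)
print(len(wire), decode(wire))
```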

  11. Managing Provenance in SnowMAN
     • Reliability ensured by datalogging on gateway, replication within WSN.
       • Requires data source, time to be stored with readings.
     • Provenance information reported with data readings.
       • Component of packet format; not onerously large.
     • Data converted at “protocol boundaries”.
       • 802.15.4 to RS232 to TCP/IP to SQL.
     • Time synchronization handled by simple protocols.
       • Low precision sufficient; cell modem provides “true” time.
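The slides do not name SnowMAN's synchronization protocol. As an assumption, this sketch uses Cristian's algorithm, a standard simple scheme: read the local clock, query a reference clock (here imagined as the time reported by the cell modem), read the local clock again, and assume the reply arrived after half the round trip. The callable-based interface is hypothetical, chosen so the example runs with simulated clocks.

```python
from typing import Callable, Tuple


def estimate_offset(local_clock: Callable[[], float],
                    query_reference: Callable[[], float]) -> Tuple[float, float]:
    """Cristian-style estimate of (reference - local) clock offset.

    Returns (offset_s, uncertainty_s); the uncertainty is half the round-trip time,
    under the usual assumption that request and reply delays are symmetric.
    """
    t0 = local_clock()
    ref = query_reference()  # e.g. network time from the cell modem (assumed source)
    t1 = local_clock()
    rtt = t1 - t0
    offset = (ref + rtt / 2) - t1
    return offset, rtt / 2


def correct(local_timestamp_s: float, offset_s: float) -> float:
    """Map a locally stamped reading onto the reference time base."""
    return local_timestamp_s + offset_s


# Simulated clocks: local reads 100.0 s then 102.0 s; the reference reports 1000.0 s.
ticks = iter([100.0, 102.0])
offset, err = estimate_offset(lambda: next(ticks), lambda: 1000.0)
print(offset, err)  # 899.0 1.0
```

The half-round-trip uncertainty gives a concrete bound to check against the “low precision sufficient” requirement.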

  12. Outstanding Provenance Issues in SnowMAN
     • How to verify that data is converted properly at protocol boundaries?
     • How to encode history of multi-modal readings, for analysis and refinement of algorithms?
     • How to detect errors in data readings, due to sensor, time synchronization, node failure?
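One partial answer to the first question on slide 12 is round-trip testing: push each record across the boundaries and compare what comes out with what went in. The sketch covers only a bytes-to-SQL leg, using an in-memory SQLite database; the wire layout and table schema are hypothetical, and it catches lossy or failed conversions, not semantically wrong ones such as a wrong unit.

```python
import sqlite3
import struct
from typing import Iterable, List, Tuple

# Hypothetical wire layout: node_id u16 | timestamp u32 | sensor_type u8 | raw_counts u16
PACKET_FORMAT = "<HIBH"


def check_round_trip(records: Iterable[Tuple[int, int, int, int]],
                     db: sqlite3.Connection) -> List[tuple]:
    """Push each record across the radio-bytes and SQL boundaries; report any that change."""
    db.execute("CREATE TABLE IF NOT EXISTS readings "
               "(node_id INTEGER, ts INTEGER, sensor_type INTEGER, raw INTEGER)")
    problems = []
    for rec in records:
        try:
            wire = struct.pack(PACKET_FORMAT, *rec)  # node side: encode for the radio
        except struct.error as exc:                 # value does not fit the wire format
            problems.append((rec, f"encode failed: {exc}"))
            continue
        row = struct.unpack(PACKET_FORMAT, wire)     # gateway side: decode
        cur = db.execute("INSERT INTO readings VALUES (?, ?, ?, ?)", row)
        stored = db.execute(
            "SELECT node_id, ts, sensor_type, raw FROM readings WHERE rowid = ?",
            (cur.lastrowid,)).fetchone()             # warehouse side: read back
        if tuple(stored) != tuple(rec):
            problems.append((rec, f"stored as {stored}"))
    return problems


db = sqlite3.connect(":memory:")
print(check_round_trip([(3, 1_200_000_000, 1, 550), (70000, 0, 0, 0)], db))
```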

  13. REM in Macrocosm: Sagehen Creek Field Station
     Sagehen Creek Field Station and Experimental Forest, located near Truckee, CA
     • Research and teaching facility of UC Berkeley
     • 9,000 acres of undisturbed wilderness, extensive REM technology

  14. REM in Macrocosm: Sagehen Creek Field Station
     • Literally hundreds of various sensor devices
       • Temperature, wind, humidity
       • Streamflow, stream temperature
       • Snow height, SWE
       • Video
     • 9 hubs with (programmable) dataloggers, power, wireless transmission
     • Goal: wireless connectivity to field house and internet, off-site data warehousing
     • Multiple user, administration groups

  15. Sagehen Creek Field Station

  16. Provenance Issues at Sagehen
     • Inherits microcosmic issues (time, location, sensor modality essential to data).
     • Video triggering events should be reported.
     • Group data ownership now important to report (and maintain through data cycle).
     • Sagehen provenance should be credited in myriad end-uses of data.
     • Diagnostics of network functionality and services.

  17. Provenance Challenges at Sagehen
     Inherits microcosmic challenges, but:
     • Increased sampling rates, network traffic
     • Time synchronization much more complex
     • GPS auto-location for some sensors, manual for others
     • Much greater diversity of devices, communication media (wired, wireless)
     • More protocol boundaries
     • Multimedia

  18. Sagehen Provenance Issues: Scalability
     Sagehen network modeled as source-to-sink dataflow, from sensors to end-users.
     • Sources extensible by user groups
       • New sensors, sensor networks (e.g. WSNs)
       • New remote datalogging/replication architecture
     • Sink usable by end-user groups
       • Arbitrary visualization technologies
       • Diverse research and education applications
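Slide 18's extensible source end can be pictured as a registry that user groups add sources to, with ownership attached at registration. A hypothetical sketch: `register_source`, `run_to_sink`, the source name and the owner label are invented for illustration, and a real deployment would put dataloggers, WSN gateways and a warehouse behind these callables.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
# Hypothetical registry: each entry yields records already tagged with source and owner.
SOURCES: Dict[str, Callable[[], Iterator[Record]]] = {}


def register_source(name: str, owner: str):
    """Decorator a user group applies to add a new data source to the network."""
    def wrap(produce: Callable[[], Iterable[Record]]):
        def tagged() -> Iterator[Record]:
            for rec in produce():
                # Provenance is attached once, at the source end, and travels with the record.
                yield {**rec, "source": name, "owner": owner}
        SOURCES[name] = tagged
        return produce
    return wrap


def run_to_sink(sink: Callable[[Record], None]) -> int:
    """Drain every registered source into a single sink; return the record count."""
    count = 0
    for tagged in SOURCES.values():
        for rec in tagged():
            sink(rec)
            count += 1
    return count


@register_source("hub3_stream_temp", owner="hydrology-group")
def stream_temperature():
    yield {"ts": 1_200_000_000, "value_c": 4.5}


collected = []
run_to_sink(collected.append)
print(collected)
```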

  19. Sagehen Network: The Current Reality
     • Establishing data communications backbone over IEEE 802.11 wireless LAN.
     • Limited data collection over network (one-hop) via canned proprietary software.
     • Most data collection being done manually from dataloggers.
       • Sensors hardwired to dataloggers, no WSNs in the field.
     • Some one-hop connectivity between hubs.

  20. Sagehen Network: The Vision
     • Seamless source-to-sink dataflow.
       • From sensors in the field to off-site, permanent data warehouse.
       • Also accessible onsite at remote hubs (reliable).
     • Wireless sensor network capabilities in the field.
     • Attribution of data to source groups and Sagehen.
     • Easy extensibility of network at source end, to allow addition of new sensors (and WSNs).

  21. Some Ideas for Supporting Provenance in the Sagehen Software Architecture
     Treating data like messages on a protocol stack.
     • Stack defined across device (protocol) boundaries:
       • Sensor data is “raw”, collects more provenance information as it moves towards the sink.
       • Higher layers of provenance (time, ownership) encapsulate lower layers.
     • Allows compositional (principled) treatment of cross-protocol data transformation.
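The protocol-stack idea on slide 21 can be sketched as nested envelopes: each boundary wraps the message it received and adds its own provenance, while the raw reading stays at the core. Layer names and fields below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Layer:
    name: str                   # e.g. "sensor", "datalogger", "warehouse" (hypothetical)
    provenance: Dict[str, Any]  # what this boundary knows: time, hub, owner, ...
    payload: Any                # either raw sensor data or another Layer


def encapsulate(payload: Any, name: str, **provenance: Any) -> Layer:
    """Wrap a lower-layer message with this layer's provenance."""
    return Layer(name, dict(provenance), payload)


def provenance_chain(msg: Any) -> List[Tuple[str, Dict[str, Any]]]:
    """Walk from the outermost to the innermost layer, collecting (name, provenance)."""
    chain = []
    while isinstance(msg, Layer):
        chain.append((msg.name, msg.provenance))
        msg = msg.payload
    return chain


def raw_data(msg: Any) -> Any:
    """Strip every envelope and return the original sensor value."""
    while isinstance(msg, Layer):
        msg = msg.payload
    return msg


# A raw reading picks up provenance as it moves toward the sink.
m = encapsulate(512, "sensor", sensor_type="stream_temp", adc_bits=10)
m = encapsulate(m, "datalogger", hub=3, local_ts=1_200_000_000)
m = encapsulate(m, "warehouse", owner="hydrology-group", credit="Sagehen Creek Field Station")
for name, prov in provenance_chain(m):
    print(name, prov)
```

In this sketch each boundary only adds an envelope rather than rewriting what it received, so provenance added earlier is preserved by construction.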

  22. Some Ideas for Supporting Provenance in the Sagehen Software Architecture
     Watermarking data to establish Sagehen and group ownership.
     • Easily done for video media.
       • Video retrieved only from the internet; watermarking performed on traditional platform.
     • Watermarking sensor data??
       • Need to preserve data: it may not tolerate traditional techniques.
       • In-the-field retrieval requires in-the-field watermarking.
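Slide 22 leaves sensor-data watermarking open. As a swapped-in alternative rather than a watermark, a keyed message authentication code (HMAC-SHA256) computed over a record attributes it to a key holder without altering any data value. The key, its handling, and the record fields are assumptions.

```python
import hashlib
import hmac
import json


def attribution_tag(key: bytes, record: dict) -> str:
    """Keyed MAC over a canonical JSON encoding of the record; the values are not modified."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify(key: bytes, record: dict, tag: str) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(attribution_tag(key, record), tag)


KEY = b"hypothetical-group-key"  # assumed: in practice provisioned per owner group
rec = {"node_id": 3, "ts": 1_200_000_000, "raw": 550, "owner": "hydrology-group"}
tag = attribution_tag(KEY, rec)
print(verify(KEY, rec, tag), verify(KEY, dict(rec, raw=551), tag))
```

Unlike an embedded watermark, a separate tag can simply be stripped; it establishes origin only while it travels with the data, and whether HMAC-SHA256 is cheap enough for in-the-field hardware is not addressed here.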

  23. Conclusion
     • Remote environmental monitoring requires provenance for correct interpretation of data.
     • REM networks heterogeneous, some components computationally “weak”.
       • Power, cost restrictions.
       • Protocol hodgepodge!
     • Adapting to REM environment a unique challenge for provenance in software.

  24. Conclusion
     Two case studies:
     • SnowMAN: lightweight, low cost SWE monitoring.
     • Sagehen Creek Field Station: REM in macrocosm.
     http://www.cs.uvm.edu/~skalka
     http://sagehen.ucnrs.org/
