A Key-Value Based Persistence Model for Sensor Networks: A Case Study of NetBEAMS
This thesis presents a comprehensive key-value based persistence model tailored for sensor networks, specifically focusing on the NetBEAMS infrastructure for environmental monitoring. By analyzing the motivations behind data persistence, relevant literature, and empirical technology selection, it offers a taxonomy of data persistence strategies. The study evaluates the challenges and solutions in managing sensor data, including real-time data streaming, archival storage, and metadata representation, with conclusions drawn from experimental results and future research directions highlighted.
Presentation Transcript
A Key-Value-based Persistence Model for Sensor Networks By: Marcello Alves de Sales Junior Master of Science in Computer Science Advisor: Prof. Arno Puder, Ph.D. Committee Chair: Prof. Marguerite Murphy, Ph.D. Department of Computer Science
Outline • Motivation and Literature Review • A Taxonomy for Data Persistence • NetBEAMS: A Case Study • Empirical Analysis for Technology Selection • DSP Data Persistence: Design and Architecture • Experimental Results: Correct Behavior and Performance • Conclusions and Future Works
Motivation • Persistence for NetBEAMS Sensor Network Infrastructure • Component-based sensor network for environmental monitoring (Puder et al.) • Biologists are the main users of the system • What types of database systems? • Use the traditional relational data model? • Bai et al. propose domain-specific programming languages
State of the Art in Data Persistence • Sensor Networks [Akyildiz et al.] • Infrastructure (Topology) and Node Types • Size and Lifetime • Sensor Network Nodes [Romer et al.] • Deployment and Mobility • Size, Resources, Cost, Energy, Heterogeneity • Communication Mode • Coverage and Connectivity • [wp] [snc]
Persistent Storage for Sensor Networks • How the Collected Data is Used • Real-time Data Stream • Data Archival • Storage Location for Collected Data • Local or External • Data-Centric • Query Processing Used • In-Network • Centralized • Data Volume Produced [ns]
Data Models and Query Engines • Tabular Data Model: flat files • Comma-separated values • Text Comparison • No Index • Relational Data Model: binary files • Data Normalization • Structured Query Language (SQL) • Lee et al. proposed new operators for SQL • Structured Data Model: structured XML documents • XML Schema: document structure • XML XPath: data retrieval • Database System Data Sink
How Is Collected Data Described? • Data Stream: 334 55.45 -23.44 119.394 44 1 22 | 5/ • Ledlie et al. propose the use of Data Provenance • Metadata: Data about data • What was collected? • Temperature = 54.3 : data • Scale = 'fahrenheit' : metadata • When was the data collected? • Valid Time = Collected at 10:34am • Transaction Time = Time of Arrival • From where was the data collected? • GPS Coordinates: (12.342, -145.304) • Site: 'lower-pier' : metadata
Problem: recent oil spill in the San Francisco Bay (Oct 2009) [sfb09] • Correlations between the collected data and the oil spill • Describing historical data events • Data Annotation • Liu et al. annotate video frames from sensor cameras • Descriptive Metadata • YouTube Video Tag • Tags for Web 2.0 “Junk” Data [an]
2. Data Persistence in Sensor Networks: a Proposed Taxonomy
2. A Taxonomy for Data Persistence • Taxonomy (from Greek τάξις, taxis, meaning 'order' or 'arrangement', and νόμος, nomos, meaning 'law' or 'science') [Wikipedia]: • Practice and science of classification • Represented by hierarchical diagrams • Relationships between the root and branches • Taxon, Taxa
3. NetBEAMS: A Case Study • NetBEAMS: Data collection using Data Sensor Platform (DSP) • Automates operation of SF-BEAMS • SF-BEAMS: single-star sensor network – data archival • Nodes geographically fixed • Single-hop communication • Production intervals: 1, 6 or 15 minutes • Heterogeneous Devices • Coverage: Tiburon coast • 1 Data Sink (RTC Labs)
3. NetBEAMS: A Case Study • NetBEAMS Gateway Node • YSI Sonde + Gumstix Embedded System + GSM Modem • Centralized Data Sink (RTC Labs)
Device Used by NetBEAMS • YSI 6600EDS V2: COTS Water Quality Monitoring • 13 Measurement parameters • 1 year's worth of raw data • Max 23.99 Mb at 1/min • 483,840 samples per year • 5 YSI in current deployment [ysi]
SF-BEAMS Classification
NetBEAMS Data Collection Scenario • Missing Component! • Collected Data (sample reading): 12.20 192 179 55 88.40 0.09 0.084 0.059 7.98 -79.6 99.5 8.83 0.4 8.7 • DSP Messages
Non-Functional Requirements • Open-Source • Free of charge • Easy to Scale (Data Partitioning) • Accessibility (API) • Cope with RTC's Small Volume of Data
4. Technology Selection Empirical Analysis
4. Empirical Analysis for Technology Selection • Technologies used in the reviewed literature • MySQL: Jacob Nikom used it in a Linux cluster for sensor networks; • TinyDB: Madden et al. and Lee et al. used it for sensor networks; • DB2: Sow et al. used it in a hybrid approach of XML and relational models to persist and query biometric events; • mongoDB: Buyya et al. reported it among new trends in persistence in the cloud.
Use Traditional Relational Databases? • Tony Bain questions the adoption of the Relational Model • Traditional approach: 30 years • Accommodates changes? • Try adding entities • Try adding properties • Changes to the schema • Keep the schema normalized • Change Software Layers
Schema-less: Key-Value Pair Data Model • Data Collections: “denormalized” data • No Data Integrity = Data located in the same physical space • Collections shown: Annotation, Observation, Provenance
Horizontal Data Partitioning • KVP databases better support horizontal data partitioning • Shenker et al. survey Data-Centric Storage • Targeted Query vs Global Query • Regions shown: Tiburon, CA; Berkeley, CA; South Bay, CA • Shards of collected sensor data: Region 1 (Master Shard, Shard 2), Region 2 (Master Shard), Region 3 (Master Shard, Shard 2) • Operations shown: Projection, Count
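The difference between a targeted query and a global query over region-based shards can be sketched in a few lines. This is a minimal in-memory illustration, not the NetBEAMS implementation: the shard keys, the sample pH values, and the function names `targetedFind` and `globalCount` are all assumptions made for the example.

```javascript
// Hypothetical in-memory "shards": one array of documents per region key.
// The pH readings are made-up illustration values.
const shards = new Map([
  ["tiburon", [{ site: "tiburon", observation: { pH: 7.98 } }]],
  ["berkeley", [
    { site: "berkeley", observation: { pH: 8.1 } },
    { site: "berkeley", observation: { pH: 7.6 } },
  ]],
  ["south-bay", [{ site: "south-bay", observation: { pH: 7.4 } }]],
]);

// Targeted query: the shard key is known, so only one shard is scanned.
function targetedFind(region, predicate) {
  const shard = shards.get(region) || [];
  return shard.filter(predicate);
}

// Global query: no shard key is given, so every shard is visited
// and the partial counts are merged into one total.
function globalCount(predicate) {
  let total = 0;
  for (const docs of shards.values()) {
    total += docs.filter(predicate).length;
  }
  return total;
}

const alkaline = (doc) => doc.observation.pH > 7.5;
console.log(targetedFind("berkeley", alkaline).length); // scans one shard
console.log(globalCount(alkaline));                     // scans all shards
```

With less data per shard, the targeted query touches fewer documents, which is the intuition behind the faster query processing claimed for the data-centric deployment.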
5. DSP Data Sensor Platform: Design and Architecture
5. DSP Data Persistence: Design and Architecture • Persistence Scenario for NetBEAMS – Solution
Data Model Design: mongoDB Document Instance • Data Manipulation: Programming Language Abstraction • "Dot Notation" • Where: sensor.location.latitude = 37.89 • When: time.transaction = Dec 17, 2009 • What: observation.pH = 7.11
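A small sketch may help show how dot notation maps onto a nested document. The document below only contains the three fields named on the slide; the variable name `sondeDocument` and the helper `getPath` are hypothetical and are not part of mongoDB or of the DSP code.

```javascript
// Hypothetical document mirroring the where / when / what fields on the slide.
const sondeDocument = {
  sensor: { location: { latitude: 37.89 } },     // where
  time: { transaction: new Date(2009, 11, 17) }, // when (JS months are 0-based: 11 = December)
  observation: { pH: 7.11 },                     // what
};

// Resolve a dot-notation path such as "sensor.location.latitude".
// Returns undefined when any segment along the path is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node !== null && node !== undefined ? node[key] : undefined), doc);
}

console.log(getPath(sondeDocument, "sensor.location.latitude"));
console.log(getPath(sondeDocument, "observation.pH"));
```

Because the path is just a string, new properties can be added to a document without any schema change, which is the property the key-value model relies on.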
Adding DSP Data Component Adding mongoDB
Deployment of the DSP Data Persistence • As External Storage: Single Server • As Data-Centric Storage: Distributed Servers
6. Experimental Results: Correct Behavior and Performance
6. Experimental Results: Correct Behavior and Performance • Goal: Simulate RTC Environment • Experiment Setup - Infrastructure • Key-Value definition • Randomly Generated YSI Sonde Data (R0) • Simulates Different Types of Storage using Virtualization • Workload • Data volume compatible with that used by RTC • 1 YSI = 483,840 documents = First Round • 5 YSI = 5 * 483,840 = 2,419,200 = Consecutive Rounds
Scenarios • Use Cases as Agile User Stories – Persona, Action, Result • (R1) "As a marine biologist, I would like to search observations by filtering values of the sensor device's properties, such as water temperature and salinity on December 17, 2009, so that I can find associated values to the observation." • db.SondeDataContainer.find( { "observation.Salinity": 0.01, "observation.WaterTemperature": 46.47, "time.valid": new Date(2009, 11, 17) } ) • (Dotted keys are quoted; JavaScript months are zero-based, so 11 is December.) • Programming Language: mongoDB Abstraction to Access Data
Scenarios • (U1) "As an estuarine ecologist, I would like to annotate observations from the time the "oil spill" occurred in the San Francisco Bay, so that I can maintain historical evidence of the impact of such event." • db.SondeDataContainer.update( { "time.valid": { $gte: new Date(2009, 9, 12), $lt: new Date(2009, 10, 13) } }, { $set: { tag: "oil spill" } } ) • (JavaScript months are zero-based, so the range runs from October 12 to November 13, 2009.) • Programming Language: mongoDB Abstraction to Access Data
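The workload figures can be checked with a few lines of arithmetic. The sketch below only uses the numbers stated on the slides; the constant names are made up for the example. One observation worth noting: 483,840 samples at one per minute correspond to 336 days rather than 365, and the slides do not say how the year was counted.

```javascript
// Figures taken from the slides; constant names are illustrative.
const DOCS_PER_SONDE_PER_YEAR = 483840; // 1 YSI = first round
const SONDES_DEPLOYED = 5;              // 5 YSI in the current deployment

// Consecutive rounds: documents produced by all deployed sondes.
const totalDocs = DOCS_PER_SONDE_PER_YEAR * SONDES_DEPLOYED;

// Days of one-sample-per-minute collection implied by the per-sonde figure.
const MINUTES_PER_DAY = 24 * 60;
const daysCovered = DOCS_PER_SONDE_PER_YEAR / MINUTES_PER_DAY;

console.log(totalDocs);   // 2419200
console.log(daysCovered); // 336
```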
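The semantics of the two scenarios (an equality search and a range-based annotation) can be emulated without a database. The following is a minimal sketch under stated assumptions: it does not call mongoDB, it supports only equality plus the `$gte` and `$lt` operators, and the helper names `getPath`, `matchesCondition`, `find`, and `updateMany`, the sample documents, and the date range around the October 2009 spill are all chosen for illustration.

```javascript
// Resolve a dot-notation path; returns undefined if a segment is missing.
const getPath = (doc, path) =>
  path.split(".").reduce((node, key) => (node == null ? undefined : node[key]), doc);

// True when `value` satisfies one filter condition: either a plain value
// (equality) or an object with $gte / $lt bounds. Dates compare by timestamp.
function matchesCondition(value, condition) {
  const toComparable = (v) => (v instanceof Date ? v.getTime() : v);
  const actual = toComparable(value);
  const isOperatorObject =
    condition !== null && typeof condition === "object" && !(condition instanceof Date);
  if (!isOperatorObject) return actual === toComparable(condition);
  if ("$gte" in condition && !(actual >= toComparable(condition.$gte))) return false;
  if ("$lt" in condition && !(actual < toComparable(condition.$lt))) return false;
  return true;
}

// Return the documents for which every dotted key in the filter matches.
const find = (docs, filter) =>
  docs.filter((doc) =>
    Object.entries(filter).every(([path, cond]) => matchesCondition(getPath(doc, path), cond)));

// Apply a $set-style change (top-level fields only) to every matching document.
// Returns the number of documents changed.
function updateMany(docs, filter, fieldsToSet) {
  const matched = find(docs, filter);
  matched.forEach((doc) => Object.assign(doc, fieldsToSet));
  return matched.length;
}

// Made-up sample data standing in for SondeDataContainer.
const container = [
  { time: { valid: new Date(2009, 9, 30) }, observation: { Salinity: 0.02, WaterTemperature: 50.1 } },
  { time: { valid: new Date(2009, 11, 17) }, observation: { Salinity: 0.01, WaterTemperature: 46.47 } },
];

// (R1) equality search on December 17, 2009 (month index 11).
const r1 = find(container, {
  "observation.Salinity": 0.01,
  "observation.WaterTemperature": 46.47,
  "time.valid": new Date(2009, 11, 17),
});

// (U1) tag observations in an illustrative window around the spill.
const tagged = updateMany(
  container,
  { "time.valid": { $gte: new Date(2009, 9, 12), $lt: new Date(2009, 10, 13) } },
  { tag: "oil spill" }
);
console.log(r1.length, tagged);
```

Note that an equality filter on a date matches only documents stamped with that exact instant; searching a whole day would normally use a `$gte`/`$lt` pair as in the update.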
Implementation fulfills all the taxonomies • 1.35GB Claimed Disk Space • ~25,091 Inserts/min • Retrieval ~milliseconds • Update Varies (Depends on Partition Size, Dataset) • Simpler Implementation of Use Cases • Data accessibility • Different APIs, different languages • Key-Value Data Model • No schema changes to modify data design • Trade-off between Disk Storage (commodity) and performance
Data-Centric approach • Scales in terms of disk space available • Decreased processing time • Less data in a shard, faster query processing • Novel approach: alternative to existing ones • New Data Model Taxonomy
7. Conclusions and Future Works • How Important is Data Collection • Environmental Sensor Networks: Hazard Alerts • How to describe data: Data Provenance guidelines • Important descriptions: annotations, tags • Contributions • Data Persistence in Sensor Networks Taxonomies • Novel Approach: KVP data model for sensor networks data • Implementation for External and Data-Centric Storages • Technology ready for Cloud Computing
Future Works • Data-Centric Deployment with MapReduce Application • Sorting, subsets
Future Works • RTC gathers data by time period; Data are mostly repeated • Wang et al surveyed efficient schedulers for Sensor Networks; • Yin et al and Chen et al showed the use of Data Clustering before sending data to data sink; • Creation of a DSP Data Clustering before persisting data; • Research Problems • In-network storage/query using KVP databases • Partitioned Data nodes • Event-Based application developed on top of YSI Sonde Data • “observation.Battery” carries the battery life-time information;
References • Arno Puder, Teresa Johnson, Kleber Sales, Marcello de Sales, and Dale Davidson. A component-based sensor network for environmental monitoring. In SNA-2009: 1st International Conference on Sensor Networks and Applications, pages 54–60, San Francisco, CA, USA, November 2009. The International Society for Computers and Their Applications - ISCA. • I.F. Akyildiz, Weilian Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. Communications Magazine, IEEE, 40(8):102–114, Aug 2002. • K. Romer and F. Mattern. The design space of wireless sensor networks. IEEE Wireless Communications, 11(6):54–61, December 2004. • Seungjae Lee, Changhwa Kim, and Sangkyung Kim. New database operators for sensor networks. In SERA ’07: Proceedings of the 5th ACIS International Conference on Software Engineering Research, Management & Applications, pages 689–696, Washington, DC, USA, 2007. IEEE Computer Society. • Jonathan Ledlie, Chaki Ng, and David A. Holland. Provenance-aware sensor data storage. In ICDEW ’05: Proceedings of the 21st International Conference on Data Engineering Workshops, page 1189, Washington, DC, USA, 2005. IEEE Computer Society. • Xiaotao Liu, Mark Corner, and Prashant Shenoy. Seva: Sensor-enhanced video annotation. ACM Trans. Multimedia Comput. Commun. Appl., 5(3):1–26, 2009. • Jacob Nikom. Real-time sensor data warehouse architecture using mysql. In MySQL Users Conference. O’Reilly Media, Inc., April 2005. • [sfb09] Oil spills into s.f. bay south of bay bridge. http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/10/30/BA9B1ACTST.DTL, October 2009. • Images • [sd] http://www.zess.uni-siegen.de/ipp_home/ipp/research/master-student-topics/ • [snc] http://www.dei.unipd.it/~schenato/pics/SensorNetwork.jpg • [ns] http://www.cc.gatech.edu/projects/disl/specialProjects/figure1.gif • [an] http://eurekr.com/pics/AnnotatinganImageinWPF_A7D8/image.png • [ysi] http://www.ckjorc.org/cn/admin/news/edit/UploadFile/200681616301130.jpg
References • Daby M. Sow, Lipyeow Lim, Min Wang, and Kyu Hyun Kim. Persisting and querying biometric event streams with hybrid relational-xml dbms. In DEBS ’07: Proceedings of the 2007 inaugural international conference on Distributed event-based systems, pages 189–197, New York, NY, USA, 2007. ACM. • Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. Tinydb: an acquisition query processing system for sensor networks. ACM Trans. Database Syst., 30(1):122–173, 2005.
Department of Computer Science A Key-Value-based Persistence Model for Sensor Networks ? Marcello de Sales Master of Science in Computer Science (msales@sfsu.edu) http://code.google.com/p/netbeams http://www.netbeams.org “The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want something. Because the brick walls are there to stop the people who don't want it badly enough.” Dr. Randy Pausch
DSP in practice = NetBEAMS Use Cases • Data Payload for the YSI Sonde 6600V2 • SondeDataType: representation for the collected data • SondeDataContainer: collection of the collected data
Data Sensor Platform (DSP) Message Structure • DSP Message • Header • Producer • Consumer • Body • Message Content • DSP Messages Container • Package of DSP Messages
Data Sensor Platform (DSP) Communication Mechanism • DSP Broker • Local delivery • Remote delivery • Gateway Component • DSP Matcher • Filtering based on rules • Independent Per Host
3. NetBEAMS: A Case Study • DSP Data Persistence Component Requirements • Open-Source • Support Data-Centric Storage • Free of charge • Accessibility (API) • Cope with RTC's Small Volume of Data
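The message layout described on this slide (a header naming producer and consumer, a body carrying the content, and a container packaging several messages) can be pictured as plain objects. This is only a sketch of the shape: the field names, the factory functions `createDspMessage` and `createMessagesContainer`, and the component names used as sample values are assumptions and are not taken from the DSP source code.

```javascript
// Build one DSP-style message: header (producer, consumer) plus body (content).
// Field names are hypothetical; the slide only names the parts.
function createDspMessage(producer, consumer, content) {
  return {
    header: { producer, consumer },
    body: { content },
  };
}

// Package several messages into a container, copying the array so later
// changes to the caller's list do not affect the container.
function createMessagesContainer(messages) {
  return { messages: [...messages] };
}

const reading = createDspMessage("sonde-producer", "data-persistence", {
  observation: { pH: 7.11 },
});
const container = createMessagesContainer([reading]);
console.log(container.messages.length, container.messages[0].header.consumer);
```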
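Rule-based filtering in a matcher can be sketched as follows. The slides do not describe the rule format, so everything here is an assumption: a rule is modeled as a set of header fields that must all be equal, and `matchesRule` and `createMatcher` are hypothetical names rather than DSP APIs.

```javascript
// A rule matches when every field it lists equals the same field in the
// message header. An empty rule therefore matches every message.
function matchesRule(message, rule) {
  return Object.entries(rule).every(
    ([field, expected]) => message.header[field] === expected
  );
}

// Build a matcher for one host from that host's own list of rules.
// A message is accepted when at least one rule matches it.
function createMatcher(rules) {
  return (message) => rules.some((rule) => matchesRule(message, rule));
}

// Hypothetical rules for a host that only forwards messages to persistence.
const hostMatcher = createMatcher([{ consumer: "data-persistence" }]);

const messages = [
  { header: { producer: "sonde-producer", consumer: "data-persistence" }, body: {} },
  { header: { producer: "sonde-producer", consumer: "wire-transport" }, body: {} },
];
const accepted = messages.filter(hostMatcher);
console.log(accepted.length);
```

Since each host builds its own matcher from its own rules, the filtering stays independent per host, as the slide states.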