This study presents a learning approach for reducing data packets in sensor networks to save base station resources. Sensors selectively report the data points of interest, chosen by a classification model, to maximize sensor lifetime. Cost comparison and performance evaluation are conducted using a Naïve Bayes classifier. Results show the cost-effectiveness of the proposed strategy over traditional approaches.
A learning approach for reducing data packets in sensor networks Yinghui Na
Problems • Sensors have very limited computation capability • Traditionally, the sensors report ALL collected data to the base station (BS) • In some situations, most of this data is not of interest • E.g., detection of rare events (intrusion) • Is there a way to report only the data of interest to the BS, and thus save limited BS resources and maximize the lifetime of the sensors?
Classification • Interaction between data mining algorithms and network protocols • Classification: an induction task of finding patterns • Assigns objects to one of several predefined categories • A ‘supervised’ approach that classifies the unknown (test data) based on the well-known (training data)
Approach • We denote the class label of the i-th example xi by yi, where yi ∈ Y = {0,1}; 0 is negative and 1 is positive • Collected data points can be labeled at the base station as positive (interesting) or negative (not interesting) • Process • Initialization: at the beginning, the BS has no data points; the sensors send all data points until the first model from the base station is received • Classification model creation: the BS builds the classification model once it has received a minimum number of positive examples • Sensors report collected data selectively: sensors report all positive data points and part of the negative data points, based on the model • BS updates the model: the BS retains all received data and updates the model
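The four process steps above can be sketched as a small simulation. This is a control-flow illustration only, not the authors' implementation: the one-dimensional threshold "model", the constants MIN_POSITIVES and MARGIN, the update interval, and the synthetic readings are all hypothetical stand-ins. Unlike the approach described in the slide, this simple threshold does not guarantee that every positive point is reported.

```python
import random

MIN_POSITIVES = 5   # hypothetical: positives the BS needs before building a model
MARGIN = 0.5        # hypothetical: slack so that some negatives are still reported


class BaseStation:
    def __init__(self):
        self.data = []          # the BS retains every (value, label) it receives
        self.models_sent = 0    # counts model broadcasts (Nm in the cost formula)

    def receive(self, value, label):
        self.data.append((value, label))

    def build_model(self):
        """Return a threshold classifier, or None until enough positives exist."""
        positives = [v for v, lab in self.data if lab == 1]
        if len(positives) < MIN_POSITIVES:
            return None
        self.models_sent += 1
        threshold = min(positives) - MARGIN   # stand-in for a real classifier
        return lambda v: v >= threshold


def simulate(readings, label_fn, update_every=1000):
    bs = BaseStation()
    model = None                # initialization: sensors have no model yet
    sent = 0
    for i, value in enumerate(readings):
        # Without a model the sensor sends everything; with one it sends selectively.
        if model is None or model(value):
            bs.receive(value, label_fn(value))   # labeling happens at the BS
            sent += 1
        # The BS tries to create the first model, then updates it periodically.
        if model is None or (i + 1) % update_every == 0:
            model = bs.build_model() or model
    return sent, bs.models_sent


random.seed(0)
readings = [random.gauss(0.0, 1.0) for _ in range(10_000)]
sent, models = simulate(readings, label_fn=lambda v: 1 if v > 2.0 else 0)
print(f"sent {sent} of {len(readings)} readings, {models} model broadcasts")
```

In a fuller sketch the threshold rule would be replaced by the classifier trained at the BS, and the count of model broadcasts would feed the Nm*Cm term of the cost.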
Cost comparison • If all collected data is reported, the cost is Cb = N*c, where N is the number of all data points and c, the cost of sending a data point, is assumed to be constant • In the proposed approach, the total cost is C = Ns*c + Nfp*Cfp + Nfn*Cfn + Nm*Cm, where Ns is the number of selected data points sent from the sensors to the BS; Nfp and Nfn are the numbers of false positives and false negatives, respectively; Cfp and Cfn are their corresponding costs per data point; Nm is the number of models sent by the base station to the sensors; and Cm is the cost of each such communication • The approach is profitable only if the cost of the proposed approach is lower than the cost of the traditional approach, i.e., C < Cb
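The two cost formulas and the profitability condition can be checked with a few lines of Python. The function names and all numeric inputs below are hypothetical and chosen only to illustrate the arithmetic; they are not results from the study.

```python
def baseline_cost(n_points, c):
    """Cost of reporting every collected data point: Cb = N * c."""
    return n_points * c


def selective_cost(n_sent, c, n_fp, c_fp, n_fn, c_fn, n_models, c_model):
    """Cost of selective reporting: C = Ns*c + Nfp*Cfp + Nfn*Cfn + Nm*Cm."""
    return n_sent * c + n_fp * c_fp + n_fn * c_fn + n_models * c_model


def is_profitable(cost_baseline, cost_selective):
    """Selective reporting pays off only when it is strictly cheaper."""
    return cost_selective < cost_baseline


# Hypothetical scenario: 1,000,000 points, of which 60,000 are sent.
cb = baseline_cost(n_points=1_000_000, c=1)
cs = selective_cost(n_sent=60_000, c=1,
                    n_fp=40_500, c_fp=4,
                    n_fn=500, c_fn=64,
                    n_models=10, c_model=5)
print(cb, cs, is_profitable(cb, cs))  # 1000000 254050 True
```

Raising c_fn while holding the counts fixed shows how a large false-negative penalty can erase the saving.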
Cost matrix • The penalties of classifying the data points can be represented by a 2×2 matrix with elements c(i,j): c(0,0) denotes the penalty for not sending a negative data point, c(0,1) the penalty for a false negative, c(1,0) the penalty for sending a negative example (a false positive), and c(1,1) the penalty for sending a true positive data point • We assume that c(0,0) = 0 and c(1,1) = c. The penalties c(0,1) = Cfn and c(1,0) = c + Cfp are varied.
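One common way to use such a matrix is a minimum-expected-cost rule: send a data point only when the expected penalty of sending is lower than that of withholding it. The slides do not state that this exact rule was used, so the sketch below is an assumption; the function name and the example penalties are hypothetical.

```python
def should_send(p_positive, cost):
    """Decide whether to send a data point.

    cost[i][j] is the penalty for action i (0 = do not send, 1 = send)
    when the true class is j (0 = negative, 1 = positive).
    p_positive is the classifier's estimate of P(class = 1 | x).
    """
    p_negative = 1.0 - p_positive
    expected_skip = p_negative * cost[0][0] + p_positive * cost[0][1]
    expected_send = p_negative * cost[1][0] + p_positive * cost[1][1]
    return expected_send < expected_skip


# Hypothetical penalties: c(0,0)=0, c(0,1)=64, c(1,0)=4, c(1,1)=1.
cost = [[0, 64],
        [4, 1]]

print(should_send(0.10, cost))  # True:  sending costs 3.7, withholding 6.4
print(should_send(0.02, cost))  # False: sending costs 3.94, withholding 1.28
```

With these penalties the rule sends a point once the estimated probability of a positive exceeds 4/67, roughly 0.06, which is why a high false-negative penalty pushes the sensors toward sending more data.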
Classification modeling • Naïve Bayes classifier (NBC) • A direct application of Bayes’ theorem • P(C|X) = P(X|C)P(C)/P(X), where X is a vector of features x1, x2, ..., xn • We used NBC as the classification modeling technique at the base station
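A minimal Naïve Bayes sketch in plain Python follows. It assumes Gaussian class-conditional densities for each feature, which the slides do not specify, and the function names and toy data are hypothetical. The "naïve" step is that P(X|C) is taken as the product of the per-feature densities.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and each feature's mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def posterior_positive(model, x):
    """Return P(C=1 | x) = P(x | C=1) P(C=1) / P(x), with independent features."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_joint[label] = total
    # Normalize in log space; P(x) is the sum of the joint terms over classes.
    top = max(log_joint.values())
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    return weights.get(1, 0.0) / sum(weights.values())


# Toy training data: class 0 clusters near (0, 0), class 1 near (3, 3).
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [3.0, 3.1], [3.2, 2.9]]
y = [0, 0, 0, 1, 1]
nb = fit_gaussian_nb(X, y)
print(posterior_positive(nb, [3.1, 3.0]) > 0.5)  # True
print(posterior_positive(nb, [0.0, 0.0]) > 0.5)  # False
```

The returned posterior is the quantity a sensor would compare against a cost-derived threshold when deciding whether to report a point.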
Performance evaluation • Simulation • TOSSIM • The network simulation parameters were: packet sizes of 32 bytes (sensor data) and 140 bytes (BS learning model); 100 nodes • PRTools • We used the PRTools data generators [14] and obtained examples using the gendatd and gendatb routines. The dataset contained 1,000,000 examples. Furthermore, we assumed that the probability of a positive example was 0.02 and the probability of a negative example was 0.98.
Results • We assume that c(1,0) ∈ {1,4,16,64} and c(0,1) ∈ {1,4,16,64,256,1024,4096} • In the traditional approach, transmitting all 1,000,000 data points has a total cost of 1,000,000 • In the table, increasing the false negative penalty beyond 1024 resulted in a non-profitable system.
References
[1] F. Zhao and L. Guibas. Wireless Sensor Networks: An Information Processing Approach. Morgan Kaufmann, 2004.
[2] H. Kargupta. Distributed Data Mining for Sensor Networks. PKDD 2004, Tutorial.
[3] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. In Proceedings of the Hawaii Conference on System Sciences, January 2000.
[4] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. An application-specific protocol architecture for wireless microsensor networks. IEEE Transactions on Wireless Communications, 1(4):660-670, 2002.
[5] P. Radivojac, U. Korad, K. M. Sivalingam, and Z. Obradovic. Learning from class-imbalanced data in wireless sensor networks. In 58th IEEE Semiannual Conf. Vehicular Technology Conference (VTC), volume 5, pages 3030-3034, Orlando, FL, October 2003.
[6] S. S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh. Optimal energy aware clustering in sensor networks. Sensors, 2:258-269, 2002.
[7] O. Younis and S. Fahmy. HEED: A hybrid, energy-efficient, distributed clustering approach for ad-hoc sensor networks. IEEE Transactions on Mobile Computing, 3(4), 2004.
[8] W. Chen, J. C. Hou, and L. Sha. Dynamic clustering for acoustic target tracking in wireless sensor networks. IEEE Transactions on Mobile Computing, 3(3):258-271, 2004.
[9] D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, and D. Srivastava. The threshold join algorithm for top-k queries in distributed sensor networks. In DMSN '05: Proceedings of the 2nd International Workshop on Data Management for Sensor Networks, pages 61-66, New York, NY, USA, 2005. ACM Press.
[10] T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Distributed deviation detection in sensor networks. SIGMOD Record, 32(4):77-82, December 2003.
[11] K. Loo, I. Tong, B. Kao, and D. Cheung. Online algorithms for mining inter-stream associations from large sensor networks. In Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2005.
[12] S. Vucetic, D. Pokrajac, H. Xie, and Z. Obradovic. Detection of underrepresented biological sequences using class-conditional distribution models. In Proceedings of the Third SIAM International Conference on Data Mining, May 2003.
[13] University of California, Berkeley. TOSSIM: Simulating TinyOS Networks. http://www.cs.berkeley.edu/~pal/research/tossim.html
[14] ‘PRTools, a Matlab Toolbox for Pattern Recognition,’ http://www.ph.tn.tudelft.nl/prtools, 2002.