Introduction


Presentation Transcript


  1. Control-Based Load Shedding in Data Stream Management

     Yicheng Tu†, Song Liu‡, Sunil Prabhakar†, Bin Yao‡
     †Indiana Center of Database Systems, Department of Computer Sciences, 305 N. University Street, West Lafayette, IN 47907
     ‡School of Mechanical Engineering, 140 S. Intramural Drive, West Lafayette, IN 47907

     • Introduction
       Data Stream Management Systems (DSMSs) process large numbers of data streams to answer user-specified queries. These systems generally follow a query-passive, data-active model: all data are pushed to the database server for processing, and query results are sent to users continuously. Processing delay is critical in a DSMS because query results generated from old data are useless to users. When the system is overloaded, some data tuples must be discarded without processing to keep the processing delay at the desired level. This is called load shedding.
       • Key questions:
         • When?
         • How much?
         • Where?
       • We focus on the first two questions.

     • Objective
       To design and implement a load shedding framework that
       • minimizes data loss;
       • maintains the target processing delay while rejecting disturbances:
         - bursty data arrivals;
         - internal dynamics of the DSMS;
       • is robust, i.e., works for a wide range of input streams.

     • Our approach
       • View load shedding as a feedback control problem
       • Develop a dynamic model for a specific DSMS
       • Design the controller via rigorous control-theoretical methods
       • Work on a real DSMS: the open-source Borealis system

     • Results
       • Obtained a first-order linear model for Borealis
       • Pole placement-based design yielded a PD controller: [equation not captured in the transcript], where c and H are system-specific constants and T is the control period.
       • Identified and solved several DSMS-specific problems
       • Evaluated the control framework with real and synthetic data

     • Conclusions
       1. This is the first database work that uses feedback-control-theoretical methods.
       2. Rigorous system modeling and controller design produce a PD controller that controls average tuple delay by adjusting the amount of load shedding.
       3. The control framework was implemented and evaluated in a real DSMS. Experiments show that the feedback-control-based method significantly improves control of delays while losing the same amount of data as current solutions.
       4. The above solution is also robust.

     • Acknowledgements
       This is joint work with my advisor, Prof. Sunil Prabhakar (sunil@cs.purdue.edu), and with Dr. Song Liu (liu1@purdue.edu) and Prof. Bin Yao (byao@purdue.edu) of the School of Mechanical Engineering at Purdue University. The author would also like to thank Ms. Nesime Tatbul and Prof. Stan Zdonik, both from the Computer Science department of Brown University, for providing the Aurora/Borealis source code.

     • Figures
       Figure 1. Push-based DSMS system model.
       Figure 2. Examples of disturbances in data processing in a DSMS. Top: bursty arrival rates; bottom: unit processing costs.
       Figure 3. The feedback control loop for load shedding. Output (y): average tuple delay; input (u): tuple injection rate to the DSMS; also shown are the target delay value (yr) and the control error (e).
       Figure 4. Performance of our load shedding solution (CTRL); AURORA, an open-loop solution that represents the state of the art in DSMS load shedding; and BASELINE, a naïve feedback-based solution.
       Figure 5. Performance of CTRL relative to AURORA and BASELINE. A, B, C: various aspects of delay violations; D: percentage of data discarded.
       Figure 6. Robustness of CTRL and AURORA tested with input streams of different burstiness (a smaller bias factor represents a more bursty stream).
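
To make the feedback loop of Figure 3 concrete, the following is a minimal sketch of delay-driven load shedding. It is not the poster's controller or its Borealis model, neither of which is given in this transcript. It assumes a toy fluid-queue plant in which delay equals queued tuples divided by processing capacity, and a generic textbook PD law with arbitrary gains. All names (`PDShedder`, `simulate`, `capacity`, `kp`, `kd`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PDShedder:
    """Generic discrete PD law (illustrative gains, not the poster's controller)."""
    target_delay: float        # yr: desired average tuple delay, in seconds
    kp: float = 0.5            # proportional gain (assumed value)
    kd: float = 0.1            # derivative gain (assumed value)
    prev_error: float = 0.0

    def admitted_rate(self, delay: float, capacity: float, period: float) -> float:
        """Return u: the tuple rate (tuples/s) to admit in the next control period."""
        error = self.target_delay - delay                      # e = yr - y
        correction = self.kp * error + self.kd * (error - self.prev_error)
        self.prev_error = error
        # Admit what the engine can process, adjusted by the PD correction
        # (seconds of queued work spread over one period); never negative.
        return max(0.0, capacity * (1.0 + correction / period))


def simulate(arrival_rates, capacity=900.0, period=1.0, target_delay=2.0, control=True):
    """Toy fluid-queue plant (an assumption): delay y = queued tuples / capacity."""
    shedder = PDShedder(target_delay)
    queue = 0.0
    delays, offered, shed = [], 0.0, 0.0
    for rate in arrival_rates:
        delay = queue / capacity                               # measured output y
        limit = shedder.admitted_rate(delay, capacity, period) if control else rate
        admitted = min(rate, limit)                            # excess arrivals are shed
        queue = max(0.0, queue + (admitted - capacity) * period)
        offered += rate * period
        shed += (rate - admitted) * period
        delays.append(queue / capacity)
    shed_fraction = shed / offered if offered else 0.0
    return delays, shed_fraction
```

Under a constant arrival rate of twice the capacity, e.g. `simulate([1800.0] * 60)`, the uncontrolled queue's delay in this toy model grows by one second every period, while the controlled run settles at the 2-second target by shedding just under half of the arrivals. The sketch only addresses the "when" and "how much" questions; which tuples to drop ("where") is left out, as on the poster.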
