
THEMIS: Fairness in Data Stream Processing under Overload

This paper explores the problem of data shedding in data stream processing under overload conditions and proposes a fair shedding mechanism based on the Source Information Content (SIC) metric. It also discusses the challenges of implementing shedding in a distributed setup and explores the possibility of a self-aware system for fair data shedding.


Presentation Transcript


  1. THEMIS: Fairness in Data Stream Processing under Overload. Marco Fiscato (Imperial College London, UK), Evangelia Kalyvianaki (City University London, UK), Theodoros Salonidis (IBM Research, USA), Peter Pietzuch (Imperial College London, UK). Seminar 15041: Model-driven Algorithms and Architectures for Self-Aware Computing Systems, Dagstuhl 2015.

  2. The Puzzle of Big Data Real-Time Processing Engines in Data Centres. Queries overload data centre resources. How can resources be allocated efficiently across clusters and engines?

  3. Data Shedding. A well-known technique to handle transient overload conditions is to discard data. How much data should we shed from queries? How to measure shedding across queries? How to implement shedding in this distributed setup?

  4. How to measure shedding across queries? Shedding data leads to reduced correctness and degraded performance, and dropping different data leads to different degrees of degradation. The Source Information Content (SIC) metric measures the contribution of data from sources to results. [Figure: degraded processing versus perfect processing, 11/6 < 3.] SIC is a data-stream-processing-aware metric. But can we have a metric that is operator- or query-aware?

  5. Fair Shedding for Equalising SIC Values. Each local shedder equalises the SIC values of its own queries. Global coordination is achieved with local informed shedding.

  6. SIC Fair Shedder. To address nodes' heterogeneity and workload variations, an online cost model estimates the time to process an average tuple. Could we build the system to be goal-aware?

  7. A self-aware autonomic system for data processing in real-time. Systems already have (some) adaptation and (some) self-awareness, but could we extend to (full) self-awareness? For example, can we build a self-aware system to perform fair data shedding for data stream processing and databases and filesystems in overload? Thank you! Questions? evangelia.kalyvianaki.1@city.ac.uk http://www.staff.city.ac.uk/~sbbj913/
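
The ideas on slides 5 and 6 (a local shedder that equalises the SIC values of its own queries, guided by an online cost model) can be illustrated with a minimal Python sketch. This is not the THEMIS implementation: the `Batch` structure, the exponentially weighted `CostModel`, the greedy selection rule in `fair_shed`, and all parameter names are assumptions made only for illustration, since the slides do not specify them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Batch:
    """A group of tuples for one query (hypothetical structure)."""
    query: str
    sic: float   # information content carried by this batch
    tuples: int  # number of tuples in the batch


class CostModel:
    """Online estimate of the time to process an average tuple.

    Assumption: an exponentially weighted moving average; the slides
    only say that an online cost model is used.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.per_tuple = None  # unknown until the first observation

    def update(self, elapsed, tuples):
        sample = elapsed / tuples
        if self.per_tuple is None:
            self.per_tuple = sample
        else:
            self.per_tuple = (self.alpha * sample
                              + (1 - self.alpha) * self.per_tuple)

    def estimate(self, tuples):
        # Requires at least one prior call to update().
        return self.per_tuple * tuples


def fair_shed(batches, model, budget):
    """Greedy sketch: keep admitting the next batch of the query whose
    accumulated SIC is lowest, while its estimated cost fits the time
    budget. Batches that are never admitted are the ones shed."""
    pending = {}
    for b in batches:
        pending.setdefault(b.query, deque()).append(b)
    kept_sic = {q: 0.0 for q in pending}
    kept = []
    while True:
        fits = [q for q, bs in pending.items()
                if bs and model.estimate(bs[0].tuples) <= budget]
        if not fits:
            break
        q = min(fits, key=kept_sic.get)  # query with the lowest SIC so far
        b = pending[q].popleft()
        budget -= model.estimate(b.tuples)
        kept.append(b)
        kept_sic[q] += b.sic
    return kept, kept_sic
```

In this sketch, each node would call `model.update` after processing a batch and `fair_shed` whenever the pending input exceeds the available time budget. Because every node only balances the SIC values of the queries it hosts, no central coordinator appears in the code, which mirrors the "local informed shedding" idea on slide 5.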
