
Applying Control Theory to Data Stream Processing Systems





Wei Xu (xuw@cs.berkeley.edu), Bill Kramer, Peter Bodik

Preprocessing Data
• Logs are in different formats
• Information we need may be implicit
• Merge information from various sources
• Sampling
• Sanitize the data

Data stream processing
• Continuous queries
• Using Telegraph CQ
• Preprocessing expressed as SQL queries
• Queries over a sliding time window
• Run multiple instances for scalability

[Figure: system overview. Labels: online service, raw log data, Data Collection, preprocessing, TCQ query Q, TCQ query R, Repository, Sanitized Data, Automatic analysis, Failure Detection, Controlled Data Source, Output Rate Controller.]

Controlled Data Source
Problem: Actual output rate is not the same as the desired rate, for various reasons.
Goal: Provide an accurate data source using feedback control, by controlling the "desired data rate" setting on the output thread.

[Plot: Output Y from simulation.]

Queue Length Control
• Problem: TCQ drops tuples when the result queue is full
• Goal of control:
  • By controlling the data rate to the TCQ node
  • Regulate queue length on the TCQ node
  • Prevent dropping tuples
  • Maximize throughput (and adapt when disturbances happen)

[Figure: queue length feedback loop. Labels: Source, Controlled Output Thread (Code Reuse), Queue Length Controller, Queue Length Monitor, Buffer, TCQ, Result Q, Desired Queue Length, Actual Queue Length, Data Rate to TCQ.]

[Plots: P Controller with Pre-compensation; PI Controller.]

Scalable Software Architecture for Data Stream Processing

[Figure labels: load splitter, SLT 1, SLT 2, combiner.]

If not careful with feedback control …
• The system can become unstable under normal load
• Control theory analysis helps make the correct design

Automatic analysis
• Feature Selection
• Clustering
• Visualization
• See poster: Clustering DNS Problems

Decision Trees

Client write duration is an outlier:

bytes-served <= 67958
|   R_error-code = yes
|   |   R_content-type = yes: true (253/6)
|   |   R_content-type != yes: false (17)
|   R_error-code != yes
|   |   gmt = 2003-06-24 00:01:07: true (54)
|   |   gmt != 2003-06-24 00:01:07
|   |   |   user-id = 96848766314153157: true (99/6)
|   |   |   user-id != 96848766314153157
|   |   |   |   gmt = 2003-06-24 02:23:28: true (45)
|   |   |   |   gmt != 2003-06-24 02:23:28
|   |   |   |   |   visit-url = 8227...: true (43)
|   |   |   |   |   visit-url != 8227...: false (18005)
bytes-served > 67958: true (17733/55)

Error Code:

bytes-served <= 195: 145 (135/9)
bytes-served > 195
|   R_content-len = yes: 32 (98)
|   R_content-len != yes
|   |   R_not-cached-reason = yes: 32 (45/19)
|   |   R_not-cached-reason != yes
|   |   |   duration <= 15.2
|   |   |   |   bytes-received <= 2680: -13 (39)
|   |   |   |   bytes-received > 2680
|   |   |   |   |   bytes-received <= 2805: 131 (30/7)
|   |   |   |   |   bytes-received > 2805: -13 (85/13)
|   |   |   duration > 15.2: 131 (69/6)
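
The queue length control goal stated in the poster (regulate the queue length on the TCQ node by adjusting the data rate sent to it) can be illustrated with a small discrete-time simulation. The following is a minimal sketch, not the authors' implementation: the queue is modeled as a simple integrator, and the names PIController and simulate, the gains kp and ki, the capacity, and the service rates are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller; gains and rate limits are illustrative assumptions."""
    kp: float
    ki: float
    min_rate: float = 0.0
    max_rate: float = 1000.0
    integral: float = 0.0

    def update(self, desired_len: float, actual_len: float, dt: float = 1.0) -> float:
        """Return the data rate to send, given desired and measured queue length."""
        error = desired_len - actual_len
        candidate = self.integral + error * dt
        rate = self.kp * error + self.ki * candidate
        # Simple anti-windup: accumulate the integral only while unsaturated.
        if self.min_rate <= rate <= self.max_rate:
            self.integral = candidate
        return min(max(rate, self.min_rate), self.max_rate)


def simulate(steps=200, capacity=100.0, desired_len=50.0,
             service_rate=40.0, kp=0.5, ki=0.2):
    """Simulate a bounded queue fed at the controller's rate.

    Hypothetical plant model: queue[k+1] = queue[k] + rate[k] - drain[k].
    The drain rate halves at the midpoint to act as a disturbance.
    Returns the queue-length history and the total number of dropped tuples.
    """
    controller = PIController(kp=kp, ki=ki)
    queue_len = 0.0
    dropped = 0.0
    history = []
    for step in range(steps):
        drain = service_rate if step < steps // 2 else service_rate / 2
        rate = controller.update(desired_len, queue_len)
        queue_len = max(queue_len + rate - drain, 0.0)
        if queue_len > capacity:
            # A full result queue drops the excess tuples.
            dropped += queue_len - capacity
            queue_len = capacity
        history.append(queue_len)
    return history, dropped


if __name__ == "__main__":
    history, dropped = simulate()
    print(f"queue length before disturbance: {history[99]:.2f}")
    print(f"final queue length: {history[-1]:.2f}, dropped tuples: {dropped:.0f}")
```

For this toy model the closed-loop characteristic polynomial is z^2 - (2 - kp - ki) z + (1 - kp), so the assumed gains kp = 0.5 and ki = 0.2 place both poles inside the unit circle. Gains outside the stable region make the same loop oscillate or diverge, which is the kind of design error the poster warns about.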
