Twitter Frenzy
FPGA Data Stream Processing
Cory Kleinheksel (Team Leader), Tim Meyer, David Graziano, Josh Clausman
Project Idea
• Twitter Frenzy: a way to filter tweets as a set of frequencies, using an FPGA to perform packet analysis.
• Accelerate the stream processing of Twitter data queries.
• Specifically, accelerate computationally intensive, long-lived queries over short-lived data.
• The design and implementation of a frequency-based query is the primary focus (an interesting application of signal processing).
Details
• Input: live (or simulated) Twitter stream data
  • A Java program simulates the Twitter feed by reading from a dataset
• Processing:
  • Extract tweets from the input stream
  • Filter tweets based on query parameters
    • Text matching
  • Determine tweet frequency components
    • Frequency analysis
  • Apply signal filter (signal processing)
• Output: tweets matching the filter
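The processing stages above can be sketched in software as follows. This is only an illustrative sketch, not the project's simulator: the field names `ts` and `text`, the substring match, and the fixed-size time windows are all assumptions made for the example.

```python
import json
from collections import Counter

def extract_tweets(lines):
    """Yield (timestamp, text) pairs from newline-delimited JSON records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "ts" (seconds) and "text" are assumed field names for this sketch.
        yield record["ts"], record["text"]

def matches_query(text, keyword):
    """Case-insensitive substring match, standing in for the text-matching stage."""
    return keyword.lower() in text.lower()

def window_counts(tweets, keyword, window_seconds):
    """Count matching tweets per fixed-size time window."""
    counts = Counter()
    for ts, text in tweets:
        if matches_query(text, keyword):
            counts[ts // window_seconds] += 1
    return counts

# Hypothetical sample records, not taken from the project's dataset.
sample = [
    '{"ts": 0, "text": "FPGA boards are fun"}',
    '{"ts": 3, "text": "lunch time"}',
    '{"ts": 7, "text": "new fpga toolchain"}',
    '{"ts": 12, "text": "FPGA again"}',
]
counts = window_counts(extract_tweets(sample), "fpga", 10)
```

The per-window counts form the discrete signal that a later frequency-analysis or filtering stage would operate on.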
Design Issues
• Acquiring data from Twitter at a useful speed
• Determining packet usefulness (send/drop) efficiently
• Managing concurrently arriving packets and multi-fragment packets
• Calculating frequency and filtering the corresponding packets
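One way to think about concurrently arriving, multi-fragment packets is a reassembly buffer keyed by tweet. The sketch below is a software illustration under assumed conditions: each fragment is assumed to carry a tweet identifier, its index, and the total fragment count. The class name `FragmentBuffer` and these fields are hypothetical and do not describe the hardware design.

```python
from collections import defaultdict

class FragmentBuffer:
    """Collects fragments per tweet and releases a tweet once it is complete."""

    def __init__(self):
        # tweet_id -> {fragment index: payload}
        self._parts = defaultdict(dict)

    def add(self, tweet_id, index, total, payload):
        """Store one fragment; return the full text when all parts have arrived, else None."""
        parts = self._parts[tweet_id]
        parts[index] = payload
        if len(parts) < total:
            return None
        # All fragments present: free the buffer and join them in index order.
        del self._parts[tweet_id]
        return "".join(parts[i] for i in range(total))

buf = FragmentBuffer()
first = buf.add("a", 0, 2, "hel")     # tweet "a" is incomplete
solo = buf.add("b", 0, 1, "solo")     # single-fragment tweet completes at once
second = buf.add("a", 1, 2, "lo")     # tweet "a" completes
```

Because fragments are stored by index, interleaved and out-of-order arrivals are handled the same way as in-order ones.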
Implementation Issues
• How to properly buffer and send fragmented tweets
• Time (clock cycles) needed to perform frequency calculations
• Time to perform hashing
  • Created a lookup-table-based hashing block
• Modules consuming data at different rates
• Debugging hardware
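The slides do not say which lookup-table hashing scheme was used. As an illustration of the general idea, the sketch below uses Pearson hashing, a well-known table-driven 8-bit hash that needs only one table lookup and one XOR per input byte. The table contents and seed here are arbitrary assumptions.

```python
import random

def make_table(seed=0):
    """Build a 256-entry permutation table (the contents are an arbitrary choice)."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    return table

def pearson_hash(data: bytes, table) -> int:
    """Pearson hash: one table lookup and one XOR per input byte, 8-bit result."""
    h = 0
    for byte in data:
        h = table[h ^ byte]
    return h

TABLE = make_table()
digest = pearson_hash(b"fpga", TABLE)
```

A scheme of this shape maps naturally to hardware, since the table can sit in a small ROM or block RAM and each byte costs a fixed number of cycles.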
Algorithms
• Hashing
• String matching
• Frequency analysis
• Filtering (FIR)
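For the FIR filtering step, a finite impulse response filter computes each output as a weighted sum of the most recent inputs, y[n] = sum over k of h[k] * x[n-k]. The sketch below applies a two-tap moving average to a sequence of per-window tweet counts. The tap values and the sample counts are assumptions chosen for illustration, not the project's filter coefficients.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # inputs before the start of the signal count as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out

# Hypothetical per-window tweet counts for one query term.
counts = [0, 4, 4, 4, 0, 0]
smoothed = fir_filter(counts, [0.5, 0.5])
```

With these taps the filter averages each count with its predecessor, which smooths the edges of the burst in the sample data.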
Project Results
• Analyzed the problem
• Implemented a full simulator in software
• Implemented the design in VHDL
• Simulated in ModelSim
• Tested on hardware and confirmed results against the software implementation
• Dataset: JSON_29493.txt
  • Processed 29493 tweets
  • 192 passed the string filter
  • 133 passed the frequency filter
References
• Berinde, Indyk, Cormode, Strauss. "Space-optimal Heavy Hitters with Strong Error Bounds"
• Cormode, Korn, Tirthapura. "Time-Decaying Aggregates in Out-of-order Streams"
• Charikar, Chen, Farach-Colton. "Finding Frequent Items in Data Streams"