

Highly Scalable Platform for Data Streaming
Vincenzo Massimiliano Gulisano. Prof: Ricardo Jimenez-Peris. {vgulisano, rjimenez}@fi.upm.es



Presentation Transcript

  1. Highly Scalable Platform for Data Streaming
     Vincenzo Massimiliano Gulisano. Prof: Ricardo Jimenez-Peris. {vgulisano, rjimenez}@fi.upm.es

     Background & Challenges
     Data Streaming: a paradigm for querying continuous, massive flows of data in real time. Queries are composed of operators (filters, maps, aggregates, joins) applied to continuous flows. They manipulate incoming data and extract the desired information. Possible applications include sensor networks, network traffic analysis, and real-time financial applications.
     SOTA: scales with the number of queries but not with the data stream volume. Each operator is allocated to a single node, so system capacity is limited to the volume that node can process.
     Challenge: scale to massive data streams by allocating individual operators to computing clouds.

     Research
     How To:
     - "Cloudify" operators.
     - Interconnect cloudified operators scalably and consistently.
     - Provide elastic computing for massive data streams.
     [Figure: query diagram with the labels Data, Operator 1, Operator 2, Operator 3]

     Outcome
     Data Stream Cloud Platform:
     - Highly Scalable: able to use tens of sites for each individual operator.
     - Elastic: self-provisioning and decommissioning of nodes as needed, depending on the workload.
     - Self-Optimizing: non-intrusive dynamic load balancing; dynamic optimization of the cloudification of full queries.
     - Self-Healing: tolerates failures without losing data.

     Current Status
     • Cloudification of:
       - Stateless Operators (Map, Filter, Union)
       - Stateful Operators (Join, Equi-Join, Aggregate)
     • Distributed Load Injector.
     • Distributed Load Monitor.
     • First prototype for Self-Provisioning.
     • Compiler from Abstract Queries to Cloudified Concrete Deployments.
     • Preliminary evaluation showing close to linear scalability.

     Conclusions & Next Steps
     • Preliminary evaluation demonstrates that our vision is feasible.
     • Our approach enables scaling two orders of magnitude beyond the current SOTA.
     • Next steps will focus on full implementation of elasticity and other essential autonomic capabilities.
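The poster does not say how an operator is split across nodes. As a rough illustration only, the Python sketch below composes a query from filter, map, and aggregate operators, then spreads the stateful aggregate over several instances by hashing each tuple's key. Hash partitioning is a common technique assumed here for illustration; it is not confirmed as the authors' method. All names (`filter_op`, `map_op`, `aggregate_op`, `partition`, `cloudified_aggregate`, `READINGS`) are hypothetical, and the finite list stands in for an unbounded stream.

```python
import zlib
from collections import defaultdict


def filter_op(stream, predicate):
    """Stateless filter: forward only tuples that satisfy the predicate."""
    return (t for t in stream if predicate(t))


def map_op(stream, fn):
    """Stateless map: transform each tuple independently."""
    return (fn(t) for t in stream)


def aggregate_op(stream):
    """Stateful aggregate: running sum of values per key."""
    totals = defaultdict(int)
    for key, value in stream:
        totals[key] += value
    return dict(totals)


def partition(stream, num_instances):
    """Route each (key, value) tuple to one instance by hashing its key.

    Tuples with the same key always reach the same instance, so per-key
    state never has to be shared between instances (an assumption of this
    sketch, not a detail taken from the poster).
    """
    buckets = [[] for _ in range(num_instances)]
    for key, value in stream:
        # crc32 is deterministic across runs, unlike the built-in hash() on str.
        idx = zlib.crc32(key.encode("utf-8")) % num_instances
        buckets[idx].append((key, value))
    return buckets


def cloudified_aggregate(stream, num_instances):
    """Run one aggregate instance per bucket and merge the partial results."""
    merged = {}
    for bucket in partition(stream, num_instances):
        merged.update(aggregate_op(bucket))  # key sets are disjoint by construction
    return merged


def query(readings):
    """Filter out invalid readings, then map each one to a (sensor, value) pair."""
    valid = filter_op(readings, lambda r: r["value"] >= 0)
    return map_op(valid, lambda r: (r["sensor"], r["value"]))


# Hypothetical sensor readings; the negative value is dropped by the filter.
READINGS = [
    {"sensor": "s1", "value": 3},
    {"sensor": "s2", "value": 5},
    {"sensor": "s1", "value": -1},
    {"sensor": "s3", "value": 2},
    {"sensor": "s1", "value": 4},
]

single = aggregate_op(query(READINGS))
parallel = cloudified_aggregate(query(READINGS), num_instances=3)
assert single == parallel
```

In this sketch the stateless operators could be replicated freely, while the stateful aggregate depends on the key-based routing to stay correct. A real deployment would also need windowing, ordering, and failure handling, none of which is modelled here.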
LSD: Distributed Systems Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Spain
