
StreamCloud: an Elastic Parallel-Distributed Stream Processing Engine

Lsd, Distributed Systems Laboratory. StreamCloud: an Elastic Parallel-Distributed Stream Processing Engine. Ph.D. student: Vincenzo Massimiliano Gulisano. Director: Ricardo Jiménez Peris. Co-director: Patrick Valduriez. December 20, 2012. StreamCloud in a Nutshell.


Presentation Transcript


  1. Lsd, Distributed Systems Laboratory. StreamCloud: an Elastic Parallel-Distributed Stream Processing Engine. Ph.D. student: Vincenzo Massimiliano Gulisano. Director: Ricardo Jiménez Peris. Co-director: Patrick Valduriez. December 20, 2012

  2. StreamCloud in a Nutshell • Sample data streaming application (fraud detection) • Pioneer Stream Processing Engines (SPEs) • Contributions

  3. Motivation - Fraud detection

  4. Motivation - Fraud detection

  5. Motivation - Fraud detection

  6. Motivation - Fraud detection

  7. Background - Pioneer SPEs Centralized SPE

  8. Background - Pioneer SPEs Centralized SPE 100% CPU

  9. Background - Pioneer SPEs Distributed SPE

  10. Background - Pioneer SPEs Distributed SPE

  11. Background - Pioneer SPEs Distributed SPE

  12. Background - Pioneer SPEs Distributed SPE 100% CPU

  13. Contributions - StreamCloud

  14. Contributions - StreamCloud

  15. Contributions - StreamCloud: 1. Parallelization

  16. Contributions - StreamCloud

  17. Contributions - StreamCloud

  18. Contributions - StreamCloud

  19. Contributions - StreamCloud: 2. Elasticity

  20. Contributions - StreamCloud

  21. Contributions - StreamCloud

  22. Contributions - StreamCloud

  23. Contributions - StreamCloud: 3. Fault Tolerance

  24. Contributions - StreamCloud

  25. Contributions - StreamCloud

  26. Contributions - StreamCloud: 4. Integrated Development Environment

  27. Agenda • Introduction • Motivation • System Model • Parallelization • Elasticity • Fault tolerance • Integrated Development Environment • Conclusions

  28. Introduction Parallelization Elasticity Fault Tolerance IDE Conclusions Motivation • Financial applications, sensor network monitoring, … require: • Continuous processing of data streams • Real-time processing • Store-then-process is not feasible: • High-speed networks: nanoseconds to handle a packet • ISP router: gigabytes of headers every hour, … • Data streaming: • In memory • Bounded resources • Efficient one-pass analysis

  29. System Model • Data stream: unbounded sequence of tuples • Example: Call Description Record (CDR) [Figure: tuples along a time axis]

  30. System Model • Operators (OP): • Stateless: 1 input tuple → 1 output tuple • Stateful: 1+ input tuple(s) → 1 output tuple

  31. System Model • Operators (OP): • Stateless: 1 input tuple → 1 output tuple • Stateful: 1+ input tuple(s) → 1 output tuple • Continuous query: graph of operators/streams • Example: Map (convert € to $), Filter (only > 10$), Agg (count calls made by each caller number)
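
The example query on slide 31 can be sketched as a chain of Python generators. This is only an illustration, not StreamCloud code: the `CDR` fields, the exchange rate `EUR_TO_USD`, and the function names are assumptions made for the sketch, and the aggregate runs over a finite input rather than a window.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class CDR(NamedTuple):
    """Hypothetical call record: who called and what the call cost in euros."""
    caller: str
    price_eur: float


EUR_TO_USD = 1.25  # assumed exchange rate, for illustration only


def map_to_usd(stream: Iterable[CDR], rate: float = EUR_TO_USD) -> Iterator[Tuple[str, float]]:
    """Stateless Map: one input tuple produces one output tuple."""
    for cdr in stream:
        yield (cdr.caller, cdr.price_eur * rate)


def filter_above(stream: Iterable[Tuple[str, float]], threshold: float = 10.0) -> Iterator[Tuple[str, float]]:
    """Stateless Filter: forwards only tuples whose price exceeds the threshold."""
    for caller, price_usd in stream:
        if price_usd > threshold:
            yield (caller, price_usd)


def count_by_caller(stream: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Stateful Aggregate: keeps one counter per caller across many input tuples."""
    counts: Counter = Counter()
    for caller, _price in stream:
        counts[caller] += 1
    return dict(counts)


if __name__ == "__main__":
    cdrs = [CDR("A", 10.0), CDR("A", 5.0), CDR("B", 20.0), CDR("A", 9.0)]
    print(count_by_caller(filter_above(map_to_usd(cdrs))))
```

Map and Filter hold no state between tuples, while the aggregate must remember a counter per caller; that difference is what makes stateful operators harder to parallelize.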

  32. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows: [8:00,9:00), [8:20,9:20), [8:40,9:40)

  33. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows • Window: [8:00,9:00) • Tuples: 8:05 • Counter: 1

  34. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows • Window: [8:00,9:00) • Tuples: 8:05, 8:15 • Counter: 2

  35. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows • Window: [8:00,9:00) • Tuples: 8:05, 8:15, 8:22 • Counter: 3

  36. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows • Window: [8:00,9:00) • Tuples: 8:05, 8:15, 8:22, 8:45 • Counter: 4

  37. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows • Window: [8:00,9:00) • Tuples: 8:05, 8:15, 8:22, 8:45, 9:05 • Counter: 4 • Output: 4

  38. System Model • Infinite sequence of tuples / bounded memory → windows • Example: 1-hour windows • Window: [8:20,9:20) • Tuples: 8:05, 8:15, 8:22, 8:45, 9:05 • Counter: 3
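
The window walk-through on slides 32 to 38 can be reproduced with a small time-based sliding-window counter. This is a minimal sketch under stated assumptions: timestamps are minutes since midnight and arrive in non-decreasing order, the 20-minute advance is read off the window boundaries on slide 32, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def minutes(hhmm: str) -> int:
    """Convert 'H:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


class SlidingWindowCounter:
    """Counts tuples per window of `size` minutes, sliding by `advance` minutes."""

    def __init__(self, size: int, advance: int, start: int) -> None:
        self.size = size
        self.advance = advance
        self.start = start                      # left edge of the open window
        self.timestamps: Deque[int] = deque()   # tuples still inside the window

    def insert(self, ts: int) -> List[Tuple[int, int, int]]:
        """Add a tuple; return (window_start, window_end, count) for each window it closes."""
        outputs: List[Tuple[int, int, int]] = []
        # A tuple at or past the right edge closes the current window.
        while ts >= self.start + self.size:
            outputs.append((self.start, self.start + self.size, len(self.timestamps)))
            self.start += self.advance
            # Evict tuples that fall before the new left edge.
            while self.timestamps and self.timestamps[0] < self.start:
                self.timestamps.popleft()
        self.timestamps.append(ts)
        return outputs


if __name__ == "__main__":
    counter = SlidingWindowCounter(size=60, advance=20, start=minutes("8:00"))
    for t in ["8:05", "8:15", "8:22", "8:45", "9:05"]:
        for start, end, count in counter.insert(minutes(t)):
            print(f"window [{start}, {end}) -> output {count}")
        print(f"{t}: counter = {len(counter.timestamps)}")
```

With the slide's tuples, the counter reaches 4 for [8:00,9:00); the 9:05 tuple closes that window with output 4, and the window [8:20,9:20) then holds 3 tuples (8:22, 8:45, 9:05). Memory stays bounded because tuples are evicted as the window slides.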

  39. Agenda • Introduction • Motivation • System Model • Parallelization • Elasticity • Fault tolerance • Integrated Development Environment • Conclusions

  40. StreamCloud - Parallelization • Building blocks: • Parallelization of data streaming operators • Parallelization and distribution strategy

  41. StreamCloud - Parallelization • General approach [Figure: operators OPA and OPB]

  42. StreamCloud - Parallelization • General approach • LB: Load Balancer • IM: Input Merger [Figure: OPA and OPB; OPA instances with IM and LB on Node 1 … Node m]

  43. StreamCloud - Parallelization • General approach • LB: Load Balancer • IM: Input Merger [Figure: Subcluster A with OPA instances, each with IM and LB, on Node 1 … Node m; OPB]

  44. StreamCloud - Parallelization • General approach • LB: Load Balancer • IM: Input Merger [Figure: Subcluster A with OPA instances on Node 1 … Node m; Subcluster B with OPB instances on Node 1 … Node n; each instance with IM and LB]

  45. StreamCloud - Parallelization • General approach • LB: Load Balancer • IM: Input Merger [Figure: Subcluster A with OPA instances on Node 1 … Node m; Subcluster B with OPB instances on Node 1 … Node n; each instance with IM and LB]

  46. StreamCloud - Parallelization • Stateful operators: semantic awareness • Aggregate: count within last hour, group by caller number [Figure: LBs of the previous subcluster feeding the IMs of Agg1, Agg2, Agg3; tuples of Caller A]

  47. StreamCloud - Parallelization • Stateful operators: semantic awareness • Aggregate: count within last hour, group by caller number [Figure: LBs of the previous subcluster feeding the IMs of Agg1, Agg2, Agg3; tuples of Caller A]

  48. StreamCloud - Parallelization • Depending on the stateful operator semantics: • Partition input stream into buckets • Each bucket is processed by 1 node • # buckets >> # nodes

  49. StreamCloud - Parallelization • Depending on the stateful operator semantics: • Partition input stream into buckets • Each bucket is processed by 1 node • # buckets >> # nodes [Figure: keys domain split into buckets A to F, assigned to Agg1, Agg2, Agg3]

  50. StreamCloud - Parallelization • Depending on the stateful operator semantics: • Partition input stream into buckets • Each bucket is processed by 1 node • # buckets >> # nodes [Figure: keys domain split into buckets A to F, assigned to Agg1, Agg2, Agg3]
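
The bucket scheme on slides 48 to 50 can be sketched as a two-step routing function: key to bucket, then bucket to node. This is an illustration under assumptions, not StreamCloud's actual load balancer: the CRC32 hash, the bucket count of 64, the round-robin bucket assignment, and all function names are choices made for the sketch.

```python
import zlib
from typing import Dict, List

NUM_BUCKETS = 64  # assumed value; the slides only require # buckets >> # nodes


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a group-by key (e.g. a caller number) to a bucket with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def assign_buckets(num_buckets: int, nodes: List[str]) -> Dict[int, str]:
    """Assign every bucket to exactly one node (round-robin, for illustration)."""
    return {bucket: nodes[bucket % len(nodes)] for bucket in range(num_buckets)}


def route(key: str, bucket_to_node: Dict[int, str], num_buckets: int = NUM_BUCKETS) -> str:
    """Return the node that must process every tuple carrying this key."""
    return bucket_to_node[bucket_of(key, num_buckets)]


if __name__ == "__main__":
    nodes = ["Agg1", "Agg2", "Agg3"]
    bucket_to_node = assign_buckets(NUM_BUCKETS, nodes)
    for caller in ["A", "B", "C"]:
        print(caller, "->", route(caller, bucket_to_node))
```

Because the key-to-bucket step is deterministic, all tuples of one caller reach the same aggregate instance, which is what a group-by count needs. Keeping many more buckets than nodes means load can be shifted by reassigning individual buckets in `bucket_to_node` without touching the hash function.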
