
StreamScope, S-Store

StreamScope is a streaming computation engine designed for continuous, reliable, distributed processing of big data streams. It offers abstractions for creating, debugging, and understanding data computation engines, while providing strong guarantees in the face of failures and variations. Its key contributions include the rVertex and rStream abstractions, which simplify programming and improve scalability. StreamScope has been deployed in production at Microsoft, where it runs business-critical applications under heavy load. It supports three failure recovery strategies: checkpoint-based, replay-based, and replication-based. The system was evaluated on a fraud detection application, demonstrating that it scales and handles high volumes of data. The paper provides no comparison with other streaming systems, and it does not mention any plans to open-source the system or offer it as a platform-as-a-service. One significant restriction is that applications must be deterministic.

jmilagros

Presentation Transcript


  1. StreamScope, S-Store Akshun Gupta, Karthik Bala

  2. What is Stream Processing? “Stream processing is designed to analyze and act on real-time streaming data, using ‘continuous queries’” • InfoQ - Stream Processing Difference from Batch Processing: “Ability to process potentially infinite input events continuously with delays in seconds and minutes, rather than processing a static data set in hours and days.” • StreamS paper

  3. Applications of Stream Processing • Twitter uses stream processing to show trending tweets • Algorithmic Trading or High Frequency Trading • Surveillance using sensors • Realtime Analytics • And many more!

  4. Stream Processing: Challenges • Continuous infinite amounts of data • Need to deal with failures and planned maintenance • Latency sensitive • Need for high throughput All of this makes stream applications hard to develop, debug, and deploy!

  5. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams Microsoft Research Presented by Akshun Gupta

  6. StreamScope - General Information • Paper came out of Microsoft Research • Has been deployed in a shared 20k server production cluster at Microsoft • Runs Microsoft’s core online advertisement service - created to handle business critical applications - supposed to give strong guarantees

  7. Motivation Want to design a streaming computation engine to • Execute an event exactly once with server failures and message loss. • Handle large amounts of load • Scale well • Travel back in time • Continue operation during maintenance • Make distributed streaming programming easy

  8. Key Contributions • StreamS shows that a streaming computation engine does not need to unnaturally convert streaming computation into a series of mini-batch jobs (as, e.g., Apache Spark does). • Introduction of abstractions, rVertex and rStream, to simplify creating, debugging, and understanding data computation engines. • Proven system - deployed in production running business critical applications while coping with failures and variations.

  9. StreamS Abstractions - DAG • Execution of program modeled as a DAG • Vertex performs local computation • Instreams and OutStreams

  10. StreamS Abstractions - rStream • Abstraction to decouple upstream and downstream vertices with failure recovery mechanisms. • Maintains sequence of events and sequence numbers. • Provides API calls Write, Read, GarbageCollect • Maintains the following properties: • Uniqueness: Unique value for each sequence number • Validity: If a Read happens for seq, a Write for seq is guaranteed to have happened • Reliability: For any Write(seq, e), Read(seq) will return e
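
The three rStream properties on this slide can be illustrated with a toy in-memory stand-in. This is only a sketch: the class name, method names, and the exact meaning given to GarbageCollect (discard everything below a sequence number) are assumptions for illustration, not StreamScope's actual API, and a real rStream would be persistent and distributed.

```python
class RStream:
    """Toy in-memory stand-in for an rStream (hypothetical, not StreamScope's API)."""

    def __init__(self):
        self._events = {}    # sequence number -> event
        self._gc_point = 0   # assumed semantics: every seq below this was discarded

    def write(self, seq, event):
        # Writes below the garbage-collection point are ignored: nobody needs them.
        if seq < self._gc_point:
            return
        # Uniqueness: a sequence number maps to exactly one value.
        # Re-writing the same value (e.g. by a restarted vertex) is a no-op.
        if seq in self._events:
            if self._events[seq] != event:
                raise ValueError(f"conflicting write for seq {seq}")
            return
        self._events[seq] = event

    def read(self, seq):
        # Validity: a read succeeds only if a write for seq has happened.
        # Reliability: the value returned is exactly the one that was written.
        if seq not in self._events:
            raise KeyError(f"no event available for seq {seq}")
        return self._events[seq]

    def garbage_collect(self, seq):
        # Drop all events with a sequence number below seq.
        for s in [s for s in self._events if s < seq]:
            del self._events[s]
        self._gc_point = max(self._gc_point, seq)
```

Making duplicate writes idempotent is what lets an upstream vertex be restarted and re-emit output without the downstream vertex seeing anything different, which is the decoupling the slide refers to.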

  11. StreamS Abstractions - rVertex • Vertex can save state with snapshots • If Vertex fails, it can be restarted with Load(s). s is a saved snapshot. • rVertex guarantees determinism • Running Execute() on the same snapshot will produce the same result • Determinism ensures correctness. • Requires user defined functions to behave deterministically
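
A small sketch of why determinism makes snapshot-based restart correct. The vertex below (a per-key counter) and its method names are hypothetical and only mirror the Load/Execute vocabulary of the slide; they are not the paper's interface.

```python
import copy


class CountingVertex:
    """Toy deterministic vertex that counts events per key (hypothetical example)."""

    def __init__(self):
        self.counts = {}
        self.next_seq = 0  # index of the next input event to process

    def execute(self, event):
        # Deterministic: the output depends only on the current state and the event.
        self.counts[event] = self.counts.get(event, 0) + 1
        self.next_seq += 1
        return (event, self.counts[event])

    def snapshot(self):
        # Capture everything needed to resume: state plus input position.
        return copy.deepcopy({"counts": self.counts, "next_seq": self.next_seq})

    def load(self, s):
        self.counts = copy.deepcopy(s["counts"])
        self.next_seq = s["next_seq"]


events = ["a", "b", "a", "c", "a"]

# Original run: snapshot after two events, then keep going.
v = CountingVertex()
for e in events[:2]:
    v.execute(e)
snap = v.snapshot()
original_tail = [v.execute(e) for e in events[2:]]

# Simulated failure: a fresh vertex loads the snapshot and re-reads its input.
r = CountingVertex()
r.load(snap)
recovered_tail = [r.execute(e) for e in events[r.next_seq:]]

print(original_tail == recovered_tail)  # the restarted vertex reproduces the same output
```

If `execute` read a clock or a random number, the two tails could differ, which is why the slide requires user-defined functions to behave deterministically.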

  12. Architecture

  13. Failure Recovery Strategies • Checkpoint-based recovery: not performant when vertices hold large internal state • Replay-based recovery: rebuilds state by replaying the most recent window of input (e.g., 5 minutes); the deterministic execution property comes in handy; might have to reload a large window, but doesn't have to checkpoint as frequently • Replication-based recovery: multiple instances of the same vertex can run at the same time; determinism ensures that instances of the same vertex on different machines produce the same output; overhead of extra resources
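
Replay-based recovery works when a vertex's state depends only on a bounded recent window of input. The sketch below assumes a hypothetical vertex whose state is a per-key count over the last 5 minutes; the function names and the (timestamp, key) log format are illustrative assumptions, not taken from the paper.

```python
WINDOW = 300  # seconds; the slide's "5 minutes" example


def window_counts(events, now, window=WINDOW):
    """Count events per key whose timestamp lies in (now - window, now]."""
    counts = {}
    for ts, key in events:
        if now - window < ts <= now:
            counts[key] = counts.get(key, 0) + 1
    return counts


def recover_by_replay(log, now, window=WINDOW):
    """Rebuild the state from the input log alone, reading only the last window."""
    recent = [(ts, key) for ts, key in log if ts > now - window]
    return window_counts(recent, now, window)


log = [(0, "a"), (100, "a"), (350, "a"), (400, "b")]
live_state = window_counts(log, now=400)        # state the vertex held before failing
rebuilt_state = recover_by_replay(log, now=400)  # state rebuilt without any checkpoint
print(live_state == rebuilt_state)
```

The trade-off on the slide shows up directly: no checkpoint has to be written, but recovery must re-read a whole window of input, which can be large.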

  14. Evaluation • Detect fraud clicks of online transactions • 3220 Vertices • 9.5 TB of events processed • 180 TB I/O • 21.3 TB aggregate memory usage • 7 day evaluation period

  15. Evaluation - Failure Impact on Latency* A: Failed machines had high in-memory state → Latency increased for small number of failures B: Large number of failures but vertices did not have high in-memory state C: Unscheduled mass outage of machines → significant increase in latency D: scheduled maintenance → graceful transition and no significant increase in latency *End-to-end latency

  16. Evaluation - Scalability X Axis: Degree of Parallelism Y Axis: Maximum throughput sustained under a 1-second latency bound.

  17. Comparing Failure Recovery Strategies • No effect on latency when using the replication strategy • Replay causes a longer latency delay than checkpointing because, in the common case, the state in a checkpoint is more condensed • The company uses replay-based recovery for 25%; the others use checkpointing

  18. Comments • The paper does not compare its streaming system with other streaming systems like Spark, Storm, etc. • No outlook is given on whether this system will be provided as PaaS or on any plan to make it open source. • The restriction to deterministic applications is significant

  19. Key Takeaways • Introduction of abstractions rStream and rVertex • A new way to design streaming systems • Decoupling upstream and downstream vertices • Valuable engineering advice • Good comparison between failover strategies • Checkpointing • Replay Based • Replication Based • Proven system under production load • Business critical application • 20k+ nodes used • Scaling is robust

  20. S-Store Presented by Karthik Bala

  21. Streaming Meets Transaction Processing • Streaming: handle large amounts of data, but... • Transaction Processing: ACID guarantees, but... Challenge: Build a streaming system which provides shared mutable state

  22. Guarantees Transactions are stored procedures with input parameters - “Recall that it is the data that is sent to the query in streaming systems in contrast to the standard DBMS model of sending the query to the data” OLTP Transaction - can access public tables, “pull based” Streaming transaction - can access public tables, windows, streams, “push based”

  23. Contributions • Start with traditional OLTP database system (H-Store) and add streaming transactions • streams and windows represented as time-varying state • triggers to enable push-based processing over such state • a streaming scheduler that ensures correct transaction ordering • a variant on H-Store’s recovery scheme that ensures exactly-once processing for streams

  24. Transaction Execution s: stream b: atomic batch w: window (difference?) T: transaction

  25. Transaction Execution • ACID: Wait till T commits to make its writes public • Valid orderings? • For an ordering to be correct • Must follow the topological ordering of the dataflow graph (relaxed if graph has multiple orderings) • All batches must be processed in order
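
The two ordering rules on this slide can be written as a small checker. This is a sketch under assumptions: a schedule is modeled as a list of (transaction, batch) executions in commit order, the dataflow graph as a set of (upstream, downstream) pairs, and the function name is hypothetical. It is not S-Store's scheduler.

```python
def is_valid_schedule(schedule, edges):
    """Check a schedule of (transaction, batch) executions against both rules.

    schedule: list of (transaction, batch) pairs in commit order.
    edges: set of (upstream, downstream) transaction pairs in the dataflow graph.
    """
    position = {}
    last_batch = {}
    for i, (t, b) in enumerate(schedule):
        # Rule 2: each transaction must see its batches in increasing order.
        if t in last_batch and b <= last_batch[t]:
            return False
        last_batch[t] = b
        position[(t, b)] = i
    for (t, b), i in position.items():
        for up, down in edges:
            # Rule 1: for the same batch, upstream must have run before downstream.
            # A missing upstream execution is treated as "not yet run".
            if down == t and position.get((up, b), i + 1) > i:
                return False
    return True


edges = {("T1", "T2")}
print(is_valid_schedule([("T1", 1), ("T2", 1), ("T1", 2), ("T2", 2)], edges))
print(is_valid_schedule([("T1", 1), ("T1", 2), ("T2", 1), ("T2", 2)], edges))
print(is_valid_schedule([("T2", 1), ("T1", 1)], edges))
```

The first two schedules are both accepted, which illustrates that several interleavings can satisfy the rules; the third is rejected because T2 runs on batch 1 before its upstream T1 has.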

  26. Hybrid Schedules, Nested Transactions • Any OLTP transaction can interleave between any pair of streaming transactions (in a valid TE schedule) • Nested transactions : two or more transactions which execute like a block • No transaction can interleave between nested transactions

  27. H-Store Architecture • Commit Log, Checkpointing • Layers

  28. S-Store Extensions • Streams: time varying H-Store tables • Persistent, recoverable • Triggers • Attached to tables, activate when tuples added • PE/EE triggers • Window Tables
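
Push-based processing through triggers can be sketched with a table that calls attached callbacks whenever tuples are added. Everything here (the class name, `attach`, `insert`, and the fixed-size window) is a hypothetical illustration; it does not model the difference between PE and EE triggers or S-Store's actual table implementation.

```python
from collections import deque


class StreamTable:
    """Toy stream table with insert triggers (hypothetical, not S-Store's API)."""

    def __init__(self):
        self.rows = []
        self.triggers = []

    def attach(self, fn):
        # Register a callback that fires whenever tuples are added.
        self.triggers.append(fn)

    def insert(self, batch):
        self.rows.extend(batch)
        # Push-based: the data's arrival drives downstream work.
        for fn in self.triggers:
            fn(batch)


# A window over the 3 most recent tuples, kept up to date by a trigger.
window = deque(maxlen=3)
totals = []

stream = StreamTable()
stream.attach(window.extend)
stream.attach(lambda batch: totals.append(sum(window)))

stream.insert([1, 2])
stream.insert([3, 4])
print(list(window), totals)
```

Triggers fire in the order they were attached, so the running total is always computed after the window has been updated for that batch.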

  29. Fault Tolerance • Goal: Exactly once processing • Even if a failure happens, state must be as if transaction T occurred exactly once! • Weak recovery: correct but nondeterministic results

  30. Recovery • Strong Recovery • Use H-Store’s commit log from latest snapshot + disable PE triggers (why?) • Weak Recovery • Apply Snapshot • Start at the inputs of dataflow graph (cached) • Leave PE triggers as is! Need interior transactions that were not logged to be re-executed • Finally, replay the log

  31. Performance and Evaluation

  32. Performance and Evaluation (2)

  33. Performance and Evaluation (3)

  34. Key Takeaways • Ordering • Push-based processing (triggers!) • Weak vs. strong recovery • ACID guarantees

  35. Discussion • S-Store: >1 node?! • S-Store evaluation methods okay? • Implementation of different failure strategies for each vertex not given in the paper. • No details on how the optimizer works - how does it know the cost of running the application before deploying? • Job Manager fault tolerance not talked about in the paper. If not replicated, it is a single point of failure • Lack of custom DAG creation - probably because they have optimized for their own workload and applications
