Replica Placement for High Availability in Distributed Stream Processing Systems Vinay Dhareshwar
Overview • Introduction • Middleware • System Model • Designing Replica Placement for High Availability • Distributed Placement Protocol • Conclusion
Distributed Stream Processing Systems • Event-based systems deal with large-volume, high-rate data feeds • Data streams are processed in or near real time • Application domains include network traffic management, financial trade surveillance, and e-commerce applications • Distributed stream processing systems provide low-latency, high-throughput processing of data streams
Characteristics of Distributed Stream Processing Systems • Availability • Replication • Strict • Sharing of components • Failure recovery • Large number of components
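As a rough illustration of why replication raises availability, assume (our assumption, not the slides') that each node is up independently with probability $a$, and that a component stays available as long as at least one of its $k$ replicas (one primary plus $k-1$ backups) runs on a live node. This only holds if the replicas sit on distinct nodes; co-located replicas fail together.

```latex
A_{\text{comp}}(k) = 1 - (1 - a)^{k}, \qquad
\text{e.g. } a = 0.9,\; k = 2 \;\Rightarrow\; A_{\text{comp}} = 1 - 0.1^{2} = 0.99
```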
Where does replica placement fit in? • Not all primary replicas can be hosted by the same server • Practical constraints in replica placement
System Model • Residual processing capacity $rp_{v_i}$ • Residual available bandwidth $rb_{e_j}$ • Communication latency $l_{e_j}$ • Component $c_i$ • Query Plan • Application Component Graph • Replication Component Graph • Primary/backup replication scheme • Replication degree
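The system model above can be sketched as a few plain data types. This is a minimal illustration, not the authors' representation: the class names, the reading of replication degree $k$ as the total number of replicas per component, and the edge layout of the replication component graph (primary-to-primary data edges following the application component graph, plus primary-to-backup state-sync edges) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Physical node v_i with residual processing capacity rp_{v_i}."""
    name: str
    rp: float


@dataclass
class Link:
    """Overlay link e_j with residual bandwidth rb_{e_j} and latency l_{e_j}."""
    src: str
    dst: str
    rb: float
    latency: float


@dataclass(frozen=True)
class Replica:
    """One replica of component c_i: index 0 is the primary, 1..k-1 are backups."""
    component: str
    index: int

    @property
    def is_primary(self) -> bool:
        return self.index == 0


def replication_graph(components, app_edges, k):
    """Expand an application component graph into a (hypothetical) replication graph.

    components: component names from the query plan
    app_edges:  (upstream, downstream) pairs of the application component graph
    k:          replication degree, assumed here to count the primary too
    """
    replicas = {c: [Replica(c, i) for i in range(k)] for c in components}
    edges = []
    # Data flows between primaries, following the application component graph.
    for up, down in app_edges:
        edges.append((replicas[up][0], replicas[down][0], "data"))
    # Each primary keeps its own backups up to date (assumed state-sync edges).
    for c in components:
        for backup in replicas[c][1:]:
            edges.append((replicas[c][0], backup, "sync"))
    return replicas, edges
```

For a three-stage chain `src -> filter -> join` with `k = 2`, this yields two data edges and three sync edges.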
Designing Replica Placement for High Availability • Maximizing Application Availability • Respecting Resource Availability • Maximizing Application Performance • Inter-operation communication • Intra-operation communication
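One way to make the three design goals concrete is a feasibility check plus a performance score for a candidate placement. The sketch below is an assumption-laden illustration, not the paper's formulation: availability is modeled as "replicas of one component never share a node", resources as summed CPU demand per node and summed bandwidth per link, and performance as total link latency along the chain of primaries (co-located primaries cost nothing). All function and parameter names are hypothetical.

```python
from collections import defaultdict


def is_feasible(placement, cpu_demand, rp, bw_demand, rb):
    """Check a placement against availability and resource constraints.

    placement:  (component, replica_index) -> node; index 0 is the primary
    cpu_demand: component -> processing demand of one replica
    rp:         node -> residual processing capacity
    bw_demand:  (upstream, downstream) -> bandwidth between their primaries
    rb:         (node_a, node_b) -> residual bandwidth on that link
    """
    # Availability: two replicas of the same component must not share a node.
    hosts = defaultdict(set)
    for (comp, _), node in placement.items():
        if node in hosts[comp]:
            return False
        hosts[comp].add(node)
    # Processing: summed demand on each node must fit its residual capacity.
    load = defaultdict(float)
    for (comp, _), node in placement.items():
        load[node] += cpu_demand[comp]
    if any(load[n] > rp[n] for n in load):
        return False
    # Bandwidth: summed primary-to-primary traffic must fit on each link.
    link_load = defaultdict(float)
    for (up, down), need in bw_demand.items():
        a, b = placement[(up, 0)], placement[(down, 0)]
        if a != b:
            link_load[(a, b)] += need
    return all(link_load[e] <= rb.get(e, 0.0) for e in link_load)


def primary_path_latency(placement, chain, latency):
    """Performance proxy: total link latency along the chain of primaries."""
    total = 0.0
    for up, down in zip(chain, chain[1:]):
        a, b = placement[(up, 0)], placement[(down, 0)]
        total += 0.0 if a == b else latency[(a, b)]
    return total
```

A placer built on these helpers would discard infeasible placements first and then prefer the one with the lowest latency score.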
Distributed Placement Protocol • Phase 1: Bootstrapping • Phase 2: Propagation • Step 1: Primary Placement Selection • Step 2: Primary Placement Negotiation • Step 3: Primary Placement Evaluation • Step 4: Primary Placement Decision • Phase 3: Completion • Failure Handling
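The four steps of the propagation phase can be sketched as a negotiation between an upstream node and its candidates. This is a simplified, hypothetical model: messages are plain method calls, `CandidateNode`, `evaluate`, `decide`, and `place_primary` are illustrative names, and bootstrapping, completion, and failure handling are not modeled. The key idea shown is that evaluation transiently reserves capacity, and the decision either commits that reservation or releases it.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateNode:
    """A node that evaluates placement requests (hypothetical handlers)."""
    name: str
    rp: float  # residual processing capacity
    pending: dict = field(default_factory=dict)  # request id -> held demand

    def evaluate(self, req_id, demand):
        # Step 3 (evaluation): accept only if the demand fits, and hold the
        # capacity transiently so a concurrent request cannot claim it too.
        if demand <= self.rp:
            self.rp -= demand
            self.pending[req_id] = demand
            return True
        return False

    def decide(self, req_id, commit):
        # Step 4 (decision): a commit keeps the reservation, an abort releases it.
        demand = self.pending.pop(req_id)
        if not commit:
            self.rp += demand


def place_primary(req_id, demand, candidates, latency):
    """Place one primary replica, seen from the upstream node."""
    # Step 1 (selection): order candidates by latency from this node.
    ordered = sorted(candidates, key=lambda n: latency[n.name])
    # Steps 2-3 (negotiation, evaluation): every candidate evaluates the request.
    accepted = [n for n in ordered if n.evaluate(req_id, demand)]
    if not accepted:
        return None  # no host found; failure handling would take over here
    # Step 4 (decision): commit the lowest-latency acceptor, release the rest.
    chosen = accepted[0]
    for n in accepted:
        n.decide(req_id, commit=(n is chosen))
    return chosen.name
```

With candidates `a` (capacity 1), `b` and `c` (capacity 5) at latencies 1, 2, 3, a request for 2 units is rejected by `a`, accepted by both `b` and `c`, committed on `b`, and released on `c`.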
Algorithm 1 Placement algorithm. • Input: query plan, replication degree, node $v_s$ • Output: application component graph, replication component graph • for each node $v_i$ in path • perform transient resource allocation at $v_i$ • identify candidate nodes already used for placement • select candidate nodes meeting bandwidth requirements • sort candidate nodes by latency • for each primary replica of downstream component • send placement request or placement negotiation • receive placement reply • send placement decision • for each backup replica of current component • send placement decision
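The candidate-selection lines of Algorithm 1 (identify candidates already used for placement, keep those meeting bandwidth requirements, sort by latency) can be sketched as a single filter-and-sort. Assumptions: link properties are looked up in dictionaries keyed by `(current, other)`, the current node itself counts as a candidate with zero latency and no bandwidth cost, and nodes already hosting a replica of the same component can be excluded so backups land on distinct nodes. The function name and parameters are hypothetical.

```python
def select_candidates(current, used_nodes, rb, latency, bw_need, exclude=frozenset()):
    """Return candidate hosts for a downstream replica, best first.

    current:    node v_i running the algorithm
    used_nodes: nodes already used for placement
    rb:         (current, other) -> residual bandwidth on that link
    latency:    (current, other) -> communication latency on that link
    bw_need:    bandwidth the downstream stream requires
    exclude:    nodes that already host a replica of the same component
    """

    def link_ok(v):
        # Co-locating on the current node needs no network bandwidth.
        return v == current or rb.get((current, v), 0.0) >= bw_need

    def link_latency(v):
        return 0.0 if v == current else latency[(current, v)]

    pool = [v for v in used_nodes if v not in exclude]
    return sorted((v for v in pool if link_ok(v)), key=link_latency)
```

For example, from `v1` with links of bandwidth 10, 2, and 8 to `v2`, `v3`, and `v4` (latencies 5, 1, 3) and a 4-unit stream, excluding `v1` leaves `["v4", "v2"]`: `v3` is dropped for bandwidth, and the rest are ordered by latency.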
Conclusion • Design principles for replica placement • Maximize availability while respecting resource constraints • Make performance-aware decisions • Decentralized protocol