Cost Aware Resource Management for Decentralized Network Services

Presentation Transcript


  1. Cost Aware Resource Management for Decentralized Network Services Venugopalan Ramasubramanian (Rama) Microsoft Research Silicon Valley / Cornell University

  2. Introduction • decentralized services have become increasingly important • e.g. name systems, CDNs, publish-subscribe • they demand low latency, constant availability, and high scalability • current services often fall short of these requirements • built with ad hoc techniques

  3. Problems with Ad Hoc Techniques • no performance guarantees • unable to quantify/bound performance • unable to tune resource utilization to meet performance targets • tailored to specific workloads • e.g. opportunistic caching relies on the “90/10” rule • breaks down for heavy-tailed popularity distributions • and for mutable objects

  4. Principled Approach • fundamental cost-performance tradeoff • e.g. lookup latency vs. memory / bandwidth consumption • resource allocation problem • which node hosts which object? • depends on popularity, size, update rate, etc.

  5. Prior Work • Scalability • high complexity even to express the problem • number of objects × number of nodes (M × N) • Decentralization • objects are distributed among multiple nodes • expensive to perform resource allocation centrally

  6. Cost-Aware Resource Management Framework • high performance, robust, and scalable services • Mathematical Optimization • system-wide performance goals become constraints to optimization problems: • min. cost s.t. performance meets target • max. performance s.t. cost ≤ limit • Structured Overlays • decentralization and self-organization • well-defined topology with bounded diameter and node degree

  7. Decentralized Internet Services • name service for the Internet • Cooperative Domain Name System (CoDoNS) • content distribution network • Cooperative Beehive Web (CobWeb) • on-line data monitoring • Cornell On-line News Aggregator (Corona)

  8. Scalable Resource Allocation • structured overlay • each object has a home node • DAG rooted at the home node reaching all nodes • uniform branching factor • allocate resources at well-defined levels • level ℓ means all nodes ℓ hops away from the home node • low complexity resource allocation • number of objects × diameter (e.g. M × log N) • practical and scalable

  9. Structured Overlays: Pastry • prefix-matching routing: each hop matches one more digit of the key, so lookups take log_b N hops • [figure: query for object 0121 = hash(“cs.cornell.edu”) routed from node 2012 via 0021 and 0112 to its home node 0122]
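
To make the prefix-matching routing concrete, here is a minimal sketch; real Pastry additionally uses leaf sets and proximity-aware routing tables, and the `routing_table` layout below is an illustrative assumption, not Pastry's API.

```python
# Pastry-style prefix routing: each hop forwards to a node sharing at
# least one more prefix digit with the key, so a lookup resolves in at
# most log_b(N) hops (node IDs are base-b digit strings).

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current_id: str, key: str, routing_table: dict) -> str:
    """routing_table maps (matched_prefix_len, next_digit) -> node ID."""
    p = shared_prefix_len(current_id, key)
    if p == len(key):
        return current_id                 # already at the home node
    return routing_table[(p, key[p])]
```

In the figure's example, a query issued at node 2012 for key 0121 matches one more digit per hop: 2012 → 0021 → 0112 → 0122, the home node.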

  10. Opportunistic Caching in Pastry • [figure: the same lookup for object 0121 = hash(“cs.cornell.edu”), with nodes along the path from 2012 to home node 0122 caching the object as the query passes through]

  11. Structured Resource Allocation • analytically model the performance-overhead tradeoff • object replicated at all nodes matching ℓ prefix-digits • lookup latency: ℓ hops • replicas: N/b^ℓ • inexpensive to locate and update replicas (worked example below)
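
A quick worked example of the level-vs-cost tradeoff, with illustrative numbers and assuming a fully populated overlay (N = b^D):

```python
# Replicating an object at all nodes matching l prefix digits costs
# N / b**l replicas and bounds lookup latency at l hops.
b, D = 16, 4            # branching factor and diameter; N = 65,536
N = b ** D
for l in range(D + 1):
    print(f"level {l}: latency <= {l} hops, replicas = {N // b**l}")
# level 0 replicates everywhere (0-hop lookups); level D keeps one copy.
```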

  12. Outline • Introduction • Honeycomb Framework • Optimization Analysis • Implementation • Applications • Evaluation • Conclusions

  13. Analytical Modeling • level of allocation (ℓ) • object hosted at all nodes ℓ hops from the home node • optimization problem: find optimal values of ℓᵢ • min. Σᵢ Cᵢ(ℓᵢ) s.t. Σᵢ Pᵢ(ℓᵢ) ≤ T • max. Σᵢ Pᵢ(ℓᵢ) s.t. Σᵢ Cᵢ(ℓᵢ) ≤ T • performance variables • lookup latency, update latency • cost variables • memory consumption, network overhead, number of nodes

  14. Optimization Problem: Lookup Latency • min. Σᵢ cᵢ · b^ℓᵢ (total overhead) s.t. Σᵢ qᵢ (D − ℓᵢ) ≤ T_L (avg. lookup latency) • T_L: target lookup latency in hops • qᵢ: relative query frequency • cᵢ: replication cost of object i • M objects, N nodes, branching factor b, diameter D
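
The objective and constraint are straightforward to evaluate in code. A minimal sketch with made-up inputs, using this slide's convention that ℓᵢ counts hops from the home node (so a lookup costs D − ℓᵢ hops and object i occupies about b^ℓᵢ nodes):

```python
# Evaluate a candidate level assignment against the problem
# min sum(c_i * b**l_i) s.t. sum(q_i * (D - l_i)) <= T_L.
def total_overhead(levels, c, b):
    return sum(ci * b**li for ci, li in zip(c, levels))

def avg_lookup_latency(levels, q, D):
    return sum(qi * (D - li) for qi, li in zip(q, levels))

q = [0.7, 0.2, 0.1]     # relative query frequencies (sum to 1)
c = [1.0, 1.0, 1.0]     # per-replica cost of each object
levels = [4, 3, 0]      # candidate allocation for b = 16, D = 4
print(total_overhead(levels, c, b=16))      # 69633 replicas
print(avg_lookup_latency(levels, q, D=4))   # 0.6 hops on average
```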

  15. Resource Allocation for Lookup Performance • target avg. lookup latency in hops • sub-one-hop, fractional targets (e.g., 0.5 hops) • indirectly specifies the cache hit ratio • worst-case lookup latency bounded via a lower bound on ℓ • optimizes multiple overhead metrics • number of nodes: c = 1 • memory: c = size of object • bandwidth: c = size × update rate

  16. Analytical Optimization (Beehive) • Zipf popularity distribution (e.g. DNS, Web, RSS) • analytically tractable (one parameter α) • closed-form solution: x*_ℓ = [ b′^ℓ (D − C) / (1 + b′ + … + b′^(D−1)) ]^(1/(1−α)), where b′ = b^((1−α)/α) • x*_ℓ: fraction of objects replicated at level ℓ or lower; C: target avg. lookup hops • inexpensive to compute and apply [Ramasubramanian and Sirer NSDI 04]
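
A sketch of evaluating that closed form (assumptions: α < 1, and capping fractions at 1 is a simplification of how the Beehive paper handles saturated levels):

```python
# Beehive closed-form replication fractions for a Zipf(alpha) workload:
# x[l] is the fraction of objects replicated at level l or lower.
def beehive_fractions(alpha: float, b: int, D: int, C: float) -> list[float]:
    bp = b ** ((1 - alpha) / alpha)            # b' = b^((1-alpha)/alpha)
    denom = sum(bp ** j for j in range(D))     # 1 + b' + ... + b'^(D-1)
    xs = []
    for l in range(D):
        x = (bp ** l * (D - C) / denom) ** (1 / (1 - alpha))
        xs.append(min(x, 1.0))                 # a fraction cannot exceed 1
    return xs

# e.g. alpha = 0.9, b = 16, D = 4, target C = 0.5 hops on average
print(beehive_fractions(0.9, 16, 4, 0.5))
```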

  17. Numerical Optimization • general-purpose approach • any popularity distribution (including Zipf) • many cost metrics (e.g. fine-grained bandwidth consumption) • many performance metrics (e.g. update latency) • optimization problem is NP-hard • a Multiple-Choice Knapsack Problem • discrete, convex, and separable • fast and accurate approximation algorithm • O(M D log(M D)) running time • solution within one object per node of the optimum (slightly more or less)

  18. Numerical Optimization 2 • Lagrange multiplier: min. Σₘ C(ℓₘ) + λ [ Σₘ P(ℓₘ) − T ] • bisection-based bracketing algorithm • upper- and lower-bound solutions that differ in one channel yield a near-optimal solution • pre-computing and sorting the λs before iterating yields the O(MD log(MD)) algorithm (sketch below)
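
A minimal sketch of the Lagrangian idea, using plain bisection on λ rather than Honeycomb's pre-sorted breakpoints; `C[m][l]` and `P[m][l]` are assumed arrays giving cost and performance of channel m at level l:

```python
# For a fixed multiplier lam the problem separates: each channel picks
# the level minimizing C(l) + lam * P(l) independently. Bisection on
# lam then drives total performance toward the target T.
def lagrangian_levels(C, P, T, lam_max=1e9, iters=60):
    def best(lam):
        return [min(range(len(Cm)), key=lambda l: Cm[l] + lam * Pm[l])
                for Cm, Pm in zip(C, P)]
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        lam = (lo + hi) / 2
        perf = sum(Pm[l] for Pm, l in zip(P, best(lam)))
        if perf > T:
            lo = lam      # target missed: penalize performance more
        else:
            hi = lam
    return best(hi)
```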

  19. Honeycomb • cost-aware resource allocation framework for structured overlays • properties: • system-wide performance goals • scalability and failure resilience • quick adaptation to workload • fast update propagation

  20. Scalable Resource Management • independent decisions • local aggregation to estimate popularity • communication only with overlay neighbors • replicas managed by one-hop neighbors

  22. Decentralized Optimization • global optimum requires global information • local knowledge alone leads to sub-optimal solutions • solution: • approximate tradeoffs for non-local channels • aggregate coarse-grained information between neighbors

  23. Decentralized Optimization 2 • approximate parameters • cluster channels with similar values of P(ℓ) / C(ℓ) (sketch below) • constant number of clusters per level
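
A rough sketch of one way to form such clusters; grouping by sorted P(ℓ)/C(ℓ) ratio into a fixed number of buckets per level is my assumption, not a detail given on the slide:

```python
from collections import defaultdict

def cluster_channels(channels, clusters_per_level=8):
    """channels: iterable of (channel_id, level, P, C). Returns, per
    level, clusters of channels with similar tradeoff ratio P/C."""
    by_level = defaultdict(list)
    for cid, level, P, C in channels:
        by_level[level].append((P / C, cid))
    clusters = {}
    for level, items in by_level.items():
        items.sort()                              # adjacent = similar ratio
        k = -(-len(items) // clusters_per_level)  # ceiling division
        clusters[level] = [items[i:i + k] for i in range(0, len(items), k)]
    return clusters
```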

  24. Decentralized Optimization 3 • Aggregating Clusters • Exchange clusters with one-hop neighbors • Hierarchical aggregation through structured overlay

  25. Adaptation to Workload Changes • popularity of objects may change drastically • flash-crowds, denial of service attacks • nodes measure popularity for local objects and aggregate popularity estimates with neighbors

  26. Adaptation to Workload Changes 2 • orders-of-magnitude difference between query rates of popular and unpopular objects • solution: combine inter-arrival times and query counts • estimation time scales with the object's inter-arrival time (popular objects converge quickly) • monitoring overhead proportional to the query rate of the object • quick detection of large increases in query rate (sketch below)
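
A hedged sketch of such an estimator (the slide does not give Honeycomb's exact scheme): counting queries between the first and latest arrival combines counts and inter-arrival times in one rate estimate, so convergence time naturally tracks how often the object is queried:

```python
import time

class QueryRateEstimator:
    """Illustrative per-object query-rate estimator, not Honeycomb's
    exact algorithm: rate = (count - 1) / observed time span."""

    def __init__(self):
        self.count = 0
        self.first = None
        self.last = None

    def record_query(self, now=None):
        now = time.time() if now is None else now
        if self.first is None:
            self.first = now
        self.last = now
        self.count += 1

    def rate(self):
        """Queries per second, or None before two queries arrive."""
        if self.count < 2 or self.last == self.first:
            return None
        return (self.count - 1) / (self.last - self.first)
```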

  27. Honeycomb: Fast Update Propagation • a single integer per object (its replication level) identifies the locations of all its replicas • no TTLs required • updates propagated proactively • using neighbors in the underlying overlay • monotonically increasing version numbers differentiate versions • lazy updates in the background (sketch below)
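
A minimal sketch of version-numbered push; the node API used here (`store`, `neighbors_toward_edge`, `enqueue_update`) is hypothetical:

```python
# The home node pushes a new version outward along the overlay; each
# replica forwards only strictly newer versions, so duplicates and
# out-of-order deliveries are dropped without any TTLs.
def receive_update(node, obj_id, version, data):
    held = node.store.get(obj_id)          # (version, data) or None
    if held is not None and held[0] >= version:
        return                             # stale or duplicate: drop
    node.store[obj_id] = (version, data)
    for nbr in node.neighbors_toward_edge(obj_id):
        # lazy, background propagation to replicas one level further out
        nbr.enqueue_update(obj_id, version, data)
```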

  28. Outline • Introduction • Honeycomb Framework • Applications • Name service (CoDoNS) • Content distribution network (CoBWeb) • On-line data monitoring system (CorONA) • Evaluation • Conclusions

  29. CoDoNS: Cooperative Domain Name System • legacy DNS has fundamental problems • poor failure resilience due to limited replication • high response times due to multi-hop lookups • no support for spontaneous updates • CoDoNS: a cooperative cache for DNS bindings layered over legacy DNS [Ramasubramanian and Sirer SIGCOMM 04]

  30. CoDoNS: Cooperative Domain Name System • structured, proactive caching of name-data mappings • targets an avg. lookup latency of 0.5 hops • minimizes memory consumption • updates pushed proactively to all caching nodes • self-certifying data to preserve integrity (DNSSEC) • incremental deployment path • safety net for legacy DNS • deployed on PlanetLab

  31. CobWeb: Cooperative Beehive Web • web caches • passive, client-driven • content distribution networks • active, replication-driven • e.g. Akamai, Digital Island (commercial); CoDeeN, CoralCDN (academia) • web caching solutions are based on heuristics • ideal cache hit rate: 60-70% [Wolman et al. 01] • achieved cache hit rate: 20-40% [Breslau et al. 99, Wolman et al. 01]

  32. CobWeb: Cooperative Beehive Web • CobWeb is a cooperative web cache • high cache hit rate through structured, proactive caching • low network overhead by accounting for object size and update rate • adaptation to flash crowds • CobWeb performance goals • min. network bandwidth s.t. cache hit rate meets a target • max. cache hit rate s.t. network bandwidth stays within its budget

  33. CobWeb: Cooperative Beehive Web • user interfaces • append cob-web.org to URLs (sketch below) • e.g., http://slashdot.org.cob-web.org:8888 • DNS redirection, URL rewriting • Meridian finds the closest node to the client • deployed on PlanetLab • more than 10 million requests per day
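
For illustration, the URL rewriting the slide describes amounts to appending the CobWeb domain and port to the host; `cobweb_url` is my name for this helper, not part of CobWeb:

```python
from urllib.parse import urlsplit, urlunsplit

def cobweb_url(url, suffix="cob-web.org", port=8888):
    """Rewrite http://host/path to http://host.cob-web.org:8888/path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}.{suffix}:{port}"))

print(cobweb_url("http://slashdot.org/"))
# -> http://slashdot.org.cob-web.org:8888/
```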

  34. Corona: Monitoring Online Data • continuously monitoring online data and detecting changes is crucial • e.g., web pages, sensors, databases • content servers provide only a query-based interface • naïve approach: repeated, independent polling • poor update detection performance • high server load

  35. Corona: Monitoring Online Data • publish-subscribe interface for monitoring web URLs • cooperative polling • resource allocation decides how many nodes poll each channel [Ramasubramanian, Peterson, and Sirer NSDI 06]

  36. Corona: Performance Goals • Corona Lite: • Min. update detection time s.t. network load is bounded • Corona Fast: • Min. network load s.t. update detection time meets a target • Corona Fair: • Min. relative update detection time s.t. network load is bounded • ratio of update detection time to update interval

  37. Outline • Introduction • Honeycomb Framework • Applications • Evaluation • Conclusions

  38. CoDoNS: Lookup Latency • MIT DNS trace: 265,111 queries, 30,000 names, 65 nodes • CoDoNS achieves 1.5 to 2 times lower lookup latency

  39. CobWeb: Lookup Performance • NLANR workload: 1024 nodes, 10,000 objects, 100,000 queries

  40. CobWeb vs. Opportunistic Caching: Lookup Latency

  41. CobWeb vs. Opportunistic Caching: Storage Overhead

  42. CobWeb: Flash Crowd Lookup Latency

  43. CobWeb: Flash Crowd Network Bandwidth

  44. Corona: Update Performance • Corona improves update detection time from 15 min to 45 sec • Corona keeps load lower than legacy RSS

  45. Corona: Update Performance • heuristics vs. Corona

  46. Conclusions • cost-aware resource management enables high-performance, robust, and scalable network services • a principled approach for achieving performance goals in distributed systems • mathematical optimization and structured overlays • realized in CoDoNS, CobWeb, and Corona

  47. Other Research in Wireless Networks • SHARP: hybrid adaptive routing protocol for mobile ad hoc networks [MobiHoc 03] • combines proactive and reactive routing to achieve high performance efficiently • SRL: bidirectional abstraction to support routing protocols on asymmetric mobile ad hoc networks [INFOCOM 02] • Anonymous Gossip: improving multicast reliability in mobile ad hoc networks [ICDCS 01]
