
Broadcast Federation




Presentation Transcript


  1. Broadcast Federation – An architecture for scalable inter-domain multicast/broadcast. Mukund Seshadri (mukunds@cs.berkeley.edu), with Yatin Chawathe (Yatin@research.att.com). http://www.cs.berkeley.edu/~mukunds/bfed/ Spring 2002

  2. Motivation • One-to-many or many-to-many applications • e.g. Internet live audio/video broadcast • No universally deployed multicast protocol. • IP Multicast • Limited scalability (due to router state or its flooding nature) • Address scarcity • Need for administrative boundaries. • SSM – better semantics and business model, but still requires a smart network. • Overlays – application-level “routers” form an overlay network and perform multicast forwarding. • Less efficient, but easier deployability • May be used in CDNs (Content Distribution Networks) for pushing data to the edge • Heavy-duty edge servers replicate content

  3. Goals • Design an architecture for the composition of different, non-interoperable multicast/broadcast domains to provide an end-to-end one-to-many packet delivery service. • Design and implement a high performance (clustered) broadcast gateway for the above architecture

  4. Requirements • Intra-domain protocol independence (both app-layer and IP-layer) • Should be easily customizable for each specific multicast protocol. • Scalable (throughput, number of sessions) • Should not distribute info about sessions to entities not interested in those sessions. • Should use available multicast capability wherever possible.

  5. Basic Design • Broadcast Network (BN) – any multicast-capable network/domain/CDN • Broadcast Gateway (BG) • Bridges between 2 BNs • Explicit BG peering • Overlay of BGs • Analogous to BGP routers. • App-level • For both app-layer and IP-layer protocols • Less efficient link usage, and more delay • Commodity hardware • Easier customizability and deployability • Inefficient hardware [Diagram: a source and clients in their BNs, with BGs bridging BNs over peering links and carrying the data]

  6. Naming • Session → Owner BN • Facilitates shared-tree protocols • Address space limited only by individual BNs’ naming protocols. • Session Description • Owner BN • Session name in owner BN • Options • Metrics – hop-count, latency, bandwidth, etc. • Transport – best-effort, reliable, etc. • Number of sources – single, multiple. • URL style – bin://Owner_BN/native_session_name?pmtr=value&pmtr2=value2…
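
A rough illustration of the URL-style naming above: the sketch below parses a bin:// session description into its owner BN, native session name, and options. The struct and field names are assumptions for illustration only, not the actual Broadcast Federation code.

```cpp
// Minimal sketch: parsing a bin://Owner_BN/native_session_name?pmtr=value&...
// style session description. Structure and names are assumptions, not the
// Broadcast Federation wire format.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

struct SessionDesc {
    std::string ownerBN;                         // owning Broadcast Network
    std::string nativeName;                      // session name inside the owner BN
    std::map<std::string, std::string> options;  // metric, transport, sources, ...
};

SessionDesc parseSession(const std::string& url) {
    SessionDesc s;
    const std::string scheme = "bin://";
    std::string rest = url.substr(scheme.size());   // assumes the bin:// prefix

    std::size_t slash = rest.find('/');
    s.ownerBN = rest.substr(0, slash);

    std::size_t qmark = rest.find('?', slash);
    s.nativeName = rest.substr(slash + 1,
                               qmark == std::string::npos ? std::string::npos
                                                          : qmark - slash - 1);

    // Parse the "pmtr=value&pmtr2=value2..." options, if present.
    if (qmark != std::string::npos) {
        std::string opts = rest.substr(qmark + 1);
        std::size_t pos = 0;
        while (pos < opts.size()) {
            std::size_t amp = opts.find('&', pos);
            std::string pair = opts.substr(pos, amp - pos);
            std::size_t eq = pair.find('=');
            if (eq != std::string::npos)
                s.options[pair.substr(0, eq)] = pair.substr(eq + 1);
            pos = (amp == std::string::npos) ? opts.size() : amp + 1;
        }
    }
    return s;
}

int main() {
    SessionDesc s = parseSession(
        "bin://bn.example.net/concert42?metric=latency&transport=reliable");
    std::cout << s.ownerBN << " / " << s.nativeName
              << " metric=" << s.options["metric"] << "\n";
}
```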

  7. B-Gateway components 3 loosely coupled components – • Routing – for “shortest” unicast routes towards sources • Tree building – for the “shortest”-path distribution tree. • Data forwarding – to send data efficiently across tree edges. • NativeCast interface – interacts with the local broadcast capability

  8. Routing • Peer BGs exchange BN/BG-level reachability info • Path-vector algorithm • Different routes for different metrics/options • e.g. BN-hop-count + best-effort + multi-source, latency + reliable, etc. • Session-agnostic • Avoids all BNs knowing about all sessions. • BG-level selectivity available using SROUTEs. • Policy hooks can be applied to such a protocol.
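
A minimal sketch of the session-agnostic, per-metric path-vector idea above: routes are keyed by (destination BN, metric, options), an advert carrying our own ID in its path is rejected as a loop, and a cheaper route replaces the current one. Names, types, and the cost comparison are assumptions, not the actual routing protocol.

```cpp
// Minimal sketch of session-agnostic, per-metric path-vector routing state.
// Illustrative assumptions only; not the Broadcast Federation protocol.
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct RouteAdvert {
    std::string destBN;               // reachable Broadcast Network
    std::vector<std::string> bgPath;  // BG-level path vector (for loop detection)
    std::string metric;               // e.g. "bn-hop-count", "latency"
    std::string options;              // e.g. "best-effort+multi-source"
    uint32_t cost;
};

class RoutingTable {
public:
    // Adopt the advert if it is loop-free and better than the current route
    // for the same (destination BN, metric, options) key.
    bool update(const RouteAdvert& adv, const std::string& selfId) {
        for (const auto& hop : adv.bgPath)
            if (hop == selfId) return false;   // our ID is in the path: loop

        auto key = std::make_tuple(adv.destBN, adv.metric, adv.options);
        auto it = routes_.find(key);
        if (it == routes_.end() || adv.cost < it->second.cost) {
            routes_[key] = adv;
            return true;                       // caller re-advertises to peers
        }
        return false;
    }

private:
    std::map<std::tuple<std::string, std::string, std::string>, RouteAdvert> routes_;
};

int main() {
    RoutingTable rt;
    RouteAdvert adv{"BN-east", {"BG7", "BG3"}, "bn-hop-count", "best-effort", 2};
    return rt.update(adv, "BG1") ? 0 : 1;      // loop-free and new: adopted
}
```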

  9. Distribution Trees • One reverse shortest-path tree per session • Tree rooted at the owner BN. • “Soft” BG tree state: (Session : upstream node : list of downstream nodes) • Can be bi-directional • Fine-grained selectivity using SROUTE messages before the JOIN phase. [Diagram: example distribution trees across BGs B1–B5 as clients C1–C3 JOIN, each BG holding tree state of the form (Session:Parent:Child1,Child2,..); legend: peering link, JOIN, BG and its tree state, N = NativeCast, Client/Mediator]
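
The per-session soft tree state above, (Session : upstream : list of downstream nodes), might be kept roughly as below: a JOIN creates or refreshes state, and sessions that are not refreshed expire. The structure and refresh policy are illustrative assumptions.

```cpp
// Minimal sketch of "soft" per-session tree state at a BG.
// Structure, naming, and expiry policy are assumptions for illustration.
#include <chrono>
#include <map>
#include <set>
#include <string>

struct TreeEntry {
    std::string upstream;              // parent BG (or "N" for local NativeCast)
    std::set<std::string> downstream;  // child BGs / local NativeCast
    std::chrono::steady_clock::time_point lastRefresh;
};

class TreeState {
public:
    // A JOIN from `child` for `session` refreshes (or creates) soft state.
    // Returns true if the session is new here, i.e. a JOIN must also be
    // propagated toward the owner BN via `upstream`.
    bool onJoin(const std::string& session, const std::string& child,
                const std::string& upstream) {
        bool isNew = (entries_.find(session) == entries_.end());
        TreeEntry& e = entries_[session];
        if (isNew) e.upstream = upstream;
        e.downstream.insert(child);
        e.lastRefresh = std::chrono::steady_clock::now();
        return isNew;
    }

    // Soft state: drop sessions that have not been refreshed recently.
    void expire(std::chrono::seconds timeout) {
        auto now = std::chrono::steady_clock::now();
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (now - it->second.lastRefresh > timeout) it = entries_.erase(it);
            else ++it;
        }
    }

private:
    std::map<std::string, TreeEntry> entries_;  // keyed by session name
};

int main() {
    TreeState t;
    bool propagate = t.onJoin("bin://bn1/s1", "B3", "B1");  // first JOIN: propagate
    t.expire(std::chrono::seconds(30));
    return propagate ? 0 : 1;
}
```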

  10. Mediator How does a client tell a BG that it wants to join a session? • Client in owner BN → no interaction with the federation. • Client not in owner BN → needs to send a JOIN to a BG in its BN. • BNs are required to implement the Mediator abstraction, for sending JOINs for sessions to BGs. • Modified clients which send JOINs to BGs • Well-known Mediator IP Multicast group. • Routers or other BN-specific aggregators • Can be part of the NativeCast interface.
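
As a sketch of the second Mediator variant above (a well-known Mediator IP Multicast group), the loop below subscribes to a hypothetical group and port and would relay any JOIN it hears to the local BG. The group address, port, and message format are made up for illustration; error handling and the relay to the BG are omitted.

```cpp
// Sketch of a Mediator listening on a well-known IP multicast group for
// client JOINs. Group address, port, and message format are hypothetical.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <string>

int main() {
    const char* kMediatorGroup = "239.1.2.3";   // hypothetical well-known group
    const uint16_t kMediatorPort = 9500;        // hypothetical port

    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(kMediatorPort);
    bind(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

    // Subscribe to the well-known Mediator group.
    ip_mreq mreq{};
    inet_pton(AF_INET, kMediatorGroup, &mreq.imr_multiaddr);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

    char buf[1500];
    while (true) {
        ssize_t n = recv(sock, buf, sizeof(buf), 0);
        if (n <= 0) continue;
        std::string join(buf, static_cast<std::size_t>(n));
        (void)join;  // relaying the JOIN to the BG's control node is elided
    }
    close(sock);
}
```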

  11. Data Forwarding • Decouples control from data • …control nodes from data nodes. • TRANSLATION messages carry data-path addresses per session • e.g. TCP/UDP/IP Multicast address + port. • e.g. a transit SSM network might require 2+ channels to be set up for one session. • Label negotiation, for fast forwarding. • Can be piggy-backed on JOINs [Diagram: JOIN and TRANSLATION exchange between a source, BGs P1–P3, and client C1, with per-session data-path addresses such as UDP:IP1,Port1 and IPM:IPm1,Portm1]
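
A minimal sketch of what a TRANSLATION message might carry, based on the bullets above: the session, a negotiated label for fast forwarding, and one or more data-path addresses (e.g. two channels for a transit SSM network). The encoding and field names are assumptions.

```cpp
// Minimal sketch of a TRANSLATION message's contents. The exact encoding
// and field names are assumptions for illustration.
#include <cstdint>
#include <string>
#include <vector>

enum class DataTransport { UDP, TCP, IPMulticast };

struct DataPathAddress {
    DataTransport transport;
    std::string address;   // unicast address of a data node, or a multicast group
    uint16_t port;
};

struct TranslationMsg {
    std::string session;                 // bin:// session name
    uint32_t label;                      // negotiated label for fast forwarding
    std::vector<DataPathAddress> paths;  // may hold 2+ entries, e.g. for SSM
};

int main() {
    // A receiving BG would record where to send (or fetch) this session's data;
    // the message can be piggy-backed on a JOIN, as noted above.
    TranslationMsg t{"bin://bn.example.net/concert42", 7,
                     {{DataTransport::UDP, "10.0.0.2", 9000},
                      {DataTransport::IPMulticast, "239.4.5.6", 9001}}};
    return t.paths.size() == 2 ? 0 : 1;
}
```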

  12. Clustered BG design • 1 control node + n data nodes. • Control node – routing + tree-building. • Independent data paths → flow directly through the data nodes. • TRANSLATION messages contain IP addresses of data nodes in the cluster. • Throughput bottlenecked only by the IP router/NIC. • “Soft” data-forwarding state at data nodes. [Diagram: clustered Broadcast Gateways BG1 and BG2; BGx = Broadcast Gateway, Cx = control node, Dxx = data node (Dnode); IPMul or CDN streams flow from sources in BN1 through data nodes D11/D12 and D21/D22 to receivers in BN2, with control messages exchanged between C1 and C2]
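
One way the control node might spread sessions across the cluster's data nodes, so that each session's data flows directly through a Dnode whose address is then advertised in the TRANSLATION message, is sketched below. The round-robin policy and class names are assumptions, not the clustered BG's actual design.

```cpp
// Minimal sketch: a control node assigning sessions to data nodes.
// Round-robin policy and naming are illustrative assumptions.
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class ControlNode {
public:
    explicit ControlNode(std::vector<std::string> dnodes)
        : dnodes_(std::move(dnodes)) {}

    // Pick the data node whose address goes into the TRANSLATION message
    // for this session; stick with it for subsequent JOINs.
    const std::string& dataNodeFor(const std::string& session) {
        auto it = assignment_.find(session);
        if (it == assignment_.end()) {
            const std::string& chosen = dnodes_[next_++ % dnodes_.size()];
            it = assignment_.emplace(session, chosen).first;
        }
        return it->second;
    }

private:
    std::vector<std::string> dnodes_;  // data-path addresses of the Dnodes
    std::unordered_map<std::string, std::string> assignment_;
    std::size_t next_ = 0;
};

int main() {
    ControlNode cn({"10.0.1.11:9000", "10.0.1.12:9000"});
    // The same session keeps mapping to the same Dnode.
    return cn.dataNodeFor("bin://bn1/s1") == cn.dataNodeFor("bin://bn1/s1") ? 0 : 1;
}
```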

  13. NativeCast • Encapsulates all BN-specific customization • Interface to local broadcast capability • Send and Receive broadcast data • Allocate and reclaim local broadcast addresses • Subscribe to and unsubscribe from local broadcast sessions • Implement “Mediator” functionality – intercept and reply to local JOINs • Get SROUTE values. • Exists on control and data nodes.
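
A sketch of how the NativeCast interface above could look as a C++ abstract class, mirroring the listed operations; the method names and signatures are assumptions, not the actual API of the implementation.

```cpp
// Sketch of a NativeCast interface mirroring the operations listed above.
// Method names and signatures are assumptions.
#include <cstddef>
#include <string>

class NativeCast {
public:
    virtual ~NativeCast() = default;

    // Local broadcast address management.
    virtual std::string allocateAddress(const std::string& session) = 0;
    virtual void reclaimAddress(const std::string& addr) = 0;

    // Membership in local broadcast sessions.
    virtual void subscribe(const std::string& addr) = 0;
    virtual void unsubscribe(const std::string& addr) = 0;

    // Data path: send to / receive from the local broadcast capability.
    virtual void send(const std::string& addr, const char* data, std::size_t len) = 0;
    virtual std::size_t receive(const std::string& addr, char* buf, std::size_t cap) = 0;

    // Mediator functionality and SROUTEs.
    virtual void onLocalJoin(const std::string& session) = 0;
    virtual std::string srouteFor(const std::string& session) = 0;
};

// Per-BN subclasses (e.g. for IP Multicast, an HTTP-based CDN, or SSM)
// implement these hooks; the rest of the BG stays protocol-independent.
```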

  14. Implementation • Linux/C++ event-driven program • Best-effort forwarding. • NativeCast implemented for IP Multicast, a simple HTTP-based CDN and SSM. • Each NativeCast implementation ~ 700 lines of code. • Tested scalability of clustered BG (throughput, sessions) using HTTP-based NativeCast. • Used Millennium cluster.

  15. Experimental Setup • No. of sources = no. of sinks = no. of Dnodes (so that sources/sinks don’t become the bottleneck). • 440 Mbps raw TCP throughput. • 500 MHz PIII’s; 1 Gbps NICs; >50 Gbps switch. • Sources of two types – rate-limited and unlimited. • Note: IPMul is based on UDP; CDN is based on HTTP (over TCP). [Diagram: same clustered BG topology as slide 12, with sources in BN1 and receivers in BN2]

  16. Results • Vary the number of data nodes, using one session per data node. • Near-linear throughput scaling. • Gigabit speed achieved. • Better with larger message sizes. [Graph: maximum (TCP-based) throughput achievable using different data message (framing) sizes, vs. number of data nodes]

  17. Multiple Sessions • Variation of total throughput as the no. of sessions is increased to several sessions per Dnode. The sources are rate-unlimited. • High throughput is sustained when the no. of sessions is large. [Graphs: total throughput vs. number of sessions, with 1 Dnode and with 5 Dnodes]

  18. Multiple Sessions … • Rate-limited sources (<103Kbps). • 5 Dnodes, 1 KB message size. • No significant reduction in throughput.

  19. Future Work • Achieve a large number of sessions + high throughput for large message sizes. • Transport-layer modules (e.g. SRM local recovery). • Wide-area deployment?

  20. Links • “Broadcast Federation: An Application Layer Broadcast Internetwork” – Yatin Chawathe, Mukund Seshadri (NOSSDAV’02) http://www.cs.berkeley.edu/~mukunds/bfed/nossdav02.ps.gz • This presentation: http://www.cs.berkeley.edu/~mukunds/bfed/bfed-retreat.ppt

  21. Extra Slides…

  22. SROUTEs… • …are session-specific routes to the source in the owner BN • All BGs in the owner BN know all SROUTEs for owned sessions. • SROUTE-Response gives all SROUTEs. • Downstream BGs can cache this value to reduce SROUTE traffic. • Downstream BG(s) compute the best target BG in the owner BN and send JOINs towards that BG. • JOINs contain the SROUTEs received earlier. • Session info sent only to interested BNs. • Increases initial setup latency [Diagram: Phase 1 – client JOIN, SROUTE-Request/SROUTE-Response, REDIRECT; Phase 2 – JOIN towards the chosen BG and TRANSLATION, across BGs, BNs, and peering links]
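
A minimal sketch of the downstream-BG side of the two-phase exchange above: cache the SROUTEs from an SROUTE-Response, then pick the best target BG in the owner BN before sending the JOIN towards it. Types and the cost comparison are illustrative assumptions.

```cpp
// Minimal sketch of SROUTE caching and target-BG selection at a downstream BG.
// Data types and the notion of "best" are assumptions for illustration.
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct SRoute {
    std::string targetBG;   // BG in the owner BN that can reach the source
    uint32_t cost;          // session-specific cost towards the source
};

class SRouteCache {
public:
    // Store the SROUTE-Response for a session (reduces repeat SROUTE traffic).
    void store(const std::string& session, std::vector<SRoute> routes) {
        cache_[session] = std::move(routes);
    }

    // Phase 2: choose the best target BG in the owner BN; the JOIN (carrying
    // the cached SROUTEs) is then sent towards it.
    const SRoute* bestTarget(const std::string& session) const {
        auto it = cache_.find(session);
        if (it == cache_.end()) return nullptr;  // need an SROUTE-Request first
        const SRoute* best = nullptr;
        uint32_t bestCost = std::numeric_limits<uint32_t>::max();
        for (const auto& r : it->second)
            if (r.cost < bestCost) { best = &r; bestCost = r.cost; }
        return best;
    }

private:
    std::map<std::string, std::vector<SRoute>> cache_;
};

int main() {
    SRouteCache cache;
    cache.store("bin://bn1/s1", {{"BG-a", 3}, {"BG-b", 1}});
    const SRoute* best = cache.bestTarget("bin://bn1/s1");
    return (best && best->targetBG == "BG-b") ? 0 : 1;
}
```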

  23. More Results • Varied data message size from 64 bytes to 64 KB. • 1 Dnode • Clearly, higher message sizes are better • Due to forwarding overhead – memcpys, syscalls, etc.

  24. Some More Results • Used 796 MHz PIII’s as Dnodes. • Varied no. of Dnodes, single session per Dnode. • Achieved Gigabit-plus speeds with 4 Dnodes.
