Network Coding for Large Scale Content Distribution Pablo Rodriguez Microsoft Research Christos Gkantsidis Georgia Institute of Technology IEEE INFOCOM 2005 Presented by Ryan
Outline • Introduction • Related Works • Model for Cooperative Content Distribution • Performance Evaluation • Conclusion and Future Works
Introduction • Large Scale Content Distribution • Typical content distribution solutions • CDN – Content Delivery Network • Placing dedicated equipment around the network • e.g. Akamai • Cooperative content distribution solutions • Self-scalable • Preventing sudden surges of traffic at the source • e.g. BitTorrent
Introduction • Network Coding • Allowing intermediate nodes to encode packets • Making optimal use of the available network resources
Introduction • An example • Without a globally coordinated scheduler • Should Node B download Packet 1 or Packet 2 from Node A?
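The dilemma in this example can be illustrated with a tiny sketch. It assumes a scenario the slide does not spell out: a third node C already holds Packet 1, and Node A combines its two packets by bytewise XOR (the simplest possible linear combination). The packet contents and the helper name `xor_bytes` are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length packets by bytewise XOR (addition over GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))

packet1 = b"AAAA"
packet2 = b"BBBB"

# Assumed scenario: node C already holds packet 1.
c_has = packet1

# Without coding, A must guess which packet to send to B. If B receives
# packet 1, B has nothing new to offer C.
# With coding, A sends a combination that is useful to C either way.
coded_from_a = xor_bytes(packet1, packet2)

# C combines the coded packet (relayed by B) with the packet it already
# holds and recovers the one it was missing.
recovered = xor_bytes(coded_from_a, c_has)
```

Had C held Packet 2 instead, the same coded packet would have let it recover Packet 1, so A does not need to know what C has.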
Introduction • Contributions of the Paper • Proposing a practical system based on network coding • Requires no knowledge of the underlying topology and no centralized scheduling • Robust to extreme situations with sudden server and node departures • Better performance compared to source coding and no-coding schemes
Related Works • Tree-Based Cooperative Systems • Creating and maintaining shortest-path multicast trees • Bandwidth-limited (by the bottleneck link on the path from the server) • e.g. SplitStream
Related Works • Mesh Cooperative Architectures • Improving the download rates by using parallel downloads • Under-utilizing the network resources (the same block traveling over multiple competing paths) • e.g. BitTorrent
Related Works • Erasure Codes • Reconstructing the original content of size n from roughly any subset of n symbols drawn from a large universe of encoded symbols • Network Coding • Prior work is mostly theoretical (assuming detailed knowledge of the topology and a centralized scheduler)
The Model • Server • Dividing the file into k blocks • Uploading blocks at random to different clients • Clients (Users) • Collaborating with each other to assemble the blocks and reconstruct the original file • Exchanging information and data with only a small subset of others (neighbors) • Symmetric neighborhood and links
The Model • Upon arrival • Contacting a centralized server (like the tracker in BitTorrent) to get a random list of users in the system • Connecting to the returned users to construct the neighborhood
The Model • Content Propagation • 1) No Coding • 2) Source Coding • 3) Network Coding
The Model • No Coding and Source Coding • Based only on local information for deciding which block to transfer • Random • A random block • Local Rarest • The rarest block in the neighborhood
The Model • e.g. BitTorrent system • A combination of the Random and Local Rarest schemes • Random for the first few blocks • Local Rarest afterwards
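The Random and Local Rarest schemes, and the BitTorrent-style combination of the two, can be sketched as a single selection function. This is a minimal sketch, not the simulator used in the paper: the function name `choose_block`, the `warmup` threshold standing in for "the first few blocks", and the set-based bookkeeping are all assumptions made for illustration.

```python
import random
from collections import Counter

def choose_block(sender_blocks, receiver_blocks, neighborhood, warmup=4, rng=random):
    """Pick a block index to send, or None if the sender has nothing new.

    sender_blocks / receiver_blocks: sets of block indices held by each side.
    neighborhood: list of sets, the blocks held by each neighbor.
    warmup: hypothetical threshold for "the first few blocks".
    """
    candidates = sorted(sender_blocks - receiver_blocks)
    if not candidates:
        return None
    if len(receiver_blocks) < warmup:
        # Random scheme: any block the receiver is missing.
        return rng.choice(candidates)
    # Local Rarest scheme: the candidate held by the fewest neighbors.
    counts = Counter(b for blocks in neighborhood for b in blocks)
    rarest = min(counts[b] for b in candidates)
    return rng.choice([b for b in candidates if counts[b] == rarest])
```

Only local information is used: the decision depends on the blocks seen in the neighborhood, never on a global view of the system.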
The Model • Network Coding • The node generates and sends a linear combination of all the information available to it
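The encoding step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system's implementation: arithmetic is done modulo a prime `P` chosen here for simplicity, each held block is stored as a hypothetical `(coeffs, payload)` pair, and the function name `encode` is made up.

```python
import random

P = 65537  # assumed prime field size for this sketch

def encode(held, rng=random):
    """Return one coded block built from everything the node holds.

    held: list of (coeffs, payload) pairs. coeffs has one entry per original
    block of the file; payload is a list of field symbols.
    """
    k = len(held[0][0])   # number of original blocks
    n = len(held[0][1])   # symbols per block
    out_coeffs = [0] * k
    out_payload = [0] * n
    for coeffs, payload in held:
        c = rng.randrange(P)  # fresh random coefficient for this held block
        for i in range(k):
            out_coeffs[i] = (out_coeffs[i] + c * coeffs[i]) % P
        for j in range(n):
            out_payload[j] = (out_payload[j] + c * payload[j]) % P
    return out_coeffs, out_payload
```

The returned coefficient vector always describes the coded payload in terms of the original blocks, so a node can re-encode blocks that are themselves coded without ever decoding them.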
The Model • Recovering the original file after receiving k blocks whose coefficient vectors are linearly independent • Just solving the system of linear equations
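Solving the system can be sketched with Gauss-Jordan elimination. As above, this assumes arithmetic modulo a prime `P` picked for illustration; the function name `decode` and the row layout are hypothetical, and the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.

```python
P = 65537  # assumed prime field size for this sketch

def decode(coeff_rows, payload_rows, p=P):
    """Recover the k original blocks from k coded blocks.

    coeff_rows: k coefficient vectors of length k.
    payload_rows: the k matching coded payloads (lists of field symbols).
    Raises ValueError if the coefficient vectors are linearly dependent.
    """
    k = len(coeff_rows)
    # Augmented matrix [A | B], reduced in place to [I | X].
    rows = [[c % p for c in a] + [s % p for s in b]
            for a, b in zip(coeff_rows, payload_rows)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient vectors are not linearly independent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)  # modular inverse of the pivot
        rows[col] = [(v * inv) % p for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [(v - f * w) % p for v, w in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]
```

For example, with original blocks `[1, 2, 3]` and `[4, 5, 6]`, the coded blocks with coefficient vectors `[2, 3]` and `[1, 1]` are `[14, 19, 24]` and `[5, 7, 9]`, and `decode` returns the two originals.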
The Model • Incentive Mechanisms • Discouraging free-riding • Scheme 1 • Preference to mutual exchanges • Scheme 2 (Tit-for-tat) • Bounding the absolute difference between the blocks uploaded to and downloaded from any one peer
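The tit-for-tat scheme amounts to keeping a per-peer balance. The sketch below is one possible way to do that bookkeeping; the class name `TitForTatLedger` and its methods are hypothetical, and the default bound of 2 is taken from the value used in the evaluation slides.

```python
from collections import defaultdict

class TitForTatLedger:
    """Tracks, per peer, blocks uploaded to it minus blocks downloaded from it."""

    def __init__(self, max_difference=2):
        self.max_difference = max_difference
        self.balance = defaultdict(int)

    def may_upload(self, peer):
        # Refuse to upload once this peer owes us max_difference blocks.
        return self.balance[peer] < self.max_difference

    def record_upload(self, peer):
        self.balance[peer] += 1

    def record_download(self, peer):
        self.balance[peer] -= 1
```

A free-rider that never uploads can extract at most `max_difference` blocks from each neighbor before being cut off.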
Performance Evaluation • Round-based simulator • Input • Overlay topology • Users’ upload and download capacities • Server’s capacity • Capacity: number of blocks that can be downloaded/uploaded in a single round • Size of file to distribute • Metric • Download finish time
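To make the round-based setup concrete, here is a heavily simplified sketch of such a simulator for the no-coding Random scheme only. It is not the authors' simulator. The assumptions, none of which come from the slides: every user neighbors every other user, every capacity is one block per round, senders act on the state at the start of the round, and the function name `simulate` is made up.

```python
import random

def simulate(num_users, num_blocks, max_rounds=10_000, seed=0):
    """Return each user's download finish time, in rounds (None if unfinished)."""
    rng = random.Random(seed)
    all_blocks = set(range(num_blocks))
    have = [set() for _ in range(num_users)]
    finish = [None] * num_users
    for rnd in range(1, max_rounds + 1):
        snapshot = [set(h) for h in have]   # what each user can offer this round
        downloaded = [False] * num_users    # download capacity: 1 block per round
        # The server (None) sends first, then the users in random order.
        senders = [None] + rng.sample(range(num_users), num_users)
        for s in senders:
            source = all_blocks if s is None else snapshot[s]
            # Receivers with spare capacity that lack something the sender holds.
            targets = [u for u in range(num_users)
                       if u != s and not downloaded[u] and source - have[u]]
            if not targets:
                continue
            u = rng.choice(targets)
            have[u].add(rng.choice(sorted(source - have[u])))  # Random scheme
            downloaded[u] = True
        for u in range(num_users):
            if finish[u] is None and have[u] == all_blocks:
                finish[u] = rnd
        if all(f is not None for f in finish):
            break
    return finish
```

Calling `simulate(200, 100)` mirrors the size of the homogeneous experiment (200 users, 100 blocks), though the fully connected overlay assumed here differs from the limited neighborhoods used in the evaluation.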
Performance Evaluation • Connecting to 4 peers when joining • Max number of neighbors = 6 • Discovering new neighbors when the utilization of the download capacity is below a certain threshold (10%)
Performance Evaluation • Homogeneous topologies • 200 users with capacity = 1 • Server’s capacity = 1 • File size = 100 blocks • (Figure panels: No Coding, Source Coding, Network Coding)
Performance Evaluation • Topologies with clusters • Two clusters, 100 users each • Capacity • Within cluster = 8 • Cluster to cluster = 4 • Server • Capacity = 4 • Departing at round 30 • File size = 100 blocks
Performance Evaluation • (Figure panels: No Coding, Source Coding, Network Coding)
Performance Evaluation • Heterogeneous capacities • 10 fast users with capacity = 4 • 190 slow users with capacity = 1 • Server’s capacity = 4 • File size = 400 blocks • (Figure panels: No Coding, Source Coding, Network Coding)
Performance Evaluation • Minimum finish time for the fast users = 50 rounds
Performance Evaluation • Dynamic Arrivals • 40 empty nodes every 20 rounds • Capacity = 1 • Staying in the system 10 more rounds after finishing • Server’s capacity = 1 • File size = 100 blocks
Performance Evaluation • Robustness to node departures
Performance Evaluation • Leaving after serving 5% extra blocks • Network coding: 100% finish • Source coding: 40% finish • No coding: 10% finish • (Figure panels: Network Coding, Source Coding, No Coding)
Performance Evaluation • Incentive mechanisms • Max difference = 2 (tit-for-tat)
Conclusion • A new content distribution system • Requires no knowledge of the whole network topology • Easy to schedule content propagation • Good performance in simulations • Download finish time • Robust to server and user departures • Avalanche – a real system implementation using network coding
Future Works • Speed of encoding and decoding • Encoding: O(k) • Decoding: inverting a matrix O(k^3), reconstructing the file O(k^2) • Dominated by reconstruction • Many reads of large blocks from the hard disk • Protection against malicious nodes • Introducing arbitrary blocks • Making the reconstruction of the original file impossible
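The quadratic reconstruction cost can be read off from the decoding equations. As a sketch, write $A$ for the $k \times k$ matrix of received coefficient vectors, $y_j$ for the received coded blocks, and $x_i$ for the original blocks (symbols chosen here for illustration):

```latex
\[
x_i = \sum_{j=1}^{k} \left(A^{-1}\right)_{ij}\, y_j , \qquad i = 1, \dots, k
\]
```

Each of the $k$ original blocks combines $k$ coded blocks, which gives $k^2$ block-sized operations. Inverting $A$ takes $O(k^3)$ operations, but on single coefficients rather than whole blocks, which is presumably why reconstruction dominates when blocks are large.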