Network Aware Forward Caching • Presenter: Alexandre Gerber • Jeffrey Erman, Mohammad T. Hajiaghayi, Dan Pei, Oliver Spatscheck • AT&T Labs Research • April 24th, 2009
Outline • What’s happening on the Internet? • Traffic characterization • Efficiency of existing delivery mechanisms • Cacheability of HTTP traffic • Let’s revisit forward caching with a new twist! • Network Aware Forward caching
Data available to understand Internet traffic • Application Mix and HTTP requests (2007-2008): • 100,000 broadband subscribers: • US only: California, Texas, Illinois • DSL not Cable • Monitoring at the edge: BRAS (aggregation point) • Application classification based on application header (e.g. MIME type) • Efficiency of delivery mechanisms (October 2008): • Complete Netflow based PoP to PoP Traffic matrix of a US broadband ISP • Air Miles between Points of Presence (PoP)
Aggregate downstream traffic growth per subscriber: stable over the last 8 years in aggregate! 20-30% per subscriber per year
Let’s drill down: Application Mix over the last 2 years. HTTP is back and it is growing fast, thanks in part to multimedia streams over HTTP! It is growing 3 times faster than the historical aggregate growth rate.
Application Mix: January 2008. 89% of potentially “reusable” content. [Figure: application mix, Downstream and Upstream, Busy Hour and Average]
Let’s keep drilling down: HTTP & Multimedia. It’s no longer about text and images; HTTP is the workhorse for data delivery (streaming or direct downloads). /http/download accounts for 10.2% of overall traffic, while FTP accounts for 0.3%. HTTP accounts for 80% of multimedia content delivery (Flash).
Is this content delivered efficiently? CDNs are doing a good job! P2P protocols are not! CDN traffic traverses 60% fewer air miles than the average traffic. P2P flows traverse more air miles than the average traffic.
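The comparison above can be read as a byte-weighted average of air miles over a PoP to PoP traffic matrix. The sketch below is one plausible way to compute such a figure, not necessarily the authors' exact metric. The function name `mean_air_miles`, the PoP labels, and all numbers are hypothetical and chosen only for illustration.

```python
def mean_air_miles(traffic_bytes, air_miles):
    """Byte-weighted mean distance over a PoP-to-PoP traffic matrix.

    traffic_bytes: {(src_pop, dst_pop): bytes carried} (assumed input shape)
    air_miles:     {(src_pop, dst_pop): distance in air miles}
    """
    total = sum(traffic_bytes.values())
    weighted = sum(b * air_miles[pair] for pair, b in traffic_bytes.items())
    return weighted / total

# Hypothetical distances and traffic volumes, not measurements from the study.
air_miles = {("A", "B"): 1000, ("A", "C"): 200}
all_traffic = {("A", "B"): 100, ("A", "C"): 300}
cdn_traffic = {("A", "B"): 10, ("A", "C"): 90}

avg_all = mean_air_miles(all_traffic, air_miles)
avg_cdn = mean_air_miles(cdn_traffic, air_miles)
# Fraction of air miles the CDN subset saves relative to the overall average.
reduction = 1 - avg_cdn / avg_all
```

Weighting by bytes rather than by flow count keeps a few very large transfers from being hidden behind many small ones.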
Are CDNs big enough to have an impact? Yes, because they carry a significant fraction of large files: 46% of very large files are distributed by the 3 large CDNs studied.
Are all the Points of Presence equal in the US? No, some are much closer to or farther from content.
Are all the Autonomous Systems traversing the same distance on an ISP’s backbone? No, some are much closer to or farther from content.
What have we learned so far? • 89% of the content during the busy hour may be “reusable” (HTTP content, Multimedia streams, P2P / File sharing) • 68% of the content during the busy hour is coming from HTTP and HTTP is growing fast: it is the workhorse for data delivery and includes the strong growth of Multimedia streams • There are big differences in the delivery of content when comparing air miles on an ISP’s backbone: • CDNs are doing a good job but other web content providers and P2P protocols are not • Distance traversed to reach PoPs varies significantly • What if we only cached the HTTP traffic that has to traverse the longest distance?
Let’s first understand if HTTP content is really cacheable: Yes, it is! 32% of HTTP bytes are served from the cache (24% of total downstream traffic during the busy hour). Cache size is reasonable (TB). Based on 20,000 subscribers in January 2008.
Network Aware Forward caching • HTTP forward cache that is aware of the network: • Understand the “cost” of each bit: • Backbone cost: Air Miles traversed on ISP’s network by each bit • Source of traffic: free peering vs. paid transit traffic • Let’s only cache the bits that need it most • Tradeoff between bandwidth cost and caching cost • Caching decision made for: • Each PoP: no caches? X caches? • And each “entity” in each PoP: BGP prefix, AS, IP addresses, etc. • Tradeoff between amount of traffic generated and storage requirement • Combinatorial, knapsack-like, NP-hard problem • Can be solved with dynamic programming in pseudo-polynomial time • Or with a greedy heuristic that sorts “entities” in decreasing order of benefits
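The two solution strategies named above can be sketched for a single PoP as a textbook 0/1 knapsack. This is an illustrative sketch, not the authors' formulation: the `Entity` fields, the whole-TB storage units, and the choice to rank the greedy pass by benefit per TB are all assumptions, and the numbers in the example are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str        # hypothetical label, e.g. a BGP prefix or AS seen at one PoP
    benefit: float   # assumed bandwidth-cost saving if this entity is cached
    storage_tb: int  # assumed cache storage required, in whole TB

def greedy_select(entities, capacity_tb):
    """Greedy heuristic: take entities by decreasing benefit per TB while they fit."""
    chosen, used = [], 0
    ranked = sorted(entities, key=lambda e: e.benefit / e.storage_tb, reverse=True)
    for e in ranked:
        if used + e.storage_tb <= capacity_tb:
            chosen.append(e)
            used += e.storage_tb
    return chosen

def dp_select(entities, capacity_tb):
    """Exact 0/1 knapsack by dynamic programming, O(n * capacity_tb) time."""
    # best[c] holds (total benefit, chosen entities) using at most c TB.
    best = [(0.0, [])] * (capacity_tb + 1)
    for e in entities:
        # Iterate capacity downward so each entity is used at most once.
        for c in range(capacity_tb, e.storage_tb - 1, -1):
            candidate = best[c - e.storage_tb][0] + e.benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - e.storage_tb][1] + [e])
    return best[capacity_tb][1]

# Made-up entities and a 4 TB budget; the greedy pass is not optimal here.
entities = [Entity("a", 6.0, 1), Entity("b", 10.0, 2), Entity("c", 12.0, 3)]
greedy_names = [e.name for e in greedy_select(entities, 4)]
optimal_names = [e.name for e in dp_select(entities, 4)]
```

In this toy input the greedy pass picks `a` and `b` (benefit 16) while the dynamic program picks `a` and `c` (benefit 18), which illustrates why the greedy ordering is a heuristic. The dynamic program's running time grows with the storage budget, which is what "pseudo-polynomial" refers to.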
Solution sensitive to backbone costs and caching costs. Base scenario: $4/Mbps/month for transit and backbone costs, $20K for caches (400 Mbps, 4 TB). [Figure panels: Sensitivity to Backbone costs; Sensitivity to Caching costs] The optimal deployment is to cache 68% of the traffic. This is 37% better than a simple cache-all solution!
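To get a feel for the base-scenario numbers, a rough payback calculation can help. The sketch below is a back-of-the-envelope illustration only: it assumes the cache cost is a one-off purchase, that every Mbps the cache offloads saves the full $4/Mbps/month, and that the offloaded rate is known. None of these assumptions are stated on the slide, and the function name `payback_months` is hypothetical.

```python
def payback_months(cache_cost_usd, offloaded_mbps, cost_per_mbps_month):
    """Months until monthly bandwidth savings cover a one-off cache cost."""
    monthly_saving = offloaded_mbps * cost_per_mbps_month
    return cache_cost_usd / monthly_saving

# Slide figures: $20K cache, $4/Mbps/month. Treating the full 400 Mbps as
# offloaded traffic is an assumption (best case for the cache).
best_case = payback_months(20_000, 400, 4.0)

# A lightly used cache offloading a hypothetical 100 Mbps pays back far more slowly.
light_use = payback_months(20_000, 100, 4.0)
```

Under these assumptions the payback period scales inversely with the offloaded traffic, which is consistent with the slide's point that the best deployment depends on both backbone and caching costs.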
Drilling down into the optimal solution: 25% of the PoPs don’t need a cache. Only for 15% of the PoPs does it make sense to cache all the cacheable content.
Conclusion & Future Work • Contribution: • Characterization of Internet traffic & its distribution efficiency: • There is strong growth of HTTP content, especially multimedia streams • Significant differences in distribution efficiencies • Proposed and evaluated Network Aware Forward Caching for HTTP • Future work: • Explore other mechanisms to better distribute Internet traffic: network aware P2P, Multicast, Anycast, etc. • Extend study to other environments: wireless IP networks