
TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC


Presentation Transcript


  1. TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC
     Sunghwan Ihm (Princeton), KyoungSoo Park (KAIST), Vivek S. Pai (Princeton)

  2. IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD
     • Internet access is a scarce commodity in the developing world: expensive / slow
     • Our focus: improving performance of connected network access
     • Non-focus: providing/extending connectivity (e.g., DTN, WiLDNet)
     Sunghwan Ihm, Princeton University

  3. POSSIBLE OPTIONS
     • Web proxy caching
       • Whole objects
       • Single endpoint (local)
       • Designated cacheable traffic only
     • WAN acceleration
       • Packet-level caching
       • Mostly for enterprise
       • Two (or more) endpoints, coordinated
       • Effective in first world

  4. DEVELOPING WORLD QUESTIONS
     • How effective are these approaches?
       • Systems designed for first-world use
       • Most traffic studies small, first-world focused
     • How similar is developing region traffic?
     • Any new opportunities to exploit?
       • Differences in traffic
       • Differences in cost/tradeoffs
       • System design issues

  5. UNDERSTANDING DEVELOPING WORLD TRAFFIC
     • Goal: shape system design by better understanding the traffic optimization opportunities
     • Requirements: large-scale, content-focused analysis

  6. PRIOR TRAFFIC ANALYSIS WORK
     • Large-scale traffic analysis
       • Internet Study 2007, 2008/2009 by ipoque
       • One million users
       • High-level characteristics via DPI
       • First-world focus
     • Developing world traffic analysis
       • Du et al. WWW’06, Johnson et al. NSDR’10
       • Proxy-level analysis from kiosks, Internet cafes, and community centers

  7. OUR APPROACH
     • Combine best features
       • Large-scale and content-focused
       • First world and developing world
     • Use traffic from CoDeeN content distribution network (CDN)
       • Global proxy (500+ PlanetLab nodes)
       • Running since 2003
       • 30+ million requests per day

  8. WHAT TO ANALYZE?
     • Traffic profile
     • Caching opportunities
     • User behavior

  9. DATA COLLECTION
     [Figure: diagram with labels User, Browser Cache, Local Proxy Cache, WAN, CoDeeN Cache, Origin Web Server]
     • Assume local proxy caches
     • Focus on cache misses only
     • Capture full content

  10. DATA SET
      • Duration: 1 week (March 25-31, 2010)
      • # Requests: 157 million
      • Volume: 3 terabytes
      • # Clients (unique IPs): 348K
      • # Countries/Regions: 190
      • /8 networks coverage: 61.3%
      • /16 networks coverage: 24.1%

  11. TOP COUNTRIES
      [Figure: three charts of top countries by Requests %, Bytes %, and Clients %]
      Legend: PL (Poland), DE (Germany), CN (China), US (United States), SA (Saudi Arabia), RU (Russian Federation), AE (United Arab Emirates), Etc. (185 countries)

  12. OECD VS. DEVREG
      • OECD: the first world
        • 27 high-income economies from OECD member countries
        • 25% of total traffic
      • DevReg: the developing world
        • The remaining 163 countries, including 3 OECD members: Mexico, Poland, and Turkey
        • 75% of total traffic

  13. ANALYSIS #1: TRAFFIC PROFILE
      • Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)
      • We often hear a variant of “Offline Wikipedia content suffices for developing world users”

  14. OBJECT SIZE
      • Small: median 3KB vs. 5KB
      • Large: similar demand/profile
      [Figure annotation: 16KB]

  15. TEXT AND IMAGES
      • DevReg has a higher fraction of images
      • Exact opposite of bandwidth conjecture

  16. VIDEO AND AUDIO
      • DevReg: higher fraction of video & audio
      • Music videos and MP3 songs

  17. APPLICATION (FLASH)
      • DevReg has a higher fraction of application traffic
      • Median near 7%

  18. ANALYSIS #1 SUMMARY
      • Some evidence that DevReg-visited sites have smaller objects, but
      • DevReg users visit large pages as well, and
      • DevReg users seek a higher fraction of rich content than OECD users

  19. ANALYSIS #2: CACHING OPPORTUNITY
      • Conjecture: little gain from larger caches
        • Some analysis suggests 1GB sufficient
        • Typical cache size < 20GB
        • Object-based caching

  20. CONTENT-BASED CHUNK CACHING
      [Figure labels: A, B, C, D, E]
      • Split content into chunks
      • Name chunks by content (SHA-1 hash)
      • Cache chunks instead of objects
      • Fetch content, send only modified chunks
      • Two endpoints needed
      • Applies to “uncacheable” content
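The steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not say how chunk boundaries are chosen, so fixed-size chunks are assumed here, and the names `Sender`, `Receiver`, `split_chunks`, `chunk_name`, and `payload_bytes` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

CHUNK_SIZE = 1024  # assumed fixed chunk size in bytes (not specified in the slides)

Message = List[Tuple[str, Optional[bytes]]]


def split_chunks(content: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split content into consecutive fixed-size chunks (the last may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def chunk_name(chunk: bytes) -> str:
    """Name a chunk by the SHA-1 hash of its bytes."""
    return hashlib.sha1(chunk).hexdigest()


class Sender:
    """Remote endpoint: fetches content and sends bytes only for unseen chunks."""

    def __init__(self) -> None:
        self.known: Set[str] = set()  # chunk names the receiver already holds

    def encode(self, content: bytes) -> Message:
        message: Message = []
        for chunk in split_chunks(content):
            name = chunk_name(chunk)
            if name in self.known:
                message.append((name, None))   # receiver has it: send the name only
            else:
                self.known.add(name)
                message.append((name, chunk))  # new or modified chunk: send the bytes
        return message


class Receiver:
    """Local endpoint: caches chunks by name and reassembles the content."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}

    def rebuild(self, message: Message) -> bytes:
        parts = []
        for name, payload in message:
            if payload is not None:
                self.cache[name] = payload
            parts.append(self.cache[name])
        return b"".join(parts)


def payload_bytes(message: Message) -> int:
    """Count the chunk bytes that actually cross the WAN."""
    return sum(len(p) for _, p in message if p is not None)
```

In this sketch, fetching a three-chunk object a second time after only its middle chunk has changed transfers one chunk of payload instead of three, even though a whole-object cache would treat the second version as a complete miss.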

  21. OVERALL REDUNDANCY
      • 40% @ 64 KB: objects or parts of large objects
      • 60% @ 1 KB: parts of text pages
      • 65% @ 128 bytes: paragraphs or sentences

  22. CACHE BEHAVIOR SIMULATION
      • Simulate one week’s traffic
        • Cache misses only
        • LRU cache replacement policy
      • Determine size for near-ideal hit rate
        • Calculate byte hit ratio (BHR)
        • Vary storage size (from 10MB to max)
      • Results for US, China, and Brazil
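A simulation of this kind might look like the sketch below, which replays a request trace through an LRU cache and reports the byte hit ratio for a given storage size. It is an illustration under assumptions, not the simulator used in the study: the trace format of `(key, size)` pairs and the function name `byte_hit_ratio` are hypothetical, and each key is assumed to always have the same size.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def byte_hit_ratio(trace: Iterable[Tuple[str, int]], capacity: int) -> float:
    """Replay (key, size) requests through an LRU cache holding `capacity` bytes.

    Returns hit bytes divided by total requested bytes.
    """
    cache: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first
    used = 0
    hit_bytes = 0
    total_bytes = 0
    for key, size in trace:
        total_bytes += size
        if key in cache:
            hit_bytes += size
            cache.move_to_end(key)  # mark as most recently used
            continue
        if size > capacity:
            continue  # item can never fit, so it is not cached
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[key] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0


# Toy trace: vary the storage size and watch the byte hit ratio change.
toy_trace = [("a", 4), ("b", 4), ("a", 4), ("c", 4), ("b", 4)]
for capacity in (4, 8, 12):
    print(capacity, byte_hit_ratio(toy_trace, capacity))
```

Sweeping `capacity` from a small value up to the size of the whole trace, as the slide describes, shows where the byte hit ratio stops improving and therefore how much storage a near-ideal hit rate needs.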

  23. US – 213 GB

  24. CHINA – 559 GB

  25. BRAZIL – 44 GB

  26. ANALYSIS #2 SUMMARY
      • Chunk caching useful
        • Reduces WAN (cache miss) traffic
        • Complements existing Web proxies
      • Larger caches useful
        • Useful reduction in miss rate
        • Cheap compared to bandwidth costs

  27. ANALYSIS #3: USER BEHAVIOR
      • Conjecture: as first-world Web pages get larger, DevReg users suffer delays
      • Mechanism: observe aborted transfers
        • Intentional termination
        • Automatic when browsing away
      • Abort = users bored or downloads slow

  28. CANCELLED OBJECT SIZE C-CDF
      • Cancelled objects larger than normal (red)
      • Complete objects (green) much larger than actual download (blue)
      • Most downloads less than 10MB

  29. CANCELLED TRANSFER VOLUME
      • 17% of transfers are terminated early
      • Because of the early termination, they account for 25% of actual traffic
      • If fully downloaded, they would have been 80% of all bytes
      • Overall traffic would rise to 375% of the observed volume
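The percentages on this slide fit together arithmetically, as the short check below suggests. It reads the 375% figure as the projected total relative to the observed traffic, which is an interpretation rather than something the slide states; the function name `full_download_projection` is hypothetical.

```python
from typing import Tuple


def full_download_projection(cancelled_pct_observed: float,
                             cancelled_pct_if_full: float) -> Tuple[float, float]:
    """Project traffic if cancelled transfers had completed.

    Observed traffic is normalized to 100 units. Returns the cancelled
    transfers' full size and the projected total, both in those units.
    """
    completed = 100 - cancelled_pct_observed  # bytes from transfers that finished
    # Solve X / (completed + X) = cancelled_pct_if_full / 100 for X.
    cancelled_full = cancelled_pct_if_full * completed / (100 - cancelled_pct_if_full)
    return cancelled_full, completed + cancelled_full


cancelled_full, projected_total = full_download_projection(25, 80)
print(cancelled_full, projected_total)
```

With 25 of every 100 observed units coming from cancelled transfers, those transfers would need to grow to 300 units to make up 80% of all bytes, giving a projected total of 375 units against the 100 observed.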

  30. CANCELLED CONTENT TYPES
      • Most cancelled responses were text
      • Most bytes from video/audio/application

  31. % CANCELLED REQUESTS CDF
      • OECD users cancel more often than DevReg users
      • Median almost double

  32. ANALYSIS #3 SUMMARY
      • Many transactions aborted
        • Previewing video files
        • Content-based caching is effective
      • OECD users less patient than DevReg
        • Cheap bandwidth = more sampling?

  33. CONCLUSIONS
      • First glimpse at CoDeeN traffic
        • Large-scale, content-focused analysis
        • OECD and developing world
      • Many DevReg assumptions are false
        • In fact, strong desire for rich content, and
        • Patient despite slow connections
      • Systems implications
        • Chunk caching worth more exploration
        • Larger caches very useful

  34. sihm@cs.princeton.edu
      http://www.cs.princeton.edu/~sihm/
