
TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC

Sunghwan Ihm (Princeton)

KyoungSoo Park (KAIST)

Vivek S. Pai (Princeton)



IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD

  • Internet access is a scarce commodity in the developing world: expensive / slow

  • Our focus: improving performance of connected network access

  • Non-focus: providing/extending connectivity (e.g., DTN, WiLDNet)


Sunghwan Ihm, Princeton University



POSSIBLE OPTIONS

  • Web proxy caching

    • Whole objects

    • Single endpoint (local)

    • Designated cacheable traffic only

  • WAN acceleration

    • Packet-level caching

    • Mostly for enterprise

    • Two (or more) endpoints, coordinated

    • Effective in first world


DEVELOPING WORLD QUESTIONS

  • How effective are these approaches?

    • Systems designed for first-world use

    • Most traffic studies small, first-world focused

    • How similar is developing region traffic?

  • Any new opportunities to exploit?

    • Differences in traffic

    • Differences in cost/tradeoffs

    • System design issues


UNDERSTANDING DEVELOPING WORLD TRAFFIC

  • Goal

    • Shape system design by better understanding the traffic optimization opportunities

  • Requirements

    • Large-scale, content-focused analysis


PRIOR TRAFFIC ANALYSIS WORK

  • Large scale traffic analysis

    • Internet Study 2007, 2008/2009 by ipoque

    • One million users

    • High-level characteristics via DPI

    • First-world focus

  • Developing world traffic analysis

    • Du et al. WWW’06, Johnson et al. NSDR’10

    • Proxy-level analysis from kiosks, Internet cafes, and community centers


OUR APPROACH

  • Combine best features

    • Large-scale and content-focused

    • First world and developing world

  • Use traffic from CoDeeN content distribution network (CDN)

    • Global proxy (500+ PlanetLab nodes)

    • Running since 2003

    • 30+ million requests per day


WHAT TO ANALYZE?

  • Traffic profile

  • Caching opportunities

  • User behavior


DATA COLLECTION

[Diagram: User, Browser Cache, Local Proxy Cache, CoDeeN Cache, and Origin Web Server, connected across the WAN]

  • Assume local proxy caches

  • Focus on cache misses only

  • Capture full content


DATA SET

  • Duration: 1 week (March 25-31, 2010)

  • # Requests: 157 Million

  • Volume: 3 TeraBytes

  • # Clients (unique IPs): 348 K

  • # Countries/Regions: 190

    • /8 networks coverage: 61.3%

    • /16 networks coverage: 24.1%
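The network coverage figures can be read as the share of address prefixes that contain at least one client. A minimal sketch of that calculation follows, assuming the denominator is the full IPv4 prefix space (256 possible /8 networks, 65,536 possible /16 networks); the slides do not state the denominator the authors used, and the function name and sample addresses are hypothetical.

```python
import ipaddress

def prefix_coverage(client_ips, prefix_len):
    """Fraction of all IPv4 prefixes of this length that contain a client.

    Assumption: the denominator is every possible prefix (2**prefix_len),
    not only the allocated or routable ones.
    """
    seen = set()
    for ip in client_ips:
        # strict=False maps a host address to its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        seen.add(net.network_address)
    return len(seen) / (2 ** prefix_len)

# Hypothetical clients drawn from documentation address ranges
clients = ["192.0.2.10", "192.0.2.77", "198.51.100.4", "203.0.113.9"]
coverage_8 = prefix_coverage(clients, 8)    # three distinct /8 networks
coverage_16 = prefix_coverage(clients, 16)  # three distinct /16 networks
```

Two of the sample clients share a prefix at both lengths, so they count once; the unique-IP client count and the prefix coverage measure different things.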


TOP COUNTRIES

[Figure: three charts of the top countries by Requests %, Bytes %, and Clients %]

Legend: PL (Poland), DE (Germany), CN (China), US (United States), SA (Saudi Arabia), RU (Russian Federation), AE (United Arab Emirates), Etc. (185 countries)



OECD VS. DEVREG

  • OECD: the first world

    • 27 high-income economies from OECD member countries

    • 25% of total traffic

  • DevReg: the developing world

    • The remaining 163 countries and 3 OECD members: Mexico, Poland, and Turkey

    • 75% of total traffic


ANALYSIS #1: TRAFFIC PROFILE

  • Conjecture:

    DevReg users visit low-bandwidth Web pages (small objects and text-heavy)

  • We often hear a variant of

    “Offline Wikipedia content suffices for developing world users”


OBJECT SIZE

  • Small: median 3KB vs. 5KB (DevReg vs. OECD)

  • Large: similar demand/profile



TEXT AND IMAGES

  • DevReg has a higher fraction of images

  • Exact opposite of bandwidth conjecture


VIDEO AND AUDIO

  • DevReg: higher fraction of video & audio

  • Music videos and MP3 songs


APPLICATION (FLASH)

  • DevReg has a higher fraction of application traffic

  • Median near 7%


ANALYSIS #1 SUMMARY

  • Some evidence that DevReg-visited sites have smaller objects, but

  • DevReg users visit large pages as well, and

  • DevReg users seek a higher fraction of rich content than OECD users


ANALYSIS #2: CACHING OPPORTUNITY

  • Conjecture: little gain from larger caches

    • Some analysis suggests 1GB sufficient

    • Typical cache size < 20GB

    • Object-based caching


CONTENT-BASED CHUNK CACHING

  • Split content into chunks

    • Name chunks by content (SHA-1 hash)

    • Cache chunks instead of objects

  • Fetch content, send only modified chunks

    • Two endpoints needed

    • Applies to “uncacheable” content
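The two-endpoint exchange described above can be sketched as follows. This is an illustration only: it assumes fixed-size 1 KB chunks for simplicity (deployed systems often pick chunk boundaries from the content itself), and the Receiver class, transfer function, and sample pages are hypothetical names, not taken from the slides.

```python
import hashlib

CHUNK_SIZE = 1024  # assumed chunk size, chosen only for illustration

def split_chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_name(chunk):
    """Name a chunk by its content, using SHA-1 as on the slide."""
    return hashlib.sha1(chunk).hexdigest()

class Receiver:
    """Client-side endpoint holding a cache of chunks keyed by name."""
    def __init__(self):
        self.cache = {}

    def missing(self, names):
        return {n for n in names if n not in self.cache}

    def assemble(self, names, new_chunks):
        self.cache.update(new_chunks)
        return b"".join(self.cache[n] for n in names)

def transfer(content, receiver):
    """Server-side endpoint: send only chunks the receiver lacks.

    Returns (reassembled content, chunk payload bytes sent).
    """
    chunks = split_chunks(content)
    names = [chunk_name(c) for c in chunks]
    by_name = dict(zip(names, chunks))
    payload = {n: by_name[n] for n in receiver.missing(names)}
    sent = sum(len(c) for c in payload.values())
    return receiver.assemble(names, payload), sent

rx = Receiver()
page_v1 = b"A" * 1024 + b"B" * 1024 + b"C" * 1024
page_v2 = b"A" * 1024 + b"X" * 1024 + b"C" * 1024  # middle chunk modified
out1, sent1 = transfer(page_v1, rx)  # cold cache: all three chunks travel
out2, sent2 = transfer(page_v2, rx)  # warm cache: only the changed chunk
```

Because chunks are named by content rather than by URL, the second transfer saves bytes even if the page is marked uncacheable. The sketch counts chunk payload only and ignores the cost of exchanging the names.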


OVERALL REDUNDANCY

  • 40% @ 64 KB: objects or parts of large object

  • 60% @ 1 KB: parts of text pages

  • 65% @ 128 bytes: paragraphs or sentences
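One way to measure redundancy at a given chunk size is to count the bytes that fall in chunks already seen earlier in the trace. The sketch below assumes fixed-size chunks and an unbounded set of seen hashes; the function name and the toy responses are hypothetical, and the slides do not specify the authors' exact chunking method.

```python
import hashlib

def redundancy(responses, chunk_size):
    """Fraction of bytes belonging to chunks seen earlier in the stream.

    responses: list of bytes objects, in arrival order.
    Assumption: fixed-size chunking and no limit on remembered hashes.
    """
    seen = set()
    total = 0
    duplicate = 0
    for body in responses:
        for i in range(0, len(body), chunk_size):
            chunk = body[i:i + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            total += len(chunk)
            if digest in seen:
                duplicate += len(chunk)
            else:
                seen.add(digest)
    return duplicate / total if total else 0.0

# Two toy responses that share their first 64 bytes
responses = [b"a" * 64 + b"b" * 64, b"a" * 64 + b"c" * 64]
coarse = redundancy(responses, 128)  # whole responses differ
fine = redundancy(responses, 64)     # the shared half is detected
```

The toy input shows the same trend as the slide: smaller chunks expose partial overlap that larger chunks miss.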


CACHE BEHAVIOR SIMULATION

  • Simulate one week’s traffic

    • Cache misses only

    • LRU cache replacement policy

  • Determine size for near-ideal hit rate

    • Calculate byte hit ratio (BHR)

    • Vary storage size (from 10MB to max)

  • Results for US, China, and Brazil
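The simulation steps above can be sketched as a small LRU cache that tracks byte hit ratio. This is a simplified illustration that caches whole objects identified by a key; the trace format, function name, and sizes are assumptions, not the authors' simulator.

```python
from collections import OrderedDict

def simulate_lru_bhr(requests, capacity_bytes):
    """Replay (object_id, size_bytes) requests through an LRU cache.

    Returns the byte hit ratio: bytes served from cache / bytes requested.
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0
    hit_bytes = 0
    total_bytes = 0
    for obj_id, size in requests:
        total_bytes += size
        if obj_id in cache:
            hit_bytes += size
            cache.move_to_end(obj_id)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit, so skip caching it
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU entry
            used -= evicted_size
        cache[obj_id] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0

# Hypothetical five-request trace; sizes are in bytes
trace = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40)]
bhr_by_size = {cap: simulate_lru_bhr(trace, cap) for cap in (40, 100, 120)}
```

Sweeping the capacity, as in the dictionary comprehension, gives the BHR-versus-storage curve; the size at which the curve flattens is the near-ideal cache size.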


US – 213 GB



CHINA – 559 GB



BRAZIL – 44 GB



ANALYSIS #2 SUMMARY

  • Chunk caching useful

    • Reduces WAN (cache miss) traffic

    • Complements existing Web proxies

  • Larger caches useful

    • Useful reduction in miss rate

    • Cheap compared to bandwidth costs


ANALYSIS #3: USER BEHAVIOR

  • Conjecture: as first-world Web pages get larger, DevReg users suffer delays

  • Mechanism: observe aborted transfers

    • Intentional termination

    • Automatic when browsing away

  • Abort = users bored or downloads slow


CANCELLED OBJECT SIZE C-CDF

  • Cancelled objects larger than normal (red)

  • Complete objects (green) much larger than actual download (blue)

  • Most downloads less than 10MB


CANCELLED TRANSFER VOLUME

  • 17% of transfers are terminated early

  • Because they end early, these transfers account for only 25% of actual traffic

  • If fully downloaded, they would have been 80% of all bytes

    • Overall traffic would rise to roughly 375% of its current volume
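The percentages on this slide fit together if they are read as byte shares. A minimal sketch of that arithmetic follows, assuming cancelled transfers carry 25% of the observed bytes and would carry 80% of all bytes had they completed; the function name and this reading are illustrative, not taken from the slides.

```python
def traffic_if_completed(actual_share_cancelled, full_share_cancelled):
    """Total traffic, relative to observed traffic, if no transfer were cut short.

    actual_share_cancelled: share of observed bytes from cancelled transfers.
    full_share_cancelled: share they would hold if downloaded in full.
    """
    # Bytes from transfers that finished normally (observed traffic = 1.0)
    completed = 1.0 - actual_share_cancelled
    # Solve x / (x + completed) = full_share_cancelled for x
    cancelled_full = full_share_cancelled * completed / (1.0 - full_share_cancelled)
    return completed + cancelled_full

ratio = traffic_if_completed(0.25, 0.80)  # roughly 3.75 times observed traffic
```

Under that reading, the completed transfers stay at 75 units per 100 observed, the cancelled ones grow from 25 to about 300, and the total comes to about 375, which is where the 375% figure comes from.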


CANCELLED CONTENT TYPES

  • Most cancelled responses were text

  • Most bytes from video/audio/application


% CANCELLED REQUESTS CDF

  • OECD users cancel more often than DevReg users

    • Median almost double


ANALYSIS #3 SUMMARY

  • Many transactions aborted

  • Previewing video files

    • Content-based caching is effective

  • OECD users less patient than DevReg

    • Cheap bandwidth = more sampling?


CONCLUSIONS

  • First glimpse at CoDeeN traffic

    • Large-scale, content-focused analysis

    • OECD and developing world

  • Many DevReg assumptions are false

    • In fact, strong desire for rich content, and

    • Patient despite slow connections

  • Systems implications

    • Chunk caching worth more exploration

    • Larger caches very useful


sihm@cs.princeton.edu

http://www.cs.princeton.edu/~sihm/

