ACM IMC 2007-10-24



I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System

Meeyoung Cha (Intern at Telefonica Research / KAIST)

Why the study of

“bite-size bits for high-speed munching”

[Wired mag. Mar 2007]

  • Plethora of YouTube clones

  • UGC is very different

    How different?

UGC vs. Non-UGC

  • Massive production scale

    YouTube takes 15 days to produce as much video as the 120 years' worth of movies in IMDb!

  • Extreme publishers

    1,000 uploads over a few years vs. 100 movies over 50 years

  • Short video length

    30 sec–5 min vs. 100 min movies in LoveFilm

    the rest: consumption patterns

Goals and Data

  • Popularity distribution

  • Popularity evolution

  • P2P scalable distribution

  • Content duplication

  • Crawled YouTube and other UGC systems

    metadata: video ID, length, views

    1.6M Entertainment, 250K Science videos



Part1: Popularity Distribution

Static popularity characteristics

Underlying mechanism

Pareto Principle

  • The 10% most popular videos account for 80% of total views

Other online VoD systems show smaller skew!

[Figure: fraction of aggregate views vs. normalized video ranking]
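
A minimal sketch of how the share on this slide could be computed from a list of per-video view counts. The function name `top_share` and the sample numbers are hypothetical; they are not taken from the crawled data.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of all views captured by the top `top_fraction` of videos."""
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)          # most viewed first
    k = max(1, round(len(ranked) * top_fraction)) # size of the "popular" group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Hypothetical view counts for ten videos (illustration only).
sample_views = [800, 60, 40, 30, 20, 20, 10, 10, 5, 5]
share = top_share(sample_views)
print(f"top 10% of videos hold {share:.0%} of views")
```

Sweeping `top_fraction` from 0 to 1 would trace out a curve of the kind plotted on the slide.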

Dominant Power-Law Behavior

  • Richer-get-richer principle

    If a video has K views, users watch it at a rate proportional to K

  • Examples: word frequency, citations of papers, scale of earthquakes, web hits



[Figure: frequency (log) vs. city population (log)]
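
The richer-get-richer rule can be illustrated with a small simulation. This is only a sketch under stated assumptions: new videos arrive with probability `p_new` (a hypothetical parameter the slide does not mention), and otherwise the next view goes to an existing video with probability proportional to its current views. Processes of this kind are known to produce heavy-tailed view counts.

```python
import random

def rich_get_richer(num_events, p_new=0.1, seed=0):
    """Simulate views where a video with K views attracts new views at rate ~K."""
    rng = random.Random(seed)
    counts = [1]   # views per video; start with one video holding one view
    urn = [0]      # one entry per view, so a uniform pick is proportional to views
    for _ in range(num_events):
        if rng.random() < p_new:
            counts.append(1)               # assumed arrival of a new video
            urn.append(len(counts) - 1)
        else:
            vid = rng.choice(urn)          # chosen with probability K / total views
            counts[vid] += 1
            urn.append(vid)
    return counts

counts = rich_get_richer(num_events=10_000)
print(len(counts), "videos; most viewed has", max(counts), "views")
```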

UGC Video Distribution

  • Straight-line waist, truncated at both ends

Focusing on Popular Videos

  • Why do popular videos deviate from the power law?

  • Fetch-at-most-once [SOSP 2003]

    • Behavior of fetching immutable objects once (cf. visiting popular web sites many times)

Simulation on Various Parameters

  • Number of videos (V), users (U), avg. requests per user (R)


Tail is more truncated for larger R and smaller V

[Figure: complementary cumulative number of videos (log) vs. views (log), U=1000, R=10]
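
A rough sketch of such a simulation, not the authors' code. It assumes Zipf-distributed video popularity with a hypothetical exponent `alpha`, and models fetch-at-most-once by redrawing whenever a user picks a video they have already fetched.

```python
import random
from itertools import accumulate

def simulate_views(num_videos, num_users, requests_per_user,
                   fetch_at_most_once, alpha=1.0, seed=0):
    """Return per-video view counts; index 0 is the most popular video."""
    if fetch_at_most_once and requests_per_user > num_videos:
        raise ValueError("a user cannot fetch more distinct videos than exist")
    rng = random.Random(seed)
    # Assumed Zipf popularity: the video at rank r has weight r**(-alpha).
    cum_weights = list(accumulate(r ** -alpha for r in range(1, num_videos + 1)))
    videos = range(num_videos)
    views = [0] * num_videos
    for _ in range(num_users):
        seen = set()
        fetched = 0
        while fetched < requests_per_user:
            vid = rng.choices(videos, cum_weights=cum_weights)[0]
            if fetch_at_most_once and vid in seen:
                continue  # immutable video already fetched by this user; redraw
            seen.add(vid)
            views[vid] += 1
            fetched += 1
    return views

# V=100, U=1000, R=10 (U and R as on the slide; V is a hypothetical choice).
once = simulate_views(100, 1000, 10, fetch_at_most_once=True)
repeat = simulate_views(100, 1000, 10, fetch_at_most_once=False)
print("max views, fetch-at-most-once:", max(once))
print("max views, repeated fetching: ", max(repeat))
```

Under fetch-at-most-once no video can exceed U views, which is what flattens the popular end of the distribution.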

Why the Unpopular Tail Falls Off

  • Natural shape is curved

  • Sampling bias or pre-filters

    • Publishers tend to upload interesting videos

  • Information filtering or post-filters

    • Search results or suggestions favor popular items

Impact of Post-Filters

  • Videos exposed longer to the filtering effect appear more truncated

[Figure: x-axis is video rank]

Is it Naturally Curved?

  • Matlab curve fitting for Science

[Figure: Science videos, fitted with Zipf + exp cutoff]

Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and truncation is due to bottlenecks
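
To make the "Zipf + exp cutoff" shape concrete, here is a small sketch of the functional form views(r) = C · r^(-alpha) · exp(-r / cutoff). The constants below are hypothetical and are not the fitted values from the talk, which used Matlab for the fitting.

```python
import math

def zipf(rank, c, alpha):
    """Pure power law: scale-free, a straight line on log-log axes."""
    return c * rank ** (-alpha)

def zipf_exp_cutoff(rank, c, alpha, cutoff):
    """Power law damped by an exponential with characteristic scale `cutoff`."""
    return zipf(rank, c, alpha) * math.exp(-rank / cutoff)

# Hypothetical parameters, for illustration only.
C, ALPHA, CUTOFF = 1e6, 1.0, 10_000

ratios = {}
for rank in (1, 100, 10_000, 100_000):
    ratios[rank] = zipf_exp_cutoff(rank, C, ALPHA, CUTOFF) / zipf(rank, C, ALPHA)
    print(f"rank {rank:>7}: fraction of pure-Zipf views = {ratios[rank]:.5f}")
```

For ranks well below the cutoff the two curves nearly coincide; beyond it, the exponential term pulls the unpopular tail down.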



Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system

[Chris Anderson, The Long Tail]



Potential gain if the bottlenecks are removed: 40% additional views!

How? Personalized recommendation

Enriched metadata

Abundant videos


Part2: Popularity Evolution

Relationship between popularity and age

Popularity Evolution

  • So far, we focused on static popularity

  • Now focus on popularity dynamics

  • How are requests on any given day distributed across video ages?

  • 6-day daily trace of Science videos

    • Step 1: Group videos requested at least once by age

    • Step 2: Count request volume per age group
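
The two steps can be sketched as follows. The input layout (one dictionary of requests for the day, one of upload days) and all sample values are assumptions made for illustration; the actual trace format is not shown in the slides.

```python
from collections import defaultdict

def request_volume_by_age(daily_requests, upload_day, today):
    """Map video age in days to the total requests received on `today`.

    daily_requests: {video_id: number of requests on `today`}
    upload_day:     {video_id: day index on which the video was uploaded}
    """
    volume = defaultdict(int)
    for vid, n in daily_requests.items():
        if n < 1:
            continue                   # Step 1: keep videos requested at least once
        age = today - upload_day[vid]  # video age in days
        volume[age] += n               # Step 2: count request volume per age group
    return dict(volume)

# Hypothetical one-day snapshot.
requests_today = {"a": 5, "b": 0, "c": 3, "d": 2}
uploaded_on = {"a": 99, "b": 50, "c": 99, "d": 10}
by_age = request_volume_by_age(requests_today, uploaded_on, today=100)
print(by_age)
```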

Request Volume Across Age

1. Viewers mildly more interested in new videos

2. User preference relatively insensitive to age

   80% of requests are for old videos

3. Daily top hits mostly come from new videos

4. Some old videos get significant requests

Part3: P2P Scalable Distribution

Potential savings from P2P (against client-server model)

Optimistic upper bound

Peer-assisted VoD

  • 50-200 Gb/s estimated serving capacity

    • Bandwidth, hardware, power consumption

  • Stream from VoD servers or from peers

    • Varying user lifetime

[Figure: a video server streaming to users A, B, and C; P2P when possible]

Number of Beneficiary Videos

  • P2P viable when at least 2 online users share a video

  • Very few videos benefit, but they benefit a lot

[Figure: estimated number of online users per video at any moment]
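
One simple way to estimate the number of online users per video is Little's law: average viewers = request rate × viewing time. The sketch below assumes requests are spread evenly over the day and that every viewer watches the whole video; neither assumption comes from the slides, and the sample numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def concurrent_viewers(requests_per_day, length_sec):
    """Average number of users watching a video at any moment (Little's law)."""
    return requests_per_day * length_sec / SECONDS_PER_DAY

def p2p_viable(requests_per_day, length_sec, min_peers=2):
    """P2P is treated as viable when at least `min_peers` users are online together."""
    return concurrent_viewers(requests_per_day, length_sec) >= min_peers

# Hypothetical videos, both 3 minutes long.
hit = concurrent_viewers(100_000, 180)   # a daily top hit
niche = concurrent_viewers(500, 180)     # a rarely requested video
print(round(hit, 1), p2p_viable(100_000, 180))
print(round(niche, 2), p2p_viable(500, 180))
```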

Server Workload Savings in P2P

  • Potential for significant savings, due to skewed and temporal request patterns
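
A crude, optimistic sketch of where such savings could come from. It assumes that whenever several users watch the same video at once, the server sends a single stream and peers relay the rest; this model and the sample numbers are illustrative assumptions, not the evaluation used in the talk.

```python
def server_savings(viewers_per_video):
    """Fraction of server streams saved by peer assistance (optimistic bound).

    viewers_per_video: average concurrent viewers of each video.
    Client-server load is one stream per viewer; with P2P the server is
    assumed to send at most one stream per video.
    """
    client_server = sum(viewers_per_video)
    if client_server == 0:
        return 0.0
    peer_assisted = sum(min(n, 1.0) for n in viewers_per_video)
    return 1.0 - peer_assisted / client_server

# Hypothetical skewed workload: one hit, one mid-popularity video, two rare ones.
savings = server_savings([200.0, 4.0, 0.5, 0.5])
print(f"server workload saved: {savings:.1%}")
```

In this toy workload almost all of the saving comes from the single popular video, which is the effect of a skewed request pattern.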


Part4: Content Duplication

Level of duplication

Birth of duplicates

Content Duplication

  • Alias: identical or similar copies of the same content

  • Aliases dilute popularity of a single event

    • Views distributed across multiple copies

    • Difficulty in recommendation & ranking systems

  • Test with 51 volunteers

    • Find aliases using keyword search

    • Identified 1,224 aliases for 184 original videos

The Level of Popularity Dilution

  • Popularity diluted by up to 2 orders of magnitude
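
One possible way to quantify dilution, shown as a sketch: divide the combined views of all aliases by the views of the original. This definition is an assumption, and the view counts below are hypothetical.

```python
import math

def dilution_ratio(original_views, alias_views):
    """Views drawn by aliases, relative to the views of the original video."""
    if original_views <= 0:
        raise ValueError("original_views must be positive")
    return sum(alias_views) / original_views

# Hypothetical event: the original has 1,000 views, three aliases have far more.
ratio = dilution_ratio(1_000, [40_000, 35_000, 25_000])
orders = math.log10(ratio)
print(f"aliases hold {ratio:.0f}x the original's views ({orders:.0f} orders of magnitude)")
```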

How Late Do Aliases Appear?

  • A significant number of aliases appear within one week

Contribution

  • The first detailed study on UGC video popularity

    • Power-law waist

    • Truncation at popular/non-popular videos

  • Analyzed popularity dynamics using daily trace

    • Relationship between popularity and age

  • Explored potential for P2P distribution

  • Showed difficulty in video ranking due to aliases

Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html

Meeyoung Cha (meeyoung.cha@gmail.com)