ACM IMC  2007-10-24

I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System




Presentation Transcript


I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System

Meeyoung Cha (Intern at Telefonica Research / KAIST)

Why the study of

“bite-size bits for high-speed munching”

[Wired mag. Mar 2007]

  • Plethora of YouTube clones

  • UGC is very different

    How different?

UGC vs. Non-UGC

  • Massive production scale

    15 days of YouTube uploads match 120 years' worth of movies in IMDb!

  • Extreme publishers

    1000 uploads over a few years vs. 100 movies over 50 years

  • Short video length

    30 sec–5 min vs. 100 min movies in LoveFilm

    the rest: consumption patterns

Goals and Data

  • Popularity distribution

  • Popularity evolution

  • P2P scalable distribution

  • Content duplication

  • Crawled YouTube and other UGC systems

    metadata: video ID, length, views

    1.6M Entertainment, 250K Science videos



Part1: Popularity Distribution

Static popularity characteristics

Underlying mechanism

Pareto Principle

  • The 10% most popular videos account for 80% of total views

Other online VoD systems show smaller skew!

[Figure: fraction of aggregate views vs. normalized video ranking]
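The skew described on this slide can be checked on any list of per-video view counts. A minimal sketch, assuming a hypothetical list `example_views`; the numbers are made up for illustration and are not taken from the crawl:

```python
def top_share(views, fraction=0.10):
    """Fraction of aggregate views captured by the top `fraction` of videos."""
    ranked = sorted(views, reverse=True)
    # Keep at least one video so tiny inputs still give a defined answer.
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

# Illustrative, made-up view counts (NOT from the YouTube data set).
example_views = [80, 5, 5, 2, 2, 2, 1, 1, 1, 1]
print(top_share(example_views))
```

Sweeping `fraction` from 0 to 1 would trace a curve of the kind shown in the figure.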

Dominant Power-Law Behavior

  • Richer-get-richer principle

    If a video has K views, then users will watch the video at a rate proportional to K

  • Also observed in word frequency, citations of papers, scale of earthquakes, web hits



[Figure: frequency (log) vs. city population (log)]
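One way to see how a richer-get-richer rule produces a heavy-tailed distribution is a small simulation. The sketch below uses a textbook Simon-style process as a stand-in, not the authors' model: with an assumed probability `p_new` a new video arrives with one view, otherwise an existing video is picked with probability proportional to its current views. All names and parameter values are hypothetical.

```python
import random

def simulate_rich_get_richer(steps, p_new=0.05, seed=0):
    """Return per-video view counts after `steps` view events."""
    rng = random.Random(seed)
    views = [1]    # views[i] = view count of video i; start with one video
    tokens = [0]   # one entry per view, so a uniform pick is proportional to views
    for _ in range(steps):
        if rng.random() < p_new:
            video = len(views)          # a new video arrives with its first view
            views.append(1)
        else:
            video = rng.choice(tokens)  # existing video, picked with rate ~ K
            views[video] += 1
        tokens.append(video)
    return views

views = simulate_rich_get_richer(20_000)
print(len(views), max(views), sorted(views)[len(views) // 2])
```

The printed values are the number of videos, the largest view count, and the median view count; plotting sorted `views` on log-log axes gives a rank plot comparable to the figure.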

UGC Video Distribution

  • Straight-line waist, truncated at both ends

Focusing on Popular Videos

  • Why do popular videos deviate from the power law?

  • Fetch-at-most-once [SOSP 2003]

    • Behavior of fetching immutable objects once (cf. visiting popular web sites many times)

Simulation on Various Parameters

  • Number of videos (V), users (U), avg. requests per user (R)


Tail is more truncated for larger R and smaller V

[Figure: comp. cumulative videos (log) vs. views (log), for U=1000, R=10]
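The effect of fetch-at-most-once can be reproduced with a short simulation over V, U, and R. This is a sketch under stated assumptions rather than the simulator behind the slide: intrinsic popularity is taken to be Zipf with exponent 1, and V=100 is an arbitrary choice (the slide gives only U=1000, R=10).

```python
import itertools
import random

def simulate(V, U, R, fetch_at_most_once, seed=0):
    """Views per video (index 0 = most popular) after U users issue R requests each."""
    assert R <= V, "a user cannot fetch more distinct videos than exist"
    rng = random.Random(seed)
    # Assumed Zipf(1) popularity: weight of the rank-r video is 1/r.
    cum = list(itertools.accumulate(1.0 / rank for rank in range(1, V + 1)))
    views = [0] * V
    for _ in range(U):
        seen = set()
        fetched = 0
        while fetched < R:
            video = rng.choices(range(V), cum_weights=cum)[0]
            if fetch_at_most_once and video in seen:
                continue  # this user already fetched it; draw again
            seen.add(video)
            views[video] += 1
            fetched += 1
    return views

once = simulate(V=100, U=1000, R=10, fetch_at_most_once=True)
free = simulate(V=100, U=1000, R=10, fetch_at_most_once=False)
print(max(once), max(free))
```

With fetch-at-most-once no video can exceed U views, which is what flattens the popular end of the curve; raising R or lowering V makes more videos hit that ceiling.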

Why the Unpopular Tail Falls Off

  • Natural shape is curved

  • Sampling bias or pre-filters

    • Publishers tend to upload interesting videos

  • Information filtering or post-filters

    • Search results or suggestions favor popular items

Impact of Post-Filters

  • Videos exposed longer to filtering effect appear more truncated

[Figure: x-axis is video rank]

Is it Naturally Curved?

  • Matlab curve fitting for Science videos

[Figure: Science videos, fitted with Zipf + exp cutoff]

  • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf and the truncation is due to bottlenecks
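A Zipf curve with an exponential cutoff can be written as views(r) = C · r^(-a) · exp(-r/k), which becomes linear in its parameters after taking logs. The sketch below fits that form by ordinary least squares in log space using only the standard library. It is a stand-in for the Matlab fit mentioned on the slide, not a reproduction of it; the symbols C, a, k and the synthetic data are assumptions for illustration.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a.x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_zipf_exp_cutoff(views_by_rank):
    """Fit log v = log C - a*log r - r/k by least squares; returns (C, a, k).

    Assumes the data really has a cutoff (otherwise 1/k is ~0 and k is unstable).
    """
    rows = [(1.0, math.log(r), float(r), math.log(v))
            for r, v in enumerate(views_by_rank, start=1) if v > 0]
    # Normal equations (A^T A) beta = A^T y for the three regressors.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * row[3] for row in rows) for i in range(3)]
    b0, b1, b2 = solve3(ata, aty)
    return math.exp(b0), -b1, -1.0 / b2

# Synthetic, noise-free data with known parameters (NOT the Science data set).
synthetic = [1e6 * r ** -0.8 * math.exp(-r / 500.0) for r in range(1, 1001)]
C_hat, a_hat, k_hat = fit_zipf_exp_cutoff(synthetic)
print(C_hat, a_hat, k_hat)
```

Fitting in log space weights low-view videos more heavily than a nonlinear fit on raw counts would, so results on real data may differ from a Matlab fit.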



Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system

[Chris Anderson, The Long Tail]



40% additional views!

How? Personalized recommendation

Enriched metadata, abundant videos


Part2: Popularity Evolution

Relationship between popularity and age

Popularity Evolution

  • So far, we focused on static popularity

  • Now focus on popularity dynamics

  • How are requests on any given day distributed across video age?

  • 6-day daily trace of Science videos

    • Step 1: Group videos requested at least once by age

    • Step 2: Count request volume per age group
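The two steps above can be sketched as a single pass over one day's trace. The record layout `(video_id, age_days, requests)` and the sample values are hypothetical; the actual trace format is not given in the slides.

```python
from collections import defaultdict

def request_volume_by_age(trace):
    """Map age (days) -> (number of requested videos, total requests that day)."""
    videos = defaultdict(set)
    volume = defaultdict(int)
    for video_id, age_days, requests in trace:
        if requests < 1:
            continue  # Step 1 keeps only videos requested at least once
        videos[age_days].add(video_id)   # Step 1: group by age
        volume[age_days] += requests     # Step 2: request volume per age group
    return {age: (len(videos[age]), volume[age]) for age in sorted(volume)}

# Made-up records for illustration only.
day_trace = [("a", 1, 40), ("b", 1, 0), ("c", 30, 5), ("d", 30, 7), ("e", 400, 3)]
result = request_volume_by_age(day_trace)
print(result)
```

Repeating this for each of the six daily snapshots would give one request-volume-versus-age curve per day.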

Request Volume Across Age

1. Viewers mildly more interested in new videos

2. User preference relatively insensitive to age

   80% of requests go to old videos

3. Daily top hits mostly come from new videos

4. Some old videos get significant requests

Part3: P2P Scalable Distribution

Potential savings from P2P (against client-server model)

Optimistic upper bound

Peer-assisted VoD

  • 50-200 Gb/s estimated serving capacity

    • Bandwidth, hardware, power consumption

  • Stream from VoD servers or from peers

    • Varying user lifetime

[Figure: video server and users A, B, C; P2P when possible]

Number of Beneficiary Videos

  • P2P viable when at least 2 online users share a video

  • Very few videos benefit, but they benefit a lot

Estimated number of online users per video at any moment
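A rough way to estimate the number of online users per video is Little's law: concurrent viewers ≈ request rate × viewing time. The sketch below assumes requests arrive evenly over the day and that each viewer stays online only for the length of the video; the slides do not state the estimator actually used, and the numbers are made up.

```python
SECONDS_PER_DAY = 86_400

def online_users(daily_requests, length_sec):
    """Average number of concurrent viewers, by Little's law (assumed model)."""
    return daily_requests * length_sec / SECONDS_PER_DAY

def p2p_viable(daily_requests, length_sec, min_users=2):
    """P2P is viable when at least `min_users` online users share the video."""
    return online_users(daily_requests, length_sec) >= min_users

# Hypothetical 3-minute videos with different daily request counts.
print(online_users(960, 180), p2p_viable(960, 180))
print(online_users(100, 180), p2p_viable(100, 180))
```

Under these assumptions a short clip needs on the order of a thousand requests per day to keep two viewers online at once, which is consistent with only a small share of videos benefiting. Longer user lifetimes, as noted on the previous slide, would raise the estimate.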

Server Workload Savings in P2P

  • Potential for significant savings due to skewed and temporal request patterns


Part4: Content Duplication

Level of duplication

Birth of duplicates

Content Duplication

  • Alias: identical or similar copies of the same content

  • Aliases dilute popularity of a single event

    • Views distributed across multiple copies

    • Difficulty in recommendation & ranking systems

  • Test with 51 volunteers

    • Find aliases using keyword search

    • Identified 1,224 aliases for 184 original videos

The Level of Popularity Dilution

  • Popularity diluted by up to 2 orders of magnitude
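To make "orders of magnitude" concrete, one hypothetical dilution measure is the ratio of views summed over all aliases to the views of the original. This definition and the sample numbers are assumptions for illustration; the slides do not spell out the metric.

```python
import math

def dilution(original_views, alias_views):
    """Return (ratio, orders of magnitude) of alias views relative to the original.

    Assumes original_views > 0 and at least one alias with views.
    """
    ratio = sum(alias_views) / original_views
    return ratio, math.log10(ratio)

# Made-up example: two aliases together draw 100x the original's views.
ratio, orders = dilution(1_000, [60_000, 40_000])
print(ratio, orders)
```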

How Late Do Aliases Appear?

  • Significant aliases appear within one week


  • The first detailed study on UGC video popularity

    • Power-law waist

    • Truncation at popular/non-popular videos

  • Analyzed popularity dynamics using daily trace

    • Relationship between popularity and age

  • Explored potential for P2P distribution

  • Showed difficulty in video ranking due to aliases

Dataset available at


