ACM IMC 2007-10-24

I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System

Meeyoung Cha (Intern at Telefonica Research / KAIST)

Why the study of

“bite-size bits for high-speed munching”

[Wired mag. Mar 2007]

  • Plethora of YouTube clones
  • UGC is very different

How different?

UGC vs. Non-UGC
  • Massive production scale
    • YouTube produces in 15 days as much video as 120 years' worth of movies in IMDb!
  • Extreme publishers
    • 1000 uploads over a few years vs. 100 movies over 50 years
  • Short video length
    • 30 sec–5 min vs. 100 min movies in LoveFilm

The rest: consumption patterns

Goals and Data
  • Popularity distribution
  • Popularity evolution
  • P2P scalable distribution
  • Content duplication
  • Crawled YouTube and other UGC systems
    • Metadata: video ID, length, views
    • 1.6M Entertainment and 250K Science videos



Part 1: Popularity Distribution

Static popularity characteristics

Underlying mechanism

Pareto Principle
  • The 10% most popular videos account for 80% of total views

Other online VoD systems show smaller skew!

[Figure: fraction of aggregate views vs. normalized video ranking]
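The skew statistic on this slide can be reproduced from a list of per-video view counts. The following is a minimal sketch; the function name `top_share` and the toy counts are made up for illustration and are not taken from the crawled data.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of all views captured by the most-viewed `top_fraction` of videos."""
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)            # most popular first
    k = max(1, round(len(ranked) * top_fraction))   # size of the "popular" group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy view counts (illustrative only): one hit video and nine minor ones.
toy_views = [800, 50, 40, 30, 25, 20, 15, 10, 6, 4]
print(top_share(toy_views))  # share of views held by the top 10% of videos
```

Sweeping `top_fraction` from 0 to 1 gives the curve of aggregate views against normalized ranking.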

Dominant Power-Law Behavior
  • Richer-get-richer principle
    • If a video has K views, then users will watch it at a rate proportional to K
  • Examples: word frequency, citations of papers, scale of earthquakes, web hits

[Figure: frequency (log) vs. city population (log)]
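The richer-get-richer rule can be tried out in a few lines. This is a sketch under stated assumptions, not the model used in the study: each request either goes to a brand-new video with probability `new_video_prob` (a hypothetical parameter added here so that the catalogue grows) or to an existing video chosen with probability proportional to its current view count K.

```python
import random


def simulate_rich_get_richer(num_requests, new_video_prob=0.1, seed=0):
    """Return per-video view counts after `num_requests` views in total."""
    rng = random.Random(seed)
    views = [1]      # start with one video that already has one view
    view_log = [0]   # one entry per view; a uniform pick from it is a pick proportional to K
    for _ in range(num_requests - 1):
        if rng.random() < new_video_prob:
            views.append(1)                    # new upload enters with its first view
            view_log.append(len(views) - 1)
        else:
            video = rng.choice(view_log)       # chosen at rate proportional to its views
            views[video] += 1
            view_log.append(video)
    return views


views = simulate_rich_get_richer(num_requests=20_000)
ranked = sorted(views, reverse=True)
top = max(1, len(ranked) // 10)
print("videos:", len(ranked), "share of top 10%:", sum(ranked[:top]) / sum(ranked))
```

Keeping one log entry per view makes the proportional choice a constant-time uniform draw, at the cost of memory linear in the number of requests.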

UGC Video Distribution
  • Straight-line waist, truncated at both ends
Focusing on Popular Videos
  • Why do popular videos deviate from the power law?
  • Fetch-at-most-once [SOSP2003]
    • Behavior of fetching immutable objects once (cf. visiting popular web sites many times)
Simulation on Various Parameters
  • Number of videos (V), users (U), avg. requests per user (R)

Tail is more truncated for larger R and smaller V

[Figure: comp. cumulative videos (log) vs. views (log); U=1000, R=10]
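A simulation of this kind can be sketched as follows. The details are assumptions for illustration only: video popularity follows a Zipf law with exponent 1, every user issues exactly R requests, and under fetch-at-most-once a user redraws whenever the drawn video was already watched. The function name and its parameters are hypothetical.

```python
import itertools
import random


def simulate_views(num_videos, num_users, requests_per_user,
                   fetch_at_most_once, seed=0):
    """Per-video view counts for V videos, U users and R requests per user."""
    rng = random.Random(seed)
    video_ids = list(range(num_videos))
    # Zipf weights 1/rank (assumed exponent 1), stored cumulatively for fast sampling.
    cum_weights = list(itertools.accumulate(1.0 / r for r in range(1, num_videos + 1)))
    per_user = requests_per_user
    if fetch_at_most_once:
        per_user = min(requests_per_user, num_videos)  # cannot watch more distinct videos than exist
    views = [0] * num_videos
    for _ in range(num_users):
        seen = set()
        made = 0
        while made < per_user:
            video = rng.choices(video_ids, cum_weights=cum_weights, k=1)[0]
            if fetch_at_most_once and video in seen:
                continue          # already fetched by this user: redraw
            seen.add(video)
            views[video] += 1
            made += 1
    return views


repeat = simulate_views(200, 1000, 10, fetch_at_most_once=False)
once = simulate_views(200, 1000, 10, fetch_at_most_once=True)
print("most-viewed video, repeats allowed:", max(repeat))
print("most-viewed video, fetch-at-most-once:", max(once))
```

With fetch-at-most-once no video can collect more views than there are users, which is why the top of the distribution flattens as R grows or V shrinks.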

Why the Unpopular Tail Falls Off
  • Natural shape is curved
  • Sampling bias or pre-filters
    • Publishers tend to upload interesting videos
  • Information filtering or post-filters
    • Search results or suggestions favor popular items
Impact of Post-Filters
  • Videos exposed to the filtering effect for longer appear more truncated

[Figure: x-axis is video rank]

Is it Naturally Curved?
  • Matlab curve fitting for Science videos

[Figure: Science videos, fitted with Zipf + exp cutoff]

Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and truncation is due to bottlenecks
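The scale-free versus scaled distinction can be made explicit with a short calculation. The parameterization below is a common textbook form and an assumption here; the symbols C, \alpha and \kappa are illustrative and do not come from the slides.

```latex
% Assumed forms: pure Zipf / power law, and the same law with an exponential cutoff.
% C, \alpha and \kappa are illustrative symbols, not taken from the slides.
\begin{align}
  f_{\text{Zipf}}(x) &= C\,x^{-\alpha}
    &&\Rightarrow\quad f_{\text{Zipf}}(c\,x) = c^{-\alpha}\, f_{\text{Zipf}}(x), \\
  f_{\text{cut}}(x) &= C\,x^{-\alpha}\, e^{-x/\kappa}
    &&\Rightarrow\quad f_{\text{cut}}(c\,x) = c^{-\alpha}\, e^{-(c-1)\,x/\kappa}\, f_{\text{cut}}(x).
\end{align}
```

Rescaling x multiplies the Zipf form by a constant, so it has the same shape at every scale. The cutoff form picks up a factor that depends on x/\kappa, so \kappa sets a characteristic scale beyond which the curve bends away from a straight line on a log-log plot.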



Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system

[Chris Anderson, The Long Tail]

40% additional views!

How? Personalized recommendation

Enriched metadata

Abundant videos

Part 2: Popularity Evolution

Relationship between popularity and age

Popularity Evolution
  • So far, we focused on static popularity
  • Now focus on popularity dynamics
  • How are requests on any given day distributed across video age?
  • 6-day daily trace of Science videos
    • Step 1: Group videos requested at least once by age
    • Step 2: Count request volume per age group
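The two steps above can be sketched as a single pass over one day's trace. The input layout (two dictionaries keyed by video ID) and the 30-day bucket width are assumptions made for this example, not details from the slides.

```python
from collections import defaultdict


def request_volume_by_age(daily_requests, age_days, bucket_days=30):
    """Group videos requested at least once by age, then sum requests per group.

    daily_requests: {video_id: requests on the observed day}
    age_days:       {video_id: days since upload}
    Returns {age_bucket_index: {"videos": n, "requests": m}}.
    """
    groups = defaultdict(lambda: {"videos": 0, "requests": 0})
    for video_id, requests in daily_requests.items():
        if requests < 1:
            continue                                  # Step 1: keep only requested videos
        bucket = age_days[video_id] // bucket_days    # age group index
        groups[bucket]["videos"] += 1
        groups[bucket]["requests"] += requests        # Step 2: request volume per group
    return dict(groups)


# Toy trace (illustrative values only).
toy_requests = {"a": 5, "b": 0, "c": 2, "d": 7}
toy_ages = {"a": 3, "b": 10, "c": 45, "d": 400}
print(request_volume_by_age(toy_requests, toy_ages))
```

Dividing each group's request count by the day's total gives the fraction of requests that land on videos of a given age.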
Request Volume Across Age

1. Viewers are mildly more interested in new videos

2. User preference is relatively insensitive to age (80% of requests go to old videos)

3. Daily top hits mostly come from new videos

4. Some old videos get significant requests

Part 3: P2P Scalable Distribution

Potential savings from P2P (against client-server model)

Optimistic upper bound

Peer-assisted VoD
  • 50-200 Gb/s estimated serving capacity
    • Bandwidth, hardware, power consumption
  • Stream from VoD servers or from peers
    • Varying user lifetime

[Figure: a video server and users A, B, and C; P2P when possible]

Number of Beneficiary Videos
  • P2P viable when at least 2 online users share a video
  • Very few videos benefit, but they benefit a lot

[Figure: estimated number of online users per video at any moment]
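One rough way to estimate concurrent viewers is Little's law: the average number online equals the arrival rate times the time each viewer stays. The slides do not say how the estimate was obtained, so this is only a sketch; it assumes requests arrive evenly over the day and each viewer stays for the full video length. Both function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimated_online_users(requests_per_day, video_length_sec):
    """Average concurrent viewers via Little's law (arrival rate x time online)."""
    arrival_rate = requests_per_day / SECONDS_PER_DAY   # requests per second
    return arrival_rate * video_length_sec


def p2p_viable(requests_per_day, video_length_sec, min_online_users=2):
    """P2P is viable when at least `min_online_users` viewers overlap on average."""
    return estimated_online_users(requests_per_day, video_length_sec) >= min_online_users


print(estimated_online_users(86_400, 60))   # one request per second, 60 s video
print(p2p_viable(86_400, 60))
print(p2p_viable(100, 60))                  # 100 requests per day, 60 s video
```

Under these assumptions a short clip needs a very high request rate before two viewers overlap, which fits the observation that very few videos benefit.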

Server Workload Savings in P2P
  • Potential for significant savings due to skewed and temporal request patterns


Part 4: Content Duplication

Level of duplication

Birth of duplicates

Content Duplication
  • Alias: identical or similar copies of the same content
  • Aliases dilute popularity of a single event
    • Views distributed across multiple copies
    • Difficulty in recommendation & ranking systems
  • Test with 51 volunteers
    • Find aliases using keyword search
    • Identified 1,224 aliases for 184 original videos
The Level of Popularity Dilution
  • Popularity diluted by up to two orders of magnitude
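Dilution can be illustrated by comparing the views of an original with the views spread across its aliases. The slides do not state which ratio was measured, so the quantities below are assumptions for illustration, as are the function name and the toy numbers.

```python
def dilution_report(original_views, alias_views):
    """Summarize how views of one event are split between an original and its aliases."""
    alias_total = sum(alias_views)
    total = original_views + alias_total
    return {
        "total_views": total,                                        # views of the event across all copies
        "original_share": original_views / total if total else 0.0,  # fraction captured by the original
        "alias_total": alias_total,
    }


# Toy numbers (illustrative only): one original and three aliases.
print(dilution_report(1000, [500, 300, 200]))
```

A ranking that only sees `original_views` underestimates the event's popularity by the factor `total_views / original_views`.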
How Late Do Aliases Appear?
  • Significant aliases appear within one week

  • The first detailed study on UGC video popularity
    • Power-law waist
    • Truncation at popular/non-popular videos
  • Analyzed popularity dynamics using daily trace
    • Relationship between popularity and age
  • Explored potential for P2P distribution
  • Showed difficulty in video ranking due to aliases
Dataset available at http://an.kaist.ac.kr/traces/imc2007.html

Meeyoung Cha (meeyoung.cha@gmail.com)