ACM IMC  2007-10-24

I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System

Meeyoung Cha (Intern at Telefonica Research / KAIST)


Why the study of

“bite-size bits for high-speed munching”

[Wired mag. Mar 2007]

  • Plethora of YouTube clones

  • UGC is very different

    How different?


UGC vs. Non-UGC

  • Massive production scale

    YouTube takes 15 days to produce the equivalent of 120 years’ worth of movies in IMDb!

  • Extreme publishers

    1,000 uploads over a few years vs. 100 movies over 50 years

  • Short video length

    30 sec to 5 min vs. 100-min movies in LoveFilm

    The rest of the talk: consumption patterns


Goals and Data

Goals

  • Popularity distribution

  • Popularity evolution

  • P2P scalable distribution

  • Content duplication

Data

  • Crawled YouTube and other UGC systems

    metadata: video ID, length, views

    1.6M Entertainment, 250K Science videos


Part1: Popularity Distribution

Static popularity characteristics

Underlying mechanism


Pareto Principle

  • The 10% most popular videos account for 80% of total views

Other online VoD systems show smaller skew!

[Figure: fraction of aggregate views vs. normalized video ranking]
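
The skew stated on this slide can be checked on any list of per-video view counts. The sketch below is only an illustration: the function name `top_share` and the sample numbers are hypothetical and are not taken from the crawled dataset.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of all views captured by the most popular `top_fraction` of videos."""
    ranked = sorted(views, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Always include at least one video, even for very short lists.
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Hypothetical view counts for ten videos (not from the dataset).
sample_views = [800, 50, 40, 30, 25, 20, 15, 10, 6, 4]
print(top_share(sample_views))
```

Plotting `top_share` for a range of `top_fraction` values would give a curve of the kind the slide's figure describes.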


Dominant Power-Law Behavior

  • Richer-get-richer principle

    If a video has K views, then users will watch the video at a rate proportional to K

  • Examples: word frequency, citations of papers, scale of earthquakes, web hits

[Figure: frequency (log) vs. city population (log), with the power law y = x^a]
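
The richer-get-richer principle can be illustrated with a small simulation. This is a minimal sketch, assuming each new request picks a video with probability proportional to its current view count and that every video starts with one view; the function name and parameters are hypothetical, and this is not the model used in the study.

```python
import random


def rich_get_richer(num_videos, num_requests, seed=0):
    """Simulate requests that favor already-popular videos."""
    rng = random.Random(seed)
    # Assumption: one initial view each, so every video can be picked at all.
    views = [1] * num_videos
    for _ in range(num_requests):
        # Probability of picking a video is proportional to its current views.
        video = rng.choices(range(num_videos), weights=views)[0]
        views[video] += 1
    return views


views = rich_get_richer(num_videos=100, num_requests=10_000)
print(sorted(views, reverse=True)[:5])
```

Sorting the result by rank and plotting it on log-log axes is one way to inspect how skewed the outcome is.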


UGC Video Distribution

  • Straight-line waist, truncated at both ends


Focusing on Popular Videos

  • Why do popular videos deviate from the power law?

  • Fetch-at-most-once [SOSP 2003]

    • Users fetch immutable objects only once (cf. visiting popular web sites many times)


Simulation on Various Parameters

  • Number of videos (V), users (U), avg. requests per user (R)

Tail is more truncated for larger R and smaller V

[Figure: complementary cumulative videos (log) vs. views (log), comparing power-law behavior with fetch-at-most-once; labels: U=1000, R=10; V=100; R=10, 20, 50]
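
A simulation over V, U, and R can be sketched as follows. This is a minimal sketch under stated assumptions, not the simulator behind the slide: video popularity is assumed to follow a Zipf law with exponent `zipf_exponent`, and the function name and arguments are hypothetical. With fetch-at-most-once, each user draws videos without replacement, so no video can collect more than U views.

```python
import random


def simulate_views(num_videos, num_users, requests_per_user,
                   fetch_at_most_once, zipf_exponent=1.0, seed=0):
    """Return per-video view counts; index 0 is the most popular rank."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** zipf_exponent) for rank in range(1, num_videos + 1)]
    views = [0] * num_videos
    for _ in range(num_users):
        if fetch_at_most_once:
            # Draw without replacement: a user never re-watches a video.
            remaining = list(range(num_videos))
            remaining_weights = weights[:]
            for _ in range(min(requests_per_user, num_videos)):
                pick = rng.choices(range(len(remaining)), weights=remaining_weights)[0]
                views[remaining.pop(pick)] += 1
                remaining_weights.pop(pick)
        else:
            # Plain power-law behavior: repeated requests are allowed.
            for video in rng.choices(range(num_videos), weights=weights,
                                     k=requests_per_user):
                views[video] += 1
    return views


once = simulate_views(100, 1000, 10, fetch_at_most_once=True)
repeat = simulate_views(100, 1000, 10, fetch_at_most_once=False)
print(once[0], repeat[0])
```

Raising `requests_per_user` or lowering `num_videos` pushes more of the top-ranked videos toward the cap of U views, which matches the direction of the truncation noted on the slide.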


Why the Unpopular Tail Falls Off

  • Natural shape is curved

  • Sampling bias or pre-filters

    • Publishers tend to upload interesting videos

  • Information filtering or post-filters

    • Search results or suggestions favor popular items


Impact of Post-Filters

  • Videos exposed longer to the filtering effect appear more truncated

[Figure: x-axis is video rank]


Is it Naturally Curved?

  • Matlab curve fitting for Science videos

[Figure: Science videos, fitted with Zipf, Zipf + exp cutoff, exponential, and log-normal curves]


Is it Naturally Curved?

  • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and the truncation is due to bottlenecks
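
Two of the candidate curves can be compared with nothing more than straight-line fits. The sketch below is not the Matlab procedure from the talk: it swaps in ordinary least squares in log space, and it covers only the Zipf and exponential candidates. A Zipf law is a straight line in log(views) vs. log(rank), while an exponential is a straight line in log(views) vs. rank. The function names are hypothetical, and all view counts must be positive.

```python
import math


def linear_fit(xs, ys):
    """Least squares for y = intercept + slope * x; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def compare_zipf_and_exponential(views_by_rank):
    """Fit both models to views ordered by rank (rank 1 first)."""
    ranks = range(1, len(views_by_rank) + 1)
    log_views = [math.log(v) for v in views_by_rank]
    zipf = linear_fit([math.log(r) for r in ranks], log_views)
    exponential = linear_fit([float(r) for r in ranks], log_views)
    return {"zipf": zipf, "exponential": exponential}


# Synthetic, exactly Zipf-distributed views (not from the dataset).
synthetic = [1000.0 / rank for rank in range(1, 51)]
fits = compare_zipf_and_exponential(synthetic)
print(fits["zipf"][2] < fits["exponential"][2])
```

The model with the smaller sum of squared errors (`sse`) describes the data better in log space; fitting in log space weights the tail differently from a fit on raw counts, so the two approaches need not agree.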


Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system

[Chris Anderson, The Long Tail]

[Figure: Entertainment videos, views vs. rankings]

40% additional views!

How? Personalized recommendation, enriched metadata, abundant videos


Part2: Popularity Evolution

Relationship between popularity and age


Popularity Evolution

  • So far, we focused on static popularity

  • Now focus on popularity dynamics

  • How are requests on any given day distributed across video age?

  • 6-day daily trace of Science videos

    • Step 1: Group videos requested at least once by age

    • Step 2: Count request volume per age group
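
The two steps above can be sketched in a few lines. The record layout `(video_id, age_in_days, requests_that_day)` is an assumption made for illustration, as is the function name; the actual trace format is not shown in the slides.

```python
from collections import defaultdict


def request_volume_by_age(daily_requests):
    """Group one day's requests by video age.

    daily_requests: iterable of (video_id, age_in_days, requests_that_day).
    Returns (videos_per_age, requests_per_age) as plain dicts.
    """
    videos_per_age = defaultdict(int)
    requests_per_age = defaultdict(int)
    for video_id, age_days, requests in daily_requests:
        if requests < 1:
            continue  # Step 1 keeps only videos requested at least once
        videos_per_age[age_days] += 1          # Step 1: group by age
        requests_per_age[age_days] += requests  # Step 2: volume per age group
    return dict(videos_per_age), dict(requests_per_age)


# Hypothetical one-day sample (not from the trace).
sample = [("a", 1, 5), ("b", 1, 0), ("c", 30, 2), ("d", 30, 3)]
print(request_volume_by_age(sample))
```

In practice the ages would likely be binned into coarser groups before plotting; the bin width is a choice the slides do not specify.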


Request Volume Across Age

1. Viewers mildly more interested in new videos


Request Volume Across Age

2. User preference relatively insensitive to age

80% of requests are for old videos


Request Volume Across Age

3. Daily top hits mostly come from new videos


Request Volume Across Age

4. Some old videos get significant requests


Part3: P2P Scalable Distribution

Potential savings from P2P (against client-server model)

Optimistic upper bound


Peer-assisted VoD

  • 50-200 Gb/s estimated serving capacity

    • Bandwidth, hardware, power consumption

  • Stream from VoD servers or from peers

    • Varying user lifetime

[Figure: a video server and users A, B, and C watching movie1 or movie2; P2P when possible]


Number of Beneficiary Videos

  • P2P is viable when at least 2 online users share a video

  • Very few videos benefit, but they benefit a lot

[Figure: estimated number of online users per video at any moment]


Server Workload Savings in P2P

  • Potential for significant savings due to skewed and temporal request patterns

[Figure: server workload, P2P-assisted]
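
An optimistic estimate of the savings can be sketched from request timestamps alone. The sketch below is an illustration under strong assumptions, not the model from the study: every video has the same length, a viewer stays online only while watching, peers have enough upload capacity, and a request is served by a peer whenever another viewer of the same video is still online. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def peer_served_fraction(requests, video_length):
    """Fraction of requests that a concurrent viewer of the same video could serve.

    requests: iterable of (video_id, start_time_seconds).
    video_length: assumed viewing time in seconds, identical for all videos.
    """
    starts = defaultdict(list)
    for video_id, start in requests:
        starts[video_id].append(start)
    total = 0
    peer_served = 0
    for times in starts.values():
        times.sort()
        # Checking the immediately preceding request is enough: if that viewer
        # has already left, all earlier viewers have left too.
        for previous, current in zip([None] + times, times):
            total += 1
            if previous is not None and current - previous < video_length:
                peer_served += 1
    return peer_served / total if total else 0.0


# Hypothetical requests: three for v1, one for v2.
sample = [("v1", 0), ("v1", 60), ("v1", 500), ("v2", 10)]
print(peer_served_fraction(sample, video_length=120))
```

Only the second request for `v1` overlaps an earlier viewer here, so rarely requested videos gain nothing, in line with the observation that few videos benefit from P2P.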


Part4: Content Duplication

Level of duplication

Birth of duplicates


Content Duplication

  • Alias: identical or similar copies of the same content

  • Aliases dilute popularity of a single event

    • Views distributed across multiple copies

    • Difficulty in recommendation & ranking systems

  • Test with 51 volunteers

    • Find aliases using keyword search

    • Identified 1,224 aliases for 184 original videos


The Level of Popularity Dilution

  • Popularity diluted by up to two orders of magnitude
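
One possible way to quantify dilution is the ratio of views collected by all aliases to the views of the original video. This definition is an assumption made for illustration and may differ from the metric behind the slide; the function name and the numbers are hypothetical.

```python
def dilution_ratio(original_views, alias_views):
    """Total views on aliases divided by views on the original video."""
    if original_views <= 0:
        raise ValueError("original_views must be positive")
    return sum(alias_views) / original_views


# Hypothetical example: aliases together draw 100 times the original's views.
print(dilution_ratio(1000, [50000, 30000, 20000]))
```

Under this reading, a ratio of 100 corresponds to two orders of magnitude.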


How Late Do Aliases Appear?

  • Significant aliases appear within one week


Contribution

  • The first detailed study on UGC video popularity

    • Power-law waist

    • Truncation at popular/non-popular videos

  • Analyzed popularity dynamics using daily trace

    • Relationship between popularity and age

  • Explored potential for P2P distribution

  • Showed difficulty in video ranking due to aliases



Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html

Meeyoung Cha meeyoung.cha@gmail.com

Questions?