ACM IMC 2007-10-24

I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System





Meeyoung Cha (Intern at Telefonica Research / KAIST)


Why the study of

“bite-size bits for high-speed munching” [Wired mag. Mar 2007]

  • Plethora of YouTube clones

  • UGC is very different

    How different?


UGC vs. Non-UGC

  • Massive production scale

    YouTube takes 15 days to produce 120 years' worth of movies in IMDb!

  • Extreme publishers

    1,000 uploads over a few years vs. 100 movies over 50 years

  • Short video length

    30 sec–5 min vs. 100 min movies in LoveFilm

    Rest of the talk: consumption patterns


Goals and Data

Goals

  • Popularity distribution

  • Popularity evolution

  • P2P scalable distribution

  • Content duplication

Data

  • Crawled YouTube and other UGC systems

    metadata: video ID, length, views

    1.6M Entertainment, 250K Science videos


Part1: Popularity Distribution

Static popularity characteristics

Underlying mechanism


Pareto Principle

  • The 10% most popular videos account for 80% of total views

Other online VoD systems show smaller skew!

[Figure: fraction of aggregate views vs. normalized video ranking]
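
The skew statement above can be checked on any list of per-video view counts. The following is a minimal sketch, not the authors' analysis code: the function name `top_share` is hypothetical, and the view counts are synthetic values made up for illustration.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of all views captured by the most-viewed `top_fraction` of videos."""
    if not views or sum(views) == 0:
        raise ValueError("need at least one video with a non-zero view count")
    ranked = sorted(views, reverse=True)
    # Always include at least one video, even for very small inputs.
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Synthetic view counts following a rough 1/rank law (NOT the crawled dataset).
synthetic_views = [1000 // rank for rank in range(1, 1001)]
print(f"top 10% share: {top_share(synthetic_views):.2f}")
```

Running the same function on the crawled metadata (video ID, views) would give the figure the slide reports, assuming views are taken as of the crawl date.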


Dominant Power-Law Behavior

  • Richer-get-richer principle

    If a video has K views, then users will watch the video with rate proportional to K

  • Also seen in: word frequency, citations of papers, scale of earthquakes, web hits

[Figure: frequency (log) vs. city population (log), with a power-law line of the form y = x^a]
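
The richer-get-richer principle can be illustrated with a small simulation in which each new request picks a video with probability proportional to its current view count. This is a sketch under stated assumptions: the function name, the starting count of one view per video, and the parameter values are all illustrative and do not come from the talk.

```python
import random


def simulate_rich_get_richer(num_videos, num_requests, seed=0):
    """Return view counts after requests that favor already-popular videos."""
    rng = random.Random(seed)
    # Assumption: every video starts with one view so that it can be chosen at all.
    views = [1] * num_videos
    video_ids = range(num_videos)
    for _ in range(num_requests):
        # Selection probability is proportional to the current view count.
        chosen = rng.choices(video_ids, weights=views, k=1)[0]
        views[chosen] += 1
    return views


views = simulate_rich_get_richer(num_videos=200, num_requests=5000)
ranked = sorted(views, reverse=True)
print("most viewed:", ranked[:5])
print("least viewed:", ranked[-5:])
```

Plotting `ranked` on log-log axes is one way to inspect how close the result is to a straight line.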


UGC Video Distribution

  • Straight-line waist, truncated at both ends


Focusing on Popular Videos

  • Why do popular videos deviate from the power law?

  • Fetch-at-most-once [SOSP2003]

    • Users fetch an immutable object only once (cf. visiting popular web sites many times)


Simulation on Various Parameters

  • Number of videos (V), users (U), avg. requests per user (R)

  • Under fetch-at-most-once, the tail is more truncated for larger R and smaller V

[Figure: complementary cumulative number of videos (log) vs. views (log), contrasting fetch-at-most-once with power-law behavior. Labels: U=1000, R=10; R=10, R=20, R=50; V=100]
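
A simulation of this kind can be sketched as follows. This is not the authors' simulator: it assumes that each user draws videos from a Zipf-like distribution (weight 1/rank^alpha, with a hypothetical `alpha` parameter) and, under fetch-at-most-once, redraws whenever the drawn video was already watched. V, U and R correspond to `num_videos`, `num_users` and `requests_per_user`.

```python
import random
from itertools import accumulate


def simulate_views(num_videos, num_users, requests_per_user,
                   fetch_at_most_once, alpha=1.0, seed=0):
    """Return per-video view counts; index 0 is the most popular video."""
    if fetch_at_most_once and requests_per_user > num_videos:
        raise ValueError("a user cannot fetch more distinct videos than exist")
    rng = random.Random(seed)
    # Zipf-like request probabilities: the weight of rank r is 1 / r**alpha.
    cum_weights = list(accumulate(1.0 / rank ** alpha
                                  for rank in range(1, num_videos + 1)))
    video_ids = range(num_videos)
    views = [0] * num_videos
    for _ in range(num_users):
        seen = set()
        fetched = 0
        while fetched < requests_per_user:
            video = rng.choices(video_ids, cum_weights=cum_weights, k=1)[0]
            if fetch_at_most_once and video in seen:
                continue  # already watched: redraw instead of fetching again
            seen.add(video)
            views[video] += 1
            fetched += 1
    return views


free = simulate_views(100, 1000, 10, fetch_at_most_once=False)
capped = simulate_views(100, 1000, 10, fetch_at_most_once=True)
print("top video, unrestricted:", free[0])
print("top video, fetch-at-most-once:", capped[0])
```

With fetch-at-most-once no video can collect more views than there are users, which is what flattens the popular end of the curve.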


Why the Unpopular Tail Falls Off

  • Natural shape is curved

  • Sampling bias or pre-filters

    • Publishers tend to upload interesting videos

  • Information filtering or post-filters

    • Search results or suggestions favor popular items


Impact of Post-Filters

  • Videos exposed longer to the filtering effect appear more truncated

[Figure: plotted against video rank]


Is it Naturally Curved?

  • Matlab curve fitting for Science videos

  • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf and the truncation is due to bottlenecks

[Figure: Science videos fitted with Zipf, Zipf + exp cutoff, exponential, and log-normal curves]
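
The slide's fits were done in Matlab. As a rough stand-in, the sketch below compares two of the four candidate shapes, Zipf and Zipf with exponential cutoff, using ordinary least squares in log space with plain Python. It is a swapped-in technique, not the authors' procedure; all function names are hypothetical, the use of rank on the x-axis is an assumption, and the data are synthetic.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(features, targets):
    """Ordinary least squares via the normal equations; returns (coeffs, sse)."""
    k = len(features[0])
    ata = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    atb = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    coeffs = solve_linear(ata, atb)
    sse = sum((sum(c * v for c, v in zip(coeffs, f)) - t) ** 2
              for f, t in zip(features, targets))
    return coeffs, sse


def fit_zipf(ranks, views):
    """Fit log(views) = log_c - alpha * log(rank)."""
    feats = [[1.0, math.log(r)] for r in ranks]
    (log_c, slope), sse = least_squares(feats, [math.log(v) for v in views])
    return {"log_c": log_c, "alpha": -slope, "sse": sse}


def fit_zipf_cutoff(ranks, views):
    """Fit log(views) = log_c - alpha * log(rank) - beta * rank."""
    feats = [[1.0, math.log(r), float(r)] for r in ranks]
    (log_c, slope, decay), sse = least_squares(feats, [math.log(v) for v in views])
    return {"log_c": log_c, "alpha": -slope, "beta": -decay, "sse": sse}


# Synthetic data with a known cutoff (NOT the Science video dataset).
ranks = list(range(1, 201))
views = [1e6 * r ** -0.8 * math.exp(-0.01 * r) for r in ranks]
print("Zipf fit:", fit_zipf(ranks, views))
print("Zipf + exp cutoff fit:", fit_zipf_cutoff(ranks, views))
```

A lower `sse` for the cutoff model on real data would be consistent with the slide's reading that a Zipf mechanism is being truncated, though a proper comparison would also cover the exponential and log-normal candidates.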


Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system [Chris Anderson, The Long Tail]

40% additional views!

How? Personalized recommendation

Enriched metadata

Abundant videos

[Figure: views vs. rankings for Entertainment videos]


Part2: Popularity Evolution

Relationship between popularity and age


Popularity Evolution

  • So far, we focused on static popularity

  • Now focus on popularity dynamics

  • How are requests on any given day distributed across video age?

  • 6-day daily trace of Science videos

    • Step 1: Group videos requested at least once by age

    • Step 2: Count request volume per age group
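
The two steps can be sketched as below for a single day of the trace. The input layout is an assumption (one dictionary of per-video request counts for the day and one of video ages in days), and the `bucket_days` parameter and function name are hypothetical.

```python
from collections import defaultdict


def request_volume_by_age(daily_requests, age_in_days, bucket_days=30):
    """Group requested videos by age bucket and total the requests per bucket.

    daily_requests: video ID -> number of requests on the day of interest
    age_in_days:    video ID -> days since upload on that day
    """
    videos_by_bucket = defaultdict(list)
    volume_by_bucket = defaultdict(int)
    for video_id, count in daily_requests.items():
        if count <= 0:
            continue  # Step 1 keeps only videos requested at least once
        bucket = age_in_days[video_id] // bucket_days
        videos_by_bucket[bucket].append(video_id)  # Step 1: group by age
        volume_by_bucket[bucket] += count          # Step 2: count volume
    return dict(videos_by_bucket), dict(volume_by_bucket)


# Made-up example values, not taken from the trace.
requests = {"a": 5, "b": 0, "c": 2, "d": 1}
ages = {"a": 3, "b": 10, "c": 45, "d": 50}
groups, volume = request_volume_by_age(requests, ages)
print(groups)
print(volume)
```

Repeating this for each of the six days gives one request-volume curve per day.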


Request Volume Across Age

1. Viewers are mildly more interested in new videos

2. User preference is relatively insensitive to age (80% of requests go to old videos)

3. Daily top hits mostly come from new videos

4. Some old videos get significant requests


Part3: P2P Scalable Distribution

Potential savings from P2P (against client-server model)

Optimistic upper bound


Peer-assisted VoD

  • 50-200 Gb/s estimated serving capacity

    • Bandwidth, hardware, power consumption

  • Stream from VoD servers or from peers

    • Varying user lifetime

[Figure: a video server and users A, B, and C streaming movie1 and movie2; P2P when possible]


Number of Beneficiary Videos

  • P2P is viable when at least 2 online users share a video

  • Very few videos benefit, but they benefit a lot

[Figure: estimated number of online users per video at any moment]
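
One simple way to estimate the number of concurrent viewers is Little's law: average concurrency equals arrival rate times viewing time. The sketch below uses that estimate, which is an assumption and may differ from how the talk derived its numbers; it also assumes requests arrive evenly over the day and that each viewer stays online for the full video length. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimated_online_users(daily_requests, length_seconds):
    """Average number of simultaneous viewers of one video (Little's law)."""
    arrival_rate = daily_requests / SECONDS_PER_DAY  # requests per second
    return arrival_rate * length_seconds


def p2p_viable(daily_requests, length_seconds, min_users=2):
    """True when at least `min_users` viewers are expected online at once."""
    return estimated_online_users(daily_requests, length_seconds) >= min_users


# Made-up example: a 3-minute video under two different daily request volumes.
for requests in (500, 5000):
    users = estimated_online_users(requests, 180)
    print(requests, round(users, 2), p2p_viable(requests, 180))
```

Because videos are short, only heavily requested videos reach two concurrent viewers under this estimate, which matches the slide's point that very few videos benefit.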


Server Workload Savings in P2P

  • Potential for significant savings due to skewed and temporal request patterns

[Figure: includes a curve labeled P2P-assisted]


Part4: Content Duplication

Level of duplication

Birth of duplicates


Content Duplication

  • Alias: identical or similar copies of the same content

  • Aliases dilute popularity of a single event

    • Views distributed across multiple copies

    • Difficulty in recommendation & ranking systems

  • Test with 51 volunteers

    • Find aliases using keyword search

    • Identified 1,224 aliases for 184 original videos
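
To see how aliases dilute popularity, views can be summed over an original and all of its aliases and compared with the original's own count. This is a minimal sketch with hypothetical names and made-up numbers; the talk does not specify how merged popularity was computed.

```python
def merged_popularity(view_counts, alias_groups):
    """Total views per event: the original plus all of its aliases.

    view_counts:  video ID -> views
    alias_groups: original video ID -> list of alias video IDs
    """
    merged = {}
    for original, aliases in alias_groups.items():
        merged[original] = view_counts[original] + sum(view_counts[a] for a in aliases)
    return merged


# Made-up example: one original video with two aliases.
view_counts = {"orig": 1_000, "copy1": 4_000, "copy2": 500}
alias_groups = {"orig": ["copy1", "copy2"]}
totals = merged_popularity(view_counts, alias_groups)
print(totals)
print("share kept by the original:", view_counts["orig"] / totals["orig"])
```

A ranking built on `totals` rather than on individual view counts would treat the copies as one event.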


The Level of Popularity Dilution

  • Popularity is diluted by up to two orders of magnitude


How Late Do Aliases Appear?

  • A significant number of aliases appear within one week


Contribution

  • The first detailed study on UGC video popularity

    • Power-law waist

    • Truncation at popular/non-popular videos

  • Analyzed popularity dynamics using daily trace

    • Relationship between popularity and age

  • Explored potential for P2P distribution

  • Showed difficulty in video ranking due to aliases


Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html

Meeyoung Cha, meeyoung.cha@gmail.com

Questions?

