I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System

ACM IMC 2007-10-24. Meeyoung Cha (Intern at Telefonica Research / KAIST).

    Presentation Transcript
    1. ACM IMC 2007-10-24 • I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System • Meeyoung Cha (Intern at Telefonica Research / KAIST)

    2. Why the study of “bite-size bits for high-speed munching” [Wired mag., Mar 2007] • Plethora of YouTube clones • UGC is very different. How different?

    3. UGC vs. Non-UGC • Massive production scale: 15 days for YouTube to produce the equivalent of 120 years' worth of movies in IMDb • Extreme publishers: 1,000 uploads over a few years vs. 100 movies over 50 years • Short video length: 30 sec–5 min vs. 100-min movies in LoveFilm • The rest: consumption patterns

    4. Goals and Data • Goals: popularity distribution, popularity evolution, P2P scalable distribution, content duplication • Data: crawled YouTube and other UGC systems; metadata: video ID, length, views; 1.6M Entertainment and 250K Science videos

    5. Part 1: Popularity Distribution • Static popularity characteristics • Underlying mechanism

    6. Pareto Principle • The 10% most popular videos account for 80% of total views • Other online VoD systems show smaller skew [Figure: fraction of aggregate views vs. normalized video ranking]
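
The skew stated on this slide can be checked on any list of per-video view counts. The sketch below is illustrative only: the function name `top_share` and the toy numbers are assumptions, not part of the crawled dataset.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of aggregate views captured by the most-viewed videos.

    `views` is a list of per-video view counts; `top_fraction` is the share
    of videos (by rank) to include, e.g. 0.10 for the top 10%.
    """
    if not views:
        raise ValueError("views must be non-empty")
    total = sum(views)
    if total == 0:
        raise ValueError("total views must be positive")
    ranked = sorted(views, reverse=True)
    top_count = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_count]) / total


# Toy data (hypothetical): ten videos, one of them a heavy hitter.
toy_views = [800, 40, 30, 30, 25, 25, 20, 15, 10, 5]
toy_share = top_share(toy_views)  # share of views held by the top 10% (one video)
```

Sweeping `top_fraction` from 0 to 1 gives the curve of aggregate views against normalized rank that the slide's figure plots.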

    7. Dominant Power-Law Behavior • Richer-get-richer principle: if a video has K views, then users will watch the video at a rate proportional to K • Other examples: word frequency, citations of papers, scale of earthquakes, web hits [Figure: y = x^a; frequency (log) vs. city population (log)]
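
One way to see the richer-get-richer principle at work is a small simulation. The sketch below uses a Simon-style process, which is an assumption chosen for illustration and not necessarily the model behind the slide: at each step a new video arrives with probability `p_new`, otherwise an existing video gains a view with probability proportional to its current views. The function name and parameter values are hypothetical.

```python
import random


def simulate_rich_get_richer(steps, p_new=0.1, seed=0):
    """Return per-video view counts after `steps` views in total (steps >= 1)."""
    rng = random.Random(seed)
    views = [1]    # video 0 starts with a single view
    tickets = [0]  # one entry per view, so a uniform pick is proportional to views
    for _ in range(steps - 1):
        if rng.random() < p_new:
            # A new video is uploaded and receives its first view.
            views.append(1)
            tickets.append(len(views) - 1)
        else:
            # An existing video is watched with probability proportional to its views.
            video = rng.choice(tickets)
            views[video] += 1
            tickets.append(video)
    return views


simulated = simulate_rich_get_richer(20000)
```

Plotting the sorted counts on log-log axes should show a roughly straight line, with a few early videos holding most of the views while the typical video has only one or two.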

    8. UGC Video Distribution • Straight-line waist, truncated at both ends

    9. Focusing on Popular Videos • Why do popular videos deviate from the power law? • Fetch-at-most-once [SOSP 2003] • Behavior of fetching immutable objects once, cf. visiting popular web sites many times

    10. Simulation on Various Parameters • Number of videos (V), users (U), avg. requests per user (R) • Fetch-at-most-once: the tail is more truncated for larger R and smaller V [Figure: complementary cumulative videos (log) vs. views (log); labels: power-law behavior, U=1000, V=100, R=10, R=20, R=50]
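
A minimal sketch of such a simulation follows. It assumes Zipf-distributed video popularity with exponent `alpha` and redraws any request for a video the user has already fetched; both choices, along with the function and parameter names, are illustrative assumptions and may differ from the setup used for the slide.

```python
import itertools
import random


def simulate_views(num_videos, num_users, requests_per_user,
                   fetch_at_most_once, alpha=1.0, seed=0):
    """Return per-video view counts, indexed by popularity rank (0 = most popular)."""
    if fetch_at_most_once and requests_per_user > num_videos:
        raise ValueError("a user cannot fetch more distinct videos than exist")
    rng = random.Random(seed)
    # Zipf weights: the video at rank r is requested with weight r^(-alpha).
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_videos + 1)]
    cum_weights = list(itertools.accumulate(weights))
    videos = list(range(num_videos))
    views = [0] * num_videos
    for _ in range(num_users):
        seen = set()
        served = 0
        while served < requests_per_user:
            video = rng.choices(videos, cum_weights=cum_weights)[0]
            if fetch_at_most_once and video in seen:
                continue  # already fetched by this user: redraw
            seen.add(video)
            views[video] += 1
            served += 1
    return views


# Parameters taken from the slide's figure labels: V=100, U=1000, R=10.
capped = simulate_views(100, 1000, 10, fetch_at_most_once=True)
uncapped = simulate_views(100, 1000, 10, fetch_at_most_once=False)
```

With fetch-at-most-once, no video can collect more than U views, so the most popular videos are cut off below the counts the unconstrained Zipf draw would give them.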

    11. Why the Unpopular Tail Falls Off • Natural shape is curved • Sampling bias or pre-filters: publishers tend to upload interesting videos • Information filtering or post-filters: search results or suggestions favor popular items

    12. Impact of Post-Filters • Videos exposed longer to the filtering effect appear more truncated [Figure: x-axis is video rank]

    13. Is It Naturally Curved? • Matlab curve fitting for Science videos [Figure: Science videos fitted with Zipf, Zipf + exponential cutoff, exponential, and log-normal curves]

    14. Is It Naturally Curved? • Matlab curve fitting for Science videos • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and the truncation is due to bottlenecks [Figure: Science videos fitted with Zipf, Zipf + exponential cutoff, exponential, and log-normal curves]
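
For reference, the four candidate curves can be written in their standard textbook forms. The slides do not give the parameterization used in the Matlab fits, so the symbols below (rank $r$, constant $C$, exponent $\alpha$, cutoff rate $\lambda$, log-normal parameters $\mu$ and $\sigma$) are assumptions.

```latex
\begin{align*}
\text{Zipf:} \quad & f(r) = C\, r^{-\alpha} \\
\text{Zipf with exponential cutoff:} \quad & f(r) = C\, r^{-\alpha} e^{-\lambda r} \\
\text{Exponential:} \quad & f(r) = C\, e^{-\lambda r} \\
\text{Log-normal:} \quad & f(r) = \frac{C}{r} \exp\!\left( -\frac{(\ln r - \mu)^2}{2\sigma^2} \right)
\end{align*}
```

The scale-free remark follows from the Zipf form: $f(kr) = k^{-\alpha} f(r)$, so rescaling $r$ only multiplies the curve by a constant, whereas the exponential term introduces a characteristic scale $1/\lambda$ beyond which the curve drops sharply.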

    15. Implication of Our Findings • “Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail] • Entertainment: 40% additional views! How? • Personalized recommendation • Enriched metadata • Abundant videos [Figure: views vs. rankings]

    16. Part 2: Popularity Evolution • Relationship between popularity and age

    17. Popularity Evolution • So far, we focused on static popularity • Now focus on popularity dynamics • How are requests on any given day distributed across video age? • 6-day daily trace of Science videos • Step 1: Group videos requested at least once by age • Step 2: Count request volume per age group
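
The two steps above can be sketched as follows. The input layout (one dictionary of requests per video for a given day, one dictionary of video ages in days) and all names are assumptions made for illustration; the trace format itself is not described in the slides.

```python
from collections import defaultdict


def request_volume_by_age(requests_today, age_days):
    """Group requested videos by age and count request volume per age group.

    requests_today: {video_id: number of requests on the day being analyzed}
    age_days:       {video_id: age of the video in days on that day}
    Returns (videos_per_age, requests_per_age), both keyed by age in days.
    """
    videos_per_age = defaultdict(int)
    requests_per_age = defaultdict(int)
    for video_id, count in requests_today.items():
        if count < 1:
            continue  # Step 1: keep only videos requested at least once
        age = age_days[video_id]
        videos_per_age[age] += 1
        requests_per_age[age] += count  # Step 2: request volume per age group
    return dict(videos_per_age), dict(requests_per_age)


# Toy data (hypothetical): v2 has no requests on this day and is skipped.
requests_today = {"v1": 5, "v2": 0, "v3": 2, "v4": 7}
age_days = {"v1": 1, "v2": 1, "v3": 30, "v4": 30}
videos_per_age, requests_per_age = request_volume_by_age(requests_today, age_days)
```

Dividing `requests_per_age` by `videos_per_age` gives the average requests per video in each age group, which helps separate a group that is popular from one that is merely large.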

    18. Request Volume Across Age • Finding 1: Viewers are mildly more interested in new videos

    19. Request Volume Across Age • Finding 2: User preference is relatively insensitive to age (80% of requests go to old videos)

    20. Request Volume Across Age • Finding 3: Daily top hits mostly come from new videos

    21. Request Volume Across Age • Finding 4: Some old videos get significant requests

    22. Part 3: P2P Scalable Distribution • Potential savings from P2P (against the client-server model) • Optimistic upper bound

    23. Peer-assisted VoD • 50-200 Gb/s estimated serving capacity • Bandwidth, hardware, power consumption • Stream from VoD servers or from peers • Varying user lifetime • P2P when possible [Figure: video server, users A, B, and C, and streams movie1 and movie2]

    24. Number of Beneficiary Videos • P2P is viable when at least 2 online users share a video • Very few videos benefit, but they benefit a lot [Figure: estimated number of online users per video at any moment]
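
The slides do not say how the number of online users per video was estimated. As a back-of-the-envelope sketch, one can assume Little's law (average concurrent viewers = request rate × time each user stays online for the video); this assumption, the function names, and the example numbers are all hypothetical.

```python
SECONDS_PER_DAY = 86400


def expected_online_users(requests_per_day, session_seconds):
    """Average number of users concurrently online for one video (Little's law)."""
    return requests_per_day / SECONDS_PER_DAY * session_seconds


def p2p_viable(requests_per_day, session_seconds, min_peers=2):
    """True when, on average, at least `min_peers` users share the video at any moment."""
    return expected_online_users(requests_per_day, session_seconds) >= min_peers


# Hypothetical examples with a 2-minute session per request.
popular_ok = p2p_viable(requests_per_day=86400, session_seconds=120)
niche_ok = p2p_viable(requests_per_day=1000, session_seconds=120)
```

Under this estimate a video needs well over a thousand requests per day at a 2-minute session length before two viewers overlap on average, which is consistent with only a small set of videos benefiting.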

    25. Server Workload Savings in P2P • Potential for significant savings due to skewed and temporal request patterns [Figure label: P2P-assisted]
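
A rough way to estimate such savings from a session trace is to count how many sessions start while nobody else is watching the same video, since only those must come from the server. The sketch below makes optimistic assumptions that are not taken from the slides: any concurrent viewer can serve the whole video, peers have unlimited upload capacity, and workload is counted in sessions, not bytes. All names are hypothetical.

```python
def server_share(sessions):
    """Fraction of sessions that the server must serve itself.

    sessions: iterable of (video_id, start_time, end_time) tuples.
    A session is peer-served if another session for the same video is still
    active when it starts; otherwise it falls back to the server.
    """
    sessions = list(sessions)
    if not sessions:
        raise ValueError("sessions must be non-empty")
    latest_end = {}  # video_id -> latest end time among sessions seen so far
    server_sessions = 0
    for video_id, start, end in sorted(sessions, key=lambda s: s[1]):
        active_until = latest_end.get(video_id, float("-inf"))
        if active_until <= start:
            server_sessions += 1  # nobody is watching this video right now
        latest_end[video_id] = max(active_until, end)
    return server_sessions / len(sessions)


# Toy trace (hypothetical): video "a" has overlapping viewers, "b" has one.
toy_sessions = [("a", 0, 10), ("a", 5, 15), ("a", 12, 20), ("a", 30, 40), ("b", 0, 10)]
toy_server_share = server_share(toy_sessions)
```

The savings are then `1 - server_share(...)`. Skewed popularity and requests clustered in time both increase overlap, which is why those two properties drive the potential savings.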

    26. Part 4: Content Duplication • Level of duplication • Birth of duplicates

    27. Content Duplication • Aliases: identical or similar copies of the same content • Aliases dilute the popularity of a single event • Views are distributed across multiple copies • Difficulty in recommendation & ranking systems • Test with 51 volunteers • Find aliases using keyword search • Identified 1,224 aliases for 184 original videos
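
Once aliases are identified, the diluted views can be folded back into the original video before ranking. The sketch below assumes a hand-built mapping from alias ID to original ID, such as one produced by the volunteer keyword search; the data layout, names, and numbers are hypothetical.

```python
from collections import defaultdict


def aggregate_alias_views(views, alias_of):
    """Sum views over each original video and all of its aliases.

    views:    {video_id: view count}
    alias_of: {alias_video_id: original_video_id}; videos absent from this
              mapping are treated as originals.
    """
    totals = defaultdict(int)
    for video_id, count in views.items():
        totals[alias_of.get(video_id, video_id)] += count
    return dict(totals)


# Toy data (hypothetical): two copies outdraw the original they duplicate.
toy_views = {"orig": 100, "copy1": 400, "copy2": 500, "other": 50}
toy_alias_of = {"copy1": "orig", "copy2": "orig"}
merged = aggregate_alias_views(toy_views, toy_alias_of)
```

In the toy data the original alone shows 100 views while the event as a whole drew 1,000, so a ranking based on single-video counts would understate it tenfold.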

    28. The Level of Popularity Dilution • Popularity is diluted by up to two orders of magnitude

    29. How Late Do Aliases Appear? • A significant share of aliases appears within one week

    30. Contribution • The first detailed study on UGC video popularity • Power-law waist • Truncation at popular/non-popular videos • Analyzed popularity dynamics using daily trace • Relationship between popularity and age • Explored potential for P2P distribution • Showed difficulty in video ranking due to aliases

    31. Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html Meeyoung Cha meeyoung.cha@gmail.com Questions?