An overview of Hulu’s metrics platform

    Presentation Transcript
    1. An overview of Hulu’s metrics platform Tristan Reid tristan.reid@hulu.com Prasan Samtani prasan.samtani@hulu.com

    2. What we do • Streaming video service • > 5.5 million subscribers • > 20 million unique visitors/month • > 1 billion ads/month

    3. It all begins with beacons • Living room device (Roku, Xbox, etc.) • Mobile device (Android, iPhone, etc.) • Web (hulu.com) • Beacon collection service

    4. What’s in a beacon: 80 2013-04-01 00:00:00 /v3/playback/start?bitrate=650&cdn=Akamai&channel=Anime&client=Explorer&computerguid=EA8FA1000232B8F6986C3E0BE55E9333&contentid=5003673…
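As a rough illustration of what a consumer of such a line might do, here is a minimal Python sketch that splits a beacon into a timestamp, a beacon type, and its parameters. The field layout (a leading field, a date, a time, then the request) is inferred from the single sample on the slide, the meaning of the leading `80` is not stated there, and `parse_beacon` is a hypothetical helper rather than anything from Hulu's code. The sample is a shortened version of the one on the slide.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qsl


def parse_beacon(line):
    """Split one beacon log line into leading field, timestamp, type, and parameters."""
    # Assumed layout: <field> <date> <time> <path>?<query>
    first, date, time, request = line.split(maxsplit=3)
    # Tolerate spaces inside the query string (for example after line wrapping).
    request = "".join(request.split())
    url = urlsplit(request)
    # "/v3/playback/start" -> version "v3", beacon type "playback/start"
    version, beacon_type = url.path.strip("/").split("/", 1)
    return {
        "first_field": first,  # meaning not stated on the slide
        "timestamp": datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"),
        "version": version,
        "beacon_type": beacon_type,
        "params": dict(parse_qsl(url.query)),
    }


sample = ("80 2013-04-01 00:00:00 /v3/playback/start?"
          "bitrate=650&cdn=Akamai&channel=Anime&contentid=5003673")
beacon = parse_beacon(sample)
print(beacon["beacon_type"], beacon["params"]["cdn"])
```

All parameter values come back as strings; converting `bitrate` or `contentid` to integers would be a separate, schema-aware step.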

    5. Reporting platform (RP2) Find Metrics & Dimensions Design and execute reports

    6. The pipeline • Devices • Beacon collection service • LogCollector/Flume • HDFS • Monitoring (metstat) • MapReduce jobs/JobScheduler • Developers • Hive • Reporting (RP2) • Harpy – continuous aggregation • RDBMS • Business

    7. Log Collection • Devices … • Log Collection machine #1 • Log Collection machine #11 • Load balancer • HDFS • Files bucketed by beacon type and partitioned by hour
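The bucketing rule (one bucket per beacon type, one partition per hour) can be sketched as a function from a beacon to a directory. The actual directory hierarchy was shown as an image on the next slide and is not recoverable here, so the `/beacons` root and the `type/YYYY/MM/DD/HH` layout below are assumptions made only for illustration.

```python
from datetime import datetime


def bucket_path(beacon_type, timestamp, root="/beacons"):
    """Return a directory for one beacon type and one hour (hypothetical layout)."""
    # Flatten "playback/start" into a single path component.
    safe_type = beacon_type.strip("/").replace("/", "_")
    # Everything below the hour is discarded, so all beacons of one type
    # received within the same hour land in the same directory.
    return f"{root}/{safe_type}/{timestamp.strftime('%Y/%m/%d/%H')}"


path = bucket_path("playback/start", datetime(2013, 4, 1, 0, 15, 42))
print(path)
```

Partitioning by hour means a downstream job can process one closed hour at a time without scanning data that is still being written.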

    8. Directory hierarchy on HDFS

    9. MapReduce - going from beacons to basefacts

    10. If a program manipulates a large amount of data, it does so in a small number of ways - Alan Perlis

    11. The BeaconSpec compiler Java MapReduce code that can run on the cluster Definitions of beacons and base-facts Beaconspec compiler

    12. What does our language look like?
        basefact playback_watched_uniques from playback/(position|end) {
            dimension harpyhour.id as hourid;
            dimension computerguid as computerguid;
            dimension userid as userid;
            required dimension video.id as video_id;
            required dimension contentPartner.id as content_partner_id;
            …
            dimension siteSessionId.chosen as site_session_id;
            dimension facebook.isfacebookconnected as is_facebook_connected;
            fact sum(watched.out) as watched;
        }
        FAQ: Why didn’t we just use Pig?
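To make the basefact definition more concrete, the following Python sketch shows the kind of computation such a declaration plausibly describes: group records by their dimensions and sum the fact. It is not the code the BeaconSpec compiler generates (the slides say that is Java MapReduce). The interpretation of `required` as "drop the record when the dimension is missing" is an assumption, and only a subset of the dimensions is used.

```python
from collections import defaultdict

# Dimension names taken from the slide; the split into two tuples is ours.
REQUIRED = ("video_id", "content_partner_id")
OPTIONAL = ("hourid", "computerguid", "userid")


def aggregate(records):
    """Group records by all dimensions and sum the 'watched' fact."""
    totals = defaultdict(int)
    for rec in records:
        # Assumed semantics of "required": skip records missing the dimension.
        if any(rec.get(dim) is None for dim in REQUIRED):
            continue
        key = tuple(rec.get(dim) for dim in OPTIONAL + REQUIRED)
        totals[key] += rec.get("watched", 0)
    return dict(totals)


records = [
    {"hourid": 1, "computerguid": "A", "userid": 7,
     "video_id": 10, "content_partner_id": 3, "watched": 30},
    {"hourid": 1, "computerguid": "A", "userid": 7,
     "video_id": 10, "content_partner_id": 3, "watched": 15},
    {"hourid": 1, "computerguid": "B", "userid": 8,
     "video_id": None, "content_partner_id": 3, "watched": 99},
]
result = aggregate(records)
print(result)
```

In a MapReduce setting the key tuple would be emitted by the mapper and the summation performed by the reducer.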

    13. The superior [program] cultivates itself so as to give rest to [programmers] - Confucius, the Way of the Superior Man

    14. Scheduling jobs Outside world MapReduce job MapReduce job MapReduce job JobMonitor JobMonitor JobMonitor JobScheduler Interface JobScheduler Logmanager databases Checks databases for jobs that are ready to run and whether dependencies are met
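The readiness check described above can be sketched as follows. This is a simplified stand-in: the slide says the scheduler consults databases, whereas this example uses an in-memory dictionary, and the job names are hypothetical.

```python
def ready_jobs(jobs, completed):
    """Return jobs that have not run yet and whose dependencies are all complete.

    jobs: dict mapping job name -> set of job names it depends on
    completed: set of job names that have already finished
    """
    return sorted(
        name for name, deps in jobs.items()
        if name not in completed and deps <= completed
    )


# Hypothetical dependency graph, for illustration only.
jobs = {
    "playback_basefacts": set(),
    "watched_uniques": {"playback_basefacts"},
    "daily_rollup": {"watched_uniques"},
}

first_wave = ready_jobs(jobs, completed=set())
second_wave = ready_jobs(jobs, completed={"playback_basefacts"})
print(first_wave, second_wave)
```

Calling the check repeatedly as jobs finish walks the dependency graph one wave at a time.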

    15. JobScheduler technology • The actor model of concurrency • Communication through async messaging • Completely encapsulated state

    16. • Message passing • Actor creation • Central idea: Treat local objects as if they are distributed, as opposed to treating distributed objects as if they are local
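The actor properties listed on these slides (asynchronous messages, fully encapsulated state) can be illustrated with a small Python sketch. The slides do not name the actor library behind JobScheduler, so this uses only the standard library's threads and queues, and `CounterActor` is a made-up example.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: one mailbox, one thread, state touched only by that thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # encapsulated state, never read directly from outside
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Messages are handled one at a time, in arrival order.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)

    def join(self):
        self._thread.join()


actor = CounterActor()
for _ in range(3):
    actor.tell("increment")
reply = queue.Queue()
actor.tell("get", reply_to=reply)  # the answer also arrives as a message
count = reply.get(timeout=5)
actor.tell("stop")
actor.join()
print(count)
```

Because the caller can only send messages and wait for replies, the same code shape would work if the actor lived in another process or on another machine, which is the point of treating local objects as if they were distributed.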

    17. Fault-tolerance – let it crash!
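"Let it crash" generally means that workers do not try to recover from unexpected errors themselves; a supervisor notices the failure and starts a fresh worker. The sketch below shows that idea in plain Python. The `supervise` function, the restart limit, and the choice to drop the failing message are all assumptions for illustration, not a description of how JobScheduler handles failures.

```python
def supervise(make_worker, messages, max_restarts=3):
    """Feed messages to a worker, replacing it with a fresh one whenever it crashes."""
    worker = make_worker()
    restarts = 0
    handled = []
    for msg in messages:
        try:
            handled.append(worker(msg))
        except Exception:
            if restarts >= max_restarts:
                raise  # too many failures: escalate instead of looping forever
            restarts += 1
            worker = make_worker()  # fresh state; the failing message is dropped
    return handled, restarts


def make_doubler():
    # The worker contains no defensive error handling of its own.
    return lambda msg: int(msg) * 2


handled, restarts = supervise(make_doubler, ["1", "oops", "3"])
print(handled, restarts)
```

Keeping recovery logic in the supervisor leaves the worker code short and focused on the normal case.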

    18. Harpy – continuous aggregations Harpy Metadata Queue Processor Hive DataSync Publishing HDFS NFS Holding Sweeper Agg Scheduler HoldingDB Output DBs

    19. RP2 • Reporting Portal for pulling Metrics + Dimensions • Quick ‘Demo’

    20. Let’s reexamine the pipeline: • Devices • Beacon collection service • LogCollector/Flume • HDFS • Monitoring (metstat) • MapReduce jobs/JobScheduler • Developers • Hive • Reporting (RP2) • Harpy – continuous aggregation • RDBMS • Business

    21. Metstat • Python Django app • Tasks on Celery + RabbitMQ • jQuery • Tracks status, status changes and statistics • Gets data directly from various sources (databases, HDFS)

    22. FAQ: Why didn’t we just use Pig? • Dataflow language – runs on Hadoop • Pig philosophy • (Taken from the Apache website) • Pigs eat anything • Pigs live anywhere • Pigs are domestic animals • Pigs fly Beaconspec

    23. REGISTER ./tutorial.jar;
        raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
        clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
        clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(query) as query;
        Beware of the Turing tar-pit where everything is possible but nothing of interest is easy - Alan Perlis
        Beaconspec

    24. FAQ: What is open sourced? • Slickint – database interface generation for Scala • github.com/zenbowman/slickint • Local filesystem caching for Hadoop • github.com/ZenBowman/luna