One Billion Rows Per Second: Analytics for the Digital Media Markets

One Billion Rows Per Second: Analytics for the Digital Media Markets. XLDB, October 19, 2011. Michael Driscoll, Co-Founder & CTO (@medriscoll). Taming the Inferno of the Online Ad Markets: billions of microtransactions per day; dozens of publisher, advertiser, & audience attributes.

Presentation Transcript


  1. One Billion Rows Per Second: Analytics for the Digital Media Markets XLDB October 19, 2011 MICHAEL DRISCOLL CO-FOUNDER & CTO @medriscoll

  2. Taming the Inferno of the Online Ad Markets • billions of microtransactions per day • dozens of publisher, advertiser, & audience attributes

  3. Goal: Fast Analytics Over 100s of Terabytes

  4. Goal: Fast Analytics Over 100s of Terabytes • dashboard • queries in seconds • database • data crunched in minutes • ingestion

  5. Solution 1: MPP Database • dashboard • queries in minutes • database: MPP Database • data crunched in minutes • ingestion: Hadoop

  6. Solution 2: HBase • dashboard • queries in seconds • database: HBase • data crunched in hours • ingestion: Hadoop

  7. Solution 3: Do It Ourselves: Druid • dashboard • queries in seconds • database: Druid • data crunched in minutes • ingestion: Hadoop

  8. Four Principles of Druid’s Performance at Scale • SUMMARIZE: 100x smaller vs raw data • DISTRIBUTE, PARALLELIZE: 100x throughput vs a single node (with 100 cores) • STORE IN-MEMORY: 100x faster vs disk • = 10^6 factor speed-up • Druid can filter and aggregate over 1 billion rows per second on a 50-core cluster, or 20 million rows per core per second

  9. Consequences of Druid: Faster Queries photo credit: tonylanciabeta http://www.flickr.com/photos/tonysphotos/3305157904/sizes/o/in/photostream/

  10. Consequences of Druid: Fresher Data photo credit: Lars P. http://www.flickr.com/photos/lars_p/4911238308/sizes/o/in/photostream/

  11. Consequences of Druid: Scalable in the Cloud photo credit: MonkeyAt Large http://www.flickr.com/photos/monkeyatlarge/16645379/sizes/l/in/photostream/

  12. One Billion Rows Per Second: Analytics for the Digital Media Markets QUESTIONS? CONTACT ME AT MIKE@METAMARKETSGROUP.COM MICHAEL DRISCOLL CO-FOUNDER & CTO @medriscoll
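
The summarize and filter-and-aggregate ideas on slide 8 can be illustrated with a small sketch. This is not Druid's implementation: the event layout, the field names (publisher, advertiser, impressions, revenue), the 60-second bucket, and the function names are all assumptions made for illustration. The arithmetic at the end reads the slide's 10^6 figure as the product of three 100x factors, which is an interpretation of the slide, and checks the stated per-core rate (1 billion rows per second over 50 cores).

```python
from collections import defaultdict

# Hypothetical raw ad events (not from the talk):
# (timestamp_seconds, publisher, advertiser, impressions, revenue)
RAW_EVENTS = [
    (0, "pub_a", "adv_x", 1, 0.10),
    (12, "pub_a", "adv_x", 1, 0.20),
    (45, "pub_a", "adv_y", 1, 0.05),
    (61, "pub_b", "adv_x", 1, 0.30),
    (75, "pub_b", "adv_x", 1, 0.10),
    (119, "pub_a", "adv_x", 1, 0.40),
]


def summarize(events, bucket_seconds=60):
    """Roll up raw events into one row per (time bucket, publisher, advertiser)."""
    rollup = defaultdict(lambda: [0, 0.0])
    for ts, publisher, advertiser, impressions, revenue in events:
        bucket_start = ts // bucket_seconds * bucket_seconds
        key = (bucket_start, publisher, advertiser)
        rollup[key][0] += impressions
        rollup[key][1] += revenue
    return {key: tuple(totals) for key, totals in rollup.items()}


def filter_and_aggregate(rollup, publisher):
    """Sum impressions and revenue over summarized rows for one publisher."""
    impressions = 0
    revenue = 0.0
    for (_, pub, _), (imps, rev) in rollup.items():
        if pub == publisher:
            impressions += imps
            revenue += rev
    return impressions, revenue


SUMMARY = summarize(RAW_EVENTS)
PUB_A_TOTALS = filter_and_aggregate(SUMMARY, "pub_a")

# Arithmetic from slide 8 (assumed reading: three independent 100x factors).
COMBINED_SPEEDUP = 100 * 100 * 100
ROWS_PER_CORE_PER_SECOND = 1_000_000_000 // 50

if __name__ == "__main__":
    print(f"{len(RAW_EVENTS)} raw rows summarized into {len(SUMMARY)} rows")
    print(f"pub_a impressions={PUB_A_TOTALS[0]}, revenue={PUB_A_TOTALS[1]:.2f}")
    print(f"combined speed-up factor: {COMBINED_SPEEDUP:,}")
    print(f"rows per core per second: {ROWS_PER_CORE_PER_SECOND:,}")
```

In this toy data the roll-up only shrinks six rows to four; the 100x reduction on the slide would depend on how many raw events share the same time bucket and attribute values in real traffic.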
