
Data Freeway : Scaling Out to Realtime





Presentation Transcript


  1. Data Freeway : Scaling Out to Realtime • Author: Eric Hwang, Sam Rash {ehwang,rash}@fb.com • Speaker : Haiping Wang ctqlwhp1022@gamil.com

  2. Agenda • Data at Facebook • Realtime Requirements • Data Freeway System Overview • Realtime Components • Calligraphus/Scribe • HDFS use case and modifications • Calligraphus: a Zookeeper use case • ptail • Puma • Future Work

  3. Big Data, Big Applications / Data at Facebook • Lots of data • More than 500 million active users • 50 million users update their statuses at least once each day • More than 1 billion photos uploaded each month • More than 1 billion pieces of content (web links, news stories, blog posts, notes, photos, etc.) shared each week • Data rate: over 7 GB / second • Numerous products can leverage the data • Revenue related: Ads Targeting • Product/User Growth related: AYML, PYMK, etc • Engineering/Operation related: Automatic Debugging • Puma: streaming queries

  4. Example: User related Application • Major challenges: Scalability , Latency

  5. Realtime Requirements • Scalability: 10-15 GBytes/second • Reliability: no single point of failure • Data loss SLA: 0.01% • Loss due to hardware failure: at most 1 out of 10,000 machines may lose data • Delay of less than 10 sec for 99% of data • Typically we see 2 s • Easy to use: as simple as 'tail -f /var/log/my-log-file'

  6. Data Freeway System Diagram • Scribe & Calligraphus get data into the system • HDFS at the core • ptail provides data out • Puma is an emerging streaming analytics platform

  7. Scribe • Scalable distributed logging framework • Very easy to use: • scribe_log(string category, string message) • Mechanics: • Built on top of Thrift • Runs on every machine at Facebook, collecting log data into a set of destinations • Buffers data on local disk if the network is down • History: • 2007: Started at Facebook • 2008 Oct: Open-sourced
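The one-line API and the buffering behavior above can be sketched end to end. This is a hypothetical illustration, not Scribe's real implementation: `ScribeishClient`, its method names, and the in-memory stand-ins for the remote collector and the local disk buffer are all invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the Scribe pattern: log(category, message) forwards to
// a remote collector while the network is up, and buffers locally (here an
// in-memory deque standing in for the on-disk buffer) while it is down.
public class ScribeishClient {
    private final List<String> remoteSink = new ArrayList<>();    // stand-in for the collector
    private final Deque<String> localBuffer = new ArrayDeque<>(); // stand-in for the disk buffer
    private boolean networkUp = true;

    public void setNetworkUp(boolean up) {
        networkUp = up;
        if (up) flushBuffer(); // drain buffered entries once connectivity returns
    }

    public void log(String category, String message) {
        String entry = category + "\t" + message;
        if (networkUp) {
            remoteSink.add(entry);
        } else {
            localBuffer.addLast(entry); // buffer locally if the network is down
        }
    }

    private void flushBuffer() {
        while (!localBuffer.isEmpty()) remoteSink.add(localBuffer.pollFirst());
    }

    public List<String> delivered() { return remoteSink; }
}
```

The key property being illustrated: a network outage delays delivery but loses nothing, because entries drain from the buffer in order once the network comes back.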

  8. Calligraphus • What • Scribe-compatible server written in Java • Emphasis on modular, testable code-base, and performance • Why? • Extract simpler design from existing Scribe architecture • Cleaner integration with Hadoop ecosystem • HDFS, Zookeeper, HBase, Hive • History • In production since November 2010 • Zookeeper integration since March 2011

  9. HDFS : a different use case • Message hub • Add concurrent reader support and sync • Writers + concurrent readers form a pub/sub model

  10. HDFS : add Sync • Sync • Implemented in 0.20 (HDFS-200) • Partial chunks are flushed • Blocks are persisted • Provides durability • Lowers write-to-read latency
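A toy model may help make the visibility guarantee concrete. Everything here is a simulation invented for illustration (`FlushedFile`, `durableLength`); the real mechanism is the sync() call added in HDFS-200, which flushes partial chunks so their contents become durable and visible to readers.

```java
// Toy model of the sync semantics above: bytes written to an open file only
// become visible to concurrent readers once flush() persists them. This is a
// simulation for illustration, not the HDFS API.
public class FlushedFile {
    private final StringBuilder data = new StringBuilder();
    private int durableLength = 0; // what a concurrent reader may see

    public void write(String bytes) { data.append(bytes); }

    // Analogous to sync(): the partial chunk is flushed and its length is durable.
    public void flush() { durableLength = data.length(); }

    public String readVisible() { return data.substring(0, durableLength); }
}
```

The point of the model: without a flush, a reader sees nothing new no matter how much has been written, which is why sync is what lowers write-to-read latency.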

  11. HDFS : Concurrent Reads Overview • Stock Hadoop 0.20 does not allow access to the block being written • Realtime apps need to read the block being written in order to achieve < 10 s latency

  12. HDFS : Concurrent Reads Implementation • DFSClient asks the Namenode for blocks and locations • DFSClient asks the Datanode for the length of the block being written • Opens the last block
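The polling loop implied by these steps can be sketched as follows. `BlockTailer` and `poll` are hypothetical names invented for the example, and the visible bytes of the last block are passed in as a plain string rather than fetched from a Datanode.

```java
// Sketch of the concurrent-read flow: on each poll the client learns the
// current visible length of the block being written and reads only the bytes
// that appeared since its last poll. Illustrative, not the DFSClient API.
public class BlockTailer {
    private int position = 0; // how far this reader has consumed

    // One poll: given the block's currently visible contents (as reported by
    // the datanode in the real system), return the newly readable bytes.
    public String poll(String visibleBytes) {
        String delta = visibleBytes.substring(position);
        position = visibleBytes.length();
        return delta;
    }
}
```

Repeated polls against a growing block yield only the deltas, which is what lets a tailer deliver data well under the 10 s latency target.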

  13. Calligraphus: Log Writer • How to persist to HDFS? [Diagram: Scribe categories (Category 1-3) feed Calligraphus servers, which write to HDFS; the category-to-server mapping is marked with a '?']

  14. Calligraphus (Simple) • Total number of directories = number of categories x number of servers [Diagram: every Scribe category is written by every Calligraphus server to HDFS]

  15. Calligraphus (Stream Consolidation) • Total number of directories = number of categories [Diagram: Scribe categories pass through router and writer tiers in the Calligraphus servers, coordinated by ZooKeeper, before landing in HDFS]
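The directory counts on these two slides reduce to simple arithmetic; the helper names below are invented for the example.

```java
// Arithmetic behind the two Calligraphus layouts described above.
public class DirectoryMath {
    // Simple scheme: every server writes its own directory per category.
    static int simpleScheme(int categories, int servers) { return categories * servers; }

    // Stream consolidation: routers send each category to a single writer,
    // so the server count drops out of the formula.
    static int consolidated(int categories) { return categories; }
}
```

With, say, 100 categories and 8 servers, the simple scheme produces 800 directories while consolidation produces 100.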

  16. ZooKeeper: Distributed Map • Design • ZooKeeper paths as tasks (e.g. /root/<category>/<bucket>) • Canonical ZooKeeper leader elections under each bucket for bucket ownership • Independent load management – leaders can release tasks • Reader-side caches • Frequent sync with policy db [Diagram: tree with root, categories A-D, and buckets 1-5 under each category]
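The per-bucket leader election can be sketched with an in-memory stand-in for ZooKeeper's ephemeral sequential nodes. `BucketElection` and its methods are invented for illustration; a real deployment would use the ZooKeeper client API against paths like /root/&lt;category&gt;/&lt;bucket&gt;.

```java
import java.util.TreeMap;

// In-memory sketch of the canonical leader election used for bucket ownership:
// each server creates an ephemeral sequential node under the bucket's path, and
// the server holding the lowest sequence number owns the bucket.
public class BucketElection {
    private final TreeMap<Integer, String> candidates = new TreeMap<>();
    private int nextSeq = 0;

    // Analogous to create(path, EPHEMERAL_SEQUENTIAL): returns the sequence number.
    public int join(String server) {
        candidates.put(nextSeq, server);
        return nextSeq++;
    }

    // Analogous to session expiry, or a leader releasing the task for load management.
    public void leave(int seq) { candidates.remove(seq); }

    // Lowest surviving sequence number wins the bucket.
    public String leader() {
        return candidates.isEmpty() ? null : candidates.firstEntry().getValue();
    }
}
```

When a leader releases its task (or dies), ownership passes automatically to the next-lowest candidate, which is what makes independent load management possible.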

  17. Canonical Realtime ptail Application • Hides the fact that we have many HDFS instances: the user can specify a category and get a stream • Checkpointing [Diagram: ptail feeding Puma]
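The checkpointing idea can be sketched as follows. `CheckpointedTail` is a hypothetical consumer that commits an offset after each batch, so a restart resumes from the last checkpoint rather than the beginning of the stream; in the real system the checkpoint would be persisted, and the stream is backed by HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of ptail-style checkpointing: record how far into the stream we have
// read so a restarted consumer neither re-reads nor skips data.
public class CheckpointedTail {
    private final List<String> stream; // stand-in for the HDFS-backed category stream
    private int checkpoint = 0;        // committed offset

    public CheckpointedTail(List<String> stream) { this.stream = stream; }

    // Read everything past the checkpoint, then advance (commit) it.
    public List<String> readAndCommit() {
        List<String> batch = new ArrayList<>(stream.subList(checkpoint, stream.size()));
        checkpoint = stream.size();
        return batch;
    }

    public int checkpoint() { return checkpoint; }
}
```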

  18. Puma Overview • Realtime analytics platform • Metrics • count, sum, unique count, average, percentile • Uses ptail checkpointing for accurate calculations in the case of failure • Puma nodes are sharded by keys in the input stream • HBase for persistence
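A minimal sketch of three of the per-key metrics listed above (count, sum, unique count), assuming a single in-memory node; the real system shards this state by key across Puma nodes and persists it to HBase. All class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative per-key aggregation in the style of the Puma metrics above.
public class PumaMetrics {
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<String, Long> sums = new HashMap<>();
    private final Map<String, Set<String>> uniques = new HashMap<>();

    // Process one event from the input stream for the given key.
    public void observe(String key, long value, String userId) {
        counts.merge(key, 1L, Long::sum);              // count
        sums.merge(key, value, Long::sum);             // sum
        uniques.computeIfAbsent(key, k -> new HashSet<>()).add(userId); // unique count
    }

    public long count(String key) { return counts.getOrDefault(key, 0L); }
    public long sum(String key) { return sums.getOrDefault(key, 0L); }
    public long uniqueCount(String key) { return uniques.getOrDefault(key, Set.of()).size(); }
}
```

Because the input stream is sharded by key, each node owns its keys outright and can aggregate without cross-node coordination.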

  19. Puma Write Path

  20. Puma Read Path • Performance • Elapsed time typically 200-300 ms for 30 day queries • 99th percentile, cross-country, < 500ms for 30 day queries

  21. Future Work • Puma • Enhance functionality: add application-level transactions on HBase • Streaming SQL interface • Compression
