CORONA


Presentation Transcript


  1. CORONA - Nagarjuna K, nagarjuna@outlook.com

  2. What is happening at Facebook • About 1,000 people, both technical and non-technical, access the custom-built data infrastructure • More than 500 TB of data arrives per day • Workloads include ad-hoc queries (Hive), custom MapReduce jobs, and data pipelines

  3. What is happening at Facebook • Largest cluster holds more than 100 PB • More than 60,000 queries per day • The data warehouse today is about 2,500 times the size it used to be

  4. Limitations of Hadoop MR scheduling • Job Tracker responsibilities: managing cluster resources and scheduling all user jobs • Limitation: the Job Tracker cannot handle these dual responsibilities adequately (sketched below) • At peak load, cluster utilization dropped precipitously because of scheduling overhead
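
A rough sketch of the dual role described above (illustrative Java only; the class and method names are invented, this is not Hadoop source): one JobTracker-like object holds both the cluster-wide resource bookkeeping and the scheduling state of every user job behind a single lock, which is why scheduling overhead eats into utilization at peak load.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical monolithic tracker: both responsibilities live in one process.
public class MonolithicJobTracker {
    // Responsibility 1: manage cluster resources (node -> free slots).
    private final Map<String, Integer> freeSlots = new HashMap<>();
    // Responsibility 2: schedule every pending task of every user job.
    private final Deque<String> pendingTasksAllJobs = new ArrayDeque<>();

    public synchronized void submitJob(String jobId, int numTasks) {
        for (int i = 0; i < numTasks; i++) {
            pendingTasksAllJobs.add(jobId + "-task-" + i);   // all jobs funnel here
        }
    }

    public synchronized void heartbeat(String node, int slots) {
        // Resource bookkeeping and scheduling happen on the same lock,
        // so every node and every job contends on this one object.
        while (slots > 0 && !pendingTasksAllJobs.isEmpty()) {
            System.out.println("assign " + pendingTasksAllJobs.poll() + " -> " + node);
            slots--;
        }
        freeSlots.put(node, slots);
    }

    public static void main(String[] args) {
        MonolithicJobTracker jt = new MonolithicJobTracker();
        jt.submitJob("job42", 3);
        jt.heartbeat("node-1", 2);
    }
}
```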

  5. Limitations of Hadoop MR scheduling • Another problem: pull-based scheduling • Task trackers send a periodic heartbeat to the job tracker in order to receive tasks to run • Because assignment waits for the next heartbeat, smaller jobs waste time on pure scheduling delay (see the sketch below)
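
A minimal sketch of the pull model this slide criticizes (illustrative Java; the 3-second heartbeat interval and all names are assumptions, not Hadoop defaults): the small job's task sits in the queue until the task tracker's next periodic heartbeat pulls it, so even a trivial job pays roughly one heartbeat interval before its first task starts.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class HeartbeatPull {
    static final long HEARTBEAT_MS = 3000;   // assumed heartbeat interval
    static final ConcurrentLinkedQueue<String> pendingTasks = new ConcurrentLinkedQueue<>();

    public static void main(String[] args) throws InterruptedException {
        long submittedAt = System.currentTimeMillis();
        pendingTasks.add("tiny-job-task-0");  // a small job arrives now...

        // ...but the task tracker only pulls work on its next heartbeat.
        Thread.sleep(HEARTBEAT_MS);
        String task = pendingTasks.poll();    // heartbeat: report status, pull a task
        long waitedMs = System.currentTimeMillis() - submittedAt;
        System.out.println(task + " assigned after ~" + waitedMs + " ms of scheduling wait");
    }
}
```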

  6. Limitations of Hadoop MR scheduling • Another problem: static slot-based resource management • A MapReduce cluster is divided into a fixed number of map and reduce slots based on a static configuration • Slots are wasted whenever the cluster workload does not fit the static configuration (see the sketch below)
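
A toy illustration of that waste (illustrative Java; the slot counts and task counts are made up): the node is carved into a fixed number of map and reduce slots at configuration time, and when the live workload is map-heavy the idle reduce slots cannot be repurposed.

```java
public class StaticSlots {
    public static void main(String[] args) {
        int mapSlots = 8, reduceSlots = 8;                 // fixed by static configuration
        int pendingMapTasks = 20, pendingReduceTasks = 0;  // map-heavy workload right now

        int runningMaps = Math.min(mapSlots, pendingMapTasks);          // 8
        int runningReduces = Math.min(reduceSlots, pendingReduceTasks); // 0

        // Reduce slots sit idle even though map tasks are still queued.
        int idleSlots = (mapSlots - runningMaps) + (reduceSlots - runningReduces);
        System.out.println("busy slots: " + (runningMaps + runningReduces)
                + ", wasted slots: " + idleSlots
                + ", map tasks still queued: " + (pendingMapTasks - runningMaps));
    }
}
```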

  7. Limitations of Hadoop MR scheduling • Another problem: the job tracker design required hard downtime (all running jobs are killed) during a software upgrade • Every software upgrade therefore resulted in significant wasted computation

  8. Limitations of Hadoop MR scheduling • Another gap: traditional analytic databases have had advanced resource-based scheduling for a long time; Hadoop needs the same

  9. A better scheduling framework • Better scalability and cluster utilization • Lower latency for small jobs • Ability to upgrade without disruption • Scheduling based on actual task resource requirements rather than a count of map and reduce tasks

  10. CORONA • Cluster Manager: tracks the nodes and the free resources in the cluster • Job Tracker: a dedicated job tracker for each and every job • The job tracker can run in the client process or as a separate process in the cluster

  11. CORONA • Push-based scheduling: the Cluster Manager receives resource requests from the Job Tracker and pushes resource grants back to it • The Job Tracker then creates tasks and pushes them to task trackers for execution • There is no periodic heartbeat in the scheduling path, so scheduling latency is minimized (see the sketch below)
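
The sketch below ties slides 10 and 11 together (illustrative Java only; the class and method names are invented and are not the actual Corona API): a dedicated per-job tracker asks the cluster manager for resources, the cluster manager pushes grants straight back, and the job tracker then pushes tasks to the granted task trackers, with no heartbeat polling in the scheduling path.

```java
import java.util.ArrayList;
import java.util.List;

public class PushBasedFlow {

    // Cluster manager: tracks free resources per node, knows nothing about MapReduce.
    static class ClusterManager {
        private final List<String> nodesWithFreeSlots =
                new ArrayList<>(List.of("node-1", "node-2"));

        List<String> grantResources(int wanted) {
            List<String> grants = new ArrayList<>();
            while (grants.size() < wanted && !nodesWithFreeSlots.isEmpty()) {
                grants.add(nodesWithFreeSlots.remove(0));   // push a grant back immediately
            }
            return grants;
        }
    }

    // Dedicated job tracker for one job: turns grants into tasks and pushes them out.
    static class PerJobTracker {
        void run(ClusterManager cm, List<String> tasks) {
            List<String> grants = cm.grantResources(tasks.size()); // request -> grant
            for (int i = 0; i < grants.size(); i++) {
                // push the task to the granted node; no waiting for a heartbeat
                System.out.println("push " + tasks.get(i) + " -> " + grants.get(i));
            }
        }
    }

    public static void main(String[] args) {
        new PerJobTracker().run(new ClusterManager(), List.of("map-0", "map-1"));
    }
}
```

Splitting the roles this way is what lets the cluster manager stay small and MapReduce-agnostic while each job tracker worries about exactly one job, as the next slide notes.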

  12. CORONA • The Cluster Manager does not track the progress of jobs; it is agnostic about MapReduce, and the Job Tracker takes care of that • Each Job Tracker now tracks only one job, which means less code complexity • With this change, Corona can manage many jobs simultaneously and achieve better cluster utilization

  13. Benefits of Corona • Greater scalability • Lower latency • No-downtime upgrades • Better resource management

  14. Some metrics from runs at Facebook • Average time to refill a slot • During the measured period, MapReduce took around 66 seconds to refill a slot, while Corona took around 55 seconds, an improvement of approximately 17% ((66 − 55) / 66 ≈ 0.17)

  15. Some metrics from runs at Facebook • Cluster utilization • Under heavy workloads, utilization in the Hadoop MapReduce system topped out at 70%; Corona was able to reach more than 95%

  16. Some metrics from runs at Facebook • Further improvements in • Scheduling fairness • Job latency

  17. More about CORONA • http://goo.gl/XJRNN

  18. Why not YARN?

  19. Corona usage • Storage: 100 PB of data • Analysis: about 105 TB every 30 minutes

  20. What about the NameNode? • Facebook eliminated the single point of failure in HDFS using a creation it calls AvatarNode • Later, open-source Hadoop added an HA NameNode based on a similar concept • More about AvatarNode: • http://gigaom.com/cloud/how-facebook-keeps-100-petabytes-of-hadoop-data-online/ • https://www.facebook.com/notes/facebook-engineering/under-the-hood-hadoop-distributed-filesystem-reliability-with-namenode-and-avata/10150888759153920

  21. Corona: concerns • Facebook will soon outgrow this cluster • Its 900 million members are perpetually posting new status updates, photos, videos, comments, and so on • What happens at 10,000 PB?

  22. Solutions • What if a Hadoop cluster spanned multiple data centers? • Feasibility: network packets cannot travel between data centers fast enough • Limitation of the present architecture: all the machines of the cluster should be close enough to each other

  23. Solutions • Feasibility: introducing tens of milliseconds of delay slows down the whole system

  24. Prism • Just as a prism refracts a single light ray into multiple rays, Prism replicates and moves data wherever it is needed across a vast network of computing facilities • The facilities are physically separate but logically one

  25. Prism • Can move warehouses around • Not bound by the limitations of a single data center

  26. Prism status • Still in development • Not yet deployed

  27. Timeline of this technology • 23rd October 2009: http://www.theregister.co.uk/2009/10/23/google_spanner/ • "Google Spanner — instamatic redundancy for 10 million servers?" • Is Prism similar to Spanner? • Very little is known about Google Spanner

  28. Like Spanner, Facebook Prism could be used to instantly relocate data in the event of a data center meltdown.
