
Peek Partitioner: Leveraging Sampling to Improve TotalOrderPartitioner

Alex Edelsburg, Eric Wheeler


Presentation Transcript


  1. Peek Partitioner: Leveraging Sampling to Improve TotalOrderPartitioner
     Alex Edelsburg, Eric Wheeler

  2. Problems and Background
     • Sampling Harness
     • TotalOrderPartitioner samples on K1
     • Accuracy and load balancing demand sampling on K2
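The K1/K2 distinction can be illustrated with a small sketch. It assumes K1 is the map input key type and K2 the map output key type, following Hadoop's `Mapper<K1, V1, K2, V2>` naming. The toy mapper, the key range, and all helper names below are hypothetical, chosen only to show why cut points taken from input keys may balance poorly once the mapper changes the keys.

```python
import bisect

def split_points(sample, num_partitions):
    """Pick num_partitions - 1 evenly spaced cut points from a sorted sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]

def partition_loads(keys, cuts):
    """Count how many keys fall into each range partition defined by cuts."""
    loads = [0] * (len(cuts) + 1)
    for key in keys:
        loads[bisect.bisect_right(cuts, key)] += 1
    return loads

def toy_mapper(k1):
    """Hypothetical mapper: turns input key K1 into a skewed output key K2."""
    return (k1 * k1) // 1000

# Assumed input: keys 0..999, with every 10th record sampled.
input_keys = list(range(1000))
sample_k1 = input_keys[::10]
output_keys = [toy_mapper(k) for k in input_keys]

# Cut points built from sampled input keys (K1) vs. sampled output keys (K2).
cuts_from_k1 = split_points(sample_k1, 4)
cuts_from_k2 = split_points([toy_mapper(k) for k in sample_k1], 4)

# Reducers receive K2, so both sets of cuts are evaluated on the output keys.
loads_with_k1_cuts = partition_loads(output_keys, cuts_from_k1)
loads_with_k2_cuts = partition_loads(output_keys, cuts_from_k2)
print("cuts from K1:", cuts_from_k1, "loads:", loads_with_k1_cuts)
print("cuts from K2:", cuts_from_k2, "loads:", loads_with_k2_cuts)
```

In this toy setup the cuts derived from K1 send a much larger share of records to the first partition, while the cuts derived from K2 spread them close to evenly. When the mapper leaves keys unchanged, the two sets of cuts would coincide.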

  3. Goals
     • Use a preliminary sampling job to profile normal mapper operation and construct partitions
     • Find possible crossover points in runtime vs. sampling fraction
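A minimal sketch of what such a preliminary sampling job might look like, not the authors' implementation: the mapper is run over a random fraction of the input, its output keys are collected, and range-partition cut points are derived from them. The record format, `score_mapper`, the 10% fraction, and the reducer count are all assumptions made for illustration.

```python
import bisect
import random

def sample_map_output_keys(records, mapper, fraction, rng):
    """Run the mapper on a random fraction of records; collect its output keys."""
    keys = []
    for record in records:
        if rng.random() < fraction:
            keys.extend(key for key, _value in mapper(record))
    return keys

def build_cut_points(sampled_keys, num_reducers):
    """Choose num_reducers - 1 cut points at evenly spaced ranks of the sample."""
    ordered = sorted(sampled_keys)
    if not ordered:
        return []
    n = len(ordered)
    return [ordered[(i * n) // num_reducers] for i in range(1, num_reducers)]

def get_partition(key, cut_points):
    """Range partition: the index never decreases as the key increases."""
    return bisect.bisect_right(cut_points, key)

def score_mapper(record):
    """Hypothetical mapper: 'user:score' input line -> (score, user) pair."""
    user, score = record.split(":")
    yield int(score), user

rng = random.Random(42)
records = [f"user{i}:{rng.randint(0, 9999)}" for i in range(5000)]

# Sampling pass: profile the mapper on roughly 10% of the input.
sampled = sample_map_output_keys(records, score_mapper, fraction=0.1, rng=rng)
cut_points = build_cut_points(sampled, num_reducers=4)

# Full pass: count how many map output records each reducer would receive.
loads = [0] * 4
for record in records:
    for key, _value in score_mapper(record):
        loads[get_partition(key, cut_points)] += 1
print("cut points:", cut_points)
print("reducer loads:", loads)
```

Because the partition index is monotonic in the key, concatenating sorted reducer outputs in partition order gives a globally sorted result. The `fraction` argument is the knob behind the runtime vs. sampling fraction trade-off: a larger sample costs more up front but should give cut points closer to the true quantiles.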

  4. Results – Worst Case
     • Naïve: 559 seconds
     • Sampler: 778 seconds

  5. Results – Best Case
     • Naïve: 109 seconds
     • Sampler: 92 seconds

  6. Results – Teragen (Duke)
     • Naïve: 77 seconds
     • Sampler: 123 seconds

  7. Results – Teragen (EC2)
     • Naïve: 395 seconds
     • Sampler: 924 seconds
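To compare the four runs on one scale, the reported durations can be turned into sampler/naïve ratios. The snippet below only restates the numbers from the slides above; the dictionary labels are shorthand for the slide titles.

```python
# (naive_seconds, sampler_seconds) as reported on the results slides.
runs = {
    "Worst case": (559, 778),
    "Best case": (109, 92),
    "Teragen (Duke)": (77, 123),
    "Teragen (EC2)": (395, 924),
}

# A ratio above 1 means the sampler run took longer than the naive run.
ratios = {name: sampler / naive for name, (naive, sampler) in runs.items()}

for name, ratio in ratios.items():
    print(f"{name}: sampler/naive = {ratio:.2f}")
```

Only the best-case run has a ratio below 1, meaning the sampler run finished sooner than the naïve one.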

  8. Results – Duration vs Percentage

  9. Conclusions
     • Improved load balance
     • Comparable runtimes
     • Room at the bottom (bi-level sampling)
     • Tend to do better with more reducers (parallelism)

  10. Future Work
     • Investigate other variable axes (row sampling fraction etc.)
     • Improve code space efficiency
     • Leverage sampling job to investigate mapper behavior
     • Comparison against custom InputFormat/InputSampler
     • Local/Cluster Mode

  11. Questions?
