
Edinburgh Napier University Xiaodong Liu and Qi Liu

An Optimized Speculative Execution Strategy Based on Local Data Prediction in a Heterogeneous Hadoop Environment


Presentation Transcript


  1. An Optimized Speculative Execution Strategy Based on Local Data Prediction in a Heterogeneous Hadoop Environment Edinburgh Napier University Xiaodong Liu and Qi Liu

  2. Contents • Background Introduction • Related Work • Model and Algorithm • Results and Evaluation • Conclusion

  3. Background Hadoop, a top-level Apache project and one of the most popular cloud computing frameworks, has been widely adopted for its distributed data storage, computing and searching features. Job scheduling is a core component of Hadoop: it divides a job into multiple tasks and then invokes a JobTracker service to assign the tasks to the corresponding TaskTracker nodes.

  4. Background Distributing tasks as fast as possible cannot guarantee that subsequent execution on each TaskTracker maintains that advantage [3], and may lead to slow tasks, known as stragglers. Speculative Execution (SE) is the current mechanism for recognizing and correcting inefficient allocations made by a JobTracker service, thereby improving the fault tolerance of Hadoop.

  5. Related Work Because Hadoop's naïve speculative execution strategy performs poorly in heterogeneous environments, many optimized SE algorithms have been proposed. • LATE: uses the estimated remaining time as the speculative execution priority. • MCP: optimizes the SE strategy by maximizing the benefit of launching backup tasks. • ERUL: calculates the remaining time from the real-time system load to improve the accuracy of the prediction.
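To make the LATE priority concrete, here is a minimal Python sketch of the remaining-time heuristic published with LATE: the progress rate is the progress score divided by the elapsed time, and the time left is the unfinished fraction divided by that rate. The function names and the dictionary layout are illustrative assumptions, not taken from the slides or from Hadoop's code.

```python
def late_time_left(progress_score, elapsed_seconds):
    """Estimate remaining seconds as (1 - progress) / progress_rate."""
    if not 0.0 < progress_score <= 1.0:
        raise ValueError("progress_score must be in (0, 1]")
    if elapsed_seconds <= 0.0:
        raise ValueError("elapsed_seconds must be positive")
    progress_rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / progress_rate


def rank_by_time_left(tasks):
    """Return task names ordered from longest to shortest estimated time left.

    `tasks` maps a task name to a (progress_score, elapsed_seconds) pair;
    this layout is a hypothetical one chosen for the example.
    """
    return sorted(tasks, key=lambda name: late_time_left(*tasks[name]), reverse=True)
```

The task at the front of the ranking is the one LATE would consider first for a backup copy. The estimate assumes a constant progress rate, which is the assumption the regression-based approach in the following slides relaxes.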

  6. Model and Algorithm

  7. Model and Algorithm • (1) The Recognition of Straggler Candidates • The LWR (locally weighted regression) method was implemented to calculate the remaining time of tasks. • In the LWR model, X is the input matrix, Y is the output vector, and W is a diagonal weight matrix.
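The equation shown on this slide is not preserved in the transcript. For reference, the standard closed-form solution of locally weighted regression uses exactly the three quantities named above. Whether the slide used this exact form is an assumption, and the symbol for the coefficient vector is introduced here only for illustration.

```latex
\hat{\theta} = \left( X^{\top} W X \right)^{-1} X^{\top} W \, Y
```

A prediction for a query input x is then the inner product of x with the fitted coefficient vector, and W is recomputed for every query so that nearby samples dominate the fit.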

  8. Model and Algorithm • A Gaussian kernel function is used to calculate the weight function ω(d), where γ is the wavelength parameter, set to 0.08 in this paper.
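The following Python sketch shows how a Gaussian-weighted local linear fit could be computed for a single input feature. The kernel form exp(-d² / (2γ²)) is an assumption, since the slide's formula is not preserved. The function names, and the choice of task progress as input with elapsed time as output, are also illustrative rather than the authors' implementation. For one feature, the weighted normal equations reduce to the weighted means and covariances used below.

```python
import math


def gaussian_weight(d, gamma=0.08):
    """Weight for a sample at distance d from the query point.

    Assumed kernel form: exp(-d^2 / (2 * gamma^2)).
    """
    return math.exp(-(d * d) / (2.0 * gamma * gamma))


def lwr_predict(x_query, xs, ys, gamma=0.08):
    """Fit a weighted straight line around x_query and evaluate it there."""
    ws = [gaussian_weight(x - x_query, gamma) for x in xs]
    total = sum(ws)
    if total == 0.0:
        raise ValueError("all samples are too far from the query point")
    # Weighted means of inputs and outputs.
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    # Weighted variance and covariance give the local slope.
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return y_bar + slope * (x_query - x_bar)
```

Under these assumptions, a remaining-time estimate would come from predicting the elapsed time at a later progress value and subtracting the time already spent. With γ = 0.08, samples more than a few tenths away from the query contribute almost nothing, so the fit is effectively driven by the most recent observations.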

  9. Model and Algorithm • (2) The Benefit Calculation of Replicating Stragglers • t_rem is the remaining time predicted by the LWR model, and t_avg is the average execution time of completed tasks. • μ is introduced to offset the influence of data skew in the input data.

  10. Model and Algorithm • (3) The Selection of Backup Nodes • To enhance the performance of SE, we propose a new method to measure and assess potential backup nodes by dividing the nodes into two groups according to the phase they are good at, i.e. “Map-Fast” nodes and “Reduce-Fast” nodes. • PR represents the processing rate of node candidates.
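The slide does not state how nodes are assigned to the two groups. As a purely illustrative sketch, the Python code below labels a node “Map-Fast” when its map-phase PR is at or above the cluster mean, and “Reduce-Fast” by the same rule for the reduce phase. The threshold rule, the function name, and the data layout are all assumptions, not the authors' method.

```python
from statistics import mean


def split_backup_nodes(rates):
    """Group candidate nodes by the phase they process quickly.

    `rates` maps a node name to a (map_pr, reduce_pr) pair (hypothetical layout).
    Returns (map_fast, reduce_fast), each sorted by descending processing rate.
    The "at or above the cluster mean" rule is an assumption for illustration.
    """
    avg_map = mean(m for m, _ in rates.values())
    avg_reduce = mean(r for _, r in rates.values())
    map_fast = sorted(
        (n for n, (m, _) in rates.items() if m >= avg_map),
        key=lambda n: rates[n][0],
        reverse=True,
    )
    reduce_fast = sorted(
        (n for n, (_, r) in rates.items() if r >= avg_reduce),
        key=lambda n: rates[n][1],
        reverse=True,
    )
    return map_fast, reduce_fast
```

With such a split, a backup copy of a straggling map task would go to the first available node in the first list, and a straggling reduce task to the first available node in the second. A node can appear in both lists when it is fast at both phases.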

  11. Results & Evaluation Details of the experimental environment

  12. Results & Evaluation Job execution time and cluster throughput of different SE strategies on WordCount jobs in a normal-load scenario

  13. Results & Evaluation Job execution time and cluster throughput of different SE strategies on WordCount jobs in a busy-load scenario with data skew

  14. Conclusion • LWR-SE was proposed, inspired by the non-linear relationship between job execution time and progress. • The experimental results show that LWR-SE outperforms MCP, LATE and Hadoop-None in three different heterogeneous scenarios designed with either normal or busy workloads.

  15. Thank you!
