
Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids




  1. Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University

  2. Outline • Introduction and Motivation • System Model • Algorithm • Performance Analysis • Summary

  3. Introduction • Distributed scientific applications in many cases require access to massive data sets. • In High Energy Physics (HEP), for example, several experiments have begun producing petabytes of data per year and will continue to do so for decades. • Data grids serve as a technology bridge between the need to access extremely large data sets and the goal of achieving high data transfer rates, by providing geographically distributed computing resources and large-scale storage systems.

  4. Introduction • The Google Data Cluster • 31,654 machines • 63,184 CPUs • 126,368 GHz of processing power • Two identical buildings contain about 100,000 square feet of data center floor space

  5. Introduction • Reliability • Computing at high temperatures is more error-prone than computing in an appropriate environment. • Operational Cost • Consider a single 200-Watt server, such as the IBM 1U*300: the energy bill for this single server alone would be about $180/year.
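The $180/year figure follows from simple arithmetic. A minimal sketch, assuming an illustrative electricity price of roughly $0.10/kWh (the price is our assumption, not stated on the slide):

```python
# Back-of-the-envelope annual energy bill for one always-on server.
# The $0.10/kWh price is an assumed illustrative rate.
def annual_energy_bill(power_watts, price_per_kwh=0.10):
    kwh_per_year = power_watts / 1000.0 * 24 * 365  # continuous operation
    return kwh_per_year * price_per_kwh

# A 200 W server draws ~1,752 kWh/year, i.e. about $175 at $0.10/kWh,
# in line with the slide's ~$180/year estimate.
bill = annual_energy_bill(200)
```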

  6. Introduction • A key factor in scheduling data-intensive tasks is the location of the input data sets required by the tasks. • A straightforward strategy to enhance the performance of data-intensive applications on data grids is to replicate popular data sets to multiple resource sites. • Replication offers higher data access speeds than maintaining the data sets in a single site.

  7. Drawbacks of Making Too Many Replicas • It is challenging to maintain consistency among replicas in Data Grids. • It is nontrivial to efficiently generate replicas of massive data sets on the fly in Data Grids. • A large number of data replicas can increase energy dissipation in storage resources.

  8. Reduce Energy Consumption in Data Grids • Minimize electricity cost • Improve system reliability • How to reduce energy consumption in Data Grids? • Energy-efficient scheduling algorithms for applications running on data grids.

  9. Goals of Scheduling • Tradeoffs between energy efficiency and high performance for data-intensive applications. • Integrate data placement strategies with task scheduling • Consider real-time requirements • How to achieve the goals? • A Distributed Energy-Efficient Scheduler called DEES • Three key components: energy-aware ranking, performance-aware scheduling, and energy-aware dispatching.

  10. Design Goals of DEES • Maximize the number of tasks completed before their corresponding deadlines • Replicate data and place replicas in an energy-efficient way • Dispatch real-time tasks to peer computing sites, considering three factors: • Computational capacities of peer computing sites, • Energy consumption introduced by tasks, and • Data location.

  11. Features of DEES • High scalability • Requires no full knowledge of the workload conditions of all the computing sites in a data grid. • Obtaining full knowledge of the state of the grid is a difficult task.

  12. Key Ideas • High-priority tasks are scheduled first in order to meet their deadlines. • Exploit slacks: low-priority tasks can still have their deadlines guaranteed. • The dynamic voltage scaling (DVS) technique is used to reduce energy consumption by exploiting available slacks and adjusting voltage levels accordingly.

  13. Dynamic Voltage Scaling • An effective technique for reducing energy consumption by dynamically adjusting the clock speed and supply voltage. • Energy dissipation per CPU cycle is proportional to v². • Processor energy can be saved by reducing the CPU voltage while running the processor at a slower speed.
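The slide's v² relationship can be made concrete with a small sketch. This uses the classic CMOS switching-energy model (E per cycle = C·V², frequency roughly proportional to V); the constants are illustrative, not values from the paper:

```python
def dvs_energy(cycles, voltage, c=1.0):
    """Switching energy for `cycles` CPU cycles at supply voltage `voltage`.
    Classic model: energy per cycle = C * V^2 (C is effective capacitance)."""
    return c * voltage ** 2 * cycles

def dvs_exec_time(cycles, voltage, k=1.0):
    """Execution time: clock frequency scales roughly linearly with voltage."""
    return cycles / (k * voltage)

# Halving the voltage doubles execution time but cuts energy by 4x,
# which is why filling slack with slower, lower-voltage execution pays off.
e_full = dvs_energy(1e9, 1.0)
e_half = dvs_energy(1e9, 0.5)
```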

  14. Design Ideas • Two types of tasks: hard real-time tasks and soft real-time tasks. • Prioritize hard real-time tasks but create slacks by delaying their executions till the latest moment. • After a schedule is made, the processor voltage is adjusted to the lowest possible level on a task-by-task basis at each scheduling point.
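The "delay to the latest moment" idea above can be sketched as computing each task's latest feasible start time; the gap between release and latest start is the slack that DVS can then exploit. A minimal illustration (field names are ours):

```python
def latest_start(release, deadline, exec_time):
    """Latest moment a task can start and still meet its deadline."""
    start = deadline - exec_time
    if start < release:
        raise ValueError("task cannot meet its deadline")
    return start

def slack(release, deadline, exec_time):
    """Slack created by delaying execution until the latest start time."""
    return latest_start(release, deadline, exec_time) - release

# A task released at t=0 with deadline 10 and execution time 4
# can be delayed until t=6, creating 6 units of slack.
s = slack(0, 10, 4)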

  15. System Model • Geographically distributed sites are interconnected through a WAN. • Each site consists of storage resources, computing resources, and a ticket server.

  16. Energy Consumption Model • Consider energy consumption of executing tasks, making data replicas, and communicating. • The total energy consumption of a data grid, Etotal, can be expressed as Etotal = Ecomp + Ecomm + Erep, where Ecomp is the total energy consumption of computing resources, Ecomm is the total energy consumption of communication, and Erep is the total energy consumption of replicating data.

  17. Four Cases of Energy Consumption • Case 1: Local execution and local data • Case 2: Local execution and remote data • Case 3: Remote execution and same remote data • Case 4: Remote execution and different remote data

  18. If data is not locally available, then? • Executing a task at the site where its data is located is: • Energy efficient • Free of data transfer and replication cost • Compared to the local-execution/remote-data scenario, executing the task at the remote site where the data is located is still more energy efficient if the task's input data set is larger than its execution code size.
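The comparison above reduces to weighing the bytes moved in each case. A hedged sketch, assuming communication energy proportional to bytes transferred (the per-byte coefficient is an assumption, and computation energy is taken to be equal at either site, so it cancels):

```python
def best_site(data_size, code_size, e_per_byte=1e-9):
    """Compare Case 2 (run locally, fetch the remote data) with Case 3
    (ship the task to the remote site holding its data).
    Communication energy is modeled as bytes * e_per_byte."""
    local_exec = data_size * e_per_byte   # transfer the input data here
    remote_exec = code_size * e_per_byte  # transfer the task's code there
    return "remote" if remote_exec < local_exec else "local"

# A 1 GB input data set versus 1 MB of task code: run at the data's site.
choice = best_site(data_size=10**9, code_size=10**6)
```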

  19. Algorithm Components • DEES is composed of • Ranking • Scheduling • Dispatching • Goals: • Maximize the number of tasks meeting deadlines • Minimize energy consumption • Improve scalability

  20. Task Grouping • Task Grouping: • Tasks requiring the same data are grouped together. • The task group whose data resides in the local site, called local task group, is ranked first. • Other task groups are ranked in descending order, according to the number of tasks in the task group. • Considering Real-Time Requirements: • Within each group, tasks are ordered by increasing deadline. • Thus, tasks with shorter deadlines are scheduled sooner.
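The grouping and ordering rules above can be sketched directly; the dict-based task representation is our own illustration:

```python
from collections import defaultdict

def group_and_order(tasks, local_data):
    """tasks: dicts with 'data' (required data set) and 'deadline'.
    Groups tasks by required data set, places the local task group first,
    ranks remaining groups by descending size, and orders tasks within
    each group by increasing deadline."""
    groups = defaultdict(list)
    for t in tasks:
        groups[t["data"]].append(t)
    for g in groups.values():
        g.sort(key=lambda t: t["deadline"])           # earliest deadline first
    # Local group first; then other groups by descending number of tasks.
    return sorted(groups.items(),
                  key=lambda kv: (kv[0] != local_data, -len(kv[1])))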

  21. DEES Scheduling • DEES schedules tasks on a group basis. • A local task group is scheduled first. To schedule task ti on site su, DEES selects the machine mk at su that can complete ti within its deadline and provides the minimum completion time. • After all tasks are processed, remaining unscheduled tasks are dispatched to remote sites.
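The machine-selection step can be sketched as a minimum-completion-time search with a deadline filter. The machine representation (ready time, speed) and task fields are our assumptions for illustration:

```python
def pick_machine(task, machines, now=0.0):
    """Select the machine that finishes `task` earliest while meeting its
    deadline; returns None if no machine can (the task then remains
    unscheduled and is dispatched to a remote site).
    Each machine is a (ready_time, speed) pair."""
    best, best_finish = None, float("inf")
    for m, (ready, speed) in enumerate(machines):
        finish = max(now, ready) + task["cycles"] / speed
        if finish <= task["deadline"] and finish < best_finish:
            best, best_finish = m, finish
    return best
```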

  22. Dispatching • Dispatching delivers the tasks within each task group to their data sites. • For task group gj whose data site is so, scheduling decisions are made by so's scheduler based on its local resource status and the task information of gj. • If so cannot schedule all tasks in gj, the unscheduled tasks are dispatched to so's immediate neighbors using tickets in a breadth-first manner.
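The breadth-first ticket-based dispatch can be sketched with a queue over the neighbor graph. Treating a ticket count as the number of tasks a site will accept is our simplifying assumption:

```python
from collections import deque

def dispatch_bfs(data_site, neighbors, tickets, unscheduled):
    """Breadth-first dispatch of unscheduled tasks from data site s_o.
    `neighbors` maps a site to its immediate neighbors; `tickets[s]` is
    how many tasks site s will accept. Returns {site: tasks sent there}."""
    assigned = {}
    queue, seen = deque(neighbors[data_site]), {data_site}
    remaining = list(unscheduled)
    while queue and remaining:
        s = queue.popleft()
        if s in seen:
            continue
        seen.add(s)
        t = tickets.get(s, 0)
        if t:
            assigned[s] = remaining[:t]     # hand this site its share
            remaining = remaining[t:]
        queue.extend(neighbors.get(s, []))  # then try sites one hop further
    return assigned
```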

  23. Energy-Aware Ranking • To make tradeoffs between energy efficiency and real-time performance, we propose a ranking system to rank so's neighbors: rank(sv) = ε·n − μ·(Erep + Etrans + Eexec), where n is the number of tasks in gj that can be scheduled on sv, ε is a coefficient concerning the task deadline, and μ is a coefficient concerning energy saving. The energy terms are: • Energy consumed to replicate gj's data from so to sv, • Energy consumed to transfer gj's data from so to sv, • Energy consumed to execute these n tasks at sv.

  24. Dispatching: Energy Efficiency vs. Real-Time • ε and μ manage the two conflicting goals of saving energy and meeting deadlines. • For mission-critical tasks: ε is set to 1 and μ is set to 0, so the neighbor that can schedule more tasks is given preference. • For energy efficiency: ε is set to 0 and μ is set to 1, so the neighbor that consumes the least amount of energy is considered first.
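The two coefficient settings above can be illustrated with a sketch of the ranking tradeoff: reward the number of schedulable tasks (weighted by ε) and penalize energy cost (weighted by μ). This is our reading of the slides; the exact formula in the paper may differ:

```python
def rank(n_schedulable, e_rep, e_trans, e_exec, eps, mu):
    """Score a neighbor site: eps rewards deadline performance (tasks it
    can schedule), mu penalizes the energy it would cost."""
    return eps * n_schedulable - mu * (e_rep + e_trans + e_exec)

# Mission-critical mode (eps=1, mu=0): the neighbor scheduling more tasks wins.
# Energy mode (eps=0, mu=1): the neighbor with the least energy cost wins.
mission = rank(5, 3, 3, 3, eps=1, mu=0)   # energy terms ignored
energy  = rank(5, 3, 3, 3, eps=0, mu=1)   # task count ignored
```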

  25. Simulation Parameters

  26. Performance Analysis • Compared DEES with an effective scheduling algorithm - Close-to-Files. • Features of the Close-to-Files algorithm: • Good performance, since Close-to-Files takes data locality into account. • It schedules a task to its data site to decrease the amount of data transfer. • Scheduling overhead is high: it is an exhaustive algorithm that searches across all combinations of computing and data sites to find a result with the minimum computation and data transmission cost.

  27. Performance Metrics • Guarantee Ratio, Normalized Average Energy Consumption, and Total Energy Consumption are used as the performance metrics in the evaluation.

  28. Real-Time Performance Fig. 5. Guarantee Ratio by ranking coefficients

  29. Energy Consumption Fig. 6. Normalized Average Energy Consumption by ranking coefficients

  30. Performance Fig. 7. Guarantee Ratio by task loads

  31. Energy Consumption Fig. 8. Normalized Average Energy Consumption by task loads

  32. Summary • An energy-efficient algorithm to schedule real-time tasks with data access requirements on data grids. • By reducing the amount of data replication and task transfers, the proposed algorithm effectively saves energy. • DEES is distributed: it does not need knowledge of the complete state of the grid. • Detailed simulations demonstrate that DEES significantly reduces energy consumption while increasing the Guarantee Ratio.

  33. Questions • Xiao Qin • http://www.eng.auburn.edu/~xqin
