GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks

Íñigo Goiri, Kien Le, Thu D. Nguyen,

Jordi Guitart, Jordi Torres, and Ricardo Bianchini


Motivation

  • Datacenters consume large amounts of energy

  • Energy cost is not the only problem

    • Brown sources: coal, natural gas…

  • Connect datacenters to green sources

    • Solar panels, wind turbines…

    • Green datacenter

    • Early examples in the field


Green datacenter

  • Energy sources

    • Solar/wind: variable over time

    • Electrical grid: backup

  • Mitigation approaches are not ideal

    • Batteries and net metering

  • We need to match the energy demand to the supply

[Figure: solar power supply vs. workload load, power over time]


Delaying load within time bounds

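[Figure: jobs J1, J2, J3 shown as nodes and power over time]

Delaying some jobs is OK (as long as their time bounds are respected)

[Figure: the same jobs J1, J2, J3 after delaying some of them; nodes and power over time]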


Scheduling data-processing workloads in green datacenters


  • Data-processing jobs

    • Each task operates on a chunk of data

    • Data distributed among servers

  • Simple workflow: MapReduce

    • Map tasks: process input data

    • Reduce tasks: merge maps’ outputs

Challenges

  • Match MapReduce workload with green energy availability

    • No prior information on jobs' number of nodes, length, or power…

  • Conserve energy while ensuring data availability

[Figure: MapReduce job with map tasks 1–5 whose outputs go through a shuffle to reduce tasks 6 and 7]
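The map, shuffle, and reduce phases above can be illustrated with a word count over text chunks. This is a single-process sketch of the MapReduce model, assuming each string stands for one data chunk; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_task(chunk: str) -> list[tuple[str, int]]:
    """Map: process one input chunk, emitting (word, 1) for every word."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group the maps' intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: merge all values emitted for one key."""
    return key, sum(values)


def run_job(chunks: list[str]) -> dict[str, int]:
    """Run one map task per chunk, shuffle, then one reduce per key."""
    intermediate = chain.from_iterable(map_task(chunk) for chunk in chunks)
    grouped = shuffle(intermediate)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["green energy", "brown energy", "green green"]))
```

In Hadoop, map tasks are preferably scheduled on servers that hold their input chunks, which is why deactivating servers interacts with data availability.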


Overview of GreenHadoop

  • Predict solar energy availability

  • May delay jobs but must meet time bounds

    • Maximize green energy use

    • If not enough green energy, minimize brown electricity cost

    • Brown energy cost + peak brown power cost

  • Deactivate idle servers while keeping data available

  • Divided into two parts

    • Computation scheduling

    • Data management
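The brown electricity cost named above combines an energy charge with a charge on peak brown power. A minimal sketch of such a cost function, assuming fixed-length slots, two-level (on-peak/off-peak) energy prices, and a peak charge on the highest brown power drawn in any slot; the prices and the `Slot` type are hypothetical placeholders, not the tariff used in the evaluation.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    brown_kw: float  # average brown (grid) power drawn during the slot, in kW
    on_peak: bool    # True if the slot falls in the utility's on-peak period


def brown_cost(slots: list[Slot], slot_hours: float = 1.0,
               on_peak_price: float = 0.13,   # $/kWh, hypothetical
               off_peak_price: float = 0.08,  # $/kWh, hypothetical
               peak_charge: float = 12.0) -> float:  # $/kW, hypothetical
    """Brown energy cost plus peak brown power cost over one horizon."""
    energy_cost = sum(
        s.brown_kw * slot_hours * (on_peak_price if s.on_peak else off_peak_price)
        for s in slots
    )
    peak_kw = max((s.brown_kw for s in slots), default=0.0)
    return energy_cost + peak_kw * peak_charge
```

Because the peak term depends only on the highest slot, adding brown power to a slot below the current peak costs just the energy charge, while raising the peak also adds to the peak charge.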


1. Computation scheduling

Estimate the energy required by jobs (EWMA)

[Figure: queued jobs Job1–Job6, each with its estimated energy]
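The slide names an exponentially weighted moving average (EWMA) as the energy estimator. A minimal sketch, assuming the estimate is updated from the measured energy of each completed job and that a smoothing factor `alpha` weights recent jobs; the class name, the `alpha` value, and the default used before any observation are hypothetical.

```python
from typing import Optional


class EwmaEnergyEstimator:
    """Estimate job energy as an exponentially weighted moving average."""

    def __init__(self, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest observation
        self.estimate_kwh: Optional[float] = None  # no history yet

    def observe(self, measured_kwh: float) -> None:
        """Fold the measured energy of a completed job into the average."""
        if self.estimate_kwh is None:
            self.estimate_kwh = measured_kwh
        else:
            self.estimate_kwh = (self.alpha * measured_kwh
                                 + (1.0 - self.alpha) * self.estimate_kwh)

    def predict(self, default_kwh: float = 0.0) -> float:
        """Energy expected for the next job (default_kwh if nothing observed)."""
        return default_kwh if self.estimate_kwh is None else self.estimate_kwh
```

A larger `alpha` reacts faster to workload changes; a smaller one smooths out outlier jobs.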


1. Computation scheduling

Assign green energy first

Predict energy availability (weather forecast)

[Figure: Job1–Job6 placed under the predicted green energy curve; power vs. time starting now, across off-peak, on-peak, and off-peak periods]
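Assigning green energy first can be sketched as a greedy pass over the predicted green energy per time slot. This assumes the queued jobs' total energy demand is already estimated (the slides use an EWMA) and that earlier slots are filled first; the function name and the earliest-first order are illustrative assumptions, and per-job time bounds are not modeled.

```python
def assign_green(demand_kwh: float,
                 green_forecast_kwh: list[float]) -> tuple[list[float], float]:
    """Cover demand with predicted green energy, earliest slot first.

    Returns the green energy allocated to each slot and the demand that
    green energy alone cannot cover.
    """
    allocation: list[float] = []
    remaining = demand_kwh
    for available in green_forecast_kwh:
        used = min(available, remaining)  # never exceed the forecast for this slot
        allocation.append(used)
        remaining -= used
    return allocation, remaining
```

Any remaining demand is what must then be covered with brown energy.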


1. Computation scheduling

Assign cheap brown energy

[Figure: Job1–Job6 scheduled across off-peak, on-peak, and off-peak periods, with the previous brown power peak marked; power vs. time starting now]
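Leftover demand is then placed on cheap brown energy. A sketch under two assumptions the slides only hint at: "cheap" means off-peak slots, and brown power per slot is capped at the previous peak (marked on the slide) so the peak power charge does not grow. The function name is hypothetical; with one-hour slots, kWh and kW coincide.

```python
def assign_cheap_brown(remaining_kwh: float, on_peak: list[bool],
                       previous_peak_kw: float,
                       slot_hours: float = 1.0) -> tuple[list[float], float]:
    """Fill off-peak slots with brown energy without exceeding the previous peak.

    Returns the brown energy allocated per slot and the demand still left
    for expensive (on-peak or peak-raising) brown energy.
    """
    cap_kwh = previous_peak_kw * slot_hours  # most energy a slot takes at the old peak
    allocation = [0.0] * len(on_peak)
    for i, is_on_peak in enumerate(on_peak):
        if remaining_kwh <= 0:
            break
        if is_on_peak:
            continue  # on-peak energy is the expensive kind; leave it for later
        used = min(cap_kwh, remaining_kwh)
        allocation[i] = used
        remaining_kwh -= used
    return allocation, remaining_kwh
```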


1. Computation scheduling

Assign expensive energy

Current power → Active servers

[Figure: Job1–Job6 scheduled across off-peak, on-peak, and off-peak periods; the power planned for now determines the number of active servers; power vs. time starting now]
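The slide maps the power planned for the current slot to a number of active servers. A minimal sketch, assuming every active server draws roughly the same power and that a minimum number of servers always stays on; the per-server figure and the function name are hypothetical.

```python
import math


def active_servers(planned_power_w: float, per_server_w: float,
                   total_servers: int, min_servers: int = 1) -> int:
    """Number of servers to keep active for the power planned right now."""
    if per_server_w <= 0:
        raise ValueError("per_server_w must be positive")
    # Round up: a fractional share of a server's power still needs a whole server.
    needed = math.ceil(planned_power_w / per_server_w)
    return max(min_servers, min(total_servers, needed))
```

In the evaluation cluster, `total_servers` would be 16.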


1. Computation scheduling

As time goes by, the number of active servers changes

[Figure: active servers and power vs. time, starting now]


2. Data management

  • Deactivate servers to save energy

    • Some data might become unavailable

  • Prior solution: covering subset [Leverich’09]

    • Set of servers always running has ALL data

[Figure: covering subset example; files 1–8 replicated across servers, with a subset of servers holding a copy of every file]

  • Our approach

    • Only required data has to be available

    • We usually require fewer active servers
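Keeping only the required data available is a set-cover problem: choose active servers so that every file needed by queued jobs has at least one replica among them. The slides do not say how GreenHadoop picks these servers; this sketch uses the standard greedy set-cover heuristic as a stand-in, and the server names and file placement are hypothetical.

```python
def servers_for_required(placement: dict[str, set[int]],
                         required: set[int]) -> set[str]:
    """Greedy set cover: pick servers until every required file has a replica on one."""
    uncovered = set(required)
    chosen: set[str] = set()
    while uncovered:
        # Pick the server that stores the most still-uncovered required files.
        best = max(placement, key=lambda s: len(placement[s] & uncovered), default=None)
        if best is None or not (placement[best] & uncovered):
            raise ValueError(f"files {sorted(uncovered)} are not stored on any server")
        chosen.add(best)
        uncovered -= placement[best]
    return chosen


if __name__ == "__main__":
    placement = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {5, 6, 7, 8}, "s4": {1, 4}}
    print(sorted(servers_for_required(placement, {1, 4, 5})))
    print(len(servers_for_required(placement, set(range(1, 9)))))
```

With this placement, covering only files 1, 4 and 5 takes two servers, while covering all eight files (a covering subset) takes three.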


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobA, JobB, JobC; files marked as required or non-required]


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobA, JobB, JobC; files marked as required or non-required]

GreenHadoop's computation scheduling requires only 2 active servers


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobA, JobB, JobC; required files are replicated to Active servers]

Move required files to Active servers
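Moving required files onto Active servers can be sketched as planning one copy per required file that currently lives only on Decommission servers. Assumptions: server states are the slide's labels (`"Active"`, `"Decommission"`, `"Down"`), copies go to the least-loaded Active server, and files found only on Down servers are left for a separate step; all names are hypothetical.

```python
def plan_replication(placement: dict[str, set[int]], states: dict[str, str],
                     required: set[int]) -> list[tuple[int, str, str]]:
    """Return (file, source, target) copies so each required file gets an Active replica."""
    active = [s for s in placement if states[s] == "Active"]
    if not active:
        raise ValueError("no Active server to copy to")
    load = {s: len(placement[s]) for s in active}  # files stored per Active server
    plan: list[tuple[int, str, str]] = []
    for f in sorted(required):
        holders = [s for s in placement if f in placement[s]]
        if any(states[s] == "Active" for s in holders):
            continue  # already available on an Active server
        sources = [s for s in holders if states[s] == "Decommission"]
        if not sources:
            continue  # only on Down servers (or nowhere): not handled here
        target = min(active, key=lambda s: load[s])
        plan.append((f, sources[0], target))
        load[target] += 1
    return plan
```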


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobA, JobB, JobC; files marked as required or non-required]

Decommissioned server can be sent to Down
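Whether a Decommission server can be sent Down reduces to checking that none of its required files would become unavailable. A sketch using the slide's state labels (`"Active"`, `"Decommission"`, `"Down"`); the rule that each required file must have an Active replica elsewhere, and the function name, are assumptions.

```python
def can_go_down(server: str, placement: dict[str, set[int]],
                states: dict[str, str], required: set[int]) -> bool:
    """True if `server` is decommissioning and each required file it holds has an Active replica elsewhere."""
    if states[server] != "Decommission":
        return False
    for f in placement[server] & required:
        has_active_copy = any(
            other != server and states[other] == "Active" and f in placement[other]
            for other in placement
        )
        if not has_active_copy:
            return False  # sending it Down would make a required file unavailable
    return True
```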


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobA, JobB, JobC, JobD; files marked as required or non-required]

Jobs to be executed change → Required files change
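The required set can be recomputed from the running queue whenever it changes. A minimal sketch, assuming each job declares its input files up front; the function names and the job-to-file mapping in the tests are hypothetical, not read from the slide.

```python
def required_files(queue: list[str], job_inputs: dict[str, set[int]]) -> set[int]:
    """Files needed by any job still in the running queue."""
    return set().union(*(job_inputs[job] for job in queue))


def required_delta(old_queue: list[str], new_queue: list[str],
                   job_inputs: dict[str, set[int]]) -> tuple[set[int], set[int]]:
    """Return (newly required files, files that are no longer required)."""
    old = required_files(old_queue, job_inputs)
    new = required_files(new_queue, job_inputs)
    return new - old, old - new
```

Newly required files may have to be made available, while files that are no longer required stop constraining which servers stay active.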


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobB, JobC, JobD; files marked as required or non-required]

Make missing data available
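When a newly required file has replicas only on Down servers, one of those servers must be brought back. A sketch that, for each such file, picks the first Down server holding it; the selection rule and the function name are assumptions, and a real scheduler might prefer servers that cover several missing files at once.

```python
def servers_to_wake(placement: dict[str, set[int]], states: dict[str, str],
                    required: set[int]) -> list[str]:
    """Down servers to power on so every required file has a reachable replica."""
    wake: list[str] = []
    for f in sorted(required):
        reachable = any(
            f in placement[s] and (states[s] != "Down" or s in wake)
            for s in placement
        )
        if reachable:
            continue  # on an Active/Decommission server or one already being woken
        holders = [s for s in placement if states[s] == "Down" and f in placement[s]]
        if not holders:
            raise ValueError(f"file {f} has no replica on any server")
        wake.append(holders[0])
    return wake
```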


2. Data management

[Figure: Servers 1–5 holding files 1–8, each server in the Active, Decommission, or Down state; running queue: JobB, JobC, JobD; files marked as required or non-required]

GreenHadoop's computation scheduling requires 3 active servers


Evaluation methodology

  • Cluster with 16 Xeon servers

    • Hadoop, and Hadoop that turns off idle servers (EAHadoop)

    • GreenHadoop: green energy, brown electricity cost

  • Energy profile

    • NJ electricity pricing (on/off peak and peak cost)

    • Solar farm energy availability (14 PV panels)

    • Five pairs of days (combinations of high and low days)

  • Workload

    • Derived from Facebook [Zaharia’09]

    • Jobs with up to 37 GB, 600 tasks, and 6 hours in length

    • Internal time bound of one day


Energy prediction vs actual

[Figure: predicted vs. actual solar energy over time, with periods of cloud cover, rain, and thunderstorm marked]
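The slides say green energy availability is predicted from weather forecasts, and the labels in this figure are forecast conditions. A minimal sketch of that idea, assuming a per-hour sunny-day production baseline scaled by an attenuation factor per forecast condition; the factor values and names are hypothetical, not the model GreenHadoop uses.

```python
# Hypothetical fraction of baseline solar output kept under each forecast condition.
CONDITION_FACTOR = {"clear": 1.0, "cloud cover": 0.5, "rain": 0.3, "thunderstorm": 0.1}


def predict_solar_kwh(baseline_kwh: list[float], forecast: list[str]) -> list[float]:
    """Predict hourly solar energy by scaling a sunny-day baseline by the forecast condition."""
    if len(baseline_kwh) != len(forecast):
        raise ValueError("need exactly one forecast condition per hour")
    return [base * CONDITION_FACTOR[cond] for base, cond in zip(baseline_kwh, forecast)]
```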


GreenHadoop for Facebook & high-high days

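[Figure: power over time for the Facebook workload on high-high days; series: green produced, green predicted, green consumed, brown consumed, brown price]

Values shown: 30 kWh, 59 kWh, $8.00 and 39 kWh, 25 kWh, $6.06 (-24%)

31% more green

39% cost savings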


GreenHadoop for Facebook

[Figures: effect of parameters in GreenHadoop; results for different pairs of days]


Other results

  • Workload intensity (datacenter utilization)

  • High-priority jobs

  • Shorter time bounds

  • Data availability

  • Workload variations

  • Consistent green energy increases and cost savings


Conclusions

  • Data-processing scheduler for green datacenters

  • Predicts green energy availability

  • Increases the use of green energy

  • Reduces brown electricity costs

  • Manages data availability

  • We are building Parasol

    • Solar-powered μdatacenter

    • Poster session

