Course: Data Grids. Professors: Jean-Marc Pierson, Lionel Brunie

Course: Data Grids. Professors: Jean-Marc Pierson, Lionel Brunie. Presentation date: 01/02/2006. Student: Sammarco Aniello. An Adaptive Distributed Query Processing Grid Service, F. Porto, V.F.V. da Silva, M.L. Dutra, B. Schulze. Proc. VLDB Workshop on Data Management in Grids

slide1
Course: Data Grids

Professors: Jean-Marc Pierson, Lionel Brunie

Presentation date: 01/02/2006

Student: Sammarco Aniello

An Adaptive Distributed Query Processing Grid Service

F. Porto, V.F.V. da Silva, M.L. Dutra, B. Schulze

Proc. VLDB Workshop on Data Management in Grids

VLDB, LNCS 3836, Trondheim, Norway, 2-3 September 2005

slide2

PLAN

1-INTRODUCTION

2-ABSTRACT DB

3-ARCHITECTURE

4-QUERY PROCESSING

5-GridGreedyNode (G2N) algorithm

6-Query Execution Engine Framework

7-INITIAL RESULT

8-CONCLUSION


slide3

PROJECT CoDIMS

(Configurable Data Integration Middleware)

CoDIMS-G is a distributed grid service for the evaluation of scientific queries. Its design focuses on efficient and adaptable query evaluation strategies for the grid environment.

TESTBED: It supports the pre-processing stage of a scientific visualization application (SVA) at the National Laboratory of Scientific Computing (LNCC), Brazil.

OBJECTIVES

SOLUTIONS

RESULT

FOCUS ON ADAPTIVE PROBLEM


slide4

PROJECT CoDIMS-G

OBJECTIVES

(1) Dynamic scheduling and allocation of query execution engine modules onto grid nodes

(2) Adaptability of query execution to variations in environment conditions

(3) Support for special scientific operations


slide5

PROJECT CoDIMS-G


SOLUTIONS

Using the processing power available in a grid environment may substantially reduce the time needed for pre-processing virtual particle trajectories.

(1) A new node scheduling algorithm that selects grid nodes for parallel evaluation

(2) An extension of the Eddy operator


slide6

PROJECT CoDIMS-G


RESULT

Reduction of the scheduling time


slide7

PROJECT CoDIMS-G


FOCUS ON ADAPTIVE PROBLEM

The goal is to adapt the execution of an application to the changing conditions of the selected grid nodes.

The problem in this context is to identify points where execution may be interrupted on one node and restarted on other nodes.


slide8

ABSTRACT DB

The Geometry relation stores data associated with each polyhedron's geometry:

Geometry (id, time-instant, polyhedron<point>, velocity<point-velocity>)

The Particle relation holds the initial particle positions:

Particle (part-id, time-instant, point)

The Resulting-vector user program computes the resulting speed vector at a specific position of the flow path:

Resulting-vector (position, polyhedron<point>, velocity<point-velocity>): velocity

The Trajectory Computing Program (TCP) computes a virtual particle's (VP's) subsequent position:

TCP (particle-id, position, velocity): new-position

The velocity relation corresponds to the velocity vectors for each time instant.
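As a minimal sketch, the relations and program signatures above could be modeled in Python as follows. The type aliases (`Point`, `Vector`), the field names, the averaging used in `resulting_vector` and the Euler step used in `tcp` are illustrative assumptions; the slides give only the signatures, not the computations.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]    # assumed 3-D coordinates
Vector = Tuple[float, float, float]   # assumed 3-D velocity


@dataclass
class Geometry:
    """Geometry(id, time-instant, polyhedron<point>, velocity<point-velocity>)."""
    id: int
    time_instant: int
    polyhedron: List[Point]   # vertices of the polyhedron
    velocity: List[Vector]    # one velocity vector per vertex


@dataclass
class Particle:
    """Particle(part-id, time-instant, point)."""
    part_id: int
    time_instant: int
    point: Point


def resulting_vector(position: Point, polyhedron: List[Point],
                     velocity: List[Vector]) -> Vector:
    """Placeholder for the Resulting-vector program: it simply averages the
    vertex velocities (an assumption; the real interpolation is not shown)."""
    n = len(velocity)
    return tuple(sum(v[i] for v in velocity) / n for i in range(3))


def tcp(particle_id: int, position: Point, velocity: Vector,
        dt: float = 1.0) -> Point:
    """Placeholder for TCP: one explicit Euler step (an assumption)."""
    return tuple(p + dt * v for p, v in zip(position, velocity))
```

Each query iteration would then call `resulting_vector` for the polyhedron containing a particle and feed the result to `tcp` to obtain the next position.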


slide9

ARCHITECTURE OF CoDIMS-G

Client Interface: user requests are forwarded to the Control component.

Control Component: the core of the CoDIMS environment. It stores, manages, validates and verifies an instance configuration, and sends user requests to the query processing system.

Parser: transforms the user's request into a query graph (QG) representation.

Query Optimizer (QO): receives the graph and generates a physical distributed query execution plan (DQEP), using a cost model based on data and program statistics stored in the Metadata Manager (MM).

Scheduler (SC): the optimizer calls the Scheduler, which indicates the set of nodes of interest to be allocated for the parallelized operator. The scheduler and the optimizer cooperate to generate an initial distributed parallel query execution plan (DQEP).

Query Execution Manager (QEM): deploys the query execution engine (QE) services at the nodes specified in the DQEP and manages their life cycle during query execution. The QEM also manages the real-time performance of the QEs.

Query Engine (QE): the component where actual query execution takes place. QE instances are created on the scheduled grid nodes. Each QE receives a fragment of the DQEP and is responsible for controlling its execution.

[Figure: CoDIMS-G architecture, showing the Control Component, Metadata Manager, Parser Component, Query Optimizer, Scheduler Component, Query Execution Manager, and Query Engines 1 to n]


slide10

DISTRIBUTED QUERY PROCESSING

We express a query as a query graph QG, defined as a partially ordered set of operators $QG = \{\Omega, \prec\}$, where $\Omega$ is a set of algebraic operators and $\prec$ is a set of dependency relations. If $w_1 \prec w_2$, with $w_1, w_2 \in \Omega$ and $w_1 \neq w_2$, then $w_2$ succeeds $w_1$ in a bottom-up navigation of the DQEP, and not $(w_2 \prec w_1)$.

The optimization algorithm explores the search space of valid plans, in accordance with data dependency restrictions. It considers all valid execution orders of expensive operators allowed by the QG edges.
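To illustrate what "all valid execution orders" means for a partially ordered set of operators, here is a small sketch that enumerates every order compatible with a set of dependencies. The function name `valid_orders`, the dictionary encoding of the dependencies and the operator names in the usage note are hypothetical; the optimizer's actual search procedure is not described in the slides.

```python
from typing import Dict, Iterator, List, Set


def valid_orders(operators: List[str],
                 deps: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield every operator order in which each operator appears only after
    all operators it depends on. deps[w2] = {w1, ...} means w1 precedes w2."""

    def extend(prefix: List[str], remaining: Set[str]) -> Iterator[List[str]]:
        if not remaining:
            yield list(prefix)
            return
        for op in sorted(remaining):
            # op is ready once all of its predecessors are already placed
            if deps.get(op, set()) <= set(prefix):
                prefix.append(op)
                yield from extend(prefix, remaining - {op})
                prefix.pop()

    yield from extend([], set(operators))
```

For example, with operators `scan`, `filter`, `rv` and `tcp`, where `filter` and `rv` depend on `scan` and `tcp` depends on `rv`, the generator yields three orders, all starting with `scan` and all placing `rv` before `tcp`.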

ALTERNATIVES

WHY


slide11

DISTRIBUTED QUERY PROCESSING

ALTERNATIVES

(a) No parallelization

(b) Scheduling according to the G2N (GridGreedyNode) algorithm

(c) Adoption of the same parallelization strategy used by the previous operator in the query execution plan

A cost is associated with each computed query execution plan, using a parallel pipeline cost function. The DQEP with the lowest cost is selected for execution.
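The selection step can be sketched as an exhaustive enumeration of one strategy per operator followed by a minimum-cost pick. The slides do not give the parallel pipeline cost function, so in this sketch it is a caller-supplied parameter; the names `STRATEGIES` and `cheapest_plan` are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

# The three alternatives listed above, one to be chosen per operator.
STRATEGIES = ("none", "g2n", "same-as-previous")

Plan = Tuple[Tuple[str, str], ...]   # ((operator, strategy), ...)


def cheapest_plan(operators: Sequence[str],
                  cost: Callable[[Plan], float]) -> Tuple[Optional[Plan], float]:
    """Enumerate one strategy per operator and return the plan with the
    lowest cost according to the supplied cost function."""
    best_plan, best_cost = None, float("inf")
    for choice in product(STRATEGIES, repeat=len(operators)):
        plan = tuple(zip(operators, choice))
        c = cost(plan)
        if c < best_cost:
            best_plan, best_cost = plan, c
    return best_plan, best_cost
```

With n operators this explores 3^n candidate plans, which is only reasonable for the small number of expensive operators a scientific query typically contains.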


slide12

DISTRIBUTED QUERY PROCESSING


WHY

This strategy guarantees that costly programs are invoked only after all predicates have been evaluated, which may reduce the number of tuples they have to process.
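A toy illustration of this ordering, assuming predicates are cheap Boolean functions and the costly program is an ordinary function (all names here are hypothetical): the call counter shows how many tuples actually reach the expensive step.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def predicates_first(tuples: Iterable[int],
                     predicates: Sequence[Callable[[int], bool]],
                     costly: Callable[[int], int]) -> Tuple[List[int], int]:
    """Apply every predicate before the costly program and report how many
    times the costly program was invoked."""
    results: List[int] = []
    calls = 0
    for t in tuples:
        # all() short-circuits, so a tuple is dropped at the first failing predicate
        if all(p(t) for p in predicates):
            calls += 1
            results.append(costly(t))
    return results, calls
```

With ten input tuples and predicates that keep only even values greater than 3, the costly function runs three times instead of ten.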


slide13

IMPLEMENTATION

GridGreedyNode (G2N) algorithm

G2N (throughput(tp1, tp2, …, tpn), number-tasks): result
  nodelist := descending order(throughput);
  result := result ∪ {nodelist(1)};
  cost(1) := number-tasks * nodelist(1);
  current-cost := cost(1);
  While (nodes in the list and add-new-node)
    total-cost := current-cost;
    new-node := next-node in nodelist;
    While (current-cost <= total-cost)
      move tuples from lowest node in result to new-node;
      Update costs of nodes and total-cost;
      If current-cost > total-cost
        If we could move at least 1 tuple to the new-node
          result := result ∪ {new-node}
        else
          add-new-node := false;
        Stop loop;
    endwhile
  endwhile
  output result;

The G2N algorithm receives a set of available nodes with their corresponding average throughputs (tp1, tp2, …, tpn), measured in tuples per second, and the total estimated number of tasks (T) to be evaluated.

The algorithm sorts the list of available grid nodes in decreasing order of their average throughput values. It then allocates all T tuples to the fastest node.

The loop tries to move tuples from the already allocated nodes to a new grid node. As long as this produces a new evaluation estimate that reduces the query elapsed time, the new node is accepted. Once the estimated elapsed time becomes higher than the last one computed, the algorithm stops and outputs the grid nodes accepted so far.

OUTPUT: the result is used by the Query Optimizer for the initial query execution plan, and for the re-scheduling of allocated nodes in the face of variations in estimated values.
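The greedy idea can be sketched in Python as below. This is one possible reading, not the authors' implementation: it assumes the cost of a node is its estimated finishing time (tuples divided by throughput), that the query elapsed time is the finishing time of the slowest allocated node, that tuples are moved one at a time from the node that currently finishes last, and that all throughputs are positive. The function name `g2n` and its return format are also assumptions.

```python
from typing import Dict, List


def g2n(throughputs: List[float], number_tasks: int) -> Dict[int, int]:
    """Return {node index: tuples allocated} for the accepted grid nodes."""
    # Nodes in decreasing order of throughput (tuples per second).
    order = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    alloc = {order[0]: number_tasks}      # all tuples start on the fastest node

    def finish(node: int, tuples: int) -> float:
        return tuples / throughputs[node]  # assumed cost: estimated seconds

    def elapsed(a: Dict[int, int]) -> float:
        return max(finish(n, t) for n, t in a.items())

    for new in order[1:]:
        trial = dict(alloc)
        trial[new] = 0
        best = elapsed(trial)
        while True:
            # Take one tuple from the node that currently finishes last.
            donor = max((n for n in trial if n != new and trial[n] > 0),
                        key=lambda n: finish(n, trial[n]), default=None)
            if donor is None:
                break
            trial[donor] -= 1
            trial[new] += 1
            cost = elapsed(trial)
            if cost > best:                # estimate got worse: undo and stop
                trial[donor] += 1
                trial[new] -= 1
                break
            best = cost
        if trial[new] == 0 or best >= elapsed(alloc):
            break                          # the new node does not help: stop
        alloc = trial
    return alloc
```

With throughputs of 10 and 5 tuples per second and 30 tuples, the sketch splits the work 20/10 so that both nodes finish after about 2 seconds. A much slower second node is rejected and all tuples stay on the fastest node.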


slide14

ADAPTIVE QUERY EXECUTION - QEEF

Query Execution Engines (QEE) support the execution of traditional queries.

QEEF (Query Execution Engine Framework) is an extensible QEE that adapts to new execution models by implementing each execution model as a combination of execution modules.

SIMULATION

ANALYSIS ON BLOCK SIZE


slide15

ADAPTIVE QUERY EXECUTION - QEEF

SIMULATION

[Figure: simulated execution plan. An Eddy operator with SPLIT and MERGE modules, connected to the remote nodes through four SEND/RECEIVE pairs]


slide16

ADAPTIVE QUERY EXECUTION - QEEF


ANALYSIS ON BLOCK SIZE

Block size is an important tool for building adaptivity into the system. The Eddy modifies a remote node's block size in the following scenarios:

1- A timeout occurs (estimated time exceeded)

2- The Eddy performs a local adaptation (checking current throughput values)

3- The set of scheduled nodes varies

4- When 2/3 of the tuples have been evaluated:

- the dataflow is reduced

- the Eddy recomputes the number of scheduled nodes

- the number of tuples in each node is increased
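The slides do not state the actual update rule, so the following is only a sketch of how such triggers might translate into code. The halving on timeout, the sizing of a block as observed throughput times a target duration, and both function names are assumptions made for illustration.

```python
def next_block_size(current: int, observed_tps: float, target_secs: float,
                    timed_out: bool, minimum: int = 1) -> int:
    """Pick the next block size for a remote node.

    Assumed policy: back off by half after a timeout; otherwise size the block
    so that the node needs about target_secs at its observed throughput."""
    if timed_out:
        return max(minimum, current // 2)
    return max(minimum, round(observed_tps * target_secs))


def two_thirds_done(evaluated: int, total: int) -> bool:
    """True once at least 2/3 of the tuples have been evaluated, the point
    at which the Eddy recomputes the number of scheduled nodes."""
    return 3 * evaluated >= 2 * total
```

The integer comparison in `two_thirds_done` avoids floating-point rounding at the threshold.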


slide17

SCIENTIFIC APPLICATIONS

The QEEF framework has been extended with:

- execution of user programs (through the Apply operator strategy)

- spatial and temporal hash-joins (implementing the iterator interface)

- loop control over a query execution plan fragment (which is evaluated repeatedly)

INITIAL RESULT


slide18

SCIENTIFIC APPLICATIONS

INITIAL RESULT

The project configuration:

- Java 1.4.2 and Globus 3.2.1

- 20 Pentium IV 1.7 GHz processors with 256 MB of RAM, running Linux 2.4.20-31.9

We considered an instance with 1000 particles, executing 25 iterations for each particle.

We then obtained results while increasing the number of nodes from 1 to 25.

Results: a gain of up to 11 times with 20 machines, with respect to a centralized execution (with 2.7 tuples per second).
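As a quick back-of-the-envelope check on these figures, the sketch below derives the workload size and the parallel efficiency implied by the reported gain. Efficiency is the standard definition (speedup divided by number of machines); neither derived number appears in the slides.

```python
def parallel_efficiency(speedup: float, machines: int) -> float:
    """Standard definition: fraction of ideal linear speedup achieved."""
    return speedup / machines


particles, iterations = 1000, 25
total_evaluations = particles * iterations        # 25000 trajectory steps

efficiency = parallel_efficiency(11, 20)          # 0.55, i.e. 55% of linear
```

A speedup of 11 on 20 machines therefore corresponds to roughly 55% of ideal linear scaling.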

Observation: the block size update strategy proved to be very useful.


slide19

CONCLUSION

CoDIMS-G is an adaptive distributed query processing grid service.

The proposed query execution strategy extends the Eddy adaptive query execution model to the grid environment, taking into account variations in the run-time conditions of grid nodes.


slide20

Paul Horn, senior vice president, IBM Research:

“The information-technology industry loves to prove the impossible possible”

Thank you!
