### Cluster Computing and Datalog

Recursion Via Map-Reduce

Seminaïve Evaluation

Re-engineering Map-Reduce for Recursion

Acknowledgements

- Joint work with Foto Afrati
- Alkis Polyzotis and Vinayak Borkar contributed to the architecture discussions.

Implementing Datalog via Map-Reduce

- Joins are straightforward to implement as a round of map-reduce.
- Likewise, union/duplicate-elimination is a round of map-reduce.
- But implementation of a recursion can thus take many rounds of map-reduce.
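As a rough illustration of the first two bullets, the sketch below simulates one map-reduce round in plain Python and uses it once for a two-way join and once for duplicate elimination. The helper names (`map_reduce`, `join_round`, `dedup_round`) and the sample relations are assumptions made for this example, not part of the talk.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run one simulated round: map, group by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out

def join_round(r, s):
    """q(W,Y) :- r(W,X) & s(X,Y), keyed on the shared variable X."""
    tagged = [("r", t) for t in r] + [("s", t) for t in s]

    def mapper(rec):
        tag, (a, b) = rec
        # r(W,X) is keyed by its second column, s(X,Y) by its first.
        yield (b, (tag, a)) if tag == "r" else (a, (tag, b))

    def reducer(x, values):
        ws = [v for tag, v in values if tag == "r"]
        ys = [v for tag, v in values if tag == "s"]
        return [(w, y) for w in ws for y in ys]

    return map_reduce(tagged, mapper, reducer)

def dedup_round(tuples):
    """Union/duplicate elimination: the tuple itself is the key."""
    return map_reduce(tuples, lambda t: [(t, None)], lambda t, _: [t])

r = [(1, 2), (3, 2)]
s = [(2, 5), (2, 5), (2, 6)]
joined = join_round(r, s)              # may contain duplicates
result = sorted(dedup_round(joined))   # a second round removes them
```

Each pass of a recursive rule would need at least one such join round followed by a duplicate-elimination round, which is where the "many rounds" in the last bullet come from.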

Seminaïve Evaluation

- Specific combination of joins and unions.
- Example: chain rule

q(W,Z) :- r(W,X) & s(X,Y) & t(Y,Z)

- Let r, s, t = “old” relations; r’, s’, t’ = incremental relations.
- Simplification: assume |r’| = a|r|, etc.

A 3-Way Join Using Map-Reduce

q(W,Z) :- r(W,X) & s(X,Y) & t(Y,Z)

- Use k compute nodes.
- Give X and Y shares to determine the reduce-task that gets each tuple.
- Optimum strategy replicates r and t, not s, using communication |s| + 2√(k|r||t|).
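The cost figure can be checked with a small sketch. It assumes the reducers form a b × c grid with bc = k, where b is the share of X and c the share of Y: r(W,X) is then copied to c reducers, t(Y,Z) to b reducers, and s(X,Y) to exactly one. Minimizing |r|c + |t|b subject to bc = k gives b = √(k|r|/|t|). The function names and the use of Python's built-in `hash` are illustrative assumptions.

```python
import math
from collections import Counter

def optimal_shares(k, size_r, size_t):
    """Real-valued shares (b, c) with b*c = k minimizing |r|c + |t|b."""
    b = math.sqrt(k * size_r / size_t)
    return b, k / b

def communication(size_r, size_s, size_t, b, c):
    # r is copied to c reducers, t to b reducers, s to exactly one.
    return size_r * c + size_s + size_t * b

def route(r, s, t, b, c, h=hash):
    """Count the tuples shipped to each reducer of a b x c grid."""
    load = Counter()
    for (w, x) in r:
        for j in range(c):            # Y-bucket unknown: send to all c
            load[(h(x) % b, j)] += 1
    for (x, y) in s:
        load[(h(x) % b, h(y) % c)] += 1
    for (y, z) in t:
        for i in range(b):            # X-bucket unknown: send to all b
            load[(i, h(y) % c)] += 1
    return load
```

In practice the shares must be integers, so the real-valued optimum is only a guide for choosing b and c.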

Seminaïve Evaluation – (2)

- Need to compute sum (union) of seven terms (joins): rst’+rs’t+r’st+rs’t’+r’st’+r’s’t+r’s’t’
- Obvious method for computing a round of seminaïve evaluation:
- Replicate r and r’; replicate t and t’; do not replicate s or s’.
- Communication = (1+a)(|s| + 2√(k|r||t|))
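A minimal sketch of the seven-term union, assuming relations are Python sets of pairs and writing `dr`, `ds`, `dt` for r’, s’, t’ (these names are chosen here for illustration). It enumerates every old/incremental combination except the all-old term rst.

```python
from itertools import product

def join3(r, s, t):
    """q(W,Z) :- r(W,X) & s(X,Y) & t(Y,Z) as a set of (W,Z) pairs."""
    return {(w, z) for (w, x) in r for (x2, y) in s if x == x2
                   for (y2, z) in t if y == y2}

def seminaive_delta(r, s, t, dr, ds, dt):
    """Union of the seven joins that use at least one incremental relation."""
    out = set()
    for use_new in product([False, True], repeat=3):
        if not any(use_new):
            continue  # skip r s t: it was computed in earlier rounds
        out |= join3(dr if use_new[0] else r,
                     ds if use_new[1] else s,
                     dt if use_new[2] else t)
    return out
```

The old result together with this delta equals the join of the full relations r ∪ r’, s ∪ s’, t ∪ t’, which is what makes the incremental computation sufficient.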

Seminaïve Evaluation – (3)

- There are many other ways we might use k nodes to do the same task.
- Example: one group of nodes does (r+r’)s’(t+t’); a second group does r’s(t+t’); the third group does rst’.
- Theorem: no grouping does better than the obvious method for this example.

Networks of Processes for Recursions

- Is it possible to do a recursion without multiple rounds of map-reduce and their associated communication cost?
- Note: tasks do not have to be Map or Reduce tasks; they can have other behaviors.

Example: Very Simple Recursion

p(X,Y) :- e(X,Z) & p(Z,Y);

p(X,Y) :- p0(X,Y);

- Use k compute nodes.
- Hash Y-values to one of k buckets h(Y).
- Each node gets a complete copy of e.
- p0 is distributed among the k nodes, with p0(x,y) going to node h(y).

Example – Continued

p(X,Y) :- e(X,Z) & p(Z,Y)

- Each node applies the recursive rule and generates new tuples p(x,y).
- Key point: since new tuples have a Y-value that hashes to the same node, no communication is necessary.
- Duplicates are eliminated locally.
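The no-communication property can be seen in a small simulation. This sketch assumes relations are sets of pairs and uses Python's `hash` modulo k as h; `local_closure` and `run_cluster` are names invented for the example.

```python
def local_closure(e, p_local):
    """Apply p(X,Y) :- e(X,Z) & p(Z,Y) to a fixpoint on one node."""
    p = set(p_local)
    frontier = set(p_local)
    while frontier:
        new = {(x, y) for (x, z) in e for (z2, y) in frontier if z == z2} - p
        p |= new          # duplicates are eliminated locally
        frontier = new
    return p

def run_cluster(e, p0, k, h=hash):
    # Partition p0 by h(Y); every node receives a complete copy of e.
    parts = [set() for _ in range(k)]
    for (x, y) in p0:
        parts[h(y) % k].add((x, y))
    # Each node runs independently; no tuple ever changes node, because
    # the rule copies Y unchanged from the body to the head.
    return [local_closure(e, part) for part in parts]
```

The union of the per-node results is the full relation p, and each node only ever holds tuples whose Y-value hashes to it.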

Harder Case of Recursion

- Consider a recursive rule

p(X,Y) :- p(X,Z) & p(Z,Y)

- Responsibility divided among compute nodes by hashing Z-values.
- Node n gets tuple p(a,b) if either h(a) = n or h(b) = n.

[Figure: the compute node for h(Z) = n. It receives p(a,b) if h(a) = n or h(b) = n, remembers all received tuples (eliminating duplicates), and searches for matches. Each p(c,d) produced is sent to the nodes for h(c) and h(d).]
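The node behavior for this rule can be sketched as a single-process simulation with one shared message queue. The queue, the `sent` counter and the function name are assumptions made for illustration; a real system would run the k nodes concurrently.

```python
from collections import deque

def nonlinear_closure(p0, k, h=hash):
    """Simulate k nodes evaluating p(X,Y) :- p(X,Z) & p(Z,Y)."""
    seen = [set() for _ in range(k)]     # tuples each node remembers
    inbox = deque()
    sent = 0

    def send(tup):
        nonlocal sent
        a, b = tup
        for n in {h(a) % k, h(b) % k}:   # node for h(a) and node for h(b)
            inbox.append((n, tup))
            sent += 1

    for tup in p0:
        send(tup)
    while inbox:
        n, (a, b) = inbox.popleft()
        if (a, b) in seen[n]:
            continue                     # duplicate: already handled here
        seen[n].add((a, b))
        # Node n only joins on middle values Z with h(Z) = n.
        produced = set()
        if h(b) % k == n:                # (a,b) plays p(X,Z); find p(Z,Y)
            produced |= {(a, y) for (z, y) in seen[n] if z == b}
        if h(a) % k == n:                # (a,b) plays p(Z,Y); find p(X,Z)
            produced |= {(x, b) for (x, z) in seen[n] if z == a}
        for tup in produced:
            send(tup)
    return set().union(*seen), sent
```

Both tuples of any joinable pair p(x,z), p(z,y) eventually reach the node for h(z), and whichever arrives second triggers the join, so no derivation is missed.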

Comparison with Iteration

- Advantage: Lets us avoid some communication of data that would be needed in iterated map-reduce rounds.
- Disadvantage: Tasks run longer, so they are more likely to fail before finishing.

Node Failures

- To cope with failures, map-reduce implementations rely on each task getting its input at the beginning, and on output not being consumed elsewhere until the task completes.
- But recursions can’t work that way.
- What happens if a node fails after some of its output has been consumed?

Node Failures – (2)

- Actually, there is no problem!
- We restart the tasks of the failed node at another node.
- The replacement task will send some data that the failed task also sent.
- But each node remembers tuples to eliminate duplicates anyway.

Node Failures – (3)

- But the “no problem” conclusion is highly dependent on the Datalog assumption that it is computing sets.
- Argument would fail if we were computing bags or aggregations of the tuples produced.
- Similar problems arise for other recursions, e.g., PDEs.
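A toy example of why set semantics absorbs a restart while bag semantics does not. The receiver below keeps both a set and a multiset of what it is sent; the message list and the failure point are made up for illustration.

```python
from collections import Counter

def consume(messages):
    """A receiver that keeps both a set and a bag of what arrives."""
    as_set, as_bag = set(), Counter()
    for tup in messages:
        as_set.add(tup)       # Datalog semantics: membership only
        as_bag[tup] += 1      # bag semantics: multiplicity matters
    return as_set, as_bag

output = [("a", "b"), ("b", "c"), ("a", "c")]
no_failure = consume(output)
# The task fails after two tuples were consumed; its replacement starts
# over and resends everything.
with_restart = consume(output[:2] + output)
```

The two sets are equal, but the bags differ: the resent tuples are counted twice, which would corrupt any count or sum computed over them.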

Extension of Map-Reduce Architecture for Recursion

- Necessarily, all tasks need to operate in rounds.
- The master controller learns of all input files that are part of the round-i input to task T and records that T has received these files.

Extension – (2)

- Suppose some task S fails, and it never supplies the round-(i +1) input to T.
- A replacement S’ for S is restarted at some other node.
- The master knows that T has received up to round i from S, so it ignores the first i output files from S’.

Extension – (3)

- Master knows where all the inputs ever received by S are from, so it can provide those to S’.
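One way to picture the bookkeeping is the sketch below. The `Master` class, its method names, and the assumption that a replacement task reuses the logical name of the task it replaces are all hypothetical; the slides do not specify an interface. It also assumes each task delivers its rounds in order.

```python
from collections import defaultdict

class Master:
    """Hypothetical controller that tracks per-round files between tasks."""

    def __init__(self):
        self.delivered = defaultdict(int)   # (src, dst) -> last round received
        self.log = defaultdict(list)        # dst -> [(src, round, data)]

    def deliver(self, src, dst, round_no, data):
        """Return True if the file is passed on, False if it is a repeat."""
        if round_no <= self.delivered[(src, dst)]:
            return False                    # dst already has this round
        self.delivered[(src, dst)] = round_no
        self.log[dst].append((src, round_no, data))
        return True

    def inputs_for_replacement(self, task):
        """Everything the failed task ever received, for replay to S'."""
        return list(self.log[task])
```

With this record, a replacement for S can be fed `inputs_for_replacement("S")`, and its regenerated files for rounds 1 through i are dropped by `deliver` before they reach T.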

Checkpointing and State

- Another approach is to design tasks so that they can periodically write a state file, which is replicated elsewhere.
- Tasks take input + state.
- Initially, state is empty.
- Master can restart a task from some state and feed it only inputs received after that state was written.
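A minimal sketch of the input + state idea, assuming the state is simply the set of tuples seen so far and that the master knows the position in the input stream at which the state was written. `Task` and `run_with_checkpoint` are illustrative names.

```python
class Task:
    """Hypothetical task whose state is the set of tuples seen so far."""

    def __init__(self, state=None):
        self.state = set() if state is None else set(state)  # initially empty

    def process(self, tup):
        self.state.add(tup)

def run_with_checkpoint(inputs, checkpoint_at, fail_at):
    """Process inputs, checkpoint once, crash once, then recover.

    Returns the final state and the number of inputs that were replayed.
    """
    task, saved, saved_pos, replayed = Task(), set(), 0, 0
    for i, tup in enumerate(inputs):
        if i == fail_at:
            # Restart from the replicated state; replay only later inputs.
            task = Task(saved)
            for old in inputs[saved_pos:i]:
                task.process(old)
                replayed += 1
        task.process(tup)
        if i + 1 == checkpoint_at:
            saved, saved_pos = set(task.state), i + 1
    return task.state, replayed
```

Without a checkpoint the replacement must replay every input received so far; with one, only the inputs that arrived after the state file was written.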

Example: Checkpointing

p(X,Y) :- p(X,Z) & p(Z,Y)

- Two groups of tasks:
  - Join tasks: hash on Z, using h(Z).
    - Like tasks from previous example.
  - Eliminate-duplicates tasks: hash on X and Y, using h’(X,Y).
    - Receives tuples from join tasks.
    - Distributes truly new tuples to join tasks.

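Example – (2)

[Figure: two ranks of tasks. Join tasks: state has p(x,y) if h(x) or h(y) is right. Dup-elim tasks: state has p(x,y) if h’(x,y) is right. A join task sends each p(a,b) it produces to dup-elim task h’(a,b); the dup-elim task sends p(a,b), if new, to join tasks h(a) and h(b).]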
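The two-rank arrangement can be sketched as a single-process simulation. It assumes the initial tuples enter through the dup-elim rank, uses Python's `hash` for both h and h’, and models the network as two queues; none of these details come from the slides.

```python
from collections import deque

def two_rank_closure(p0, k, h=hash, h2=hash):
    """Join tasks hash on Z; dup-elim tasks hash on the pair (X,Y)."""
    join_state = [set() for _ in range(k)]
    dedup_state = [set() for _ in range(k)]
    to_dedup = deque(p0)          # tuples headed for a dup-elim task
    to_join = deque()             # (join task, tuple) pairs
    while to_dedup or to_join:
        if to_dedup:
            tup = to_dedup.popleft()
            d = h2(tup) % k
            if tup not in dedup_state[d]:        # truly new
                dedup_state[d].add(tup)
                a, b = tup
                for n in {h(a) % k, h(b) % k}:
                    to_join.append((n, tup))
        else:
            n, (a, b) = to_join.popleft()
            join_state[n].add((a, b))
            if h(b) % k == n:     # (a,b) as p(X,Z); match stored p(Z,Y)
                to_dedup.extend((a, y) for (z, y) in join_state[n] if z == b)
            if h(a) % k == n:     # (a,b) as p(Z,Y); match stored p(X,Z)
                to_dedup.extend((x, b) for (x, z) in join_state[n] if z == a)
    return set().union(*dedup_state)
```

Because only truly new tuples leave the dup-elim rank, each join task sees every tuple at most once and does not need its own duplicate check.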

Example – Details

- Each task writes “buffer” files locally, one for each of the tasks in the other rank.
- The two ranks of tasks are run on different racks of nodes, to minimize the probability that tasks in both ranks will fail at the same time.

Example – Details – (2)

- Periodically, each task writes its state (tuples received so far) incrementally and lets the master controller replicate it.
- Problem: the controller can’t be too eager to pass output files on to the tasks that consume them, or the files become tiny.

Future Research

- There is work to be done on optimization, using map-reduce or similar facilities, for restricted SQL such as Datalog, Datalog¬, Datalog + aggregation.
- Check out Hive and Pig, as well as work on multiway join optimization.

Future Research – (2)

- Almost everything is open about recursive Datalog implementation under map-reduce or similar systems.
- Seminaïve evaluation in general case.
- Architectures for managing failures.
- Clustera and Hyrax are interesting examples of (nonrecursive) extension of map-reduce.
- When can we avoid communication as with p(X,Y) :- e(X,Z) & p(Z,Y)?
