
A Survey of Techniques for Designing I/O-Efficient Algorithms


Presentation Transcript


  1. A Survey of Techniques for Designing I/O-Efficient Algorithms S. Fahimeh Moosavi, Fall 1389

  2. Basic Techniques • Scanning – N/B I/Os for a linear scan of the whole array. • Sorting – O((N/B) log_{M/B}(N/B)) I/Os.
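
A small sketch (not from the slides) that plugs hypothetical values of N, M and B into these two bounds — scan(N) and sort(N) in the notation used later — just to show how the resulting I/O counts compare:

```python
import math

def scan_ios(N, B):
    # scan(N): one block transfer per B consecutive items
    return math.ceil(N / B)

def sort_ios(N, M, B):
    # sort(N): (N/B) * log_{M/B}(N/B), the external merge-sort bound
    n_blocks = math.ceil(N / B)
    fan_in = M // B                               # number of blocks that fit in memory
    passes = max(1, math.ceil(math.log(n_blocks, fan_in)))
    return n_blocks * passes

# Hypothetical parameters, counted in items: N = 2^27, M = 2^23, B = 2^10
N, M, B = 2**27, 2**23, 2**10
print(scan_ios(N, B))      # 131072 block transfers for one scan
print(sort_ios(N, M, B))   # 262144: sorting needs only two passes over the data
```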

  3. Simulation of Parallel Algorithms in External Memory

  4. PRAM [Parallel Random Access Machine] • p processors, each with local memory • Each processor has a unique id in the range 0, …, p−1 • Shared-memory reads and writes

  5. At each unit of time, a processor is either active or idle (depending on its id) • At each time step, all processors may execute different instructions on different data. Note: any pair of processors Pi, Pj can communicate in constant time! Pi writes the message into cell x at time t; Pj reads the message from cell x at time t+1. Measures of performance: 1. Running time. 2. Amount of work it performs.

  6. PRAM algorithm A uses: • N processors • O(N) space Running time: O(T(N)) Assumption: every computation step of a processor consists of a constant number of read/write accesses to shared memory.

  7. Simulating one step of algorithm A • Scan the list of processor contexts to extract the read requests issued in this step. • Sort the resulting list of read requests by the memory locations they access. • Scan the sorted list of read requests and the memory representation together to answer the requests. • Sort the list of read requests again, by the issuing processor. • Scan the sorted list of read requests and the list of processor contexts to transfer the requested operands to each processor.
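
A minimal in-memory sketch (my own illustration, not from the slides) of this sort/scan pattern for serving the read requests of one simulated PRAM step; Python's sort() stands in for the external-memory sort, and the data structures are simplified stand-ins for the processor contexts and the shared-memory representation:

```python
def simulate_read_phase(contexts, memory):
    """contexts: list of (processor_id, address_to_read); memory: dict address -> value."""
    # 1. Scan the processor contexts and extract one read request per processor.
    requests = [(addr, pid) for pid, addr in contexts]
    # 2. Sort the read requests by the memory location they access.
    requests.sort(key=lambda r: r[0])
    # 3. Scan the sorted requests together with the memory representation
    #    to attach the requested operand to each request.
    answered = [(pid, addr, memory[addr]) for addr, pid in requests]
    # 4. Sort the answered requests by the issuing processor.
    answered.sort(key=lambda r: r[0])
    # 5. Scan them together with the processor contexts to deliver the operands.
    return {pid: value for pid, addr, value in answered}

# toy usage
memory = {0: 'a', 1: 'b', 2: 'c'}
contexts = [(0, 2), (1, 0), (2, 1)]
print(simulate_read_phase(contexts, memory))   # {0: 'c', 1: 'a', 2: 'b'}
```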

  8. O(1) scans of the list of processor contexts. • O(1) scans of the representation of the shared memory. • A constant number of scans and sorts of the list of read/write requests. • All these lists have size O(N). Consequence: simulating one step of algorithm A takes O(sort(N)) I/Os. Theorem 3.2. A PRAM algorithm that uses N processors and O(N) space and runs in time T(N) can be simulated in O(T(N) · sort(N)) I/Os.

  9. Time-Forward Processing

  10. Evaluating a DAG G L: an assignment of labels L(v) to the vertices of G. Goal: compute another labelling S of the vertices of G so that for every vertex v ∈ G, S(v) can be computed from L(v) and S(u1), …, S(uk) (u1, …, uk: the in-neighbors of v).

  11. Expression-tree evaluation Input: a binary tree T whose leaves store real numbers and whose internal vertices store binary operations. If v is a leaf, then val(v) = the number stored at v. If v is an internal vertex with operation ∘, left child x and right child y, then val(v) = val(x) ∘ val(y).

  12. Evaluating a DAG G I/O-efficiently Two assumptions: • the vertices of G have to be stored in topologically sorted order. • label S(v) has to be computable from labels L(v) and S(u1), …, S(uk) in O(sort(k)) I/Os.

  13. Insert and deletemin operations on the priority queue Q can be performed in O((1/B) log_{M/B}(|E|/B)) I/Os. • Total number of priority-queue operations: O(|E|) (every edge is inserted into and deleted from Q exactly once). Consequence: all priority-queue updates take O(sort(|E|)) I/Os.

  14. Note: • Scanning the vertex set and the adjacency lists takes O(scan(|V| + |E|)) I/Os. • Computing the labels S(v) from L(v) and S(u1), …, S(uk), for all v ∈ G, takes O(sort(|E|)) I/Os. Theorem 3.3. Given a DAG G=(V,E) whose vertices are stored in topologically sorted order, graph G can be evaluated in O(sort(|V|+|E|)) I/Os, provided that the computation of the label of every vertex v ∈ G can be carried out in O(sort(deg−(v))) I/Os, where deg−(v) is the in-degree of vertex v.
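
A minimal in-memory sketch of time-forward processing (my own illustration, with hypothetical vertex names): vertices are visited in topologically sorted order, and every vertex forwards its computed label S(v) to its out-neighbors through a priority queue keyed on the receiving vertex; heapq stands in for the I/O-efficient priority queue. The expression tree of slide 11 serves as the DAG being evaluated:

```python
import heapq
import operator

def time_forward_process(vertices, out_edges, compute):
    """vertices: ids in topologically sorted order; out_edges[v]: out-neighbors of v;
    compute(v, inputs): computes S(v) from {in_neighbor: S(in_neighbor)}."""
    queue, labels = [], {}
    for v in vertices:
        # Pop every label that earlier vertices addressed to v.
        inputs = {}
        while queue and queue[0][0] == v:
            _, sender, label = heapq.heappop(queue)
            inputs[sender] = label
        labels[v] = compute(v, inputs)
        # Forward S(v) to every out-neighbor of v (its "future" in the topological order).
        for w in out_edges.get(v, []):
            heapq.heappush(queue, (w, v, labels[v]))
    return labels

# Expression tree (3 + 4) * (10 - 6); leaves 1-4, internal vertices 5, 6, root 7.
values = {1: 3.0, 2: 4.0, 3: 10.0, 4: 6.0}
nodes = {5: (operator.add, 1, 2), 6: (operator.sub, 3, 4), 7: (operator.mul, 5, 6)}
out_edges = {1: [5], 2: [5], 3: [6], 4: [6], 5: [7], 6: [7]}

def compute(v, inputs):
    if v in values:
        return values[v]
    op, left, right = nodes[v]
    return op(inputs[left], inputs[right])

print(time_forward_process(range(1, 8), out_edges, compute)[7])  # 28.0
```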

  15. Greedy Graph Algorithm

  16. We call a vertex labelling algorithm A: • single-pass: if it computes the desired labelling of the vertices of the graph by visiting every vertex exactly once. • local: if label L(v) can be computed in O(sort(k)) I/Os from the labels L(u1), …, L(uk), where u1, …, uk are the neighbors of v whose labels are computed before L(v). • presortable: if there is an algorithm that takes O(sort(|V|+|E|)) I/Os to compute an order of the vertices of the graph so that A produces a correct result if it visits the vertices of the graph in this order.

  17. Main problems in making algorithm A I/O-efficient: • determine an order in which algorithm A should visit the vertices of the graph. • devise a mechanism that provides every vertex v with the labels of its previously visited neighbors.

  18. Theorem 3.4. Every graph problem P that can be solved by a presortable local single-pass vertex labelling algorithm can be solved in O(sort(|V|+|E|)) I/Os. Proof: A: a presortable local single-pass vertex labelling algorithm. L: a labelling of the vertices of a graph G=(V,E). A′: an algorithm that takes O(sort(|V|+|E|)) I/Os to compute an order of the vertices of G (numbering the vertices). G′: the DAG derived from G by directing every edge from the vertex with the smaller number to the vertex with the larger number. Hence, labelling L can be computed using time-forward processing.

  19. Computing a Maximal Independent Set In internal memory: process the vertices in an arbitrary order. When a vertex v ∈ V is visited, add it to S if none of its neighbors is in S. Translated into a labelling problem: X_S: V → {0,1} with X_S(v) = 1 if v ∈ S and X_S(v) = 0 if v ∉ S. Theorem 3.5. Given an undirected graph G=(V,E), a maximal independent set of G can be found in O(sort(|V|+|E|)) I/Os and linear space.
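
A minimal in-memory sketch of this labelling formulation (not from the slides): the vertices are visited in a fixed order and X_S(v) is computed only from the X_S-labels of previously visited neighbors, which is exactly the information time-forward processing would deliver:

```python
def maximal_independent_set(vertices, adj):
    """vertices: any fixed visiting order; adj[v]: neighbors of v."""
    x = {}                      # x[v] = 1 if v is put into S, else 0
    position = {v: i for i, v in enumerate(vertices)}
    for v in vertices:
        earlier = [u for u in adj[v] if position[u] < position[v]]
        # v joins S iff none of its previously visited neighbors is already in S
        x[v] = 1 if all(x[u] == 0 for u in earlier) else 0
    return {v for v in vertices if x[v] == 1}

# toy usage: a path 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(maximal_independent_set([0, 1, 2, 3], adj))   # {0, 2}
```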

  20. Coloring Graphs of Bounded Degree In internal memory: process the vertices in an arbitrary order. When a vertex v ∈ V is visited, assign to v a color c(v) ∈ {1, …, Δ+1} that has not been assigned to any neighbor of v. Theorem 3.6. Given an undirected graph G=(V,E) whose vertices have degree at most Δ, a (Δ+1)-coloring of G can be found in O(sort(|V|+|E|)) I/Os and linear space.
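
A matching sketch for the (Δ+1)-coloring (again my own illustration): each vertex takes the smallest color not used by its previously visited neighbors; since at most Δ neighbors can block colors, a color in {1, …, Δ+1} always remains:

```python
def greedy_coloring(vertices, adj):
    """vertices: any fixed visiting order; adj[v]: neighbors of v (max degree Delta)."""
    color, position = {}, {v: i for i, v in enumerate(vertices)}
    for v in vertices:
        used = {color[u] for u in adj[v] if position[u] < position[v]}
        # smallest color in {1, 2, ...} not taken by an earlier neighbor
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# toy usage: a 4-cycle 0-1-2-3-0 (Delta = 2, so 3 colors suffice)
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(greedy_coloring([0, 1, 2, 3], adj))   # {0: 1, 1: 2, 2: 1, 3: 2}
```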

  21. Application: coloring graphs of bounded degree A graph G of maximum degree 3 is to be colored with 3+1 colors. -choose an arbitrary order of the vertices. [Figure: an example graph with vertices 0–6.]

  22. Application (continued) -assign to each visited vertex a color that has not been chosen by any of its neighbors. [Figure: the same graph with the first vertices colored.]

  23.–26. Application (continued) -to choose a color for a vertex, order the colors of its already-colored neighbors and assign the first unused color to it. [Figures: successive animation frames in which the remaining vertices of the example graph receive their colors.]

  27. List Ranking and the Euler Tour Technique

  28. List Ranking List ranking problem: compute, for every vertex xi of a linked list L, its distance from the head of L (the number of edges on the path from the head of L to xi). Solving it in internal memory: starting at the head of the list, follow successor pointers and number the vertices of the list from 0 to N−1 in the order they are visited.

  29. Generalization of list ranking (prefix products) I/O complexity: O(sort(N)) Input: λ: {x1, …, xN} → X and a binary operator ⊗: X × X → X Output: φ(xi) for each vertex xi of L such that • φ(xσ(1)) = λ(xσ(1)) • φ(xσ(i)) = φ(xσ(i−1)) ⊗ λ(xσ(i)) (1 < i ≤ N), where σ: [1,N] → [1,N] is the permutation such that xσ(1) is the head of L and succ(xσ(i)) = xσ(i+1).

  30. Example [Figure: a list with labels 3 1 5 2 3 1. List ranking assigns the ranks 0 1 2 3 4 5; the generalization (prefix sums of the labels) assigns 3 4 9 11 14 15.]

  31. The internal-memory algorithm is not I/O-efficient [Figure: a list of 16 elements laid out in blocks of four; following successor pointers jumps from block to block.] The internal-memory algorithm spends Ω(N) I/Os in the worst case.

  32. An Efficient List Ranking Algorithm • Find an independent set I of L so that |I| = Ω(N). • Remove the elements of I from L: for every element x ∈ I with predecessor y and successor z in L, let succ(y) = z, multiply the label of x with the label of z, and assign the result to z. • Apply the algorithm recursively to the compressed list. • Compute the ranks of the elements of I by multiplying their labels with the ranks of their predecessors in L.

  33. Example [Figure: the list with labels 3 1 5 2 3 1; after removing an independent set and folding the removed labels into their successors, the compressed list has labels 3 1 7 4 and ranks 3 4 11 15; re-inserting the removed elements yields the final ranks 3 4 9 11 14 15.]

  34. I/O-Complexity • Every step, except the recursive invocation, takes O(sort(N)) I/Os. • Total I/O complexity: I(N) = I(cN) + O(sort(N)) (0 < c < 1). • Solution of this recurrence: O(sort(N)). • Theorem 3.7. A list of length N can be ranked in O(sort(N)) I/Os.
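
An in-memory sketch of this recursive algorithm (my own illustration, with several simplifications: labels are integers combined with +, i.e. the plain ranking instance; the independent set is found by a simple greedy pass instead of the coloring-based method used in external memory; and ordinary dictionaries replace the sorting and scanning steps):

```python
def list_rank(head, succ, label):
    """succ[x]: successor of x (None at the tail); label[x]: weight of x.
    Returns rank[x] = sum of labels from the head up to and including x."""
    nodes = list(label)
    if len(nodes) <= 2:                      # base case: just walk the short list
        rank, x, total = {}, head, 0
        while x is not None:
            total += label[x]
            rank[x] = total
            x = succ[x]
        return rank

    pred = {succ[x]: x for x in nodes if succ[x] is not None}
    # 1. Greedily pick an independent set I of list elements (never the head),
    #    so that no two chosen elements are adjacent in L.
    I = set()
    for x in nodes:
        if x != head and pred.get(x) not in I and succ[x] not in I:
            I.add(x)
    # 2. Remove I from L: bridge the succ pointers and fold each removed label
    #    into the label of its successor.
    succ2 = dict(succ)
    label2 = {x: label[x] for x in nodes if x not in I}
    for x in I:
        y, z = pred[x], succ[x]
        succ2[y] = z
        if z is not None:
            label2[z] = label[x] + label2[z]
        del succ2[x]
    # 3. Recurse on the compressed list.
    rank = list_rank(head, succ2, label2)
    # 4. Rank of each removed element = rank of its predecessor plus its own label.
    for x in I:
        rank[x] = rank[pred[x]] + label[x]
    return rank

# toy usage: the list from slide 30 with labels 3 1 5 2 3 1
succ = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e', 'e': 'f', 'f': None}
label = {'a': 3, 'b': 1, 'c': 5, 'd': 2, 'e': 3, 'f': 1}
print(sorted(list_rank('a', succ, label).items()))
# [('a', 3), ('b', 4), ('c', 9), ('d', 11), ('e', 14), ('f', 15)]
```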

  35. The Euler Tour Technique Euler tour of a tree T: a traversal of T that traverses every edge exactly twice, once in each direction. The tour is represented as a linked list L whose elements are the edges in the set {(v,w), (w,v) : {v,w} ∈ E}, so that for any two consecutive edges e1 and e2, the target of e1 is the source of e2.

  36. Defining an Euler tour • Choose a circular order of the edges incident to each vertex. • Let {v,w1}, …, {v,wk} be the edges incident to vertex v; then let succ((wi,v)) = (v,wi+1) for 1 ≤ i < k and succ((wk,v)) = (v,w1). • To start the traversal at a vertex r, choose an edge (v,r) with succ((v,r)) = (r,w), set succ((v,r)) = null, and use (r,w) as the first edge of the traversal.

  37. Computing the list L Input: a tree T=(V,E) Output: an Euler tour L • Scan the set E to replace every edge {v,w} by the two directed edges (v,w) and (w,v). • Sort the resulting set of directed edges by their target vertices. • Scan the sorted edge list to compute the successor of every edge in L. Lemma 3.8. An Euler tour L of a tree with N vertices can be computed in O(sort(N)) I/Os.
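
A small sketch of these three steps (my own code, not from the slides; Python's in-memory sort plays the role of the external-memory sort):

```python
def euler_tour_successors(edges):
    """edges: undirected edges {v, w} of a tree, given as tuples (v, w).
    Returns succ: directed edge -> next directed edge of an Euler tour."""
    # 1. Replace every undirected edge {v, w} by the directed edges (v, w) and (w, v).
    directed = [(v, w) for v, w in edges] + [(w, v) for v, w in edges]
    # 2. Sort the directed edges by their target vertex.
    directed.sort(key=lambda e: e[1])
    # 3. One scan over the sorted list: the edges (w_1, v), ..., (w_k, v) entering v
    #    are linked to the outgoing edges (v, w_2), ..., (v, w_k), (v, w_1).
    succ = {}
    i = 0
    while i < len(directed):
        j = i
        while j < len(directed) and directed[j][1] == directed[i][1]:
            j += 1
        group = directed[i:j]                       # all edges entering the same vertex v
        for k, (w, v) in enumerate(group):
            w_next = group[(k + 1) % len(group)][0] # circular order around v
            succ[(w, v)] = (v, w_next)
        i = j
    return succ

# toy usage: the star with center 1 and leaves 2, 3
print(euler_tour_successors([(1, 2), (1, 3)]))
# {(2, 1): (1, 3), (3, 1): (1, 2), (1, 2): (2, 1), (1, 3): (3, 1)}
```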

  38. Rooting a Tree Rooting a tree T = computing, for every edge {v,w}, which endpoint is the parent and which is the child. Definition: for every pair of opposite edges (u,v), (v,u) in the ranked Euler tour, we call: forward edge: the edge with the lower rank; back edge: the other.

  39. Algorithm Input: an unrooted (and undirected) tree T and a special vertex r. Output: for each vertex v ≠ r, the parent p(v) of v in the tree rooted at r. • Construct an Euler tour starting with an edge (r,v). • Compute the rank of every edge in the list. For every pair of adjacent vertices x and p(x), edge (p(x),x) is a forward edge, and edge (x,p(x)) is a back edge.

  40. I/O Complexity • Constructing the Euler tour starting at r: O(sort(N)) I/Os. • Ranking the Euler tour: O(sort(N)) I/Os. • Extracting the set of forward edges: O(sort(N)) I/Os. Total I/O complexity: O(sort(N))
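
A continuation of the previous sketch (again my own illustration, reusing euler_tour_successors from the block after slide 37): the circular tour is broken at r, its edges are ranked by simply walking the tour (which stands in for the O(sort(N)) list-ranking step), and the forward edge of each opposite pair identifies the parent:

```python
def root_tree(edges, r):
    """Returns p: vertex -> parent in the tree rooted at r (r itself is omitted)."""
    succ = euler_tour_successors(edges)
    # Break the circular tour so that it starts with an edge leaving r.
    first = next(e for e in succ if e[0] == r)
    last = next(e for e in succ if succ[e] == first)
    # Rank the edges of the tour (here by walking it; externally: list ranking).
    rank, e, i = {}, first, 0
    while True:
        rank[e] = i
        if e == last:
            break
        e, i = succ[e], i + 1
    # Of each pair (v, w), (w, v), the edge with the lower rank is the forward edge,
    # i.e. the edge from the parent to the child.
    p = {}
    for (v, w) in rank:
        if rank[(v, w)] < rank[(w, v)]:
            p[w] = v
    return p

# toy usage: the path 1 - 2 - 3 rooted at 2
print(root_tree([(1, 2), (2, 3)], 2))   # {1: 2, 3: 2}
```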

  41. Computing a Preorder Numbering [Figure: the Euler tour of the example tree with forward edges weighted 1 and back edges weighted 0; the vertices are labelled with the resulting preorder numbers.] A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os: preorder#(r) = 1, preorder#(v) = rank((p(v),v)) + 1.

  42. Computing Subtree Sizes [Figure: the same tree with every node labelled by the size of its subtree.] The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os.

  43. Graph Blocking

  44. Blocking Graphs Goal: lay out graphs on disk so that traversals of paths in these graphs cause as few page faults as possible. Assumptions: • The graph to be stored on disk is static. • The paths are traversed in an online fashion. Measures of performance: • Number of page faults incurred by a path traversal in the worst case. • Amount of space used by the graph representation.

  45. Notes: • In order to store a graph with N vertices, at least N/B blocks are required. • The traversal of a path of length L causes at least L/B page faults. Definition: the storage blow-up of a graph blocking is β if it uses βN/B blocks of storage to store the graph on disk.

  46. Blocking a List Natural approach, β = 1: • Simple traversal in direction 1..N: a traversal of a path of length L causes only L/B page faults. • More complicated traversals (the path may change direction): if M ≥ 2B, keep the last block in memory, so a page fault occurs at most once every B−1 steps; a traversal of a path of length L causes O(L/B) page faults. [Figure: the array 1 2 3 4 … N stored in consecutive blocks.]

  47. The pathological situation M = B An adversary can cause a page fault at every single step by choosing a path p = (v, w, v, w, …) for two adjacent vertices v and w stored in different blocks: whenever vertex v is visited, the block containing v is brought into main memory, thereby evicting the block containing w. [Figure: two adjacent positions v and w on either side of a block boundary.]

  48. Thwarting the adversary’s strategy: choose β = 2 and store in a second array a copy of the array, shifted by an offset of B/2. After a page fault, the visited vertex v is at least B/2 − 1 steps away from the next page fault, since the page handler alternates between the two arrays every time a page fault occurs. Result: traversing a path of length L now incurs at most 2L/B page faults.
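
A small simulation of this β = 2 blocking (my own sketch with a deliberately simplified page handler: M = B, one memory-resident block, and a switch to the other copy on every fault); it merely counts the page faults incurred by the adversarial back-and-forth walk from the previous slide:

```python
def count_faults(path, B):
    """Simulate the beta = 2 blocking with M = B: copy 0 stores the list in blocks
    [kB, (k+1)B); copy 1 stores the same list shifted by B/2. On a page fault the
    handler switches to the block of the other copy that contains the position."""
    def block(copy, pos):
        offset = B // 2 if copy == 1 else 0
        return (copy, (pos + offset) // B)

    in_memory, current_copy, faults = None, 0, 0
    for pos in path:
        if in_memory != block(current_copy, pos):
            current_copy = 1 - current_copy        # alternate between the two arrays
            in_memory = block(current_copy, pos)   # the fetched block covers pos
            faults += 1
    return faults

B = 8
path = [7, 8] * 20              # adversarial walk across a block boundary of copy 0
print(count_faults(path, B))    # 1: copy 1's block covers both positions 7 and 8
```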

  49. Blocking Trees Blocking trees requires some additional restrictions on the tree or on the type of traversal. Consider a tree whose internal vertices have degree M: since main memory holds only M vertices, at most M−1 of the neighbors of a vertex v can reside in memory at the same time as v, so an adversary can always choose the missing neighbor to cause a page fault. Result: for unrestricted traversals, a good blocking of a tree can be achieved if the degree of the vertices of the tree is bounded by some constant d.

  50. Constructing the layout Choose one vertex r of T as the root and construct two partitions of the vertices into layers of height log_d B. The i-th layer contains: • Partition 1: all vertices at distance (i−1)·log_d B … i·log_d B − 1 from r. • Partition 2: all vertices at distance (i−1/2)·log_d B … (i+1/2)·log_d B − 1 from r. [Figure: the two layer partitions, each of height log_d B, shifted against each other by half a layer.]
