Techniques for designing I/O-efficient Algorithms

Techniques for designing I/O-efficient algorithms. Overview: basic techniques; Part 1: greedy graph algorithms; Part 2: graph blocking. Basic techniques seen so far: scanning (N/B I/Os for a linear scan of the whole array) and sorting (k-way mergesort).

yen

Presentation Transcript


  1. Techniques for designing I/O-efficient Algorithms Techniques for designing efficient I/O Algorithms

  2. Overview: Basic Techniques • Part 1: Greedy Graph Algorithms • Part 2: Graph Blocking

  3. Basic Techniques • We have already seen: • Scanning: a linear scan of the whole array takes N/B I/Os. • Sorting: k-way mergesort (divide and conquer) takes O((N/B) log_{M/B}(N/B)) I/Os.
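
The two bounds can be made concrete with a small cost model. The sketch below is only illustrative: the function names are hypothetical, and it assumes that one block of memory is reserved as an output buffer during merging and that every pass reads and writes each block once.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def scan_ios(n, b):
    """I/Os for one linear scan of n items with block size b."""
    return ceil_div(n, b)


def mergesort_ios(n, m, b):
    """Rough I/O count for external k-way mergesort (assumed cost model).

    n: number of items, m: items that fit in memory, b: items per block.
    """
    fan_in = m // b - 1              # assumption: one block is the output buffer
    if fan_in < 2:
        raise ValueError("memory must hold at least three blocks")
    blocks = ceil_div(n, b)
    runs = ceil_div(n, m)            # sorted runs after the run-formation pass
    passes = 1                       # the run-formation pass itself
    while runs > 1:                  # each merge pass shrinks the run count
        runs = ceil_div(runs, fan_in)
        passes += 1
    return 2 * blocks * passes       # each pass reads and writes every block
```

For example, under these assumptions `mergesort_ios(1000, 100, 10)` needs three passes over 100 blocks, while `scan_ios(1000, 10)` touches each block once.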

  4. Greedy Graph Algorithms • Graph G = (V, E). • Theorem: Every graph problem P that can be solved by a presortable, local, single-pass vertex-labeling algorithm can be solved in O(sort(|V|+|E|)) I/Os.

  5. Greedy Graph Algorithms • Presortable: There exists an algorithm that computes, in O(sort(|V|+|E|)) I/Os, an order in which the vertices can be visited. • Local: The label L(v) can be computed in O(sort(k)) I/Os from the labels L(u1), ..., L(uk), where u1, ..., uk are the neighbors of v that are labeled before v. • Single-pass: The labeling L of the graph can be computed by visiting every vertex v exactly once.

  6. Greedy Graph Algorithms • For an algorithm A to be efficient, one has to determine an order in which to visit the vertices of the graph G and provide a mechanism that supplies every vertex v with the labels of its neighbors u1, ..., uk. • Since A is presortable, the ordering can be computed in O(sort(|V|+|E|)) I/Os. We further assume that this ordering is a numbering, so that each vertex is assigned a number. [Figure: example graph with vertices numbered 1 to 7]

  7. Greedy Graph Algorithms • From the numbered graph we derive a DAG (directed acyclic graph) G' by directing the edges according to the vertex numbering. • This DAG has the property that every in-neighbor of v is labeled before v. • Due to the locality of A, we can label each vertex v in O(sort(k)) I/Os, assuming that we have to sort the labels of its k neighbors. [Figure: the example graph with its edges directed]

  8. Greedy Graph Algorithms • These are simplified conditions for the applicability of time-forward processing (see the exercise from two weeks ago). • The restriction that the vertices have to be stored in topologically sorted order poses no problem, since the directions of the edges are chosen after fixing an order of the vertices.
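
The mechanism that hands each vertex the labels of its earlier neighbors can be sketched with a priority queue. This is a minimal in-memory illustration, not the external-memory implementation: `heapq` stands in for an I/O-efficient priority queue, the function name and the `local_label` callback are hypothetical, and edges are assumed to be directed from the lower to the higher vertex number.

```python
import heapq


def time_forward_label(num_vertices, edges, local_label):
    """Label vertices 1..num_vertices in increasing order.

    edges: iterable of undirected pairs (u, v).
    local_label(v, incoming): computes L(v) from the labels of the
    already visited neighbors of v. Labels must be comparable, because
    they are stored in heap entries.
    """
    # Direct every edge from the lower to the higher numbered endpoint.
    out = {v: [] for v in range(1, num_vertices + 1)}
    for u, v in edges:
        lo, hi = (u, v) if u < v else (v, u)
        out[lo].append(hi)

    pq = []        # entries (target vertex, label sent to it)
    labels = {}
    for v in range(1, num_vertices + 1):
        incoming = []
        # All messages addressed to v are at the front of the queue.
        while pq and pq[0][0] == v:
            incoming.append(heapq.heappop(pq)[1])
        labels[v] = local_label(v, incoming)
        # Send the new label "forward in time" to later neighbors.
        for w in out[v]:
            heapq.heappush(pq, (w, labels[v]))
    return labels
```

As a toy usage, `local_label = lambda v, inc: 1 + max(inc, default=0)` assigns each vertex one more than the largest label among its earlier neighbors; this label function is chosen only to exercise the mechanism.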

  9. Greedy Graph Algorithms • Application: coloring graphs of bounded degree. • A graph G with maximum degree 3 is to be colored with 3+1 colors. • Choose an arbitrary order.

  10. Greedy Graph Algorithms • Application: coloring graphs of bounded degree. • A graph G with maximum degree 3 is to be colored with 3+1 colors. • Choose an arbitrary order. [Figure: example graph with vertices numbered 0 to 6]

  11. Greedy Graph Algorithms • Application: coloring graphs of bounded degree. • A graph G with maximum degree 3 is to be colored with 3+1 colors. • Choose an arbitrary order. • Assign each vertex a color that has not been chosen by any of its already colored neighbors. [Figure: example graph with vertices numbered 0 to 6]

  12. Greedy Graph Algorithms • Application: coloring graphs of bounded degree. • A graph G with maximum degree 3 is to be colored with 3+1 colors. • Choose an arbitrary order. • Assign each vertex a color that has not been chosen by any of its already colored neighbors. • To choose a color for a vertex, sort the colors of its neighbors and assign the first unused color to it. [Figure: example graph with vertices numbered 0 to 6]
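
The coloring rule can be sketched in a few lines. This is an in-memory illustration of the local step only (sort the neighbor colors, take the first unused one); the function name and the adjacency-dictionary input are assumptions, and colors are represented as integers starting at 0.

```python
def greedy_color(order, neighbors):
    """Color vertices in the given order with the first unused color.

    order: sequence of vertices (any fixed order).
    neighbors: dict mapping each vertex to an iterable of its neighbors.
    With maximum degree d, at most d + 1 colors (0..d) are used.
    """
    color = {}
    for v in order:
        # Colors already taken by neighbors that were visited earlier.
        used = sorted(color[u] for u in neighbors[v] if u in color)
        c = 0
        for x in used:          # scan the sorted list for the first gap
            if x == c:
                c += 1
            elif x > c:
                break
        color[v] = c
    return color
```

On the complete graph with four vertices (every vertex has degree 3) the sketch uses exactly the 3+1 colors 0, 1, 2, 3.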

  16. Graph Blocking • Graph blocking is a technique for storing the vertices of a graph in such a way that accessing the data causes as few page faults as possible.

  17. Graph Blocking • Assumptions: • Graphs are stored statically on disk. • To visit a vertex v, we may access any copy of the vertex. • Paths are traversed in an online fashion, so an adversary can choose a worst-case path for the algorithm. • Graph blocking can be used to efficiently process queries on pointer-based data structures.

  18. Graph Blocking • To store a graph with N vertices, at least N/B blocks are required. • Definition: a blocking has storage blow-up ß if it uses ßN/B blocks. • We try to minimize ß together with the number of page faults, since space is an issue with large data sets. • Traversing a path of length L causes at least L/B page faults. • We consider this technique for lists, trees, grids and planar graphs.

  19. Blocking Lists • Natural approach, ß = 1: store the list elements 1, 2, 3, 4, ..., N consecutively. • Simple traversal in direction 1..N: L/B page faults occur. • More complicated traversals: if M >= 2B, keep the last block in memory, so that a page fault occurs at most once every B-1 steps. • What if B == M? An adversary can choose a path that causes a page fault at every single step by alternating v, w, v, w, ... between two adjacent vertices v and w that are stored in different blocks. [Figure: adjacent list elements v and w on either side of a block boundary]
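
The effect of the adversarial path is easy to reproduce with a small page-fault counter. The sketch below assumes LRU replacement and a memory of `frames` blocks; the function name and the `block_of` callback are hypothetical helpers for illustration.

```python
from collections import OrderedDict


def count_faults(path, block_of, frames):
    """Count page faults for a vertex sequence under LRU replacement.

    path: sequence of visited vertices.
    block_of(v): block that stores vertex v (one copy per vertex).
    frames: number of blocks that fit in memory (M / B).
    """
    cache = OrderedDict()              # block id -> True, oldest first
    faults = 0
    for v in path:
        blk = block_of(v)
        if blk in cache:
            cache.move_to_end(blk)     # mark as most recently used
        else:
            faults += 1
            cache[blk] = True
            if len(cache) > frames:
                cache.popitem(last=False)   # evict least recently used
    return faults
```

With B = 4 and the path 3, 4, 3, 4, ... every access faults when only one block fits in memory (`frames=1`), while `frames=2` leaves just the two initial faults.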

  20. Blocking Lists • Choose ß = 2. • Array one stores the list 1, 2, 3, 4, ..., N. In a second array we store a copy of the list with an offset of B/2, so that its block boundaries fall in the middle of the blocks of array one. • The page handler alternates between the two arrays every time a page fault occurs. This implies that after a page fault the visited vertex v is at least B/2-1 steps away from the next page fault. • Traversing a path of length L therefore causes at most 2L/B page faults. [Figure: array one and array two, shifted against each other by B/2]
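
A sketch of the two-copy scheme for the hard case M == B (a single block in memory) follows. It is an illustration under stated assumptions: vertices are the integers 0, 1, 2, ..., B is even, the function name is hypothetical, and on a fault the pager loads the copy in which the current vertex lies farther from a block boundary.

```python
def faults_two_copies(path, b):
    """Page faults for a list stored twice, the copies offset by b // 2.

    Memory holds exactly one block. Block (0, i) covers vertices
    i*b .. i*b + b - 1; block (1, i) covers the same range shifted
    down by b // 2.
    """
    half = b // 2
    current = None                      # (copy, block index) in memory
    faults = 0
    for v in path:
        blk0 = (0, v // b)
        blk1 = (1, (v + half) // b)
        if current in (blk0, blk1):
            continue                    # v is in the loaded block
        faults += 1
        off0 = v % b                    # position of v inside each block
        off1 = (v + half) % b
        slack0 = min(off0, b - 1 - off0)
        slack1 = min(off1, b - 1 - off1)
        # Load the copy in which v is farther from a block boundary.
        current = blk0 if slack0 >= slack1 else blk1
    return faults
```

The alternating path 3, 4, 3, 4, ... with B = 4, which faults on every step in the single-copy layout, now causes only the initial fault, because the shifted copy holds both vertices in one block.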

  21. Blocking Trees • To block trees we need some more restrictions. • Consider a tree whose internal vertices have degree M: then at most M-1 of the neighbors of a vertex can reside in memory together with it, so an adversary can always choose the missing neighbor to cause a page fault. • We therefore require the degree of every vertex to be bounded by a constant d.

  22. Blocking Trees • Create 2 partitions of the tree into layers of height log_d B. • Partition 1: layer i contains all vertices at distance between (i-1) log_d B and i log_d B from the root r. • Partition 2: layer i contains all vertices at distance between (i-1/2) log_d B and (i+1/2) log_d B from the root r. [Figure: tree with root r cut into layers of height log_d B]

  23. Blocking Trees • Each layer consists of subtrees of height at most log_d B; each such subtree contains at most B vertices and therefore fits into a single block. • Small subtrees can be packed into blocks so that no block is less than half full. • All in all we use at most 4N/B blocks. • The paging algorithm again alternates between the two partitions. • After a page fault at v, all vertices at most (log_d B)/2 - 1 steps away from v can be reached without another page fault. • Traversing a path of length L therefore causes at most 2L/log_d B page faults.
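
The two layerings can be checked numerically. The sketch below is illustrative only: it assumes h = log_d B is an even integer, measures depth from the root starting at 0, treats layers as half-open depth ranges, and uses hypothetical function names.

```python
def subtree_size(d, h):
    """Vertices in a complete d-ary tree with h levels (d >= 2)."""
    return (d ** h - 1) // (d - 1)


def best_partition(depth, h):
    """Pick the partition in which a vertex lies deeper inside its layer.

    Partition 1 cuts the tree at depths 0, h, 2h, ...;
    partition 2 cuts at depths h/2, 3h/2, ...
    Returns (partition number, slack), where slack is the number of
    steps towards or away from the root that stay inside the layer.
    """
    off1 = depth % h
    off2 = (depth + h // 2) % h
    slack1 = min(off1, h - 1 - off1)
    slack2 = min(off2, h - 1 - off2)
    return (1, slack1) if slack1 >= slack2 else (2, slack2)
```

For d = 2 and B = 16 the layer height is h = 4, a layer subtree has `subtree_size(2, 4)` = 15 vertices and fits in one block, and a vertex on a layer boundary of one partition has slack h/2 - 1 = 1 in the other.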

  24. Blocking Trees • Special cases: • If we only traverse away from the root, a storage blow-up of 2 suffices and the number of page faults is reduced to L/log_d B: after a page fault, at least log_d B - 1 steps can be made without causing another page fault. • If we only traverse the tree towards the root, we need only O(N/B) blocks and cause at most O(L/B) page faults.

  25. Blocking Grids • Blocking 2D grids: subgrids of dimension √B x √B are stored in one block each. • A storage blow-up of 2 (two tessellations) is not sufficient if M == B. [Figure: grid tessellated into √B x √B subgrids]

  26. Blocking Grids • If M == B, a storage blow-up of 3 is required to guarantee a page fault at most once every √B/6 steps. • The tessellations are offset against each other by √B/3 in the x and y directions. • The paging algorithm brings the appropriate subgrid into memory.
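
The choice of the "appropriate" subgrid can be sketched as follows. This is an illustration under assumptions: `side` stands for √B, tessellation c is shifted by c * side / copies in both directions, the function names are hypothetical, and on a fault the pager picks the tessellation in which the current cell lies farthest from a tile boundary.

```python
def slack(x, y, side, shift):
    """Steps from cell (x, y) to the nearest tile boundary.

    The tessellation consists of side x side tiles, shifted by `shift`
    cells in both the x and the y direction.
    """
    ox = (x + shift) % side
    oy = (y + shift) % side
    return min(ox, side - 1 - ox, oy, side - 1 - oy)


def best_tessellation(x, y, side, copies):
    """Return (slack, copy index) of the most central tessellation."""
    shifts = [c * side // copies for c in range(copies)]
    return max((slack(x, y, side, s), c) for c, s in enumerate(shifts))
```

With `side = 12` and three tessellations, every cell has slack at least 12/6 = 2 in the chosen tessellation, whereas a single tessellation leaves cells such as (0, 0) directly on a tile boundary.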

  27. Blocking Grids • If M >= 2B, a storage blow-up of 2 is sufficient to achieve at most one page fault every √B/4 steps. • The tessellations are offset against each other by √B/2 in the x and y directions.

  28. Blocking Grids • If M >= 3B, a storage blow-up of 1 ensures at most 4L/√B page faults, where L is the length of the path. • Proof: within √B/2 steps at most 2 page faults can occur. If v causes a page fault after visiting u, we can reach all vertices within distance √B/2 of v without another page fault. [Figure: consecutive path vertices u and v on either side of a block boundary]

  29. Planar Graph Blocking • The graphs must have bounded degree. • With constant storage blow-up, an upper bound of 4L/log_d B page faults can be guaranteed for a path of length L, where d is the maximum degree of a vertex.

  30. Planar Graph Blocking • We divide the graph into regions. A vertex is called interior if it is connected only to vertices in the same region; vertices that are part of at least two regions are called boundary vertices. • A B-division is a covering of the graph by O(N/B) regions, each of which has at most B vertices.

  31. Planar Graph Blocking • The total number of boundary vertices is O(N/√B), due to Frederickson (Frederickson: Fast algorithms for shortest paths in planar graphs, with applications, SIAM 1987). • For every planar graph G there exists a set S of O(N/√B) boundary vertices such that no region is larger than B.

  32. Planar Graph Blocking • We ensure that every region is stored in a single block; if a region fills less than half a block, we pack several regions together into one block, so that no block is less than half full. • This representation uses at most 2N/B blocks, which is O(N/B). • In the next step we block the neighborhood of each boundary vertex v.

  33. Planar Graph Blocking • For each boundary vertex v we build the neighborhood of all vertices reachable from v in (log_d B)/2 steps. √B such neighborhoods fit into a single block, since d^((log_d B)/2) = √B. • We divide the boundary vertices into subsets of √B vertices and store each subset together with its neighborhoods in one block. • This takes O(((N/√B)·√B)/B) = O(N/B) blocks, so the total storage is O(N/B) blocks.
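
The storage count can be written out step by step. The derivation below is a sketch that follows the slide's estimate; the symbol \(\mathcal{N}(v)\) for the neighborhood of v is introduced here only for illustration, and constant factors are absorbed into the O-notation.

```latex
\begin{align*}
  |\mathcal{N}(v)| &= O\!\left(d^{(\log_d B)/2}\right) = O\!\left(\sqrt{B}\right)
    && \text{vertices per neighborhood} \\
  \#\text{neighborhoods} &= O\!\left(\frac{N}{\sqrt{B}}\right)
    && \text{one per boundary vertex} \\
  \#\text{blocks} &= O\!\left(\frac{N}{\sqrt{B}} \cdot \sqrt{B} \cdot \frac{1}{B}\right)
    = O\!\left(\frac{N}{B}\right)
    && \text{for all neighborhoods} \\
  \text{total} &= \frac{2N}{B} + O\!\left(\frac{N}{B}\right) = O\!\left(\frac{N}{B}\right)
    && \text{regions plus neighborhoods}
\end{align*}
```

Since N/B blocks are needed in any case, this corresponds to a storage blow-up of O(1).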

  34. Planar Graph Blocking • Page faults: as long as the path remains in the same region, no page faults occur. • If the path leaves a region and therefore causes a page fault, it must be at a boundary vertex. If we bring the block containing this boundary vertex and its neighborhood into memory, we can make (log_d B)/2 steps before another page fault occurs. • A path of length L can be traversed in O(L/log_d B) I/Os with a storage blow-up of O(1).
