1 / 45

Ideas on Treaps

Ideas on Treaps. Maverick Woo <pooh+@cmu.edu>. N,33. H,20. T,17. E,9. M,14. S,12. W,6. K,7. P,8. U,2. Z,4. Disclaimer. Articles of interest Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica 16 (1996), 464-497.

Download Presentation

Ideas on Treaps

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ideas on Treaps Maverick Woo <pooh+@cmu.edu> N,33 H,20 T,17 E,9 M,14 S,12 W,6 K,7 P,8 U,2 Z,4

  2. Disclaimer • Articles of interest • Raimund Seidel and Cecilia R. Aragon.Randomized search trees. Algorithmica 16 (1996), 464-497. • Guy E. Blelloch and Margaret Reid-Miller.Fast Set Operations Using Treaps. In Proc. 10th Annual ACM SPAA, 1998. • Of course this is joint work with Guy. • Hopefully Daniel will also show up.

  3. Background • Very high level talk • No analysis • To make this a technical talk • i will insert a math symbol S • Some background • Splay Trees (zig, zig zig, zig zig zig…) • Treaps, if you still remember…

  4. Agenda • Data structure research overview • Treaps refresher • Some current issues on Treaps

  5. Data Structure Research • I am not qualified to say yet, but I do have some “feelings” about it. • Not that many high-level problems. • Representing a set/ordering • Support some operations • Some say it’s all about applications. • Applications don’t have to very specific. • But need to be specific enough---we can make assumptions.

  6. What Operations? • Basic • Insert, Membership • Intermediate • Delete (e.g. Binomial vs. Fibonacci Heaps) • Disjoint-Union (e.g. Union-Find) • Higher Level • Union, Intersection, Difference • Finger Search

  7. Behavior Restrictions • Persistence • “Functional” • More later… • Architecture Independence • Relatively new, a.k.a. “Cache-oblivious” • Runs efficiently on hierarchical memory • Avoid memory-specific parameterization • Forget data block size, cache line width etc. • Not my theme today

  8. Why Persistence? • Many reasons for persistence • It’s practical with good garbage collectors. • Functional programming makes everyone’s life easier. • For the theoretician • You don’t need to worry about side effects. • Better analysis possible: NESL • For the programmer • You don’t need to worry about side effects. • Less memory leak, less dangling pointers

  9. Real-life example 1 • You are have operations working on multiple-instances. • You index the web. • You build your indices with your cool data structures. • Conjunction query (AND) is intersection. • You do the intersection on two indices. • Now one of the indices can get corrupted.

  10. Real-life example 2 • You are rich. • Once upon a time, in a dot-com far away… • You run a multi-processor machine. • You learned that Splay Trees are cool. • You even learned how to write multi-threaded programs. • Thread1 searches for x on SplayInstance42. • Thread2 searches for y on SplayInstance42. • Real-world situation: search engines

  11. Data Structure vs. Hacking • Examples • To learn more about Splay Trees • Dial (412)-HACKERS. • Ask for Danny Sleator… • OK, real example • (Persistent) FIFO Queues • Operations • IsEmpty(Q), Enqueue(Q,x), Dequeue(Q) • Need to grow, let’s use Linked List…

  12. FIFO Queues • Linked List is “bad” though • Transverse to tail takes linear time. • Either Enqueue or Dequeue is going to be linear time. • How about doubly-ended queues (deques)? • With that much extra space, may be faster with a tree. • If one is not good enough, use two. • Suppose queue is x1x2…xiyi+1yi+2…yn. • Represent as [x1x2…xi],[yn,yn-1…yi+1]. • You can figure out the details yourself. • In the end, isn’t this just a hack?

  13. Agenda • Data structure research overview • Treaps refresher • Some current issues on Treaps

  14. Treaps Refresher • A Treap is a recursive data structure. • datatype 'a Treap = E | T of priority * 'a Treap * 'a * 'a Treap • Each node has a key and a priority. • Assume all unique • Arrange key in in-order, priority in heap-order • Priority is chosen uniformly at random. • 8-way independence suffices for the analysis • Can be computed with hash functions • Don’t need to store the priority • A key’s priority can be made consistent across runs

  15. Treap Operations • Membership • As in binary search trees • Insert • Add as leaf by key (in-order) • Rotate up by priority (heap-order) • Delete • Reverse what insert does • Find-min, etc. • Walk on the left spine, etc.

  16. Treap Split • Want top-down split (it’s faster) • (less, x, gtr) = Split(root, k) • If (root.k > k) // want to split left subtree • Let (l1, m, r1) = Split(root.left, k) • (l1, m, T(root.p, r1, root.k, root.right)) • If (root.k < k) // want to split right subtree • Let (l1, m, r1) = Split(root.right, k) • (T(root.p, root.left, root.k, l1), m, r1) • Else • (root.left, root.k, root.right)

  17. Before After less gtr N,33 N,33 W,6 H,20 T,17 H,20 T,17 Z,4 E,9 M,14 S,12 U,2 E,9 M,14 S,12 W,6 K,7 P,8 K,7 P,8 U,2 Z,4 Treap Split Example Split(Tr,“V”)

  18. Treap Split Persistence • These figures are deceptive. • Only 4 new nodes created • All on the search path to “V” less gtr N,33 N,33 W,6 H,20 T,17 H,20 T,17 Z,4 E,9 M,14 S,12 U,2 E,9 M,14 S,12 W,6 K,7 P,8 K,7 P,8 U,2 Z,4

  19. Treap Join • Join(less, gtr) // less < x < gtr • Handle empty less or gtr • If (less.p > gtr.p) • T(less.p, less.left, less.k, Join(less.right, gtr)) • Else • T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)

  20. After less gtr N,33 N,33 W,6 H,20 T,17 H,20 T,17 Z,4 E,9 M,14 S,12 U,2 E,9 M,14 S,12 W,6 K,7 P,8 K,7 P,8 U,2 Z,4 Treap Join Example Before Join(less,gtr)

  21. Treap Running Time • All expected O(lg n) • Also of note is Finger Search • Given a finger in a treap • Find the key that is d away in sorted order • Expected O(lg d) time • Require parent pointers • Evil… Waste so much space • See Seidel and Aragon for details.

  22. Treap Union • Treaps really shine in set operations. • Union(a,b) • Suppose roots are (k1,p1), (k2,p2) • WLOG assume p1 > p2. • Let (less,x,gtr) = Split(b,k1). • T(p1, Union(a.left, less), k1, Union(a.right, gtr))

  23. Treap Intersection • Inter(a,b) • Suppose roots are (k1,p1), (k2,p2); p1>p2 • Let (less,x,gtr) = Split(b,k1) • If x is null // k1 is not in b, sorry dude • Join(Inter(a.left, less), Inter(a.right, gtr)) • Else • T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))

  24. Treap Difference • Similar to intersection • Change the logic a bit • Messier because it is not symmetric • Leave as an exercise to the reader.

  25. Points of Note • Persistence • Did you see a side effect? (assignments?) • Parallelization • Parallelize without persistence is a pain. • Very natural divide-and-conqueror • Run the two recursive calls on different CPUs • Running times…

  26. Set Operation Running Time • For two sets of size m and n (m · n) • Optimal is • q(m lg (n/m)) • What’s known before this work • With AVL Trees, O(m lg(n/m)) • Rather complicated algorithms • For the sake of your smooth digestion… • Compare this to O(m+n) or O(m lg n) • With Treaps • Can use Finger Search if we have parent pointers • Does not parallelize---multiple fingers???

  27. Set Operation Running Time • What’s known after this work • No parent pointers • Parallelize naturally • Optimal expected running time • O(m lg (n/m)) • Analysis available in Blelloch and Miller • Relatively simple algorithm • Experimental results • 6.3-6.8 speedup on 8-processor SGI machine • 4.1-4.4 speedup on 5-processor Sun machine

  28. Agenda • Data structure research overview • Treaps refresher • Some current issues on Treaps

  29. A Word on Splay Trees • Splay Trees are slow in practice! • Even a single simple search would require O(lg n)pointer updates! • Skip Lists are way simpler and faster. • Let’s switch all Splay Trees to Skip Lists. • Danny???

  30. Bruce said… • First find Danny. • Ditch Splay Trees---say they are slow. • Then praise Skip Lists. • Danny will • refute by quoting experimental studies. • Splay Trees are not much slower than Skip List in practice. • ask who’s my advisor. • I wonder if that works. So I tried.

  31. Current Issues on Treaps • Treaps are simpler than Splay Trees • No famous conjecture for my back pocket • Neat idea from Adam Kalai • Not self-adjusting • Access introduces more explicit changes • Adding data compression to Treaps • Finger search on Treaps • Work by Guy + Daniel Blandford

  32. Adding Compression to Treaps • Search engines • Infrequent offline update (once a month) • Frequent online query and set operations • Keys are unique. • Keys can be huge and occurs sparsely. • Let’s compress the keys! • Assume they are 64-bit integers.

  33. We’ve got a problem! • I don’t know how to deploy data compression to general data structures. • Begin with the simplest---Array • The naïve approach • Compress the whole array • When need to access an element • decompress the whole array • do the access • compress the whole array again

  34. Isn’t that dumb? • Any suggestions? • Use chunking • Divide the array into blocks of size C. • Compress each block individually. • Now we are back to “constant” time! • Shh!!! That could be a trade secret. • Of course they use something better than vanilla array.

  35. Chunking a Treap • A sub-tree is a chunk. • Desire consistent chunk size • But Treaps are usually not full. • Need better chunking rules • Chunks • Can’t be too big---hurt running time • Can’t be too small---hurt compression (space)

  36. Vocab • Internal node and Leaf block • More precisely • datatype tblock = Packed of int * key * key * key vector | UnPacked of int * int * key vector datatype trearray = TE | TB of tblock | TN of trearray * key * trearray • All running time are in expected case.

  37. Idea 1 – Thresholds • Priority is in the range 1 to maxP • Invent a threshold Pth • e.g. maxP - log(maxP) • For n=(p,k) • If p > Pth, then n is an internal node. • Otherwise, n is in some leaf block. • Trick done when a key is inserted. • Also maintained by various operations.

  38. Idea 1 – Features • On average, constant ratio between internal keys to “keys in block”. • With Pth = maxP - log(maxP), N keys • log N internal nodes • Height is log log N. • O(log N) “bottom” node, each w/ a block • Expect (N-log N) / O(log N) keys / block • Binary search in block takes O(log N)

  39. Idea 1 – Running Time • Query is still O(log n). • Insert is also O(log n). • Join, Split both take O(log n). • Set operations rely on Join and Split’s O(log n) running time. • Looking good…

  40. Idea 1 – Problems • Asymptotic bound • Need to work out the constants • Exact analysis in progress • I now think of Knuth even higher… • SML implementation • Make the idea as concrete as code • Can now do more experiments

  41. Idea 1 – Questions • Do we really need to maintain consistent priority across runs? • Make things simpler • But Union looks suspicious • What compression algorithm to use? • No general data compression • Take advantage of index distribution

  42. Idea 2 – Small Blocks • Want a more-or-less constant block size • Small blocks are more realistic • Say 20 • Processor specific---fit cache line size • How well can we compress 20 integers? • Leave for second stage investigation

  43. Perhaps I can share • Writing down algorithm as code helps • Pseudo code are good for short algorithms • Real code is more concrete. • Good for sloppy people like me. • Actual SML code • You can figure out you missed some cases. • Now if SML has a debugger… • Space time tradeoff is very real

  44. Treap Finger Search • Daniel is working on it. • No parent pointers needed • Can mimic parent pointers by reversing root-to-(last accessed leaf) path • Should probably leave this to him

  45. Q&A / Suggestions • Work in progress, welcome suggestions • Danny, don’t kick me too hard…

More Related