An Approximation Algorithm for Binary Searching in Trees

Marco Molinaro (Carnegie Mellon University), joint work with Eduardo Laber (PUC-Rio).

Presentation Transcript


  1. An Approximation Algorithm for Binary Searching in Trees • Marco Molinaro, Carnegie Mellon University • Joint work with Eduardo Laber (PUC-Rio)

  2. Searching in sorted lists • Sorted list of numbers • Marked number m • Find the marked number using queries ‘x ≤ m?’ [Figure: example list containing the numbers 3, 5, 6, 10, 14]

  3. Searching in sorted lists • Search strategy: a procedure that indicates which number should be queried next • Can be represented by a decision tree (DT) • Number of queries to find m = length of the path from the root of the DT to the leaf for m [Figure: the example list and a decision tree (DT) whose root queries 10, with branches labeled ≤ and >]
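As a minimal sketch of one such strategy, the snippet below runs plain binary search using only queries of the form ‘x ≤ m?’ and counts them. The function name `find_marked` and the callback `is_at_most_marked` are illustrative assumptions, not names from the talk; the list reuses the numbers shown on the slide.

```python
def find_marked(sorted_numbers, is_at_most_marked):
    """Locate the marked number using only 'x <= m?' queries.

    `is_at_most_marked(x)` is a hypothetical oracle that answers the query.
    Returns the marked number and the number of queries used, which is the
    depth of the corresponding leaf in the decision tree.
    """
    lo, hi = 0, len(sorted_numbers) - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2          # round up so the range always shrinks
        queries += 1
        if is_at_most_marked(sorted_numbers[mid]):
            lo = mid                      # m is at position mid or later
        else:
            hi = mid - 1                  # m is strictly before position mid
    return sorted_numbers[lo], queries


numbers = [3, 5, 6, 10, 14]
marked = 6
found, used = find_marked(numbers, lambda x: x <= marked)
```

Each run of the loop corresponds to one internal node of the decision tree, so `used` is the path length for the chosen marked number.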

  4. Searching in sorted lists • We are given the probability of each number being the marked one • Expected number of queries of a strategy = expected path length of the corresponding decision tree • An efficient strategy is one with minimum expected path length [Figure: the example list annotated with one probability per number (values shown include 0.05, 0.1, 0.2, 0.5), and the decision tree with those probabilities at its leaves]
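To make the cost measure concrete, here is a small sketch that computes the expected path length of a decision tree. The nested-tuple encoding, the function name, and the probabilities are assumptions chosen for illustration; the exact values of the slide's figure could not be recovered.

```python
def expected_path_length(tree, prob, depth=0):
    """Expected number of queries of a decision tree.

    A leaf is a number; an internal node is a tuple (label, left, right).
    `prob[x]` is the probability that x is the marked number.
    """
    if not isinstance(tree, tuple):
        return prob[tree] * depth          # leaf: contributes depth * probability
    _label, left, right = tree
    return (expected_path_length(left, prob, depth + 1)
            + expected_path_length(right, prob, depth + 1))


# Hypothetical probabilities (they sum to 1).
prob = {3: 0.05, 5: 0.1, 6: 0.5, 10: 0.2, 14: 0.15}

# Two valid comparison-based strategies over the same list.
tree_a = ("q", ("q", 3, 5), ("q", 6, ("q", 10, 14)))
tree_b = ("q", ("q", ("q", 3, 5), 6), ("q", 10, 14))

cost_a = expected_path_length(tree_a, prob)
cost_b = expected_path_length(tree_b, prob)
```

With these assumed probabilities, `tree_b` is the cheaper of the two because it moves the light leaves 3 and 5 deeper and brings the heavier leaves 10 and 14 closer to the root.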

  5. Searching in trees • Tree with exactly one marked node m • We can query an arc and find out which endpoint is closer to the marked node
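The arc query can be sketched as follows: removing the queried arc splits the tree in two, and the answer is the endpoint on the side that contains the marked node. The adjacency-list encoding, the function name `closer_endpoint`, and the example tree are assumptions for illustration (the node names echo the slides, but the edge set is hypothetical).

```python
from collections import deque


def closer_endpoint(adj, arc, marked):
    """Answer an arc query: which endpoint of `arc` is closer to `marked`?

    `adj` maps each node to a list of its neighbours in the tree.
    """
    u, v = arc
    seen = {u}
    queue = deque([u])
    while queue:                            # explore u's side without crossing the arc
        x = queue.popleft()
        if x == marked:
            return u
        for y in adj[x]:
            if y not in seen and {x, y} != {u, v}:
                seen.add(y)
                queue.append(y)
    return v                                # marked node was not on u's side


# Hypothetical tree with edges a-b, a-c, c-d, d-f, f-h.
adj = {
    "a": ["b", "c"], "b": ["a"], "c": ["a", "d"],
    "d": ["c", "f"], "f": ["d", "h"], "h": ["f"],
}
answer = closer_endpoint(adj, ("c", "d"), "h")
```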

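  6. Searching in trees • Search strategy: a procedure that indicates which arc should be queried next • Can be represented by a decision tree [Figure: example tree on nodes a, b, c, d, f, h and a decision tree (DT) whose root queries arc (c,d), followed by queries on arcs (a,b), (f,h), and (d,f); branches are labeled ~c, ~d, ~a, ~b, ~f, ~h]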

  7. Searching in trees • Search strategy: a procedure that indicates which arc should be queried next • Can be represented by a decision tree • Number of queries to find m = length of the path from the root of the DT to the leaf for m [Figure: the same example tree and decision tree, with the path of queries (c,d), (f,h), (d,f) leading to leaf f highlighted]

  8. Searching in trees • We are given the probability of each node being the marked one • Expected number of queries is the expected path length of the corresponding decision tree • The goal is to find a DT with minimum expected path length [Figure: the example tree with a probability on each node (values shown: .1, .2, .3) and the decision tree with those probabilities at its leaves]
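The expected cost of a tree-search strategy can be evaluated with a short recursion. The sketch below uses a simple greedy rule (query the arc that splits the remaining weight most evenly), which is a stand-in heuristic chosen for illustration and not the algorithm presented in this talk. All names and the example weights are assumptions.

```python
def component(adj, start, blocked, allowed):
    """Nodes of `allowed` reachable from `start` without crossing arc `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in allowed and v not in seen and {u, v} != set(blocked):
                seen.add(v)
                stack.append(v)
    return seen


def expected_queries(adj, w, candidates=None):
    """Expected number of arc queries of the greedy balanced-split strategy."""
    if candidates is None:
        candidates = set(adj)
    if len(candidates) == 1:
        return 0.0                          # marked node identified, no more queries
    total = sum(w[u] for u in candidates)
    best = None
    for u in sorted(candidates):
        for v in adj[u]:
            if v in candidates and u < v:   # consider each arc once
                side_u = component(adj, u, (u, v), candidates)
                imbalance = abs(total - 2 * sum(w[x] for x in side_u))
                if best is None or imbalance < best[0]:
                    best = (imbalance, side_u)
    side_u = best[1]
    side_v = candidates - side_u
    # Every remaining candidate pays for this query, then we recurse on both sides.
    return total + expected_queries(adj, w, side_u) + expected_queries(adj, w, side_v)


# Hypothetical path a-b-c with weights summing to 1.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
w = {"a": 0.25, "b": 0.5, "c": 0.25}
cost = expected_queries(adj, w)
```

The recursion relies on the identity that the expected path length equals the sum, over internal nodes of the decision tree, of the total weight of the leaves below that node.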

  9. Searching in trees • Problem definition: given a tree T and weights w, compute a decision tree for searching in T with minimum expected root-to-leaf path length w.r.t. w • Motivation • Generalizes searches in totally ordered structures to (one type of) partially ordered structures • Applications to software testing and filesystem synchronization

  10. Related work • Searching in sorted lists • Worst-case: binary search is optimal • Average-case: Knuth [Acta Informatica 71], O(n^2); de Prisco, de Santis [IPL 93], good approximation in linear time

  11. Related work • Searching in trees • Worst-case: Ben-Asher et al. [SIAM J. Comput. 99], O(n^4 log^3 n); Onak, Parys [FOCS 06], O(n^3); Mozes et al. [SODA 08], O(n) • Average-case: Kosaraju et al. [WADS 99], O(log n)-approximation

  12. Related work • Searching in posets • Worst-case: Arkin et al. [Int. J. Comput. Geometry Appl. 98], O(log n)-approximation; Carmo et al. [TCS 04], finding an optimal strategy is NP-hard, constant-factor approximation for random posets • Average-case: Kosaraju et al. [WADS 99], O(log n)-approximation

  13. Our results • First constant-factor approximation for searching in trees (average-case metric) • Linear running time

  14. Overview • We know how to search in sorted lists with probabilities • Searching in paths = searching in ordered lists
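The equivalence between paths and ordered lists can be illustrated directly: on a path v_0, ..., v_{n-1}, querying arc (v_i, v_{i+1}) reveals whether the marked node has index at most i, which is exactly a comparison query. The sketch below assumes this indexing; the function name `search_path` is illustrative.

```python
def search_path(n, marked_index):
    """Find the marked node on a path v_0 .. v_{n-1} using arc queries only.

    Querying arc (v_i, v_{i+1}) tells us whether the marked node lies on the
    v_{i+1} side, i.e. whether marked_index > i.
    Returns the index found and the number of arc queries used.
    """
    lo, hi = 0, n - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        closer_to_right = marked_index > mid   # answer of arc query (v_mid, v_mid+1)
        if closer_to_right:
            lo = mid + 1
        else:
            hi = mid
    return lo, queries


index, used = search_path(8, 5)
```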

  15. Overview • Search strategy

  16. Algorithm • Find a (heavy) path • Compute a decision tree for this path • Append decision trees for querying the hanging arcs • Recursively find strategies for the hanging subtrees and append them
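The first step above can be sketched as follows, assuming that a "heavy" path is obtained by rooting the tree and repeatedly descending to the child whose subtree has the largest total weight (the slides do not spell out the exact rule). The rooted `children` encoding and all names are illustrative assumptions; only the path and the hanging subtrees are computed, not the decision trees.

```python
def subtree_weights(children, w, root):
    """Total weight of the subtree rooted at each node."""
    order, stack = [], [root]
    while stack:                            # parents are recorded before descendants
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    total = {}
    for u in reversed(order):               # so reversed order visits children first
        total[u] = w[u] + sum(total[c] for c in children.get(u, []))
    return total


def heavy_path(children, w, root):
    """Return the heavy path from `root` and, per path node, its hanging subtrees.

    Hanging subtrees are given by their roots, heaviest first.
    """
    total = subtree_weights(children, w, root)
    path, hanging = [root], {}
    u = root
    while children.get(u):
        kids = sorted(children[u], key=lambda c: total[c], reverse=True)
        hanging[u] = kids[1:]               # everything except the heaviest child
        u = kids[0]
        path.append(u)
    hanging[u] = []
    return path, hanging


# Hypothetical rooted tree and weights.
children = {"r": ["x", "y"], "x": ["z"]}
w = {"r": 0.1, "x": 0.2, "y": 0.4, "z": 0.3}
path, hanging = heavy_path(children, w, "r")
```

In this example the path follows x rather than y because the subtree under x (weight 0.5) outweighs y (weight 0.4), even though y is the heavier single node.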

  17. Analysis • T – input tree • w(u) – likelihood of node u being the marked one • $w(T') = \sum_{u \in T'} w(u)$ • $T_{ij}$ – hanging subtrees of T • Cost of a decision tree – expected path length [Figure: input tree T with its hanging subtrees $T_{ij}$]

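  18. Analysis – upper bound • ALGO(T) = expected path length of the computed DT = cost(■) + cost(■) + cost(■), one term per color-coded part of the decision tree in the original figure • $\mathrm{ALGO}(T) \le H + w(T) + \sum_{i,j} j \cdot w(T_{ij}) + \sum_{i,j} \mathrm{ALGO}(T_{ij})$, where H is the entropy of {w(u)} [Figure: input tree T and the computed decision tree]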
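For reference, the entropy term H can be computed as below. This is a sketch under two assumptions not stated on the slide: logarithms are base 2, and the weights are normalized to sum to 1 before the entropy is taken. The function name is illustrative.

```python
import math


def entropy(weights):
    """Shannon entropy (base 2) of the distribution proportional to `weights`.

    Zero weights contribute nothing and are skipped.
    """
    total = sum(weights)
    return sum((x / total) * math.log2(total / x) for x in weights if x > 0)


h = entropy([0.5, 0.25, 0.25])
```

A uniform distribution over n items gives the maximum value log2(n), while a distribution concentrated on one item gives 0, which matches the intuition that skewed weights allow cheaper search strategies.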

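  19. Analysis – lower bounds • UB, LB1, LB2: [formulas shown as images on the original slide; not recovered] • When H >> w(T): combine UB and LB1 • When H ≤ w(T): combine UB and (LB1 + LB2) • LB1 alone suffices only when H is large; with both lower bounds, ALGO(T) ≤ α OPT(T) for all H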

  20. Analysis – entropy lower bound • OPT(T) = from root to (■) + from (■) to (■) + from (■) to leaves • From root to (■): using Shannon’s lossless coding theorem, we can lower bound this cost by $H / \log 3 - w(T)$ • From (■) to (■): there are at most 2 purple nodes per level; these paths cost ≥ [bound not recovered from the slide] • From (■) to leaves: all queries to arcs in the trees $T_{ij}$ are descendants of purple nodes, so this part costs at least as much as searching inside the trees $T_{ij}$, namely $\sum_{i,j} \mathrm{OPT}(T_{ij})$ [Figure: decision tree D*]

  21. Analysis – alternative lower bound • OPT(T) ≥ from root to (■) + from (■) to leaves • From root to (■): cost = $\sum_{i,j} (\text{distance to the } i\text{-th purple node}) \cdot w(T_{ij})$; at most one purple node can have distance 0; $w(T_{ij}) \le w(T)/2$; hence this part costs at least $w(T)/2$ • From (■) to leaves: costs at least as much as searching inside the trees $T_{ij}$, namely $\sum_{i,j} \mathrm{OPT}(T_{ij})$ [Figure: decision tree D*]

  22. Efficient implementation • Most steps take linear time • To find a good strategy, the algorithm sorts the weights • Use linear-time approximate sorting instead of exact sorting • The algorithm can then be implemented in linear time

  23. Conclusions • First constant-factor approximation for searching in trees (average-case) • Linear running time • Open questions • Is searching in trees polynomially solvable? • Improved approximations for more general posets

  24. Thank you!
