Boston Haskell, May 30th 2012. Purely Functional Data Structures for On-line LCA. Edward Kmett.


Boston Haskell

May 30th 2012

Purely Functional Data Structures for On-line LCA

Edward Kmett


Overview

  • The Lowest Common Ancestor (LCA) Problem

  • Tarjan’s Off-line LCA

  • Off-line Tree-Like LCA

  • Off-line Range-Min LCA

  • Naïve On-line LCA

  • Data Structures from Number Systems

  • Skew-Binary Random Access Lists

  • Skew-Binary On-line LCA



The Lowest Common Ancestor Problem

  • Given a tree and two nodes in the tree, find the lowest entry in the tree that is an ancestor to both.

  • Applications:

    • Computing Dominators in Flow Graphs

    • Three-Way Merge Algorithms in Revision Control

    • Common Word Roots/Suffixes

    • Range-Min Query (RMQ) problems

    • Computing Distance in a Tree
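As a small illustration of the last application (a standard identity, not taken from the slides): if depth(v) counts the edges from the root to v, the distance between two nodes follows directly from their LCA.

```latex
\[
  d(u, v) = \operatorname{depth}(u) + \operatorname{depth}(v)
          - 2\,\operatorname{depth}\bigl(\operatorname{lca}(u, v)\bigr)
\]
```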


  • First formalized by Aho, Hopcroft, and Ullman in 1973.

  • They provided ephemeral on-line and off-line versions of the problem in terms of two operations, with their off-line algorithm requiring O(n log*(n)) steps and their on-line algorithm requiring O(n log n) steps.

  • Research has largely focused on the off-line versions of this problem where you are given the entire tree a priori.


cons, link, or grow?

  • The original formulation of LCA was in terms of two operations: link x y, which grafts an unattached tree x on as a child of y, and lca x y, which computes the lowest common ancestor of x and y.

  • Alternately, we can work with lca x y and cons a y, which returns a new, extended version of the path y grown downward with the globally unique node ID a.

  • We can replace cons a y with a monadic grow y, which tracks the variable supply internally. Using a concurrent variable supply, like the one provided by the concurrent-supply package, lets you grow the tree in parallel.
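A minimal sketch of the difference between cons and grow, under loud assumptions: the path is a plain list of Int IDs, and the supply is a hypothetical IORef counter passed explicitly rather than hidden in a monad. The names Supply and newSupply are made up for this illustration and are not the concurrent-supply API.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical stand-in for a path: newest node ID first.
type Path = [Int]

-- Hypothetical supply of fresh node IDs (an assumption, not a library type).
newtype Supply = Supply (IORef Int)

newSupply :: IO Supply
newSupply = Supply <$> newIORef 0

-- Pure extension: the caller must provide a globally unique ID.
cons :: Int -> Path -> Path
cons = (:)

-- Effectful extension: the next ID is drawn atomically from the supply,
-- so two branches grown from the same parent never share an ID.
grow :: Supply -> Path -> IO Path
grow (Supply ref) p = do
  a <- atomicModifyIORef' ref (\n -> (n + 1, n))
  pure (cons a p)
```

Growing twice from the same parent path yields two sibling paths with distinct head IDs and a shared tail.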


Tarjan’s Off-line LCA

  • In 1979, Robert Tarjan found a way to compute a predetermined set of distinct LCA queries at the same time, given the complete tree, by creatively using disjoint-set forests in O(nα(n)). (This is a stronger condition than the usual off-line problem statement.)

    function TarjanOLCA(u)
        MakeSet(u);
        u.ancestor := u;
        for each v in u.children do
            TarjanOLCA(v);
            Union(u,v);
            Find(u).ancestor := u;
        u.colour := black;
        for each v such that {u,v} in P do
            if v.colour == black
                print "The LCA of " + u + " and " + v + " is " + Find(v).ancestor;



  • In 1983, Harold Gabow and Robert Tarjan improved the asymptotics of the preceding algorithm to O(n) by noting special-case opportunities not available in general purpose disjoint-set forest problems.


Tree-Like Off-line LCA

  • In 1984, Dov Harel and Robert Tarjan provided the first asymptotically optimal off-line solution, which converts the tree in O(n) into a structure that can be queried in O(1).

  • In 1988, Baruch Schieber and Uzi Vishkin simplified that structure by building arbitrary-fanout trees out of paths and binary trees, and providing fast indexing into each case.


Range-Min Off-line LCA

  • In 1993, Omer Berkman and Uzi Vishkin found another conversion with the same O(n) preprocessing, using an Euler tour to convert the tree structure into a Range-Min structure that can be queried in O(1) time.

  • This was improved in 2000 by Michael Bender and Martin Farach-Colton.

  • Alstrup, Gavoille, Kaplan and Rauhe focused on distributing this algorithm.

  • Fischer and Heun reduced the memory requirements, but also showed that logarithmically slower RMQ algorithms are often faster for the common problem sizes of today!


Backup Plans


Naïve On-line LCA

  • Build paths as lists of node IDs, using cons as you go.

    x = [5,4,3,2,1] :# 5

    y = [6,3,2,1] :# 4

  • To compute lca x y, first cut both lists to have the same length.

    x' = [4,3,2,1], y' = [6,3,2,1], len = 4

  • Then keep dropping elements from both until the IDs match.

    lca x y = [3,2,1] :# 3
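The three steps above can be sketched directly in Haskell. This is an illustrative reading of the slide, assuming that :# pairs a list of node IDs (newest first) with its cached length; the helper names keep and go are choices made for this sketch.

```haskell
-- A path is a list of node IDs (newest first) paired with its length.
data Path a = [a] :# !Int
  deriving (Eq, Show)

-- O(1): extend a path downward with a new node ID.
cons :: a -> Path a -> Path a
cons a (xs :# n) = (a : xs) :# (n + 1)

-- O(h): keep only the k elements closest to the root.
keep :: Int -> Path a -> Path a
keep k p@(xs :# n)
  | k >= n    = p
  | otherwise = drop (n - k) xs :# max 0 k

-- O(h): trim both paths to the same length, then drop elements
-- from both in lock-step until the head IDs match.
lca :: Eq a => Path a -> Path a -> Path a
lca x@(_ :# n) y@(_ :# m) = go (keep k x) (keep k y)
  where
    k = min n m
    go p@((a : xs) :# i) ((b : ys) :# _)
      | a == b    = p
      | otherwise = go (xs :# (i - 1)) (ys :# (i - 1))
    go _ _ = [] :# 0
```

With the slide's inputs, lca ([5,4,3,2,1] :# 5) ([6,3,2,1] :# 4) evaluates to [3,2,1] :# 3.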


  • No preprocessing step.

  • O(h) LCA query time where h is the length of the path.

  • O(1) to extend a path.

  • No need to store the entire tree, just the paths you are currently using. This helps with distribution and parallelization.

  • As an on-line algorithm, the tree can grow without requiring costly recalculations.


  • To go faster we’d need to extract a common suffix in sublinear time. Very Well…


Data Structures from Number Systems

  • We are already familiar with at least one data structure derived from a number system.

    data Nat = Zero | Succ Nat

    data List a = Nil | Cons a (List a)

    O(1) succ grants us O(1) cons
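One way to make the correspondence concrete: forgetting the elements of a list leaves its length as a unary number, and Cons maps to Succ. The function name size is an assumption made for this sketch.

```haskell
data Nat = Zero | Succ Nat
  deriving (Eq, Show)

data List a = Nil | Cons a (List a)
  deriving (Eq, Show)

-- Forget the elements: each Cons cell becomes one Succ.
size :: List a -> Nat
size Nil         = Zero
size (Cons _ xs) = Succ (size xs)
```

By construction, size (Cons x xs) equals Succ (size xs), which is why a constant-time successor on the number system translates into a constant-time cons on the container.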


Binary Random-Access Lists

  • We could construct a data structure from binary numbers as well, where you have a linked list of “flags” with 2^n elements in them.

  • However, adding 1 to a binary number can affect all log n digits in the number, yielding O(log n) cons.
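The carry problem can be seen on the number system alone. The sketch below assumes a little-endian list of bits as the representation; the names Binary, inc, and toInt are illustrative.

```haskell
-- Little-endian binary numbers: least significant bit first.
type Binary = [Bool]

-- Increment: every leading 1 bit is flipped before the carry stops,
-- so a single step may touch all O(log n) digits.
inc :: Binary -> Binary
inc []           = [True]
inc (False : bs) = True : bs
inc (True : bs)  = False : inc bs

toInt :: Binary -> Int
toInt = foldr (\b n -> fromEnum b + 2 * n) 0
```

Incrementing [True, True, True] (seven) rewrites every digit to produce [False, False, False, True] (eight), which is the worst case a cons on the corresponding list structure would inherit.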


Skew-Binary Numbers

  • The nth digit has value 2^(n+1) - 1, and each digit has a value of 0, 1, or 2.

  • We only allow a single 2 in the number, which must be the first non-zero digit.

  • Every natural number can be uniquely represented by this scheme.

  • succ is an O(1) operation.

  • There are 2^(n+1) - 1 nodes in a complete tree of height n.
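A sketch of why succ is O(1), assuming a sparse representation that lists only the weights of the non-zero digits, smallest first, with a digit of 2 written as a repeated weight. The names Skew, inc, and toInt are assumptions for this illustration.

```haskell
-- Weights of the non-zero digits, smallest first.
-- A digit 2 shows up as the same weight twice at the front.
type Skew = [Int]

-- O(1) successor: if the two lowest weights are equal, merge them into
-- the next weight (w + w + 1); otherwise add a new digit of weight 1.
inc :: Skew -> Skew
inc (w1 : w2 : ws) | w1 == w2 = (1 + w1 + w2) : ws
inc ws = 1 : ws

toInt :: Skew -> Int
toInt = sum
```

Counting up from zero gives [], [1], [1,1], [3], [1,3], [1,1,3], [3,3], [7], and so on. Each step inspects at most two weights, so no carry ever cascades.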


Skew-Binary Random Access Lists

  • We store a linked list of complete trees, where we are allowed to have two trees of the same size at the front of the list, but after that all trees are of strictly increasing height.

data Tree a = Tip a | Bin a (Tree a) (Tree a)

-- Cons n w t ts: n is the length of the whole path, w the size of the complete tree t.
data Path a = Nil | Cons !Int !Int (Tree a) (Path a)

length :: Path a -> Int

length Nil = 0

length (Cons n _ _ _) = n

I call these random-access lists a Path here, because of our use case.


Skew-Binary On-line LCA

Naïve On-line LCA:

  • Build paths as lists of node IDs, using cons as you go.

  • To compute lca x y, first cut both lists to have the same length.

  • Then keep dropping elements until the IDs match.



-- O(1)
cons :: a -> Path a -> Path a
cons a (Cons n w t (Cons _ w' t2 ts))
  | w == w' = Cons (n + 1) (2 * w + 1) (Bin a t t2) ts
cons a ts = Cons (length ts + 1) 1 (Tip a) ts


lca :: Eq a => Path a -> Path a -> Path a
lca xs ys = case compare nxs nys of
    LT -> lca' xs (keep nxs ys)
    EQ -> lca' xs ys
    GT -> lca' (keep nys xs) ys
  where
    nxs = length xs
    nys = length ys


Skew-Binary Keep

keep 2 (fromList [6,5,4,3,2,1])

=

keep 2 (fromList [3,2,1])

keep :: Int -> Path a -> Path a
keep _ Nil = Nil
keep k xs@(Cons n w t ts)
  | k >= n    = xs
  | otherwise = case compare k (n - w) of
      GT -> keepT (k - n + w) w t ts
      EQ -> ts
      LT -> keep k ts

consT :: Int -> Tree a -> Path a -> Path a
consT w t ts = Cons (w + length ts) w t ts

keepT :: Int -> Int -> Tree a -> Path a -> Path a
keepT n w (Bin _ l r) ts = case compare n w2 of
    LT -> keepT n w2 r ts
    EQ -> consT w2 r ts
    GT | n == w - 1 -> consT w2 l (consT w2 r ts)
       | otherwise  -> keepT (n - w2) w2 l (consT w2 r ts)
  where w2 = div w 2
keepT _ _ _ ts = ts

  • O(log (h - k)) to keep the top k elements of a path of height h.




Comparing Node IDs

  • We can check to see if two paths have the same head or are both empty in O(1).

infix 4 ~=

(~=) :: Eq a => Path a -> Path a -> Bool
Nil ~= Nil = True
Cons _ _ s _ ~= Cons _ _ t _ = sameT s t
_ ~= _ = False

sameT :: Eq a => Tree a -> Tree a -> Bool
sameT xs ys = root xs == root ys

root :: Tree a -> a
root (Tip a) = a
root (Bin a _ _) = a


Monotonicity

  • We can modify the algorithm for keep into an algorithm that takes any monotone predicate that only transitions from False to True once during the walk up the path, and yields a result in O(log h).

  • We have exactly one shape for a given number of elements, so we can walk the spines of the two random-access lists at the same time in lock-step. Because the shapes agree, this lets us modify the algorithm to work with a pair of paths.

  • (~=) is monotone, given globally unique IDs.


Finding the Match

  • lca' requires the invariant that both paths have the same length. This is provided by the fact that lca, shown earlier, trims the lists first.

lca' :: Eq a => Path a -> Path a -> Path a
lca' h@(Cons _ w x xs) (Cons _ _ y ys)
  | sameT x y = h
  | xs ~= ys  = lcaT w x y xs
  | otherwise = lca' xs ys
lca' _ _ = Nil

lcaT :: Eq a => Int -> Tree a -> Tree a -> Path a -> Path a
lcaT w (Bin _ la ra) (Bin _ lb rb) ts
  | sameT la lb = consT w2 la (consT w2 ra ts)
  | sameT ra rb = lcaT w2 la lb (consT w2 ra ts)  -- ra holds w2 elements, so it is re-attached with weight w2
  | otherwise   = lcaT w2 ra rb ts
  where w2 = div w 2
lcaT _ _ _ ts = ts
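For experimenting, the fragments from the slides can be assembled into one loadable file. This is a sketch rather than the published lca package: fromList and toList are assumed helpers (the slides use fromList without defining it), Prelude's length is hidden to avoid a name clash, and the right subtree in lcaT is re-attached with weight w2 so that the cached lengths stay consistent.

```haskell
import Prelude hiding (length)

-- Complete binary trees; elements are stored in preorder.
data Tree a = Tip a | Bin a (Tree a) (Tree a)

-- Cons n w t ts: n elements in the whole path, w of them in tree t.
data Path a = Nil | Cons !Int !Int (Tree a) (Path a)

length :: Path a -> Int
length Nil = 0
length (Cons n _ _ _) = n

cons :: a -> Path a -> Path a
cons a (Cons n w t (Cons _ w' t2 ts))
  | w == w' = Cons (n + 1) (2 * w + 1) (Bin a t t2) ts
cons a ts = Cons (length ts + 1) 1 (Tip a) ts

consT :: Int -> Tree a -> Path a -> Path a
consT w t ts = Cons (w + length ts) w t ts

-- Assumed helpers, not defined on the slides.
fromList :: [a] -> Path a
fromList = foldr cons Nil

toList :: Path a -> [a]
toList Nil = []
toList (Cons _ _ t ts) = go t (toList ts)
  where
    go (Tip a) rest     = a : rest
    go (Bin a l r) rest = a : go l (go r rest)

-- Keep the k elements closest to the root.
keep :: Int -> Path a -> Path a
keep _ Nil = Nil
keep k xs@(Cons n w t ts)
  | k >= n    = xs
  | otherwise = case compare k (n - w) of
      GT -> keepT (k - n + w) w t ts
      EQ -> ts
      LT -> keep k ts

keepT :: Int -> Int -> Tree a -> Path a -> Path a
keepT n w (Bin _ l r) ts = case compare n w2 of
    LT -> keepT n w2 r ts
    EQ -> consT w2 r ts
    GT | n == w - 1 -> consT w2 l (consT w2 r ts)
       | otherwise  -> keepT (n - w2) w2 l (consT w2 r ts)
  where w2 = div w 2
keepT _ _ _ ts = ts

root :: Tree a -> a
root (Tip a)     = a
root (Bin a _ _) = a

sameT :: Eq a => Tree a -> Tree a -> Bool
sameT s t = root s == root t

infix 4 ~=
(~=) :: Eq a => Path a -> Path a -> Bool
Nil ~= Nil = True
Cons _ _ s _ ~= Cons _ _ t _ = sameT s t
_ ~= _ = False

lca :: Eq a => Path a -> Path a -> Path a
lca xs ys = case compare nxs nys of
    LT -> lca' xs (keep nxs ys)
    EQ -> lca' xs ys
    GT -> lca' (keep nys xs) ys
  where
    nxs = length xs
    nys = length ys

-- Both arguments must have the same length (and hence the same shape).
lca' :: Eq a => Path a -> Path a -> Path a
lca' h@(Cons _ w x xs) (Cons _ _ y ys)
  | sameT x y = h
  | xs ~= ys  = lcaT w x y xs
  | otherwise = lca' xs ys
lca' _ _ = Nil

lcaT :: Eq a => Int -> Tree a -> Tree a -> Path a -> Path a
lcaT w (Bin _ la ra) (Bin _ lb rb) ts
  | sameT la lb = consT w2 la (consT w2 ra ts)
  | sameT ra rb = lcaT w2 la lb (consT w2 ra ts)
  | otherwise   = lcaT w2 ra rb ts
  where w2 = div w 2
lcaT _ _ _ ts = ts
```

Loaded in GHCi, toList (lca (fromList [5,4,3,2,1]) (fromList [6,3,2,1])) reproduces the naïve example's answer, [3,2,1], and toList (keep 2 (fromList [6,5,4,3,2,1])) gives [2,1].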


Skew-Binary On-line LCA

Naïve On-line LCA:

  • Build paths as lists of node IDs, using cons as you go. O(1)

  • To compute lca x y, first cut both lists to have the same length. O(h)

  • Then keep dropping elements until the IDs match. O(h)

Skew-Binary On-line LCA:

  • Build paths as lists of node IDs, using cons as you go. O(1)

  • To compute lca x y, first cut both lists to have the same length. O(log h)

  • Then keep dropping elements until the IDs match. O(log h)


  • No preprocessing step.

  • O(log h) LCA query time where h is the length of the path.

  • O(1) to extend a path.

  • No need to store the entire tree, just the paths you are currently using. This helps with distribution and parallelization when working on large trees.

  • As an on-line algorithm, the tree can grow without requiring costly recalculations.

  • Preserves all of the benefits of the naïve algorithm, while drastically reducing the costs.


Now What?

  • We found that skew-binary random access lists can be used to accelerate the naïve online LCA algorithm while retaining the desirable properties.

  • You can install a working version of this algorithm from Hackage:

    cabal install lca

  • Next time I’ll talk about the applications of this algorithm to a “revision control” monad which can be used for parallel and incremental computation in Haskell.

  • I am working with Daniel Peebles on a proof of correctness and asymptotic performance in Agda.


Any Questions?

