Recursion and Divide-and-Conquer Algorithms

CS 331, Fall 2013

Tandy Warnow

- Recursion and Divide-and-Conquer are very similar (mainly differing in the first step)
- Sorting algorithms:
- Bubble-sort
- Merge-sort

- Input: list of n integers, $A_1, A_2, \ldots, A_n$
- Output: sorted version, so that smallest entry is in the first position, and largest entry is in the last position.
- Algorithm:
- Make a left-to-right sweep, swapping adjacent elements if they are out of order. If nothing is out of order, return the list.
- After the first sweep, the largest element is in the last position – so subsequent sweeps are only applied to the first n-1 elements.
- In other words, after the first sweep, you recurse on a smaller list.
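
The sweep-then-recurse description can be sketched in Python as follows. This is a minimal illustration, not the course's reference code; the function name `bubble_sort` and the helper parameter `n` (the length of the prefix still being sorted) are assumptions introduced here.

```python
def bubble_sort(items, n=None):
    """Sort items in place with recursive bubble-sort and return the list."""
    if n is None:
        n = len(items)
    if n <= 1:
        return items
    swapped = False
    # One left-to-right sweep over the first n entries.
    for i in range(n - 1):
        if items[i] > items[i + 1]:
            items[i], items[i + 1] = items[i + 1], items[i]
            swapped = True
    if not swapped:
        # Nothing was out of order, so the list is already sorted.
        return items
    # The largest of the first n entries is now at position n - 1.
    return bubble_sort(items, n - 1)


print(bubble_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Each call recurses on a prefix one element shorter, so the recursion depth can reach n; under Python's default recursion limit this form suits short lists only.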

- Let T(n) denote the time to run bubble-sort on a list of n elements. Count comparisons and input/output operations as all being unit cost.
- $T(1) = C$ (for some constant $C$)
- $T(n) = T(n-1) + C'n$ (for some constant $C'$)
- Solving: $T(n) \le C C' n^2$ (taking $C, C' \ge 1$), so $T(n)$ is $O(n^2)$
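
Unrolling the recurrence shows where the quadratic term comes from. The derivation below uses only the two equations above and the standard sum of the first n integers.

```latex
\begin{align*}
T(n) &= T(n-1) + C'n \\
     &= T(n-2) + C'(n-1) + C'n \\
     &= \cdots \\
     &= T(1) + C'\sum_{i=2}^{n} i \\
     &= C + C'\left(\frac{n(n+1)}{2} - 1\right),
\end{align*}
```

which grows proportionally to $n^2$.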

Merge-sort: same input and same output as bubble-sort

- Step 1: Split list into two roughly equal parts
- Step 2: Recursively sort each part
- Step 3: Merge the two lists together (taking smallest entry off the top of the two lists, until both lists are empty)
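
The three steps can be sketched in Python as follows. The function names `merge` and `merge_sort` are chosen here for illustration, and the sketch returns a new list instead of sorting in place.

```python
def merge(left, right):
    """Merge two sorted lists by repeatedly taking the smaller front entry."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has entries left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a sorted copy of items."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2                      # Step 1: split
    left = merge_sort(items[:mid])             # Step 2: sort each part
    right = merge_sort(items[mid:])
    return merge(left, right)                  # Step 3: merge


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```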

- Let T(n) denote the time to run MergeSort
- Assume $n = 2^k$
- $T(1) = C$
- $T(n) = 2T(n/2) + C'n$ (for some constant $C'$), $n > 1$
- Solving for $T(n)$ we obtain
- $T(n) = Cn + C'nk$, where $k = \log_2 n$
- i.e. $T(n)$ is $O(n \log n)$
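
The recurrence can be solved by repeated substitution. The derivation below assumes, as above, that n is a power of two, so the substitution stops after exactly $k = \log_2 n$ rounds.

```latex
\begin{align*}
T(n) &= 2T(n/2) + C'n \\
     &= 4T(n/4) + 2C'n \\
     &= \cdots \\
     &= 2^{j}\,T(n/2^{j}) + jC'n \\
     &= 2^{k}\,T(1) + kC'n \qquad (j = k,\ n = 2^{k}) \\
     &= Cn + C'n\log_2 n .
\end{align*}
```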

- The main difference between these two algorithms is how free you are to pick the division into subproblems; both are recursive.
- Note that MergeSort could have been used on a division into 3 sets instead of 2, or into log n sets instead of 2, etc. – and the algorithm would still work (but with a different running time).
- However, BubbleSort, by design, recurses after you successfully move the largest element to the end of the array – not before.

- Analyze the running time of MergeSort when each division is into three sets of roughly equal size
- Analyze the running time of MergeSort when the division is into sqrt(n) sets of size sqrt(n)

- Input: set X of k rooted three-leaf trees (on set S of n leaves)
- Output: tree T on the entire set S that agrees with the set X – if it exists.
Example:

{((a,b),c), ((b,c),d), ((d,e),f), ((e,f),g)}

Solution: (g,(f,(e,(d,(c,(a,b))))))

- Input: set X of rooted three-leaf trees (on leafset S)
- Output: tree T on the entire set S that agrees with the set X – if it exists.
Example:

{((a,b),c), ((b,c),d), ((d,e),f), ((e,f),g), (a,(e,d))}

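Solution: a tree exists for this set, for example (((c,(a,b)),(f,(d,e))),g). A set such as {((a,b),c), (a,(b,c))} has no solution.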

- If X is compatible, then there is a tree T that agrees with X. Suppose there are j children of the root of T, and they have leaf sets $A_1, A_2, \ldots, A_j$ below them.
- Consider the 3-leaf trees in X. They fall into three possible types:
- Leaves from three different sets: (a,(b,c))
- Leaves from two different sets: (a,(b,b’)) or (a,(a’,b))
- Leaves from same set: (a,(a’,a’’))

- There are j children of the root of T, and they have leaf sets $A_1, A_2, \ldots, A_j$ below them.
- The 3-leaf trees in X:
  - Leaves from three different sets, (a,(b,c)): impossible
  - Leaves from two different sets:
    - (a,(b,b’)): possible
    - (a,(a’,b)): impossible
  - Leaves from the same set, (a,(a’,a’’)): possible

- Definition: The bipartition of S into A and B is contradicted by X if some tree in X has form (a,(a’,b)), that is, its two sibling leaves a’ and b lie on opposite sides of the bipartition.
- Lemma 1: If set X is compatible, then there is a bipartition of S that is not contradicted by X.

- Lemma 1: If the set X is compatible, then there exists a bipartition of S into non-empty sets A and B, so that no triplet (u,(v,w)) in X has v in A and w in B.
- Proof: by contradiction.
- Suppose the set X is compatible and T is a rooted tree that agrees with every tree in X. Let $u_1, u_2, \ldots, u_k$ be the children of the root of T (so k is at least 2), and let $A_1, A_2, \ldots, A_k$ be the leafsets below $u_1, u_2, \ldots, u_k$.
- Let $A = A_1$ and let B be the union of the remaining sets.
- Now suppose there is a triplet (u,(a,b)) in X with a in A and b in B. The least common ancestor of a and b in T is then the root, so T cannot place a and b closer to each other than to u. Hence this triplet does not agree with the tree T, contradicting our hypothesis.

Definitions:

- The bipartition of S into A and B is contradicted by X if some tree has form (a,(a’,b))
- $X_A$ is the set of trees in X with all leaves in A
- $X_B$ is the set of trees in X with all leaves in B
Lemma 2: If set X is compatible and bipartition A|B is not contradicted by X, then the sets $X_A$ and $X_B$ are compatible.

Lemma 3: If bipartition A|B is not contradicted by X and the sets $X_A$ and $X_B$ are both compatible, then set X is compatible.

- Given set X of k rooted three-leaf trees on leafset S, find a bipartition (if one exists) that is not contradicted by X.
- If no such bipartition exists, then return “Fail”.
- Else let A|B be the bipartition that is not contradicted by X.
- Recurse on A with the triplet set $X_A$, constructing rooted binary tree $T_A$
- Recurse on B with the triplet set $X_B$, constructing rooted binary tree $T_B$
- If either fails, then return “Fail”. Else, return rooted binary tree T with trees $T_A$ and $T_B$ as children.

- Given set X of k 3-leaf rooted trees, how can we find a bipartition A|B that is not contradicted by any tree in X? (Or determine that no such bipartition exists?)
- Consider a 3-leaf tree, (u,(v,w)). Note that v and w must be in the same subtree off the root of any tree T that is compatible with X. Hence v and w must be in the same piece of the bipartition (both in A or both in B).

- Algorithm:
- Make a graph G=(V,E) with one vertex for every element of S, and an edge (s,s’) for every 3-leaf tree (u,(s,s’)) in X.
- If the graph is connected, then there is no tree T that is compatible with X (because there is no bipartition that is not contradicted by X).
- If the graph is not connected, then let A be the vertex set of one component of the graph, and let B = S - A be the set of all remaining vertices. The bipartition A|B is not contradicted by X.
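
The graph test and the recursion can be combined into a short Python sketch. The representation is an assumption made here, not something taken from the slides: a triplet (u,(v,w)) is stored as the tuple `(u, v, w)`, trees are returned as nested tuples, `None` stands for "Fail", and the function name `build` is hypothetical. Every leaf that occurs in a triplet is assumed to belong to `leaves`.

```python
def build(leaves, triplets):
    """Return a rooted binary tree agreeing with all triplets, or None."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    if len(leaves) == 2:
        return (leaves[0], leaves[1])

    # One vertex per leaf, and an edge v-w for every triplet (u,(v,w)).
    adjacent = {s: set() for s in leaves}
    for _, v, w in triplets:
        adjacent[v].add(w)
        adjacent[w].add(v)

    # Depth-first search for the component that contains the first leaf.
    side_a = {leaves[0]}
    stack = [leaves[0]]
    while stack:
        for nxt in adjacent[stack.pop()]:
            if nxt not in side_a:
                side_a.add(nxt)
                stack.append(nxt)

    side_b = set(leaves) - side_a
    if not side_b:
        return None  # connected graph: every bipartition is contradicted

    def restrict(side):
        # Keep only the triplets whose three leaves all lie in `side`.
        return [t for t in triplets if set(t) <= side]

    tree_a = build(side_a, restrict(side_a))
    tree_b = build(side_b, restrict(side_b))
    if tree_a is None or tree_b is None:
        return None
    return (tree_a, tree_b)


triplets = [("c", "a", "b"), ("d", "b", "c"), ("f", "d", "e"), ("g", "e", "f")]
print(build("abcdefg", triplets))
# ((('a', 'b'), 'c'), ((('d', 'e'), 'f'), 'g'))
print(build("abc", [("c", "a", "b"), ("a", "b", "c")]))  # None
```

Several trees can agree with the same triplet set, so the tree printed for the first call is only one valid answer. Which one is returned depends on which component is chosen as A.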

- Divide step:
- Constructing graph: O(k)
- Determining if the graph is connected: O(k+n)

- After division, the subsets partition the taxa and also the triplets. The worst case time is where one subset has a single leaf, and the other subset has n-1 leaves.
- Combining the trees together takes O(1) time.
- Hence the recurrence relation for the running time is $T(n) \le T(n-1) + O(n + k)$. Solving for $T(n)$ we obtain that $T(n)$ is $O(n(n + k))$, which is $O(kn)$ whenever $k \ge n$.

- Developed by Aho, Sagiv, Szymanski and Ullman in 1981 for a problem related to relational databases.
- Has also been used for other problems
- Most commonly used now in phylogenetic inference!

- Input: Set Y of unrooted quartet trees, each with leaves from set S.
- Output: Tree T (if it exists) on leafset S that agrees with every tree in Y.
- NP-complete!!!

- Special cases that can be solved in polynomial time:
  - All trees have leaf x:
    - Root all the trees at x, and apply the algorithm for testing compatibility of rooted three-leaf trees. Check if result is compatible with remaining three-leaf trees.
    - O(kn)
  - Set Y has a tree on every four leaf subset of S

- Input: set Y containing a tree on every four leaves in S
- Output: tree T that agrees with all the trees in Y, if it exists; otherwise “Fail”
- This can be solved in $O(n^5)$ time using the Aho, Sagiv, Szymanski and Ullman algorithm. However, there is another way, too.

- Find a sibling pair a,b (that is a pair that is not contradicted by any quartet tree in Y)
- Remove a, and recurse on quartet trees that do not contain leaf a.
- If no tree is found, return “Fail”.
- Otherwise, let T be the tree that is returned, and add leaf a to T, by making it a sibling of b.

- How to find siblings?
- What is the running time?

- Let a and b be elements of S.
- Then a and b cannot be siblings if any quartet tree splits them, i.e. places them on opposite sides, as in ac|bd.
- If no quartet tree splits them, then you can consider them siblings!
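
A sibling-pair search along these lines might look as follows in Python. The representation is an assumption for illustration: a quartet tree xy|zw is stored as a pair of two-element tuples, `(("x", "y"), ("z", "w"))`, and the name `find_sibling_pair` is hypothetical. The sketch first marks every pair of leaves that some quartet places on opposite sides, then returns the first unmarked pair.

```python
from itertools import combinations


def find_sibling_pair(leaves, quartets):
    """Return a pair of leaves that no quartet separates, or None."""
    separated = set()
    for left, right in quartets:
        # Each quartet separates every leaf on one side from every leaf
        # on the other side (four pairs in total).
        for x in left:
            for y in right:
                separated.add(frozenset((x, y)))
    for a, b in combinations(sorted(leaves), 2):
        if frozenset((a, b)) not in separated:
            return a, b
    return None


# All five quartets induced by the unrooted tree ((a,b),c,(d,e)).
quartets = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), ("c", "e")),
    (("a", "b"), ("d", "e")),
    (("a", "c"), ("d", "e")),
    (("b", "c"), ("d", "e")),
]
print(find_sibling_pair("abcde", quartets))  # ('a', 'b')
```

Marking takes constant time per quartet and the final scan looks at every pair of leaves once, so this sketch runs in O(k + n^2) time for k quartets on n leaves.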

- Finding a sibling pair takes $O(k) = O(n^4)$ time
- Hence the algorithm takes $O(n^5)$ time, too.

- Implement one of the two methods for Quartet Tree Compatibility, assuming input has a tree on every quartet.
- Make sure that the code returns a tree that agrees with all the input quartet trees, if they are compatible.

- Input: set Y of quartet trees and a tree T
- Output: YES if all trees in Y are compatible with T, and NO otherwise.
- Obviously polynomial – O(kn) time will suffice (k=|Y| and n is the number of leaves in T)
- Can we do this faster?

- Preprocessing step so that each subsequent query (“Is quartet tree t compatible with T?”) can be answered in constant time.
- If preprocessing takes p(n) time, then total time is O(p(n) + k)
- How do we do the preprocessing?

- Least Common Ancestor (LCA)
- Input: Leaves x and y, tree T
- LCA(x,y) is the node v that is a common ancestor of both x and y, and furthest from the root of T

- Algorithm to compute LCA
- Start at x and write down sequence of nodes on path from x to root
- Do the same for y
- Compare the paths – first node in common on both paths is the LCA of x and y
- O(n) time
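
The path-comparison method can be sketched in Python as follows. The tree representation is an assumption made here: a dictionary `parent` maps every node to its parent, with the root mapped to `None`. The names `path_to_root` and `lca` are illustrative.

```python
def path_to_root(parent, x):
    """Return the list of nodes on the path from x up to the root."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent, x, y):
    """Return the first node on y's path to the root that is also on x's path."""
    on_x_path = set(path_to_root(parent, x))
    for node in path_to_root(parent, y):
        if node in on_x_path:
            return node
    raise ValueError("x and y are not in the same tree")


# The rooted tree ((a,b),(c,d)) with internal nodes u, v and root r.
parent = {"r": None, "u": "r", "v": "r",
          "a": "u", "b": "u", "c": "v", "d": "v"}
print(lca(parent, "a", "b"))  # u
print(lca(parent, "a", "c"))  # r
```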

- Input: tree T
- Output: LCA matrix -- LCA[x,y] is the LCA of leaves x and y
Trivial to do in $O(n^3)$ time

Also easy to do this in $O(n^2)$ time.

- Start at leaves of tree T, and visit nodes of T only after visiting the children
- Let L(v)={v} if v is a leaf in T
- For each node v in T with children x and y
- For all a in L(x) and b in L(y), set LCA(a,b):=v
- Let L(v) be L(x) union L(y)

- The LCA of each pair a, b of leaves is set only once, and so overall contributes only $O(n^2)$ time.
- The other expense of visiting a node v is constant time per node.
- Hence, the total cost is $O(n^2)$.
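
A bottom-up computation of the leaf LCA matrix might be sketched as below. The representation is an assumption for illustration: `children` maps each internal node to a list of its children, any node without an entry is a leaf, and the matrix is a dictionary keyed by ordered leaf pairs. The sketch also fills in the diagonal entries LCA(v,v) = v and accepts nodes with more than two children, neither of which the steps above require.

```python
def leaf_lca_matrix(children, root):
    """Return a dict mapping each ordered pair of leaves (a, b) to their LCA."""
    # Pre-order listing of the nodes; read in reverse, every node appears
    # after all of its descendants, so children are visited before parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    lca = {}
    leaves_below = {}  # L(v): the leaves in the subtree rooted at v
    for v in reversed(order):
        kids = children.get(v, [])
        if not kids:
            leaves_below[v] = [v]
            lca[(v, v)] = v
            continue
        merged = []
        for kid in kids:
            # Leaves from different child subtrees meet for the first time at v.
            for a in merged:
                for b in leaves_below[kid]:
                    lca[(a, b)] = lca[(b, a)] = v
            merged.extend(leaves_below[kid])
        leaves_below[v] = merged
    return lca


# The rooted tree ((a,b),(c,d)) with internal nodes u, v and root r.
children = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
table = leaf_lca_matrix(children, "r")
print(table[("a", "b")], table[("a", "c")], table[("c", "d")])  # u r v
```

Copying leaf lists with `extend` costs more than constant time per node, but the total over the whole tree stays within O(n^2), so the overall bound is unchanged.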

- Back to the problem – determining if a quartet tree t on leafset {a,b,c,d} is compatible with an unrooted tree T.
- Solution: root the tree T, construct the tree on {a,b,c,d} induced by T, and compare it to t.
- Can we use the LCA matrix to make this efficient?
- Idea: include all nodes in T (not just leaves) for the LCA matrix.

- Visiting a node v (after visiting its children x and y) costs
- O(1) to compute L(v)
- O(|L(x)| |L(y)|) time to set LCA(a,b) = v for all a in L(x) and b in L(y)
- So, naively, $O(n^2)$ to visit node v

- This would suggest an $O(n^3)$ running time.
- However, an amortized analysis gives a better bound.

- Start at leaves of tree T, and visit nodes of T only after visiting the children
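- Let L(v)={v} and set LCA(v,v):=v if v is a leaf in T
- For each node v in T with children x and y
- For all a in L(x) and b in L(y), set LCA(a,b):=v
- For all a in L(x) union L(y) union {v}, set LCA(a,v):=v (v is the LCA of itself and any node below it)
- Set L(v) = L(x) union L(y) union {v}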

- Note that now the LCA matrix has 2n-1 rows and columns, and we can find LCA of any two vertices – not just the leaves.

- Do the $O(n^2)$ time preprocessing to compute the LCA matrix (for all pairs of vertices, not just leaves)
- Then given {a,b,c,d} (leaves in the tree), compute the LCAs of all six pairs. When T is binary, this produces 3 distinct LCAs, and at least one pair (a,b, without loss of generality) whose LCA is not the LCA of any other pair. The unrooted tree induced on this set of 4 leaves is then ab|cd.
- To find the rooted version of the induced tree takes only a little bit more analysis.
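
The six-LCA test for the unrooted induced quartet can be sketched in Python as follows. Everything named here is an assumption for illustration: `induced_quartet` accepts any function `lca(x, y)`, and `make_lca` supplies a simple path-walking one from a child-to-parent dictionary so that the example is self-contained. With a precomputed LCA matrix, each of the six lookups would take constant time instead.

```python
from collections import Counter
from itertools import combinations


def make_lca(parent):
    """Return an lca(x, y) function for a rooted tree given as child -> parent."""
    def ancestors(x):
        path = [x]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    def lca(x, y):
        above_x = set(ancestors(x))
        return next(node for node in ancestors(y) if node in above_x)

    return lca


def induced_quartet(lca, a, b, c, d):
    """Return the split induced on {a, b, c, d}, or None if it is unresolved."""
    leaves = (a, b, c, d)
    pair_lca = {pair: lca(*pair) for pair in combinations(leaves, 2)}
    uses = Counter(pair_lca.values())
    for pair, node in pair_lca.items():
        if uses[node] == 1:
            # This pair's LCA is shared with no other pair, so the pair
            # sits together on one side of the split.
            rest = frozenset(leaves) - frozenset(pair)
            return frozenset({frozenset(pair), rest})
    return None


# The rooted caterpillar (((a,b),c),d) with internal nodes x, y and root r.
parent = {"r": None, "y": "r", "d": "r", "x": "y", "c": "y", "a": "x", "b": "x"}
split = induced_quartet(make_lca(parent), "a", "c", "b", "d")
print(split == frozenset({frozenset("ab"), frozenset("cd")}))  # True
```

Returning the split as a set of two sets makes the comparison with an input quartet tree independent of the order in which its leaves are written.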

- Given unrooted tree T and set Y of weighted quartet trees, to compute the total weight of the compatible quartet trees of Y, DO:
- Root T arbitrarily (on some edge), and compute the LCA matrix (for all nodes in T’, the rooted version of T).
- Initialize compat-weight = 0.
- For every quartet tree t in Y, compute the induced quartet tree in T, and check if it agrees with t (as unrooted trees). If so, add weight(t) to compat-weight.
- Return compat-weight.
Running time: $O(n^2 + k)$, where k is the number of quartet trees.

- There are better LCA query algorithms! Less preprocessing time (linear instead of quadratic) and still constant time for the queries. These use more sophisticated techniques – and we’ll discuss them later.
- Hence the running time of this algorithm could be reduced to O(n+k).
- Phylogenetic estimation from quartet trees is pretty popular. Unfortunately, it’s just about never the case that the quartet trees are compatible! And finding the maximum weight subset of compatible quartets is NP-hard (even if all quartets have unit weight).

- Programming: Given tree T and set Y of weighted quartet trees, compute the total weight of the quartet trees in Y that are compatible with T.
- Theory: Show how to calculate the rooted subtree induced by T on {a,b,c} (for any three nodes a,b,c – not just leaves – in the tree).