## Random Partition via Shifting


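Seminar on Geometric Approximation Algorithms
Speaker: Dina Bilchinsky, TAU

**Outline of the Lecture**
• Definitions.
• Applications:
  • Covering by disks.
  • Shifting quadtrees.
• Hierarchical representation of a point set:
  • Low-quality approximation by HST.
  • Fast & dirty HST in high dimensions.
  • Low-quality ANN search.

**Outline of the Lecture**
• In this lecture we investigate a simple technique for partitioning a geometric domain.
• This idea can be extended to shifting a multi-resolution grid over $\mathbb{R}^d$.
• This yields simple algorithms for clustering and nearest-neighbor search.

**Shifted Partition of the Real Line**
• $\Delta > 0$: a real number.
• $b$: a number distributed uniformly in $[0, \Delta]$.
• This induces a natural partition of the real line into intervals, via the function $h(x) = \left\lfloor \frac{x - b}{\Delta} \right\rfloor$.
• Each interval has length $\Delta$.
• The origin is shifted to the right by $b$.

**Shifted Partition of the Real Line**
• $b$ and $b + \Delta$ induce the same partition, but not the same function $h$.
• Hence $b$ can be picked uniformly from any interval whose length is a multiple of $\Delta$, for example from $[0, k\Delta]$ for any integer $k \ge 1$.

**Lemma 11.2**
For any $x, y \in \mathbb{R}$, we have $\Pr[h(x) \neq h(y)] = \min\left(\frac{|x - y|}{\Delta}, 1\right)$.
Proof: Assume $x < y$ and define $r = y - x$. If $r \ge \Delta$, then $x$ and $y$ always lie in different intervals. Otherwise, they are separated exactly when a boundary point $b + k\Delta$ of the partition falls in $(x, y]$. Exactly one boundary point falls in $(x, x + \Delta]$, and it is uniformly distributed there, so this happens with probability $r/\Delta$.

**Shifted Partition of Space**
• $p = (p_1, \ldots, p_d)$: a point in $\mathbb{R}^d$.
• $\mathbf{b} = (b_1, \ldots, b_d)$: chosen uniformly at random from the hypercube $[0, \Delta]^d$.
• $G^{\mathbf{b}}_\Delta$: the grid with origin $\mathbf{b}$ and side length $\Delta$.
• For a point $p \in \mathbb{R}^d$, the ID of the grid cell containing it is $\mathrm{id}(p) = \left(\left\lfloor \frac{p_1 - b_1}{\Delta} \right\rfloor, \ldots, \left\lfloor \frac{p_d - b_d}{\Delta} \right\rfloor\right)$.

**Lemma 11.3**
• $G^{\mathbf{b}}_\Delta$: a randomly shifted grid.
• $B$: a ball of radius $r$ in $\mathbb{R}^d$, or an axis-parallel hypercube with side length $2r$.
• Then $\Pr[B \text{ is not contained in a single cell of } G^{\mathbf{b}}_\Delta] \le \min\left(\frac{2dr}{\Delta}, 1\right)$.

**Lemma 11.3 - Proof**
• If $2r > \Delta$, the probability is 1, since $B$ cannot fit inside a cell of side length $\Delta$.
• If $2r \le \Delta$:
  • Project $B$ onto its $i$-th coordinate. It becomes an interval $B_i$ of length $2r$.
  • $G^{\mathbf{b}}_\Delta$ becomes a one-dimensional shifted grid $G^i$ with shift $b_i$.
  • $B$ is contained in a single cell of $G^{\mathbf{b}}_\Delta$ if and only if $B_i$ is contained in a single cell of $G^i$ for all $i = 1, \ldots, d$.
  • By Lemma 11.2 applied to the endpoints of $B_i$, the interval $B_i$ is split with probability $2r/\Delta$, and a union bound over the $d$ coordinates gives $2dr/\Delta$.

**Applications: Covering by Disks**
• Given a set $P$ of $n$ points in the plane, we would like to cover them by a minimum number of unit disks.
• Canonical disks:
  • Type A: the boundary circle contains two points of $P$.
  • Type B: the top point of the circle is a point of $P$.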
• Any cover of $P$ by unit disks can be converted into a cover of the same size that uses canonical disks only.

**Number of Canonical Disks**
• Every pair of input points determines two possible disks of Type A.
• If a pair of input points is at distance larger than 2, then no unit circle passes through both, so the pair defines no canonical disk.
• Every input point determines one disk of Type B.
• Therefore, there are $O(n^2)$ canonical disks.
• We assume the cover uses only such disks.

**Disk Cover**
• Disk cover verification: given $k$ disks, we can verify that they cover $P$ in $O(nk)$ time: for every point $p \in P$, check whether it is contained in one of the disks.
• Lemma 11.4: Given a set $P$ of $n$ points in the plane, we can compute, in $n^{O(k)}$ time, a cover of $P$ by at most $k$ unit disks, if such a cover exists.

**Lemma 11.4 - Proof**
• Use the verification algorithm, trying all covers by $i$ canonical disks, for $i = 1, \ldots, k$.
• The algorithm returns the first cover found.
• Running time: dominated by the last iteration.
• There are $\binom{O(n^2)}{k} = n^{O(k)}$ different covers to consider.
• Each verification takes $O(nk)$ time.
• Thus the total running time is $n^{O(k)}$.

**Disk Cover**
• The problem with this algorithm is that $k$ might be quite large, say $n/4$.
• Fortunately, the shifting grid saves the day.

**Theorem 11.5**
• $P$: a set of $n$ points in the plane.
• $\varepsilon > 0$: a parameter.
• Using a randomized algorithm, we can compute in $n^{O(1/\varepsilon^2)}$ time a cover of $P$ by $X$ unit disks, where $\mathbf{E}[X] \le (1 + \varepsilon)\,\mathrm{opt}$ and $\mathrm{opt}$ is the minimum number of unit disks needed to cover $P$.

**Theorem 11.5 - Proof**
• Choose $\Delta = 12/\varepsilon$ and consider a randomly shifted grid $G^{\mathbf{b}}_\Delta$.
• Compute the used cells, that is, the grid cells that contain points of $P$, by computing $\mathrm{id}(p)$ for each point $p \in P$ and storing it in a hash table.
• $P_\square$: the points of $P$ falling into the grid cell $\square$.
• Each $P_\square$ can be covered by $M = O(\Delta^2) = O(1/\varepsilon^2)$ unit disks.

**Theorem 11.5 - Proof**
• For each used cell $\square$, compute the minimum number of unit disks required to cover $P_\square$ (bounded by $M$).
• By Lemma 11.4 we can compute it in $n^{O(M)} = n^{O(1/\varepsilon^2)}$ time.
• There are at most $n$ used cells.
• Thus the total running time is $n^{O(1/\varepsilon^2)}$.

**Theorem 11.5 - Proof**
• Overall cover: merge together the covers of the individual grid cells.

**Proof - Bounding Expectation**
• $F_{\mathrm{opt}}$: an optimal solution.
• We will generate a feasible solution $\mathcal{G}$ from $F_{\mathrm{opt}}$.
• $\mathcal{G}$ is assembled from covers of the individual cells, each of which is one of the possible solutions considered by the algorithm.
• $F_\square$: the set of disks of the optimal solution that intersect $\square$; it covers $P_\square$.
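• Consider the multiset $\mathcal{G} = \biguplus_{\square} F_\square$.
• For each grid cell $\square$, the algorithm returns a minimum cover $C_\square$ of $P_\square$, so $|C_\square| \le |F_\square|$ (it returns the smallest possible cover).

**Proof - Bounding Expectation**
• The cover returned by the algorithm has size at most $\sum_\square |C_\square| \le |\mathcal{G}|$.
• A disk of the optimal solution can appear in $\mathcal{G}$ at most 4 times, since for $\Delta \ge 2$ a unit disk intersects at most 4 cells of the grid.
• A disk $D$ appears in $\mathcal{G}$ more than once only if it is not fully contained in a single grid cell of $G^{\mathbf{b}}_\Delta$.
• By Lemma 11.3 (with $d = 2$ and $r = 1$), the probability of that is bounded by $4/\Delta = \varepsilon/3$.
• Hence $\mathbf{E}[|\mathcal{G}|] \le \sum_{D \in F_{\mathrm{opt}}} \left(1 + 3 \cdot \frac{\varepsilon}{3}\right) = (1 + \varepsilon)\,\mathrm{opt}$.

**Proof - Bounding Expectation**
• The running time can be improved further.

**Shifting One-Dimensional Quadtree**
• $P$: a set of $n$ numbers contained in the interval $[1/2, 1)$.
• Randomly and uniformly choose a number $b \in [0, 1/2]$.
• $T^b$: the one-dimensional quadtree of $P$, using the interval $[b, b + 1]$ for the root cell.

**Bit Index**
Let $\alpha, \beta \in [0, 1)$ be two real numbers. Assume these numbers, written in base 2, are $\alpha = 0.\alpha_1\alpha_2\ldots$ and $\beta = 0.\beta_1\beta_2\ldots$. Then $\mathrm{bit}_\Delta(\alpha, \beta)$ is the index of the first bit after the period in which they differ.
Reminder: a node in the quadtree that corresponds to an interval of length $2^{-i}$ has level $-i$.

**Shifting One-Dimensional Quadtree**
• For $x, y \in P$, the numbers $x - b$ and $y - b$ lie in $[0, 1)$ and agree on their first $\mathrm{bit}_\Delta(x - b, y - b) - 1$ bits.
• Hence $1 - \mathrm{bit}_\Delta(x - b, y - b)$ is the last level of the shifted grid that contains $x$ and $y$ in the same interval.
• This is the level of the node of $T^b$ that is the least common ancestor containing both numbers.
• That is, the level of $\mathrm{lca}(x, y)$ is $\mathrm{lvl}(x, y) = 1 - \mathrm{bit}_\Delta(x - b, y - b)$.

**Example - Without Shifting**
• We assume that $\mathrm{bit}_\Delta$ can be computed in constant time.
• The value of $\mathrm{lvl}(x, y)$ depends only on $x$, $y$ and $b$, and is independent of the other points of $P$.

**Lemma 11.7**
• Let $x, y \in [1/2, 1)$ be two numbers, and consider a random number $b \in [0, 1/2]$. For any integer $i$, we have $\Pr[\mathrm{lvl}(x, y) > i] \le \frac{|x - y|}{2^i}$.
• This lemma bounds the probability that the lca of two numbers is in charge of an interval considerably longer than the difference between them.

**Lemma 11.7 - Proof**
• If $i \ge 0$, the probability is 0, since $\mathrm{lvl}(x, y) \le 0$; so assume $i \le -1$ and let $\Delta = 2^i$.
• Consider the shifted partition of the real line by intervals of length $\Delta$ and shift $b$; this is valid because $1/2$ is a multiple of $\Delta$.
• Assume $\mathrm{lvl}(x, y) = j > i$. Then the lowest level at which both numbers lie in the same shifted interval is $j$.
• As such, going one level down, to level $j - 1 \ge i$, these numbers are in different intervals, and so they are also in different intervals of length $2^i$.

**Lemma 11.7 - Proof**
• By Lemma 11.2, $\Pr[\mathrm{lvl}(x, y) > i] \le \Pr[h(x) \neq h(y)] \le \frac{|x - y|}{2^i}$.

**Corollary 11.8**
• Let $x, y \in [1/2, 1)$ be two numbers, and consider a random $b \in [0, 1/2]$.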
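For any parameter $c > 0$, we have $\Pr[\mathrm{lvl}(x, y) > \lg|x - y| + c] \le 2^{1 - c}$, by Lemma 11.7 with $i = \lfloor \lg|x - y| + c \rfloor \ge \lg|x - y| + c - 1$.

**Higher-Dimensional Quadtrees**
• $P$: a set of $n$ points in $[1/2, 1)^d$.
• $\mathbf{b}$: a point picked uniformly at random from $[0, 1/2]^d$.
• $T^{\mathbf{b}}$: the shifted compressed quadtree of $P$ with $\mathbf{b} + [0, 1]^d$ as the root cell.
• $p, q \in P$: two points.
• $T_1, \ldots, T_d$: the one-dimensional quadtrees built on each of the coordinates of the point set.

**Higher-Dimensional Quadtrees**
• $T^{\mathbf{b}}$ is a combination of these quadtrees.
• The level where $p$ and $q$ get separated in $T^{\mathbf{b}}$ is the first level at which they get separated in any of these quadtrees.
• The level of $\mathrm{lca}(p, q)$ is $\mathrm{lvl}(p, q) = \max_{i = 1}^{d} \mathrm{lvl}(p_i, q_i)$.
• The value of $\mathrm{lvl}(p, q)$ is independent of the other points of $P$.
• $\mathrm{lvl}(p, q)$ is a well-behaved random variable.

**Lemma 11.9**
• For any two fixed points $p, q \in P$ and any integer $i$: $\Pr[\mathrm{lvl}(p, q) > i] \le \sum_{j = 1}^{d} \frac{|p_j - q_j|}{2^i} \le \frac{\sqrt{d}\,\|p - q\|}{2^i}$.

**Hierarchical Representation of a Point Set**

**Metric Space**
• We will carry out the discussion in a more general setting than low-dimensional Euclidean space, using the notion of a metric space.
• Definition: A metric space is a pair $(\mathcal{M}, \mathbf{d})$, where $\mathcal{M}$ is a set and $\mathbf{d} : \mathcal{M} \times \mathcal{M} \to [0, \infty)$ is a metric satisfying the following axioms, for all $x, y, z \in \mathcal{M}$:
  • $\mathbf{d}(x, y) = 0$ if and only if $x = y$.
  • $\mathbf{d}(x, y) = \mathbf{d}(y, x)$.
  • $\mathbf{d}(x, y) + \mathbf{d}(y, z) \ge \mathbf{d}(x, z)$.

**Hierarchically Separated Tree**
• Definition: Let $P$ be a set of elements, and let $H$ be a tree having the elements of $P$ as leaves. The tree $H$ defines a hierarchically separated tree (HST) over the points of $P$ if for each vertex $u \in H$ there is an associated label $\Delta_u \ge 0$ such that:
  • $\Delta_u = 0$ if and only if $u$ is a leaf of $H$.
  • If $u$ is a child of $v$, then $\Delta_u \le \Delta_v$.

**HST**
• The distance between two leaves $x, y \in H$ is defined as $\Delta_{\mathrm{lca}(x, y)}$.
• BHST: an HST in which every internal node of $H$ has exactly two children.
• It is easy to verify that the distances defined by an HST form a metric.
• The metric defined has a very simple structure, and can be easily manipulated algorithmically.

**Example - HST**
• $P$: a point set in $\mathbb{R}^d$.
• $T$: a compressed quadtree storing $P$.
• For each internal node $v \in T$, set $\Delta_v = \mathrm{diam}(\square_v)$, the diameter of its cell, and set $\Delta_v = 0$ for the leaves.
• We will work with BHSTs, since any HST can be converted into a binary HST in linear time, retaining the underlying distances.
• For every vertex $u \in H$, we associate an arbitrary representative point $\mathrm{rep}_u \in P_u$, where $P_u$ is the set of points stored in the subtree rooted at $u$.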
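• We require that $\mathrm{rep}_u \in \{\mathrm{rep}_v \mid v \text{ is a child of } u\}$.

**Metric Space t-approximate**
• Definition: A metric space $\mathcal{N} = (P, \mathbf{d}_{\mathcal{N}})$ is said to $t$-approximate the metric $\mathcal{M} = (P, \mathbf{d}_{\mathcal{M}})$ if they are defined over the same set of points and, for any $x, y \in P$, $\mathbf{d}_{\mathcal{M}}(x, y) \le \mathbf{d}_{\mathcal{N}}(x, y) \le t \cdot \mathbf{d}_{\mathcal{M}}(x, y)$.
• Any $n$-point metric is $(n - 1)$-approximated by some HST.

**Lemma 11.13**
Given a weighted connected graph $G$ on $n$ vertices and $m$ edges, it is possible to construct, in $O(n \log n + m)$ time, a BHST $H$ that $(n - 1)$-approximates the shortest path metric $\mathbf{d}_G$ of $G$, where $\mathbf{d}_G(x, y)$ is the length of the shortest path between vertices $x$ and $y$ in the weighted graph $G$.

**Lemma 11.13 - Proof**
• Compute a minimum spanning tree $\mathcal{T}$ of $G$ in $O(n \log n + m)$ time.
• The HST is built bottom up:
  • Sort the edges of $\mathcal{T}$ in non-decreasing order of weight, and add them to the graph one by one, starting with an empty graph on $V(G)$.
  • At each stage, we have a collection of HSTs, each corresponding to a connected component of the current graph.
  • Each added edge $xy$ merges two connected components.
  • We merge the two corresponding HSTs into a single HST by adding a new common root $v$, and labeling it with $\Delta_v = (|P_v| - 1)\,w(xy)$, where $P_v$ is the set of points stored in the subtree with $v$ as its root.

**Proof - Approx. factor**
• $u, v$: two vertices of $G$.
• $e$: the first edge added such that $u$ and $v$ are in the same component $C$, created by merging two connected components $C'$ and $C''$, with $u \in C'$ and $v \in C''$.
• $e$ is the lightest edge in the cut $(C', V \setminus C')$, between $C'$ and the rest of the graph. As such, any path between $u$ and $v$ in $G$ must contain an edge of weight at least $w(e)$, so $\mathbf{d}_G(u, v) \ge w(e)$.
• $e$ is the heaviest edge in $C$ (it was added last), and $C$ is connected by $|C| - 1$ such edges, so $\mathbf{d}_G(u, v) \le (|C| - 1)\,w(e)$.
• Therefore $\mathbf{d}_G(u, v) \le \Delta_{\mathrm{lca}(u, v)} = (|C| - 1)\,w(e) \le (n - 1)\,\mathbf{d}_G(u, v)$.

**Spanners**
• $\mathbf{d}_G(x, y)$: the length of the shortest path between vertices $x$ and $y$ in a weighted graph $G$.
• A $t$-spanner of a set of points $P$ in $\mathbb{R}^d$ is a weighted graph $G$ whose vertices are the points of $P$, such that for any $x, y \in P$: $\|x - y\| \le \mathbf{d}_G(x, y) \le t\,\|x - y\|$.
• $\mathbf{d}_G$ is a metric.
• We can compute a 2-spanner of $P$ with $O(n)$ edges in $O(n \log n)$ time.

**Corollary 11.14**
For a set $P$ of $n$ points in a metric space $\mathcal{M}$, we can compute in $O(n^2)$ time an HST $H$ that $(n - 1)$-approximates the metric $\mathbf{d}_{\mathcal{M}}$.

**Corollary 11.15**
Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We can construct in $O(n \log n)$ time (the constant in the $O(\cdot)$ depends exponentially on the dimension) a BHST $H$ that $2(n - 1)$-approximates the distances of points in $P$.

**Corollary 11.15 - Proof**
• In $\mathbb{R}^d$ we can compute a 2-spanner for $P$ of size $O(n)$ in $O(n \log n)$ time.
• Let $G$ be this spanner.
• Apply Lemma 11.13 on $G$.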
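• For the resulting HST metric $\mathbf{d}_H$ and any $p, q \in P$: $\|p - q\| \le \mathbf{d}_G(p, q) \le \mathbf{d}_H(p, q) \le (n - 1)\,\mathbf{d}_G(p, q) \le 2(n - 1)\,\|p - q\|$, where the first and last inequalities hold because $G$ is a 2-spanner.

**Fast & Dirty HST in High Dimensions**
• The above construction of an HST has exponential dependency on the dimension.
• Next we show how to get an approximate HST of low quality, but in time polynomial in the dimension.

**Lemma 11.16**
Let $P$ be a set of $n$ points in $[1/2, 1)^d$, let $\mathbf{b}$ be picked uniformly at random from $[0, 1/2]^d$, and let $\tau$ be the shifted compressed quadtree of $P$ having $\mathbf{b} + [0, 1]^d$ as the root cell. For any $t > 0$, with probability at least $1 - n^{-t}$, for all $i \in \{1, \ldots, d\}$ and all pairs of points $p, q \in P$ it holds that
$$\mathrm{lvl}(p_i, q_i) \le \lg\|p - q\| + 1 + \lg d + (t + 2)\lg n. \qquad (11.2)$$
• This implies that $\tau$ is a $2d^{3/2} n^{t+2}$-approximate HST for $P$.

**Lemma 11.16 - Proof**
• Consider $p, q \in P$ and a coordinate $i$, and let $c = 1 + \lg d + (t + 2)\lg n$.
• By Corollary 11.8, and since $|p_i - q_i| \le \|p - q\|$, we have $\Pr[\mathrm{lvl}(p_i, q_i) > \lg\|p - q\| + c] \le 2^{1 - c} = \frac{1}{d\,n^{t+2}}$.
• There are $d$ coordinates and at most $n^2$ possible pairs.
• By the union bound, Eq. 11.2 fails with probability at most $d n^2 \cdot \frac{1}{d\,n^{t+2}} = n^{-t}$.

**Lemma 11.16 - Proof (2)**
• With probability at least $1 - n^{-t}$, the level of $\mathrm{lca}(p, q)$ in $\tau$, which is $\max_i \mathrm{lvl}(p_i, q_i)$, is at most $\lg\|p - q\| + c$.
• The diameter of a cell at level $\ell$ is at most $\sqrt{d}\,2^{\ell}$, so $\|p - q\| \le \mathbf{d}_\tau(p, q) \le \sqrt{d}\,2^{c}\,\|p - q\| = 2d^{3/2} n^{t+2}\,\|p - q\|$.
• Hence $\tau$ is an HST that $2d^{3/2} n^{t+2}$-approximates the distances of $P$.

**Claim 11.17**
• Verifying quickly that $\tau$ is an acceptable HST, as far as the quality of the approximation goes, is quite challenging in general.
• Claim 11.17: We can efficiently check that Eq. 11.2 holds for a quadtree computed by the algorithm of Lemma 11.16.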
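To make the level computation and the check concrete, here is a minimal Python sketch under stated assumptions. It is a naive brute-force test over all pairs and coordinates (about $dn^2$ bit-index computations), not the faster procedure that Claim 11.17 refers to. All function names are hypothetical. It assumes the setting of Lemma 11.16: $P \subseteq [1/2, 1)^d$, $\mathbf{b}$ uniform in $[0, 1/2]^d$, and root cell $\mathbf{b} + [0,1]^d$. It computes the lca level of two numbers as $1 - \mathrm{bit}_\Delta(x - b, y - b)$ and takes the level of $\mathrm{lca}(p, q)$ to be the maximum over coordinates. As the form of Eq. 11.2, it assumes $\mathrm{lvl}(p_i, q_i) \le \lg\|p - q\| + 1 + \lg d + (t+2)\lg n$, which is the constant a union bound over $d$ coordinates and $n^2$ pairs yields from Corollary 11.8. Other constants would be checked the same way.

```python
import math
import random


def lca_level_1d(x: float, y: float, b: float) -> float:
    """Level of lca(x, y) in the 1D quadtree whose root cell is [b, b + 1].

    Assumes x and y lie in [b, b + 1). A cell at level l has length 2**l, so
    the root is level 0 and the answer is 1 - bit_index(x - b, y - b).
    """
    u, v = x - b, y - b  # positions relative to the root cell
    if u == v:
        return -math.inf  # equal numbers are never separated
    k = 1
    # bit index: the first k at which floor(u * 2^k) != floor(v * 2^k), i.e.
    # the first binary digit after the point where u and v differ.
    # math.ldexp(u, k) computes u * 2**k exactly.
    while math.floor(math.ldexp(u, k)) == math.floor(math.ldexp(v, k)):
        k += 1
    return 1 - k


def lca_level(p, q, b) -> float:
    """Level of lca(p, q) in the shifted quadtree: the max over coordinates."""
    return max(lca_level_1d(pi, qi, bi) for pi, qi, bi in zip(p, q, b))


def satisfies_eq_11_2(points, b, t: float) -> bool:
    """Brute-force check, over all pairs and coordinates, of the assumed form
    lvl(p_i, q_i) <= lg||p - q|| + 1 + lg d + (t + 2) lg n."""
    n, d = len(points), len(points[0])
    c = 1 + math.log2(d) + (t + 2) * math.log2(n)
    for a in range(n):
        for z in range(a + 1, n):
            p, q = points[a], points[z]
            dist = math.dist(p, q)
            if dist == 0:
                continue  # duplicate points: skip, lg 0 is undefined
            bound = math.log2(dist) + c
            if any(lca_level_1d(pi, qi, bi) > bound
                   for pi, qi, bi in zip(p, q, b)):
                return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, t = 40, 3, 3
    # Points in [1/2, 1)^d and a shift b drawn uniformly from [0, 1/2]^d.
    points = [[0.5 + 0.5 * rng.random() for _ in range(d)] for _ in range(n)]
    b = [0.5 * rng.random() for _ in range(d)]
    print(satisfies_eq_11_2(points, b, t))
    # HST distance of one pair: the diameter sqrt(d) * 2**level of the lca cell.
    p, q = points[0], points[1]
    print(math.sqrt(d) * 2 ** lca_level(p, q, b), math.dist(p, q))
```

Running the script prints whether the check passed. For one pair, it then prints the HST distance (the diameter of the lca cell) next to the Euclidean distance. The sketch uses floats for brevity. `math.ldexp` scales by a power of two exactly, but the subtraction $x - b$ can round, so an exact version would need rational arithmetic.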