Random Partition via Shifting
Seminar on Geometric Approximation Algorithms
Speaker – Dina Bilchinsky, TAU
Outline of the Lecture
• Definitions.
• Applications:
  • Covering by Disks.
  • Shifting Quadtrees.
• Hierarchical Representation of a Point Set:
  • Low Quality Approximation by HST.
  • Fast & Dirty HST in High Dimensions.
  • Low Quality ANN Search.
Outline of the Lecture
• In this lecture we investigate a simple technique for partitioning a geometric domain.
• This idea can be extended to shifting a multi-resolution grid over $\mathbb{R}^d$.
• This yields some simple algorithms for clustering and nearest neighbor search.
Shifted Partition of the Real Line
• $\Delta > 0$ – a real number.
• $b$ – a number chosen uniformly at random from $[0, \Delta]$.
• This induces a natural partition of the real line into intervals, by the function $h_{b,\Delta}(x) = \lfloor (x - b)/\Delta \rfloor$.
• Each interval has size $\Delta$.
• The origin is shifted to the right by an amount $b$.
Shifted Partition of the Real Line
• $h_{b,\Delta}$ and $h_{b+\Delta,\Delta}$ induce the same partition, but they are not the same function.
• Hence $b$ can be picked uniformly from any interval whose length is a multiple of $\Delta$.
• Specifically from
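Lemma 11.2
For any $x, y \in \mathbb{R}$: $\Pr[h_{b,\Delta}(x) \ne h_{b,\Delta}(y)] = \min(|x - y|/\Delta,\ 1)$.
Proof: Assume $x < y$ and define $r = y - x$. If $r \ge \Delta$, the two points always fall in different intervals. Otherwise, they share an interval exactly when the offset of $x$ inside its interval lies in $[0, \Delta - r)$. This offset is uniform on $[0, \Delta)$, so the points share an interval with probability $(\Delta - r)/\Delta$ and are separated with probability $r/\Delta$.
[Figure: the interval $[0, \Delta]$ with the point $\Delta - r$ marked.]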
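The statement of Lemma 11.2 can be checked empirically. The following is a minimal sketch, assuming the partition function $h_{b,\Delta}(x) = \lfloor (x-b)/\Delta \rfloor$; the function names `h` and `separation_rate`, the trial count and the seed are illustrative choices, not part of the lecture.

```python
import math
import random


def h(x, b, delta):
    """Index of the interval of the shifted partition that contains x."""
    return math.floor((x - b) / delta)


def separation_rate(x, y, delta, trials=100_000, seed=0):
    """Estimate Pr[h(x) != h(y)] over a uniform random shift b in [0, delta]."""
    rng = random.Random(seed)
    separated = 0
    for _ in range(trials):
        b = rng.uniform(0.0, delta)
        if h(x, b, delta) != h(y, b, delta):
            separated += 1
    return separated / trials


# The lemma predicts |x - y| / delta = 0.25 for this pair.
print(separation_rate(0.3, 0.55, delta=1.0))
```

The estimate should land near 0.25 for this pair, and it is exactly 1 whenever the two points are more than $\Delta$ apart.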
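Shifted Partition of Space
• $p = (p_1, \dots, p_d)$ – a point in $\mathbb{R}^d$.
• $b = (b_1, \dots, b_d)$, chosen uniformly at random from the hypercube $[0, \Delta]^d$.
• $G^d(b, \Delta)$ – the grid with origin $b$ and side length $\Delta$.
• For a point $p$, the ID of the grid cell containing it is $\mathrm{id}(p) = h_{b,\Delta}(p) = (h_{b_1,\Delta}(p_1), \dots, h_{b_d,\Delta}(p_d))$.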
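Lemma 11.3
• $G^d(b, \Delta)$ – a randomly shifted grid.
• $B$ – a ball of radius $r$, or an axis-parallel hypercube with side length $2r$.
• The probability that $B$ is not contained in a single cell of $G^d(b, \Delta)$ is at most $\min(2dr/\Delta,\ 1)$.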
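Lemma 11.3 - Proof
• If $2r > \Delta$, $B$ cannot fit in a single cell and the probability is 1.
• If $2r \le \Delta$:
  • Project $B$ onto the $i$-th coordinate. It becomes an interval $B_i$ of length $2r$.
  • $G^d(b, \Delta)$ becomes the one-dimensional shifted grid $G^1(b_i, \Delta)$.
  • $B$ is contained in a single cell of $G^d(b, \Delta)$ iff $B_i$ is contained in a single cell of $G^1(b_i, \Delta)$ for every $i$.
  • By Lemma 11.2 each coordinate fails with probability $2r/\Delta$, so by the union bound the total failure probability is at most $2dr/\Delta$.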
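A small simulation can illustrate the bound of Lemma 11.3 for the hypercube case. This is a sketch under the assumption that a cube with side $2r$ lies in one cell exactly when its lowest and highest corners receive the same cell ID; `cell_id` and `cube_split_rate` are hypothetical helper names.

```python
import math
import random


def cell_id(p, b, delta):
    """ID of the cell of the grid with origin b and side delta containing p."""
    return tuple(math.floor((pi - bi) / delta) for pi, bi in zip(p, b))


def cube_split_rate(center, r, delta, trials=100_000, seed=1):
    """Estimate the probability that an axis-parallel cube of side 2r
    is not contained in a single cell of a randomly shifted grid."""
    rng = random.Random(seed)
    d = len(center)
    low = [c - r for c in center]   # lowest corner of the cube
    high = [c + r for c in center]  # highest corner of the cube
    split = 0
    for _ in range(trials):
        b = [rng.uniform(0.0, delta) for _ in range(d)]
        if cell_id(low, b, delta) != cell_id(high, b, delta):
            split += 1
    return split / trials


d, r, delta = 2, 1.0, 12.0
rate = cube_split_rate((5.0, 7.0), r, delta)
print(rate, "<=", min(2 * d * r / delta, 1.0))
```

For these parameters the exact value is $1 - (1 - 2r/\Delta)^d = 1 - (5/6)^2 \approx 0.306$, which sits below the union bound $2dr/\Delta = 1/3$.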
Applications: Covering by Disks
• Given a set $P$ of $n$ points in the plane, we would like to cover them by a minimal number of unit disks.
• Canonical Disks:
  • Type A – the boundary circle contains two points of $P$.
  • Type B – the top point of the circle is a point of $P$.
• Any set of points can be covered using canonical disks only.
Number of Canonical Disks
• Every pair of input points determines two possible disks.
• If a pair of input points is at distance larger than 2, then the canonical disk they define is invalid.
• Therefore, there are $O(n^2)$ such canonical disks.
• We assume the cover uses only such disks.
Disk Cover
• Disk Cover Verification
  • Given $k$ disks, we can verify that they cover $P$ in $O(nk)$ time.
  • For every point $p \in P$, check whether it is contained in one of the disks.
• Lemma 11.4
  • Given a set $P$ of $n$ points in the plane, we can compute in $O(k\,n^{2k+1})$ time a cover of $P$ by at most $k$ unit disks, if such a cover exists.
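The verification step is a double loop over points and disks. A minimal sketch, assuming disks are given by their centers and share one radius (the name `covers` and the tuple representation are illustrative):

```python
def covers(points, centers, radius=1.0):
    """Return True if every point lies in at least one disk.

    points  -- iterable of (x, y) pairs
    centers -- iterable of (x, y) disk centers, all disks have the given radius
    Runs in O(n * k) time for n points and k disks.
    """
    centers = list(centers)
    r2 = radius * radius
    return all(
        any((px - cx) ** 2 + (py - cy) ** 2 <= r2 for cx, cy in centers)
        for px, py in points
    )


print(covers([(0.0, 0.0), (1.5, 0.0)], [(0.75, 0.0)]))  # both points within distance 0.75
print(covers([(0.0, 0.0), (3.0, 0.0)], [(0.0, 0.0)]))   # second point is too far away
```

Comparing squared distances avoids the square root and keeps the check exact for points on the boundary circle.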
Lemma 11.4 - Proof
• Use the verification algorithm, trying all covers of size $i = 1, \dots, k$.
• The algorithm returns the first cover found.
• Running time – dominated by the last iteration.
• There are $O((n^2)^k) = O(n^{2k})$ different covers to consider.
• Each verification takes $O(nk)$ time.
• Thus the total running time is $O(k\,n^{2k+1})$.
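Disk Cover
• The problem with this algorithm is that $k$ might be quite large, say $n/4$.
• Fortunately, the shifting grid saves the day.
Theorem 11.5
• $P$ – a set of $n$ points in the plane.
• $\varepsilon > 0$ is a parameter.
• Using a randomized algorithm, we can compute in $n^{O(1/\varepsilon^2)}$ time a cover of $P$ by $X$ unit disks, where $\mathbb{E}[X] \le (1 + \varepsilon)\,\mathrm{opt}$ and $\mathrm{opt}$ is the minimum number of unit disks needed to cover $P$.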
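Theorem 11.5 - Proof
• Choose $\Delta = 12/\varepsilon$ and consider a randomly shifted grid $G^2(b, \Delta)$.
• Compute the used cells, i.e. the grid cells that contain points of $P$, by computing for each point $p$ its $\mathrm{id}(p)$ and storing it in a hash table.
• $P_\square$ – the points of $P$ falling into grid cell $\square$.
• Each $P_\square$ can be covered by $M = O(\Delta^2)$ unit disks.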
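The bucketing step of the proof can be sketched with a dictionary playing the role of the hash table. This is an illustrative sketch only: `bucket_by_cell` is a hypothetical helper, and the per-cell cover computation of Lemma 11.4 is not included.

```python
import math
import random
from collections import defaultdict


def bucket_by_cell(points, delta, rng):
    """Group planar points by the cell of a randomly shifted grid of side delta.

    Returns the shift b and a dict mapping cell IDs to the points in that cell.
    Only used (non-empty) cells appear as keys.
    """
    b = (rng.uniform(0.0, delta), rng.uniform(0.0, delta))
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor((x - b[0]) / delta), math.floor((y - b[1]) / delta))
        cells[key].append((x, y))
    return b, dict(cells)


eps = 0.5
delta = 12 / eps  # the grid side length used in the proof
pts = [(0.0, 0.0), (1.0, 2.0), (100.0, 3.0), (101.0, 3.5)]
shift, used = bucket_by_cell(pts, delta, random.Random(0))
print(len(used), "used cells")
```

Each bucket can then be handed to an exact solver independently, since at most $n$ cells are non-empty.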
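Theorem 11.5 - Proof
• For each used cell $\square$, compute the minimum number of unit disks required to cover $P_\square$ (bounded by $M = O(\Delta^2)$, the number of unit disks that suffice to cover a whole cell).
• By Lemma 11.4 we can compute it in $O(M\,n^{2M+1})$ time.
• There are at most $n$ used cells.
• Thus the total running time is $O(M\,n^{2M+2}) = n^{O(\Delta^2)} = n^{O(1/\varepsilon^2)}$, since $\Delta = O(1/\varepsilon)$.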
Theorem 11.5 - Proof
• Overall cover: merge together the covers of all the grid cells.
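Proof – Bounding Expectation
• $\mathrm{Opt}$ – an optimal solution.
• We will generate a feasible solution from $\mathrm{Opt}$.
• $D_\square$ – the set of disks of the optimal solution that intersect the grid cell $\square$.
• $D_\square$ covers the points of $P$ in $\square$, so it is one of the possible solutions considered by the algorithm for that cell.
• Consider the multi-set $\mathcal{D} = \bigcup_\square D_\square$.
• The algorithm returns for each grid cell $\square$ a minimal cover (the smallest possible one), which is therefore of size at most $|D_\square|$.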
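Proof – Bounding Expectation
• The cover returned by the algorithm is of size at most $|\mathcal{D}|$, where $\mathcal{D}$ is the multi-set that collects, over all grid cells, the optimal disks intersecting the cell.
• A disk of the optimal solution can appear in $\mathcal{D}$ at most 4 times (it can intersect at most 4 cells of the grid).
• A disk appears in $\mathcal{D}$ more than once only if it is not fully contained in a grid cell of $G^2(b, \Delta)$.
• By Lemma 11.3 (with $d = 2$ and $r = 1$), the probability of that is bounded by $4/\Delta$.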
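Proof – Bounding Expectation
• Let $X_D$ be the number of times a disk $D$ of the optimal solution is counted across the grid cells. Then $X_D \le 4$ and $\Pr[X_D > 1] \le 4/\Delta$, so $\mathbb{E}[X_D] \le 1 + 3 \cdot 4/\Delta = 1 + 12/\Delta$.
• With $\Delta = 12/\varepsilon$, the expected size of the returned cover is at most $\sum_{D \in \mathrm{Opt}} \mathbb{E}[X_D] \le (1 + 12/\Delta)\,\mathrm{opt} = (1 + \varepsilon)\,\mathrm{opt}$.
• The running time can be improved to $n^{O(1/\varepsilon)}$.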
Shifting One Dimensional Quadtree
• $P$ – a set of $n$ points contained in the interval $[1/2, 1]$.
• Choose a number $b \in [0, 1/2]$ uniformly at random.
• $T_b$ – the one-dimensional quadtree of $P$, using the interval $b + [0, 1]$ for the root cell.
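Bit Index
• Let $\alpha, \beta \in [0, 1)$ be two real numbers, written in base 2 as $\alpha = 0.\alpha_1\alpha_2\ldots$ and $\beta = 0.\beta_1\beta_2\ldots$.
• $\mathrm{bit}_\Delta(\alpha, \beta)$ is the index of the first bit after the period in which they differ.
• Reminder: a node in the quadtree that corresponds to an interval of length $2^{-i}$ has level $-i$.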
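Shifting One Dimensional Quadtree
• For $\alpha, \beta \in P$ let $L_b(\alpha, \beta) = 1 - \mathrm{bit}_\Delta(\alpha - b, \beta - b)$.
• This is the last level of the shifted grid that contains $\alpha$ and $\beta$ in the same interval.
• This is the level of the node of $T_b$ that is the least common ancestor containing both numbers.
• That is, the level of $\mathrm{lca}(\alpha, \beta)$ is $L_b(\alpha, \beta)$.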
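The bit index and the resulting level can be computed by repeated doubling. The following is a sketch under the assumption that the level is $1 - \mathrm{bit}_\Delta(\alpha - b, \beta - b)$ and that both shifted values lie in $[0, 1)$; the names `bit_delta` and `lca_level` and the `max_bits` cutoff are illustrative.

```python
def bit_delta(alpha, beta, max_bits=60):
    """Index of the first bit after the binary point where alpha and beta differ.

    Both inputs are assumed to lie in [0, 1). Returns None if no difference
    shows up within max_bits bits (for example when alpha == beta).
    """
    for i in range(1, max_bits + 1):
        alpha *= 2
        beta *= 2
        a_bit, b_bit = int(alpha), int(beta)  # the i-th bits of the two numbers
        if a_bit != b_bit:
            return i
        alpha -= a_bit
        beta -= b_bit
    return None


def lca_level(alpha, beta, b):
    """Level of the least common ancestor in the quadtree shifted by b."""
    k = bit_delta(alpha - b, beta - b)
    return None if k is None else 1 - k


# 0.75 = 0.110 and 0.625 = 0.101 in base 2: they first differ at bit 2.
print(bit_delta(0.75, 0.625), lca_level(0.75, 0.625, 0.0))
# Shifting by 0.25 separates the same pair already at the root cell.
print(lca_level(0.75, 0.625, 0.25))
```

The second call shows how the shift alone changes the level: with $b = 0.25$ the values become $0.5$ and $0.375$, which differ in their first bit, so the least common ancestor is the root at level 0.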
Example – Without Shifting
• We assume that $L_b(\alpha, \beta)$ can be computed in constant time.
• The value of $L_b(\alpha, \beta)$ depends only on $\alpha$, $\beta$ and $b$, and is independent of the other points of $P$.
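Lemma 11.7
• Let $\alpha, \beta \in [1/2, 1]$ be two numbers, and consider a random number $b \in [0, 1/2]$. For any positive integer $t$ we have $\Pr[L_b(\alpha, \beta) > \lg|\alpha - \beta| + t] \le 4/2^t$.
• This lemma bounds the probability that the least common ancestor of the two numbers is in charge of an interval that is considerably longer than the difference between them.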
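Lemma 11.7 - Proof
• Let $M = \lfloor \lg|\alpha - \beta| \rfloor$.
• Consider the shifted partition of the real line by intervals of length $\Delta_{M+i} = 2^{M+i}$ and a shift $b$.
• Assume $L_b(\alpha, \beta) = M + i$; then the last level at which both $\alpha$ and $\beta$ lie in the same shifted interval is $M + i$.
• As such, going one level down, to intervals of length $\Delta_{M+i-1}$, these numbers are in different intervals.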
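Lemma 11.7 - Proof
• By Lemma 11.2, with $M = \lfloor \lg|\alpha - \beta| \rfloor$: $\Pr[L_b(\alpha, \beta) = M + i] \le |\alpha - \beta| / 2^{M+i-1} \le 2^{M+1}/2^{M+i-1} = 2^{2-i}$.
• Since $M \le \lg|\alpha - \beta|$: $\Pr[L_b(\alpha, \beta) > \lg|\alpha - \beta| + t] \le \Pr[L_b(\alpha, \beta) > M + t] \le \sum_{i > t} 2^{2-i} = 4/2^t$.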
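Corollary 11.8
• Let $\alpha, \beta \in [1/2, 1]$ be two numbers, and consider a random $b \in [0, 1/2]$. For any parameter $c > 1$, we have $\Pr[L_b(\alpha, \beta) > \lg|\alpha - \beta| + c \lg n] \le 4/n^c$.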
Higher Dimensions Quadtrees
• $P$ – a set of $n$ points in $[1/2, 1]^d$.
• $b \in [0, 1/2]^d$ – a point picked uniformly at random.
• $T$ – the shifted and compressed quadtree of $P$ with $b + [0, 1]^d$ as the root cell.
• $p, q \in P$ – two points.
• $T_1, \dots, T_d$ – the one-dimensional quadtrees built on each of the coordinates of the point set.
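Higher Dimensions Quadtrees
• $T$ is a combination of these one-dimensional quadtrees.
• The level where $p$ and $q$ get separated in $T$ is the first level at which they get separated in any of the one-dimensional quadtrees.
• The level of $\mathrm{lca}(p, q)$ is $L(p, q) = \max_{i = 1, \dots, d} L_{b_i}(p_i, q_i)$.
• The value of $L(p, q)$ is independent of the other points of $P$.
• $L(p, q)$ is a well-behaved random variable.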
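The combination rule can be sketched as a maximum over per-coordinate levels. This assumes the one-dimensional level is one minus the index of the first differing bit of the shifted coordinates, each lying in $[0, 1)$; `first_diff_bit`, `level_1d` and `level` are hypothetical names, and coordinates that coincide are skipped because they never separate the two points.

```python
def first_diff_bit(x, y, max_bits=60):
    """Index of the first binary digit after the point where x and y differ."""
    for i in range(1, max_bits + 1):
        x *= 2
        y *= 2
        xb, yb = int(x), int(y)
        if xb != yb:
            return i
        x -= xb
        y -= yb
    return None  # no difference found within max_bits bits


def level_1d(x, y, shift):
    """Level of the lca of x and y in the one-dimensional shifted quadtree."""
    k = first_diff_bit(x - shift, y - shift)
    return None if k is None else 1 - k


def level(p, q, b):
    """Level of the lca of p and q: the maximum over the coordinates."""
    per_coord = [level_1d(pi, qi, bi) for pi, qi, bi in zip(p, q, b)]
    per_coord = [lv for lv in per_coord if lv is not None]
    return max(per_coord) if per_coord else None


p, q = (0.75, 0.625), (0.625, 0.5625)
print(level(p, q, (0.0, 0.0)))
```

Here the first coordinate separates at level $-1$ and the second at level $-2$, so the two points leave their common cell at level $-1$.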
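Lemma 11.9
• For any two fixed points $p, q \in P$: $\mathbb{E}[L(p, q)] \le \lg\|p - q\| + \lg d + 6$.
• For any integer $t > 3$: $\Pr[L(p, q) > \lg\|p - q\| + t] \le 4d/2^t$.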
Hierarchical Representation of a Point Set
Metric Space
• We will carry out the discussion in a more general setting than low-dimensional Euclidean space, using the notion of a metric space.
• Definition: A metric space is a pair $(X, d)$ where $X$ is a set and $d : X \times X \to [0, \infty)$ is a metric, satisfying the following axioms:
  • $d(x, y) = 0$ iff $x = y$.
  • $d(x, y) = d(y, x)$.
  • $d(x, y) + d(y, z) \ge d(x, z)$.
Hierarchically Separated Tree
• Definition: $P$ – a set of elements. $H$ – a tree having the elements of $P$ as leaves. The tree $H$ defines a Hierarchically Separated Tree (HST) over the points of $P$ if each vertex $v \in H$ has an associated label $\Delta_v \ge 0$, such that:
  • $\Delta_v = 0$ iff $v$ is a leaf of $H$.
  • $\Delta_u \le \Delta_v$ if $u$ is a child of $v$.
HST
• The distance between two leaves $x, y$ is defined as $d_H(x, y) = \Delta_{\mathrm{lca}(x, y)}$.
• BHST – an HST in which every internal node has exactly two children.
• It is easy to verify that the distances defined by an HST form a metric.
• This metric has a very simple structure and can be easily manipulated algorithmically.
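The HST distance is the label of the least common ancestor of the two leaves. A minimal sketch with parent pointers, where the `Node` class and `hst_distance` are illustrative names and no attempt is made at fast lca queries:

```python
class Node:
    """A vertex of an HST: leaves carry label 0, internal nodes a positive label."""

    def __init__(self, label=0.0, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def hst_distance(x, y):
    """Label of the least common ancestor of the leaves x and y."""
    ancestors = set()
    v = x
    while v is not None:      # collect x and all of its ancestors
        ancestors.add(id(v))
        v = v.parent
    v = y
    while id(v) not in ancestors:  # climb from y until the paths meet
        v = v.parent
    return v.label


a, b, c = Node(), Node(), Node()
u = Node(1.0, [a, b])
root = Node(5.0, [u, c])
print(hst_distance(a, b), hst_distance(a, c), hst_distance(a, a))
```

Because labels never decrease towards the root, the resulting distances satisfy $d_H(x, z) \le \max(d_H(x, y), d_H(y, z))$, which is stronger than the triangle inequality.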
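Example - HST
• $P$ – a point set in $\mathbb{R}^d$.
• $T$ – a compressed quadtree storing $P$.
• For each node $v \in T$: the label $\Delta_v$ is the diameter of the cell of $v$ (and $\Delta_v = 0$ for a leaf).
• We will work with BHSTs, since any HST can be converted into a binary HST in linear time, retaining the underlying distances.
• With every vertex $u \in H$ we associate an arbitrary representative point $\mathrm{rep}_u \in P_u$, i.e. a point stored in the subtree rooted at $u$.
• We require $\mathrm{rep}_u \in \{\mathrm{rep}_v \mid v \text{ is a child of } u\}$.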
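Metric Space t-approximate
• Definition: A metric space $N$ is said to $t$-approximate the metric $M$ if they are defined over the same set of points $P$ and for any $u, v \in P$: $d_M(u, v) \le d_N(u, v) \le t \cdot d_M(u, v)$.
• Any $n$-point metric is $(n-1)$-approximated by some HST.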
Lemma 11.13
• Given a weighted connected graph $G$ on $n$ vertices and $m$ edges, it is possible to construct in $O(n \log n + m)$ time a BHST $H$ that $(n-1)$-approximates the shortest path metric of $G$.
• $d_G(x, y)$ – the length of the shortest path between the vertices $x$ and $y$ in the weighted graph $G$.
Lemma 11.13 - Proof
• Compute the MST $T$ of $G$ in $O(n \log n + m)$ time.
• The HST is built bottom up:
  • Sort the edges of $T$ in non-decreasing order, and add them to the graph one by one, starting with an empty graph on $V$.
  • At each stage, we have a collection of HSTs, each corresponding to a connected component of the current graph.
  • Each added edge $e$ merges two connected components.
  • We merge the two corresponding HSTs into a single HST by adding a new common root $v$ and labeling it with $\Delta_v = (|P_v| - 1)\,w(e)$, where $P_v$ is the set of points stored in the subtree with $v$ as its root.
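Proof - Approx. factor
• $x, y$ – two vertices of $G$.
• $e$ is the first edge added such that $x$ and $y$ are in the same component $C$, created by merging two connected components $C_x$ and $C_y$.
• $e$ is the lightest edge in the cut between $C_x$ and $V \setminus C_x$. As such, any path between $x$ and $y$ in $G$ must contain an edge of weight at least $w(e)$.
• $e$ is the heaviest edge in $C$ (it was added last), and the tree path between $x$ and $y$ inside $C$ has at most $|C| - 1$ edges.
• $w(e) \le d_G(x, y) \le (|C| - 1)\,w(e) = d_H(x, y) \le (n - 1)\,w(e) \le (n - 1)\,d_G(x, y)$.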
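The bottom-up merging can be sketched with a union-find structure. This is an illustration under two assumptions that are not taken verbatim from the slides: the new root is labeled $(|P_v| - 1)\,w(e)$, and the MST is found Kruskal-style by sorting all edges, which costs $O(m \log m)$ instead of the $O(n \log n + m)$ stated in the lemma. The names `build_bhst` and `hst_dist` are hypothetical.

```python
def build_bhst(n, edges):
    """Build a binary HST over vertices 0..n-1 of a connected weighted graph.

    edges -- list of (u, v, weight) triples
    Returns (label, parent): nodes 0..n-1 are the leaves, later nodes are
    internal; parent[v] is None for the root.
    """
    comp = list(range(n))      # union-find parent pointers
    size = [1] * n             # number of points in each component
    node_of = list(range(n))   # HST root node of each component
    label = [0] * n            # leaves carry label 0
    parent = [None] * n

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]  # path halving
            x = comp[x]
        return x

    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle, so it is not an MST edge
        new = len(label)
        total = size[ru] + size[rv]
        label.append((total - 1) * w)  # assumed label: (|P_v| - 1) * w(e)
        parent.append(None)
        parent[node_of[ru]] = new
        parent[node_of[rv]] = new
        comp[ru] = rv
        size[rv] = total
        node_of[rv] = new
    return label, parent


def hst_dist(label, parent, x, y):
    """Label of the least common ancestor of leaves x and y."""
    seen = set()
    v = x
    while v is not None:
        seen.add(v)
        v = parent[v]
    v = y
    while v not in seen:
        v = parent[v]
    return label[v]


label, parent = build_bhst(3, [(0, 1, 1), (1, 2, 2), (0, 2, 10)])
print([hst_dist(label, parent, x, y) for x, y in [(0, 1), (0, 2), (1, 2)]])
```

In this three-vertex example the shortest path distances are 1, 3 and 2, and the tree distances are 1, 4 and 4, so each pair satisfies $d_G \le d_H \le (n-1)\,d_G$ with $n = 3$.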
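Spanners
• $d_G(p, q)$ – the length of the shortest path between the vertices $p$ and $q$ in the weighted graph $G$.
• A $t$-spanner of a set of points $P$ in $\mathbb{R}^d$ is a weighted graph $G$ whose vertices are the points of $P$ and for any $p, q \in P$: $\|p - q\| \le d_G(p, q) \le t\,\|p - q\|$.
• $d_G$ is a metric.
• We can compute a 2-spanner of $P$ with $O(n)$ edges in $O(n \log n)$ time.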
Corollary 11.14
For a set $P$ of $n$ points in a metric space $M$, we can compute in $O(n^2)$ time an HST $H$ that $(n-1)$-approximates the metric $d_M$.
Corollary 11.15
$P$ – a set of $n$ points in $\mathbb{R}^d$. We can construct in $O(n \log n)$ time (the constant in the $O$ depends exponentially on the dimension) a BHST $H$ that $(2n-2)$-approximates the distances of points in $P$.
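Corollary 11.15 - Proof
• In $\mathbb{R}^d$ we can compute a 2-spanner for $P$ of size $O(n)$ in $O(n \log n)$ time.
• Let $G$ be this spanner.
• Apply Lemma 11.13 to $G$.
• The resulting HST metric $H$ satisfies, for any $p, q \in P$: $d_G(p, q) \le d_H(p, q) \le (n-1)\,d_G(p, q)$.
• $G$ is a 2-spanner, so $\|p - q\| \le d_G(p, q) \le 2\|p - q\|$, and therefore $\|p - q\| \le d_H(p, q) \le 2(n-1)\,\|p - q\|$.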
Fast & Dirty HST in High Dimensions
• The above construction of an HST has exponential dependency on the dimension.
• Next we show how to get an approximate HST of low quality, but in time polynomial in the dimension.
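Lemma 11.16
• $P$ – a set of $n$ points in $[1/2, 1]^d$.
• $b \in [0, 1/2]^d$ is picked uniformly at random.
• $\tau$ – the shifted compressed quadtree of $P$ having $b + [0, 1]^d$ as its root cell.
• For any $c > 1$, with probability $\ge 1 - 4d/n^{c-2}$, we have for all $i = 1, \dots, d$ and for all pairs of points $p, q$ of $P$ that $L_{b_i}(p_i, q_i) \le \lg|p_i - q_i| + c \lg n$ (Eq. 11.2).
• This implies that $\tau$ is a $\sqrt{d}\,n^c$-approximate HST for $P$.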
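Lemma 11.16 - Proof
• Consider a pair $p, q \in P$ and a coordinate $i$.
• By Corollary 11.8, $\Pr[L_{b_i}(p_i, q_i) > \lg|p_i - q_i| + c \lg n] \le 4/n^c$.
• There are $d$ coordinates and $\binom{n}{2}$ possible pairs.
• By the union bound, $\Pr[\text{Eq. 11.2 fails}] \le d \binom{n}{2} \cdot 4/n^c \le 4d/n^{c-2}$.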
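Lemma 11.16 – Proof (2)
• With probability $\ge 1 - 4d/n^{c-2}$, the level of $\mathrm{lca}(p, q)$ is at most $U = \lg\|p - q\| + c \lg n$.
• The diameter of a cell at level $U$ is at most $\sqrt{d}\,2^U = \sqrt{d}\,n^c\,\|p - q\|$.
• Hence $\tau$ is a $\sqrt{d}\,n^c$-approximate HST.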
Claim 11.17
• Verifying quickly that $\tau$ is an acceptable HST, as far as the quality of the approximation goes, is quite challenging in general.
• Claim 11.17: We can check that Eq. 11.2 holds for a quadtree computed by the algorithm of Lemma 11.16 in $O(dn \log n)$ time.