
Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications

Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications. Written by: Amihood Amir, Oren Kapah and Ely Porat. Motivation – Point Set Matching. Integer 1-D Point Set Matching: T: (t1,t2,…,tn), P: (p1,p2,…,pm), where ti and pi are integers.


Presentation Transcript


  1. Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications. Written by: Amihood Amir, Oren Kapah and Ely Porat

  2. Motivation – Point Set Matching • Integer 1-D Point Set Matching: • T: (t1,t2,…,tn) • P: (p1,p2,…,pm) • Where ti and pi are integers. • Let N=tn, M=pm. (the maximal index) • Time: O(nm), O(N·log(M))
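A minimal sketch of the O(nm) baseline mentioned on this slide, assuming that a match means a shift s for which every pattern point plus s is a text point. The function name `point_set_matches` is illustrative and the pattern is assumed to be non-empty.

```python
def point_set_matches(text, pattern):
    """Return every shift s such that {p + s for p in pattern} is a subset of text."""
    text_set = set(text)
    first = pattern[0]
    shifts = []
    for t in text:                      # n candidate shifts: align pattern[0] with t
        s = t - first
        if all(p + s in text_set for p in pattern):   # m membership tests
            shifts.append(s)
    return shifts


print(point_set_matches([1, 4, 6, 9, 11], [0, 5]))  # [1, 4, 6]
```

Each of the n candidate shifts costs m set lookups, which gives the O(nm) bound.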

  3. Motivation – Point Set Matching • 2-D Point Set Matching – Searching in Music: • T: (i1,j1),(i2,j2),…,(in,jn) • P: (i1,j1),(i2,j2),…,(im,jm) • Dimension Reduction: (i,j) → i·N + j
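The dimension reduction can be sketched as follows. This is only an illustration: the helper `flatten` and the sample points are made up, and N is assumed to be large enough that no shifted column index wraps into the next row.

```python
def flatten(points, N):
    """Map each 2-D point (i, j) to the integer i*N + j."""
    return sorted(i * N + j for i, j in points)


N = 10                                   # assumed row width, larger than any column used
text = [(0, 1), (1, 3), (2, 2)]
pattern = [(0, 0), (1, 2)]

flat_text = flatten(text, N)             # [1, 13, 22]
flat_pattern = flatten(pattern, N)       # [0, 12]

# The 2-D shift (0, 1) becomes the 1-D shift 0*N + 1 = 1.
print(all(p + 1 in flat_text for p in flat_pattern))  # True
```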

  4. Motivation – Generalized Case • The generalized case of these problems is the d-Dimensional sparse wildcard matching problem. • Problem Definition: Given a d-Dimensional text T with zeros and non-zeros, and a d-Dimensional pattern P with wildcards and non-zeros, find all the locations where P matches T. • Applications: d-Dimensional point set matching, searching in music, protein activity research, etc.

  5. Length Reduction • Goal: Given two vectors V1 & V2, obtain two vectors V’1 & V’2 of size O(n1) such that all non-zeros in V1 and in V2 appear as singletons in V’1 and V’2 respectively, while maintaining the distance property. • The Distance Property: If V’2[f(0)] is aligned with V’1[f(i)], then V’2[f(j)] will be aligned with V’1[f(i + j)]. • Using the reduced size vectors, matching can be done in time O(n1 log(n1)) using convolutions.

  6. Example: Length Reduction • The vectors are given as sets of pairs: (index, value). • V1: (0, 5), (6, 2), (13, 3), (19, 1) • V2: (0, 2), (7, 3) • Length Reduction Function: mod(5) • V’1: (0, 5), (1, 2), (3, 3), (4, 1) • V’2: (0, 2), (2, 3)
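The mod(5) reduction in this example can be reproduced with a short sketch. The helper names `reduce_mod` and `is_collision_free` are illustrative and do not come from the slides.

```python
from collections import defaultdict


def reduce_mod(vector, q):
    """Group the (index, value) pairs of a sparse vector by index mod q."""
    buckets = defaultdict(list)
    for index, value in vector:
        buckets[index % q].append((index, value))
    return dict(buckets)


def is_collision_free(vector, q):
    """True when every non-zero lands alone in its bucket (a singleton)."""
    return all(len(bucket) == 1 for bucket in reduce_mod(vector, q).values())


V1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
V2 = [(0, 2), (7, 3)]

print(sorted(reduce_mod(V1, 5)))   # [0, 1, 3, 4]
print(sorted(reduce_mod(V2, 5)))   # [0, 2]
print(is_collision_free(V1, 5))    # True
```

The distance property holds for the mod function because (i mod q + j mod q) mod q = (i + j) mod q. For instance, aligning index 0 of V2 with index 6 of V1 puts index 7 of V2 at (1 + 2) mod 5 = 3, which is where index 13 of V1 was mapped.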

  7. The Randomized Algorithm (Cole & Hariharan – STOC02) • Idea: Find a set of log(n) short vectors in which, with high probability, each non-zero in V appears as a singleton in at least one of the vectors. • Hash functions: (a·x mod q) mod s, where q is a large prime number and s is O(n). • If s is c·n, then the probability of a non-zero appearing as a multiple is constant. • Using log(n) different hash functions will reduce the failure probability exponentially.
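A rough sketch of the singleton bookkeeping behind this idea, not Cole and Hariharan's implementation. The function names, the seed handling and the way the multiplier a is drawn are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_hash(q, s, rng):
    """Return h(x) = (a*x mod q) mod s for a randomly drawn multiplier a."""
    a = rng.randrange(1, q)
    return lambda x: (a * x % q) % s


def singletons(indices, h):
    """Indices whose hash bucket contains no other index."""
    counts = Counter(h(x) for x in indices)
    return {x for x in indices if counts[h(x)] == 1}


def uncovered_after(indices, q, s, rounds, seed=0):
    """Indices that were never a singleton in any of `rounds` random hashes."""
    rng = random.Random(seed)
    remaining = set(indices)
    for _ in range(rounds):
        remaining -= singletons(indices, random_hash(q, s, rng))
    return remaining


indices = [3, 17, 42, 99, 1000, 123456]
# q = 1_000_003 is a prime; s = 2 * n plays the role of c*n on the slide.
print(uncovered_after(indices, 1_000_003, 2 * len(indices), rounds=8))
```

The returned set is expected to shrink quickly as `rounds` grows, but nothing forces it to be empty, which is the first source of error listed on the next slide.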

  8. The Randomized Algorithm: Sources of Errors • Some non-zeros may appear only as multiples in every vector of the set. • The non-zero from the text which was aligned with the non-zero from the pattern may have come from a different index (false matches). • This algorithm was created for matching, but in convolution each non-zero should be calculated only once.

  9. Deterministic Length Reduction • Our Goal: Find a set of log(n) hash functions which ensures that each non-zero appears as a singleton at least once. • Finding the hash functions is done in a preprocessing step based on V1. • The algorithm distinguishes between two cases: • N1 is polynomial in n1. • N1 is exponential in n1.

  10. The Polynomial Case: N < n^c • Let q be a prime number of size O(n), and mod(q) be the suggested hash function. • Let i, j be the indices of two non-zeros, and let dij = |i − j| be the distance between them. • Observation: If i and j are mapped into the same location, it means that q divides dij. • Observation: There are at most c prime numbers of size O(n) which divide dij. • Corollary: A non-zero can appear as a multiple under at most c·n prime numbers.
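The counting argument behind the second observation can be written out as follows, under the assumption (not stated on the slide) that the candidate primes q_1, …, q_k are distinct and each larger than n. Their product divides d_ij, so:

```latex
\[
  n^{k} \;<\; \prod_{l=1}^{k} q_l \;\le\; d_{ij} \;\le\; N \;<\; n^{c}
  \quad\Longrightarrow\quad k < c .
\]
```

A fixed non-zero has at most n − 1 distances to the other non-zeros, and each distance rules out fewer than c primes, which gives the c·n bound in the corollary.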

  11. Choosing Prime Numbers • Test 2c·n prime numbers (of size O(n log n)), and build the following table: • Each column represents a non-zero (n columns). • Each row represents a prime number (2c·n rows). • An entry is 1 if the non-zero appears as a singleton under that prime, and 0 if it appears as a multiple. • Reminder: Each non-zero can appear as a multiple at most c·n times. • Corollary: The table is at least half full with ones.

  12. Choosing Prime Numbers: Cont. • Step 1: Select a prime number which generates a row that is at least half full (for example P2). • Step 2: Delete the row and all the columns in which there was a 1 in the deleted row. • Repeat steps 1 and 2 until the whole table is deleted. • Selected Primes: P2, P4, … • Time: O(n^2)

  13. The Exponential Case: N < 2^n • Idea: Reduce the length of the vector to polynomial and continue with the previous algorithm. • Any distance dij is less than 2^n, so it can be divided by at most n distinct prime numbers. • There are at most n^2 different distances. • Corollary: There are at most n^3 prime numbers which generate multiples.

  14. The Reduction Algorithm • Step 1: Choose a prime number q of size O(n^4). • Step 2: Create the reduced size vector using the mod(q) hash function. • Repeat steps 1 and 2 if a multiple was created. • Duplicate the obtained vector (create a vector of size 2q), to allow further reduction of the vector. • Time: O(n^4)
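A hedged sketch of these steps. Two details are assumptions, since the slide does not spell them out: primes are tried in increasing order from a caller-supplied `start`, and "duplicate" is read as repeating every entry q positions later. The function names are illustrative.

```python
from itertools import count


def is_prime(k):
    """Trial-division primality test, adequate for a small demo."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


def reduce_to_polynomial(vector, start):
    """Find a prime q >= start with no collisions mod q, then duplicate."""
    for q in filter(is_prime, count(start)):
        residues = [index % q for index, _ in vector]
        if len(set(residues)) == len(residues):      # no multiple was created
            break
    reduced = sorted((index % q, value) for index, value in vector)
    # Assumed reading of "duplicate": copy each entry q positions later,
    # so the result describes a vector of length 2q.
    doubled = reduced + [(r + q, value) for r, value in reduced]
    return q, doubled


vector = [(3, 1), (10**12 + 7, 2), (2**40, 5)]
n = len(vector)
q, doubled = reduce_to_polynomial(vector, start=n ** 4)
print(q, len(doubled))
```

The search always terminates for distinct indices, because any prime larger than the largest index produces no collisions.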

  15. The Randomized Algorithm: Sources of Errors • Some non-zeros may appear only as multiples in every vector of the set. • The non-zero from the text which was aligned with the non-zero from the pattern may have come from a different index (false matches). • This algorithm was created for matching, but in convolution each non-zero should be calculated only once.

  16. The Convolution Algorithm • For each prime number Pi: • Create the reduced size vectors V’1,i & V’2,i using the indices of the non-zeros and perform shift matching. • Create the reduced size vectors V’1,i & V’2,i using 1’s instead of the non-zeros and perform convolution. • Create the reduced size vectors V’1,i & V’2,i using the values of the non-zeros and perform convolution. • Zero the values of the non-zeros that appeared as singletons. • For all indices where a shift match was found: • Sum the results of the 1’s convolutions. • If the sum is n2 (the number of non-zeros in V2), then sum the results of the values convolutions and report the result. • Time: O(n log^3 n)

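  17. Example • V1: (0, 5), (5, 2), (13, 3), (20, 1) • V2: (0, 2), (8, 3) • Prime Numbers: 5, 7 • V’1,1 and V’2,1 (reduced vectors not recovered from the slide figure): (5, 1, 9), (13, 1, 6) • V’1,2 and V’2,2 (reduced vectors not recovered from the slide figure): (0, 1, 10), (5, 1, 4)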
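As a cross-check on this example, the sketch below enumerates all pairs directly in O(n1·n2) time. It is a brute-force reference and not the length-reduction algorithm. It assumes the reported quantity at shift s is the sum of V1[s + j]·V2[j], kept only when all non-zeros of V2 are aligned, and the function name is illustrative.

```python
from collections import defaultdict


def sparse_correlation(V1, V2):
    """For each shift s, return (pairs aligned, sum of V1[s + j] * V2[j])."""
    acc = defaultdict(lambda: [0, 0])
    for i, a in V1:
        for j, b in V2:
            acc[i - j][0] += 1          # counterpart of the 1's convolution
            acc[i - j][1] += a * b      # counterpart of the values convolution
    return {s: tuple(v) for s, v in acc.items()}


V1 = [(0, 5), (5, 2), (13, 3), (20, 1)]
V2 = [(0, 2), (8, 3)]

result = sparse_correlation(V1, V2)
full = {s: total for s, (hits, total) in result.items() if hits == len(V2)}
print(full)  # {5: 13}
```

Shift 5 is the only one where both non-zeros of V2 are aligned, and its value 13 equals 9 + 4, the sum of the third components of the two triples with index 5 listed on the slide.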

  18. Conclusions and Open Problems • A deterministic algorithm for length reduction and fast convolution was presented. • Preprocessing time: O(n^2) – Polynomial case, O(n^4) – Exponential case. • Running time: O(n log^2 n) • Open problems: • Can the preprocessing time be reduced? • Can the size of the vectors be reduced? • Can the number of vectors be reduced?

  19. THE END Thank You!

  20. Questions?
