On Designing Fast Nonuniformly Distributed IP Address Lookup Hashing Algorithms • Author: Christopher J. Martinez, Devang K. Pandya, and Wei-Ming Lin • Publisher: IEEE/ACM Transactions on Networking, 2009 • Presenter: Yuen-Shuo Li • Date: 2013/01/09
Outline • Introduction • Proposed Hashing Algorithm • Simulation Results • Implementation
Introduction(1/4) Hashing has been widely used for fast IP address lookup, but the performance of known hashing schemes is far from optimal because actual IP addresses are nonuniformly distributed.
Introduction(2/4) • There exists a set of well-established hash algorithms, such as MD4, MD5, SHA-1, and SHA-2, which are used in cryptography. • These algorithms rely on a series of addition, bit-rotation, and logic operations over many cycles, which makes them too slow for IP address lookup.
Introduction(3/4) • CRC-based hash functions have proven to be an excellent means of hashing, but they have some potential shortcomings. • Compared to a simple XOR folding hash algorithm, which can be implemented in a fast parallel circuit, a CRC-based hash function requires a sequential circuit and a much longer time to determine the hash value; it cannot be implemented in parallel.
Introduction(4/4) • The goal of this paper is to develop a universal hashing methodology applicable to nonuniformly distributed data sets. • The proposed designs allow standard XOR folding hashing to deliver significantly improved performance. • In short: a new hash function that improves XOR folding hashing by producing a more balanced distribution.
Proposed Hashing Algorithm(1/13) The hashing process hashes each n-bit entry into an m-bit hash value. (Figure: n bits → hash → m bits)
Proposed Hashing Algorithm(2/13) Intuitively, using the bits with smaller d values for hashing would lead to a probabilistically better hash distribution. • d: the difference between the number of 0's and 1's in a bit position. • Example (four 3-bit entries): 110, 100, 010, 101 give d = 2, 0, 2 for the three bit positions.
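The d value of each bit position can be computed with a single pass over the entries. The following is a minimal Python sketch, not the authors' code; the function name `d_values` and the convention that position 0 is the leftmost bit are assumptions made for illustration. The sample entries are the four 3-bit values that are consistent with the slide's figure.

```python
def d_values(entries, n):
    """Return d for each of the n bit positions (position 0 = leftmost bit).

    d is the absolute difference between the number of 1's and the
    number of 0's seen in that position across all entries.
    """
    result = []
    for pos in range(n):
        shift = n - 1 - pos
        ones = sum((e >> shift) & 1 for e in entries)
        zeros = len(entries) - ones
        result.append(abs(ones - zeros))
    return result


# Four 3-bit entries matching the slide's example.
sample = [0b110, 0b100, 0b010, 0b101]
print(d_values(sample, 3))
```

A position with d = 0 splits the entries into two equal halves, which is why such bits are the preferred hash inputs.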
Proposed Hashing Algorithm(3/13) Employ a simple preprocessing step that rearranges the n bit vectors (one per bit position) according to their d values, sorted in increasing order.
Proposed Hashing Algorithm(4/13) Bit-extraction hashing simply extracts m bits from the n-bit entry as its hash value. (Figure: EXT extracts m bits from the n-bit entry directly; d-EXT first sorts the bits by d, then extracts m bits.)
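The preprocessing and extraction steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions: d-EXT is taken to extract the m bit positions with the smallest d values, ties are broken by original position, and the names `sorted_positions` and `d_ext_hash` are hypothetical.

```python
def d_values(entries, n):
    """d per bit position (position 0 = leftmost bit)."""
    out = []
    for pos in range(n):
        ones = sum((e >> (n - 1 - pos)) & 1 for e in entries)
        out.append(abs(2 * ones - len(entries)))
    return out


def sorted_positions(entries, n):
    """Bit positions ordered by increasing d (stable for equal d)."""
    d = d_values(entries, n)
    return sorted(range(n), key=lambda pos: d[pos])


def d_ext_hash(entry, n, m, order):
    """Build an m-bit hash from the first m positions in `order`."""
    h = 0
    for pos in order[:m]:
        h = (h << 1) | ((entry >> (n - 1 - pos)) & 1)
    return h


sample = [0b110, 0b100, 0b010, 0b101]
order = sorted_positions(sample, 3)            # middle bit has d = 0
print(order, [d_ext_hash(e, 3, 1, order) for e in sample])
```

Because the ordering is computed once from the data set, the per-lookup cost is only the fixed bit selection.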
Proposed Hashing Algorithm(5/13) Setup: n = 32, m varied. • MSL: the largest number of entries that are mapped into any hash bin. • ASL: the average maximum number of matching steps needed for any given record to match.
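Both metrics can be derived from the bin occupancy counts. The sketch below is a hedged Python illustration: MSL follows the slide's definition directly, while the ASL computation assumes each bin is searched linearly, so the k-th record in a bin costs k matching steps. That reading of ASL is an assumption, and the function name `msl_asl` is hypothetical.

```python
from collections import Counter


def msl_asl(hash_values):
    """Return (MSL, ASL) for a list of hash values, one per stored entry.

    MSL: size of the fullest bin.
    ASL (assumed model): mean number of steps to find a stored record when
    each bin is scanned linearly, i.e. sum of c*(c+1)/2 over bins, divided
    by the number of records.
    """
    counts = Counter(hash_values)
    msl = max(counts.values())
    total_steps = sum(c * (c + 1) // 2 for c in counts.values())
    asl = total_steps / len(hash_values)
    return msl, asl


print(msl_asl([0, 0, 0, 1]))
```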
Proposed Hashing Algorithm(6/13) Group-XOR is a commonly used hashing technique that reduces the n-bit key to an m-bit hash result by XORing every n/m key bits into one final hash bit. (Figure: the n-bit key is split into m-bit groups, which are XORed together into the m-bit result.)
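Group-XOR can be sketched as a fold over m-bit segments. This Python sketch assumes that m divides n and that the groups are contiguous m-bit segments of the key; the function name `group_xor` is hypothetical.

```python
def group_xor(key, n, m):
    """Fold an n-bit key into m bits by XORing its n/m contiguous segments.

    Each output bit is the XOR of n/m key bits, one taken from each segment.
    """
    if n % m != 0:
        raise ValueError("this sketch assumes m divides n")
    mask = (1 << m) - 1
    h = 0
    for _ in range(n // m):
        h ^= key & mask   # fold in the lowest m-bit segment
        key >>= m         # move on to the next segment
    return h


# 192.168.0.1 as a 32-bit key folded to 8 bits: 0xC0 ^ 0xA8 ^ 0x00 ^ 0x01
print(hex(group_xor(0xC0A80001, 32, 8)))
```

In hardware every output bit is an independent XOR tree, which is what makes this scheme parallel.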
Proposed Hashing Algorithm(7/13) The goal of this paper is to use the extracted information from the preprocessing (d values) to facilitate a better hash design with the XOR operator.
Proposed Hashing Algorithm(8/13) In order not to degrade the hash performance, every intended XOR operation between two bits with d values $d_i$ and $d_j$ should lead to a resulting d value $d_{i \oplus j}$ such that $d_{i \oplus j} \le \min(d_i, d_j)$.
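The effect of an XOR on d can be checked directly on a data set by XORing two bit columns and recomputing d. The Python sketch below is illustrative only; the helper names `column`, `d_of`, and `d_after_xor` are hypothetical, and position 0 is assumed to be the leftmost bit.

```python
def column(entries, n, pos):
    """Bit vector of position `pos` (0 = leftmost) across all entries."""
    return [(e >> (n - 1 - pos)) & 1 for e in entries]


def d_of(bits):
    """|#ones - #zeros| for a bit vector."""
    ones = sum(bits)
    return abs(2 * ones - len(bits))


def d_after_xor(entries, n, i, j):
    """d value of the bit obtained by XORing positions i and j."""
    a = column(entries, n, i)
    b = column(entries, n, j)
    return d_of([x ^ y for x, y in zip(a, b)])


sample = [0b110, 0b100, 0b010, 0b101]   # per-position d = 2, 0, 2
print(d_after_xor(sample, 3, 0, 2))     # XOR of the two d = 2 positions
print(d_after_xor(sample, 3, 0, 1))     # XOR of a d = 2 and a d = 0 position
```

On this sample, XORing the two d = 2 positions yields a perfectly balanced bit, while XORing position 0 into the already balanced position 1 makes it worse, which is the kind of degradation the slide warns about.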
Proposed Hashing Algorithm(9/13) • Bit vectors with smaller d values are XORed with larger d-value bits in order to have a better chance for further reduction. • Bit vectors in the middle range are XORed together to provide the most reductions available.
Proposed Hashing Algorithm(10/13) Two straightforward ways to exploit the d-value-based sorted sequence are d-IOX and d-SOX, both of which perform XOR hashing on the preprocessed database.
Proposed Hashing Algorithm(11/13) The traditional group-XOR process may easily have a detrimental effect, while both d-IOX and d-SOX avoid XORing two bits that are: • both with small d values (the worst possible XORing); • both with large d values (the XORing leading to minimal gain).
Proposed Hashing Algorithm(12/13) • Natural-Fold XOR (d-NFX) folds the sorted bit sequence from both ends, XORing each matching pair of bits. • Natural-Fold with Duplication XOR (d-NFD) duplicates the middle subsegments to patch up the missing portion for uniformity.
Proposed Hashing Algorithm(13/13) d-NFD may lead to overduplication or underduplication of the center subsegments. A simple remedy is adopted: truncate the bits that overshoot, or duplicate the subsegments more than once.
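The idea of folding from both ends can be illustrated with a single fold. The Python sketch below is a simplified reading, not the paper's d-NFX or d-NFD: it assumes n is even and m = n/2, pairs the k-th smallest-d position with the k-th largest-d position, and does not attempt the handling of other m values or the duplication step. The name `single_fold_xor` is hypothetical.

```python
def single_fold_xor(entry, n, order):
    """Fold once: XOR the k-th position in `order` with the k-th from the end.

    `order` lists bit positions (0 = leftmost) sorted by increasing d, so each
    output bit combines a small-d bit with a large-d bit. Produces n/2 bits.
    """
    if n % 2 != 0 or len(order) != n:
        raise ValueError("this sketch assumes an even n and a full ordering")
    h = 0
    for k in range(n // 2):
        lo_bit = (entry >> (n - 1 - order[k])) & 1
        hi_bit = (entry >> (n - 1 - order[n - 1 - k])) & 1
        h = (h << 1) | (lo_bit ^ hi_bit)
    return h


print(single_fold_xor(0b1011, 4, [0, 1, 2, 3]))
```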
Simulation Results(1/12) The data set used for our simulation is randomly generated such that the value for each bit position is uniformly distributed. The data set contains 16384 (2^14) entries.
Simulation Results(2/12) • The simulation results for n = 32 and varied m are given in Fig. 12 in terms of MSL and ASL, averaged over 1000 runs.
Simulation Results(3/12) RS Hash(additional)
Simulation Results(4/12) A summary of the performance gain in MSL of each of the three proposed techniques and the two reference techniques over group-XOR.
Simulation Results(5/12) • RS Hash: RS is a multiplicative hash algorithm that requires two multiplications and one addition for every 8 bits of hash key to generate a hash value. • CRC-32 Hash: CRC-32 requires 32 iterations to generate the final hash value for a given hash key, and needs additional control logic to properly maintain the sequential process.
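The sequential nature of both reference hashes is visible in software. The Python sketch below is illustrative: `rs_hash` follows the widely circulated RS string hash (constants 63689 and 378551, truncated to 32 bits), and `crc32_bitwise` is the standard reflected CRC-32 with polynomial 0xEDB88320. Whether the paper used exactly these variants is an assumption.

```python
import zlib


def rs_hash(data: bytes) -> int:
    """RS hash: two multiplications and one addition per 8-bit chunk."""
    a, b = 63689, 378551
    h = 0
    for byte in data:
        h = (h * a + byte) & 0xFFFFFFFF   # multiply + add
        a = (a * b) & 0xFFFFFFFF          # second multiply
    return h


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32: one shift/XOR step per key bit (32 for IPv4)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


key = bytes([192, 168, 0, 1])             # a 32-bit IPv4 key
print(hex(rs_hash(key)), hex(crc32_bitwise(key)))
print(crc32_bitwise(key) == zlib.crc32(key))
```

Each loop iteration depends on the previous one, which is the property that prevents a single-step parallel circuit of the kind XOR folding allows.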
Simulation Results(6/12) the average d value of each final hash bit for m=14
Simulation Results(7/12) a collection of real IP addresses gathered from three different sources: • general IP traffic addresses; • ad/spam IP addresses; • P2P IP addresses.
Simulation Results(8/12) Performance comparison in terms of MSL and ASL on general IP traffic addresses.
Simulation Results(9/12) Performance comparison in terms of MSL and ASL on AD/SPAM IP traffic addresses.
Simulation Results(10/12) Performance comparison in terms of MSL and ASL on P2P IP traffic addresses.
Simulation Results(11/12) To further analyze the potential performance difference between the d-value XOR folding algorithms and the well-established CRC and RS hashing algorithms, a χ² (chi-square) analysis is conducted.
Simulation Results(12/12) The χ² analysis
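Assuming the analysis referred to here is Pearson's chi-square test against a uniform distribution over the 2^m bins, the statistic can be sketched in Python as follows. The function name `chi_square` is hypothetical, and the slides do not state which degrees of freedom or significance level were used.

```python
from collections import Counter


def chi_square(hash_values, m):
    """Pearson chi-square statistic of bin counts versus a uniform spread.

    Sums (observed - expected)^2 / expected over all 2^m bins, where
    expected = number of entries / 2^m. Smaller means closer to uniform.
    """
    bins = 1 << m
    expected = len(hash_values) / bins
    counts = Counter(hash_values)
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(bins))


print(chi_square([0, 1, 2, 3], 2))   # perfectly uniform
print(chi_square([0, 0, 0, 0], 2))   # everything in one bin
```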
Implementation(1/2) The mapping from the original bit position to the sorted position and then through the d-SOX hashing.
Implementation(2/2) d-NFD