Faster postings merges: Skip pointers/Skip lists

ctaylor

Presentation Transcript


  1. Faster postings merges: Skip pointers/Skip lists

  2. Sec. 2.3 Recall basic merge
     • Walk through the two postings simultaneously, in time linear in the total number of postings entries.
       Brutus: 2 4 8 41 48 64 128
       Caesar: 1 2 3 8 11 17 21 31
       Intersection: 2 8
     • If the list lengths are m and n, the merge takes O(m+n) operations.
     • Can we do better? Yes (if the index isn’t changing too fast).
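The linear merge on this slide can be sketched as follows. This is a minimal illustration, assuming each postings list is a sorted Python list of document IDs; the function name `intersect` is chosen here for illustration and does not come from the slides.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists using at most m + n steps."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Same document ID on both lists: it is part of the result.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller current ID
        else:
            j += 1
    return answer


brutus = [2, 4, 8, 41, 48, 64, 128]
caesar = [1, 2, 3, 8, 11, 17, 21, 31]
print(intersect(brutus, caesar))  # [2, 8]
```

Each loop iteration advances at least one of the two positions, which is where the O(m+n) bound comes from.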

  3. Sec. 2.3 Augment postings with skip pointers (at indexing time)
       Upper list: 2 4 8 41 48 64 128 (skip pointers to 41 and 128)
       Lower list: 1 2 3 8 11 17 21 31 (skip pointers to 11 and 31)
     • Why? To skip postings that will not figure in the search results.
     • How?
     • Where do we place skip pointers?

  4. Sec. 2.3 Query processing with skip pointers
       Upper list: 2 4 8 41 48 64 128 (skip pointers to 41 and 128)
       Lower list: 1 2 3 8 11 17 21 31 (skip pointers to 11 and 31)
     • Suppose we’ve stepped through the lists until we process 8 on each list. We match it and advance.
     • We then have 41 on the upper list and 11 on the lower. 11 is smaller.
     • But the skip successor of 11 on the lower list is 31, so we can skip ahead past the intervening postings.
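The walk-through on this slide might look like this in code. It is only a sketch under stated assumptions: skip pointers are modelled as a dict mapping a list index to the index it skips to, and the function name `intersect_with_skips` is hypothetical. The slide confirms the skip from 11 to 31; the starting points of the other skips (2 to 41, 41 to 128, 1 to 11) are assumed from the figure.

```python
def intersect_with_skips(p1, skips1, p2, skips2):
    """Intersect sorted postings lists, following skip pointers when safe.

    skips1 / skips2 map an index in the list to the index its skip
    pointer targets (an assumed representation, not from the slides).
    """
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Take the skip only if its target does not overshoot p2[j].
            if i in skips1 and p1[skips1[i]] <= p2[j]:
                i = skips1[i]
            else:
                i += 1
        else:
            if j in skips2 and p2[skips2[j]] <= p1[i]:
                j = skips2[j]
            else:
                j += 1
    return answer


upper = [2, 4, 8, 41, 48, 64, 128]
upper_skips = {0: 3, 3: 6}   # assumed: 2 -> 41, 41 -> 128
lower = [1, 2, 3, 8, 11, 17, 21, 31]
lower_skips = {0: 4, 4: 7}   # assumed: 1 -> 11; 11 -> 31 is from the slide
print(intersect_with_skips(upper, upper_skips, lower, lower_skips))  # [2, 8]
```

After matching 8, the lower list sits on 11 while the upper list sits on 41. Because the skip target 31 is still no larger than 41, the sketch jumps straight to 31 and never compares 17 or 21.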

  5. Sec. 2.3 Where do we place skips?
     • Tradeoff:
     • More skips → shorter skip spans → more likely to skip. But lots of comparisons to skip pointers.
     • Fewer skips → few pointer comparisons, but then long skip spans → few successful skips.

  6. Sec. 2.3 Placing skips
     • Simple heuristic: for postings of length L, use √L evenly-spaced skip pointers [Moffat and Zobel 1996]
     • This ignores the distribution of query terms.
     • Easy if the index is relatively static; harder if L keeps changing because of updates.
     • This definitely used to help; with modern hardware it may not unless you’re memory-based [Bahle et al. 2002]
     • The I/O cost of loading a bigger postings list can outweigh the gains from quicker in-memory merging!
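One way to realise the square-root heuristic (roughly √L evenly spaced skips for a postings list of length L) is sketched below. The dict-of-indices representation and the name `build_skips` are assumptions made for illustration; `math.isqrt` requires Python 3.8 or later.

```python
import math


def build_skips(postings):
    """Return {index: target_index} with skips spaced floor(sqrt(L)) apart."""
    length = len(postings)
    step = math.isqrt(length)  # integer square root of the list length
    if step < 2:
        return {}  # a skip of span 1 is just the ordinary next pointer
    # Place a skip at every step-th position whose target is still in range.
    return {i: i + step for i in range(0, length - step, step)}


print(build_skips(list(range(16))))  # {0: 4, 4: 8, 8: 12}
```

Because the spacing depends on L, the skips would need rebuilding whenever the list grows or shrinks noticeably, which is the difficulty with frequent updates that the slide points out.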

  7. Faster postings merges: Skip pointers/Skip lists