Dictionary search

Presentation Transcript

  1. Dictionary search: Exact string search (paper on Cuckoo Hashing)

  2. Exact String Search: Given a dictionary D of K strings, of total length N, store them so that searches for a pattern P over them are supported efficiently. Approach: hashing

  3. Hashing with chaining

  4. Key issue: a good hash function. Basic assumption: uniform hashing • With n keys and m slots, avg #keys per slot = n * (1/m) = n/m = α (load factor)

  5. Search cost: O(1 + α) expected under uniform hashing, hence O(1) when m = Θ(n)
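
A minimal sketch of hashing with chaining, to make the load-factor argument concrete. The class name `ChainedHashTable` and the use of Python's built-in `hash` in place of the hash function h are assumptions made for illustration; they are not from the slides.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds the list of keys hashing to it."""

    def __init__(self, m):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored keys
        self.slots = [[] for _ in range(m)]

    def _slot(self, key):
        # Assumption: Python's built-in hash stands in for the hash function h.
        return hash(key) % self.m

    def insert(self, key):
        chain = self.slots[self._slot(key)]
        if key not in chain:                    # avoid storing duplicates
            chain.append(key)
            self.n += 1

    def search(self, key):
        # Cost is proportional to the chain length, which is n/m on average
        # under the uniform hashing assumption.
        return key in self.slots[self._slot(key)]

    def load_factor(self):
        return self.n / self.m
```

Choosing m proportional to n keeps `load_factor()` bounded by a constant, which is what makes the expected search cost constant.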

  6. In practice A trivial hash function is: prime

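  7. A “provably good” hash is h_a(k) = (a_0 k_0 + a_1 k_1 + ... + a_r k_r) mod m • L = max string len • m = table size (prime) • The key k is split into chunks k_0, k_1, ..., k_r of ≈ log2 m bits each, so r ≈ L / log2 m • Each a_i is selected at random in [0,m) • m not necessarily prime: (... mod p) mod m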
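
A hedged sketch of such a hash family, assuming the slide's function is h_a(k) = (Σ a_i k_i) mod m with m prime and the key cut into chunks of about log2 m bits. The helper name `make_hash`, the byte-string key type, and the `seed` parameter are illustrative assumptions.

```python
import random

def make_hash(m, max_len_bits, seed=None):
    """Draw one function h_a(k) = (sum_i a_i * k_i) mod m, for a prime table size m."""
    rng = random.Random(seed)
    chunk_bits = max(1, m.bit_length() - 1)     # every chunk value is < 2**chunk_bits <= m
    r = -(-max_len_bits // chunk_bits)          # number of chunks = ceil(L / chunk_bits)
    a = [rng.randrange(m) for _ in range(r)]    # each a_i chosen at random in [0, m)
    mask = (1 << chunk_bits) - 1

    def h(key: bytes) -> int:
        if len(key) * 8 > max_len_bits:
            raise ValueError("key longer than the declared maximum length")
        x = int.from_bytes(key, "big")
        total = 0
        for a_i in a:
            total += a_i * (x & mask)           # a_i * k_i, with k_i the next chunk
            x >>= chunk_bits
        return total % m

    return h
```

Keys that differ only by leading zero bytes map to the same integer in this sketch; a real implementation would also need to account for the key length.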

  8. Cuckoo Hashing: 2 hash tables, and 2 random choices where an item can be stored [figure: items A, B, C, E, D placed in the tables]

  9. A running example A B C F E D

  10. A running example A B C F E D

  11. A running example A B C F G E D

  12. A running example E G B C F A D

  13. Cuckoo Hashing Examples: random (bipartite) graph with node = cell, edge = key [figure: cells holding A, B, C, G, E, D, F]
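
The eviction process of the running example can be sketched as follows. This is an illustrative sketch, not the scheme from the paper: the class name `CuckooHash`, the salted use of Python's built-in `hash`, the `max_kicks` bound, and the doubling rehash policy are all assumptions.

```python
import random

class CuckooHash:
    """Two tables; every key has exactly one candidate cell in each table."""

    def __init__(self, size, max_kicks=32, seed=0):
        self.size = size
        self.max_kicks = max_kicks              # bound on evictions before rehashing
        self.rng = random.Random(seed)
        self.tables = [[None] * size, [None] * size]
        self._new_salts()

    def _new_salts(self):
        # Assumption: salting Python's hash simulates two random hash functions.
        self.salts = [self.rng.getrandbits(64), self.rng.getrandbits(64)]

    def _pos(self, i, key):
        return hash((self.salts[i], key)) % self.size

    def search(self, key):
        # At most two cells are probed, one per table.
        return any(self.tables[i][self._pos(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.search(key):
            return
        i = 0
        for _ in range(self.max_kicks):
            p = self._pos(i, key)
            # Place the key and pick up whatever occupied the cell.
            key, self.tables[i][p] = self.tables[i][p], key
            if key is None:
                return
            i = 1 - i                           # the evicted key moves to the other table
        self._rehash(key)

    def _rehash(self, pending):
        keys = [k for t in self.tables for k in t if k is not None] + [pending]
        self.size *= 2
        self.tables = [[None] * self.size, [None] * self.size]
        self._new_salts()
        for k in keys:
            self.insert(k)
```

Searching touches two cells in the worst case; the cost of an insertion depends on the length of the eviction chain.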

  14. Natural Extensions
      • More than 2 hashes (choices) per key
          • Very different: hypergraphs instead of graphs
          • Higher memory utilization (3 choices: 90+% in experiments; 4 choices: about 97%)
          • ...but more insert time (and random access)
      • 2 hashes + bins of size B
          • Balanced allocation and tightly O(1)-size bins
          • Insertion sees a tree of possible evict+ins paths
          • More memory ...but more local

  15. Dictionary search: Making one-sided errors (paper on Bloom Filter)

  16. Crawling: How to keep track of the URLs visited by a crawler? • URLs are long • The check should be very fast • Small errors are tolerable (≈ a page not crawled) • Solution: Bloom Filter over crawled URLs

  17. Searching with errors...

  18. Problem: false positives

  20. Not perfectly true but...

  21. We do have an explicit formula for the optimal k: k = (m/n) ln 2. For m/n = 8 this gives opt k ≈ 5.545
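
A small Bloom filter sketch, together with the standard approximations for the false-positive rate, (1 - e^(-kn/m))^k, and for the optimal number of hash functions, k = (m/n) ln 2. The class and function names are illustrative, and deriving the k hash functions from salted SHA-256 digests is an assumption chosen only to keep the example self-contained.

```python
import hashlib
import math

class BloomFilter:
    """m bits and k hash functions; answers 'no' exactly and 'yes' with a small error."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        # False positives are possible, false negatives are not.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

def optimal_k(m, n):
    """Number of hash functions minimising the approximate false-positive rate."""
    return (m / n) * math.log(2)

def false_positive_rate(m, n, k):
    """Standard approximation, treating bit positions as independent."""
    return (1 - math.exp(-k * n / m)) ** k
```

With 8 bits per key, `optimal_k(8, 1)` is about 5.545, so k = 5 or k = 6 would be used in practice.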

  22. Dictionary search: Prefix-string search (Reading 3.1 and 5.2)

  23. Prefix-string Search: Given a dictionary D of K strings, of total length N, store them so that prefix searches for a pattern P over them are supported efficiently.

  24. Trie: speeding up searches [figure: compacted trie with edge labels s, y, z, omo, aibelyite, stile, zyg, czecin, etic, ygy, ial; numeric node labels not recoverable] • Pro: O(p) search time • Cons: space for edge + node labels and tree structure
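
A minimal sketch of prefix search on a trie. For brevity it uses an uncompacted trie (one character per edge) rather than the compacted trie of the figure, and the names `TrieNode`, `build_trie`, and `prefix_search` are illustrative assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # character -> child node
        self.is_end = False     # True if a dictionary string ends here

def build_trie(strings):
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root

def prefix_search(root, p):
    """Return, in sorted order, all dictionary strings having p as a prefix."""
    node = root
    for ch in p:                            # O(p) steps to locate the prefix node
        node = node.children.get(ch)
        if node is None:
            return []
    out, stack = [], [(node, p)]
    while stack:                            # collect every string below that node
        node, s = stack.pop()
        if node.is_end:
            out.append(s)
        for ch, child in node.children.items():
            stack.append((child, s + ch))
    return sorted(out)
```

Locating the prefix takes time proportional to its length; reporting the matches adds a cost that depends on the size of the subtree below it.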

  25. 5 5 2 3345% 0 http://checkmate.com/All/Natural/Washcloth.html... Front-coding: squeezing strings ….systile syzygetic syzygial syzygy…. 0 http://checkmate.com/All_Natural/ 33 Applied.html 34 roma.html 38 1.html 38 tic_Art.html 34 yate.html 35 er_Soap.html 35 urvedic_Soap.html 33 Bath_Salt_Bulk.html 42 s.html 25 Essence_Oils.html 25 Mineral_Bath_Crystals.html 38 Salt.html 33 Cream.html http://checkmate.com/All_Natural/ http://checkmate.com/All_Natural/Applied.html http://checkmate.com/All_Natural/Aroma.html http://checkmate.com/All_Natural/Aroma1.html http://checkmate.com/All_Natural/Aromatic_Art.html http://checkmate.com/All_Natural/Ayate.html http://checkmate.com/All_Natural/Ayer_Soap.html http://checkmate.com/All_Natural/Ayurvedic_Soap.html http://checkmate.com/All_Natural/Bath_Salt_Bulk.html http://checkmate.com/All_Natural/Bath_Salts.html http://checkmate.com/All/Essence_Oils.html http://checkmate.com/All/Mineral_Bath_Crystals.html http://checkmate.com/All/Mineral_Bath_Salt.html http://checkmate.com/All/Mineral_Cream.html http://checkmate.com/All/Natural/Washcloth.html ... Gzip may be much better...

  26. 2-level indexing
      • Internal memory: CT (presumably a compacted trie) on a sample of the strings, e.g. systile, szaibelyite
      • Disk: front-coded buckets, e.g. ….70systile 92zygetic 85ial 65y 110szaibelyite 82czecin 92omo….
      • 2 advantages:
          • Search ≈ typically 1 I/O
          • Space ≈ Front-coding over buckets
      • A disadvantage:
          • Trade-off ≈ speed vs space (because of bucket size)
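
A simplified sketch of the 2-level scheme. Two assumptions depart from the slide: a sorted array with binary search stands in for the in-memory index over the sample, and the buckets are kept as plain Python lists rather than front-coded disk blocks. The function names are illustrative.

```python
import bisect

def build_two_level(sorted_strings, bucket_size):
    """Split the sorted dictionary into buckets and sample the first string of each."""
    buckets = [sorted_strings[i:i + bucket_size]
               for i in range(0, len(sorted_strings), bucket_size)]
    sample = [b[0] for b in buckets]        # kept in internal memory
    return sample, buckets

def two_level_prefix_search(sample, buckets, p):
    """Return all strings with prefix p, scanning only the buckets that can hold them."""
    # Last bucket whose first string is <= p: matches cannot start earlier.
    start = max(bisect.bisect_right(sample, p) - 1, 0)
    out = []
    for b in buckets[start:]:
        if b[0] > p and not b[0].startswith(p):
            break                           # this bucket is past the range of matches
        out.extend(s for s in b if s.startswith(p))   # one bucket ≈ one I/O
    return out
```

Larger buckets shrink the in-memory sample but make each bucket scan longer, which is the speed versus space trade-off noted on the slide.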