1. Advanced Indexing Techniques with Apache Lucene - Payloads
Michael Busch
(buschmi@apache.org)
2. Agenda
Part 1: Inverted Index 101
Posting Lists
Stored Fields vs. Payloads
Part 2: Use cases for Payloads
BoostingTermQuery
Simple facet counting
9. So far…
String comparison is slow
Inverted index used to accelerate search
Store positions in posting lists to allow phrase searches
Store payloads in posting lists to attach arbitrary data to each position
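To make the last two points concrete, here is a toy, self-contained Java model of an inverted index whose posting entries carry a position and an optional payload. It is not Lucene code: the class names `Posting` and `ToyInvertedIndex` are hypothetical, and the phrase query is a naive nested loop rather than a merge of sorted position lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// One posting: a single occurrence of a term, with its position and optional payload.
final class Posting {
    final int docId;
    final int position;
    final byte[] payload; // arbitrary per-position bytes, or null

    Posting(int docId, int position, byte[] payload) {
        this.docId = docId;
        this.position = position;
        this.payload = payload;
    }
}

// Toy inverted index: term -> posting list ordered by (docId, position),
// assuming documents are added in increasing docId order.
final class ToyInvertedIndex {
    private final Map<String, List<Posting>> index = new TreeMap<>();

    // payloads may be null; otherwise payloads[i] belongs to terms[i].
    void addDocument(int docId, String[] terms, byte[][] payloads) {
        for (int pos = 0; pos < terms.length; pos++) {
            byte[] payload = (payloads == null) ? null : payloads[pos];
            index.computeIfAbsent(terms[pos], t -> new ArrayList<>())
                 .add(new Posting(docId, pos, payload));
        }
    }

    List<Posting> postings(String term) {
        return index.getOrDefault(term, List.of());
    }

    // Two-term phrase query: documents where `second` appears at the position
    // immediately after `first`. This only works because positions are stored.
    List<Integer> phrase(String first, String second) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (Posting a : postings(first)) {
            for (Posting b : postings(second)) {
                if (a.docId == b.docId && b.position == a.position + 1) {
                    hits.add(a.docId);
                }
            }
        }
        return new ArrayList<>(hits);
    }
}
```

One consequence of this layout: a query that already iterates over positions can read each payload in the same pass, without a separate per-document lookup.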
15. org.apache.lucene.analysis.Token
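Payloads enter the index on the tokens produced during analysis. A minimal sketch, assuming a token only needs its term text, a position increment, and optional payload bytes: `SimpleToken` and `PayloadCodec` are hypothetical stand-ins rather than Lucene's classes, and the 4-byte big-endian float layout is just one convenient encoding for a per-position weight.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an analysis token: term text, position increment,
// and an optional payload (raw bytes stored in the posting list at this position).
final class SimpleToken {
    final String term;
    final int positionIncrement;
    byte[] payload; // null means "no payload"

    SimpleToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

final class PayloadCodec {
    // Encodes a float as 4 big-endian bytes, a simple fixed-width payload layout.
    static byte[] encodeFloat(float f) {
        return ByteBuffer.allocate(4).putFloat(f).array();
    }

    // Decodes the 4 bytes starting at offset back into a float.
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }
}
```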
16. Analyzer:
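An analyzer is the natural place to decide which payload each token gets. The sketch below assumes a made-up markup in which emphasized words are written as `*word*` and gives them a higher weight; the class name, the markup, and the weights 2.0/1.0 are all hypothetical, chosen only to show a payload being attached per token.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical analysis step: whitespace tokenization plus a payload filter that
// marks emphasized words (written as *word* in this toy markup) with a higher weight.
final class EmphasisPayloadAnalyzer {
    static final float EMPHASIS_BOOST = 2.0f; // assumed weight for emphasized terms
    static final float NORMAL_BOOST = 1.0f;   // assumed weight for everything else

    record AnalyzedToken(String term, int position, byte[] payload) {}

    static List<AnalyzedToken> analyze(String text) {
        List<AnalyzedToken> out = new ArrayList<>();
        int position = 0;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue; // happens only for empty input
            boolean emphasized = raw.length() > 2 && raw.startsWith("*") && raw.endsWith("*");
            String term = (emphasized ? raw.substring(1, raw.length() - 1) : raw)
                    .toLowerCase(Locale.ROOT);
            float weight = emphasized ? EMPHASIS_BOOST : NORMAL_BOOST;
            // The payload is stored in the posting list at this token's position.
            byte[] payload = ByteBuffer.allocate(4).putFloat(weight).array();
            out.add(new AnalyzedToken(term, position++, payload));
        }
        return out;
    }
}
```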
17. Similarity:
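On the query side, BoostingTermQuery needs a way to turn payload bytes into a score factor, which is where Similarity comes in. The following is a simplified model of that idea, not Lucene's implementation: it assumes the payload is a 4-byte float (a missing payload counts as 1.0), averages the payload scores over all matching positions in a document, and multiplies that average into the ordinary term score. `PayloadScoring` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Simplified model of payload-aware term scoring.
final class PayloadScoring {
    // Hypothetical hook playing the role of a Similarity payload-scoring method:
    // interpret the 4 payload bytes at offset as a float weight; no payload -> 1.0.
    static float scorePayload(byte[] payload, int offset, int length) {
        if (payload == null || length < 4) return 1.0f;
        return ByteBuffer.wrap(payload, offset, 4).getFloat();
    }

    // termScore: the score a plain term query would give this document.
    // payloadsAtMatches: the payload found at each position where the term matched.
    static float score(float termScore, List<byte[]> payloadsAtMatches) {
        if (payloadsAtMatches.isEmpty()) return termScore;
        float sum = 0f;
        for (byte[] p : payloadsAtMatches) {
            sum += scorePayload(p, 0, p == null ? 0 : p.length);
        }
        // Average payload score over the matching positions scales the term score.
        return termScore * (sum / payloadsAtMatches.size());
    }
}
```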
19. Analyzer:
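For the per-site use case, the analyzer has to record which site a document came from. One way to sketch this, assuming the site is known at index time and fits in 16 bits once mapped to a dense integer ID, is to attach the site ID as a 2-byte payload to every token of the document. `SitePayloadAnalyzer` and its helpers are hypothetical, and the tokenizer is a trivial split on non-word characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical indexing-side helper: assigns each site a small integer ID and
// attaches that ID as a payload to every token of a document from that site.
final class SitePayloadAnalyzer {
    private final Map<String, Integer> siteIds = new HashMap<>();
    private final List<String> sites = new ArrayList<>();

    record TokenWithPayload(String term, byte[] payload) {}

    // Returns the ID for a site, assigning the next free one on first sight.
    int siteId(String site) {
        return siteIds.computeIfAbsent(site, s -> { sites.add(s); return sites.size() - 1; });
    }

    String siteName(int id) {
        return sites.get(id);
    }

    List<TokenWithPayload> analyze(String site, String text) {
        byte[] payload = encodeSiteId(siteId(site));
        List<TokenWithPayload> out = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) out.add(new TokenWithPayload(term, payload));
        }
        return out;
    }

    // Two big-endian bytes allow up to 65,536 distinct sites in this sketch.
    static byte[] encodeSiteId(int id) {
        if (id < 0 || id > 0xFFFF) throw new IllegalArgumentException("site id out of range: " + id);
        return new byte[] { (byte) (id >>> 8), (byte) id };
    }

    static int decodeSiteId(byte[] payload) {
        return ((payload[0] & 0xFF) << 8) | (payload[1] & 0xFF);
    }
}
```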
20. HitCollector: use different PriorityQueues for different sites
Instead of returning the top-n results of the whole data set, return the top-n results per site
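A collector along these lines might look like the sketch below. Lucene 2.x's `HitCollector` receives each matching document through a `collect(int doc, float score)` callback; this plain-Java version mirrors that shape but does not extend Lucene's class. It assumes a `docToSite` array (hypothetical, e.g. filled from the site payloads) and keeps one bounded min-heap per site.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch of a collector that keeps the top-n hits per site instead of the
// top-n overall. docToSite is an assumed lookup from document to site ID.
final class PerSiteTopNCollector {
    record Hit(int doc, float score) {}

    private static final Comparator<Hit> BY_SCORE = (a, b) -> Float.compare(a.score(), b.score());

    private final int n;
    private final int[] docToSite;
    // One min-heap per site: the head is the weakest hit currently kept.
    private final Map<Integer, PriorityQueue<Hit>> queues = new HashMap<>();

    PerSiteTopNCollector(int n, int[] docToSite) {
        this.n = n;
        this.docToSite = docToSite;
    }

    // Called once per matching document, like a HitCollector's collect(doc, score).
    void collect(int doc, float score) {
        PriorityQueue<Hit> pq = queues.computeIfAbsent(docToSite[doc], s -> new PriorityQueue<>(BY_SCORE));
        if (pq.size() < n) {
            pq.add(new Hit(doc, score));
        } else if (score > pq.peek().score()) {
            pq.poll(); // evict the weakest hit kept for this site
            pq.add(new Hit(doc, score));
        }
    }

    // Top hits for one site, best first; empty if the site had no hits.
    List<Hit> topHits(int site) {
        List<Hit> hits = new ArrayList<>(queues.getOrDefault(site, new PriorityQueue<>(BY_SCORE)));
        hits.sort(BY_SCORE.reversed());
        return hits;
    }
}
```

Each insertion costs O(log n), the same as a single global top-n queue, so grouping by site mainly adds memory for one queue per site that actually produced hits.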
21. Summary
In this example, the facet (site) is used for scoring, but the approach can be extended to facet counting
Good performance due to locality of facet values
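Extending the same idea from scoring to facet counting can be as simple as one counter per facet value. A minimal sketch, assuming site IDs are dense integers and that a hypothetical `docToSite` lookup has already been populated from the payloads:

```java
import java.util.Arrays;

// Sketch of simple facet counting: for every matching document, look up its
// site ID and increment that site's counter.
final class SiteFacetCounter {
    private final int[] docToSite; // assumed: document -> site ID
    private final int[] counts;    // counts[siteId] = number of matching docs

    SiteFacetCounter(int[] docToSite, int numSites) {
        this.docToSite = docToSite;
        this.counts = new int[numSites];
    }

    // Called once per matching document.
    void collect(int doc) {
        counts[docToSite[doc]]++;
    }

    // Returns a copy so callers cannot modify the internal counters.
    int[] counts() {
        return Arrays.copyOf(counts, counts.length);
    }
}
```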
22. Payloads offer great flexibility
Payloads are stored very space-efficiently
Sophisticated data structures enable efficient skipping over payloads
Payloads should be used whenever special data is required for finding hits and scoring
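Part of the space efficiency comes from how positions and payloads are laid out in the posting list. The sketch below is loosely modeled on the positions-file layout described in Lucene's 2.x file-format documentation: each position delta is written as a variable-length int shifted left by one bit, and the low bit signals whether a new payload length follows, so the length is stored only when it changes. `PositionPayloadWriter` is a hypothetical name, and skip data is omitted.

```java
import java.io.ByteArrayOutputStream;

// Sketch of a compact position/payload encoding for one term in one document.
final class PositionPayloadWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int lastPosition = 0;
    private int lastPayloadLength = -1; // forces the first length to be written

    // Variable-length int: 7 data bits per byte, low-order bits first,
    // high bit set means more bytes follow.
    private void writeVInt(int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    void addPosition(int position, byte[] payload) {
        int delta = position - lastPosition;
        lastPosition = position;
        if (payload.length != lastPayloadLength) {
            writeVInt((delta << 1) | 1); // odd: a new payload length follows
            writeVInt(payload.length);
            lastPayloadLength = payload.length;
        } else {
            writeVInt(delta << 1);       // even: reuse the previous length
        }
        out.write(payload, 0, payload.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

With three 4-byte payloads at positions 3, 7 and 10, this writer emits 16 bytes; writing the length before every payload would take 18.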
23. Finalize API (currently beta)
Add more out-of-the-box query types
Per-document Payloads
24. Questions?