
Lucene/SOLR 2: Lucene search API




  1. Lucene/SOLR 2: Lucene search API. TU Delft Library, Digitale Productontwikkeling (Digital Product Development). Starter: Searcher, Term, Sort, Filter • Main course: Query, Similarity, QueryParser • Dessert: Hits, Highlighter, SpellChecker. Egbert Gramsbergen

  2. org.apache.lucene.search.Searcher, shown as a class diagram in "bastardized UML" (legend: * = constructor, ([ ]) = argument(s), [ ] = optional, --> = return value). Methods: doc(int i) --> Document; docFreq(Term) --> int; explain(Query, int doc) --> Explanation; search(Query [, Filter] [, Sort]) --> Hits; getSimilarity / setSimilarity (Similarity); plus lower-level methods (performance tuning).

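  3. org.apache.lucene.search.Searcher and its sources. IndexSearcher (*) is constructed from a Directory, a String path, or an IndexReader. Directory implementations: FSDirectory, RAMDirectory, DbDirectory, JEDirectory. IndexReader variants: FilterIndexReader, MultiReader (over IndexReader[]), ParallelReader. MultiSearcher (*) and ParallelMultiSearcher (*) combine several Searchables (Searchable[]), for example a RemoteSearchable.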

  4. org.apache.lucene.index.Term: constructor Term(String field, String text); methods createTerm, field, text, compareTo --> int. Use: among other things, a building block of Query and Filter.
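Terms are ordered first by field, then by text, which is how Lucene's Term.compareTo documents its ordering. The stdlib-only sketch below mirrors that rule; TermSketch is a hypothetical stand-in, not the Lucene class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.lucene.index.Term: a (field, text) pair.
final class TermSketch implements Comparable<TermSketch> {
    final String field;
    final String text;

    TermSketch(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Order by field name first, then by term text (both lexicographic).
    @Override
    public int compareTo(TermSketch other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }

    // Returns a sorted copy, i.e. the terms in index order.
    static List<TermSketch> sorted(List<TermSketch> terms) {
        List<TermSketch> copy = new ArrayList<>(terms);
        Collections.sort(copy);
        return copy;
    }
}
```

Sorting title:zebra, author:smith and title:apple yields author:smith, title:apple, title:zebra.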

  5. org.apache.lucene.search.Sort. Note: Lucene has no strongly typed fields; SOLR does. Sort: constructors (*) and setSort take a String field [+ boolean reverse], a String[] of fields, or SortField / SortField[]; getSort returns the SortFields. SortField (*): String field, int type, boolean reverse, optionally a Locale (String language, String country, String variant) or a SortComparatorSource. Type constants: AUTO, CUSTOM, DOC, SCORE, INT, LONG, FLOAT, DOUBLE, STRING.
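Because Lucene stores only strings, the sort type decides how values are compared. The stdlib sketch below (not the Lucene Sort API; SortSketch is hypothetical) contrasts string order, comparable to SortField.STRING, with integer order, comparable to SortField.INT, and the reverse flag.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stdlib sketch of sorting on a field value; not the Lucene Sort API.
final class SortSketch {
    // numeric = false: compare as strings (cf. SortField.STRING)
    // numeric = true:  parse and compare as ints (cf. SortField.INT)
    static List<String> sortValues(List<String> values, boolean numeric, boolean reverse) {
        Comparator<String> cmp;
        if (numeric) {
            cmp = Comparator.comparingInt(s -> Integer.parseInt(s));
        } else {
            cmp = Comparator.naturalOrder();
        }
        if (reverse) {
            cmp = cmp.reversed();
        }
        List<String> copy = new ArrayList<>(values);
        copy.sort(cmp);
        return copy;
    }
}
```

For the values 20, 100, 5, string order gives 100, 20, 5 while integer order gives 5, 20, 100.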

  6. org.apache.lucene.search.Filter. Implementations: BooleanFilter, ChainedFilter, DuplicateFilter, PrefixFilter, QueryWrapperFilter, RangeFilter, SpanFilter, CachingWrapperFilter, and more… Example: TermsFilter (*) with addTerm(Term). Use: for example in faceted search.
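Conceptually, a filter marks which document numbers may appear in a result. The stdlib sketch below illustrates what a terms filter computes for a single-valued field, using a BitSet; TermsFilterSketch and its methods are hypothetical, and the one-value-per-document field is a simplifying assumption.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stdlib sketch of a terms filter; not the Lucene TermsFilter class.
final class TermsFilterSketch {
    private final Set<String> allowed = new HashSet<>();

    // Analogous to TermsFilter.addTerm: a value that lets documents through.
    void addTerm(String value) {
        allowed.add(value);
    }

    // fieldValues.get(doc) is the field value of document number doc.
    // Bit doc is set when that document carries one of the added terms.
    BitSet bits(List<String> fieldValues) {
        BitSet result = new BitSet(fieldValues.size());
        for (int doc = 0; doc < fieldValues.size(); doc++) {
            if (allowed.contains(fieldValues.get(doc))) {
                result.set(doc);
            }
        }
        return result;
    }

    // Keeps only the candidate hits whose filter bit is set.
    static BitSet apply(BitSet hits, BitSet filter) {
        BitSet copy = (BitSet) hits.clone();
        copy.and(filter);
        return copy;
    }
}
```

In a faceted interface, clicking the facets "book" and "thesis" could add those two terms, and only documents with either publication type remain.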

  7. org.apache.lucene.search.Query and its subclasses:
      TermQuery, BooleanQuery, PhraseQuery, MultiPhraseQuery, PrefixQuery, RangeQuery, ConstantScoreQuery, ConstantScoreRangeQuery, DisjunctionMaxQuery, FilteredQuery, MatchAllDocsQuery, BoostingQuery, FuzzyLikeThisQuery, MoreLikeThisQuery;
      MultiTermQuery: FuzzyQuery, WildcardQuery, RegexQuery;
      SpanQuery: SpanFirstQuery, SpanNearQuery, SpanNotQuery, SpanOrQuery, SpanRegexQuery, SpanTermQuery, BoostingTermQuery;
      function queries: ValueSourceQuery, FieldScoreQuery, CustomScoreQuery.

  8. org.apache.lucene.search.Query: setBoost(float boost), getBoost, rewrite(IndexReader). TermQuery (*): getTerm --> Term. PhraseQuery (*): add(Term [, int position]), getTerms --> Term[], setSlop(int slop).

  9. org.apache.lucene.search.BooleanQuery: constructor BooleanQuery([boolean disableCoord]); add(BooleanClause) or add(Query, BooleanClause.Occur); getClauses --> BooleanClause[]; setMinimumNumberShouldMatch(int). BooleanClause.Occur: MUST, MUST_NOT, SHOULD. Example, an "and/or-ish" query:

      BooleanQuery bq = new BooleanQuery();
      // … (add clauses to bq)
      float andNess = 0.5f; // 0: OR (default), 1: AND
      BooleanClause[] clauses = bq.getClauses();
      int numOpt = 0;
      for (int i = 0; i < clauses.length; i++) {
          if (clauses[i].getOccur() == BooleanClause.Occur.SHOULD) numOpt++;
      }
      bq.setMinimumNumberShouldMatch(Math.round(numOpt * andNess));
      // NOTE: if there is no MUST clause, at least 1 SHOULD clause must match
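The and-ness snippet only sets a number; the matching rule it controls can be sketched with the standard library. BooleanMatchSketch is hypothetical and checks a document's term set against MUST, MUST_NOT and SHOULD clauses, including the slide's note that a query without MUST clauses needs at least one matching SHOULD clause. Scoring is ignored.

```java
import java.util.List;
import java.util.Set;

// Stdlib sketch of BooleanQuery matching rules over a document's term set.
final class BooleanMatchSketch {
    enum Occur { MUST, MUST_NOT, SHOULD }

    static final class Clause {
        final String term;
        final Occur occur;
        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
    }

    // Same arithmetic as the slide: round(number of SHOULD clauses * andNess).
    static int minShouldMatch(List<Clause> clauses, float andNess) {
        int numOpt = 0;
        for (Clause c : clauses) {
            if (c.occur == Occur.SHOULD) numOpt++;
        }
        return Math.round(numOpt * andNess);
    }

    static boolean matches(Set<String> docTerms, List<Clause> clauses, int minShouldMatch) {
        boolean hasMust = false;
        int shouldHits = 0;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            switch (c.occur) {
                case MUST:
                    hasMust = true;
                    if (!present) return false;   // a required term is missing
                    break;
                case MUST_NOT:
                    if (present) return false;    // a prohibited term is present
                    break;
                case SHOULD:
                    if (present) shouldHits++;
                    break;
            }
        }
        // Without MUST clauses, at least one SHOULD clause has to match.
        int needed = hasMust ? minShouldMatch : Math.max(1, minShouldMatch);
        return shouldHits >= needed;
    }
}
```

With one MUST clause, three SHOULD clauses and andNess = 0.5, round(1.5) = 2 optional terms must be present.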

  10. org.apache.lucene.search.function.CustomScoreQuery: constructor CustomScoreQuery(Query subQuery [, ValueSourceQuery(s)]); override customScore(int doc, float subQueryScore, float valSrcScore(s)) --> float. FieldScoreQuery (*): String field, FieldScoreQuery.Type (BYTE, SHORT, INT, FLOAT). Use cases: weighting by publication type + year (library); geographic proximity (search "pizza"). Default: subQueryScore * valSrcScores[0] * valSrcScores[1] * … Publication year: $\text{score} = 1 + \frac{a}{1+\tau}$, with $\tau = (t - t_p)/t_0$ (t = current year, t_p = publication year). (Figure: score plotted against t - t_p, with 1, a and t_0 marked on the axes.)
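The two formulas on this slide are plain arithmetic, so they can be checked without Lucene. The sketch below restates the default product and the publication-year boost; CustomScoreSketch is hypothetical, and the parameter values in the usage note are illustrative assumptions.

```java
// Stdlib sketch of the scoring arithmetic on the CustomScoreQuery slide.
final class CustomScoreSketch {
    // Default combination: subQueryScore * valSrcScores[0] * valSrcScores[1] * ...
    static float defaultCustomScore(float subQueryScore, float... valSrcScores) {
        float score = subQueryScore;
        for (float v : valSrcScores) {
            score *= v;
        }
        return score;
    }

    // Publication-year boost: 1 + a / (1 + tau), with tau = (t - tp) / t0.
    // a:  extra weight of a brand-new publication
    // t:  current year, tp: publication year
    // t0: time scale over which the boost decays toward 1
    static double yearBoost(double a, double t, double tp, double t0) {
        double tau = (t - tp) / t0;
        return 1.0 + a / (1.0 + tau);
    }
}
```

With a = 1 and t_0 = 5, a publication from this year gets 2.0 and one that is five years old gets 1.5; an overridden customScore could multiply subQueryScore by such a factor.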

  11. org.apache.lucene.search.Similarity: this is where the real work is done: http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Similarity.html. (Query, Document) --> score, according to the Vector Space Model.
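The Similarity documentation describes the score as coord(q,d) · queryNorm(q) · Σ tf(t in d) · idf(t)² · boost(t) · norm(t,d), with DefaultSimilarity supplying the factors below. The sketch restates them with the standard library; SimilaritySketch is hypothetical, and it simplifies by passing one precomputed norm per document (Lucene combines boosts with lengthNorm and stores the result lossily in one byte).

```java
// Stdlib restatement of the DefaultSimilarity factors; illustrative only.
final class SimilaritySketch {
    static float tf(float freq) { return (float) Math.sqrt(freq); }

    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    static float lengthNorm(int numTerms) { return (float) (1.0 / Math.sqrt(numTerms)); }

    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    static float coord(int overlap, int maxOverlap) { return overlap / (float) maxOverlap; }

    // One array entry per query term; freqs[i] == 0 means term i does not occur in d.
    // norm is a single per-document value here (simplifying assumption).
    static float score(float[] freqs, int[] docFreqs, int numDocs, float[] boosts, float norm) {
        float sumSq = 0f;   // sum of squared query weights (idf * boost)^2
        float sum = 0f;     // sum over matching terms
        int overlap = 0;    // number of query terms found in d
        for (int i = 0; i < freqs.length; i++) {
            float idf = idf(docFreqs[i], numDocs);
            sumSq += (idf * boosts[i]) * (idf * boosts[i]);
            if (freqs[i] > 0) {
                overlap++;
                sum += tf(freqs[i]) * idf * idf * boosts[i] * norm;
            }
        }
        return coord(overlap, freqs.length) * queryNorm(sumSq) * sum;
    }
}
```

The coord factor is what rewards documents that contain more of the query terms: a document matching one of two terms gets half the coord of one matching both.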

  12. org.apache.lucene.queryParser.QueryParser: String --> Query (hooray!)
      Grammar notation: ::= definition, ( ) nesting, * repetition, [ ] optional, | or.
      Query  ::= ( Clause )*
      Clause ::= ["+" | "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
      ("+" = AND, "-" = NOT, <TERM> ":" = field, then a single term or phrase, or a nested query.)
      Examples:
        aaa bbb ccc
        +aaa bbb -ccc
        "aaa bbb"
        title:aaa
        title:(+aaa bbb) AND author:"ddd e f"
        aaa* bb*b cc?c
        aaa~0.8               (fuzzy; minimum similarity)
        "aaa bbb"~10          (proximity; slop)
        year:[2000 TO 2005]   (inclusive)
        price:{020 TO 100}    (exclusive)
        aaa^3 bbb             (boost)
        "aaa bbb"^0.5         (boost)
        1\+1                  (\ is the escape character)
      The query text also goes through the Analyzer.
      Range bounds are compared as Strings, so "20" < "100" does not hold (hence the zero-padded 020). Lucene: only Strings; SOLR: strongly typed fields!
      NOT allowed: wildcards inside a phrase ("aaa* bbb"); leading wildcards (*aaa, ?aaa), at least by default.
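The flat part of the Clause rule (optional +/- prefix, optional field, single term) is simple enough to sketch by hand. ToyClauseParser below is hypothetical and deliberately ignores phrases, parentheses, ranges, boosts, escaping and analysis; it only shows how each clause maps to an occurrence and a field.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser for: Clause ::= ["+"|"-"] [<TERM> ":"] <TERM>, separated by whitespace.
// Not Lucene's QueryParser: no phrases, nesting, ranges, boosts or escaping.
final class ToyClauseParser {
    static List<String> parse(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            String t = token;
            String occur = "SHOULD";               // default operator OR
            if (t.startsWith("+")) {
                occur = "MUST";
                t = t.substring(1);
            } else if (t.startsWith("-")) {
                occur = "MUST_NOT";
                t = t.substring(1);
            }
            String field = defaultField;
            int colon = t.indexOf(':');
            if (colon > 0) {                       // "field:term" overrides the default field
                field = t.substring(0, colon);
                t = t.substring(colon + 1);
            }
            clauses.add(occur + " " + field + ":" + t);
        }
        return clauses;
    }
}
```

With default field "text", the input +aaa bbb -title:ccc becomes MUST text:aaa, SHOULD text:bbb, MUST_NOT title:ccc.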

  13. org.apache.lucene.queryParser.QueryParser (continued). Not every Query can be built by QueryParser (too complicated, or blocked to protect performance), e.g. "New Yor*", *ork, or "New York" within 10 words of "Broadway" and at most 5 words from the start of the field. Not every Query should be built by QueryParser either: design the interface, e.g.
      * free-text input (run through QueryParser)
      * separate interface elements for: fields, ranges, facets, more like this, …
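In Lucene the "New York" near "Broadway" example would be built programmatically from span queries such as SpanNearQuery and SpanFirstQuery. The stdlib sketch below only approximates that condition on a token list; SpanConditionSketch is hypothetical, and counting the gap as the number of words in between is a simplifying assumption about slop.

```java
import java.util.List;

// Simplified stdlib sketch of the span condition; not Lucene's span queries.
final class SpanConditionSketch {
    // True if the phrase "new york" ends within the first maxEnd positions
    // (cf. SpanFirstQuery) and "broadway" occurs, on either side, with at most
    // maxGap other words between it and the phrase (cf. SpanNearQuery slop).
    static boolean matches(List<String> tokens, int maxEnd, int maxGap) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (!tokens.get(i).equals("new") || !tokens.get(i + 1).equals("york")) continue;
            int phraseStart = i;
            int phraseEnd = i + 2;                 // exclusive end position
            if (phraseEnd > maxEnd) continue;      // phrase starts too late in the field
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals("broadway")) continue;
                int gap = j >= phraseEnd ? j - phraseEnd : phraseStart - j - 1;
                if (gap <= maxGap) return true;
            }
        }
        return false;
    }
}
```

"the new york theatre on broadway" matches with maxEnd = 5 and maxGap = 10; "a b c d new york x broadway" does not, because the phrase ends at position 6.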

  14. org.apache.lucene.queryParser.QueryParser: constructor QueryParser(String defaultField, Analyzer); parse(String query) --> Query; setDefaultOperator(QueryParser.Operator: AND_OPERATOR, OR_OPERATOR), setPhraseSlop(int), setFuzzyMinSim(float), … Analyzer implementations: StandardAnalyzer, RussianAnalyzer, BrazilianAnalyzer, DutchAnalyzer, …; stop words can be passed as a File, a String[] or a HashSet.
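The analyzer and the default operator both act before any searching happens. The stdlib sketch below imitates a minimal analyzer (split, lowercase, drop stop words) and then applies an AND or OR default operator to the remaining terms; DefaultOperatorSketch and its split rule are hypothetical, not the behaviour of any particular Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Stdlib sketch: minimal "analyzer" plus default operator. Not Lucene classes.
final class DefaultOperatorSketch {
    enum Operator { AND, OR }

    // Split on anything that is not a letter or digit, lowercase, drop stop words.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String t = raw.toLowerCase(Locale.ROOT);
            if (!t.isEmpty() && !stopwords.contains(t)) out.add(t);
        }
        return out;
    }

    // AND makes every plain term required ("+term"); OR leaves them optional.
    static String rewrite(String text, Set<String> stopwords, Operator op) {
        StringBuilder sb = new StringBuilder();
        for (String t : analyze(text, stopwords)) {
            if (sb.length() > 0) sb.append(' ');
            if (op == Operator.AND) sb.append('+');
            sb.append(t);
        }
        return sb.toString();
    }
}
```

With stop words "the" and "of", the input "The Art of Computer Programming" becomes +art +computer +programming under AND.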

  15. org.apache.lucene.search.Hits: Searcher.search --> Hits. Hits: doc(int n) --> Document, score(int n) --> float, iterator --> HitIterator, length. HitIterator: next --> Hit, hasNext --> boolean, length --> int. Hit: getDocument, getScore. Document: get(String fieldName) --> String value, getFields --> List of Fields, … Field: name, value, … Note: use HitCollector (low-level API) for large numbers of hits.
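A HitCollector receives every matching document once through collect(int doc, float score), so keeping only the best n hits is up to the caller. The stdlib sketch below does that with a min-heap; TopNCollectorSketch is hypothetical and does not extend the Lucene class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Stdlib sketch of a top-N collector; not Lucene's HitCollector.
final class TopNCollectorSketch {
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    private final PriorityQueue<ScoredDoc> heap;   // lowest score at the head
    private int totalHits;

    TopNCollectorSketch(int n) {
        this.n = n;
        this.heap = new PriorityQueue<ScoredDoc>(Math.max(1, n),
                (a, b) -> Float.compare(a.score, b.score));
    }

    // Same shape as HitCollector.collect(int doc, float score).
    void collect(int doc, float score) {
        totalHits++;
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (n > 0 && score > heap.peek().score) {
            heap.poll();                           // evict the current worst hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    int totalHits() { return totalHits; }

    // The kept hits, best first.
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

Memory stays proportional to n rather than to the number of matches, which is the point of collecting at this level.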

  16. org.apache.lucene.search.highlight.Highlighter (*): Formatter, Scorer (fragmentScorer); setTextFragmenter(Fragmenter), getBestFragments(Analyzer, String fieldName, String text, int maxNumFragments) --> String[] bestFragments, … Scorer: QueryScorer (*) with Query [, IndexReader, String fieldName]. Fragmenter: SimpleFragmenter (*) with int fragmentSize. Formatter: SimpleHTMLFormatter (*) with String preTag, String postTag; GradientFormatter and SpanGradientFormatter (*) with float maxScore, String minForegroundColor, String maxForegroundColor, String minBackgroundColor, String maxBackgroundColor.
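The Highlighter pipeline is: cut the text into fragments, score each fragment, format the query terms, return the best fragments. HighlightSketch below imitates those steps with the standard library; the fragment rule (about fragmentSize characters, breaking at spaces), the score (number of query-term occurrences) and literal word matching without punctuation stripping are all simplifying assumptions, not Lucene's exact behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Stdlib sketch of fragment -> score -> format; not the Lucene Highlighter.
final class HighlightSketch {
    // Cut text into fragments of roughly fragmentSize characters (cf. SimpleFragmenter).
    static List<List<String>> fragments(String text, int fragmentSize) {
        List<List<String>> frags = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int length = 0;
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue;
            if (!current.isEmpty() && length + word.length() > fragmentSize) {
                frags.add(current);
                current = new ArrayList<>();
                length = 0;
            }
            current.add(word);
            length += word.length() + 1;           // word plus a separating space
        }
        if (!current.isEmpty()) frags.add(current);
        return frags;
    }

    // Wrap query terms in preTag/postTag (cf. SimpleHTMLFormatter) and return up to
    // maxNumFragments fragments, highest score first; score = number of term hits.
    static List<String> bestFragments(String text, Set<String> queryTerms, int fragmentSize,
                                      int maxNumFragments, String preTag, String postTag) {
        List<String> formatted = new ArrayList<>();
        List<Integer> scores = new ArrayList<>();
        for (List<String> frag : fragments(text, fragmentSize)) {
            StringBuilder sb = new StringBuilder();
            int score = 0;
            for (String word : frag) {
                if (sb.length() > 0) sb.append(' ');
                if (queryTerms.contains(word.toLowerCase(Locale.ROOT))) {
                    sb.append(preTag).append(word).append(postTag);
                    score++;
                } else {
                    sb.append(word);
                }
            }
            if (score > 0) {
                formatted.add(sb.toString());
                scores.add(score);
            }
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < formatted.size(); i++) order.add(i);
        // Stable sort: fragments with equal scores keep their text order.
        order.sort((x, y) -> Integer.compare(scores.get(y), scores.get(x)));
        List<String> best = new ArrayList<>();
        for (int k = 0; k < order.size() && k < maxNumFragments; k++) {
            best.add(formatted.get(order.get(k)));
        }
        return best;
    }
}
```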

  17. org.apache.lucene.search.spell.SpellChecker (uses an N-gram index): constructor SpellChecker(Directory spellIndex); indexDictionary(Dictionary), suggestSimilar(String word, int numSug [, IndexReader, String field, boolean morePopular]) --> String[] words, setAccuracy(float minScore), … Dictionary implementations: PlainTextDictionary (*) from a File, InputStream or Reader; LuceneDictionary (*) from an IndexReader and a String field.
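An n-gram spell checker finds candidates that share letter n-grams with the misspelled word, then ranks them by string distance. SpellSketch below shows that idea in memory with bigrams and Levenshtein distance; the gram size, the similarity measure 1 - distance / max(length) and the cut-off are illustrative assumptions and do not reproduce Lucene's SpellChecker scoring or its separate spell index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stdlib sketch of an n-gram spell checker; not the Lucene SpellChecker.
final class SpellSketch {
    private final Map<String, Set<String>> gramIndex = new HashMap<>();

    static Set<String> bigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= word.length(); i++) grams.add(word.substring(i, i + 2));
        return grams;
    }

    // Cf. indexDictionary: each dictionary word is indexed under its bigrams.
    void indexWords(List<String> words) {
        for (String w : words) {
            for (String g : bigrams(w)) {
                gramIndex.computeIfAbsent(g, k -> new HashSet<>()).add(w);
            }
        }
    }

    // Classic two-row Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Cf. suggestSimilar(word, numSug) with setAccuracy(minScore).
    List<String> suggestSimilar(String word, int numSug, float minScore) {
        Set<String> candidates = new HashSet<>();
        for (String g : bigrams(word)) {
            Set<String> hits = gramIndex.get(g);
            if (hits != null) candidates.addAll(hits);
        }
        candidates.remove(word);                   // do not suggest the word itself
        Map<String, Float> sims = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            float sim = 1f - levenshtein(word, c) / (float) Math.max(word.length(), c.length());
            if (sim >= minScore) { result.add(c); sims.put(c, sim); }
        }
        result.sort((x, y) -> {
            int cmp = Float.compare(sims.get(y), sims.get(x));
            return cmp != 0 ? cmp : x.compareTo(y);
        });
        return result.size() > numSug ? result.subList(0, numSug) : result;
    }
}
```

For the dictionary lucene, license, lichen, line, solr and the input "lucine", only lucene and line share bigrams with it; they score about 0.83 and 0.67.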
