Lucene/Solr Architecture

colm
Presentation Transcript


  1. Lucene/Solr Architecture
     • Request Handlers: /admin, /select, /spell
     • Response Writers: XML, Binary, JSON
     • Update Handlers: XML, CSV, binary
     • Extracting Request Handler (PDF/WORD), built on Apache Tika
     • Search Components: Query, Highlighting, Spelling, Statistics, Faceting, Debug, More like this, Clustering
     • Update Processors: Signature, Logging, Indexing
     • Schema, Config
     • Query Parsing, Distributed Search, Data Import Handler (SQL/RSS), Analysis
     • Solr core services: Faceting, Filtering, Search, Caching, Highlighting, Index Replication
     • Apache Lucene Core: Search (IndexReader/Searcher), Indexing (IndexWriter), Text Analysis

  2. Lucene/Solr plugins
     • RequestHandlers – handle a request at a URL like /select
     • SearchComponents – part of a SearchHandler, a componentized request handler
       - Include Query, Facet, Highlight, Debug, Stats
       - Distributed Search capable
     • UpdateHandlers – handle an indexing request
     • Update Processor Chains – per-handler componentized chains that handle updates
     • Query Parser plugins
       - Mix and match query types in a single request
       - Function plugins for Function Query
     • Text Analysis plugins: Analyzers, Tokenizers, TokenFilters
     • ResponseWriters – serialize and stream the response to the client
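
The idea of a SearchHandler as a "componentized request handler" can be sketched as a handler that runs a list of components in order, each adding its own section to a shared response. This is a toy model in Python, not Solr's Java plugin API: the class names, the in-memory document list, and the `facet` parameter handling are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy model only: illustrative names, not Solr's actual Java interfaces.


@dataclass
class Response:
    # Each component writes its results under its own key.
    sections: Dict[str, object] = field(default_factory=dict)


Component = Callable[[Dict[str, str], Response], None]

# Hypothetical in-memory "index" used by the toy query component.
DOCS: List[str] = ["cheese shop", "blue cheese", "bread"]


def query_component(params: Dict[str, str], rsp: Response) -> None:
    # Keep the documents whose text contains the query string.
    q = params.get("q", "")
    rsp.sections["docs"] = [d for d in DOCS if q in d]


def facet_component(params: Dict[str, str], rsp: Response) -> None:
    # Count words across the matched documents, only if facet=true was sent.
    if params.get("facet") != "true":
        return
    counts: Dict[str, int] = {}
    for doc in rsp.sections.get("docs", []):
        for word in doc.split():
            counts[word] = counts.get(word, 0) + 1
    rsp.sections["facets"] = counts


class SearchHandler:
    """A request handler built from an ordered list of components."""

    def __init__(self, components: List[Component]) -> None:
        self.components = list(components)

    def handle(self, params: Dict[str, str]) -> Response:
        rsp = Response()
        for component in self.components:
            component(params, rsp)
        return rsp


# Handlers are registered under a URL path such as /select.
handlers = {"/select": SearchHandler([query_component, facet_component])}
rsp = handlers["/select"].handle({"q": "cheese", "facet": "true"})
print(rsp.sections)
```

Because each component only reads the request parameters and writes to the shared response, adding a further component (highlighting, debug, a custom one) means appending to the list rather than changing the handler.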

  3. Lucene/Solr Query Plugin Architecture
     • Declarative Analysis per-field
       - Tokenizer to split text
       - TokenFilter to transform tokens
       - Analyzer for completely custom analysis
       - Separate query / index analyzer
     • QParser plugins
       - Support different query syntaxes
       - Support different query execution
       - Function Query supports pluggable custom functions
       - Excellent support for nesting/mixing different query types in the same request
     • schema.xml
         // declaratively defines types
         // and analyzers for fields
         <fieldType name="text1">
           <filter="whitespace">
           <filter="customFilter" …>
           <filter="synonyms" file=..>
           <filter="porter" except=..>
         <field name="title" type="text1"
         <field name="cust1" class=…
     • Analyzer for "title": Whitespace Tokenizer → CustomFilter → SynonymFilter → Porter Stemmer
     • Analyzer for "cust1": potentially completely custom architecture not using tokenizer/filters
     • solrconfig.xml
         < index configuration />
         < caching configuration />
         < request handler config />
         < search component config />
         < update processor config />
         < misc – HTTP cache, JMX >
         <parser name="mycustom" …
         <func name="custom" class=…
     • QParsers: MyCustom QParser, Lucene QParser, Function Range Q, XML QParser, DisMax QParser, Function QParser
     • Functions: sum, max, pow, log, sqrt, custom
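
The per-field analysis chain for "title" (tokenizer, then a sequence of token filters) can be illustrated with a small pipeline. This is a sketch in Python rather than Lucene's Java analysis classes. The lowercasing filter stands in for the slide's unspecified "customFilter", the synonym table is invented, and the stemming step is a deliberately crude stand-in that is not the Porter algorithm.

```python
from typing import Callable, Dict, Iterable, List

TokenFilter = Callable[[Iterable[str]], Iterable[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    # Split the raw field text on runs of whitespace.
    return text.split()


def lowercase_filter(tokens: Iterable[str]) -> Iterable[str]:
    # Assumption: stands in for the slide's unspecified "customFilter".
    return (t.lower() for t in tokens)


def make_synonym_filter(synonyms: Dict[str, str]) -> TokenFilter:
    # Replace each token by its mapped synonym, if one exists.
    def synonym_filter(tokens: Iterable[str]) -> Iterable[str]:
        return (synonyms.get(t, t) for t in tokens)

    return synonym_filter


def crude_stem_filter(tokens: Iterable[str]) -> Iterable[str]:
    # NOT the Porter stemmer: only strips a trailing "s" from longer tokens.
    return (t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)


def make_analyzer(
    tokenizer: Callable[[str], List[str]], filters: List[TokenFilter]
) -> Callable[[str], List[str]]:
    # An analyzer is one tokenizer followed by filters applied in order.
    def analyze(text: str) -> List[str]:
        tokens: Iterable[str] = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return list(tokens)

    return analyze


# One analyzer per field, mirroring the declarative per-field setup.
analyzers = {
    "title": make_analyzer(
        whitespace_tokenizer,
        [
            lowercase_filter,
            make_synonym_filter({"tv": "television"}),  # hypothetical synonyms
            crude_stem_filter,
        ],
    )
}

tokens = analyzers["title"]("Cheap TV Stands")
print(tokens)
```

Filter order matters in such a chain: here the synonym lookup only matches because lowercasing runs before it.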

  4. Lucene/Solr Request Plugins
     • Query: http://.../select?q=cheese&wt=json
     • Response: {"response"={ "docs"={
     • Request handlers by path
       - /select: RequestHandler built from search components (Query Component, Facet Component, Highlight Component, Debug Component), Distributed Search capable
       - /admin/luke: Request Handler (non-component based)
       - /mypath: Request Handler (custom)
     • Response writers: XML response writer, XSLT response writer, Binary response writer, JSON response writer, Custom response writer
     • Additional plug-n-play search components: TermVector, QueryElevation, Spellcheck, Terms, MoreLikeThis, Statistics, Clustering, My Custom
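
A request such as `/select?q=cheese&wt=json` can be assembled and its JSON response read with a few lines of client code. The sketch below uses only the Python standard library and sends nothing over the network. The base URL is a placeholder (the slide elides the host), the helper names are hypothetical, and the sample response body is hand-written to follow the usual `response` / `docs` layout rather than captured from a real server.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def build_select_url(base_url: str, query: str, writer: str = "json", **extra: str) -> str:
    # q selects the documents, wt names the response writer.
    params = {"q": query, "wt": writer, **extra}
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def extract_docs(body: str) -> List[Dict[str, object]]:
    # Pull the matched documents out of a JSON response body.
    return json.loads(body)["response"]["docs"]


# Placeholder host: the slide's URL is elided, so this is not a real endpoint.
url = build_select_url("http://solr.example.invalid/solr", "cheese")

# Hand-written sample body, for illustration only.
sample_body = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "1", "title": "cheese"}]}}'
)
docs = extract_docs(sample_body)

print(url)
print(docs)
```

Switching the `writer` argument to another registered response writer changes only the `wt` parameter. The query itself stays the same, which is the point of keeping response writers pluggable.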

  5. Lucene/Solr Indexing
     • Push via HTTP POST: PDF, XML (<doc> <title>)
     • Update endpoints
       - /update: XML Update Handler
       - /update/csv: CSV Update Handler
       - /update/xml: XML Update with custom processor chain
       - /update/extract: Extracting RequestHandler (PDF, Word, …)
     • Data Import Handler: Database pull (SQL DB pull), RSS pull (RSS feed pull), Simple transforms
     • Update Processor Chain (per handler): Remove Duplicates processor, Custom Transform processor, Logging processor, Index processor
     • Text Index Analyzers
     • Lucene Index
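
The per-handler update processor chain can be pictured as a list of processors that each document passes through before it reaches the index, where any processor may drop the document. The following is a toy Python model, not Solr's Java update processor API. The class names are invented, the duplicate check uses a SHA-256 hash of selected fields as an assumed signature scheme, and the "index" is a plain list.

```python
import hashlib
from typing import Dict, List, Optional, Set

Doc = Dict[str, str]


class RemoveDuplicatesProcessor:
    """Drops documents whose signature over the chosen fields was already seen."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = fields
        self.seen: Set[str] = set()

    def process(self, doc: Doc) -> Optional[Doc]:
        # Assumed signature scheme: SHA-256 over the joined field values.
        text = "\x1f".join(doc.get(f, "") for f in self.fields)
        signature = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if signature in self.seen:
            return None  # duplicate: stop processing this document
        self.seen.add(signature)
        return {**doc, "signature": signature}


class LoggingProcessor:
    """Records one log line per document that reaches it."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def process(self, doc: Doc) -> Optional[Doc]:
        self.lines.append(f"add id={doc.get('id')}")
        return doc


class IndexProcessor:
    """Last step of the chain: stores the document in a list-based index."""

    def __init__(self) -> None:
        self.index: List[Doc] = []

    def process(self, doc: Doc) -> Optional[Doc]:
        self.index.append(doc)
        return doc


def run_chain(chain: list, docs: List[Doc]) -> None:
    # Pass each document through the processors until one drops it.
    for doc in docs:
        current: Optional[Doc] = doc
        for processor in chain:
            current = processor.process(current)
            if current is None:
                break


dedup = RemoveDuplicatesProcessor(fields=["title"])
logger = LoggingProcessor()
indexer = IndexProcessor()

run_chain(
    [dedup, logger, indexer],
    [
        {"id": "1", "title": "Lucene in brief"},
        {"id": "2", "title": "Lucene in brief"},  # same title: dropped
        {"id": "3", "title": "Solr plugins"},
    ],
)
print([d["id"] for d in indexer.index])
print(logger.lines)
```

Placing the duplicate check first keeps dropped documents out of both the log and the index. A different ordering would be a design choice for the handler that owns the chain.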
