
Solr Integration and Enhancements




Presentation Transcript


  1. Solr Integration and Enhancements
     Solr has a lot of extensive features
     Todd Hatcher

  2. What is Solr?
     • Solr offers advanced, optimized, scalable search capabilities
     • Clients communicate with Solr over HTTP, exchanging XML or JSON
     • Includes an HTML admin interface
     • Solr is built on top of Lucene
     • The rich features of Lucene can be leveraged when using Solr
     • Solr is highly configurable

  3. Integration with ColdFusion
     • Very little direct integration with ColdFusion
     • ColdFusion communicates with Solr over HTTP
     • Solr runs in its own JVM; it does not share ColdFusion's
     • In the ColdFusion installation, Solr runs in a Jetty servlet container on port 8983 (http://localhost:8983/solr)
     • Solr is exposed in production by default
     • Important files are located in C:\ColdFusion9\solr\multicore
     • Solr offers much more than what is available through <cfindex>, <cfcollection> and <cfsearch>
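
Because ColdFusion reaches Solr over plain HTTP, any HTTP client can talk to the same instance. The following is a minimal Python sketch, assuming the default address from the slide and a hypothetical core named `mycore`; `core_url` and `fetch` are illustrative helper names, not part of ColdFusion or Solr.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Default address of the Solr instance bundled with ColdFusion 9 (from the slide).
SOLR_BASE = "http://localhost:8983/solr"


def core_url(core: str, handler: str = "select", base: str = SOLR_BASE) -> str:
    """Build the URL of a request handler on one core, e.g. .../mycore/select."""
    return f"{base.rstrip('/')}/{quote(core)}/{handler.lstrip('/')}"


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Issue a plain HTTP GET and return the raw response body.

    ColdFusion likewise talks to Solr over HTTP; this needs a running Solr.
    """
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `fetch(core_url("mycore") + "?q=id%3A1")` would return the raw response for the query `id:1`, provided Solr is running and the core exists. Since the same URL answers any client that can reach port 8983, this also shows why an exposed Solr port matters in production.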

  4. Solr
     • What is a core? It is like a Verity collection (a searchable group of data)
     • Single core (one index) vs. multicore (multiple isolated configurations, schemas and indexes in the same Solr instance)
     • C:\ColdFusion9\solr\multicore\solr.xml is the central file that points to the locations of the Solr cores' configuration and data (this is the file the ColdFusion Administrator reads and writes when creating and using Solr collections)
     • You can put your Solr cores under your project directory and keep them in source control
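
Since solr.xml is the index of all cores, a script can read it to find where each core's configuration and data live. A minimal Python sketch, assuming the legacy multicore layout of that Solr generation (`<solr><cores><core name=... instanceDir=... dataDir=.../></cores></solr>`); the core names and paths in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative solr.xml in the legacy multicore layout.
# Core names and paths are hypothetical.
SAMPLE_SOLR_XML = """
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="products" instanceDir="products" dataDir="C:/projects/shop/solr/products/data"/>
    <core name="articles" instanceDir="articles"/>
  </cores>
</solr>
"""


def list_cores(solr_xml: str) -> dict:
    """Map each core name to (instanceDir, dataDir) as declared in solr.xml.

    dataDir is None when the attribute is absent; Solr then falls back
    to its default data location.
    """
    root = ET.fromstring(solr_xml.strip())
    cores = {}
    for core in root.iter("core"):
        cores[core.get("name")] = (core.get("instanceDir"), core.get("dataDir"))
    return cores
```

Keeping cores under a project directory, as the slide suggests, means the `instanceDir` and `dataDir` entries point into that directory, which a check like this can confirm.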

  5. [core]/conf/solrconfig.xml
     • Main configuration file for a Solr core
     • <queryResponseWriter name="json" /> determines the format of the results; ColdFusion uses XSLT by default
     • You can return JSON, XML, Python, Ruby or PHP
     • Multiple query response writers can be configured; one can be set as the default, and others can be selected per request with the parameter wt=[name] (e.g. wt=json)
     • cfsearch and similar tags will not work if the default response writer is not what ColdFusion expects
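
Selecting a writer per request leaves the default (and therefore cfsearch) untouched. A minimal Python sketch, assuming a JSON response writer is registered under the name `json`; the sample body mirrors the usual `responseHeader` / `response` shape of that writer, but its values are made up.

```python
import json
from urllib.parse import urlencode


def select_params(query: str, writer: str = "json") -> str:
    """Encode a query, choosing the response writer per request with wt."""
    return urlencode({"q": query, "wt": writer})


def parse_json_response(body: str) -> tuple:
    """Return (numFound, docs) from a body produced by the JSON writer."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


# Shape of a typical wt=json body (values invented for illustration).
SAMPLE_BODY = (
    '{"responseHeader":{"status":0,"QTime":1},'
    '"response":{"numFound":2,"start":0,"docs":[{"id":"1"},{"id":"2"}]}}'
)
```

Appending `select_params(...)` to a core's select URL returns JSON for that one request, while the configured default writer stays whatever ColdFusion expects.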

  6. [core]/conf/schema.xml
     • Field types map custom type names to Solr/Lucene classes
     • The solr.TextField type allows analyzers
     • Analyzers can run at index time, query time, or both
     • They manipulate the data, typically by filtering tokens
     • Filters are applied in the order in which they are declared
     • StopFilterFactory removes common words that do not help search results
     • WordDelimiterFilterFactory splits a word into subwords, so WiFi can also be indexed as Wi and Fi
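
To see why declaration order matters, it helps to model an analyzer as a tokenizer followed by a list of filters applied one after another. The Python below is a toy sketch of that idea only: the three filters imitate the behavior described on the slide (word delimiting that keeps the original token, as with `preserveOriginal`, lowercasing, stop-word removal) and are not Solr's implementations; the stop list is hypothetical.

```python
import re

# Toy stand-ins for analysis steps; they mimic the idea of Solr's filters,
# not their exact behavior. The stop list is hypothetical.
STOP_WORDS = {"a", "an", "the", "of"}


def whitespace_tokenizer(text):
    return text.split()


def word_delimiter(tokens):
    """Keep each token and add its case-change subwords (WiFi -> WiFi, Wi, Fi)."""
    out = []
    for tok in tokens:
        out.append(tok)
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tok)
        if len(parts) > 1:
            out.extend(parts)
    return out


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    # Compares case-sensitively, so it only catches lowercase stop words.
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, filters):
    """Run the tokenizer, then each filter in the order it is declared."""
    tokens = whitespace_tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens
```

With `[word_delimiter, lowercase, stop_filter]`, "The WiFi router" becomes `["wifi", "wi", "fi", "router"]`; putting `stop_filter` before `lowercase` lets "The" slip through, because it is still capitalized when the stop filter sees it.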

  7. [core]/conf/schema.xml (cont.)
     • EnglishPorterFilterFactory stems word variations (e.g. removing suffixes such as -ing) and adds the root word to the index
     • SynonymFilterFactory treats listed words as equivalent
     • DoubleMetaphoneFilterFactory provides phonetic matching (better than the Soundex that Verity uses)
     • TextSpell/TextSpellPhrase provide "did you mean" feedback
     • <copyField source="fieldName" dest="d"/> copies a source field into a destination field whose field type can run different analyzers and store the result
     • wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
     • Adobe adds quite a bit to the file, creating field types compatible with what was in Verity
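
The copyField idea, where one source value is indexed twice through different analyzers, can be modeled in a few lines. This Python is a toy sketch, not Solr code: the field names `title` and `title_syn`, the synonym list, and the helper functions are all hypothetical, and synonym handling imitates an expanding SynonymFilterFactory in spirit only.

```python
# Toy model of copyField plus synonym expansion.
# Field names, synonyms and helpers are hypothetical.
SYNONYMS = {"tv": ["television"], "television": ["tv"]}


def lowercase_chain(text):
    return [t.lower() for t in text.split()]


def synonym_chain(text):
    """Lowercase, then add each token's synonyms alongside it."""
    out = []
    for tok in lowercase_chain(text):
        out.append(tok)
        out.extend(SYNONYMS.get(tok, []))
    return out


def index_document(doc, copy_fields, chains):
    """Apply copyField rules, then analyze every field with its own chain.

    copy_fields: (source, dest) pairs, like <copyField source=... dest=...>.
    chains: field name -> analysis function (stands in for the field type).
    """
    fields = dict(doc)
    for source, dest in copy_fields:
        if source in fields:
            fields[dest] = fields[source]
    return {name: set(chains[name](value)) for name, value in fields.items()}


def matches(index, field, term):
    return term.lower() in index[field]
```

Indexing `{"title": "Sony TV"}` with the rule `("title", "title_syn")` leaves `title` matching only "sony" and "tv", while `title_syn` also matches "television": the same data, analyzed two ways.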

  8. [core]/conf/schema.xml (cont.)
     • Similar to creating a database table: maps field names to types using <field />
     • Gives you the ability to store additional data
     • A field can be indexed (searchable)
     • A field can be stored (referenced and returned with results)
     • A field can be required
     • <uniqueKey>[field name]</uniqueKey>
     • <solrQueryParser defaultOperator="OR" />
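
Because the schema reads like a table definition, it can be generated from a column-style list. A minimal Python sketch that emits `<field>`, `<uniqueKey>` and `<solrQueryParser>` elements; the field list is hypothetical, and `string` / `text` are assumed to be field types already declared in the schema's `<types>` section.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list; "string" and "text" are assumed to be
# field types already declared in the <types> section of schema.xml.
FIELDS = [
    {"name": "id", "type": "string", "indexed": True, "stored": True, "required": True},
    {"name": "title", "type": "text", "indexed": True, "stored": True},
    {"name": "body", "type": "text", "indexed": True, "stored": False},
]


def schema_fragment(fields, unique_key):
    """Build <fields>, <uniqueKey> and <solrQueryParser> elements as XML text."""
    if unique_key not in {f["name"] for f in fields}:
        raise ValueError(f"uniqueKey {unique_key!r} is not a declared field")
    root = ET.Element("schema", name="example")
    fields_el = ET.SubElement(root, "fields")
    for f in fields:
        ET.SubElement(fields_el, "field", {
            "name": f["name"],
            "type": f["type"],
            # Attributes are written out explicitly, defaulting to "false".
            "indexed": str(f.get("indexed", False)).lower(),
            "stored": str(f.get("stored", False)).lower(),
            "required": str(f.get("required", False)).lower(),
        })
    ET.SubElement(root, "uniqueKey").text = unique_key
    ET.SubElement(root, "solrQueryParser", defaultOperator="OR")
    return ET.tostring(root, encoding="unicode")
```

Here `body` is indexed but not stored: it is searchable, but it cannot be returned with results.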

  9. Indexing
     • Data is sent through the API: an HTTP POST to Solr as XML, JSON or binary
     • Commit is an intensive task; do bulk adds first, then call commit
     • <cfindex /> appears to call commit after each index operation (unconfirmed)
     • Committing after each add noticeably increases indexing time
     • Efficient process: add data (queued), commit, then optimize
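
The "add, commit, optimize" process maps onto Solr's XML update messages (`<add><doc><field name=...>`, `<commit/>`, `<optimize/>`) posted to a core's update handler. A minimal Python sketch, assuming an update URL such as `http://localhost:8983/solr/mycore/update` (the core name and the batch size of 500 are hypothetical); it posts many documents per request and commits only once at the end.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen


def add_message(docs):
    """Build one <add> message holding many documents (a bulk add)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(doc_el, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def batches(docs, size):
    """Split the queued documents into fixed-size chunks."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def post_xml(update_url, body):
    """POST an XML update message to a core's update handler."""
    req = Request(update_url, data=body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.status


def index_all(update_url, docs, batch_size=500):
    """Bulk add everything, then commit once and optimize once."""
    for chunk in batches(docs, batch_size):
        post_xml(update_url, add_message(chunk))
    post_xml(update_url, "<commit/>")
    post_xml(update_url, "<optimize/>")
```

Compared with committing after every document, this sends one commit for the whole run, which is the saving the slide describes.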

  10. Search Syntax
      • field:term (*:* returns everything)
      • A score is generated at query time; the value itself has no absolute meaning, and scores are only meaningful relative to each other (a scale)
      • fq filters the results by a supplied condition
      • wt is the response format of the results (xml, json, etc.)
      • qt is the request handler used to process the request (default is "standard")
      • fl is the list of fields to return (each field must be stored)
      • q is the query string
      • You can specify the starting offset (start) and the maximum number of rows returned (rows)
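
The parameters above are ordinary URL query parameters, so building a request is mostly a matter of encoding them. A minimal Python sketch; `build_query` is a hypothetical helper, and it lets `fq` be repeated because each entry becomes its own filter query.

```python
from urllib.parse import urlencode


def build_query(q, fq=None, fl=None, wt="json", qt=None, start=0, rows=10):
    """Encode the common select parameters described on the slide.

    fq may be a list: each entry becomes its own fq parameter.
    fl lists fields to return; they must be stored in the schema.
    """
    params = [("q", q), ("wt", wt), ("start", start), ("rows", rows)]
    for f in fq or []:
        params.append(("fq", f))
    if fl:
        params.append(("fl", ",".join(fl)))
    if qt:
        params.append(("qt", qt))
    return urlencode(params)
```

For instance, `build_query("title:solr", fq=["type:pdf"], fl=["id", "score"], start=10, rows=5)` encodes the second page of five results, restricted to PDF documents.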

  11. DisMaxRequestHandler
      • Declared in solrconfig.xml
      • Allows simplified searching without strict query syntax
      • Can be configured with default weighted parameters (which can be overridden per request)
      • Causes the q parameter to be parsed differently
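
With DisMax, the user's text goes into `q` unchanged, and field weights travel in `qf` using the `field^boost` syntax; `mm` (minimum should match) is another common DisMax parameter. A minimal Python sketch, assuming the handler is declared in solrconfig.xml under the name `dismax` and selected with `qt`; the field names and weights are hypothetical.

```python
def dismax_query(user_input, field_weights, min_match=None, handler="dismax"):
    """Build request parameters for a DisMax search.

    qt selects the handler declared in solrconfig.xml (assumed to be named
    "dismax"); qf overrides its default weighted fields, e.g.
    {"title": 2, "body": 1.5} -> "title^2 body^1.5".
    """
    qf = " ".join(f"{field}^{weight:g}" for field, weight in field_weights.items())
    params = {"qt": handler, "q": user_input, "qf": qf}
    if min_match is not None:
        params["mm"] = min_match
    return params
```

The raw input "wifi router" needs no `field:term` syntax here; the handler spreads it across the weighted fields, and any default `qf` in solrconfig.xml is overridden only for this request.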

  12. Resources
      • Lucene in Action
      • http://wiki.apache.org/solr/
      • http://cfadminsearcher.riaforge.org/
      • http://cfsolrlib.riaforge.org/
      • CF Solr Lib, written by Shannon Hicks: a wrapper for Solr functionality
