Using the Lucene Search Engine



Lucene

  • Full text search
  • Cross platform
  • Lucene Document
  • Inverted index
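
To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is not Lucene's API or its real on-disk structure; the class name `InvertedIndexSketch`, the method names, and the crude lowercase tokenizer are assumptions made for illustration. The point is only that the index maps each term to the documents that contain it, so a lookup never has to scan the documents themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, not Lucene): term -> ids of documents containing it. */
final class InvertedIndexSketch {
    // For each term, the sorted set of document ids in which it occurs.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Splits the text into lowercase words and records docId under each word. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents containing the term, in ascending order. */
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene is a full text search library");
        index.add(2, "Luke is a toolbox for a Lucene index");
        System.out.println(index.search("lucene"));
        System.out.println(index.search("toolbox"));
    }
}
```

All the tokenizing happens in `add`, so `search` is a single map lookup, which is the same trade the slides recommend later: do the work while indexing, not while searching.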

Text Extraction
  • Lucene is not a complete application: it indexes only the text you give it.
  • PDF files need text extraction before indexing.
  • Microsoft files need text extraction before indexing.
Luke - Lucene Index Toolbox
  • Client application that opens your index directly.
  • Java Web Start app
    • http://www.getopt.org/luke/
  • Handy for testing searches and performance.
Some problems encountered
  • Max clause count exception:
    • Take care when automatically adding wildcards!
  • Performance:
    • Do the work while indexing, not while searching.
    • Pagination: Get one page at a time from the Hits.
  • Our security model:
    • Stored collection of allowed containers in UserSession.
  • Visibility of indexing job:
    • Added logging: "Indexing document 426 of 204,532".
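
Why wildcards trigger the max clause count exception: Lucene versions of this era rewrite a wildcard query into a boolean query with one clause per matching indexed term, and fail once the count passes the configured limit (1024 by default). The sketch below simulates that expansion in plain Java against an in-memory term set. It does not call Lucene; `WildcardExpansionSketch`, `expandPrefix`, and the exception type used are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Simulates (hypothetically, without Lucene) how "prefix*" expands to one clause per term. */
final class WildcardExpansionSketch {

    /** Expands prefix* against the sorted term dictionary; fails if too many terms match. */
    static List<String> expandPrefix(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; all matching terms follow contiguously.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException(
                        "'" + prefix + "*' expands to more than " + maxClauses + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("index", "indexing", "inverted", "lucene", "luke"));
        System.out.println(expandPrefix(terms, "in", 10));
        try {
            expandPrefix(terms, "in", 2);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A short prefix typed by a user, with `*` appended automatically, can match thousands of terms in a large index, which is why the slide warns against adding wildcards blindly. Requiring a minimum prefix length before appending the wildcard is one way to keep the expansion small.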
Resources (general)
  • http://lucene.apache.org/
  • http://www.ibm.com/developerworks/web/library/wa-lucene2/
  • http://www.ibm.com/developerworks/library/wa-lucene/
  • An open source document management system in PHP with a Java Lucene search engine.
  • Handy Ajax autocomplete component.

Resources (text extraction)
  • http://pdfbox.org
    • Text extractor for PDF files.
  • JXL http://jexcelapi.sourceforge.net/
    • Text extractor for Excel files.
  • Text extractor for Word documents.
  • API to access Microsoft format files (xls/doc/ppt). I would recommend this one over JXL or text-mining above.

Summary

  • Lucene querying is fast (take care what you do with the results).
  • Indexing is slow (make the indexing job visible).
  • Use Luke.
  • Add lots to the index (do the work while indexing).
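
Two of these points, handling results one page at a time and making the indexing job visible, can be sketched in a few lines of plain Java. This is an illustration under assumptions, not the presenters' code: a `List` stands in for Lucene's hits, and `ResultPagingSketch`, `page`, and `progress` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers: fetch one page of results, and format an indexing progress line. */
final class ResultPagingSketch {

    /** Returns the zero-based page of hits, or an empty list when the page is past the end. */
    static <T> List<T> page(List<T> hits, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long arithmetic avoids int overflow
        if (from >= hits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, hits.size());
        return hits.subList((int) from, to);
    }

    /** Formats a progress line with thousands separators. */
    static String progress(int current, int total) {
        return String.format(Locale.US, "Indexing document %,d of %,d", current, total);
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(hits, 1, 3));
        System.out.println(progress(426, 204532));
    }
}
```

With a real search result the same idea applies: load stored documents only for the hits on the requested page rather than walking the whole result set.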
