Lucene-Demo

Brian Nisonger

Intro
  • No details about implementation/theory
    • See the Treehouse Wiki page on Lucene for additional info
  • Set of Java classes
  • Not an end-to-end solution
  • Designed to allow rapid development of information retrieval (IR) tools
Index
  • The first step is to take a set of text documents and build an Index
    • Demo: IndexFiles on Pongo
    • Two major classes
      • Analyzer
        • Used to tokenize data
        • More on this later
      • IndexWriter
        • IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);
Index Writer
  • IndexWriter creates an index of documents
    • First argument is the directory in which to build/find the index
    • Second argument supplies the Analyzer used to tokenize the documents
    • Third argument determines whether a new index is created (true) or an existing one is appended to (false)
Analyzer
  • Standard Analyzer
    • Lowercasing w/ Stop Words (no stemming; Porter stemming is a separate filter, PorterStemFilter)
  • Krovetz Stemmer: Example
    • package org.apache.lucene.analysis;
    • import org.apache.lucene.analysis.Analyzer;
    • import org.apache.lucene.analysis.standard.*;
    • import org.apache.lucene.analysis.TokenStream;
    • import org.apache.lucene.analysis.StopFilter;
    • import org.apache.lucene.analysis.LowerCaseTokenizer;
    • import org.apache.lucene.analysis.KStemFilter;
    • import java.io.Reader;
    • public class KStemAnalyzer extends Analyzer
    • {
    •     public final TokenStream tokenStream(String fieldName, Reader reader)
    •     {
    •         // Lowercase and tokenize the input, then apply the Krovetz stemmer
    •         return new KStemFilter(new LowerCaseTokenizer(reader));
    •     }
    • }
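To make the tokenizer-plus-filter idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class ToyAnalyzer, its stop-word list, and its suffix rule are all hypothetical stand-ins chosen for illustration; in particular, the crude plural stripping is neither the Krovetz nor the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer chain; not a Lucene class.
class ToyAnalyzer {
    // Assumed stop-word list, chosen only for this illustration.
    static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Stage 1: lowercase the text and split on runs of non-letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: crude plural stripping. This is NOT the Krovetz or Porter algorithm.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Full chain: tokenize, drop stop words, then stem what is left.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        for (String token : tokenize(text)) {
            if (!STOP_WORDS.contains(token)) {
                out.add(stem(token));
            }
        }
        return out;
    }
}
```

Calling ToyAnalyzer.analyze("The Documents and the Index") yields the terms document and index: the stop words are gone, case is folded, and the plural is reduced to its base form.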
Analyzer-II
  • Snowball Stemmer
    • A language created by Porter for writing stemmers
      • Multilingual analyzers/stemmers
    • Porter2
    • Fully integrated with Lucene 1.9.1
  • MyAnalyzer (home-built)
    • Demo
Adding Documents
  • The next step after creating an index is to add documents
    • writer.addDocument(FileDocument.Document(file));
    • Remember, we already determined how the document will be tokenized
  • Fields
    • Can split a document into parts such as title, body, date created, paragraphs
Adding Documents-II
  • Assigns token/doc IDs
    • For why this is important, see the Treehouse Wiki page on Lucene
  • Create some type of loop to add all the documents
  • This is the actual creation of the Index; before this we merely set the Index parameters
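The add-documents loop and the doc ID assignment can be sketched without Lucene as a tiny in-memory inverted index. ToyIndex and its methods are hypothetical names; the sketch only mimics the idea of mapping each term to the IDs of the documents that contain it, and says nothing about how Lucene stores its index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index; it mimics the add-documents loop, not Lucene's storage.
class ToyIndex {
    // term -> ascending list of IDs of the documents containing that term
    final Map<String, List<Integer>> postings = new TreeMap<String, List<Integer>>();
    private int nextDocId = 0;

    // Tokenizes one document, assigns it the next doc ID, and records its terms.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<Integer>();
                postings.put(term, docs);
            }
            // Doc IDs only grow, so checking the last entry is enough to avoid duplicates.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // The "some type of loop": add every document in turn.
    static ToyIndex build(List<String> texts) {
        ToyIndex index = new ToyIndex();
        for (String text : texts) {
            index.addDocument(text);
        }
        return index;
    }
}
```

Building the toy index from three texts gives them the IDs 0, 1, and 2 in order, and index.postings.get("lucene") then lists the IDs of the texts that mention that word.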
Finalizing Index Creation
  • After that, the Index is optimized with writer.optimize();
    • Merges index segments, etc.
  • The Index is closed with writer.close();
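As a rough picture of what merging means, the sketch below combines two small term-to-doc-ID maps into one. ToySegmentMerger is a hypothetical name, and shifting the second segment's IDs by the first segment's document count is an assumption made to keep IDs unique in this toy; it is not a description of Lucene's on-disk merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of merging two index segments into one.
class ToySegmentMerger {
    // Merges two term -> doc-ID maps. Doc IDs from the second segment are
    // shifted by firstDocCount so they stay unique in the merged result.
    static Map<String, List<Integer>> merge(Map<String, List<Integer>> first,
                                            int firstDocCount,
                                            Map<String, List<Integer>> second) {
        Map<String, List<Integer>> merged = new TreeMap<String, List<Integer>>();
        // Copy the first segment as-is.
        for (Map.Entry<String, List<Integer>> e : first.entrySet()) {
            merged.put(e.getKey(), new ArrayList<Integer>(e.getValue()));
        }
        // Append the second segment's postings with renumbered doc IDs.
        for (Map.Entry<String, List<Integer>> e : second.entrySet()) {
            List<Integer> docs = merged.get(e.getKey());
            if (docs == null) {
                docs = new ArrayList<Integer>();
                merged.put(e.getKey(), docs);
            }
            for (int docId : e.getValue()) {
                docs.add(docId + firstDocCount);
            }
        }
        return merged;
    }
}
```

After the merge, a search only has to consult one map instead of two, which is the intuition behind optimizing an index before searching it.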
Searching an Index
  • Open Index
    • IndexReader reader = IndexReader.open(index);
  • Create Searcher
    • Searcher searcher = new IndexSearcher(reader);
  • Assign Analyzer
    • Use the same Analyzer used to create Index (Why?)
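One way to see why the analyzers must match: the index stores terms after analysis, so a query term that has not gone through the same analysis may never equal a stored term. The sketch below uses a hypothetical class, AnalyzerMismatchDemo, with lowercasing as the only analysis step; it does not use Lucene.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo of index-time versus query-time analysis; not Lucene code.
class AnalyzerMismatchDemo {
    // Index-time analysis assumed here: lowercase, then split on non-letters.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Query analyzed the same way as the index: the terms line up.
    static boolean matchesWithSameAnalysis(Set<String> indexed, String queryWord) {
        return indexed.contains(queryWord.toLowerCase(Locale.ROOT));
    }

    // Query left unanalyzed: a difference in case alone causes a miss.
    static boolean matchesWithoutAnalysis(Set<String> indexed, String queryWord) {
        return indexed.contains(queryWord);
    }
}
```

With the text "Lucene Demo" indexed, the query word LUCENE matches when it is analyzed like the index and misses when it is not. The same reasoning applies to stemming and stop-word removal.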
Searching an Index-II
  • Parse/Create query
    • Query query = QueryParser.parse(line, field, analyzer);
    • Takes a line, uses field as the default field to search, and runs the line through the analyzer to create a query
  • Determine which documents are matches
    • Hits hits = searcher.search(query);
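The parse-then-search steps can be illustrated on a toy term-to-doc-ID map. ToySearcher is a hypothetical name, its "parser" only lowercases and splits the line, and the choice to match documents containing any query term (OR semantics) is an assumption of this sketch rather than a statement about how a given Lucene query behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy searcher over a term -> doc-ID map; not Lucene's Searcher.
class ToySearcher {
    // Toy "query parser": lowercases the line and splits it into terms.
    static List<String> parse(String line) {
        List<String> terms = new ArrayList<String>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Returns the IDs of documents containing at least one query term.
    static SortedSet<Integer> search(Map<String, List<Integer>> postings, String line) {
        SortedSet<Integer> hits = new TreeSet<Integer>();
        for (String term : parse(line)) {
            List<Integer> docs = postings.get(term);
            if (docs != null) {
                hits.addAll(docs);
            }
        }
        return hits;
    }
}
```

Given postings where lucene appears in documents 0 and 2 and index appears in documents 0 and 1, the query "Lucene index" returns documents 0, 1, and 2.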
Retrieving Documents
  • Hits holds the collection of matching documents
  • Using a loop we can reference each doc
    • Document doc = hits.doc(i);
    • This allows us to get info about the document
      • Name of document, date it was created, words in document
      • Relevance score (TF/IDF)
        • Demo
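For readers unfamiliar with TF/IDF, the sketch below computes the textbook form, term frequency times ln(N / document frequency), over a few pre-tokenized documents. ToyTfIdf is a hypothetical name, and this simple formula is an assumption of the sketch: Lucene's own scoring applies additional normalization factors, so the numbers here will not equal the scores that Lucene reports.

```java
// Hypothetical textbook TF-IDF; Lucene's scoring formula has further factors.
class ToyTfIdf {
    // Number of times the term occurs in one document.
    static int termFrequency(String[] docTerms, String term) {
        int count = 0;
        for (String t : docTerms) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Number of documents that contain the term at least once.
    static int documentFrequency(String[][] docs, String term) {
        int count = 0;
        for (String[] doc : docs) {
            if (termFrequency(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    // score = tf * ln(N / df); 0 when the term occurs in no document.
    static double score(String[][] docs, int docIndex, String term) {
        int df = documentFrequency(docs, term);
        if (df == 0) {
            return 0.0;
        }
        int tf = termFrequency(docs[docIndex], term);
        return tf * Math.log((double) docs.length / df);
    }
}
```

A term that is frequent in one document but rare across the collection scores higher than a term that appears in most documents, which is the behavior the relevance ranking relies on.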
Finishing Searching
  • Return list of documents
  • Close Reader
Other Functions
  • Spans (Example from http://lucene.apache.org/java/docs/api/index.html)
    • Useful for Phrasal matching
    • Allows for Passage Retrieval
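The idea behind span matching, finding where terms occur near each other rather than just whether they occur, can be sketched over a plain token array. ToySpans and its slop parameter are hypothetical; this is an illustration in the spirit of span queries, not Lucene's span classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical proximity matcher over token positions; not Lucene's span API.
class ToySpans {
    // Returns {start, end} token positions where `first` is followed by
    // `second` with at most `slop` other tokens in between.
    static List<int[]> near(String[] tokens, String first, String second, int slop) {
        List<int[]> spans = new ArrayList<int[]>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            // j - i - 1 is the number of tokens strictly between positions i and j.
            for (int j = i + 1; j < tokens.length && j - i - 1 <= slop; j++) {
                if (tokens[j].equals(second)) {
                    spans.add(new int[] {i, j});
                }
            }
        }
        return spans;
    }
}
```

With slop 0 the match is an exact two-word phrase; a larger slop allows looser phrasal matches. The returned positions mark the passage of the document that matched, which is what makes passage retrieval possible.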
Questions?
  • Any questions, comments, jokes, opinions?