Search bootstrapping

Search Bootstrapping - PowerPoint PPT Presentation





PowerPoint Slideshow about 'Search Bootstrapping' - arch


Presentation Transcript

Search Bootstrapping

How / Where to get started


Crawling

  • Start with Nutch

    • http://nutch.apache.org/

  • Index directly to SOLR

    • http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/

  • Create a seed list from DMOZ rdf

    • http://www.dmoz.org/rdf.html

    • http://wiki.apache.org/nutch/NutchTutorial
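
To make the seed list step concrete, here is a minimal Python sketch that pulls URLs out of a DMOZ RDF dump and keeps a sample of them. It assumes each listed site appears in the dump as an `<ExternalPage about="...">` element; the function names, the sampling approach, and the output file name are illustrative and not taken from the slides. Older Nutch tutorials describe a bundled DMOZ parsing tool for the same job, so treat this as a standalone illustration rather than the Nutch way.

```python
import html
import random
import re

# Assumption: every listed site shows up as <ExternalPage about="URL"> in the dump.
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def extract_seed_urls(lines, sample_rate=1.0, seed=0):
    """Yield unique URLs from DMOZ-style RDF lines.

    sample_rate is the approximate fraction of URLs to keep (1.0 keeps all).
    """
    rng = random.Random(seed)
    seen = set()
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if not match:
            continue
        url = html.unescape(match.group(1))  # "&amp;" in the attribute becomes "&"
        if url in seen:
            continue
        seen.add(url)
        if rng.random() < sample_rate:
            yield url


def write_seed_file(urls, path):
    """Write one URL per line, the plain-text layout a Nutch seed list uses."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

Typical use would be to open the downloaded dump as a text file, pass the file object to `extract_seed_urls` with a small `sample_rate`, and hand the result to `write_seed_file("urls/seed.txt")` (a hypothetical path). Reading line by line keeps memory use flat, which matters because the full dump is large.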

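Nutch handles the hand-off to Solr itself, but it can help to see what an indexing request looks like. The sketch below builds (without sending) a JSON update request, assuming a Solr version that accepts JSON documents on its `/update` handler. The core URL and the field names `id`, `url`, `title`, and `content` are assumptions and must match whatever schema is actually deployed.

```python
import json
import urllib.request


def build_solr_update_request(solr_url, docs, commit=True):
    """Build a POST request that would add `docs` to a Solr core.

    solr_url is the base URL of the core (hypothetical example:
    http://localhost:8983/solr/example). Nothing is sent here.
    """
    endpoint = solr_url.rstrip("/") + "/update"
    if commit:
        endpoint += "?commit=true"  # ask Solr to make the documents searchable
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names below are assumptions about the target schema.
example_docs = [
    {
        "id": "http://example.com/",
        "url": "http://example.com/",
        "title": "Example page",
        "content": "Text extracted by the crawler.",
    }
]
```

Passing the returned request to `urllib.request.urlopen` would submit it to a running Solr instance. Building the request separately from sending it makes the payload easy to inspect before anything touches the index.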

Understanding Content

  • Entity Extraction

    • LingPipe http://alias-i.com/lingpipe/

    • OpenNLP http://incubator.apache.org/opennlp/

  • Entity Identification / Taxonomies

    • Freebase http://www.freebase.com/
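
As a rough illustration of how extraction and identification fit together, the Python sketch below finds mentions in text and maps them to taxonomy entries. It uses a simple dictionary (gazetteer) lookup, which is a stand-in technique and not the statistical models that LingPipe or OpenNLP provide. The taxonomy entries and their `example:` identifiers are invented for the example and are not real Freebase IDs.

```python
import re

# Hypothetical mini-taxonomy: lowercased surface form -> (entity id, entity type).
TAXONOMY = {
    "apache nutch": ("example:nutch", "software"),
    "solr": ("example:solr", "software"),
    "new york": ("example:new_york", "location"),
}


def find_entities(text, taxonomy=TAXONOMY, max_words=3):
    """Return (mention, entity id, entity type) tuples, longest match first."""
    tokens = [(m.group(0), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tok[0].lower() for tok in tokens[i:i + n])
            if candidate in taxonomy:
                entity_id, entity_type = taxonomy[candidate]
                start, end = tokens[i][1], tokens[i + n - 1][2]
                found.append((text[start:end], entity_id, entity_type))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starting here is in the taxonomy
    return found
```

A lookup like this only finds names it already knows. A trained extractor can propose unseen names, which can then be resolved against a taxonomy in a second step.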


Some Additional Links

  • Basic Web Page Parser

    • https://github.com/pjaol/Webcrawler

  • Example of OpenNLP usage

    • https://github.com/pjaol/entity_extractor
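
For readers who want a feel for what a basic web page parser does before opening the repository above, here is a minimal Python sketch using only the standard library. It is not taken from that repository. The class name, function name, and output layout are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect outgoing links and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html_text, base_url):
    """Return the page's absolute links and its text joined by spaces."""
    parser = PageParser(base_url)
    parser.feed(html_text)
    parser.close()
    return {"links": parser.links, "text": " ".join(parser.text_parts)}
```

The links feed the crawl frontier and the text is what gets indexed or passed to entity extraction, so this one step connects the crawling and content slides.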

