PowerPoint Slideshow about 'Search Bootstrapping' - arch

Presentation Transcript

Search Bootstrapping

How / Where to get started

Crawling
  • Start with Nutch
    • http://nutch.apache.org/
  • Index directly to Solr
    • http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/
  • Create a seed list from DMOZ rdf
    • http://www.dmoz.org/rdf.html
    • http://wiki.apache.org/nutch/NutchTutorial
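The seed-list step can be sketched in Python. The Nutch tutorial linked above describes a `DmozParser` tool with a `-subset` option for this job; the code below is not that tool, only a stand-alone approximation under stated assumptions. It assumes the dump lists each site as an `<ExternalPage about="...">` element, keeps every Nth URL, and prints one URL per line, the plain-text layout a Nutch seed file uses. The names `iter_dmoz_urls`, `build_seed_list`, `every_nth` and `limit` are hypothetical, and the sample RDF is made up.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List


def _local(name: str) -> str:
    # Drop any "{namespace}" prefix so matching does not depend on the namespace URI.
    return name.rsplit("}", 1)[-1]


def iter_dmoz_urls(rdf: IO[bytes]) -> Iterator[str]:
    """Stream the URL of every ExternalPage element in a DMOZ-style RDF dump.

    Assumption: each listed site appears as <ExternalPage about="http://...">.
    """
    for _event, elem in ET.iterparse(rdf, events=("end",)):
        if _local(elem.tag) == "ExternalPage":
            url = next((v for k, v in elem.attrib.items() if _local(k) == "about"), "")
            elem.clear()  # free the subtree; the real dump is large
            if url:
                yield url


def build_seed_list(rdf: IO[bytes], every_nth: int = 1, limit: int = 0) -> List[str]:
    """Keep every Nth URL in document order, skipping repeats; limit=0 means no cap."""
    if every_nth < 1:
        raise ValueError("every_nth must be >= 1")
    seeds: List[str] = []
    seen = set()
    for i, url in enumerate(iter_dmoz_urls(rdf)):
        if i % every_nth != 0 or url in seen:
            continue
        seen.add(url)
        seeds.append(url)
        if limit and len(seeds) >= limit:
            break
    return seeds


# Tiny made-up sample in the assumed shape of the DMOZ dump.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
  <ExternalPage about="http://www.example.com/">
    <d:Title>Example A</d:Title>
  </ExternalPage>
  <ExternalPage about="http://www.example.org/">
    <d:Title>Example B</d:Title>
  </ExternalPage>
  <ExternalPage about="http://www.example.net/">
    <d:Title>Example C</d:Title>
  </ExternalPage>
</RDF>
"""

if __name__ == "__main__":
    # One URL per line, ready to save as a seed file.
    for url in build_seed_list(io.BytesIO(SAMPLE), every_nth=2):
        print(url)
```

Streaming with `iterparse` and clearing each element as it is consumed keeps memory roughly flat, which matters for a full `content.rdf.u8`-sized dump; sampling every Nth entry is a deterministic way to get a smaller, topic-diverse starting crawl.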
Understanding Content
  • Entity Extraction
    • LingPipe http://alias-i.com/lingpipe/
    • OpenNLP http://incubator.apache.org/opennlp/
  • Entity Identification / Taxonomies
    • Freebase http://www.freebase.com/
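A toy version of both steps, extracting entities and then identifying them against a taxonomy, is sketched below. LingPipe and OpenNLP normally rely on trained statistical models; this sketch swaps in a much simpler technique, greedy longest-match lookup against a dictionary (gazetteer). The gazetteer, its slash-separated type labels and the names `extract_entities` and `taxonomy_path` are hypothetical stand-ins for data that might come from a source such as Freebase.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical gazetteer: normalized phrase -> slash-separated taxonomy label.
# Made-up entries; a real one might be built from a knowledge base such as Freebase.
GAZETTEER: Dict[str, str] = {
    "new york": "location/city",
    "new york times": "organization/newspaper",
    "lucene": "software/search-library",
    "solr": "software/search-server",
}

TOKEN = re.compile(r"\w+")


@dataclass(frozen=True)
class Entity:
    text: str   # surface form exactly as it appears in the input
    label: str  # taxonomy label taken from the gazetteer
    start: int  # character offsets into the input
    end: int


def extract_entities(text: str, gazetteer: Dict[str, str]) -> List[Entity]:
    """Greedy longest-match dictionary lookup over lower-cased word tokens.

    Gazetteer keys must be lower-case words separated by single spaces.
    """
    tokens = [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    max_len = max((len(key.split()) for key in gazetteer), default=0)
    found: List[Entity] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "new york times" beats "new york".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i:i + n])
            label = gazetteer.get(phrase)
            if label is not None:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                found.append(Entity(text[start:end], label, start, end))
                i += n
                break
        else:  # no phrase starting at token i matched
            i += 1
    return found


def taxonomy_path(label: str) -> List[str]:
    """Expand "a/b/c" into ["a", "a/b", "a/b/c"], one entry per ancestor type."""
    parts = label.split("/")
    return ["/".join(parts[:k]) for k in range(1, len(parts) + 1)]


if __name__ == "__main__":
    sample = "The New York Times covered Lucene and Solr in New York."
    for ent in extract_entities(sample, GAZETTEER):
        print(ent.text, "->", taxonomy_path(ent.label))
```

Expanding a label with `taxonomy_path` yields every ancestor type, e.g. `["organization", "organization/newspaper"]` for the newspaper entry, which is one way to populate hierarchical facets in a search index.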
Some Additional Links
  • Basic Web Page Parser
    • https://github.com/pjaol/Webcrawler
  • Example of OpenNLP usage
    • https://github.com/pjaol/entity_extractor
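In the spirit of the basic web page parser linked above (this is not that project's code), the sketch below uses Python's standard `html.parser` module to pull the title, absolute outgoing links and visible text from one page. Skipping `script`/`style` content, resolving links with `urljoin` and dropping `#fragments` are assumptions about what a crawler typically needs; `PageParser`, `parse_page` and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urldefrag, urljoin


class PageParser(HTMLParser):
    """Collect the title, absolute http(s) links and visible text of one page."""

    SKIP = {"script", "style"}  # elements whose content is not visible text

    def __init__(self, base_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.title = ""
        self.links: List[str] = []
        self._chunks: List[str] = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links, then drop "#fragment" so duplicates collapse.
                url, _fragment = urldefrag(urljoin(self.base_url, href))
                if url.startswith(("http://", "https://")) and url not in self.links:
                    self.links.append(url)

    def handle_endtag(self, tag: str) -> None:
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
        elif data.strip():
            self._chunks.append(data.strip())

    @property
    def text(self) -> str:
        # Whitespace-normalized visible text, e.g. for a full-text index field.
        return " ".join(self._chunks)


def parse_page(html: str, base_url: str) -> PageParser:
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser


SAMPLE_HTML = """<html><head><title>Search Bootstrapping</title>
<style>body { color: red }</style></head>
<body><h1>Crawling</h1>
<p>Start with <a href="/nutch/">Nutch</a> and <a href="https://solr.example.org/#intro">Solr</a>.</p>
<script>var hidden = 1;</script>
<a href="mailto:someone@example.com">mail</a>
</body></html>"""

if __name__ == "__main__":
    page = parse_page(SAMPLE_HTML, "http://www.example.com/docs/")
    print(page.title)
    print(page.links)
    print(page.text)
```

The links list is what a crawler would push back onto its frontier, while the title and text are the natural fields to send to the index; non-HTTP links such as `mailto:` are filtered out.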