Search Bootstrapping

Search Bootstrapping: How / Where to get started

Crawling
• Start with Nutch
  • http://nutch.apache.org/
• Index directly to SOLR
  • http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/
• Create a seed list from DMOZ rdf
  • http://www.dmoz.org/rdf.html
  • http://wiki.apache.org/nutch/NutchTutorial
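The seed-list step can be sketched roughly as follows. This is a minimal illustration, not the Nutch tooling itself: it assumes the DMOZ content dump lists each page as an `<ExternalPage about="...">` element, and the names `seed_urls` and `every_nth` are hypothetical. It keeps every Nth distinct http(s) URL so the first crawl stays small.

```python
import html
import re

# Assumption: each page in the DMOZ RDF dump appears as
#   <ExternalPage about="http://example.com/">
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def seed_urls(lines, every_nth=1):
    """Yield every Nth distinct http(s) URL found in ExternalPage elements.

    `lines` is any iterable of text lines (an open file works), so a large
    dump can be streamed instead of loaded into memory.
    """
    seen = set()
    count = 0
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if not match:
            continue
        # Attribute values are XML-escaped (e.g. &amp;), so unescape them.
        url = html.unescape(match.group(1))
        if not url.startswith(("http://", "https://")) or url in seen:
            continue
        seen.add(url)
        if count % every_nth == 0:
            yield url
        count += 1


# Tiny made-up sample standing in for the real dump.
sample = """\
<Topic r:id="Top/Arts">
<ExternalPage about="http://example.com/a">
<ExternalPage about="http://example.org/b?x=1&amp;y=2">
<ExternalPage about="http://example.com/a">
<ExternalPage about="ftp://example.net/c">
<ExternalPage about="http://example.net/d">
"""

if __name__ == "__main__":
    for url in seed_urls(sample.splitlines(), every_nth=2):
        print(url)
```

Writing the yielded URLs one per line into a text file gives the kind of flat seed list a crawler is typically pointed at.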

Understanding Content
• Entity Extraction
  • LingPipe http://alias-i.com/lingpipe/
  • OpenNLP http://incubator.apache.org/opennlp/
• Entity Identification / Taxonomies
  • Freebase http://www.freebase.com/
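To make the idea of extraction plus identification concrete, here is a deliberately simple dictionary (gazetteer) matcher. It is not the LingPipe or OpenNLP API, and it does not use statistical models as those libraries do; the gazetteer entries, the `example:` identifiers, and the function name `extract_entities` are all made up for illustration. It finds known phrases in text and maps each to a type and a taxonomy-style id.

```python
import re

# Hypothetical gazetteer: lowercase surface form -> (entity type, taxonomy id).
# In practice the ids would come from a taxonomy source; these are invented.
GAZETTEER = {
    "apache solr": ("Software", "example:solr"),
    "nutch": ("Software", "example:nutch"),
    "new york": ("Location", "example:new_york"),
}

# Longest phrase length, in words, that we need to try at each position.
MAX_WORDS = max(len(key.split()) for key in GAZETTEER)


def extract_entities(text):
    """Return (phrase, type, id) tuples using greedy longest-match lookup."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                found.append((phrase, *GAZETTEER[phrase]))
                i += n
                break
        else:
            # No phrase starts at this token; move on.
            i += 1
    return found


if __name__ == "__main__":
    text = "Nutch can index pages from New York into Apache Solr."
    for entity in extract_entities(text):
        print(entity)
```

A lookup like this only finds names it already knows; trained extractors are what pick up unseen names, with the taxonomy lookup then resolving them to a known entity.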

Some Additional Links
• Basic Web Page Parser
  • https://github.com/pjaol/Webcrawler
• Example of OpenNLP usage
  • https://github.com/pjaol/entity_extractor
