Building a Vertical Search Site

Building a Vertical Search Site (using lots of Apache software, of course)

Just the Facts, Ma’am • Ken Krugler - CTO/co-founder of Krugle • We use lots of Apache S/W at krugle.org • Httpd, Lucene, Nutch, Solr, Xerces, etc. • I’ll describe our architecture • And the sometimes painful lessons learned

Three Faces of Krugle • Free public site - http://www.krugle.org • Partner sites • http://sourceforge.krugle.com • http://developerworks.krugle.com • http://aws.krugle.com • Enterprise appliance

Krugle.org free public site • Search code, projects, & technical web pages • 150,000 projects • 2.5 billion lines of code • 40 million web pages

Krugle.org Architecture (web) • Web tier runs Apache • Also mod_perl • “glue” for JavaScript to backend RESTful API • Partner APIs • “Dirty” side of system

Krugle.org Architecture (API) • API server uses Resin • Webapps provide RESTful API services • Filer is big disk array • LightTPD, NFS • Searchers run Hadoop, Lucene

Krugle.org Architecture (CPI) • Page crawl uses Nutch • Code crawl uses bits of Nutch, custom stuff • Fuzzy parsers created using ANTLR • Project data in MySQL, pushed to Solr • Code index is Lucene
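The "project data in MySQL, pushed to Solr" step can be sketched roughly as follows. This is only an illustration, not Krugle's actual code: it assumes the rows have already been read from the database as dictionaries, and the field names (`id`, `name`, `license`, `homepage`) are hypothetical. It builds the XML `<add>` message that Solr's update handler accepts and leaves the HTTP POST out.

```python
import xml.etree.ElementTree as ET


def build_solr_add(rows):
    """Turn project rows (dicts) into a Solr XML <add> update message."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            if value is None:
                continue  # skip NULL columns rather than index empty fields
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical rows, shaped like the result of a MySQL query.
rows = [
    {"id": 1, "name": "example-project", "license": "Apache-2.0", "homepage": None},
]
payload = build_solr_add(rows)
print(payload)
```

Keeping the message construction separate from the database read and the HTTP post makes each piece easy to test on its own.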

Krugle partner sites • IBM developerWorks • SourceForge.net • Amazon Web Services • Yahoo! Dev Network • CollabNet

Krugle Architecture (partners) • Higher level API • Wraps RESTful API • Handled in web tier • Big chunks of Perl • LightTPD cache
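The idea of a higher-level API that wraps a lower-level one and sits behind a cache can be sketched as below. Note the substitutions: the slides describe Perl in the web tier and a LightTPD cache, while this sketch uses Python and an in-process dictionary with a time-to-live as a stand-in. `PartnerApi`, `backend_search`, and the `scope` parameter are hypothetical names, not part of any Krugle API.

```python
import time


class PartnerApi:
    """Higher-level facade over a lower-level search call, with a TTL cache."""

    def __init__(self, backend_search, ttl_seconds=60.0, clock=time.monotonic):
        self._search = backend_search  # hypothetical lower-level API call
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (partner, query) -> (expiry time, result)

    def search(self, partner, query):
        key = (partner, query)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the backend entirely
        result = self._search(query, scope=partner)
        self._cache[key] = (now + self._ttl, result)
        return result


# Fake backend that records how often it is called.
calls = []


def fake_backend(query, scope):
    calls.append((query, scope))
    return [f"{scope}:{query}"]


api = PartnerApi(fake_backend)
first = api.search("sourceforge", "quicksort")
second = api.search("sourceforge", "quicksort")  # served from the cache
print(first, len(calls))
```

An HTTP-level cache in front of the web tier achieves the same effect without touching application code, which may be why the slides place the cache in LightTPD.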

Krugle enterprise server • Krugle inside firewall • Talks to major SCMs • SCM comment search • Includes public site info

Krugle Architecture (enterprise) • Collapses web tier, API server, code searchers, filer, and DB server • Separate admin system (DB, GUI, code crawler, configuration) as Jetty-hosted webapp

RESTful API • HTTP requests, XML responses • Works well with Perl middleware • Some load/memory issues • Solr integration challenges • Integration test challenges
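A client for an "HTTP requests, XML responses" API might look roughly like this. Everything specific here is an assumption made for illustration: the endpoint URL, the `query`/`start`/`count` parameters, and the `<results>`/`<hit>` response shape are hypothetical, since the slides do not show the real API. The sketch builds the request URL and parses a canned response rather than making a network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://api.example.invalid/search"  # hypothetical endpoint


def build_request_url(query, start=0, count=10):
    """Encode search parameters into a GET URL."""
    params = {"query": query, "start": start, "count": count}
    return BASE_URL + "?" + urlencode(params)


def parse_results(xml_text):
    """Extract one dict per <hit> element from an XML response body."""
    root = ET.fromstring(xml_text)
    return [
        {"project": hit.get("project"), "path": hit.findtext("path")}
        for hit in root.iter("hit")
    ]


# Canned response in the assumed (hypothetical) format.
SAMPLE = """<results total="1">
  <hit project="example-project"><path>src/Main.java</path></hit>
</results>"""

url = build_request_url("binary search")
hits = parse_results(SAMPLE)
print(url)
print(hits)
```

Parsing canned responses like `SAMPLE` is also one way to ease the integration-test pain the slide mentions, because the middleware can be exercised without a live API server.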

Key Lessons • If it isn’t broke, don’t upgrade • There’s always a newer version • That includes the build system • Be prepared to pay for free software • Motivating project contributors to do things • Moderation in architectural abstraction • There’s always a higher and lower option
