
What won’t change

  • Harvest’s basic design

  • SOIF for inter-component communication

  • Development model

http://harvest.sourceforge.net/
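SOIF (Summary Object Interchange Format) is the text format the components pass to each other. The sketch below writes and reads a single record. It assumes the commonly documented layout: a header `@TEMPLATE-TYPE { URL`, then one `Attribute{byte-count}:` line per attribute with a tab before the value, then a closing `}`. `soif_dump` and `soif_load` are hypothetical helpers, not Harvest's own SOIF library, and error handling is minimal.

```python
import re

# Record header, e.g. b"@FILE { http://example.org/\n"
_HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)\s*\n")
# Attribute prefix, e.g. b"Title{7}:\t"; the number is the value length in bytes
_ATTR = re.compile(rb"([^{\s]+)\{(\d+)\}:\t")


def soif_dump(template_type, url, attributes):
    """Serialize one SOIF record to bytes, keeping attribute order."""
    lines = [f"@{template_type} {{ {url}\n"]
    for name, value in attributes.items():
        size = len(value.encode("utf-8"))  # byte count, not character count
        lines.append(f"{name}{{{size}}}:\t{value}\n")
    lines.append("}\n")
    return "".join(lines).encode("utf-8")


def soif_load(record):
    """Parse one SOIF record; the byte counts let values contain newlines."""
    m = _HEADER.match(record)
    if m is None:
        raise ValueError("not a SOIF record")
    template_type, url = m.group(1).decode(), m.group(2).decode()
    pos = m.end()
    attributes = {}
    while not record.startswith(b"}", pos):
        a = _ATTR.match(record, pos)
        if a is None:
            raise ValueError(f"malformed attribute at byte {pos}")
        size = int(a.group(2))
        start = a.end()
        attributes[a.group(1).decode()] = record[start:start + size].decode("utf-8")
        pos = start + size + 1  # skip the value and its trailing newline
    return template_type, url, attributes
```

Reading values by byte count rather than up to the next newline is what allows multi-line fields such as a document description to travel inside one attribute.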



General Goals

  • Increase search speed

  • Shift focus to HTTP and HTML

  • Internationalisation

  • Improve scalability

  • Increase availability

  • Improve access control


General Goals

  • Integrate other search systems into the Harvest system

  • Remove all non-GPLed components

  • Improve ranking

  • Promote Harvest to attract more users and developers


Gatherer

  • Shift focus to HTTP

  • Improve gathering over slow connections

  • Improve HTTP gatherer

  • Create multiple Gatherers “on the fly” where possible

  • Evaluate larbin and curl

  • Migrate from GDBM to Sleepycat’s Berkeley DB for local disc cache management


Gatherer

  • Remove local disc cache

  • Implement candidate selection filter for HTTP enumerator based on MIME type

  • Trust the MIME type sent by HTTP servers

  • Add HTTPS support

  • Evaluate improvements of HTTP/1.1 over HTTP/1.0

  • Replace unnesters with exploders
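A candidate selection filter that trusts the server's MIME type only needs the `Content-Type` header (for example from a `HEAD` request) to decide whether a URL is worth fetching and summarising. A minimal sketch, assuming a hypothetical allow-list `ALLOWED_TYPES`; it strips parameters such as `charset` and compares media types case-insensitively. The function names are illustrative, not part of the HTTP enumerator.

```python
# Hypothetical allow-list; the slides do not say which types Harvest would keep.
ALLOWED_TYPES = frozenset({"text/html", "text/plain", "application/pdf"})


def media_type(content_type_header):
    """Return the bare, lower-cased media type from a Content-Type value, or None."""
    if not content_type_header:
        return None
    return content_type_header.split(";", 1)[0].strip().lower() or None


def is_candidate(content_type_header, allowed=ALLOWED_TYPES):
    """Trust the server-declared type: keep the URL only if that type is allowed."""
    mtype = media_type(content_type_header)
    return mtype is not None and mtype in allowed


def select_candidates(responses, allowed=ALLOWED_TYPES):
    """Filter (url, content_type) pairs down to the URLs worth gathering."""
    return [url for url, ctype in responses if is_candidate(ctype, allowed)]
```

Because the decision uses only response headers, unwanted objects such as images can be skipped without downloading their bodies, which also helps over slow connections.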


Gatherer

  • Improve object storage system

  • Improve handling of expiring objects

  • Evaluate viability of an expire daemon

  • Split file: and news: RootNodes into LeafNodes

  • Remove All-Templates

  • Make SOIF objects shareable between Gatherer and Broker if possible


Summarizer

  • Shift focus to HTML

  • Improve existing HTML summarizers

  • Create an HTML summarizer that “understands” HTML

  • Improve support for Microsoft Office documents
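One way for an HTML summarizer to “understand” HTML is to parse the markup instead of scanning it with patterns. A minimal sketch using Python's standard `html.parser` module; the fields it extracts (title, headings, link targets) are an assumption for illustration, not the set of SOIF attributes Harvest's summarizers emit.

```python
from html.parser import HTMLParser


class HTMLSummarizer(HTMLParser):
    """Collect title, headings and link targets from an HTML document (sketch)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.headings = []
        self.links = []
        self._stack = []  # open elements whose text we route or ignore

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag == "title" or tag in self.HEADINGS or tag in self.SKIP:
            self._stack.append(tag)
            if tag in self.HEADINGS:
                self.headings.append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return  # body text outside title/headings is not summarised here
        current = self._stack[-1]
        if current == "title":
            self.title += data
        elif current in self.HEADINGS:
            self.headings[-1] += data  # text inside <a> etc. still joins the heading


def summarize(html):
    parser = HTMLSummarizer()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "headings": [h.strip() for h in parser.headings],
        "links": parser.links,
    }
```

Unlike a regular-expression summarizer, this approach keeps text nested inside headings, ignores script and style content, and decodes character references.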


Broker

  • Add Index Data’s Zebra as full-text indexer

  • Implement a method to retrieve an SOIF object by URL

  • Improve temporary file/directory handling used for paging search results

  • Improve SOIF object storage

  • Extend “shell indexer” functionality
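Retrieving a SOIF object by URL amounts to keeping an index from a canonical form of the URL to the stored record. The sketch assumes a simple normalisation policy (lower-case scheme and network location, empty path becomes `/`, fragment dropped); `SoifStore` and `normalize_url` are hypothetical names, not part of the Broker.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Canonical lookup key: lower-case scheme and netloc, default path, no fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class SoifStore:
    """Hypothetical Broker-side store that indexes SOIF records by URL."""

    def __init__(self):
        self._by_url = {}

    def add(self, url, record):
        self._by_url[normalize_url(url)] = record

    def get(self, url):
        """Return the stored record for url, or None if the Broker has none."""
        return self._by_url.get(normalize_url(url))
```

Normalising on both insert and lookup means that trivially different spellings of the same URL map to one record.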


Broker

  • Implement a user interface in PHP

  • Separate data from metadata when storing SOIF objects

  • Minimise the size of the Registry

  • Use cookies to save user preferences for the search interface

  • Evaluate and write SOIF filter for Namazu
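The planned interface is in PHP; to keep one language for all sketches here, the following shows the same cookie idea with Python's standard `http.cookies` module. The preference names and defaults in `DEFAULTS` are hypothetical, since the slides do not say which settings would be stored.

```python
from http.cookies import SimpleCookie

# Hypothetical preference names and defaults for the search form.
DEFAULTS = {"maxresults": "50", "caseinsensitive": "on"}


def preferences_header(prefs, max_age=365 * 24 * 3600):
    """Build Set-Cookie header values that persist the user's search preferences."""
    cookie = SimpleCookie()
    for name, value in prefs.items():
        cookie[name] = value
        cookie[name]["max-age"] = max_age  # keep the preference for about a year
        cookie[name]["path"] = "/"
    return [morsel.OutputString() for morsel in cookie.values()]


def read_preferences(cookie_header):
    """Merge known preferences from a request's Cookie header over the defaults."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    prefs = dict(DEFAULTS)
    for name in DEFAULTS:
        if name in cookie:
            prefs[name] = cookie[name].value
    return prefs
```

Reading only the names listed in `DEFAULTS` means unrelated cookies sent by the browser are ignored rather than fed into the search.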


Broker

  • Evaluate RDBMSs (PostgreSQL, MySQL)

  • Evaluate XQuery and SOAP
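An evaluation of a relational back end could start from a schema that stores SOIF objects as rows. The sketch uses SQLite from the Python standard library purely so it runs without a database server; the schema (one row per object, one row per attribute) is an assumption, not a Harvest design, and would carry over to PostgreSQL or MySQL with minor type changes. All function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE objects (
    url           TEXT PRIMARY KEY,
    template_type TEXT NOT NULL
);
CREATE TABLE attributes (
    url   TEXT NOT NULL REFERENCES objects(url),
    name  TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX attributes_name_value ON attributes(name, value);
"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_object(conn, template_type, url, attributes):
    """Insert or replace one SOIF object and all of its attributes in one transaction."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)", (url, template_type))
        conn.execute("DELETE FROM attributes WHERE url = ?", (url,))
        conn.executemany(
            "INSERT INTO attributes VALUES (?, ?, ?)",
            [(url, name, value) for name, value in attributes.items()],
        )


def urls_with(conn, name, value):
    """Exact-match attribute search, served by the (name, value) index."""
    rows = conn.execute(
        "SELECT url FROM attributes WHERE name = ? AND value = ? ORDER BY url",
        (name, value),
    )
    return [row[0] for row in rows]
```

Exact-match queries like this are cheap in an RDBMS; full-text ranking would still need a dedicated indexer.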


Documentation

  • Switch the manual and FAQ from LinuxDoc to DocBook


Problems

  • PostScript and PDF summarizers

  • Apache’s MultiViews

  • IMS Gathering

  • Stemming and Soundex are language-dependent

  • Language recognition

  • No free thesauri available
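The language dependence of Soundex is easy to see in the classic American Soundex algorithm: its letter groups encode English pronunciation, and letters outside a–z have no code at all. A minimal Python sketch of that standard algorithm (not Harvest's own implementation, which the slides do not show) makes this concrete.

```python
# Digit groups of American Soundex; vowels, y, h and w carry no code.
CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word):
    """Return the four-character American Soundex code of word."""
    # Only ASCII letters are coded; anything else is silently dropped.
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":
            prev = code  # vowels reset prev; h and w do not separate equal codes
    return (result + "000")[:4]
```

`soundex("Robert")` and `soundex("Rupert")` both give `R163`, as intended for English names, but `soundex("Øster")` gives `S360` because `Ø` is dropped and `S` becomes the initial. Other languages need their own letter groups, which is the problem listed above.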
