What won’t change

  • Harvest’s basic design

  • SOIF for inter-component communication

  • Development model
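
Since SOIF stays the format components exchange, a small illustration may help. The sketch below assumes the layout described in the Harvest manual: a template type and URL header, then attributes written as name, byte count in braces, a colon, a tab, and the value. The function names dump_soif and parse_soif are hypothetical, and the parser is deliberately simplified (one object per input, no error recovery).

```python
import re

# Header looks like: @FILE { http://example.org/
_HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)\s*\n")
# Attribute prefix looks like: Title{7}:<tab>
_ATTR = re.compile(rb"([^\s{]+)\{(\d+)\}:\t")


def dump_soif(template_type: str, url: str, attributes: dict) -> bytes:
    """Serialise one object; the size in braces is the value length in bytes."""
    out = [f"@{template_type} {{ {url}\n".encode("utf-8")]
    for name, value in attributes.items():
        data = value.encode("utf-8")
        out.append(f"{name}{{{len(data)}}}:\t".encode("utf-8") + data + b"\n")
    out.append(b"}\n")
    return b"".join(out)


def parse_soif(blob: bytes):
    """Parse a single object produced by dump_soif (simplified sketch)."""
    header = _HEADER.match(blob)
    if header is None:
        raise ValueError("not a SOIF object")
    template_type = header.group(1).decode("utf-8")
    url = header.group(2).decode("utf-8")
    pos = header.end()
    attributes = {}
    while True:
        match = _ATTR.match(blob, pos)
        if match is None:
            break  # reached the closing brace
        size = int(match.group(2))
        start = match.end()
        # Read exactly `size` bytes, so values may contain newlines.
        attributes[match.group(1).decode("utf-8")] = blob[start:start + size].decode("utf-8")
        pos = start + size + 1  # skip the newline that ends the attribute
    return template_type, url, attributes
```

Because values are length-prefixed, a multi-line value such as a document body survives a round trip through dump_soif and parse_soif unchanged.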

http://harvest.sourceforge.net/


General Goals

  • Increase search speed

  • Shift focus to HTTP and HTML

  • Internationalisation

  • Improve scalability

  • Increase availability

  • Improve access control


General Goals

  • Integration of other search systems into Harvest system

  • Remove all non-GPLed components

  • Improve ranking

  • Promote Harvest to attract more users and developers


Gatherer

  • Shift focus to HTTP

  • Improve gathering over slow connection

  • Improve HTTP gatherer

  • Create multiple Gatherers “on the fly” where possible

  • Evaluate larbin and curl

  • Migrate from GDBM to Sleepycat’s DB for local disc cache management


Gatherer

  • Remove local disc cache

  • Implement candidate selection filter for HTTP enumerator based on MIME type

  • Trust MIME type sent by HTTP servers

  • Add HTTPS support

  • Evaluate improvements of HTTP 1.1 over HTTP 1.0

  • Replace unnesters with exploders
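
The MIME-type candidate filter could look roughly like the sketch below. It assumes the enumerator trusts the Content-Type header returned by the server and compares its media type against a list of allowed patterns. The names ALLOWED_TYPES, media_type, and is_candidate are hypothetical and not part of Harvest.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list; shell-style patterns such as "text/*" are accepted.
ALLOWED_TYPES = ("text/html", "text/plain", "application/pdf")


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_candidate(content_type_header: str, allowed=ALLOWED_TYPES) -> bool:
    """Return True if the server-declared type matches an allowed pattern."""
    declared = media_type(content_type_header)
    return any(fnmatchcase(declared, pattern) for pattern in allowed)
```

With this shape, an enumerator would call is_candidate on the response header and skip the object before any summarizer runs, for example accepting "text/html; charset=ISO-8859-1" and rejecting "image/png".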


Gatherer

  • Improve object storage system

  • Improve expiring objects

  • Evaluate viability of an expire daemon

  • Split file: and news: rootnodes into leafnodes

  • Remove All-Templates

  • Make SOIF objects shareable between Gatherer and Broker if possible


Summarizer

  • Shift focus to HTML

  • Improve existing HTML summarizers

  • Create HTML summarizer which “understands” HTML

  • Improve support for Microsoft Office documents


Broker

  • Add Index Data’s Zebra as fulltext indexer

  • Implement method to retrieve an SOIF object by URL

  • Improve temporary file/directory handling used for paging search results

  • Improve SOIF object storage

  • Extend “shell indexer” functionality


Broker

  • Implement a user interface in PHP

  • Separate data from metadata when storing SOIF objects

  • Minimise size of Registry

  • Use cookies to save user preferences of the search interface

  • Evaluate and write SOIF filter for Namazu


Broker

  • Evaluate RDBMS (PostgreSQL, MySQL)

  • Evaluate XQuery and SOAP


Documentation

  • Switch from LinuxDoc to DocBook for manual and FAQ


Problems

  • PostScript and PDF summarizers

  • Apache’s multiviews

  • IMS Gathering

  • Stemming and Soundex are language dependent

  • Language recognition

  • No free thesauri available
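
To illustrate why Soundex is language dependent, here is a minimal sketch of the classic American Soundex code. It is not taken from Harvest; the function name soundex is an assumption, and the letter groups follow English consonant classes, which is exactly the limitation the slide points at.

```python
# English consonant classes used by American Soundex.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]
```

English variants collapse as intended: soundex("Robert") and soundex("Rupert") both give "R163". The German spelling "Jaeger" and the anglicised "Yaeger" give "J260" and "Y260", so an index keyed on these codes would not match them, because the first letter is kept verbatim and the rules assume English spelling.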
