
What won’t change

  • Harvest’s basic design

  • SOIF for inter-component communication

  • Development model

http://harvest.sourceforge.net/
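
As an illustration of SOIF as the format exchanged between components, the sketch below writes and reads one object. It assumes the commonly described SOIF layout: a header line `@TYPE { URL`, then one `Name{size}:<TAB>value` line per attribute, where `size` is the byte length of the value, and a closing `}`. The function names `soif_dump` and `soif_load` are hypothetical and are not part of Harvest.

```python
import re

# Attribute header: name, byte length in braces, colon, tab.
_ATTR = re.compile(rb"([^\s{}]+)\{(\d+)\}:\t")


def soif_dump(template_type, url, attributes):
    """Serialize one object to bytes in the assumed SOIF layout."""
    out = [f"@{template_type} {{ {url}\n".encode("utf-8")]
    for name, value in attributes.items():
        data = value.encode("utf-8")
        # The number in braces is the length of the value in bytes.
        out.append(f"{name}{{{len(data)}}}:\t".encode("utf-8") + data + b"\n")
    out.append(b"}\n")
    return b"".join(out)


def soif_load(blob):
    """Parse one object; returns (template_type, url, attributes)."""
    header, _, rest = blob.partition(b"\n")
    match = re.fullmatch(rb"@(\S+) \{ (\S+)", header)
    if match is None:
        raise ValueError("not a SOIF header line")
    attributes = {}
    pos = 0
    while not rest.startswith(b"}", pos):
        attr = _ATTR.match(rest, pos)
        if attr is None:
            raise ValueError("malformed or truncated attribute")
        size = int(attr.group(2))
        start = attr.end()
        # Read exactly `size` bytes, so values may contain newlines.
        value = rest[start:start + size].decode("utf-8")
        attributes[attr.group(1).decode("utf-8")] = value
        pos = start + size + 1  # skip the newline that ends the value
    return match.group(1).decode("utf-8"), match.group(2).decode("utf-8"), attributes
```

Because each value carries an explicit byte count, a reader does not need to escape newlines or braces inside values, which is what makes the format convenient for passing summaries between a Gatherer and a Broker.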


General Goals

  • Increase search speed

  • Shift focus to HTTP and HTML

  • Internationalisation

  • Improve scalability

  • Increase availability

  • Improve access control

General Goals (continued)

  • Integration of other search systems into Harvest system

  • Remove all non-GPLed components

  • Improve ranking

  • Promote Harvest to attract more users and developers

Gatherer

  • Shift focus to HTTP

  • Improve gathering over slow connections

  • Improve HTTP gatherer

  • Create multiple Gatherers “on the fly” where possible

  • Evaluate larbin and curl

  • Migrate from GDBM to Sleepycat’s DB for local disc cache management

Gatherer (continued)

  • Remove local disc cache

  • Implement candidate selection filter for HTTP enumerator based on MIME type

  • Trust MIME type sent by HTTP servers

  • Add HTTPS support

  • Evaluate improvements of HTTP/1.1 over HTTP/1.0

  • Replace unnesters with exploders
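
The candidate selection filter mentioned above could look roughly like the following. This is a minimal sketch, not Harvest code: the function name `is_candidate`, the default allow-list, and the `type/*` wildcard convention are all assumptions made for illustration. It trusts the `Content-Type` value sent by the server, as the slide proposes.

```python
def is_candidate(content_type, allowed=("text/html", "text/plain")):
    """Return True if a Content-Type header value names an allowed media type.

    `allowed` holds exact media types or hypothetical "type/*" wildcards.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for pattern in allowed:
        if pattern.endswith("/*"):
            # "text/*" matches anything that starts with "text/".
            if media_type.startswith(pattern[:-1]):
                return True
        elif media_type == pattern:
            return True
    return False
```

An enumerator could call this on the response headers and skip following or fetching objects that fail the check, for example `is_candidate("image/png")` when only text types are wanted.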

Gatherer (continued)

  • Improve object storage system

  • Improve expiring objects

  • Evaluate viability of an expire daemon

  • Split file: and news: rootnodes into leafnodes

  • Remove All-Templates

  • Make SOIF objects shareable between Gatherer and Broker if possible

Summarizer

  • Shift focus to HTML

  • Improve existing HTML summarizers

  • Create HTML summarizer which “understands” HTML

  • Improve support for Microsoft Office documents

Broker

  • Add Index Data’s Zebra as full-text indexer

  • Implement method to retrieve an SOIF object by URL

  • Improve temporary file/directory handling used for paging search results

  • Improve SOIF object storage

  • Extend “shell indexer” functionality
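
Retrieving an SOIF object by URL amounts to keeping an index keyed on a normalised URL. The sketch below is one possible shape for that lookup, held in memory for brevity; the class `UrlIndex`, the helper `normalise_url`, and the normalisation rules are assumptions for illustration, not the Broker's actual storage design.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url):
    """Lower-case scheme and network location, drop the fragment, default path to "/"."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class UrlIndex:
    """Hypothetical in-memory map from normalised URL to a stored object."""

    def __init__(self):
        self._by_url = {}

    def add(self, url, soif_object):
        # A later object for the same URL replaces the earlier one.
        self._by_url[normalise_url(url)] = soif_object

    def get(self, url):
        # Returns None when no object is stored for this URL.
        return self._by_url.get(normalise_url(url))
```

Normalising before both insert and lookup means that `HTTP://Example.ORG` and `http://example.org/#top` resolve to the same stored object.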

Broker (continued)

  • Implement a user interface in PHP

  • Separate data from metadata when storing SOIF objects

  • Minimise size of Registry

  • Use cookies to save user preferences of the search interface

  • Evaluate and write SOIF filter for Namazu

Broker (continued)

  • Evaluate RDBMS (PostgreSQL, MySQL)

  • Evaluate XQuery and SOAP

Documentation

  • Switch from LinuxDoc to DocBook for manual and FAQ

Problems

  • PostScript and PDF summarizers

  • Apache’s multiviews

  • IMS Gathering

  • Stemming and Soundex are language-dependent

  • Language recognition

  • No free thesauri available
