What won’t change
  • Harvest’s basic design
  • SOIF for inter-component communication
  • Development model

http://harvest.sourceforge.net/

General Goals
  • Increase search speed
  • Shift focus to HTTP and HTML
  • Internationalisation
  • Improve scalability
  • Increase availability
  • Improve access control

General Goals
  • Integration of other search systems into Harvest system
  • Remove all non-GPL components
  • Improve ranking
  • Promote Harvest to attract more users and developers

Gatherer
  • Shift focus to HTTP
  • Improve gathering over slow connections
  • Improve HTTP gatherer
  • Create multiple Gatherers “on the fly” where possible
  • Evaluate larbin and curl
  • Migrate from GDBM to Sleepycat’s Berkeley DB for local disc cache management

Gatherer
  • Remove local disc cache
  • Implement candidate selection filter for HTTP enumerator based on MIME type
  • Trust MIME type sent by HTTP servers
  • Add HTTPS support
  • Evaluate improvements of HTTP/1.1 over HTTP/1.0
  • Replace unnesters with exploders
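
One way the MIME-type candidate filter could look is sketched below in Python. The allow-list, the function names and the choice to skip responses without a Content-Type header are assumptions made for illustration; none of them come from Harvest itself. The sketch trusts the Content-Type value reported by the server, as the slide proposes.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list; the slides do not say which types would be accepted.
ALLOWED_TYPES = ("text/*", "application/pdf")


def normalise_content_type(header_value):
    """Return the bare lower-case MIME type from a Content-Type header value."""
    # Drop parameters such as "; charset=ISO-8859-1".
    return header_value.split(";", 1)[0].strip().lower()


def is_candidate(content_type_header, allowed=ALLOWED_TYPES):
    """Decide whether an enumerated URL should be passed on for gathering."""
    if not content_type_header:
        # No type reported by the server: skip (this policy is an assumption).
        return False
    mime = normalise_content_type(content_type_header)
    return any(fnmatchcase(mime, pattern) for pattern in allowed)
```

An enumerator could call `is_candidate()` on the header of a HEAD response and fetch the body only when it returns true.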

Gatherer
  • Improve object storage system
  • Improve expiring objects
  • Evaluate viability of an expire daemon
  • Split file: and news: rootnodes into leafnodes
  • Remove All-Templates
  • Make SOIF objects shareable between Gatherer and Broker if possible
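
To make the expiry items more concrete, here is a minimal Python sketch of the sweep an expire daemon might run periodically. It assumes each stored object records an update time and a time-to-live in seconds; the dictionary layout and the function names are hypothetical and do not reflect the Gatherer's actual storage.

```python
import time


def expired_urls(objects, now=None):
    """Return the URLs whose update time plus time-to-live is not in the future.

    `objects` maps URL -> (update_time, ttl_seconds); both fields are
    hypothetical stand-ins for whatever the object store records.
    """
    now = time.time() if now is None else now
    return [url for url, (updated, ttl) in objects.items() if updated + ttl <= now]


def sweep(objects, now=None):
    """Delete expired objects in place and return the URLs that were removed."""
    removed = expired_urls(objects, now)
    for url in removed:
        del objects[url]
    return removed
```

A daemon would call `sweep()` on a timer, which keeps expiry independent of gathering runs.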

Summarizer
  • Shift focus to HTML
  • Improve existing HTML summarizers
  • Create HTML summarizer which “understands” HTML
  • Improve support for Microsoft Office documents

Broker
  • Add Index Data’s Zebra as full-text indexer
  • Implement method to retrieve an SOIF object by URL
  • Improve temporary file/directory handling used for paging search results
  • Improve SOIF object storage
  • Extend “shell indexer” functionality
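
Retrieval of a SOIF object by URL could be as simple as a lookup table keyed on the URL in each object header. The Python sketch below assumes the usual SOIF layout (an `@TYPE { URL` header line, then `Name{size}:` followed by a tab and `size` bytes of value, then a closing `}`). The names `parse_soif` and `index_by_url` and the sample data are hypothetical and not part of the Broker.

```python
import re

# Assumed SOIF layout: "@TYPE { URL", attribute lines "Name{size}:<TAB>value", "}".
HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)\s*\n")
ATTR = re.compile(rb"\s*([^\s{}]+)\{(\d+)\}:\t")


def parse_soif(data):
    """Yield (template_type, url, attributes) for each object in a SOIF byte string."""
    pos = 0
    while True:
        head = HEADER.search(data, pos)
        if head is None:
            return
        template, url = head.group(1).decode(), head.group(2).decode()
        pos = head.end()
        attrs = {}
        while True:
            match = ATTR.match(data, pos)
            if match is None:
                break  # reached the closing "}" (or malformed input)
            size = int(match.group(2))
            # The declared size lets a value contain newlines safely.
            attrs[match.group(1).decode()] = data[match.end():match.end() + size]
            pos = match.end() + size
        yield template, url, attrs


def index_by_url(data):
    """Build a URL -> (template_type, attributes) lookup table."""
    return {url: (template, attrs) for template, url, attrs in parse_soif(data)}


SAMPLE = (
    b"@FILE { http://www.example.com/a.html\n"
    b"Title{5}:\tHello\n"
    b"Type{4}:\tHTML\n"
    b"}\n"
    b"\n"
    b"@FILE { http://www.example.com/b.html\n"
    b"Title{11}:\tTwo\nlines!!\n"
    b"}\n"
)

index = index_by_url(SAMPLE)
template, attrs = index["http://www.example.com/a.html"]
print(template, attrs["Title"].decode())
```

An in-memory dictionary only illustrates the idea; a real Broker would presumably keep such an index on disc.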

Broker
  • Implement a user interface in PHP
  • Separate data from metadata when storing SOIF objects
  • Minimise size of Registry
  • Use cookies to save user preferences of the search interface
  • Evaluate and write SOIF filter for Namazu

Broker
  • Evaluate RDBMS (PostgreSQL, MySQL)
  • Evaluate XQuery and SOAP

Documentation
  • Switch from LinuxDoc to DocBook for manual and FAQ

Problems
  • PostScript and PDF summarizers
  • Apache’s MultiViews
  • If-Modified-Since (IMS) gathering
  • Stemming and Soundex are language-dependent
  • Language recognition
  • No free thesauri available
