
The SMART System:

The SMART System: Progress Report on System Acquisition and Set-Up. March 8, 2000. IS 240: Principles of Information Retrieval. Danyel Fisher, Jonathan Henke, Jason Hong, Jonathan Huang, Jeane Stetson.



  1. The SMART System: Progress Report on System Acquisition and Set-Up • March 8, 2000 • IS 240: Principles of Information Retrieval • Danyel Fisher, Jonathan Henke, Jason Hong, Jonathan Huang, Jeane Stetson

  2. Background • Developed 1961-64 at Harvard • Maintained at Cornell University • Tested at every TREC conference • Emphasis: automatic retrieval (rather than interactive) • Vector-based analysis, tf x idf weighting • Current version: 13.3 (we have 11.0)

  3. Bibliography • Salton, Gerard. The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1971. • Salton, Gerard. “Developments in Automatic Text Retrieval.” Science, 1991 Aug 30, v253 n5023:974-980. • TREC Proceedings • SMART Staff. “User's Manual for the SMART Information Retrieval System.” Technical Report 71-95, revised April 1974. Cornell University (1974). • Buckley, C. “Implementation of the SMART Information Retrieval System.” Technical Report 85-686, Cornell University (1985).

  4. Indexing (Creating a Collection) • Document pre-parsing: recognize document structure and convert to a standard format • Finding & handling indexable information: parsing, stopword removal, stemming, term clustering, synonym dictionaries, etc. • Query handling: parsing, stopword removal, stemming, etc. (parallel to document handling)
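The parsing, stopword-removal, and stemming steps listed on this slide can be illustrated with a minimal Python sketch. This is not SMART's code (SMART is written in C): the stopword list, the `crude_stem` suffix stripper, and the `index_terms` function are hypothetical stand-ins chosen only to show the shape of the pipeline, which applies equally to documents and queries.

```python
import re

# Tiny illustrative stopword list (an assumption; not SMART's actual list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of stem so short words survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Parse text into lowercase tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(index_terms("The indexing of documents"))  # ['index', 'document']
```

Running the same function over a query string gives the "parallel to document handling" behavior the slide describes, so document and query terms land in the same vocabulary.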

  5. Indexing (Creating a Collection) • Retrieval methods: term weighting and similarity evaluation • Default: standard tf x idf weighting, vector inner product • Output format & display
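As a rough illustration of the default retrieval method named on this slide, the sketch below weights terms by tf x idf and scores documents against a query with a vector inner product. It assumes one common idf variant, log(N / df); SMART supports several weighting schemes, and the function names here (`tfidf_vectors`, `inner_product`, `rank`) are hypothetical, not taken from the SMART source.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (sparse vectors, idf table)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / df).
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def inner_product(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * v[t] for t, w in u.items() if t in v)


def rank(query_tokens, docs):
    """Return (score, doc_index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scores = [(inner_product(q, d), i) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


docs = [["vector", "retrieval"], ["vector", "index"], ["stem", "index", "index"]]
print(rank(["retrieval"], docs)[0][1])  # 0
```

Because the inner product is not length-normalized, longer documents can score higher simply by repeating terms; cosine normalization is a common alternative when that matters.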

  6. Indexing: Customizable Elements • Document location & format • Indexable information & index format • Query format • Retrieval method (document/query comparison) • Output/display format

  7. System Architecture • 350 source files • 45,000 lines of code • Can include user-programmed modules

  8. Set-up Procedure • Download source code: ftp://ftp.cs.cornell.edu/pub/smart • Compile • Look for documentation • Indexing completed using default settings • Unable to complete a query yet • Unable to examine the index • Cannot verify success of indexing!

  9. System Documentation • Minimal • Poorly explained • Cryptic • Uses its own specialized terminology

  10. Problems Faced • Virtually every feature is customizable • Somewhere there are people who know how to do the customization... • “SMART suffers from the advantages and disadvantages of most academic research software. It's designed to be extremely flexible (as long as you know what you're doing!)” - SMART manual • Documentation is too high-level.

  11. Further Steps • Complete a query using default settings. • Identify specific files for adjusting each customizable feature. • Determine how to modify each feature.

  12. Recommendations & Advice • Find someone who has actually worked with the system before. • Understanding operation requires examination of C source code. • Customization requires modifying / creating C code.
