
State Department Cables Information Retrieval System

State Department Cables Information Retrieval System. Fall 2007, LBSC 796. Erica Cooper, Linda Melchor, Chris Reed, Jo-Han Rong, Dave Rouff, Jess Snyder. Overview: about the collection; nature of the expected users; about the search tool; batch evaluation results; user study results.


Presentation Transcript


  1. State Department Cables Information Retrieval System • Fall 2007, LBSC 796 • Erica Cooper, Linda Melchor, Chris Reed, Jo-Han Rong, Dave Rouff, Jess Snyder

  2. Overview • About the collection • Nature of the expected users • About the search tool • Batch evaluation Results • User Study Results • Next Steps (Features to add, given time to do it)

  3. About the Collection • The U.S. State Department • Branch of the federal government responsible for U.S. foreign relations, diplomatic policy, and protecting U.S. citizens abroad. • Diplomatic communications from 1973 to 1975 • A behind-the-scenes look at international relations. • World events of the time: end of the Vietnam War, Watergate, Bush (senior) becomes ambassador to China, then DCI

  4. Anticipated User Base • Given the nature of the collection, the IR system will be used by researchers who want to see how U.S. opinion changes as events unfold over time. • Users will be looking not for a single message on a topic, but for all messages on that topic. • Users may not know the telegram format for addresses and TAGS.

  5. About the Search Tool • The State Cables IR system was developed in Java using the following resources: • NetBeans IDE • Apache libraries: • Lucene toolkit (information retrieval tools) • Digester (XML import) • Two major components: • Importing XML-formatted messages and building the index • User GUI and index searcher
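
The import-and-index component can be illustrated with a small sketch. This is not the team's Java code: it uses Python's standard library as a stand-in for Digester and Lucene, and the XML layout (`cables`, `cable`, `subject`, `body`) is a hypothetical schema, since the slides do not show the real one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout; the actual cable schema is not given in the slides.
SAMPLE_XML = """
<cables>
  <cable id="1"><subject>Trade talks in Tokyo</subject><body>Delegation arrived in Tokyo.</body></cable>
  <cable id="2"><subject>Oil exports</subject><body>Baghdad discussed oil exports.</body></cable>
</cables>
"""

def import_cables(xml_text):
    """Parse the XML and return {cable id: searchable text}."""
    root = ET.fromstring(xml_text.strip())
    return {
        cable.get("id"): f"{cable.findtext('subject', '')} {cable.findtext('body', '')}"
        for cable in root.iter("cable")
    }

def build_index(docs):
    """Build a simple inverted index: lowercase token -> set of cable ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:")].add(doc_id)
    return index

def search(index, term):
    """Return the sorted ids of cables containing the term."""
    return sorted(index.get(term.lower(), set()))

docs = import_cables(SAMPLE_XML)
index = build_index(docs)
```

A real index would also store scores and field boundaries; the sketch only shows the two stages named on the slide, import first and index lookup second.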

  6. Benefits: Geographic and TAGS abstraction • [Figure: numeric values for the example terms Japan/Tokyo, Africa/XZ, and Iraq/IZ; layout not recoverable from the transcript] • In the early 1970s, telegram authors were encouraged to be brief, so they left out key terms assumed to be known to the recipient.
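
One way to picture the abstraction feature is as query expansion: a term the user types is joined with related terms that a terse telegram might contain instead. The sketch below is an assumption about how that could work, not the team's implementation. The pairs Japan/Tokyo, Africa/XZ, and Iraq/IZ come from the labels on this slide; the table names and the `expand_query` helper are hypothetical.

```python
# Hypothetical lookup tables seeded with the pairs shown on the slide.
GEO_ABSTRACTION = {"japan": ["Tokyo"]}
TAGS_ABSTRACTION = {"iraq": ["IZ"], "africa": ["XZ"]}

def expand_query(terms):
    """Join each user term with its geographic and TAGS equivalents using OR."""
    expanded = []
    for term in terms:
        expanded.append(term)
        key = term.lower()
        expanded.extend(GEO_ABSTRACTION.get(key, []))
        expanded.extend(TAGS_ABSTRACTION.get(key, []))
    return " OR ".join(expanded)

query = expand_query(["Iraq", "oil"])
```

Here `query` becomes `Iraq OR IZ OR oil`, so a cable that carries only the TAGS code can still match a search on the country name.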

  7. Batch Evaluation Results • The implicit OR between search terms and abstraction terms increases recall but lowers precision.
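
The recall/precision trade-off can be made concrete with a toy calculation. The document ids and result sets below are invented for illustration and are not the team's batch evaluation data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: four cables are relevant to the topic.
relevant = {1, 2, 3, 4}
narrow_results = {1, 2}              # e.g. a strict query with no expansion
or_results = {1, 2, 3, 5, 6, 7}      # e.g. the same query with OR-expanded terms

narrow = precision_recall(narrow_results, relevant)
expanded = precision_recall(or_results, relevant)
```

In this toy case the narrow query scores precision 1.0 and recall 0.5, while the OR-expanded query scores precision 0.5 and recall 0.75: more relevant cables are found, at the cost of more irrelevant ones in the list.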

  8. Our System

  9. NARA’s System

  10. Side-by-side Comparisons

  11. User Study Results • Do novice users find the system easy to learn? • All the volunteers considered the system easy enough for a novice to use. However, one volunteer stated, “A person used to Google would expect more.” • Can users easily learn to formulate effective queries using our system? • The responses were yes. • However, observations showed that we should emphasize that Boolean queries can be used: initial result sets were very large because queries were vague.

  12. User Study Results • Are there common mistakes or misunderstandings that can be addressed for a better design? • “AND” should be automatically capitalized so that it is understood as a Boolean operator, not a query keyword. • One volunteer said, “It would be nice to be able to see the whole Subject/Title,” so that it is easier to select which messages she wants to read. • What are their expectations for the system? • Users found that the system met their expectations. • One volunteer stated that the system was more “user friendly compared to NARA's current system.”
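
The auto-capitalization suggestion could be handled by normalizing the query string before it is parsed. The sketch below is one possible approach, not the team's code; the `normalize_operators` helper and its `operators` parameter are hypothetical names.

```python
import re

def normalize_operators(query, operators=("and",)):
    """Uppercase standalone Boolean operators so the parser treats them as operators."""
    pattern = r"\b(" + "|".join(re.escape(op) for op in operators) + r")\b"
    return re.sub(pattern, lambda m: m.group(0).upper(), query, flags=re.IGNORECASE)

fixed = normalize_operators("vietnam and ceasefire")
```

The word-boundary match leaves words such as "Portland" untouched. The trade-off is that a user can no longer search for the literal word "and", which seems acceptable for a word that carries so little meaning in a query.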

  13. User Study Results • What would they like to see in the design of the system? • Being able to hit the “Enter” key instead of having to click on the “Perform Search” button • Limit the number of hits per search • Make the interface wider • Provide summaries of articles in the results • Provide feedback that tells users their request is being processed • A layout that separates the results to make them easier to read

  14. User Study Results • Were the added features, “geographic abstraction” and TAGS abstraction, used? If so, were they useful? • One volunteer used the added features, but could not tell if they worked. • Any suggestions or comments? • Highlight search terms • Jazz it up more visually

  15. Next Steps • Bugs to fix and features to add given additional time • Option to sort results by date rather than score • The telegram DTG format is currently interpreted as a string, resulting in string-based sorting. • Warning for large result sets and an option to cancel the search before committing to wait • Pull the search function out of the button click/hit list click handlers so it persists past the click event • Currently the system re-queries when a message is selected from the hit list • Option to export results to a file • Web GUI • Option to accept or reject proposed abstraction tools • Ability to recognize multi-word search terms
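
The DTG sorting bug is easy to reproduce in a few lines. The sketch below assumes a date-time group of the form `DDHHMMZ MON YY` (for example `150900Z MAR 74`); the slides do not show the exact format used in the collection, so both the format and the `parse_dtg` helper are assumptions.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

def parse_dtg(dtg):
    """Parse an assumed 'DDHHMMZ MON YY' date-time group into a datetime."""
    stamp, month, year = dtg.split()
    day, hour, minute = int(stamp[0:2]), int(stamp[2:4]), int(stamp[4:6])
    # Two-digit years are mapped to 19xx, which fits a 1973-1975 collection.
    return datetime(1900 + int(year), MONTHS[month.upper()], day, hour, minute)

dtgs = ["150900Z MAR 74", "281200Z NOV 73", "011530Z JAN 75"]

by_string = sorted(dtgs)               # compares characters, so the day field dominates
by_date = sorted(dtgs, key=parse_dtg)  # compares actual points in time
```

Sorting the raw strings puts the January 1975 message first because its day field is "01", while sorting on the parsed value gives chronological order. Storing a sortable form of the date in the index (for example `YYYYMMDDHHMM`) would let the existing string sort produce the right order too.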

  16. Backup GUI Screenshot • …TBD
