This paper by Hemali Majithia explores a caching mechanism designed to enhance the performance of agent-based distributed information retrieval (DIR) systems. By caching new queries and their results, the solution minimizes redundant searches, improving both speed and resource utilization. The two-level caching system employs strategies such as Least Recently Used (LRU) and Least Frequently Used (LFU) algorithms, alongside a similarity metric for efficient query handling. By leveraging previous query results, the system can significantly reduce round trip time and improve retrieval efficiency.
Query Caching in Agent-based Distributed Information Retrieval
Hemali Majithia - CADIP, UMBC
Problem Definition
• DIR (IR) systems access their collections to perform searches and answer queries
• Query resolution on large corpora is expensive in terms of time and resources
• Similar queries produce similar results
• Repetitive and redundant searching of the collections
• Resource wastage and inefficiency
• Solution: "CACHING QUERIES"
Solution
• Caching Mechanism
  • Cache new queries along with their results
  • Answer future similar queries using the cached queries
• New Query
  • A query which has not been answered before
• Similar Query
  • A query which is identical or similar to a query already in the cache
• Emphasis
  • If similar queries exist in the cache, their results can be reused for the incoming query; an exact match is not required
  • Retrieval time grows linearly with collection size
Caching Mechanism
• Two-level Caching Mechanism
  • First level: Exact Match
  • Second level: Inverted Index of the queries
• Caching Algorithm
  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • Lowest Relative Value (LRV)
• Similarity Metric
  • Cosine Similarity
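The two-level lookup described on this slide can be sketched roughly as follows. This is a minimal illustration, not the CARROT-II implementation: the class name `TwoLevelQueryCache`, the whitespace tokenization, the 0.8 default similarity threshold, and the choice of LRU (rather than LFU or LRV) for eviction are all assumptions made for the example.

```python
import math
from collections import Counter, OrderedDict, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLevelQueryCache:
    """Hypothetical sketch: exact match first, then an inverted index of cached queries."""

    def __init__(self, capacity: int = 100, threshold: float = 0.8):
        self.capacity = capacity            # assumed to be at least 1
        self.threshold = threshold          # assumed similarity cut-off
        self.entries = OrderedDict()        # normalized query -> results, oldest first
        self.index = defaultdict(set)       # term -> cached queries containing that term

    @staticmethod
    def _normalize(query: str) -> str:
        return " ".join(query.lower().split())

    def get(self, query: str):
        key = self._normalize(query)
        # First level: exact match on the normalized query string.
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            return self.entries[key]
        # Second level: collect cached queries sharing at least one term,
        # then keep the one with the highest cosine similarity.
        vec = Counter(key.split())
        candidates = set().union(*(self.index.get(t, set()) for t in vec))
        best, best_sim = None, 0.0
        for cached in candidates:
            sim = cosine(vec, Counter(cached.split()))
            if sim > best_sim:
                best, best_sim = cached, sim
        if best is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best)
            return self.entries[best]
        return None                         # miss on both levels

    def put(self, query: str, results) -> None:
        key = self._normalize(query)
        if key in self.entries:
            self.entries[key] = results
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            # LRU eviction: drop the least recently used query and unindex its terms.
            old, _ = self.entries.popitem(last=False)
            for t in set(old.split()):
                self.index[t].discard(old)
                if not self.index[t]:
                    del self.index[t]
        self.entries[key] = results
        for t in set(key.split()):
            self.index[t].add(key)
```

With this sketch, a query cached as "distributed information retrieval" would also answer "distributed information retrieval systems" at a threshold of 0.6, since the two term vectors have a cosine similarity of about 0.87. Swapping in LFU or LRV would only change which entry `put` evicts.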
Caching in CARROT–II
[Figure: diagram labels are Node I, Node II, Query Agent, C2 Agents, Primary cache, Secondary Cache. Numbered steps shown in the diagram:]
1. User query
2. Lookup
3. MISS
4. Query forwarded
5. Miss
6. Query forwarded to best C2
7. Lookup
8. HIT
9. Update cache
10. Results returned
11. Response
Metrics for Evaluation of Caching Mechanism
• Efficiency
  • Round Trip Time (RTT) = total time to answer the queries fired at the system
  • Hit Rate = measured for each agent cache and in total
  • Cost of caching = the overhead caused by caching (assuming that the hit rate is 0)
• Effectiveness
  • Precision = fraction of retrieved documents that are relevant
  • Recall = fraction of relevant documents that are retrieved
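The effectiveness metrics and the hit rate above reduce to simple ratios. The snippet below is a small sketch under assumed inputs: the function names and the document identifiers are made up for illustration and do not come from the deck.

```python
def precision(retrieved, relevant) -> float:
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant) -> float:
    """Fraction of relevant documents that are retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def hit_rate(hits: int, lookups: int) -> float:
    """Fraction of cache lookups that were answered from the cache."""
    return hits / lookups if lookups else 0.0


# Hypothetical example: 4 documents retrieved, 3 documents relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d3", "d5"]
p = precision(retrieved_docs, relevant_docs)   # 2 / 4
r = recall(retrieved_docs, relevant_docs)      # 2 / 3
h = hit_rate(hits=30, lookups=120)             # 30 / 120
```

Comparing precision and recall for answers served from the cache against answers from a full search would indicate how much effectiveness is traded for the lower round trip time.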