  1. Collaborative Classifier Agents: Studying the Impact of Learning in Distributed Information Retrieval. Weimao Ke, Javed Mostafa, and Yueyu Fu. Submitted to SIGIR 2006. Contact: Weimao Ke, wke@indiana.edu, School of Library and Information Science, Indiana University Bloomington

  2. Layout • Introduction • Classification and Knowledge Distribution • Learning in a Distributed Environment • Experiment Design and Setup • Experimental Results • Future work

  3. Introduction • Distributed nature of knowledge • Collaboration is important, e.g., on the WWW • Distributed Information Retrieval • vs. traditional/centralized IR • e.g., intra-system retrieval fusion, cross-system communication, decentralized P2P networks, and distributed information storage and retrieval algorithms • Our focus: modeling distributed agent collaboration for text/information classification • Motivation: Why distributed? Why not centralized?

  4. Knowledge Distribution • In classification, knowledge = class vectors • Vector Space Model (VSM) • Traditional centralized approach: all class vectors in one place (global knowledge) • Our distributed approach: a subset of class vectors → each agent (local, partial, limited knowledge; see the sketch below) • Motivation: Why distributed? Why not centralized?
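As a minimal sketch of this partitioning (not the paper's code: the class name KnowledgePartition and the round-robin assignment scheme are illustrative assumptions), each agent ends up holding only a slice of the full set of VSM class vectors:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch: partition class vectors among N agents so each
 * agent holds only local, partial knowledge. Class vectors are plain
 * term-weight arrays in the Vector Space Model.
 */
public class KnowledgePartition {

    /** Round-robin assignment of class vectors to agents (one possible scheme). */
    static List<List<double[]>> partition(List<double[]> classVectors, int numAgents) {
        List<List<double[]>> agents = new ArrayList<List<double[]>>();
        for (int i = 0; i < numAgents; i++) {
            agents.add(new ArrayList<double[]>());
        }
        // A centralized system would keep every vector in one list;
        // here each agent sees roughly classVectors.size() / numAgents of them.
        for (int i = 0; i < classVectors.size(); i++) {
            agents.get(i % numAgents).add(classVectors.get(i));
        }
        return agents;
    }
}
```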

  5. Distributed Information Classification • [Figure: system model. Documents stream to an Admin Agent, which dispatches them to classifier agents; each agent's number (#) indicates its collaboration range; arrows denote collaboration requests and responses among agents.] • Notes: What is the agent topology? Does everyone know each other? Why not centralize classification in the Admin Agent? Why is a streaming model needed?

  6. Learn to Collaborate • A document arrives at an agent • Compare the doc to every local class • Max similarity score >= threshold? • Yes: label the doc (classified) • No: learning… ask for help… but WHO should I ask? (see the sketch below)
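A minimal sketch of this local classification step, assuming cosine similarity over VSM term-weight vectors (the names LocalClassifier and classify and the threshold parameter are illustrative, not from the paper):

```java
/**
 * Illustrative sketch: an agent's local classification step. Compare
 * the incoming document vector against every locally known class
 * vector; label the document if the best cosine similarity clears
 * the threshold, otherwise signal that a neighbor must be asked.
 */
public class LocalClassifier {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Returns the index of the best local class, or -1 if help is needed. */
    static int classify(double[] doc, double[][] localClasses, double threshold) {
        int best = -1;
        double bestScore = -1;
        for (int c = 0; c < localClasses.length; c++) {
            double s = cosine(doc, localClasses[c]);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return bestScore >= threshold ? best : -1; // -1: ask a neighbor for help
    }
}
```

A return value of -1 here is exactly the "No" branch of the slide's flow: the agent cannot classify the document locally and must learn whom to ask.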

  7. Pursuit Learning • Pursuit Learning: a reinforcement learning approach • Action probability vector P = [p1, ..., pN] • N: # actions = # neighbor agents • Exploration rate r: the rate of picking a helping agent at random, to "explore" without using learned knowledge • To predict a helping agent when local classification fails: draw a random number rand; if rand < r, randomly choose an agent for help; otherwise, predict based on vector P • To learn when another agent has helped: reward, and update vector P (see the sketch below) • Note: redundant description in the PL and NCL algorithms?
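A minimal sketch of the pursuit update. The slide only specifies the probability vector P and exploration rate r; the learning rate lambda, the per-neighbor running reward estimate, and all names are assumptions following a standard pursuit-learning formulation:

```java
import java.util.Random;

/**
 * Illustrative sketch of Pursuit Learning over neighbor agents.
 * P is the action probability vector (one entry per neighbor);
 * r is the exploration rate; lambda is an assumed learning rate.
 * On a reward, P is pushed toward the neighbor with the best
 * average reward so far (the "pursuit" step).
 */
public class PursuitLearner {
    final double[] p;         // action probabilities, sums to 1
    final double[] avgReward; // running reward estimate per neighbor
    final int[] tries;
    final double r, lambda;
    final Random rnd = new Random();

    PursuitLearner(int neighbors, double explorationRate, double learningRate) {
        p = new double[neighbors];
        avgReward = new double[neighbors];
        tries = new int[neighbors];
        java.util.Arrays.fill(p, 1.0 / neighbors);
        r = explorationRate;
        lambda = learningRate;
    }

    /** Pick a neighbor to ask: explore with probability r, else sample from P. */
    int chooseNeighbor() {
        if (rnd.nextDouble() < r) return rnd.nextInt(p.length);
        double u = rnd.nextDouble(), cum = 0;
        for (int i = 0; i < p.length; i++) {
            cum += p[i];
            if (u <= cum) return i;
        }
        return p.length - 1;
    }

    /** Learn from asking neighbor i (reward 1 = it helped, 0 = it did not). */
    void update(int i, double reward) {
        tries[i]++;
        avgReward[i] += (reward - avgReward[i]) / tries[i];
        int best = 0;
        for (int j = 1; j < p.length; j++) if (avgReward[j] > avgReward[best]) best = j;
        for (int j = 0; j < p.length; j++) { // pursue the current best action
            double target = (j == best) ? 1.0 : 0.0;
            p[j] += lambda * (target - p[j]);
        }
    }
}
```

Note that nothing here looks at document content: the learner only tracks which neighbors have been rewarding, which is why PL is cheap but not content sensitive.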

  8. Nearest Centroid Learning • Nearest Centroid Learning: content sensitive (vs. Pursuit Learning, which is not content sensitive) • Neighbor centroid vector C = [c1, ..., cN] • N: # actions = # neighbor agents • Each element ci is the centroid of the documents the i-th neighbor agent has helped with • Exploration rate r • To predict when one fails to classify a doc: draw a random number rand; if rand < r, randomly choose an agent for help; otherwise, find the nearest centroid in C to the current document and ask the corresponding agent for help • To learn when another agent has helped: update the corresponding centroid by folding in the document (see the sketch below)
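A minimal sketch, assuming cosine similarity for "nearest" and an incremental running mean for the centroid update (the class name, the Random-based exploration, and these two choices are illustrative assumptions, not details confirmed by the slide):

```java
import java.util.Random;

/**
 * Illustrative sketch of Nearest Centroid Learning. Each neighbor has
 * a centroid of the documents it has successfully helped with; when
 * local classification fails, the agent asks the neighbor whose
 * centroid is most similar to the current document, or explores
 * randomly with probability r.
 */
public class NearestCentroidLearner {
    final double[][] centroids; // one centroid per neighbor agent
    final int[] counts;         // how many docs each centroid has absorbed
    final double r;             // exploration rate
    final Random rnd = new Random();

    NearestCentroidLearner(int neighbors, int dims, double explorationRate) {
        centroids = new double[neighbors][dims];
        counts = new int[neighbors];
        r = explorationRate;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Content-sensitive choice: the neighbor whose centroid is nearest the doc. */
    int chooseNeighbor(double[] doc) {
        if (rnd.nextDouble() < r) return rnd.nextInt(centroids.length);
        int best = 0;
        double bestSim = -1;
        for (int i = 0; i < centroids.length; i++) {
            double s = cosine(doc, centroids[i]);
            if (s > bestSim) { bestSim = s; best = i; }
        }
        return best;
    }

    /** When neighbor i helps with a doc, fold the doc into its centroid. */
    void update(int i, double[] doc) {
        counts[i]++;
        for (int d = 0; d < doc.length; d++) {
            centroids[i][d] += (doc[d] - centroids[i][d]) / counts[i]; // running mean
        }
    }
}
```

The per-document similarity computation in chooseNeighbor is what makes NCL content sensitive, and also what makes it more expensive than Pursuit Learning.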

  9. Experiment Design • Reuters Corpus Volume 1 (RCV1) • Training set: 6,394 documents • Test set: 2,500 documents • Feature selection: 4,084 unique terms • Evaluation measures, per class: a = correctly assigned, b = incorrectly assigned, c = incorrectly rejected • Precision = a / (a + b) • Recall = a / (a + c) • F1 = 2 * P * R / (P + R) • Note on impact: small test collection size raises scalability questions; larger, more recent collections suggested.
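The measures translate directly from the contingency counts; a small self-contained example (the class name Effectiveness and the sample counts in main are illustrative):

```java
/**
 * The slide's evaluation measures, computed from the standard
 * per-class contingency counts: a = correctly assigned (true
 * positives), b = incorrectly assigned (false positives),
 * c = incorrectly rejected (false negatives).
 */
public class Effectiveness {
    static double precision(int a, int b) { return a == 0 ? 0 : (double) a / (a + b); }
    static double recall(int a, int c)    { return a == 0 ? 0 : (double) a / (a + c); }
    static double f1(double p, double r)  { return (p + r) == 0 ? 0 : 2 * p * r / (p + r); }

    public static void main(String[] args) {
        double p = precision(80, 20); // e.g., 80 correct vs. 20 wrong assignments
        double r = recall(80, 40);    // e.g., 40 relevant documents missed
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", p, r, f1(p, r));
    }
}
```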

  10. Hardware and Software Setup • MACCI: Multi-Agent Collaboration for Classification of Information • Cougaar agent architecture • Weka machine learning framework • Hardware: dual Intel Xeon 2.8 GHz CPUs, 3.5 GB RAM (2 GB reserved) • Software: Red Hat Linux AS 4, Java Runtime Environment 1.5.0

  11. Results – Effectiveness Baselines • [Figure: baseline classification effectiveness.] • Note on presentation/argument: use both micro- and macro-averaged F scores; ROC curves suggested (see the sketch below).
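The micro/macro distinction the note asks for, sketched under the assumption that per-class contingency counts (a, b, c) are available: micro-averaging pools the counts across classes (so frequent classes dominate), while macro-averaging computes F1 per class and takes the unweighted mean (so rare classes count equally).

```java
/**
 * Illustrative sketch of micro- vs. macro-averaged F1.
 * counts[k] = {a, b, c} for class k, using the slide's
 * contingency-count notation.
 */
public class AveragedF {

    static double microF1(int[][] counts) {
        int a = 0, b = 0, c = 0;
        for (int[] k : counts) { a += k[0]; b += k[1]; c += k[2]; }
        double p = a == 0 ? 0 : (double) a / (a + b);
        double r = a == 0 ? 0 : (double) a / (a + c);
        return (p + r) == 0 ? 0 : 2 * p * r / (p + r);
    }

    static double macroF1(int[][] counts) {
        double sum = 0;
        for (int[] k : counts) {
            double p = k[0] == 0 ? 0 : (double) k[0] / (k[0] + k[1]);
            double r = k[0] == 0 ? 0 : (double) k[0] / (k[0] + k[2]);
            sum += (p + r) == 0 ? 0 : 2 * p * r / (p + r);
        }
        return sum / counts.length;
    }
}
```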

  12. Results – Effectiveness of Learning • [Figure: effectiveness of learning, showing the Pursuit Learning (PL) curve against an optimal zone and a random baseline.]

  13. Results – Learning Progression and Latency • [Figure: cumulative classification effectiveness over time for Pursuit Learning and Nearest Centroid Learning.]

  14. Results – Learning Progression and Latency • [Figure: in-session classification effectiveness over time.]

  15. Results – Classification Efficiency • [Figure: classification efficiency against the efficiency baseline.]

  16. Results – Classification Efficiency • [Figure: classification efficiency.]

  17. Results – Efficiency vs. Effectiveness • [Figure: efficiency plotted against effectiveness.]

  18. Summary • Classification effectiveness decreases dramatically when knowledge becomes increasingly distributed. • Pursuit Learning • Efficient: does not analyze document content • Effective, although not content sensitive • "The Pursuit Learning approach did not depend on document content. By acquiring knowledge through reinforcements based on collaborations, this algorithm was able to build paths for documents to find relevant classifiers effectively and efficiently." • Nearest Centroid Learning • Inefficient: must analyze content • Effective • Learning did not converge in some experiments • Future work: a larger test set; making the exploration rate r a function instead of a constant

  19. Thank you! • Questions… • Comments… • A copy of the submitted paper is available at: http://tara.slis.indiana.edu/macci/docs/sigir-agent-ke.pdf
