1 / 19

Collaborative Classifier Agents

Studying the Impact of Learning in Distributed Document Classification. Collaborative Classifier Agents. b y Weimao Ke , Javed Mostafa , and Yueyu Fu { wke , jm , yufu }@ indiana.edu Laboratory of Applied Informatics Research Indiana University, Bloomington.

Download Presentation

Collaborative Classifier Agents

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Studying the Impact of Learning in Distributed Document Classification Collaborative Classifier Agents by WeimaoKe, JavedMostafa, and YueyuFu {wke, jm, yufu}@indiana.edu Laboratory of Applied Informatics Research Indiana University, Bloomington Laboratory of Applied Informatics Research (LAIR)

  2. Outline • Introduction • Motivation and research questions • Research model and methodology • Experimental design • Experimental results • Summary and future work

  3. Intro: Centralized classification A Centralized Classifier Classifier Knowledge

  4. Intro: Why not centralized? • Global repository is rarely realistic • Scalability • Intellectual property restrictions • … ?

  5. Intro: Knowledge distribution Arts Sciences Distributed repositories History Politics

  6. Intro: Distributed classification Distributed Classification Classifier Knowledge Assignment Classification

  7. Intro: Collaborative classification Collaboration for classification Classifier Knowledge Assignment Classification

  8. Motivation & Research Questions • Why distributed classification? • The nature of knowledge distribution • Advantages of distributed classification: fault tolerance, adaptability, flexibility, privacy, etc. • Problems/Questions • Classifier agents have to learn and collaborate. But how? • Effectiveness and efficiency of agent collaboration? • The dynamics of agent collaboration?

  9. Methodology: Agent simulation • To compare • Centralized approach (upper-bound) • Distributed approach without collaboration (lower-bound) • Distributed approach with collaboration • Two learning/collaboration algorithms: • Algorithm 1: Pursuit Learning • Algorithm 2: Nearest Centroid Learning • Two parameters • r: Exploration Rate • g: Maximum Collaboration Range • Evaluation Measure • Effectiveness: precision, recall, F measure • Efficiency: time for classification • Precision = a / (a + b) • Recall = a / (a + c) • F1 = 2 * P * R / (P + R)

  10. Exp design: Data collection • Reuters Corpus Volumes 1 (RCV1) • 37 classes / categories • “Knowledge” represented by the training class vectors • Training set: 6,394 documents • Test set: 2,500 documents • Feature selection: 4,084 unique terms

  11. Exp design: baselines Knowledge division 1 agent • Without collaboration Upper bound baseline 2 agents 4 agents … 37agents Lower bound baseline

  12. Exp design: agent collaboration Knowledge division 1 agent • With collaboration • Algorithms • Pursuit leaning • Nearest centroid learning • Parameters • Exploration rate • Max collaboration range Upper bound baseline 2 agents 4 agents … 37 agents Lower bound baseline

  13. Exp design: Agent simulation model 2 • Parameters for learning/prediction: • 1. Exploration rate (r) • 2. Max collaboration range (g) 1 2 Doc Distributor 1 3 Documents # Classifier agent 4 Collaboration request Collaboration response

  14. Exp results: non-collaboration baseline

  15. Exp results: Collaboration effectiveness PL optimal zone random

  16. Exp results: learning dynamics Learning converged?

  17. Exp results: Classification efficiency

  18. Exp results: Efficiency vs. Effectiveness

  19. Summary • Classification effectiveness decreases dramatically when knowledge becomes increasingly distributed. • Pursuit Learning • Efficient – without analyzing contents • Effective, although not content sensitive • “The Pursuit Learning approach did not depend on document content. By acquiring knowledge through reinforcements based on collaborations this algorithm was able to construct/build paths for documents to find relevant classifiers effectively and efficiently.” • Nearest Centroid Learning • Inefficient – analyzing content • Effective • Future work • Other text collections • Knowledge overlap among the agents • Local neighborhood

More Related