1 / 10

Public Email Conversations

CEES: Intelligent Access to Public Email Conversations William Lee, Hui Fang, Yifan Li, ChengXiang Zhai University of Illinois at Urbana-Champaign. Clustering and Similarity Function. Goal: Find commonly-discussed topics from a set of conversations (threads)

emmarichard
Download Presentation

Public Email Conversations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CEES: Intelligent Access to Public Email Conversations William Lee, Hui Fang, Yifan Li, ChengXiang Zhai University of Illinois at Urbana-Champaign Clustering and Similarity Function • Goal: Find commonly-discussed topics from a set of conversations (threads) • Use agglomerative clustering with complete link • Learn similarity functions from different “perspectives” of threads: • authors, date, subject, contents, contents without quote, first message, reply, reply without quote. • Use Linear and Logistic Regression to learn the combined similarity function Public Email Conversations Existing technologies Architecture Browse Search • Information within newsgroups or mailing lists has largely been underutilized. • For now, access to those data restricted to traditional searching and browsing. • Mail traffic also grows exponentially. • Can we access those information more effectively? • CEES = Conversation Extraction and Evaluation Service • Provide a general framework for email-related research • Integrate with popular open source projects such as Lucene, Hibernate, Tapestry, Weka, and more… • Features object-to-relational mapping of mail metadata, mail threading, flexible indexing, conversation clustering, and a web-base GUI Conclusion Clustering Results Conversation Map • Clustering • Learning the similarity function can be effective in combing different “perspectives” • Conversation Map • Can give overview of a conversation group • Effective use of 2-D space • Future Work • Derive better algorithms to learn the similarity function • Faster clustering algorithms that work on mining patterns in conversations • Summarization of conversation clusters • Use 3 Computer Science class newsgroups from Univ. of Illinois at Urbana-Champaign for corpus • Three different human taggers to group messages into subtopics • 3-way cross validation, using one group’s judgment file as training set and test on the other two. • Use class and cluster entropy as comparison metric • Conversation visualization derived from Treemap • Clusters sorted by two extra time dimensions: Intra-Cluster Time and Inter-Cluster Time • Allows user to adjust the similarity threshold -- “zoom” to the more similar threads

  2. Build Autonomic Data Integration Systems! Autonomic Data Integration Systems Autonomic Computing • Shift workload from administrators & developers onto system • Self-tuning, self-maintaining, self-recovering from failures, self-improving • Examples: • Self-tuning databases and query optimizers • Self-profiling software for bug and bottleneck detection • Self-recovering distributed systems Data Integration Systems • Many complex components: • Global schema • Sources, wrappers, and source schemas • Semantic mappings between source and global schemas • Currently built (semi-)manually in error-prone and laborious process • Extremely difficult to maintain over changing sources

  3. The AIDA Project Automatic Integration of DAta • Fast system deployment • Minimal human effort • Automatic adjustment to changes • Continuous improvement • Improving Automatic Methods • Schema & ontology matching [SIGMOD-01, WWW-02, SIGMOD-04] • Entity matching & integration [IJCAI-03, IEEE Intelligent-03] • Global interface construction [SIGMOD-04] • Reducing Costs of System Construction • Mass collaboration to build systems [WebDB-03, IJCAI-03] • Monitoring Data Sources and Maintaining DI Systems • Recognition of changes in source data • Detection and repair of failures in DI system components The Focus of This Talk

  4. The MOBS Approach Mass COllaboration to Build Systems • Shift workload from developers onto user population • Build system accurately with low individual effort Developers Automatic Techniques System Initialization Source Discovery Form Recognition Query Translation Attribute Matching MOBS MOBS User Population

  5. MOBS Applied to the Deep Web • MOBS for Query Interface Matching • Decompose task into binary statements • Initializesmall functioning system • Solicit and merge user answers to expand the system • Statements for Matching “Writer” • “Writer = Author” ? • “Writer = Title” ? • “Writer = Price” ? Author: Title: Price: Year: Title: Author: Title: Writer: Title: Authors: Cost: Price: Price: Pub:

  6. How to Solicit User Answers • Incentive Models • Leverage a monopoly or better-service system • Piggy-back on a helper application • Deploy in a volunteer or community environment 1 0 HOOP 3 Is this form a Book Sales source? NO YES Barnes & Noble Author Title 2 Pub 0 Price

  7. How to Merge User Answers • Bayesian Learning • Use a dynamic Bayesian network as a generative feedback model • Estimate user behavioral parameters from evaluation answers • Converge statements from teaching answers Author: Title: Price: Year: Title: Author: Title: Writer: Title: Cost: Authors: Price: Price: Pub:

  8. Applicability of the MOBS Approach We Have Applied MOBS in Various Settings… • Scale: from a small community intranet to a highly trafficked website • Users: from cooperative expert volunteers to unpredictable novice users … and to Several DI Tasks • Deep Web: Form Recognition, Interface Matching • Surface Web: Hub Discovery, Data Extraction, Mini-CiteSeer

  9. Simulation and Real-World Results

  10. Conclusion & Future Work MOBS is an Effective Data Integration Tool • Requires small start-up and administrative costs • Solicits minimal effort per user • Constructs system accurately • Complements existing DI techniques • Applies to various scenarios and DI domains Future Work • Leverage implicit feedback • Intelligently maintain system • More tightly integrate with existing DI techniques • Deploy compelling real-world applications http://anhai.cs.uiuc.edu/home/projects/aida.html

More Related