
Jolyon Hunter, cs91jh@surrey.ac.uk, jrth.co.uk, Tuesday 6th May 2003



Presentation Transcript


  1. Text Classification of USENET messages for a Conversation Visualisation System. Final Year Project, Final Presentation. Jolyon Hunter, cs91jh@surrey.ac.uk, www.jrth.co.uk, Tuesday 6th May 2003

  2. Introduction
     • Aim
     • “To investigate how messages and conversations on USENET newsgroups can be classified automatically as part of a system to visually represent online discussions.”
     • Objectives
     • To review systems which visualise online discussions, enabling the identification of phenomena to be visualised
     • To analyse a 250,000+ word corpus of text to try to identify potential cues for classification
     • To specify and design a system for automatic classification of messages/conversations
     • To implement, test and evaluate this system
     TEXT CLASSIFICATION OF USENET MESSAGES FOR A CONVERSATION VISUALISATION SYSTEM

  3. Conversation Visualisation Systems? For example… “PeopleGarden” (Xiong, Rebecca & Donath, Judith 1999 “PeopleGarden: Creating Data Portraits for Users” MIT Media Laboratory http://smg.media.mit.edu/~becca/). Others include: “Loom” (Donath et al), “Netscan” (Smith), “Conversation Map” (Sack) and “CodeZebra” (Diamond et al).

  4. Phenomena to Visualise… and how to do it!
     • Emotion (“Happy”, “Sad”)
     • Agreement/Disagreement (“Argument”)
     • Involvement – Sense of Community
     • Character traits of users
     and many more…
     How to Classify? Automated Text Analysis
     • “Smokey” (Spertus)
     • “WebSOM” (Kohonen)
     • “CLUTO” (Karypis)

  5. Analysis Overview
     • HOW? Initial observations (phenomena + features); in-depth corpus analysis
     • WHAT? 6000+ messages from various newsgroups (4 million+ words)
     • UniS/CodeZebra Workshop – features (words)
     • Using System Quirk to extract words; frequency counting (Kontext) >> Relative Frequencies
     • Using gCLUTO to visualise data for interpretation
     • WHY? Formulate programmable rules to code into a system
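The frequency-counting step on slide 5 can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the System Quirk/Kontext tooling the project used. The tokenisation rule (lower-cased runs of letters and apostrophes) and the function name `relative_frequencies` are assumptions.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return {word: occurrences / total words} for one message.

    Assumption: a "word" is a lower-cased run of letters or apostrophes;
    the original tools may have tokenised differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: n / total for word, n in counts.items()}


# Hypothetical message: 8 words, "agree" occurs twice.
message = "I agree with you. I really do agree."
rel = relative_frequencies(message)
print(rel["agree"])  # 2 / 8 = 0.25
```

Dividing by message length is what lets a fixed threshold be compared across long and short messages.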

  6. gCLUTO Visualisations
     • Visualise clusters and the relationships between clusters
     • Possible to see patterns or heuristics to help derive rules
     • CLUTO has potential for future use within a system to automatically classify text, e.g. real-time clustering

  7. Analysis: Creating Rules
     • Possible to derive example rules from analysis
     • More analysis – random sample using 6 classes:
     • Similar patterns emerge
     • Example rules also >>> SYSTEM!

  8. System Development
     • Process Model of Software Engineering: Requirements, Design, Implementation, Testing and Evaluation
     • “System”: System Quirk > Rules > Program > CLASSIFICATION
     • Rule-Based Processor: IF..THEN.. rules coded into a Perl program to produce classifications

  9. Generic Conversation Visualisation System

  10. “Message Text Analysis” Module

  11. Perl Code: Key points
      IF…THEN… RULES (as seen earlier)
      CLASS COUNTER:
          # Count one piece of evidence when the cue word is frequent enough.
          if (($word eq "agree") && ($relative{$word} > 0.003)) { $AGREEMENT++; }
      CLASSIFICATIONS…
          # Assign a class once its counter reaches 2; a later check overwrites an earlier one.
          if ($AGREEMENT >= 2) { $classification = "AGREEMENT"; }
          if ($ARGUMENT >= 2) { $classification = "ARGUMENT"; }
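The fragments on slide 11 can be assembled into a small runnable sketch. It is a Python rendering for illustration, not the project's Perl program. Only the cue word "agree", the 0.003 threshold and the "reaches 2" test come from the slide. Every other cue word, the `INCONCLUSIVE` default, and the choice to count each distinct cue word once are assumptions.

```python
import re
from collections import Counter

# Assumption: only ("agree", 0.003) is taken from the slide; the other
# cue words are hypothetical placeholders for rules derived from analysis.
RULES = {
    "AGREEMENT": {"agree": 0.003, "yes": 0.003, "exactly": 0.003},
    "ARGUMENT": {"disagree": 0.003, "wrong": 0.003, "nonsense": 0.003},
}
MIN_EVIDENCE = 2  # the slide assigns a class when its counter is >= 2


def relative_frequencies(text):
    """Return {word: occurrences / total words}; tokenisation is an assumption."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: n / len(words) for word, n in counts.items()}


def classify(text):
    """Apply the IF..THEN rules and return a class label."""
    rel = relative_frequencies(text)
    counters = {label: 0 for label in RULES}
    for label, cues in RULES.items():
        for word, threshold in cues.items():
            # IF the cue word is present above its threshold THEN add evidence.
            if rel.get(word, 0.0) > threshold:
                counters[label] += 1
    # Assumed default when no class gathers enough evidence.
    classification = "INCONCLUSIVE"
    # Checked in order, so a later class overwrites an earlier one,
    # mirroring the sequential if-statements on the slide.
    for label in RULES:
        if counters[label] >= MIN_EVIDENCE:
            classification = label
    return classification


print(classify("Yes, I agree. Exactly right."))
print(classify("I agree."))
```

Because the checks run in sequence, a message that satisfies both classes ends up labelled with the last one; a real system might prefer to compare the counters instead.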

  12. Testing & Evaluation
      • Ten sample messages, each either “Agreement” or “Disagreement”
      • Small sample
      • Key excerpts given to human testers (ten people), who were asked to rate them
      • System vs. Humans!
      • System correct 3 times; most results inconclusive
      • Human responses correlate with system, but ambiguities also exist
      • Conclusions? Results not conclusive but show promise > Larger sample; more research
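One way to tally a "System vs. Humans" comparison like the one on slide 12 is sketched below. The slides do not say how human ratings were aggregated, so the majority vote, the tie handling, the label strings and the function names are all assumptions, and the sample data is invented for illustration only (it is not the project's results).

```python
from collections import Counter


def majority_label(ratings):
    """Return the most common human rating, or None on a tie or empty list."""
    ranked = Counter(ratings).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # humans were split: treat as ambiguous
    return ranked[0][0]


def tally(system_labels, human_ratings):
    """Count matches, mismatches and inconclusive cases per message."""
    outcome = Counter()
    for system, ratings in zip(system_labels, human_ratings):
        consensus = majority_label(ratings)
        if system == "INCONCLUSIVE" or consensus is None:
            outcome["inconclusive"] += 1
        elif system == consensus:
            outcome["match"] += 1
        else:
            outcome["mismatch"] += 1
    return outcome


# Invented example: three messages, three raters each.
system_labels = ["AGREEMENT", "INCONCLUSIVE", "DISAGREEMENT"]
human_ratings = [
    ["AGREEMENT", "AGREEMENT", "DISAGREEMENT"],
    ["DISAGREEMENT", "DISAGREEMENT", "AGREEMENT"],
    ["AGREEMENT", "AGREEMENT", "DISAGREEMENT"],
]
result = tally(system_labels, human_ratings)
print(dict(result))
```

Keeping "inconclusive" as its own bucket separates cases where the system declined to decide from cases where it decided wrongly.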

  13. Recap: Mission Accomplished?
      • Aim
      • “To investigate how messages and conversations on USENET newsgroups can be classified automatically as part of a system to visually represent online discussions.”
      • Objectives
      • To review systems which visualise online discussions, enabling the identification of phenomena to be visualised
      • To analyse a 250,000+ word corpus of text to try to identify potential cues for classification
      • To specify and design a system for automatic classification of messages/conversations
      • To implement, test and evaluate this system

  14. Text Classification of USENET messages for a Conversation Visualisation System Thanks for listening… Any Questions?

  15. Final Report. The Final Report for this project is also available online at: www.jrth.co.uk

  16. REFERENCES
      • “Loom” (Judith Donath): Donath, Judith 2002 “A Semantic Approach to Visualising Online Conversation” Communications of the ACM 45(4): 45-49 http://web.media.mit.edu/~kkarahal/loom/index.html
      • “Conversation Map” (Warren Sack): Sack, Warren 2000 “Design for Very Large-Scale Conversations” Ph.D. Thesis, February 2000, MIT Media Laboratory http://www.sims.berkeley.edu/~sack/cm/
      • “Netscan” (Marc Smith): Smith, Marc 2001 “Netscan: A tool for measuring and mapping social cyberspaces” http://netscan.research.microsoft.com
      • “PeopleGarden” (Rebecca Xiong & Judith Donath): Xiong, Rebecca & Donath, Judith 1999 “PeopleGarden: Creating Data Portraits for Users” MIT Media Laboratory http://smg.media.mit.edu/~becca/
      • “CodeZebra” (Sara Diamond): Diamond, Sara (Project Leader), Banff New Media Institute, Canada, plus many others (inc. Dr. A. Salway, University of Surrey) http://www.codezebra.net
      • “Smokey” (Ellen Spertus): Spertus, Ellen 1997 “Smokey: Automatic Recognition of Hostile Messages” Innovative Applications of Artificial Intelligence ’97 http://www.spertus.com/ellen/
      • “WebSOM” (Teuvo Kohonen): Kohonen, T. 1996 onwards; more details at http://websom.hut.fi/websom/
      • “CLUTO” (George Karypis): Karypis, George 2002 “CLUTO”, “gCLUTO” and “wCLUTO” University of Minnesota, MN, USA. Software available from http://www-users.cs.umn.edu/~karypis/cluto/
