Using Machine Learning to Monitor Collaborative Interactions

Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute

Presentation Transcript


  1. Using Machine Learning to Monitor Collaborative Interactions Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. VMT-Basilica (Kumar & Rosé, 2010)

  3. Monitoring Collaboration with Machine Learning Technology. Diagram labels: Labeled Texts, TagHelper, Unlabeled Texts, A Model that can Label More Texts, Behavior, Time, <Triggered Intervention>. Download tools at: http://www.cs.cmu.edu/~cprose/TagHelper.html and http://www.cs.cmu.edu/~cprose/SIDE.html

  4. TagHelper Tools and SIDE. Diagram labels: Define Summaries, Annotate Data, Visualize Annotated Data. TagHelper Tools uses text mining technology to automate annotation of conversational data. SIDE facilitates rapid prototyping of reporting interfaces for group learning facilitators. http://www.cs.cmu.edu/~cprose/TagHelper.html http://www.cs.cmu.edu/~cprose/SIDE.html

  5. Important caveat!! • Machine learning isn’t magic • But it can be useful for identifying meaningful patterns in your data when used properly • Proper use requires insight into your data

  6. Naïve Approach: When all you have is a hammer… (Diagram labels: Data, Target Representation)

  7. Naïve Approach: When all you have is a hammer… Problem: there isn’t one universally best approach! (Diagram labels: Data, Target Representation)

  8. Slightly less naïve approach: Aimless wandering… (Diagram labels: Data, Target Representation)

  9. Slightly less naïve approach: Aimless wandering… Problem 1: It takes too long! (Diagram labels: Data, Target Representation)

  10. Slightly less naïve approach: Aimless wandering… Problem 2: You might not realize all of the options that are available to you! (Diagram labels: Data, Target Representation)

  11. Expert Approach: Hypothesis driven (Diagram labels: Data, Target Representation)

  12. Expert Approach: Hypothesis driven. You might end up with the same solution in the end, but you’ll get there faster. (Diagram labels: Data, Target Representation)

  13. Expert Approach: Hypothesis driven. Today we’ll start to learn how! (Diagram labels: Data, Target Representation)

  14. What is machine learning? • Automatically or semi-automatically • Inducing concepts (i.e., rules) from data • Finding patterns in data • Explaining data • Making predictions. Diagram labels: Data, Learning Algorithm, Model, New Data, Classification Engine, Prediction

  15. How does machine learning work? The simplest rule learner will learn to predict whatever is the most frequent result class. This is called the majority class. What will the rule be in this case? It will always predict yes. A slightly more sophisticated rule learner will find the feature that gives the most information about the result class. What do you think that would be in this case? Rule format: <Feature Name>: <value> -> <prediction>, <value> -> <prediction>, … Learned rule: Outlook: Sunny -> No, Overcast -> Yes, Rainy -> Yes
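The two rule learners described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide's data table is not in the transcript, so the rows below are an assumed toy weather table, and the names `ROWS`, `FEATURES`, `zero_r`, and `one_r` are hypothetical. "Most predictive" is measured here by training errors, the usual 1R criterion; an information-based measure could be substituted.

```python
from collections import Counter, defaultdict

# Assumed toy weather table (not from the slides): (outlook, windy, play).
ROWS = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rainy", "false", "yes"),
    ("rainy", "false", "yes"), ("rainy", "true", "no"),
    ("overcast", "true", "yes"), ("sunny", "false", "no"),
    ("sunny", "false", "yes"), ("rainy", "false", "yes"),
    ("sunny", "true", "yes"), ("overcast", "true", "yes"),
    ("overcast", "false", "yes"), ("rainy", "true", "no"),
]
FEATURES = ["outlook", "windy"]  # column names for the first two fields


def zero_r(rows):
    """0R: always predict the majority class (last field of each row)."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def one_r(rows, features):
    """1R: choose the single feature whose value -> majority-class rule
    makes the fewest errors on the training rows."""
    best = None
    for index, name in enumerate(features):
        class_counts = defaultdict(Counter)
        for row in rows:
            class_counts[row[index]][row[-1]] += 1
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in class_counts.items()}
        errors = sum(sum(counts.values()) - max(counts.values())
                     for counts in class_counts.values())
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


majority = zero_r(ROWS)
feature, rule, errors = one_r(ROWS, FEATURES)
print(majority)            # majority-class prediction
print(feature, rule, errors)
```

Applying the learned rule to new data is then a dictionary lookup, for example `rule["overcast"]`.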

  16. What will be the prediction? Model: Outlook: Sunny -> No, Overcast -> Yes, Rainy -> Yes. Prediction for the new data: Yes. (Diagram labels: Model, New Data)

  17. More Complex Algorithm… • Two simple algorithms last time • 0R – Predict the majority class • 1R – Use the most predictive single feature • Today – Intro to Decision Trees • Today we will stay at a high level • We’ll investigate more details of the algorithm next time * Only makes 2 mistakes!

  19. More Complex Algorithm… What will it do with this example? • Two simple algorithms last time • 0R – Predict the majority class • 1R – Use the most predictive single feature • Today – Intro to Decision Trees • Today we will stay at a high level • We’ll investigate more details of the algorithm next time * Only makes 2 mistakes!

  23. Why is it better? • Not because it is more complex • Sometimes more complexity makes performance worse • What is different in what the three rule representations assume about your data? • 0R • 1R • Trees • The best algorithm for your data will give you exactly the power you need

  24. Why is it better? • Not because it is more complex • Sometimes more complexity makes performance worse • What is different in what the three rule representations assume about your data? • 0R • 1R • Trees • The best algorithm for your data will give you exactly the power you need. Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn?

  26. Why is it better? • Not because it is more complex • Sometimes more complexity makes performance worse • What is different in what the three rule representations assume about your data? • 0R • 1R • Trees • The best algorithm for your data will give you exactly the power you need. Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn? Now let’s say you don’t know the shape. What would you learn then?

  28. Why is it better? • Not because it is more complex • Sometimes more complexity makes performance worse • What is different in what the three rule representations assume about your data? • 0R • 1R • Trees • The best algorithm for your data will give you exactly the power you need. Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn? If you know the shape, you have fewer degrees of freedom – less room to make a mistake. Now let’s say you don’t know the shape. What would you learn then?

  33. What do concepts look like?

  34. Clarification: Concepts as Lines [Figure: points labeled R, B, S, T, C, and X]

  35. Machine Learning Process Overview • Get to know your data • What distinguishes messages from different categories? • Represent messages in terms of features • Use feature table tab • Build machine learning model • Use machine learning tab • Learn from mistakes, and try again • Use feature analyzer tab. (Screen labels: Coding, Features)

  36. Machine Learning

  37. Algorithms you will use • Decision Trees (J48): good with small feature sets, can find contingencies between features • Naïve Bayes: fast, makes decisions based on probabilities • Support Vector Machines (SMO): makes decisions based on weights, usually works well on text

  38. Setting Up Your Data

  39. What distinguishes Questions and Statements? • Not all questions end in a question mark • Not all WH words occur in questions • “I” versus “you” is not a reliable predictor. How do you know when you have coded enough data? You need to code enough to avoid learning rules that won’t work.

  40. Basic Idea: Represent text as a vector where each position corresponds to a term. This is called the “bag of words” approach. Vector positions: Cheese, Cows, Eggs, Hens, Lay, Make. “Cows make cheese” = 110001. “Hens lay eggs” = 001110. But “Cheese makes cows.” gets the same representation as “Cows make cheese”!
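The bag-of-words vectors on this slide can be reproduced with a short sketch. The function names and the crude plural-stripping `stem` helper are assumptions made for illustration (the slide treats "make" and "makes" as the same term, so some normalization is needed); this is not how TagHelper itself builds its vectors.

```python
import re

# Vector positions, in the order shown on the slide.
VOCAB = ["cheese", "cows", "eggs", "hens", "lay", "make"]


def stem(word):
    """Hypothetical, very crude stemmer: drop a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def bag_of_words(text, vocab=VOCAB):
    """Return a 0/1 vector with one position per vocabulary term.
    Word order is discarded, which is the whole point of the example."""
    tokens = {stem(token) for token in re.findall(r"[a-z]+", text.lower())}
    return [1 if stem(term) in tokens else 0 for term in vocab]


print(bag_of_words("Cows make cheese"))
print(bag_of_words("Hens lay eggs"))
print(bag_of_words("Cheese makes cows."))
```

Because the representation keeps only which terms occur, the first and third sentences map to the same vector even though they mean different things.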

  41. What can’t you conclude from “bag of words” representations? • Causality: “X caused Y” versus “Y caused X” • Roles and Mood: “Which person ate the food that I prepared this morning and drives the big car in front of my cat” versus “The person, which prepared food that my cat and I ate this morning, drives in front of the big car.” • Who’s driving, who’s eating, and who’s preparing food?

  42. Part of Speech Tagging (http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html): 1. CC Coordinating conjunction; 2. CD Cardinal number; 3. DT Determiner; 4. EX Existential there; 5. FW Foreign word; 6. IN Preposition/subordinating conjunction; 7. JJ Adjective; 8. JJR Adjective, comparative; 9. JJS Adjective, superlative; 10. LS List item marker; 11. MD Modal; 12. NN Noun, singular or mass; 13. NNS Noun, plural; 14. NNP Proper noun, singular; 15. NNPS Proper noun, plural; 16. PDT Predeterminer; 17. POS Possessive ending; 18. PRP Personal pronoun; 19. PRP$ Possessive pronoun; 20. RB Adverb; 21. RBR Adverb, comparative; 22. RBS Adverb, superlative

  43. Part of Speech Tagging, continued (http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html): 23. RP Particle; 24. SYM Symbol; 25. TO to; 26. UH Interjection; 27. VB Verb, base form; 28. VBD Verb, past tense; 29. VBG Verb, gerund/present participle; 30. VBN Verb, past participle; 31. VBP Verb, non-3rd ps. sing. present; 32. VBZ Verb, 3rd ps. sing. present; 33. WDT wh-determiner; 34. WP wh-pronoun; 35. WP$ Possessive wh-pronoun; 36. WRB wh-adverb

  44. Feature Space Design • Think like a computer! • Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful • Look for approximations • If you want to find questions, you don’t need to do a complete syntactic analysis • Look for question marks • Look for wh-terms that occur immediately before an auxiliary verb
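The two approximations for spotting questions could be coded roughly as follows. The word lists `WH_TERMS` and `AUXILIARIES` and the function name are hypothetical and deliberately incomplete; a real feature extractor would need fuller lists and would still be only an approximation.

```python
import re

# Hypothetical, incomplete word lists used only for this sketch.
WH_TERMS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should", "has", "have"}


def question_features(text):
    """Two cheap surface features that approximate 'this is a question'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # True if any wh-term is immediately followed by an auxiliary verb.
    wh_before_aux = any(first in WH_TERMS and second in AUXILIARIES
                        for first, second in zip(tokens, tokens[1:]))
    return {"has_question_mark": "?" in text, "wh_before_aux": wh_before_aux}


for line in ["you think the answer is 9?", "what is the answer", "I know what you mean."]:
    print(line, question_features(line))
```

Neither feature is reliable alone: the second example has no question mark, and a relative clause such as "the answer which is …" would also trigger the wh-term feature. That is why the learner is given both and left to weigh them.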

  45. Feature Space Design • Punctuation can be a “stand in” for mood • “you think the answer is 9?” • “you think the answer is 9.” • Bigrams capture simple lexical patterns • “common denominator” versus “common multiple” • POS bigrams capture syntactic or stylistic information • “the answer which is …” vs “which is the answer” • Line length can be a proxy for explanation depth
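A minimal sketch of three of these surface features (final punctuation, word bigrams, and line length) is shown below. The function names and feature keys are assumptions for illustration; POS bigrams are left out because they require a part-of-speech tagger.

```python
def word_bigrams(text):
    """Adjacent word pairs, joined with '_' so each pair is one feature name."""
    tokens = text.lower().split()
    return [f"{first}_{second}" for first, second in zip(tokens, tokens[1:])]


def surface_features(text):
    """Hypothetical feature dictionary for one chat line."""
    stripped = text.strip()
    return {
        "ends_with_question_mark": stripped.endswith("?"),  # stand-in for mood
        "bigrams": set(word_bigrams(stripped)),             # simple lexical patterns
        "length_in_words": len(stripped.split()),           # proxy for explanation depth
    }


print(word_bigrams("the common denominator"))
print(word_bigrams("the common multiple"))
print(surface_features("you think the answer is 9?")["ends_with_question_mark"])
print(surface_features("you think the answer is 9.")["ends_with_question_mark"])
```

With single words, "common denominator" and "common multiple" share the feature "common"; as bigrams they become two distinct features.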

  46. Feature Space Design • Whether a conversational contribution contains a non-stop word can be a predictor of whether it is contentful • “ok sure” versus “the common denominator” • Removing stop words removes some distracting features • Stemming allows some generalization • Multiple, multiply, multiplication • Removing rare features is a cheap form of feature selection • Features that only occur once or twice in the corpus won’t generalize, so they are a waste of time to include in the vector space
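The "contains a non-stop word" feature and rare-feature pruning might look like the sketch below. The stop-word list is a tiny hypothetical one chosen to match the slide's example, and the threshold of three occurrences is an assumption derived from "once or twice".

```python
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"ok", "sure", "yes", "no", "the", "a", "an", "is", "it"}


def has_content_word(text):
    """True if at least one token is not a stop word."""
    return any(token not in STOP_WORDS for token in text.lower().split())


def frequent_terms(documents, min_count=3):
    """Keep only terms seen at least `min_count` times in the whole corpus,
    so that features occurring once or twice are dropped."""
    counts = Counter(token for doc in documents for token in doc.lower().split())
    return {term for term, count in counts.items() if count >= min_count}


corpus = ["common denominator", "common multiple", "the common factor", "ok sure"]
print(has_content_word("ok sure"))
print(has_content_word("the common denominator"))
print(frequent_terms(corpus))
```

In this toy corpus only "common" survives pruning; every other term appears fewer than three times.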

  47. Error Analysis

  48. Any Questions?
