TagHelper Tools Supporting the Analysis of Conversational Data

Presentation Transcript


  1. TagHelper Tools Supporting the Analysis of Conversational Data. Carolyn P. Rosé, Language Technologies Institute and Human-Computer Interaction Institute, Carnegie Mellon University

  2. Outline • What is TagHelper tools? • What can TagHelper Tools do for YOU? • How EASY is it to use TagHelper tools? • What are some TagHelper success stories? • What problems are we working on?

  3. What is TagHelper tools?

  4. What is TagHelper tools? • A PSLC Enabling Technology project • Machine learning technology for processing conversational data • Chat data • Newsgroup style conversational data • Short answers and explanations • Goal: automate the categorization of spans of text

  5. What is TagHelper tools? • An add-on to Microsoft Excel • Research Focus: identify and solve text classification problems specific to learning sciences • Types of categories, nature and size of data sets

  6. What can TagHelper tools do for YOU?

  7. Main Uses for TagHelper tools • Supporting data analysis involving conversational data • Triggering interventions • Supporting on-line assessment

  8. Example: Data Analysis

  9. Example: Triggering an Intervention • ST1: well what values do u have for the reheat cycle ? • ST2: for some reason I said temperature at turbine to be like 400 C • Tutor: Let's think about the motivation for Reheat. What process does the steam undergo in the Turbines ? • …

  10. Example: Supporting on-line assessment • Using instructor-assigned ratings as gold standard • Best performance without TagHelper tools: .16 correlation coefficient • Best performance with TagHelper tools: .63 correlation coefficient
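
The correlation figures above compare automatically produced scores against instructor-assigned ratings. Purely as a point of reference (the values and variable names below are illustrative, not taken from the study), a correlation coefficient of this kind can be computed as follows:

```python
from scipy.stats import pearsonr

# Illustrative values only: instructor-assigned gold-standard ratings
# and automatically produced scores for the same set of answers.
instructor_ratings = [3, 1, 4, 2, 5, 3, 4]
automated_scores = [2, 1, 4, 2, 4, 3, 5]

r, p = pearsonr(instructor_ratings, automated_scores)
print(f"correlation coefficient r = {r:.2f} (p = {p:.3f})")
```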

  11. How EASY is it to use TagHelper tools?

  12. Setting Up Your Data

  13. Iterative Process for Using TagHelper tools • Obtain data in natural language form • Iterative process • Decide on a unit of analysis • Single contributions, topic segments, whole messages, etc. • Decide on a set of categories or a rating system • Set up data in Excel • Assign categories to part of your data • Use TagHelper to assign categories to the remaining portion of your data
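
One way the spreadsheet described above might be laid out is sketched below. The column names and the convention of marking uncoded rows with "?" are assumptions for illustration; check TagHelper's documentation for the exact format it expects.

```python
import pandas as pd  # writing .xlsx files also requires openpyxl

# Hypothetical layout: one coding column plus one text column.
# Hand-coded rows carry a category; rows TagHelper should code are marked "?".
rows = [
    {"code": "high-level-help", "text": "Try comparing the two equations first."},
    {"code": "no-help", "text": "I don't know."},
    {"code": "?", "text": "What happens if you isolate x on the left side?"},
]

pd.DataFrame(rows).to_excel("taghelper_input.xlsx", index=False)
```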

  14. Training and Testing • Start TagHelper tools by double-clicking on the portal.bat icon • You will then see the following tool palette • Train a prediction model on your coded data and then apply that model to uncoded data
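
TagHelper carries out this train-then-apply step itself. Purely to illustrate the same pattern (a scikit-learn sketch, not TagHelper's internals or API), a model can be trained on the hand-coded rows and applied to the uncoded ones like this:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hand-coded examples (training data) and uncoded examples to be labeled.
coded_texts = ["Try comparing the two equations first.", "I don't know."]
coded_labels = ["high-level-help", "no-help"]
uncoded_texts = ["What happens if you isolate x on the left side?"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(coded_texts, coded_labels)   # train a prediction model on coded data
print(model.predict(uncoded_texts))    # apply that model to uncoded data
```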

  15. Loading a File • First, click on Add a File • Then select a file

  16. Simplest Usage • Once your file is loaded, you have two options • The first option is to code your data using the default settings • To do this, simply click on “GO!” • The second option is to modify the default settings and then code • We will start with the first option • Note that the performance will not be optimal

  17. Results • Performance on coded data • Results on uncoded data

  18. A slightly more complex case…

  19. Example: Data Analysis

  20. Setting Up Your Data

  21. What are some TagHelper success stories?

  22. Success Story 1: Supporting Data Analysis • Peer tutoring in Algebra LearnLab • Data coded for high-level-help, low-level-help, and no-help • Important predictor of learning (e.g., Webb et al., 2003) • TagHelper achieves agreement of .82 Kappa • Can be used for follow-up studies in same domain * Contributed by Erin Walker
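
Kappa here is chance-corrected agreement between hand-assigned codes and TagHelper's predictions. As a generic illustration (the labels below are made up, not drawn from the peer tutoring data), Cohen's kappa can be computed as follows:

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative labels only: hand codes vs. automatic codes for six segments.
hand_codes = ["high-level-help", "low-level-help", "no-help",
              "high-level-help", "low-level-help", "no-help"]
auto_codes = ["high-level-help", "low-level-help", "no-help",
              "low-level-help", "low-level-help", "no-help"]

print(f"kappa = {cohen_kappa_score(hand_codes, auto_codes):.2f}")
```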

  23. Success Story 2: Triggering Interventions • Collaborative idea generation in the Earth Sciences domain • Chinese TagHelper learns hand-coded topic analysis • Human agreement .84 Kappa • TagHelper performance .7 Kappa • Trained models used in follow-up study to trigger interventions and facilitate data analysis

  24. Example Dialogue * Feedback during idea generation increases both idea generation and learning (Wang et al., 2007)

  25. Process Analysis [Figure: number of unique ideas (#Unique Ideas) over Time Stamp 0–30 for conditions crossing Pairs vs. Individuals with Feedback vs. No Feedback] • Process loss, Pairs vs. Individuals: F(1,24)=12.22, p<.005, 1 sigma • Process loss, Pairs vs. Individuals: F(1,24)=4.61, p<.05, .61 sigma • Negative effect of Feedback: F(1,24)=7.23, p<.05, -1.03 sigma • Positive effect of Feedback: F(1,24)=16.43, p<.0005, 1.37 sigma

  26. What problems are we working on?

  27. Interesting Problems • Highly skewed data sets • Very infrequent classes are often the most interesting and important • Careful feature space design helps more than powerful algorithms • Huge problem with non-independence of data points from the same student • Off-the-shelf machine learning algorithms not set up for this • New sampling techniques offer promise • “Medium”-sized data sets • Contemporary machine learning approaches designed for huge data sets • Supplementing with alternative data sources may help
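
As one example of the kind of sampling discipline the non-independence problem calls for (a sketch of a general technique, not the project's own sampling methods), grouped cross-validation keeps all of a student's contributions on the same side of every train/test split:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GroupKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data: two contributions per student; "students" gives the group id.
texts = ["message a1", "message a2", "message b1",
         "message b2", "message c1", "message c2"]
labels = ["help", "no-help", "help", "help", "no-help", "no-help"]
students = ["s1", "s1", "s2", "s2", "s3", "s3"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
for train_idx, test_idx in GroupKFold(n_splits=3).split(texts, labels, groups=students):
    # No student contributes to both the training fold and the test fold.
    model.fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    predictions = model.predict([texts[i] for i in test_idx])
```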

  28. Example Lesson Learned • Problem: Context-oriented coding • Finding: Careful feature space design goes farther than powerful algorithms

  29. Back to Argumentation Data

  30. Sequential Learning • Notes sequential dependencies • Perhaps claims are stated before their warrants • Perhaps counter-arguments are given before new arguments • Perhaps people first build on their partner’s ideas and then offer a new idea

  31. Thread Structure Features [Diagram: three thread structure features, Thread Depth, Best Parent, and Semantic Similarity, each illustrated over segments Seg1, Seg2, Seg3]

  32. Sequence Oriented Features • Notes whether text is within a certain proximity to quoted material
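
A minimal sketch of what such a proximity feature might look like, assuming quoted material is marked with a leading '>' as in newsgroup-style messages (the marker and the window size are illustrative assumptions, not details from TagHelper):

```python
def near_quoted_material(lines, index, window=2):
    """True if the line at `index` is within `window` lines of a quoted line
    (quoted lines are assumed to start with '>')."""
    lo, hi = max(0, index - window), min(len(lines), index + window + 1)
    return any(line.lstrip().startswith(">") for line in lines[lo:hi])

message = [
    "> I think plate movement explains the earthquakes in this region.",
    "Good point, and it also fits the volcano data we collected.",
    "By the way, when is the report due?",
]
# One boolean feature per line of the message.
features = [near_quoted_material(message, i) for i in range(len(message))]
```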

  33. Context-Based Feature Approach

  34. Sequential Learning

  35. What did we learn? • Intuition confirmed • Different dimensions responded differently to context-based enhancements • Feature-based approach was more effective • Thread structure features were especially informative for the Social Modes dimension • Thread structure information is more difficult to extract from chat data • Best results of a similar approach on chat data only achieved a kappa of .45

  36. Special Thanks To: William Cohen Pinar Donmez Jaime Arguello Gahgene Gweon Rohit Kumar Yue Cui Mahesh Joshi Yi-Chia Wang Hao-Chuan Wang Emil Albright Cammie Williams Questions?
