
Training Statistical Language Models from Grammar-Generated Data: A Comparative Case-Study


Presentation Transcript


  1. Training Statistical Language Models from Grammar-Generated Data: A Comparative Case-Study Manny Rayner, Geneva University (joint work with Beth Ann Hockey and Gwen Christian)

  2. Structure of talk • Background: Regulus and MedSLT • Grammar-based language models and statistical language models

  3. What is MedSLT? • Open Source medical speech translation system for doctor-patient dialogues • Medium-vocabulary (400-1500 words) • Grammar-based: uses Regulus platform • Multi-lingual: translate through interlingua

  4. MedSLT • Open Source medical speech translator for doctor – patient examinations • Main system unidirectional (patient answers non-verbally, e.g. nods or points) • Also experimental bidirectional system • Two main purposes • Potentially useful (could save lives!) • Vehicle for experimenting with underlying Regulus spoken dialogue engineering toolkit

  5. Regulus: central goals • Reusable grammar-based language models • Compile into recognisers • Infrastructure for using them in applications • Speech translation • Spoken dialogue • Multilingual • Efficient development environment • Open Source

  6. The full story… $25 (paperback edition) from amazon.com

  7. What kind of applications? • Grammar-based is • Good on in-coverage data • Good for complex, structured utterances • Users need to • Know what they can say • Be concerned about accuracy • Good target applications • Safety-critical • Medium vocabulary (~200 – 2000 words)

  8. In particular… • Clarissa • NASA procedure assistant for astronauts • ~250 word vocabulary, ~75 command types • MedSLT • Multilingual medical speech translator • ~400 – ~1000 words, ~30 question types • SDS • Experimental in-car system from Ford Research • First prize, Ford internal demo fair, 2007 • ~750 words

  9. Key technical ideas • Reusable grammar resources • Use grammars for multiple purposes • Parsing • Generation • Recognition • Appropriate use of statistical methods

  10. Reusable grammar resources • Building a good grammar from scratch is very challenging • Need a methodology for rational reuse of existing grammar structure • Use small corpus of examples to extract structure from a large resource grammar

  11. The Regulus picture [diagram: the Regulus compilation pipeline. A general unification grammar (UG) is specialized by EBL, driven by a training corpus and operationality criteria, into an application-specific UG; the UG-to-CFG compiler produces a CFG grammar and lexicon; a CFG-to-PCFG compiler can then weight it using the training corpus; finally the (P)CFG-to-recognizer compiler builds a Nuance recognizer.]

  12. The general English grammar • Loosely based on SRI Core Language Engine grammar • Compositional semantics (4 different versions) • ~200 unification grammar rules • ~75 features • Core lexicon, ~ 450 words (Also resource grammars for French, Spanish, Catalan, Japanese, Arabic, Finnish, Greek)

  13. General grammar → domain-specific grammar • “Macro-rule learning” • Corpus-based process • Remove unused rules and lexicon items • Flatten parsed examples to remove structure • Simpler structure → less ambiguity → smaller search space
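The flattening step can be sketched in a few lines, assuming a toy tree representation. The real Regulus EBL specializer works over unification grammars and is driven by explicit operationality criteria; everything below (data structures and the `operational` set) is illustrative only.

```python
def flatten(tree, operational):
    """Cut a parse tree at 'operational' categories and emit one flat
    macro-rule: LHS -> sequence of cut-point categories and words."""
    lhs, children = tree
    rhs = []

    def walk(node):
        if isinstance(node, str):       # leaf word: keep it
            rhs.append(node)
        else:
            cat, kids = node
            if cat in operational:      # cut point: keep the category
                rhs.append(cat)
            else:                       # flatten through this node
                for kid in kids:
                    walk(kid)

    for child in children:
        walk(child)
    return (lhs, tuple(rhs))

# Simplified parse of "when do you get headaches"
tree = ("UTTERANCE",
        [("PP", ["when"]),
         ("S", [("V", ["do"]),
                ("NP", [("PRO", ["you"])]),
                ("VBAR", [("V", ["get"]),
                          ("NP", [("N", ["headaches"])])])])])

print(flatten(tree, operational={"PP", "NP", "VBAR"}))
# -> ('UTTERANCE', ('PP', 'do', 'NP', 'VBAR'))
```

Intermediate structure (the S node, the PRO under NP) disappears, leaving a single flat rule, which is why the specialized grammar has less ambiguity and a smaller search space.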

  14.–15. EBL example (1–2) [parse trees for “when do you get headaches”: the full analysis, then the cut points chosen for flattening]

  16. EBL example (3) • Main new rules: S → PP VBAR, VBAR → VBAR NP, NP → N [flattened parse tree for “when do you get headaches”]

  17. Using grammars for multiple purposes • Parsing • Surface words → logical form • Generation • Logical form → surface words • Recognition • Speech → surface words

  18. Building a speech translator • Combine Regulus-based components • Source-language recognizer (speech → words) • Source-language parser (words → logical form) • Transfer from source to target, via interlingua (logical form → logical form) • Target-language generator (logical form → words) • (3rd party text to speech)
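The component chain above can be sketched as plain function composition. The stage functions below are toy stand-ins, not the actual Regulus-generated components, and the one-entry "lexicon" is invented for illustration.

```python
def translate(speech, recognize, parse, transfer, generate):
    """Compose the pipeline stages of a speech translator."""
    words = recognize(speech)        # speech -> source words
    source_lf = parse(words)         # words -> source logical form
    target_lf = transfer(source_lf)  # source LF -> interlingua -> target LF
    return generate(target_lf)       # target LF -> target words

# Toy stand-ins for a pseudo French -> English system
recognize = lambda audio: audio.split()            # pretend ASR on a string
parse = lambda ws: ("ynq", tuple(ws[1:]))          # crude "logical form"
lexicon = {"mal": "pain", "de": "of", "tête": "head"}
transfer = lambda lf: (lf[0], tuple(lexicon.get(w, w) for w in lf[1]))
generate = lambda lf: " ".join(lf[1])

print(translate("avez-vous mal de tête",
                recognize, parse, transfer, generate))
# -> pain of head
```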

  19. Adding statistical methods Two different ways to use statistical methods: • Statistical tuning of grammar • Intelligent help system

  20. Impact of statistical tuning (Regulus book, chapter 11) • Base recogniser • MedSLT with English recogniser • Training corpus: 650 utterances • Vocabulary: 429 surface words • Test data: • 801 spoken and transcribed utterances

  21. Vary vocabulary size • Add lexical items (11 different versions) • Total vocabulary 429 – 3788 surface words • New vocabulary not used in test data • Expect degradation in performance • Larger search space • New possibilities just a distraction

  22. Impact of statistical tuning for different vocabulary sizes [chart: semantic error rate vs. vocabulary size]

  23. Intelligent help system • Need robustness somewhere • Add a backup statistical recogniser • Use it to advise the user • Approximate match with in-coverage examples • Show user similar things they could say • Original paper: Gorrell, Lewin and Rayner, ICSLP 2002

  24. MedSLT experiments (Chatzichrisafis et al, HLT workshop 2006) • French  English version of system • Basic questions • How quickly do novices become experts? • Can people adapt to limited coverage? • Let subjects use system several times, and track performance

  25. Experimental Setup • Subjects • 8 medical students, no previous knowledge of system • Scenario • Experimenter simulates headache • Subject must diagnose it • 3 sessions, 3 tasks per session • Instruction • ~20 min instructions & video (headset, push-to-talk) • All other instruction from help system

  26. Results – # interactions [chart]

  27. Results – time/diagnosis [chart]

  28. Questionnaire results [chart]

  29. Summary • After 1.5 hours of use, subjects complete the task in an average of 4 minutes • System implementers average 3 minutes • All coverage learned from help system • Subjects’ impressions very positive

  30. A few words about interlingua • Coverage in different languages diverges if left to itself • Want to enforce uniform coverage • Many-to-many translation • “N² problem”: a direct translator for every pair of N languages • Solution: translate through interlingua • Tight interlingua definition

  31. Interlingua grammar • Think of interlingua as a language • Define using Regulus • Mostly for constraining representations • Also get a surface form • “Semantic grammar” • Not linguistic, all about domain constraints

  32. Example of interlingua • Surface form: “YN-QUESTION pain become-better sc-when [you sleep PRESENT] PRESENT” • Representation: [[utterance_type, ynq], [symptom, pain], [event, become_better], [tense, present], [sc, when], [clause, [[utterance_type, dcl], [pronoun, you], [action, sleep], [tense, present]]]]

  33. Constraints from interlingua • Source language sentences licensed by grammar may not produce valid interlingua • Interlingua can act as a knowledge source to improve language modelling

  34. Structure of talk • Background: Regulus and MedSLT • Grammar-based language models and statistical language models

  35. Language models • Two kinds of language models • Statistical (SLM) • Trainable, robust • Require a lot of corpus data • Grammar-based (GLM) • Require little corpus data • Brittle

  36. Compromises between SLM and GLM • Put weights on GLM (CFG  PCFG) • Powerful technique, see earlier • Doesn’t address robustness • Put GLMs inside SLMs (Wang et al, 2002) • Use GLM to generate training data for SLM (Jurafsky et al 1995, Jonson 2005)

  37. Generating SLM training data with a GLM • Optimistic view • Need only small seed corpus, to build GLM • Will be robust, since finally an SLM • Pessimistic view • “Something for nothing” • Data for GLM could be used directly to build an SLM • Hard to decide • Don’t know what data went into GLM • Often just in grammar writer’s head

  38. Regulus permits comparison • Use Regulus to build GLM • Data-driven process with explicit corpus • Same corpus can be used to build SLM • Comparison is meaningful

  39. Two ways to build SLM • Direct • Seed corpus → SLM • Indirect • Seed corpus → GLM → generated corpus → SLM
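The direct route can be illustrated with a toy maximum-likelihood bigram model standing in for real SLM training (the actual systems use the Nuance toolkit; the function, corpus, and numbers below are invented for illustration).

```python
from collections import Counter

def train_bigram_slm(corpus):
    """Maximum-likelihood bigram probabilities with sentence boundaries."""
    bigrams, contexts = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(toks[:-1])           # count each left context
        bigrams.update(zip(toks, toks[1:]))  # count each bigram
    # P(w2 | w1) = count(w1 w2) / count(w1)
    return {bg: n / contexts[bg[0]] for bg, n in bigrams.items()}

seed = ["do you get headaches", "do you get nausea"]
slm = train_bigram_slm(seed)
print(slm[("you", "get")])        # -> 1.0
print(slm[("get", "headaches")])  # -> 0.5
```

The indirect route trains the same kind of model, but on a corpus randomly generated from the GLM instead of the seed corpus itself.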

  40. Parameters for indirect method • Size of generated corpus • Can generate any amount of data • Method of generating corpus • CFG versus PCFG • Filtering • Use interlingua to filter generated corpus

  41. CFG versus PCFG generation • CFG • Use plain GLM to do random generation • PCFG • Use seed corpus to weight GLM rules • Weights then used in random generation
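Weighted random generation from a (P)CFG can be sketched as follows. The grammar and weights here are invented toys; in the real setup the weights are estimated by parsing the seed corpus, and uniform weights give the plain-CFG behaviour.

```python
import random

# Toy weighted CFG: nonterminal -> list of (rhs, weight).
# Symbols not in RULES are terminal words.
RULES = {
    "S":  [(("do", "NP", "VP"), 1.0)],
    "NP": [(("you",), 0.8), (("the", "attacks"), 0.2)],
    "VP": [(("get", "headaches"), 0.6), (("last", "hours"), 0.4)],
}

def generate(symbol, rng):
    """Expand a symbol top-down, sampling rules by their weights."""
    if symbol not in RULES:
        return [symbol]                      # terminal word
    expansions = RULES[symbol]
    rhs = rng.choices([r for r, _ in expansions],
                      weights=[w for _, w in expansions])[0]
    words = []
    for s in rhs:
        words.extend(generate(s, rng))
    return words

rng = random.Random(1)
corpus = [" ".join(generate("S", rng)) for _ in range(1000)]
print(corpus[:3])
```

With corpus-derived weights, frequent constructions dominate the sample, which is why the PCFG-generated data (slide 44) looks far more plausible than the unweighted CFG data (slide 43).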

  42. Interlingua filtering • Impossible to make GLM completely tight • Many in-coverage sentences make no sense • Some of these don’t produce valid interlingua • Use interlingua grammar as filter
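A sketch of the filtering loop, with hypothetical stand-ins for the Regulus parser and interlingua grammar (`parse_to_interlingua`, `interlingua_ok`, and the toy pattern they check are all invented):

```python
REQUIRED = {"utterance_type"}

def interlingua_ok(il):
    """Accept only representations the (toy) interlingua grammar licenses."""
    return REQUIRED <= {key for key, _ in il}

def parse_to_interlingua(sentence):
    """Toy parser: recognises one pattern, returns None for everything else."""
    if sentence.startswith("do you "):
        return [("utterance_type", "ynq"), ("action", sentence.split()[2])]
    return None

def filter_corpus(sentences):
    """Keep a generated sentence only if it maps to a valid interlingua."""
    kept = []
    for s in sentences:
        il = parse_to_interlingua(s)
        if il is not None and interlingua_ok(il):
            kept.append(s)
    return kept

generated = ["do you get headaches",
             "what attacks of them 're your duration"]
print(filter_corpus(generated))  # -> ['do you get headaches']
```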

  43. Example: CFG generated data
what attacks of them 're your duration all day
have a few sides of the right sides regularly frequently hurt
where 's it increased
what previously helped this headache
have not any often ever helped
are you usually made drowsy at home
what sometimes relieved any gradually during its night 's this severity frequently increased before helping
when are you usually at home
how many kind of changes in temperature help a history

  44. Example: PCFG generated data
does bright light cause the attacks
are there its cigarettes
does a persistent pain last several hours
is your pain usually the same before
were there them when this kind of large meal helped joint pain
do sudden head movements usually help to usually relieve the pain
are you thirsty
does nervousness aggravate light sensitivity
is the pain sometimes in the face
is the pain associated with your headaches

  45. Example: PCFG generated data with interlingua filtering
does a persistent pain last several hours
do sudden head movements usually help to usually relieve the pain
are you thirsty
does nervousness aggravate light sensitivity
is the pain sometimes in the face
have you regularly experienced the pain
do you get the attacks hours
is the headache pain better
are headaches worse
is neck trauma unchanging

  46. Experiments • Start with same English seed corpus • 948 utterances • Generate GLM recogniser • Generate different types of training corpus • Train SLM from each corpus • Compare recognition performance • Word Error Rate (WER) • Sentence Error Rate (SER) • McNemar sign test on SER to get significance
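The evaluation metrics are standard and easy to reproduce. Below is a self-contained sketch of WER, SER, and an exact two-sided sign test on discordant sentence pairs (a McNemar-style comparison of two recognisers on the same test set); the example strings are invented.

```python
from math import comb

def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i-1][j-1] + (r[i-1] != h[j-1]),  # sub/match
                          d[i-1][j] + 1,                      # deletion
                          d[i][j-1] + 1)                      # insertion
    return d[-1][-1] / len(r)

def ser(refs, hyps):
    """Sentence error rate: fraction of hypotheses with any error."""
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)

def sign_test(b, c):
    """Two-sided exact sign test on the discordant counts: b utterances
    where only system A is right, c where only system B is right."""
    n, k = b + c, min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(p, 1.0)

print(wer("do you get headaches", "do you get nausea"))  # -> 0.25
print(sign_test(2, 14) < 0.01)                           # -> True
```

Utterances both systems get right (or both get wrong) carry no information about which system is better, which is why the sign test looks only at the discordant pairs.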

  47. Experiment 1: different methods [table: recognition results per training method]

  48. Experiment 1: significant differences • GLM >> all SLMs • seed corpus >> all generated corpora • PCFG generation >> CFG generation • filtered > not filtered. However, generated corpora are small…

  49. Experiment 2: different sizes of corpus [table: recognition results per corpus size]

  50. Experiment 2: significant differences • GLM >> all SLMs • large corpus > small corpus • large unfiltered generated corpus ~ seed corpus • SER for large unfiltered corpus about the same • large filtered generated corpus ~/> seed corpus • SER for large filtered corpus better, but not significant • filtered > not filtered
