CS5545: Natural Language Generation Background Reading: Reiter and Dale, Building Natural Language Generation Systems, chaps 1, 2
Words instead of Pictures • Natural Language Generation (NLG): Generate English sentences to communicate data, instead of visualisations (or tables) • A research focus of Aberdeen CS Dept
Example : FoG • Produces textual weather reports in English and French • Input: • Graphical/numerical weather depiction • User: • Environment Canada (Canadian Weather Service)
Why use words? • Many potential reasons • Media restrictions (eg, text messages) • Users not knowledgeable enough to interpret a graph correctly • Words also communicate background info, emphasis, interpretation, … • People (in some cases) make better decisions from words than graphs
Easier for many people? • I’m afraid to say that you have a 1 in 3 chance of dying from a heart attack before your 65th birthday if you carry on as you are. But if you stop smoking, take your medicine, and eat better, a fatal heart attack will be much less likely (only a 1 in 12 chance).
Text vs Graph • Focus on key info (absolute risk, optimum risk) • Integrate with explanation (optimum risk means if you stop smoking, eat better, take medicine) • Add emphasis, perspective, “spin” (eg. “I’m afraid to say” indicates this is a serious problem)
Experiment: Decision Making • Showed 40 medical professionals (from junior nurses to senior doctors) data from a baby in a neonatal ICU • Shown either a text summary or a graphical depiction • Asked to make a treatment decision • Better decisions when shown text • But participants said they preferred the graphic
What is NLG? • NLG systems are computer systems which produce understandable and appropriate texts in English or other human languages • Input is data (raw, analysed) • Output is documents, reports, explanations, help messages, and other kinds of texts • Requires • Knowledge of language • Knowledge of the domain
Language Technology • Natural Language Understanding: text → meaning • Natural Language Generation: meaning → text • Speech Recognition: speech → text • Speech Synthesis: text → speech
Aberdeen NLG Systems • STOP (smoking cessation letters) (demo) • SumTime (weather forecasts) (demos) • Ilex (museum description) (demo?) • SkillSum (feedback on assessment) • StandUp (help children make puns) • BabyTalk (summary of patient data) • Looking for a PhD student…
How do NLG Systems Work? • Usually three stages • Document planning: decide on content and structure of text • Microplanning: decide how to linguistically express text (which words, sentences, etc to use) • Realisation: actually produce text, conforming to rules of grammar
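The three stages above can be sketched as a simple pipeline. This is a hypothetical illustration in Python, not code from any real system; all function names and the message fields are invented for this example.

```python
# Hypothetical sketch of the three-stage NLG pipeline: document planning,
# microplanning, realisation. All names and fields are illustrative.

def document_planning(data):
    """Decide on content and structure: pick important messages, order by time."""
    messages = [m for m in data if m["important"]]
    return sorted(messages, key=lambda m: m["time"])

def microplanning(messages):
    """Decide how to express each message: choose words and sentence structure."""
    return [{"subject": m["subject"], "verb": "was", "complement": m["assessment"]}
            for m in messages]

def realisation(sentence_specs):
    """Produce grammatical text: build sentences and capitalise first words."""
    sentences = []
    for spec in sentence_specs:
        s = f"{spec['subject']} {spec['verb']} {spec['complement']}."
        sentences.append(s[0].upper() + s[1:])
    return " ".join(sentences)

data = [
    {"time": 1, "important": True, "subject": "your first ascent",
     "assessment": "a bit rapid"},
    {"time": 2, "important": True, "subject": "your second ascent",
     "assessment": "fine"},
]
print(realisation(microplanning(document_planning(data))))
# Your first ascent was a bit rapid. Your second ascent was fine.
```

Each stage consumes the previous stage's output, which is why decisions made early (e.g. ordering) constrain what the later stages can do.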
Scuba: example input • Input: three types • Raw data: eg dive data in scuba.mdb • Trends: segmented data (as in pract 2) • Patterns: eg, rapid ascent, sawtooth, reverse dive profile, etc (as in pract 2)
Scuba: target (human) output • Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, you should have taken more time to make this ascent. You also did not stop at 5m, we recommend that anyone diving beneath 12m should stop for 3 minutes at 5m. Your second ascent was fine.
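The three input types might be represented as follows; a hedged sketch in which the data values, the segmentation, and the ascent-rate threshold are all invented for illustration (the threshold is not an official diving safety figure).

```python
# Illustrative sketch of the three scuba input types. All values and the
# ascent-rate threshold are assumptions for this example only.

# Raw data: minute -> depth in metres
raw_data = {0: 0.0, 5: 18.0, 10: 33.0, 20: 33.0, 25: 0.0}

# Trends: segmented version of the raw data (as in practical 2)
trends = [(0, 10, "descent"), (10, 20, "level"), (20, 25, "ascent")]

ASCENT_LIMIT = 6.0  # m/min; an assumed safety threshold, for illustration only

def is_rapid(start, end):
    """Average ascent rate over the segment, compared against the threshold."""
    rate = (raw_data[start] - raw_data[end]) / (end - start)
    return rate > ASCENT_LIMIT

# Patterns: named events detected over the trends
patterns = [{"type": "rapid_ascent", "span": (s, e)}
            for (s, e, direction) in trends
            if direction == "ascent" and is_rapid(s, e)]
print(patterns)  # one rapid-ascent pattern: 33 m to the surface in 5 minutes
```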
Document Planning • Content selection: Of the zillions of things I could say, which should I say? • Depends on what is important • Also depends on what is easy to say • Structure: How should I organise this content as a text? • What order do I say things in? • Rhetorical structure?
Scuba: content • Probably focus on patterns indicating dangerous activities • Most important thing to mention • How much should we say about these? • Detail? Explanations? • Should we say anything for safe dives? • Maybe just acknowledge them?
Scuba: structure • Order by time (first event first) • Or should we mention the most dangerous patterns first? • Linking words (cue phrases) • Also, but, because, …
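Putting the content and structure decisions together, document planning for the scuba texts might look like the sketch below. The danger scores, pattern names, and cue-phrase rule are all invented for this example.

```python
# Hedged sketch of document planning for the scuba reports: select which
# patterns to mention, order them, and attach a linking cue phrase.
# Danger scores and pattern names are invented for illustration.

DANGER = {"rapid_ascent": 2, "missed_safety_stop": 2, "sawtooth": 1, "safe_dive": 0}

def plan(patterns):
    # Content selection: dangerous patterns, plus a brief nod to safe dives
    selected = [p for p in patterns
                if DANGER.get(p["type"], 0) > 0 or p["type"] == "safe_dive"]
    # Structure: order by time (first event first). An alternative would be
    # sorting by -DANGER[p["type"]] to mention the most dangerous first.
    selected.sort(key=lambda p: p["time"])
    for i, p in enumerate(selected):
        p["cue"] = "also" if i > 0 else None  # linking word between messages
    return selected

plan_out = plan([{"type": "safe_dive", "time": 2},
                 {"type": "rapid_ascent", "time": 1}])
print([p["type"] for p in plan_out])  # ['rapid_ascent', 'safe_dive']
```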
Microplanning • Lexical choice: Which words to use? • Aggregation: How should information be distributed across sentences and paragraphs? • Reference: How should the text refer to objects and entities?
Scuba: microplanning • Lexical choice: • A bit rapid vs too fast vs unwise vs … • Ascended vs rose vs rose to surface vs … • Aggregation: 1 sentence or 3 sentences? • “Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, it would have been better if you had taken more time to make this ascent.”
Scuba: Microplanning • Aggregation (continued) • Phrase merging • “Your first ascent was fine. Your second ascent was fine” vs • “Your first and second ascents were fine.” • Reference • Your ascent vs • Your first ascent vs • Your ascent from 33m at 3 min
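The phrase-merging kind of aggregation described above can be sketched as follows; a minimal illustration in which the message representation (subject, assessment pairs) is an assumption of this example.

```python
# Minimal sketch of phrase merging: adjacent messages with the same
# assessment are merged into one message with a coordinated subject.
# The (subject, assessment) message format is assumed for illustration.

def merge(messages):
    """messages: list of (subject, assessment) pairs, in text order."""
    merged, i = [], 0
    while i < len(messages):
        subjects = [messages[i][0]]
        # absorb following messages that share the same assessment
        while i + 1 < len(messages) and messages[i + 1][1] == messages[i][1]:
            i += 1
            subjects.append(messages[i][0])
        merged.append((subjects, messages[i][1]))
        i += 1
    return merged

print(merge([("your first ascent", "fine"), ("your second ascent", "fine")]))
# [(['your first ascent', 'your second ascent'], 'fine')]
```

The merged message then gets realised as a single sentence with a plural verb, which is a realisation decision (next stage).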
Realisation • Grammars (linguistic): Form legal English sentences based on decisions made in previous stages • Obey sublanguage, genre constraints • Structure: Form legal HTML, RTF, or whatever output format is desired
Scuba: Realisation • Simple linguistic processing • Capitalise first word of sentence • Subject-verb agreement • Your first ascent was fine • Your first and second ascents were fine • Structure • Inserting line breaks in text (pouring) • Add HTML markup, eg, <P>
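The simple linguistic processing named above (coordination, subject-verb agreement, capitalisation) might be sketched like this; a hedged illustration, not a real realiser.

```python
# Illustrative realiser: coordinates subjects, picks was/were for
# subject-verb agreement, and capitalises the first word.

def realise(subjects, assessment):
    if len(subjects) == 1:
        subject, verb = subjects[0], "was"
    else:
        subject = ", ".join(subjects[:-1]) + " and " + subjects[-1]
        verb = "were"  # plural agreement for coordinated subjects
    sentence = f"{subject} {verb} {assessment}."
    return sentence[0].upper() + sentence[1:]

print(realise(["your first ascent"], "fine"))
# Your first ascent was fine.
print(realise(["your first ascent", "your second ascent"], "fine"))
# Your first ascent and your second ascent were fine.
```

Note that this naive sketch produces "Your first ascent and your second ascent were fine" rather than the slide's tighter "Your first and second ascents were fine"; producing the latter requires a realiser that knows the two subjects share the head noun "ascent" and can pluralise it.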
Multimodal NLG • Speech output • Feed text output into a speech synthesiser • Tight integration with synthesiser • Higher quality voice • Text and visualisations • Produce separately • Tight integration • Eg, text refers to the graphic, graph has text popups
Building NLG Systems • Knowledge • Representations • Algorithms • Systems
Building NLG Systems: Knowledge • Need knowledge • Which patterns most important? • What order to use? • Which words to use? • When to merge phrases? • How to form plurals? • Etc • Where does this come from?
Knowledge Sources • Imitate a corpus of human-written texts • The most straightforward approach, and the one we will focus on • Ask domain experts • Useful, but experts are often not very good at explaining what they are doing • Experiments with users • Very nice in principle, but a lot of work
Scuba: Corpus • See which patterns humans mention in the corpus, and have the system mention these • See the order used by humans, and have the system imitate these • etc
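A hypothetical corpus study of this kind might simply count which patterns human-written reports mention; the tiny corpus below is invented for illustration.

```python
# Hypothetical corpus analysis: count how often each pattern type is
# mentioned across human-written dive reports, so the system can imitate
# those content-selection choices. The corpus is invented for this example.
from collections import Counter

corpus_mentions = [
    ["rapid_ascent", "missed_safety_stop"],  # patterns mentioned in report 1
    ["rapid_ascent"],                        # report 2
    ["rapid_ascent", "sawtooth"],            # report 3
]
counts = Counter(p for report in corpus_mentions for p in report)
print(counts["rapid_ascent"])  # 3 -- mentioned in every report, so the system should too
```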
Building NLG Systems: Algorithms and Representations • Various algorithms and representations have been designed for NLG tasks • Will discuss in later lectures • But often can simply code NLG systems straightforwardly in Java, without special algorithms • Knowledge is more important
Building NLG Systems: Systems • Ideally should be able to plug knowledge into downloadable systems • Unfortunately very little in the way of downloadable NLG systems • Mostly specialised stuff primarily of interest to academics, eg http://openccg.sourceforge.net/ • I would like to improve this situation
Aberdeen NLG group • 15 academic staff, researchers, PhD students • Leader: Prof Chris Mellish • (One of) the best NLG groups in the world • Looking for more researchers and PhD students… (esp BabyTalk project) • Let me know if interested!