
Introduction to Natural Language Generation


Presentation Transcript


  1. Introduction to Natural Language Generation
     Yael Netzer, Department of Computer Science, Ben Gurion University

  2. Outline
  • Introduction – what is NLG
  • Traditional architecture of an NLG system
  • Statistical methods in NLG
  • FUF/SURGE
  • An example in Hebrew – the noun phrase
  • A statistical method for generation

  3. What is Natural Language Generation (NLG)?
  • NLG is the process of constructing natural language outputs from non-linguistic inputs. [VanLinden]
  • NLG is mapping some communication goal to a surface utterance that satisfies the goal. [Reiter & Dale]

  4. Aspects of NLG
  • Theoretical and practical interests:
    • Theoretical: modeling various depths of human language representation and production.
    • Practical: engineering human/computer interfaces (the computer as an author or authoring aid).

  5. Example systems
  • NLG as an author:
    • Weather reports (FoG)
    • Stock market descriptions
    • Museum artifact descriptions (ILEX)
    • "Personal" letters to customers (AlethGen)
  • NLG as an authoring aid
  • Integrated (partial) NLG uses:
    • NLG in augmentative and alternative communication
    • Summarization (integrating 'cut and paste' techniques with generation)
    • Machine translation (generation from an interlingua)

  6. Inputs of NLG systems
  Formally, a system can be defined as a four-tuple {k, c, u, d}:
  • k – knowledge source (tables of numbers, a knowledge representation language); domain dependent, no generalizations.
  • c – communicative goal: the consequence of a given execution of the system (considering the appropriate information).

  7. NLG input specification (cont.)
  • u – user model: a characterization of the hearer or intended audience for whom the text is to be generated.
  • d – discourse history: previous interactions between the user and the NLG system, used to control anaphoric forms and prevent repetition.
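Taken together, this {k, c, u, d} specification can be pictured as a single input structure. A minimal Python sketch follows; the class and field names are illustrative assumptions, not taken from any particular NLG system.

from dataclasses import dataclass, field

@dataclass
class NLGInput:
    knowledge: dict          # k: domain knowledge source (tables, KR language)
    goal: str                # c: communicative goal for this run of the system
    user_model: dict = field(default_factory=dict)         # u: intended audience
    discourse_history: list = field(default_factory=list)  # d: earlier interactions,
                                                           #    for anaphora and avoiding repetition

# hypothetical example instance
example = NLGInput(knowledge={"monthly_temperature": -1.5},
                   goal="summarize the month's weather")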

  8. The output of an NLG system
  • Any text that conveys the communicative goal: it can be a single word like "yes" in a dialogue, or a text of many paragraphs in other cases.
  • The output should be suited to the medium: web pages with hyperlinks, a voice stream, etc.

  9. Main (pipeline) architecture
  • Content determination: what information should be included in the text?
  • Document structuring: how to organize the text.
  • Lexicalisation: choosing particular words or phrases.
  • Aggregation: composing chunks of information into sentences.
  • Referring expression generation: what properties should be used in referring to an entity?
  • Surface realization: mapping the underlying content of the text to a grammatically correct sentence that expresses the desired meaning.
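To make the division of labor concrete, here is a toy end-to-end sketch in Python over weather-report-style data. The stage functions, the data, and the tiny lexicon are all invented for illustration, and aggregation is folded into the final realization step for brevity; real systems are far richer.

def content_determination(data):
    # keep only the facts that deviate from the average
    return [(attr, val) for attr, val in data.items() if val != "average"]

def document_structuring(messages):
    # trivial ordering: temperature facts before rainfall facts
    order = {"temperature": 0, "rainfall": 1}
    return sorted(messages, key=lambda m: order.get(m[0], 99))

def lexicalisation(messages):
    # map conceptual content to content words
    lexicon = {("temperature", "below"): "cooler", ("rainfall", "below"): "drier"}
    return [lexicon[m] for m in messages]

def surface_realization(words):
    # one coordinated sentence instead of one sentence per fact
    return "The month was " + " and ".join(words) + " than average."

data = {"temperature": "below", "rainfall": "below", "wind": "average"}
print(surface_realization(lexicalisation(document_structuring(content_determination(data)))))
# -> The month was cooler and drier than average.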

  10. Content determination
  • The process of deciding what to say.
  • No general rules – domain specific.
  • What is important, what should always be included, what is exceptional information, etc.
  • Practically, it constructs a set of messages from the underlying data (entities, concepts and relations).
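A "message" here is simply structured, language-independent content extracted from the data. One possible, purely illustrative representation:

messages = [
    {"type": "monthly_deviation", "attribute": "temperature", "direction": "below_average"},
    {"type": "monthly_deviation", "attribute": "rainfall", "direction": "below_average"},
]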

  11. Document structuring
  • Imposing ordering and structure over the information:
    • conceptual grouping
    • rhetorical relationships

  12. Lexical choice
  • The lexical chooser determines the particular words to be used to express concepts and relations.
  • Trade-off: complexity of coding vs. richer language.
  • Choosing content words: information is mapped from the conceptual vocabulary.
  • The lexical chooser should supply a variety of words, consider the user model (a precise vs. a general description of a weather phenomenon), and account for pragmatic considerations (formal vs. casual style).
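For example, sensitivity to the user model might look like this in a toy lexical chooser; the lexicon entries and the "expert" flag are invented for the example.

def choose_word(concept, user_model):
    # prefer precise terms for expert users, everyday terms otherwise
    lexicon = {
        "precipitation": {"expert": "precipitation", "general": "rain"},
        "temp_below_average": {"expert": "a negative temperature anomaly",
                               "general": "cooler than usual"},
    }
    register = "expert" if user_model.get("expert") else "general"
    return lexicon[concept][register]

print(choose_word("precipitation", {"expert": False}))  # -> rain
print(choose_word("precipitation", {"expert": True}))   # -> precipitation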

  13. Aggregation
  • Aggregation can be performed at various stages:
    • In the planner: combining similar data.
    • In lexicalization: aggregating several concepts into one lexical element.
    • Aggregation of sentences, e.g. merging "The month was cooler than average. The month was drier than average." into "The month was cooler and drier than average." (see the sketch below)
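A naive version of that sentence-level aggregation rule, operating on already-realized clauses that share a common frame; the regular expression and the rule itself are illustrative only.

import re

def aggregate(sentences):
    # merge clauses of the form "The month was X than average." by
    # coordinating their adjectives into a single clause
    pattern = re.compile(r"The month was (\w+) than average\.")
    matches = [pattern.match(s) for s in sentences]
    if len(sentences) > 1 and all(matches):
        adjectives = [m.group(1) for m in matches]
        return ["The month was " + " and ".join(adjectives) + " than average."]
    return sentences

print(aggregate(["The month was cooler than average.",
                 "The month was drier than average."]))
# -> ['The month was cooler and drier than average.']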

  14. Referring expression generation
  • An entity can be referred to in many ways: initially, subsequently, distinguishing, definite, pronouns.
  • Proper names:
    • באר שבע (Beer Sheva)
    • באר שבע בית הנגב (Beer Sheva, house of the Negev)
  • Definite descriptions:
    • The train that leaves at 10am
    • The next train
  • Pronouns:
    • it
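A very rough sketch of how such a choice can be driven by the discourse history; the rules below are invented and far simpler than real referring expression generation algorithms.

def refer(entity, history):
    # entity: {"name": ..., "description": ...}
    # history: names already mentioned, most recent last
    if history and history[-1] == entity["name"]:
        return "it"                            # just mentioned: use a pronoun
    if entity["name"] in history:
        return "the " + entity["description"]  # mentioned earlier: definite description
    return entity["name"]                      # first mention: full proper name

city = {"name": "Beer Sheva", "description": "city"}
print(refer(city, []))                           # -> Beer Sheva
print(refer(city, ["Beer Sheva", "the Negev"]))  # -> the city
print(refer(city, ["Beer Sheva"]))               # -> it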

  15. Syntactic realizer
  • The syntactic realizer handles syntax and morphology.
  • It is the most general component: domain independent (but definitely language dependent).
  • Various usage scenarios.
  • The input to syntactic realization is not directly observable.
  • The input to syntactic realizers in NLG:
    • What knowledge is needed to prepare the input?
    • Who supplies this knowledge?
    • Can we find a common abstraction, shared across languages and applications?

  16. Possible techniques for realizers
  • Bi-directional grammar specification
  • Grammar specifications tuned for generation
  • Templates
  • Corpus statistics

  17. A note on bi-directional grammars
  • Realization is, in some respects, easier than parsing: there is no need to handle the full range of syntax a human might use, no need to resolve ambiguities, and no need to recover from ill-formed input.
  • A bi-directional grammar is therefore, in theory, a possible and elegant approach.
  • However, most NLG systems use a generation-oriented grammar.

  18. Why not bi-directional?
  • The output of an NLU parser is very different from the input to an NLG realizer.
  • It is not obvious that lexicalization is part of realization.
  • In practice, it is not easy to engineer large bi-directional grammars.
  • And more: generation is a process of choices, including the choice to use 'canned text' where needed.

  19. Syntactic realizer
  • This work concerns syntactic realizers, i.e. the grammar.
  • Input to the grammar: a lexicalized representation of a phrase, at various levels of abstraction.
  • Output of the grammar: a grammatical string that represents the information in the input as accurately as possible.

  20. The input question
  [Slide diagram: Application, Knowledge base, Content planner and lexicon, Syntactic realizer, with the question "Input??" marking what the realizer receives.]

  21. FUF/SURGE – implementation
  • The grammar is written in FUF, the Functional Unification Formalism [Elhadad].
  • An FD (functional description) is a list of (attribute value) pairs, where a value is an atom, an FD, or a path.
  • The grammar is a meta-FD: disjunction with ALT, control with NONE, GIVEN, ANY.
  • All components of the generation process can be implemented in this formalism.
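To give a feel for what unification over FDs does, here is a much-simplified Python illustration: nested dicts stand in for FDs, with no paths, no ALT, and no special values. Nothing here reproduces FUF's actual behavior.

def unify(fd1, fd2):
    # merge two functional descriptions; recurse on shared attributes,
    # fail on conflicting atomic values
    result = dict(fd1)
    for attr, val in fd2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            result[attr] = unify(result[attr], val)
        elif result[attr] != val:
            raise ValueError(f"clash on {attr}: {result[attr]} vs {val}")
    return result

input_fd = {"cat": "np", "lex": "girl", "definite": "yes"}
grammar_fd = {"cat": "np", "number": "singular"}
print(unify(input_fd, grammar_fd))
# -> {'cat': 'np', 'lex': 'girl', 'definite': 'yes', 'number': 'singular'}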

  22. Requirements for a syntactic realizer
  • Mapping thematic structure onto syntactic roles.
  • Control of syntactic paraphrasing and alternations.
  • Provision of defaults for syntactic features.
  • Propagation of agreement features.
  • Selection of closed-class words.
  • Imposition of linear precedence constraints.
  • Inflection of open-class words.
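As a tiny illustration of two of these requirements, agreement propagation and open-class inflection, here is a deliberately naive English sketch; nothing here reflects how SURGE actually does it.

def realize_clause(subject, verb):
    # propagate the subject's number feature to the verb, then inflect both
    number = subject.get("number", "singular")
    noun = subject["lex"] + ("s" if number == "plural" else "")
    verb_form = verb["lex"] + ("s" if number == "singular" else "")
    det = "the " if subject.get("definite") else ""
    return f"{det}{noun} {verb_form}"

print(realize_clause({"lex": "girl", "definite": True, "number": "plural"},
                     {"lex": "sleep"}))
# -> the girls sleep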

  23. SURGE [Elhadad & Robin 96]
  • Based on Functional Grammar, HPSG, and descriptive studies of language.
  • The input to the grammar is a lexicalized representation of a phrase (a clause, NP, AP).
  • Keeping syntactic information in the input minimal isolates the earlier stages of the process from purely syntactic knowledge, gives the grammar paraphrasing power, and is also useful for multilingual applications.

  24. Input for SURGE in general
  • Each constituent has a cat feature, which determines which part of the grammar it will be unified with.
  • The representation of a clause is mostly semantic: a process (in SFL terms) and its participants. Paraphrasing can be triggered through a single feature, such as focus.
  • The input for an NP uses mostly syntactic features.
  • Different paraphrases require different inputs.

  25. An example
  The girl was kissed by John. / John kissed the girl.

  ((cat clause)
   (tense past)
   (process ((type material)
             (agentless no)
             (lex "kiss")))
   (participants ((agent ((cat proper) (lex "John")))
                  (affected ((cat common) (lex "girl")))))
   (focus {partic affected}))
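For readers unfamiliar with FUF notation, the same input can be read as a nested attribute-value structure. The Python sketch below is not FUF and uses a hand-written realization rule; it only illustrates the idea that a single focus feature selects between the passive and active paraphrases.

input_fd = {
    "cat": "clause", "tense": "past",
    "process": {"type": "material", "lex": "kiss"},
    "participants": {
        "agent":    {"cat": "proper", "lex": "John"},
        "affected": {"cat": "common", "lex": "girl"},
    },
    "focus": "affected",   # realize the affected participant as subject
}

def realize(fd):
    agent = fd["participants"]["agent"]["lex"]
    affected = "the " + fd["participants"]["affected"]["lex"]
    verb = fd["process"]["lex"]
    if fd.get("focus") == "affected":            # focus on affected: passive
        return f"{affected.capitalize()} was {verb}ed by {agent}."
    return f"{agent} {verb}ed {affected}."       # default: active

print(realize(input_fd))   # -> The girl was kissed by John.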
