
Parallel Reverse Treebanks for the Discovery of Morpho-Syntactic Markings



  1. Parallel Reverse Treebanks for the Discovery of Morpho-Syntactic Markings
     Lori Levin, Robert Frederking, Alison Alvarez
     Language Technologies Institute, School of Computer Science, Carnegie Mellon University
     Jeff Good
     Department of Linguistics, Max Planck Institute for Evolutionary Anthropology

  2. Reverse Treebank (RTB)
     • What?
       • Create the syntactic structures first
       • Then add sentences
     • Why? To elicit data from speakers of less commonly taught languages:
       • Decide what meaning we want to elicit
       • Represent the meaning in a feature structure
       • Add an English or Spanish sentence (plus context notes) to express the meaning
       • Ask the informant to translate it
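
     As a concrete illustration of this structures-first workflow, here is a
     minimal Python sketch of one elicitation record. The class and field
     names are my own, loosely modeled on the srcsent/context/tgtsent records
     shown on later slides; this is not part of the AVENUE tools.

         from dataclasses import dataclass

         @dataclass
         class ElicitationItem:
             feature_structure: dict   # the meaning to elicit, fixed first
             srcsent: str = ""         # English or Spanish sentence expressing it
             context: str = ""         # disambiguating notes for the informant
             tgtsent: str = ""         # the informant's translation, filled in last

         item = ElicitationItem({"c-v-absolute-tense": "past", "np-number": "num-sg"})
         item.srcsent = "The large bus to the post office broke down."
         print(item.srcsent, "->", item.tgtsent or "(awaiting translation)")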

  3. Bengali Example
     srcsent: The large bus to the post office broke down.
     context:
     tgtsent:
     ((actor ((modifier ((mod-role mod-descriptor) (mod-role role-loc-general-to)))
              (np-identifiability identifiable) (np-specificity specific)
              (np-biological-gender bio-gender-n/a) (np-animacy anim-inanimate)
              (np-person person-third) (np-function fn-actor)
              (np-general-type common-noun-type) (np-number num-sg)
              (np-pronoun-exclusivity inclusivity-n/a) (np-pronoun-antecedent antecedent-n/a)
              (np-distance distance-neutral)))
      (c-general-type declarative-clause) (c-my-causer-intentionality intentionality-n/a)
      (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a)
      (c-our-boundary boundary-n/a) (c-comparator-function comparator-n/a)
      (c-causee-control control-n/a) (c-our-situations situations-n/a)
      (c-comparand-type comparand-n/a) (c-causation-directness directness-n/a)
      (c-source source-neutral) (c-causee-volitionality volition-n/a)
      (c-assertiveness assertiveness-neutral) (c-solidarity solidarity-neutral)
      (c-polarity polarity-positive) (c-v-grammatical-aspect gram-aspect-neutral)
      (c-adjunct-clause-type adjunct-clause-type-n/a) (c-v-phase-aspect phase-aspect-neutral)
      (c-v-lexical-aspect activity-accomplishment) (c-secondary-type secondary-neutral)
      (c-event-modality event-modality-none) (c-function fn-main-clause)
      (c-minor-type minor-n/a) (c-copula-type copula-n/a)
      (c-v-absolute-tense past) (c-power-relationship power-peer)
      (c-our-shared-subject shared-subject-n/a) (c-question-gap gap-n/a))
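
     Because the feature structure is plain s-expression text, it is easy to
     load programmatically. The following Python sketch (my illustration, not
     AVENUE tooling) parses such a structure into nested lists and looks up a
     feature's value:

         import re

         def parse(tokens):
             """Recursively turn a token list into nested Python lists."""
             tok = tokens.pop(0)
             if tok != "(":
                 return tok                      # an atom such as num-sg or past
             node = []
             while tokens[0] != ")":
                 node.append(parse(tokens))
             tokens.pop(0)                       # discard the closing ")"
             return node

         def lookup(fs, feature):
             """Values of a feature in a parsed ((name value) ...) structure."""
             return [pair[1] for pair in fs if pair and pair[0] == feature]

         text = "((np-number num-sg) (c-v-absolute-tense past))"   # abbreviated
         fs = parse(re.findall(r"[()]|[^\s()]+", text))
         print(lookup(fs, "c-v-absolute-tense"))                   # ['past']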

  4. Outline
     • Background
       • The AVENUE Machine Translation System
     • Contents of the RTB
       • An inventory of grammatical meanings
       • Languages that have been elicited
     • Tools for RTB creation
     • Future work
       • Evaluation
       • Navigation

  5. AVENUE Machine Translation System
     Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori Levin (Co-PI)
     Rule learning: Katharina Probst

     Transfer rules are synchronous context-free rules carrying type
     information, alignments, x-side constraints, y-side constraints, and
     xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)):

     ;SL: the old man, TL: ha-ish ha-zaqen
     NP::NP [DET ADJ N] -> [DET N DET ADJ]
     (
      (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
      ((X1 AGR) = *3-SING)
      ((X1 DEF) = *DEF)
      ((X3 AGR) = *3-SING)
      ((X3 COUNT) = +)
      ((Y1 DEF) = *DEF)
      ((Y3 DEF) = *DEF)
      ((Y2 AGR) = *3-SING)
      ((Y2 GENDER) = (Y4 GENDER))
     )
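
     The sketch below (an illustration under my own data layout, not AVENUE's
     rule engine) shows the reordering work the alignments encode: X1, the
     English determiner, maps to both Y1 and Y3, so the Hebrew definite
     article surfaces twice. The agreement constraints, which a real engine
     would also check, are omitted here.

         # Hypothetical encoding of the rule above; indices are 1-based.
         RULE = {
             "lhs": ["DET", "ADJ", "N"],           # x-side: English NP
             "rhs": ["DET", "N", "DET", "ADJ"],    # y-side: Hebrew NP
             "align": {1: [1, 3], 2: [4], 3: [2]}, # X index -> Y indices
         }

         def transfer(src_constituents, rule):
             """Reorder translated constituents according to the alignments."""
             tgt = [None] * len(rule["rhs"])
             for x, ys in rule["align"].items():
                 for y in ys:
                     tgt[y - 1] = src_constituents[x - 1]
             return tgt

         # "the old man" -> ha-ish ha-zaqen (article repeated)
         print(transfer(["ha-", "zaqen", "ish"], RULE))
         # ['ha-', 'ish', 'ha-', 'zaqen']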

  6. AVENUE
     • Rules can be written by hand or learned automatically.
     • Hybrid:
       • Rule-based transfer
       • Statistical decoder
       • Multi-engine combinations with SMT and EBMT

  7. AVENUE systems (small and experimental, but tested on unseen data)
     • Hebrew-to-English
       • Alon Lavie, Shuly Wintner, Katharina Probst
       • Hand-written and automatically learned
       • Automatic rules trained on 120 sentences perform slightly better than about 20 hand-written rules.
     • Hindi-to-English
       • Lavie, Peterson, Probst, Levin, Font, Cohen, Monson
       • Automatically learned
       • Performs better than SMT when training data is limited to 50K words

  8. AVENUE systems (small and experimental, but tested on unseen data)
     • English-to-Spanish
       • Ariadna Font Llitjos
       • Hand-written, automatically corrected
     • Mapudungun-to-Spanish
       • Roberto Aranovich and Christian Monson
       • Hand-written
     • Dutch-to-English
       • Simon Zwarts
       • Hand-written

  9. Elicitation
     • Get data from someone who is:
       • Bilingual
       • Literate
       • Not experienced with linguistics

  10. English-Hindi Example
      Elicitation tool: Erik Peterson

  11. English-Chinese Example

  12. English-Arabic Example

  13. Elicitation
      srcsent: Tú caíste
      tgtsent: eymi ütrünagimi
      aligned: ((1,1),(2,2))
      context: tú = Juan [masculine, 2nd person singular]
      comment: You (John) fell

      srcsent: Tú estás cayendo
      tgtsent: eymi petu ütrünagimi
      aligned: ((1,1),(2 3,2 3))
      context: tú = Juan [masculine, 2nd person singular]
      comment: You (John) are falling

      srcsent: Tú caíste
      tgtsent: eymi ütrunagimi
      aligned: ((1,1),(2,2))
      context: tú = María [feminine, 2nd person singular]
      comment: You (Mary) fell
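
      Assuming the aligned field pairs source word positions with target word
      positions, and that a space-separated group like "2 3" marks a
      multi-word alignment (my reading of these examples, not a documented
      spec), a small Python parser might look like this:

          def parse_alignment(aligned):
              """'((1,1),(2 3,2 3))' -> [([1], [1]), ([2, 3], [2, 3])]"""
              pairs = []
              for chunk in aligned.strip("()").split("),("):
                  src, tgt = chunk.split(",")
                  pairs.append(([int(i) for i in src.split()],
                                [int(i) for i in tgt.split()]))
              return pairs

          print(parse_alignment("((1,1),(2 3,2 3))"))
          # [([1], [1]), ([2, 3], [2, 3])]
          # i.e. 'estás cayendo' aligns as a unit to 'petu ütrünagimi'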

  14. Outline
      • Background
        • The AVENUE Machine Translation System
      • Contents of the RTB
        • An inventory of grammatical meanings
        • Languages that have been elicited
      • Tools for RTB creation
      • Future work
        • Evaluation
        • Navigation

  15. Size of RTB
      • Around 3200 sentences
      • 20K words

  16. Languages
      • The set of feature structures with English sentences has been delivered to the Linguistic Data Consortium (LDC) as part of the Reflex program.
      • Translated (by LDC) into:
        • Thai
        • Bengali
      • Plans to translate into:
        • Seven “strategic” languages per year for five years
        • As one small part of a language pack (BLARK) for each language

  17. Languages
      • Feature structures are being reverse annotated in Spanish at New Mexico State University (Helmreich and Cowie)
        • Plans to translate into Guarani
      • Reverse annotation into Portuguese in Brazil (Marcello Modesto)
        • Plans to translate into Karitiana
          • 200 speakers
      • Plans to translate into Inupiaq (Kaplan and MacLean)

  18. Previous Elicitation Work
      • Pilot corpus
        • Around 900 sentences
        • No feature structures
      • Mapudungun
        • Two partial translations
      • Quechua
        • Three translations
      • Aymara
        • Seven translations
      • Hebrew
      • Hindi
        • Several translations
      • Dutch

  19. Sample: clause level
      Example sentences:
      • Mary is writing a book for John.
      • Who let him eat the sandwich?
      • Who had the machine crush the car?
      • They did not make the policeman run.
      • Mary had not blinked.
      • The policewoman was willing to chase the boy.
      • Our brothers did not destroy files.
      • He said that there is not a manual.
      • The teacher who wrote a textbook left.
      • The policeman chased the man who was a thief.
      • Mary began to work.
      Phenomena covered:
      • Tense, aspect, transitivity
      • Questions, causation and permission
      • Interaction of lexical and grammatical aspect
      • Volitionality
      • Embedded clauses and sequence of tense
      • Relative clauses
      • Phase aspect

  20. Sample: noun phrase level
      Example sentences:
      • The man quit in November.
      • The man works in the afternoon.
      • The balloon floated over the library.
      • The man walked over the platform.
      • The man came out from among the group of boys.
      • The long weekly meeting ended.
      • The large bus to the post office broke down.
      • The second man laughed.
      • All five boys laughed.
      • My book (possession, definiteness)
      • A book of mine (possession, indefiniteness)
      Phenomena covered:
      • Temporal and locative meanings
      • Quantifiers
      • Numbers
      • Combinations of different types of modifiers
      • Possession and (in)definiteness

  21. Example
      srcsent: The large bus to the post office broke down.
      ((actor ((modifier ((mod-role mod-descriptor) (mod-role role-loc-general-to)))
               (np-identifiability identifiable) (np-specificity specific)
               (np-biological-gender bio-gender-n/a) (np-animacy anim-inanimate)
               (np-person person-third) (np-function fn-actor)
               (np-general-type common-noun-type) (np-number num-sg)
               (np-pronoun-exclusivity inclusivity-n/a) (np-pronoun-antecedent antecedent-n/a)
               (np-distance distance-neutral)))
       (c-general-type declarative-clause) (c-my-causer-intentionality intentionality-n/a)
       (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a)
       (c-our-boundary boundary-n/a) (c-comparator-function comparator-n/a)
       (c-causee-control control-n/a) (c-our-situations situations-n/a)
       (c-comparand-type comparand-n/a) (c-causation-directness directness-n/a)
       (c-source source-neutral) (c-causee-volitionality volition-n/a)
       (c-assertiveness assertiveness-neutral) (c-solidarity solidarity-neutral)
       (c-polarity polarity-positive) (c-v-grammatical-aspect gram-aspect-neutral)
       (c-adjunct-clause-type adjunct-clause-type-n/a) (c-v-phase-aspect phase-aspect-neutral)
       (c-v-lexical-aspect activity-accomplishment) (c-secondary-type secondary-neutral)
       (c-event-modality event-modality-none) (c-function fn-main-clause)
       (c-minor-type minor-n/a) (c-copula-type copula-n/a)
       (c-v-absolute-tense past) (c-power-relationship power-peer)
       (c-our-shared-subject shared-subject-n/a) (c-question-gap gap-n/a))

  22. Grammatical meanings vs. syntactic categories
      • Features and values are based on a collection of grammatical meanings
        • Many of which are similar to the grammatemes of the Prague Treebanks

  23. Grammatical Meanings
      YES:
      • Semantic roles
      • Identifiability
      • Specificity
      • Time (before, after, or during time of speech)
      • Modality
      NO:
      • Case
      • Voice
      • Determiners
      • Auxiliary verbs

  24. Grammatical Meanings
      YES:
      • How is identifiability expressed?
        • Determiner
        • Word order
        • Optional case marker
        • Optional verb agreement
      • How is specificity expressed?
      • How are generics expressed?
      • How are predicate nominals marked?
      NO:
      • How are English determiners translated?
        • The boy cried.
        • The lion is a fierce beast.
        • I ate a sandwich.
        • He is a soldier. / Il est soldat. (French: “He is [a] soldier”)

  25. Argument Roles
      • Actor
        • Roughly, deep subject
      • Undergoer
        • Roughly, deep object
      • Predicate and predicatee
        • The woman is the manager.
      • Recipient
        • I gave a book to the students.
      • Beneficiary
        • I made a phone call for Sam.

  26. Why not subject and object?
      • Languages use their voice systems for different purposes.
      • Mapudungun obligatorily uses an inverse-marked verb when third person acts on first or second person.
        • Verb agrees with undergoer
        • Undergoer exhibits other subjecthood properties
        • Actor may be object
      • Yes: How are actor and undergoer encoded in combination with other semantic features like adversity (Japanese) and person (Mapudungun)?
      • No: How is English voice translated into another language?

  27. Argument Roles
      • Accompaniment
        • With someone
        • With pleasure
      • Material
        • (out) of wood
      • About 20 more roles
        • From the Lingua checklist; Comrie & Smith (1977)
        • Many also found in tectogrammatical representations
      • Around 80 locative relations
        • From the Lingua checklist
      • Many temporal relations

  28. Noun Phrase Features
      • Person
      • Number
      • Biological gender
      • Animacy
      • Distance (for deictics)
      • Identifiability
      • Specificity
      • Possession
      • Other semantic roles
        • Accompaniment, material, location, time, etc.
      • Type
        • Proper, common, pronoun
      • Cardinals
      • Ordinals
      • Quantifiers
      • Given and new information
        • Not used yet because of limited context in the elicitation tool

  29. Clause-level features
      • Tense
      • Aspect
        • Lexical, grammatical, phase
      • Type
        • Declarative, open-q, yes-no-q
      • Function
        • Main, argument, adjunct, relative
      • Source
        • Hearsay, first-hand, sensory, assumed
      • Assertedness
        • Asserted, presupposed, wanted
      • Modality
        • Permission, obligation
        • Internal, external

  30. Other clause types (constructions)
      • Causative
        • Make/let/have someone do something
      • Predication
        • May be expressed with or without an overt copula
      • Existential
        • There is a problem.
      • Impersonal
        • One doesn’t smoke in restaurants in the US.
      • Lament
        • If only I had read the paper.
      • Conditional
      • Comparative
      • Etc.

  31. Outline
      • Background
        • The AVENUE Machine Translation System
      • Contents of the RTB
        • An inventory of grammatical meanings
        • Languages that have been elicited
      • Tools for RTB creation
      • Future work
        • Evaluation
        • Navigation

  32. Tools for RTB Creation
      • Change the inventory of grammatical meanings
      • Make new RTBs for other purposes

  33. The Process
      Feature Specification (tense & aspect, clause-level, noun-phrase, modality, …):
      a list of semantic features and values
        ↓
      Feature Maps: which combinations of features and values are of interest
        ↓
      Feature Structure Sets
        ↓
      Reverse Annotated Feature Structure Sets: add English sentences
        ↓
      The Corpus
        ↓ (sampling)
      Smaller Corpus

  34. Feature Specification
      • XML Schema
      • XSLT script
      • Human-readable form:
        • Feature: Causer intentionality
          • Values: intentional, unintentional
        • Feature: Causee control
          • Values: in control, not in control
        • Feature: Causee volitionality
          • Values: willing, unwilling
        • Feature: Causation type
          • Values: direct, indirect
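
      The slide names XML as the storage format (with an XSLT script producing
      the human-readable form) but does not show the schema, so the element
      names below are my assumption. A Python sketch of reading such a
      specification into a dictionary of features and their legal values:

          import xml.etree.ElementTree as ET

          SPEC = """
          <features>
            <feature name="causer-intentionality">
              <value>intentional</value><value>unintentional</value>
            </feature>
            <feature name="causation-type">
              <value>direct</value><value>indirect</value>
            </feature>
          </features>
          """

          spec = {f.get("name"): [v.text for v in f.findall("value")]
                  for f in ET.fromstring(SPEC).findall("feature")}
          print(spec)
          # {'causer-intentionality': ['intentional', 'unintentional'],
          #  'causation-type': ['direct', 'indirect']}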

  35. Feature Combination
      • Person and number interact with tense in many fusional languages.
      • In English, tense interacts with questions:
        • Will you go?

  36. Feature Combination Template
      Summarizes 288 feature structures, which are automatically generated.

      ((predicatee ((np-general-type pronoun-type common-noun-type)
                    (np-person person-first person-second person-third)
                    (np-number num-sg num-pl)
                    (np-biological-gender bio-gender-male bio-gender-female)))
       {[(predicate ((np-general-type common-noun-type)
                     (np-person person-third)))
         (c-copula-type role)]
        [(predicate ((adj-general-type quality-type)
                     (c-copula-type attributive)))]
        [(predicate ((np-general-type common-noun-type)
                     (np-person person-third)
                     (c-copula-type identity)))]}
       (c-secondary-type secondary-copula)
       (c-polarity #all)
       (c-general-type declarative)
       (c-speech-act sp-act-state)
       (c-v-grammatical-aspect gram-aspect-neutral)
       (c-v-lexical-aspect state)
       (c-v-absolute-tense past present future)
       (c-v-phase-aspect durative))
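
      The expansion itself is essentially a cross-product over multi-valued
      features. A minimal runnable sketch (my illustration; the real generator
      must also handle nested structures and the {[...]} disjunction blocks
      shown above):

          from itertools import product

          template = {
              "np-person": ["person-first", "person-second", "person-third"],
              "np-number": ["num-sg", "num-pl"],
              "c-v-absolute-tense": ["past", "present", "future"],
          }

          names = list(template)
          structures = [dict(zip(names, combo))
                        for combo in product(*template.values())]
          print(len(structures))   # 18 = 3 x 2 x 3
          print(structures[0])
          # {'np-person': 'person-first', 'np-number': 'num-sg',
          #  'c-v-absolute-tense': 'past'}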

  37. Annotation Tool
      • Feature structure viewer
      • Various views of the feature structure:
        • Omit features whose value is not-applicable
        • Group related features together
          • Aspect
          • Causation
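
      A minimal sketch of the first viewer behavior, hiding not-applicable
      values. The helper name is mine; it keys off the "-n/a" value suffix
      seen in the examples on slides 3 and 21.

          def visible_features(fs):
              """Keep only feature-value pairs whose value is applicable."""
              return {name: value for name, value in fs.items()
                      if not value.endswith("-n/a") and value != "n/a"}

          fs = {"c-v-absolute-tense": "past",
                "c-copula-type": "copula-n/a",
                "c-question-gap": "gap-n/a"}
          print(visible_features(fs))   # {'c-v-absolute-tense': 'past'}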

  38. Outline
      • Background
        • The AVENUE Machine Translation System
      • Contents of the RTB
        • An inventory of grammatical meanings
        • Languages that have been elicited
      • Tools for RTB creation
      • Future work
        • Evaluation
        • Navigation

  39. Evaluation
      • Current funding has not covered evaluation of the RTB, except for
        informal observations as it was translated into several languages.
      • Does it elicit the meanings it was intended to elicit?
        • Informal observation: usually
      • Is it useful for machine translation?

  40. Hard Problems
      • Reverse annotating meanings that are not grammaticalized in English.
      • Evidentiality:
        • He stole the bread.
        • Context: Translate this as if you do not have first-hand knowledge.
          In English, we might say, “They say that he stole the bread” or
          “I hear that he stole the bread.”

  41. Hard Problems
      • Reverse annotating things that can be said in several ways in English.
      • Impersonals:
        • One doesn’t smoke here.
        • You don’t smoke here.
        • They don’t smoke here.
        • Credit cards aren’t accepted.
      • This was a problem in the Reflex corpus because space was limited.

  42. Navigation
      • Currently, feature combinations are specified by a human.
      • Plan to work in active-learning mode (a toy sketch of the selection step follows):
        • Build a seed RTB
        • Translate some data
        • Do some learning
        • Identify the most valuable pieces of information to get next
        • Generate an RTB for those pieces of information
        • Translate more
        • Learn more
        • Generate more, etc.
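
      A runnable toy sketch of the selection step referenced above, choosing
      the least-attested feature combinations for the next RTB. Everything
      here (the features, the counts, and the function name) is illustrative,
      not the planned implementation.

          from collections import Counter
          from itertools import product

          persons = ["1st", "2nd", "3rd"]
          tenses = ["past", "present", "future"]
          candidates = list(product(persons, tenses))

          # Pretend these combinations have already been elicited and translated.
          seen = Counter([("1st", "past"), ("1st", "past"), ("3rd", "present")])

          def next_batch(candidates, seen, k):
              """Choose the k least-attested combinations for the next RTB."""
              return sorted(candidates, key=lambda c: seen[c])[:k]

          print(next_batch(candidates, seen, 3))
          # [('1st', 'present'), ('1st', 'future'), ('2nd', 'past')]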
