Statistical XFER: Hybrid Statistical Rule-based Machine Translation

### Statistical XFER: Hybrid Statistical Rule-based Machine Translation

Alon Lavie

Language Technologies Institute

Carnegie Mellon University

Joint work with:

Jaime Carbonell, Lori Levin, Bob Frederking, Erik Peterson, Christian Monson, Vamshi Ambati, Greg Hanneman, Kathrin Probst, Ariadna Font-Llitjos, Alison Alvarez, Roberto Aranovich

Outline

- Background and Rationale
- Stat-XFER Framework Overview
- Elicitation
- Learning Transfer Rules
- Automatic Rule Refinement
- Example Prototypes
- Major Research Challenges

Statistical XFER MT

Progression of MT

- Started with rule-based systems
- Very large expert human effort to construct language-specific resources (grammars, lexicons)
- High-quality MT is extremely expensive, so it exists only for a handful of language pairs
- Along came EBMT and then Statistical MT…
- Replaced human effort with extremely large volumes of parallel text data
- Less expensive, but still only feasible for a small number of language pairs
- We “traded” human labor for data
- Where does this take us in 5-10 years?
- Large parallel corpora for maybe 25-50 language pairs
- What about all the other languages?
- Is all this data (with very shallow representation of language structure) really necessary?
- Can we build MT approaches that learn deeper levels of language structure and how they map from one language to another?

Rule-based vs. Statistical MT

- Traditional Rule-based MT:
- Expressive and linguistically-rich formalisms capable of describing complex mappings between the two languages
- Accurate “clean” resources
- Everything constructed manually by experts
- Main challenge: obtaining broad coverage
- Phrase-based Statistical MT:
- Learn word and phrase correspondences automatically from large volumes of parallel data
- Search-based “decoding” framework:
- Models propose many alternative translations
- Effective search algorithms find the “best” translation
- Main challenge: obtaining high translation accuracy

Main Principles of Stat-XFER

- Integrate the major strengths of rule-based and statistical MT within a common framework:
- Linguistically rich formalism that can express complex and abstract compositional transfer rules
- Rules can be written by human experts and also acquired automatically from data
- Easy integration of morphological analyzers and generators
- Word and basic phrase correspondences (e.g. base NPs) can be automatically acquired from parallel text when available
- Search-based decoding from statistical MT adapted to find the best translation within the search space: multi-feature scoring, beam-search, parameter optimization, etc.
- Framework suitable for both resource-rich and resource-poor language scenarios

Stat-XFER MT Approach

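[Figure: MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Analysis side: Syntactic Parsing, Semantic Analysis. Top: Interlingua. Generation side: Sentence Planning, Text Generation. Direct approaches: SMT, EBMT. Statistical-XFER works at the Transfer Rules level.]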

בשורה הבאה

Preprocessing

Morphology

Transfer Rules

Language Model + Additional Features

{NP1,3}

NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]

((X3::Y1)

(X1::Y2)

((X1 def) = +)

((X1 status) =c absolute)

((X1 num) = (X3 num))

((X1 gen) = (X3 gen))

(X0 = X1))

Transfer Engine

Translation Lexicon

Decoder

N::N |: ["$WR"] -> ["BULL"]

((X1::Y1)

((X0 NUM) = s)

((Y0 lex) = "BULL"))

N::N |: ["$WRH"] -> ["LINE"]

((X1::Y1)

((X0 NUM) = s)

((Y0 lex) = "LINE"))

Translation Output Lattice

(0 1 "IN" @PREP)

(1 1 "THE" @DET)

(2 2 "LINE" @N)

(1 2 "THE LINE" @NP)

(0 2 "IN LINE" @PP)

(0 4 "IN THE NEXT LINE" @PP)

English Output

in the next line

Transfer Rule Formalism

Callouts on the example rule:

- Type information
- Part-of-speech/constituent information
- Alignments
- x-side constraints
- y-side constraints
- xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]

(

(X1::Y1)

(X1::Y3)

(X2::Y4)

(X3::Y2)

((X1 AGR) = *3-SING)

((X1 DEF) = *DEF)

((X3 AGR) = *3-SING)

((X3 COUNT) = +)

((Y1 DEF) = *DEF)

((Y3 DEF) = *DEF)

((Y2 AGR) = *3-SING)

((Y2 GENDER) = (Y4 GENDER))

)
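The alignment part of the rule above can be illustrated with a minimal Python sketch. The `TransferRule` class and `apply_alignments` helper are hypothetical names introduced here, not part of the Stat-XFER engine; feature constraints are left out, and the Hebrew word translations are supplied by hand rather than looked up in a lexicon.

```python
from dataclasses import dataclass


@dataclass
class TransferRule:
    """Hypothetical container for the header and alignments of a transfer rule."""
    src_type: str
    tgt_type: str
    src_rhs: list      # x-side constituents X1..Xn
    tgt_rhs: list      # y-side constituents Y1..Ym
    alignments: list   # (i, j) pairs meaning Xi::Yj, 1-based as in the formalism


def apply_alignments(rule, src_translations):
    """Copy the translation of each Xi into every Yj slot it is aligned to."""
    out = [None] * len(rule.tgt_rhs)
    for i, j in rule.alignments:
        out[j - 1] = src_translations[i - 1]
    return out


# NP::NP [DET ADJ N] -> [DET N DET ADJ] with X1::Y1, X1::Y3, X2::Y4, X3::Y2
rule = TransferRule(
    "NP", "NP",
    ["DET", "ADJ", "N"],
    ["DET", "N", "DET", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)

# Translations of "the", "old", "man" (given by hand for this illustration).
result = apply_alignments(rule, ["ha", "zaqen", "ish"])
print(result)
```

Because X1 is aligned to both Y1 and Y3, the determiner is copied twice, which is how the single English "the" yields the two Hebrew "ha-" prefixes in "ha-ish ha-zaqen".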

Transfer Rule Formalism (II)

Callouts on the example rule:

- Value constraints
- Agreement constraints

;SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]

(

(X1::Y1)

(X1::Y3)

(X2::Y4)

(X3::Y2)

((X1 AGR) = *3-SING)

((X1 DEF) = *DEF)

((X3 AGR) = *3-SING)

((X3 COUNT) = +)

((Y1 DEF) = *DEF)

((Y3 DEF) = *DEF)

((Y2 AGR) = *3-SING)

((Y2 GENDER) = (Y4 GENDER))

)

Hebrew Manual Transfer Grammar (human-developed)

- Initially developed in a couple of days, with some later revisions by a CL post-doc
- Current grammar has 36 rules:
- 21 NP rules
- 1 PP rule
- 6 verb complexes and VP rules
- 8 higher-phrase and sentence-level rules
- Captures the most common (mostly local) structural differences between Hebrew and English

Hebrew Transfer Grammar: Example Rules

{NP1,2}

;;SL: $MLH ADWMH

;;TL: A RED DRESS

NP1::NP1 [NP1 ADJ] -> [ADJ NP1]

(

(X2::Y1)

(X1::Y2)

((X1 def) = -)

((X1 status) =c absolute)

((X1 num) = (X2 num))

((X1 gen) = (X2 gen))

(X0 = X1)

)

{NP1,3}

;;SL: H $MLWT H ADWMWT

;;TL: THE RED DRESSES

NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]

(

(X3::Y1)

(X1::Y2)

((X1 def) = +)

((X1 status) =c absolute)

((X1 num) = (X3 num))

((X1 gen) = (X3 gen))

(X0 = X1)

)
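The constraints in rule {NP1,3} can be read as checks over feature structures. The sketch below is a simplified illustration, assuming an LFG-style reading in which `=` is satisfied when a feature is absent or matches, while `=c` requires the feature to be present with the given value. The function names and the feature values (`"pl"`, `"f"`, and so on) are hypothetical and are not taken from the Hebrew grammar itself.

```python
def agree(fs_a, fs_b, feature):
    """Agreement holds if either side lacks the feature or both values match."""
    a, b = fs_a.get(feature), fs_b.get(feature)
    return a is None or b is None or a == b


def np1_3_applies(np1, adj):
    """Simplified check of the {NP1,3} constraints on X1 (NP1) and X3 (ADJ)."""
    return (
        np1.get("def", "+") == "+"             # ((X1 def) = +)
        and np1.get("status") == "absolute"    # ((X1 status) =c absolute)
        and agree(np1, adj, "num")             # ((X1 num) = (X3 num))
        and agree(np1, adj, "gen")             # ((X1 gen) = (X3 gen))
    )


# Hypothetical feature structures for "H $MLWT" and two adjective forms.
dresses = {"def": "+", "status": "absolute", "num": "pl", "gen": "f"}
red_pl_f = {"num": "pl", "gen": "f"}
red_sg_m = {"num": "sg", "gen": "m"}

print(np1_3_applies(dresses, red_pl_f))  # adjective agrees in number and gender
print(np1_3_applies(dresses, red_sg_m))  # adjective disagrees
```

A full unifier would also merge the feature structures and pass the result up via `(X0 = X1)`; this sketch only tests compatibility.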

The XFER Engine

- Input: source-language input sentence, or source-language confusion network
- Output: lattice representing collection of translation fragments at all levels supported by transfer rules
- Basic Algorithm: “bottom-up” integrated “parsing-transfer-generation” guided by the transfer rules
- Start with translations of individual words and phrases from translation lexicon
- Create translations of larger constituents by applying applicable transfer rules to previously created lattice entries
- Beam-search controls the exponential combinatorics of the search-space, using multiple scoring features
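The bottom-up procedure above can be sketched in a few lines of Python. This is a toy illustration under several assumptions: arcs are `(start, end, text, category)` tuples with an exclusive end index, each rule is a hypothetical `(lhs, source_categories, target_order)` triple, and there are no feature constraints, scores, or beam pruning.

```python
def match(arcs, cats, start):
    """Yield sequences of adjacent arcs from `start` whose categories spell `cats`."""
    if not cats:
        yield []
        return
    for arc in arcs:
        if arc[0] == start and arc[3] == cats[0]:
            for rest in match(arcs, cats[1:], arc[1]):
                yield [arc] + rest


def xfer(lexical_arcs, rules):
    """Apply rules to existing arcs until no new lattice entries appear."""
    arcs = set(lexical_arcs)
    changed = True
    while changed:
        changed = False
        for lhs, src_cats, order in rules:
            for start in {a[0] for a in arcs}:
                # Materialize matches first so the set is not modified mid-iteration.
                for seq in list(match(list(arcs), src_cats, start)):
                    text = " ".join(seq[i][2] for i in order)
                    new_arc = (start, seq[-1][1], text, lhs)
                    if new_arc not in arcs:
                        arcs.add(new_arc)
                        changed = True
    return arcs


# Lexical translations for "$MLH ADWMH" (dress red), entered by hand.
lexical = [(0, 1, "DRESS", "NP1"), (1, 2, "RED", "ADJ")]
# NP1 -> [NP1 ADJ] on the source side, emitted as [ADJ NP1] on the target side.
rules = [("NP1", ["NP1", "ADJ"], [1, 0])]

lattice = xfer(lexical, rules)
for arc in sorted(lattice):
    print(arc)
```

The resulting lattice keeps the word-level arcs alongside the larger constituent, which matches the idea of a lattice holding translation fragments at all levels.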

Source-language Confusion Network Hebrew Example

- Input word: B$WRH

0 1 2 3 4

|--------B$WRH--------|

|-----B-----|$WR|--H--|

|--B--|-H--|--$WRH---|
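One way to read the diagram above is as a set of arcs between the node positions 0 to 4. The sketch below enumerates the segmentations such a network licenses; the arc boundaries are an interpretation of the ASCII diagram, and `paths` is a hypothetical helper rather than anything from the XFER engine.

```python
def paths(arcs, node, goal):
    """Yield every token sequence that leads from `node` to `goal`."""
    if node == goal:
        yield []
        return
    for start, end, token in arcs:
        if start == node:
            for rest in paths(arcs, end, goal):
                yield [token] + rest


# Arcs as (start_node, end_node, token), read off the three rows of the diagram.
arcs = [
    (0, 4, "B$WRH"),
    (0, 2, "B"), (2, 3, "$WR"), (3, 4, "H"),
    (0, 1, "B"), (1, 2, "H"), (2, 4, "$WRH"),
]

segmentations = sorted(paths(arcs, 0, 4))
for seg in segmentations:
    print(" ".join(seg))
```

Under this node-sharing reading, the rows can be recombined at node 2, so the network admits more paths than the three rows drawn. The transfer engine is then free to pick whichever segmentation leads to the best-scoring translation, such as B H $WRH for "in the line".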

XFER Output Lattice

(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")

(29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")

(29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")

(29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")

(30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 'WORKED')) ")

(30 30 "FUNCTIONED" -16.0023 "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ")

(30 30 "WORSHIPPED" -17.3393 "&BD " "(VERB,0 (V,12 'WORSHIPPED')) ")

(30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 'SERVED')) ")

(30 30 "SLAVE" -13.9523 "&BD " "(NP0,0 (N,34 'SLAVE')) ")

(30 30 "BONDSMAN" -18.0325 "&BD " "(NP0,0 (N,36 'BONDSMAN')) ")

(30 30 "A SLAVE" -16.8671 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ")

(30 30 "A BONDSMAN" -21.0649 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
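Lattice entries in the format shown above can be loaded for inspection with a short Python sketch. The field names (`start`, `end`, `target`, `score`, `source`, `tree`) are guesses based on the visible content of each entry, not a documented specification, and `parse_arc` is a hypothetical helper.

```python
import shlex

SAMPLE = '''
(30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" -16.0023 "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 'SERVED')) ")
'''


def parse_arc(line):
    """Split one lattice entry into fields (field names are assumptions)."""
    # Drop the outer parentheses, then let shlex handle the quoted fields.
    start, end, target, score, source, tree = shlex.split(line.strip()[1:-1])
    return {
        "start": int(start),
        "end": int(end),
        "target": target,
        "score": float(score),
        "source": source.strip(),
        "tree": tree.strip(),
    }


arcs = [parse_arc(line) for line in SAMPLE.strip().splitlines()]
best = max(arcs, key=lambda arc: arc["score"])
print(best["target"], best["score"])
```

Sorting or filtering by the score field makes it easy to see which alternatives for a source word (here `&BD`) the lattice ranks highest before decoding.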

The Lattice Decoder

- Simple Stack Decoder, similar in principle to simple Statistical MT decoders
- Searches for best-scoring path of non-overlapping lattice arcs
- No reordering during decoding
- Scoring based on log-linear combination of scoring components, with weights trained using MERT
- Scoring components:
- Statistical Language Model
- Fragmentation: how many arcs to cover the entire translation?
- Length Penalty
- Rule Scores
- Lexical Probabilities
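The search for a best-scoring monotone path of non-overlapping arcs can be sketched as follows. This is not the stack decoder itself: a simple dynamic program stands in for it, only two components are combined (an arc log-score and a per-arc fragmentation penalty), and the arcs, scores, and `frag_weight` value are made up for illustration. A real system would add the language model, length penalty, rule scores, and lexical probabilities, with weights tuned by MERT.

```python
def best_path(arcs, n, frag_weight=1.0):
    """Return (score, texts) for the best arc sequence covering positions 0..n.

    arcs: (start, end, text, log_score) tuples with an exclusive end index.
    Each arc used costs `frag_weight`, so fewer, larger arcs are preferred.
    """
    best = {0: (0.0, [])}
    for pos in range(n):
        if pos not in best:
            continue  # no partial path reaches this position
        score, texts = best[pos]
        for start, end, text, log_score in arcs:
            if start == pos:
                candidate = score + log_score - frag_weight
                if end not in best or candidate > best[end][0]:
                    best[end] = (candidate, texts + [text])
    return best.get(n)


# Toy lattice with hypothetical scores.
arcs = [
    (0, 1, "IN", -2.0),
    (1, 2, "THE", -1.0),
    (2, 3, "LINE", -3.0),
    (1, 3, "THE LINE", -3.5),
    (0, 3, "IN THE LINE", -7.0),
]

result = best_path(arcs, 3)
print(result)
```

Processing positions left to right is enough here because every arc moves forward and no reordering happens during decoding, so the best score for a position is final by the time it is expanded.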

XFER Lattice Decoder

0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL

Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0,

Words: 13,13

235 < 0 8 -19.7602: B H IWM RBI&I (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE')

(NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>

918 < 8 14 -46.2973: H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0

(NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100

(NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>

584 < 14 17 -30.6607: L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A')

(NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>

Data Elicitation for Languages with Limited Resources

- Rationale:
- Large volumes of parallel text are not available, so create a small, maximally diverse parallel corpus that directly supports the learning task
- Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool
- Elicitation corpus designed to be typologically and structurally comprehensive and compositional
- Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data

[Figure slides: Elicitation Tool examples for English-Chinese (two slides), English-Hindi, English-Arabic, and Spanish-Mapudungun]

Designing Elicitation Corpora

- Goal: Create a small representative parallel corpus that contains examples of the most important translation correspondences and divergences between the two languages
- Method:
- Elicit translations and word alignments for a broad diversity of linguistic phenomena and constructions
- Current Elicitation Corpus: ~3100 sentences and phrases, constructed based on a broad feature-based specification
- Open Research Issues:
- Feature Detection: discover what features exist in the language and where/how they are marked
- Example: does the language mark gender of nouns? How and where are these marked?
- Dynamic corpus navigation based on feature detection: no need to elicit for combinations involving non-existent features

Rule Learning - Overview

- Goal: Acquire Syntactic Transfer Rules
- Use available knowledge from the source side (grammatical structure)
- Three steps:
- Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
- Compositionality Learning: use previously learned rules to learn hierarchical structure
- Constraint Learning: refine rules by learning appropriate feature constraints
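The first step, flat seed generation, can be approximated by turning one word-aligned, POS-tagged phrase pair into a flat rule with no constraints. The sketch below shows only that idea; `flat_seed_rule` is a hypothetical helper, and the actual learning procedure is not described in enough detail here to reproduce.

```python
def flat_seed_rule(lhs, src_pos, tgt_pos, alignments):
    """Build a flat rule string from POS sequences and 1-based word alignments."""
    header = f"{lhs}::{lhs} [{' '.join(src_pos)}] -> [{' '.join(tgt_pos)}]"
    body = [f"(X{i}::Y{j})" for i, j in sorted(alignments)]
    return "\n".join([header, "("] + body + [")"])


# "the old man" / "ha-ish ha-zaqen", aligned by hand for this illustration.
seed = flat_seed_rule(
    "NP",
    ["DET", "ADJ", "N"],
    ["DET", "N", "DET", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)
print(seed)
```

Compositionality learning would then replace spans of such a flat rule with previously learned constituents, and constraint learning would add feature constraints to the body.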

[Figure slides: Flat Seed Rule Generation; Compositionality Learning; Constraint Learning]

Automated Rule Refinement

- Bilingual informants can identify and pinpoint translation errors
- A sophisticated trace of the translation path can identify likely sources for the error and do “Blame Assignment”
- Rule Refinement operators can be developed to modify the underlying translation grammar (and lexicon) based on characteristics of the error source:
- Add or delete feature constraints from a rule
- Bifurcate a rule into two rules (general and specific)
- Add or correct lexical entries
- See [Font-Llitjos, Carbonell & Lavie, 2005]
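As an illustration of one refinement operator, the sketch below bifurcates a rule into a general copy and a more specific copy carrying an extra constraint. The dictionary layout, the `bifurcate` name, and the example constraint are assumptions made for this sketch; the operators in the cited work may be defined differently.

```python
import copy


def bifurcate(rule, extra_constraint):
    """Return (general, specific): the specific copy gains one more constraint."""
    general = copy.deepcopy(rule)
    specific = copy.deepcopy(rule)
    specific["name"] = rule["name"] + "-specific"
    specific["constraints"].append(extra_constraint)
    return general, specific


# Hypothetical rule record based on the {NP1,2} example.
rule = {
    "name": "NP1,2",
    "header": "NP1::NP1 [NP1 ADJ] -> [ADJ NP1]",
    "constraints": ["((X1 def) = -)", "((X1 num) = (X2 num))"],
}

general, specific = bifurcate(rule, "((X1 gen) = (X2 gen))")
print(general["constraints"])
print(specific["constraints"])
```

Deep copies keep the two rules independent, so later refinements to the specific rule leave the general one untouched.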

Stat-XFER MT Prototypes

- General Statistical XFER framework under development for the past five years (funded by NSF and DARPA)
- Prototype systems so far:
- Chinese-to-English
- Dutch-to-English
- French-to-English
- Hindi-to-English
- Hebrew-to-English
- Mapudungun-to-Spanish
- In progress or planned:
- Brazilian Portuguese-to-English
- Native-Brazilian languages to Brazilian Portuguese
- Hebrew-to-Arabic
- Iñupiaq-to-English
- Urdu-to-English
- Turkish-to-English

Chinese-English Stat-XFER System

- Bilingual lexicon: over 1.1 million entries (multiple resources, incl. ADSO, Wikipedia, extracted base NPs)
- Manual syntactic XFER grammar: 76 rules! (mostly NPs, a few PPs, and reordering of NPs/PPs within VPs)
- Multiple overlapping Chinese word segmentations
- English morphology generation
- Uses CMU SMT-group’s Suffix-Array LM toolkit for LM
- Current Performance (GALE dev-test), reported as BLEU (B) / METEOR (M):
- NW (newswire):
- XFER: 10.89(B)/0.4509(M)
- Best (UMD): 15.58(B)/0.4769(M)
- NG (newsgroups):
- XFER: 8.92(B)/0.4229(M)
- Best (UMD): 12.96(B)/0.4455(M)
- In Progress:
- Automatic extraction of “clean” base NPs from parallel data
- Automatic learning and extraction of high-quality transfer-rules from parallel data

Translation Example

- REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt."
- Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem , Yanukovich replied : " of course .
- IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula , the Queen Vic said : " of course .
- CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue , Yanukovych vetch replied : " of course .
- maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in , replied: "of course.
- IBM-smt (0.1862): The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said: " Of course.
- CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula , replied : " of course .

Major Research Directions

- Automatic Transfer Rule Learning:
- From manually word-aligned elicitation corpus
- From large volumes of automatically word-aligned “wild” parallel data
- In the absence of morphology or POS annotated lexica
- Compositionality and generalization
- Distinguishing “good” rules from “bad” rules
- Effective models for rule scoring, for:
- Decoding: using scores at runtime
- Pruning the large collections of learned rules
- Learning Unification Constraints

Major Research Directions

- Extraction of Base-NP translations from parallel data:
- Base-NPs are extremely important “building blocks” for transfer-based MT systems
- Frequent, often align 1-to-1, improve coverage
- Correctly identifying them greatly helps automatic word-alignment of parallel sentences
- Parsers (or NP-chunkers) available for both languages: Extract base-NPs independently on both sides and find their correspondences
- Parsers (or NP-chunkers) available for only one language (e.g. English): Extract base-NPs on one side, and find reliable correspondences for them using word-alignment, frequency distributions, other features…
- Promising preliminary results

Major Research Directions

- Algorithms for XFER and Decoding
- Integration and optimization of multiple features into search-based XFER parser
- Complexity and efficiency improvements (e.g. “Cube Pruning”)
- Non-monotonicity issues (LM scores, unification constraints) and their consequences for search

Major Research Directions

- Discriminative Language Modeling for MT:
- Current standard statistical LMs provide only weak discrimination between good and bad translation hypotheses
- New Idea: Use “occurrence-based” statistics:
- Extract instances of lexical, syntactic and semantic features from each translation hypothesis
- Determine whether these instances have been “seen before” (at least once) in a large monolingual corpus
- The Conjecture: more grammatical MT hypotheses are likely to contain higher proportions of feature instances that have been seen in a corpus of grammatical sentences.
- Goals:
- Find the set of features that provides the best discrimination between good and bad translations
- Learn how to combine these into a LM-like function for scoring alternative MT hypotheses
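The “seen before” idea can be made concrete with a small Python sketch that uses word bigrams as the only feature type. This is a simplification: the proposal also covers syntactic and semantic features, and the helper names, the tiny corpus, and the choice of bigrams are all assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def seen_fraction(hypothesis, corpus_sentences, n=2):
    """Fraction of the hypothesis n-grams that occur at least once in the corpus."""
    seen = set()
    for sentence in corpus_sentences:
        seen.update(ngrams(sentence.lower().split(), n))
    grams = ngrams(hypothesis.lower().split(), n)
    if not grams:
        return 0.0
    return sum(gram in seen for gram in grams) / len(grams)


# Tiny stand-in for a large monolingual corpus of grammatical sentences.
corpus = ["the lion ate the rabbit", "on the fourth day"]

good = seen_fraction("the lion ate the rabbit", corpus)
bad = seen_fraction("lion the ate rabbit the", corpus)
print(good, bad)
```

The feature is binary per instance (seen or not seen) rather than a probability, so it rewards hypotheses whose pieces are attested at all, in line with the conjecture above.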

Major Research Directions

- Building Elicitation Corpora:
- Feature Detection
- Corpus Navigation
- Automatic Rule Refinement
- Translation for highly polysynthetic languages such as Mapudungun and Iñupiaq

Questions?
