CSA4050: Advanced Techniques in NLP

Machine Translation II

Direct MT

Transfer MT

Interlingual MT

CSA4050 Machine Translation II



History – Pre ALPAC

  • 1952 – First MT Conference (MIT)

  • 1954 – Georgetown System (word for word based) successfully translated 49 Russian sentences

  • 1954–1965 – Much investment in a brute-force empirical approach: crude word-for-word techniques with limited reshuffling of the output

  • 1966 – The ALPAC (Automatic Language Processing Advisory Committee) report concludes that research funds should be redirected into more fundamental linguistic research


History – Post ALPAC

  • 1965–1970

    • Operational systems approach: SYSTRAN (eventually became the basis for Babel Fish)

    • University centres established in Grenoble (CETA), Montreal and Saarbrücken

  • 1970–1990: systems developed on the basis of linguistic and non-linguistic representations

    • Ariane (Dependency Grammar)

    • TAUM METEO (Metamorphosis Grammars)

    • EUROTRA (multilingual intermediate representations)

    • ROSETTA (Landsbergen), interlingua-based

    • BSO (Witkam) – Esperanto

  • 1990 onwards: data-driven translation systems


MT Methods

  • Direct MT

  • Rule-Based MT

    • Transfer

    • Interlingua

  • Data-Driven MT

    • EBMT (example-based MT)

    • SMT (statistical MT)


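Basic Architecture: Direct Translation

[Diagram: source text → target text]

  • Basic idea: map the source text directly onto the target text through a series of processing stages

  • Language-pair specific

  • No intermediate representation: a pipeline architecture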


Staged Direct MT (En/Jp)
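A staged direct system passes the source string through a fixed sequence of language-pair-specific stages. Each stage transforms the output of the previous one. The toy English→Japanese pipeline below illustrates one stage typical of this pair: rearranging English subject-verb-object order into Japanese subject-object-verb order. It makes several simplifying assumptions that are labelled hypothetical in the code. The romanised lexicon is invented for illustration. The input is assumed to be a bare three-word SVO clause. The particle rule adds topic "wa" and object "o". None of this reproduces the stages of a real system.

```python
# Toy staged direct MT pipeline (English -> romanised Japanese).
# The lexicon, the role-tagging heuristic and the particle rule are
# hypothetical simplifications for illustration only.

LEXICON = {"john": "jon", "mary": "meari", "eats": "taberu",
           "reads": "yomu", "apples": "ringo", "books": "hon"}


def tokenise(text):
    """Stage 1: lower-case and split on whitespace."""
    return text.lower().split()


def tag_roles(tokens):
    """Stage 2: assume a bare three-word SVO clause (a crude heuristic)."""
    subj, verb, obj = tokens
    return {"S": subj, "V": verb, "O": obj}


def lexical_transfer(roles):
    """Stage 3: word-for-word dictionary lookup."""
    return {role: LEXICON[word] for role, word in roles.items()}


def reorder_sov(roles):
    """Stage 4: rearrange SVO as SOV, inserting the particles 'wa' and 'o'."""
    return [roles["S"], "wa", roles["O"], "o", roles["V"]]


STAGES = [tokenise, tag_roles, lexical_transfer, reorder_sov]


def translate(text):
    result = text
    for stage in STAGES:  # each stage transforms the previous stage's output
        result = stage(result)
    return " ".join(result)


print(translate("John eats apples"))  # jon wa ringo o taberu
```

Because every stage is written for this one language pair, supporting another pair means writing a new pipeline.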


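Direct Translation: Advantages

  • Exploits the fact that certain potential ambiguities can be left unresolved. For example, German Wand/Mauer correspond to Italian parete/muro (internal vs. external wall), so a German–Italian system can map each word across without ever deciding which kind of wall is meant. By contrast, translating English wall into either language would force that decision.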

  • Designers can concentrate more on special cases where languages differ.

  • Minimal resources necessary: a cheap bilingual dictionary & rudimentary knowledge of target language suffices.

  • Translation memories are a (successful and much used) development of this approach.
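A translation memory stores previously translated segment pairs and retrieves the closest stored source segment for a new sentence. The sketch below assumes a similarity score based on Python's standard `difflib.SequenceMatcher.ratio()` and a threshold of 0.7. Both are illustrative choices, not the metric of any particular TM tool. The memory contents are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory of (source, target) segment pairs.
MEMORY = [
    ("the wall is white", "la parete è bianca"),
    ("the house is old", "la casa è vecchia"),
]


def tm_lookup(sentence, memory=MEMORY, threshold=0.7):
    """Return (score, source, target) for the most similar stored segment,
    or None if no segment reaches the similarity threshold."""
    best = None
    for source, target in memory:
        score = SequenceMatcher(None, sentence, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None


print(tm_lookup("the wall is white!")[2])  # la parete è bianca
```

A fuzzy match above the threshold is normally offered to a human translator for post-editing rather than used verbatim.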


Direct Translation: Disadvantages

  • Computationally naive

    • Basic model: word-for-word translation + local reordering (e.g. to handle adj+noun order)

  • Linguistically naive:

    • no analysis of the internal structure of the input, especially with respect to the grammatical relationships between the main parts of sentences.

    • no generalisation; everything is handled on a case-by-case basis.

  • Translation quality is generally poor,

    • except in simple cases where source and target sentences are largely isomorphic.
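The basic model described above can be sketched as dictionary lookup followed by a local adjective+noun swap. The sketch assumes a hypothetical English→Italian dictionary that also records each word's part of speech. Gender agreement is baked into individual dictionary entries, which is exactly the case-by-case treatment criticised above: there is no general agreement rule.

```python
# Hypothetical toy English->Italian dictionary: word -> (translation, part of speech).
DICTIONARY = {
    "a": ("una", "DET"),
    "the": ("la", "DET"),
    "red": ("rossa", "ADJ"),
    "old": ("vecchia", "ADJ"),
    "car": ("macchina", "NOUN"),
    "house": ("casa", "NOUN"),
}


def direct_translate(sentence):
    # 1. Word-for-word lookup; unknown words pass through untranslated.
    tagged = [DICTIONARY.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # 2. Local reordering: swap each ADJ NOUN pair into NOUN ADJ order.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(direct_translate("a red car"))  # una macchina rossa
```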


Transfer Model of MT

  • To overcome language differences, first build a more abstract representation of the input.

  • The translation process as such (called transfer) operates at the level of this representation.

  • This architecture assumes:

    • analysis via some kind of parsing process.

    • synthesis via some kind of generation.


Basic Architecture: Transfer Model

[Diagram: source text → (analysis) → source representation → (transfer) → target representation → (generation) → target text]


Transfer Rules

In general, there are two kinds of transfer rule:

  • Structural Transfer Rules: these deal with differences in syntactic structure.

  • Lexical Transfer Rules: these deal with cross-lingual mappings at the level of words and fixed phrases.


Structural Transfer Rule

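$\mathrm{NP}_s(\mathrm{Adj}_s, \mathrm{Noun}_s) \rightarrow \mathrm{NP}_t(\mathrm{Noun}_t, \mathrm{Adj}_t)$

(subscript s = source language, t = target language)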
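A structural transfer rule rewrites the source parse tree rather than the word string. The sketch below assumes trees are encoded as `(label, child, ...)` tuples with plain strings as leaves. The category labels are illustrative assumptions. It applies the NP(Adj, Noun) → NP(Noun, Adj) rule bottom-up. The words themselves stay untranslated, because that is the job of the separate lexical transfer rules.

```python
# Trees are (label, child, child, ...) tuples; leaves are plain strings.
# The category labels (NP, DET, ADJ, NOUN) are illustrative assumptions.


def is_node(x, label):
    return isinstance(x, tuple) and x[0] == label


def transfer(tree):
    """Apply the rule NP(Adj, Noun) -> NP(Noun, Adj) bottom-up over the tree."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [transfer(c) for c in children]
    if (label == "NP" and len(children) == 2
            and is_node(children[0], "ADJ") and is_node(children[1], "NOUN")):
        children = [children[1], children[0]]
    return (label, *children)


def leaves(tree):
    """Read the words off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


source = ("NP", ("DET", "the"), ("NP", ("ADJ", "white"), ("NOUN", "house")))
target = transfer(source)
print(target)
# ('NP', ('DET', 'the'), ('NP', ('NOUN', 'house'), ('ADJ', 'white')))
print(" ".join(leaves(target)))  # the house white
```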



Transfer Example: “There was an old man gardening” (English → Japanese)

  • existential-there-sentence: there was an old man gardening

    • delete initial there

    • make gardening modify NP

  • intermediate-representation-1: an old man gardening was

    • reverse order of NP/modifier

  • intermediate-representation-2: gardening an old man was

    • lexical transfer

  • japanese-s: niwa no teire o suru ojiisan ita
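The derivation above can be sketched as three transfer steps applied in order. The sketch assumes the input has already been chunked into phrases by an analysis step that is not shown. The phrase lexicon is hypothetical and simply copies the mappings from the example. The functions operate on phrase lists, not full parse trees.

```python
# Input is assumed to be pre-chunked into phrases (a hypothetical analysis step).
SOURCE = ["there", "was", "an old man", "gardening"]

# Hypothetical phrase lexicon, copied from the example above.
LEXICON = {"gardening": "niwa no teire o suru", "an old man": "ojiisan", "was": "ita"}


def delete_there_attach_modifier(chunks):
    """Delete initial 'there' and attach 'gardening' to the NP as a modifier.

    The verb is left in final position, giving intermediate-representation-1.
    """
    there, verb, np, modifier = chunks
    if there != "there":
        raise ValueError("expected an existential-there sentence")
    return [np, modifier, verb]


def reverse_np_modifier(chunks):
    """Move the modifier in front of the NP (intermediate-representation-2)."""
    np, modifier, verb = chunks
    return [modifier, np, verb]


def lexical_transfer(chunks):
    """Map each English phrase to its (romanised) Japanese counterpart."""
    return [LEXICON[c] for c in chunks]


result = SOURCE
for step in (delete_there_attach_modifier, reverse_np_modifier, lexical_transfer):
    result = step(result)
    print(" ".join(result))
# an old man gardening was
# gardening an old man was
# niwa no teire o suru ojiisan ita
```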



More Structural Transfer Rules


Lexical Transfer

  • Easy cases are based on bilingual dictionary lookup.

  • Resolution of ambiguities may require further knowledge:

    know → savoir (know a fact, e.g. know that ...)

    know → connaître (be acquainted with, e.g. a person or place)

  • Not necessarily word for word: German Schimmel → English white horse
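Both kinds of lexical transfer can be sketched as lookup functions. The first sketch assumes the analysis phase has already classified the complement of know as a clause or a noun phrase. Choosing savoir before a clause and connaître before an NP is a common simplification; some NP objects also take savoir. The second shows a one-to-many mapping, with Schimmel taken in its "white horse" sense. The tiny lexicon is hypothetical.

```python
# Complement types ("CLAUSE", "NP") are assumed to come from the analysis phase.


def transfer_know(complement_type):
    """English 'know' -> French: savoir before a clause, connaître before an NP."""
    if complement_type == "CLAUSE":
        return "savoir"
    if complement_type == "NP":
        return "connaître"
    raise ValueError(f"unknown complement type: {complement_type}")


# Hypothetical German -> English lexicon; one source word may yield several target words.
DE_EN = {"Schimmel": ["white", "horse"], "Pferd": ["horse"]}


def transfer_noun(word):
    return DE_EN[word]


print(transfer_know("CLAUSE"))    # savoir     (I know that he left)
print(transfer_know("NP"))        # connaître  (I know Paris)
print(transfer_noun("Schimmel"))  # ['white', 'horse']
```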


Transfer Model

  • The degree of generalisation depends upon the depth of the representation:

    • The deeper the representation, the harder it is to do analysis or generation.

    • The shallower the representation, the larger the transfer component.

  • Where does ambiguity get resolved?

  • The number of bilingual (transfer) components can get large.
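The growth in bilingual components is easy to quantify. The functions below assume one transfer component per translation direction, so N languages need N(N − 1) of them. An interlingual system instead needs one analyser and one generator per language, 2N in total. This count covers transfer components only: a transfer system also needs an analyser and a generator for each language.

```python
def transfer_modules(n):
    """Directed transfer components for n languages: one per ordered pair."""
    return n * (n - 1)


def interlingua_modules(n):
    """One analyser plus one generator per language."""
    return 2 * n


for n in (2, 5, 9, 24):
    print(n, transfer_modules(n), interlingua_modules(n))
# 2 2 4
# 5 20 10
# 9 72 18
# 24 552 48
```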


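Interlingual Translation: The Vauquois Triangle

[Diagram: the Vauquois triangle. Analysis rises from the source text and generation descends to the target text through representations of increasing depth, with the interlingua at the apex.]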


Interlingual Translation

  • The transfer model requires different transfer rules for each language pair.

  • This means a lot of work for a multilingual system.

  • The interlingual approach eliminates transfer altogether by creating a language-independent canonical form known as an interlingua.

  • Various logic-based schemes have been used to represent such forms.

  • Other approaches include attribute/value matrices called feature structures.


Possible Feature Structure for “There was an old man gardening”

  [ event    gardening
    agent    [ type          man
               number        sg
               definiteness  indef ]
    aspect   progressive
    tense    past ]
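Generation from an interlingua means realising a feature structure in a target language. The sketch below encodes the feature structure above as a nested dict and realises it as an English clause. The generator and its one-entry irregular-plural lexicon are hypothetical toys. They handle only the past progressive and only the features shown.

```python
# The feature structure above, encoded as a nested dict.
FS = {
    "event": "gardening",
    "agent": {"type": "man", "number": "sg", "definiteness": "indef"},
    "aspect": "progressive",
    "tense": "past",
}

# Hypothetical mini-lexicon of irregular plurals.
PLURALS = {"man": "men"}


def generate_english(fs):
    """Toy generator: realise a feature structure as an English clause.

    Only the past progressive is handled; anything else raises NotImplementedError.
    """
    agent = fs["agent"]
    number = agent["number"]
    noun = agent["type"] if number == "sg" else PLURALS[agent["type"]]
    if agent["definiteness"] == "def":
        det = "the"
    else:
        det = "a" if number == "sg" else "some"
    if fs["tense"] != "past" or fs["aspect"] != "progressive":
        raise NotImplementedError("only the past progressive is sketched here")
    aux = "was" if number == "sg" else "were"
    return f"{det} {noun} {aux} {fs['event']}"


print(generate_english(FS))  # a man was gardening
```

The generator could equally have produced "there was a man gardening". The feature structure does not record which of these paraphrases the source used.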


Ontological Issues

  • The designer of an interlingua has a very difficult task.

  • What is the appropriate inventory of attributes and values?

  • Clearly, the choice has radical effects on the ability of the system to translate faithfully.

  • For instance, to handle the muro/parete distinction, the internal/external characteristic of the wall would have to be encoded.


Feature Structure for “muro”

  [ word       muro
    syntax     [ POS class  noun
                 type       count ]
    field      buildings
    semantics  [ type       structural
                 position   external ] ]
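Once the interlingua encodes the internal/external position of a wall, each generator can choose its own word from that feature. The sketch below assumes a hypothetical per-language lexicon keyed on the position value. It uses the muro/parete and Mauer/Wand pairs mentioned earlier. English neutralises the distinction, so both values map to wall.

```python
# Hypothetical per-language lexicons keyed on the semantic 'position' feature.
ITALIAN_WALL = {"external": "muro", "internal": "parete"}
GERMAN_WALL = {"external": "Mauer", "internal": "Wand"}
ENGLISH_WALL = {"external": "wall", "internal": "wall"}


def realise_wall(concept, lexicon):
    """Choose a target word for 'wall' from the concept's position feature."""
    return lexicon[concept["semantics"]["position"]]


wall = {"semantics": {"type": "structural", "position": "external"}}

print(realise_wall(wall, ITALIAN_WALL))  # muro
print(realise_wall(wall, GERMAN_WALL))   # Mauer
print(realise_wall(wall, ENGLISH_WALL))  # wall
```

If the analyser cannot determine the position feature, for instance when the source is English wall, no target word can be chosen. This is the ontological burden described above.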


Interlingual Approach Pros and Cons

  • Pros

    • Portable (avoids the N² problem)

    • Because the representation is normalised, structural transformations are simpler to state.

    • Explanatory adequacy

  • Cons

    • Difficult to deal with terms on a primitive level:

      • universals?

    • Must decompose and reassemble concepts

    • Useful information lost (paraphrase)

  • In practice, works best in small domains.
