Interlingua Methodology

Directly obtain the meaning of the source sentence.

Generate the target sentence from the meaning representation.

John gave the book to Mary.

Meaning representation:

give-action:
    agent: john
    object: the book
    receiver: mary
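The frame above can be sketched in Python as a small record from which each target language generates its own sentence. This is only an illustration: the class name `GiveAction`, both generator functions, and the verb-final gloss are assumptions introduced here, not part of any system described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GiveAction:
    """Language-neutral frame for the 'give' event (hypothetical structure)."""
    agent: str
    obj: str
    receiver: str


def generate_english(frame: GiveAction) -> str:
    # English realisation: subject, verb, object, receiver in a "to" phrase.
    return f"{frame.agent.capitalize()} gave {frame.obj} to {frame.receiver.capitalize()}."


def generate_verb_final(frame: GiveAction) -> str:
    # Illustrative verb-final gloss (agent, receiver, object, verb),
    # standing in for a real generator of an SOV language.
    return f"{frame.agent} {frame.receiver}-to {frame.obj} gave"


FRAME = GiveAction(agent="john", obj="the book", receiver="mary")
print(generate_english(FRAME))
print(generate_verb_final(FRAME))
```

Both generators read the same frame, so adding a language means adding one generator rather than one translator per language pair.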


Competing approaches

Direct

Transfer based


Direct approach

  • Word replacements

    I like mangoes

    maOM AcCa laga Aama

    I like (root) mangoes

  • Morphology

    maOM AcCa lagata Aama

    I like mangoes

  • Syntactic re-arrangement

    maOM Aama AcCa lagata hO

    I mangoes like

  • Semantic embellishment

    mauJao Aama AcCa lagata hO

    I (dative) mangoes like
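The four steps above can be sketched as a toy pipeline in Python. The lexicon and every rule are hard-coded for the single slide example "I like mangoes"; the function names are assumptions made for illustration, and the transliterated Hindi strings are copied from the slide.

```python
# Toy word-for-word lexicon (root forms), taken from the slide example.
LEXICON = {"I": "maOM", "like": "AcCa laga", "mangoes": "Aama"}


def replace_words(words):
    # Step 1: word replacement using the bilingual lexicon.
    return [LEXICON[w] for w in words]


def apply_morphology(words):
    # Step 2: inflect the verb root ("AcCa laga" becomes "AcCa lagata").
    return ["AcCa lagata" if w == "AcCa laga" else w for w in words]


def rearrange(words):
    # Step 3: reorder subject-verb-object into subject-object-verb
    # and append the auxiliary "hO".
    subject, verb, obj = words
    return [subject, obj, verb, "hO"]


def embellish(words):
    # Step 4: the experiencer subject takes the dative form.
    return ["mauJao" if w == "maOM" else w for w in words]


def direct_translate(sentence):
    words = sentence.split()
    for step in (replace_words, apply_morphology, rearrange, embellish):
        words = step(words)
    return " ".join(words)


print(direct_translate("I like mangoes"))
```

Each step only patches the output of the previous one, which is why the direct approach needs a fresh set of such patches for every language pair.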


Transfer Based

Source sentence processed for parsing, chunking, etc.

[S [NP I] [VP [V like] [NP mangoes]]]


Transfer Based

Transfer structures obtained for the target sentence.

[S [NP I] [VP [NP mangoes] [V like]]]


Transfer Based

Morphology and language-specific modifications

[S [NP mauJao] [VP [NP Aama] [V AcCa lagataa hO]]]
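The three transfer slides can be sketched in Python as one structural rule followed by a lexical step. Trees are nested tuples of the form (label, child, ...); the function names and the single VP reordering rule are assumptions for illustration, and the target words are the ones shown on the slide.

```python
def transfer(tree):
    """Recursively swap V and NP inside every VP (verb-medial to verb-final)."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if (label == "VP" and len(children) == 2
            and all(isinstance(c, tuple) for c in children)
            and children[0][0] == "V" and children[1][0] == "NP"):
        children.reverse()
    return (label, *children)


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


# Target-side forms after morphology and language-specific modification.
TARGET_WORDS = {"I": "mauJao", "like": "AcCa lagataa hO", "mangoes": "Aama"}

SOURCE = ("S", ("NP", "I"), ("VP", ("V", "like"), ("NP", "mangoes")))
TARGET = transfer(SOURCE)
print(leaves(TARGET))
print(" ".join(TARGET_WORDS[w] for w in leaves(TARGET)))
```

Unlike the direct approach, the reordering here is stated over parse-tree categories, so one rule covers every sentence with the same VP shape.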



Relation Between the Transfer and the Interlingua Models

[Figure: pyramid diagram. Source-language words are parsed into a source-language parse tree, which is interpreted into the interlingua at the apex. Generation descends from the interlingua to the target-language parse tree and from there to target-language words. Transfer links the two parse trees directly.]


State of Affairs

  • Systran reports 19 different language pairs.

  • Of these, 8 are adequate for their intended use.

  • Even fewer are capable of quality written or spoken text translation.


ENGLISH-SPANISH-ENGLISH

  • ...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province

  • ... en ese imperio, el arte de la cartografía logró tal perfección que el mapa de una sola provincia ocupó la totalidad de una ciudad, y el mapa del imperio, la totalidad de una provincia

  • ... in that empire, the art of the cartography obtained such perfection that the map of a single province occupied the totality of a city, and the map of the empire, the totality of a province

Provided by Systran on 19/11/02


ENGLISH-KOREAN-ENGLISH

  • ...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province

  • 저 제국안에, 단순한 지방의 지도가 도시의 완전을 점유했다 고 Cartography의 예술은 같은 얀벽,및 제국, 지방의 완전의 지도 를 달성했다

  • Inside that empire, the map of the region where it is simple occupied the perfection of the city the art of the Cartography is same, yan it attained the map of of perfection of the wall and empire and region

Provided by Systran on 19/11/02


UNL Based MT: the scenario

[Figure: UNL sits at the centre. ENGLISH, RUSSIAN, HINDI and FRENCH connect to it through ENCONVERSION (into UNL) and DECONVERSION (out of UNL).]


Universal Networking Language

  • Common language for computers to express information written in natural language (Uchida et al. 2000)

  • Applications:

    • Electronic language to overcome the language barrier

    • Information distribution system


UNL Example

[Figure: UNL graph in which the node "arrange" is linked by agt to "John", by obj to "meeting", and by plc to "residence".]
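A UNL graph like this one can be held as a list of labelled binary relations. The sketch below is a minimal Python representation, assuming the reading of the figure in which John is the agent, the meeting is the object and the residence is the place; the `Relation` type and the `dependents` helper are illustrative names, not part of the UNL specification.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    label: str      # relation label such as agt, obj, plc
    head: str       # node the relation starts from
    dependent: str  # node the relation points to


UNL_GRAPH = [
    Relation("agt", "arrange", "John"),
    Relation("obj", "arrange", "meeting"),
    Relation("plc", "arrange", "residence"),
]


def dependents(graph, head):
    """Map each relation label leaving `head` to its dependent node."""
    return {r.label: r.dependent for r in graph if r.head == head}


print(dependents(UNL_GRAPH, "arrange"))
```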


Components of the UNL System

  • Universal Word

  • Relation Labels

  • Attributes


Universal Word

[saayaa] "shadow(icl>darkness)"; the place was now in shadow

[laoSamaa~] "shadow(icl>iota)"; not a shadow of doubt about his guilt

[saMkot] "shadow(icl>hint)" ; the shadow of the things to come

[Cayaa] "shadow(icl>deterrant)"; a shadow over his happiness


Universal Word (foreign concepts)

[aput] "snow(icl>thing)";

[pukak] "snow(aoj<salt like)";

[mauja] "snow(aoj<soft, aoj<deep)";

[massak] "snow(aoj<soft)";

[mangokpok] "snow(aoj<watery)";


Relation

agt (agent): agt defines a thing which initiates an action.

agt (do, thing)

Syntax

agt[":"<Compound UW-ID>] "(" {<UW1>|":"<Compound UW-ID>} "," {<UW2>|":"<Compound UW-ID>} ")"

Detailed Definition

Agent is defined as the relation between:

UW1 - do, and

UW2 - a thing

where:

UW2 initiates UW1, or

UW2 is thought of as having a direct role in making UW1 happen.

Examples and readings

agt(break(icl>do), John(icl>person)) John breaks

agt(translate(icl>do), computer(icl>machine)) computer translates
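The example expressions can be taken apart with a few lines of Python. This is a simplified sketch: it handles only the plain form label(UW1, UW2) and gives no special treatment to the optional compound UW-ID parts of the syntax. The function names `parse_relation` and `split_uw` are assumptions for illustration.

```python
def parse_relation(text):
    """Split 'label(UW1, UW2)' into (label, UW1, UW2)."""
    text = text.strip()
    label, _, rest = text.partition("(")
    if not label or not rest.endswith(")"):
        raise ValueError(f"not a binary relation: {text!r}")
    body = rest[:-1]
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # The first comma outside any parentheses separates UW1 from UW2.
            return label, body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"missing second argument: {text!r}")


def split_uw(uw):
    """Split a universal word into headword and constraint list."""
    head, _, constraint = uw.partition("(")
    return head, constraint.rstrip(")")


label, uw1, uw2 = parse_relation("agt(break(icl>do), John(icl>person))")
print(label, split_uw(uw1), split_uw(uw2))
```

Tracking parenthesis depth matters because the constraint list of a universal word may itself contain commas.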


Attributes

  • Used to describe what is said from the speaker's point of view.

  • In particular, they capture number, tense, aspect and modality information.


Example Attributes


The Analyser Machine

[Figure: the Enconverter, driven by Analysis Rules and a Dictionary, reads a Node List (nodes n(i-1), n(i), n(i+1), n(i+2), n(i+3)) through positions labelled A and C and builds a Node-net.]


Strategy for Analysis

  • Morphological Analysis

  • Syntactico-Semantic Analysis


Analysis of a simple sentence

<< A Report of John’s genius reached King’s ears>>

The article and noun are combined, and an attribute is added to the noun.

<<[Report ][of] John’s genius reached King’s ears>>

Right shift to put the preposition with the succeeding noun.

<</Report /[of ][John’s] genius reached King’s ears>>

John’s being a possessive noun, shift right.

<</Report //of / [John’s] [genius] reached King’s ears>>

These two nouns are resolved into the relation pos and the first noun is deleted:


Simple sentence (continued)

<</Report /[of][genius] reached King’s ears>>

The preposition of is then combined with the noun, and a dynamic attribute OFRES is added to the entry of genius.

<<[Report][of genius ] reached King’s ears>>

Using the attribute OFRES, these two nouns are resolved to the relation mod and the second noun is deleted.

<<[Report ][reached] King’s ears>>

Shift right again and resolve King’s ears; the relation pof is generated.

<</Report /[reached][ ears]>>

Relation obj is generated here and then relation agt is generated between Report and ears

<</reached />>


UNL as Interlingua and Language Divergence (Dave, Parikh, Bhattacharyya, JMT, 2003)

  • Language divergence stands for the discrepancy in representation due to the inherent characteristics of the languages.

  • Syntactic Divergence

  • Lexical Semantic Divergence


Issue of free word order

jaIma nao caaorI krnaovaalao laD,ko kao laazI sao maara.

jaIma nao laazI sao caaorI krnaovaalao laD,ko kao maara.

caaorI krnaovaalao laD,ko kao jaIma nao laazI sao maara.

caaorI krnaovaalao laD,ko kao laazI sao jaIma nao maara.

laazI sao jaIma nao caaorI krnaovaalao laD,ko kao maara.

  • Use is made of the fact that in Hindi postpositions stay adjacent to nouns (as opposed to the preposition-stranding divergence).

  • Flexibility in parsing: hit and preserve the predicate till the end.


Conjunct and Compound verbs

Typical Indian language phenomenon. Compound for verb-verb, conjunct for other POS+verb.

vah gaanao lagaI

She started singing

H calao jaaAao

E Go away.

H $k jaaAao

E Stop there.

H Jauk jaaAao

E Bend down.

Possibility of combinatorial explosion in the lexicon. Possible solution: wordnet?


Use of Lexical Resources

Automatic generation of the UW-to-language dictionary

(Verma and Bhattacharyya, Global Wordnet Conference, Czech Republic, 2004)

Universal Word generation

Semantic attribute generation

Heavy use of wordnets and ontologies



Conclusions

  • Predicate preservation strategy used for English, Hindi, Marathi, Bengali (Spanish being added).

  • Focus on morphology for Marathi.

  • Focus on kaarak (case) system for Bengali.

  • The approach is extremely hungry for lexical knowledge.


Conclusions

  • Work is going on in the creation of Indian language wordnets (Hindi, Marathi in IIT Bombay; Dravidian in Anna University).

  • Interlingua has the attractive possibility of being used as a knowledge representation and applied to interesting applications such as summarization, text clustering, and meaning-based multilingual search engines.


Generation of the Hindi Case System in an Interlingua Based MT Framework

Debasri Chakrabarti, Sunil Kumar Dubey, Pushpak Bhattacharyya.

Computer Science and Engineering Department,

Indian Institute of Technology, Bombay,

Mumbai, 400076, India.

debasri,dubey,[email protected]


Introduction

  • Role of the case marker in a language

    • plays an important role in the structure of a sentence

    • helps to impart meaning and naturalness

  • Example

    *मोटे तौर पर कृषि भूमि की जुताई, फसलों की रुपाई, कटाई, पालतू पशु प्रजनन, पालन, दुग्ध-व्यवसाय और वनीकरण सम्मिलित होता है ।

    In a broad sense, agriculture includes cultivation of the soil, growing and harvesting crops, breeding and raising livestock, dairying and forestry.


The Case System in Hindi

  • Hindi is characterized by a rich subsystem of case

  • Example:

    राम ने रवि को किताब दी।

    Ram Erg Ravi Dat book Nom give + past

    Ram gave a book to Ravi.

  • Hindi has the following cases:

    nominative, ergative, accusative, instrumental, dative, genitive, locative


Nominative ~ Ergative alternation in the agent position

  • The agent of an action may bear either nominative case or ergative case

  • Ergative case appears in Hindi with the

    • simple past form

    • perfective aspect


Examples

  • राम ने रवि को पीटा।

    Ram erg Ravi acc beat + past

    Ram beat Ravi.

  • राम ने रवि को पीटा था।

    Ram erg Ravi acc beat + past perfect

    Ram had beaten Ravi.

  • राम ने रवि को पीटा है।

    Ram erg Ravi acc beat + present perfect

    Ram has beaten Ravi.


Observations

  • There is a correlation between the ergative case and the aspectual property of the main verb

  • This is morphologically overt on the verb

    • Simple Past Tense: पीटा

    • Perfective Aspect: पीटा था

  • Morphological Rule

    • Simple Past Tense: V + आ → ने

    • Perfective Aspect: V + आ + (Tense morphology) → ने


Nominative ~ Ergative Alternation

  • Some Complex Phenomena

    • nominative case on the agent with the mentioned aspectual features

  • Is the nominative ~ ergative alternation subject to transitivity?

    • universally across languages, transitivity determines nom ~ erg

    • Hindi shows three types of patterns that are independent of transitivity


Nominative ~ Ergative Alternation

  • Three patterns are:

    • only nom agents

    • only erg agents

    • either nom or erg agents

  • Examples of intransitive verbs

    • Only nom agents

      i) राम गिरा। Ram fell down.

      Ram +nom fall + past.

      ii) *राम ने गिरा।

      Ram erg fall + past.


Intransitive Verbs

  • Only erg agents

    i) राम ने प्रतीक्षा की। Ram waited.

    Ram erg wait + past.

    ii) *राम प्रतीक्षा किया।

    Ram +nom wait + past.

  • Either nom or erg agents

    i) राम खेला। Ram played.

    Ram +nom play + past.

    ii) राम ने खेला।

    Ram erg play + past.


Transitive Verbs

  • Only nom agents

    i) राम शीशा लाया। Ram brought the glass.

    Ram +nom glass bring + past.

    ii) *राम ने शीशा लाया।

    Ram erg glass bring + past.

  • Only erg agents

    i) राम ने शीशा तोड़ा। Ram broke the glass.

    Ram erg glass break + past.

    ii) *राम शीशा तोड़ा।

    Ram +nom glass break + past.


Transitive Verbs

  • Either nom or erg agents

    i) राम ने समझा कि घर मेरा है।

    Ram erg think + past that house mine is.

    Ram thought that the house is mine.

    ii) राम समझा कि घर मेरा है।

    Ram think + past that house mine is.


Inferences

  • Ergative case in Hindi is semantically driven

    • action performed deliberately: ergative case

    • action performed non-deliberately: nominative case

  • Examples of deliberate and non-deliberate action

    राम गिरा। Ram fell down.

    Ram +nom fall + past.

    राम ने मोहन को गिराया। Ram made Mohan fall down.

    Ram erg Mohan acc cause to fall down


Accusative ~ Nominative Alternation in the Object

  • Primary objects in Hindi are

    • either accusative: को

    • or nominative, uninflected: Ø

  • Examples

    राम ने चावल खाया। Ram ate rice.

    Ram erg rice + nom eat + past.

    राम ने रावण को मारा। Ram killed Ravan.

    Ram erg Ravan acc kill + past.


Accusative ~ Nominative Alternation

  • Generalization

    • animate objects are accusative

    • inanimate objects are nominative

  • Counterexamples to this generalization

    • accusative case with inanimate objects

      राम ने किताब को उठाया । Ram lifted the book.

      Ram erg book acc lift + past.

      राम ने किताब उठाई । Ram lifted a book.

      Ram erg book lift + past.


Summarization of the theoretical approach

  • NOM~ERG Alternation

    • subject to the semantic property of the verb

    • this semantic property is conscious choice

  • NOM~ACC Alternation

    • subject to animacy

    • subject to definiteness
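The two alternations summarized above reduce to two small decisions, sketched here in Python. The function names and boolean parameters are assumptions for illustration; the sketch does not model verbs that allow either case (such as खेलना in the earlier slides) and is not the UNL rule base itself.

```python
def agent_marker(past_or_perfective: bool, deliberate: bool) -> str:
    """Ergative ने needs a past or perfective verb and a deliberate action."""
    return "ने" if past_or_perfective and deliberate else ""


def object_marker(animate: bool, definite: bool) -> str:
    """Accusative को marks animate or definite objects; others stay unmarked."""
    return "को" if animate or definite else ""


# राम गिरा: falling is not deliberate, so the agent stays nominative.
print(repr(agent_marker(past_or_perfective=True, deliberate=False)))
# राम ने शीशा तोड़ा: breaking is deliberate, so the agent is ergative.
print(repr(agent_marker(past_or_perfective=True, deliberate=True)))
# चावल (inanimate, indefinite) is unmarked; किताब को (definite) takes को.
print(repr(object_marker(animate=False, definite=False)))
print(repr(object_marker(animate=False, definite=True)))
```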


How to generate the Case Markers in the UNL System

  • Three components of the generation system

    • Lexicon

    • Rule Base

    • UNL Expression

  • Lexicon

    • the attribute for a verb is taken from a verb hierarchy

    • the attribute of conscious choice is [DLBRT-ACT]

    • [DLBRT-ACT] stands for deliberate action


Case Markers in the UNL System

  • Rule Base

    • Hindi is an SOV language

    • left-insertion rules are frequently used for Hindi

    • a child node is almost always inserted to the left of the parent node given in the UNL expression

  • Format for the Rule

    :"<COND1>:<ACTION1>:<RELATION1>:<ROLE1>"{<COND2>:<ACTION2>:<RELATION2>:<ROLE2>}


[Figure: animated step-by-step generation. A UNL graph with an entry verb node linked by agt to Sachin and by obj to Rice is turned into a Hindi node list between the <<SHEAD>> and <<STAIL>> markers, containing सचिन, चावल and खा@entry.]


Interpretation of a rule

  • Example of a left-insertion rule

    :"<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;

    • {} indicates the parent node

    • " " indicates the child node

    • condition of the parent node: V,>agt,@past,DLBRT-ACT,^@progress

    • condition of the child node: <agt

    • priority is denoted by P followed by a priority number: P242
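The rule string can be split into the parts named in the bullets with a short Python sketch. The field order (condition, action, relation, role) is read off the "Format for the Rule" line; the `NodeSpec` and `Rule` types and the `parse_rule` function are illustrative names, and the pattern covers only rules of the exact shape shown in these slides.

```python
import re
from typing import List, NamedTuple


class NodeSpec(NamedTuple):
    condition: List[str]
    action: List[str]
    relation: str
    role: str


class Rule(NamedTuple):
    child: NodeSpec   # the part inside " "
    parent: NodeSpec  # the part inside { }
    priority: int     # the number after P


RULE_RE = re.compile(r'^:"(?P<child>[^"]*)"\{(?P<parent>[^}]*)\}P(?P<priority>\d+);$')


def _parse_node(text: str) -> NodeSpec:
    fields = text.split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {fields!r}")
    cond, action, relation, role = fields
    # Comma-separated lists; empty fields become empty lists.
    as_list = lambda s: [item for item in s.split(",") if item]
    return NodeSpec(as_list(cond), as_list(action), relation, role)


def parse_rule(text: str) -> Rule:
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised rule: {text!r}")
    return Rule(_parse_node(match["child"]), _parse_node(match["parent"]),
                int(match["priority"]))


R1 = ':"<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;'
rule = parse_rule(R1)
print(rule.child.condition, rule.parent.condition, rule.priority)
```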


Generation of the Case Marker on the Agent

  • Rules for the generation system to handle the case of the agent

    R1

    :"<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;

    R2

    :"<agt:+blk,+agt,+sufc:agt:"{V,>agt,@past,^@progress:!agt::}P241;

    • Priority plays an important role in generation

    • R1 → Ergative

    • R2 → Nominative
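Why priority matters can be sketched in Python: R2's condition is a subset of R1's, so R1 must be tried first or the ergative marker would never be produced. The sketch makes several assumptions that the slides do not state explicitly: a larger P number means higher priority, ^ marks an attribute that must be absent, and +!ne stands for inserting the marker ने. The tuple layout and `select_rule` are illustrative, and the >agt condition is left out for brevity.

```python
# (name, priority, required features, forbidden features, agent marker)
RULES = [
    ("R2", 241, {"V", "@past"}, {"@progress"}, ""),
    ("R1", 242, {"V", "@past", "DLBRT-ACT"}, {"@progress"}, "ने"),
]


def select_rule(features):
    """Return (name, marker) of the highest-priority rule whose conditions hold."""
    for name, _priority, required, forbidden, marker in sorted(
            RULES, key=lambda r: r[1], reverse=True):
        if required <= features and not (forbidden & features):
            return name, marker
    return None


# Deliberate past action (for example तोड़ा): R1 fires and yields the ergative.
print(select_rule({"V", "@past", "DLBRT-ACT"}))
# Non-deliberate past action (for example गिरा): only R2 matches.
print(select_rule({"V", "@past"}))
# Progressive aspect blocks both rules.
print(select_rule({"V", "@past", "@progress", "DLBRT-ACT"}))
```

If the priorities were swapped, R2 would match every past-tense verb first and the agent would always come out nominative.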


Generation of the Case Marker on the Object

    • Rules for the generation system to handle the case of the object

      R5 

      :"<obj,INANI,MALE,^V,^SCOPE:+obj,+sufc,+blk:obj:"{>obj:+!obj,+male::}P160;

      R6 

      :"<obj,ANIMT,MALE,^V,^SCOPE:+obj,+sufc,+blk,+!ko:obj:"{>obj:+!obj,+male::}P160;

      R7 

      :"<obj,@def,MALE,^V,^SCOPE:+obj,+sufc,+blk,+!ko:obj:"{>obj:+!obj,+male::}P163;


Conclusion

  • Result

    • provides accuracy in the generation of case markers for the UNL relations (see table)

    • lends naturalness to the generated Hindi sentences

    • the alternation is extended to the pronominal cases

  • Future Work

    • extend the study to Dative and Genitive case markers and their corresponding UNL relations

Demo