Machine Translation and Lexical Resources Activity at IIT Bombay

Pushpak Bhattacharyya

Computer Science and Engineering Department

Indian Institute of Technology Bombay

[email protected]

http://www.cse.iitb.ac.in/pb


Interlingua Methodology

Directly obtain the meaning of the source sentence.

Do target sentence generation from the meaning representation.

John gave the book to Mary.

Meaning representation:

give-action:

agent: john

object: the book

receiver: mary
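
The meaning representation above can be sketched as a small data structure with a toy generator on the target side. This is only an illustration of the interlingua idea, not the system described in the talk: the class name `GiveAction` and the function `generate_english` are hypothetical, and real generation would need morphology and word-order rules rather than a fixed template.

```python
from dataclasses import dataclass


@dataclass
class GiveAction:
    """Language-independent frame for a 'give' event (illustrative only)."""
    agent: str
    obj: str
    receiver: str


def generate_english(frame: GiveAction) -> str:
    """Toy target-side generator: fills a fixed English template."""
    agent = frame.agent.capitalize()
    receiver = frame.receiver.capitalize()
    return f"{agent} gave {frame.obj} to {receiver}."


# The frame from the slide, then generation from the meaning alone.
frame = GiveAction(agent="john", obj="the book", receiver="mary")
print(generate_english(frame))
```

A generator for another language would read the same frame and apply its own template, which is the point of separating analysis from generation.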


Competing Approaches

Direct

Transfer based



State of Affairs

  • Systran reports 19 different language pairs.

  • Only 8 are adequate for their intended use.

  • Even fewer are capable of quality written or spoken text translation.


ENGLISH-SPANISH-ENGLISH

  • ...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province

  • ... en ese imperio, el arte de la cartografía logró tal perfección que el mapa de una sola provincia ocupó la totalidad de una ciudad, y el mapa del imperio, la totalidad de una provincia

  • ... in that empire, the art of the cartography obtained such perfection that the map of a single province occupied the totality of a city, and the map of the empire, the totality of a province

Provided by Systran on 19/11/02


ENGLISH-KOREAN-ENGLISH

  • ...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province

  • 저 제국안에, 단순한 지방의 지도가 도시의 완전을 점유했다 고 Cartography의 예술은 같은 얀벽,및 제국, 지방의 완전의 지도 를 달성했다

  • Inside that empire, the map of the region where it is simple occupied the perfection of the city the art of the Cartography is same, yan it attained the map of of perfection of the wall and empire and region

Provided by Systran on 19/11/02


UNL Based MT: the scenario

[Diagram: English, Russian, Hindi and French are connected to UNL; enconversion leads from a natural language into UNL and deconversion leads from UNL out to a natural language.]
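
One commonly cited reason for routing translation through a shared representation is the number of modules needed. The slide itself gives no figures; the sketch below assumes one enconverter and one deconverter per language for the interlingua route, and one system per ordered language pair for the direct route. The function names are hypothetical.

```python
def direct_systems(n_languages: int) -> int:
    """One translator per ordered (source, target) language pair."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """One enconverter plus one deconverter per language."""
    return 2 * n_languages


# Compare the two counts for the four languages in the scenario and for ten.
for n in (4, 10):
    print(n, direct_systems(n), interlingua_modules(n))
```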


Universal Networking Language

Common language for computers to express information written in natural language

(Uchida et al., 2000)

Application:

Electronic language to overcome language barrier

Information Distribution System


UNL Example

[Diagram: the node "arrange" has three outgoing edges labelled agt, obj and plc, leading to the nodes John, meeting and residence.]
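
The example graph can be written down as a list of labelled edges. This is a minimal sketch: the pairing of labels with nodes (agt with John, obj with meeting, plc with residence) is inferred from the meanings of the labels rather than stated on the slide, and the `Relation` type is a hypothetical helper.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    label: str  # UNL relation label, e.g. "agt"
    head: str   # node the edge starts from
    tail: str   # node the edge points to


# Edges of the example graph (label-to-node pairing is an assumption).
graph = [
    Relation("agt", "arrange", "John"),
    Relation("obj", "arrange", "meeting"),
    Relation("plc", "arrange", "residence"),
]


def to_unl(rel: Relation) -> str:
    """Render one edge in the label(head, tail) notation used on later slides."""
    return f"{rel.label}({rel.head}, {rel.tail})"


for rel in graph:
    print(to_unl(rel))
```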


Components of the UNL System

  • Universal Word

  • Relation Labels

  • Attributes


Universal Word

[साया] "shadow(icl>darkness)"; the place was now in shadow

[लेशमात्र] "shadow(icl>iota)"; not a shadow of doubt about his guilt

[संकेत] "shadow(icl>hint)"; the shadow of the things to come

[छाया] "shadow(icl>deterrent)"; a shadow over his happiness


Universal Word (foreign concepts)

[aput] "snow(icl>thing)";

[pukak] "snow(aoj<salt like)";

[mauja] "snow(aoj<soft, aoj<deep)";

[massak] "snow(aoj<soft)";

[mangokpok] "snow(aoj<watery)";
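
The dictionary entries above follow a regular shape: a headword in square brackets, then a quoted universal word made of an English head and a list of restrictions. A minimal parsing sketch, assuming that shape holds for every entry; the regular expressions and the function names `parse_uw` and `parse_entry` are hypothetical and are not part of any UNL toolkit.

```python
import re
from typing import List, Tuple

Restriction = Tuple[str, str, str]  # (label, direction, value)

ENTRY = re.compile(r'^\[(?P<word>[^\]]+)\]\s*"(?P<uw>[^"]+)"')
UW = re.compile(r'^(?P<head>[^()]+)\((?P<body>.*)\)$')
RESTRICTION = re.compile(r'^(?P<label>\w+)\s*(?P<dir>[<>])\s*(?P<value>.+)$')


def parse_uw(uw: str) -> Tuple[str, List[Restriction]]:
    """Split 'snow(aoj<soft, aoj<deep)' into its head and restrictions."""
    match = UW.match(uw.strip())
    if match is None:  # a bare headword with no restriction list
        return uw.strip(), []
    restrictions = []
    for part in match.group("body").split(","):
        r = RESTRICTION.match(part.strip())
        if r is None:
            raise ValueError(f"unrecognised restriction: {part!r}")
        restrictions.append(
            (r.group("label"), r.group("dir"), r.group("value").strip())
        )
    return match.group("head").strip(), restrictions


def parse_entry(line: str) -> Tuple[str, str, List[Restriction]]:
    """Parse one '[headword] "uw";' dictionary line."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    head, restrictions = parse_uw(match.group("uw"))
    return match.group("word"), head, restrictions


print(parse_entry('[mauja] "snow(aoj<soft, aoj<deep)";'))
```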


Relation

agt (agent): Agt defines a thing which initiates an action.

agt (do, thing)

Syntax: agt[":"<Compound UW-ID>] "(" {<UW1>|":"<Compound UW-ID>} "," {<UW2>|":"<Compound UW-ID>} ")"

Detailed Definition: Agent is defined as the relation between:

UW1 - do, and

UW2 - a thing

where:

UW2 initiates UW1, or

UW2 is thought of as having a direct role in making UW1 happen.

Examples and readings:

agt(break(icl>do), John(icl>person)): John breaks

agt(translate(icl>do), computer(icl>machine)): computer translates
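
A relation expression such as the examples above can be split into its label and two universal words by cutting at the one comma that is not nested inside parentheses. This is a minimal sketch under that assumption; it ignores the optional compound UW-ID part of the syntax, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_relation(expr: str) -> Tuple[str, str, str]:
    """Return (label, UW1, UW2) for an expression like 'agt(UW1, UW2)'."""
    label, _, rest = expr.strip().partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"missing closing parenthesis: {expr!r}")
    arguments = split_top_level(rest[:-1])
    if len(arguments) != 2:
        raise ValueError(f"expected two arguments: {expr!r}")
    return label.strip(), arguments[0], arguments[1]


print(parse_relation("agt(break(icl>do), John(icl>person))"))
```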


Attributes

  • Used to describe what is said from the speaker's point of view.

  • In particular captures number, tense, aspect and modality information.


Example Attributes

  • I see a flower

    UNL: obj(see(icl>do), flower(icl>thing))

  • I saw flowers

    UNL: obj(see(icl>do).@past, flower(icl>thing).@pl)

  • Did I see flowers?

    UNL: obj(see(icl>do).@past.@interrogative, flower(icl>thing).@pl)

  • Please see the flowers.

    UNL: obj(see(icl>do).@request, flower(icl>thing).@pl.@definite)
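
In the notation above, attributes are chained onto a universal word with ".@". A minimal sketch of separating and attaching them, assuming that ".@" never occurs inside the universal word itself; the function names are hypothetical.

```python
from typing import List, Tuple


def split_attributes(node: str) -> Tuple[str, List[str]]:
    """Separate 'see(icl>do).@past.@interrogative' into UW and attributes."""
    uw, *attributes = node.split(".@")
    return uw, attributes


def with_attributes(uw: str, attributes: List[str]) -> str:
    """Attach attributes to a universal word in the same notation."""
    return uw + "".join(f".@{name}" for name in attributes)


print(split_attributes("see(icl>do).@past.@interrogative"))
print(with_attributes("flower(icl>thing)", ["pl"]))
```

Keeping attributes apart from the universal word means the same concept node can be reused across sentences that differ only in tense, number or modality.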


The Analyser Machine

[Diagram: the Enconverter, driven by Rules and a Dictionary, works on a Node List (n(i-1), n(i), n(i+1), n(i+2), n(i+3)) through windows labelled A and C, and builds a Node-net whose nodes are labelled A to E.]


Strategy for Analysis

  • Morphological Analysis

  • Syntactico-Semantic Analysis


Analysis of a simple sentence

<< A Report of John’s genius reached King’s ears>>

Article and noun are combined [email protected] added to the noun.

<<[Report ][of] John’s genius reached king’s ears>>

Right shift to put the preposition with the succeeding noun.

<</Report /[of ][John’s] genius reached king’s ears>>

John’s being a possessive noun, shift right.

<</Report //of / [John’s] [genius] reached king’s ears>>

These two nouns are resolved into the relation pos and the first noun is deleted:


Simple sentence (continued)

<</Report /[of][genius] reached King’s ears>>

The preposition "of" is then combined with the noun, and a dynamic attribute OFRES is added to the entry of genius.

<<[Report][of genius ] reached King’s ears>>

Using the attribute OFRES, these two nouns are resolved to the relation mod and the second noun is deleted.

<<[Report ][reached] King’s ears>>

Shift right again and resolve King’s ears; the relation pof is generated.

<</Report /[reached][ ears]>>

Relation obj is generated here and then relation agt is generated between Report and ears

<</reached />>


UNL as Interlingua and Language Divergence (Dave, Parikh, Bhattacharyya, JMT, 2003)

  • Language divergence stands for the discrepancy in representation due to the inherent characteristics of the languages.

  • Syntactic Divergence

  • Lexical Semantic Divergence


Issue of free word order

जीम ने चोरी करनेवाले लड़के को लाठी से मारा।

जीम ने लाठी से चोरी करनेवाले लड़के को मारा।

चोरी करनेवाले लड़के को जीम ने लाठी से मारा।

चोरी करनेवाले लड़के को लाठी से जीम ने मारा।

लाठी से जीम ने चोरी करनेवाले लड़के को मारा।

(All five orderings mean "Jim hit the thieving boy with a stick.")

  • Use is made of the fact that in Hindi postpositions stay adjacent to their nouns (as opposed to the preposition-stranding divergence).

  • Flexibility in parsing: hit and preserve the predicate till the end.
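
The role of the postpositions can be illustrated with a small chunker: each run of words is closed by the postposition that follows it, and whatever remains at the end is kept as the predicate. This is a sketch under strong assumptions, not the analyser described in the talk. The five orderings are written in a plain romanization chosen for this example, and the postposition list is limited to the three markers that occur in them.

```python
from typing import FrozenSet

# Romanized Hindi case markers (assumed spelling, illustrative subset).
POSTPOSITIONS = {"ne", "ko", "se"}


def chunk(sentence: str) -> FrozenSet[str]:
    """Group each run of words with the postposition that closes it."""
    chunks, current = [], []
    for word in sentence.split():
        current.append(word)
        if word in POSTPOSITIONS:
            chunks.append(" ".join(current))
            current = []
    if current:  # trailing words form the predicate, kept till the end
        chunks.append(" ".join(current))
    return frozenset(chunks)


orders = [
    "jim ne chori karnevale ladke ko lathi se mara",
    "jim ne lathi se chori karnevale ladke ko mara",
    "chori karnevale ladke ko jim ne lathi se mara",
    "chori karnevale ladke ko lathi se jim ne mara",
    "lathi se jim ne chori karnevale ladke ko mara",
]

# Every ordering yields the same set of chunks.
print(len({chunk(s) for s in orders}))
print(sorted(chunk(orders[0])))
```

Because each noun phrase carries its own marker, the chunks can be matched to relations regardless of the order in which they arrive.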


Conjunct and compound verbs

Typical Indian language phenomenon. Compound for verb-verb, conjunct for other POS+verb.

H वह गाने लगी

E She started singing.

H चले जाओ

E Go away.

H रुक जाओ

E Stop there.

H झुक जाओ

E Bend down.

Possibility of combinatorial explosion in the lexicon. Possible solution: wordnet?


Use of Lexical Resources

Automatic Generation of the UW to language dictionary

(Verma and Bhattacharyya, Global Wordnet Conference, Czech Republic, 2004)

Universal Word generation

Semantic attribute generation

Heavy use of wordnets and ontologies


Wordnet and Lexical Resources

Approximately 12000 Hindi synsets corresponding to about 35000 root words of Hindi.

Approximately 7000 Hindi synsets corresponding to about 16000 root words of Hindi.

Verb Hierarchy of approximately 4000 unique words corresponding to 6000 senses.


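WordNet Sub-Graph

[Figure: Hindi WordNet sub-graph around the synset {घर, गृह} (house). Other nodes: संरचना (structure); आवास, निवास (dwelling); रसोईघर (kitchen); आँगन (courtyard); शयन कक्ष (bedroom); बरामदा (veranda); अध्ययन कक्ष (study room); अतिथि गृह (guest house); आश्रम (hermitage); झोपड़ी (hut). Edge labels: Hypernymy, Hyponymy, Meronymy, Gloss.]

Gloss: मनुष्यों का छाया हुआ वह स्थान जो दीवारों से घेर कर बनाया जाता है (the roofed place for people that is made by enclosing it with walls)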



Conclusions

  • Predicate preservation strategy used for English, Hindi, Marathi, Bengali (Spanish being added).

  • Focus on morphology for Marathi.

  • Focus on kaarak (case) system for Bengali.

  • Extremely lexical knowledge hungry.


Conclusions (continued)

  • Work going on in the creation of Indian language wordnets (Hindi, Marathi at IIT Bombay; Dravidian at Anna University).

  • Interlingua has the attractive possibility of being used as a knowledge representation and applied to interesting applications like summarization, text clustering, and meaning-based multilingual search engines.

