Parallel Syntactic Annotation of Multiple Languages

Owen Rambow, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Florence Reeder, Advaith Siddharthan

Interlingual Annotation of Multi-lingual Text Corpora (IAMTC)
  • CMU
    • Lori Levin, Teruko Mitamura
  • Columbia
    • Owen Rambow, Advaith Siddharthan
  • ISI
    • Eduard Hovy
  • MITRE
    • Keith Miller, Flo Reeder
  • New Mexico State University
    • David Farwell, Stephen Helmreich
  • University of Maryland
    • Bonnie Dorr, Rebecca Green, Nizar Habash
Goals of IAMTC
  • Design an Interlingua
    • Language-independent representation of text meaning
    • Useful for MT, IR, IE, QA,…
  • Develop an Annotation Methodology
    • Manuals, tools, evaluations
  • Annotate multi-lingual, multi-parallel texts
    • Foreign language original and 2 English translations
    • Foreign languages: Arabic, French, Hindi, Japanese, Korean, Spanish
IL Development: Three Levels
  • IL0: syntactic dependency tree
  • IL1: semantic annotations
    • Concepts:
      • ‘senses’ from ISI’s Omega ontology
      • for Nouns, Verbs, Adjs, Advs
    • Semantic Roles
      • Theta Roles from Dorr’s LCS work
  • IL2: reconciliation of different IL1s with same meaning but different syntax:
    • Predicate argument structure
    • Sentence plan: main and embedded clauses
Outline
  • Goals of IAMTC
  • IL0: A deep syntactic dependency representation
    • How and why it is different from other dependency representations
  • Examples:
    • Copula
    • Future tense
    • Causative
    • Light verbs
  • Comparison to other work
    • Prague tectogrammatical representation
    • PropBank
Example of IL0

Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”

[Figure: IL0 dependency tree for the sentence above, displayed in TrEd (Pajas, 1998)]

IL0 Design: Reduce cross-linguistic Differences
  • Retain content words
  • Replace function words with syntactic features
    • Tense, definiteness, etc.
  • Retain information about the event and participants
  • Neutralize information about the organization of the information or how it is communicated
IL0 Features
  • Parts of Speech
    • Verb, noun, proper noun, adjective, adverb, preposition, conjunction, determiner, aux (modal), punctuation, symbols, speech sounds, misc
  • Features of Nouns
    • Number, Definiteness
  • Features of Verbs
    • Progressive, Perfective, Tense, Mood
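One way to picture an IL0 node is as a content word carrying a part of speech, a set of syntactic features, and a role relative to its head. The following is a minimal sketch in Python under that reading. The class name `IL0Node`, its field names, and the printed layout are hypothetical; the slides do not describe the project's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IL0Node:
    """Hypothetical IL0 node: a content word plus syntactic features."""
    lemma: str                      # content word, e.g. "umbrella"
    pos: str                        # part of speech, e.g. "N", "V", "Adj"
    features: Dict[str, str] = field(default_factory=dict)
    role: Optional[str] = None      # relation to the head, e.g. "Subj", "Obj"
    children: List["IL0Node"] = field(default_factory=list)

    def add(self, role: str, child: "IL0Node") -> "IL0Node":
        """Attach child under this node with the given syntactic role."""
        child.role = role
        self.children.append(child)
        return self

    def show(self, depth: int = 0) -> str:
        """Render the subtree as indented text, one node per line."""
        feats = ",".join([self.pos] + list(self.features.values()))
        label = f"{self.role}: " if self.role else ""
        lines = ["  " * depth + f"{label}{self.lemma}[{feats}]"]
        for child in self.children:
            lines.append(child.show(depth + 1))
        return "\n".join(lines)


# "The umbrella is red": the determiner becomes a definiteness feature
# on the noun, and the copula is dropped so the predicate is the head.
red = IL0Node("red", "Adj")
red.add("Subj", IL0Node("umbrella", "N", {"number": "sg", "definiteness": "def"}))
print(red.show())
```

The function word "the" leaves no node of its own; only its contribution (definiteness) survives as a feature, which is the design choice described on the preceding slide.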
Summary of IL0
  • No auxiliary verbs
  • No determiners
  • Add empty arguments
    • I want ___ to go
  • “Undo” passives and clefts
  • Copular sentences are headed by the predicate
    • The umbrella is red
  • Retain causative markers and light verbs only if they affect the argument structure of the sentence or have a literal meaning
  • Includes syntactic roles (Subj, Obj, IndObj, Mod)
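As a rough illustration of what "undoing" a passive could mean, the sketch below maps the surface roles of a passive clause back onto active-voice roles, so that an active sentence and its passive counterpart receive the same representation. The helper `undo_passive` and its dictionary layout are hypothetical. Using an `Empty` placeholder for an agentless passive is an assumption made here, not something the slides state.

```python
from typing import Dict, Optional


def undo_passive(verb: str, surface_subj: str,
                 by_agent: Optional[str]) -> Dict[str, str]:
    """Return active-voice roles for a passive clause.

    Assumption: an agentless passive gets an 'Empty' subject placeholder.
    """
    return {
        "head": verb,
        "Subj": by_agent if by_agent is not None else "Empty",
        "Obj": surface_subj,
    }


# "The gangster killed at least 3 innocent bystanders"
active = {"head": "kill", "Subj": "gangster", "Obj": "bystander"}

# "At least 3 innocent bystanders were killed by the gangster"
passive = undo_passive("kill", surface_subj="bystander", by_agent="gangster")

print(passive == active)
```

Modifiers such as "at least 3" and "innocent" are left out to keep the example short.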
Annotations done so far
  • Annotations of 6 English Texts
  • Each translated from a different source language
  • Two translations of each text
  • 10 – 12 annotators for each text
  • Approximately 144 annotated texts total (6 source texts × 2 translations × up to 12 annotators)
IL0 Annotation Manuals
  • English
  • Arabic
  • French
  • Hindi
  • Japanese
  • Korean
  • Spanish
Copula
  • English: overt copula
    • The umbrella was red.
  • Arabic: overt copula in past tense
    • kAnat AlmiZl~apu HamrA’F
  • Japanese: optional copula (desu)
    • Kasa wa akai.
Future Tense
  • Spanish: Llegará Juan
  • English: Juan will arrive
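The point of this pair is presumably that Spanish marks the future on the verb while English uses the auxiliary "will". Since IL0 has no auxiliary verbs and records tense as a feature, both sentences should end up with the same shape. The sketch below illustrates that under stated assumptions: the dict-based tree, the `Aux` role in the surface parse, the feature value `fut`, and the helpers `fold_future_aux` and `shape` are all hypothetical.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def fold_future_aux(node: Node) -> Node:
    """Return a copy where a 'will' auxiliary child becomes tense=fut.

    Hypothetical helper: assumes a surface parse in which the auxiliary
    hangs off the main verb with role 'Aux'.
    """
    feats = dict(node.get("feats", {}))
    kept = []
    for child in node.get("children", []):
        if child.get("role") == "Aux" and child["lemma"] == "will":
            feats["tense"] = "fut"        # function word -> syntactic feature
        else:
            kept.append(fold_future_aux(child))
    return {**node, "feats": feats, "children": kept}


def shape(node: Node) -> tuple:
    """Language-neutral skeleton: POS, features and roles, ignoring lemmas."""
    return (
        node["pos"],
        tuple(sorted(node.get("feats", {}).items())),
        tuple(sorted((c["role"], shape(c)) for c in node.get("children", []))),
    )


# English surface parse: "Juan will arrive"
english = {
    "lemma": "arrive", "pos": "V", "feats": {},
    "children": [
        {"lemma": "Juan", "pos": "PN", "role": "Subj"},
        {"lemma": "will", "pos": "Aux", "role": "Aux"},
    ],
}
# Spanish "Llegará Juan": the future is already marked on the verb.
spanish = {
    "lemma": "llegar", "pos": "V", "feats": {"tense": "fut"},
    "children": [{"lemma": "Juan", "pos": "PN", "role": "Subj"}],
}

english_il0 = fold_future_aux(english)
spanish_il0 = fold_future_aux(spanish)
print(shape(english_il0) == shape(spanish_il0))
```

After folding, the two trees differ only in their lemmas, which is the kind of difference left for the later levels to reconcile.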

Causative Sentences in English, Japanese, and Arabic
  • English: main clause and embedded clause
    • I made [the cat eat the fish]
  • Japanese: productive causative morpheme
    • Watashi-wa neko-ni sakana-wo tabe-sase-ta
    • I TOP cat DAT fish ACC eat CAUSE-PAST
  • Arabic: lexical causatives
    • >ak~altu AlqiT~apa Alsamakpa
    • Eat-CAUSE cat.DEF.ACC fish.DEF.ACC

IL0 for causative sentences in English, Japanese, and Arabic
  • English
    • Make[V,past]
      • SUBJ: I[N]
      • OBJ: eat[V]
        • SUBJ: cat[N,sg,def]
        • OBJ: fish[N,sg,def]
  • Japanese
    • sase[V,past]
      • SUBJ: watashi[N]
      • OBJ: tabe[V]
        • SUBJ: neko[N,sg,def]
        • OBJ: sakana[N,sg,def]
  • Arabic
    • >ak~al[V,cause,past]
      • SUBJ: Empty[N]
      • IOBJ: cat[N,sg,def]
      • OBJ: fish[N,sg,def]
  • Reduce differences between languages but only to the extent allowed by the syntax, morphology, and lexical items
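The three causative trees can be compared mechanically by stripping the language-specific lemmas and keeping only roles and bracketed features. The sketch below does this with nested tuples whose node labels are copied from the slide. The tuple encoding and the helper `skeleton` are assumptions made for illustration only.

```python
# Each node is (label, [(role, child), ...]); labels are copied from the slide.
english = ("Make[V,past]", [
    ("SUBJ", ("I[N]", [])),
    ("OBJ", ("eat[V]", [
        ("SUBJ", ("cat[N,sg,def]", [])),
        ("OBJ", ("fish[N,sg,def]", [])),
    ])),
])
japanese = ("sase[V,past]", [
    ("SUBJ", ("watashi[N]", [])),
    ("OBJ", ("tabe[V]", [
        ("SUBJ", ("neko[N,sg,def]", [])),
        ("OBJ", ("sakana[N,sg,def]", [])),
    ])),
])
arabic = (">ak~al[V,cause,past]", [
    ("SUBJ", ("Empty[N]", [])),
    ("IOBJ", ("cat[N,sg,def]", [])),
    ("OBJ", ("fish[N,sg,def]", [])),
])


def skeleton(node):
    """Keep roles and bracketed features, drop the language-specific lemma."""
    label, children = node
    feats = label[label.index("["):]
    return (feats, tuple(sorted((role, skeleton(child))
                                for role, child in children)))


print(skeleton(english) == skeleton(japanese))  # True: same two-clause shape
print(skeleton(english) == skeleton(arabic))    # False: single lexical causative
```

English and Japanese come out with the same skeleton because the causative element heads an embedded clause in both. The Arabic lexical causative stays a single clause with three arguments, which illustrates the sense in which IL0 reduces differences only as far as syntax, morphology, and lexical items allow.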

Hindi Light Verbs

Hum santre kha gaye

We oranges eat went

“We ate oranges”

Hindi Light Verbs

Ram santra kha-kar jayega

Ram orange eat-then go

“Ram will eat the orange and then leave”

Comparison to other work
  • Compared to annotation projects
    • IAMTC is an interlingua project
    • IAMTC annotates multi-lingual, multi-parallel texts in order to reconcile differences between languages
  • Compared to interlingua design projects
    • IAMTC is a corpus driven project
    • IAMTC is an annotation project
Comparison to Tectogrammatical Representation
  • IL0 has only syntactic relation labels
    • In IL0: all adjuncts are marked “adj”
  • IL0 retains strongly governed prepositions
    • give X to Y
  • IL0: prepositions are heads
    • But there is some flexibility for each language to decide
Comparison to PropBank
  • IAMTC is more syntactic
  • Thematic paraphrases: same arguments filling the same roles for the same verb
    • Load hay on truck/load truck with hay
    • Same in PropBank
    • Different in IL0
IL0 Differences Between Languages
  • Morphological features on nodes different between languages
  • No raising verbs in Arabic, Hindi, Japanese, Korean; where a language has raising verbs, they take no subject in IL0

John seems to like beans

  • Serial verbs in Hindi: additional verb with only aspectual meaning (?) treated as dependent on main verb

hum santre kha gaye

we oranges eat went

‘We ate the oranges’

IL0 Differences Between Languages (2)
  • Morphological causatives in Japanese: causative morpheme is head

私は (猫に 魚を 食べ-) -させた

1sg-TOP (cat-DAT fish-OBJ eat-) -CAUSE-PAST

I made the cat eat the fish

  • Prepositions as heads in all our languages, but probably not others (Czech)
Summary: What is Normalized Where?
  • Syntactic variation: IL0
    • The gangster killed at least 3 innocent bystanders
    • At least 3 innocent bystanders were killed by the gangster
  • Lexical synonymy: IL1
    • The toddler sobbed, and he attempted to console her
    • The baby wailed, and he tried to comfort her
  • Diathesis alternation: IL1 (caveat)
    • The men loaded hay into the trucks
    • The men loaded the trucks with hay
Summary: What is Normalized Where?
  • Part-of-speech class and derivational morphology: IL1/2
    • I was surprised that he destroyed the old house
    • I was surprised by his destruction of the old house
  • Possession: IL1
    • Dubai’s oil, oil of Dubai
  • Clause combination: IL2
    • This is Joe’s new car, which he bought in New York
    • This is Joe’s new car. He bought it in New York
Summary: What is Normalized Where?
  • Different argument realizations: IL1/2
    • Bob enjoys playing with his kids
    • Playing with his kids pleases Bob
  • Noun-noun compounds: IL2
    • She loves velvet dresses
    • She loves dresses made of velvet
  • Head switching: IL2
    • Mike Mussina excels at pitching
    • Mike Mussina pitches well
    • Mike Mussina is a good pitcher
Summary: What is Normalized Where?
  • Overlapping meanings: IL2
    • Lindbergh flew across the Atlantic Ocean
    • Lindbergh crossed the Atlantic Ocean by plane
  • Locus of Negation: IL2
    • I have not bought any cheese
    • I have bought no cheese
Summary: What is Normalized Where?
  • Light verbs: IL2
    • conduct a tightening = tighten
    • witness a growth rate of = grow by
  • Direct and indirect discourse: IL2
    • said “X” vs. said that X
Not Normalized at IL0, IL1, nor IL2
  • Logical inferences
    • He’s smarter than everybody else
    • He’s the smartest one
  • Real-World Inference
    • The tight end caught the ball in the end zone
    • The tight end scored a touchdown
  • Different syntactic sentence types, same pragmatic meaning
    • Who composed the Brandenburg Concertos?
    • Tell me who composed the Brandenburg Concertos
Not Normalized at IL0, IL1, nor IL2
  • Viewpoint variation
    • The U.S.-led invasion/liberation/occupation of Iraq
    • He is getting in the way vs. He is only trying to help

Differences from other projects
  • Eurotra, Euro WordNet, UNL
    • Share the goal of defining an interlingua
    • Don’t share the goal of producing an annotated corpus
  • ParGram
    • Grammars for several languages developed in close consultation
    • Based on the assumption of universal grammar
    • Not an annotation project
    • Not corpus based
Getting at Meaning (Two translations of Korean original text)
  • Translation 1: Starting on January 1 of next year, SK Telecom subscribers can switch to less expensive LG Telecom or KTF. … The Subscribers cannot switch again to another provider for the first 3 months, but they can cancel the switch in 14 days if they are not satisfied with services like voice quality.
  • Translation 2: Starting January 1st of next year customers of SK Telecom can change their service company to LG Telecom or KTF … Once a service company swap has been made, customers are not allowed to change companies again within the first three months, although they can cancel the change anytime within 14 days if problems such as poor call quality are experienced.
black: same words, same meaning

green: small syntactic differences

blue: lexical differences

red: not contained in other text

purple: more complex relations
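The colour categories on these slides were presumably assigned by hand. As a rough illustration of the first category only (same words, same meaning), a word-level alignment can find the spans that the two translations share verbatim. The sketch below uses `difflib.SequenceMatcher` from the Python standard library on an excerpt of the two translations. The tokenisation and the choice of excerpt are simplifications assumed here, and the approach says nothing about the syntactic, lexical, or more complex relations.

```python
from difflib import SequenceMatcher

# Excerpts from the two English translations of the Korean original.
t1 = ("The Subscribers cannot switch again to another provider "
      "for the first 3 months, but they can cancel the switch in 14 days")
t2 = ("customers are not allowed to change companies again "
      "within the first three months, although they can cancel the change "
      "anytime within 14 days")


def tokens(text):
    """Lowercase and split on whitespace, stripping periods and commas."""
    return [w.strip(".,").lower() for w in text.split()]


a, b = tokens(t1), tokens(t2)
matcher = SequenceMatcher(None, a, b, autojunk=False)

# Contiguous word runs that appear in both translations, in order.
shared = [" ".join(a[m.a:m.a + m.size])
          for m in matcher.get_matching_blocks() if m.size > 0]
print(shared)
```

Pairs such as "3" and "three", or "switch" and "change", do not match at this level, which is exactly the kind of difference the remaining categories are meant to capture.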