


Parsing Estonian with Constraint Grammar

Kaili Müürisep

Institute of Cybernetics at Tallinn Technical University


Outline

  • Background

  • Constraint Grammar framework

  • Morphological disambiguation

  • Syntactic analysis

  • Results

  • Applications

  • Future work


Background

  • Project started in 1995/96

  • Two grammar-writers:

    • morphological disambiguation - Tiina Puolakainen

    • syntax - Kaili Müürisep


Constraint Grammar

  • proposed by Fred Karlsson in 1990 (University of Helsinki)

  • employs surface-near dependency-oriented syntax

  • rule-based

  • integrates morphological disambiguation and shallow syntactic analysis


CG - Parsing Scheme

Input text

Morphological analysis

Identification of clause boundaries

Morphological disambiguation

Determination of syntactic functions

Analysed text


Morphologically analyzed sentence

Eesti

Eesti+0 //_S_ prop sg gen #cap // Estonia

Eesti+0 //_S_ prop sg nom #cap //

eesti+0 //_G_ #cap // Estonian

vanimad

vanim+d //_A_ super pl nom // oldest

asukad

asukas+d //_S_ com pl nom // dwellers

saabusid

saabu+sid //_V_ main indic impf ps2 sg ps af #Intr // arrived

saabu+sid //_V_ main indic impf ps3 pl ps af #Intr //

siia

siia+0 //_D_ // here

siig+0 //_S_ com sg gen // whitefish


pärast

pärast+0 //_D_ // afterwards

pärast+0 //_K_ post #gen // after

pärast+0 //_K_ pre #part //

pärane+t //_A_ pos sg part //

pära+st //_S_ com sg el // residue or stern

viimast

viimane+t //_A_ pos sg part // last

vii+mast //_V_ main sup ps el #NGP-P // take, lead ...

jääaega

jää_aeg+0 //_S_ com sg adit // ice-age

jää_aeg+0 //_S_ com sg part //

$.

. //_Z_ Fst //
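
The analyser output above follows a regular layout: a word form on its own line, then one line per reading in the shape `stem+ending //_POS_ features // gloss`. A minimal Python sketch of reading that layout into data structures might look as follows. The names `Reading`, `parse_reading` and `parse_cohorts` are hypothetical and not part of the original tools, and the sketch assumes every reading line contains exactly two `//` separators.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    stem: str        # e.g. "siig"
    ending: str      # e.g. "0"; empty for punctuation
    pos: str         # part-of-speech letter taken from _S_, _V_, ...
    features: list   # remaining tags, e.g. ["com", "sg", "gen"]
    gloss: str       # optional English gloss after the second "//"


def parse_reading(line):
    # Assumed shape: "stem+ending //_POS_ features // gloss"
    form, analysis, trailer = (part.strip() for part in line.split("//", 2))
    stem, _, ending = form.partition("+")
    tokens = analysis.split()
    return Reading(stem, ending, tokens[0].strip("_"), tokens[1:], trailer)


def parse_cohorts(text):
    """Group reading lines under the word form that precedes them."""
    cohorts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "//" in line:
            cohorts[-1][1].append(parse_reading(line))
        else:
            cohorts.append((line, []))
    return cohorts


sample = """
siia
siia+0 //_D_ // here
siig+0 //_S_ com sg gen // whitefish
"""
cohorts = parse_cohorts(sample)
```

Here `cohorts` holds one entry for "siia" with two readings, which is the kind of ambiguity the disambiguator has to resolve.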


Morphologically disambiguated sentence

Eesti

Eesti+0 //_S_ prop sg gen #cap //

vanimad

vanim+d //_A_ super pl nom //

asukad

asukas+d //_S_ com pl nom //

saabusid

saabu+sid //_V_ main indic impf ps3 pl ps af #Intr //

siia

siia+0 //_D_ //

pärast

pärast+0 //_K_ pre #part //

viimast

viimane+t //_A_ pos sg part //

jääaega

jää_aeg+0 //_S_ com sg part //
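
Constraint Grammar disambiguation works by discarding readings that a contextual constraint rules out, never by adding new ones. The Python sketch below illustrates the idea on the pair "asukad saabusid". The rule itself is a hypothetical illustration, not one of the constraints in the Estonian grammar, and the function names are assumptions made for this example.

```python
def remove_readings(cohorts, target, context):
    """Apply one REMOVE-style constraint across a sentence.

    cohorts: list of (word_form, readings), each reading a set of tags.
    target(reading) -> bool marks readings to discard.
    context(cohorts, i) -> bool must hold at position i for the rule to fire.
    """
    result = []
    for i, (form, readings) in enumerate(cohorts):
        kept = readings
        if context(cohorts, i):
            kept = [r for r in readings if not target(r)]
            if not kept:
                # Safeguard: the last remaining reading is never deleted.
                kept = readings
        result.append((form, kept))
    return result


sentence = [
    ("asukad", [{"S", "com", "pl", "nom"}]),
    ("saabusid", [
        {"V", "main", "indic", "impf", "ps2", "sg", "ps", "af"},
        {"V", "main", "indic", "impf", "ps3", "pl", "ps", "af"},
    ]),
]


def is_ps2_sg_verb(reading):
    return {"V", "ps2", "sg"} <= reading


def after_plural_nominative_noun(cohorts, i):
    # Hypothetical context: the previous word is unambiguously a plural nominative noun.
    if i == 0:
        return False
    previous = cohorts[i - 1][1]
    return all({"S", "pl", "nom"} <= r for r in previous)


disambiguated = remove_readings(sentence, is_ps2_sg_verb, after_plural_nominative_noun)
```

After the rule fires, "saabusid" keeps only its third-person plural reading, matching the disambiguated sentence above.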


After adding syntactic labels

Eesti

Eesti+0 //_S_ prop sg gen #cap // **CLB @OBJ @ADVL @NN>

vanimad

vanim+d //_A_ super pl nom // @ADVL @AN> @<AN @PRD

asukad

asukas+d //_S_ com pl nom // @SUBJ @PRD @OBJ @NN> @<NN @ADVL @<Q

saabusid

saabu+sid //_V_ main indic impf ps3 pl ps af #Intr // @+FMV

siia

siia+0 //_D_ // @ADVL @AD> @<AD

pärast

pärast+0 //_K_ pre #part // @ADVL @PN> @<PN

viimast

viimane+t //_A_ pos sg part // @AN> @<AN @ADVL

jääaega

jää_aeg+0 //_S_ com sg part // @SUBJ @OBJ @ADVL @<Q @NN> @<NN @<P


Syntactically analyzed sentence

Eesti

Eesti+0 //_S_ prop sg gen #cap // **CLB @NN>

vanimad

vanim+d //_A_ super pl nom // @AN>

asukad

asukas+d //_S_ com pl nom // @SUBJ

saabusid

saabu+sid //_V_ main indic impf ps3 pl ps af #Intr // @+FMV

siia

siia+0 //_D_ // @ADVL

pärast

pärast+0 //_K_ pre #part // @ADVL

viimast

viimane+t //_A_ pos sg part // @AN>

jääaega

jää_aeg+0 //_S_ com sg part // @<P


Actually ...

Eesti @NN>

vanimad @AN>

asukad @SUBJ

saabusid

saabu+sid //_V_ main indic impf ps3 pl ps af #Intr // @+FMV

siia

siia+0 //_D_ // @ADVL

pärast

pärast+0 //_K_ pre #part // @ADVL

viimast

viimane+t //_A_ pos sg part // @AN>

vii+mast //_V_ main sup ps el #NGP-P // @ADVL

jääaega

jää_aeg+0 //_S_ com sg part // @<P @OBJ

jää_aeg+0 //_S_ com sg adit // @ADVL


Morphological disambiguation

  • Morphological analyser of Estonian assigns adequate morphological descriptions to about 99% of tokens in a text.

  • In morphologically analysed Estonian text over 45% of all words are ambiguous and have 2 – 15 readings.

  • More than 1125 constraints

  • 85-90% of words become morphologically unambiguous, and the error rate of the disambiguator is less than 2%.
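
Figures like these could be computed along the following lines. This is only a sketch: the slides do not define the measures, so it assumes that the ambiguity rate is the share of words with more than one reading and that an error is a word whose correct reading was discarded. The toy data is modelled on the example sentence from the earlier slides (punctuation left out), with readings abbreviated to short labels.

```python
def ambiguity_rate(cohorts):
    """Share of words that still have more than one reading."""
    return sum(len(readings) > 1 for readings in cohorts) / len(cohorts)


def error_rate(cohorts, gold):
    """Share of words whose correct (gold) reading is no longer present."""
    return sum(g not in readings for readings, g in zip(cohorts, gold)) / len(cohorts)


# Readings per word before disambiguation: Eesti, vanimad, asukad, saabusid,
# siia, pärast, viimast, jääaega.
before = [
    ["S gen", "S nom", "G"], ["A"], ["S"], ["V ps2", "V ps3"],
    ["D", "S"], ["D", "K post", "K pre", "A", "S"], ["A", "V"],
    ["S adit", "S part"],
]
# Readings left in the real output; two words remain ambiguous.
after = [
    ["S gen"], ["A"], ["S"], ["V ps3"],
    ["D"], ["K pre"], ["A", "V"], ["S part", "S adit"],
]
# Correct readings, taken from the fully disambiguated example.
gold = ["S gen", "A", "S", "V ps3", "D", "K pre", "A", "S part"]

print(ambiguity_rate(before))    # 0.75
print(ambiguity_rate(after))     # 0.25
print(error_rate(after, gold))   # 0.0
```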


Morphological disambiguation (2)

  • The major ambiguities are between:

    • The adjectival and verbal readings of participles

    • The nominative, genitive, partitive and short illative cases of a noun.

    • The adposition, adverb and noun readings.

  • Some coincidences:

    sai (white bread, got), viis (five, melody, carried), tee (tea, road, do!), või (butter, or, may), tuli (fire - light, came)


Morphological disambiguation (3)

The most difficult task is to disambiguate between the nominative, genitive, partitive and short illative cases:

(1) maailma-GEN juhtivad majandusriigid

the leading economic states of the world

(2) maailma-PART juhtivad majandusriigid

the economic states leading the world

(3) maailma-ILLAT juhtivad majandusriigid

the economic states leading into the world


Determination of syntactic functions

  • 27 syntactic tags (subject, object, adverbial, etc.)

  • no direct connection between attribute and head

    professori (@NN>) nahast (@NN>) portfell

    professor-GEN leather-ELAT portfolio

  • More than 1300 rules

  • 83-90% of words become syntactically unambiguous

  • Correctness is 96.5-98.5%
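
Because a tag such as @NN> only says that the head is a noun somewhere to the right, any later processing step has to find that head itself. The Python sketch below shows one possible heuristic, which is an assumption of this example and not part of the parser: attach each premodifier to the nearest following noun that is not itself tagged as a premodifier. The function name `attach_premodifiers` is hypothetical.

```python
PREMODIFIER_TAGS = {"@NN>", "@AN>"}


def attach_premodifiers(words):
    """words: list of (form, pos, tag).

    Returns a dict mapping the index of each premodifier to the index of
    its assumed head noun. This is a heuristic, not the parser's output.
    """
    links = {}
    for i, (_, _, tag) in enumerate(words):
        if tag not in PREMODIFIER_TAGS:
            continue
        for j in range(i + 1, len(words)):
            _, pos, other_tag = words[j]
            if pos == "S" and other_tag not in PREMODIFIER_TAGS:
                links[i] = j
                break
    return links


# The function tag of "portfell" is not given in the slide, so it is left as None.
phrase = [
    ("professori", "S", "@NN>"),
    ("nahast", "S", "@NN>"),
    ("portfell", "S", None),
]
links = attach_premodifiers(phrase)
```

For this phrase both attributes end up linked to "portfell". A heuristic like this will fail on some constructions, which is one reason a deeper structure is listed under future work.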


Syntactic disambiguation - problems

  • Adverbial versus adverbial attributes

    • Mees sai siiski pidada ühendust mobiiltelefoniga (@ADVL @NN> @<NN) Kosovos sõdivate poegadega.

    • Man could still keep connection with_mobile-phone in_Kosovo fighting with_sons.

  • Object in genitive or attribute

    • Ta asetas mantli (gen @OBJ @NN>) tooli (gen @OBJ @NN>) seljatoele (@ADVL @<NN)

    • He put coat-GEN chair-GEN back-ALLAT.

    • 'He put the coat onto the back of a chair.'


Syntactic disambiguation - errors

  • One clause divides the other into two parts:

    • Seega oli samm, mille astus Eesti, palju pikem ja otsustavam.

    • Thus the step, that Estonia took, was bigger and more decisive.

  • Ellipsis

  • Determination of apposition, quantifiers


Applications

  • Automatic summary generator

  • Noun phrase detector

  • Linguistic research

  • Promising fields of applications:

    • Information retrieval

    • Text-to-speech synthesis

    • Grammar and style checker

    • Machine translation, translation aids


Future work

  • Improvement of lexicon, integration of analyser with semantic database

  • Bigger training corpus

  • Use of statistical methods

  • Improvement of tag set

  • Deeper structure

  • Prototypes of practical applications

