Morphological Analysis of Hungarian in NooJ

Peter Vajda, Hungarian Academy of Sciences, Research Institute for Linguistics


  1. Morphological Analysis of Hungarian in NooJ
     Peter Vajda
     Hungarian Academy of Sciences, Research Institute for Linguistics

  2. Summary
     • Hungarian morphology
     • Linguistic resources
     • Some experiments with INTEX/NooJ
     • The solution
     • Examples
     • Derivation

  3. Hungarian morphology
     • Agglutinative (and sometimes inflectional)
     • The suffixes
       • Can have many forms (vowel harmony)
       • Can change the form of the stem (stems fall into groups of variants)
         • bokor (sg.) → bokr-ok (pl.); alma (sg.) → almá-k (pl.)
       • Sometimes begin with a linking vowel
         • plural: -k / -ak / -ek / -ok / -ök
     • A noun (adj., num.) can have ~700-800 forms
     • A verb can have ~80 forms
     • Orthography: difficulties arise when digraphs are doubled
       • cs + cs → ccs (not cscs); gy + gy → ggy (not gygy)

  4. Nominal inflections
     • 18 cases (nominative, accusative, dative, plus grammatical relations that French and English express with prepositions)
     • Possessives are expressed by suffixes
       • These mark the person and number of the possessor and the number of the possessed
       • ház-a-m, ház-a-d, ház-a (my/your/his house)
       • ház-a-i-m, ház-a-i-d, ház-a-i (my/your/his houses)
     • Anaphoric possessive
       • A ház Péteré (The house is Péter’s); A házak Péteréi (The houses are Péter’s)
     • A word form can carry up to five inflectional suffixes
       • barát-ai-tok-é-i-t
       • (I can see) those (things) of your friends’

  5. Verbal inflections
     • Two tenses: present, past
     • Three moods: indicative, conditional, imperative
     • Definite and indefinite conjugations
       • Néz-ek egy asztalt vs. Néz-em az asztalt
       • I watch a table vs. I watch the table
     • One special form where the subject is in the 1st person and the object is in the 2nd:
       • néz-lek (I watch you)
     • Infinitive and "conjugated infinitive" (sometimes corresponding to the subjunctive in French)

  6. The resources
     • Dictionary of Hungarian inflections (Elekfi, ’92)
       • A traditional description, thorough and exhaustive
       • Two-dimensional classification:
         • Vowel harmony (3 classes) and
         • complex features of the stems (stem types, linking vowel, etc.; 55 classes)
       • Altogether: 1700 different sub-classes (paradigms)
       • Systematic differences and similarities are hidden
       • Not convenient to use in finite-state transducers
     • We have converted it into a database from which all the forms can be retrieved

  7. The experiments with INTEX/NooJ
     • ‘Brute-force’ method
       • We created one graph per sub-class for testing INTEX
       • 1700 sub-graphs
       • 45000 paths in the graphs…
     • Using only dictionaries (.nod)
       • Dictionary of stems (70000 words)
         • ház,ház,N+C2A+stem=1+NW
       • Dictionary of suffixes (one million entries)
         • (*)ak,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PL}
         • (*)am,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1}
         • (*)at,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=ACC}
         • (*)at,<$1=N+C2A1+stem=1>{$0,$1L,N$1S+ana=ACC}
         • (*)amat,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1+ACC}
       • Dictionary of lexical forms (which have a zero morpheme as suffix)
         • ház,ház,N+ana=NOM
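The stem and lexical-form entries above follow a `form,lemma,CATEGORY+feature+key=value` layout. As a rough illustration of how such a line breaks down, here is a minimal Python sketch. It is not NooJ's own parser: the `Entry` class and `parse_entry` function are hypothetical names, and treating an empty lemma field as "same as the form" is an assumption. The suffix entries with `(*)`, `<…>` and `{…}` are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    form: str
    lemma: str
    category: str
    flags: set = field(default_factory=set)    # bare features, e.g. C2A
    attrs: dict = field(default_factory=dict)  # key=value features, e.g. stem=1


def parse_entry(line: str) -> Entry:
    """Split 'form,lemma,CAT+feat+key=value' into its parts."""
    form, lemma, info = line.strip().split(",", 2)
    category, *features = info.split("+")
    # Assumption: an empty lemma field means the lemma equals the form.
    entry = Entry(form, lemma or form, category)
    for feat in features:
        if "=" in feat:
            key, value = feat.split("=", 1)
            entry.attrs[key] = value
        else:
            entry.flags.add(feat)
    return entry


stem = parse_entry("ház,ház,N+C2A+stem=1+NW")
lexical = parse_entry("ház,ház,N+ana=NOM")
print(stem.category, sorted(stem.flags), stem.attrs)  # N ['C2A', 'NW'] {'stem': '1'}
print(lexical.attrs)                                  # {'ana': 'NOM'}
```

Keeping bare flags and key=value attributes apart makes it easy to check a stem's class (here `C2A` and `stem=1`) against the condition a suffix entry places on it.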

  8. The linguistic solution
     • Transform the database into a grammar based on morpho-phonological features
     • The grammatical features of stems and morphemes are in the dictionary
       • The features of the stems and the suffixes can be unified
     • Grammar
       • We have to describe the order of the morphemes
       • Introduce features which select among the allomorphs

  9. The order of morphemes for nominals

  10. The order of morphemes for nominals
      barát-a-i-tok-é-i-t
      barát,N +PS +PL +ps_2 +ps_pl +ANAP+i +ACC
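The idea of a fixed morpheme order can be sketched as a list of optional slots that must be filled from left to right. The Python sketch below covers only the suffixes of the single example above. The pairing of each tag with a morpheme is an assumption read off the slide, `SLOTS` and `analyse` are hypothetical names, and a real grammar would list many allomorphs per slot.

```python
# Ordered suffix slots for the one example on the slide.
# Assumption: each tag group lines up with one morpheme as shown here.
SLOTS = [
    ("a", "+PS"),
    ("i", "+PL"),
    ("tok", "+ps_2 +ps_pl"),
    ("é", "+ANAP"),
    ("i", "+i"),
    ("t", "+ACC"),
]


def analyse(suffixes: str, slot: int = 0):
    """Yield one tag list for every way to read `suffixes` with slots in order."""
    if not suffixes:
        yield []
        return
    # Every slot is optional, so try each remaining slot in turn.
    for index in range(slot, len(SLOTS)):
        morph, tag = SLOTS[index]
        if suffixes.startswith(morph):
            for rest in analyse(suffixes[len(morph):], index + 1):
                yield [tag] + rest


for tags in analyse("aitokéit"):
    print("barát,N " + " ".join(tags))
# barát,N +PS +PL +ps_2 +ps_pl +ANAP +i +ACC
```

Because a slot can only be followed by a later slot, a string such as `ta` (case suffix before possessive marker) gets no analysis at all.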

  11. Morpho-phonological features
      To introduce features we examine the allomorphs
      • HÁZ; HAJÓ
      • HÁZ-A; HAJÓ-JA
        • ház,,N+nonj; hajó,,N+j
      • HÁZ-AT; HAJÓ-T
        • ház,,N+nonj+acclink; hajó,,N+j+accnolink
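A small Python sketch of how such features can select an allomorph, using only the two stems and the four features shown above. The table layout, the morpheme labels `PS` and `ACC` as dictionary keys, and the `attach` function are illustrative assumptions, not the NooJ grammar itself.

```python
# Stem features as given in the dictionary entries on the slide.
LEXICON = {
    "ház": {"nonj", "acclink"},
    "hajó": {"j", "accnolink"},
}

# For each morpheme: (feature required on the stem, suffix form).
ALLOMORPHS = {
    "PS": [("nonj", "a"), ("j", "ja")],
    "ACC": [("acclink", "at"), ("accnolink", "t")],
}


def attach(stem: str, morpheme: str) -> str:
    """Pick the allomorph whose required feature the stem carries."""
    features = LEXICON[stem]
    for required, form in ALLOMORPHS[morpheme]:
        if required in features:
            return f"{stem}-{form}"
    raise ValueError(f"no allomorph of {morpheme} fits {stem}")


print(attach("ház", "PS"), attach("hajó", "PS"))    # ház-a hajó-ja
print(attach("ház", "ACC"), attach("hajó", "ACC"))  # ház-at hajó-t
```

The point of the design is that each stem is listed once with a handful of features, instead of being assigned to one of many enumerated paradigms.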

  12. The dictionary

  13. The plural and the accusative
      • kalap-ot (hat, SG+ACC)
      • kalap-ok-at (hats, PL+ACC)

  14. Derivation
      • Can change the category (POS) or leave it unchanged
      • Introduces new features
        • kosár, kosar-ak (pl.): basket
        • kosar-as, kosar-as-ok (pl.): basketball player
      • Simple cases are handled by graphs
      • Others are listed as lemmas in the dictionary

  15. Assimilation and digraphs
      • Some suffixes (e.g. val/vel) enforce total assimilation:
        • LÉC + VEL → LÉCCEL
        • PÉCS + VEL → PÉCCSEL
        • PLÉD + VEL → PLÉDDEL
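The three examples above can be reproduced with a short Python sketch: the suffix-initial v becomes a copy of the stem-final consonant, and a doubled digraph is written by repeating only its first letter. The function names are hypothetical and the sketch rests on several assumptions: the stem ends in a single short consonant, the final letter sequence really is a digraph, and the caller has already chosen val or vel by vowel harmony. Vowel-final stems and stems ending in a long consonant are not handled.

```python
# Hungarian multi-letter consonants, longest first so "dzs" wins over "zs".
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def final_consonant(stem: str) -> str:
    """Return the last consonant of the stem, treating digraphs as one unit."""
    lower = stem.lower()
    for graph in MULTIGRAPHS:
        if lower.endswith(graph):
            return stem[-len(graph):]
    return stem[-1]


def add_val_vel(stem: str, suffix: str) -> str:
    """Attach 'val'/'vel' to a consonant-final stem with total assimilation."""
    last = final_consonant(stem)
    # The suffix-initial v turns into a copy of `last`; a doubled digraph is
    # spelled with only its first letter repeated (cs + cs is written ccs).
    return stem[:-len(last)] + last[0] + last + suffix[1:]


print(add_val_vel("LÉC", "VEL"))   # LÉCCEL
print(add_val_vel("PÉCS", "VEL"))  # PÉCCSEL
print(add_val_vel("PLÉD", "VEL"))  # PLÉDDEL
```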

  16. Conclusion
      • We have adapted the traditional description
      • We have described the inflectional morphology of Hungarian in NooJ grammars/dictionaries
      • We have handled some of the derivational morphology
      • Objectives
        • Find a simpler method for derivation
        • Disambiguation
        • Automatic methods to expand the dictionary
        • Automatic delegation of features

  17. Thank you
