
Information Query Formulation in a Slavonic Language and its Automatic Processing

Information Query Formulation in a Slavonic Language and its Automatic Processing: Experience from Polish and Czech in comparison to Western European Languages. Petr Strossa, University of Economics, Prague, Department of Information & Knowledge Engineering.



Presentation Transcript


  1. Information Query Formulation in a Slavonic Language and its Automatic Processing
     Experience from Polish and Czech in comparison to Western European Languages
     Petr Strossa
     University of Economics, Prague
     Department of Information & Knowledge Engineering

  2. General Issue
     86 Question/Answer Types and the basic idea of their recognition in texts [D. Laurent et al., SYNAPSE, Toulouse]
     TEL-ME-MOR/M-CAST Seminar, 2006

  3. Technology
     Priberam’s lexicon data structure
     SintaGest software tool [Priberam Informática, Lisbon]

  4. Question-Answer Pattern (Example)
     Question(WEIGHT)
     : Root("jaký")? Dist(0,5) WeightNoun = 20 // Jaká je hmotnost Země? (What is the mass of the Earth?)
     : Wrd(jak) WeightAdj = 20 // Jak těžký může být slon? (How heavy can an elephant be?)
     : Wrd(kolik) WeightUnit = 20 // Kolik kg má dospělý kapr? (How many kg does an adult carp weigh?)
     : Wrd(kolik) Root("vážit") = 20 // Kolik váží kapr? (How much does a carp weigh?)
     Answer
     : WeightNoun Definition With Pivot Dist(0,5) {Number6 WeightUnit} = 20 // Váha kapra může dosáhnout až 5 kg. (The weight of a carp can reach up to 5 kg.)
     : Pivot Dist(0, 5) Cat(V) Dist(0,5) {Number6 WeightUnit} = 20 // Roční kapr může dosáhnout 5 kg tělesné váhy. (A one-year-old carp can reach 5 kg of body weight.)
     ;
     Answer(WEIGHT)
     : Number6 WeightUnit = 20
     ;

  5. Definitions of Constants Used in the Previous Example
     Const WeightNoun = AnyRoot(hmotnost, hmota, "tíha", "váha", "zatížení");
     Const WeightAdj = AnyRoot("těžký", "lehký");
     Const WeightUnit1 = AnyRoot(mikrogram, miligram, centigram, decigram, gram, dekagram, hektogram, kilogram, kilo, cent, megagram, miligram, tuna, "karát", pond, kilopond, megapond, libra);
     Const WeightUnit2 = AnyWrd(mg, cg, dg, g, dag, deka, Dg, dkg, hg, kg, q, Mg, t, p, kp, Mp, lb, "lb.", lbs, "lbs.", cwt, "cwt.");
     Const WeightUnit = AnyConst(WeightUnit1, WeightUnit2);
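One way to read the constant definitions above is that AnyRoot matches a token by its lemma, AnyWrd matches the exact surface form, and AnyConst merges other constants. The Python sketch below models that reading only; it is not SintaGest's actual API. The names `Const`, `any_root`, `any_wrd`, `any_const`, `lemma_of` and the tiny `TOY_LEMMAS` table are hypothetical, the unit lists are abbreviated, and case-sensitive matching of word forms is an assumption suggested by pairs such as mg/Mg.

```python
from dataclasses import dataclass

# Toy lemma table standing in for a real Czech lexicon (assumption).
TOY_LEMMAS = {
    "kilogramů": "kilogram",
    "kilogramy": "kilogram",
    "tuny": "tuna",
    "gramech": "gram",
}


def lemma_of(token: str) -> str:
    """Return the toy lemma of a token, or the lowercased token itself."""
    return TOY_LEMMAS.get(token.lower(), token.lower())


@dataclass(frozen=True)
class Const:
    roots: frozenset  # compared with the token's lemma (AnyRoot reading)
    words: frozenset  # compared with the exact surface form (AnyWrd reading)

    def matches(self, token: str) -> bool:
        return lemma_of(token) in self.roots or token in self.words


def any_root(*roots: str) -> Const:
    return Const(frozenset(roots), frozenset())


def any_wrd(*words: str) -> Const:
    return Const(frozenset(), frozenset(words))


def any_const(*consts: Const) -> Const:
    """Merge several constants into one (AnyConst reading)."""
    return Const(
        frozenset().union(*(c.roots for c in consts)),
        frozenset().union(*(c.words for c in consts)),
    )


# Abbreviated versions of WeightUnit1 / WeightUnit2 from the slide.
weight_unit1 = any_root("gram", "kilogram", "tuna")
weight_unit2 = any_wrd("mg", "g", "kg", "Mg", "t")
weight_unit = any_const(weight_unit1, weight_unit2)

print([t for t in "Kapr váží 5 kg".split() if weight_unit.matches(t)])  # ['kg']
```

Keeping roots and word forms in separate sets means an inflected form such as kilogramů is found through its lemma, while abbreviations are matched literally.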

  6. General Observation
     • The concepts and tools designed to process Western European languages can be adapted to process Slavonic languages such as Polish and Czech.
     • Some basic differences between the language families must be kept in mind during such an adaptation!

  7. The Abundance of Morphology
     • Nouns: 4 (!) genders, 2 numbers, 7 cases
     • Adjectives: e.g. světlý (bright)
        • 3 degrees: světlý ↔ světlejší, nejsvětlejší
        • 4 genders: světlý ↔ světlá, světlé
        • 2 numbers: světlý ↔ světlí
        • 7 cases: světlý ↔ světlého, světlému, ...

  8. The Abundance of Morphology (2)
     • Adjectives continued:
        • Theoretically, every adjective may have 3*4*2*7 = 168 forms altogether!
        • In practice, some of these forms always coincide (without exception)...
        • A general scheme for describing a morphological pattern cannot work with fewer than 57 forms (= 3 degrees * 19 possibly differing gender/number/case endings).
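The arithmetic on the slide above can be checked with a short Python sketch. The category counts and the figure of 19 distinct endings come from the slide; the gender and case labels are the standard Czech ones, and all variable names are illustrative.

```python
from itertools import product

DEGREES = ["positive", "comparative", "superlative"]
GENDERS = ["masculine animate", "masculine inanimate", "feminine", "neuter"]
NUMBERS = ["singular", "plural"]
CASES = ["nominative", "genitive", "dative", "accusative",
         "vocative", "locative", "instrumental"]

# Every combination of grammatical categories an adjective form can express.
theoretical_forms = list(product(DEGREES, GENDERS, NUMBERS, CASES))
print(len(theoretical_forms))  # 168

# The slide states that 19 gender/number/case endings may actually differ,
# so a pattern description needs one slot per degree and distinct ending.
DISTINCT_ENDINGS = 19
pattern_slots = len(DEGREES) * DISTINCT_ENDINGS
print(pattern_slots)  # 57
```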

  9. The Abundance of Morphology (3): Illustration – the 19 Ending System

  10. The Abundance of Morphology (4)
     • Adjectives continued:
        • In fact, not every adjective has all of these forms.
        • Some adjectives cannot undergo gradation for purely morphological reasons: domácí (home, home-made)
        • Other adjectives usually do not undergo gradation for semantic reasons: jednofázový (one-phase)

  11. Morphological Pattern (Ex. 1)

  12. Morphological Pattern (Ex. 2)

  13. Morphology of Nouns: Some Statistics

  14. Morphology of Nouns: Some Statistics (2)
     • We need about 300 noun patterns altogether.
     • We have about 90 noun patterns that describe the declension of at least 10 different nouns.
     • We have about 80 noun patterns that describe only 1 noun each.
     • About one half of the noun patterns describe the declension of 1–3 nouns each.

  15. Inherent Homonymy of Forms
     • A typical situation for our type of morphology: světlé (bright)
        • nominative/accusative/vocative singular neuter
        • genitive/dative/locative singular feminine
        • nominative/accusative/vocative plural feminine
        • accusative plural masculine animate
        • nominative/accusative/vocative plural masculine inanimate
     • i.e. 13 possible grammatical interpretations altogether!
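The total of 13 follows directly from the list on the slide above. The Python sketch below spells out each reading of světlé as a (case, number, gender) triple; the dictionary layout and names are illustrative only.

```python
# Readings of the form "světlé", grouped exactly as on the slide.
SVETLE_READINGS = {
    ("singular", "neuter"): ["nominative", "accusative", "vocative"],
    ("singular", "feminine"): ["genitive", "dative", "locative"],
    ("plural", "feminine"): ["nominative", "accusative", "vocative"],
    ("plural", "masculine animate"): ["accusative"],
    ("plural", "masculine inanimate"): ["nominative", "accusative", "vocative"],
}

# Flatten the groups into individual (case, number, gender) interpretations.
interpretations = [
    (case, number, gender)
    for (number, gender), cases in SVETLE_READINGS.items()
    for case in cases
]

print(len(interpretations))  # 13
```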

  16. Inherent Homonymy of Forms (2)
     • An only slightly less typical situation: Ženu holí stroj.
        • I am setting a machine in motion with a stick.
        • OR: I am setting a machine of sticks in motion. (*)
        • The woman is shaved by a machine.
        • Dress the woman with a stick.
        • OR: Dress the woman of sticks. (*)

  17. Inherent Homonymy of Forms (3)
     • All of the above once again, in a question: Jaký je plat Petra Hanka?
        • What is the salary of XY?
        • X ∈ {Petr, Peter, Petar}
        • Y ∈ {Hank, Hanek, Hanke, Hanko}
     • The only thing we know for sure: X ≠ Petra (though such a name exists); Y ≠ Hanka (though such a name exists)!

  18. Inherent Homonymy of Forms (4)
     Jaký je plat Petra Hanka?
     • What is the salary of XY?
     • The only thing we know for sure: X ≠ Petra (though such a name exists); Y ≠ Hanka (though such a name exists)!
     Compare: Jaký plat Hanka dává svým zaměstnancům?
     • What salary does Hanka give to her/his employees?
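The candidate sets for X and Y can be reproduced by asking which known nominative names could yield the observed genitive form. The Python sketch below does this with a deliberately tiny, hypothetical rule set (`genitive_candidates`) and hand-picked name lists taken from the slides; real Czech and Polish declension needs a full lexicon with many more patterns, so this only illustrates the idea.

```python
VOWELS = set("aeiouy")  # simplified: diacritics are ignored in this toy


def genitive_candidates(nominative):
    """Toy genitive singular forms of a personal name (heavily simplified)."""
    if nominative.endswith("a"):
        # Names in -a (Petra, Hanka) take -y: Petry, Hanky.
        return {nominative[:-1] + "y"}
    if nominative[-1] in "eo":
        # Hanke, Hanko -> Hanka
        return {nominative[:-1] + "a"}
    forms = {nominative + "a"}  # Petr -> Petra, Hank -> Hanka
    if (len(nominative) > 2 and nominative[-2] in "ea"
            and nominative[-1] not in VOWELS):
        # Possible dropped vowel: Peter -> Petra, Hanek -> Hanka
        forms.add(nominative[:-2] + nominative[-1] + "a")
    return forms


def possible_nominatives(genitive_form, known_names):
    """Known names whose toy genitive matches the observed form."""
    return sorted(n for n in known_names
                  if genitive_form in genitive_candidates(n))


FIRST_NAMES = ["Petr", "Peter", "Petar", "Petra"]
SURNAMES = ["Hank", "Hanek", "Hanke", "Hanko", "Hanka"]

print(possible_nominatives("Petra", FIRST_NAMES))
print(possible_nominatives("Hanka", SURNAMES))
```

Under these toy rules Petra and Hanka themselves are excluded, because their own genitive forms would be Petry and Hanky. This matches the one thing the slides say is certain.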

  19. Inherent Homonymy of Forms (Conclusion)
     • Due to our free word order, any disambiguation based on a limited context is generally quite problematic.
     • A truly safe disambiguation is possible only after a complete syntactic analysis of the sentence (which must keep all possible meanings of all the words until the end).
     • (But we do not perform a complete syntactic analysis of sentences in M-CAST.)

  20. Free Word Order Again
     • How far is it to Brno?
        • Jak daleko je do Brna? (+++)
        • Jak je daleko do Brna? (+++)
        • Jak je do Brna daleko? (++)
        • Do Brna je jak daleko? (++)
        • Do Brna jak je daleko? (+)
        • Do Brna je daleko jak? (+)
        • Daleko je do Brna jak? (+)
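All seven variants above use exactly the same five words and differ only in their order, which is why a strictly positional pattern would miss most of them. The Python sketch below checks this with an order-insensitive bag-of-words comparison; the helper name `bag` is illustrative, and this is not how M-CAST itself matches questions.

```python
import re
from collections import Counter

VARIANTS = [
    "Jak daleko je do Brna?",
    "Jak je daleko do Brna?",
    "Jak je do Brna daleko?",
    "Do Brna je jak daleko?",
    "Do Brna jak je daleko?",
    "Do Brna je daleko jak?",
    "Daleko je do Brna jak?",
]


def bag(sentence):
    """Lowercased multiset of the words in a sentence, ignoring order."""
    return Counter(re.findall(r"\w+", sentence.lower()))


reference = bag(VARIANTS[0])
same_words = all(bag(v) == reference for v in VARIANTS)
print(same_words)  # True
```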
