
ZRINKA DUJMOVIĆ University of Zagreb/ETF JRC Workshop: Exploiting parallel corpora in up to 20 Languages Arona, 25-27 S

ZRINKA DUJMOVIĆ, University of Zagreb/ETF. JRC Workshop: Exploiting parallel corpora in up to 20 Languages, Arona, 25-27 September 2005. STATISTICAL ANALYSIS OF NOUN LEMMAS IN THE ITALIAN AND SWISS CONSTITUTION AND THEIR TRANSLATIONS INTO CROATIAN.


Presentation Transcript


  1. ZRINKA DUJMOVIĆ, University of Zagreb/ETF. JRC Workshop: Exploiting parallel corpora in up to 20 Languages, Arona, 25-27 September 2005. STATISTICAL ANALYSIS OF NOUN LEMMAS IN THE ITALIAN AND SWISS CONSTITUTION AND THEIR TRANSLATIONS INTO CROATIAN

  2. What? • Constitution of the Republic of Italy (original in Italian + translation into Croatian): 139 articles + transitory provisions; in force since 1948. • Federal Constitution of the Swiss Confederation (original in Italian + translation into Croatian from Italian/German/English): 196 articles + transitory provisions; in force since 2000.

  3. Why? • objective: test terminological consistency between the source language (SL) and the target language (TL) • prerequisites: - parallel corpora as rich resources of translation equivalents - small corpora

  4. How? Data processing: • Conversion into the HTML format • Sentence alignment • Lemmatisation (inflectionally rich language!!) • Corpus annotation (POS tagging) • Word alignment • Word frequency lists
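The last processing step, building word frequency lists, can be sketched as follows. This is a minimal illustration only: the `tagged_tokens` sample, the `NOUN` tag name, and the function `noun_lemma_frequencies` are assumptions made for the example, not the tools or tagset used in the study.

```python
from collections import Counter

# Hypothetical output of lemmatisation + POS tagging: (lemma, POS tag) pairs.
# The lemmas and the "NOUN" tag are illustrative, not taken from the corpus.
tagged_tokens = [
    ("legge", "NOUN"), ("stabilire", "VERB"), ("legge", "NOUN"),
    ("diritto", "NOUN"), ("cittadino", "NOUN"), ("legge", "NOUN"),
    ("diritto", "NOUN"),
]

def noun_lemma_frequencies(tokens):
    """Return (lemma, count) pairs for noun lemmas, most frequent first."""
    counts = Counter(lemma for lemma, pos in tokens if pos == "NOUN")
    return counts.most_common()

freq_list = noun_lemma_frequencies(tagged_tokens)
for lemma, count in freq_list:
    print(f"{lemma}\t{count}")
```

Counting lemmas rather than word forms matters here because both languages are inflected, so one noun appears under several surface forms.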

  5. Testing terminological consistency of translation 1. HYPOTHESIS: 1 Italian noun lemma = 1 translation equivalent in Croatian Constitution 2. STATISTICAL TESTING • the least squares method • Y = a + bX • Correlation coefficient (R)
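A least squares fit of Y = a + bX together with the correlation coefficient R can be sketched as below. The slides do not say exactly what X and Y are, so the sketch assumes X is the frequency of an Italian noun lemma and Y the frequency of its Croatian equivalent; the function name `fit_line` and the sample frequencies are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return a, b, r

# Hypothetical frequencies of aligned lemma pairs (not data from the study).
italian_freq = [12, 20, 35, 48, 60]
croatian_freq = [11, 20, 34, 49, 60]

a, b, r = fit_line(italian_freq, croatian_freq)
print(f"a = {a:.3f}, b = {b:.3f}, R = {r:.3f}")
```

Under the hypothesis of one equivalent per lemma, the points should lie close to the line Y = X, that is, a near 0, b near 1, and R near 1.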

  6. Correlation of the most frequent Italian and Croatian noun lemmas in the Federal Constitution of the Swiss Confederation (51): a = 0.009 ± 0.039, b = 0.999 ± 0.030, R = 0.978

  7. Correlation of the most frequent Italian and Croatian noun lemmas in the Constitution of the Republic of Italy (31): a = 0.075 ± 0.07305, b = 0.938 ± 0.03970, R = 0.975

  8. Deviation from linearity • (a) Accidental (translators' mistakes) • (b) Justified (still not expected!): - stylistic differences, e.g. use of a relative pronoun instead of a noun (1:0) - polysemy (1:2), e.g. It. titolo (11 x) = Cr. naslov (6 x) (Eng. title) = Cr. vrijednosni papiri (1 x) (Eng. securities) - as an idiom: 1) a titolo transitorio = privremeno (Eng. temporarily); 2) a titolo oneroso = za plaću (Eng. against payment)

  9. Italian noun lemmas present in Italian and Swiss constitutions = candidates for glossary

  10. Conclusions • the least squares method proved adequate for verifying the translation • the verification does not have to be carried out on the entire sample, but only on the lemmas with the highest frequency, covering at least one order of magnitude • the best candidates for a glossary are those lemmas that occur with high frequency in both constitutions
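The selection described in the conclusions can be sketched as below. This is one possible reading, stated as an assumption: "covering at least one order of magnitude" is taken to mean keeping the lemmas whose frequency lies between the maximum and one tenth of the maximum. The function names and all frequencies are hypothetical.

```python
def top_lemmas(freqs, span=10):
    """Keep lemmas whose frequency is within a factor `span` of the maximum."""
    top = max(freqs.values())
    return {lemma for lemma, f in freqs.items() if f * span >= top}

def glossary_candidates(freqs_a, freqs_b, span=10):
    """Lemmas that are high-frequency in both frequency lists."""
    return top_lemmas(freqs_a, span) & top_lemmas(freqs_b, span)

# Hypothetical noun-lemma frequencies for the two Italian originals.
italian_constitution = {"legge": 120, "diritto": 60, "camera": 80, "regione": 11}
swiss_constitution = {"legge": 90, "diritto": 100, "cantone": 150, "regione": 9}

candidates = sorted(glossary_candidates(italian_constitution, swiss_constitution))
print(candidates)
```

Raising `span` widens the frequency band and admits rarer lemmas; the factor of 10 simply mirrors the "one order of magnitude" wording.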
