ZRINKA DUJMOVIĆ

ZRINKA DUJMOVIĆ, University of Zagreb/ETF. JRC Workshop: Exploiting parallel corpora in up to 20 Languages, Arona, 25-27 September 2005. STATISTICAL ANALYSIS OF NOUN LEMMAS IN THE ITALIAN AND SWISS CONSTITUTION AND THEIR TRANSLATIONS INTO CROATIAN.

Slide 1

ZRINKA DUJMOVIĆ
University of Zagreb/ETF
JRC Workshop: Exploiting parallel corpora in up to 20 Languages
Arona, 25-27 September 2005

STATISTICAL ANALYSIS OF NOUN LEMMAS IN THE ITALIAN AND SWISS CONSTITUTION AND THEIR TRANSLATIONS INTO CROATIAN


Slide 2
What?

  • Constitution of the Republic of Italy

    (original in Italian + translation into Croatian): 139 articles + transitory provisions; in force since 1948.

  • Federal Constitution of the Swiss Confederation

    (original in Italian + translation into Croatian from Italian/German/English): 196 articles (+ transitory provisions); in force since 2000.


Slide 3
Why?

  • objective:

    test terminological consistency between the source language (SL) and the target language (TL)

  • prerequisites:

    - parallel corpora as rich resources of translation equivalents

    - small corpora


Slide 4
How?

Data processing:

  • Conversion into the HTML format

  • Sentence alignment

  • Lemmatisation (inflectionally rich language!!)

  • Corpus annotation (POS tagging)

  • Word alignment

  • Word frequency lists
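The last step of the pipeline, building a word frequency list, can be sketched as follows. This is a minimal illustration, assuming the lemmatised and POS-tagged corpus is available as (lemma, tag) pairs; the function name `noun_lemma_frequencies`, the tag value "NOUN" and the sample tokens are hypothetical and are not taken from the presentation.

```python
from collections import Counter


def noun_lemma_frequencies(tagged_tokens):
    """Return (lemma, count) pairs for noun lemmas, most frequent first.

    tagged_tokens: iterable of (lemma, pos_tag) pairs. The tag name
    "NOUN" is an assumption; the presentation does not name its tagset.
    """
    counts = Counter(
        lemma.lower() for lemma, pos in tagged_tokens if pos == "NOUN"
    )
    return counts.most_common()


# Hypothetical, hand-made sample; not data from the constitutions.
sample = [
    ("legge", "NOUN"), ("essere", "VERB"), ("legge", "NOUN"),
    ("stato", "NOUN"), ("Legge", "NOUN"), ("approvare", "VERB"),
]
freq_list = noun_lemma_frequencies(sample)
print(freq_list)
```

Lower-casing the lemma is a design choice made here so that sentence-initial capitals do not split one lemma into two entries.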


Slide 5
Testing terminological consistency of translation

1. HYPOTHESIS

1 Italian noun lemma

= 1 translation equivalent in Croatian

Constitution

2. STATISTICAL TESTING

  • the least squares method

  • Y = a + bX

  • Correlation coefficient (R)
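The statistical test above can be sketched in a few lines. This is only an illustration under assumptions the slides do not state: X is taken to be the frequency of an Italian noun lemma and Y the frequency of its Croatian equivalent (whether raw or log-transformed is not specified), and the function name `least_squares` is hypothetical. If every Italian lemma had exactly one Croatian equivalent, the fit would be expected to give a close to 0, b close to 1 and R close to 1.

```python
import math


def least_squares(xs, ys):
    """Fit Y = a + bX by ordinary least squares.

    Returns (a, b, r, se_a, se_b): intercept, slope, Pearson correlation
    coefficient, and the standard errors of intercept and slope.
    Needs at least three points and non-constant xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)

    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    return a, b, r, se_a, se_b


# Hypothetical frequency pairs; not data from the constitutions.
italian_freq = [1, 2, 3, 4]
croatian_freq = [1, 2, 3, 5]
a, b, r, se_a, se_b = least_squares(italian_freq, croatian_freq)
print(f"a = {a:.3f} ± {se_a:.3f}, b = {b:.3f} ± {se_b:.3f}, R = {r:.3f}")
```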


Slide 6
Correlation of the most frequent Italian and Croatian noun lemmas in the Federal Constitution of the Swiss Confederation (51)

a = 0.009 ± 0.039

b = 0.999 ± 0.030

R = 0.978


Slide 7
Correlation of the most frequent Italian and Croatian noun lemmas in the Constitution of the Republic of Italy (31)

a = 0,075 0.07305

b = 0,9380.03970

R = 0,975


Slide 8
Deviation from linearity

  • (a) Accidental (translators’ mistakes)

  • (b) Justified (still not expected!)

    - stylistic differences

      e.g. use of a relative pronoun instead of a noun (1:0)

    - polysemy (1:2)

      e.g. It. titolo 11 x

      = Cr. naslov 6 x (Eng. title)

      = Cr. vrijednosni papiri 1 x (Eng. securities)

      - as idiom: 1) a titolo transitorio = privremeno (Eng. temporarily);

      2) a titolo oneroso = za plaću (Eng. against payment)
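Deviations of this kind (1:0 and 1:2 correspondences) could be located by counting the Croatian equivalents observed for each Italian lemma in the word-aligned corpus. The sketch below assumes the alignment is available as (Italian lemma, Croatian lemma) pairs, with `None` marking a noun that has no noun counterpart; the function names and the sample pairs are hypothetical and the counts are not those of the constitutions.

```python
from collections import Counter, defaultdict


def equivalents_per_lemma(aligned_pairs):
    """Map each source lemma to a Counter of its target equivalents."""
    table = defaultdict(Counter)
    for source_lemma, target_lemma in aligned_pairs:
        table[source_lemma][target_lemma] += 1
    return table


def inconsistent_lemmas(table):
    """Keep lemmas that are not translated 1:1.

    A lemma is flagged when it has more than one equivalent (1:2, ...)
    or when at least one occurrence has no equivalent (1:0, None).
    """
    return {
        src: dict(targets)
        for src, targets in table.items()
        if len(targets) != 1 or None in targets
    }


# Hypothetical alignment output; the counts are invented for illustration.
pairs = [
    ("titolo", "naslov"), ("titolo", "naslov"),
    ("titolo", "vrijednosni papiri"),
    ("legge", "zakon"), ("legge", "zakon"),
    ("diritto", None),
]
flagged = inconsistent_lemmas(equivalents_per_lemma(pairs))
print(flagged)
```

Whether a flagged lemma is an accidental or a justified deviation still has to be decided by reading the aligned sentences; the count only points to where to look.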


Slide 9
Italian noun lemmas present in the Italian and Swiss constitutions = candidates for glossary


Slide 10
Conclusions

  • the least squares method proved adequate for verifying the terminological consistency of the translation

  • the verification does not have to be carried out on the entire sample, but only on the most frequent lemmas, provided their frequencies span at least one order of magnitude

  • the best candidates for a glossary are the lemmas that occur with high frequency in both constitutions
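The last conclusion can be illustrated with a short sketch that intersects two frequency lists. It assumes each constitution's noun lemma frequencies are held in a dictionary; the function name `glossary_candidates`, the threshold `min_count` and all counts below are hypothetical, since the presentation does not state a cut-off.

```python
def glossary_candidates(freq_a, freq_b, min_count):
    """Return lemmas that reach min_count in both frequency lists.

    Results are ordered by combined frequency (highest first),
    with ties broken alphabetically.
    """
    shared = freq_a.keys() & freq_b.keys()
    candidates = [
        lemma for lemma in shared
        if freq_a[lemma] >= min_count and freq_b[lemma] >= min_count
    ]
    return sorted(candidates, key=lambda l: (-(freq_a[l] + freq_b[l]), l))


# Hypothetical counts; not data from the constitutions.
italian_constitution = {"legge": 40, "stato": 25, "titolo": 14, "regione": 30}
swiss_constitution = {"legge": 60, "stato": 5, "cantone": 80, "titolo": 12}
candidates = glossary_candidates(
    italian_constitution, swiss_constitution, min_count=10
)
print(candidates)
```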