European
Download
1 / 13

Wolfgang Täger - PowerPoint PPT Presentation


  • 80 Views
  • Uploaded on

European Patent Office. European Machine Translation Programme. Wolfgang Täger. December 2006. Programme Partners and Goals. Trigger: Success of JP-EN patent translation Agreement EPO - Member States MT of patents/ abstracts/ communications to/from English Three language pairs per year

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Wolfgang Täger' - igor-guerra


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

European

Patent Office

European Machine Translation Programme

Wolfgang Täger

December 2006


Programme partners and goals
Programme Partners and Goals

  • Trigger: Success of JP-EN patent translation

  • Agreement EPO - Member States

    • MT of patents/ abstracts/ communications to/from English

    • Three language pairs per year

    • First three languages: FR - DE - ES

  • Candidates for next year: Swedish, Dutch, Italian, Romanian, Greek


  • Mt engine
    MT engine

    • Trial with SMT system (Language Weaver)

    • Call for tender: Winner Worldlingo (Systran)

    • Going public ([email protected]): December 2006

    • Needed: Improve translation by specific dictionaries


    Dictionary format
    Dictionary format

    • Desiderata

    • open standard

    • XML-Unicode

    • support features of MT engines

    • support conditional translations (e.g. based on IPC)

    • Is not intended for terminology (no definitions, lexical focus and no semantic focus).

    • OLIF format was chosen

      How to get dictionaries ? By bilingual term extraction !


    Available corpora
    Available corpora

    • 560.000 EP-B publications => claims in EN,DE,FR

    • 300.000 DE-T2 publications

    • 37.000 ES-B3/T3 publications

    • => Align corpora for term extraction, concordancing, translation memory (and SMT)

    ES B3/T3 (LaTex)

    DE-T2

    EP-B1

    DESC ES

    DESC DE

    DESC EN OR FR OR DE

    CL ES

    (CL DE)

    CL EN

    CL FR

    CL DE


    Available corpora1
    Available corpora

    • 560.000 EP-B publications => claims in EN,DE,FR

    • 300.000 DE-T2 publications

    • 37.000 ES-B3/T3 publications

    • => Align corpora for term extraction, concordancing, translation memory (and SMT)

    ES B3/T3 (LaTex)

    DE-T2

    EP-B1

    DESC ES

    DESC DE

    DESC EN OR FR OR DE

    CL ES

    (CL DE)

    CL EN

    CL FR

    CL DE


    Alignment extraction
    Alignment & Extraction

    • Alignment: Trial at EPO with internally developed SW

    • Result was not improved by external companies during call for tender.


    Alignment extraction1
    Alignment & Extraction

    • Call for tender for bilingual term extraction

    • Winner: DFKI

    • Alignment of corpora, POS tagging, Identification of terms

    • Pairing of terms using clues like co-occurrence score, string similarity, grammatical clues, position, available dictionaries, ...

    • Providing further information like gender, inflection, transitivity, countable, ...


    Validation concordancing
    Validation & Concordancing

    • Development of OLIF editor at EPO

    • Remove noise

    • Correct entries

    • Use concordancer (provides statistics based on parallel corpora)

    • => DEMO


    Olif format
    OLIF format

    • Support of more languages

    • Clarification of inflection scheme

    • Clarification of term vs lex approach

    • Tools


    Relational database
    Relational database ??

    Transl

    SemRel

    Concept

    Term

    Naming

    SurfForm

    InflForm

    Lemma

    RegEx

    LexType

    Infl


    Relational database1
    Relational database ??

    Transl

    SemRel

    „hot drink ...“

    grüner Tee

    Naming

    grüner

    Nom. Sg. str. f. pos.

    grün

    -er

    DE, Adj

    iLike „klein“


    End

    • Thank you!


    ad