
European Language Resources Association (ELRA)

HLT Evaluations

Khalid CHOUKRI

ELRA/ELDA

55 Rue Brillat-Savarin, F-75013 Paris, France

Tel. +33 1 43 13 33 33 -- Fax. +33 1 43 13 33 30

Email: [email protected]

http://www.elda.org/ or http://www.elra.info/


Presentation outline

European Language Resources Association

Evaluation to drive research progress

Human Language Technologies Evaluation(s)

What, why, for whom, how …

(Some figures from TC-STAR)

Examples of Evaluation campaigns

Demo (available afterwards)


European Language Resources Association: An Improved Infrastructure for Data Sharing & HLT Evaluation

  • An association of users of Language Resources

  • Infrastructure for the evaluation of Human Language Technologies, providing resources, tools, methodologies, and logistics


The Association

  • Membership drive:

    • ELRA is open to European & non-European institutions

    • Resources are available to members & non-members

    • Pay per resource

  • Some of the benefits of becoming a member:

    • Substantial discounts on LR prices (over 70%)

    • Substantial discounts on LREC registration fees

    • Legal and contractual assistance with respect to LR matters

    • Access to validation and production manuals (quality assessment)

    • Figures and facts about the market (results of ELRA surveys)

    • Newsletter and other publications

    • New: Fidelity program … earn miles and get more benefits


    ELRA: An efficient infrastructure to serve the HLT Community. Strategies for the next decade … New ELRA status

    2005: Extension of ELRA’s official mission to promote LRs and evaluation for the Human Language Technology (HLT) sector:

    The mission of the Association is to promote language resources (henceforth LRs) and evaluation for the Human Language Technology (HLT) sector in all their forms and all their uses.



    What to evaluate … Levels of Evaluation

    [Figure: levels of evaluation spanning Basic Research, Technology Development, and Application Development. Labels: Quantitative Evaluation; Usage Evaluation; Meeting points with technology development; Bottleneck identification; Research results in quantitative evaluation; Technologies necessitated for applications; Technologies which have been validated for applications; Long term / high risk; Large return on investment; Usability; Acceptability; Evolutionary.]


    What to evaluate … Levels of Evaluation

    • Basic Research Evaluation (validate research direction)

    • Technology Evaluation (assessment of solution for well defined problem)

    • Usage Evaluation (end-users in the field)

    • Impact Evaluation (socio-economic consequences)

    • Programme Evaluation (funding agencies)

    Our concern


    Why Evaluate?

    • Validate research hypotheses

    • Assess progress

    • Choose between research alternatives

    • Identify promising technologies (market)

    • Benchmarking … state of the art

    • Share knowledge … dedicated workshops

    • Feedback … funding agencies

    • Share costs?


    Progress & Evaluation (Courtesy Charles Wayne)


    Technology Performance & Applications

    Bad technology may be used to design useful applications

    What about good technology? …

    Software industry


    HLT Evaluations … For Whom?

    MT developers want to improve the “quality” of MT output

    MT users (humans, or software such as CLIR) want to improve productivity by using the most suitable MT system (e.g. for multilinguality)

    …

    • Basic Research Evaluation (validate research direction)

    • Technology Evaluation (assessment of solution for well defined problem)

    • Usage Evaluation (end-users in the field)

    • Impact Evaluation (socio-economic consequences)

    • Programme Evaluation (funding agencies)


    For whom … essential for technology development

    • Sharing of information and knowledge between participants (how to get the best results, access to data, scoring tools)

    • Information obtained by industrialists: state of the art, technology choice, market strategy, new products.

    • Information obtained by funding agencies: technology performance, progress/investment, priorities


    Some types of evaluations

    • Comparative evaluation

      • The same or similar control tasks and related data, with metrics that are agreed upon

    • Competitive vs. cooperative

    • Black-box vs. glass-box evaluation

    • Objective vs. subjective (human-based) evaluation

    • Corpus-based (test suites)

    • Quantitative vs. qualitative measures


    • Comparative Evaluation of Technology

      • Used successfully in the USA by DARPA and NIST (since 1984)

      • Similar efforts in Europe on a smaller scale, mainly projects (EU-funded or national programs)

        • Select a common "task"

        • Attract enough participants

        • Organize the campaign (protocol/metrics/data)

        • Follow-up workshop: interpret results and share information
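The comparative protocol above (one common task, shared data, an agreed metric, every participant scored the same way) can be sketched in a few lines of Python. This is an illustrative sketch only, not ELRA's or NIST's scoring software: the function names, the system names, and the use of case-insensitive exact-match accuracy (as for short QA answers) are assumptions chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# A metric maps (system outputs, reference answers) to a single score.
Metric = Callable[[List[str], List[str]], float]


def exact_match_accuracy(outputs: List[str], references: List[str]) -> float:
    """Fraction of items whose output equals the reference (case-insensitive).

    Hypothetical example metric; a real campaign fixes its metric in the protocol.
    """
    if len(outputs) != len(references) or not references:
        raise ValueError("outputs and references must be non-empty and aligned")
    hits = sum(o.strip().lower() == r.strip().lower() for o, r in zip(outputs, references))
    return hits / len(references)


def rank_systems(
    submissions: Dict[str, List[str]],
    references: List[str],
    metric: Metric,
    higher_is_better: bool = True,
) -> List[Tuple[str, float]]:
    """Score every participant with the same metric on the same test data, then rank."""
    scores = {name: metric(outputs, references) for name, outputs in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=higher_is_better)


if __name__ == "__main__":
    refs = ["paris", "1984", "nist"]
    runs = {
        "system_a": ["Paris", "1985", "NIST"],
        "system_b": ["paris", "1984", "nist"],
    }
    for name, score in rank_systems(runs, refs, exact_match_accuracy):
        print(f"{name}: {score:.3f}")
```

Passing `higher_is_better=False` ranks error-type metrics (such as word error rate) where lower scores are better.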


    Requirements for an evaluation campaign

    • Reference language resources (data): the ground truth

    • Metric(s): automatic, human judgments … scoring software

    • Scale/range of performance to compare with (baseline)

    • Logistics management

    • Reliability assessment: an independent body

    • Participants: technology providers
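To illustrate how reference data, a metric, scoring software, and a baseline fit together, here is a minimal sketch of word error rate (WER) scoring, the metric conventionally used for ASR. The slide itself does not name a metric, so WER is an assumption here, as are the function names and example sentences. Official campaigns use dedicated tools (e.g. NIST's sclite) with text-normalization rules that this sketch omits.

```python
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by the total number of reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("one hypothesis per reference segment is required")
    n_ref_words = sum(len(r.split()) for r in references)
    if n_ref_words == 0:
        raise ValueError("references contain no words")
    return sum(word_errors(r, h) for r, h in zip(references, hypotheses)) / n_ref_words


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Share of the baseline's errors removed by the system (positive = better)."""
    if baseline_wer == 0:
        raise ValueError("baseline WER is zero; relative reduction is undefined")
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    refs = ["the session is open", "thank you mister president"]
    baseline = ["the session is over", "thank you president"]
    system = ["the session is open", "thank you mr president"]
    b, s = corpus_wer(refs, baseline), corpus_wer(refs, system)
    print(f"baseline WER={b:.3f}  system WER={s:.3f}  reduction={relative_reduction(b, s):.0%}")
```

Reporting the system against a fixed baseline on the same reference data is what makes the "scale/range of performance" requirement concrete.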


    HLT Evaluation Portal: Pointers to Projects

    • Overview

    • HLT Evaluations

    • Activities by technology

    • Activities by geographical region

    • Players

    • Evaluation resources

    • Evaluation services

    http://www.hlt-evaluation.org/

    Let us list some well-known campaigns


    Examples of Evaluation Campaigns – Capitalization

    • Speech & Audio/sound

      • ASR: TC-STAR, CHIL, ESTER

      • TTS: TC-STAR, EVASY

      • Speaker identification (CHIL)

      • Speech-to-speech translation

      • Speech understanding (MEDIA)

      • Acoustic person tracking

      • Speech activity detection, …


    Examples of Evaluation Campaigns – Capitalization

    • Multimodal: video and vision technologies

      • Face Detection

      • Visual Person Tracking

      • Visual Speaker Identification

      • Head Pose Estimation

      • Hand Tracking


    Some of the technologies being evaluated within CHIL (http://chil.server.de/)

    A) Vision technologies

    A.1) Face Detection

    A.2) Visual Person Tracking

    A.3) Visual Speaker Identification

    A.4) Head Pose Estimation

    A.5) Hand Tracking

    B) Sound and Speech technologies

    B.1) Close-Talking Automatic Speech Recognition

    B.2) Far-Field Automatic Speech Recognition

    B.3) Acoustic Person Tracking

    B.4) Acoustic Speaker Identification

    B.5) Speech Activity Detection

    B.6) Acoustic Scene Analysis

    C) Contents Processing technologies

    C.1) Automatic Summarisation … Question Answering


    More at the CHIL/CLEAR workshops


    Examples of Evaluation Campaigns – Capitalization

    • Written NLP & Content

      • IR, CLIR, QA (Amaryllis, EQUER, CLEF)

      • Text analysers (Grace, EASY)

      • MT (CESTA, TC-STAR)

      • Corpus alignment & processing (Arcade, Arcade-2, Romanseval/Senseval, …)

      • Term & Terminology extraction

      • Summarisation


    Evaluation Projects: The French Scene (some projects in NL, Italy, ...)

    Technolangue/Evalda: the Evalda platform consists of 8 evaluation campaigns focusing on spoken and written language technologies for French:

    ARCADE II: evaluation of bilingual corpora alignment systems.

    CESART: evaluation of terminology extraction systems.

    CESTA: evaluation of machine translation systems (Ar, En → Fr).

    EASY: evaluation of parsers.

    ESTER: evaluation of broadcast news automatic transcribing systems.

    EQUER: evaluation of question answering systems.

    EVASY: evaluation of speech synthesis systems.

    MEDIA: evaluation of in-context and out-of-context dialog systems.



    Some details from relevant projects

    CLEF

    TC-STAR



    Example of Evaluation Initiatives

    • CLEF (Cross-Language Evaluation Forum)

      • Promoting research and development in Cross-Language Information Retrieval (CLIR) by:

        • (i) providing an infrastructure for the testing and evaluation of information retrieval systems on European languages, in monolingual and cross-language contexts

        • (ii) creating test packages of reusable data which can be employed by system developers for benchmarking purposes
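A reusable test package pairs topics with relevance judgements, so any participant's ranked results can be rescored later with the same measure. The slide does not name a measure; mean average precision (MAP), widely reported in ad hoc retrieval evaluation, is assumed in this sketch, and the topic and document identifiers are made up.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document retrieved.

    Relevant documents that are never retrieved contribute 0.
    """
    if not relevant:
        return 0.0
    ranking = list(dict.fromkeys(ranking))  # drop duplicate entries, keep first occurrence
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Average AP over all judged topics; a topic missing from the run scores 0."""
    if not qrels:
        raise ValueError("no judged topics")
    return sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


if __name__ == "__main__":
    qrels = {"T1": {"d1", "d3"}, "T2": {"d7"}}        # hypothetical relevance judgements
    run = {"T1": ["d1", "d2", "d3"], "T2": ["d5", "d7"]}  # one participant's ranked results
    print(f"MAP = {mean_average_precision(run, qrels):.4f}")
```

Because the judgements ship with the package, a developer can rerun this scoring on a new system without a new campaign, which is the benchmarking use described above.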



    Back to Evaluation Tasks within TC-STAR (http://www.tc-star.org/)

    Two categories of transcription and translation tasks:

    European Parliament Plenary Sessions (EPPS): English (En) and Spanish (Es)

    Broadcast News (Voice of America, VoA): Mandarin Chinese (Zh) and English (En)

    • TC-STAR: speech-to-speech translation

    • Evaluation packages for speech recognition, speech translation, and speech synthesis

    • Development and test data, metrics & results


    TC-STAR Evaluations: Three Consecutive Annual Evaluations

    • SLT in the following directions:

      • Chinese-to-English (Broadcast News)

      • Spanish-to-English (European Parliament plenary speeches)

      • English-to-Spanish (European Parliament plenary speeches)

    • ASR in the following languages:

      • English (European Parliament plenary speeches)

      • Spanish (European Parliament plenary speeches)

      • Mandarin Chinese (Broadcast News)

    • TTS in Chinese, English, and Spanish under the following conditions:

      • Complete system

      • Voice conversion (intralingual and crosslingual) and expressive speech

      • Component evaluation


    Improvement of SLT Performance (En→Es)

    [Chart: SLT results by input type: text, verbatim, speech recognition.]


    Improvement of SLT Performance (Es→En)

    [Chart: SLT results by input type: text, verbatim, speech recognition.]
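The SLT charts compare translations produced from three kinds of input: text, verbatim transcriptions, and speech recognition output. The slides do not name the automatic metric, so BLEU is assumed here. The sketch is a simplified single-reference, whitespace-tokenized, unsmoothed corpus BLEU, not the official campaign scorer, and the example sentences are invented.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Counts of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU: clipped n-gram precisions (n = 1..max_n),
    geometric mean, times a brevity penalty. One reference per segment, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("one reference per hypothesis is required")
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:  # without smoothing, any zero precision makes BLEU 0
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the commission will present its report next week"]
    outputs_by_input = {  # hypothetical outputs of one system for each input condition
        "text": ["the commission will present its report next week"],
        "verbatim": ["the commission will present the report next week"],
        "speech recognition": ["the commission will present the report next weak"],
    }
    for condition, outputs in outputs_by_input.items():
        print(f"{condition}: BLEU = {corpus_bleu(outputs, refs):.3f}")
```

Scoring all three conditions against the same references isolates how much quality is lost when recognition errors propagate into translation; higher BLEU is better.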



    Human Evaluation of Translations: En→Es Adequacy (1-5)

    [Chart: adequacy scores (1-5) per system; labels include commercial systems and system combinations.]


    End-to-End

    The end-to-end evaluation is carried out for one translation direction: English-to-Spanish

    Evaluation of the ASR (ROVER) + SLT (ROVER) + TTS (UPC) system

    Same segments as for the SLT human evaluation

    Evaluation tasks:

    Adequacy: comprehension test

    Fluency: judgement test with several questions on fluency and on the usability of the system
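The end-to-end system chains ROVER combinations for ASR and SLT. ROVER (Recognizer Output Voting Error Reduction, a NIST technique) aligns several systems' outputs into a word transition network and then votes slot by slot. The sketch below covers only the voting step: it assumes the hypotheses are already aligned into equal-length slots (an empty string marks a deletion) and uses plain frequency voting without confidence scores. The function name and example are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

NULL = ""  # placeholder for "this system output no word in this slot"


def vote_aligned(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """Majority vote per aligned slot; ties go to the earliest-listed system."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    n_slots = len(hypotheses[0])
    if any(len(h) != n_slots for h in hypotheses):
        raise ValueError("hypotheses must already be aligned to the same number of slots")
    combined = []
    for slot in zip(*hypotheses):
        counts = Counter(slot)
        top = max(counts.values())
        winner = next(word for word in slot if counts[word] == top)
        if winner != NULL:  # a winning deletion contributes no word
            combined.append(winner)
    return combined


if __name__ == "__main__":
    aligned = [
        ["the", "session", "is", "open", "uh"],
        ["the", "session", "was", "open", NULL],
        ["a", "session", "is", NULL, NULL],
    ]
    print(" ".join(vote_aligned(aligned)))
```

Each system's isolated errors ("was", "a", "uh") are outvoted by the other two, which is the intuition behind combining systems.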



    Fluency questionnaire

    [Understanding] Do you think that you have understood the message?

    1: Not at all ... 5: Yes, absolutely

    [Fluent Speech] Is the speech in good Spanish?

    1: No, it is very bad ... 5: Yes, it is perfect

    [Effort] Rate the listening effort

    1: Very high ... 5: Low, as natural speech

    [Overall Quality] Rate the overall quality of this audio sample

    1: Very bad, unusable ... 5: It is very useful
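Answers to the four questions above are 1-5 ratings where 5 is always the favorable end (for [Effort], 5 means low effort, as natural speech). A minimal sketch of how such answers could be validated and averaged per question follows; the dictionary keys and example ratings are hypothetical, and the slides do not say how the scores were actually aggregated.

```python
from statistics import mean
from typing import Dict, List

# Questions from the questionnaire; the short keys are hypothetical identifiers.
QUESTIONS: Dict[str, str] = {
    "understanding": "Do you think that you have understood the message?",
    "fluent_speech": "Is the speech in good Spanish?",
    "effort": "Rate the listening effort",
    "overall_quality": "Rate the overall quality of this audio sample",
}
SCALE = range(1, 6)  # 1 = worst ... 5 = best, for every question


def summarize(responses: List[Dict[str, int]]) -> Dict[str, float]:
    """Check that each response answers all questions on the 1-5 scale, then average per question."""
    if not responses:
        raise ValueError("no responses")
    for response in responses:
        if set(response) != set(QUESTIONS):
            raise ValueError("each response must answer exactly the four questions")
        for question, score in response.items():
            if score not in SCALE:
                raise ValueError(f"{question}: {score} is outside the 1-5 scale")
    return {q: float(mean(r[q] for r in responses)) for q in QUESTIONS}


if __name__ == "__main__":
    answers = [  # hypothetical ratings from two listeners
        {"understanding": 4, "fluent_speech": 3, "effort": 3, "overall_quality": 4},
        {"understanding": 5, "fluent_speech": 3, "effort": 2, "overall_quality": 3},
    ]
    for question, avg in summarize(answers).items():
        print(f"{question}: {avg:.2f}")
```

Rejecting incomplete or out-of-range answers before averaging keeps one malformed form from silently skewing a subjective score.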



    End-to-End Results (subjective test: 1-5)


    TC-STAR Tasks

    More results from the 2007 campaign:

    http://www.tc-star.org/

    Evaluation packages available


    Some concluding remarks on technology evaluation

    It saves developers time and money

    It helps assess progress accurately

    It produces reusable evaluation packages

    It helps to identify areas where more R&D is needed

