
Putting development and evaluation of core technology first




Presentation Transcript


  1. Putting development and evaluation of core technology first
     Anja Belz
     Natural Language Technology Group, University of Brighton, UK
     NLTG

  2. Overview
     • NLG needs comparative evaluation
     • Core technology first, applications second
     • Towards common subtasks, corpora and evaluation techniques
     • What kind of STEC event for NLG?
     Belz: Putting development and evaluation of core technology first

  3. NLG needs comparative evaluation
     • NLG has strong evaluation traditions
     • But there has been no comparative evaluation, apart from a handful of results, e.g.:
       • regenerating the Wall Street Journal Corpus
       • SumTime wind forecast generation
     • At present, we don’t really know which NLG techniques generally work better
     • To consolidate results and make collective progress, we need the ability to evaluate comparatively

  4. Core technology first, applications second
     • Biggest challenge: identifying sharable tasks
     • A shared application is potentially divisive:
       • NLG is a varied field with many applications
       • hard to select one with enough agreement
       • evaluation results would be application-specific
     • Instead, choose tasks that can unify NLG:
       • tasks that are relevant to all NLG
       • core technology that is potentially useful to all NLG
       • utilise commonalities and agreement that have already emerged: GRE, lexicalisation, content ordering

  5. Towards common subtasks, corpora and evaluation techniques
     • Standardising subtasks and input/output requirements
     • Building data resources for building and evaluating systems
     • Creating NLG-specific evaluation techniques
       • ISO quality characteristics: functionality, reliability, usability, efficiency, maintainability, portability
       • Need to focus on evaluating the quality of outputs
     • (New) GENEVAL: test existing and new evaluation techniques
       • that assess different evaluation criteria
       • and have a range of associated cost/time requirements

  6. What kind of STEC?
     • Don’t have an NLG STEC at application level (yet)
     • Don’t invest millions (yet)
     • Don’t have a large organisation run it (yet)
     • Because:
       • NLG technology isn’t ready
       • participation would involve a large investment of money and time
       • not many groups would be able to afford that
       • we would have to decide on an application, which is potentially divisive

  7. What kind of STEC?
     • Do encourage many different shared tasks and subtasks (at least initially)
     • Involve many NLG researchers in organising STECs
     • Involve SIGGEN; have a steering committee
     • Because:
       • diversity in tasks reflects the diversity of the field (NLG just isn’t one thing)
       • it’s inclusive and representative
       • control stays with the international academic community

  8. Stakeholder STECs
     • Similar to SemEval 2007 (Senseval 4)
     • As opposed to shareholder STECs like DUC and MT-Eval
     • Annual STEC event attached to INLG and ENLG
     • Call for task proposals
     • Proposers organise and run their own STEC tasks
     • Ready test bed for new tasks: popular tasks grow, less popular ones disappear
