
Automatic Functor Assignment (AFA) in the Prague Dependency Treebank



Presentation Transcript


  1. Automatic Functor Assignment (AFA) in the Prague Dependency Treebank
     • PDT:
       • a long-term research project
       • at the Institute of Formal and Applied Linguistics
       • aimed at a complex annotation of a part of the Czech National Corpus
       • annotation scheme - 3 levels
     • Functors:
       • actants: ACT, PAT, ADDR, EFF, ORIG
       • free modifiers: TWHEN, LOC, DIR1, BEN, APP, CPR ...
     Diagram: AFA's position within the PDT. Stages shown: Raw text; Morphologically tagged text; Analytic tree structures (ATS); Tectogrammatical tree structures (TGTS)
     Zdeněk Žabokrtský: Automatic Functor Assignment in the PDT

  2. Problem analysis, Data preprocessing
     • Motivation
       • to reduce the huge amount of human work involved in the development of the PDT
     • Problem statement
       • to assign a functor to every node in a TGTS
     • Initial situation
       • no AFA system with a reasonable cover existed
       • human annotators rely mostly on their language knowledge, not on "formal" rules
       • annotators take into account the whole-sentence context
       • a certain amount of manually annotated TGTSs is available
     • What is the minimal amount of information that is sufficient to decide on the functor?
     • Problem reformulation
       • AFA: to classify symbolic vectors into 53 classes
     • Available material - 18 files (up to 50 sentences in each)
       • imperfect: incomplete, ambiguous
       • divided into two parts:
         • testing set - 15 files (6049 vectors)
         • training set - 3 files (1089 vectors)
     Diagram labels: feature selection + feature extraction; vectors with 12 symbolic attributes
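The problem reformulation above (a tree node becomes a symbolic vector, which is then classified) can be sketched as follows. This is only an illustration: the slides mention 12 symbolic attributes without listing them, so the field names and the 8 attributes below (`lemma`, `pos`, `case`, `afun` for a node and for its governing node) are assumptions, not the author's actual feature set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """One tree node; every field name here is a hypothetical choice."""
    lemma: str
    pos: str    # part of speech
    case: str   # morphological case, "-" if not applicable
    afun: str   # analytical function
    parent: Optional["Node"] = None


def to_vector(node: Node) -> Tuple[str, ...]:
    """Flatten a node and its governing node into one symbolic vector."""
    own = (node.lemma, node.pos, node.case, node.afun)
    gov = node.parent
    if gov is None:
        # The root has no governing node; pad with a placeholder symbol.
        governing = ("-", "-", "-", "-")
    else:
        governing = (gov.lemma, gov.pos, gov.case, gov.afun)
    return own + governing


verb = Node("dát", "V", "-", "Pred")
noun = Node("kniha", "N", "4", "Obj", parent=verb)
vector = to_vector(noun)
```

Each vector has a fixed length and purely symbolic values, which is what lets rule-based, dictionary-based, similarity-based, and machine-learning assigners all work on the same input.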

  3. Components of the proposed AFA system
     • Symbiosis of 4 different approaches:
       • 7 Rule-based Methods (RBMs)
       • 3 Dictionary-based Methods (DBMs)
       • Nearest vector (similarity)
       • Machine learning (Quinlan's C4.5, Sašo Džeroski)
     • Implementation:
       • a set of small programs for preprocessing and format conversions, dictionary mining, functor assigning, and performance evaluation
       • Linux filters, Perl, SQL
       • assigners are applied in a strictly pipelined fashion
     • Data Flow Diagram: (figure not included)
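A minimal sketch of what a strictly pipelined sequence of assigners might look like. The original system was built from Linux filters, Perl, and SQL; Python is used here only for illustration. The rule, the dictionary contents, the attribute names (`lemma`, `afun`, `gov_voice`), and the policy that a later assigner only fills nodes left unassigned by earlier ones are all assumptions, not details confirmed by the slides.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, str]
Assigner = Callable[[Vector], Optional[str]]


def rule_subject_of_active_verb(v: Vector) -> Optional[str]:
    """Hypothetical rule-based method: subject of an active verb -> ACT."""
    if v.get("afun") == "Sb" and v.get("gov_voice") == "active":
        return "ACT"
    return None


def make_dictionary_assigner(table: Dict[str, str]) -> Assigner:
    """Dictionary-based method: look the lemma up in a lemma -> functor table."""
    def assign(v: Vector) -> Optional[str]:
        return table.get(v.get("lemma", ""))
    return assign


def make_nearest_vector_assigner(examples: List[Tuple[Vector, str]]) -> Assigner:
    """Nearest vector: copy the functor of the most similar annotated vector."""
    def similarity(a: Vector, b: Vector) -> int:
        # Count attributes on which the two symbolic vectors agree.
        return sum(1 for k in a if a[k] == b.get(k))

    def assign(v: Vector) -> Optional[str]:
        if not examples:
            return None
        _, functor = max(examples, key=lambda ex: similarity(v, ex[0]))
        return functor
    return assign


def run_pipeline(vectors: List[Vector], assigners: List[Assigner]) -> List[Optional[str]]:
    """Apply assigners in order; each one sees only still-unassigned nodes."""
    result: List[Optional[str]] = [None] * len(vectors)
    for assigner in assigners:
        for i, v in enumerate(vectors):
            if result[i] is None:
                result[i] = assigner(v)
    return result


vectors = [
    {"lemma": "otec", "afun": "Sb", "gov_voice": "active"},
    {"lemma": "včera", "afun": "Adv", "gov_voice": "active"},
    {"lemma": "kniha", "afun": "Obj", "gov_voice": "active"},
]
annotated = [
    ({"lemma": "dopis", "afun": "Obj", "gov_voice": "active"}, "PAT"),
    ({"lemma": "doma", "afun": "Adv", "gov_voice": "active"}, "LOC"),
]
pipeline = [
    rule_subject_of_active_verb,
    make_dictionary_assigner({"včera": "TWHEN"}),
    make_nearest_vector_assigner(annotated),
]
functors = run_pipeline(vectors, pipeline)
```

Ordering the assigners from most to least reliable lets the high-precision methods decide first, while the similarity-based fallback raises the cover at the end of the sequence.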

  4. Performance evaluation
     • Detailed evaluation of several quantities for each assigner in a sequence
     • Several sequences of assigners were tested
       • e.g., a sequence of RBMs: (figure not included)
     • Comparison of different sequences of assigners: (figure not included)
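The slides do not name the quantities that were evaluated. As a hedged illustration, the sketch below computes two plausible ones for the output of a sequence of assigners: cover (the share of nodes that received any functor) and precision (the share of assigned functors that match the manual annotation). The function name `evaluate` and both definitions are assumptions.

```python
from typing import Dict, List, Optional


def evaluate(assigned: List[Optional[str]], gold: List[str]) -> Dict[str, float]:
    """Compare automatically assigned functors with manual (gold) ones."""
    total = len(gold)
    covered = sum(1 for a in assigned if a is not None)
    correct = sum(1 for a, g in zip(assigned, gold) if a is not None and a == g)
    return {
        # Fraction of nodes for which some functor was proposed.
        "cover": covered / total if total else 0.0,
        # Fraction of proposed functors that agree with the annotators.
        "precision": correct / covered if covered else 0.0,
    }


assigned = ["ACT", None, "PAT", "LOC"]
gold = ["ACT", "TWHEN", "PAT", "DIR1"]
scores = evaluate(assigned, gold)
```

Running `evaluate` after each assigner in a sequence shows how much every stage adds to the cover and what it costs in precision.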

  5. Further work
     • Machine learning - searching for new regularities
     • Improvement of dictionaries
     • Tectogrammatical annotation of verb valency frames
     • Categorial grammars
     Talks & Publications (diagram labels: language, fuzzy sets)
     • ZŽ: Fuzzy Controller as a Tool for Traffic Simulation. Mendel 1999
     • ZŽ: Introduction to the PDT, Faculty of Arts, Ljubljana, 2000
     • ZŽ: Constrained Fuzzy Arithmetic: Engineer's View. CMP Research Rep.
     • ZŽ: AFA in the PDT, seminar at the IFAL, 2000
     • ZŽ: AFA in the PDT, TSD 2000
     • ZŽ: Comp. Problems of CFA, CMP seminar
     • S. Džeroski, ZŽ: ML approach to AFA in the PDT, 5th TELRI seminar, 2000
     • M. Navara, ZŽ: Comp. Problems of CFA, ISCI 2000 ?
     • S. Džeroski, ZŽ: ML approach to AFA in the PDT, ACL, 2001
     • M. Navara, ZŽ: How to make CFA efficient, Soft Computing 2001
     • Straňáková, Skoumalová, Panevová, ZŽ: Tectogram. annotation of verb. val. frames, TSD 2001 ?
     • M. de Cock, ZŽ: Representing Ling. Hedges by L-Fuzzy Modifiers, CIMCA 2001 ?
