Gerd Fliedner Computational Linguistics Saarland University

Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics Saarland University

Comments/Thoughts • Useful approach, as it can potentially speed up and support annotation and thus the creation of new FrameNets. • Uses only a few resources and is therefore extendable to other language pairs (in principle). • First experiments ‘only’.

Multilingual FrameNets • Having FrameNet for as many languages as possible would be nice. • There are numerous monolingual and cross-lingual applications. • BUT: Building ‘a FrameNet’ is knowledge- and labour-intensive work and thus expensive; funding may be a problem.

Bootstrapping Multilingual FNs • (Re-)use as much knowledge from existing FrameNets as possible. • Ease the task of annotators by making useful suggestions. • Use automatic methods for knowledge acquisition. [Figure labels: ‘LSA’, ‘Swamp of Language’]

More than one strand of hair may be needed… • By the way: Change_hair_configuration is not yet in FN.

FR.FrameNet • In FR.FrameNet, several methods have been explored that could reduce the time and cost of building new FrameNets. • Tasks explored: • Lexical Unit (Frame Evoking Element) transfer • Identifying Frame Elements • Disambiguating LU-Frame assignment

Lexical Unit Transfer • Can be seen as the task of finding and disambiguating translation pairs (links to Machine Translation, lexicography). • Extract disambiguated translations from an existing ‘cluster-based’ dictionary. • Some manual annotation required, but a relatively fast and simple way of acquiring a solid core lexicon.
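The transfer step can be pictured as a dictionary lookup followed by manual filtering. The sketch below is only an illustration under stated assumptions: the frame inventory, the dictionary entries and the ranking heuristic (prefer translations supported by several English lexical units) are all hypothetical toy choices, not the FR.FrameNet procedure, and the dictionary is assumed to be sense-disambiguated already.

```python
from collections import defaultdict

# Hypothetical toy data: a few English lexical units (LUs) per frame.
ENGLISH_LUS = {
    "Statement": ["say.v", "state.v", "declare.v"],
    "Ingestion": ["eat.v"],
}

# Hypothetical bilingual dictionary, assumed to be sense-disambiguated already.
BILINGUAL_DICT = {
    "say.v": ["dire.v"],
    "state.v": ["affirmer.v", "déclarer.v"],
    "declare.v": ["déclarer.v", "affirmer.v"],
    "eat.v": ["manger.v"],
}

def transfer_candidates(english_lus, dictionary):
    """Map each frame to {French candidate: [English LUs that translate to it]}."""
    candidates = {}
    for frame, lus in english_lus.items():
        support = defaultdict(list)
        for lu in lus:
            for translation in dictionary.get(lu, []):
                support[translation].append(lu)
        candidates[frame] = dict(support)
    return candidates

def ranked(frame_candidates):
    """Order candidates by number of supporting English LUs (assumed heuristic)."""
    return sorted(frame_candidates, key=lambda t: (-len(frame_candidates[t]), t))

candidates = transfer_candidates(ENGLISH_LUS, BILINGUAL_DICT)
for frame in candidates:
    # The ranked list is what a human annotator would then filter manually.
    print(frame, ranked(candidates[frame]))
```

The output is a list of suggestions per frame, not a finished lexicon; the manual annotation mentioned on the slide would still decide which candidates are kept.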

Manual Filtering • Is frame information currently used for disambiguation? • How is the manual annotation done? Sounds like rules of thumb. Guidelines? • How is it evaluated?

Resources Needed • Lexical unit transfer • English FrameNet • Large-coverage bilingual dictionary (source►target language, optimally sense-disambiguated) • Corpus in target language • (Some) manual annotation (Read: OK, may be a problem for ‘small’ languages, may be a problem for small projects)

Lexical Unit Transfer: Other Possibilities • Using ‘human readable’ resources • Use existing dictionaries • Problem: Disambiguation • Using machine readable resources • Use EuroWordNet or similar • Problem again: Disambiguation • Use parallel corpora • Padó & Lapata, AAAI-05

Identify Frame Elements • Core idea: The same semantic restrictions/preferences should apply to Frame Elements in source and target language. • How can these semantic preferences be learned? • First step: Learn cross-lingual semantic similarity. • Second step: Identify Frame Elements in one language and transfer them to the other.

Bilingual Infomap/Latent Semantic Analysis (LSA) • Originally used for cross-lingual information retrieval. • Use a bilingual, parallel ‘core’ corpus. • Parallel documents/paragraphs/… are put together and count as one text. • Build a vector space. • Monolingual and cross-lingual similarities will ‘fall out’.
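A minimal sketch of the merged-document idea, assuming NumPy is available: each aligned pair is concatenated into one pseudo-document, a term-by-document count matrix is built, and a truncated SVD gives the reduced space. The four-sentence corpus, the choice of k=2 and the function names are hypothetical; this is not the Infomap implementation, and real systems would add weighting and far more data.

```python
import numpy as np

# Hypothetical toy parallel corpus: (English text, French translation).
PARALLEL = [
    ("dog barks", "chien aboie"),
    ("dog eats", "chien mange"),
    ("cat eats", "chat mange"),
    ("cat sleeps", "chat dort"),
]

def build_space(pairs, k=2):
    """Return {word: k-dimensional vector} built from merged bilingual documents."""
    # Each aligned pair is concatenated and counted as ONE document.
    docs = [(en + " " + fr).split() for en, fr in pairs]
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            counts[index[w], j] += 1.0
    # Truncated SVD: keep only the k strongest latent dimensions.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    reduced = u[:, :k] * s[:k]
    return {w: reduced[index[w]] for w in vocab}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

space = build_space(PARALLEL)
print("dog ~ chien:", cosine(space["dog"], space["chien"]))
print("dog ~ chat: ", cosine(space["dog"], space["chat"]))
```

Because English and French words share one matrix, words that occur in the same merged documents end up close together regardless of language, which is the sense in which cross-lingual similarity ‘falls out’. Note that this toy version would conflate words spelled identically in both languages.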

Identify and transfer Frame Elements • Use the Berkeley FrameNet corpus as training corpus (English): Frame Elements (content words + POS) from annotated examples are used as the starting point. • Use the semantic space (generated by LSA) to find good (hopefully semantically related) translation candidates for the words making up a Frame Element. • To identify a French Frame Element: Find the ‘closest’ vector. • Several good examples, some less good ones.
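One way to read the ‘closest vector’ step is as a nearest-neighbour lookup: a French word receives the Frame Element label of the English filler whose vector is most similar. The sketch below assumes cosine similarity and uses invented three-dimensional vectors and filler words; the actual vectors would come from the bilingual LSA space, and the labels and words here are purely illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors for English Frame Element fillers (toy values).
FE_FILLERS = {
    "Speaker": {"minister": (0.9, 0.1, 0.0), "spokesman": (0.8, 0.2, 0.1)},
    "Medium": {"newspaper": (0.1, 0.9, 0.1), "radio": (0.0, 0.8, 0.3)},
}

# Hypothetical vectors for French words in the same space (toy values).
FRENCH = {
    "ministre": (0.85, 0.15, 0.05),
    "journal": (0.1, 0.85, 0.15),
}

def closest_frame_element(vector, fe_fillers):
    """Return (label, filler, similarity) for the most similar English filler."""
    best = None
    for label, fillers in fe_fillers.items():
        for filler, filler_vec in fillers.items():
            sim = cosine(vector, filler_vec)
            if best is None or sim > best[2]:
                best = (label, filler, sim)
    return best

for word, vec in FRENCH.items():
    label, filler, sim = closest_frame_element(vec, FE_FILLERS)
    print(f"{word}: {label} (nearest filler: {filler}, cosine {sim:.3f})")
```

A single nearest neighbour is sensitive to noise in the space, which may be one reason for the ‘less good’ examples mentioned on the slide.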

Add Clustering • Inspection of data shows: Frame Elements may have semantically different fillers. • Thus, clustering of LSA vectors seems promising. • Identifying French Frame Elements: Instead of finding the closest vector, check whether the word vector belongs to one of the clusters. • Problems: Identifying the optimal number of clusters, sparse data, …
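The clustering variant could look roughly like the following. The slide does not say which clustering algorithm or membership criterion was used, so plain k-means and a fixed distance radius around each centroid are assumptions made here for illustration; the two-dimensional points stand in for LSA vectors of the fillers of one Frame Element.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic, evenly spaced initialisation."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids

def belongs_to_cluster(vector, centroids, radius):
    """Assumed membership test: within `radius` of at least one centroid."""
    return any(math.dist(vector, c) <= radius for c in centroids)

# Hypothetical filler vectors forming two semantically different groups.
FILLERS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
           (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]

centroids = kmeans(FILLERS, k=2)
print(centroids)
print(belongs_to_cluster((0.5, 0.5), centroids, radius=2.0))  # near first group
print(belongs_to_cluster((5.0, 5.0), centroids, radius=2.0))  # between groups
```

Both the number of clusters `k` and the `radius` have to be chosen somehow, which mirrors the open problems listed on the slide.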

Resources Needed • Frame Element identification/transfer • English FrameNet • Parallel corpus source/target language • Additional corpora in both languages • Corpus in target language • (Tagger in source/target language) • (Not so little) manual annotation (Read: OK, may be a problem for ‘small’ languages, may be a problem for small projects)

Use information from WordNet? • For French: • Use (Euro)WordNet alternatively/in addition: • Use EuroWordNet links (translations) • Use WordNet to expand ‘queries’ • Use similarity measures such as Jiang & Conrath (1997). • For other languages that do not have a WordNet: ???
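For reference, the Jiang & Conrath measure combines the information content (IC) of two concepts with that of their lowest common subsumer: distance = IC(c1) + IC(c2) - 2 * IC(lcs), with IC(c) = -log p(c). The sketch below applies that formula to a tiny invented taxonomy; the concepts, the parent links and the probabilities are hypothetical and are not taken from WordNet.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and concept probabilities.
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity",
          "car": "artifact", "artifact": "entity"}
PROB = {"entity": 1.0, "animal": 0.1, "artifact": 0.2,
        "dog": 0.01, "cat": 0.01, "car": 0.05}

def information_content(concept):
    """IC(c) = -log p(c); rarer concepts carry more information."""
    return -math.log(PROB[concept])

def ancestors(concept):
    """Return the concept followed by all its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_subsumer(a, b):
    """First ancestor of `a` (most specific first) that also subsumes `b`."""
    b_ancestors = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_ancestors)

def jiang_conrath_distance(a, b):
    lcs = lowest_common_subsumer(a, b)
    return (information_content(a) + information_content(b)
            - 2.0 * information_content(lcs))

print(jiang_conrath_distance("dog", "cat"))  # related via 'animal'
print(jiang_conrath_distance("dog", "car"))  # related only via the root
```

The distance is commonly turned into a similarity score by taking its reciprocal; a smaller distance means the two concepts are more closely related.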

Syntax • Certain Frame Elements are semantically totally heterogeneous, but syntactically (relatively) easy to identify. • For example: Statement.Message (Engl.: say that X, Fr.: dire que X) • Problem: Semantic transfer can be learned using LSA; syntactic transfer (that≈que) cannot. • Could (partially) parsed parallel corpora be used to learn syntactic transfer? Can ‘syntactic’ and ‘semantic’ Frame Element identification be combined? Alternatively: Can ‘syntactic’ Frame Elements be recognised and left to annotators altogether?

Frame Element Preferences • Knowing more about Frame Elements (explicitly) would be very helpful. • Automatic Frame/Frame Element assignment. • Manual annotation/guidelines. • Transfer to other languages. • Encoding preferences as links within FrameNet • Encoding preferences as links with external resources (WordNet? SUMO/MILO?), cf. work by Aljoscha Burchardt • Cf. yesterday’s talk by Michael Ellsworth

Conclusions • (Some) more research required. • Optimising the annotation process is probably very important, e.g.: • Use several cycles (start with ‘more certain’ cases, re-train with the additional data, …) • Integrate different strategies, e.g. ‘syntax’ and ‘semantics’. • Which decisions can be made automatically? Can suggestions be made? How good are they? Recall vs. precision optimisation.
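The recall vs. precision question can be made concrete with a confidence threshold on the suggestions shown to annotators. The sketch below is an illustration only: the scores and correctness flags are invented, and it assumes each suggestion carries a confidence score and has been judged correct or incorrect by a human.

```python
# Hypothetical suggestions: (system confidence, judged correct by annotator).
SUGGESTIONS = [
    (0.9, True), (0.8, True), (0.7, False),
    (0.6, True), (0.4, False), (0.3, True),
]

def precision_recall(suggestions, threshold):
    """Precision and recall when only suggestions scoring >= threshold are shown."""
    shown = [ok for score, ok in suggestions if score >= threshold]
    total_correct = sum(1 for _, ok in suggestions if ok)
    correct_shown = sum(shown)
    precision = correct_shown / len(shown) if shown else 0.0
    recall = correct_shown / total_correct if total_correct else 0.0
    return precision, recall

for threshold in (0.75, 0.35):
    p, r = precision_recall(SUGGESTIONS, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

A high threshold fits the ‘more certain cases first’ cycle: few but reliable suggestions early on, with the threshold lowered once the system has been re-trained on the additional data.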
