Hypermedia Lexica and Lexicon Metadata

The MetaLex model in the ModeLex project

Dafydd Gibbon, U Bielefeld, Europe

E-MELD Workshop, Detroit, August 2002

Overview

  • Metalex goals
      • Background: DATR, HyprLex, Speech, Language Documentation
  • Metalex design: theory and practice
      • Lexical documents & metadocuments
      • Lexical objects, properties, structures
  • Metalex implementation
      • Ivory Coast encyclopaedia project
      • Ega documentation model project
      • The Modelex (multimodal lexicon) project
      • Ivory Coast + Nigeria documentation curriculum project
  • Extending metalex
      • Modalities & submodalities
      • Data-driven lexicography
      • Data structures & algorithms: trees, lattices; induction, inference

Metalex goals: background
  • General objectives:
      • Versatile high quality spoken language lexicography
      • Motivated balance of high-tech + low tech
      • Good resources are data-driven and theory-informed
  • Specific project objectives:
      • DATR/ILEX: formal lexicon theory and implementation
      • VerbMobil: integrated HyprLex dissemination model
      • HyprLex encyclopaedia model for Ivory Coast Languages
      • Ega endangered language documentation model
      • Modelex - theory and design of multimodal lexica
      • Ivory Coast and Nigeria curricula for language documentation
Metalex design: data and theory
  • Data-driven data + metadata acquisition:
    • Systematic metatext derived from and supporting ...
      • Computational fieldwork
      • Induction of lexica
  • Theory-informed data + metadata acquisition:
    • Integrated Lexicon (ILEX) consisting of ...
      • Abstract Lexicon (ALEX) - "theory" in the mathematical sense
      • Object Lexicon (OLEX) - "model" in the mathematical sense
Metalex design: data
  • Data-driven acquisition:
    • Computational fieldwork
      • Portable metadatabase with restricted vocabulary and general metatext, and
        • Definition of and support for transcription + annotation
        • Portable support for scenarios, scripts
        • Portable support for lexicon processing
    • Induction of lexica
      • Lexicon tools for
        • Extraction of macrostructural elements (lexeme elements)
        • Induction of microstructural information (media concordance, POS, ...)
        • Induction of mesostructural regularities and subregularities (grammar, ...)
Metalex design: theory
  • Theory-informed formalisation:
      • Abstract Lexicon (ALEX) - "theory" in the mathematical sense
          • Decomposition (componential A-V description)
          • Generalisation (inheritance)
          • Composition (multilinear operations)
      • Object Lexicon (OLEX) - "model" in the mathematical sense
          • XML archiving and dissemination formats
          • object-relational database acquisition and processing formats
    • = Integrated Lexicon (ILEX)
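
The three ALEX operations named above (decomposition, generalisation, composition) can be illustrated with a minimal Python sketch. It is not the DATR/ILEX implementation: the `Node` class, the `plural` helper and the English example words are hypothetical and chosen only to show attribute-value description with default inheritance.

```python
from typing import Optional


class Node:
    """A lexicon node: attribute-value pairs plus an optional parent node."""

    def __init__(self, name: str, parent: Optional["Node"] = None, **attrs: str):
        self.name = name
        self.parent = parent
        self.attrs = dict(attrs)

    def get(self, attr: str) -> str:
        """Look up attr locally, then fall back to the parent (default inheritance)."""
        node: Optional[Node] = self
        while node is not None:
            if attr in node.attrs:
                return node.attrs[attr]
            node = node.parent
        raise KeyError(f"{self.name}: no value for {attr!r}")


# Generalisation: properties shared by all nouns are stated once (hypothetical data).
noun = Node("Noun", pos="noun", plural_suffix="s")
# Decomposition: an entry is described by attribute-value pairs.
bird = Node("Bird", parent=noun, stem="bird")
# Subregularity: a local value overrides the inherited default.
ox = Node("Ox", parent=noun, stem="ox", plural_suffix="en")


def plural(node: Node) -> str:
    """Composition: build a surface form from local and inherited parts."""
    return node.get("stem") + node.get("plural_suffix")
```

Here `plural(bird)` uses the inherited suffix, while `plural(ox)` uses the locally overridden one; both nodes inherit `pos` from `Noun`.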
Metalex implementation: architecture
  • Data model ∩ Theory = shared lexicon architecture:
      • Macrostructure: declarative and procedural components
          • Lexicon architecture: relational, inheritance, text, ...
          • Lexical objects: entry types
          • Lexical access: fact query, semasiological / onomasiological indexing
      • Mesostructure:
          • Generalisations: grammar, phonetics, cultural background, ...
          • Composition of lexicon object types: idioms, words, morphemes, ...
          • Lexical access: inferential query
      • Microstructure:
          • Lexical entry (article, lemma structure - atom, string, tree, ...)
          • Types of lexical information - standardly: "lexicon model"
Metalex implementation: microstructure
  • Microstructure specification philosophy:
      • Anybody can specify any kind of unpredictable detail
          • Questionnaire / Experiment / Corpus / Archive dependence
          • Lexicon architecture: relational, inheritance, text, ...
          • Intelligent (semi-)automatic classification, not fixed attributes
      • Theory-informed coarse grouping is possible
          • Media attributes: visual, auditory, tactile, ...
          • Meaning attributes: definition, gloss, lexical relations, ...
          • Composition attributes: context/category, parts, operations
          • Use attributes: style, register, concordance, media illustrations, ...
          • Micrometadata attributes: lexicographer DB indices, source (e.g. fieldwork metadata) DB indices, modification, ...
Metalex implementation: fieldwork metadata source (1)

Situation dimensions

  • participant: fieldworker, partners, contacts
  • channel: modalities, media
  • locale: indoor/outdoor, spatial configuration
  • temporal: date, time, calendar event
  • functional: affiliation, role, occasion; observation (prompt, metadata management)

Language dimension

  • affiliation
  • discourse level: discourse type, genre + prosody
  • phrase level: recursive phrasal categories/relations + prosody
  • word level: clitics, inflexion, word formation + prosody
Metalex implementation: fieldwork metadata source (2)

Technical dimension

  • physical characteristics of participants: age, sex, health
  • physical characteristics of locale: indoor/outdoor, spatial configuration, temporal sequence, date (season), time (of day)
  • audio: mike type, position, room; A/D; channels, fsample, resolution; formats
  • video: camera & microphone type, analogue/digital; filters, lenses; audio; formats
  • other sensors: laryngograph, airflow, data glove, ...

Metalinguistic dimension

  • empirical method: introspection, experiment, corpus elicitation
  • materials: questionnaire, experiment layout, corpus scenario
  • metadata specification: index, metatext type, metacatalogue type
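
As a rough illustration of a metadatabase that combines a restricted vocabulary with general metatext, the following Python sketch checks a fieldwork metadata record. The field names, the `RESTRICTED` table and the `validate` function are assumptions made for this example; only the permitted values are taken from the dimensions listed above.

```python
from typing import Dict, List

# Restricted-vocabulary fields (values taken from the dimensions above;
# the field names themselves are hypothetical).
RESTRICTED = {
    "empirical_method": {"introspection", "experiment", "corpus elicitation"},
    "locale": {"indoor", "outdoor"},
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return error messages for restricted fields; other fields are free metatext."""
    errors = []
    for field_name, allowed in RESTRICTED.items():
        value = record.get(field_name)
        if value not in allowed:
            errors.append(f"{field_name}: {value!r} not in restricted vocabulary")
    return errors


good = {"empirical_method": "experiment", "locale": "outdoor", "notes": "free text"}
bad = {"empirical_method": "experiment", "locale": "market"}
```

Fields outside `RESTRICTED`, such as `notes`, pass through unchecked, which keeps the record open to unpredictable detail.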
Metalex implementation: fieldwork metadata entry tool

LREC 2002, Workshop on Portability Issues

Metalex objects in conjunction with work in ISLE CLWG (Computational Lexicon Working Group)

(see Gibbon in reading list)

LEXICON:

  • { < Macrostructure > , < Mesostructure > }
    • Macrostructure: Ordering( {ENTRY, ...} )
    • Mesostructure: < FrontmatterMetadata, Descriptions >

ENTRY:

  • < Microstructure, HousekeepingMetadata >
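
The LEXICON and ENTRY definitions above can be read as a pair of record types. The sketch below is one possible Python rendering, not the ISLE CLWG specification: the class and field names are assumptions, and the three entries are invented placeholders that belong to no real language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entry:
    """ENTRY: < Microstructure, HousekeepingMetadata >"""
    microstructure: Dict[str, str]
    housekeeping: Dict[str, str] = field(default_factory=dict)


@dataclass
class Lexicon:
    """LEXICON: macrostructure (ordered entries) plus mesostructure."""
    entries: List[Entry]
    frontmatter: Dict[str, str]                                   # mesostructure part 1
    descriptions: Dict[str, str] = field(default_factory=dict)    # mesostructure part 2

    def macrostructure(self, key: Callable[[Entry], str]) -> List[Entry]:
        """Macrostructure: Ordering( {ENTRY, ...} ) under a chosen access key."""
        return sorted(self.entries, key=key)


# Invented placeholder entries.
lexicon = Lexicon(
    entries=[
        Entry({"form": "bede", "gloss": "fire"}),
        Entry({"form": "aba", "gloss": "water"}),
        Entry({"form": "cici", "gloss": "earth"}),
    ],
    frontmatter={"title": "Demo lexicon"},
)

# Semasiological access: ordered by form. Onomasiological access: ordered by meaning.
by_form = lexicon.macrostructure(lambda e: e.microstructure["form"])
by_gloss = lexicon.macrostructure(lambda e: e.microstructure["gloss"])
```

The same entry set yields different macrostructures depending on the ordering key, which is the point of treating the macrostructure as an ordering over entries rather than as a fixed list.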
The LEXICON object

Front Matter Metadata:

  • Bibliographical: creator, publisher, title, date, ...
  • Medium / format: paper, CD-ROM/DVD, web, ...

Macrostructure type:

  • access: semasiological/onomasiological,
  • n-lingual/langue(s),
  • special: taxonomy (thesaurus), concordance
  • structure, e.g. tabular: f(type,attrib)=value
The ENTRY object: metadata

Entry Metadata: (see Gibbon & al. in reading list)

  • Entry type (wrt macrostructure specification):
      • encyclopaedic
      • multiword unit, word, ...
  • Microstructure data model specification:
      • entry structure: flat, tree, graph (net), ...
      • data category (DC) specification (attribute, field, information type)
        • DC groups - structural skeleton
        • DCs
        • DC substructure - homography, homophony, polysemy ...
The ENTRY object: DC groups

Media ("surface"):

  • acoustic (phonetic, earcon, sonification, ...), visual (orthography, icon, gesture, ...)

Composition (structure):

  • part (e.g. morphology for words), context (e.g. POS, subcat for words)

Meaning (definition, illustration):

  • semantic (components, relations, senses, ontology)
  • pragmatic (speech act, dialogue, disfluency, ...)

Use: typically: media (e.g. audio) concordance, ...

Metadata: lexicographer, ...

The ENTRY object: DCs

Countless Data Category models: (see reading list)

  • every existing dictionary
  • linguistic "types of lexical information"
  • several European projects (GENELEX, MULTILEX, ACQUILEX, ...)

  • ISO terminology norms (cf. MARTIF etc. ...)
The ENTRY object: DC structures

Computationally relevant properties of fields:

  • type (atomic, complex: tree, string, xyz-formatted text)
  • character encoding spec.: ASCII, Unicode, xyz
  • tree (or other graph/net):
      • finite depth
      • flat, disjunctive tree
      • recursive graph (net)
  • table, non-tree graph, anchor/link/index structure
  • generated text:
      • print, hypertext (compiled vs. dynamic, i.e. generated on the fly)
Metalex microstructure application

Media ("surface"):

  • phonemic & tonemic transcription (SAMPA ASCII - still waiting for Unicode...)

Composition (structure):

  • morphemic substructure, category & subcategory

Meaning (definition, illustration):

  • glosses (English, French, German)
  • definitions, senses, relations, components; audio-visual illustration

Use: genres; examples (e.g. concordance link); free text notes

Metadata: first record; last field

Metalex field lexicon microstructure

Anouman_1:

  • Media attributes:
      • Phonemic tier: `an'U~m`'a~
      • Skeletal tier: VNVNV
      • Tonal tier: L H LH
      • Signal tier: Audio
  • Meaning attributes:
      • F-gloss: Oiseau
      • E-gloss: Bird
      • G-gloss: Vogel
      • Definition: avis
      • Homophone full: Anouman_2: grandchild
      • Homophone phonemic: Anouman_3: yesterday
  • Use:
      • < Concordance pointer >
      • Genre: narrative
  • Metadata:
      • Lexicographer: S. Adouakou
      • Source: Bielefeld-Anyi-Corpus, Adaou village, CI
      • Date: March 2002
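
To show how such an entry might be carried into an XML archiving format, here is a minimal Python sketch using the standard library. The element names `entry`, `dcgroup` and `dc` are assumptions for illustration, not the Metalex schema, and only a subset of the attributes of Anouman_1 is included.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# A subset of the Anouman_1 microstructure, grouped by DC group.
anouman_1 = {
    "media": {"skeletal_tier": "VNVNV", "tonal_tier": "L H LH", "signal_tier": "Audio"},
    "meaning": {"f_gloss": "Oiseau", "e_gloss": "Bird", "g_gloss": "Vogel"},
    "use": {"genre": "narrative"},
    "metadata": {"lexicographer": "S. Adouakou", "date": "March 2002"},
}


def entry_to_xml(entry_id: str, groups: Dict[str, Dict[str, str]]) -> ET.Element:
    """Build <entry><dcgroup><dc>value</dc></dcgroup></entry> (hypothetical element names)."""
    root = ET.Element("entry", id=entry_id)
    for group_name, attributes in groups.items():
        group = ET.SubElement(root, "dcgroup", name=group_name)
        for attr_name, value in attributes.items():
            dc = ET.SubElement(group, "dc", name=attr_name)
            dc.text = value
    return root


xml_text = ET.tostring(entry_to_xml("Anouman_1", anouman_1), encoding="unicode")
```

Because DC groups and DCs are carried as `name` attributes rather than as element names, a lexicographer can add unforeseen categories without changing the document structure.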
Metalex portable lexical database

Relational database:

  • Metalex specs flattened
  • structure re-constitution via metalex specs
  • HanDBase for PalmOS
  • Features:
      • standard full RelDBMS
      • XML, CSV, text export
      • export/import via GSM
      • inexpensive (wrt laptop)
      • stylus, keyboard, sync input
      • light weight
      • low power consumption
      • inconspicuous in use
      • interfaces to Scheme, C
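
Flattening a structured entry into relational rows and reconstituting it afterwards could look roughly like the Python sketch below. It uses the standard `sqlite3` module as a stand-in for the handheld database; the table layout and the `flatten` / `reconstitute` helpers are assumptions for illustration, and HanDBase itself is not involved.

```python
import sqlite3
from typing import Dict, List, Tuple

Groups = Dict[str, Dict[str, str]]
Row = Tuple[str, str, str, str]


def flatten(entry_id: str, groups: Groups) -> List[Row]:
    """Flatten a nested entry into (entry, dc_group, attribute, value) rows."""
    return [(entry_id, g, a, v) for g, attrs in groups.items() for a, v in attrs.items()]


def reconstitute(rows: List[Row]) -> Dict[str, Groups]:
    """Rebuild the nested structure from flat rows."""
    entries: Dict[str, Groups] = {}
    for entry_id, group, attr, value in rows:
        entries.setdefault(entry_id, {}).setdefault(group, {})[attr] = value
    return entries


groups = {
    "meaning": {"e_gloss": "Bird", "f_gloss": "Oiseau"},
    "use": {"genre": "narrative"},
    "metadata": {"lexicographer": "S. Adouakou"},
}

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
conn.execute(
    "CREATE TABLE lexicon (entry TEXT, dc_group TEXT, attribute TEXT, value TEXT)"
)
conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?, ?)", flatten("Anouman_1", groups))
rows = conn.execute(
    "SELECT entry, dc_group, attribute, value FROM lexicon ORDER BY rowid"
).fetchall()
restored = reconstitute(rows)
```

In this sketch the grouping information travels in the `dc_group` column; in a setup where the flat table drops it, the grouping would have to be recovered from the Metalex specification instead.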
Metalex extension: The Modelex project: "Theory and Design of Multimodal Lexica"

Goals:

  • Data-driven, theory-informed lexicon models
  • Formal properties of abstract data models for multimodal lexica
  • Interpretation of abstract data models in XML
  • Integration of parallel annotation lattices for modalities and submodalities
  • Development of a prototype multimodal lexicon
Modelex: gesture annotation

Time Aligned Signal Corpus System (Java, GPL)

Jan-Torsten Milde, U Bielefeld

TASX annotator:

  • Phonological tier
  • ToBI tiers
  • Gesture tier
  • Speech Act tier

Anyi, Ega, German

Metalex in the Modelex project: Multimodal concordance as microstructure DC

Prototype: http://www.spectrum.uni-bielefeld.de/langdoc/PAX/
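
The idea of a multimodal concordance over time-aligned annotation tiers can be sketched in Python as follows. This is not the PAX prototype or the TASX format: the `Interval` type, the `concordance` function, the tier names and the annotation data are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """One annotation on a tier: a labelled stretch of the signal (seconds)."""
    start: float
    end: float
    label: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share some stretch of time."""
    return a.start < b.end and b.start < a.end


def concordance(tiers: Dict[str, List[Interval]], key_tier: str, keyword: str) -> List[dict]:
    """For each occurrence of keyword on key_tier, collect overlapping labels on all tiers."""
    hits = []
    for anchor in tiers[key_tier]:
        if anchor.label != keyword:
            continue
        row: dict = {"start": anchor.start, "end": anchor.end}
        for name, intervals in tiers.items():
            row[name] = [iv.label for iv in intervals if overlaps(anchor, iv)]
        hits.append(row)
    return hits


# Hypothetical annotation data.
tiers = {
    "word": [Interval(0.0, 0.4, "look"), Interval(0.4, 0.9, "there"), Interval(1.2, 1.6, "there")],
    "gesture": [Interval(0.3, 1.0, "point"), Interval(1.5, 2.0, "rest")],
    "speech_act": [Interval(0.0, 0.9, "directive"), Interval(1.2, 1.6, "statement")],
}
hits = concordance(tiers, "word", "there")
```

Each hit pairs an occurrence of the word with whatever was annotated at the same time on the other tiers, so the concordance line itself becomes a piece of microstructure information.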

Metalex in the Modelex project: underspecified ALEX microstructure for gesture coordinates

Hand:
    <parts> == "Palm" "Digit"
    <vector> == "<name>" <coord "<name>">
    <coord> == "<x1>" "<y1>" "<x2>" "<y2>"
    <> ==
.

Palm:
    <parts> == <vector>
    <name> == palm
    <width> == pw
    <height> == ph
    <x1 fore> == <x1>
    <x1 middle> == ( <x1> + ( <x2> - <x1> ) / 3 )
    <x1 ring> == ( <x1> + ( <x2> - <x1> ) * 2 / 3 )
    <x1 pinky> == <x2>
    <x1> == px1
    <y1> == py1
    <x2> == ( <x1> + <width> )
    <y2> == ( <y1> + <height> )
    <> == Hand
.

Metalex in the Modelex project: fully specified ALEX microstructure for gesture coordinates

Hand:<parts> =
    palm   px1 py1 ( px1 + pw ) ( py1 + ph )
    thumb  px1 py1 ( px1 - lt ) py1
    fore   px1 py1 px1 ( py1 - lf )
    middle ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) ( py1 - lm )
    ring   ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) ( py1 - lr )
    pinky  ( px1 + pw ) py1 ( px1 + pw ) ( py1 - lp )
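
For readers who want to check the arithmetic, the fully specified description can be evaluated numerically. The Python sketch below mirrors the formulas term by term; the function name `hand_parts`, the `lengths` dictionary (standing in for lt, lf, lm, lr, lp) and the sample numbers are assumptions, and no DATR inference is performed.

```python
from typing import Dict, Tuple

Vector = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def hand_parts(px1: float, py1: float, pw: float, ph: float,
               lengths: Dict[str, float]) -> Dict[str, Vector]:
    """Evaluate the fully specified Hand:<parts> formulas for concrete numbers."""
    px2 = px1 + pw          # <x2> == ( <x1> + <width> )
    py2 = py1 + ph          # <y2> == ( <y1> + <height> )
    # Finger base points are spread along the top edge of the palm.
    base_x = {
        "fore": px1,
        "middle": px1 + (px2 - px1) / 3,
        "ring": px1 + (px2 - px1) * 2 / 3,
        "pinky": px2,
    }
    parts: Dict[str, Vector] = {
        "palm": (px1, py1, px2, py2),
        # The thumb extends horizontally from the palm origin.
        "thumb": (px1, py1, px1 - lengths["thumb"], py1),
    }
    for finger, x in base_x.items():
        # Each finger is a vertical vector of its own length.
        parts[finger] = (x, py1, x, py1 - lengths[finger])
    return parts


# Hypothetical measurements.
parts = hand_parts(px1=0, py1=0, pw=9, ph=10,
                   lengths={"thumb": 4, "fore": 7, "middle": 8, "ring": 7, "pinky": 5})
```

With a palm width of 9, the middle and ring fingers start at x = 3 and x = 6, one third and two thirds of the way across the palm.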

Metalex: conclusion & prospects

User complexity:

  • demands an open, data-driven approach

Domain:

  • demands a theory-informed approach
  • with computational acquisition & inference

Data-driven and theory-informed lexica

  • are possible (METALEX)
  • need integrated model-theoretic approach (ILEX):
      • INTERPRETATION (ALEX) = OLEX
  • a formal problem remains: differing complexity of
      • trees (archive): simulation of other graphs via semantics only
      • annotation lattices (data), tables (lexica): regular relations if non-recursive, indexed grammars if recursive?