Lexicography and computer science: a harmless drudgery?

Judith Knapp

Andrea Abel

European Academy Bozen - Bolzano



Content

  • Learners' Difficulties and Needs

  • Pedagogical Lexicography Today – A Short Overview

  • ELDIT – Linguistic-lexicographic Background & Live Demo

  • Datamodel

  • Implementation

  • Content Authoring

  • ELDIT and Word Manager

  • ELDIT and the TreeTagger

  • Literature

  • Conclusion



[Diagram: Learners' difficulties and needs. Problems with foreign language use: decoding and encoding at the paradigmatic, semantic and syntagmatic levels]



PROBLEMS WITH SYNONYMS AND SIMILAR WORDS

convegno, riunione, incontro, assemblea (meeting)

assemblea condominiale (condominium meeting)

assemblea d'affari (business meeting)



DIFFICULTIES WITH WORD COMBINATIONS

Collocations: fixed combinations of words (arbitrary, unpredictable):

  • Ex.: to brush one's teeth / lavarsi i denti / sich die Zähne putzen

Grammatical constructions: formed according to the rules of grammar, partly arbitrary:

  • Ex.: to ask sb sth / chiedere qlco a qlcu / jemanden etwas fragen



[Diagram: Learners' difficulties and needs, extended. Problems with dictionary use: metalanguage (abbreviations, technical terms, other "codes", descriptive language). Problems with foreign language use: decoding and encoding at the syntagmatic, semantic and paradigmatic levels]



ABBREVIATIONS

Italian    German     Meaning
agg.       Adj.       adjective
art.       Art.       article
tr.        tr.        transitive verb
determ.    best.      definite article
pron.      Pron.      pronoun
femm.      w./Fem.    feminine
ant.       veralt.    archaic
volg.      vulg.      vulgar
region.    landsch.   regional
mus.       Mus.       music
sociol.    Soziol.    sociology
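An electronic dictionary can resolve such labels for the learner on the spot instead of sending them to a list of abbreviations. A minimal Python sketch, assuming lookup tables typed in by hand from the table above; the names `ABBREVIATIONS` and `expand_labels` are hypothetical and not taken from ELDIT. Splitting the German label "w./Fem." into two keys is also an assumption.

```python
# Hypothetical lookup tables built from the ABBREVIATIONS slide:
# grammatical and usage labels -> English explanation.
ABBREVIATIONS = {
    "it": {
        "agg.": "adjective", "art.": "article", "tr.": "transitive verb",
        "determ.": "definite article", "pron.": "pronoun", "femm.": "feminine",
        "ant.": "archaic", "volg.": "vulgar", "region.": "regional",
        "mus.": "music", "sociol.": "sociology",
    },
    "de": {
        "Adj.": "adjective", "Art.": "article", "tr.": "transitive verb",
        "best.": "definite article", "Pron.": "pronoun",
        "w.": "feminine", "Fem.": "feminine",  # "w./Fem." split into two keys
        "veralt.": "archaic", "vulg.": "vulgar", "landsch.": "regional",
        "Mus.": "music", "Soziol.": "sociology",
    },
}


def expand_labels(labels: str, lang: str) -> str:
    """Replace each known abbreviation in a whitespace-separated label string.

    Tokens that are not in the table are passed through unchanged.
    """
    table = ABBREVIATIONS[lang]
    return " ".join(table.get(token, token) for token in labels.split())


print(expand_labels("agg. volg.", "it"))     # adjective vulgar
print(expand_labels("Adj. landsch.", "de"))  # adjective regional
```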



TECHNICAL TERMS

Italian         German
aggettivo       Adjektiv
articolo        Artikel
ausiliare       Hilfsverb
transitivo      transitiv
determinativo   bestimmt
pronome         Pronomen
femminile       weiblich
antico          veraltet
volgare         vulgär
dialetto        landschaftlich
musica          Musik
sociologia      Soziologie

Categories: grammar; language variation



OTHER "CODES"

International Phonetic Alphabet (IPA) or other transcription systems

Ex.: focus, shake, chiesa [chiè-sa]

Syntactic information (valency) provided in coded or abbreviated form

Ex.: (a) geben; [...] Vt j-m etw. g (Langenscheidt)

(b) give 2 Vnn, Vn (Cobuild)

(c) dare 17. N-V-N1 (N2/a N3) (Blumenthal/Rovere)



UNDERSTANDING THE DEFINITION...

"I have to look in the dictionary to find out what a virgin is. [...] The dictionary says: virgin, woman (usually young) who is, and remains, in a state of inviolate chastity.

Now I have to look up inviolate and chastity, and all I find is that inviolate means the opposite of violated, and chastity means chaste, and that means free from unlawful sexual intercourse. Now I have to look up intercourse [...] and I don't know what that means, and I'm simply tired of being sent from one word to another in the heavy dictionary like a complete idiot, and all just because the people who wrote the dictionary don't want the likes of us to find out anything.

I only want to know where I came from, but when you ask someone, they tell you to ask someone else, or they send you from word to word."

(McCourt 1998: 412–413, quoted from the German translation)



[Diagram: Learners' difficulties and needs, extended with formal problems of dictionary use: search and presentation]



Problems with searching

  • Time-consuming:

    • 2000 pages

    • small characters

    • difficult metalanguage

  • Complex expressions:

    • collocations ("Zähne putzen")

    • idiomatic expressions
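Electronic search removes the problem of finding a multiword expression under the "right" headword: if every word of a collocation or idiom is indexed, the learner can find "Zähne putzen" by typing either word. A minimal Python sketch of such an inverted index; the sample expressions and the function names are illustrative assumptions, and inflected forms (e.g. "putzt") would additionally need a lemmatizer.

```python
from collections import defaultdict

# Illustrative sample of multiword entries (collocations and idioms).
EXPRESSIONS = ["sich die Zähne putzen", "ins Gras beißen", "das Haus verlassen"]


def build_index(expressions):
    """Map each lower-cased component word to the expressions that contain it."""
    index = defaultdict(set)
    for expression in expressions:
        for word in expression.lower().split():
            index[word].add(expression)
    return index


def search(index, query):
    """Return the expressions that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the candidate sets of all query words; unknown words yield set().
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(EXPRESSIONS)
print(search(index, "Zähne"))         # ['sich die Zähne putzen']
print(search(index, "putzen zähne"))  # ['sich die Zähne putzen']
```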



    Problems with presentation

    • Limited space

    • Linear presentation order

    • Organisation of the dictionary

    • Organisation of the entries



    [Diagram: overview of learners' difficulties and needs (problems with foreign language use and with dictionary use), with a new branch: Solutions]



    Pedagogical Dictionaries

    • Target Group: language learners

    • Functions: encoding & decoding

    • General characteristics:

      • (usually) monolingual

      • selective regarding macrostructure (limited number of entries)

      • exhaustive regarding microstructure (detailed information for each entry)



    Elektronisches Lern(er)wörterbuch Deutsch-Italienisch (electronic learner's dictionary German-Italian)

    ELDIT

    Dizionario elettronico per apprendenti Italiano-Tedesco (electronic learner's dictionary Italian-German)

    http://www.eurac.edu/eldit



    Three main characteristics:

    1. typologically innovative:

    • a monolingual dictionary (German or Italian): definitions, collocations, idiomatic expressions, examples … in the target language

      &

    • a bilingual dictionary (German and Italian): translation equivalents, explanations in L1

    • a "cross-lingual" German-Italian dictionary



    2. well defined target group:

    • beginners to intermediate students (Breakthrough level A1 up to Threshold level B1): basic vocabulary of about 3,000 entry words for each language

    • addressed to the linguistic layman: limited use of metalanguage, abbreviations and symbols



    3. designed solely for computer use:

    • not a transformation of a paper dictionary into an electronic dictionary

    • exploits the possibilities of the electronic medium (multimedia & hypertext)

    • modular structure: contains detailed information that is usually spread across different types of dictionaries



    [Diagram: learners' difficulties and needs with solutions: 1) avoiding, 2) explaining; electronic search possibilities; hypertext and hyperlinks; sound files and verb patterns; definitions and examples; simple language, use of L1 and multimedia]



    SOLUTIONS ...

    Descriptive language

    1. Simple

    2. Multiple descriptions

    3. Hypertext



    1. Simple=

    a) Limited defining vocabulary

    b) Easy syntax

    d) Avoid circularity



    2. Multiple descriptions=

    a) Definitions

    b) Lexicographic examples

    c) Word fields

    d) L1 (semantic equivalents)

    [e) images]



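    Semantic Level:

    Semantic information: 1. Definitions 2. Examples 3. Word fields 4. Equivalents

    Word field:
    Hypernyms: das Gebäude
    Coordinates: das Haus, die Villa, das Schloss, die Wohnung ...
    Kinds of ...: das Hochhaus, das Bauernhaus ...

    1.a) Ein Haus ist ein Gebäude, in dem Menschen wohnen. (A house is a building in which people live.) Equivalent: casa

    Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand. (She lives with her family in a two-storey house on the outskirts of town.)

    b) Ein Haus ist das Gebäude, in dem man ständig lebt und in das man regelmäßig zurückkehrt. Es ist der Ort, wo man daheim ist. (A house is the building in which one lives permanently and to which one regularly returns. It is the place where one is at home.)

    Sie verlässt das Haus jeden Morgen um sieben Uhr, um zur Arbeit zu fahren. (She leaves the house at seven o'clock every morning to go to work.)

    2. Das Haus sind die Bewohner eines Hauses (1a). (The house: the occupants of a house (1a).) Equivalent: casa

    ....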
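The sample entry bundles several kinds of semantic information per sense. A minimal sketch of how such an entry could be modelled in Python, assuming one record per sense; the class and field names are hypothetical (ELDIT itself stored its entries in XML), and sense 1b is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    number: str                    # sense number, e.g. "1a"
    definition: str                # definition in the target language
    examples: list[str] = field(default_factory=list)
    equivalents: list[str] = field(default_factory=list)  # L1 equivalents


@dataclass
class WordField:
    hypernyms: list[str] = field(default_factory=list)
    coordinates: list[str] = field(default_factory=list)
    kinds_of: list[str] = field(default_factory=list)


@dataclass
class Entry:
    lemma: str
    senses: list[Sense]
    word_field: WordField = field(default_factory=WordField)

    def all_equivalents(self) -> list[str]:
        """L1 equivalents of all senses, without duplicates, in sense order."""
        result: list[str] = []
        for sense in self.senses:
            for equivalent in sense.equivalents:
                if equivalent not in result:
                    result.append(equivalent)
        return result


haus = Entry(
    lemma="das Haus",
    senses=[
        Sense("1a", "Ein Haus ist ein Gebäude, in dem Menschen wohnen.",
              ["Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand."],
              ["casa"]),
        Sense("2", "Das Haus sind die Bewohner eines Hauses (1a).", [], ["casa"]),
    ],
    word_field=WordField(
        hypernyms=["das Gebäude"],
        coordinates=["das Haus", "die Villa", "das Schloss", "die Wohnung"],
        kinds_of=["das Hochhaus", "das Bauernhaus"],
    ),
)
print(haus.all_equivalents())  # ['casa']
```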



    3. Hypertext=

    a) Click on unknown words inside the definition

    b) Click on the semantic equivalents

    c) Click on any information you're interested in
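Clicking on a word inside a definition presupposes that inflected forms are mapped back to a headword before a link is created. A minimal Python sketch, assuming a hypothetical set of headwords, a tiny hand-written form-to-lemma table and a made-up URL scheme `entry?lemma=...`; in ELDIT this lookup was supported by a lemmatizer, which the table only stands in for.

```python
import html
import re
from urllib.parse import quote

# Hypothetical dictionary data.
HEADWORDS = {"Haus", "Gebäude", "Mensch", "wohnen"}
FORM_TO_LEMMA = {"Menschen": "Mensch"}  # stand-in for a real lemmatizer


def link_definition(definition: str) -> str:
    """Turn every word that has its own entry into an HTML link; escape the rest."""
    parts = []
    # The capturing group keeps the words in the result; \w+ also matches ä, ö, ü, ß.
    for token in re.split(r"(\w+)", definition):
        lemma = FORM_TO_LEMMA.get(token, token)
        if lemma in HEADWORDS:
            parts.append(f'<a href="entry?lemma={quote(lemma)}">{html.escape(token)}</a>')
        else:
            parts.append(html.escape(token))
    return "".join(parts)


print(link_definition("Ein Haus ist ein Gebäude, in dem Menschen wohnen."))
```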



    [Diagram: learners' difficulties and needs with solutions, extended with 1) collocations, 2) examples, 3) ...]



    Syntagmatic level:

    1. Collocations

    2. Idiomatic Expressions

    3. Verb Valency



    Verb Valency

    • Definition: “Valency refers to the capacity of a verb to take a specific number and type of arguments” (Bianco)

    • Theoretical origin: dependency grammar (Lucien Tesnière)



    Verb Valency: a problem for learners and researchers

    • verb constructions are largely arbitrary and unpredictable

    • number of obligatory and facultative elements

    • distinction between transitivity and intransitivity



    The description of verb valency in different dictionary types

    1. General monolingual dictionaries



    The description of verb valency in different dictionary types

    2. Special mono- and bilingual verb valency dictionaries



    The description of verb valency in different dictionary types

    3. (Monolingual) learners‘ dictionaries


    Description of Verb Valency in ELDIT

    I. Learner-friendly description:

    Explicit way of describing verb valency, as opposed to coded notations such as "N-V-N1-(N2)", "v.tr. (2 argom.)" or "Vt/i (etw.) (über j-n/etw.) r."



    Description of Verb Valency in ELDIT

    II. Multimedia:

    Visualization of information to support comprehension

    (colors and animations instead of meta-language)



    Description of Verb Valency in ELDIT

    III. Semiotic didactics:

    Functions of the different colors:

    • they indicate the parts of the sentence

    • they show which parts of the verb belong together

    • they show the correspondence between patterns and examples



    Description of Verb Valency in ELDIT

    IV. Additional explanations for the learner:

    • Visible notes describing semantic restrictions

    • Variants for realizing individual parts of the sentence
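The color coding can be sketched as a small data structure in which every slot of a valency frame carries its color, so that the same color is applied to the pattern and to the matching chunk of an example. A minimal Python sketch, assuming a hypothetical frame for "sprechen" and colors rendered as CSS class names; it is not ELDIT's implementation. The check on obligatory slots mirrors the distinction between obligatory and optional elements.

```python
import html
from dataclasses import dataclass


@dataclass
class Slot:
    text: str              # learner-friendly wording of the slot
    color: str             # display color, used here as a CSS class name
    optional: bool = False


# Hypothetical valency frame: jemand spricht (mit jemandem) (über etwas)
SPRECHEN = [
    Slot("jemand", "blue"),
    Slot("spricht", "red"),
    Slot("mit jemandem", "green", optional=True),
    Slot("über etwas", "orange", optional=True),
]


def span(text: str, color: str) -> str:
    return f'<span class="{color}">{html.escape(text)}</span>'


def render_pattern(slots) -> str:
    """Explicit pattern: optional slots in parentheses, each in its own color."""
    return " ".join(span(f"({s.text})" if s.optional else s.text, s.color) for s in slots)


def render_example(slots, chunks) -> str:
    """Color an example whose chunks are annotated as (text, slot index)."""
    used = {index for _, index in chunks}
    missing = [s.text for i, s in enumerate(slots) if not s.optional and i not in used]
    if missing:
        raise ValueError(f"example lacks obligatory slot(s): {missing}")
    return " ".join(span(text, slots[index].color) for text, index in chunks)


print(render_pattern(SPRECHEN))
print(render_example(SPRECHEN, [("Er", 0), ("spricht", 1), ("über das Wetter", 3)]))
```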



    Lexical fields

    [Diagram: learners' difficulties and needs with solutions, extended with three-dimensional graphics]



    PARADIGMATIC RELATIONS

    • Word field theory:

      "A word field is a group of words that are closely adjacent in content and that, by virtue of their interdependence, assign each other their functions." (Trier 1968/1973: 189, late definition)

    • Existing Projects

      - WordNet (GermaNet, Italian WordNet)

      - Alexia

      - Kirrkirr



    Paradigmatic relations in ELDIT

    • Ca. 150 words per language

    • interactive graphic representation

    • spatial arrangement and colors for the representation of paradigmatic lexical relations

    • explicit description of the semantic relations between the lexical units and the lemma (no metalanguage)

    • definitions and examples for describing similarities/differences of meaning, register, authentic context



    Lexical fields in ELDIT

    Type of meaning relations:

    • hierarchical relations (hyperonymy/hyponymy; holonymy/meronymy)

    • non-hierarchical relations (similarity: synonyms, quasi-synonyms … - contrast: gradable and nongradable antonyms; converse terms)
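Because hierarchical relations come in inverse pairs and similarity and contrast relations are symmetric, a lexical field can be stored as a typed graph in which adding one relation automatically records its inverse. A minimal Python sketch under that assumption; the relation names and sample words are illustrative, gradable and non-gradable antonyms are collapsed into one type for brevity, and nothing here reflects ELDIT's actual data structures.

```python
from collections import defaultdict

# Inverse of every relation type; symmetric relations are their own inverse.
INVERSE = {
    "hyperonym": "hyponym", "hyponym": "hyperonym",
    "holonym": "meronym", "meronym": "holonym",
    "synonym": "synonym", "antonym": "antonym", "converse": "converse",
}


class LexicalField:
    def __init__(self):
        self._relations = defaultdict(set)  # word -> {(relation, other word)}

    def add(self, word: str, relation: str, other: str) -> None:
        """Record that `other` is a <relation> of `word`, plus the inverse link."""
        self._relations[word].add((relation, other))
        self._relations[other].add((INVERSE[relation], word))

    def related(self, word: str, relation: str) -> list[str]:
        """All words standing in `relation` to `word`, sorted."""
        return sorted(o for r, o in self._relations.get(word, set()) if r == relation)


lexical_field = LexicalField()
lexical_field.add("Haus", "hyperonym", "Gebäude")   # Gebäude is a hyperonym of Haus
lexical_field.add("Villa", "hyperonym", "Gebäude")
lexical_field.add("Haus", "meronym", "Dach")        # Dach is a meronym of Haus
lexical_field.add("kaufen", "converse", "verkaufen")
print(lexical_field.related("Gebäude", "hyponym"))  # ['Haus', 'Villa']
```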



    [Diagram: complete overview of learners' difficulties and needs and the corresponding solutions, including three-dimensional graphics for lexical fields]



    Other modules

    • Flexion

    • Word family

    • N.B.


    Datamodel: needs for an innovative presentation



    A detailed data model



    Implementation

    • Hierarchically structured data

    • Many changes were expected

    • Communication with linguists



    Use of XML

    • XML and XML editor

      • Hierarchical structure

      • Communication with Linguists

    • Java-Servlet Technology

    • DXML or JDOM

    • Dynamic Generation of HTML
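ELDIT generated its HTML pages dynamically from XML with Java servlets. The same idea, sketched in Python for brevity with the standard `xml.etree.ElementTree` module: an example sentence in the `<example>`/`<w>` format used in the entries is turned into an HTML paragraph in which emphasized words are set in bold. The output markup (`<p class="example">`, `<b>`) is an assumption, not ELDIT's actual HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

EXAMPLE_XML = """<example>
<w>Meine</w> <w>Eltern</w> <w style="emphasized">haben</w>
<w style="emphasized">das</w> <w style="emphasized">Haus</w> <w>vor</w>
<w>50</w> <w>Jahren</w> <w style="emphasized">gebaut</w> <w>.</w>
</example>"""

NO_SPACE_BEFORE = {".", ",", ";", ":", "!", "?"}


def example_to_html(xml_text: str) -> str:
    """Render an <example> of <w> elements as HTML; emphasized words in bold."""
    root = ET.fromstring(xml_text)
    parts = []
    for w in root.iter("w"):
        word = w.text or ""
        if parts and word not in NO_SPACE_BEFORE:
            parts.append(" ")  # words are separated by spaces, punctuation is not
        text = escape(word)
        parts.append(f"<b>{text}</b>" if w.get("style") == "emphasized" else text)
    return '<p class="example">' + "".join(parts) + "</p>"


print(example_to_html(EXAMPLE_XML))
```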



    Content Authoring

    • Content Authoring

      • Difficult

      • Time-consuming

      • Error-prone

    • In ELDIT:

      • Innovative Presentation

      • Efficient Interface

        (Real World System)

      • Research of Linguists



    “Efficient” Authoring Interface



    Efficient Authoring Interface

    • Semi-structured Data

    • Automatic full-structuring

    • Automatic enriching



    Semi-structured Data



    Automatic full-structuring

    <example>
      <w>Meine</w>
      <w>Eltern</w>
      <w style="emphasized">haben</w>
      <w style="emphasized">das</w>
      <w style="emphasized">Haus</w>
      <w>vor</w>
      <w>50</w>
      <w>Jahren</w>
      <w style="emphasized">gebaut</w>
      <w>.</w>
    </example>

    <prebasuf>
      <article>die</article>
      <praefix>Be</praefix>
      <basis>haus</basis>
      <suffix>ung</suffix>
    </prebasuf>
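The `<prebasuf>` structure can be derived automatically from a compact notation in which the lexicographer only separates the morphemes with underscores, as in "die Be_haus_ung". A minimal Python sketch, assuming that an optional article precedes the word and that the base morpheme is known from the entry it belongs to (here "haus"); the function name `structure_prebasuf` is hypothetical.

```python
import xml.etree.ElementTree as ET


def structure_prebasuf(compact: str, basis: str) -> ET.Element:
    """Expand e.g. 'die Be_haus_ung' into <prebasuf> with its morpheme elements.

    Morphemes before the known basis become <praefix>, those after it <suffix>.
    Raises ValueError if the basis does not occur among the morphemes.
    """
    article, _, word = compact.rpartition(" ")
    morphemes = word.split("_")
    position = [m.lower() for m in morphemes].index(basis.lower())
    prebasuf = ET.Element("prebasuf")
    if article:
        ET.SubElement(prebasuf, "article").text = article
    for morpheme in morphemes[:position]:
        ET.SubElement(prebasuf, "praefix").text = morpheme
    ET.SubElement(prebasuf, "basis").text = morphemes[position]
    for morpheme in morphemes[position + 1:]:
        ET.SubElement(prebasuf, "suffix").text = morpheme
    return prebasuf


print(ET.tostring(structure_prebasuf("die Be_haus_ung", "haus"), encoding="unicode"))
```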



    Automatic Enriching

    • By using Computational Linguistics tools

    • WordManager

    • TreeTagger

    • PhraseManager, WordNet, Parser, …



    <derivation>
      <prebasuf>die Be_haus_ung</prebasuf>
      <translation>la dimora</translation>
    </derivation>



    <derivation id="de.n.haus.1.deriv2">
      <pattern id="de.n.haus.1.deriv2.patt0" base="Behausung" ctag="N" lexref="">
        <article base="der" ctag="art" lexref="de.g.articles.1.item1">die</article>
        <praefix explref="de.prae.h.be">Be</praefix>
        <basis>haus</basis>
        <suffix explref="de.suff.h.ung">ung</suffix>
      </pattern>
      <translation id="de.n.haus.1.deriv2.trans0">
        <w id="de.n.haus.1.deriv2.trans0.w0" type="content" base="il" ctag="art" lexref="it.g.articles.1.item2">la</w>
        <w id="de.n.haus.1.deriv2.trans0.w1" type="content" base="dimora" ctag="N" lexref="it.n.dimora.1">dimora</w>
      </translation>
    </derivation>
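The enriched version adds identifiers derived from the position in the entry (de.n.haus.1 → .deriv2 → .trans0 → .w0, .w1) and links each translation word to its own entry. A minimal Python sketch of that step, assuming the id scheme can be read off this single sample and using a hypothetical two-word lexicon in place of the computational linguistics tools; attributes that cannot be looked up are left empty, like `lexref=""` in the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical Italian lexicon: word form -> (base form, category tag, entry id).
IT_LEXICON = {
    "la": ("il", "art", "it.g.articles.1.item2"),
    "dimora": ("dimora", "N", "it.n.dimora.1"),
}


def enrich_translation(parent_id: str, number: int, text: str) -> ET.Element:
    """Build <translation id="<parent>.transN"> with one linked <w> per token."""
    translation_id = f"{parent_id}.trans{number}"
    translation = ET.Element("translation", id=translation_id)
    for i, token in enumerate(text.split()):
        # Unknown tokens keep their form as base and get empty ctag/lexref.
        base, ctag, lexref = IT_LEXICON.get(token, (token, "", ""))
        w = ET.SubElement(translation, "w", id=f"{translation_id}.w{i}",
                          type="content", base=base, ctag=ctag, lexref=lexref)
        w.text = token
    return translation


enriched = enrich_translation("de.n.haus.1.deriv2", 0, "la dimora")
print(ET.tostring(enriched, encoding="unicode"))
```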



    ELDIT and WordManager

    • WordManager

    • WM Transducers

    • WordManager in ELDIT



    WordManager - 1992

    • System for reusable morphological dictionaries

    • Information about a word's

      • Inflection (declension and conjugation)

      • Word formation (derivation and composition)

      • Orthography (old and new German spelling)

    • German, Italian, English



    WM Transducers - 2000



    WM in ELDIT

    Search (Lemmatizer)



    Links and Additional Examples (Lemmatizer)



    Exercises (Analyzer)



    Conjugation tables (Generator)
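The three uses of WordManager in ELDIT (lemmatizer for search and links, analyzer for exercises, generator for conjugation tables) can be illustrated with a toy full-form lexicon. This is not WordManager: it is a minimal Python sketch, assuming regular German verbs in the present tense only (stems ending in -t or -d, such as "arbeiten", would need an extra -e-), and it uses a lookup table where ELDIT relies on the WM Transducers.

```python
from collections import defaultdict

PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]
ENDINGS = ["e", "st", "t", "en", "t", "en"]  # regular present-tense endings


def generate(infinitive: str) -> dict[str, str]:
    """Generator: present-tense table of a regular verb ending in -en."""
    stem = infinitive[:-2]
    return {person: stem + ending for person, ending in zip(PERSONS, ENDINGS)}


def build_analyzer(infinitives):
    """Analyzer: invert the generated tables into form -> [(lemma, features)]."""
    analyses = defaultdict(list)
    for infinitive in infinitives:
        analyses[infinitive].append((infinitive, "inf"))
        for person, form in generate(infinitive).items():
            analyses[form].append((infinitive, "pres." + person))
    return analyses


def lemmatize(analyses, form: str) -> list[str]:
    """Lemmatizer: all lemmas a form can belong to (used e.g. for search)."""
    return sorted({lemma for lemma, _ in analyses.get(form, [])})


analyses = build_analyzer(["wohnen", "bauen"])
print(generate("bauen"))             # conjugation table
print(analyses["wohnt"])             # [('wohnen', 'pres.3sg'), ('wohnen', 'pres.2pl')]
print(lemmatize(analyses, "baust"))  # ['bauen']
```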



    ELDIT and TreeTagger

    • ELDIT Text Corpus

    • Development

    • Tagging

    • Manual Corrections



    ELDIT Texts



    Development

    • MS Word

      (Goethe-Institut Milan)

    • HTML

    • Simple XML



    Tagging

    • POS tagging (→ TreeTagger)

    • XML with links

    • Iterative Correction by frequency of unlinked words
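Prioritizing corrections by frequency can be sketched as follows: after tagging, every token whose lemma has no ELDIT entry stays unlinked, and counting these lemmas shows where one manual correction fixes the most links at once. A minimal Python sketch, assuming the tagger output has already been reduced to (word, lemma) pairs; `headwords` stands for a hypothetical set of dictionary entries and the sample tokens are illustrative.

```python
from collections import Counter


def unlinked_by_frequency(tagged_tokens, headwords):
    """Count lemmas without a dictionary entry, most frequent first.

    tagged_tokens: iterable of (word, lemma) pairs from the tagger.
    Lemmas with equal counts keep the order in which they were first seen.
    """
    counts = Counter(lemma for _, lemma in tagged_tokens if lemma not in headwords)
    return counts.most_common()


tokens = [("Sono", "sonare"), ("a", "a"), ("casa", "casa"),
          ("sono", "sonare"), ("sia", "sia")]
print(unlinked_by_frequency(tokens, headwords={"a", "casa", "essere"}))
# [('sonare', 2), ('sia', 1)]
```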



    Corrections

    • Texts following the old German spelling rules (valid until the 1998 spelling reform)

    • The Italian verb “sono” (I am / they are) was always tagged with the lemma “sonare” (a variant of “suonare”, to play music) instead of “essere” (to be).

    • The verb “sia” (he may be) was always recognized as a conjunction and tagged with “sia” (as well as) instead of with “essere” (to be).

    • Many conjugated forms of “avere” were tagged with “riavere” (to get something back) instead of with “avere” (to have).

    • Many conjugated forms of “andare” were tagged with “riandare” (to go back) instead of with “andare”.

    • Abbreviated forms of Italian words (such as “bel”, “vuol”, “pur”, “fin”) were tagged as nouns and with the original form as lemma.

    • Some Italian words which exist both as nouns and as past participles (such as the word “successo” (the success, it happened)) were tagged with the wrong word class.
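Systematic errors of this kind can be fixed with a small table of correction rules applied to the tagger output. A minimal Python sketch, assuming TreeTagger's tab-separated word/tag/lemma output; the tags in the usage line are placeholders, only the "sono" rule comes from the list above, and the "ha" rule is an illustrative instance of the riavere/avere error. A blanket rule would be wrong for genuinely ambiguous forms such as "sia", which is also a real conjunction, so such cases still need manual review.

```python
# Correction rules: (lower-cased word form, wrong lemma) -> correct lemma.
# "sono" comes from the list above; "ha" is an illustrative example.
RULES = {
    ("sono", "sonare"): "essere",
    ("ha", "riavere"): "avere",
}


def correct(lines):
    """Apply RULES to tagger lines of the form 'word<TAB>tag<TAB>lemma'."""
    corrected = []
    for line in lines:
        word, tag, lemma = line.rstrip("\n").split("\t")
        lemma = RULES.get((word.lower(), lemma), lemma)
        corrected.append("\t".join((word, tag, lemma)))
    return corrected


print(correct(["Sono\tVER\tsonare", "a\tPRE\ta", "casa\tNOM\tcasa"]))
```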



    Literature

    • http://www.eurac.edu/about/collaborators/JKnapp/index.htm

      → Publications

      (some linguistic ones, too)

      → PhD theses

      (Andrea Abel – University of Innsbruck;

      Judith Knapp – University of Hannover)


    Conclusion

    syntagmatic, paradigmatic, pragmatic, polysemy, homography, homonymy, holonymy, hyponymy, hyperonymy, semiotic, ludic, …

    file server, web server,

    data model, HTTP request,

    client, protocol, port, …

    ∫ (from −∞ to +∞) √∂u∆v

    Goal-based scenarios, blended learning …

    TEI, CES, NLP, Lemmatizing, POS-Tagging …

