

Language Resources for Maltese

Mike Rosner, Dept. Artificial Intelligence

University of Malta

Malta

Team

  • Mike Rosner, Dept AI, UoM

  • Ray Fabri, Inst. Linguistics, UoM

  • Duncan Attard, RA, Dept AI, UoM

  • Albert Gatt, Aberdeen and UoM

    … and others

Outline

  • Maltese Language

  • MLRS

    • Corpus

    • Lexicon

  • Conclusion

  • Demo

Maltese Language

  • National language of the Maltese Islands, and an official language alongside English.

    • c. 1M native speakers (Malta, Australia, Canada, UK)

  • Real language

  • Mixed Language

    • Arabic: kelb (dog)

    • Romance: karozza (car)

    • English: swiċċ (switch); ners (nurse); owkej (okay)

  • Latin script + some special characters

    • ċ, ġ, ħ, ż, għ, ie

  • Vowels are written (unlike Arabic script, which usually omits short vowels)

    • kiteb (he wrote)

Semitic Morphology

  • Root-and-template based

  • Root typically has 3 consonants, e.g. "k t b"

  • Template is a pattern of consonants and vowels, e.g. CVCVC

  • Vocalism = 2 vowels e.g. "i e"

  • Word formed by interdigitation

  • interdigitate(ktb, ie, CVCVC) → kiteb
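Interdigitation as described above amounts to filling the template's C slots with the root consonants and its V slots with the vocalism, from left to right. The following is a minimal sketch. It assumes that templates contain only C and V and that slots are filled strictly in order. Gemination and long vowels (as in ħaddiem) are not modelled. The root is taken as a sequence so that a digraph consonant such as għ can fill a single slot.

```python
from typing import Sequence


def interdigitate(root: Sequence[str], vocalism: Sequence[str], template: str) -> str:
    """Fill each C slot with the next root consonant and each V slot with the next vowel."""
    if template.count("C") != len(root) or template.count("V") != len(vocalism):
        raise ValueError("root/vocalism do not match the template's C/V slots")
    consonants, vowels = iter(root), iter(vocalism)
    out = []
    for slot in template:
        if slot == "C":
            out.append(next(consonants))
        elif slot == "V":
            out.append(next(vowels))
        else:
            raise ValueError(f"unknown template symbol {slot!r}")
    return "".join(out)


print(interdigitate("ktb", "ie", "CVCVC"))  # kiteb
```

The same call with the root ħ d m and the vocalism a e yields ħadem.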

Semitic Morphology

  • ħadem work (verb);

  • ħaddiem worker;

  • ħidma work (noun);

  • inħadem be worked (verb passive);

  • ħaddem caused to work.

Plural Formation

Sound Plural

Formed by suffixes:

(a) Romance

karozza/karozzi (car)

tappit/tappiti (carpet)

(b) Semitic

ikla/ikliet (meal)

Broken Plural

Change of stem; drop of vowel

ġdid/ġodda (new)

tappit/twapet (carpet)
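For a lexicon, the two plural types behave differently. A sound plural can be stored as a suffix operation on the singular, but a broken plural has to be listed whole. The sketch below illustrates this with the examples above. The `SoundPlural`/`BrokenPlural` classes and the `drop`/`suffix` encoding are hypothetical and do not reflect the MLRS lexicon format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SoundPlural:
    """Suffixal plural: strip `drop` from the singular, then add `suffix`."""
    drop: str
    suffix: str

    def apply(self, singular: str) -> str:
        if not singular.endswith(self.drop):
            raise ValueError(f"{singular!r} does not end in {self.drop!r}")
        return singular[: len(singular) - len(self.drop)] + self.suffix


@dataclass(frozen=True)
class BrokenPlural:
    """Stem-changing plural: stored whole, since no suffix rule derives it."""
    form: str

    def apply(self, singular: str) -> str:
        return self.form


Plural = Union[SoundPlural, BrokenPlural]

# Hypothetical entries built from the examples above; tappit has both kinds.
LEXICON: dict[str, list[Plural]] = {
    "karozza": [SoundPlural("a", "i")],
    "tappit": [SoundPlural("", "i"), BrokenPlural("twapet")],
    "ikla": [SoundPlural("a", "iet")],
    "ġdid": [BrokenPlural("ġodda")],
}


def plurals(singular: str) -> list[str]:
    """All plural forms recorded for a singular (empty list if unknown)."""
    return [p.apply(singular) for p in LEXICON.get(singular, [])]
```

For example, `plurals("tappit")` gives both the sound and the broken plural, `["tappiti", "twapet"]`.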

Morpho-Syntactic Features

  • Verb-less sentences: Il-karozza ġdida / the car is new

  • Construct state (inalienable possession): Id it-tifel / the boy's hand

  • Sun letters: ix-xemx / the sun; it-tifel / the boy
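The sun-letter alternation is regular enough to state as a rule: the l of the article il- assimilates to a following sun consonant. The following is a minimal sketch. It assumes the standard set of sun letters (ċ d n r s t x ż z) and lower-case input. It ignores further orthographic cases and phrase-level variants of the article. The function name `with_article` is hypothetical.

```python
import unicodedata

# Assumed sun-letter set (standard Maltese orthography): the article's l
# assimilates to these consonants.
SUN_LETTERS = set("ċdnrstxżz")
VOWELS = set("aeiou")


def with_article(noun: str) -> str:
    """Prefix the definite article to a lower-case noun (simplified rules only)."""
    noun = unicodedata.normalize("NFC", noun)  # so ċ and ż are single characters
    first = noun[0]
    if first in SUN_LETTERS:
        return f"i{first}-{noun}"  # il- + xemx -> ix-xemx
    if first in VOWELS:
        return f"l-{noun}"         # il- + ikla -> l-ikla
    return f"il-{noun}"            # other consonants keep il-: il-karozza
```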

Construct State

  • Id it-tifel fil-but


  • hand (def) the boy in the pocket

  • The boy's hand (is) in the pocket

Verbs with Semitic Inflections

Italian Borrowing

spjega explain (It. spiegare)

jispjega he explains

nispjegaw we explain

spjegat she explained

spjegajt I explained, etc.

English Borrowing

ixxuttja kick a football (Eng. shoot)

jixxuttja he kicks

nixxuttjaw we kick

ixxuttjat she kicked

ixxuttjajt I kicked, etc.
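Both borrowings take the same Semitic affixes, so the forms listed can be generated from the stem. The sketch below covers only the four forms shown. It assumes an -a-final stem, and it assumes that a linking i- appears only before a consonant-initial stem. `inflect_borrowed` and the form labels are illustrative names.

```python
def inflect_borrowed(stem: str) -> dict[str, str]:
    """Apply the Semitic affixes listed above to an -a-final borrowed stem."""
    # Person prefixes take a linking i- before a consonant-initial stem
    # (spjega -> jispjega) but attach directly to a vowel-initial one
    # (ixxuttja -> jixxuttja).
    link = "" if stem[0] in "aeiou" else "i"
    return {
        "3sg.m imperfect": f"j{link}{stem}",   # he ...
        "1pl imperfect": f"n{link}{stem}w",    # we ...
        "3sg.f perfect": f"{stem}t",           # she ...-ed
        "1sg perfect": f"{stem}jt",            # I ...-ed
    }
```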

Clitic Pronouns

  • bgħatthielux

  • bgħat − t − hie − lu − x

  • send − past.1sg − it (fem.) − to him − not

  • I didn't send it to him
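A segmentation like this can be recovered by stripping optional suffix slots from the right: negation first, then indirect object, then direct object, then subject. Only splits whose remainder is a known stem are accepted. The stem list and suffix inventory below are hypothetical and deliberately tiny. A real analyser needs the full clitic paradigm and the sound changes between slots.

```python
# Hypothetical, deliberately tiny inventory. Slots are listed right-to-left.
STEMS = {"bgħat"}
SLOTS = [
    ("NEG", ("x",)),          # negation
    ("IO", ("lu", "lha")),    # indirect object: to him / to her
    ("DO", ("hie", "hu")),    # direct object (form used before a dative clitic)
    ("SUBJ", ("t",)),         # perfect subject suffix, 1sg (I)
]


def analyse(word: str) -> list[tuple[str, list[tuple[str, str]]]]:
    """Return every (stem, [(slot, morph), ...]) split licensed by STEMS and SLOTS."""
    results = []

    def strip(rest: str, i: int, found: list[tuple[str, str]]) -> None:
        if i == len(SLOTS):
            if rest in STEMS:
                results.append((rest, found))
            return
        name, morphs = SLOTS[i]
        strip(rest, i + 1, found)  # slot left empty
        for m in morphs:
            if rest.endswith(m) and len(rest) > len(m):
                # Prepend, so morphs come out in left-to-right surface order.
                strip(rest[: -len(m)], i + 1, [(name, m)] + found)

    strip(word, 0, [])
    return results
```

For example, `analyse("bgħatthielux")` returns a single analysis: stem bgħat with SUBJ t, DO hie, IO lu and NEG x.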

Summary

  • Mixed language

  • Morphology and syntax are more intertwined than in other European languages (typical of Semitic languages)

  • Empirical work needs to be carried out to establish correct morphosyntactic description.

  • Lack of systematic language resources

Language Resources

Language resources are needed for:

  • Natural language processing systems and tools,

  • Linguistic research that yields new knowledge about the language itself, and

  • Language-related industries such as software localization, translation, publishing etc.

Maltese Language Resource Server (MLRS)

  • RTDI National Project

  • Main Deliverables:

    • Maltese National Corpus (Server)

    • Computational Lexicon (Server)

  • Subsidiary Deliverables: tools for access, creation and maintenance of resources

    • Tokeniser

    • Part of Speech Tagger

    • NP Chunker
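The three subsidiary tools might be chained as sketched below. The sketch assumes that the tokeniser splits the definite article off its noun. The tagger is a hypothetical lookup table standing in for a trained tagger. The chunker recognises only the pattern DET? NOUN. None of this is the MLRS implementation.

```python
import re

# Definite article forms (il-, l-, and assimilated iċ-, id-, in-, ...), split off as tokens.
ARTICLE = re.compile(r"(i?l-|i[ċdnrstxżz]-)(.+)")

# Toy lookup standing in for a trained tagger; tags are illustrative only.
TOY_TAGS = {"il-": "DET", "karozza": "NOUN", "ġdida": "ADJ"}


def tokenise(text: str) -> list[str]:
    tokens = []
    for word in text.lower().split():
        match = ARTICLE.fullmatch(word)
        tokens.extend(match.groups() if match else [word])
    return tokens


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    return [(t, TOY_TAGS.get(t, "UNK")) for t in tokens]


def chunk(tagged: list[tuple[str, str]]) -> list[list[str]]:
    """Greedy NP chunker for the pattern DET? NOUN."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i + 1 if tagged[i][1] == "DET" else i
        if j < len(tagged) and tagged[j][1] == "NOUN":
            chunks.append([w for w, _ in tagged[i : j + 1]])
            i = j + 1
        else:
            i += 1
    return chunks
```

On the verb-less sentence "Il-karozza ġdida" ("the car is new"), `chunk(tag(tokenise(...)))` returns `[["il-", "karozza"]]` and leaves the predicate adjective outside the noun phrase.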

Same Data, Different Services

Corpus

  • Representative

  • Accessible to

    • contributors

    • editors

    • other users

  • Multiple levels of annotation

  • Word extraction

2 Dimensional Corpus

Text Category

Levels of Annotation

c. 20 Text Categories

Corpus Website

Wordlist Management

  • User submits text, files or page URLs.

  • These are scanned, and the words extracted from them are displayed.

  • User edits the resulting lists of extracted words manually.

  • User submits final version for incorporation into the wordlist database.
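Steps 2 and 4 of this workflow can be sketched as follows. Step 1 (fetching URLs or files) and step 3 (manual editing) are left out. The word pattern is an assumption: a word is a run of Unicode letters, optionally joined by hyphens as in il-karozza, with an optional final apostrophe as in ta'. The wordlist database is reduced to a Python set, and the function names are hypothetical.

```python
import re
from collections import Counter

# A "word" here: a run of letters (Unicode-aware, so ċ ġ ħ ż match), optionally
# hyphen-joined (il-karozza) and optionally ending in an apostrophe (ta').
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*'?")


def extract_words(text: str) -> Counter:
    """Step 2: scan submitted text and count candidate words (lower-cased)."""
    return Counter(m.group(0).lower() for m in WORD.finditer(text))


def incorporate(wordlist: set[str], edited: list[str]) -> set[str]:
    """Step 4: merge the user's edited list into the wordlist (blank entries dropped)."""
    return wordlist | {w.strip().lower() for w in edited if w.strip()}
```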

Current Corpus

  • 50M words at level 0, predominantly news, legal and government texts, with some fiction.

  • Submission requires a signed agreement from contributors.

  • Level 0

    • catalogue: visible to all

    • contents: only visible to submitter.

  • Level 1 and higher

    • catalogue and contents: visible to all
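The visibility rules above reduce to a small predicate. The sketch assumes that users are identified by a plain string and that each text records its annotation level and its submitter. The `CorpusText` fields and `can_view` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusText:
    title: str
    level: int       # annotation level: 0, 1, 2, ...
    submitter: str


def can_view(text: CorpusText, part: str, user: str) -> bool:
    """Apply the visibility rules to 'catalogue' or 'contents'."""
    if part not in ("catalogue", "contents"):
        raise ValueError(f"unknown part {part!r}")
    if part == "catalogue" or text.level >= 1:
        return True                  # catalogue always public; level 1+ fully public
    return user == text.submitter    # level 0 contents: submitter only
```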

Morphosyntactic Annotation: Level III

  • Tagset: a predetermined collection of tags for Maltese (Albert Gatt/Ray Fabri)

  • Brill Tagger (Brill 1996)

  • Training phase: hand tagging of training data.

  • Each tag can be regarded as a set of attribute/value pairs

  • For example, the tag NCS stands for {Cat=noun, Type=common, Num=sing}
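If a tag is treated as a set of attribute/value pairs, a positional tag can be decoded and re-encoded mechanically. Only the NCS expansion comes from the slide. The remaining code letters (P for proper or plural, D for dual) are illustrative assumptions. A real tagset would also make later positions depend on the category.

```python
# Positional reading of noun tags. Only NCS = {Cat=noun, Type=common, Num=sing}
# is given on the slide; the other letter codes are illustrative assumptions.
NOUN_POSITIONS = [
    ("Cat", {"N": "noun"}),
    ("Type", {"C": "common", "P": "proper"}),
    ("Num", {"S": "sing", "P": "plur", "D": "dual"}),
]


def decode(tag: str) -> dict[str, str]:
    """Expand a positional tag into attribute/value pairs (KeyError on unknown codes)."""
    if len(tag) != len(NOUN_POSITIONS):
        raise ValueError(f"expected {len(NOUN_POSITIONS)} characters, got {tag!r}")
    return {attr: codes[ch] for (attr, codes), ch in zip(NOUN_POSITIONS, tag)}


def encode(features: dict[str, str]) -> str:
    """Inverse of decode: look up the code letter for each attribute's value."""
    out = []
    for attr, codes in NOUN_POSITIONS:
        inverse = {value: code for code, value in codes.items()}
        out.append(inverse[features[attr]])
    return "".join(out)
```

With the attribute/value view, tools can query by feature (for example, all dual nouns) without parsing tag strings.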

Lexicon - Aims

  • Broad coverage

  • Support for different kinds of lexical information

    • Syntactic (Part of Speech + other)

    • Phonetic Spelling

    • Translation (En)

  • Interaction with linguists over the Internet

Lexicon Construction: Workflow

  • Extract wordlists from text (automatic)

  • Identify/correct headwords (semi-automatic)

    • Alignment techniques (Dalli 2001)

    • Automatic prefix/suffix recognition (Attard 2004)

  • For each headword, construct a lexical entry (manual)

  • Led (Lexicon Editor)
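The cited techniques are not described on these slides, so what follows is a deliberately naive stand-in for suffix recognition, not the method of Dalli (2001) or Attard (2004). It compares word pairs that share a long prefix and counts the leftover tails, on the idea that frequent tails are candidate suffixes. `candidate_suffixes` and the `min_stem` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from os.path import commonprefix  # character-wise longest common prefix


def candidate_suffixes(words: list[str], min_stem: int = 4) -> Counter:
    """Count the tails left over when word pairs share a prefix of >= min_stem chars."""
    counts = Counter()
    for a, b in combinations(sorted(set(words)), 2):
        stem = commonprefix([a, b])
        if len(stem) >= min_stem:
            counts[a[len(stem):]] += 1   # '' marks the bare stem itself
            counts[b[len(stem):]] += 1
    return counts
```

On the forms karozza, karozzi, spjega, spjegat and spjegajt, the tails t and jt each occur twice and a and i once each. A linguist would then confirm or reject these candidates in the editor.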

Lexicon Editor

Object Description Language

  • OO language for handling dependencies between lexical fields.

  • Primarily affects linguist interface.

  • An ODL description contains the following parts in order:

    • Enumeration Declarations

    • Class Declarations

    • Rules (Optional)

    • Macro Definitions (Optional)

ODL Example

enum Number { Singular, Plural, Dual }

class NOUN {
  Cat = noun;
  Type = common | proper;
  Number = *;
}

{ Case = *; }

if (Number == Plural) { !Gender }
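One way to read the example is as a validator for lexical entries. Each field has a set of allowed values, `*` stands for the whole enumeration of the same name, and the rule switches Gender off for plurals. In the linguist interface, this could mean hiding that field. These semantics are assumptions and are not taken from an ODL specification. The Gender values are hypothetical, since the class shown does not declare Gender.

```python
# Illustrative reading of the ODL example. Assumed semantics:
#   '*'       = any value of the enum with the same name as the field
#   '!Gender' = the Gender field is switched off, so it must be left empty
ENUMS = {"Number": {"Singular", "Plural", "Dual"}}

NOUN = {
    "Cat": {"noun"},
    "Type": {"common", "proper"},
    "Number": "*",
    "Gender": {"masculine", "feminine"},  # hypothetical: not declared in the class shown
}


def switched_off(entry: dict) -> set[str]:
    """Rule: if (Number == Plural) { !Gender }"""
    return {"Gender"} if entry.get("Number") == "Plural" else set()


def validate(cls: dict, entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is valid."""
    errors = []
    off = switched_off(entry)
    for field, spec in cls.items():
        allowed = ENUMS[field] if spec == "*" else spec
        value = entry.get(field)
        if field in off:
            if value is not None:
                errors.append(f"{field} must be empty here")
        elif value not in allowed:
            errors.append(f"{field}={value!r} not in {sorted(allowed)}")
    return errors
```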

Current Status

  • Website (

    • User Classes (public; linguist; administrator)

  • Corpus

    • Web interface

    • Tools: level 0; level 1; level 2

    • Collection: approx. 50MB at level 0

  • Lexicon

    • Editor/Browser

    • ODL version 0

Future Work

  • Manual annotation:

    • POS annotation to train tagger

    • Migration of level 0 to level 1

  • Morphological component

    • Morphological analyser/synthesiser

    • Relationships between lexical entries

  • HPSG integration. Stefan Muller, Saarbruecken.

  • Compatibility/Integration with existing lexical resources (cf. WordNet)

  • Language-enabled tools.

    • Spellchecker

    • Information extraction (IE)

    • Translation

Inheritance and Morphology

j a s l u (jaslu, "they arrive")

{ pers=3, mamma=wasal, num=plur }
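The slide pairs a surface form with a feature structure. The sketch below shows how default inheritance could generate such a form. It assumes that jaslu is the 3rd person plural imperfect of wasal ("he arrived"). A base `Verb` class supplies the person prefixes and the plural suffix -u. A subclass for verbs like wasal overrides only the stem, by dropping the initial w. The class names, the vowel-dropping rule and the prefix table are simplifications for this one verb, not the project's morphological component.

```python
PREFIX = {1: "n", 2: "t", 3: "j"}  # imperfect prefixes (3sg feminine t- not modelled)


class Verb:
    """Default imperfect behaviour shared by all verb classes in this sketch."""

    def __init__(self, mamma: str):
        self.mamma = mamma  # citation form, 3sg masc. perfect (e.g. wasal)

    def imperfect_stem(self) -> str:
        raise NotImplementedError  # each verb class must supply its stem

    def imperfect(self, pers: int, num: str) -> str:
        stem = self.imperfect_stem()
        if num == "plur":
            # Assumed syncope: drop the stem's last vowel before the suffix -u.
            if len(stem) >= 2 and stem[-2] in "aeiou":
                stem = stem[:-2] + stem[-1]
            return PREFIX[pers] + stem + "u"
        return PREFIX[pers] + stem


class InitialWVerb(Verb):
    """Inherits Verb.imperfect and overrides only the stem: wasal -> -asal."""

    def imperfect_stem(self) -> str:
        return self.mamma[1:]


def generate(fs: dict) -> str:
    """Realise a feature structure such as {pers, mamma, num} as a surface form."""
    return InitialWVerb(fs["mamma"]).imperfect(fs["pers"], fs["num"])
```

Under these assumptions, `generate({"pers": 3, "mamma": "wasal", "num": "plur"})` yields jaslu. Changing only `pers` to 1 yields naslu ("we arrive"), because the prefix is the only inherited behaviour that varies.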

Conclusion

  • Cross-disciplinary (Ling/CLing/CS) project presents challenges.

  • Training of automatic tagger has been a bottleneck.

  • Stable funding/support required beyond the life of the project

Valletta in Winter
