Laying the Foundations for a Diachronic Dictionary of Tunis Arabic


Laying the Foundations for a Diachronic Dictionary of Tunis Arabic: A First Glance at an Evolving New Language Resource

Karlheinz Mörth¹, Stephan Procházka², Ines Dallaji²

¹ Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences)

² Department of Oriental Studies (University of Vienna)

karlheinz.moerth@oeaw.ac.at

stephan.prochazka@univie.ac.at

ines.dallaji@univie.ac.at


Introduction: Two projects

  • Vienna Corpus of Arabic Varieties (VICAV)

  • Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach (TUNICO)

  • Text technology + Linguistics


Introduction: VICAV

==> Vienna Corpus of Arabic Varieties

Digital language resources of a wide range of spoken Arabic varieties: dictionaries, corpora, bibliographies, language profiles, best practices

Cooperation of the University of Vienna and the Austrian Academy of Sciences

  • http://corpus3.aac.oeaw.ac.at/vicav2/


Introduction: TUNICO

  • ==> Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach

  • Funded by the Austrian Science Fund (FWF, P 25706-G23)

  • Main objectives:

      • Linguistic exploration of spoken, contemporary Arabic

      • Two digital language resources:

          • Corpus of spoken youth language

          • Dictionary of Tunis Arabic


Arabic dialect lexicography

  • No comprehensive dictionary of the Arabic dialect of Tunis

  • Basis for diachronic research:

  • Nicolas, A. (1911). Dictionnaire français-arabe

  • Beaussier, M. (2006). Dictionnaire pratique arabe-français (arabe maghrébin)

  • Quéméneur, J. (1961). “Notes sur quelques vocables du parler Tunisien”

  • Quéméneur, J. (1962). “Glossaire de dialectal”

  • Abdellatif, K. (2010). Dictionnaire «le Karmous» du Tunisien

  • Marçais, W., Guîga, A. (1958-61). Textes arabes de Takroûna. II: Glossaire


Dictionary of Tunis Arabic

- micro-diachronic and machine-readable

- up-to-date and easily accessible lexical information

- incorporation of:

a) contemporary data from a digital corpus

b) various historical sources (e.g. Stumme, H.)

- information added is kept traceable to its origin

- basis: data taken from didactic materials

- 3 other main sources: newly created corpus, interviews and historical publications


Dictionary of Tunis Arabic: Contemporary sources

1) Corpus of spoken youth language (dialogues, narratives):

- an uncommon approach in Arabic dialectology: dialectologists have mostly been interested in the language of older people --> only older forms of particular varieties are known

- focus on modern language, contemporary usage and lexical neologisms

2) Additional interviews to complete the data gained from the corpus and historical sources


Dictionary of Tunis Arabic: Historical sources

- 800-page grammar of the dialect of the Medina of Tunis by Hans-Rudolf Singer (1984): evaluation of data, integration of excerpted lexicographic data into the dictionary

- Verification and completion of collected data with other historical resources

- Diachronic dimension helps to understand processes in the development of the lexicon

- Material gathered will allow analysis of recent developments (migration of parents from rural areas, influence by other Arabic varieties, influence of the revolution, foreign elements)


Dictionary of Tunis Arabic: Technical issues

Modelling the data

Tools


Dictionary of Tunis Arabic: Technical issues

Single schema for a range of dictionaries

LMF, RDF, SKOS, TEI (P5)


Dictionary of Tunis Arabic: Technical issues

  • Using the TEI dictionary module to encode digitised print dictionaries is a fairly common standard procedure in digital humanities.

  • The TEI dictionary module needs to be further constrained:

  • to enhance interoperability

  • to reduce alternate constructs

  • to achieve a high degree of compliance with LMF (ISO 24613)

  • Easy to impose in the creation of digitally born dictionaries.
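To make "reducing alternate constructs" concrete, here is a minimal sketch of a constraint check in Python. The allow-list `ALLOWED_CHILDREN` and the helper `find_violations` are hypothetical and only mirror the elements visible in the sample entry of this presentation; they are not the project's actual rules. In a real TEI workflow such constraints would normally be expressed in a schema (for example one generated from a TEI ODD customisation) rather than in ad hoc code.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list: which child elements each parent may contain.
# Derived only from the sample entry shown in this presentation.
ALLOWED_CHILDREN = {
    "entry": {"form", "gramGrp", "sense"},
    "form": {"orth"},
    "gramGrp": {"gram"},
    "sense": {"cit"},
    "cit": {"quote"},
}


def find_violations(entry_xml: str) -> list[str]:
    """Return one message per child element that the allow-list forbids."""
    root = ET.fromstring(entry_xml)
    violations = []
    for parent in root.iter():  # document order, starting at the root
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # no rule defined for this element
        for child in parent:
            if child.tag not in allowed:
                violations.append(
                    f"<{child.tag}> not allowed inside <{parent.tag}>"
                )
    return violations


# A conforming entry and one that uses two alternate constructs.
ok_entry = (
    '<entry><form type="lemma"><orth>ktāb</orth></form>'
    '<sense><cit type="translation"><quote>book</quote></cit></sense></entry>'
)
bad_entry = (
    '<entry><form type="lemma"><orth>ktāb</orth><pron>x</pron></form>'
    "<def>book</def></entry>"
)

ok_result = find_violations(ok_entry)
bad_result = find_violations(bad_entry)
print(ok_result)
print(bad_result)
```

The narrower the set of permitted constructs, the simpler every downstream tool (editor, stylesheet, converter) can be, which is the interoperability argument made in the bullets above.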


Dictionary of Tunis Arabic: Basic schema

<TEI>
  <teiHeader>
    ...
  </teiHeader>
  <text>
    <body>
      <div type="entries">
        <entry>...</entry>
        <entry>...</entry>
        <entry>...</entry>
        ...
      </div>
    </body>
  </text>
</TEI>
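As an illustration, the container structure above can be generated programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `build_skeleton` is hypothetical, the header is left empty because its content is elided on the slide, and no TEI namespace is declared because the slide shows none.

```python
import xml.etree.ElementTree as ET


def build_skeleton(entry_count: int) -> ET.Element:
    """Build TEI > text > body > div[@type='entries'] with empty entries."""
    tei = ET.Element("TEI")
    ET.SubElement(tei, "teiHeader")  # header content is elided on the slide
    text = ET.SubElement(tei, "text")
    body = ET.SubElement(text, "body")
    entries = ET.SubElement(body, "div", {"type": "entries"})
    for _ in range(entry_count):
        ET.SubElement(entries, "entry")  # placeholder, filled in later
    return tei


skeleton = build_skeleton(3)
print(ET.tostring(skeleton, encoding="unicode"))
```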


Dictionary of Tunis Arabic: Basic schema

<body>
  <div type="entries">
    <entry>...</entry>
    <entry>...</entry>
    <entry>...</entry>
    ...
  </div>
  <div type="examples">
    <cit type="example">...</cit>
    <cit type="example">...</cit>
    <cit type="example">...</cit>
    ...
  </div>
</body>
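Keeping entries and examples in two separate `div` elements makes each set easy to address on its own. A minimal sketch, assuming a tiny stand-in document (`BODY_XML`) and a hypothetical helper `count_sections`:

```python
import xml.etree.ElementTree as ET

# Stand-in data with the same layout as the slide; contents are placeholders.
BODY_XML = """
<body>
  <div type="entries">
    <entry/>
    <entry/>
  </div>
  <div type="examples">
    <cit type="example"/>
  </div>
</body>
"""


def count_sections(body_xml: str) -> dict[str, int]:
    """Count entries and examples in their respective div containers."""
    body = ET.fromstring(body_xml)
    return {
        "entries": len(body.findall("./div[@type='entries']/entry")),
        "examples": len(
            body.findall("./div[@type='examples']/cit[@type='example']")
        ),
    }


counts = count_sections(BODY_XML)
print(counts)
```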


Dictionary of Tunis Arabic: Basic schema

<entry id="ktaab_001">
  <form type="lemma">
    <orth lang="ar-aeb-x-tunis-vicav">ktāb</orth></form>
  <form type="inflected" ana="#n_pl">
    <orth lang="ar-aeb-x-tunis-vicav">ktub</orth></form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en">
      <quote>book</quote></cit>
    <cit type="translation" lang="de">
      <quote>Buch</quote></cit>
    <cit type="translation" lang="fr">
      <quote>livre</quote></cit>
  </sense>
</entry>
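Because the entry is machine-readable, its parts can be pulled out with a few path expressions. The sketch below parses the sample entry with Python's standard library; `summarise_entry` is a hypothetical helper. It uses the plain `id` and `lang` attributes exactly as they appear on the slide, whereas a schema-valid TEI P5 document would use `xml:id` and `xml:lang`.

```python
import xml.etree.ElementTree as ET

# The sample entry from the slide, with attribute names as shown there.
ENTRY_XML = """
<entry id="ktaab_001">
  <form type="lemma"><orth lang="ar-aeb-x-tunis-vicav">ktāb</orth></form>
  <form type="inflected" ana="#n_pl"><orth lang="ar-aeb-x-tunis-vicav">ktub</orth></form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en"><quote>book</quote></cit>
    <cit type="translation" lang="de"><quote>Buch</quote></cit>
    <cit type="translation" lang="fr"><quote>livre</quote></cit>
  </sense>
</entry>
"""


def summarise_entry(entry_xml: str) -> dict:
    """Extract lemma, inflected forms, grammar and translations."""
    entry = ET.fromstring(entry_xml)
    return {
        "id": entry.get("id"),
        "lemma": entry.findtext("./form[@type='lemma']/orth"),
        # inflected forms keyed by their @ana pointer, e.g. "#n_pl"
        "inflected": {
            form.get("ana"): form.findtext("orth")
            for form in entry.findall("./form[@type='inflected']")
        },
        "pos": entry.findtext("./gramGrp/gram[@type='pos']"),
        "root": entry.findtext("./gramGrp/gram[@type='root']"),
        # translations keyed by target language code
        "translations": {
            cit.get("lang"): cit.findtext("quote")
            for cit in entry.findall("./sense/cit[@type='translation']")
        },
    }


summary = summarise_entry(ENTRY_XML)
print(summary)
```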


Dictionary of Tunis Arabic: Representing diachrony

<bibl>
  <author>Ritt-Benmimoun</author>
  <date>2014</date>
</bibl>
<bibl>
  <author>Singer</author>
  <date>1958</date>
  <biblScope unit="page">56</biblScope>
</bibl>


Dictionary of Tunis Arabic: Documentation

  • http://corpus3.aac.ac.at/vicav2/query/

  • tools/dictionary_encoding_guidelines


Dictionary of Tunis Arabic: Tools

  • Viennese Lexicographic Editor (VLE)

  • XML editor providing functionalities typically needed in compiling lexicographic data

  • Web-based standalone application

  • Designed to process standard-based lexicographic and terminological data such as LMF, TBX, RDF or TEI.

  • Automating procedures

  • Freely configurable visualisation (via XSLT)

  • Validation: MSXML Schema

  • Client-server architecture (php + mysql)

  • Freely available and easy to set up


Dictionary of Tunis Arabic: Tools

  • Corpus – Dictionary interface

  • tokenEditor

  • Specialised Web-browser


Dictionary of Tunis Arabic: Tools

  • corpus_shell

  • ... a modular framework of reusable software components to access and publish heterogeneous and distributed language resources such as language corpora, dictionaries, encyclopaedic databases, prosopographic databases, bibliographies, metadata, and schemata.

  • Language Resources Portal

  • clarin.oeaw.ac.at/ccv/corpus_shell.

  • clarin.oeaw.ac.at/ccv/


Dictionary of Tunis Arabic: Status and outlook

CLARIN-ERIC (Common Language Resources and Technology Infrastructure).

Open access and open source.

~5000 entries


Thank you for your attention!

! شكراً لانتباهكم
