Issues in multilingual thesauri
Presentation Transcript

Managing Content

  • Managing content relevant and related to an organization:

    • Documentary resources

    • Internally generated reports and other resources

    • Web resources

  • Content management systems (CMS) combine a variety of tools and technologies


Managing Content

  • Involves the following activities on information:

    • Capturing

    • Storing

    • Managing

    • Preserving

    • Delivering


Managing Content

  • Document management

  • Collaboration

  • Web content management

  • Records management (long-term storage)

  • Need for vocabulary management:

    • Consistency in content representation

      • By creators (authors)

      • By indexers

      • By searchers

    • Thesauri are important tools for this purpose


Linguistic diversity in global information networks and universal access to information in cyberspace are at the core of contemporary debates and can be a determining factor in the development of a knowledge-based society.

UNESCO


Multilingual Thesauri

Increasingly diverse groups from different cultural and linguistic backgrounds seek access to equally diverse pieces of information…

  • Multilingual thesauri support, among other things:

    • Crosswalks between knowledge organization (KO) tools

    • Cross-cultural communication (including comparative studies)

    • Navigation between semantically related concepts (terms)

      • Semantic navigation between concepts in a domain and related knowledge resources (bibliographic metadata, etc.)


Multilingual Thesauri [Contd.]

  • Intelligent query expansion

  • Linguistic research

    Future:

  • Improved natural language processing

    • Language recognition

    • Improved parsing

    • Concept resolution

  • Inferencing / reasoning (ontologies)
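Intelligent query expansion can be sketched in a few lines. The example below is a minimal sketch that assumes a toy in-memory thesaurus: a query is first mapped to its preferred descriptor (through a UF synonym or an English label), then expanded with that descriptor's synonyms, its English equivalent and, optionally, everything below it in the NT hierarchy. The `Descriptor` class, its field names and the three sample entries (built from the lotus example and the tAmarai / kamalam / Lotus searches in this presentation) are hypothetical, not the project's data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical bilingual descriptor record used only for this sketch."""
    term: str                                      # preferred term (transliterated)
    english: Optional[str] = None                  # English equivalent, if any
    use_for: list = field(default_factory=list)    # non-preferred synonyms (UF)
    narrower: list = field(default_factory=list)   # narrower descriptors (NT)


def build_lookup(thesaurus):
    """Map each preferred term, UF synonym and lower-cased English label to its descriptor."""
    lookup = {}
    for d in thesaurus.values():
        lookup[d.term] = d.term
        for synonym in d.use_for:
            lookup[synonym] = d.term
        if d.english:
            lookup[d.english.lower()] = d.term
    return lookup


def expand_query(term, thesaurus, include_narrower=True):
    """Return every term that a search on `term` should also cover."""
    lookup = build_lookup(thesaurus)
    start = lookup.get(term) or lookup.get(term.lower())
    if start is None:
        return {term}  # not in the thesaurus: search the term verbatim
    expanded, stack, seen = set(), [start], set()
    while stack:  # depth-first walk down NT links; `seen` guards against cycles
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        d = thesaurus[key]
        expanded.add(d.term)
        expanded.update(d.use_for)
        if d.english:
            expanded.add(d.english)
        if include_narrower:
            stack.extend(d.narrower)
    return expanded


THESAURUS = {
    "tAmarai": Descriptor("tAmarai", "Lotus", use_for=["kamalam"], narrower=["mirunALam"]),
    "mirunALam": Descriptor("mirunALam", "Stalk of the lotus", narrower=["tAmaraimuL"]),
    "tAmaraimuL": Descriptor("tAmaraimuL", "Thorny portion of the stalk of the lotus"),
}

print(sorted(expand_query("kamalam", THESAURUS, include_narrower=False)))
# → ['Lotus', 'kamalam', 'tAmarai']
```

Whether narrower terms should be added automatically is a design choice: it raises recall at the cost of precision, so the sketch leaves it as a flag.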


Background

  • Early DRTC interest in thesaurus building

  • F-THES

    • OM Information System

  • The present project

    • Digital Library of Tamil Classics

      Characteristics:

      • More than one language

      • Culture-specific domains


Comparison of F-THES and TAMTH

  • F-THES

    • Subject coverage: Religious mysticism

    • Time / period: No period restriction

    • Structure & presentation: Structure defined to generate independent language thesauri, if required; context-specifying elements used only occasionally

  • TAMTH

    • Subject coverage: Entire universe of subjects

    • Time / period: Sangam period

    • Structure & presentation: Tamil terms as the base / source (descriptor), with corresponding terms in English; context-specifying elements used for every descriptor
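One way to model the TAMTH structure described above (a Tamil descriptor as the base, a corresponding English term, and a context-specifying element on every descriptor) is sketched below. The field names, the hypothetical `EN` tag for the English equivalent and the choice to render the context element as a parenthetical qualifier are assumptions. These slides do not describe the record layout the project actually keeps in WINISIS. The sample record reuses the Elam (spice) entry that appears among the semantic issues.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TamilDescriptor:
    """Hypothetical TAMTH-style record: Tamil base term plus English equivalent."""
    tamil: str                                     # descriptor in transliteration
    context: str                                   # context-specifying element (mandatory here)
    english: Optional[str] = None                  # corresponding English term, if one exists
    scope_note: Optional[str] = None               # SN
    use_for: list = field(default_factory=list)    # UF terms sharing the same context
    broader: list = field(default_factory=list)    # BT, BT2, ... nearest level first

    def heading(self) -> str:
        # The context element is shown as a parenthetical qualifier, e.g. "Elam (spice)".
        return f"{self.tamil} ({self.context})"

    def display(self) -> list:
        """Render the record as thesaurus-display lines."""
        lines = [self.heading()]
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        if self.english:
            lines.append(f"  EN {self.english}")  # "EN" is a hypothetical tag
        lines += [f"  UF {term} ({self.context})" for term in self.use_for]
        for level, term in enumerate(self.broader, start=1):
            tag = "BT" if level == 1 else f"BT{level}"
            lines.append(f"  {tag} {term}")
        return lines


elam = TamilDescriptor(
    tamil="Elam",
    context="spice",
    english="Cardamom",
    scope_note="Cardamom plant, Elettaria cardamomum; cardamom",
    use_for=["ilAjncali", "kALintam"],
    broader=["tAparavastu (plant)", "tAparanUl (botany)"],
)
print("\n".join(elam.display()))
```

Making `context` a required field, rather than optional as in F-THES, mirrors the TAMTH rule that every descriptor carries a context-specifying element.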


Background [Contd.]

  • The objective:

    • To employ the new thesaurus for vocabulary management:

      • In indexing

      • In user interfaces, for:

        • Formulating search expressions and search strategies

        • Facilitating navigation between related terms (narrower, broader and other related terms)

        • Value addition via links to relevant lexical tools
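Navigation between broader, narrower and related terms is only reliable if the relationships are reciprocal: every BT implies an NT in the opposite direction, and RT is symmetric. The sketch below assumes, hypothetically, that relationships are entered once as (term, relation, term) triples and derives the reciprocal links automatically. `build_index` and `neighbours` are illustrative names, not part of any existing tool.

```python
from collections import defaultdict

# Each relation and the relation that must hold in the opposite direction.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


def build_index(triples):
    """Index (term, relation, term) triples in both directions for navigation."""
    index = defaultdict(lambda: defaultdict(set))
    for source, relation, target in triples:
        if relation not in INVERSE:
            raise ValueError(f"unknown relation {relation!r}")
        index[source][relation].add(target)
        index[target][INVERSE[relation]].add(source)  # reciprocal link, added automatically
    return index


def neighbours(index, term, relation):
    """Terms one `relation` step away from `term`, sorted for a stable display."""
    if term not in index:
        return []
    return sorted(index[term].get(relation, ()))


hierarchy = build_index([
    ("mirunALam", "BT", "tAmarai"),
    ("tAmaraimuL", "BT", "mirunALam"),
])
print(neighbours(hierarchy, "tAmarai", "NT"))  # → ['mirunALam']
```

Entering each link once and deriving its inverse avoids the common indexing error of a BT with no matching NT.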


Issues

  • Humanities vis-à-vis Sciences

  • The Approach

    • Candidate Concepts & Terms

  • Issues Related to Script & Transliteration

  • Semantic Issues

  • Structural Issues

  • Management Issues

    • Handling NTs, BTs, RTs


Issues

  • Focus on:

    • Vocabulary management in bilingual and multilingual thesauri in culture-specific domains;

    • Special aspects of the Tamil language in this regard;

    • Alternative ways of linking descriptors to lengthy lists of NTs and RTs;

    • Advantages of integrated use of two or more knowledge organization tools

  • Many of the issues discussed here are unique to thesauri in humanities domains



The Approach

  • Combining existing thesauri

    • Merging two or more existing thesauri

    • Linking existing thesauri to each other

  • Translating an existing thesaurus into one or more other languages

  • Building a new thesaurus ‘bottom up’

    • Starting with one language and adding another language or languages

    • Starting with more than one language simultaneously
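For the option of linking existing thesauri to each other, a common pattern is to keep each thesaurus intact and record mappings between their concepts, each tagged with a match type. The sketch below borrows its match types from the W3C SKOS mapping properties (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch). The concept IDs, the data layout and the function names are assumptions for illustration, and this is not necessarily how the project links its vocabularies. The `unlinked` helper lists concepts that still have no counterpart, which is one way to spot culture-specific terms that lack an equivalent.

```python
from collections import defaultdict

# Mapping types modelled on the SKOS mapping properties (exactMatch, closeMatch, ...).
MATCH_TYPES = {"exact", "close", "broad", "narrow", "related"}


def link_thesauri(source, target, mappings):
    """Return {source term: [(target term, match type), ...]} for mapped concepts.

    `source` and `target` map concept IDs to preferred terms; `mappings` holds
    (source_id, target_id, match_type) triples.
    """
    links = defaultdict(list)
    for src_id, tgt_id, match in mappings:
        if match not in MATCH_TYPES:
            raise ValueError(f"unknown match type {match!r}")
        if src_id not in source or tgt_id not in target:
            raise KeyError(f"mapping refers to an unknown concept: {(src_id, tgt_id)}")
        links[source[src_id]].append((target[tgt_id], match))
    return dict(links)


def unlinked(source, mappings):
    """Source terms that have no mapping yet."""
    mapped = {src_id for src_id, _, _ in mappings}
    return sorted(term for cid, term in source.items() if cid not in mapped)


tamil = {"T1": "tAmarai", "T2": "cAttunARRu"}   # hypothetical concept IDs
english = {"E1": "Lotus"}
mappings = [("T1", "E1", "exact")]
print(link_thesauri(tamil, english, mappings))  # → {'tAmarai': [('Lotus', 'exact')]}
print(unlinked(tamil, mappings))                # → ['cAttunARRu']
```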


The Approach [Contd.]

  • The candidate terms:

    • The corpus: both print-on-paper and electronic sources, e.g.:

      1) Cologne online Tamil lexicon (COTL). Based on the Tamil Lexicon and Supplement, 1924-1939. http://webapps.uni-koeln.de/tamil/

      2) Ranganathan, S.R. and Muthukumaraswamy, R. (1961). Commemorative bibliography of the first 1008 books published by the South India Saiva Siddhanta Works Publishing Society. Tirunelveli: The Society.

      3) Sekkizhaar ([1985]). Periya puranam: a Tamil classic on the great Saiva saints of South India. Condensed English version by G. Vanmikanathan and N. Mahalingam. Madras: Sri Ramakrishna Math.

      4) Ranganathan, S.R. and Thillainayagam, V. (1963). Sub-forms of Tamil poetry and their classification. Annals of Library Science, 10(3), 175-185.

      5) WordNet 2.1 (online).

      6) Murugan, V. (200). Tolkappiam in English: Translation with the Tamil text transliteration in the Roman script, introduction, glossary and illustrations. Project Director: Dr. G. John Samuel. Chennai: Institute of Asian Studies. ISBN 81-87892-05-6.

      7) Tamil lexicon (1924-1939). Published under the authority of the University of Madras. v. I-VI + Supplement. Reprint 1982.

      8) Thillainayagam, V. (1978). The cultural heritage of the Tamils: Library studies. Madras Institute of Tamil Studies, Seminar on Cultural Heritage of Tamils, 25-27 February 1978, p. 292-333. Also published in Pulamai, v. 4, no. 3-4, July-September 1978, p. 253-299.


The Approach [Contd.]

  • Creating records alphabetically (from A to Z) was found to be tedious

  • Instead, the terms in the corpus were grouped into broad categories based on the Basic Classes of the Colon Classification (CC)

  • The thesaurus is being maintained as a database (using WINISIS)


The Approach [Contd.]

  • Candidate concepts

    • Titles of classics

      • Treated as quasi-classes, since they have attracted other works upon themselves


Issues

  • The Approach

    • Candidate Concepts & Terms

  • Issues Related to Script & Transliteration

  • Semantic Issues

  • Structural Issues

  • Management Issues

    • Handling NTs, BTs, RTs


Script & Transliteration

  • Terms are entered in Roman script using the COTL transliteration scheme (the scheme used by the Tamil Lexicon)

    • This supports automatic conversion to Tamil script

  • Records will eventually be in Tamil script
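Because the terms are keyed in a one-to-one Roman transliteration, the conversion to Tamil script can be mechanical. Consonant letters map to Tamil consonants, and a vowel after a consonant becomes a dependent vowel sign (short a needs none). A consonant with no following vowel receives the pulli (virama, U+0BCD). The sketch below illustrates only that mechanism. Its letter table is a partial, assumed subset, not the full COTL scheme: it assumes capitals for long vowels and for retroflex or alveolar consonants, `z` for the letter zha, and an arbitrary mapping of `n` to the dental na. Each assignment should be checked against the COTL documentation. Unmapped letters, such as the grantha consonants, raise an error.

```python
# Illustrative subset only: letter assignments are assumptions, not the full COTL table.
# Each vowel maps to (independent letter, dependent vowel sign); short "a" has no sign.
VOWELS = {
    "a": ("\u0b85", ""),
    "A": ("\u0b86", "\u0bbe"),   # long a
    "i": ("\u0b87", "\u0bbf"),
    "I": ("\u0b88", "\u0bc0"),   # long i
    "u": ("\u0b89", "\u0bc1"),
    "U": ("\u0b8a", "\u0bc2"),   # long u
    "e": ("\u0b8e", "\u0bc6"),
    "E": ("\u0b8f", "\u0bc7"),   # long e
    "ai": ("\u0b90", "\u0bc8"),
    "o": ("\u0b92", "\u0bca"),
    "O": ("\u0b93", "\u0bcb"),   # long o
}
CONSONANTS = {
    "k": "\u0b95", "c": "\u0b9a", "T": "\u0b9f", "N": "\u0ba3", "t": "\u0ba4",
    "n": "\u0ba8", "p": "\u0baa", "m": "\u0bae", "y": "\u0baf", "r": "\u0bb0",
    "l": "\u0bb2", "v": "\u0bb5", "z": "\u0bb4", "L": "\u0bb3", "R": "\u0bb1",
}
PULLI = "\u0bcd"  # virama: marks a consonant with no following vowel


def to_tamil(roman):
    """Convert a transliterated word to Tamil script (sketch; raises on unmapped letters)."""
    out = []
    after_consonant = False  # True while the last consonant still lacks a vowel
    i = 0
    while i < len(roman):
        # Longest match first; note this always reads "ai" as one vowel.
        key = "ai" if roman.startswith("ai", i) else roman[i]
        if key in VOWELS:
            independent, sign = VOWELS[key]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        elif key in CONSONANTS:
            if after_consonant:
                out.append(PULLI)  # previous consonant was not followed by a vowel
            out.append(CONSONANTS[key])
            after_consonant = True
        else:
            raise ValueError(f"no mapping for {key!r} at position {i}")
        i += len(key)
    if after_consonant:
        out.append(PULLI)  # word-final bare consonant
    return "".join(out)


print(to_tamil("tAmarai"))  # தாமரை
```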



Semantic Issues

  • Equivalence

    • Within the language

      • A large number of synonyms in Tamil

    • Across languages

      • Concepts unique to a culture (and so to its language): no English terms are available for a large number of concepts. Options:

        • Near-equivalent concepts

        • Use the original term
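The two options listed for concepts without an English term suggest an order of preference when producing the English side of an entry. The sketch below encodes one such order as an assumption: an exact English equivalent if there is one, otherwise a near-equivalent flagged as such, otherwise the original Tamil term. The `CrossLanguageEntry` class and the `[near equivalent]` marker are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrossLanguageEntry:
    """Hypothetical record for the English side of a Tamil descriptor."""
    tamil: str
    english: Optional[str] = None   # exact English equivalent, when one exists
    near: Optional[str] = None      # near-equivalent English concept, when only that exists


def english_label(entry):
    """Choose the English-side label: exact term, else near equivalent, else the original term."""
    if entry.english:
        return entry.english
    if entry.near:
        return f"{entry.near} [near equivalent]"  # flag the inexact match for users
    return entry.tamil  # no English term available: keep the original Tamil term


for entry in [CrossLanguageEntry("tAmarai", english="Lotus"), CrossLanguageEntry("cAttunARRu")]:
    print(entry.tamil, "->", english_label(entry))
```

Flagging near-equivalents keeps searchers aware that the English label does not match the Tamil concept exactly.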


Semantic Issues [Contd.]

Example

  • tAmarai (lotus)

    • mirunALam (Stalk of the Lotus)

      • tAmaraimuL (thorny portion of the stalk of the lotus)
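The three-level example above is a chain of BT links, and a display such as the Elam (spice) entry labels the successive levels BT, BT2 and so on. Below is a minimal sketch of walking that chain. It assumes each term has at most one broader term (a mono-hierarchy), and `broader_chain` is an illustrative name.

```python
def broader_chain(term, bt):
    """Walk BT links upward from `term`; return [(level, broader term), ...], nearest first.

    Assumes a mono-hierarchy: `bt` maps each term to at most one broader term.
    """
    chain, seen = [], {term}
    current, level = bt.get(term), 1
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in the BT hierarchy at {current!r}")
        chain.append((level, current))
        seen.add(current)
        current, level = bt.get(current), level + 1
    return chain


BT = {"tAmaraimuL": "mirunALam", "mirunALam": "tAmarai"}
for level, term in broader_chain("tAmaraimuL", BT):
    print("BT" if level == 1 else f"BT{level}", term)
# BT mirunALam
# BT2 tAmarai
```

A poly-hierarchy, in which a term has several broader terms, would need a breadth-first walk instead.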


Search term / number of records:

  • tAmarai: 327 entries with tAmarai as entry word or in the explanation

  • kamalam: 36 entries with kamalam as entry word or in the explanation

  • Lotus: 309 entries with Lotus as entry word or in the explanation

Multiplicity of synonyms

  • tAmarai: 82 synonyms in Tamil


Semantic Issues [Contd.]

  • cAttunARRu = Young plants planted in place of the dead ones

  • aSTAgkaputti = Eight Kinds of Knowledge

  • cARvAkam = cAruvAka’s materialistic philosophy which says perception is the only source of knowledge


Semantic Issues [Contd.]

  • Homographs

    • tAmarai = Lotus plant; Lotus flower; Lotus as a shape (entities in the shape of a lotus); Lotus-like properties (e.g., soft like lotus petals)

    • appu = Thigh; Father; Loan; Debt; Domestic male servant; Water; Trumpet tree; Sixth division of day

    • Homography may also result from the evolution in the meaning and connotation of terms in Tamil

      • kurinchi, mullai, marutam, neital and palai (the names of the five landscapes of Sangam poetry)


Semantic Issues [Contd.]

  • Homographs

    • Elam (spice)

      • SN Cardamom plant, Elettaria cardamomum; cardamom

      • UF ilAjncali (spice)

      • UF ilAjnci (spice)

      • UF kALintam (spice)

      • UF kaNmali (spice)

      • BT tAparavastu (plant)

      • BT2 tAparanUl (botany)

    • IlAjncali (spice)

      • Use Elam (spice)
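The Elam (spice) entry and its USE reference are two views of the same equivalence relation: each UF term on the descriptor implies a USE reference from that term back to the descriptor. The sketch below, with hypothetical function names, generates the USE references from the UF lists. It rejects a term filed as UF under two different descriptors, or filed both as preferred and non-preferred. Headings are compared as case-sensitive strings, since letter case is significant in the transliteration (tAmarai, for instance, uses A for a long vowel).

```python
def use_references(descriptors):
    """Build USE references from each descriptor's UF list.

    `descriptors` maps a preferred heading to its non-preferred (UF) headings.
    """
    use = {}
    for preferred, non_preferred in descriptors.items():
        for term in non_preferred:
            if term in descriptors:
                raise ValueError(f"{term!r} is both preferred and non-preferred")
            if use.get(term, preferred) != preferred:
                raise ValueError(f"{term!r} is UF under both {use[term]!r} and {preferred!r}")
            use[term] = preferred
    return use


def resolve(term, use):
    """Return the preferred heading for `term` (unchanged if it is already preferred)."""
    return use.get(term, term)


DESCRIPTORS = {
    "Elam (spice)": ["ilAjncali (spice)", "ilAjnci (spice)", "kALintam (spice)", "kaNmali (spice)"],
}
USE = use_references(DESCRIPTORS)
print(resolve("kaNmali (spice)", USE))  # → Elam (spice)
```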


Homographs

  • The intended meaning has to be understood from the context; hence the extensive use of role operators (qualifiers).

    Examples:

  • iTimpam (baby); iTimpam (castor); iTimpam (egg); iTimpam (misery); iTimpam (spleen)

    • An inverted index will help users select the appropriate search term
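An inverted index of qualified headings serves two kinds of user. A user who types a homograph sees every sense at once, and a user who types an English qualifier word finds the right Tamil heading. Below is a minimal sketch, assuming headings take the form term (qualifier) as in the iTimpam examples. The regular expression and function names are illustrative.

```python
import re
from collections import defaultdict

# A heading is a term optionally followed by a parenthetical qualifier, e.g. "iTimpam (castor)".
HEADING = re.compile(r"^(?P<term>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def build_inverted_index(headings):
    """Map each bare term and each qualifier word to the headings that contain it."""
    index = defaultdict(set)
    for heading in headings:
        match = HEADING.match(heading)
        if match is None:
            index[heading].add(heading)  # unqualified heading: index it under itself
            continue
        index[match["term"]].add(heading)  # case kept: case is significant in transliteration
        for word in match["qualifier"].lower().split():
            index[word].add(heading)
    return dict(index)


def choices(index, query):
    """Candidate headings for a query, sorted so that homographs are listed together."""
    return sorted(index.get(query, ()))


HEADINGS = ["iTimpam (baby)", "iTimpam (castor)", "iTimpam (egg)",
            "iTimpam (misery)", "iTimpam (spleen)"]
INDEX = build_inverted_index(HEADINGS)
print(choices(INDEX, "iTimpam"))  # all five qualified headings
print(choices(INDEX, "castor"))   # → ['iTimpam (castor)']
```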



Structural Issues

  • Hierarchy

    • Difficulties in developing corresponding hierarchies in two languages

    • Large numbers of NTs

      • Alternative ways of managing them

  • Associative Relations

    • Links to online lexical tools
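For a descriptor with a long list of NTs, one possible alternative is to group the NTs under node labels (facet indicators such as (by part)) and truncate long groups in the display. It is offered as an illustration and is not necessarily the one adopted in this project. The node labels and the placeholder terms nt-x, nt-y and nt-z below are hypothetical.

```python
def group_narrower_terms(narrower, node_label_of, max_per_group=None):
    """Group a descriptor's NTs under node labels, optionally truncating long groups.

    Returns {node label: [terms]} with labels and terms in sorted order; terms without
    a label go under "(unlabelled)".
    """
    groups = {}
    for term in sorted(narrower):
        groups.setdefault(node_label_of.get(term, "(unlabelled)"), []).append(term)
    result = {}
    for label in sorted(groups):
        terms = groups[label]
        if max_per_group is not None and len(terms) > max_per_group:
            hidden = len(terms) - max_per_group
            terms = terms[:max_per_group] + [f"... {hidden} more"]
        result[label] = terms
    return result


narrower = ["mirunALam", "nt-x", "nt-y", "nt-z"]  # nt-* are placeholders
labels = {"mirunALam": "(by part)", "nt-x": "(by colour)",
          "nt-y": "(by colour)", "nt-z": "(by colour)"}
for label, terms in group_narrower_terms(narrower, labels, max_per_group=2).items():
    print(label, terms)
# (by colour) ['nt-x', 'nt-y', '... 1 more']
# (by part) ['mirunALam']
```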


Tamil hierarchy with corresponding English terms:

  • tAmarai / Lotus

    • mirunALam / Stalk of the Lotus

      • tAmaraimuL / Thorny portion of the stalk of the lotus