
Interlingual word mapping

MTP Stage I Project Presentation. Presented by Abhijeet Padhye; guided by Prof. Pushpak Bhattacharyya, Department of Computer Science and Engineering, Indian Institute of Technology, Bombay.


Presentation Transcript


  1. Interlingual word mapping
     MTP Stage I Project Presentation
     Presented by: Abhijeet Padhye
     Guided by: Prof. Pushpak Bhattacharyya
     Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

  2. Presentation pathway
     • Motivation
     • Introduction
     • Introduction to Transliteration
     • Syllables and their structure types
     • Sonority Theory
     • Relation between Sonority and Syllables
     • What is Schwa?
     • A Sonority theory based Syllabification module
     • Results obtained
     • References

  3. Motivation
     • Language is an integral part of society
     • Each language has its own structure and rules
     • Some basic concepts are common to all languages
     • These shared concepts help in processes like transliteration, ultimately leading to better Cross-Lingual Information Retrieval (CLIR)
     • We try to exploit them for the process of syllabification

  4. Problem statement
     “To study some Phonological similarities between English, Hindi and Marathi and exploit them in order to achieve the goal of transliteration with high accuracy so as to be able to tackle problems like OOV words during Cross-Lingual Information Retrieval.”

  5. Introduction
     Concepts being emphasized:
     • Transliteration
     • Theory of syllables
     • Sonority theory
     • The relation between syllables and sonority
     • Theory of schwa and schwa deletion
     • All of these are mainly based on the properties of sound
     • Sound is the driving force behind word pronunciation in any language

  6. Introduction to transliteration
     • Transliteration is the process of phonetically “translating” named entities, such as proper nouns, from a source language to a target language. [1]
     • The process should be as accurate as possible.
     • It faces the problem of multiple variants of the same word.

  7. Proposed transliteration model

  8. Basics of syllables
     “Syllable is a unit of spoken language consisting of a single uninterrupted sound formed generally by a Vowel and preceded or followed by one or more consonants.”
     • Vowels are the heart of a syllable (its most sonorous element)
     • Consonants act as sounds attached to vowels

  9. Syllable structure
     A syllable consists of three major parts:
     • Onset (C)
     • Nucleus (V)
     • Coda (C)
     Vowels sit in the nucleus of a syllable; consonants may attach as onset or coda. The basic structure is CV.

  10. Possible syllable structures
      • The nucleus is always present
      • The onset and coda may be absent
      • Possible structures: V, CV, VC, CVC
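As a rough illustration of the four structures listed on this slide, the sketch below reduces a spelled syllable to its C/V skeleton. It assumes that the letters a, e, i, o, u stand in for vowel sounds, which is only an approximation of pronunciation; the class and method names (`CvSkeleton`, `skeleton`, `isListedStructure`) are hypothetical and do not come from the presentation.

```java
import java.util.Locale;

final class CvSkeleton {
    // Assumption: orthographic a, e, i, o, u approximate vowel sounds.
    private static final String VOWELS = "aeiou";

    /** Replaces every letter with 'V' or 'C'; non-letters are skipped. */
    static String skeleton(String text) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetter(ch)) {
                out.append(VOWELS.indexOf(ch) >= 0 ? 'V' : 'C');
            }
        }
        return out.toString();
    }

    /** True when the skeleton is one of the four listed structures: V, CV, VC, CVC. */
    static boolean isListedStructure(String syllable) {
        String s = skeleton(syllable);
        return s.equals("V") || s.equals("CV") || s.equals("VC") || s.equals("CVC");
    }

    public static void main(String[] args) {
        System.out.println(skeleton("dip"));          // CVC
        System.out.println(isListedStructure("ma"));  // true
    }
}
```

A syllable with a consonant cluster, such as "plo" (CCV), is not among the four basic structures, so `isListedStructure("plo")` returns false in this sketch.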

  11. Syllable theories
      • Prominence Theory
        • E.g. entertaining /entəteɪnɪŋ/
        • The peaks of prominence: vowels /e ə eɪ ɪ/
        • Number of syllables: 4
      • Chest Pulse Theory
        • Based on muscular activities
      • Sonority Theory
        • Based on the relative sonority of segments within words
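The prominence-theory example counts one syllable per vowel peak. A minimal sketch of that count, assuming that each maximal run of the letters a, e, i, o, u in the spelling corresponds to one peak (a heuristic over orthography, not over the phonemic transcription used on the slide); `VowelPeaks` and `countPeaks` are hypothetical names:

```java
import java.util.Locale;

final class VowelPeaks {
    /** Counts maximal runs of vowel letters; each run is treated as one peak. */
    static int countPeaks(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int peaks = 0;
        boolean inVowelRun = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = "aeiou".indexOf(w.charAt(i)) >= 0;
            if (vowel && !inVowelRun) {
                peaks++;            // a new run of vowel letters starts here
            }
            inVowelRun = vowel;
        }
        return peaks;
    }

    public static void main(String[] args) {
        System.out.println(countPeaks("entertaining")); // 4
    }
}
```

For "entertaining" the vowel runs are e, e, ai, i, which matches the four peaks on the slide. The heuristic breaks down wherever spelling and pronunciation diverge, for example a silent final "e" is still counted as a peak.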

  12. Introduction to sonority theory
      “The Sonority of a sound is its loudness relative to other sounds with the same length, stress and pitch.”
      • Every language has a set of sounds associated with it
      • Some sounds are more sonorous than others
      • Words in a language can be divided into syllables
      • Sonority theory distinguishes syllables on the basis of sounds

  13. Sonority hierarchy
      • Defined on the basis of the amount of sound associated with each segment
      • The sonority hierarchy, from most to least sonorous:
        • Vowels (a, e, i, o, u)
        • Glides and liquids (y, r, l, v)
        • Nasals (n, m)
        • Fricatives (s, z, f, ... sh, th etc.)
        • Affricates (ch, j)
        • Stops (b, d, g, p, t, k)

  14. Sonority scale
      Obstruents can be further classified into:
      • Fricatives
      • Affricates
      • Stops
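The hierarchy on slides 13 and 14 can be held in a lookup table. The sketch below is one possible shape for such a table, using a `HashMap` as the deck says the module does. The numeric ranks (6 for vowels down to 1 for stops) are an assumption made here for illustration, as are the names `SonorityTable` and `rank`. Only single letters are covered; the digraphs sh, th and ch are left out because this sketch looks at one character at a time.

```java
import java.util.HashMap;
import java.util.Map;

final class SonorityTable {
    // Letter -> assumed numeric rank; a higher rank means more sonorous.
    private static final Map<Character, Integer> RANK = new HashMap<>();

    static {
        assign("aeiou", 6);   // vowels
        assign("yrlv", 5);    // the slide's y, r, l, v group
        assign("nm", 4);      // nasals
        assign("szf", 3);     // fricatives (single letters only)
        assign("j", 2);       // affricates (single letters only)
        assign("bdgptk", 1);  // stops
    }

    private static void assign(String letters, int rank) {
        for (char ch : letters.toCharArray()) {
            RANK.put(ch, rank);
        }
    }

    /** Returns the assumed rank, or -1 for a letter the slide does not list. */
    static int rank(char ch) {
        return RANK.getOrDefault(Character.toLowerCase(ch), -1);
    }

    public static void main(String[] args) {
        System.out.println(rank('l') > rank('n')); // true
        System.out.println(rank('s') > rank('t')); // true
    }
}
```

Building the map once in a static initializer keeps each later lookup to a single constant-time `get`, which is presumably the kind of saving the deck attributes to the hash map.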

  15. Sonority theory & syllables
      “A Syllable is a cluster of sonority, defined by a sonority peak acting as a structural magnet to the surrounding lower sonority elements.”
      • Represented as waves of sonority, or the Sonority Profile of that syllable
      [Figure: sonority profile with Onset, Nucleus and Coda labelled]

  16. Sonority sequencing principle
      “The Sonority Profile of a syllable must rise until its Peak (Nucleus), and then fall.”
      [Figure: sonority profile with Onset, Peak (Nucleus) and Coda labelled]
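The sequencing principle can be tested mechanically on a candidate syllable. The following is a minimal sketch under several assumptions that are not from the presentation: letters stand in for sounds, ranks follow the hierarchy on slide 13 (1 for stops up to 6 for vowels), the peak must be a vowel, and a run of adjacent vowel letters counts as a single nucleus. `SonoritySequencing`, `rank` and `obeysSsp` are hypothetical names.

```java
final class SonoritySequencing {
    // Index + 1 is the assumed rank, from least to most sonorous.
    private static final String[] CLASSES = {"bdgptk", "j", "szf", "nm", "yrlv", "aeiou"};
    private static final int VOWEL_RANK = CLASSES.length;

    static int rank(char ch) {
        char c = Character.toLowerCase(ch);
        for (int i = 0; i < CLASSES.length; i++) {
            if (CLASSES[i].indexOf(c) >= 0) {
                return i + 1;
            }
        }
        // Letters outside the slide's hierarchy are rejected rather than guessed.
        throw new IllegalArgumentException("No sonority rank for '" + ch + "'");
    }

    /** True if sonority strictly rises to a vowel nucleus and strictly falls after it. */
    static boolean obeysSsp(String syllable) {
        int n = syllable.length();
        if (n == 0) {
            return false;
        }
        int[] p = new int[n];
        for (int k = 0; k < n; k++) {
            p[k] = rank(syllable.charAt(k));
        }
        int i = 0;
        while (i + 1 < n && p[i] < p[i + 1]) i++;        // onset: rising
        if (p[i] != VOWEL_RANK) return false;            // peak must be a vowel
        while (i + 1 < n && p[i + 1] == VOWEL_RANK) i++; // nucleus: vowel run
        while (i + 1 < n && p[i] > p[i + 1]) i++;        // coda: falling
        return i == n - 1;                               // nothing left over
    }

    public static void main(String[] args) {
        System.out.println(obeysSsp("plo")); // true
        System.out.println(obeysSsp("lpo")); // false
    }
}
```

In this sketch "plo" passes because its profile 1, 5, 6 rises to the vowel, while "lpo" fails because the profile 5, 1, 6 dips before the peak.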

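  17. Examples
      ABHIJEET
      • Sonority Profile 1 [Figure: the letters A, I, E, E, H, J, B, T placed by sonority level]
      • Sonority Profile 2 [Figure: the letters A, I, E, E, H, J, B, T placed by sonority level]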

  18. Maximal onset principle
      “The Intervocalic consonants are maximally assigned to the Onsets of syllables in conformity with Universal and Language-Specific Conditions.”
      • Determines the underlying syllable division
      • Example: DIPLOMA can be divided as DIP-LO-MA or DI-PLO-MA; the principle selects DI-PLO-MA, because PL is a permissible onset
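The DIPLOMA example can be reproduced with a small sketch of the maximal onset principle. This is not the presentation's implementation. It assumes that an onset is permissible whenever its sonority strictly rises towards the vowel, which stands in for the language-specific conditions the principle mentions, that letters stand in for sounds, and that letters missing from the slide's hierarchy get rank 0. The names `MaximalOnset`, `rank`, `isVowel` and this version of `syllabify` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MaximalOnset {
    // Index + 1 is the assumed rank, from least to most sonorous.
    private static final String[] CLASSES = {"bdgptk", "j", "szf", "nm", "yrlv", "aeiou"};

    static int rank(char c) {
        for (int i = 0; i < CLASSES.length; i++) {
            if (CLASSES[i].indexOf(c) >= 0) {
                return i + 1;
            }
        }
        return 0; // assumption: unlisted letters (c, h, q, w, x) rank lowest
    }

    static boolean isVowel(char c) {
        return rank(c) == CLASSES.length;
    }

    /** Splits a word so each nucleus takes the longest rising cluster as its onset. */
    static List<String> syllabify(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> syllables = new ArrayList<>();
        int start = 0;          // start of the syllable being built
        int clusterStart = -1;  // index just past the previous vowel run
        int i = 0;
        while (i < w.length()) {
            if (!isVowel(w.charAt(i))) {
                i++;
                continue;
            }
            if (clusterStart >= 0) {
                // The consonant right before the vowel always joins the onset;
                // extend leftwards while sonority keeps rising.
                int split = i - 1;
                while (split > clusterStart
                        && rank(w.charAt(split - 1)) < rank(w.charAt(split))) {
                    split--;
                }
                syllables.add(w.substring(start, split));
                start = split;
            }
            while (i < w.length() && isVowel(w.charAt(i))) i++; // skip the vowel run
            clusterStart = i;
        }
        if (start < w.length()) {
            syllables.add(w.substring(start));
        }
        return syllables;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("diploma")); // [di, plo, ma]
        System.out.println(syllabify("banda"));   // [ban, da]
    }
}
```

In "banda" the cluster "nd" falls in sonority, so only "d" moves to the onset and "n" stays in the coda. A real module would need an explicit list of permissible onsets per language, since rising sonority alone both over- and under-generates.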

  19. The concept of schwa
      • First letter of IAL: {a}
      • An unstressed and toneless neutral vowel
      • Sanskrit is phonetically perfect: it has no neutral vowels
      • Hindi, Bengali etc. allow schwa to be neutral
      • Some schwas are deleted and some are not
      • Schwa deletion is an important issue for grapheme-to-phoneme conversion

  20. Schwa deletion contexts
      • Saphalya and Amantrana
      • Priya and Tritiya
      • Kavya and Ashva
      • Badhai
      • Samuha and Chehara
      • Badara and Kalama
      • Kalama and Banda

  21. A sonority-theoretic model
      • Developed completely in Java
      • Platform independent
      • Attempts to syllabify words
      • Built on the concepts of sonority theory, mainly the sonority sequencing principle
      • Uses Java's HashMap to save execution time

  22. Technical overview
      Consists of three major functions, plus a fourth under development:
      • SonorityHierarchy(): stores and references the sonority hierarchy from the HashMap
      • syllabify(String word): tries to find the syllable boundaries according to their sonority profile
      • accuracy()
      • Delete_schwa() [under development]: tries to delete schwas present in the input

  23. Results
      • Syllabification and PRR generation modules implemented
      • Number of manually syllabified words: 27614
      • Number of words fed as input: 27614
      • Number of words correctly syllabified: 26253
      • Accuracy obtained: 95.86% for English and about 70% for Hindi
      • Accuracy of schwa deletion in English: 77%
      • Schwa deletion for Hindi is under development
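The presentation does not say how its accuracy() function computes the reported figures. A minimal sketch, assuming accuracy is simply the share of input words whose output matches the manual syllabification; `Accuracy` and `percent` are hypothetical names:

```java
final class Accuracy {
    /** Percentage of items counted as correct out of all items evaluated. */
    static double percent(int correct, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * correct / total; // 100.0 forces floating-point division
    }

    public static void main(String[] args) {
        // Counts taken from the results slide.
        System.out.printf("%.2f%%%n", percent(26253, 27614));
    }
}
```

Under this definition the two counts on the slide give 26253 / 27614, or about 95.07%. The reported 95.86% for English therefore presumably rests on a different subset or counting convention, which the source does not specify.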

  24. Problems and future work
      • Problems faced
        • The first rule-based implementation failed
        • Some specific consonant and vowel clusters still result in erroneous syllabification
      • Future work
        • Schwa deletion for Hindi and Marathi
        • Implementation of the Maximal Onset First principle
        • Packaging the above implementation in a stable transliteration module to be used further in CLIR

  25. References
      • Giegerich, H. J. 1992. English Phonology: An Introduction.
      • Kahn, Daniel. 1976. Syllable-based Generalizations in English Phonology.
      • Lass, Roger. 1984. Phonology: An Introduction to Basic Concepts. Cambridge University Press.
