
Formal linguistics & the language-technology interface

Probal Dasgupta, IIITH Workshop, 8 July 2014.


Presentation Transcript


  1. Formal linguistics & the language-technology interface Probal Dasgupta IIITH Workshop 8 July 2014

  2. Outside the Box 1A • Problem awaiting solution: Some Bangla agent nouns have masculine and feminine forms – SikkhOk ~ Sikkhika ‘teacher’, likewise oddhapika ‘professor’, lekhika ‘author’, gayika ‘singer’, nayika ‘heroine’, poricarika ‘maid’, kortri ‘mistress’, netri ‘leader’, obhinetri ‘actor’, dhatri ‘wetnurse’ – but some don’t: no feminine for probOrtok ‘initiator’, nibOrtok ‘preventer’, khadok ‘eater’, SoSok ‘exploiter’, prerOk ‘sender’, prapok ‘recipient’, dOrSok ‘spectator’, droSTa ‘seer’, srOSTa ‘creator’, srota ‘listener’, bOkta ‘orator’, upobhokta ‘consumer’. • Intriguingly, Esperanto, an easy-to-learn language at the glossa/techne interface, shows a neat binary here.

  3. Outside the Box 1B • In Esperanto, the artificial language with the largest number of proficient speakers, there is a set of facts that matches this contrast within Bangla. Hardly a coincidence. • The m/f pairs that do work in Bangla match the Esperanto feminines instruistino, lekciistino, verkistino, kantistino, ĉefaktorino, servistino, mastrino, gvidantino, aktorino, vartistino.

  4. Outside the Box 1C The Bangla agent nouns that lack a feminine match the Esperanto words iniciatanto, malhelpanto, manĝanto, ekspluatanto, sendanto, ricevanto, spektanto, vidanto, kreanto, aŭdanto, parolanto, konsumanto, which can add -in-, but only if you're making a contextual point. Note that the first set uses -ist-, the profession affix, while the second set uses the participial affix -ant-. (The counterexamples in the first set are explainable.)

  5. Outside the Box 1D • Patterns of lexical viability in Esperanto are known to reflect conceptually significant principles. It is intriguing that the agent nouns that permit a feminine in Bangla come out as profession nouns in Esperanto, while the ones that prohibit a feminine in Bangla turn out to fall back on the participial baseline. This brings us one step closer to a solution. Ideas for a second step will come from B.Tech. wizards!
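The -ist- versus -ant- contrast drawn on slides 3 to 5 can be checked mechanically against the Esperanto word lists quoted there. What follows is only a minimal sketch, not the speaker's method: the helper name `agent_affix` and its ending-stripping heuristic are assumptions made for illustration, and the heuristic looks at nothing beyond the final letters of each word.

```python
from collections import Counter

# Word lists copied from slides 3 and 4 of the transcript.
WITH_FEMININE = [
    "instruistino", "lekciistino", "verkistino", "kantistino", "ĉefaktorino",
    "servistino", "mastrino", "gvidantino", "aktorino", "vartistino",
]
WITHOUT_FEMININE = [
    "iniciatanto", "malhelpanto", "manĝanto", "ekspluatanto", "sendanto",
    "ricevanto", "spektanto", "vidanto", "kreanto", "aŭdanto", "parolanto",
    "konsumanto",
]


def agent_affix(word: str) -> str:
    """Hypothetical helper: guess the agent affix from the word's ending."""
    # Drop the noun ending -o, if present.
    stem = word[:-1] if word.endswith("o") else word
    # Drop the feminine affix -in-, if present (toy rule, ending-based only).
    if stem.endswith("in"):
        stem = stem[:-2]
    for affix in ("ist", "ant"):
        if stem.endswith(affix):
            return affix
    return "other"


first = Counter(agent_affix(w) for w in WITH_FEMININE)
second = Counter(agent_affix(w) for w in WITHOUT_FEMININE)
exceptions = [w for w in WITH_FEMININE if agent_affix(w) != "ist"]

print("first set: ", dict(first))
print("second set:", dict(second))
print("first-set items without -ist-:", exceptions)
```

On these lists every word in the second set is classified as -ant-, while the first set is mostly -ist- with a handful of items that do not fit; those are presumably the counterexamples that slide 4 describes as explainable.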

  6. Outside the Box 2A • In the second part of my session I introduce you to Word Formation Strategies such as: (1) [X]V ↔ [postX]V (2) [X]V ↔ [priX]V

  7. Outside the Box 2B
     (1) a. Li postkuris vin 'he pursued you'
         b. Li kuris post vi 'he ran after you'
     (2) a. Prodip priskribis la domon 'Prodip described the house'
         b. Prodip skribis pri la domo 'Prodip wrote about the house'
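The a/b pairs in (1) and (2) relate a prefixed verb with an accusative object to a bare verb followed by a prepositional phrase. The sketch below reproduces that relation at the string level only. The function name `prepositional_paraphrase` is hypothetical, and the code assumes a one-word subject, a verb in second position, and an object whose words mark the accusative with a final -n; none of this is claimed to be part of the Word Formation Strategy formalism itself.

```python
def prepositional_paraphrase(sentence: str, prefix: str) -> str:
    """Toy mapping from 'S prefix+V O-n' to 'S V prefix O' (assumed word order)."""
    subject, verb, *obj = sentence.split()
    if not verb.startswith(prefix):
        raise ValueError(f"verb {verb!r} does not start with prefix {prefix!r}")
    bare_verb = verb[len(prefix):]
    # Remove the accusative -n from each object word (toy rule: any final -n).
    obj_nominative = [w[:-1] if w.endswith("n") else w for w in obj]
    return " ".join([subject, bare_verb, prefix, *obj_nominative])


print(prepositional_paraphrase("Li postkuris vin", "post"))
print(prepositional_paraphrase("Prodip priskribis la domon", "pri"))
```

The sketch treats the two forms as freely interchangeable, which is exactly what the contrast between examples (3) and (4) in the deck calls into question: the word and the phrase do not always have the same range of meanings.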

  8. Outside the Box 2C
     (3) a. Li postdancis ŝin sur la trotuaron 'He afterdanced her on to the pavement'
         b. ??Li postdancis ŝin en la vespera serio de soldancantoj 'He afterdanced her in the evening sequence of solo dancers'
         c. *Li postdancis la tertremon 'He afterdanced the earthquake'

  9. Outside the Box 2D
     (4) a. Li dancis post ŝi sur la trotuaron 'He danced after her on to the pavement'
         b. Li dancis post ŝi en la vespera serio de soldancantoj 'He danced after her in the evening sequence of solo dancers'
         c. Li dancis post la tertremo 'He danced after the earthquake'

  10. Outside the Box 2E • Words are specific sites of putting sound and meaning together • They contrast with phrases even in a language whose speakers maximize transparency and compositionality • This raises questions about technical tools • And tools in Sanskrit that rightly inspire us

  11. Outside the Box 2F • The ancient, innovative Indian who chewed on this material, Bhartrihari, broke the bounds of the sentence box, into discourse • He was talking, across 1000 years, to Panini's uncle DaakshaayaNa a.k.a. VyaaDi • Word Formation Strategies are Bhartrihari-inspired, devised by R. Singh (1943-2012)

  12. Outside the Box 3A • Formal utility of Bhartrihari: the Strategy Shadow Theorem
      deS ‘country’, deSantor ‘another country’
      gram ‘village’, gramantor ‘another village’
      *deSantorantor ‘another other country’
      *gramantorantor ‘another other village’

  13. Outside the Box 3B
      onno EkTa deS/gram
      other one country/village
      ‘another country/village’
      vs:
      onno EkTa onno EkTa deS/gram
      other one other one country/village
      ‘another other country/village’
      Syntax allows it; word formation does not! This is the Strategy Shadow Effect.

  14. Outside the Box 3C • The Strategy Shadow Theorem: • From X-Wala you can't get X-WalaWala • From X-antor you can't get X-antorantor • This is a theorem, because a strategy is a toggle switch. Bischematic formalisms affix stuff, from left to right, or subtract it, from right to left; they cannot affix the stuff twice.
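The toggle-switch reading of a strategy can be illustrated with a few lines of code. This is a hedged sketch of the idea as stated on the slide, not an implementation of the bischematic formalism: the function names `toggle` and `reachable` are hypothetical, and the code assumes that a word "has" the affix whenever it ends in that string.

```python
def toggle(word: str, affix: str) -> str:
    """Treat the strategy X <-> X+affix as a switch: add the affix or remove it."""
    if word.endswith(affix):
        return word[: -len(affix)]   # right-to-left: subtract the affix
    return word + affix              # left-to-right: add the affix


def reachable(base: str, affix: str, steps: int = 6) -> list[str]:
    """Collect every form produced by applying the same strategy repeatedly."""
    forms = [base]
    current = base
    for _ in range(steps):
        current = toggle(current, affix)
        if current not in forms:
            forms.append(current)
    return forms


for base in ("deS", "gram"):
    forms = reachable(base, "antor")
    doubled = base + "antor" * 2     # what naive concatenation would build
    print(base, "->", forms, "| doubled form reachable:", doubled in forms)
```

However many times the strategy is applied, only the base and the singly affixed form are ever produced; the doubled form that plain string concatenation would build is never reached, which is the pattern the slide states for X-antor and X-Wala.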

  15. Outside the Box 3D • The data of Esperanto played a significant role during the incubation of the Strategy Shadow Theorem, and later during its confirmation and refinement, a process that is still going on. • Substantivist research, rooted in classical Indian formal linguistics, spreads its wings in the technology-laden sky of the constructed language Esperanto, which pushes human language to its freedom-maximizing edge.

  16. Outside the Box 3E • The logic of the constructed analytical language Esperanto can usefully guide the construction of analyses of data in the spontaneously formed ethnic languages that we speak as our mother tongues. • From Sager through BSO to CALTS/IPDA/IIIT. Esperanto as an MT interlingua in the DLT project; we can extend those results in our NLP inquiry today, in India.
