Entropy in NLP
Presented by Avishek Dan (113050011), Lahari Poddar (113050029), Aishwarya Ganesan (113050042)
Guided by Dr. Pushpak Bhattacharyya

Motivation
Aoccdrnig to rseearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is that the frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.
Source: C. E. Shannon, "Prediction and Entropy of Printed English", 1951
A language model computes either the joint probability of a word sequence,
P(W) = P(w1, w2, w3, w4, w5, ..., wn),
or the probability of an upcoming word, P(wn | w1, w2, ..., wn-1).
A good language model assigns high probability to real, fluent English.
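A minimal sketch of how such a joint probability can be computed with the chain rule under a bigram approximation follows; the probability values, the smoothing floor, and the example sentence are made up purely for illustration, not taken from the presentation.

import math

# Toy bigram probabilities P(w_i | w_{i-1}); the values are illustrative only.
bigram_p = {
    ("<s>", "the"): 0.30, ("the", "cat"): 0.10,
    ("cat", "sat"): 0.20, ("sat", "</s>"): 0.40,
}

def sentence_logprob(words, p=bigram_p, floor=1e-12):
    # log2 P(W) = sum_i log2 P(w_i | w_{i-1}) under the bigram approximation.
    tokens = ["<s>"] + words + ["</s>"]
    # Unseen bigrams fall back to a tiny floor; real models use proper smoothing.
    return sum(math.log2(p.get((prev, cur), floor))
               for prev, cur in zip(tokens, tokens[1:]))

print(sentence_logprob(["the", "cat", "sat"]))   # higher (less negative) = more "English-like"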
H(w|h) = -log(m(w|h)) : the surprisal of word w given history h under model m (measured in bits when the log is base 2)
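As a sketch, this per-word quantity can be averaged over a text to estimate the model's cross-entropy in bits per word. The model function m and the 10,000-word vocabulary below are hypothetical placeholders, not part of the presentation.

import math

def surprisal(m, w, h):
    # H(w|h) = -log2 m(w|h): bits of "surprise" when word w follows history h.
    return -math.log2(m(w, h))

def cross_entropy(m, words):
    # Average per-word surprisal over a text, in bits per word (lower = better model).
    histories = [tuple(words[:i]) for i in range(len(words))]
    return sum(surprisal(m, w, h) for w, h in zip(words, histories)) / len(words)

# Any mapping (word, history) -> probability works as m. A uniform model over a
# hypothetical 10,000-word vocabulary gives log2(10000), about 13.3 bits per word.
uniform = lambda w, h: 1.0 / 10_000
print(cross_entropy(uniform, "the cat sat on the mat".split()))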
H : set of possible word and tag contexts (histories)
T : set of allowable tags
Hs : entropy of the text after word order is randomized (shuffled); H : entropy of the original, ordered text. Their difference Hs - H measures the information carried by word order.
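A rough sketch of this comparison, using compression as a crude order-sensitive entropy proxy (the cited paper uses more careful entropy estimators); corpus.txt is a hypothetical plain-text corpus file.

import random
import zlib

def bits_per_word(words):
    # Crude entropy estimate: compressed size in bits divided by number of words.
    text = " ".join(words).encode("utf-8")
    return 8 * len(zlib.compress(text, 9)) / len(words)

words = open("corpus.txt").read().split()   # hypothetical corpus file
H = bits_per_word(words)                    # original, ordered text
shuffled = list(words)
random.shuffle(shuffled)                    # destroy word order, keep word frequencies
Hs = bits_per_word(shuffled)
print(f"H = {H:.2f} bits/word, Hs = {Hs:.2f} bits/word, Hs - H = {Hs - H:.2f}")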
Source: M. A. Montemurro and D. H. Zanette, "Universal Entropy of Word Ordering Across Linguistic Families", PLoS ONE, 2011
Source: Entropy, the Indus Script, and Language
Rajesh P. N. Rao, Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, Ronojoy Adhikari, Iravatham Mahadevan, "Entropy, the Indus Script, and Language: A Reply to R. Sproat", Computational Linguistics, 2010