Computational Modeling of Grammar Acquisition Using AI Framework ADIOS

Artificial Intelligence
• Computational modelling of Grammar Acquisition
Rishabh Nigam, Shubhdeep Kochhar

The Problem
• Computational framework for grammar acquisition
• Unsupervised learning from a real corpus
Why the problem
• The algorithm is capable of learning complex syntax and generating novel grammatical sentences, and it proves useful in other fields that call for structure discovery from raw data, such as bioinformatics.

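The Algorithm
ADIOS: Automatic Distillation Of Structure
What it does: the MEX criterion
• For a search path through the corpus, a 2D matrix M is built in which each entry M[i,j] is either PR[i,j] (the probability computed moving right along the path) or PL[i,j] (the probability computed moving left).
• The matrix is then searched for steep decreases in PR[i,j] and PL[i,j]. Such drops indicate that equivalence classes may lie between them.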
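The matrix search described above can be illustrated with a simplified sketch. It is not the code released by the ADIOS authors: it counts sub-sequences by brute force, leaves out any statistical significance test, and only checks the right-moving direction. The names `count`, `mex_matrix` and `right_drops`, and the threshold value `eta=0.65`, are assumptions made for this illustration. The corpus is assumed to be non-empty and to contain the search path.

```python
from typing import List, Sequence


def count(corpus: Sequence[Sequence[str]], seq: Sequence[str]) -> int:
    """Number of times the token sequence `seq` occurs contiguously in the corpus."""
    n = len(seq)
    return sum(
        1
        for sent in corpus
        for k in range(len(sent) - n + 1)
        if list(sent[k:k + n]) == list(seq)
    )


def mex_matrix(corpus: Sequence[Sequence[str]], path: Sequence[str]) -> List[List[float]]:
    """Build M for one search path.

    Upper triangle (j > i): P_R = count(path[i..j]) / count(path[i..j-1])
    Lower triangle (stored at M[j][i]): P_L = count(path[i..j]) / count(path[i+1..j])
    Diagonal: relative frequency of the single token path[i].
    """
    n = len(path)
    total_tokens = sum(len(s) for s in corpus)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = count(corpus, path[i:i + 1]) / total_tokens
        for j in range(i + 1, n):
            whole = count(corpus, path[i:j + 1])
            without_last = count(corpus, path[i:j])
            without_first = count(corpus, path[i + 1:j + 1])
            M[i][j] = whole / without_last if without_last else 0.0
            M[j][i] = whole / without_first if without_first else 0.0
    return M


def right_drops(M: List[List[float]], i: int, eta: float = 0.65) -> List[int]:
    """Columns j where P_R falls steeply relative to the previous column of row i."""
    return [
        j
        for j in range(i + 1, len(M))
        if M[i][j - 1] > 0 and M[i][j] / M[i][j - 1] < eta
    ]


corpus = [
    ["where", "is", "the", "dog"],
    ["where", "is", "the", "cat"],
    ["where", "is", "the", "ball"],
    ["where", "is", "the", "cup"],
]
path = corpus[0]
M = mex_matrix(corpus, path)
print(right_drops(M, 0))  # columns where the right-moving probability drops
```

In this toy corpus the prefix "where is the" is always followed by a different word, so the right-moving probability stays at 1 up to "the" and then drops; the last slot is where a set of interchangeable words would be looked for.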

Codes Used
• MEX criterion
• Training scripts
• Generating scripts
Shimon Edelman and Zach Solan made this code available.

Work done so far
• Converted the CHILDES database and a Hindi database (WordNet) into a format readable by the ADIOS algorithm.
• Ran the algorithm on the CHILDES database and on the Hindi database.
• Had a brief correspondence with Shushobhan Nayak and ran the algorithm on his small commentary database.
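The conversion step can be sketched roughly as follows. CHILDES transcripts use the CHAT format, in which speaker lines begin with `*` (for example `*CHI:`), header lines with `@`, and annotation tiers with `%`. The output convention used here, one sentence per line wrapped in `*` and `#` markers, is an assumption about the ADIOS input format, since the slides do not specify it; both markers are parameters. The function name `chat_to_sentences` is hypothetical, and the tokenizer is deliberately crude: it keeps only alphabetic tokens and ignores continuation lines.

```python
import re
from typing import Iterable, List


def chat_to_sentences(lines: Iterable[str], start: str = "*", end: str = "#") -> List[str]:
    """Turn CHAT speaker lines into delimited, lower-cased token sequences."""
    sentences = []
    for line in lines:
        # Keep only main speaker tiers such as "*CHI:\t..."; skip "@" headers
        # and "%" dependent tiers.
        if not line.startswith("*") or ":" not in line:
            continue
        utterance = line.split(":", 1)[1]
        # Crude tokenization: alphabetic words only, punctuation dropped.
        tokens = re.findall(r"[A-Za-z']+", utterance.lower())
        if tokens:
            sentences.append(" ".join([start, *tokens, end]))
    return sentences


sample = [
    "@Begin",
    "*MOT:\tthere you are .",
    "%mor:\tadv|there pro|you cop|be&PRES .",
    "*CHI:\tHere you are !",
]
for sentence in chat_to_sentences(sample):
    print(sentence)
```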

Running on the CHILDES database
E6478 {we,you,youse}
P6479 (I,think) 0.0068258047 1 2 201
P6480 (E6481,you,are) 0 1 3 18
E6481 {there,here}
P6482 (who,E6466) 0.0039371848 1 2 36
P6483 (P6439,P6434,Emily) 0 0.33333334 5.4000001 4
P6484 (he,is,E6485) 0.0058915019 1 3 28
E6485 {.,here}
P6486 (are,we,P6402) 0.0043362379 1 4 10
P6487 (wait,to,E6488,E6489) 6.1452389e-05 0.5 4 15
E6488 {we,you}
E6489 {hear,see}
For example, the pattern P6480 (E6481,you,are) with E6481 {there,here} expands to "there you are" and "here you are", both of which are sentences in the corpus used.
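How a pattern such as P6480 unfolds into sentences can be made concrete with a small sketch. It assumes the listing has one entry per line, with patterns written as `P<id> (...)` followed by numeric columns and equivalence classes as `E<id> {...}`. The names `Entry`, `parse_listing` and `expand` are hypothetical, and identifiers that are not defined in the listing (such as E6466 above) are simply left unexpanded.

```python
import re
from itertools import product
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    kind: str            # "pattern" or "class"
    members: List[str]   # words or references to other entries
    stats: List[float]   # numeric columns after a pattern, empty for classes


LINE = re.compile(r"^([PE]\d+)\s+([({])(.*)[)}]\s*(.*)$")


def parse_listing(text: str) -> Dict[str, Entry]:
    """Parse one-entry-per-line output into a dictionary keyed by identifier."""
    entries = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        ident, bracket, body, rest = match.groups()
        kind = "pattern" if bracket == "(" else "class"
        entries[ident] = Entry(kind, body.split(","), [float(x) for x in rest.split()])
    return entries


def expand(symbol: str, entries: Dict[str, Entry]) -> List[str]:
    """All word strings a symbol can produce; unknown symbols stand for themselves."""
    entry = entries.get(symbol)
    if entry is None:
        return [symbol]
    if entry.kind == "class":
        # A class offers a choice: any one of its members.
        return [s for member in entry.members for s in expand(member, entries)]
    # A pattern is a sequence: combine one expansion per position.
    parts = [expand(member, entries) for member in entry.members]
    return [" ".join(combo) for combo in product(*parts)]


listing = """
P6480 (E6481,you,are) 0 1 3 18
E6481 {there,here}
"""
entries = parse_listing(listing)
print(expand("P6480", entries))
```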

Running it on the Hindi database
ID seq p-value gen len occ
P3487 (भी,प्रचलित) 0.0042799711 1 2 5
P3488 (के,E3489,भागों) 0 1 3 11
E3489 {विभिन्न,मुलायम}
P3490 (E3491,की,भाषा,में) 1.9848347e-05 1 4 6
E3491 {विज्ञान,बोल-चाल}
P3492 (में,E3493,लेप,करने,से) 0.0037000179 1 5 4
E3493 {मिला,घोलकर,पीसकर}
P3494 (विष,नष्ट,होता) 0.001850009 1 3 4
P3495 (समान,भाग) 0.0059099197 1 2 26

Running on the Commentary
P447 (the,E448,square) 0 1 3 65
E448 {large,big}
P449 (big,square) 7.212162e-05 1 2 38
P450 (the,little,E451) 0 1 3 49
E451 {circle,square}
P452 (the,big,box) 0 1 3 34
E455 {opens,closes,enters}
P456 (the,E457) 0.0055941939 1 2 91
E457 {bottom,corner,door,entrance}
P458 (P449,E459,the) 0 1 4 4
E459 {leaves,closes,enters}
E461 {--,inside,and,leaves,left,opens,closes,enters}

Precision and Recall
• Precision: the proportion of C_learner sentences (those generated by the Learner) that are accepted by the Teacher
• Recall: the proportion of C_target sentences (those generated by the Teacher) that are accepted by the Learner
• Values found: precision around 0.6 and recall around 0.5
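The two measures can be written out as a short sketch. The acceptance tests are passed in as plain functions, because how the Teacher and the Learner decide whether a sentence is acceptable is not specified in the slides. The function name `precision_recall` and the toy sentence sets are made up for illustration and are not the experimental data.

```python
from typing import Callable, Sequence, Tuple


def precision_recall(
    learner_sentences: Sequence[str],
    target_sentences: Sequence[str],
    teacher_accepts: Callable[[str], bool],
    learner_accepts: Callable[[str], bool],
) -> Tuple[float, float]:
    """Precision: share of learner output the teacher accepts.
    Recall: share of target sentences the learner accepts."""
    if not learner_sentences or not target_sentences:
        raise ValueError("both sentence samples must be non-empty")
    precision = sum(1 for s in learner_sentences if teacher_accepts(s)) / len(learner_sentences)
    recall = sum(1 for s in target_sentences if learner_accepts(s)) / len(target_sentences)
    return precision, recall


# Toy stand-ins: each "grammar" is just a set of sentences it accepts.
teacher_language = {"the big square", "the little circle", "the big box"}
learner_language = {"the big square", "the little circle", "the square big", "circle the"}

precision, recall = precision_recall(
    learner_sentences=sorted(learner_language),
    target_sentences=sorted(teacher_language),
    teacher_accepts=lambda s: s in teacher_language,
    learner_accepts=lambda s: s in learner_language,
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means the learned grammar over-generates; low recall means it fails to cover sentences the target grammar produces.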

References
[1] Heidi R. Waterfall, Ben Sandbank, Luca Onnis and Shimon Edelman. An empirical generative framework for computational modeling of language acquisition. Cambridge University Press, 2010.
[2] Zach Solan. PhD thesis, supervised by Professor David Horn, Professor Shimon Edelman, and Professor Eytan Ruppin. Tel Aviv University.

Thank You
