Explore how ADIOS (Automatic Distillation Of Structure) can learn complex syntax, generate novel grammatical sentences, and support structure discovery in other fields. Real corpus data is converted into a format the algorithm can read, and the MEX criterion, training scripts, and generating scripts are then run on it. Precision of about 0.6 and recall of about 0.5 are reported, with references to the underlying work.
Artificial Intelligence • Computational Modelling of Grammar Acquisition • Rishabh Nigam, Shubhdeep Kochhar
The Problem • A computational framework for grammar acquisition • Unsupervised learning from a real corpus • Why this problem? • The algorithm can learn complex syntax, generate novel grammatical sentences, and prove useful in other fields that call for structure discovery from raw data, such as bioinformatics.
The Algorithm • ADIOS: Automatic Distillation Of Structure • What it does • The MEX criterion: build a 2D matrix M whose entry M[i,j] is either PR[i,j] (the right-moving probability) or PL[i,j] (the left-moving probability), then search the matrix for a steep decrease in PR[i,j] or PL[i,j]. Such a drop indicates a possible equivalence class between the two positions.
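The search for a steep decrease can be illustrated with a minimal sketch. It is not the released ADIOS code: it estimates only the right-moving probability PR from raw n-gram counts, starting from one position on one search path, and it omits both PL and the statistical significance test. The function names and the cutoff `eta` are assumptions made for this illustration.

```python
from collections import Counter

def ngram_counts(sentences, max_len):
    """Count every contiguous token sequence of up to max_len tokens."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent)):
            for j in range(i + 1, min(i + max_len, len(sent)) + 1):
                counts[tuple(sent[i:j])] += 1
    return counts

def right_probabilities(path, start, counts):
    """Estimate PR[start, j]: the probability that path[j] follows path[start:j]."""
    probs = []
    for j in range(start + 1, len(path)):
        prefix = counts[tuple(path[start:j])]
        full = counts[tuple(path[start:j + 1])]
        probs.append(full / prefix if prefix else 0.0)
    return probs

def steep_drops(probs, eta=0.65):
    """Indices k where probs[k] / probs[k - 1] falls below eta (illustrative cutoff)."""
    return [k for k in range(1, len(probs))
            if probs[k - 1] > 0 and probs[k] / probs[k - 1] < eta]

# Toy corpus: a fixed prefix followed by a slot that varies.
corpus = [s.split() for s in (
    "where is the big dog",
    "where is the big cat",
    "where is the big box",
    "where is the big ball",
)]
counts = ngram_counts(corpus, max_len=5)
probs = right_probabilities(corpus[0], 0, counts)
drops = steep_drops(probs)
```

On this toy corpus the probability stays at 1.0 along "where is the big" and falls to 0.25 at the final word, so the last position is flagged as a candidate slot for an equivalence class such as {dog, cat, box, ball}.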
Codes Used • MEX criterion • Training scripts • Generating scripts • Shimon Edelman and Zach Solan made this code available.
Work Done So Far • Converted the CHILDES database and a Hindi database (WordNet) into a format readable by the ADIOS algorithm. • Ran the algorithm on both the CHILDES database and the Hindi database. • Corresponded briefly with Shushobhan Nayak and ran the algorithm on his small commentary database.
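The conversion step might look like the sketch below. It assumes CHAT-style transcript lines, where utterance tiers begin with `*SPEAKER:`, and it assumes the target format is one tokenized sentence per line wrapped in begin and end markers. The `*` and `#` defaults, the function name, and the Latin-only tokenizer are assumptions for illustration; Hindi text would need a different tokenizer.

```python
import re

def chat_to_adios(chat_lines, begin="*", end="#"):
    """Turn CHAT-style utterance lines into one tokenized sentence per line."""
    out = []
    for line in chat_lines:
        # Utterance tiers start with '*SPEAKER:'; '@' headers and '%' tiers are skipped.
        match = re.match(r"\*[A-Za-z0-9]+:\s*(.*)", line)
        if not match:
            continue
        # Keep lowercase word tokens only, dropping punctuation and digits.
        tokens = re.findall(r"[a-z']+", match.group(1).lower())
        if tokens:
            out.append(" ".join([begin, *tokens, end]))
    return out

sample = [
    "@Begin",
    "*MOT:\tthere you are .",
    "%mor:\tadv|there pro|you v|be",
    "*CHI:\tI think so .",
]
converted = chat_to_adios(sample)
```

Keeping the markers as parameters makes it easy to match whatever sentence delimiters the training scripts actually expect.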
Running on the CHILDES Database
ID seq p-value gen len occ
E6478 {we,you,youse}
P6479 (I,think) 0.0068258047 1 2 201
P6480 (E6481,you,are) 0 1 3 18
E6481 {there,here}
P6482 (who,E6466) 0.0039371848 1 2 36
P6483 (P6439,P6434,Emily) 0 0.33333334 5.4000001 4
P6484 (he,is,E6485) 0.0058915019 1 3 28
E6485 {.,here}
P6486 (are,we,P6402) 0.0043362379 1 4 10
P6487 (wait,to,E6488,E6489) 6.1452389e-05 0.5 4 15
E6488 {we,you}
E6489 {hear,see}
For example, P6480 (E6481,you,are) expands to "there you are" and "here you are", both of which are sentences in the corpus used.
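The expansion of P6480 above can be reproduced with a small sketch that parses listing lines and expands a pattern into word sequences. The line layout is inferred from the listing itself, the function names are hypothetical, and any ID that does not appear in the parsed lines is treated as a plain word.

```python
import re
from itertools import product

def parse_units(lines):
    """Map each ID to ('P', members) for patterns or ('E', members) for classes."""
    units = {}
    for line in lines:
        match = re.match(r"([PE]\d+)\s+[({](.*?)[)}]", line)
        if match:
            unit_id, body = match.groups()
            units[unit_id] = (unit_id[0], body.split(","))
    return units

def expand(symbol, units):
    """List every word sequence that a symbol can produce."""
    if symbol not in units:
        return [[symbol]]  # a plain word, or an ID missing from the listing
    kind, members = units[symbol]
    if kind == "E":
        # Equivalence class: any one member may fill the slot.
        return [seq for m in members for seq in expand(m, units)]
    # Pattern: concatenate one expansion of each member, in order.
    parts = [expand(m, units) for m in members]
    return [sum(combo, []) for combo in product(*parts)]

listing = [
    "P6480 (E6481,you,are) 0 1 3 18",
    "E6481 {there,here}",
]
units = parse_units(listing)
sentences = [" ".join(seq) for seq in expand("P6480", units)]
```

Here `sentences` holds "there you are" and "here you are", matching the example in the listing.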
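Running on the Hindi Database (English glosses in square brackets)
ID seq p-value gen len occ
P3487 (भी,प्रचलित) 0.0042799711 1 2 5 [also, prevalent]
P3488 (के,E3489,भागों) 0 1 3 11 [of, E3489, parts]
E3489 {विभिन्न,मुलायम} [various, soft]
P3490 (E3491,की,भाषा,में) 1.9848347e-05 1 4 6 [E3491, of, language, in]
E3491 {विज्ञान,बोल-चाल} [science, colloquial speech]
P3492 (में,E3493,लेप,करने,से) 0.0037000179 1 5 4 [in, E3493, paste, doing, by]
E3493 {मिला,घोलकर,पीसकर} [mixed, having dissolved, having ground]
P3494 (विष,नष्ट,होता) 0.001850009 1 3 4 [poison, destroyed, is]
P3495 (समान,भाग) 0.0059099197 1 2 26 [equal, part]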
Running on the Commentary
ID seq p-value gen len occ
P447 (the,E448,square) 0 1 3 65
E448 {large,big}
P449 (big,square) 7.212162e-05 1 2 38
P450 (the,little,E451) 0 1 3 49
E451 {circle,square}
P452 (the,big,box) 0 1 3 34
E455 {opens,closes,enters}
P456 (the,E457) 0.0055941939 1 2 91
E457 {bottom,corner,door,entrance}
P458 (P449,E459,the) 0 1 4 4
E459 {leaves,closes,enters}
E461 {--,inside,and,leaves,left,opens,closes,enters}
Precision and Recall • Precision: the proportion of C_learner sentences (the learner's sentences) that are accepted by the Teacher • Recall: the proportion of C_target sentences (the target sentences) that are accepted by the Learner • Values found: around 0.6 precision and 0.5 recall
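The two measures can be written as one helper applied in both directions. This is a minimal sketch: the teacher and the learner are represented by hypothetical acceptance functions, and the toy sentence sets below are invented for illustration and are not the experimental data.

```python
def acceptance_rate(sentences, accepts):
    """Fraction of sentences for which accepts(sentence) is True."""
    sentences = list(sentences)
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if accepts(s)) / len(sentences)

def precision_recall(c_learner, c_target, teacher_accepts, learner_accepts):
    """Precision: teacher judges learner output. Recall: learner judges target."""
    precision = acceptance_rate(c_learner, teacher_accepts)
    recall = acceptance_rate(c_target, learner_accepts)
    return precision, recall

# Toy languages standing in for the teacher and the learned grammar.
teacher_language = {"the big square", "the little circle",
                    "the big box", "the little square"}
learner_language = {"the big square", "the little circle", "the square big"}

c_learner = ["the big square", "the little circle", "the square big"]
c_target = ["the big square", "the little circle",
            "the big box", "the little square"]

precision, recall = precision_recall(
    c_learner, c_target,
    teacher_accepts=lambda s: s in teacher_language,
    learner_accepts=lambda s: s in learner_language,
)
```

In this toy setup the teacher accepts two of the learner's three sentences and the learner accepts two of the four target sentences, giving a precision of 2/3 and a recall of 0.5.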
References
[1] Heidi R. Waterfall, Ben Sandbank, Luca Onnis, and Shimon Edelman, "An empirical generative framework for computational modeling of language acquisition", Cambridge University Press, 2010.
[2] Zach Solan, PhD thesis, supervised by Professor David Horn, Professor Shimon Edelman, and Professor Eytan Ruppin, Tel Aviv University.