
CS626 Assignments




Presentation Transcript


  1. CS626 Assignments Abhirut Gupta Mandar Joshi Piyush Dungarwal (Group No. 6)

  2. Trigram Implementation Details • Trigram probabilities were smoothed using bigram and unigram probabilities. • Lambdas were calculated using deleted interpolation. • Outside Viterbi, capitalization was used to improve the accuracy of the NP0 tag.
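The lambda estimation mentioned above can be sketched as follows. This is a minimal illustration of deleted interpolation (in the style of the TnT tagger), not the group's actual code; the `Counter` tables of tag n-gram counts are assumed inputs:

```python
from collections import Counter

def deleted_interpolation(trigrams, bigrams, unigrams, total_tokens):
    """Estimate weights (l1, l2, l3) for interpolating unigram, bigram,
    and trigram tag probabilities.

    For each observed trigram, its count is credited to the lambda whose
    leave-one-out maximum-likelihood estimate is largest.
    """
    l1 = l2 = l3 = 0.0
    for (t1, t2, t3), c in trigrams.items():
        # Leave-one-out estimates: subtract the trigram itself from the counts.
        tri = (c - 1) / (bigrams[(t1, t2)] - 1) if bigrams[(t1, t2)] > 1 else 0.0
        bi = (bigrams[(t2, t3)] - 1) / (unigrams[t2] - 1) if unigrams[t2] > 1 else 0.0
        uni = (unigrams[t3] - 1) / (total_tokens - 1)
        best = max(tri, bi, uni)
        if best == tri:
            l3 += c
        elif best == bi:
            l2 += c
        else:
            l1 += c
    s = l1 + l2 + l3
    return l1 / s, l2 / s, l3 / s
```

The smoothed trigram probability is then l1·P(t3) + l2·P(t3|t2) + l3·P(t3|t1,t2).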

  3. Unknown word handling • Unknown words were given small emission probabilities. • Considerable experimentation was done to fix the value of this emission probability.

  4. Trigram Accuracy

  5. Confusion Matrix • CJS: accuracy = 87.43% • Confused with: PRP = 9.35% • E.g. "as": CJS in "diagnosed as being HIV positive", PRP in "give AIDS greater recognition not as a disease" • VVB: accuracy = 70.72% • Confused with: VVI = 12.97% • E.g. forget, send, live, return, etc.

  6. Confusion Matrix • AV0: accuracy = 87.27% • Confused with: PRP = 4.67% • VVD: accuracy = 88.98% • Confused with: VVN = 6.49% • E.g. sent, lived, returned, etc.

  7. Insight • Nouns and verbs are often confused because: • Most verbs can also be used as nouns, e.g. laugh, people, etc. • Adverbs (which often precede verbs) can also be used as adjectives (which often precede nouns), e.g. fast, slow, etc.

  8. Next word prediction
  (W3, T3)* = argmax(W3, T3) P(W3, T3 | W1, W2, T1, T2)
            = argmax(W3, T3) P(T3 | W1, W2, T1, T2) · P(W3 | W1, W2, T1, T2, T3)
            = argmax(W3, T3) P(T3 | T1, T2) · P(W3 | W1, W2, T3)
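The joint argmax over (W3, T3) above can be sketched directly. This is an illustrative brute-force search, not the group's implementation; `p_trans` and `p_word` are hypothetical dictionaries holding P(T3 | T1, T2) and P(W3 | W1, W2, T3):

```python
import math

def predict_next(w1, w2, t1, t2, tags, vocab, p_trans, p_word):
    """Jointly pick (W3, T3) maximizing P(T3 | T1, T2) * P(W3 | W1, W2, T3).

    p_trans: dict mapping (t1, t2, t3) -> probability
    p_word:  dict mapping (w1, w2, t3, w3) -> probability
    """
    best, best_score = None, -math.inf
    for t3 in tags:
        pt = p_trans.get((t1, t2, t3), 0.0)
        if pt == 0.0:
            continue  # skip impossible tag transitions early
        for w3 in vocab:
            score = pt * p_word.get((w1, w2, t3, w3), 0.0)
            if score > best_score:
                best, best_score = (w3, t3), score
    return best
```

The alternate model on the next slide sums over T3 instead of maximizing jointly, which requires the full inner loop for every candidate word — consistent with the "extremely slow" drawback noted there.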

  9. Next word prediction (alternate model)
  W3* = argmaxW3 P(W3 | W1, W2, T1, T2)
      = argmaxW3 ΣT3 P(W3, T3 | W1, W2, T1, T2)
      = argmaxW3 ΣT3 P(T3 | W1, W2, T1, T2) · P(W3 | W1, W2, T1, T2, T3)
      = argmaxW3 ΣT3 P(T3 | T1, T2) · P(W3 | W1, W2, T3)
  • Drawbacks: • Extremely slow, without any considerable improvement in accuracy

  10. Next word prediction

  11. Generative vs. Discriminative (Bigram)

  12. Generative vs. Discriminative • The generative model possibly gives better accuracy than the discriminative model because we model the transition and lexical probabilities separately in the former. • Discriminative approach: T* = argmaxT Π P(ti | ti-1, wi-1)

  13. A* Results (compared with Trigram Viterbi after applying capitalization)

  14. Node Expansions

  15. • In the Trigram Viterbi implementation, capitalization is applied outside the algorithm. • This results in certain words being tagged differently by the two algorithms. • From the graph, A* expands fewer nodes than Viterbi; however, the node-expansion statistics do not take into account the comparisons made in the Viterbi algorithm.

  16. Beam search algorithm • An upper bound is maintained on the size of the open list of nodes • Results for size = 50: • Reduction in execution time: 56.62% • Accuracy: 78.28% • Results for size = 25: • Reduction in execution time: 76.50% • Accuracy: 63.24%
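The bounded open list described above can be sketched as a pruned Viterbi-style decoder. This is an illustrative toy, not the group's code; the `p_trans` and `p_emit` dictionaries (tag-transition and word-emission probabilities) and the `'<s>'` start symbol are assumptions:

```python
import heapq

def beam_viterbi(words, tags, p_trans, p_emit, beam_size=50):
    """Decode a tag sequence, keeping only the beam_size best partial
    sequences (the bounded open list) at each word position."""
    beam = [(1.0, [])]  # (score, partial tag sequence); higher score is better
    for w in words:
        candidates = []
        for score, seq in beam:
            prev = seq[-1] if seq else '<s>'
            for t in tags:
                s = score * p_trans.get((prev, t), 0.0) * p_emit.get((t, w), 0.0)
                if s > 0.0:
                    candidates.append((s, seq + [t]))
        # Prune the open list to the beam_size highest-scoring entries.
        beam = heapq.nlargest(beam_size, candidates, key=lambda x: x[0])
        if not beam:
            return None  # all extensions had zero probability
    return max(beam, key=lambda x: x[0])[1]
```

Shrinking `beam_size` cuts the number of expansions per position, which matches the trade-off reported on the slide: a smaller open list (25 vs. 50) saves more time but loses more accuracy.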

  17. Projection of Parse Trees

  18. Our Approach • From the English parse tree, we obtain the Noun Phrases. • Using an English-Hindi dictionary and the given translation, find corresponding groups in the Hindi sentence, and replace the English groups by the Hindi ones.

  19. • Replace the remaining words by the corresponding Hindi words from the sentence. • Now apply hard-coded inversion rules to the tree. • Example: PP -> P NP in English is inverted to PP -> NP P in Hindi
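The inversion step can be sketched as a recursive rewrite over the tree. This encoding (tuples of `(label, children)`, rules as child-position permutations) is an illustrative assumption, not the authors' actual representation:

```python
def invert(tree, rules):
    """Apply hard-coded child-reordering rules to a parse tree, bottom-up.

    A tree is (label, children), where children is either a word string
    (leaf) or a list of subtrees. rules maps (parent_label,
    child_label_sequence) to a permutation of child positions; e.g. the
    inversion PP -> P NP  =>  PP -> NP P is {('PP', ('P', 'NP')): (1, 0)}.
    """
    label, children = tree
    if isinstance(children, str):
        return tree  # leaf: (POS tag, word)
    children = [invert(c, rules) for c in children]
    perm = rules.get((label, tuple(c[0] for c in children)))
    if perm is not None:
        children = [children[i] for i in perm]
    return (label, children)
```

Each inversion rule is purely structural, which is why extra Hindi function words such as "ko" (slide 25) cannot be produced by this step and must be handled separately.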

  20. [Parse tree of the English sentence "By law all children of compulsory school age must get a proper full time education"]

  21. [The same tree with the English noun phrases replaced by their Hindi groups: kaanoon, sabhi bacchon, pure samay ki shiksha, anivarya school jaane ki umar]

  22. [The tree with the remaining English words replaced by Hindi words: ke anusaar, honi chahiye, praapt, ko]

  23. [The tree after applying the inversion rules, yielding Hindi word order ending in "honi chahiye"]

  24. Challenges • Translated sentences may have a different structure altogether. • It may not be possible to detect similar phrases. Example: • Hindi: आपका बच्चा ड्रग्स या सॉल्वैंट्स लेने के चक्कर में क्यों पड़ गया है, यदि आप उसके कारणों को समझते हैं, तो इस विषय में अपने बच्चों से बात करना आपके लिए काफ़ी आसान होगा । • English: If you understand the reasons why a child can get involved with drugs and solvents, it's much easier for you to talk to your children about it.

  25. Drawbacks • The parse trees produced may be "approximate". • Due to morphological differences, it may become difficult to produce tags for extra words created in translation. Example: "ko", "jaane ki" produced in the previous example.

  26. Yago • The Yago fact table was queried to find relevant subject-predicate-object pairs. • BFS was used to find paths from the start node to the goal node. • This approach gives all paths of minimum length.
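The BFS over the fact table can be sketched as follows. This is a toy illustration (treating each subject-predicate-object fact as an undirected edge), not the group's query code:

```python
from collections import deque

def all_shortest_paths(facts, start, goal):
    """Breadth-first search over subject-predicate-object triples,
    returning every path of minimum length from start to goal.

    facts: iterable of (subject, predicate, object) tuples.
    """
    graph = {}
    for s, p, o in facts:
        graph.setdefault(s, []).append(o)
        graph.setdefault(o, []).append(s)  # treat facts as undirected edges
    paths, found_depth = [], None
    seen_depth = {start: 1}  # depth = number of nodes on the path
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if found_depth is not None and len(path) > found_depth:
            break  # everything remaining is longer than the shortest paths
        if node == goal:
            found_depth = len(path)
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            d = seen_depth.get(nxt)
            # Revisit a node only at the same depth, so that all
            # distinct minimum-length paths are collected.
            if d is None or d == len(path) + 1:
                seen_depth[nxt] = len(path) + 1
                queue.append((nxt, path + [nxt]))
    return paths
```

For the Arjuna Award example on the next slide, the shortest path between the two players runs through the shared award node.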

  27. Yago • Interesting relations: • Sachin Tendulkar and Rahul Dravid have both won the Arjuna Award. • Dev Anand was born in Mumbai, died in London. • Rohit Sharma and Vikram Pandit were born in Nagpur.
