
Atanas Georgiev Chanev

Atanas Georgiev Chanev. PhD student in Cognitive Sciences and Education, University of Trento. Bachelor's: FMI, University of Plovdiv, Bulgaria. A PP-Attachment Conundrum for Bulgarian.


Presentation Transcript


  1. Atanas Georgiev Chanev PhD student in Cognitive Sciences and Education, University of Trento; Bachelor's: FMI, University of Plovdiv, Bulgaria

  2. A PP-Attachment Conundrum for Bulgarian Based on the parser I have implemented (an extension of the Earley-Stolcke algorithm) and the results I have obtained, a.k.a. the diploma work for my bachelor's

  3. Note: I won't discuss algorithms for handling PP attachment. I'll show that my approach fails to resolve PP-attachment ambiguities in most cases

  4. Contents: The problem The prerequisites The algorithm The grammar The results The PP-Attachment problem Future Work Acknowledgements Slides: 28

  5. The Problem: Parsing natural languages (Bulgarian) Shallow parsing vs. full parsing

  6. What Is Syntax? POS tagging? Phrase Structures Grammatical Relations Grammatical Functions?

  7. Constituent Structures: Rules like S -> NP VP; NP -> NP PP ... Problem: ambiguity. An approach to resolving ambiguity in Bulgarian – (Tanev 2001)
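As a toy illustration of the ambiguity such rules create (an English example and an invented mini-grammar, not the Bulgarian grammar from the talk; assumes the NLTK package is installed), the same PP can attach either to the object NP or to the VP:

    import nltk

    # A tiny grammar containing the two competing attachment rules from the slide.
    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> Det N | NP PP | 'I'
        VP -> V NP | VP PP
        PP -> P NP
        Det -> 'the'
        N  -> 'man' | 'telescope'
        V  -> 'saw'
        P  -> 'with'
    """)
    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I saw the man with the telescope".split()):
        print(tree)   # two trees: PP attached to the NP, or to the VP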

  8. The Prerequisites: A morphological processor for Bulgarian (Krushkov 1997) A POS tagger (Tachev 2001) ... I cover the grammar in a separate section

  9. The Algorithm: The Earley algorithm (Earley 1970) Three steps: Predictor, Scanner, Completer
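A minimal, self-contained sketch of those three operations (a toy recognizer over an invented English mini-grammar, not the presenter's Bulgarian implementation):

    # Each chart state is (lhs, rhs, dot, start); chart[i] holds states ending at position i.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["Det", "N"], ["NP", "PP"]],
        "VP": [["V", "NP"], ["VP", "PP"]],
        "PP": [["P", "NP"]],
    }
    LEXICON = {"the": "Det", "man": "N", "telescope": "N", "saw": "V", "with": "P"}

    def earley_recognize(words):
        chart = [set() for _ in range(len(words) + 1)]
        chart[0].add(("GAMMA", ("S",), 0, 0))          # dummy start state
        for i in range(len(words) + 1):
            added = True
            while added:                               # repeat until chart[i] stops growing
                added = False
                for lhs, rhs, dot, start in list(chart[i]):
                    if dot < len(rhs) and rhs[dot] in GRAMMAR:           # PREDICTOR
                        for expansion in GRAMMAR[rhs[dot]]:
                            state = (rhs[dot], tuple(expansion), 0, i)
                            if state not in chart[i]:
                                chart[i].add(state); added = True
                    elif dot < len(rhs):                                 # SCANNER
                        if i < len(words) and LEXICON.get(words[i]) == rhs[dot]:
                            chart[i + 1].add((lhs, rhs, dot + 1, start))
                    else:                                                # COMPLETER
                        for plhs, prhs, pdot, pstart in list(chart[start]):
                            if pdot < len(prhs) and prhs[pdot] == lhs:
                                state = (plhs, prhs, pdot + 1, pstart)
                                if state not in chart[i]:
                                    chart[i].add(state); added = True
        return ("GAMMA", ("S",), 1, 0) in chart[len(words)]

    print(earley_recognize("the man saw the man with the telescope".split()))  # True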

  10. Stolcke's extension: Each chart state is assigned two probabilities: an inner probability and a forward probability. They are updated differently in each step (predictor, scanner, completer) The stochastic extension is capable of resolving ambiguities (Stolcke 1993)
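Roughly, following Stolcke (1993) and glossing over his corrections for left-recursive prediction and unit completions (the notation here is a paraphrase, not taken from the slides): for a chart state with forward probability $\alpha$ and inner probability $\gamma$,

    Prediction of rule $Y \to \nu$:       $\alpha' \mathrel{+}= \alpha \cdot P(Y \to \nu), \qquad \gamma' = P(Y \to \nu)$
    Scanning of a terminal:               $\alpha' = \alpha, \qquad \gamma' = \gamma$
    Completion by a finished $Y$ state
    with inner probability $\gamma''$:    $\alpha' \mathrel{+}= \alpha \cdot \gamma'', \qquad \gamma' \mathrel{+}= \gamma \cdot \gamma''$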

  11. Are shallow trees better? Deeper trees always have smaller probabilities!
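A toy numeric illustration (the probabilities are invented): a tree's probability is the product of the probabilities of all rules used to build it, so every extra rule multiplies in another factor <= 1, and a deeper tree is penalized simply for being bigger.

    # Invented rule probabilities, for illustration only.
    flat_analysis = 0.10                       # one flat rule, e.g. NP -> Det N P Det N
    deep_analysis = 0.30 * 0.90 * 0.60 * 0.60  # four nested rules: NP -> NP PP, NP -> Det N, PP -> P NP, NP -> Det N
    print(flat_analysis, deep_analysis)        # 0.1 vs. 0.0972: the deeper tree scores lower regardless of its linguistic merit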

  12. + Basic Unification: A basic unification mechanism based on agreement constraints Full unification as described in (Jurafsky, Martin 2001), performed at each step, is too inefficient
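A minimal sketch of such an agreement check (the feature names are hypothetical; this stands in for, rather than reproduces, the presenter's mechanism): two constituents are allowed to combine only if they carry no conflicting values for the checked features.

    def agree(head, dependent, features=("gender", "number")):
        """Return True unless the two feature dictionaries conflict on a checked feature."""
        for f in features:
            a, b = head.get(f), dependent.get(f)
            if a is not None and b is not None and a != b:
                return False
        return True

    print(agree({"gender": "f", "number": "sg"}, {"gender": "f", "number": "sg"}))  # True
    print(agree({"gender": "f", "number": "sg"}, {"gender": "m", "number": "sg"}))  # False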

  13. The Grammar: Two versions of the grammar, collected from a mini-corpus of sentences in the newspaper-article register

  14. The mini-corpus: 5331 tokens, more than 450 sentences, grammatically and syntactically annotated

  15. The PPs: Two types of PPs: those modifying the verb (AdvPs) and those modifying the noun (PPs)

  16. Ambiguous POS tags: 'Shte' – future tense auxiliary or particle 'Govoreshtiqt' – verb or adjective, as in 'govoreshtiqt student' ('the speaking student')

  17. How to Assign Probabilities to the Rules:
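The slide's content (likely a figure) is not in the transcript; a minimal sketch of the standard recipe, which I assume was used here, is relative-frequency estimation over the annotated mini-corpus: P(A -> beta) = count(A -> beta) / count(A).

    from collections import Counter

    def estimate_rule_probs(treebank_rules):
        """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
        rule_counts = Counter(treebank_rules)                    # rules as (lhs, rhs) pairs
        lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
        return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

    # Toy input: rule occurrences read off annotated trees.
    rules = [("NP", ("Det", "N")), ("NP", ("Det", "N")), ("NP", ("NP", "PP")), ("PP", ("P", "NP"))]
    print(estimate_rule_probs(rules))   # NP -> Det N: 2/3, NP -> NP PP: 1/3, PP -> P NP: 1.0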

  18. The Results: Precision: 42.42% Recall: 66.00% F-measure: 51.65%

  19. How I define precision and recall: Precision = (number of correctly parsed sentences) / (number of sentences given any prediction) Recall = (number of sentences given any prediction) / (number of tested sentences)
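A small sketch of these sentence-level definitions together with the usual F-measure (the counts below are arbitrary, not the presenter's data):

    def parser_scores(correct, predicted, tested):
        """Precision, recall and F-measure under the sentence-level definitions above."""
        precision = correct / predicted    # correctly parsed / sentences given any prediction
        recall = predicted / tested        # sentences given any prediction / sentences tested
        f_measure = 2 * precision * recall / (precision + recall)
        return precision, recall, f_measure

    print(parser_scores(correct=10, predicted=25, tested=40))   # (0.4, 0.625, ~0.488)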

  20. The PP-Attachment Problem (plus the next 4 slides): How many of the correctly parsed sentences contain PPs? How many of the incorrectly parsed sentences contain PPs?

  21. A considerable amount of mistaken PPs: How many of the correctly parsed sentences contain PPs? 27.59% How many of the incorrectly parsed sentences contain PPs? 32.61% BUT: sentences that are not given any prediction also contain PPs, and AdvPs are sometimes not ambiguous, e.g. at the beginning of the sentence

  22. A bad parse:

  23. Another bad parse:

  24. A good parse:

  25. Conclusion: Stochastic context-free grammars are not powerful enough to deal with the PP-attachment problem (at least not with this approach)

  26. Future Work: Clause Splitter for Bulgarian A better grammar = a better corpus A better unification processor Semantic Constraints

  27. Acknowledgements:
  [1] Krushkov, Hr., Modeling and Building of Machine Dictionaries and Morphological Processors, PhD dissertation, Plovdiv University "Paisii Hilendarski", Plovdiv, 1997.
  [2] Tachev, G., A Stochastic Part-of-Speech Tagger, diploma thesis, Plovdiv University "Paisii Hilendarski", Plovdiv, 2001.
  [3] Tanev, Hr., Automatic Text Analysis and Ambiguity Resolution in Bulgarian, PhD dissertation, Plovdiv University "Paisii Hilendarski", Plovdiv, 2001.
  [4] Earley, J., An Efficient Context-Free Parsing Algorithm, Communications of the ACM, 13(2):94-102, 1970.
  [5] Stolcke, A., An Efficient Probabilistic Context-Free Parsing Algorithm That Computes Prefix Probabilities, Technical Report TR-93-065, International Computer Science Institute, Berkeley, CA, 1993 (revised 1994).
  [6] Jurafsky, D., Martin, J. H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, New Jersey, 2001.

  28. Thank You!
