
REFERENTIAL CHOICE: FACTORS AND MODELING

REFERENTIAL CHOICE: FACTORS AND MODELING. Andrej A. Kibrik, Mariya V. Khudyakova, Grigoriy B. Dobrov, and Anastasia S. Linnik. aakibrik@gmail.com. Night Whites SPb, February 28, 2014.

Presentation Transcript


  1. REFERENTIAL CHOICE: FACTORS AND MODELING Andrej A. Kibrik, Mariya V. Khudyakova, Grigoriy B. Dobrov, and Anastasia S. Linnik aakibrik@gmail.com Night Whites SPb February 28, 2014

  2. Referential choice in discourse • When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including: • Full noun phrase • Proper name (e.g. Peter) • Description = common noun (with or without modifiers) (e.g. the tzar) • Mix: Peter the Great • Reduced NP, particularly a third person pronoun (e.g. he)

  3. Example (highlighting legend: Description, Proper name, Pronoun, Zero) • The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window. Once inside, she spends nearly four hours Ø measuring and diagramming each room in the 80-year-old house, Ø gathering enough information to Ø estimate what it would cost to rebuild it. She snaps photos of the buckled floors and the plaster that has fallen away from the walls.

  4. Research question • How is referential choice made?

  5. Why is this question important? • Reference is among the most basic cognitive operations performed by language users • Reference constitutes the lion’s share of all information in natural communication • Consider text manipulation according to the method of Biber et al. 1999: 230-232

  6. Referential expressions marked in green • The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.

  7. Referential expressions removed • The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.

  8. Referential expressions kept • The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.

  9. Types of referential devices: levels of granularity • We mostly concentrate on the two upper levels in this hierarchy • REG tradition: most attention to varieties of descriptive full NPs

  10. Multi-factorial character of referential choice • Multiple factors of referential choice • Properties of the discourse context: distance to antecedent, measured along the linear discourse structure (Givón) or along the hierarchical discourse structure (Fox, Kibrik); antecedent role (Centering theory) • Properties of the referent: referent animacy (Dahl); protagonisthood (Grimes) • ...

  11. Cognitive multi-factorial model of referential choice [Diagram: factors of referential choice (discourse context, referent’s properties) → referent activation in working memory → referential choice]

  12. Rhetorical distance • Distance along the hierarchical discourse structure between • the current point in discourse, where referential choice is to be made • the antecedent • Measured in elementary discourse units • roughly equaling clauses • Rhetorical structure theory by Mann and Thompson (RST) • Very important factor • RST Discourse Treebank corpus (Marcu et al.)
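
To make the contrast between linear and rhetorical distance concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical representation in which every elementary discourse unit (EDU) points to the EDU it is rhetorically attached to, and it counts the edges on the path between two EDUs. The `PARENT` table and the function names are invented for illustration; this is neither the RST Discourse Treebank format nor necessarily the exact measure used by the authors.

```python
def path_to_root(parent, edu):
    """Return the chain of EDUs from `edu` up to the root of the tree."""
    chain = [edu]
    while parent[edu] is not None:
        edu = parent[edu]
        chain.append(edu)
    return chain


def linear_distance(anaphor_edu, antecedent_edu):
    """Distance along the linear discourse structure, in EDUs."""
    return abs(anaphor_edu - antecedent_edu)


def rhetorical_distance(parent, anaphor_edu, antecedent_edu):
    """Number of edges between two EDUs in the simplified rhetorical tree."""
    up = path_to_root(parent, anaphor_edu)
    steps_from_antecedent = {
        edu: i for i, edu in enumerate(path_to_root(parent, antecedent_edu))
    }
    for steps_up, edu in enumerate(up):
        if edu in steps_from_antecedent:  # lowest common ancestor reached
            return steps_up + steps_from_antecedent[edu]
    raise ValueError("EDUs are not in the same tree")


# Hypothetical toy discourse (assumption): EDU 1 is the root, EDUs 2-4 form a
# chain of elaborations, and EDU 5 attaches directly back to EDU 1.
PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 1}

print(linear_distance(5, 1))              # 4
print(rhetorical_distance(PARENT, 5, 1))  # 1
```

In this toy case the antecedent is four EDUs away linearly but only one step away in the hierarchy, which is the kind of situation where the two distance measures make different predictions.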

  13. Example of a rhetorical graph from RST Discourse Treebank

  14. RefRhet and MoRA • RST Discourse Treebank + our annotation = RefRhet corpus • Subcorpus RefRhet 3 (2013-2014) • Annotation scheme MoRA (Moscow Referential Annotation)

  15. RefRhet 3 • 64 texts • 6294 markables • 1852 anaphor-antecedent pairs: 475 pronouns and 1377 full NPs (706 descriptions, 671 proper names)

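  16. Candidate factors of ref. choice • Some values are drawn from MoRA annotation • Some others are computed automatically [Table of candidate factors, with the factor-predicted variable and the discourse context factors marked; contents not preserved in the transcript]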

  17. Windows of the MMAX2 program

  18. Some properties of the MoRA scheme • Wide range of activation factors and their values • E.g. multiple values of the “grammatical role” factor • Annotation of groups • complex markables serving as antecedents • and-coordinate • or-coordinate • prepositional (children with their parents) • discontinuous

  19. A discontinuous group

  20. Tasks for machine learning • Candidate factors: • All potential parameters implemented in corpus annotation • Factor-predicted variable: • Form of referential expression (np_form) • Two-way task: • Full NP vs. pronoun • Three-way task: • Definite description vs. proper name vs. pronoun • Accuracy maximization: • Ratio of correct predictions to the overall number of instances
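
As a point of reference for the accuracy measure, the following Python sketch computes accuracy as defined on the slide and derives the majority-class baseline for both tasks from the RefRhet 3 class counts given on slide 15. The baseline figures are plain arithmetic on those counts, not results reported by the authors; the function names are illustrative.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Ratio of correct predictions to the overall number of instances."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline(actual):
    """Accuracy obtained by always predicting the most frequent class."""
    label, _ = Counter(actual).most_common(1)[0]
    return accuracy([label] * len(actual), actual)


# Class counts reported for RefRhet 3 (slide 15).
three_way = ["description"] * 706 + ["proper_name"] * 671 + ["pronoun"] * 475
# Two-way task: descriptions and proper names collapse into "full_np".
two_way = ["pronoun" if x == "pronoun" else "full_np" for x in three_way]

print(round(majority_baseline(two_way), 3))    # 0.744
print(round(majority_baseline(three_way), 3))  # 0.381
```

Any classifier trained on this data has to beat roughly 74% on the two-way task and roughly 38% on the three-way task to be doing better than a constant guess.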

  21. Machine learning methods (Weka, a data mining system) • Logical algorithms • Decision trees (C4.5) • Decision rules (JRip) • Logistic regression • Compositions • Boosting • Bagging • Quality control – the cross-validation method
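
The cross-validation idea can be sketched in a few lines of plain Python. Note the substitutions: instead of Weka and C4.5, the sketch uses a one-feature threshold rule (a decision stump) on distance to the antecedent, and the data are a hypothetical toy sample, not RefRhet. It is meant only to show how held-out folds are used for quality control.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def predict(distance, threshold):
    """Stump rule: short distance -> pronoun, otherwise full NP."""
    return "pronoun" if distance <= threshold else "full_np"


def train_stump(distances, labels):
    """Pick the threshold with the highest accuracy on the training data."""
    best_threshold, best_correct = 0, -1
    for t in sorted(set(distances)):
        correct = sum(predict(d, t) == y for d, y in zip(distances, labels))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def cross_validate(distances, labels, k):
    """Average held-out accuracy of the stump over k folds."""
    scores = []
    for train, test in k_fold_splits(len(labels), k):
        t = train_stump([distances[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(distances[i], t) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy data (assumption): (distance to antecedent, observed form).
DATA = [(1, "pronoun"), (4, "full_np"), (2, "pronoun"), (5, "full_np"),
        (1, "pronoun"), (3, "full_np"), (2, "pronoun"), (6, "full_np")]
distances = [d for d, _ in DATA]
labels = [y for _, y in DATA]

print(cross_validate(distances, labels, k=4))  # 1.0 on this separable toy set
```

The toy sample is perfectly separable by distance, so the score is trivially perfect; on real corpus data the held-out accuracy is what guards against overfitting to the training folds.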

  22. Results of machine learning on RefRhet 3 and MoRA

  23. Non-categorical referential choice (Kibrik 1999) • Cognitive plane: graded variable (referent activation, ranging from min to max) • Linguistic plane: binary variable

  24. Non-categorical referential choice • In many instances, more than one referential option can be used • Referential choice is less than fully categorical (cf. Belz & Varges 2007, van Deemter et al. 2012: 173–179) • In instances of intermediate activation, both the original text author and the algorithm: • more or less randomly make a categorical decision at the linguistic plane • do not always have to arrive at the same decision • Therefore, no model can predict the actual referential choice with 100% accuracy

  25. Experiment: Understanding (allegedly non-categorical) referential expressions • 9 texts in which the algorithms diverged in their prediction from the original referential choice • 9 original texts (proper name) and 9 altered texts (pronoun) distributed between 2 experimental lists • 60 participants • 1 experimental question + 2 control questions • If the instances of divergence are explained by intermediate referent activation, the accuracy in experimental questions should not be lower than the accuracy in control questions

  26. Experiment: results • Control questions – 84% • Questions about proper names – 84% • Questions about pronouns – 75% • If we exclude questions #2 and #5, the accuracy for questions about pronouns is 80%, not differing significantly from control and proper name questions • In general, the algorithm diverges from the original in places where that is acceptable, that is, where referent activation is intermediate

  27. Non-categorical referential choice • Sometimes referential choice allows more than one option • A proper model of referential choice must account for this property of human speakers • Our modeling procedures actually conform to this requirement

  28. Further studies • Explore logistic regression’s ability to evaluate the certainty of prediction • and attempt to correlate that with human assessment of non-categorical referential choice • as well as with the theoretical notion of intermediate referent activation • Cheap data modeling • Secondary referential options, such as demonstrative descriptions • Genres and referential choice
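
One way the first item might be approached is sketched below in Python: a logistic model outputs a probability of "pronoun", and predictions close to 0.5 are treated as candidates for intermediate referent activation. The feature names, weights, bias, and the 0.2 margin are all hypothetical values chosen for illustration; nothing here is fitted to RefRhet or taken from the authors' models.

```python
import math


def pronoun_probability(features, weights, bias):
    """Logistic-regression estimate of P(pronoun) for one referential slot."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def certainty(p):
    """0.0 when the model is undecided (p = 0.5), 1.0 when it is fully certain."""
    return abs(p - 0.5) * 2


def is_intermediate(p, margin=0.2):
    """Flag predictions whose probability lies within `margin` of 0.5."""
    return abs(p - 0.5) <= margin


# Hypothetical weights (assumption): not fitted to any corpus.
WEIGHTS = {"rhetorical_distance": -0.8, "animate": 1.0}
BIAS = 0.6

for distance in (1, 2, 5):
    features = {"rhetorical_distance": distance, "animate": 1}
    p = pronoun_probability(features, WEIGHTS, BIAS)
    print(distance, round(p, 2), round(certainty(p), 2), is_intermediate(p))
```

Under this reading, low-certainty cases are where a divergence between the model and the original author would be expected to be acceptable to human readers, which is the correlation the slide proposes to test.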

  29. Conclusions • Multi-factorial approach • Corpus large enough for machine-learning modeling • Results of prediction close to theoretical maximum • Account of the non-deterministic character of referential choice • This approach can be applied to a wide range of other linguistic choices

  30. Thank you for your attention
