
Are Distributional Dimensions Semantic Features?


Presentation Transcript


  1. Are Distributional Dimensions Semantic Features? Katrin Erk, University of Texas at Austin. Meaning in Context Symposium, München, September 2015. Joint work with Gemma Boleda.

  2. Semantic features by example: Katz & Fodor • The different meanings of a word are characterized by lists of semantic features, e.g. one reading of bachelor: (Human) (Male) [who has never married]

  3. Semantic features • In linguistics: Katz & Fodor, Wierzbicka, Jackendoff, Bierwisch, Pustejovsky, Asher, … • In computational linguistics/AI: Schank, Wilks, Masterman, Sowa, … • Examples: Schank's Conceptual Dependency representations; “drink” in preference semantics (Wilks): ((*ANI SUBJ) (((FLOW STUFF) OBJE) (MOVE CAUSE)))

  4. Semantic features: Characteristics • Primitive (not themselves defined), unanalyzable • Small set • Lexicalized in all languages • Combined, they characterize the semantics of all lexical expressions in all languages • Precise, fixed meaning, which is not part of language • Wilks: not so • Individually enable inferences • Feature lists or complex graphs • Compiled from: Wierzbicka, Geeraerts, Schank

  5. Uses of semantic features • Event structure in the lexical semantics of verbs (Levin): • change-of-state verbs: [[x ACT] CAUSE [BECOME [y <result-state>]]] • Handle polysemy (Pustejovsky, Asher) • Characterize selectional constraints (e.g. in VerbNet) • Characterize synonyms, also cross-linguistically (application: translation) • Enable inferences: John is a bachelor ⇒ John is unmarried, John is a man

  6. Are distributional dimensions semantic features? A distributional vector for alligator, computed from UKWaC + Wikipedia + BNC + Gigaword with a 2-word window and a PPMI transform: believe-v 0.794065, american-a 2.245667, kill-v 1.946722, consider-v 0.047781, seem-v 0.410991, turn-v 0.919250, side-n 0.098926, serve-v 0.479459, involve-v 0.435661, report-v 0.483651, little-a 1.175299, big-a 1.468021, water-n 1.806485, attack-n 1.795050, much-a 0.011354, …
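
To make the recipe behind this vector concrete, here is a minimal sketch of a count-based model with a 2-word window and a PPMI transform. The toy corpus and tokenization are placeholders; the actual alligator vector was computed from the much larger corpora named above.

```python
# Sketch: count-based distributional vectors with a 2-word window and PPMI weighting.
# The tiny "corpus" below is a placeholder.
import math
from collections import Counter, defaultdict

corpus = [
    "the alligator lurked in the water near the bank".split(),
    "the crocodile attacked the bird near the river".split(),
]

window = 2
cooc = defaultdict(Counter)   # cooc[target][context] = co-occurrence count
for sent in corpus:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[target][sent[j]] += 1

# PPMI transform: ppmi(t, c) = max(0, log( p(t, c) / (p(t) p(c)) ))
total = sum(sum(ctr.values()) for ctr in cooc.values())
t_marg = {t: sum(ctr.values()) for t, ctr in cooc.items()}
c_marg = Counter()
for ctr in cooc.values():
    c_marg.update(ctr)

def ppmi_vector(target):
    vec = {}
    for c, n in cooc[target].items():
        pmi = math.log((n / total) / ((t_marg[target] / total) * (c_marg[c] / total)))
        if pmi > 0:
            vec[c] = pmi
    return vec

print(ppmi_vector("alligator"))
```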

  7. Are distributional dimensions semantic features? • [The] differences between vector space encoding and more familiar accounts of meaning is easy to exaggerate. For example, a vector space encoding is entirely compatible with the traditional doctrine that concepts are ‘bundles’ of semantic features. Indeed, the latter is a special case of the former, the difference being that […] semantic dimensions are allowed to be continuous. (Fodor and Lepore 1999: All at Sea in Semantic Space) (About connectionism and particularly Churchland, not distributional models)

  8. Are distributional dimensions semantic features? • If so, they either address or inherit methodological problems: • Coverage of a realistic vocabulary • Empirically determining semantic features • Meaning creep: predicates used in Cyc did not stay stable in their meaning over the years (Wilks 2008)

  9. Are distributional dimensions semantic features? • If so, they inherit theoretical problems: • Lewis 1970: “Markerese” • Fodor et al. 1980, Against Definitions; Fodor and Lepore 1999, All at Sea in Semantic Space • Asymmetry between words and primitives: • What makes the primitives more basic? • Also, how can people communicate if their semantic spaces differ?

  10. Outline • Differences between distributional dimensions and semantic features • Redefining the dichotomy • No dichotomy after all • Integrated inference

  11. Semantic features: Characteristics • Primitive (not themselves defined), unanalyzable • Small set • Lexicalized in all languages • Combined, they characterize semantics of all lexical expressions in all languages • Precise, fixed meaning, not part of language. • Individually enable inferences • Feature lists or complex graphs

  12. Neither primitive nor with a fixed meaning • Not unanalyzable: any distributional feature can in principle be a distributional target • Compare: target and dimensions as a graph (with similarity determined on the basis of random walks) • [Diagram: a graph with a target node and dimension nodes d1, d2, d3, dd1]
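
As an illustration of the graph view, here is a small sketch that scores the relatedness of every node to the target with a random walk with restart. The node names follow the diagram; the edge structure, restart probability, and iteration count are illustrative choices, not values from the talk.

```python
# Sketch: target and dimensions as a graph, with relatedness read off random walks.
import numpy as np

nodes = ["target", "d1", "d2", "d3", "dd1"]
idx = {n: i for i, n in enumerate(nodes)}

# Illustrative edges: the target is linked to its dimensions, and d1 to a further node dd1.
edges = [("target", "d1"), ("target", "d2"), ("target", "d3"), ("d1", "dd1")]
A = np.zeros((len(nodes), len(nodes)))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

# Row-normalize to obtain transition probabilities.
P = A / A.sum(axis=1, keepdims=True)

def walk_scores(start, restart=0.15, steps=50):
    """Random walk with restart from `start`; the visit probabilities can be
    read as relatedness of every node to the start node."""
    e = np.zeros(len(nodes)); e[idx[start]] = 1.0
    p = e.copy()
    for _ in range(steps):
        p = (1 - restart) * P.T @ p + restart * e
    return dict(zip(nodes, p.round(3)))

print(walk_scores("target"))
```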

  13. Neither primitive nor with a fixed meaning • But are they treated as unanalyzed in practice? • Features in a vector are usually not analyzed further • SVD, topic modeling, prediction-based models: • induce latent features • exploiting distributional properties of the features • Are latent features unanalyzable? No, they remain linked to the original dimensions • No fixed meaning: distributional features can be ambiguous
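
A minimal sketch of how latent features can be induced and how they stay linked to the original dimensions, here using truncated SVD over a random placeholder matrix (topic models and prediction-based models induce their latent features differently):

```python
# Sketch: latent features via truncated SVD of a (targets x contexts) matrix,
# and how each latent dimension remains linked to the original context dimensions.
# The matrix M below is random placeholder data.
import numpy as np

rng = np.random.default_rng(0)
contexts = ["kill-v", "water-n", "attack-n", "ride-v", "road-n", "engine-n"]
M = rng.random((10, len(contexts)))          # 10 targets x 6 context dimensions

k = 2
U, S, Vt = np.linalg.svd(M, full_matrices=False)
latent = U[:, :k] * S[:k]                    # targets in the k-dimensional latent space
print("targets in latent space:", latent.shape)

# Each latent feature is a weighted combination of the original dimensions:
for i, row in enumerate(Vt[:k]):
    top = sorted(zip(contexts, row), key=lambda x: -abs(x[1]))[:3]
    print(f"latent feature {i}: " + ", ".join(f"{c} ({w:+.2f})" for c, w in top))
```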

  14. Then is it “Markerese”? • Inference = deriving something non-distributional from distributional representations • Inference from relations to other words • “X cause Y”, “Y trigger X” occur with similar X, Y, hence they are probably close in meaning • “alligator” appears in a subset of the contexts of “animal”, hence alligators are probably animals • Inference from co-occurrence with extralinguistic information • Distributional vectors linked to images for the same target • Alligators are similar to crocodiles, crocodiles are listed in the ontology as animals, hence alligators are probably animals

  15. No individual inferences • Distributional representation as a whole, in the aggregate, allows for inferences using aggregate techniques: • Distributional similarity • Distributional inclusion • Whole-vector mappings to visual vectors
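
Two of these aggregate techniques in a minimal sketch, with toy vectors standing in for corpus-derived ones: cosine similarity over whole vectors, and a simple inclusion score (in the spirit of Weeds precision) that asks how much of the narrower word's feature mass is covered by the broader word.

```python
# Sketch: aggregate inference over whole vectors. Toy vectors are placeholders.
import math

alligator = {"kill-v": 1.9, "water-n": 1.8, "attack-n": 1.8, "big-a": 1.5}
animal    = {"kill-v": 1.1, "water-n": 0.9, "attack-n": 0.7, "big-a": 1.0,
             "eat-v": 1.3, "wild-a": 0.8, "species-n": 1.5}

def cosine(u, v):
    dot = sum(u[d] * v.get(d, 0.0) for d in u)
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())))

def inclusion(narrow, broad):
    """Share of the narrower word's feature mass that also occurs with the
    broader word (a Weeds-precision-style inclusion score)."""
    covered = sum(w for d, w in narrow.items() if d in broad)
    return covered / sum(narrow.values())

print("similarity(alligator, animal):", round(cosine(alligator, animal), 3))
print("inclusion(alligator in animal):", round(inclusion(alligator, animal), 3))
```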

  16. No individual inferences • Feature-based inference possible with “John Doe” features: • Take text representation • Take apart into features that are individually almost meaningless • Aggregate of such features allows for inferences

  17. Outline • Differences between distributional dimensions and semantic features • Redefining the dichotomy • No dichotomy after all • Integrated inference

  18. Redefining the dichotomy • Not semantic features versus distributional dimensions: individual features versus aggregate features • Individual features: • Individually allow for inferences • May be relevant to grammar • Are introspectively salient • Not necessarily primitive • Also hypernyms and synonyms • Aggregate features: • May be individually almost meaningless • Allow for aggregate inference • Two modes of inference: individual and aggregate

  19. Individual features in distributional representations • Some distributional dimensions can be cognitively relevant features • Thill et al. 2014: because distributional models focus on how words are frequently used, they point to how humans experience concepts • Freedom (features from Baroni & Lenci 2010): • positive events: guarantee, secure, grant, defend, respect • negative events: undermine, deny, infringe on, violate

  20. Individual features in distributional representations • Approaches that find cognitively plausible features distributionally: • Almuhareb & Poesio 2004 • Cimiano & Wenderoth 2007 • Schulte im Walde et al. 2008: German association norms • Baroni et al. 2010: STRUDEL • Baroni & Lenci 2010: Distributional Memory • Devereux et al. 2010: dependency paths extracted from Wikipedia

  21. Individual features in distributional representations • Difficult: only a small fraction of human-elicited features can be retrieved • Baroni et al. 2010: distributional features tend to be different from human-elicited features • preference for “‘actional’ and ‘situated’ descriptions” • motorcycle: • elicited: wheels, dangerous, engine, fast • distributional: ride, sidecar, park, road

  22. Outline • Differences between distributional dimensions and semantic features • Redefining the dichotomy • No dichotomy after all • Integrated inference

  23. Not a competition • Use both kinds of features! • Computational perspective: • Distributional features are great • learned automatically • enable many inferences • Human-defined semantic features are great • less noisy • enable inferences with more certainty • enable inferences that distributional models do not provide • How can we integrate the two?

  24. Speculation: Learning both individual and aggregate features • Learner makes use of features from textual environment • Some features almost meaningless, others more meaningful • Some of them relevant to grammar (CAUSE, BECOME) • Both meaningful and near-meaningless features enter aggregate inference • Only certain features allow individual inference • (Unclear: This should not be feature lists, there is structure! But where does that fit in this picture?)

  25. Outline • Differences between distributional dimensions and semantic features • Redefining the dichotomy • No dichotomy after all • Integrated inference

  26. Inferring individual features from aggregates • Johns and Jones 2012: • Compute the weight of the feature bird for nightingale as the summed similarity of nightingale to known birds • Fagarasan/Vecchi/Clark 2015: • Learn a mapping from distributional vectors to vectors of individual features • Herbelot/Vecchi 2015: • Learn a mapping from distributional space to a “set-theoretic space”, vectors of quantified individual features (ALL apes are muscular, SOME apes live on coasts)
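
A minimal sketch of the shared idea behind these mapping approaches: learn a linear map from distributional space to a space of individual features on known words, then apply it to new words. The data is random placeholder material, and ridge regression stands in for whatever estimator a particular paper uses (e.g. partial least squares).

```python
# Sketch: learn a linear map from distributional space to a space of
# individual (feature-norm) features, then predict features for a new word.
# Random placeholder data throughout.
import numpy as np

rng = np.random.default_rng(1)
d_dist, d_feat, n_words = 50, 10, 200

X = rng.normal(size=(n_words, d_dist))                        # distributional vectors
true_map = rng.normal(size=(d_dist, d_feat))
Y = X @ true_map + 0.1 * rng.normal(size=(n_words, d_feat))   # individual-feature vectors

lam = 1.0                                                     # ridge regularization strength
W = np.linalg.solve(X.T @ X + lam * np.eye(d_dist), X.T @ Y)  # closed-form ridge solution

new_word_vec = rng.normal(size=(d_dist,))
predicted_features = new_word_vec @ W                         # inferred individual features
print(predicted_features.round(2))
```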

  27. Inferring individual features from aggregates • Gupta et al 2015: • Regression to learn properties of unknown cities/countries from those of known cities/countries • Snow/Jurafsky/Ng 2006: • Infer location of a word in the WordNet hierarchy using a distributional co-hyponymy classifier

  28. Individual features influencing aggregate representations • Andrews/Vigliocco/Vinson 2009, Roller/Schulte im Walde 2013: topic modeling that includes known individual features of the words in the text • Faruqui et al. 2015: update vector representations to better match known synonymy, hypernymy, hyponymy information
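
A minimal sketch of a retrofitting-style update in the spirit of Faruqui et al. 2015: each word vector is repeatedly pulled toward its lexicon neighbours while being held close to its original distributional vector. The toy vectors, lexicon, and uniform weights are illustrative simplifications.

```python
# Sketch: retrofitting-style update. Toy vectors and lexicon are placeholders.
import numpy as np

vectors = {                       # original distributional vectors
    "alligator": np.array([1.0, 0.2, 0.1]),
    "crocodile": np.array([0.9, 0.1, 0.3]),
    "animal":    np.array([0.4, 0.8, 0.2]),
}
lexicon = {                       # known lexical links, e.g. synonymy/hypernymy
    "alligator": ["crocodile", "animal"],
    "crocodile": ["alligator", "animal"],
    "animal":    ["alligator", "crocodile"],
}

retro = {w: v.copy() for w, v in vectors.items()}
alpha, beta = 1.0, 1.0            # weight of the original vector vs. each neighbour
for _ in range(10):               # a few sweeps are usually enough to converge
    for w, neighbours in lexicon.items():
        neighbour_sum = sum(retro[n] for n in neighbours)
        retro[w] = (beta * neighbour_sum + alpha * vectors[w]) / (beta * len(neighbours) + alpha)

for w, v in retro.items():
    print(w, v.round(3))
```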

  29. Individual features influencing aggregate representations • Boyd-Graber/Blei/Zhu 2007: • WordNet hierarchy as part of a topic model • Generate a word: choose a topic, then walk down the WN hierarchy based on the topic • Aim: best WN sense for each word in context • Riedel et al. 2013, Rocktäschel et al. 2015: Universal Schema • Relation characterized by a vector of named-entity pairs (the entity pairs that fill the relation) • Both human-defined and corpus-extracted relations • Matrix factorization over the union of human-defined and corpus-extracted relations • Predict whether a relation holds of an entity pair
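
A minimal sketch of the universal-schema idea: entity pairs and relations (both schema relations and textual patterns) are embedded so that a low-rank factorization reconstructs the observed cells, and the reconstructed scores can be read as predictions for relation/entity-pair combinations. The data and the plain logistic factorization below are illustrative, not the training setup of Riedel et al. 2013.

```python
# Sketch: universal-schema-style matrix factorization. Rows are entity pairs,
# columns are relations (schema relations and textual patterns); observed cells are 1.
import numpy as np

pairs = ["(Austin, Texas)", "(München, Bayern)", "(Paris, France)"]
relations = ["located-in", "capital-of", "X lies in Y"]
observed = np.array([[1, 0, 1],
                     [1, 0, 1],
                     [1, 1, 0]], dtype=float)

rng = np.random.default_rng(2)
k, lr = 4, 0.1
P = rng.normal(scale=0.1, size=(len(pairs), k))        # entity-pair embeddings
R = rng.normal(scale=0.1, size=(len(relations), k))    # relation embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(500):                                    # gradient steps on a logistic loss
    scores = sigmoid(P @ R.T)
    grad = scores - observed
    P, R = P - lr * grad @ R, R - lr * grad.T @ P

# Reconstructed probabilities that each relation holds of each entity pair:
print(sigmoid(P @ R.T).round(2))
```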

  30. Conclusion • Distributional features are not semantic features: • Not primitive • Inference from relations between word representations, co-occurrence with extra-linguistic information • Not (necessarily) individually meaningful • Inference from the aggregate of features • Two modes of inference: individual and aggregate • Use both individual and aggregate features • How to integrate the two, and infer one from the other?

  31. References • Almuhareb, A., & Poesio, M. (2004). Attribute-based and value-based clustering: An evaluation. Proceedings of EMNLP, 1–8. • Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and distributional data to learn semantic representations. Psychological Review, 116(3), 463–498. • Asher, N. (2011). Lexical meaning in context: A web of words. Cambridge University Press. • Baroni, M., Murphy, B., Barbu, E., & Poesio, M. (2010). Strudel: A corpus-based semantic model based on properties and types. Cognitive Science, 34(2), 222–254. • Baroni, M., & Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4), 673–721. • Bierwisch, M. (1969). On certain problems of semantic representation. Foundations of Language, 5, 153–184. • Boyd-Graber, J., Blei, D. M., & Zhu, X. (2007). A topic model for word sense disambiguation. Proceedings of EMNLP.

  32. References • Cimiano, P., & Wenderoth, J. (2007). Automatic acquisition of ranked qualia structures from the Web. Proceedings of ACL, 888–895. • Devereux, B., Pilkington, N., Poibeau, T., & Korhonen, A. (2010). Towards unrestricted, large-scale acquisition of feature-based conceptual representations from corpus data. Research on Language and Computation, 7(2–4), 137–170. • Fagarasan, L., Vecchi, E., & Clark, S. (2015). From distributional semantics to feature norms: Grounding semantic models in human perceptual data. Proceedings of IWCS. • Faruqui, M., Dodge, J., Jauhar, S., Dyer, C., Hovy, E., & Smith, N. (2015). Retrofitting word vectors to semantic lexicons. Proceedings of NAACL. • Fodor, J., Garrett, M. F., Walker, E. C. T., & Parkes, C. H. (1980). Against definitions. Cognition, 8(3), 263–367. • Fodor, J., & Lepore, E. (1999). All at sea in semantic space: Churchland on meaning similarity. The Journal of Philosophy, 96(8), 381–403. • Geeraerts, D. (2009). Theories of Lexical Semantics. Oxford University Press.

  33. References • Gupta, A., Boleda, G., Baroni, M., & Pado, S. (2015). Distributional vectors encode referential attributes. Proceedings of EMNLP. • Herbelot, A., & Vecchi, E. M. (2015). Building a shared world: Mapping distributional to model-theoretic semantic spaces. Proceedings of EMNLP. • Jackendoff, R. (1990). Semantic Structures. MIT Press. • Johns, B. T., & Jones, M. N. (2012). Perceptual inference through global lexical similarity. Topics in Cognitive Science, 4(1), 103–120. • Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39(2), 170. • Lewis, D. (1970). General semantics. Synthese, 22(1), 18–67. • Pustejovsky, J. (1991). The generative lexicon. Computational Linguistics, 17(4).

  34. References • Rappaport Hovav, M., & Levin, B. (2001). An event structure account of English resultatives. Language, 77(4). • Riedel, S., Yao, L., McCallum, A., & Marlin, B. (2013). Relation extraction with matrix factorization and universal schemas. Proceedings of NAACL. • Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting logical background knowledge into embeddings for relation extraction. Proceedings of NAACL. • Roller, S., & Schulte im Walde, S. (2013). A multimodal LDA model integrating textual, cognitive and visual modalities. Proceedings of EMNLP. • Schank, R. (1969). A conceptual dependency parser for natural language. Proceedings of COLING. • Schulte im Walde, S., Melinger, A., Roth, M., & Weber, A. (2008). An empirical characterisation of response types in German association norms. Research on Language and Computation, 6(2), 205–238.

  35. References • Snow, R., Jurafsky, D., & Ng, A. Y. (2006). Semantic taxonomy induction from heterogenous evidence. Proceedings of ACL-COLING, 801–808. • Sowa, J. (1992). Logical structures in the lexicon. In J. Pustejovsky & S. Bergler (Eds.), Lexical semantics and knowledge representation (LNCS, Vol. 627, pp. 39–60). • Thill, S., Pado, S., & Ziemke, T. (2014). On the importance of a rich embodiment in the grounding of concepts: Perspectives from embodied cognitive science and computational linguistics. Topics in Cognitive Science, 6(3), 545–558. • Wierzbicka, A. (1996). Semantics: Primes and Universals. Oxford University Press. • Wilks, Y. (2008). What would a Wittgensteinian computational linguistics be like? Presented at the AISB workshop on computers and philosophy, Aberdeen.
