
Models of Linguistic Choice


Presentation Transcript


  1. Models of Linguistic Choice Christopher Manning

  2. Explaining more: How do people choose to express things? • What people do say has two parts: • Contingent facts about the world • People in the Bay Area have talked a lot about electricity, housing prices, and stocks lately • The way speakers choose to express ideas within a situation using the resources of their language • People don’t often put that-clauses pre-verbally: • That we will have to revise this program is almost certain • We’re focusing on linguistic models of the latter choice

  3. How do people choose to express things? • Simply delimiting a set of grammatical sentences provides only a very weak description of a language, and of the ways people choose to express ideas in it • Probability densities over sentences and sentence structures can give a much richer view of language structure and use • In particular, we find that the same soft generalizations and tendencies of one language often appear as (apparently) categorical constraints in other languages • Linguistic theory should be able to uniformly capture these constraints, rather than only recognizing them when they are categorical

  4. Probabilistic Models of Choice • P(form|meaning, context) • Looks difficult to define. We’re going to define it via features • A feature is anything we can measure/check • P(form|f1, f2, f3, f4, f5) • A feature might be “3rd singular subject”, “object is old information”, “addressee is a friend”, “want to express solidarity”
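As a rough illustration of this idea (not from the slides), a feature can be treated as any binary property we can check of a candidate form in its context. A minimal Python sketch, with invented feature names and a toy dictionary standing in for the context:

```python
# Sketch: a feature is anything we can measure or check.  Here each feature
# is a binary property of a candidate form in a (toy) context dictionary.
# Feature names and context keys are invented for illustration.

def third_singular_subject(form, context):
    return 1 if context.get("subject_agreement") == "3sg" else 0

def object_is_old_information(form, context):
    return 1 if context.get("object_status") == "old" else 0

def addressee_is_friend(form, context):
    return 1 if context.get("addressee") == "friend" else 0

FEATURES = [third_singular_subject, object_is_old_information, addressee_is_friend]

def feature_vector(form, context):
    """The measurements f1, ..., fn that P(form | f1, ..., fn) conditions on."""
    return [f(form, context) for f in FEATURES]

print(feature_vector("The plan was approved by us last week",
                     {"object_status": "old", "addressee": "friend"}))
# -> [0, 1, 1]
```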

  5. Constraints = Features = Properties

     Input: approve<1pl [new], plan [old]>

                                                Discourse       Person       Linking
                                                f1 *Su/Newer    f2 *3>1/2    f3 *Ag/Non-subj
     We approved the plan last week                  1               0              0
     The plan was approved by us last week           0               1              1

     Explaining language via (probabilistic) constraints
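The same tableau written down as data (just the slide's candidates and violation counts), in the form the sketches further below work over:

```python
# The tableau above as data: each candidate realization of the input
# approve<1pl [new], plan [old]> is paired with its violation counts
# for f1 (*Su/Newer), f2 (*3>1/2), f3 (*Ag/Non-subj).
CONSTRAINTS = ["*Su/Newer", "*3>1/2", "*Ag/Non-subj"]

TABLEAU = {
    "We approved the plan last week":        [1, 0, 0],
    "The plan was approved by us last week": [0, 1, 1],
}
```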

  6. Explaining language via (probabilistic) constraints • Categorical/constraint-based grammar [GB, LFG, HPSG, …] • All constraints must be satisfied; to get elsewhere conditions / emergence of the unmarked, complex negated conditions need to be added. • Optimality Theory • Highest ranked differentiating constraint always determines things. Emergence of the unmarked. Single winner: no variable outputs. No ganging up. • Stochastic OT • Probabilistic noise at evaluation time allows variable rankings and hence a distribution over multiple outputs. No ganging up. • Generalized linear models (e.g., Varbrul)

  7. A theory with categorical feature combination • In a certain situation you can predict a single output or no well-formed output • No model of gradient grammaticality • No way to model variation • Or you can predict a set of outputs • Can’t model their relative frequency • Categorical models of constraint combination allow no room for soft preferences or for constraints combining together to make an output dispreferred or impossible (“ganging up” or “cumulativity”)

  8. Optimality Theory • Prince and Smolensky (1993/2005!): • Provide a ranking of constraints (ordinal model) • Highest differentiating constraint determines winner • “When the scalar and the gradient are recognized and brought within the purview of theory, Universal Grammar can supply the very substance from which grammars are built: a set of highly general constraints, which, through ranking, interact to produce the elaborate particularity of individual languages.” • No variation in output (except if ties) • No cumulativity of constraints
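A minimal sketch of classical OT evaluation, assuming a total ranking of the three constraints from the tableau above: walk down the ranking and let the highest-ranked constraint that differentiates the remaining candidates decide. Candidate strings and violation counts are repeated so the sketch is self-contained.

```python
# Classical OT evaluation: at each constraint, keep only the candidates with
# the fewest violations.  The highest-ranked differentiating constraint decides
# the winner; there is no cumulativity and (bar ties) no variation.

def ot_winners(tableau, ranking):
    """tableau: {candidate: {constraint: violation count}}; ranking: high-to-low."""
    candidates = list(tableau)
    for constraint in ranking:
        fewest = min(tableau[c][constraint] for c in candidates)
        candidates = [c for c in candidates if tableau[c][constraint] == fewest]
        if len(candidates) == 1:
            break
    return candidates  # a single winner, or a set of tied candidates

tableau = {
    "We approved the plan last week":        {"*Su/Newer": 1, "*3>1/2": 0, "*Ag/Non-subj": 0},
    "The plan was approved by us last week": {"*Su/Newer": 0, "*3>1/2": 1, "*Ag/Non-subj": 1},
}
print(ot_winners(tableau, ["*Su/Newer", "*3>1/2", "*Ag/Non-subj"]))
# -> ['The plan was approved by us last week']
```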

  9. Creating more ties • One way to get more variation is to create more ties by allowing various forms of floating constraint rankings or unordering of constraints • If you have lots of ways of deriving a form from underlying meanings, then you can count the number of derivations • Anttila (1997) • (I confess I’m sceptical of such models; inter alia they inherit the problems of ties in OT: they’re extremely unstable.)

  10. Stochastic OT (Boersma 1997) • Basically follows Optimality Theory, but • Don’t simply have a constraint ranking • Constraints have a numeric value on a scale • A random perturbation is added to a constraint’s ranking at evaluation time • The randomness represents incompleteness of our model • Variation results if constraints have similar values – our grammar constrains but underdetermines the output • One gets a probability distribution over optimal candidates for an input (over different evaluations)

      [Figure: constraints f1, f2, f3, f4 placed as values on a continuous ranking scale]
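A minimal sketch of Boersma-style evaluation under these assumptions: each constraint has a numeric ranking value, independent Gaussian noise is added at every evaluation, and ordinary OT evaluation is run on the resulting noisy ranking. The ranking values and noise level here are invented for illustration.

```python
import random
from collections import Counter

# Stochastic OT: constraints live on a numeric scale; at evaluation time each
# value is perturbed, the noisy values are sorted into a ranking, and classical
# OT evaluation is applied.  Constraints with similar values swap order often,
# so repeated evaluations yield a distribution over winners.

RANKING_VALUES = {"*Su/Newer": 100.2, "*3>1/2": 99.8, "*Ag/Non-subj": 90.0}

def stochastic_ot_winner(tableau, values, noise_sd=2.0):
    noisy = {c: v + random.gauss(0, noise_sd) for c, v in values.items()}
    ranking = sorted(noisy, key=noisy.get, reverse=True)
    candidates = list(tableau)
    for constraint in ranking:
        fewest = min(tableau[c][constraint] for c in candidates)
        candidates = [c for c in candidates if tableau[c][constraint] == fewest]
    return candidates[0]

tableau = {
    "We approved the plan last week":        {"*Su/Newer": 1, "*3>1/2": 0, "*Ag/Non-subj": 0},
    "The plan was approved by us last week": {"*Su/Newer": 0, "*3>1/2": 1, "*Ag/Non-subj": 1},
}
print(Counter(stochastic_ot_winner(tableau, RANKING_VALUES) for _ in range(1000)))
# Roughly an even split, because *Su/Newer and *3>1/2 have nearly equal values,
# while *Ag/Non-subj is ranked too low to matter.
```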

  11. Stochastic OT (Boersma 1997) • Stochastic OT can model variable outputs • It does have a model of cumulativity, but constraints in the model are, and can only be, very weakly cumulative • We’ll look soon at some papers that discuss how well this works as a model of linguistic feature combination

  12. Generalized linear models • The grammar provides representations • We define arbitrary properties over those representations (e.g. Subj=Pro, Subj=Topic) • We learn weights w_i for how important the properties are • These are put into a generalized linear model • Model: one standard formulation is sketched below
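One standard way to write such a model (a log-linear / multinomial-logit formulation, given here as an assumed reconstruction of the slide's formulas): each candidate c_j is scored by the weighted sum of its feature values, and the scores are normalized into a probability distribution.

$$
\text{score}(c_j) = \sum_i w_i\, f_i(c_j),
\qquad
P(c_j) = \frac{\exp\!\big(\sum_i w_i\, f_i(c_j)\big)}{\sum_k \exp\!\big(\sum_i w_i\, f_i(c_k)\big)}
$$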

  13. Generalized linear models • Can get categorical or variable outputs • As probability distribution: • All outputs have some probability of occurrence, with the distribution based on the weights of the features. Ganging up. Emergence of the unmarked. • Optimizing over generalized linear models: we choose the output for which the probability is highest: • arg max_j P(c_j) • Output for an input is categorical. Features gang up. (However, by setting weights far enough apart, ganging up will never have an effect – giving conventional OT.) Emergence of the unmarked.
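A minimal sketch under the log-linear formulation above, with invented weights over the same violation vectors as in the tableau: the softmax gives a full distribution in which two lower-weighted constraints can gang up on a candidate, while taking the arg max, or spacing the weights far apart, gives categorical, OT-like behavior.

```python
import math

# Log-linear sketch: each candidate gets a score sum_i w_i * f_i(candidate),
# where f_i counts violations, so the weights are negative (violations hurt).

def distribution(tableau, weights):
    scores = {c: sum(w * v for w, v in zip(weights, viol)) for c, viol in tableau.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

tableau = {
    "We approved the plan last week":        [1, 0, 0],   # *Su/Newer, *3>1/2, *Ag/Non-subj
    "The plan was approved by us last week": [0, 1, 1],
}

soft_weights = [-2.0, -1.5, -1.0]           # f2 and f3 together outweigh f1: ganging up
print(distribution(tableau, soft_weights))  # the active form is preferred, but both occur

categorical_weights = [-100.0, -1.5, -1.0]  # weights far apart: effectively conventional OT
dist = distribution(tableau, categorical_weights)
print(max(dist, key=dist.get))              # arg max gives a single categorical output
```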
