These slides by Miriam Butt and Martin Forst cover the use of Optimality Theory (OT) marks in the XLE grammar development platform for parse ranking and generation. They contrast classic OT with the "structured hierarchy" used in XLE, which has both dispreference and preference marks. Key topics are OPTIMALITYORDER, which ranks the parses; special marks that let OT marks interact with the parsing process to improve performance; and the fact that determining c-structures is computationally far cheaper than resolving f-annotations. The slides also cover customizing the mark ranking separately for parsing and for generation.
Grammar Engineering: OT Marks for Parse Ranking and Generation
Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions)
Colombo 2014
OT Marks
• OT = Optimality Theory
• Classic OT only knows constraints, i.e. dispreferences.
• OT as implemented in XLE uses both dispreference marks (the default) and preference marks (prefixed with +)
• Classic OT assumes a simple hierarchy of constraints
• OT as implemented in XLE uses a “structured hierarchy”, in which marks can share a rank and special marks can interact with parsing
OT Marks (cont’d)
• OT marks can be introduced in lexicon entries and in rules
• OT marks are projected to a separate projection, the o-structure
• Unlike the c- and the f-structure, the o-structure has no real internal structure; think of it simply as a bag of OT marks
• Notation (in a rule or lexicon annotation): OTMarkName $ o::*
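To make the "bag of OT marks" view concrete, here is a minimal Python sketch, assuming an analysis is described by the list of marks that each of its lexicon entries and rules projects. The input format and the `o_structure` helper are hypothetical, not part of XLE; a `Counter` models the bag because ranking counts the instances of each mark.

```python
from collections import Counter

def o_structure(contributions):
    """Collect the OT marks projected by each rule/lexicon part into one bag.

    contributions: one list of mark names per rule or lexicon entry used
    in the analysis (a hypothetical input format, not XLE's).
    """
    bag = Counter()
    for marks in contributions:
        bag.update(marks)  # order and origin are lost; only counts remain
    return bag

# Hypothetical analysis: two guessed words, a plain rule, a multiword expression
contributions = [["Guessed"], [], ["Guessed"], ["MWE"]]
print(o_structure(contributions))  # Counter({'Guessed': 2, 'MWE': 1})
```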
OPTIMALITYORDER
• Part of the grammar header
• Can be modified for grammar customization
• OPTIMALITYORDER is for parsing
• GENOPTIMALITYORDER is for generation
• OT marks can be organized into groups of equal rank (in parentheses)
• Example: OPTIMALITYORDER DisprefMark1 +PrefMark1 DisprefMark2 (DisprefMark3 DisprefMark4)
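A minimal sketch of how an OPTIMALITYORDER line could be read into ranked groups, assuming whitespace-separated marks, a leading + for preference marks and parentheses for equal rank, as in the example above. The `parse_order` function is hypothetical; it treats special marks such as NOGOOD like ordinary names and tolerates an optional terminating period.

```python
import re

def parse_order(spec):
    """Split an OPTIMALITYORDER-style line into ranked groups, leftmost first.

    Each group is a list of (mark_name, is_preference) pairs; a parenthesised
    run of marks becomes a single group of equal rank.
    """
    tokens = re.findall(r"[()]|[^\s()]+", spec)
    if tokens and tokens[0] in ("OPTIMALITYORDER", "GENOPTIMALITYORDER"):
        tokens = tokens[1:]
    groups, current = [], None
    for tok in tokens:
        if tok == "(":
            current = []
        elif tok == ")":
            if current is not None:  # ignore a stray closing parenthesis
                groups.append(current)
            current = None
        else:
            tok = tok.rstrip(".")  # tolerate a terminating period
            if not tok:
                continue
            mark = (tok.lstrip("+"), tok.startswith("+"))
            if current is None:
                groups.append([mark])  # an ungrouped mark has a rank of its own
            else:
                current.append(mark)
    return groups

print(parse_order(
    "OPTIMALITYORDER DisprefMark1 +PrefMark1 DisprefMark2 (DisprefMark3 DisprefMark4)"
))
```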
Ranking Parses with OT Marks
• Start on the left of OPTIMALITYORDER
• Keep the parses with the fewest instances of DisprefMark1; consider all others suboptimal
• Among the remaining parses, keep those with the most instances of PrefMark1; consider all others suboptimal
• Among the remaining parses, keep those with the fewest instances of DisprefMark2; consider all others suboptimal
• Etc.
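The ranking procedure above can be sketched in Python as a series of filters, one per rank, highest rank first. The parse ids and their o-structures are hypothetical inputs. For groups of equal rank the sketch assumes that counts are summed, with preference marks counting negatively; with one mark per group this reduces exactly to the procedure on the slide.

```python
from collections import Counter

def group_score(bag, group):
    """Lower is better: dispreference instances add, preference instances subtract."""
    return sum(-bag[mark] if pref else bag[mark] for mark, pref in group)

def rank(parses, order):
    """Return the ids of the optimal parses.

    parses: dict mapping a parse id to its o-structure (a Counter of marks)
    order:  list of groups of (mark, is_preference) pairs, leftmost first
    """
    survivors = list(parses)
    for group in order:
        if not survivors:
            break
        best = min(group_score(parses[p], group) for p in survivors)
        # Everything that does worse than the best on this rank is suboptimal.
        survivors = [p for p in survivors if group_score(parses[p], group) == best]
    return survivors

parses = {
    "p1": Counter({"DisprefMark1": 1}),
    "p2": Counter({"PrefMark1": 2}),
    "p3": Counter({"PrefMark1": 2, "DisprefMark2": 1}),
    "p4": Counter({"PrefMark1": 1}),
}
order = [[("DisprefMark1", False)], [("PrefMark1", True)], [("DisprefMark2", False)]]
print(rank(parses, order))  # ['p2']
```

Here p1 drops out at DisprefMark1, p4 at PrefMark1 and p3 at DisprefMark2. In this sketch, marks that do not appear in `order` play no role in the ranking.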
Special Marks in OPTIMALITYORDER
• Without special marks in OPTIMALITYORDER, all OT marks are used to rank the parses after parsing proper has finished
• Special marks can be introduced to make OT marks interact with the parsing process:
  • NOGOOD
  • CSTRUCTURE
  • STOPPOINT
NOGOOD OT Marks
• If (part of) a lexicon entry or a rule projects an OT mark that is listed to the left of NOGOOD in OPTIMALITYORDER, that part of the grammar is deactivated.
• This can be used for expensive constructions, or for particular readings of ambiguous lexical items that are known to be of little or no importance in the application domain.
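A sketch of the NOGOOD filter, assuming a flat OPTIMALITYORDER without parenthesised groups and grammar parts identified by hypothetical names, each mapped to the set of marks it projects. The deactivation happens before parsing, so the parser never considers those parts at all.

```python
def active_parts(parts, order_tokens):
    """Return the grammar parts that survive NOGOOD deactivation."""
    if "NOGOOD" not in order_tokens:
        return dict(parts)
    cut = order_tokens.index("NOGOOD")
    nogood = {tok.lstrip("+") for tok in order_tokens[:cut]}
    # A part is dropped as soon as it projects any mark left of NOGOOD.
    return {name: marks for name, marks in parts.items() if not (marks & nogood)}

# Hypothetical grammar parts and the OT marks they project
parts = {
    "bank (verb reading)": {"RareReading"},
    "bank (noun reading)": set(),
    "NP rule, guessed-noun disjunct": {"Guessed"},
}
order = ["RareReading", "NOGOOD", "Guessed", "+MWE"]
print(sorted(active_parts(parts, order)))
```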
CSTRUCTURE OT Marks
• Intended for better performance
• Resolving f-annotations is computationally far more expensive than determining the possible c-structures
• If certain c-structures can be discarded early on, the associated f-annotations never need to be resolved
• Example: Guessed +MWE CSTRUCTURE (Guessed and MWE are evaluated on the c-structures, before any f-annotations are resolved)
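The pay-off of CSTRUCTURE can be sketched as a two-stage pipeline: the marks listed to the left of CSTRUCTURE filter the candidate c-structures first, and only the survivors are passed on to the expensive f-annotation step. The candidate c-structures and the `resolve_f_annotations` stand-in are hypothetical, and the sketch assumes a flat order without groups.

```python
from collections import Counter

def mark_score(bag, tok):
    """Lower is better: '+Mark' rewards instances, 'Mark' penalises them."""
    return -bag[tok[1:]] if tok.startswith("+") else bag[tok]

def early_marks(order_tokens):
    """Marks to the left of CSTRUCTURE (none if it is absent)."""
    if "CSTRUCTURE" not in order_tokens:
        return []
    return order_tokens[:order_tokens.index("CSTRUCTURE")]

def prune_cstructures(cstructs, order_tokens):
    """Keep only the c-structures that are optimal w.r.t. the early marks."""
    survivors = list(cstructs)
    for tok in early_marks(order_tokens):
        if not survivors:
            break
        best = min(mark_score(cstructs[c], tok) for c in survivors)
        survivors = [c for c in survivors if mark_score(cstructs[c], tok) == best]
    return survivors

resolved = []

def resolve_f_annotations(c):
    """Stand-in for the expensive step; just records which c-structures reach it."""
    resolved.append(c)

cstructs = {
    "c1": Counter({"Guessed": 1}),
    "c2": Counter({"MWE": 1}),
    "c3": Counter(),
}
for c in prune_cstructures(cstructs, ["Guessed", "+MWE", "CSTRUCTURE", "DisprefMark2"]):
    resolve_f_annotations(c)
print(resolved)  # ['c2']
```

Marks to the right of CSTRUCTURE (here DisprefMark2) would still take part in the ordinary ranking after parsing, which this sketch leaves out.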
STOPPOINT OT Marks
• Also intended for better performance
• Only beneficial when used cautiously (a failed first attempt means the sentence is parsed twice)
• (Parts of) lexicon entries and rules marked with STOPPOINT OT marks, i.e. marks listed to the left of STOPPOINT, are not used in the first parsing attempt
• If the first attempt is unsuccessful, the parser activates those lexicon or rule parts and makes a second attempt
• Example: Mark1 Mark2 STOPPOINT
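The two-attempt strategy can be sketched as follows, assuming a single STOPPOINT, a flat order, and a caller-supplied `parse` function; `toy_parse` and the part names are hypothetical stand-ins for the real parser and grammar.

```python
def held_back_marks(order_tokens):
    """Marks to the left of STOPPOINT (none if it is absent)."""
    if "STOPPOINT" not in order_tokens:
        return set()
    return {tok.lstrip("+") for tok in order_tokens[:order_tokens.index("STOPPOINT")]}

def parse_with_stoppoint(sentence, parts, order_tokens, parse):
    """First parse without held-back parts; retry with everything on failure.

    parts: dict mapping a grammar part name to the set of marks it projects
    parse: callable(sentence, usable_part_names) -> list of analyses
    """
    held_back = held_back_marks(order_tokens)
    first = {name for name, marks in parts.items() if not (marks & held_back)}
    analyses = parse(sentence, first)
    if analyses or first == set(parts):
        return analyses  # success, or nothing was held back to retry with
    return parse(sentence, set(parts))

calls = []

def toy_parse(sentence, usable):
    """Hypothetical parser: 'fragment' is needed for anything but 'good'."""
    calls.append(sorted(usable))
    if sentence == "good":
        return ["core analysis"]
    return ["fragment analysis"] if "fragment" in usable else []

parts = {"core": set(), "fragment": {"Mark1"}}
order = ["Mark1", "Mark2", "STOPPOINT"]
print(parse_with_stoppoint("good", parts, order, toy_parse))  # ['core analysis']
print(parse_with_stoppoint("odd", parts, order, toy_parse))   # ['fragment analysis']
print(len(calls))  # 3
```

The second sentence costs two parser calls, which is why STOPPOINT only pays off if most input can be handled in the first attempt.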
Examples of Potential OT Marks
• Prefer OBL interpretations of PPs over ADJUNCT interpretations:
  The zookeeper waited for the gorilla.
• Prefer ditransitive subcategorization frames over transitive ones:
  The girl gave her brother money.
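To see how such marks would settle the two example sentences, here is a small sketch with hypothetical mark names: a dispreference mark AdjunctPP projected by the ADJUNCT reading of a PP, and a preference mark Ditrans projected by the ditransitive frame. The analyses are hand-written o-structures, not XLE output.

```python
from collections import Counter

def optimal(analyses, order):
    """Left-to-right filtering over a flat OPTIMALITYORDER-style list."""
    survivors = list(analyses)
    for tok in order:
        if not survivors:
            break
        name = tok.lstrip("+")
        sign = -1 if tok.startswith("+") else 1  # preference: more is better
        best = min(sign * analyses[a][name] for a in survivors)
        survivors = [a for a in survivors if sign * analyses[a][name] == best]
    return survivors

ORDER = ["AdjunctPP", "+Ditrans"]  # hypothetical mark names

# The zookeeper waited for the gorilla.
waited = {"OBL": Counter(), "ADJUNCT": Counter({"AdjunctPP": 1})}
# The girl gave her brother money.
gave = {
    "ditransitive": Counter({"Ditrans": 1}),  # two objects: "her brother", "money"
    "transitive": Counter(),                  # one object: "her brother money"
}
print(optimal(waited, ORDER))  # ['OBL']
print(optimal(gave, ORDER))    # ['ditransitive']
```

Using a dispreference mark for one preference and a preference mark for the other is only to show both kinds; either preference could be encoded either way.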
Generation
• XLE can generate strings from well-formed f-structures.
• GENOPTIMALITYORDER can differ from OPTIMALITYORDER, both in the OT marks used and in their ranking
• The transducers can also differ; typically, the generation tokenizer is more restrictive than the parsing tokenizer
Generation
• For our purposes, we will parse the sentences from our exercises and then regenerate them.
• Go to the “Commands” menu of your f-structure window (bottom left) and select “Generate from this FS”