
A Corpus-based Analysis for the Ordering of Clause Aggregation Operators

A Corpus-based Analysis for the Ordering of Clause Aggregation Operators. James Shaw Multimedia/Video Technology Department 755 College Road East Siemens Corporate Research, Inc. Princeton, NJ 08540 shaw@scr.siemens.com. Introduction.


Presentation Transcript


  1. A Corpus-based Analysis for the Ordering of Clause Aggregation Operators James Shaw Multimedia/Video Technology Department 755 College Road East Siemens Corporate Research, Inc. Princeton, NJ 08540 shaw@scr.siemens.com

  2. Introduction
     • General goal: automatic generation of concise and fluent complex sentences.
     • Input propositions:
         • Jones is a female patient.
         • Jones has hypertension.
         • Jones has peptic ulcer.
         • Jones is 80 years old.
         • Jones …
     • Output sentence:
         • Ms. Jones is an 80-year-old hypertensive patient with peptic ulcer who underwent coronary artery bypass grafting.
     • Syntactic clause aggregation operators:
         • Paratactic constructions: conjunction transformations
         • Hypotactic constructions: adjective, prepositional phrase, reduced relative clause, and relative clause transformations
     COLING 2002

  3. Specific Goal
     • What is the correct order in which to apply clause aggregation operators in a domain-independent natural language generation system?
     • In our first implementation of the MAGIC system (McKeown97), the operators were applied in the following order:
         • Paratactic operators first (conjunction transformations)
         • Hypotactic operators
         • Paratactic operators again
     • Why are the paratactic operators applied twice and the hypotactic operators only once?
     • We cannot simply permute all the clause aggregation operators and find an optimal ordering.
     • Instead of finding an optimal ordering, our goal is to find an ordering that performs well.

  4. Why the ordering should be identified
     • Clause aggregation operators are not commutative. Applying one of the operators to the input propositions can prevent the application of others.
     • The ordering affects meaning:
         • Input propositions:
             • John drank cider
             • John ate oranges
             • (even though) John didn’t like fruits
         • Potential output sentences:
             • John drank cider and even though he didn’t like fruits, he ate oranges.
             • Even though John didn’t like fruits, he drank cider and ate oranges.

  5. Related Work
     • Syntactically simple expressions of embedding are to be preferred over more complex ones (Scott and de Souza90, Shaw98)
     • Rhetorical Structure Theory (Mann and Thompson 1988)
     • Cohesion analysis (Halliday and Hasan76)
     • Similar to other works in clause aggregation (Moser and Moore95, Rösner and Stede92)
     • Graphical tools to facilitate discourse annotation (O'Donnell00, Garside and Rayson97)
     • Automating the discourse annotation (Marcu00)
     • Ordering of applying the same operator, such as adjective transformation (Shaw99, Malouf00)

  6. Methodology
     • Collect a corpus of sentences containing paratactic and hypotactic constructions.
     • De-aggregate those sentences into propositions.
     • Specify the rhetorical relations between the propositions.
     • Specify a sequence of transformation operators that combines the de-aggregated propositions into the original sentences.
     • Evaluate how well the proposed operator ordering works by checking each sequence of transformations against our proposed ordering of clause aggregation operators.

  7. Corpus Collection
     • The corpus is taken from the medical domain and the Wall Street Journal.
     • Only sentences containing the conjunctor “and” are selected, to increase the likelihood of encountering interactions between paratactic and hypotactic constructions.
     • Because of the effort needed to annotate complex sentences, only 100 sentences from each domain are annotated.

  8. Corpus Annotation
     • XML is used as the markup language.
     • De-aggregation is done manually by the author.
     • Each annotated sentence entry consists of 5 parts:
         • The original sentence.
         • A list of de-aggregated propositions, after manual reconstruction of elided constituents. These propositions are enclosed in a propset.
         • The rhetorical relations that link the de-aggregated propositions or propsets.
         • The sequence of transformations that can be used to reproduce the original sentence.
         • The annotator’s comments.

  9. Annotation Section 1 & 2: The original sentence & the propositions
     • Section 1 (Original sentence): “Local sports fans themselves, long known for their passive demeanor at games and propensity to leave early, don’t resist the image.”
     • Section 2 (Propositions):
         <propset id="pset32-1">
           <prop id="p32-1">Local sports fans don't resist the image.</prop>
           <prop id="p32-2">Local sports fans are long known for their passive demeanor at games.</prop>
           <prop id="p32-3">Local sports fans are long known for their propensity to leave early.</prop>
         </propset>
     • Section 3 (Rhetorical relations):
         <focus entity="local sports fans"/>
         <rst-rel id="r32-1" name="elab" nuc="p32-1" sat="p32-2" />
         <rst-rel id="r32-2" name="elab" nuc="p32-1" sat="p32-3" />

  10. Annotated Section 3 & 4: Rhetorical relations and operator sequences
     • Section 2 (Propositions):
         <propset id="pset32-1">
           <prop id="p32-1">Local sports fans don't resist the image.</prop>
           <prop id="p32-2">Local sports fans are long known for their passive demeanor at games.</prop>
           <prop id="p32-3">Local sports fans are long known for their propensity to leave early.</prop>
         </propset>
     • Section 4 (Operator sequences):
         <trans id="tx32-1" name="conj-simp" nuc="p32-2" sat="p32-3" />
         Local sports fans are long known for their passive demeanor at games and local sports fans are long known for their propensity to leave early.
         <trans id="tx32-2" name="rel-reduced-del-wh-be" nuc="p32-1" sat="tx32-1" />
         Local sports fans themselves, who are long known for their passive demeanor at games and propensity to leave early, don’t resist the image.
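The annotation sections shown on slides 9 and 10 can be read mechanically. The following is a minimal sketch in Python using the standard-library `xml.etree.ElementTree`. The `<entry>` wrapper element and the `load_entry` helper are assumptions made for illustration; the slides do not show how the sections are enclosed in the actual corpus files.

```python
import xml.etree.ElementTree as ET

# The <entry> root is hypothetical; the child elements are copied from the slides.
ENTRY_XML = """
<entry>
  <propset id="pset32-1">
    <prop id="p32-1">Local sports fans don't resist the image.</prop>
    <prop id="p32-2">Local sports fans are long known for their passive demeanor at games.</prop>
    <prop id="p32-3">Local sports fans are long known for their propensity to leave early.</prop>
  </propset>
  <rst-rel id="r32-1" name="elab" nuc="p32-1" sat="p32-2" />
  <rst-rel id="r32-2" name="elab" nuc="p32-1" sat="p32-3" />
  <trans id="tx32-1" name="conj-simp" nuc="p32-2" sat="p32-3" />
  <trans id="tx32-2" name="rel-reduced-del-wh-be" nuc="p32-1" sat="tx32-1" />
</entry>
"""


def load_entry(xml_text):
    """Return (props, relations, transformations) for one annotated entry."""
    root = ET.fromstring(xml_text)
    # Map proposition id -> proposition text.
    props = {p.get("id"): p.text.strip() for p in root.iter("prop")}
    # Rhetorical relations as (name, nucleus, satellite) triples.
    relations = [
        (r.get("name"), r.get("nuc"), r.get("sat")) for r in root.iter("rst-rel")
    ]
    # Transformations in document order as (id, name, nucleus, satellite).
    transformations = [
        (t.get("id"), t.get("name"), t.get("nuc"), t.get("sat"))
        for t in root.iter("trans")
    ]
    return props, relations, transformations


props, relations, transformations = load_entry(ENTRY_XML)
for tid, name, nuc, sat in transformations:
    print(f"{tid}: {name}({nuc}, {sat})")
```

Note that the second transformation takes `tx32-1`, the result of the first, as its satellite, so the sequence forms a small tree over the propositions rather than a flat list.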

  11. Annotated Section 5: Annotator’s comment
     • Our proposed aggregation operator ordering:
         • Adjective (conjunction optional)
         • Prepositional phrase (conjunction optional)
         • Reduced relative clause, including apposition (conjunction optional)
         • Relative clause (conjunction optional)
         • Transformations for other rhetorical relations (conjunction optional)
         • Simple conjunction
         • Complex conjunction
     • Section 5 (annotator’s comment):
         <seqorder valid="true" />
         <conj id="c32-1" type="dist" />
     • If the sequence of transformations does not differ from the proposed ordering, seqorder is assigned true.
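One way to picture the `seqorder` check is as a rank comparison over the annotated transformation sequence. The sketch below is an interpretation, not the author's procedure: the class names in `RANK` (other than `conj-simp`, which appears on slide 10), the prefix matching in `operator_class`, and the rule that a conjunction feeding a later hypotactic operator as its satellite counts as the "conjunction optional" sub-step of that operator are all assumptions.

```python
# Ranks follow the seven-step ordering on the slide. Class names are hypothetical
# except "conj-simp"; "rel-reduced" is assumed from "rel-reduced-del-wh-be".
RANK = {
    "adj": 1,
    "pp": 2,
    "rel-reduced": 3,
    "rel": 4,
    "other": 5,
    "conj-simp": 6,
    "conj-complex": 7,
}


def operator_class(name):
    """Map a transformation name to an operator class by longest matching prefix."""
    for cls in sorted(RANK, key=len, reverse=True):
        if name == cls or name.startswith(cls + "-"):
            return cls
    return "other"


def effective_ranks(transformations):
    """transformations: list of (id, name, nuc, sat) in application order."""
    ranks = []
    for i, (tid, name, _nuc, _sat) in enumerate(transformations):
        cls = operator_class(name)
        rank = RANK[cls]
        if cls.startswith("conj"):
            # Assumption: a conjunction whose result is the satellite of a later
            # hypotactic operator is the optional conjunction of that operator.
            for _, later_name, _, later_sat in transformations[i + 1:]:
                later_cls = operator_class(later_name)
                if later_sat == tid and not later_cls.startswith("conj"):
                    rank = RANK[later_cls]
                    break
        ranks.append(rank)
    return ranks


def follows_proposed_ordering(transformations):
    """True when the effective ranks never decrease along the sequence."""
    ranks = effective_ranks(transformations)
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


example = [
    ("tx32-1", "conj-simp", "p32-2", "p32-3"),
    ("tx32-2", "rel-reduced-del-wh-be", "p32-1", "tx32-1"),
]
print(follows_proposed_ordering(example))
```

Under these assumptions the slide's example is accepted, which agrees with its `<seqorder valid="true" />` annotation: the simple conjunction is applied first, but only to build the satellite of the reduced relative clause.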

  12. The concept of Propset
     • Issues
         • De-aggregated propositions alone did not provide sufficient information to reproduce the original sentence. The propositions
             • John drank cider
             • John ate oranges
             • (even though) John didn’t like fruits
           can be mapped to either of the following sentences:
             • John drank cider and even though he didn’t like fruits, he ate oranges.
             • Even though John didn’t like fruits, he drank cider and ate oranges.
         • The number of rhetorical relations can be greater than the number of propositions, up to for n propositions.

  13. The Benefits of using Propset
     • Propset allows annotators to do the following:
         • Group propositions that are more tightly related, e.g., “a smoker quit 10 years ago”.
         • Specify the scope of modifying propositions, as in the earlier example.
         • Simplify annotation for certain constructions, such as “say” and “believe”:
             [propset
               [prop John believed
                 [propset
                   [prop Tim invested in stock]
                   [prop Tim invested in real estate] ] ] ]
           instead of
             [prop John believed Tim invested in stock]
             [prop John believed Tim invested in real estate]
         • Minimize redundant specification of multiple modifying rhetorical relations.

  14. Why minimize redundant rhetorical relations?
     • The sentence: Even though John didn’t like fruits, he drank cider and ate oranges.
     • Input propositions:
         <prop id="p1-1">John drank cider.</prop>
         <prop id="p1-2">John ate oranges.</prop>
         <prop id="p1-3">(even though) John didn’t like fruits.</prop>
     • Instead of
         <rst-rel id="r1-1" name="elab" nuc="p1-1" sat="p1-3" />
         <rst-rel id="r1-2" name="elab" nuc="p1-2" sat="p1-3" />
         <rst-rel id="r1-3" name="join" nuc="p1-1" sat="p1-2" />
       the annotated relations are
         <rst-rel id="r1-1" name="join" nuc="p1-1" sat="p1-2" />
         <rst-rel id="r1-2" name="elab" nuc="propset1-1" sat="p1-3" />
     • The number of rhetorical relations is always n-1, where n is the number of propositions.
     • This simplifies the clause aggregation algorithm, because one transformation maps to one rhetorical relation.
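The saving can be illustrated with a small count. The sketch below assumes a simplified situation that generalizes the slide's example: a chain of conjoined propositions, each modified by the same set of satellite propositions. Both function names and this two-parameter model are assumptions for illustration only.

```python
def relations_without_propset(num_conjuncts, num_modifiers):
    """Each modifier is attached to every conjunct separately."""
    joins = num_conjuncts - 1               # relations chaining the conjuncts
    elabs = num_conjuncts * num_modifiers   # one relation per (conjunct, modifier) pair
    return joins + elabs


def relations_with_propset(num_conjuncts, num_modifiers):
    """Each modifier is attached once, to the propset holding the conjuncts."""
    joins = num_conjuncts - 1
    elabs = num_modifiers
    return joins + elabs


# Slide example: "drank cider" and "ate oranges" are conjoined, and
# "didn't like fruits" modifies both, so n = 3 propositions.
print(relations_without_propset(2, 1))  # 3 relations
print(relations_with_propset(2, 1))     # 2 relations, i.e. n - 1
```

In this model the propset count is always (conjuncts - 1) + modifiers, which equals n - 1 for n = conjuncts + modifiers propositions, while the count without propsets grows with the product of the two.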

  15. Results
     • In our evaluation, we assume that operators applied earlier should result in constituents closer to the head than the constituents resulting from operators applied later.
     • The 200-sentence corpus was de-aggregated manually (average sentence length is 23 words):
         • 763 propositions
         • 3.8 propositions per sentence
         • 2.6 transformations per sentence
         • 523 rhetorical relations, of which 440 are Elaboration, Joint, or Sequence.
     • 20% of the annotated transformations cannot be handled by our operators.
     • 195 out of 200 original sentences can be re-synthesized using the proposed ordering.
     • Why such a good result?
         • The use of propset in the annotation removed many potential conflicts.
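The derived figures on this slide can be checked with a few lines of arithmetic. The variable names below are chosen for illustration; the counts are the ones reported on the slide.

```python
sentences = 200
propositions = 763
relations = 523
common_relations = 440   # Elaboration, Joint, or Sequence
resynthesized = 195

props_per_sentence = propositions / sentences   # about 3.8, as reported
common_share = common_relations / relations     # share of the three common relations
resynth_rate = resynthesized / sentences        # share of sentences re-synthesized

print(f"propositions per sentence: {props_per_sentence:.1f}")
print(f"Elaboration/Joint/Sequence share: {common_share:.0%}")
print(f"re-synthesis rate: {resynth_rate:.1%}")
```

So the three most common relations cover roughly 84% of all annotated relations, and the proposed ordering reproduces 97.5% of the original sentences.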

  16. What We Learned
     • The use of propset is very useful for discourse annotation.
     • The first application of the paratactic operator is a sub-step of the hypotactic operation: it combines satellite propositions that have similar syntactic structures and modify the same entity in the nucleus proposition.
     • The correct ordering of operators is a 2-stage process:
         • hypotactic operators (with the conjunction operator optional, as a local optimization)
         • paratactic operators
     • Hypotactic operators are applied first because their operations are local in nature: “Bob is a reputable stock-broker [with deep pocket] who is interested in dot-coms.” In contrast, paratactic operators are not local. They are sensitive to the surface position of identical constituents across all the propositions being combined: the directional constraint (Ross70, Shaw98b).

  17. Conclusion
     • Even though researchers have studied rhetorical relations in conjunction with clause aggregation operations, the explicit use of propset in discourse annotation in such a context is new.
     • We explained why some paratactic operators are applied before hypotactic operators while others are applied afterward.
     • By imposing our proposed ordering on de-aggregated propositions and trying to re-synthesize the original sentences, we identified an ordering of clause aggregation operators that works well on a human-written corpus.
     • Such an ordering can be implemented and reused in domain-independent natural language generation systems to create complex sentences that are also concise and fluent.
