Annotating Attribution Relations Towards an Italian Discourse Treebank

Silvia Pareti, Irina Prodanof


Presentation Transcript


  1. Annotating Attribution Relations Towards an Italian Discourse Treebank. Silvia Pareti, Irina Prodanof

  2. Outline • Introduction • Related works • Goal and methodology • Proposed scheme • Some issues • Pilot annotation • Attribution figures • Conclusion and future work

  3. Introduction
     Fiona says “This afternoon it will rain”
     ATTRIBUTION in a text is the ascription of the ownership of an attitude towards some linguistic material (the text itself, a portion of it, or its semantic content) to an entity.
     Recognising attribution relations is fundamental for Information Extraction, (Multi-Perspective) Question Answering, Opinion Mining, etc.
     Sources can differ in bias and reliability, and this deeply affects the way we perceive information.

  4. Introduction
     Why should we identify the source of a portion of text?
     [Diagram labels: ODQA; NLP techniques, Information Retrieval, Language Generation; Answer selection, Question comprehension, Finding text fragments with the answer, Answer generation]
     • visualize only authoritative answers
     • collect different opinions, hearsay
     • discard second-hand or anonymous information
     • retrieve statements from a specific source over a given time span
     • …

  5. Introduction “È meglio vaccinarsi per l’influenza ‘suina’?” Is it better to get the swine flu vaccine?

  6. Introduction
     “È meglio vaccinarsi per l’influenza ‘suina’?” Is it better to get the swine flu vaccine?
     • “The vaccine is useless.” orsetta90 (blogger): not authoritative and not verifiable source
     • “Everyone should get the vaccine.” Novartis (pharmaceutical industry): authoritative but biased
     • “Only people at higher risk of complications from influenza should get the vaccine.” Doctors' association

  7. Related works
     Opinion holder identification projects:
     • Bethard et al. (2004): consider just opinion propositions (source = agent)
     • Kim and Hovy (2005): identify all possible opinion holders, agentive and NPs (no pronouns)
     • Stoyanov and Cardie (2006): identify NP sources
     • Choi et al. (2006): do not consider implicit or multiple sources and test their system on the OPQA corpus
     Opinion recognition has limited coverage and unsatisfactory precision: 60-70%

  8. Related works
     • PDTB (Prasad et al., 2007): assertions, beliefs, facts, eventualities. Attribution of discourse connectives and their arguments only.
     • Opinion Corpus (Wiebe, 2002): speech acts; private states (opinions, beliefs, thoughts, feelings, emotions, goals, evaluations and judgements). Attribution considered as an intra-sentential phenomenon.
     • GraphBank (Wolf and Gibson, 2005): attribution included as a directed coherence relation (satellite to nucleus). Attribution of discourse segments.

  9. Goal and methodology
     Designing the addition of a level of annotation for attribution to the ISST (Italian Syntactic-Semantic Treebank) corpus.
     • more complete and independent analysis of attribution
     • development of an annotation schema
     • pilot annotation of a portion of the ISST
     • partial listing of possible attribution cues
     • evaluation

  10. Goal and methodology
      Phases: ANNOTATION, ANALYSIS, SCHEMA DEFINITION, TOOL SELECTION, EVALUATION
      • Selection of features to be annotated
      • Design of the schema
      • Annotation requirement definition
      • Match tool characteristics and annotation requirements
      • Setting the tool
      • Scope definition
      • Identification of characteristics and issues
      • Evaluation of the schema applicability
      • Pilot annotation and detection of issues
      • Linguistic resource creation and release

  11. Proposed schema
      Markables of the relation:
      • SOURCE(S): noun phrase, adjective, prep. phrase
      • CUE: verb, noun, adjective, preposition, prep. group, graphic marker
      • CONTENT(S): word, phrase, clause, sentence, entire article
      • (SUPPLEMENT): cue modifier, indirect object, source of source, event specification

  12. Proposed schema
      Features:
      • Attribution type: assertion (e.g. dire "say", osservare "observe", sostenere "claim"); belief (e.g. credere "believe", pensare "think", dubitare "doubt"); fact (e.g. ricordare "remember", sapere "know", sentire "hear"); eventuality (e.g. permettere "allow", proibire "forbid")
      • Source type: writer; other (e.g. il presidente "the president", un uomo "a man", Maria); arbitrary (e.g. uno "one", la gente "people", tutti "everyone"); mixed
      • Factuality: factual; non-factual
      • Scopal change: none; scopal change
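The markables and features listed on slides 11 and 12 could be held in a small record type. The following is only an illustrative sketch: the class names, field names, the use of character offsets, and the `find_span` helper are all assumptions made for this example and are not the format used by the authors or by any annotation tool. The example sentence is the one from slide 3.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

# Hypothetical representation: a span is (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


class AttributionType(Enum):
    ASSERTION = "assertion"
    BELIEF = "belief"
    FACT = "fact"
    EVENTUALITY = "eventuality"


class SourceType(Enum):
    WRITER = "writer"
    OTHER = "other"
    ARBITRARY = "arbitrary"
    MIXED = "mixed"


@dataclass
class AttributionRelation:
    """One attribution relation: markables plus the four features."""
    sources: List[Span]
    cue: List[Span]
    contents: List[Span]  # several spans allow multiple or discontinuous contents
    attribution_type: AttributionType
    source_type: SourceType
    factual: bool = True
    scopal_change: bool = False
    supplement: List[Span] = field(default_factory=list)


def find_span(text: str, fragment: str) -> Span:
    """Locate the first occurrence of fragment in text (raises ValueError if absent)."""
    start = text.index(fragment)
    return (start, start + len(fragment))


def covered_text(text: str, spans: List[Span]) -> List[str]:
    """Return the substrings of text covered by each span."""
    return [text[start:end] for start, end in spans]


text = "Fiona says “This afternoon it will rain”"
relation = AttributionRelation(
    sources=[find_span(text, "Fiona")],
    cue=[find_span(text, "says")],
    contents=[find_span(text, "This afternoon it will rain")],
    attribution_type=AttributionType.ASSERTION,  # "say" is listed as an assertion cue
    source_type=SourceType.OTHER,                # a named individual
)
print(covered_text(text, relation.sources), covered_text(text, relation.contents))
```

Storing contents as a list of spans is one simple way to accommodate the multiple and discontinuous contents discussed later in the slides.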

  13. Some issues Source • Nested attribution • Multiple sources • Source of source • Pronominal and bridging anaphora

  14. Some issues
      Source • Nested attribution • Multiple sources • Source of source • Pronominal and bridging anaphora
      [Sue said {that Mary believes (that Gore won the election)}].
      Sources: [writer] {writer, Sue} (writer, Sue, Mary) (Wiebe, 2002:5, with the addition of brackets)
      Blinder, secondo voci riferite dal New York Times, sperava di succedere al presidente Greenspan quando a marzo scadrà la sua nomina. (ISST re070)
      Blinder, according to rumours reported by the New York Times, hoped to succeed President Greenspan when his appointment expires in March.
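The nested source chains in the Wiebe example can be derived mechanically: each embedded attribution inherits the sources of everything that encloses it. The sketch below is a minimal illustration under that assumption; the `Attribution` class and `source_chains` function are hypothetical names introduced here, not part of the proposed schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Attribution:
    """Hypothetical node: a source, the content it is responsible for, and nested attributions."""
    source: str
    content: str
    nested: List["Attribution"] = field(default_factory=list)


def source_chains(
    attributions: List[Attribution],
    outer: Tuple[str, ...] = ("writer",),
) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (content, chain) pairs; each chain lists all enclosing sources, outermost first."""
    for attribution in attributions:
        chain = outer + (attribution.source,)
        yield attribution.content, chain
        yield from source_chains(attribution.nested, chain)


sentence = "Sue said that Mary believes that Gore won the election"
mary = Attribution("Mary", "that Gore won the election")
sue = Attribution("Sue", "that Mary believes that Gore won the election", [mary])

# The whole sentence is attributed to the writer alone.
chains = [(sentence, ("writer",))] + list(source_chains([sue]))
for content, chain in chains:
    print(", ".join(chain), "->", content)
```

The three chains printed correspond to the bracketed source lists on the slide: writer; writer, Sue; writer, Sue, Mary.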

  15. Some issues
      Source • Nested attribution • Multiple sources • Source of source • Pronominal and bridging anaphora
      Source type labels: Arbitrary, Other
      Tutti, incluse le autorità, conoscono la loro provenienza, ma nessuno dice e fa nulla per prevenire il massacro di capi selvatici. (cs.morph020)
      Everyone, including the authorities, knows their provenance, but no one says or does anything to prevent the massacre of wild animals.

  16. Some issues
      Source • Nested attribution • Multiple sources • Source of source • Pronominal and bridging anaphora
      (Ø) Ho saputo della squalifica di Garciano da Maurizio Damilano, vi giuro, non pensavo di arrivare primo. (ISST cs071)
      (I) heard of the disqualification of Garciano from Maurizio Damilano, I swear, I didn’t think I would come first.
      Poi però, tramite la figlia che sta a Santiago, prima limita la portata del colloquio con Gaston Salvatore (“non è stata una vera intervista, solo una conversazione”), poi smentisce. (ISST period005)
      Afterwards, however, through the daughter who lives in Santiago, (she) first plays down the importance of the talk with Gaston Salvatore (“it wasn’t a real interview, just a conversation”), then denies it.

  17. Some issues
      Source • Nested attribution • Multiple sources • Source of source • Pronominal and bridging anaphora
      La Fermenta, a sentire l'arabo, è organizzata in modo che oggi consegue un utile pari al 35 per cento del fatturato. Questo il vero traguardo che dovrà nel tempo raggiungere la Pierrel. Ma come? Con tagli di mano d'opera? Nemmeno per sogno, dice El Sayed. (ISST els001)
      Fermenta, according to the Arab, is organised so that it currently earns a profit of 35 per cent of turnover. This is the real goal that Pierrel will have to achieve in the long run. But how? By cutting the workforce? No way, says El Sayed.

  18. Some issues Cue • Type definition • Multimodal cues • Scopal change

  19. Some issues
      Cue • Type definition • Multimodal cues • Scopal change
      Type labels: Eventuality, Assertion
      "Vi daremo le statistiche alla fine", promettono i generali croati. (ISST cs030)
      “We’ll give you the statistics at the end”, promise the Croatian generals.

  20. Some issues
      Cue • Type definition • Multimodal cues • Scopal change
      Arlacchi sorride: “Pura paranoia politica. Non ho partecipato ai lavori solo a causa di un impegno privato…”. (ISST re095)
      Arlacchi smiles: “Pure political paranoia. I didn’t take part in the proceedings only because of a private engagement…”.
      "Sì - si adombra Matt - Un ruolo interessante: con Tarantino eravamo a buon punto, poi è arrivato Bruce. I suoi film incassano un po' più dei miei, no? Hanno scelto lui” … (ISST cs060)
      “Yes - Matt's face darkens - An interesting role: with Tarantino we were at a good point, then Bruce arrived. His films gross a bit more than mine, right? They chose him” …

  21. Some issues
      Cue • Type definition • Multimodal cues • Scopal change
      Se c’è, cioè, una maggioranza in Parlamento in grado di affrontare seriamente una fase di riforme anche elettorali, Ø penso che la legislatura possa utilmente proseguire. (ISST re075)
      If, that is, there is a majority in Parliament able to seriously tackle a phase of reforms, including electoral ones, (I) think that the legislature could usefully continue.
      Strano destino, quello di Civitavecchia: finire spesso, troppo spesso, sulle pagine dei giornali per eventi misteriosi, oppure per fatti che nessuno vorrebbe accadessero nella sua città. (ISST cs090)
      Strange destiny, that of Civitavecchia: ending up often, too often, in the news because of mysterious events, or because of events that no one would like to happen in their town.
      ? = tutti vorrebbero non accadessero (everyone would like them not to happen)

  22. Some issues Content • Multiple contents • Discontinuous spans • Event anaphora

  23. Some issues Content • Multiple contents • Discontinuous spans • Event anaphora (Ø) Ho detto |che ero dalla sua parte| e |che ritenevo giusta la sua protesta|. (ISST cs063) (I) said |that I was on his side| and |that I considered his complaint fair|.

  24. Some issues
      Content • Multiple contents • Discontinuous spans • Event anaphora
      "There's no question that some of those workers and managers contracted asbestos-related diseases," said Darrell Phillips, vice president of human resources for Hollingsworth & Vose. "But you have to recognize that these events took place 35 years ago. It has no bearing on our work force today." (PDTB 0003)

  25. Some issues
      Content • Multiple contents • Discontinuous spans • Event anaphora
      “…L’umanità deve proclamare uno storico sciopero ad oltranza fino alla distruzione di tutti gli armamenti nucleari.” Le parole registrate di Gheddafi, … (ISST cs039)
      “…Humanity should proclaim a historic indefinite strike until the destruction of all nuclear armaments.” Gheddafi’s recorded words, …

  26. Pilot annotation
      Tools: GATE, Knowtator, Annotator, MMAX2, Callisto, …
      MMAX2:
      • Base Data (original text)
      • Scheme (annotation schema)
      • Style (display structure)
      • Customization (preferences)
      • Markable (annotation)
      Subcorpus:
      • 50 articles from the ISST
      • balanced
      • 37,000 word tokens
      • 461 attribution relations

  27. Pilot annotation

  28. Attribution figures [charts: Markables; Source type; Scopal change]

  29. Attribution figures [chart: Attribution type and Factuality]

  30. Conclusion and future work
      Achievements:
      • more complete analysis of attribution
      • definition of an annotation schema
      • identification of issues and possible solutions
      • partial listing of possible attribution cues
      • annotation of a portion of the ISST corpus
      Future work:
      • testing of the inter-annotator agreement for the proposed schema
      • redefinition of problematic or underspecified attributes
      • annotation of the whole ISST corpus
      • expanding the list of attribution cues
      • relation between attribution and discourse connectives / anaphora / …
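The slides do not say which agreement measure would be used for the inter-annotator test. As one common option, the sketch below computes Cohen's kappa for two annotators labelling the same items. The choice of kappa, the function name, and the toy label sequences are all assumptions for illustration; the labels are invented and are not results from the pilot annotation.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy data: attribution types assigned to six relations by two annotators.
annotator_a = ["assertion", "assertion", "belief", "fact", "assertion", "belief"]
annotator_b = ["assertion", "belief", "belief", "fact", "assertion", "assertion"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Kappa is computed per feature (here, attribution type); agreement on span boundaries for sources, cues and contents would need a separate measure.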

  31. Conclusion and future work
      ANNOTATED CORPUS:
      • Discourse generation
      • Research on journalistic discourse
      • Training tools for ODQA / MPQA / IE
      • Testing algorithms for the recognition of attribution
      • Statistical and combinatory analysis
      • Development of corpora in other languages
      • …
      Thank you
