DiscAn: Towards a Discourse Annotation system for Dutch language corpora, or why and how we would want to annotate corpora on the discourse level
Ted Sanders, Utrecht institute of Linguistics, Universiteit Utrecht
Coherence in discourse
Many tourists come to Switzerland. They want to see the mountains.
Many tourists come to Switzerland because they want to see the mountains.
John was happy. It was a Saturday.
We do not need explicit linguistic indicators
Coherence is a cognitive phenomenon
Coherence relations are conceptual relations that constitute coherence between discourse segments (minimally clauses)
Connectives, cue phrases and other lexical markers can, but need not, make this coherence explicit.
Coherence relations are the building blocks of discourse structure (causal, contrastive, additive)
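The points above can be made concrete with a small data structure. This is a minimal sketch, not part of the DiscAn proposal: the names `RelationClass`, `CoherenceRelation` and `is_explicit` are hypothetical. It encodes a relation between two segments, one of the three broad classes named on the slide, and an optional connective, so that the Switzerland examples come out as the same causal relation, once implicit and once marked with "because".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RelationClass(Enum):
    # The three broad classes named on the slide
    CAUSAL = "causal"
    CONTRASTIVE = "contrastive"
    ADDITIVE = "additive"


@dataclass(frozen=True)
class CoherenceRelation:
    """Hypothetical record: a conceptual relation between two segments (minimally clauses)."""
    s1: str
    s2: str
    relation_class: RelationClass
    connective: Optional[str] = None  # None when no lexical marker is present

    @property
    def is_explicit(self) -> bool:
        # The relation exists either way; a connective only makes it explicit.
        return self.connective is not None


implicit = CoherenceRelation(
    "Many tourists come to Switzerland.",
    "They want to see the mountains.",
    RelationClass.CAUSAL,
)
explicit = CoherenceRelation(
    "Many tourists come to Switzerland",
    "they want to see the mountains.",
    RelationClass.CAUSAL,
    connective="because",
)

print(implicit.is_explicit, explicit.is_explicit)  # False True
```

Keeping the connective optional, rather than deriving the relation from it, mirrors the claim that coherence is cognitive and that lexical markers are not required.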
The discourse level is largely lacking in annotated Dutch corpora
There is an international trend towards discourse annotation:
At the same time, we already have a lot of data on Dutch:
Order: cause – consequence and vice versa
Subjectivity: want, puisque, since, denn vs. omdat, parce que, because, weil
Linguistic marking: yes/no, perspective, etc.
Characteristics of the segments: propositional attitude, modality, tense, syntax…
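As a rough illustration of how these dimensions could be stored per fragment, here is a minimal sketch. The record `CausalFragment`, the function `subjectivity` and the modality values are hypothetical. The connective pairs are taken from the subjectivity line above, grouped by language (Dutch, French, English, German). The lookup ignores context, so it would, for example, misclassify a temporal use of "since".

```python
from dataclasses import dataclass
from typing import Optional

# Connective pairs from the slide: subjective vs. objective, per language
SUBJECTIVE = {"nl": "want", "fr": "puisque", "en": "since", "de": "denn"}
OBJECTIVE = {"nl": "omdat", "fr": "parce que", "en": "because", "de": "weil"}


def subjectivity(connective: Optional[str], lang: str) -> str:
    """Classify a causal connective as 'subjective', 'objective', 'unmarked' or 'unknown'."""
    if connective is None:
        return "unmarked"  # no linguistic marking at all
    c = connective.lower()
    if SUBJECTIVE.get(lang) == c:
        return "subjective"
    if OBJECTIVE.get(lang) == c:
        return "objective"
    return "unknown"


@dataclass
class CausalFragment:
    # Hypothetical record covering the annotation dimensions listed above
    connective: Optional[str]
    lang: str
    cause_first: bool  # order: cause-consequence (True) or consequence-cause (False)
    modality_s1: str   # segment characteristic, free-form in this sketch
    modality_s2: str

    @property
    def marked(self) -> bool:
        # Linguistic marking: yes/no
        return self.connective is not None


frag = CausalFragment("omdat", "nl", cause_first=False, modality_s1="fact", modality_s2="fact")
print(subjectivity(frag.connective, frag.lang), frag.marked)  # objective True
print(subjectivity("Denn", "de"), subjectivity(None, "en"))   # subjective unmarked
```

Keying the lookup on language as well as form avoids confusing Dutch "want" with the English verb.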
Corpus conn fragmnr s1 s2 modality s1 modality s2 protag s1 s2 relation
7omdat2502 176 176 11 irrelevant want feit 6 1 1 1 Irrelevant want feit Irrelevant want feit 1
7omdat2502b 177 177 21 Spreker/auteur 6211 Expliciet aanwezig Irrelevant want feit 1
7omdat2509 707 707 11 irrelevant want feit 6111 Irrelevant want feit Irrelevant want feit 1
7omdat2539 3320 3320 11 irrelevant want feit 6111 Irrelevant want feit Irrelevant want feit 1
7omdat2546 3810 3810 12 irrelevant want feit 33231 Irrelevant want feit Impliciet 19
7omdat2551 4357 4357 12 irrelevant want feit 31211 Irrelevant want feit Expliciet aanwezig 1
7omdat2525 2547 2547 31 Spreker/auteur 6211 Expliciet aanwezig Irrelevant want feit 1
Dutch codes: irrelevant want feit = irrelevant because fact; Spreker/auteur = speaker/author; Expliciet aanwezig = explicitly present; Impliciet = implicit.
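The fragment identifiers in the first column appear to concatenate a corpus number, the connective and a fragment number, with an optional letter suffix (as in 7omdat2502b). That layout is an assumption inferred from the rows above, not documented on the slide. Under that assumption, a minimal sketch of splitting the identifiers might look like this; `parse_fragment_id` and `FragmentId` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed layout: <corpus number><connective><fragment number><optional letter suffix>
FRAGMENT_ID = re.compile(r"^(\d+)([a-z]+)(\d+)([a-z]?)$")


class FragmentId(NamedTuple):
    corpus: int
    connective: str
    number: int
    suffix: str


def parse_fragment_id(raw: str) -> FragmentId:
    """Split an identifier such as '7omdat2502b' into its assumed parts."""
    m = FRAGMENT_ID.match(raw)
    if m is None:
        raise ValueError(f"unrecognised fragment id: {raw!r}")
    corpus, conn, number, suffix = m.groups()
    return FragmentId(int(corpus), conn, int(number), suffix)


ids = ["7omdat2502", "7omdat2502b", "7omdat2509"]
parsed = [parse_fragment_id(i) for i in ids]
print(parsed[1])
# FragmentId(corpus=7, connective='omdat', number=2502, suffix='b')
```

If the real coding scheme differs, only the regular expression needs to change.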