Best Practices for Creating Effective DC Specifications

Learn the do's and don'ts of creating a data category (DC) specification for your project or annotation scheme. Discover the key elements of a good DC and how to define and name it correctly. Ensure your DC is reusable and as project- and language-neutral as possible. Avoid common mistakes and create a DC that meets your needs.

epappas

Presentation Transcript


  1. DC specifications, or “Do’s and don’ts” when creating a DC

  2. Your work with respect to ISOcat • Create an entry • Link with an existing entry In both cases, the entries should be GOOD ones • But what makes an entry a good one, one that you can use?

  3. What defines a matching DC? • It should ‘match’ the way you use a specific notion in the annotation scheme, application, … at hand • It should come with the same profile • It should handle the same phenomenon: SpeakerID =/= SingerID

  4. Speaker vs Singer • SingerID and SpeakerID are siblings • SingerID is a subclass of both Singer and ID (RELcat!) • String → Name → Person → Singer → Opera singer → Tenor → Tenor in La Bohème: the first is too generic, the last too specific; the others are in principle candidates for DCs
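
The sibling relation on this slide can be made concrete with a small is-a graph. This is a minimal, hypothetical sketch: the `IS_A` table and the helpers `ancestors` and `are_siblings` are illustrative names, and the table is not RELcat's actual data model or API. It only shows how SingerID and SpeakerID can be related through a shared parent (ID) while remaining distinct DCs.

```python
# Hypothetical is-a links (child -> direct parents). Illustrative only; this is
# not RELcat's real data model or API.
IS_A: dict[str, set[str]] = {
    "SingerID": {"Singer", "ID"},
    "SpeakerID": {"Speaker", "ID"},
    "Singer": {"Person"},
    "Speaker": {"Person"},
}


def ancestors(concept: str) -> set[str]:
    """Return every concept reachable upwards from `concept` via is-a links."""
    seen: set[str] = set()
    stack = list(IS_A.get(concept, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, set()))
    return seen


def are_siblings(a: str, b: str) -> bool:
    """Distinct concepts are siblings if they share at least one direct parent."""
    return a != b and bool(IS_A.get(a, set()) & IS_A.get(b, set()))


print(are_siblings("SingerID", "SpeakerID"))  # True: both have ID as a parent
print(sorted(ancestors("SingerID")))          # ['ID', 'Person', 'Singer']
```

Sharing a parent does not make the two interchangeable: a scheme that annotates singers should link to SingerID, not SpeakerID.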

  5. (CLARIN) standards • Hardly any available (cf. morning session) • We really should try to arrive at a series of sound DCs, useful for YOU and for as many other people as possible

  6. What defines a good DC? • Meaningful definition, e.g. for Indefinite pronoun • Not: “pronoun that is indefinite”, unless both ‘pronoun’ and ‘indefinite’ are defined elsewhere AND it is stated explicitly which of those definitions are involved AND these definitions are correct (for you)
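
The first two conditions on this slide (the terms are defined elsewhere, and it is explicit which definitions are involved) can be checked mechanically if each DC records the DCs its definition relies on. A minimal sketch, assuming a hypothetical `DataCategory` record with a `uses` field; ISOcat's real record format differs. Whether the referenced definitions are *correct* for you still needs a human.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    """Hypothetical DC record: a definition plus the DCs it explicitly relies on."""
    identifier: str
    definition: str
    uses: set[str] = field(default_factory=set)  # identifiers named in the definition


def undefined_dependencies(dcs: dict[str, DataCategory]) -> dict[str, set[str]]:
    """Map each DC to the DCs it relies on that are missing from the selection."""
    known = set(dcs)
    missing: dict[str, set[str]] = {}
    for ident, dc in dcs.items():
        gaps = dc.uses - known
        if gaps:
            missing[ident] = gaps
    return missing


selection = {
    "pronoun": DataCategory("pronoun", "..."),
    "indefinitePronoun": DataCategory(
        "indefinitePronoun", "A pronoun that is indefinite.", {"pronoun", "indefinite"}
    ),
}
print(undefined_dependencies(selection))  # {'indefinitePronoun': {'indefinite'}}
```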

  7. What defines a good DC? • Correct definition, e.g. for Personal pronoun • Not: “pronoun referring to persons”, as shown by: That cat has five kittens. SHE … / This table was very expensive but I like IT very much. [Note: in a particular tagset the definition may be correct! In general it is not.]

  8. What defines a good DC? • Reusable definition, e.g. for Personal pronoun • Not: “In CGN a personal pronoun …” • Not: “In Dutch a personal pronoun …” • Not: “A personal pronoun (ik, ikke and ikzelf) is characterized by …” A definition should be as neutral as possible with respect to project and language, while still being valid for your purposes!
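
A simple lint can flag the first two anti-patterns on this slide (naming a scheme or a language in the definition). A minimal sketch; the word lists `SCHEME_NAMES` and `LANGUAGE_NAMES` are hypothetical and must be adapted to your own project. It does not catch the third anti-pattern (object-language words such as ik, ikke, ikzelf inside the definition), and a hit is a prompt to review, not proof of an error.

```python
import re

# Hypothetical word lists; adapt them to your own project. A hit is a warning to
# review the definition, not proof that it is wrong.
SCHEME_NAMES = {"CGN"}
LANGUAGE_NAMES = {"Dutch", "English", "German"}


def specificity_warnings(definition: str) -> list[str]:
    """Return scheme or language names that appear as words in a definition."""
    words = re.findall(r"[A-Za-z]+", definition)
    return [w for w in words if w in SCHEME_NAMES or w in LANGUAGE_NAMES]


print(specificity_warnings("In CGN a personal pronoun ..."))    # ['CGN']
print(specificity_warnings("In Dutch a personal pronoun ..."))  # ['Dutch']
```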

  9. Good DC → good name Three fields are sometimes confused: • Identifier (=/= PID) • Data Element Name • Name Re 1: the Identifier should be in camelCaseFormat, start with an alphabetical character (not 1stPerson, but firstPerson), be in English and be meaningful (not EVON, but singularNeuterForm), …
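
The mechanical part of the Identifier convention can be checked with a regular expression. A minimal sketch under one assumption: since the slide's examples (firstPerson, singularNeuterForm) start in lowercase, the pattern requires lowerCamelCase, which is stricter than the slide's literal "start with alphabetical character". Whether the name is English and meaningful still needs a human reviewer.

```python
import re

# Assumed reading of the slide's rule: lowerCamelCase, first character a letter,
# remaining characters letters or digits. Whether the identifier is English and
# meaningful (singularNeuterForm rather than EVON) still needs a human reviewer.
IDENTIFIER_PATTERN = re.compile(r"[a-z][A-Za-z0-9]*")


def is_valid_identifier(identifier: str) -> bool:
    """Check the mechanical part of the identifier convention."""
    return IDENTIFIER_PATTERN.fullmatch(identifier) is not None


for candidate in ["firstPerson", "1stPerson", "singularNeuterForm", "EVON"]:
    print(candidate, is_valid_identifier(candidate))
# firstPerson True
# 1stPerson False
# singularNeuterForm True
# EVON False
```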

  10. Good DC → good name Re 2: the Data Element Name field is the proper place to mention abbreviations/tags used for a particular notion, and not just for English (N, NPlur, EVON) Re 3: in every Language Section, the correct full name(s) in the working language at hand should be provided
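
Keeping the three name fields apart pays off when a tool has to map a tagset's tags back to DCs. A minimal sketch, assuming a hypothetical `DCEntry` record and `find_by_tag` helper (not ISOcat's data model): the identifier stays a single English camelCase name, tags such as EVON or PST go into the Data Element Names, and each language section contributes a full name.

```python
from dataclasses import dataclass, field


@dataclass
class DCEntry:
    """Hypothetical record that keeps a DC's three kinds of 'name' apart."""
    identifier: str  # camelCase, English, e.g. "singularNeuterForm"
    data_element_names: set[str] = field(default_factory=set)  # tags such as "EVON"
    names: dict[str, str] = field(default_factory=dict)  # language code -> full name


def find_by_tag(entries: list[DCEntry], tag: str) -> list[str]:
    """Return the identifiers of all entries that list `tag` as a Data Element Name."""
    return [e.identifier for e in entries if tag in e.data_element_names]


entries = [
    DCEntry("singularNeuterForm", {"EVON"}, {"en": "singular neuter form"}),
    DCEntry("pastTense", {"PST"}, {"en": "past tense"}),
]
print(find_by_tag(entries, "EVON"))  # ['singularNeuterForm']
```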

  11. Flagged DCs • Never link with ‘deprecated’ DCs! • In other cases, the flags show whether the DC specification is correct from a technical point of view • Note that only DCs with a green marking qualify for standardization
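
The two rules on this slide can be encoded as small predicates. A minimal sketch with a hypothetical `Flag` enumeration: only "green" and "deprecated" come from the slide, while `WARNING` is a placeholder for any other, non-green technical flag; ISOcat's actual flag vocabulary may differ.

```python
from enum import Enum


class Flag(Enum):
    """Hypothetical status flags; ISOcat's actual flag vocabulary may differ."""
    GREEN = "green"            # technically correct specification
    WARNING = "warning"        # placeholder for any non-green technical flag
    DEPRECATED = "deprecated"


def may_link(flag: Flag) -> bool:
    """Never link with a deprecated DC."""
    return flag is not Flag.DEPRECATED


def qualifies_for_standardization(flag: Flag) -> bool:
    """Only DCs with a green marking qualify."""
    return flag is Flag.GREEN


for flag in Flag:
    print(flag.value, may_link(flag), qualifies_for_standardization(flag))
# green True True
# warning True False
# deprecated False False
```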

  12. DC/DCS and profile • Profiles are not added automatically; a DCS may contain elements with various profiles • If the profile you need is not yet available, contact Menzo

  13. What to include? • Cf. the slide on SingerID/SpeakerID • In general: all linguistically meaningful notions mentioned in your schema, manual or definitions • Abbreviations (PST for /past tense/) are to be mentioned as Data Element Name

  14. “Do’s & don’ts” Do’s: • Create a DCS for your scheme (name it after the project, annotation scheme, …) • Provide a clear definition (short, to the point) for your scheme, application, … • Take care not to leave concepts used in your definition undefined or vague • Use appropriate vocabulary (per profile) • Check ‘adopted’ DCs regularly until they are standardized!

  15. Do’s (continued) When creating a DC, fill out: • Justification: e.g. used in XYZ, part of tagset N • Language section • Always an English language section • Strong recommendation: sections for the object language(s) and for the working language (e.g. the language in which the manual is written) • Sections in the various languages should match (more or less be translations of each other)
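
The required and recommended language sections can be checked before submitting a DC. A minimal sketch, assuming a hypothetical `language_section_issues` helper and two-letter language codes as keys ("en", "nl"); whether the sections actually match as translations cannot be checked this way and still needs a bilingual reviewer.

```python
def language_section_issues(
    sections: dict[str, str], object_languages: set[str], working_language: str
) -> list[str]:
    """List problems with a DC's language sections (hypothetical checker).

    `sections` maps a language code (e.g. "en", "nl") to that section's text.
    """
    issues = []
    if "en" not in sections:
        issues.append("missing required English section")
    # Object and working languages are recommended; English is handled above.
    recommended = (object_languages | {working_language}) - {"en"}
    for lang in sorted(recommended):
        if lang not in sections:
            issues.append(f"recommended section missing: {lang}")
    return issues


print(language_section_issues({"en": "..."}, {"nl"}, "nl"))
# ['recommended section missing: nl']
```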

  16. Do’s (continued) When creating a DC, fill out: • Example section • Note that *negative* examples may be very helpful! • E.g. foreignWord, Dutch language section: example section: the, house; NOT: poster; explanation section: een woord als ‘poster’ heeft Nederlandse diminutief: postertje, itt house (*housje, *houseje) [translation: a word like ‘poster’ has a Dutch diminutive, postertje, unlike house (*housje, *houseje)]

  17. Example sections Suppose you want to illustrate a Dutch phenomenon: • Example section in the EN language section: Dutch example with translation in English • Example section in the DE language section: Dutch example with translation in German • Example section in the EN linguistic section: EN example • Example section in the DE linguistic section: DE example with translation in English

  18. Don’ts • Confuse the Language and Linguistic sections (the latter contains language-specific values for closed domains) • Be (too) language-specific in the definition • Mention the scheme in the definition • Use several definitions in one DC • Use circular definitions • Rely on authority • Rely on standardized status: the definition should fit YOUR scheme, etc.
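
Circular definitions can be caught mechanically once each definition's dependencies are recorded explicitly. A minimal sketch using depth-first search, assuming a hypothetical `uses` map from each DC identifier to the identifiers its definition relies on (not an ISOcat export format):

```python
from typing import Optional


def find_cycle(uses: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one circular chain of DC identifiers, or None if there is none.

    `uses` maps each DC to the DCs its definition relies on. The returned list
    starts and ends with the same identifier.
    """
    state: dict[str, str] = {}  # "active" while on the current path, "done" after
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = "active"
        path.append(node)
        for nxt in sorted(uses.get(node, set())):
            if state.get(nxt) == "active":  # back edge: we found a cycle
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in sorted(uses):
        if node not in state:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle({"pronoun": {"indefinitePronoun"}, "indefinitePronoun": {"pronoun"}}))
# ['indefinitePronoun', 'pronoun', 'indefinitePronoun']
```

Identifiers that are referenced but absent from `uses` are treated as having no dependencies, so a missing definition is not reported here as a cycle.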
