
CLARIN-NL ISOcat workshop 2011 part 2

This workshop explores the issues surrounding the adoption of existing data categories in ISOcat and CLARIN standards, as well as how to deal with larger amounts of data. Topics include when to adopt an existing data category, the use of flagged data categories, the relationship between data categories and profiles, and considerations for including details in ISOcat.





Presentation Transcript


  1. CLARIN-NL ISOcat workshop 2011, part 2 • Ineke Schuurman • Menzo Windhouwer

  2. Part A • Issues brought up by participants • When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  3. Part B • ISOcat and CLARIN: Do’s and don’ts (version 0.1) • Introduction and discussion

  4. Part A • When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  5. When (not) to adopt an existing DC • It should ‘match’ the way you use a specific notion in your annotation scheme, application, … • It should come with the same profile • It should handle the same phenomenon: SpeakerID ≠ SingerID

  6. Speaker vs Singer • String → Name → Person → Singer → Opera → Opera singer → Tenor → Tenor in La Bohème • First: too generic; last: too specific; the others are candidates • Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (RELcat!)
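The sibling and subclass remarks on this slide can be sketched as a small relation table. This is only an illustration: the dictionary below is typed in by hand from the slide and is not read from RELcat, and placing SpeakerID under ID is an assumption made so that the two IDs share a parent.

```python
# Hand-written is-a relations based on the slide (not an export from RELcat).
# Assumption: SpeakerID has ID as parent, which is what makes it a sibling of SingerID.
SUBCLASS_OF = {
    "SingerID": {"Singer", "ID"},
    "SpeakerID": {"ID"},
    "Singer": {"Person"},
}


def ancestors(name):
    """Return every class reachable by following is-a links upward."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def are_siblings(a, b):
    """Two distinct classes are siblings if they share a direct parent."""
    shared = SUBCLASS_OF.get(a, set()) & SUBCLASS_OF.get(b, set())
    return a != b and bool(shared)
```

With this table, `are_siblings("SingerID", "SpeakerID")` holds, while `"Singer"` is an ancestor of SingerID only, which is the reason a SpeakerID data category should not be adopted for singers.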

  7. When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  8. Standards • Within ISOcat there are currently few or no standards. Therefore: • CLARIN NL and VL will set up their own set of ‘standardized DCs’; Ineke will be in charge (she will consult with others)

  9. When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  10. Flagged DCs • Never link with ‘deprecated’ DCs! (In case of doubt, consult Ineke or Menzo.) • In other cases the flags show whether the DC specification is correct from a technical point of view • Note that only DCs with a green marking qualify for standardization
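The two rules on this slide can be written down as simple checks. The sketch below is a hypothetical local model, not ISOcat's data format or API: the class `FlaggedDC` and its `flag` field are made up for illustration, and only the values "deprecated" and "green" come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedDC:
    """Hypothetical local record of a data category and its flag."""
    name: str
    flag: str  # "deprecated", "green", or any other marking


def may_link(dc):
    # Rule 1: never link with a deprecated DC.
    return dc.flag != "deprecated"


def qualifies_for_standardization(dc):
    # Rule 2: only DCs with a green marking qualify.
    return dc.flag == "green"


def linkable(dcs):
    """Keep only the DCs that may be linked to."""
    return [dc for dc in dcs if may_link(dc)]
```

A DC with a non-green, non-deprecated flag may still be linked to under these rules, but it would not be a candidate for standardization.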

  11. When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  12. DC/DCS and profile • Profiles are not added automatically; a DCS may contain elements with various profiles • If the profile you need is not yet available, contact Menzo and Ineke

  13. When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  14. What to include? • Cf. the slide on SingerID/SpeakerID • In general: all linguistically meaningful notions mentioned in your schema, manual, definition (cf. part B) • Abbreviations (PST for /past tense/) are to be mentioned as Data Element Name
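As an illustration of the abbreviation rule, a tagset abbreviation can be stored next to the data category as a data element name and looked up from there. The record layout and the function `find_by_abbreviation` are assumptions made for this sketch; only the PST and /past tense/ pair comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategoryEntry:
    """Hypothetical local record: DC name plus the abbreviations used for it."""
    name: str
    data_element_names: list = field(default_factory=list)


def find_by_abbreviation(entries, abbreviation):
    """Return the names of all DCs that list the abbreviation as a data element name."""
    return [e.name for e in entries if abbreviation in e.data_element_names]


entries = [DataCategoryEntry("past tense", ["PST"])]
matches = find_by_abbreviation(entries, "PST")
```

Keeping the abbreviation out of the DC name itself means the same data category can carry different abbreviations from different tagsets.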

  15. When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  16. TEI, metadata, webservice • TEI: likely to be taken care of at a ‘higher level’; if not, YOU are to insert the TEI definitions you use • Metadata: new in CMDI? In that case a definition in ISOcat is to be provided as well • Webservice: to be taken care of in CMDI

  17. When (not) to adopt an existing DC • What about (CLARIN) standards • What with ‘flagged’ DCs • Relation DCS – profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, webservice? • How to deal with larger amounts of data

  18. Larger amounts? • In such a case, contact Menzo Windhouwer (menzo.windhouwer@mpi.nl)

  19. Part B: do’s & don’ts Do’s: • Create a DCS for your scheme (named after the project, annotation scheme, …) • Provide a clear definition (short, to the point) for your scheme, application, … • Take care not to leave concepts used in your definition undefined or vague • Use appropriate vocabulary (per profile) • Check ‘adopted’ DCs regularly until standardization!

  20. Do’s (continued) When creating a DC, fill out • Justification: used in XYZ, part of tagset N • Language section • Always an English language section • Strong recommendation: sections for the object language(s) and for the working language of the manual • Sections in the various languages should match (i.e. be more or less translations of each other)

  21. Do’s (continued) When creating a DC, fill out • Example section • Note that *negative* examples may be very helpful! (jongens ‘boys’, mannen ‘men’; not: gelovigen ‘believers’, which is a form of an ADJ)
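The fill-out items from the two "Do’s (continued)" slides can be turned into a simple checklist. The sketch below assumes a DC draft kept as a plain dictionary with the hypothetical keys `justification`, `language_sections` and `examples`; these are not ISOcat field names, and requiring examples in every language section is an assumption of this sketch.

```python
def missing_parts(dc):
    """List the checklist items that a DC draft (a plain dict) still lacks."""
    problems = []
    # Justification, e.g. "used in XYZ, part of tagset N".
    if not dc.get("justification", "").strip():
        problems.append("justification")
    sections = dc.get("language_sections", {})
    # An English language section is always required.
    if "en" not in sections:
        problems.append("English language section")
    # Assumption: every language section should carry at least one example.
    for lang, section in sorted(sections.items()):
        if not section.get("examples"):
            problems.append("example section (" + lang + ")")
    return problems


draft = {
    "justification": "used in XYZ, part of tagset N",
    "language_sections": {"nl": {"definition": "placeholder", "examples": []}},
}
report = missing_parts(draft)
```

For the draft above, the report names the missing English language section and the empty Dutch example section; a draft with both filled in yields an empty list.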

  22. Example sections Suppose you want to illustrate a German phenomenon: • Ex. sec. in EN language section: German example with translation in English • Ex. sec. in NL language section: German example with translation in Dutch • Ex. sec. in EN linguistic section: EN example • Ex. sec. in NL linguistic section: NL example with translation in English

  23. Don’ts • Confuse the Language and Linguistic sections (the latter contains language-specific values for closed domains) • Be (too) language-specific in the definition • Mention the scheme in the definition • Use several definitions in one DC • Use circular definitions • Rely on authority • Rely on standardized status • Note: the definition should fit YOUR scheme, etc.

  24. End
