Metadata

Presentation Transcript


  1. Metadata • Helen Aristar Dry, Eastern Michigan University, LINGUIST List • Symposium on Best Practice, LSA, Boston, MA

  2. Outline • What is metadata? • Why use OLAC metadata? • How can you write OLAC metadata for your resources? • Metadata in XML • Using ORE

  3. Preliminaries • Language documentation is valuable only if it is findable • On the Internet, this means “findable by computational means” • Efficient search and retrieval of language resources requires the use of metadata

  4. Metadata is: • Structured data about data • Similar to catalogue information • Usually a set of elements, each of which describes one property of the resource • The elements of a metadata set can be encoded in different “languages,” e.g., HTML, XML, RDF/XML

  5. An example • Title: Biao Min Data • Creator (depositor): David Solnit • Subject (linguistic field): Language Description • Subject (language): Biao Min • Date created: April 5, 1982 • Description: The Biao Min data on the E-MELD site includes over 3,000 lexical items. . . . .

  6. Example in HTML • <meta name="DC.title" content="Biao Min Data" /> • <meta name="DC.creator" content="David Solnit" /> • <meta name="DC.subject" content="Language Description" /> • <meta name="DC.subject" content="Biao Min" /> • <meta name="DCTERMS.created" content="1982-04-05" /> • <meta name="DC.description" content="The Biao Min data on the E-MELD site includes over 3,000 lexical items. . . . ." />
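
The HTML encoding above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original slides: the names `BIAO_MIN` and `to_meta_tags` are hypothetical, and the description field is left out because the slide elides its text.

```python
from html import escape

# Field values copied from the Biao Min example. The description is omitted
# because the slide elides it. BIAO_MIN and to_meta_tags are illustrative names.
BIAO_MIN = [
    ("DC.title", "Biao Min Data"),
    ("DC.creator", "David Solnit"),
    ("DC.subject", "Language Description"),
    ("DC.subject", "Biao Min"),
    ("DCTERMS.created", "1982-04-05"),
]


def to_meta_tags(pairs):
    """Render (name, content) pairs as HTML <meta> tags, escaping quotes."""
    return [
        f'<meta name="{escape(name)}" content="{escape(content)}" />'
        for name, content in pairs
    ]


meta_tags = to_meta_tags(BIAO_MIN)
print("\n".join(meta_tags))
```

A list of pairs is used instead of a dictionary because an element such as DC.subject may appear more than once.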

  7. Example in XML • <title>Biao Min Data</title> • <creator xsi:type="olac:role" olac:code="depositor">David Solnit</creator> • <subject xsi:type="olac:linguistic-field" olac:code="language_description"/> • <subject xsi:type="olac:language" olac:code="x-sil-BJE">Biao Min</subject>
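
A record like the XML example can be assembled with Python's standard `xml.etree.ElementTree` module. This is a sketch under stated assumptions rather than the presenter's method: the OLAC namespace URI shown is the one I understand OLAC 1.1 to use, the elements are written with an explicit `dc:` prefix (the slide omits prefixes), and `add_element` and `build_record` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"  # assumed OLAC 1.1 namespace
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Ask ElementTree to serialize with readable prefixes instead of ns0, ns1, ...
for prefix, uri in (("dc", DC), ("olac", OLAC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def add_element(parent, name, text=None, olac_type=None, code=None):
    """Append a DC element, optionally qualified by an OLAC extension and code."""
    element = ET.SubElement(parent, f"{{{DC}}}{name}")
    if olac_type:
        element.set(f"{{{XSI}}}type", f"olac:{olac_type}")
    if code:
        element.set(f"{{{OLAC}}}code", code)
    if text:
        element.text = text
    return element


def build_record():
    """Build the Biao Min record using the values shown on the slide."""
    record = ET.Element(f"{{{OLAC}}}olac")
    add_element(record, "title", "Biao Min Data")
    add_element(record, "creator", "David Solnit", olac_type="role", code="depositor")
    add_element(record, "subject", olac_type="linguistic-field", code="language_description")
    add_element(record, "subject", "Biao Min", olac_type="language", code="x-sil-BJE")
    return record


record_xml = ET.tostring(build_record(), encoding="unicode")
print(record_xml)
```

Building the tree programmatically leaves quoting, escaping, and namespace declarations to the library, which avoids the most common hand-editing mistakes.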

  8. Metadata • Different metadata specifications: MARC, METS, Dublin Core, IMDI, OLAC • IMDI & OLAC designed specifically for language documentation

  9. OLAC Metadata • Product of the Open Language Archives Community http://www.language-archives.org/ • Strengths: • Ease of creation • Search & retrieval via the protocols of the Open Archives Initiative

  10. Open Archives Initiative • Cross-disciplinary initiative for search and retrieval of metadata from multiple archives • Establishes protocols for “harvesting” metadata records of participating archives and making them available via “Service Providers” • Supports formation of discipline-specific sub-communities such as OLAC (Open Language Archives Community)
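
The harvesting protocol referred to here is the OAI Protocol for Metadata Harvesting (OAI-PMH), in which a service provider sends plain HTTP requests to each archive. As a rough sketch of what such a request looks like, the Python below only builds the request URL and does not contact any server. The base URL is a hypothetical placeholder, `list_records_url` is an illustrative name, and the `olac` metadata prefix is an assumption about what a participating archive exposes (`oai_dc` is the format every OAI-PMH repository must support).

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical archive endpoint, used for illustration only.
url = list_records_url("https://archive.example.org/oai", metadata_prefix="olac")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the archive reports no more records.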

  11. LINGUIST List = OLAC Gateway • LINGUIST List is the main service provider for OLAC • Harvests metadata from 27 major archives • Collects metadata from individual linguists about their language documentation • Offers search interface for over 30,000 records of language-related data • See: http://linguistlist.org/olac/

  12. OLAC Metadata • OAI uses the Dublin Core (DC) metadata standard • 15 elements (each optional & repeatable) • Core vocabulary for refining elements (dcterms) • Sub-communities may qualify DC metadata to suit their specific needs • OLAC has qualified DC metadata to better describe language resources.
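
“Optional & repeatable” has a direct consequence for how a record is stored: any of the 15 elements may be absent, and any may carry several values. One way to model that, sketched here in Python with hypothetical names (`DC_ELEMENTS`, `make_record`), is a mapping from element name to a list of values.

```python
# The 15 Dublin Core elements.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def make_record(pairs):
    """Group (element, value) pairs into a record; every element is optional and repeatable."""
    record = {}
    for element, value in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        record.setdefault(element, []).append(value)
    return record


record = make_record([
    ("title", "Biao Min Data"),
    ("creator", "David Solnit"),
    ("subject", "Language Description"),
    ("subject", "Biao Min"),
])
print(record["subject"])
```

Here `subject` holds two values, while elements such as `publisher` are simply missing from the record.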

  13. OLAC Qualifies 5 of the 15 DC Elements • Qualified by OLAC: Contributor, Creator, Language, Subject, Type • Not qualified: Coverage, Date, Description, Format, Identifier, Publisher, Relation, Rights, Source, Title

  14. OLAC recommends 5 extensions: • Contributor: Role • Creator: Role • Language: OLAC Language • Subject: OLAC Language, Linguistic Field • Type: Linguistic Data Type, Discourse Type
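
The pairing of elements and extensions on this slide amounts to a small lookup table, which a metadata tool could use to reject an extension attached to the wrong element. The Python sketch below is illustrative only: the hyphenated extension names follow the `xsi:type` values as I understand the OLAC schema to spell them (for example `olac:linguistic-type` for Linguistic Data Type) and should be checked against the OLAC documentation, and `EXTENSIONS_BY_ELEMENT` and `extension_allowed` are hypothetical names.

```python
# Which OLAC extensions may qualify which DC element, as listed on the slide.
# Extension names are assumed spellings of the OLAC xsi:type values.
EXTENSIONS_BY_ELEMENT = {
    "contributor": {"role"},
    "creator": {"role"},
    "language": {"language"},
    "subject": {"language", "linguistic-field"},
    "type": {"linguistic-type", "discourse-type"},
}


def extension_allowed(element, extension):
    """Return True if the extension may be used to qualify the given DC element."""
    return extension in EXTENSIONS_BY_ELEMENT.get(element, set())


print(extension_allowed("subject", "linguistic-field"))
print(extension_allowed("title", "role"))
```

The table contains five distinct extensions spread over five elements, matching the counts given on the slides.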

  15. Participant Role • Provides a controlled vocabulary for identifying the role of a Creator or Contributor more precisely. The vocabulary identifies approximately twenty roles that are common in the development of language resources. • Examples: depositor, signer, transcriber, respondent, editor, consultant, researcher. • Documentation: http://www.language-archives.org/REC/role.html

  16. Language Identification • Provides codes for identifying all known languages, both living and extinct. • Applies to: Language, Subject

  17. Linguistic Field • Provides codes for identifying the content of a resource as relevant to a particular subfield of linguistic science • Applies to: Subject • Examples: anthropological_linguistics, applied_linguistics, cognitive_science, computational_linguistics, lexicography, discourse_analysis

  18. Linguistic Data Type • Describes the resource as representing a recognized structural type of linguistic information • Applies to: Type • Examples: Lexicon, Primary text, Language description, Dataset (already in DCterms)

  19. Discourse Type • Provides a controlled vocabulary for identifying approximately ten discourse types. It is used with Type to identify the genre of a language resource (particularly a primary text). • Types: Interactive Discourse, Report, Singing, Oratory, Narrative, Formulaic Discourse, Procedural Discourse, Language Play, Unintelligible Speech • Documentation: http://www.language-archives.org/REC/discourse.html

  20. Writing metadata • See “metadata” in the E-MELD School of Best Practices: http://emeld.org/school/classroom/metadata • Or use the OLAC Repository Editor (ORE): http://linguistlist.org/ore/
