
Resource Discovery (metadata and searching)


diem


  1. Resource Discovery (metadata and searching) Working Group Report

  2. Issues discussed • What kinds of resources should EMELD provide search services for? • What should the design be for an EMELD search interface? • How can EMELD get good metadata into its search database? • What level of metadata should be exposed?

  3. What resources? • Anything that might be of value to the endangered-languages linguist. • Language data • Tools • Advice (including reviews) • People • "Gateway" websites

  4. What resources? • But, there's no reason to rely on this working group for "what". • A questionnaire distributed via Linguist

  5. What resources? • Two kinds of best practice resources • Resources with best practice metadata • These resources can be discovered • Non-digital resources encouraged • Digital resources discouraged, but allowed

  6. What resources? • Best practice digital resources • All digital resources encouraged to be of this type • Benefits • Enhanced search features (due to document interoperability) • Special "BP globe of approval"

  7. What resources? • Side Note • Best Practice "approval" system should be tied into a larger system through which digital resources could be listed as "publications" • A topic for another working group? (Perhaps OLAC?)

  8. What resources? • Issues which need to be addressed • Metadata for resources interesting to linguists but which are not linguistic data • Needed: Best practice metadata standards for • Tools • Advice • People • ... • Test: EMELD could see how it would classify everything in BPU.

  9. How to search? • Assumption: Metadata and data are distributed • Query Language • Metadata: OLAC standard • Data from interoperable documents: A new standard

  10. How to search? • Resource Query Language Ideal • A generalized query protocol used across the linguistics community • A series of "methods" (still to be defined) that can be called on these resources to retrieve structured linguistic data matching query parameters
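The "methods" idea on this slide can be pictured as a shared interface that every participating resource implements. The following is only a minimal sketch under stated assumptions: the slides leave the methods undefined, so `QueryableResource`, `find_forms`, the `Record` fields, and the toy data are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Record:
    """One unit of structured linguistic data (hypothetical shape)."""
    language: str
    form: str
    gloss: str


class QueryableResource(Protocol):
    """Hypothetical interface; the real set of methods is still undefined."""

    def find_forms(self, language: str, gloss: str) -> list[Record]: ...


class InMemoryLexicon:
    """Toy resource that satisfies the interface using a plain list."""

    def __init__(self, records: list[Record]) -> None:
        self._records = records

    def find_forms(self, language: str, gloss: str) -> list[Record]:
        return [r for r in self._records
                if r.language == language and r.gloss == gloss]


def query_all(resources: list[QueryableResource],
              language: str, gloss: str) -> list[Record]:
    """Call the same method on every resource and pool the results."""
    hits: list[Record] = []
    for resource in resources:
        hits.extend(resource.find_forms(language, gloss))
    return hits
```

Because every resource answers the same call, a search service needs no knowledge of how each repository stores its data, which is the point of a generalized protocol.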

  11. How to search? • Problems implementing ideal • No clear sense as to what "methods" are needed. • One solution: Examine results from questionnaire

  12. How to search? • Problems implementing ideal • Very few repositories allow their data to be accessed in a generalized way • First step: Encourage documentation of repository data access systems and develop a metadata standard for this

  13. How to search? • Long term implementation issues • An OLAC Query Language Protocol • A well-defined linguistic query language • A system for "packaging" queries • Linguistic data search registry • Linguistic sites register they are data access sites • They also register implemented search methods • EMELD will archive best-practice documents for data access for data creators not capable of implementing the query protocol
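The registry described on this slide could be sketched as a mapping from data access sites to the search methods they report having implemented. This is an illustrative sketch only: `SearchRegistry`, its method names, and the method labels used in the usage lines are assumptions, not part of any OLAC or EMELD specification.

```python
from collections import defaultdict


class SearchRegistry:
    """Toy registry: site URL -> names of the query methods the site implements."""

    def __init__(self) -> None:
        self._methods_by_site: dict[str, set[str]] = defaultdict(set)

    def register(self, site_url: str, methods: list[str]) -> None:
        """Record that a site is a data access site and which methods it offers."""
        self._methods_by_site[site_url].update(methods)

    def sites_supporting(self, method: str) -> list[str]:
        """Return, in sorted order, the sites that registered the given method."""
        return sorted(site for site, methods in self._methods_by_site.items()
                      if method in methods)


# Usage with placeholder URLs and hypothetical method names.
registry = SearchRegistry()
registry.register("https://archive-a.example.org", ["find_forms", "find_texts"])
registry.register("https://archive-b.example.org", ["find_forms"])
```

A search service would consult the registry first, then send each packaged query only to the sites that registered the relevant method.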

  14. How to search? • Pilot project • Take some small subset of resources • Data input via FIELD • Nijmegen? SIL? AIATSIS? AILLA? • Take FIELD search out of FIELD • Search over that small set of resources • Ideally, keep both resources in separate databases to begin to develop query interchange protocol

  15. How to search? • Another project: Grammatical thesaurus • Develop a grammatical thesaurus that gives common synonyms for a given grammatical term (e.g. oral stop, plosive) • This could then be used to allow a user's search to be expanded to include synonyms for a given term. • In all likelihood, there are other applications of this.
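Thesaurus-based query expansion of the kind proposed here can be sketched in a few lines. The example below assumes the thesaurus is simply a list of synonym sets; the function names are hypothetical, and the only synonym pair used is the one given on the slide.

```python
def build_index(synonym_sets):
    """Map each term (lowercased) to the full set of its synonyms.

    If a term appears in more than one set, the later set wins; a real
    thesaurus would need a policy for such overlaps.
    """
    index = {}
    for group in synonym_sets:
        normalized = frozenset(term.lower() for term in group)
        for term in normalized:
            index[term] = normalized
    return index


def expand_query(term, index):
    """Return the term plus its synonyms, sorted; unknown terms map to themselves."""
    key = term.lower()
    return sorted(index.get(key, {key}))


# The synonym pair from the slide.
thesaurus = build_index([["oral stop", "plosive"]])
print(expand_query("Plosive", thesaurus))
```

A search front end would run the user's query once per expanded term, or join the terms with OR, so that a search for "plosive" also finds resources described with "oral stop".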

  16. How to search? • Search interface • EMELD should implement a VISER-like service for access to its database • There are two distinct kinds of searches • Resource location • Resource data search

  17. How to search? • Search interface • The details of the search interface implemented by EMELD are hard to conceive of until more resources can be accessed through it • A questionnaire can help with this area too. • EMELD could ask people to try the search and evaluate it • Starting with the people in this room

  18. Getting the data • Sticks • EMELD Ambassadors • Assisted by Linguist Spider

  19. Getting the data • Carrots • Support harvesting metadata in document headers for submitted URLs. • Resources with best practice metadata can be given a standard EMELD URI that can be used as a reference • These resources could be posted and advertised on Linguist • (but consult Baden first)
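Harvesting metadata from document headers might look like the sketch below. It assumes the submitted URL points to an HTML page and that the author embedded Dublin Core elements (on which OLAC metadata is based) as `<meta name="DC.…">` tags; the slides do not say which header format is intended, so both points are assumptions. Fetching the page is left out, and the function works on HTML text that has already been retrieved.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        # Keep only Dublin Core style names; elements may repeat, so store lists.
        if name.startswith("dc.") and content is not None:
            self.metadata.setdefault(name, []).append(content)


def harvest(html_text: str) -> dict[str, list[str]]:
    """Return the DC.* metadata found in the given HTML text."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

The harvested dictionary could then be mapped onto a record in the search database, with missing elements flagged back to the submitter.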

  20. Getting the data • Juiciest Carrots (Best Practice resources only) • "Preferred" EMELD URIs • Marked as such in a search • Could undergo "advanced" search techniques • Be peer-reviewed and vetted by LDRA (Linguistic Digital Resource Association)* • *This organization does not exist, as far as I know.

  21. Granularity • Right now there are no recommendations for the granularity of exposed metadata records • Large archives, for example, have hierarchical structure, one level of which must be isolated (the IMDI session, for example) • Cutting-edge archives don't work well with the resource=object model. Their resources are "created" based on the user's needs
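To make the granularity choice concrete, the sketch below models an archive as a tree and exposes metadata records at one chosen level only. The `Node` class and the level labels ("corpus", "session", "file") are hypothetical and do not reproduce the IMDI data model; the point is only that the set of discoverable records changes with the level selected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    level: str  # hypothetical labels such as "corpus", "session", "file"
    children: list["Node"] = field(default_factory=list)


def records_at_level(root: Node, level: str) -> list[str]:
    """Return titles of nodes at the chosen level, in depth-first order."""
    found: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.level == level:
            found.append(node.title)
            continue  # do not descend below the exposed level
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(node.children))
    return found


archive = Node("Corpus", "corpus", [
    Node("Session 1", "session", [Node("rec1.wav", "file"), Node("rec1.txt", "file")]),
    Node("Session 2", "session", [Node("rec2.wav", "file")]),
])
```

Exposing this toy archive at the "session" level yields two records, while exposing it at the "file" level yields three, each describing less content.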

  22. Granularity • The lack of recommendations on this issue inhibits metadata creation • Granularity makes a big difference as to what content is searchable • Two different audiences in need of advice • "Real" archives (a.k.a. trusted repositories) • Individuals

  23. Granularity • Recommendation: EMELD should encourage IMDI and OLAC to devise best-practice recommendations for granularity

  24. The questionnaire • Two broad kinds of questions: • What kinds of things would you like? • What kinds of things would you hate? (Dafydd's Corollary)

  25. The questionnaire • Part One: Search capabilities • How do you want to conduct your search (Google-style, directory-style, pull-down menus...)? • What kinds of searches are you doing already on other sites? • Search within results? (We wanted this.) • Thesaurus-based search

  26. The questionnaire • Part Two: Search content • Free entry (like Google) • Feature-based entry • Statistical questions • Phonetic characters • Geographical search • Time search • ...

  27. The questionnaire • Part Three: Results • Google-like results • Journal abstract search-like results • Restricted results (only return web sites, .pdf documents, ...) • ...

  28. The questionnaire • Format • Online submission • Combination multiple choice (for the uncreative) and free form (for the creative) • Encourage people to envision the search of the year 2503
