Discovery Metadata

Presentation Transcript


  1. Discovery Metadata: Metadata Standards vs. New Requirements
     Lambert Heller, TIB Open Science Lab
     11th euroCRIS Strategic Seminar, Brussels, September 9-10, 2013

  2. About me
     Lambert Heller
     • Social scientist & librarian
     • Involved with OA & CRIS for Hannover University until 2012
     • Since 2013: “Open Science Lab” at TIB; WGL “Science 2.0”
     • Co-author of the book “Opening Science” (Springer, October 2013)
     • @Lambo on Twitter

  3. Agenda
     • Scholarly objects on the Uni Hannover website – some examples
     • Ways to manage discoverable “patchwork metadata”
     • Possible challenges & stuff to discuss …

  4. 1. Scholarly objects on the Uni Hannover website
     Some examples (none of them too uncommon):
     • Linked and/or embedded YouTube videos of individual lectures (consider technical metadata for videos! A sketch follows below.)
     • Professors’ own YouTube channels, but also Twitter accounts
     • Structured “recommended literature” lists next to their own articles
     • Whole institute websites run as wikis, with collaborative work of faculty members & alumni in them (example)
     • In wikis and elsewhere: connections between objects, e.g. videos and traditional materials, forming “clusters”
     • …
     • Bottom line: a complex “patchwork” of heterogeneous, connected scholarly objects. Objects hosted in many places, often changing, collaboratively authored etc.
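To make the “technical metadata for videos” point concrete, here is a minimal sketch (not from the talk) of what discovery metadata for one such embedded lecture video could look like, expressed as schema.org JSON-LD built in Python. Title, person, dates and URLs are hypothetical placeholders.

```python
import json

# Minimal sketch of discovery metadata for an embedded lecture video, expressed as
# schema.org JSON-LD. Title, person, dates and URLs are hypothetical placeholders.
lecture_video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Lecture 3: Metadata Standards",                                   # placeholder title
    "creator": {"@type": "Person", "name": "Jane Doe"},                        # placeholder lecturer
    "uploadDate": "2013-05-14",
    "embedUrl": "https://www.youtube.com/embed/XXXXXXXXXXX",                   # placeholder video ID
    "isPartOf": "https://example-institute.uni-hannover.de/courses/metadata",  # placeholder course page
}

print(json.dumps(lecture_video, indent=2))
```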

  5. 2. Ways to manage discoverable “patchwork metadata”
     a. Individual researchers’ websites vs. the CRIS approach
     b. Facebook-like business models, e.g. ResearchGate
     c. Aggregation-based scientist profile / network services
     d. Possible future directions for aggregation services

  6. a. Individual researchers’ websites vs. the CRIS approach
     • Individual institute and/or researcher websites
       • up and running since the inception of the WWW (think CERN!)
       • won’t go away with CRIS, sometimes mix with it
       • will stay an unpredictable, chaotic metadata patchwork
     • New(ish) breed of institutional CRIS databases / portals
       • projected + driven by research administration staff
       • staff has the authority to use (some) existing data
       • tries to put the CRIS into scientists’ record preparation loop
       • … sometimes with incentives (money)
       • almost always cuts off the patchwork richness, and much data “unrelated” to the faculty / institution as well

  7. b. Facebook-like business models, e.g. ResearchGate
     • will work for many, but most probably never for all scientists
     • … and “not all” = a dealbreaker for the expectation of complete answers to any query
     • but at least a lesson to take from them: recency, simplicity, putting the researcher in control! (To the extent possible in the FB-like business model.)

  8. c. Aggregation-based scientist profile / network services
     • BiomedExperts – was based on literature harvesting; a predecessor / building block of Elsevier SciVal Experts
     • Direct2Experts – a network of CTSA member institutions (and others), mostly in the biomedical area; institutions open up + prepare their researcher metadata for harvesting
     • Several new nationwide / funder approaches, aimed not only at discovery but at research measurement as well
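The slide does not say how such harvesting works in detail; as a hedged illustration, here is a minimal Python sketch of harvesting researcher / publication metadata from an institutional repository via OAI-PMH, a protocol widely used for exactly this kind of metadata exposure. The repository endpoint is a hypothetical placeholder.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Minimal OAI-PMH harvesting sketch: fetches only the first page of records and
# does not follow resumption tokens. The repository endpoint is a placeholder.
BASE_URL = "https://repository.example-university.edu/oai"
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Print title and creators of each harvested Dublin Core record.
for record in tree.iter(OAI + "record"):
    title = record.find(".//" + DC + "title")
    creators = [c.text for c in record.findall(".//" + DC + "creator")]
    print(title.text if title is not None else "(no title)", "|", ", ".join(filter(None, creators)))
```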

  9. d. Possible future directions for aggregation services
     • AgriVIVO et al. – harvesting from institutional, heterogeneous data for better discovery in one research area; may well be reproducible for some more communities?
     • ORCID – bulk registration of scientists through institutions, with scientists then ultimately in control of their data; certainly an important building block for future aggregation services
     • Proposal currently prepared by L3S and TIB: a large-scale web crawl, then analysis with few assumptions up front – may result in reusable “web observatory”-style collections / snapshots of institutional, heterogeneous data?
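As a second hedged illustration (not from the talk): an aggregation service building on ORCID could pull each registered researcher’s public works list. The sketch below assumes the current ORCID public API (v3.0, which postdates the 2013 talk) and uses a placeholder iD.

```python
import json
import urllib.request

# Fetch a researcher's public works from the ORCID public API (v3.0).
# The API version postdates the talk; the iD below is a placeholder.
orcid_id = "0000-0000-0000-0000"
request = urllib.request.Request(
    f"https://pub.orcid.org/v3.0/{orcid_id}/works",
    headers={"Accept": "application/json"},
)

with urllib.request.urlopen(request) as response:
    works = json.load(response)

# Each "group" bundles records of the same work reported by different sources.
for group in works.get("group", []):
    summary = group["work-summary"][0]
    title = summary["title"]["title"]["value"]
    print(summary.get("type"), "|", title)
```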

  10. 3. Possible challenges & stuff to discuss …
      a. Aggregators & assessment tools will be widely scattered – let’s make them linked & discoverable, too!
      b. “Social Media” will be a workbench – get the details!
      c. Let’s extend & apply trusted vocabulary for “social web” type object relations

  11. a. Aggregators & assessment tools will be widely scattered – let’s make them linked & discoverable, too!
      • As good metadata of scholarly objects becomes openly available, services collecting, computing and comparing such data become abundant. This may even help to get rid of the reliance on any single research measurement. (Cf. the San Francisco Declaration on Research Assessment, DORA.)
      • Problem: archives / hosts (e.g. university repositories, publishers, Wikimedia) won’t include / link to every meaningful service.
      • Challenge: establish a standard similar to “semantic pingback”, so that archives / hosts get structured info on how their metadata is used by 3rd parties. Users can then be given multiple options as to the context in which to view / compare the object they are interested in. (A sketch follows below.)
      • Example: algorithms and services comparing / rating individual Wikipedia contributions and contributors. (Examples 1, 2, 3)
      • Example: the aggregation services mentioned on the slides before.
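The slide leaves the mechanics open; below is a minimal, hypothetical sketch (loosely inspired by semantic pingback ideas) of the kind of structured notification a 3rd-party service could send to an archive to say “I am using the metadata of this object in this way”. The endpoint, payload fields and URLs are all assumptions, not an existing standard.

```python
import json
import urllib.request

# Hypothetical pingback-style notification: an aggregator tells the hosting archive
# that it reuses the metadata of one of its objects. Endpoint, field names and URLs
# are illustrative assumptions, not an existing standard.
notification = {
    "source": "https://aggregator.example.org/profiles/jane-doe#publication-42",  # where the object is (re)used
    "target": "https://repository.example-university.edu/record/12345",           # the object at the host
    "relation": "http://purl.org/spar/cito/discusses",                            # how it is used (a CiTO term)
}

request = urllib.request.Request(
    "https://repository.example-university.edu/pingback",  # hypothetical endpoint advertised by the host
    data=json.dumps(notification).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    # The host can now offer its users a "where is this object used / discussed?" view.
    print(response.status)
```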

  12. b. “Social Media” will be a workbench – get the details!

  13. b. “Social Media” will be a workbench – get the details!
      • Problem: blogs, wikis etc. are often perceived as an anonymous heap of data that can be queried in total to derive some altmetrics number for a given DOI (or similar).
      • Instead, researchers’ “SM” profiles will deliver dynamic, rich information that is often (but not always) connected to their traditional research products.
      • Challenge: get meaningful, rich metadata at the level of single objects (and of changes to collaborative objects) into the CRIS. By definition: no trusted archive = no DOI. So maybe we need a layer (think URL shorteners) of HTTP handles, one per object, that delivers machine-readable, well-linked metadata (e.g. JSON-LD) upon request. (See the sketch below.)
      • Examples: make each Wikipedia edit, GitHub pull request, MathOverflow forum post … easily linkable, countable, citable.
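To make the handle idea concrete, here is a hedged sketch (not from the talk) of the kind of JSON-LD record such an HTTP handle might return for a single Wikipedia edit. The handle URL, the vocabulary mix (schema.org plus CiTO) and all identifiers are hypothetical placeholders.

```python
import json

# Sketch of the machine-readable record a hypothetical HTTP handle might return for a
# single social-web object (here: one Wikipedia edit). Handle URL, vocabulary mix
# and all identifiers are illustrative placeholders.
wikipedia_edit = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cito": "http://purl.org/spar/cito/",
    },
    "@id": "https://handles.example.org/obj/abc123",           # stable, short handle for the edit
    "@type": "CreativeWork",
    "name": "Edit to the Wikipedia article 'Research data'",   # placeholder description
    "author": {"@type": "Person", "@id": "https://orcid.org/0000-0000-0000-0000"},  # placeholder ORCID iD
    "dateCreated": "2013-08-30T14:02:00Z",
    "isBasedOn": "https://en.wikipedia.org/w/index.php?title=Research_data&oldid=0000000",  # placeholder revision
    "cito:cites": {"@id": "https://doi.org/10.0000/placeholder"},  # traditional article cited in the edit
}

print(json.dumps(wikipedia_edit, indent=2))
```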

  14. c. Let’s extend & apply trusted vocabulary for “social web” type object relations
      • Problem: merely declaring that new object types (e.g. program code, blog entries, wiki contributions …) are “research data” and appending them to an article, or just appending “SM” profiles to scientists’ profile pages, is not sufficient for discovery.
      • Challenge: we need ontologies to relate objects to each other and to “traditional objects” like journal articles. We should work with, and extend upon, David Shotton’s CiTO ontology. Model implementations will be the tricky and interesting part. (A sketch follows below.)
      • Example: the “viewed / cited / saved / discussed” ontology of article-level metrics, proposed in NISO ISQ 2013, v. 25, no. 2.
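As a hedged sketch of what such relations could look like in practice (not from the talk): a few CiTO statements relating social-web objects to a traditional article, built with the rdflib library (assuming rdflib ≥ 6, where serialize() returns a string). All URLs are placeholders and the choice of CiTO properties is illustrative.

```python
from rdflib import Graph, Namespace, URIRef

# Relate "new" scholarly objects to a traditional journal article using CiTO.
# URLs are placeholders; property choices are illustrative.
CITO = Namespace("http://purl.org/spar/cito/")

blog_post = URIRef("https://blogs.example-university.edu/jane-doe/2013/09/replication-notes")
wiki_page = URIRef("https://en.wikipedia.org/wiki/Research_data")
dataset = URIRef("https://handles.example.org/obj/def456")
article = URIRef("https://doi.org/10.0000/placeholder")

g = Graph()
g.bind("cito", CITO)
g.add((blog_post, CITO.discusses, article))        # a blog post discusses the article
g.add((wiki_page, CITO.cites, article))            # a wiki page cites it
g.add((article, CITO.citesAsDataSource, dataset))  # the article cites the dataset as a data source

# Serialize as Turtle so other services can harvest and reason over the relations.
print(g.serialize(format="turtle"))
```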

  15. Thank you for your attention!
