NEEO Technical Workshop 2 Exchange of usage metadata

NEEO Technical Workshop 2 Exchange of usage metadata. Sciences Po, Paris January 15th, 2009. Benoit PAUWELS Université Libre de Bruxelles (ULB) Brussels. Plan. Reminder of planning Problem description First proposal – OAI exchange of SWUP

Presentation Transcript


  1. NEEO Technical Workshop 2: Exchange of usage metadata
     Sciences Po, Paris, January 15th, 2009
     Benoit PAUWELS, Université Libre de Bruxelles (ULB), Brussels

  2. Plan
     • Reminder of planning
     • Problem description
     • First proposal – OAI exchange of SWUP
     • Current implementation (OAI/SWUP) at ULB (DSpace)
     • Proposal variants / issues

  3. Reminder

  4. EO usage data service
     Current ideas for EO:
     • how many times every item in the IR has been read
     • which item (and by extrapolation which author, department, …) is the most popular within the institution or within a given research domain
     • the evolution of the usage of the IR as a whole
     • ranking of search results by download frequency of the object files
     In more advanced environments, mining of the usage data could yield other very interesting value-added services, like:
     • the creation of a network of (clusters of) related publications: publications that are read by the same person within a certain amount of time can be considered to be similar in some way
     • recommender systems, in which the end user gets a recommendation on which other publications are of possible interest in relation to a document he wishes to retrieve

  5. Information of interest
     • an identification of the object file that was downloaded
     • an identification of the corresponding item
     • an indication of the date and time at which this item was downloaded
     • an identifier of the end user who downloaded this item
     • an indication of the type of usage (abstract view, download request)
     • identification of the service from where the usage request was made by the end user
     • identification of the web page from which the request was initiated
     • application that has sent the request
     • Example:
       • http://bib11.ulb.ac.be:8080/dspace/handle/2013/781
       • Download request for same object file from EO portal search result for phrase "wage dispersion and firm performance"

  6. Need for harmonization
     • This information is stored on the IR platform in log files of all sorts, with all sorts of formatting (Apache log, DSpace log, …)
     • We want to get this information in the EO gateway in a normalized way:
       • Decide on exchange format
       • Decide on way to exchange
     • First proposal:
       • SWUP ContextObject
       • OAI-PMH

  7. OpenURL ContextObject
     • An OpenURL ContextObject is defined as a data structure that holds information on the following 6 entities:
       • Referent: this entity corresponds to the resource which this ContextObject is about
       • ReferringEntity: an entity that references the Referent
       • Requester: an entity that describes the resource that requests services pertaining to the Referent
       • ServiceType: type of service requested
       • Resolver: a resource that can deliver the requested services
       • Referrer: a resource that generates the ContextObject

  8. OpenURL ContextObject
     • Each of these entities is described through descriptors, which can be of 4 different types:
       • identifier: identifier for the entity
       • metadata-by-val: metadata about the entity; the metadata is included ‘by-value’ in the ContextObject
       • metadata-by-ref: metadata about the entity; the metadata is available at a network location
       • private-data: metadata about the entity; the format is not defined within the OpenURL Framework (but rather defined within a specific community)
     • SWUP = proposal on how to use the OpenURL ContextObject concepts to describe usage events
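The six entities and four descriptor types above can be sketched as a small data structure. This is only an illustration of the concepts: the class and attribute names (`Descriptor`, `Entity`, `ContextObject`) are assumptions made for this sketch and are not taken from the OpenURL Framework or from the SWUP proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four descriptor types named on the slide.
DESCRIPTOR_TYPES = ("identifier", "metadata-by-val", "metadata-by-ref", "private-data")


@dataclass
class Descriptor:
    kind: str   # one of DESCRIPTOR_TYPES
    value: str  # the identifier, the metadata itself, or its network location

    def __post_init__(self) -> None:
        if self.kind not in DESCRIPTOR_TYPES:
            raise ValueError(f"unknown descriptor type: {self.kind}")


@dataclass
class Entity:
    descriptors: List[Descriptor] = field(default_factory=list)

    def identifiers(self) -> List[str]:
        """Return the values of all 'identifier' descriptors of this entity."""
        return [d.value for d in self.descriptors if d.kind == "identifier"]


@dataclass
class ContextObject:
    # In this sketch only the Referent is mandatory; the other five
    # entities are left optional.
    referent: Entity
    referring_entity: Optional[Entity] = None
    requester: Optional[Entity] = None
    service_type: Optional[Entity] = None
    resolver: Optional[Entity] = None
    referrer: Optional[Entity] = None


# Usage: a ContextObject whose Referent is the item from the earlier example.
handle = "http://bib11.ulb.ac.be:8080/dspace/handle/2013/781"
co = ContextObject(referent=Entity([Descriptor("identifier", handle)]))
```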

  9. WARNING
     • This is a draft proposal
     • There are outstanding issues

  10. Information mapped to SWUP
      • an identification of the object file that was downloaded, and an identification of the corresponding item
        • Referent
      • an indication of the date and time at which this item was downloaded
        • ContextObject attribute
      • an identifier of the end user who downloaded this item
        • Requester
      • an indication of the type of usage (abstract view, download request)
        • ServiceType
      • identification of the service from where the usage request was made by the end user
        • Referrer
      • identification of the web page from which the request was initiated
        • ReferringEntity
      • application that has sent the request
        • ?
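A minimal sketch of this mapping, turning one usage event into an XML ContextObject. Several things here are assumptions: the dictionary keys (`file_id`, `item_id`, and so on) are hypothetical, the namespace URI and element names follow the OpenURL XML ContextObject format as best understood and should be checked against the SWUP guidelines, and every entity is described by plain `identifier` descriptors for simplicity.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the OpenURL XML ContextObject format;
# verify against the SWUP guidelines before relying on it.
CTX_NS = "info:ofi/fmt:xml:xsd:ctx"
ET.register_namespace("ctx", CTX_NS)


def _q(name: str) -> str:
    """Qualify an element name with the ContextObject namespace."""
    return f"{{{CTX_NS}}}{name}"


def usage_event_to_context_object(event: dict) -> ET.Element:
    """Map a usage event (hypothetical dict layout) onto ContextObject entities."""
    # Date and time of the request become an attribute of the ContextObject.
    co = ET.Element(_q("context-object"), {"timestamp": event["timestamp"]})

    def add_entity(tag: str, identifiers: list) -> None:
        ids = [i for i in identifiers if i]
        if not ids:
            return  # entity not known for this event: leave it out
        entity = ET.SubElement(co, _q(tag))
        for ident in ids:
            ET.SubElement(entity, _q("identifier")).text = ident

    add_entity("referent", [event.get("file_id"), event.get("item_id")])
    add_entity("referring-entity", [event.get("referring_page")])
    add_entity("requester", [event.get("requester_id")])
    add_entity("service-type", [event.get("usage_type")])
    add_entity("referrer", [event.get("service_id")])
    return co
```

The "application that has sent the request" is deliberately not mapped, since the slide leaves its target entity open.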

  11. Example • See guidelines

  12. Exchange of SWUPs • OAI-PMH

  13. Implementation in DSpace
      • University of Minho (PT)
        • Statistics add-on module
        • Automatically transforms a DSpace log entry into a specific database entry
        • [ Massaging within database permitting all sorts of usage reports ]
      • ULB
        • Minimal adaptation: HTTP Referer and User-Agent added to db entry
        • Example of database entries
        • OAICat software: OAI-PMH data provider (DP)
        • Crosswalk which transforms db entry into SWUP ContextObject
        • http://bib15.ulb.ac.be:8080/dspace-oai-downloads/request?verb=ListRecords&metadataPrefix=swup
        • More info: http://www.bibhost.ulb.ac.be/RDIB/DISpace/DIfusion%201.4.2/Statistics/index.html
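On the harvesting side, the ListRecords request shown above could be built and its response parsed roughly as follows. This is a sketch, not the EO gateway's harvester: the function names are hypothetical, fetching over the network is left out, and only the standard OAI-PMH request arguments (`verb`, `metadataPrefix`, `from`, `resumptionToken`) are used.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url: str,
                     metadata_prefix: str = "swup",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no other arguments besides verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[ET.Element], Optional[str]]:
    """Return the <record> elements and the resumption token (None when absent or empty)."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{{{OAI_NS}}}record")
    token_el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    token = None
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return records, token


# Usage: reproduces the request URL of the ULB data provider.
url = list_records_url("http://bib15.ulb.ac.be:8080/dspace-oai-downloads/request")
```

A harvester would request `url`, pass the response body to `parse_list_records`, and repeat with the returned token until it is `None`.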

  14. Proposal variants / issues
      • Other information of interest
        • application that has sent the request (User Agent)
          • Referrer
        • the repository to which the request was sent
          • Resolver
        • baseUrl of the file server
          • use?
        • URL of the request
          • use?
        • geographical info in requester
          • unnecessary? Can be determined in EO gateway, based on IP address of requester (if not encrypted)
        • OAI identifier
          • use?
        • institution identifier
          • Is this not already available on the EO Gateway?

  15. Proposal variants / issues
      • “Primary” identifier is the one of the object file
        • JISC: publication
        • Irrelevant discussion? The two identifiers need to be there, however encoded
      • “For the publication a new namespace is introduced: http://identifier.economistsonline.org/. The idea is that acting on the URI of the publication results in a redirection to the metadata as stored in the EO gateway.”
        • Using original+enriched metadata, instead of original metadata from IR?

  16. Proposal variants / issues
      • Could be a big XML payload
        • minimally needed:
          • Identifier of the request
          • Datetime of the request
          • Referent: identifier for item and object file
          • Requester: (encrypted) IP address
          • Referrer: identifier for User Agent or originating web service
          • ReferringEntity: identifier for originating web page
          • ServiceType: identifier
          • Resolver: identifier for repository
      • Alternative format to SWUP: one line containing all information (as a variant of the Combined Log Format)
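To make the one-line alternative concrete, here is one possible layout carrying the minimally needed fields listed above. The field order, the space delimiter with double-quote quoting, and the `-` placeholder for missing values (borrowed from Apache-style logs) are all assumptions of this sketch; no such format has been agreed on.

```python
import csv
import io

# Hypothetical field order for a one-line usage log entry.
FIELDS = (
    "request_id", "datetime", "item_id", "file_id", "requester",
    "referrer", "referring_entity", "service_type", "resolver",
)


def format_line(event: dict) -> str:
    """Serialize a usage event as one space-delimited line; missing values become '-'."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=" ", quotechar='"')
    writer.writerow([event.get(name) or "-" for name in FIELDS])
    return buf.getvalue().rstrip("\r\n")


def parse_line(line: str) -> dict:
    """Parse a line produced by format_line back into a dict ('-' becomes None)."""
    row = next(csv.reader([line], delimiter=" ", quotechar='"'))
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
    return {name: (None if value == "-" else value) for name, value in zip(FIELDS, row)}
```

Values containing spaces, such as a User-Agent string, are quoted automatically by the `csv` writer, so each line round-trips through `parse_line`.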

  17. Proposal variants / issues
      • Alternatives for exchange:
        • HTTP / FTP Get of files containing one-line log entries
        • OAI exchange of files containing one-line log entries
        • HTTP / FTP Get of OAI-ListRecords-Response formatted files containing SWUP ContextObjects
        • File nomenclature?
        • Option 2 requires administration of files (filename - datetime)?
        • If file exchange, size is less of an issue: we should go for XML formatted information?
      • Filtering out double clicks
        • No agreement on double click period (COUNTER, Eprints, LogEC).
        • What do we do in EO?
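Whatever period is chosen, double-click filtering itself is simple. The sketch below keeps the period as a parameter because no value has been agreed on; the 30-second default is only a placeholder, and the event layout (`requester`, `file_id`, `time`) is hypothetical. It drops a request when the same requester asked for the same object file within the period before it.

```python
from datetime import datetime, timedelta


def filter_double_clicks(events, window_seconds=30):
    """Drop requests repeating the same (requester, file) within the window.

    The window is measured from the previous request of the same pair,
    whether or not that previous request was itself kept.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e["time"]):
        key = (event["requester"], event["file_id"])
        previous = last_seen.get(key)
        if previous is None or event["time"] - previous > window:
            kept.append(event)
        last_seen[key] = event["time"]
    return kept


# Usage: the second request of requester A falls inside the window and is dropped.
t0 = datetime(2009, 1, 15, 10, 0, 0)
events = [
    {"requester": "A", "file_id": "f1", "time": t0},
    {"requester": "A", "file_id": "f1", "time": t0 + timedelta(seconds=5)},
    {"requester": "B", "file_id": "f1", "time": t0 + timedelta(seconds=5)},
    {"requester": "A", "file_id": "f1", "time": t0 + timedelta(seconds=60)},
]
kept = filter_double_clicks(events)
```

Measuring from the previous request rather than from the last counted one is a design choice of this sketch; the alternative would need to be settled together with the period.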

  18. Proposal variants / issues
      • Filtering out robot requests
        • We must set up (and maintain) a filtering algorithm to be used by all partners for distinguishing real downloads from downloads by machines.
        • Authoritative list of robots?
        • List of regular expressions, rules
        • Remove all HEAD requests
        • Some bots can be recognized by their IP address
        • Discover bots from mining EO database with usage log entries:
          • bots can be active day and night,
          • bots generate much more events than human beings
          • bots regularly visit the same URLs
          • LogEC eliminates users who access more than 10% of all items in RePEc within one month
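Several of these rules can be combined in a few lines. The following is a sketch under stated assumptions, not an agreed EO algorithm: the User-Agent patterns are illustrative rather than an authoritative robot list, the event layout (`method`, `ip`, `user_agent`, `item_id`) is hypothetical, and the 10% rule is applied to whatever set of events is passed in, which is assumed to cover one month.

```python
import re
from collections import defaultdict

# Illustrative patterns only; a shared, maintained list would be needed in practice.
BOT_UA_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot", r"crawl", r"spider")]


def is_robot_request(event, known_bot_ips=frozenset()):
    """Per-request rules: HEAD requests, known bot IP addresses, User-Agent patterns."""
    if event.get("method", "GET").upper() == "HEAD":
        return True
    if event.get("ip") in known_bot_ips:
        return True
    user_agent = event.get("user_agent") or ""
    return any(p.search(user_agent) for p in BOT_UA_PATTERNS)


def heavy_requesters(events, total_items, max_share=0.10):
    """Requesters that accessed more than max_share of all items (LogEC-style rule)."""
    items_by_ip = defaultdict(set)
    for event in events:
        items_by_ip[event["ip"]].add(event["item_id"])
    return {ip for ip, items in items_by_ip.items() if len(items) > max_share * total_items}


def filter_robots(events, total_items, known_bot_ips=frozenset()):
    """Apply the per-request rules first, then remove heavy requesters."""
    candidates = [e for e in events if not is_robot_request(e, known_bot_ips)]
    heavy = heavy_requesters(candidates, total_items)
    return [e for e in candidates if e["ip"] not in heavy]
```

The behavioural signals from the list (activity day and night, repeated visits to the same URLs) are not covered here; they would require mining the accumulated usage data in the EO database.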

  19. Proposal variants / issues
      • Exchange of IP addresses of requesters
        • Infringement on privacy laws?
        • How to anonymize requester information? Level of encryption?
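One option that could be discussed is a keyed one-way hash of the IP address, computed at the repository before exchange. Note that this is pseudonymization by hashing rather than encryption, and whether it satisfies the applicable privacy laws is exactly the open question above. The function name and the idea of a secret key held by each repository are assumptions of this sketch.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Replace an IP address by a keyed SHA-256 digest (HMAC), hex encoded.

    The same IP and key always give the same token, so requests by one
    requester can still be grouped; without the key the address cannot be
    recomputed by simply hashing all possible IP addresses.
    """
    return hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


# Usage: the key is a placeholder; each repository would keep its own secret.
token = pseudonymize_ip("192.0.2.17", b"replace-with-a-real-secret")
```

A consequence of this approach is that geographical information can no longer be derived from the requester at the EO gateway, so it would have to be added by the repository before hashing if it is wanted.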
