
Subject Repositories European collaboration in the international context 28-29 January 2010

Workshop: Technical infrastructure & interoperability. Benoit Pauwels, Université Libre de Bruxelles, Belgium.


Presentation Transcript


  1. Subject Repositories: European collaboration in the international context, 28-29 January 2010 • Workshop: Technical infrastructure & interoperability • Benoit Pauwels, Université Libre de Bruxelles, Belgium

  2. Workshop plan • Theme 1: The Economists Online network of data providers • General infrastructure of the EO solution • DIDL/MODS: the EO metadata exchange format • RDF/XML Admin file: decentralized administration • Enrichment of metadata • Theme 2: Economists Online and RePEc • Pulling metadata from RePEc • Pushing metadata to RePEc • Contribute to LogEc • Use CitEc

  3. Workshop plan • Per theme (45’) • Introduction (BP, 20’) • 3 topics for brainstorming (breakout groups, 10’) • Breakout groups reporting back (all, 15’)

  4. The Economists Online network of data providers • Theme 1: The Economists Online network of data providers • General infrastructure of the EO solution • DIDL/MODS: the EO metadata exchange format • RDF/XML Admin file: decentralized administration • Enrichment of metadata

  5. [Architecture diagram; labels: Metadata, Logs, Objects • OAI-PMH, HTTP • Meresco: Harvester, Crawler, Metadata, Lucene • SRU • EO portal (Homemade - FOSS) • Exporter engine (Homemade - FOSS): RePEc, OAI-PMH, RSS • Other portals]

  6. [Architecture diagram; labels: Metadata, Logs, Objects • OAI-PMH, HTTP • Metadata exchange format: DIDL / MODS, NEEO specs • Usage metadata exchange format: SWUP OFI Comm Profile • Meresco: Harvester, Crawler, Metadata, Lucene • SRU • EO portal (Homemade - FOSS) • Exporter engine (Homemade - FOSS): RePEc, OAI-PMH, RSS • Other portals]

  7. Technical decisions

  8. Metadata exchange format • XML container structure that can hold semantically distinct metadata • descriptive metadata • object files (by-ref) • splash page • enriched metadata • JEL • full text (by-ref) • datasets (by-ref) • [ references ] • RePEc handle and metadata (by-ref) • DIDL • Based on existing container structure defined by SurfShare • “info:eu-repo” vocabularies (objectfileaccessRights, version, ...)

  9. Metadata exchange format • Granular descriptive metadata • MODS (3.2) • Based on existing metadata structure defined by SurfShare • “info:eu-repo” vocabularies (publication type, ...) • Unambiguous identification of authors • DAI – Digital Author Identifier • National or institution-unique persistent identifier • Solutions not specific to the NEEO project; continuous aim of standardization at a level that surpasses the project

  10. DIDL[1] Item[1] Descriptor/Identifier (persistent identifier) Descriptor/modified Item[1..∞] (of type descriptiveMetadata) Descriptor/type (« descriptiveMetadata ») Descriptor/Identifier (persistent identifier) Descriptor/modified Component/Resource -- representation by value (XML) Item[0..∞] (of type objectFile) Descriptor/type (« objectFile ») Descriptor/Identifier (persistent identifier) Descriptor/modified Component/Resource -- representation by ref. (URL) Item[0..1] (of type humanStartPage) Descriptor/type (« humanStartPage ») Component/Resource -- representation by ref. (URL) • EO Data model • Publication is described as a complex (compound) object • persistent identifier • Aggregation of 3 types of components • descriptiveMetadata (MODS) • objectFiles • humanStartPage • Extensible • additional items can be stored within the complex object • MODS • contains Digital Author Identifier (DAI) of EO author
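
The cardinalities on this slide (one or more descriptiveMetadata items, any number of objectFile items, at most one humanStartPage) can be sketched as a small in-memory model. This is an illustration only, not the NEEO schema: the class and attribute names are hypothetical, and the real exchange format is DIDL XML rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One Item inside the compound object (names are hypothetical)."""
    item_type: str                    # "descriptiveMetadata", "objectFile" or "humanStartPage"
    identifier: Optional[str] = None  # persistent identifier, where the slide lists one
    modified: Optional[str] = None    # modification date, where the slide lists one
    resource: str = ""                # inline XML (by value) or a URL (by reference)


@dataclass
class Publication:
    """The top-level Item: a publication described as a compound object."""
    identifier: str
    modified: str
    items: List[Component] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Check the cardinalities stated on the slide and return error messages."""
        counts = {}
        for item in self.items:
            counts[item.item_type] = counts.get(item.item_type, 0) + 1
        errors = []
        if counts.get("descriptiveMetadata", 0) < 1:
            errors.append("at least one descriptiveMetadata item is required")
        if counts.get("humanStartPage", 0) > 1:
            errors.append("at most one humanStartPage item is allowed")
        return errors
```

Because items are kept in a plain list, additional item types can be appended without changing the model, which mirrors the "extensible" point on the slide.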

  11. Metadata exchange format • Implementations in NEEO • DIDL application profile • MODS application profile • Vocabularies in DIDL and MODS • Technical guidelines for project partners • Solutions: home-made or with external support • ARNO: home-made • DSpace: home-made, AtMire • EPrints: home-made, ECS-University of Southampton • Fedora: METS/MODS -> DIDL/MODS • DigiTool: METS/MARC -> DIDL/MODS

  12. Decentralized registry service • XML-RDF file • FOAF + NEEO-specific vocabulary • maintained by each data provider on a local web server • information about the institution: name, description, ... • OAI baseURL + OAI sets to harvest • EO authors: photograph, full name, affiliation, DAI • Retrieved over HTTP GET and validated by the EO Gateway at regular intervals • Automated harvesting process • Made visible through portal • New partner • Create admin file • Ask for registration at economistsonline@uvt.nl, declaring the location of the admin file and validating it • If valid, you’re in
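
As an illustration of what reading such an admin file could look like, here is a minimal sketch using the Python standard library. The RDF and FOAF namespaces are the real ones, but the `neeo` namespace URI and the element names `oaiBaseURL`, `oaiSet` and `dai` are assumptions made for this example; the actual NEEO vocabulary is not shown in the slides.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "neeo": "http://example.org/neeo#",  # hypothetical namespace, for illustration only
}

# Hypothetical admin file; institution, URLs and identifiers are made up.
SAMPLE_ADMIN_FILE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:neeo="http://example.org/neeo#">
  <foaf:Organization>
    <foaf:name>Example University</foaf:name>
    <neeo:oaiBaseURL>http://repository.example.org/oai</neeo:oaiBaseURL>
    <neeo:oaiSet>economics</neeo:oaiSet>
    <neeo:oaiSet>management</neeo:oaiSet>
  </foaf:Organization>
  <foaf:Person>
    <foaf:name>Jane Doe</foaf:name>
    <neeo:dai>dai-0001</neeo:dai>
  </foaf:Person>
</rdf:RDF>
"""


def parse_admin_file(xml_text):
    """Extract institution, harvesting and author information from an admin file."""
    root = ET.fromstring(xml_text.strip())
    org = root.find("foaf:Organization", NS)
    if org is None:
        raise ValueError("admin file has no foaf:Organization element")
    base_url = org.findtext("neeo:oaiBaseURL", default="", namespaces=NS)
    if not base_url:
        raise ValueError("admin file declares no OAI baseURL")
    return {
        "institution": org.findtext("foaf:name", default="", namespaces=NS),
        "oai_base_url": base_url,
        "oai_sets": [e.text for e in org.findall("neeo:oaiSet", NS)],
        "authors": [
            {
                "name": p.findtext("foaf:name", default="", namespaces=NS),
                "dai": p.findtext("neeo:dai", default="", namespaces=NS),
            }
            for p in root.findall("foaf:Person", NS)
        ],
    }
```

A gateway following the slide would fetch the file at regular intervals and reject it when a check like the `ValueError` cases above fails.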

  13. [Architecture diagram, repeated from slide 5; labels: Metadata, Logs, Objects • OAI-PMH, HTTP • Meresco: Harvester, Crawler, Metadata, Lucene • SRU • EO portal (Homemade - FOSS) • Exporter engine (Homemade - FOSS): RePEc, OAI-PMH, RSS • Other portals]

  14. [Architecture diagram with enrichment service; labels: Metadata, Logs, Objects • OAI-PMH, HTTP • Meresco: Harvester, Crawler, Metadata, Lucene • Enrichment service: OAI-PMH, SRU • SRU • EO portal (Homemade - FOSS) • Exporter engine (Homemade - FOSS): RePEc, OAI-PMH, RSS/Atom • Other portals]

  15. Metadata enrichment • “Automated” enrichment – JEL, full-text • ES gets records to be enriched from EO, over SRU • Based on date of request for enrichment of certain type and version • Based on flag set in EO record • ES creates enrichment record(s) • ES makes enrichment records available to EO, over OAI-PMH • EO harvests enrichment records from ES and integrates into original record • EO reuses enrichment information in its services: index & present • “Manual” enrichment – datasets • Partner enters permalink of publication on DVN platform • EO PMH-harvests DDI from DVN, and stores by-ref information
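
The automated enrichment loop on this slide can be sketched in memory, leaving out the SRU and OAI-PMH transport entirely. The record layout, the `needs_enrichment` flag name and both function names are hypothetical; the sketch only shows the flag-based selection and the integration of enrichment records into the original record.

```python
def records_needing_enrichment(records):
    """Selection step: the enrichment service (ES) picks records flagged in EO."""
    return [r for r in records if r.get("needs_enrichment")]


def integrate_enrichments(records, enrichments):
    """Integration step: EO merges harvested enrichment records into the originals."""
    by_id = {r["id"]: r for r in records}
    for enrichment in enrichments:
        target = by_id.get(enrichment["record_id"])
        if target is None:
            continue  # enrichment refers to a record EO does not hold
        enriched = target.setdefault("enriched", {})
        enriched.setdefault(enrichment["type"], []).extend(enrichment["values"])
        target["needs_enrichment"] = False
    return records


# Hypothetical EO records and one enrichment record of type "jel".
eo_records = [
    {"id": "rec-1", "title": "National innovation systems", "needs_enrichment": True},
    {"id": "rec-2", "title": "Block investments", "needs_enrichment": False},
]
es_output = [{"record_id": "rec-1", "type": "jel", "values": ["O31"]}]
```

Calling `integrate_enrichments(eo_records, es_output)` attaches the classification to `rec-1` and clears its flag, so the record is not selected again on the next pass.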

  16. Enriched publication • IR / ES → EO • DIDL[1] Item[1] • Descriptor/Identifier (persistent identifier) • Descriptor/modified • Item[1..∞] (of type descriptiveMetadata) • Item[0..∞] (of type objectFile) • Item[0..1] (of type humanStartPage) • Item[0..∞] (of type text) • Item[0..∞] (of type enrichedMetadata) • Item[0..∞] (of type dataset) • Item[0..∞] (of type review) • Review: Descriptor/Identifier (persistent identifier), Descriptor/modified, Item[1..∞] (of type descriptiveMetadata), Item[0..∞] (of type objectFile) • [Diagram labels: PDF, HTML, TXT, Dataset, DDI] • LinkedData / SemanticWeb / ORE ready

  17. Theme 1: The Economists Online network of data providers • BO Group 1: DIDL/MODS • Scalable? Implementation by 100s of partners • Local experiences from existing partners: implementation issues you want to share? • Can this become a standard for exchange of metadata of IR contained publications? Where does this stand next to (flavours of) DC, SWAP,...? • BO Group 2: XML Admin file • Scalable? Implementation by 100s of partners • Local experiences from existing partners: implementation issues you want to share? • DAI? • BO Group 3: Enrichment model • Extensibility: vocabulary for semantics of components • Manual enrichment: need for enriched submission form, making it easy for people to make enriched publications • Automated (JEL, full text): sustainable?

  18. Workshop plan • Theme 2: Economists Online and RePEc • Pulling metadata from RePEc • Pushing metadata to RePEc • Contribute to LogEc • Use CitEc

  19. RePEc model • RePEc archives contain RePEc series, which contain Working papers, Articles, Books, Book chapters, Software • Manually maintained by research centres, journal publishers, university departments all over the world • +/- 900 archives, more than 4000 series • ReDIF metadata format • Network accessible over FTP or HTTP • Aggregation by RePEc services: • EconPapers • IDEAS • Central PMH-accessible aggregated archive of AMF-formatted metadata

  20. RePEc model • Template-type: ReDIF-Paper 1.0 • Author-Name: Capron, Henri • Author-Email: hcapron@ulb.ac.be • Author-Name: Meeusen, Wim • Author-Email: wim.meeusen@ua.ac.be • Author-Name: Dumont, Michel • Author-Person: pdu51 • Author-Name: Cincera, Michele • Author-Person: pci5 • Title: National innovation systems: pilot study of the Belgian innovation system • Creation-Date: 1998 • Publication-Status: Published as a report for the Belgian Federal Office for Scientific, Technical and Cultural Affairs (OSTC) • File-URL: http://bib17.ulb.ac.be:8080/dspace/bitstream/2013/941/1/mc-0048.pdf • File-Format: application/pdf • Handle: RePEc:dul:ecoulb:2013-941
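
A record like the one above is a sequence of `Field: value` lines in which fields may repeat and each `Author-Name` opens a new author cluster. The sketch below reads such a record under a simplifying assumption: every field fits on one line (continuation lines are ignored). The function names are hypothetical and the sample is a shortened copy of the slide's record.

```python
SAMPLE_REDIF = """\
Template-type: ReDIF-Paper 1.0
Author-Name: Capron, Henri
Author-Name: Dumont, Michel
Author-Person: pdu51
Author-Name: Cincera, Michele
Author-Person: pci5
Title: National innovation systems: pilot study of the Belgian innovation system
Creation-Date: 1998
File-URL: http://bib17.ulb.ac.be:8080/dspace/bitstream/2013/941/1/mc-0048.pdf
File-Format: application/pdf
Handle: RePEc:dul:ecoulb:2013-941
"""


def parse_redif(text):
    """Return the record as an ordered list of (field, value) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so URLs and handles stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # simplification: skip lines that are not "Field: value"
        fields.append((name.strip(), value.strip()))
    return fields


def authors(fields):
    """Group Author-* fields into one dict per Author-Name cluster."""
    result = []
    for name, value in fields:
        if name == "Author-Name":
            result.append({"name": value})
        elif name.startswith("Author-") and result:
            result[-1][name[len("Author-"):].lower()] = value
    return result
```

Keeping the pairs in order, rather than building a dictionary straight away, preserves the repeated `Author-Name` fields and the association between each name and the `Author-Person` line that follows it.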

  21. RePEc model compared to IR model • Very similar • BUT • RePEc model: • Harvests only from “official” publisher repositories • Therefore: 1 work exists once in RePEc and it is guaranteed the one and only “official” manifestation of the work • IR model: • holds publications for which institution is typically not the publisher • 1 work → 1 official manifestation + multiple author manifestations • one work can exist in: • one or more repositories • as different publication types • with different descriptive metadata • with different object files attached • with different object file metadata • Pushing and pulling metadata records from RePEc and IR into one system is bound to raise problems

  22. Pull metadata from RePEc • EO harvests AMF-formatted metadata records from http://oai.repec.openlib.org/ • Overlap! • Same records are harvested from IR and RePEc • Solution: • XML Admin file contains directive <not-from-repec-series> • Allows a partner to specify which RePEc series do not need to be harvested from RePEc, since they are already delivered through the IR • BUT: • IR contains articles produced by its authors • These articles are contained in a journal RePEc series • Overlap in EO cannot be avoided
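
The exclusion step can be sketched as a filter on RePEc handles. This assumes, without confirmation from the slides, that each `<not-from-repec-series>` directive carries a series handle of the form `RePEc:archive:series`; the function names are hypothetical.

```python
def series_of(handle):
    """Return the series part ("RePEc:archive:series") of a RePEc item handle."""
    parts = handle.split(":", 3)
    if len(parts) != 4 or parts[0] != "RePEc":
        raise ValueError("not a RePEc item handle: " + handle)
    return ":".join(parts[:3])


def filter_repec_records(handles, excluded_series):
    """Drop records whose series a partner already delivers through its IR."""
    excluded = set(excluded_series)
    return [h for h in handles if series_of(h) not in excluded]


harvested = ["RePEc:dul:ecoulb:2013-941", "RePEc:aah:aarhec:1987-21"]
kept = filter_repec_records(harvested, ["RePEc:dul:ecoulb"])
```

As the slide notes, a filter of this kind only removes overlap at series level; an article that sits both in an IR and in a journal's RePEc series still arrives twice.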

  23. Push metadata to RePEc • EO sets up “RePEc:ner” archive, containing ReDIF-X formatted records • ReDIF-X • All records are delivered as “ReDIF-Paper”, but with extra fields denoting the “real” publication status and version of text • Overlap! • Most institutions already maintain RePEc series: these records must not be pushed by EO • XML Admin file controls which series to feed into this “ner” archive • <feed-repec> • boolean: to feed or not to feed • <feed-repec-series> • If not given: all records with full text that are not working papers are mapped to one series for that institution • RePEc series ↔ OAI setspec of DIDL/MODS record • BUT • The IR-inherent problem of multiple copies/versions is pushed to RePEc

  24. Push metadata to RePEc: ReDIF-X • Template-type: ReDIF-Paper 1.0 • Title: Block investments and the race for corporate control in Belgium • Author-Name: Chapelle, Ariane • Language: en • Note: info:eu-repo/semantics/published • X-PublishedAs-Type: article • X-PublishedAs-Article-Year: 2004 • X-PublishedAs-Article-Journal: Corporate Ownership & Control • X-PublishedAs-Article-Volume: 2 • X-PublishedAs-Article-Issue: 1 • Order-URL: http://dipot.ulb.ac.be:8080/dspace/handle/2013/9943 • File-URL: http://dipot.ulb.ac.be:8080/dspace/bitstream/2013/9943/1/ac-0007.pdf • File-Format: application/pdf • File-Version: authorVersion • Handle: RePEc:ulb:ecoulb:2013/9943
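
A record of this shape could be produced from a simple dictionary, as in the sketch below. The field names are copied from the slide; the input dictionary layout and the function name are hypothetical, and only the "article" case shown on the slide is handled.

```python
def to_redif_x(record):
    """Serialize a publication dict as ReDIF-X lines (article case only)."""
    lines = ["Template-type: ReDIF-Paper 1.0", "Title: " + record["title"]]
    for author in record["authors"]:
        lines.append("Author-Name: " + author)
    lines.append("Language: " + record["language"])
    lines.append("Note: " + record["status"])

    # Extra X- fields carry the "real" publication type and its details.
    article = record.get("published_as_article")
    if article:
        lines.append("X-PublishedAs-Type: article")
        for key in ("Year", "Journal", "Volume", "Issue"):
            if key in article:
                lines.append("X-PublishedAs-Article-%s: %s" % (key, article[key]))

    lines.append("File-URL: " + record["file_url"])
    lines.append("File-Format: " + record["file_format"])
    lines.append("File-Version: " + record["file_version"])
    lines.append("Handle: " + record["handle"])
    return "\n".join(lines) + "\n"


sample = {
    "title": "Block investments and the race for corporate control in Belgium",
    "authors": ["Chapelle, Ariane"],
    "language": "en",
    "status": "info:eu-repo/semantics/published",
    "published_as_article": {"Year": "2004", "Journal": "Corporate Ownership & Control",
                             "Volume": "2", "Issue": "1"},
    "file_url": "http://dipot.ulb.ac.be:8080/dspace/bitstream/2013/9943/1/ac-0007.pdf",
    "file_format": "application/pdf",
    "file_version": "authorVersion",
    "handle": "RePEc:ulb:ecoulb:2013/9943",
}
```

The `Order-URL` field from the slide is left out to keep the sketch short; it would be emitted the same way as `File-URL`.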

  25. LogEc • Aim: track abstract views and download clicks of publications presented through RePEc services (EconPapers, IDEAS, ... Economists Online) • NOT: tracking of usage at the level of the archives • Downloads of publications contained in RePEc archives that are initiated by a user arriving from Google do not show up in LogEc • How: • EO logs abstract views and download clicks of object files • On a monthly basis, EO transforms these log entries into the requested LogEc format, using “rstat.pl” • 2009-10 EconomistsOnline RePEc:aah:aarhec:1987-21 a: 65.55.207.69 66.235.124.10 d: 66.235.124.10 • RePEc handle of publication is necessary • → EO partners delivering content to RePEc directly (and that EO therefore doesn’t harvest from RePEc but from the IR) must include the RePEc handle in the DIDL/MODS record
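
The monthly transformation can be sketched as a grouping of click events per RePEc handle. The line layout below (a month and service line, then the handle, then `a:` and `d:` lines listing IP addresses) is inferred from the flattened sample on the slide and is an assumption; the authoritative format is whatever “rstat.pl” produces. The function name and the event tuples are hypothetical.

```python
from collections import defaultdict


def monthly_report(month, service, clicks):
    """Group (handle, kind, ip) events into report lines.

    kind is "a" for an abstract view or "d" for a download click.
    """
    per_handle = defaultdict(lambda: {"a": [], "d": []})
    for handle, kind, ip in clicks:
        if kind not in ("a", "d"):
            raise ValueError("unknown click kind: " + kind)
        per_handle[handle][kind].append(ip)

    lines = ["%s %s" % (month, service)]
    for handle in sorted(per_handle):
        lines.append(handle)
        for kind in ("a", "d"):
            ips = per_handle[handle][kind]
            if ips:
                lines.append("%s: %s" % (kind, " ".join(ips)))
    return lines


# Events matching the sample on the slide.
events = [
    ("RePEc:aah:aarhec:1987-21", "a", "65.55.207.69"),
    ("RePEc:aah:aarhec:1987-21", "a", "66.235.124.10"),
    ("RePEc:aah:aarhec:1987-21", "d", "66.235.124.10"),
]
report = monthly_report("2009-10", "EconomistsOnline", events)
```

Events are keyed by RePEc handle, which is why a publication harvested from an IR can only be reported if its DIDL/MODS record carries that handle.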

  26. LogEc • RePEc → EO • DIDL[1] Item[1] • Descriptor/Identifier (persistent identifier) • Descriptor/modified • Item[1..∞] (of type descriptiveMetadata) • Item[0..∞] (of type objectFile) • Item[0..1] (of type humanStartPage) • RePEc (AMF metadata): Item[0..∞] (of type descriptiveMetadata), RePEc handle, Descriptor/modified, byRef

  27. CitEc • Aim: citation analysis for RePEc publications • How: • Analyze text: extract and parse list of references from publications • References are checked for availability in RePEc • Cites: • references to other RePEc publications • Textual references • CitedBy • Co-citations • EO publications (from our IRs) are pushed to RePEc and are therefore pulled through the CitEc processing • EO has access to the resulting CitEc data, and presents this through the EO portal (not yet, will be in Feb 2010) • RePEc handle of publication is necessary • → EO partners delivering content to RePEc directly (and that EO therefore doesn’t harvest from RePEc but from the IR) must include the RePEc handle in the DIDL/MODS record

  28. Theme 2: Economists Online and RePEc • BO Group 1: Push/pull to/from RePEc • ReDIF-X data structure • Duplicates; different versions of identical publication • BO Group 2: Publishing models • Advantages/disadvantages of RePEc publishing model as opposed to IR publishing model • Push the two models together? Do we need to foresee specific services in the gateway or portal to make these two live together in peace? • BO Group 3: Future RePEc/EO services • What services should EO and RePEc jointly be looking at in the future in the interest of the economics researcher?
