
Methodologies and approaches for repository aggregation

Pat Lockley, University of Nottingham, 19th April 2010



Presentation Transcript


  1. Methodologies and approaches for repository aggregation. Pat Lockley, University of Nottingham, 19th April 2010

  2. I’ve got a brand new combined harvester and I’ll give you the key. Pat Lockley, University of Nottingham, 19th April 2010

  3. The theory • Out in the world are lots of repositories using RSS feeds

  4. The theory continued… • So one site could bring all those feeds together, without the need to upload

  5. The theory continues… • At Nottingham we have Xerte Online Toolkits • This allows content to be created online and, as one of its features, allows for the simple automated creation of DCMI-rich RSS feeds • Xerte Online Toolkits is free and open source

  6. The idea • Lots of Toolkits installs could easily create a standard RSS feed to be harvested in different ways by different people

  7. The process… Robot #1 • So we built a harvester, a bit like a basic web robot • This would go off and get the RSS feeds, download them to a server, and look in the data for OER materials • Given that RSS feeds are a standard, this would be an easy task…
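
The first robot's job can be sketched roughly as follows. This is a minimal illustration, not the harvester described in the talk: the names `SAMPLE_FEED` and `harvest_items` and the example.org URLs are invented here, and a real harvester would first download each feed (for instance with `urllib.request.urlopen`) before parsing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-behaved RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example repository</title>
    <item>
      <title>Intro to statistics</title>
      <link>http://example.org/oer/1</link>
    </item>
    <item>
      <title>Cell biology basics</title>
      <link>http://example.org/oer/2</link>
    </item>
  </channel>
</rss>"""


def harvest_items(feed_text):
    """Return a list of {title, link} dicts, one per <item> in the feed."""
    root = ET.fromstring(feed_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


for entry in harvest_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

This only works when every feed follows the same layout, which is the assumption the later robots had to drop.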

  8. The 2nd robot… • Sadly, even between DCMI-rich RSS and normal RSS there are differences • The link node, which contains the URL of the OER piece, is sometimes empty • Sometimes other nodes are used • So gradually the robot got smarter…
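
One way to cope with an empty link node is to try a list of fallbacks. The sketch below is an assumption about which "other nodes" might carry the URL (`dc:identifier`, a permalink `guid`, an `enclosure`); the talk does not say which ones the real robot checked, and `find_item_url` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def find_item_url(item):
    """Return the first usable http(s) URL for an <item>, or None."""
    candidates = [
        item.findtext("link"),
        item.findtext(DC + "identifier"),
    ]
    # In RSS 2.0 a <guid> is a permalink unless isPermaLink="false".
    guid = item.find("guid")
    if guid is not None and guid.get("isPermaLink", "true") != "false":
        candidates.append(guid.text)
    # An <enclosure> carries its URL in an attribute rather than as text.
    enclosure = item.find("enclosure")
    if enclosure is not None:
        candidates.append(enclosure.get("url"))
    for value in candidates:
        if value and value.strip().startswith(("http://", "https://")):
            return value.strip()
    return None


item = ET.fromstring("<item><link></link><guid>http://example.org/a</guid></item>")
print(find_item_url(item))
```

The order of the candidates is a design choice: the earlier a node appears in the list, the more it is trusted.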

  9. The 3rd robot… • Now we had to tell which feed type was which… • Establishing a fingerprint • Knowing what you’re “fetching” • Getting as much metadata as possible • But…
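
A feed "fingerprint" could be as simple as looking at the root element and checking whether any Dublin Core elements are present. This is a sketch under that assumption; the function name `fingerprint` and the labels it returns are invented for this example, not taken from the talk.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"  # RSS 1.0
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"                # Atom
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"               # Dublin Core


def fingerprint(feed_text):
    """Return (feed kind, whether any Dublin Core elements appear)."""
    root = ET.fromstring(feed_text)
    if root.tag == "rss":
        kind = "rss-" + root.get("version", "unknown")
    elif root.tag == RDF_ROOT:
        kind = "rss-1.0"
    elif root.tag == ATOM_ROOT:
        kind = "atom"
    else:
        kind = "unknown"
    uses_dc = any(
        isinstance(el.tag, str) and el.tag.startswith(DC_PREFIX)
        for el in root.iter()
    )
    return kind, uses_dc
```

Once the kind is known, the harvester can pick the extraction rules that suit that feed rather than guessing per item.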

  10. The 4th robot… Metadata comes in many forms, so the robot needs to be aware • Subject • Category • Author • Creator • Description • Related content
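
The pairs on this slide (subject and category, author and creator) suggest mapping several source nodes onto one internal field. The sketch below assumes a particular mapping between plain RSS and Dublin Core tags; `FIELD_SOURCES` and `normalise` are hypothetical names, and the real aggregator may have mapped fields differently.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Assumed mapping: internal field -> source tags that may carry it.
FIELD_SOURCES = {
    "subject": ["category", DC + "subject"],
    "author": ["author", DC + "creator"],
    "description": ["description", DC + "description"],
    "related": [DC + "relation"],
}


def normalise(item):
    """Collect de-duplicated values for each internal field of an <item>."""
    record = {}
    for field, tags in FIELD_SOURCES.items():
        values = []
        for tag in tags:
            for el in item.findall(tag):
                text = (el.text or "").strip()
                if text and text not in values:
                    values.append(text)
        record[field] = values
    return record
```

Keeping every value in a list, rather than picking one, postpones the question of which source to prefer when they disagree.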

  11. The 5th robot… • Taking a preference • Dealing with conflict • Dealing with spam • Dealing with bad metadata

  12. The 6th robot… Don’t forget we have users; all this metadata needs to be searchable • Does the user care? • Results-driven approach? • How to search best? • Search evaluation? • Does it need an explanation?

  13. The 7th robot… What to do when the RSS isn’t even RSS • 80 RSS feeds • 20 aren’t valid • 5 aren’t XML
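
A harvester can at least sort incoming "feeds" into three piles before trying to read them: not XML at all, XML but not a feed, and something that looks like a feed. The sketch below checks only well-formedness and the root element, which is weaker than validating against the RSS specification; `classify_feed` and its labels are invented for this example, and accepting Atom and RDF roots is an assumption.

```python
import xml.etree.ElementTree as ET


def classify_feed(text):
    """Return 'not-xml', 'xml-not-feed' or 'feed' for a downloaded document."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML, e.g. an HTML error page or an empty response.
        return "not-xml"
    # Drop any "{namespace}" prefix to get the bare element name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name in ("rss", "RDF", "feed"):
        return "feed"
    return "xml-not-feed"
```

Feeds in the first two piles can then be logged and skipped instead of crashing the whole harvest.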

  14. The 8th robot… RSS for humans or machines • All of the content? • Some of the content? • How do we want to talk?

  15. The 9th robot… Is there more content in other forms? • OPML? • RSS? • OAI? • SRU? • Thinking beyond the field?
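
Of the formats listed, OPML is the simplest to sketch: it is commonly used to list feeds, with each `outline` element carrying the feed address in an `xmlUrl` attribute. The helper below is a hypothetical illustration of discovering feeds that way, not something described in the talk; OAI and SRU are query protocols and would need rather more than this.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return the feed URLs listed in an OPML document, in document order."""
    root = ET.fromstring(opml_text)
    return [
        outline.get("xmlUrl")
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]
```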

  16. The 10th robot… Making it all make sense • Effort to make an aggregator • Harmonisation • Handling new challenges • Scope

  17. The 11th robot… Making it smart • Harvests every day (approximately 25 new items a day) • Knows which items have been deleted • Knows which items have moved • Knows what people are looking for
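
Knowing what is new, deleted or moved can be reduced to comparing yesterday's harvest with today's. The sketch below assumes each item has some stable identifier (such as a guid) that survives a change of URL; the talk does not say how the real harvester tracks items, and `diff_harvests` is a name made up here.

```python
def diff_harvests(previous, current):
    """Compare two harvests, each a dict of stable item id -> URL."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        # Present today but not yesterday.
        "new": sorted(curr_ids - prev_ids),
        # Present yesterday but gone today.
        "deleted": sorted(prev_ids - curr_ids),
        # Same id in both harvests, but the URL has changed.
        "moved": sorted(
            item_id for item_id in prev_ids & curr_ids
            if previous[item_id] != current[item_id]
        ),
    }


yesterday = {"a": "http://example.org/1", "b": "http://example.org/2"}
today = {"b": "http://example.org/moved/2", "c": "http://example.org/3"}
print(diff_harvests(yesterday, today))
```

If feeds do not supply a stable identifier, "moved" cannot be told apart from "deleted plus new" without comparing titles or content.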

  18. Contacts • Pat Lockley - Xpert • Julian Tenney - Xerte • Steven Stapleton - Berlin OER
