
Australian web domain harvests 2005, 2006 & 2007



Presentation Transcript


  1. Australian web domain harvests 2005, 2006 & 2007

  2. Igor Ranitovic, Internet Archive engineer, with the Petabox rack for the Australian domain harvest

  3. PANDORA : Domain Harvesting
     • Australian domain harvest
     • .au domain, located on Australian servers
     • Internet Archive
     • 1st harvest June/July 2005
       • 4 weeks, 185m files, 6.69 TB
     • 2nd harvest Aug/Sept 2006
       • 5 weeks, 596m files, 19.04 TB
     • 3rd harvest Aug/Sept 2007
       • 4 weeks, 516m files, 18.47 TB
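
The harvest figures on this slide can be compared more easily as derived rates. The following is a minimal sketch, not part of the original presentation: it assumes "m" means million and that a TB is 10^12 bytes (the slide does not say whether decimal or binary units were used), and the names `Harvest`, `avg_file_kb`, and `files_per_week_millions` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Harvest:
    """One domain harvest, using the figures quoted on the slide."""
    label: str
    weeks: int
    files_millions: float  # assumption: "m" on the slide means million
    terabytes: float       # assumption: 1 TB = 10**12 bytes

    def avg_file_kb(self) -> float:
        """Average size per harvested file, in kilobytes (10**3 bytes)."""
        total_bytes = self.terabytes * 1e12
        total_files = self.files_millions * 1e6
        return total_bytes / total_files / 1e3

    def files_per_week_millions(self) -> float:
        """Average crawl rate, in millions of files per week."""
        return self.files_millions / self.weeks


# Figures copied from the slide; nothing here is measured independently.
HARVESTS = [
    Harvest("June/July 2005", weeks=4, files_millions=185, terabytes=6.69),
    Harvest("Aug/Sept 2006", weeks=5, files_millions=596, terabytes=19.04),
    Harvest("Aug/Sept 2007", weeks=4, files_millions=516, terabytes=18.47),
]

if __name__ == "__main__":
    for h in HARVESTS:
        print(
            f"{h.label}: {h.avg_file_kb():.1f} KB per file, "
            f"{h.files_per_week_millions():.1f}m files per week"
        )
```

Under these assumptions the average file size stays in the range of roughly 32 to 36 KB across all three harvests, while the weekly crawl rate rises from about 46m files in 2005 to about 129m files in 2007.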

  4. Comparative statistics: PANDORA Domain Harvests

  5. PANDORA : Domain Harvesting

  6. PANDORA : Domain Harvesting
     • Some pros –
       • Retains linkages and context
       • Large scale – more bytes for the buck
       • Less selectively discriminate
     • Some cons –
       • High dependence on the crawler technology
       • Domain and geo-location bias (.au, geoIP)
       • Limitations in timeliness, quality assurance, scoping, site complexity, deep web
       • Legal and access issues to resolve

  7. PANDORA : Australia’s Web Archive
     • Enormous growth and volume of material
     • Everyone can be creators and publishers
     • Virtually instantaneous publication
     • Dynamic content and format
     • Multiplicity of formats
     • Technology dependent
     • Hyperlinked and interconnected
     • Highly accessible but hard to identify
     • Ephemeral
     • Interactivity, re-use, personalisation (web 2.0)
