Caught in the Web: Web Archiving at U of A Libraries

Caught in the Web: Web Archiving at U of A Libraries. Geoff Harder and Kenton Good Digital Preservation Seminar | March 5, 2010 | University of Alberta. Official children’s site of the 2000 Sydney Olympics - MIA: http://www.olympics.com/eng/kids/index.html?/eng/kids/home.html.

Presentation Transcript


  1. Caught in the Web: Web Archiving at U of A Libraries Geoff Harder and Kenton Good Digital Preservation Seminar | March 5, 2010 | University of Alberta

  2. Official children’s site of the 2000 Sydney Olympics - MIA: http://www.olympics.com/eng/kids/index.html?/eng/kids/home.html

  3. GeoCities: 1995-2009 http://www.pcworld.com/article/163765/so_long_geocities_we_forgot_you_still_existed.html

  4. Mind the Gap - UK “If websites continue to disappear in the same way as those on President Bush and the Sydney Olympics - perhaps exacerbated by the current economic climate that is killing companies - the memory of the nation disappears too. Historians and citizens of the future will find a black hole in the knowledge base of the 21st century.” Quote: http://www.guardian.co.uk/technology/2009/jan/25/internet-heritage

  5. Digital Special Collections. Special Collections in ARL Libraries (March 2009): A Discussion Report from the ARL Working Group on Special Collections. “New definitions need to be created for determining the scope of digital special collections, so that stakeholders can understand the nature of special collections professionals’ responsibilities. These include a responsibility for harvesting and preserving endangered web sites, wikis and other dynamic information resources.”

  6. Looking ahead… • 234 million – The number of websites as of December 2009. • 47 million – Added websites in 2009. • 126 million – The number of blogs on the Internet (as tracked by BlogPulse). • 27.3 million – Number of tweets on Twitter per day (November 2009). • 350 million – People on Facebook. • 4 billion – Photos hosted by Flickr (October 2009). • 12.2 billion – Videos viewed per month on YouTube in the US (November 2009). Source: http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/

  7. Does the web matter? Only if our cultural, historical, political, economic, and social memories matter. • Valuable BUT vulnerable – e.g. a foundation loses funding; can only afford digital publishing. • Research and analysis – a longitudinal view requires a complete picture. • SOMEONE needs to take responsibility for it.

  8. Web Archiving Web Archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. Due to the massive size of the Web, web archivists typically employ web crawlers for automated collection. Wikipedia, “Web Archiving”

  9. how web archiving works • A web crawler (ant, bot) is a computer program that browses and harvests (captures, collects) the World Wide Web in a methodical, automated manner.
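
The crawl loop described on this slide can be sketched in a few lines of Python. This is only an illustration of the general technique, not the crawler Archive-It runs: the `crawl` function, the `LinkExtractor` class, the `max_pages` limit, and the in-memory `SITE` pages are all assumptions made for the example, and a stand-in `fetch` callable replaces real HTTP requests so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=50):
    """Breadth-first harvest starting at `seed`, staying on the seed's host.

    `fetch` is any callable that maps a URL to HTML text, or to None when
    the page cannot be retrieved. Returns {url: html} for captured pages.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    archive = {}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page is gone or unreachable; nothing to capture
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return archive


# A tiny in-memory "site" stands in for the live web (hypothetical pages).
SITE = {
    "http://example.org/": '<a href="/about.html">About</a> '
                           '<a href="http://other.example/">Elsewhere</a>',
    "http://example.org/about.html": '<a href="/">Home</a> '
                                     '<a href="missing.html">Gone</a>',
}

captured = crawl("http://example.org/", SITE.get)
print(sorted(captured))
```

The seed URL plays the same role as a seed in a crawl configuration: the crawler starts there, follows links it discovers, and in this sketch stops at the boundary of the seed's host.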

  10. ARCHIVE-IT

  11. Web Archive Admin Screen

  12. HCF Collection

  13. Seed Management

  14. Reports

  15. Reports

  16. File Type Report

  17. Blocked Content Robots.txt
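
As a rough illustration of why a crawl report might list content as blocked, the sketch below checks URLs against a robots.txt file using Python's standard `urllib.robotparser` module. The robots.txt rules, the `example-archiver` user-agent name, the `is_allowed` helper, and the URLs are hypothetical and are not taken from the presentation or from Archive-It.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("http://example.org/index.html",
            "http://example.org/private/report.html"):
    status = "allowed" if is_allowed(ROBOTS_TXT, "example-archiver", url) else "blocked"
    print(url, status)
```

A crawler that honours robots.txt skips the second URL, and a report can then list it as blocked content.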

  18. Web Archive Launch Page

  19. Exposing Hidden Content

  20. U of A Web Archive • Partner with Internet Archive on the use of Archive-It • Three targets (criteria: thematic, regional, event-based, organizational): 1) Heritage Community Foundation (collection at risk) 2) University of Alberta websites 3) Western Canadian materials (e.g. political websites)

  21. A few resources • University of Alberta Web Archive: <www.archive-it.org/home/ualwebarchive> • Archive-It and Wayback Machine: <www.archive.org/web/web.php> • IIPC – International Internet Preservation Consortium • Use Cases for Access to Internet Archives, IIPC Access Working Group: <netpreserv.org> • Special Collections in ARL Libraries, Report, March 2009 • GoC Web Archive: <http://www.collectionscanada.gc.ca/webarchives/index-e.html>

  22. thanks Geoff Harder Digital Initiatives Coordinator geoffrey.harder@ualberta.ca Kenton Good Web Development Librarian kenton.good@ualberta.ca
