Archive-It: Archiving & Preserving Digital Content
Internet Archive
• We are a digital library
• Founded in 1996 by Brewster Kahle
• Located in San Francisco, California
www.archive.org
• Largest publicly available web archive in existence
• Accessible starting in 2001
• 400 Billion+ URLs
• 80+ million websites
• Content in 40+ languages
• Collects a snapshot of the web every 60-90 days
• 361 Billion pages saved
Web Archiving Service: Archive-It
Archive-It is a subscription service launched in February 2006
• Web-based application that allows users to create, manage, access, and store collections of digital content
• The service is a fully hosted solution and includes access and storage
• Provides tools for selection and scoping, including cataloging with metadata
• Ability to capture content at 10 different time frequencies
• Archived content includes: HTML, text, videos, audio, social media, PDF, images, online newspapers
• Archived content can be browsed 24 hours after a capture is complete, and full-text search is available within 7 days
• Restricted access options are available
What is Web Archiving?
Web archiving is the process of collecting portions of web content, preserving the collections, and then providing access to the archives for use and reuse.
A web archive is a collection of archived URLs grouped by theme, event, subject area, or web address.
Challenge: a lot of data
• Amount of content that is being archived
• Amount of data being created by content providers
Image sources: http://www.helenbrowngroup.com/2011/02/rescue-from-the-digital-firehose/gushing-firehose-by-joseph-robertson/ and http://www.chaitalag.com/new/s/tubig
Challenge: What to archive?
• …What is important to you?
• What do you want people to know about?
• What are your organization’s collecting activities? Vision?
Archive-It Use Cases
• Create a thematic/topical web archive on a specific subject or event.
• Different perspectives and social commentary (tweets, blogs, comments).
• Can include spontaneous events.
• Often related to traditional collecting activity around the same focus.
• Mandate to capture/preserve institutional memory and history. Construct a historical record of an institution’s web presence over time.
• Support an electronic records system to meet records retention requirements.
• Capture publications that aren’t being deposited in print form.
• Closure crawls.
Access to Public Collections
Partners:
• Can view through a private web application with login/password
General Public:
• Can view from the Archive-It website: http://www.archive-it.org/
• Landing pages: view from the organization’s website through a branded page that links back to Archive-It hosted data
• Integration with existing systems and catalogs
Storage & Preservation
Multiple ways to store and preserve
Storage:
• 2 copies of the archived data (primary and back-up) are stored at the San Francisco data center
• Collections are transferred to the General Archive as a third copy
• A copy of archived data can be shipped on a hard drive
• Ability to download files from Internet Archive servers
Digital Preservation:
• 2008: LOCKSS
• 2013: DuraCloud
Web Archiving Life Cycle Model
http://www.archive-it.org/publications
Questions & Answers
Lori Donovan
email@example.com
Thank you!