
What is the Internet Archive

Creating Digital Web Archives through Collection, Collaboration, and Curation. Kristine Hanna, Director of Archiving Services. For the Fulbright Academy Workshop, Jan 24-25, 2011. What is the Internet Archive? We are a digital library. Mission statement: universal access to human knowledge.


Presentation Transcript


  1. Creating Digital Web Archives through Collection, Collaboration, and Curation. Kristine Hanna, Director of Archiving Services. For the Fulbright Academy Workshop, Jan 24-25, 2011

  2. What is the Internet Archive? We are a digital library. Mission statement: universal access to human knowledge. Largest public web archive in existence, founded in 1996 by Brewster Kahle in San Francisco, California. In 2007, officially designated a library by the state of California.

  3. What is in the General Archive • 200+ billion web pages, aggregated from 85 million websites in over 40 languages • Books and Texts • Films and Videos • Audio and the Spoken Word • Still Images • NASA Images • Open Library • Software and Educational tools

  4. Storage and Preservation: multiple copies in multiple places • Data stored on over 4000 servers. Standard storage boxes, open source design • 2 copies online (primary and backup) in San Francisco Bay Area data centers • 3rd copy at Sun Microsystems in Santa Clara, California • Partial mirrors in Egypt, France and the Netherlands • Partners can receive copies of their data

  5. Data Repository

  6. Why are we doing this? • Hundreds of millions of people around the world have grown accustomed to using the web as their primary resource to acquire information. • The availability of this electronic information is taken for granted, and it is a fallacy that if something is on the web it will be there forever. • There’s an essential need for people to understand that the web represents who we are. It’s our culture and our social fabric, and we don’t want to lose it.

  7. Why are our Partners doing this? • Construct a historical record of an institution’s web presence over time by archiving its main website and other sites on which the institution is mentioned. • Collaborate with other institutions and share research • Assemble a comprehensive database of information on a topic, photograph or individual with different perspectives. Capture social commentary: tweets, blogs, comments. • Maintain a strong electronic records management system. • Create a web archive on a specific topic, subject or event • Capture and archive "at risk" digital content on a spontaneous event

  8. Who are our Partners? Over 200 partners in 25 countries and 47 U.S. States: • National Libraries and Federal Institutions • U.S. National Archives (NARA) • U.S. State Archives/Libraries • University Libraries • Museums and Art Libraries • Local (city) Institutions and Public Libraries • Historical Societies

  9. Open Source Technology developed by the Internet Archive & IIPC. How do we collect the Content? • Heritrix: web crawler that captures pages. • Wayback Machine: renders pages, making it possible to view those pages and surf the web as it was. • NutchWAX: search engine that provides full-text search.
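To make "surf the web as it was" concrete, here is a minimal sketch of how a capture in the public Wayback Machine can be addressed by URL and date. It is not from the slides. It assumes the 14-digit `YYYYMMDDhhmmss` timestamp convention, the `web.archive.org/web/` replay prefix, and the `archive.org/wayback/available` lookup endpoint as they are publicly documented. The function names are hypothetical, and nothing here contacts the network.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public endpoints (not named in the presentation).
WAYBACK_REPLAY = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as the 14-digit YYYYMMDDhhmmss capture timestamp."""
    return moment.strftime("%Y%m%d%H%M%S")


def replay_url(page_url: str, moment: datetime) -> str:
    """Build a replay URL asking for the capture nearest to `moment`."""
    return f"{WAYBACK_REPLAY}/{wayback_timestamp(moment)}/{page_url}"


def availability_query(page_url: str, moment: datetime) -> str:
    """Build a lookup URL for the closest archived snapshot."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(moment)})
    return f"{AVAILABILITY_API}?{params}"


def closest_snapshot(response: dict):
    """Return the snapshot URL from a parsed lookup response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


if __name__ == "__main__":
    workshop_day = datetime(2011, 1, 24)
    print(replay_url("http://nawaat.org/", workshop_day))
    print(availability_query("nawaat.org", workshop_day))
```

The replay service generally redirects to the capture closest to the requested timestamp, so an exact match is not required. Fetching the lookup URL and passing the decoded JSON to `closest_snapshot` is left to the reader.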

  10. Web Archiving Services/Models • WWW crawls: broad snapshots run in house by crawl engineers • Contract Crawls: focused and curated crawls run in house by crawl engineers • Archive-It: web-based application that allows partners to create, manage and preserve collections of highly curated digital content. Functions include: selection and scoping, harvesting, cataloging with metadata, full-text search, reports and analysis of collections. Ability to capture content using ten different crawl frequencies. Content includes: text, HTML, video, audio, social networking, PDF, still images, newspapers

  11. Stanford University, Islamic & Middle Eastern Collection Purpose: harvest and preserve Iranian Blogs • Archiving over 300 blogs written by and for Iran and the Iranian people • Includes coverage of 2009 Iranian elections

  12. Stanford University Islamic and Middle Eastern Collection

  13. University of Texas at Austin: LANIC. Purpose: Archive documents from 18 different countries and 300 government ministries and presidencies. Content includes: • full-text versions of official documents • original video and audio recordings of key regional leaders • thousands of annual and "state of the nation" reports • Specific collections for Latin American elections and political parties

  14. Minister of Defense, Chile

  15. American University in Cairo Collections: • American University in Cairo website • Coptic Religion & Culture • Egyptian Arts, Culture & Society • Egyptian Business • Migration and Refugee Studies

  16. Egypt Today is the leading current affairs magazine

  17. The Egyptian Organization For Human Rights

  18. Tunisian Unrest 2011 • Archiving blogs, news sites, social media • Websites suggested by curators as subject matter experts (BnF) • http://www.archive-it.org/public/collection.html?id=2323

  19. Tunisia Watch A website that focuses on Tunisian issues/events

  20. Nawaat.org - a blog run by Tunisians

  21. Slim Amamou’s Twitter page

  22. Access to Collections. Partners: • Can view through private web application or access page with login/password. General Public: • Can view from Archive-It website or General Archive website • Can view from Partners website, which links back to Archive-It hosted data • Partners can host data from their servers • Restricted and private access options are available

  23. What’s next at Internet Archive? • Collaboration and Partnerships • Digital Stewardship • Continue to develop services that help memory institutions and further our mission • Forge new global partnerships • Develop a preservation policy/access model • Digital Archive

  24. Thank You! Kristine Hanna Internet Archive Director, Archiving Services kristine@archive.org 415 561 6799 x 5
