
The FDLP Web Archive

The FDLP Web Archive. Dory Bower Archive-It Partner Meeting November 18, 2014. FDLP History and Dissemination of Government Publications.



Presentation Transcript


  1. The FDLP Web Archive. Dory Bower. Archive-It Partner Meeting, November 18, 2014

  2. FDLP History and Dissemination of Government Publications
     • Act of 1813: Congress first authorized legislation to ensure the provision of one copy of the House and Senate Journals and other Congressional documents to certain universities, historical societies, state libraries, etc.
     • The Printing Act of 1895: Formed the basis for Title 44, centralized the printing, binding, and distribution of US Government documents, established the role of the FDLP, and transferred the Office of the Superintendent of Documents to the GPO. The first Monthly Catalog of US Government Publications was printed at the GPO that year.
     • Title 44, US Code: Mandate for Public Printing and Documents. Chapter 19 deals with the Depository Library Program. Title 44 has seen many changes over the last century.

  3. FDLP History and Dissemination of Government Publications

  4. FDLP History and Dissemination of Government Publications
     • GPO Electronic Information Access Enhancement Act of 1993: Establishes a means of enhancing electronic public access to a wide range of Federal electronic information.
     • 1996: Launch of the Catalog of US Government Publications (CGP), the online counterpart of the Monthly Catalog of US Government Publications. Covers publications dating from July 1976 to the present.
     • 1998: LSCM begins using PURLs for persistent access to electronic copies of government publications.
     • 2011: Use of Archive-It begins for automated harvesting of government websites.

  5. Government “Publications”

  6. Web Archiving Options: decision process for FDLP web archiving
     • Standard PURL: individual publications and less complex websites, using Teleport software
     • Archive-It: content-rich websites
     • Partnership: hard-to-harvest sites, database sites, or real-time information
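
The decision process on slide 6 can be sketched as a small function. This is only an illustration of the three options named on the slide: the boolean flags, the function name, and the order in which the checks are applied are assumptions, not part of the presentation.

```python
from enum import Enum


class ArchivingMethod(Enum):
    """The three options listed on slide 6."""
    STANDARD_PURL = "Standard PURL (Teleport harvest)"
    ARCHIVE_IT = "Archive-It crawl"
    PARTNERSHIP = "Partnership"


def choose_method(hard_to_harvest: bool, content_rich: bool) -> ArchivingMethod:
    """Pick an archiving option for a site.

    Assumption: hard-to-harvest, database, or real-time sites are checked
    first, because a crawl is unlikely to capture them no matter how
    content-rich they are. The slide itself does not state an order.
    """
    if hard_to_harvest:
        return ArchivingMethod.PARTNERSHIP
    if content_rich:
        return ArchivingMethod.ARCHIVE_IT
    # Individual publications and less complex websites.
    return ArchivingMethod.STANDARD_PURL


if __name__ == "__main__":
    print(choose_method(hard_to_harvest=False, content_rich=True).value)
```

In practice the slides describe this as a team discussion rather than a mechanical rule, so the function is best read as a summary of the categories.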

  7. Collection Development: develop and build a website-level collection
     • Must be within the scope of the FDLP
     • Not distributed in print
     • Government information disseminated through the web and not cataloged
     • Avoid duplicating the efforts of other institutions or content already in FDsys
     • Work with the collection development staff, drawing on their many years of experience, to help determine needs

  8. Collection Development
     • Pilot sites: 3 sites to begin testing workflow
     • SuDoc Y3 sites: commissions, committees, independent agencies
     • Special Collections
     • Native American Resources
     • Nominated sites

  9. Collection Development: Nominations
     • Document Discovery: http://usgpo.wufoo.com/forms/document-discovery/
     • AskGPO: http://www.gpo.gov/askgpo/
     • Team email: fdlpwebarchiving@gpo.gov

  10. Collection Development: The decision-making process
      • Nominations are sent to the team by email or discussed in the weekly meeting
      • Much discussion within the FDLP web archiving team, which represents many areas of LSCM
      • Is the site within the scope of the FDLP and other collection development parameters?
      • Decide by which means to archive

  11. Collection Development: Moving forward
      • Y3s almost complete
      • Working with Collection Development staff, with their extensive experience, to determine needs
      • Move from smaller to larger sites
      • Non-standard sites (fatherhood.gov, read.gov)
      • Special Collections
      • Regular frequency of crawls
      • Working with other Federal collecting institutions

  12. Archive-It Workflow
      • Notification to agency
      • Webmaster: 48 hours' notice of intent to crawl
      • Full disclosure of what we are doing
      • Chosen for inclusion in the FDLP
      • Will ignore robots.txt, but only when necessary
      • Begin seed list, test crawls, QA, modifications
      • Concentrate a lot of time on the test crawl
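
One way to judge whether ignoring robots.txt is "necessary" for a given seed list is to check which seeds the file would block. The sketch below uses Python's standard `urllib.robotparser`; it is not the tooling described in the presentation, and the helper name, the `agency.example` URLs, and the sample robots.txt content are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_seeds(robots_txt: str, seeds: list[str], user_agent: str = "*") -> list[str]:
    """Return the seed URLs that the given robots.txt text disallows.

    The robots.txt content is passed in as a string so the check can run
    offline; fetching the file from the live site is left to the caller.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in seeds if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt and seed list, for illustration only.
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    sample_seeds = [
        "https://agency.example/reports/",
        "https://agency.example/private/data.html",
    ]
    for url in blocked_seeds(sample_robots, sample_seeds):
        print("blocked by robots.txt:", url)
```

An empty result would suggest the crawl can proceed without overriding robots.txt for those seeds.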

  13. Archive-It Workflow
      • Run and QA production crawl
      • Run patch crawls
      • Submit lots of questions
      • Best playback possible
      • Maximize user experience and account
      • Make live on Archive-It and submit for metadata

  14. FDLP Web Archive
      Collection size:
      • 3.5 TB, over 24 million documents crawled
      • 56 agencies represented on AIT
      • 65 records on CGP (analytical cataloging)
      • FDLP project page: http://www.fdlp.gov/377-projects-active/2020-web-archiving
      Resources:
      • 10 contributors

  15. Access: two locations for access
      • Archive-It
      • Search for “GPO” or “FDLP”
      • Catalog of Government Publications (CGP)
      • Identifiable through “INTERNET” in SuDoc number
      • Expert search of wcat=web archiving retrieves all
      • Would like to find better access to whole collection and eliminate this search

  16. FDLP Web Archive https://archive-it.org/home/FDLPwebarchive

  17. Catalog of Government Publications

  18. Questions? fdlpwebarchiving@gpo.gov
