Port Townsend Leader Historical Newspaper Archive
Keith Darrock

Presentation Transcript


  1. Port Townsend Leader Historical Newspaper Archive Keith Darrock

  2. History: Port Townsend Leader
     • The original paper began in 1889; indexing for the digital repository has been completed for 1903-1913

  3. Schema: Dublin Core. Example record:
     • Issue & headlines: April 1, 1910, Page three. May Roberts company to open engagement tonight; Cubs and soldiers to play practice game; Schooner Inca is coming to Puget Sound
     • Keywords: drama; baseball
     • Named individuals: Roberts, May; Rassmussen, Captain
     • Vessels: schooner Inca
     • Publisher: The Leader Company
     • Place of publication: United States--Washington (State)--Port Townsend
     • Type of publication: Newspaper
     • Frequency: Daily except Monday
     • Title notes: The Port Townsend daily leader (1904-1916). Continues: Morning leader. Continued by: Port Townsend leader (1916).
     • Image format: GIF image
     • Scanning data: Scanned from 35 mm silver negative microfilm by OCLC Preservation Resources; GIF images are 1600 pixels wide, type 89A with 2 added colors, derived from 600 dpi bitonal TIFF images.
     • Source of other formats: Microfilm: Port Townsend Public Library, 1220 Lawrence St., Port Townsend, WA 98368, 360-385-3181. Microfilm and bound originals: Jefferson County Historical Society, 210 Madison St., Port Townsend, WA 98368, 360-385-1003.
     • Rights: Use of this image is restricted to non-commercial, public access and does not include the right to create text versions.
     • Example taken from: http://content.lib.washington.edu/cgi-bin/viewer.exe?CISOROOT=/ptleader&CISOPTR=3978
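To show how a record like the one above could be expressed in Dublin Core, here is a minimal sketch that serializes it as an unqualified `oai_dc` XML record with Python's standard `xml.etree.ElementTree`. The mapping from the archive's local field names to Dublin Core elements (for example, sending Keywords, Named individuals, and Vessels all to `dc:subject`) is a hypothetical assumption for illustration, not the collection's documented crosswalk, and the output omits the `xsi:schemaLocation` attribute a production record would normally carry.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk from local field names to Dublin Core elements.
FIELD_TO_DC = {
    "Issue & headlines": "title",
    "Date": "date",
    "Keywords": "subject",
    "Named individuals": "subject",
    "Vessels": "subject",
    "Publisher": "publisher",
    "Place of publication": "coverage",
    "Type of publication": "type",
    "Image format": "format",
    "Rights": "rights",
}


def to_oai_dc(record):
    """Build an oai_dc element from a dict of local field name -> list of values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, values in record.items():
        element = FIELD_TO_DC.get(field)
        if element is None:
            continue  # fields without a mapping are skipped in this sketch
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return root


# Values taken from the April 1, 1910 example record.
example = {
    "Issue & headlines": ["April 1, 1910, Page three"],
    "Date": ["04/01/1910"],
    "Keywords": ["drama", "baseball"],
    "Named individuals": ["Roberts, May", "Rassmussen, Captain"],
    "Vessels": ["schooner Inca"],
    "Publisher": ["The Leader Company"],
    "Type of publication": ["Newspaper"],
}

xml_text = ET.tostring(to_oai_dc(example), encoding="unicode")
print(xml_text)
```

Repeating `dc:subject` once per value, rather than joining values into one string, keeps each keyword, name, and vessel individually searchable by harvesters.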

  4. Content Standards
     • MIG (Metadata Implementation Group): http://www.lib.washington.edu/msd/mig/default.html
       • Provides guidelines for creating a collection
     • Dublin Core Field Properties Table: http://www.lib.washington.edu/msd/mig/advice/default.html
       • Date field (mm/dd/yyyy)
       • Issue & headlines
       • Vessels
       • Named individuals
       • Keywords
       • Page notes
     • The Leader historical archive does not follow strict content standards in terms of controlled vocabulary
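Because the date field is entered by hand as mm/dd/yyyy, a simple validation step can catch impossible dates before upload. The sketch below assumes you also want an ISO 8601 (yyyy-mm-dd) copy of each date, which sorts correctly as plain text; that conversion is an illustrative choice, not part of the MIG guidelines.

```python
from datetime import datetime


def parse_issue_date(value):
    """Validate a date entered as mm/dd/yyyy and return it as ISO 8601 (yyyy-mm-dd).

    Raises ValueError if the string does not match the pattern or names an
    impossible date such as 02/30/1910. Note that strptime also accepts
    non-zero-padded input like 4/1/1910.
    """
    parsed = datetime.strptime(value.strip(), "%m/%d/%Y")
    return parsed.date().isoformat()


print(parse_issue_date("04/01/1910"))  # 1910-04-01
```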

  5. Digitization Standards
     • Put onto microfilm as a Washington State Library project
     • Scanned by OCLC into images
     • Images originated as 600 dpi TIFFs
     • Finalized as 1600-pixel-wide GIFs
     • Uploaded to UW servers via CONTENTdm clients
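Going from a 600 dpi TIFF master to a GIF fixed at 1600 pixels wide is a proportional downscale. The arithmetic below is a sketch of what that step implies; the 6000 x 8000 pixel source size is a hypothetical example, not a measured dimension from this collection.

```python
SOURCE_DPI = 600      # resolution of the bitonal TIFF masters (from the scanning notes)
TARGET_WIDTH = 1600   # width of the delivered GIF derivatives, in pixels


def gif_dimensions(tiff_width, tiff_height, target_width=TARGET_WIDTH):
    """Return (width, height) of the derivative, preserving the TIFF's aspect ratio."""
    if tiff_width <= 0 or tiff_height <= 0:
        raise ValueError("source dimensions must be positive")
    return target_width, round(tiff_height * target_width / tiff_width)


def equivalent_dpi(tiff_width, target_width=TARGET_WIDTH, source_dpi=SOURCE_DPI):
    """Resolution the GIF retains, on the same scale as the 600 dpi TIFF."""
    return source_dpi * target_width / tiff_width


# Hypothetical master: 6000 x 8000 pixels.
print(gif_dimensions(6000, 8000))  # (1600, 2133)
print(equivalent_dpi(6000))        # 160.0
```

The wider the master, the more detail the fixed 1600-pixel width discards, which is one reason to keep the TIFFs rather than treat the GIFs as the preservation copy.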

  6. Harvested into a Federated Search Tool?
     • The Port Townsend Leader archive has not yet been harvested by OAIster…
     • However, many other collections within the UW digital collections have been
     • The Port Townsend Leader archive can be found in OCLC's CONTENTdm Collection of Collections: http://collections.contentdm.oclc.org/
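OAIster harvests records through OAI-PMH. Assuming the repository exposes an OAI-PMH endpoint, a harvester pages through Dublin Core records with `ListRecords` requests and follows `resumptionToken`s until the token comes back empty. The sketch below builds those requests and reads the token; the base URL `https://example.org/oai` and the set name `ptleader` are hypothetical placeholders, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, resumption_token=None, set_spec=None):
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core (oai_dc)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumptionToken from a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{{{OAI_NS}}}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # an absent or empty token means there are no more pages
    return token.text.strip()


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>ptleader:100</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai", set_spec="ptleader"))
print(next_token(sample))  # ptleader:100
```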

  7. Software Used to House Records & Digitized Works
     • CONTENTdm
     • Originally developed by the University of Washington
     • In 2001, Digital Media Management, Inc. was created and the system was made available to outside entities
     • OCLC purchased it in 2006 and now owns and manages it

  8. Who's Responsible for Indexing?
     • The Port Townsend Public Library manages volunteers who hand-index certain fields, including:
       • Issue and headlines
       • Keywords
       • Named individuals
       • Vessels
     • Using a controlled vocabulary? Sometimes, including the first three years (1903-06), and sporadically thereafter. Actual LCSH headings, probably not.
     • *Automation will not remove the need for human indexing of the date and subject fields
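Since controlled-vocabulary use has been sporadic, a lightweight check can flag volunteer-entered keywords that fall outside an authorized list for staff review. The sketch below assumes keywords are entered as a semicolon-separated string, as in the example record; the small `AUTHORIZED_TERMS` set is entirely hypothetical and stands in for whatever list (LCSH or a local thesaurus) a project adopts.

```python
# Hypothetical authorized list; a real project would load LCSH or a local thesaurus.
AUTHORIZED_TERMS = {"baseball", "drama", "shipping", "fires"}


def check_keywords(raw_keywords, authorized=AUTHORIZED_TERMS):
    """Split a semicolon-separated keyword string, normalize case and spacing,
    and separate terms found in the controlled vocabulary from those that are not."""
    accepted, rejected = [], []
    for term in raw_keywords.split(";"):
        term = " ".join(term.split()).lower()  # collapse internal whitespace
        if not term:
            continue  # skip empty entries, e.g. from a trailing semicolon
        if term in authorized:
            accepted.append(term)
        else:
            rejected.append(term)
    return accepted, rejected


print(check_keywords("drama; Baseball;  steam ships;"))
# (['drama', 'baseball'], ['steam ships'])
```

Rejected terms are returned rather than discarded, so a staff member can decide whether to map them to an authorized heading or add them to the list.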

  9. Automation – Can We Do It?
     • In over ten years, human indexing has completed only ten years of content. Even so, this is a lot of work: over 7,000 images so far!
     • Need a more efficient solution?
     • Use OCR software (ABBYY FineReader) to extract each image's text in batches
     • Add a new field, Text, that contains all of the extracted text (searchable: yes)
     • Two files, the image and its OCR text, are combined via CONTENTdm, and everything is uploaded to the UW main server
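The pairing step can be sketched as a script that matches each page image with its OCR output and writes one row per page to a tab-delimited file for batch import. This assumes the OCR software has already saved plain text keyed by image filename, and that the import takes a tab-delimited metadata file; the column names, the `YYYYMMDD_pN.gif` file naming, and the sample text are hypothetical. The script does not call ABBYY FineReader or CONTENTdm.

```python
import csv
import io
from pathlib import PurePath


def build_import_file(pages, ocr_text):
    """Write one tab-delimited row per page image, pairing it with its OCR text.

    pages: list of (image_filename, date_mm_dd_yyyy) tuples.
    ocr_text: dict mapping image filename stem -> extracted text.
    Returns the file contents as a string.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Title", "Date", "Text", "File Name"])  # hypothetical column names
    for filename, date in pages:
        stem = PurePath(filename).stem
        # Collapse tabs and newlines so each page's OCR text stays on a single row.
        text = " ".join(ocr_text.get(stem, "").split())
        writer.writerow([f"Port Townsend Leader, {date}", date, text, filename])
    return out.getvalue()


pages = [("19100401_p3.gif", "04/01/1910")]
ocr = {"19100401_p3": "MAY ROBERTS COMPANY\tTO OPEN\nENGAGEMENT TONIGHT"}
print(build_import_file(pages, ocr))
```

Pages with no OCR output still get a row with an empty Text value, so missing text is visible in the import file instead of silently dropping the image.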

  10. Challenges to Automation
     • Working with volunteers requires library staff involvement
     • Making compound or complex objects
     • Subject terms and dates still need to be applied by a human indexer
     • Getting volunteers to use an actual controlled vocabulary
     • Time to do it all?
     I Automation
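One part of building compound objects is grouping individual page images into one object per newspaper issue, with pages in reading order. The sketch below assumes a hypothetical `YYYYMMDD_pN.gif` naming scheme (issue date, then page number); it sorts pages numerically so that page 10 follows page 9 rather than page 1.

```python
from collections import defaultdict


def group_into_issues(page_files):
    """Group page image filenames into one compound object per issue.

    Assumes the hypothetical naming scheme YYYYMMDD_pN.gif.
    Returns {issue_date: [filenames sorted by page number]}, ordered by date.
    """
    issues = defaultdict(list)
    for name in page_files:
        stem = name.rsplit(".", 1)[0]          # drop the file extension
        date_part, page_part = stem.split("_p")
        issues[date_part].append((int(page_part), name))
    return {
        date: [name for _, name in sorted(pages)]
        for date, pages in sorted(issues.items())
    }


files = ["19100401_p3.gif", "19100401_p1.gif", "19100402_p1.gif",
         "19100401_p10.gif", "19100401_p2.gif"]
print(group_into_issues(files))
```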
