
CS 502: Computing Methods for Digital Libraries

Lecture 9

Conversion to Digital Formats

Anne Kenney, Cornell University Library


What are Digital Images?

  • Electronic snapshots taken of a scene or scanned from documents

  • sampled and mapped as a grid of dots or picture elements (pixels)

  • each pixel is assigned a tonal value (black, white, grays, colors), represented in binary code

  • the code is stored, often in reduced (compressed) form

  • the code is read and interpreted to create an analog version for display or printing


Four Scanning Methods

Bitonal

Grayscale

Special Treatment

Color


Digital Image Quality is Governed By:

  • resolution and threshold

  • bit depth

  • image enhancement

  • color management

  • compression

  • system performance

  • operator judgment and care


Resolution

  • determined by number of pixels used to represent the image

  • expressed in dots per inch (dpi); because dpi applies in both directions, the number of dots per square inch is the dpi squared

  • increasing resolution increases the level of detail captured, and file size grows with the square of the resolution
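
The relationship between resolution and file size can be checked with a little arithmetic. The following is a minimal sketch, not part of the lecture: the function name and the 8 x 10 inch bitonal page are illustrative assumptions, and the figures are for uncompressed data only.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bit_depth):
    """Uncompressed size: pixel count times bits per pixel, converted to bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bit_depth // 8


# Hypothetical 8 x 10 inch page scanned as 1-bit (bitonal) at three resolutions.
for dpi in (200, 300, 600):
    size = uncompressed_size_bytes(8, 10, dpi, 1)
    print(f"{dpi} dpi bitonal: {size:,} bytes")
```

Doubling the resolution from 300 to 600 dpi quadruples the pixel count, and therefore the uncompressed file size.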


Effects of Resolution

600 dpi

300 dpi

200 dpi


Threshold Setting in Bitonal Scanning

defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
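
As a rough illustration of thresholding, the sketch below maps 8-bit gray values to black or white. The sample values and the function name are hypothetical, and treating a value equal to the threshold as white is an assumption; scanner software may draw the line differently.

```python
def to_bitonal(gray_rows, threshold):
    """Map 0-255 gray values to 0 (black) or 1 (white).

    Assumption: values at or above the threshold become white.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in gray_rows]


# One hypothetical scan line of gray values, from dark to light.
scanline = [[30, 70, 90, 120, 200]]

print(to_bitonal(scanline, 60))
print(to_bitonal(scanline, 100))
```

With the higher threshold, more of the mid-gray values fall on the black side, so strokes look heavier and light marks are more likely to survive.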


Effects of Threshold

threshold = 60

threshold = 100


Bit Depth

  • number of bits used to represent each pixel, typically 8 bits or more per channel

  • representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color

  • example: 8-bit grayscale pixel

    00000000 = black

    11111111 = white


Bit Depth

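  • increasing bit depth increases the number of gray or color levels that can be represented and increases file size arithmetically (in direct proportion to the number of bits)

  • affects resolution requirements: grayscale capture can often convey fine detail at a lower resolution than bitonal capture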
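
The two effects of bit depth can be sketched side by side: the number of representable levels doubles with each added bit, while uncompressed size grows only in proportion to the bit count. The function names and the one-megapixel image are illustrative assumptions.

```python
def tonal_levels(bits_per_pixel):
    """Number of distinct values a pixel can take at this bit depth."""
    return 2 ** bits_per_pixel


def bytes_per_megapixel(bits_per_pixel):
    """Uncompressed bytes needed for one million pixels."""
    return 1_000_000 * bits_per_pixel // 8


# 1-bit bitonal, 3-bit gray, 8-bit gray, 24-bit color.
for bits in (1, 3, 8, 24):
    print(f"{bits:>2}-bit: {tonal_levels(bits):>10,} levels, "
          f"{bytes_per_megapixel(bits):>9,} bytes per megapixel")
```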


Effects of Grayscale on Image Quality

3-bit gray

8-bit gray


Image Enhancement

  • can be used to improve image capture

  • use raises concerns about fidelity and authenticity


Effects of Filters

no filters used

maximum enhancement


Image Editing


Compression

  • reduces file size for processing, storage, transmission, and display

  • image quality may be affected by the compression techniques used and the level of compression applied


Compression Variables

  • lossless versus lossy compression

  • proprietary vs. open schemes

  • level of industry support

  • bitonal vs. gray/color


Common Compression Schemes

  • bitonal

    • ITU Group 4: lossless

    • JBIG (ISO 11544): lossless

    • CPC: lossy

    • DigiPaper

  • grayscale/color

    • LZW: lossless

    • JPEG: lossy

    • Kodak Image Pac: “visually lossless”

    • Fractal and Wavelet compression
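
The lossless/lossy distinction can be demonstrated with a toy example. This sketch uses Python's `zlib` (DEFLATE) as a stand-in for a lossless scheme; it is not one of the schemes listed above. The "lossy" step is a deliberately crude assumption (discarding the low four bits of each pixel), not how JPEG works, and the synthetic pixel data is hypothetical.

```python
import zlib

# Hypothetical 64 x 64 8-bit grayscale image with a repeating gradient pattern.
pixels = bytes((x * 7 + y * 3) % 256 for y in range(64) for x in range(64))

# Lossless: decompression restores every byte exactly.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Toy lossy step: keep only the top 4 bits of each pixel before compressing.
# The discarded information cannot be recovered afterwards.
quantized = bytes(p & 0xF0 for p in pixels)
restored_lossy = zlib.decompress(zlib.compress(quantized))

print("lossless round trip identical:", restored == pixels)
print("lossy round trip identical:   ", restored_lossy == pixels)
print("largest lossy pixel error:    ",
      max(abs(a - b) for a, b in zip(pixels, restored_lossy)))
```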


Effects of JPEG Compression

300 dpi, 8-bit grayscale

uncompressed TIFF

JPEG 18.5:1 compression


Compression Observations

  • the richer the file, the more efficient and sustainable the compression

  • the more complex the image, the poorer the compression
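
The second observation is easy to reproduce. In this sketch, `zlib` again stands in for a lossless scheme, and the two "pages" are synthetic assumptions: one is a uniform white field, the other is pseudo-random noise representing the most complex possible content.

```python
import random
import zlib


def compression_ratio(data):
    """Original size divided by compressed size (higher means better compression)."""
    return len(data) / len(zlib.compress(data))


size = 100_000

# A blank page: every pixel is the same white value.
blank_page = bytes([255]) * size

# A maximally complex page: pseudo-random values (seeded for repeatability).
rng = random.Random(0)
noisy_page = bytes(rng.randrange(256) for _ in range(size))

print(f"blank page ratio: {compression_ratio(blank_page):.1f}:1")
print(f"noisy page ratio: {compression_ratio(noisy_page):.2f}:1")
```

The uniform page compresses by a very large factor, while the noisy page barely compresses at all.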


Equipment used and its performance over time

  • scanners offer wide range of capabilities to capture detail, dynamic range, and color

  • scanners with same stated functionality can produce different results

  • calibration, age of equipment, and environment affect quality


Equipment used and its performance over time

  • attributes and capabilities of monitor and/or printer are also factors

  • assess quality visually and computationally

    • use targets

    • control QC environment

    • increasing availability of software to assess resolution, tone, color, artifacts


Image Capture:

Create digital objects rich enough to be useful over time in the most cost-effective manner.


How to determine what’s good enough?

  • Connoisseurship of document attributes

  • Objective characterizations

  • Translation between analog and digital

    • measurement to scanning requirement to corresponding image metrics

    • e.g., detail size to resolution to MTF (modulation transfer function)

    • tonal range to bit depth to signal-to-noise ratio


Case Study

  • Brittle Books: printed text, metal type, commercial publishers; objective measurement using the Quality Index from micrographics

  • 600 dpi 1-bit capture adequately preserves informational content of text-based materials
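
The slides name the Quality Index but do not give a formula. The sketch below assumes the bitonal benchmarking formula commonly cited from Cornell's digital imaging guidance, QI = (dpi x 0.039 x h) / 3, where h is the height in millimeters of the smallest significant character and 0.039 converts millimeters to inches. The formula, the QI target of 8 for high quality, and the 1 mm character height are all assumptions brought in for illustration, not statements from this lecture.

```python
MM_TO_INCH = 0.039  # conversion factor as used in the assumed formula


def bitonal_quality_index(dpi, char_height_mm):
    """Assumed bitonal formula: QI = (dpi * 0.039 * h) / 3."""
    return dpi * MM_TO_INCH * char_height_mm / 3


def bitonal_dpi_for(qi, char_height_mm):
    """Rearranged for resolution: dpi = 3 * QI / (0.039 * h)."""
    return 3 * qi / (MM_TO_INCH * char_height_mm)


# Hypothetical smallest character height of 1 mm.
print(f"QI at 600 dpi: {bitonal_quality_index(600, 1.0):.1f}")
print(f"dpi needed for QI 8: {bitonal_dpi_for(8, 1.0):.0f}")
```

Under these assumptions, 600 dpi lands just under a QI of 8 for 1 mm characters, which is consistent with the case study's conclusion.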


Ensuring Full Informational Capture: “No More, No Less”

desired point of capture

image quality and utility

cost


Create One Scan To Serve Multiple Uses

  • Derive alternative formats/approaches to meet current and future information needs

  • Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost

  • Understand technical links affecting presentation and utility of derivatives


User Requirements

  • completeness

  • legibility

  • speed of delivery

  • “cooked” files


Derivatives from a Digital Master

  • the richer the image, the better the derivative

    • a derivative from a rich file is superior in quality to one from a poorer scan

    • the richer the image, the better the image processing


monitor: 800 x 600 pixels

document: 8” x 10”, 200 dpi (1,600 x 2,000 pixels)

document at 100 dpi: 800 pixels x 1,000 pixels

document at 60 dpi: 480 pixels x 600 pixels
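
The figures on this slide follow from simple scaling arithmetic, which the sketch below reproduces for the 8 x 10 inch document and the 800 x 600 monitor. The function names and the `fit` parameter are illustrative assumptions.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a document rendered at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def max_dpi_to_fit(width_in, height_in, screen_w, screen_h, fit="page"):
    """Highest dpi at which the document fits the screen.

    fit="page" shows the whole page; fit="width" fills the screen width
    and lets the user scroll vertically.
    """
    by_width = screen_w / width_in
    by_height = screen_h / height_in
    return by_width if fit == "width" else min(by_width, by_height)


print(pixel_dimensions(8, 10, 200))                      # the digital master
print(max_dpi_to_fit(8, 10, 800, 600, fit="page"))       # whole page visible
print(max_dpi_to_fit(8, 10, 800, 600, fit="width"))      # full width, scrolling
```

The whole page fits only at 60 dpi, and filling the screen width allows 100 dpi, so a 200 dpi master must be scaled down for on-screen display either way.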


Compression/File Format Comparison for Derivative Files

  • TIFF: uncompressed

  • GIF: compressed 6:1 (NARA)

  • JPEG: compressed 20:1 (LC)


Alternatives for Displaying Oversize Images

  • File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix

  • User tools for representing scale (Blake Project ImageSizer, a Java applet) and for improving image quality


Recommendations Coalescing

  • Intent of conversion drives decisions

    • issues of access considered at conversion

    • notion of long-term utility and cross-institutional resources gaining ground

  • Access images will change with:

    • changing user needs and capabilities

    • changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines

