The question of quality
PowerPoint Slideshow about 'The Question of Quality' - Antony



The Question of Quality

Week 9

Most of this presentation is based on the work of Marcos Gonçalves, as cited in the references


Goals for this class

  • Consider quality in digital libraries

    • How do we define quality?

    • How do we measure quality?

    • How does quality control impact a user?

  • The role of logging

    • Helpful information

    • Privacy issues

  • The status of DL logging


Understanding Quality in a DL

  • Quality indicators: proposed descriptions of quantities or observable variables that may be related to quality

    • “Measures” is a stronger term: it requires validation

    • Gonçalves et al. provide an analysis of quality conditions and recommend specific quantities to use

      • Dimensions of quality

      • Proposed indicators

      • Application to DL concerns


Getting the data

  • Where does the data come from?

    • Logging

    • Surveys

    • Focus Groups

  • Know what information is needed, then choose the method most likely to provide the data.

    • More about the sources of data after we see what we need to know.


What are we looking for?

  • Suppose we are concerned with the quality of the following components of a DL:

    • Data objects

    • Metadata

    • Collection

    • Catalog

    • Repository

    • Services

  • What characteristics do we want each of those to have?



Dimensions of Quality

  • Services

  • Collection

  • Digital Objects

  • Catalog

  • Repository

  • Metadata Specification


Dimensions of Quality

  • Digital Object: Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness

  • Metadata Specification: Accuracy, Completeness, Conformance

  • Collection: Completeness

  • Catalog: Completeness, Consistency

  • Repository: Completeness, Consistency

  • Services: Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability


What information do we need - related to Digital Objects

  • Accessibility

    • What collection?

    • # of structured streams

    • Rights management metadata

    • Communities to be served

  • Pertinence

    • Context

    • Information content

    • Information need


Information need - Digital Objects, continued

  • Preservability

    • Fidelity (lossiness)

    • Migration cost

    • Digital object complexity

    • Stream formats

  • Relevance

    • Feature frequency

    • Inverse document frequency

    • Document size

    • Document structure

    • Query size

    • Collection size
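The relevance quantities above are the classic inputs to vector-space ranking. The following is a minimal Python sketch. It assumes relevance is approximated by a TF-IDF sum: term frequency normalized by document size, multiplied by the log of collection size over document frequency. This weighting is a swapped-in standard technique, not something the slides specify. The function `tf_idf_score` and the toy collection are hypothetical, and document structure and query size are not modeled.

```python
import math
from collections import Counter


def tf_idf_score(query: list[str], doc: list[str], collection: list[list[str]]) -> float:
    """Minimal TF-IDF relevance of `doc` for `query` (illustrative assumption)."""
    n_docs = len(collection)                  # collection size
    counts = Counter(doc)                     # feature frequency per term
    score = 0.0
    for term in set(query):
        df = sum(1 for d in collection if term in d)  # documents containing the term
        if counts[term] == 0 or df == 0:
            continue                          # term absent: contributes nothing
        tf = counts[term] / len(doc)          # normalize by document size
        idf = math.log(n_docs / df)           # inverse document frequency
        score += tf * idf
    return score


collection = [
    ["digital", "library", "quality"],
    ["library", "catalog"],
    ["quality", "metadata", "quality"],
]
scores = [tf_idf_score(["quality"], d, collection) for d in collection]
print(scores)  # the third document scores highest; the second scores 0.0
```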


Information need - Digital Objects, continued

  • Similarity

    • All the same features as in relevance

    • Also: citation/link patterns

  • Significance

    • Citation/link patterns

  • Timeliness

    • Age

    • Time of latest citation

    • Collection freshness


Information need - Metadata Specification

  • Accuracy

    • Accurate attributes

    • # attributes in the record

  • Completeness

    • Missing attributes

    • Schema size

  • Conformance

    • Conformant attributes

    • Schema size
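One way to turn these quantities into scores is a simple ratio per dimension. The Python sketch below makes three assumptions, none of which are formulas taken from the slides:

  • accuracy = accurate attributes / attributes in the record
  • completeness = 1 - missing attributes / schema size
  • conformance = conformant attributes / schema size

The example counts are hypothetical.

```python
def accuracy(accurate_attrs: int, attrs_in_record: int) -> float:
    """Share of the record's attributes judged accurate (assumed ratio)."""
    return accurate_attrs / attrs_in_record if attrs_in_record else 0.0


def completeness(missing_attrs: int, schema_size: int) -> float:
    """1 minus the share of schema attributes missing from the record (assumed ratio)."""
    return 1.0 - missing_attrs / schema_size


def conformance(conformant_attrs: int, schema_size: int) -> float:
    """Share of schema attributes whose values conform to the specification (assumed ratio)."""
    return conformant_attrs / schema_size


# Hypothetical record against a 15-element schema (e.g., simple Dublin Core):
# 12 attributes filled, 11 of them judged accurate, 12 conformant.
print(round(accuracy(11, 12), 3), round(completeness(3, 15), 3), round(conformance(12, 15), 3))  # 0.917 0.8 0.8
```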


Information - Collection and Catalog

  • Completeness of the Collection

    • Collection size

    • Size of an “ideal” collection

  • Completeness of the Catalog

    • # of digital objects with no metadata

      • Item level metadata

    • Size of the collection

  • Catalog Consistency

    • # of metadata specifications per digital object
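The collection and catalog quantities can likewise be combined as ratios. This Python sketch makes two assumptions. First, collection completeness is taken as collection size divided by the size of the "ideal" collection. Second, catalog completeness is taken as 1 minus the share of digital objects with no metadata. Catalog consistency is not modeled, and the numbers are hypothetical.

```python
def collection_completeness(collection_size: int, ideal_size: int) -> float:
    """Fraction of the 'ideal' collection that the DL actually holds (assumed ratio)."""
    return collection_size / ideal_size


def catalog_completeness(objects_without_metadata: int, collection_size: int) -> float:
    """1 minus the fraction of digital objects lacking item-level metadata (assumed ratio)."""
    return 1.0 - objects_without_metadata / collection_size


# Hypothetical numbers: 800 of an ideal 1,000 items held; 40 of the 800 lack metadata.
print(collection_completeness(800, 1000), catalog_completeness(40, 800))
```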


Information about the Repository

  • Completeness

    • # of collections

  • Consistency

    • # of collections

    • Catalog/collection match

      • How well do the catalogs match the collections?

      • Are the catalogs for all the collections at the same level of detail?


Service Information Need

  • Composability (ability to be combined to form new services)

    • Extensibility

    • Reusability

  • Efficiency

    • Response time

  • Effectiveness

    • Precision/recall (of search)

    • Classification
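Search effectiveness is usually reported as precision and recall. Precision is the relevant documents retrieved divided by all documents retrieved. Recall is the relevant documents retrieved divided by all relevant documents. A small Python sketch follows. The document IDs are hypothetical, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 3 relevant documents exist, 2 of them returned.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d5"})
print(p, round(r, 3))  # 0.5 0.667
```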


Service Information, continued

  • Extensibility

    • # extended services

    • # services in the DL

    • # lines of code per service manager

  • Reusability

    • # reused services

    • # services in the DL

    • # lines of code per service manager

  • Reliability

    • # service failures

    • # accesses
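The quantities above suggest simple ratios. The Python sketch below assumes three of them:

  • extensibility = extended services / services in the DL
  • reusability = reused services / services in the DL
  • reliability = 1 - service failures / accesses

The lines-of-code-per-service-manager factor is deliberately left out. These formulas and the counts are illustrative assumptions, not the slides' definitions.

```python
def extensibility(extended_services: int, total_services: int) -> float:
    """Share of the DL's services that were extended (assumed ratio)."""
    return extended_services / total_services


def reusability(reused_services: int, total_services: int) -> float:
    """Share of the DL's services that were reused (assumed ratio)."""
    return reused_services / total_services


def reliability(failures: int, accesses: int) -> float:
    """1 minus the failure rate over all accesses (assumed ratio)."""
    return 1.0 - failures / accesses


# Hypothetical DL: 12 services, 3 extended, 4 reused; 5 failures in 10,000 accesses.
print(extensibility(3, 12), round(reusability(4, 12), 3), reliability(5, 10_000))
```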


Making it more concrete

  • Each of the measures listed gives an idea of the information need

  • Exactly what do we measure?

  • How do we combine numbers obtained to get a usable result?

  • The following slides describe specific measures and formulas for combining them.


Digital Object Accessibility

  • Basic requirement

    • If a user cannot access the DO, there is little point in having it in the DL

    • Identified measures:

      • Collection, # structured streams, rights management metadata, communities

    • Say it another way:

      • Is it present in a collection in the repository?

      • Is there a service that can retrieve and display the content?

      • Is the rights management open enough for access by this user?


Digital Object Accessibility - formally

Define $do_x$ as a specific digital object and $ac_y$ as a specific actor (user).

Accessibility:

$$
acc(do_x, ac_y) =
\begin{cases}
0, & \text{if there is no collection } C \text{ in the DL repository } R \text{ such that } do_x \in C \\
\dfrac{\sum_{z \in \mathit{struct\_streams}(do_x)} r_z(ac_y)}{|\mathit{struct\_streams}(do_x)|}, & \text{otherwise}
\end{cases}
$$

where $r_z(ac_y)$ is a rights management rule defined as

$$
r_z(ac_y) =
\begin{cases}
1, & \text{if } z \text{ has no access constraints, or } z \text{ has access constraints and } ac_y \in cm_z \\
0, & \text{otherwise}
\end{cases}
$$

and $cm_z \in Soc$ is a community that has the right to access $z$.

This definition does not deal with accessibility problems related to accessing the streams themselves.
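The accessibility definition can be sketched in Python. The sketch has three simplifications. The repository is modeled as a list of collections. Each structured stream carries at most one authorized community, where `None` means no access constraints. The actor is represented by the set of communities they belong to, so $ac_y \in cm_z$ becomes a membership test. The class and function names are hypothetical, and returning 0 for an object with no streams is an assumption, since the formula is undefined there.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stream:
    name: str
    # Community allowed to access this stream; None means no access constraints.
    community: Optional[str] = None


@dataclass
class DigitalObject:
    identifier: str
    streams: list[Stream] = field(default_factory=list)


def rights_rule(stream: Stream, actor_communities: set[str]) -> int:
    """r_z(ac_y): 1 if the stream is unconstrained or the actor belongs to its community."""
    if stream.community is None or stream.community in actor_communities:
        return 1
    return 0


def accessibility(obj: DigitalObject, repository: list[list[DigitalObject]],
                  actor_communities: set[str]) -> float:
    """acc(do_x, ac_y): 0 if the object is in no collection, else mean r_z over its streams."""
    if not any(obj in collection for collection in repository):
        return 0.0
    if not obj.streams:
        return 0.0  # formula undefined for zero streams; treated as inaccessible (assumption)
    return sum(rights_rule(z, actor_communities) for z in obj.streams) / len(obj.streams)


# Hypothetical ETD with one open stream and one restricted to the local institution.
etd = DigitalObject("etd-1", [Stream("abstract"), Stream("chapter-3", community="local-institution")])
repo = [[etd]]
print(accessibility(etd, repo, set()))                  # outside user: 0.5
print(accessibility(etd, repo, {"local-institution"}))  # institution member: 1.0
```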


An illustration

  • NDLTD is the Networked Digital Library of Theses and Dissertations

    • Some institutions require that all theses and dissertations be stored in this DL

    • The student chooses how visible to make the document.

      • Parts of the document may be visible while other parts are not

      • The document, or parts of it, may be visible to a restricted community.


Accessibility case

  • $etd_x$ is a specific electronic thesis or dissertation of interest

  • $acc(etd_x, ac_y)$ is

    • 0 if $etd_x$ is not in the collection

    • otherwise $\left(\sum_{z \in \mathit{struct\_streams}(etd_x)} r_z(ac_y)\right) / |\mathit{struct\_streams}(etd_x)|$

      • where $r_z(ac_y) = 1$ if $z$ is marked “world wide access”, or $z$ is marked “local institution only” and $ac_y \in C$, where $C$ is the community of identifiable members of the local institution

      • and $r_z(ac_y) = 0$ otherwise


With the numbers

  • An example from VT

  • For authors whose names begin with A:

    • Unrestricted ETDs: 164

    • Restricted ETDs: 50

    • Mixed ETDs: 5

      • Fraction unrestricted in each mixed ETD: 0.5, 0.5, 0.167, 0.1875, 0.6

  • Overall measure of accessibility outside VT:

    • $(164 \times 1 + 50 \times 0 + 0.5 + 0.5 + 0.167 + 0.1875 + 0.6) / 219 \approx 0.76$
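The arithmetic behind the VT figure can be reproduced in a few lines of Python. Unrestricted ETDs contribute 1 each, restricted ETDs contribute 0 for an outside user, and each mixed ETD contributes its unrestricted fraction. The variable names are illustrative.

```python
unrestricted = 164                               # each contributes 1
restricted = 50                                  # each contributes 0 for an outside user
mixed_fractions = [0.5, 0.5, 0.167, 0.1875, 0.6]  # unrestricted share of each mixed ETD

total = unrestricted + restricted + len(mixed_fractions)  # 219 ETDs
overall = (unrestricted * 1 + restricted * 0 + sum(mixed_fractions)) / total
print(round(overall, 2))  # 0.76
```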


Solidifying Pertinence

  • How do we measure something like pertinence?

  • Relation between the information content of a digital object and the need of the user

  • Depends on the user’s situation -- background, current context, etc.


Pertinence

  • $Inf(do_i)$ represents the information content of digital object $do_i$

  • $IN(ac_j)$ is the information need of actor (user) $ac_j$

  • $Context(ac_j, k)$ is the combined effect of the social factors that determine the pertinence of $do_i$ to $ac_j$ at time $k$

  • Two communities of actors

    • Users, whose information needs we try to satisfy

    • External judges, who are responsible for judging the relevance of a document in response to a query

    • Non-overlapping groups


Pertinence formula

  • $Pertinence(do_i, ac_j, k): Inf(do_i) \times IN(ac_j) \times Context(ac_j, k) \to \{0, 1\}$, defined as

    • 1 if $Inf(do_i)$ is judged by $ac_j$ to be informative with regard to $IN(ac_j)$ in context $Context(ac_j, k)$

    • 0 otherwise

  • A rather complex way of saying that the information is pertinent if either the user or a qualified independent judge says it is


Preservability

  • A property of a digital object that describes its state relative to changes in hardware, software, and representation format standards

    • Ex.: new recording technologies (replacement of VHS video tapes by DVDs)

    • New versions of software such as Word or Acrobat

    • New image standards such as JPEG 2000


Digital preservation techniques

Most commonly used

  • Migration

    • Transform from one format to another

      • Ex. Open the document in one format and save in another or do an automated transformation

  • Emulation

    • Reproducing the effect of the environment originally used to display the material

      • Keep an old version of the software, or have new software that can read the old format

  • Wrapping

    • Keep the original format, but add enough human-readable metadata so that it can be decoded in the future

      • Note that the material is not directly usable

  • Refreshing

    • Copy the stream of bits from one location to another

      • Particularly suitable for guarding against the physical deterioration of the medium


Preservability issues

  • Obsolescence

    • How out of date is the digital object?

      • Many versions of the software?

      • Old storage media?

    • Difficult to migrate

      • Appropriate tools? Expertise?

  • Fidelity

    • How different is the migrated version from the original?

    • Distortion = loss of information

  • Preservability of a digital object in a digital library is a function of the fidelity of the migration and the obsolescence of the object

  • $preservability(do_i, dl) = (fidelity(do_i, format_x, format_y),\ obsolescence(do_i, dl))$, where the first component is the fidelity of migrating $do_i$ from $format_x$ to $format_y$


Preservability factors

  • Capital direct costs

    • Software

      • Developing software to create new versions of the object or obtaining licenses for new versions of the original software

    • Hardware

      • For processing the migration and for storing the results

  • Indirect operating costs

    • Monitoring digital objects for migration needs

    • Maintaining up-to-date intellectual property rights

    • Storage

    • Staff training


Calculating Obsolescence

  • $obsolescence(do_i, dl)$ = cost of converting/migrating the digital object $do_i$ within the context of a specific digital library $dl$


Calculating fidelity

No distortion must yield a fidelity of 1.0

  • Fidelity decreases as distortion grows:

    $$fidelity(do_i, format_x, format_y) = \frac{1}{distortion(mp(format_x, format_y)) + 1.0}$$

  • One common measure of distortion

    • mean squared error (MSE)

      • Let $\{x_n\}$ be a stream of $do_i$ and $\{y_n\}$ the corresponding stream after migration

      • $mse(\{x_n\}, \{y_n\}) = \frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2$

        Using MSE as the distortion:

        $$fidelity(do_i, format_x, format_y) = \frac{1}{mse(\{x_n\}, \{y_n\}) + 1.0}$$
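The MSE-based fidelity formula translates directly into Python. The sketch assumes each stream is available as an equal-length sequence of numeric samples, such as pixel values for an image. The function names and sample values are hypothetical.

```python
def mse(original: list[float], migrated: list[float]) -> float:
    """Mean squared error between a stream {x_n} and its migrated version {y_n}."""
    if not original or len(original) != len(migrated):
        raise ValueError("streams must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(original, migrated)) / len(original)


def fidelity(original: list[float], migrated: list[float]) -> float:
    """fidelity = 1 / (mse + 1.0); equals 1.0 when there is no distortion."""
    return 1.0 / (mse(original, migrated) + 1.0)


# Hypothetical 4-sample stream before and after a lossy migration.
print(fidelity([10, 20, 30, 40], [12, 18, 30, 44]))  # mse = 6.0, so fidelity = 1/7 (about 0.143)
```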


A Preservation Scenario

From Gonçalves, adapted from one of his sources

  • A librarian learns that a special collection of 1,000 digital images, stored in TIFF v5.0, is in danger of obsolescence because the latest version of the display software does not support that version.

  • The librarian decides to migrate all images to JPEG 2000, now the de facto image preservation standard, recommended by the Research Libraries Group (RLG).

  • The librarian searches for options and finds a $500 tool that converts TIFF 5.0 to JPEG 2000.

  • About 20 hours are needed to order, install, and learn the software and to apply it to all images, at an hourly rate of $66.60 per library employee.

  • To save space, the librarian chooses a compression rate that produces an average MSE of 8 per image.

  • Preservability of each image:

    $$preservability(image_{TIFF5.0}, dl) = \left(\frac{1}{8 + 1},\ \frac{\$500 + \$66.60 \times 20}{1000}\right) = (0.11,\ \$1.83)$$

The first component is the fidelity (1.0 means no distortion, so higher is better); the second is the obsolescence cost per image (lower is better)
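The scenario arithmetic can be checked with a short Python sketch. The function `preservability` is a hypothetical helper. Like the slide's calculation, it assumes the obsolescence cost is the one-time tool and labor cost spread evenly over the images.

```python
def preservability(avg_mse: float, tool_cost: float, hours: float,
                   hourly_rate: float, n_objects: int) -> tuple[float, float]:
    """Return (fidelity, obsolescence cost per object) for a batch migration."""
    fidelity = 1.0 / (avg_mse + 1.0)                                   # 1 / (mse + 1)
    obsolescence_cost = (tool_cost + hourly_rate * hours) / n_objects  # cost spread per object
    return fidelity, obsolescence_cost


# Numbers from the TIFF 5.0 to JPEG 2000 scenario.
fid, cost = preservability(avg_mse=8, tool_cost=500, hours=20, hourly_rate=66.60, n_objects=1000)
print(round(fid, 2), round(cost, 2))  # 0.11 1.83
```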


References

  • Gonçalves, M. A., Moreira, B. L., Fox, E. A., and Watson, L. T. “Quality Model for Digital Libraries”.

