interoperability n.
Interoperability. Kevin Hegg and Andreas Knab, 2008 ARLIS/NA-VRA Summer Educational Institute, July 11, 2008.



Kevin Hegg and Andreas Knab

2008 ARLIS/NA-VRA Summer Educational Institute

July 11, 2008

The Challenge
  • How do we connect disparate systems and applications so that users can discover, access and exchange digital content and cataloging data from a coherent interface using their preferred set of desktop tools?
Broad Categories: Systems & Tools (1 of 2)
  • Digital Asset Management (DAM) with Discovery/Access/Presentation (DAP). Examples: Almagest, ARTstor, CONTENTdm, Luna Insight, MDID
  • Institutional Repositories. Examples: DigiTool, DSpace, Fedora, VITAL
  • Online image collections and digital libraries (freely accessible content). Examples: American Memory (Library of Congress), Bristol Biomedical Image Archive, Earth Science World Image Bank, Metropolitan Museum of Art, Museum of Modern Art (MOMA), NASA Image eXchange, New York Public Library Digital Gallery
  • Online image collections and digital libraries (subscription). Examples: AccuNet/AP Multimedia Archive, Art Museum Image Gallery (Wilson), ARTstor, CAMIO (OCLC), images MD (Current Medicine, Inc.)
  • Content aggregators/gateways. Examples: HarvestRoad Hives, IMLS-DCC, MERLOT, OAIster
Broad Categories: Systems & Tools (2 of 2)
  • Media-sharing communities. Examples: Flickr, Picasa, Wikimedia Commons, YouTube
  • Course Management Systems. Examples: ANGEL, Blackboard, Moodle, Sakai, WebCT
  • Federated search engines. Examples: Central Search (Serials Solutions), LibraryFind (Open Source), MetaLib, Muse, WebFeat
  • Internet search engines. Examples: Google Image Search, Live Search (Microsoft)
  • Stand-alone applications and browser-based tools. Examples: Amaznode, ARTstor Offline Image Viewer (OIV), Collex, Cross Media Annotation System, Digital Library eXtension Service (DLXS), Image Innovations Image Manager, iTunes University, PowerPoint (Microsoft), Pachyderm, Scholar's Box, VireoCat, VUE
  • Social networking. Examples: Facebook, MySpace
Data Exchange Mechanisms
  • Protocols, standards, specifications, interfaces, guidelines, etc. used to facilitate the orderly discovery and exchange of data
  • Three methods for exchanging data:
    • Linking/redirecting: Digital content is stored on remote system. User is directed to remote system to access content (e.g. federated searches)
    • On request with optional caching: Digital content is downloaded from remote system as needed and presented to user (e.g. MDID remote collections)
    • Harvesting (bulk import): Entire collection of digital content is copied from remote system in advance and served to user locally (e.g. Allan Kohl’s AICT collection)
Z39.50
  • “ISO 23950: Information Retrieval: Application Service Definition and Protocol Specification”
  • Maintained by Library of Congress
  • Defines procedures and formats that a client may use to search a remote database, to learn about the results of the search, and to manipulate and retrieve search results
  • Complicated and difficult to implement
  • Used commonly by libraries to facilitate federated searches
SRU/SRW
  • “Search and Retrieval via URL/Web Service”
  • Maintained by Library of Congress
  • Companion protocols used to formulate and execute Internet search queries and to retrieve the query results as a record set
  • Query results are formatted in MARCXML or Dublin Core
  • Relatively simple and easy to implement
  • ARTstor’s XML Gateway is built on SRU/SRW
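To give a feel for why SRU counts as relatively simple, here is a minimal Python sketch that builds a searchRetrieve request as a plain URL. The endpoint and the query are made-up placeholders (this is not ARTstor's gateway); the parameter names come from the SRU 1.1 specification, and the short schema name "dc" is an assumption, since each server advertises its own schema names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://sru.example.org/search"


def build_sru_url(cql_query, max_records=10, schema="dc"):
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # the search, written in CQL
        "maximumRecords": str(max_records),  # cap on records returned
        "recordSchema": schema,              # assumed short name for Dublin Core
    }
    return BASE_URL + "?" + urlencode(params)


url = build_sru_url('dc.title = "mona lisa"')
print(url)
```

The whole request fits in one URL, so it can be tested in a browser before any client code is written.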
OAI-PMH
  • “Open Archives Initiative (OAI) Protocol for Metadata Harvesting”
  • Developed and maintained by The Open Archives Initiative
  • Allows data providers (repositories) to expose metadata to client applications (harvesters) and facilitates the aggregation of metadata from more than one repository
  • Fairly easy to implement
  • LOC’s American Memory repository implements OAI-PMH
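The harvester side of OAI-PMH can be sketched in a few lines of Python. The example below builds a ListRecords request and pulls identifiers and titles out of a response. The repository URL and the sample response are invented for illustration; the verb, the oai_dc metadata prefix, and the XML namespaces are the ones defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Made-up response, trimmed to the parts this sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph</dc:title>
          <dc:creator>Unknown</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_titles(xml_text):
    """Return (identifier, [titles]) for every record in a response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [el.text for el in record.iterfind(".//dc:title", NS)]
        results.append((identifier, titles))
    return results


print(list_records_url("https://repo.example.org/oai"))
print(harvest_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the resumptionToken that repositories return when a result list is split across several responses; that step is left out here.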
OAI-ORE (Beta)
  • “Open Archives Initiative (OAI) – Object Re-use and Exchange”
  • Developed and maintained by The Open Archives Initiative
  • A companion standard to OAI-PMH that will allow repositories to exchange digital objects and applications to consume digital objects residing in repositories
  • Fairly easy to implement
  • OAI released public beta June 2008
OKI OSID
  • “Open Knowledge Initiative (OKI) Open Service Interface Definition”
  • Developed and maintained by the Open Knowledge Initiative (housed at MIT)
  • Describes a set of programmatic interfaces used to achieve interoperability among different repositories built on a variety of evolving technologies
  • Implemented by a variety of systems and tools, including the Museum of Fine Arts, Boston; the National Library of Australia; ARTstor; Sakai; Fedora; DSpace; Pachyderm; and VUE
Proprietary APIs
  • An API is “the interface that a computer system or application provides in order to allow requests for service to be made of it by other computer programs, and/or to allow data to be exchanged between them”*
  • APIs are frequently used to facilitate data sharing and digital object reuse
  • Examples of systems and tools that use APIs to exchange data and/or digital objects: ARTstor, Blackboard, CONTENTdm, Facebook, Flickr, MDID, Photobucket, Photoshop Express, Picasa, PowerPoint, and YouTube


Data Exchange Overview

  • Metadata is “data about data” and is used to structure and describe any kind of content, including of course digital objects
  • Metadata helps us discover, evaluate, retrieve and use digital objects
  • Metadata standards are an important component in data exchange
    • Result sets must be structured and understandable
    • Metadata standards provide structure and meaning to content
  • Many data exchange mechanisms support multiple metadata standards and most support Dublin Core
The Challenge
  • It is very unlikely that two unrelated repositories will catalog data in exactly the same way using the same metadata standards
  • How does one system or tool process information from another system? How does the user search a remote collection in a targeted way?
  • The solution: A simple, generic metadata standard that supports a wide variety of cross-disciplinary resources
Dublin Core Overview
  • “The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardised set of conventions for describing things online in ways that make them easier to find”*
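  • The Simple DC consists of 15 optional and repeatable DC elements:
    • Title, Creator, Subject, Description, Publisher
    • Contributor, Date, Type, Format, Identifier
    • Source, Language, Relation, Coverage, Rights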
  • The Qualified DC extends or refines DC elements in order to narrow the meaning of the DC elements
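"Optional and repeatable" is easiest to see in a concrete record. The Python sketch below builds a Simple DC description in which some elements are absent and one appears twice. The wrapper element name `record` and all field values are invented for illustration; the namespace URI is the standard one for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 Simple Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_dc_record(fields):
    """Build a record from (element, value) pairs.

    Any element may be left out (optional) or appear several times
    (repeatable); names outside the 15-element set are rejected.
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple DC element: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = make_dc_record([
    ("title", "Mona Lisa (JPEG reproduction)"),
    ("type", "StillImage"),
    ("format", "image/jpeg"),
    ("subject", "portrait"),      # repeated element
    ("subject", "Renaissance"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```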


Dublin Core Principles
  • The One-to-One Principle: “Create one metadata description for one and only one resource”*
    • For example: Do not describe a JPEG of the Mona Lisa as if it were the original painting. Do not confuse the creator of the JPEG with the painter of the original
  • The Dumb-down Principle: Translate qualified DC to simple DC so that a user can ignore qualifiers and treat a description as if it were unqualified
  • Appropriate values: Construct your metadata in such a way that it makes sense to a user outside of your context (e.g. to someone who is not a curator or art historian)

* Marty Kurth. Basic DC Semantics. (p. 20)

Sample Dublin Core Record*


  • A crosswalk “maps the elements in one metadata scheme to the equivalent elements in another scheme”*
  • The challenge: Many if not most systems and repositories do not use Dublin Core to catalog data
    • Digital image repositories often use the VRA Core or proprietary schemas to catalog image records
    • MDID allows the curator to define a custom catalog structure for each collection
    • ARTstor aggregates data from many sources and thus is not able to use standardized metadata
  • The solution: Because many data exchange mechanisms support Dublin Core (e.g. SRU/SRW, OAI-PMH, OKI OSID, MDID API), use a crosswalk to map your data to Dublin Core
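In code, a crosswalk is little more than a lookup table applied to each record. The Python sketch below maps a few VRA Core style field names to Dublin Core elements. The mapping is deliberately simplified and is an assumption for illustration only; a production crosswalk should follow a published mapping table.

```python
# Simplified, illustrative mapping (source field -> Dublin Core element).
VRA_TO_DC = {
    "title": "title",
    "agent": "creator",
    "date": "date",
    "description": "description",
    "subject": "subject",
    "rights": "rights",
    "worktype": "type",
}


def crosswalk(record, mapping=VRA_TO_DC):
    """Translate a record of {field: [values]} into Dublin Core.

    Returns the translated record plus the list of source fields that
    have no equivalent in the mapping, so nothing is dropped silently.
    """
    dc_record = {}
    unmapped = []
    for field, values in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
            continue
        dc_record.setdefault(target, []).extend(values)
    return dc_record, unmapped


source = {
    "title": ["Mona Lisa"],
    "agent": ["Leonardo da Vinci"],
    "technique": ["oil painting"],
}
dc_record, unmapped = crosswalk(source)
print(dc_record)
print("No Dublin Core equivalent in this mapping:", unmapped)
```

Reporting the unmapped fields makes the information lost in translation visible, which matters because Dublin Core is less specific than most cataloging schemas.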


Dublin Core/VRA Crosswalk*

*Claremont Colleges Digital Library (

  • ASCII represents characters with numbers between 0 and 127 (7 bits); the printable characters are 32 through 126
  • Computers store data in 8-bit bytes, so there is room for an additional 128 characters
More History
  • Problem: everybody used the additional 128 characters for something different
  • Solution: ANSI standard for code pages
  • Examples
    • Israel: code page 862
    • Greece: code page 737
  • All code pages still use the same characters below 128, but different ones above
More History
  • Problem: Only one code page possible at any time
  • Problem: Some languages use more characters than fit in 8 bits
  • Solution: Unicode
  • Each letter or symbol is represented by a code point (a number)
  • Code points are not limited to 8 or 16 bits (the range runs from U+0000 to U+10FFFF)
  • Examples
    • A is U+0041
    • ڬ is U+06AC
    • ♫ is U+266B
  • There are many different ways to store these code points (numbers) in a file
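  • UCS-2/UTF-16
    • UCS-2 stores every character in two bytes; UTF-16 uses two bytes for most characters and four bytes for the rest
    • Problem: byte order
  • UTF-8
    • Stores every character in one to four bytes (the original design allowed up to six)
    • Compatible with ASCII for the first 128 characters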
  • If the expected encoding does not match the actual encoding, characters will be interpreted incorrectly or not at all
  • Make sure to save your content in an encoding that is understood by the program you want to read it with
  • UTF-8 generally is the best option
  • Reference:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
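The points about code points, byte order, and mismatched encodings can be checked with a few lines of Python. This is a small sketch using the three example characters from the slides; the choice of Windows code page 1252 as the "wrong" encoding is an assumption made for the demonstration.

```python
# The three example characters, written as escapes: A, U+06AC, U+266B.
text = "A\u06AC\u266B"

# One code point per character, but a different number of bytes in UTF-8.
for ch in text:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"), ch.encode("utf-16-be"))

utf8_lengths = [len(ch.encode("utf-8")) for ch in text]

# Byte order: the same character, stored little-endian and big-endian.
little_endian = "A".encode("utf-16-le")
big_endian = "A".encode("utf-16-be")
print(little_endian, big_endian)

# Mismatch: U+00E9 (e with acute accent) saved as UTF-8 but read back
# as code page 1252 turns into two unrelated characters.
garbled = "\u00e9".encode("utf-8").decode("cp1252")
print(ascii(garbled))
```

The last line is the classic symptom of an encoding mismatch: one accented letter shows up as two characters of apparent garbage.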

Many options
  • Databases
    • Microsoft Access
    • FileMaker Pro
  • Spreadsheets
    • Microsoft Excel
  • Text based
    • CSV (comma separated values)
    • TSV (tab separated values)
    • XML (structured text)
Options for exchanging data
  • Usually text based
  • CSV and TSV are functionally equivalent
    • Simple spreadsheet format
    • Multi-valued fields or hierarchies are difficult to represent
  • XML
    • Easily handles multi-valued fields or hierarchies
    • Structure must be understood
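The difference shows up as soon as a record has a multi-valued field. The Python sketch below writes the same made-up record as CSV and as XML; the field names and the "; " separator are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up record with one multi-valued field.
record = {
    "id": "img001",
    "title": "Mona Lisa",
    "subject": ["portrait", "Renaissance"],
}

# CSV: one cell per field, so the two subjects have to share a cell,
# joined by a separator that the receiving system must know about.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "title", "subject"])
writer.writerow([record["id"], record["title"], "; ".join(record["subject"])])
csv_text = buffer.getvalue()

# XML: the field is simply repeated, once per value.
element = ET.Element("record")
ET.SubElement(element, "id").text = record["id"]
ET.SubElement(element, "title").text = record["title"]
for subject in record["subject"]:
    ET.SubElement(element, "subject").text = subject
xml_text = ET.tostring(element, encoding="unicode")

print(csv_text)
print(xml_text)
```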
Criteria for picking an image format
  • Lossy vs. Lossless
  • Compressed vs. Uncompressed
  • Archival vs. Presentation
  • Offline vs. Online
  • Exchangeable vs. Proprietary
  • Common vs. Uncommon
Lossy vs. Lossless
  • Tradeoff between file size and image quality
  • Archival images should always be lossless
  • Image quality with lossy formats will get progressively worse with editing or processing
  • Most formats are lossless
  • JPEG is lossy
Compressed vs. Uncompressed
  • Tradeoff between file size and processing time
  • Processing time is usually no longer an issue
  • Compressed does not mean lossy
  • Examples:
    • TIFF can be compressed or uncompressed
    • JPEG is compressed and lossy
    • PNG is compressed and lossless
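That compressed does not mean lossy can be demonstrated without any imaging software. The Python sketch below compresses a block of stand-in "pixel" bytes with zlib, which implements DEFLATE (the compression method PNG uses), and checks that decompression returns exactly the original bytes. The byte pattern is synthetic, not a real image.

```python
import zlib

# Stand-in for raw pixel data: a 256-value gradient repeated 100 times.
pixels = bytes(range(256)) * 100

compressed = zlib.compress(pixels, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

print("original bytes:  ", len(pixels))
print("compressed bytes:", len(compressed))
print("identical after round trip:", restored == pixels)
```

A lossy format such as JPEG would fail the final comparison: the decoded pixels are close to the original but not byte-for-byte identical.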
Archival vs. Presentation
  • Archival images should always be lossless and highest possible quality and resolution
  • Presentation images need to be small for quick network transfer
  • Presentation image quality and size can be lower depending on presentation equipment
  • As equipment gets better, better presentation images can be derived from archival images
Offline vs. Online
  • Online images need to be smaller and in a format that is supported by a web browser
  • Offline images can be larger and in formats that may require specific software to view or process
  • Example
    • Adobe Photoshop files are great for post-processing, but cannot be delivered in the browser
Exchangeable vs. Proprietary
  • Proprietary image formats may be harder to exchange with others
  • Examples
    • PSD files require Adobe Photoshop
    • MrSID files require proprietary plug-ins
    • TIFF or JPEG are universally understood
Common vs. Uncommon
  • Some file formats or format options are more popular and widely supported than others
  • Examples
    • Transparent GIFs work in any browser
    • Transparent PNGs only work in newer browsers
    • 8-bit RGB TIFFs work in almost every program
    • 16-bit RGB TIFFs or grayscale TIFFs do not
Reformatting Data in Excel
  • Cells
    • A spreadsheet is made up of cells organized in columns and rows, respectively identified by letters (A, B, C, …) and numbers (1, 2, 3, …)
    • A cell is referred to by its column letter and row number, for example “A1” or “D25”
  • Formulas in Excel
    • Always start with an equal sign “=”
    • Adjust their cell references when being moved or copied to other cells
Excel User Interface

[Screenshot of the Excel window. Callouts: current cell; formula in current cell; cell value; dragging the little square copies the cell formula into adjoining cells.]

Useful Excel Formulas
  • Combine cell values
    • CONCATENATE(cell, cell, …)
      • Or use “&” operator: cell & cell & …
      • Combines cell values
  • Extract parts of a cell value
    • LEFT(cell, length)
    • RIGHT(cell, length)
    • MID(cell, start, length)
Useful Excel Formulas
  • Find the position of a string in a cell value
    • FIND(text, cell) or FIND(text, cell, start)
  • Length of a cell value
    • LEN(cell)
  • Append “.jpg” to a record identifier to create a filename

Enter formula once and then drag it down across all rows

  • Splitting date ranges into start and end date

Grab up to 10 characters
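The two tasks above can be prototyped outside Excel. The Python sketch below defines small helpers that mirror LEFT, RIGHT, MID and FIND (with 1-based positions, as in Excel) and applies them to both tasks. The sample values, the hyphen as the date-range separator, and the reading of "grab up to 10 characters" as a MID call with length 10 are all assumptions, since the original screenshots are not available.

```python
def left(s, n):
    """LEFT(cell, length): the first n characters."""
    return s[:n]


def right(s, n):
    """RIGHT(cell, length): the last n characters."""
    return s[-n:] if n > 0 else ""


def mid(s, start, length):
    """MID(cell, start, length): start is 1-based, as in Excel."""
    return s[start - 1:start - 1 + length]


def find(text, s, start=1):
    """FIND(text, cell, start): 1-based position; errors if not found."""
    return s.index(text, start - 1) + 1


# Task 1: append ".jpg" to a record identifier.
# Excel equivalent: =A2 & ".jpg"
record_id = "img001"                 # made-up identifier
filename = record_id + ".jpg"

# Task 2: split a date range into start and end dates.
# Excel equivalents: =LEFT(A2, FIND("-", A2) - 1)
#                    =MID(A2, FIND("-", A2) + 1, 10)
date_range = "1503-1519"             # made-up value
separator = find("-", date_range)
start_date = left(date_range, separator - 1)
end_date = mid(date_range, separator + 1, 10)   # up to 10 characters

print(filename, start_date, end_date)
```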

Create a PowerPoint slideshow from a set of images
  • Use the Insert>Photo Album feature
  • Example: MDID can export a slideshow package containing a set of JPEG files
Convert image files between formats
  • For online delivery, images usually have to be in JPEG format
  • Archival images are usually TIFF
  • Need to convert many images at once
XAT Image Optimizer
  • Converts between many image formats
  • Flexible quality and compression settings
  • Available at
  • Command-line program
    • Harder to use
    • Easier to automate
  • Available at
Adobe Photoshop
  • Can read and write many formats
  • For batch processing, need to create an Action that saves a file in the desired format
Adobe Photoshop Actions

New Folder

New Action

Adobe Photoshop Actions
  • To create a “Save As…” action
    • Create a new image or open an existing image
    • Create the new action, recording will start
    • Go to File>Save As...
    • Set the desired options and file format
    • Save the file
    • Stop the recording
Open XML cataloging data files in Microsoft Excel
  • Microsoft Excel can read certain XML files and open them as spreadsheets
  • Allows easier editing and processing
  • Example: File exported from MDID collection
XML data file
  • Simple record structure – no repeated values
Opening XML file in Excel
  • Open the XML file and select “As an XML table”
XML data file in Excel
  • Spreadsheet can be edited and saved in different formats for further processing or import
Problematic XML data file
  • Repeated fields cause records to be spread over multiple rows
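One possible workaround is to flatten repeated fields before the file reaches Excel, so that each record stays on one row. The Python sketch below joins repeated values into a single cell and writes CSV. The XML structure, the field names, and the "; " separator are invented for illustration and do not reproduce the actual MDID export format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up export with a repeated <subject> field in the first record.
SAMPLE_XML = """<records>
  <record>
    <id>1</id><title>Mona Lisa</title>
    <subject>portrait</subject><subject>Renaissance</subject>
  </record>
  <record>
    <id>2</id><title>The Scream</title>
    <subject>expressionism</subject>
  </record>
</records>"""


def flatten(xml_text, separator="; "):
    """Return one dict per record, joining repeated fields into one value."""
    rows = []
    for record in ET.fromstring(xml_text).iterfind("record"):
        row = {}
        for child in record:
            value = (child.text or "").strip()
            if child.tag in row:
                row[child.tag] += separator + value   # repeated field
            else:
                row[child.tag] = value
        rows.append(row)
    return rows


rows = flatten(SAMPLE_XML)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "title", "subject"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The joined values have to be split again before re-import, so the separator should be a string that never occurs in the data.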
MDID & Interoperability Update
  • About to release RFP to solicit bids from software companies to begin work on one or more of the following:
    • ARTstor <> MDID
    • Flickr <> MDID
    • Display MDID images and slideshows in Blackboard
    • Download MDID slideshows as PowerPoint presentations
    • Upload PowerPoint presentations into MDID
    • Build an OKI OSID plug-in for MDID
  • We plan on extending grant work through 2009