Interoperability. Kevin Hegg and Andreas Knab. 2008 ARLIS/NA-VRA Summer Educational Institute, July 11, 2008.


Interoperability

Kevin Hegg and Andreas Knab

2008 ARLIS/NA-VRA Summer Educational Institute

July 11, 2008


The Challenge

  • How do we connect disparate systems and applications so that users can discover, access and exchange digital content and cataloging data from a coherent interface using their preferred set of desktop tools?




Broad Categories: Systems & Tools (1 of 2)

  • Digital Asset Management (DAM) with Discovery/Access/Presentation (DAP). Examples: Almagest, ARTstor, CONTENTdm, Luna Insight, MDID

  • Institutional repositories. Examples: DigiTool, DSpace, Fedora, VITAL

  • Online image collections and digital libraries (freely accessible content). Examples: American Memory (Library of Congress), Bristol Biomedical Image Archive, Earth Science World Image Bank, Metropolitan Museum of Art, Museum of Modern Art (MoMA), NASA Image eXchange, New York Public Library Digital Gallery

  • Online image collections and digital libraries (subscription). Examples: AccuNet/AP Multimedia Archive, Art Museum Image Gallery (Wilson), ARTstor, CAMIO (OCLC), images MD (Current Medicine, Inc.)

  • Content aggregators/gateways. Examples: HarvestRoad Hives, IMLS-DCC, MERLOT, OAIster


Broad Categories: Systems & Tools (2 of 2)

  • Media-sharing communities. Examples: Flickr, Picasa, Wikimedia Commons, YouTube

  • Course Management Systems. Examples: ANGEL, Blackboard, Moodle, Sakai, WebCT

  • Federated search engines. Examples: Central Search (Serials Solutions), LibraryFind (Open Source), MetaLib, Muse, WebFeat

  • Internet search engines. Examples: Google Image Search, Live Search (Microsoft)

  • Stand-alone applications and browser-based tools. Examples: Amaznode, ARTstor Offline Image Viewer (OIV), Collex, Cross Media Annotation System, Digital Library eXtension Service (DLXS), Image Innovations Image Manager, iTunes University, PowerPoint (Microsoft), Pachyderm, Scholar's Box, VireoCat, VUE

  • Social networking. Examples: Facebook, MySpace



Data Exchange Mechanisms

  • Protocols, standards, specifications, interfaces, guidelines, etc. used to facilitate the orderly discovery and exchange of data

  • Three methods for exchanging data:

    • Linking/redirecting: Digital content is stored on remote system. User is directed to remote system to access content (e.g. federated searches)

    • On request with optional caching: Digital content is downloaded from remote system as needed and presented to user (e.g. MDID remote collections)

    • Harvesting (bulk import): Entire collection of digital content is copied from remote system in advance and served to user locally (e.g. Allan Kohl’s AICT collection)


Z39.50

  • “ISO 23950: Information Retrieval: Application Service Definition and Protocol Specification”

  • Maintained by Library of Congress

  • Defines procedures and formats that a client may use to search a remote database, to learn about the results of the search, and to manipulate and retrieve search results

  • Complicated and difficult to implement

  • Used commonly by libraries to facilitate federated searches


SRU/SRW

  • “Search/Retrieve via URL” (SRU) and “Search/Retrieve Web Service” (SRW)

  • Maintained by Library of Congress

  • Companion protocols used to formulate and execute Internet search queries and to retrieve the query results as a record set

  • Query results are formatted in MARCXML or Dublin Core

  • Relatively simple and easy to implement

  • ARTstor’s XML Gateway is built on SRU/SRW
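An SRU request is an ordinary URL with a query string, which is much of why it is easy to implement. The following is a minimal Python sketch of how a client might assemble a searchRetrieve request. The parameter names (operation, version, query, maximumRecords, recordSchema) come from the SRU 1.1 specification; the base URL is a placeholder, and the "dc" schema name is an assumption, since each server decides which schema names it accepts.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_sru_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Return a searchRetrieve URL for an SRU 1.1 server.

    base_url and record_schema are assumptions for illustration; check
    the target server's documentation for the values it supports.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # a CQL (Contextual Query Language) expression
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    # urlencode takes care of escaping spaces and quotes in the query
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only to show the shape of the URL
url = build_sru_url("http://sru.example.org/collection", 'dc.title = "mona lisa"')
print(url)

# Decoding the query string again shows the parameters the server receives
params = parse_qs(urlparse(url).query)
print(params["query"])
```

The server answers with an XML document containing the matching records, which the client then parses.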


OAI-PMH

  • “Open Archives Initiative (OAI) Protocol for Metadata Harvesting”

  • Developed and maintained by The Open Archives Initiative

  • Allows data providers (repositories) to expose metadata to client applications (harvesters) and facilitates the aggregation of metadata from more than one repository

  • Fairly easy to implement

  • LOC’s American Memory repository implements OAI-PMH
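To illustrate the harvester side, here is a small Python sketch, not a complete harvester. The verb ListRecords and the metadataPrefix oai_dc are part of the OAI-PMH 2.0 specification; the base URL and the sample response are invented for the example, and a real harvester would also have to follow resumption tokens and handle errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return base_url + "?" + urlencode(params)


# Invented response, trimmed to the parts this sketch reads
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph</dc:title>
          <dc:creator>Unknown</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_titles(xml_text):
    """Return (identifier, [titles]) for every record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext(
            "{%s}header/{%s}identifier" % (OAI_NS, OAI_NS)
        )
        titles = [el.text for el in record.iter("{%s}title" % DC_NS)]
        records.append((identifier, titles))
    return records


print(list_records_url("http://example.org/oai"))
print(harvest_titles(SAMPLE_RESPONSE))
```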


OAI-ORE (Beta)

  • “Open Archives Initiative (OAI) – Object Re-use and Exchange”

  • Developed and maintained by The Open Archives Initiative

  • A companion standard to OAI-PMH that will allow repositories to exchange digital objects and applications to consume digital objects residing in repositories

  • Fairly easy to implement

  • OAI released public beta June 2008


OKI OSID

  • “Open Knowledge Initiative (OKI) Open Service Interface Definition”

  • Developed and maintained by the Open Knowledge Initiative (housed at MIT)

  • Describes a set of programmatic interfaces used to achieve interoperability among different repositories built on a variety of evolving technologies

  • Implemented by a variety of systems and tools, including the Museum of Fine Arts, Boston; the National Library of Australia; ARTstor; Sakai; Fedora; DSpace; Pachyderm; and VUE


Proprietary APIs

  • An API is “the interface that a computer system or application provides in order to allow requests for service to be made of it by other computer programs, and/or to allow data to be exchanged between them”*

  • APIs are frequently used to facilitate data sharing and digital object reuse

  • Examples of systems and tools that use APIs to exchange data and/or digital objects: ARTstor, Blackboard, CONTENTdm, Facebook, Flickr, MDID, Photobucket, Photoshop Express, Picasa, PowerPoint, and YouTube

* http://en.wikipedia.org/wiki/Application_programming_interface


Data Exchange Overview




Metadata

  • Metadata is “data about data” and is used to structure and describe any kind of content, including digital objects

  • Metadata helps us discover, evaluate, retrieve and use digital objects

  • Metadata standards are an important component in data exchange

    • Result sets must be structured and understandable

    • Metadata standards provide structure and meaning to content

  • Many data exchange mechanisms support multiple metadata standards and most support Dublin Core


The Challenge

  • It is very unlikely that two unrelated repositories will catalog data in exactly the same way using the same metadata standards

  • How does one system or tool process information from another system? How does the user search a remote collection in a targeted way?

  • The solution: A simple, generic metadata standard that supports a wide variety of cross-disciplinary resources


Dublin Core Overview

  • “The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardised set of conventions for describing things online in ways that make them easier to find”*

  • The Simple DC consists of 15 optional and repeatable DC elements:

    • Title, Creator, Subject, Description, Publisher

    • Contributor, Date, Type, Format, Identifier

    • Source, Language, Relation, Coverage, Rights

  • The Qualified DC extends or refines DC elements in order to narrow the meaning of the DC elements

* http://en.wikipedia.org/wiki/Dublin_Core


Dublin Core Principles

  • The One-to-One Principle: “Create one metadata description for one and only one resource”*

    • For example: Do not describe a JPEG of the Mona Lisa as if it were the original painting. Do not confuse the creator of the JPEG with the painter of the original

  • The Dumb-down Principle: Translate qualified DC to simple DC so that a user can ignore qualifiers and treat a description as if it were unqualified

  • Appropriate values: Construct your metadata in such a way that it makes sense to a user outside of your context (e.g. to someone who is not a curator or art historian)

* Marty Kurth. Basic DC Semantics. http://dublincore.org/resources/training/dc-2006/Tutorial1.pdf (p. 20)


Sample Dublin Core Record*

* http://www.pictureaustralia.org/schemas/pa/pa-slvic-example.xml
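The record shown on the original slide is not reproduced in this transcript. As a stand-in, the Python sketch below builds a small Simple Dublin Core record. The namespace URI is the standard one for the 15 DC elements; the wrapper element name "record" and all field values are invented for illustration and are not taken from the Picture Australia example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the dc: prefix


def build_dc_record(fields):
    """Build a record from (element, value) pairs.

    Every DC element is optional and repeatable, so the input is a list
    of pairs rather than a dictionary with one value per element.
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(record, "{%s}%s" % (DC_NS, element))
        child.text = value
    return record


# Invented values; note that the record describes the digital image
# (format image/jpeg), in line with the One-to-One Principle
record = build_dc_record([
    ("title", "Street scene (digital image)"),
    ("creator", "Example Photographer"),
    ("subject", "Streets"),
    ("subject", "Buildings"),
    ("format", "image/jpeg"),
    ("identifier", "example-0001"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```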


Crosswalks

  • A crosswalk “maps the elements in one metadata scheme to the equivalent elements in another scheme”*

  • The challenge: Many if not most systems and repositories do not use Dublin Core to catalog data

    • Digital image repositories often use the VRA Core or proprietary schemas to catalog image records

    • MDID allows the curator to define a custom catalog structure for each collection

    • ARTstor aggregates data from many sources and thus is not able to use standardized metadata

  • The solution: Because many data exchange mechanisms support Dublin Core (e.g. SRU/SRW, OAI-PMH, OKI OSID, MDID API), use a crosswalk to map your data to Dublin Core

* http://en.wikipedia.org/wiki/Crosswalk_(metadata)



Dublin Core/VRA Crosswalk*

*Claremont Colleges Digital Library (http://ccdl.libraries.claremont.edu/inside/CCDLmetadata.pdf)
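The crosswalk table from the slide is not reproduced in this transcript. In code, a crosswalk is essentially a lookup table. The Python sketch below shows the idea with a handful of VRA Core 4.0 element names; the specific pairings are an illustrative assumption and are not the Claremont Colleges mapping cited above.

```python
# Illustrative pairings only; a real project should follow a published
# crosswalk and document any local decisions.
VRA_TO_DC = {
    "agent": "creator",
    "title": "title",
    "date": "date",
    "subject": "subject",
    "description": "description",
    "worktype": "type",
    "material": "format",
    "measurements": "format",
    "technique": "format",
    "culturalContext": "coverage",
    "rights": "rights",
}


def crosswalk(vra_record):
    """Map a VRA-style record (element -> list of values) to Dublin Core."""
    dc_record = {}
    for element, values in vra_record.items():
        target = VRA_TO_DC.get(element)
        if target is None:
            continue  # no equivalent in this mapping; the value is dropped
        # several source elements may feed one repeatable DC element
        dc_record.setdefault(target, []).extend(values)
    return dc_record


vra = {
    "agent": ["Leonardo da Vinci"],
    "title": ["Mona Lisa"],
    "material": ["oil paint"],
    "measurements": ["77 cm x 53 cm"],
    "inscription": ["none"],
}
dc = crosswalk(vra)
print(dc)
```

The mapping loses detail: material and measurements both end up in format, and inscription has no target at all. That loss is the price of using a simple, generic target schema.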



Unicode, Encodings, Character Sets



History

  • ASCII represents characters with numbers between 0 and 127 (7 bits); the printable characters occupy 32 through 126

  • Computers store data in 8-bit bytes, so there is room for an additional 128 characters


More History

  • Problem: everybody used the additional 128 characters for something different

  • Solution: ANSI standard for code pages

  • Examples

    • Israel: code page 862

    • Greece: code page 737

  • All code pages still use the same characters below 128, but different ones above


More History

  • Problem: Only one code page possible at any time

  • Problem: Some languages use more characters than fit in 8 bits

  • Solution: Unicode


Unicode

  • Each letter or symbol is represented by a code point (a number)

  • Code points are not tied to a fixed number of bytes; the defined range runs from U+0000 to U+10FFFF

  • Examples

    • A is U+0041

    • ڬ is U+06AC

    • ♫ is U+266B
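The three examples can be checked in Python, whose built-in ord and chr functions convert between a character and its code point. The helper name code_point is made up for this sketch.

```python
def code_point(ch):
    """Format a single character's code point in the usual U+XXXX style."""
    return "U+%04X" % ord(ch)


# The three characters from the examples above, written as escapes so
# the snippet does not depend on the encoding of the source file
for ch in ["A", "\u06ac", "\u266b"]:
    print(code_point(ch), ch)

# chr goes the other way: from a number to the character
print(chr(0x266B))
```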


Encodings

  • There are many different ways to store these code points (numbers) in a file

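  • UCS-2/UTF-16

    • UCS-2 stores every character in two bytes; UTF-16 uses two bytes for most characters and four bytes (a surrogate pair) for the rest

    • Problem: byte order

  • UTF-8

    • Stores every character in one to four bytes (the original design allowed up to six)

    • Identical to ASCII for the first 128 characters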
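A short Python sketch makes the difference between the encodings concrete by counting the bytes each one needs for the same character. The helper name byte_lengths is made up for this sketch; the codec names are the ones Python ships with.

```python
def byte_lengths(text):
    """Return the number of bytes text occupies in each encoding."""
    return {
        enc: len(text.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-16-be")
    }


print(byte_lengths("A"))           # an ASCII letter
print(byte_lengths("\u266b"))      # the music note U+266B
print(byte_lengths("\U0001D11E"))  # U+1D11E, a character above U+FFFF

# Byte order: the same character, stored little-endian and big-endian
print("A".encode("utf-16-le"))
print("A".encode("utf-16-be"))

# UTF-8 output for plain ASCII text is byte-for-byte identical to ASCII
print("ASCII text".encode("utf-8") == "ASCII text".encode("ascii"))
```

The two byte-order variants are why UTF-16 files often begin with a byte order mark that tells the reader which order was used.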



Encodings

  • If the expected encoding does not match the actual encoding, characters will be interpreted incorrectly or not at all

  • Make sure to save your content in an encoding that is understood by the program you want to read it with

  • UTF-8 generally is the best option
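The effect of a mismatch is easy to reproduce. In this Python sketch a word is saved as UTF-8 and then read back as if it were in Windows code page 1252, a common Western European code page chosen here only as an example of a wrong guess.

```python
original = "Caf\u00e9"  # "Café", written with an escape for the accented e

# Writing: the text is stored as UTF-8 bytes
utf8_bytes = original.encode("utf-8")

# Reading with the wrong encoding: each byte of the two-byte sequence
# for the accented e is interpreted as a separate character
misread = utf8_bytes.decode("cp1252")
print(misread)

# Reading with the right encoding restores the text
print(utf8_bytes.decode("utf-8") == original)

# A stricter decoder refuses the data instead of guessing
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as error:
    print("cannot decode as ASCII:", error.reason)
```

The first case is the familiar garbled text where one accented letter turns into two odd characters; the last case corresponds to "not at all".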



Unicode

  • Reference:

    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, http://www.joelonsoftware.com/articles/Unicode.html



Many options

  • Databases

    • Microsoft Access

    • FileMaker Pro

  • Spreadsheets

    • Microsoft Excel

  • Text based

    • CSV (comma separated values)

    • TSV (tab separated values)

    • XML (structured text)


Options for exchanging data

  • Usually text based

  • CSV and TSV are functionally equivalent

    • Simple spreadsheet format

    • Multi-valued fields or hierarchies are difficult to represent

  • XML

    • Easily handles multi-valued fields or hierarchies

    • Structure must be understood



Criteria for picking an image format

  • Lossy vs. Lossless

  • Compressed vs. Uncompressed

  • Archival vs. Presentation

  • Offline vs. Online

  • Exchangeable vs. Proprietary

  • Common vs. Uncommon


Lossy vs. Lossless

  • Tradeoff between file size and image quality

  • Archival images should always be lossless

  • Image quality with lossy formats will get progressively worse with editing or processing

  • Many formats are lossless (for example PNG, or TIFF saved without JPEG compression)

  • JPEG is lossy


Compressed vs. Uncompressed

  • Tradeoff between file size and processing time

  • Processing time is usually no longer an issue

  • Compressed does not mean lossy

  • Examples:

    • TIFF can be compressed or uncompressed

    • JPEG is compressed and lossy

    • PNG is compressed and lossless


Archival vs. Presentation

  • Archival images should always be lossless and highest possible quality and resolution

  • Presentation images need to be small for quick network transfer

  • Presentation image quality and size can be lower depending on presentation equipment

  • As equipment gets better, better presentation images can be derived from archival images


Offline vs. Online

  • Online images need to be smaller and in a format that is supported by a web browser

  • Offline images can be larger and in formats that may require specific software to view or process

  • Example

    • Adobe Photoshop files are great for post-processing, but cannot be delivered in the browser


Exchangeable vs. Proprietary

  • Proprietary image formats may be harder to exchange with others

  • Examples

    • PSD files require Adobe Photoshop

    • MrSID files require proprietary plug-ins

    • TIFF and JPEG are widely supported


Common vs. Uncommon

  • Some file formats or format options are more popular and widely supported than others

  • Examples

    • Transparent GIFs work in any browser

    • Transparent PNGs only work in newer browsers

    • 8-bit RGB TIFFs work in almost every program

    • 16-bit RGB TIFFs or grayscale TIFFs do not



Reformatting Data in Excel

  • Cells

    • A spreadsheet is made up of cells organized in columns and rows, respectively identified by letters (A, B, C, …) and numbers (1, 2, 3, …)

    • A cell is referred to by its column letter and row number, for example “A1” or “D25”

  • Formulas in Excel

    • Always start with an equal sign “=”

    • Adjust their cell references when being moved or copied to other cells


Excel User Interface

[Screenshot: Excel window with callouts for the current cell, the formula in the current cell, and the cell value. Dragging the little square at the corner of the cell copies its formula into adjoining cells.]


Useful Excel Formulas

  • Combine cell values

    • CONCATENATE(cell, cell, …)

    • Or use the “&” operator: cell & cell & …

  • Extract parts of a cell value

    • LEFT(cell, length)

    • RIGHT(cell, length)

    • MID(cell, start, length)


Useful Excel Formulas

  • Find the position of a string in a cell value

    • FIND(text, cell) or FIND(text, cell, start)

  • Length of a cell value

    • LEN(cell)


Examples

  • Append “.jpg” to a record identifier to create a filename

Enter formula once and then drag it down across all rows


Examples

  • Splitting date ranges into start and end date

Grab up to 10 characters
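The screenshots for these two examples are not reproduced in this transcript, so the exact formulas are unknown. In Excel, the filename example would presumably be something like =A2 & ".jpg", and the date example would combine FIND with LEFT and MID. For readers who would rather script the same clean-up, here is a rough Python equivalent. The function names, the hyphen separator and the sample values are assumptions made for this sketch.

```python
def filename_for(record_id, extension=".jpg"):
    """Append a file extension to a record identifier."""
    return str(record_id).strip() + extension


def split_date_range(value, separator="-", max_length=10):
    """Split a date range such as "1880-1885" into start and end.

    Mirrors the LEFT/MID approach: everything before the separator is
    the start, and up to max_length characters after it are the end.
    Assumes the separator does not occur inside the dates themselves,
    so values like "2008-07-11" need a different separator.
    """
    start, _, end = value.partition(separator)
    return start.strip()[:max_length], end.strip()[:max_length]


print(filename_for("img0001"))
print(split_date_range("1880-1885"))
print(split_date_range("ca. 1500 - 1510"))
print(split_date_range("1880"))  # no separator: the end date stays empty
```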


Create a PowerPoint slideshow from a set of images

  • Use the Insert>Photo Album feature

  • Example: MDID can export a slideshow package containing a set of JPEG files


Convert image files between formats

  • For online delivery, images usually have to be in JPEG format

  • Archival images are usually TIFF

  • Need to convert many images at once


XAT Image Optimizer

  • Converts between many image formats

  • Flexible quality and compression settings

  • Available at http://www.xat.com/io/





ImageMagick

  • Command-line program

    • Harder to use

    • Easier to automate

  • Available at http://www.imagemagick.org/
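As an illustration of "easier to automate", the Python sketch below prepares one ImageMagick command per TIFF file and, optionally, runs them. It assumes the convert program is installed and on the path (newer ImageMagick releases name the program magick instead). The -quality option is a standard ImageMagick setting; the folder names and the quality value of 85 are arbitrary choices for the example.

```python
import subprocess
from pathlib import PurePath


def build_convert_commands(tiff_paths, target_dir, quality=85):
    """Return one ImageMagick command (as an argument list) per TIFF."""
    commands = []
    for tiff in tiff_paths:
        # same base name, .jpg extension, placed in the target folder
        jpeg = PurePath(target_dir) / (PurePath(tiff).stem + ".jpg")
        commands.append(
            ["convert", str(tiff), "-quality", str(quality), str(jpeg)]
        )
    return commands


def run_all(commands):
    """Execute the commands one by one; stops at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical file names; nothing is executed until run_all is called
commands = build_convert_commands(["scans/a.tif", "scans/b.tif"], "web")
for command in commands:
    print(" ".join(command))
```

Building the command list separately from running it makes it possible to review what will happen before any file is written.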


Adobe Photoshop

  • Can read and write many formats

  • For batch processing, need to create an Action that saves a file in the desired format


Adobe Photoshop Actions

[Screenshot: Photoshop Actions controls with callouts for New Folder and New Action]


Adobe Photoshop Actions

  • To create a “Save As…” action

    • Create a new image or open an existing image

    • Create the new action; recording starts immediately

    • Go to File>Save As...

    • Set the desired options and file format

    • Save the file

    • Stop the recording




Open XML cataloging data files in Microsoft Excel

  • Microsoft Excel can read certain XML files and open them as spreadsheets

  • Allows easier editing and processing

  • Example: File exported from MDID collection


XML data file

  • Simple record structure – no repeated values


Opening XML file in Excel

  • Open the XML file and select “As an XML table”


XML data file in Excel

  • Spreadsheet can be edited and saved in different formats for further processing or import


Problematic XML data file

  • Repeated fields cause records to be spread over multiple rows
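One possible workaround, sketched here in Python, is to flatten the file before opening it in Excel: repeated fields are joined into a single cell so that every record stays on one row. The element names and sample data are invented for illustration and do not reflect the actual MDID export format; the "; " separator is likewise an arbitrary choice.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample: the first record repeats the subject field
SAMPLE_XML = """<records>
  <record>
    <identifier>img0001</identifier>
    <title>Street scene</title>
    <subject>Streets</subject>
    <subject>Buildings</subject>
  </record>
  <record>
    <identifier>img0002</identifier>
    <title>Portrait</title>
    <subject>People</subject>
  </record>
</records>"""


def flatten_records(xml_text, separator="; "):
    """Return one dictionary per record, joining repeated fields."""
    rows = []
    for record in ET.fromstring(xml_text).iter("record"):
        row = {}
        for field in record:
            value = (field.text or "").strip()
            if field.tag in row:
                row[field.tag] += separator + value  # repeated field
            else:
                row[field.tag] = value
        rows.append(row)
    return rows


def rows_to_csv(rows, columns):
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_records(SAMPLE_XML)
csv_text = rows_to_csv(rows, ["identifier", "title", "subject"])
print(csv_text)
```

The joined values have to be split again on the same separator when the data is imported elsewhere, so the separator should be a string that never occurs in the data.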



MDID & Interoperability Update

  • About to release RFP to solicit bids from software companies to begin work on one or more of the following:

    • ARTstor <> MDID

    • Flickr <> MDID

    • Display MDID images and slideshows in Blackboard

    • Download MDID slideshows as PowerPoint presentations

    • Upload PowerPoint presentations into MDID

    • Build an OKI OSID plug-in for MDID

  • We plan on extending grant work through 2009

