APARSEN Metadata for preservation, curation and interoperability

Workshop on Research Metadata in Context

7-8 Sept 2010, Nijmegen

David Giaretta

APA and STFC


Digital Preservation

  • Ensure that digitally encoded information is understandable and usable over the long term

    • Long term could start at just a few years

  • Easy to make claims

    • Difficult to provide proof

  • Reference Model for Open Archival Information System (ISO 14721)

    • The basic standard for work in digital preservation

    • Defines terminology and compliance criteria


Definitions (OAIS)

  • Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term.

  • Long Term: A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future.

  • Not just BIT preservation

  • Not just rendering

  • Information, not just DATA or documents

  • Authenticity


Basic concept

  • Digital preservation had been dominated by libraries and (state) archives

  • However, the focus there was on “rendered objects”, with a tendency to think data is an “easy” add-on

    HOWEVER

  • Need to deal with DATA, which is processed into new things, not just rendered

  • Need to follow OAIS, which gives a finer-grained view

  • Need to test and prove that things work

“metadata”

“CASPAR banned the use of the term metadata unless absolutely necessary”


Data…

Level 2 GOME Satellite instrument data

Contains numbers: need meaning

...to process to this

...or this

...through complex processing schemes


Just Format?

You have a file

JHOVE tells you it is WORD version 7

sfqsftfoubujpo jogpsnbujpo svmft

...with some extra information...

representation information rules

Format Registries: useful but not enough. Formats can be used for multiple purposes, e.g. audio files used to store configuration parameters
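
The garbled line on this slide, "sfqsftfoubujpo jogpsnbujpo svmft", becomes "representation information rules" once the encoding rule is known. A minimal Python sketch of that "extra information", assuming the rule is that every letter was shifted forward by one place in the alphabet (this is what the two strings imply; the function name is ours, not from the slides):

```python
import string


def shift_letters(text: str, offset: int) -> str:
    """Shift each lowercase ASCII letter by `offset` places, wrapping around."""
    alphabet = string.ascii_lowercase
    k = offset % 26
    shifted = alphabet[k:] + alphabet[:k]
    # Characters outside a-z (such as spaces) are left untouched.
    return text.translate(str.maketrans(alphabet, shifted))


encoded = "sfqsftfoubujpo jogpsnbujpo svmft"
decoded = shift_letters(encoded, -1)  # undo a shift of +1
print(decoded)
```

Knowing the file format alone would not have helped here; the shift rule is the Representation Information that turns the bytes into something understandable.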


Examples (cont)

  • “504b0304140000000800f696….”

  • “This is a ZIP file which contains Word files, each of which contains an encoded message which needs the key ‘!D$G^AJU*KI’ to decode it using encryption method SHA7”
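
The hex string in the first bullet can be partly interpreted from the format alone. The sketch below, in Python, checks for the ZIP local file header signature ("PK" followed by bytes 03 04) and reads the three 16-bit little-endian fields that follow it in the ZIP format. The function and key names are illustrative. Note what it cannot tell you: nothing in these bytes says that the archive holds Word files or that a key is needed, which is the point of the second bullet.

```python
import struct

# Hex prefix quoted on the slide (the rest of the file is elided there).
HEX_PREFIX = "504b0304140000000800f696"

ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def describe_prefix(hex_prefix: str) -> dict:
    """Interpret the first bytes of a file as a ZIP local file header, if possible."""
    raw = bytes.fromhex(hex_prefix)
    if len(raw) < 10 or not raw.startswith(ZIP_LOCAL_HEADER_MAGIC):
        return {"looks_like_zip": False}
    # Little-endian: 4-byte signature, then three unsigned 16-bit fields.
    _, version_needed, flags, method = struct.unpack("<4sHHH", raw[:10])
    return {
        "looks_like_zip": True,
        "version_needed": version_needed,  # 20 means ZIP spec version 2.0
        "flags": flags,
        "compression_method": method,      # 8 means DEFLATE
    }


info = describe_prefix(HEX_PREFIX)
print(info)
```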


Examples (cont)

  • LaTeX file containing an EPS (Encapsulated PostScript) version of an image

  • Web page containing Java Applet generating random numbers

  • SWISS-PROT data

  • Foreign Language emails


XML enough? Can stare at this and probably understand it

<family>
  <father>John</father>
  <mother>Mary</mother>
  <son>Paul</son>
</family>


...but what about this?

<VOTABLE version="1.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1"
  xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
<RESOURCE>
<TABLE name="6dfgs_E7_subset" nrows="875">
<PARAM arraysize="*" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz">
<DESCRIPTION>URL of data file used to create this table.</DESCRIPTION>
</PARAM>
<PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/>
<FIELD arraysize="15" datatype="char" name="TARGET">
<DESCRIPTION>Target name</DESCRIPTION>
</FIELD>
<FIELD arraysize="11" datatype="char" name="DEC" unit="DMS">
<DATA>
<FITS>
<STREAM encoding='base64'>
U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm
b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg
ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv
IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg
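
Staring at the STREAM content does not help, but two further pieces of Representation Information do: the encoding attribute says base64, and the enclosing FITS element says the decoded bytes follow the FITS standard, whose headers are made of 80-character "cards". A minimal Python sketch that decodes the five lines shown on the slide (the variable names are ours; only the slide's excerpt is decoded, not the full stream):

```python
import base64

# The five base64 lines shown on the slide; the stream continues beyond them.
stream_lines = [
    "U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm",
    "b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg",
    "ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg",
    "ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv",
    "IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg",
]

decoded = base64.b64decode("".join(stream_lines))
text = decoded.decode("ascii")

# FITS header records ("cards") are 80 characters long.
cards = [text[i:i + 80].rstrip() for i in range(0, len(text), 80)]
for card in cards:
    print(card)
```

The decoded text is the start of a FITS header (SIMPLE, BITPIX and NAXIS keywords). Understanding what those keywords mean needs yet more Representation Information, such as the FITS standard and dictionaries.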


Performance Viewer: side-by-side comparison and validation of the transformation. From left to right: 3D visualization in Ogre3D, 3D model of the stage including the virtual dancer in VRML.

Figure 8: Some aspects of acousmatic production


[Diagram: digital objects classified along three axes: Simple/Complex, Static/Dynamic, Rendered/Non-Rendered]


Information Model & Representation Information

  • An Information Object is a Data Object interpreted using Representation Information (1+)

  • Representation Information is itself interpreted using further Representation Information

  • A Data Object is either a Physical Object or a Digital Object

  • A Digital Object is made up of Bit Sequences (1+)

The Information Model is key

Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY

(this knowledge will change over time and region)
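
The recursion described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CASPAR implementation: the class and function names are ours, and the example chain (VOTable description, XML specification, Unicode specification) is a hypothetical dependency chain chosen to echo the earlier slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """A piece of Representation Information, which may itself need more RepInfo."""
    name: str
    needs: list[RepInfo] = field(default_factory=list)


def missing_rep_info(item: RepInfo, knowledge_base: set[str]) -> list[str]:
    """List the RepInfo a community still needs; stop at what it already knows."""
    if item.name in knowledge_base:
        return []  # recursion ends at the community's knowledge base
    missing = [item.name]
    for dependency in item.needs:
        for name in missing_rep_info(dependency, knowledge_base):
            if name not in missing:
                missing.append(name)
    return missing


# Hypothetical chain of dependencies.
unicode_spec = RepInfo("Unicode specification")
xml_spec = RepInfo("XML specification", [unicode_spec])
votable_doc = RepInfo("VOTable format description", [xml_spec])

print(missing_rep_info(votable_doc, {"XML specification"}))
print(missing_rep_info(votable_doc, set()))
```

A community that already knows XML needs only the VOTable description; a community with an empty knowledge base needs the whole chain. As the knowledge base changes over time, the same call gives a different answer.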


Representation Information Network


Modules and Dependencies: defining the Designated Community

[Diagram: dependency network. Node labels: FITS FILE; MULTIMEDIA PERFORMANCE DATA; FITS DICTIONARY; FITS STANDARD; DICTIONARY SPECIFICATION; C3D; DirectX; MAX/MSP; FITS JAVA s/w; PDF STANDARD; 3D scene data files; 3D motion data files; motion to music mapping strategy; XML SPECIFICATION; PDF s/w; JAVA VM; UNICODE SPECIFICATION; README.txt; ENGLISH LANGUAGE; TEXT EDITOR; WINDOWS XP]


[Diagram: Representation Information network for a FITS file. Node labels: FITS FILE; DDL DESCRIPTION; FITS STANDARD; FITS DICTIONARY; DDL SOFTWARE; FITS JAVA SOFTWARE; DICTIONARY SPECIFICATION; DDL DEFINITION; PDF STANDARD; JAVA VM; PDF SOFTWARE; XML SPECIFICATION; UNICODE SPECIFICATION]


In principle we could use this, plus the Dictionaries to understand the keywords, to extract the numbers

If we can run this then we can use this in a generic application to extract the numbers

If we cannot run the Java Virtual Machine then we use this source code to re-write in another programming language such as C

If we can run this then we can run the Java software to extract the numbers

If we cannot run this then we can use an emulator or use its RepInfo to re-create a Java VM

If we cannot run the DDL software then we can look at the DDL definition and write some software to extract the numbers
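
These callouts amount to a list of alternative routes to the same goal (extracting the numbers), each with its own prerequisites. A minimal Python sketch of picking the first workable route follows. The node labels come from the diagram, but the grouping of labels into strategies and their ordering are our reading of the callouts, so treat them as assumptions.

```python
from typing import Optional

# (description, what must be usable for this route to work)
# The groupings are an interpretation of the slide, not taken from it verbatim.
STRATEGIES = [
    ("run the FITS Java software", {"FITS JAVA SOFTWARE", "JAVA VM"}),
    ("run the DDL software on the DDL description", {"DDL DESCRIPTION", "DDL SOFTWARE"}),
    ("write new software from the DDL definition", {"DDL DESCRIPTION", "DDL DEFINITION"}),
    ("read the FITS standard and dictionaries", {"FITS STANDARD", "FITS DICTIONARY", "PDF SOFTWARE"}),
]


def choose_strategy(usable: set[str]) -> Optional[str]:
    """Return the first strategy whose prerequisites are all usable, else None."""
    for description, requirements in STRATEGIES:
        if requirements <= usable:  # subset test
            return description
    return None


print(choose_strategy({"FITS JAVA SOFTWARE", "JAVA VM"}))
print(choose_strategy({"DDL DESCRIPTION", "DDL DEFINITION"}))
print(choose_strategy(set()))
```

When no route is usable the function returns None, which corresponds to a gap that has to be filled with further Representation Information, for example an emulator or the RepInfo needed to re-create a Java VM.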


Virtualisation

[Diagram labels: RepInfo, Virtualisation, DISCIPLINE]


  • 2-D array: Height, Width, Bits per Pixel

  • 2-D image: Height, Width, Bits per Pixel, Co-ordinate system, Time

  • 2-D astronomical image: Height, Width, Bits per Pixel, Astronomical co-ordinate system, Time (EPOCH), Bandpass
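
The three levels on this slide (2-D array, 2-D image, 2-D astronomical image) each add attributes to the level before. One way to sketch that layering is with Python dataclasses and inheritance. The class names, field names and example values below are illustrative assumptions, and the slide's "astronomical co-ordinate system" is simplified to the inherited field.

```python
from dataclasses import dataclass


@dataclass
class Array2D:
    height: int
    width: int
    bits_per_pixel: int


@dataclass
class Image2D(Array2D):
    coordinate_system: str
    time: str


@dataclass
class AstronomicalImage2D(Image2D):
    epoch: str
    bandpass: str


def size_in_bytes(array: Array2D) -> int:
    """Generic code needs only the 2-D array view, whatever the specific type."""
    total_bits = array.height * array.width * array.bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical values, for illustration only.
image = AstronomicalImage2D(
    height=1024, width=2048, bits_per_pixel=16,
    coordinate_system="ICRS", time="2010-09-07T00:00:00",
    epoch="J2000", bandpass="V",
)
print(size_in_bytes(image))
```

The design point is that a generic tool written against the simplest level keeps working on the more specialised ones, while discipline-specific tools can use the extra attributes.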


  • General Table: Number of columns, Names of columns, Number of rows, Value in cell at any row, column

  • Time series: Number of columns, Names of columns, Number of rows, Value in cell at any row, column, Time corresponding to any row

  • Science data table: Number of columns, Names of columns, Number of rows, Value in cell at any row, column, Type of column value, Column “metadata”, Table “metadata”
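
The table virtualisation can be read as an interface: any data that answers the four General Table questions can be handled by generic code, and a time series adds one more question. A minimal Python sketch, assuming an in-memory table (the class names, method names and sample values are ours):

```python
from typing import Any, Protocol, Sequence


class GeneralTable(Protocol):
    """The four 'General Table' operations listed on the slide."""

    def n_columns(self) -> int: ...
    def column_names(self) -> Sequence[str]: ...
    def n_rows(self) -> int: ...
    def value(self, row: int, column: int) -> Any: ...


class ListTable:
    """A simple in-memory table that satisfies GeneralTable."""

    def __init__(self, names: Sequence[str], rows: Sequence[Sequence[Any]]):
        self._names = list(names)
        self._rows = [list(r) for r in rows]

    def n_columns(self) -> int:
        return len(self._names)

    def column_names(self) -> Sequence[str]:
        return list(self._names)

    def n_rows(self) -> int:
        return len(self._rows)

    def value(self, row: int, column: int) -> Any:
        return self._rows[row][column]


class TimeSeries(ListTable):
    """Adds 'time corresponding to any row' to the general table."""

    def __init__(self, names, rows, times):
        super().__init__(names, rows)
        if len(times) != len(rows):
            raise ValueError("need exactly one time per row")
        self._times = list(times)

    def time(self, row: int) -> Any:
        return self._times[row]


def column_values(table: GeneralTable, name: str) -> list:
    """Generic code: uses only the General Table operations."""
    column = list(table.column_names()).index(name)
    return [table.value(row, column) for row in range(table.n_rows())]


series = TimeSeries(["flux"], [[1.0], [2.5]], ["2010-09-07", "2010-09-08"])
print(column_values(series, "flux"), series.time(1))
```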


  • Get the Root

  • Get the number of children for a node

  • Get child number “i”

[Diagram: tree with a Root node and child nodes labelled Node 1 to Node 9]
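
The three tree operations (get the root, get the number of children of a node, get child number i) are enough to traverse any tree-shaped data. A minimal Python sketch follows. The class and method names are ours, and the tree shape is made up because the parent/child links of the diagram are not given in the text.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self._children = list(children)


class Tree:
    """Exposes only the three operations named on the slide."""

    def __init__(self, root: Node):
        self._root = root

    def get_root(self) -> Node:
        return self._root

    def child_count(self, node: Node) -> int:
        return len(node._children)

    def get_child(self, node: Node, i: int) -> Node:
        return node._children[i]


def walk(tree: Tree, node=None) -> list:
    """Depth-first list of labels, written against the three operations only."""
    node = tree.get_root() if node is None else node
    labels = [node.label]
    for i in range(tree.child_count(node)):
        labels.extend(walk(tree, tree.get_child(node, i)))
    return labels


# Hypothetical shape, for illustration only.
tree = Tree(Node("Root", [Node("Node 1", [Node("Node 3")]), Node("Node 2")]))
print(walk(tree))
```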


  • Image

    • Earth Observation Image

    • Artistic Image

    • Cultural Heritage Image

    • Astronomical Image

      • Optical Astronomical Image

      • X-ray Astronomical Image


  • Archival Information Package: described by Package Description; delimited by Packaging Information

  • Package Description: derived from Content Information and Preservation Description Information

  • Packaging Information: identifies Content Information and Preservation Description Information

  • Content Information: further described by Preservation Description Information


Preservation Description Information

  • Reference Information

  • Provenance Information

  • Context Information

  • Fixity Information

  • Access Rights Information
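
As a rough illustration of how Content Information and the five kinds of Preservation Description Information sit together in one package, here is a minimal Python sketch. It is not an OAIS-defined structure: the field names, the example values and the choice of SHA-256 as the fixity mechanism are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    reference: str          # e.g. a persistent identifier
    provenance: list[str]   # history of custody and processing
    context: str            # relationship to other information
    fixity_sha256: str      # digest of the content (our choice of mechanism)
    access_rights: str


@dataclass
class ArchivalInformationPackage:
    content: bytes                   # the Data Object
    representation_info: list[str]   # names of the RepInfo needed to understand it
    pdi: PreservationDescriptionInformation


def make_aip(content: bytes, rep_info: list[str], reference: str) -> ArchivalInformationPackage:
    pdi = PreservationDescriptionInformation(
        reference=reference,
        provenance=["ingested"],
        context="example dataset",
        fixity_sha256=hashlib.sha256(content).hexdigest(),
        access_rights="unrestricted",
    )
    return ArchivalInformationPackage(content, rep_info, pdi)


def fixity_ok(aip: ArchivalInformationPackage) -> bool:
    """Check that the content still matches the recorded digest."""
    return hashlib.sha256(aip.content).hexdigest() == aip.pdi.fixity_sha256


# Hypothetical content and identifier.
aip = make_aip(b"level 2 data", ["FITS STANDARD"], "example-id-0001")
print(fixity_ok(aip))
```

A fixity check of this kind only shows that the bits are unchanged; it says nothing about whether they can still be understood, which is what the Representation Information is for.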


[Diagram labels: Representation Information and Provenance, linked by “has” relationships]


Cost sharing

DRM

  • USE DATA

  • Use application to find data in Repository

  • Create DIP with enough RepInfo for the user (via DC profile)

  • Obtain more RepInfo from Registry if necessary

Preservable infrastructure


APARSEN

  • Integration (1000), JPA Integration

    • 1100: Common Vision

    • 1200: Staff and experience exchange

    • 1300: Common standards

    • 1400: Common testing environments

    • 1500: Internal W/S & symposia

    • 1600: Common tools, software repository and market place

  • Technical (2000), JPA Research

    • 2100: Preservation Services

    • 2200: Identifiers & citability

    • 2300: Storage solutions

    • 2400: Authenticity & Provenance

    • 2500: Interoperability & intelligibility

    • 2600: Annotation, Reputation & data quality

    • 2700: Scalability

  • Economic/Legal (3000), JPA Research

    • 3100: Digital Rights & access management

    • 3200: Cost/benefit data collection and modelling

    • 3300: Peer Review & 3rd party Certification

    • 3400: Brokerage services

    • 3500: Data policies and governance

    • 3600: Business cases

  • Spreading excellence (4000), JPA Spreading excellence

    • 4100: External W/S & symposia

    • 4200: Formal qualifications

    • 4300: Training courses

    • 4400: Awareness raising

    • 4500: Liaison with other stakeholders

    • 4600: International liaison

  • Management (5000)

    • 5100: Financial management

    • 5200: Technical co-ord.

    • 5300: Evaluate impact of the Network of Excellence

JPA Research

  • Technical (2000)

    • 2100: Preservation Services

    • 2200: Identifiers & citability

    • 2300: Storage solutions

    • 2400: Authenticity & Provenance

    • 2500: Interoperability & intelligibility

    • 2600: Annotation, Reputation & data quality

    • 2700: Scalability

  • Economic/Legal (3000)

    • 3100: Digital Rights & access management

    • 3200: Cost/benefit data collection and modelling

    • 3300: Peer Review & 3rd party Certification

    • 3400: Brokerage services

    • 3500: Data policies and governance

    • 3600: Business cases


[Diagram: layered preservation infrastructure, with the stack repeated under the label FUTURE. Labels: Persistent ID resolver; RepInfo Registry; Authenticity tools; Processing Context; Certification; Orchestration/Brokering; Knowledge Gap Manager; Storage; Compute Resource; Local Authentication; Local Authorisation; Resource Registries; Process ID; Scheduler; Shibboleth; WAN; LAN; Router; Switch; Cable; Interconnects; Gateways; Management; Translators; Thesauri; Cross-references; Discipline repositories; Repositories; Users; Automated systems]

  • Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved

  • Non-maintainability of essential hardware, software or support environment may make the information inaccessible

  • The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity

  • Access and use restrictions may not be respected in the future

  • Loss of ability to identify the location of data

  • The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future

  • The ones we trust to look after the digital holdings may let us down


Links

  • CASPAR: http://www.casparpreserves.eu

  • CASPAR source code: http://sourceforge.net/projects/digitalpreserve/

  • OAIS Reference Model: http://public.ccsds.org/publications/archive/650x0b1.pdf

    • The updated draft is available from http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Overview.aspx

  • CASPAR validation report: http://www.casparpreserves.eu/Members/cclrc/Deliverables/caspar-validation-evaluation-report/at_download/file

  • PARSE.Insight:

    • www.parse-insight.eu

  • Alliance for Permanent Access:

    • www.alliancepermanentaccess.eu

  • Digital Curation Centre:

    • www.dcc.ac.uk


END