Ensuring that digital data last

The priority of archival form over working form and presentation form
Gary Simons
SIL International

Symposium on Best Practice

LSA, Boston, MA

A paradox of writing history
  • The more advanced the writing technology, the less durable the written product.
  • From most durable to least durable:
    • Clay tablets and stone
    • Vellum
    • Papyrus
    • Paper
    • Digital word processing

Storage media are ephemeral
  • Life expectancy of digital storage media:
    • Magnetic tape: 10 to 20 years
    • CD-R (write once)
      • Manufacturers say: 100 to 200 years
      • Independent lab says: 30 years
    • CD-RW (write many times)
      • Manufacturers say: 25 years

Hardware devices are ephemeral
  • Removable media on personal computers have changed repeatedly over 25 years:
    • 8-inch floppies
    • 5.25-inch floppies
    • 3.5-inch floppies
    • Zip drives
    • CD-Rs
    • DVD-Rs

Software formats are ephemeral
  • Software vendors change file formats and functionality with each version.
  • When we use a proprietary, single-vendor format, we lose access to the data once the software becomes obsolete.
  • For instance,
    • Microsoft Word files from the 1980s cannot be read by current versions of Word

An impending “Digital Dark Age”
  • Future historians may see our present age as another Dark Ages since so much information documenting our current civilization is recorded digitally and will have vanished.
  • If linguists fail to act in time, our digital data records are in danger of dying out before the endangered languages we are seeking to document.

What’s a linguist to do?
  • Do two things to ensure that digital data endure long into the future:
    • Put the materials into an enduring file format.
    • Deposit the materials with an archive that will make a practice of periodically migrating them to new storage media as needed.

Forms contrasted by function
  • Working form
    • The form in which information is stored as it is created and edited.
  • Presentation form
    • The form in which information is presented to the public.
  • Archival form
    • The form in which information is stored for access long into the future.

The problem
  • Popular working forms (like Microsoft Word or database applications) are not suitable archival forms.
  • Popular presentation forms (like dynamic web pages) are not suitable archival forms.
  • Linguists tend to focus on working form and presentation form; they must look beyond these to create enduring work.

Unacceptable practice
  • The form that is archived is a binary working form that requires a specific piece of software, e.g.,
    • .DOC, .XLS, .PPT, .MDB
    • A format supported by homemade software
  • The information will cease to exist when the required software ceases to work on the hardware in use.

Minimally acceptable practice
  • The form that is archived is a presentation form based on an open format supported by multiple vendors, e.g.,
    • HTML, PDF
  • The good news
    • A snapshot of how you presented the information will persist.
  • The bad news
    • It is a dead-end format: the information is not repurposable.

Best practice
  • The form that is archived preserves all of the information (including its structure) in such a way that it is portable and repurposable.
    • Descriptive XML markup
  • An XML archival form is not a dead end:
    • It may be reloaded into a working form.
    • It may regenerate new presentation forms.
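
The three levels of practice above can be read as a rough triage rule for files about to be deposited. The following is a minimal sketch, not a tool from the talk: the function name `archival_tier` is hypothetical, the extension sets come from the examples on the slides (with `.htm` added as an assumption), and a file extension alone cannot confirm that XML markup is actually descriptive.

```python
from pathlib import PurePath

# Extensions named on the slides; ".htm" is an added assumption.
UNACCEPTABLE = {".doc", ".xls", ".ppt", ".mdb"}       # binary working forms
MINIMALLY_ACCEPTABLE = {".html", ".htm", ".pdf"}      # open presentation forms
BEST_PRACTICE_CANDIDATE = {".xml"}                    # possibly descriptive markup


def archival_tier(filename: str) -> str:
    """Classify a file name by the level of archival practice it suggests."""
    ext = PurePath(filename).suffix.lower()
    if ext in UNACCEPTABLE:
        return "unacceptable"
    if ext in MINIMALLY_ACCEPTABLE:
        return "minimally acceptable"
    if ext in BEST_PRACTICE_CANDIDATE:
        # The extension cannot tell us whether the markup is descriptive.
        return "best practice (if the markup is descriptive)"
    return "unknown"


if __name__ == "__main__":
    for name in ["lexicon.DOC", "lexicon.html", "lexicon.xml", "lexicon.xyz"]:
        print(f"{name}: {archival_tier(name)}")
```

A check like this only flags obvious problems; deciding whether an XML file really preserves the structure of the information still requires looking at the markup itself.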

A sample presentation form
  • From a dictionary of Sikaiana, Solomon Islands

aha [na] the shell tool used for measuring the spaces between mesh in nets (seu manu, kupena).

ahaa (from PPN *afaa) [n] a cyclone, a tidal wave.

aaha 1. [vt] to open up, to push apart, as in pushing apart branches in order to look through. 2. [vt] to open up a new settlement or start a new garden. 3. [vt] to start, to begin a new project or way of life. Tapa mai a koe ko hano i mua ki aaha te ala o te taina, 'you called upon me to go first (to school) to open the way for my brother (MS)'.

Unacceptable practice
  • If you archive a .DOC file, this is what future generations will see when they open it:

Minimally acceptable practice
  • If you archive an HTML presentation, this is what future generations will see:

<P><B>aha</B> <I>[na]</I> the shell tool used for measuring the spaces between mesh in nets (<I>seu manu, kupena</I>).</P>
<P><B>ahaa</B> (from PPN *afaa) <I>[n]</I> a cyclone, a tidal wave.</P>
<P><B>aaha</B> 1. <I>[vt]</I> to open up, to push apart, as in pushing apart branches in order to look through. 2. <I>[vt]</I> to open up a new settlement or start a new garden. 3. <I>[vt]</I> to start, to begin a new project or way of life. <I>Tapa mai a koe ko hano i mua ki aaha te ala o te taina,</I> 'you called upon me to go first (to school) to open the way for my brother (MS)'.</P>

Best practice
  • If you archive descriptive XML markup, this is what future generations will see:
  • Future generations (though they lack our current working tools) will be able to:
    • See and understand the information
    • Load it into their own working tools
    • Create modern presentation forms
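
To make the last two abilities concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML below is a hypothetical illustration only: the element names (`entry`, `headword`, `etymology`, `pos`, `definition`) are assumptions and not the markup shown in the original slide, which did not survive in this transcript. The sketch reloads one Sikaiana entry into a working form (a dictionary) and then regenerates an HTML presentation form like the one shown earlier.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical descriptive markup for the "ahaa" entry; the element names
# are assumptions made for this illustration.
ARCHIVAL_XML = """\
<entry>
  <headword>ahaa</headword>
  <etymology source="PPN">*afaa</etymology>
  <pos>n</pos>
  <definition>a cyclone, a tidal wave</definition>
</entry>
"""


def load_entry(xml_text: str) -> dict:
    """Reload the archival form into a simple working form (a dict)."""
    root = ET.fromstring(xml_text)
    etym = root.find("etymology")
    return {
        "headword": root.findtext("headword"),
        "etymology_source": etym.get("source") if etym is not None else None,
        "etymology_form": etym.text if etym is not None else None,
        "pos": root.findtext("pos"),
        "definition": root.findtext("definition"),
    }


def to_html(entry: dict) -> str:
    """Generate one possible presentation form from the working form."""
    parts = [f"<B>{escape(entry['headword'])}</B>"]
    if entry["etymology_source"] and entry["etymology_form"]:
        parts.append(
            f"(from {escape(entry['etymology_source'])} "
            f"{escape(entry['etymology_form'])})"
        )
    parts.append(f"<I>[{escape(entry['pos'])}]</I>")
    parts.append(f"{escape(entry['definition'])}.")
    return "<P>" + " ".join(parts) + "</P>"


if __name__ == "__main__":
    print(to_html(load_entry(ARCHIVAL_XML)))
```

Because the tags say what each piece of information is rather than how it should look, the same archived entry could just as easily be rendered in a different layout or loaded into a database.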

Is XML just one more ephemeral format?
  • No! It’s as rock solid as ASCII.
  • ASCII was adopted in 1963; 40 years later it is at the heart of operating systems, email, and the web. It won’t change.
  • XML uses ASCII notation to extend ASCII, addressing two of its inherent limitations:
    • Via Unicode it encodes text in any language
    • Via tags it encodes the structure of information
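
One way to see the relationship between XML and ASCII is to serialize an element containing non-ASCII text as pure ASCII bytes. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper names `to_ascii_xml` and `from_xml` and the sample characters are assumptions chosen for illustration, not material from the talk. Characters outside ASCII are written as numeric character references and recovered intact when the file is parsed again.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(tag: str, text: str) -> bytes:
    """Serialize one element as ASCII-only bytes.

    Non-ASCII characters are emitted as numeric character references.
    """
    element = ET.Element(tag)
    element.text = text
    return ET.tostring(element, encoding="us-ascii")


def from_xml(data: bytes) -> str:
    """Parse the serialized element and return its text content."""
    return ET.fromstring(data).text


if __name__ == "__main__":
    original = "ā é ñ"  # arbitrary non-ASCII characters, for illustration
    data = to_ascii_xml("headword", original)
    print(data)                        # ASCII bytes with character references
    print(from_xml(data) == original)  # the text survives the round trip
```

The tag name carries the structure and the character references carry the Unicode text, so the file stays readable by any tool that can read ASCII.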

Is XML just one more theory?
  • No! It has become part of the fabric of the global information infrastructure.
  • It’s a family of open standards from the World Wide Web Consortium (W3C).
  • All major vendors (e.g. Microsoft, IBM, Sun, Oracle) have embraced it.
  • Hundreds of small vendors and open-source projects have developed tools.

What’s linguistics to do?
  • The community needs to recognize the fleeting value of digital presentation forms and embrace archival forms.
    • Grants should require best practice archiving, not just “dissemination”.
    • Reward archival language documentation.
    • Work in league with libraries and archives.
  • Only by taking steps like these can we ensure that our digital data will endure.
