Revitalizing Endangered Language Data: Case studies in rescuing legacy documentation CELCNA 2007


Presentation Transcript


  1. Revitalizing Endangered Language Data: Case studies in rescuing legacy documentation CELCNA 2007 Naomi Fox, Julia James, University of Utah

  2. Digital formats • Why do you want your data in digital format? Digital NonDigital Dictionary database notecards in a shoebox under the bed • examples of increased functionality of digital formats • even in Word format, you can use 'find' instead of flipping through pages

  3. What are Best Practices and why should I care? • Why follow BP • interoperability/data sharing • protect valuable data from loss (obsolescence) • make sure your data outlives you • Finding out BP: resources • E-MELD (http://www.emeld.org) • OLAC (http://www.language-archives.org/) • DELAMAN (http://www.delaman.org/) • Edata (http://www.endangereddata.org)

  4. Quick and Dirty Best Practice Recommendations Audio • uncompressed, .wav or .aiff, minimum 44.1 kHz/16-bit Text • XML, tagged, with valid DTD • Unicode • indexed to audio Metadata • have some
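As a rough illustration of the audio recommendation above, a short Python sketch can check whether a WAV file meets the 44.1 kHz/16-bit minimum. It uses the standard-library `wave` module, which reads only uncompressed PCM WAV files. The function names and the in-memory test file are assumptions made for this example, not part of the original presentation.

```python
import io
import wave


def meets_audio_recommendation(wav_file, min_rate=44100, min_bits=16):
    """Return True if a PCM WAV file meets the minimum sample rate and bit depth.

    wav_file may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        bits = w.getsampwidth() * 8  # sample width is reported in bytes
        return rate >= min_rate and bits >= min_bits


def make_silent_wav(rate, sample_width_bytes, seconds=0.01):
    """Build a tiny silent mono WAV in memory (hypothetical helper for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sample_width_bytes)
        w.setframerate(rate)
        w.writeframes(b"\x00" * int(rate * seconds) * sample_width_bytes)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(meets_audio_recommendation(make_silent_wav(44100, 2)))
    print(meets_audio_recommendation(make_silent_wav(22050, 1)))
```

A check like this only inspects the file header; it cannot tell whether a file was converted up from a lower-quality source.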

  5. Getting there from here • I've accepted BP, now what?

  6. Getting there from here • I've accepted BP, now what? • My computer won't read my old Wordstar file. What program can I use?

  7. Getting there from here • I've accepted BP, now what? • My computer won't read my old Wordstar file. What program can I use? • I have a PC, but all of my data was entered on a Mac

  8. Getting there from here • I've accepted BP, now what? • My computer won't read my old Wordstar file. What program can I use? • I have a PC, but all of my data was entered on a Mac • My data is in [insert database name here] which is not supported outside [insert obsolete OS here]. It's fine for me, but others can't use it.

  9. General physical format issues • Analog recordings • cassette • reel-to-reel • wax cylinder • Field notes • Field Notes, Notebooks • Notecards • Annotated descriptive materials • Outdated computer media/drives

  10. Digitizing analog recordings • outsourced • audiotechnical experts • equipment and staff limitations • equipment procurement and maintenance problems • in-house • equipment and space appropriate as part of CAIL's ongoing mission • valuable training for students

  11. Field Notes • Issues • notes may not be in any logical order, even if they are well-catalogued • need to design a digital data structure that represents all of the written information • Options • scanning as images • scanning/OCR to text files • manual data entry

  12. Outdated computer media • General problems • QUIRKY QUOTE NEEDED • Floppy disks • consult an expert • building a system that can read old disks and create modern media such as CD or DVD-RAM

  13. Software Issues • More difficult to diagnose • Usually hand-in-hand w/hardware issues • If your floppy disk is obsolete, the data on it will likely need some updating too • Fast-paced world of software development • Even files from older versions of the same program may not transfer properly to current software

  14. Case Studies • Hypercard • Shoebox 3.0 • Word processors/spreadsheets • MS Word • Excel • WordPerfect • Plain text (.txt) (Not a total pain in the ass)

  15. Hypercard data Floppy > CD > Hypercard data > Hypercard-to-text custom tool (VN) > Word Docs > FMPro Database > Print reference 1) Read the floppy 2) Analyze the data (what format is it in, what kind of data is it?) 3) Get the data into a transportable format 4) Structure the data 5) Use the data

  16. Shoebox Shoebox 3.0 --> Toolbox --> XML --> XSLT --> HTML Online Mocho dictionary 1) Figured out the structure of the database (Shoebox 3.0) • ascertain data collector's conventions if possible 2) Migrated to the newest form of the software (Toolbox) 3) Export XML or text version 4) Write XSLT document to create HTML (online/book tutorial) 5) Basic online web version
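The export step in the Shoebox workflow above can be sketched in Python for cases where only the plain-text database file is available. This is a minimal sketch, not the tool used in the project: it assumes backslash-marked fields (for example `\lx`, `\ps`, `\ge`) with `\lx` starting each record, and the sample entries, header line, and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample in backslash-marker format; not real dictionary data.
SAMPLE = r"""\_sh v3.0
\lx example
\ps n
\ge sample gloss

\lx another
\ps v
\ge second gloss
continued here
"""


def parse_records(text, record_marker="lx"):
    """Split marker-format text into records, each a list of (marker, value) pairs."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            # A non-marker line continues the previous field's value.
            if current and line:
                marker, value = current[-1]
                current[-1] = (marker, (value + " " + line).strip())
            continue
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = []
            records.append(current)
        if current is not None:  # skip header lines before the first record
            current.append((marker, value.strip()))
    return records


def records_to_xml(records):
    """Wrap each record in an <entry> with one <field marker="..."> per field."""
    root = ET.Element("lexicon")
    for fields in records:
        entry = ET.SubElement(root, "entry")
        for marker, value in fields:
            ET.SubElement(entry, "field", marker=marker).text = value
    return root


if __name__ == "__main__":
    root = records_to_xml(parse_records(SAMPLE))
    print(ET.tostring(root, encoding="unicode"))
```

Storing the marker as an attribute rather than as the element name avoids invalid XML names when a collector's markers contain unusual characters. An XSLT stylesheet can then map each marker to presentation HTML.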

  17. (Word) transcriptions Word transcriptions > Excel > autoglossing tool > Excel dictionary > XML or presentation format • Interlinear text documents • Tool (VN) to import document to Excel data template • Visual Basic tool (VN) to autogloss morphemes • from Excel dictionary (or other database) • Corpus tool • Export • XML for archival format • XSLT presentation formats
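The autoglossing idea above (look up each morpheme of a segmented transcription in a dictionary) can be illustrated with a short Python sketch. The original tool was written in Visual Basic against an Excel dictionary; this stand-in assumes hyphen-separated morphemes and a plain Python dict, and the lexicon entries are invented rather than taken from any real language.

```python
def autogloss(line, lexicon, unknown="???"):
    """Gloss a hyphen-segmented line morpheme by morpheme.

    Morphemes missing from the lexicon are marked with `unknown`
    so they can be reviewed by hand.
    """
    glossed_words = []
    for word in line.split():
        glosses = [lexicon.get(morpheme, unknown) for morpheme in word.split("-")]
        glossed_words.append("-".join(glosses))
    return " ".join(glossed_words)


if __name__ == "__main__":
    # Hypothetical lexicon: morpheme -> gloss
    lexicon = {"ka": "1SG", "tem": "see", "o": "PST"}
    print(autogloss("ka-tem-o wix", lexicon))
```

A one-to-one lookup like this cannot choose between homophonous morphemes, so ambiguous forms still need a human decision.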

  18. Don't forget Metadata • What is metadata? • Documenting your documents • Why metadata? • saves lots of trouble later • Resources • IMDI • OLAC • Your local archivist

  19. Minimal metadata These are enough to be getting on with, but always follow your archivist's recommendations • Language • Speaker • Time and place of recording • Collector • Transcriber • Software version and revisions • Transcription conventions / abbreviations
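One way to keep the minimal metadata above alongside each recording is a small machine-readable record. The Python sketch below assumes field names chosen for this example (they are not an IMDI or OLAC schema) and uses placeholder values; an archivist's recommended schema should take precedence.

```python
import json

# Field names are assumptions that mirror the list above.
REQUIRED_FIELDS = [
    "language",
    "speaker",
    "recording_time",
    "recording_place",
    "collector",
    "transcriber",
    "software_version",
    "transcription_conventions",
]


def missing_fields(record):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Placeholder values only.
    record = {
        "language": "LANGUAGE NAME",
        "speaker": "SPEAKER NAME",
        "recording_time": "",
        "collector": "COLLECTOR NAME",
    }
    print(json.dumps(record, indent=2, ensure_ascii=False))
    print("Still needed:", missing_fields(record))
```

Saving the record as a UTF-8 JSON or XML file next to the audio keeps it readable without any particular program.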

  20. (A bit of) What we have learned Save time. Hire a Genius (e.g. Vivian Ngai) • Initiative goes a long way • Knowing your end goal (desired end data format, best practices) makes the intermediate steps more focused • Ask. There are too many people to mention who have answered questions and suggested solutions